---
title: "This conversation totally didn't happen at Microsoft"
slug: microsoft-git
date: 2023-12-07T14:00:00+02:00
---

Similarity to real-world events and character names is coincidental.

Characters, Microsoft employees:

* *Amy:* a high-level executive. Ex-JPMorgan. Pragmatic.
* *Harry:* an engineer on the Developer Services team. His organization
  owns code hosting, developer tools and CI infrastructure. A good listener.

# 2015 — the beginning of Git at Microsoft

An exchange between Harry and Amy in a parking lot on a chilly Redmond morning:

- *Harry:* Amy, our Skype colleagues from Tallinn have been using git since
  2006 and are making fun of us for using Perforce in 2015. Our ex-AWS
  colleagues take offense, since they know the Estonians are right. In fact,
  everyone takes offense, because nobody likes to admit the Estonians are
  right. Git is a tad too slow for large repos, which prevents a quick
  migration. Do you mind if I ask my team to take a look into this?
- *Amy:* sure, go ahead, Harry. I don't care about version control, do what you
  think is right as long as it works for everyone.

Harry starts poking at git to make it work better for larger repositories.

# late 2016 — money pressure and GVFS

Harry and his team implement partial clone (later renamed to sparse checkout).
With careful hand-holding, crossed fingers and good weather, Visual
Studio can now load the partially-cloned Windows repository without crashing.
Excitement grows. Friendly, congratulatory exchanges between Estonians and
Redmondians take place. Engineers get excited thinking the migration is "soon".

Harry and Amy again:

- *Amy:* Harry, how's that git thing going? I said I don't care about version
  control, but for some reason I do now.
- *Harry:* pretty well, why?
- *Amy:* just curious, what would it take to migrate the whole company to git?
- *Harry:* the tooling is robust and we are ready to migrate. One last thing:
  the Windows and Office repositories are in the hundreds of gigabytes. About
  50k people will need to get their laptops' disks replaced. Oh, and we will
  kill the office network while they download the initial clone. With good
  planning, we should be good in a month or two.
- *Amy:* sounds like $20 million for the disks and lost productivity while this
  chaos settles down. Any other ideas?
- *Harry:* our central repositories are in the basement, and the office
  connectivity is quite good. Maybe we can use shallow clones.
- *Amy:* whatever that means. If it helps, try to make it happen.

Harry scrambles to do something about it and creates GVFS. Open-sources it.
Everyone understands it's a temporary solution, so they live with it. People use
their git.

# 2017 — migration is over and problems with GVFS

Migration is over for the last repository. People are complaining about GVFS,
but at least they are on git. Amy did not spend her political capital on
procurement, so she is happy.

GVFS is open-source, but only sort of. It relies on many Microsoft-specific
assumptions (e.g. don't even try macOS), but companies cargo-cult GVFS and
struggle with it anyway, because it's Microsoft.

# 2018 and later: GitHub acquisition and Scalar

Microsoft buys GitHub. Estonians no longer have anything to make fun of, so
they fall back to poking the flies on their office windows. Harry has an eye on
replacing GVFS.

Harry's team keeps improving git. Rewrites GVFS in C and renames it to
`scalar`. To take revenge on the Estonians, Harry's colleague Theodoric bets
that he can put Microsoft-specific code into upstream git. He wins:

https://github.com/git/git/blob/v2.35.0/contrib/scalar/scalar.c#L144

# Late 2023

MS has taught their developers to use `scalar`. Dozens of other companies that
believe their repositories are big clone Microsoft's workflow. However,
their git repositories are not in the basement of their office. So many people
unknowingly pay the price of calling into GitHub every few seconds.

The speed of light did not change over the last decade. If your git
repository is on another continent, it will still take at least 100ms for the
round-trip (plus whatever outage your git provider has this minute). The cost
of SSD storage is ~$100/TB, and it keeps decreasing.

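To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in shell arithmetic. Only the 100ms round-trip and the ~$100/TB price come from the text above; the number of remote round-trips per day and the size of a full clone are made-up illustration values, not measurements.

```shell
#!/bin/sh
# Back-of-the-envelope numbers; only RTT_MS and USD_PER_TB come from the text.
RTT_MS=100            # round-trip to another continent, in milliseconds
USD_PER_TB=100        # approximate SSD price
FETCHES_PER_DAY=5000  # hypothetical: remote round-trips per developer per day
REPO_GB=300           # hypothetical: size of a full clone, in gigabytes

# Time spent waiting on the network, per developer per day, in seconds.
wait_seconds=$((FETCHES_PER_DAY * RTT_MS / 1000))

# One-off disk cost of keeping the full clone locally, in dollars.
disk_usd=$((REPO_GB * USD_PER_TB / 1000))

echo "waiting on round-trips: ${wait_seconds}s per developer per day"
echo "disk for a full clone: \$${disk_usd} per developer, once"
```

Plug in your own round-trip time and repository size; the point is only that the network cost recurs daily while the disk cost is paid once.
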
`scalar.c` has been "made official" and moved from contrib to top-level. But
the Azure ghosts are still with us:

https://github.com/git/git/blob/v2.43.0/scalar.c#L145

# Takeaways

Try this if you think your repo is big:

```
git clone -c feature.manyFiles=true git@<...>
```

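The same switch can be flipped in a clone you already have. A minimal sketch, assuming a reasonably recent git: it creates a throwaway repository in a temporary directory so it is safe to run anywhere. According to the `git config` documentation, `feature.manyFiles` opts into `index.version=4` and `core.untrackedCache=true` (plus `index.skipHash=true` in newer releases) unless you set those explicitly.

```shell
#!/bin/sh
set -e
# Throwaway repository so the sketch is safe to run anywhere.
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"

# Same switch as in the clone command, applied to an existing repository.
git config feature.manyFiles true

# Read it back to confirm it is stored in the repository's local config.
many_files=$(git config --get feature.manyFiles)
echo "feature.manyFiles=$many_files"
```

In a real repository, run only the `git config feature.manyFiles true` line from inside your clone.
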
And forget shallow clones. Sparse checkouts are pretty decently done, so if
your repository layout allows it, they may be worth a try.

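A minimal sketch of a sparse checkout, assuming git 2.27 or newer. To stay self-contained it builds a tiny stand-in "monorepo" in a temporary directory and clones it locally; the directory names `docs` and `src`, the file names and the commit identity are all placeholders.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Build a tiny stand-in "monorepo" with two top-level directories.
git init --quiet "$work/origin"
mkdir "$work/origin/docs" "$work/origin/src"
echo "top" > "$work/origin/README"
echo "readme" > "$work/origin/docs/README.md"
echo "code" > "$work/origin/src/main.c"
git -C "$work/origin" add .
git -C "$work/origin" -c user.name=Example -c user.email=example@example.com \
    commit --quiet -m "initial import"

# --sparse starts the clone with only the top-level files checked out.
git clone --quiet --sparse "$work/origin" "$work/clone"
cd "$work/clone"

# Ask for just the docs/ directory; src/ stays out of the working tree.
git sparse-checkout set docs

ls
```

Against a real remote you would replace the local path with your repository URL and `docs` with the directories you actually work in. The full history is still downloaded; only the working tree shrinks.
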
Also have a look at `git maintenance` and `git config core.fsmonitor`.

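A hedged sketch of both, again in a throwaway repository. It assumes git 2.31 or newer for the `git maintenance` tasks used here; `core.fsmonitor=true` refers to the built-in filesystem monitor, which as far as I know needs git 2.37 or newer and is only available on Windows and macOS, so on other platforms the setting is stored but may have no effect. The commit identity is a placeholder.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"
echo "hello" > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com \
    commit --quiet -m "first commit"

# One-off foreground run of two maintenance tasks (no scheduler involved).
git maintenance run --task=commit-graph --task=loose-objects

# Opt in to the built-in filesystem monitor where the platform supports it.
git config core.fsmonitor true

fsmonitor=$(git config --get core.fsmonitor)
echo "core.fsmonitor=$fsmonitor"
```

For a repository you use daily, `git maintenance start` registers it for scheduled background maintenance instead of the one-off run shown here.
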
If you look to a large company for a solution, think about their context. Your
repository probably doesn't weigh hundreds of gigabytes, and it will not cost
$20 million to procure larger disks for developers.