---
title: "This conversation totally didn't happen at Microsoft"
slug: microsoft-git
date: 2023-12-07T14:00:00+02:00
---
Similarity to real-world events and character names is coincidental.

Characters, Microsoft employees:

* *Amy:* a high-level executive. Ex-JPMorgan. Pragmatic.
* *Harry:* an engineer in the Developer Services team. His organization
owns code hosting, developer tools and CI infrastructure. A good listener.

# 2015 — the beginning of Git at Microsoft
Exchange between Harry and Amy in a parking lot on a chilly Redmond morning:
- *Harry:* Amy, our Skype colleagues from Tallinn have been using git since
2006 and are making fun of us for still using Perforce in
2015. Our ex-AWS colleagues take offense, since they know the Estonians are
right. In fact, everyone takes offense, because nobody likes to admit the
Estonians are right. Git is a tad too slow for large repos, which prevents a
quick migration. Do you mind if I ask my team to look into this?
- *Amy:* sure, go ahead, Harry. I don't care about version control; do what you
think is right as long as it works for everyone.

Harry starts poking at git to make it work better for larger repositories.

# late 2016 — money pressure and GVFS
Harry and his team implement partial clone (later renamed to sparse checkout).
With careful hand-holding, crossed fingers and good weather, Visual
Studio can now load the partially cloned Windows repository without crashing.
Excitement grows. Friendly, congratulatory exchanges between Estonians and
Redmondians take place. Engineers get excited thinking the migration is "soon".

Harry and Amy again:
- *Amy:* Harry, how's that git thing going? I said I don't care about version
control, but for some reason I do now.
- *Harry:* pretty well, why?
- *Amy:* just curious, what would it take to migrate the whole company to git?
- *Harry:* the tooling is robust and we are ready to migrate. One last thing:
Windows and Office repositories are in the hundreds of gigabytes. About
50k people will need to get their laptops' disks replaced. Oh, and we will kill
the office network while they download the initial clone. With good planning,
we should be good in a month or two.
- *Amy:* sounds like $20 million for the disks and lost productivity while this
chaos settles down. Any other ideas?
- *Harry:* our central repositories are in the basement, and the office
connectivity is quite good. Maybe we can use shallow clones.
- *Amy:* whatever that means. If it helps, try to make it happen.

Harry scrambles to do something about it: he creates GVFS and open-sources it.
Everyone understands it's a temporary solution, so they live with it. People
use their git.

# 2017 — migration is over and problems with GVFS
The last repository has been migrated. People are complaining about GVFS,
but at least they are on git. Amy did not spend her political capital on
procurement, so she is happy.

GVFS is open source, but only sort of. It bakes in many Microsoft assumptions
(e.g. don't even try macOS), but companies cargo-cult GVFS and struggle with it
anyway, because it's Microsoft.

# 2018 — and later: github acquisition and Scalar
Microsoft buys github. Estonians no longer have anything to make fun of, so
they fall back to poking the flies on their office windows. Harry has his eye
on replacing GVFS.

Harry's team keeps improving git. They rewrite GVFS in C and rename it to
`scalar`. To take revenge on the Estonians, Harry's colleague Theodoric bets
that he can put Microsoft-specific code into upstream git. He wins:
https://github.com/git/git/blob/v2.35.0/contrib/scalar/scalar.c#L144

# Late 2023
MS taught its developers to use `scalar`. Dozens of other companies that
believe their repositories are big clone Microsoft's workflow. However,
their git repositories are not in the basement of their office, so many people
unknowingly pay the price of calling into github every few seconds.

The speed of light has not changed over the last decade. If your git
repository is on another continent, a round-trip will still take at least
100ms (plus whatever outage your git provider has this minute). Meanwhile,
SSDs cost about $100/TB, and the price keeps falling.

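
A back-of-the-envelope check of the two numbers above. The refractive index of optical fiber (about 1.47) and the 10,000 km one-way distance are illustrative assumptions, not measurements. Real cable routes are longer than the straight-line distance, so actual round-trips are slower than this lower bound. The disk line assumes a 500 GB repository at the ~$100/TB price quoted above.

```latex
t_{\mathrm{RTT}} \ge \frac{2d}{v}, \qquad
v \approx \frac{c}{n} \approx \frac{3 \times 10^{8}\,\mathrm{m/s}}{1.47} \approx 2 \times 10^{8}\,\mathrm{m/s}

d = 10\,000\,\mathrm{km}: \quad
t_{\mathrm{RTT}} \ge \frac{2 \times 10^{7}\,\mathrm{m}}{2 \times 10^{8}\,\mathrm{m/s}} = 0.1\,\mathrm{s} = 100\,\mathrm{ms}

\text{disk for a 500 GB repository: } 0.5\,\mathrm{TB} \times \$100/\mathrm{TB} = \$50
```
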
`scalar.c` has been "made official" and moved from contrib to top-level, but
the Azure ghosts are still with us:
https://github.com/git/git/blob/v2.43.0/scalar.c#L145

# Takeaways
Try this if you think your repo is big:
```
git clone -c feature.manyFiles=true git@<...>
```

And forget shallow clones. Sparse checkouts are done pretty decently, so if
your repository allows it, they may be worth a try.

Also have a look at `git maintenance` and `git config core.fsmonitor`.
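
A minimal sketch for trying sparse checkout together with `feature.manyFiles` before you point them at your real remote. It runs against a throwaway local repository. The toy repository and the `dir-a`/`dir-b` directory names are made up for illustration. It assumes git 2.25 or newer, the first version with `git clone --sparse`.

```shell
#!/usr/bin/env bash
# Sketch: sparse checkout + feature.manyFiles on a throwaway local repo.
# Assumptions: git >= 2.25; dir-a/dir-b are hypothetical directory names.
set -euo pipefail

WORK="$(mktemp -d)"   # left in place afterwards so you can inspect it
SRC="$WORK/src"

# 1. Build a toy "big" repository with two top-level directories.
git -c init.defaultBranch=main init -q "$SRC"
mkdir -p "$SRC/dir-a" "$SRC/dir-b"
echo "a" > "$SRC/dir-a/a.txt"
echo "b" > "$SRC/dir-b/b.txt"
echo "readme" > "$SRC/README"
git -C "$SRC" add .
git -C "$SRC" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial"

# 2. Clone as in the post, plus --sparse: the working tree starts
#    with (at most) the root-level files only.
git clone -q --sparse -c feature.manyFiles=true "$SRC" "$WORK/clone"

# 3. Check out only the directory you actually work on.
git -C "$WORK/clone" sparse-checkout set dir-a

# Show what ended up in the working tree.
ls "$WORK/clone"
```

On a real repository, point `git clone` at your remote and pass the directories you work in to `git sparse-checkout set`. Depending on the git version, root-level files (here `README`) may or may not stay checked out. Newer git defaults to cone mode, which always keeps them. For the two settings in the paragraph above:

* `git maintenance start` registers the repository for scheduled background maintenance.
* `core.fsmonitor=true` turns on git's built-in file system monitor. As far as I know, it is only available on macOS and Windows.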

If you look to a large company for a solution, think about their context. Your
repository probably doesn't weigh hundreds of gigabytes, and it will not cost
$20 million to procure larger disks for your developers.