---
title: "This conversation totally didn't happen at Microsoft"
slug: microsoft-git
date: 2023-12-07T14:00:00+02:00
---

Similarity to real-world events and character names is coincidental.

Characters, Microsoft employees:

* *Amy:* a high-level executive. Ex-JPMorgan. Pragmatic.
* *Harry:* an engineer on the Developer Services team. His organization
  owns code hosting, developer tools and CI infrastructure. A good listener.

# 2015 — the beginning of Git at Microsoft

Exchange between Harry and Amy in a parking lot on a chilly Redmond morning:

- *Harry*: Amy, our Skype colleagues from Tallinn have been using git since
  2006 and are making fun of us for still using Perforce in 2015. Our ex-AWS
  colleagues take offense, since they know the Estonians are right. In fact,
  everyone takes offense, because nobody likes to admit the Estonians are
  right. Git is a tad too slow for large repos, which prevents a quick
  migration. Do you mind if I ask my team to take a look into this?
- *Amy*: sure, go ahead, Harry. I don't care about version control; do what you
  think is right as long as it works for everyone.

Harry starts poking at git to make it work better for larger repositories.

# late 2016 — money pressure and GVFS

Harry and his team implement partial clone (later renamed to sparse checkout).
With careful hand-holding, crossed fingers and good weather, Visual Studio can
now load the partially-cloned Windows repository without crashing.
Excitement grows. Friendly, congratulatory exchanges between Estonians and
Redmondians take place. Engineers get excited, thinking the migration is
"soon".

Harry and Amy again:

- *Amy:* Harry, how's that git thing going? I said I don't care about version
  control, but for some reason I do now.
- *Harry:* pretty well, why?
- *Amy:* just curious, what would it take to migrate the whole company to git?
- *Harry:* the tooling is robust and we are ready to migrate. One last thing:
  the Windows and Office repositories are in the hundreds of gigabytes. About
  50k people will need to get their laptops' disks replaced. Oh, and we will
  kill the office network while they download the initial clone. With good
  planning, we should be good in a month or two.
- *Amy:* sounds like $20 million for the disks and lost productivity while
  this chaos settles down. Any other ideas?
- *Harry:* our central repositories are in the basement, and the office
  connectivity is quite good. Maybe we can use shallow clones.
- *Amy:* whatever that means. If it helps, try to make it happen.

Harry scrambles to do something about it and creates GVFS, then open-sources it.
Everyone understands it's a temporary solution, so they live with it. People
use their git.

# 2017 — migration is over and problems with GVFS

Migration is over for the last repository. People are complaining about GVFS,
but at least they are on git. Amy did not spend her political capital on
procurement, so she is happy.

GVFS is open source, but only sort of. It bakes in many Microsoft assumptions
(e.g. don't even try macOS), but companies cargo-cult GVFS and struggle with it
anyway, because it's Microsoft.

# 2018 — and later: GitHub acquisition and Scalar

Microsoft buys GitHub. The Estonians no longer have anything to make fun of, so
they fall back to poking the flies on their office windows. Harry has his eye
on replacing GVFS.

Harry's team keeps improving git. They rewrite GVFS in C and rename it to
`scalar`. To take revenge on the Estonians, Harry's colleague Theodoric bets
that he can put Microsoft-specific code into upstream git. He wins:

https://github.com/git/git/blob/v2.35.0/contrib/scalar/scalar.c#L144

# Late 2023

Microsoft taught its developers to use `scalar`. Dozens of other companies that
believe their repositories are big clone Microsoft's workflow. However, their
git repositories are not in the basement of their office, so many people
unknowingly pay the price of calling into GitHub every few seconds.

The speed of light has not changed over the last decade. If your git repository
is on another continent, each round-trip will still take on the order of
100 ms (plus whatever outage your git provider has this minute). Meanwhile,
SSDs cost roughly $100/TB, and the price keeps falling.
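A back-of-the-envelope lower bound makes the latency point concrete. This assumes light in optical fiber travels at roughly two-thirds of its vacuum speed, about 2 × 10^5 km/s. It also uses an illustrative one-way path of 6,000 km, which is an assumption, roughly a transatlantic hop. A round-trip covers the path twice:

```latex
t_{\mathrm{RTT}} \ge \frac{2d}{v}
  = \frac{2 \times 6\,000\ \mathrm{km}}{2 \times 10^{5}\ \mathrm{km/s}}
  = 0.06\ \mathrm{s} = 60\ \mathrm{ms}
```

Real routes are longer than the straight-line distance and add routing and queueing delays on top of this floor. That helps explain why measured intercontinental round-trips are typically higher. No client-side tuning removes the floor itself.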

`scalar.c` has been "made official" and moved from contrib to the top level.
But the Azure ghosts are still with us:

https://github.com/git/git/blob/v2.43.0/scalar.c#L145

# Takeaways

Try this if you think your repo is big:

```
git clone -c feature.manyFiles=true git@<...>
```
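The sketch below shows what the command above does. It uses a throwaway local repository in place of the real `git@<...>` remote; the paths and file names are made up for illustration.

- `git clone -c` writes the setting into the new clone's `.git/config` before anything is checked out.
- Per the git-config documentation, `feature.manyFiles=true` implies `index.version=4` and `core.untrackedCache=true` (and `index.skipHash=true` in recent versions).
- `index.version` only affects newly created index files, so an existing clone also needs `git update-index --index-version 4`:

```shell
#!/bin/sh
# Sketch: feature.manyFiles on a fresh clone and on an existing repository.
# A local throwaway repository stands in for the real remote (assumption).
set -eu

base=$(mktemp -d)
git init -q "$base/src"
echo "hello" > "$base/src/file.txt"
git -C "$base/src" add file.txt
git -C "$base/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# New clone: -c stores the key in the clone's own .git/config.
git clone -q -c feature.manyFiles=true "file://$base/src" "$base/clone"
git -C "$base/clone" config --get feature.manyFiles

# Existing repository: enable the feature, then upgrade the index in place,
# because index.version only applies to newly created index files.
git -C "$base/src" config feature.manyFiles true
git -C "$base/src" update-index --index-version 4
```

The untracked cache takes effect on the next `git status`, so the first run after enabling it may not be faster.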

And forget shallow clones. Sparse checkout is decently implemented, so if your
repository layout allows it, it may be a good thing to try.
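Here is a minimal sketch of sparse checkout. It assumes a throwaway local repository with hypothetical `frontend/` and `backend/` directories.

- `git clone --sparse` starts with only the top-level files checked out.
- `git sparse-checkout set` then chooses which directories to materialize.

On a real remote this is commonly combined with `--filter=blob:none` (a partial clone), so that blobs outside the sparse set are not downloaded either. The local sketch leaves that out, because filtering has to be enabled on the serving side.

```shell
#!/bin/sh
# Sketch: sparse checkout on a throwaway repository. The frontend/ and
# backend/ directory names are hypothetical.
set -eu

base=$(mktemp -d)
git init -q "$base/src"
mkdir -p "$base/src/frontend" "$base/src/backend"
echo "top-level notes" > "$base/src/README"
echo "console.log('hi');" > "$base/src/frontend/app.js"
echo "package main" > "$base/src/backend/server.go"
git -C "$base/src" add .
git -C "$base/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial layout"

# Start with only the top-level files in the working tree.
git clone -q --sparse "file://$base/src" "$base/work"

# Materialize frontend/ only; backend/ stays out of the working tree.
# (Since Git 2.37, `set` defaults to cone mode, which also keeps top-level files.)
git -C "$base/work" sparse-checkout set frontend

# Show the active sparse-checkout patterns.
git -C "$base/work" sparse-checkout list
```

Later, `git sparse-checkout add <dir>` extends the set, and `git sparse-checkout disable` restores the full working tree.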

Also have a look at `git maintenance` and `git config core.fsmonitor`.
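Below is a minimal sketch of both, assuming a throwaway local repository.

- **Maintenance.** `git maintenance run --task=commit-graph` runs a single maintenance task in the foreground. In a real repository you would more likely use `git maintenance start`. That command registers the repository and schedules tasks with the system scheduler (cron, launchd or Task Scheduler), so the sketch avoids it.
- **fsmonitor.** `core.fsmonitor=true` selects Git's built-in filesystem monitor daemon, which Git provides on Windows and macOS. Elsewhere, `core.fsmonitor` can instead point at a hook script, such as a Watchman integration.

```shell
#!/bin/sh
# Sketch: foreground maintenance plus fsmonitor config on a throwaway repo.
set -eu

repo="$(mktemp -d)/demo"
git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Write the commit-graph now instead of waiting for a scheduled run.
git -C "$repo" maintenance run --task=commit-graph

# Opt in to the built-in filesystem monitor (daemon on Windows and macOS).
git -C "$repo" config core.fsmonitor true
git -C "$repo" config --get core.fsmonitor
```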

If you look to a large company for a solution, think about its context. Your
repository probably doesn't weigh hundreds of gigabytes, and it will not cost
$20 million to procure larger disks for your developers.