diff --git a/content/log/2023/microsoft-git.md b/content/log/2023/microsoft-git.md
new file mode 100644
index 0000000..cef2fad
--- /dev/null
+++ b/content/log/2023/microsoft-git.md
@@ -0,0 +1,115 @@
+---
+title: "This conversation totally didn't happen at Microsoft"
+slug: microsoft-git
+date: 2023-12-07T14:00:00+02:00
+---
+
+Similarity to real-world events and character names is coincidental.
+
+Characters, Microsoft employees:
+
+* *Amy:* a high-level executive. Ex-JPMorgan. Pragmatic.
+* *Harry:* an engineer in the Developer Services team. His organization
+  owns code hosting, developer tools and CI infrastructure. A good listener.
+
+# 2015: the beginning of Git at Microsoft
+
+Exchange between Harry and Amy in a parking lot on a chilly Redmond morning:
+
+- *Harry*: Amy, our Skype colleagues from Tallinn have been using git since
+  2006 and are making fun of us for using Perforce in
+  2015. Our ex-AWS colleagues take offense, since they know Estonians are
+  right. In fact, everyone takes offense, because nobody likes to admit
+  Estonians are right. Git is a tad too slow for large repos, preventing quick
+  migration. Do you mind if I ask my team to take a look into this?
+- *Amy*: sure, go ahead, Harry. I don't care about version control, do what you
+  think is right as long as it works for everyone.
+
+Harry starts poking at git to make it work better for larger repositories.
+
+# late 2016: money pressure and GVFS
+
+Harry and his team implement partial clone (often paired with sparse checkout).
+With careful hand-holding, crossed fingers and in good weather, Visual
+Studio can now load the partially-cloned Windows repository without crashing.
+Excitement grows. Friendly, congratulatory exchanges between Estonians and
+Redmondians take place. Engineers get excited thinking the migration is "soon".
+
+Harry and Amy again:
+
+- *Amy:* Harry, how's that git thing going? I said I don't care about version
+  control, but for some reason I do now.
+- *Harry:* pretty well, why?
+- *Amy:* just curious, what would it take to migrate the whole company to git?
+- *Harry:* the tooling is robust and we are ready to migrate. One last thing
+  --- Windows and Office repositories are in the hundreds of gigabytes. About
+  50k people will need to get their laptops' disks replaced. Oh, and we will
+  kill the office network while they download the initial clone. With good
+  planning, we should be good in a month or two.
+- *Amy:* sounds like $20 million for the disks and lost productivity while this
+  chaos settles down. Any other ideas?
+- *Harry:* our central repositories are in the basement, and the office
+  connectivity is quite good. Maybe we can use shallow clones.
+- *Amy:* whatever that means. If it helps, try to make it happen.
+
+Harry scrambles to do something about it, creates GVFS. Open sources it.
+Everyone understands it's a temporary solution, so they live with it. People
+use their git.
+
+# 2017: migration is over and problems with GVFS
+
+Migration is over for the last repository. People are complaining about GVFS,
+but at least they are on git. Amy did not spend her political capital on
+procurement, so she is happy.
+
+GVFS is open-source, but only sort-of. It requires many Microsoft assumptions
+(e.g. don't even try macOS), but companies cargo-cult GVFS and struggle with it
+anyway, because it's Microsoft.
+
+# 2018 and later: GitHub acquisition and Scalar
+
+Microsoft buys GitHub. Estonians no longer have anything to make fun of, so
+they fall back to poking the flies on their office windows. Harry has his eye
+on replacing GVFS.
+
+Harry's team keeps improving git. Rewrites GVFS in C and renames it to
+`scalar`. To take revenge on the Estonians, Harry's colleague Theodoric bets
+that he can put Microsoft-specific code into upstream git. He wins:
+
+https://github.com/git/git/blob/v2.35.0/contrib/scalar/scalar.c#L144
+
+# Late 2023
+
+MS taught their developers to use `scalar`. Dozens of other companies who
+believe their repositories are big clone Microsoft's workflow. However,
+their git repositories are not in the basement of their office. So many people
+unknowingly pay the price of calling into GitHub every few seconds.
+
+The speed of light did not change over the last decade. If your git
+repository is on another continent, it will still take at least 100ms for the
+round-trip (plus whatever outage your git provider has this minute). SSDs cost
+~$100/TB, and that keeps decreasing.
+
+`scalar.c` has been "made official" and moved from contrib to top-level. But
+the Azure ghosts are still with us:
+
+https://github.com/git/git/blob/v2.43.0/scalar.c#L145
+
+# Takeaways
+
+Try this if you think your repo is big:
+
+```
+git clone -c feature.manyFiles=true git@<...>
+```
+
+And forget shallow clones. Sparse checkouts are pretty decently done, so if
+your repository allows that, it may be a good thing to try.
+
+Also have a look at `git maintenance` and `git config core.fsmonitor`.
+
+If you eye a large company for a solution, think about their context. Your
+repository probably doesn't weigh hundreds of gigabytes, and it will not cost
+$20 million to procure larger disks for developers.
+
+
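The Takeaways can be tried end to end without touching a real remote. What follows is a minimal offline sketch, not a recommended setup: it assumes a reasonably recent git that ships the `sparse-checkout` and `maintenance` subcommands, and the stand-in repository, its `services/api` and `services/web` directories, and the committer identity are all made up for illustration. `core.fsmonitor` is left out because its behaviour depends on the platform.

```shell
#!/bin/sh
# Sketch: full clone with feature.manyFiles, a cone-mode sparse checkout,
# and a one-off maintenance run. Every path and name below is hypothetical.
set -eu

work=$(mktemp -d)
origin="$work/origin"

# Build a tiny local stand-in for "the big repository".
git init --quiet "$origin"
mkdir -p "$origin/services/api" "$origin/services/web"
echo "api" > "$origin/services/api/main.txt"
echo "web" > "$origin/services/web/main.txt"
echo "readme" > "$origin/README"
git -C "$origin" add .
git -C "$origin" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet -m "initial import"

# Full clone (no --depth), with the many-files defaults from the Takeaways.
git clone --quiet -c feature.manyFiles=true "file://$origin" "$work/clone"
cd "$work/clone"

# Cone mode keeps top-level files plus the directories that are listed.
git sparse-checkout init --cone
git sparse-checkout set services/api

# Runs the default maintenance tasks once, in the foreground.
git maintenance run

echo "manyFiles=$(git config feature.manyFiles)"
ls services
```

The full history stays on local disk, so later `git log` or `git blame` calls need no round-trip; only the working tree is narrowed. `git maintenance start` would schedule the same tasks in the background instead of running them once.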