diff --git a/assets/_/2022/uber-zig-abg.png b/assets/_/2022/uber-zig-abg.png
new file mode 100644
index 0000000..ac1689b
Binary files /dev/null and b/assets/_/2022/uber-zig-abg.png differ
diff --git a/assets/_/2022/uber-zig-deposit.png b/assets/_/2022/uber-zig-deposit.png
new file mode 100644
index 0000000..a5dcae7
Binary files /dev/null and b/assets/_/2022/uber-zig-deposit.png differ
diff --git a/assets/_/2022/uber-zig-frank-tweet.jpg b/assets/_/2022/uber-zig-frank-tweet.jpg
new file mode 100644
index 0000000..b91583d
Binary files /dev/null and b/assets/_/2022/uber-zig-frank-tweet.jpg differ
diff --git a/assets/_/2022/uber-zig-gm-221.png b/assets/_/2022/uber-zig-gm-221.png
new file mode 100644
index 0000000..4cbcd4d
Binary files /dev/null and b/assets/_/2022/uber-zig-gm-221.png differ
diff --git a/assets/_/2022/uber-zig-landed.png b/assets/_/2022/uber-zig-landed.png
new file mode 100644
index 0000000..46e3b7a
Binary files /dev/null and b/assets/_/2022/uber-zig-landed.png differ
diff --git a/assets/_/2022/uber-zig-zcc-gocode.png b/assets/_/2022/uber-zig-zcc-gocode.png
new file mode 100644
index 0000000..6191b04
Binary files /dev/null and b/assets/_/2022/uber-zig-zcc-gocode.png differ
diff --git a/content/log/2022/how-uber-uses-zig.md b/content/log/2022/how-uber-uses-zig.md
new file mode 100644
index 0000000..ae2401d
--- /dev/null
+++ b/content/log/2022/how-uber-uses-zig.md
@@ -0,0 +1,354 @@
+---
+title: "How Uber Uses Zig"
+date: 2022-05-20T16:51:21+03:00
+slug: how-uber-uses-zig
+draft: true
+---
+
+Disclaimer: I work at Uber and am partially responsible for bringing `zig cc`
+to serious internal use. Opinions are my own; this blog post is not affiliated
+with Uber.
+
+I gave a talk at the [Zig Milan][zig-milan] meetup about "Onboarding Zig at
+Uber". This post covers a little of "how Uber uses Zig", but is mostly about
+"my experience of bringing Zig to Uber", from both the technical and the
+social side.
+
+The video is [here][milan-youtube].
The rest of the post is a loose
+transcript, with some commentary and errata.
+
+{{My talk title, picture taken by @jedisct1}}
+
+TLDR:
+
+* Uber uses Zig to compile its C/C++ code. Currently this happens only in the
+  [Go Monorepo][go-monorepo] via [bazel-zig-cc][bazel-zig-cc], with tentative
+  ideas to expand the use of `zig cc` to other monorepos.
+* Uber does not have any plans to use Zig-the-language.
+* Uber signed a support agreement with the Zig Software Foundation (ZSF) to
+  prioritize bug fixes. The contract value is disclosed in the ZSF financial
+  reports.
+* Thanks to my team, the Go Monorepo team, the Go Platform team, my director,
+  finance, legal, and of course the Zig Software Foundation for making this
+  relationship happen. The relationship has been fruitful so far.
+
+## About Uber's tech stack
+
+Uber started in 2010, has clocked over 15 billion trips, and built lots of
+innovative tech to make that happen. The general-purpose "allowed" server-side
+languages are Go and Java, with Python and Node allowed for specific use cases
+(Node for front-end, Python for data analysis/ML). Use of other languages in
+back-end code is minimal.
+
+Our Go monorepo is larger than the Linux kernel[^1], and is worked on by a
+couple of thousand engineers. In short, it is sizeable.
+
+## How does Uber use Zig?
+
+I can't put it better than my colleague [Abhinav Gupta][abg] from the Go
+Platform team (the transcript is available in the image's "alt" attribute):
+
+{{Abhinav Gupta: we're using Zig's C toolchain only, not the language. It's not fully rolled out yet, but among other things, it'll enable cross compilation of C based code (as well as Go code that uses CGo). It'll drop the dependency on the system's C compiler.}}
+
+At this point in the presentation, having explained (thanks, abg!) how Uber
+uses Zig, I could have ended the talk. But you all came for the process, so
+after an uncomfortable pause, I carried on.
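+
+To make abg's description concrete: outside of Bazel, cross-compiling a Go
+program that uses CGo with `zig cc` mostly comes down to pointing Go's `CC`
+and `CXX` environment variables at zig with a `-target` triple. The sketch
+below is a hypothetical helper: `zigTarget` and `crossEnv` are names made up
+for this post and are not part of bazel-zig-cc, which wires zig in through
+Bazel toolchains instead. It assumes only linux and darwin on amd64/arm64,
+and that `zig` is on `PATH`.
+
+```go
+package main
+
+import (
+	"fmt"
+	"os"
+)
+
+// zigArch maps Go's GOARCH names to the CPU names used in zig target
+// triples. Only the two architectures mentioned in this post are covered.
+var zigArch = map[string]string{
+	"amd64": "x86_64",
+	"arm64": "aarch64",
+}
+
+// zigTarget returns a zig target triple for goos/goarch. On Linux, libc
+// selects the C library: "gnu.2.28" for glibc 2.28, or "musl".
+// libc is ignored for darwin.
+func zigTarget(goos, goarch, libc string) (string, error) {
+	arch, ok := zigArch[goarch]
+	if !ok {
+		return "", fmt.Errorf("unsupported GOARCH %q", goarch)
+	}
+	switch goos {
+	case "linux":
+		return arch + "-linux-" + libc, nil
+	case "darwin":
+		return arch + "-macos", nil
+	default:
+		return "", fmt.Errorf("unsupported GOOS %q", goos)
+	}
+}
+
+// crossEnv returns the environment that makes `go build` use zig as the
+// C/C++ compiler for CGo code. The host OS appears nowhere: only the
+// target matters.
+func crossEnv(goos, goarch, libc string) ([]string, error) {
+	target, err := zigTarget(goos, goarch, libc)
+	if err != nil {
+		return nil, err
+	}
+	return []string{
+		"CGO_ENABLED=1",
+		"GOOS=" + goos,
+		"GOARCH=" + goarch,
+		"CC=zig cc -target " + target,
+		"CXX=zig c++ -target " + target,
+	}, nil
+}
+
+func main() {
+	env, err := crossEnv("linux", "arm64", "gnu.2.28")
+	if err != nil {
+		fmt.Fprintln(os.Stderr, err)
+		os.Exit(1)
+	}
+	for _, kv := range env {
+		fmt.Println(kv)
+	}
+}
+```
+
+Exporting those variables and running `go build` should yield a linux-arm64
+binary whose C parts are linked against glibc 2.28, regardless of the host.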
+
+## History
+
+Before 2018, Uber's Go services lived in separate repositories. In 2018[^2],
+services started moving to the Go monorepo en masse. My team was in the first
+wave, and I still remember the complexity.
+
+### 2019: asks for a hermetic toolchain
+
+At the time, the Go monorepo already used a hermetic Go toolchain: the build
+downloaded the Go SDK itself, so whichever environment a Go build ran in, it
+always used the same version of Go.
+
+{{A Jira task asking for a hermetic C++ toolchain.}}
+
+A C++ toolchain is the collection of programs (compiler, linker, archiver and
+friends) that turns C/C++ code into binaries. Our Go code uses quite a bit of
+[CGo][cgo], so it needs a C/C++ compiler. Go then links the Go and C parts
+into the final executable.
+
+The C++ toolchain had not been hermetic since the start of the Go monorepo:
+Bazel would use whatever it found on the system. That meant clang on macOS and
+gcc (whatever version) on Linux. Setting up a C++ toolchain in Bazel is a lot
+of work (think person-months for our monorepo), there was no immediate need,
+and the situation was not painful *enough* for anyone to pick it up.
+
+At this point it is important to understand the limitations of a non-hermetic
+C++ toolchain:
+- Cannot cross-compile. So we can't compile Linux executables on a Mac if they
+  have CGo (which is most of our service code). This was worked around by...
+  not cross-compiling.
+- CGo executables would link against whatever glibc version was found on the
+  system. That means: when upgrading the OS (a multi-month effort), the build
+  fleet must be upgraded last. Otherwise, if a build host runs a newer glibc
+  than a production host, the resulting binary may require glibc symbol
+  versions that the older glibc on the production host does not provide, and
+  fail to start there.
+- We couldn't use newer compilers, which have better optimizations, because we
+  were running an older OS on the build fleet (backporting only the compiler,
+  but not glibc, carries its own risks).
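+
+The glibc bullet is what forced the "build fleet last" ordering. Below is a
+conservative model of that rule, sketched in Go under the assumption that
+versions look like "2.28": glibc keeps old symbol versions around, so a binary
+linked against an older glibc runs on a newer one, but not the other way
+round. `parseGlibc` and `runsOn` are hypothetical names for this post. A
+binary linked on a newer glibc only actually fails if it references a symbol
+version the older host lacks, so this model can say "no" for binaries that
+would in fact run.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strconv"
+	"strings"
+)
+
+// parseGlibc splits a glibc version such as "2.28" into major and minor.
+func parseGlibc(v string) (major, minor int, err error) {
+	parts := strings.SplitN(v, ".", 2)
+	if len(parts) != 2 {
+		return 0, 0, fmt.Errorf("malformed glibc version %q", v)
+	}
+	if major, err = strconv.Atoi(parts[0]); err != nil {
+		return 0, 0, err
+	}
+	if minor, err = strconv.Atoi(parts[1]); err != nil {
+		return 0, 0, err
+	}
+	return major, minor, nil
+}
+
+// runsOn conservatively reports whether a binary linked against glibc
+// `built` is guaranteed to find all of its symbol versions on a host whose
+// glibc is `host`: true only when host >= built.
+func runsOn(built, host string) (bool, error) {
+	bMaj, bMin, err := parseGlibc(built)
+	if err != nil {
+		return false, err
+	}
+	hMaj, hMin, err := parseGlibc(host)
+	if err != nil {
+		return false, err
+	}
+	if hMaj != bMaj {
+		return hMaj > bMaj, nil
+	}
+	return hMin >= bMin, nil
+}
+
+func main() {
+	// Build host upgraded before production: not guaranteed to run.
+	ok, _ := runsOn("2.31", "2.28")
+	fmt.Println("linked on 2.31, run on 2.28:", ok)
+	// Linking against the oldest production glibc is always safe.
+	ok, _ = runsOn("2.28", "2.31")
+	fmt.Println("linked against 2.28, run on 2.31:", ok)
+}
+```
+
+The second call is the escape hatch: if the toolchain can link against the
+oldest glibc running in production, the order of OS upgrades stops mattering.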
+
+All of these issues were annoying, but not enough to invest in the toolchain.
+
+### 2020 Dec: need musl
+
+I was working on a toy project that is built with Bazel and uses CGo. I wanted
+my binary to be static, but Bazel does not make that easy. I spent a couple
+of evenings creating a Bazel toolchain on top of [musl.cc](https://musl.cc),
+but didn't get far: at the time I couldn't make sense of Bazel's toolchain
+documentation, and I didn't find a good example to rely on.
+
+### 2021 Jan: discovering `zig cc`
+
+In January 2021 I found Andrew Kelley's blog post [`zig cc`: a Powerful
+Drop-In Replacement for GCC/Clang][zig-cc-andrewrk]. I recommend reading it;
+it changed how I think about compilers (and it will help you understand the
+rest of this post, since I gave the talk to a Zig audience). To sum up
+Andrew's article, `zig cc` has the following advantages:
+
+- Fully hermetic C/C++ compiler in a ~40MB tarball.
+- Can link against a glibc version provided as a command-line argument
+  (e.g. `-target x86_64-linux-gnu.2.28` will compile for x86_64 Linux and link
+  against glibc 2.28).
+- Host and target are decoupled. The setup is the same for both linux-aarch64
+  and darwin-x86_64 targets, regardless of the host.
+- Linking with musl is "just a different libc version": `-target
+  x86_64-linux-musl`.
+
+I started messing around with `zig cc`: I compiled random programs and
+reported issues. I thought about making it a [Bazel toolchain][bazel-toolchain],
+but there were quite a few blocking bugs and missing features. One of them was
+the lack of `zig ar`, which Bazel relies on.
+
+### 2021 Feb: asking for attention
+
+I [reported bugs][zig-motiejus-issues] to Zig. Nothing happened for a
+week. I donated $50/month, expecting "the Zig folks" to prioritize what I had
+reported. A week of silence again.
And then I dropped the bomb in
+`#zig:libera.chat`:
+
+```
+ What is the protocol to "claim" the dev hours once donated?
+ ZSF only accepts no-strings-attached donations
+ did you get a different impression somewhere?
+```
+
+Oops. At the time I hoped that whoever noticed the conversation would
+immediately forget it. Well, here it is again, more than a year later, for
+your enjoyment.
+
+### 2021 June: bazel-zig-cc and Uber's Go monorepo
+
+In June 2021 [Adam Bouhenguel][ajbouh] created a [working bazel-zig-cc
+prototype][ajbouh/bazel-zig-cc]. The basics worked, but it still lacked some
+features. Andrew later implemented `zig ar`[^3], the last missing piece for a
+truly workable bazel-zig-cc. I integrated `zig ar`, polished the
+documentation and [announced my fork of bazel-zig-cc on the Zig mailing
+list][bazel-zig-cc-ga]. At this point it was usable for my toy project. Win!
+
+A few weeks after the announcement I created a "WIP DIFF" for Uber's Go
+monorepo: I just followed my own onboarding instructions and naïvely submitted
+it to our CI. It failed almost all tests.
+
+{{A diff titled \}}
+
+Most of the failures were caused by dependencies on system libraries. At this
+point it was clear that truly onboarding bazel-zig-cc and compiling **all** of
+the monorepo's C/C++ code would need a lot of investment: removing the
+dependencies on system libraries and undoing a lot of tech debt.
+
+### 2021 End: recap
+
+- Various places at Uber would benefit from a hermetic C++ cross-compiler, but
+  it was not funded: it needed a large investment without enough justification.
+- bazel-zig-cc kinda works, but both bazel-zig-cc and `zig cc` have known bugs.
+  - Donations don't "help" get `zig cc` bugs fixed, and I can't realistically
+    fix them myself. I tried with `zig ar`, a trivial front-end for LLVM's
+    `ar`, and failed.
+- The monorepo-onboarding diff was simmering, waiting for its time.
+- Once an issue had been identified as a Zig issue, getting attention from Zig
+  developers was unpredictable. Some issues got resolved within days, some took
+  more than 6 months.
+
+### 2021 End: Uber needs a cross-compiler
+
+I was tasked with evaluating arm64 for Uber. Evaluation details aside, I needed
+to compile software for linux-arm64. Lots of it! Since most of our low-level
+infra is in the Go monorepo, I needed a cross-compiler there first.
+
+A business reason for a cross-compiler had landed in my lap. Now both time and
+money could be invested there. Having a "WIP diff" with `zig cc` was a good
+start, but the work was far from done: teams were not convinced it was the
+right thing to do, the diff was too much of a prototype, and both `zig cc` and
+bazel-zig-cc needed lots of work before they could be used in any capacity at
+Uber.
+
+When onboarding such a technology in a large corporation, the most important
+thing to manage is risk. As Zig is a novel technology (not even 1.0!), it was
+truly unusual to suggest compiling all of our C and C++ code with it, and if
+we adopted it, we should plan to stick with it for at least a decade.
+Questions were raised and evaluated with great care and scrutiny. For that I
+am truly grateful to the Go Monorepo team, especially Ken Micklas, for doing
+the work and research on this unproven prototype.
+
+### Evaluation of different compilers
+
+Given that we now needed a cross-compiler, we had two candidates:
+
+- [grailbio/bazel-toolchain][grailbio/bazel-toolchain]. Uses a vanilla clang.
+  No risk. Well understood. Obviously safe and correct solution.
+- [~motiejus/bazel-zig-cc][bazel-zig-cc]: uses `zig cc`. Buggy, risky, unsafe,
+  uncertain, used-by-nobody, but quite a tempting solution.
+
+`zig cc` provides a few extra features on top of `bazel-toolchain`:
+- configurable glibc version.
In the grailbio case you would need a sysroot
+  (basically, a chroot with the system libraries, so the programs can be linked
+  against them), which needs to be maintained.
+- a working, albeit still buggy, hermetic (cross-)compiler for OSX.
+
+Glibc we can handle in either case. However, `bazel-toolchain` is unlikely to
+ever be able to compile for OSX, let alone cross-compile. Relying on the
+system compiler is undesirable on developer laptops, and Go Platform feels that
+first-hand, especially during OSX upgrades.
+
+The prospect of a hermetic toolchain for OSX targets tipped the scales towards
+`zig cc`, with all its warts, risks and instability.
+
+There was another problem: attention. If we were to use Zig in a serious
+capacity, we knew we would hit problems, but were unlikely to have the
+expertise to solve them. How could we, as a BigCorp, de-risk the engagement
+question, making sure that bugs important to us are handled in a timely
+manner? We had no doubt about ZSF's good intentions: if we found and reported
+a legitimate bug, it would get fixed. But how could we put an upper bound on
+latency?
+
+### Money
+
+A $50 donation did not help; perhaps a large service contract would? I asked
+around whether we could spend some money to de-risk our "cross-compiler".
+Getting a green light from management took about 10 minutes; drafting,
+approving and signing the contract took about 2 months.
+
+Contract terms were roughly as follows:
+- Uber reports issues to github.com/ziglang/zig and pings Loris.
+- Loris assigns each issue to someone in ZSF.
+- Hack hack hack hack hack.
+- When done, Loris enters the number of hours worked on the issue.
+
+Uber has the right to the *time* of ZSF members. We have no decision or voting
+power whatsoever with regard to Zig. We have the right to offer suggestions,
+but they have been and will be treated just like those from any other
+third-party bystander. We did not ask for special rights (this is explicit in
+the contract), and we don't want them.
+
+The contract was signed, the wire transfer completed, and in January 2022 we
+had:
+
+- A service contract with ZSF that promised to prioritize issues that we've
+  registered.
+- A commitment from the Go Platform team to make our C++ toolchain
+  cross-compiling and hermetic.
+
+{{Wire of $52800 from Uber to Zig Software Foundation}}
+
+## 2022 and beyond
+
+In Feb 2022 the toolchain was gated behind a command-line flag
+(`--config=hermetic-cc`). Since then, you can invoke `zig cc` in Uber's Go
+monorepo without a custom patch.
+
+{{WIP DIFF onboarding the monorepo was landed}}
+
+Timeline of 2022 so far:
+
+- In April, around my talk in Milan, we shipped the first Debian package
+  compiled with `zig cc` to production.
+- In May we enabled `zig cc` for all our Debian packages.
+- In H2 we expect to compile all our CGo code with `zig cc` and make
+  `--config=hermetic-cc` the default.
+- In H2 we expect to move [bazel-zig-cc][bazel-zig-cc] under github.com/uber.
+
+We have filed a number of issues against Zig, and, as of writing, all of them
+have been resolved. Some were handled by ZSF alone; some were more involved and
+required collaboration between ZSF, Uber and Go developers.
+
+## Summary
+
+I started preparing for the presentation hoping to give "a runbook" on how to
+adopt Zig at a big company. However, there is no runbook; my effort to onboard
+`zig cc` could have failed for many, many reasons.
+
+Looking back, I think the most important reason for success was a killer
+feature at the right time. In our case, there were two: glibc version selection
+without a sysroot, and cross-compiling to OSX.
+
+## Appendix
+
+I forgot to flip to the last slide in the presentation. Here it is:
+
+```
+
+{
+
+```
+
+If compilers or adapting software to other CPU architectures (and/or living in
+Eastern Europe) is your thing, my team in Vilnius is hiring. My sister teams in
+Seattle and the Bay Area are hiring too. Ping me.
+
+[^1]: Errata: I incorrectly said "by an order of magnitude". The order of
+    magnitude is the same.
+[^2]: Errata: I said Go was the first monorepo. Go was the 4th.
+[^3]: Errata: I said Jakub implemented `zig ar`. Correction: Andrew
+    implemented it, Jakub reviewed it.
+
+[zig-milan]: https://zig.news/kristoff/zig-milan-party-2022-final-info-schedule-1jc1
+[abg]: https://abhinavg.net/
+[go-monorepo]: https://eng.uber.com/go-monorepo-bazel/
+[bazel-zig-cc]: https://sr.ht/~motiejus/bazel-zig-cc/
+[cgo]: https://godocs.io/cmd/cgo
+[zig-cc-andrewrk]: https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html
+[bazel-toolchain]: https://bazel.build/docs/toolchains
+[ajbouh]: https://github.com/ajbouh/
+[ajbouh/bazel-zig-cc]: https://github.com/ajbouh/bazel-zig-cc/
+[bazel-zig-cc-ga]: https://lists.sr.ht/~andrewrk/ziglang/%3C20210811104907.qahogqbdjs4trihn%40mtpad.i.jakstys.lt%3E
+[grailbio/bazel-toolchain]: https://github.com/grailbio/bazel-toolchain
+[milan-youtube]: https://www.youtube.com/watch?v=SCj2J3HcEfc
+[zig-motiejus-issues]: https://github.com/ziglang/zig/issues?q=author%3Amotiejus+sort%3Acreated-asc