---
title: "How Uber Uses Zig"
date: 2022-05-23T16:06:05+03:00
slug: how-uber-uses-zig
draft: true
---
Disclaimer: I work at Uber and am partially responsible for bringing `zig cc`
to serious internal use. Opinions are mine; this blog post is not affiliated
with Uber.

I talked at the [Zig Milan][zig-milan] meetup about "Onboarding Zig at Uber".
This post is a little about "how Uber uses Zig", and more about "my experience
of bringing Zig to Uber", from both technical and social aspects.

<big>The video is [here][milan-youtube]</big>. The rest of the post is a loose
transcript, with some commentary and errata.
{{<img src="_/2022/uber-zig-frank-tweet.jpg"
alt="My talk title, picture taken by @jedisct1"
caption="@mo_kelione is still my temporary twitter handle from 2009."
class="right"
half="true"
hint="photo"
>}}
TLDR:

* Uber uses Zig to compile its C/C++ code. Currently only in the [Go
  Monorepo][go-monorepo] via [bazel-zig-cc][bazel-zig-cc], with plans to
  possibly expand use of `zig cc` to other languages that need a C/C++
  toolchain.
* Main selling points of a C/C++ toolchain on top of zig-cc over the
  alternatives: configurable versions of glibc and macOS cross-compilation.
* Uber does not have any plans to use zig-the-language yet.
* Uber signed a support agreement with Zig Software Foundation (ZSF) to
  prioritize bug fixes. The contract value is disclosed in the ZSF financial
  reports.
* Thanks to my team, the Go Monorepo team, the Go Platform team, my director,
  finance, legal, and of course Zig Software Foundation for making this
  relationship happen. The relationship has been fruitful so far.
{{<div-clear>}}
## About Uber's tech stack
Uber started in 2010, has clocked over 15 billion trips, and made lots of cool
and innovative tech for it to happen. General-purpose "allowed" server-side
languages are Go and Java, with Python and Node allowed for specific use cases
(like Node for front-end and Python for data analysis/ML). C++ is used for a few
low-level libraries. Use of other languages in back-end code is minimal.

Our Go Monorepo is larger than the Linux kernel[^1], and worked on by a couple
of thousand engineers. In short, it's big.
## How does Uber use Zig?
{{<img src="_/2022/uber-zig-abg.png"
alt="Abhinav Gupta: we're using Zig's C toolchain only, not the language. It's not fully rolled out yet, but among other things, it'll enable cross compilation of C based code (as well as Go code that uses CGo). It'll drop the dependency on the system's C compiler."
caption="Abhinav's TLDR of the presentation."
class="right"
half="true"
hint="graph"
>}}
I can't say this better than my colleague [Abhinav Gupta][abg] from the Go
Platform team (a transcript is available in the "alt" attribute):

At this point of the presentation, since I explained (thanks abg!) how Uber
uses Zig, I could end the talk. But you all came in for the process, so after
an uncomfortable pause, I decided to tell more about it.
{{<div-clear>}}
## History
Before 2018, Uber's Go services lived in separate repositories. In 2018[^2],
services started moving to the Go monorepo en masse. My team was among the first
wave --- I still remember the complexity.
### 2019: asks for a hermetic toolchain
At the time, the Go monorepo already used a hermetic Go toolchain: the Go
compiler used to build the monorepo was unaffected by the compiler installed on
the system, if any. Therefore, in whichever environment a Go build was running,
it always used the same version of Go. Bazel docs [explain this better than I
can][bazel-hermetic].
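One ingredient of hermeticity is pinning each tool by its content rather than by whatever happens to be installed. The sketch below is not how Bazel or our monorepo implements this; `verify_pinned_toolchain` and the sample bytes are made up for illustration. It only shows the idea: a toolchain archive is accepted when its SHA-256 matches a digest checked into the repository, so every machine builds with the same compiler bits.

```python
import hashlib


def verify_pinned_toolchain(archive: bytes, pinned_sha256: str) -> bool:
    """Return True only if the archive's SHA-256 equals the pinned digest.

    Hypothetical helper: a hermetic build refuses any toolchain archive
    whose content differs from what the repository pinned.
    """
    actual = hashlib.sha256(archive).hexdigest()
    return actual == pinned_sha256.lower()


# Stand-in bytes for a downloaded toolchain tarball (illustrative only).
archive = b"pretend this is go-toolchain.tar.gz"
pinned = hashlib.sha256(archive).hexdigest()
```

A tampered or merely different archive, such as `archive + b"x"`, fails the check, which is the point: the build either uses the pinned toolchain or does not run.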
{{<img src="_/2022/uber-zig-gm-221.png"
alt="A Jira task asking for a hermetic C++ toolchain."
caption="This was created in 2019 and did not see much movement."
hint="graph"
>}}
A C++ toolchain is a collection of programs to compile C/C++ code. It is
unavoidable for some of our Go code to use [CGo][cgo], so it needs a C/C++
compiler. Go then links the Go and C parts into the final executable.
The C++ toolchain had not been hermetic since the start of the Go monorepo:
Bazel would use whatever it found on the system. That meant Clang on macOS, GCC
(whatever version) on Linux. Setting up a hermetic C++ toolchain in Bazel is a
lot of work (think person-months for our monorepo), there was no immediate need,
and it also was not painful *enough* to be picked up.
At this point it is important to understand the limitations of a non-hermetic
C++ toolchain:

- Cannot cross-compile. So we can't compile Linux executables on a Mac if they
  require CGo (which many of our services do). This was worked around by... not
  cross-compiling.
- CGo executables would link to a glibc version that was found on the system.
  That means: when upgrading the OS (multi-month effort), the build fleet must
  be upgraded last. Otherwise, if a build host runs a newer glibc than a
  production host, the resulting binary will link against a newer glibc
  version, which is incompatible with the old one still on a production host.
- We couldn't use new compilers, which have better optimizations, because we
  were running an older OS on the build fleet (backporting only the compiler,
  but not glibc, carries its own risks).
- Official binaries for newer versions of Go are built against a more recent
  version of GCC than some of our build machines. We had to work around this by
  compiling Go from source on these machines.

All of these issues were annoying, but not enough to invest in the toolchain.
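The glibc constraint can be modelled in a few lines. This is a deliberately simplified sketch: in reality a binary requires the newest glibc *symbol version* it references, not necessarily the full version of the build host, so treating "linked against X" as "needs at least X" is a conservative assumption. The function names are hypothetical.

```python
from typing import Tuple


def parse_glibc_version(version: str) -> Tuple[int, ...]:
    """Turn a version string such as "2.28" into a comparable tuple (2, 28)."""
    return tuple(int(part) for part in version.split("."))


def can_run(linked_against: str, host_glibc: str) -> bool:
    """Simplified model: a binary linked against glibc X runs on hosts with glibc >= X."""
    return parse_glibc_version(host_glibc) >= parse_glibc_version(linked_against)
```

Under this model `can_run("2.31", "2.28")` is false, which is why the build fleet had to be upgraded last. Versions are compared as integer tuples because a string or float comparison would wrongly rank 2.4 above 2.28.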
### 2020 Dec: need musl
I was working on a non-Uber-related toy project that is built with Bazel and
uses CGo. I wanted my binary to be static, but Bazel does not make that easy. I
spent a couple of evenings creating a Bazel toolchain on top of
[musl.cc](https://musl.cc), but didn't get far, because at the time I wasn't
able to make sense of Bazel's toolchain documentation, and I didn't find a good
example to rely on.
### 2021 Jan: discovering `zig cc`
In January of 2021 I found Andrew Kelley's blog post [`zig cc`: a Powerful
Drop-In Replacement for GCC/Clang][zig-cc-andrewrk]. I recommend reading the
article; it changed how I think about compilers (and it will help you
understand the rest of this article better, because I gave the talk to a Zig
audience). To sum up Andrew's article, `zig cc` has the following advantages:

- Fully hermetic C/C++ compiler in a ~40MB tarball. This is an order of
  magnitude smaller than the standard Clang distributions.
- Can link against a glibc version that was provided as a command-line argument
  (e.g. `-target x86_64-linux-gnu.2.28` will compile for x86_64 Linux and link
  against glibc 2.28).
- Host and target are decoupled. The setup is the same for both `linux-aarch64`
  and `darwin-x86_64` targets, regardless of the host.
- Linking with musl is "just a different libc version":
  `-target x86_64-linux-musl`.
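To make the target syntax concrete, here is a minimal sketch that assembles the `-target` value and the environment a CGo build would need to route C/C++ compilation through zig. This is not how bazel-zig-cc wires things up; `zig_target`, `cgo_env` and the architecture table are hypothetical helpers. The `-target` values come from the list above, and `CC`, `CXX`, `CGO_ENABLED`, `GOOS` and `GOARCH` are the standard Go toolchain variables.

```python
from typing import Dict, Optional

# Go architecture names mapped to the names used in zig target triples.
GOARCH_TO_ZIG = {"amd64": "x86_64", "arm64": "aarch64"}


def zig_target(goarch: str, abi: str = "gnu", glibc: Optional[str] = None) -> str:
    """Build a Linux -target value, e.g. x86_64-linux-gnu.2.28 or x86_64-linux-musl."""
    target = f"{GOARCH_TO_ZIG[goarch]}-linux-{abi}"
    if glibc is not None:
        if abi != "gnu":
            raise ValueError("a glibc version only makes sense with the gnu ABI")
        target += f".{glibc}"
    return target


def cgo_env(goarch: str, abi: str = "gnu", glibc: Optional[str] = None) -> Dict[str, str]:
    """Environment for `go build` that sends CGo compilation through zig."""
    target = zig_target(goarch, abi, glibc)
    return {
        "CGO_ENABLED": "1",
        "GOOS": "linux",
        "GOARCH": goarch,
        "CC": f"zig cc -target {target}",
        "CXX": f"zig c++ -target {target}",
    }
```

Assuming `zig` and `go` are on `PATH`, merging `cgo_env("arm64", glibc="2.28")` into `os.environ` and passing the result as `env=` to `subprocess.run(["go", "build", "./..."])` would cross-compile for arm64 Linux against glibc 2.28 from any host.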
I started messing around with `zig cc`. I compiled random programs, reported
issues. I thought about making this a [bazel toolchain][bazel-toolchain], but
there were quite a few blocking bugs or missing features. One of them was lack
of `zig ar`, which Bazel relies on.
### 2021 Feb: asking for attention
I [reported bugs][zig-motiejus-issues] to Zig. Nothing happened for a
week. I donated $50/month, expecting "the Zig folks" to prioritize what I've
reported. A week of silence again. And then I dropped the bomb in
`#zig:libera.chat`:
```
<motiejus> What is the protocol to "claim" the dev hours once donated?
<andrewrk> ZSF only accepts no-strings-attached donations
<andrewrk> did you get a different impression somewhere?
```
Oops. At the time I hoped that whoever noticed the conversation would
immediately forget it. Well, here it is again, more than a year later, over
here, for your enjoyment.
### 2021 June: bazel-zig-cc and Uber's Go monorepo
In June of 2021 [Adam Bouhenguel][ajbouh] created a [working bazel-zig-cc
prototype][ajbouh/bazel-zig-cc]. The basics worked, but it still lacked some
features. Andrew later implemented `zig ar`[^3], which was the last missing
piece to a truly workable bazel-zig-cc. I integrated `zig ar`, polished the
documentation and [announced my fork of bazel-zig-cc to the Zig mailing
list][bazel-zig-cc-ga]. At this point it was usable for my toy project. Win!

A few weeks after the announcement I created a "WIP DIFF" for Uber's Go
monorepo: I just used my onboarding instructions and naïvely submitted it to
our CI. It failed almost all tests.
{{<img src="_/2022/uber-zig-zcc-gocode.png"
alt="A diff titled \"zig c++ toolchain\". Started on July 1, 2021"
caption="Onboarding bazel-zig-cc to Uber's Go monorepo."
hint="graph"
class="right"
half="true"
>}}
Most of the failures were caused by dependencies on system libraries. At this
point it was clear that, to truly onboard bazel-zig-cc and compile **all** its
C/C++ code, there needs to be quite a lot of investment to remove the
dependency on system libraries and undo a lot of technical debt.
### 2021 End: recap
- Various places at Uber would benefit from a hermetic C++ cross-compiler, but
  it's not funded due to requiring a large investment and not having enough
  justification.
- bazel-zig-cc kinda works, but both bazel-zig-cc and zig cc have known bugs.
- I can't realistically implement the necessary changes or bug fixes. I tried
  implementing `zig ar`, a trivial front-end for LLVM's `ar`, and failed.
- Once an issue had been identified as a Zig issue, getting attention from Zig
  developers was unpredictable. Some issues got resolved within days, some took
  more than 6 months, and donations didn't change `zig cc` priorities.
- The monorepo-onboarding diff was simmering and waiting for its time.
### 2021 End: Uber needs a cross-compiler
I was tasked with evaluating arm64 for Uber. Evaluation details aside, I needed
to compile software for linux-arm64. Lots of it! Since most of our low-level
infra is in the Go monorepo, I needed a cross-compiler there first.

A business reason for a cross-compiler landed in my lap. Now both time and
money could be invested there. Having a "WIP diff" with `zig cc` was a good
start, but the work was still very far from over: teams were not convinced it
was the right thing to do, the diff was too much of a prototype, and both zig-cc
and bazel-zig-cc needed lots of work before they could be used in any capacity
at Uber.
When onboarding such a technology in a large corporation, the most important
thing to manage is risk. As Zig is a novel technology (not even 1.0!), it was
truly unusual to suggest compiling all of our C and C++ code with it. We should
be planning to stick with it for at least a decade. Questions were raised and
evaluated with great care and scrutiny. For that I am truly grateful to the Go
Monorepo team, especially [Ken Micklas][kmicklas], for doing the work and
research on this unproven prototype.
### Evaluation of different compilers
Given that we now needed a cross-compiler, we had two candidates:
- [grailbio/bazel-toolchain][grailbio/bazel-toolchain]. Uses a vanilla Clang.
No risk. Well understood. Obviously safe and correct solution.
- [~motiejus/bazel-zig-cc][bazel-zig-cc]: uses `zig cc`. Buggy, risky, unsafe,
uncertain, used-by-nobody, but quite a tempting solution.
`zig cc` provides a few extra features on top of `bazel-toolchain`:
- configurable glibc version. With `grailbio` you would need a sysroot
(basically, a chroot with the system libraries, so the programs can be linked
against them), which would need to be maintained.
- a working, albeit still buggy, hermetic (cross-)compiler for macOS.
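The difference in maintenance burden shows up in the compiler invocation itself. The sketch below only builds argument lists, it does not run anything; the helper names and the sysroot path are hypothetical. `--target` and `--sysroot` are regular Clang flags, and the zig form is the `-target <triple>.<glibc version>` syntax described earlier.

```python
from typing import List


def clang_sysroot_argv(triple: str, sysroot: str, source: str) -> List[str]:
    """Cross-compile with vanilla Clang: headers and libc come from a sysroot directory."""
    return ["clang", f"--target={triple}", f"--sysroot={sysroot}", "-c", source]


def zig_cc_argv(triple: str, glibc: str, source: str) -> List[str]:
    """Cross-compile with zig cc: the glibc version is part of the target, no sysroot."""
    return ["zig", "cc", "-target", f"{triple}.{glibc}", "-c", source]


# Hypothetical sysroot location; somebody has to build and maintain this directory.
clang_cmd = clang_sysroot_argv("aarch64-linux-gnu", "/opt/sysroots/glibc-2.28", "hello.c")
zig_cmd = zig_cc_argv("aarch64-linux-gnu", "2.28", "hello.c")
```

With Clang, changing the glibc version means producing and shipping another sysroot; with `zig cc` it means changing a string.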
We would be able to handle glibc with either; however, `grailbio` is unlikely
to ever have a way to compile to macOS, let alone cross-compile. Relying on the
system compiler is undesirable on developer laptops, and the Go Platform team
feels that first-hand, especially during macOS upgrades.

The prospect of a hermetic toolchain for macOS targets tipped the scales
towards `zig cc`, with all its warts, risks and instability.

There was still another problem, one of attention: if we were considering the
use of Zig in a serious capacity, we knew we would hit problems, but would be
unlikely to have the expertise to solve them. How can we, as a BigCorp, de-risk
the engagement question, making sure that bugs important to us are handled in a
timely manner? We were sure of the good intentions of ZSF: it was obvious that,
if we found and reported a legitimate bug, it would get fixed. But how can we
put an upper bound on latency?
### Money
A $50 donation did not help; perhaps a large service contract would? I asked
around if we could spend some money to de-risk our "cross-compiler". Getting a
green light from management took about 10 minutes; drafting, approving and
signing the contract took about 2 months.
Contract terms were roughly as follows:
- Uber reports issues to github.com/ziglang/zig and pings Loris.
- Loris assigns it to someone in ZSF.
- Hack hack hack hack hack.
- When done, Loris enters the number of hours worked on the issue.

Uber has a right to ZSF members' *time*. We have no decision or voting power
whatsoever with regard to Zig. We have the right to offer suggestions, but they
have been and will be treated just like those from any other third-party
bystander. We did not ask for special rights and we don't want them; this is
explicit in the contract.
The contract was signed, the wire transfer completed, and in January 2022 we
had:
- A service contract with ZSF that promised to prioritize issues that we've
registered.
- A commitment from Go Platform team to make our C++ toolchain cross-compiling
and hermetic.
{{<img src="_/2022/uber-zig-deposit.png"
alt="Wire of $52800 from Uber to Zig Software Foundation"
caption="The amount of money that changed hands is public, because ZSF is a nonprofit."
hint="graph"
>}}
## 2022 and beyond
In Feb 2022 the toolchain was gated behind a command-line flag
(`--config=hermetic-cc`). As of Feb 2022, you can invoke `zig cc` in Uber's Go
Monorepo without requiring a custom patch.
{{<img src="_/2022/uber-zig-landed.png"
alt="The WIP DIFF onboarding the monorepo has landed"
caption="Proof that our submitqueue landed my WIP DIFF."
hint="graph"
>}}
Timeline of 2022 so far:
- In April, around my talk in Milan, we shipped the first Debian package
compiled with zig-cc to production.
- In May we enabled `zig cc` for all our Debian packages.
- In H2 we expect to compile all our cgo code with `zig cc` and make
  `--config=hermetic-cc` the default.
- In H2 we expect to move [bazel-zig-cc][bazel-zig-cc] under github.com/uber.
We have opened a number of issues to Zig, and, as of writing, all of them have
been resolved. Some were handled by ZSF alone, some were more involved and
required collaboration between ZSF, Uber and Go developers.
## Summary
I started preparing for the presentation hoping I could give "a runbook" on how
to adopt Zig at a big company. However, there is no runbook; my effort to
onboard zig-cc could have failed for many, many reasons.

Looking back, I think the most important reason for success is a killer
feature at the right time. In our case, there were two: glibc version selection
without a sysroot and cross-compiling to macOS.
## Appendix
I forgot to flip to the last slide in the presentation. Here it is:
```
{
```
If compilers or adopting software for other CPU architectures (and/or living in
Eastern Europe) is your thing, my team in Vilnius is hiring. My sister teams in
Seattle and the Bay Area are hiring too. Ping me.
Credits
-------
Many thanks to Abhinav Gupta, Loris Cro and Ken Micklas for reading drafts of
this.
[^1]: Errata: I incorrectly said "by an order of magnitude". The order of
magnitude is the same.
[^2]: Errata: I said Go was the first monorepo. Go was the 4th.
[^3]: Errata: I said Jakub implemented `zig ar`. Correction: Andrew
implemented, Jakub reviewed.
[zig-milan]: https://zig.news/kristoff/zig-milan-party-2022-final-info-schedule-1jc1
[abg]: https://abhinavg.net/
[go-monorepo]: https://eng.uber.com/go-monorepo-bazel/
[bazel-zig-cc]: https://sr.ht/~motiejus/bazel-zig-cc/
[cgo]: https://godocs.io/cmd/cgo
[zig-cc-andrewrk]: https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html
[bazel-toolchain]: https://bazel.build/docs/toolchains
[ajbouh]: https://github.com/ajbouh/
[ajbouh/bazel-zig-cc]: https://github.com/ajbouh/bazel-zig-cc/
[bazel-zig-cc-ga]: https://lists.sr.ht/~andrewrk/ziglang/%3C20210811104907.qahogqbdjs4trihn%40mtpad.i.jakstys.lt%3E
[grailbio/bazel-toolchain]: https://github.com/grailbio/bazel-toolchain
[milan-youtube]: https://www.youtube.com/watch?v=SCj2J3HcEfc
[zig-motiejus-issues]: https://github.com/ziglang/zig/issues?q=author%3Amotiejus+sort%3Acreated-asc
[kmicklas]: https://github.com/kmicklas
[bazel-hermetic]: https://bazel.build/concepts/hermeticity