how-uber-uses-zig
This commit is contained in:
parent
3778b35f12
commit
304c635b27
BIN
assets/_/2022/uber-zig-abg.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 92 KiB |
BIN
assets/_/2022/uber-zig-deposit.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 38 KiB |
BIN
assets/_/2022/uber-zig-frank-tweet.jpg
Normal file
Binary file not shown.
After Width: | Height: | Size: 84 KiB |
BIN
assets/_/2022/uber-zig-gm-221.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 62 KiB |
BIN
assets/_/2022/uber-zig-landed.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 38 KiB |
BIN
assets/_/2022/uber-zig-zcc-gocode.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 14 KiB |
354
content/log/2022/how-uber-uses-zig.md
Normal file
@ -0,0 +1,354 @@
---
title: "How Uber Uses Zig"
date: 2022-05-20T16:51:21+03:00
slug: how-uber-uses-zig
draft: true
---

Disclaimer: I work at Uber and am partially responsible for bringing `zig cc`
to serious internal use. Opinions are mine; this blog post is not affiliated
with Uber.

I gave a talk at the [Zig Milan][zig-milan] meetup about "Onboarding Zig at Uber".
This post is a little about "how Uber uses Zig", and more about "my experience
of bringing Zig to Uber", from both technical and social aspects.

<big>The video is [here][milan-youtube]</big>. The rest of the post is a loose
transcript, with some commentary and errata.

{{<img src="_/2022/uber-zig-frank-tweet.jpg"
alt="My talk title, picture taken by @jedisct1"
caption="@mo_kelione is still my temporary twitter handle from 2009."
class="right"
half="true"
hint="photo"
>}}

TLDR:

* Uber uses zig to compile its C/C++ code. For now this is limited to the [Go
  Monorepo][go-monorepo] via [bazel-zig-cc][bazel-zig-cc], with tentative
  ideas to expand use of `zig cc` to other monorepos.
* Uber does not have any plans to use zig-the-language.
* Uber signed a support agreement with Zig Software Foundation (ZSF) to
  prioritize bug fixes. The contract value is disclosed in the ZSF financial
  reports.
* Thanks to my team, the Go Monorepo team, the Go Platform team, my director,
  finance, legal, and of course Zig Software Foundation for making this
  relationship happen. The relationship has been fruitful so far.

{{<div-clear>}}

## About Uber's tech stack

Uber started in 2010, has clocked over 15 billion trips, and built lots of cool
and innovative tech to make that happen. General-purpose "allowed" server-side
languages are Go and Java, with Python and Node allowed for specific use cases
(like Node for front-end and Python for data analysis/ML). Use of other
languages in back-end code is minimal.

Our Go monorepo is larger than the Linux kernel[^1] and is worked on by a
couple of thousand engineers. In short, it is sizeable.

## How does Uber use Zig?

{{<img src="_/2022/uber-zig-abg.png"
alt="Abhinav Gupta: we're using Zig's C toolchain only, not the language. It's not fully rolled out yet, but among other things, it'll enable cross compilation of C based code (as well as Go code that uses CGo). It'll drop the dependency on the system's C compiler."
caption="Abhinav's TLDR of the presentation."
class="right"
half="true"
hint="graph"
>}}

I can't tell this better than my colleague [Abhinav Gupta][abg] from the Go
Platform team (the transcript is available in the "alt" attribute):

At this point in the presentation, having explained (thanks abg!) how Uber
uses zig, I could have ended the talk. But you all came in for the process, so
after an uncomfortable pause, I decided to tell more about it.

{{<div-clear>}}

## History

Before 2018, Uber's Go services lived in separate repositories. In 2018[^2]
services started moving to the Go monorepo en masse. My team was among the
first wave; I still remember the complexity.

### 2019: asks for a hermetic toolchain

At the time, the Go monorepo already used a hermetic Go toolchain. That means
it would download the Go SDK as part of the build process. Therefore,
whichever environment a Go build ran in, it always used the same version of
Go.

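To make "hermetic" a bit more concrete, here is a minimal Python sketch. It assumes the toolchain archive is pinned by version and SHA-256 digest; the names `PinnedArchive` and `verify` and the stand-in bytes are hypothetical. This is not how Bazel's Go rules or Uber's monorepo implement it, only an illustration of a build that accepts exactly one known toolchain regardless of the host.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedArchive:
    """A toolchain archive pinned by version and content digest (hypothetical)."""
    version: str
    sha256: str


def verify(archive_bytes: bytes, pin: PinnedArchive) -> bool:
    """Accept the downloaded archive only if its digest matches the pin."""
    return hashlib.sha256(archive_bytes).hexdigest() == pin.sha256


# Stand-in bytes instead of a real SDK download.
sdk = b"pretend this is a Go SDK tarball"
pin = PinnedArchive(version="example", sha256=hashlib.sha256(sdk).hexdigest())
```

Any archive that differs from the pinned one, even by a byte, is rejected, so every host ends up building with the same toolchain or not building at all.
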
{{<img src="_/2022/uber-zig-gm-221.png"
alt="A Jira task asking for a hermetic C++ toolchain."
caption="This was created in 2019 and did not see much movement."
hint="graph"
>}}

A C++ toolchain is a collection of programs to compile C/C++ code. Our Go code
uses quite a bit of [CGo][cgo], so it needs a C/C++ compiler. Go then links the
Go and C parts into the final executable.

The C++ toolchain had not been hermetic since the start of the Go monorepo:
Bazel would use whatever it found on the system. That meant clang on macOS, gcc
(whatever version) on Linux. Setting up a C++ toolchain in Bazel is a lot of
work (think person-months for our monorepo), there was no immediate need, and
it also was not painful *enough* to be picked up.

At this point it is important to understand the limitations of a non-hermetic
C++ toolchain:
- Cannot cross-compile. So we can't compile Linux executables on a Mac if they
  have CGo (which is most of our service code). This was worked around by...
  not cross-compiling.
- CGo executables would link to whichever glibc version was found on the
  system. That means: when upgrading the OS (a multi-month effort), the build
  fleet must be upgraded last. Otherwise, if a build host runs a newer glibc
  than a production host, the resulting binary will link against the newer
  glibc version, which is incompatible with the old one still on the
  production host.
- We couldn't use new compilers, which have better optimizations, because we
  were running an older OS on the build fleet (backporting only the compiler,
  but not glibc, carries its own risks).

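The glibc constraint in the second bullet can be sketched in a few lines of Python. This is a simplification under a stated assumption: a binary needs a host glibc at least as new as the one it was linked against (in practice the requirement comes from the newest versioned symbols the binary actually references). The function names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn "2.28" into (2, 28) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def can_run(linked_against: str, host_glibc: str) -> bool:
    """True if the host glibc is at least as new as the one linked against."""
    return parse_version(host_glibc) >= parse_version(linked_against)


# Build fleet upgraded before production: the binary does not load.
print(can_run(linked_against="2.31", host_glibc="2.28"))  # False
# Production upgraded first, build fleet last: the binary loads.
print(can_run(linked_against="2.28", host_glibc="2.31"))  # True
```
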
All of these issues were annoying, but not enough to invest in the toolchain.

### 2020 Dec: need musl

I was working on a toy project that is built with Bazel and uses CGo. I wanted
my binary to be static, but Bazel does not make that easy. I spent a couple
of evenings creating a Bazel toolchain on top of [musl.cc](https://musl.cc),
but didn't get far, because at the time I wasn't able to make sense of
Bazel's toolchain documentation, and I didn't find a good example to rely on.

### 2021 Jan: discovering `zig cc`

In January of 2021 I found Andrew Kelley's blog post [`zig cc`: a Powerful
Drop-In Replacement for GCC/Clang][zig-cc-andrewrk]. I recommend reading the
article; it changed how I think about compilers (and it will help you
understand the rest of this article better, because I gave the talk to a Zig
audience). To sum up Andrew's article, `zig cc` has the following
advantages:

- Fully hermetic C/C++ compiler in a ~40MB tarball.
- Can link against a glibc version that was provided as a command-line argument
  (e.g. `-target x86_64-linux-gnu.2.28` will compile for x86_64 Linux and link
  against glibc 2.28).
- Host and target are decoupled. The setup is the same for both linux-aarch64
  and darwin-x86_64 targets, regardless of the host.
- Linking with musl is "just a different libc version": `-target
  x86_64-linux-musl`.

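As a rough illustration of the target strings above, here is a small Python sketch that assembles a `zig cc` command line. Only the two `-target` values quoted in the list come from the article; the helper names, `hello.c`, and the output name are hypothetical, and the sketch only builds the argument list rather than running the compiler.

```python
from typing import List, Optional


def zig_target(arch: str, os_name: str, abi: str, glibc: Optional[str] = None) -> str:
    """Build a target string such as x86_64-linux-gnu.2.28 or x86_64-linux-musl."""
    target = f"{arch}-{os_name}-{abi}"
    if glibc is not None:
        if abi != "gnu":
            raise ValueError("a glibc version only makes sense with the gnu ABI")
        target += f".{glibc}"
    return target


def zig_cc_argv(sources: List[str], output: str, target: str) -> List[str]:
    """Return the argv for compiling `sources` for `target` (not executed here)."""
    return ["zig", "cc", "-target", target, "-o", output, *sources]


argv = zig_cc_argv(["hello.c"], "hello", zig_target("x86_64", "linux", "gnu", "2.28"))
print(" ".join(argv))
# zig cc -target x86_64-linux-gnu.2.28 -o hello hello.c
```

Switching to a static musl build changes only the target string, which is the point of the last bullet: the rest of the invocation stays the same.
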
I started messing around with `zig cc`. I compiled random programs and reported
issues. I thought about making this a [bazel toolchain][bazel-toolchain], but
there were quite a few blocking bugs or missing features. One of them was the
lack of `zig ar`, which Bazel relies on.

### 2021 Feb: asking for attention

I [reported bugs][zig-motiejus-issues] to zig. Nothing happened for a
week. I donated $50/month, expecting "the zig folks" to prioritize what I had
reported. A week of silence again. And then I dropped the bomb in
`#zig:libera.chat`:

```
<motiejus> What is the protocol to "claim" the dev hours once donated?
<andrewrk> ZSF only accepts no-strings-attached donations
<andrewrk> did you get a different impression somewhere?
```

Oops. At the time I hoped that whoever noticed the conversation would
immediately forget it. Well, here it is again, more than a year later, over
here, for your enjoyment.

### 2021 June: bazel-zig-cc and Uber's Go monorepo

In June of 2021 [Adam Bouhenguel][ajbouh] created a [working bazel-zig-cc
prototype][ajbouh/bazel-zig-cc]. The basics worked, but it still lacked some
features. Andrew later implemented `zig ar`[^3], which was the last missing
piece for a truly workable bazel-zig-cc. I integrated `zig ar`, polished the
documentation and [announced my fork of bazel-zig-cc to the Zig mailing
list][bazel-zig-cc-ga]. At this point it was usable for my toy project. Win!

A few weeks after the announcement I created a "WIP DIFF" for Uber's Go
monorepo: I just followed my own onboarding instructions and naïvely submitted
it to our CI. It failed almost all tests.

{{<img src="_/2022/uber-zig-zcc-gocode.png"
alt="A diff titled \"zig c++ toolchain\". Started on July 1, 2021"
caption="Onboarding bazel-zig-cc to Uber's Go monorepo."
hint="graph"
class="right"
half="true"
>}}

Most of the failures were caused by dependencies on system libraries. At this
point it was clear that, to truly onboard bazel-zig-cc and compile **all** of
the monorepo's C/C++ code, quite a lot of investment would be needed to remove
the dependency on system libraries and to undo a lot of tech debt.

### 2021 End: recap

- Various places at Uber would benefit from a hermetic C++ cross-compiler, but
  it's not funded because it needs a large investment and lacks sufficient
  justification.
- bazel-zig-cc kinda works, but both bazel-zig-cc and `zig cc` have known bugs.
- Donations don't "help" get `zig cc` issues fixed, and I can't realistically
  implement the fixes myself. I tried with `zig ar`, a trivial front-end for
  LLVM's ar, and failed.
- The monorepo-onboarding diff was simmering and waiting for its time.
- Once an issue had been identified as a Zig issue, getting attention from Zig
  developers was unpredictable. Some issues got resolved within days, some took
  more than 6 months.

### 2021 End: Uber needs a cross-compiler

I was tasked with evaluating arm64 for Uber. Evaluation details aside, I needed
to compile software for linux-arm64. Lots of it! Since most of our low-level
infra is in the Go monorepo, I needed a cross-compiler there first.

A business reason for a cross-compiler landed on my lap. Now both time and
money could be invested there. Having a "WIP diff" with `zig cc` was a good
start, but it was still very far from done: teams were not convinced it was the
right thing to do, the diff was too much of a prototype, and both `zig cc` and
bazel-zig-cc needed lots of work before they could be used in any capacity at
Uber.

When onboarding such a technology in a large corporation, the most important
thing to manage is risk. As zig is a novel technology (not even 1.0!), it was
truly unusual to suggest compiling all of our C and C++ code with it. We should
be planning to stick with it for at least a decade. Questions were raised and
evaluated with great care and scrutiny. For that I am truly grateful to the Go
Monorepo team, especially Ken Micklas, for doing the work and research on this
unproven prototype.

### Evaluation of different compilers

Given that we now needed a cross-compiler, we had two candidates:

- [grailbio/bazel-toolchain][grailbio/bazel-toolchain]. Uses a vanilla clang.
  No risk. Well understood. Obviously safe and correct solution.
- [~motiejus/bazel-zig-cc][bazel-zig-cc]: uses `zig cc`. Buggy, risky, unsafe,
  uncertain, used-by-nobody, but quite a tempting solution.

`zig cc` provides a few extra features on top of `bazel-toolchain`:
- configurable glibc version. In the grailbio case you would need a sysroot
  (basically, a chroot with the system libraries, so the programs can be linked
  against them), which needs to be maintained.
- a working, albeit still buggy, hermetic (cross-)compiler for OSX.

Glibc we can handle in either case. However, `bazel-toolchain` is unlikely to
ever have a way to compile for OSX, let alone cross-compile. Relying on the
system compiler is undesirable on developer laptops, and Go Platform feels that
first-hand, especially during OSX upgrades.

The prospect of a hermetic toolchain for OSX targets tipped the scales towards
`zig cc`, with all its warts, risks and instability.

There was another problem, one of attention: if we were to use zig in a
serious capacity, we knew we would hit problems but would be unlikely to have
the expertise to solve them. How can we, as a BigCorp, de-risk the engagement
question, making sure that bugs important to us are handled in a timely manner?
We were sure of the good intentions of ZSF: it was obvious that, if we found
and reported a legitimate bug, it would get fixed. But how can we put an upper
bound on latency?

### Money

A $50 donation does not help; perhaps a large service contract would? I asked
around if we could spend some money to de-risk our "cross-compiler". Getting a
green light from the management took about 10 minutes; drafting, approving and
signing the contract took about 2 months.

Contract terms were roughly as follows:
- Uber reports issues to github.com/ziglang/zig and pings Loris.
- Loris assigns it to someone in ZSF.
- Hack hack hack hack hack.
- When done, Loris enters the number of hours worked on the issue.

Uber has the right to the *time* of ZSF members. We have no decision or voting
power whatsoever with regards to Zig. We have the right to offer suggestions,
but they have been and will be treated just like those from any other
third-party bystander. We did not ask for special rights; that is explicit in
the contract, and we don't want them.

The contract was signed, the wire transfer completed, and in January 2022 we
had:

- A service contract with ZSF that promised to prioritize issues that we've
  registered.
- A commitment from the Go Platform team to make our C++ toolchain
  cross-compiling and hermetic.

{{<img src="_/2022/uber-zig-deposit.png"
alt="Wire of $52800 from Uber to Zig Software Foundation"
caption="The amount of money that changed hands is public, because ZSF is a nonprofit."
hint="graph"
>}}

## 2022 and beyond

In Feb 2022 the toolchain was gated behind a command-line flag
(`--config=hermetic-cc`). As of Feb 2022, you can invoke `zig cc` in Uber's Go
monorepo without requiring a custom patch.

{{<img src="_/2022/uber-zig-landed.png"
alt="WIP DIFF onboarding the monorepo was landed"
caption="Proof that our submitqueue landed my WIP DIFF."
hint="graph"
>}}

Timeline of 2022 so far:

- In April, around my talk in Milan, we shipped the first Debian package
  compiled with `zig cc` to production.
- In May we enabled `zig cc` for all our Debian packages.
- In H2 we expect to compile all our cgo code with `zig cc` and make
  `--config=hermetic-cc` the default.
- In H2 we expect to move [bazel-zig-cc][bazel-zig-cc] under github.com/uber.

We have opened a number of issues with Zig, and, as of writing, all of them
have been resolved. Some were handled by ZSF alone, some were more involved and
required collaboration between ZSF, Uber and Go developers.

## Summary

I started preparing for the presentation hoping I could give "a runbook" for
how to adopt Zig at a big company. However, there is no runbook; my effort to
onboard `zig cc` could have failed for many, many reasons.

Looking back, I think the most important reason for success is a killer
feature at the right time. In our case, there were two: glibc version selection
without a sysroot and cross-compiling to OSX.

## Appendix

I forgot to flip to the last slide in the presentation. Here it is:

```

{

```

If compilers or adopting software for other CPU architectures (and/or living in
Eastern Europe) is your thing, my team in Vilnius is hiring. Also, my
sister teams in Seattle and the Bay Area are hiring too. Ping me.

[^1]: Errata: I incorrectly said "by an order of magnitude". The order of
    magnitude is the same.
[^2]: Errata: I said Go was the first monorepo. Go was 4th.
[^3]: Errata: I said Jakub implemented `zig ar`. Correction: Andrew
    implemented, Jakub reviewed.

[zig-milan]: https://zig.news/kristoff/zig-milan-party-2022-final-info-schedule-1jc1
[abg]: https://abhinavg.net/
[go-monorepo]: https://eng.uber.com/go-monorepo-bazel/
[bazel-zig-cc]: https://sr.ht/~motiejus/bazel-zig-cc/
[cgo]: https://godocs.io/cmd/cgo
[zig-cc-andrewrk]: https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html
[bazel-toolchain]: https://bazel.build/docs/toolchains
[ajbouh]: https://github.com/ajbouh/
[ajbouh/bazel-zig-cc]: https://github.com/ajbouh/bazel-zig-cc/
[bazel-zig-cc-ga]: https://lists.sr.ht/~andrewrk/ziglang/%3C20210811104907.qahogqbdjs4trihn%40mtpad.i.jakstys.lt%3E
[grailbio/bazel-toolchain]: https://github.com/grailbio/bazel-toolchain
[milan-youtube]: https://www.youtube.com/watch?v=SCj2J3HcEfc
[zig-motiejus-issues]: https://github.com/ziglang/zig/issues?q=author%3Amotiejus+sort%3Acreated-asc