diff --git a/content/log/2022/how-uber-uses-zig.md b/content/log/2022/how-uber-uses-zig.md
index ae2401d..99bbab2 100644
--- a/content/log/2022/how-uber-uses-zig.md
+++ b/content/log/2022/how-uber-uses-zig.md
@@ -26,14 +26,15 @@ transcript, with some commentary and errata.
 
 TLDR:
 
-* Uber uses zig to compile it's C/C++ code. Now only in [Go
-  Monorepo][go-monorepo] via [bazel-zig-cc][bazel-zig-cc], with inconcrete
-  ideas to expand use of `zig cc` to other monorepos.
-* Uber does not have any plans to use zig-the-language.
+* Uber uses zig to compile its C/C++ code. Currently only in the [Go
+  Monorepo][go-monorepo] via [bazel-zig-cc][bazel-zig-cc], with plans to
+  possibly expand use of `zig cc` to other languages that need a C/C++
+  toolchain.
+* Uber does not have any plans to use zig-the-language yet.
 * Uber signed a support agreement with Zig Software Foundation (ZSF) to
   prioritize bug fixes. The contract value is disclosed in the ZSF financial
   reports.
-* Thanks my team, the Go Monorepo team, the Go Platform team, my director,
+* Thanks to my team, the Go Monorepo team, the Go Platform team, my director,
   finance, legal, and of course Zig Software Foundation for making this
   relationship happen. The relationship has been fruitful so far.
 
@@ -47,8 +48,8 @@ languages are Go and Java, with Python and Node allowed for specific use cases
 (like front-end for Node and Python for data analysis/ML). Use of other
 languages in back-end code is minimal.
 
-Our go monorepo is larger than Linux kernel[^1], and worked on by a couple of
-thousand engineers. To sum up, it is size-able.
+Our Go Monorepo is larger than the Linux kernel[^1] and is worked on by a
+couple of thousand engineers. In short, it's big.
 
 ## How does Uber use Zig?
 
@@ -77,10 +78,10 @@ wave --- I still remember the complexity.
 
 ### 2019: asks for a hermetic toolchain
 
-At the time, the Go monorepo already used a hermetic Go toolchain. That means
-it would download the Go SDK as part of the build process.
Therefore, on
-whichever environment a Go build was running, it always used the same version
-of Go.
+At the time, the Go monorepo already used a hermetic Go toolchain. Therefore,
+the Go compiler used to build the monorepo was unaffected by the compiler
+installed on the system, if any: whatever environment a Go build ran in, it
+always used the same version of Go.
 
 {{A Jira task asking for a hermetic C++ toolchain.}}
 
-C++ toolchain is a collection of programs to compile C/C++ code. Our Go code
-uses quite a bit of [CGo][cgo], so it needs a C/C++ compiler. Go then links the
-Go and C parts to the final executable.
+A C++ toolchain is a collection of programs to compile C/C++ code. It is
+unavoidable for some of our Go code to use [CGo][cgo], so it needs a C/C++
+compiler. Go then links the Go and C parts into the final executable.
 
 The C++ toolchain was not hermetic since the start of Go monorepo: Bazel would
-use whatever it found on the system. That meant clang on MacOS, gcc (whatever
-version) on Linux. Setting up C++ toolchain in Bazel is a lot of work (think
-person-months for our monorepo), there was no immediate need, and it also was
-not painful *enough* to be picked up.
+use whatever it found on the system. That meant clang on macOS, gcc (whatever
+version) on Linux. Setting up a hermetic C++ toolchain in Bazel is a lot of
+work (think person-months for our monorepo); there was no immediate need, and
+it was not painful *enough* to be picked up.
 
 At this point it is important to understand the limitations of a non-hermetic
 C++ toolchain:
 
 - Cannot cross-compile. So we can't compile Linux executables on a Mac if they
-  have CGo (which is most of our service code). This was worked around by...
-  not cross-compiling.
+  have CGo (which many of our services do). This was worked around by... not
+  cross-compiling.
 - CGo executables would link to a glibc version that was found on the system.
That means: when
upgrading the OS (multi-month effort), the build fleet must be upgraded last.
Otherwise, if build host runs a newer glibc than a
@@ -111,15 +112,18 @@ C++ toolchain:
 - We couldn't use new compilers, which have better optimizations, because we
   were running an older OS on the build fleet (backporting only the compiler,
   but not glibc, carries it's own risks).
+- Official binaries for newer versions of Go are built against a more recent
+  version of GCC than some of our build machines have. We had to work around
+  this by compiling Go from source on these machines.
 
 All of these issues were annoying, but not enough to invest into the
 toolchain.
 
 ### 2020 Dec: need musl
 
 I was working on a toy project that is built with Bazel and uses CGo. I wanted
-my binary to be static, but Bazel is not easily offering that. I spent a couple
-of evenings creating a Bazel toolchain on top of [musl.cc](https://musl.cc),
-but didn't go far, because at the time I wasn't able to make sense out of the
+my binary to be static, but Bazel does not make that easy. I spent a couple of
+evenings creating a Bazel toolchain on top of [musl.cc](https://musl.cc), but
+didn't go far, because at the time I wasn't able to make sense of
 Bazel's toolchain documentation, and I didn't find a good example to rely on.
 
 ### 2021 Jan: discovering `zig cc`
 
@@ -193,12 +197,12 @@ dependency on system libraries and undoing of a lot of tech debt.
 - Various places at Uber would benefit from a hermetic C++ cross-compiler, but
   it's not funded due to a large investment and not enough justification.
 - bazel-zig-cc kinda works, but both bazel-zig-cc and zig cc have known bugs.
-- Donations don't "help" for `zig cc`, and I can't realistically implement
-  them. I tried with `zig ar`, a trivial front-end for llvm's ld, and failed.
-- The monorepo-onboarding diff was simmering and waiting for it's time.
+- I can't realistically implement the necessary changes or bug fixes.
I tried
+  implementing `zig ar`, a trivial front-end for llvm's `ar`, and failed.
 - Once an issue had been identified as a Zig issue, getting attention from Zig
   developers was unpredictable. Some issues got resolved within days, some took
-  more than 6 months.
+  more than 6 months. Donations don't change `zig cc` priorities.
+- The monorepo-onboarding diff was simmering and waiting for its time.
 
 ### 2021 End: Uber needs a cross-compiler
 
@@ -218,8 +222,8 @@ thing to manage is risk.
 As zig is a novel technology (not even 1.0!), it was truly unusual to suggest
 compiling all of our C and C++ code with it. We should be planning to stick
 with it for at least a decade. Questions were raised and evaluated with great
 care and scrutiny. For that I am truly grateful to the Go
-Monorepo team, especially Ken Micklas, for doing the work and research on this
-unproven prototype.
+Monorepo team, especially [Ken Micklas][kmicklas], for doing the work and
+research on this unproven prototype.
 
 ### Evaluation of different compilers
 
@@ -234,22 +238,23 @@ Given that we now needed a cross-compiler, we had two candidates:
 - configurable glibc version. In grailbio case you would need a sysroot
   (basically, a chroot with the system libraries, so the programs can be
   linked against them), which needs to be maintained.
-- a working, albeit still buggy, hermetic (cross-)compiler for OSX.
+- a working, albeit still buggy, hermetic (cross-)compiler for macOS.
 
 Glibc we can handle in either case. However, `bazel-toolchain` will unlikely
-ever have a way to compile to OSX, let alone cross-compile. Relying on the
+ever have a way to compile to macOS, let alone cross-compile. Relying on the
 system compiler is undesirable on developer laptops, and Go Platform feels that
-first-hand, especially during OSX upgrades.
+first-hand, especially during macOS upgrades.
 
-The prospect of a hermetic toolchain for OSX targets tripped the scales towards
-`zig cc`, with all it's warts, risks and instability.
+The prospect of a hermetic toolchain for macOS targets tripped the scales
+towards `zig cc`, with all its warts, risks and instability.
 
-There was another, attention problem: if we were considering to use zig in a
-serious capacity, we knew we will hit problems, but unlikely have the expertise
-to solve them. How can we, as a BigCorp, de-risk the engagement question,
-making sure that bugs important to us are handled timely? We were sure of good
-intentions of ZSF: it was obvious that, if we find and report a legitimate bug,
-it would get fixed. But how can we put an upper bound on latency?
+There was another problem: attention. If we were to use Zig in a serious
+capacity, we knew we would hit problems, but would be unlikely to have the
+expertise to solve them. How could we, as a BigCorp, de-risk the engagement,
+making sure that bugs important to us are handled promptly? We were sure of
+ZSF's good intentions: it was obvious that, if we found and reported a
+legitimate bug, it would get fixed. But how could we put an upper bound on
+latency?
 
 ### Money
 
@@ -271,7 +276,7 @@ bystander.
 We did not ask for special rights, it's explicit in the contract, and we don't
 want that.
 
 The contract was signed, the wire transfer completed, and in 2022 January we
-hpad:
+had:
 
 - A service contract with ZSF that promised to prioritize issues that we've
   registered.
 
@@ -287,8 +292,8 @@ hpad:
 ## 2022 and beyond
 
 In Feb 2022 the toolchain was gated behind a command-line flag
-(`--config=hermetic-cc`). As of Feb 2022, you can invoke `zig cc` in Uber's go
-monorepo without requiring a custom patch.
+(`--config=hermetic-cc`). As of Feb 2022, you can invoke `zig cc` in Uber's Go
+Monorepo without requiring a custom patch.
 
 {{