abg's comments

main
Motiejus Jakštys 2022-05-21 10:44:22 +03:00
parent 304c635b27
commit 80580a9968
1 changed files with 55 additions and 44 deletions


@@ -26,14 +26,15 @@ transcript, with some commentary and errata.
 TLDR:
-* Uber uses zig to compile it's C/C++ code. Now only in [Go
-Monorepo][go-monorepo] via [bazel-zig-cc][bazel-zig-cc], with inconcrete
-ideas to expand use of `zig cc` to other monorepos.
-* Uber does not have any plans to use zig-the-language.
+* Uber uses zig to compile its C/C++ code. Now only in the [Go
+Monorepo][go-monorepo] via [bazel-zig-cc][bazel-zig-cc], with plans to
+possibly expand use of `zig cc` to other languages that need a C/C++
+toolchain.
+* Uber does not have any plans to use zig-the-language yet.
 * Uber signed a support agreement with Zig Software Foundation (ZSF) to
 prioritize bug fixes. The contract value is disclosed in the ZSF financial
 reports.
-* Thanks my team, the Go Monorepo team, the Go Platform team, my director,
+* Thanks to my team, the Go Monorepo team, the Go Platform team, my director,
 finance, legal, and of course Zig Software Foundation for making this
 relationship happen. The relationship has been fruitful so far.
@@ -47,8 +48,8 @@ languages are Go and Java, with Python and Node allowed for specific use cases
 (like front-end for Node and Python for data analysis/ML). Use of other
 languages in back-end code is minimal.
-Our go monorepo is larger than Linux kernel[^1], and worked on by a couple of
-thousand engineers. To sum up, it is size-able.
+Our Go Monorepo is larger than the Linux kernel[^1], and is worked on by a
+couple of thousand engineers. In short, it's big.
 ## How does Uber use Zig?
@@ -77,10 +78,10 @@ wave --- I still remember the complexity.
 ### 2019: asks for a hermetic toolchain
-At the time, the Go monorepo already used a hermetic Go toolchain. That means
-it would download the Go SDK as part of the build process. Therefore, on
-whichever environment a Go build was running, it always used the same version
-of Go.
+At the time, the Go monorepo already used a hermetic Go toolchain, so the Go
+compiler used to build the monorepo was unaffected by the compiler installed
+on the system, if any. Therefore, on whichever environment a Go build was
+running, it always used the same version of Go.
 {{<img src="_/2022/uber-zig-gm-221.png"
 alt="A Jira task asking for a hermetic C++ toolchain."
@@ -88,21 +89,21 @@ of Go.
 hint="graph"
 >}}
-C++ toolchain is a collection of programs to compile C/C++ code. Our Go code
-uses quite a bit of [CGo][cgo], so it needs a C/C++ compiler. Go then links the
-Go and C parts to the final executable.
+A C++ toolchain is a collection of programs to compile C/C++ code. It is
+unavoidable for some of our Go code to use [CGo][cgo], so it needs a C/C++
+compiler. Go then links the Go and C parts into the final executable.
 The C++ toolchain had not been hermetic since the start of the Go monorepo: Bazel would
-use whatever it found on the system. That meant clang on MacOS, gcc (whatever
-version) on Linux. Setting up C++ toolchain in Bazel is a lot of work (think
-person-months for our monorepo), there was no immediate need, and it also was
-not painful *enough* to be picked up.
+use whatever it found on the system. That meant clang on macOS, gcc (whatever
+version) on Linux. Setting up a hermetic C++ toolchain in Bazel is a lot of
+work (think person-months for our monorepo), there was no immediate need, and
+it also was not painful *enough* to be picked up.
 At this point it is important to understand the limitations of a non-hermetic
 C++ toolchain:
 - Cannot cross-compile. So we can't compile Linux executables on a Mac if they
-have CGo (which is most of our service code). This was worked around by...
-not cross-compiling.
+have CGo (which many of our services do). This was worked around by... not
+cross-compiling.
 - CGo executables would link to a glibc version that was found on the system.
 That means: when upgrading the OS (multi-month effort), the build fleet must
 be upgraded last. Otherwise, if the build host runs a newer glibc than a
@@ -111,15 +112,18 @@ C++ toolchain:
 - We couldn't use new compilers, which have better optimizations, because we
 were running an older OS on the build fleet (backporting only the compiler,
 but not glibc, carries its own risks).
+- Official binaries for newer versions of Go are built against a more recent
+version of GCC than the one on some of our build machines. We had to work
+around this by compiling Go from source on these machines.
 All of these issues were annoying, but not enough to invest in the toolchain.
 ### 2020 Dec: need musl
 I was working on a toy project that is built with Bazel and uses CGo. I wanted
-my binary to be static, but Bazel is not easily offering that. I spent a couple
-of evenings creating a Bazel toolchain on top of [musl.cc](https://musl.cc),
-but didn't go far, because at the time I wasn't able to make sense out of the
+my binary to be static, but Bazel does not make that easy. I spent a couple of
+evenings creating a Bazel toolchain on top of [musl.cc](https://musl.cc), but
+didn't go far, because at the time I wasn't able to make sense of
 Bazel's toolchain documentation, and I didn't find a good example to rely on.
 ### 2021 Jan: discovering `zig cc`
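`zig cc` can stand in wherever a C compiler is expected. As a hedged illustration outside Bazel (this is not bazel-zig-cc's or Uber's configuration), the sketch below computes the environment for a plain `go build` whose CGo parts are compiled by `zig cc`. It assumes Zig's `-target <arch>-<os>-<abi>` triples, where a `gnu` ABI may carry a glibc version suffix such as `x86_64-linux-gnu.2.28`, and a `musl` ABI is typically linked statically. The helper names `zigTarget` and `cgoEnv` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// zigArch maps Go GOARCH values to the CPU names used in Zig target
// triples. Only two common architectures are covered in this sketch.
var zigArch = map[string]string{
	"amd64": "x86_64",
	"arm64": "aarch64",
}

// zigTarget builds a Zig target triple for `zig cc -target ...`.
// For Linux, libc is "musl" or "gnu"; a non-empty glibcVersion such as
// "2.28" is appended to the "gnu" triple to pin the glibc it links against.
// For macOS (GOOS "darwin") libc and glibcVersion are ignored.
func zigTarget(goos, goarch, libc, glibcVersion string) (string, error) {
	arch, ok := zigArch[goarch]
	if !ok {
		return "", fmt.Errorf("unsupported GOARCH %q", goarch)
	}
	switch goos {
	case "darwin":
		return arch + "-macos", nil
	case "linux":
		switch libc {
		case "musl":
			return arch + "-linux-musl", nil
		case "gnu":
			if glibcVersion == "" {
				return arch + "-linux-gnu", nil
			}
			return arch + "-linux-gnu." + glibcVersion, nil
		}
		return "", fmt.Errorf("unsupported libc %q", libc)
	}
	return "", fmt.Errorf("unsupported GOOS %q", goos)
}

// cgoEnv returns environment variables that make `go build` compile the
// CGo parts of a program with `zig cc` for the requested target. The go
// command splits CC and CXX on spaces, so the -target flag can live there.
func cgoEnv(goos, goarch, libc, glibcVersion string) ([]string, error) {
	target, err := zigTarget(goos, goarch, libc, glibcVersion)
	if err != nil {
		return nil, err
	}
	return []string{
		"CGO_ENABLED=1",
		"GOOS=" + goos,
		"GOARCH=" + goarch,
		"CC=zig cc -target " + target,
		"CXX=zig c++ -target " + target,
	}, nil
}

func main() {
	// A Linux/amd64 build linked against glibc 2.28 symbols, so it should
	// run on hosts with glibc 2.28 or newer, whatever the build host has.
	env, err := cgoEnv("linux", "amd64", "gnu", "2.28")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// These would be appended to os.Environ() on an exec.Cmd running "go build".
	for _, kv := range env {
		fmt.Println(kv)
	}
}
```

The glibc version lives in the target rather than in a sysroot, which is the property the evaluation later in the post cares about; passing GOOS `darwin` yields a macOS triple instead. Cross-compiling code that needs more than libc on macOS may require additional setup.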
@@ -193,12 +197,12 @@ dependency on system libraries and undoing of a lot of tech debt.
 - Various places at Uber would benefit from a hermetic C++ cross-compiler, but
 it's not funded due to a large investment and not enough justification.
 - bazel-zig-cc kinda works, but both bazel-zig-cc and zig cc have known bugs.
-- Donations don't "help" for `zig cc`, and I can't realistically implement
-them. I tried with `zig ar`, a trivial front-end for llvm's ld, and failed.
-- The monorepo-onboarding diff was simmering and waiting for it's time.
+- I can't realistically implement the necessary changes or bug fixes. I tried
+implementing `zig ar`, a trivial front-end for llvm's `ar`, and failed.
 - Once an issue had been identified as a Zig issue, getting attention from Zig
 developers was unpredictable. Some issues got resolved within days, some took
-more than 6 months.
+more than 6 months. Donations don't change `zig cc` priorities.
+- The monorepo-onboarding diff was simmering and waiting for its time.
 ### 2021 End: Uber needs a cross-compiler
@@ -218,8 +222,8 @@ thing to manage is risk. As zig is a novel technology (not even 1.0!), it was
 truly unusual to suggest compiling all of our C and C++ code with it. We should
 be planning to stick with it for at least a decade. Questions were raised and
 evaluated with great care and scrutiny. For that I am truly grateful to the Go
-Monorepo team, especially Ken Micklas, for doing the work and research on this
-unproven prototype.
+Monorepo team, especially [Ken Micklas][kmicklas], for doing the work and
+research on this unproven prototype.
 ### Evaluation of different compilers
@@ -234,22 +238,23 @@ Given that we now needed a cross-compiler, we had two candidates:
 - configurable glibc version. In the grailbio case, you would need a sysroot
 (basically, a chroot with the system libraries, so the programs can be linked
 against them), which needs to be maintained.
-- a working, albeit still buggy, hermetic (cross-)compiler for OSX.
+- a working, albeit still buggy, hermetic (cross-)compiler for macOS.
 Glibc we can handle in either case. However, `bazel-toolchain` is unlikely to
-ever have a way to compile to OSX, let alone cross-compile. Relying on the
+ever have a way to compile to macOS, let alone cross-compile. Relying on the
 system compiler is undesirable on developer laptops, and Go Platform feels that
-first-hand, especially during OSX upgrades.
+first-hand, especially during macOS upgrades.
-The prospect of a hermetic toolchain for OSX targets tripped the scales towards
-`zig cc`, with all it's warts, risks and instability.
+The prospect of a hermetic toolchain for macOS targets tipped the scales
+towards `zig cc`, with all its warts, risks and instability.
-There was another, attention problem: if we were considering to use zig in a
-serious capacity, we knew we will hit problems, but unlikely have the expertise
-to solve them. How can we, as a BigCorp, de-risk the engagement question,
-making sure that bugs important to us are handled timely? We were sure of good
-intentions of ZSF: it was obvious that, if we find and report a legitimate bug,
-it would get fixed. But how can we put an upper bound on latency?
+There was also an attention problem: if we were considering the use of Zig in
+a serious capacity, we knew we would hit problems, but would be unlikely to have
+the expertise to solve them. How can we, as a BigCorp, de-risk the engagement
+question, making sure that bugs important to us are handled in a timely
+manner? We were sure of the good intentions of ZSF: it was obvious that, if we
+found and reported a legitimate bug, it would get fixed. But how can we put an
+upper bound on latency?
 ### Money
@@ -271,7 +276,7 @@ bystander. We did not ask for special rights, it's explicit in the contract,
 and we don't want that.
 The contract was signed, the wire transfer completed, and in January 2022 we
-hpad:
+had:
 - A service contract with ZSF that promised to prioritize issues that we've
 registered.
@@ -287,8 +292,8 @@ hpad:
 ## 2022 and beyond
 In Feb 2022 the toolchain was gated behind a command-line flag
-(`--config=hermetic-cc`). As of Feb 2022, you can invoke `zig cc` in Uber's go
-monorepo without requiring a custom patch.
+(`--config=hermetic-cc`). As of Feb 2022, you can invoke `zig cc` in Uber's Go
+Monorepo without requiring a custom patch.
 {{<img src="_/2022/uber-zig-landed.png"
 alt="WIP DIFF onboarding the monorepo was landed"
@@ -317,7 +322,7 @@ zig-cc could have failed due to many many reasons.
 Looking back, I think the most important reason for success is a killer
 feature at the right time. In our case, there were two: glibc version selection
-without a sysroot and cross-compiling to OSX.
+without a sysroot and cross-compiling to macOS.
 ## Appendix
@@ -333,6 +338,11 @@ If compilers or adopting software for other CPU architectures (and/or living in
 Eastern Europe) is your thing, my team in Vilnius is hiring. Also, my
 sister teams in Seattle and the Bay Area are hiring. Ping me.
+Credits
+-------
+Many thanks to Abhinav Gupta for reading drafts of this.
 [^1]: Errata: I incorrectly said "by an order of magnitude". The order of
 magnitude is the same.
 [^2]: Errata: I said Go was the first monorepo. Go was the 4th.
@@ -352,3 +362,4 @@ sister teams in Seattle and Bay Area are hiring too. Ping me.
 [grailbio/bazel-toolchain]: https://github.com/grailbio/bazel-toolchain
 [milan-youtube]: https://www.youtube.com/watch?v=SCj2J3HcEfc
 [zig-motiejus-issues]: https://github.com/ziglang/zig/issues?q=author%3Amotiejus+sort%3Acreated-asc
+[kmicklas]: https://github.com/kmicklas
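The limitation that CGo executables link against whatever glibc the build host has can be checked mechanically. Below is a hedged sketch using Go's standard `debug/elf` package; it is not a tool from the post. It reads an ELF binary's imported symbols and reports the highest `GLIBC_x.y` symbol version, which is the oldest glibc a host can have and still run the binary. The helper names `parseGlibcVersion`, `newer` and `maxGlibc` are hypothetical.

```go
package main

import (
	"debug/elf"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseGlibcVersion turns a symbol version such as "GLIBC_2.2.5" into
// numeric components [2 2 5]. It reports false for anything else,
// including "GLIBC_PRIVATE" and versions belonging to other libraries.
func parseGlibcVersion(v string) ([]int, bool) {
	if !strings.HasPrefix(v, "GLIBC_") {
		return nil, false
	}
	var parts []int
	for _, p := range strings.Split(strings.TrimPrefix(v, "GLIBC_"), ".") {
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, false
		}
		parts = append(parts, n)
	}
	return parts, true
}

// newer reports whether version a is strictly greater than version b.
func newer(a, b []int) bool {
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return len(a) > len(b)
}

// maxGlibc returns the highest GLIBC_x.y version in versions, i.e. the
// minimum glibc a host needs. It returns "" if none is present.
func maxGlibc(versions []string) string {
	best := ""
	var bestParts []int
	for _, v := range versions {
		if parts, ok := parseGlibcVersion(v); ok && (bestParts == nil || newer(parts, bestParts)) {
			best, bestParts = v, parts
		}
	}
	return best
}

func main() {
	// Inspect the ELF file given on the command line, or this program itself.
	path := ""
	if len(os.Args) > 1 {
		path = os.Args[1]
	} else if exe, err := os.Executable(); err == nil {
		path = exe
	}
	f, err := elf.Open(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()
	syms, err := f.ImportedSymbols()
	if errors.Is(err, elf.ErrNoSymbols) {
		fmt.Println(path, "has no dynamic symbols (likely statically linked)")
		return
	} else if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	versions := make([]string, 0, len(syms))
	for _, s := range syms {
		versions = append(versions, s.Version)
	}
	if v := maxGlibc(versions); v != "" {
		fmt.Println(path, "needs a host with at least", v)
	} else {
		fmt.Println(path, "imports no glibc-versioned symbols")
	}
}
```

Run against a CGo binary built on a newer build host, this would typically report a higher `GLIBC_` version than the same program built against an older glibc, which is why the build fleet had to be upgraded last.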