---
title: "Zig Reproduced Without Binaries"
date: 2024-11-12T22:21:48+02:00
slug: zig-reproduced-without-binaries
draft: true
---

I decided to bootstrap zig without using the binaries that are [checked in the repository](https://github.com/ziglang/zig/blob/0.13.0/stage1/zig1.wasm) and see if the resulting `zig1.wasm` in the latest zig release (0.13.0) is the same as the one bootstrapped without those binaries.

TLDR: the `zig1.wasm` of the official 0.13.0 and our hard-bootstrapped one are the same. Phew, Zig is clean from [this famous attack vector][2], or at least there is nothing hiding in `zig1.wasm` that hasn't been in the checked-in sources:

```
$ sha256sum code/zig{,2}/stage1/zig1.wasm
127909fb8c9610ce3f296d8a48014546c0f85055115002fb3aba4d865dcdbb27  code/zig/stage1/zig1.wasm
127909fb8c9610ce3f296d8a48014546c0f85055115002fb3aba4d865dcdbb27  code/zig2/stage1/zig1.wasm
```

Many, many thanks to [Hilton Chain][1] for reasons that will become clear later.

# Official zig1.wasm

Steps to acquire the official incarnation of `zig1.wasm` are straightforward: download zig, build `zig3` using the official instructions, then use it to run `update-zig1`:

```
git clone https://github.com/ziglang/zig; cd zig
git checkout 0.13.0
mkdir build; pushd build
cmake ..
make -j install
popd
build/stage3/bin/zig build update-zig1
```

Which results in an updated `code/zig/stage1/zig1.wasm`:

```
$ git diff --stat
 stage1/zig1.wasm | Bin 2675178 -> 2800926 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
```

We will be comparing this `zig1.wasm` to the one bootstrapped in the next section.
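The hash comparison used in this post can be wrapped in a small helper. This is a minimal sketch, not part of the original scripts: the function name `same_sha256` is made up for illustration, and it assumes `sha256sum` from GNU coreutils is on the `PATH`.

```shell
# Hypothetical helper (not from the zig-repro scripts): succeeds only when
# both files exist and have the same SHA-256 digest.
same_sha256() {
    # sha256sum prints "<digest>  <path>"; keep only the digest.
    sum_a=$(sha256sum "$1" | cut -d' ' -f1)
    sum_b=$(sha256sum "$2" | cut -d' ' -f1)
    # An empty digest means sha256sum could not read the file: treat as mismatch.
    [ -n "$sum_a" ] && [ "$sum_a" = "$sum_b" ]
}
```

With the two checkouts from this post, it would be called as `same_sha256 code/zig/stage1/zig1.wasm code/zig2/stage1/zig1.wasm && echo identical`.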
# Binary-free zig1.wasm

Building zig 0.13.0 without binaries is tricky, because to build zig 0.13.0 we need a `zig1.wasm`, which has been checked in and continuously updated since [late 2022](https://github.com/ziglang/zig/pull/13560):

```
commit 20d86d9c63476b6312b87dc5b0e4aa4822eb7717
Author: Andrew Kelley
Date:   2022-11-13T01:35:20+02:00

    add zig1.wasm.zst

    This commit adds a 637 KB binary file to the source repository. This
    commit does nothing else, so it should be replaced with a different
    commit before this branch is merged to avoid bloating the git
    repository.

 stage1/zig1.wasm.zst | Bin 0 -> 652012 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
```

[Andrew's motivation][3] is legit from a Zig developer's perspective. However, checked-in binary blobs have trust issues, regardless of what we think about the author.

The last commit that can[^1] be built without binary blobs is the parent of this one:

```
commit 28514476ef8c824c3d189d98f23d0f8d23e496ea
Author: Andrew Kelley
Date:   2022-11-01T05:29:55+02:00

    remove `-fstage1` option

    After this commit, the self-hosted compiler does not offer the option
    to use stage1 as a backend anymore.
```

Once the C++ implementation was removed, Zig became required to build Zig. This is a cyclic dependency, which the Zig core team breaks by continuously checking in *a* Zig implementation in wasm, the `zig1.wasm` file, which is used to build the compiler. Andrew suggests that a motivated third party could implement a [Zig interpreter][zig-interpreter] in a language other than Zig, which would break this chain. While that would certainly be ideal, nobody has built it yet 🤷.

The steps to build "trusted"[^3] zig are roughly:

1. Build zig from the C++ implementation of the commit above (with hacks and tricks to make it [actually compile][4]).
2. Use the result of the previous step to build the first self-hosted Zig.
3. Proceed to the next step. When the updated zig does not build, find creative ways to build it anyway (or, when really stuck, ask @mlugg).
4. Goto 2, [45+ times][5].

After reaching `0.11.0-1894-gb92e30ff0b`, which is two `zig1.wasm` updates away from 0.12.0, I received an email from Hilton Chain, titled `Thank you for the work on bootstrapping Zig!`. They had taken my PoC, [re-created all of it in Guix DSL][6] and run it all the way to 0.13.0[^2].

I was flabbergasted. I audited their script to see if it really deletes `zig1.wasm` at every checkout, and ran it to produce the `zig1.wasm` of `0.13.0` myself.

Once I had the `zig1.wasm` of 0.13.0, I did the same as I did with the official `zig1.wasm`: built zig3, used it to build `zig1.wasm`, and voilà, the hashes of the official `zig1.wasm` and the one built by myself and Hilton match.

I am looking forward to Hilton landing their Zig work in Guix, so anyone can audit the build script and reproduce this exercise by themselves with an otherwise [bootstrappable][7] system.

If anyone can trace the origins of `zig1.wasm` and produce an identical version themselves, perhaps it's not too bad to have it checked in?

[^1]: Not exactly. Some reverts and code movement are necessary. See the [`run` script][5] for details.

[^2]: Their work is on a branch in the Guix repository, which has `zig` in the title. I will not link it here, as it will be removed when it lands, but it should be easy to find for determined readers before it does.

[^3]: We trust no-one except ourselves.

[1]: https://ultrarare.space/
[2]: https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf
[3]: https://ziglang.org/news/goodbye-cpp/
[4]: https://ziggit.dev/t/building-self-hosted-from-the-original-c-implementation/6607?u=motiejus
[5]: https://git.jakstys.lt/motiejus/zig-repro/src/commit/7f37da6e75cab9d4637b8173d713f91853c9ef54/run#L1032-L1076
[6]: https://issues.guix.gnu.org/74217
[7]: https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/
[zig-interpreter]: https://ziggit.dev/t/building-self-hosted-from-the-original-c-implementation/6607/2?u=motiejus