diff --git a/content/log/2024/zig-reproduced-without-binaries.md b/content/log/2024/zig-reproduced-without-binaries.md
new file mode 100644
index 0000000..5782726
--- /dev/null
+++ b/content/log/2024/zig-reproduced-without-binaries.md
@@ -0,0 +1,147 @@
+---
+title: "Zig Reproduced Without Binaries"
+date: 2024-11-12T22:21:48+02:00
+slug: zig-reproduced-without-binaries
+draft: true
+---
+
+I decided to bootstrap zig without using the binaries that are [checked in to the
+repository](https://github.com/ziglang/zig/blob/0.13.0/stage1/zig1.wasm) and
+check whether the `zig1.wasm` of the latest zig release (0.13.0) is the same as
+the one bootstrapped without those binaries.
+
+TLDR: the `zig1.wasm` of the official 0.13.0 and our hard-bootstrapped one are
+identical. Phew, Zig is free of [this famous attack vector][2], or at least
+there is nothing hiding in `zig1.wasm` that is not in the checked-in sources:
+
+```
+$ sha256sum code/zig{,2}/stage1/zig1.wasm
+127909fb8c9610ce3f296d8a48014546c0f85055115002fb3aba4d865dcdbb27  code/zig/stage1/zig1.wasm
+127909fb8c9610ce3f296d8a48014546c0f85055115002fb3aba4d865dcdbb27  code/zig2/stage1/zig1.wasm
+```
+
+Many, many thanks to [Hilton Chain][1] for reasons that will become clear
+later.
+
+# Official zig1.wasm
+
+The steps to acquire the official incarnation of `zig1.wasm` are
+straightforward: clone zig, build `zig3` using the official instructions, and
+use it to run `update-zig1`:
+
+```
+git clone https://github.com/ziglang/zig; cd zig
+git checkout 0.13.0
+mkdir build; pushd build
+    cmake ..
+    make -j install
+popd
+build/stage3/bin/zig build update-zig1
+```
+
+This results in an updated `code/zig/stage1/zig1.wasm`:
+
+```
+$ git diff --stat
+ stage1/zig1.wasm | Bin 2675178 -> 2800926 bytes
+ 1 file changed, 0 insertions(+), 0 deletions(-)
+```
+
+We will compare this `zig1.wasm` to the one bootstrapped in the next
+section.
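+
+That comparison is the `sha256sum` check from the TLDR above. Here is a minimal
+sketch of it as a reusable shell function. It assumes GNU coreutils'
+`sha256sum`, as used above. The name `same_sha256` is hypothetical and is not
+part of any zig tooling:
+
+```shell
+#!/bin/sh
+# same_sha256 FILE_A FILE_B (hypothetical helper, not part of zig).
+# Prints "match: <hash>" and returns 0 if both files have the same SHA-256,
+# prints a mismatch and returns 1 otherwise, returns 2 if a file is missing.
+same_sha256() {
+    # Check existence first: on a missing file sha256sum prints nothing on
+    # stdout, and two empty hashes would otherwise compare as equal.
+    if [ ! -f "$1" ] || [ ! -f "$2" ]; then
+        echo "missing file: $1 or $2" >&2
+        return 2
+    fi
+    # sha256sum prints "<hash>  <file>"; keep only the hash field.
+    a=$(sha256sum "$1" | cut -d' ' -f1)
+    b=$(sha256sum "$2" | cut -d' ' -f1)
+    if [ "$a" = "$b" ]; then
+        echo "match: $a"
+    else
+        echo "MISMATCH: $a vs $b"
+        return 1
+    fi
+}
+```
+
+With the two checkouts from the TLDR, `same_sha256 code/zig/stage1/zig1.wasm code/zig2/stage1/zig1.wasm` should report a match and exit with status 0.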
+
+# Binary-free zig1.wasm
+
+Building zig 0.13.0 without binaries is tricky, because building zig 0.13.0
+requires a `zig1.wasm`, which has been checked in and continuously updated since
+[late 2022](https://github.com/ziglang/zig/pull/13560):
+
+```
+commit 20d86d9c63476b6312b87dc5b0e4aa4822eb7717
+Author: Andrew Kelley
+Date:   2022-11-13T01:35:20+02:00
+
+    add zig1.wasm.zst
+
+    This commit adds a 637 KB binary file to the source repository. This
+    commit does nothing else, so it should be replaced with a different
+    commit before this branch is merged to avoid bloating the git
+    repository.
+
+ stage1/zig1.wasm.zst | Bin 0 -> 652012 bytes
+ 1 file changed, 0 insertions(+), 0 deletions(-)
+```
+
+[Andrew's motivation][3] is legit from a Zig developer's perspective. However,
+checked-in binary blobs have trust issues, regardless of what we think about
+the author.
+
+The last commit that can[^1] be built without binary blobs is the parent of
+this one:
+
+```
+commit 28514476ef8c824c3d189d98f23d0f8d23e496ea
+Author: Andrew Kelley
+Date:   2022-11-01T05:29:55+02:00
+
+    remove `-fstage1` option
+
+    After this commit, the self-hosted compiler does not offer the option to
+    use stage1 as a backend anymore.
+```
+
+Once the C++ implementation was removed, building Zig required Zig. This is a
+cyclic dependency. The Zig core team breaks it by continuously checking in
+*a* Zig implementation compiled to wasm, the `zig1.wasm` file, which is then
+used to build the compiler.
+
+Andrew suggests that a motivated third party could implement a [Zig
+interpreter][zig-interpreter] in a language other than Zig to break this
+chain. While that would certainly be ideal, nobody has built it yet 🤷.
+
+The steps to build "trusted"[^3] zig are roughly:
+
+1. Build zig from the C++ implementation of the commit above (with hacks and
+   tricks to make it [actually compile][4]).
+2. Use the result of the previous step to build the first self-hosted Zig.
+3. Proceed to the next step.
+   When the updated zig does not build, find creative ways to build it anyway
+   (or, when really stuck, ask @mlugg).
+4. Repeat from step 2 [45+ times][5].
+
+After reaching `0.11.0-1894-gb92e30ff0b`, which is two `zig1.wasm` updates away
+from 0.12.0, I received an email from Hilton Chain titled `Thank you for the
+work on bootstrapping Zig!`. They had taken my PoC, [re-created all of it in
+Guix DSL][6], and run it all the way to 0.13.0[^2]. I was flabbergasted.
+
+I audited their script to confirm that it really deletes `zig1.wasm` at every
+checkout. Then I ran it to produce the `zig1.wasm` of 0.13.0 myself. With that
+file in place, I repeated what I did for the official `zig1.wasm`: built zig3
+and used it to build `zig1.wasm`. And voilà, the hashes of the official
+`zig1.wasm` and the one built by Hilton and me match.
+
+I am looking forward to Hilton landing the Zig work in Guix. Then anyone can
+audit the build script and reproduce this exercise by themselves on an
+otherwise [bootstrappable][7] system.
+
+If anyone can trace the origins of `zig1.wasm` and produce an identical version
+themselves, perhaps it's not too bad to have it checked in?
+
+[^1]: Not exactly. Some reverts and code movement are necessary. See the [`run`
+    script][5] for details.
+
+[^2]: Their work is on a branch in the Guix repository whose name contains
+    `zig`. I will not link it here, as the branch will be removed when the work
+    lands, but determined readers should be able to find it before then.
+
+[^3]: We trust no-one except ourselves.
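+
+The loop in steps 2 to 4 of the bootstrap procedure above can be sketched as a
+shell dry run. This is a hypothetical illustration, not the actual [`run`
+script][5]. The names `bootstrap_chain` and `build_with` are made up.
+`build_with` only prints what a real implementation would do at each commit;
+the per-commit hacks live in that script:
+
+```shell
+#!/bin/sh
+# Dry-run sketch of the bootstrap loop (steps 2-4). Hypothetical: this is not
+# the real run script, and build_with only prints what would happen.
+
+# build_with COMPILER COMMIT: placeholder for the real per-commit build, which
+# would check out COMMIT, delete stage1/zig1.wasm and build zig with COMPILER.
+build_with() {
+    echo "$2: delete stage1/zig1.wasm, build with $1"
+}
+
+# bootstrap_chain SEED COMMIT...: SEED is the zig built from the C++
+# implementation (step 1). Each commit is then built by the compiler produced
+# for the previous one, so the checked-in zig1.wasm is never trusted.
+bootstrap_chain() {
+    compiler="$1"
+    shift
+    for commit in "$@"; do
+        build_with "$compiler" "$commit" || return 1
+        compiler="zig@$commit"   # the new compiler builds the next commit
+    done
+    echo "final: $compiler"
+}
+```
+
+One way to get a candidate commit list is `git log --reverse --format=%H 28514476ef..0.13.0 -- stage1/zig1.wasm stage1/zig1.wasm.zst`. It lists the commits in that range that touch the checked-in blob, oldest first. The real script may choose its stopping points differently.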
+
+[1]: https://ultrarare.space/
+[2]: https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf
+[3]: https://ziglang.org/news/goodbye-cpp/
+[4]: https://ziggit.dev/t/building-self-hosted-from-the-original-c-implementation/6607?u=motiejus
+[5]: https://git.jakstys.lt/motiejus/zig-repro/src/commit/7f37da6e75cab9d4637b8173d713f91853c9ef54/run#L1032-L1076
+[6]: https://issues.guix.gnu.org/74217
+[7]: https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/
+
+[zig-interpreter]: https://ziggit.dev/t/building-self-hosted-from-the-original-c-implementation/6607/2?u=motiejus