title: Zig Reproduced Without Binaries
date: 2024-11-12T22:21:48+02:00
slug: zig-reproduced-without-binaries

I decided to bootstrap Zig without using the binaries that are checked into the repository, and to answer whether the zig1.wasm in the latest Zig release (0.13.0) is the same as the one bootstrapped without those binaries.

TLDR: yes, they are the same:

$ sha256sum code/zig{,2}/stage1/zig1.wasm
127909fb8c9610ce3f296d8a48014546c0f85055115002fb3aba4d865dcdbb27  code/zig/stage1/zig1.wasm
127909fb8c9610ce3f296d8a48014546c0f85055115002fb3aba4d865dcdbb27  code/zig2/stage1/zig1.wasm

I can now confidently say (and you can also check, you don't need to trust me) that there is nothing hiding in zig1.wasm that hasn't been checked in as a source file.
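For anyone who wants to script that check rather than eyeball two hex strings, here is a minimal sketch. The function name `same_sha256` is my own invention for illustration; it relies only on `sha256sum` from coreutils and plain POSIX shell.

```shell
#!/bin/sh
# same_sha256 FILE_A FILE_B
# Exit status 0 if both files have the same SHA-256 digest, 1 if they
# differ, 2 if either file cannot be read. The name is illustrative only.
same_sha256() {
    a=$(sha256sum "$1") || return 2
    b=$(sha256sum "$2") || return 2
    # sha256sum prints "<digest>  <path>"; compare only the digest part.
    [ "${a%% *}" = "${b%% *}" ]
}

# Example, using the paths from the TLDR:
# same_sha256 code/zig/stage1/zig1.wasm code/zig2/stage1/zig1.wasm && echo identical
```

A byte-for-byte comparison with `cmp -s` would answer the same question; the digest form is used here only because it matches the output shown in the TLDR.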

Many, many thanks to Hilton Chain for reasons that will become clear later. The rest of this post walks through how I arrived at this claim.

Official zig1.wasm

The steps to acquire the official incarnation of zig1.wasm are straightforward: download Zig, build zig3 using the official instructions, and use it to run update-zig1:

git clone https://github.com/ziglang/zig; cd zig
git checkout 0.13.0
mkdir build; pushd build
  cmake ..
  make -j$(nproc) install
popd
build/stage3/bin/zig build update-zig1

Which results in an updated code/zig/stage1/zig1.wasm:

$ git diff --stat
 stage1/zig1.wasm | Bin 2675178 -> 2800926 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

We will be comparing this file to the one bootstrapped in the next section.

Binary-free zig1.wasm

Building Zig 0.13.0 without binaries is tricky, because to build Zig 0.13.0, we need a zig1.wasm, which has been checked in and continuously updated since late 2022:

commit 20d86d9c63476b6312b87dc5b0e4aa4822eb7717
Author: Andrew Kelley <andrew@ziglang.org>
Date:   2022-11-13T01:35:20+02:00

    add zig1.wasm.zst
    
    This commit adds a 637 KB binary file to the source repository. This
    commit does nothing else, so it should be replaced with a different
    commit before this branch is merged to avoid bloating the git
    repository.

 stage1/zig1.wasm.zst | Bin 0 -> 652012 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

Andrew's motivation is reasonable from a Zig developer's perspective. However, checked-in binary blobs have trust issues, regardless of what we think about the author.

The last commit that can [1] be built without using binary blobs is the parent of this one:

commit 28514476ef8c824c3d189d98f23d0f8d23e496ea
Author: Andrew Kelley <andrew@ziglang.org>
Date:   2022-11-01T05:29:55+02:00

    remove `-fstage1` option

    After this commit, the self-hosted compiler does not offer the option to
    use stage1 as a backend anymore.

After this, Zig is required to build Zig. This is a cyclic dependency, which the Zig core team breaks by continuously checking in a Zig compiler compiled to wasm, the zig1.wasm file, which is used to build the compiler.

Andrew suggests that a motivated third party could implement a Zig interpreter in a language other than Zig to break this chain. While that would certainly be ideal, nobody has built it yet 🤷.

The steps to build "trusted" [2] Zig are roughly:

  1. Build Zig from the C++ implementation at the commit above (with hacks and tricks to make it actually compile).
  2. Use the compiler from the previous step to build the first self-hosted Zig.
  3. Move on to the next zig1.wasm update. When the updated Zig does not build, find creative ways to build it anyway (or, when really stuck, ask @mlugg).
  4. Go back to step 2 and repeat, 45+ times.
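The loop in steps 2 to 4 can be sketched in shell. This is a hypothetical outline, not the actual run script: `build_step` is a placeholder I made up, and a real version would check out the commit, delete the checked-in zig1.wasm, and build with the previous compiler. Here it only prints a label so the control flow is visible.

```shell
#!/bin/sh
# Hypothetical sketch of the bootstrap chain; not the author's run script.

# build_step COMPILER COMMIT
# Placeholder: a real version would check out COMMIT, remove the
# checked-in zig1.wasm, build with COMPILER, and print the path of the
# resulting compiler. This stub only prints a label.
build_step() {
    echo "zig@$2"
}

# bootstrap_chain SEED COMMIT...
# Build each commit with the compiler produced by the previous one and
# print the final compiler. Stops at the first commit that fails to build.
bootstrap_chain() {
    compiler=$1
    shift
    for commit in "$@"; do
        next=$(build_step "$compiler" "$commit") || {
            echo "build failed at $commit (last good compiler: $compiler)" >&2
            return 1
        }
        compiler=$next
    done
    echo "$compiler"
}

# Example with made-up commit names:
# bootstrap_chain cxx-stage1 commit-a commit-b commit-c
```

The point of the shape is that each compiler is only ever built by its predecessor, so the C++ implementation is the sole starting point of the chain.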

After reaching 0.11.0-1894-gb92e30ff0b, which is two zig1.wasm updates away from 0.12.0, I received an email from Hilton Chain titled "Thank you for the work on bootstrapping Zig!". They had taken my PoC, re-created all of it in Guix DSL, and run it all the way to 0.13.0 [3]. I was flabbergasted.

I audited their script to see if it really deletes zig1.wasm at every checkout, then ran it to produce the zig1.wasm of 0.13.0 myself:

$ ./pre-inst-env guix build zig@0.13
< ... a few hours ... >
/gnu/store/mz95707dd7qmycpr1f0ndxhkmx3vdy1c-zig-0.13.0
/gnu/store/kqwq8sjgwi561sp78vfi6xkgm9i3wysk-zig-0.13.0-zig1
$ ls -l /gnu/store/kqwq8sjgwi561sp78vfi6xkgm9i3wysk-zig-0.13.0-zig1/bin/zig1.wasm 
-r--r--r-- 5 root root 2661492 Jan  1  1970 /gnu/store/kqwq8sjgwi561sp78vfi6xkgm9i3wysk-zig-0.13.0-zig1/bin/zig1.wasm
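The property the audit looks for, that no checked-in zig1.wasm survives in a source tree before it is built, can also be spot-checked on any checkout. The sketch below is my own illustration and not part of Hilton's Guix code; the function name `no_wasm_blobs` is made up.

```shell
#!/bin/sh
# no_wasm_blobs DIR
# Succeed only if DIR contains no zig1.wasm or zig1.wasm.zst file.
# Offending paths are printed so they can be inspected.
# Exit status: 0 if clean, 1 if a blob was found, 2 if DIR is unreadable.
no_wasm_blobs() {
    found=$(find "$1" -type f \( -name 'zig1.wasm' -o -name 'zig1.wasm.zst' \)) || return 2
    if [ -n "$found" ]; then
        printf '%s\n' "$found"
        return 1
    fi
}

# Example: refuse to start a build if a blob is still present.
# no_wasm_blobs ./zig || { echo "binary blob present, aborting" >&2; exit 1; }
```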

Once I had the zig1.wasm of 0.13.0, I repeated the steps from the official zig1.wasm section: built zig3, used it to rebuild zig1.wasm, and voilà, the hashes of the official zig1.wasm and the one built here match.

Conclusions and open questions

I am looking forward to Hilton landing this in Guix, so anyone can audit the build script and reproduce this exercise themselves with an otherwise bootstrappable system. If you don't trust Guix, what and whom do you trust?

If anyone can trace the origins of zig1.wasm by producing an identical version themselves, perhaps it's not too bad to trust it and have it checked in?


  1. Not exactly. Some reverts and code movement are necessary. See the run script for details.

  2. We trust no one except ourselves and our little machine on our desk.

  3. Their work is on a branch in the Guix repository, which has zig in the title. I will not link it here, as it will be removed when it lands, but it should be easy to find for determined readers before it does.