update agent instructions

2026-02-25 08:25:41 +00:00
parent e976f0eb6b
commit 2ddccd2c7e
1 changed files with 58 additions and 11 deletions
--- a/stage0/README.md
+++ b/stage0/README.md
@@ -2,17 +2,64 @@

 zig0 aspires to be an interpreter of zig 0.15.2 written in C.

-This is written with help from LLM:
+Except for the lexer (written by hand by yours truly), it's been written by an
+LLM.

- Lexer:
-  - Datastructures 100% human.
-  - Helper functions 100% human.
-  - Lexing functions 50/50 human/bot.
- Parser:
-  - Datastructures 100% human.
-  - Helper functions 50/50.
-  - Parser functions 5/95 human/bot.
- AstGen: TBD.
+The goal of stage0 is to implement enough zig to build `zig1.wasm`. For that
+we need:
+
+1. Lexer: DONE, written by hand by yours truly in late 2024.
+2. Parser: DONE, written mostly by an LLM.
+3. AstGen: DONE, written fully by an LLM.
+4. Sema: in progress.
+
+# Sema porting approach
+
+Goal: make `corpus_test.zig` skip fewer tests. Rules:
+
+1. We have an extensive AIR comparator: we generate AIR with the upstream Zig
+   compiler and compare it byte-by-byte (see `Exceptions` below) against the
+   AIR produced by the C implementation.
+2. Use Red/Green TDD. The first (red) step is expanding the test suite by
+   bumping `num_passing` in `stage0/stages.zig`.
+3. Once the test fails, port just enough code mechanically from Zig to C to
+   make it pass. Ground rules:
+    - Function names should match (except for the `sema` prefix where appropriate).
+    - Function control flow should match.
+    - Data structures should be the same, C <-> Zig interop permitting, i.e.
+      struct definitions in `stage0/sema.h` should be, as far as the language
+      permits, the same as in `src/Sema.zig`.
+4. Sometimes the changes needed to enable a single stage are too complex to
+   be enabled in one go. Then we split them into smaller, tractable problems
+   by adding tests to `sema_test.zig`.
+5. Once progress is made (e.g. more AIR matches between C and Zig), clean up
+   and commit.
+
+Once a new test case has been enabled and passes, we enable the _next_ test
+case by bumping `num_passing` and repeat the process.
+
+## Exceptions
+
+C and Zig AIR must match byte-by-byte except:
+
+1. If floats don't round-trip through f64, we allow some imprecision. See
+   `astgen.c` and `Float Handling` (later in the README) to understand what to
+   do & why.
+2. Padding: the Zig compiler leaves `undefined` bytes in some places where
+   they are never read (e.g. in case of shorter tags). In C we zero them out.
+   Since those bytes are `undefined` and never read, they can differ.
+
+## Cleaning Up
+
+1. Disable/skip failing tests (most likely the one that was just enabled).
+2. Remove or comment out all printf statements.
+3. Run `Quick test` (from below); ensure it passes with no extraneous output.
+4. Run the `More elaborate` test function (from below); ensure it passes with
+   no extraneous output.
+
+If a test fails, it may be a regression: we don't want to commit _fewer_
+passing tests than we started with, so go back and analyze. If it's not a
+regression, you did a poor job in step 1. If it's a lint/format issue, fix it.

 # Testing

@@ -51,7 +98,7 @@ gdb -batch \

 You are welcome to replace `-ex "bt full"` with anything else of interest.

-# Float handling
+# Float Handling

 Float literals are parsed with `strtold()` (C11 standard, portable). On
 x86-64 Linux, `long double` is 80-bit extended precision (63 fraction bits).
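The round-trip condition behind exception 1 can be sketched in C as follows. This is a minimal illustration, assuming the literal is parsed with `strtold()` as described above; `roundtrips_through_f64` is a hypothetical name, not a function in `astgen.c`, and the real handling may differ.

```c
#include <stdlib.h>

/* Hypothetical helper (not from astgen.c): parse a float literal with
 * strtold(), as the Float Handling section describes, and report whether
 * the value survives a round-trip through double (f64) unchanged.
 * Assumes the literal is within double's finite range. */
int roundtrips_through_f64(const char *literal) {
    long double ld = strtold(literal, NULL);
    double d = (double)ld; /* rounds if ld needs more than 52 fraction bits */
    return (long double)d == ld;
}
```

On x86-64 Linux, `"0.5"` round-trips exactly, while `"0.1"` does not, because its 63-bit and 52-bit roundings differ; on targets where `long double` is the same type as `double`, every in-range literal round-trips.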