update agent instructions

Motiejus Jakštys
2026-02-25 08:25:41 +00:00
parent e976f0eb6b
commit 2ddccd2c7e


@@ -2,17 +2,64 @@
zig0 aspires to be an interpreter of zig 0.15.2 written in C.
This is written with help from LLM:
Except for the lexer (written by hand by yours truly), it's been written by an
LLM.
- Lexer:
- Datastructures 100% human.
- Helper functions 100% human.
- Lexing functions 50/50 human/bot.
- Parser:
- Datastructures 100% human.
- Helper functions 50/50.
- Parser functions 5/95 human/bot.
- AstGen: TBD.
The goal of stage0 is to implement enough Zig to build `zig1.wasm`. For that
we need:
1. Lexer: DONE, written by hand by yours truly in late 2024.
2. Parser: DONE, written mostly by an LLM.
3. AstGen: DONE, written fully by an LLM.
4. Sema: in progress.
# Sema porting approach
Goal: make `corpus_test.zig` skip fewer tests. Rules:
1. We have an extensive AIR comparator: we generate AIR from the upstream Zig
compiler and compare it byte-by-byte (see `Exceptions` below) to the C
implementation.
2. Run Red/Green TDD. The first step of Red/Green TDD is expanding the test
suite by bumping the `num_passing` in `stage0/stages.zig`.
3. Once the test fails, we need to port enough code mechanically from Zig to C
to make it pass. Ground rules:
- Function names should match (except for the `sema` prefix when appropriate).
- Function control flow should match.
- Data structures should be the same, C <-> Zig interop permitting. I.e.
struct definitions in `stage0/sema.h` should be, language permitting, the
same as in `src/Sema.zig`.
4. Sometimes the changes to enable a single stage can be quite complex and
can't be enabled in one go. Then we split it into smaller tractable problems,
by adding tests to `sema_test.zig`.
5. Once progress is made (e.g. more AIR matches between C and Zig), clean up
and commit.
Once a new test case has been enabled and passes, we enable the _next_ test
case by bumping `num_passing` and repeat the process.
## Exceptions
C and Zig AIR must match byte-by-byte except:
1. If floats don't round-trip through f64, we allow some imprecision. See
`astgen.c` and `Float Handling` (later in the README) to understand what to
do & why.
2. Padding: Zig compiler leaves `undefined` bytes in some places where they are
never read (e.g. in case of shorter tags). In C we zero them out. Since
those bytes are `undefined` and never read, they can differ.
## Cleaning Up
1. Disable/skip failing tests (most likely the one that was enabled before).
2. Remove or comment out all printf statements.
3. Run `Quick test` (from below), ensure it passes and there is no extraneous
output.
4. Run `More elaborate` test function (below), ensure it passes and there is
no extraneous output.
If a test fails, perhaps it's a regression? We don't want to commit _less_ than
we started with. Go back and analyze. If it's not a regression, you did a poor
job in step 1. If it's not a test failure, but a formatting/linting issue, fix it.
# Testing
@@ -51,7 +98,7 @@ gdb -batch \
You are welcome to replace `-ex "bt full"` with anything else of interest.
# Float handling
# Float Handling
Float literals are parsed with `strtold()` (C11 standard, portable). On
x86-64 Linux, `long double` is 80-bit extended precision (63 fraction bits).