stage0: rewrite CLAUDE.md to lead with porting methodology

Restructure from warnings-first ("NEVER do X") to methodology-first
("port the function, not its output"). The core principle is now front
and center with a concrete 4-step method (find, read, translate, test).

Removes the 118-line "Closing the InternPool gap" section (status moved
to MEMORY.md), the anti-pattern section, and scattered duplications.
Consolidates mechanical copy rules into the methodology. 243→160 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-01 17:07:21 +00:00
parent f2288a8e4b
commit f9a061e5e1


@@ -11,11 +11,66 @@ corpus tests pass.
counter. Sema unit tests (stage0/sema_tests/) come first, then lib/ files.
- `stage0/sema_tests/` — focused unit tests for decomposing complex problems.
These are part of the main corpus and go through the full pipeline
(parser → ZIR → sema).
(parser -> ZIR -> sema).
- `stage0/stages_test.zig` — runs all stages for corpus files with
**bidirectional** AIR comparison (every C function must match Zig, AND
every Zig function must exist in C output).
## How to port
**Port the function, not its output.**
Every feature in `stage0/sema.c` must be a mechanical translation of an
upstream Zig function. When a test fails, your job is to find which upstream
function produces the missing behavior and translate it to C. Never
reverse-engineer the output and recreate it directly.
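As a toy illustration of the difference, the sketch below pairs a *hypothetical* Zig helper (shown in a comment; it is not taken from `src/Sema.zig`) with a mechanical C port. The port keeps the name (adding the `sema` prefix), the parameters, and the branch structure, so the two can be read side by side. Returning a constant copied from a test dump would instead be porting the output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upstream Zig helper, for illustration only:
 *
 *   fn intFitsInBits(value: u64, bits: u16) bool {
 *       if (bits >= 64) return true;
 *       return value < (@as(u64, 1) << @intCast(bits));
 *   }
 */

/* Mechanical C port: same name (plus the `sema` prefix), same
 * parameters, same branches in the same order. */
static bool semaIntFitsInBits(uint64_t value, uint16_t bits) {
    if (bits >= 64)
        return true; /* guard also keeps the shift below well-defined */
    return value < ((uint64_t)1 << bits);
}
```

For example, `semaIntFitsInBits(42, 32)` is true and `semaIntFitsInBits(256, 8)` is false; both answers come from running the ported logic, not from a table of expected results.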
### The method
1. **Find the upstream function.** Read the test failure. Trace it back to
the Zig source (`src/Sema.zig`, `src/Zcu/PerThread.zig`). If the mismatch
shows a gap like `a=0xNN b=0xMM`, find which upstream function creates
entries `a` through `b`.
2. **Read it.** Understand the function's control flow. Note what it calls.
3. **Translate mechanically.** Port to C following these rules:
- Function names match (prefix `sema` when appropriate).
- Control flow matches.
- Data structures match (C <-> Zig interop permitting). Struct definitions
in `stage0/sema.h` should mirror `src/Sema.zig`.
- Add functions in the same order as the original Zig file.
- Entry order matters — the C sema must create entries in the same
sequence as the Zig compiler.
- Deduplication matters — `ipIntern` must return existing indices, not
create duplicates.
4. **Test.** `zig build test-zig0` — iterate until the test passes.
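The ordering and deduplication rules can be pictured with a deliberately simplified intern pool. Everything in this sketch (`IpKey`, `InternPool`, the fixed capacity, the linear search) is a hypothetical stand-in, not the layout of the real InternPool or of `stage0/sema.c`; only the behavior matters: an equal key returns the existing index, and a new key gets the next index, so indices encode creation order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified key: a tag plus two payload words.
 * The real InternPool key is a tagged union with many variants. */
typedef struct {
    uint32_t tag;
    uint64_t a;
    uint64_t b;
} IpKey;

#define IP_CAP 1024

typedef struct {
    IpKey items[IP_CAP];
    uint32_t len;
} InternPool;

static bool ipKeyEql(const IpKey *x, const IpKey *y) {
    return x->tag == y->tag && x->a == y->a && x->b == y->b;
}

/* Returns the index of an equal, already-interned key (deduplication);
 * otherwise appends the key, so new indices follow creation order. */
static uint32_t ipIntern(InternPool *ip, IpKey key) {
    for (uint32_t i = 0; i < ip->len; i++) {
        if (ipKeyEql(&ip->items[i], &key))
            return i;
    }
    assert(ip->len < IP_CAP);
    ip->items[ip->len] = key;
    return ip->len++;
}
```

A real pool would use a hash map rather than a linear scan; the scan only keeps the invariant easy to see. Interning the same key twice returns the same index and leaves the pool size unchanged.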
### Allowed hardcoding
**callconv = .c and wasm32-wasi target are OK** — this is a bootstrap
interpreter targeting only that platform. Don't make these configurable.
Everything else must be computed by porting the upstream functions that
produce the values. Do not hand-code enum tags, field indices, type indices,
or any values that come from running upstream logic.
### When you see a gap
When a test mismatch shows the C sema has fewer InternPool entries than
the Zig compiler: **find which upstream function creates those entries and
port it.** Do not create entries directly, resolve types by name, enumerate
namespaces, or force type evaluation to fill the gap. Every entry must be
a side effect of running the correctly-ported upstream function.
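Reading the size of a gap from a mismatch line is simple arithmetic. This sketch assumes the `a=0x… b=0x…` shape quoted above, that `a` is the C sema's index and `b` the Zig compiler's, and that `b >= a`; the helpers `parseHexField` and `ipGap` are hypothetical, and the sample line is illustrative rather than copied from real test output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parses the hex number that follows `key` (for example "a=0x") in
 * `line`. Returns false if the key or its digits are missing. */
static bool parseHexField(const char *line, const char *key, unsigned long *out) {
    const char *p = strstr(line, key);
    if (p == NULL)
        return false;
    const char *digits = p + strlen(key);
    char *end;
    unsigned long value = strtoul(digits, &end, 16);
    if (end == digits)
        return false; /* no hex digits after the key */
    *out = value;
    return true;
}

/* Gap = b - a: how many IP entries the C sema is missing, assuming `a`
 * is the C sema's index and `b` the Zig compiler's. */
static bool ipGap(const char *line, unsigned long *gap) {
    unsigned long a = 0, b = 0;
    if (!parseHexField(line, "a=0x", &a) || !parseHexField(line, "b=0x", &b))
        return false;
    if (b < a)
        return false; /* unexpected: the C sema is ahead of Zig */
    *gap = b - a;
    return true;
}
```

With the figures from the status notes in this file (Zig at 217 entries, the C sema at about 135), a line such as `a=0x87[ip] b=0xd9[ip]` gives a gap of 82, which is the number of entries the ported upstream functions still have to produce.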
Key upstream functions that create IP entries:
- `createFileRootStruct` -> `type_struct` entry for a module's root
- `scanNamespace` -> `ptr_nav` entries for declarations
- `getStructType` / `getEnumType` -> type entries
- `ensureFileAnalyzed` -> recursive module processing
- `zirExport` -> forces resolution of exported declarations
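To see why entries should fall out as side effects, consider a toy model in which each ported function appends to the pool as it runs. The `createFileRootStruct` and `scanNamespace` below are hypothetical stand-ins with invented, simplified signatures that only follow the one-line descriptions in the list above; the real upstream functions do far more. Running them in upstream order yields the upstream entry order without any bookkeeping, whereas creating the same entries by hand would have to re-derive that order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified entry tags; the real pool has many more. */
enum IpTag { IP_TYPE_STRUCT, IP_PTR_NAV };

#define POOL_CAP 256

typedef struct {
    enum IpTag tags[POOL_CAP];
    uint32_t payload[POOL_CAP]; /* file index or Nav index, by tag */
    uint32_t len;
} Pool;

/* Appends one entry; its index is simply the next free slot. */
static uint32_t poolAppend(Pool *pool, enum IpTag tag, uint32_t payload) {
    assert(pool->len < POOL_CAP);
    pool->tags[pool->len] = tag;
    pool->payload[pool->len] = payload;
    return pool->len++;
}

/* Toy stand-in for `createFileRootStruct`: the module's root struct
 * type is created as a side effect of analyzing the file. */
static uint32_t createFileRootStruct(Pool *pool, uint32_t file_index) {
    return poolAppend(pool, IP_TYPE_STRUCT, file_index);
}

/* Toy stand-in for `scanNamespace`: one ptr_nav entry per declaration,
 * in declaration order. */
static void scanNamespace(Pool *pool, uint32_t first_nav, uint32_t decl_count) {
    for (uint32_t i = 0; i < decl_count; i++)
        poolAppend(pool, IP_PTR_NAV, first_nav + i);
}
```

Calling `createFileRootStruct` and then `scanNamespace` for three declarations leaves four entries: the struct type at index 0 followed by three `ptr_nav` entries in declaration order.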
## The loop
Repeat until all corpus tests in `stage0/corpus.zig` pass:
@@ -23,18 +78,11 @@ Repeat until all corpus tests in `stage0/corpus.zig` pass:
**Do NOT stop between iterations.** Each commit is a checkpoint, not a
stopping point. Continue looping until all tests pass. When unsure how
to proceed, the answer is always: follow what the upstream Zig compiler
does (`src/Sema.zig`, `src/Zcu/PerThread.zig`). Never stop to ask.
does. Never stop to ask.
1. **Bump.** Increment `num_passing` by 1 in `stage0/corpus.zig`.
2. **Run.** `zig build test-zig0` — observe the failure.
3. **Port.** Mechanically copy the needed logic from `src/Sema.zig` into
`stage0/sema.c` / `stage0/sema.h`. Ground rules:
- Function names should match (except for `sema` prefix when appropriate).
- Function control flow should match.
- Data structures should be the same, C <-> Zig interop permitting. I.e.
struct definitions in `stage0/sema.h` should be, language permitting, the
same as in `src/Sema.zig`.
- Add functions in the same order as in the original Zig file.
3. **Port.** Follow [How to port](#how-to-port) above.
4. **Test.** `zig build test-zig0` — iterate until the new test passes.
5. **Clean up & commit.** See [Cleaning Up](#cleaning-up).
6. **Go to 1.** Do NOT stop between iterations — keep looping until
@@ -43,8 +91,7 @@ does (`src/Sema.zig`, `src/Zcu/PerThread.zig`). Never stop to ask.
## When stuck
If a single corpus test requires too many new functions, causes unclear
failures, or needs multiple unrelated features at once — do NOT give up or
take shortcuts. Instead, decompose:
failures, or needs multiple unrelated features at once — decompose:
1. Create a focused test file in `stage0/sema_tests/` that isolates one piece
of the problem.
@@ -53,124 +100,8 @@ take shortcuts. Instead, decompose:
4. Repeat until enough pieces are in place for the corpus test to pass.
5. Return to [The loop](#the-loop).
## Closing the InternPool gap
### Current status (2026-03-01)
**num_passing = 5.** The first 5 sema unit tests pass (empty.zig,
const_decl.zig, empty_void_function.zig, type_identity_fn.zig,
reify_int.zig). reify_int.zig was fixed by handling `decl_val` in
2-instruction parameter type bodies.
**Next blocker: return_integer.zig** (index 6). Uses `return 42` which
interns a non-pre-interned value (42:u32). The IP index of this value
depends on the module-level entry count. The Zig compiler (even for
standalone files with `std_mod=null`) always creates std from `lib_dir`,
producing `func_ip=217` (215 base + 2 function body entries). The C sema
creates only ~135 entries. The gap of ~80 entries comes from the @export
trigger chain: `zirExport` → `resolveExportOptions` → `getBuiltinType` →
`analyzeMemoizedState`, which resolves builtin types from
`std.builtin` as side effects. Port this chain to close the gap.
### Background
- Nav entries (declarations) are stored in a **separate** list from IP
Items — they do NOT consume IP indices. But `ptr_nav` entries (pointers
TO Navs) DO consume IP indices.
- The Zig compiler creates IP entries through `createFileRootStruct`
(in `src/Zcu/PerThread.zig`) and `scanNamespace`. These must be
ported to C.
- The IP items count at function analysis time is embedded in the
`.air` binary format as `func_ip`. When a test mismatch occurs, it
is displayed as `[zig_ip_base=N]` in the error output.
### The module-system porting loop
1. **Read `func_ip` from mismatch output.** Run `zig build test-zig0`
with `num_passing` bumped. The mismatch message includes
`[zig_ip_base=N]` — this is the Zig compiler's IP items count at
function analysis time. The gap between `a` and `b` IP refs tells
you how many entries the C sema is missing.
2. **Compare.** Note the mismatch: `a=0x???[ip] b=0x???[ip]`. The gap
from `a` to `b` is the number of missing IP entries.
3. **Port the next batch.** Identify what the Zig compiler creates for
the next ~10 IP entries (struct types, ptr_nav, enum types, etc.).
Port the corresponding logic from `src/Zcu/PerThread.zig` and
`src/Sema.zig` into `stage0/sema.c`. Key functions to port:
- `createFileRootStruct` → creates `type_struct` IP entry for a
module's root.
- `scanNamespace` → iterates declarations, creates `ptr_nav` entries
for each Nav.
- `getStructType` / `getEnumType` → creates type entries in IP.
- `ensureFileAnalyzed` → recursively processes imported modules.
- `zirExport` → forces resolution of exported declarations.
4. **Test.** `zig build test-zig0` — the gap should shrink.
5. **Clean up & commit** (see [Cleaning Up](#cleaning-up)), then
**immediately continue** to step 3. Do NOT stop here. Keep
`num_passing` at whatever value passes; don't bump it until the gap
reaches zero.
6. **Exit condition:** the gap is zero, `num_passing` is incremented,
and `zig build test-zig0` passes. Only then return to
[The loop](#the-loop) step 6.
### Important constraints
- Do NOT hardcode IP entries from a dump. The entries must be computed
from the ZIR, matching the Zig compiler's processing.
- **callconv = .c and wasm32-wasi target hardcoding are OK.** This is a
bootstrap interpreter targeting only that platform. Do not spend time
making these configurable.
- Do NOT hardcode other target-specific values (enum tags, field indices,
type indices, etc.) into sema.c. All such values must be computed by
porting the upstream Zig functions that produce them. Port the function,
not the output.
- Do NOT include generated `.c` files. All logic belongs in `sema.c`.
- Entry ORDER matters. The C sema must create entries in the same
order as the Zig compiler. Follow the Zig compiler's processing
sequence (struct type → scan declarations → process imports →
resolve comptime blocks).
- Deduplication matters. If the function body interns a value that was
already created during module-level analysis, `ipIntern` must return
the existing index (not create a duplicate).
- Port **functions**, not entries. The IP entries are a consequence of
running the upstream functions (`createFileRootStruct`,
`scanNamespace`, `ensureFileAnalyzed`, etc.) correctly. Port those
functions mechanically from `src/Zcu/PerThread.zig` to C; do not try
to understand every individual IP entry before starting to code.
- **NEVER fill the IP gap by pre-resolving types or declarations.**
Do NOT resolve declarations by name, enumerate namespaces, or force
type evaluation just to create IP entries and "fill up" the gap.
This applies to builtin types, std types, module declarations, or
any other source. Every IP entry must be created as a side effect
of honestly porting the upstream function that produces it. If the
Zig compiler creates entries through `zirExport` → `getBuiltinType` →
`analyzeMemoizedState`, then port that chain. If it creates
entries through `scanNamespace` → `analyzeNav`, port that. Do not
invent shortcuts that produce the right count but through the wrong
mechanism.
- Time-box investigation. Spend at most ~10 minutes investigating a
problem before writing code. Use `--verbose-intern-pool` dumps to
verify progress after each batch, not to plan the entire
implementation upfront.
### Anti-pattern: analysis paralysis / proving work unnecessary
When the gap is large (hundreds of entries), the temptation is to spend rounds
analyzing whether the entries are "really needed" or whether a shortcut exists.
**This is always wrong.** The entries are needed — the test comparison is
byte-for-byte with no IP base adjustment.
Signs you are bailing out:
- Asking "does the AIR actually reference non-pre-interned IP indices?"
- Exploring "what if we DON'T create module-level entries?"
- Running verbose-air dumps to prove the function body is simple
- Suggesting the IP count "might not matter"
The correct response to a large gap:
1. Fix the immediate crash/assertion (e.g. increase buffer sizes)
2. Port the next upstream function that creates entries
3. Test, measure gap reduction, commit
4. Repeat
Time-box investigation: spend at most ~10 minutes investigating before
writing code.
## AIR comparison exceptions
@@ -198,33 +129,20 @@ Before committing, ensure the branch stays green:
extraneous output.
If a test that previously passed now fails, that is a regression. Do not commit.
Go back and fix it — never commit with fewer passing tests than before. If
it's not a test failure but a formatting/linting issue, fix it before committing.
Go back and fix it — never commit with fewer passing tests than before.
# General rules
## General rules
- When porting features from upstream Zig, it should be a **mechanical copy**.
Don't invent. Keep the structure in place, name functions and types the same
way (or within reason equivalently if there are namespacing constraints). It
should be easy to reference one from the other; and, if there are semantic
differences, they *must* be because Zig or C does not support certain
features (like errdefer).
- When translating functions from Zig to C (mechanically, remember?), add them
in the same order as in the original Zig file.
- **Never ever** remove zig-cache, neither local nor global.
- **Never** remove zig-cache, neither local nor global.
- Zig code is in ~/code/zig, don't look at /nix/...
- Debug printfs: add printfs only when debugging a specific issue; when done
debugging, remove them (or comment them if you may find them useful later). I
prefer committing code only when `zig build` returns no output.
- Debug printfs: remove when done debugging. `zig build` should produce no
output.
- Always complete all tasks before stopping. Do not stop to ask for
confirmation mid-task. If you have remaining work, continue without waiting
for input.
- No `cppcheck` suppressions. They are here for a reason. If it is complaining
about automatic variables, make it non-automatic. I.e. find a way to satisfy
the linter, do not suppress it.
confirmation mid-task.
- No `cppcheck` suppressions — satisfy the linter, don't suppress it.
- See `stage0/README.md` for testing commands and debugging tips.
# Tools
## Tools
- A debug build of the Zig compiler is available at `zig-out/bin/zig`
(build with `zig build`). It has `--verbose-intern-pool` enabled.