commit f9a061e5e1082e8e632df1d990bf2bf489e13ff4 (tree)
parent f2288a8e4b733bf4df17e9da638ca54e4dbb585b
Author: Motiejus <motiejus@jakstys.lt>
Date: Sun, 1 Mar 2026 17:07:21 +0000
stage0: rewrite CLAUDE.md to lead with porting methodology

Restructure from warnings-first ("NEVER do X") to methodology-first
("port the function, not its output"). The core principle is now front
and center with a concrete 4-step method (find, read, translate, test).
Removes the 118-line "Closing the InternPool gap" section (status moved
to MEMORY.md), the anti-pattern section, and scattered duplications.
Consolidates mechanical copy rules into the methodology. 243→160 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat:
 stage0/CLAUDE.md | 224 +++++++++++++++++++++++++------------------------------------------------------
1 file changed, 71 insertions(+), 153 deletions(-)
diff --git a/stage0/CLAUDE.md b/stage0/CLAUDE.md
@@ -11,11 +11,66 @@ corpus tests pass.
counter. Sema unit tests (stage0/sema_tests/) come first, then lib/ files.
- `stage0/sema_tests/` — focused unit tests for decomposing complex problems.
These are part of the main corpus and go through the full pipeline
- (parser → ZIR → sema).
+ (parser -> ZIR -> sema).
- `stage0/stages_test.zig` — runs all stages for corpus files with
**bidirectional** AIR comparison (every C function must match Zig, AND
every Zig function must exist in C output).
+## How to port
+
+**Port the function, not its output.**
+
+Every feature in `stage0/sema.c` must be a mechanical translation of an
+upstream Zig function. When a test fails, your job is to find which upstream
+function produces the missing behavior and translate it to C. Never reverse-
+engineer the output and recreate it directly.
+
+### The method
+
+1. **Find the upstream function.** Read the test failure. Trace it back to
+ the Zig source (`src/Sema.zig`, `src/Zcu/PerThread.zig`). If the mismatch
+ shows a gap like `a=0xNN b=0xMM`, find which upstream function creates
+ entries a-through-b.
+
+2. **Read it.** Understand the function's control flow. Note what it calls.
+
+3. **Translate mechanically.** Port to C following these rules:
+ - Function names match (prefix `sema` when appropriate).
+ - Control flow matches.
+ - Data structures match (C <-> Zig interop permitting). Struct definitions
+ in `stage0/sema.h` should mirror `src/Sema.zig`.
+ - Add functions in the same order as the original Zig file.
+ - Entry order matters — the C sema must create entries in the same
+ sequence as the Zig compiler.
+ - Deduplication matters — `ipIntern` must return existing indices, not
+ create duplicates.
+
+4. **Test.** `zig build test-zig0` — iterate until the test passes.
+
+### Allowed hardcoding
+
+**callconv = .c and wasm32-wasi target are OK** — this is a bootstrap
+interpreter targeting only that platform. Don't make these configurable.
+
+Everything else must be computed by porting the upstream functions that
+produce the values. Do not hand-code enum tags, field indices, type indices,
+or any values that come from running upstream logic.
+
+### When you see a gap
+
+When a test mismatch shows the C sema has fewer InternPool entries than
+the Zig compiler: **find which upstream function creates those entries and
+port it.** Do not create entries directly, resolve types by name, enumerate
+namespaces, or force type evaluation to fill the gap. Every entry must be
+a side effect of running the correctly-ported upstream function.
+
+Key upstream functions that create IP entries:
+- `createFileRootStruct` -> `type_struct` entry for a module's root
+- `scanNamespace` -> `ptr_nav` entries for declarations
+- `getStructType` / `getEnumType` -> type entries
+- `ensureFileAnalyzed` -> recursive module processing
+- `zirExport` -> forces resolution of exported declarations
+
## The loop
Repeat until all corpus tests in `stage0/corpus.zig` pass:
@@ -23,18 +78,11 @@ Repeat until all corpus tests in `stage0/corpus.zig` pass:
**Do NOT stop between iterations.** Each commit is a checkpoint, not a
stopping point. Continue looping until all tests pass. When unsure how
to proceed, the answer is always: follow what the upstream Zig compiler
-does (`src/Sema.zig`, `src/Zcu/PerThread.zig`). Never stop to ask.
+does. Never stop to ask.
1. **Bump.** Increment `num_passing` by 1 in `stage0/corpus.zig`.
2. **Run.** `zig build test-zig0` — observe the failure.
-3. **Port.** Mechanically copy the needed logic from `src/Sema.zig` into
- `stage0/sema.c` / `stage0/sema.h`. Ground rules:
- - Function names should match (except for `sema` prefix when appropriate).
- - Function control flow should match.
- - Data structures should be the same, C <-> Zig interop permitting. I.e.
- struct definitions in `stage0/sema.h` should be, language permitting, the
- same as in `src/Sema.zig`.
- - Add functions in the same order as in the original Zig file.
+3. **Port.** Follow [How to port](#how-to-port) above.
4. **Test.** `zig build test-zig0` — iterate until the new test passes.
5. **Clean up & commit.** See [Cleaning Up](#cleaning-up).
6. **Go to 1.** Do NOT stop between iterations — keep looping until
@@ -43,8 +91,7 @@ does (`src/Sema.zig`, `src/Zcu/PerThread.zig`). Never stop to ask.
## When stuck
If a single corpus test requires too many new functions, causes unclear
-failures, or needs multiple unrelated features at once — do NOT give up or
-take shortcuts. Instead, decompose:
+failures, or needs multiple unrelated features at once — decompose:
1. Create a focused test file in `stage0/sema_tests/` that isolates one piece
of the problem.
@@ -53,124 +100,8 @@ take shortcuts. Instead, decompose:
4. Repeat until enough pieces are in place for the corpus test to pass.
5. Return to [The loop](#the-loop).
-## Closing the InternPool gap
-
-### Current status (2026-03-01)
-
-**num_passing = 5.** The first 5 sema unit tests pass (empty.zig,
-const_decl.zig, empty_void_function.zig, type_identity_fn.zig,
-reify_int.zig). reify_int.zig was fixed by handling `decl_val` in
-2-instruction parameter type bodies.
-
-**Next blocker: return_integer.zig** (index 6). Uses `return 42` which
-interns a non-pre-interned value (42:u32). The IP index of this value
-depends on the module-level entry count. The Zig compiler (even for
-standalone files with `std_mod=null`) always creates std from `lib_dir`,
-producing `func_ip=217` (215 base + 2 function body entries). The C sema
-creates only ~135 entries. The gap of ~80 entries comes from the @export
-trigger chain: `zirExport` → `resolveExportOptions` → `getBuiltinType`
-→ `analyzeMemoizedState`, which resolves builtin types from
-`std.builtin` as side effects. Port this chain to close the gap.
-
-### Background
-
-- Nav entries (declarations) are stored in a **separate** list from IP
- Items — they do NOT consume IP indices. But `ptr_nav` entries (pointers
- TO Navs) DO consume IP indices.
-- The Zig compiler creates IP entries through `createFileRootStruct`
- (in `src/Zcu/PerThread.zig`) and `scanNamespace`. These must be
- ported to C.
-- The IP items count at function analysis time is embedded in the
- `.air` binary format as `func_ip`. When a test mismatch occurs, it
- is displayed as `[zig_ip_base=N]` in the error output.
-
-### The module-system porting loop
-
-1. **Read `func_ip` from mismatch output.** Run `zig build test-zig0`
- with `num_passing` bumped. The mismatch message includes
- `[zig_ip_base=N]` — this is the Zig compiler's IP items count at
- function analysis time. The gap between `a` and `b` IP refs tells
- you how many entries the C sema is missing.
-2. **Compare.** Note the mismatch: `a=0x???[ip] b=0x???[ip]`. The gap
- `a − b` is the number of missing IP entries.
-3. **Port the next batch.** Identify what the Zig compiler creates for
- the next ~10 IP entries (struct types, ptr_nav, enum types, etc.).
- Port the corresponding logic from `src/Zcu/PerThread.zig` and
- `src/Sema.zig` into `stage0/sema.c`. Key functions to port:
- - `createFileRootStruct` → creates `type_struct` IP entry for a
- module's root.
- - `scanNamespace` → iterates declarations, creates `ptr_nav` entries
- for each Nav.
- - `getStructType` / `getEnumType` → creates type entries in IP.
- - `ensureFileAnalyzed` → recursively processes imported modules.
- - `zirExport` → forces resolution of exported declarations.
-4. **Test.** `zig build test-zig0` — the gap should shrink.
-5. **Clean up & commit** (see [Cleaning Up](#cleaning-up)), then
- **immediately continue** to step 3. Do NOT stop here. Keep
- `num_passing` at whatever value passes; don't bump it until the gap
- reaches zero.
-6. **Exit condition:** the gap is zero, `num_passing` is incremented,
- and `zig build test-zig0` passes. Only then return to
- [The loop](#the-loop) step 6.
-
-### Important constraints
-
-- Do NOT hardcode IP entries from a dump. The entries must be computed
- from the ZIR, matching the Zig compiler's processing.
-- **callconv = .c and wasm32-wasi target hardcoding are OK.** This is a
- bootstrap interpreter targeting only that platform. Do not spend time
- making these configurable.
-- Do NOT hardcode other target-specific values (enum tags, field indices,
- type indices, etc.) into sema.c. All such values must be computed by
- porting the upstream Zig functions that produce them. Port the function,
- not the output.
-- Do NOT include generated `.c` files. All logic belongs in `sema.c`.
-- Entry ORDER matters. The C sema must create entries in the same
- order as the Zig compiler. Follow the Zig compiler's processing
- sequence (struct type → scan declarations → process imports →
- resolve comptime blocks).
-- Deduplication matters. If the function body interns a value that was
- already created during module-level analysis, `ipIntern` must return
- the existing index (not create a duplicate).
-- Port **functions**, not entries. The IP entries are a consequence of
- running the upstream functions (`createFileRootStruct`,
- `scanNamespace`, `ensureFileAnalyzed`, etc.) correctly. Port those
- functions mechanically from `src/Zcu/PerThread.zig` to C; do not try
- to understand every individual IP entry before starting to code.
-- **NEVER fill the IP gap by pre-resolving types or declarations.**
- Do NOT resolve declarations by name, enumerate namespaces, or force
- type evaluation just to create IP entries and "fill up" the gap.
- This applies to builtin types, std types, module declarations, or
- any other source. Every IP entry must be created as a side effect
- of honestly porting the upstream function that produces it. If the
- Zig compiler creates entries through `zirExport` → `getBuiltinType`
- → `analyzeMemoizedState`, then port that chain. If it creates
- entries through `scanNamespace` → `analyzeNav`, port that. Do not
- invent shortcuts that produce the right count but through the wrong
- mechanism.
-- Time-box investigation. Spend at most ~10 minutes investigating a
- problem before writing code. Use `--verbose-intern-pool` dumps to
- verify progress after each batch, not to plan the entire
- implementation upfront.
-
-### Anti-pattern: analysis paralysis / proving work unnecessary
-
-When the gap is large (hundreds of entries), the temptation is to spend rounds
-analyzing whether the entries are "really needed" or whether a shortcut exists.
-**This is always wrong.** The entries are needed — the test comparison is
-byte-for-byte with no IP base adjustment.
-
-Signs you are bailing out:
-- Asking "does the AIR actually reference non-pre-interned IP indices?"
-- Exploring "what if we DON'T create module-level entries?"
-- Running verbose-air dumps to prove the function body is simple
-- Suggesting the IP count "might not matter"
-
-The correct response to a large gap:
-1. Fix the immediate crash/assertion (e.g. increase buffer sizes)
-2. Port the next upstream function that creates entries
-3. Test, measure gap reduction, commit
-4. Repeat
+Time-box investigation: spend at most ~10 minutes investigating before
+writing code.
## AIR comparison exceptions
@@ -198,33 +129,20 @@ Before committing, ensure the branch stays green:
extraneous output.
If a test that previously passed now fails, that is a regression. Do not commit.
-Go back and fix it — never commit with fewer passing tests than before. If
-it's not a test failure but a formatting/linting issue, fix it before committing.
-
-# General rules
-
-- When porting features from upstream Zig, it should be a **mechanical copy**.
- Don't invent. Keep the structure in place, name functions and types the same
- way (or within reason equivalently if there are namespacing constraints). It
- should be easy to reference one from the other; and, if there are semantic
- differences, they *must* be because Zig or C does not support certain
- features (like errdefer).
-- When translating functions from Zig to C (mechanically, remember?), add them
- in the same order as in the original Zig file.
-- **Never ever** remove zig-cache, neither local nor global.
+Go back and fix it — never commit with fewer passing tests than before.
+
+## General rules
+
+- **Never** remove zig-cache, neither local nor global.
- Zig code is in ~/code/zig, don't look at /nix/...
-- Debug printfs: add printfs only when debugging a specific issue; when done
- debugging, remove them (or comment them if you may find them useful later). I
- prefer committing code only when `zig build` returns no output.
+- Debug printfs: remove when done debugging. `zig build` should produce no
+ output.
- Always complete all tasks before stopping. Do not stop to ask for
- confirmation mid-task. If you have remaining work, continue without waiting
- for input.
-- No `cppcheck` suppressions. They are here for a reason. If it is complaining
- about automatic variables, make it non-automatic. I.e. find a way to satisfy
- the linter, do not suppress it.
+ confirmation mid-task.
+- No `cppcheck` suppressions — satisfy the linter, don't suppress it.
- See `stage0/README.md` for testing commands and debugging tips.
-# Tools
+## Tools
- A debug build of the Zig compiler is available at `zig-out/bin/zig`
(build with `zig build`). It has `--verbose-intern-pool` enabled.
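
Note on the "Entry order matters" and "Deduplication matters" rules in the new "How to port" section: the sketch below may help illustrate why both rules affect the indices that end up in the AIR comparison. It is a minimal, hypothetical model, not the real `ipIntern` from `stage0/sema.c` or the upstream InternPool. The names `SketchPool`, `SketchKey`, and `sketch_intern`, the fixed capacity, and the linear scan are all assumptions made for brevity.

```c
/* Hypothetical sketch: a tiny intern pool that deduplicates by key.
 * SketchPool, SketchKey and sketch_intern are illustrative names only;
 * they are NOT the stage0/sema.c or upstream InternPool API. */
#include <assert.h>
#include <stdint.h>

#define SKETCH_POOL_CAP 64u

typedef struct {
    uint32_t tag;   /* kind of entry, e.g. a type or a value */
    uint64_t data;  /* payload that identifies the entry */
} SketchKey;

typedef struct {
    SketchKey items[SKETCH_POOL_CAP];
    uint32_t len;   /* number of entries created so far */
} SketchPool;

/* Return the index of `key`, creating a new entry only when no equal
 * entry exists yet. New entries receive the next sequential index, so
 * the order of calls determines every index handed out. */
static uint32_t sketch_intern(SketchPool *pool, SketchKey key) {
    for (uint32_t i = 0; i < pool->len; i++) {
        if (pool->items[i].tag == key.tag && pool->items[i].data == key.data)
            return i; /* already interned: reuse the existing index */
    }
    assert(pool->len < SKETCH_POOL_CAP); /* sketch has a fixed capacity */
    pool->items[pool->len] = key;
    return pool->len++;
}
```

In this model, interning keys A, B, and then A again yields indices 0, 1, 0 and leaves two entries in the pool. Interning B before A would swap the two indices, and skipping the lookup would create a third entry; either deviation would shift every later index, which is presumably why the section asks for the same creation sequence as the Zig compiler.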