stage0: rewrite CLAUDE.md to lead with porting methodology

Restructure from warnings-first ("NEVER do X") to methodology-first ("port the function, not its output"). The core principle is now front and center with a concrete 4-step method (find, read, translate, test).

Removes the 118-line "Closing the InternPool gap" section (status moved to MEMORY.md), the anti-pattern section, and scattered duplications. Consolidates mechanical copy rules into the methodology. 243→160 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-01 17:07:21 +00:00
parent f2288a8e4b
commit f9a061e5e1
1 changed files with 69 additions and 151 deletions
--- a/stage0/CLAUDE.md
+++ b/stage0/CLAUDE.md
@@ -11,11 +11,66 @@ corpus tests pass.
  counter. Sema unit tests (stage0/sema_tests/) come first, then lib/ files.
 - `stage0/sema_tests/` — focused unit tests for decomposing complex problems.
  These are part of the main corpus and go through the full pipeline
-  (parser → ZIR → sema).
+  (parser -> ZIR -> sema).
 - `stage0/stages_test.zig` — runs all stages for corpus files with
  **bidirectional** AIR comparison (every C function must match Zig, AND
  every Zig function must exist in C output).

+## How to port
+
+**Port the function, not its output.**
+
+Every feature in `stage0/sema.c` must be a mechanical translation of an
+upstream Zig function. When a test fails, your job is to find which upstream
+function produces the missing behavior and translate it to C. Never reverse-
+engineer the output and recreate it directly.
+
+### The method
+
+1. **Find the upstream function.** Read the test failure. Trace it back to
+   the Zig source (`src/Sema.zig`, `src/Zcu/PerThread.zig`). If the mismatch
+   shows a gap like `a=0xNN b=0xMM`, find which upstream function creates
+   entries a-through-b.
+
+2. **Read it.** Understand the function's control flow. Note what it calls.
+
+3. **Translate mechanically.** Port to C following these rules:
+   - Function names match (prefix `sema` when appropriate).
+   - Control flow matches.
+   - Data structures match (C <-> Zig interop permitting). Struct definitions
+     in `stage0/sema.h` should mirror `src/Sema.zig`.
+   - Add functions in the same order as the original Zig file.
+   - Entry order matters — the C sema must create entries in the same
+     sequence as the Zig compiler.
+   - Deduplication matters — `ipIntern` must return existing indices, not
+     create duplicates.
+
+4. **Test.** `zig build test-zig0` — iterate until the test passes.
+
+### Allowed hardcoding
+
+**callconv = .c and wasm32-wasi target are OK** — this is a bootstrap
+interpreter targeting only that platform. Don't make these configurable.
+
+Everything else must be computed by porting the upstream functions that
+produce the values. Do not hand-code enum tags, field indices, type indices,
+or any values that come from running upstream logic.
+
+### When you see a gap
+
+When a test mismatch shows the C sema has fewer InternPool entries than
+the Zig compiler: **find which upstream function creates those entries and
+port it.** Do not create entries directly, resolve types by name, enumerate
+namespaces, or force type evaluation to fill the gap. Every entry must be
+a side effect of running the correctly-ported upstream function.
+
+Key upstream functions that create IP entries:
+- `createFileRootStruct` -> `type_struct` entry for a module's root
+- `scanNamespace` -> `ptr_nav` entries for declarations
+- `getStructType` / `getEnumType` -> type entries
+- `ensureFileAnalyzed` -> recursive module processing
+- `zirExport` -> forces resolution of exported declarations
+
 ## The loop

 Repeat until all corpus tests in `stage0/corpus.zig` pass:
@@ -23,18 +78,11 @@ Repeat until all corpus tests in `stage0/corpus.zig` pass:
 **Do NOT stop between iterations.** Each commit is a checkpoint, not a
 stopping point. Continue looping until all tests pass. When unsure how
 to proceed, the answer is always: follow what the upstream Zig compiler
-does (`src/Sema.zig`, `src/Zcu/PerThread.zig`). Never stop to ask.
+does. Never stop to ask.

 1. **Bump.** Increment `num_passing` by 1 in `stage0/corpus.zig`.
 2. **Run.** `zig build test-zig0` — observe the failure.
-3. **Port.** Mechanically copy the needed logic from `src/Sema.zig` into
-   `stage0/sema.c` / `stage0/sema.h`. Ground rules:
-    - Function names should match (except for `sema` prefix when appropriate).
-    - Function control flow should match.
-    - Data structures should be the same, C <-> Zig interop permitting. I.e.
-      struct definitions in `stage0/sema.h` should be, language permitting, the
-      same as in `src/Sema.zig`.
-    - Add functions in the same order as in the original Zig file.
+3. **Port.** Follow [How to port](#how-to-port) above.
 4. **Test.** `zig build test-zig0` — iterate until the new test passes.
 5. **Clean up & commit.** See [Cleaning Up](#cleaning-up).
 6. **Go to 1.** Do NOT stop between iterations — keep looping until
@@ -43,8 +91,7 @@ does (`src/Sema.zig`, `src/Zcu/PerThread.zig`). Never stop to ask.
 ## When stuck

 If a single corpus test requires too many new functions, causes unclear
-failures, or needs multiple unrelated features at once — do NOT give up or
-take shortcuts. Instead, decompose:
+failures, or needs multiple unrelated features at once — decompose:

 1. Create a focused test file in `stage0/sema_tests/` that isolates one piece
   of the problem.
@@ -53,124 +100,8 @@ take shortcuts. Instead, decompose:
 4. Repeat until enough pieces are in place for the corpus test to pass.
 5. Return to [The loop](#the-loop).

-## Closing the InternPool gap
-
-### Current status (2026-03-01)
-
-**num_passing = 5.** The first 5 sema unit tests pass (empty.zig,
-const_decl.zig, empty_void_function.zig, type_identity_fn.zig,
-reify_int.zig). reify_int.zig was fixed by handling `decl_val` in
-2-instruction parameter type bodies.
-
-**Next blocker: return_integer.zig** (index 6). Uses `return 42` which
-interns a non-pre-interned value (42:u32). The IP index of this value
-depends on the module-level entry count. The Zig compiler (even for
-standalone files with `std_mod=null`) always creates std from `lib_dir`,
-producing `func_ip=217` (215 base + 2 function body entries). The C sema
-creates only ~135 entries. The gap of ~80 entries comes from the @export
-trigger chain: `zirExport` → `resolveExportOptions` → `getBuiltinType`
-→ `analyzeMemoizedState`, which resolves builtin types from
-`std.builtin` as side effects. Port this chain to close the gap.
-
-### Background
-
-- Nav entries (declarations) are stored in a **separate** list from IP
-  Items — they do NOT consume IP indices. But `ptr_nav` entries (pointers
-  TO Navs) DO consume IP indices.
-- The Zig compiler creates IP entries through `createFileRootStruct`
-  (in `src/Zcu/PerThread.zig`) and `scanNamespace`. These must be
-  ported to C.
-- The IP items count at function analysis time is embedded in the
-  `.air` binary format as `func_ip`. When a test mismatch occurs, it
-  is displayed as `[zig_ip_base=N]` in the error output.
-
-### The module-system porting loop
-
-1. **Read `func_ip` from mismatch output.** Run `zig build test-zig0`
-   with `num_passing` bumped. The mismatch message includes
-   `[zig_ip_base=N]` — this is the Zig compiler's IP items count at
-   function analysis time. The gap between `a` and `b` IP refs tells
-   you how many entries the C sema is missing.
-2. **Compare.** Note the mismatch: `a=0x???[ip] b=0x???[ip]`. The gap
-   `a − b` is the number of missing IP entries.
-3. **Port the next batch.** Identify what the Zig compiler creates for
-   the next ~10 IP entries (struct types, ptr_nav, enum types, etc.).
-   Port the corresponding logic from `src/Zcu/PerThread.zig` and
-   `src/Sema.zig` into `stage0/sema.c`.  Key functions to port:
-   - `createFileRootStruct` → creates `type_struct` IP entry for a
-     module's root.
-   - `scanNamespace` → iterates declarations, creates `ptr_nav` entries
-     for each Nav.
-   - `getStructType` / `getEnumType` → creates type entries in IP.
-   - `ensureFileAnalyzed` → recursively processes imported modules.
-   - `zirExport` → forces resolution of exported declarations.
-4. **Test.** `zig build test-zig0` — the gap should shrink.
-5. **Clean up & commit** (see [Cleaning Up](#cleaning-up)), then
-   **immediately continue** to step 3. Do NOT stop here. Keep
-   `num_passing` at whatever value passes; don't bump it until the gap
-   reaches zero.
-6. **Exit condition:** the gap is zero, `num_passing` is incremented,
-   and `zig build test-zig0` passes. Only then return to
-   [The loop](#the-loop) step 6.
-
-### Important constraints
-
-- Do NOT hardcode IP entries from a dump. The entries must be computed
-  from the ZIR, matching the Zig compiler's processing.
-- **callconv = .c and wasm32-wasi target hardcoding are OK.** This is a
-  bootstrap interpreter targeting only that platform. Do not spend time
-  making these configurable.
-- Do NOT hardcode other target-specific values (enum tags, field indices,
-  type indices, etc.) into sema.c. All such values must be computed by
-  porting the upstream Zig functions that produce them. Port the function,
-  not the output.
-- Do NOT include generated `.c` files. All logic belongs in `sema.c`.
-- Entry ORDER matters. The C sema must create entries in the same
-  order as the Zig compiler. Follow the Zig compiler's processing
-  sequence (struct type → scan declarations → process imports →
-  resolve comptime blocks).
-- Deduplication matters. If the function body interns a value that was
-  already created during module-level analysis, `ipIntern` must return
-  the existing index (not create a duplicate).
-- Port **functions**, not entries. The IP entries are a consequence of
-  running the upstream functions (`createFileRootStruct`,
-  `scanNamespace`, `ensureFileAnalyzed`, etc.) correctly. Port those
-  functions mechanically from `src/Zcu/PerThread.zig` to C; do not try
-  to understand every individual IP entry before starting to code.
-- **NEVER fill the IP gap by pre-resolving types or declarations.**
-  Do NOT resolve declarations by name, enumerate namespaces, or force
-  type evaluation just to create IP entries and "fill up" the gap.
-  This applies to builtin types, std types, module declarations, or
-  any other source. Every IP entry must be created as a side effect
-  of honestly porting the upstream function that produces it. If the
-  Zig compiler creates entries through `zirExport` → `getBuiltinType`
-  → `analyzeMemoizedState`, then port that chain. If it creates
-  entries through `scanNamespace` → `analyzeNav`, port that. Do not
-  invent shortcuts that produce the right count but through the wrong
-  mechanism.
-- Time-box investigation. Spend at most ~10 minutes investigating a
-  problem before writing code. Use `--verbose-intern-pool` dumps to
-  verify progress after each batch, not to plan the entire
-  implementation upfront.
-
-### Anti-pattern: analysis paralysis / proving work unnecessary
-
-When the gap is large (hundreds of entries), the temptation is to spend rounds
-analyzing whether the entries are "really needed" or whether a shortcut exists.
-**This is always wrong.** The entries are needed — the test comparison is
-byte-for-byte with no IP base adjustment.
-
-Signs you are bailing out:
-- Asking "does the AIR actually reference non-pre-interned IP indices?"
-- Exploring "what if we DON'T create module-level entries?"
-- Running verbose-air dumps to prove the function body is simple
-- Suggesting the IP count "might not matter"
-
-The correct response to a large gap:
-1. Fix the immediate crash/assertion (e.g. increase buffer sizes)
-2. Port the next upstream function that creates entries
-3. Test, measure gap reduction, commit
-4. Repeat
+Time-box investigation: spend at most ~10 minutes investigating before
+writing code.

 ## AIR comparison exceptions

@@ -198,33 +129,20 @@ Before committing, ensure the branch stays green:
   extraneous output.

 If a test that previously passed now fails, that is a regression. Do not commit.
-Go back and fix it — never commit with fewer passing tests than before. If
-it's not a test failure but a formatting/linting issue, fix it before committing.
+Go back and fix it — never commit with fewer passing tests than before.

-# General rules
+## General rules

-- When porting features from upstream Zig, it should be a **mechanical copy**.
-  Don't invent. Keep the structure in place, name functions and types the same
-  way (or within reason equivalently if there are namespacing constraints). It
-  should be easy to reference one from the other; and, if there are semantic
-  differences, they *must* be because Zig or C does not support certain
-  features (like errdefer).
-- When translating functions from Zig to C (mechanically, remember?), add them
-  in the same order as in the original Zig file.
-- **Never ever** remove zig-cache, neither local nor global.
+- **Never** remove zig-cache, neither local nor global.
 - Zig code is in ~/code/zig, don't look at /nix/...
-- Debug printfs: add printfs only when debugging a specific issue; when done
-  debugging, remove them (or comment them if you may find them useful later). I
-  prefer committing code only when `zig build` returns no output.
+- Debug printfs: remove when done debugging. `zig build` should produce no
+  output.
 - Always complete all tasks before stopping. Do not stop to ask for
-  confirmation mid-task. If you have remaining work, continue without waiting
-  for input.
-- No `cppcheck` suppressions. They are here for a reason. If it is complaining
-  about automatic variables, make it non-automatic. I.e. find a way to satisfy
-  the linter, do not suppress it.
+  confirmation mid-task.
+- No `cppcheck` suppressions — satisfy the linter, don't suppress it.
 - See `stage0/README.md` for testing commands and debugging tips.

-# Tools
+## Tools

 - A debug build of the Zig compiler is available at `zig-out/bin/zig`
  (build with `zig build`). It has `--verbose-intern-pool` enabled.