stage0: rewrite CLAUDE.md to lead with porting methodology
Restructure from warnings-first ("NEVER do X") to methodology-first
("port the function, not its output"). The core principle is now front
and center with a concrete 4-step method (find, read, translate, test).
Removes the 118-line "Closing the InternPool gap" section (status moved
to MEMORY.md), the anti-pattern section, and scattered duplications.
Consolidates mechanical copy rules into the methodology. 243→160 lines.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
220
stage0/CLAUDE.md
@@ -11,11 +11,66 @@ corpus tests pass.
 counter. Sema unit tests (stage0/sema_tests/) come first, then lib/ files.
 - `stage0/sema_tests/` — focused unit tests for decomposing complex problems.
 These are part of the main corpus and go through the full pipeline
-(parser → ZIR → sema).
+(parser -> ZIR -> sema).
 - `stage0/stages_test.zig` — runs all stages for corpus files with
 **bidirectional** AIR comparison (every C function must match Zig, AND
 every Zig function must exist in C output).
+
+## How to port
+
+**Port the function, not its output.**
+
+Every feature in `stage0/sema.c` must be a mechanical translation of an
+upstream Zig function. When a test fails, your job is to find which upstream
+function produces the missing behavior and translate it to C. Never reverse-
+engineer the output and recreate it directly.
+
+### The method
+
+1. **Find the upstream function.** Read the test failure. Trace it back to
+the Zig source (`src/Sema.zig`, `src/Zcu/PerThread.zig`). If the mismatch
+shows a gap like `a=0xNN b=0xMM`, find which upstream function creates
+entries a-through-b.
+
+2. **Read it.** Understand the function's control flow. Note what it calls.
+
+3. **Translate mechanically.** Port to C following these rules:
+- Function names match (prefix `sema` when appropriate).
+- Control flow matches.
+- Data structures match (C <-> Zig interop permitting). Struct definitions
+in `stage0/sema.h` should mirror `src/Sema.zig`.
+- Add functions in the same order as the original Zig file.
+- Entry order matters — the C sema must create entries in the same
+sequence as the Zig compiler.
+- Deduplication matters — `ipIntern` must return existing indices, not
+create duplicates.
+
+4. **Test.** `zig build test-zig0` — iterate until the test passes.
+
+### Allowed hardcoding
+
+**callconv = .c and wasm32-wasi target are OK** — this is a bootstrap
+interpreter targeting only that platform. Don't make these configurable.
+
+Everything else must be computed by porting the upstream functions that
+produce the values. Do not hand-code enum tags, field indices, type indices,
+or any values that come from running upstream logic.
+
+### When you see a gap
+
+When a test mismatch shows the C sema has fewer InternPool entries than
+the Zig compiler: **find which upstream function creates those entries and
+port it.** Do not create entries directly, resolve types by name, enumerate
+namespaces, or force type evaluation to fill the gap. Every entry must be
+a side effect of running the correctly-ported upstream function.
+
+Key upstream functions that create IP entries:
+- `createFileRootStruct` -> `type_struct` entry for a module's root
+- `scanNamespace` -> `ptr_nav` entries for declarations
+- `getStructType` / `getEnumType` -> type entries
+- `ensureFileAnalyzed` -> recursive module processing
+- `zirExport` -> forces resolution of exported declarations
 
 ## The loop
 
 Repeat until all corpus tests in `stage0/corpus.zig` pass:
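The entry-order and deduplication rules added in the hunk above can be illustrated with a toy pool. This is a minimal sketch under stated assumptions, not the `ipIntern` from `stage0/sema.c`: `ToyPool`, `toyIntern`, the fixed capacity, and the plain `uint64_t` key are all hypothetical stand-ins for the much richer InternPool key type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy pool; the real InternPool keys are not plain integers. */
#define TOY_POOL_CAP 64
#define TOY_POOL_NONE UINT32_MAX

typedef struct {
    uint64_t keys[TOY_POOL_CAP]; /* entries stored in creation order */
    uint32_t len;                /* number of entries created so far */
} ToyPool;

/* Return the index of `key`, creating a new entry only if none exists. */
static uint32_t toyIntern(ToyPool *pool, uint64_t key) {
    assert(pool != NULL);
    for (uint32_t i = 0; i < pool->len; i++) {
        if (pool->keys[i] == key)
            return i; /* deduplicate: reuse the existing index */
    }
    if (pool->len == TOY_POOL_CAP)
        return TOY_POOL_NONE; /* pool full; a real port would grow or assert */
    pool->keys[pool->len] = key;
    return pool->len++; /* index reflects creation order */
}
```

Because indices are handed out in creation order, interning the same keys in a different sequence gives different indices, and a duplicate entry shifts every later index. That is why both rules matter for an index-for-index comparison.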
@@ -23,18 +78,11 @@ Repeat until all corpus tests in `stage0/corpus.zig` pass:
 **Do NOT stop between iterations.** Each commit is a checkpoint, not a
 stopping point. Continue looping until all tests pass. When unsure how
 to proceed, the answer is always: follow what the upstream Zig compiler
-does (`src/Sema.zig`, `src/Zcu/PerThread.zig`). Never stop to ask.
+does. Never stop to ask.
 
 1. **Bump.** Increment `num_passing` by 1 in `stage0/corpus.zig`.
 2. **Run.** `zig build test-zig0` — observe the failure.
-3. **Port.** Mechanically copy the needed logic from `src/Sema.zig` into
-`stage0/sema.c` / `stage0/sema.h`. Ground rules:
-- Function names should match (except for `sema` prefix when appropriate).
-- Function control flow should match.
-- Data structures should be the same, C <-> Zig interop permitting. I.e.
-struct definitions in `stage0/sema.h` should be, language permitting, the
-same as in `src/Sema.zig`.
-- Add functions in the same order as in the original Zig file.
+3. **Port.** Follow [How to port](#how-to-port) above.
 4. **Test.** `zig build test-zig0` — iterate until the new test passes.
 5. **Clean up & commit.** See [Cleaning Up](#cleaning-up).
 6. **Go to 1.** Do NOT stop between iterations — keep looping until
@@ -43,8 +91,7 @@ does (`src/Sema.zig`, `src/Zcu/PerThread.zig`). Never stop to ask.
 ## When stuck
 
 If a single corpus test requires too many new functions, causes unclear
-failures, or needs multiple unrelated features at once — do NOT give up or
-take shortcuts. Instead, decompose:
+failures, or needs multiple unrelated features at once — decompose:
 
 1. Create a focused test file in `stage0/sema_tests/` that isolates one piece
 of the problem.
@@ -53,124 +100,8 @@ take shortcuts. Instead, decompose:
 4. Repeat until enough pieces are in place for the corpus test to pass.
 5. Return to [The loop](#the-loop).
 
-## Closing the InternPool gap
-
-### Current status (2026-03-01)
-
-**num_passing = 5.** The first 5 sema unit tests pass (empty.zig,
-const_decl.zig, empty_void_function.zig, type_identity_fn.zig,
-reify_int.zig). reify_int.zig was fixed by handling `decl_val` in
-2-instruction parameter type bodies.
-
-**Next blocker: return_integer.zig** (index 6). Uses `return 42` which
-interns a non-pre-interned value (42:u32). The IP index of this value
-depends on the module-level entry count. The Zig compiler (even for
-standalone files with `std_mod=null`) always creates std from `lib_dir`,
-producing `func_ip=217` (215 base + 2 function body entries). The C sema
-creates only ~135 entries. The gap of ~80 entries comes from the @export
-trigger chain: `zirExport` → `resolveExportOptions` → `getBuiltinType`
-→ `analyzeMemoizedState`, which resolves builtin types from
-`std.builtin` as side effects. Port this chain to close the gap.
-
-### Background
-
-- Nav entries (declarations) are stored in a **separate** list from IP
-Items — they do NOT consume IP indices. But `ptr_nav` entries (pointers
-TO Navs) DO consume IP indices.
-- The Zig compiler creates IP entries through `createFileRootStruct`
-(in `src/Zcu/PerThread.zig`) and `scanNamespace`. These must be
-ported to C.
-- The IP items count at function analysis time is embedded in the
-`.air` binary format as `func_ip`. When a test mismatch occurs, it
-is displayed as `[zig_ip_base=N]` in the error output.
-
-### The module-system porting loop
-
-1. **Read `func_ip` from mismatch output.** Run `zig build test-zig0`
-with `num_passing` bumped. The mismatch message includes
-`[zig_ip_base=N]` — this is the Zig compiler's IP items count at
-function analysis time. The gap between `a` and `b` IP refs tells
-you how many entries the C sema is missing.
-2. **Compare.** Note the mismatch: `a=0x???[ip] b=0x???[ip]`. The gap
-`a − b` is the number of missing IP entries.
-3. **Port the next batch.** Identify what the Zig compiler creates for
-the next ~10 IP entries (struct types, ptr_nav, enum types, etc.).
-Port the corresponding logic from `src/Zcu/PerThread.zig` and
-`src/Sema.zig` into `stage0/sema.c`. Key functions to port:
-- `createFileRootStruct` → creates `type_struct` IP entry for a
-module's root.
-- `scanNamespace` → iterates declarations, creates `ptr_nav` entries
-for each Nav.
-- `getStructType` / `getEnumType` → creates type entries in IP.
-- `ensureFileAnalyzed` → recursively processes imported modules.
-- `zirExport` → forces resolution of exported declarations.
-4. **Test.** `zig build test-zig0` — the gap should shrink.
-5. **Clean up & commit** (see [Cleaning Up](#cleaning-up)), then
-**immediately continue** to step 3. Do NOT stop here. Keep
-`num_passing` at whatever value passes; don't bump it until the gap
-reaches zero.
-6. **Exit condition:** the gap is zero, `num_passing` is incremented,
-and `zig build test-zig0` passes. Only then return to
-[The loop](#the-loop) step 6.
-
-### Important constraints
-
-- Do NOT hardcode IP entries from a dump. The entries must be computed
-from the ZIR, matching the Zig compiler's processing.
-- **callconv = .c and wasm32-wasi target hardcoding are OK.** This is a
-bootstrap interpreter targeting only that platform. Do not spend time
-making these configurable.
-- Do NOT hardcode other target-specific values (enum tags, field indices,
-type indices, etc.) into sema.c. All such values must be computed by
-porting the upstream Zig functions that produce them. Port the function,
-not the output.
-- Do NOT include generated `.c` files. All logic belongs in `sema.c`.
-- Entry ORDER matters. The C sema must create entries in the same
-order as the Zig compiler. Follow the Zig compiler's processing
-sequence (struct type → scan declarations → process imports →
-resolve comptime blocks).
-- Deduplication matters. If the function body interns a value that was
-already created during module-level analysis, `ipIntern` must return
-the existing index (not create a duplicate).
-- Port **functions**, not entries. The IP entries are a consequence of
-running the upstream functions (`createFileRootStruct`,
-`scanNamespace`, `ensureFileAnalyzed`, etc.) correctly. Port those
-functions mechanically from `src/Zcu/PerThread.zig` to C; do not try
-to understand every individual IP entry before starting to code.
-- **NEVER fill the IP gap by pre-resolving types or declarations.**
-Do NOT resolve declarations by name, enumerate namespaces, or force
-type evaluation just to create IP entries and "fill up" the gap.
-This applies to builtin types, std types, module declarations, or
-any other source. Every IP entry must be created as a side effect
-of honestly porting the upstream function that produces it. If the
-Zig compiler creates entries through `zirExport` → `getBuiltinType`
-→ `analyzeMemoizedState`, then port that chain. If it creates
-entries through `scanNamespace` → `analyzeNav`, port that. Do not
-invent shortcuts that produce the right count but through the wrong
-mechanism.
-- Time-box investigation. Spend at most ~10 minutes investigating a
-problem before writing code. Use `--verbose-intern-pool` dumps to
-verify progress after each batch, not to plan the entire
-implementation upfront.
-
-### Anti-pattern: analysis paralysis / proving work unnecessary
-
-When the gap is large (hundreds of entries), the temptation is to spend rounds
-analyzing whether the entries are "really needed" or whether a shortcut exists.
-**This is always wrong.** The entries are needed — the test comparison is
-byte-for-byte with no IP base adjustment.
-
-Signs you are bailing out:
-- Asking "does the AIR actually reference non-pre-interned IP indices?"
-- Exploring "what if we DON'T create module-level entries?"
-- Running verbose-air dumps to prove the function body is simple
-- Suggesting the IP count "might not matter"
-
-The correct response to a large gap:
-1. Fix the immediate crash/assertion (e.g. increase buffer sizes)
-2. Port the next upstream function that creates entries
-3. Test, measure gap reduction, commit
-4. Repeat
+Time-box investigation: spend at most ~10 minutes investigating before
+writing code.
 
 ## AIR comparison exceptions
 
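The removed porting-loop text defines the gap as `a − b`, read from a mismatch fragment of the form `a=0x???[ip] b=0x???[ip]`. A minimal sketch of that arithmetic, assuming the fragment appears in exactly that layout; `parseIpGap` is a hypothetical helper for illustration and is not part of the test harness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parse a fragment of the assumed form "a=0x<hex>[ip] b=0x<hex>[ip]" and
 * store a - b in *gap. Returns 1 on success, 0 if the text does not match. */
static int parseIpGap(const char *msg, long *gap) {
    unsigned int a = 0, b = 0;
    assert(msg != NULL && gap != NULL);
    /* The literal "0x" is consumed by the format; %x then reads the digits. */
    if (sscanf(msg, "a=0x%x[ip] b=0x%x[ip]", &a, &b) != 2)
        return 0;
    *gap = (long)a - (long)b;
    return 1;
}
```

For an illustrative fragment `a=0xd9[ip] b=0x87[ip]`, the helper reports a gap of 82 (217 minus 135).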
@@ -198,33 +129,20 @@ Before committing, ensure the branch stays green:
 extraneous output.
 
 If a test that previously passed now fails, that is a regression. Do not commit.
-Go back and fix it — never commit with fewer passing tests than before. If
-it's not a test failure but a formatting/linting issue, fix it before committing.
+Go back and fix it — never commit with fewer passing tests than before.
 
-# General rules
+## General rules
 
-- When porting features from upstream Zig, it should be a **mechanical copy**.
-Don't invent. Keep the structure in place, name functions and types the same
-way (or within reason equivalently if there are namespacing constraints). It
-should be easy to reference one from the other; and, if there are semantic
-differences, they *must* be because Zig or C does not support certain
-features (like errdefer).
-- When translating functions from Zig to C (mechanically, remember?), add them
-in the same order as in the original Zig file.
-- **Never ever** remove zig-cache, neither local nor global.
+- **Never** remove zig-cache, neither local nor global.
 - Zig code is in ~/code/zig, don't look at /nix/...
-- Debug printfs: add printfs only when debugging a specific issue; when done
-debugging, remove them (or comment them if you may find them useful later). I
-prefer committing code only when `zig build` returns no output.
+- Debug printfs: remove when done debugging. `zig build` should produce no
+output.
 - Always complete all tasks before stopping. Do not stop to ask for
-confirmation mid-task. If you have remaining work, continue without waiting
-for input.
-- No `cppcheck` suppressions. They are here for a reason. If it is complaining
-about automatic variables, make it non-automatic. I.e. find a way to satisfy
-the linter, do not suppress it.
+confirmation mid-task.
+- No `cppcheck` suppressions — satisfy the linter, don't suppress it.
 - See `stage0/README.md` for testing commands and debugging tips.
 
-# Tools
+## Tools
 
 - A debug build of the Zig compiler is available at `zig-out/bin/zig`
 (build with `zig build`). It has `--verbose-intern-pool` enabled.