diff --git a/stage0/CLAUDE.md b/stage0/CLAUDE.md
index c059ef9eb2..d2d9060f7a 100644
--- a/stage0/CLAUDE.md
+++ b/stage0/CLAUDE.md
@@ -11,11 +11,66 @@ corpus tests pass.
   counter. Sema unit tests (stage0/sema_tests/) come first, then lib/ files.
 - `stage0/sema_tests/` — focused unit tests for decomposing complex problems.
   These are part of the main corpus and go through the full pipeline
-  (parser → ZIR → sema).
+  (parser -> ZIR -> sema).
 - `stage0/stages_test.zig` — runs all stages for corpus files with
   **bidirectional** AIR comparison (every C function must match Zig, AND every
   Zig function must exist in C output).
 
+## How to port
+
+**Port the function, not its output.**
+
+Every feature in `stage0/sema.c` must be a mechanical translation of an
+upstream Zig function. When a test fails, your job is to find which upstream
+function produces the missing behavior and translate it to C. Never reverse-
+engineer the output and recreate it directly.
+
+### The method
+
+1. **Find the upstream function.** Read the test failure. Trace it back to
+   the Zig source (`src/Sema.zig`, `src/Zcu/PerThread.zig`). If the mismatch
+   shows a gap like `a=0xNN b=0xMM`, find which upstream function creates
+   entries a-through-b.
+
+2. **Read it.** Understand the function's control flow. Note what it calls.
+
+3. **Translate mechanically.** Port to C following these rules:
+   - Function names match (prefix `sema` when appropriate).
+   - Control flow matches.
+   - Data structures match (C <-> Zig interop permitting). Struct definitions
+     in `stage0/sema.h` should mirror `src/Sema.zig`.
+   - Add functions in the same order as the original Zig file.
+   - Entry order matters — the C sema must create entries in the same
+     sequence as the Zig compiler.
+   - Deduplication matters — `ipIntern` must return existing indices, not
+     create duplicates.
+
+4. **Test.** `zig build test-zig0` — iterate until the test passes.
+
+### Allowed hardcoding
+
+**callconv = .c and wasm32-wasi target are OK** — this is a bootstrap
+interpreter targeting only that platform. Don't make these configurable.
+
+Everything else must be computed by porting the upstream functions that
+produce the values. Do not hand-code enum tags, field indices, type indices,
+or any values that come from running upstream logic.
+
+### When you see a gap
+
+When a test mismatch shows the C sema has fewer InternPool entries than
+the Zig compiler: **find which upstream function creates those entries and
+port it.** Do not create entries directly, resolve types by name, enumerate
+namespaces, or force type evaluation to fill the gap. Every entry must be
+a side effect of running the correctly-ported upstream function.
+
+Key upstream functions that create IP entries:
+- `createFileRootStruct` -> `type_struct` entry for a module's root
+- `scanNamespace` -> `ptr_nav` entries for declarations
+- `getStructType` / `getEnumType` -> type entries
+- `ensureFileAnalyzed` -> recursive module processing
+- `zirExport` -> forces resolution of exported declarations
+
 ## The loop
 
 Repeat until all corpus tests in `stage0/corpus.zig` pass:
@@ -23,18 +78,11 @@ Repeat until all corpus tests in `stage0/corpus.zig` pass:
 **Do NOT stop between iterations.** Each commit is a checkpoint, not a
 stopping point. Continue looping until all tests pass. When unsure how to
 proceed, the answer is always: follow what the upstream Zig compiler
-does (`src/Sema.zig`, `src/Zcu/PerThread.zig`). Never stop to ask.
+does. Never stop to ask.
 
 1. **Bump.** Increment `num_passing` by 1 in `stage0/corpus.zig`.
 2. **Run.** `zig build test-zig0` — observe the failure.
-3. **Port.** Mechanically copy the needed logic from `src/Sema.zig` into
-   `stage0/sema.c` / `stage0/sema.h`. Ground rules:
-   - Function names should match (except for `sema` prefix when appropriate).
-   - Function control flow should match.
-   - Data structures should be the same, C <-> Zig interop permitting. I.e.
-     struct definitions in `stage0/sema.h` should be, language permitting, the
-     same as in `src/Sema.zig`.
-   - Add functions in the same order as in the original Zig file.
+3. **Port.** Follow [How to port](#how-to-port) above.
 4. **Test.** `zig build test-zig0` — iterate until the new test passes.
 5. **Clean up & commit.** See [Cleaning Up](#cleaning-up).
 6. **Go to 1.** Do NOT stop between iterations — keep looping until
@@ -43,8 +91,7 @@ does (`src/Sema.zig`, `src/Zcu/PerThread.zig`). Never stop to ask.
 ## When stuck
 
 If a single corpus test requires too many new functions, causes unclear
-failures, or needs multiple unrelated features at once — do NOT give up or
-take shortcuts. Instead, decompose:
+failures, or needs multiple unrelated features at once — decompose:
 
 1. Create a focused test file in `stage0/sema_tests/` that isolates one piece
    of the problem.
@@ -53,124 +100,8 @@ take shortcuts. Instead, decompose:
 4. Repeat until enough pieces are in place for the corpus test to pass.
 5. Return to [The loop](#the-loop).
 
-## Closing the InternPool gap
-
-### Current status (2026-03-01)
-
-**num_passing = 5.** The first 5 sema unit tests pass (empty.zig,
-const_decl.zig, empty_void_function.zig, type_identity_fn.zig,
-reify_int.zig). reify_int.zig was fixed by handling `decl_val` in
-2-instruction parameter type bodies.
-
-**Next blocker: return_integer.zig** (index 6). Uses `return 42` which
-interns a non-pre-interned value (42:u32). The IP index of this value
-depends on the module-level entry count. The Zig compiler (even for
-standalone files with `std_mod=null`) always creates std from `lib_dir`,
-producing `func_ip=217` (215 base + 2 function body entries). The C sema
-creates only ~135 entries. The gap of ~80 entries comes from the @export
-trigger chain: `zirExport` → `resolveExportOptions` → `getBuiltinType`
-→ `analyzeMemoizedState`, which resolves builtin types from
-`std.builtin` as side effects. Port this chain to close the gap.
-
-### Background
-
-- Nav entries (declarations) are stored in a **separate** list from IP
-  Items — they do NOT consume IP indices. But `ptr_nav` entries (pointers
-  TO Navs) DO consume IP indices.
-- The Zig compiler creates IP entries through `createFileRootStruct`
-  (in `src/Zcu/PerThread.zig`) and `scanNamespace`. These must be
-  ported to C.
-- The IP items count at function analysis time is embedded in the
-  `.air` binary format as `func_ip`. When a test mismatch occurs, it
-  is displayed as `[zig_ip_base=N]` in the error output.
-
-### The module-system porting loop
-
-1. **Read `func_ip` from mismatch output.** Run `zig build test-zig0`
-   with `num_passing` bumped. The mismatch message includes
-   `[zig_ip_base=N]` — this is the Zig compiler's IP items count at
-   function analysis time. The gap between `a` and `b` IP refs tells
-   you how many entries the C sema is missing.
-2. **Compare.** Note the mismatch: `a=0x???[ip] b=0x???[ip]`. The gap
-   `a − b` is the number of missing IP entries.
-3. **Port the next batch.** Identify what the Zig compiler creates for
-   the next ~10 IP entries (struct types, ptr_nav, enum types, etc.).
-   Port the corresponding logic from `src/Zcu/PerThread.zig` and
-   `src/Sema.zig` into `stage0/sema.c`. Key functions to port:
-   - `createFileRootStruct` → creates `type_struct` IP entry for a
-     module's root.
-   - `scanNamespace` → iterates declarations, creates `ptr_nav` entries
-     for each Nav.
-   - `getStructType` / `getEnumType` → creates type entries in IP.
-   - `ensureFileAnalyzed` → recursively processes imported modules.
-   - `zirExport` → forces resolution of exported declarations.
-4. **Test.** `zig build test-zig0` — the gap should shrink.
-5. **Clean up & commit** (see [Cleaning Up](#cleaning-up)), then
-   **immediately continue** to step 3. Do NOT stop here. Keep
-   `num_passing` at whatever value passes; don't bump it until the gap
-   reaches zero.
-6. **Exit condition:** the gap is zero, `num_passing` is incremented,
-   and `zig build test-zig0` passes. Only then return to
-   [The loop](#the-loop) step 6.
-
-### Important constraints
-
-- Do NOT hardcode IP entries from a dump. The entries must be computed
-  from the ZIR, matching the Zig compiler's processing.
-- **callconv = .c and wasm32-wasi target hardcoding are OK.** This is a
-  bootstrap interpreter targeting only that platform. Do not spend time
-  making these configurable.
-- Do NOT hardcode other target-specific values (enum tags, field indices,
-  type indices, etc.) into sema.c. All such values must be computed by
-  porting the upstream Zig functions that produce them. Port the function,
-  not the output.
-- Do NOT include generated `.c` files. All logic belongs in `sema.c`.
-- Entry ORDER matters. The C sema must create entries in the same
-  order as the Zig compiler. Follow the Zig compiler's processing
-  sequence (struct type → scan declarations → process imports →
-  resolve comptime blocks).
-- Deduplication matters. If the function body interns a value that was
-  already created during module-level analysis, `ipIntern` must return
-  the existing index (not create a duplicate).
-- Port **functions**, not entries. The IP entries are a consequence of
-  running the upstream functions (`createFileRootStruct`,
-  `scanNamespace`, `ensureFileAnalyzed`, etc.) correctly. Port those
-  functions mechanically from `src/Zcu/PerThread.zig` to C; do not try
-  to understand every individual IP entry before starting to code.
-- **NEVER fill the IP gap by pre-resolving types or declarations.**
-  Do NOT resolve declarations by name, enumerate namespaces, or force
-  type evaluation just to create IP entries and "fill up" the gap.
-  This applies to builtin types, std types, module declarations, or
-  any other source. Every IP entry must be created as a side effect
-  of honestly porting the upstream function that produces it. If the
-  Zig compiler creates entries through `zirExport` → `getBuiltinType`
-  → `analyzeMemoizedState`, then port that chain. If it creates
-  entries through `scanNamespace` → `analyzeNav`, port that. Do not
-  invent shortcuts that produce the right count but through the wrong
-  mechanism.
-- Time-box investigation. Spend at most ~10 minutes investigating a
-  problem before writing code. Use `--verbose-intern-pool` dumps to
-  verify progress after each batch, not to plan the entire
-  implementation upfront.
-
-### Anti-pattern: analysis paralysis / proving work unnecessary
-
-When the gap is large (hundreds of entries), the temptation is to spend rounds
-analyzing whether the entries are "really needed" or whether a shortcut exists.
-**This is always wrong.** The entries are needed — the test comparison is
-byte-for-byte with no IP base adjustment.
-
-Signs you are bailing out:
-- Asking "does the AIR actually reference non-pre-interned IP indices?"
-- Exploring "what if we DON'T create module-level entries?"
-- Running verbose-air dumps to prove the function body is simple
-- Suggesting the IP count "might not matter"
-
-The correct response to a large gap:
-1. Fix the immediate crash/assertion (e.g. increase buffer sizes)
-2. Port the next upstream function that creates entries
-3. Test, measure gap reduction, commit
-4. Repeat
+Time-box investigation: spend at most ~10 minutes investigating before
+writing code.
 
 ## AIR comparison exceptions
 
@@ -198,33 +129,20 @@ Before committing, ensure the branch stays green:
    extraneous output.
 
 If a test that previously passed now fails, that is a regression. Do not commit.
-Go back and fix it — never commit with fewer passing tests than before. If
-it's not a test failure but a formatting/linting issue, fix it before committing.
+Go back and fix it — never commit with fewer passing tests than before.
 
-# General rules
+## General rules
 
-- When porting features from upstream Zig, it should be a **mechanical copy**.
-  Don't invent. Keep the structure in place, name functions and types the same
-  way (or within reason equivalently if there are namespacing constraints). It
-  should be easy to reference one from the other; and, if there are semantic
-  differences, they *must* be because Zig or C does not support certain
-  features (like errdefer).
-- When translating functions from Zig to C (mechanically, remember?), add them
-  in the same order as in the original Zig file.
-- **Never ever** remove zig-cache, neither local nor global.
+- **Never** remove zig-cache, neither local nor global.
 - Zig code is in ~/code/zig, don't look at /nix/...
-- Debug printfs: add printfs only when debugging a specific issue; when done
-  debugging, remove them (or comment them if you may find them useful later). I
-  prefer committing code only when `zig build` returns no output.
+- Debug printfs: remove when done debugging. `zig build` should produce no
+  output.
 - Always complete all tasks before stopping. Do not stop to ask for
-  confirmation mid-task. If you have remaining work, continue without waiting
-  for input.
+  confirmation mid-task.
-- No `cppcheck` suppressions. They are here for a reason. If it is complaining
-  about automatic variables, make it non-automatic. I.e. find a way to satisfy
-  the linter, do not suppress it.
+- No `cppcheck` suppressions — satisfy the linter, don't suppress it.
 - See `stage0/README.md` for testing commands and debugging tips.
 
-# Tools
+## Tools
 
 - A debug build of the Zig compiler is available at `zig-out/bin/zig` (build
   with `zig build`). It has `--verbose-intern-pool` enabled.
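
Note on the "entry order" and "deduplication" rules added under "How to port": the two properties can be illustrated with a toy interning table. This is only a minimal sketch under stated assumptions. `Pool`, `PoolKey`, and `pool_intern` are hypothetical names invented for illustration, and nothing here reflects the actual signature of `ipIntern` or the real InternPool layout in `stage0/sema.c`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature pool; NOT the real InternPool layout. */
#define POOL_CAP 64

typedef struct {
    uint32_t tag;   /* stand-in for an item kind */
    uint64_t data;  /* stand-in for the item payload */
} PoolKey;

typedef struct {
    PoolKey items[POOL_CAP];
    uint32_t len;
} Pool;

/* Return the index of `key`, appending it only when it is not present.
 * A linear search keeps the sketch short; a real pool would hash. */
uint32_t pool_intern(Pool *p, PoolKey key) {
    for (uint32_t i = 0; i < p->len; i++) {
        if (p->items[i].tag == key.tag && p->items[i].data == key.data)
            return i; /* deduplicate: reuse the existing index */
    }
    assert(p->len < POOL_CAP);
    p->items[p->len] = key; /* new entry: its index reflects creation order */
    return p->len++;
}
```

In this sketch an index is simply the position at which a key was first interned, so creating entries in a different order than the reference implementation, or appending a duplicate, would shift every later index. That is the kind of mismatch the bidirectional AIR comparison is described as catching.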