skill: update enable-tests to never skip, always drill down

Instead of skipping/commenting out tests when they need complex features,
the worker now decomposes the problem into smaller pieces:
- Analyze missing ZIR instructions via ast-check -t
- Add minimal unit tests in the appropriate test file (e.g. sema_test.zig)
- Implement one handler at a time, keeping unit tests green
- Build up from simple to compound constructs
- Re-enable the corpus test only when all pieces work

Key changes to orchestrator:
- Add "Unit test file" column to file configuration table
- Add "Drill-Down Strategy" section
- stages_test.zig now includes sema_test.zig in git-add list
- progress commits include unit tests + handlers

Key changes to worker:
- Add full "Drill-Down Strategy" section with examples
- Never skip/comment out a test — decompose instead
- Each iteration must produce at least one passing unit test
- Unit test file always stays green

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-02-20 13:57:38 +00:00
parent ad7aead47d
commit 10ff356d92
2 changed files with 148 additions and 25 deletions

View File

@@ -39,16 +39,20 @@ Zig/C code, or edit C source files. The worker handles all of that.
C↔Zig API boundary (e.g. `src/test_exports.zig`, `src/verbose_air.zig`)
so the internals are properly exposed and maintainable long-term, and
commit the change.
6. **NEVER skip tests.** When a corpus test needs complex features, the worker
must drill down: analyze what's missing, add unit tests in the appropriate
test file for each small piece, implement them, and build back up until the
corpus test passes. See "Drill-Down Strategy" below.
## File Configuration
Based on the test file argument, look up which files are in scope:
| Test file | Modifiable files | Git-add list | Key reference files |
|---|---|---|---|
| `astgen_test.zig` | `astgen.c`, `astgen_test.zig` | `stage0/astgen.c stage0/astgen_test.zig` | `lib/std/zig/AstGen.zig`, `lib/std/zig/Ast.zig`, `lib/std/zig/Zir.zig` |
| `stages_test.zig` | `astgen.c`, `parser.c`, `sema.c`, `stages_test.zig` | `stage0/astgen.c stage0/parser.c stage0/sema.c stage0/stages_test.zig` | `lib/std/zig/AstGen.zig`, `lib/std/zig/Parse.zig`, `lib/std/zig/Ast.zig`, `lib/std/zig/Zir.zig`, `src/Sema.zig` |
| `sema_test.zig` | `sema.c`, `intern_pool.c`, `sema_test.zig` | `stage0/sema.c stage0/intern_pool.c stage0/sema_test.zig` | `src/Sema.zig`, `src/InternPool.zig`, `src/Air.zig` |
| Test file | Modifiable files | Git-add list | Unit test file | Key reference files |
|---|---|---|---|---|
| `astgen_test.zig` | `astgen.c`, `astgen_test.zig` | `stage0/astgen.c stage0/astgen_test.zig` | `astgen_test.zig` | `lib/std/zig/AstGen.zig`, `lib/std/zig/Ast.zig`, `lib/std/zig/Zir.zig` |
| `stages_test.zig` | `astgen.c`, `parser.c`, `sema.c`, `sema_test.zig`, `stages_test.zig` | `stage0/astgen.c stage0/parser.c stage0/sema.c stage0/sema_test.zig stage0/stages_test.zig` | `sema_test.zig` | `lib/std/zig/AstGen.zig`, `lib/std/zig/Parse.zig`, `lib/std/zig/Ast.zig`, `lib/std/zig/Zir.zig`, `src/Sema.zig` |
| `sema_test.zig` | `sema.c`, `intern_pool.c`, `sema_test.zig` | `stage0/sema.c stage0/intern_pool.c stage0/sema_test.zig` | `sema_test.zig` | `src/Sema.zig`, `src/InternPool.zig`, `src/Air.zig` |
All C source paths are under `stage0/`. Reference paths are relative to the
repository root.
@@ -57,6 +61,44 @@ If the test file is not in the table, infer by inspecting the file's imports
and the types of failures it can produce. To add a new test file, add a row
with its modifiable C sources, git-add list, and upstream reference files.
## Drill-Down Strategy
When a corpus test (e.g., `neghf2.zig`) fails because the C sema is missing
features, **do NOT skip or comment it out**. Instead, the worker must:
1. **Analyze**: dump the ZIR for the failing file (`zig ast-check -t <file>`)
and identify which ZIR instructions in the function body are unhandled.
2. **Decompose**: for each missing instruction, write the smallest possible
unit test in the unit test file (e.g., `sema_test.zig`) that exercises it.
Example: if `shl` is missing, add:
```zig
test "sema air: bit shift left" {
try semaAirRawCheck("export fn f(x: u32) u32 { return x << 1; }");
}
```
3. **Implement**: port the handler from upstream, make the unit test pass.
4. **Repeat**: move on to the next missing instruction. Each iteration adds
one unit test + one handler.
5. **Re-test corpus**: after all constituent pieces pass as unit tests, try
enabling the corpus test again. If it still fails (e.g., a new instruction
appears), repeat from step 1.
The worker should commit unit tests and their implementations together. The
corpus test stays disabled until ALL its pieces work, then it's enabled in a
final commit.
**When a feature is too complex for a single unit test** (e.g., inline
function calls require `decl_ref` + `call` + `dbg_inline_block` + `br`),
decompose further:
- First test `decl_ref` alone (if possible with a simpler construct)
- Then `call` with a same-file function
- Then inline call mechanics (`dbg_inline_block`, `dbg_arg_inline`, `br`)
- Then the full construct
**The unit test file always stays green.** Every test you add must pass
before the worker returns. Failing tests are never committed. The corpus test
stays commented until it's ready.
## Phase 0: Verify baseline is green
Run the full test suite:
@@ -115,6 +157,7 @@ when you have accumulated ~10 passing tests, flush the batch:
- Test file name
- What was enabled: test name or corpus file path + line number
- Modifiable C files (from the configuration table above)
- Unit test file (from the configuration table above)
- Key reference files (from the configuration table above)
- If this is a retry after `progress`, include the previous worker's
COMMIT_MSG so the new worker knows what was already done.
@@ -135,15 +178,16 @@ treat it as `no-progress`.
**If STATUS is `pass`:**
The worker made the test pass (possibly with C code changes). Go to
Step 5 to verify and commit.
The worker made the test pass (possibly with C code changes and new unit
tests). Go to Step 5 to verify and commit.
**If STATUS is `progress`:**
The worker made partial progress but the test still fails. The worker
has re-disabled the test. Go to Step 5 to verify and commit the partial
C code fixes. After committing, re-enable the same test and go to
Step 3, including the previous COMMIT_MSG in the context.
The worker made partial progress but the corpus test still fails. The worker
has re-disabled the corpus test but added passing unit tests and C handlers.
Go to Step 5 to verify and commit the unit tests + C fixes. After committing,
re-enable the same corpus test and go to Step 3, including the previous
COMMIT_MSG in the context.
**If STATUS is `no-progress`:**
@@ -203,7 +247,7 @@ tests are enabled.
- Batch of trivially-passing tests: `"<test_file>: enable <file1>, <file2>, ..."`
- Single test enabled with C fixes: use the worker's COMMIT_MSG
- Partial progress (test re-disabled): `"<test_file>: <worker COMMIT_MSG>"`
- Partial progress with unit tests: `"sema: <worker COMMIT_MSG>"`
- If the list is too long: `"<test_file>: enable N files from <dir>/"`
## Final Check

View File

@@ -11,6 +11,9 @@ C code differs from Zig, copy the Zig structure into C.
**Your goal is to make the test pass.** Do not re-disable the test unless you
have exhausted your iteration budget and cannot make further progress.
**NEVER skip a failing test.** If a test needs complex features, break it
into smaller pieces and build up. See the "Drill-Down Strategy" section.
## Context from orchestrator
{{TEST_CONTEXT}}
@@ -77,23 +80,21 @@ diffs and the first tag mismatch position.
**For parser failures:** Note the AST node index, expected vs actual
tag/token, and surrounding source context.
### Step 4: Compare and port
### Step 4: Compare and port (or drill down)
Find the upstream Zig function that corresponds to the failing code path.
Use the Task tool with `subagent_type=general-purpose` to search for and
read **only the specific function** in both implementations (C and Zig).
Do NOT read entire files -- request just the relevant function(s).
Enumerate differences that affect output:
- Extra data written (field order, conditional fields, body lengths)
- Instruction tags emitted
- String table entries
- Break payload values
- AST node tags and token associations
- Missing sema handlers (functions that exist in Zig but not in C)
**If the fix is straightforward** (a missing case, a wrong field, etc.),
apply the minimal mechanical change and test.
Apply the minimal mechanical change to match the upstream. Make ONE change
at a time. After each change, run:
**If the fix requires complex new infrastructure** (e.g., a corpus test
needs inline function calls, cross-module imports, etc.), use the
**Drill-Down Strategy** below.
After each change, run:
```sh
./zig-out/bin/zig build test-zig0 -Dzig0-cc=tcc 2>&1 | tail -20
@@ -108,17 +109,91 @@ Check that no previously-passing tests broke.
- AST mismatch moved to a later node
- A different function name now appears in "not found" error
- The error changes from "not found" to a different type
- New unit tests were added and pass
## Drill-Down Strategy
When a test (typically a corpus test in `stages_test.zig`) fails because the
C sema is missing multiple features, **do NOT skip or comment out the test
and give up**. Instead, decompose the problem:
### 1. Analyze what's missing
Dump the ZIR for the failing file:
```sh
./zig-out/bin/zig ast-check -t <file>
```
Identify which ZIR instructions in the function body are not handled by the
C sema's `analyzeBodyInner`. Also dump the AIR to see what the Zig compiler
produces:
```sh
./zig-out/bin/zig build-obj <file> --verbose-air -target wasm32-wasi -OReleaseFast
```
### 2. Write the smallest unit test for each missing piece
For each unhandled ZIR instruction, write the smallest possible
`semaAirRawCheck` test in the unit test file that exercises ONLY that
instruction. Examples:
| Missing ZIR | Minimal unit test |
|---|---|
| `shl` | `"export fn f(x: u32) u32 { return x << 1; }"` |
| `int_cast` | `"export fn f(x: u16) u32 { return @intCast(x); }"` |
| `mul` | `"export fn f(x: u32, y: u32) u32 { return x * y; }"` |
| `store_node` | `"export fn f(x: *u32) void { x.* = 42; }"` |
| `dbg_var_val` | `"export fn f(x: u32) u32 { const y = x + 1; return y; }"` |
**Important:** each unit test must be SMALL -- one new feature at a time.
If a test needs TWO new features (e.g., `shl` needs `typeof_log2_int_type`
AND `as_shift_operand` AND `shl` itself), write separate tests if possible,
or at minimum understand all three before implementing.
### 3. Implement one handler at a time
For each unit test:
1. Check the ZIR data format (`zig ast-check -t` on a temp file)
2. Read the upstream Zig handler in `src/Sema.zig` (use Grep, don't read
the whole file)
3. Port the simplified version to C
4. Run tests, verify the unit test passes AND no regressions
### 4. Build up to more complex constructs
After individual instructions work, combine them:
- Simple → compound: first `add`, then `"(x + y) ^ 0xFF"`
- Same type → cross type: first `u32 + u32`, then `@intCast(u16) + u32`
- Single function → inline call: first individual ops, then
`dbg_inline_block` + `br` + `dbg_arg_inline`
### 5. Re-test the corpus test
After all constituent pieces work as unit tests, try enabling the corpus
test again. If it still fails with a NEW error (different instruction),
repeat from step 1. If it passes, you're done.
### Key principle: the unit test file always stays green
- Every unit test you add MUST pass before you return.
- NEVER commit a failing test. If a test fails, fix the handler first.
- NEVER comment out or skip a failing unit test you just added -- either
make it pass or remove it.
- The corpus test stays disabled until ALL its pieces work, then it's
enabled in a final commit.
### Step 5: Clean up
1. Remove ALL `fprintf`/`printf` debug statements from C files.
2. Remove `#include <stdio.h>` if it was added for debugging.
3. Verify with: `grep -n 'fprintf\|printf' stage0/astgen.c stage0/parser.c stage0/sema.c`
4. If the test still fails after exhausting your iteration budget:
4. If the corpus test still fails after exhausting your iteration budget:
- **For SkipZigTest:** re-add the `if (true) return error.SkipZigTest;`
line with a TODO comment describing the remaining diff.
- **For corpus entry:** re-comment the line (add `//` prefix back) and
add a TODO comment above it describing the remaining diff.
update the comment describing what's still missing.
- **DO report `progress`** if you added any unit tests + handlers, even
if the corpus test isn't enabled yet. The partial work IS valuable.
5. Final verification -- this must exit 0:
```sh
@@ -136,7 +211,8 @@ COMMIT_MSG: <one-line descriptive message about what was ported/fixed>
- `pass` -- the enabled test now passes (entry remains enabled)
- `progress` -- partial progress was made toward making the test pass
(entry was re-disabled with TODO); C code changes are worth committing
(entry was re-disabled with updated comment); C code changes and/or new
unit tests are worth committing
- `no-progress` -- no measurable improvement (entry was re-disabled)
## Rules
@@ -170,3 +246,6 @@ COMMIT_MSG: <one-line descriptive message about what was ported/fixed>
change `src/` to expose compiler internals, change the C↔Zig API boundary
(e.g. `src/test_exports.zig`, `src/verbose_air.zig`) so the internals are
properly exposed and maintainable long-term, and commit the change.
- **NEVER skip a test.** If a feature is complex, break it into smaller
pieces with unit tests. Every iteration should produce at least one new
passing unit test.