skill: update enable-tests to never skip, always drill down
Instead of skipping/commenting out tests when they need complex features, the worker now decomposes the problem into smaller pieces: - Analyze missing ZIR instructions via ast-check -t - Add minimal unit tests in the appropriate test file (e.g. sema_test.zig) - Implement one handler at a time, keeping unit tests green - Build up from simple to compound constructs - Re-enable the corpus test only when all pieces work Key changes to orchestrator: - Add "Unit test file" column to file configuration table - Add "Drill-Down Strategy" section - stages_test.zig now includes sema_test.zig in git-add list - progress commits include unit tests + handlers Key changes to worker: - Add full "Drill-Down Strategy" section with examples - Never skip/comment out a test — decompose instead - Each iteration must produce at least one passing unit test - Unit test file always stays green Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -39,16 +39,20 @@ Zig/C code, or edit C source files. The worker handles all of that.
|
||||
C↔Zig API boundary (e.g. `src/test_exports.zig`, `src/verbose_air.zig`)
|
||||
so the internals are properly exposed and maintainable long-term, and
|
||||
commit the change.
|
||||
6. **NEVER skip tests.** When a corpus test needs complex features, the worker
|
||||
must drill down: analyze what's missing, add unit tests in the appropriate
|
||||
test file for each small piece, implement them, and build back up until the
|
||||
corpus test passes. See "Drill-Down Strategy" below.
|
||||
|
||||
## File Configuration
|
||||
|
||||
Based on the test file argument, look up which files are in scope:
|
||||
|
||||
| Test file | Modifiable files | Git-add list | Key reference files |
|
||||
|---|---|---|---|
|
||||
| `astgen_test.zig` | `astgen.c`, `astgen_test.zig` | `stage0/astgen.c stage0/astgen_test.zig` | `lib/std/zig/AstGen.zig`, `lib/std/zig/Ast.zig`, `lib/std/zig/Zir.zig` |
|
||||
| `stages_test.zig` | `astgen.c`, `parser.c`, `sema.c`, `stages_test.zig` | `stage0/astgen.c stage0/parser.c stage0/sema.c stage0/stages_test.zig` | `lib/std/zig/AstGen.zig`, `lib/std/zig/Parse.zig`, `lib/std/zig/Ast.zig`, `lib/std/zig/Zir.zig`, `src/Sema.zig` |
|
||||
| `sema_test.zig` | `sema.c`, `intern_pool.c`, `sema_test.zig` | `stage0/sema.c stage0/intern_pool.c stage0/sema_test.zig` | `src/Sema.zig`, `src/InternPool.zig`, `src/Air.zig` |
|
||||
| Test file | Modifiable files | Git-add list | Unit test file | Key reference files |
|
||||
|---|---|---|---|---|
|
||||
| `astgen_test.zig` | `astgen.c`, `astgen_test.zig` | `stage0/astgen.c stage0/astgen_test.zig` | `astgen_test.zig` | `lib/std/zig/AstGen.zig`, `lib/std/zig/Ast.zig`, `lib/std/zig/Zir.zig` |
|
||||
| `stages_test.zig` | `astgen.c`, `parser.c`, `sema.c`, `sema_test.zig`, `stages_test.zig` | `stage0/astgen.c stage0/parser.c stage0/sema.c stage0/sema_test.zig stage0/stages_test.zig` | `sema_test.zig` | `lib/std/zig/AstGen.zig`, `lib/std/zig/Parse.zig`, `lib/std/zig/Ast.zig`, `lib/std/zig/Zir.zig`, `src/Sema.zig` |
|
||||
| `sema_test.zig` | `sema.c`, `intern_pool.c`, `sema_test.zig` | `stage0/sema.c stage0/intern_pool.c stage0/sema_test.zig` | `sema_test.zig` | `src/Sema.zig`, `src/InternPool.zig`, `src/Air.zig` |
|
||||
|
||||
All C source paths are under `stage0/`. Reference paths are relative to the
|
||||
repository root.
|
||||
@@ -57,6 +61,44 @@ If the test file is not in the table, infer by inspecting the file's imports
|
||||
and the types of failures it can produce. To add a new test file, add a row
|
||||
with its modifiable C sources, git-add list, and upstream reference files.
|
||||
|
||||
## Drill-Down Strategy
|
||||
|
||||
When a corpus test (e.g., `neghf2.zig`) fails because the C sema is missing
|
||||
features, **do NOT skip or comment it out**. Instead, the worker must:
|
||||
|
||||
1. **Analyze**: dump the ZIR for the failing file (`zig ast-check -t <file>`)
|
||||
and identify which ZIR instructions in the function body are unhandled.
|
||||
2. **Decompose**: for each missing instruction, write the smallest possible
|
||||
unit test in the unit test file (e.g., `sema_test.zig`) that exercises it.
|
||||
Example: if `shl` is missing, add:
|
||||
```zig
|
||||
test "sema air: bit shift left" {
|
||||
try semaAirRawCheck("export fn f(x: u32) u32 { return x << 1; }");
|
||||
}
|
||||
```
|
||||
3. **Implement**: port the handler from upstream, make the unit test pass.
|
||||
4. **Repeat**: move on to the next missing instruction. Each iteration adds
|
||||
one unit test + one handler.
|
||||
5. **Re-test corpus**: after all constituent pieces pass as unit tests, try
|
||||
enabling the corpus test again. If it still fails (e.g., a new instruction
|
||||
appears), repeat from step 1.
|
||||
|
||||
The worker should commit unit tests and their implementations together. The
|
||||
corpus test stays disabled until ALL its pieces work, then it's enabled in a
|
||||
final commit.
|
||||
|
||||
**When a feature is too complex for a single unit test** (e.g., inline
|
||||
function calls require `decl_ref` + `call` + `dbg_inline_block` + `br`),
|
||||
decompose further:
|
||||
- First test `decl_ref` alone (if possible with a simpler construct)
|
||||
- Then `call` with a same-file function
|
||||
- Then inline call mechanics (`dbg_inline_block`, `dbg_arg_inline`, `br`)
|
||||
- Then the full construct
|
||||
|
||||
**The unit test file always stays green.** Every test you add must pass
|
||||
before the worker returns. Failing tests are never committed. The corpus test
|
||||
stays commented until it's ready.
|
||||
|
||||
## Phase 0: Verify baseline is green
|
||||
|
||||
Run the full test suite:
|
||||
@@ -115,6 +157,7 @@ when you have accumulated ~10 passing tests, flush the batch:
|
||||
- Test file name
|
||||
- What was enabled: test name or corpus file path + line number
|
||||
- Modifiable C files (from the configuration table above)
|
||||
- Unit test file (from the configuration table above)
|
||||
- Key reference files (from the configuration table above)
|
||||
- If this is a retry after `progress`, include the previous worker's
|
||||
COMMIT_MSG so the new worker knows what was already done.
|
||||
@@ -135,15 +178,16 @@ treat it as `no-progress`.
|
||||
|
||||
**If STATUS is `pass`:**
|
||||
|
||||
The worker made the test pass (possibly with C code changes). Go to
|
||||
Step 5 to verify and commit.
|
||||
The worker made the test pass (possibly with C code changes and new unit
|
||||
tests). Go to Step 5 to verify and commit.
|
||||
|
||||
**If STATUS is `progress`:**
|
||||
|
||||
The worker made partial progress but the test still fails. The worker
|
||||
has re-disabled the test. Go to Step 5 to verify and commit the partial
|
||||
C code fixes. After committing, re-enable the same test and go to
|
||||
Step 3, including the previous COMMIT_MSG in the context.
|
||||
The worker made partial progress but the corpus test still fails. The worker
|
||||
has re-disabled the corpus test but added passing unit tests and C handlers.
|
||||
Go to Step 5 to verify and commit the unit tests + C fixes. After committing,
|
||||
re-enable the same corpus test and go to Step 3, including the previous
|
||||
COMMIT_MSG in the context.
|
||||
|
||||
**If STATUS is `no-progress`:**
|
||||
|
||||
@@ -203,7 +247,7 @@ tests are enabled.
|
||||
|
||||
- Batch of trivially-passing tests: `"<test_file>: enable <file1>, <file2>, ..."`
|
||||
- Single test enabled with C fixes: use the worker's COMMIT_MSG
|
||||
- Partial progress (test re-disabled): `"<test_file>: <worker COMMIT_MSG>"`
|
||||
- Partial progress with unit tests: `"sema: <worker COMMIT_MSG>"`
|
||||
- If the list is too long: `"<test_file>: enable N files from <dir>/"`
|
||||
|
||||
## Final Check
|
||||
|
||||
@@ -11,6 +11,9 @@ C code differs from Zig, copy the Zig structure into C.
|
||||
**Your goal is to make the test pass.** Do not re-disable the test unless you
|
||||
have exhausted your iteration budget and cannot make further progress.
|
||||
|
||||
**NEVER skip a failing test.** If a test needs complex features, break it
|
||||
into smaller pieces and build up. See the "Drill-Down Strategy" section.
|
||||
|
||||
## Context from orchestrator
|
||||
|
||||
{{TEST_CONTEXT}}
|
||||
@@ -77,23 +80,21 @@ diffs and the first tag mismatch position.
|
||||
**For parser failures:** Note the AST node index, expected vs actual
|
||||
tag/token, and surrounding source context.
|
||||
|
||||
### Step 4: Compare and port
|
||||
### Step 4: Compare and port (or drill down)
|
||||
|
||||
Find the upstream Zig function that corresponds to the failing code path.
|
||||
Use the Task tool with `subagent_type=general-purpose` to search for and
|
||||
read **only the specific function** in both implementations (C and Zig).
|
||||
Do NOT read entire files -- request just the relevant function(s).
|
||||
|
||||
Enumerate differences that affect output:
|
||||
- Extra data written (field order, conditional fields, body lengths)
|
||||
- Instruction tags emitted
|
||||
- String table entries
|
||||
- Break payload values
|
||||
- AST node tags and token associations
|
||||
- Missing sema handlers (functions that exist in Zig but not in C)
|
||||
**If the fix is straightforward** (a missing case, a wrong field, etc.),
|
||||
apply the minimal mechanical change and test.
|
||||
|
||||
Apply the minimal mechanical change to match the upstream. Make ONE change
|
||||
at a time. After each change, run:
|
||||
**If the fix requires complex new infrastructure** (e.g., a corpus test
|
||||
needs inline function calls, cross-module imports, etc.), use the
|
||||
**Drill-Down Strategy** below.
|
||||
|
||||
After each change, run:
|
||||
|
||||
```sh
|
||||
./zig-out/bin/zig build test-zig0 -Dzig0-cc=tcc 2>&1 | tail -20
|
||||
@@ -108,17 +109,91 @@ Check that no previously-passing tests broke.
|
||||
- AST mismatch moved to a later node
|
||||
- A different function name now appears in "not found" error
|
||||
- The error changes from "not found" to a different type
|
||||
- New unit tests were added and pass
|
||||
|
||||
## Drill-Down Strategy
|
||||
|
||||
When a test (typically a corpus test in `stages_test.zig`) fails because the
|
||||
C sema is missing multiple features, **do NOT skip or comment out the test
|
||||
and give up**. Instead, decompose the problem:
|
||||
|
||||
### 1. Analyze what's missing
|
||||
|
||||
Dump the ZIR for the failing file:
|
||||
```sh
|
||||
./zig-out/bin/zig ast-check -t <file>
|
||||
```
|
||||
|
||||
Identify which ZIR instructions in the function body are not handled by the
|
||||
C sema's `analyzeBodyInner`. Also dump the AIR to see what the Zig compiler
|
||||
produces:
|
||||
```sh
|
||||
./zig-out/bin/zig build-obj <file> --verbose-air -target wasm32-wasi -OReleaseFast
|
||||
```
|
||||
|
||||
### 2. Write the smallest unit test for each missing piece
|
||||
|
||||
For each unhandled ZIR instruction, write the smallest possible
|
||||
`semaAirRawCheck` test in the unit test file that exercises ONLY that
|
||||
instruction. Examples:
|
||||
|
||||
| Missing ZIR | Minimal unit test |
|
||||
|---|---|
|
||||
| `shl` | `"export fn f(x: u32) u32 { return x << 1; }"` |
|
||||
| `int_cast` | `"export fn f(x: u16) u32 { return @intCast(x); }"` |
|
||||
| `mul` | `"export fn f(x: u32, y: u32) u32 { return x * y; }"` |
|
||||
| `store_node` | `"export fn f(x: *u32) void { x.* = 42; }"` |
|
||||
| `dbg_var_val` | `"export fn f(x: u32) u32 { const y = x + 1; return y; }"` |
|
||||
|
||||
**Important:** each unit test must be SMALL -- one new feature at a time.
|
||||
If a test needs TWO new features (e.g., `shl` needs `typeof_log2_int_type`
|
||||
AND `as_shift_operand` AND `shl` itself), write separate tests if possible,
|
||||
or at minimum understand all three before implementing.
|
||||
|
||||
### 3. Implement one handler at a time
|
||||
|
||||
For each unit test:
|
||||
1. Check the ZIR data format (`zig ast-check -t` on a temp file)
|
||||
2. Read the upstream Zig handler in `src/Sema.zig` (use Grep, don't read
|
||||
the whole file)
|
||||
3. Port the simplified version to C
|
||||
4. Run tests, verify the unit test passes AND no regressions
|
||||
|
||||
### 4. Build up to more complex constructs
|
||||
|
||||
After individual instructions work, combine them:
|
||||
- Simple → compound: first `add`, then `"(x + y) ^ 0xFF"`
|
||||
- Same type → cross type: first `u32 + u32`, then `@intCast(u16) + u32`
|
||||
- Single function → inline call: first individual ops, then
|
||||
`dbg_inline_block` + `br` + `dbg_arg_inline`
|
||||
|
||||
### 5. Re-test the corpus test
|
||||
|
||||
After all constituent pieces work as unit tests, try enabling the corpus
|
||||
test again. If it still fails with a NEW error (different instruction),
|
||||
repeat from step 1. If it passes, you're done.
|
||||
|
||||
### Key principle: the unit test file always stays green
|
||||
|
||||
- Every unit test you add MUST pass before you return.
|
||||
- NEVER commit a failing test. If a test fails, fix the handler first.
|
||||
- NEVER comment out or skip a failing unit test you just added -- either
|
||||
make it pass or remove it.
|
||||
- The corpus test stays disabled until ALL its pieces work, then it's
|
||||
enabled in a final commit.
|
||||
|
||||
### Step 5: Clean up
|
||||
|
||||
1. Remove ALL `fprintf`/`printf` debug statements from C files.
|
||||
2. Remove `#include <stdio.h>` if it was added for debugging.
|
||||
3. Verify with: `grep -n 'fprintf\|printf' stage0/astgen.c stage0/parser.c stage0/sema.c`
|
||||
4. If the test still fails after exhausting your iteration budget:
|
||||
4. If the corpus test still fails after exhausting your iteration budget:
|
||||
- **For SkipZigTest:** re-add the `if (true) return error.SkipZigTest;`
|
||||
line with a TODO comment describing the remaining diff.
|
||||
- **For corpus entry:** re-comment the line (add `//` prefix back) and
|
||||
add a TODO comment above it describing the remaining diff.
|
||||
update the comment describing what's still missing.
|
||||
- **DO report `progress`** if you added any unit tests + handlers, even
|
||||
if the corpus test isn't enabled yet. The partial work IS valuable.
|
||||
5. Final verification -- this must exit 0:
|
||||
|
||||
```sh
|
||||
@@ -136,7 +211,8 @@ COMMIT_MSG: <one-line descriptive message about what was ported/fixed>
|
||||
|
||||
- `pass` -- the enabled test now passes (entry remains enabled)
|
||||
- `progress` -- partial progress was made toward making the test pass
|
||||
(entry was re-disabled with TODO); C code changes are worth committing
|
||||
(entry was re-disabled with updated comment); C code changes and/or new
|
||||
unit tests are worth committing
|
||||
- `no-progress` -- no measurable improvement (entry was re-disabled)
|
||||
|
||||
## Rules
|
||||
@@ -170,3 +246,6 @@ COMMIT_MSG: <one-line descriptive message about what was ported/fixed>
|
||||
change `src/` to expose compiler internals, change the C↔Zig API boundary
|
||||
(e.g. `src/test_exports.zig`, `src/verbose_air.zig`) so the internals are
|
||||
properly exposed and maintainable long-term, and commit the change.
|
||||
- **NEVER skip a test.** If a feature is complex, break it into smaller
|
||||
pieces with unit tests. Every iteration should produce at least one new
|
||||
passing unit test.
|
||||
|
||||
Reference in New Issue
Block a user