2026-02-24 05:46:52 +02:00
parent 5746beb822
commit d1d9a92855
2 changed files with 0 additions and 532 deletions

@@ -1,281 +0,0 @@
---
name: enable-tests
description: Sequentially enable disabled tests, fixing divergences by mechanical porting from upstream, and commit.
allowed-tools: Read, Write, Edit, Bash, Grep, Glob, Task, TaskOutput
disable-model-invocation: true
---
# Enable Tests — Orchestrator
You manage the sequential loop of enabling disabled tests in a Zig test file.
For each test you enable it, dispatch a worker sub-agent to test and fix it,
then verify and commit. You go through the corpus **in order** and do not
skip tests.
**Input:** The test file is provided as the skill argument (e.g.,
`stage0/astgen_test.zig`). If no argument is given, ask the user.
**You do NOT**: run test commands (except the Phase 0 baseline, one
verification before each commit, and the Final Check), analyze test output
in detail, compare Zig/C code, or edit C source files. The worker handles
all of that.
**CRITICAL RULES:**
1. **NEVER stop early.** Do not pause to ask the user if you should continue.
Do not summarize remaining work and wait. Keep looping until every disabled
test is enabled and passing, or a test is stuck (see guard rail).
2. **Go in order.** Enable tests sequentially. Do not skip ahead to easier
tests. If a test requires porting a major feature (e.g. `zirFunc`), that
is the worker's job.
3. **ALWAYS verify before committing.** Run the test command once after the
worker returns. Never trust the worker's claim alone.
4. **NEVER run `zig build test` or `zig build` without arguments.** These
build and test upstream Zig, which takes ages and is irrelevant to zig0.
Always use `zig build test-zig0 -Dzig0-cc=tcc` for testing.
5. **Avoid modifying files under `src/`.** Changing any `src/` file
invalidates the `verbose_dumper` cache, triggering a ~7 minute
recompilation of zig_internals (InternPool, Air, Compilation, etc.). To
inspect AIR, use `zig-out/bin/zig build-obj --verbose-air <file>` instead.
If you do need to change `src/` to expose compiler internals, change the
C↔Zig API boundary (e.g. `src/test_exports.zig`, `src/verbose_air.zig`)
so the internals are properly exposed and maintainable long-term, and
commit the change.
6. **NEVER skip tests.** When a corpus test needs complex features, the worker
must drill down: analyze what's missing, add unit tests in the appropriate
test file for each small piece, implement them, and build back up until the
corpus test passes. See "Drill-Down Strategy" below.
## Monitoring Workers
Workers run in the background so the user can observe their progress.
After dispatching a worker, **always** print the output file path:
```
Worker output: <output_file>
```
Then wait for the result using `TaskOutput` with `block=true`. The user
can `tail -f` the output file in another terminal to watch the worker's
full transcript (tool calls, reasoning, output) in real time.
## File Configuration
Based on the test file argument, look up which files are in scope:
| Test file | Modifiable files | Git-add list | Unit test file | Key reference files |
|---|---|---|---|---|
| `astgen_test.zig` | `astgen.c`, `astgen_test.zig` | `stage0/astgen.c stage0/astgen_test.zig` | `astgen_test.zig` | `lib/std/zig/AstGen.zig`, `lib/std/zig/Ast.zig`, `lib/std/zig/Zir.zig` |
| `stages_test.zig` | `astgen.c`, `parser.c`, `sema.c`, `sema_test.zig`, `stages_test.zig` | `stage0/astgen.c stage0/parser.c stage0/sema.c stage0/sema_test.zig stage0/stages_test.zig` | `sema_test.zig` | `lib/std/zig/AstGen.zig`, `lib/std/zig/Parse.zig`, `lib/std/zig/Ast.zig`, `lib/std/zig/Zir.zig`, `src/Sema.zig` |
| `sema_test.zig` | `sema.c`, `intern_pool.c`, `sema_test.zig` | `stage0/sema.c stage0/intern_pool.c stage0/sema_test.zig` | `sema_test.zig` | `src/Sema.zig`, `src/InternPool.zig`, `src/Air.zig` |
All C source paths are under `stage0/`. Reference paths are relative to the
repository root.
If the test file is not in the table, infer by inspecting the file's imports
and the types of failures it can produce. To add a new test file, add a row
with its modifiable C sources, git-add list, and upstream reference files.
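The table lookup can be sketched as a small dictionary. This is an illustrative sketch only: `FILE_CONFIG` and `lookup_config` are hypothetical names, only two of the three rows are reproduced, and the git-add list is derived by prefixing `stage0/` as the table does.

```python
import posixpath

# Hypothetical in-memory copy of two rows of the configuration table.
FILE_CONFIG = {
    "astgen_test.zig": {
        "modifiable": ["astgen.c", "astgen_test.zig"],
        "unit_test_file": "astgen_test.zig",
        "references": [
            "lib/std/zig/AstGen.zig",
            "lib/std/zig/Ast.zig",
            "lib/std/zig/Zir.zig",
        ],
    },
    "sema_test.zig": {
        "modifiable": ["sema.c", "intern_pool.c", "sema_test.zig"],
        "unit_test_file": "sema_test.zig",
        "references": ["src/Sema.zig", "src/InternPool.zig", "src/Air.zig"],
    },
}


def lookup_config(test_file):
    """Return the config row for a test file path, or None if it is not listed."""
    row = FILE_CONFIG.get(posixpath.basename(test_file))
    if row is None:
        return None  # caller falls back to inspecting imports, as described above
    git_add = " ".join("stage0/" + name for name in row["modifiable"])
    return {**row, "git_add_list": git_add}
```

A `None` result corresponds to the "not in the table" case, where the scope has to be inferred by hand.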
## Drill-Down Strategy
When a corpus test (e.g., `neghf2.zig`) fails because the C sema is missing
features, **do NOT skip or comment it out**. Instead, the worker must:
1. **Analyze**: dump the ZIR for the failing file (`zig ast-check -t <file>`)
and identify which ZIR instructions in the function body are unhandled.
2. **Decompose**: for each missing instruction, write the smallest possible
unit test in the unit test file (e.g., `sema_test.zig`) that exercises it.
Example: if `shl` is missing, add:
```zig
test "sema air: bit shift left" {
try semaAirRawCheck("export fn f(x: u32) u32 { return x << 1; }");
}
```
3. **Implement**: port the handler from upstream, make the unit test pass.
4. **Repeat**: move on to the next missing instruction. Each iteration adds
one unit test + one handler.
5. **Re-test corpus**: after all constituent pieces pass as unit tests, try
enabling the corpus test again. If it still fails (e.g., a new instruction
appears), repeat from step 1.
The worker should commit unit tests and their implementations together. The
corpus test stays disabled until ALL its pieces work, then it's enabled in a
final commit.
**When a feature is too complex for a single unit test** (e.g., inline
function calls require `decl_ref` + `call` + `dbg_inline_block` + `br`),
decompose further:
- First test `decl_ref` alone (if possible with a simpler construct)
- Then `call` with a same-file function
- Then inline call mechanics (`dbg_inline_block`, `dbg_arg_inline`, `br`)
- Then the full construct
**The unit test file always stays green.** Every test you add must pass
before the worker returns. Failing tests are never committed. The corpus test
stays commented until it's ready.
## Phase 0: Verify baseline is green
Run the full test suite:
```sh
cd ~/code/zig
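# bash: PIPESTATUS[0] is zig's exit status; plain $? would report tail's
./zig-out/bin/zig build test-zig0 -Dzig0-cc=tcc 2>&1 | tail -5 ; echo "EXIT: ${PIPESTATUS[0]}"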
```
If non-zero exit or failures: dispatch a worker (Step 3) with the failure
context. After the worker returns, follow Steps 4-5. Re-run Phase 0 until
clean.
If clean: proceed to the main loop.
## Main Loop
Repeat until no disabled tests remain in the test file.
### Step 1: Enable the next disabled test
Search the test file in priority order:
**Priority A -- SkipZigTest lines:**
Search for lines matching:
```
if (true) return error.SkipZigTest
```
Pick the first one. Comment out the skip line to enable the test.
**Priority B -- Commented corpus entries:**
Search for lines matching `//"..` inside the `corpus_files` tuple
(between `const corpus_files = .{` and `};`). Pick the **first** one —
do not skip ahead. Uncomment the line (remove `//` prefix).
If neither is found, all tests are enabled -- go to Final Check.
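The two-priority search can be sketched as follows. This is a minimal sketch under assumptions: `find_next_disabled` is a hypothetical helper, a skip line counts as active only when it is not already commented out, and optional whitespace after `//` is tolerated.

```python
import re

# Active skip lines only; an already commented-out skip line does not match.
SKIP_RE = re.compile(r"^\s*if \(true\) return error\.SkipZigTest")
# A commented-out corpus entry such as:  //"path/to/file.zig",
CORPUS_RE = re.compile(r'^\s*//\s*"')


def find_next_disabled(source):
    """Return ("skip", line) or ("corpus", line) for the next test, else None."""
    lines = source.splitlines()
    # Priority A: the first SkipZigTest line anywhere in the file.
    for number, line in enumerate(lines, start=1):
        if SKIP_RE.match(line):
            return ("skip", number)
    # Priority B: the first commented entry inside the corpus_files tuple.
    in_corpus = False
    for number, line in enumerate(lines, start=1):
        if "const corpus_files = .{" in line:
            in_corpus = True
        elif in_corpus and line.strip() == "};":
            in_corpus = False
        elif in_corpus and CORPUS_RE.match(line):
            return ("corpus", number)
    return None
```

Line numbers are 1-based so they can be passed straight into the worker context.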
### Step 2: Accumulate trivially-passing tests
The first worker run for a newly enabled test doubles as the check for
whether it passes without code changes. Dispatch a worker (Step 3). If the
worker reports `pass` with no C code changes, **do not commit yet**: go back
to Step 1 and enable the next test. Accumulate these trivially-passing tests.
When a worker reports anything other than `pass`-without-code-changes, or
when you have accumulated ~10 passing tests, flush the batch:
1. Verify (Step 5).
2. If green, commit all accumulated enables together.
3. Then handle the current worker's non-pass result (Step 4).
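The accumulate-then-flush rule can be sketched as a small class. `EnableBatch` is a hypothetical name, and the limit of 10 is an assumption standing in for the "~10" above.

```python
class EnableBatch:
    """Accumulates trivially-passing tests until a flush is due."""

    def __init__(self, limit=10):
        self.limit = limit  # "~10" in the text; the exact value is an assumption
        self.pending = []

    def record(self, test_name, status, code_changed):
        """Record a worker result; return True when the batch should be flushed."""
        trivial = status == "pass" and not code_changed
        if trivial:
            self.pending.append(test_name)
            return len(self.pending) >= self.limit
        # Any other result flushes whatever has accumulated so far.
        return True

    def flush(self):
        """Return the accumulated tests and reset the batch."""
        done, self.pending = self.pending, []
        return done
```

A non-trivial result on an empty batch still returns `True`; the caller then has nothing to commit and moves directly to handling that result.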
### Step 3: Dispatch worker
1. Read `stage0/.claude/skills/enable-tests/worker-prompt.md`.
2. Replace `{{TEST_CONTEXT}}` with:
- Test file name
- What was enabled: test name or corpus file path + line number
- Modifiable C files (from the configuration table above)
- Unit test file (from the configuration table above)
- Key reference files (from the configuration table above)
- If this is a retry after `progress`, include the previous worker's
COMMIT_MSG so the new worker knows what was already done.
3. Launch via the Task tool with `run_in_background=true`:
```
subagent_type=general-purpose
prompt=<the worker prompt with context filled in>
run_in_background=true
```
4. Print the `output_file` path from the Task result so the user can
monitor with `tail -f <output_file>`.
5. Wait for the worker to finish using `TaskOutput` with `block=true`
on the returned task ID.
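The placeholder substitution in item 2 can be sketched as below. `fill_worker_prompt` and the bulleted layout of the context block are assumptions; the worker prompt only requires that `{{TEST_CONTEXT}}` be replaced.

```python
def fill_worker_prompt(template, context):
    """Substitute the {{TEST_CONTEXT}} placeholder with a bulleted context block."""
    placeholder = "{{TEST_CONTEXT}}"
    if placeholder not in template:
        raise ValueError("worker prompt has no {{TEST_CONTEXT}} placeholder")
    # Empty values (e.g. no previous COMMIT_MSG on a first attempt) are omitted.
    lines = [f"- {key}: {value}" for key, value in context.items() if value]
    return template.replace(placeholder, "\n".join(lines))
```

Raising on a missing placeholder catches a stale or truncated prompt file before a worker is launched with no context.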
### Step 4: Handle worker result
Extract from the worker response:
- `STATUS`: one of `pass`, `progress`, or `no-progress`
- `COMMIT_MSG`: a descriptive commit message
If the worker response does not end with the expected STATUS format,
treat it as `no-progress`.
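Extracting the result can be sketched as a strict parse of the last two non-empty lines, with anything malformed mapped to `no-progress`. `parse_worker_result` is a hypothetical helper, not part of any tool API.

```python
import re

VALID_STATUSES = ("pass", "progress", "no-progress")


def parse_worker_result(response):
    """Return (status, commit_msg); malformed responses count as no-progress."""
    tail = [line.strip() for line in response.splitlines() if line.strip()]
    if len(tail) < 2:
        return ("no-progress", "")
    status_match = re.fullmatch(r"STATUS:\s*(\S+)", tail[-2])
    msg_match = re.fullmatch(r"COMMIT_MSG:\s*(.+)", tail[-1])
    if not status_match or not msg_match:
        return ("no-progress", "")
    if status_match.group(1) not in VALID_STATUSES:
        return ("no-progress", "")
    return (status_match.group(1), msg_match.group(1))
```

Parsing only the tail of the response means a `STATUS:` string quoted earlier in the worker's transcript cannot be mistaken for the final verdict.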
**If STATUS is `pass`:**
The worker made the test pass (possibly with C code changes and new unit
tests). Go to Step 5 to verify and commit.
**If STATUS is `progress`:**
The worker made partial progress but the corpus test still fails. The worker
has re-disabled the corpus test but added passing unit tests and C handlers.
Go to Step 5 to verify and commit the unit tests + C fixes. After committing,
re-enable the same corpus test and go to Step 3, including the previous
COMMIT_MSG in the context.
**If STATUS is `no-progress`:**
Worker couldn't make progress. The worker has re-disabled the test.
Check `git diff stage0/` — if there are no meaningful C code changes,
revert with `git checkout -- <git-add-list>`. Otherwise go to Step 5
to verify and commit any useful changes.
Re-enable the same test and dispatch another worker (Step 3).
**Guard rail:** If the same test gets `no-progress` three times in a
row, stop the loop and report to the user — the test likely needs a
design discussion.
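The guard rail amounts to counting consecutive `no-progress` results for the same test. A minimal sketch, with `StuckGuard` as a hypothetical name:

```python
class StuckGuard:
    """Tracks consecutive no-progress results for one test."""

    def __init__(self, limit=3):
        self.limit = limit
        self.test = None
        self.streak = 0

    def record(self, test_name, status):
        """Return True when the loop should stop and report to the user."""
        if status != "no-progress":
            # Any pass or progress result breaks the streak.
            self.test, self.streak = None, 0
            return False
        if test_name == self.test:
            self.streak += 1
        else:
            self.test, self.streak = test_name, 1
        return self.streak >= self.limit
```

The same shape works for the verification guard rail by constructing it with `limit=2` and feeding it failed verifications instead.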
### Step 5: Verify and commit
Run the test command:
```sh
# bash: PIPESTATUS[0] is zig's exit status; plain $? would report tail's
./zig-out/bin/zig build test-zig0 -Dzig0-cc=tcc 2>&1 | tail -10 ; echo "EXIT: ${PIPESTATUS[0]}"
```
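When output is piped through `tail`, a shell's `$?` reflects the last command in the pipeline unless the shell is told otherwise. A sketch of the same verification driven from Python, which reads the child's return code directly; `run_and_tail` is a hypothetical helper.

```python
import subprocess


def run_and_tail(cmd, lines=10):
    """Run cmd with stderr merged into stdout; return (exit_code, last lines)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1
        text=True,
    )
    tail = proc.stdout.splitlines()[-lines:]
    return proc.returncode, tail
```

For example, `run_and_tail(["./zig-out/bin/zig", "build", "test-zig0", "-Dzig0-cc=tcc"])` returns the build's own exit code alongside the last ten lines of output.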
**If verification passes (EXIT 0, no failures):**
Stage and commit the changes:
```sh
git add <git-add-list from configuration table>
git commit -m "<COMMIT_MSG>

Co-Authored-By: <model name>"
```
Go back to Step 1.
**If verification fails:**
The worker's changes broke something. Revert all uncommitted changes:
```sh
git checkout -- <git-add-list>
```
Re-enable the test (it was reverted too) and dispatch a fresh worker
(Step 3), noting in the context that the previous attempt caused a
regression.
**Guard rail:** If verification fails twice in a row for the same test,
stop the loop and report to the user.
### Step 6: Repeat
Go back to Step 1. **Never stop early** -- continue until all disabled
tests are enabled.
## Commit message conventions
- Batch of trivially-passing tests: `"<test_file>: enable <file1>, <file2>, ..."`
- Single test enabled with C fixes: use the worker's COMMIT_MSG
- Partial progress with unit tests: `"sema: <worker COMMIT_MSG>"`
- If the list is too long: `"<test_file>: enable N files from <dir>/"`
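The batch conventions can be sketched as below. `batch_commit_message` is hypothetical, and the cutoff of five listed files is an assumption, since the text does not say when a list counts as too long.

```python
import posixpath


def batch_commit_message(test_file, enabled, max_listed=5):
    """Build the commit subject for a batch of trivially-passing tests."""
    if len(enabled) <= max_listed:
        return f"{test_file}: enable {', '.join(enabled)}"
    dirs = {posixpath.dirname(path) for path in enabled}
    # Fall back to a generic label when the files span several directories.
    where = f"{dirs.pop()}/" if len(dirs) == 1 else "several directories"
    return f"{test_file}: enable {len(enabled)} files from {where}"
```

The "several directories" fallback is also an assumption; the conventions above only cover the single-directory case.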
## Final Check
When no disabled tests remain (or the loop was stopped for a stuck test),
run the full suite with valgrind (this is the only place where valgrind
is used -- per-iteration verification uses the fast `test-zig0` command):
```sh
# bash: PIPESTATUS[0] is zig's exit status; plain $? would report head's
./zig-out/bin/zig build all-zig0 -Dvalgrind |& grep -v Warning | head -10 ; echo "EXIT: ${PIPESTATUS[0]}"
```
Must exit 0 with no unexpected output.

@@ -1,251 +0,0 @@
# Enable Tests -- Worker
You are a worker agent fixing a test failure. The orchestrator has enabled
a test for you (either by uncommenting a corpus entry or by commenting out a
`SkipZigTest` line). Your job: diagnose the failure, port the fix from
upstream Zig to C, and make the test pass.
This is a **mechanical translation** -- no creativity, no invention. When the
C code differs from Zig, copy the Zig structure into C.
**Your goal is to make the test pass.** Do not re-disable the test unless you
have exhausted your iteration budget and cannot make further progress.
**NEVER skip a failing test.** If a test needs complex features, break it
into smaller pieces and build up. See the "Drill-Down Strategy" section.
## Context from orchestrator
{{TEST_CONTEXT}}
## Workflow
### Step 1: Run the test
```sh
cd ~/code/zig
./zig-out/bin/zig build test-zig0 -Dzig0-cc=tcc 2>&1 | tail -50
```
Capture the last ~50 lines. If tests pass, skip to Step 5.
### Step 2: Determine failure type
From the output, identify the failure:
**"Zig function '...' not found in C output":**
- C sema doesn't produce a function that Zig does. This means a sema
feature is missing (e.g. `zirFunc`, inline function analysis, etc.).
- Fix target: `stage0/sema.c`, reference: `src/Sema.zig`
- This is the most common failure type. Port the missing sema
functionality from upstream.
**Parser failure** (stack trace through `parser_test` or `expectAstConsistent`):
- AST node/token mismatch between C and Zig parsers
- Rendered output doesn't match source (canonical form)
- Fix target: `stage0/parser.c`, reference: `lib/std/zig/Parse.zig`
**AstGen failure** (stack trace through `astgen_test` or `expectEqualZir`):
- `has_compile_errors` -- C AstGen emitted compile errors
- `expectEqualZir` -- ZIR instruction/extra/string mismatch
- `unhandled tag N` -- missing tag in `expectEqualData`/`dataMatches`
- Fix target: `stage0/astgen.c` (or test zig for unhandled tags),
reference: `lib/std/zig/AstGen.zig`
**Sema failure** (stack trace through `sema` or Air checks):
- `has_compile_errors` -- C Sema emitted compile errors
- Air instruction/extra mismatch
- Fix target: `stage0/sema.c`, reference: `src/Sema.zig`
### Step 3: Analyze the failure
**For "function not found in C output":** Identify which sema feature
is missing. Check what ZIR instructions the source file produces and
which sema handler needs to be ported. Common missing features:
- `zirFunc` / `zirFuncFancy` -- function declarations
- `zirCall` -- function calls
- Various `zir*` handlers for specific instructions
**For `has_compile_errors`:** Temporarily add `#include <stdio.h>` and
`fprintf(stderr, ...)` to `setCompileError()` (or the sema equivalent)
to find which error fires. Run the test again and note the function
and line.
**For ZIR mismatch:** Note `inst_len`, `extra_len`, `string_bytes_len`
diffs and the first tag mismatch position.
**For `unhandled tag N`:** Add the missing tag to `expectEqualData` and
`dataMatches` switch statements in the test file.
**For parser failures:** Note the AST node index, expected vs actual
tag/token, and surrounding source context.
### Step 4: Compare and port (or drill down)
Find the upstream Zig function that corresponds to the failing code path.
Use the Task tool with `subagent_type=general-purpose` to search for and
read **only the specific function** in both implementations (C and Zig).
Do NOT read entire files -- request just the relevant function(s).
**If the fix is straightforward** (a missing case, a wrong field, etc.),
apply the minimal mechanical change and test.
**If the fix requires complex new infrastructure** (e.g., a corpus test
needs inline function calls, cross-module imports, etc.), use the
**Drill-Down Strategy** below.
After each change, run:
```sh
./zig-out/bin/zig build test-zig0 -Dzig0-cc=tcc 2>&1 | tail -20
```
Check that no previously-passing tests broke.
**Progress** means any of:
- `inst_len` / `extra_len` / `string_bytes_len` diff decreased
- First tag mismatch position moved later
- Compile errors resolved (even if other mismatch remains)
- AST mismatch moved to a later node
- A different function name now appears in "not found" error
- The error changes from "not found" to a different type
- New unit tests were added and pass
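For the ZIR-mismatch case, the first two criteria can be checked mechanically by comparing the numbers noted before and after a change. A sketch under assumptions: `ZirDiff` and `is_progress` are hypothetical, and the fields hold absolute differences copied by hand from the test output.

```python
from dataclasses import dataclass


@dataclass
class ZirDiff:
    inst_len: int          # absolute difference in instruction count
    extra_len: int         # absolute difference in extra array length
    string_bytes_len: int  # absolute difference in string bytes length
    first_mismatch: int    # index of the first mismatching tag


def is_progress(before, after):
    """True if any length diff shrank or the first mismatch moved later."""
    shrank = (
        after.inst_len < before.inst_len
        or after.extra_len < before.extra_len
        or after.string_bytes_len < before.string_bytes_len
    )
    return shrank or after.first_mismatch > before.first_mismatch
```

The remaining criteria (resolved compile errors, changed error type, new unit tests) are qualitative and are not modelled here.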
## Drill-Down Strategy
When a test (typically a corpus test in `stages_test.zig`) fails because the
C sema is missing multiple features, **do NOT skip or comment out the test
and give up**. Instead, decompose the problem:
### 1. Analyze what's missing
Dump the ZIR for the failing file:
```sh
./zig-out/bin/zig ast-check -t <file>
```
Identify which ZIR instructions in the function body are not handled by the
C sema's `analyzeBodyInner`. Also dump the AIR to see what the Zig compiler
produces:
```sh
./zig-out/bin/zig build-obj <file> --verbose-air -target wasm32-wasi -OReleaseFast
```
### 2. Write the smallest unit test for each missing piece
For each unhandled ZIR instruction, write the smallest possible
`semaAirRawCheck` test in the unit test file that exercises ONLY that
instruction. Examples:
| Missing ZIR | Minimal unit test |
|---|---|
| `shl` | `"export fn f(x: u32) u32 { return x << 1; }"` |
| `int_cast` | `"export fn f(x: u16) u32 { return @intCast(x); }"` |
| `mul` | `"export fn f(x: u32, y: u32) u32 { return x * y; }"` |
| `store_node` | `"export fn f(x: *u32) void { x.* = 42; }"` |
| `dbg_var_val` | `"export fn f(x: u32) u32 { const y = x + 1; return y; }"` |
**Important:** each unit test must be SMALL -- one new feature at a time.
If a test needs SEVERAL new features (e.g., `shl` needs `typeof_log2_int_type`
AND `as_shift_operand` AND `shl` itself), write separate tests if possible,
or at minimum understand all of them before implementing.
### 3. Implement one handler at a time
For each unit test:
1. Check the ZIR data format (`zig ast-check -t` on a temp file)
2. Read the upstream Zig handler in `src/Sema.zig` (use Grep, don't read
the whole file)
3. Port the simplified version to C
4. Run tests, verify the unit test passes AND no regressions
### 4. Build up to more complex constructs
After individual instructions work, combine them:
- Simple → compound: first `add`, then `"(x + y) ^ 0xFF"`
- Same type → cross type: first `u32 + u32`, then `@intCast(u16) + u32`
- Single function → inline call: first individual ops, then
`dbg_inline_block` + `br` + `dbg_arg_inline`
### 5. Re-test the corpus test
After all constituent pieces work as unit tests, try enabling the corpus
test again. If it still fails with a NEW error (different instruction),
repeat from step 1. If it passes, you're done.
### Key principle: the unit test file always stays green
- Every unit test you add MUST pass before you return.
- NEVER commit a failing test. If a test fails, fix the handler first.
- NEVER comment out or skip a failing unit test you just added -- either
make it pass or remove it.
- The corpus test stays disabled until ALL its pieces work, then it's
enabled in a final commit.
### Step 5: Clean up
1. Remove ALL `fprintf`/`printf` debug statements from C files.
2. Remove `#include <stdio.h>` if it was added for debugging.
3. Verify with: `grep -n 'fprintf\|printf' stage0/astgen.c stage0/parser.c stage0/sema.c`
4. If the corpus test still fails after exhausting your iteration budget:
- **For SkipZigTest:** re-add the `if (true) return error.SkipZigTest;`
line with a TODO comment describing the remaining diff.
- **For corpus entry:** re-comment the line (add `//` prefix back) and
update the comment describing what's still missing.
- **DO report `progress`** if you added any unit tests + handlers, even
if the corpus test isn't enabled yet. The partial work IS valuable.
5. Final verification -- this must exit 0:
```sh
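   # bash: PIPESTATUS[0] is zig's exit status; plain $? would report tail's
   ./zig-out/bin/zig build test-zig0 -Dzig0-cc=tcc 2>&1 | tail -5 ; echo "EXIT: ${PIPESTATUS[0]}"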
```
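The debug-print check in item 3 can also be sketched as a scan that matches only `printf` and `fprintf` calls and ignores relatives such as `snprintf`. `find_debug_prints` is a hypothetical helper and takes the C source as a string.

```python
import re

# \b keeps snprintf/vsnprintf from matching; only call sites are reported.
DEBUG_PRINT_RE = re.compile(r"\b(?:fprintf|printf)\s*\(")


def find_debug_prints(source):
    """Return (line_number, text) for every printf/fprintf call in C source."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if DEBUG_PRINT_RE.search(line):
            hits.append((number, line.strip()))
    return hits
```

An empty result for every modifiable C file is the condition to meet before returning.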
### Step 6: Return result
You MUST end your response with exactly this format:
```
STATUS: pass | progress | no-progress
COMMIT_MSG: <one-line descriptive message about what was ported/fixed>
```
- `pass` -- the enabled test now passes (entry remains enabled)
- `progress` -- partial progress was made toward making the test pass
(entry was re-disabled with updated comment); C code changes and/or new
unit tests are worth committing
- `no-progress` -- no measurable improvement (entry was re-disabled)
## Rules
- **Mechanical copy only.** Do not invent new approaches. If the upstream does
X, do X in C.
- **Never remove zig-cache.**
- **NEVER print to stdout/stderr in committed code.** Debug prints are
temporary only. Before returning, grep for `fprintf|printf` in all C files
and remove any you find.
- **Do NOT commit.** The orchestrator handles all commits.
- **Functions must appear in the same order as in the upstream Zig file.**
- **Prefer finding systematic differences** instead of debugging by bisection.
Zig code is bug-free for the purposes of porting. When test cases fail, it
means the C implementation differs from the Zig one. Making implementations
consistent is the correct approach.
- Of the `./zig-out/bin/zig build` steps, run **only** the `*-zig0` ones
  (e.g. `test-zig0`). Other steps may build/test Zig itself, which takes
  ages and is unnecessary.
- When reading upstream reference files, **never read the whole file**.
Use Grep to find the function, then Read with offset/limit to get only the
relevant section. AstGen.zig is 568K, Sema.zig is 1.6M -- reading them
whole will blow your context.
- **Budget your iterations.** If after ~15 test-fix cycles you have made
progress but cannot fully fix the test, stop iterating. Clean up (Step 5)
and report `progress`. The orchestrator will commit partial work and
dispatch you again later. Do not exhaust your context chasing diminishing
returns.
- **Avoid modifying files under `src/`.** Changing any `src/` file invalidates
the `verbose_dumper` cache, triggering a ~7 minute recompilation. To inspect
AIR, use `zig-out/bin/zig build-obj --verbose-air <file>`. If you do need to
change `src/` to expose compiler internals, change the C↔Zig API boundary
(e.g. `src/test_exports.zig`, `src/verbose_air.zig`) so the internals are
properly exposed and maintainable long-term, and commit the change.
- **NEVER skip a test.** If a feature is complex, break it into smaller
pieces with unit tests. Every iteration should produce at least one new
passing unit test.
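The rule about never reading a whole reference file can be sketched as a streaming search that stops after a bounded window. `read_function_window` is a hypothetical helper, and the default of 120 lines is an arbitrary assumption.

```python
import re


def read_function_window(path, fn_name, max_lines=120):
    """Return (start_line, lines) for the first definition of fn_name."""
    pattern = re.compile(r"\bfn\s+" + re.escape(fn_name) + r"\s*\(")
    window = []
    start = None
    with open(path, encoding="utf-8") as handle:
        # Iterating the handle streams the file line by line.
        for number, line in enumerate(handle, start=1):
            if start is None and pattern.search(line):
                start = number
            if start is not None:
                window.append(line.rstrip("\n"))
                if len(window) >= max_lines:
                    break
    return start, window
```

If the function is not found, the result is `(None, [])`, and at no point is more than `max_lines` of the file held in memory.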