Add 'stage0/' from commit 'b3d106ec971300a9c745f4681fab3df7518c4346'
git-subtree-dir: stage0
git-subtree-mainline: 3db960767d
git-subtree-split: b3d106ec97
stage0/.clang-format (new file, 3 lines)
@@ -0,0 +1,3 @@
BasedOnStyle: WebKit
BreakBeforeBraces: Attach
ColumnLimit: 79
stage0/.claude/skills/port-astgen/SKILL.md (new file, 122 lines)
@@ -0,0 +1,122 @@
---
name: port-astgen
description: Iteratively port AstGen.zig to astgen.c by enabling skipped corpus tests, finding divergences, and mechanically copying upstream code.
allowed-tools: Read, Write, Edit, Bash, Grep, Glob, Task
disable-model-invocation: true
---

# Port AstGen — Iterative Corpus Test Loop

You are porting `AstGen.zig` to `astgen.c`. This is a **mechanical
translation** — no creativity, no invention. When the C code differs
from Zig, copy the Zig structure into C.

## Key files

- `astgen.c` — C implementation (modify this)
- `astgen_test.zig` — corpus tests (enable/skip tests here)
- `~/code/zig/lib/std/zig/AstGen.zig` — upstream reference (~14k lines)
- `~/code/zig/lib/std/zig/Ast.zig` — AST node accessors
- `~/code/zig/lib/std/zig/Zir.zig` — ZIR instruction definitions

## Loop

Repeat the following steps until all corpus tests pass or you've made
3 consecutive iterations with zero progress.

### Step 1: Find the first skipped corpus test

Search `astgen_test.zig` for lines matching:
```
if (true) return error.SkipZigTest
```
Pick the first one. If none found, all corpus tests pass — stop.

### Step 2: Enable it

Remove or comment out the `if (true) return error.SkipZigTest` line.

### Step 3: Run tests

```sh
zig build test 2>&1
```

Record the output. If tests pass, go to Step 7.

### Step 4: Analyze the failure

From the test output, determine the failure type:

- **`has_compile_errors`**: Temporarily add `#include <stdio.h>` and
  `fprintf(stderr, ...)` to `setCompileError()` in `astgen.c` to find
  which `SET_ERROR` fires (see the sketch after this list). Run the test
  again and note the function and line.
- **`zir mismatch`**: Note `inst_len`, `extra_len`, `string_bytes_len`
  diffs and the first tag mismatch position.
- **`unhandled tag N`**: Add the missing ZIR tag to the `expectEqualData`
  and `dataMatches` switch statements in `astgen_test.zig`.
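
For the `has_compile_errors` case, the temporary instrumentation can look
like the sketch below. This is only an illustration: the real
`setCompileError()` lives in `astgen.c` (whose diff is suppressed further
down), so its parameter list and the `AstGen` context type used here are
assumptions, not copied code.

```c
/* Temporary debugging aid for Step 4 -- remove it again in Step 7.
 * Hypothetical signature; adapt to the real setCompileError() in astgen.c. */
#include <stdio.h>

static void setCompileError(AstGen* ag, AstNodeIndex node, const char* msg) {
    /* Printing the message lets you grep astgen.c for the SET_ERROR site
     * that produced it, then note the surrounding function and line. */
    fprintf(stderr, "compile error at node %u: %s\n", node, msg);
    /* ... original body unchanged ... */
}
```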

### Step 5: Compare implementations

Find the upstream Zig function that corresponds to the failing code
path. Use the Task tool with `subagent_type=general-purpose` to read
both implementations and enumerate **every difference**.

Focus on differences that affect output:
- Extra data written (field order, conditional fields, body lengths)
- Instruction tags emitted
- String table entries
- Break payload values (operand_src_node)

Do NOT guess. Read both implementations completely and compare
mechanically.

### Step 6: Port the fix

Apply the minimal mechanical change to `astgen.c` to match the upstream.
Run `zig build test` after each change to check for progress.

**Progress** means any of:
- `inst_len` diff decreased
- `extra_len` diff decreased
- `string_bytes_len` diff decreased
- First tag mismatch position moved later

If after porting a fix the test still fails but progress was made,
continue to Step 7 (commit progress, re-skip).

### Step 7: Clean up and commit

1. If the corpus test still fails: re-add the `SkipZigTest` line with
   a TODO comment describing the remaining diff.
2. Remove ALL `fprintf`/`printf` debug statements from `astgen.c`.
3. Remove `#include <stdio.h>` if it was added for debugging.
4. Verify: `zig build fmt && zig build all` must exit 0 with no unexpected output.
5. Commit:
```sh
git add astgen.c astgen_test.zig
git commit -m "<descriptive message>

Co-Authored-By: <whatever model is running this>"
```

### Step 8: Repeat

Go back to Step 1.

## Rules

- **Mechanical copy only.** Do not invent new approaches. If the upstream does
  X, do X in C.
- **Never remove zig-cache.**
- **Never print to stdout/stderr in committed code.** Debug prints are
  temporary only.
- **Functions must appear in the same order as in the upstream Zig file.**
- **Commit after every iteration**, even partial positive progress.
- **Prefer finding systematic differences over hunting individual bugs.** For
  the purposes of porting, the Zig code is bug-free: when a test case fails,
  it means the C implementation diverges from the Zig one, and that divergence
  is the bug. Standard "bug hunting" methods therefore no longer apply --
  making the implementations consistent is a better approach in every way.
stage0/.gitignore (new file, vendored, 3 lines)
@@ -0,0 +1,3 @@
/.zig-cache/
/zig-out/
*.o
stage0/CLAUDE.md (new file, 25 lines)
@@ -0,0 +1,25 @@
- when porting features from upstream Zig, it should be a mechanical copy.
  Don't invent. Most of what you are doing has already been invented; it only
  needs to be re-done in C. Keep the structure in place, name functions and
  types the same way (or, within reason, equivalently if there are namespacing
  constraints). It should be easy to reference one from the other; and, if
  there are semantic differences, they *must* be because Zig or C does not
  support certain features (like errdefer).
- See README.md for useful information about this project, incl. how to test
  this.
- **Never ever** remove zig-cache, neither local nor global.
- Zig code is in ~/code/zig, don't look at /nix/...
- when translating functions from Zig to C (mechanically, remember?), add them
  in the same order as in the original Zig file.
- debug printfs: add printfs only when debugging a specific issue; when done
  debugging, remove them (or comment them out if you may find them useful
  later). I prefer committing code only when `zig build` returns no output.
- Always complete all tasks before stopping. Do not stop to ask for
  confirmation mid-task. If you have remaining work, continue without waiting
  for input.
- no `cppcheck` suppressions; the checks are there for a reason. If it is
  complaining about an automatic variable, make the variable non-automatic
  (see the sketch after this list). I.e. find a way to satisfy the linter, do
  not suppress it.
- if you are in the middle of porting AstGen, load up the skill
  .claude/skills/port-astgen/SKILL.md and proceed with it.
- remember: **mechanical copy** when porting existing stuff, no new creativity.
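
A hedged illustration of the cppcheck rule above (the function and buffer
names are invented for the example): when the linter flags a large automatic
variable, change its storage instead of adding a suppression.

```c
/* Hypothetical before/after for a cppcheck warning about a large
 * automatic (stack) variable. */

/* Before: flagged -- 64 KiB allocated on the stack. */
void render_before(void) {
    char scratch[64 * 1024];
    (void)scratch;
}

/* After: non-automatic storage satisfies the check without a suppression. */
void render_after(void) {
    static char scratch[64 * 1024];
    (void)scratch;
}
```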
stage0/LICENSE (new file, 34 lines)
@@ -0,0 +1,34 @@
NOTICE TO PROSPECTIVE UPSTREAM CONTRIBUTORS

This software is licensed under the MIT License below. However, the
author politely but firmly requests that you do not submit this work, or
any derivative thereof, to the Zig project upstream unless you have
obtained explicit written permission from a Zig core team member
authorizing the submission.

This notice is not a license restriction. The MIT License governs all
use of this software. This is a social contract: please honor it.

---

The MIT License (Expat)

Copyright (c) Motiejus Jakštys

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
stage0/README.md (new file, 45 lines)
@@ -0,0 +1,45 @@
# About

zig0 aspires to be an interpreter of zig 0.15.1 written in C.

This is written with help from an LLM:

- Lexer:
  - Data structures 100% human.
  - Helper functions 100% human.
  - Lexing functions 50/50 human/bot.
- Parser:
  - Data structures 100% human.
  - Helper functions 50/50.
  - Parser functions 5/95 human/bot.
- AstGen: TBD.

# Testing

Quick test:

    zig build fmt test

Full test and static analysis with all supported compilers and valgrind (run
before commit, takes a while):

    zig build -Dvalgrind

# Debugging tips

Test runs forever? Build the test program executable:

    $ zig build test -Dno-exec

And then run it, capturing the stack trace:

```
gdb -batch \
    -ex "python import threading; threading.Timer(1.0, lambda: gdb.post_event(lambda: gdb.execute('interrupt'))).start()" \
    -ex run \
    -ex "bt full" \
    -ex quit \
    zig-out/bin/test
```

You are welcome to replace `-ex "bt full"` with anything else of interest.
stage0/ast.c (new file, 122 lines)
@@ -0,0 +1,122 @@
#include "common.h"

#include <setjmp.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#include "ast.h"
#include "parser.h"

/* Initial capacity of the parser's extra_data and scratch slices. */
#define N 1024

static void astTokenListEnsureCapacity(
    AstTokenList* list, uint32_t additional) {
    const uint32_t new_len = list->len + additional;
    if (new_len <= list->cap) {
        return;
    }

    const uint32_t new_cap = new_len > list->cap * 2 ? new_len : list->cap * 2;
    list->tags = realloc(list->tags, new_cap * sizeof(TokenizerTag));
    list->starts = realloc(list->starts, new_cap * sizeof(AstIndex));
    if (!list->tags || !list->starts)
        exit(1);
    list->cap = new_cap;
}

Ast astParse(const char* source, const uint32_t len) {
    uint32_t estimated_token_count = len / 8;

    AstTokenList tokens = {
        .len = 0,
        .cap = estimated_token_count,
        .tags = ARR_INIT(TokenizerTag, estimated_token_count),
        .starts = ARR_INIT(AstIndex, estimated_token_count),
    };

    Tokenizer tok = tokenizerInit(source, len);
    while (true) {
        astTokenListEnsureCapacity(&tokens, 1);
        TokenizerToken token = tokenizerNext(&tok);
        tokens.tags[tokens.len] = token.tag;
        tokens.starts[tokens.len++] = token.loc.start;
        if (token.tag == TOKEN_EOF)
            break;
    }

    uint32_t estimated_node_count = (tokens.len + 2) / 2;

    char err_buf[PARSE_ERR_BUF_SIZE];
    err_buf[0] = '\0';

    Parser p = {
        .source = source,
        .source_len = len,
        .token_tags = tokens.tags,
        .token_starts = tokens.starts,
        .tokens_len = tokens.len,
        .tok_i = 0,
        .nodes = {
            .len = 0,
            .cap = estimated_node_count,
            .tags = ARR_INIT(AstNodeTag, estimated_node_count),
            .main_tokens = ARR_INIT(AstTokenIndex, estimated_node_count),
            .datas = ARR_INIT(AstData, estimated_node_count),
        },
        .extra_data = SLICE_INIT(AstNodeIndex, N),
        .scratch = SLICE_INIT(AstNodeIndex, N),
        .err_buf = err_buf,
    };

    /* Fatal parse errors longjmp back here via p.error_jmp. */
    bool has_error = false;
    if (setjmp(p.error_jmp) != 0) {
        has_error = true;
    }
    if (!has_error)
        parseRoot(&p);

    p.scratch.cap = p.scratch.len = 0;
    free(p.scratch.arr);

    char* err_msg = NULL;
    if (has_error && err_buf[0] != '\0') {
        const size_t len2 = strlen(err_buf);
        err_msg = malloc(len2 + 1);
        if (!err_msg)
            exit(1);
        memcpy(err_msg, err_buf, len2 + 1);
    }

    return (Ast) {
        .source = source,
        .source_len = len,
        .tokens = tokens,
        .nodes = p.nodes,
        .extra_data = {
            .len = p.extra_data.len,
            .cap = p.extra_data.cap,
            .arr = p.extra_data.arr,
        },
        .has_error = has_error,
        .err_msg = err_msg,
    };
}

void astDeinit(Ast* tree) {
    free(tree->err_msg);

    tree->tokens.cap = tree->tokens.len = 0;
    free(tree->tokens.tags);
    free(tree->tokens.starts);

    tree->nodes.cap = 0;
    tree->nodes.len = 0;
    free(tree->nodes.tags);
    free(tree->nodes.main_tokens);
    free(tree->nodes.datas);

    tree->extra_data.cap = 0;
    tree->extra_data.len = 0;
    free(tree->extra_data.arr);
}
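
A minimal usage sketch for the API implemented above. It relies only on what
`ast.h` declares (`astParse`, `astDeinit`, and the `Ast` fields it reads); the
sample source string is arbitrary and the program is not part of the project.

```c
#include <stdio.h>
#include <string.h>

#include "ast.h"

int main(void) {
    const char* src = "const x = 1;";
    Ast tree = astParse(src, (uint32_t)strlen(src));

    if (tree.has_error) {
        /* err_msg may be NULL when no message was recorded. */
        fprintf(stderr, "parse error: %s\n",
            tree.err_msg ? tree.err_msg : "(no message)");
    } else {
        printf("tokens: %u, nodes: %u\n",
            (unsigned)tree.tokens.len, (unsigned)tree.nodes.len);
    }

    int rc = tree.has_error ? 1 : 0;
    astDeinit(&tree);
    return rc;
}
```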
stage0/ast.h (new file, 625 lines)
@@ -0,0 +1,625 @@
#ifndef _ZIG0_AST_H__
#define _ZIG0_AST_H__

#include <stdbool.h>
#include <stdint.h>

#include "common.h"
#include "tokenizer.h"

typedef enum {
    /// sub_list[lhs...rhs]
    AST_NODE_ROOT,
    /// `usingnamespace lhs;`. rhs unused. main_token is `usingnamespace`.
    AST_NODE_USINGNAMESPACE,
    /// lhs is test name token (must be string literal or identifier), if any.
    /// rhs is the body node.
    AST_NODE_TEST_DECL,
    /// lhs is the index into extra_data.
    /// rhs is the initialization expression, if any.
    /// main_token is `var` or `const`.
    AST_NODE_GLOBAL_VAR_DECL,
    /// `var a: x align(y) = rhs`
    /// lhs is the index into extra_data.
    /// main_token is `var` or `const`.
    AST_NODE_LOCAL_VAR_DECL,
    /// `var a: lhs = rhs`. lhs and rhs may be unused.
    /// Can be local or global.
    /// main_token is `var` or `const`.
    AST_NODE_SIMPLE_VAR_DECL,
    /// `var a align(lhs) = rhs`. lhs and rhs may be unused.
    /// Can be local or global.
    /// main_token is `var` or `const`.
    AST_NODE_ALIGNED_VAR_DECL,
    /// lhs is the identifier token payload if any,
    /// rhs is the deferred expression.
    AST_NODE_ERRDEFER,
    /// lhs is unused.
    /// rhs is the deferred expression.
    AST_NODE_DEFER,
    /// lhs catch rhs
    /// lhs catch |err| rhs
    /// main_token is the `catch` keyword.
    /// payload is determined by looking at the next token after the `catch`
    /// keyword.
    AST_NODE_CATCH,
    /// `lhs.a`. main_token is the dot. rhs is the identifier token index.
    AST_NODE_FIELD_ACCESS,
    /// `lhs.?`. main_token is the dot. rhs is the `?` token index.
    AST_NODE_UNWRAP_OPTIONAL,
    /// `lhs == rhs`. main_token is op.
    AST_NODE_EQUAL_EQUAL,
    /// `lhs != rhs`. main_token is op.
    AST_NODE_BANG_EQUAL,
    /// `lhs < rhs`. main_token is op.
    AST_NODE_LESS_THAN,
    /// `lhs > rhs`. main_token is op.
    AST_NODE_GREATER_THAN,
    /// `lhs <= rhs`. main_token is op.
    AST_NODE_LESS_OR_EQUAL,
    /// `lhs >= rhs`. main_token is op.
    AST_NODE_GREATER_OR_EQUAL,
    /// `lhs *= rhs`. main_token is op.
    AST_NODE_ASSIGN_MUL,
    /// `lhs /= rhs`. main_token is op.
    AST_NODE_ASSIGN_DIV,
    /// `lhs %= rhs`. main_token is op.
    AST_NODE_ASSIGN_MOD,
    /// `lhs += rhs`. main_token is op.
    AST_NODE_ASSIGN_ADD,
    /// `lhs -= rhs`. main_token is op.
    AST_NODE_ASSIGN_SUB,
    /// `lhs <<= rhs`. main_token is op.
    AST_NODE_ASSIGN_SHL,
    /// `lhs <<|= rhs`. main_token is op.
    AST_NODE_ASSIGN_SHL_SAT,
    /// `lhs >>= rhs`. main_token is op.
    AST_NODE_ASSIGN_SHR,
    /// `lhs &= rhs`. main_token is op.
    AST_NODE_ASSIGN_BIT_AND,
    /// `lhs ^= rhs`. main_token is op.
    AST_NODE_ASSIGN_BIT_XOR,
    /// `lhs |= rhs`. main_token is op.
    AST_NODE_ASSIGN_BIT_OR,
    /// `lhs *%= rhs`. main_token is op.
    AST_NODE_ASSIGN_MUL_WRAP,
    /// `lhs +%= rhs`. main_token is op.
    AST_NODE_ASSIGN_ADD_WRAP,
    /// `lhs -%= rhs`. main_token is op.
    AST_NODE_ASSIGN_SUB_WRAP,
    /// `lhs *|= rhs`. main_token is op.
    AST_NODE_ASSIGN_MUL_SAT,
    /// `lhs +|= rhs`. main_token is op.
    AST_NODE_ASSIGN_ADD_SAT,
    /// `lhs -|= rhs`. main_token is op.
    AST_NODE_ASSIGN_SUB_SAT,
    /// `lhs = rhs`. main_token is op.
    AST_NODE_ASSIGN,
    /// `a, b, ... = rhs`. main_token is op. lhs is index into `extra_data`
    /// of an lhs elem count followed by an array of that many `Node.Index`,
    /// with each node having one of the following types:
    /// * `global_var_decl`
    /// * `local_var_decl`
    /// * `simple_var_decl`
    /// * `aligned_var_decl`
    /// * Any expression node
    /// The first 3 types correspond to a `var` or `const` lhs node (note
    /// that their `rhs` is always 0). An expression node corresponds to a
    /// standard assignment LHS (which must be evaluated as an lvalue).
    /// There may be a preceding `comptime` token, which does not create a
    /// corresponding `comptime` node so must be manually detected.
    AST_NODE_ASSIGN_DESTRUCTURE,
    /// `lhs || rhs`. main_token is the `||`.
    AST_NODE_MERGE_ERROR_SETS,
    /// `lhs * rhs`. main_token is the `*`.
    AST_NODE_MUL,
    /// `lhs / rhs`. main_token is the `/`.
    AST_NODE_DIV,
    /// `lhs % rhs`. main_token is the `%`.
    AST_NODE_MOD,
    /// `lhs ** rhs`. main_token is the `**`.
    AST_NODE_ARRAY_MULT,
    /// `lhs *% rhs`. main_token is the `*%`.
    AST_NODE_MUL_WRAP,
    /// `lhs *| rhs`. main_token is the `*|`.
    AST_NODE_MUL_SAT,
    /// `lhs + rhs`. main_token is the `+`.
    AST_NODE_ADD,
    /// `lhs - rhs`. main_token is the `-`.
    AST_NODE_SUB,
    /// `lhs ++ rhs`. main_token is the `++`.
    AST_NODE_ARRAY_CAT,
    /// `lhs +% rhs`. main_token is the `+%`.
    AST_NODE_ADD_WRAP,
    /// `lhs -% rhs`. main_token is the `-%`.
    AST_NODE_SUB_WRAP,
    /// `lhs +| rhs`. main_token is the `+|`.
    AST_NODE_ADD_SAT,
    /// `lhs -| rhs`. main_token is the `-|`.
    AST_NODE_SUB_SAT,
    /// `lhs << rhs`. main_token is the `<<`.
    AST_NODE_SHL,
    /// `lhs <<| rhs`. main_token is the `<<|`.
    AST_NODE_SHL_SAT,
    /// `lhs >> rhs`. main_token is the `>>`.
    AST_NODE_SHR,
    /// `lhs & rhs`. main_token is the `&`.
    AST_NODE_BIT_AND,
    /// `lhs ^ rhs`. main_token is the `^`.
    AST_NODE_BIT_XOR,
    /// `lhs | rhs`. main_token is the `|`.
    AST_NODE_BIT_OR,
    /// `lhs orelse rhs`. main_token is the `orelse`.
    AST_NODE_ORELSE,
    /// `lhs and rhs`. main_token is the `and`.
    AST_NODE_BOOL_AND,
    /// `lhs or rhs`. main_token is the `or`.
    AST_NODE_BOOL_OR,
    /// `op lhs`. rhs unused. main_token is op.
    AST_NODE_BOOL_NOT,
    /// `op lhs`. rhs unused. main_token is op.
    AST_NODE_NEGATION,
    /// `op lhs`. rhs unused. main_token is op.
    AST_NODE_BIT_NOT,
    /// `op lhs`. rhs unused. main_token is op.
    AST_NODE_NEGATION_WRAP,
    /// `op lhs`. rhs unused. main_token is op.
    AST_NODE_ADDRESS_OF,
    /// `op lhs`. rhs unused. main_token is op.
    AST_NODE_TRY,
    /// `op lhs`. rhs unused. main_token is op.
    AST_NODE_AWAIT,
    /// `?lhs`. rhs unused. main_token is the `?`.
    AST_NODE_OPTIONAL_TYPE,
    /// `[lhs]rhs`.
    AST_NODE_ARRAY_TYPE,
    /// `[lhs:a]b`. `ArrayTypeSentinel[rhs]`.
    AST_NODE_ARRAY_TYPE_SENTINEL,
    /// `[*]align(lhs) rhs`. lhs can be omitted.
    /// `*align(lhs) rhs`. lhs can be omitted.
    /// `[]rhs`.
    /// main_token is the asterisk if a single item pointer or the lbracket
    /// if a slice, many-item pointer, or C-pointer
    /// main_token might be a ** token, which is shared with a parent/child
    /// pointer type and may require special handling.
    AST_NODE_PTR_TYPE_ALIGNED,
    /// `[*:lhs]rhs`. lhs can be omitted.
    /// `*rhs`.
    /// `[:lhs]rhs`.
    /// main_token is the asterisk if a single item pointer or the lbracket
    /// if a slice, many-item pointer, or C-pointer
    /// main_token might be a ** token, which is shared with a parent/child
    /// pointer type and may require special handling.
    AST_NODE_PTR_TYPE_SENTINEL,
    /// lhs is index into ptr_type. rhs is the element type expression.
    /// main_token is the asterisk if a single item pointer or the lbracket
    /// if a slice, many-item pointer, or C-pointer
    /// main_token might be a ** token, which is shared with a parent/child
    /// pointer type and may require special handling.
    AST_NODE_PTR_TYPE,
    /// lhs is index into ptr_type_bit_range. rhs is the element type
    /// expression.
    /// main_token is the asterisk if a single item pointer or the lbracket
    /// if a slice, many-item pointer, or C-pointer
    /// main_token might be a ** token, which is shared with a parent/child
    /// pointer type and may require special handling.
    AST_NODE_PTR_TYPE_BIT_RANGE,
    /// `lhs[rhs..]`
    /// main_token is the lbracket.
    AST_NODE_SLICE_OPEN,
    /// `lhs[b..c]`. rhs is index into Slice
    /// main_token is the lbracket.
    AST_NODE_SLICE,
    /// `lhs[b..c :d]`. rhs is index into SliceSentinel. Slice end c can be
    /// omitted.
    /// main_token is the lbracket.
    AST_NODE_SLICE_SENTINEL,
    /// `lhs.*`. rhs is unused.
    AST_NODE_DEREF,
    /// `lhs[rhs]`.
    AST_NODE_ARRAY_ACCESS,
    /// `lhs{rhs}`. rhs can be omitted.
    AST_NODE_ARRAY_INIT_ONE,
    /// `lhs{rhs,}`. rhs can *not* be omitted
    AST_NODE_ARRAY_INIT_ONE_COMMA,
    /// `.{lhs, rhs}`. lhs and rhs can be omitted.
    AST_NODE_ARRAY_INIT_DOT_TWO,
    /// Same as `array_init_dot_two` except there is known to be a trailing
    /// comma
    /// before the final rbrace.
    AST_NODE_ARRAY_INIT_DOT_TWO_COMMA,
    /// `.{a, b}`. `sub_list[lhs..rhs]`.
    AST_NODE_ARRAY_INIT_DOT,
    /// Same as `array_init_dot` except there is known to be a trailing comma
    /// before the final rbrace.
    AST_NODE_ARRAY_INIT_DOT_COMMA,
    /// `lhs{a, b}`. `sub_range_list[rhs]`. lhs can be omitted which means
    /// `.{a, b}`.
    AST_NODE_ARRAY_INIT,
    /// Same as `array_init` except there is known to be a trailing comma
    /// before the final rbrace.
    AST_NODE_ARRAY_INIT_COMMA,
    /// `lhs{.a = rhs}`. rhs can be omitted making it empty.
    /// main_token is the lbrace.
    AST_NODE_STRUCT_INIT_ONE,
    /// `lhs{.a = rhs,}`. rhs can *not* be omitted.
    /// main_token is the lbrace.
    AST_NODE_STRUCT_INIT_ONE_COMMA,
    /// `.{.a = lhs, .b = rhs}`. lhs and rhs can be omitted.
    /// main_token is the lbrace.
    /// No trailing comma before the rbrace.
    AST_NODE_STRUCT_INIT_DOT_TWO,
    /// Same as `struct_init_dot_two` except there is known to be a trailing
    /// comma
    /// before the final rbrace.
    AST_NODE_STRUCT_INIT_DOT_TWO_COMMA,
    /// `.{.a = b, .c = d}`. `sub_list[lhs..rhs]`.
    /// main_token is the lbrace.
    AST_NODE_STRUCT_INIT_DOT,
    /// Same as `struct_init_dot` except there is known to be a trailing comma
    /// before the final rbrace.
    AST_NODE_STRUCT_INIT_DOT_COMMA,
    /// `lhs{.a = b, .c = d}`. `sub_range_list[rhs]`.
    /// lhs can be omitted which means `.{.a = b, .c = d}`.
    /// main_token is the lbrace.
    AST_NODE_STRUCT_INIT,
    /// Same as `struct_init` except there is known to be a trailing comma
    /// before the final rbrace.
    AST_NODE_STRUCT_INIT_COMMA,
    /// `lhs(rhs)`. rhs can be omitted.
    /// main_token is the lparen.
    AST_NODE_CALL_ONE,
    /// `lhs(rhs,)`. rhs can be omitted.
    /// main_token is the lparen.
    AST_NODE_CALL_ONE_COMMA,
    /// `async lhs(rhs)`. rhs can be omitted.
    AST_NODE_ASYNC_CALL_ONE,
    /// `async lhs(rhs,)`.
    AST_NODE_ASYNC_CALL_ONE_COMMA,
    /// `lhs(a, b, c)`. `SubRange[rhs]`.
    /// main_token is the `(`.
    AST_NODE_CALL,
    /// `lhs(a, b, c,)`. `SubRange[rhs]`.
    /// main_token is the `(`.
    AST_NODE_CALL_COMMA,
    /// `async lhs(a, b, c)`. `SubRange[rhs]`.
    /// main_token is the `(`.
    AST_NODE_ASYNC_CALL,
    /// `async lhs(a, b, c,)`. `SubRange[rhs]`.
    /// main_token is the `(`.
    AST_NODE_ASYNC_CALL_COMMA,
    /// `switch(lhs) {}`. `SubRange[rhs]`.
    /// `main_token` is the identifier of a preceding label, if any; otherwise
    /// `switch`.
    AST_NODE_SWITCH,
    /// Same as switch except there is known to be a trailing comma
    /// before the final rbrace
    AST_NODE_SWITCH_COMMA,
    /// `lhs => rhs`. If lhs is omitted it means `else`.
    /// main_token is the `=>`
    AST_NODE_SWITCH_CASE_ONE,
    /// Same as `switch_case_one` but the case is inline
    AST_NODE_SWITCH_CASE_INLINE_ONE,
    /// `a, b, c => rhs`. `SubRange[lhs]`.
    /// main_token is the `=>`
    AST_NODE_SWITCH_CASE,
    /// Same as `switch_case` but the case is inline
    AST_NODE_SWITCH_CASE_INLINE,
    /// `lhs...rhs`.
    AST_NODE_SWITCH_RANGE,
    /// `while (lhs) rhs`.
    /// `while (lhs) |x| rhs`.
    AST_NODE_WHILE_SIMPLE,
    /// `while (lhs) : (a) b`. `WhileCont[rhs]`.
    /// `while (lhs) |x| : (a) b`. `WhileCont[rhs]`.
    AST_NODE_WHILE_CONT,
    /// `while (lhs) : (a) b else c`. `While[rhs]`.
    /// `while (lhs) |x| : (a) b else c`. `While[rhs]`.
    /// `while (lhs) |x| : (a) b else |y| c`. `While[rhs]`.
    /// The cont expression part `: (a)` may be omitted.
    AST_NODE_WHILE,
    /// `for (lhs) rhs`.
    AST_NODE_FOR_SIMPLE,
    /// `for (lhs[0..inputs]) lhs[inputs + 1] else lhs[inputs + 2]`.
    /// `For[rhs]`.
    AST_NODE_FOR,
    /// `lhs..rhs`. rhs can be omitted.
    AST_NODE_FOR_RANGE,
    /// `if (lhs) rhs`.
    /// `if (lhs) |a| rhs`.
    AST_NODE_IF_SIMPLE,
    /// `if (lhs) a else b`. `If[rhs]`.
    /// `if (lhs) |x| a else b`. `If[rhs]`.
    /// `if (lhs) |x| a else |y| b`. `If[rhs]`.
    AST_NODE_IF,
    /// `suspend lhs`. lhs can be omitted. rhs is unused.
    AST_NODE_SUSPEND,
    /// `resume lhs`. rhs is unused.
    AST_NODE_RESUME,
    /// `continue :lhs rhs`
    /// both lhs and rhs may be omitted.
    AST_NODE_CONTINUE,
    /// `break :lhs rhs`
    /// both lhs and rhs may be omitted.
    AST_NODE_BREAK,
    /// `return lhs`. lhs can be omitted. rhs is unused.
    AST_NODE_RETURN,
    /// `fn (a: lhs) rhs`. lhs can be omitted.
    /// anytype and ... parameters are omitted from the AST tree.
    /// main_token is the `fn` keyword.
    /// extern function declarations use this tag.
    AST_NODE_FN_PROTO_SIMPLE,
    /// `fn (a: b, c: d) rhs`. `sub_range_list[lhs]`.
    /// anytype and ... parameters are omitted from the AST tree.
    /// main_token is the `fn` keyword.
    /// extern function declarations use this tag.
    AST_NODE_FN_PROTO_MULTI,
    /// `fn (a: b) addrspace(e) linksection(f) callconv(g) rhs`.
    /// `FnProtoOne[lhs]`.
    /// zero or one parameters.
    /// anytype and ... parameters are omitted from the AST tree.
    /// main_token is the `fn` keyword.
    /// extern function declarations use this tag.
    AST_NODE_FN_PROTO_ONE,
    /// `fn (a: b, c: d) addrspace(e) linksection(f) callconv(g) rhs`.
    /// `FnProto[lhs]`.
    /// anytype and ... parameters are omitted from the AST tree.
    /// main_token is the `fn` keyword.
    /// extern function declarations use this tag.
    AST_NODE_FN_PROTO,
    /// lhs is the fn_proto.
    /// rhs is the function body block.
    /// Note that extern function declarations use the fn_proto tags rather
    /// than this one.
    AST_NODE_FN_DECL,
    /// `anyframe->rhs`. main_token is `anyframe`. `lhs` is arrow token index.
    AST_NODE_ANYFRAME_TYPE,
    /// Both lhs and rhs unused.
    AST_NODE_ANYFRAME_LITERAL,
    /// Both lhs and rhs unused.
    AST_NODE_CHAR_LITERAL,
    /// Both lhs and rhs unused.
    AST_NODE_NUMBER_LITERAL,
    /// Both lhs and rhs unused.
    AST_NODE_UNREACHABLE_LITERAL,
    /// Both lhs and rhs unused.
    /// Most identifiers will not have explicit AST nodes, however for
    /// expressions
    /// which could be one of many different kinds of AST nodes, there will be
    /// an
    /// identifier AST node for it.
    AST_NODE_IDENTIFIER,
    /// lhs is the dot token index, rhs unused, main_token is the identifier.
    AST_NODE_ENUM_LITERAL,
    /// main_token is the string literal token
    /// Both lhs and rhs unused.
    AST_NODE_STRING_LITERAL,
    /// main_token is the first token index (redundant with lhs)
    /// lhs is the first token index; rhs is the last token index.
    /// Could be a series of multiline_string_literal_line tokens, or a single
    /// string_literal token.
    AST_NODE_MULTILINE_STRING_LITERAL,
    /// `(lhs)`. main_token is the `(`; rhs is the token index of the `)`.
    AST_NODE_GROUPED_EXPRESSION,
    /// `@a(lhs, rhs)`. lhs and rhs may be omitted.
    /// main_token is the builtin token.
    AST_NODE_BUILTIN_CALL_TWO,
    /// Same as builtin_call_two but there is known to be a trailing comma
    /// before the rparen.
    AST_NODE_BUILTIN_CALL_TWO_COMMA,
    /// `@a(b, c)`. `sub_list[lhs..rhs]`.
    /// main_token is the builtin token.
    AST_NODE_BUILTIN_CALL,
    /// Same as builtin_call but there is known to be a trailing comma before
    /// the rparen.
    AST_NODE_BUILTIN_CALL_COMMA,
    /// `error{a, b}`.
    /// rhs is the rbrace, lhs is unused.
    AST_NODE_ERROR_SET_DECL,
    /// `struct {}`, `union {}`, `opaque {}`, `enum {}`.
    /// `extra_data[lhs..rhs]`.
    /// main_token is `struct`, `union`, `opaque`, `enum` keyword.
    AST_NODE_CONTAINER_DECL,
    /// Same as ContainerDecl but there is known to be a trailing comma
    /// or semicolon before the rbrace.
    AST_NODE_CONTAINER_DECL_TRAILING,
    /// `struct {lhs, rhs}`, `union {lhs, rhs}`, `opaque {lhs, rhs}`, `enum
    /// {lhs, rhs}`.
    /// lhs or rhs can be omitted.
    /// main_token is `struct`, `union`, `opaque`, `enum` keyword.
    AST_NODE_CONTAINER_DECL_TWO,
    /// Same as ContainerDeclTwo except there is known to be a trailing comma
    /// or semicolon before the rbrace.
    AST_NODE_CONTAINER_DECL_TWO_TRAILING,
    /// `struct(lhs)` / `union(lhs)` / `enum(lhs)`. `SubRange[rhs]`.
    AST_NODE_CONTAINER_DECL_ARG,
    /// Same as container_decl_arg but there is known to be a trailing
    /// comma or semicolon before the rbrace.
    AST_NODE_CONTAINER_DECL_ARG_TRAILING,
    /// `union(enum) {}`. `sub_list[lhs..rhs]`.
    /// Note that tagged unions with explicitly provided enums are represented
    /// by `container_decl_arg`.
    AST_NODE_TAGGED_UNION,
    /// Same as tagged_union but there is known to be a trailing comma
    /// or semicolon before the rbrace.
    AST_NODE_TAGGED_UNION_TRAILING,
    /// `union(enum) {lhs, rhs}`. lhs or rhs may be omitted.
    /// Note that tagged unions with explicitly provided enums are represented
    /// by `container_decl_arg`.
    AST_NODE_TAGGED_UNION_TWO,
    /// Same as tagged_union_two but there is known to be a trailing comma
    /// or semicolon before the rbrace.
    AST_NODE_TAGGED_UNION_TWO_TRAILING,
    /// `union(enum(lhs)) {}`. `SubRange[rhs]`.
    AST_NODE_TAGGED_UNION_ENUM_TAG,
    /// Same as tagged_union_enum_tag but there is known to be a trailing comma
    /// or semicolon before the rbrace.
    AST_NODE_TAGGED_UNION_ENUM_TAG_TRAILING,
    /// `a: lhs = rhs,`. lhs and rhs can be omitted.
    /// main_token is the field name identifier.
    /// lastToken() does not include the possible trailing comma.
    AST_NODE_CONTAINER_FIELD_INIT,
    /// `a: lhs align(rhs),`. rhs can be omitted.
    /// main_token is the field name identifier.
    /// lastToken() does not include the possible trailing comma.
    AST_NODE_CONTAINER_FIELD_ALIGN,
    /// `a: lhs align(c) = d,`. `container_field_list[rhs]`.
    /// main_token is the field name identifier.
    /// lastToken() does not include the possible trailing comma.
    AST_NODE_CONTAINER_FIELD,
    /// `comptime lhs`. rhs unused.
    AST_NODE_COMPTIME,
    /// `nosuspend lhs`. rhs unused.
    AST_NODE_NOSUSPEND,
    /// `{lhs rhs}`. rhs or lhs can be omitted.
    /// main_token points at the lbrace.
    AST_NODE_BLOCK_TWO,
    /// Same as block_two but there is known to be a semicolon before the
    /// rbrace.
    AST_NODE_BLOCK_TWO_SEMICOLON,
    /// `{}`. `sub_list[lhs..rhs]`.
    /// main_token points at the lbrace.
    AST_NODE_BLOCK,
    /// Same as block but there is known to be a semicolon before the rbrace.
    AST_NODE_BLOCK_SEMICOLON,
    /// `asm(lhs)`. rhs is the token index of the rparen.
    AST_NODE_ASM_SIMPLE,
    /// Legacy asm with string clobbers. `asm(lhs, a)`.
    /// `AsmLegacy[rhs]`.
    AST_NODE_ASM_LEGACY,
    /// `asm(lhs, a)`. `Asm[rhs]`.
    AST_NODE_ASM,
    /// `[a] "b" (c)`. lhs is 0, rhs is token index of the rparen.
    /// `[a] "b" (-> lhs)`. rhs is token index of the rparen.
    /// main_token is `a`.
    AST_NODE_ASM_OUTPUT,
    /// `[a] "b" (lhs)`. rhs is token index of the rparen.
    /// main_token is `a`.
    AST_NODE_ASM_INPUT,
    /// `error.a`. lhs is token index of `.`. rhs is token index of `a`.
    AST_NODE_ERROR_VALUE,
    /// `lhs!rhs`. main_token is the `!`.
    AST_NODE_ERROR_UNION,
} AstNodeTag;

typedef uint32_t AstTokenIndex;
typedef uint32_t AstNodeIndex;
typedef uint32_t AstIndex;

typedef struct {
    AstIndex lhs;
    AstIndex rhs;
} AstData;

typedef struct {
    uint32_t len;
    uint32_t cap;
    AstNodeTag* tags;
    AstTokenIndex* main_tokens;
    AstData* datas;
} AstNodeList;

typedef struct {
    AstNodeTag tag;
    AstTokenIndex main_token;
    AstData data;
} AstNodeItem;

typedef struct {
    uint32_t len;
    uint32_t cap;
    TokenizerTag* tags;
    AstIndex* starts;
} AstTokenList;

typedef SLICE(AstNodeIndex) AstNodeIndexSlice;

typedef struct {
    const char* source;
    uint32_t source_len;
    AstTokenList tokens;
    AstNodeList nodes;
    AstNodeIndexSlice extra_data;
    bool has_error;
    char* err_msg;
} Ast;

typedef struct AstPtrType {
    AstNodeIndex sentinel;
    AstNodeIndex align_node;
    AstNodeIndex addrspace_node;
} AstPtrType;

typedef struct AstPtrTypeBitRange {
    AstNodeIndex sentinel;
    AstNodeIndex align_node;
    AstNodeIndex addrspace_node;
    AstNodeIndex bit_range_start;
    AstNodeIndex bit_range_end;
} AstPtrTypeBitRange;

typedef struct AstFnProtoOne {
    AstNodeIndex param;
    AstNodeIndex align_expr;
    AstNodeIndex addrspace_expr;
    AstNodeIndex section_expr;
    AstNodeIndex callconv_expr;
} AstFnProtoOne;

typedef struct AstFnProto {
    AstNodeIndex params_start;
    AstNodeIndex params_end;
    AstNodeIndex align_expr;
    AstNodeIndex addrspace_expr;
    AstNodeIndex section_expr;
    AstNodeIndex callconv_expr;
} AstFnProto;

typedef struct AstSubRange {
    AstNodeIndex start;
    AstNodeIndex end;
} AstSubRange;

typedef struct AstSliceSentinel {
    AstNodeIndex start;
    AstNodeIndex end;
    AstNodeIndex sentinel;
} AstSliceSentinel;

typedef struct AstWhileCont {
    AstNodeIndex cont_expr;
    AstNodeIndex then_expr;
} AstWhileCont;

typedef struct AstWhile {
    AstNodeIndex cont_expr;
    AstNodeIndex then_expr;
    AstNodeIndex else_expr;
} AstWhile;

typedef struct AstFor {
    unsigned int inputs : 31;
    unsigned int has_else : 1;
} AstFor;

typedef struct AstIf {
    AstNodeIndex then_expr;
    AstNodeIndex else_expr;
} AstIf;

typedef struct AstError {
    bool is_note;
    AstTokenIndex token;
    union {
        struct {
            TokenizerTag expected_tag;
        } expected;
        struct {
        } none;
    } extra;
} AstError;

Ast astParse(const char* source, uint32_t len);
void astDeinit(Ast*);

#endif
stage0/astgen.c (new file, 10639 lines)
(File diff suppressed because it is too large.)
stage0/astgen.h (new file, 11 lines)
@@ -0,0 +1,11 @@
// astgen.h — AST to ZIR conversion, ported from lib/std/zig/AstGen.zig.
#ifndef _ZIG0_ASTGEN_H__
#define _ZIG0_ASTGEN_H__

#include "ast.h"
#include "zir.h"

// Convert AST to ZIR.
Zir astGen(const Ast* ast);

#endif
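
Putting `ast.h` and `astgen.h` together, the intended pipeline is parse, then
`astGen()`, mirroring `std.zig.AstGen.generate()`. The sketch below is an
assumption-laden illustration: `zirDeinit()` and the `Zir` fields it prints
(`inst_len`, `extra_len`, `string_bytes_len`) are taken from how
`astgen_test.zig` (next file) uses the C API, not from `zir.h` itself.

```c
#include <stdio.h>
#include <string.h>

#include "ast.h"
#include "astgen.h" /* pulls in zir.h for Zir */

int main(void) {
    const char* src = "comptime {}";
    Ast tree = astParse(src, (uint32_t)strlen(src));
    if (tree.has_error) {
        fprintf(stderr, "parse failed\n");
        astDeinit(&tree);
        return 1;
    }

    Zir zir = astGen(&tree); /* AST in, ZIR out */
    printf("instructions: %u, extra: %u, string bytes: %u\n",
        (unsigned)zir.inst_len, (unsigned)zir.extra_len,
        (unsigned)zir.string_bytes_len);

    zirDeinit(&zir);
    astDeinit(&tree);
    return 0;
}
```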
stage0/astgen_test.zig (new file, 851 lines)
@@ -0,0 +1,851 @@
const std = @import("std");
const Ast = std.zig.Ast;
const Zir = std.zig.Zir;
const AstGen = std.zig.AstGen;
const Allocator = std.mem.Allocator;

const c = @cImport({
    @cInclude("astgen.h");
});

fn refZir(gpa: Allocator, source: [:0]const u8) !Zir {
    var tree = try Ast.parse(gpa, source, .zig);
    defer tree.deinit(gpa);
    return try AstGen.generate(gpa, tree);
}

test "astgen dump: simple cases" {
    const gpa = std.testing.allocator;

    const cases = .{
        .{ "empty", "" },
        .{ "comptime {}", "comptime {}" },
        .{ "const x = 0;", "const x = 0;" },
        .{ "const x = 1;", "const x = 1;" },
        .{ "const x = 0; const y = 0;", "const x = 0; const y = 0;" },
        .{ "test \"t\" {}", "test \"t\" {}" },
        .{ "const std = @import(\"std\");", "const std = @import(\"std\");" },
        .{ "test_all.zig", @embedFile("test_all.zig") },
    };

    inline for (cases) |case| {
        // std.debug.print("--- {s} ---\n", .{case[0]});
        const source: [:0]const u8 = case[1];
        var zir = try refZir(gpa, source);
        zir.deinit(gpa);
    }
}

/// Build a mask of extra[] indices that contain hash data (src_hash or
/// fields_hash). These are zero-filled in the C output but contain real
/// Blake3 hashes in the Zig reference. We skip these positions during
/// comparison.
fn buildHashSkipMask(gpa: Allocator, ref: Zir) ![]bool {
    const ref_extra_len: u32 = @intCast(ref.extra.len);
    const skip = try gpa.alloc(bool, ref_extra_len);
    @memset(skip, false);

    const ref_len: u32 = @intCast(ref.instructions.len);
    const ref_tags = ref.instructions.items(.tag);
    const ref_datas = ref.instructions.items(.data);
    for (0..ref_len) |i| {
        switch (ref_tags[i]) {
            .extended => {
                const ext = ref_datas[i].extended;
                if (ext.opcode == .struct_decl or ext.opcode == .enum_decl) {
                    // StructDecl/EnumDecl starts with fields_hash[4].
                    const pi = ext.operand;
                    for (0..4) |j| skip[pi + j] = true;
                }
            },
            .declaration => {
                // Declaration starts with src_hash[4].
                const pi = ref_datas[i].declaration.payload_index;
                for (0..4) |j| skip[pi + j] = true;
            },
            .func, .func_inferred => {
                // Func payload: ret_ty(1) + param_block(1) + body_len(1)
                // + trailing ret_ty + body + SrcLocs(3) + proto_hash(4).
                const pi = ref_datas[i].pl_node.payload_index;
                const ret_ty_raw: u32 = ref.extra[pi];
                const ret_body_len: u32 = ret_ty_raw & 0x7FFFFFFF;
                const body_len: u32 = ref.extra[pi + 2];
                // ret_ty trailing: if body_len > 1, it's a body; if == 1, it's a ref; if 0, void.
                const ret_trailing: u32 = if (ret_body_len > 1) ret_body_len else if (ret_body_len == 1) 1 else 0;
                // proto_hash is at: pi + 3 + ret_trailing + body_len + 3
                if (body_len > 0) {
                    const hash_start = pi + 3 + ret_trailing + body_len + 3;
                    for (0..4) |j| {
                        if (hash_start + j < ref_extra_len)
                            skip[hash_start + j] = true;
                    }
                }
            },
            else => {},
        }
    }
    return skip;
}

test "astgen: empty source" {
    const gpa = std.testing.allocator;
    const source: [:0]const u8 = "";

    var ref_zir = try refZir(gpa, source);
    defer ref_zir.deinit(gpa);

    var c_ast = c.astParse(source.ptr, @intCast(source.len));
    defer c.astDeinit(&c_ast);
    var c_zir = c.astGen(&c_ast);
    defer c.zirDeinit(&c_zir);

    try expectEqualZir(gpa, ref_zir, c_zir);
}

test "astgen: comptime {}" {
    const gpa = std.testing.allocator;
    const source: [:0]const u8 = "comptime {}";

    var ref_zir = try refZir(gpa, source);
    defer ref_zir.deinit(gpa);

    var c_ast = c.astParse(source.ptr, @intCast(source.len));
    defer c.astDeinit(&c_ast);
    var c_zir = c.astGen(&c_ast);
    defer c.zirDeinit(&c_zir);

    try expectEqualZir(gpa, ref_zir, c_zir);
}

test "astgen: const x = 0;" {
    const gpa = std.testing.allocator;
    const source: [:0]const u8 = "const x = 0;";

    var ref_zir = try refZir(gpa, source);
    defer ref_zir.deinit(gpa);

    var c_ast = c.astParse(source.ptr, @intCast(source.len));
    defer c.astDeinit(&c_ast);
    var c_zir = c.astGen(&c_ast);
    defer c.zirDeinit(&c_zir);

    try expectEqualZir(gpa, ref_zir, c_zir);
}

test "astgen: const x = 1;" {
    const gpa = std.testing.allocator;
    const source: [:0]const u8 = "const x = 1;";

    var ref_zir = try refZir(gpa, source);
    defer ref_zir.deinit(gpa);

    var c_ast = c.astParse(source.ptr, @intCast(source.len));
    defer c.astDeinit(&c_ast);
    var c_zir = c.astGen(&c_ast);
    defer c.zirDeinit(&c_zir);

    try expectEqualZir(gpa, ref_zir, c_zir);
}

test "astgen: const x = 0; const y = 0;" {
    const gpa = std.testing.allocator;
    const source: [:0]const u8 = "const x = 0; const y = 0;";

    var ref_zir = try refZir(gpa, source);
    defer ref_zir.deinit(gpa);

    var c_ast = c.astParse(source.ptr, @intCast(source.len));
    defer c.astDeinit(&c_ast);
    var c_zir = c.astGen(&c_ast);
    defer c.zirDeinit(&c_zir);

    try expectEqualZir(gpa, ref_zir, c_zir);
}

test "astgen: field_access" {
    const gpa = std.testing.allocator;
    const source: [:0]const u8 = "const std = @import(\"std\");\nconst mem = std.mem;";

    var ref_zir = try refZir(gpa, source);
    defer ref_zir.deinit(gpa);

    var c_ast = c.astParse(source.ptr, @intCast(source.len));
    defer c.astDeinit(&c_ast);
    var c_zir = c.astGen(&c_ast);
    defer c.zirDeinit(&c_zir);

    try expectEqualZir(gpa, ref_zir, c_zir);
}

test "astgen: addr array init" {
    const gpa = std.testing.allocator;
    const source: [:0]const u8 = "const x = &[_][]const u8{\"a\",\"b\"};";

    var ref_zir = try refZir(gpa, source);
    defer ref_zir.deinit(gpa);

    var c_ast = c.astParse(source.ptr, @intCast(source.len));
    defer c.astDeinit(&c_ast);
    var c_zir = c.astGen(&c_ast);
    defer c.zirDeinit(&c_zir);

    try expectEqualZir(gpa, ref_zir, c_zir);
}

test "astgen: test empty body" {
    const gpa = std.testing.allocator;
    const source: [:0]const u8 = "test \"t\" {}";

    var ref_zir = try refZir(gpa, source);
    defer ref_zir.deinit(gpa);

    var c_ast = c.astParse(source.ptr, @intCast(source.len));
    defer c.astDeinit(&c_ast);
    var c_zir = c.astGen(&c_ast);
    defer c.zirDeinit(&c_zir);

    try expectEqualZir(gpa, ref_zir, c_zir);
}

test "astgen: test_all.zig" {
    const gpa = std.testing.allocator;
    const source: [:0]const u8 = @embedFile("test_all.zig");

    var ref_zir = try refZir(gpa, source);
    defer ref_zir.deinit(gpa);

    var c_ast = c.astParse(source.ptr, @intCast(source.len));
    defer c.astDeinit(&c_ast);
    var c_zir = c.astGen(&c_ast);
    defer c.zirDeinit(&c_zir);

    try expectEqualZir(gpa, ref_zir, c_zir);
}

test "astgen: @import" {
    const gpa = std.testing.allocator;
    const source: [:0]const u8 = "const std = @import(\"std\");";

    var ref_zir = try refZir(gpa, source);
    defer ref_zir.deinit(gpa);

    var c_ast = c.astParse(source.ptr, @intCast(source.len));
    defer c.astDeinit(&c_ast);
    var c_zir = c.astGen(&c_ast);
    defer c.zirDeinit(&c_zir);

    try expectEqualZir(gpa, ref_zir, c_zir);
}

fn expectEqualZir(gpa: Allocator, ref: Zir, got: c.Zir) !void {
    const ref_len: u32 = @intCast(ref.instructions.len);
    const ref_tags = ref.instructions.items(.tag);
    const ref_datas = ref.instructions.items(.data);

    // 1. Compare lengths.
    try std.testing.expectEqual(ref_len, got.inst_len);

    // 2. Compare instruction tags.
    for (0..ref_len) |i| {
        const ref_tag: u8 = @intFromEnum(ref_tags[i]);
        const got_tag: u8 = @intCast(got.inst_tags[i]);
        if (ref_tag != got_tag) {
            std.debug.print(
                "inst_tags[{d}] mismatch: ref={d} got={d}\n",
                .{ i, ref_tag, got_tag },
            );
            return error.TestExpectedEqual;
        }
    }

    // 3. Compare instruction data field-by-field.
    for (0..ref_len) |i| {
        try expectEqualData(i, ref_tags[i], ref_datas[i], got.inst_datas[i]);
    }
    // 4. Compare string bytes.
    const ref_sb_len: u32 = @intCast(ref.string_bytes.len);
    try std.testing.expectEqual(ref_sb_len, got.string_bytes_len);
    for (0..ref_sb_len) |i| {
        if (ref.string_bytes[i] != got.string_bytes[i]) {
            std.debug.print(
                "string_bytes[{d}] mismatch: ref=0x{x:0>2} got=0x{x:0>2}\n",
                .{ i, ref.string_bytes[i], got.string_bytes[i] },
            );
            return error.TestExpectedEqual;
        }
    }

    // 5. Compare extra data (skipping hash positions).
    const skip = try buildHashSkipMask(gpa, ref);
    defer gpa.free(skip);
    const ref_extra_len: u32 = @intCast(ref.extra.len);
    try std.testing.expectEqual(ref_extra_len, got.extra_len);
    for (0..ref_extra_len) |i| {
        if (skip[i]) continue;
        if (ref.extra[i] != got.extra[i]) {
            // Show first 10 extra diffs.
            var count: u32 = 0;
            for (0..ref_extra_len) |j| {
                if (!skip[j] and ref.extra[j] != got.extra[j]) {
                    std.debug.print(
                        "extra[{d}] mismatch: ref={d} got={d}\n",
                        .{ j, ref.extra[j], got.extra[j] },
                    );
                    count += 1;
                    if (count >= 10) break;
                }
            }
            return error.TestExpectedEqual;
        }
    }
}

/// Compare a single instruction's data, dispatching by tag.
/// Zig's Data union has no guaranteed in-memory layout, so we
/// compare each variant's fields individually.
fn expectEqualData(
    idx: usize,
    tag: Zir.Inst.Tag,
    ref: Zir.Inst.Data,
    got: c.ZirInstData,
) !void {
    switch (tag) {
        .extended => {
            const r = ref.extended;
            const g = got.extended;
            // Some extended opcodes have undefined/unused small+operand.
            const skip_data = switch (r.opcode) {
                .dbg_empty_stmt, .astgen_error => true,
                else => false,
            };
            const skip_small = switch (r.opcode) {
                .add_with_overflow,
                .sub_with_overflow,
                .mul_with_overflow,
                .shl_with_overflow,
                .restore_err_ret_index,
                .branch_hint,
                => true,
                else => false,
            };
            if (@intFromEnum(r.opcode) != g.opcode or
                (!skip_data and !skip_small and r.small != g.small) or
                (!skip_data and r.operand != g.operand))
            {
                std.debug.print(
                    "inst_datas[{d}] (extended) mismatch:\n" ++
                        " ref: opcode={d} small=0x{x:0>4} operand={d}\n" ++
                        " got: opcode={d} small=0x{x:0>4} operand={d}\n",
                    .{
                        idx,
                        @intFromEnum(r.opcode),
                        r.small,
                        r.operand,
                        g.opcode,
                        g.small,
                        g.operand,
                    },
                );
                return error.TestExpectedEqual;
            }
        },
        .declaration => {
            const r = ref.declaration;
            const g = got.declaration;
            if (@intFromEnum(r.src_node) != g.src_node or
                r.payload_index != g.payload_index)
            {
                std.debug.print(
                    "inst_datas[{d}] (declaration) mismatch:\n" ++
                        " ref: src_node={d} payload_index={d}\n" ++
                        " got: src_node={d} payload_index={d}\n",
                    .{
                        idx,
                        @intFromEnum(r.src_node),
                        r.payload_index,
                        g.src_node,
                        g.payload_index,
                    },
                );
                return error.TestExpectedEqual;
            }
        },
        .break_inline => {
            const r = ref.@"break";
            const g = got.break_data;
            if (@intFromEnum(r.operand) != g.operand or
                r.payload_index != g.payload_index)
            {
                std.debug.print(
                    "inst_datas[{d}] (break_inline) mismatch:\n" ++
                        " ref: operand={d} payload_index={d}\n" ++
                        " got: operand={d} payload_index={d}\n",
                    .{
                        idx,
                        @intFromEnum(r.operand),
                        r.payload_index,
                        g.operand,
                        g.payload_index,
                    },
                );
                return error.TestExpectedEqual;
            }
        },
        .import => {
            const r = ref.pl_tok;
            const g = got.pl_tok;
            if (@intFromEnum(r.src_tok) != g.src_tok or
                r.payload_index != g.payload_index)
            {
                std.debug.print(
                    "inst_datas[{d}] (import) mismatch:\n" ++
                        " ref: src_tok={d} payload_index={d}\n" ++
                        " got: src_tok={d} payload_index={d}\n",
                    .{
                        idx,
                        @intFromEnum(r.src_tok),
                        r.payload_index,
                        g.src_tok,
                        g.payload_index,
                    },
                );
                return error.TestExpectedEqual;
            }
        },
        .dbg_stmt => {
            const r = ref.dbg_stmt;
            const g = got.dbg_stmt;
            if (r.line != g.line or r.column != g.column) {
                std.debug.print(
                    "inst_datas[{d}] (dbg_stmt) mismatch:\n" ++
                        " ref: line={d} column={d}\n" ++
                        " got: line={d} column={d}\n",
                    .{ idx, r.line, r.column, g.line, g.column },
                );
                return error.TestExpectedEqual;
            }
        },
        .ensure_result_non_error,
        .restore_err_ret_index_unconditional,
        .validate_struct_init_ty,
        .validate_struct_init_result_ty,
        .struct_init_empty_result,
        .struct_init_empty,
        .struct_init_empty_ref_result,
        => {
            const r = ref.un_node;
            const g = got.un_node;
            if (@intFromEnum(r.src_node) != g.src_node or
                @intFromEnum(r.operand) != g.operand)
            {
                std.debug.print(
                    "inst_datas[{d}] ({s}) mismatch:\n" ++
                        " ref: src_node={d} operand={d}\n" ++
                        " got: src_node={d} operand={d}\n",
                    .{
                        idx,
                        @tagName(tag),
                        @intFromEnum(r.src_node),
                        @intFromEnum(r.operand),
                        g.src_node,
                        g.operand,
                    },
                );
                return error.TestExpectedEqual;
            }
        },
        .ret_implicit => {
            const r = ref.un_tok;
            const g = got.un_tok;
            if (@intFromEnum(r.src_tok) != g.src_tok or
                @intFromEnum(r.operand) != g.operand)
            {
                std.debug.print(
                    "inst_datas[{d}] (ret_implicit) mismatch:\n" ++
                        " ref: src_tok={d} operand={d}\n" ++
                        " got: src_tok={d} operand={d}\n",
                    .{
                        idx,
                        @intFromEnum(r.src_tok),
                        @intFromEnum(r.operand),
                        g.src_tok,
                        g.operand,
                    },
                );
                return error.TestExpectedEqual;
            }
        },
        .func,
        .func_inferred,
        .array_type,
        .array_type_sentinel,
        .array_cat,
        .array_init,
        .array_init_ref,
        .error_set_decl,
        .struct_init_field_type,
        .struct_init,
        .struct_init_ref,
        .validate_array_init_ref_ty,
        .validate_array_init_ty,
        => {
            const r = ref.pl_node;
            const g = got.pl_node;
            if (@intFromEnum(r.src_node) != g.src_node or
                r.payload_index != g.payload_index)
            {
                std.debug.print(
                    "inst_datas[{d}] ({s}) mismatch:\n" ++
                        " ref: src_node={d} payload_index={d}\n" ++
                        " got: src_node={d} payload_index={d}\n",
                    .{
                        idx,
                        @tagName(tag),
                        @intFromEnum(r.src_node),
                        r.payload_index,
                        g.src_node,
                        g.payload_index,
                    },
                );
                return error.TestExpectedEqual;
            }
        },
        .decl_val, .decl_ref => {
            const r = ref.str_tok;
            const g = got.str_tok;
            if (@intFromEnum(r.start) != g.start or @intFromEnum(r.src_tok) != g.src_tok) {
                std.debug.print(
                    "inst_datas[{d}] ({s}) mismatch:\n" ++
                        " ref: start={d} src_tok={d}\n" ++
                        " got: start={d} src_tok={d}\n",
                    .{
                        idx,
                        @tagName(tag),
                        @intFromEnum(r.start),
                        @intFromEnum(r.src_tok),
                        g.start,
                        g.src_tok,
                    },
                );
                return error.TestExpectedEqual;
            }
        },
        .field_val, .field_ptr, .field_val_named, .field_ptr_named => {
            const r = ref.pl_node;
            const g = got.pl_node;
            if (@intFromEnum(r.src_node) != g.src_node or
                r.payload_index != g.payload_index)
            {
                std.debug.print(
                    "inst_datas[{d}] ({s}) mismatch:\n" ++
                        " ref: src_node={d} payload_index={d}\n" ++
                        " got: src_node={d} payload_index={d}\n",
                    .{
                        idx,
                        @tagName(tag),
                        @intFromEnum(r.src_node),
                        r.payload_index,
                        g.src_node,
                        g.payload_index,
                    },
                );
                return error.TestExpectedEqual;
            }
        },
        .int => {
            if (ref.int != got.int_val) {
                std.debug.print(
                    "inst_datas[{d}] (int) mismatch: ref={d} got={d}\n",
                    .{ idx, ref.int, got.int_val },
                );
                return error.TestExpectedEqual;
            }
        },
        .ptr_type => {
            // Compare ptr_type data: flags, size, payload_index.
            if (@as(u8, @bitCast(ref.ptr_type.flags)) != got.ptr_type.flags or
                @intFromEnum(ref.ptr_type.size) != got.ptr_type.size or
                ref.ptr_type.payload_index != got.ptr_type.payload_index)
            {
                std.debug.print(
                    "inst_datas[{d}] (ptr_type) mismatch:\n" ++
                        " ref: flags=0x{x} size={d} pi={d}\n" ++
                        " got: flags=0x{x} size={d} pi={d}\n",
                    .{
                        idx,
                        @as(u8, @bitCast(ref.ptr_type.flags)),
                        @intFromEnum(ref.ptr_type.size),
                        ref.ptr_type.payload_index,
                        got.ptr_type.flags,
                        got.ptr_type.size,
                        got.ptr_type.payload_index,
                    },
                );
                return error.TestExpectedEqual;
            }
        },
        .int_type => {
            const r = ref.int_type;
            const g = got.int_type;
            if (@intFromEnum(r.src_node) != g.src_node or
                @intFromEnum(r.signedness) != g.signedness or
                r.bit_count != g.bit_count)
            {
                std.debug.print(
                    "inst_datas[{d}] (int_type) mismatch\n",
                    .{idx},
                );
                return error.TestExpectedEqual;
            }
        },
        .str => {
            const r = ref.str;
            const g = got.str;
            if (@intFromEnum(r.start) != g.start or r.len != g.len) {
                std.debug.print(
                    "inst_datas[{d}] (str) mismatch:\n" ++
                        " ref: start={d} len={d}\n" ++
                        " got: start={d} len={d}\n",
                    .{ idx, @intFromEnum(r.start), r.len, g.start, g.len },
                );
                return error.TestExpectedEqual;
            }
        },
        else => {
            // Generic raw comparison: treat data as two u32 words.
            // Tags using .node data format have undefined second word.
            const ref_raw = @as([*]const u32, @ptrCast(&ref));
            const got_raw = @as([*]const u32, @ptrCast(&got));
            // Tags where only the first u32 word is meaningful
            // (second word is padding/undefined).
            const first_word_only = switch (tag) {
                // .node data format (single i32):
                .repeat,
                .repeat_inline,
                .ret_ptr,
                .ret_type,
                .trap,
                .alloc_inferred,
                .alloc_inferred_mut,
                .alloc_inferred_comptime,
                .alloc_inferred_comptime_mut,
                // .@"unreachable" data format (src_node + padding):
                .@"unreachable",
                // .save_err_ret_index data format (operand only):
                .save_err_ret_index,
                => true,
                else => false,
            };
            const w1_match = ref_raw[0] == got_raw[0];
            const w2_match = first_word_only or ref_raw[1] == got_raw[1];
            if (!w1_match or !w2_match) {
                std.debug.print(
                    "inst_datas[{d}] ({s}) raw mismatch:\n" ++
                        " ref: 0x{x:0>8} 0x{x:0>8}\n" ++
                        " got: 0x{x:0>8} 0x{x:0>8}\n",
                    .{
                        idx,
                        @tagName(tag),
                        ref_raw[0],
                        ref_raw[1],
                        got_raw[0],
                        got_raw[1],
                    },
                );
                return error.TestExpectedEqual;
            }
        },
    }
}

const corpus_files = .{
    .{ "astgen_test.zig", @embedFile("astgen_test.zig") },
    .{ "build.zig", @embedFile("build.zig") },
    .{ "parser_test.zig", @embedFile("parser_test.zig") },
    .{ "test_all.zig", @embedFile("test_all.zig") },
    .{ "tokenizer_test.zig", @embedFile("tokenizer_test.zig") },
};

fn corpusCheck(gpa: Allocator, source: [:0]const u8) !void {
    var tree = try Ast.parse(gpa, source, .zig);
    defer tree.deinit(gpa);

    var ref_zir = try AstGen.generate(gpa, tree);
    defer ref_zir.deinit(gpa);
|
||||
|
||||
var c_ast = c.astParse(source.ptr, @intCast(source.len));
|
||||
defer c.astDeinit(&c_ast);
|
||||
var c_zir = c.astGen(&c_ast);
|
||||
defer c.zirDeinit(&c_zir);
|
||||
|
||||
if (c_zir.has_compile_errors) {
|
||||
std.debug.print("C port returned compile errors (inst_len={d})\n", .{c_zir.inst_len});
|
||||
return error.TestUnexpectedResult;
|
||||
}
|
||||
|
||||
try expectEqualZir(gpa, ref_zir, c_zir);
|
||||
}
|
||||
|
||||
test "astgen: struct single field" {
|
||||
const gpa = std.testing.allocator;
|
||||
const source: [:0]const u8 = "const T = struct { x: u32 };";
|
||||
var ref_zir = try refZir(gpa, source);
|
||||
defer ref_zir.deinit(gpa);
|
||||
var c_ast = c.astParse(source.ptr, @intCast(source.len));
|
||||
defer c.astDeinit(&c_ast);
|
||||
var c_zir = c.astGen(&c_ast);
|
||||
defer c.zirDeinit(&c_zir);
|
||||
try expectEqualZir(gpa, ref_zir, c_zir);
|
||||
}
|
||||
|
||||
test "astgen: struct multiple fields" {
|
||||
const gpa = std.testing.allocator;
|
||||
const source: [:0]const u8 = "const T = struct { x: u32, y: bool };";
|
||||
var ref_zir = try refZir(gpa, source);
|
||||
defer ref_zir.deinit(gpa);
|
||||
var c_ast = c.astParse(source.ptr, @intCast(source.len));
|
||||
defer c.astDeinit(&c_ast);
|
||||
var c_zir = c.astGen(&c_ast);
|
||||
defer c.zirDeinit(&c_zir);
|
||||
try expectEqualZir(gpa, ref_zir, c_zir);
|
||||
}
|
||||
|
||||
test "astgen: struct field with default" {
|
||||
const gpa = std.testing.allocator;
|
||||
const source: [:0]const u8 = "const T = struct { x: u32 = 0 };";
|
||||
var ref_zir = try refZir(gpa, source);
|
||||
defer ref_zir.deinit(gpa);
|
||||
var c_ast = c.astParse(source.ptr, @intCast(source.len));
|
||||
defer c.astDeinit(&c_ast);
|
||||
var c_zir = c.astGen(&c_ast);
|
||||
defer c.zirDeinit(&c_zir);
|
||||
try expectEqualZir(gpa, ref_zir, c_zir);
|
||||
}
|
||||
|
||||
test "astgen: struct field with align" {
|
||||
const gpa = std.testing.allocator;
|
||||
const source: [:0]const u8 = "const T = struct { x: u32 align(4) };";
|
||||
var ref_zir = try refZir(gpa, source);
|
||||
defer ref_zir.deinit(gpa);
|
||||
var c_ast = c.astParse(source.ptr, @intCast(source.len));
|
||||
defer c.astDeinit(&c_ast);
|
||||
var c_zir = c.astGen(&c_ast);
|
||||
defer c.zirDeinit(&c_zir);
|
||||
try expectEqualZir(gpa, ref_zir, c_zir);
|
||||
}
|
||||
|
||||
test "astgen: struct comptime field" {
|
||||
const gpa = std.testing.allocator;
|
||||
const source: [:0]const u8 = "const T = struct { comptime x: u32 = 0 };";
|
||||
var ref_zir = try refZir(gpa, source);
|
||||
defer ref_zir.deinit(gpa);
|
||||
var c_ast = c.astParse(source.ptr, @intCast(source.len));
|
||||
defer c.astDeinit(&c_ast);
|
||||
var c_zir = c.astGen(&c_ast);
|
||||
defer c.zirDeinit(&c_zir);
|
||||
try expectEqualZir(gpa, ref_zir, c_zir);
|
||||
}
|
||||
|
||||
test "astgen: empty error set" {
|
||||
const gpa = std.testing.allocator;
|
||||
const source: [:0]const u8 = "const E = error{};";
|
||||
var ref_zir = try refZir(gpa, source);
|
||||
defer ref_zir.deinit(gpa);
|
||||
var c_ast = c.astParse(source.ptr, @intCast(source.len));
|
||||
defer c.astDeinit(&c_ast);
|
||||
var c_zir = c.astGen(&c_ast);
|
||||
defer c.zirDeinit(&c_zir);
|
||||
try expectEqualZir(gpa, ref_zir, c_zir);
|
||||
}
|
||||
|
||||
test "astgen: error set with members" {
|
||||
const gpa = std.testing.allocator;
|
||||
const source: [:0]const u8 = "const E = error{ OutOfMemory, OutOfTime };";
|
||||
var ref_zir = try refZir(gpa, source);
|
||||
defer ref_zir.deinit(gpa);
|
||||
var c_ast = c.astParse(source.ptr, @intCast(source.len));
|
||||
defer c.astDeinit(&c_ast);
|
||||
var c_zir = c.astGen(&c_ast);
|
||||
defer c.zirDeinit(&c_zir);
|
||||
try expectEqualZir(gpa, ref_zir, c_zir);
|
||||
}
|
||||
|
||||
test "astgen: extern var" {
|
||||
const gpa = std.testing.allocator;
|
||||
const source: [:0]const u8 = "extern var x: u32;";
|
||||
var ref_zir = try refZir(gpa, source);
|
||||
defer ref_zir.deinit(gpa);
|
||||
var c_ast = c.astParse(source.ptr, @intCast(source.len));
|
||||
defer c.astDeinit(&c_ast);
|
||||
var c_zir = c.astGen(&c_ast);
|
||||
defer c.zirDeinit(&c_zir);
|
||||
try expectEqualZir(gpa, ref_zir, c_zir);
|
||||
}
|
||||
|
||||
test "astgen: corpus test_all.zig" {
|
||||
const gpa = std.testing.allocator;
|
||||
try corpusCheck(gpa, @embedFile("test_all.zig"));
|
||||
}
|
||||
|
||||
test "astgen: corpus build.zig" {
|
||||
const gpa = std.testing.allocator;
|
||||
try corpusCheck(gpa, @embedFile("build.zig"));
|
||||
}
|
||||
|
||||
test "astgen: corpus tokenizer_test.zig" {
|
||||
const gpa = std.testing.allocator;
|
||||
try corpusCheck(gpa, @embedFile("tokenizer_test.zig"));
|
||||
}
|
||||
|
||||
test "astgen: corpus parser_test.zig" {
|
||||
// TODO: 10+ extra data mismatches (ref=48 got=32, bit 4 = propagate_error_trace)
|
||||
// in call instruction flags — ctx propagation differs from upstream.
|
||||
if (true) return error.SkipZigTest;
|
||||
const gpa = std.testing.allocator;
|
||||
try corpusCheck(gpa, @embedFile("parser_test.zig"));
|
||||
}
|
||||
|
||||
test "astgen: corpus astgen_test.zig" {
|
||||
const gpa = std.testing.allocator;
|
||||
try corpusCheck(gpa, @embedFile("astgen_test.zig"));
|
||||
}
|
||||
|
||||
test "astgen: enum decl" {
|
||||
const gpa = std.testing.allocator;
|
||||
const source: [:0]const u8 = "const E = enum { a, b, c };";
|
||||
var ref_zir = try refZir(gpa, source);
|
||||
defer ref_zir.deinit(gpa);
|
||||
var c_ast = c.astParse(source.ptr, @intCast(source.len));
|
||||
defer c.astDeinit(&c_ast);
|
||||
var c_zir = c.astGen(&c_ast);
|
||||
defer c.zirDeinit(&c_zir);
|
||||
try expectEqualZir(gpa, ref_zir, c_zir);
|
||||
}
|
||||
|
||||
test "astgen: struct init typed" {
|
||||
const gpa = std.testing.allocator;
|
||||
const source: [:0]const u8 =
|
||||
\\const T = struct { x: u32 };
|
||||
\\const v = T{ .x = 1 };
|
||||
;
|
||||
var ref_zir = try refZir(gpa, source);
|
||||
defer ref_zir.deinit(gpa);
|
||||
var c_ast = c.astParse(source.ptr, @intCast(source.len));
|
||||
defer c.astDeinit(&c_ast);
|
||||
var c_zir = c.astGen(&c_ast);
|
||||
defer c.zirDeinit(&c_zir);
|
||||
try expectEqualZir(gpa, ref_zir, c_zir);
|
||||
}
|
||||
|
||||
test "astgen: corpus" {
|
||||
if (true) return error.SkipZigTest; // TODO: parser_test.zig fails
|
||||
const gpa = std.testing.allocator;
|
||||
|
||||
var any_fail = false;
|
||||
inline for (corpus_files) |entry| {
|
||||
corpusCheck(gpa, entry[1]) catch {
|
||||
any_fail = true;
|
||||
};
|
||||
}
|
||||
if (any_fail) return error.ZirMismatch;
|
||||
}
|
||||
248
stage0/build.zig
Normal file
@@ -0,0 +1,248 @@
|
||||
const std = @import("std");
|
||||
const builtin = @import("builtin");
|
||||
|
||||
const headers = &[_][]const u8{
|
||||
"common.h",
|
||||
"ast.h",
|
||||
"parser.h",
|
||||
"zir.h",
|
||||
"astgen.h",
|
||||
};
|
||||
|
||||
const c_lib_files = &[_][]const u8{
|
||||
"tokenizer.c",
|
||||
"ast.c",
|
||||
"zig0.c",
|
||||
"parser.c",
|
||||
"zir.c",
|
||||
"astgen.c",
|
||||
};
|
||||
|
||||
const all_c_files = c_lib_files ++ &[_][]const u8{"main.c"};
|
||||
|
||||
const cflags = &[_][]const u8{
|
||||
"-std=c11",
|
||||
"-Wall",
|
||||
"-Wvla",
|
||||
"-Wextra",
|
||||
"-Werror",
|
||||
"-Wshadow",
|
||||
"-Wswitch",
|
||||
"-Walloca",
|
||||
"-Wformat=2",
|
||||
"-fno-common",
|
||||
"-Wconversion",
|
||||
"-Wuninitialized",
|
||||
"-Wdouble-promotion",
|
||||
"-fstack-protector-all",
|
||||
"-Wimplicit-fallthrough",
|
||||
"-Wno-unused-function", // TODO remove once refactoring is done
|
||||
//"-D_FORTIFY_SOURCE=2", // consider when optimization flags are enabled
|
||||
};
|
||||
|
||||
const compilers = &[_][]const u8{ "zig", "clang", "gcc", "tcc" };
|
||||
|
||||
pub fn build(b: *std.Build) !void {
|
||||
const optimize = b.standardOptimizeOption(.{});
|
||||
|
||||
const cc = b.option([]const u8, "cc", "C compiler") orelse "zig";
|
||||
const no_exec = b.option(bool, "no-exec", "Compile test binary without running it") orelse false;
|
||||
const valgrind = b.option(bool, "valgrind", "Run tests under valgrind") orelse false;
|
||||
const test_timeout = b.option([]const u8, "test-timeout", "Test execution timeout (default: 10s, none with valgrind)");
|
||||
|
||||
const target = blk: {
|
||||
var query = b.standardTargetOptionsQueryOnly(.{});
|
||||
if (valgrind) {
|
||||
const arch = query.cpu_arch orelse builtin.cpu.arch;
|
||||
if (arch == .x86_64) {
|
||||
query.cpu_features_sub.addFeature(@intFromEnum(std.Target.x86.Feature.avx512f));
|
||||
}
|
||||
}
|
||||
break :blk b.resolveTargetQuery(query);
|
||||
};
|
||||
|
||||
const test_step = b.step("test", "Run unit tests");
|
||||
addTestStep(b, test_step, target, optimize, cc, no_exec, valgrind, test_timeout);
|
||||
|
||||
const fmt_step = b.step("fmt", "clang-format");
|
||||
const clang_format = b.addSystemCommand(&.{ "clang-format", "-i" });
|
||||
for (all_c_files ++ headers) |f| clang_format.addFileArg(b.path(f));
|
||||
fmt_step.dependOn(&clang_format.step);
|
||||
|
||||
const lint_step = b.step("lint", "Run linters");
|
||||
|
||||
for (all_c_files) |cfile| {
|
||||
const clang_analyze = b.addSystemCommand(&.{
|
||||
"clang",
|
||||
"--analyze",
|
||||
"--analyzer-output",
|
||||
"text",
|
||||
"-Wno-unused-command-line-argument",
|
||||
"-Werror",
|
||||
// false positive in astgen.c comptimeDecl: analyzer cannot track
|
||||
// scratch_instructions ownership through pointer parameters.
|
||||
"-Xclang",
|
||||
"-analyzer-disable-checker",
|
||||
"-Xclang",
|
||||
"unix.Malloc",
|
||||
});
|
||||
clang_analyze.addFileArg(b.path(cfile));
|
||||
clang_analyze.expectExitCode(0);
|
||||
lint_step.dependOn(&clang_analyze.step);
|
||||
|
||||
// TODO(motiejus) re-enable once project
|
||||
// nears completion. Takes too long for comfort.
|
||||
//const gcc_analyze = b.addSystemCommand(&.{
|
||||
// "gcc",
|
||||
// "-c",
|
||||
// "--analyzer",
|
||||
// "-Werror",
|
||||
// "-o",
|
||||
// "/dev/null",
|
||||
//});
|
||||
//gcc_analyze.addFileArg(b.path(cfile));
|
||||
//gcc_analyze.expectExitCode(0);
|
||||
//lint_step.dependOn(&gcc_analyze.step);
|
||||
|
||||
const cppcheck = b.addSystemCommand(&.{
|
||||
"cppcheck",
|
||||
"--quiet",
|
||||
"--error-exitcode=1",
|
||||
"--check-level=exhaustive",
|
||||
"--enable=all",
|
||||
"--inline-suppr",
|
||||
"--suppress=missingIncludeSystem",
|
||||
"--suppress=checkersReport",
|
||||
"--suppress=unusedFunction", // TODO remove after plumbing is done
|
||||
"--suppress=unusedStructMember", // TODO remove after plumbing is done
|
||||
"--suppress=unmatchedSuppression",
|
||||
});
|
||||
cppcheck.addFileArg(b.path(cfile));
|
||||
cppcheck.expectExitCode(0);
|
||||
lint_step.dependOn(&cppcheck.step);
|
||||
}
|
||||
|
||||
const fmt_check = b.addSystemCommand(&.{ "clang-format", "--dry-run", "-Werror" });
|
||||
for (all_c_files ++ headers) |f| fmt_check.addFileArg(b.path(f));
|
||||
fmt_check.expectExitCode(0);
|
||||
b.default_step.dependOn(&fmt_check.step);
|
||||
|
||||
for (compilers) |compiler| {
|
||||
addTestStep(b, b.default_step, target, optimize, compiler, false, valgrind, test_timeout);
|
||||
}
|
||||
|
||||
const all_step = b.step("all", "Run fmt check, lint, and tests with all compilers");
|
||||
all_step.dependOn(b.default_step);
|
||||
all_step.dependOn(lint_step);
|
||||
}
|
||||
|
||||
fn addTestStep(
|
||||
b: *std.Build,
|
||||
step: *std.Build.Step,
|
||||
target: std.Build.ResolvedTarget,
|
||||
optimize: std.builtin.OptimizeMode,
|
||||
cc: []const u8,
|
||||
no_exec: bool,
|
||||
valgrind: bool,
|
||||
test_timeout: ?[]const u8,
|
||||
) void {
|
||||
const test_mod = b.createModule(.{
|
||||
.root_source_file = b.path("test_all.zig"),
|
||||
.optimize = optimize,
|
||||
.target = target,
|
||||
});
|
||||
test_mod.addIncludePath(b.path("."));
|
||||
|
||||
// TODO(zig 0.16+): remove this if block entirely; keep only the addLibrary branch.
|
||||
// Also delete addCObjectsDirectly.
|
||||
// Zig 0.15's ELF archive parser fails on archives containing odd-sized objects
|
||||
// (off-by-one after 2-byte alignment). This is fixed on zig master/0.16.
|
||||
if (comptime builtin.zig_version.order(.{ .major = 0, .minor = 16, .patch = 0 }) == .lt) {
|
||||
addCObjectsDirectly(b, test_mod, cc, optimize);
|
||||
} else {
|
||||
const lib_mod = b.createModule(.{
|
||||
.optimize = optimize,
|
||||
.target = target,
|
||||
.link_libc = true,
|
||||
});
|
||||
const lib = b.addLibrary(.{
|
||||
.name = b.fmt("zig0-{s}", .{cc}),
|
||||
.root_module = lib_mod,
|
||||
});
|
||||
addCSources(b, lib.root_module, cc, optimize);
|
||||
test_mod.linkLibrary(lib);
|
||||
}
|
||||
|
||||
const test_exe = b.addTest(.{
|
||||
.root_module = test_mod,
|
||||
.use_llvm = false,
|
||||
.use_lld = false,
|
||||
});
|
||||
const timeout: ?[]const u8 = test_timeout orelse if (valgrind) null else "10";
|
||||
if (valgrind) {
|
||||
if (timeout) |t|
|
||||
test_exe.setExecCmd(&.{
|
||||
"timeout",
|
||||
t,
|
||||
"valgrind",
|
||||
"--error-exitcode=2",
|
||||
"--leak-check=full",
|
||||
"--show-leak-kinds=all",
|
||||
"--errors-for-leak-kinds=all",
|
||||
"--track-fds=yes",
|
||||
null,
|
||||
})
|
||||
else
|
||||
test_exe.setExecCmd(&.{
|
||||
"valgrind",
|
||||
"--error-exitcode=2",
|
||||
"--leak-check=full",
|
||||
"--show-leak-kinds=all",
|
||||
"--errors-for-leak-kinds=all",
|
||||
"--track-fds=yes",
|
||||
null,
|
||||
});
|
||||
} else {
|
||||
test_exe.setExecCmd(&.{ "timeout", timeout orelse "10", null });
|
||||
}
|
||||
if (no_exec) {
|
||||
const install = b.addInstallArtifact(test_exe, .{});
|
||||
step.dependOn(&install.step);
|
||||
} else {
|
||||
step.dependOn(&b.addRunArtifact(test_exe).step);
|
||||
}
|
||||
}
|
||||
|
||||
fn addCSources(
|
||||
b: *std.Build,
|
||||
mod: *std.Build.Module,
|
||||
cc: []const u8,
|
||||
optimize: std.builtin.OptimizeMode,
|
||||
) void {
|
||||
if (std.mem.eql(u8, cc, "zig")) {
|
||||
mod.addCSourceFiles(.{ .files = c_lib_files, .flags = cflags });
|
||||
} else for (c_lib_files) |cfile| {
|
||||
const cc1 = b.addSystemCommand(&.{cc});
|
||||
cc1.addArgs(cflags ++ .{"-g"});
|
||||
cc1.addArg(switch (optimize) {
|
||||
.Debug => "-O0",
|
||||
.ReleaseFast, .ReleaseSafe => "-O3",
|
||||
.ReleaseSmall => "-Os",
|
||||
});
|
||||
cc1.addArg("-c");
|
||||
cc1.addFileArg(b.path(cfile));
|
||||
cc1.addArg("-o");
|
||||
mod.addObjectFile(cc1.addOutputFileArg(b.fmt("{s}.o", .{cfile[0 .. cfile.len - 2]})));
|
||||
}
|
||||
}
|
||||
|
||||
// TODO(zig 0.16+): delete this function.
|
||||
fn addCObjectsDirectly(
|
||||
b: *std.Build,
|
||||
mod: *std.Build.Module,
|
||||
cc: []const u8,
|
||||
optimize: std.builtin.OptimizeMode,
|
||||
) void {
|
||||
addCSources(b, mod, cc, optimize);
|
||||
mod.linkSystemLibrary("c", .{});
|
||||
}
|
||||
143
stage0/check_test_order.py
Normal file
@@ -0,0 +1,143 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Check and optionally fix test order in parser_test.zig to match upstream."""
|
||||
|
||||
import re
|
||||
import sys
|
||||
|
||||
OURS = "parser_test.zig"
|
||||
UPSTREAM = "../zig/lib/std/zig/parser_test.zig"
|
||||
|
||||
|
||||
def extract_test_names(path):
|
||||
with open(path) as f:
|
||||
return re.findall(r'^test "(.+?)" \{', f.read(), re.M)
|
||||
|
||||
|
||||
def extract_test_blocks(path):
|
||||
"""Split file into: header, list of (name, content) test blocks, footer."""
|
||||
with open(path) as f:
|
||||
lines = f.readlines()
|
||||
|
||||
header = []
|
||||
footer = []
|
||||
blocks = []
|
||||
current_name = None
|
||||
current_lines = []
|
||||
brace_depth = 0
|
||||
in_test = False
|
||||
found_first_test = False
|
||||
|
||||
for line in lines:
|
||||
m = re.match(r'^test "(.+?)" \{', line)
|
||||
if m and not in_test:
|
||||
found_first_test = True
|
||||
if current_name is not None:
|
||||
blocks.append((current_name, "".join(current_lines)))
|
||||
current_name = m.group(1)
|
||||
current_lines = [line]
|
||||
brace_depth = 1
|
||||
in_test = True
|
||||
continue
|
||||
|
||||
if in_test:
|
||||
current_lines.append(line)
|
||||
brace_depth += line.count("{") - line.count("}")
|
||||
if brace_depth == 0:
|
||||
in_test = False
|
||||
elif not found_first_test:
|
||||
header.append(line)
|
||||
else:
|
||||
# Non-test content after tests started — could be blank lines
|
||||
# between tests or footer content
|
||||
if current_name is not None:
|
||||
# Append to previous test block as trailing content
|
||||
current_lines.append(line)
|
||||
else:
|
||||
footer.append(line)
|
||||
|
||||
if current_name is not None:
|
||||
blocks.append((current_name, "".join(current_lines)))
|
||||
|
||||
# Anything after the last test block is footer
|
||||
# Split last block's trailing non-test content into footer
|
||||
if blocks:
|
||||
last_name, last_content = blocks[-1]
|
||||
last_lines = last_content.split('\n')
|
||||
# Find where the test block ends (} at column 0)
|
||||
test_end = len(last_lines)
|
||||
for i, line in enumerate(last_lines):
|
||||
if line == '}' and i > 0:
|
||||
test_end = i + 1
|
||||
if test_end < len(last_lines):
|
||||
blocks[-1] = (last_name, '\n'.join(last_lines[:test_end]) + '\n')
|
||||
footer = ['\n'.join(last_lines[test_end:]) + '\n'] + footer
|
||||
|
||||
return "".join(header), blocks, "".join(footer)
|
||||
|
||||
|
||||
def main():
|
||||
fix = "--fix" in sys.argv
|
||||
|
||||
upstream_order = extract_test_names(UPSTREAM)
|
||||
our_names = extract_test_names(OURS)
|
||||
|
||||
# Build position map for upstream
|
||||
upstream_pos = {name: i for i, name in enumerate(upstream_order)}
|
||||
|
||||
# Check order
|
||||
our_in_upstream = [n for n in our_names if n in upstream_pos]
|
||||
positions = [upstream_pos[n] for n in our_in_upstream]
|
||||
is_sorted = positions == sorted(positions)
|
||||
|
||||
if is_sorted:
|
||||
print(f"OK: {len(our_names)} tests in correct order")
|
||||
return 0
|
||||
|
||||
# Find out-of-order tests
|
||||
out_of_order = []
|
||||
prev_pos = -1
|
||||
for name in our_in_upstream:
|
||||
pos = upstream_pos[name]
|
||||
if pos < prev_pos:
|
||||
out_of_order.append(name)
|
||||
prev_pos = max(prev_pos, pos)
|
||||
|
||||
print(f"WARN: {len(out_of_order)} tests out of order:")
|
||||
for name in out_of_order[:10]:
|
||||
print(f" - {name}")
|
||||
if len(out_of_order) > 10:
|
||||
print(f" ... and {len(out_of_order) - 10} more")
|
||||
|
||||
if not fix:
|
||||
print("\nRun with --fix to reorder")
|
||||
return 1
|
||||
|
||||
# Fix: reorder
|
||||
header, blocks, footer = extract_test_blocks(OURS)
|
||||
block_map = {name: content for name, content in blocks}
|
||||
|
||||
# Reorder: upstream-ordered first, then extras
|
||||
ordered = []
|
||||
seen = set()
|
||||
for name in upstream_order:
|
||||
if name in block_map and name not in seen:
|
||||
ordered.append((name, block_map[name]))
|
||||
seen.add(name)
|
||||
for name, content in blocks:
|
||||
if name not in seen:
|
||||
ordered.append((name, content))
|
||||
seen.add(name)
|
||||
|
||||
with open(OURS, "w") as f:
|
||||
f.write(header)
|
||||
for _, content in ordered:
|
||||
f.write("\n")
|
||||
f.write(content)
|
||||
f.write(footer)
|
||||
|
||||
print(f"Fixed: {len(ordered)} tests reordered")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
54
stage0/common.h
Normal file
@@ -0,0 +1,54 @@
// common.h — must be included before any system headers.
#ifndef _ZIG0_COMMON_H__
#define _ZIG0_COMMON_H__

#include <stdint.h>
#include <stdlib.h>

#define SLICE(Type) \
    struct Type##Slice { \
        uint32_t len; \
        uint32_t cap; \
        Type* arr; \
    }

#define ARR_INIT(Type, initial_cap) \
    ({ \
        Type* arr = calloc(initial_cap, sizeof(Type)); \
        if (!arr) \
            exit(1); \
        arr; \
    })

#define SLICE_INIT(Type, initial_cap) \
    { .len = 0, .cap = (initial_cap), .arr = ARR_INIT(Type, initial_cap) }

#define SLICE_RESIZE(Type, slice, new_cap) \
    ({ \
        const uint32_t cap = (new_cap); \
        Type* new_arr = realloc((slice)->arr, cap * sizeof(Type)); \
        if (new_arr == NULL) { \
            free((slice)->arr); \
            exit(1); \
        } \
        (slice)->arr = new_arr; \
        (slice)->cap = cap; \
    })

#define SLICE_ENSURE_CAPACITY(Type, slice, additional) \
    ({ \
        if ((slice)->len + (additional) > (slice)->cap) { \
            SLICE_RESIZE(Type, slice, \
                ((slice)->cap * 2 > (slice)->len + (additional)) \
                    ? (slice)->cap * 2 \
                    : (slice)->len + (additional)); \
        } \
    })

#define SLICE_APPEND(Type, slice, item) \
    ({ \
        SLICE_ENSURE_CAPACITY(Type, slice, 1); \
        (slice)->arr[(slice)->len++] = (item); \
    })

#endif
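The SLICE macros above are the project's generic growable-array building block: SLICE(Type) declares the struct, SLICE_INIT allocates the initial backing storage, and SLICE_APPEND grows the array (doubling, or jumping straight to the needed size) before writing. A minimal usage sketch follows; the U32Slice typedef and the sizes are illustrative only, not taken from the repository:

```c
#include "common.h"
#include <stdint.h>
#include <stdio.h>

// Expands to `struct uint32_tSlice { uint32_t len; uint32_t cap; uint32_t* arr; }`.
typedef SLICE(uint32_t) U32Slice;

int main(void) {
    U32Slice xs = SLICE_INIT(uint32_t, 4); // starts with capacity 4
    for (uint32_t i = 0; i < 10; i++)
        SLICE_APPEND(uint32_t, &xs, i * i); // grows the backing array on demand
    printf("len=%u cap=%u last=%u\n", (unsigned)xs.len, (unsigned)xs.cap,
        (unsigned)xs.arr[xs.len - 1]);
    free(xs.arr);
    return 0;
}
```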
39
stage0/main.c
Normal file
@@ -0,0 +1,39 @@
#include "common.h"

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

int zig0Run(char* program, char** msg);
int zig0RunFile(char* fname, char** msg);

static void usage(const char* argv0) {
    fprintf(stderr, "Usage: %s program.zig\n", argv0);
}

int main(int argc, char** argv) {
    if (argc != 2) {
        usage(argv[0]);
        return 1;
    }

    char* msg;
    // zig0RunFile return codes: 0 = success, 1 = panic (msg is heap-allocated),
    // 2 = interpreter error (msg is heap-allocated), 3 = error already reported.
    switch (zig0RunFile(argv[1], &msg)) {
    case 0:
        return 0;
    case 1:
        fprintf(stderr, "panic: %s\n", msg);
        free(msg);
        return 0;
    case 2:
        fprintf(stderr, "interpreter error: %s\n", msg);
        free(msg);
        return 1;
    case 3:
        return 1;
    }
}
3458
stage0/parser.c
Normal file
File diff suppressed because it is too large
44
stage0/parser.h
Normal file
@@ -0,0 +1,44 @@
// parser.h
#ifndef _ZIG0_PARSE_H__
#define _ZIG0_PARSE_H__

#include "ast.h"
#include "common.h"
#include <setjmp.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char* source;
    uint32_t source_len;

    TokenizerTag* token_tags;
    AstIndex* token_starts;
    uint32_t tokens_len;

    AstTokenIndex tok_i;

    AstNodeList nodes;
    AstNodeIndexSlice extra_data;
    AstNodeIndexSlice scratch;
    jmp_buf error_jmp;
    char* err_buf;
} Parser;

#define PARSE_ERR_BUF_SIZE 200

_Noreturn static inline void fail(Parser* p, const char* msg) {
    size_t len = strlen(msg);
    if (len >= PARSE_ERR_BUF_SIZE)
        len = PARSE_ERR_BUF_SIZE - 1;
    memcpy(p->err_buf, msg, len);
    p->err_buf[len] = '\0';
    longjmp(p->error_jmp, 1);
}

Parser* parserInit(const char* source, uint32_t len);
void parserDeinit(Parser* parser);
void parseRoot(Parser* parser);

#endif
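fail() implements the parser's error protocol: it copies a truncated message into err_buf and longjmps to error_jmp, so parsing code can bail out of deep recursion without threading error returns through every call. The driver that arms the jump lives in C sources not shown in this diff; the sketch below only illustrates the assumed setjmp side of that contract (parseSource is a hypothetical name):

```c
#include "parser.h"
#include <stdio.h>

// Hypothetical caller illustrating the setjmp/longjmp contract of fail().
static int parseSource(const char* src, uint32_t len) {
    Parser* p = parserInit(src, len);
    if (setjmp(p->error_jmp) != 0) {
        // fail() copied a truncated message into err_buf and jumped here.
        fprintf(stderr, "parse error: %s\n", p->err_buf);
        parserDeinit(p);
        return 1;
    }
    parseRoot(p);
    parserDeinit(p);
    return 0;
}
```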
7021
stage0/parser_test.zig
Normal file
File diff suppressed because it is too large
5
stage0/test_all.zig
Normal file
@@ -0,0 +1,5 @@
test "zig0 test suite" {
    _ = @import("tokenizer_test.zig");
    _ = @import("parser_test.zig");
    _ = @import("astgen_test.zig");
}
1096
stage0/tokenizer.c
Normal file
File diff suppressed because it is too large
204
stage0/tokenizer.h
Normal file
@@ -0,0 +1,204 @@
|
||||
#ifndef _ZIG0_TOKENIZER_H__
|
||||
#define _ZIG0_TOKENIZER_H__
|
||||
|
||||
#include <stdbool.h>
|
||||
#include <stdint.h>
|
||||
|
||||
#define TOKENIZER_FOREACH_TAG_ENUM(TAG) \
|
||||
TAG(TOKEN_INVALID) \
|
||||
TAG(TOKEN_INVALID_PERIODASTERISKS) \
|
||||
TAG(TOKEN_IDENTIFIER) \
|
||||
TAG(TOKEN_STRING_LITERAL) \
|
||||
TAG(TOKEN_MULTILINE_STRING_LITERAL_LINE) \
|
||||
TAG(TOKEN_CHAR_LITERAL) \
|
||||
TAG(TOKEN_EOF) \
|
||||
TAG(TOKEN_BUILTIN) \
|
||||
TAG(TOKEN_BANG) \
|
||||
TAG(TOKEN_PIPE) \
|
||||
TAG(TOKEN_PIPE_PIPE) \
|
||||
TAG(TOKEN_PIPE_EQUAL) \
|
||||
TAG(TOKEN_EQUAL) \
|
||||
TAG(TOKEN_EQUAL_EQUAL) \
|
||||
TAG(TOKEN_EQUAL_ANGLE_BRACKET_RIGHT) \
|
||||
TAG(TOKEN_BANG_EQUAL) \
|
||||
TAG(TOKEN_L_PAREN) \
|
||||
TAG(TOKEN_R_PAREN) \
|
||||
TAG(TOKEN_SEMICOLON) \
|
||||
TAG(TOKEN_PERCENT) \
|
||||
TAG(TOKEN_PERCENT_EQUAL) \
|
||||
TAG(TOKEN_L_BRACE) \
|
||||
TAG(TOKEN_R_BRACE) \
|
||||
TAG(TOKEN_L_BRACKET) \
|
||||
TAG(TOKEN_R_BRACKET) \
|
||||
TAG(TOKEN_PERIOD) \
|
||||
TAG(TOKEN_PERIOD_ASTERISK) \
|
||||
TAG(TOKEN_ELLIPSIS2) \
|
||||
TAG(TOKEN_ELLIPSIS3) \
|
||||
TAG(TOKEN_CARET) \
|
||||
TAG(TOKEN_CARET_EQUAL) \
|
||||
TAG(TOKEN_PLUS) \
|
||||
TAG(TOKEN_PLUS_PLUS) \
|
||||
TAG(TOKEN_PLUS_EQUAL) \
|
||||
TAG(TOKEN_PLUS_PERCENT) \
|
||||
TAG(TOKEN_PLUS_PERCENT_EQUAL) \
|
||||
TAG(TOKEN_PLUS_PIPE) \
|
||||
TAG(TOKEN_PLUS_PIPE_EQUAL) \
|
||||
TAG(TOKEN_MINUS) \
|
||||
TAG(TOKEN_MINUS_EQUAL) \
|
||||
TAG(TOKEN_MINUS_PERCENT) \
|
||||
TAG(TOKEN_MINUS_PERCENT_EQUAL) \
|
||||
TAG(TOKEN_MINUS_PIPE) \
|
||||
TAG(TOKEN_MINUS_PIPE_EQUAL) \
|
||||
TAG(TOKEN_ASTERISK) \
|
||||
TAG(TOKEN_ASTERISK_EQUAL) \
|
||||
TAG(TOKEN_ASTERISK_ASTERISK) \
|
||||
TAG(TOKEN_ASTERISK_PERCENT) \
|
||||
TAG(TOKEN_ASTERISK_PERCENT_EQUAL) \
|
||||
TAG(TOKEN_ASTERISK_PIPE) \
|
||||
TAG(TOKEN_ASTERISK_PIPE_EQUAL) \
|
||||
TAG(TOKEN_ARROW) \
|
||||
TAG(TOKEN_COLON) \
|
||||
TAG(TOKEN_SLASH) \
|
||||
TAG(TOKEN_SLASH_EQUAL) \
|
||||
TAG(TOKEN_COMMA) \
|
||||
TAG(TOKEN_AMPERSAND) \
|
||||
TAG(TOKEN_AMPERSAND_EQUAL) \
|
||||
TAG(TOKEN_QUESTION_MARK) \
|
||||
TAG(TOKEN_ANGLE_BRACKET_LEFT) \
|
||||
TAG(TOKEN_ANGLE_BRACKET_LEFT_EQUAL) \
|
||||
TAG(TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT) \
|
||||
TAG(TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_EQUAL) \
|
||||
TAG(TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_PIPE) \
|
||||
TAG(TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_PIPE_EQUAL) \
|
||||
TAG(TOKEN_ANGLE_BRACKET_RIGHT) \
|
||||
TAG(TOKEN_ANGLE_BRACKET_RIGHT_EQUAL) \
|
||||
TAG(TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_RIGHT) \
|
||||
TAG(TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_RIGHT_EQUAL) \
|
||||
TAG(TOKEN_TILDE) \
|
||||
TAG(TOKEN_NUMBER_LITERAL) \
|
||||
TAG(TOKEN_DOC_COMMENT) \
|
||||
TAG(TOKEN_CONTAINER_DOC_COMMENT) \
|
||||
TAG(TOKEN_KEYWORD_ADDRSPACE) \
|
||||
TAG(TOKEN_KEYWORD_ALIGN) \
|
||||
TAG(TOKEN_KEYWORD_ALLOWZERO) \
|
||||
TAG(TOKEN_KEYWORD_AND) \
|
||||
TAG(TOKEN_KEYWORD_ANYFRAME) \
|
||||
TAG(TOKEN_KEYWORD_ANYTYPE) \
|
||||
TAG(TOKEN_KEYWORD_ASM) \
|
||||
TAG(TOKEN_KEYWORD_BREAK) \
|
||||
TAG(TOKEN_KEYWORD_CALLCONV) \
|
||||
TAG(TOKEN_KEYWORD_CATCH) \
|
||||
TAG(TOKEN_KEYWORD_COMPTIME) \
|
||||
TAG(TOKEN_KEYWORD_CONST) \
|
||||
TAG(TOKEN_KEYWORD_CONTINUE) \
|
||||
TAG(TOKEN_KEYWORD_DEFER) \
|
||||
TAG(TOKEN_KEYWORD_ELSE) \
|
||||
TAG(TOKEN_KEYWORD_ENUM) \
|
||||
TAG(TOKEN_KEYWORD_ERRDEFER) \
|
||||
TAG(TOKEN_KEYWORD_ERROR) \
|
||||
TAG(TOKEN_KEYWORD_EXPORT) \
|
||||
TAG(TOKEN_KEYWORD_EXTERN) \
|
||||
TAG(TOKEN_KEYWORD_FN) \
|
||||
TAG(TOKEN_KEYWORD_FOR) \
|
||||
TAG(TOKEN_KEYWORD_IF) \
|
||||
TAG(TOKEN_KEYWORD_INLINE) \
|
||||
TAG(TOKEN_KEYWORD_NOALIAS) \
|
||||
TAG(TOKEN_KEYWORD_NOINLINE) \
|
||||
TAG(TOKEN_KEYWORD_NOSUSPEND) \
|
||||
TAG(TOKEN_KEYWORD_OPAQUE) \
|
||||
TAG(TOKEN_KEYWORD_OR) \
|
||||
TAG(TOKEN_KEYWORD_ORELSE) \
|
||||
TAG(TOKEN_KEYWORD_PACKED) \
|
||||
TAG(TOKEN_KEYWORD_PUB) \
|
||||
TAG(TOKEN_KEYWORD_RESUME) \
|
||||
TAG(TOKEN_KEYWORD_RETURN) \
|
||||
TAG(TOKEN_KEYWORD_LINKSECTION) \
|
||||
TAG(TOKEN_KEYWORD_STRUCT) \
|
||||
TAG(TOKEN_KEYWORD_SUSPEND) \
|
||||
TAG(TOKEN_KEYWORD_SWITCH) \
|
||||
TAG(TOKEN_KEYWORD_TEST) \
|
||||
TAG(TOKEN_KEYWORD_THREADLOCAL) \
|
||||
TAG(TOKEN_KEYWORD_TRY) \
|
||||
TAG(TOKEN_KEYWORD_UNION) \
|
||||
TAG(TOKEN_KEYWORD_UNREACHABLE) \
|
||||
TAG(TOKEN_KEYWORD_VAR) \
|
||||
TAG(TOKEN_KEYWORD_VOLATILE) \
|
||||
TAG(TOKEN_KEYWORD_WHILE)
|
||||
|
||||
#define TOKENIZER_GENERATE_ENUM(ENUM) ENUM,
|
||||
#define TOKENIZER_GENERATE_CASE(ENUM) \
|
||||
case ENUM: \
|
||||
return #ENUM;
|
||||
|
||||
// First define the enum
|
||||
typedef enum {
|
||||
TOKENIZER_FOREACH_TAG_ENUM(TOKENIZER_GENERATE_ENUM)
|
||||
} TokenizerTag;
|
||||
|
||||
const char* tokenizerGetTagString(TokenizerTag tag);
|
||||
|
||||
typedef enum {
|
||||
TOKENIZER_STATE_START,
|
||||
TOKENIZER_STATE_EXPECT_NEWLINE,
|
||||
TOKENIZER_STATE_IDENTIFIER,
|
||||
TOKENIZER_STATE_BUILTIN,
|
||||
TOKENIZER_STATE_STRING_LITERAL,
|
||||
TOKENIZER_STATE_STRING_LITERAL_BACKSLASH,
|
||||
TOKENIZER_STATE_MULTILINE_STRING_LITERAL_LINE,
|
||||
TOKENIZER_STATE_CHAR_LITERAL,
|
||||
TOKENIZER_STATE_CHAR_LITERAL_BACKSLASH,
|
||||
TOKENIZER_STATE_BACKSLASH,
|
||||
TOKENIZER_STATE_EQUAL,
|
||||
TOKENIZER_STATE_BANG,
|
||||
TOKENIZER_STATE_PIPE,
|
||||
TOKENIZER_STATE_MINUS,
|
||||
TOKENIZER_STATE_MINUS_PERCENT,
|
||||
TOKENIZER_STATE_MINUS_PIPE,
|
||||
TOKENIZER_STATE_ASTERISK,
|
||||
TOKENIZER_STATE_ASTERISK_PERCENT,
|
||||
TOKENIZER_STATE_ASTERISK_PIPE,
|
||||
TOKENIZER_STATE_SLASH,
|
||||
TOKENIZER_STATE_LINE_COMMENT_START,
|
||||
TOKENIZER_STATE_LINE_COMMENT,
|
||||
TOKENIZER_STATE_DOC_COMMENT_START,
|
||||
TOKENIZER_STATE_DOC_COMMENT,
|
||||
TOKENIZER_STATE_INT,
|
||||
TOKENIZER_STATE_INT_EXPONENT,
|
||||
TOKENIZER_STATE_INT_PERIOD,
|
||||
TOKENIZER_STATE_FLOAT,
|
||||
TOKENIZER_STATE_FLOAT_EXPONENT,
|
||||
TOKENIZER_STATE_AMPERSAND,
|
||||
TOKENIZER_STATE_CARET,
|
||||
TOKENIZER_STATE_PERCENT,
|
||||
TOKENIZER_STATE_PLUS,
|
||||
TOKENIZER_STATE_PLUS_PERCENT,
|
||||
TOKENIZER_STATE_PLUS_PIPE,
|
||||
TOKENIZER_STATE_ANGLE_BRACKET_LEFT,
|
||||
TOKENIZER_STATE_ANGLE_BRACKET_ANGLE_BRACKET_LEFT,
|
||||
TOKENIZER_STATE_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_PIPE,
|
||||
TOKENIZER_STATE_ANGLE_BRACKET_RIGHT,
|
||||
TOKENIZER_STATE_ANGLE_BRACKET_ANGLE_BRACKET_RIGHT,
|
||||
TOKENIZER_STATE_PERIOD,
|
||||
TOKENIZER_STATE_PERIOD_2,
|
||||
TOKENIZER_STATE_PERIOD_ASTERISK,
|
||||
TOKENIZER_STATE_SAW_AT_SIGN,
|
||||
TOKENIZER_STATE_INVALID,
|
||||
} TokenizerState;
|
||||
|
||||
typedef struct {
|
||||
TokenizerTag tag;
|
||||
struct {
|
||||
uint32_t start, end;
|
||||
} loc;
|
||||
} TokenizerToken;
|
||||
|
||||
typedef struct {
|
||||
const char* buffer;
|
||||
const uint32_t buffer_len;
|
||||
uint32_t index;
|
||||
} Tokenizer;
|
||||
|
||||
Tokenizer tokenizerInit(const char* buffer, uint32_t len);
|
||||
TokenizerToken tokenizerNext(Tokenizer* self);
|
||||
|
||||
#endif
|
||||
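tokenizer.h uses the X-macro pattern: TOKENIZER_FOREACH_TAG_ENUM names every token tag exactly once, and the generator macros stamp that list out first as the TokenizerTag enum and then as switch cases. tokenizer.c is suppressed in this diff, so the following is only the assumed shape of tokenizerGetTagString built from the same list:

```c
#include "tokenizer.h"

// Assumed implementation shape: one `case TOKEN_X: return "TOKEN_X";` per tag,
// generated by TOKENIZER_GENERATE_CASE from the shared tag list.
const char* tokenizerGetTagString(TokenizerTag tag) {
    switch (tag) {
        TOKENIZER_FOREACH_TAG_ENUM(TOKENIZER_GENERATE_CASE)
    }
    return "TOKEN_UNKNOWN";
}
```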
767
stage0/tokenizer_test.zig
Normal file
@@ -0,0 +1,767 @@
|
||||
const std = @import("std");
|
||||
const testing = std.testing;
|
||||
|
||||
const Token = std.zig.Token;
|
||||
const Tokenizer = std.zig.Tokenizer;
|
||||
|
||||
const c = @cImport({
|
||||
@cInclude("tokenizer.h");
|
||||
});
|
||||
|
||||
pub fn zigToken(token: c_uint) Token.Tag {
|
||||
return switch (token) {
|
||||
c.TOKEN_INVALID => .invalid,
|
||||
c.TOKEN_INVALID_PERIODASTERISKS => .invalid_periodasterisks,
|
||||
c.TOKEN_IDENTIFIER => .identifier,
|
||||
c.TOKEN_STRING_LITERAL => .string_literal,
|
||||
c.TOKEN_MULTILINE_STRING_LITERAL_LINE => .multiline_string_literal_line,
|
||||
c.TOKEN_CHAR_LITERAL => .char_literal,
|
||||
c.TOKEN_EOF => .eof,
|
||||
c.TOKEN_BUILTIN => .builtin,
|
||||
c.TOKEN_BANG => .bang,
|
||||
c.TOKEN_PIPE => .pipe,
|
||||
c.TOKEN_PIPE_PIPE => .pipe_pipe,
|
||||
c.TOKEN_PIPE_EQUAL => .pipe_equal,
|
||||
c.TOKEN_EQUAL => .equal,
|
||||
c.TOKEN_EQUAL_EQUAL => .equal_equal,
|
||||
c.TOKEN_EQUAL_ANGLE_BRACKET_RIGHT => .equal_angle_bracket_right,
|
||||
c.TOKEN_BANG_EQUAL => .bang_equal,
|
||||
c.TOKEN_L_PAREN => .l_paren,
|
||||
c.TOKEN_R_PAREN => .r_paren,
|
||||
c.TOKEN_SEMICOLON => .semicolon,
|
||||
c.TOKEN_PERCENT => .percent,
|
||||
c.TOKEN_PERCENT_EQUAL => .percent_equal,
|
||||
c.TOKEN_L_BRACE => .l_brace,
|
||||
c.TOKEN_R_BRACE => .r_brace,
|
||||
c.TOKEN_L_BRACKET => .l_bracket,
|
||||
c.TOKEN_R_BRACKET => .r_bracket,
|
||||
c.TOKEN_PERIOD => .period,
|
||||
c.TOKEN_PERIOD_ASTERISK => .period_asterisk,
|
||||
c.TOKEN_ELLIPSIS2 => .ellipsis2,
|
||||
c.TOKEN_ELLIPSIS3 => .ellipsis3,
|
||||
c.TOKEN_CARET => .caret,
|
||||
c.TOKEN_CARET_EQUAL => .caret_equal,
|
||||
c.TOKEN_PLUS => .plus,
|
||||
c.TOKEN_PLUS_PLUS => .plus_plus,
|
||||
c.TOKEN_PLUS_EQUAL => .plus_equal,
|
||||
c.TOKEN_PLUS_PERCENT => .plus_percent,
|
||||
c.TOKEN_PLUS_PERCENT_EQUAL => .plus_percent_equal,
|
||||
c.TOKEN_PLUS_PIPE => .plus_pipe,
|
||||
c.TOKEN_PLUS_PIPE_EQUAL => .plus_pipe_equal,
|
||||
c.TOKEN_MINUS => .minus,
|
||||
c.TOKEN_MINUS_EQUAL => .minus_equal,
|
||||
c.TOKEN_MINUS_PERCENT => .minus_percent,
|
||||
c.TOKEN_MINUS_PERCENT_EQUAL => .minus_percent_equal,
|
||||
c.TOKEN_MINUS_PIPE => .minus_pipe,
|
||||
c.TOKEN_MINUS_PIPE_EQUAL => .minus_pipe_equal,
|
||||
c.TOKEN_ASTERISK => .asterisk,
|
||||
c.TOKEN_ASTERISK_EQUAL => .asterisk_equal,
|
||||
c.TOKEN_ASTERISK_ASTERISK => .asterisk_asterisk,
|
||||
c.TOKEN_ASTERISK_PERCENT => .asterisk_percent,
|
||||
c.TOKEN_ASTERISK_PERCENT_EQUAL => .asterisk_percent_equal,
|
||||
c.TOKEN_ASTERISK_PIPE => .asterisk_pipe,
|
||||
c.TOKEN_ASTERISK_PIPE_EQUAL => .asterisk_pipe_equal,
|
||||
c.TOKEN_ARROW => .arrow,
|
||||
c.TOKEN_COLON => .colon,
|
||||
c.TOKEN_SLASH => .slash,
|
||||
c.TOKEN_SLASH_EQUAL => .slash_equal,
|
||||
c.TOKEN_COMMA => .comma,
|
||||
c.TOKEN_AMPERSAND => .ampersand,
|
||||
c.TOKEN_AMPERSAND_EQUAL => .ampersand_equal,
|
||||
c.TOKEN_QUESTION_MARK => .question_mark,
|
||||
c.TOKEN_ANGLE_BRACKET_LEFT => .angle_bracket_left,
|
||||
c.TOKEN_ANGLE_BRACKET_LEFT_EQUAL => .angle_bracket_left_equal,
|
||||
c.TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT => .angle_bracket_angle_bracket_left,
|
||||
c.TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_EQUAL => .angle_bracket_angle_bracket_left_equal,
|
||||
c.TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_PIPE => .angle_bracket_angle_bracket_left_pipe,
|
||||
c.TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_PIPE_EQUAL => .angle_bracket_angle_bracket_left_pipe_equal,
|
||||
c.TOKEN_ANGLE_BRACKET_RIGHT => .angle_bracket_right,
|
||||
c.TOKEN_ANGLE_BRACKET_RIGHT_EQUAL => .angle_bracket_right_equal,
|
||||
c.TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_RIGHT => .angle_bracket_angle_bracket_right,
|
||||
c.TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_RIGHT_EQUAL => .angle_bracket_angle_bracket_right_equal,
|
||||
c.TOKEN_TILDE => .tilde,
|
||||
c.TOKEN_NUMBER_LITERAL => .number_literal,
|
||||
c.TOKEN_DOC_COMMENT => .doc_comment,
|
||||
c.TOKEN_CONTAINER_DOC_COMMENT => .container_doc_comment,
|
||||
c.TOKEN_KEYWORD_ADDRSPACE => .keyword_addrspace,
|
||||
c.TOKEN_KEYWORD_ALIGN => .keyword_align,
|
||||
c.TOKEN_KEYWORD_ALLOWZERO => .keyword_allowzero,
|
||||
c.TOKEN_KEYWORD_AND => .keyword_and,
|
||||
c.TOKEN_KEYWORD_ANYFRAME => .keyword_anyframe,
|
||||
c.TOKEN_KEYWORD_ANYTYPE => .keyword_anytype,
|
||||
c.TOKEN_KEYWORD_ASM => .keyword_asm,
|
||||
c.TOKEN_KEYWORD_BREAK => .keyword_break,
|
||||
c.TOKEN_KEYWORD_CALLCONV => .keyword_callconv,
|
||||
c.TOKEN_KEYWORD_CATCH => .keyword_catch,
|
||||
c.TOKEN_KEYWORD_COMPTIME => .keyword_comptime,
|
||||
c.TOKEN_KEYWORD_CONST => .keyword_const,
|
||||
c.TOKEN_KEYWORD_CONTINUE => .keyword_continue,
|
||||
c.TOKEN_KEYWORD_DEFER => .keyword_defer,
|
||||
c.TOKEN_KEYWORD_ELSE => .keyword_else,
|
||||
c.TOKEN_KEYWORD_ENUM => .keyword_enum,
|
||||
c.TOKEN_KEYWORD_ERRDEFER => .keyword_errdefer,
|
||||
c.TOKEN_KEYWORD_ERROR => .keyword_error,
|
||||
c.TOKEN_KEYWORD_EXPORT => .keyword_export,
|
||||
c.TOKEN_KEYWORD_EXTERN => .keyword_extern,
|
||||
c.TOKEN_KEYWORD_FN => .keyword_fn,
|
||||
c.TOKEN_KEYWORD_FOR => .keyword_for,
|
||||
c.TOKEN_KEYWORD_IF => .keyword_if,
|
||||
c.TOKEN_KEYWORD_INLINE => .keyword_inline,
|
||||
c.TOKEN_KEYWORD_NOALIAS => .keyword_noalias,
|
||||
c.TOKEN_KEYWORD_NOINLINE => .keyword_noinline,
|
||||
c.TOKEN_KEYWORD_NOSUSPEND => .keyword_nosuspend,
|
||||
c.TOKEN_KEYWORD_OPAQUE => .keyword_opaque,
|
||||
c.TOKEN_KEYWORD_OR => .keyword_or,
|
||||
c.TOKEN_KEYWORD_ORELSE => .keyword_orelse,
|
||||
c.TOKEN_KEYWORD_PACKED => .keyword_packed,
|
||||
c.TOKEN_KEYWORD_PUB => .keyword_pub,
|
||||
c.TOKEN_KEYWORD_RESUME => .keyword_resume,
|
||||
c.TOKEN_KEYWORD_RETURN => .keyword_return,
|
||||
c.TOKEN_KEYWORD_LINKSECTION => .keyword_linksection,
|
||||
c.TOKEN_KEYWORD_STRUCT => .keyword_struct,
|
||||
c.TOKEN_KEYWORD_SUSPEND => .keyword_suspend,
|
||||
c.TOKEN_KEYWORD_SWITCH => .keyword_switch,
|
||||
c.TOKEN_KEYWORD_TEST => .keyword_test,
|
||||
c.TOKEN_KEYWORD_THREADLOCAL => .keyword_threadlocal,
|
||||
c.TOKEN_KEYWORD_TRY => .keyword_try,
|
||||
c.TOKEN_KEYWORD_UNION => .keyword_union,
|
||||
c.TOKEN_KEYWORD_UNREACHABLE => .keyword_unreachable,
|
||||
c.TOKEN_KEYWORD_VAR => .keyword_var,
|
||||
c.TOKEN_KEYWORD_VOLATILE => .keyword_volatile,
|
||||
c.TOKEN_KEYWORD_WHILE => .keyword_while,
|
||||
else => undefined,
|
||||
};
|
||||
}
|
||||
|
||||
// Copy-pasted from lib/std/zig/tokenizer.zig
|
||||
fn testTokenize(source: [:0]const u8, expected_token_tags: []const Token.Tag) !void {
|
||||
// Do the C thing
|
||||
{
|
||||
var ctokenizer = c.tokenizerInit(source.ptr, @intCast(source.len));
|
||||
for (expected_token_tags) |expected_token_tag| {
|
||||
const token = c.tokenizerNext(&ctokenizer);
|
||||
try std.testing.expectEqual(expected_token_tag, zigToken(token.tag));
|
||||
}
|
||||
const last_token = c.tokenizerNext(&ctokenizer);
|
||||
try std.testing.expectEqual(Token.Tag.eof, zigToken(last_token.tag));
|
||||
}
|
||||
|
||||
{
|
||||
var tokenizer = Tokenizer.init(source);
|
||||
for (expected_token_tags) |expected_token_tag| {
|
||||
const token = tokenizer.next();
|
||||
try std.testing.expectEqual(expected_token_tag, token.tag);
|
||||
}
|
||||
// Last token should always be eof, even when the last token was invalid,
|
||||
// in which case the tokenizer is in an invalid state, which can only be
|
||||
// recovered by opinionated means outside the scope of this implementation.
|
||||
const last_token = tokenizer.next();
|
||||
try std.testing.expectEqual(Token.Tag.eof, last_token.tag);
|
||||
try std.testing.expectEqual(source.len, last_token.loc.start);
|
||||
try std.testing.expectEqual(source.len, last_token.loc.end);
|
||||
}
|
||||
}
|
||||
|
||||
test "keywords" {
|
||||
try testTokenize("test const else", &.{ .keyword_test, .keyword_const, .keyword_else });
|
||||
}
|
||||
|
||||
test "line comment followed by top-level comptime" {
|
||||
try testTokenize(
|
||||
\\// line comment
|
||||
\\comptime {}
|
||||
\\
|
||||
, &.{
|
||||
.keyword_comptime,
|
||||
.l_brace,
|
||||
.r_brace,
|
||||
});
|
||||
}
|
||||
|
||||
test "unknown length pointer and then c pointer" {
|
||||
try testTokenize(
|
||||
\\[*]u8
|
||||
\\[*c]u8
|
||||
, &.{
|
||||
.l_bracket,
|
||||
.asterisk,
|
||||
.r_bracket,
|
||||
.identifier,
|
||||
.l_bracket,
|
||||
.asterisk,
|
||||
.identifier,
|
||||
.r_bracket,
|
||||
.identifier,
|
||||
});
|
||||
}
|
||||
|
||||
test "code point literal with hex escape" {
|
||||
try testTokenize(
|
||||
\\'\x1b'
|
||||
, &.{.char_literal});
|
||||
try testTokenize(
|
||||
\\'\x1'
|
||||
, &.{.char_literal});
|
||||
}
|
||||
|
||||
test "newline in char literal" {
|
||||
try testTokenize(
|
||||
\\'
|
||||
\\'
|
||||
, &.{ .invalid, .invalid });
|
||||
}
|
||||
|
||||
test "newline in string literal" {
|
||||
try testTokenize(
|
||||
\\"
|
||||
\\"
|
||||
, &.{ .invalid, .invalid });
|
||||
}
|
||||
|
||||
test "code point literal with unicode escapes" {
|
||||
// Valid unicode escapes
|
||||
try testTokenize(
|
||||
\\'\u{3}'
|
||||
, &.{.char_literal});
|
||||
try testTokenize(
|
||||
\\'\u{01}'
|
||||
, &.{.char_literal});
|
||||
try testTokenize(
|
||||
\\'\u{2a}'
|
||||
, &.{.char_literal});
|
||||
try testTokenize(
|
||||
\\'\u{3f9}'
|
||||
, &.{.char_literal});
|
||||
try testTokenize(
|
||||
\\'\u{6E09aBc1523}'
|
||||
, &.{.char_literal});
|
||||
try testTokenize(
|
||||
\\"\u{440}"
|
||||
, &.{.string_literal});
|
||||
|
||||
// Invalid unicode escapes
|
||||
try testTokenize(
|
||||
\\'\u'
|
||||
, &.{.char_literal});
|
||||
try testTokenize(
|
||||
\\'\u{{'
|
||||
, &.{.char_literal});
|
||||
try testTokenize(
|
||||
\\'\u{}'
|
||||
, &.{.char_literal});
|
||||
try testTokenize(
|
||||
\\'\u{s}'
|
||||
, &.{.char_literal});
|
||||
try testTokenize(
|
||||
\\'\u{2z}'
|
||||
, &.{.char_literal});
|
||||
try testTokenize(
|
||||
\\'\u{4a'
|
||||
, &.{.char_literal});
|
||||
|
||||
// Test old-style unicode literals
|
||||
try testTokenize(
|
||||
\\'\u0333'
|
||||
, &.{.char_literal});
|
||||
try testTokenize(
|
||||
\\'\U0333'
|
||||
, &.{.char_literal});
|
||||
}
|
||||
|
||||
test "code point literal with unicode code point" {
|
||||
try testTokenize(
|
||||
\\'💩'
|
||||
, &.{.char_literal});
|
||||
}
|
||||
|
||||
test "float literal e exponent" {
|
||||
try testTokenize("a = 4.94065645841246544177e-324;\n", &.{
|
||||
.identifier,
|
||||
.equal,
|
||||
.number_literal,
|
||||
.semicolon,
|
||||
});
|
||||
}
|
||||
|
||||
test "float literal p exponent" {
|
||||
try testTokenize("a = 0x1.a827999fcef32p+1022;\n", &.{
|
||||
.identifier,
|
||||
.equal,
|
||||
.number_literal,
|
||||
.semicolon,
|
||||
});
|
||||
}
|
||||
|
||||
test "chars" {
|
||||
try testTokenize("'c'", &.{.char_literal});
|
||||
}
|
||||
|
||||
test "invalid token characters" {
|
||||
try testTokenize("#", &.{.invalid});
|
||||
try testTokenize("`", &.{.invalid});
|
||||
try testTokenize("'c", &.{.invalid});
|
||||
try testTokenize("'", &.{.invalid});
|
||||
try testTokenize("''", &.{.char_literal});
|
||||
try testTokenize("'\n'", &.{ .invalid, .invalid });
|
||||
}
|
||||
|
||||
test "invalid literal/comment characters" {
|
||||
try testTokenize("\"\x00\"", &.{.invalid});
|
||||
try testTokenize("`\x00`", &.{.invalid});
|
||||
try testTokenize("//\x00", &.{.invalid});
|
||||
try testTokenize("//\x1f", &.{.invalid});
|
||||
try testTokenize("//\x7f", &.{.invalid});
|
||||
}
|
||||
|
||||
test "utf8" {
|
||||
try testTokenize("//\xc2\x80", &.{});
|
||||
try testTokenize("//\xf4\x8f\xbf\xbf", &.{});
|
||||
}
|
||||
|
||||
test "invalid utf8" {
|
||||
try testTokenize("//\x80", &.{});
|
||||
try testTokenize("//\xbf", &.{});
|
||||
try testTokenize("//\xf8", &.{});
|
||||
try testTokenize("//\xff", &.{});
|
||||
try testTokenize("//\xc2\xc0", &.{});
|
||||
try testTokenize("//\xe0", &.{});
|
||||
try testTokenize("//\xf0", &.{});
|
||||
try testTokenize("//\xf0\x90\x80\xc0", &.{});
|
||||
}
|
||||
|
||||
test "illegal unicode codepoints" {
|
||||
// Unicode newline characters U+0085, U+2028, U+2029, plus their neighboring code points.
|
||||
try testTokenize("//\xc2\x84", &.{});
|
||||
try testTokenize("//\xc2\x85", &.{});
|
||||
try testTokenize("//\xc2\x86", &.{});
|
||||
try testTokenize("//\xe2\x80\xa7", &.{});
|
||||
try testTokenize("//\xe2\x80\xa8", &.{});
|
||||
try testTokenize("//\xe2\x80\xa9", &.{});
|
||||
try testTokenize("//\xe2\x80\xaa", &.{});
|
||||
}
|
||||
|
||||
test "string identifier and builtin fns" {
|
||||
try testTokenize(
|
||||
\\const @"if" = @import("std");
|
||||
, &.{
|
||||
.keyword_const,
|
||||
.identifier,
|
||||
.equal,
|
||||
.builtin,
|
||||
.l_paren,
|
||||
.string_literal,
|
||||
.r_paren,
|
||||
.semicolon,
|
||||
});
|
||||
}
|
||||
|
||||
test "pipe and then invalid" {
|
||||
try testTokenize("||=", &.{
|
||||
.pipe_pipe,
|
||||
.equal,
|
||||
});
|
||||
}
|
||||
|
||||
test "line comment and doc comment" {
|
||||
try testTokenize("//", &.{});
|
||||
try testTokenize("// a / b", &.{});
|
||||
try testTokenize("// /", &.{});
|
||||
try testTokenize("/// a", &.{.doc_comment});
|
||||
try testTokenize("///", &.{.doc_comment});
|
||||
try testTokenize("////", &.{});
|
||||
try testTokenize("//!", &.{.container_doc_comment});
|
||||
try testTokenize("//!!", &.{.container_doc_comment});
|
||||
}
|
||||
|
||||
test "line comment followed by identifier" {
|
||||
try testTokenize(
|
||||
\\ Unexpected,
|
||||
\\ // another
|
||||
\\ Another,
|
||||
, &.{
|
||||
.identifier,
|
||||
.comma,
|
||||
.identifier,
|
||||
.comma,
|
||||
});
|
||||
}
|
||||
|
||||
test "UTF-8 BOM is recognized and skipped" {
|
||||
try testTokenize("\xEF\xBB\xBFa;\n", &.{
|
||||
.identifier,
|
||||
.semicolon,
|
||||
});
|
||||
}
|
||||
|
||||
test "correctly parse pointer assignment" {
|
||||
try testTokenize("b.*=3;\n", &.{
|
||||
.identifier,
|
||||
.period_asterisk,
|
||||
.equal,
|
||||
.number_literal,
|
||||
.semicolon,
|
||||
});
|
||||
}
|
||||
|
||||
test "correctly parse pointer dereference followed by asterisk" {
|
||||
try testTokenize("\"b\".* ** 10", &.{
|
||||
.string_literal,
|
||||
.period_asterisk,
|
||||
.asterisk_asterisk,
|
||||
.number_literal,
|
||||
});
|
||||
|
||||
try testTokenize("(\"b\".*)** 10", &.{
|
||||
.l_paren,
|
||||
.string_literal,
|
||||
.period_asterisk,
|
||||
.r_paren,
|
||||
.asterisk_asterisk,
|
||||
.number_literal,
|
||||
});
|
||||
|
||||
try testTokenize("\"b\".*** 10", &.{
|
||||
.string_literal,
|
||||
.invalid_periodasterisks,
|
||||
.asterisk_asterisk,
|
||||
.number_literal,
|
||||
});
|
||||
}
|
||||
|
||||
test "range literals" {
|
||||
try testTokenize("0...9", &.{ .number_literal, .ellipsis3, .number_literal });
|
||||
try testTokenize("'0'...'9'", &.{ .char_literal, .ellipsis3, .char_literal });
|
||||
try testTokenize("0x00...0x09", &.{ .number_literal, .ellipsis3, .number_literal });
|
||||
try testTokenize("0b00...0b11", &.{ .number_literal, .ellipsis3, .number_literal });
|
||||
try testTokenize("0o00...0o11", &.{ .number_literal, .ellipsis3, .number_literal });
|
||||
}
|
||||
|
||||
test "number literals decimal" {
|
||||
try testTokenize("0", &.{.number_literal});
|
||||
try testTokenize("1", &.{.number_literal});
|
||||
try testTokenize("2", &.{.number_literal});
|
||||
try testTokenize("3", &.{.number_literal});
|
||||
try testTokenize("4", &.{.number_literal});
|
||||
try testTokenize("5", &.{.number_literal});
|
||||
try testTokenize("6", &.{.number_literal});
|
||||
try testTokenize("7", &.{.number_literal});
|
||||
try testTokenize("8", &.{.number_literal});
|
||||
try testTokenize("9", &.{.number_literal});
|
||||
try testTokenize("1..", &.{ .number_literal, .ellipsis2 });
|
||||
try testTokenize("0a", &.{.number_literal});
|
||||
try testTokenize("9b", &.{.number_literal});
|
||||
try testTokenize("1z", &.{.number_literal});
|
||||
try testTokenize("1z_1", &.{.number_literal});
|
||||
try testTokenize("9z3", &.{.number_literal});
|
||||
|
||||
try testTokenize("0_0", &.{.number_literal});
|
||||
try testTokenize("0001", &.{.number_literal});
|
||||
try testTokenize("01234567890", &.{.number_literal});
|
||||
try testTokenize("012_345_6789_0", &.{.number_literal});
|
||||
try testTokenize("0_1_2_3_4_5_6_7_8_9_0", &.{.number_literal});
|
||||
|
||||
try testTokenize("00_", &.{.number_literal});
|
||||
try testTokenize("0_0_", &.{.number_literal});
|
||||
try testTokenize("0__0", &.{.number_literal});
|
||||
try testTokenize("0_0f", &.{.number_literal});
|
||||
try testTokenize("0_0_f", &.{.number_literal});
|
||||
try testTokenize("0_0_f_00", &.{.number_literal});
|
||||
try testTokenize("1_,", &.{ .number_literal, .comma });
|
||||
|
||||
try testTokenize("0.0", &.{.number_literal});
|
||||
try testTokenize("1.0", &.{.number_literal});
|
||||
try testTokenize("10.0", &.{.number_literal});
|
||||
try testTokenize("0e0", &.{.number_literal});
|
||||
try testTokenize("1e0", &.{.number_literal});
|
||||
try testTokenize("1e100", &.{.number_literal});
|
||||
try testTokenize("1.0e100", &.{.number_literal});
|
||||
try testTokenize("1.0e+100", &.{.number_literal});
|
||||
try testTokenize("1.0e-100", &.{.number_literal});
|
||||
try testTokenize("1_0_0_0.0_0_0_0_0_1e1_0_0_0", &.{.number_literal});
|
||||
|
||||
try testTokenize("1.", &.{ .number_literal, .period });
|
||||
try testTokenize("1e", &.{.number_literal});
|
||||
try testTokenize("1.e100", &.{.number_literal});
|
||||
try testTokenize("1.0e1f0", &.{.number_literal});
|
||||
try testTokenize("1.0p100", &.{.number_literal});
|
||||
try testTokenize("1.0p-100", &.{.number_literal});
|
||||
try testTokenize("1.0p1f0", &.{.number_literal});
|
||||
try testTokenize("1.0_,", &.{ .number_literal, .comma });
|
||||
try testTokenize("1_.0", &.{.number_literal});
|
||||
try testTokenize("1._", &.{.number_literal});
|
||||
try testTokenize("1.a", &.{.number_literal});
|
||||
try testTokenize("1.z", &.{.number_literal});
|
||||
try testTokenize("1._0", &.{.number_literal});
|
||||
try testTokenize("1.+", &.{ .number_literal, .period, .plus });
|
||||
try testTokenize("1._+", &.{ .number_literal, .plus });
|
||||
try testTokenize("1._e", &.{.number_literal});
|
||||
try testTokenize("1.0e", &.{.number_literal});
|
||||
try testTokenize("1.0e,", &.{ .number_literal, .comma });
|
||||
try testTokenize("1.0e_", &.{.number_literal});
|
||||
try testTokenize("1.0e+_", &.{.number_literal});
|
||||
try testTokenize("1.0e-_", &.{.number_literal});
|
||||
try testTokenize("1.0e0_+", &.{ .number_literal, .plus });
|
||||
}
|
||||
|
||||
test "number literals binary" {
|
||||
try testTokenize("0b0", &.{.number_literal});
|
||||
try testTokenize("0b1", &.{.number_literal});
|
||||
try testTokenize("0b2", &.{.number_literal});
|
||||
try testTokenize("0b3", &.{.number_literal});
|
||||
try testTokenize("0b4", &.{.number_literal});
|
||||
try testTokenize("0b5", &.{.number_literal});
|
||||
try testTokenize("0b6", &.{.number_literal});
|
||||
try testTokenize("0b7", &.{.number_literal});
|
||||
try testTokenize("0b8", &.{.number_literal});
|
||||
try testTokenize("0b9", &.{.number_literal});
|
||||
try testTokenize("0ba", &.{.number_literal});
|
||||
try testTokenize("0bb", &.{.number_literal});
|
||||
try testTokenize("0bc", &.{.number_literal});
|
||||
try testTokenize("0bd", &.{.number_literal});
|
||||
try testTokenize("0be", &.{.number_literal});
|
||||
try testTokenize("0bf", &.{.number_literal});
|
||||
try testTokenize("0bz", &.{.number_literal});
|
||||
|
||||
try testTokenize("0b0000_0000", &.{.number_literal});
|
||||
try testTokenize("0b1111_1111", &.{.number_literal});
|
||||
try testTokenize("0b10_10_10_10", &.{.number_literal});
|
||||
try testTokenize("0b0_1_0_1_0_1_0_1", &.{.number_literal});
|
||||
try testTokenize("0b1.", &.{ .number_literal, .period });
|
||||
try testTokenize("0b1.0", &.{.number_literal});
|
||||
|
||||
try testTokenize("0B0", &.{.number_literal});
|
||||
try testTokenize("0b_", &.{.number_literal});
|
||||
try testTokenize("0b_0", &.{.number_literal});
|
||||
try testTokenize("0b1_", &.{.number_literal});
|
||||
try testTokenize("0b0__1", &.{.number_literal});
|
||||
try testTokenize("0b0_1_", &.{.number_literal});
|
||||
try testTokenize("0b1e", &.{.number_literal});
|
||||
try testTokenize("0b1p", &.{.number_literal});
|
||||
try testTokenize("0b1e0", &.{.number_literal});
|
||||
try testTokenize("0b1p0", &.{.number_literal});
|
||||
try testTokenize("0b1_,", &.{ .number_literal, .comma });
|
||||
}
|
||||
|
||||
test "number literals octal" {
|
||||
try testTokenize("0o0", &.{.number_literal});
|
||||
try testTokenize("0o1", &.{.number_literal});
|
||||
try testTokenize("0o2", &.{.number_literal});
|
||||
try testTokenize("0o3", &.{.number_literal});
|
||||
try testTokenize("0o4", &.{.number_literal});
|
||||
try testTokenize("0o5", &.{.number_literal});
|
||||
try testTokenize("0o6", &.{.number_literal});
|
||||
try testTokenize("0o7", &.{.number_literal});
|
||||
try testTokenize("0o8", &.{.number_literal});
|
||||
try testTokenize("0o9", &.{.number_literal});
|
||||
try testTokenize("0oa", &.{.number_literal});
|
||||
try testTokenize("0ob", &.{.number_literal});
|
||||
try testTokenize("0oc", &.{.number_literal});
|
||||
try testTokenize("0od", &.{.number_literal});
|
||||
try testTokenize("0oe", &.{.number_literal});
|
||||
try testTokenize("0of", &.{.number_literal});
|
||||
try testTokenize("0oz", &.{.number_literal});
|
||||
|
||||
try testTokenize("0o01234567", &.{.number_literal});
|
||||
try testTokenize("0o0123_4567", &.{.number_literal});
|
||||
try testTokenize("0o01_23_45_67", &.{.number_literal});
|
||||
try testTokenize("0o0_1_2_3_4_5_6_7", &.{.number_literal});
|
||||
try testTokenize("0o7.", &.{ .number_literal, .period });
|
||||
try testTokenize("0o7.0", &.{.number_literal});
|
||||
|
||||
try testTokenize("0O0", &.{.number_literal});
|
||||
try testTokenize("0o_", &.{.number_literal});
|
||||
try testTokenize("0o_0", &.{.number_literal});
|
||||
try testTokenize("0o1_", &.{.number_literal});
|
||||
try testTokenize("0o0__1", &.{.number_literal});
|
||||
try testTokenize("0o0_1_", &.{.number_literal});
|
||||
try testTokenize("0o1e", &.{.number_literal});
|
||||
try testTokenize("0o1p", &.{.number_literal});
|
||||
try testTokenize("0o1e0", &.{.number_literal});
|
||||
try testTokenize("0o1p0", &.{.number_literal});
|
||||
try testTokenize("0o_,", &.{ .number_literal, .comma });
|
||||
}
|
||||
|
||||
test "number literals hexadecimal" {
|
||||
try testTokenize("0x0", &.{.number_literal});
|
||||
try testTokenize("0x1", &.{.number_literal});
|
||||
try testTokenize("0x2", &.{.number_literal});
|
||||
try testTokenize("0x3", &.{.number_literal});
|
||||
try testTokenize("0x4", &.{.number_literal});
|
||||
try testTokenize("0x5", &.{.number_literal});
|
||||
try testTokenize("0x6", &.{.number_literal});
|
||||
try testTokenize("0x7", &.{.number_literal});
|
||||
try testTokenize("0x8", &.{.number_literal});
|
||||
try testTokenize("0x9", &.{.number_literal});
|
||||
try testTokenize("0xa", &.{.number_literal});
|
||||
try testTokenize("0xb", &.{.number_literal});
|
||||
try testTokenize("0xc", &.{.number_literal});
|
||||
try testTokenize("0xd", &.{.number_literal});
|
||||
try testTokenize("0xe", &.{.number_literal});
|
||||
try testTokenize("0xf", &.{.number_literal});
|
||||
try testTokenize("0xA", &.{.number_literal});
|
||||
try testTokenize("0xB", &.{.number_literal});
|
||||
try testTokenize("0xC", &.{.number_literal});
|
||||
try testTokenize("0xD", &.{.number_literal});
|
||||
try testTokenize("0xE", &.{.number_literal});
|
||||
try testTokenize("0xF", &.{.number_literal});
|
||||
try testTokenize("0x0z", &.{.number_literal});
|
||||
try testTokenize("0xz", &.{.number_literal});
|
||||
|
||||
try testTokenize("0x0123456789ABCDEF", &.{.number_literal});
|
||||
try testTokenize("0x0123_4567_89AB_CDEF", &.{.number_literal});
|
||||
try testTokenize("0x01_23_45_67_89AB_CDE_F", &.{.number_literal});
|
||||
try testTokenize("0x0_1_2_3_4_5_6_7_8_9_A_B_C_D_E_F", &.{.number_literal});
|
||||
|
||||
try testTokenize("0X0", &.{.number_literal});
|
||||
try testTokenize("0x_", &.{.number_literal});
|
||||
try testTokenize("0x_1", &.{.number_literal});
|
||||
try testTokenize("0x1_", &.{.number_literal});
|
||||
try testTokenize("0x0__1", &.{.number_literal});
|
||||
try testTokenize("0x0_1_", &.{.number_literal});
|
||||
try testTokenize("0x_,", &.{ .number_literal, .comma });
|
||||
|
||||
try testTokenize("0x1.0", &.{.number_literal});
|
||||
try testTokenize("0xF.0", &.{.number_literal});
|
||||
try testTokenize("0xF.F", &.{.number_literal});
|
||||
try testTokenize("0xF.Fp0", &.{.number_literal});
|
||||
try testTokenize("0xF.FP0", &.{.number_literal});
|
||||
try testTokenize("0x1p0", &.{.number_literal});
|
||||
try testTokenize("0xfp0", &.{.number_literal});
|
||||
try testTokenize("0x1.0+0xF.0", &.{ .number_literal, .plus, .number_literal });
|
||||
|
||||
try testTokenize("0x1.", &.{ .number_literal, .period });
|
||||
try testTokenize("0xF.", &.{ .number_literal, .period });
|
||||
try testTokenize("0x1.+0xF.", &.{ .number_literal, .period, .plus, .number_literal, .period });
|
||||
try testTokenize("0xff.p10", &.{.number_literal});
|
||||
|
||||
try testTokenize("0x0123456.789ABCDEF", &.{.number_literal});
|
||||
try testTokenize("0x0_123_456.789_ABC_DEF", &.{.number_literal});
|
||||
try testTokenize("0x0_1_2_3_4_5_6.7_8_9_A_B_C_D_E_F", &.{.number_literal});
|
||||
try testTokenize("0x0p0", &.{.number_literal});
|
||||
try testTokenize("0x0.0p0", &.{.number_literal});
|
||||
try testTokenize("0xff.ffp10", &.{.number_literal});
|
||||
try testTokenize("0xff.ffP10", &.{.number_literal});
|
||||
try testTokenize("0xffp10", &.{.number_literal});
|
||||
try testTokenize("0xff_ff.ff_ffp1_0_0_0", &.{.number_literal});
|
||||
try testTokenize("0xf_f_f_f.f_f_f_fp+1_000", &.{.number_literal});
|
||||
try testTokenize("0xf_f_f_f.f_f_f_fp-1_00_0", &.{.number_literal});
|
||||
|
||||
try testTokenize("0x1e", &.{.number_literal});
|
||||
try testTokenize("0x1e0", &.{.number_literal});
|
||||
try testTokenize("0x1p", &.{.number_literal});
|
||||
try testTokenize("0xfp0z1", &.{.number_literal});
|
||||
try testTokenize("0xff.ffpff", &.{.number_literal});
|
||||
try testTokenize("0x0.p", &.{.number_literal});
|
||||
try testTokenize("0x0.z", &.{.number_literal});
|
||||
try testTokenize("0x0._", &.{.number_literal});
|
||||
try testTokenize("0x0_.0", &.{.number_literal});
|
||||
try testTokenize("0x0_.0.0", &.{ .number_literal, .period, .number_literal });
|
||||
try testTokenize("0x0._0", &.{.number_literal});
|
||||
try testTokenize("0x0.0_", &.{.number_literal});
|
||||
try testTokenize("0x0_p0", &.{.number_literal});
|
||||
try testTokenize("0x0_.p0", &.{.number_literal});
|
||||
try testTokenize("0x0._p0", &.{.number_literal});
|
||||
try testTokenize("0x0.0_p0", &.{.number_literal});
|
||||
try testTokenize("0x0._0p0", &.{.number_literal});
|
||||
try testTokenize("0x0.0p_0", &.{.number_literal});
|
||||
try testTokenize("0x0.0p+_0", &.{.number_literal});
|
||||
try testTokenize("0x0.0p-_0", &.{.number_literal});
|
||||
try testTokenize("0x0.0p0_", &.{.number_literal});
|
||||
}
|
||||
|
||||
test "multi line string literal with only 1 backslash" {
|
||||
try testTokenize("x \\\n;", &.{ .identifier, .invalid, .semicolon });
|
||||
}
|
||||
|
||||
test "invalid builtin identifiers" {
|
||||
try testTokenize("@()", &.{.invalid});
|
||||
try testTokenize("@0()", &.{.invalid});
|
||||
}
|
||||
|
||||
test "invalid token with unfinished escape right before eof" {
|
||||
try testTokenize("\"\\", &.{.invalid});
|
||||
try testTokenize("'\\", &.{.invalid});
|
||||
try testTokenize("'\\u", &.{.invalid});
|
||||
}
|
||||
|
||||
test "saturating operators" {
|
||||
try testTokenize("<<", &.{.angle_bracket_angle_bracket_left});
|
||||
try testTokenize("<<|", &.{.angle_bracket_angle_bracket_left_pipe});
|
||||
try testTokenize("<<|=", &.{.angle_bracket_angle_bracket_left_pipe_equal});
|
||||
|
||||
try testTokenize("*", &.{.asterisk});
|
||||
try testTokenize("*|", &.{.asterisk_pipe});
|
||||
try testTokenize("*|=", &.{.asterisk_pipe_equal});
|
||||
|
||||
try testTokenize("+", &.{.plus});
|
||||
try testTokenize("+|", &.{.plus_pipe});
|
||||
try testTokenize("+|=", &.{.plus_pipe_equal});
|
||||
|
||||
try testTokenize("-", &.{.minus});
|
||||
try testTokenize("-|", &.{.minus_pipe});
|
||||
try testTokenize("-|=", &.{.minus_pipe_equal});
|
||||
}
|
||||
|
||||
test "null byte before eof" {
|
||||
try testTokenize("123 \x00 456", &.{ .number_literal, .invalid });
|
||||
try testTokenize("//\x00", &.{.invalid});
|
||||
try testTokenize("\\\\\x00", &.{.invalid});
|
||||
try testTokenize("\x00", &.{.invalid});
|
||||
try testTokenize("// NUL\x00\n", &.{.invalid});
|
||||
try testTokenize("///\x00\n", &.{ .doc_comment, .invalid });
|
||||
try testTokenize("/// NUL\x00\n", &.{ .doc_comment, .invalid });
|
||||
}
|
||||
|
||||
test "invalid tabs and carriage returns" {
|
||||
// "Inside Line Comments and Documentation Comments, Any TAB is rejected by
|
||||
// the grammar since it is ambiguous how it should be rendered."
|
||||
// https://github.com/ziglang/zig-spec/issues/38
|
||||
try testTokenize("//\t", &.{.invalid});
|
||||
try testTokenize("// \t", &.{.invalid});
|
||||
try testTokenize("///\t", &.{.invalid});
|
||||
try testTokenize("/// \t", &.{.invalid});
|
||||
try testTokenize("//!\t", &.{.invalid});
|
||||
try testTokenize("//! \t", &.{.invalid});
|
||||
|
||||
// "Inside Line Comments and Documentation Comments, CR directly preceding
|
||||
// NL is unambiguously part of the newline sequence. It is accepted by the
|
||||
// grammar and removed by zig fmt, leaving only NL. CR anywhere else is
|
||||
// rejected by the grammar."
|
||||
// https://github.com/ziglang/zig-spec/issues/38
|
||||
try testTokenize("//\r", &.{.invalid});
|
||||
try testTokenize("// \r", &.{.invalid});
|
||||
try testTokenize("///\r", &.{.invalid});
|
||||
try testTokenize("/// \r", &.{.invalid});
|
||||
try testTokenize("//\r ", &.{.invalid});
|
||||
try testTokenize("// \r ", &.{.invalid});
|
||||
try testTokenize("///\r ", &.{.invalid});
|
||||
try testTokenize("/// \r ", &.{.invalid});
|
||||
try testTokenize("//\r\n", &.{});
|
||||
try testTokenize("// \r\n", &.{});
|
||||
try testTokenize("///\r\n", &.{.doc_comment});
|
||||
try testTokenize("/// \r\n", &.{.doc_comment});
|
||||
try testTokenize("//!\r", &.{.invalid});
|
||||
try testTokenize("//! \r", &.{.invalid});
|
||||
try testTokenize("//!\r ", &.{.invalid});
|
||||
try testTokenize("//! \r ", &.{.invalid});
|
||||
try testTokenize("//!\r\n", &.{.container_doc_comment});
|
||||
try testTokenize("//! \r\n", &.{.container_doc_comment});
|
||||
|
||||
// The control characters TAB and CR are rejected by the grammar inside multi-line string literals,
|
||||
// except if CR is directly before NL.
|
||||
// https://github.com/ziglang/zig-spec/issues/38
|
||||
try testTokenize("\\\\\r", &.{.invalid});
|
||||
try testTokenize("\\\\\r ", &.{.invalid});
|
||||
try testTokenize("\\\\ \r", &.{.invalid});
|
||||
try testTokenize("\\\\\t", &.{.invalid});
|
||||
try testTokenize("\\\\\t ", &.{.invalid});
|
||||
try testTokenize("\\\\ \t", &.{.invalid});
|
||||
try testTokenize("\\\\\r\n", &.{.multiline_string_literal_line});
|
||||
|
||||
// "TAB used as whitespace is...accepted by the grammar. CR used as
|
||||
// whitespace, whether directly preceding NL or stray, is...accepted by the
|
||||
// grammar."
|
||||
// https://github.com/ziglang/zig-spec/issues/38
|
||||
try testTokenize("\tpub\tswitch\t", &.{ .keyword_pub, .keyword_switch });
|
||||
try testTokenize("\rpub\rswitch\r", &.{ .keyword_pub, .keyword_switch });
|
||||
}
|
||||
5
stage0/zig-interp.txt
Normal file
5
stage0/zig-interp.txt
Normal file
@@ -0,0 +1,5 @@
1. implement @panic, write a test that does it.
2. local variables.
3. control flow.
4. functions.
5. imports until one can import stdlib.
59
stage0/zig0.c
Normal file
59
stage0/zig0.c
Normal file
@@ -0,0 +1,59 @@
#include "common.h"

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

// API:
// - code = 0: program successfully terminated.
// - code = 1: panicked, panic message in msg. Caller should free msg.
// - code = 2: interpreter error, error in msg. Caller should free msg.
static int zig0Run(const char* program, char** msg) {
    (void)program;
    (void)msg;
    return 0;
}

// API: run and:
// code = 3: abnormal error, expect something in stderr.
int zig0RunFile(const char* fname, char** msg) {
    FILE* f = fopen(fname, "r");
    if (f == NULL) {
        perror("fopen");
        return 3;
    }
    fseek(f, 0, SEEK_END);
    long fsizel = ftell(f);
    if (fsizel == -1) {
        perror("ftell");
        fclose(f);
        return 3;
    }
    unsigned long fsize = (unsigned long)fsizel;
    fseek(f, 0, SEEK_SET);

    char* program = malloc(fsize + 1);
    if (program == NULL) {
        perror("malloc");
        fclose(f);
        return 3;
    }

    size_t bytes_read = fread(program, 1, fsize, f);
    if (bytes_read < fsize) {
        if (ferror(f)) {
            perror("fread");
        } else {
            fprintf(stderr, "Unexpected end of file\n");
        }
        free(program);
        fclose(f);
        return 3;
    }
    fclose(f);
    program[fsize] = 0;

    int code = zig0Run(program, msg);
    free(program);
    return code;
}
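// Illustrative usage sketch (not part of the ported zig0.c): how a caller
// might drive the return-code contract documented above. Guarded so it never
// builds by default; ZIG0_USAGE_EXAMPLE and the entry point are hypothetical.
#ifdef ZIG0_USAGE_EXAMPLE
int main(int argc, char** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: zig0 <file.zig>\n");
        return 3;
    }
    char* msg = NULL;
    int code = zig0RunFile(argv[1], &msg);
    if (code == 1 || code == 2) {
        // Codes 1 and 2 hand ownership of msg to the caller.
        fprintf(stderr, "%s\n", msg != NULL ? msg : "(no message)");
        free(msg);
    }
    return code;
}
#endif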
19
stage0/zir.c
Normal file
19
stage0/zir.c
Normal file
@@ -0,0 +1,19 @@
#include "zir.h"
#include <stdlib.h>

void zirDeinit(Zir* zir) {
    free(zir->inst_tags);
    free(zir->inst_datas);
    free(zir->extra);
    free(zir->string_bytes);
    zir->inst_tags = NULL;
    zir->inst_datas = NULL;
    zir->extra = NULL;
    zir->string_bytes = NULL;
    zir->inst_len = 0;
    zir->inst_cap = 0;
    zir->extra_len = 0;
    zir->extra_cap = 0;
    zir->string_bytes_len = 0;
    zir->string_bytes_cap = 0;
}
544
stage0/zir.h
Normal file
544
stage0/zir.h
Normal file
@@ -0,0 +1,544 @@
// zir.h — ZIR data structures, ported from lib/std/zig/Zir.zig.
#ifndef _ZIG0_ZIR_H__
#define _ZIG0_ZIR_H__

#include "common.h"
#include <stdbool.h>
#include <stdint.h>

// --- ZIR instruction tags (uint8_t) ---
// Matches Zir.Inst.Tag enum order from Zir.zig.

#define ZIR_INST_FOREACH_TAG(TAG) \
    TAG(ZIR_INST_ADD) \
    TAG(ZIR_INST_ADDWRAP) \
    TAG(ZIR_INST_ADD_SAT) \
    TAG(ZIR_INST_ADD_UNSAFE) \
    TAG(ZIR_INST_SUB) \
    TAG(ZIR_INST_SUBWRAP) \
    TAG(ZIR_INST_SUB_SAT) \
    TAG(ZIR_INST_MUL) \
    TAG(ZIR_INST_MULWRAP) \
    TAG(ZIR_INST_MUL_SAT) \
    TAG(ZIR_INST_DIV_EXACT) \
    TAG(ZIR_INST_DIV_FLOOR) \
    TAG(ZIR_INST_DIV_TRUNC) \
    TAG(ZIR_INST_MOD) \
    TAG(ZIR_INST_REM) \
    TAG(ZIR_INST_MOD_REM) \
    TAG(ZIR_INST_SHL) \
    TAG(ZIR_INST_SHL_EXACT) \
    TAG(ZIR_INST_SHL_SAT) \
    TAG(ZIR_INST_SHR) \
    TAG(ZIR_INST_SHR_EXACT) \
    TAG(ZIR_INST_PARAM) \
    TAG(ZIR_INST_PARAM_COMPTIME) \
    TAG(ZIR_INST_PARAM_ANYTYPE) \
    TAG(ZIR_INST_PARAM_ANYTYPE_COMPTIME) \
    TAG(ZIR_INST_ARRAY_CAT) \
    TAG(ZIR_INST_ARRAY_MUL) \
    TAG(ZIR_INST_ARRAY_TYPE) \
    TAG(ZIR_INST_ARRAY_TYPE_SENTINEL) \
    TAG(ZIR_INST_VECTOR_TYPE) \
    TAG(ZIR_INST_ELEM_TYPE) \
    TAG(ZIR_INST_INDEXABLE_PTR_ELEM_TYPE) \
    TAG(ZIR_INST_SPLAT_OP_RESULT_TY) \
    TAG(ZIR_INST_INDEXABLE_PTR_LEN) \
    TAG(ZIR_INST_ANYFRAME_TYPE) \
    TAG(ZIR_INST_AS_NODE) \
    TAG(ZIR_INST_AS_SHIFT_OPERAND) \
    TAG(ZIR_INST_BIT_AND) \
    TAG(ZIR_INST_BITCAST) \
    TAG(ZIR_INST_BIT_NOT) \
    TAG(ZIR_INST_BIT_OR) \
    TAG(ZIR_INST_BLOCK) \
    TAG(ZIR_INST_BLOCK_COMPTIME) \
    TAG(ZIR_INST_BLOCK_INLINE) \
    TAG(ZIR_INST_DECLARATION) \
    TAG(ZIR_INST_SUSPEND_BLOCK) \
    TAG(ZIR_INST_BOOL_NOT) \
    TAG(ZIR_INST_BOOL_BR_AND) \
    TAG(ZIR_INST_BOOL_BR_OR) \
    TAG(ZIR_INST_BREAK) \
    TAG(ZIR_INST_BREAK_INLINE) \
    TAG(ZIR_INST_SWITCH_CONTINUE) \
    TAG(ZIR_INST_CHECK_COMPTIME_CONTROL_FLOW) \
    TAG(ZIR_INST_CALL) \
    TAG(ZIR_INST_FIELD_CALL) \
    TAG(ZIR_INST_BUILTIN_CALL) \
    TAG(ZIR_INST_CMP_LT) \
    TAG(ZIR_INST_CMP_LTE) \
    TAG(ZIR_INST_CMP_EQ) \
    TAG(ZIR_INST_CMP_GTE) \
    TAG(ZIR_INST_CMP_GT) \
    TAG(ZIR_INST_CMP_NEQ) \
    TAG(ZIR_INST_CONDBR) \
    TAG(ZIR_INST_CONDBR_INLINE) \
    TAG(ZIR_INST_TRY) \
    TAG(ZIR_INST_TRY_PTR) \
    TAG(ZIR_INST_ERROR_SET_DECL) \
    TAG(ZIR_INST_DBG_STMT) \
    TAG(ZIR_INST_DBG_VAR_PTR) \
    TAG(ZIR_INST_DBG_VAR_VAL) \
    TAG(ZIR_INST_DECL_REF) \
    TAG(ZIR_INST_DECL_VAL) \
    TAG(ZIR_INST_LOAD) \
    TAG(ZIR_INST_DIV) \
    TAG(ZIR_INST_ELEM_PTR_NODE) \
    TAG(ZIR_INST_ELEM_PTR) \
    TAG(ZIR_INST_ELEM_VAL_NODE) \
    TAG(ZIR_INST_ELEM_VAL) \
    TAG(ZIR_INST_ELEM_VAL_IMM) \
    TAG(ZIR_INST_ENSURE_RESULT_USED) \
    TAG(ZIR_INST_ENSURE_RESULT_NON_ERROR) \
    TAG(ZIR_INST_ENSURE_ERR_UNION_PAYLOAD_VOID) \
    TAG(ZIR_INST_ERROR_UNION_TYPE) \
    TAG(ZIR_INST_ERROR_VALUE) \
    TAG(ZIR_INST_EXPORT) \
    TAG(ZIR_INST_FIELD_PTR) \
    TAG(ZIR_INST_FIELD_VAL) \
    TAG(ZIR_INST_FIELD_PTR_NAMED) \
    TAG(ZIR_INST_FIELD_VAL_NAMED) \
    TAG(ZIR_INST_FUNC) \
    TAG(ZIR_INST_FUNC_INFERRED) \
    TAG(ZIR_INST_FUNC_FANCY) \
    TAG(ZIR_INST_IMPORT) \
    TAG(ZIR_INST_INT) \
    TAG(ZIR_INST_INT_BIG) \
    TAG(ZIR_INST_FLOAT) \
    TAG(ZIR_INST_FLOAT128) \
    TAG(ZIR_INST_INT_TYPE) \
    TAG(ZIR_INST_IS_NON_NULL) \
    TAG(ZIR_INST_IS_NON_NULL_PTR) \
    TAG(ZIR_INST_IS_NON_ERR) \
    TAG(ZIR_INST_IS_NON_ERR_PTR) \
    TAG(ZIR_INST_RET_IS_NON_ERR) \
    TAG(ZIR_INST_LOOP) \
    TAG(ZIR_INST_REPEAT) \
    TAG(ZIR_INST_REPEAT_INLINE) \
    TAG(ZIR_INST_FOR_LEN) \
    TAG(ZIR_INST_MERGE_ERROR_SETS) \
    TAG(ZIR_INST_REF) \
    TAG(ZIR_INST_RET_NODE) \
    TAG(ZIR_INST_RET_LOAD) \
    TAG(ZIR_INST_RET_IMPLICIT) \
    TAG(ZIR_INST_RET_ERR_VALUE) \
    TAG(ZIR_INST_RET_ERR_VALUE_CODE) \
    TAG(ZIR_INST_RET_PTR) \
    TAG(ZIR_INST_RET_TYPE) \
    TAG(ZIR_INST_PTR_TYPE) \
    TAG(ZIR_INST_SLICE_START) \
    TAG(ZIR_INST_SLICE_END) \
    TAG(ZIR_INST_SLICE_SENTINEL) \
    TAG(ZIR_INST_SLICE_LENGTH) \
    TAG(ZIR_INST_SLICE_SENTINEL_TY) \
    TAG(ZIR_INST_STORE_NODE) \
    TAG(ZIR_INST_STORE_TO_INFERRED_PTR) \
    TAG(ZIR_INST_STR) \
    TAG(ZIR_INST_NEGATE) \
    TAG(ZIR_INST_NEGATE_WRAP) \
    TAG(ZIR_INST_TYPEOF) \
    TAG(ZIR_INST_TYPEOF_BUILTIN) \
    TAG(ZIR_INST_TYPEOF_LOG2_INT_TYPE) \
    TAG(ZIR_INST_UNREACHABLE) \
    TAG(ZIR_INST_XOR) \
    TAG(ZIR_INST_OPTIONAL_TYPE) \
    TAG(ZIR_INST_OPTIONAL_PAYLOAD_SAFE) \
    TAG(ZIR_INST_OPTIONAL_PAYLOAD_UNSAFE) \
    TAG(ZIR_INST_OPTIONAL_PAYLOAD_SAFE_PTR) \
    TAG(ZIR_INST_OPTIONAL_PAYLOAD_UNSAFE_PTR) \
    TAG(ZIR_INST_ERR_UNION_PAYLOAD_UNSAFE) \
    TAG(ZIR_INST_ERR_UNION_PAYLOAD_UNSAFE_PTR) \
    TAG(ZIR_INST_ERR_UNION_CODE) \
    TAG(ZIR_INST_ERR_UNION_CODE_PTR) \
    TAG(ZIR_INST_ENUM_LITERAL) \
    TAG(ZIR_INST_DECL_LITERAL) \
    TAG(ZIR_INST_DECL_LITERAL_NO_COERCE) \
    TAG(ZIR_INST_SWITCH_BLOCK) \
    TAG(ZIR_INST_SWITCH_BLOCK_REF) \
    TAG(ZIR_INST_SWITCH_BLOCK_ERR_UNION) \
    TAG(ZIR_INST_VALIDATE_DEREF) \
    TAG(ZIR_INST_VALIDATE_DESTRUCTURE) \
    TAG(ZIR_INST_FIELD_TYPE_REF) \
    TAG(ZIR_INST_OPT_EU_BASE_PTR_INIT) \
    TAG(ZIR_INST_COERCE_PTR_ELEM_TY) \
    TAG(ZIR_INST_VALIDATE_REF_TY) \
    TAG(ZIR_INST_VALIDATE_CONST) \
    TAG(ZIR_INST_STRUCT_INIT_EMPTY) \
    TAG(ZIR_INST_STRUCT_INIT_EMPTY_RESULT) \
    TAG(ZIR_INST_STRUCT_INIT_EMPTY_REF_RESULT) \
    TAG(ZIR_INST_STRUCT_INIT_ANON) \
    TAG(ZIR_INST_STRUCT_INIT) \
    TAG(ZIR_INST_STRUCT_INIT_REF) \
    TAG(ZIR_INST_VALIDATE_STRUCT_INIT_TY) \
    TAG(ZIR_INST_VALIDATE_STRUCT_INIT_RESULT_TY) \
    TAG(ZIR_INST_VALIDATE_PTR_STRUCT_INIT) \
    TAG(ZIR_INST_STRUCT_INIT_FIELD_TYPE) \
    TAG(ZIR_INST_STRUCT_INIT_FIELD_PTR) \
    TAG(ZIR_INST_ARRAY_INIT_ANON) \
    TAG(ZIR_INST_ARRAY_INIT) \
    TAG(ZIR_INST_ARRAY_INIT_REF) \
    TAG(ZIR_INST_VALIDATE_ARRAY_INIT_TY) \
    TAG(ZIR_INST_VALIDATE_ARRAY_INIT_RESULT_TY) \
    TAG(ZIR_INST_VALIDATE_ARRAY_INIT_REF_TY) \
    TAG(ZIR_INST_VALIDATE_PTR_ARRAY_INIT) \
    TAG(ZIR_INST_ARRAY_INIT_ELEM_TYPE) \
    TAG(ZIR_INST_ARRAY_INIT_ELEM_PTR) \
    TAG(ZIR_INST_UNION_INIT) \
    TAG(ZIR_INST_TYPE_INFO) \
    TAG(ZIR_INST_SIZE_OF) \
    TAG(ZIR_INST_BIT_SIZE_OF) \
    TAG(ZIR_INST_INT_FROM_PTR) \
    TAG(ZIR_INST_COMPILE_ERROR) \
    TAG(ZIR_INST_SET_EVAL_BRANCH_QUOTA) \
    TAG(ZIR_INST_INT_FROM_ENUM) \
    TAG(ZIR_INST_ALIGN_OF) \
    TAG(ZIR_INST_INT_FROM_BOOL) \
    TAG(ZIR_INST_EMBED_FILE) \
    TAG(ZIR_INST_ERROR_NAME) \
    TAG(ZIR_INST_PANIC) \
    TAG(ZIR_INST_TRAP) \
    TAG(ZIR_INST_SET_RUNTIME_SAFETY) \
    TAG(ZIR_INST_SQRT) \
    TAG(ZIR_INST_SIN) \
    TAG(ZIR_INST_COS) \
    TAG(ZIR_INST_TAN) \
    TAG(ZIR_INST_EXP) \
    TAG(ZIR_INST_EXP2) \
    TAG(ZIR_INST_LOG) \
    TAG(ZIR_INST_LOG2) \
    TAG(ZIR_INST_LOG10) \
    TAG(ZIR_INST_ABS) \
    TAG(ZIR_INST_FLOOR) \
    TAG(ZIR_INST_CEIL) \
    TAG(ZIR_INST_TRUNC) \
    TAG(ZIR_INST_ROUND) \
    TAG(ZIR_INST_TAG_NAME) \
    TAG(ZIR_INST_TYPE_NAME) \
    TAG(ZIR_INST_FRAME_TYPE) \
    TAG(ZIR_INST_INT_FROM_FLOAT) \
    TAG(ZIR_INST_FLOAT_FROM_INT) \
    TAG(ZIR_INST_PTR_FROM_INT) \
    TAG(ZIR_INST_ENUM_FROM_INT) \
    TAG(ZIR_INST_FLOAT_CAST) \
    TAG(ZIR_INST_INT_CAST) \
    TAG(ZIR_INST_PTR_CAST) \
    TAG(ZIR_INST_TRUNCATE) \
    TAG(ZIR_INST_HAS_DECL) \
    TAG(ZIR_INST_HAS_FIELD) \
    TAG(ZIR_INST_CLZ) \
    TAG(ZIR_INST_CTZ) \
    TAG(ZIR_INST_POP_COUNT) \
    TAG(ZIR_INST_BYTE_SWAP) \
    TAG(ZIR_INST_BIT_REVERSE) \
    TAG(ZIR_INST_BIT_OFFSET_OF) \
    TAG(ZIR_INST_OFFSET_OF) \
    TAG(ZIR_INST_SPLAT) \
    TAG(ZIR_INST_REDUCE) \
    TAG(ZIR_INST_SHUFFLE) \
    TAG(ZIR_INST_ATOMIC_LOAD) \
    TAG(ZIR_INST_ATOMIC_RMW) \
    TAG(ZIR_INST_ATOMIC_STORE) \
    TAG(ZIR_INST_MUL_ADD) \
    TAG(ZIR_INST_MEMCPY) \
    TAG(ZIR_INST_MEMMOVE) \
    TAG(ZIR_INST_MEMSET) \
    TAG(ZIR_INST_MIN) \
    TAG(ZIR_INST_MAX) \
    TAG(ZIR_INST_C_IMPORT) \
    TAG(ZIR_INST_ALLOC) \
    TAG(ZIR_INST_ALLOC_MUT) \
    TAG(ZIR_INST_ALLOC_COMPTIME_MUT) \
    TAG(ZIR_INST_ALLOC_INFERRED) \
    TAG(ZIR_INST_ALLOC_INFERRED_MUT) \
    TAG(ZIR_INST_ALLOC_INFERRED_COMPTIME) \
    TAG(ZIR_INST_ALLOC_INFERRED_COMPTIME_MUT) \
    TAG(ZIR_INST_RESOLVE_INFERRED_ALLOC) \
    TAG(ZIR_INST_MAKE_PTR_CONST) \
    TAG(ZIR_INST_RESUME) \
    TAG(ZIR_INST_DEFER) \
    TAG(ZIR_INST_DEFER_ERR_CODE) \
    TAG(ZIR_INST_SAVE_ERR_RET_INDEX) \
    TAG(ZIR_INST_RESTORE_ERR_RET_INDEX_UNCONDITIONAL) \
    TAG(ZIR_INST_RESTORE_ERR_RET_INDEX_FN_ENTRY) \
    TAG(ZIR_INST_EXTENDED)

#define ZIR_GENERATE_ENUM(e) e,
typedef enum { ZIR_INST_FOREACH_TAG(ZIR_GENERATE_ENUM) } ZirInstTag;
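// The FOREACH/GENERATE pair above is a plain X-macro: the tag list expands
// once into the ZirInstTag enum, and the same list can be expanded again for
// other tables. Illustrative sketch only (this helper is not part of zir.h):
// a name table for debug printing, kept in the same order as the enum.
//
//   #define ZIR_GENERATE_NAME(e) #e,
//   static const char* const zir_inst_tag_names[] = {
//       ZIR_INST_FOREACH_TAG(ZIR_GENERATE_NAME)
//   };
//   // e.g. zir_inst_tag_names[ZIR_INST_BREAK] == "ZIR_INST_BREAK"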

// --- ZIR extended opcodes (uint16_t) ---
// Matches Zir.Inst.Extended enum order from Zir.zig.

#define ZIR_EXT_FOREACH_TAG(TAG) \
    TAG(ZIR_EXT_STRUCT_DECL) \
    TAG(ZIR_EXT_ENUM_DECL) \
    TAG(ZIR_EXT_UNION_DECL) \
    TAG(ZIR_EXT_OPAQUE_DECL) \
    TAG(ZIR_EXT_TUPLE_DECL) \
    TAG(ZIR_EXT_THIS) \
    TAG(ZIR_EXT_RET_ADDR) \
    TAG(ZIR_EXT_BUILTIN_SRC) \
    TAG(ZIR_EXT_ERROR_RETURN_TRACE) \
    TAG(ZIR_EXT_FRAME) \
    TAG(ZIR_EXT_FRAME_ADDRESS) \
    TAG(ZIR_EXT_ALLOC) \
    TAG(ZIR_EXT_BUILTIN_EXTERN) \
    TAG(ZIR_EXT_ASM) \
    TAG(ZIR_EXT_ASM_EXPR) \
    TAG(ZIR_EXT_COMPILE_LOG) \
    TAG(ZIR_EXT_TYPEOF_PEER) \
    TAG(ZIR_EXT_MIN_MULTI) \
    TAG(ZIR_EXT_MAX_MULTI) \
    TAG(ZIR_EXT_ADD_WITH_OVERFLOW) \
    TAG(ZIR_EXT_SUB_WITH_OVERFLOW) \
    TAG(ZIR_EXT_MUL_WITH_OVERFLOW) \
    TAG(ZIR_EXT_SHL_WITH_OVERFLOW) \
    TAG(ZIR_EXT_C_UNDEF) \
    TAG(ZIR_EXT_C_INCLUDE) \
    TAG(ZIR_EXT_C_DEFINE) \
    TAG(ZIR_EXT_WASM_MEMORY_SIZE) \
    TAG(ZIR_EXT_WASM_MEMORY_GROW) \
    TAG(ZIR_EXT_PREFETCH) \
    TAG(ZIR_EXT_SET_FLOAT_MODE) \
    TAG(ZIR_EXT_ERROR_CAST) \
    TAG(ZIR_EXT_BREAKPOINT) \
    TAG(ZIR_EXT_DISABLE_INSTRUMENTATION) \
    TAG(ZIR_EXT_DISABLE_INTRINSICS) \
    TAG(ZIR_EXT_SELECT) \
    TAG(ZIR_EXT_INT_FROM_ERROR) \
    TAG(ZIR_EXT_ERROR_FROM_INT) \
    TAG(ZIR_EXT_REIFY) \
    TAG(ZIR_EXT_CMPXCHG) \
    TAG(ZIR_EXT_C_VA_ARG) \
    TAG(ZIR_EXT_C_VA_COPY) \
    TAG(ZIR_EXT_C_VA_END) \
    TAG(ZIR_EXT_C_VA_START) \
    TAG(ZIR_EXT_PTR_CAST_FULL) \
    TAG(ZIR_EXT_PTR_CAST_NO_DEST) \
    TAG(ZIR_EXT_WORK_ITEM_ID) \
    TAG(ZIR_EXT_WORK_GROUP_SIZE) \
    TAG(ZIR_EXT_WORK_GROUP_ID) \
    TAG(ZIR_EXT_IN_COMPTIME) \
    TAG(ZIR_EXT_RESTORE_ERR_RET_INDEX) \
    TAG(ZIR_EXT_CLOSURE_GET) \
    TAG(ZIR_EXT_VALUE_PLACEHOLDER) \
    TAG(ZIR_EXT_FIELD_PARENT_PTR) \
    TAG(ZIR_EXT_BUILTIN_VALUE) \
    TAG(ZIR_EXT_BRANCH_HINT) \
    TAG(ZIR_EXT_INPLACE_ARITH_RESULT_TY) \
    TAG(ZIR_EXT_DBG_EMPTY_STMT) \
    TAG(ZIR_EXT_ASTGEN_ERROR)

#define ZIR_EXT_GENERATE_ENUM(e) e,
typedef enum { ZIR_EXT_FOREACH_TAG(ZIR_EXT_GENERATE_ENUM) } ZirInstExtended;

// --- ZIR instruction data (8-byte union) ---
// Matches Zir.Inst.Data union from Zir.zig.

typedef uint32_t ZirInstIndex;
typedef uint32_t ZirInstRef;

typedef union {
    struct {
        uint16_t opcode;
        uint16_t small;
        uint32_t operand;
    } extended;
    struct {
        int32_t src_node;
        ZirInstRef operand;
    } un_node;
    struct {
        int32_t src_tok;
        ZirInstRef operand;
    } un_tok;
    struct {
        int32_t src_node;
        uint32_t payload_index;
    } pl_node;
    struct {
        int32_t src_tok;
        uint32_t payload_index;
    } pl_tok;
    struct {
        ZirInstRef lhs;
        ZirInstRef rhs;
    } bin;
    struct {
        uint32_t start;
        uint32_t len;
    } str;
    struct {
        uint32_t start;
        int32_t src_tok;
    } str_tok;
    int32_t tok;
    int32_t node;
    uint64_t int_val;
    double float_val;
    struct {
        uint8_t flags;
        uint8_t size;
        uint16_t _pad;
        uint32_t payload_index;
    } ptr_type;
    struct {
        int32_t src_node;
        uint16_t bit_count;
        uint8_t signedness;
        uint8_t _pad;
    } int_type;
    struct {
        int32_t src_node;
        uint32_t _pad;
    } unreachable_data;
    struct {
        ZirInstRef operand;
        uint32_t payload_index;
    } break_data;
    struct {
        uint32_t line;
        uint32_t column;
    } dbg_stmt;
    struct {
        int32_t src_node;
        ZirInstIndex inst;
    } inst_node;
    struct {
        uint32_t str;
        ZirInstRef operand;
    } str_op;
    struct {
        uint32_t index;
        uint32_t len;
    } defer_data;
    struct {
        ZirInstRef err_code;
        uint32_t payload_index;
    } defer_err_code;
    struct {
        ZirInstRef operand;
        uint32_t _pad;
    } save_err_ret_index;
    struct {
        ZirInstRef operand;
        uint32_t idx;
    } elem_val_imm;
    struct {
        uint32_t src_node;
        uint32_t payload_index;
    } declaration;
} ZirInstData;
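// Every active member of ZirInstData above is two 32-bit words, mirroring the
// 8-byte Zir.Inst.Data union upstream. A compile-time size check (sketch only,
// not part of the ported header; needs C11) would catch accidental padding:
//
//   _Static_assert(sizeof(ZirInstData) == 8, "ZirInstData must stay 8 bytes");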

// --- ZIR built-in refs ---
// Matches Zir.Inst.Ref enum from Zir.zig.
// Values below REF_START_INDEX are InternPool indices.

#define ZIR_REF_START_INDEX 124
#define ZIR_REF_NONE UINT32_MAX
#define ZIR_MAIN_STRUCT_INST 0
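// A ref at or above ZIR_REF_START_INDEX names a ZIR instruction; smaller
// values are the fixed constants defined below. Illustrative helpers (not
// part of the ported header), mirroring Zir.Inst.Ref.toRef/toIndex upstream:
//
//   static inline ZirInstRef zirIndexToRef(ZirInstIndex i) {
//       return ZIR_REF_START_INDEX + i;
//   }
//   static inline bool zirRefIsIndex(ZirInstRef ref) {
//       return ref != ZIR_REF_NONE && ref >= ZIR_REF_START_INDEX;
//   }
//   static inline ZirInstIndex zirRefToIndex(ZirInstRef ref) {
//       return ref - ZIR_REF_START_INDEX; // valid only when zirRefIsIndex(ref)
//   }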

// Zir.Inst.Ref enum values (matching Zig enum order in Zir.zig).
// Types (0-103).
#define ZIR_REF_U1_TYPE 2
#define ZIR_REF_U8_TYPE 3
#define ZIR_REF_I8_TYPE 4
#define ZIR_REF_U16_TYPE 5
#define ZIR_REF_I16_TYPE 6
#define ZIR_REF_U29_TYPE 7
#define ZIR_REF_U32_TYPE 8
#define ZIR_REF_I32_TYPE 9
#define ZIR_REF_U64_TYPE 10
#define ZIR_REF_I64_TYPE 11
#define ZIR_REF_U128_TYPE 13
#define ZIR_REF_I128_TYPE 14
#define ZIR_REF_USIZE_TYPE 16
#define ZIR_REF_ISIZE_TYPE 17
#define ZIR_REF_C_CHAR_TYPE 18
#define ZIR_REF_C_SHORT_TYPE 19
#define ZIR_REF_C_USHORT_TYPE 20
#define ZIR_REF_C_INT_TYPE 21
#define ZIR_REF_C_UINT_TYPE 22
#define ZIR_REF_C_LONG_TYPE 23
#define ZIR_REF_C_ULONG_TYPE 24
#define ZIR_REF_C_LONGLONG_TYPE 25
#define ZIR_REF_C_ULONGLONG_TYPE 26
#define ZIR_REF_C_LONGDOUBLE_TYPE 27
#define ZIR_REF_F16_TYPE 28
#define ZIR_REF_F32_TYPE 29
#define ZIR_REF_F64_TYPE 30
#define ZIR_REF_F80_TYPE 31
#define ZIR_REF_F128_TYPE 32
#define ZIR_REF_ANYOPAQUE_TYPE 33
#define ZIR_REF_BOOL_TYPE 34
#define ZIR_REF_VOID_TYPE 35
#define ZIR_REF_TYPE_TYPE 36
#define ZIR_REF_ANYERROR_TYPE 37
#define ZIR_REF_COMPTIME_INT_TYPE 38
#define ZIR_REF_COMPTIME_FLOAT_TYPE 39
#define ZIR_REF_NORETURN_TYPE 40
#define ZIR_REF_ANYFRAME_TYPE 41
#define ZIR_REF_NULL_TYPE 42
#define ZIR_REF_UNDEFINED_TYPE 43
#define ZIR_REF_ENUM_LITERAL_TYPE 44
#define ZIR_REF_PTR_USIZE_TYPE 45
#define ZIR_REF_PTR_CONST_COMPTIME_INT_TYPE 46
#define ZIR_REF_MANYPTR_U8_TYPE 47
#define ZIR_REF_MANYPTR_CONST_U8_TYPE 48
#define ZIR_REF_MANYPTR_CONST_U8_SENTINEL_0_TYPE 49
#define ZIR_REF_SLICE_CONST_U8_TYPE 50
#define ZIR_REF_SLICE_CONST_U8_SENTINEL_0_TYPE 51
#define ZIR_REF_ANYERROR_VOID_ERROR_UNION_TYPE 100
#define ZIR_REF_GENERIC_POISON_TYPE 102
#define ZIR_REF_EMPTY_TUPLE_TYPE 103
// Values (104-123).
#define ZIR_REF_UNDEF 104
#define ZIR_REF_UNDEF_BOOL 105
#define ZIR_REF_UNDEF_USIZE 106
#define ZIR_REF_UNDEF_U1 107
#define ZIR_REF_ZERO 108
#define ZIR_REF_ZERO_USIZE 109
#define ZIR_REF_ZERO_U1 110
#define ZIR_REF_ZERO_U8 111
#define ZIR_REF_ONE 112
#define ZIR_REF_ONE_USIZE 113
#define ZIR_REF_ONE_U1 114
#define ZIR_REF_ONE_U8 115
#define ZIR_REF_FOUR_U8 116
#define ZIR_REF_NEGATIVE_ONE 117
#define ZIR_REF_VOID_VALUE 118
#define ZIR_REF_UNREACHABLE_VALUE 119
#define ZIR_REF_NULL_VALUE 120
#define ZIR_REF_BOOL_TRUE 121
#define ZIR_REF_BOOL_FALSE 122
#define ZIR_REF_EMPTY_TUPLE 123

// Ast.Node.OptionalOffset.none = maxInt(i32).
#define AST_NODE_OFFSET_NONE ((int32_t)0x7FFFFFFF)

// --- Extra indices reserved at the start of extra[] ---
// Matches Zir.ExtraIndex enum from Zir.zig.

#define ZIR_EXTRA_COMPILE_ERRORS 0
#define ZIR_EXTRA_IMPORTS 1
#define ZIR_EXTRA_RESERVED_COUNT 2
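// The first ZIR_EXTRA_RESERVED_COUNT words of extra[] are claimed up front and
// later patched to point at the compile-errors and imports payloads (0 meaning
// none), matching Zir.ExtraIndex upstream. Illustrative sketch only, assuming a
// hypothetical growth helper that is not an actual astgen.c API:
//
//   void zirReserveExtraHeader(Zir* zir) {
//       ensureExtraCapacity(zir, ZIR_EXTRA_RESERVED_COUNT); // stand-in name
//       zir->extra[ZIR_EXTRA_COMPILE_ERRORS] = 0;
//       zir->extra[ZIR_EXTRA_IMPORTS] = 0;
//       zir->extra_len = ZIR_EXTRA_RESERVED_COUNT;
//   }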

// --- Zir output structure ---

typedef struct {
    uint32_t inst_len;
    uint32_t inst_cap;
    ZirInstTag* inst_tags;
    ZirInstData* inst_datas;
    uint32_t extra_len;
    uint32_t extra_cap;
    uint32_t* extra;
    uint32_t string_bytes_len;
    uint32_t string_bytes_cap;
    uint8_t* string_bytes;
    bool has_compile_errors;
} Zir;

void zirDeinit(Zir* zir);

#endif