Add 'stage0/' from commit 'b3d106ec971300a9c745f4681fab3df7518c4346'

git-subtree-dir: stage0
git-subtree-mainline: 3db960767d
git-subtree-split: b3d106ec97
2026-02-13 23:32:08 +02:00
26 changed files with 26186 additions and 0 deletions

3
stage0/.clang-format Normal file

@@ -0,0 +1,3 @@
BasedOnStyle: WebKit
BreakBeforeBraces: Attach
ColumnLimit: 79

122
stage0/.claude/skills/port-astgen/SKILL.md Normal file

@@ -0,0 +1,122 @@
---
name: port-astgen
description: Iteratively port AstGen.zig to astgen.c by enabling skipped corpus tests, finding divergences, and mechanically copying upstream code.
allowed-tools: Read, Write, Edit, Bash, Grep, Glob, Task
disable-model-invocation: true
---
# Port AstGen — Iterative Corpus Test Loop
You are porting `AstGen.zig` to `astgen.c`. This is a **mechanical
translation** — no creativity, no invention. When the C code differs
from Zig, copy the Zig structure into C.
## Key files
- `astgen.c` — C implementation (modify this)
- `astgen_test.zig` — corpus tests (enable/skip tests here)
- `~/code/zig/lib/std/zig/AstGen.zig` — upstream reference (~14k lines)
- `~/code/zig/lib/std/zig/Ast.zig` — AST node accessors
- `~/code/zig/lib/std/zig/Zir.zig` — ZIR instruction definitions
## Loop
Repeat the following steps until all corpus tests pass or you've made
3 consecutive iterations with zero progress.
### Step 1: Find the first skipped corpus test
Search `astgen_test.zig` for lines matching:
```
if (true) return error.SkipZigTest
```
Pick the first one. If none found, all corpus tests pass — stop.
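As a sketch, this search can be a one-line grep (the filename is taken from the key-files list above):
```sh
# Print the first skipped corpus test with its line number;
# empty output means no skips remain.
grep -n 'if (true) return error.SkipZigTest' astgen_test.zig | head -n 1
```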
### Step 2: Enable it
Remove or comment out the `if (true) return error.SkipZigTest` line.
### Step 3: Run tests
```sh
zig build test 2>&1
```
Record the output. If tests pass, go to Step 7.
### Step 4: Analyze the failure
From the test output, determine the failure type:
- **`has_compile_errors`**: Temporarily add `#include <stdio.h>` and
`fprintf(stderr, ...)` to `setCompileError()` in `astgen.c` to find
which `SET_ERROR` fires. Run the test again and note the function and
line.
- **`zir mismatch`**: Note `inst_len`, `extra_len`, `string_bytes_len`
diffs and the first tag mismatch position.
- **`unhandled tag N`**: Add the missing ZIR tag to the `expectEqualData`
and `dataMatches` switch statements in `astgen_test.zig`.
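For the `has_compile_errors` case, the temporary instrumentation can be sketched like this (all names here are illustrative, not astgen.c's actual API):
```c
#include <stdio.h>

/* Sketch of the temporary-debug technique above: a SET_ERROR-style
 * macro that records which function and line fired. In practice the
 * output would go to stderr via fprintf and be removed before commit. */
static char last_error[256];

#define SET_ERROR(msg) \
    snprintf(last_error, sizeof last_error, "%s:%d: %s", \
        __func__, __LINE__, (msg))

static void parseExample(void) {
    SET_ERROR("unexpected token"); /* note the function name in the output */
}
```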
### Step 5: Compare implementations
Find the upstream Zig function that corresponds to the failing code
path. Use the Task tool with `subagent_type=general-purpose` to read
both implementations and enumerate **every difference**.
Focus on differences that affect output:
- Extra data written (field order, conditional fields, body lengths)
- Instruction tags emitted
- String table entries
- Break payload values (operand_src_node)
Do NOT guess. Read both implementations completely and compare
mechanically.
### Step 6: Port the fix
Apply the minimal mechanical change to `astgen.c` to match the upstream.
Run `zig build test` after each change to check for progress.
**Progress** means any of:
- `inst_len` diff decreased
- `extra_len` diff decreased
- `string_bytes_len` diff decreased
- First tag mismatch position moved later
If after porting a fix the test still fails but progress was made,
continue to Step 7 (commit progress, re-skip).
### Step 7: Clean up and commit
1. If the corpus test still fails: re-add the `SkipZigTest` line with
a TODO comment describing the remaining diff.
2. Remove ALL `fprintf`/`printf` debug statements from `astgen.c`.
3. Remove `#include <stdio.h>` if it was added for debugging.
4. Verify: `zig build fmt && zig build all` must exit 0 with no unexpected output.
5. Commit:
```sh
git add astgen.c astgen_test.zig
git commit -m "<descriptive message>
Co-Authored-By: <whatever model is running this>"
```
### Step 8: Repeat
Go back to Step 1.
## Rules
- **Mechanical copy only.** Do not invent new approaches. If the upstream does
X, do X in C.
- **Never remove zig-cache.**
- **Never print to stdout/stderr in committed code.** Debug prints are
temporary only.
- **Functions must appear in the same order as in the upstream Zig file.**
- **Commit after every iteration**, even partial positive progress.
- **Prefer finding systematic differences** over conventional bug hunting.
For porting purposes, the Zig code is bug-free: when a test fails, it means
the C implementation diverges from the Zig one, and that divergence *is* the
bug. Standard "bug hunting" methods therefore no longer apply -- making the
two implementations consistent is a better approach in every way.

3
stage0/.gitignore vendored Normal file

@@ -0,0 +1,3 @@
/.zig-cache/
/zig-out/
*.o

25
stage0/CLAUDE.md Normal file

@@ -0,0 +1,25 @@
- when porting features from upstream Zig, it should be a mechanical copy.
Don't invent. Most of what you are doing has already been invented; it just
needs to be re-done in C. Keep the structure in place, name functions and
types the same way (or, within reason, equivalently if there are namespacing
constraints). It should be easy to reference one from the other; and, if
there are semantic differences, they *must* be because Zig or C does not
support certain features (like errdefer).
- See README.md for useful information about this project, incl. how to test
this.
- **Never ever** remove zig-cache, neither local nor global.
- Zig code is in ~/code/zig, don't look at /nix/...
- when translating functions from Zig to C (mechanically, remember?), add them
in the same order as in the original Zig file.
- debug printfs: add printfs only when debugging a specific issue; when done
debugging, remove them (or comment them out if you may find them useful
later). I prefer committing code only when `zig build` returns no output.
- Always complete all tasks before stopping. Do not stop to ask for
confirmation mid-task. If you have remaining work, continue without waiting
for input.
- no `cppcheck` suppressions. They are here for a reason. If it is complaining
about automatic variables, make it non-automatic. I.e. find a way to satisfy
the linter, do not suppress it.
- if you are in the middle of porting AstGen, load up the skill
.claude/skills/port-astgen/SKILL.md and proceed with it.
- remember: **mechanical copy** when porting existing stuff, no new creativity.
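As an illustration of the cppcheck rule above (the warning and sizes are assumptions, not a specific finding in this codebase), a large automatic array can be moved to static storage to satisfy the linter without a suppression:
```c
#include <string.h>

/* Sketch: a 64 KiB automatic array could trigger a stack-usage warning
 * from cppcheck or a compiler; making it non-automatic (static storage)
 * resolves the finding without suppressing the check. */
size_t fillBuffer(void) {
    static char buf[64 * 1024]; /* non-automatic: not on the stack */
    memset(buf, 0, sizeof buf);
    return sizeof buf;
}
```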

34
stage0/LICENSE Normal file

@@ -0,0 +1,34 @@
NOTICE TO PROSPECTIVE UPSTREAM CONTRIBUTORS
This software is licensed under the MIT License below. However, the
author politely but firmly requests that you do not submit this work, or
any derivative thereof, to the Zig project upstream unless you have
obtained explicit written permission from a Zig core team member
authorizing the submission.
This notice is not a license restriction. The MIT License governs all
use of this software. This is a social contract: please honor it.
---
The MIT License (Expat)
Copyright (c) Motiejus Jakštys
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

45
stage0/README.md Normal file

@@ -0,0 +1,45 @@
# About
zig0 aspires to be an interpreter of zig 0.15.1 written in C.
This is written with help from an LLM:
- Lexer:
  - Data structures: 100% human.
  - Helper functions: 100% human.
  - Lexing functions: 50/50 human/bot.
- Parser:
  - Data structures: 100% human.
  - Helper functions: 50/50 human/bot.
  - Parser functions: 5/95 human/bot.
- AstGen: TBD.
# Testing
Quick test:
```
zig build fmt test
```
Full test and static analysis with all supported compilers and valgrind (run
before commit, takes a while):
```
zig build -Dvalgrind
```
# Debugging tips
Test runs forever? Build the test program executable:
```
$ zig build test -Dno-exec
```
And then run it, capturing the stack trace:
```
gdb -batch \
-ex "python import threading; threading.Timer(1.0, lambda: gdb.post_event(lambda: gdb.execute('interrupt'))).start()" \
-ex run \
-ex "bt full" \
-ex quit \
zig-out/bin/test
```
You are welcome to replace `-ex "bt full"` with any other command of interest.

122
stage0/ast.c Normal file

@@ -0,0 +1,122 @@
#include "common.h"
#include <setjmp.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include "ast.h"
#include "parser.h"
#define N 1024
static void astTokenListEnsureCapacity(
AstTokenList* list, uint32_t additional) {
const uint32_t new_len = list->len + additional;
if (new_len <= list->cap) {
return;
}
const uint32_t new_cap = new_len > list->cap * 2 ? new_len : list->cap * 2;
list->tags = realloc(list->tags, new_cap * sizeof(TokenizerTag));
list->starts = realloc(list->starts, new_cap * sizeof(AstIndex));
if (!list->tags || !list->starts)
exit(1);
list->cap = new_cap;
}
Ast astParse(const char* source, const uint32_t len) {
uint32_t estimated_token_count = len / 8;
AstTokenList tokens = {
.len = 0,
.cap = estimated_token_count,
.tags = ARR_INIT(TokenizerTag, estimated_token_count),
.starts = ARR_INIT(AstIndex, estimated_token_count),
};
Tokenizer tok = tokenizerInit(source, len);
while (true) {
astTokenListEnsureCapacity(&tokens, 1);
TokenizerToken token = tokenizerNext(&tok);
tokens.tags[tokens.len] = token.tag;
tokens.starts[tokens.len++] = token.loc.start;
if (token.tag == TOKEN_EOF)
break;
}
uint32_t estimated_node_count = (tokens.len + 2) / 2;
char err_buf[PARSE_ERR_BUF_SIZE];
err_buf[0] = '\0';
Parser p = {
.source = source,
.source_len = len,
.token_tags = tokens.tags,
.token_starts = tokens.starts,
.tokens_len = tokens.len,
.tok_i = 0,
.nodes = {
.len = 0,
.cap = estimated_node_count,
.tags = ARR_INIT(AstNodeTag, estimated_node_count),
.main_tokens = ARR_INIT(AstTokenIndex, estimated_node_count),
.datas = ARR_INIT(AstData, estimated_node_count),
},
.extra_data = SLICE_INIT(AstNodeIndex, N),
.scratch = SLICE_INIT(AstNodeIndex, N),
.err_buf = err_buf,
};
bool has_error = false;
if (setjmp(p.error_jmp) != 0) {
has_error = true;
}
if (!has_error)
parseRoot(&p);
p.scratch.cap = p.scratch.len = 0;
free(p.scratch.arr);
char* err_msg = NULL;
if (has_error && err_buf[0] != '\0') {
const size_t len2 = strlen(err_buf);
err_msg = malloc(len2 + 1);
if (!err_msg)
exit(1);
memcpy(err_msg, err_buf, len2 + 1);
}
return (Ast) {
.source = source,
.source_len = len,
.tokens = tokens,
.nodes = p.nodes,
.extra_data = {
.len = p.extra_data.len,
.cap = p.extra_data.cap,
.arr = p.extra_data.arr,
},
.has_error = has_error,
.err_msg = err_msg,
};
}
void astDeinit(Ast* tree) {
free(tree->err_msg);
tree->tokens.cap = tree->tokens.len = 0;
free(tree->tokens.tags);
free(tree->tokens.starts);
tree->nodes.cap = 0;
tree->nodes.len = 0;
free(tree->nodes.tags);
free(tree->nodes.main_tokens);
free(tree->nodes.datas);
tree->extra_data.cap = 0;
tree->extra_data.len = 0;
free(tree->extra_data.arr);
}

625
stage0/ast.h Normal file

@@ -0,0 +1,625 @@
#ifndef _ZIG0_AST_H__
#define _ZIG0_AST_H__
#include <stdbool.h>
#include <stdint.h>
#include "common.h"
#include "tokenizer.h"
typedef enum {
/// sub_list[lhs...rhs]
AST_NODE_ROOT,
/// `usingnamespace lhs;`. rhs unused. main_token is `usingnamespace`.
AST_NODE_USINGNAMESPACE,
/// lhs is test name token (must be string literal or identifier), if any.
/// rhs is the body node.
AST_NODE_TEST_DECL,
/// lhs is the index into extra_data.
/// rhs is the initialization expression, if any.
/// main_token is `var` or `const`.
AST_NODE_GLOBAL_VAR_DECL,
/// `var a: x align(y) = rhs`
/// lhs is the index into extra_data.
/// main_token is `var` or `const`.
AST_NODE_LOCAL_VAR_DECL,
/// `var a: lhs = rhs`. lhs and rhs may be unused.
/// Can be local or global.
/// main_token is `var` or `const`.
AST_NODE_SIMPLE_VAR_DECL,
/// `var a align(lhs) = rhs`. lhs and rhs may be unused.
/// Can be local or global.
/// main_token is `var` or `const`.
AST_NODE_ALIGNED_VAR_DECL,
/// lhs is the identifier token payload if any,
/// rhs is the deferred expression.
AST_NODE_ERRDEFER,
/// lhs is unused.
/// rhs is the deferred expression.
AST_NODE_DEFER,
/// lhs catch rhs
/// lhs catch |err| rhs
/// main_token is the `catch` keyword.
/// payload is determined by looking at the next token after the `catch`
/// keyword.
AST_NODE_CATCH,
/// `lhs.a`. main_token is the dot. rhs is the identifier token index.
AST_NODE_FIELD_ACCESS,
/// `lhs.?`. main_token is the dot. rhs is the `?` token index.
AST_NODE_UNWRAP_OPTIONAL,
/// `lhs == rhs`. main_token is op.
AST_NODE_EQUAL_EQUAL,
/// `lhs != rhs`. main_token is op.
AST_NODE_BANG_EQUAL,
/// `lhs < rhs`. main_token is op.
AST_NODE_LESS_THAN,
/// `lhs > rhs`. main_token is op.
AST_NODE_GREATER_THAN,
/// `lhs <= rhs`. main_token is op.
AST_NODE_LESS_OR_EQUAL,
/// `lhs >= rhs`. main_token is op.
AST_NODE_GREATER_OR_EQUAL,
/// `lhs *= rhs`. main_token is op.
AST_NODE_ASSIGN_MUL,
/// `lhs /= rhs`. main_token is op.
AST_NODE_ASSIGN_DIV,
/// `lhs %= rhs`. main_token is op.
AST_NODE_ASSIGN_MOD,
/// `lhs += rhs`. main_token is op.
AST_NODE_ASSIGN_ADD,
/// `lhs -= rhs`. main_token is op.
AST_NODE_ASSIGN_SUB,
/// `lhs <<= rhs`. main_token is op.
AST_NODE_ASSIGN_SHL,
/// `lhs <<|= rhs`. main_token is op.
AST_NODE_ASSIGN_SHL_SAT,
/// `lhs >>= rhs`. main_token is op.
AST_NODE_ASSIGN_SHR,
/// `lhs &= rhs`. main_token is op.
AST_NODE_ASSIGN_BIT_AND,
/// `lhs ^= rhs`. main_token is op.
AST_NODE_ASSIGN_BIT_XOR,
/// `lhs |= rhs`. main_token is op.
AST_NODE_ASSIGN_BIT_OR,
/// `lhs *%= rhs`. main_token is op.
AST_NODE_ASSIGN_MUL_WRAP,
/// `lhs +%= rhs`. main_token is op.
AST_NODE_ASSIGN_ADD_WRAP,
/// `lhs -%= rhs`. main_token is op.
AST_NODE_ASSIGN_SUB_WRAP,
/// `lhs *|= rhs`. main_token is op.
AST_NODE_ASSIGN_MUL_SAT,
/// `lhs +|= rhs`. main_token is op.
AST_NODE_ASSIGN_ADD_SAT,
/// `lhs -|= rhs`. main_token is op.
AST_NODE_ASSIGN_SUB_SAT,
/// `lhs = rhs`. main_token is op.
AST_NODE_ASSIGN,
/// `a, b, ... = rhs`. main_token is op. lhs is index into `extra_data`
/// of an lhs elem count followed by an array of that many `Node.Index`,
/// with each node having one of the following types:
/// * `global_var_decl`
/// * `local_var_decl`
/// * `simple_var_decl`
/// * `aligned_var_decl`
/// * Any expression node
/// The first 3 types correspond to a `var` or `const` lhs node (note
/// that their `rhs` is always 0). An expression node corresponds to a
/// standard assignment LHS (which must be evaluated as an lvalue).
/// There may be a preceding `comptime` token, which does not create a
/// corresponding `comptime` node so must be manually detected.
AST_NODE_ASSIGN_DESTRUCTURE,
/// `lhs || rhs`. main_token is the `||`.
AST_NODE_MERGE_ERROR_SETS,
/// `lhs * rhs`. main_token is the `*`.
AST_NODE_MUL,
/// `lhs / rhs`. main_token is the `/`.
AST_NODE_DIV,
/// `lhs % rhs`. main_token is the `%`.
AST_NODE_MOD,
/// `lhs ** rhs`. main_token is the `**`.
AST_NODE_ARRAY_MULT,
/// `lhs *% rhs`. main_token is the `*%`.
AST_NODE_MUL_WRAP,
/// `lhs *| rhs`. main_token is the `*|`.
AST_NODE_MUL_SAT,
/// `lhs + rhs`. main_token is the `+`.
AST_NODE_ADD,
/// `lhs - rhs`. main_token is the `-`.
AST_NODE_SUB,
/// `lhs ++ rhs`. main_token is the `++`.
AST_NODE_ARRAY_CAT,
/// `lhs +% rhs`. main_token is the `+%`.
AST_NODE_ADD_WRAP,
/// `lhs -% rhs`. main_token is the `-%`.
AST_NODE_SUB_WRAP,
/// `lhs +| rhs`. main_token is the `+|`.
AST_NODE_ADD_SAT,
/// `lhs -| rhs`. main_token is the `-|`.
AST_NODE_SUB_SAT,
/// `lhs << rhs`. main_token is the `<<`.
AST_NODE_SHL,
/// `lhs <<| rhs`. main_token is the `<<|`.
AST_NODE_SHL_SAT,
/// `lhs >> rhs`. main_token is the `>>`.
AST_NODE_SHR,
/// `lhs & rhs`. main_token is the `&`.
AST_NODE_BIT_AND,
/// `lhs ^ rhs`. main_token is the `^`.
AST_NODE_BIT_XOR,
/// `lhs | rhs`. main_token is the `|`.
AST_NODE_BIT_OR,
/// `lhs orelse rhs`. main_token is the `orelse`.
AST_NODE_ORELSE,
/// `lhs and rhs`. main_token is the `and`.
AST_NODE_BOOL_AND,
/// `lhs or rhs`. main_token is the `or`.
AST_NODE_BOOL_OR,
/// `op lhs`. rhs unused. main_token is op.
AST_NODE_BOOL_NOT,
/// `op lhs`. rhs unused. main_token is op.
AST_NODE_NEGATION,
/// `op lhs`. rhs unused. main_token is op.
AST_NODE_BIT_NOT,
/// `op lhs`. rhs unused. main_token is op.
AST_NODE_NEGATION_WRAP,
/// `op lhs`. rhs unused. main_token is op.
AST_NODE_ADDRESS_OF,
/// `op lhs`. rhs unused. main_token is op.
AST_NODE_TRY,
/// `op lhs`. rhs unused. main_token is op.
AST_NODE_AWAIT,
/// `?lhs`. rhs unused. main_token is the `?`.
AST_NODE_OPTIONAL_TYPE,
/// `[lhs]rhs`.
AST_NODE_ARRAY_TYPE,
/// `[lhs:a]b`. `ArrayTypeSentinel[rhs]`.
AST_NODE_ARRAY_TYPE_SENTINEL,
/// `[*]align(lhs) rhs`. lhs can be omitted.
/// `*align(lhs) rhs`. lhs can be omitted.
/// `[]rhs`.
/// main_token is the asterisk if a single item pointer or the lbracket
/// if a slice, many-item pointer, or C-pointer
/// main_token might be a ** token, which is shared with a parent/child
/// pointer type and may require special handling.
AST_NODE_PTR_TYPE_ALIGNED,
/// `[*:lhs]rhs`. lhs can be omitted.
/// `*rhs`.
/// `[:lhs]rhs`.
/// main_token is the asterisk if a single item pointer or the lbracket
/// if a slice, many-item pointer, or C-pointer
/// main_token might be a ** token, which is shared with a parent/child
/// pointer type and may require special handling.
AST_NODE_PTR_TYPE_SENTINEL,
/// lhs is index into ptr_type. rhs is the element type expression.
/// main_token is the asterisk if a single item pointer or the lbracket
/// if a slice, many-item pointer, or C-pointer
/// main_token might be a ** token, which is shared with a parent/child
/// pointer type and may require special handling.
AST_NODE_PTR_TYPE,
/// lhs is index into ptr_type_bit_range. rhs is the element type
/// expression.
/// main_token is the asterisk if a single item pointer or the lbracket
/// if a slice, many-item pointer, or C-pointer
/// main_token might be a ** token, which is shared with a parent/child
/// pointer type and may require special handling.
AST_NODE_PTR_TYPE_BIT_RANGE,
/// `lhs[rhs..]`
/// main_token is the lbracket.
AST_NODE_SLICE_OPEN,
/// `lhs[b..c]`. rhs is index into Slice
/// main_token is the lbracket.
AST_NODE_SLICE,
/// `lhs[b..c :d]`. rhs is index into SliceSentinel. Slice end c can be
/// omitted.
/// main_token is the lbracket.
AST_NODE_SLICE_SENTINEL,
/// `lhs.*`. rhs is unused.
AST_NODE_DEREF,
/// `lhs[rhs]`.
AST_NODE_ARRAY_ACCESS,
/// `lhs{rhs}`. rhs can be omitted.
AST_NODE_ARRAY_INIT_ONE,
/// `lhs{rhs,}`. rhs can *not* be omitted.
AST_NODE_ARRAY_INIT_ONE_COMMA,
/// `.{lhs, rhs}`. lhs and rhs can be omitted.
AST_NODE_ARRAY_INIT_DOT_TWO,
/// Same as `array_init_dot_two` except there is known to be a trailing
/// comma before the final rbrace.
AST_NODE_ARRAY_INIT_DOT_TWO_COMMA,
/// `.{a, b}`. `sub_list[lhs..rhs]`.
AST_NODE_ARRAY_INIT_DOT,
/// Same as `array_init_dot` except there is known to be a trailing comma
/// before the final rbrace.
AST_NODE_ARRAY_INIT_DOT_COMMA,
/// `lhs{a, b}`. `sub_range_list[rhs]`. lhs can be omitted which means
/// `.{a, b}`.
AST_NODE_ARRAY_INIT,
/// Same as `array_init` except there is known to be a trailing comma
/// before the final rbrace.
AST_NODE_ARRAY_INIT_COMMA,
/// `lhs{.a = rhs}`. rhs can be omitted making it empty.
/// main_token is the lbrace.
AST_NODE_STRUCT_INIT_ONE,
/// `lhs{.a = rhs,}`. rhs can *not* be omitted.
/// main_token is the lbrace.
AST_NODE_STRUCT_INIT_ONE_COMMA,
/// `.{.a = lhs, .b = rhs}`. lhs and rhs can be omitted.
/// main_token is the lbrace.
/// No trailing comma before the rbrace.
AST_NODE_STRUCT_INIT_DOT_TWO,
/// Same as `struct_init_dot_two` except there is known to be a trailing
/// comma before the final rbrace.
AST_NODE_STRUCT_INIT_DOT_TWO_COMMA,
/// `.{.a = b, .c = d}`. `sub_list[lhs..rhs]`.
/// main_token is the lbrace.
AST_NODE_STRUCT_INIT_DOT,
/// Same as `struct_init_dot` except there is known to be a trailing comma
/// before the final rbrace.
AST_NODE_STRUCT_INIT_DOT_COMMA,
/// `lhs{.a = b, .c = d}`. `sub_range_list[rhs]`.
/// lhs can be omitted which means `.{.a = b, .c = d}`.
/// main_token is the lbrace.
AST_NODE_STRUCT_INIT,
/// Same as `struct_init` except there is known to be a trailing comma
/// before the final rbrace.
AST_NODE_STRUCT_INIT_COMMA,
/// `lhs(rhs)`. rhs can be omitted.
/// main_token is the lparen.
AST_NODE_CALL_ONE,
/// `lhs(rhs,)`. rhs can be omitted.
/// main_token is the lparen.
AST_NODE_CALL_ONE_COMMA,
/// `async lhs(rhs)`. rhs can be omitted.
AST_NODE_ASYNC_CALL_ONE,
/// `async lhs(rhs,)`.
AST_NODE_ASYNC_CALL_ONE_COMMA,
/// `lhs(a, b, c)`. `SubRange[rhs]`.
/// main_token is the `(`.
AST_NODE_CALL,
/// `lhs(a, b, c,)`. `SubRange[rhs]`.
/// main_token is the `(`.
AST_NODE_CALL_COMMA,
/// `async lhs(a, b, c)`. `SubRange[rhs]`.
/// main_token is the `(`.
AST_NODE_ASYNC_CALL,
/// `async lhs(a, b, c,)`. `SubRange[rhs]`.
/// main_token is the `(`.
AST_NODE_ASYNC_CALL_COMMA,
/// `switch(lhs) {}`. `SubRange[rhs]`.
/// `main_token` is the identifier of a preceding label, if any; otherwise
/// `switch`.
AST_NODE_SWITCH,
/// Same as switch except there is known to be a trailing comma
/// before the final rbrace
AST_NODE_SWITCH_COMMA,
/// `lhs => rhs`. If lhs is omitted it means `else`.
/// main_token is the `=>`
AST_NODE_SWITCH_CASE_ONE,
/// Same as `switch_case_one` but the case is inline
AST_NODE_SWITCH_CASE_INLINE_ONE,
/// `a, b, c => rhs`. `SubRange[lhs]`.
/// main_token is the `=>`
AST_NODE_SWITCH_CASE,
/// Same as `switch_case` but the case is inline
AST_NODE_SWITCH_CASE_INLINE,
/// `lhs...rhs`.
AST_NODE_SWITCH_RANGE,
/// `while (lhs) rhs`.
/// `while (lhs) |x| rhs`.
AST_NODE_WHILE_SIMPLE,
/// `while (lhs) : (a) b`. `WhileCont[rhs]`.
/// `while (lhs) |x| : (a) b`. `WhileCont[rhs]`.
AST_NODE_WHILE_CONT,
/// `while (lhs) : (a) b else c`. `While[rhs]`.
/// `while (lhs) |x| : (a) b else c`. `While[rhs]`.
/// `while (lhs) |x| : (a) b else |y| c`. `While[rhs]`.
/// The cont expression part `: (a)` may be omitted.
AST_NODE_WHILE,
/// `for (lhs) rhs`.
AST_NODE_FOR_SIMPLE,
/// `for (lhs[0..inputs]) lhs[inputs + 1] else lhs[inputs + 2]`.
/// `For[rhs]`.
AST_NODE_FOR,
/// `lhs..rhs`. rhs can be omitted.
AST_NODE_FOR_RANGE,
/// `if (lhs) rhs`.
/// `if (lhs) |a| rhs`.
AST_NODE_IF_SIMPLE,
/// `if (lhs) a else b`. `If[rhs]`.
/// `if (lhs) |x| a else b`. `If[rhs]`.
/// `if (lhs) |x| a else |y| b`. `If[rhs]`.
AST_NODE_IF,
/// `suspend lhs`. lhs can be omitted. rhs is unused.
AST_NODE_SUSPEND,
/// `resume lhs`. rhs is unused.
AST_NODE_RESUME,
/// `continue :lhs rhs`
/// both lhs and rhs may be omitted.
AST_NODE_CONTINUE,
/// `break :lhs rhs`
/// both lhs and rhs may be omitted.
AST_NODE_BREAK,
/// `return lhs`. lhs can be omitted. rhs is unused.
AST_NODE_RETURN,
/// `fn (a: lhs) rhs`. lhs can be omitted.
/// anytype and ... parameters are omitted from the AST tree.
/// main_token is the `fn` keyword.
/// extern function declarations use this tag.
AST_NODE_FN_PROTO_SIMPLE,
/// `fn (a: b, c: d) rhs`. `sub_range_list[lhs]`.
/// anytype and ... parameters are omitted from the AST tree.
/// main_token is the `fn` keyword.
/// extern function declarations use this tag.
AST_NODE_FN_PROTO_MULTI,
/// `fn (a: b) addrspace(e) linksection(f) callconv(g) rhs`.
/// `FnProtoOne[lhs]`.
/// zero or one parameters.
/// anytype and ... parameters are omitted from the AST tree.
/// main_token is the `fn` keyword.
/// extern function declarations use this tag.
AST_NODE_FN_PROTO_ONE,
/// `fn (a: b, c: d) addrspace(e) linksection(f) callconv(g) rhs`.
/// `FnProto[lhs]`.
/// anytype and ... parameters are omitted from the AST tree.
/// main_token is the `fn` keyword.
/// extern function declarations use this tag.
AST_NODE_FN_PROTO,
/// lhs is the fn_proto.
/// rhs is the function body block.
/// Note that extern function declarations use the fn_proto tags rather
/// than this one.
AST_NODE_FN_DECL,
/// `anyframe->rhs`. main_token is `anyframe`. `lhs` is arrow token index.
AST_NODE_ANYFRAME_TYPE,
/// Both lhs and rhs unused.
AST_NODE_ANYFRAME_LITERAL,
/// Both lhs and rhs unused.
AST_NODE_CHAR_LITERAL,
/// Both lhs and rhs unused.
AST_NODE_NUMBER_LITERAL,
/// Both lhs and rhs unused.
AST_NODE_UNREACHABLE_LITERAL,
/// Both lhs and rhs unused.
/// Most identifiers will not have explicit AST nodes; however, for
/// expressions which could be one of many different kinds of AST nodes,
/// there will be an identifier AST node for it.
AST_NODE_IDENTIFIER,
/// lhs is the dot token index, rhs unused, main_token is the identifier.
AST_NODE_ENUM_LITERAL,
/// main_token is the string literal token
/// Both lhs and rhs unused.
AST_NODE_STRING_LITERAL,
/// main_token is the first token index (redundant with lhs)
/// lhs is the first token index; rhs is the last token index.
/// Could be a series of multiline_string_literal_line tokens, or a single
/// string_literal token.
AST_NODE_MULTILINE_STRING_LITERAL,
/// `(lhs)`. main_token is the `(`; rhs is the token index of the `)`.
AST_NODE_GROUPED_EXPRESSION,
/// `@a(lhs, rhs)`. lhs and rhs may be omitted.
/// main_token is the builtin token.
AST_NODE_BUILTIN_CALL_TWO,
/// Same as builtin_call_two but there is known to be a trailing comma
/// before the rparen.
AST_NODE_BUILTIN_CALL_TWO_COMMA,
/// `@a(b, c)`. `sub_list[lhs..rhs]`.
/// main_token is the builtin token.
AST_NODE_BUILTIN_CALL,
/// Same as builtin_call but there is known to be a trailing comma before
/// the rparen.
AST_NODE_BUILTIN_CALL_COMMA,
/// `error{a, b}`.
/// rhs is the rbrace, lhs is unused.
AST_NODE_ERROR_SET_DECL,
/// `struct {}`, `union {}`, `opaque {}`, `enum {}`.
/// `extra_data[lhs..rhs]`.
/// main_token is `struct`, `union`, `opaque`, `enum` keyword.
AST_NODE_CONTAINER_DECL,
/// Same as ContainerDecl but there is known to be a trailing comma
/// or semicolon before the rbrace.
AST_NODE_CONTAINER_DECL_TRAILING,
/// `struct {lhs, rhs}`, `union {lhs, rhs}`, `opaque {lhs, rhs}`, `enum
/// {lhs, rhs}`.
/// lhs or rhs can be omitted.
/// main_token is `struct`, `union`, `opaque`, `enum` keyword.
AST_NODE_CONTAINER_DECL_TWO,
/// Same as ContainerDeclTwo except there is known to be a trailing comma
/// or semicolon before the rbrace.
AST_NODE_CONTAINER_DECL_TWO_TRAILING,
/// `struct(lhs)` / `union(lhs)` / `enum(lhs)`. `SubRange[rhs]`.
AST_NODE_CONTAINER_DECL_ARG,
/// Same as container_decl_arg but there is known to be a trailing
/// comma or semicolon before the rbrace.
AST_NODE_CONTAINER_DECL_ARG_TRAILING,
/// `union(enum) {}`. `sub_list[lhs..rhs]`.
/// Note that tagged unions with explicitly provided enums are represented
/// by `container_decl_arg`.
AST_NODE_TAGGED_UNION,
/// Same as tagged_union but there is known to be a trailing comma
/// or semicolon before the rbrace.
AST_NODE_TAGGED_UNION_TRAILING,
/// `union(enum) {lhs, rhs}`. lhs or rhs may be omitted.
/// Note that tagged unions with explicitly provided enums are represented
/// by `container_decl_arg`.
AST_NODE_TAGGED_UNION_TWO,
/// Same as tagged_union_two but there is known to be a trailing comma
/// or semicolon before the rbrace.
AST_NODE_TAGGED_UNION_TWO_TRAILING,
/// `union(enum(lhs)) {}`. `SubRange[rhs]`.
AST_NODE_TAGGED_UNION_ENUM_TAG,
/// Same as tagged_union_enum_tag but there is known to be a trailing comma
/// or semicolon before the rbrace.
AST_NODE_TAGGED_UNION_ENUM_TAG_TRAILING,
/// `a: lhs = rhs,`. lhs and rhs can be omitted.
/// main_token is the field name identifier.
/// lastToken() does not include the possible trailing comma.
AST_NODE_CONTAINER_FIELD_INIT,
/// `a: lhs align(rhs),`. rhs can be omitted.
/// main_token is the field name identifier.
/// lastToken() does not include the possible trailing comma.
AST_NODE_CONTAINER_FIELD_ALIGN,
/// `a: lhs align(c) = d,`. `container_field_list[rhs]`.
/// main_token is the field name identifier.
/// lastToken() does not include the possible trailing comma.
AST_NODE_CONTAINER_FIELD,
/// `comptime lhs`. rhs unused.
AST_NODE_COMPTIME,
/// `nosuspend lhs`. rhs unused.
AST_NODE_NOSUSPEND,
/// `{lhs rhs}`. rhs or lhs can be omitted.
/// main_token points at the lbrace.
AST_NODE_BLOCK_TWO,
/// Same as block_two but there is known to be a semicolon before the
/// rbrace.
AST_NODE_BLOCK_TWO_SEMICOLON,
/// `{}`. `sub_list[lhs..rhs]`.
/// main_token points at the lbrace.
AST_NODE_BLOCK,
/// Same as block but there is known to be a semicolon before the rbrace.
AST_NODE_BLOCK_SEMICOLON,
/// `asm(lhs)`. rhs is the token index of the rparen.
AST_NODE_ASM_SIMPLE,
/// Legacy asm with string clobbers. `asm(lhs, a)`.
/// `AsmLegacy[rhs]`.
AST_NODE_ASM_LEGACY,
/// `asm(lhs, a)`. `Asm[rhs]`.
AST_NODE_ASM,
/// `[a] "b" (c)`. lhs is 0, rhs is token index of the rparen.
/// `[a] "b" (-> lhs)`. rhs is token index of the rparen.
/// main_token is `a`.
AST_NODE_ASM_OUTPUT,
/// `[a] "b" (lhs)`. rhs is token index of the rparen.
/// main_token is `a`.
AST_NODE_ASM_INPUT,
/// `error.a`. lhs is token index of `.`. rhs is token index of `a`.
AST_NODE_ERROR_VALUE,
/// `lhs!rhs`. main_token is the `!`.
AST_NODE_ERROR_UNION,
} AstNodeTag;
typedef uint32_t AstTokenIndex;
typedef uint32_t AstNodeIndex;
typedef uint32_t AstIndex;
typedef struct {
AstIndex lhs;
AstIndex rhs;
} AstData;
typedef struct {
uint32_t len;
uint32_t cap;
AstNodeTag* tags;
AstTokenIndex* main_tokens;
AstData* datas;
} AstNodeList;
typedef struct {
AstNodeTag tag;
AstTokenIndex main_token;
AstData data;
} AstNodeItem;
typedef struct {
uint32_t len;
uint32_t cap;
TokenizerTag* tags;
AstIndex* starts;
} AstTokenList;
typedef SLICE(AstNodeIndex) AstNodeIndexSlice;
typedef struct {
const char* source;
uint32_t source_len;
AstTokenList tokens;
AstNodeList nodes;
AstNodeIndexSlice extra_data;
bool has_error;
char* err_msg;
} Ast;
typedef struct AstPtrType {
AstNodeIndex sentinel;
AstNodeIndex align_node;
AstNodeIndex addrspace_node;
} AstPtrType;
typedef struct AstPtrTypeBitRange {
AstNodeIndex sentinel;
AstNodeIndex align_node;
AstNodeIndex addrspace_node;
AstNodeIndex bit_range_start;
AstNodeIndex bit_range_end;
} AstPtrTypeBitRange;
typedef struct AstFnProtoOne {
AstNodeIndex param;
AstNodeIndex align_expr;
AstNodeIndex addrspace_expr;
AstNodeIndex section_expr;
AstNodeIndex callconv_expr;
} AstFnProtoOne;
typedef struct AstFnProto {
AstNodeIndex params_start;
AstNodeIndex params_end;
AstNodeIndex align_expr;
AstNodeIndex addrspace_expr;
AstNodeIndex section_expr;
AstNodeIndex callconv_expr;
} AstFnProto;
typedef struct AstSubRange {
AstNodeIndex start;
AstNodeIndex end;
} AstSubRange;
typedef struct AstSliceSentinel {
AstNodeIndex start;
AstNodeIndex end;
AstNodeIndex sentinel;
} AstSliceSentinel;
typedef struct AstWhileCont {
AstNodeIndex cont_expr;
AstNodeIndex then_expr;
} AstWhileCont;
typedef struct AstWhile {
AstNodeIndex cont_expr;
AstNodeIndex then_expr;
AstNodeIndex else_expr;
} AstWhile;
typedef struct AstFor {
unsigned int inputs : 31;
unsigned int has_else : 1;
} AstFor;
typedef struct AstIf {
AstNodeIndex then_expr;
AstNodeIndex else_expr;
} AstIf;
typedef struct AstError {
bool is_note;
AstTokenIndex token;
union {
struct {
TokenizerTag expected_tag;
} expected;
struct {
} none;
} extra;
} AstError;
Ast astParse(const char* source, uint32_t len);
void astDeinit(Ast*);
#endif

10639
stage0/astgen.c Normal file

File diff suppressed because it is too large

11
stage0/astgen.h Normal file

@@ -0,0 +1,11 @@
// astgen.h — AST to ZIR conversion, ported from lib/std/zig/AstGen.zig.
#ifndef _ZIG0_ASTGEN_H__
#define _ZIG0_ASTGEN_H__
#include "ast.h"
#include "zir.h"
// Convert AST to ZIR.
Zir astGen(const Ast* ast);
#endif

851
stage0/astgen_test.zig Normal file
View File

@@ -0,0 +1,851 @@
const std = @import("std");
const Ast = std.zig.Ast;
const Zir = std.zig.Zir;
const AstGen = std.zig.AstGen;
const Allocator = std.mem.Allocator;
const c = @cImport({
@cInclude("astgen.h");
});
fn refZir(gpa: Allocator, source: [:0]const u8) !Zir {
var tree = try Ast.parse(gpa, source, .zig);
defer tree.deinit(gpa);
return try AstGen.generate(gpa, tree);
}
test "astgen dump: simple cases" {
const gpa = std.testing.allocator;
const cases = .{
.{ "empty", "" },
.{ "comptime {}", "comptime {}" },
.{ "const x = 0;", "const x = 0;" },
.{ "const x = 1;", "const x = 1;" },
.{ "const x = 0; const y = 0;", "const x = 0; const y = 0;" },
.{ "test \"t\" {}", "test \"t\" {}" },
.{ "const std = @import(\"std\");", "const std = @import(\"std\");" },
.{ "test_all.zig", @embedFile("test_all.zig") },
};
inline for (cases) |case| {
// std.debug.print("--- {s} ---\n", .{case[0]});
const source: [:0]const u8 = case[1];
var zir = try refZir(gpa, source);
zir.deinit(gpa);
}
}
/// Build a mask of extra[] indices that contain hash data (src_hash or
/// fields_hash). These are zero-filled in the C output but contain real
/// Blake3 hashes in the Zig reference. We skip these positions during
/// comparison.
fn buildHashSkipMask(gpa: Allocator, ref: Zir) ![]bool {
const ref_extra_len: u32 = @intCast(ref.extra.len);
const skip = try gpa.alloc(bool, ref_extra_len);
@memset(skip, false);
const ref_len: u32 = @intCast(ref.instructions.len);
const ref_tags = ref.instructions.items(.tag);
const ref_datas = ref.instructions.items(.data);
for (0..ref_len) |i| {
switch (ref_tags[i]) {
.extended => {
const ext = ref_datas[i].extended;
if (ext.opcode == .struct_decl or ext.opcode == .enum_decl) {
// StructDecl/EnumDecl starts with fields_hash[4].
const pi = ext.operand;
for (0..4) |j| skip[pi + j] = true;
}
},
.declaration => {
// Declaration starts with src_hash[4].
const pi = ref_datas[i].declaration.payload_index;
for (0..4) |j| skip[pi + j] = true;
},
.func, .func_inferred => {
// Func payload: ret_ty(1) + param_block(1) + body_len(1)
// + trailing ret_ty + body + SrcLocs(3) + proto_hash(4).
const pi = ref_datas[i].pl_node.payload_index;
const ret_ty_raw: u32 = ref.extra[pi];
const ret_body_len: u32 = ret_ty_raw & 0x7FFFFFFF;
const body_len: u32 = ref.extra[pi + 2];
// ret_ty trailing: if body_len > 1, it's a body; if == 1, it's a ref; if 0, void.
const ret_trailing: u32 = if (ret_body_len > 1) ret_body_len else if (ret_body_len == 1) 1 else 0;
// proto_hash is at: pi + 3 + ret_trailing + body_len + 3
if (body_len > 0) {
const hash_start = pi + 3 + ret_trailing + body_len + 3;
for (0..4) |j| {
if (hash_start + j < ref_extra_len)
skip[hash_start + j] = true;
}
}
},
else => {},
}
}
return skip;
}
test "astgen: empty source" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: comptime {}" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "comptime {}";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: const x = 0;" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const x = 0;";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: const x = 1;" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const x = 1;";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: const x = 0; const y = 0;" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const x = 0; const y = 0;";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: field_access" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const std = @import(\"std\");\nconst mem = std.mem;";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: addr array init" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const x = &[_][]const u8{\"a\",\"b\"};";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: test empty body" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "test \"t\" {}";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: test_all.zig" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = @embedFile("test_all.zig");
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: @import" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const std = @import(\"std\");";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
fn expectEqualZir(gpa: Allocator, ref: Zir, got: c.Zir) !void {
const ref_len: u32 = @intCast(ref.instructions.len);
const ref_tags = ref.instructions.items(.tag);
const ref_datas = ref.instructions.items(.data);
// 1. Compare lengths.
try std.testing.expectEqual(ref_len, got.inst_len);
// 2. Compare instruction tags.
for (0..ref_len) |i| {
const ref_tag: u8 = @intFromEnum(ref_tags[i]);
const got_tag: u8 = @intCast(got.inst_tags[i]);
if (ref_tag != got_tag) {
std.debug.print(
"inst_tags[{d}] mismatch: ref={d} got={d}\n",
.{ i, ref_tag, got_tag },
);
return error.TestExpectedEqual;
}
}
// 3. Compare instruction data field-by-field.
for (0..ref_len) |i| {
try expectEqualData(i, ref_tags[i], ref_datas[i], got.inst_datas[i]);
}
// 4. Compare string bytes.
const ref_sb_len: u32 = @intCast(ref.string_bytes.len);
try std.testing.expectEqual(ref_sb_len, got.string_bytes_len);
for (0..ref_sb_len) |i| {
if (ref.string_bytes[i] != got.string_bytes[i]) {
std.debug.print(
"string_bytes[{d}] mismatch: ref=0x{x:0>2} got=0x{x:0>2}\n",
.{ i, ref.string_bytes[i], got.string_bytes[i] },
);
return error.TestExpectedEqual;
}
}
// 5. Compare extra data (skipping hash positions).
const skip = try buildHashSkipMask(gpa, ref);
defer gpa.free(skip);
const ref_extra_len: u32 = @intCast(ref.extra.len);
try std.testing.expectEqual(ref_extra_len, got.extra_len);
for (0..ref_extra_len) |i| {
if (skip[i]) continue;
if (ref.extra[i] != got.extra[i]) {
// Show first 10 extra diffs.
var count: u32 = 0;
for (0..ref_extra_len) |j| {
if (!skip[j] and ref.extra[j] != got.extra[j]) {
std.debug.print(
"extra[{d}] mismatch: ref={d} got={d}\n",
.{ j, ref.extra[j], got.extra[j] },
);
count += 1;
if (count >= 10) break;
}
}
return error.TestExpectedEqual;
}
}
}
/// Compare a single instruction's data, dispatching by tag.
/// Zig's Data union has no guaranteed in-memory layout, so we
/// compare each variant's fields individually.
fn expectEqualData(
idx: usize,
tag: Zir.Inst.Tag,
ref: Zir.Inst.Data,
got: c.ZirInstData,
) !void {
switch (tag) {
.extended => {
const r = ref.extended;
const g = got.extended;
// Some extended opcodes have undefined/unused small+operand.
const skip_data = switch (r.opcode) {
.dbg_empty_stmt, .astgen_error => true,
else => false,
};
const skip_small = switch (r.opcode) {
.add_with_overflow,
.sub_with_overflow,
.mul_with_overflow,
.shl_with_overflow,
.restore_err_ret_index,
.branch_hint,
=> true,
else => false,
};
if (@intFromEnum(r.opcode) != g.opcode or
(!skip_data and !skip_small and r.small != g.small) or
(!skip_data and r.operand != g.operand))
{
std.debug.print(
"inst_datas[{d}] (extended) mismatch:\n" ++
" ref: opcode={d} small=0x{x:0>4} operand={d}\n" ++
" got: opcode={d} small=0x{x:0>4} operand={d}\n",
.{
idx,
@intFromEnum(r.opcode),
r.small,
r.operand,
g.opcode,
g.small,
g.operand,
},
);
return error.TestExpectedEqual;
}
},
.declaration => {
const r = ref.declaration;
const g = got.declaration;
if (@intFromEnum(r.src_node) != g.src_node or
r.payload_index != g.payload_index)
{
std.debug.print(
"inst_datas[{d}] (declaration) mismatch:\n" ++
" ref: src_node={d} payload_index={d}\n" ++
" got: src_node={d} payload_index={d}\n",
.{
idx,
@intFromEnum(r.src_node),
r.payload_index,
g.src_node,
g.payload_index,
},
);
return error.TestExpectedEqual;
}
},
.break_inline => {
const r = ref.@"break";
const g = got.break_data;
if (@intFromEnum(r.operand) != g.operand or
r.payload_index != g.payload_index)
{
std.debug.print(
"inst_datas[{d}] (break_inline) mismatch:\n" ++
" ref: operand={d} payload_index={d}\n" ++
" got: operand={d} payload_index={d}\n",
.{
idx,
@intFromEnum(r.operand),
r.payload_index,
g.operand,
g.payload_index,
},
);
return error.TestExpectedEqual;
}
},
.import => {
const r = ref.pl_tok;
const g = got.pl_tok;
if (@intFromEnum(r.src_tok) != g.src_tok or
r.payload_index != g.payload_index)
{
std.debug.print(
"inst_datas[{d}] (import) mismatch:\n" ++
" ref: src_tok={d} payload_index={d}\n" ++
" got: src_tok={d} payload_index={d}\n",
.{
idx,
@intFromEnum(r.src_tok),
r.payload_index,
g.src_tok,
g.payload_index,
},
);
return error.TestExpectedEqual;
}
},
.dbg_stmt => {
const r = ref.dbg_stmt;
const g = got.dbg_stmt;
if (r.line != g.line or r.column != g.column) {
std.debug.print(
"inst_datas[{d}] (dbg_stmt) mismatch:\n" ++
" ref: line={d} column={d}\n" ++
" got: line={d} column={d}\n",
.{ idx, r.line, r.column, g.line, g.column },
);
return error.TestExpectedEqual;
}
},
.ensure_result_non_error,
.restore_err_ret_index_unconditional,
.validate_struct_init_ty,
.validate_struct_init_result_ty,
.struct_init_empty_result,
.struct_init_empty,
.struct_init_empty_ref_result,
=> {
const r = ref.un_node;
const g = got.un_node;
if (@intFromEnum(r.src_node) != g.src_node or
@intFromEnum(r.operand) != g.operand)
{
std.debug.print(
"inst_datas[{d}] ({s}) mismatch:\n" ++
" ref: src_node={d} operand={d}\n" ++
" got: src_node={d} operand={d}\n",
.{
idx,
@tagName(tag),
@intFromEnum(r.src_node),
@intFromEnum(r.operand),
g.src_node,
g.operand,
},
);
return error.TestExpectedEqual;
}
},
.ret_implicit => {
const r = ref.un_tok;
const g = got.un_tok;
if (@intFromEnum(r.src_tok) != g.src_tok or
@intFromEnum(r.operand) != g.operand)
{
std.debug.print(
"inst_datas[{d}] (ret_implicit) mismatch:\n" ++
" ref: src_tok={d} operand={d}\n" ++
" got: src_tok={d} operand={d}\n",
.{
idx,
@intFromEnum(r.src_tok),
@intFromEnum(r.operand),
g.src_tok,
g.operand,
},
);
return error.TestExpectedEqual;
}
},
.func,
.func_inferred,
.array_type,
.array_type_sentinel,
.array_cat,
.array_init,
.array_init_ref,
.error_set_decl,
.struct_init_field_type,
.struct_init,
.struct_init_ref,
.validate_array_init_ref_ty,
.validate_array_init_ty,
=> {
const r = ref.pl_node;
const g = got.pl_node;
if (@intFromEnum(r.src_node) != g.src_node or
r.payload_index != g.payload_index)
{
std.debug.print(
"inst_datas[{d}] ({s}) mismatch:\n" ++
" ref: src_node={d} payload_index={d}\n" ++
" got: src_node={d} payload_index={d}\n",
.{
idx,
@tagName(tag),
@intFromEnum(r.src_node),
r.payload_index,
g.src_node,
g.payload_index,
},
);
return error.TestExpectedEqual;
}
},
.decl_val, .decl_ref => {
const r = ref.str_tok;
const g = got.str_tok;
if (@intFromEnum(r.start) != g.start or @intFromEnum(r.src_tok) != g.src_tok) {
std.debug.print(
"inst_datas[{d}] ({s}) mismatch:\n" ++
" ref: start={d} src_tok={d}\n" ++
" got: start={d} src_tok={d}\n",
.{
idx,
@tagName(tag),
@intFromEnum(r.start),
@intFromEnum(r.src_tok),
g.start,
g.src_tok,
},
);
return error.TestExpectedEqual;
}
},
.field_val, .field_ptr, .field_val_named, .field_ptr_named => {
const r = ref.pl_node;
const g = got.pl_node;
if (@intFromEnum(r.src_node) != g.src_node or
r.payload_index != g.payload_index)
{
std.debug.print(
"inst_datas[{d}] ({s}) mismatch:\n" ++
" ref: src_node={d} payload_index={d}\n" ++
" got: src_node={d} payload_index={d}\n",
.{
idx,
@tagName(tag),
@intFromEnum(r.src_node),
r.payload_index,
g.src_node,
g.payload_index,
},
);
return error.TestExpectedEqual;
}
},
.int => {
if (ref.int != got.int_val) {
std.debug.print(
"inst_datas[{d}] (int) mismatch: ref={d} got={d}\n",
.{ idx, ref.int, got.int_val },
);
return error.TestExpectedEqual;
}
},
.ptr_type => {
// Compare ptr_type data: flags, size, payload_index.
if (@as(u8, @bitCast(ref.ptr_type.flags)) != got.ptr_type.flags or
@intFromEnum(ref.ptr_type.size) != got.ptr_type.size or
ref.ptr_type.payload_index != got.ptr_type.payload_index)
{
std.debug.print(
"inst_datas[{d}] (ptr_type) mismatch:\n" ++
" ref: flags=0x{x} size={d} pi={d}\n" ++
" got: flags=0x{x} size={d} pi={d}\n",
.{
idx,
@as(u8, @bitCast(ref.ptr_type.flags)),
@intFromEnum(ref.ptr_type.size),
ref.ptr_type.payload_index,
got.ptr_type.flags,
got.ptr_type.size,
got.ptr_type.payload_index,
},
);
return error.TestExpectedEqual;
}
},
.int_type => {
const r = ref.int_type;
const g = got.int_type;
if (@intFromEnum(r.src_node) != g.src_node or
@intFromEnum(r.signedness) != g.signedness or
r.bit_count != g.bit_count)
{
std.debug.print(
"inst_datas[{d}] (int_type) mismatch\n",
.{idx},
);
return error.TestExpectedEqual;
}
},
.str => {
const r = ref.str;
const g = got.str;
if (@intFromEnum(r.start) != g.start or r.len != g.len) {
std.debug.print(
"inst_datas[{d}] (str) mismatch:\n" ++
" ref: start={d} len={d}\n" ++
" got: start={d} len={d}\n",
.{ idx, @intFromEnum(r.start), r.len, g.start, g.len },
);
return error.TestExpectedEqual;
}
},
else => {
// Generic raw comparison: treat data as two u32 words.
// Tags using .node data format have undefined second word.
const ref_raw = @as([*]const u32, @ptrCast(&ref));
const got_raw = @as([*]const u32, @ptrCast(&got));
// Tags where only the first u32 word is meaningful
// (second word is padding/undefined).
const first_word_only = switch (tag) {
// .node data format (single i32):
.repeat,
.repeat_inline,
.ret_ptr,
.ret_type,
.trap,
.alloc_inferred,
.alloc_inferred_mut,
.alloc_inferred_comptime,
.alloc_inferred_comptime_mut,
// .@"unreachable" data format (src_node + padding):
.@"unreachable",
// .save_err_ret_index data format (operand only):
.save_err_ret_index,
=> true,
else => false,
};
const w1_match = ref_raw[0] == got_raw[0];
const w2_match = first_word_only or ref_raw[1] == got_raw[1];
if (!w1_match or !w2_match) {
std.debug.print(
"inst_datas[{d}] ({s}) raw mismatch:\n" ++
" ref: 0x{x:0>8} 0x{x:0>8}\n" ++
" got: 0x{x:0>8} 0x{x:0>8}\n",
.{
idx,
@tagName(tag),
ref_raw[0],
ref_raw[1],
got_raw[0],
got_raw[1],
},
);
return error.TestExpectedEqual;
}
},
}
}
const corpus_files = .{
.{ "astgen_test.zig", @embedFile("astgen_test.zig") },
.{ "build.zig", @embedFile("build.zig") },
.{ "parser_test.zig", @embedFile("parser_test.zig") },
.{ "test_all.zig", @embedFile("test_all.zig") },
.{ "tokenizer_test.zig", @embedFile("tokenizer_test.zig") },
};
fn corpusCheck(gpa: Allocator, source: [:0]const u8) !void {
var tree = try Ast.parse(gpa, source, .zig);
defer tree.deinit(gpa);
var ref_zir = try AstGen.generate(gpa, tree);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
if (c_zir.has_compile_errors) {
std.debug.print("C port returned compile errors (inst_len={d})\n", .{c_zir.inst_len});
return error.TestUnexpectedResult;
}
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: struct single field" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const T = struct { x: u32 };";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: struct multiple fields" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const T = struct { x: u32, y: bool };";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: struct field with default" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const T = struct { x: u32 = 0 };";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: struct field with align" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const T = struct { x: u32 align(4) };";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: struct comptime field" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const T = struct { comptime x: u32 = 0 };";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: empty error set" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const E = error{};";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: error set with members" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const E = error{ OutOfMemory, OutOfTime };";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: extern var" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "extern var x: u32;";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: corpus test_all.zig" {
const gpa = std.testing.allocator;
try corpusCheck(gpa, @embedFile("test_all.zig"));
}
test "astgen: corpus build.zig" {
const gpa = std.testing.allocator;
try corpusCheck(gpa, @embedFile("build.zig"));
}
test "astgen: corpus tokenizer_test.zig" {
const gpa = std.testing.allocator;
try corpusCheck(gpa, @embedFile("tokenizer_test.zig"));
}
test "astgen: corpus parser_test.zig" {
// TODO: 10+ extra data mismatches (ref=48 got=32, bit 4 = propagate_error_trace)
// in call instruction flags — ctx propagation differs from upstream.
if (true) return error.SkipZigTest;
const gpa = std.testing.allocator;
try corpusCheck(gpa, @embedFile("parser_test.zig"));
}
test "astgen: corpus astgen_test.zig" {
const gpa = std.testing.allocator;
try corpusCheck(gpa, @embedFile("astgen_test.zig"));
}
test "astgen: enum decl" {
const gpa = std.testing.allocator;
const source: [:0]const u8 = "const E = enum { a, b, c };";
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: struct init typed" {
const gpa = std.testing.allocator;
const source: [:0]const u8 =
\\const T = struct { x: u32 };
\\const v = T{ .x = 1 };
;
var ref_zir = try refZir(gpa, source);
defer ref_zir.deinit(gpa);
var c_ast = c.astParse(source.ptr, @intCast(source.len));
defer c.astDeinit(&c_ast);
var c_zir = c.astGen(&c_ast);
defer c.zirDeinit(&c_zir);
try expectEqualZir(gpa, ref_zir, c_zir);
}
test "astgen: corpus" {
if (true) return error.SkipZigTest; // TODO: parser_test.zig fails
const gpa = std.testing.allocator;
var any_fail = false;
inline for (corpus_files) |entry| {
corpusCheck(gpa, entry[1]) catch {
any_fail = true;
};
}
if (any_fail) return error.ZirMismatch;
}

248
stage0/build.zig Normal file
View File

@@ -0,0 +1,248 @@
const std = @import("std");
const builtin = @import("builtin");
const headers = &[_][]const u8{
"common.h",
"ast.h",
"parser.h",
"zir.h",
"astgen.h",
};
const c_lib_files = &[_][]const u8{
"tokenizer.c",
"ast.c",
"zig0.c",
"parser.c",
"zir.c",
"astgen.c",
};
const all_c_files = c_lib_files ++ &[_][]const u8{"main.c"};
const cflags = &[_][]const u8{
"-std=c11",
"-Wall",
"-Wvla",
"-Wextra",
"-Werror",
"-Wshadow",
"-Wswitch",
"-Walloca",
"-Wformat=2",
"-fno-common",
"-Wconversion",
"-Wuninitialized",
"-Wdouble-promotion",
"-fstack-protector-all",
"-Wimplicit-fallthrough",
"-Wno-unused-function", // TODO remove once refactoring is done
//"-D_FORTIFY_SOURCE=2", // consider when optimization flags are enabled
};
const compilers = &[_][]const u8{ "zig", "clang", "gcc", "tcc" };
pub fn build(b: *std.Build) !void {
const optimize = b.standardOptimizeOption(.{});
const cc = b.option([]const u8, "cc", "C compiler") orelse "zig";
const no_exec = b.option(bool, "no-exec", "Compile test binary without running it") orelse false;
const valgrind = b.option(bool, "valgrind", "Run tests under valgrind") orelse false;
const test_timeout = b.option([]const u8, "test-timeout", "Test execution timeout (default: 10s, none with valgrind)");
const target = blk: {
var query = b.standardTargetOptionsQueryOnly(.{});
if (valgrind) {
const arch = query.cpu_arch orelse builtin.cpu.arch;
if (arch == .x86_64) {
query.cpu_features_sub.addFeature(@intFromEnum(std.Target.x86.Feature.avx512f));
}
}
break :blk b.resolveTargetQuery(query);
};
const test_step = b.step("test", "Run unit tests");
addTestStep(b, test_step, target, optimize, cc, no_exec, valgrind, test_timeout);
const fmt_step = b.step("fmt", "clang-format");
const clang_format = b.addSystemCommand(&.{ "clang-format", "-i" });
for (all_c_files ++ headers) |f| clang_format.addFileArg(b.path(f));
fmt_step.dependOn(&clang_format.step);
const lint_step = b.step("lint", "Run linters");
for (all_c_files) |cfile| {
const clang_analyze = b.addSystemCommand(&.{
"clang",
"--analyze",
"--analyzer-output",
"text",
"-Wno-unused-command-line-argument",
"-Werror",
// false positive in astgen.c comptimeDecl: analyzer cannot track
// scratch_instructions ownership through pointer parameters.
"-Xclang",
"-analyzer-disable-checker",
"-Xclang",
"unix.Malloc",
});
clang_analyze.addFileArg(b.path(cfile));
clang_analyze.expectExitCode(0);
lint_step.dependOn(&clang_analyze.step);
// TODO(motiejus) re-enable once project
// nears completion. Takes too long for comfort.
//const gcc_analyze = b.addSystemCommand(&.{
// "gcc",
// "-c",
// "--analyzer",
// "-Werror",
// "-o",
// "/dev/null",
//});
//gcc_analyze.addFileArg(b.path(cfile));
//gcc_analyze.expectExitCode(0);
//lint_step.dependOn(&gcc_analyze.step);
const cppcheck = b.addSystemCommand(&.{
"cppcheck",
"--quiet",
"--error-exitcode=1",
"--check-level=exhaustive",
"--enable=all",
"--inline-suppr",
"--suppress=missingIncludeSystem",
"--suppress=checkersReport",
"--suppress=unusedFunction", // TODO remove after plumbing is done
"--suppress=unusedStructMember", // TODO remove after plumbing is done
"--suppress=unmatchedSuppression",
});
cppcheck.addFileArg(b.path(cfile));
cppcheck.expectExitCode(0);
lint_step.dependOn(&cppcheck.step);
}
const fmt_check = b.addSystemCommand(&.{ "clang-format", "--dry-run", "-Werror" });
for (all_c_files ++ headers) |f| fmt_check.addFileArg(b.path(f));
fmt_check.expectExitCode(0);
b.default_step.dependOn(&fmt_check.step);
for (compilers) |compiler| {
addTestStep(b, b.default_step, target, optimize, compiler, false, valgrind, test_timeout);
}
const all_step = b.step("all", "Run fmt check, lint, and tests with all compilers");
all_step.dependOn(b.default_step);
all_step.dependOn(lint_step);
}
fn addTestStep(
b: *std.Build,
step: *std.Build.Step,
target: std.Build.ResolvedTarget,
optimize: std.builtin.OptimizeMode,
cc: []const u8,
no_exec: bool,
valgrind: bool,
test_timeout: ?[]const u8,
) void {
const test_mod = b.createModule(.{
.root_source_file = b.path("test_all.zig"),
.optimize = optimize,
.target = target,
});
test_mod.addIncludePath(b.path("."));
// TODO(zig 0.16+): remove this if block entirely; keep only the addLibrary branch.
// Also delete addCObjectsDirectly.
// Zig 0.15's ELF archive parser fails on archives containing odd-sized objects
// (off-by-one after 2-byte alignment). This is fixed on zig master/0.16.
if (comptime builtin.zig_version.order(.{ .major = 0, .minor = 16, .patch = 0 }) == .lt) {
addCObjectsDirectly(b, test_mod, cc, optimize);
} else {
const lib_mod = b.createModule(.{
.optimize = optimize,
.target = target,
.link_libc = true,
});
const lib = b.addLibrary(.{
.name = b.fmt("zig0-{s}", .{cc}),
.root_module = lib_mod,
});
addCSources(b, lib.root_module, cc, optimize);
test_mod.linkLibrary(lib);
}
const test_exe = b.addTest(.{
.root_module = test_mod,
.use_llvm = false,
.use_lld = false,
});
const timeout: ?[]const u8 = test_timeout orelse if (valgrind) null else "10";
if (valgrind) {
if (timeout) |t|
test_exe.setExecCmd(&.{
"timeout",
t,
"valgrind",
"--error-exitcode=2",
"--leak-check=full",
"--show-leak-kinds=all",
"--errors-for-leak-kinds=all",
"--track-fds=yes",
null,
})
else
test_exe.setExecCmd(&.{
"valgrind",
"--error-exitcode=2",
"--leak-check=full",
"--show-leak-kinds=all",
"--errors-for-leak-kinds=all",
"--track-fds=yes",
null,
});
} else {
test_exe.setExecCmd(&.{ "timeout", timeout orelse "10", null });
}
if (no_exec) {
const install = b.addInstallArtifact(test_exe, .{});
step.dependOn(&install.step);
} else {
step.dependOn(&b.addRunArtifact(test_exe).step);
}
}
fn addCSources(
b: *std.Build,
mod: *std.Build.Module,
cc: []const u8,
optimize: std.builtin.OptimizeMode,
) void {
if (std.mem.eql(u8, cc, "zig")) {
mod.addCSourceFiles(.{ .files = c_lib_files, .flags = cflags });
} else for (c_lib_files) |cfile| {
const cc1 = b.addSystemCommand(&.{cc});
cc1.addArgs(cflags ++ .{"-g"});
cc1.addArg(switch (optimize) {
.Debug => "-O0",
.ReleaseFast, .ReleaseSafe => "-O3",
.ReleaseSmall => "-Os",
});
cc1.addArg("-c");
cc1.addFileArg(b.path(cfile));
cc1.addArg("-o");
mod.addObjectFile(cc1.addOutputFileArg(b.fmt("{s}.o", .{cfile[0 .. cfile.len - 2]})));
}
}
// TODO(zig 0.16+): delete this function.
fn addCObjectsDirectly(
b: *std.Build,
mod: *std.Build.Module,
cc: []const u8,
optimize: std.builtin.OptimizeMode,
) void {
addCSources(b, mod, cc, optimize);
mod.linkSystemLibrary("c", .{});
}

143
stage0/check_test_order.py Normal file
View File

@@ -0,0 +1,143 @@
#!/usr/bin/env python3
"""Check and optionally fix test order in parser_test.zig to match upstream."""
import re
import sys
OURS = "parser_test.zig"
UPSTREAM = "../zig/lib/std/zig/parser_test.zig"
def extract_test_names(path):
with open(path) as f:
return re.findall(r'^test "(.+?)" \{', f.read(), re.M)
def extract_test_blocks(path):
"""Split file into: header, list of (name, content) test blocks, footer."""
with open(path) as f:
lines = f.readlines()
header = []
footer = []
blocks = []
current_name = None
current_lines = []
brace_depth = 0
in_test = False
found_first_test = False
for line in lines:
m = re.match(r'^test "(.+?)" \{', line)
if m and not in_test:
found_first_test = True
if current_name is not None:
blocks.append((current_name, "".join(current_lines)))
current_name = m.group(1)
current_lines = [line]
brace_depth = 1
in_test = True
continue
if in_test:
current_lines.append(line)
brace_depth += line.count("{") - line.count("}")
if brace_depth == 0:
in_test = False
elif not found_first_test:
header.append(line)
else:
# Non-test content after tests started — could be blank lines
# between tests or footer content
if current_name is not None:
# Append to previous test block as trailing content
current_lines.append(line)
else:
footer.append(line)
if current_name is not None:
blocks.append((current_name, "".join(current_lines)))
# Anything after the last test block is footer
# Split last block's trailing non-test content into footer
if blocks:
last_name, last_content = blocks[-1]
last_lines = last_content.split('\n')
# Find where the test block ends (} at column 0)
test_end = len(last_lines)
for i, line in enumerate(last_lines):
if line == '}' and i > 0:
test_end = i + 1
if test_end < len(last_lines):
blocks[-1] = (last_name, '\n'.join(last_lines[:test_end]) + '\n')
footer = ['\n'.join(last_lines[test_end:]) + '\n'] + footer
return "".join(header), blocks, "".join(footer)
def main():
fix = "--fix" in sys.argv
upstream_order = extract_test_names(UPSTREAM)
our_names = extract_test_names(OURS)
# Build position map for upstream
upstream_pos = {name: i for i, name in enumerate(upstream_order)}
# Check order
our_in_upstream = [n for n in our_names if n in upstream_pos]
positions = [upstream_pos[n] for n in our_in_upstream]
is_sorted = positions == sorted(positions)
if is_sorted:
print(f"OK: {len(our_names)} tests in correct order")
return 0
# Find out-of-order tests
out_of_order = []
prev_pos = -1
for name in our_in_upstream:
pos = upstream_pos[name]
if pos < prev_pos:
out_of_order.append(name)
prev_pos = max(prev_pos, pos)
print(f"WARN: {len(out_of_order)} tests out of order:")
for name in out_of_order[:10]:
print(f" - {name}")
if len(out_of_order) > 10:
print(f" ... and {len(out_of_order) - 10} more")
if not fix:
print("\nRun with --fix to reorder")
return 1
# Fix: reorder
header, blocks, footer = extract_test_blocks(OURS)
block_map = {name: content for name, content in blocks}
# Reorder: upstream-ordered first, then extras
ordered = []
seen = set()
for name in upstream_order:
if name in block_map and name not in seen:
ordered.append((name, block_map[name]))
seen.add(name)
for name, content in blocks:
if name not in seen:
ordered.append((name, content))
seen.add(name)
with open(OURS, "w") as f:
f.write(header)
for _, content in ordered:
f.write("\n")
f.write(content)
f.write(footer)
print(f"Fixed: {len(ordered)} tests reordered")
return 0
if __name__ == "__main__":
sys.exit(main())

54
stage0/common.h Normal file
View File

@@ -0,0 +1,54 @@
// common.h — must be included before any system headers.
#ifndef _ZIG0_COMMON_H__
#define _ZIG0_COMMON_H__
#include <stdint.h>
#include <stdlib.h>
#define SLICE(Type) \
struct Type##Slice { \
uint32_t len; \
uint32_t cap; \
Type* arr; \
}
#define ARR_INIT(Type, initial_cap) \
({ \
Type* arr = calloc(initial_cap, sizeof(Type)); \
if (!arr) \
exit(1); \
arr; \
})
#define SLICE_INIT(Type, initial_cap) \
{ .len = 0, .cap = (initial_cap), .arr = ARR_INIT(Type, initial_cap) }
#define SLICE_RESIZE(Type, slice, new_cap) \
({ \
const uint32_t cap = (new_cap); \
Type* new_arr = realloc((slice)->arr, cap * sizeof(Type)); \
if (new_arr == NULL) { \
free((slice)->arr); \
exit(1); \
} \
(slice)->arr = new_arr; \
(slice)->cap = cap; \
})
#define SLICE_ENSURE_CAPACITY(Type, slice, additional) \
({ \
if ((slice)->len + (additional) > (slice)->cap) { \
SLICE_RESIZE(Type, slice, \
((slice)->cap * 2 > (slice)->len + (additional)) \
? (slice)->cap * 2 \
: (slice)->len + (additional)); \
} \
})
#define SLICE_APPEND(Type, slice, item) \
({ \
SLICE_ENSURE_CAPACITY(Type, slice, 1); \
(slice)->arr[(slice)->len++] = (item); \
})
#endif

39
stage0/main.c Normal file

@@ -0,0 +1,39 @@
#include "common.h"
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
int zig0Run(char* program, char** msg);
int zig0RunFile(char* fname, char** msg);
static void usage(const char* argv0) {
fprintf(stderr, "Usage: %s program.zig\n", argv0);
}
int main(int argc, char** argv) {
if (argc != 2) {
usage(argv[0]);
return 1;
}
char* msg;
switch (zig0RunFile(argv[1], &msg)) {
case 0:
return 0;
case 1:
fprintf(stderr, "panic: %s\n", msg);
free(msg);
return 0;
case 2:
fprintf(stderr, "interpreter error: %s\n", msg);
free(msg);
return 1;
case 3:
return 1;
}
}

3458
stage0/parser.c Normal file

File diff suppressed because it is too large

44
stage0/parser.h Normal file

@@ -0,0 +1,44 @@
// parser.h
#ifndef _ZIG0_PARSE_H__
#define _ZIG0_PARSE_H__
#include "ast.h"
#include "common.h"
#include <setjmp.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
typedef struct {
const char* source;
uint32_t source_len;
TokenizerTag* token_tags;
AstIndex* token_starts;
uint32_t tokens_len;
AstTokenIndex tok_i;
AstNodeList nodes;
AstNodeIndexSlice extra_data;
AstNodeIndexSlice scratch;
jmp_buf error_jmp;
char* err_buf;
} Parser;
#define PARSE_ERR_BUF_SIZE 200
_Noreturn static inline void fail(Parser* p, const char* msg) {
size_t len = strlen(msg);
if (len >= PARSE_ERR_BUF_SIZE)
len = PARSE_ERR_BUF_SIZE - 1;
memcpy(p->err_buf, msg, len);
p->err_buf[len] = '\0';
longjmp(p->error_jmp, 1);
}
Parser* parserInit(const char* source, uint32_t len);
void parserDeinit(Parser* parser);
void parseRoot(Parser* parser);
#endif

7021
stage0/parser_test.zig Normal file

File diff suppressed because it is too large

5
stage0/test_all.zig Normal file

@@ -0,0 +1,5 @@
test "zig0 test suite" {
_ = @import("tokenizer_test.zig");
_ = @import("parser_test.zig");
_ = @import("astgen_test.zig");
}

1096
stage0/tokenizer.c Normal file

File diff suppressed because it is too large

204
stage0/tokenizer.h Normal file

@@ -0,0 +1,204 @@
#ifndef _ZIG0_TOKENIZER_H__
#define _ZIG0_TOKENIZER_H__
#include <stdbool.h>
#include <stdint.h>
#define TOKENIZER_FOREACH_TAG_ENUM(TAG) \
TAG(TOKEN_INVALID) \
TAG(TOKEN_INVALID_PERIODASTERISKS) \
TAG(TOKEN_IDENTIFIER) \
TAG(TOKEN_STRING_LITERAL) \
TAG(TOKEN_MULTILINE_STRING_LITERAL_LINE) \
TAG(TOKEN_CHAR_LITERAL) \
TAG(TOKEN_EOF) \
TAG(TOKEN_BUILTIN) \
TAG(TOKEN_BANG) \
TAG(TOKEN_PIPE) \
TAG(TOKEN_PIPE_PIPE) \
TAG(TOKEN_PIPE_EQUAL) \
TAG(TOKEN_EQUAL) \
TAG(TOKEN_EQUAL_EQUAL) \
TAG(TOKEN_EQUAL_ANGLE_BRACKET_RIGHT) \
TAG(TOKEN_BANG_EQUAL) \
TAG(TOKEN_L_PAREN) \
TAG(TOKEN_R_PAREN) \
TAG(TOKEN_SEMICOLON) \
TAG(TOKEN_PERCENT) \
TAG(TOKEN_PERCENT_EQUAL) \
TAG(TOKEN_L_BRACE) \
TAG(TOKEN_R_BRACE) \
TAG(TOKEN_L_BRACKET) \
TAG(TOKEN_R_BRACKET) \
TAG(TOKEN_PERIOD) \
TAG(TOKEN_PERIOD_ASTERISK) \
TAG(TOKEN_ELLIPSIS2) \
TAG(TOKEN_ELLIPSIS3) \
TAG(TOKEN_CARET) \
TAG(TOKEN_CARET_EQUAL) \
TAG(TOKEN_PLUS) \
TAG(TOKEN_PLUS_PLUS) \
TAG(TOKEN_PLUS_EQUAL) \
TAG(TOKEN_PLUS_PERCENT) \
TAG(TOKEN_PLUS_PERCENT_EQUAL) \
TAG(TOKEN_PLUS_PIPE) \
TAG(TOKEN_PLUS_PIPE_EQUAL) \
TAG(TOKEN_MINUS) \
TAG(TOKEN_MINUS_EQUAL) \
TAG(TOKEN_MINUS_PERCENT) \
TAG(TOKEN_MINUS_PERCENT_EQUAL) \
TAG(TOKEN_MINUS_PIPE) \
TAG(TOKEN_MINUS_PIPE_EQUAL) \
TAG(TOKEN_ASTERISK) \
TAG(TOKEN_ASTERISK_EQUAL) \
TAG(TOKEN_ASTERISK_ASTERISK) \
TAG(TOKEN_ASTERISK_PERCENT) \
TAG(TOKEN_ASTERISK_PERCENT_EQUAL) \
TAG(TOKEN_ASTERISK_PIPE) \
TAG(TOKEN_ASTERISK_PIPE_EQUAL) \
TAG(TOKEN_ARROW) \
TAG(TOKEN_COLON) \
TAG(TOKEN_SLASH) \
TAG(TOKEN_SLASH_EQUAL) \
TAG(TOKEN_COMMA) \
TAG(TOKEN_AMPERSAND) \
TAG(TOKEN_AMPERSAND_EQUAL) \
TAG(TOKEN_QUESTION_MARK) \
TAG(TOKEN_ANGLE_BRACKET_LEFT) \
TAG(TOKEN_ANGLE_BRACKET_LEFT_EQUAL) \
TAG(TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT) \
TAG(TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_EQUAL) \
TAG(TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_PIPE) \
TAG(TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_PIPE_EQUAL) \
TAG(TOKEN_ANGLE_BRACKET_RIGHT) \
TAG(TOKEN_ANGLE_BRACKET_RIGHT_EQUAL) \
TAG(TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_RIGHT) \
TAG(TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_RIGHT_EQUAL) \
TAG(TOKEN_TILDE) \
TAG(TOKEN_NUMBER_LITERAL) \
TAG(TOKEN_DOC_COMMENT) \
TAG(TOKEN_CONTAINER_DOC_COMMENT) \
TAG(TOKEN_KEYWORD_ADDRSPACE) \
TAG(TOKEN_KEYWORD_ALIGN) \
TAG(TOKEN_KEYWORD_ALLOWZERO) \
TAG(TOKEN_KEYWORD_AND) \
TAG(TOKEN_KEYWORD_ANYFRAME) \
TAG(TOKEN_KEYWORD_ANYTYPE) \
TAG(TOKEN_KEYWORD_ASM) \
TAG(TOKEN_KEYWORD_BREAK) \
TAG(TOKEN_KEYWORD_CALLCONV) \
TAG(TOKEN_KEYWORD_CATCH) \
TAG(TOKEN_KEYWORD_COMPTIME) \
TAG(TOKEN_KEYWORD_CONST) \
TAG(TOKEN_KEYWORD_CONTINUE) \
TAG(TOKEN_KEYWORD_DEFER) \
TAG(TOKEN_KEYWORD_ELSE) \
TAG(TOKEN_KEYWORD_ENUM) \
TAG(TOKEN_KEYWORD_ERRDEFER) \
TAG(TOKEN_KEYWORD_ERROR) \
TAG(TOKEN_KEYWORD_EXPORT) \
TAG(TOKEN_KEYWORD_EXTERN) \
TAG(TOKEN_KEYWORD_FN) \
TAG(TOKEN_KEYWORD_FOR) \
TAG(TOKEN_KEYWORD_IF) \
TAG(TOKEN_KEYWORD_INLINE) \
TAG(TOKEN_KEYWORD_NOALIAS) \
TAG(TOKEN_KEYWORD_NOINLINE) \
TAG(TOKEN_KEYWORD_NOSUSPEND) \
TAG(TOKEN_KEYWORD_OPAQUE) \
TAG(TOKEN_KEYWORD_OR) \
TAG(TOKEN_KEYWORD_ORELSE) \
TAG(TOKEN_KEYWORD_PACKED) \
TAG(TOKEN_KEYWORD_PUB) \
TAG(TOKEN_KEYWORD_RESUME) \
TAG(TOKEN_KEYWORD_RETURN) \
TAG(TOKEN_KEYWORD_LINKSECTION) \
TAG(TOKEN_KEYWORD_STRUCT) \
TAG(TOKEN_KEYWORD_SUSPEND) \
TAG(TOKEN_KEYWORD_SWITCH) \
TAG(TOKEN_KEYWORD_TEST) \
TAG(TOKEN_KEYWORD_THREADLOCAL) \
TAG(TOKEN_KEYWORD_TRY) \
TAG(TOKEN_KEYWORD_UNION) \
TAG(TOKEN_KEYWORD_UNREACHABLE) \
TAG(TOKEN_KEYWORD_VAR) \
TAG(TOKEN_KEYWORD_VOLATILE) \
TAG(TOKEN_KEYWORD_WHILE)
#define TOKENIZER_GENERATE_ENUM(ENUM) ENUM,
#define TOKENIZER_GENERATE_CASE(ENUM) \
case ENUM: \
return #ENUM;
// First define the enum
typedef enum {
TOKENIZER_FOREACH_TAG_ENUM(TOKENIZER_GENERATE_ENUM)
} TokenizerTag;
const char* tokenizerGetTagString(TokenizerTag tag);
typedef enum {
TOKENIZER_STATE_START,
TOKENIZER_STATE_EXPECT_NEWLINE,
TOKENIZER_STATE_IDENTIFIER,
TOKENIZER_STATE_BUILTIN,
TOKENIZER_STATE_STRING_LITERAL,
TOKENIZER_STATE_STRING_LITERAL_BACKSLASH,
TOKENIZER_STATE_MULTILINE_STRING_LITERAL_LINE,
TOKENIZER_STATE_CHAR_LITERAL,
TOKENIZER_STATE_CHAR_LITERAL_BACKSLASH,
TOKENIZER_STATE_BACKSLASH,
TOKENIZER_STATE_EQUAL,
TOKENIZER_STATE_BANG,
TOKENIZER_STATE_PIPE,
TOKENIZER_STATE_MINUS,
TOKENIZER_STATE_MINUS_PERCENT,
TOKENIZER_STATE_MINUS_PIPE,
TOKENIZER_STATE_ASTERISK,
TOKENIZER_STATE_ASTERISK_PERCENT,
TOKENIZER_STATE_ASTERISK_PIPE,
TOKENIZER_STATE_SLASH,
TOKENIZER_STATE_LINE_COMMENT_START,
TOKENIZER_STATE_LINE_COMMENT,
TOKENIZER_STATE_DOC_COMMENT_START,
TOKENIZER_STATE_DOC_COMMENT,
TOKENIZER_STATE_INT,
TOKENIZER_STATE_INT_EXPONENT,
TOKENIZER_STATE_INT_PERIOD,
TOKENIZER_STATE_FLOAT,
TOKENIZER_STATE_FLOAT_EXPONENT,
TOKENIZER_STATE_AMPERSAND,
TOKENIZER_STATE_CARET,
TOKENIZER_STATE_PERCENT,
TOKENIZER_STATE_PLUS,
TOKENIZER_STATE_PLUS_PERCENT,
TOKENIZER_STATE_PLUS_PIPE,
TOKENIZER_STATE_ANGLE_BRACKET_LEFT,
TOKENIZER_STATE_ANGLE_BRACKET_ANGLE_BRACKET_LEFT,
TOKENIZER_STATE_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_PIPE,
TOKENIZER_STATE_ANGLE_BRACKET_RIGHT,
TOKENIZER_STATE_ANGLE_BRACKET_ANGLE_BRACKET_RIGHT,
TOKENIZER_STATE_PERIOD,
TOKENIZER_STATE_PERIOD_2,
TOKENIZER_STATE_PERIOD_ASTERISK,
TOKENIZER_STATE_SAW_AT_SIGN,
TOKENIZER_STATE_INVALID,
} TokenizerState;
typedef struct {
TokenizerTag tag;
struct {
uint32_t start, end;
} loc;
} TokenizerToken;
typedef struct {
const char* buffer;
const uint32_t buffer_len;
uint32_t index;
} Tokenizer;
Tokenizer tokenizerInit(const char* buffer, uint32_t len);
TokenizerToken tokenizerNext(Tokenizer* self);
#endif

767
stage0/tokenizer_test.zig Normal file

@@ -0,0 +1,767 @@
const std = @import("std");
const testing = std.testing;
const Token = std.zig.Token;
const Tokenizer = std.zig.Tokenizer;
const c = @cImport({
@cInclude("tokenizer.h");
});
pub fn zigToken(token: c_uint) Token.Tag {
return switch (token) {
c.TOKEN_INVALID => .invalid,
c.TOKEN_INVALID_PERIODASTERISKS => .invalid_periodasterisks,
c.TOKEN_IDENTIFIER => .identifier,
c.TOKEN_STRING_LITERAL => .string_literal,
c.TOKEN_MULTILINE_STRING_LITERAL_LINE => .multiline_string_literal_line,
c.TOKEN_CHAR_LITERAL => .char_literal,
c.TOKEN_EOF => .eof,
c.TOKEN_BUILTIN => .builtin,
c.TOKEN_BANG => .bang,
c.TOKEN_PIPE => .pipe,
c.TOKEN_PIPE_PIPE => .pipe_pipe,
c.TOKEN_PIPE_EQUAL => .pipe_equal,
c.TOKEN_EQUAL => .equal,
c.TOKEN_EQUAL_EQUAL => .equal_equal,
c.TOKEN_EQUAL_ANGLE_BRACKET_RIGHT => .equal_angle_bracket_right,
c.TOKEN_BANG_EQUAL => .bang_equal,
c.TOKEN_L_PAREN => .l_paren,
c.TOKEN_R_PAREN => .r_paren,
c.TOKEN_SEMICOLON => .semicolon,
c.TOKEN_PERCENT => .percent,
c.TOKEN_PERCENT_EQUAL => .percent_equal,
c.TOKEN_L_BRACE => .l_brace,
c.TOKEN_R_BRACE => .r_brace,
c.TOKEN_L_BRACKET => .l_bracket,
c.TOKEN_R_BRACKET => .r_bracket,
c.TOKEN_PERIOD => .period,
c.TOKEN_PERIOD_ASTERISK => .period_asterisk,
c.TOKEN_ELLIPSIS2 => .ellipsis2,
c.TOKEN_ELLIPSIS3 => .ellipsis3,
c.TOKEN_CARET => .caret,
c.TOKEN_CARET_EQUAL => .caret_equal,
c.TOKEN_PLUS => .plus,
c.TOKEN_PLUS_PLUS => .plus_plus,
c.TOKEN_PLUS_EQUAL => .plus_equal,
c.TOKEN_PLUS_PERCENT => .plus_percent,
c.TOKEN_PLUS_PERCENT_EQUAL => .plus_percent_equal,
c.TOKEN_PLUS_PIPE => .plus_pipe,
c.TOKEN_PLUS_PIPE_EQUAL => .plus_pipe_equal,
c.TOKEN_MINUS => .minus,
c.TOKEN_MINUS_EQUAL => .minus_equal,
c.TOKEN_MINUS_PERCENT => .minus_percent,
c.TOKEN_MINUS_PERCENT_EQUAL => .minus_percent_equal,
c.TOKEN_MINUS_PIPE => .minus_pipe,
c.TOKEN_MINUS_PIPE_EQUAL => .minus_pipe_equal,
c.TOKEN_ASTERISK => .asterisk,
c.TOKEN_ASTERISK_EQUAL => .asterisk_equal,
c.TOKEN_ASTERISK_ASTERISK => .asterisk_asterisk,
c.TOKEN_ASTERISK_PERCENT => .asterisk_percent,
c.TOKEN_ASTERISK_PERCENT_EQUAL => .asterisk_percent_equal,
c.TOKEN_ASTERISK_PIPE => .asterisk_pipe,
c.TOKEN_ASTERISK_PIPE_EQUAL => .asterisk_pipe_equal,
c.TOKEN_ARROW => .arrow,
c.TOKEN_COLON => .colon,
c.TOKEN_SLASH => .slash,
c.TOKEN_SLASH_EQUAL => .slash_equal,
c.TOKEN_COMMA => .comma,
c.TOKEN_AMPERSAND => .ampersand,
c.TOKEN_AMPERSAND_EQUAL => .ampersand_equal,
c.TOKEN_QUESTION_MARK => .question_mark,
c.TOKEN_ANGLE_BRACKET_LEFT => .angle_bracket_left,
c.TOKEN_ANGLE_BRACKET_LEFT_EQUAL => .angle_bracket_left_equal,
c.TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT => .angle_bracket_angle_bracket_left,
c.TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_EQUAL => .angle_bracket_angle_bracket_left_equal,
c.TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_PIPE => .angle_bracket_angle_bracket_left_pipe,
c.TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_LEFT_PIPE_EQUAL => .angle_bracket_angle_bracket_left_pipe_equal,
c.TOKEN_ANGLE_BRACKET_RIGHT => .angle_bracket_right,
c.TOKEN_ANGLE_BRACKET_RIGHT_EQUAL => .angle_bracket_right_equal,
c.TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_RIGHT => .angle_bracket_angle_bracket_right,
c.TOKEN_ANGLE_BRACKET_ANGLE_BRACKET_RIGHT_EQUAL => .angle_bracket_angle_bracket_right_equal,
c.TOKEN_TILDE => .tilde,
c.TOKEN_NUMBER_LITERAL => .number_literal,
c.TOKEN_DOC_COMMENT => .doc_comment,
c.TOKEN_CONTAINER_DOC_COMMENT => .container_doc_comment,
c.TOKEN_KEYWORD_ADDRSPACE => .keyword_addrspace,
c.TOKEN_KEYWORD_ALIGN => .keyword_align,
c.TOKEN_KEYWORD_ALLOWZERO => .keyword_allowzero,
c.TOKEN_KEYWORD_AND => .keyword_and,
c.TOKEN_KEYWORD_ANYFRAME => .keyword_anyframe,
c.TOKEN_KEYWORD_ANYTYPE => .keyword_anytype,
c.TOKEN_KEYWORD_ASM => .keyword_asm,
c.TOKEN_KEYWORD_BREAK => .keyword_break,
c.TOKEN_KEYWORD_CALLCONV => .keyword_callconv,
c.TOKEN_KEYWORD_CATCH => .keyword_catch,
c.TOKEN_KEYWORD_COMPTIME => .keyword_comptime,
c.TOKEN_KEYWORD_CONST => .keyword_const,
c.TOKEN_KEYWORD_CONTINUE => .keyword_continue,
c.TOKEN_KEYWORD_DEFER => .keyword_defer,
c.TOKEN_KEYWORD_ELSE => .keyword_else,
c.TOKEN_KEYWORD_ENUM => .keyword_enum,
c.TOKEN_KEYWORD_ERRDEFER => .keyword_errdefer,
c.TOKEN_KEYWORD_ERROR => .keyword_error,
c.TOKEN_KEYWORD_EXPORT => .keyword_export,
c.TOKEN_KEYWORD_EXTERN => .keyword_extern,
c.TOKEN_KEYWORD_FN => .keyword_fn,
c.TOKEN_KEYWORD_FOR => .keyword_for,
c.TOKEN_KEYWORD_IF => .keyword_if,
c.TOKEN_KEYWORD_INLINE => .keyword_inline,
c.TOKEN_KEYWORD_NOALIAS => .keyword_noalias,
c.TOKEN_KEYWORD_NOINLINE => .keyword_noinline,
c.TOKEN_KEYWORD_NOSUSPEND => .keyword_nosuspend,
c.TOKEN_KEYWORD_OPAQUE => .keyword_opaque,
c.TOKEN_KEYWORD_OR => .keyword_or,
c.TOKEN_KEYWORD_ORELSE => .keyword_orelse,
c.TOKEN_KEYWORD_PACKED => .keyword_packed,
c.TOKEN_KEYWORD_PUB => .keyword_pub,
c.TOKEN_KEYWORD_RESUME => .keyword_resume,
c.TOKEN_KEYWORD_RETURN => .keyword_return,
c.TOKEN_KEYWORD_LINKSECTION => .keyword_linksection,
c.TOKEN_KEYWORD_STRUCT => .keyword_struct,
c.TOKEN_KEYWORD_SUSPEND => .keyword_suspend,
c.TOKEN_KEYWORD_SWITCH => .keyword_switch,
c.TOKEN_KEYWORD_TEST => .keyword_test,
c.TOKEN_KEYWORD_THREADLOCAL => .keyword_threadlocal,
c.TOKEN_KEYWORD_TRY => .keyword_try,
c.TOKEN_KEYWORD_UNION => .keyword_union,
c.TOKEN_KEYWORD_UNREACHABLE => .keyword_unreachable,
c.TOKEN_KEYWORD_VAR => .keyword_var,
c.TOKEN_KEYWORD_VOLATILE => .keyword_volatile,
c.TOKEN_KEYWORD_WHILE => .keyword_while,
else => undefined,
};
}
// Copy-pasted from lib/std/zig/tokenizer.zig
fn testTokenize(source: [:0]const u8, expected_token_tags: []const Token.Tag) !void {
// Do the C thing
{
var ctokenizer = c.tokenizerInit(source.ptr, @intCast(source.len));
for (expected_token_tags) |expected_token_tag| {
const token = c.tokenizerNext(&ctokenizer);
try std.testing.expectEqual(expected_token_tag, zigToken(token.tag));
}
const last_token = c.tokenizerNext(&ctokenizer);
try std.testing.expectEqual(Token.Tag.eof, zigToken(last_token.tag));
}
{
var tokenizer = Tokenizer.init(source);
for (expected_token_tags) |expected_token_tag| {
const token = tokenizer.next();
try std.testing.expectEqual(expected_token_tag, token.tag);
}
// Last token should always be eof, even when the last token was invalid,
// in which case the tokenizer is in an invalid state, which can only be
// recovered by opinionated means outside the scope of this implementation.
const last_token = tokenizer.next();
try std.testing.expectEqual(Token.Tag.eof, last_token.tag);
try std.testing.expectEqual(source.len, last_token.loc.start);
try std.testing.expectEqual(source.len, last_token.loc.end);
}
}
test "keywords" {
try testTokenize("test const else", &.{ .keyword_test, .keyword_const, .keyword_else });
}
test "line comment followed by top-level comptime" {
try testTokenize(
\\// line comment
\\comptime {}
\\
, &.{
.keyword_comptime,
.l_brace,
.r_brace,
});
}
test "unknown length pointer and then c pointer" {
try testTokenize(
\\[*]u8
\\[*c]u8
, &.{
.l_bracket,
.asterisk,
.r_bracket,
.identifier,
.l_bracket,
.asterisk,
.identifier,
.r_bracket,
.identifier,
});
}
test "code point literal with hex escape" {
try testTokenize(
\\'\x1b'
, &.{.char_literal});
try testTokenize(
\\'\x1'
, &.{.char_literal});
}
test "newline in char literal" {
try testTokenize(
\\'
\\'
, &.{ .invalid, .invalid });
}
test "newline in string literal" {
try testTokenize(
\\"
\\"
, &.{ .invalid, .invalid });
}
test "code point literal with unicode escapes" {
// Valid unicode escapes
try testTokenize(
\\'\u{3}'
, &.{.char_literal});
try testTokenize(
\\'\u{01}'
, &.{.char_literal});
try testTokenize(
\\'\u{2a}'
, &.{.char_literal});
try testTokenize(
\\'\u{3f9}'
, &.{.char_literal});
try testTokenize(
\\'\u{6E09aBc1523}'
, &.{.char_literal});
try testTokenize(
\\"\u{440}"
, &.{.string_literal});
// Invalid unicode escapes
try testTokenize(
\\'\u'
, &.{.char_literal});
try testTokenize(
\\'\u{{'
, &.{.char_literal});
try testTokenize(
\\'\u{}'
, &.{.char_literal});
try testTokenize(
\\'\u{s}'
, &.{.char_literal});
try testTokenize(
\\'\u{2z}'
, &.{.char_literal});
try testTokenize(
\\'\u{4a'
, &.{.char_literal});
// Test old-style unicode literals
try testTokenize(
\\'\u0333'
, &.{.char_literal});
try testTokenize(
\\'\U0333'
, &.{.char_literal});
}
test "code point literal with unicode code point" {
try testTokenize(
\\'💩'
, &.{.char_literal});
}
test "float literal e exponent" {
try testTokenize("a = 4.94065645841246544177e-324;\n", &.{
.identifier,
.equal,
.number_literal,
.semicolon,
});
}
test "float literal p exponent" {
try testTokenize("a = 0x1.a827999fcef32p+1022;\n", &.{
.identifier,
.equal,
.number_literal,
.semicolon,
});
}
test "chars" {
try testTokenize("'c'", &.{.char_literal});
}
test "invalid token characters" {
try testTokenize("#", &.{.invalid});
try testTokenize("`", &.{.invalid});
try testTokenize("'c", &.{.invalid});
try testTokenize("'", &.{.invalid});
try testTokenize("''", &.{.char_literal});
try testTokenize("'\n'", &.{ .invalid, .invalid });
}
test "invalid literal/comment characters" {
try testTokenize("\"\x00\"", &.{.invalid});
try testTokenize("`\x00`", &.{.invalid});
try testTokenize("//\x00", &.{.invalid});
try testTokenize("//\x1f", &.{.invalid});
try testTokenize("//\x7f", &.{.invalid});
}
test "utf8" {
try testTokenize("//\xc2\x80", &.{});
try testTokenize("//\xf4\x8f\xbf\xbf", &.{});
}
test "invalid utf8" {
try testTokenize("//\x80", &.{});
try testTokenize("//\xbf", &.{});
try testTokenize("//\xf8", &.{});
try testTokenize("//\xff", &.{});
try testTokenize("//\xc2\xc0", &.{});
try testTokenize("//\xe0", &.{});
try testTokenize("//\xf0", &.{});
try testTokenize("//\xf0\x90\x80\xc0", &.{});
}
test "illegal unicode codepoints" {
// unicode newline characters: U+0085, U+2028, U+2029
try testTokenize("//\xc2\x84", &.{});
try testTokenize("//\xc2\x85", &.{});
try testTokenize("//\xc2\x86", &.{});
try testTokenize("//\xe2\x80\xa7", &.{});
try testTokenize("//\xe2\x80\xa8", &.{});
try testTokenize("//\xe2\x80\xa9", &.{});
try testTokenize("//\xe2\x80\xaa", &.{});
}
test "string identifier and builtin fns" {
try testTokenize(
\\const @"if" = @import("std");
, &.{
.keyword_const,
.identifier,
.equal,
.builtin,
.l_paren,
.string_literal,
.r_paren,
.semicolon,
});
}
test "pipe and then invalid" {
try testTokenize("||=", &.{
.pipe_pipe,
.equal,
});
}
test "line comment and doc comment" {
try testTokenize("//", &.{});
try testTokenize("// a / b", &.{});
try testTokenize("// /", &.{});
try testTokenize("/// a", &.{.doc_comment});
try testTokenize("///", &.{.doc_comment});
try testTokenize("////", &.{});
try testTokenize("//!", &.{.container_doc_comment});
try testTokenize("//!!", &.{.container_doc_comment});
}
test "line comment followed by identifier" {
try testTokenize(
\\ Unexpected,
\\ // another
\\ Another,
, &.{
.identifier,
.comma,
.identifier,
.comma,
});
}
test "UTF-8 BOM is recognized and skipped" {
try testTokenize("\xEF\xBB\xBFa;\n", &.{
.identifier,
.semicolon,
});
}
test "correctly parse pointer assignment" {
try testTokenize("b.*=3;\n", &.{
.identifier,
.period_asterisk,
.equal,
.number_literal,
.semicolon,
});
}
test "correctly parse pointer dereference followed by asterisk" {
try testTokenize("\"b\".* ** 10", &.{
.string_literal,
.period_asterisk,
.asterisk_asterisk,
.number_literal,
});
try testTokenize("(\"b\".*)** 10", &.{
.l_paren,
.string_literal,
.period_asterisk,
.r_paren,
.asterisk_asterisk,
.number_literal,
});
try testTokenize("\"b\".*** 10", &.{
.string_literal,
.invalid_periodasterisks,
.asterisk_asterisk,
.number_literal,
});
}
test "range literals" {
try testTokenize("0...9", &.{ .number_literal, .ellipsis3, .number_literal });
try testTokenize("'0'...'9'", &.{ .char_literal, .ellipsis3, .char_literal });
try testTokenize("0x00...0x09", &.{ .number_literal, .ellipsis3, .number_literal });
try testTokenize("0b00...0b11", &.{ .number_literal, .ellipsis3, .number_literal });
try testTokenize("0o00...0o11", &.{ .number_literal, .ellipsis3, .number_literal });
}
test "number literals decimal" {
try testTokenize("0", &.{.number_literal});
try testTokenize("1", &.{.number_literal});
try testTokenize("2", &.{.number_literal});
try testTokenize("3", &.{.number_literal});
try testTokenize("4", &.{.number_literal});
try testTokenize("5", &.{.number_literal});
try testTokenize("6", &.{.number_literal});
try testTokenize("7", &.{.number_literal});
try testTokenize("8", &.{.number_literal});
try testTokenize("9", &.{.number_literal});
try testTokenize("1..", &.{ .number_literal, .ellipsis2 });
try testTokenize("0a", &.{.number_literal});
try testTokenize("9b", &.{.number_literal});
try testTokenize("1z", &.{.number_literal});
try testTokenize("1z_1", &.{.number_literal});
try testTokenize("9z3", &.{.number_literal});
try testTokenize("0_0", &.{.number_literal});
try testTokenize("0001", &.{.number_literal});
try testTokenize("01234567890", &.{.number_literal});
try testTokenize("012_345_6789_0", &.{.number_literal});
try testTokenize("0_1_2_3_4_5_6_7_8_9_0", &.{.number_literal});
try testTokenize("00_", &.{.number_literal});
try testTokenize("0_0_", &.{.number_literal});
try testTokenize("0__0", &.{.number_literal});
try testTokenize("0_0f", &.{.number_literal});
try testTokenize("0_0_f", &.{.number_literal});
try testTokenize("0_0_f_00", &.{.number_literal});
try testTokenize("1_,", &.{ .number_literal, .comma });
try testTokenize("0.0", &.{.number_literal});
try testTokenize("1.0", &.{.number_literal});
try testTokenize("10.0", &.{.number_literal});
try testTokenize("0e0", &.{.number_literal});
try testTokenize("1e0", &.{.number_literal});
try testTokenize("1e100", &.{.number_literal});
try testTokenize("1.0e100", &.{.number_literal});
try testTokenize("1.0e+100", &.{.number_literal});
try testTokenize("1.0e-100", &.{.number_literal});
try testTokenize("1_0_0_0.0_0_0_0_0_1e1_0_0_0", &.{.number_literal});
try testTokenize("1.", &.{ .number_literal, .period });
try testTokenize("1e", &.{.number_literal});
try testTokenize("1.e100", &.{.number_literal});
try testTokenize("1.0e1f0", &.{.number_literal});
try testTokenize("1.0p100", &.{.number_literal});
try testTokenize("1.0p-100", &.{.number_literal});
try testTokenize("1.0p1f0", &.{.number_literal});
try testTokenize("1.0_,", &.{ .number_literal, .comma });
try testTokenize("1_.0", &.{.number_literal});
try testTokenize("1._", &.{.number_literal});
try testTokenize("1.a", &.{.number_literal});
try testTokenize("1.z", &.{.number_literal});
try testTokenize("1._0", &.{.number_literal});
try testTokenize("1.+", &.{ .number_literal, .period, .plus });
try testTokenize("1._+", &.{ .number_literal, .plus });
try testTokenize("1._e", &.{.number_literal});
try testTokenize("1.0e", &.{.number_literal});
try testTokenize("1.0e,", &.{ .number_literal, .comma });
try testTokenize("1.0e_", &.{.number_literal});
try testTokenize("1.0e+_", &.{.number_literal});
try testTokenize("1.0e-_", &.{.number_literal});
try testTokenize("1.0e0_+", &.{ .number_literal, .plus });
}
test "number literals binary" {
try testTokenize("0b0", &.{.number_literal});
try testTokenize("0b1", &.{.number_literal});
try testTokenize("0b2", &.{.number_literal});
try testTokenize("0b3", &.{.number_literal});
try testTokenize("0b4", &.{.number_literal});
try testTokenize("0b5", &.{.number_literal});
try testTokenize("0b6", &.{.number_literal});
try testTokenize("0b7", &.{.number_literal});
try testTokenize("0b8", &.{.number_literal});
try testTokenize("0b9", &.{.number_literal});
try testTokenize("0ba", &.{.number_literal});
try testTokenize("0bb", &.{.number_literal});
try testTokenize("0bc", &.{.number_literal});
try testTokenize("0bd", &.{.number_literal});
try testTokenize("0be", &.{.number_literal});
try testTokenize("0bf", &.{.number_literal});
try testTokenize("0bz", &.{.number_literal});
try testTokenize("0b0000_0000", &.{.number_literal});
try testTokenize("0b1111_1111", &.{.number_literal});
try testTokenize("0b10_10_10_10", &.{.number_literal});
try testTokenize("0b0_1_0_1_0_1_0_1", &.{.number_literal});
try testTokenize("0b1.", &.{ .number_literal, .period });
try testTokenize("0b1.0", &.{.number_literal});
try testTokenize("0B0", &.{.number_literal});
try testTokenize("0b_", &.{.number_literal});
try testTokenize("0b_0", &.{.number_literal});
try testTokenize("0b1_", &.{.number_literal});
try testTokenize("0b0__1", &.{.number_literal});
try testTokenize("0b0_1_", &.{.number_literal});
try testTokenize("0b1e", &.{.number_literal});
try testTokenize("0b1p", &.{.number_literal});
try testTokenize("0b1e0", &.{.number_literal});
try testTokenize("0b1p0", &.{.number_literal});
try testTokenize("0b1_,", &.{ .number_literal, .comma });
}
test "number literals octal" {
try testTokenize("0o0", &.{.number_literal});
try testTokenize("0o1", &.{.number_literal});
try testTokenize("0o2", &.{.number_literal});
try testTokenize("0o3", &.{.number_literal});
try testTokenize("0o4", &.{.number_literal});
try testTokenize("0o5", &.{.number_literal});
try testTokenize("0o6", &.{.number_literal});
try testTokenize("0o7", &.{.number_literal});
try testTokenize("0o8", &.{.number_literal});
try testTokenize("0o9", &.{.number_literal});
try testTokenize("0oa", &.{.number_literal});
try testTokenize("0ob", &.{.number_literal});
try testTokenize("0oc", &.{.number_literal});
try testTokenize("0od", &.{.number_literal});
try testTokenize("0oe", &.{.number_literal});
try testTokenize("0of", &.{.number_literal});
try testTokenize("0oz", &.{.number_literal});
try testTokenize("0o01234567", &.{.number_literal});
try testTokenize("0o0123_4567", &.{.number_literal});
try testTokenize("0o01_23_45_67", &.{.number_literal});
try testTokenize("0o0_1_2_3_4_5_6_7", &.{.number_literal});
try testTokenize("0o7.", &.{ .number_literal, .period });
try testTokenize("0o7.0", &.{.number_literal});
try testTokenize("0O0", &.{.number_literal});
try testTokenize("0o_", &.{.number_literal});
try testTokenize("0o_0", &.{.number_literal});
try testTokenize("0o1_", &.{.number_literal});
try testTokenize("0o0__1", &.{.number_literal});
try testTokenize("0o0_1_", &.{.number_literal});
try testTokenize("0o1e", &.{.number_literal});
try testTokenize("0o1p", &.{.number_literal});
try testTokenize("0o1e0", &.{.number_literal});
try testTokenize("0o1p0", &.{.number_literal});
try testTokenize("0o_,", &.{ .number_literal, .comma });
}
test "number literals hexadecimal" {
try testTokenize("0x0", &.{.number_literal});
try testTokenize("0x1", &.{.number_literal});
try testTokenize("0x2", &.{.number_literal});
try testTokenize("0x3", &.{.number_literal});
try testTokenize("0x4", &.{.number_literal});
try testTokenize("0x5", &.{.number_literal});
try testTokenize("0x6", &.{.number_literal});
try testTokenize("0x7", &.{.number_literal});
try testTokenize("0x8", &.{.number_literal});
try testTokenize("0x9", &.{.number_literal});
try testTokenize("0xa", &.{.number_literal});
try testTokenize("0xb", &.{.number_literal});
try testTokenize("0xc", &.{.number_literal});
try testTokenize("0xd", &.{.number_literal});
try testTokenize("0xe", &.{.number_literal});
try testTokenize("0xf", &.{.number_literal});
try testTokenize("0xA", &.{.number_literal});
try testTokenize("0xB", &.{.number_literal});
try testTokenize("0xC", &.{.number_literal});
try testTokenize("0xD", &.{.number_literal});
try testTokenize("0xE", &.{.number_literal});
try testTokenize("0xF", &.{.number_literal});
try testTokenize("0x0z", &.{.number_literal});
try testTokenize("0xz", &.{.number_literal});
try testTokenize("0x0123456789ABCDEF", &.{.number_literal});
try testTokenize("0x0123_4567_89AB_CDEF", &.{.number_literal});
try testTokenize("0x01_23_45_67_89AB_CDE_F", &.{.number_literal});
try testTokenize("0x0_1_2_3_4_5_6_7_8_9_A_B_C_D_E_F", &.{.number_literal});
try testTokenize("0X0", &.{.number_literal});
try testTokenize("0x_", &.{.number_literal});
try testTokenize("0x_1", &.{.number_literal});
try testTokenize("0x1_", &.{.number_literal});
try testTokenize("0x0__1", &.{.number_literal});
try testTokenize("0x0_1_", &.{.number_literal});
try testTokenize("0x_,", &.{ .number_literal, .comma });
try testTokenize("0x1.0", &.{.number_literal});
try testTokenize("0xF.0", &.{.number_literal});
try testTokenize("0xF.F", &.{.number_literal});
try testTokenize("0xF.Fp0", &.{.number_literal});
try testTokenize("0xF.FP0", &.{.number_literal});
try testTokenize("0x1p0", &.{.number_literal});
try testTokenize("0xfp0", &.{.number_literal});
try testTokenize("0x1.0+0xF.0", &.{ .number_literal, .plus, .number_literal });
try testTokenize("0x1.", &.{ .number_literal, .period });
try testTokenize("0xF.", &.{ .number_literal, .period });
try testTokenize("0x1.+0xF.", &.{ .number_literal, .period, .plus, .number_literal, .period });
try testTokenize("0xff.p10", &.{.number_literal});
try testTokenize("0x0123456.789ABCDEF", &.{.number_literal});
try testTokenize("0x0_123_456.789_ABC_DEF", &.{.number_literal});
try testTokenize("0x0_1_2_3_4_5_6.7_8_9_A_B_C_D_E_F", &.{.number_literal});
try testTokenize("0x0p0", &.{.number_literal});
try testTokenize("0x0.0p0", &.{.number_literal});
try testTokenize("0xff.ffp10", &.{.number_literal});
try testTokenize("0xff.ffP10", &.{.number_literal});
try testTokenize("0xffp10", &.{.number_literal});
try testTokenize("0xff_ff.ff_ffp1_0_0_0", &.{.number_literal});
try testTokenize("0xf_f_f_f.f_f_f_fp+1_000", &.{.number_literal});
try testTokenize("0xf_f_f_f.f_f_f_fp-1_00_0", &.{.number_literal});
try testTokenize("0x1e", &.{.number_literal});
try testTokenize("0x1e0", &.{.number_literal});
try testTokenize("0x1p", &.{.number_literal});
try testTokenize("0xfp0z1", &.{.number_literal});
try testTokenize("0xff.ffpff", &.{.number_literal});
try testTokenize("0x0.p", &.{.number_literal});
try testTokenize("0x0.z", &.{.number_literal});
try testTokenize("0x0._", &.{.number_literal});
try testTokenize("0x0_.0", &.{.number_literal});
try testTokenize("0x0_.0.0", &.{ .number_literal, .period, .number_literal });
try testTokenize("0x0._0", &.{.number_literal});
try testTokenize("0x0.0_", &.{.number_literal});
try testTokenize("0x0_p0", &.{.number_literal});
try testTokenize("0x0_.p0", &.{.number_literal});
try testTokenize("0x0._p0", &.{.number_literal});
try testTokenize("0x0.0_p0", &.{.number_literal});
try testTokenize("0x0._0p0", &.{.number_literal});
try testTokenize("0x0.0p_0", &.{.number_literal});
try testTokenize("0x0.0p+_0", &.{.number_literal});
try testTokenize("0x0.0p-_0", &.{.number_literal});
try testTokenize("0x0.0p0_", &.{.number_literal});
}
test "multi line string literal with only 1 backslash" {
try testTokenize("x \\\n;", &.{ .identifier, .invalid, .semicolon });
}
test "invalid builtin identifiers" {
try testTokenize("@()", &.{.invalid});
try testTokenize("@0()", &.{.invalid});
}
test "invalid token with unfinished escape right before eof" {
try testTokenize("\"\\", &.{.invalid});
try testTokenize("'\\", &.{.invalid});
try testTokenize("'\\u", &.{.invalid});
}
test "saturating operators" {
try testTokenize("<<", &.{.angle_bracket_angle_bracket_left});
try testTokenize("<<|", &.{.angle_bracket_angle_bracket_left_pipe});
try testTokenize("<<|=", &.{.angle_bracket_angle_bracket_left_pipe_equal});
try testTokenize("*", &.{.asterisk});
try testTokenize("*|", &.{.asterisk_pipe});
try testTokenize("*|=", &.{.asterisk_pipe_equal});
try testTokenize("+", &.{.plus});
try testTokenize("+|", &.{.plus_pipe});
try testTokenize("+|=", &.{.plus_pipe_equal});
try testTokenize("-", &.{.minus});
try testTokenize("-|", &.{.minus_pipe});
try testTokenize("-|=", &.{.minus_pipe_equal});
}
test "null byte before eof" {
try testTokenize("123 \x00 456", &.{ .number_literal, .invalid });
try testTokenize("//\x00", &.{.invalid});
try testTokenize("\\\\\x00", &.{.invalid});
try testTokenize("\x00", &.{.invalid});
try testTokenize("// NUL\x00\n", &.{.invalid});
try testTokenize("///\x00\n", &.{ .doc_comment, .invalid });
try testTokenize("/// NUL\x00\n", &.{ .doc_comment, .invalid });
}
test "invalid tabs and carriage returns" {
// "Inside Line Comments and Documentation Comments, Any TAB is rejected by
// the grammar since it is ambiguous how it should be rendered."
// https://github.com/ziglang/zig-spec/issues/38
try testTokenize("//\t", &.{.invalid});
try testTokenize("// \t", &.{.invalid});
try testTokenize("///\t", &.{.invalid});
try testTokenize("/// \t", &.{.invalid});
try testTokenize("//!\t", &.{.invalid});
try testTokenize("//! \t", &.{.invalid});
// "Inside Line Comments and Documentation Comments, CR directly preceding
// NL is unambiguously part of the newline sequence. It is accepted by the
// grammar and removed by zig fmt, leaving only NL. CR anywhere else is
// rejected by the grammar."
// https://github.com/ziglang/zig-spec/issues/38
try testTokenize("//\r", &.{.invalid});
try testTokenize("// \r", &.{.invalid});
try testTokenize("///\r", &.{.invalid});
try testTokenize("/// \r", &.{.invalid});
try testTokenize("//\r ", &.{.invalid});
try testTokenize("// \r ", &.{.invalid});
try testTokenize("///\r ", &.{.invalid});
try testTokenize("/// \r ", &.{.invalid});
try testTokenize("//\r\n", &.{});
try testTokenize("// \r\n", &.{});
try testTokenize("///\r\n", &.{.doc_comment});
try testTokenize("/// \r\n", &.{.doc_comment});
try testTokenize("//!\r", &.{.invalid});
try testTokenize("//! \r", &.{.invalid});
try testTokenize("//!\r ", &.{.invalid});
try testTokenize("//! \r ", &.{.invalid});
try testTokenize("//!\r\n", &.{.container_doc_comment});
try testTokenize("//! \r\n", &.{.container_doc_comment});
// The control characters TAB and CR are rejected by the grammar inside multi-line string literals,
// except if CR is directly before NL.
// https://github.com/ziglang/zig-spec/issues/38
try testTokenize("\\\\\r", &.{.invalid});
try testTokenize("\\\\\r ", &.{.invalid});
try testTokenize("\\\\ \r", &.{.invalid});
try testTokenize("\\\\\t", &.{.invalid});
try testTokenize("\\\\\t ", &.{.invalid});
try testTokenize("\\\\ \t", &.{.invalid});
try testTokenize("\\\\\r\n", &.{.multiline_string_literal_line});
// "TAB used as whitespace is...accepted by the grammar. CR used as
// whitespace, whether directly preceding NL or stray, is...accepted by the
// grammar."
// https://github.com/ziglang/zig-spec/issues/38
try testTokenize("\tpub\tswitch\t", &.{ .keyword_pub, .keyword_switch });
try testTokenize("\rpub\rswitch\r", &.{ .keyword_pub, .keyword_switch });
}

5
stage0/zig-interp.txt Normal file

@@ -0,0 +1,5 @@
1. implement @panic, write a test that does it.
2. local variables.
3. control flow.
4. functions.
5. imports until one can import stdlib.

59
stage0/zig0.c Normal file

@@ -0,0 +1,59 @@
#include "common.h"
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
// API:
// - code = 0: program successfully terminated.
// - code = 1: panicked, panic message in msg. Caller should free msg.
// - code = 2: interpreter error, error in msg. Caller should free msg.
static int zig0Run(const char* program, char** msg) {
(void)program;
(void)msg;
return 0;
}
// API: run and:
// code = 3: abnormal error, expect something in stderr.
int zig0RunFile(const char* fname, char** msg) {
FILE* f = fopen(fname, "r");
if (f == NULL) {
perror("fopen");
return 3;
}
if (fseek(f, 0, SEEK_END) != 0) {
perror("fseek");
fclose(f);
return 3;
}
long fsizel = ftell(f);
if (fsizel == -1) {
perror("ftell");
fclose(f);
return 3;
}
unsigned long fsize = (unsigned long)fsizel;
fseek(f, 0, SEEK_SET);
char* program = malloc(fsize + 1);
if (program == NULL) {
perror("malloc");
fclose(f);
return 3;
}
size_t bytes_read = fread(program, 1, fsize, f);
if (bytes_read < fsize) {
if (ferror(f)) {
perror("fread");
} else {
fprintf(stderr, "Unexpected end of file\n");
}
free(program);
fclose(f);
return 3;
}
fclose(f);
program[fsize] = 0;
int code = zig0Run(program, msg);
free(program);
return code;
}

19
stage0/zir.c Normal file

@@ -0,0 +1,19 @@
#include "zir.h"
#include <stdlib.h>
void zirDeinit(Zir* zir) {
free(zir->inst_tags);
free(zir->inst_datas);
free(zir->extra);
free(zir->string_bytes);
zir->inst_tags = NULL;
zir->inst_datas = NULL;
zir->extra = NULL;
zir->string_bytes = NULL;
zir->inst_len = 0;
zir->inst_cap = 0;
zir->extra_len = 0;
zir->extra_cap = 0;
zir->string_bytes_len = 0;
zir->string_bytes_cap = 0;
}

544
stage0/zir.h Normal file

@@ -0,0 +1,544 @@
// zir.h — ZIR data structures, ported from lib/std/zig/Zir.zig.
#ifndef ZIG0_ZIR_H_
#define ZIG0_ZIR_H_
#include "common.h"
#include <stdbool.h>
#include <stdint.h>
// --- ZIR instruction tags (uint8_t) ---
// Matches Zir.Inst.Tag enum order from Zir.zig.
#define ZIR_INST_FOREACH_TAG(TAG) \
TAG(ZIR_INST_ADD) \
TAG(ZIR_INST_ADDWRAP) \
TAG(ZIR_INST_ADD_SAT) \
TAG(ZIR_INST_ADD_UNSAFE) \
TAG(ZIR_INST_SUB) \
TAG(ZIR_INST_SUBWRAP) \
TAG(ZIR_INST_SUB_SAT) \
TAG(ZIR_INST_MUL) \
TAG(ZIR_INST_MULWRAP) \
TAG(ZIR_INST_MUL_SAT) \
TAG(ZIR_INST_DIV_EXACT) \
TAG(ZIR_INST_DIV_FLOOR) \
TAG(ZIR_INST_DIV_TRUNC) \
TAG(ZIR_INST_MOD) \
TAG(ZIR_INST_REM) \
TAG(ZIR_INST_MOD_REM) \
TAG(ZIR_INST_SHL) \
TAG(ZIR_INST_SHL_EXACT) \
TAG(ZIR_INST_SHL_SAT) \
TAG(ZIR_INST_SHR) \
TAG(ZIR_INST_SHR_EXACT) \
TAG(ZIR_INST_PARAM) \
TAG(ZIR_INST_PARAM_COMPTIME) \
TAG(ZIR_INST_PARAM_ANYTYPE) \
TAG(ZIR_INST_PARAM_ANYTYPE_COMPTIME) \
TAG(ZIR_INST_ARRAY_CAT) \
TAG(ZIR_INST_ARRAY_MUL) \
TAG(ZIR_INST_ARRAY_TYPE) \
TAG(ZIR_INST_ARRAY_TYPE_SENTINEL) \
TAG(ZIR_INST_VECTOR_TYPE) \
TAG(ZIR_INST_ELEM_TYPE) \
TAG(ZIR_INST_INDEXABLE_PTR_ELEM_TYPE) \
TAG(ZIR_INST_SPLAT_OP_RESULT_TY) \
TAG(ZIR_INST_INDEXABLE_PTR_LEN) \
TAG(ZIR_INST_ANYFRAME_TYPE) \
TAG(ZIR_INST_AS_NODE) \
TAG(ZIR_INST_AS_SHIFT_OPERAND) \
TAG(ZIR_INST_BIT_AND) \
TAG(ZIR_INST_BITCAST) \
TAG(ZIR_INST_BIT_NOT) \
TAG(ZIR_INST_BIT_OR) \
TAG(ZIR_INST_BLOCK) \
TAG(ZIR_INST_BLOCK_COMPTIME) \
TAG(ZIR_INST_BLOCK_INLINE) \
TAG(ZIR_INST_DECLARATION) \
TAG(ZIR_INST_SUSPEND_BLOCK) \
TAG(ZIR_INST_BOOL_NOT) \
TAG(ZIR_INST_BOOL_BR_AND) \
TAG(ZIR_INST_BOOL_BR_OR) \
TAG(ZIR_INST_BREAK) \
TAG(ZIR_INST_BREAK_INLINE) \
TAG(ZIR_INST_SWITCH_CONTINUE) \
TAG(ZIR_INST_CHECK_COMPTIME_CONTROL_FLOW) \
TAG(ZIR_INST_CALL) \
TAG(ZIR_INST_FIELD_CALL) \
TAG(ZIR_INST_BUILTIN_CALL) \
TAG(ZIR_INST_CMP_LT) \
TAG(ZIR_INST_CMP_LTE) \
TAG(ZIR_INST_CMP_EQ) \
TAG(ZIR_INST_CMP_GTE) \
TAG(ZIR_INST_CMP_GT) \
TAG(ZIR_INST_CMP_NEQ) \
TAG(ZIR_INST_CONDBR) \
TAG(ZIR_INST_CONDBR_INLINE) \
TAG(ZIR_INST_TRY) \
TAG(ZIR_INST_TRY_PTR) \
TAG(ZIR_INST_ERROR_SET_DECL) \
TAG(ZIR_INST_DBG_STMT) \
TAG(ZIR_INST_DBG_VAR_PTR) \
TAG(ZIR_INST_DBG_VAR_VAL) \
TAG(ZIR_INST_DECL_REF) \
TAG(ZIR_INST_DECL_VAL) \
TAG(ZIR_INST_LOAD) \
TAG(ZIR_INST_DIV) \
TAG(ZIR_INST_ELEM_PTR_NODE) \
TAG(ZIR_INST_ELEM_PTR) \
TAG(ZIR_INST_ELEM_VAL_NODE) \
TAG(ZIR_INST_ELEM_VAL) \
TAG(ZIR_INST_ELEM_VAL_IMM) \
TAG(ZIR_INST_ENSURE_RESULT_USED) \
TAG(ZIR_INST_ENSURE_RESULT_NON_ERROR) \
TAG(ZIR_INST_ENSURE_ERR_UNION_PAYLOAD_VOID) \
TAG(ZIR_INST_ERROR_UNION_TYPE) \
TAG(ZIR_INST_ERROR_VALUE) \
TAG(ZIR_INST_EXPORT) \
TAG(ZIR_INST_FIELD_PTR) \
TAG(ZIR_INST_FIELD_VAL) \
TAG(ZIR_INST_FIELD_PTR_NAMED) \
TAG(ZIR_INST_FIELD_VAL_NAMED) \
TAG(ZIR_INST_FUNC) \
TAG(ZIR_INST_FUNC_INFERRED) \
TAG(ZIR_INST_FUNC_FANCY) \
TAG(ZIR_INST_IMPORT) \
TAG(ZIR_INST_INT) \
TAG(ZIR_INST_INT_BIG) \
TAG(ZIR_INST_FLOAT) \
TAG(ZIR_INST_FLOAT128) \
TAG(ZIR_INST_INT_TYPE) \
TAG(ZIR_INST_IS_NON_NULL) \
TAG(ZIR_INST_IS_NON_NULL_PTR) \
TAG(ZIR_INST_IS_NON_ERR) \
TAG(ZIR_INST_IS_NON_ERR_PTR) \
TAG(ZIR_INST_RET_IS_NON_ERR) \
TAG(ZIR_INST_LOOP) \
TAG(ZIR_INST_REPEAT) \
TAG(ZIR_INST_REPEAT_INLINE) \
TAG(ZIR_INST_FOR_LEN) \
TAG(ZIR_INST_MERGE_ERROR_SETS) \
TAG(ZIR_INST_REF) \
TAG(ZIR_INST_RET_NODE) \
TAG(ZIR_INST_RET_LOAD) \
TAG(ZIR_INST_RET_IMPLICIT) \
TAG(ZIR_INST_RET_ERR_VALUE) \
TAG(ZIR_INST_RET_ERR_VALUE_CODE) \
TAG(ZIR_INST_RET_PTR) \
TAG(ZIR_INST_RET_TYPE) \
TAG(ZIR_INST_PTR_TYPE) \
TAG(ZIR_INST_SLICE_START) \
TAG(ZIR_INST_SLICE_END) \
TAG(ZIR_INST_SLICE_SENTINEL) \
TAG(ZIR_INST_SLICE_LENGTH) \
TAG(ZIR_INST_SLICE_SENTINEL_TY) \
TAG(ZIR_INST_STORE_NODE) \
TAG(ZIR_INST_STORE_TO_INFERRED_PTR) \
TAG(ZIR_INST_STR) \
TAG(ZIR_INST_NEGATE) \
TAG(ZIR_INST_NEGATE_WRAP) \
TAG(ZIR_INST_TYPEOF) \
TAG(ZIR_INST_TYPEOF_BUILTIN) \
TAG(ZIR_INST_TYPEOF_LOG2_INT_TYPE) \
TAG(ZIR_INST_UNREACHABLE) \
TAG(ZIR_INST_XOR) \
TAG(ZIR_INST_OPTIONAL_TYPE) \
TAG(ZIR_INST_OPTIONAL_PAYLOAD_SAFE) \
TAG(ZIR_INST_OPTIONAL_PAYLOAD_UNSAFE) \
TAG(ZIR_INST_OPTIONAL_PAYLOAD_SAFE_PTR) \
TAG(ZIR_INST_OPTIONAL_PAYLOAD_UNSAFE_PTR) \
TAG(ZIR_INST_ERR_UNION_PAYLOAD_UNSAFE) \
TAG(ZIR_INST_ERR_UNION_PAYLOAD_UNSAFE_PTR) \
TAG(ZIR_INST_ERR_UNION_CODE) \
TAG(ZIR_INST_ERR_UNION_CODE_PTR) \
TAG(ZIR_INST_ENUM_LITERAL) \
TAG(ZIR_INST_DECL_LITERAL) \
TAG(ZIR_INST_DECL_LITERAL_NO_COERCE) \
TAG(ZIR_INST_SWITCH_BLOCK) \
TAG(ZIR_INST_SWITCH_BLOCK_REF) \
TAG(ZIR_INST_SWITCH_BLOCK_ERR_UNION) \
TAG(ZIR_INST_VALIDATE_DEREF) \
TAG(ZIR_INST_VALIDATE_DESTRUCTURE) \
TAG(ZIR_INST_FIELD_TYPE_REF) \
TAG(ZIR_INST_OPT_EU_BASE_PTR_INIT) \
TAG(ZIR_INST_COERCE_PTR_ELEM_TY) \
TAG(ZIR_INST_VALIDATE_REF_TY) \
TAG(ZIR_INST_VALIDATE_CONST) \
TAG(ZIR_INST_STRUCT_INIT_EMPTY) \
TAG(ZIR_INST_STRUCT_INIT_EMPTY_RESULT) \
TAG(ZIR_INST_STRUCT_INIT_EMPTY_REF_RESULT) \
TAG(ZIR_INST_STRUCT_INIT_ANON) \
TAG(ZIR_INST_STRUCT_INIT) \
TAG(ZIR_INST_STRUCT_INIT_REF) \
TAG(ZIR_INST_VALIDATE_STRUCT_INIT_TY) \
TAG(ZIR_INST_VALIDATE_STRUCT_INIT_RESULT_TY) \
TAG(ZIR_INST_VALIDATE_PTR_STRUCT_INIT) \
TAG(ZIR_INST_STRUCT_INIT_FIELD_TYPE) \
TAG(ZIR_INST_STRUCT_INIT_FIELD_PTR) \
TAG(ZIR_INST_ARRAY_INIT_ANON) \
TAG(ZIR_INST_ARRAY_INIT) \
TAG(ZIR_INST_ARRAY_INIT_REF) \
TAG(ZIR_INST_VALIDATE_ARRAY_INIT_TY) \
TAG(ZIR_INST_VALIDATE_ARRAY_INIT_RESULT_TY) \
TAG(ZIR_INST_VALIDATE_ARRAY_INIT_REF_TY) \
TAG(ZIR_INST_VALIDATE_PTR_ARRAY_INIT) \
TAG(ZIR_INST_ARRAY_INIT_ELEM_TYPE) \
TAG(ZIR_INST_ARRAY_INIT_ELEM_PTR) \
TAG(ZIR_INST_UNION_INIT) \
TAG(ZIR_INST_TYPE_INFO) \
TAG(ZIR_INST_SIZE_OF) \
TAG(ZIR_INST_BIT_SIZE_OF) \
TAG(ZIR_INST_INT_FROM_PTR) \
TAG(ZIR_INST_COMPILE_ERROR) \
TAG(ZIR_INST_SET_EVAL_BRANCH_QUOTA) \
TAG(ZIR_INST_INT_FROM_ENUM) \
TAG(ZIR_INST_ALIGN_OF) \
TAG(ZIR_INST_INT_FROM_BOOL) \
TAG(ZIR_INST_EMBED_FILE) \
TAG(ZIR_INST_ERROR_NAME) \
TAG(ZIR_INST_PANIC) \
TAG(ZIR_INST_TRAP) \
TAG(ZIR_INST_SET_RUNTIME_SAFETY) \
TAG(ZIR_INST_SQRT) \
TAG(ZIR_INST_SIN) \
TAG(ZIR_INST_COS) \
TAG(ZIR_INST_TAN) \
TAG(ZIR_INST_EXP) \
TAG(ZIR_INST_EXP2) \
TAG(ZIR_INST_LOG) \
TAG(ZIR_INST_LOG2) \
TAG(ZIR_INST_LOG10) \
TAG(ZIR_INST_ABS) \
TAG(ZIR_INST_FLOOR) \
TAG(ZIR_INST_CEIL) \
TAG(ZIR_INST_TRUNC) \
TAG(ZIR_INST_ROUND) \
TAG(ZIR_INST_TAG_NAME) \
TAG(ZIR_INST_TYPE_NAME) \
TAG(ZIR_INST_FRAME_TYPE) \
TAG(ZIR_INST_INT_FROM_FLOAT) \
TAG(ZIR_INST_FLOAT_FROM_INT) \
TAG(ZIR_INST_PTR_FROM_INT) \
TAG(ZIR_INST_ENUM_FROM_INT) \
TAG(ZIR_INST_FLOAT_CAST) \
TAG(ZIR_INST_INT_CAST) \
TAG(ZIR_INST_PTR_CAST) \
TAG(ZIR_INST_TRUNCATE) \
TAG(ZIR_INST_HAS_DECL) \
TAG(ZIR_INST_HAS_FIELD) \
TAG(ZIR_INST_CLZ) \
TAG(ZIR_INST_CTZ) \
TAG(ZIR_INST_POP_COUNT) \
TAG(ZIR_INST_BYTE_SWAP) \
TAG(ZIR_INST_BIT_REVERSE) \
TAG(ZIR_INST_BIT_OFFSET_OF) \
TAG(ZIR_INST_OFFSET_OF) \
TAG(ZIR_INST_SPLAT) \
TAG(ZIR_INST_REDUCE) \
TAG(ZIR_INST_SHUFFLE) \
TAG(ZIR_INST_ATOMIC_LOAD) \
TAG(ZIR_INST_ATOMIC_RMW) \
TAG(ZIR_INST_ATOMIC_STORE) \
TAG(ZIR_INST_MUL_ADD) \
TAG(ZIR_INST_MEMCPY) \
TAG(ZIR_INST_MEMMOVE) \
TAG(ZIR_INST_MEMSET) \
TAG(ZIR_INST_MIN) \
TAG(ZIR_INST_MAX) \
TAG(ZIR_INST_C_IMPORT) \
TAG(ZIR_INST_ALLOC) \
TAG(ZIR_INST_ALLOC_MUT) \
TAG(ZIR_INST_ALLOC_COMPTIME_MUT) \
TAG(ZIR_INST_ALLOC_INFERRED) \
TAG(ZIR_INST_ALLOC_INFERRED_MUT) \
TAG(ZIR_INST_ALLOC_INFERRED_COMPTIME) \
TAG(ZIR_INST_ALLOC_INFERRED_COMPTIME_MUT) \
TAG(ZIR_INST_RESOLVE_INFERRED_ALLOC) \
TAG(ZIR_INST_MAKE_PTR_CONST) \
TAG(ZIR_INST_RESUME) \
TAG(ZIR_INST_DEFER) \
TAG(ZIR_INST_DEFER_ERR_CODE) \
TAG(ZIR_INST_SAVE_ERR_RET_INDEX) \
TAG(ZIR_INST_RESTORE_ERR_RET_INDEX_UNCONDITIONAL) \
TAG(ZIR_INST_RESTORE_ERR_RET_INDEX_FN_ENTRY) \
TAG(ZIR_INST_EXTENDED)
#define ZIR_GENERATE_ENUM(e) e,
typedef enum { ZIR_INST_FOREACH_TAG(ZIR_GENERATE_ENUM) } ZirInstTag;
// --- ZIR extended opcodes (uint16_t) ---
// Matches Zir.Inst.Extended enum order from Zir.zig.
#define ZIR_EXT_FOREACH_TAG(TAG) \
TAG(ZIR_EXT_STRUCT_DECL) \
TAG(ZIR_EXT_ENUM_DECL) \
TAG(ZIR_EXT_UNION_DECL) \
TAG(ZIR_EXT_OPAQUE_DECL) \
TAG(ZIR_EXT_TUPLE_DECL) \
TAG(ZIR_EXT_THIS) \
TAG(ZIR_EXT_RET_ADDR) \
TAG(ZIR_EXT_BUILTIN_SRC) \
TAG(ZIR_EXT_ERROR_RETURN_TRACE) \
TAG(ZIR_EXT_FRAME) \
TAG(ZIR_EXT_FRAME_ADDRESS) \
TAG(ZIR_EXT_ALLOC) \
TAG(ZIR_EXT_BUILTIN_EXTERN) \
TAG(ZIR_EXT_ASM) \
TAG(ZIR_EXT_ASM_EXPR) \
TAG(ZIR_EXT_COMPILE_LOG) \
TAG(ZIR_EXT_TYPEOF_PEER) \
TAG(ZIR_EXT_MIN_MULTI) \
TAG(ZIR_EXT_MAX_MULTI) \
TAG(ZIR_EXT_ADD_WITH_OVERFLOW) \
TAG(ZIR_EXT_SUB_WITH_OVERFLOW) \
TAG(ZIR_EXT_MUL_WITH_OVERFLOW) \
TAG(ZIR_EXT_SHL_WITH_OVERFLOW) \
TAG(ZIR_EXT_C_UNDEF) \
TAG(ZIR_EXT_C_INCLUDE) \
TAG(ZIR_EXT_C_DEFINE) \
TAG(ZIR_EXT_WASM_MEMORY_SIZE) \
TAG(ZIR_EXT_WASM_MEMORY_GROW) \
TAG(ZIR_EXT_PREFETCH) \
TAG(ZIR_EXT_SET_FLOAT_MODE) \
TAG(ZIR_EXT_ERROR_CAST) \
TAG(ZIR_EXT_BREAKPOINT) \
TAG(ZIR_EXT_DISABLE_INSTRUMENTATION) \
TAG(ZIR_EXT_DISABLE_INTRINSICS) \
TAG(ZIR_EXT_SELECT) \
TAG(ZIR_EXT_INT_FROM_ERROR) \
TAG(ZIR_EXT_ERROR_FROM_INT) \
TAG(ZIR_EXT_REIFY) \
TAG(ZIR_EXT_CMPXCHG) \
TAG(ZIR_EXT_C_VA_ARG) \
TAG(ZIR_EXT_C_VA_COPY) \
TAG(ZIR_EXT_C_VA_END) \
TAG(ZIR_EXT_C_VA_START) \
TAG(ZIR_EXT_PTR_CAST_FULL) \
TAG(ZIR_EXT_PTR_CAST_NO_DEST) \
TAG(ZIR_EXT_WORK_ITEM_ID) \
TAG(ZIR_EXT_WORK_GROUP_SIZE) \
TAG(ZIR_EXT_WORK_GROUP_ID) \
TAG(ZIR_EXT_IN_COMPTIME) \
TAG(ZIR_EXT_RESTORE_ERR_RET_INDEX) \
TAG(ZIR_EXT_CLOSURE_GET) \
TAG(ZIR_EXT_VALUE_PLACEHOLDER) \
TAG(ZIR_EXT_FIELD_PARENT_PTR) \
TAG(ZIR_EXT_BUILTIN_VALUE) \
TAG(ZIR_EXT_BRANCH_HINT) \
TAG(ZIR_EXT_INPLACE_ARITH_RESULT_TY) \
TAG(ZIR_EXT_DBG_EMPTY_STMT) \
TAG(ZIR_EXT_ASTGEN_ERROR)
#define ZIR_EXT_GENERATE_ENUM(e) e,
typedef enum { ZIR_EXT_FOREACH_TAG(ZIR_EXT_GENERATE_ENUM) } ZirInstExtended;
// --- ZIR instruction data (8-byte union) ---
// Matches Zir.Inst.Data union from Zir.zig.
typedef uint32_t ZirInstIndex;
typedef uint32_t ZirInstRef;
typedef union {
struct {
uint16_t opcode;
uint16_t small;
uint32_t operand;
} extended;
struct {
int32_t src_node;
ZirInstRef operand;
} un_node;
struct {
int32_t src_tok;
ZirInstRef operand;
} un_tok;
struct {
int32_t src_node;
uint32_t payload_index;
} pl_node;
struct {
int32_t src_tok;
uint32_t payload_index;
} pl_tok;
struct {
ZirInstRef lhs;
ZirInstRef rhs;
} bin;
struct {
uint32_t start;
uint32_t len;
} str;
struct {
uint32_t start;
int32_t src_tok;
} str_tok;
int32_t tok;
int32_t node;
uint64_t int_val;
double float_val;
struct {
uint8_t flags;
uint8_t size;
uint16_t _pad;
uint32_t payload_index;
} ptr_type;
struct {
int32_t src_node;
uint16_t bit_count;
uint8_t signedness;
uint8_t _pad;
} int_type;
struct {
int32_t src_node;
uint32_t _pad;
} unreachable_data;
struct {
ZirInstRef operand;
uint32_t payload_index;
} break_data;
struct {
uint32_t line;
uint32_t column;
} dbg_stmt;
struct {
int32_t src_node;
ZirInstIndex inst;
} inst_node;
struct {
uint32_t str;
ZirInstRef operand;
} str_op;
struct {
uint32_t index;
uint32_t len;
} defer_data;
struct {
ZirInstRef err_code;
uint32_t payload_index;
} defer_err_code;
struct {
ZirInstRef operand;
uint32_t _pad;
} save_err_ret_index;
struct {
ZirInstRef operand;
uint32_t idx;
} elem_val_imm;
struct {
uint32_t src_node;
uint32_t payload_index;
} declaration;
} ZirInstData;
// --- ZIR built-in refs ---
// Matches Zir.Inst.Ref enum from Zir.zig.
// Values below REF_START_INDEX are InternPool indices.
#define ZIR_REF_START_INDEX 124
#define ZIR_REF_NONE UINT32_MAX
#define ZIR_MAIN_STRUCT_INST 0
// Zir.Inst.Ref enum values (matching Zig enum order in Zir.zig).
// Types (0-103).
#define ZIR_REF_U1_TYPE 2
#define ZIR_REF_U8_TYPE 3
#define ZIR_REF_I8_TYPE 4
#define ZIR_REF_U16_TYPE 5
#define ZIR_REF_I16_TYPE 6
#define ZIR_REF_U29_TYPE 7
#define ZIR_REF_U32_TYPE 8
#define ZIR_REF_I32_TYPE 9
#define ZIR_REF_U64_TYPE 10
#define ZIR_REF_I64_TYPE 11
#define ZIR_REF_U128_TYPE 13
#define ZIR_REF_I128_TYPE 14
#define ZIR_REF_USIZE_TYPE 16
#define ZIR_REF_ISIZE_TYPE 17
#define ZIR_REF_C_CHAR_TYPE 18
#define ZIR_REF_C_SHORT_TYPE 19
#define ZIR_REF_C_USHORT_TYPE 20
#define ZIR_REF_C_INT_TYPE 21
#define ZIR_REF_C_UINT_TYPE 22
#define ZIR_REF_C_LONG_TYPE 23
#define ZIR_REF_C_ULONG_TYPE 24
#define ZIR_REF_C_LONGLONG_TYPE 25
#define ZIR_REF_C_ULONGLONG_TYPE 26
#define ZIR_REF_C_LONGDOUBLE_TYPE 27
#define ZIR_REF_F16_TYPE 28
#define ZIR_REF_F32_TYPE 29
#define ZIR_REF_F64_TYPE 30
#define ZIR_REF_F80_TYPE 31
#define ZIR_REF_F128_TYPE 32
#define ZIR_REF_ANYOPAQUE_TYPE 33
#define ZIR_REF_BOOL_TYPE 34
#define ZIR_REF_VOID_TYPE 35
#define ZIR_REF_TYPE_TYPE 36
#define ZIR_REF_ANYERROR_TYPE 37
#define ZIR_REF_COMPTIME_INT_TYPE 38
#define ZIR_REF_COMPTIME_FLOAT_TYPE 39
#define ZIR_REF_NORETURN_TYPE 40
#define ZIR_REF_ANYFRAME_TYPE 41
#define ZIR_REF_NULL_TYPE 42
#define ZIR_REF_UNDEFINED_TYPE 43
#define ZIR_REF_ENUM_LITERAL_TYPE 44
#define ZIR_REF_PTR_USIZE_TYPE 45
#define ZIR_REF_PTR_CONST_COMPTIME_INT_TYPE 46
#define ZIR_REF_MANYPTR_U8_TYPE 47
#define ZIR_REF_MANYPTR_CONST_U8_TYPE 48
#define ZIR_REF_MANYPTR_CONST_U8_SENTINEL_0_TYPE 49
#define ZIR_REF_SLICE_CONST_U8_TYPE 50
#define ZIR_REF_SLICE_CONST_U8_SENTINEL_0_TYPE 51
#define ZIR_REF_ANYERROR_VOID_ERROR_UNION_TYPE 100
#define ZIR_REF_GENERIC_POISON_TYPE 102
#define ZIR_REF_EMPTY_TUPLE_TYPE 103
// Values (104-123).
#define ZIR_REF_UNDEF 104
#define ZIR_REF_UNDEF_BOOL 105
#define ZIR_REF_UNDEF_USIZE 106
#define ZIR_REF_UNDEF_U1 107
#define ZIR_REF_ZERO 108
#define ZIR_REF_ZERO_USIZE 109
#define ZIR_REF_ZERO_U1 110
#define ZIR_REF_ZERO_U8 111
#define ZIR_REF_ONE 112
#define ZIR_REF_ONE_USIZE 113
#define ZIR_REF_ONE_U1 114
#define ZIR_REF_ONE_U8 115
#define ZIR_REF_FOUR_U8 116
#define ZIR_REF_NEGATIVE_ONE 117
#define ZIR_REF_VOID_VALUE 118
#define ZIR_REF_UNREACHABLE_VALUE 119
#define ZIR_REF_NULL_VALUE 120
#define ZIR_REF_BOOL_TRUE 121
#define ZIR_REF_BOOL_FALSE 122
#define ZIR_REF_EMPTY_TUPLE 123
// Ast.Node.OptionalOffset.none = maxInt(i32).
#define AST_NODE_OFFSET_NONE ((int32_t)0x7FFFFFFF)
// --- Extra indices reserved at the start of extra[] ---
// Matches Zir.ExtraIndex enum from Zir.zig.
#define ZIR_EXTRA_COMPILE_ERRORS 0
#define ZIR_EXTRA_IMPORTS 1
#define ZIR_EXTRA_RESERVED_COUNT 2
// --- Zir output structure ---
typedef struct {
uint32_t inst_len;
uint32_t inst_cap;
ZirInstTag* inst_tags;
ZirInstData* inst_datas;
uint32_t extra_len;
uint32_t extra_cap;
uint32_t* extra;
uint32_t string_bytes_len;
uint32_t string_bytes_cap;
uint8_t* string_bytes;
bool has_compile_errors;
} Zir;
void zirDeinit(Zir* zir);
#endif