Port module-level analysis infrastructure from upstream Zig to create
InternPool entries that match the Zig compiler's output. This is the
first step toward closing the IP gap for neghf2.zig (corpus test 4).
- Add Nav struct to intern_pool.h with ipCreateDeclNav/ipGetNav/
ipResetNavs/ipNavCount management functions
- Add IP_KEY_PTR_NAV to InternPool (hash, equality, typeOf)
- Add SemaNamespace struct to sema.h for declaration grouping
- Port createFileRootStruct, scanNamespace from PerThread.zig
- Port ensureFileAnalyzed for recursive import resolution
- Add resolveModuleDeclImports: creates type_struct + type_pointer +
ptr_nav entries for import declarations, matching upstream order
- Add internPtrConst and internNavPtr helpers (from analyzeNavRefInner)
The C sema now creates entries [124-129] matching the Zig compiler:
type_struct(neghf2), type_struct(common), type_pointer(*const type),
ptr_nav(common), type_struct(std), ptr_nav(std). The remaining ~870
entries will be added in subsequent commits.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
zig0 doesn't use any C11-specific features. Lowering to C99
enables bootstrapping on platforms with only C99 compilers,
such as OpenBSD on exotic architectures (GCC 4.2.1).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace GNU statement expressions ({...}) in common.h with a static
inline function and do...while(0) macros. Expand case range expressions
(case 'a' ... 'z') in tokenizer.c to individual case labels. Replace
empty initializer braces {} with {0} in parser.c. Add a dummy member
to the empty struct in ast.h. Add -pedantic to zig0_cflags in build.zig
to prevent future regressions.
zig0 now compiles with any C11-conforming compiler, not just those
supporting GNU extensions. This enables bootstrapping with MSVC,
cproc, and other strict C11 compilers.
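A minimal sketch of the kinds of rewrites listed above, not the actual zig0 code. All names here (max_u32, SWAP_INT, is_abc, EmptyNode, portability_demo) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* GNU-only form: #define MAX(a, b) ({ ... })
 * Portable form: a static inline function, which also evaluates
 * each argument exactly once. */
static inline uint32_t max_u32(uint32_t a, uint32_t b) {
    return a > b ? a : b;
}

/* Statement-like macro without a statement expression. The
 * do { ... } while (0) wrapper makes it behave as one statement. */
#define SWAP_INT(a, b)  \
    do {                \
        int tmp_ = (a); \
        (a) = (b);      \
        (b) = tmp_;     \
    } while (0)

/* GNU-only form: case 'a' ... 'c':
 * Portable form: one case label per value. */
static int is_abc(char c) {
    switch (c) {
    case 'a':
    case 'b':
    case 'c':
        return 1;
    default:
        return 0;
    }
}

/* ISO C does not allow a struct with no members, so a dummy
 * member is added. */
typedef struct {
    char dummy_;
} EmptyNode;

/* Exercises the portable forms. Returns 1 if all checks hold. */
static int portability_demo(void) {
    int values[4] = {0};  /* {0}: empty braces are not valid before C23 */
    EmptyNode node = {0};
    int x = 1, y = 2;
    SWAP_INT(x, y);
    assert(x == 2 && y == 1);
    return max_u32(3, 7) == 7 && is_abc('b') && !is_abc('z')
        && values[3] == 0 && node.dummy_ == 0;
}
```

Each form should compile under -pedantic without GNU extensions, which is the property the new build flag is meant to guard.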
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add two points to the IP gap constraints section:
- Port functions mechanically; don't analyze individual entries first
- Time-box investigation to ~10 minutes before coding
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add verbose_intern_pool.c/h for dumping the C sema's InternPool
entries. Integrate as --verbose-intern-pool flag in zig0, mirroring
the Zig compiler's flag.
Fix InternPool.zig dump crash on locals with zero-capacity items
(skip empty locals in dumpStatsFallible and dumpAllFallible).
Update CLAUDE.md with IP debugging tools documentation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After creating the function type IP entry, also create a
pointer-to-function type (*const fn(...) ...) matching what the Zig
compiler creates when taking the address of a function for @export.
For neghf2.zig (num_passing=4), gap shrinks from 861 to 860.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Before analyzing an exported function's body, create a type_function
IP entry matching what the Zig compiler's ensureNavValUpToDate
creates when resolving the function declaration during @export
processing.
For neghf2.zig (num_passing=4), gap shrinks from 862 to 861.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove scanZirImportsRecursive which eagerly scanned all imports in
each loaded module, creating 255 struct types vs the Zig compiler's
108. The eager approach creates entries for modules never accessed
during the Zig compiler's lazy analysis, which would cause IP index
overshoot when other entry types are added.
Keep the demand-driven approach: struct types are created via
DECL_VAL (when imports are first referenced during analysis) and
loadImportZirFromPath (when modules are loaded for cross-module
calls). Currently creates 2 struct types (root + common.zig) for
neghf2.zig; gap = 862.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When DECL_VAL encounters a declaration whose value is an @import,
create a struct type IP entry for the imported module. Additionally,
scan the imported module's ZIR for its own imports and recursively
create struct types for those too. This matches the Zig compiler's
ensureFileAnalyzed → semaFile → createFileRootStruct → scanNamespace
sequence, where importing a file triggers analysis of that file which
discovers further imports.
The struct type creation is triggered lazily from the DECL_VAL handler
during analysis (not eagerly upfront), matching the Zig compiler's
demand-driven processing order.
For neghf2.zig (num_passing=4), the IP index gap shrinks from 862 to
607 as ~255 struct type entries are created for transitively imported
modules.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The answer to "how do I proceed?" is always the same: follow what the
upstream Zig compiler does. There is no reason to stop and ask.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three changes to prevent the agent from stopping between iterations:
1. Add bold "Do NOT stop between iterations" notice at the top of
the main loop — each commit is a checkpoint, not a stopping point.
2. Main loop step 6: reinforce "keep looping until blocked or all
corpus tests pass."
3. Module-system sub-loop: step 5 now says "immediately continue to
step 3. Do NOT stop here." Step 6 is renamed "Exit condition"
with explicit criteria (gap zero, num_passing incremented, tests
pass).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Embed the Zig compiler's IP items count (func_ip) in the .air binary
format so that test mismatch errors show [zig_ip_base=N], eliminating
the need for temporary debug prints in src/Zcu/PerThread.zig.
Add lazy module-level struct type creation in the C sema: each imported
module gets a type_struct IP entry when first loaded via
loadImportZirFromPath, matching the Zig compiler's demand-driven
ensureFileAnalyzed → createFileRootStruct sequence. The root module's
struct type is created at the start of semaAnalyze.
For neghf2.zig (num_passing=4), the IP index gap shrinks from 864 to
862 (root struct + common.zig struct created lazily during cross-module
call resolution).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document the workflow for closing the IP entry gap between the Zig
compiler and C sema. Starting with neghf2.zig, corpus tests require
the C sema to create ~878 module-level IP entries (struct types,
ptr_nav, enum types, etc.) matching the Zig compiler's output.
The workflow describes how to dump the Zig compiler's IP state, compare
with the C sema, port module-system functions (createFileRootStruct,
scanNamespace, etc.), and iterate.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The upstream Zig sema is lazy — it only evaluates const declarations
when first accessed. The C sema was eagerly evaluating ALL non-function
declarations, including ones never accessed during analysis (e.g.
`pub const panic = common.panic` in neghf2.zig).
Only evaluate comptime declarations (id == 3) and function declarations
(detected by ZIR_INST_FUNC / FUNC_FANCY). Skip all other const/var
declarations, matching upstream behavior.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
README.md keeps project overview, testing commands, debugging tips, and
float handling. CLAUDE.md gets the full Sema porting loop, decomposition
strategy, AIR exceptions, cleanup policy, and general rules.
Also fixes: stages.zig -> corpus.zig, sema_test.zig -> sema_tests/ +
num_sema_passing, nether -> neither.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
air_gen: replace per-file symlink workaround with two-pass compilation.
Pass 1 compiles lib/std/std.zig as root with use_root_as_std=true
(one compilation, all lib/std/ functions). Pass 2 compiles non-lib/std/
files standalone. Symlink workaround eliminated entirely.
build.zig: pass all corpus.files (not 0..num_passing) to air_gen,
skipping lib/std/ files. Bumping num_passing no longer invalidates
the air_gen cache.
air_data.zig: route lib/std/ paths to the combined std.zig.air file.
sema_test.zig: switch to unidirectional comparison (C→Zig only) and
exact FQN matching. Remove stripModulePrefix, bare-name fallback, and
unused cNameSpan. Add pathToModulePrefix and pathStem helpers.
sema.h/sema.c: add root_fqn, module_prefix, and is_test fields to
Sema struct. Function names use "{root_fqn}[.{prefix}].{name}" format
to match Zig's FQN convention.
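A minimal sketch of the "{root_fqn}[.{prefix}].{name}" format, assuming an empty or NULL prefix means the bracketed part is omitted. The helper name formatFqn and the example strings in the usage note are hypothetical, not taken from sema.c.

```c
#include <assert.h>
#include <stdio.h>

/* Writes "{root_fqn}[.{prefix}].{name}" into buf.
 * module_prefix may be NULL or "" to omit the optional part.
 * Returns the length that was needed, like snprintf. */
static int formatFqn(char *buf, size_t cap, const char *root_fqn,
                     const char *module_prefix, const char *name) {
    assert(buf != NULL && root_fqn != NULL && name != NULL);
    if (module_prefix != NULL && module_prefix[0] != '\0')
        return snprintf(buf, cap, "%s.%s.%s", root_fqn, module_prefix, name);
    return snprintf(buf, cap, "%s.%s", root_fqn, name);
}
```

For example, formatFqn(buf, sizeof buf, "std", "math", "nan") would write "std.math.nan", and passing a NULL prefix with root "neghf2" and name "__neghf2" would write "neghf2.__neghf2".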
stages_test.zig: set root_fqn and module_prefix on C sema so FQNs
match Zig's naming. Remove symlink workaround — C sema uses real
paths directly. Set is_test=false to match air_gen.
corpus.zig: remove lib/init/src/main.zig (template file with
@import(".NAME") that cannot compile standalone).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove all normalization layers from the AIR comparison:
- canonicalizeRef: was renumbering IP refs sequentially by first
appearance to hide raw index differences
- stripAnonSuffix: was stripping __anon_NNN suffix from generic
function names
- canonicalizeExtraRefs: was canonicalizing refs in extra payloads
The C and Zig InternPools now produce identical indices for 431 of
433 tests. Two tests still fail due to IP index gaps:
- return_integer.zig: value 42 at IP 0xd8 (Zig) vs 0x7d (C)
- neghf2.zig: value at IP 0x3e1 (Zig) vs 0x81 (C)
These gaps come from upstream interning intermediate values during
module-level analysis (struct declarations, function types, export
validation) that the C sema doesn't yet replicate.
Also uses IP index (not ZIR inst) for __anon_ suffix in generic
function names, matching upstream's finishFuncInstance.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change the generic monomorphization naming from func_inst (ZIR
instruction index) to func_val_ip (InternPool index), matching
upstream's finishFuncInstance which uses @intFromEnum(func_index).
Pass the func_val_ip through analyzeFuncBodyAndRecord and store it
in SemaFuncAir.func_ip. The anon suffix now uses the same numbering
scheme as upstream, though the actual numbers still differ because
the C and Zig InternPools intern values in different order.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes to match upstream Sema.zig behavior for addhf3:
1. ComptimeReturn: don't roll back air_inst_len at all (upstream keeps
all body instructions as dead instructions in the AIR array).
This preserves nested dead blocks from comptime inline calls.
2. dbg_arg_inline: skip emission when the declared param type is
comptime-only (comptime_int, comptime_float, enum_literal).
Ported from addDbgVar's val_ty.comptimeOnlySema() check.
The C sema doesn't coerce comptime IP values to the param type,
so we check the ZIR param type body directly.
3. Param type body scanning: always register calls in the global
seen_calls set (even when the dead block is skipped due to
type_fn_created). This ensures that after type_fn_created is
reset by analyzeFuncBodyAndRecord, subsequent calls still dedup.
Enables num_passing = 9 (addhf3) and adds comptime_arg_dbg.zig unit test.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In upstream Sema.zig:7872, when an inline call returns at comptime
(ComptimeReturn), the pre-allocated block instruction is NOT rolled
back — it remains as a dead block in the AIR. The C sema was
incorrectly discarding it by rolling back air_inst_len to before the
block.
Fix: roll back to block_inst_idx+1 (keep dead block, discard body
instructions). This produces dead blocks for comptime inline calls
in comptime context (e.g., floatExponentMax, mantissaOne called
from within nan(T)'s comptime-evaluated body).
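A minimal sketch of the rollback rule described in this fix, assuming the AIR instruction array is tracked by a single length counter. AirBuf, airAppend and rollbackKeepBlock are hypothetical names; only air_inst_len and block_inst_idx come from the message above.

```c
#include <assert.h>
#include <stdint.h>

#define AIR_CAP 16

typedef struct {
    uint8_t tags[AIR_CAP];  /* hypothetical per-instruction tag storage */
    uint32_t air_inst_len;  /* number of instructions currently in use */
} AirBuf;

/* Appends one instruction and returns its index. */
static uint32_t airAppend(AirBuf *air, uint8_t tag) {
    assert(air->air_inst_len < AIR_CAP);
    air->tags[air->air_inst_len] = tag;
    return air->air_inst_len++;
}

/* On a comptime return, keep the pre-allocated block (it stays behind
 * as a dead block) and discard the body instructions emitted after it. */
static void rollbackKeepBlock(AirBuf *air, uint32_t block_inst_idx) {
    assert(block_inst_idx < air->air_inst_len);
    air->air_inst_len = block_inst_idx + 1;
}
```

Rolling back to block_inst_idx instead of block_inst_idx + 1 would also drop the block itself, which is the behavior this fix replaces.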
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a global seen_call_names/seen_call_nargs set to Sema that persists
across analyzeFuncBodyAndRecord calls (not reset per-function). This
matches upstream Zig's InternPool memoization which is global: when a
type-returning function (Int, Log2Int, etc.) is called in one function's
body and later in another function's body, upstream memoizes the result
and skips the dead block on the second call.
The set is checked at three points:
- Unresolved type function path (callee not found, known type name)
- Param type body scanning (generic param type resolution)
- Resolved type function path (returns_type handler)
After creating a dead block, the call is registered in the set so
subsequent calls with the same callee name and arg count skip it.
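A minimal sketch of such a set, keyed by callee name and argument count as described above. The fixed capacity, the SeenCalls type and both function names are assumptions made for illustration; the real fields are only known here as seen_call_names/seen_call_nargs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SEEN_CALLS_MAX 64

typedef struct {
    const char *names[SEEN_CALLS_MAX]; /* borrowed; must outlive the set */
    unsigned nargs[SEEN_CALLS_MAX];
    size_t len;
} SeenCalls;

/* True if (name, nargs) was registered earlier. */
static bool seenCallsContains(const SeenCalls *s, const char *name,
                              unsigned nargs) {
    for (size_t i = 0; i < s->len; i++) {
        if (s->nargs[i] == nargs && strcmp(s->names[i], name) == 0)
            return true;
    }
    return false;
}

/* Registers the call. Returns true on first sighting (the caller
 * creates a dead block) and false on a repeat (the caller skips it). */
static bool seenCallsRegister(SeenCalls *s, const char *name,
                              unsigned nargs) {
    if (seenCallsContains(s, name, nargs))
        return false;
    assert(s->len < SEEN_CALLS_MAX);
    s->names[s->len] = name;
    s->nargs[s->len] = nargs;
    s->len++;
    return true;
}
```

Because the set lives for the whole analysis rather than one function body, a second function calling the same helper with the same arity takes the skip path.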
Also add two new sema unit tests:
- cross_fn_memoized_call.zig: two exports calling same inline helper
- nested_inline_dead_blocks.zig: nested comptime inline calls
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes toward enabling addhf3.zig corpus test:
1. dbg_arg_inline: upstream emits dbg_arg_inline for all inline
params whose resolved type is not comptime-only. The C sema was
skipping all comptime-declared params (ZIR_INST_PARAM_COMPTIME).
Now it checks whether the argument value is a type (param's type
is `type`) and only skips those, matching upstream behavior.
E.g. `comptime bits: u16` now gets dbg_arg_inline.
2. Log2Int dead blocks: when Log2Int is called from a comptime
sub-block whose parent is runtime (e.g. @as(Log2Int(T), ...)),
create 2 dead blocks (1 for Log2Int + 1 for nested Int call).
This fixes normalize__anon_1028 which was missing 2 instructions.
Also lifts skip_block out of inner scope in the resolved-callee path
for visibility by the Log2Int handler, and resolves the TODO about
Log2Int comptime context dead blocks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The skip_first_int fix in 855d1c59 was insufficient: normalize's AIR
still mismatches by 4 instructions. The root cause is that the C sema
needs broader handling of comptime-only return types (comptime_int, not
just type) and proper memoization of inline comptime calls across
function boundaries. Revert to 8 passing corpus files until the dead
block generation for comptime function calls matches upstream.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In upstream Zig, finishFuncInstance evaluates param type bodies and
memoizes type function calls (e.g. Int) in InternPool. When the
function body contains an identical call, it hits the memo and skips
dead block creation. The C port's shortcut (call_arg_types) skips
type body evaluation, so the memo is never set.
Add skip_first_int flag: set by analyzeFuncBodyAndRecord when a generic
param type body contains both ptr_type and a call instruction (the
*Int(...) pattern). Consumed once by site2's dead block creation.
Also fix cppcheck lint: const-qualify call_arg_types parameter.
normalize__anon_1028 still off by -2 (missing Log2Int dead blocks
from comptime sub-expressions) — to be addressed separately.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Cross-module generic function body analysis with findStringInZirBytes
for name lookup across ZIR modules.
- Two-phase parameter mapping: comptime params mapped before return type
resolution, then runtime params create ARG instructions.
- call_arg_types: pass call-site types directly for generic parameters
to avoid evaluating cross-module ZIR type bodies.
- AIR rollback on comptime-returned inline calls (ported from Sema.zig
air_instructions.shrinkRetainingCapacity).
- Add sema tests: generic_fn_with_clz and generic_fn_with_shl_assign.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cross-module function bodies belong to the imported module's AIR output,
not the current file's. Analyzing them in the current context produces
spurious function entries (e.g. ve_endian from native_endian resolution)
that don't appear in the precomputed Zig AIR data.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement generic function body analysis for runtime calls to functions
with comptime parameters. When a generic function like normalize(comptime
T: type, p: *T) is called at runtime, the C sema now produces a
monomorphized function entry (e.g. normalize__anon_42) matching upstream
Zig's finishFuncInstance behavior.
Key changes:
- analyzeFuncBodyAndRecord accepts optional call_args for comptime param
mapping: comptime params get mapped to resolved values from the call
site instead of generating ARG instructions
- Runtime params use original param index (not renumbered) to match Zig
- Deduplication handles __anon_NNN suffix for repeated generic calls
- sema_test.zig strips __anon_NNN suffixes for name comparison since IP
indices differ between C and Zig compilers
Enables sema tests 82-88 (num_sema_passing: 82 → 89, all tests pass).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add analyzeFuncBodyAndRecord helper to analyze non-inline function
callees in a fresh AIR context and record them to func_air_list.
Simplify zirFunc to use parseFuncZir + analyzeFuncBodyAndRecord.
In zirCall, generic functions are correctly treated as runtime calls
(not auto-inlined), matching upstream Sema.zig:7482 behavior where
is_inline_call = block.isComptime() or inline_requested.
Also includes pre-existing uncommitted changes:
- SemaBlock inline instructions array (avoid heap for small blocks)
- StructFieldInfo.fields[].name as fixed-size char[64]
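A minimal sketch of the inline-array idea for small blocks, assuming a fixed inline capacity with a heap fallback once it is exceeded. InstList, INLINE_CAP and the instList* functions are hypothetical and do not reflect the real SemaBlock layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_CAP 4

typedef struct {
    uint32_t inline_items[INLINE_CAP]; /* used while the list is small */
    uint32_t *heap_items;              /* NULL until inline storage overflows */
    size_t len;
    size_t cap;
} InstList;

static void instListInit(InstList *l) {
    l->heap_items = NULL;
    l->len = 0;
    l->cap = INLINE_CAP;
}

/* Returns whichever storage is currently active. */
static uint32_t *instListData(InstList *l) {
    return l->heap_items != NULL ? l->heap_items : l->inline_items;
}

/* Appends v; returns false only if a heap allocation fails. */
static bool instListPush(InstList *l, uint32_t v) {
    if (l->len == l->cap) {
        size_t new_cap = l->cap * 2;
        /* realloc(NULL, n) behaves like malloc(n) on the first spill. */
        uint32_t *p = realloc(l->heap_items, new_cap * sizeof *p);
        if (p == NULL)
            return false;
        if (l->heap_items == NULL)
            memcpy(p, l->inline_items, l->len * sizeof *p);
        l->heap_items = p;
        l->cap = new_cap;
    }
    instListData(l)[l->len++] = v;
    return true;
}

static void instListDeinit(InstList *l) {
    free(l->heap_items);
    l->heap_items = NULL;
    l->len = 0;
    l->cap = INLINE_CAP;
}
```

Blocks with at most INLINE_CAP instructions never touch the allocator, which is the case the inline array is meant to make cheap.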
num_sema_passing: 78 -> 82
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Gap 1: function count check is now a hard error (was warning),
with diagnostic listing functions missing from C output
- Gap 3: canonicalizeExtraRefs for tags with Refs in extra payload
(StructField, Bin, UnionInit, VectorCmp, Cmpxchg, AtomicRmw,
TryPtr, FieldParentPtr, ShuffleOne/Two)
- Gap 5: detect ambiguous name matches in precomputedFindByName
- Reduce num_passing 66→8 (addhf3.zig function count mismatch)
- Add num_sema_passing=78 (call_inside_runtime_conditional and
6 similar tests have function count mismatches)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add bidirectional function count check (warns when Zig produces
functions that C does not, surfacing lazy-analysis gaps)
- Replace magic number 51 with c.AIR_INST_BLOCK for robustness
- Add ~60 missing tags to airDataRefSlots with correct Ref slot
mappings (bin_op, ty_op, ty_pl, pl_op, br, reduce, prefetch,
atomic_load, vector_store_elem, ty_nav variants)
- Add SET_ERR_RETURN_TRACE (un_op) and ERR_RETURN_TRACE (ty) to
airInstNumSlots for correct slot counts
- Add TODO for extra-array Ref canonicalization (Gap 3)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactor the monolithic analyzeBodyInner switch into named functions
matching the upstream Zig Sema.zig naming convention (zirRetImplicit,
zirRetNode, zirFloat, zirBlock, etc.). The switch body now serves as
a clean dispatch table.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PrecomputedFunc now stores raw [*]const u8 byte pointers instead of c.Air,
eliminating per-function heap allocations and memcpy in parsePrecomputedAir.
airCompareOne takes two PrecomputedFunc values; C-sema output is wrapped via
precomputedFromCAir.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two categories of use-after-free in cross-module import handling:
1. Struct field names stored as raw pointers into ZIR string_bytes
became dangling after zirDeinit freed the imported ZIR. Fixed by
dupString() to create owned copies, freed in semaDeinit.
2. computeSourceDir called with sub_import/fn_import pointers after
zirDeinit freed the ZIR containing those strings. Fixed by computing
the source dir before freeing the ZIR.
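A minimal sketch of the first fix: copy a name out of the string table before the table is freed. FakeZir and fieldNameOwned are hypothetical, and this dupString is a stand-in with an assumed signature, not the project's helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for dupString: returns a heap copy owned by the caller,
 * or NULL if allocation fails. */
static char *dupString(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Hypothetical imported-module ZIR that owns its string table. */
typedef struct {
    char *string_bytes;
} FakeZir;

/* Unsafe: returning zir->string_bytes + offset directly would dangle
 * once the ZIR is freed. Safe: hand back an owned copy instead. */
static char *fieldNameOwned(const FakeZir *zir, size_t offset) {
    return dupString(zir->string_bytes + offset);
}
```

The second fix follows the same ordering rule: anything derived from the ZIR's strings is computed, or copied, before the ZIR is freed.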
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the old approach of linking verbose_air.zig (and the full Zig
compiler internals) into the test binary with a build-time generator
(verbose_air_gen.zig) that pre-computes AIR data for corpus files.
The generator runs as a build step, compiling each corpus file through
the Zig compiler and serializing the resulting AIR to binary files.
It produces air_data.zig and tag_names.zig bridge files that the test
binary imports as anonymous modules. This removes the heavyweight
zig_compile_air extern dependency from the test binary.
Key changes:
- build.zig: add air_gen executable build+run step, anonymous imports
- verbose_air_gen.zig (new): build-time AIR generator with symlink
workaround to avoid lib/std/ module path conflicts
- corpus.zig (new): centralized corpus file list with num_passing
- sema_test.zig: replace zig_compile_air extern with parsePrecomputedAir
- stages_test.zig: use corpus.zig and @import("air_data")
- sema.c: zero dead block data in comptime switch handler so the
dead-block skip rule fires correctly with precomputed data
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>