Introduce `validate_array_init_comptime`, similar to
`validate_struct_init_comptime` introduced in
713d2a9b38.
`zirValidateArrayInit` is improved to detect comptime array literals and
emit AIR accordingly. This code is very similar to the changes
introduced in that same commit for `zirValidateStructInit`.
The C backend needed some improvements to continue passing the same set
of tests:
* `resolveInst` for arrays now will add a local `static const` with the
array value and so then `elem_val` instructions reference that local.
It memoizes accesses using `value_map`, which is changed to use
`Air.Inst.Ref` as the key rather than `Air.Inst.Index`.
* This required a mechanism for writing to a "header" which is lines
that appear at the beginning of a function body, before everything
else.
* dbg_stmt output comments rather than `#line` directives.
TODO comment reproduced here:
We need to re-evaluate whether to emit these or not. If we naively emit
these directives, the output file will report bogus line numbers because
every newline after the #line directive adds one to the line.
We also don't print the filename yet, so the output is strictly unhelpful.
If we wanted to go this route, we would need to go all the way and not output
newlines until the next dbg_stmt occurs.
Perhaps an additional compilation option is in order?
`Value.elemValue` is improved to support `elem_ptr` values.
AIR:
* `array_elem_val` is now allowed to be used with a vector as the array
type.
* New instructions: splat, vector_init
AstGen:
* The splat ZIR instruction uses coerced_ty for the ResultLoc, avoiding
an unnecessary `as` instruction, since the coercion will be performed
in Sema.
* Builtins that accept vectors now ignore the type parameter. Comment
from this commit reproduced here:
The accepted proposal #6835 tells us to remove the type parameter from
these builtins. To stay source-compatible with stage1, we still observe
the parameter here, but we do not encode it into the ZIR. To implement
this proposal in stage2, only AstGen code will need to be changed.
Sema:
* `clz` and `ctz` ZIR instructions are now handled by the same function
which accept AIR tag and comptime eval function pointer to
differentiate.
* `@typeInfo` for vectors is implemented.
* `@splat` is implemented. It takes advantage of `Value.Tag.repeated` 😎
* `elemValue` is implemented for vectors, when the index is a scalar.
Handling a vector index is still TODO.
* Element-wise coercion is implemented for vectors. It could probably
be optimized a bit, but it is at least complete & correct.
* `Type.intInfo` supports vectors, returning int info for the element.
* `Value.ctz` initial implementation. Needs work.
* `Value.eql` is implemented for arrays and vectors.
LLVM backend:
* Implement vector support when lowering `array_elem_val`.
* Implement vector support when lowering `ctz` and `clz`.
* Implement `splat` and `vector_init`.
Add a variant of the `validate_struct_init` ZIR instruction:
`validate_struct_init_comptime` which is the same thing except it
indicates a comptime scope.
Sema code for this instruction now handles default struct field
values and detects when the struct initialization resulted in a
comptime value, replacing the already-emitted AIR instructions
to store each individual field with a single `store` instruction
with a comptime struct value as the operand.
In the case of a comptime scope, there is a simpler path that only
evals the implicit store instructions for default field values, avoiding
the mechanism for detecting comptime values.
This regressed one test case for the wasm backend, but it's just hitting
a different prong of `emitConstant` which currently has "TODO" in there,
so I think it's fine.
The main problem was that the loop body was treated as an expression
that was one of the peer result values of a loop, when in reality the
loop body is noreturn and only the `break` operands are the result
values of loops.
This was solved by introducing an override that prevents rvalue() from
emitting a store to result location instruction for loop bodies.
An orthogonal change also included in this commit is switching
`elem_val` index expressions to using `coerced_ty` and doing the
coercion to `usize` inside `Sema`, resulting in smaller ZIR (since the
cast becomes implied).
I also changed the break operand expression to use `reachableExpr`,
introducing a new compile error for double break.
This makes a few more behavior tests pass for `while` and `for` loops.
Previously, function parameter instructions for function prototypes would be
generated in the parent block. This caused issues in blocks where multiple
prototypes would be generated in, such as the block for struct fields for
example. This change introduces an inline block around every prototype such
that all parameters for a prototype are confined to a unique block.
dd62a6d2e8 short-circuited the logic of
`asmExpr` by emitting ZIR for `@compileError("...")`. This caused false
positive "unreachable code" errors for stage1 when there was an
expression in the asm template.
This commit makes such cases instead go through logic of `asmExpr` like
normal, however the asm template is set to 0. This is then picked up in
Sema (part of stage2, not stage1) and reported as "assembly code must
use string literal syntax".
The end-game for inline assembly is that the syntax is more integrated
with zig, and it will not allow string concatenation for the assembler
code, for the same reasons that Zig does not have a preprocessor.
However, inline assembly in zig right now is lacking for a variety of
use cases (take a look at the open issues having to do with inline
assembly for example), and being able to use comptime expressions to
concatenate text is a workaround that real-world users are exploiting to
get by in the short term.
This commit keeps "assembly code must use string literal syntax" as a
compile error when using stage2, but allows it through when using
stage1.
I expect to revert this commit after making enough improvements to
inline assembly that our real world users' needs are satisfied.
Switch prong values are fetched by index in semantic analysis by prong
offset, but these were computed as capture offset. This means that a switch
where the first prong does not capture and the second does, the switch_capture
zir instruction would be assigned switch_prong 0 instead of 1.
* AstGen: always use `typeof` and never `typeof_elem` on the
`switch_cond`/`switch_cond_ref` instruction because both variants
return a value and not a pointer.
- Delete the `typeof_elem` ZIR instruction since it is no longer
needed.
* Sema: validateUnionInit now recognizes a comptime mutable value and
no longer emits a compile error saying "cannot evaluate constant
expression"
- Still to-do is detecting comptime union values in a function that
is not being executed at compile-time.
- This is still to-do for structs too.
* Sema: when emitting a call AIR instruction, call resolveTypeLayout on
all the parameter types as well as the return type.
* `Type.structFieldOffset` now works for unions in addition to structs.
After a discussion about language specs, this seems like the best way to
go, because it's simpler to reason about both for humans and compilers.
The `bitcast_result_ptr` ZIR instruction is no longer needed.
This commit also implements writing enums, arrays, and vectors to
virtual memory at compile-time.
This unlocked some more of compiler-rt being able to build, which
in turn unlocks saturating arithmetic behavior tests.
There was also a memory leak in the comptime closure system which is now
fixed.
* New AIR instruction: slice, which constructs a slice out of a pointer
and a length.
* AstGen: use `coerced_ty` for start and end expressions, use `none`
for the sentinel, and don't try to load the result of the slice
operation because it returns a by-value result.
* Sema: pointer arithmetic is extracted into analyzePointerArithmetic
and it is used by the implementation of slice.
- Also I implemented comptime pointer addition.
* Sema: extract logic into analyzeSlicePtr, analyzeSliceLen and use them
inside the slice semantic analysis.
- The approach in stage2 is much cleaner than stage1 because it uses
more granular analysis calls for obtaining the slice pointer, doing
arithmetic on it, and checking if the length is comptime-known.
* Sema: use the slice Value Tag for slices when doing coercion from
pointer-to-array.
* LLVM backend: detect when emitting a GEP instruction into a
pointer-to-array and add the extra index that is required.
* Type: ptrAlignment for c_void returns 0.
* Implement Value.hash and Value.eql for slices.
* Remove accidentally duplicated behavior test.
* AstGen: Move `refToIndex` and `indexToRef` to Zir
* ZIR: the switch_block_*_* instruction tags are collapsed into one
switch_block tag which uses 4 bits for flags, and reduces the
scalar_cases_len field from 32 to 28 bits.
This freed up more ZIR tags, 2 of which are now used for
`switch_cond` and `switch_cond_ref` for producing the switch
condition value. For example, for union values it returns the
corresponding enum value.
* switching with multiple cases and ranges is not yet supported because
I want to change the ZIR encoding to store index pointers into the
extra array rather than storing prong indexes. This will avoid O(N^2)
iteration over prongs.
* AstGen now adds a `switch_cond` on the operand and then passes the
result of that to the `switch_block` instruction.
* Sema: partially implement `switch_capture_*` instructions.
* Sema: `unionToTag` notices if the enum type has only one possible value.
The remaining uses of this result location were causing a bunch of errors
problems where the pointers returned from rvalue and lvalue expressions
would be confused, allowing for extra pointers on rvalue expressions.
For example:
```zig
const X = struct {a: i32};
var x: X = .{.a = 1};
var ptr = &x;
_ = x.a;
```
In the last line, the lookup of x with result location .none_or_ref would
return a double pointer (**X). This would be dereferenced one, after which
a relative pointer to `a` would be fetched and derefenced to get the final
result.
However, this also allows us to manually construct a double pointer, and
fetch the field of the inner type of that:
```zig
_ = &(&(x)).a;
```
This problem also manifests itself with element access. There are two obvious
ways to fix the problem, both of which include replacing the usage of
.none_or_ref for field- and element accesses with something which
deterministically produce either a pointer or value: either result location
.ref or .none. In the former case, this would be paired with .elem_ptr, and
in the latter case with .elem_val.
Note that the stage 1 compiler does not have this problem, because there is
no equivalent of .elem_val and .field_val. In this way it is equivalent to
using the result location .ref for field- and element accesses.
In this case i have used .none, as this matches language behaviour more
closely.
* ZIR: the `array_type_sentinel` now has a source node attached to it
for proper error reporting.
* Refactor: move `Module.arrayType` to `Type.array`
* Value: the `bytes` and `array` tags now include the sentinel, if the
type has one. This simplifies comptime evaluation logic.
* Sema: fix `zirStructInitEmpty` to properly handle when the type is
void or a sentinel-terminated array. This handles the syntax `void{}`
and `[0:X]T{}`.
* Sema: fix the logic for reporting "cannot store runtime value in
compile time variable" as well as for emitting a runtime store when a
pointer value is comptime known but it is a global variable.
* Sema: implement elemVal for double pointer to array. This can happen
with this code for example: `var a: *[1]u8 = undefined; _ = a[0];`
* Sema: Rework the `storePtrVal` function to properly handle nested
structs and arrays.
- Also it now handles comptime stores through a bitcasted pointer.
When the pointer element type and the type according to the Decl
don't match, the element value is bitcasted before storage.
isZigPrimitiveType had a bug where it checked the integer names (e.g.
u32) before primitives, leading it to incorrectly return `false` for
`undefined` which starts with `u`.
Related: #9928