The fstat,lstat,stat,mknod stubs used to build older (before v2.33)
glibc versions depend on the weak_hidden_alias macro. It was removed
from the glibc libc-symbols header, so patch it back in for the older
builds.
The scope of libc_nonshared.a was greatly changed in glibc 2.33 and
2.34, but only the change from 2.34 was reflected so far. Glibc 2.33
finally switched to versioned symbols for stat functions, meaning that
libc_nonshared.a no longer contains them since 2.33. Relevant files were
therefore reverted to 2.32 versions and renamed accordingly.
This commit also removes errno.c, which was probably added to
libc_nonshared.a based on a wrong assumption that glibc/include/errno.h
requires glibc/csu/errno.c. In reality errno.h should refer to
__libc_errno (not to be confused with the public __errno_location),
which should be imported from libc.so. The inclusion of errno.c resulted
in wrong compile options as well; this commit fixes them as well.
These are tripping on 32-bit x86 but are intended to prevent glibc
itself from being built with a bad configuration. Zig is only using this
file to create libc_nonshared.a, so it's not relevant.
This is the only place in all of glibc that this macro is referenced.
What is it doing? Only preventing fstatat.c from knowing the type
definition of `__time64_t`, apparently.
Fixes compilation of fstatat.c on 32-bit x86.
I could have just included the file from upstream glibc, but it was too
silly so I just inlined it. This patch could be dropped in a future
glibc update if desired. If omitted it will cause easily solvable
C compilation failures building glibc nonshared.
- `fcntl` was renamed to `fcntl64` in glibc 2.28 (see #9485)
- `res_{,n}{search,query,querydomain}` became "their own" symbols since
glibc 2.34: they were prefixed with `__` before.
This PR makes it possible to use `fcntl` with glibc 2.27 or older and
the `res_*` functions with glibc 2.33 or older.
These patches will become redundant with universal-headers and can be
dropped. But we have to do with what we have now.
- `fcntl` was renamed to `fcntl64` in glibc 2.28 (see #9485)
- `res_{,n}{search,query,querydomain}` became "their own" symbols since
glibc 2.34: they were prefixed with `__` before.
This PR makes it possible to use `fcntl` with glibc 2.27 or older and
the `res_*` functions with glibc 2.33 or older.
These patches will become redundant with universal-headers and can be
dropped. But we have to do with what we have now.
This is a patch to glibc features.h which makes
_DYNAMIC_STACK_SIZE_SOURCE undefined unless the version is >= 2.34.
This feature was introduced with glibc 2.34 and without this patch, code
built against these headers but then run on an older glibc will end up
making a call to sysconf() that returns -1 for the value of SIGSTKSZ
and MINSIGSTKSZ.
The actual `zig objcopy` does not accept keeping multiple sections.
If you pass multiple `-j .section` arguments to `zig objcopy`, it will
only respect the last one passed.
Originally I changed `zig objcopy` to accept multiple sections
and then concatenate them instead of returning after outputting the
first section (see emitElf) but I realized concatenating probably doesn't make sense.
Fixes a regression introduced in 67455c5e70. The `errdefer` cannot run since its not possible for an error to occur, and we don't want it to run on the last handle, so we move the closing back down to where it was before 67455c5e70.
Most of the functions related to points on the Edwards25519 curve
check that input points are not in a small-order subgroup.
They don't check that points are on the prime-order subgroup,
because this is expensive, and not always necessary.
However, applications may require such a check in order to
ensure that a public key is valid, and that a secret key counterpart
exists.
Many functions in the public API of libsodium related to arithmetic
over Edwards25519 also do that check unconditionally. This is
expensive, but a good way to catch bugs in protocols and
implementations.
So, add a `rejectUnexpectedSubgroup()` function to achieve this.
The documentation on the edwards25519->curve25519 conversion
function was also updated, in order to explain how to match
libsodium's behavior if necessary.
We use an addition chain to multiply the point by the order of
the prime group.
An alternative we may implement later is Pornin's point halving
technique: https://eprint.iacr.org/2022/1164.pdf
On Windows, the console mode flag `ENABLE_VIRTUAL_TERMINAL_PROCESSING` determines whether or not ANSI escape codes are parsed/acted on. On the newer Windows Terminal, this flag is set by default, but on the older Windows Console, it is not set by default, but *can* be enabled (since Windows 10 RS1 from June 2016).
The new `File.getOrEnableAnsiEscapeSupport` function will get the current status of ANSI escape code support, but will also attempt to enable `ENABLE_VIRTUAL_TERMINAL_PROCESSING` on Windows if necessary which will provide better/more consistent results for things like `std.Progress` and `std.io.tty`.
This type of change was not done previously due to a mistaken assumption (on my part) that the console mode would persist after the run of a program. However, it turns out that the console mode is always reset to the default for each program run in a console session.
* Newer versions of Windows added VT seq support not only in Windows Terminal, but also in the old-fashioned Windows Console (standalone conhost.exe), though not enabled by default.
* Try setting the newer console mode flags provides better experience for Windows Console users.
Co-authored-by: Kexy Biscuit <kexybiscuit@biscuitt.in>
Instead of introducing YES_COLOR, a completely new standard, into the mix
it might make more sense to instead tag along with the CLICOLOR_FORCE env var,
which dates back to at least 2000 with FreeBSD 4.1.1 and which is
supported by tools like CMake.
<https://bixense.com/clicolors/>
The \r\n is necessary to get the progress tree to work properly in the old console when ENABLE_VIRTUAL_TERMINAL_PROCESSING and DISABLE_NEWLINE_AUTO_RETURN are set.
The line_upper_bound_len fix addresses part of #20161
This changes the terminal display to keep the cursor at the top left of
the progress display, so that unlocked stderr writes, perhaps by child
processes, don't get eaten by the clear.
The docs for setting stdio to "inherit" say:
It also means that this step will obtain a global lock to prevent other
steps from running in the meantime.
The implementation of this lock was missing but is now provided by this
commit.
closes#20119
Reduce node_storage_buffer_len from 200 to 83. This makes messages over
the pipe fit in a single packet (4096 bytes). There is now a comptime
assert to ensure this. In practice this is plenty of storage because
typical terminal heights are significantly less than 83 rows.
Handling of split reads is fixed; instead of using a global
`remaining_read_trash_bytes`, the value is stored in the "saved
metadata" for the IPC node.
Saved metadata is split into two arrays so that the "find" operation can
quickly scan over fds for a match, looking at 332 bytes maximum, and
only reading the memory for the other data upon match. More typical
number of bytes read for this operation would be 0 (no child processes),
4 (1 child process), or 64 (16 child processes reporting progress).
Removed an align(4) that was leftover from an older design.
This also includes part of Jacob Young's not-yet-landed patch that
implements `writevNonblock`.
rather than ignoring specifically "zig-cache" and "zig-out". The latter
is not necessarily the install prefix and should not be special.
The former will be handled by renaming zig-cache to .zig-cache.
3a3d2187f9 unintentionally broke some of the Windows console API implementation.
- The 'marker' character was no longer being written at all
- The ANSI escape codes for syncing were being written unconditionally
Clarify the usage of .paths in build.zig.zon. Follow the recommendation
of the comments to explicitly list paths by explicitly listing the paths
in the init project.
* Merge a bunch of related state together into TerminalMode. Windows
sometimes follows the same path as posix via ansi_escape_codes,
sometimes not.
* Use a different thread entry point for Windows API but share the same
entry point on Windows when the terminal is in ansi_escape_codes mode.
* Only clear the terminal when the stderr lock is held.
* Don't try to clear the terminal when nothing has been written yet.
* Don't try to clear the terminal in IPC mode.
* Fix size detection logic bug under error conditions.
7281cc1d839da6e84bb76fadb2c1eafc22a82df7 did not solve the problem
because even when Node.index is none, it still counts as initializing
the global Progress object. Just use a normal zig optional, and all is
good.
The update thread was sometimes reading the special state and then
incorrectly getting 0 for the file descriptor, making it hang since it
tried to read from stdin.
Slightly slower refresh rate. It's still updating very quickly.
Lower the initial delay so that CLI applications feel more responsive.
Even though the application is doing work for the full 500ms until
something is displayed on the screen, it feels nicer to get the progress
earlier.
Introduces `disable_zig_progress` which prevents the build runner from
assigning the child process a progress node.
This is needed for the empty_env test which requires the environment to
be completely empty.
This makes it so that any other threads which are writing to stderr have
a chance to finish before the process terminates. It also clears the
terminal in case any progress has been written to stderr, while still
accomplishing the goal of not waiting until the update thread exits.
when the root progress node has a zero length name, the sub-tree is
flattened one layer, reducing visual noise, as well as bytes written to
the terminal.
Split newline_count into written_newline_count and
accumulated_newline_count. This handle the case when the tryLock() fails
to obtain the lock, because in such case there would not be any newlines
written to the terminal but the system would incorrectly think there
were. Now, written_newline_count is only adjusted when the write() call
succeeds.
Furthermore, write() call failure is handled by exiting the update
thread.
Don't truncate trailing newline. This better handles stray writes to
stderr that are not std.Progress-aware, such as from non-zig child
processes.
This commit also makes `Node.start` and `Node.end` bail out early with a
comptime branch when it is known the target will not be spawning an
update thread.
Switch Node.Parent, Node.Index, and Node.OptionalIndex to be backed by
u8 rather than u16. This works fine since we use 200 as the preallocated
node buffer. This has the nice property that scanning the entire parents
array for allocated nodes fits in 4 cache lines, even if we bumped the
200 up to 254 (leaving room for the two special states).
The thread that reads progress updates from the pipe now handles short
reads by ignoring messages that are sent in multiple reads.
When checking the terminal size, if there is a failure, fall back to a
conservative guess of 80x25 rather than panicking. A debug message is
also emitted which would be displayed only in a debug build.
This accomplishes 2 things simultaneously:
1. Don't trust child process data; if the data is outside the expected
range, ignore the data.
2. If there is too much data to fit in the preallocated buffers, drop
the data.
Instead of making static buffers configurable, let's pick strong
defaults and then use the update thread's stack memory to store the
preallocations. The thread uses a fairly shallow stack so this memory is
otherwise unused. This also makes the data section of the executable
smaller since it runtime allocates the memory when a `std.Progress`
instance is allocated, and in the case that the process is not connected
to a terminal, it never allocates the memory.
* correctly report time spent analyzing function bodies
* print fully qualified decl names
* also have a progress node for codegen
The downside of these changes is that it's a bit flickerey, but the
upside is that it's accurate; you can see what the compiler's doing!
This fix is already in master branch for stdin, stdout, and stderr; this
commit solves the same problem but for the progress pipe.
Both fixes were originally included in one commit on this branch,
however it was split it into two so that master branch could receive the
fix before the progress branch is merged.
It stored some metadata into the canonical node storage data but that is
a race condition because another thread recycles those nodes.
Also, keep the parent name for empty child root node names.
* bump default statically allocated resources
* debug help when multiple instances of std.Progress are initialized
* only handle sigwinch on supported operating systems
* handle when reading from the pipe returns 0 bytes
* avoid printing more lines than rows
This time, we preallocate a fixed set of nodes and have the user-visible
Node only be an index into them. This allows for lock-free management of
the node storage.
Only the parent indexes are stored, and the update thread makes a
serialized copy of the state before trying to compute children lists.
The update thread then walks the tree and outputs an entire tree of
progress rather than only one line.
There is a problem with clearing from the cursor to the end of the
screen when the cursor is at the bottom of the terminal.
New design ideas:
* One global instance, don't try to play nicely with other instances
except via IPC.
* One process owns the terminal and the other processes communicate via
IPC.
* Clear the whole terminal and use multiple lines.
What's implemented so far:
* Query the terminal for size.
* Register a SIGWINCH handler.
* Use a thread for redraws.
To be done:
* IPC
* Handling single threaded targets
* Porting to Windows
* More intelligent display of the progress tree rather than only using
one line.
You don't know if it's possible to run a binary until you try. The build
system already integrates with executors and has the
`skip_foreign_checks` for exactly this use case.
- Used `Self` instead of `*const Self` where appropriate (orignally proposed in #19770)
- Replaced `@intFromPtr` and `@ptrFromInt` with `@ptrCast`, `@alignCast`, and pointer arithmetic where appropriate
With this, the only remaining instance on pointer-int conversion in hash_map.zig is in `HashMapUnmanaged.removeByPtr`, which easily be able to be eliminated once pointer subtraction is supported.
The added comment explains the issue here relatively well. The new
progress API made this bug obvious because it became visibly clear that
certain Compile steps were seemingly "hanging" until other steps
completed. As it turned out, these child processes had raced to spawn,
and hence one had inherited the other's stdio pipes, meaning the `poll`
call in `std.Build.Step.evalZigProcess` was not identifying the child
stdout as closed until an unrelated process terminated.
This reverts commit a7de02e052.
This did not implement the accepted proposal, and I did not sign off
on the changes. I would like a chance to review this, please.
Fixes regression introduced by 5d5e89aa8d
Turns out since landing that PR we haven't run any tests requiring
symlinks or any Apple SDK on a macOS host. Not great.
ArrayList uses `items` slice to store len initialized items, while
PriorityQueue stores `capacity` potentially uninitialized items.
This is a surprising difference in the API that leads to bugs!
https://github.com/tigerbeetle/tigerbeetle/pull/1948
The wrong `size_class` was used when fetching stack traces from empty
buckets. The `size_class` would always be the maximum value after
exhausting the search of active buckets rather than the actual
`size_class` of the allocation.
Empty buckets have their `alloc_cursor` set to `slot_count` to allow the
size class to be calculated later. This happens deep within the free
function.
This adds a helper and a test to verify that the size class of empty
buckets is indeed recoverable.
Not really useful after old C++ compiler removal, and
zig build has own cache system. If someone still wants it,
`CMAKE_C_COMPILER_LAUNCHER` and `CMAKE_CXX_COMPILER_LAUNCHER` exist.
From CMake docs:
> RULE_LAUNCH_COMPILE
> Note: This property is intended for internal use by ctest(1).
> Projects and developers should use the <LANG>_COMPILER_LAUNCHER
> target properties or the associated CMAKE_<LANG>_COMPILER_LAUNCHER
> variables instead.
Signed-off-by: Eric Joldasov <bratishkaerik@landless-city.net>
Set `ZIG_PIE` default to be same as `CMAKE_POSITION_INDEPENDENT_CODE`, and
add check for situation when `ZIG_PIE` is set to True but CMake does not
support compiling position independent code. CMake's support is needed
for "zigcpp" target.
Also remove temporary variables for constructing `ZIG_BUILD_ARGS`,
instead use `list(APPEND ...)` functions.
Also remove long unused `ZIG_NO_LANGREF` variable.
Signed-off-by: Eric Joldasov <bratishkaerik@landless-city.net>
Replace `CMAKE_SOURCE_DIR` and `CMAKE_BUILD_DIR` with different variables,
or in some cases just remove them.
For some function arguments, prepended `CMAKE_SOURCE_DIR` was removed without
replacement. This includes:
* Sources for `add_library` and `add_executable` (`ZIG_CPP_SOURCES` and `ZIG_WASM2C_SOURCES`)
* Inputs for `configure_file` and `target_include_directory`
* For arguments above, CMake already prepends
`CMAKE_CURRENT_SOURCE_DIR` to them by default, if they are relative paths.
Additionaly, it was removed from arguments of commands that have `WORKING_DIRECTORY` set to
`PROJECT_SOURCE_DIR`, they will be similarly converted by CMake for us.
Also:
* Move project declaration to the top so that these variables are
available earlier.
* Avoid calling "git" executable if ".git" directory does not exist.
* Swap "--prefix" and `ZIG_BUILD_ARGS` arguments in cmake/install.cmake
to match same "zig2 build" command in CMakeLists.txt and allow
overriding "--prefix" argument
Signed-off-by: Eric Joldasov <bratishkaerik@landless-city.net>
* Localize most of the global properties and functions, for some time
they are only needed for "zigcpp" static library (sometimes with PUBLIC
keyword, so that it will propagate to zig2): `CMAKE_*_OUTPUT_DIRECTORY`
and two calls to `include_directories`. This removes useless flags when
building other targets and cleans build log a bit.
* Remove `EXE_CXX_FLAGS` variable, instead use more appropriate specific
properties and functions for this target. This gives better errors if
compiler does not support some of them, and CMake also handles for us
duplicate flags. It's also easier to read side-by-side with same
flags from build.zig .
* Add some comments.
Signed-off-by: Eric Joldasov <bratishkaerik@landless-city.net>
Previously the error had a note suggesting to use `try`, `catch`, or
`if`, even for error sets where none of those work.
Instead, in case of an error set the way you can handle the error
depends very much on the specific case. For example you might be in a
`catch` where you are discarding or ignoring the error set capture
value, in which case one way to handle the error might be to `return`
the error.
So, in that case, we do not attach that error note.
Additionally, this makes the error tell you what kind of an error it is:
is it an error set or an error union? This distinction is very relevant
in how to handle the error.
Maybe I'm just being pedantic here (most likely) but I don't like how we're
just telling the user here how to "suppress this error" by "assigning the value to '_'".
I think it's better if we use the word "discard" here which I think is
the official terminology and also tells the user what it actually means
to "assign the value to '_'".
Also, using the value would also be a way to "suppress the error".
It is just one of the two options: discard or use.
* Revert "Revert "Merge pull request #19349 from nolanderc/save-commit""
This reverts commit 6ca4ed5948.
* update to new URI changes, rework `--save` type
* initialize `latest_commit` to null everywhere
this one is even harder to document then the last large overhaul.
TLDR;
- split apart Emit.zig into an Emit.zig and a Lower.zig
- created seperate files for the encoding, and now adding a new instruction
is as simple as just adding it to a couple of switch statements and providing the encoding.
- relocs are handled in a more sane maner, and we have a clear defining boundary between
lea_symbol and load_symbol now.
- a lot of different abstractions for things like the stack, memory, registers, and others.
- we're using x86_64's FrameIndex now, which simplifies a lot of the tougher design process.
- a lot more that I don't have the energy to document. at this point, just read the commit itself :p
this commit is a little too large to document fully, however the main gist of it this
- finish the `genInlineMemcpy` implement
- rename `setValue` to `genCopy` as I agree with jacob that it's a better name
- add in `genVarDbgInfo` for a better gdb experience
- follow the x86_64's method for genCall, as the procedure is very similar for us
- add `airSliceLen` as it's trivial
- change up the `airAddWithOverflow implementation a bit
- make sure to not spill of the elem_ty is 0 size
- correctly follow the RISC-V calling convention and spill the used calle saved registers in the prologue
and restore them in the epilogue
- add `address`, `deref`, and `offset` helper functions for MCValue. I must say I love these,
they make the code very readable and super verbose :)
- fix a `register_manager.zig` issue where when using the last register in the set, the value would overflow at comptime.
was happening because we were adding to `max_id` before subtracting from it.
the truncation panic logic is generated in Sema, so I don't need to roll anything
of my own. I add all of the boilerplate for that detecting the truncation and it works
in basic test cases!
this provides a much better indication of when we are having a controlled panic with an error message
or when we are actually segfaulting, as before the `trap` as causing a segfault.
when the struct is in stack memory, we access it using a byte-offset,
because that's how the stack works. on the other hand when the struct
is in a register, we are working with bits and the field offset should
be a bit offset.
- Added the basic framework for panicing with an overflow in `airAddWithOverflow`, but there is no check done yet.
- added the `cmp_lt`, `cmp_gte`, and `cmp_imm_eq` MIR instructions, and their respective functionality.
lots of thinking later, ive begun to grasp my head around how the pointers should work. this commit allows basic pointer loading and storing to happen.
- before we were storing each arg in it's own function arg register. with this commit now we store the args in the fa register before calling as per the RISC-V calling convention, however as soon as we enter the callee, aka in airArg, we spill the argument to the stack. this allows us to spend less effort worrying about whether we're going to clobber the function arguments when another function is called inside of the callee.
- we were actually clobbering the fa regs inside of resolveCallingConvetion, because of the null argument to allocReg. now each lock is stored in an array which is then iterated over and unlocked, which actually aids in the first point of this commit.
this was an annoying one to do, as there is no (to my knowledge) myriad sequence
that will allow us to do `gte` compares with an immediate without allocating a register.
RISC-V provides a single instruction to do compares, that being `lt`, and so you need to
use more than one for other variants, but in this case, i believe you need to allocate a register.
- implement `airArrayElemVal` for arrays on the stack. This is really easy
as we can just move the offset by the bytes into the array. This only works
when the index access is comptime-known though, this won't work for runtime access.
the current implementation only works when the struct is in a register. we use some shifting magic
to get the field into the LSB, and from there, given the type provenance, the generated code should
never reach into the bits beyond the bit size of the type and interact with the rest of the struct.
we use a code offset map in Emit.zig to pre-compute what byte offset each MIR instruction is at. this is important because they can be
of different size
- rename setRegOrMem -> setValue
- a naive method of passing arguments by register
- gather the prologue and epilogue and generate them in Emit.zig. this is cleaner because we have the final stack size in the emit step.
- define the "fa" register set, which contains the RISC-V calling convention defined function argument registers
Information about installed MSVC instances are stored in `state.json` files within a `Packages/_Instances` directory. The default location for this is `%PROGRAMDATA%\Microsoft\VisualStudio\Packages\_Instances`. However, it is possible for the Packages directory to be put somewhere else. In that case, the registry value `HKLM\SOFTWARE\Microsoft\VisualStudio\Setup\CachePath` is set and contains the path to the Packages directory.
Previously, WindowsSdk did not check that registry value. After this commit, the registry value `HKLM\SOFTWARE\Microsoft\VisualStudio\Setup\CachePath` is checked first, which matches what ISetupEnumInstances does (according to a Procmon log).
Clang 17 passed struct{f128} parameters using rdi and rax, while Clang
18 matches GCC 13.2 behavior, passing them using xmm0.
This commit makes Zig's LLVM backend match Clang 18 and GCC 13.2. The
commit deletes a hack in x86_64/abi.zig which miscategorized f128 as
"memory" which obviously disagreed with the spec.
LLVMABIAlignmentOfType(i128) reports 16 on this target, however the C
ABI uses align(4).
Clang in LLVM 17 does this:
%struct.foo = type { i32, i128 }
Clang in LLVM 18 does this:
%struct.foo = type <{ i32, i128 }>
Clang is working around the 16-byte alignment to use align(4) for the C
ABI by making the LLVM struct packed.
release/18.x branch, commit 78b99c73ee4b96fe9ce0e294d4632326afb2db42
This adds the flag `-D_LIBCPP_HARDENING_MODE` which is determined based
on the Zig optimization mode.
This commit also fixes libunwind, libcxx, and libcxxabi to properly
report sub compilation errors.
LLVM now refuses to lower arguments and return values on x86 targets
when the total vector bit size is >= 512.
This code detects such a situation and uses byref instead of byval.
* some manual fixes to generated CPU features code. In the future it
would be nice to make the script do those automatically.
* add to various target OS switches. Some of the values I was unsure of
and added TODO panics, for example in the case of spirv CPU arch.
New OSs:
* XROS
* Serenity
* Vulkan
Removed OSs:
* Ananas
* CloudABI
* Minix
* Contiki
New CPUs:
* spirv
The removed stuff is removed from LLVM but not Zig.
This allows running commands that take an output directory argument. The
main thing that was needed for this feature was generated file subpaths,
to allow access to the files in a generated directory. Additionally, a
minor change was required to so that the correct directory is created
for output directory args.
* `doc/langref` formatting
* upgrade `.{ .path = "..." }` to `b.path("...")`
* avoid using arguments named `self`
* make `Build.Step.Id` usage more consistent
* add `Build.pathResolve`
* use `pathJoin` and `pathResolve` everywhere
* make sure `Build.LazyPath.getPath2` returns an absolute path
This replaces `extra_file_dependencies` with support for lazy paths.
Also assert output file basenames are not empty, avoid improper use of
field default values, ensure stored strings are duplicated, and
prefer `ArrayListUnmanaged` to discourage misuse of direct field access
which wouldn't add step dependencies.
This was a "fake" type used to handle C varargs parameters, much like
generic poison. In fact, it is treated identically to generic poison in
all cases other than one (the final coercion of a call argument), which
is trivially special-cased. Thus, it makes sense to remove this special
tag and instead use `generic_poison_type` in its place. This fixes
several bugs in Sema related to missing handling of this tag.
Resolves: #19781
This function accepts a WaitGroup parameter and manages the reference
counting therein. It also is infallible.
The existing `spawn` function is still handy when the job wants to
further schedule more tasks.
closes#19803 by changing quota from (30 * N) to (10 * N * log2(N)) where
N = kvs_list.len
* adds reported adversarial test case
* update doc comment of getLongestPrefix()
Adds an `include_paths` field to RcSourceFile that takes a slice of LazyPaths. The paths are resolved and subsequently appended to the -rcflags as `/I <resolved path>`.
This fixes an accidental regression from https://github.com/ziglang/zig/pull/19174. Before that PR, all Win32 resource compilation would inherit the CC flags (via `addCCArgs`), which included things like include directories. After that PR, though, that is no longer the case.
However, this commit intentionally does not restore the previous behavior (inheriting the C include paths). Instead, each .rc file will need to have its include paths specified directly and the include paths only apply to one particular resource script. This allows more fine-grained control and has less potentially surprising behavior (at the cost of some convenience).
Closes#19605
This function incorrectly assumed that module name subsections, function
name subsections, and local name subsections are encoded the same,
however according to
[the specification](https://webassembly.github.io/spec/core/appendix/custom.html)
they are encoded differently.
This commit adds support for parsing module name subsections correctly,
which started appearing after upgrading to LLVM 18.
As of Clang 18, calling memcpy() with a misaligned pointer trips UBSAN,
even if the length is zero. This unfortunately includes any call to
`@memcpy` when source or destination are undefined and the length is
zero.
This patch makes the C backend avoid calling memcpy when the length is
zero, thereby avoiding undefined behavior.
A zig1.wasm update will be needed in the llvm18 branch to activate this
code.
wheras on NetBSD, only 2 PT_LOAD are usually produced by other compilers
(tested with host gcc and clang).
$ ldd -v main_4segs
.../main_4segs: wrong number of segments (4 != 2)
.../main_4segs: invalid ELF class 2; expected 1
* std.crypto.hash.sha2: generalize sha512 truncation
Replace `Sha512224`, `Sha512256`, and `Sha512T224` with
`fn Sha512Truncated(digest_bits: comptime_int)`.
This required refactoring `Sha2x64(comptime params)` to
`Sha2x64(comptime iv: [8]u64, digest_bits: comptime_int)`
for user-specified `digest_bits`.
I left #19697 alone but added a compile-time check that digest_bits is
divisible by 8.
Remove docs which restate type name. Add module docs and reference where
IVs come from.
* std.crypto.sha2: make Sha512_224 and Sha512_256 pub
* make generic type implementation detail, add comments
* fix iv
* address @jedisct1 feedback
* fix typo
* renaming
* add truncation clarifying comment and Sha259T192 tests
* Fix the ELF binaries for freestanding target created with the self-hosted linker.
The ELF specification (generic ABI) states that ``loadable process segments must have congruent
values for p_vaddr and p_offset, modulo the page size''. Linux refuses to load binaries that
don't meet this requirement (execve() fails with EINVAL).
This removes the two original implementations in favour of the single
generic one based on the Algorithm type. Previously we had three, very
similar implementations which was somewhat confusing when knowing what
one should actually be used.
The previous polynomials all have equivalent variants available when
using the Algorithm type.
A volume can be mounted as a NTFS path, e.g. as C:\Mnt\Foo. In that case, IOCTL_MOUNTMGR_QUERY_POINTS gives us a mount point with a symlink value something like `\??\Volume{383da0b0-717f-41b6-8c36-00500992b58d}`. In order to get the `C:\Mnt\Foo` path, we can query the mountmgr again using IOCTL_MOUNTMGR_QUERY_DOS_VOLUME_PATH.
Fixes#19731
* Adjust buffer length a bit.
* Fix detecting if file is a script. Logic below was unreachable,
because 99% of scripts failed "At least 255 bytes long" check and were detected as ELF files.
It should be "At least 4" instead (minimum value of "ELF magic length" and "smallest possible interpreter path length").
* Fix parsing interpreter path, when text after shebang:
1. does not have newline,
2. has leading spaces and tabs,
3. separates interpreter and arguments by tab or NUL.
* Remove empty error set from `defaultAbiAndDynamicLinker`.
Signed-off-by: Eric Joldasov <bratishkaerik@landless-city.net>
* std.crypto: make ff.ct_unprotected.limbsCmpLt compile
* std.crypto: add ff.ct test
* fix testCt to work on x86
* disable test on stage2-c
---------
Co-authored-by: Frank Denis <124872+jedisct1@users.noreply.github.com>
> Note: This first part is mostly a rephrasing of https://flatt.tech/research/posts/batbadbut-you-cant-securely-execute-commands-on-windows/
> See that article for more details
On Windows, it is possible to execute `.bat`/`.cmd` scripts via CreateProcessW. When this happens, `CreateProcessW` will (under-the-hood) spawn a `cmd.exe` process with the path to the script and the args like so:
cmd.exe /c script.bat arg1 arg2
This is a problem because:
- `cmd.exe` has its own, separate, parsing/escaping rules for arguments
- Environment variables in arguments will be expanded before the `cmd.exe` parsing rules are applied
Together, this means that (1) maliciously constructed arguments can lead to arbitrary command execution via the APIs in `std.process.Child` and (2) escaping according to the rules of `cmd.exe` is not enough on its own.
A basic example argv field that reproduces the vulnerability (this will erroneously spawn `calc.exe`):
.argv = &.{ "test.bat", "\"&calc.exe" },
And one that takes advantage of environment variable expansion to still spawn calc.exe even if the args are properly escaped for `cmd.exe`:
.argv = &.{ "test.bat", "%CMDCMDLINE:~-1%&calc.exe" },
(note: if these spawned e.g. `test.exe` instead of `test.bat`, they wouldn't be vulnerable; it's only `.bat`/`.cmd` scripts that are vulnerable since they go through `cmd.exe`)
Zig allows passing `.bat`/`.cmd` scripts as `argv[0]` via `std.process.Child`, so the Zig API is affected by this vulnerability. Note also that Zig will search `PATH` for `.bat`/`.cmd` scripts, so spawning something like `foo` may end up executing `foo.bat` somewhere in the PATH (the PATH searching of Zig matches the behavior of cmd.exe).
> Side note to keep in mind: On Windows, the extension is significant in terms of how Windows will try to execute the command. If the extension is not `.bat`/`.cmd`, we know that it will not attempt to be executed as a `.bat`/`.cmd` script (and vice versa). This means that we can just look at the extension to know if we are trying to execute a `.bat`/`.cmd` script.
---
This general class of problem has been documented before in 2011 here:
https://learn.microsoft.com/en-us/archive/blogs/twistylittlepassagesallalike/everyone-quotes-command-line-arguments-the-wrong-way
and the course of action it suggests for escaping when executing .bat/.cmd files is:
- Escape first using the non-cmd.exe rules
- Then escape all cmd.exe 'metacharacters' (`(`, `)`, `%`, `!`, `^`, `"`, `<`, `>`, `&`, and `|`) with `^`
However, escaping with ^ on its own is insufficient because it does not stop cmd.exe from expanding environment variables. For example:
```
args.bat %PATH%
```
escaped with ^ (and wrapped in quotes that are also escaped), it *will* stop cmd.exe from expanding `%PATH%`:
```
> args.bat ^"^%PATH^%^"
"%PATH%"
```
but it will still try to expand `%PATH^%`:
```
set PATH^^=123
> args.bat ^"^%PATH^%^"
"123"
```
The goal is to stop *all* environment variable expansion, so this won't work.
Another problem with the ^ approach is that it does not seem to allow all possible command lines to round trip through cmd.exe (as far as I can tell at least).
One known example:
```
args.bat ^"\^"key^=value\^"^"
```
where args.bat is:
```
@echo %1 %2 %3 %4 %5 %6 %7 %8 %9
```
will print
```
"\"key value\""
```
(it will turn the `=` into a space for an unknown reason; other minor variations do roundtrip, e.g. `\^"key^=value\^"`, `^"key^=value^"`, so it's unclear what's going on)
It may actually be possible to escape with ^ such that every possible command line round trips correctly, but it's probably not worth the effort to figure it out, since the suggested mitigation for BatBadBut has better roundtripping and leads to less garbled command lines overall.
---
Ultimately, the mitigation used here is the same as the one suggested in:
https://flatt.tech/research/posts/batbadbut-you-cant-securely-execute-commands-on-windows/
The mitigation steps are reproduced here, noted with one deviation that Zig makes (following Rust's lead):
1. Replace percent sign (%) with %%cd:~,%.
2. Replace the backslash (\) in front of the double quote (") with two backslashes (\\).
3. Replace the double quote (") with two double quotes ("").
4. ~~Remove newline characters (\n).~~
- Instead, `\n`, `\r`, and NUL are disallowed and will trigger `error.InvalidBatchScriptArg` if they are found in `argv`. These three characters do not roundtrip through a `.bat` file and therefore are of dubious/no use. It's unclear to me if `\n` in particular is relevant to the BatBadBut vulnerability (I wasn't able to find a reproduction with \n and the post doesn't mention anything about it except in the suggested mitigation steps); it just seems to act as a 'end of arguments' marker and therefore anything after the `\n` is lost (and same with NUL). `\r` seems to be stripped from the command line arguments when passed through a `.bat`/`.cmd`, so that is also disallowed to ensure that `argv` can always fully roundtrip through `.bat`/`.cmd`.
5. Enclose the argument with double quotes (").
The escaped command line is then run as something like:
cmd.exe /d /e:ON /v:OFF /c "foo.bat arg1 arg2"
Note: Previously, we would pass `foo.bat arg1 arg2` as the command line and the path to `foo.bat` as the app name and let CreateProcessW handle the `cmd.exe` spawning for us, but because we need to pass `/e:ON` and `/v:OFF` to cmd.exe to ensure the mitigation is effective, that is no longer tenable. Instead, we now get the full path to `cmd.exe` and use that as the app name when executing `.bat`/`.cmd` files.
---
A standalone test has also been added that tests two things:
1. Known reproductions of the vulnerability are tested to ensure that they do not reproduce the vulnerability
2. Randomly generated command line arguments roundtrip when passed to a `.bat` file and then are passed from the `.bat` file to a `.exe`. This fuzz test is as thorough as possible--it tests that things like arbitrary Unicode codepoints and unpaired surrogates roundtrip successfully.
Note: In order for the `CreateProcessW` -> `.bat` -> `.exe` roundtripping to succeed, the .exe must split the arguments using the post-2008 C runtime argv splitting implementation, see https://github.com/ziglang/zig/pull/19655 for details on when that change was made in Zig.
ensureTotalCapacityPrecise only satisfies the assumptions made in the ArrayListImpl functions (that there's already enough capacity for the entire converted string if it's all ASCII) when the ArrayList has no items, otherwise it would hit illegal behavior.
Windows does not support RPATH and only searches for DLLs in a small
number of predetermined paths by default, with one of them being the
directory from which the application loaded.
Installing both executables and DLLs to `bin/` by default helps ensure
that the executable can find any DLL artifacts it has linked to.
DLL import libraries are still installed to `lib/`.
These defaults match CMake's behavior.
this patch renames ComptimeStringMap to StaticStringMap, makes it
accept only a single type parameter, and return a known struct type
instead of an anonymous struct. initial motivation for these changes
was to reduce the 'very long type names' issue described here
https://github.com/ziglang/zig/pull/19682.
this breaks the previous API. users will now need to write:
`const map = std.StaticStringMap(T).initComptime(kvs_list);`
* move `kvs_list` param from type param to an `initComptime()` param
* new public methods
* `keys()`, `values()` helpers
* `init(allocator)`, `deinit(allocator)` for runtime data
* `getLongestPrefix(str)`, `getLongestPrefixIndex(str)` - i'm not sure
these belong but have left in for now incase they are deemed useful
* performance notes:
* i posted some benchmarking results here:
https://github.com/travisstaloch/comptime-string-map-revised/issues/1
* i noticed a speedup reducing the size of the struct from 48 to 32
bytes and thus use u32s instead of usize for all length fields
* i noticed speedup storing KVs as a struct of arrays
* latest benchmark shows these wall_time improvements for
debug/safe/small/fast builds: -6.6% / -10.2% / -19.1% / -8.9%. full
output in link above.
* define std.crypto.sha2.Sha512224
* rename blunder
* add sha512-224 and sha512-256 tests
* fix Sha2x64 for variations that aren't a multiple of 64 bits
Also removes the LOCK namespace from std.c.wasi because wasi libc does
not have flock.
closes#19336
related to #19352
Co-authored-by: Ryan Liptak <squeek502@hotmail.com>
This is a partial revert of 105db13536.
As we learned from Void Linux packaging, these options are not actually
helpful since the distribution package manager may very well want to
cross-compile the packages that it is building.
So, let's not overcomplicate things. There are already the standard
options: -Dtarget, -Dcpu, and -Ddynamic-linker.
These options are generally provided when the project generates machine
code artifacts, however, there may be a project that does no such thing,
in which case it makes sense for these options to be missing. The Zig
Build System is a general-purpose build system, after all.
Certain types (notably, `std.ComptimeStringMap`) were resulting in excessively
long type names when instantiated, which in turn resulted in excessively long
symbol names. These are problematic for two reasons:
* Symbol names are sometimes read by humans -- they ought to be readable.
* Some other applications (looking at you, xcode) trip on very long symbol names.
To work around this for now, we cap the depth of value printing at 1, as opposed
to the normal 3. This doesn't guarantee anything -- there could still be, for
instance, an incredibly long aggregate -- but it works around the issue in
practice for the time being.
When building zig natively from source on an RK3588 SoC board, the
generated stage3 compiler will use unsupported 'sha256h.4s' instructions
due to mis-identified CPU features. This happens when trying to
compile a new project generated with "zig init"
$ ~/devel/zig/build/stage3/bin/zig build
thread 919 panic: Illegal instruction at address 0x1fdc0c4
/home/dliviu/devel/zig/lib/std/crypto/sha2.zig:223:29: 0x1fdc0c4 in round (zig)
asm volatile (
^
/home/dliviu/devel/zig/lib/std/crypto/sha2.zig:168:20: 0x1fdca87 in final (zig)
d.round(&d.buf);
^
/home/dliviu/devel/zig/lib/std/crypto/sha2.zig:180:20: 0x1c8bb33 in finalResult (zig)
d.final(&result);
^
/mnt/home/dliviu/devel/zig/src/Package/Fetch.zig:754:49: 0x1a3a8eb in relativePathDigest (zig)
return Manifest.hexDigest(hasher.finalResult());
^
/mnt/home/dliviu/devel/zig/src/main.zig:5128:53: 0x1a37413 in cmdBuild (zig)
Package.Fetch.relativePathDigest(build_mod.root, global_cache_directory),
^
???:?:?: 0xff09ffffffffffff in ??? (???)
Unwind information for `???:0xff09ffffffffffff` was not available, trace may be incomplete
???:?:?: 0x3f7f137 in ??? (???)
Aborted (core dumped)
system/linux.zig parses "/proc/cpuinfo" to determine the CPU features, but it
seems to generate the wrong set of features from:
$ cat /proc/cpuinfo
processor : 0
BogoMIPS : 48.00
Features : fp asimd evtstrm crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd05
CPU revision : 0
.....
processor : 4
BogoMIPS : 48.00
Features : fp asimd evtstrm crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 0
.....
To fix this, use the Linux kernel way of reading the feature registers as documented
here: https://www.kernel.org/doc/html/latest/arch/arm64/cpu-feature-registers.html
arm.zig already has the code to parse the feature register values, we just need to
collect them in an array and pass them for identification.
Signed-off-by: Liviu Dudau <liviu@dudau.co.uk>
The operation `undefined & 0` ought to result in the value `0`, and likewise for
zeroing only some bits. `std/packed_int_array.zig` tests were failing because
this behavior was not implemented -- this issue was previously masked by faulty
bitcast logic which turned `undefined` values into `0xAA` on pointer loads.
Ideally, we would like to be able to track the undefined bits at comptime.
This is related to #19634.
This commit reverts the handling of partially-undefined values in
bitcasting to transform these bits into an arbitrary numeric value,
like happens on `master` today.
As @andrewrk rightly points out, #19634 has unfortunate consequences
for the standard library, and likely requires more thought. To avoid
a major breaking change, it has been decided to revert this design
decision for now, and make a more informed decision further down the
line.
I have noted several latent bugs in the handling of packed memory on
big-endian targets, and one of them has been exposed by the previous
commit, triggering this test failure. I am opting to disable it for now
on the ground that the commit preceding this one helps far more than it
hurts.
We've got a big one here! This commit reworks how we represent pointers
in the InternPool, and rewrites the logic for loading and storing from
them at comptime.
Firstly, the pointer representation. Previously, pointers were
represented in a highly structured manner: pointers to fields, array
elements, etc, were explicitly represented. This works well for simple
cases, but is quite difficult to handle in the cases of unusual
reinterpretations, pointer casts, offsets, etc. Therefore, pointers are
now represented in a more "flat" manner. For types without well-defined
layouts -- such as comptime-only types, automatic-layout aggregates, and
so on -- we still use this "hierarchical" structure. However, for types
with well-defined layouts, we use a byte offset associated with the
pointer. This allows the comptime pointer access logic to deal with
reinterpreted pointers far more gracefully, because the "base address"
of a pointer -- for instance a `field` -- is a single value which
pointer accesses cannot exceed since the parent has undefined layout.
This strategy is also more useful to most backends -- see the updated
logic in `codegen.zig` and `codegen/llvm.zig`. For backends which do
prefer a chain of field and elements accesses for lowering pointer
values, such as SPIR-V, there is a helpful function in `Value` which
creates a strategy to derive a pointer value using ideally only field
and element accesses. This is actually more correct than the previous
logic, since it correctly handles pointer casts which, after the dust
has settled, end up referring exactly to an aggregate field or array
element.
In terms of the pointer access code, it has been rewritten from the
ground up. The old logic had become rather a mess of special cases being
added whenever bugs were hit, and was still riddled with bugs. The new
logic was written to handle the "difficult" cases correctly, the most
notable of which is restructuring of a comptime-only array (for
instance, converting a `[3][2]comptime_int` to a `[2][3]comptime_int`.
Currently, the logic for loading and storing work somewhat differently,
but a future change will likely improve the loading logic to bring it
more in line with the store strategy. As far as I can tell, the rewrite
has fixed all bugs exposed by #19414.
As a part of this, the comptime bitcast logic has also been rewritten.
Previously, bitcasts simply worked by serializing the entire value into
an in-memory buffer, then deserializing it. This strategy has two key
weaknesses: pointers, and undefined values. Representations of these
values at comptime cannot be easily serialized/deserialized whilst
preserving data, which means many bitcasts would become runtime-known if
pointers were involved, or would turn `undefined` values into `0xAA`.
The new logic works by "flattening" the datastructure to be cast into a
sequence of bit-packed atomic values, and then "unflattening" it; using
serialization when necessary, but with special handling for `undefined`
values and for pointers which align in virtual memory. The resulting
code is definitely slower -- more on this later -- but it is correct.
The pointer access and bitcast logic required some helper functions and
types which are not generally useful elsewhere, so I opted to split them
into separate files `Sema/comptime_ptr_access.zig` and
`Sema/bitcast.zig`, with simple re-exports in `Sema.zig` for their small
public APIs.
Whilst working on this branch, I caught various unrelated bugs with
transitive Sema errors, and with the handling of `undefined` values.
These bugs have been fixed, and corresponding behavior test added.
In terms of performance, I do anticipate that this commit will regress
performance somewhat, because the new pointer access and bitcast logic
is necessarily more complex. I have not yet taken performance
measurements, but will do shortly, and post the results in this PR. If
the performance regression is severe, I will do work to to optimize the
new logic before merge.
Resolves: #19452Resolves: #19460
* fix UB when Thread.spawn fails, and we try to join uninitialized
threads (through new `thread_count` variable).
* make sure that the tests pins down synchronization guarantees: in the
main thread, we can observe `1` due to synchronization from
`Thread.join()`. To make sure that once uses Acq/Rel, and not just
relaxed, we should also additionally check that each thread observes
1, regardless of whether it was the one to call once.
On Windows, the command line arguments of a program are a single WTF-16 encoded string and it's up to the program to split it into an array of strings. In C/C++, the entry point of the C runtime takes care of splitting the command line and passing argc/argv to the main function.
https://github.com/ziglang/zig/pull/18309 updated ArgIteratorWindows to match the behavior of CommandLineToArgvW, but it turns out that CommandLineToArgvW's behavior does not match the behavior of the C runtime post-2008. In 2008, the C runtime argv splitting changed how it handles consecutive double quotes within a quoted argument (it's now considered an escaped quote, e.g. `"foo""bar"` post-2008 would get parsed into `foo"bar`), and the rules around argv[0] were also changed.
This commit makes ArgIteratorWindows match the behavior of the post-2008 C runtime, and adds a standalone test that verifies the behavior matches both the MSVC and MinGW argv splitting exactly in all cases (it checks that randomly generated command line strings get split the same way).
The motivation here is roughly the same as when the same change was made in Rust (https://github.com/rust-lang/rust/pull/87580), that is (paraphrased):
- Consistent behavior between Zig and modern C/C++ programs
- Allows users to escape double quotes in a way that can be more straightforward
Additionally, the suggested mitigation for BatBadBut (https://flatt.tech/research/posts/batbadbut-you-cant-securely-execute-commands-on-windows/) relies on the post-2008 argv splitting behavior for roundtripping of the arguments given to `cmd.exe`. Note: it's not necessary for the suggested mitigation to work, but it is necessary for the suggested escaping to be parsed back into the intended argv by ArgIteratorWindows after being run through a `.bat` file.
signature/s:
Algorithm Before After
---------------+---------+-------
ecdsa-p256 3707 4396
ecdsa-p384 1067 1332
ecdsa-secp256k1 4490 5147
Add ECDSA to the benchmark by the way.
This was overlooked in #19458. Using the fully qualified name of each
module usually makes sense, but there is one module where it does not,
namely, the root module, since its name is `root`. The original Autodoc
tar creation logic used `comp.root_name` for the root module back when
it was the only module included in `sources.tar`, and that made sense.
Now, we get the best of both worlds, using the proper root name for the
root module while using the module name for the rest.
This makes the host http header have the port if and only if it differs
from the defaults based on the protocol.
This is an alternate implementation that closes#19624.
This flips things around such that std/hash/crc.zig is generated
by the catalog-based generation tool, and the real code that used
to be in that file is moved out to std/hash/crc/impl.zig. The
generated tests are moved to std/hash/crc/test.zig. By going this
route, we eliminate the need for usingnamespace without changing
anything for callers of these interfaces. The Crc32 tests are
simply added to the fixed part of the generated output and
compactified a bit.
This was the second-to-last usage of usingnamespace left in std.
Don't know why UEFI wasn't excluded but freestanding is, probably an oversight since I want to have detailed debug info on my panic function on my Headstart bootloader.
The previous commit deleted the deprecated API and then made all the
follow-up changes; this commit reverts only the breaking API changes.
This commit can be reverted once 0.12.0 is tagged.
This adds the *std.Build owner to LazyPath so that lazy paths returned
from a dependency can be used in the application without friction or
footguns.
closes#19313
This allows `std.Uri.resolve_inplace` to properly preserve the fact
that `new` is already escaped but `base` may not be. I originally tried
just moving `raw_uri` around, but it made uri resolution unmanagably
complicated, so I instead added per-component information to `Uri` which
allows extra allocations to be avoided when constructing uris with
components from different sources, and in some cases, deferring the work
all the way to when the uri is printed, where an allocator may not even
be needed.
Closes#19587
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-185.pdf
This adds useful standard SHA3-based constructions from the
NIST SP 800-185 document:
- cSHAKE: similar to the SHAKE extensible hash function, but
with the addition of a context parameter.
- KMAC: SHAKE-based authentication / keyed XOF
- TupleHash: unambiguous hashing of tuples
These are required by recent protocols and specifications.
They also offer properties that none of the currently available
constructions in the stdlib offer, especially the ability to safely
hash tuples.
Other keyed hash functions/XOFs will fall back to using HMAC, which
is suboptimal from a performance perspective, but fine from a
security perspective.
Reference:
https://github.com/ziglang/zig/pull/19500#discussion_r1556476973
Arena is now used for Diagnostic (tar and git). `deinit` is not called on Diagnostic
allowing us to reference strings from Diagnostic in UnpackResult without
dupe.
That seamed reasonable to me. Instead of using gpa for Diagnostic, and
then dupe to arena. Or using arena for both and making dupe so we can deinit
Diagnostic.
Using test cases from:
https://github.com/ianprime0509/pathological-packages repository.
Depends on existence of the FAT32 file system. Folder is in FAT32 file
system because it is case insensitive and and does not support symlinks.
It is complicated test case requires internet connection, depends on
existence of FAT32 in the specific location. But it is so valuable for
development. Running `zig test Package.zig` is so much faster than
building zig binary and running `zig fetch URL`. Committing it here
although it should probably be removed.
Commit 0b7123f41d regressed the
`include_path` option of ConfigHeader which is intended to set the path,
including subdirectories, that C code would pass to an include
directive.
For example if it passes
.include_path = "config/config.h",
Then the C code should be able to have
#include "config/config.h"
This regressed https://github.com/andrewrk/nasm/ but this commit fixes
it.
Previously, indentation was not being handled correctly in some cases,
causing examples such as `std.json.WriteStream` to be rendered with
improper list nesting.
Additionally, some more test cases have been added to ensure
indentation (or lack of indentation) is handled correctly in some other
constructs.
This field has not been referenced by compile steps since
e76ce2c1d0, all the way back in 2019.
To specify the language standard, pass `-std=[value]` as a regular
C flag instead.
Previously, `Step.Compile.installHeader` and friends would incorrectly
modify the default `install` top-level step, when the intent was for
headers to get bundled with and installed alongside an artifact. This
change set implements the intended behavior.
This carries with it some breaking changes; `installHeader` and
`installConfigHeader` both have new signatures, and
`installHeadersDirectory` and `installHeadersDirectoryOptions` have been
merged into `installHeaders`.
`{}` for decls
`{p}` for enum fields
`{p_}` for struct fields and in contexts following a `.`
Elsewhere, `{p}` was used since it's equivalent to the old behavior.
This is a breaking change.
This updates `std.zig.fmtId` to support conditionally escaping
primitives and the reserved `_` identifier via format specifiers:
- `{}`: escape invalid identifiers, identifiers that shadow primitives
and the reserved `_` identifier.
- `{p}`: same as `{}`, but don't escape identifiers that
shadow primitives.
- `{_}`: same as `{}`, but don't escape the reserved `_` identifier.
- `{p_}` or `{_p}`: only escape invalid identifiers.
(The idea is that `p`/`_` mean "allow primitives/underscores".)
Any other format specifiers will result in compile errors.
Additionally, `isValidId` now considers `_` a valid identifier. If this
distinction is important, consider combining existing uses of this
function with the new `isUnderscore` function.
Closes#19557Closes#19561
Previously, `build.zig` was not being detected correctly by
`computeHash` for packages where there is a containing root directory.
This allows us to more sanely allocate a continuous
range of result-ids, and avoids a bunch of nasty
casting code in a few places. Its currently not used
very often, but will be useful in the future.
Filter should be applied on path where package root folder (if
there is any) is stripped. Manifest is inside package root and has paths
relative to package root not temporary directory root.
Based on comment:
https://github.com/ziglang/zig/pull/19111#discussion_r1548640939
computeHash finds all files in temporary directory. There is no
difference on what path are they. When calculating hash normalized_path
must be set relative to package root. That's the place where we strip
root if needed.
While iterating over all files in tarball set root_dir in diagnostic if
there is single root in tarball. Will be used in package manager with
strip_components = 0 to find the root of the fetched package.
There is no way to know the expected parent pointer attributes (most
notably alignment) from the type of the field pointer, so provide them
in the first argument.
The 64-bit backend supports printing all floats up to 64-bits. The
128-bit continues to be used for larger values.
This backend is approximately ~3x faster. Code size is a little smaller
in the full table case and much smaller if using the samll tables.
The implementation uses the same code-paths, parameterized by a set of
tables and their pow5 implementations. We continue to use the same
rounding/formatting mechanisms. Initially I explored a separate
implementation, as upstream does this and has specific optimizations for
these paths but for simplicity we don't. The performance loss is small
enough at this point and keeping them combined keeps them in sync.
Closes#19264.
- Changed `modf_result` to `Modf` to better fit naming conventions
- Reworked `modf` to be far simpler and support all floating point types (as well as vectors) (I have done benchmarks and can confirm that the performance is roughly equivalent to the old implementation)
- Added more descriptive tests for modf
- Deprecated `modf32_result` and `modf64_result` in favor of `Modf(f32)` and `Modf(f64)` respectively
Now that `-femit-docs` includes all modules, including the builtin
module, in the generated source tarball, it makes sense to apply the
same logic to the std-docs server. std-docs constructs its own tarball,
so a different approach is needed to achieve the same end result.
Closes#19403
This commit adds all modules in the compilation to the generated
`sources.tar` when using `-femit-docs` (including `std` and `builtin`).
Additionally, it considers the main module when doing so, rather than
the root module, so the behavior when running `zig test -femit-docs
test.zig` is now correct.
Notably, this improves string printing from
`@as(*[5:0]u8, &@as([5:0]u8, "hello".*))` to `@as(*[5:0]u8, "hello")`,
omitting the pointless ref-deref pair.
Legacy anon decls now have three uses:
* Type owner decls
* Function owner decls
* `@export` and `@extern`
Therefore, there are no longer any cases where we wish to explicitly
omit legacy anon decls from the binary. This means we can remove the
concept of an "alive" vs "dead" `Decl`, which also allows us to remove
the separate `anon_work_queue` in `Compilation`.
This commit also performs some refactors to `TypedValue.print` in
preparation for improved comptime pointer access logic. Once that logic
exists, `TypedValue.print` can use Sema to access pointers for more
helpful printing.
This commit also implements proposal #19435, because the existing logic
there relied on some blatantly incorrect code in `Value.sliceLen`.
Resolves: #19435
Good riddance!
Most of these changes are trivial. There's a fix for a minor bug this
exposed in `Value.readFromPackedMemory`, but aside from that, it's all
just things like changing `intern` calls to `toIntern`.
Perhaps someday, we will make Sema operate on mutable values more
generally. For now, it makes sense to split out this representation,
since it is only used in comptime pointer accesses.
There are some currently unused methods on `MutableValue` which will
be used once I rewrite the comptime pointer access logic to be less
terrible.
The commit following this one will - at long last - delete the legacy
Value representation
`Decl` can no longer store un-interned values, so this field is now
unnecessary. The type can instead be fetched with the new `typeOf`
helper method, which just gets the type of the Decl's `Value`.
This issue was causing debug information to sometimes not function
correctly for some local variables, with debuggers simply reporting that
the variable does not exist. What was happening was that after an AIR
body - and thus debug lexical scope - begins, but before any `dbg_stmt`
within it, the `scope` on `self.wip.debug_location` refers to the parent
scope, but the `scope` field on the `DILocalVariable` metadata passed to
`@llvm.dbg.declare` points, correctly, to the nested scope. I haven't
looked into precisely what happens here, but in short, it would appear
that LLVM Doesn't Like It (tm).
The fix is simple: when we change `self.scope` at the start or end of an
AIR body, also modify the scope on `self.wip.debug_location`. This is
correct as we always want the debug info for an instruction to be
associated with the block it is within, even if the line/column are
slightly outdated for any reason.
This test has regressed due to a bug in the self-hosted COFF linker.
Jakub recommended disabling it for now, since the COFF linker is being
effectively rewritten, so there is little point in fixing the bug now.
This commit changes how we represent comptime-mutable memory
(`comptime var`) in the compiler in order to implement the intended
behavior that references to such memory can only exist at comptime.
It does *not* clean up the representation of mutable values, improve the
representation of comptime-known pointers, or fix the many bugs in the
comptime pointer access code. These will be future enhancements.
Comptime memory lives for the duration of a single Sema, and is not
permitted to escape that one analysis, either by becoming runtime-known
or by becoming comptime-known to other analyses. These restrictions mean
that we can represent comptime allocations not via Decl, but with state
local to Sema - specifically, the new `Sema.comptime_allocs` field. All
comptime-mutable allocations, as well as any comptime-known const allocs
containing references to such memory, live in here. This allows for
relatively fast checking of whether a value references any
comptime-mtuable memory, since we need only traverse values up to
pointers: pointers to Decls can never reference comptime-mutable memory,
and pointers into `Sema.comptime_allocs` always do.
This change exposed some faulty pointer access logic in `Value.zig`.
I've fixed the important cases, but there are some TODOs I've put in
which are definitely possible to hit with sufficiently esoteric code. I
plan to resolve these by auditing all direct accesses to pointers (most
of them ought to use Sema to perform the pointer access!), but for now
this is sufficient for all realistic code and to get tests passing.
This change eliminates `Zcu.tmp_hack_arena`, instead using the Sema
arena for comptime memory mutations, which is possible since comptime
memory is now local to the current Sema.
This change should allow `Decl` to store only an `InternPool.Index`
rather than a full-blown `ty: Type, val: Value`. This commit does not
perform this refactor.
This extension to the typical `<>` Markdown autolink syntax allows
HTTP(S) links to be recognized in normal text without being delimited by
`<>`. This is the most natural way to write links in text, so it makes
sense to support it and allow documentation comments to be written in a
more natural way.
Closes#19265
This commit implements support for Markdown autolinks delimited by angle
brackets. The precise syntax accepted is documented in the doc comment
of `markdown.zig`.
Some users are hitting this limit. I think it's primarily due to not
deduplicating (solved in the previous commit) but this seems like a
better limit regardless.
The zig way is to let the compiler provide errors, rather than trying to
implement the compiler in the standard library.
I played around with this and found the compile errors to be easier to
comprehend without this logic.
1. Entirely rewrote frexp with generics, reducing the implementation to a single function and enabling parameters of types f80 and f16
2. Expanded upon the tests, making them more descriptive and comprehensive, and automatically generating the test bodies for each floating point type
3. Added a doctest for frexp
Symmetry with parse_float and to hide the implementation from the user.
Additionally, we expose the entire namespace and provide some aliases so
everything is available to a user.
Closes#19366
Stores the original ref as a query parameter in the URL so that it is
possible to automatically check the upstream if there are any newer
commits.
Also adds a flag which opts-out of the new behaivour, restoring the old.
When the slice-by-length start position is runtime-known, it is likely
protected by a runtime-known condition and therefore a compile error is
less appropriate than a runtime panic check.
This is demonstrated in the json code that was updated and then reverted
in this commit.
When #3806 is implemented, this decision can be reassessed.
Revert "std: work around compiler unable to evaluate condition at compile time"
Revert "frontend: comptime array slice-by-length OOB detection"
This reverts commit 7741aca96c.
This reverts commit 2583b389ea.
netbsd fix:
- `Futex.zig:542:56: error: expected error union type, found 'c_int'`
openbsd fix:
- `emutls.zig:10:21: error: root struct of file 'os' has no member named 'abort'`
- `Thread.zig:627:22: error: expected 6 argument(s), found 5`
This was a mistake from day one. This is the wrong abstraction layer to
do this in.
My alternate plan for this is to make all I/O operations require an IO
interface parameter, similar to how allocations require an Allocator
interface parameter today.
For module parsing and assembling, we will also need to know
all of the SPIR-V extensions and their instructions. This commit
updates the generator to generate those. Because there are
multiple instruction sets that each have a separate list of Opcodes,
no separate enum is generated for these opcodes. Additionally, the
previous mechanism for runtime instruction information, `Opcode`'s
`fn operands()`, has been removed in favor for
`InstructionSet.core.instructions()`.
Any mapping from operand to instruction is to be done at runtime.
Using a runtime populated hashmap should also be more efficient
than the previous mechanism using `stringToEnum`.
This fixes an issue with boostrapping the compiler using MSVC. There is a CircularBuffer with
an array of length 65536 initialized to undefined, and because the undefined path of `renderValue`
was using `StringLiteral` to render this, the resulting zig2.c would fail to compile using MSVC.
The solution was to move the already-existing array initializer path (used in the non-undefined path)
into StringLiteral, and make StringLiteral aware of the total length so it could decide between which
style of initialization to use. We prefer to use string literals if we can, as this results in the least
amount of emitted C source.
A pointer type already has an alignment, so this information does not
need to be duplicated on the function type. This already has precedence
with addrspace which is already disallowed on function types for this
reason. Also fixes `@TypeOf(&func)` to have the correct addrspace and
alignment.
This adds std.debug.SafetyLock and uses it in std.HashMapUnmanaged by
adding lockPointers() and unlockPointers().
This provides a way to detect when an illegal modification has happened and
panic rather than invoke undefined behavior.
The error unions for WindowsDynLib and ElfDynLib do not contain all the possible errors.
So user code that relies on DynLib.Error will fail to compile.
This implementation is now a direct replacement for the `kernel32` one.
New bitflags for named pipes and other generic ones were added based on
browsing the ReactOS sources.
`UNICODE_STRING.Buffer` has also been changed to be nullable, as
this is what makes the implementation work.
This required some changes to places accesssing the buffer after a
`SUCCESS`ful return, most notably `QueryObjectName` which even referred
to it being nullable.
Closes https://github.com/ziglang/zig/issues/19284
As of 9f2cb920c0 configuring the global
cache directory to a relative path causes errors of the following form
when cross compiling to `windows-gnu`.
```
error: unable to generate DLL import .lib file for kernel32: FileNotFound
```
The issue occurred because the `def_final_path` included the global
cache path as a prefix and later on the `def_final_file` was created
using that path with the global cache directory handle.
* io_uring: ring mapped buffers
Ring mapped buffers are newer implementation of ring provided buffers, supported
since kernel 5.19. Best described in Jens Axboe [post](https://github.com/axboe/liburing/wiki/io_uring-and-networking-in-2023#provided-buffers)
This commit implements low level io_uring_*_buf_ring_* functions as mostly
direct translation from liburing. It also adds BufferGroup abstraction over those
low level functions.
* io_uring: add multishot recv to BufferGroup
Once we have ring mapped provided buffers functionality it is possible to use
multishot recv operation. Multishot receive is submitted once, and completions
are posted whenever data arrives on the socket. Received data are placed in a
new buffer from buffer group.
Reference: [io_uring and networking in 2023](https://github.com/axboe/liburing/wiki/io_uring-and-networking-in-2023#multi-shot)
Getting NOENT for cancel completion result, meaning:
-ENOENT
The request identified by user_data could not be located.
This could be because it completed before the cancelation
request was issued, or if an invalid identifier is used.
https://man7.org/linux/man-pages/man3/io_uring_prep_cancel.3.htmlhttps://github.com/ziglang/zig/actions/runs/6801394000/job/18492139893?pr=17806
Result in cancel/recv cqes are different depending on the kernel.
on older kernel (tested with v6.0.16, v6.1.57, v6.2.12, v6.4.16)
cqe_cancel.err() == .NOENT
cqe_crecv.err() == .NOBUFS
on kernel (tested with v6.5.0, v6.5.7)
cqe_cancel.err() == .SUCCESS
cqe_crecv.err() == .CANCELED
Context: user provides `-` to read from stdin instead of a file.
Before: it would create a file in $ZIG_LOCAL_CACHE_DIR/tmp/ for each
invocation and leave it there.
After: it hashes the file contents and renames the file to the hash.
Result: repeated invocations of `zig cc` do not cause so much trash to
be created.
See comment. This slightly regresses a previous fix from this branch. A
proper solution will come soon, with the splitting up of `Decl` into
`Cau` and `Nav`.
This makes tracking easier across incremental updates: `scanDecl` can
now tell whether an existing decl in a namespace was mapped to the one
it's analyzing in the new ZIR.
There is no reason to perform this detection during semantic analysis.
In fact, doing so is problematic, because we wish to utilize detection
of existing decls in a namespace in incremental compilation.
It's been seen on Windows 11 (22H2) Build 22621.3155 that NtCreateFile
will return the OBJECT_NAME_INVALID error code with certain path names.
The path name we saw this with started with `C:Users` (rather than
`C:\Users`) and also contained a `$` character. This PR updates our
OpenFile wrapper to propagate this error code as `error.BadPathName`
instead of making it `unreachable`.
see https://github.com/marler8997/zigup/issues/114#issuecomment-1994420791
I think of lerp() as a way to change coordinate systems, essentially
remapping the input numberline onto a shifted+rescaled numberline. In
my mind the full numberline is remapped, not just the 0-1 segment.
An example of how this is useful: in a game, you can write:
`myPos = lerp(pos0, pos1, easeOutBack(u))`
for some `u` that changes from 0 to 1 over time.
(see https://easings.net/#easeOutBack)
This will animate `myPos` between `pos0` and `pos1`, overshooting the
goal position `pos1` in a nicely-animated way.
`easeOutBack(float)->float` is a pure function that overshoots 1,
and by combining it with `lerp()` we can remap coordinates in other
coordinate systems, making them overshoot in the same way.
However, this overshooting is only possible because `easeOutBack(t)`
sometimes exceeds the range 0-1 (e.g. `easeOutBack(0.5)` is 1.0877),
which is not allowed by the current `math.lerp` implementation.
This commit removes the asserts that prevented this use-case. Now, any
value can be inputted for t. For example, `lerp(10,20, 2.0)` will now
return 30, instead of throwing an assert error.
In the example from the issue #19052 to_read holds 213_315_584
uncompressed bytes. Calling read with small output results in many
shifts of that big buffer.
This removes need to shift to_read after each read.
Andrew and I have discovered that on Linux max peak rss value
is taken to be `max(build_runner, test_suite)` and since the thunks
test emit a huge binary, we will easily exceed the declared maximum
for any of the test suites. This can be worked around for now by not
checking for $thunk symbols in this test since it doesn't really
yield any additional information; however ideally we would implement
per-thread local temp arena that can be freed.
Support linking against tbd-v3 SDKs such as those bundled with
Xcode 10.3 → 11.3.1 .
- Map target os=`ios` and abi=`macabi` to macho.PLATFORM.MACCATALYST.
This allows for matches against tbdv4 targets, eg. `x86_64-maccatalyst`.
- When parsing old tbdv3 files with `zippered` platform, we append
[`ARCH-macos`, `ARCH-maccatalyst`] to list of "tbd" targets. This
enables linking for standard targets like `ARCH-macos-none` and
maccatalyst targets `ARCH-ios-macabi`.
- Update mappings for macho platform and zig target macabi. While this
is not full maccatalyst support, a basic exe can be built as follows:
```
zig libc > libc.txt
zig build-exe z0.zig --libc libc.txt -target x86_64-ios-macabi
```
closes#19110
These systems write the number of *bits* of their inputs as a u64.
However if `@sizeOf(usize) == 4`, an input message or associated data
whose size is > 512 MiB could overflow.
On 64-bit systems, it is safe to assume that no machine has more than
2 EiB of memory.
This takes the code that was previously in src/Compilation.zig to turn resinator diagnostics into Zig error bundles and puts it in resinator/main.zig, and then makes resinator emit the resulting error bundles via std.zig.Server (which is used by the build runner, etc). Also adds support for turning Aro diagnostics into ErrorBundles.
This moves .rc/.manifest compilation out of the main Zig binary, contributing towards #19063
Also:
- Make resinator use Aro as its preprocessor instead of clang
- Sync resinator with upstream
Make std.tar look better in docs. Remove from public interface what is
not necessary. Add comment to the public methods. Add doctest as usage
examples for iterator and pipeToFileSystem.
Problem was manifested only on windows with target `-target
aarch64-windows-gnu`.
I was creating new files but not closing any of them. Tmp dir cleanup
hangs looping in deleteTree forever.
Fixing ci error: error:
'tar.test.test.pipeToFileSystem' failed: slices differ. first difference occurs at index 2 (0x2)
============ expected this output: ============= len: 9 (0x9)
2E 2E 2F 61 2F 66 69 6C 65 ../a/file
============= instead found this: ============== len: 9 (0x9)
2E 2E 5C 61 5C 66 69 6C 65 ..\a\file
After #19136 dir.symlink changes path separtors to \ on windows.
Fixing error from ci:
std/tar.zig:423:54: error: expected type 'usize', found 'u64'
std/tar.zig:423:54: note: unsigned 32-bit int cannot represent all possible unsigned 64-bit values
for the then end users:
1. Don't require user to call file.skip() on file returned from
iterator.next if file is not read. Iterator will now handle this.
Previously that returned header parsing error, without knowing some tar
internals it is hard to understand what is required from user.
2. Use iterator.File.kind enum which is similar to fs.File.Kind,
something familiar. Internal Header.Kind has many types which are not
exposed but the user needs to have else in kind switch to cover those
cases.
3. Add reader interface to the iterator.File.
Tar header stores name in max 256 bytes and link name in max 100 bytes.
Those are minimums for provided buffers. Error is raised during iterator
init if buffers are not long enough.
Pax and gnu extensions can store longer names. If such extension is
reached during unpack and don't fit into provided buffer error is
returned.
Fixes compilation errors in functions that are syntaxic sugar
to operate on serialized scalars.
Also make it explicit that square roots in fields whose size is
not congruent to 3 modulo 4 are not an error, they are just
not implemented yet.
Reported by @vitalonodo - Thanks!
A lot of these "shorthand" doc comments were redundant, low quality
filler content. Better to let the actual modules speak for themselves
with top level doc comments rather than trying to document their
aliases.
ML-KEM is the Kyber post-quantum secure key encapsulation mechanism,
as being standardized by NIST.
Too bad, they decided to rename it; the "Kyber" name was so much
better!
This implements the current draft (NIST FIPS-203), which is already
being deployed even though the specification is not finalized.
This fixes a bug where, at least with the LLVM backend, `@extern` calls
which had the same name as a normal `extern` in the same Zcu would
result in the `@extern` incorrectly suffixing the identifier `.2`.
Usually, the LLVM backend has a system to change the generated globals
to "collapse" them all together, but it only works if `updateDecl` is
called!
This replaces the errol backend with one based on ryu. The 128-bit
backend only is implemented. This supports all floating-point types and
does not use fp logic to print.
Closes#1181.
Closes#1299.
Closes#3612.
This commit adds several fixes and improvements for the Zig compiler
test harness.
1. -Dskip-translate-c option added for skipping the translate-c tests.
2. translate-c/run-translated-c tests in test/cases/* have been added to
the steps test-translate-c and test-run-translated-c. Closes#18224.
3. Custom name added to the CheckFile step for the translate-c step to
better communicate which test failed.
4. Test manifest key validation added to return an error if a manifest
contains an invalid key.
* `linux.IO_Uring` -> `linux.IoUring` to align with naming conventions.
* All functions `io_uring_prep_foo` are now methods `prep_foo` on `io_uring_sqe`, which is in a file of its own.
* `SubmissionQueue` and `CompletionQueue` are namespaced under `IoUring`.
This is a breaking change.
The new file and namespace layouts are more idiomatic, and allow us to
eliminate one more usage of `usingnamespace` from the standard library.
2 remain.
Some of the structs I shuffled around might be better namespaced under
CONTEXT, I'm not sure. However, for now, this approach preserves
backwards compatibility.
Eliminates one more usage of `usingnamespace` from the standard library.
3 remain.
Thanks to Zig's lazy analysis, it's fine for these symbols to be
declared on platform they won't exist on. This is already done in
several places in this file; e.g. `pthread` functions are declared
unconditionally.
Eliminates one more usage of `usingnamespace` from the standard library.
4 remain.
Searching GitHub indicated that the only use of these types in the wild is
support in getty-zig, and testing for that support. This eliminates 3 more uses
of usingnamespace from the standard library, and removes some unnecessarily
complex generic code.
This usage of `usingnamespace` was removed fairly trivially - the
resulting code is, IMO, more clear.
Eliminates one more usage of `usingnamespace` from the standard library.
This is a trivial change - this code did `usingnamespace` into an
otherwise empty namespace, so the outer namespace was just unnecessary.
Eliminates one more usage of `usingnamespace` from the standard library.
Previously, when multiple modules had builtin modules with identical
sources, two distinct `Module`s and `File`s were created pointing at the
same file path. This led to a bug later in the frontend. These modules
are now deduplicated with a simple hashmap on the builtin source.
This fixes an issue with the implementation of #18816. Consider the
following code:
```zig
pub fn Wrap(comptime T: type) type {
return struct {
pub const T1 = T;
inner: struct { x: T1 },
};
}
```
Previously, the type of `inner` was not considered to be "capturing" any
value, as `T1` is a decl. However, since it is declared within a generic
function, this decl reference depends on the context, and thus should be
treated as a capture.
AstGen has been augmented to tunnel references to decls through closure
when the decl was declared in a potentially-generic context (i.e. within
a function).
This implements the accepted proposal #18816. Namespace-owning types
(struct, enum, union, opaque) are no longer unique whenever analysed;
instead, their identity is determined based on their AST node and the
set of values they capture.
Reified types (`@Type`) are deduplicated based on the structure of the
type created. For instance, if two structs are created by the same
reification with identical fields, layout, etc, they will be the same
type.
This commit does not produce a working compiler; the next commit, adding
captures for decl references, is necessary. It felt appropriate to split
this up.
Resolves: #18816
Namespace types (`struct`, `enum`, `union`, `opaque`) do not use
structural equality - equivalence is based on their Decl index (and soon
will change to AST node + captures). However, we previously stored all
other information in the corresponding `InternPool.Key` anyway. For
logical consistency, it makes sense to have the key only be the true key
(that is, the Decl index) and to load all other data through another
function. This introduces those functions, by the name of
`loadStructType` etc. It's a big diff, but most of it is no-brainer
changes.
In future, it might be nice to eliminate a bunch of the loaded state in
favour of accessor functions on the `LoadedXyzType` types (like how we
have `LoadedUnionType.size()`), but that can be explored at a later
date.
This changes the representation of closures in Zir and Sema. Rather than
a pair of instructions `closure_capture` and `closure_get`, the system
now works as follows:
* Each ZIR type declaration (`struct_decl` etc) contains a list of
captures in the form of ZIR indices (or, for efficiency, direct
references to parent captures). This is an ordered list; indexes into
it are used to refer to captured values.
* The `extended(closure_get)` ZIR instruction refers to a value in this
list via a 16-bit index (limiting this index to 16 bits allows us to
store this in `extended`).
* `Module.Namespace` has a new field `captures` which contains the list
of values captured in a given namespace. This is initialized based on
the ZIR capture list whenever a type declaration is analyzed.
This change eliminates `CaptureScope` from semantic analysis, which is a
nice simplification; but the main motivation here is that this change is
a prerequisite for #18816.
Fixes a mismatch where the manifests would be written to the local cache dir, but the .def/.lib files themselves would be written to the global cache dir. This meant that if you cleared your global cache, your local cache would still think that the .lib files existed in the cache and it'd lead to 'No such file or directory' errors at linktime.
This mismatch was introduced in the stage1 -> stage2 transition of this code.
fill(0) will fill all bytes in bit reader. If bit reader is aligned to
the byte, as it is at the end of the stream this ensures no overshoot
when reading footer. Footer is 4 bytes (zlib) or 8 bytes (gzip). For
zlib we will use 4 bytes BitReader and 8 for gzip. After align and fill
we will read those bytes and leave BitReader empty after that.
* Introduce `-Ddebug-extensions` for enabling compiler debug helpers
* Replace safety mode checks with `std.debug.runtime_safety`
* Replace debugger helper checks with `!builtin.strip_debug_info`
Sometimes, you just have to debug optimized compilers...
Thanks to jacobly0 for figuring this out. The chain of events causing
the failure this triggered is as follows.
* As of a recent commit, certain bodies no longer emit a redundant
`block`, meaning there are more likely to be "interesting"
instructions (i.e. not blocks) at the end of parent GenZir scopes.
* When emitting the first `dbg_stmt` in such a body, the elision logic
incorrectly looks at a tag from an instruction in an enclosing scope.
* The tag of this instruction may be `undefined`, meaning that in unsafe
builds it may be incorrectly identified as a `dbg_stmt` instruction.
* This instruction from another body is clobbered rather than emitting
an actual `dbg_stmt` instruction. Note that this does not produce
invalid ZIR, since the creator of the undefined instruction replaces
the previously-undefined payload later.
In `ld -r` mode, the linker will emit `N_GSYM` for any defined
external symbols as well as private externals. In the former case,
the thing is easy since `N_EXT` bit will be set in the nlist's type.
In the latter however we will encounter a local symbol with `N_PEXT`
bit set (non-extern, but was private external) which we also need
to include when resolving symbol stabs.
The major change in the logic for parsing symbol stabs per input
object file is that we no longer try to force-resolve a `N_GSYM`
as a global symbol. This was a mistake since every symbol stab
always describes a symbol defined within the parsed input object file.
We then work out if we should forward `N_GSYM` in the output symtab
after we have resolved all symbols, but never before - intel we lack
when initially parsing symbol stabs. Therefore, we simply record
which symbol has a debug symbol stab, and work out its precise type
when emitting output symtab after symbol resolution has been done.
Now, all the tests that use `testWithAllSupportedPathTypes` will also run each test with both `/` and `\` as the path separator on Windows.
Also, removes the now-redundant "Dir.symLink with relative target that has a / path separator" since the same thing is now tested in the "Dir.readLink" test
Since we now elide more ZIR blocks in AstGen, care must be taken in
codegen to introduce lexical scopes for every body, not just `block`s.
Also, elide a few unnecessary AIR blocks in Sema.
The signature and variants of Sema's main loop have evolved over time to
what was a quite confusing state of affairs. This commit makes minor
changes to how `analyzeBodyInner` works, and restructures/renames the
wrapper functions, adding doc comments to clarify their purposes. The
most notable change is that `analyzeBodyInner` now returns
`CompileError!void`; inline breaks are now all communicated via
`error.ComptimeBreak`.
In the code `if (cond) { ... }`, the "then body" of the `if` is
technically a block. However, we don't need to emit a real ZIR `block`
corresponding to it, because we are already within a condbr body; we
have a separate gz, and appropriate scoping for allocs and debug
variables. In this case, and many like it, we can trivially elide the
block here, instead emitting the block statements directly into the
current `GenZir`. This results in a significant decrease in ZIR bytes
for real code.
Certain symbols were left unmarked, meaning they would not be emit into
the final binary incorrectly. We now mark the synthetic symbols to ensure
they are emit as they are already created under the circumstance they're
needed for. This also re-enables disabled tests that were left disabled
in a previous merge conflict.
Lastly, this adds the shared-memory test to the test harnass as it was
previously forgotten and therefore regressed.
Rather than using the logger, we now emit proper 'compiler'-errors just
like the ELF and MachO linkers with notes. We now also support emitting
multiple errors before quiting the linking process in certain phases,
such as symbol resolution. This means we will print all symbols which
were resolved incorrectly, rather than the first one we encounter.
This introduces some type safety so we cannot accidently give an atom
index as a symbol index. This also means we do not have to store any
optionals and therefore allow for memory optimizations. Lastly, we can
now always simply access the symbol index of an atom, rather than having
to call `getSymbolIndex` as it is easy to forget.
We now use a single function to use the in-house WebAssembly linker
rather than wasm-ld. For both incremental compilation and traditional
linking we use the same codepath.
Previously we could directly write the type index because we used the
index that was known in the final binary. However, as we now process
the Zig module as its own relocatable object file, we must ensure to
generate a relocation for type indexes. This also ensures that we can
later link the relocatable object file as a standalone also.
This also fixes generating indirect function table entries for ZigObject
as it now correctly points to the relocation symbol index rather than
the symbol index that owns the relocation.
We cannot keep function indexes as maxInt(u32) due to functions being
dedupliated when they point to the same function. For this reason we now
use a regular arraylist which will have new functions appended to, and
when deleted, its index is appended to the free list, allowing us to
re-use slots in the function list.
Removes the symbol from the decl's list of exports, marks it as dead,
as well as appends it to the symbol free list. Also removes it from
the list of global symbols as all exports are global.
In the future we should perhaps use a map for the export list to prevent
linear lookups. But this requires a benchmark as having more than 1
export for the same decl is very rare.
We now correctly create a symbol for each exported decl with its export-
name. The symbol points to the same linker-object. We store a map from
decl to all of its exports so we can update exports if it already exists
rather than infinitely create new exports.
We now parse the decls right away into atoms and allocate the
corresponding linker-object, such as segment and function, rather than
waiting until `flushModule`.
This function was previously only called by the backend which generates
a synthetical function that is not represented by any AIR or Zig code.
For this reason, the ownership is moved to the zig-object and stored
there so it can be linked with the other object files without the driver
having to specialize it.
Rather than using the optional, we now directly use `File.Index` which
can already represent an unknown file due to its `.null` value. This
means we do not pay for the memory cost.
This type of index is now used for:
- SymbolLoc
- Key of the functions map
- InitFunc
Now we can simply pass things like atom.file, object.file, loc.file etc
whenever we need to access its representing object file which makes it
a lot easier.
When merging sections we now make use of the `File` abstraction so all
objects such as globals, functions, imports, etc are also merged from
the `ZigObject` module. This allows us to use a singular way to perform
each link action without having to check the kind of the file.
The logic is mostly handled in the abstract file module, unless its
complexity warrants the handling within the corresponding module itself.
CodeGen will create linking objects such as symbols, function types, etc
in ZigObject, rather than in the linker driver where the final result
will be stored. They will end up in the linker driver module during
the `flush` phase instead.
This must mean we must call functions such as `addOrGetFuncType` in the
correct namespace or else it will be created in the incorrect list and
therefore return incorrect indexes.
When we have a ZigCompileUnit and don't use LLVM, we initialize the
ZigObject which will encapsulate the Zig Module as an object file in-
memory. During initialization we also create symbols which the object
will need such as the stack pointer.
Rather than specializing the linker-driver to be able to handle objects
generated by a ZCU, we store all data in-memory in ZigObject. ZigObject
acts more like a regular object file which will allow us to treat it
as us. This will make linking much more simple, but will also reduce
the complexity of incremental-linking as we can simply update ZigObject
and relink it.
Just to pass ci of regression fix#19126.
I'll return to this later.
Currently can't reproduce on my Windows wm, here I'm failing on symlink creation
in ci fails later in the process.
In the iterator function which is the low-level API, don't depend on
`std.fs.MAX_PATH_BYTES` because this is not defined on all operating
systems, such as freestanding.
However in such environments it still makes sense to be able to extract
from a tar file.
An even more flexible solution would be to accept the buffers as
arguments to iterator() which I think is a good idea, but for now let's
just set the same upper limmit across all operating systems.
Part of #19063.
Primarily, this moves Aro from deps/ to lib/compiler/ so that it can be
lazily compiled from source. src/aro_translate_c.zig is moved to
lib/compiler/aro_translate_c.zig and some of Zig CLI logic moved to a
main() function there.
aro_translate_c.zig becomes the "common" import for clang-based
translate-c.
Not all of the compiler was able to be detangled from Aro, however, so
it still, for now, remains being compiled with the main compiler
sources due to the clang-based translate-c depending on it. Once
aro-based translate-c achieves feature parity with the clang-based
translate-c implementation, the clang-based one can be removed from Zig.
Aro made it unnecessarily difficult to depend on with these .def files
and all these Zig module requirements. I looked at the .def files and
made these observations:
- The canonical source is llvm .def files.
- Therefore there is an update process to sync with llvm that involves
regenerating the .def files in Aro.
- Therefore you might as well just regenerate the .zig files directly
and check those into Aro.
- Also with a small amount of tinkering, the file size on disk of these
generated .zig files can be made many times smaller, without
compromising type safety in the usage of the data.
This would make things much easier on Zig as downstream project,
particularly we could remove those pesky stubs when bootstrapping.
I have gone ahead with these changes since they unblock me and I will
have a chat with Vexu to see what he thinks.
I moved part of the compiler that checks environment variables to the
standard library, so it doesn't have access to `build_options.only_c`
anymore, which means some environment variable checks are leaking into
the zig1.wasm build of zig. This logic still makes it return "no
environment variables found" though.
InvalidHandle in OpenError is no longer a possible error on any platform. In the past it was able to be returned in `openOptionsFromFlagsWasi`, but the implementation was changed in 7680c5330c to make it no longer possible.
InvalidHandle in RealPathError was a holdover from before d5312d53a0, which made realpath a compile error on WASI. However, InvalidHandle was also a possible error in the FreeBSD fallback implementation added in 537624734c. This commit changes the FreeBSD fallback implementation to return FileNotFound instead of InvalidHandle which matches how EBADF is handled in all the other `realpath` implementations (including the FreeBSD non-fallback implementation).
Closes#19084
* Support different keep alive defaults with different http versions.
* Fix incorrect usage of `copyBackwards`, which copies in a backwards
direction allowing data to be moved forward in a buffer, not
backwards in a buffer.
- Add default values to the list of comptime-known elements in
`zirValidatePtrArrayInit`
- In `structFieldValueComptime`, only assert `haveFieldInits` if we
enter the`fieldIsComptime` branch (otherwise they are not needed).
Before this fix, passing an undefined union value to `Sema.switchCond`
returned an undefined value of the union type, not the tag type, since
`Value.unionTag` forwards undefined values unchanged.
This leads us into the `.Union` branch in `Sema.zirSwitchBlock` which is
unreachable, now we take the `.Enum` branch instead.
The generic call `S.foo()` was evaluated with the
capture scope of the owner decl (i.e the `test` block), when it should
use the capture scope of the function declaration.
Follow up to #19079, which made test names fully qualified.
This fixes tests that now-redundant information in their test names. For example here's a fully qualified test name before the changes in this commit:
"priority_queue.test.std.PriorityQueue: shrinkAndFree"
and the same test's name after the changes in this commit:
"priority_queue.test.shrinkAndFree"
This commit eliminates the `dbg_block_{begin,end}` instructions from
both ZIR and AIR. Instead, lexical scoping of `dbg_var_{ptr,val}`
instructions is decided based on the AIR block they exist within. This
is a much more robust system, and also results in a huge drop in ZIR
bytes - around 7% for Sema.zig.
This required some enhancements to Sema to prevent elision of blocks
when they are required for debug variable scoping. This can be observed
by looking at the AIR for the following simple test program with and
without `-fstrip`:
```zig
export fn f() void {
{
var a: u32 = 0;
_ = &a;
}
{
var a: u32 = 0;
_ = &a;
}
}
```
When `-fstrip` is passed, no AIR blocks are generated. When `-fno-strip`
is passed, the ZIR blocks are lowered to true AIR blocks to give correct
lexical scoping to the debug vars.
The changes here incidentally reolve #19060. A corresponding behavior
test has been added.
Resolves: #19060
* make test names contain the fully qualified name
* make test filters match the fully qualified name
* allow multiple test filters, where a test is skipped if it does not
match any of the specified filters
Found while fuzzing. Previously 1.1897314953572317650857593266280070162E4932
was parsed as +inf, which caused issues for round-trip serialization of
floats. Only f128 had issues, but have added other tests for all
floating point large normals.
The max_exponent for f128 was wrong, it is subtly different in the
decimal code-path as it is based on where the decimal digit should go.
This needs to be 2 greater than the max exponent (e.g. 308 or 4932) to
work correctly (greater by 1, then we use a >= comparision).
In addition, I've removed the redundant `optimize` constant which was only
use for testing the slow path locally.
std.heap.c_allocator was already doing this, however,
std.heap.raw_c_allocator, which asserts no allocations more than 16
bytes aligned, was not.
The zig compiler uses std.heap.raw_c_allocator, so it is affected by
this.
Like in issue #18089, this tar contains, same file name in two case
sensitive name version. Unpack should fail on case insensitive file
systems and succeed on case sensitive.
$ tar tvf 18089.tar
18089/
18089/alacritty/
18089/alacritty/darkermatrix.yml
18089/alacritty/Darkermatrix.yml
Windows paths now use WTF-16 <-> WTF-8 conversion everywhere, which is lossless. Previously, conversion of ill-formed UTF-16 paths would either fail or invoke illegal behavior.
WASI paths must be valid UTF-8, and the relevant function calls have been updated to handle the possibility of failure due to paths not being encoded/encodable as valid UTF-8.
Closes#18694Closes#1774Closes#2565
Ill-formed UTF-8 byte sequences are replaced by the replacement character (U+FFFD) according to "U+FFFD Substitution of Maximal Subparts" from Chapter 3 of the Unicode standard, and as specified by https://encoding.spec.whatwg.org/#utf-8-decoder
ie.
C:\zig\current\lib\std\debug.zig:726:23: error: no field or member function named 'getDwarfInfoForAddress' in 'dwarf.DwarfInfo'
if (try module.getDwarfInfoForAddress(unwind_state.debug_info.allocator, unwind_state.dwarf_context.pc)) |di| {
~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
C:\zig\current\lib\std\dwarf.zig:663:23: note: struct declared here
pub const DwarfInfo = struct {
^~~~~~
referenced by:
next_internal: C:\zig\current\lib\std\debug.zig:737:29
next: C:\zig\current\lib\std\debug.zig:654:31
remaining reference traces hidden; use '-freference-trace' to see all reference traces
C:\zig\current\lib\std\debug.zig:970:31: error: no field or member function named 'getSymbolAtAddress' in 'dwarf.DwarfInfo'
const symbol_info = module.getSymbolAtAddress(debug_info.allocator, address) catch |err| switch (err) {
~~~~~~^~~~~~~~~~~~~~~~~~~
C:\zig\current\lib\std\dwarf.zig:663:23: note: struct declared here
pub const DwarfInfo = struct {
Found by fuzzing. Previous numeric function assumed that is is getting
buffer of size 12, mode is size 8. Fuzzing found overflow.
Fixing and adding test cases.
The HTTP specification does not provide a way to escape \r\n in headers,
so it's the API user's responsibility to ensure the header names and
values do not contain \r\n. Also header names must not contain ':'.
It's an assertion, not an error, because the calling code very likely is
using hard-coded values or server-provided values that do not need to be
checked, and the error would be unreachable anyway.
Untrusted user input must not be put directly into into HTTP headers.
The API automatically handles these requests as expected. After
receiveHead(), the server has a chance to notice the expectation and do
something about it. If it does not, then the Server implementation will
handle it by sending the continuation header when the read stream is
created.
Both respond() and respondStreaming() send the continuation header as
part of discarding the request body, only if the read stream has not
already been created.
Before I mistakenly thought that missing content-length meant zero when
it actually means to stream until the connection is closed.
Now the respond() function accepts transfer_encoding which can be left
as default (use content.len for content-length), set to none which makes
it omit the content-length, or chunked, which makes it format the
response as a chunked transfer even though the server has the entire
contents already buffered.
The echo-content tests are moved from test/standalone/http.zig to the
standard library where they are actually run.
* Uncouple std.http.ChunkParser from protocol.zig
* Fix receiveHead not passing leftover buffer through the header parser.
* Fix content-length read streaming
This implementation handles the final chunk length correctly rather than
"hoping" that the buffer already contains \r\n.
In a previous commit I removed a load-bearing use of `@hasDecl` to
detect whether the SO.REUSEPORT option should be set. `@hasDecl` should
not be used for OS feature detection because it can hide bugs.
The new logic checks for the operating system specifically and then does
the thing that is supposed to be done on that operating system directly.
Mainly, this removes the poorly named `wait`, `send`, `finish`
functions, which all operated on the same "Response" object, which was
actually being used as the request.
Now, it looks like this:
1. std.net.Server.accept() gives you a std.net.Server.Connection
2. std.http.Server.init() with the connection
3. Server.receiveHead() gives you a Request
4. Request.reader() gives you a body reader
5. Request.respond() is a one-shot, or Request.respondStreaming() creates
a Response
6. Response.writer() gives you a body writer
7. Response.end() finishes the response; Response.endChunked() allows
passing response trailers.
In other words, the type system now guides the API user down the correct
path.
receiveHead allows extra bytes to be read into the read buffer, and then
will reuse those bytes for the body or the next request upon connection
reuse.
respond(), the one-shot function, will send the entire response in one
syscall.
Streaming response bodies no longer wastefully wraps every call to write
with a chunk header and trailer; instead it only sends the HTTP chunk
wrapper when flushing. This means the user can still control when it
happens but it also does not add unnecessary chunks.
Empirically, in my example project that uses this API, the usage code is
significantly less noisy, it has less error handling while handling
errors more correctly, it's more obvious what is happening, and it is
syscall-optimal.
Additionally:
* Uncouple std.http.HeadParser from protocol.zig
* Delete std.Server.Connection; use std.net.Server.Connection instead.
- The API user supplies the read buffer when initializing the
http.Server, and it is used for the HTTP head as well as a buffer
for reading the body into.
* Replace and document the State enum. No longer is there both "start"
and "first".
Before, this code constructed an arena allocator and then used it when
handling redirects.
You know what's better than having threads fight over an allocator?
Avoiding dynamic memory allocation in the first place.
This commit reuses the http headers static buffer for handling
redirects. The new location is copied to the beginning of the static
header buffer and then the subsequent request uses a subslice of that
buffer.
* "storage" is a better name than "strategy".
* The most flexible memory-based storage API is appending to an
ArrayList.
* HTTP method should default to POST if there is a payload.
* Avoid storing unnecessary data in the FetchResult
* Avoid the need for a deinit() method in the FetchResult
The decisions that this logic made about how to handle files is beyond
repair:
- fail to use sendfile() on a plain connection
- redundant stat
- does not handle arbitrary streams
So, file-based response storage is no longer supported. Users should use
the lower-level open() API which allows avoiding these pitfalls.
* add API for iterating over custom HTTP headers
* remove `trailing` flag from std.http.Client.parse. Instead, simply
don't call parse() for trailers.
* fix the logic inside that parse() function. it was using wrong std.mem
functions, ignoring malformed data, and returned errors on dead
branches.
* simplify logic inside wait()
* fix HeadersParser not dropping the 2 read bytes of \r\n after a
chunked transfer
* move the trailers test to be a std lib unit test and make it pass
This reverts commit 42be972a72c86b32ad8403d082ab42763c6facec.
Using a bit to distinguish between headers and trailers is fine. It was
just named and documented poorly.
I originally removed these in 402f967ed5.
I allowed them to be added back in #15299 because they were smuggled in
alongside a bug fix, however, I wasn't kidding when I said that I wanted
to take the design of std.http in a different direction than using this
data structure.
Instead, some headers are provided via explicit field names populated
while parsing the HTTP request/response, and some are provided via
new fields that support passing extra, arbitrary headers.
This resulted in simplification of logic in many places, as well as
elimination of the possibility of failure in many places. There is
less deinitialization code happening now.
Furthermore, it made it no longer necessary to clone the headers data
structure in order to handle redirects.
http_proxy and https_proxy fields are now pointers since it is common
for them to be unpopulated.
loadDefaultProxies is changed into initDefaultProxies to communicate
that it does not actually load anything from disk or from the network.
The function now is leaky; the API user must pass an already
instantiated arena allocator. Removes the need to deinitialize proxies.
Before, proxies stored arbitrary sets of headers. Now they only store
the authorization value.
Removed the duplicated code between https_proxy and http_proxy. Finally,
parsing failures of the environment variables result in errors being
emitted rather than silently ignoring the proxy.
error.CompressionNotSupported is renamed to
error.CompressionUnsupported, matching the naming convention from all
the other errors in the same set.
Removed documentation comments that were redundant with field and type
names.
Disabling zstd decompression in the server for now; see #18937.
I found some apparently dead code in src/Package/Fetch/git.zig. I want
to check with Ian about this.
I discovered that test/standalone/http.zig is dead code, it is only
being compiled but not being run. Furthermore it hangs at the end if you
run it manually. The previous commits in this branch were written under
the assumption that this test was being run with
`zig build test-standalone`.
This is a state machine that already has a `state` field. No need to
additionally store "done" - it just makes things unnecessarily
complicated and buggy.
The buffer for HTTP headers is now always provided via a static buffer.
As a consequence, OutOfMemory is no longer a member of the read() error
set, and the API and implementation of Client and Server are simplified.
error.HttpHeadersExceededSizeLimit is renamed to
error.HttpHeadersOversize.
Running fuzzing tar test with [zig std lib
fuzzing](https://github.com/squeek502/zig-std-lib-fuzzing) reached and
assert in tar implementation. Assert (in std lib) should not be
reachable by external input, so I'm fixing this to return error.
If an adapted string key with embedded nulls was put in a hash map with
`std.hash_map.StringIndexAdapter`, then an incorrect hash would be
entered for that entry such that it is possible that when looking for
the exact key that matches the prefix of the original key up to the
first null would sometimes match this entry due to hash collisions and
sometimes not if performed later after a grow + rehash, causing the same
key to exist with two different indices breaking every string equality
comparison ever, for example claiming that a container type doesn't
contain a field because the field name string in the struct and the
string representing the identifier to lookup might be equal strings but
have different string indices. This could maybe be fixed by changing
`std.hash_map.StringIndexAdapter.hash` to only hash up to the first
null, therefore ensuring that the entry's hash is correct and that all
future lookups will be consistent, but I don't trust anything so instead
I assert that there are no embedded nulls.
This fixes a problem where empty strings where not emitted as null.
This should also emit a smaller stringtab as all metadata strings were
emitted in both the strtab and in the strings block inside the metadata
block.
BufferedTee provides reader interface to the consumer. Data read by consumer
is also written to the output. Output is hold lookahead_size bytes behind
consumer. Allowing consumer to put back some bytes to be read again. On flush
all consumed bytes are flushed to the output.
input -> tee -> consumer
|
output
input - underlying unbuffered reader
output - writer, receives data read by consumer
consumer - uses provided reader interface
If lookahead_size is zero output always has same bytes as consumer.
The LLVM bitcode requires all type references in
structs to be to earlier defined types.
We make sure types are ordered in the builder
itself in order to avoid having to iterate the
types multiple times and changing the values
of type indicies.
This introduces the new test step `test-c-import`, and removes the
ability of the behavior tests to `@cImport` paths relative to `test`.
This allows the behavior tests to be run without translate c.
This commit works around #18967 by adding an `AccumulatingReader`, which
accumulates data read from the underlying packfile, and by keeping track
of the position in the packfile and hash/checksum information separately
rather than using reader composition. That is, the packfile position and
hashes/checksums are updated with the accumulated read history data only
after we can determine what data has actually been used by the
decompressor rather than merely being buffered.
The only addition to the standard library APIs to support this change is
the `unreadBytes` function in `std.compress.flate.Inflate`, which allows
the user to determine how many bytes have been read only for buffering
and not used as part of compressed data.
These changes can be reverted if #18967 is resolved with a decompressor
that reads precisely only the number of bytes needed for decompression.
This code is run when printing a stack trace in a debug executable, so
it has to be fast even without compiler optimizations.
Adding a `@panic` to the top of `main` and running an x86_64 backend
compiled compiler goes from `1m32.773s` to `0m3.232s`.
Until now literal and distance code lengths where treated as two
different arrays. But according to rfc they can overlap:
The code length repeat codes can cross from HLIT + 257 to the
HDIST + 1 code lengths. In other words, all code lengths form
a single sequence of HLIT + HDIST + 258 values.
Now code lengths are decoded in single array which is then split
to literal and distance part.
Similar to the previous commit, errors coercing the panic message to
`[]const u8` now point at the operand to `@panic` rather than the actual
builtin call.
When coercing the operand of a `ret_node` etc instruction, the source
location for errors used to point to the entire `return` statement.
Instead, we now point to the operand, as would be expected if there was
an explicit `as_node` instruction (like there used to be).
Previously, the `src_node` field of `struct_decl`, `union_decl`,
`enum_decl`, and `opaque_decl` was optional, included in trailing data
only if a flag in `Small` was set. However, this was unnecessary logic:
AstGen always provided the source node. We can simplify a few bits of
logic by making this field non-optional, moving it into non-trailing
data.
There was one place where the field was actually omitted before: the
root struct of a file was at source node 0, so the node was
coincidentally elided. Therefore, this commit has a fixed cost of 4
bytes of ZIR per file.
In most cases where AstGen is coercing to a fixed type (such as `u29`,
`type`, `std.builtin.CallingConvention) we do not necessarily require an
explicit coercion instruction. Instead, Sema knows the type that is
required, and can perform the coercion after the fact. This means we can
use the `coerced_ty` result location kind, saving unnecessary coercion
instructions and therefore ZIR bytes.
This required a few enhancements to Sema to introduce missing coercions.
`sema.src` is a failed experiment. It introduces complexity, and makes
often unwarranted assumptions about the existence of instructions
providing source locations, requiring an unreasonable amount of caution
in AstGen for correctness. Eliminating it simplifies the whole frontend.
This required adding source locations to a few instructions, but the
cost in ZIR bytes should be counteracted by the other work on this
branch.
AstGen has logic to elide leading `dbg_stmt` instructions when multiple
are emitted consecutively; however, it only applied in some cases. A
simple reshuffle here makes this logic apply universally, saving some
bytes in ZIR.
This is a small optimization to generated ZIR. In any function where the
return type is not a trivial Ref, we know it is almost certainly not
`void` (unless the user aliased it or did something else weird to fool
AstGen), and thus the return type is very likely to be required for
return value RLS at some point. Thus, we can just emit one `ret_type` at
the start of the function and use it throughout.
This sees a very small improvement in overall ZIR bytes.
It was usefull during development.
From andrewrk code review comment:
In fact, Zig does not guarantee the @sizeOf structs, and so these tests are not valid.
Encountered in a recent CI run on an aarch64-windows dev kit.
Pretty sure I disabled the virus scanner but it looks like it turned
itself back on with a Windows Update.
Rather than marking the new error code as unreachable in the places
where it is unexpected, this commit makes it return `error.Unexpected`.
Zig deflate compression/decompression implementation. It supports compression and decompression of gzip, zlib and raw deflate format.
Fixes#18062.
This PR replaces current compress/gzip and compress/zlib packages. Deflate package is renamed to flate. Flate is common name for deflate/inflate where deflate is compression and inflate decompression.
There are breaking change. Methods signatures are changed because of removal of the allocator, and I also unified API for all three namespaces (flate, gzip, zlib).
Currently I put old packages under v1 namespace they are still available as compress/v1/gzip, compress/v1/zlib, compress/v1/deflate. Idea is to give users of the current API little time to postpone analyzing what they had to change. Although that rises question when it is safe to remove that v1 namespace.
Here is current API in the compress package:
```Zig
// deflate
fn compressor(allocator, writer, options) !Compressor(@TypeOf(writer))
fn Compressor(comptime WriterType) type
fn decompressor(allocator, reader, null) !Decompressor(@TypeOf(reader))
fn Decompressor(comptime ReaderType: type) type
// gzip
fn compress(allocator, writer, options) !Compress(@TypeOf(writer))
fn Compress(comptime WriterType: type) type
fn decompress(allocator, reader) !Decompress(@TypeOf(reader))
fn Decompress(comptime ReaderType: type) type
// zlib
fn compressStream(allocator, writer, options) !CompressStream(@TypeOf(writer))
fn CompressStream(comptime WriterType: type) type
fn decompressStream(allocator, reader) !DecompressStream(@TypeOf(reader))
fn DecompressStream(comptime ReaderType: type) type
// xz
fn decompress(allocator: Allocator, reader: anytype) !Decompress(@TypeOf(reader))
fn Decompress(comptime ReaderType: type) type
// lzma
fn decompress(allocator, reader) !Decompress(@TypeOf(reader))
fn Decompress(comptime ReaderType: type) type
// lzma2
fn decompress(allocator, reader, writer !void
// zstandard:
fn DecompressStream(ReaderType, options) type
fn decompressStream(allocator, reader) DecompressStream(@TypeOf(reader), .{})
struct decompress
```
The proposed naming convention:
- Compressor/Decompressor for functions which return type, like Reader/Writer/GeneralPurposeAllocator
- compressor/compressor for functions which are initializers for that type, like reader/writer/allocator
- compress/decompress for one shot operations, accepts reader/writer pair, like read/write/alloc
```Zig
/// Compress from reader and write compressed data to the writer.
fn compress(reader: anytype, writer: anytype, options: Options) !void
/// Create Compressor which outputs the writer.
fn compressor(writer: anytype, options: Options) !Compressor(@TypeOf(writer))
/// Compressor type
fn Compressor(comptime WriterType: type) type
/// Decompress from reader and write plain data to the writer.
fn decompress(reader: anytype, writer: anytype) !void
/// Create Decompressor which reads from reader.
fn decompressor(reader: anytype) Decompressor(@TypeOf(reader)
/// Decompressor type
fn Decompressor(comptime ReaderType: type) type
```
Comparing this implementation with the one we currently have in Zig's standard library (std).
Std is roughly 1.2-1.4 times slower in decompression, and 1.1-1.2 times slower in compression. Compressed sizes are pretty much same in both cases.
More resutls in [this](https://github.com/ianic/flate) repo.
This library uses static allocations for all structures, doesn't require allocator. That makes sense especially for deflate where all structures, internal buffers are allocated to the full size. Little less for inflate where we std version uses less memory by not preallocating to theoretical max size array which are usually not fully used.
For deflate this library allocates 395K while std 779K.
For inflate this library allocates 74.5K while std around 36K.
Inflate difference is because we here use 64K history instead of 32K in std.
If merged existing usage of compress gzip/zlib/deflate need some changes. Here is example with necessary changes in comments:
```Zig
const std = @import("std");
// To get this file:
// wget -nc -O war_and_peace.txt https://www.gutenberg.org/ebooks/2600.txt.utf-8
const data = @embedFile("war_and_peace.txt");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer std.debug.assert(gpa.deinit() == .ok);
const allocator = gpa.allocator();
try oldDeflate(allocator);
try new(std.compress.flate, allocator);
try oldZlib(allocator);
try new(std.compress.zlib, allocator);
try oldGzip(allocator);
try new(std.compress.gzip, allocator);
}
pub fn new(comptime pkg: type, allocator: std.mem.Allocator) !void {
var buf = std.ArrayList(u8).init(allocator);
defer buf.deinit();
// Compressor
var cmp = try pkg.compressor(buf.writer(), .{});
_ = try cmp.write(data);
try cmp.finish();
var fbs = std.io.fixedBufferStream(buf.items);
// Decompressor
var dcp = pkg.decompressor(fbs.reader());
const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
defer allocator.free(plain);
try std.testing.expectEqualSlices(u8, data, plain);
}
pub fn oldDeflate(allocator: std.mem.Allocator) !void {
const deflate = std.compress.v1.deflate;
// Compressor
var buf = std.ArrayList(u8).init(allocator);
defer buf.deinit();
// Remove allocator
// Rename deflate -> flate
var cmp = try deflate.compressor(allocator, buf.writer(), .{});
_ = try cmp.write(data);
try cmp.close(); // Rename to finish
cmp.deinit(); // Remove
// Decompressor
var fbs = std.io.fixedBufferStream(buf.items);
// Remove allocator and last param
// Rename deflate -> flate
// Remove try
var dcp = try deflate.decompressor(allocator, fbs.reader(), null);
defer dcp.deinit(); // Remove
const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
defer allocator.free(plain);
try std.testing.expectEqualSlices(u8, data, plain);
}
pub fn oldZlib(allocator: std.mem.Allocator) !void {
const zlib = std.compress.v1.zlib;
var buf = std.ArrayList(u8).init(allocator);
defer buf.deinit();
// Compressor
// Rename compressStream => compressor
// Remove allocator
var cmp = try zlib.compressStream(allocator, buf.writer(), .{});
_ = try cmp.write(data);
try cmp.finish();
cmp.deinit(); // Remove
var fbs = std.io.fixedBufferStream(buf.items);
// Decompressor
// decompressStream => decompressor
// Remove allocator
// Remove try
var dcp = try zlib.decompressStream(allocator, fbs.reader());
defer dcp.deinit(); // Remove
const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
defer allocator.free(plain);
try std.testing.expectEqualSlices(u8, data, plain);
}
pub fn oldGzip(allocator: std.mem.Allocator) !void {
const gzip = std.compress.v1.gzip;
var buf = std.ArrayList(u8).init(allocator);
defer buf.deinit();
// Compressor
// Rename compress => compressor
// Remove allocator
var cmp = try gzip.compress(allocator, buf.writer(), .{});
_ = try cmp.write(data);
try cmp.close(); // Rename to finisho
cmp.deinit(); // Remove
var fbs = std.io.fixedBufferStream(buf.items);
// Decompressor
// Rename decompress => decompressor
// Remove allocator
// Remove try
var dcp = try gzip.decompress(allocator, fbs.reader());
defer dcp.deinit(); // Remove
const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
defer allocator.free(plain);
try std.testing.expectEqualSlices(u8, data, plain);
}
```
* Add `timedWait` to `std.Thread.Semaphore`
Add example to documentation of `std.Thread.Semaphore`
* Add unit test for thread semaphore timed wait
Fix missing try
* Change unit test to be simpler
* Change `timedWait()` to keep a deadline
* Change `timedWait()` to return earlier in some scenarios
* Change `timedWait()` to keep a deadline (based on std.Timer)
(similar to std.Thread.Futex)
---------
Co-authored-by: protty <45520026+kprotty@users.noreply.github.com>
* std.c: consolidate some definitions, making them share code. For
example, freebsd, dragonfly, and openbsd can all share the same
`pthread_mutex_t` definition.
* add type safety to std.c.O
- this caught a bug where mode flags were incorrectly passed as the
open flags.
* 3 fewer uses of usingnamespace keyword
* as per convention, remove purposeless field prefixes from struct field
names even if they have those prefixes in the corresponding C code.
* fix incorrect wasi libc Stat definition
* remove C definitions from incorrectly being in std.os.wasi
* make std.os.wasi definitions type safe
* go through wasi native APIs even when linking libc because the libc
APIs are problematic and wasteful
* don't expose WASI definitions in std.posix
* remove std.os.wasi.rights_t.ALL: this is a footgun. should it be all
future rights too? or only all current rights known? both are
the wrong answer.
When an argument is a 'local', which is the case when it's a parameter,
we should not attempt to load it from memory. Instead, we directly emit
it to the stack. Only when the `WValue` is ensure to live in the linear
data section do we load it from memory onto the stack.
closes#18894
The default logging function used to have no buffer. So a single log
statement could result in many individual write syscalls each writing
only a couple of bytes.
After this change the logging function now has a 4kb buffer. Only log
statements longer than 4kb now do multiple write syscalls.
4kb is the default bufferedWriter size and was choosen arbitrarily.
The downside of this is that the log function now allocates 4kb more
stack space but I think that is an acceptable trade-off.
The old definitions had some problems:
- In `lowerBound`, the `lhs` (left hand side) argument was passed on the
right hand side.
- In `upperBound`, the `greaterThan` function needed to return
`greaterThanOrEqual` for the function work, so either the name or the
implementation is incorrect.
To fix both problems, define the functions in terms of a `lessThan` function that returns `lhs < rhs`.
The is more consistent with the rest of `sort.zig` and it's also how C++ implements lower/upperBound (1)(2).
(1) https://en.cppreference.com/w/cpp/algorithm/lower_bound
(2) https://en.cppreference.com/w/cpp/algorithm/upper_bound
- Rewrite doc comments.
- Add a couple of more test cases.
- Add docstring for std.sort.binarySearch
Authored by https://github.com/CraigglesO
Original Discussion: #9890
I already had to create these functions and test cases for my own
project so I decided to contribute to the main code base in hopes it
would simplify my own.
I realize this is still under discussion but this was a trivial amount
of work so I thought I could help nudge the discussion towards a
decision.
Why add these to the standard library
To better illustrate and solidify their value, the standard library's
"sort" module already contains several binary search queries on arrays
such as binarySearch and internal functions binaryFirst & binaryLast. A
final example of its use: the Zig code itself created and used a
bounding search in the linker.
There still lacks the ability to allow the programmer themselves to
search the array for a rough position and find an index to read &/
update.
Adding these functions would also help to complement dynamic structures
like ArrayList with it's insert function.
Example Case
I'm building a library in Zig for GIS geometry. To store points, lines,
and polygons each 3D point is first translated into what's called an
S2CellId. This is a fancy way of saying I reduce the Earth's spherical
data into a 1D Hilbert Curve with cm precision. This gives me 2
convenient truths:
Hilbert Curves have locality of reference.
All points can be stored inside a 1D array
Since lowerBound and upperBound to find data inside a radius for
instance. If I'm interested in a specific cell at a specific "level" and
want to iterate all duplicates, equalRange is a best fit.
As pointed out in the issue this behavior test branches on an undefined
value. That's not valid Zig code.
Also behavior tests should not depend on the standard library in this
manner. They need to minimally isolate the specific language thing that
is being tested.
Closes#12681
This change causes `__isPlatformVersionAtLeast` to no longer exist in
compiler_rt when targetting a min os version earlier than 10.15, which
is earlier than the default os version and so only affects builds that
explicitly target an older version than Zig officially supports.
Turns out that around 10.13/10.14 macOS release version, Apple changed the target tags in
tbd files from `macosx` to `macos`. In order to be compliant and therefore actually support
linking on older platforms against `libSystem.tbd`, we add `<cpu_arch>-macosx` to target strings.
This logic (currently) has a non-trivial cost (particularly in terms of
peak RSS) for tracking dependencies. Until incremental compilation is in
use in the wild, it doesn't make sense for users to pay that cost.
* Functions failing codegen now set this failure on the function
analysis state. Decl analysis `codegen_failure` is reserved for
failures generating constant values.
* `liveness_failure` is consolidated into `codegen_failure`, as we do
not need to distinguish these, and Liveness.Verify is just a debugging
feature anyway.
* `sema_failure_retryable` and `codegen_failure_retryable` are removed.
Instead, retryable failures are recorded in the new
`Zcu.retryable_failures` list. On an incremental update, this list is
flushed, and all elements are marked as outdated so that we re-attempt
analysis and code generation.
Also remove the `generation` fields from `Zcu` and `Decl` as these are
not needed by our new strategy for incremental updates.
* Mark root Decls for re-analysis separately
* Check for re-analysis of root Decls
* Remove `outdated` entry when analyzing fn body
* Remove legacy `outdated` field from Decl analysis state
* Invalidate `decl_val` dependencies
* Recursively mark and un-mark all dependencies correctly
* Queue analysis of outdated dependers in `Compilation.performAllTheWork`
Introduces logic to invalidate `decl_val` dependencies after
`Zcu.semaDecl` completes. Also, recursively un-mark dependencies as PO
where needed.
With this, all dependency invalidation logic is in place. The next step
is analyzing outdated dependencies and triggering appropriate
re-analysis.
Sema now tracks dependencies appropriately. Early logic in Zcu for
resolving outdated decls/functions is in place. The setup used does not
support `usingnamespace`; compilations using this construct are not yet
supported by this incremental compilation model.
Wrapping strange integers before an operation was initially
done as an attempt to minimize the amount of normalizations
required: This way, there would not be a normalization
necessary between two modular operations. This was a
premature optimization, since the resulting logic is more
complicated than naive way of wrapping the result after
the operation.
This commit updates handling of strange integers to do
wrapping after each operation. It also seems slightly
more efficient in terms of size of generated code, as
it reduces the size of the behavior tests binary by
about 1%.
As reported by jacobly, the Apple system linker matches sections to
segments by name and not by flags causing Zig's executable section
ending up in a segment with incorrect permission flags.
Pass the required lazy dependencies from the build runner to the parent
process via a tmp file instead of stdout. I'll reproduce this comment to
explain it:
The `zig build` parent process needs a way to obtain results from the
configuration phase of the child process. In the future, the make phase
will be executed in a separate process than the configure phase, and we
can then use stdout from the configuration phase for this purpose.
However, currently, both phases are in the same process, and Run Step
provides API for making the runned subprocesses inherit stdout and stderr
which means these streams are not available for passing metadata back
to the parent.
Until make and configure phases are separated into different processes,
the strategy is to choose a temporary file name ahead of time, and then
read this file in the parent to obtain the results, in the case the child
exits with code 3.
This commit also extracts some common logic from the loop that rebuilds
the build runner so that it does not run again when the build runner is
rebuilt.
Build manifest files support `lazy: true` for dependency sections.
This causes the auto-generated dependencies.zig to have 2 more
possibilities:
1. It communicates whether a dependency is lazy or not.
2. The dependency might be acknowledged, but missing due to being lazy
and not fetched.
Lazy dependencies are not fetched by default, but if they are already
fetched then they are provided to the build script.
The build runner reports the set of missing lazy dependenices that are
required to the parent process via stdout and indicates the situation
with exit code 3.
std.Build now has a `lazyDependency` function. I'll let the doc comments
speak for themselves:
When this function is called, it means that the current build does, in
fact, require this dependency. If the dependency is already fetched, it
proceeds in the same manner as `dependency`. However if the dependency
was not fetched, then when the build script is finished running, the
build will not proceed to the make phase. Instead, the parent process
will additionally fetch all the lazy dependencies that were actually
required by running the build script, rebuild the build script, and then
run it again.
In other words, if this function returns `null` it means that the only
purpose of completing the configure phase is to find out all the other
lazy dependencies that are also required.
It is allowed to use this function for non-lazy dependencies, in which
case it will never return `null`. This allows toggling laziness via
build.zig.zon without changing build.zig logic.
The CLI for `zig build` detects this situation, but the logic for then
redoing the build process with these extra dependencies fetched is not
yet implemented.
This allows a `zig build` command to specify intention to create a
release build, regardless of what per-project options exist. It also
allows the command to specify a "preferred optimization mode", which is
chosen if the project itself does not choose one (in other words, the
project gets first choice). If neither the build command nor the project
specify a preferred release mode, an error occurs.
Before it was named "library" inconsistently.
Now the CLI args are -fsys=[name] and -fno-sys=[name] and it is a more
general-purpose "system integration" which could be a library name or
perhaps a project name such as "ffmpeg" or a binary such as "nasm".
This also makes a long-overdue change of extracting common state from
Build into a shared Graph object.
Getting the semantics right for these flags turned out to be quite
tricky. In the end it works like this:
* The override only happens when the target is fully native, with no
additional query parameters, such as versions or CPU features added.
* The override affects the resolved Target but leaves the original Query
unmodified.
* The "is native?" detection logic operates on the original, unmodified
query. This makes it possible to provide invalid host target
information, causing confusing errors to occur. Don't do that.
There are some minor breaking changes to std.Build API such as the fact
that `b.zig_exe` is now moved to `b.graph.zig_exe`, as well as a handful
of other similar flags.
These are advanced options that make it possible to simultaneously
target the "host" computer, while overriding assumptions about the host
computer such as what CPU features are available, or what OS version
range to target.
This is useful only for the case of integrating with system package
manager builds which want to pretend the host is also the target.
This eliminates some simple usages of `usingnamespace` in the standard
library. This construct may in future be removed from the language, and
is generally an inappropriate way to formulate code. It is also
problematic for incremental compilation, which may not initially support
projects using it.
I wasn't entirely sure what the appropriate namespacing for the types in
`std.os.uefi.tables` would be, so I ofted to preserve the current
namespacing, meaning this is not a breaking change. It's possible some
of the moved types should instead be namespaced under `BootServices`
etc, but this can be a future enhancement.
Since `bufPrint` and `count` both control the writers used internally,
they can leverage type-erased writers while maintaining correct error
handling. This reduces generic instantiations when using `allocPrint`,
which calls both `count` and `bufPrint` internally.
Closes#18628
This commit splits the arguments obtained from pkg-config into two
groups, cflags and libs, and consistently applies the cflags to each
individual module linking the library while applying the libs only once
for each compilation.
This is to ensure that the loader correctly zeroes-out zerofill
sections when mapping them. For context, Apple's loader dyld
will map the regions where any zerofill would theoretically reside
as belonging to zerofill section.
This is incredibly confusing and I really need to simplify it.
Elf also possesses this shortcoming so once I get Coff up to speed
it should hopefully become clear on how to refactor this.
`NIX_LDFLAGS` typically contains just `-rpath` and `-L`, which we already
handle. However, at least one setup hook in Nixpkgs [0] adds a linkage
directive to it. To prevent library paths from being missed (as I've
observed myself with `NIX_LDFLAGS` being `-liconv ...`, making it so that
*all* paths are missed), let's just skip over them.
[0]: 08f615eb1b/pkgs/development/libraries/libiconv/setup-hook.sh
It is problematic for the cached `InternPool` state to directly
reference ZIR instruction indices, as these are not stable across
incremental updates. The existing ZIR mapping logic attempts to handle
this by iterating the existing Decl graph for a file after `AstGen` and
update ZIR indices on `Decl`s, struct types, etc. However, this is
unreliable due to generic instantiations, and relies on specialized
logic for everything which may refer to a ZIR instruction (e.g. a
struct's owner decl). I therefore determined that a prerequisite change
for incremental compilation would be to rework how we store these
indices.
This commit introduces a `TrackedInst` type which provides a stable
index (`TrackedInst.Index`) for a single ZIR instruction in the
compilation. The `InternPool` now stores these values in place of ZIR
instruction indices. This makes the ZIR mapping logic relatively
trivial: after `AstGen` completes, we simply iterate all `TrackedInst`
values and update those indices which have changed. In future, if the
corresponding ZIR instruction has been removed, we must also invalidate
any dependencies on this instruction to trigger any required
re-analysis, however the dependency system does not yet exist.
This commit changes how declarations (`const`, `fn`, `usingnamespace`,
etc) are represented in ZIR. Previously, these were represented in the
container type's extra data (e.g. as trailing data on a `struct_decl`).
However, this introduced the complexity of the ZIR mapping logic having
to also correlate some ZIR extra data indices. That isn't really a
problem today, but it's tricky for the introduction of `TrackedInst` in
the commit following this one. Instead, these type declarations now
simply contain a trailing list of ZIR indices to `declaration`
instructions, which directly encode all data related to the declaration
(including containing the declaration's body). Additionally, the ZIR for
`align` etc have been split out into their own bodies. This is not
strictly necessary, but it's much simpler to understand for an
insignificant cost in bytes, and will simplify the resolution of #131
(where we may need to evaluate the pointer type, including align etc,
without immediately evaluating the value body).
When formatting a pointer to user type, currently it needs to be
dereferenced first, then call `formatType` on the child type.
Fix the problem by checking for "format" function on not only the type
itself, but also the struct it points to. Add hasMethod to std.meta.
Free the allocated threads in the initialization of a thread pool only with pool.join instead of additionally calling allocator.free causing free to be called twice.
Resolves#18643
Currently, std.fmt has a misguided, half-assed Unicode implementation
with an ambiguous definition of the word "character". This commit does
almost nothing to mitigate the problem, but it lets me close an open PR.
In the future I will revert 473cb1fd74 as
well as 279607cae5, and redo the whole
std.fmt API, breaking everyone's code and unfortunately causing nearly
every Zig user to have a bad day. std.fmt will go back to only dealing
in bytes, with zero Unicode awareness whatsoever. I suggest a third
party package provide Unicode functionality as well as a more advanced
text formatting function for when Unicode awareness is needed. I have
always suggested this, and I sincerely apologize for merging pull
requests that compromised my stance on this matter.
Most applications should, instead, strive to make their code independent
of Unicode, dealing strictly in encoded UTF-8 bytes, and never attempt
operations such as: substring manipulation, capitalization, alignment,
word replacement, or column number calculations.
Exceptions to this include web browsers, GUI toolkits, and terminals. If
you're not making one of these, any dependency on Unicode is probably a
bug or worse, a poor design decision.
closes#18536
This creates a section in the language reference about doctests, which
is currently referenced by Autodoc in a tooltip when displaying a
doctest.
Some advice relevant to writing doctests is included, based on the
discussion on #16472.
The following test fails since NonCanonical is not handled
test "foo" {
std.net.Ip4Address.resolveIp("1.1.1.1", 0) catch unreachable;
}
/usr/lib/zig/std/net.zig:240:60: error: switch must handle all possibilities
if (parse(name, port)) |ip4| return ip4 else |err| switch (err) {
^~~~~~
/usr/lib/zig/std/net.zig:240:60: note: unhandled error value: 'error.NonCanonical'
referenced by:
test.foo: src/dhcp.zig:383:23
During semantic analysis the value may be an unresolved lazy value
which makes using `toUnsignedInt` invalid.
Add assertions to detect similar issues in the future.
Closes#18624
This logic was previously in Sema, which was unnecessary complexity, and meant the issue was not detected unless the declaration was semantically analyzed. This commit finishes the work which 941090d started.
Resolves: #17916
Uses the new `-M[name][=src]` CLI syntax to omit the source when the
module does not have a zig root source file.
Only some kinds of link objects imply that this should happen.
I changed my mind on how the CLI for Zig modules should work. I don't
like that `--mod` takes 2 parameters. Instead let's swing all the way in
the other direction: `-M[name][=src]`
This is shorter (Zig CLI invocations are long enough already), avoids
the double parameter edge case, and supports the concept of omitting the
source file part of the argument, which was already wanted for `-Mstd`.
The legacy way to encode that was `--mod std ''` - awkward!
Undocumented support for `--mod` remains so that this branch does not
need a zig1.wasm update. The next time that file is updated, support for
`--mod` can be dropped.
Importantly, this commit also adds support for modules that do not have
a root zig source file. In such case, it sets root to cwd and
root_src_path to empty string, and only sets have_zcu to true if a
module is provided with a root zig source file.
The deleted lines here are redundant because they happen first thing
inside the function call below.
Additionally, skip hashing the root source file if it is an empty
string. I explored making this field along with `root` optional but
found this to be less messy actually.
This section documented how a buggy feature worked at one point in time
but it's not a description of what is supposed to happen. What is
supposed to happen is simple enough to not warrant any documentation
about it. When a file is imported, all the test decls are supposed to be
queued for analysis.
Also, refAllDecls() is a hack which should not be celebrated or even
mentioned in the language reference.
closes#18042
The code asserted that the range to be replaced is within bounds of
`self.items`.
This is now reflected in the doc comment.
The old, wrong doc comment was copied from the `insert*` fns.
With this assertion holding true, `start + len` is always within the
address space and `start + new_items.len` is, at this point, always
strictly within bounds of `self.items`.
Previously `@as(i64, undefined) +% 1` would produce `@as(@TypeOf(undefined), undefined)` which now gives `@as(i64, undefined)`.
Previously `@as(i64, undefined) +| 1` would hit an assertion which now gives `@as(i64, undefined)`.
These changes enable me to use `GeneralPurposeAllocator` with my "Bring
Your Own OS" package. The previous checks for a freestanding target have
been expanded to `@hasDecl` checks.
- `root.os.heap.page_allocator` is used if it exists.
- `debug.isValidMemory` only calls `os.msync` if it's supported.
Per last paragraph of RFC 8446, Section 5.2, the length of the inner content of an encrypted record must not exceed 2^14 + 1, while that of the whole encrypted record must not exceed 2^14 + 256.
Because creation of a symlink can fail on Windows with an Access Denied
error (https://learn.microsoft.com/en-us/windows/security/threat-protection/security-policy-settings/create-symbolic-links)
any tests that need a symbolic link "skip" if they run into this problem.
This change factors out a "setupSymbolicLink()" routine to make this
clearer, a bit tighter, and easier to use in future tests.
I also collapsed the "symlink in parent directory" test into the existing
"Dir.readlink" test, because the latter uses the more comprehensive
testWithAllSupportedPathTypes wrapper.
The test runner uses "." in its output between the test module and the
test name, so quote the leading '.' in these test names to make them
easier to read.
Client for tls was using a function that wasn't declared on the
interface for it. The issue wasn't apparent because net stream
implemented that function.
I changed it to keep the interface promise of what's required to be
compatible with the tls client functionality.
As suggested by @matu3ba, it can be better to use Security Attributes
directly while creating the handle instead of creating the handle then
setting the handle to inherit. Doing so can prevent potentially leaking
to other parallel spawned processes which would inherit the opened `\Device\Null`
handle.
This change also allows windows.OpenFile to handle when bInheritHandle
is set.
Note that we are using the same `saAttr`, but since it's taken as a
pointer to a const in all calls, it's never mutated, and OpenFile never alters it.
This also saves 1 kernel call for setting the handle to inherit.
This commit allows write access to the `\\Device\\Null` Handle.
Without a write access, it's not possible for the child process to write
SdOut to Null. As a requirement `SetHandleInformation` was also changed
to mark the handle as iheritable (by adding it to Flags) by the spawned process.
This allows the child to access the NUL device that was opened.
This also makes the Windows part to behave similarly to `spawnPosix`.
The original test was checking the types of irrelevant slices, the test
is for slicing of multi-pointers _without_ an end value, but the types
of slices with an end value were being checked.
- Add syscall bindings/structures for the `futex2` family.
The documentation is taken from the syscall definitions.
- Add documnentation for the `cachestat` bindings and structures.
Taken from work I did in Cosmopolitian libc.
- Add binding for `map_shadow_stack`.
No documentation for this one, since the kernel devs didn't bother to
do it ¯\_(ツ)_/¯.
This syscall was added to simplify the the libc implementations of
fchmodat, as the original syscall does not take a `flags` argument.
Another syscall, `map_shadow_stack`, was also added for x86_64.
This reverts commit d9d840a33a, reversing
changes made to a04d433094.
This is not an adequate implementation of the missing safety check, as
evidenced by the changes to std.json that are reverted in this commit.
Reopens#18382Closes#18510
Itarator has `next` function, iterates over tar files. When using from
outside of module with `tar.` prefix makes more sense.
var iter = tar.iterator(reader, null);
while (try iter.next()) |file| {
...
}
Split reading/parsing tar file and writing results to the disk in two
separate steps. So we can later test parsing part without need to write
everyting to the disk.
This corrects calculating the offsets to the code section as we now
correctly allocate the code atoms during write taking the 'size' into
account. We also handle dead symbols which are garbage-collected by
writing -2 and -1 to skip ranges, loc and other sections respectively.
We delay atom allocation for the code section until we write the actual
atoms. We do this to ensure the offset of the atom also includes the
'size' field which is leb128-encoded and therefore variable. We need this
correct offset to ensure debug info works correctly.
The ordering of the code section is now automatic due to iterating the
function section and then finding the corresponding atom to each
function. This also ensures each function corresponds to the right atom,
and they do not go out-of-sync.
Lastly, we removed the `next` field as it is no longer required and also
removed manually setting the offset in synthetic functions. This means
atoms use less memory and synthetic functions are less prone. They will
also be placed in order of function order correctly.
Not all custom sections are represented by a symbol, which means the
section will not be parsed by the lazy parsing and therefore get garbage-
collected. This is problematic as it may contain debug information that
should not be garbage-collected. To resolve this, we manually create
local symbols for those sections and also ensure they do not get garbage-
collected.
* Specifically recognize stderr as a different concept than an error
message in Step results.
* Display it differently when only stderr occurs but the build proceeds
successfully.
closes#18473
std.fs.dir.makePath silently failed if one of the items in the path already exists. For example:
cwd.makePath("foo/bar/baz")
Silently failing is OK if "bar" is already a directory - this is the intended use of makePath (like mkdir -p). But if bar is a file then the subdirectory baz cannot be created - the end result is that makePath doesn't do anything which should be a detectable error because baz is never created.
The existing code had a TODO comment that did not specifically cover this error, but the solution for this silent failure also accomplishes the TODO task - the code now stats "foo" and returns an appropriate error. The new code also handles potential race condition if "bar" is deleted/permissions changed/etc in between the initial makeDir and statFile calls.
We would rather use the ucrt for these, but sometimes dependencies on
the mingw stdio functions creep in. 仕方ない.
The cost is only paid if they are used; otherwise the symbols are
garbage-collected at link time.
Martin Storsjö kindly took the time to discuss things at length with me,
and the results are that this status quo is correct. I added comments so
that I don't think it should be changed later.
Upstream commit dddccbc3ef50ac52bf00723fd2f68d98140aab80
* adds ucrtbase.def.in
* mingwex: replace mingw crt files with ucrt files
* adds missing mingw-w64 ucrt files
The rules that govern which set of files are included or excluded is
contained in the logic for tools/update_mingw.zig
Upstream commit dddccbc3ef50ac52bf00723fd2f68d98140aab80
Martin Storsjö suggested synchronizing with git snapshots rather than
waiting for tagged releases. Let's try this for a few releases of Zig
and see how we like it.
These headers were configured with `--with-default-msvcrt=ucrt`.
See related issue #18477.
This mainly replaces ChunkIterator with std.mem.window and also
prints \n, \r, \t using Unicode symbols instead of periods because
they're common non-printable characters.
This same code exists in std.debug.hexdump.
At some point maybe this code could be exposed through a public
function. Then we could reuse the code in both places.
Recently, when I've been working with structures of data that is not
directly in RAM but rather laid out in bytes somewhere else,
it was always very useful to print out maybe the next 50 bytes or the
previous 50 bytes or so to see what's ahead or before me.
I would usually do this with a quick
`std.debug.print("{any}\n", .{bytes});` or something but the output is
not as nice obviously.
Changes the types of `std.builtin.Type` `name` fields from `[]const u8`
to `[:0]const u8`, which should make them easier to pass to C APIs
expecting null-terminated strings.
This will break code that reifies types using `[]const u8` strings, such
as code that uses `std.mem.tokenize()` to construct types from strings
at comptime. Luckily, the fix is simple: simply concatenate the
`[]const u8` string with an empty string literal (`name ++ ""`) to
explicitly coerce it to `[:0]const u8`.
Co-authored-by: Krzysztof Wolicki <der.teufel.mail@gmail.com>
The strlcpy symbol was added in v2.38, so this is a handy symbol for
creating binaries that won't run on relatively modern systems (e.g., mine,
that has glibc 2.36 installed).
At a minimum required glibc is v2.17, as earlier versions do not define
some symbols (e.g., getauxval()) used by the Zig standard library.
Additionally, glibc only supports some architectures at more recent
versions (e.g., riscv64 support came in glibc v2.27). So add a
`glibc_min` field to `available_libcs` for architectures with stronger
version requirements.
Extend the existing `canBuildLibC` function to check the target against
the Zig minimum, and the architecture/os minimum.
Also filter the list shown by `zig targets`, too:
$ zig targets | jq -c '.glibc'
["2.17.0","2.18.0","2.19.0","2.20.0","2.21.0","2.22.0","2.23.0","2.24.0","2.25.0","2.26.0","2.27.0","2.28.0","2.29.0","2.30.0","2.31.0","2.32.0","2.33.0","2.34.0","2.35.0","2.36.0","2.37.0","2.38.0"]
Fixes#17034Fixes#17769
The fstat,lstat,stat,mknod stubs used to build older (before v2.33)
glibc versions depend on the weak_hidden_alias macro. It was removed
from the glibc libc-symbols header, so patch it back in for the older
builds.
The scope of libc_nonshared.a was greatly changed in glibc 2.33 and
2.34, but only the change from 2.34 was reflected so far. Glibc 2.33
finally switched to versioned symbols for stat functions, meaning that
libc_nonshared.a no longer contains them since 2.33. Relevant files were
therefore reverted to 2.32 versions and renamed accordingly.
This commit also removes errno.c, which was probably added to
libc_nonshared.a based on a wrong assumption that glibc/include/errno.h
requires glibc/csu/errno.c. In reality errno.h should refer to
__libc_errno (not to be confused with the public __errno_location),
which should be imported from libc.so. The inclusion of errno.c resulted
in wrong compile options as well; this commit fixes them as well.
Fixes#16152
Adds a variant to the LazyPath union representing a parent directory
of a generated path.
```zig
const LazyPath = union(enum) {
generated_dirname: struct {
generated: *const GeneratedFile,
up: usize,
},
// ...
}
```
These can be constructed with the new method:
```zig
pub fn dirname(self: LazyPath) LazyPath
```
For the cases where the LazyPath is already known
(`.path`, `.cwd_relative`, and `dependency`)
this is evaluated right away.
For dirnames of generated files and their dirnames,
this is evaluated at getPath time.
dirname calls can be chained, but for safety,
they are not allowed to escape outside a root
defined for each case:
- path: This is relative to the build root,
so dirname can't escape outside the build root.
- generated: Can't escape the zig-cache.
- cwd_relative: This can be a relative or absolute path.
If relative, can't escape the current directory,
and if absolute, can't go beyond root (/).
- dependency: Can't escape the dependency's root directory.
Testing:
I've included a standalone case for many of the happy cases.
I couldn't find an easy way to test the negatives, though,
because tests cannot yet expect panics.
Fixes edge cases where the `startsWith` that was used previously would return a false positive on a resolved path like `foo.zig` when the resolved root was `foo`. Before this commit, such a path would be treated as a sub path of 'foo' with a resolved sub file path of 'zig' (and the `.` would be assumed to be a path separator). After this commit, `foo.zig` will be correctly treated as outside of the root of `foo`.
Closes#18355
When depending on a module that depends on a static library, there was a
missing step dependency on the static library, which caused a compile
error due to missing header file.
This fixes the problem by adding the proper step dependencies.
Reviewing this code, I'm starting to wonder if it might be simpler to
have Module instances create dummy Step objects to better model
dependencies and dependees, rather than trying to maintain this graph
without an actual node. That would be an improvement for a future
commit.
Updated `zirShl`, to compute `shl_exact` with `comptime_int` LHS operand
like `shl`, and added test case for `@shlExact` with `comptime_int` LHS
operand.
This commit changes the type of the second parameter to `anytype`, which should make it easier to pass literals to these functions. This change shouldn't *silently* break existing code (the assertions themselves should retain the same behavior as before) but it may result in some new compile errors when struct/union/array literals or builtins like `@bitCast` are used for the second argument. These compile errors can be fixed by explicitly coercing these expressions to the correct type using `@as`.
Current implementation fails to handle the following enum
```zig
const E = enum {
X,
pub const X = 1;
}
```
because `@field(type, name)` prefers declarations over enum fields.
This branch introduced an arena allocator for temporary allocations in
Compilation.update. Almost every implementation of flush() inside the
linker code was already creating a local arena that had the lifetime of
the function call. This commit passes the update arena so that all those
local ones can be deleted, resulting in slightly more efficient memory
usage with every compilation update.
While at it, this commit also removes the Compilation parameter from the
linker flush function API since a reference to the Compilation is now
already stored in `link.File`.
This isn't technically needed since per-module -I args can suffice, but
this can produce very long CLI invocations when several --mod args are
combined with --search-prefix args since the -I args have to be repeated
for each module.
This is a partial revert of ecbe8bbf2df2ed4d473efbc32e0b6d7091fba76f.
The linker needs to know the file system path of output in the flush
function because file paths inside the build artifacts reference each
other. Fixes a regression introduced in this branch.
This issue already existed in master branch, however, the more
aggressive caching of builtin.zig in this branch made it happen more
often. I added doc comments to AtomicFile to explain when this problem
can occur.
For the compiler's use case, error.AccessDenied can be simply swallowed
because it means the destination file already exists and there is
nothing else to do besides proceed with the AtomicFile cleanup.
I never solved the mystery of why the log statements weren't printing
but those are temporary debugging instruments anyway, and I am already
too many yaks deep to whip out another razor.
closes#14978
Without this commit, unrelated test builds using incremental cache mode
(self-hosted, no lld) would end up using the same cache namespace, which
is undesireable since concurrent builds will clobber each other's work.
This happened because of passing the root module to
addModuleToCacheHash. In the case of a test build, the root module
actually does not connect to the rest of the import table. Instead, the
main module needs to be passed, which has "root" in its import table.
The other call to addModuleTableToCacheHash which is in
addNonIncrementalStuffToCacheManifest already correctly passes the main
module.
In the future, I think this problem can be fully addressed by obtaining
an advisory lock on the output binary file. However, even in that case,
it is still valuable to make different compilations use different cache
namespaces lest unrelated compilations suffer from pointless thrashing
rather than being independently edited.
Instead of making its own inside create. 10 out of 10 calls to create()
had already an arena in scope, so this commit means that 10 instances of
Compilation now reuse an existing arena with the same lifetime rather
than creating a redundant one.
In other words, this very slightly optimizes initialization of the
frontend in terms of memory allocation.
Now, link.File will always be null when -fno-emit-bin is specified, and
in the case that LLVM artifacts are still required, the Zcu instance has
an LlvmObject.
before this commit it was trying to hash based on resolved bin_file
settings, but bin_file was always null since cache mode is always whole
when this function is called! hash based on the lf_open_opts instead.
This reintroduced flag makes zig build behave the same as the previous
commit. Without this flag, the default behavior is now changed to
display compilation errors inline with the rest of error messages and
the build tree context.
This behavior is essential for making sense of error logs from projects
that have two or more steps emitting compilation errors which is why it
is now the default.
* move wasi_emulated_libs into Compilation
- It needs to be accessed from Compilation, which needs to potentially
build those artifacts.
* Compilation: improve error reporting for two cases
- the setMiscFailure mechanism is handy - let's use it!
* fix one instance of incorrectly checking for emit_bin via
`comp.bin_file != null`. There are more instances of this that need to
be fixed in a future commit.
* fix renameTmpIntoCache not handling the case where it needs to make
the "o" directory in the zig-cache directory.
- while I'm at it, simplify the logic for handling the fact that
Windows returns error.AccessDenied rather than
error.PathAlreadyExists for failure to rename a directory over
another one.
* fix missing cache hash additions
- there are still more to add in a future commit -
addNonIncrementalStuffToCacheManifest is called when bin_file is
always null, and then it incorrectly checks if bin_file is non-null
and only then adds a bunch of stuff to the cache hash. It needs to
instead add to the cache hash based on lf_open_opts.
This is necessary because on COFF, the entry symbol name is not known
until the linker has looked at the set of global symbol names to
determine which of the four possible main entry points is present.
use_llvm=false does not always mean there needs to be a zig compiler
backend available. In particular, when there is no zig compilation unit,
use_llvm=false and yet no zig backend will be used to produce code.
In Compilation.create, update the resolved config to account for the
default resolution of the root module. This makes it so that, for
example, reading comp.config.any_non_single_threaded is valid in order
to determine whether any module has single_threaded=false.
Now that we always pass --image-base to LLD, including when Zig chooses
the default value, LLD is complaining about 32-bit architectures because
it requires being at least equal to the max page size, which is 64K.
When using the build system to do unit testing, it lowers to --mod
arguments which were incorrectly tripping a "zig test requires a source
file argument" error.
* fix relationship between createEmpty/open (similar logic as
607111aa758002bc51914b7dc800b23927c931b8)
* still resolve the start symbol when linking libc because when zig is
the linker it still needs to know the entry symbol.
* make use_llvm=false when there is no zig compilation unit.
when linking libc, the entry point is within libc.
when producing C code, the entry point is decided when compiling the C
code and does not need to be known up front.
fixes a false positive "error: unknown target entry point" when using
-ofmt=c.
This is redundant with CacheMode.whole which caches everything,
including linking output. Linker code does not need to concern itself
with caching like this.
implement builtin.zig file population for all modules rather than
assuming there is only one global builtin.zig module.
move some fields from link.File to Compilation
move some fields from Module to Compilation
compute debug_format in global Compilation config resolution
wire up C compilation to the concept of owner modules
make whole cache mode call link.File.createEmpty() instead of
link.File.open()
Commit 97e23896a9 regressed this behavior
because it made target_util.supportsStackProtector *correctly*
notice which zig backend is being used to generate code, while the
logic calling that function *incorrectly assumed* that .zig code is being
compiled, when in reality it might be only C code being compiled.
This commit adjusts the option resolution logic for stack protector so
that it takes into account the zig backend only if there is a zig
compilation unit. A separate piece of logic checks whether clang
supports stack protector for a given target.
closes#18009closes#18114closes#18254
These options are only supposed to be provided to the initialization
functions, resolved, and then computed values stored in the appropriate
place (base struct or the object-format-specific structs).
Many more to go...
Much of the logic from Compilation.create() is extracted into
Compilation.Config.resolve() which accepts many optional settings and
produces concrete settings. This separate step is needed by API users of
Compilation so that they can pass the resolved global settings to the
Module creation function, which itself needs to resolve per-Module
settings.
Since the target and other things are no longer global settings, I did
not want them stored in link.File (in the `options` field). That options
field was already a kludge; those options should be resolved into
concrete settings. This commit also starts to work on that, deleting
link.Options, moving the fields into Compilation and
ObjectFormat-specific structs instead. Some fields were ephemeral and
should not have been stored at all, such as symbol_size_hint.
The link.File object of Compilation is now a `?*link.File` and `null`
when -fno-emit-bin is passed. It is now arena-allocated along with
Compilation itself, avoiding some messy cleanup code that was there
before.
On the command line, it is now possible to configure the standard
library itself by using `--mod std` just like any other module. This
meant that the CLI needed to create the standard library module rather
than having Compilation create it.
There are a lot of changes in this commit and it's still not done. I
didn't realize how quickly this changeset was going to balloon out of
control, and there are still many lines that need to be changed before
it even compiles successfully.
* introduce std.Build.Cache.HashHelper.oneShot
* add error_tracing to std.Build.Module
* extract build.zig file generation into src/Builtin.zig
* each CSourceFile and RcSourceFile now has a Module owner, which
determines some of the C compiler flags.
This change is seemingly insignificant but I actually agonized over this
for three days. Some other things I considered:
* (status quo in master branch) make Compile step creation functions
accept a Target.Query and delete the ResolvedTarget struct.
- downside: redundantly resolve target queries many times
* same as before but additionally add a hash map to cache target query
resolutions.
- downside: now there is a hash map that doesn't actually need to
exist, just to make the API more ergonomic.
* add is_native_os and is_native_abi fields to std.Target and use it
directly as the result of resolving a target query.
- downside: they really don't belong there. They would be available
as comptime booleans via `@import("builtin")` but they should not
be exposed that way.
With this change the downsides are:
* the option name of addExecutable and friends is `target` instead of
`resolved_target` matching the type name.
- upside: this does not break compatibility with existing build
scripts
* you likely end up seeing `target.result.cpu.arch` rather than
`target.cpu.arch`.
- upside: this is an improvement over `target.target.cpu.arch` which
it was before this commit.
- downside: `b.host.target` is now `b.host.result`.
Previously when an error message is printed, it is sometimes not
possible to know which build step it corresponds to. With the subtree
printed in this commit, the context is always clear.
Introduce the concept of "target query" and "resolved target". A target
query is what the user specifies, with some things left to default. A
resolved target has the default things discovered and populated.
In the future, std.zig.CrossTarget will be rename to std.Target.Query.
Introduces `std.Build.resolveTargetQuery` to get from one to the other.
The concept of `main_mod_path` is gone, no longer supported. You have to
put the root source file at the module root now.
* remove deprecated API
* update build.zig for the breaking API changes in this branch
* move std.Build.Step.Compile.BuildId to std.zig.BuildId
* add more options to std.Build.ExecutableOptions, std.Build.ObjectOptions,
std.Build.SharedLibraryOptions, std.Build.StaticLibraryOptions, and
std.Build.TestOptions.
* remove `std.Build.constructCMacro`. There is no use for this API.
* deprecate `std.Build.Step.Compile.defineCMacro`. Instead,
`std.Build.Module.addCMacro` is provided.
- remove `std.Build.Step.Compile.defineCMacroRaw`.
* deprecate `std.Build.Step.Compile.linkFrameworkNeeded`
- use `std.Build.Module.linkFramework`
* deprecate `std.Build.Step.Compile.linkFrameworkWeak`
- use `std.Build.Module.linkFramework`
* move more logic into `std.Build.Module`
* allow `target` and `optimize` to be `null` when creating a Module.
Along with other fields, those unspecified options will be inherited
from parent `Module` when inserted into an import table.
* the `target` field of `addExecutable` is now required. pass `b.host`
to get the host target.
This moves many settings from `std.Build.Step.Compile` and into
`std.Build.Module`, and then makes them transitive.
In other words, it adds support for exposing Zig modules in packages,
which are configured in various ways, such as depending on other link
objects, include paths, or even a different optimization mode.
Now, transitive dependencies will be included in the compilation, so you
can, for example, make a Zig module depend on some C source code, and
expose that Zig module in a package.
Currently, the compiler frontend autogenerates only one
`@import("builtin")` module for the entire compilation, however, a
future enhancement will be to make it honor the differences in modules,
so that modules can be compiled with different optimization modes, code
model, valgrind integration, or even target CPU feature set.
closes#14719
This reverts commit 7161ed79c4, reversing
changes made to 3f2a65594e.
Unfortunately, this sat in the PR queue too long and the merge broke the
zig1.wasm bootstrap process.
The function returns the vector length, not the byte size of the vector or the bit size of individual elements. This distinction is very important and some usages of this function in the stdlib operated under these incorrect assumptions.
The first instruction in the block was never checked resulting in `struct_is_comptime` being incorrectly cleared if there are no instructions before the first field of the comptime struct.
Fixes#17119
On some architectures, including AMD Zen CPUs, dividing a secret
by a constant denominator may not be a constant-time operation.
And most Kyber implementations, including ours, could leak the
hamming weight of the shared secret because of this. See:
https://kyberslash.cr.yp.to
Multiplications aren't guaranteed to be constant-time either, but
at least on the CPUs we currently support, it is.
- Clean up array formatting code. Remove buggy formatting of array
pointers, deference pointer to reuse existing array formatting logic.
- Change default specifier for array pointers to be "{any}", to be
consistent with slices.
- Allow using "{x}" and "{e}" for arrays and slices for all number
types, including u8.
Fixes#18185
In theory this is part of https://github.com/ziglang/zig/issues/18335, but these tests already pass since deleteTree does not depend on `OpenDirOptions.no_follow` behavior for these test cases:
- `deleteTree` always tries to delete the initial path as a file first, which will succeed on symlinks because `deleteFile` doesn't follow symlinks
- `deleteTree` when iterating a directory will get the type of symlinks as .sym_link, not as .directory (even if the symlink points to a directory), meaning it will never try to open a symlink as a directory.
The `no_follow` behavior happened to allow opening a file descriptor of a symlink itself on Windows, but that behavior may change in the future. Instead, we implement the opening of the symlink as a file descriptor manually (and per-platform) in the test case.
ctime is last file status/metadata change, not creation time. Note that this mistake was not made in the `File.metadata`/`File.Metadata` implementation, which allows getting the actual creation time.
Closes#18290
Requires an extra NtQueryInformationFile call when FILE_ATTRIBUTE_REPARSE_POINT is set to determine if it's actually a symlink or some other kind of reparse point (https://learn.microsoft.com/en-us/windows/win32/fileio/reparse-point-tags). This is something that `File.Metadata.kind` was already doing, so the same technique is used in `stat`.
Also, replace the std.os.windows.DeviceIoControl call in `metadata` with NtQueryInformationFile (NtQueryInformationFile is what gets called during kernel32.GetFileInformationByHandleEx with FileAttributeTagInfo, verified using NtTrace).
For computing the zig version number, pass --abbrev=9 rather than
requiring the user to set their git configuration in order to make zig
versions match the standard.
The old implementation had a bug in it in that it didn't quote empty strings, but it also didn't properly follow the special quoting rules required for the first argument (the executable name). This new implementation serializes the argv correctly such that it can be parsed by the `CommandLineToArgvW` algorithm.
This adds `ArgIteratorWindows`, which faithfully replicates the quoting and escaping behavior observed in `CommandLineToArgvW` and should make Zig applications play better with processes that abuse these quirks.
This option is not needed since the link_libc flag can be set directly
when creating compiler_rt.
This fixes a problem where an immutable flag was being mutated in Sema.
This way people can use `const` with a `std.EnumMap`
and be able to `getPtrAssertContains(...)` like the
would with a mutable `var` instance.
Aligns with the existing `getPtr(...)`/`getPtrConst(...)`
methods.
This reverts commit f88b523065.
Let's please go through the language proposal process for this change. I
don't see any justification for this breaking change even in the commit
message.
previously when T was smaller than 8 bits, it was possible for base
to overflow T (because base is a u8). this patch prevents this by
accumulating into a U rather than T which is at least 8 bits wide.
this is the best way i could think of to maintain performance. this
will only affect parsing of integers less than 8 bits by adding one
additional cast at return. additionally, this patch may be slightly
slower to return an error for integers less than 8 bits which overflow
because it will accumulate a few more digits before the overflow check
at return.
* add tests which previously overflowed when they shouldn't have
closes#18157
* Update the msdos stub to be eight bytes smaller, which moves the
machine PE header field into the first 128 bytes of the file,
allowing it to be matched by a binfmt_misc magic sequence.
This allows the build system to get the correct error during exec.
* Fix library name memory leaks in Sema.
* std.debug: optimized printLineFromFileAnyOs
Uses mem.indexOfScalar to speed line iteration instead of byte for byte.
Also prints the whole line in a single write (or up to a page size at a
time)
Closes#18099
* add test cases for printLineFromFileAnyOs
By default we garbage-collect sections for Wasm to reduce size, as well
as finish linking quicker (as we have fewer things to do). However,
when the user specifies `--no-gc-sections` we ensure all resolved symbols
get marked and therefore do not get garbage collected.
This is supported in both incremental-mode and traditional linking.
When using the Wasm backend, we will now also perform garbage collection
there, to ensure unreferenced symbols do not get parsed nor emit into
the final binary.
When we encounter a debug info symbol, we initially have to parse it
into an atom to find its relocations. We then go through its relocations
to find out if any of the target symbols are marked alive. When it
finds an alive symbol, we also mark the debug symbol as alive to ensure
this piece of debug info is emit to the binary. When it does not encounter
any alive symbols, the debug symbol remains dead and will be garbage-
collected during `allocateAtoms`.
When multiple symbols point to the same function, we ensure any
other symbol other than the original will be discarded and point
to the original instead. This prevents emitting the same function
code more than once.
Rather than parsing every symbol into an atom, we now only parse them
into an atom when such atom is marked. This means garbage-collected
symbols will also not be parsed into atoms, and neither are discarded
symbols which have been resolved by other symbols. (Such as multiple
weak symbols).
This also introduces a binary search for finding the start index into
the list of relocations. This speeds up finding the corresponding
relocations tremendously as they're ordered ascended by address.
Lastly, we re-use the memory of atom's data as well as relocations
instead of duplicating it. This means we half the memory usage of
atom's data and relocations for linked object files. As we are
aware of decls and synthetic atoms, we free the memory of those
atoms indepedently of the atoms of object files to prevent double-frees.
Symbols which are exported to the host, or contain the `NO_STRIP`
flag, will be marked. All symbols which are referenced by this symbol
are marked likewise. We achieve this by parsing all relocations of a
symbol, and then marking the symbol it points to within the relocation.
When a linked object contains references to the __tls_base symbol,
we lazily create this symbol. However, we wouldn't create the corresponding
Wasm global. This meant its address wasn't set correctly as well as fail
to output it into the `Names` section.
The logic here already caught the case when a dependency path tried to
escape out of the zig cache directory using up directories. However, it
did not catch the case when the relative path tried to reach into a
different path within the zig-cache. For example, if it asked for
"../../../blah" then it would be caught, but if it asked for "../blah"
then it would try to resolve as "zig-cache/p/blah" and probably result
in file-not-found, or perhaps resolve to a different package if someone
inadvertently used a valid package hash instead of "blah".
Now it correctly gives a "dependency path outside project" error,
however, still allows relative paths with up-dirs that were not fetched
via URL.
reads on eg. connected TCP sockets can fail with ETIMEDOUT, and ENOTCONN
happens eg. if you try to read a TCP socket that has not been connected
yet.
interestingly read() was already handling CONNRESET & TIMEDOUT, but
readv(), pread(), and preadv() were somewhat inconsistent.
This duplicates the source file string (as is done in other places such
as `addAssemblyFile()`) in order to prevent a segfault when the supplied
string is freed by the caller. This is still seen when the caller makes
use of a defer statement.
This reverts commit 8b10970836.
This implementation has the following problems:
* It does not provide context to the less than function. This will be an
API break in order to fix.
* It uses recursion, causing unbounded stack memory usage - likely
depending on user input, which is extra problematic.
* Sorting linked lists is generally an inefficient operation;
encouraging it by having a standard library function for it may
lead to suboptimal software being written in Zig.
Furthermore, there is almost no benefit to providing a sort function as
a method, when a third party implementation can easily be passed a
linked list to then be sorted.
A parameter like this is not always optional, even if that is
usually implied. SPIR-V tools fail to parse a module with an
OpLoopMerge instruction where the loop control parameter is
left out.
* move std.atomic.Atomic to std.atomic.Value
* fix incorrect argument order passed to testing.expectEqual
* make the functions be a thin wrapper over the atomic builtins and
stick to the naming conventions.
* remove pointless functions loadUnchecked and storeUnchecked. Instead,
name the field `raw` instead of `value` (which is redundant with the
type name).
* simplify the tests by not passing every possible combination. Many
cases were iterating over every possible combinations but then not
even using the for loop element value!
* remove the redundant compile errors which are already implemented by
the language itself.
* remove dead x86 inline assembly. this should be implemented in the
language if at all.
This was originally supposed to be a lock-free queue, but I gave up on
that and made it be a thread-safe queue instead.
Putting the mutex directly inside the queue data structure makes it
non-composeable. Instead, the recommendation is to use a normal queue
protected by an external mutex.
This was originally supposed to be a lock-free stack, but I gave up on
that and made it be a thread-safe stack which is implemented poorly
using spin locks. Nobody should use this data structure.
The alternative is a normal stack protected by a mutex.
* remove `std.atomic.Ordering`
- it is provided by the language with `std.builtin.AtomicOrder`.
* remove `std.atomic.fence`
- it is provided by the language with `@fence`.
* remove `std.atomic.compilerFence`
- if this is desired, it should be a language feature, not a standard
library function with inline asm.
This reverts commit da94227f78, reversing
changes made to 8f943b3d33.
I was against this change originally, but decided to approve it to keep
an open mind. After a year of trying it in practice, I firmly believe
that the previous way of doing it was better.
These arrays don't really all have an upper bound of 16; in fact they
have different upper bounds. Presumably the reason 16 was used for all
of them was to avoid code bloat with BoundedArray. Well, now even more
code bloat has been eliminated because now it's using
`ArrayList([]const u8)` which is certainly instantiated elsewhere.
Furthermore, the different corrected upper bounds can be specified at
each instance of the array list.
I consider this change to be neutral. It inlines a bit of logic that
previously was abstracted, however, it also eliminates an instance of
`catch unreachable` as well as a clumsy failed pop / append in favor of
writing directly to the appropriate array element.
This simplifies the logic. For example, in generateExactWidthType, it no
longer has to save a previous length and then use defer to reset the
length after mutating it.
In this case it improved maintainability because magic number `4` is no
longer repeated 3 times, and there is no longer a redundant branch in
the loop.
* Take advantage of multi-object for loops.
* Remove use of BoundedArray since it had no meaningful impact on safety
or readability.
* Simplify some complex expressions, such as using `!` to invert a
boolean value.
We definitely want ArrayList in the standard library. Do we want
BoundedArray? Maybe, maybe not. But that makes ArrayList a more stable
dependency for std.fs.
This is useful when you want to have an array list backed by a fixed
slice of memory and no Allocator will be used.
It's an alternative to BoundedArray as you will see in the following
commit.
In general, I don't like the idea of std.meta.trait, and so I am
providing some guidance by deleting the entire namespace from the
standard library and compiler codebase.
My main criticism is that it's overcomplicated machinery that bloats
compile times and is ultimately unnecessary given the existence of Zig's
strong type system and reference traces.
Users who want this can create a third party package that provides this
functionality.
closes#18051
Justification: It is common for non-CPU bound short routines to do
non-blocking accept to eliminate unnecessary delays before subscribing
to data, for example in hardware integration tests.
```zig
const std = @import("std");
pub fn main() !void {
var addr: *u8 = @ptrFromInt(0xaaaaaaaaaaaaaaaa);
addr.* = 1;
}
```
On x86_64-linux:
Before:
```
$ zig run x.zig
Segmentation fault at address 0x0
/home/wooster/Desktop/zig/x.zig:5:5: 0x21d887 in main (x)
addr.* = 1;
^
/home/wooster/Desktop/zig-linux-x86_64/lib/std/start.zig:583:37: 0x21d847 in posixCallMainAndExit (x)
const result = root.main() catch |err| {
^
/home/wooster/Desktop/zig-linux-x86_64/lib/std/start.zig:251:5: 0x21d371 in _start (x)
asm volatile (switch (native_arch) {
^
???:?:?: 0x0 in ??? (???)
Aborted (core dumped)
```
After:
```
$ zig run x.zig --zig-lib-dir lib
General protection exception
/home/wooster/Desktop/zig/x.zig:5:5: 0x21d907 in main (x)
addr.* = 1;
^
/home/wooster/Desktop/zig/lib/std/start.zig:583:37: 0x21d8c7 in posixCallMainAndExit (x)
const result = root.main() catch |err| {
^
/home/wooster/Desktop/zig/lib/std/start.zig:251:5: 0x21d3f1 in _start (x)
asm volatile (switch (native_arch) {
^
???:?:?: 0x0 in ??? (???)
Aborted (core dumped)
```
As @IntegratedQuantum pointed out in <https://github.com/ziglang/zig/issues/17745#issuecomment-1783815386>,
it seems that if `code` of the `siginfo_t` instance is a certain value (128), you are able to distinguish between
a general protection exception and a segmentation fault.
This does not seem to be documented on `man sigaction`:
```
The following values can be placed in si_code for a SIGSEGV signal:
SEGV_MAPERR
Address not mapped to object.
SEGV_ACCERR
Invalid permissions for mapped object.
SEGV_BNDERR (since Linux 3.19)
Failed address bound checks.
SEGV_PKUERR (since Linux 4.6)
Access was denied by memory protection keys. See pkeys(7). The protection key which applied to this access is available via si_pkey.
```
(those constants are 1, 2, 3, and 4; none of them are the 128)
I can't find a lot of documentation about this but it seems to work consistently for me on x86_64-linux.
Here is a gist which provides additional evidence that this is a reliable way of checking for a general protection fault:
https://gist.github.com/ytoshima/5682393 (read comment in first line)
See also: https://stackoverflow.com/questions/64309366/why-is-the-segfault-address-null-when-accessing-memory-that-has-any-of-the-16-mo
This only seems to affect x86_64 and on 32-bit x86 this does not seem to be a problem.
Helps with #17745 but doesn't close it because the issue still exists on Windows and other POSIX OSs.
I also limited this to x86_64-linux for now because that's the only platform where I tested it. Might work on more POSIX OSs.
Instead of `zig init-lib` and `zig init-exe`, now there is only
`zig init`, which initializes any of the template files that do not
already exist, and makes a package that contains both an executable and
a static library. The idea is that the user can delete whatever they
don't want. In fact, I think even more things should be added to the
build.zig template.
When a local variable is never used as an lvalue, we can determine that
`const` would be sufficient for this variable, so emit an error in this
case. More sophisticated checking is unfortunately not possible with
Zig's current analysis model, since whether an lvalue is actually
mutated depends on semantic analysis, in which some code paths may not
be analyzed, so attempting to determine this would result in false
positive compile errors.
It's worth noting that an unfortunate consequence of this is that any
field call `a.b()` will allow `a` to be `var`, even if `b` does not take
a pointer as its first parameter - this is again a necessary compromise
because the parameter type is not known until semantic analysis.
Also update `translate-c` to not trigger these errors. This is done by
replacing the `_ = @TypeOf(x)` emitted with `_ = &x` - the reference
there means that the local is permitted to be `var`. A similar strategy
will be used to prevent compile errors in the behavior tests, where we
sometimes want to force a value to be runtime-known.
Resolves: #224
Seems like this restriction was actual when Ziglang had extern enums,
but now it's not neccessary and can be lifted. It was present since
original PR which introduced std.enums, https://www.github.com/ziglang/zig/pull/8171.
See also: https://ziggit.dev/t/catching-invalid-enum-value-errors/2206/11
* Make `count` comptime_int instead of usize
With previous type, creating EnumIndexer for enum(usize) and enum(isize)
would cause compile error since `count` could not store maxInt(usize) + 1.
Now it can store it and reflects len field from std.builtin.Type.Array
(most common use case of count field inside std.enums functions is creating arrays).
Signed-off-by: Eric Joldasov <bratishkaerik@getgoogleoff.me>
Getting this error in ci:
run test std-arm-linux-none-generic-Debug: error: 'test.accept/connect/send_zc/recv' failed: /home/ci/actions-runner1/_work/zig/zig/lib/std/os/linux/io_uring.zig:60:23: 0x70b06b in init_params (test)
.NOSYS => return error.SystemOutdated,
^
/home/ci/actions-runner1/_work/zig/zig/lib/std/os/linux/io_uring.zig:27:16: 0x70b6b7 in init (test)
return try IO_Uring.init_params(entries, ¶ms);
^
/home/ci/actions-runner1/_work/zig/zig/lib/std/os/linux/io_uring.zig:3807:16: 0x72405b in test.accept/connect/send_zc/recv (test)
var ring = try IO_Uring.init(16, 0);
https://github.com/ziglang/zig/actions/runs/6909813408/job/18801841015?pr=18025
Reverting previous change.
I'm building test bin and then running it in virtual machines with different
kernels. So Linux kernel checks has to be runtime instead of comptime.
So far we relied on getting EINVAL in CQE for operations that kernel don't
support. The problem with that approach is that there are many other reasons
(like wrong params) to get EINVAL. The other problem is when we have an
operation that existed before and gets new behavior via different attributes,
like accept and accept_direct. Older kernels can fall back to non direct
operation although we set attributes for direct operation. Operation completes
successfully in both cases but with different results.
This commit introduces kernel version check at the start of the test. Making
body of the test free of checking for various kernel version differences.
Feature availability references:
* https://manpages.debian.org/unstable/liburing-dev/io_uring_enter.2.en.html
* https://kernel.dk/axboe-kr2022.pdf
* 5acf7969bc/lib/std/os/linux.zig (L3727)
* 5acf7969bc/lib/std/os/linux.zig (L3993)
The idea here is that the zig2 executable is perhaps the more useful
deliverable until we implement our own optimization passes. This will
allow system packages to provide Zig, and use it to compile Zig
projects, all without LLVM!
The motivating problem here was a memory leak in the hash maps of
Module.Namespace.
The commit deletes more of the legacy incremental compilation
implementation. It had things like use of orderedRemove and trying to do
too much OOP-style creation and deletion of objects.
Instead, this commit iterates over all the namespaces on Module deinit
and calls deinit on the hash map fields. This logic is much simpler to
reason about.
Similarly, change global inline assembly to an array hash map since
iterating over the values is a primary use of it, and clean up the
remaining values on Module deinit, solving another memory leak.
After this there are no more memory leaks remaining when using the
x86 backend in a libc-less compiler.
When a zig compiler without LLVM extensions is satisfactory, this
greatly simplified build-from-source process can be used.
This could be useful for users who only want to contribute to the
standard library, for example.
This reverts commit 547481c31c.
There is a comment that did not get addressed with this patch, and the
required test cases are not added.
Reopens#17798.
Addresses a comment in #17779 pointing out the inability to fetch the
upstream BoringSSL sources over Git. The reason for this is because the
Git server used in this case did not include the optional (but
recommended) LF terminator for textual pkt-line data. This commit
adjusts handling of textual pkt-line data so that it works both with and
without the optional trailing LF.
This change fixes some division-by-zero bugs introduced by the optimized
ring buffer read/write functions in d8c067966.
There are edge cases where decompression can use a length zero ring
buffer as the size of the ring buffer used is exactly the the window
size specified by a Zstandard frame, and this can be zero. Switching
away from loops to mem copies means that we need to ensure ring buffers
do not have length zero ring when attempting to read/write from them.
`send_zc` tries to avoid making intermediate copies of data. Zerocopy execution
is not guaranteed and may fall back to copying.
The flags field of the first struct io_uring_cqe may likely contain
IORING_CQE_F_MORE , which means that there will be a second completion event /
notification for the request, with the user_data field set to the same value.
The user must not modify the data buffer until the notification is posted. The
first cqe follows the usual rules and so its res field will contain the number
of bytes sent or a negative error code. The notification's res field will be set
to zero and the flags field will contain IORING_CQE_F_NOTIF. The two step model
is needed because the kernel may hold on to buffers for a long time, e.g.
waiting for a TCP ACK, and having a separate cqe for request completions allows
userspace to push more data without extra delays. Note, notifications are only
responsible for controlling the lifetime of the buffers, and as such don't mean
anything about whether the data has atually been sent out or received by the
other end. Even errored requests may generate a notification, and the user must
check for IORING_CQE_F_MORE rather than relying on the result.
Available since kernel 6.0.
References:
https://man7.org/linux/man-pages/man3/io_uring_prep_send_zc.3.htmlhttps://man7.org/linux/man-pages/man2/io_uring_enter.2.html
Boundary symbols have a special name prefix:
* section$start$segname$sectname
* section$stop$segname$sectname
* segment$start$segname
* segment$stop$segname
and will resolve to either start or end of the respective
section/segment if found.
If not found, we return an error stating we couldn't find the
requested section/segment rather than silently failing and resolving
the address to 0 which seems to be the case with Apple's ld64.
There are two optimizations here, which work together to avoid a
pathological case.
The first optimization is that AstGen now records the result type of an
array multiplication expression where possible. This type is not used
according to the language specification, but instead as an optimization.
In the expression '.{x} ** 1000', if we know that the result must be an
array, then it is much more efficient to coerce the LHS to an array with
length 1 before doing the multiplication. Otherwise, we end up with a
1000-element tuple which we must coerce to an array by individually
extracting each field.
Secondly, the previous logic would repeatedly extract element/field
values from the LHS when initializing the result. This is unnecessary:
each element must only be extracted once, and the result reused.
These changes together give huge improvements to compiler performance on
a pathological case: AIR instructions go from 65551 to 15, and total AIR
bytes go from 1.86MiB to 264.57KiB. Codegen time spent on this function
(in a debug compiler build) goes from minutes to essentially zero.
Resolves: #17586
Previously the symbol tag field would remain `undefined` until it
was set during `flush`. However, the symbol's tag would be observed
earlier than where it was being set. We now set it to the explicit
tag `undefined` so this can be caught during debug. The symbol tag of
a decl will now also be set right after `updateDecl` and `updateFunc`.
Likewise, we now also set the `name` field during atom creation for
decls, as well as set the other fields to the max(u32) to ensure we
get a compiler crash during debug to ensure any misses will be caught.
There is no grantee that `copy_cqes` will return exactly wait_nr number of cqes.
If there are ready cqes it can return > 0 but < wait_nr number of cqes.
The low-level `Curve25519.fromEdwards25519()` function assumed
that the X/Y coordinates were not scaled (Z=1).
But this is not guaranteed to be the case.
In most real-world applications, the coordinates are freshly decoded,
either directly or via the `X25519.fromEd25519()` function, so this
is not an issue.
However, since we offer the ability to do that conversion after
arbitrary computations, the assertion was not correct.
This change allows struct field inits to use layout information
of their own struct without causing a circular dependency.
`semaStructFields` caches the ranges of the init bodies in the `StructType`
trailing data. The init bodies are then resolved by `resolveStructFieldInits`,
which is called before the inits are actually required.
Within the init bodies, the struct decl's instruction is repurposed to refer
to the field type itself. This is to allow us to easily rebuild the inst_map
mapping required for the init body instructions to refer to the field type.
Thanks to @mlugg for the guidance on this one!
The logic in 509be7cf1f assumed that
`use_llvm` meant that the LLVM backend would be used, however, use_llvm
is false when there are no zig files to compile, which is the case for
zig cc. This logic resulted in `-fsingle-threaded` which made libc++
fail to compile for C++ code that includes the threading abstractions
(such as LLVM).
Perform these transformations in this priority order:
1. If the `else` expression is missing or an empty block, replace the condition with `if (true)` if it is not already.
2. If the `then` block is empty, replace the condition with `if (false)` if it is not already.
3. If the condition is `if (true)`, replace the `if` expression with the contents of the `then` expression.
4. If the condition is `if (false)`, replace the `if` expression with the contents of the `else` expression.
Changes:
- Add `isMangledIdent` to determine if `fmtIdent` would make any edits to the identifier
- Any function that has a mangled identifier is referred to using the mangled identifer
within the current file, but if it is exported the first export will be with the non-mangled name.
- Add `zig_import` to import a symbol under a different name
- Add a level of indirection to float function names. Now, they are referred to as
`zig_float_fn_<float type>_<operation>`. The definitions in zig.h are wrapped
with `zig_import` to import the symbol under the real name.
The specific problem that sparked this change was the combination of
`zig_libc_name_f80(name) __##name##x` with the input `fma`, resulting
in `__fmax`, which is a new intrinsic in recent versions of cl.exe.
With the above changes in place, compiler_rt can output the following:
```
static zig_weak_linkage_fn zig_f80 zig_e___fmax(zig_f80, zig_f80, zig_f80);
zig_export(zig_weak_linkage_fn zig_f80 zig_e___fmax(zig_f80, zig_f80, zig_f80), __fmax, "__fmax");
```
Within compiler_rt, `zig_e___fmax` is used to refer to the function, but consumers
will import `__fmax`, which maps to their `zig_float_fn_f80_fma` definition from zig.h.
Server networking application typically accept multiple connections. Multishot
accept simplifies handling these situations. Applications submits once and
receives CQE whenever a new connection request comes in.
Multishot is active until it is canceled or experience error. While active, and
further notification are expected CQE completion will have IORING_CQE_F_MORE set
in the flags. If this flag isn't set, the application must re-arm this request
by submitting a new one.
Reference: [io_uring and networking in 2023](https://github.com/axboe/liburing/wiki/io_uring-and-networking-in-2023#multi-shot)
* don't reset the rng. it seems like it was getting stuck trying the
same transforms over and over again.
* slide the window over after failing a large transform set, idea being
that transformations in the failed set are more likely to be
problematic.
and use that to fix up usused variable declarations and parameters that
are caused by transformations.
also add a transformation to replace global variable init with
`undefined`.
* atom names - are stored locally and pulled from defining object's
strtab
* local symbols - same
* global symbols - in principle, we could store them locally, but
for better debugging experience - when things go wrong - we
store the offsets in a global strtab used by the symbol resolver
Now it works like this:
1. Walk the AST of the source file looking for independent
reductions and collecting them all into an array list.
2. Randomize the list of transformations. A future enhancement will add
priority weights to the sorting but for now they are completely
shuffled.
3. Apply a subset consisting of 1/2 of the transformations and check for
interestingness.
4. If not interesting, half the subset size again and check again.
5. Repeat until the subset size is 1, then march the transformation
index forward by 1 with each non-interesting attempt.
At any point if a subset of transformations succeeds in producing an interesting
result, restart the whole process, reparsing the AST and re-generating the list
of all possible transformations and shuffling it again.
As for std.zig.render, the fixups operate based on AST Node Index rather
than Nth index of the function occurence. This allows precise control
over how to mutate the input.
This reverts a change introduced in #17400 causing a bug when
decompressing an RLE block into a ring buffer.
RLE blocks contain only a single byte of data to copy into the output,
so attempting to copy a slice causes buffer overruns and incorrect
decompression.
This incremental compilation logic will need to be reworked so that it
does not depend on buried pointers - that is, long-lived pointers that
are owned by non-top-level objects such as Decl.
In the meantime, this fixes memory leaks since the memory management of
these dependencies has bitrotted.
This updates all linker tests to include `no_entry` as well as changes
all tests to executable so they do not need to be updated later when
the in-house WebAssembly linker supports dynamic libraries.
When linking a WebAssembly binary using wasm-ld, we will now correctly pass
--shared when building a dynamic library and --pie when `-fpic` is given.
This is a breaking change and users who currently build dynamic libraries
for WebAssembly but wish to have an executable without a main, should use
build-exe instead while supplying `-fno-entry`.
We also verify the user does not supply --export-memory when -dynamic is
enabled. By default we set this flag now to 'false' when `-dynamic` is used
to ensure we do not enable it as it's enabled by default for regular
WebAssembly binaries.
This adds support for the `-fno-entry` and `-fentry` flags respectively, for
zig build-{exe/lib} and the build system. For `zig cc` we use the `--no-entry`
flag to be compatible with clang and existing tooling.
In `start.zig` we now make the main function optional when the target is
WebAssembly, as to allow for the build-exe command in combination with
`-fno-entry`.
When the execution model is set, and is set to 'reactor', we now verify
when an entry name is given it matches what is expected. When no entry
point is given, we set it to `_initialize` by default. This means the user
will also be met with an error when they use the reactor model, but did
not provide the correct function.
Arrays are currently always passed by reference, this means that we
always keep the value in linear memory and never load it to Wasm's
stack. Scalar values however do get lowered to Wasm's stack.
This means when bitcasting from an array to a scalar value, we must
load the memory of the array as such scalar type. To bitcast
a scalar type to an array, we allocate a new temporary in the
linear data segment, and then store the scalar value there.
Use inline to vastly simplify the exposed API. This allows a
comptime-known endian parameter to be propogated, making extra functions
for a specific endianness completely unnecessary.
- Add resolveUnionAlignment, to resolve a union's alignment only, without triggering layout resolution.
- Update resolveUnionLayout to cache size, alignment, and padding. abiSizeAdvanced and abiAlignmentAdvanced
now use this information instead of computing it each time.
* autodoc: Some support for field_call
* autodoc: Change handling of field_call to respect tryResolveRefPath, add fieldVal to Expr
* autodoc: Fixed errors
* autodoc: sync with latest master changes
---------
Co-authored-by: Loris Cro <kappaloris@gmail.com>
As suggested by mlugg, always returns `error.NeedLazy`. If this has a
performance impact, it could be replaced by adding lazy handling to
`comptimeOnlyAdvanced`.
This commit starts by making Zir.Inst.Index a nonexhaustive enum rather
than a u32 alias for type safety purposes, and the rest of the changes
are needed to get everything compiling again.
This field had the wrong type. It's not a `Zir.Inst.Index`, it's
actually a `Zir.OptionalExtraIndex`. Also, the former is currently
aliased to `u32` while the latter is a nonexhaustive enum that gives us
more type checking.
This commit is preparation for making this field non-optional. Now it
can be changed to `Zir.ExtraIndex` and then the compiler will point out
all the places that the non-optional assumption is being violated.
This is consistent with `JSON.parse("-0")` in JavaScript, RFC 8259
doesn't specifically mention what to do in this case.
If a negative zero is encoded the intention is likely to preserve the
sign.
add error.EndOfStream to readEnum() and isBytes() so that users can
catch these errors. this also prevents them from panicing with
'invalid error value' on EndOfStream.
test both methods.
This adds scheme guessing when loading proxies, such that
`HTTP_PROXY=127.0.0.1` and such are valid now and it behaves identically
to `HTTP_PROXY=http://127.0.0.1`. Additionally fixed a typo that was
causing loadDefaultProxies to never populate the https_proxy.
The main motivating change here is to prevent the creation of a fake
Decl object by the frontend in order to `@export()` a value.
Instead, `link.updateDeclExports` is renamed to `link.updateExports` and
accepts a tagged union which can be either a Decl.Index or a
InternPool.Index.
The main problem being fixed here is there was a getOrPut() that held on
to a reference to the value pointer too long, and meanwhile the call to
`lowerConst` ended up being recursive and mutating the hash map,
invoking undefined behavior.
caught via #17719
* 128-bit integer multiplication with overflow
* more instruction encodings used by std inline asm
* implement the `try_ptr` air instruction
* follow correct stack frame abi
* enable full panic handler
* enable stack traces
While this is a less flexible approach to being able to
allocated the PHDR anywhere in file, it is sadly generally expected
by the tooling in the wild.
This logic is not correct in most cases. If any instruction needs to
operate with different semantics within `@TypeOf`, it should be made to
do so explicitly.
This broke a line in `std.mem`: I have opted to fix this in std for now,
since as far as I know it's not yet been discussed which operations (if
any) should be special-cased like this within `@TypeOf`.
Having simplified these functions in a previous commit, I felt inclined
to refactor their names, which were previously quite inconsistent. There
are now 4 "core" functions:
* `resolveValue` (previously `resolveMaybeUndefVal`) allows runtime-known and undef values.
* `resolveConstValue` (previously `resolveConstMaybeUndefVal`) allows undef but not runtime-known values.
* `resolveDefinedValue` (name unchanged) allows runtime-known values but not comptime-known undef.
* `resolveConstDefinedValue` (previously `resolveConstValue`) does not allow runtime-known or undef values.
You can see the inconsistencies in the old names here - sometimes we
specified "maybe undef", and sometimes we went the other way by
specifying "defined". With the new names, the most common function,
`resolveValue`, has the shortest name and does the most general thing,
and is the baseline that the other functions are adding logic to.
Some other functions were also renamed:
* `resolveMaybeUndefLazyVal` -> `resolveValueResolveLazy`
* `resolveMaybeUndefValIntable` -> `resolveValueIntable`
* `resolveMaybeUndefValAllowVariables` -> `resolveValueAllowVariables`
The main goal of this commit is to remove the `runtime_value` field from
`InternPool.Key` (and its associated representation), but there are a
few dominos. Specifically, this mostly eliminates the "maybe runtime"
concept from value resolution in Sema: so some resolution functions like
`resolveMaybeUndefValAllowVariablesMaybeRuntime` are gone. This required
a small change to struct/union/array initializers, to no longer
use `runtime_value` if a field was a `variable` - I'm not convinced this
case was even reachable, as `variable` should only ever exist as the
trivial value of a global runtime `var` decl.
Now, the only case in which a `Sema.resolveMaybeUndefVal`-esque function
can return the `variable` key is `resolveMaybeUndefValAllowVariables`,
which is directly called from `Sema.resolveInstValueAllowVariables`
(previously `Sema.resolveInstValue`), which is only used for resolving
the value of a Decl from `Module.semaDecl`.
While changing these functions, I also slightly reordered and
restructured some of them, and updated their doc comments.
When the code is written this way, you get a compile error if the
pointer given to Tracy does not have a static lifetime.
This would have caught the regression in #13315.
In general, let's not lean on GitHub issue numbers as having meaning.
The goal of behavior tests is to produce a minimum set of tests that
test 100% of the language.
This feature was made to work with the legacy incremental compilation
mechanism which is being reworked.
This commit regresses the ability to update files used with `@embedFile`
while the compiler is running.
In exchange, we get these benefits:
* The embedded file contents are copied directly into InternPool rather
than there being an extra allocation and memcpy.
* The EmbedFile struct, which represents a long-lived object, is made
more serialization friendly.
* Eliminate the creation and use of a Decl as an anonymous decl.
When implementing the new incremental compilation mechanism,
functionality will need to be added back for handling `@embedFile`.
In most cases "GLIBC_2.X" strings and `/lib/libc-2.x.so` files do not contain third (`patch`) field,
which causes std.SemanticVersion.parse function to return error. To fix this, we
reuse [now-public] std.zig.CrossTarget.parseVersion function,
which accounts for this third field and makes it 0 in case it was not found.
This new behaviour is similar to std.builtin.Version.parse, which was removed in
6e84f46990
Fixes regression from 6e84f46990
and https://github.com/ziglang/zig/pull/13998 .
Related: https://github.com/ziglang/zig/issues/17626 . Results with `zig end`:
Before: `"target": "x86_64-linux.6.5.7...6.5.7-gnu.2.19",`
After: `"target": "x86_64-linux.6.5.7...6.5.7-gnu.2.36",`
Also, while we are here, write explicit error sets and remove duplicate
logic from std.zig.system.darwin.macos.parseSystemVersion .
Signed-off-by: Eric Joldasov <bratishkaerik@getgoogleoff.me>
This reverts commit 0c99ba1eab, reversing
changes made to 5f92b070bf.
This caused a CI failure when it landed in master branch due to a
128-bit `@byteSwap` in std.mem.
Justification: exec, execv etc are unix concepts and portable version
should be called differently.
Do no touch non-Zig code. Adjust error names as well, if associated.
Closes#5853.
std_options.http_connection_pool_size removed in favor of
```
client.connection_pool.resize(client.allocator, size);
```
std_options.http_disable_tls will remove all https capability from
std.http when true. Any https request will error with
`error.TlsInitializationFailed`.
Solves #17051.
adds connectTunnel to form a HTTP CONNECT tunnel to the desired host.
Primarily implemented for proxies, but like connectUnix may be called by
any user.
adds loadDefaultProxies to load proxy information from common
environment variables (http_proxy, HTTP_PROXY, https_proxy, HTTPS_PROXY,
all_proxy, ALL_PROXY).
- no_proxy and NO_PROXY are currently unsupported.
splits proxy into http_proxy and https_proxy, adds headers field for
arbitrary headers to each proxy.
Commit 5393e56500d499753dbc39704c0161b47d1e4d5c has a flaw pointed out
by @mlugg: the `ty` field of pointer values changes when comptime values
are pointer-casted. This commit introduces a new encoding which
additionally stores the "original pointer type" which is used to store
the alignment of the anonymous decl, and potentially other information
in the future such as section and pointer address space. However, this
new encoding is only used when the original pointer type differs from
the casted pointer type in a meaningful way.
I was able to make the LLVM backend and the C backend lower anonymous
decls with the appropriate alignment, however I will need some help
figuring out how to do this for the backends that lower anonymous decls
via src/codegen.zig and the wasm backend.
Instead of creating Module.Decl objects, directly create InternPool
pointer values using the anon_decl Addr encoding.
The LLVM backend needed code to notice the alignment of the pointer and
lower accordingly. The other backends likely need a similar change.
* Add missing period in Stack's description
This looks fine in the source, but looks bad when seen on the documentation website.
* Correct documentation for attachSegfaultHandler()
The description for attachSegfaultHandler() looks pretty bad without indicating that the stuff at the end is code
* Added missing 'the's in Queue.put's documentation
* Fixed several errors in Stack's documentation
`push()` and `pop()` were not styled as code
There was no period after `pop()`, which looks bad on the documentation.
* Fix multiple problems in base64.zig
Both "invalid"s in Base64.decoder were not capitalized.
Missing period in documentation of Base64DecoderWithIgnore.calcSizeUpperBound.
* Fix capitalization typos in bit_set.zig
In DynamicBitSetUnmanaged.deinit's and DynamicBitSet.deinit's documentation, "deinitializes" was uncapitalized.
* Fix typos in fifo.zig's documentation
Added a previously missing period to the end of the first line of LinearFifo.writableSlice's documentation.
Added missing periods to both lines of LinearFifo.pump's documentation.
* Fix typos in fmt.bufPrint's documentation
The starts of both lines were not capitalized.
* Fix minor documentation problems in fs/file.zig
Missing periods in documentation for Permissions.setReadOnly, PermissionsWindows.setReadOnly, MetadataUnix.created, MetadataLinux.created, and MetadataWindows.created.
* Fix a glaring typo in enums.zig
* Correct errors in fs.zig
* Fixed documentation problems in hash_map.zig
The added empty line in verify_context's documentation is needed, otherwise autodoc for some reason assumes that the list hasn't been terminated and continues reading off the rest of the documentation as if it were part of the second list item.
* Added lines between consecutive URLs in http.zig
Makes the documentation conform closer to what was intended.
* Fix wrongfully ended sentence in Uri.zig
* Handle wrongly entered comma in valgrind.zig.
* Add missing periods in wasm.zig's documentation
* Fix odd spacing in event/loop.zig
* Add missing period in http/Headers.zig
* Added missing period in io/limited_reader.zig
This isn't in the documentation due to what I guess is a limitation of autodoc, but it's clearly supposed to be. If it was, it would look pretty bad.
* Correct documentation in math/big/int.zig
* Correct formatting in math/big/rational.zig
* Create an actual link to ZIGNOR's paper.
* Fixed grammatical issues in sort/block.zig
This will not show up in the documentation currently.
* Fix typo in hash_map.zig
This completes the migration from spv.ptrType to self.ptrType.
Unfortunately this requires us to pass a list of types to
constructStruct, which also requires some extra allocations
here and there.
This removes the strategy where union with different active
fields would be generated, and instead simply pointer casts
the active field type where required. This also allows removing
spv.ptrType and using self.ptrType instead, and allows caching
all union types (because there is only the canonical one).
This struct is used to configure the load, such as to make
it volatile. Previously this was done using a single bool, but
this struct makes it shorter to write non-volatile loads (the
usual) and more clear whats going on when a volatile load is
required.
To support self-referential pointers, in the future we will
need to pass the Zig type to any pointer that is created. This
lays some ground work for that by replacing most uses of
spv.ptrType with a new ptrType function that also accepts the
Zig type. This function's contents will soon be replaced by
a version that also supports self-referential pointers.
Also fixed some bugs regarding the use of direct/indirect.
RSA exponents are typically 3 or 65537, and public.
For those, we don't need to use conditional moves on the exponent,
and precomputing a lookup table is not worth it. So, save a few
cpu cycles and some memory for that common case.
For safety, make `powWithEncodedExponent()` constant-time by default,
and introduce a `powWithEncodedPublicExponent()` function for exponents
that are assumed to be public.
With `powWithEncodedPublicExponent()`, short (<= 36 bits) exponents
will take the fast path.
Previous commit caused this error to be printed:
zig build-exe zig Debug aarch64-macos-none: error: memory usage peaked
at 6323765248 bytes, exceeding the declared upper bound of 5200000000
I observed some out-of-memory conditions happening on one of the CI
servers. This should make it avoid scheduling compiler builds at the
same time as other memory-intensive operations.
Currently, Zig uses the oldest Debian release that is still under LTS
for the default version minimum. This is now Debian 10 (Buster), with
long-term support until 2024-06-30.
Debian Buster uses Linux 4.19 and glibc 2.28.
For the default version maximum, Zig uses the newest stable Linux
version, which is currently 6.5.7.
Citations:
* https://www.debian.org/News/2019/20190706
* https://packages.debian.org/source/buster/glibc
* https://kernel.org/
Previous update commit: 1530203c80
* support 0x prefixed hex code for CLI seed arguments
* don't change the build summary; the printed CLI on build runner
failure is sufficient
* use `std.crypto.random` instead of system time for entropy
This reverts commit d7b73af8f6.
I did not look at this closely enough. This is incorrect; it should not
implicitly add rpaths for every library, and it should not disable the
nice default of each_lib_path when compiling for the native OS.
See #16062 where we are working on a follow-up improvement to this.
The INCLUDE variable being used during `.rc` preprocessing was an accidental regression in https://github.com/ziglang/zig/pull/17412.
Closes#17585.
resinator changes:
source_mapping: Protect against NUL bytes in #line filenames
lex: Avoid recalculating column on every tab stop within string literals
Proper error handling for failing to open cwd instead of `catch unreachable`
Use platform-specific delimiter for INCLUDE env var parsing
Before this commit, the logic would fail with a "LTO not available"
error if the user set `-fno-lto` which doesn't make sense. This commit
corrects the logic to understand when the user is explicitly requesting
to turn LTO off.
This script is sometimes timing out, so let's test fewer things to save
time. The x86_64-windows-release script does include test coverage for
release builds of the standard library, behavior tests, etc.
Introduce introspect.EnvVar which tracks all the environment variables
that are observed by the compiler, so that we can print them with `zig
env`.
The `zig env` command now prints both the resolved values as well as all
the possibly observed environment variables.
The current json output is not very convenient to use from the shell or
a Zig program.
An example using the shell: `zig env | jq -r .lib_dir`.
Remove the json output, and instead use the POSIX shell syntax:
name="value"
Additionally, when args is not empty, assume each argument is the
environment variable name and print the associated value.
Unrecognized environment variables are ignored.
The new output format has been copied from `go env`, with the difference
that `go env` uses OS specific syntax for windows and plan9.
Define the environment variables in a single place, in order to avoid
possible bugs.
XXH3 is the faster alternative to XXH64 which utilizes SIMD
when hashing large chunks of memory and similar mixing to
WyHash (64x64 -> 128 mul) for smaller inputs.
Co-authored-by: Reixcon226 <87927264+Rexicon226@users.noreply.github.com>
---------
Co-authored-by: kprotty <kbutcher6200@gmail.com>
This argument causes zig to print verbose hashing information to stdout,
which can be used to diff two different fetches and find out why the
hashes do not equal each other.
The build was previously failing with `error: unknown command: -print-file-name=libstdc++.a`
because the command invocation was
`zig -print-file-name=libstdc++.a`
instead of
`zig c++ -print-file-name=libstdc++.a`
note: .cxx_compiler_arg1 = "" instead of undefined to avoid breaking existing setups without requiring to run cmake again.
Turns out order matters as otherwise we face unexplainable
segfaults to do with improper page mapping in static environments
(dynamic environments seem unaffected).
Currently this ignores ZigModule, however, I believe we can make it
so that this is done excluding sections/segments emitted by ZigModule
and everything should work out just fine.
An embedded manifest file is really just XML data embedded as a RT_MANIFEST resource (ID = 24). Typically, the Windows-only 'Manifest Tool' (`mt.exe`) is used to embed manifest files, and `mt.exe` also seems to perform some transformation of the manifest data before embedding, but in testing it doesn't seem like the transformations are necessary to get the intended result.
So, to handle embedding manifest files, Zig now takes the following approach:
- Generate a .rc file with the contents `1 24 "path-to-manifest.manifest"`
- Compile that generated .rc file into a .res file
- Link the .res file into the final binary
This effectively achieves the same thing as `mt.exe` minus the validation/transformations of the XML data that it performs.
How this is used:
On the command line:
```
zig build-exe main.zig main.manifest
```
(on the command line, specifying a .manifest file when the target object format is not COFF is an error)
or in build.zig:
```
const exe = b.addExecutable(.{
.name = "manifest-test",
.root_source_file = .{ .path = "main.zig" },
.target = target,
.optimize = optimize,
.win32_manifest = .{ .path = "main.manifest" },
});
```
(in build.zig, the manifest file is ignored if the target object format is not COFF)
Note: Currently, only one manifest file can be specified per compilation. This is because the ID of the manifest resource is currently always 1. Specifying multiple manifests could be supported if a way for the user to specify an ID for each manifest is added (manifest IDs must be a u16).
Closes#17406
options
The Khronos SPIRV-LLVM translator does not parse OpSource correctly. This
was causing tests to fail and other mysterious issues.
These are resolved by only generating a single OpSource instruction for now,
which does not have the source file locations also.
See https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/2188
The min and max builtins in Zig have some intricate behavior
related to floats, that is not replicated with the min and max
wasm instructions or using simple select operations. By lowering
these instructions to compiler_rt, handling around NaNs is done
correctly.
See also https://github.com/WebAssembly/design/issues/214
These are tripping on 32-bit x86 but are intended to prevent glibc
itself from being built with a bad configuration. Zig is only using this
file to create libc_nonshared.a, so it's not relevant.
This is the only place in all of glibc that this macro is referenced.
What is it doing? Only preventing fstatat.c from knowing the type
definition of `__time64_t`, apparently.
Fixes compilation of fstatat.c on 32-bit x86.
I could have just included the file from upstream glibc, but it was too
silly so I just inlined it. This patch could be dropped in a future
glibc update if desired. If omitted it will cause easily solvable
C compilation failures building glibc nonshared.
- `fcntl` was renamed to `fcntl64` in glibc 2.28 (see #9485)
- `res_{,n}{search,query,querydomain}` became "their own" symbols since
glibc 2.34: they were prefixed with `__` before.
This PR makes it possible to use `fcntl` with glibc 2.27 or older and
the `res_*` functions with glibc 2.33 or older.
These patches will become redundant with universal-headers and can be
dropped. But we have to do with what we have now.
This is a patch to glibc features.h which makes
_DYNAMIC_STACK_SIZE_SOURCE undefined unless the version is >= 2.34.
This feature was introduced with glibc 2.34 and without this patch, code
built against these headers but then run on an older glibc will end up
making a call to sysconf() that returns -1 for the value of SIGSTKSZ
and MINSIGSTKSZ.
The previous code incorrectly added `sub_path` twice.
Also for the compilation unit, it was passing empty string to realpath,
resulting in the error handling codepath being used.
closes#17482
When analyzing the `validate_ref_ty` ZIR instruction, an assertion would
trip if the result type was a var args function argument. The fix is the
same as e6b73be870 - inline the logic of
`resolveType` and handle the case of var args.
Closes#17494
When analyzing the `as` ZIR instruction, an assertion would trip if the
result type was a var args function argument. The fix is simple: inline
a little bit of the `resolveType` logic into `analyzeAs` to make it
detect this situation - which it was already attempting to do.
Closes#16197
SPARCs have delayed branches, that is, it will unconditionally
run the next instruction following a branch.
Slightly reorder the _start code sequence to prevent it from
accidentally executing stray instructions, which may result in odd
program behavior.
This makes tmp_directory close before calling renameTmpIntoCache which fixes occurences of those AccessDenied errors that aren't synonymous with PathAlreadyExists on Windows
The following pairs of functions have been combined using the "advanced"
pattern used for other type queries:
* `Sema.fnHasRuntimeBits`, `Type.isFnOrHasRuntimeBits`
* `Sema.typeRequiresComptime`, `Type.comptimeOnly`
When not using libllvm, it means the compiler is not capable of
producing an object file or executable, making the self-hosted backends
be a better default.
`ContainerDeclarations` is an abstraction of `ContainerDeclaration*`.
Removing this abstraction allows the `ContainerMembers` rule to contain
more concrete information without having to look at the definition
of `ContainerDeclarations`.
When `std.mem.indexOf` is called with a single-item needle, use `indexOfScalarPos` which is significantly faster than the more general `indexOfPosLinear`. This can be done without introducing overhead to normal cases (where `needle.len > 1`).
* fs/test.zig: use arena allocator more consistently
* fs/test.zig: remove unnecessary type information
Zig can (now?) implicitly cast a `&.{ "foo"}` when passed to
`fs.path.join()`, so the `[_][]const u8` is unnecessary.
* fs/test.zig: Use fs.path.join() for longer paths
Replace long path constructions (that use several "++ path_sep ++")
with a single call to `fs.path.join`. Seems more readable to me.
* fs/test.zig: fmt
Instead of every file deletion being followed by a recursive attempt to
remove the parent directory, while walking the directory, every
directory that will have nonzero files deleted from it is tracked in an
array hash map.
After all the threads have finished deleting and hashing, the parent
thread sorts the "suspicious" directories by length, descending,
ensuring that children appear before parents, and then iterates over the
array hash map, attempting a rmdir operation on each. Any rmdir that
succeeds appends the parent directory to the map so that it will be
removed if empty.
This prevents bogus "error: file exists in multiple modules" errors due
to file paths looking like:
```
note: root of module foo/freetype/
note: root of module foo/fontconfig/../freetype/
```
It also enables checking for dependency paths outside the root package.
* add Module instances for each package's build.zig and attach it to the
dependencies.zig module with the hash digest hex string as the name.
* fix incorrectly skipping the wrong packages for creating
dependencies.zig
* a couple more renaming of "package" to "module"
Only problem is that it looks like `has_build_zig` is being false when
it should be true.
After that is fixed then main.zig needs to create the `@dependencies`
module from the generated source code.
Finish the work started in 4c4fb839972f66f55aa44fc0aca5f80b0608c731.
Now the compiler compiles again.
Wire up dependency tree fetching code in the CLI for `zig build`.
Everything is hooked up except for `createDependenciesModule` is not yet
implemented.
* start renaming "package" to "module" (see #14307)
- build system gains `main_mod_path` and `main_pkg_path` is still
there but it is deprecated.
* eliminate the object-oriented memory management style of what was
previously `*Package`. Now it is `*Package.Module` and all pointers
point to externally managed memory.
* fixes to get the new Fetch.zig code working. The previous commit was
work-in-progress. There are still two commented out code paths, the
one that leads to `Compilation.create` and the one for `zig build`
that fetches the entire dependency tree and creates the required
modules for the build runner.
Organize everything around a Fetch task which does a bunch of stuff in a
worker thread without touching any shared state, and then queues up
Fetch tasks for its dependencies.
This isn't the theoretical optimal package fetching performance because
CPU cores don't necessarily map 1:1 with I/O tasks, and each fetch task
contains a mixture of computations and I/O. However, it is expected for
this to significantly outperform master branch, which fetches everything
recursively with only one thread.
The logic is now a lot more linear and easy to follow. Everything that
is embarassingly parallel is done on the thread pool, and then after
everything is fetched, the worker threads are joined and the main thread
does the finishing touches of stitching together the dependencies.zig
import files. There is only one tiny little critical section and it does
not even have any error handling in it.
This also lays the groundwork for #14281 because in system mode, all
this fetching logic will be skipped, but the "finishing touches"
mentioned above still need to be done. With this branch, that logic is
separated out and no longer recursively tangled with fetching stuff.
Additionally, this branch:
* Implements inclusion directives in `build.zig.zon` for deciding which
files belong the package (#14311).
* Adds basic documentation for `build.zig.zon` files.
* Adds support for fetching dependencies with the `file://` protocol
scheme (#17364).
* Adds a workaround for a Linux/btrfs file system bug (#17282).
This commit is a work-in-progress. Still todo:
1. Hook up the CLI to the new system.
2. Restore the module table creation logic after all the fetching is
done.
3. Fix compilation errors, get the tests passing, and regression test
against real world projects.
for wasm, as a heuritic, only enable truncation for values smaller than 32bits.
-> the bug is no longer triggered in most use cases (or at least the test suite...)
as for powerpc, adding a redundant `and mask` produces working code.
This hashmap value used to be assigned much later during `flushModule`,
which never happens if an error is returned by the backend, and
attempting to the undefined values would cause a crash.
Originally inspired by Go's `utf8.Valid` function. Includes some test cases from Go's test suite.
Further optimized to be faster in all tested cases (short/long ascii/UTF8), in all release modes.
Takes advantage of SIMD for the ASCII fast path.
`nextAfter()` returns the next representable value after `x` in the direction of `y` and is a standard math library function ([C++](https://en.cppreference.com/w/cpp/numeric/math/nextafter), [Java](https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html#nextAfter-double-double-)). It is primarily useful for bitwise incrementing/decrementing floats.
This implementation supports runtime integers, runtime floats and `comptime_int`. `comptime_float` is not supported because NaNs/infinities are intentionally difficult to obtain and because I'm not sure if the fact that it's backed by `f128` is supposed to be an implementation detail. Either way, the user could just call the function with the floating-point type whose behavior they want at comptime and then cast the result to `comptime_float`.
The float implementation was ported from mingw-w64 with some slight changes made possible because the Zig standard library doesn't care about raising FP exceptions.
The number of test cases may seem excessive but they should cover every normal and edge case for every float type and are especially important for verifying that `f80` works.
Until now, we would pass `candidate: NativeTargetInfo` which creates
a copy of the `NativeTargetInfo.DynamicLinker` buffer. We would then
return this buffer in `bad_dl: []const u8` which would goes out-of-scope
the moment we leave this function frame yielding garbage. To fix this,
we just need to remember to pass by const-pointer
`candidate: *const NativeTargetInfo`.
Follow up to #17383. This is a minor optimization that only matters when a small allocation is resized/free'd soon after it is allocated.
The only real difference I was able to observe with this was via a synthetic benchmark that allocates a full bucket and then frees all but one of the slots, over and over in a loop:
Debug build:
Benchmark 1 (9 runs): gpa-degen-master.exe
measurement mean ± σ min … max outliers delta
wall_time 575ms ± 5.19ms 569ms … 583ms 0 ( 0%) 0%
peak_rss 43.8MB ± 1.37KB 43.8MB … 43.8MB 1 (11%) 0%
Benchmark 2 (10 runs): gpa-degen-search-cur.exe
measurement mean ± σ min … max outliers delta
wall_time 532ms ± 5.55ms 520ms … 539ms 0 ( 0%) ⚡- 7.5% ± 0.9%
peak_rss 43.8MB ± 65.2KB 43.8MB … 44.0MB 1 (10%) + 0.0% ± 0.1%
ReleaseFast build:
Benchmark 1 (129 runs): gpa-degen-master-release.exe
measurement mean ± σ min … max outliers delta
wall_time 38.9ms ± 1.12ms 36.7ms … 42.4ms 8 ( 6%) 0%
peak_rss 23.2MB ± 2.39KB 23.2MB … 23.2MB 0 ( 0%) 0%
Benchmark 2 (151 runs): gpa-degen-search-cur-release.exe
measurement mean ± σ min … max outliers delta
wall_time 33.2ms ± 999us 31.9ms … 36.3ms 20 (13%) ⚡- 14.7% ± 0.6%
peak_rss 23.2MB ± 2.26KB 23.2MB … 23.2MB 0 ( 0%) + 0.0% ± 0.0%
Notably, this contains bug fixes related to `@errorCast` which are
required by the changes to `std.io.Reader` in this branch, and the
compiler source code has a dependency on `std.io.Reader`.
The idea here is to avoid code bloat by having only one actual io.Reader
implementation, which is type erased, and then implement a GenericReader
that preserves type information on top of that as thin glue code.
The strategy here is for that glue code to `@errSetCast` the result of
the type-erased reader functions, however, while trying to do that I
ran into #17343.
Implement the stub for Elf.
I believe that separating the concerns, namely, having an interface
function that is responsible for signalling the linker to lower
the anon decl only, and a separate function to obtain the decl's
vaddr is preferable since it allows us to handle codegen errors
in a simpler way.
Introduce the new mechanism needed to render anonymous decls to C code
that the frontend is now using.
The current strategy is to collect the set of used anonymous decls into
one ArrayHashMap for the entire compilation, and then render them during
flush().
In the future this may need to be adjusted for incremental compilation
purposes, so that removing a Decl from decl_table means that newly
unused anonymous decls are no longer rendered. However, let's do one
thing at a time. The only goal of this branch is to stop using
Module.Decl objects for unnamed constants.
Start keeping track of dependencies on anon decls for dependency
ordering during flush()
Currently this causes use of undefined symbols because these
dependencies need to get rendered into the output.
Instead of explicitly creating a `Module.Decl` object for each anonymous
declaration, each `InternPool.Index` value is implicitly understood to
be an anonymous declaration when encountered by backend codegen.
The memory management strategy for these anonymous decls then becomes to
garbage collect them along with standard InternPool garbage.
In the interest of a smooth transition, this commit only implements this
new scheme for string literals and leaves all the previous mechanisms in
place.
The `makeMem*()` functions crashed under valgrind in Debug and
ReleaseSafe modes.
The reason being that `doMemCheckClientRequestExpr()` returns `0`
when not running under Valgrind, and `maxInt(usize)` when running
under Valgrind.
Thus, `@as(i1, @intCast(maxInt(usize)))` always fails and these
functions crashed before returning.
That being said, what these functions used to return was quite
unexpected: `0` on error and `-1` on success (=running under valgrind).
That doesn't match any Zig nor C conventions.
But that return value doesn't seem to be very useful. Either we are
running under Valgrind or we are not. There's no point in checking this
for every single call. Applications are likely to always discard it.
So, just return a `void` instead.
Also avoid function comments that start with `Similarly, ...` because
that doesn't refer to anything in the context of autodoc or in IDEs.
Before this commit, GeneralPurposeAllocator could run into incredibly degraded performance in scenarios where the bucket count for a particular size class grew to be large. For example, if exactly `slot_count` allocations of a single size class were performed and then all of them were freed except one, then the bucket for those allocations would have to be kept around indefinitely. If that pattern of allocation were done over and over, then the bucket list for that size class could grow incredibly large.
This allocation pattern has been seen in the wild: https://github.com/Vexu/arocc/issues/508#issuecomment-1738275688
In that case, the length of the bucket list for the `128` size class would grow to tens of thousands of buckets and cause Debug runtime to balloon to ~8 minutes whereas with the c_allocator the Debug runtime would be ~3 seconds.
To address this, there are three different changes happening here:
1. std.Treap is used instead of a doubly linked list for the lists of buckets. This takes the time complexity of searchBucket [used in resize and free] from O(n) to O(log n), but increases the time complexity of insert from O(1) to O(log n) [before, all new buckets would get added to the head of the list]. Note: Any data structure with O(log n) or better search/insert/delete would also work for this use-case.
2. If the 'current' bucket for a size class is full, the list of buckets is never traversed and instead a new bucket is allocated. Previously, traversing the bucket list could only find a non-full bucket in specific circumstances, and only because of a separate optimization that is no longer needed (before, after any resize/free, the affected bucket would be moved to the head of the bucket list to allow searchBucket to perform better on average). Now, the current_bucket for each size class only changes when either (1) the current bucket is emptied/freed, or (2) a new bucket is allocated (due to the current bucket being full or null). Because each bucket's alloc_cursor only moves forward (i.e. slots within a bucket are never re-used), we can therefore always know that any bucket besides the current_bucket will be full, so traversing the list in the hopes of finding an existing non-full bucket is entirely pointless.
3. Size + alignment information for small allocations has been moved into the Bucket data instead of keeping it in a separate HashMap. This offers an improvement over the HashMap since whenever we need to get/modify the length/alignment of an allocation it's extremely likely we will already have calculated any bucket-related information necessary to get the data.
The first change is the most relevant and accounts for most of the benefit here. Also note that the overall functionality of GeneralPurposeAllocator is unchanged.
In the degraded `arocc` case, these changes bring Debug performance from ~8 minutes to ~20 seconds.
Benchmark 1: test-master.bat
Time (mean ± σ): 481.263 s ± 5.440 s [User: 479.159 s, System: 1.937 s]
Range (min … max): 477.416 s … 485.109 s 2 runs
Benchmark 2: test-optim-treap.bat
Time (mean ± σ): 19.639 s ± 0.037 s [User: 18.183 s, System: 1.452 s]
Range (min … max): 19.613 s … 19.665 s 2 runs
Summary
'test-optim-treap.bat' ran
24.51 ± 0.28 times faster than 'test-master.bat'
Note: Much of the time taken on Windows in this particular case is related to gathering stack traces. With `.stack_trace_frames = 0` the runtime goes down to 6.7 seconds, which is a little more than 2.5x slower compared to when the c_allocator is used.
These changes may or mat not introduce a slight performance regression in the average case:
Here's the standard library tests on Windows in Debug mode:
Benchmark 1 (10 runs): std-tests-master.exe
measurement mean ± σ min … max outliers delta
wall_time 16.0s ± 30.8ms 15.9s … 16.1s 1 (10%) 0%
peak_rss 42.8MB ± 8.24KB 42.8MB … 42.8MB 0 ( 0%) 0%
Benchmark 2 (10 runs): std-tests-optim-treap.exe
measurement mean ± σ min … max outliers delta
wall_time 16.2s ± 37.6ms 16.1s … 16.3s 0 ( 0%) 💩+ 1.3% ± 0.2%
peak_rss 42.8MB ± 5.18KB 42.8MB … 42.8MB 0 ( 0%) + 0.1% ± 0.0%
And on Linux:
Benchmark 1: ./test-master
Time (mean ± σ): 16.091 s ± 0.088 s [User: 15.856 s, System: 0.453 s]
Range (min … max): 15.870 s … 16.166 s 10 runs
Benchmark 2: ./test-optim-treap
Time (mean ± σ): 16.028 s ± 0.325 s [User: 15.755 s, System: 0.492 s]
Range (min … max): 15.735 s … 16.709 s 10 runs
Summary
'./test-optim-treap' ran
1.00 ± 0.02 times faster than './test-master'
* add nested packed struct/union behavior tests
* use ptr_info.packed_offset rather than trying to duplicate the logic from Sema.structFieldPtrByIndex()
* use the container_ptr_info.packed_offset to account for non-aligned nested structs.
* dedup type.packedStructFieldBitOffset() and module.structPackedFieldBitOffset()
zig fetch [options] <url>
zig fetch [options] <path>
Fetches a package which is found at <url> or <path> into the global
cache directory, printing the package hash to stdout.
Closes#16972
Related to #14280
Additionally, this commit:
* Adds uncompressed .tar support to package fetching
* Introduces symlink support to package fetching
The 5.11 in uname is not something that is ever updated. There is no
versioning of the illumos system in general. Illumos prefers to rely
on feature detection.
I can't say what Solaris does these days as I do not work at Oracle;
so I left it alone.
- Adds `illumos` to the `Target.Os.Tag` enum. A new function,
`isSolarish` has been added that returns true if the tag is either
Solaris or Illumos. This matches the naming convention found in Rust's
`libc` crate[1].
- Add the tag wherever `.solaris` is being checked against.
- Check for the C pre-processor macro `__illumos__` in CMake to set the
proper target tuple. Illumos distros patch their compilers to have
this in the "built-in" set (verified with `echo | cc -dM -E -`).
Alternatively you could check the output of `uname -o`.
Right now, both Solaris and Illumos import from `c/solaris.zig`. In the
future it may be worth putting the shared ABI bits in a base file, and
mixing that in with specific `c/solaris.zig`/`c/illumos.zig` files.
[1]: 6e02a329a2/src/unix/solarish
My previous change for reading / writing to unions at comptime did not handle
union field read/writes correctly in all cases. Previously, if a field was
written to a union, it would overwrite the entire value. This is problematic
when a field of a larger size is subsequently read, because the value would not
be long enough, causing a panic.
Additionally, the writing behaviour itself was incorrect. Writing to a field of
a packed or extern union should only overwrite the bits corresponding to that
field, allowing for memory reintepretation via field writes / reads.
I addressed these problems as follows:
Add the concept of a "backing type" for extern / packed unions
(`Type.unionBackingType`). For extern unions, this is a `u8` array, for packed
unions it's an integer matching the `bitSize` of the union. Whenever union
memory is read at comptime, it's read as this type.
When union memory is written at comptime, the tag may still be known. If so, the
memory is written using the tagged type. If the tag is unknown (because this
union had previously been read from memory), it's simply written back out as the
backing type.
I added `write_packed` to the `reinterpret` field of
`ComptimePtrMutationKit`. This causes writes of the operand to be packed - which
is necessary when writing to a field of a packed union. Without this, writing a
value to a `u1` field would overwrite the entire byte it occupied.
The final case to address was reading a different (potentially larger) field
from a union when it was written with a known tag. To handle this, a new kind of
bitcast was introduced (`bitCastUnionFieldVal`) which supports reading a larger
field by using a backing buffer that has the unwritten bits set to
undefined. The reason to support this (vs always just writing the union as it's
backing type), is that no reads to larger fields ever occur at comptime, it
would be strictly worse to have spent time writing the full backing type.
Closes#14298
This commit adds support for fetching dependencies over git+http(s)
using a minimal implementation of the Git protocols and formats relevant
to fetching repository data.
Git URLs can be specified in `build.zig.zon` as follows:
```zig
.xml = .{
.url = "git+https://github.com/ianprime0509/zig-xml#7380d59d50f1cd8460fd748b5f6f179306679e2f",
.hash = "122085c1e4045fa9cb69632ff771c56acdb6760f34ca5177e80f70b0b92cd80da3e9",
},
```
The fragment part of the URL may specify a commit ID (SHA1 hash), branch
name, or tag. It is an error to omit the fragment: if this happens, the
compiler will prompt the user to add it, using the commit ID of the HEAD
commit of the repository (that is, the latest commit of the default
branch):
```
Fetch Packages... xml... /var/home/ian/src/zig-gobject/build.zig.zon:6:20: error: url field is missing an explicit ref
.url = "git+https://github.com/ianprime0509/zig-xml",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: try .url = "git+https://github.com/ianprime0509/zig-xml#dfdc044f3271641c7d428dc8ec8cd46423d8b8b6",
```
This implementation currently supports only version 2 of Git's wire
protocol (documented in
[protocol-v2](https://git-scm.com/docs/protocol-v2)), which was first
introduced in Git 2.19 (2018) and made the default in 2.26 (2020).
The wire protocol behaves similarly when used over other transports,
such as SSH and the "Git protocol" (git:// URLs), so it should be
reasonably straightforward to support fetching dependencies from such
URLs if the necessary transports are implemented (e.g. #14295).
(Unmanaged)ArrayList.insert has the same inefficiency as the old insertSlice. With the new addManyAt, the solution is trivial.
Also improves the test "growing memory preserves contents". In the previous implementation, if any changes were made to the ArrayList memory growth policy (function growMemory), the list could end up with enough capacity to not trigger a memory growth, defeating the purpose of the test. The new implementation more robustly triggers a memory growth.
These are an order of magnitude quicker than the previous
implementations:
A relative comparison of each, measuring scanning a 1G file.
Reading 1G (1.0000000009313226GiB)
std.mem.sliceTo: 281.232ms
vectorized.sliceTo: 24.769ms
strlen: 24.291ms
std.indexOfScalar: 229.016ms
vectorized.indexOfScalar: 24.685ms
memchr: 24.958ms
* Move `computeBetterCapacity` to the bottom so that `pub` stuff shows
up first.
* Rename `computeBetterCapacity` to `growCapacity`. Every function
implicitly computes something; that word is always redundant in a
function name. "better" is vague. Better in what way? Instead we
describe what is actually happening. "grow".
* Improve doc comments to be very explicit about when element pointers
are invalidated or not.
* Rename `addManyAtIndex` to `addManyAt`. The parameter is named
`index`; that is enough.
* Extract some duplicated code into `addManyAtAssumeCapacity` and make
it `pub`.
* Since I audited every line of code for correctness, I changed the
style to my personal preference.
* Avoid a redundant `@memset` to `undefined` - memory allocation does
that already.
* Fixed comment giving the wrong reason for not calling
`ensureTotalCapacity`.
Includes a more robust implementation of replaceRange, which updates the
ArrayListUnmanaged if state changes in the managed part of the code
before returning an error.
Co-authored-by: Andrew Kelley <andrew@ziglang.org>
This commit makes the following changes:
* Disallow file:/// URIs
* Allow only relative paths in the .path field of build.zig.zon
* Remote now-unneeded shlwapi dependency
This commit introduces `--debug-incremental` so that we can start
playing around with incremental compilation while it is still being
developed, and before it is enabled by default.
Currently it saves InternPool data, and has TODO comments for the
remaining things. Deserialization is not implemented yet, which will
require some post-processing such as to build a string map out of
null-terminated string table bytes.
The saved compiler state is stored in a file called <root-name>.zcs
alongside <root-name>.o, <root-name>.pdb, <root-name>.exe, etc. In case
of using the zig build system, these files are all in a zig-cache
directory.
For the self-hosted compiler, here is one data point on the performance
penalty of saving this data:
```
Benchmark 1 (3 runs): zig build-exe ...
measurement mean ± σ min … max outliers delta
wall_time 51.1s ± 354ms 50.7s … 51.4s 0 ( 0%) 0%
peak_rss 3.91GB ± 354KB 3.91GB … 3.91GB 0 ( 0%) 0%
cpu_cycles 212G ± 3.17G 210G … 216G 0 ( 0%) 0%
instructions 274G ± 57.5M 274G … 275G 0 ( 0%) 0%
cache_references 13.1G ± 97.6M 13.0G … 13.2G 0 ( 0%) 0%
cache_misses 1.12G ± 24.6M 1.10G … 1.15G 0 ( 0%) 0%
branch_misses 1.53G ± 1.46M 1.53G … 1.53G 0 ( 0%) 0%
Benchmark 2 (3 runs): zig build-exe ... --debug-incremental
measurement mean ± σ min … max outliers delta
wall_time 51.8s ± 271ms 51.5s … 52.1s 0 ( 0%) + 1.3% ± 1.4%
peak_rss 3.91GB ± 317KB 3.91GB … 3.91GB 0 ( 0%) - 0.0% ± 0.0%
cpu_cycles 213G ± 398M 212G … 213G 0 ( 0%) + 0.3% ± 2.4%
instructions 275G ± 79.1M 275G … 275G 0 ( 0%) + 0.1% ± 0.1%
cache_references 13.1G ± 26.9M 13.0G … 13.1G 0 ( 0%) - 0.1% ± 1.2%
cache_misses 1.12G ± 5.66M 1.11G … 1.12G 0 ( 0%) - 0.6% ± 3.6%
branch_misses 1.53G ± 1.75M 1.53G … 1.54G 0 ( 0%) + 0.2% ± 0.2%
```
At the end of each compilation with `--debug-incremental`, we end up
with a 43 MiB `zig.zcs` file that contains all of the InternPool data
serialized.
Of course, it will necessarily be more expensive to save the state than
to not save the state. However, this data point shows just how cheap the
save state operation is, with all of the groundwork laid for using a
serialization-friendly in-memory data layout.
Addresses #17015 by introducing a new startWithOptions. The only option is currently is a flag
to use the provided URI as is, without modification when passed to the server. Normally, this
is not needed nor desired. However, some REST APIs may have requirements that cannot be satisfied
with the default handling.
When the compiler's state lives through multiple Compilation.update()
calls, the C backend stores the rendered C source code for each
decl code body and forward declarations.
With this commit, the state is still stored, but it is managed in one
big array list in link/C.zig rather than many array lists, one for each
decl. This means simpler serialization and deserialization.
When compiling for *-windows-msvc, find the native libc_installation and
add the lib dirs to lib_dirs, so that system libs can be found.
Previously, `version` and `ole32` were detected via the mingw.libExists logic,
even on .msvc, which was a false positive. This detection logic for mingw doesn't
find uuid.lib, which was the failure that triggered this bugfix.
Only build the issue_5825 test if the native target is x86_64-windows-msvc,
since it requires the .msvc abi.
Some servers will respond with the identity encoding, meaning no
encoding, especially when responding to range-get requests. Adding the
identity encoding stops the header parser from failing when it
encounters this.
SDK version detection:
- read SDKSettings.json before inferral from SDK path
- vendored libc: add SDKSettings.json for SDK version info
resolveLibSystem:
- adjust search order to { search_dirs, { sysroot or vendored }}
- previous search order was { sysroot, search_dirs, vendored }
- update include dirs to use combined dir
- use one libSystem.tbd (drop use of libSystem.VERSION.tbd)
- update canBuildLibC to check for minimum os version only
This fixes a panic in `unionAbiSize` when a 0-length array of a union is used as a struct field.
Because `resolveTypeLayout` does not resolve the `elem_ty` if `arrayLenIncludingSentinel` returns
0 for the array, the child union type is not guaranteed to have a resolved layout at this point.
Fixed this case by just returning 0 here.
The SVG looks way better than the pixelated PNG and will adapt best to
whatever screen it is being displayed on. The PNG continues to be used
because Apple Safari does not support SVG favicons yet. All other major
browsers do. See https://caniuse.com/link-icon-svg.
This is a companion PR to ziglang/www.ziglang.org#310.
Notable changes in this update:
127198e58c fixes building zig2 artifact on
macOS Sonoma 14.0 (more specifically the SDK 14.0 linker).
a8d2ed8065 fixed some alignment edge
cases which is needed to do the store_hash=false change in the compiler
source code.
df5f0517b3 preserves result type
information through the address-of operator.
Instead of linear search every time a packed struct field's bit or byte
offset is wanted, they are computed once during resolution of the packed
struct's backing int type, and stored in InternPool for O(1) lookup.
Closes#17178
I have observed the standard library tests overflowing the default WASI
stack as of the previous commit. As best as I can tell, this isn't
directly our fault: LLVM is just emitting less efficient code in debug
builds with the new codegen patterns.
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.
This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for #16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.
As part of these changes, this commit also implements #17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.
This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we almost break even on
line count!
In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.
Lastly, it's worth noting that this commit introduces a slightly strange
source of generic poison types: in the expression `@as(*anyopaque, &x)`,
the sub-expression `x` has a generic poison result type, despite no
generic code being involved. This turns out to be a logical choice,
because we don't know the result type for `x`, and the generic poison
type represents precisely this case, providing the semantics we need.
Resolves: #16512Resolves: #17194
Source location resolution previously made ZIR printing incredibly slow,
since it was O(N^2). Since we usually resolve source locations
approximately in order, it is much more efficient to resolve them using
a "cursor" which navigates the file.
This takes the time for `zig ast-check -t Sema.zig` down from many
minutes (enough that I got bored and killed the process; well over 10)
to a few seconds.
SPIR-V doesn't support true element indexing, so we probably
need to switch over to isByRef like in llvm for this to work
properly. Currently a temporary is used, which at least
seems to work.
This will help us both to make the implementation a little
more efficient by caching emission for certain types like structs,
and also allow us to attach extra information about types that we
can use while lowering without performing a search over the entire
type tree for some property.
Follow up to https://github.com/ziglang/zig/pull/17069.
This TODO being left in was a complete oversight.
Before, any 'retryable' error would hit:
error: thread 2920 panic: access of union field 'success' while field 'failure_retryable' is active
Now, it will be reported/handled properly:
C:\Users\Ryan\Programming\Zig\zig\test\standalone\windows_resources\res\zig.rc:1:1: error: FileNotFound
When the tag is not known, it's set to `.none`. In this case, the value is either an
array of bytes (for extern unions) or an integer (for packed unions).
This was previously implemented by analyzing the AIR prior to the ZIR
`make_ptr_const` instruction. This solution was highly delicate, and in
particular broke down whenever there was a second `alloc` between the
`store` and `alloc` instructions, which is especially common in
destructure statements.
Sema now uses a different strategy to detect whether a `const` is
comptime-known. When the `alloc` is created, Sema begins tracking all
pointers and stores which refer to that allocation in temporary local
state. If any store is not comptime-known or has a higher runtime index
than the allocation, the allocation is marked as being runtime-known.
When we reach the `make_ptr_const` instruction, if the allocation is not
marked as runtime-known, it must be comptime-known. Sema will use the
set of `store` instructions to re-initialize the value in comptime
memory. We optimize for the common case of a single `store` instruction
by not creating a comptime alloc in this case, instead directly plucking
the result value from the instruction.
Resolves: #16083
Previously, @memset at comptime performed N pointer stores. This is less
efficient than just storing a whole array of values at once. The
difference can be quite drastic when reinterpreting memory - a test case
which is 40s on master branch now takes under a second on a debug
compiler build.
Resolves: #17214
This reverts commit bb98ffbe7f.
This caused master branch to fail the CI. We do need to figure out why
these assertions are tripping but let's do it in a branch other than
master.
It seems the webassembly backend does not want the exception that
`structFieldAlignmentExtern` makes for 128-bit integers. Perhaps that
logic should be modified to check if the target is wasm.
Without this, this branch fails the C ABI tests for wasm, causing this:
```
wasm-ld: warning: function signature mismatch: zig_struct_u128
>>> defined as (i64, i64) -> void in cfuncs.o
>>> defined as (i32) -> void in test-c-abi-wasm32-wasi-musl-ReleaseFast.wasm.o
```
When struct types have no field names, the names are implicitly
understood to be strings corresponding to the field indexes in
declaration order. It used to be the case that a NullTerminatedString
would be stored for each field in this case, however, now, callers must
handle the possibility that there are no names stored at all. This
commit introduces `legacyStructFieldName`, a function to fake the
previous behavior. Probably something better could be done by reworking
all the callsites of this function.
This changeset fixes the handling of alignment in several places. The
new rules are:
* `@alignOf(T)` where `T` is a runtime zero-bit type is at least 1,
maybe greater.
* Zero-bit fields in `extern` structs *do* force alignment, potentially
offsetting following fields.
* Zero-bit fields *do* have addresses within structs which can be
observed and are consistent with `@offsetOf`.
These are not necessarily all implemented correctly yet (see disabled
test), but this commit fixes all regressions compared to master, and
makes one new test pass.
Previously it would canonicalize or not depending on some volatile
internal state of the compiler, now it forces resolution of the element
type to determine the alignment if it needs to.
All of the logic in `Value.elemValue` is quite questionable, but
printing an error is definitely better than crashing. Notably, this
should stop us from hitting crashes when dumping AIR.
This also modifies AstGen so that struct types use 1 bit each from the
flags to communicate if there are nonzero inits, alignments, or comptime
fields. This allows adding a struct type to the InternPool without
looking ahead in memory to find out the answers to these questions,
which is easier for CPUs as well as for me, coding this logic right now.
Structs were previously using `SegmentedList` to be given indexes, but
were not actually backed by the InternPool arrays.
After this, the only remaining uses of `SegmentedList` in the compiler
are `Module.Decl` and `Module.Namespace`. Once those last two are
migrated to become backed by InternPool arrays as well, we can introduce
state serialization via writing these arrays to disk all at once.
Unfortunately there are a lot of source code locations that touch the
struct type API, so this commit is still work-in-progress. Once I get it
compiling and passing the test suite, I can provide some interesting
data points such as how it affected the InternPool memory size and
performance comparison against master branch.
I also couldn't resist migrating over a bunch of alignment API over to
use the log2 Alignment type rather than a mismash of u32 and u64 byte
units with 0 meaning something implicitly different and special at every
location. Turns out you can do all the math you need directly on the
log2 representation of alignments.
The new CI images provided by GitHub no longer provide ninja.exe, so
this commit installs it explicitly, using the suggested method from
GitHub.
See https://github.com/actions/runner-images/issues/8343 for more
details.
Simplify wording and add some formatting in several locations.
Expand sentinel array tests to highlight (non-)handling of internal
sentinels.
Fix format of symbol names in function pointers example.
Clarify wording a bit on the builin atomic* documentation.
Remove the (second) builtin compileLog example that demonstrated a lack of
compileLog entries.
* langref: address comments from rohlem
Use "0-terminated" instead of "null-terminated".
Undo some changes that were not as clear an improvement as I though.
* langref: remove stray ""
Thanks to rohlem for spotting this typo.
previously, in the container view (the type of view that you see when
you look at `std` for example), when listing types and namespaces, we
would only show doc comments places on the direct child decl, which in
the case of the `std` namespace, for example, it's just a bunch of
re-exports.
now, if we don't find a direct doc comment, we chase indirection and
display doc comments placed directly on the definition, if any.
this is the precise priority order:
```
/// 1
pub const Foo = _Foo;
/// 2
const _Foo = struct {
//! 3
};
```
The numbers show the priority order for autodoc.
It is very common, and well-defined, for a pointer on one side of a C ABI
to have a different but compatible element type. Examples include:
- `char*` vs `uint8_t*` on a system with 8-bit bytes
- `const char*` vs `char*`
- `char*` vs `unsigned char*`
Without this flag, Clang would invoke UBSAN when such an extern
function was called.
Might be nice to file an upstream issue and find out if there is a more
precise way to disable the problematic check.
`-fsanitize-cfi-icall-generalize-pointers` looks promising according to
the documentation, but empirically it does not work.
release/17.x branch, commit 8f4dd44097c9ae25dd203d5ac87f3b48f854bba8
This adds the flag `-D_LIBCPP_PSTL_CPU_BACKEND_SERIAL`. A future
enhancement could possibly pass something different if there is a
compelling parallel implementation. That libdispatch one might be worth
looking into.
* some manual fixes to generated CPU features code. in the future it
would be nice to make the script do those automatically. I suspect
the sm_90a thing is a bug in LLVM.
* add liteos to various target OS switches. I know nothing about this
OS; someone will need to work specifically on support for this OS
when the time comes to support it properly in zig.
* while waiting for the compiler, I went ahead and made more
conservative choices about when to use `inline` in std/Target.zig
Currently, the compiler (like @typeName) writes it `fn(...) Type` but
zig fmt writes it `fn (...) Type` (notice the space after `fn`).
This inconsistency is now resolved and function types are consistently
written the zig fmt way. Before this there were more `fn (...) Type`
occurrences than `fn(...) Type` already.
Safety is not a global flag that should be enabled or disabled for all
stores - it's lowered by the frontend directly into AIR instruction
semantics. The flag for this is communicated via the `store` vs
`store_safe` AIR instructions, and whether to write 0xaa bytes or not
should be decided in `airStore` and passed down via function parameters.
This commit is a step backwards since it removes functionality but it
aims our feet towards a better mountain to climb.
C99 introduced designated initializers for structs. Omitted fields are
implicitly initialized to zero. Some C APIs are designed with this in
mind. Defaulting to zero values for translated struct fields permits Zig
code to comfortably use such an API.
Closes#8165
This introduces the concept of a "weak global name" into translate-c.
translate-c consists of two passes. The first is important, because it
discovers all global names, which are used to prevent naming conflicts:
whenever we see an identifier in the second pass, we can mangle it if it
conflicts with any global or any other in-scope identifier.
Unfortunately, this is a bit tricky for structs, unions, and enums. In
C, these types are not represented by normal identifers, but by separate
tags - `struct foo` does not prevent an unrelated identifier `foo`
existing. In general, we want to translate type names to user-friendly
ones such as `struct_foo` and `foo` where possible, but we can't
guarantee such names will not conflict with real variable names.
This is where weak global names come in. In the initial pass, when a
global type declaration is seen, `struct_foo` and `foo` are both added
as weak global names. This essentially means that we will use these
names for the type *if possible*, but if there is another global with
the same name, we will mangle the type name instead. Then, when actually
translating the declaration, we check whether there's a "true" global
with a conflicting name, in which case we mangle our name. If the
user-friendly alias `foo` conflicts, we do not attempt to mangle it: we
just don't emit it, because a mangled alias isn't particularly helpful.
The _environ variable that is populated when linking libc on Windows does not support Unicode keys/values (or, at least, the encoding is not necessarily UTF-8). So, for Unicode support, _wenviron would need to be used instead. However, this means that the keys/values would be encoded as UTF-16, so they would need to be converted to UTF-8 before being returned by `os.getenv`. This would require allocation which is not part of the `os.getenv` API, so `os.getenv` is not implementable on Windows even when linking libc.
Closes https://github.com/ziglang/zig/issues/8456
this will make s3 re-enable compression for the stdlib's autodoc
and improve loading times (and data usage) for users
alongside this commit the deploy script for the official website is also
being updated
Consider this C macro:
```c
#define FOO(x) struct x
```
Previously, translate-c did not detect that the `x` in the body referred
to the argument, so wrongly translated this code as using the
nonexistent `struct_x`. Since undefined identifiers are noticed in
AstGen, this prevents the translated file from being usable at all.
translate-c now instead detects this case and emits an appropriate
compile error in the macro's place.
AstGen emits an error when a closure over a known-runtime value crosses
a namespace boundary. This usually makes sense: however, this usage is
actually valid if the capture is within a `@TypeOf` operand. Sema
already has a special case to allow such closure within `@TypeOf` when
AstGen could not determine a value to be runtime-known. This commit
simply introduces analagous logic to AstGen to allow `var`s to cross
namespace boundaries within `@TypeOf`.
The include directories used when preprocessing .rc files are now separate from the target, and by default will use the system MSVC include paths if the MSVC + Windows SDK are present, otherwise it will fall back to the MinGW includes distributed with Zig. This default behavior can be overridden by the `-rcincludes` option (possible values: any (the default), msvc, gnu, or none).
This behavior is useful because Windows resource files may `#include` files that only exist with in the MSVC include dirs (e.g. in `<MSVC install directory>/atlmfc/include` which can contain other .rc files, images, icons, cursors, etc). So, by defaulting to the `any` behavior (MSVC if present, MinGW fallback), users will by default get behavior that is most-likely-to-work.
It also should be okay that the include directories used when compiling .rc files differ from the include directories used when compiling the main binary, since the .res format is not dependent on anything ABI-related. The only relevant differences would be things like `#define` constants being different values in the MinGW headers vs the MSVC headers, but any such differences would likely be a MinGW bug.
I have updated Felix's ZigAndroidTemplate to work with the latest
version of Zig and we are exploring adding Android support to Mach engine.
`std.c.getcontext` is _referenced_ but not _used_, and Android's bionic libc
does not implement `getcontext`. `std.os.linux.getcontext` also cannot be
used with bionic libc, so it seems prudent to just disable this extern for now.
This may not be the perfect long-term fix, but I have a golden rebuttal to that:
before I was unable to compile Zig applications for Android, and now I can.
<img width="828" alt="image" src="https://github.com/hexops/mach/assets/3173176/1e29142b-0419-4459-9c8b-75d92f87f822">
Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
This is supposed to be the case, similar to how pointers to generic
functions are comptime-only (several pieces of logic already assumed
this). These types being considered runtime was causing `dbg_var_val`
AIR instructions to be wrongly emitted for such values, causing codegen
backends to create a runtime reference to the inline function, which (at
least on the LLVM backend) triggers an error.
Resolves: #38
The new `@depedencies` module contains generated code like the
following (where strings like "abc123" represent hashes):
```zig
pub const root_deps = [_]struct { []const u8, []const u8 }{
.{ "foo", "abc123" },
};
pub const packages = struct {
pub const abc123 = struct {
pub const build_root = "/home/mlugg/.cache/zig/blah/abc123";
pub const build_zig = @import("abc123");
pub const deps = [_]struct { []const u8, []const u8 }{
.{ "bar", "abc123" },
.{ "name", "ghi789" },
};
};
};
```
Each package contains a build root string, the build.zig import, and a
mapping from dependency names to package hashes. There is also such a
mapping for the root package dependencies.
In theory, we could now remove the `dep_prefix` field from `std.Build`,
since its main purpose is now handled differently. I believe this is a
desirable goal, as it doesn't really make sense to assign a single FQN
to any package (because it may appear in many different places in the
package hierarchy). This commit does not remove that field, as it's used
non-trivially in a few places in the build runner and compiler tests:
this will be a future enhancement.
Resolves: #16354Resolves: #17135
This applies the changes to the grammar introduced by the new
destructuring syntax, as well as some existing changes which were not
copied into the langref.
This change implements the following syntax into the compiler:
```zig
const x: u32, var y, foo.bar = .{ 1, 2, 3 };
```
A destructure expression may only appear within a block (i.e. not at
comtainer scope). The LHS consists of a sequence of comma-separated var
decls and/or lvalue expressions. The RHS is a normal expression.
A new result location type, `destructure`, is used, which contains
result pointers for each component of the destructure. This means that
when the RHS is a more complicated expression, peer type resolution is
not used: each result value is individually destructured and written to
the result pointers. RLS is always used for destructure expressions,
meaning every `const` on the LHS of such an expression creates a true
stack allocation.
Aside from anonymous array literals, Sema is capable of destructuring
the following types:
* Tuples
* Arrays
* Vectors
A destructure may be prefixed with the `comptime` keyword, in which case
the entire destructure is evaluated at comptime: this means all `var`s
in the LHS are `comptime var`s, every lvalue expression is evaluated at
comptime, and the RHS is evaluated at comptime. If every LHS is a
`const`, this is not allowed: as with single declarations, the user
should instead mark the RHS as `comptime`.
There are a few subtleties in the grammar changes here. For one thing,
if every LHS is an lvalue expression (rather than a var decl), a
destructure is considered an expression. This makes, for instance,
`if (cond) x, y = .{ 1, 2 };` valid Zig code. A destructure is allowed
in almost every context where a standard assignment expression is
permitted. The exception is `switch` prongs, which cannot be
destructures as the comma is ambiguous with the end of the prong.
A follow-up commit will begin utilizing this syntax in the Zig compiler.
Resolves: #498
When RLS is used to initialize a value with a comptime-only type, the
usual "value with comptime-only type depends on runtime control flow"
error message isn't hit, because we don't use results from a block. When
we reach `make_ptr_const`, we must validate that the value is
comptime-known.
This option exists for fast iteration during compiler development, since
avoiding codegen (but still performing semantic analysis) causes compile
errors to appear around twice as fast. This option was broken when the
`emit_bin` property on `std.Build.Step.Compile` was removed.
Now, when this option is passed, we add a direct install dependency on
the compile step itself, so that it is always run.
Insn.st() can be used with dynamic size just like Insn.stx(), which is
relevant in a code generation context.
using ImmOrReg caused an error as its fields were ordered differently than
Source.
The `coerce_result_ptr` instruction is highly problematic and leads to
unintentional memory reinterpretation in some cases. It is more correct
to simply not forward result pointers through this builtin.
`coerce_result_ptr` is still used for struct and array initializations,
where it can still cause issues. Eliminating this usage will be a future
change.
Resolves: #16991
* Use 32-bit integers instead of pointers for compactness and
serialization friendliness.
* Use a separate hash map for runtime and comptime capture scopes,
avoiding the 1-bit union tag.
* Use a compact array representation instead of a tree of hash maps.
* Eliminate the only instance of ref-counting in the compiler, instead
relying on garbage collection (not implemented yet but is the plan for
almost all long-lived objects related to incremental compilation).
Because a code modification may need to access capture scope data, this
makes capture scope data long-lived state. My goal is to get incremental
compilation state serialization down to a single pwritev syscall, by
unifying the on-disk representation with the in-memory representation.
This commit eliminates the last remaining pointer field of
`Module.Decl`.
This will allow users to construct e.g. a ComptimeStringMap that uses case-insensitive ASCII comparison.
Note: the previous ComptimeStringMap API is unchanged (i.e. this does not break any existing code).
Instead of using actual slices for InternPool.Key.AnonStructType, this
commit changes to use Slice types instead, which store a
long-lived index rather than a pointer.
This is a follow-up to 7ef1eb1c27.
This reverts commit 4d1432299f.
Please don't hard-code unrelated concerns this way. build.zig should not
have awareness of the naming conventions for cmake build directories.
Closes#17011Closes#17012
This commit allows the logo to scale more freely to fit its container,
and removes some extra margins so that the content scroll bar is flush
with the right side of the viewport.
We already have a zig build step for this: test-fmt.
* Rename the test-fmt step to check-fmt.
test-fmt sounds like it runs tests for `zig fmt` itself (lib/std/zig/render.zig).
* Use it instead of `zig fmt --check` in the CI scripts.
* Also use it in CI scripts that didn't have this check before.
Handles .extended_header type to parse PAX attributes and check if they override
the path of the next file. Increases file path limit to std.fs.MAX_PATH_BYTES.
Fixes#15342
Now that allocator.resize() is allowed to fail, programs may wish to
test code paths that handle resize() failure. The simplest way to do
this now is to replace the vtable of the testing allocator with one
that uses Allocator.noResize for the 'resize' function pointer.
An alternative way to support this testing capability is to augment the
FailingAllocator (which is already useful for testing allocation failure
scenarios) to intentionally fail on calls to resize(). To do this, add a
'resize_fail_index' parameter to the FailingAllocator that causes
resize() to fail after the given number of calls.
Detect system libcxx name with the CMake build system and convey that
information to build.zig via config.h so that the same system libcxx is
used for stage3.
This undoes the hacky search 901457d173 .
closes#17018
Usage of FILE_RENAME_IGNORE_READONLY_ATTRIBUTE or
FILE_DISPOSITION_IGNORE_READONLY_ATTRIBUTE for posix semantics require
win10_rs5 instead of win10_rs1 necessary for posix semantics. Keep it as simple
as possible, since it is reasonable to expect users being able to update
win10_rs5 or use non-posix semantics instead.
Closes#17049.
This avoids leaving undefined bytes in the output file in case
a decl has been garbage collected before it's been committed into
the binary.
This should have minimal impact on performance since we have to
write globals on every iteration anyway.
Not all hashes are added just yet as these need to be generated manually
from reference implementations as they are not included by default in
smhasher.
`statFile` now only uses `os.fstatatWasi` when not linking libc, matching the pattern used throughout other `Dir` functions. This fixes the compilation error: `error: struct 'c.wasi.Stat' has no member named 'fromFilestat'` (which the added test would have failed with)
Eventually, we will validate DWARF info upfront and report errors
to the user but this will require a rewrite of several parts of
the linker so leaving as a TODO for the near future.
`std.zig.system.darwin.getSdk` now pulls only the SDK path
so we execute a child process only once and not twice as it was
until now since we parse the SDK version directly from the pulled path.
This is actually how `ld64` does it too.
This makes the call sites easier to read, reduces the number of `catch`
expressions required, and prepares for comptime reasons to appear
earlier in the list of notes.
The following cast builtins did not previously work on vectors, and have
been made to:
* `@floatCast`
* `@ptrFromInt`
* `@intFromPtr`
* `@floatFromInt`
* `@intFromFloat`
* `@intFromBool`
Resolves: #16267
`TailQueue` was implemented as a doubly-linked list, but named after an
abstract data type. This was inconsistent with `SinglyLinkedList`, which
can be used to implement an abstract data type, but is still named after
the implementation. Renaming `TailQueue` to `DoublyLinkedList` improves
consistency between the two type names, and should help discoverability.
`TailQueue` is now a deprecated alias of `DoublyLinkedList`.
Related to issues #1629 and #8233.
This is a breaking change.
This commit applies the following rules to std.os.uefi:
* avoid redundant names in the namespace such as "protocol.FooProtocol"
* don't initialize struct field to undefined. do that at the
initialization site if you want that, or create a named constant that
sets all the fields to undefined.
* avoid the word "data", "info", "context", "state", "details", or
"config" in the type name, especially if a word from that category is
already in the type name.
* embrace tree structure
After following these rules, `usingnamespace` disappeared naturally.
This commit eliminates 26/53 (49%) instances of `usingnamespace` in the
standard library. All these uses were due to not understanding how
to properly use namespaces.
I did not test this commit. The standard library UEFI code is
experimental and pull requests have been accepted with minimal vetting.
Users of std.os.uefi will need to submit follow-up pull requests to fix
up whatever regressions this commit introduces, this time without
abusing namespaces (pun intended).
Implement lowering code for `@mod` on integers in the stage2 wasm backend
Fix invalid wasm being produced for `@divFloor` on signed integers by the stage2 wasm backend
There are a couple concepts here worth understanding:
Key.UnionType - This type is available *before* resolving the union's
fields. The enum tag type, number of fields, and field names, field
types, and field alignments are not available with this.
InternPool.UnionType - This one can be obtained from the above type with
`InternPool.loadUnionType` which asserts that the union's enum tag type
has been resolved. This one has all the information available.
Additionally:
* ZIR: Turn an unused bit into `any_aligned_fields` flag to help
semantic analysis know whether a union has explicit alignment on any
fields (usually not).
* Sema: delete `resolveTypeRequiresComptime` which had the same type
signature and near-duplicate logic to `typeRequiresComptime`.
- Make opaque types not report comptime-only (this was inconsistent
between the two implementations of this function).
* Implement accepted proposal #12556 which is a breaking change.
Created from a conversation with @andrewrk on irc: Memory leaks when using ArrayList can be inconvenient to debug when the stack frame size is 4 because the entirety of the printed frame is within zig stdlib, and not in the users calling stack. Increasing this to 6 for Debug builds, gives 2 frames of user code. I increased the frame size for tests as well by the equivalent factor, but I'm unconvinced that's actually desirable.
This logic already existed for the void type, but is also necessary for
other 0-bit types. Without it, we try to alloc a local for a 0-bit type
which gets translated to a local of type `void` which C doesn't like.
The main motivation for this change is eliminating the `block_ptr`
result location and corresponding `store_to_block_ptr` ZIR instruction.
This is achieved through a simple pass over the AST before AstGen which
determines, for AST nodes which have a choice on whether to provide a
result location, which choice to make, based on whether the result
pointer is consumed non-trivially.
This eliminates so much logic from AstGen that we almost break even on
line count! AstGen no longer has to worry about instruction rewriting
based on whether or not a result location was consumed: it always knows
what to do ahead of time, which simplifies a *lot* of logic. This also
incidentally fixes a few random AstGen bugs related to result location
handling, leading to the changes in `test/` and `lib/std/`.
This opens the door to future RLS improvements by making them much
easier to implement correctly, and fixes many bugs. Most ZIR is made
more compact after this commit, mainly due to not having redundant
`store_to_block_ptr` instructions lying around, but also due to a few
bugs in the old system which are implicitly fixed here.
This makes Cache.findPrefix/findPrefixResolved use `std.fs.path.relative` instead of `std.mem.startsWith` when checking if a file is within a prefix. This fixes multiple edge cases around prefix detection:
- If a prefix path ended with a path separator, then the first character of the 'sub_path' would get cut off because the previous implementation assumed it was a path separator. Example: prefix: `/foo/`, file_path: `/foo/abc.txt` would see that they both start with `/foo/` and then slice starting from one byte past the common prefix, ending up with `bc.txt` instead of the expected `abc.txt`
- If a prefix contained double path separators after any component, then the `startsWith` check would erroneously fail. Example: prefix: `/foo//bar`, file_path: `/foo/bar/abc.txt` would not see that abc.txt is a sub path of the prefix `/foo//bar`
- On Windows, case insensitivity was not respected at all, instead the UTF-8 bytes were compared directly
This fixes all of the things in the above list (and possibly more).
Before, selfExePath would be the path of the symlink on Windows instead of the path of what the symlink points to.
This deprecates fs.selfExePathW since it does not provide a separate use-case over selfExePath (before, the W version could return [:0]u16 directly)
Fixes#16670
This matches how other filesystem functions were made to handle BAD_NETWORK_PATH/BAD_NETWORK_NAME in https://github.com/ziglang/zig/pull/16568. ReadLink was the odd one out, but that is no longer the case.
In theory, localhost could be mapped to a different address via the LMHOSTS file, so using 127.0.0.1 should remove that potential wrinkle and allow the drive-absolute -> UNC transformation to work on any(?) setup.
Also print the error name to ensure it gets printed in CI (aarch64-windows ReleaseSmall seemed not to print the error in the last intermittent UNC failure)
* Generalise NaN handling and make std.math.nan() give quiet NaNs
* Address uses of std.math.qnan_* and std.math.nan_* consts
* Comment out failing test due to issues with signalling NaN
* Fix issue in c_builtins.zig where we need qnan_u32
Prior to this change, we would unconditionally emit any system include path/framework
path as `-iwithsysroot`/`-iframeworkwithsysroot` if the sysroot was
set which can lead to unexpected build failures. Now, calls to
`b.addSystemIncludePath` and `b.addFrameworkPath` will always emit
search paths as `-isystem`/`-iframework`. As a result, it is now up to
the user to correctly concat the search paths with the sysroot when
and where desired.
If there is a need for emitting `-iwithsysroot`/`-iframeworkwithsysroot`
I would advise adding explicit hooks such as `addSystemIncludePathWithSysroot`
and `addFrameworkPathWithSysroot`.
The key changes in this commit are:
```diff
- names: []const NullTerminatedString,
+ names: NullTerminatedString.Slice,
- values: []const Index,
+ values: Index.Slice,
```
Which eliminates the slices from `InternPool.Key.EnumType` and replaces
them with structs that contain `start` and `len` indexes. This makes the
lifetime of `EnumType` change from expiring with updates to InternPool,
to expiring when the InternPool is garbage-collected, which is currently
never.
This is gearing up for a larger change I started working on locally
which moves union types into InternPool.
As a bonus, I fixed some unnecessary instances of `@as`.
Closes#16861
Using `alloc_if_needed` when parsing a `Value` allows receiving a token
which points to the buffer of the underlying `Reader`. This token will
no longer be valid after the `Reader`'s buffer is refilled, which will
happen with large values. Using `alloc_always` avoids this issue by
ensuring the returned tokens always own their data independently of the
underlying buffer.
The previous logic was using `root_src_directory` both in constructing
the resolved path and as the root when opening the resolved path, which
effectively duplicates those path components in the final path.
When using `std.Build.dependency` with target options, dependencies
would sometimes get targets which are equivalent but have distinct
names, e.g. `native` vs `native-native`. This is a somewhat broad issue,
and it's unclear how to fix it more generally - perhaps we should
special-case CrossTarget in options passing, or maybe targets should
have a canonical name which we guarantee to use everywhere aside from
raw user input.
However, this commit fixes the most egregious issue, which was an active
blocker to using the package manager for some users. This was caused by
the CPU changing from `native` to a specific descriptor (e.g.
`skylake+sgx`), which then changed the behavior of `zigTriple`.
Resolves: #16856
This is in particular very important to the Zig language which
allows exporting the same symbol under different names. For instance,
it is possible to have a case such that:
```
...
4258 T _foo
4258 T _bar
...
```
In this case we need to keep track of both symbol names when resolving
FDEs and unwind records.
This fixes a few things:
- Previously, CreateSymbolicLink would always create a relative link if a `dir` was provided, but the relative-ness of a link should be determined by the target path, not the null-ness of the `dir`.
- Special handling is now done to symlink to 'rooted' paths correctly (they are treated as a relative link, which is different than how the xToPrefixedFileW functions treat them)
- ReadLink now correctly supports UNC paths via a new `ntToWin32Namespace` function which intends to be an analog of `RtlNtPathNameToDosPathName` (RtlNtPathNameToDosPathName is not used because it seems to heap allocate as it takes an RTL_UNICODE_STRING_BUFFER)
Commit ea9917d9bd introduced usage
of fs.Dir.realpath which eventually calls os.getFdpath which is
forbidden to be used by the compiler. It causes building zig to fail on
OpenBsd, NetBSD and older versions of FreeBSD and DragonFly.
This patch substitutes with os.realpath on libc targets and eventually
calls c.realpath and allows zig to build. Any use of realpath is not
desired but this is the lesser evil.
We do not yet have correct implementation to access xmm registers from
the world of ucontext/mcontext.
Unimplement .netbsd to allow building zig compiler.
* std.unicode: Add more UTF-16 decoding functions
This mostly makes parts of Utf16LeIterator reusable
* std.json: Fix decoding of UTF-16 surrogate pairs
Before this commit, there were 524,288 codepoints that would get decoded improperly. After this commit, there are 0.
Fixes#16828
Some builtin types have a special InternPool index (e.g.
`.type_info_type`) so that AstGen can refer to them before semantic
analysis. Unfortunately, this previously led to a second index existing
to refer to the type once it was resolved, complicating Sema by having
the concept of an "unresolved" type index.
This change makes Sema modify these InternPool indices in-place to
contain the expanded representation when resolved. The analysis of the
corresponding decls is caught in `Module.semaDecl`, and a field is set
on Sema telling it which index to place struct/union/enum types at. This
system could break if `std.builtin` contained complex decls which
evaluate multiple struct types, but this will be caught by the
assertions in `InternPool.resolveBuiltinType`.
The AstGen result types which were disabled in 6917a8c have been
re-enabled.
Resolves: #16603
* Consistent decryption tail for all AEADs
* Remove outdated note
This was previously copied here from another function. There used
to be another comment on the tag verification linking to issue #1776,
but that one was not copied over. As it stands, this note seems fairly
misleading/irrelevant.
* Prettier docs
* Add note about plaintext contents to docs
* Capitalization
* Fixup missing XChaChaPoly docs
The panic handler decl_val was previously given a `unneeded` source
location, which was then added to the reference trace, resulting in a
crash if the source location was used in the reference trace. This
commit makes two trivial changes:
* Don't add unneeded source locations to the ref table (panic in debug, silently ignore in release)
* Pass a real source location when analyzing the panic handler
Previously, a relative path like `..` would:
- Attempt to be normalized (i.e. remove . and .. without any path resolution), but would error with TooManyParentDirs
- This would make wToPrefixedFileW run it through `RtlGetFullPathName_U` to do the necessary path resolution, but `RtlGetFullPathName_U` always resolves relative paths relative to the CWD
Instead, when TooManyParentDirs occurs, we now look up the path of the passed in `dir` (if it's non-null) and append the relative path to it before giving it to `RtlGetFullPathName_U`. If `dir` is null, then we just give it RtlGetFullPathName_U directly and let it resolve it relative to the CWD.
Closes#16779
After ff37ccd, interned values are trivial to convert to Air refs, using
`Air.internedToRef`. This made functions like `Sema.addConstant` effectively
redundant. This commit removes `Sema.addConstant` and `Sema.addType`, replacing
them with direct usages of `Air.internedToRef`.
Additionally, a new helper `Module.undefValue` is added, and the following
functions are moved into Module:
* `Sema.addConstUndef` -> `Module.undefRef`
* `Sema.addUnsignedInt` -> `Module.intRef` (now also works for signed types)
The general pattern here is that any `Module.xyzValue` helper may also have a
corresponding `Module.xyzRef` helper, which just wraps the call in
`Air.internedToRef`.
as explainded at https://llvm.org/docs/LangRef.html#vector-type :
"In general vector elements are laid out in memory in the same way as array types.
Such an analogy works fine as long as the vector elements are byte sized.
However, when the elements of the vector aren’t byte sized it gets a bit more complicated.
One way to describe the layout is by describing what happens when a vector such
as <N x iM> is bitcasted to an integer type with N*M bits, and then following the
rules for storing such an integer to memory."
"When <N*M> isn’t evenly divisible by the byte size the exact memory layout
is unspecified (just like it is for an integral type of the same size)."
The changes to result locations and generic calls has caused mild
changes to some compile errors. Some are slightly better, some slightly
worse, but none of the changes are major.
AstGen provides all function call arguments with a result location,
referenced through the call instruction index. The idea is that this
should be the parameter type, but for `anytype` parameters, we use
generic poison, which is required to be handled correctly.
Previously, generic instantiations and inline calls worked by evaluating
all args in advance, before resolving generic parameter types. This
means any generic parameter (not just `anytype` ones) had generic poison
result types. This caused missing result locations in some cases.
Additionally, the generic instantiation logic caused `zirParam` to
analyze the argument types a second time before coercion. This meant
that for nominal types (struct/enum/etc), a *new* type was created,
distinct to the result type which was previously forwarded to the
argument expression.
This commit fixes both of these issues. Generic parameter type
resolution is now interleaved with argument analysis, so that we don't
have unnecessary generic poison types, and generic instantiation logic
now handles parameters itself rather than falling through to the
standard zirParam logic, so avoids duplicating the types.
Resolves: #16566Resolves: #16258Resolves: #16753
Well, this was a journey!
The original issue I was trying to fix is covered by the new behavior
test in array.zig: in essence, `ty` and `coerced_ty` result locations
were not correctly propagated.
While fixing this, I noticed a similar bug in struct inits: the type was
propagated to *fields* fine, but the actual struct init was
unnecessarily anonymous, which could lead to unnecessary copies. Note
that the behavior test added in struct.zig was already passing - the bug
here didn't change any easy-to-test behavior - but I figured I'd add it
anyway.
This is a little harder than it seems, because the result type may not
itself be an array/struct type: it could be an optional / error union
wrapper. A new ZIR instruction is introduced to unwrap these.
This is also made a little tricky by the fact that it's possible for
result types to be unknown at the time of semantic analysis (due to
`anytype` parameters), leading to generic poison. In these cases, we
must essentially downgrade to an anonymous initialization.
Fixing these issues exposed *another* bug, related to type resolution in
Sema. That issue is now tracked by #16603. As a temporary workaround for
this bug, a few result locations for builtin function operands have been
disabled in AstGen. This is technically a breaking change, but it's very
minor: I doubt it'll cause any breakage in the wild.
This function does not seem to differ in any interesting way from
`!typeRequiresComptime`, other than the `is_extern` param which is only
used in one place, and some differences did not seem correct anyway.
My reasoning for changing opaque types to be comptime-only is that
`explainWhyTypeIsComptime` is quite happy to explain why they are. :D
llvm: convert more things to use Builder
* finish converting intrinsics
* finish converting attributes
* finish converting instructions
* finish converting globals
* pass behavior tests with no dependence on the llvm api (`-fno-libllvm`)
* autodoc: init guide TOC work
* autodoc: working guides toc navigation
* autodoc: more improvements
* autodoc: ui refinements
* autodoc: new layout and init descriptions for namespaces in std.zig
* renamed enum_big_numbers_quoted option to enum_nonportable_numbers_as_strings
* updated stringify doc to mention the option
I also reversed the logic to determine whether an integer is nonportable,
it seemed easier to reason about.
I also took a stab at applying the new option to floats, but, I got stuck
at trying to print large floats, not sure if Zig supports that yet.
The previous magic numbers used `1 << 52`, which did not account for the
implicit leading one in the floating point format. The RFC is correct
when it uses an exponent of 53. Technically these exclusive endpoints
are also representable, but everyone including the RFC seems to use them
exclusively.
Also, delete special case optimizations related to the type which have
already been implemented in the zig compiler to produce comptime values
for tautological runtime comparisons.
std.json follows interoperability recommendations from RFC8259 to limit
JSON number values to those that fit inside an f64. However, since Zig
supports arbitrarily large JSON numbers, this breaks roundtrip data
congruence.
To appease both use cases, I've added an option `emit_big_numbers_quoted`
to StringifyOptions. It's disabled by default which preserves roundtrip
but can be enabled to favor interoperability.
Before this commit, you could use readElfDebugInfo independently with
one catch: the data is not freed since the deinitialization functions
for ModuleDebugInfo are private. This change makes them public so the
users of such function and similar can free the memory after the
debug symbols have been used.
consttest_step=b.step("test","Run all the tests");
consttest_step=b.step("test","Run all the tests");
constskip_install_lib_files=b.option(bool,"no-lib","skip copying of lib/ files and langref to installation prefix. Useful for development")orelsefalse;
constskip_install_lib_files=b.option(bool,"no-lib","skip copying of lib/ files and langref to installation prefix. Useful for development")orelsefalse;
constskip_install_langref=b.option(bool,"no-langref","skip copying of langref to the installation prefix")orelseskip_install_lib_files;
constskip_install_langref=b.option(bool,"no-langref","skip copying of langref to the installation prefix")orelseskip_install_lib_files;
constskip_install_autodocs=b.option(bool,"no-autodocs","skip copying of standard library autodocs to the installation prefix")orelseskip_install_lib_files;
conststd_docs=b.option(bool,"std-docs","include standard library autodocs")orelsefalse;
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.