Commit Graph

413 Commits

Author SHA1 Message Date
mlugg
37a9a4e0f1 compiler: refactor Zcu.File and path representation
This commit makes some big changes to how we track state for Zig source
files. In particular, it changes:

* How `File` tracks its path on-disk
* How AstGen discovers files
* How file-level errors are tracked
* How `builtin.zig` files and modules are created

The original motivation here was to address incremental compilation bugs
with the handling of files, such as #22696. To fix this, a few changes
are necessary.

Just like declarations may become unreferenced on an incremental update,
meaning we suppress analysis errors associated with them, it is also
possible for all imports of a file to be removed on an incremental
update, in which case file-level errors for that file should be
suppressed. As such, after AstGen, the compiler must traverse files
(starting from analysis roots) and discover the set of "live files" for
this update.

Additionally, the compiler's previous handling of retryable file errors
was not very good; the source location the error was reported as was
based only on the first discovered import of that file. This source
location also disappeared on future incremental updates. So, as a part
of the file traversal above, we also need to figure out the source
locations of imports which errors should be reported against.

Another observation I made is that the "file exists in multiple modules"
error was not implemented in a particularly good way (I get to say that
because I wrote it!). It was subject to races, where the order in which
different imports of a file were discovered affects both how errors are
printed, and which module the file is arbitrarily assigned, with the
latter in turn affecting which other files are considered for import.
The thing I realised here is that while the AstGen worker pool is
running, we cannot know for sure which module(s) a file is in; we could
always discover an import later which changes the answer.

So, here's how the AstGen workers have changed. We initially ensure that
`zcu.import_table` contains the root files for all modules in this Zcu,
even if we don't know any imports for them yet. Then, the AstGen
workers do not need to be aware of modules. Instead, they simply ignore
module imports, and only spin off more workers when they see a by-path
import.

During AstGen, we can't use module-root-relative paths, since we don't
know which modules files are in; but we don't want to unnecessarily use
absolute files either, because those are non-portable and can make
`error.NameTooLong` more likely. As such, I have introduced a new
abstraction, `Compilation.Path`. This type is a way of representing a
filesystem path which has a *canonical form*. The path is represented
relative to one of a few special directories: the lib directory, the
global cache directory, or the local cache directory. As a fallback, we
use absolute (or cwd-relative on WASI) paths. This is kind of similar to
`std.Build.Cache.Path` with a pre-defined list of possible
`std.Build.Cache.Directory`, but has stricter canonicalization rules
based on path resolution to make sure deduplicating files works
properly. A `Compilation.Path` can be trivially converted to a
`std.Build.Cache.Path` from a `Compilation`, but is smaller, has a
canonical form, and has a digest which will be consistent across
different compiler processes with the same lib and cache directories
(important when we serialize incremental compilation state in the
future). `Zcu.File` and `Zcu.EmbedFile` both contain a
`Compilation.Path`, which is used to access the file on-disk;
module-relative sub paths are used quite rarely (`EmbedFile` doesn't
even have one now for simplicity).

After the AstGen workers all complete, we know that any file which might
be imported is definitely in `import_table` and up-to-date. So, we
perform a single-threaded graph traversal; similar to what
`resolveReferences` plays for `AnalUnit`s, but for files instead. We
figure out which files are alive, and which module each file is in. If a
file turns out to be in multiple modules, we set a field on `Zcu` to
indicate this error. If a file is in a different module to a prior
update, we set a flag instructing `updateZirRefs` to invalidate all
dependencies on the file. This traversal also discovers "import errors";
these are errors associated with a specific `@import`. With Zig's
current design, there is only one possible error here: "import outside
of module root". This must be identified during this traversal instead
of during AstGen, because it depends on which module the file is in. I
tried also representing "module not found" errors in this same way, but
it turns out to be much more useful to report those in Sema, because of
use cases like optional dependencies where a module import is behind a
comptime-known build option.

For simplicity, `failed_files` now just maps to `?[]u8`, since the
source location is always the whole file. In fact, this allows removing
`LazySrcLoc.Offset.entire_file` completely, slightly simplifying some
error reporting logic. File-level errors are now directly built in the
`std.zig.ErrorBundle.Wip`. If the payload is not `null`, it is the
message for a retryable error (i.e. an error loading the source file),
and will be reported with a "file imported here" note pointing to the
import site discovered during the single-threaded file traversal.

The last piece of fallout here is how `Builtin` works. Rather than
constructing "builtin" modules when creating `Package.Module`s, they are
now constructed on-the-fly by `Zcu`. The map `Zcu.builtin_modules` maps
from digests to `*Package.Module`s. These digests are abstract hashes of
the `Builtin` value; i.e. all of the options which are placed into
"builtin.zig". During the file traversal, we populate `builtin_modules`
as needed, so that when we see this imports in Sema, we just grab the
relevant entry from this map. This eliminates a bunch of awkward state
tracking during construction of the module graph. It's also now clearer
exactly what options the builtin module has, since previously it
inherited some options arbitrarily from the first-created module with
that "builtin" module!

The user-visible effects of this commit are:
* retryable file errors are now consistently reported against the whole
  file, with a note pointing to a live import of that file
* some theoretical bugs where imports are wrongly considered distinct
  (when the import path moves out of the cwd and then back in) are fixed
* some consistency issues with how file-level errors are reported are
  fixed; these errors will now always be printed in the same order
  regardless of how the AstGen pass assigns file indices
* incremental updates do not print retryable file errors differently
  between updates or depending on file structure/contents
* incremental updates support files changing modules
* incremental updates support files becoming unreferenced

Resolves: #22696
2025-05-18 17:37:02 +01:00
Alex Rønne Petersen
bc3c50c21e Merge pull request #23700 from sorairolake/rename-trims
chore(std.mem): Rename `trimLeft` and `trimRight` to `trimStart` and `trimEnd`
2025-05-12 17:11:52 +02:00
Alex Rønne Petersen
833d4c9ce4 Merge pull request #23835 from alexrp/freebsd-libc
Support dynamically-linked FreeBSD libc when cross-compiling
2025-05-12 01:19:23 +02:00
Alex Rønne Petersen
837e0f9c37 std.Target: Remove ObjectFormat.nvptx (and associated linker code).
Textual PTX is just assembly language like any other. And if we do ever add
support for emitting PTX object files after reverse engineering the bytecode
format, we'd be emitting ELF files like the CUDA toolchain. So there's really no
need for a special ObjectFormat tag here, nor linker code that treats it as a
distinct format.
2025-05-10 12:21:57 +02:00
Alex Rønne Petersen
610d3cf9de compiler: Move vendored library support to libs subdirectory. 2025-05-10 12:19:26 +02:00
Shun Sakai
5fc4448e45 chore(std.mem): Rename trimLeft and trimRight
Rename `trimLeft` to `trimStart`, and `trimRight` to `trimEnd`.
`trimLeft` and `trimRight` functions remain as deprecated aliases for
these new names.
2025-04-27 18:03:59 +09:00
Alex Rønne Petersen
30e254fc31 link: Stub out GOFF/XCOFF linker code based on LLVM.
This allows emitting object files for s390x-zos (GOFF) and powerpc(64)-aix
(XCOFF).

Note that GOFF emission in LLVM is still being worked on upstream for LLVM 21;
the resulting object files are useless right now. Also, -fstrip is required, or
LLVM will SIGSEGV during DWARF emission.
2025-04-27 03:52:52 +02:00
Jacob Young
6705cbd5eb codegen: fix packed byte-aligned relocations
Closes #23131
2025-03-23 18:35:34 -04:00
mlugg
9f235a105b link: mark prelink tasks as procesed under -fno-emit-bin
The old logic only decremented `remaining_prelink_tasks` if `bin_file`
was not `null`. This meant that on `-fno-emit-bin` builds with
registered prelink tasks (e.g. C source files), we exited from
`Compilation.performAllTheWorkInner` early, assuming a prelink error.

Instead, when `bin_file` is `null`, we still decrement
`remaining_prelink_tasks`; we just don't do any actual work.

Resolves: #22682
2025-03-22 21:44:46 -04:00
mlugg
725c825829 link: make sure MachO closes the damn files
Windows is a ridiculous operating system designed by toddlers, and so
requires us to close all file handles in the `tmp/xxxxxxx` cache dir
before renaming it into `o/xxxxxxx`. We have a hack in place to handle
this for the main output file, but the MachO linker also outputs a file
with debug symbols, and we weren't closing it! This led to a fuckton of
CI failures when we enabled `.whole` cache mode by default for
self-hosted backends.

thanks jacob for figuring this out while i sat there
2025-03-02 16:39:18 -05:00
David Rubin
931178494f Compilation: correct when to include ubsan 2025-02-25 11:22:33 -08:00
David Rubin
95720f007b move libubsan to lib/ and integrate it into -fubsan-rt 2025-02-25 11:22:33 -08:00
Andrew Kelley
61b69a418d Merge pull request #22659 from ifreund/linker-script-fix
link: fix ambiguous names in linker scripts
2025-02-22 18:18:24 -05:00
Alex Rønne Petersen
f87b443af1 link.MachO: Add support for the -x flag (discard local symbols).
This can also be extended to ELF later as it means roughly the same thing there.

This addresses the main issue in #21721 but as I don't have a macOS machine to
do further testing on, I can't confirm whether zig cc is able to pass the entire
cgo test suite after this commit. It can, however, cross-compile a basic program
that uses cgo to x86_64-macos-none which previously failed due to lack of -x
support. Unlike previously, the resulting symbol table does not contain local
symbols (such as C static functions).

I believe this satisfies the related donor bounty: https://ziglang.org/news/second-donor-bounty
2025-02-22 06:35:19 +01:00
Alex Rønne Petersen
481b7bf3f0 std.Target: Remove functions that just wrap component functions.
Functions like isMinGW() and isGnuLibC() have a good reason to exist: They look
at multiple components of the target. But functions like isWasm(), isDarwin(),
isGnu(), etc only exist to save 4-8 characters. I don't think this is a good
enough reason to keep them, especially given that:

* It's not immediately obvious to a reader whether target.isDarwin() means the
  same thing as target.os.tag.isDarwin() precisely because isMinGW() and similar
  functions *do* look at multiple components.
* It's not clear where we would draw the line. The logical conclusion before
  this commit would be to also wrap Arch.isX86(), Os.Tag.isSolarish(),
  Abi.isOpenHarmony(), etc... this obviously quickly gets out of hand.
* It's nice to just have a single correct way of doing something.
2025-02-17 19:18:19 +01:00
Isaac Freund
0499c731ea link: simplify control flow
This refactor was left out of the previous commit to make the diff less
noisy and easier to review. There should be no change in behavior.
2025-02-10 23:24:32 +01:00
Isaac Freund
819716b59f link: fix ambiguous names in linker scripts
Currently zig fails to build while linking the system LLVM/C++ libraries
on my Chimera Linux system due to the fact that libc++.so is a linker
script with the following contents:

INPUT(libc++.so.1 -lc++abi -lunwind)

Prior to this commit, zig would try to convert "ambiguous names" in
linker scripts such as libc++.so.1 in this example into -lfoo style
flags. This fails in this case due to the so version number as zig
checks for exactly the .so suffix.

Furthermore, I do not think that this conversion is semantically correct
since converting libfoo.so to -lfoo could theoretically end up resulting
in libfoo.a getting linked which seems wrong when a different file is
specified in the linker script.

With this patch, this attempted conversion is removed. Instead, zig
always first checks if the exact file/path in the linker script exists
relative to the current working directory.

If the file is classified as a library (including versioned shared
objects such as libfoo.so.1), zig then falls back to checking if
the exact file/path in the linker script exists relative to each
directory in the library search path, selecting the first match or
erroring out if none is found.

This behavior fixes the regression that prevents building zig while
linking the system LLVM/C++ libraries on Chimera Linux.
2025-02-10 23:19:48 +01:00
Meghan Denny
9142482372 std.ArrayList: popOrNull() -> pop() [v2] (#22720) 2025-02-10 04:21:31 +00:00
mlugg
d3ca10d5d8 Zcu: remove *_loaded fields on File
Instead, `source`, `tree`, and `zir` should all be optional. This is
precisely what we're actually trying to model here; and `File` isn't
optimized for memory consumption or serializability anyway, so it's fine
to use a couple of extra bytes on actual optionals here.
2025-02-04 16:20:29 +00:00
Andrew Kelley
94648a0383 fix merge conflicts with updating line numbers 2025-01-15 15:11:36 -08:00
Andrew Kelley
5b18af85cb type checking for synthetic functions 2025-01-15 15:11:36 -08:00
Andrew Kelley
4fccb5ae7a wasm linker: improve error messages by making source locations more lazy 2025-01-15 15:11:36 -08:00
Andrew Kelley
1a4c5837fe wasm linker: fix crashes when parsing compiler_rt 2025-01-15 15:11:36 -08:00
Andrew Kelley
d1cde847a3 implement the prelink phase in the frontend
this strategy uses a "postponed" queue to handle codegen tasks that
spawn too early. there's probably a better way.
2025-01-15 15:11:36 -08:00
Andrew Kelley
070b973c4a wasm linker: allow undefined imports when lib name is provided
and expose object_host_name as an option for setting the lib name for
object files, since the wasm linking standards don't specify a way to do
it.
2025-01-15 15:11:36 -08:00
Andrew Kelley
edd592d371 fix compilation when enabling llvm 2025-01-15 15:11:35 -08:00
Andrew Kelley
943dac3e85 compiler: add type safety for export indices 2025-01-15 15:11:35 -08:00
Andrew Kelley
9bf715de74 rework error handling in the backends 2025-01-15 15:11:35 -08:00
Andrew Kelley
77accf597d elf linker: conform to explicit error sets 2025-01-15 15:11:35 -08:00
Andrew Kelley
da25ed95fc macho linker conforms to explicit error sets, again 2025-01-15 15:11:35 -08:00
Andrew Kelley
16180f525a macho linker: conform to explicit error sets
Makes linker functions have small error sets, required to report
diagnostics properly rather than having a massive error set that has a
lot of codes.

Other linker implementations are not ported yet.

Also the branch is not passing semantic analysis yet.
2025-01-15 15:11:35 -08:00
Andrew Kelley
795e7c64d5 wasm linker: aggressive DODification
The goals of this branch are to:
* compile faster when using the wasm linker and backend
* enable saving compiler state by directly copying in-memory linker
  state to disk.
* more efficient compiler memory utilization
* introduce integer type safety to wasm linker code
* generate better WebAssembly code
* fully participate in incremental compilation
* do as much work as possible outside of flush(), while continuing to do
  linker garbage collection.
* avoid unnecessary heap allocations
* avoid unnecessary indirect function calls

In order to accomplish this goals, this removes the ZigObject
abstraction, as well as Symbol and Atom. These abstractions resulted
in overly generic code, doing unnecessary work, and needless
complications that simply go away by creating a better in-memory data
model and emitting more things lazily.

For example, this makes wasm codegen emit MIR which is then lowered to
wasm code during linking, with optimal function indexes etc, or
relocations are emitted if outputting an object. Previously, this would
always emit relocations, which are fully unnecessary when emitting an
executable, and required all function calls to use the maximum size LEB
encoding.

This branch introduces the concept of the "prelink" phase which occurs
after all object files have been parsed, but before any Zcu updates are
sent to the linker. This allows the linker to fully parse all objects
into a compact memory model, which is guaranteed to be complete when Zcu
code is generated.

This commit is not a complete implementation of all these goals; it is
not even passing semantic analysis.
2025-01-15 15:11:35 -08:00
mlugg
065e10c95c link: new incremental line number update API 2025-01-05 02:20:56 +00:00
mlugg
3afda4322c compiler: analyze type and value of global declaration separately
This commit separates semantic analysis of the annotated type vs value
of a global declaration, therefore allowing recursive and mutually
recursive values to be declared.

Every `Nav` which undergoes analysis now has *two* corresponding
`AnalUnit`s: `.{ .nav_val = n }` and `.{ .nav_ty = n }`. The `nav_val`
unit is responsible for *fully resolving* the `Nav`: determining its
value, linksection, addrspace, etc. The `nav_ty` unit, on the other
hand, resolves only the information necessary to construct a *pointer*
to the `Nav`: its type, addrspace, etc. (It does also analyze its
linksection, but that could be moved to `nav_val` I think; it doesn't
make any difference).

Analyzing a `nav_ty` for a declaration with no type annotation will just
mark a dependency on the `nav_val`, analyze it, and finish. Conversely,
analyzing a `nav_val` for a declaration *with* a type annotation will
first mark a dependency on the `nav_ty` and analyze it, using this as
the result type when evaluating the value body.

The `nav_val` and `nav_ty` units always have references to one another:
so, if a `Nav`'s type is referenced, its value implicitly is too, and
vice versa. However, these dependencies are trivial, so, to save memory,
are only known implicitly by logic in `resolveReferences`.

In general, analyzing ZIR `decl_val` will only analyze `nav_ty` of the
corresponding `Nav`. There are two exceptions to this. If the
declaration is an `extern` declaration, then we immediately ensure the
`Nav` value is resolved (which doesn't actually require any more
analysis, since such a declaration has no value body anyway).
Additionally, if the resolved type has type tag `.@"fn"`, we again
immediately resolve the `Nav` value. The latter restriction is in place
for two reasons:

* Functions are special, in that their externs are allowed to trivially
  alias; i.e. with a declaration `extern fn foo(...)`, you can write
  `const bar = foo;`. This is not allowed for non-function externs, and
  it means that function types are the only place where it is possible
  for a declaration `Nav` to have a `.@"extern"` value without actually
  being declared `extern`. We need to identify this situation
  immediately so that the `decl_ref` can create a pointer to the *real*
  extern `Nav`, not this alias.
* In certain situations, such as taking a pointer to a `Nav`, Sema needs
  to queue analysis of a runtime function if the value is a function. To
  do this, the function value needs to be known, so we need to resolve
  the value immediately upon `&foo` where `foo` is a function.

This restriction is simple to codify into the eventual language
specification, and doesn't limit the utility of this feature in
practice.

A consequence of this commit is that codegen and linking logic needs to
be more careful when looking at `Nav`s. In general:

* When `updateNav` or `updateFunc` is called, it is safe to assume that
  the `Nav` being updated (the owner `Nav` for `updateFunc`) is fully
  resolved.
* Any `Nav` whose value is/will be an `@"extern"` or a function is fully
  resolved; see `Nav.getExtern` for a helper for a common case here.
* Any other `Nav` may only have its type resolved.

This didn't seem to be too tricky to satisfy in any of the existing
codegen/linker backends.

Resolves: #131
2024-12-24 02:18:41 +00:00
Jacob Young
5c76e08f49 lldb: add pretty printer for intern pool indices 2024-12-20 22:51:20 -05:00
Andrew Kelley
d37ee79535 std.Build.Cache.hit: more discipline in error handling
Previous commits

2b0929929d
4ea2f441df

had this text:

> There are no dir components, so you would think that this was
> unreachable, however we have observed on macOS two processes racing to
> do openat() with O_CREAT manifest in ENOENT.

This appears to have been a misunderstanding based on the issue
report #12138 and corresponding PR #12139 in which the steps to
reproduce removed the cache directory in a loop which also executed
detached Zig compiler processes.

There is no evidence for the macOS kernel bug however the ENOENT is
easily explained by the removal of the cache directory.

This commit reverts those commits, ultimately reporting the ENOENT as an
error rather than repeating the create file operation. However this
commit also adds an explicit error set to `std.Build.Cache.hit` as well
as changing the `failed_file_index` to a proper diagnostic field that
fully communicates what failed, leading to more informative error
messages on failure to check the cache.

The equivalent failure when occuring for AstGen performs a fatal process
kill, reasoning being that the compiler has an invariant of the cache
directory not being yanked out from underneath it while executing. This
could be made a more granular error in the future but I suspect such
thing is not valuable to pursue.

Related to #18340 but does not solve it.
2024-12-10 18:11:12 -08:00
Andrew Kelley
11bf2d92de diversify "unable to spawn" failure messages
to help understand where a spurious failure is occurring
2024-11-26 13:56:40 -08:00
Andrew Kelley
f2dcfe0e40 link.File.Wasm: parse inputs in compilation pipeline
Primarily, this moves linker input parsing from flush() into the linker
task queue, which is executed simultaneously with the frontend.

I also made it avoid redundantly opening the same archive file N times
for each object file inside. Furthermore, hard code fixed buffer stream
rather than using a generic stream type.

Finally, I fixed the error handling of the Wasm.Archive.parse function.
Please pay attention to this pattern of returning a struct rather than
accepting a mutable struct as an argument. This ensures function-level
atomicity and makes resource management straightforward.

Deletes the file and path fields from Archive and Object.

Removed a well-meaning but ultimately misguided suggestion about how to
think about ZigObject since thinking about it that way has led to
problematic anti-DOD patterns.
2024-10-30 23:43:53 -07:00
Alex Rønne Petersen
16b87f7082 link: Fix archive format selection for some OSs.
* AIX has its own bespoke format.
* Handle all Apple platforms.
* FreeBSD and OpenBSD both use the GNU format in LLVM.
* Windows has since been switched to the COFF format by default in LLVM.
2024-10-31 01:33:49 +01:00
Alex Rønne Petersen
f1f804e532 zig_llvm: Reduce our exposure to LLVM API breakage.
LLVM recently introduced new Triple::ArchType members in 19.1.3 which broke our
static assertions in zig_llvm.cpp. When implementing a fix for that, I realized
that we don't even need a lot of the stuff we have in zig_llvm.(cpp,h) anymore.
This commit trims the interface down considerably.
2024-10-31 01:27:22 +01:00
Andrew Kelley
504ad56815 link.flushTaskQueue: move safety lock
The safety lock needs to happen after check()
2024-10-23 16:27:39 -07:00
Andrew Kelley
ba71079837 combine codegen work queue and linker task queue
these tasks have some shared data dependencies so they cannot be done
simultaneously. Future work should untangle these data dependencies so
that more can be done in parallel.

for now this commit ensures correctness by making linker input parsing
and codegen tasks part of the same queue.
2024-10-23 16:27:39 -07:00
Andrew Kelley
336466c9df glibc sometimes makes archives be ld scripts
it is incredible how many bad ideas glibc is bundled into one project.
2024-10-23 16:27:39 -07:00
Andrew Kelley
5d75d8f6fc also find static libc files on the host
and don't look for glibc files on windows
2024-10-23 16:27:39 -07:00
Andrew Kelley
3cc19cd865 better error messages 2024-10-23 16:27:38 -07:00
Andrew Kelley
c2898c436f branch fixes 2024-10-23 16:27:38 -07:00
Andrew Kelley
5ca54036ca move linker input file parsing to the compilation pipeline 2024-10-23 16:27:38 -07:00
Andrew Kelley
cbcd67ea90 link.MachO: fix missing input classification 2024-10-23 16:27:38 -07:00
Andrew Kelley
e2a71b37d8 fix MachO linking regression 2024-10-23 16:27:38 -07:00
Andrew Kelley
8cfe303da9 fix resolving link inputs 2024-10-23 16:27:38 -07:00