motiejus/zig - zig - gitea: Gitea Service

Author	SHA1	Message	Date
mlugg	37a9a4e0f1	compiler: refactor `Zcu.File` and path representation This commit makes some big changes to how we track state for Zig source files. In particular, it changes: * How `File` tracks its path on-disk * How AstGen discovers files * How file-level errors are tracked * How `builtin.zig` files and modules are created The original motivation here was to address incremental compilation bugs with the handling of files, such as #22696. To fix this, a few changes are necessary. Just like declarations may become unreferenced on an incremental update, meaning we suppress analysis errors associated with them, it is also possible for all imports of a file to be removed on an incremental update, in which case file-level errors for that file should be suppressed. As such, after AstGen, the compiler must traverse files (starting from analysis roots) and discover the set of "live files" for this update. Additionally, the compiler's previous handling of retryable file errors was not very good; the source location the error was reported as was based only on the first discovered import of that file. This source location also disappeared on future incremental updates. So, as a part of the file traversal above, we also need to figure out the source locations of imports which errors should be reported against. Another observation I made is that the "file exists in multiple modules" error was not implemented in a particularly good way (I get to say that because I wrote it!). It was subject to races, where the order in which different imports of a file were discovered affects both how errors are printed, and which module the file is arbitrarily assigned, with the latter in turn affecting which other files are considered for import. The thing I realised here is that while the AstGen worker pool is running, we cannot know for sure which module(s) a file is in; we could always discover an import later which changes the answer. So, here's how the AstGen workers have changed. 
We initially ensure that `zcu.import_table` contains the root files for all modules in this Zcu, even if we don't know any imports for them yet. Then, the AstGen workers do not need to be aware of modules. Instead, they simply ignore module imports, and only spin off more workers when they see a by-path import. During AstGen, we can't use module-root-relative paths, since we don't know which modules files are in; but we don't want to unnecessarily use absolute paths either, because those are non-portable and can make `error.NameTooLong` more likely. As such, I have introduced a new abstraction, `Compilation.Path`. This type is a way of representing a filesystem path which has a canonical form. The path is represented relative to one of a few special directories: the lib directory, the global cache directory, or the local cache directory. As a fallback, we use absolute (or cwd-relative on WASI) paths. This is kind of similar to `std.Build.Cache.Path` with a pre-defined list of possible `std.Build.Cache.Directory`, but has stricter canonicalization rules based on path resolution to make sure deduplicating files works properly. A `Compilation.Path` can be trivially converted to a `std.Build.Cache.Path` from a `Compilation`, but is smaller, has a canonical form, and has a digest which will be consistent across different compiler processes with the same lib and cache directories (important when we serialize incremental compilation state in the future). `Zcu.File` and `Zcu.EmbedFile` both contain a `Compilation.Path`, which is used to access the file on-disk; module-relative sub paths are used quite rarely (`EmbedFile` doesn't even have one now for simplicity). After the AstGen workers all complete, we know that any file which might be imported is definitely in `import_table` and up-to-date. So, we perform a single-threaded graph traversal; similar to the role `resolveReferences` plays for `AnalUnit`s, but for files instead. 
We figure out which files are alive, and which module each file is in. If a file turns out to be in multiple modules, we set a field on `Zcu` to indicate this error. If a file is in a different module to a prior update, we set a flag instructing `updateZirRefs` to invalidate all dependencies on the file. This traversal also discovers "import errors"; these are errors associated with a specific `@import`. With Zig's current design, there is only one possible error here: "import outside of module root". This must be identified during this traversal instead of during AstGen, because it depends on which module the file is in. I tried also representing "module not found" errors in this same way, but it turns out to be much more useful to report those in Sema, because of use cases like optional dependencies where a module import is behind a comptime-known build option. For simplicity, `failed_files` now just maps to `?[]u8`, since the source location is always the whole file. In fact, this allows removing `LazySrcLoc.Offset.entire_file` completely, slightly simplifying some error reporting logic. File-level errors are now directly built in the `std.zig.ErrorBundle.Wip`. If the payload is not `null`, it is the message for a retryable error (i.e. an error loading the source file), and will be reported with a "file imported here" note pointing to the import site discovered during the single-threaded file traversal. The last piece of fallout here is how `Builtin` works. Rather than constructing "builtin" modules when creating `Package.Module`s, they are now constructed on-the-fly by `Zcu`. The map `Zcu.builtin_modules` maps from digests to `Package.Module`s. These digests are abstract hashes of the `Builtin` value; i.e. all of the options which are placed into "builtin.zig". During the file traversal, we populate `builtin_modules` as needed, so that when we see these imports in Sema, we just grab the relevant entry from this map. 
This eliminates a bunch of awkward state tracking during construction of the module graph. It's also now clearer exactly what options the builtin module has, since previously it inherited some options arbitrarily from the first-created module with that "builtin" module! The user-visible effects of this commit are: * retryable file errors are now consistently reported against the whole file, with a note pointing to a live import of that file * some theoretical bugs where imports are wrongly considered distinct (when the import path moves out of the cwd and then back in) are fixed * some consistency issues with how file-level errors are reported are fixed; these errors will now always be printed in the same order regardless of how the AstGen pass assigns file indices * incremental updates do not print retryable file errors differently between updates or depending on file structure/contents * incremental updates support files changing modules * incremental updates support files becoming unreferenced Resolves: #22696	2025-05-18 17:37:02 +01:00
Alex Rønne Petersen	bc3c50c21e	Merge pull request #23700 from sorairolake/rename-trims chore(std.mem): Rename `trimLeft` and `trimRight` to `trimStart` and `trimEnd`	2025-05-12 17:11:52 +02:00
Alex Rønne Petersen	833d4c9ce4	Merge pull request #23835 from alexrp/freebsd-libc Support dynamically-linked FreeBSD libc when cross-compiling	2025-05-12 01:19:23 +02:00
Alex Rønne Petersen	837e0f9c37	std.Target: Remove ObjectFormat.nvptx (and associated linker code). Textual PTX is just assembly language like any other. And if we do ever add support for emitting PTX object files after reverse engineering the bytecode format, we'd be emitting ELF files like the CUDA toolchain. So there's really no need for a special ObjectFormat tag here, nor linker code that treats it as a distinct format.	2025-05-10 12:21:57 +02:00
Alex Rønne Petersen	610d3cf9de	compiler: Move vendored library support to `libs` subdirectory.	2025-05-10 12:19:26 +02:00
Shun Sakai	5fc4448e45	chore(std.mem): Rename `trimLeft` and `trimRight` Rename `trimLeft` to `trimStart`, and `trimRight` to `trimEnd`. `trimLeft` and `trimRight` functions remain as deprecated aliases for these new names.	2025-04-27 18:03:59 +09:00
Alex Rønne Petersen	30e254fc31	link: Stub out GOFF/XCOFF linker code based on LLVM. This allows emitting object files for s390x-zos (GOFF) and powerpc(64)-aix (XCOFF). Note that GOFF emission in LLVM is still being worked on upstream for LLVM 21; the resulting object files are useless right now. Also, -fstrip is required, or LLVM will SIGSEGV during DWARF emission.	2025-04-27 03:52:52 +02:00
Jacob Young	6705cbd5eb	codegen: fix packed byte-aligned relocations Closes #23131	2025-03-23 18:35:34 -04:00
mlugg	9f235a105b	link: mark prelink tasks as processed under `-fno-emit-bin` The old logic only decremented `remaining_prelink_tasks` if `bin_file` was not `null`. This meant that on `-fno-emit-bin` builds with registered prelink tasks (e.g. C source files), we exited from `Compilation.performAllTheWorkInner` early, assuming a prelink error. Instead, when `bin_file` is `null`, we still decrement `remaining_prelink_tasks`; we just don't do any actual work. Resolves: #22682	2025-03-22 21:44:46 -04:00
mlugg	725c825829	link: make sure MachO closes the damn files Windows is a ridiculous operating system designed by toddlers, and so requires us to close all file handles in the `tmp/xxxxxxx` cache dir before renaming it into `o/xxxxxxx`. We have a hack in place to handle this for the main output file, but the MachO linker also outputs a file with debug symbols, and we weren't closing it! This led to a fuckton of CI failures when we enabled `.whole` cache mode by default for self-hosted backends. thanks jacob for figuring this out while i sat there	2025-03-02 16:39:18 -05:00
David Rubin	931178494f	Compilation: correct when to include ubsan	2025-02-25 11:22:33 -08:00
David Rubin	95720f007b	move libubsan to `lib/` and integrate it into `-fubsan-rt`	2025-02-25 11:22:33 -08:00
Andrew Kelley	61b69a418d	Merge pull request #22659 from ifreund/linker-script-fix link: fix ambiguous names in linker scripts	2025-02-22 18:18:24 -05:00
Alex Rønne Petersen	f87b443af1	link.MachO: Add support for the -x flag (discard local symbols). This can also be extended to ELF later as it means roughly the same thing there. This addresses the main issue in #21721 but as I don't have a macOS machine to do further testing on, I can't confirm whether zig cc is able to pass the entire cgo test suite after this commit. It can, however, cross-compile a basic program that uses cgo to x86_64-macos-none which previously failed due to lack of -x support. Unlike previously, the resulting symbol table does not contain local symbols (such as C static functions). I believe this satisfies the related donor bounty: https://ziglang.org/news/second-donor-bounty	2025-02-22 06:35:19 +01:00
Alex Rønne Petersen	481b7bf3f0	std.Target: Remove functions that just wrap component functions. Functions like isMinGW() and isGnuLibC() have a good reason to exist: They look at multiple components of the target. But functions like isWasm(), isDarwin(), isGnu(), etc only exist to save 4-8 characters. I don't think this is a good enough reason to keep them, especially given that: * It's not immediately obvious to a reader whether target.isDarwin() means the same thing as target.os.tag.isDarwin() precisely because isMinGW() and similar functions do look at multiple components. * It's not clear where we would draw the line. The logical conclusion before this commit would be to also wrap Arch.isX86(), Os.Tag.isSolarish(), Abi.isOpenHarmony(), etc... this obviously quickly gets out of hand. * It's nice to just have a single correct way of doing something.	2025-02-17 19:18:19 +01:00
Isaac Freund	0499c731ea	link: simplify control flow This refactor was left out of the previous commit to make the diff less noisy and easier to review. There should be no change in behavior.	2025-02-10 23:24:32 +01:00
Isaac Freund	819716b59f	link: fix ambiguous names in linker scripts Currently zig fails to build while linking the system LLVM/C++ libraries on my Chimera Linux system due to the fact that libc++.so is a linker script with the following contents: INPUT(libc++.so.1 -lc++abi -lunwind) Prior to this commit, zig would try to convert "ambiguous names" in linker scripts such as libc++.so.1 in this example into -lfoo style flags. This fails in this case due to the so version number as zig checks for exactly the .so suffix. Furthermore, I do not think that this conversion is semantically correct since converting libfoo.so to -lfoo could theoretically end up resulting in libfoo.a getting linked which seems wrong when a different file is specified in the linker script. With this patch, this attempted conversion is removed. Instead, zig always first checks if the exact file/path in the linker script exists relative to the current working directory. If the file is classified as a library (including versioned shared objects such as libfoo.so.1), zig then falls back to checking if the exact file/path in the linker script exists relative to each directory in the library search path, selecting the first match or erroring out if none is found. This behavior fixes the regression that prevents building zig while linking the system LLVM/C++ libraries on Chimera Linux.	2025-02-10 23:19:48 +01:00
Meghan Denny	9142482372	std.ArrayList: popOrNull() -> pop() [v2] (#22720)	2025-02-10 04:21:31 +00:00
mlugg	d3ca10d5d8	Zcu: remove `*_loaded` fields on `File` Instead, `source`, `tree`, and `zir` should all be optional. This is precisely what we're actually trying to model here; and `File` isn't optimized for memory consumption or serializability anyway, so it's fine to use a couple of extra bytes on actual optionals here.	2025-02-04 16:20:29 +00:00
Andrew Kelley	94648a0383	fix merge conflicts with updating line numbers	2025-01-15 15:11:36 -08:00
Andrew Kelley	5b18af85cb	type checking for synthetic functions	2025-01-15 15:11:36 -08:00
Andrew Kelley	4fccb5ae7a	wasm linker: improve error messages by making source locations more lazy	2025-01-15 15:11:36 -08:00
Andrew Kelley	1a4c5837fe	wasm linker: fix crashes when parsing compiler_rt	2025-01-15 15:11:36 -08:00
Andrew Kelley	d1cde847a3	implement the prelink phase in the frontend this strategy uses a "postponed" queue to handle codegen tasks that spawn too early. there's probably a better way.	2025-01-15 15:11:36 -08:00
Andrew Kelley	070b973c4a	wasm linker: allow undefined imports when lib name is provided and expose object_host_name as an option for setting the lib name for object files, since the wasm linking standards don't specify a way to do it.	2025-01-15 15:11:36 -08:00
Andrew Kelley	edd592d371	fix compilation when enabling llvm	2025-01-15 15:11:35 -08:00
Andrew Kelley	943dac3e85	compiler: add type safety for export indices	2025-01-15 15:11:35 -08:00
Andrew Kelley	9bf715de74	rework error handling in the backends	2025-01-15 15:11:35 -08:00
Andrew Kelley	77accf597d	elf linker: conform to explicit error sets	2025-01-15 15:11:35 -08:00
Andrew Kelley	da25ed95fc	macho linker conforms to explicit error sets, again	2025-01-15 15:11:35 -08:00
Andrew Kelley	16180f525a	macho linker: conform to explicit error sets Makes linker functions have small error sets, required to report diagnostics properly rather than having a massive error set that has a lot of codes. Other linker implementations are not ported yet. Also the branch is not passing semantic analysis yet.	2025-01-15 15:11:35 -08:00
Andrew Kelley	795e7c64d5	wasm linker: aggressive DODification The goals of this branch are to: * compile faster when using the wasm linker and backend * enable saving compiler state by directly copying in-memory linker state to disk. * more efficient compiler memory utilization * introduce integer type safety to wasm linker code * generate better WebAssembly code * fully participate in incremental compilation * do as much work as possible outside of flush(), while continuing to do linker garbage collection. * avoid unnecessary heap allocations * avoid unnecessary indirect function calls In order to accomplish these goals, this removes the ZigObject abstraction, as well as Symbol and Atom. These abstractions resulted in overly generic code, doing unnecessary work, and needless complications that simply go away by creating a better in-memory data model and emitting more things lazily. For example, this makes wasm codegen emit MIR which is then lowered to wasm code during linking, with optimal function indexes etc, or relocations are emitted if outputting an object. Previously, this would always emit relocations, which are fully unnecessary when emitting an executable, and required all function calls to use the maximum size LEB encoding. This branch introduces the concept of the "prelink" phase which occurs after all object files have been parsed, but before any Zcu updates are sent to the linker. This allows the linker to fully parse all objects into a compact memory model, which is guaranteed to be complete when Zcu code is generated. This commit is not a complete implementation of all these goals; it is not even passing semantic analysis.	2025-01-15 15:11:35 -08:00
mlugg	065e10c95c	link: new incremental line number update API	2025-01-05 02:20:56 +00:00
mlugg	3afda4322c	compiler: analyze type and value of global declaration separately This commit separates semantic analysis of the annotated type vs value of a global declaration, therefore allowing recursive and mutually recursive values to be declared. Every `Nav` which undergoes analysis now has two corresponding `AnalUnit`s: `.{ .nav_val = n }` and `.{ .nav_ty = n }`. The `nav_val` unit is responsible for fully resolving the `Nav`: determining its value, linksection, addrspace, etc. The `nav_ty` unit, on the other hand, resolves only the information necessary to construct a pointer to the `Nav`: its type, addrspace, etc. (It does also analyze its linksection, but that could be moved to `nav_val` I think; it doesn't make any difference). Analyzing a `nav_ty` for a declaration with no type annotation will just mark a dependency on the `nav_val`, analyze it, and finish. Conversely, analyzing a `nav_val` for a declaration with a type annotation will first mark a dependency on the `nav_ty` and analyze it, using this as the result type when evaluating the value body. The `nav_val` and `nav_ty` units always have references to one another: so, if a `Nav`'s type is referenced, its value implicitly is too, and vice versa. However, these dependencies are trivial, so, to save memory, are only known implicitly by logic in `resolveReferences`. In general, analyzing ZIR `decl_val` will only analyze `nav_ty` of the corresponding `Nav`. There are two exceptions to this. If the declaration is an `extern` declaration, then we immediately ensure the `Nav` value is resolved (which doesn't actually require any more analysis, since such a declaration has no value body anyway). Additionally, if the resolved type has type tag `.@"fn"`, we again immediately resolve the `Nav` value. The latter restriction is in place for two reasons: * Functions are special, in that their externs are allowed to trivially alias; i.e. 
with a declaration `extern fn foo(...)`, you can write `const bar = foo;`. This is not allowed for non-function externs, and it means that function types are the only place where it is possible for a declaration `Nav` to have a `.@"extern"` value without actually being declared `extern`. We need to identify this situation immediately so that the `decl_ref` can create a pointer to the real extern `Nav`, not this alias. * In certain situations, such as taking a pointer to a `Nav`, Sema needs to queue analysis of a runtime function if the value is a function. To do this, the function value needs to be known, so we need to resolve the value immediately upon `&foo` where `foo` is a function. This restriction is simple to codify into the eventual language specification, and doesn't limit the utility of this feature in practice. A consequence of this commit is that codegen and linking logic needs to be more careful when looking at `Nav`s. In general: * When `updateNav` or `updateFunc` is called, it is safe to assume that the `Nav` being updated (the owner `Nav` for `updateFunc`) is fully resolved. * Any `Nav` whose value is/will be an `@"extern"` or a function is fully resolved; see `Nav.getExtern` for a helper for a common case here. * Any other `Nav` may only have its type resolved. This didn't seem to be too tricky to satisfy in any of the existing codegen/linker backends. Resolves: #131	2024-12-24 02:18:41 +00:00
Jacob Young	5c76e08f49	lldb: add pretty printer for intern pool indices	2024-12-20 22:51:20 -05:00
Andrew Kelley	d37ee79535	std.Build.Cache.hit: more discipline in error handling Previous commits `2b0929929d` `4ea2f441df` had this text: > There are no dir components, so you would think that this was > unreachable, however we have observed on macOS two processes racing to > do openat() with O_CREAT manifest in ENOENT. This appears to have been a misunderstanding based on the issue report #12138 and corresponding PR #12139 in which the steps to reproduce removed the cache directory in a loop which also executed detached Zig compiler processes. There is no evidence for the macOS kernel bug however the ENOENT is easily explained by the removal of the cache directory. This commit reverts those commits, ultimately reporting the ENOENT as an error rather than repeating the create file operation. However this commit also adds an explicit error set to `std.Build.Cache.hit` as well as changing the `failed_file_index` to a proper diagnostic field that fully communicates what failed, leading to more informative error messages on failure to check the cache. The equivalent failure when occurring for AstGen performs a fatal process kill, reasoning being that the compiler has an invariant of the cache directory not being yanked out from underneath it while executing. This could be made a more granular error in the future but I suspect such a thing is not valuable to pursue. Related to #18340 but does not solve it.	2024-12-10 18:11:12 -08:00
Andrew Kelley	11bf2d92de	diversify "unable to spawn" failure messages to help understand where a spurious failure is occurring	2024-11-26 13:56:40 -08:00
Andrew Kelley	f2dcfe0e40	link.File.Wasm: parse inputs in compilation pipeline Primarily, this moves linker input parsing from flush() into the linker task queue, which is executed simultaneously with the frontend. I also made it avoid redundantly opening the same archive file N times for each object file inside. Furthermore, hard code fixed buffer stream rather than using a generic stream type. Finally, I fixed the error handling of the Wasm.Archive.parse function. Please pay attention to this pattern of returning a struct rather than accepting a mutable struct as an argument. This ensures function-level atomicity and makes resource management straightforward. Deletes the file and path fields from Archive and Object. Removed a well-meaning but ultimately misguided suggestion about how to think about ZigObject since thinking about it that way has led to problematic anti-DOD patterns.	2024-10-30 23:43:53 -07:00
Alex Rønne Petersen	16b87f7082	link: Fix archive format selection for some OSs. * AIX has its own bespoke format. * Handle all Apple platforms. * FreeBSD and OpenBSD both use the GNU format in LLVM. * Windows has since been switched to the COFF format by default in LLVM.	2024-10-31 01:33:49 +01:00
Alex Rønne Petersen	f1f804e532	zig_llvm: Reduce our exposure to LLVM API breakage. LLVM recently introduced new Triple::ArchType members in 19.1.3 which broke our static assertions in zig_llvm.cpp. When implementing a fix for that, I realized that we don't even need a lot of the stuff we have in zig_llvm.(cpp,h) anymore. This commit trims the interface down considerably.	2024-10-31 01:27:22 +01:00
Andrew Kelley	504ad56815	link.flushTaskQueue: move safety lock The safety lock needs to happen after check()	2024-10-23 16:27:39 -07:00
Andrew Kelley	ba71079837	combine codegen work queue and linker task queue these tasks have some shared data dependencies so they cannot be done simultaneously. Future work should untangle these data dependencies so that more can be done in parallel. for now this commit ensures correctness by making linker input parsing and codegen tasks part of the same queue.	2024-10-23 16:27:39 -07:00
Andrew Kelley	336466c9df	glibc sometimes makes archives be ld scripts it is incredible how many bad ideas glibc has bundled into one project.	2024-10-23 16:27:39 -07:00
Andrew Kelley	5d75d8f6fc	also find static libc files on the host and don't look for glibc files on windows	2024-10-23 16:27:39 -07:00
Andrew Kelley	3cc19cd865	better error messages	2024-10-23 16:27:38 -07:00
Andrew Kelley	c2898c436f	branch fixes	2024-10-23 16:27:38 -07:00
Andrew Kelley	5ca54036ca	move linker input file parsing to the compilation pipeline	2024-10-23 16:27:38 -07:00
Andrew Kelley	cbcd67ea90	link.MachO: fix missing input classification	2024-10-23 16:27:38 -07:00
Andrew Kelley	e2a71b37d8	fix MachO linking regression	2024-10-23 16:27:38 -07:00
Andrew Kelley	8cfe303da9	fix resolving link inputs	2024-10-23 16:27:38 -07:00


413 Commits
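The `Compilation.Path` idea described in commit 37a9a4e0f1 (a path kept in canonical form relative to the lib, global cache, or local cache directory, with an absolute fallback and a digest that is stable across compiler processes) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the compiler's implementation: the names `CanonicalPath` and `canonicalize`, the root tags, preferring the deepest matching root, and SHA-256 for the digest are all hypothetical, and paths are resolved purely lexically with POSIX separators.

```python
import hashlib
import posixpath
from dataclasses import dataclass

# Hypothetical root tags mirroring the special directories named in the commit.
ROOTS = ("lib", "global_cache", "local_cache")


@dataclass(frozen=True)
class CanonicalPath:
    root: str      # one of ROOTS, or "none" for the absolute-path fallback
    sub_path: str  # path relative to the root ("" means the root itself)

    def digest(self) -> str:
        # Depends only on (root, sub_path), so two processes that share the
        # same lib and cache directories compute the same value.
        h = hashlib.sha256()
        h.update(self.root.encode())
        h.update(b"\x00")  # separator so ("ab", "c") != ("a", "bc")
        h.update(self.sub_path.encode())
        return h.hexdigest()


def canonicalize(path: str, cwd: str, dirs: dict[str, str]) -> CanonicalPath:
    """Lexically resolve `path` against `cwd`, then express it relative to
    the deepest special directory containing it (absolute fallback otherwise).

    `dirs` maps each tag in ROOTS to an absolute, normalized directory.
    """
    resolved = posixpath.normpath(posixpath.join(cwd, path))
    best = None
    for tag in ROOTS:
        root = dirs[tag]
        if resolved == root or resolved.startswith(root.rstrip("/") + "/"):
            if best is None or len(root) > len(dirs[best]):
                best = tag
    if best is None:
        return CanonicalPath("none", resolved)
    rel = posixpath.relpath(resolved, dirs[best])
    return CanonicalPath(best, "" if rel == "." else rel)
```

With a cwd of `/home/u/proj`, both `src/main.zig` and `../proj/src/main.zig` canonicalize to the same value, so the two imports deduplicate to one file. That is the kind of "wrongly considered distinct" import the commit message says is now fixed.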
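The single-threaded file traversal from commit 37a9a4e0f1 runs after AstGen. Starting from the analysis roots, it finds the live files, assigns each one to a module, flags files reached from more than one module, and reports by-path imports that escape the module root. It might look roughly like the breadth-first sketch below. The `Module` shape, the `traverse` function, and the rule that an import string ending in `.zig` is a by-path import are simplifying assumptions for illustration. Special modules such as `std`, `builtin`, and `root`, as well as incremental-update bookkeeping, are omitted. As in the commit, unknown module names are not reported during the traversal and are left to Sema.

```python
import posixpath
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Module:
    root_dir: str   # by-path imports may not escape this directory
    root_file: str  # path of the module's root source file
    deps: dict[str, str] = field(default_factory=dict)  # import name -> module name


def traverse(modules, roots, imports):
    """Find live files, assign each to a module, and collect errors.

    modules: module name -> Module
    roots:   analysis-root module names, in a fixed order
    imports: file path -> import strings in source order (assumed rule:
             strings ending in ".zig" are by-path, others name a module)
    """
    file_module = {}        # live file -> module that first reached it
    multi_module = {}       # file -> every module that reached it
    import_errors = []      # (importing file, import string) outside module root
    first_import_site = {}  # file -> (importer, import string), for error notes
    queue = deque((modules[m].root_file, m, None) for m in roots)
    while queue:
        path, mod, site = queue.popleft()
        if path in file_module:
            if file_module[path] != mod:
                multi_module.setdefault(path, {file_module[path]}).add(mod)
            continue
        file_module[path] = mod
        if site is not None:
            first_import_site[path] = site
        for imp in imports.get(path, []):
            if imp.endswith(".zig"):
                target = posixpath.normpath(
                    posixpath.join(posixpath.dirname(path), imp))
                root = modules[mod].root_dir.rstrip("/") + "/"
                if not target.startswith(root):
                    import_errors.append((path, imp))  # outside module root
                    continue
                queue.append((target, mod, (path, imp)))
            elif imp in modules[mod].deps:
                dep = modules[mod].deps[imp]
                queue.append((modules[dep].root_file, dep, (path, imp)))
            # Unknown module names are deliberately skipped (Sema reports them).
    return file_module, multi_module, import_errors, first_import_site
```

In this sketch, roots and imports are visited in a fixed order. As a result, the module assignment and the recorded import sites do not depend on the order in which parallel AstGen workers discovered files. This mirrors the ordering guarantee the commit message describes.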
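Commit 819716b59f describes the new lookup order for a bare file name in a linker script's `INPUT(...)` list. The exact path is tried relative to the current working directory first. Only if the name is classified as a library does the linker fall back to each library search directory in order, taking the first match and erroring out otherwise. A Python sketch of that order follows. The function name, the regular expression used to classify libraries (`.a`, and `.so` with optional version suffixes), and the injectable `exists` callback are assumptions made for illustration. `-l` entries such as `-lc++abi` are not covered here.

```python
import posixpath
import re

# Assumed library classification: static archives (.a) and shared objects
# (.so), including versioned ones such as libc++.so.1.
LIBRARY_RE = re.compile(r"\.(a|so(\.[0-9]+)*)$")


def resolve_script_input(name, cwd, lib_dirs, exists):
    """Resolve one file name from a linker script's INPUT(...) list.

    `exists` is injected (e.g. os.path.exists) so the lookup order is testable.
    """
    # 1. The exact name relative to the current working directory
    #    (posixpath.join returns an absolute `name` unchanged).
    candidate = posixpath.join(cwd, name)
    if exists(candidate):
        return candidate
    # 2. Only libraries fall back to the library search path; first match wins.
    if LIBRARY_RE.search(name):
        for lib_dir in lib_dirs:
            candidate = posixpath.join(lib_dir, name)
            if exists(candidate):
                return candidate
    raise FileNotFoundError(f"{name}: not found (referenced by linker script)")
```

`libc++.so.1` is looked up under exactly that name rather than being converted to an `-l` flag. This lets a versioned shared object like the one in the Chimera Linux script resolve, and it avoids accidentally linking a static `libc++.a` in its place.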