Commit Graph

66 Commits

Author SHA1 Message Date
Andrew Kelley
795e7c64d5 wasm linker: aggressive DODification
The goals of this branch are to:
* compile faster when using the wasm linker and backend
* enable saving compiler state by directly copying in-memory linker
  state to disk.
* more efficient compiler memory utilization
* introduce integer type safety to wasm linker code
* generate better WebAssembly code
* fully participate in incremental compilation
* do as much work as possible outside of flush(), while continuing to do
  linker garbage collection.
* avoid unnecessary heap allocations
* avoid unnecessary indirect function calls

In order to accomplish these goals, this removes the ZigObject
abstraction, as well as Symbol and Atom. These abstractions resulted
in overly generic code, doing unnecessary work, and needless
complications that simply go away by creating a better in-memory data
model and emitting more things lazily.

For example, this makes wasm codegen emit MIR which is then lowered to
wasm code during linking, with optimal function indexes etc, or
relocations are emitted if outputting an object. Previously, this would
always emit relocations, which are fully unnecessary when emitting an
executable, and required all function calls to use the maximum size LEB
encoding.

This branch introduces the concept of the "prelink" phase which occurs
after all object files have been parsed, but before any Zcu updates are
sent to the linker. This allows the linker to fully parse all objects
into a compact memory model, which is guaranteed to be complete when Zcu
code is generated.

This commit is not a complete implementation of all these goals; it is
not even passing semantic analysis.
2025-01-15 15:11:35 -08:00
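
The point above about relocations forcing "the maximum size LEB encoding" can be illustrated with a small sketch. This is not the linker's code; the function names `uleb128` and `uleb128_padded` are hypothetical, and the 5-byte width assumes a 32-bit index field that must be patchable in place later.

```python
def uleb128(value: int) -> bytes:
    """Minimal-length unsigned LEB128: 7 payload bits per byte."""
    if value < 0:
        raise ValueError("unsigned encoding only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def uleb128_padded(value: int, width: int = 5) -> bytes:
    """Fixed-width unsigned LEB128 (assumed width: 5 bytes for a u32).

    Every byte except the last carries the continuation bit, so the
    field can be overwritten later without shifting surrounding code.
    """
    if not 0 <= value < (1 << (7 * width)):
        raise ValueError("value does not fit in the requested width")
    out = bytearray()
    for i in range(width):
        byte = (value >> (7 * i)) & 0x7F
        if i < width - 1:
            byte |= 0x80
        out.append(byte)
    return bytes(out)
```

With a known final function index of 3, the minimal form takes one byte while the padded form always takes five, which is the per-call overhead the message describes avoiding for executables.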
Andrew Kelley
e501cf51a0 link.File.Wasm: unify the string tables
Before, the wasm struct had a string table, the ZigObject had a string
table, and each Object had a string table.

Now there is just the one. This makes for more efficient use of memory
and simplifies logic, particularly with regard to linker state
serialization.

This commit additionally adds significantly more integer type safety.
2024-10-31 22:01:10 -07:00
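
As a rough illustration of why a single string table helps both memory use and serialization, here is a minimal sketch. The class name `StringTable` and its methods are hypothetical and do not mirror the linker's actual data structures; the sketch assumes names are stored as 0-terminated byte runs and referenced by offset.

```python
class StringTable:
    """One shared byte buffer; every name is referenced by its offset."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.offsets: dict[str, int] = {}  # lookup index, rebuildable from data

    def intern(self, name: str) -> int:
        """Return the offset of name, appending it only if it is new."""
        off = self.offsets.get(name)
        if off is None:
            off = len(self.data)
            self.data += name.encode("utf-8") + b"\0"
            self.offsets[name] = off
        return off

    def get(self, off: int) -> str:
        """Read from off up to (not including) the 0 terminator."""
        end = self.data.index(0, off)
        return self.data[off:end].decode("utf-8")
```

Because every reference is a plain integer into one buffer, writing the table to disk amounts to writing `data`, and a name shared by several objects is stored once.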
Andrew Kelley
f2dcfe0e40 link.File.Wasm: parse inputs in compilation pipeline
Primarily, this moves linker input parsing from flush() into the linker
task queue, which is executed simultaneously with the frontend.

I also made it avoid redundantly opening the same archive file N times
for each object file inside. Furthermore, it hard-codes a fixed buffer
stream rather than using a generic stream type.

Finally, I fixed the error handling of the Wasm.Archive.parse function.
Please pay attention to this pattern of returning a struct rather than
accepting a mutable struct as an argument. This ensures function-level
atomicity and makes resource management straightforward.

Deletes the file and path fields from Archive and Object.

Removed a well-meaning but ultimately misguided suggestion about how to
think about ZigObject since thinking about it that way has led to
problematic anti-DOD patterns.
2024-10-30 23:43:53 -07:00
Andrew Kelley
ba2d006634 link.File.Wasm: remove the "files" abstraction
Removes the `files` field from the Wasm linker, storing the ZigObject
as its own field instead of using a tagged union.

This removes a layer of indirection when accessing the ZigObject, and
untangles logic so that we can introduce a "pre-link" phase that
prepares the linker state to handle only incremental updates to the
ZigObject and then minimize logic inside flush().

Furthermore, don't make array elements store their own indexes, that's
always a waste.

Flattens some of the file system hierarchy and unifies variable names
for easier refactoring.

Introduces type safety for optional object indexes.
2024-10-30 19:34:58 -07:00
Andrew Kelley
13fb68c064 link: consolidate diagnostics
By organizing linker diagnostics into this struct, it becomes possible
to share more code between linker backends, and more importantly it
becomes possible to pass only the Diag struct to some functions, rather
than passing the entire linker state object in. This makes data
dependencies more obvious, making it easier to rearrange code and to
multithread.

Also fix MachO code abusing an atomic variable. Not only was it using
the wrong atomic operation, it is unnecessary additional state since
the state is already being protected by a mutex.
2024-10-11 10:36:19 -07:00
Linus Groh
8588964972 Replace deprecated default initializations with decl literals 2024-09-12 16:01:23 +01:00
mlugg
0fe3fd01dd std: update std.builtin.Type fields to follow naming conventions
The compiler actually doesn't need any functional changes for this: Sema
does reification based on the tag indices of `std.builtin.Type` already!
So, no zig1.wasm update is necessary.

This change is necessary to disallow name clashes between fields and
decls on a type, which is a prerequisite of #9938.
2024-08-28 08:39:59 +01:00
Jakub Konka
cba3389d90 macho: redo input file parsing in prep for multithreading 2024-07-22 12:05:56 +02:00
sobolevn
4c71d3f29e Fix typos in code comments in src/ 2024-07-20 20:23:18 +03:00
Michael Bradshaw
642093e04b Rename *[UI]LEB128 functions to *[UI]leb128 2024-06-23 04:30:12 +01:00
Luuk de Gram
196ba706a0 wasm: gc fixes and re-enable linker tests
Certain symbols were left unmarked, meaning they were incorrectly
omitted from the final binary. We now mark the synthetic symbols to
ensure they are emitted, as they are only created under the
circumstances in which they are needed. This also re-enables tests that
were left disabled by a previous merge conflict.
Lastly, this adds the shared-memory test to the test harness, as it was
previously forgotten and therefore regressed.
2024-02-29 15:24:08 +01:00
Luuk de Gram
5ba5a2c133 wasm: integrate linker errors with Compilation
Rather than using the logger, we now emit proper compiler errors with
notes, just like the ELF and MachO linkers. We now also support emitting
multiple errors before quitting the linking process in certain phases,
such as symbol resolution. This means we will print all symbols which
were resolved incorrectly, rather than the first one we encounter.
2024-02-29 15:24:07 +01:00
Luuk de Gram
5ef8321338 wasm: make symbol indexes a non-exhaustive enum
This introduces some type safety so we cannot accidentally give an atom
index as a symbol index. This also means we do not have to store any
optionals and therefore allow for memory optimizations. Lastly, we can
now always simply access the symbol index of an atom, rather than having
to call `getSymbolIndex` as it is easy to forget.
2024-02-29 15:24:07 +01:00
Luuk de Gram
0a030d6598 wasm: Use File.Index for symbol locations
Rather than using an optional, we now directly use `File.Index`, which
can already represent an unknown file through its `.null` value. This
means we do not pay the memory cost of the optional.

This type of index is now used for:
- SymbolLoc
- Key of the functions map
- InitFunc

Now we can simply pass things like atom.file, object.file, loc.file etc
whenever we need to access its representing object file which makes it
a lot easier.
2024-02-29 15:23:03 +01:00
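
The memory argument behind replacing an optional with a reserved `.null` value can be sketched as follows. This is an illustration only: the sentinel value `0xFFFF_FFFF`, the `OptionalU32` layout, and the helper `file_of` are assumptions made for the example, not the linker's definitions.

```python
import ctypes


class OptionalU32(ctypes.Structure):
    """A u32 plus a separate 'present' flag, as an optional would need."""
    _fields_ = [("value", ctypes.c_uint32), ("present", ctypes.c_bool)]


# Assumed sentinel: the largest u32 is reserved to mean "no file".
NULL_FILE = 0xFFFF_FFFF


def file_of(index: int):
    """Decode a sentinel-style index: None for 'no file', else the index."""
    return None if index == NULL_FILE else index


optional_size = ctypes.sizeof(OptionalU32)      # flag plus alignment padding
sentinel_size = ctypes.sizeof(ctypes.c_uint32)  # the index alone
```

Under these assumptions the flagged form occupies 8 bytes per entry and the sentinel form 4, at the price of one unusable index value.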
Luuk de Gram
cbc8d33062 wasm: fix symbol resolution and atom processing 2024-02-29 15:23:03 +01:00
Luuk de Gram
143e9599d6 wasm: use File abstraction instead of object
When merging sections we now make use of the `File` abstraction so all
objects such as globals, functions, imports, etc are also merged from
the `ZigObject` module. This allows us to use a singular way to perform
each link action without having to check the kind of the file.
The logic is mostly handled in the abstract file module, unless its
complexity warrants the handling within the corresponding module itself.
2024-02-29 15:23:03 +01:00
Luuk de Gram
2b3e6f680c wasm-linker: ensure custom sections are parsed
Not all custom sections are represented by a symbol, which means the
section will not be parsed by the lazy parsing and therefore get garbage-
collected. This is problematic as it may contain debug information that
should not be garbage-collected. To resolve this, we manually create
local symbols for those sections and also ensure they do not get garbage-
collected.
2024-01-12 14:57:32 +01:00
Andrew Kelley
2047a6b82d fix remaining compile errors except one 2024-01-01 17:51:20 -07:00
Luuk de Gram
596d1cd5a8 wasm-linker: support --no-gc-sections
By default we garbage-collect sections for Wasm to reduce size, as well
as finish linking quicker (as we have fewer things to do). However,
when the user specifies `--no-gc-sections` we ensure all resolved symbols
get marked and therefore do not get garbage collected.
This is supported in both incremental-mode and traditional linking.
2023-11-28 16:40:26 +01:00
Luuk de Gram
8856ba7505 wasm-linker: parse symbols into atoms lazily
Rather than parsing every symbol into an atom, we now only parse them
into an atom when such atom is marked. This means garbage-collected
symbols will also not be parsed into atoms, and neither are discarded
symbols which have been resolved by other symbols. (Such as multiple
weak symbols).

This also introduces a binary search for finding the start index into
the list of relocations. This speeds up finding the corresponding
relocations tremendously, as they are sorted in ascending order by
address.

Lastly, we re-use the memory of atom's data as well as relocations
instead of duplicating it. This means we halve the memory usage of
atom's data and relocations for linked object files. As we are
aware of decls and synthetic atoms, we free the memory of those
atoms independently of the atoms of object files to prevent double-frees.
2023-11-28 15:47:07 +01:00
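
The binary search over address-sorted relocations mentioned above might look roughly like this. The function name `relocs_in_range` and the flat list of offsets are hypothetical simplifications; the only assumption carried over from the message is that relocations are sorted in ascending order by offset.

```python
from bisect import bisect_left


def relocs_in_range(offsets: list[int], start: int, size: int) -> tuple[int, int]:
    """Return the half-open index range [lo, hi) of relocations whose
    offset lies in [start, start + size).

    offsets must be sorted ascending; both bounds are found in O(log n).
    """
    lo = bisect_left(offsets, start)
    hi = bisect_left(offsets, start + size, lo)
    return lo, hi
```

For an atom covering bytes 10 through 29 of a section, `relocs_in_range([4, 10, 18, 32, 40], 10, 20)` selects the relocations at offsets 10 and 18 without scanning from the start of the list.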
Andrew Kelley
d5e21a4f1a std: remove meta.trait
In general, I don't like the idea of std.meta.trait, and so I am
providing some guidance by deleting the entire namespace from the
standard library and compiler codebase.

My main criticism is that it's overcomplicated machinery that bloats
compile times and is ultimately unnecessary given the existence of Zig's
strong type system and reference traces.

Users who want this can create a third party package that provides this
functionality.

closes #18051
2023-11-22 13:24:27 -05:00
mlugg
b355893438 compiler: correct unnecessary uses of 'var' 2023-11-19 11:11:49 +00:00
Andrew Kelley
3fc6fc6812 std.builtin.Endian: make the tags lower case
Let's take this breaking change opportunity to fix the style of this
enum.
2023-10-31 21:37:35 -04:00
Jacob Young
d890e81761 mem: fix ub in writeInt
Use inline to vastly simplify the exposed API.  This allows a
comptime-known endian parameter to be propagated, making extra functions
for a specific endianness completely unnecessary.
2023-10-31 21:37:35 -04:00
Andrew Kelley
accd5701c2 compiler: move struct types into InternPool proper
Structs were previously using `SegmentedList` to be given indexes, but
were not actually backed by the InternPool arrays.

After this, the only remaining uses of `SegmentedList` in the compiler
are `Module.Decl` and `Module.Namespace`. Once those last two are
migrated to become backed by InternPool arrays as well, we can introduce
state serialization via writing these arrays to disk all at once.

Unfortunately there are a lot of source code locations that touch the
struct type API, so this commit is still work-in-progress. Once I get it
compiling and passing the test suite, I can provide some interesting
data points such as how it affected the InternPool memory size and
performance comparison against master branch.

I also couldn't resist migrating a bunch of the alignment API over to
use the log2 Alignment type rather than a mishmash of u32 and u64 byte
units with 0 meaning something implicitly different and special at every
location. Turns out you can do all the math you need directly on the
log2 representation of alignments.
2023-09-21 14:48:40 -07:00
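
To make the closing remark about doing alignment math directly on the log2 representation concrete, here is a small sketch. The helper names are hypothetical and are not the compiler's `Alignment` API; the sketch assumes alignments are powers of two stored as their exponent.

```python
def align_from_bytes(n: int) -> int:
    """Convert a power-of-two byte alignment to its log2 form."""
    if n <= 0 or n & (n - 1):
        raise ValueError("alignment must be a positive power of two")
    return n.bit_length() - 1


def align_to_bytes(log2: int) -> int:
    """Convert back to byte units only when a caller needs them."""
    return 1 << log2


def align_forward(addr: int, log2: int) -> int:
    """Round addr up to the next multiple of 2**log2."""
    mask = (1 << log2) - 1
    return (addr + mask) & ~mask


def align_max(a: int, b: int) -> int:
    """The stricter of two alignments is simply the larger exponent."""
    return max(a, b)
```

In this form there is no special meaning attached to 0 bytes: exponent 0 is an ordinary 1-byte alignment, and comparing or combining alignments needs only integer comparison.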
Luuk de Gram
3fd6e93f4f wasm-linker: prevent double-free on parse failure 2023-07-19 17:22:46 +02:00
mlugg
f26dda2117 all: migrate code to new cast builtin syntax
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:

* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
2023-06-24 16:56:39 -07:00
Eric Joldasov
50339f595a all: zig fmt and rename "@XToY" to "@YFromX"
Signed-off-by: Eric Joldasov <bratishkaerik@getgoogleoff.me>
2023-06-19 12:34:42 -07:00
r00ster91
2593156068 migration: std.math.{min, min3, max, max3} -> @min & @max 2023-06-16 13:44:09 -07:00
Luuk de Gram
9fce1df4cd wasm-linker: implement runtime TLS relocations 2023-03-18 20:13:30 +01:00
Luuk de Gram
fb9d3cd50e wasm-linker: feature verification for shared-mem
When the user enables shared-memory, we must ensure the linked objects
have the 'atomics' and 'bulk-memory' features allowed.
2023-03-18 20:13:29 +01:00
Luuk de Gram
09abd53da7 wasm-linker: refactor Limits and add flags
Rather than adding the flags "on-demand" during limits writing,
we now properly parse them and store the flags within the limits
itself. This also allows us to store whether we're using shared-
memory or not. Only when the correct flag is set will we set the
max within `Limits` or else we will leave it `undefined`.
2023-03-18 20:13:29 +01:00
Luuk de Gram
b0024c4884 wasm-linker: basic TLS support
The linker now parses segments with regard to TLS segments. If the name
represents a TLS segment but does not contain the TLS flag, we set it
manually, as the object file was created using an older compiler (LLVM).

For now we panic when we find a TLS relocation; those will be
implemented later.
2023-03-18 20:13:25 +01:00
Luuk de Gram
87738cad86 wasm-linker: store symbol's virtual address
For data symbols we will now store its virtual address. This means
we no longer have to calculate it each time a relocation asks
for the address. This is now done for all data symbols only once
rather than every single relocation for that symbol.

This now also allows us to directly store the virtual address of synthetic
symbols without having to create an atom for them. This means we also
don't need to have a "synthetic" segment any longer and do not emit
the synthetic symbols such as __heap_end and __heap_base into the final
binary.
2023-03-09 19:14:17 +01:00
Andrew Kelley
aeaef8c0ff update std lib and compiler sources to new for loop syntax 2023-02-18 19:17:21 -07:00
Luuk de Gram
46f54b23ae link: make Wasm atoms fully owned by the linker 2023-02-01 19:10:56 +01:00
Luuk de Gram
dd85092982 wasm-linker: Fix relocations for alias'd atoms
When an atom has one or multiple aliases, we could not find the
target atom from the alias'd symbol. This is solved by ensuring that
we also insert each alias symbol in the symbol-atom map.
2022-12-18 16:37:00 +01:00
Andrew Kelley
ceb0a632cf std.mem.Allocator: allow shrink to fail
closes #13535
2022-11-29 23:30:38 -07:00
Luuk de Gram
7f508480f4 wasm-linker: convert relocation addend to i32
Addends in relocations are signed integers, as in theory they could
be negative. As Atom's offsets are relative to their parent
section, the relocation value should still result in a positive number.
For this reason, the final result is stored as an unsigned integer.

Also, rather than using `null` for relocations that do not support
addends, we set the value to 0 for those relocations,
and have to call `addendIsPresent` to determine if an addend exists
or not. This means each Relocation costs 4 bytes less than before,
saving memory while linking.
2022-10-08 17:23:13 +02:00
Luuk de Gram
61f317e386 wasm-linker: rename self to descriptive name 2022-09-12 21:19:16 +02:00
Jakub Konka
56b96cd61b Merge pull request #12772 from ziglang/coff-basic-imports
coff: implement enough of the incremental linker to pass behavior and incremental tests on Windows
2022-09-09 13:08:58 +02:00
Jakub Konka
678e07b924 macho+wasm: unify and clean up closing file handles 2022-09-07 22:42:59 +02:00
Luuk de Gram
a8d137d05a wasm-linker: support incremental debug info
Although the wasm-linker previously already supported
debug information in incremental-mode, this was no longer
working as-is with the addition of supporting object-file-parsed
debug information. This commit implements the Zig-created debug information
structure from scratch which is a lot more robust and also allows
being linked with debug information from other object files.
2022-09-07 18:59:36 +02:00
Luuk de Gram
46c932a2c9 wasm-linker: perform debug relocations
This correctly performs a relocation for debug sections.
The result is that the wasm-linker can now correctly create
a binary from object files while preserving all debug information.
2022-09-07 18:53:16 +02:00
Luuk de Gram
c347751338 wasm-linker: write debug sections from objects
We now link relocatable debug sections with the correct
section symbol and then allocate and resolve the debug atoms
before writing them into the final binary.

Although this does perform the relocation, the actual relocations
are not done correctly yet.
2022-09-07 18:53:16 +02:00
Luuk de Gram
f060edb0f3 wasm-linker: create atoms from debug sections 2022-09-07 18:53:16 +02:00
Luuk de Gram
9a92f3d290 wasm/Object: parse debug sections into reloc data
Rather than storing the name of a debug section into the structure
`RelocatableData`, we use the `index` field as an offset into the
debug names table. This means we do not have to store an extra 16 bytes
for non-debug sections which can be massive for object files where each
data symbol has its own data section. The name of a debug section
can then be retrieved again when needed by using the offset and
then reading until the 0-delimiter.
2022-09-07 18:53:12 +02:00
Luuk de Gram
1544625df3 wasm/Object: parse using the correct file size
When an object file is being parsed from within an archive
file, we provide the object file size to ensure we do not
read past the object file. This is because follow up object
files can exist there, as well as an LF character to notate
the end of the file was reached. Such a character is invalid
within the object file.

This also fixes a bug in getting the function/global type
for defined globals/functions from object files, as it was missing
the subtraction of the import count of the respective type.
2022-08-20 14:50:11 +02:00
InKryption
a0d3a87ce1 std.fmt: require specifier for unwrapping ?T and E!T 2022-07-26 11:25:49 -07:00
Andrew Kelley
934573fc5d Revert "std.fmt: require specifier for unwrapping ?T and E!T."
This reverts commit 7cbd586ace.

This is causing a failure to build from source:

```
./lib/std/fmt.zig:492:17: error: cannot format optional without a specifier (i.e. {?} or {any})
                @compileError("cannot format optional without a specifier (i.e. {?} or {any})");
                ^
./src/link/MachO/Atom.zig:544:26: note: called from here
                log.debug("  RELA({s}) @ {x} => %{d} in object({d})", .{
                         ^
```

I looked at the code to fix it but none of those args are optionals.
2022-07-24 11:50:10 -07:00