Commit Graph

67 Commits

Author SHA1 Message Date
Igor Anić
d645114f7e add deflate implemented from first principles
Zig deflate compression/decompression implementation. It supports compression and decompression of the gzip, zlib and raw deflate formats.

Fixes #18062.

This PR replaces the current compress/gzip and compress/zlib packages. The deflate package is renamed to flate. Flate is a common name for deflate/inflate, where deflate is compression and inflate is decompression.

There are breaking changes. Method signatures changed because the allocator was removed, and I also unified the API for all three namespaces (flate, gzip, zlib).

For now I have put the old packages under the v1 namespace; they are still available as compress/v1/gzip, compress/v1/zlib and compress/v1/deflate. The idea is to give users of the current API a little time before they have to work out what to change. That raises the question of when it is safe to remove the v1 namespace.

Here is the current API in the compress package:

```Zig
// deflate
    fn compressor(allocator, writer, options) !Compressor(@TypeOf(writer))
    fn Compressor(comptime WriterType) type

    fn decompressor(allocator, reader, null) !Decompressor(@TypeOf(reader))
    fn Decompressor(comptime ReaderType: type) type

// gzip
    fn compress(allocator, writer, options) !Compress(@TypeOf(writer))
    fn Compress(comptime WriterType: type) type

    fn decompress(allocator, reader) !Decompress(@TypeOf(reader))
    fn Decompress(comptime ReaderType: type) type

// zlib
    fn compressStream(allocator, writer, options) !CompressStream(@TypeOf(writer))
    fn CompressStream(comptime WriterType: type) type

    fn decompressStream(allocator, reader) !DecompressStream(@TypeOf(reader))
    fn DecompressStream(comptime ReaderType: type) type

// xz
    fn decompress(allocator: Allocator, reader: anytype) !Decompress(@TypeOf(reader))
    fn Decompress(comptime ReaderType: type) type

// lzma
    fn decompress(allocator, reader) !Decompress(@TypeOf(reader))
    fn Decompress(comptime ReaderType: type) type

// lzma2
    fn decompress(allocator, reader, writer) !void

// zstandard
    fn DecompressStream(ReaderType, options) type
    fn decompressStream(allocator, reader) DecompressStream(@TypeOf(reader), .{})
    struct decompress
```

The proposed naming convention:
 - Compressor/Decompressor for functions which return a type, like Reader/Writer/GeneralPurposeAllocator
 - compressor/decompressor for functions which are initializers for that type, like reader/writer/allocator
 - compress/decompress for one-shot operations, which accept a reader/writer pair, like read/write/alloc

```Zig
/// Compress from reader and write compressed data to the writer.
fn compress(reader: anytype, writer: anytype, options: Options) !void

/// Create a Compressor which writes compressed data to the writer.
fn compressor(writer: anytype, options: Options) !Compressor(@TypeOf(writer))

/// Compressor type
fn Compressor(comptime WriterType: type) type

/// Decompress from reader and write plain data to the writer.
fn decompress(reader: anytype, writer: anytype) !void

/// Create Decompressor which reads from reader.
fn decompressor(reader: anytype) Decompressor(@TypeOf(reader))

/// Decompressor type
fn Decompressor(comptime ReaderType: type) type

```
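To make the naming convention concrete, here is a minimal sketch of the compression half of the triple. It is an illustration only: it uses a toy run-length encoder rather than deflate, and it assumes hypothetical duck-typed writers (anything with `writeAll([]const u8) !void`) and readers (anything with `read([]u8) !usize`). The `Options`, `Sink` and `Source` types are made up for the example and are not the real flate types.

```zig
const std = @import("std");

/// Hypothetical options for this toy codec, not the real flate Options.
pub const Options = struct {
    /// Longest run encoded in one (count, byte) pair.
    max_run: u8 = 255,
};

/// Compressor: returns a type, like Reader/Writer. WriterType needs a
/// `writeAll([]const u8) !void` method. Toy run-length coding, NOT deflate.
pub fn Compressor(comptime WriterType: type) type {
    return struct {
        const Self = @This();

        inner: WriterType,
        options: Options,
        run_byte: u8 = 0,
        run_len: u8 = 0,

        /// Feed plain bytes; all of them are consumed.
        pub fn write(self: *Self, bytes: []const u8) !usize {
            for (bytes) |b| {
                if (self.run_len > 0 and b == self.run_byte and self.run_len < self.options.max_run) {
                    self.run_len += 1;
                } else {
                    try self.flushRun();
                    self.run_byte = b;
                    self.run_len = 1;
                }
            }
            return bytes.len;
        }

        /// Emit the pending run; call once after the last write.
        pub fn finish(self: *Self) !void {
            try self.flushRun();
        }

        fn flushRun(self: *Self) !void {
            if (self.run_len == 0) return;
            try self.inner.writeAll(&[_]u8{ self.run_len, self.run_byte });
            self.run_len = 0;
        }
    };
}

/// compressor: initializer for that type, like reader/writer.
pub fn compressor(writer: anytype, options: Options) Compressor(@TypeOf(writer)) {
    return .{ .inner = writer, .options = options };
}

/// compress: one-shot operation over a reader/writer pair, like read/write.
pub fn compress(reader: anytype, writer: anytype, options: Options) !void {
    var cmp = compressor(writer, options);
    var buf: [4096]u8 = undefined;
    while (true) {
        const n = try reader.read(&buf);
        if (n == 0) break;
        _ = try cmp.write(buf[0..n]);
    }
    try cmp.finish();
}

/// Test helper: fixed-capacity writer.
const Sink = struct {
    buf: [64]u8 = undefined,
    len: usize = 0,

    pub fn writeAll(self: *Sink, bytes: []const u8) error{NoSpaceLeft}!void {
        if (self.len + bytes.len > self.buf.len) return error.NoSpaceLeft;
        @memcpy(self.buf[self.len..][0..bytes.len], bytes);
        self.len += bytes.len;
    }
};

/// Test helper: reader over a byte slice.
const Source = struct {
    data: []const u8,
    pos: usize = 0,

    pub fn read(self: *Source, dest: []u8) error{}!usize {
        const n = @min(dest.len, self.data.len - self.pos);
        @memcpy(dest[0..n], self.data[self.pos..][0..n]);
        self.pos += n;
        return n;
    }
};

test "one-shot compress" {
    var src = Source{ .data = "aaab" };
    var out = Sink{};
    try compress(&src, &out, .{});
    try std.testing.expectEqualSlices(u8, &[_]u8{ 3, 'a', 1, 'b' }, out.buf[0..out.len]);
}
```

The decompression half would follow the same three-name pattern: `Decompressor(ReaderType)`, `decompressor(reader)` and `decompress(reader, writer)`.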

Comparing this implementation with the one currently in Zig's standard library (std):
std is roughly 1.2-1.4 times slower in decompression and 1.1-1.2 times slower in compression. Compressed sizes are practically the same in both cases.
More results are in [this](https://github.com/ianic/flate) repo.

This library uses static allocations for all structures and doesn't require an allocator. That makes the most sense for deflate, where all structures and internal buffers are allocated to their full size. It makes a little less sense for inflate, where the std version uses less memory by not preallocating arrays to their theoretical maximum size, since those arrays are usually not fully used.

For deflate, this library allocates 395K while std allocates 779K.
For inflate, this library allocates 74.5K while std allocates around 36K.

The inflate difference comes from using a 64K history window here instead of the 32K window used in std.
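The allocation trade-off above follows from sizing buffers at compile time. As an illustration only (these are not the library's actual types), a hypothetical `History(size)` ring buffer shows how a fixed-size history window lives inside the struct itself, so its memory cost is visible through `@sizeOf` and no allocator is involved. The 64K and 32K sizes come from the figures above; the names and layout are assumptions.

```zig
const std = @import("std");

/// Hypothetical inflate history window with a compile-time size. The buffer is
/// a plain array field, so no allocator is needed: the struct can sit on the
/// stack or be embedded by value in a larger decompressor struct.
pub fn History(comptime size: usize) type {
    return struct {
        const Self = @This();

        buf: [size]u8 = undefined,
        /// Total bytes written so far; the ring position is `pos % size`.
        pos: usize = 0,

        /// Append one decoded byte to the window.
        pub fn put(self: *Self, b: u8) void {
            self.buf[self.pos % size] = b;
            self.pos += 1;
        }

        /// LZ77 back-reference: copy `len` bytes starting `distance` bytes back.
        /// Copies byte by byte so overlapping matches (distance < len) repeat.
        pub fn copyBack(self: *Self, distance: usize, len: usize) void {
            std.debug.assert(distance > 0 and distance <= self.pos and distance <= size);
            var i: usize = 0;
            while (i < len) : (i += 1) {
                self.put(self.buf[(self.pos - distance) % size]);
            }
        }
    };
}

test "window size determines memory" {
    // Doubling the window from 32K to 64K adds exactly 32K to the struct.
    const diff = @sizeOf(History(64 * 1024)) - @sizeOf(History(32 * 1024));
    try std.testing.expectEqual(@as(usize, 32 * 1024), diff);
}
```

Because the window is part of the type, a larger window costs more memory for every instance whether or not a stream ever uses it, which is the inflate trade-off described above.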

If this is merged, existing uses of compress gzip/zlib/deflate will need some changes. Here is an example with the necessary changes noted in comments:

```Zig

const std = @import("std");

// To get this file:
// wget -nc -O war_and_peace.txt https://www.gutenberg.org/ebooks/2600.txt.utf-8
const data = @embedFile("war_and_peace.txt");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    try oldDeflate(allocator);
    try new(std.compress.flate, allocator);

    try oldZlib(allocator);
    try new(std.compress.zlib, allocator);

    try oldGzip(allocator);
    try new(std.compress.gzip, allocator);
}

pub fn new(comptime pkg: type, allocator: std.mem.Allocator) !void {
    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    // Compressor
    var cmp = try pkg.compressor(buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.finish();

    var fbs = std.io.fixedBufferStream(buf.items);
    // Decompressor
    var dcp = pkg.decompressor(fbs.reader());

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

pub fn oldDeflate(allocator: std.mem.Allocator) !void {
    const deflate = std.compress.v1.deflate;

    // Compressor
    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();
    // Remove allocator
    // Rename deflate -> flate
    var cmp = try deflate.compressor(allocator, buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.close(); // Rename to finish
    cmp.deinit(); // Remove

    // Decompressor
    var fbs = std.io.fixedBufferStream(buf.items);
    // Remove allocator and last param
    // Rename deflate -> flate
    // Remove try
    var dcp = try deflate.decompressor(allocator, fbs.reader(), null);
    defer dcp.deinit(); // Remove

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

pub fn oldZlib(allocator: std.mem.Allocator) !void {
    const zlib = std.compress.v1.zlib;

    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    // Compressor
    // Rename compressStream => compressor
    // Remove allocator
    var cmp = try zlib.compressStream(allocator, buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.finish();
    cmp.deinit(); // Remove

    var fbs = std.io.fixedBufferStream(buf.items);
    // Decompressor
    // Rename decompressStream => decompressor
    // Remove allocator
    // Remove try
    var dcp = try zlib.decompressStream(allocator, fbs.reader());
    defer dcp.deinit(); // Remove

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

pub fn oldGzip(allocator: std.mem.Allocator) !void {
    const gzip = std.compress.v1.gzip;

    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    // Compressor
    // Rename compress => compressor
    // Remove allocator
    var cmp = try gzip.compress(allocator, buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.close(); // Rename to finish
    cmp.deinit(); // Remove

    var fbs = std.io.fixedBufferStream(buf.items);
    // Decompressor
    // Rename decompress => decompressor
    // Remove allocator
    // Remove try
    var dcp = try gzip.decompress(allocator, fbs.reader());
    defer dcp.deinit(); // Remove

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

```
2024-02-14 18:28:20 +01:00
Jakub Konka
216a5594f6 elf: use u32 for all section indexes 2024-02-13 20:33:08 +01:00
Jakub Konka
e401930fa8 elf: store relative offsets in atom and symbol 2024-02-13 20:33:01 +01:00
Jakub Konka
e5483b4ffc elf: fix 32bit build 2024-02-13 10:48:10 +01:00
Jakub Konka
8bd01eb7a9 elf: refactor archive specific object parsing logic 2024-02-12 23:59:19 +01:00
Jakub Konka
a94d5895cf elf: do not prealloc input objects, pread selectively 2024-02-12 23:07:51 +01:00
Jakub Konka
ce58f68903 elf: merge all mergeable string rodata sections into one 2024-01-26 05:48:32 +01:00
Jakub Konka
7a96907b92 elf: check for and report duplicate symbol definitions 2024-01-14 20:39:00 +01:00
Andrew Kelley
2047a6b82d fix remaining compile errors except one 2024-01-01 17:51:20 -07:00
Andrew Kelley
4629708787 linker: fix some allocator references 2024-01-01 17:51:20 -07:00
Andrew Kelley
f54471b54c compiler: miscellaneous branch progress
implement builtin.zig file population for all modules rather than
assuming there is only one global builtin.zig module.

move some fields from link.File to Compilation
move some fields from Module to Compilation

compute debug_format in global Compilation config resolution

wire up C compilation to the concept of owner modules

make whole cache mode call link.File.createEmpty() instead of
link.File.open()
2024-01-01 17:51:19 -07:00
Andrew Kelley
0789e91eeb linkers: update references to "options" field 2024-01-01 17:51:19 -07:00
Andrew Kelley
5a6a1f8a8a linker: update target references 2024-01-01 17:51:19 -07:00
Jakub Konka
ee1630beea elf: exit early with an error when parsing or init failed 2023-12-05 16:31:47 +01:00
Jakub Konka
e349bb2b66 elf: upcast e_shnum to u64 to check for valid ranges 2023-12-05 14:27:03 +01:00
Jakub Konka
52959bba7c elf: re-instate basic error reporting for LD script parser 2023-12-05 14:08:04 +01:00
Jakub Konka
3f42ed3ca2 elf: do not write ELF header if there were errors 2023-12-05 13:49:55 +01:00
Jakub Konka
af8621db2d elf: report error at the point where it is happening 2023-12-05 13:28:47 +01:00
Jakub Konka
6f3bbd5eaa elf: we were writing too many symbols in the symtab 2023-11-15 19:00:13 +01:00
Jakub Konka
6e797d8648 elf: add SHF_INFO_LINK flag to any emitted SHT_RELA section 2023-11-09 19:41:50 +01:00
Jakub Konka
acd7cbf0b5 elf: init output COMDAT group sections 2023-11-09 17:41:14 +01:00
Jakub Konka
1f8dd27e40 elf: correctly format output .eh_frame when emitting relocatable 2023-11-09 14:46:28 +01:00
Jakub Konka
ae08f9bfe9 elf: claim unresolved dangling symbols as undef externs in -r mode 2023-11-08 11:51:11 +01:00
Jakub Konka
e87c751558 elf: reference .rela sections via output section index 2023-11-08 10:57:34 +01:00
Jakub Konka
5e78600f0f elf: actually track output symtab index of symbols 2023-11-07 23:18:41 +01:00
Jakub Konka
0211d6bf4f elf: create link between .rela and output section 2023-11-07 14:42:27 +01:00
Jakub Konka
e22b3595c1 elf: update .rela section sizes 2023-11-07 14:29:44 +01:00
Jakub Konka
c7ed7c4690 elf: generate section symbols when writing symtab 2023-11-07 13:31:31 +01:00
Jakub Konka
3df53d1722 elf: create skeleton of required changes for supporting -r mode 2023-11-07 11:19:55 +01:00
Jakub Konka
8142925c7e elf: hook up saving object files in an archive 2023-11-05 13:37:13 +01:00
Jakub Konka
55fa8a04f1 elf: add hooks for archiving Objects 2023-11-05 12:56:17 +01:00
Jakub Konka
5c48236103 elf: init objects after parsing them 2023-11-05 12:37:15 +01:00
Jakub Konka
25c53f08a6 elf: redo strings management in the linker
* atom names - are stored locally and pulled from defining object's
  strtab
* local symbols - same
* global symbols - in principle, we could store them locally, but
  for better debugging experience - when things go wrong - we
  store the offsets in a global strtab used by the symbol resolver
2023-11-04 09:08:16 +01:00
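The "offsets in a global strtab" scheme from this commit message can be sketched as follows. This is an illustration under assumptions, not the linker's code: `StrTab`, `insert` and `get` are hypothetical names, and the fixed capacity only keeps the example allocator-free.

```zig
const std = @import("std");

/// Hypothetical ELF-style string table: names are appended as NUL-terminated
/// bytes into one buffer and referred to by their u32 byte offset, which is
/// what a symbol would store instead of owning its name. Fixed capacity keeps
/// the sketch allocator-free; a real linker's table would grow.
pub fn StrTab(comptime capacity: usize) type {
    return struct {
        const Self = @This();

        buf: [capacity]u8 = undefined,
        len: usize = 0,

        /// Append `name` plus a NUL terminator and return its offset.
        pub fn insert(self: *Self, name: []const u8) error{NoSpaceLeft}!u32 {
            if (self.len + name.len + 1 > capacity) return error.NoSpaceLeft;
            const off = self.len;
            @memcpy(self.buf[off..][0..name.len], name);
            self.buf[off + name.len] = 0;
            self.len += name.len + 1;
            return @intCast(off);
        }

        /// Resolve an offset back to the name stored there.
        pub fn get(self: *const Self, off: u32) []const u8 {
            return std.mem.sliceTo(self.buf[off..self.len], 0);
        }
    };
}

test "offsets resolve back to names" {
    var strtab = StrTab(64){};
    const main_off = try strtab.insert("main");
    const start_off = try strtab.insert("_start");
    try std.testing.expectEqual(@as(u32, 5), start_off);
    try std.testing.expectEqualStrings("main", strtab.get(main_off));
    try std.testing.expectEqualStrings("_start", strtab.get(start_off));
}
```

Keeping every global name in one table means a debugger or log can print any symbol from just its offset, which matches the debugging rationale given in the commit message.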
Jakub Konka
8087ec8e8c elf: improve parsing of ld scripts and actually test linking against them 2023-10-24 19:03:00 +02:00
Jakub Konka
52e0ca1312 elf: parse GNU ld script as system lib indirection 2023-10-18 13:54:43 +02:00
Jakub Konka
d2727b808c elf: fix 32bit build 2023-10-16 19:56:47 +02:00
Jakub Konka
7b2cbcf0fe codegen+elf: check if extern is a variable ref 2023-10-16 19:33:06 +02:00
Jakub Konka
1efc0519ce elf: make init/fini sorting deterministic 2023-10-16 19:33:05 +02:00
Jakub Konka
def7190e84 elf: hook up common symbols handler 2023-10-16 19:33:04 +02:00
Jakub Konka
5fa90afb64 elf: fix synthetic section handling and actually parse DSOs 2023-10-16 19:33:04 +02:00
Jakub Konka
04a8f217c6 elf: fix COMDAT deduping logic 2023-10-16 19:33:04 +02:00
Jakub Konka
d6cec5a586 elf: add more prepwork for linking c++ objects 2023-10-16 19:33:04 +02:00
Jakub Konka
2c2bc66ce1 elf: handle .eh_frame and non-alloc sections 2023-10-16 19:33:04 +02:00
Jakub Konka
9ccd94d560 elf: refactor object.shdrContents to never error out 2023-10-16 19:33:04 +02:00
Jakub Konka
53340544c6 elf: get hello-world with LLVM in Zig working 2023-10-16 19:33:04 +02:00
Jakub Konka
1b70ad622b elf: port zld's allocation mechanism 2023-10-16 19:33:04 +02:00
Jakub Konka
6faed6269f elf: update section sizes accumulated from objects 2023-10-16 19:33:04 +02:00
Jakub Konka
679accd887 elf: initialize output sections from input objects in a separate step 2023-10-16 19:33:04 +02:00
Jakub Konka
89c2151a97 elf: move logic for extracting atom's code into input files 2023-09-28 18:35:26 +02:00
Jakub Konka
af00ac53b5 elf: report fatal linker error for unhandled relocation types 2023-09-28 14:06:12 +02:00