zig

fork of https://codeberg.org/ziglang/zig

commit 772debb03a3f15cfba5d100925260d86c1fcc9dc (tree)
parent e313584a488dde3b4edb2f2e6e97a646596fcbf3
Author: Alex Kladov <aleksey.kladov@gmail.com>
Date:   Tue, 18 Jul 2023 14:40:41 +0100

reduce AstGen.numberLiteral stack usage

At the moment, the LLVM IR we generate for this fn is

define internal fastcc void @AstGen.numberLiteral ...  {
Entry:
  ...
  %16 = alloca %"fmt.parse_float.decimal.Decimal(f128)", align 8
  ...

That `Decimal` is huuuge! It stores

    pub const max_digits =  11564;
    digits: [max_digits]u8,

on the stack.

It comes from the `convertSlow` function, which LLVM happily inlined,
despite it being the cold path. Forbid inlining it so that callers are
not penalized with excessive stack usage.
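
A minimal sketch of the pattern, not the actual std code: a hot entry
point with a small frame, and a cold fallback that owns the large
buffer. `BigScratch`, `parse` and `parseSlow` are hypothetical names
made up for illustration. It assumes a Zig version from around the time
of this commit, where `@setCold` exists; newer releases replaced it with
`@branchHint(.cold)`. Marking a function cold is only a hint to the
optimizer, so this discourages inlining rather than guaranteeing it.

```zig
const std = @import("std");

// Hypothetical stand-in for Decimal(f128): a large fixed-size buffer.
const BigScratch = struct {
    digits: [11564]u8 = undefined,
    len: usize = 0,
};

// Cold fallback. If the optimizer honors the hint and does not inline
// this function, the ~11 KiB buffer is reserved only in this frame.
fn parseSlow(s: []const u8) u64 {
    @setCold(true); // assumption: 2023-era Zig; later `@branchHint(.cold)`

    var scratch = BigScratch{};
    for (s) |c| {
        if (c < '0' or c > '9') continue; // skip separators such as '_'
        if (scratch.len == scratch.digits.len) break;
        scratch.digits[scratch.len] = c - '0';
        scratch.len += 1;
    }

    var result: u64 = 0;
    for (scratch.digits[0..scratch.len]) |d| {
        result = result *% 10 +% d; // wrapping, to keep the sketch total
    }
    return result;
}

// Hot path: short, all-digit inputs never touch the big buffer.
fn parse(s: []const u8) u64 {
    if (s.len <= 19) { // 19 decimal digits always fit in a u64
        var result: u64 = 0;
        for (s) |c| {
            if (c < '0' or c > '9') return parseSlow(s);
            result = result * 10 + (c - '0');
        }
        return result;
    }
    return parseSlow(s);
}
```

Whether the large alloca really stays out of the caller can be checked
the same way as above, by looking at the LLVM IR emitted for the hot
function.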

Backstory: I was looking for needless memcpys in TigerBeetle, and came up
with this copyhound.zig tool for doing just that:

   https://github.com/tigerbeetle/tigerbeetle/blob/ee67e2ab95ed7ccf909be377dc613869738d48b4/src/copyhound.zig

Got curious, ran it on Zig's own code base, and looked at some of
the worst offenders.

List of worst offenders:

warning: crypto.kyber_d00.Kyber.SecretKey.decaps: 7776 bytes memcpy
warning: crypto.ff.Modulus.powPublic: 8160 bytes memcpy
warning: AstGen.numberLiteral: 11584 bytes memcpy
warning: crypto.tls.Client.init__anon_133566: 13984 bytes memcpy
warning: http.Client.connectUnproxied: 16896 bytes memcpy
warning: crypto.tls.Client.init__anon_133566: 16904 bytes memcpy
warning: objcopy.ElfFileHelper.tryCompressSection: 32768 bytes memcpy

Note from Andrew: I removed `noinline` from this commit since it should
be enough to set it to be cold.

Diffstat:
M lib/std/fmt/parse_float/convert_slow.zig | 5 +++++
1 file changed, 5 insertions(+), 0 deletions(-)

diff --git a/lib/std/fmt/parse_float/convert_slow.zig b/lib/std/fmt/parse_float/convert_slow.zig
@@ -32,7 +32,12 @@ pub fn getShift(n: usize) usize {
 ///
 /// The algorithms described here are based on "Processing Long Numbers Quickly",
 /// available here: <https://arxiv.org/pdf/2101.11408.pdf#section.11>.
+///
+/// Note that this function needs a lot of stack space and is marked
+/// cold to hint against inlining into the caller.
 pub fn convertSlow(comptime T: type, s: []const u8) BiasedFp(T) {
+    @setCold(true);
+
     const MantissaT = mantissaType(T);
     const min_exponent = -(1 << (math.floatExponentBits(T) - 1)) + 1;
     const infinite_power = (1 << math.floatExponentBits(T)) - 1;