turbonss

motiejus/turbonss

Motiejus Jakštys 0c33c27e56 reduce number of global shells

2022-02-12 23:06:03 +02:00

Turbo NSS

A glibc NSS library for the passwd and group databases.

Checking out and building

$ git clone --recursive https://git.sr.ht/~motiejus/turbonss

Alternatively, if you forgot --recursive:

$ git submodule update --init

Then run the tests:

$ zig build test

Other commands will be documented as they are implemented.

This project uses git subtrac for managing dependencies.

Steps

A known implementation handles ~250 sequential id(1) invocations per second. Our goal is 10k per second.

id(1) works as follows:

look up the user by name.
get all of the user's additional gids (an array attached to the user).
for each additional gid, get the group name.
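A minimal sketch of these three steps, using simplified in-memory records and linear scans where turbonss would use its cmph indices. `User`, `Group`, `userByName`, `groupByGid` and `idGroupNames` are hypothetical names, not the project's API. The point is the shape of the work: one user lookup followed by one gid lookup per additional group, which is why the gid lookup dominates.

```zig
const std = @import("std");

// Hypothetical, simplified in-memory records (not the on-disk layout).
const User = struct {
    name: []const u8,
    uid: u32,
    gid: u32,
    additional_gids: []const u32,
};

const Group = struct {
    name: []const u8,
    gid: u32,
};

// Step 1: look up the user by name. A linear scan stands in for an index.
fn userByName(users: []const User, name: []const u8) ?User {
    for (users) |u| {
        if (std.mem.eql(u8, u.name, name)) return u;
    }
    return null;
}

// Step 3 helper: gid -> group. This is the hot path an index must make fast.
fn groupByGid(groups: []const Group, gid: u32) ?Group {
    for (groups) |g| {
        if (g.gid == gid) return g;
    }
    return null;
}

// Steps 1-3: resolve the user, then the name of every additional gid.
// Writes group names into `out` and returns how many were written.
fn idGroupNames(users: []const User, groups: []const Group, name: []const u8, out: [][]const u8) usize {
    const user = userByName(users, name) orelse return 0;
    var n: usize = 0;
    for (user.additional_gids) |gid| {
        if (n == out.len) break;
        const g = groupByGid(groups, gid) orelse continue;
        out[n] = g.name;
        n += 1;
    }
    return n;
}
```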

Assuming a user is in ~100 groups on average, 10k id(1) calls per second amount to 1M group lookups per second (cmph can do 1M lookups in <200ms). We therefore need to convert a gid to a group index quickly.
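A back-of-the-envelope check of this estimate, using only the figures above and assuming the cmph measurement was taken on a single core: at the target rate, hashing alone would use under 20% of one core, leaving the rest of the budget for reading the records themselves.

```latex
\begin{aligned}
10^{4}\,\tfrac{\text{id(1) calls}}{\text{s}} \times 100\,\tfrac{\text{groups}}{\text{call}}
  &= 10^{6}\,\tfrac{\text{group lookups}}{\text{s}} \\
\frac{200\,\text{ms}}{10^{6}\,\text{lookups}} &= 200\,\text{ns per lookup (upper bound)} \\
10^{6}\,\tfrac{\text{lookups}}{\text{s}} \times 200\,\text{ns} &= 0.2\,\text{s of CPU time per second (upper bound)}
\end{aligned}
```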

API

The following operations need to be fast, in order of importance:

lookup gid -> group, with or without members (2 separate calls); this is on the hot path of id(1).
lookup uid -> user.
lookup groupname -> group.
lookup username -> user.
(optional) iterate users using a defined order (getent passwd).
(optional) iterate groups using a defined order (getent group).
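The two optional iteration calls need a defined order, while a minimal perfect hash (as used for the lookups) places records in an arbitrary order. A minimal sketch of one way to reconcile the two, assuming the order is by name (the README does not say which order is used): keep a separate array of record indices, sorted once when the file is built. `User`, `nameLessThan` and `buildSortedIndex` are hypothetical names; the code targets Zig 0.11 or later.

```zig
const std = @import("std");

// Hypothetical, simplified user record.
const User = struct {
    name: []const u8,
    uid: u32,
};

// Orders two record indices by the names of the records they point to.
fn nameLessThan(users: []const User, a: u32, b: u32) bool {
    return std.mem.lessThan(u8, users[a].name, users[b].name);
}

// Fills `order` with indices into `users`, sorted by user name, so that
// iteration order does not depend on where the records themselves live.
fn buildSortedIndex(users: []const User, order: []u32) void {
    std.debug.assert(order.len == users.len);
    for (order, 0..) |*slot, i| slot.* = @intCast(i);
    std.mem.sort(u32, order, users, nameLessThan);
}
```

Iterating `getent passwd`-style then means walking `order` and reading `users[order[i]]`.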

Indices

Preliminary results from experimenting with cmph:

BDZ: tried b=3, b=7 (default), and b=10.

With these settings, BDZ needs 900KB, 338KB and 306KB, respectively, for 1M keys.
Looking up 1M keys takes 170ms, 180ms and 230ms, respectively.
Packed vs non-packed latency differences are not meaningful.
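Normalizing the BDZ figures per key (taking 1 KB = 1000 bytes) makes the trade-off easier to see: going from b=3 to b=7 shrinks the structure by about 2.7x for ~6% more latency, while b=10 saves a further ~10% of space at ~28% more latency than b=7.

```latex
\begin{aligned}
\text{bits per key} &= \frac{8 \times \text{size in bytes}}{10^{6}\ \text{keys}},
  \qquad \text{latency per lookup} = \frac{\text{total time}}{10^{6}} \\
b=3:\quad \frac{8 \times 900\,000}{10^{6}} &= 7.2\ \text{bits/key}, \quad 170\,\text{ns} \\
b=7:\quad \frac{8 \times 338\,000}{10^{6}} &\approx 2.7\ \text{bits/key}, \quad 180\,\text{ns} \\
b=10:\quad \frac{8 \times 306\,000}{10^{6}} &\approx 2.4\ \text{bits/key}, \quad 230\,\text{ns}
\end{aligned}
```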

CHM preserves key order; however, 1M keys take 8MB. For 10k keys, CHM is ~20x larger than BDZ, which outweighs the benefit of preserved ordering.
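BDZ builds a minimal perfect hash function (MPHF): a function that maps each of the n known keys to a distinct slot in [0, n). Two consequences matter for the file format. Slots come out in an arbitrary order (hence the interest in order-preserving CHM), and the function returns some slot for any input, including keys that are not in the set, so a lookup must compare the stored key before trusting the result. The sketch below demonstrates both with a deliberately naive construction: a brute-force search for a hash seed that happens to be collision-free. It is not BDZ, not cmph's API, and only works for tiny key sets; `findSeed`, `buildTable` and `lookup` are hypothetical names, and the code targets Zig 0.11 or later.

```zig
const std = @import("std");

// Brute force only finds a seed quickly for very small key sets.
const max_keys = 8;

// Hashes a gid with the given seed into a slot in [0, n).
fn slotOf(key: u32, seed: u64, n: u64) usize {
    const h = std.hash.Wyhash.hash(seed, std.mem.asBytes(&key));
    return @intCast(h % n);
}

// Toy MPHF construction: try seeds until every key lands in a distinct slot.
// NOT the BDZ algorithm; it only reproduces the resulting property.
fn findSeed(keys: []const u32) ?u64 {
    std.debug.assert(keys.len <= max_keys);
    var seed: u64 = 0;
    while (seed < 1_000_000) : (seed += 1) {
        var used = [_]bool{false} ** max_keys;
        var ok = true;
        for (keys) |k| {
            const s = slotOf(k, seed, keys.len);
            if (used[s]) {
                ok = false;
                break;
            }
            used[s] = true;
        }
        if (ok) return seed;
    }
    return null;
}

const GroupRec = struct {
    gid: u32,
    name: []const u8,
};

// Stores each record in the slot its gid hashes to.
fn buildTable(records: []const GroupRec, seed: u64, table: []GroupRec) void {
    std.debug.assert(table.len == records.len);
    for (records) |r| {
        table[slotOf(r.gid, seed, table.len)] = r;
    }
}

// The MPHF yields a slot even for unknown gids, so the stored key is checked.
fn lookup(table: []const GroupRec, seed: u64, gid: u32) ?GroupRec {
    if (table.len == 0) return null;
    const rec = table[slotOf(gid, seed, table.len)];
    return if (rec.gid == gid) rec else null;
}
```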

Full file structure

The file starts with a magic number and a version number, followed by a list of User and Group records and their indices. All offsets are relative to the beginning of the file.

const File = struct {
    magic: [4]u8,
    version: u4,
    padding0: u4,
    num_shells: u6,
    padding1: u2,
    num_users: u32,
    num_groups: u32,
    // section offsets, in 64-byte blocks from the start of the file
    offset_cmph_gid2group: u26,
    offset_cmph_uid2user: u26,
    offset_cmph_groupname2group: u26,
    offset_cmph_username2user: u26,
    offset_sorted_groups: u26,
    offset_sorted_users: u26,
    offset_groupmembers: u26,
    offset_additional_gids: u26,
};

The magic is 0xf09fa4b7, and the version must be 0. Offsets point to further sections of the file, with zero being the first block (the one that starts with the magic number). As all blobs are 64-byte aligned, each offset is the index of a 64-byte block rather than a byte offset. A u26 block index therefore covers 2^26 × 64 bytes = 4 GiB, the same range as a 32-bit byte offset. All numbers are little-endian.

As of writing, the file header is 40 bytes.
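The 40-byte figure can be checked by adding up the bit widths of the header fields, which works out only if the fields are packed with no gaps. The sketch below does that sum and also shows how a u26 block index becomes a byte offset; the largest block index ends exactly at 4 GiB. `header_field_bits`, `headerSizeBytes`, `blockToByteOffset` and `hasValidMagic` are hypothetical helpers, and the magic is assumed to be stored as the bytes f0 9f a4 b7, in the order written above.

```zig
const std = @import("std");

// Bit widths of the header fields, in the order of the README's File struct.
const header_field_bits = [_]u16{
    32, // magic: [4]u8
    4, 4, // version, padding
    6, 2, // num_shells, padding
    32, 32, // num_users, num_groups
    26, 26, 26, 26, 26, 26, 26, 26, // the eight section offsets
};

// Sums the bit widths; only meaningful if the fields are packed without gaps.
fn headerSizeBytes() u16 {
    var bits: u16 = 0;
    for (header_field_bits) |b| {
        bits += b;
    }
    return bits / 8;
}

const block_size: u64 = 64;

// Converts a u26 block index into a byte offset from the start of the file.
fn blockToByteOffset(block: u26) u64 {
    return @as(u64, block) * block_size;
}

// Assumed on-disk byte order of the magic: f0 9f a4 b7
// (which is also the UTF-8 encoding of U+1F937).
const magic = [4]u8{ 0xf0, 0x9f, 0xa4, 0xb7 };

fn hasValidMagic(file: []const u8) bool {
    return file.len >= magic.len and std.mem.eql(u8, file[0..magic.len], &magic);
}
```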

Primitive types:

const Group = struct {
    gid: u32,
    // index to a separate structure with a list of members
    members_offset: u29,
    padding: u3,
    groupname_len: u8,
    // a variable-sized array that will be stored immediately after this
    // struct.
    stringdata: []u8,
};

const User = struct {
    uid: u32,
    gid: u32,
    // pointer to a separate structure that contains a list of gids
    additional_gids_offset: u29,
    // shell is a different story, documented elsewhere.
    shell_here: u1,
    shell_len_or_place: u6,
    home_len: u6,
    username_len: u6,
    gecos_len: u8,
    // a variable-sized array that will be stored immediately after this
    // struct.
    stringdata: []u8,
};
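The variable-length part of a User can be decoded by walking `stringdata` with the length fields. The README does not fix the order of the strings or the shell encoding, so this sketch assumes, hypothetically, that `stringdata` holds the username, home and gecos in that order, followed by the shell only when `shell_here` is 1 (in which case `shell_len_or_place` is its length; otherwise the shell is assumed to come from the global shell list). `UserStrings`, `take` and `splitUserStrings` are illustrative names.

```zig
const std = @import("std");

// Hypothetical decoded view of a User record's stringdata.
const UserStrings = struct {
    username: []const u8,
    home: []const u8,
    gecos: []const u8,
    // null when the shell is not stored inline (assumed: it then lives in
    // the global shell list, indexed by shell_len_or_place)
    inline_shell: ?[]const u8,
};

// Returns the next `len` bytes of `data` starting at `pos.*` and advances `pos`.
fn take(data: []const u8, pos: *usize, len: usize) error{Truncated}![]const u8 {
    if (data.len - pos.* < len) return error.Truncated;
    const s = data[pos.*..][0..len];
    pos.* += len;
    return s;
}

// ASSUMED order inside stringdata: username, home, gecos, then the shell
// when shell_here == 1. The README does not specify this order.
fn splitUserStrings(
    stringdata: []const u8,
    username_len: u6,
    home_len: u6,
    gecos_len: u8,
    shell_here: u1,
    shell_len_or_place: u6,
) error{Truncated}!UserStrings {
    var pos: usize = 0;
    const username = try take(stringdata, &pos, username_len);
    const home = try take(stringdata, &pos, home_len);
    const gecos = try take(stringdata, &pos, gecos_len);
    const shell: ?[]const u8 = if (shell_here == 1)
        try take(stringdata, &pos, shell_len_or_place)
    else
        null;
    return .{ .username = username, .home = home, .gecos = gecos, .inline_shell = shell };
}
```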

TODO explain:

shells
additional_gids
members