turbonss

motiejus/turbonss

Motiejus Jakštys b3446ef307 document global structure better

2022-02-13 10:42:40 +02:00

deps                 add deps/cmph                       2022-02-09 13:08:25 +02:00
include/deps/cmph    compile cmph from source            2022-02-10 06:07:52 +02:00
src                  less error handling                 2022-02-10 06:12:12 +02:00
.gitignore           first steps                         2022-02-09 12:53:01 +02:00
.gitmodules          add deps/cmph                       2022-02-09 13:08:25 +02:00
build.zig            compile cmph from source            2022-02-10 06:07:52 +02:00
README.md            document global structure better    2022-02-13 10:42:40 +02:00

README.md

Turbo NSS

glibc nss library for passwd and group.

Checking out and building

$ git clone --recursive https://git.sr.ht/~motiejus/turbonss

Alternatively, if you forgot --recursive:

$ git submodule update --init

And run tests:

$ zig build test

Other commands will be documented as they are implemented.

This project uses git subtrac for managing dependencies.

Steps

A known implementation runs id(1) at ~250 invocations per second when called sequentially. Our goal is 10k id(1) invocations per second.

id(1) works as follows:

1. Look up the user by name.
2. Get all additional gids (an array attached to a member).
3. For each additional gid, look up the group name.

Assuming a member is in ~100 groups on average, 10k id(1) invocations per second means 10k × 100 = 1M group lookups per second (cmph can do 1M in <200ms). We need to convert a gid to a group index quickly.
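
The arithmetic behind that budget, as a small Zig sketch. The constant and function names are made up for illustration. The figures are the ones quoted above, and treating the cmph cost as exactly 200ms per 1M lookups is an assumption (the text only gives it as an upper bound).

```zig
const std = @import("std");

// Figures quoted in this README; the names exist only in this sketch.
const target_ids_per_sec: u64 = 10_000;
const groups_per_member: u64 = 100;
// 200 ms / 1M lookups = 200 ns per lookup (assumed upper bound for cmph).
const cmph_ns_per_lookup: u64 = 200;
const ns_per_sec: u64 = 1_000_000_000;

// Group lookups per second needed to reach the id(1) target.
fn groupLookupsPerSec() u64 {
    return target_ids_per_sec * groups_per_member;
}

// Percentage of each second that would be spent inside cmph at that rate.
fn cmphBudgetPercent() u64 {
    return groupLookupsPerSec() * cmph_ns_per_lookup * 100 / ns_per_sec;
}

test "lookup budget" {
    try std.testing.expectEqual(@as(u64, 1_000_000), groupLookupsPerSec());
    try std.testing.expectEqual(@as(u64, 20), cmphBudgetPercent());
}
```

Under these assumptions the hash lookups alone use about a fifth of the time budget, leaving the rest for reading and decoding the group records.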

API

The following operations need to be fast, in order of importance:

1. Look up gid -> group, with or without members (2 separate calls). This is on the hot path in id(1).
2. Look up uid -> user.
3. Look up groupname -> group.
4. Look up username -> user.
5. (optional) Iterate users in a defined order (getent passwd).
6. (optional) Iterate groups in a defined order (getent group).
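
One detail worth sketching for the gid -> group lookup: a minimal perfect hash, such as the ones cmph builds, maps every input to some slot, including keys that were never in the set. The lookup therefore has to compare the stored key before trusting the result. The sketch below is not the turbonss API. `gidToIndex` is a hypothetical stand-in for the cmph call, hard-coded for three gids, and `Group` is a simplified in-memory type.

```zig
const std = @import("std");

// Simplified in-memory group; not the on-disk layout.
const Group = struct { gid: u32, name: []const u8 };

// Hypothetical stand-in for a cmph lookup. It is collision-free for the gids
// it was "built" from (1000, 1001, 1002), but returns some slot for any input.
fn gidToIndex(gid: u32) usize {
    return (gid -% 1000) % 3;
}

// Look up a group by gid; returns null when the gid is not in the table.
fn groupByGid(groups: []const Group, gid: u32) ?Group {
    const candidate = groups[gidToIndex(gid)];
    // The index alone proves nothing: compare the stored key.
    if (candidate.gid != gid) return null;
    return candidate;
}

test "hit and miss" {
    const groups = [_]Group{
        .{ .gid = 1000, .name = "wheel" },
        .{ .gid = 1001, .name = "audio" },
        .{ .gid = 1002, .name = "video" },
    };
    try std.testing.expectEqualStrings("audio", groupByGid(&groups, 1001).?.name);
    // 4000 hashes to slot 0, which holds gid 1000, so the lookup misses.
    try std.testing.expect(groupByGid(&groups, 4000) == null);
}
```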

Indices

Preliminary results of playing with cmph:

BDZ: tried b=3, b=7 (default), and b=10, each with 1M keys.

b              SIZE      LOOKUP TIME FOR 1M KEYS
3              900KB     170ms
7 (default)    338KB     180ms
10             306KB     230ms

Packed vs non-packed latency differences are not meaningful.

CHM retains key order, but 1M keys weigh 8MB. With 10k keys, a CHM index is ~20x larger than a BDZ one, which eliminates the benefit of preserved ordering.
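
To compare the algorithms on equal footing, the sizes above can be converted to bits per key. This is a sketch only: it assumes KB means 1000 bytes and MB means 1,000,000 bytes, which the measurements do not state, and `milliBitsPerKey` is a name made up for this example.

```zig
const std = @import("std");

// Storage cost in thousandths of a bit per key, using integer math only.
fn milliBitsPerKey(bytes: u64, keys: u64) u64 {
    return bytes * 8 * 1000 / keys;
}

test "bits per key for 1M keys" {
    const keys: u64 = 1_000_000;
    // BDZ with b=3, b=7 and b=10.
    try std.testing.expectEqual(@as(u64, 7200), milliBitsPerKey(900_000, keys)); // 7.2 bits
    try std.testing.expectEqual(@as(u64, 2704), milliBitsPerKey(338_000, keys)); // ~2.7 bits
    try std.testing.expectEqual(@as(u64, 2448), milliBitsPerKey(306_000, keys)); // ~2.45 bits
    // CHM: 8MB for 1M keys.
    try std.testing.expectEqual(@as(u64, 64000), milliBitsPerKey(8_000_000, keys)); // 64 bits
}
```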

Full file structure

The turbonss header looks like this:

OFFSET     TYPE     NAME                          DESCRIPTION
   0      [4]u8     magic                         always 0xf09fa4b7
   4         u8     version                       now `0`
   5         u2     padding
             u6     num_shells                    see "SHELLS" section.
   6        u32     num_users                     number of passwd entries
  10        u32     num_groups                    number of group entries
  14        u32     offset_cmph_gid2group
  18        u32     offset_cmph_uid2user
  22        u32     offset_cmph_groupname2group
  26        u32     offset_cmph_username2user
  30        u32     offset_sorted_groups
  34        u32     offset_sorted_users
  38        u32     offset_groupmembers
  42        u32     offset_additional_gids

magic is always 0xf09fa4b7, and version must be 0. All integers are big-endian. Offsets are indices to further sections of the file, counted in 64-byte blocks, with zero being the first block (the one that starts with the magic number). As all blobs are 64-byte aligned, an offset always points to the beginning of a 64-byte "block". Therefore, all offset_* values could be u26. We are keeping them as u32 for now: u32 is easier to read in xxd output, and the header fits in 64 bytes anyway.
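
A minimal sketch of reading this header in Zig, following the table above. It is not the project's actual parser: `Header`, `readU32`, `parseHeader`, `blockToByte`, and the error names are made up for illustration. It also assumes that the u2 padding occupies the two high bits of byte 5, which the table does not specify.

```zig
const std = @import("std");

const block_size = 64; // every section starts on a 64-byte boundary
const header_len = 46; // the last field starts at offset 42 and is 4 bytes wide
const magic = [4]u8{ 0xf0, 0x9f, 0xa4, 0xb7 };

const Header = struct {
    num_shells: u8, // only the low 6 bits are meaningful
    num_users: u32,
    num_groups: u32,
    offset_cmph_gid2group: u32,
    offset_cmph_uid2user: u32,
    offset_cmph_groupname2group: u32,
    offset_cmph_username2user: u32,
    offset_sorted_groups: u32,
    offset_sorted_users: u32,
    offset_groupmembers: u32,
    offset_additional_gids: u32,
};

// Read a big-endian u32 that starts at buf[pos].
fn readU32(buf: []const u8, pos: usize) u32 {
    return (@as(u32, buf[pos]) << 24) | (@as(u32, buf[pos + 1]) << 16) |
        (@as(u32, buf[pos + 2]) << 8) | @as(u32, buf[pos + 3]);
}

// Validate magic and version, then decode the fixed-size header fields.
fn parseHeader(buf: []const u8) !Header {
    if (buf.len < header_len) return error.TooShort;
    if (!std.mem.eql(u8, buf[0..4], &magic)) return error.BadMagic;
    if (buf[4] != 0) return error.BadVersion;
    return Header{
        // Assumption: the u2 padding is the two high bits of byte 5.
        .num_shells = buf[5] & 0x3f,
        .num_users = readU32(buf, 6),
        .num_groups = readU32(buf, 10),
        .offset_cmph_gid2group = readU32(buf, 14),
        .offset_cmph_uid2user = readU32(buf, 18),
        .offset_cmph_groupname2group = readU32(buf, 22),
        .offset_cmph_username2user = readU32(buf, 26),
        .offset_sorted_groups = readU32(buf, 30),
        .offset_sorted_users = readU32(buf, 34),
        .offset_groupmembers = readU32(buf, 38),
        .offset_additional_gids = readU32(buf, 42),
    };
}

// Convert a block index from the header into a byte position in the file.
fn blockToByte(offset: u32) u64 {
    return @as(u64, offset) * block_size;
}

test "parse a minimal header" {
    var buf = [_]u8{0} ** block_size;
    buf[0..4].* = magic;
    buf[5] = 3; // num_shells
    buf[9] = 2; // num_users: low byte of the big-endian u32 at offset 6
    buf[13] = 1; // num_groups
    buf[17] = 1; // offset_cmph_gid2group = block 1
    const h = try parseHeader(&buf);
    try std.testing.expectEqual(@as(u8, 3), h.num_shells);
    try std.testing.expectEqual(@as(u32, 2), h.num_users);
    try std.testing.expectEqual(@as(u32, 1), h.num_groups);
    try std.testing.expectEqual(@as(u64, 64), blockToByte(h.offset_cmph_gid2group));

    buf[0] = 0; // corrupt the magic
    try std.testing.expectError(error.BadMagic, parseHeader(&buf));
}
```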

Primitive types:

const Group = struct {
    gid: u32,
    // index to a separate structure with a list of members. The memberlist is
    // always 2^5-byte aligned; this is an index into it.
    members_offset: u27,
    groupname_len: u5,
    // a groupname_len-sized string, stored immediately after the fields above.
    groupname: []u8,
};
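
A sketch of how the fixed part of a Group record could be decoded from raw bytes. The on-disk bit order is not documented yet, so this makes three assumptions: integers are big-endian as in the header, members_offset takes the high 27 bits of the second 32-bit word with groupname_len in the low 5, and members_offset counts 32-byte units. `GroupFixed` and the function names are hypothetical.

```zig
const std = @import("std");

const group_fixed_len = 8; // gid (4 bytes) + members_offset/groupname_len (4 bytes)

// Decoded form of the fixed-size part of a Group record.
const GroupFixed = struct {
    gid: u32,
    members_offset: u32, // 27 significant bits
    groupname_len: u8, // 5 significant bits
};

// Read a big-endian u32 that starts at buf[pos].
fn readU32(buf: []const u8, pos: usize) u32 {
    return (@as(u32, buf[pos]) << 24) | (@as(u32, buf[pos + 1]) << 16) |
        (@as(u32, buf[pos + 2]) << 8) | @as(u32, buf[pos + 3]);
}

fn decodeGroup(buf: []const u8) GroupFixed {
    return GroupFixed{
        .gid = readU32(buf, 0),
        // Assumption: members_offset is the high 27 bits of the second word...
        .members_offset = readU32(buf, 4) >> 5,
        // ...and groupname_len is the low 5 bits, which sit in the last byte.
        .groupname_len = buf[7] & 0x1f,
    };
}

// The group name follows the fixed part directly.
fn groupName(buf: []const u8) []const u8 {
    const len: usize = decodeGroup(buf).groupname_len;
    return buf[group_fixed_len .. group_fixed_len + len];
}

// Assumption: member lists are 2^5-byte aligned, so the index is scaled by 32.
fn membersBytePos(g: GroupFixed) u64 {
    return @as(u64, g.members_offset) * 32;
}

test "decode a group record" {
    const record = [_]u8{
        0x00, 0x00, 0x03, 0xe8, // gid = 1000
        0x00, 0x00, 0x00, 0x65, // (3 << 5) | 5: members_offset = 3, groupname_len = 5
        'w',  'h',  'e',  'e',
        'l',
    };
    const g = decodeGroup(&record);
    try std.testing.expectEqual(@as(u32, 1000), g.gid);
    try std.testing.expectEqual(@as(u32, 3), g.members_offset);
    try std.testing.expectEqual(@as(u8, 5), g.groupname_len);
    try std.testing.expectEqual(@as(u64, 96), membersBytePos(g));
    try std.testing.expectEqualStrings("wheel", groupName(&record));
}
```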

const User = struct {
    uid: u32,
    gid: u32,
    // pointer to a separate structure that contains a list of gids
    additional_gids_offset: u29,
    // shell is a different story, documented elsewhere.
    shell_here: u1,
    shell_len_or_place: u6,
    home_len: u6,
    username_pos: u1,
    username_len: u5,
    gecos_len: u8,
    // a variable-sized array that is stored immediately after this struct.
    stringdata: []u8,
};

TODO explain:

shells
additional_gids
members