turbonss

motiejus/turbonss


Motiejus Jakštys 1327587838 start with a full file structure

2022-02-13 18:01:44 +02:00


Turbo NSS

Glibc NSS library for passwd and group.

Checking out and building

$ git clone --recursive https://git.sr.ht/~motiejus/turbonss

Alternatively, if you forgot --recursive:

$ git submodule update --init

To run the tests:

$ zig build test

Other commands will be documented as they are implemented.

This project uses git subtrac for managing dependencies.

remarks on `id(1)`

A known NSS implementation runs id(1) sequentially at ~250 invocations per second on a database of ~20k users and ~10k groups. Our target is 10k id/s.

id(1) works as follows:

look up the user by name.
get all of the user's additional gids (an array attached to the user).
for each additional gid, get the group name.
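The three steps can be reproduced with standard libc calls, which is roughly the lookup pattern an NSS module sees from id(1). This is a minimal C sketch assuming glibc or musl (for `getgrouplist(3)`); `id_like_lookup` and `MAX_GROUPS` are hypothetical names, and real id(1) implementations differ in details.

```c
#define _GNU_SOURCE /* getgrouplist(3) is a BSD/GNU extension */
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

#define MAX_GROUPS 1024 /* hypothetical cap, kept simple for the sketch */

/* Mimics id(1): returns the number of groups whose name was resolved,
 * or -1 if the user does not exist (or has more than MAX_GROUPS groups). */
static int id_like_lookup(const char *username) {
    assert(username != NULL);

    /* Step 1: look up the user by name. */
    struct passwd *pw = getpwnam(username);
    if (pw == NULL)
        return -1;

    /* Step 2: get all gids of the user (the primary gid is included). */
    gid_t gids[MAX_GROUPS];
    int ngroups = MAX_GROUPS;
    if (getgrouplist(username, pw->pw_gid, gids, &ngroups) == -1)
        return -1;

    /* Step 3: one gid -> group lookup per group; this is the hot path. */
    int resolved = 0;
    for (int i = 0; i < ngroups; i++) {
        struct group *gr = getgrgid(gids[i]);
        if (gr != NULL) {
            printf("%u(%s)\n", (unsigned)gids[i], gr->gr_name);
            resolved++;
        }
    }
    return resolved;
}
```

Each iteration of the final loop is one gid -> group lookup, which is why that call dominates when a user belongs to many groups.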

Assuming a user is a member of ~100 groups on average, 10k id/s means 1M group lookups per second, a budget of about 1µs per lookup. We need to convert a gid to a group index, and a group index to the group's gid and name, quickly.

Caveat: struct group contains an array of pointers to the names of the group members (char **gr_mem). However, id does not use that information, so resolving it causes significant read amplification. Therefore, if argv[0] == "id", getgrgid(3) will return the group without its members. This speeds up id by about 10x on a known NSS implementation.
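A sketch of how the member list could be skipped, in C. The helpers `skip_members_for` and `fill_group` are hypothetical, not turbonss code; with glibc, the program name could come from the GNU extension `program_invocation_short_name` (declared in `<errno.h>` under `_GNU_SOURCE`). The lean answer still hands back a valid, empty, NULL-terminated `gr_mem`, so a caller that does walk the list simply sees no members.

```c
#include <assert.h>
#include <grp.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: true when the calling program is id(1). Accepts
 * either a bare name ("id") or a path ("/usr/bin/id"). */
static bool skip_members_for(const char *prog) {
    assert(prog != NULL);
    const char *slash = strrchr(prog, '/');
    const char *base = slash ? slash + 1 : prog;
    return strcmp(base, "id") == 0;
}

/* Shared empty, NULL-terminated member list for "lean" answers. */
static char *no_members[] = { NULL };
static char placeholder_passwd[] = "x";

/* Hypothetical helper: fill a struct group, omitting members for id(1). */
static void fill_group(struct group *out, gid_t gid, char *name,
                       char **members, const char *prog) {
    out->gr_name = name;
    out->gr_passwd = placeholder_passwd;
    out->gr_gid = gid;
    out->gr_mem = skip_members_for(prog) ? no_members : members;
}
```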

Indices

The following operations need to be fast, in order of importance:

look up gid -> group (on the hot path of id), with or without members (two separate calls).
look up uid -> user.
look up groupname -> group.
look up username -> user.
(optional) iterate over users in a defined order (getent passwd).
(optional) iterate over groups in a defined order (getent group).

The first four can use perfect hashing, as implemented by cmph: it maps a set of byte strings to distinct sequential integers. Perfect hashing algorithms need some space, and take some time to compute ("hashing duration"). I've tested BDZ, which hashes [][]u8 to a sequential list of integers without preserving order, and CHM, which does the same but preserves order. BDZ accepts a parameter b, 3 <= b <= 10.
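To illustrate what a minimal perfect hash provides, here is a toy one in C. It is not BDZ or CHM: it uses a simple hash-and-displace scheme (bucket the keys, then search a per-bucket seed that sends every key of the bucket to a free slot), chosen only because it fits in a few dozen lines. Like BDZ it does not preserve key order, and it stores a full 32-bit seed per bucket, far more space than BDZ needs. The names `mph_build`, `mph_lookup`, `hash_seeded` and `MAX_KEYS` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_KEYS 64

/* FNV-1a with the seed folded into the initial state, followed by a
 * murmur3-style finalizer so that different seeds give unrelated outputs. */
static uint32_t hash_seeded(const char *key, uint32_t seed) {
    uint32_t h = 2166136261u ^ (seed * 0x9e3779b9u);
    for (const unsigned char *p = (const unsigned char *)key; *p; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    h ^= h >> 16; h *= 0x85ebca6bu;
    h ^= h >> 13; h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

typedef struct {
    uint32_t n;               /* number of keys == number of slots */
    uint32_t seeds[MAX_KEYS]; /* displacement seed per bucket */
} mph_t;

/* Gives each of n distinct keys a unique slot in [0, n). */
static bool mph_build(mph_t *m, const char **keys, uint32_t n) {
    assert(n > 0 && n <= MAX_KEYS);
    uint32_t bucket_of[MAX_KEYS], size[MAX_KEYS] = {0};
    bool taken[MAX_KEYS] = {false}, done[MAX_KEYS] = {false};
    m->n = n;
    for (uint32_t i = 0; i < n; i++) {
        bucket_of[i] = hash_seeded(keys[i], 0) % n;
        size[bucket_of[i]]++;
    }
    for (uint32_t round = 0; round < n; round++) {
        /* Place the largest remaining bucket first: it is the hardest. */
        uint32_t b = n;
        for (uint32_t j = 0; j < n; j++)
            if (!done[j] && (b == n || size[j] > size[b])) b = j;
        done[b] = true;
        m->seeds[b] = 0;
        if (size[b] == 0) continue;
        for (uint32_t seed = 1;; seed++) {
            if (seed > 1000000u) return false; /* e.g. duplicate keys */
            uint32_t slots[MAX_KEYS], k = 0;
            bool ok = true;
            for (uint32_t i = 0; i < n && ok; i++) {
                if (bucket_of[i] != b) continue;
                uint32_t s = hash_seeded(keys[i], seed) % n;
                if (taken[s]) ok = false;
                for (uint32_t t = 0; t < k && ok; t++)
                    if (slots[t] == s) ok = false;
                slots[k++] = s;
            }
            if (ok) {
                for (uint32_t t = 0; t < k; t++) taken[slots[t]] = true;
                m->seeds[b] = seed;
                break;
            }
        }
    }
    return true;
}

/* Two hashes and one array read; keys are never compared. */
static uint32_t mph_lookup(const mph_t *m, const char *key) {
    uint32_t b = hash_seeded(key, 0) % m->n;
    return hash_seeded(key, m->seeds[b]) % m->n;
}
```

Because a lookup never compares keys, an unknown name still lands on some slot; the record stored there has to be checked against the requested name or id before it is returned.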

BDZ: tried b=3, b=7 (default), and b=10 with 1M keys:

   b     SIZE      LATENCY TO RESOLVE 1M KEYS
   3     900KB     170ms
   7     338KB     180ms
  10     306KB     230ms

Packed vs non-packed latency differences are not meaningful.

CHM preserves order; however, its structure for 1M keys weighs 8MB. For 10k keys, CHM is ~20x larger than BDZ, which outweighs the benefit of preserved ordering.

Turbonss header

The turbonss header looks like this:

OFFSET     TYPE     NAME                          DESCRIPTION
   0      [4]u8     magic                         always 0xf09fa4b7
   4         u8     version                       now `0`
   5        u16     bom                           0x1234
   7         u2     padding
             u6     num_shells                    see "SHELLS" section.
   8        u32     num_users                     number of passwd entries
  12        u32     num_groups                    number of group entries
  16        u32     offset_cmph_gid2group
  20        u32     offset_cmph_uid2user
  24        u32     offset_cmph_groupname2group
  28        u32     offset_cmph_username2user
  32        u32     offset_groupmembers
  36        u32     offset_additional_gids

magic is 0xf09fa4b7, and version must be 0. All integers are native-endian. bom is a byte-order mark: it must read as 0x1234 (4660). If it does not, the file is being read on a machine with a different endianness than the one that created it. Turbonss files cannot be moved between computers of different endianness; if that happens, turbonss will refuse to read the file.
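A minimal C sketch of these validation rules. It assumes the magic is stored as the four bytes f0 9f a4 b7 in that order, and that the byte at offset 7 follows Zig packed-struct bit order (first field in the least significant bits), so `num_shells` is the top six bits; both are assumptions, as are the names `parse_header` and `struct turbonss_header`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of the 64-byte header block. */
struct turbonss_header {
    uint8_t version, num_shells;
    uint32_t num_users, num_groups;
    uint32_t offsets[6]; /* offset_cmph_gid2group .. offset_additional_gids */
};

static const uint8_t TURBONSS_MAGIC[4] = {0xf0, 0x9f, 0xa4, 0xb7};

/* Integers are native-endian and may be unaligned: copy, don't cast. */
static uint32_t read_u32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns 0 on success, -1 on bad magic or version, -2 if the file was
 * written on a machine with a different byte order. */
static int parse_header(const uint8_t *buf, struct turbonss_header *h) {
    assert(buf != NULL && h != NULL);
    if (memcmp(buf, TURBONSS_MAGIC, 4) != 0 || buf[4] != 0)
        return -1;
    uint16_t bom;
    memcpy(&bom, buf + 5, sizeof bom);
    if (bom != 0x1234)
        return -2;
    h->version = buf[4];
    /* Assumption: the u2 padding occupies the two low bits of byte 7. */
    h->num_shells = buf[7] >> 2;
    h->num_users = read_u32(buf + 8);
    h->num_groups = read_u32(buf + 12);
    for (int i = 0; i < 6; i++)
        h->offsets[i] = read_u32(buf + 16 + 4 * i);
    return 0;
}
```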

Offsets are block indices into the file, with zero being the first block (the one holding the magic field); block N starts at byte N<<6. As all blobs are 64-byte aligned, every offset points to the beginning of a 64-byte block, so each offset_* value is a byte position divided by 64 and could be a u26. As u32 is easier to visualize with xxd, and the header fits in 64 bytes anyway, we are keeping them as u32 for now.
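The block arithmetic as a small C sketch; `block_to_byte`, `byte_to_block` and `align64` are hypothetical helper names. A u26 block index reaches 2^26 × 64 bytes = 4 GiB, the same range as a u32 byte offset.

```c
#include <assert.h>
#include <stdint.h>

/* Every section starts on a 64-byte boundary, so an offset_* field holds
 * a block index; the byte position is the index shifted left by 6. */
static uint64_t block_to_byte(uint32_t block) { return (uint64_t)block << 6; }

static uint32_t byte_to_block(uint64_t byte) {
    assert(byte % 64 == 0);            /* sections are 64-byte aligned */
    assert((byte >> 6) <= UINT32_MAX); /* must fit in a u32 offset field */
    return (uint32_t)(byte >> 6);
}

/* Round a section length up to the next 64-byte boundary (padding). */
static uint64_t align64(uint64_t n) { return (n + 63) & ~(uint64_t)63; }
```

For example, the 40 bytes of header fields are padded by `align64` to one full 64-byte block, so the next section starts at block 1.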

Primitive types:

const Group = struct {
    gid: u32,
    // index into a separate structure (groupmembers) with the list of
    // members. Member lists are always 2^5-byte aligned, so this index is
    // in units of 32 bytes.
    members_offset: u27,
    groupname_len: u5,
    // a groupname_len-sized string, stored right after the fields above
    // (shown as a slice for illustration only).
    groupname: []u8,
};

const User = struct {
    uid: u32,
    gid: u32,
    // offset into a separate structure (additional_gids) that contains
    // the list of the user's additional gids.
    additional_gids_offset: u29,
    // shell is a different story, documented elsewhere.
    shell_here: u1,
    shell_len_or_place: u6,
    home_len: u6,
    username_pos: u1,
    username_len: u5,
    gecos_len: u8,
    // a variable-sized array that is stored immediately after this
    // struct (shown as a slice for illustration only).
    stringdata: []u8,
};
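One possible byte encoding of `Group`, as a C sketch. The Zig definition above is a plain (not packed) struct, so the real on-disk bit order is not pinned down here; this sketch assumes `members_offset` occupies the low 27 bits of the second 32-bit word and `groupname_len` the top 5 bits, with both words stored native-endian like every integer in the file. It makes two consequences of the field widths concrete: a group name can be at most 31 bytes, and `members_offset`, counted in 32-byte units, reaches 2^27 × 32 bytes = 4 GiB into the member lists. `encode_group` and `decode_group` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes gid, then the packed (members_offset | groupname_len << 27) word,
 * then the name bytes. Returns bytes written, or 0 if a field overflows. */
static size_t encode_group(uint8_t *out, uint32_t gid,
                           uint32_t members_offset, const char *name) {
    assert(out != NULL && name != NULL);
    size_t len = strlen(name);
    if (len > 31 || members_offset >= (1u << 27))
        return 0; /* does not fit in u5 / u27 */
    uint32_t packed = members_offset | ((uint32_t)len << 27);
    memcpy(out, &gid, 4);
    memcpy(out + 4, &packed, 4);
    memcpy(out + 8, name, len);
    return 8 + len;
}

/* Inverse of encode_group; name must hold up to 31 bytes plus NUL. */
static size_t decode_group(const uint8_t *in, uint32_t *gid,
                           uint32_t *members_offset, char name[32]) {
    uint32_t packed;
    memcpy(gid, in, 4);
    memcpy(&packed, in + 4, 4);
    *members_offset = packed & ((1u << 27) - 1);
    size_t len = packed >> 27;
    memcpy(name, in + 8, len);
    name[len] = '\0';
    return 8 + len;
}
```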

Complete file structure

OFFSET    Section              SIZE                         DESCRIPTION
  0<<6    Header               1<<6                         documented above
  *<<6    []Group              num_groups * sizeof(Group)
  *<<6    []User               num_users * sizeof(User)
  *<<6    []u32                num_groups * sizeof(u32)
  *<<6    []u32                num_users * sizeof(u32)
  *<<6    Shells               unknown                      documented in "SHELLS"
  *<<6    cmph_gid2group       unknown                      offset by offset_cmph_gid2group
  *<<6    cmph_uid2user        unknown                      offset by offset_cmph_uid2user
  *<<6    cmph_groupname2group unknown                      offset by offset_cmph_groupname2group
  *<<6    cmph_username2user   unknown                      offset by offset_cmph_username2user
  *<<6    groupmembers         unknown                      list of group members for each group
  *<<6    additional_gids      unknown                      list of gids (group membership) for each member

TODO explain:

shells
additional_gids
groupmembers
