Turbo NSS
---------

A glibc NSS library for the passwd and group databases.

Checking out and building
-------------------------

```
$ git clone --recursive https://git.sr.ht/~motiejus/turbonss
```

Alternatively, if you forgot `--recursive`:

```
$ git submodule update --init
```

Then run the tests:

```
$ zig build test
```

Other commands will be documented as they are implemented.

This project uses [git subtrac][git-subtrac] for managing dependencies.

Steps
-----

A known implementation handles ~250 sequential id(1) invocations per second.
Our goal is 10k id(1) invocations per second.

id(1) works as follows:
- look up the user by name.
- get all additional gids (an array attached to the user).
- for each additional gid, get the group name.

Assuming a user is in ~100 groups on average, 10k id(1) invocations per second
means 10k x 100 = 1M group lookups per second (cmph can do 1M lookups in
<200ms). We need to convert a gid to a group index quickly.
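
The three id(1) steps above can be sketched as follows. This is only an illustration: the `User` and `Group` types, the sample data, and the helpers `userByName` and `groupNameByGid` are hypothetical in-memory stand-ins, and the linear scans take the place of the hashed indices this project aims to provide.

```zig
const std = @import("std");

// Hypothetical in-memory stand-ins for the real records.
const Group = struct { gid: u32, name: []const u8 };
const User = struct {
    name: []const u8,
    uid: u32,
    gid: u32,
    additional_gids: []const u32,
};

// Made-up sample data, for illustration only.
const groups = [_]Group{
    .{ .gid = 100, .name = "users" },
    .{ .gid = 27, .name = "sudo" },
    .{ .gid = 44, .name = "video" },
};

const users = [_]User{
    .{ .name = "alice", .uid = 1000, .gid = 100, .additional_gids = &[_]u32{ 27, 44 } },
};

// Step 1: look up the user by name (a linear scan stands in for the index).
fn userByName(name: []const u8) ?User {
    for (users) |u| {
        if (std.mem.eql(u8, u.name, name)) return u;
    }
    return null;
}

// Step 3: map a gid to its group name. This is the hot path.
fn groupNameByGid(gid: u32) ?[]const u8 {
    for (groups) |g| {
        if (g.gid == gid) return g.name;
    }
    return null;
}

pub fn main() void {
    const user = userByName("alice") orelse return;
    std.debug.print("uid={d} gid={d} groups=", .{ user.uid, user.gid });
    // Step 2: walk the additional gids attached to the user.
    for (user.additional_gids) |gid| {
        const name = groupNameByGid(gid) orelse "?";
        std.debug.print("{d}({s}) ", .{ gid, name });
    }
    std.debug.print("\n", .{});
}
```

The inner loop runs once per additional gid, which is why the gid-to-group lookup dominates the cost of a single id(1) call.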

API
---

The following operations need to be fast, in order of importance:

1. lookup gid -> group (this is on the hot path in id(1)), with or without
   members (2 separate calls).
2. lookup uid -> user.
3. lookup groupname -> group.
4. lookup username -> user.
5. (optional) iterate users using a defined order (`getent passwd`).
6. (optional) iterate groups using a defined order (`getent group`).

Indices
-------

Preliminary results of playing with [cmph][cmph]:

BDZ was tried with b=3, b=7 (the default), and b=10:

* The BDZ index for 1M keys takes 900KB, 338KB, and 306KB respectively.
* Lookup latency for 1M keys is 170ms, 180ms, and 230ms respectively.
* Packed vs non-packed latency differences are not meaningful.

CHM retains order; however, 1M keys weigh 8MB. An index of 10k keys is ~20x
larger with CHM than with BDZ, which outweighs the benefit of preserved ordering.
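
To put the sizes above on a common scale, the sketch below converts them to bits per key. It assumes KB and MB mean 1000 and 1,000,000 bytes; the helper `bitsPerKey` is made up for this example.

```zig
const std = @import("std");

// All sizes quoted above are for 1M keys.
const keys: f64 = 1_000_000;

// Hypothetical helper: index size in bytes -> bits spent per key.
fn bitsPerKey(size_bytes: f64) f64 {
    return size_bytes * 8 / keys;
}

pub fn main() void {
    std.debug.print("BDZ b=3:  {d:.2} bits/key\n", .{bitsPerKey(900_000)});
    std.debug.print("BDZ b=7:  {d:.2} bits/key\n", .{bitsPerKey(338_000)});
    std.debug.print("BDZ b=10: {d:.2} bits/key\n", .{bitsPerKey(306_000)});
    std.debug.print("CHM:      {d:.2} bits/key\n", .{bitsPerKey(8_000_000)});
}
```

Under these assumptions CHM spends 64 bits per key, while BDZ with the default b=7 spends under 3.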

Full file structure
-------------------

The file structure starts with a magic number and a version number, followed by
lists of User and Group records and their indices. All indices are byte counts,
relative to the beginning of the file.

```
const File = struct {
    magic: [4]u8,
    version: u4,
    padding: u4,
    num_shells: u8,
    num_users: u32,
    num_groups: u32,
    offset_cmph_gid2group: u26,
    offset_cmph_uid2user: u26,
    offset_cmph_groupname2group: u26,
    offset_cmph_username2user: u26,
    offset_sorted_groups: u26,
    offset_sorted_users: u26,
    offset_groupmembers: u26,
    offset_additional_gids: u26,
};
```

`magic` is 0xf09fa4b7, and `version` must be `0`. Offsets are indices to
further sections of the file, with zero being the first block (the magic
number). As all blobs are 64-byte aligned, the offsets are pointing to the
beginning of the 64-byte "block" (thus u26). All numbers are little-endian.

As of writing, the file header is 40 bytes.
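
The 40-byte figure and the block addressing can be checked with a little arithmetic. The sketch below assumes the header fields are laid out back to back with no extra padding; `blockToByteOffset` and `header_bits` are names invented for this example.

```zig
const std = @import("std");

// Every section starts on a 64-byte boundary, as described above.
const block_size: u64 = 64;

// Hypothetical helper: turn a u26 block offset from the header into a byte
// offset from the beginning of the file.
fn blockToByteOffset(block: u26) u64 {
    return @as(u64, block) * block_size;
}

// Header field widths in bits, in declaration order:
// magic (4 * 8), version (4), padding (4), num_shells (8),
// num_users (32), num_groups (32), and eight u26 offsets (8 * 26).
const header_bits: usize = 4 * 8 + 4 + 4 + 8 + 32 + 32 + 8 * 26;

pub fn main() void {
    std.debug.print("header: {d} bytes\n", .{header_bits / 8});
    std.debug.print("block 3 starts at byte {d}\n", .{blockToByteOffset(3)});
    std.debug.print("last block starts at byte {d}\n", .{blockToByteOffset(std.math.maxInt(u26))});
}
```

The widths add up to 320 bits, which is 40 bytes. A u26 block offset combined with 64-byte blocks covers 2^26 x 2^6 = 2^32 bytes, so the header can address files of up to 4 GiB.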

Primitive types:

```
const Group = struct {
    gid: u32,
    // index to a separate structure with a list of members
    members_offset: u29,
    padding: u3,
    groupname_len: u8,
    // a variable-sized array that will be stored immediately after this
    // struct.
    stringdata: []u8,
};

const User = struct {
    uid: u32,
    gid: u32,
    // pointer to a separate structure that contains a list of gids
    additional_gids_offset: u29,
    // shell is a different story, documented elsewhere.
    shell_here: u1,
    shell_len_or_place: u6,
    home_len: u6,
    username_len: u6,
    gecos_len: u8,
    // a variable-sized array that will be stored immediately after this
    // struct.
    stringdata: []u8,
};
```
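
As a rough size check, the fixed-size part of each record can be written as a Zig `packed struct`. This is a sketch that assumes the fields are packed tightly in declaration order; the variable-length `stringdata` is left out because it follows the fixed part in the file.

```zig
const std = @import("std");

// Fixed-size part of a Group record (stringdata omitted).
const Group = packed struct {
    gid: u32,
    members_offset: u29,
    padding: u3,
    groupname_len: u8,
};

// Fixed-size part of a User record (stringdata omitted).
const User = packed struct {
    uid: u32,
    gid: u32,
    additional_gids_offset: u29,
    shell_here: u1,
    shell_len_or_place: u6,
    home_len: u6,
    username_len: u6,
    gecos_len: u8,
};

pub fn main() void {
    std.debug.print("Group: {d} bits, User: {d} bits\n", .{ @bitSizeOf(Group), @bitSizeOf(User) });
    std.debug.print("max home_len/username_len: {d}\n", .{std.math.maxInt(u6)});
    std.debug.print("max gecos_len/groupname_len: {d}\n", .{std.math.maxInt(u8)});
}
```

Under that assumption the fixed parts are 72 bits (9 bytes) for a Group and 120 bits (15 bytes) for a User. The u6 length fields cap home directories and usernames at 63 bytes, and the u8 fields cap gecos and group names at 255 bytes.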

TODO explain:
- shells
- `additional_gids`
- `members`

[git-subtrac]: https://github.com/apenwarr/git-subtrac/
[cmph]: http://cmph.sourceforge.net/