Turbo NSS
---------

Turbonss is a plugin for the GNU Name Service Switch (NSS) functionality of the
GNU C Library (glibc). Turbonss implements lookup for `passwd` and `group`
database entries (i.e. system users, groups, and group memberships). Its main
goal is performance, with a focus on making [`id(1)`][id] run as fast as
possible.

Turbonss is optimized for reading. If the data changes in any way, the whole
file needs to be regenerated (the tooling supports only full generation). It
was created for, and is best suited to, environments that have a central user &
group database which then needs to be distributed to many servers/services.

To understand more about name service switch, start with
[`nsswitch.conf(5)`][nsswitch].

Design & constraints
--------------------

To be fast, the user/group database (later: DB) has to be small
([background][data-oriented-design]). It encodes user & group information in a
way that minimizes the DB size, and reduces jumping across the DB ("chasing
pointers and thrashing CPU cache").

To understand how this is done efficiently, let's analyze
[`getpwnam_r(3)`][getpwnam_r] at a high level. This API call accepts a username
and returns the following user information:

```
struct passwd {
   char   *pw_name;       /* username */
   char   *pw_passwd;     /* user password */
   uid_t   pw_uid;        /* user ID */
   gid_t   pw_gid;        /* group ID */
   char   *pw_gecos;      /* user information */
   char   *pw_dir;        /* home directory */
   char   *pw_shell;      /* shell program */
};
```

Turbonss implements this call, among others, and takes the following steps to
resolve a username to a `struct passwd*`:

- Open the DB (using `mmap`) and interpret its first 48 bytes as a `struct
  Header`. The header stores offsets to the sections of the file. This needs to
  be done once, when the NSS library is loaded (or on the first call).
- Hash the username using a perfect hash function. A perfect hash function
  returns a number `n ∈ [0,N-1]`, where N is the total number of users.
- Jump to the `n`'th location in the `idx_username2user` section (by pointer
  arithmetic), which contains the index `i` to the user's information.
- Jump to the location `i` of section `Users` (again, using pointer arithmetic)
  which stores the full user information.
- Decode the user information (which is all in a contiguous memory block) and
  return it to the caller.

In total, that's one hash for the username (~150ns), two pointer jumps within
the DB file (to sections `idx_username2user` and `Users`), and, now that the
user record is found, a `memcpy` for each field.
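
The lookup path above can be sketched in a few lines of Python. This is a toy
model and not turbonss code: a precomputed dictionary stands in for the cmph
perfect hash, a plain list stands in for `idx_username2user`, and the `Users`
section is reduced to NUL-terminated usernames. The names `build` and `lookup`
are hypothetical.

```python
# Toy model of the lookup path: hash -> idx section -> Users section.
# Assumptions: a dict replaces the perfect hash, records hold only a name.

def build(usernames):
    """Return (phash, idx, users_blob) for the given usernames."""
    users_blob = bytearray()
    offset_of = {}
    for name in sorted(usernames):            # records are sorted by name
        offset_of[name] = len(users_blob)
        users_blob += name.encode() + b"\0"
    # A perfect hash maps N keys to 0..N-1 in no particular order; reversing
    # the sorted order only demonstrates that the orders need not match.
    phash = {name: n for n, name in enumerate(sorted(usernames, reverse=True))}
    idx = [0] * len(phash)
    for name, n in phash.items():
        idx[n] = offset_of[name]              # idx[n] -> offset i into Users
    return phash, idx, bytes(users_blob)

def lookup(username, phash, idx, users_blob):
    n = phash.get(username)                   # hash the username
    if n is None:
        return None
    i = idx[n]                                # jump into the index section
    end = users_blob.index(b"\0", i)          # jump into the Users section
    found = users_blob[i:end].decode()        # decode the record
    # A real perfect hash also returns some n for unknown keys, so the decoded
    # record has to be compared against the requested name.
    return found if found == username else None

phash, idx, blob = build(["root", "motiejus", "nobody"])
print(lookup("motiejus", phash, idx, blob))
```

The final comparison is redundant in this toy (the dictionary already rejects
unknown names), but it marks the place where a real implementation would need
to verify the hit.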

The turbonss DB file is `mmap`-ed, making it simple to implement pointer
arithmetic and jumping across the file. This also reduces memory usage,
especially across multiple concurrent invocations of the `id` command. The
consumed heap space for each separate turbonss instance will be minimal.
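
As an illustration of the access pattern (not of the turbonss code itself),
here is how a read-only mapping looks in Python. The temporary file is a
stand-in for a real DB, and its 64-byte content is made up for the example.

```python
import mmap
import os
import tempfile

# Write a small stand-in file; a real DB would come from the turbonss tooling.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes.fromhex("f09fa4b7") + bytes(60))
    path = f.name

# Length 0 maps the whole file. ACCESS_READ keeps the mapping read-only, so
# the kernel can back every process that maps this file with the same pages.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as db:
    magic = db[0:4]      # slicing plays the role of pointer arithmetic
    size = len(db)

os.unlink(path)
print(magic.hex(), size)
```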

Tight packing places some constraints on the underlying data:

- Maximum database size: 4GB.
- Permitted length of username and groupname: 1-32 bytes.
- Permitted length of shell and homedir: 1-64 bytes.
- Permitted comment ("gecos") length: 0-255 bytes.
- Username, groupname and gecos must be utf8-encoded.
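
A generator would have to reject entries that violate these limits. A minimal
sketch of such a check for a user entry follows; the function name
`violations` and the dictionary layout are assumptions for illustration, and
the same length and encoding rules as for `username` would apply to group
names.

```python
# field -> (min, max) length in bytes, taken from the list above
LIMITS = {
    "username": (1, 32),
    "homedir": (1, 64),
    "shell": (1, 64),
    "gecos": (0, 255),
}
MUST_BE_UTF8 = ("username", "gecos")

def violations(entry):
    """entry maps field name -> bytes; returns a list of problems found."""
    problems = []
    for field, (lo, hi) in LIMITS.items():
        length = len(entry[field])
        if not lo <= length <= hi:
            problems.append(f"{field}: length {length} not in {lo}-{hi}")
    for field in MUST_BE_UTF8:
        try:
            entry[field].decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"{field}: not valid UTF-8")
    return problems

ok = {"username": b"motiejus", "homedir": b"/home/motiejus",
      "shell": b"/bin/bash", "gecos": b""}
bad = dict(ok, username=b"x" * 33, gecos=b"\xff")
print(violations(ok), violations(bad))
```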

Checking out and building
-------------------------

```
$ git clone --recursive https://git.sr.ht/~motiejus/turbonss
```

Alternatively, if you forgot `--recursive`:

```
$ git submodule update --init
```

And run tests:

```
$ zig build test
```

Other commands will be documented as they are implemented.

This project uses [git subtrac][git-subtrac] for managing dependencies.

Remarks on `id(1)`
------------------

A known implementation runs id(1) at ~250 rps sequentially on ~20k users and
~10k groups. Our target is 10k id/s for the same payload.

To better reason about the trade-offs, it is useful to understand how `id(1)`
is implemented, in rough terms:
- lookup user by name.
- get all additional gids (an array attached to a member).
- for each additional gid, get the group information (`struct group*`).

Assuming a user is in ~100 groups on average, the 10k id/s target translates to
1M group lookups per second. We need to convert a gid to a group index, and a
group index to the group's gid/name, quickly.
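
Spelled out as arithmetic, and ignoring the user lookup itself and any process
start-up cost, the target leaves roughly one microsecond per group lookup:

```python
target_id_per_second = 10_000   # target from the text
groups_per_user = 100           # assumed average from the text

lookups_per_second = target_id_per_second * groups_per_user
budget_ns = 1_000_000_000 // lookups_per_second  # time budget per group lookup
print(lookups_per_second, budget_ns)
```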

Caveat: `struct group` contains an array of pointers to names of group members
(`char **gr_mem`). However, `id` does not use that information, resulting in
read amplification. Therefore, if `argv[0] == "id"`, our implementation of
`getgrgid(3)` returns the `struct group*` without the members. This speeds up
`id` by about 10x on a known NSS implementation.

Relatedly, because `getgrgid(3)` does not need the group members, the group
members are stored in a different DB section, making the `Groups` section
smaller, thus more CPU-cache-friendly in the hot path.

Indices
-------

Now that we've sketched the implementation of `id(1)`, it is easier to see
which operations need to be fast; in order of importance:

1. lookup gid -> group info (this is on hot path in id) without members.
2. lookup uid -> user.
3. lookup groupname -> group.
4. lookup username -> user.

These indices can use perfect hashing like [cmph][cmph]: a perfect hash hashes
a list of bytes to a sequential list of integers. Perfect hashing algorithms
require some space, and take some time to calculate ("hashing duration"). I've
tested BDZ, which hashes `[][]u8` to a sequential list of integers (not
preserving order), and CHM, which preserves order. BDZ accepts an optional
argument `3 <= b <= 10`.

* BDZ requires 900KB (b=3), 338KB (b=7) or 306KB (b=10) for 1M values.
* Latency to resolve 1M keys: 170ms, 180ms and 230ms, respectively.
* Packed vs non-packed latency differences are not meaningful.

CHM retains order; however, 1M keys weigh 8MB. 10k keys are ~20x larger with
CHM than with BDZ, eliminating the benefit of preserved ordering: we can just
have a separate index.
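
To compare the variants per key, the figures above can be normalized. This is
plain arithmetic on the quoted numbers; the only assumption is that KB and MB
mean 1000 and 1000000 bytes, which the measurements do not specify.

```python
KEYS = 1_000_000
KB = 1000   # assumption: decimal kilobytes

sizes_bytes = {
    "BDZ b=3": 900 * KB,
    "BDZ b=7": 338 * KB,
    "BDZ b=10": 306 * KB,
    "CHM": 8000 * KB,
}
latency_ms = {"BDZ b=3": 170, "BDZ b=7": 180, "BDZ b=10": 230}

bits_per_key = {name: size * 8 / KEYS for name, size in sizes_bytes.items()}
ns_per_key = {name: ms * 1_000_000 / KEYS for name, ms in latency_ms.items()}

for name, bits in bits_per_key.items():
    print(f"{name}: {bits:.2f} bits/key, {ns_per_key.get(name, float('nan')):.0f} ns/key")
```

Under that assumption BDZ costs between roughly 2.4 and 7.2 bits per key, while
CHM costs 64 bits per key; the per-key latency of 170-230ns is in the same
range as the ~150ns hash cost quoted earlier.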

Turbonss header
---------------

The turbonss header looks like this:

```
OFFSET     TYPE     NAME                          DESCRIPTION
   0      [4]u8     magic                         always 0xf09fa4b7
   4         u8     version                       now `0`
   5        u16     bom                           0x1234
   7         u6     num_shells                    max value: 63
   8        u32     num_users                     number of passwd entries
  12        u32     num_groups                    number of group entries
  16        u32     offset_cmph_uid2user
  20        u32     offset_cmph_groupname2group
  24        u32     offset_cmph_username2user
  28        u32     offset_idx                    offset to the first idx_ section
  32        u32     offset_groups
  36        u32     offset_users
  40        u32     offset_groupmembers
  44        u32     offset_additional_gids
```

`magic` is 0xf09fa4b7, and `version` must be `0`. All integers are
native-endian. `bom` is a byte-order-mark. It must resolve to `0x1234` (4660).
If it does not, the file is being read on a machine with a different endianness
than the one it was created on. Turbonss files cannot be moved across
different-endianness computers. If that happens, turbonss will refuse to read
the file.
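
A reader for this header could look like the Python sketch below. It follows
the table field by field, but it is an illustration under assumptions rather
than the project's parser: `magic` is assumed to be stored as the bytes
`f0 9f a4 b7` in that order, `num_shells` is assumed to occupy a whole byte
with its upper two bits zero, and `parse_header` is a hypothetical name.

```python
import struct
import sys
from collections import namedtuple

# "=" selects native byte order without padding, matching the table offsets:
# 4s magic, B version, H bom, B num_shells, then ten u32 fields.
HEADER = struct.Struct("=4sBHB10I")
Header = namedtuple("Header", [
    "magic", "version", "bom", "num_shells", "num_users", "num_groups",
    "offset_cmph_uid2user", "offset_cmph_groupname2group",
    "offset_cmph_username2user", "offset_idx", "offset_groups",
    "offset_users", "offset_groupmembers", "offset_additional_gids",
])
MAGIC = bytes.fromhex("f09fa4b7")   # assumption: stored in this byte order

def parse_header(buf):
    h = Header._make(HEADER.unpack_from(buf, 0))
    if h.magic != MAGIC:
        raise ValueError("bad magic")
    if h.version != 0:
        raise ValueError("unsupported version")
    if h.bom != 0x1234:
        # A file written with the other byte order reads back as 0x3412.
        raise ValueError("file was written with a different endianness")
    return h

# Example values only: 20k users, 10k groups, made-up offsets 1..8.
native = HEADER.pack(MAGIC, 0, 0x1234, 2, 20_000, 10_000, *range(1, 9))
print(HEADER.size, parse_header(native).num_users)
```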

Offsets are indices to further sections of the file, with zero being the first
block (pointing to the `magic` field). As all blobs are 64-byte aligned, the
offsets always point to the beginning of a 64-byte "block". Therefore, all
`offset_*` values could be `u26`. As `u32` is easier to visualize with xxd, and
the header block fits in 64 bytes anyway, we are keeping them as u32 for now.

Sections whose lengths can be calculated do not have a corresponding `offset_*`
header field. For example, `cmph_gid2group` comes immediately after the header,
and `idx_groupname2group` comes after `idx_gid2group`, whose offset is
`offset_idx`, and size can be calculated.

Primitive types
---------------

`User` and `Group` entries are sorted by name, ordered by their Unicode
codepoints. They are byte-aligned (8bits). All `User` and `Group` entries are
referred to by their byte offset in the `Users` and `Groups` section, relative
to the beginning of the section.

```
const Group = struct {
    gid: u32,
    // index to a separate structure with a list of members. The memberlist is
    // always 2^5-byte aligned (32b), this is an index there.
    members_offset: u27,
    groupname_len: u5,
    // a groupname_len-sized string
    groupname: []u8,
};

const User = struct {
    uid: u32,
    gid: u32,
    // pointer to a separate structure that contains a list of gids
    additional_gids_offset: u29,
    // shell is a different story, documented elsewhere.
    shell_here: u1,
    shell_len_or_place: u6,
    homedir_len: u6,
    username_is_a_suffix: u1,
    username_offset_or_len: u5,
    gecos_len: u8,
    // a variable-sized array that will be stored immediately after this
    // struct.
    stringdata: []u8,
};
```
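
Summing the field widths gives the size of the fixed part of each record. The
sketch below assumes the fields are packed back to back with no padding
between them, which the pseudo-code suggests but does not state.

```python
# Field widths in bits, copied from the pseudo-code above.
GROUP_BITS = {"gid": 32, "members_offset": 27, "groupname_len": 5}
USER_BITS = {
    "uid": 32, "gid": 32, "additional_gids_offset": 29, "shell_here": 1,
    "shell_len_or_place": 6, "homedir_len": 6, "username_is_a_suffix": 1,
    "username_offset_or_len": 5, "gecos_len": 8,
}

def fixed_size(fields):
    """Return (bits, bytes) of the fixed-width part, rounding bytes up."""
    bits = sum(fields.values())
    return bits, (bits + 7) // 8

print("Group:", fixed_size(GROUP_BITS))
print("User:", fixed_size(USER_BITS))
```

Under that assumption the fixed part of a `Group` is 8 bytes and the fixed part
of a `User` is 15 bytes, before the variable-length strings.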

`stringdata` contains a few string entries:
- homedir.
- username.
- gecos.
- shell (optional).

The first byte of the homedir is stored right after the `gecos_len` field, and
its length is `homedir_len`. The same logic applies to all the `stringdata`
fields: their relative position can be calculated from the lengths of the
fields before them.

Additionally, two optimizations for special fields are made:
- shells are often shared across different users, see the "Shells" section.
- username is frequently a suffix of the homedir. For example, `/home/motiejus`
  and `motiejus`. In that case, storing both the username and homedir strings
  is wasteful. For such cases, username has two options:
  1. `username_is_a_suffix=true`: username is a suffix of the home dir. In that
  case, the username starts at the `username_offset_or_len`'th byte of the
  homedir, and ends at the same place as the homedir.
  2. `username_is_a_suffix=false`: username is stored separately. In that case,
  it begins one byte after homedir, and its length is
  `username_offset_or_len`.
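
The two username cases can be sketched as follows. The sketch makes several
assumptions that the text leaves open: `stringdata` starts with the homedir,
the offset into the homedir is zero-based, a separately stored username
directly follows the last homedir byte, and the length fields have already
been converted to plain byte counts. Both function names are hypothetical.

```python
def encode_username(homedir, username):
    """Return (username_is_a_suffix, username_offset_or_len, extra_bytes)."""
    if homedir.endswith(username):
        # Store only where the username starts inside the homedir.
        return True, len(homedir) - len(username), b""
    return False, len(username), username

def decode_username(stringdata, homedir_len, is_suffix, offset_or_len):
    homedir = stringdata[:homedir_len]
    if is_suffix:
        return homedir[offset_or_len:]            # runs to the end of homedir
    return stringdata[homedir_len:homedir_len + offset_or_len]

for homedir, username in ((b"/home/motiejus", b"motiejus"),
                          (b"/var/lib/postgresql", b"postgres")):
    is_suffix, value, extra = encode_username(homedir, username)
    stringdata = homedir + extra
    print(is_suffix, value, decode_username(stringdata, len(homedir), is_suffix, value))
```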

Shells
------

Normally there is a limited number of shells even in huge user databases. A
few examples: `/bin/bash`, `/usr/bin/nologin`, `/bin/zsh` among others.
Therefore, "shells" have an optimization: they can be referenced from an
external list, or reside among the user's data.

The 63 most popular shells (i.e. referred to by at least two User entries) are
stored externally in the "Shells" area. The less popular ones are stored with
userdata.

There are two "Shells" areas: the index and the blob. The index is a list of
structs which point to a location in the "blob" area:

```
const ShellIndex = struct {
    offset: u10,
    len: u6,
};
```

In the user's struct the `shell_here=true` bit signifies that the shell is
stored with userdata. `false` means it is stored in the `Shells` section. If
the shell is stored "here", it is the first element in `stringdata`, and its
length is `shell_len_or_place`. If it is stored externally, the latter variable
points to its index in the ShellIndex area.

Shells in the external storage are sorted by their weight, which is
`length*frequency`.
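
One way the selection could work is sketched below. The text does not say
whether the weight is sorted ascending or descending, or whether it also
decides which 63 shells make the cut, so this sketch assumes descending weight
for both. `popular_shells` is a hypothetical name.

```python
from collections import Counter

MAX_SHELLS = 63

def popular_shells(user_shells):
    """Return the externally stored shells, heaviest first."""
    counts = Counter(user_shells)
    # Only shells referred to by at least two users are worth sharing.
    shared = [(shell, n) for shell, n in counts.items() if n >= 2]
    # weight = length in bytes * frequency; ties are broken by name so the
    # result is deterministic.
    shared.sort(key=lambda sn: (-len(sn[0].encode()) * sn[1], sn[0]))
    return [shell for shell, _ in shared[:MAX_SHELLS]]

shells = (["/bin/bash"] * 3 + ["/usr/bin/nologin"] * 2
          + ["/bin/zsh"] * 2 + ["/bin/fish"])
print(popular_shells(shells))
```

In this example `/usr/bin/nologin` (weight 32) sorts before `/bin/bash`
(weight 27) and `/bin/zsh` (weight 16), while `/bin/fish` has a single user and
stays with that user's data.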

Variable-length integers (varints)
----------------------------------

Varint is an efficiently encoded integer (packed for small values). Same as
[protocol buffer varints][varint], except the largest possible value is `u64`.
They compress integers well.
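
For reference, this is the protocol buffer varint scheme in a few lines of
Python: seven payload bits per byte, least significant group first, with the
high bit set on every byte except the last. The function names are
illustrative and not part of turbonss.

```python
def encode_varint(value):
    if not 0 <= value < 1 << 64:
        raise ValueError("value must fit in u64")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf, pos=0):
    """Return (value, position of the byte after the varint)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            if result >> 64:
                raise ValueError("varint does not fit in u64")
            return result, pos
        shift += 7
        if shift >= 70:               # a u64 never needs more than 10 bytes
            raise ValueError("varint too long")

print(encode_varint(300).hex(), decode_varint(encode_varint(300)))
```

Values below 128 take one byte, and the largest `u64` takes ten.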

`groupmembers`, `additional_gids`
---------------------------------

`groupmembers` and `additional_gids` store group and user memberships
respectively: for each group, a list of pointers (offsets) to User records, and
for each user — a list of pointers to Group records. These fields are always
used in their entirety — not necessitating random access, thus suitable for
tight packing.

An entry of `groupmembers` and `additional_gids` looks like this piece of
pseudo-code:

```
const PackedList = struct {
    length: varint,
    members: []varint,
};
const Groupmembers = PackedList;
const AdditionalGids = PackedList;
```

An entry in the `members` field is an offset to the respective `User` or
`Group` entry (number of bytes relative to the first entry of the type).
`members` in `PackedList` is sorted by the name (`username` or `groupname`) of
the record it is pointing to.

A packed list is a list of varints.
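
A round trip through such a list can be sketched as below. The sketch assumes
that `length` is the number of members (not the byte length of the list) and
uses minimal varint helpers with hypothetical names.

```python
def put_varint(out, value):
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)

def get_varint(buf, pos):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7

def encode_packed_list(members):
    out = bytearray()
    put_varint(out, len(members))      # assumption: length = member count
    for member in members:
        put_varint(out, member)
    return bytes(out)

def decode_packed_list(buf, pos=0):
    """Return (members, position of the byte after the list)."""
    length, pos = get_varint(buf, pos)
    members = []
    for _ in range(length):
        member, pos = get_varint(buf, pos)
        members.append(member)
    return members, pos

packed = encode_packed_list([0, 16, 300])
print(packed.hex(), decode_packed_list(packed))
```

Because the list is always read front to back, the decoder never needs to know
where an individual member starts.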

Complete file structure
-----------------------

`idx_*` sections are of type `[]PackedIntArray(u29)` and are pointing to the
respective `Groups` and `Users` entries (from the beginning of the respective
section). Since User and Group records are 8-byte aligned, 3 bits are saved
from every element.
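
The sketch below shows how 29-bit entries could be stored and read back. It
assumes that entries are packed little-endian, bit after bit, and that each
entry holds the byte offset shifted right by 3; the actual bit order used by
`PackedIntArray` is not specified in this document. The function names are
hypothetical.

```python
BITS = 29
ALIGN_SHIFT = 3   # records are 8-byte aligned, so offset >> 3 loses nothing

def pack_offsets(byte_offsets):
    acc = 0
    for i, offset in enumerate(byte_offsets):
        if offset % 8:
            raise ValueError("offset is not 8-byte aligned")
        value = offset >> ALIGN_SHIFT
        if value >> BITS:
            raise ValueError("offset does not fit in 29 bits")
        acc |= value << (i * BITS)
    nbytes = (len(byte_offsets) * BITS + 7) // 8
    return acc.to_bytes(nbytes, "little")

def get_offset(packed, i):
    bit = i * BITS
    first, last = bit // 8, (bit + BITS + 7) // 8   # bytes covering the entry
    chunk = int.from_bytes(packed[first:last], "little")
    value = (chunk >> (bit % 8)) & ((1 << BITS) - 1)
    return value << ALIGN_SHIFT

offsets = [0, 8, 4096, (1 << 32) - 8]
packed = pack_offsets(offsets)
print(len(packed), [get_offset(packed, i) for i in range(len(offsets))])
```

With the 3 saved bits, a 29-bit entry can address any 8-byte-aligned record
below 2^32 bytes, which lines up with the 4GB maximum database size.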

Each section is padded to 64 bytes.

```
SECTION               SIZE                 DESCRIPTION
Header                48                   see "Turbonss header" section
cmph_gid2group        ?                    gid->group cmph
cmph_uid2user         ?                    uid->user cmph
cmph_groupname2group  ?                    groupname->group cmph
cmph_username2user    ?                    username->user cmph
idx_gid2group         len(group)*4*29/32   cmph->offset gid2group
idx_groupname2group   len(group)*4*29/32   cmph->offset groupname2group
idx_uid2user          len(user)*4*29/32    cmph->offset uid2user
idx_username2user     len(user)*4*29/32    cmph->offset username2user
ShellIndex            len(shells)*2        Shell index array
ShellBlob             <= 4032              Shell data blob (max 63*64 bytes)
Groups                ?                    packed Group entries (8b padding)
Users                 ?                    packed User entries (8b padding)
groupmembers          ?                    per-group memberlist (32b padding)
additional_gids       ?                    per-user grouplist (8b padding)
```

[git-subtrac]: https://github.com/apenwarr/git-subtrac/
[cmph]: http://cmph.sourceforge.net/
[id]: https://linux.die.net/man/1/id
[nsswitch]: https://linux.die.net/man/5/nsswitch.conf
[data-oriented-design]: https://media.handmade-seattle.com/practical-data-oriented-design/
[getpwnam_r]: https://linux.die.net/man/3/getpwnam_r
[varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints