Turbo NSS
---------
A glibc NSS library for the passwd and group databases.

Checking out and building
-------------------------
```
$ git clone --recursive https://git.sr.ht/~motiejus/turbonss
```
If you cloned without `--recursive`, initialize the submodules:
```
$ git submodule update --init
```
Then run the tests:
```
$ zig build test
```
Other commands will be documented as they are implemented.

This project uses [git subtrac][git-subtrac] for managing dependencies.

Steps
-----
A known implementation runs id(1) sequentially at ~250 invocations per second.
Our goal is 10k invocations per second.

id(1) works as follows:

- look up the user by name.
- get all of the user's additional gids (an array attached to the user).
- for each additional gid, get the group name.
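
For reference, the three steps above map onto standard glibc calls: getpwnam(3), getgrouplist(3) and getgrgid(3). The C sketch below roughly approximates id(1). It is not coreutils' actual code, and the helper name `print_id` is hypothetical. It shows why gid -> group dominates: step 3 performs one NSS lookup per group.

```c
#define _DEFAULT_SOURCE /* exposes getgrouplist() in glibc's <grp.h> */
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper approximating id(1): prints "uid=.. gid=.. groups=.."
 * for `name` and returns the number of gid -> group lookups it performed,
 * or -1 if the user does not exist or memory runs out.
 */
static int print_id(const char *name)
{
    assert(name != NULL);

    /* Step 1: username -> user. Copy the fields we need right away. */
    struct passwd *pw = getpwnam(name);
    if (pw == NULL)
        return -1;
    uid_t uid = pw->pw_uid;
    gid_t gid = pw->pw_gid;

    /* Step 2: all gids of the user (glibc includes the primary gid too). */
    int ngroups = 16;
    gid_t *groups = NULL;
    for (;;) {
        gid_t *tmp = realloc(groups, sizeof *groups * (size_t)ngroups);
        if (tmp == NULL) {
            free(groups);
            return -1;
        }
        groups = tmp;
        int capacity = ngroups;
        if (getgrouplist(name, gid, groups, &ngroups) != -1)
            break;
        /* Buffer too small: glibc stored the required count in ngroups. */
        if (ngroups <= capacity)
            ngroups = capacity * 2;
    }

    /* Step 3: gid -> group for every gid; this is the hot path. */
    printf("uid=%u(%s) gid=%u groups=", (unsigned)uid, name, (unsigned)gid);
    for (int i = 0; i < ngroups; i++) {
        struct group *gr = getgrgid(groups[i]);
        printf("%s%u(%s)", i > 0 ? "," : "", (unsigned)groups[i],
               gr != NULL ? gr->gr_name : "?");
    }
    putchar('\n');
    free(groups);
    return ngroups;
}
```
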
Assuming a user is in ~100 groups on average, that is 1M group lookups per
second (cmph can do 1M lookups in <200ms). We therefore need to convert a gid
to a group index quickly.

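
The per-lookup budget follows from the figures above (the ~100 groups per user average is the README's own assumption):

```math
\begin{aligned}
10^4\ \text{id/s} \times 100\ \text{lookups/id} &= 10^6\ \text{lookups/s} \\
\text{budget per lookup} &= 1\ \text{s} / 10^6 = 1\ \mu\text{s} \\
\text{cmph cost per lookup} &< 200\ \text{ms} / 10^6 = 200\ \text{ns}
\end{aligned}
```

The hash itself therefore uses at most a fifth of the budget. The remainder is left for reading the group record.
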
API
---
The following operations need to be fast, in order of importance:

1. look up gid -> group (this is on the hot path of id(1)), with or without
   members (two separate calls).
2. look up uid -> user.
3. look up groupname -> group.
4. look up username -> user.
5. (optional) iterate users in a defined order (`getent passwd`).
6. (optional) iterate groups in a defined order (`getent group`).

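
These operations correspond to glibc's NSS entry points:

- gid -> group is `getgrgid_r`.
- uid -> user is `getpwuid_r`.
- groupname -> group is `getgrnam_r`.
- username -> user is `getpwnam_r`.
- The two iterations are the `setpwent`/`getpwent_r`/`endpwent` and `setgrent`/`getgrent_r`/`endgrent` triples.

glibc resolves each of them as `_nss_<service>_<function>` in `libnss_<service>.so.2`. The sketch below implements one of them over a hypothetical in-memory table. The service name `turbo`, `toy_users` and `put_str` are assumptions for illustration only. The point is the contract every lookup must honor:

- All strings are copied into the caller's buffer.
- A buffer that is too small is reported as `NSS_STATUS_TRYAGAIN` with `*errnop = ERANGE`, so the caller can retry with a larger one.

```c
#include <assert.h>
#include <errno.h>
#include <nss.h>
#include <pwd.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory table standing in for the turbonss file. */
struct toy_user {
    uid_t uid;
    gid_t gid;
    const char *name, *gecos, *home, *shell;
};

static const struct toy_user toy_users[] = {
    {0, 0, "root", "root", "/root", "/bin/sh"},
    {1000, 1000, "alice", "Alice", "/home/alice", "/bin/bash"},
};

/* Copy `s` into the caller's buffer; return its new address or NULL if full. */
static char *put_str(char **buf, size_t *left, const char *s)
{
    size_t n = strlen(s) + 1;
    if (n > *left)
        return NULL;
    char *dst = memcpy(*buf, s, n);
    *buf += n;
    *left -= n;
    return dst;
}

/* Assumed symbol for a module listed as "passwd: turbo" in nsswitch.conf. */
enum nss_status _nss_turbo_getpwuid_r(uid_t uid, struct passwd *result,
                                      char *buffer, size_t buflen, int *errnop)
{
    assert(result != NULL && buffer != NULL && errnop != NULL);
    for (size_t i = 0; i < sizeof toy_users / sizeof toy_users[0]; i++) {
        const struct toy_user *u = &toy_users[i];
        if (u->uid != uid)
            continue;
        char *p = buffer;
        size_t left = buflen;
        result->pw_name = put_str(&p, &left, u->name);
        result->pw_passwd = put_str(&p, &left, "x");
        result->pw_gecos = put_str(&p, &left, u->gecos);
        result->pw_dir = put_str(&p, &left, u->home);
        result->pw_shell = put_str(&p, &left, u->shell);
        if (!result->pw_name || !result->pw_passwd || !result->pw_gecos ||
            !result->pw_dir || !result->pw_shell) {
            *errnop = ERANGE; /* caller should retry with a bigger buffer */
            return NSS_STATUS_TRYAGAIN;
        }
        result->pw_uid = u->uid;
        result->pw_gid = u->gid;
        return NSS_STATUS_SUCCESS;
    }
    *errnop = ENOENT;
    return NSS_STATUS_NOTFOUND;
}
```
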
Indices
-------
Preliminary results of playing with [cmph][cmph]:

BDZ: tried b=3, b=7 (default), and b=10.

* BDZ stores 1M values in 900KB, 338KB, and 306KB, respectively.
* Latency of looking up 1M keys: 170ms, 180ms, and 230ms, respectively.
* Packed vs. non-packed latency differences are not meaningful.

CHM retains order; however, 1M keys weigh 8MB. 10k keys are ~20x larger with
CHM than with BDZ, which eliminates the benefit of preserved ordering.

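
The toy minimal perfect hash (MPH) below illustrates two consequences of choosing a non-order-preserving MPH. It is not BDZ or CHM. It brute-forces a seed until `mix(key, seed) % n` is a bijection, which is only practical for a handful of keys.

- **Arbitrary slot order.** Like BDZ, it maps keys to slots in `[0, n)` in an arbitrary order. A small slot -> record table is one way to bridge that to the file's own order, and CHM would make that table unnecessary.
- **Keys must be verified.** An MPH returns some slot even for keys it was never built with, so the stored gid has to be compared.

All names (`gid_index`, `build_index`, `lookup_gid`, `mix`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_KEYS 12 /* brute force is only feasible for tiny key sets */

/* 32-bit mixer (the MurmurHash3 finalizer), perturbed by a seed. */
static uint32_t mix(uint32_t key, uint32_t seed)
{
    uint32_t h = key ^ (seed * 0x9e3779b9u);
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Find a seed making mix(key, seed) % n a bijection; keys must be distinct. */
static int find_seed(const uint32_t *keys, size_t n, uint32_t *seed_out)
{
    if (n == 0 || n > MAX_KEYS)
        return -1;
    for (uint32_t seed = 0; seed < 10000000u; seed++) {
        unsigned char used[MAX_KEYS] = {0};
        size_t i = 0;
        while (i < n) {
            uint32_t slot = mix(keys[i], seed) % (uint32_t)n;
            if (used[slot])
                break; /* collision: try the next seed */
            used[slot] = 1;
            i++;
        }
        if (i == n) {
            *seed_out = seed;
            return 0;
        }
    }
    return -1;
}

struct gid_index {
    uint32_t seed;
    size_t n;
    const uint32_t *gids;            /* gids in file order */
    uint32_t slot_to_group[MAX_KEYS]; /* MPH slot -> position in gids[] */
};

static int build_index(struct gid_index *idx, const uint32_t *gids, size_t n)
{
    if (find_seed(gids, n, &idx->seed) != 0)
        return -1;
    idx->n = n;
    idx->gids = gids;
    for (size_t i = 0; i < n; i++)
        idx->slot_to_group[mix(gids[i], idx->seed) % (uint32_t)n] = (uint32_t)i;
    return 0;
}

/* Position of `gid` in gids[], or -1; idx must come from build_index(). */
static long lookup_gid(const struct gid_index *idx, uint32_t gid)
{
    uint32_t slot = mix(gid, idx->seed) % (uint32_t)idx->n;
    uint32_t i = idx->slot_to_group[slot];
    assert(i < idx->n);
    /* An MPH maps *any* input to some slot, so verify the stored key. */
    return idx->gids[i] == gid ? (long)i : -1;
}
```
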
Full file structure
-------------------
The turbonss header looks like this:
```
OFFSET  TYPE   NAME                         DESCRIPTION
0       [4]u8  magic                        always 0xf09fa4b7
4       u8     version                      now `0`
5       u2     padding
        u6     num_shells                   see "SHELLS" section.
6       u32    num_users                    number of passwd entries
10      u32    num_groups                   number of group entries
14      u32    offset_cmph_gid2group
18      u32    offset_cmph_uid2user
22      u32    offset_cmph_groupname2group
26      u32    offset_cmph_username2user
30      u32    offset_sorted_groups
34      u32    offset_sorted_users
38      u32    offset_groupmembers
42      u32    offset_additional_gids
```
`magic` is 0xf09fa4b7, and `version` must be `0`. All integers are big-endian.

Offsets index further sections of the file in units of 64-byte blocks, with
zero being the first block (the one that starts with the magic number). As all
blobs are 64-byte aligned, every offset points to the beginning of a 64-byte
"block". Therefore, all `offset_*` values could be `u26`: a 32-bit byte offset
divided by 64 fits in 26 bits. As `u32` is easier to read with xxd, and the
header fits into a single 64-byte block anyway, we keep them as `u32` for now.

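
Below is a minimal C sketch of reading the header according to the table above. It rests on three assumptions:

- the 2 padding bits are the most significant bits of byte 5;
- offsets are block indices, converted to bytes by multiplying by 64;
- the header occupies the whole first 64-byte block.

`struct header`, `parse_header` and `block_to_byte` are hypothetical names, not turbonss code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64u

/* Read a big-endian u32 at byte offset `off`. */
static uint32_t be32(const unsigned char *p, size_t off)
{
    return (uint32_t)p[off] << 24 | (uint32_t)p[off + 1] << 16 |
           (uint32_t)p[off + 2] << 8 | (uint32_t)p[off + 3];
}

struct header {
    uint8_t num_shells;
    uint32_t num_users, num_groups;
    uint32_t offset[8]; /* offset_cmph_gid2group .. offset_additional_gids */
};

/* Returns 0 on success, -1 if `buf` is not a version-0 turbonss file. */
static int parse_header(const unsigned char *buf, size_t len, struct header *h)
{
    static const unsigned char magic[4] = {0xf0, 0x9f, 0xa4, 0xb7};
    assert(buf != NULL && h != NULL);
    if (len < BLOCK_SIZE || memcmp(buf, magic, 4) != 0 || buf[4] != 0)
        return -1;
    h->num_shells = buf[5] & 0x3f; /* assumption: padding is the top 2 bits */
    h->num_users = be32(buf, 6);
    h->num_groups = be32(buf, 10);
    for (size_t i = 0; i < 8; i++)
        h->offset[i] = be32(buf, 14 + 4 * i);
    return 0;
}

/* Assumption: offsets count 64-byte blocks; convert one to a byte position. */
static uint64_t block_to_byte(uint32_t block)
{
    return (uint64_t)block * BLOCK_SIZE;
}
```
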
Primitive types:
```
const Group = packed struct {
    gid: u32,
    // index to a separate structure with a list of members. The memberlist is
    // always 2^5-byte aligned; this is an index into it.
    members_offset: u27,
    groupname_len: u5,
    // followed by the groupname: a groupname_len-sized string stored
    // immediately after this struct (not a field of it).
};

const User = packed struct {
    uid: u32,
    gid: u32,
    // pointer to a separate structure that contains a list of gids
    additional_gids_offset: u29,
    // shell is a different story, documented elsewhere.
    shell_here: u1,
    shell_len_or_place: u6,
    home_len: u6,
    username_pos: u1,
    username_len: u5,
    gecos_len: u8,
    // followed by stringdata: a variable-sized []u8 stored immediately after
    // this struct (not a field of it).
};
```
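
The C sketch below decodes one `Group` record, to make the bit budget concrete: 32 + 27 + 5 = 64 bits of fixed data, followed by the name. The on-disk byte order of a packed struct is not specified above, so this assumes a hypothetical encoding:

- a big-endian `gid`;
- a big-endian `u32` with `members_offset` in the top 27 bits and `groupname_len` in the low 5 bits;
- the name bytes.

`struct group_view` and `decode_group` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

struct group_view {
    uint32_t gid;
    uint64_t members_byte_offset; /* members_offset * 32 (2^5-byte aligned) */
    const char *name;             /* points into the record; not NUL-terminated */
    size_t name_len;
};

/* Decode one Group record (assumed layout); returns bytes consumed, 0 if short. */
static size_t decode_group(const unsigned char *p, size_t len, struct group_view *g)
{
    assert(p != NULL && g != NULL);
    if (len < 8)
        return 0;
    uint32_t packed = be32(p + 4);
    g->gid = be32(p);
    g->members_byte_offset = (uint64_t)(packed >> 5) * 32u; /* top 27 bits */
    g->name_len = packed & 0x1f;                            /* low 5 bits */
    if (len < 8 + g->name_len)
        return 0;
    g->name = (const char *)(p + 8);
    return 8 + g->name_len;
}
```

With this layout, `groupname_len` can encode names of at most 31 bytes, and `members_offset` can address 2^27 × 32 = 2^32 bytes of member lists.
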
TODO explain:
- shells
- `additional_gids`
- `members`

[git-subtrac]: https://github.com/apenwarr/git-subtrac/
[cmph]: http://cmph.sourceforge.net/