Turbo NSS
---------
A glibc NSS library for the `passwd` and `group` databases.

Checking out and building
-------------------------
```
$ git clone --recursive https://git.sr.ht/~motiejus/turbonss
```
Alternatively, if you forgot `--recursive`:
```
$ git submodule update --init
```
Then run the tests:
```
$ zig build test
```
Other commands will be documented as they are implemented.

This project uses [git subtrac][git-subtrac] for managing dependencies.

Remarks on `id(1)`
------------------
A known implementation runs `id(1)` sequentially at ~250 invocations per second
on a dataset of ~20k users and ~10k groups. Our target is 10k `id` invocations
per second.

`id(1)` works as follows:

- look up the user by name.
- get all additional gids (an array attached to the user).
- for each additional gid, get the group name.

Assuming a user is in ~100 groups on average, that's 1M group lookups per
second. We need to convert a gid to a group index, and a group index to a group
gid/name, quickly.

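
The steps above can be sketched with the standard NSS-backed calls that Python exposes (`pwd.getpwnam`, `os.getgrouplist`, `grp.getgrgid`). This is only a rough illustration of the lookup pattern, not how `id(1)` or turbonss is implemented; the function name `id_lookups` and the two constants are made up for the example.

```python
import grp
import os
import pwd


def id_lookups(username):
    """Perform roughly the same lookups that id(1) does for one user."""
    user = pwd.getpwnam(username)                  # 1. look up the user by name
    gids = os.getgrouplist(username, user.pw_gid)  # 2. all gids of that user
    groups = []
    for gid in gids:                               # 3. one group lookup per gid
        try:
            groups.append((gid, grp.getgrgid(gid).gr_name))
        except KeyError:
            groups.append((gid, None))             # gid without a group entry
    return user.pw_uid, user.pw_gid, groups


# Back-of-the-envelope load from the text: target rate times groups per user.
TARGET_IDS_PER_SECOND = 10_000
GROUPS_PER_USER = 100
GROUP_LOOKUPS_PER_SECOND = TARGET_IDS_PER_SECOND * GROUPS_PER_USER
```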
Caveat: `struct group` contains an array of pointers to the names of group
members (`char **gr_mem`). However, `id` does not use that information, so
fetching it results in significant read amplification. Therefore, if
`argv[0] == "id"`, `getgrgid(3)` will return the group without its members.
This speeds up `id` by about 10x on a known NSS implementation.

Indices
-------
The following operations need to be fast, in order of importance:

1. lookup gid -> group (this is on the hot path in `id`), with or without
   members (2 separate calls).
2. lookup uid -> user.
3. lookup groupname -> group.
4. lookup username -> user.
5. (optional) iterate users using a defined order (`getent passwd`).
6. (optional) iterate groups using a defined order (`getent group`).

The first 4 can use perfect hashing like [cmph][cmph]: it hashes a list of byte
strings to a sequential list of integers. Perfect hashing algorithms require
some space and take some time to calculate ("hashing duration"). I've tested
BDZ, which hashes `[][]u8` to a sequential list of integers (not preserving
order), and CHM, which does the same but preserves order. BDZ accepts an
argument `3 <= b <= 10`.

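
To make "hashes a list of byte strings to a sequential list of integers" concrete, here is a toy minimal perfect hash. It is neither BDZ nor CHM and does not use cmph: it simply tries seeds until a salted hash maps every key to a distinct slot, which only works for tiny key sets. The names `keyed_hash` and `find_seed` and the sample keys are invented for the sketch.

```python
import hashlib


def keyed_hash(key: bytes, seed: int, n: int) -> int:
    """Hash `key` into range(n); `seed` selects one of many hash functions."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % n


def find_seed(keys, max_tries=1_000_000):
    """Return a seed under which every key gets its own slot in range(len(keys))."""
    n = len(keys)
    for seed in range(max_tries):
        if len({keyed_hash(k, seed, n) for k in keys}) == n:
            return seed
    raise RuntimeError("no collision-free seed found")


names = [b"root", b"daemon", b"bin", b"sys", b"adm"]
seed = find_seed(names)
# Each name now maps to a unique index in 0..4, in no particular order.
# An unknown name would also map to some index, so a lookup still has to
# compare the stored key against the requested one.
slots = {name: keyed_hash(name, seed, len(names)) for name in names}
```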
BDZ: tried b=3, b=7 (default), and b=10.

* The BDZ algorithm requires 900KB, 338KB, and 306KB, respectively, for 1M values.
* Latency to resolve 1M keys: 170ms, 180ms, and 230ms, respectively.
* Packed vs non-packed latency differences are not meaningful.

CHM retains order; however, 1M keys weigh 8MB. 10k keys are ~20x larger with
CHM than with BDZ, eliminating the benefit of preserved ordering.

Turbonss header
---------------
The turbonss header looks like this:

```
OFFSET     TYPE  NAME                         DESCRIPTION
     0    [4]u8  magic                        always 0xf09fa4b7
     4       u8  version                      now `0`
     5      u16  bom                          0x1234
     7       u2  padding
             u6  num_shells                   see "SHELLS" section.
     8      u32  num_users                    number of passwd entries
    12      u32  num_groups                   number of group entries
    16      u32  offset_cmph_gid2group
    20      u32  offset_cmph_uid2user
    24      u32  offset_cmph_groupname2group
    28      u32  offset_cmph_username2user
    32      u32  offset_groupmembers
    36      u32  offset_additional_gids
```
`magic` is always 0xf09fa4b7, and `version` must be `0`. All integers are
native-endian. `bom` is a byte-order mark. It must resolve to `0x1234` (4660).
If it does not, the file is being read with a different endianness than it was
created with. Turbonss files cannot be moved between computers of different
endianness; if that happens, turbonss will refuse to read the file.

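
A reader for this header might look roughly like the sketch below, which uses Python's `struct` module to decode the 40 bytes listed in the table and to apply the `magic`, `version` and `bom` checks. The function name `parse_header`, the sample values, and the position of `num_shells` within byte 7 are assumptions made for illustration; the table does not state which bits the padding occupies.

```python
import struct

# "=" selects native byte order with no implicit alignment padding, so the
# u16 `bom` can sit at the odd offset 5, as in the table.
HEADER = struct.Struct("=4sBHB8I")
MAGIC = bytes.fromhex("f09fa4b7")
BOM = 0x1234


def parse_header(buf: bytes) -> dict:
    (magic, version, bom, byte7, num_users, num_groups,
     *offsets) = HEADER.unpack_from(buf)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if version != 0:
        raise ValueError("unsupported version")
    if bom != BOM:
        raise ValueError("file was written with a different endianness")
    names = ("cmph_gid2group", "cmph_uid2user", "cmph_groupname2group",
             "cmph_username2user", "groupmembers", "additional_gids")
    return {
        # Assumption: the u2 padding occupies the low two bits of byte 7.
        "num_shells": byte7 >> 2,
        "num_users": num_users,
        "num_groups": num_groups,
        "offsets": dict(zip(names, offsets)),
    }


# Hypothetical header: 3 shells, 20k users, 10k groups, made-up offsets.
sample = HEADER.pack(MAGIC, 0, BOM, 3 << 2, 20_000, 10_000, 1, 2, 3, 4, 5, 6)
header = parse_header(sample)
```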
Offsets are indices to further sections of the file, with zero being the first
block (pointing to the `magic` field). As all blobs are 64-byte aligned, the
offsets always point to the beginning of a 64-byte "block". Therefore,
all `offset_*` values could be `u26`. As `u32` is easier to visualize with xxd,
and the header block fits in 64 bytes anyway, we are keeping them as `u32` for
now.

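
The `u26` remark follows from simple arithmetic, sketched below. The sketch assumes an `offset_*` value counts 64-byte blocks, so that a byte position is the value shifted left by 6; the helper names are hypothetical.

```python
BLOCK_BITS = 6                  # 64-byte blocks, as stated above
BLOCK_SIZE = 1 << BLOCK_BITS


def block_to_byte_offset(block_index: int) -> int:
    """Convert an `offset_*` value (counted in blocks) to a byte offset."""
    return block_index << BLOCK_BITS


def fits_u26(block_index: int) -> bool:
    """True if the block index could be stored in 26 bits."""
    return 0 <= block_index < (1 << 26)


# The largest u26 block index still addresses a byte offset below 2^32,
# which is why 26 bits of block index cover a 32-bit byte address space.
LAST_BLOCK = (1 << 26) - 1
LAST_BYTE_OFFSET = block_to_byte_offset(LAST_BLOCK)
```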
Primitive types:

```
const Group = struct {
    gid: u32,
    // index to a separate structure with a list of members. The memberlist is
    // always 2^5-byte aligned, this is an index there.
    members_offset: u27,
    groupname_len: u5,
    // a groupname_len-sized string
    groupname: []u8,
};

const User = struct {
    uid: u32,
    gid: u32,
    // pointer to a separate structure that contains a list of gids
    additional_gids_offset: u29,
    // shell is a different story, documented elsewhere.
    shell_here: u1,
    shell_len_or_place: u6,
    home_len: u6,
    username_pos: u1,
    username_len: u5,
    gecos_len: u8,
    // a variable-sized array that will be stored immediately after this
    // struct.
    stringdata: []u8,
};
```
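As an illustration of how the sub-byte fields could share storage, the sketch below packs a `Group` record into 8 fixed bytes followed by the name. The bit order within the shared `u32` (offset in the low 27 bits, length in the high 5) and storing the length as-is are assumptions, not something the layout above specifies; `pack_group` and `unpack_group` are hypothetical helpers.

```python
import struct

# gid, then one u32 shared by members_offset (u27) and groupname_len (u5).
GROUP_FIXED = struct.Struct("=II")

# Sum of the User bit fields after uid and gid: 56 bits, i.e. 7 bytes.
USER_BITFIELD_BITS = 29 + 1 + 6 + 6 + 1 + 5 + 8


def pack_group(gid: int, members_offset: int, name: bytes) -> bytes:
    if not 0 <= members_offset < (1 << 27):
        raise ValueError("members_offset does not fit in u27")
    if not 0 <= len(name) < (1 << 5):
        raise ValueError("groupname length does not fit in u5")
    # Assumption: members_offset takes the low 27 bits, groupname_len the high 5.
    packed = members_offset | (len(name) << 27)
    return GROUP_FIXED.pack(gid, packed) + name


def unpack_group(buf: bytes):
    gid, packed = GROUP_FIXED.unpack_from(buf)
    members_offset = packed & ((1 << 27) - 1)
    name_len = packed >> 27
    start = GROUP_FIXED.size
    return gid, members_offset, buf[start:start + name_len]


record = pack_group(1000, 42, b"wheel")
```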
Complete file structure
-----------------------
```
OFFSET  SECTION               SIZE                        DESCRIPTION
  0<<6  Header                1<<6                        documented above
  *<<6  []Group               num_groups * sizeof(Group)
  *<<6  []User                num_users * sizeof(User)
  *<<6  []u32                 num_groups * sizeof(u32)
  *<<6  []u32                 num_users * sizeof(u32)
  *<<6  Shells                unknown                     documented in "SHELLS"
  *<<6  cmph_gid2group        unknown                     offset by offset_cmph_gid2group
  *<<6  cmph_uid2user         unknown                     offset by offset_cmph_uid2user
  *<<6  cmph_groupname2group  unknown                     offset by offset_cmph_groupname2group
  *<<6  cmph_username2user    unknown                     offset by offset_cmph_username2user
  *<<6  groupmembers          unknown                     list of group members for each group
  *<<6  additional_gids       unknown                     list of gids (group membership) for each member
```
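The `*<<6` entries mean that every section starts on a 64-byte boundary. A writer could derive the block index of each section from the section sizes as sketched here; the section names and sizes in the example are made up, and `align_up` and `layout` are hypothetical helpers.

```python
def align_up(n: int, alignment: int = 64) -> int:
    """Round n up to the next multiple of `alignment` (a power of two)."""
    return (n + alignment - 1) & ~(alignment - 1)


def layout(section_sizes):
    """Map (name, size_in_bytes) pairs, in file order, to {name: block_index}."""
    offsets, cursor = {}, 0
    for name, size in section_sizes:
        offsets[name] = cursor >> 6        # block index, i.e. byte offset / 64
        cursor = align_up(cursor + size)   # pad the section to a full block
    return offsets


# Made-up sizes: a 40-byte header, 130 bytes of groups, 64 of users, 1 of shells.
blocks = layout([("header", 40), ("groups", 130), ("users", 64), ("shells", 1)])
```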
TODO explain:

- shells
- `additional_gids`
- `groupmembers`

[git-subtrac]: https://github.com/apenwarr/git-subtrac/
[cmph]: http://cmph.sourceforge.net/