Turbo NSS
---------

glibc NSS library for passwd and group.

Checking out and building
-------------------------

```
$ git clone --recursive https://git.sr.ht/~motiejus/turbonss
```

Alternatively, if you forgot `--recursive`:

```
$ git submodule update --init
```

And run tests:

```
$ zig build test
```

... the other commands will be documented as they are implemented.

This project uses [git subtrac][git-subtrac] for managing dependencies.

Steps
-----

A known implementation runs id(1) at ~250 rps sequentially. Our goal is 10k
id(1) calls per second.

id(1) works as follows:
- look up the user by name.
- get all additional gids (an array attached to a member).
- for each additional gid, get the group name.

Assuming a member is in ~100 groups on average, that's 1M group lookups per
second (cmph can do 1M in <200ms). We need to convert gid to a group index
quickly.

API
---

The following operations need to be fast, in order of importance:

1. lookup gid -> group (this is on the hot path in id(1)) with or without
   members (2 separate calls).
2. lookup uid -> user.
3. lookup groupname -> group.
4. lookup username -> user.
5. (optional) iterate users using a defined order (`getent passwd`).
6. (optional) iterate groups using a defined order (`getent group`).

Indices
-------

Preliminary results of playing with [cmph][cmph]:

BDZ: tried b=3, b=7 (default), and b=10.

* BDZ algorithm stores 1M values in (900KB, 338KB, 306KB) respectively.
* Latency for 1M keys: (170ms, 180ms, 230ms).
* Packed vs non-packed latency differences are not meaningful.

CHM retains order; however, 1M keys weigh 8MB. 10k keys are ~20x larger with
CHM than with BDZ, eliminating the benefit of preserved ordering.

Full file structure
-------------------

The file structure starts with a magic and a version number, followed by a
list of User and Group records and their indices. All indices are byte counts,
relative to the beginning of the file.

```
const File = struct {
    magic: [4]u8,
    version: u4,
    padding: u4,
    num_shells: u8,
    num_users: u32,
    num_groups: u32,
    offset_cmph_gid2group: u26,
    offset_cmph_uid2user: u26,
    offset_cmph_groupname2group: u26,
    offset_cmph_username2user: u26,
    offset_sorted_groups: u26,
    offset_sorted_users: u26,
    offset_groupmembers: u26,
    offset_additional_gids: u26,
};
```

`magic` is 0xf09fa4b7, and `version` must be `0`. Offsets are indices to
further sections of the file, with zero being the first block (the magic
number). As all blobs are 64-byte aligned, the offsets point to the beginning
of a 64-byte "block" (thus u26: 2^26 blocks of 64 bytes cover 4 GiB). All
numbers are little-endian.

At the time of writing, the file header is 40 bytes.

Primitive types:

```
const Group = struct {
    gid: u32,
    // index to a separate structure with a list of members
    members_offset: u29,
    padding: u3,
    groupname_len: u8,
    // a variable-sized array that will be stored immediately after this
    // struct.
    stringdata: []u8,
};

const User = struct {
    uid: u32,
    gid: u32,
    // pointer to a separate structure that contains a list of gids
    additional_gids_offset: u29,
    // shell is a different story, documented elsewhere.
    shell_here: u1,
    shell_len_or_place: u6,
    home_len: u6,
    username_len: u6,
    gecos_len: u8,
    // a variable-sized array that will be stored immediately after this
    // struct.
    stringdata: []u8,
};
```

TODO explain:

- shells
- `additional_gids`
- `members`

[git-subtrac]: https://github.com/apenwarr/git-subtrac/
[cmph]: http://cmph.sourceforge.net/
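
As a rough check of the header layout described above, the following Zig
sketch converts a header offset (counted in 64-byte blocks) into a byte offset
and re-derives the 40-byte header size from the field widths. The names
`block_size`, `blockToByteOffset` and `headerSizeBytes` are illustrative and
not part of turbonss. The sketch also assumes the header fields are stored back
to back, with no padding other than the explicit `padding` field.

```zig
const std = @import("std");

// Hypothetical names; only the field widths and the 64-byte block size come
// from the layout described in this document.
const block_size: u64 = 64;

/// Converts a header offset, counted in 64-byte blocks, into a byte offset
/// from the beginning of the file.
fn blockToByteOffset(block: u26) u64 {
    return @as(u64, block) * block_size;
}

/// Sums the header field widths in bits and returns the total in bytes.
/// Assumes the fields are laid out back to back (an assumption of this sketch).
fn headerSizeBytes() usize {
    const magic_bits = 4 * 8; // magic: [4]u8
    const version_bits = 4 + 4; // version: u4, padding: u4
    const counts_bits = 8 + 32 + 32; // num_shells, num_users, num_groups
    const offsets_bits = 8 * 26; // eight u26 block offsets
    return (magic_bits + version_bits + counts_bits + offsets_bits) / 8;
}

test "block offsets and header size" {
    // Block 0 is the start of the file (the magic number).
    try std.testing.expectEqual(@as(u64, 0), blockToByteOffset(0));
    try std.testing.expectEqual(@as(u64, 64), blockToByteOffset(1));
    // The largest u26 block index starts 64 bytes below the 4 GiB mark.
    try std.testing.expectEqual(
        @as(u64, (1 << 32) - 64),
        blockToByteOffset(std.math.maxInt(u26)),
    );
    try std.testing.expectEqual(@as(usize, 40), headerSizeBytes());
}
```

Saved to a file, the sketch can be run with `zig test`. Storing block indices
rather than byte offsets is what lets a 26-bit field cover the same range as a
32-bit byte offset, since the low 6 bits of a 64-byte-aligned offset are always
zero.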