From f0d9d16cada32087281e6cd9eaeb7d1f8c05a810 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Motiejus=20Jak=C5=A1tys?=
Date: Mon, 14 Feb 2022 10:55:49 +0200
Subject: [PATCH] more nss docs

---
 README.md | 150 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 127 insertions(+), 23 deletions(-)

diff --git a/README.md b/README.md
index 1dc099a..ec9f4bd 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,60 @@
 Turbo NSS
 ---------
 
-Glibc nss library for passwd and group.
+Turbonss is a plugin for GNU Name Service Switch (NSS) functionality of GNU C
+Library (glibc). Turbonss implements lookup for `passwd` and `group` database
+entries (i.e. system users, groups, and group memberships). Its main goal is
+performance, with focus on making [`id(1)`][id] run as fast as possible.
+
+To understand more about name service switch, start with
+[`nsswitch.conf(5)`][nsswitch].
+
+Design & constraints
+--------------------
+
+To be fast, the user/group database (hereafter: DB) has to be small ([highly
+recommended background viewing][data-oriented-design]). It encodes user & group
+information in a way that minimizes the DB size, and reduces jumping across the
+DB ("chasing pointers and polluting CPU cache").
+
+For example, [`getpwnam_r(3)`][getpwnam_r] accepts a username and returns
+the following user information:
+
+```
+struct passwd {
+    char   *pw_name;       /* username */
+    char   *pw_passwd;     /* user password */
+    uid_t   pw_uid;        /* user ID */
+    gid_t   pw_gid;        /* group ID */
+    char   *pw_gecos;      /* user information */
+    char   *pw_dir;        /* home directory */
+    char   *pw_shell;      /* shell program */
+};
+```
+
+Turbonss implements this call, among others, and takes the following steps to
+resolve it:
+
+- Hash the username using a perfect hash function. The perfect hash function
+  returns a number in the range [0,N), where N is the total number of users.
+- Jump to a known location in the DB (by pointer arithmetic) which links the
+  user's index to the user's information. That is an index to a different
+  location within the DB.
+- Jump to the location which stores the full user information.
+- Decode the user information (which is all in a contiguous memory block) and
+  return it to the caller.
+
+In total, that's one hash for the username (~150ns), two pointer jumps within
+the DB, and, now that the user record is found, `memcpy` for each
+field.
+
+This tight packing places some constraints on the underlying data:
+
+- Maximum database size: 4GB.
+- Maximum length of username and groupname: 32 bytes.
+- Maximum length of shell and homedir: 64 bytes.
+- Maximum comment ("gecos") length: 256 bytes.
+- Username and groupname must be UTF-8 encoded.
 
 Checking out and building
 -------------------------
@@ -47,6 +100,10 @@ significant read amplification. Therefore, if `argv[0] == "id"`, `getgrid(3)`
 will return group without the members. This speeds up `id` by about 10x on a
 known NSS implementation.
 
+Because `getgrgid(3)` does not use the group members' information, the group
+members are stored in a different location, making the `Groups` section
+smaller, thus more CPU-cache-friendly.
+
 Indices
 -------
 
@@ -85,8 +142,7 @@ OFFSET     TYPE     NAME                          DESCRIPTION
    0      [4]u8    magic                         always 0xf09fa4b7
    4      u8       version                       now `0`
    5      u16      bom                           0x1234
-   7      u2       padding
-          u6       num_shells                    see "SHELLS" section.
+   7      u8       padding
    8      u32      num_users                     number of passwd entries
   12      u32      num_groups                    number of group entries
   16      u32      offset_cmph_gid2group
@@ -109,13 +165,14 @@ offsets are always pointing to the beginning of an 64-byte "block". Therefore,
 all `offset_*` values could be `u26`. As `u32` is easier to visualize with xxd,
 and the header block fits to 64 bytes anyway, we are keeping them as u32 now.
 
-Primitive types:
+Primitive types
+---------------
 
 ```
 const Group = struct {
     gid: u32,
     // index to a separate structure with a list of members. The memberlist is
-    // always 2^5-byte aligned, this is an index there.
+    // always 2^5-byte (32-byte) aligned, this is an index there.
     members_offset: u27,
     groupname_len: u5,
     // a groupname_len-sized string
@@ -140,29 +197,76 @@ const User = struct {
 }
 ```
 
+`User` and `Group` entries are sorted by name, ordered by their Unicode
+code points.
+
+Shells
+------
+
+Even huge user databases normally contain a limited number of shells. A
+few examples: `/bin/bash`, `/usr/bin/nologin`, `/bin/zsh` among others.
+Therefore, "shells" have an optimization: they can either be pointed to in an
+external list, or reside among the user's data.
+
+The 64 (1<<6) most popular shells (i.e. those referred to by at least two User
+entries) are stored externally in the "Shells" area. The less popular ones are
+stored with userdata.
+
+The `shell_here=true` bit signifies that the shell is stored with userdata.
+`false` means it is stored in the `Shells` section. If the shell is stored
+"here", it is the first element in `stringdata`, and its length is
+`shell_len_or_place`. If it is stored externally, the latter variable holds
+its index in the external storage.
+
+Shells in the external storage are sorted by their weight, which is
+`length*frequency`.
+
+`groupmembers`, `additional_gids`
+---------------------------------
+
+`groupmembers` and `additional_gids` store group and user memberships
+respectively: for each group, a list of pointers ("offsets") to User records,
+and for each user, a list of pointers to Group records. These fields are
+always used in their entirety, so random access is not required, which makes
+them suitable for tight packing.
+
+An entry of `groupmembers` and `additional_gids` looks like this piece of
+pseudo-code:
+
+```
+const PackedList = struct {
+    length: varint,
+    members: []varint
+}
+const Groupmembers = PackedList;
+const AdditionalGids = PackedList;
+```
+
+Each entry in the `members` field is an offset to a `User` or
+`Group` entry (number of bytes relative to the first entry of the respective
+type). The `members` field in `PackedList` is sorted by the name (`username` or
+`groupname`) of the record it is pointing to.
+
 Complete file structure
 -----------------------
 
 ```
-OFFSET     Section               SIZE     DESCRIPTION
-   0<<6    Header                1<<6     documented above
-   *<<6    []Group               num_groups * sizeof(Group)
-   *<<6    []User                num_users * sizeof(User)
-   *<<6    []u32                 num_groups * sizeof(u32)
-   *<<6    []u32                 num_users * sizeof(u32)
-   *<<6    Shells                unknown  documented in "SHELLS"
-   *<<6    cmph_gid2group        unknown  offset by offset_cmph_gid2group
-   *<<6    cmph_uid2user         unknown  offset by offset_cmph_gid2group
-   *<<6    cmph_groupname2group  unknown  offset by offset_cmph_groupname2group
-   *<<6    cmph_username2user    unknown  offset by offset_cmph_username2user
-   *<<6    groupmembers          unknown  list of group members for each group
-   *<<6    additional_gids       unknown  list of gids (group membership) for each member
+  SECTION               SIZE  DESCRIPTION
+  Header                1<<6  documented above
+  []Group               ?     list of Group entries
+  []User                ?     list of User entries
+  Shells                ?     documented in "Shells"
+  cmph_gid2group        ?     offset by offset_cmph_gid2group
+  cmph_uid2user         ?     offset by offset_cmph_uid2user
+  cmph_groupname2group  ?     offset by offset_cmph_groupname2group
+  cmph_username2user    ?     offset by offset_cmph_username2user
+  groupmembers          ?     offset by offset_groupmembers
+  additional_gids       ?     offset by offset_additional_gids
 ```
 
-TODO explain:
-- shells
-- `additional_gids`
-- `groupmembers`
-
 [git-subtrac]: https://github.com/apenwarr/git-subtrac/
 [cmph]: http://cmph.sourceforge.net/
+[id]: https://linux.die.net/man/1/id
+[nsswitch]: https://linux.die.net/man/5/nsswitch.conf
+[data-oriented-design]: https://media.handmade-seattle.com/practical-data-oriented-design/
+[getpwnam_r]: https://linux.die.net/man/3/getpwnam_r