more nss docs

2022-02-14 10:55:49 +02:00
commit f0d9d16cad
parent 1327587838
1 changed file with 127 additions and 23 deletions
--- a/README.md
+++ b/README.md
@@ -1,7 +1,60 @@
 Turbo NSS
 ---------

-Glibc nss library for passwd and group.
+Turbonss is a plugin for the GNU Name Service Switch (NSS) functionality of the
+GNU C Library (glibc). Turbonss implements lookup for `passwd` and `group`
+database entries (i.e. system users, groups, and group memberships). Its main
+goal is performance, with a focus on making [`id(1)`][id] run as fast as
+possible.
+
+To learn more about the Name Service Switch, start with
+[`nsswitch.conf(5)`][nsswitch].
+
+Design & constraints
+--------------------
+
+To be fast, the user/group database (hereafter: DB) has to be small ([highly
+recommended background viewing][data-oriented-design]). Turbonss encodes user
+and group information in a way that minimizes the DB size and reduces jumping
+across the DB ("chasing pointers and polluting CPU cache").
+
+For example, [`getpwnam_r(3)`][getpwnam_r] accepts a username and returns
+the following user information:
+
+```
+struct passwd {
+   char   *pw_name;       /* username */
+   char   *pw_passwd;     /* user password */
+   uid_t   pw_uid;        /* user ID */
+   gid_t   pw_gid;        /* group ID */
+   char   *pw_gecos;      /* user information */
+   char   *pw_dir;        /* home directory */
+   char   *pw_shell;      /* shell program */
+};
+```
+
+Turbonss implements this call, among others, and takes the following steps to
+resolve a username:
+
+- Hash the username using a perfect hash function. The perfect hash function
+  returns a number in the range [0, N), where N is the total number of users.
+- Jump to a known location in the DB (by pointer arithmetic) which maps the
+  user's index to the location of the user's information. That value is an
+  offset to a different location within the DB.
+- Jump to the location which stores the full user information.
+- Decode the user information (which is all in one contiguous memory block)
+  and return it to the caller.
+
+In total, that's one hash for the username (~150ns), two pointer jumps within
+the DB file, and, once the user record is found, a `memcpy` for each
+field.
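
The lookup steps above can be sketched in C as follows. This is an illustration only, not the Turbonss implementation: `toy_db`, `toy_hash` and the record layout (one length byte followed by the name) are hypothetical, and `toy_hash` is a stand-in that happens to be collision-free for the three sample names (the real DB uses cmph). One detail the sketch makes explicit is that a perfect hash also maps unknown names to some slot, so the name found in the record has to be compared with the query.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy layout, for illustration only. */
struct toy_db {
    const uint32_t *index;    /* hash slot -> byte offset into `records` */
    const uint8_t  *records;  /* packed records: [name_len][name bytes] */
    uint32_t        num_users;
};

/* Sample data: "bob", "root", "alice" have lengths 3, 4, 5 -> slots 0, 1, 2. */
static const uint8_t toy_records[] = {
    3, 'b', 'o', 'b',
    4, 'r', 'o', 'o', 't',
    5, 'a', 'l', 'i', 'c', 'e',
};
static const uint32_t toy_index[] = { 0, 4, 9 };
static const struct toy_db toy = { toy_index, toy_records, 3 };

/* Stand-in for a perfect hash: collision-free only for the sample names. */
static uint32_t toy_hash(const char *name, uint32_t n) {
    return (uint32_t)(strlen(name) % n);
}

/* Returns a pointer to the matching record, or NULL if the name is absent. */
static const uint8_t *toy_lookup(const struct toy_db *db, const char *name) {
    assert(db != NULL && name != NULL);
    if (db->num_users == 0)
        return NULL;
    uint32_t slot = toy_hash(name, db->num_users);        /* hash the name */
    uint32_t offset = db->index[slot];                    /* jump 1: index */
    const uint8_t *rec = db->records + offset;            /* jump 2: record */
    size_t len = strlen(name);
    /* Unknown names land in some slot too, so verify the stored name. */
    if (rec[0] != len || memcmp(rec + 1, name, len) != 0)
        return NULL;
    return rec;
}
```

With the sample data, `toy_lookup(&toy, "root")` returns the record at offset 4, while `toy_lookup(&toy, "eve")` lands in the slot of `bob` and returns `NULL` after the name comparison.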
+
+This tight packing places some constraints on the underlying data:
+
+- Maximum database size: 4GB.
+- Maximum length of username and groupname: 32 bytes.
+- Maximum length of shell and homedir: 64 bytes.
+- Maximum comment ("gecos") length: 256 bytes.
+- Username and groupname must be UTF-8 encoded.
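
A minimal sketch of how a DB builder might check the length limits listed above before packing a user. The function name `user_fits` and the enum names are hypothetical; only the numeric limits come from the list. UTF-8 validation and the 4GB total size check are deliberately left out of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Limits taken from the constraints list; the names are illustrative. */
enum {
    MAX_NAME_LEN  = 32,   /* username and groupname, in bytes */
    MAX_SHELL_LEN = 64,   /* shell, in bytes */
    MAX_HOME_LEN  = 64,   /* homedir, in bytes */
    MAX_GECOS_LEN = 256   /* comment ("gecos"), in bytes */
};

/* Returns true if every field is within its maximum length. */
static bool user_fits(const char *name, const char *gecos,
                      const char *home, const char *shell) {
    assert(name != NULL && gecos != NULL && home != NULL && shell != NULL);
    return strlen(name)  <= MAX_NAME_LEN
        && strlen(gecos) <= MAX_GECOS_LEN
        && strlen(home)  <= MAX_HOME_LEN
        && strlen(shell) <= MAX_SHELL_LEN;
}
```

The lengths are measured in bytes with `strlen`, so a multi-byte UTF-8 character counts as more than one towards the limit.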

 Checking out and building
 -------------------------
@@ -47,6 +100,10 @@ significant read amplification. Therefore, if `argv[0] == "id"`, `getgrgid(3)`
 will return group without the members. This speeds up `id` by about 10x on a
 known NSS implementation.

+Because `getgrgid(3)` does not use the group members' information, the group
+members are stored in a different location, making the `Groups` section
+smaller and thus more CPU-cache-friendly.
+
 Indices
 -------

@@ -85,8 +142,7 @@ OFFSET     TYPE     NAME                          DESCRIPTION
   0      [4]u8     magic                         always 0xf09fa4b7
   4         u8     version                       now `0`
   5        u16     bom                           0x1234
-   7         u2     padding
-             u6     num_shells                    see "SHELLS" section.
+   7         u8     padding
   8        u32     num_users                     number of passwd entries
  12        u32     num_groups                    number of group entries
  16        u32     offset_cmph_gid2group
@@ -109,13 +165,14 @@ offsets are always pointing to the beginning of a 64-byte "block". Therefore,
 all `offset_*` values could be `u26`. As `u32` is easier to visualize with xxd,
 and the header block fits in 64 bytes anyway, we are keeping them as u32 now.
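
The arithmetic behind the `u26` remark can be sketched as below, assuming each `offset_*` value is a block index that is shifted left by 6 to get a byte position (26 + 6 = 32 bits, which matches the 4GB maximum DB size). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { BLOCK_BITS = 6, BLOCK_SIZE = 1 << BLOCK_BITS };  /* 64-byte blocks */

/* Byte position of the block with the given index (assumed encoding). */
static uint64_t block_to_byte_offset(uint32_t block_index) {
    return (uint64_t)block_index << BLOCK_BITS;
}

/* Number of 64-byte blocks needed to hold nbytes (size rounded up). */
static uint32_t blocks_needed(uint64_t nbytes) {
    assert(nbytes <= (UINT64_C(1) << 32));  /* DB is at most 4GB */
    return (uint32_t)((nbytes + BLOCK_SIZE - 1) >> BLOCK_BITS);
}
```

For example, the largest 26-bit block index, `(1 << 26) - 1`, corresponds to byte offset 4294967232, which is the last 64-byte block below 4GB.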

-Primitive types:
+Primitive types
+---------------

 ```
 const Group = struct {
    gid: u32,
    // index to a separate structure with a list of members. The memberlist is
-    // always 2^5-byte aligned, this is an index there.
+    // always 2^5-byte (32-byte) aligned; this is an index into it.
    members_offset: u27,
    groupname_len: u5,
    // a groupname_len-sized string
@@ -140,29 +197,76 @@ const User = struct {
 }
 ```

+`User` and `Group` entries are sorted by name, in order of their Unicode
+code points.
+
+Shells
+------
+
+Normally there is a limited number of shells, even in huge user databases. A
+few examples: `/bin/bash`, `/usr/bin/nologin`, `/bin/zsh`, among others.
+Therefore, "shells" have an optimization: a shell can either be referenced by
+index in an external list, or reside among the user's data.
+
+Up to 64 (1<<6) of the most popular shells (i.e. referred to by at least two
+User entries) are stored externally in the "Shells" area. The less popular ones
+are stored with userdata.
+
+The `shell_here=true` bit signifies that the shell is stored with userdata.
+`false` means it is stored in the `Shells` section. If the shell is stored
+"here", it is the first element in `stringdata`, and its length is
+`shell_len_or_place`. If it is stored externally, the latter variable holds
+its index in the external storage.
+
+Shells in the external storage are sorted by their weight, which is
+`length*frequency`.
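
A small sketch of the weight ordering, assuming "sorted by weight" means heaviest first (the direction is not stated above). `shell_stat`, `shell_weight` and `sort_shells` are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct shell_stat {
    const char *path;
    unsigned    frequency;  /* number of User entries using this shell */
};

/* weight = length * frequency */
static unsigned long shell_weight(const struct shell_stat *s) {
    return (unsigned long)strlen(s->path) * s->frequency;
}

/* qsort comparator: heaviest shell first (assumed direction). */
static int by_weight_desc(const void *a, const void *b) {
    unsigned long wa = shell_weight(a), wb = shell_weight(b);
    return (wa < wb) - (wa > wb);
}

static void sort_shells(struct shell_stat *shells, size_t n) {
    assert(shells != NULL || n == 0);
    if (n > 1)
        qsort(shells, n, sizeof *shells, by_weight_desc);
}
```

For instance, `/bin/bash` used by 100 users weighs 9 * 100 = 900 and sorts ahead of `/usr/bin/nologin` used by 10 users (16 * 10 = 160), which in turn sorts ahead of `/bin/zsh` used by 2 users (8 * 2 = 16).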
+
+`groupmembers`, `additional_gids`
+---------------------------------
+
+`groupmembers` and `additional_gids` store group and user memberships
+respectively: for each group, a list of pointers ("offsets") to User records,
+and for each user, a list of pointers to Group records. These fields are
+always read in their entirety, so random access is not required, which makes
+them suitable for tight packing.
+
+An entry of `groupmembers` and `additional_gids` looks like this piece of
+pseudo-code:
+
+```
+const PackedList = struct {
+    length: varint,
+    members: []varint
+}
+const Groupmembers = PackedList;
+const AdditionalGids = PackedList;
+```
+
+Each entry in the `members` field is an offset to a `User` or `Group` entry
+(number of bytes relative to the first entry of the respective type). The
+`members` field in `PackedList` is sorted by the name (`username` or
+`groupname`) of the record it is pointing to.
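
Reading a `PackedList` could look roughly like the sketch below. It rests on two assumptions that the text above does not confirm: that `varint` is an unsigned LEB128-style encoding (7 payload bits per byte, high bit set on all but the last byte), and that `length` is the number of members rather than a byte count. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one LEB128-style varint (assumed encoding) from buf[0..len).
 * Returns the number of bytes consumed, or 0 on truncated/overlong input. */
static size_t varint_decode(const uint8_t *buf, size_t len, uint64_t *out) {
    uint64_t value = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        value |= (uint64_t)(buf[i] & 0x7f) << (7 * i);
        if ((buf[i] & 0x80) == 0) {   /* high bit clear: last byte */
            *out = value;
            return i + 1;
        }
    }
    return 0;
}

/* Reads `length` followed by that many members into `members`.
 * Returns the member count, or -1 if the input is malformed or the
 * list does not fit into `cap` entries. */
static long packed_list_read(const uint8_t *buf, size_t len,
                             uint64_t *members, size_t cap) {
    assert(buf != NULL && members != NULL);
    uint64_t count;
    size_t pos = varint_decode(buf, len, &count);
    if (pos == 0 || count > cap)
        return -1;
    for (uint64_t i = 0; i < count; i++) {
        size_t n = varint_decode(buf + pos, len - pos, &members[i]);
        if (n == 0)
            return -1;
        pos += n;
    }
    return (long)count;
}
```

Under these assumptions, the bytes `03 00 40 ac 02` decode to a list of three offsets: 0, 64 and 300. Because the list is always consumed front to back, no per-entry index is needed.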
+
 Complete file structure
 -----------------------

 ```
-OFFSET    Section              SIZE                         DESCRIPTION
-  0<<6    Header               1<<6                         documented above
-  *<<6    []Group              num_groups * sizeof(Group)
-  *<<6    []User               num_users * sizeof(User)
-  *<<6    []u32                num_groups * sizeof(u32)
-  *<<6    []u32                num_users * sizeof(u32)
-  *<<6    Shells               unknown                      documented in "SHELLS"
-  *<<6    cmph_gid2group       unknown                      offset by offset_cmph_gid2group
-  *<<6    cmph_uid2user        unknown                      offset by offset_cmph_gid2group
-  *<<6    cmph_groupname2group unknown                      offset by offset_cmph_groupname2group
-  *<<6    cmph_username2user   unknown                      offset by offset_cmph_username2user
-  *<<6    groupmembers         unknown                      list of group members for each group
-  *<<6    additional_gids      unknown                      list of gids (group membership) for each member
+  SECTION              SIZE                         DESCRIPTION
+  Header               1<<6                         documented above
+  []Group                 ?                         list of Group entries
+  []User                  ?                         list of User entries
+  Shells                  ?                         documented in "Shells"
+  cmph_gid2group          ?                         offset by offset_cmph_gid2group
+  cmph_uid2user           ?                         offset by offset_cmph_uid2user
+  cmph_groupname2group    ?                         offset by offset_cmph_groupname2group
+  cmph_username2user      ?                         offset by offset_cmph_username2user
+  groupmembers            ?                         offset by offset_groupmembers
+  additional_gids         ?                         offset by offset_additional_gids
 ```

-TODO explain:
- shells
- `additional_gids`
- `groupmembers`
-
 [git-subtrac]: https://github.com/apenwarr/git-subtrac/
 [cmph]: http://cmph.sourceforge.net/
+[id]: https://linux.die.net/man/1/id
+[nsswitch]: https://linux.die.net/man/5/nsswitch.conf
+[data-oriented-design]: https://media.handmade-seattle.com/practical-data-oriented-design/
+[getpwnam_r]: https://linux.die.net/man/3/getpwnam_r