From d422cdf61bfa335925b1e28d41d49a57f81854ad Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Motiejus=20Jak=C5=A1tys?= <motiejus@jakstys.lt>
Date: Mon, 14 Feb 2022 13:37:10 +0200
Subject: [PATCH] add missing fields

---
 README.md | 60 ++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 37 insertions(+), 23 deletions(-)
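
Note: the lookup path described in the README hunks of this patch (hash the
username, read the `n`'th `idx_username2user` entry, jump into `Users`) can be
sketched in a few lines of Python. This is a minimal sketch under stated
assumptions, not turbonss code: the little-endian bit order of the packed
`u29` array, the toy record layout (a NUL-terminated name), the hand-written
`toy_hash` table standing in for the perfect hash, and all helper names are
hypothetical. Only the 29-bit index width, the 8-byte entry alignment and the
64-byte section padding come from the README text.

```python
ALIGN_BITS = 3              # entries are 8-byte aligned: low 3 offset bits are 0
U29_MASK = (1 << 29) - 1

def pack_u29(values):
    """Pack 29-bit integers back to back (assumed little-endian bit order)."""
    acc = 0
    for i, v in enumerate(values):
        if not 0 <= v <= U29_MASK:
            raise ValueError("value does not fit in 29 bits")
        acc |= v << (29 * i)
    return acc.to_bytes((29 * len(values) + 7) // 8, "little")

def get_u29(buf, n):
    """Read the n-th 29-bit value; it spans at most 5 bytes."""
    start, shift = divmod(29 * n, 8)
    chunk = int.from_bytes(buf[start:start + 5], "little")
    return (chunk >> shift) & U29_MASK

def pad(data, to):
    """Zero-pad data to a multiple of `to` bytes."""
    return data + b"\0" * (-len(data) % to)

# Toy "Users" section: each record is just a NUL-terminated name, 8-byte aligned.
names = [b"root", b"motiejus", b"nobody"]
users, offsets = b"", []
for name in names:
    offsets.append(len(users))
    users += pad(name + b"\0", 8)

# Stand-in for the perfect hash: a hand-picked bijection onto 0..N-1.
toy_hash = {b"root": 2, b"motiejus": 0, b"nobody": 1}
slots = [0] * len(names)
for name, off in zip(names, offsets):
    slots[toy_hash[name]] = off >> ALIGN_BITS    # store offset / 8

idx = pad(pack_u29(slots), 64)                   # sections are padded to 64 bytes
db = idx + pad(users, 64)
users_start = len(idx)

def lookup(name):
    n = toy_hash[name]                           # 1. hash the username
    i = get_u29(db, n)                           # 2. read the n-th index entry
    pos = users_start + (i << ALIGN_BITS)        # 3. jump into the Users section
    return db[pos:db.index(b"\0", pos)]          # 4. decode the (toy) record
```

With this packing, 8 index entries occupy 8*29/8 = 29 bytes, which agrees
with the `len(user)*4*29/32` size given in the file structure table.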

diff --git a/README.md b/README.md
index e55b0fe..d153fce 100644
--- a/README.md
+++ b/README.md
@@ -12,13 +12,14 @@ To understand more about name service switch, start with
 Design & constraints
 --------------------
 
-To be fast, the user/group database (later: DB) has to be small ([highly
-recommended background viewing][data-oriented-design]). It encodes user & group
-information in a way that minimizes the DB size, and reduces jumping across the
-DB ("chasing pointers and thrashing CPU cache").
+To be fast, the user/group database (later: DB) has to be small
+([background][data-oriented-design]). It encodes user & group information in a
+way that minimizes the DB size, and reduces jumping across the DB ("chasing
+pointers and thrashing CPU cache").
 
-For example, [`getpwnam_r(3)`][getpwnam_r] accepts a username and returns
-the following user information:
+To understand how this is done efficiently, let's analyze
+[`getpwnam_r(3)`][getpwnam_r] at a high level. This API call accepts a username
+and returns the following user information:
 
 ```
 struct passwd {
@@ -35,18 +36,21 @@ struct passwd {
 Turbonss, among others, implements this call, and takes the following steps to
 resolve a username to a `struct passwd*`:
 
+- Open the DB (using `mmap`) and interpret its first 40 bytes as a `struct
+  Header`. The header stores offsets to the sections of the file. This needs to
+  be done once, when the NSS library is loaded (or on the first call).
 - Hash the username using a perfect hash function. Perfect hash function
   returns a number `n ∈ [0,N-1]`, where N is the total number of users.
-- Jump to the `n`'th location in the DB (by pointer arithmetic) which contains
-  the index `i` to the user's information.
-- Jump to the location `i` (pointer arithmetic) which stores the full user
-  information.
+- Jump to the `n`'th location in the `idx_username2user` section (by pointer
+  arithmetic), which contains the index `i` to the user's information.
+- Jump to the location `i` of section `Users` (again, using pointer arithmetic)
+  which stores the full user information.
 - Decode the user information (which is all in a continuous memory block) and
   return it to the caller.
 
 In total, that's one hash for the username (~150ns), two pointer jumps within
-the group file, and, now that the user record is found, `memcpy` for each
-field.
+the DB file (to sections `idx_username2user` and `Users`), and, now that the
+user record is found, `memcpy` for each field.
 
 The turbonss DB file is be `mmap`-ed, making it simple to implement pointer
 arithmetic and jumping across the file. This also reduces memory usage,
@@ -288,18 +292,28 @@ A packed list is a list of varints.
 Complete file structure
 -----------------------
 
+`idx_*` entries are of type `[]u29` and point to the respective `Groups` and
+`Users` entries (counted from the beginning of the respective section). Since
+entries are 8-byte aligned, 3 bits are saved per element.
+
+Each section is padded to 64 bytes.
+
 ```
-  SECTION              SIZE                         DESCRIPTION
-  Header               1<<6                         documented above
-  []Group                 ?                         list of Group entries
-  []User                  ?                         list of User entries
-  Shells                  ?                         documented in "SHELLS"
-  cmph_gid2group          ?                         offset by offset_cmph_gid2group
-  cmph_uid2user           ?                         offset by offset_cmph_gid2group
-  cmph_groupname2group    ?                         offset by offset_cmph_groupname2group
-  cmph_username2user      ?                         offset by offset_cmph_username2user
-  groupmembers            ?                         offset by offset_groupmembers
-  additional_gids         ?                         offset by offset_additional_gids
+SECTION                            SIZE   DESCRIPTION
+Header                               40   see "Turbonss header" section
+idx_gid2group        len(group)*4*29/32   list of gid2group indices
+idx_groupname2group  len(group)*4*29/32   list of groupname2group indices
+idx_uid2user          len(user)*4*29/32   list of uid2user indices
+idx_username2user     len(user)*4*29/32   list of username2user indices
+Groups                                ?   list of Group entries
+Users                                 ?   list of User entries
+Shells                                ?   see "Shells" section
+cmph_gid2group                        ?   offset by offset_cmph_gid2group
+cmph_uid2user                         ?   offset by offset_cmph_uid2user
+cmph_groupname2group                  ?   offset by offset_cmph_groupname2group
+cmph_username2user                    ?   offset by offset_cmph_username2user
+groupmembers                          ?   offset by offset_groupmembers
+additional_gids                       ?   offset by offset_additional_gids
 ```
 
 [git-subtrac]: https://github.com/apenwarr/git-subtrac/