From d422cdf61bfa335925b1e28d41d49a57f81854ad Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Motiejus=20Jak=C5=A1tys?=
Date: Mon, 14 Feb 2022 13:37:10 +0200
Subject: [PATCH] add missing fields

---
 README.md | 60 ++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/README.md b/README.md
index e55b0fe..d153fce 100644
--- a/README.md
+++ b/README.md
@@ -12,13 +12,14 @@ To understand more about name service switch, start with
 Design & constraints
 --------------------
 
-To be fast, the user/group database (later: DB) has to be small ([highly
-recommended background viewing][data-oriented-design]). It encodes user & group
-information in a way that minimizes the DB size, and reduces jumping across the
-DB ("chasing pointers and thrashing CPU cache").
+To be fast, the user/group database (later: DB) has to be small
+([background][data-oriented-design]). It encodes user & group information in a
+way that minimizes the DB size, and reduces jumping across the DB ("chasing
+pointers and thrashing CPU cache").
 
-For example, [`getpwnam_r(3)`][getpwnam_r] accepts a username and returns
-the following user information:
+To understand how this is done efficiently, let's analyze the
+[`getpwnam_r(3)`][getpwnam_r] at a high level. This API call accepts a username
+and returns the following user information:
 
 ```
 struct passwd {
@@ -35,18 +36,21 @@ struct passwd {
 Turbonss, among others, implements this call, and takes the following steps to
 resolve a username to a `struct passwd*`:
 
+- Open the DB (using `mmap`) and interpret its first 40 bytes as a `struct
+  Header`. The header stores offsets to the sections of the file. This needs to
+  be done once, when the NSS library is loaded (or on the first call).
 - Hash the username using a perfect hash function. Perfect hash function returns
   a number `n ∈ [0,N-1]`, where N is the total number of users.
-- Jump to the `n`'th location in the DB (by pointer arithmetic) which contains
-  the index `i` to the user's information.
-- Jump to the location `i` (pointer arithmetic) which stores the full user
-  information.
+- Jump to the `n`'th location in the `idx_username2user` section (by pointer
+  arithmetic), which contains the index `i` to the user's information.
+- Jump to the location `i` of section `Users` (again, using pointer arithmetic)
+  which stores the full user information.
 - Decode the user information (which is all in a continuous memory block) and
   return it to the caller.
 
 In total, that's one hash for the username (~150ns), two pointer jumps within
-the group file, and, now that the user record is found, `memcpy` for each
-field.
+the group file (to sections `idx_username2user` and `Users`), and, now that the
+user record is found, `memcpy` for each field.
 
 The turbonss DB file is be `mmap`-ed, making it simple to implement pointer
 arithmetic and jumping across the file. This also reduces memory usage,
@@ -288,18 +292,28 @@ A packed list is a list of varints.
 Complete file structure
 -----------------------
 
+`idx_*` entries are of type `[]u29` and are pointing to the respective `Groups`
+and `Users` entries (from the beginning of the respective section). Since
+entries are 8-byte aligned, 3 bits are saved from every element.
+
+Each section is padded to 64 bytes.
+
 ```
-  SECTION               SIZE  DESCRIPTION
-  Header                1<<6  documented above
-  []Group               ?     list of Group entries
-  []User                ?     list of User entries
-  Shells                ?     documented in "SHELLS"
-  cmph_gid2group        ?     offset by offset_cmph_gid2group
-  cmph_uid2user         ?     offset by offset_cmph_gid2group
-  cmph_groupname2group  ?     offset by offset_cmph_groupname2group
-  cmph_username2user    ?     offset by offset_cmph_username2user
-  groupmembers          ?     offset by offset_groupmembers
-  additional_gids       ?     offset by offset_additional_gids
+SECTION               SIZE                DESCRIPTION
+Header                40                  see "Turbonss header" section
+idx_gid2group         len(group)*4*29/32  list of gid2group indices
+idx_groupname2group   len(group)*4*29/32  list of groupname2group indices
+idx_uid2user          len(user)*4*29/32   list of uid2user indices
+idx_username2user     len(user)*4*29/32   list of username2user indices
+Groups                ?                   list of Group entries
+Users                 ?                   list of User entries
+Shells                ?                   See "Shells" section
+cmph_gid2group        ?                   offset by offset_cmph_gid2group
+cmph_uid2user         ?                   offset by offset_cmph_uid2user
+cmph_groupname2group  ?                   offset by offset_cmph_groupname2group
+cmph_username2user    ?                   offset by offset_cmph_username2user
+groupmembers          ?                   offset by offset_groupmembers
+additional_gids       ?                   offset by offset_additional_gids
 ```
 
 [git-subtrac]: https://github.com/apenwarr/git-subtrac/
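
Note on the `idx_*` sections added by this patch: they are described as lists of 29-bit indices, and the size formula `len*4*29/32` suggests the entries are bit-packed. A minimal C sketch of reading one entry and turning it into a byte offset within `Groups` or `Users` might look like the following. The helper names (`u29_put`, `u29_get`, `u29_to_byte_offset`) are hypothetical, and the bit order (least-significant bit first, entries stored back to back) is an assumption; the README does not specify it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define U29_BITS 29u
#define U29_MASK 0x1FFFFFFFu

/* Hypothetical helper: store `value` as the n-th 29-bit entry.
 * The target array must be zero-initialized, because this only sets bits.
 * Assumed layout: entries back to back, least-significant bit first. */
static void u29_put(uint8_t *packed, size_t n, uint32_t value) {
    assert(value <= U29_MASK);
    size_t bit = n * U29_BITS;
    for (unsigned k = 0; k < U29_BITS; k++, bit++) {
        if ((value >> k) & 1u)
            packed[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Hypothetical helper: read the n-th 29-bit entry (same assumed layout). */
static uint32_t u29_get(const uint8_t *packed, size_t n) {
    size_t bit = n * U29_BITS;
    uint32_t value = 0;
    for (unsigned k = 0; k < U29_BITS; k++, bit++) {
        if ((packed[bit / 8] >> (bit % 8)) & 1u)
            value |= (uint32_t)1u << k;
    }
    return value;
}

/* Entries are 8-byte aligned, so the low 3 bits of a byte offset are always
 * zero and need not be stored; shifting the index left by 3 restores them. */
static uint64_t u29_to_byte_offset(uint32_t index) {
    return (uint64_t)index << 3;
}
```

Under these assumptions, a stored index of 5 refers to byte offset 40 from the start of the respective section, and 29 bits of index cover offsets up to 4 GiB.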