more nss docs

Motiejus Jakštys 2022-02-14 10:55:49 +02:00 committed by Motiejus Jakštys
parent 1327587838
commit f0d9d16cad

150
README.md

@ -1,7 +1,60 @@
Turbo NSS Turbo NSS
--------- ---------
Glibc nss library for passwd and group. Turbonss is a plugin for GNU Name Service Switch (NSS) functionality of GNU C
Library (glibc). Turbonss implements lookup for `passwd` and `group` database
entries (i.e. system users, groups, and group memberships). Its main goal is
performance, with focus on making [`id(1)`][id] run as fast as possible.
To learn more about the Name Service Switch, start with
[`nsswitch.conf(5)`][nsswitch].
Design & constraints
--------------------
To be fast, the user/group database (hereafter: DB) has to be small ([highly
recommended background viewing][data-oriented-design]). Turbonss encodes user
and group information in a way that minimizes the DB size and reduces jumping
across the DB ("chasing pointers and polluting CPU cache").
For example, [`getpwnam_r(3)`][getpwnam_r] accepts a username and returns
the following user information:
```
struct passwd {
char *pw_name; /* username */
char *pw_passwd; /* user password */
uid_t pw_uid; /* user ID */
gid_t pw_gid; /* group ID */
char *pw_gecos; /* user information */
char *pw_dir; /* home directory */
char *pw_shell; /* shell program */
};
```
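For context, a caller might use this API roughly as sketched below. `lookup_user` and the fixed 16 KiB scratch buffer are illustrative choices, not part of turbonss; only `getpwnam_r` itself is the real libc call.

```c
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if the user was found, 0 if there is no
 * such user, -1 on error. Copies the uid and home directory out. */
static int lookup_user(const char *name, uid_t *uid, char *home, size_t home_cap) {
    struct passwd pwd, *result = NULL;
    char buf[16384]; /* backing storage for the string fields of pwd */

    assert(name != NULL);
    int rc = getpwnam_r(name, &pwd, buf, sizeof buf, &result);
    if (result == NULL)
        return rc == 0 ? 0 : -1; /* rc == 0 with NULL result: not found */

    *uid = pwd.pw_uid;
    snprintf(home, home_cap, "%s", pwd.pw_dir);
    return 1;
}
```

Every string field in `struct passwd` points into the caller-supplied buffer, which is why an NSS module ends up copying each field into it.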
Turbonss implements this call, among others, and takes the following steps to
resolve it:
- Hash the username using a perfect hash function. The perfect hash function
returns a number in the range [0,N), where N is the total number of users.
- Jump to a known location in the DB (by pointer arithmetic) which links the
user's index to the user's information. That is an index to a different
location within the DB.
- Jump to the location which stores the full user information.
- Decode the user information (which is all in a contiguous memory block) and
return it to the caller.
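The steps above can be sketched in C roughly as follows. This is a toy model, not turbonss code: `toy_hash` is a stand-in perfect hash that only works for the fixed key set `alice`, `bob`, `carol` (a real implementation would use a generated perfect hash, such as one from [cmph]), and the record layout (4-byte uid, 1-byte name length, name bytes) is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NUM_USERS = 3 };

/* Stand-in perfect hash for the fixed key set {"alice", "bob", "carol"}. */
static uint32_t toy_hash(const char *name) {
    return (uint32_t)((unsigned char)name[0] - 'a') % NUM_USERS;
}

/* Append one record (uid, name length, name bytes); return the new end. */
static size_t put_user(uint8_t *db, size_t pos, uint32_t uid, const char *name) {
    size_t len = strlen(name);
    assert(len <= 255);
    memcpy(db + pos, &uid, sizeof uid);
    db[pos + 4] = (uint8_t)len;
    memcpy(db + pos + 5, name, len);
    return pos + 5 + len;
}

/* Toy DB layout: [NUM_USERS x u32 offsets][records...]. Needs 40 bytes. */
static void build_toy_db(uint8_t *db) {
    const char *names[NUM_USERS] = {"alice", "bob", "carol"};
    uint32_t slots[NUM_USERS];
    size_t pos = sizeof slots; /* records follow the offset table */
    for (uint32_t i = 0; i < NUM_USERS; i++) {
        slots[toy_hash(names[i])] = (uint32_t)pos;
        pos = put_user(db, pos, 1000 + i, names[i]);
    }
    memcpy(db, slots, sizeof slots);
}

/* Returns 1 and fills *uid if name is in the DB, 0 otherwise. */
static int lookup_uid(const uint8_t *db, const char *name, uint32_t *uid) {
    uint32_t slot = toy_hash(name);                   /* 1: hash the name   */
    uint32_t offset;
    memcpy(&offset, db + slot * sizeof offset, sizeof offset); /* 2: index  */
    const uint8_t *rec = db + offset;                 /* 3: jump to record  */
    uint8_t len = rec[4];
    if (strlen(name) != len || memcmp(rec + 5, name, len) != 0)
        return 0; /* hash landed on a different user's slot */
    memcpy(uid, rec, sizeof *uid);                    /* 4: decode          */
    return 1;
}
```

The name comparison in `lookup_uid` is needed because a perfect hash maps keys outside its key set to an arbitrary slot, so a hit must be confirmed against the stored name.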
In total, that's one hash for the username (~150ns), two pointer jumps within
the DB, and, once the user record is found, a `memcpy` for each
field.
This tight packing places some constraints on the underlying data:
- Maximum database size: 4GB.
- Maximum length of username and groupname: 32 bytes.
- Maximum length of shell and homedir: 64 bytes.
- Maximum comment ("gecos") length: 256 bytes.
- Username and groupname must be UTF-8 encoded.
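A pre-flight check for the length limits above might look like the sketch below. The names (`user_fits`, the `MAX_*` constants) are hypothetical, and it assumes the limits count bytes excluding the terminating NUL, which the list does not spell out. UTF-8 validation is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Limits taken from the list above; the constant names are made up. */
enum { MAX_NAME = 32, MAX_SHELL = 64, MAX_HOME = 64, MAX_GECOS = 256 };

static bool fits(const char *s, size_t max) {
    return strlen(s) <= max;
}

/* True if every field of a would-be user record is within the limits. */
static bool user_fits(const char *name, const char *gecos,
                      const char *home, const char *shell) {
    assert(name && gecos && home && shell);
    return fits(name, MAX_NAME) && fits(gecos, MAX_GECOS) &&
           fits(home, MAX_HOME) && fits(shell, MAX_SHELL);
}
```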
Checking out and building Checking out and building
------------------------- -------------------------
@ -47,6 +100,10 @@ significant read amplification. Therefore, if `argv[0] == "id"`, `getgrgid(3)`
will return group without the members. This speeds up `id` by about 10x on a will return group without the members. This speeds up `id` by about 10x on a
known NSS implementation. known NSS implementation.
Because `getgrgid(3)` does not use the group members' information, the group
members are stored in a different location, making the `Groups` section
smaller and thus more CPU-cache-friendly.
Indices Indices
------- -------
@ -85,8 +142,7 @@ OFFSET TYPE NAME DESCRIPTION
0 [4]u8 magic always 0xf09fa4b7 0 [4]u8 magic always 0xf09fa4b7
4 u8 version now `0` 4 u8 version now `0`
5 u16 bom 0x1234 5 u16 bom 0x1234
7 u2 padding 7 u8 padding
u6 num_shells see "SHELLS" section.
8 u32 num_users number of passwd entries 8 u32 num_users number of passwd entries
12 u32 num_groups number of group entries 12 u32 num_groups number of group entries
16 u32 offset_cmph_gid2group 16 u32 offset_cmph_gid2group
@ -109,13 +165,14 @@ offsets are always pointing to the beginning of a 64-byte "block". Therefore,
all `offset_*` values could be `u26`. As `u32` is easier to visualize with xxd, all `offset_*` values could be `u26`. As `u32` is easier to visualize with xxd,
and the header block fits to 64 bytes anyway, we are keeping them as u32 now. and the header block fits to 64 bytes anyway, we are keeping them as u32 now.
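The `u26` figure follows from the 4GB size limit: 2^32 bytes / 2^6 bytes per block = 2^26 blocks. A small sketch of that arithmetic, assuming an `offset_*` value counts 64-byte blocks (the text implies this but does not state it outright); the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

enum { BLOCK_BITS = 6 }; /* 1 << 6 == 64-byte blocks */

/* Convert a block number to a byte offset within a DB of at most 4GB. */
static uint64_t block_to_byte_offset(uint32_t block) {
    assert(block < (UINT32_C(1) << 26)); /* 2^32 / 2^6 == 2^26 blocks */
    return (uint64_t)block << BLOCK_BITS;
}

/* Number of 64-byte blocks needed to hold nbytes, rounding up. */
static uint32_t blocks_needed(uint64_t nbytes) {
    return (uint32_t)((nbytes + 63) >> BLOCK_BITS);
}
```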
Primitive types: Primitive types
---------------
``` ```
const Group = struct { const Group = struct {
gid: u32, gid: u32,
// index to a separate structure with a list of members. The memberlist is // index to a separate structure with a list of members. The memberlist is
// always 2^5-byte aligned, this is an index there. // always 2^5-byte aligned (32b), this is an index there.
members_offset: u27, members_offset: u27,
groupname_len: u5, groupname_len: u5,
// a groupname_len-sized string // a groupname_len-sized string
@ -140,29 +197,76 @@ const User = struct {
} }
``` ```
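In `Group`, `members_offset: u27` and `groupname_len: u5` together fill exactly one 32-bit word, and a 27-bit index into 32-byte-aligned data spans 2^27 * 2^5 = 2^32 bytes, matching the 4GB limit. A sketch of the packing in C follows; the bit order (offset in the low bits) is an assumption, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 27-bit members index and a 5-bit length into one 32-bit word.
 * Assumption: members_offset occupies the low 27 bits. */
static uint32_t pack_group_word(uint32_t members_offset, uint32_t groupname_len) {
    assert(members_offset < (UINT32_C(1) << 27));
    assert(groupname_len < (UINT32_C(1) << 5));
    return members_offset | (groupname_len << 27);
}

static uint32_t unpack_members_offset(uint32_t word) {
    return word & ((UINT32_C(1) << 27) - 1);
}

static uint32_t unpack_groupname_len(uint32_t word) {
    return word >> 27;
}

/* The member list is 2^5-byte aligned, so the index scales by 32. */
static uint64_t members_byte_offset(uint32_t members_offset) {
    return (uint64_t)members_offset << 5;
}
```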
`User` and `Group` entries are sorted by name, ordered by their Unicode
code points.
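Since names are UTF-8 encoded, a plain byte-wise comparison already yields code point order, because UTF-8 preserves that ordering. A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* strcmp compares bytes as unsigned char, which for valid UTF-8 is the
 * same as comparing by Unicode code point. */
static int cmp_names(const void *a, const void *b) {
    const char *const *x = a, *const *y = b;
    return strcmp(*x, *y);
}

static void sort_names(const char **names, size_t n) {
    assert(names != NULL || n == 0);
    qsort(names, n, sizeof *names, cmp_names);
}
```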
Shells
------
Even huge user databases normally contain only a handful of distinct shells. A
few examples: `/bin/bash`, `/usr/bin/nologin`, `/bin/zsh` among others.
Therefore, "shells" have an optimization: a shell can either be referenced
from an external list or reside among the user's data.
The 64 (1<<6) most popular shells (i.e. those referred to by at least two User
entries) are stored externally in the "Shells" area. The less popular ones are
stored with userdata.
The `shell_here=true` bit signifies that the shell is stored with userdata;
`false` means it is stored in the `Shells` section. If the shell is stored
"here", it is the first element in `stringdata`, and its length is
`shell_len_or_place`. If it is stored externally, `shell_len_or_place` is
its index in the external storage.
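Resolving a user's shell under this scheme could look like the sketch below. `struct str_slice` and `resolve_shell` are hypothetical, the external storage is modelled as a plain array of slices, and `shell_len_or_place` is assumed to hold the raw length when the shell is stored inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A string that is not NUL-terminated: pointer plus length. */
struct str_slice { const char *ptr; size_t len; };

/* shell_here == true:  the shell is the first element of stringdata and
 *                      shell_len_or_place is its length.
 * shell_here == false: shell_len_or_place indexes the external list. */
static struct str_slice resolve_shell(bool shell_here, uint8_t shell_len_or_place,
                                      const char *stringdata,
                                      const struct str_slice *shells,
                                      size_t num_shells) {
    if (shell_here)
        return (struct str_slice){ stringdata, shell_len_or_place };
    assert(shell_len_or_place < num_shells);
    return shells[shell_len_or_place];
}
```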
Shells in the external storage are sorted by their weight, which is
`length*frequency`.
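As an illustration of the weight, the sketch below sorts candidate shells by `length*frequency`. The struct and function names are hypothetical, and the descending order (heaviest first) is an assumption; the text does not say which direction is used.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct shell_stat {
    const char *path; /* e.g. "/bin/bash" */
    size_t users;     /* number of User entries referring to it */
};

static size_t weight(const struct shell_stat *s) {
    return strlen(s->path) * s->users; /* length * frequency */
}

/* qsort comparator: heavier shells first (assumed direction). */
static int by_weight_desc(const void *a, const void *b) {
    size_t wa = weight(a), wb = weight(b);
    return (wa < wb) - (wa > wb);
}

static void sort_shells(struct shell_stat *shells, size_t n) {
    assert(shells != NULL || n == 0);
    qsort(shells, n, sizeof *shells, by_weight_desc);
}
```

With 2, 50 and 100 users respectively, `/bin/zsh` weighs 8*2 = 16, `/usr/bin/nologin` 16*50 = 800 and `/bin/bash` 9*100 = 900.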
`groupmembers`, `additional_gids`
---------------------------------
`groupmembers` and `additional_gids` store group and user memberships
respectively: for each group, a list of pointers ("offsets") to User records,
and for each user, a list of pointers to Group records. These fields are
always read in their entirety, so random access is not required, which makes
them suitable for tight packing.
An entry of `groupmembers` and `additional_gids` looks like this piece of
pseudo-code:
```
const PackedList = struct {
length: varint,
members: []varint
}
const Groupmembers = PackedList;
const AdditionalGids = PackedList;
```
Each entry in the `members` field is an offset to a `User` or
`Group` entry (number of bytes relative to the first entry of the respective
type). The `members` field in `PackedList` is sorted by the name (`username` or
`groupname`) of the record it points to.
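Decoding a `PackedList` could be sketched as below. The varint encoding is not specified here, so the sketch assumes unsigned LEB128 (7 payload bits per byte, high bit set on all but the last byte), and it assumes `length` counts members rather than bytes. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one unsigned LEB128 varint (assumed encoding). Returns the number
 * of bytes consumed, or 0 if the input is truncated or too long. */
static size_t varint_decode(const uint8_t *buf, size_t len, uint64_t *out) {
    assert(buf != NULL && out != NULL);
    uint64_t value = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        value |= (uint64_t)(buf[i] & 0x7f) << (7 * i);
        if ((buf[i] & 0x80) == 0) {
            *out = value;
            return i + 1;
        }
    }
    return 0;
}

/* Decode a PackedList: a member count followed by that many offsets.
 * Returns bytes consumed, or 0 on malformed input or if cap is too small. */
static size_t packed_list_decode(const uint8_t *buf, size_t len,
                                 uint64_t *members, size_t cap, size_t *count) {
    uint64_t n;
    size_t pos = varint_decode(buf, len, &n);
    if (pos == 0 || n > cap)
        return 0;
    for (uint64_t i = 0; i < n; i++) {
        size_t used = varint_decode(buf + pos, len - pos, &members[i]);
        if (used == 0)
            return 0;
        pos += used;
    }
    *count = (size_t)n;
    return pos;
}
```

Because the list is always consumed front to back, the variable-width encoding costs nothing in lookup speed, in line with the no-random-access point made earlier.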
Complete file structure Complete file structure
----------------------- -----------------------
``` ```
OFFSET Section SIZE DESCRIPTION SECTION SIZE DESCRIPTION
0<<6 Header 1<<6 documented above Header 1<<6 documented above
*<<6 []Group num_groups * sizeof(Group) []Group ? list of Group entries
*<<6 []User num_users * sizeof(User) []User ? list of User entries
*<<6 []u32 num_groups * sizeof(u32) Shells ? documented in "SHELLS"
*<<6 []u32 num_users * sizeof(u32) cmph_gid2group ? offset by offset_cmph_gid2group
*<<6 Shells unknown documented in "SHELLS" cmph_uid2user ? offset by offset_cmph_gid2group
*<<6 cmph_gid2group unknown offset by offset_cmph_gid2group cmph_groupname2group ? offset by offset_cmph_groupname2group
*<<6 cmph_uid2user unknown offset by offset_cmph_gid2group cmph_username2user ? offset by offset_cmph_username2user
*<<6 cmph_groupname2group unknown offset by offset_cmph_groupname2group groupmembers ? offset by offset_groupmembers
*<<6 cmph_username2user unknown offset by offset_cmph_username2user additional_gids ? offset by offset_additional_gids
*<<6 groupmembers unknown list of group members for each group
*<<6 additional_gids unknown list of gids (group membership) for each member
``` ```
TODO explain:
- shells
- `additional_gids`
- `groupmembers`
[git-subtrac]: https://github.com/apenwarr/git-subtrac/ [git-subtrac]: https://github.com/apenwarr/git-subtrac/
[cmph]: http://cmph.sourceforge.net/ [cmph]: http://cmph.sourceforge.net/
[id]: https://linux.die.net/man/1/id
[nsswitch]: https://linux.die.net/man/5/nsswitch.conf
[data-oriented-design]: https://media.handmade-seattle.com/practical-data-oriented-design/
[getpwnam_r]: https://linux.die.net/man/3/getpwnam_r