From 85552c13027ce3451534caf9bb6230be346154be Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Motiejus=20Jak=C5=A1tys?=
Date: Thu, 17 Mar 2022 06:25:47 +0100
Subject: [PATCH] make section names more consistent

---
 README.md | 80 ++++++++++++++++++++++++++-----------------------------
 1 file changed, 38 insertions(+), 42 deletions(-)

diff --git a/README.md b/README.md
index e1965e5..1aa00e0 100644
--- a/README.md
+++ b/README.md
@@ -150,7 +150,7 @@ OFFSET TYPE NAME DESCRIPTION
   32  u64  nblocks_groups
   40  u64  nblocks_users
   48  u64  nblocks_groupmembers
-  56  u64  nblocks_usergids
+  56  u64  nblocks_additional_gids
 ```
 
 `magic` is 0xf09fa4b7, and `version` must be `0`. All integers are
@@ -174,10 +174,15 @@ the beginning of the section.
 const PackedGroup = packed struct {
     gid: u32,
     groupname_len: u8, // max is 32, but have too much space here.
-    // varint members_offset + (groupname_len-1)-length string
-    groupdata []u8;
 }
+```
 
+PackedGroup is followed by the group name (of length `groupname_len`), followed
+by a varint-compressed offset to the groupmembers section, followed by 8b padding.
+
+PackedUser is a bit more involved:
+
+```
 pub const PackedUser = packed struct {
     uid: u32,
     gid: u32,
@@ -188,28 +193,25 @@ pub const PackedUser = packed struct {
     home_len: u6,
     name_len: u5,
     gecos_len: u11,
-    // pseudocode: variable-sized array that will be stored immediately after
-    // this struct.
-    userdata []u8;
 }
 ```
 
-`userdata` contains a few entries:
+... followed by `userdata: []u8`:
 - home.
 - name (optional).
 - gecos.
 - shell (optional).
 - `additional_gids_offset`: varint.
 
-First byte of home is stored right after the `gecos_len` field, and it's
-length is `home_len`. The same logic applies to all the `stringdata` fields:
-there is a way to calculate their relative position from the length of the
-fields before them.
+First byte of home is stored right after the `gecos_len` field, and its length
+is `home_len`. The same logic applies to all the `stringdata` fields: there is
+a way to calculate their relative position from the length of the fields before
+them.
 
-Additionally, there are two "easy" optimizations:
+PackedUser employs two "simple" compression techniques:
 - shells are often shared across different users, see the "Shells" section.
-- `name` is frequently a suffix of `home`. For example, `/home/motiejus` and
-  `motiejus`. In this case storing both name and home is wasteful. Therefore
+- `name` is frequently a suffix of `home`. For example, `/home/vidmantas` and
+  `vidmantas`. In this case storing both name and home is wasteful. Therefore
   name has two options:
   1. `name_is_a_suffix=true`: name is a suffix of the home dir. Then `name`
      starts at the `home_len - name_len`'th byte of `home`, and ends at the same
@@ -217,8 +219,8 @@ Additionally, there are two "easy" optimizations:
   2. `name_is_a_suffix=false`: name begins one byte after home, and it's length
      is `name_len`.
 
-The last field, `additional_gids_offset`, which is needed least frequently,
-is stored at the end.
+The last field `additional_gids_offset: varint` points to the `additional_gids` section for
+this user.
 
 Shells
 ------
@@ -273,26 +275,20 @@ Similarly, when user's groups are resolved in (2), they are not always
 necessary (i.e. not part of `struct user*`), therefore the memberships
 themselves are stored out of bound.
 
-`Groupmembers` and `UserGids` store group and user memberships
-respectively. Membership IDs are used in their entirety — not necessitating
-random access, thus suitable for tight packing and varint encoding.
+`groupmembers` and `additional_gids` store group and user memberships respectively.
+Membership IDs are packed — not necessitating random access, thus suitable for
+compression.
 
-- For each group — a list of pointers (offsets) to User records, because
-  `getgr*_r` returns pointers to membernames.
-- For each user — a list of gids, because `initgroups_dyn` (and friends)
-  returns an array of gids.
+- `groupmembers` is a list of pointers (offsets) to User records, because
+  `getgr*_r` returns pointers to membernames, thus a name has to be immediately
+  resolvable.
+- `additional_gids` is a list of gids, because `initgroups_dyn` (and friends) returns
+  an array of gids.
 
-An entry of `Groupmembers` and `UserGids` looks like this piece of
-pseudo-code:
-
-```
-const PackedList = struct {
-    Length: varint,
-    Members: [Length]varint,
-}
-const Groupmembers = PackedList;
-const UserGids = PackedList;
-```
+Each entry of `groupmembers` and `additional_gids` starts with a varint N, which is
+the number of upcoming elements, followed by N delta-compressed varints. These
+N delta-compressed varints are sorted the same way entries in `users` (in
+`groupmembers`) and `groups`.
 
 Indices
 -------
@@ -352,18 +348,18 @@ shell_blob <= 4032 shell data blob (max 63*64 bytes)
 groups           ?  packed Group entries (8b padding)
 users            ?  packed User entries (8b padding)
 groupmembers     ?  per-group delta varint memberlist (no padding)
-user_gids        ?  per-user delta varint gidlist (no padding)
+additional_gids  ?  per-user delta varint gidlist (no padding)
 ```
 
 Section creation order:
 
-1. ✅ `bdz_*`. No depdendencies.
-1. ✅ `shellIndex`, `shellBlob`. No dependencies.
-1. ✅ userGids. No dependencies.
-1. ✅ Users. Requires `userGids` and shell.
-1. ✅ Groupmembers. Requires Users.
-1. ✅ Groups. Requires Groupmembers.
-1. ✅ `idx_*`. Requires offsets to Groups and Users.
+1. ✅ `bdz_*`.
+1. ✅ `shell_index`, `shell_blob`.
+1. ✅ `additional_gids`.
+1. ✅ `users` requires `additional_gids` and shell.
+1. ✅ `groupmembers` requires `users`.
+1. ✅ `groups` requires `groupmembers`.
+1. ✅ `idx_*`. requires offsets to `groups` and `users`.
 1. Header.
 
 [git-subtrac]: https://apenwarr.ca/log/20191109
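
Reviewer note: the entry layout this patch describes for `groupmembers` and `additional_gids` (a varint N, then N delta-compressed varints in sorted order) can be sketched roughly as below. This is only an illustration, not the project's implementation. The patch does not say which varint encoding is used, so LEB128-style bytes (7 payload bits each, high bit as a continuation flag) are an assumption. The delta scheme is also an assumption: first element stored as-is, every later element stored as the difference from its predecessor. The names `putVarint`, `getVarint`, `packList` and `unpackList` are hypothetical, and the cast builtins use Zig 0.11+ syntax.

```zig
const std = @import("std");

// Hypothetical helper: write `value` at `buf[pos..]` as a LEB128-style varint
// (assumed encoding). Returns the position just past the last byte written.
fn putVarint(buf: []u8, pos: usize, value: u64) usize {
    var v = value;
    var i = pos;
    while (v >= 0x80) : (i += 1) {
        buf[i] = @as(u8, @truncate(v)) | 0x80; // low 7 bits + continuation flag
        v >>= 7;
    }
    buf[i] = @as(u8, @truncate(v)); // final byte, continuation flag clear
    return i + 1;
}

const Varint = struct { value: u64, next: usize };

// Hypothetical helper: read one varint starting at `buf[pos]`.
// Assumes well-formed input; an over-long varint trips a safety check.
fn getVarint(buf: []const u8, pos: usize) Varint {
    var result: u64 = 0;
    var shift: u6 = 0;
    var i = pos;
    while (true) : (i += 1) {
        const b = buf[i];
        result |= @as(u64, b & 0x7f) << shift;
        if ((b & 0x80) == 0) return .{ .value = result, .next = i + 1 };
        shift += 7;
    }
}

// Pack one entry: varint N, then N deltas. `sorted` must be ascending,
// otherwise the subtraction underflows. Returns the number of bytes written.
fn packList(buf: []u8, sorted: []const u64) usize {
    var pos = putVarint(buf, 0, sorted.len);
    var prev: u64 = 0;
    for (sorted) |x| {
        pos = putVarint(buf, pos, x - prev);
        prev = x;
    }
    return pos;
}

// Unpack one entry into `out` (which must hold at least N items).
// Returns N, the number of elements decoded.
fn unpackList(buf: []const u8, out: []u64) usize {
    const head = getVarint(buf, 0);
    const n: usize = @intCast(head.value);
    var pos = head.next;
    var prev: u64 = 0;
    for (out[0..n]) |*slot| {
        const d = getVarint(buf, pos);
        prev += d.value; // undo the delta
        slot.* = prev;
        pos = d.next;
    }
    return n;
}

test "round-trip a sorted gid list" {
    var buf: [32]u8 = undefined;
    const gids = [_]u64{ 1000, 1001, 1300 };
    const written = packList(&buf, &gids);

    var out: [3]u64 = undefined;
    const n = unpackList(buf[0..written], &out);
    try std.testing.expect(n == gids.len);
    try std.testing.expectEqualSlices(u64, &gids, out[0..n]);
}
```

Under these assumptions the three gids above take six bytes (one for N, two for 1000, one for the delta 1, two for the delta 299), which is the reason sorted lists suit delta compression: neighbouring IDs tend to produce small deltas.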