make section names more consistent

2022-03-17 06:25:47 +01:00 · 2022-03-17 06:25:47 +01:00 · 85552c1302
commit 85552c1302
parent 5ee8469ec5
1 changed files with 38 additions and 42 deletions
--- a/README.md
+++ b/README.md
@@ -150,7 +150,7 @@ OFFSET     TYPE     NAME                      DESCRIPTION
  32        u64     nblocks_groups
  40        u64     nblocks_users
  48        u64     nblocks_groupmembers
-  56        u64     nblocks_usergids
+  56        u64     nblocks_additional_gids
 ```

 `magic` is 0xf09fa4b7, and `version` must be `0`. All integers are
@@ -174,10 +174,15 @@ the beginning of the section.
 const PackedGroup = packed struct {
    gid: u32,
    groupname_len: u8, // max is 32, but have too much space here.
-    // varint members_offset + (groupname_len-1)-length string
-    groupdata []u8;
 }
+```

+PackedGroup is followed by the group name (of length `groupname_len`), followed
+by a varint-compressed offset to the groupmembers section, followed by 8b padding.
+
+PackedUser is a bit more involved:
+
+```
 pub const PackedUser = packed struct {
    uid: u32,
    gid: u32,
@@ -188,28 +193,25 @@ pub const PackedUser = packed struct {
    home_len: u6,
    name_len: u5,
    gecos_len: u11,
-    // pseudocode: variable-sized array that will be stored immediately after
-    // this struct.
-    userdata []u8;
 }
 ```

-`userdata` contains a few entries:
+... followed by `userdata: []u8`:
 - home.
 - name (optional).
 - gecos.
 - shell (optional).
 - `additional_gids_offset`: varint.

-First byte of home is stored right after the `gecos_len` field, and it's
-length is `home_len`. The same logic applies to all the `stringdata` fields:
-there is a way to calculate their relative position from the length of the
-fields before them.
+First byte of home is stored right after the `gecos_len` field, and its length
+is `home_len`. The same logic applies to all the `stringdata` fields: there is
+a way to calculate their relative position from the length of the fields before
+them.

-Additionally, there are two "easy" optimizations:
+PackedUser employs two "simple" compression techniques:
 - shells are often shared across different users, see the "Shells" section.
- `name` is frequently a suffix of `home`. For example, `/home/motiejus` and
-  `motiejus`. In this case storing both name and home is wasteful. Therefore
+- `name` is frequently a suffix of `home`. For example, `/home/vidmantas` and
+  `vidmantas`. In this case storing both name and home is wasteful. Therefore
  name has two options:
  1. `name_is_a_suffix=true`: name is a suffix of the home dir. Then `name`
  starts at the `home_len - name_len`'th byte of `home`, and ends at the same
@@ -217,8 +219,8 @@ Additionally, there are two "easy" optimizations:
  2. `name_is_a_suffix=false`: name begins one byte after home, and its length
  is `name_len`.
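
The offset arithmetic described above can be sketched in a few lines of Python. This is an illustration, not the project's implementation. The function name `user_strings` is hypothetical. It assumes the `*_len` values are already decoded to actual byte lengths, that `userdata` starts at the first byte of home, and that gecos directly follows name (or home, when the name is a suffix). Shell and `additional_gids_offset` are left out.

```python
def user_strings(userdata: bytes, home_len: int, name_len: int,
                 gecos_len: int, name_is_a_suffix: bool):
    """Slice home, name and gecos out of the bytes following PackedUser."""
    home = userdata[:home_len]
    if name_is_a_suffix:
        # name reuses the tail of home, so it occupies no bytes of its own
        name = home[home_len - name_len:]
        gecos_start = home_len
    else:
        # name is stored separately, right after home
        name = userdata[home_len:home_len + name_len]
        gecos_start = home_len + name_len
    gecos = userdata[gecos_start:gecos_start + gecos_len]
    return home, name, gecos


# Suffix case: "/home/vidmantas" already contains "vidmantas".
shared = user_strings(b"/home/vidmantasVidmantas V", 15, 9, 11, True)
# Non-suffix case: name is stored on its own after home.
separate = user_strings(b"/var/emptynobodyNobody", 10, 6, 6, False)
```

In the suffix case the record is `name_len` bytes shorter, which is the entire saving of this technique.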

-The last field, `additional_gids_offset`, which is needed least frequently,
-is stored at the end.
+The last field, `additional_gids_offset: varint`, points to this user's entry in
+the `additional_gids` section.

 Shells
 ------
@@ -273,26 +275,20 @@ Similarly, when user's groups are resolved in (2), they are not always necessary
 (i.e. not part of `struct user*`), therefore the memberships themselves are
 stored out of band.

-`Groupmembers` and `UserGids` store group and user memberships
-respectively. Membership IDs are used in their entirety — not necessitating
-random access, thus suitable for tight packing and varint encoding.
+`groupmembers` and `additional_gids` store group and user memberships respectively.
+Membership IDs are always read in their entirety, never by random access, which
+makes them suitable for tight packing and compression.

- For each group — a list of pointers (offsets) to User records, because
-  `getgr*_r` returns pointers to membernames.
- For each user — a list of gids, because `initgroups_dyn` (and friends)
-  returns an array of gids.
+- `groupmembers` is a list of pointers (offsets) to User records, because
+  `getgr*_r` returns pointers to membernames, thus a name has to be immediately
+  resolvable.
+- `additional_gids` is a list of gids, because `initgroups_dyn` (and friends) returns
+  an array of gids.

-An entry of `Groupmembers` and `UserGids` looks like this piece of
-pseudo-code:
-
-```
-const PackedList = struct {
-    Length: varint,
-    Members: [Length]varint,
-}
-const Groupmembers = PackedList;
-const UserGids  = PackedList;
-```
+Each entry of `groupmembers` and `additional_gids` starts with a varint N, the
+number of elements that follow, followed by N delta-compressed varints. These
+N values are sorted the same way as the entries in `users` (for `groupmembers`)
+and `groups` (for `additional_gids`).
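
As a rough illustration of such an entry, here is a Python sketch. It is not taken from the project, and it rests on two assumptions the text does not spell out: varints are unsigned LEB128 (7 bits per byte, high bit as continuation flag), and "delta-compressed" means each value is stored as its difference from the previous one, with the first value relative to 0. All function names are hypothetical.

```python
def put_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        n >>= 7
    out.append(n)
    return bytes(out)


def get_uvarint(buf: bytes, pos: int):
    """Decode one varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_list(values) -> bytes:
    """Count N as a varint, then N deltas between ascending values."""
    out = bytearray(put_uvarint(len(values)))
    prev = 0
    for value in sorted(values):
        out += put_uvarint(value - prev)
        prev = value
    return bytes(out)


def decode_list(buf: bytes, pos: int = 0):
    """Inverse of encode_list; return (values, position after the entry)."""
    count, pos = get_uvarint(buf, pos)
    values, prev = [], 0
    for _ in range(count):
        delta, pos = get_uvarint(buf, pos)
        prev += delta
        values.append(prev)
    return values, pos


encoded = encode_list([1000, 1001, 1005])
decoded, end = decode_list(encoded)
```

Because the values are sorted, the deltas stay small and most of them fit in a single byte, which is why the entries can only be read sequentially.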

 Indices
 -------
@@ -352,18 +348,18 @@ shell_blob            <= 4032          shell data blob (max 63*64 bytes)
 groups                ?                packed Group entries (8b padding)
 users                 ?                packed User entries (8b padding)
 groupmembers          ?                per-group delta varint memberlist (no padding)
-user_gids             ?                per-user delta varint gidlist (no padding)
+additional_gids       ?                per-user delta varint gidlist (no padding)
 ```

 Section creation order:

-1. ✅ `bdz_*`. No depdendencies.
-1. ✅ `shellIndex`, `shellBlob`. No dependencies.
-1. ✅ userGids. No dependencies.
-1. ✅ Users. Requires `userGids` and shell.
-1. ✅ Groupmembers. Requires Users.
-1. ✅ Groups. Requires Groupmembers.
-1. ✅ `idx_*`. Requires offsets to Groups and Users.
+1. ✅ `bdz_*`.
+1. ✅ `shell_index`, `shell_blob`.
+1. ✅ `additional_gids`.
+1. ✅ `users` requires `additional_gids` and shell.
+1. ✅ `groupmembers` requires `users`.
+1. ✅ `groups` requires `groupmembers`.
+1. ✅ `idx_*` requires offsets to `groups` and `users`.
 1. Header.
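
The ordering above is a topological sort of the section dependencies. The Python sketch below shows that relationship using the standard library's `graphlib`. It is only an illustration. The dependency table is transcribed from the list, with two assumptions: "shell" is read as both `shell_index` and `shell_blob`, and the header is read as depending on every other section because it is listed last.

```python
from graphlib import TopologicalSorter

# section -> sections that must be built before it
DEPENDENCIES = {
    "bdz_*": set(),
    "shell_index": set(),
    "shell_blob": set(),
    "additional_gids": set(),
    "users": {"additional_gids", "shell_index", "shell_blob"},
    "groupmembers": {"users"},
    "groups": {"groupmembers"},
    "idx_*": {"groups", "users"},
}
# Assumption: the header is written last, after all other sections.
DEPENDENCIES["header"] = set(DEPENDENCIES)


def creation_order(deps):
    """Return one valid build order; raises CycleError on circular deps."""
    return list(TopologicalSorter(deps).static_order())


order = creation_order(DEPENDENCIES)
```

Any order returned this way places `additional_gids` before `users`, `users` before `groupmembers`, and `groupmembers` before `groups`, matching the numbered list.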

 [git-subtrac]: https://apenwarr.ca/log/20191109