make section names more consistent

Motiejus Jakštys 2022-03-17 06:25:47 +01:00 committed by Motiejus Jakštys
parent 5ee8469ec5
commit 85552c1302

@@ -150,7 +150,7 @@ OFFSET TYPE NAME DESCRIPTION
 32 u64 nblocks_groups
 40 u64 nblocks_users
 48 u64 nblocks_groupmembers
-56 u64 nblocks_usergids
+56 u64 nblocks_additional_gids
 ```
 `magic` is 0xf09fa4b7, and `version` must be `0`. All integers are
@@ -174,10 +174,15 @@ the beginning of the section.
 const PackedGroup = packed struct {
 gid: u32,
 groupname_len: u8, // max is 32, but have too much space here.
-// varint members_offset + (groupname_len-1)-length string
-groupdata []u8;
 }
+```
+PackedGroup is followed by the group name (of length `groupname_len`), followed
+by a varint-compressed offset to the groupmembers section, followed by 8b padding.
+PackedUser is a bit more involved:
+```
 pub const PackedUser = packed struct {
 uid: u32,
 gid: u32,
@@ -188,28 +193,25 @@ pub const PackedUser = packed struct {
 home_len: u6,
 name_len: u5,
 gecos_len: u11,
-// pseudocode: variable-sized array that will be stored immediately after
-// this struct.
-userdata []u8;
 }
 ```
-`userdata` contains a few entries:
+... followed by `userdata: []u8`:
 - home.
 - name (optional).
 - gecos.
 - shell (optional).
 - `additional_gids_offset`: varint.
-First byte of home is stored right after the `gecos_len` field, and it's
-length is `home_len`. The same logic applies to all the `stringdata` fields:
-there is a way to calculate their relative position from the length of the
-fields before them.
+First byte of home is stored right after the `gecos_len` field, and its length
+is `home_len`. The same logic applies to all the `stringdata` fields: there is
+a way to calculate their relative position from the length of the fields before
+them.
-Additionally, there are two "easy" optimizations:
+PackedUser employs two "simple" compression techniques:
 - shells are often shared across different users, see the "Shells" section.
-- `name` is frequently a suffix of `home`. For example, `/home/motiejus` and
-`motiejus`. In this case storing both name and home is wasteful. Therefore
+- `name` is frequently a suffix of `home`. For example, `/home/vidmantas` and
+`vidmantas`. In this case storing both name and home is wasteful. Therefore
 name has two options:
 1. `name_is_a_suffix=true`: name is a suffix of the home dir. Then `name`
 starts at the `home_len - name_len`'th byte of `home`, and ends at the same
@@ -217,8 +219,8 @@ Additionally, there are two "easy" optimizations:
 2. `name_is_a_suffix=false`: name begins one byte after home, and it's length
 is `name_len`.
-The last field, `additional_gids_offset`, which is needed least frequently,
-is stored at the end.
+The last field `additional_gids_offset: varint` points to the `additional_gids` section for
+this user.
 Shells
 ------
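To illustrate the `name_is_a_suffix` rule described in the hunks above, here is a minimal Zig sketch of how a reader could locate `name` inside the bytes that follow a `PackedUser`. The function `userName` and its parameters are hypothetical and not taken from the repository. The sketch assumes that `home_len` and `name_len` are plain byte counts, and that "begins one byte after home" means the byte immediately following the last byte of `home`.

```zig
const std = @import("std");

/// Returns the user name from the bytes stored after a PackedUser header.
/// Assumption: `data` starts at the first byte of `home`, and both lengths
/// are plain byte counts (the on-disk encoding may differ).
pub fn userName(
    data: []const u8,
    home_len: usize,
    name_len: usize,
    name_is_a_suffix: bool,
) []const u8 {
    if (name_is_a_suffix) {
        // The name is the last `name_len` bytes of `home`; nothing extra is stored.
        return data[home_len - name_len .. home_len];
    }
    // The name is stored separately, right after `home`.
    return data[home_len .. home_len + name_len];
}
```

With `data = "/home/vidmantas"`, `home_len = 15`, `name_len = 9` and `name_is_a_suffix = true`, this returns `vidmantas` without the name being stored a second time.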
@@ -273,26 +275,20 @@ Similarly, when user's groups are resolved in (2), they are not always necessary
 (i.e. not part of `struct user*`), therefore the memberships themselves are
 stored out of bound.
-`Groupmembers` and `UserGids` store group and user memberships
-respectively. Membership IDs are used in their entirety — not necessitating
-random access, thus suitable for tight packing and varint encoding.
+`groupmembers` and `additional_gids` store group and user memberships respectively.
+Membership IDs are packed — not necessitating random access, thus suitable for
+compression.
-- For each group — a list of pointers (offsets) to User records, because
-`getgr*_r` returns pointers to membernames.
-- For each user — a list of gids, because `initgroups_dyn` (and friends)
-returns an array of gids.
+- `groupmembers` is a list of pointers (offsets) to User records, because
+`getgr*_r` returns pointers to membernames, thus a name has to be immediately
+resolvable.
+- `additional_gids` is a list of gids, because `initgroups_dyn` (and friends) returns
+an array of gids.
-An entry of `Groupmembers` and `UserGids` looks like this piece of
-pseudo-code:
-```
-const PackedList = struct {
-Length: varint,
-Members: [Length]varint,
-}
-const Groupmembers = PackedList;
-const UserGids = PackedList;
-```
+Each entry of `groupmembers` and `additional_gids` starts with a varint N, which is
+the number of upcoming elements, followed by N delta-compressed varints. These
+N delta-compressed varints are sorted the same way entries in `users` (in
+`groupmembers`) and `groups`.
 Indices
 -------
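The new wording for `groupmembers` and `additional_gids` (a varint N followed by N delta-compressed varints) can be sketched in Zig as below. This is only an illustration under two assumptions that the diff does not spell out: varints are unsigned LEB128 (7 payload bits per byte, high bit set when more bytes follow), and the first element is absolute while each later element is the difference from its predecessor. The names `Varint`, `uvarint` and `decodeDeltaList` are hypothetical.

```zig
const std = @import("std");

/// A decoded varint: its value and the number of bytes it occupied.
const Varint = struct { value: u64, bytes_read: usize };

/// Decodes one unsigned LEB128 varint from the start of `buf` (assumed encoding).
/// Sketch only: excess high bits in a 10th byte are dropped, not rejected.
fn uvarint(buf: []const u8) error{ Overflow, EndOfStream }!Varint {
    var value: u64 = 0;
    var shift: usize = 0;
    var i: usize = 0;
    while (i < buf.len) : (i += 1) {
        const b = buf[i];
        if (shift >= 64) return error.Overflow;
        value |= std.math.shl(u64, @as(u64, b & 0x7f), shift);
        if ((b & 0x80) == 0) return Varint{ .value = value, .bytes_read = i + 1 };
        shift += 7;
    }
    return error.EndOfStream;
}

/// Decodes one list entry: a varint N, then N delta-encoded varints.
/// Absolute values are written to `out`; the filled prefix is returned.
pub fn decodeDeltaList(
    buf: []const u8,
    out: []u64,
) error{ Overflow, EndOfStream, NoSpaceLeft }![]u64 {
    const header = try uvarint(buf);
    if (header.value > out.len) return error.NoSpaceLeft;
    var pos: usize = header.bytes_read;
    var prev: u64 = 0;
    var i: usize = 0;
    while (i < header.value) : (i += 1) {
        const delta = try uvarint(buf[pos..]);
        pos += delta.bytes_read;
        // Each stored number is the distance from the previous element.
        prev = try std.math.add(u64, prev, delta.value);
        out[i] = prev;
    }
    return out[0..i];
}
```

Under these assumptions the bytes `3, 10, 5, 1` decode to the list `10, 15, 16`. Because the list is always read front to back, no random access into it is needed, which is what makes the delta encoding workable.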
@@ -352,18 +348,18 @@ shell_blob <= 4032 shell data blob (max 63*64 bytes)
 groups ? packed Group entries (8b padding)
 users ? packed User entries (8b padding)
 groupmembers ? per-group delta varint memberlist (no padding)
-user_gids ? per-user delta varint gidlist (no padding)
+additional_gids ? per-user delta varint gidlist (no padding)
 ```
 Section creation order:
-1. ✅ `bdz_*`. No depdendencies.
-1. ✅ `shellIndex`, `shellBlob`. No dependencies.
-1. ✅ userGids. No dependencies.
-1. ✅ Users. Requires `userGids` and shell.
-1. ✅ Groupmembers. Requires Users.
-1. ✅ Groups. Requires Groupmembers.
-1. ✅ `idx_*`. Requires offsets to Groups and Users.
+1. ✅ `bdz_*`.
+1. ✅ `shell_index`, `shell_blob`.
+1. ✅ `additional_gids`.
+1. ✅ `users` requires `additional_gids` and shell.
+1. ✅ `groupmembers` requires `users`.
+1. ✅ `groups` requires `groupmembers`.
+1. ✅ `idx_*`. requires offsets to `groups` and `users`.
 1. Header.
 [git-subtrac]: https://apenwarr.ca/log/20191109
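The "Section creation order" list in the last hunk is effectively a dependency ordering. As a rough sketch, the Zig snippet below encodes the renamed sections with the requirements stated in the list and checks that each requirement is created earlier. The `Section` type, the shortened section names and `isValidOrder` are hypothetical helpers for illustration. Treating `shell_index` and `shell_blob` as a single `shell` step, and the header as having no listed requirements, are simplifying assumptions.

```zig
const std = @import("std");

/// One output section and the sections that must be built before it.
const Section = struct {
    name: []const u8,
    requires: []const []const u8,
};

const none = [_][]const u8{};

/// Creation order as listed on the new side of the hunk.
const creation_order = [_]Section{
    .{ .name = "bdz", .requires = &none },
    .{ .name = "shell", .requires = &none },
    .{ .name = "additional_gids", .requires = &none },
    .{ .name = "users", .requires = &[_][]const u8{ "additional_gids", "shell" } },
    .{ .name = "groupmembers", .requires = &[_][]const u8{"users"} },
    .{ .name = "groups", .requires = &[_][]const u8{"groupmembers"} },
    .{ .name = "idx", .requires = &[_][]const u8{ "groups", "users" } },
    .{ .name = "header", .requires = &none },
};

/// Returns true if every requirement of a section appears earlier in `order`.
pub fn isValidOrder(order: []const Section) bool {
    var i: usize = 0;
    while (i < order.len) : (i += 1) {
        for (order[i].requires) |req| {
            var found = false;
            for (order[0..i]) |earlier| {
                if (std.mem.eql(u8, earlier.name, req)) found = true;
            }
            if (!found) return false;
        }
    }
    return true;
}
```

The listed order passes this check. Moving `users` ahead of `additional_gids` would fail it, presumably because each user record embeds `additional_gids_offset`.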