rewrite shells

- Shell is up to 256 bytes long.
- Store up to 255 shells in the Shells area.
- Remove padding from the User struct.
2022-03-17 16:50:41 +01:00
committed by Motiejus Jakštys
parent 85552c1302
commit 4e36d7850e
6 changed files with 81 additions and 134 deletions

@@ -65,9 +65,12 @@ regions are shared. Turbonss reads do not consume any heap space.
Tight packing places some constraints on the underlying data:
- Permitted length of username and groupname: 1-32 bytes.
- Permitted length of shell and home: 1-64 bytes.
- Permitted length of shell and home: 1-256 bytes.
- Permitted comment ("gecos") length: 0-255 bytes.
- User name, groupname, gecos and shell must be utf8-encoded.
- User and Groups sections are up to 2^35B (~34GB) large. Assuming an "average"
user record takes 50 bytes, this section would fit ~660M users. The
worst-case upper bound is left as an exercise to the reader.
Sorting is stable. In v0:
- Groups are sorted by gid, ascending.
@@ -173,7 +176,8 @@ the beginning of the section.
```
const PackedGroup = packed struct {
gid: u32,
groupname_len: u8, // max is 32, but have too much space here.
padding: u3,
groupname_len: u5,
}
```
@@ -186,8 +190,7 @@ PackedUser is a bit more involved:
pub const PackedUser = packed struct {
uid: u32,
gid: u32,
padding: u2 = 0,
shell_len_or_idx: u6,
shell_len_or_idx: u8,
shell_here: bool,
name_is_a_suffix: bool,
home_len: u6,
@@ -219,8 +222,8 @@ PackedUser employs two "simple" compression techniques:
2. `name_is_a_suffix=false`: name begins one byte after home, and its length
is `name_len`.
The last field `additional_gids_offset: varint` points to the `additional_gids` section for
this user.
The last field `additional_gids_offset: varint` points to the `additional_gids`
section for this user.
Shells
------
@@ -231,23 +234,20 @@ others. Therefore, "shells" have an optimization: they can be pointed to in the
external list, or, if they are unique to the user, reside among the user's
data.
63 most popular shells (i.e. referred to by at least two User entries) are
255 most popular shells (i.e. referred to by at least two User entries) are
stored externally in "Shells" area. The less popular ones are stored with
userdata.
Shells section consists of two sub-sections: the index and the blob. The index
is a list of structs which point to a location in the "blob" area:
is an array of offsets: the i'th shell starts at `offsets[i]` byte, and ends at
`offsets[i+1]` byte. If there is at least one shell in the shell section, the
index contains a sentinel index as the last element, which signifies the position
of the last byte of the shell blob.
```
const ShellIndex = struct {
offset: u10,
len: u6,
};
```
In the user's struct `shell_here=true` signifies that the shell is stored with
userdata, and its length is `shell_len_or_idx`. `shell_here=false` means it is
stored in the `Shells` section, and its index is `shell_len_or_idx`.
`shell_here=true` in the User struct means the shell is stored with userdata,
and its length is `shell_len_or_idx`. `shell_here=false` means it is stored in
the `Shells` section, and its index is `shell_len_or_idx` (and the actual
string start and end offsets are resolved as described in the paragraph above).
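
The offset-based lookup can be sketched as follows. This is a minimal
illustration, not the turbonss implementation: the helper name `shellAt`, the
`u16` offset type (suggested by the `len(shells)*2` size of `shell_index`) and
the sample data are assumptions, and the sentinel is taken to equal the blob
length.

```zig
const std = @import("std");

// Hypothetical helper: returns the idx'th shell from the Shells section.
// `offsets` holds one entry per shell plus the trailing sentinel.
fn shellAt(offsets: []const u16, blob: []const u8, idx: u8) []const u8 {
    const i: usize = idx; // widen first, so that i + 1 cannot overflow u8
    return blob[offsets[i]..offsets[i + 1]];
}

test "resolve shells through the offset index" {
    // Two shells stored back to back; the last offset is the sentinel.
    const blob = "/bin/bash/bin/zsh";
    const offsets = [_]u16{ 0, 9, 17 };
    try std.testing.expectEqualStrings("/bin/bash", shellAt(&offsets, blob, 0));
    try std.testing.expectEqualStrings("/bin/zsh", shellAt(&offsets, blob, 1));
}
```

Storing only offsets keeps each index entry at two bytes, and the length of a
shell falls out as the difference between two neighbouring offsets.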
Variable-length integers (varints)
----------------------------------
@@ -264,7 +264,6 @@ There are two group memberships at play:
1. Given a group (gid/name), resolve the members' names (e.g. `getgrgid`).
2. Given a username, resolve user's group gids (for `initgroups(3)`).
When group's memberships are resolved in (1), the same call also requires other
group information: gid and group name. Therefore it makes sense to store a
pointer to the group members in the group information itself. However, the
@@ -323,9 +322,10 @@ will be pointing to a number `n ∈ [0,N-1]`, regardless whether the value was i
the initial dictionary. Therefore one must always confirm, after calculating
the hash, that the key matches what's been hashed.
`idx_*` sections are of type `[]PackedIntArray(u29)` and are pointing to the
respective `Groups` and `Users` entries (from the beginning of the respective
section). Since User and Group records are 8-byte aligned, `u29` is used.
`idx_*` sections are of type `[]u32` and are pointing to the respective
`Groups` and `Users` entries (from the beginning of the respective section).
Since User and Group records are 8-byte aligned, the stored value counts 8-byte
units: the actual byte offset to the record is acquired by left-shifting this
value by 3 bits.
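
As a rough sketch of the offset arithmetic, assuming the stored `u32` counts
8-byte units from the start of the section (so the byte offset is the value
multiplied by 8, i.e. shifted left by 3 bits). The function name `recordOffset`
is hypothetical and does not come from the turbonss source.

```zig
const std = @import("std");

// Hypothetical helper: converts an idx_* entry into a byte offset within
// the Users or Groups section. Widening to u64 first keeps the shift from
// overflowing, since a section may be up to 2^35 bytes large.
fn recordOffset(idx_value: u32) u64 {
    return @as(u64, idx_value) << 3;
}

test "idx entry to byte offset" {
    try std.testing.expectEqual(@as(u64, 0), recordOffset(0));
    try std.testing.expectEqual(@as(u64, 16), recordOffset(2));
}
```

Under this reading a 32-bit index entry can address any 8-byte aligned record
in a section of up to 2^35 bytes, which matches the section size limit stated
earlier.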
Database file structure
-----------------------
@@ -344,7 +344,7 @@ idx_groupname2group len(group)*4 bdz->offset Groups
idx_uid2user len(user)*4 bdz->offset Users
idx_name2user len(user)*4 bdz->offset Users
shell_index len(shells)*2 shell index array
shell_blob <= 4032 shell data blob (max 63*64 bytes)
shell_blob <= 65280 shell data blob (max 255*256 bytes)
groups ? packed Group entries (8b padding)
users ? packed User entries (8b padding)
groupmembers ? per-group delta varint memberlist (no padding)