rewrite shells

- Shell is up to 256 bytes long.
- Store up to 255 shells in the Shells area.
- Remove padding from the User struct.

README.md
@@ -65,9 +65,12 @@ regions are shared. Turbonss reads do not consume any heap space.
 Tight packing places some constraints on the underlying data:

 - Permitted length of username and groupname: 1-32 bytes.
-- Permitted length of shell and home: 1-64 bytes.
+- Permitted length of shell and home: 1-256 bytes.
 - Permitted comment ("gecos") length: 0-255 bytes.
 - User name, groupname, gecos and shell must be utf8-encoded.
+- The Users and Groups sections can each be up to 2^35 B (~34 GB). Assuming an
+  "average" user record takes 50 bytes, the Users section would fit ~687M users.
+  The worst-case upper bound is left as an exercise to the reader.

 Sorting is stable. In v0:
 - Groups are sorted by gid, ascending.
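The length and encoding constraints listed above can be enforced before a record is packed. The following is a minimal Zig sketch, not turbonss code: `validateUser` and `ValidationError` are hypothetical names, and only the limits from that list are checked. It relies on `std.unicode.utf8ValidateSlice` from the Zig standard library.

```zig
const std = @import("std");

// Hypothetical error set for this sketch; turbonss may report these differently.
const ValidationError = error{
    BadNameLength,
    BadShellLength,
    BadHomeLength,
    BadGecosLength,
    NotUtf8,
};

// Checks one user record against the README's limits: name 1-32 bytes,
// shell and home 1-256 bytes, gecos 0-255 bytes, and UTF-8 for name,
// gecos and shell (home is not required to be UTF-8).
fn validateUser(
    name: []const u8,
    home: []const u8,
    shell: []const u8,
    gecos: []const u8,
) ValidationError!void {
    if (name.len < 1 or name.len > 32) return error.BadNameLength;
    if (shell.len < 1 or shell.len > 256) return error.BadShellLength;
    if (home.len < 1 or home.len > 256) return error.BadHomeLength;
    if (gecos.len > 255) return error.BadGecosLength;
    const must_be_utf8 = [_][]const u8{ name, gecos, shell };
    for (must_be_utf8) |s| {
        if (!std.unicode.utf8ValidateSlice(s)) return error.NotUtf8;
    }
}

test "validateUser accepts a typical record and rejects bad ones" {
    try validateUser("alice", "/home/alice", "/bin/bash", "Alice");
    try std.testing.expectError(error.BadNameLength, validateUser("", "/home/x", "/bin/sh", ""));
    try std.testing.expectError(error.NotUtf8, validateUser("bob", "/home/bob", "/bin/sh", "\xff"));
}
```

Group names would need the same 1-32 byte check; it is left out to keep the sketch short.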
@@ -173,7 +176,8 @@ the beginning of the section.
 ```
 const PackedGroup = packed struct {
     gid: u32,
-    groupname_len: u8, // max is 32, but have too much space here.
+    padding: u3,
+    groupname_len: u5,
 }
 ```

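The new `padding: u3` and `groupname_len: u5` fields together fill one byte. A u5 holds 0-31 while group names may be 1-32 bytes long, so the sketch below assumes the field stores the length minus one; the README does not state the encoding, and `encodeGroupnameLen`/`decodeGroupnameLen` are hypothetical helpers. It uses the single-argument `@intCast` of Zig 0.11 and later.

```zig
const std = @import("std");

// Assumption: `groupname_len: u5` stores (length - 1), so the 1-32 byte
// range maps onto the u5 range 0-31.
fn encodeGroupnameLen(len: usize) error{BadGroupnameLength}!u5 {
    if (len < 1 or len > 32) return error.BadGroupnameLength;
    const stored: u5 = @intCast(len - 1); // safe: len - 1 is in 0..31
    return stored;
}

// Inverse of encodeGroupnameLen: widen first, then add the 1 back.
fn decodeGroupnameLen(stored: u5) usize {
    return @as(usize, stored) + 1;
}

test "groupname length round-trips through a u5" {
    try std.testing.expectEqual(@as(u5, 31), try encodeGroupnameLen(32));
    try std.testing.expectEqual(@as(usize, 32), decodeGroupnameLen(31));
}
```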
@@ -186,8 +190,7 @@ PackedUser is a bit more involved:
 pub const PackedUser = packed struct {
     uid: u32,
     gid: u32,
-    padding: u2 = 0,
-    shell_len_or_idx: u6,
+    shell_len_or_idx: u8,
     shell_here: bool,
     name_is_a_suffix: bool,
     home_len: u6,
@@ -219,8 +222,8 @@ PackedUser employs two "simple" compression techniques:
 2. `name_is_a_suffix=false`: name begins one byte after home, and its length
    is `name_len`.

-The last field `additional_gids_offset: varint` points to the `additional_gids` section for
-this user.
+The last field `additional_gids_offset: varint` points to the `additional_gids`
+section for this user.

 Shells
 ------
@@ -231,23 +234,20 @@ others. Therefore, "shells" have an optimization: they can be pointed by in the
 external list, or, if they are unique to the user, reside among the user's
 data.

-63 most popular shells (i.e. referred to by at least two User entries) are
+Up to 255 most popular shells (i.e. referred to by at least two User entries) are
 stored externally in the "Shells" area. The less popular ones are stored with
 userdata.

 The Shells section consists of two sub-sections: the index and the blob. The index
-is a list of structs which point to a location in the "blob" area:
+is an array of offsets: the i'th shell starts at byte `offsets[i]` and ends just
+before byte `offsets[i+1]`. If there is at least one shell in the Shells section,
+the index contains a sentinel as its last element, which marks the end of the
+shell blob (i.e. it equals the blob's length).

-```
-const ShellIndex = struct {
-    offset: u10,
-    len: u6,
-};
-```
-
-In the user's struct `shell_here=true` signifies that the shell is stored with
-userdata, and it's length is `shell_len_or_idx`. `shell_here=false` means it is
-stored in the `Shells` section, and it's index is `shell_len_or_idx`.
+`shell_here=true` in the User struct means the shell is stored with userdata,
+and its length is `shell_len_or_idx`. `shell_here=false` means it is stored in
+the `Shells` section, and its index is `shell_len_or_idx` (the actual string
+start and end offsets are then resolved via the index, as described above).

 Variable-length integers (varints)
 ----------------------------------
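Shell resolution with the new layout reduces to two cases: read the shell from the user's own data, or look up entry `shell_len_or_idx` in the offsets index and slice the blob up to the next offset. Below is a minimal Zig sketch under assumptions the README does not spell out: offsets are `u16` (enough for a blob of at most 255*256 = 65280 bytes), an inline length is stored as length minus one (so a u8 can express 256), and the caller passes `inline_data` already positioned at the inline shell. `Shells`, `get` and `resolveShell` are hypothetical names.

```zig
const std = @import("std");

// Hypothetical view of the Shells section. `offsets` holds one entry per
// shell plus a trailing sentinel equal to blob.len; u16 is assumed to be
// enough because the blob is at most 255*256 = 65280 bytes.
const Shells = struct {
    offsets: []const u16,
    blob: []const u8,

    // Shell `idx` spans blob[offsets[idx] .. offsets[idx + 1]].
    fn get(self: Shells, idx: u8) []const u8 {
        const start = self.offsets[idx];
        const end = self.offsets[@as(usize, idx) + 1];
        return self.blob[start..end];
    }
};

// Resolves a user's shell from the PackedUser fields. Assumptions: an inline
// length is stored as (length - 1), so a u8 covers 1-256 bytes, and the
// caller passes `inline_data` starting at the user's inline shell bytes.
fn resolveShell(
    shell_here: bool,
    shell_len_or_idx: u8,
    inline_data: []const u8,
    shells: Shells,
) []const u8 {
    if (shell_here) {
        const len = @as(usize, shell_len_or_idx) + 1;
        return inline_data[0..len];
    }
    return shells.get(shell_len_or_idx);
}

test "resolveShell" {
    const offsets = [_]u16{ 0, 9, 17 }; // two shells + sentinel
    const shells = Shells{ .offsets = &offsets, .blob = "/bin/bash/bin/zsh" };
    try std.testing.expectEqualStrings("/bin/zsh", resolveShell(false, 1, "", shells));
    try std.testing.expectEqualStrings("/bin/sh", resolveShell(true, 6, "/bin/shXYZ", shells));
}
```

Compared with the removed `ShellIndex { offset: u10, len: u6 }` entries, a plain offsets array needs no separate length field: each shell ends where the next one starts, at the cost of one sentinel entry.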
@@ -264,7 +264,6 @@ There are two group memberships at play:
 1. Given a group (gid/name), resolve the members' names (e.g. `getgrgid`).
 2. Given a username, resolve the user's group gids (for `initgroups(3)`).

-
 When a group's memberships are resolved in (1), the same call also requires other
 group information: gid and group name. Therefore it makes sense to store a
 pointer to the group members in the group information itself. However, the
@@ -323,9 +322,10 @@ will be pointing to a number `n ∈ [0,N-1]`, regardless whether the value was i
 the initial dictionary. Therefore one must always confirm, after calculating
 the hash, that the key matches what's been hashed.

-`idx_*` sections are of type `[]PackedIntArray(u29)` and are pointing to the
-respective `Groups` and `Users` entries (from the beginning of the respective
-section). Since User and Group records are 8-byte aligned, `u29` is used.
+`idx_*` sections are of type `[]u32` and point to the respective `Groups` and
+`Users` entries (counted from the beginning of the respective section). Since
+User and Group records are 8-byte aligned, each value is the byte offset divided
+by 8: the actual offset to the record is acquired by left-shifting it by 3 bits.

 Database file structure
 -----------------------
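Decoding an `idx_*` entry amounts to multiplying it by 8. The minimal Zig sketch below (`recordOffset` and `max_section_bytes` are hypothetical names) also shows where the 2^35-byte section limit comes from: the largest u32 entry, shifted left by 3, is 2^35 - 8.

```zig
const std = @import("std");

// An idx_* entry holds a record's offset in 8-byte units, counted from the
// beginning of the Users or Groups section. Widening to u64 before the shift
// keeps large entries from overflowing.
fn recordOffset(entry: u32) u64 {
    return @as(u64, entry) << 3;
}

// Upper bound on a section's size: the largest entry, maxInt(u32), decodes to
// 2^35 - 8, the start of the last addressable 8-byte slot.
const max_section_bytes: u64 = 1 << 35;

test "recordOffset" {
    try std.testing.expectEqual(@as(u64, 8), recordOffset(1));
    try std.testing.expectEqual(max_section_bytes - 8, recordOffset(std.math.maxInt(u32)));
}
```

Storing 8-byte units rather than raw byte offsets is what lets a 4-byte entry address a section of roughly 34 GB instead of 4 GiB.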
@@ -344,7 +344,7 @@ idx_groupname2group len(group)*4 bdz->offset Groups
 idx_uid2user        len(user)*4   bdz->offset Users
 idx_name2user       len(user)*4   bdz->offset Users
 shell_index         len(shells)*2 shell index array
-shell_blob          <= 4032       shell data blob (max 63*64 bytes)
+shell_blob          <= 65280      shell data blob (max 255*256 bytes)
 groups              ?             packed Group entries (8b padding)
 users               ?             packed User entries (8b padding)
 groupmembers        ?             per-group delta varint memberlist (no padding)