tidy up the header structure

2022-03-17 06:10:39 +01:00 · 5ee8469ec5
commit 5ee8469ec5
parent d526f1fab8
1 changed file with 28 additions and 42 deletions
--- a/README.md
+++ b/README.md
@@ -64,7 +64,6 @@ regions are shared. Turbonss reads do not consume any heap space.

 Tight packing places some constraints on the underlying data:

- Maximum database size: 4GB.
 - Permitted length of username and groupname: 1-32 bytes.
 - Permitted length of shell and home: 1-64 bytes.
 - Permitted comment ("gecos") length: 0-255 bytes.
@@ -136,45 +135,32 @@ Turbonss header
 The turbonss header looks like this:

 ```
-OFFSET     TYPE     NAME                          DESCRIPTION
-   0      [4]u8     magic                         always 0xf09fa4b7
-   4         u8     version                       now `0`
-   5        u16     bom                           0x1234
-             u8     num_shells                    max value: 63.
-   8        u32     num_users                     number of passwd entries
-  12        u32     num_groups                    number of group entries
-  16        u32     offset_bdz_uid2user
-  24        u32     offset_bdz_name2user
-  20        u32     offset_bdz_groupname2group
-  28        u32     offset_idx                    offset to the first idx_ section
-  32        u32     offset_groups
-  36        u32     offset_users
-  40        u32     offset_groupmembers
-  44        u32     offset_additional_gids
+OFFSET     TYPE     NAME                      DESCRIPTION
+   0      [4]u8     magic                     f0 9f a4 b7
+   4         u8     version                   0
+   5         u8     bigendian                 0 for little-endian, 1 for big-endian
+   6         u8     nblocks_shell_blob        max value: 63
+   7         u8     num_shells                max value: 63
+   8        u32     num_groups                number of group entries
+  12        u32     num_users                 number of passwd entries
+  16        u32     nblocks_bdz_gid           bdz_gid section block count
+  20        u32     nblocks_bdz_groupname
+  24        u32     nblocks_bdz_uid
+  28        u32     nblocks_bdz_username
+  32        u64     nblocks_groups
+  40        u64     nblocks_users
+  48        u64     nblocks_groupmembers
+  56        u64     nblocks_usergids
 ```

 `magic` is 0xf09fa4b7, and `version` must be `0`. All integers are
-native-endian. `bom` is a byte-order-mark. It must resolve to `0x1234` (4460).
-If that's not true, the file is consumed in a different endianness than it was
-created at. Turbonss files cannot be moved across different-endianness
-computers. If that happens, turbonss will refuse to read the file.
+native-endian. `nblocks_*` is the count of blocks of a particular section; this
+helps calculate the offsets to all sections.

-Offsets are indices to further sections of the file, with zero being the first
-block (pointing to the `magic` field). As all sections are aligned to 64 bytes,
-the offsets are always pointing to the beginning of an 64-byte "block".
-Therefore, all `offset_*` values could be `u26`. As `u32` is easier to
-visualize with xxd, and the header block fits to 64 bytes anyway, we are
-keeping them as u32 now.
-
-Sections whose lengths can be calculated do not have a corresponding `offset_*`
-header field. For example, `bdz_gid2group` comes immediately after the header,
-and `idx_groupname2group` comes after `idx_gid2group`, whose offset is
-`offset_idx`, and size can be calculated.
-
-`num_shells` would fit to u6; however, we would need 2 bits of padding (all
-other fields are byte-aligned). If we instead do `u2` followed by `u6`, the
-byte would look very unusual on a little-endian architecture. Therefore we will
-just reject the DB if the number of shells exceeds 63.
+Some fields, such as `nblocks_shell_blob` and `num_shells`, would fit in fewer
+than 8 bits. However, interpreting `[2]u6` with `xxd(1)` is harder than
+interpreting `[2]u8`. Therefore we use the space we have to make these
+integers byte-wide.
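
To make the new 64-byte layout concrete, here is a minimal Python sketch that decodes a header laid out as in the table above. It is not turbonss code: `parse_header`, `Header`, and `LAYOUT` are hypothetical names, and picking the byte order from the `bigendian` flag is an assumption (a real reader may instead refuse a file whose endianness differs from the host).

```python
import struct
from collections import namedtuple

MAGIC = bytes([0xF0, 0x9F, 0xA4, 0xB7])

# Field order follows the header table; the names are copied from it.
Header = namedtuple("Header", (
    "magic version bigendian nblocks_shell_blob num_shells "
    "num_groups num_users "
    "nblocks_bdz_gid nblocks_bdz_groupname nblocks_bdz_uid nblocks_bdz_username "
    "nblocks_groups nblocks_users nblocks_groupmembers nblocks_usergids"
))

# [4]u8, four u8, six u32, four u64: 4 + 4 + 24 + 32 = 64 bytes.
LAYOUT = "4sBBBBIIIIIIQQQQ"


def parse_header(buf):
    """Decode the first 64 bytes of a database file (hypothetical helper)."""
    if len(buf) < 64:
        raise ValueError("header is 64 bytes long")
    if buf[:4] != MAGIC:
        raise ValueError("bad magic")
    if buf[4] != 0:
        raise ValueError("unsupported version")
    if buf[5] not in (0, 1):
        raise ValueError("bigendian must be 0 or 1")
    # Assumption: decode using the byte order the file declares.
    order = ">" if buf[5] == 1 else "<"
    hdr = Header._make(struct.unpack(order + LAYOUT, buf[:64]))
    if hdr.num_shells > 63 or hdr.nblocks_shell_blob > 63:
        raise ValueError("shell counts must not exceed 63")
    return hdr
```

The single-byte fields at offsets 4 to 7 can be checked before the byte order is known, which is why the sketch inspects them directly in `buf`.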

 Primitive types
 ---------------
@@ -345,14 +331,14 @@ the hash, that the key matches what's been hashed.
 respective `Groups` and `Users` entries (from the beginning of the respective
 section). Since User and Group records are 8-byte aligned, `u29` is used.

-Complete file structure
+Database file structure
 -----------------------

 Each section is padded to 64 bytes.

 ```
 SECTION               SIZE             DESCRIPTION
-Header                48               see "Turbonss header" section
+header                64               see "Turbonss header" section
 bdz_gid               ?                bdz(gid)
 bdz_groupname         ?                bdz(groupname)
 bdz_uid               ?                bdz(uid)
@@ -361,12 +347,12 @@ idx_gid2group         len(group)*4     bdz->offset Groups
 idx_groupname2group   len(group)*4     bdz->offset Groups
 idx_uid2user          len(user)*4      bdz->offset Users
 idx_name2user         len(user)*4      bdz->offset Users
-shellIndex            len(shells)*2    shell index array
-shellBlob             <= 4032          shell data blob (max 63*64 bytes)
+shell_index           len(shells)*2    shell index array
+shell_blob            <= 4032          shell data blob (max 63*64 bytes)
 groups                ?                packed Group entries (8b padding)
 users                 ?                packed User entries (8b padding)
-groupMembers          ?                per-group delta varint memberlist (no padding)
-userGids              ?                per-user delta varint gidlist (no padding)
+groupmembers          ?                per-group delta varint memberlist (no padding)
+user_gids             ?                per-user delta varint gidlist (no padding)
 ```
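
Since every section is padded to 64 bytes, the block counts in the header are enough to derive each section's byte offset. The Python sketch below shows one way to do that. It is not turbonss code: `section_offsets` and `blocks_for` are hypothetical helpers, the header is passed as a plain dict, and the section order is assumed to match the table above (including the `bdz_username` and `idx_gid2group` rows that fall between the two diff hunks).

```python
BLOCK = 64  # every section is padded to a multiple of 64 bytes


def blocks_for(nbytes):
    """Number of 64-byte blocks needed to hold nbytes (rounded up)."""
    return (nbytes + BLOCK - 1) // BLOCK


def section_offsets(h):
    """Map section name to byte offset, given header fields in dict h."""
    sizes = [
        ("header", 1),
        ("bdz_gid", h["nblocks_bdz_gid"]),
        ("bdz_groupname", h["nblocks_bdz_groupname"]),
        ("bdz_uid", h["nblocks_bdz_uid"]),
        ("bdz_username", h["nblocks_bdz_username"]),
        # idx_* sizes are computed from the entry counts: 4 bytes per entry.
        ("idx_gid2group", blocks_for(h["num_groups"] * 4)),
        ("idx_groupname2group", blocks_for(h["num_groups"] * 4)),
        ("idx_uid2user", blocks_for(h["num_users"] * 4)),
        ("idx_name2user", blocks_for(h["num_users"] * 4)),
        # shell_index holds 2 bytes per shell.
        ("shell_index", blocks_for(h["num_shells"] * 2)),
        ("shell_blob", h["nblocks_shell_blob"]),
        ("groups", h["nblocks_groups"]),
        ("users", h["nblocks_users"]),
        ("groupmembers", h["nblocks_groupmembers"]),
        ("user_gids", h["nblocks_usergids"]),
    ]
    offsets = {}
    pos = 0  # running position, counted in blocks
    for name, nblocks in sizes:
        offsets[name] = pos * BLOCK
        pos += nblocks
    return offsets
```

Sections whose size follows from `num_groups`, `num_users`, or `num_shells` need no `nblocks_*` field, which is consistent with the header carrying block counts only for the variable-size sections.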

 Section creation order: