// Package rootfs extracts all layers of a Docker container image to a single
// tarball. It will go through all layers in order and copy every file to the
// destination archive.
//
// It will also apply the processing rules described below to those files.
//
// == Non-directory will be copied only once ==
//
// A non-directory will be copied only once, and only from its last
// occurrence. I.e. if file /a/b was found in layers 0 and 2, only the file
// from layer 2 will be used.
// Directories will always be copied, even if there are duplicates. This is
// to avoid a situation like this:
//   layer0:
//     ./dir/
//     ./dir/file
//   layer1:
//     ./dir/
//     ./dir/file
// In theory, the directory from layer 1 takes precedence, so a tarball like
// this could be created:
//     ./dir/     (from layer1)
//     ./dir/file (from layer1)
// However, imagine the following:
//   layer0:
//     ./dir/
//     ./dir/file1
//   layer1:
//     ./dir/
// Then the resulting tarball would have:
//     ./dir/file1 (from layer0)
//     ./dir/      (from layer1)
// Which would mean `untar` would try to untar a file to a directory which
// was not yet created. Therefore directories will be copied to the resulting
// tar in the order they appear in the layers.
//
// == Special files: .dockerenv ==
//
// .dockerenv is present in all Docker containers, and that is unlikely to
// change. So if you do `docker export <container>`, the resulting tarball
// will have this file. rootfs will not add it. You are welcome to append one
// yourself.
//
// == Special files: opaque files and dirs (.wh.*) ==
//
// From mount.aufs(8)[1]:
//
//   The whiteout is for hiding files on lower branches. Also it is applied to
//   stop readdir going lower branches. The latter case is called opaque
//   directory. Any whiteout is an empty file, it means whiteout is just an
//   mark. In the case of hiding lower files, the name of whiteout is
//   .wh.<filename>. And in the case of stopping readdir, the name is
//   .wh..wh..opq. All whiteouts are hardlinked, including <writable branch
//   top dir>/.wh..wh.aufs.
//
// My interpretation:
//
// 1. a file/hardlink called `.wh..wh..opq` means that directory contents from
// the layers below the mentioned file should be ignored. Higher layers may add
// files on top. Ambiguity: should the directory from the lower layers be
// removed? I am assuming yes, but this assumption is baseless.
//
// 2. if file/hardlink `.wh.([^/]+)` is found, $1 should be deleted from the
// current and lower layers.
//
// Note: these may be regular files in practice. So this implementation will
// match either.
//
// == Tar format ==
//
// Since we do care about long filenames and large file sizes (>8GB), we are
// using "classic" GNU Tar. However, at least NetBSD pax is known to have
// problems reading it[2].
//
// [1]: https://manpages.debian.org/unstable/aufs-tools/mount.aufs.8.en.html
//
// [2]: https://mgorny.pl/articles/portability-of-tar-features.html
package rootfs