// Package rootfs extracts all layers of a Docker container image to a single
// tarball. It goes through all layers in order and copies every file to the
// destination archive.
//
// Along the way it handles a few special cases, described below: files that
// appear in more than one layer, .dockerenv, and whiteout files.
//
// == Non-directories are copied only once ==
//
// A non-directory is copied only once, from its last occurrence. I.e. if
// file /a/b was found in layers 0 and 2, only the file from layer 2 will be
// used.
// Directories are always copied, even if there are duplicates. This is to
// avoid a situation like this:
//   layer0:
//     ./dir/
//     ./dir/file1
//   layer1:
//     ./dir/
//     ./dir/file1
// In theory, the directory from layer1 takes precedence, so a tarball like
// this could be created:
//   ./dir/      (from layer1)
//   ./dir/file1 (from layer1)
// However, imagine the following:
//   layer0:
//     ./dir/
//     ./dir/file1
//   layer1:
//     ./dir/
// Then the resulting tarball would have:
//   ./dir/file1 (from layer0)
//   ./dir/      (from layer1)
// This means `untar` would try to extract a file into a directory that has
// not been created yet. Therefore directories are copied to the resulting
// tar in the order they appear in the layers.
//
// == Special files: .dockerenv ==
//
// .dockerenv is present in all Docker containers, and is likely to remain so.
// So if you run `docker export <container>`, the resulting tarball will
// contain this file. rootfs will not add it; you are welcome to append one
// yourself.
//
// == Special files: opaque files and dirs (.wh.*) ==
//
// From mount.aufs(8)[1]:
//
//   The whiteout is for hiding files on lower branches. Also it is applied to
//   stop readdir going lower branches. The latter case is called ‘opaque
//   directory.’ Any whiteout is an empty file, it means whiteout is just an
//   mark. In the case of hiding lower files, the name of whiteout is
//   ‘.wh.<filename>.’ And in the case of stopping readdir, the name is
//   ‘.wh..wh..opq’. All whiteouts are hardlinked, including ‘<writable branch
//   top dir>/.wh..wh.aufs’.
//
// My interpretation:
//
// 1. A file/hardlink called `.wh..wh..opq` means that the contents of the
// directory containing it, as found in the layers below, should be ignored.
// Higher layers may add files on top. Ambiguity: should the directory itself
// from the lower layers be removed? I am assuming yes, but I have nothing to
// back this assumption up.
//
// 2. If a file/hardlink `.wh.([^/]+)` is found, the file or directory named
// $1 in the same directory should be deleted from the current and lower
// layers.
//
// Note: in practice these may be regular files rather than hardlinks, so this
// implementation matches either.
//
// == Tar format ==
//
// Since we do care about long file names and large file sizes (>8GB), which
// plain ustar cannot represent (it limits names to 100 bytes plus a 155-byte
// prefix, and sizes to 8GB), we use "classic" GNU tar. However, at least
// NetBSD pax is known to have problems reading it[2].
//
// [1]: https://manpages.debian.org/unstable/aufs-tools/mount.aufs.8.en.html
//
// [2]: https://mgorny.pl/articles/portability-of-tar-features.html
package rootfs
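A minimal sketch of the "copied only once" rule from the doc comment above. It assumes each layer has already been read into a list of cleaned paths with a directory flag. The `entry` and `pos` types and the `flatten` function are hypothetical illustrations, not the API of rootfs.

The sketch makes two passes:

- The first pass records where each non-directory last appears.
- The second pass walks the layers in order, emitting every directory and each non-directory only at that last position.

```go
package main

import "fmt"

// entry is a hypothetical, simplified stand-in for a tar header read from a
// layer: only the cleaned path and the directory flag matter for this rule.
type entry struct {
	name  string
	isDir bool
}

// pos identifies one entry: its layer and its index within that layer.
type pos struct{ layer, idx int }

// flatten returns the entries to copy, in output order.
func flatten(layers [][]entry) []entry {
	// Pass 1: remember the last position of every non-directory.
	last := make(map[string]pos)
	for i, layer := range layers {
		for j, e := range layer {
			if !e.isDir {
				last[e.name] = pos{i, j}
			}
		}
	}

	// Pass 2: walk the layers in order again. Directories are always
	// copied; a non-directory only at its last position.
	var out []entry
	for i, layer := range layers {
		for j, e := range layer {
			if e.isDir || last[e.name] == (pos{i, j}) {
				out = append(out, e)
			}
		}
	}
	return out
}

func main() {
	layers := [][]entry{
		{{"dir", true}, {"dir/file1", false}}, // layer0
		{{"dir", true}},                       // layer1
	}
	for _, e := range flatten(layers) {
		fmt.Println(e.name)
	}
}
```

Take the doc comment's second example: layer0 holds `dir` and `dir/file1`, and layer1 holds only `dir`. For that input, this emits `dir`, `dir/file1`, `dir`. The file always follows a copy of its parent directory, which is why directories are never deduplicated.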
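The two whiteout rules under "My interpretation" above can be sketched as follows, again assuming layers are plain lists of paths. The names `whiteouts`, `collect` and `hidden` are hypothetical. The sketch also encodes the doc's own unverified assumption: an opaque marker hides the directory entry itself in lower layers, not only its contents.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const (
	whPrefix = ".wh."
	whOpaque = ".wh..wh..opq"
)

// whiteouts maps cleaned paths to the highest layer index holding a marker
// for them. Hypothetical type, for illustration only.
type whiteouts struct {
	deleted map[string]int // from ".wh.<name>": hide <name> in this and lower layers
	opaque  map[string]int // from ".wh..wh..opq": hide the dir in lower layers
}

// collect scans all layers (lists of paths) for whiteout markers. Layers are
// visited bottom-up, so the stored index is the highest one.
func collect(layers [][]string) whiteouts {
	w := whiteouts{deleted: map[string]int{}, opaque: map[string]int{}}
	for i, layer := range layers {
		for _, name := range layer {
			dir, base := path.Split(path.Clean(name))
			dir = path.Clean(dir) // "" becomes "."
			switch {
			case base == whOpaque: // must be checked before the generic prefix
				w.opaque[dir] = i
			case strings.HasPrefix(base, whPrefix):
				w.deleted[path.Join(dir, strings.TrimPrefix(base, whPrefix))] = i
			}
		}
	}
	return w
}

// hidden reports whether name, found in the given layer, must not be copied.
func (w whiteouts) hidden(name string, layer int) bool {
	name = path.Clean(name)
	if strings.HasPrefix(path.Base(name), whPrefix) {
		return true // markers themselves are never copied
	}
	// Check the path and each of its ancestors.
	for p := name; ; p = path.Dir(p) {
		if i, ok := w.deleted[p]; ok && layer <= i {
			return true
		}
		// Per the doc's assumption, this also hides the directory entry itself.
		if i, ok := w.opaque[p]; ok && layer < i {
			return true
		}
		if p == "." || p == "/" {
			return false
		}
	}
}

func main() {
	layers := [][]string{
		{"etc/", "etc/a", "etc/b", "tmp/", "tmp/x"},        // layer0
		{"etc/.wh.a", "tmp/", "tmp/.wh..wh..opq", "tmp/y"}, // layer1
	}
	w := collect(layers)
	for i, layer := range layers {
		for _, name := range layer {
			if !w.hidden(name, i) {
				fmt.Println(i, path.Clean(name))
			}
		}
	}
}
```

The example keeps `etc` and `etc/b` from layer0 and `tmp` and `tmp/y` from layer1. Each marker is recorded with the highest layer index that mentions its path. As a result, a file re-added in a layer above its whiteout is still copied.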