---
title: "smart bundling"
date: 2022-05-08T15:52:00+03:00
slug: smart-bundling
draft: true
---

TLDR
----

Could our package managers bundle our dependencies into our repository, whilst
keeping the dependencies' git history intact, as if they weren't bundled at all?

Number of dependencies
----------------------

All of the programming languages I've used professionally whose names do not
start with "c"[^1] have package managers[^2], which make "dependency
management" easy. As part of the project's build process, these package
managers download and build the dependencies. They are so easy to use that
there is virtually no resistance to adding a dependency whenever developers
deem it necessary.

Dependencies are usually stored outside of the project's code repository: they
are either looked up in the system (common for C/C++) or downloaded from the
Internet (common for everything else). Many system dependencies irritate
users, so developers are incentivized to keep them few. However, there is no
such incentive for statically linked dependencies downloaded from the Internet
(I call them "external"), which brings us to this post.

Adding external dependencies is like eating candy: the initial cost is nearly
zero and it tastes good while eating, but the long-term effects are unhealthy
and dangerous. Why and how to be cautious of external dependencies is a post
for another day; suffice to say, I have a checklist and am prepared to do the
work to avoid adding a dependency if I can.

If at least one external dependency [disappears][crash-of-leftpad], we have
serious problems building our project.

{{<img src="_/2022/brick-house.jpg"
alt="House made out of Duplo pieces"
caption="Just like this brick house, \"modern\" package managers are optimized for building, not maintenance. House by my sons, photo mine."
hint="photo"
>}}

C++ programs I wrote a decade ago still generally build and run; Erlang, Java
and Python ones generally don't. Judging by the way "modern" languages handle
dependencies, it is fair to say that they optimize for initial development,
not maintenance. Ten years ago I didn't think this would happen; I am less
naïve now. As [Corbet says][linux-rust], "We can't understand why Kids These
Days just don't want to live that way". Kids want to build, John, not think
about the future. A 4-letter Danish corporation made a fortune selling toys
that are designed to be disassembled and built anew. Look ma, no maintenance!
Kids are still kids: growing up and sticking to the rules, even if they are
our own, requires discipline.

If we require Something On The Internet to be available to build our
application, it will inevitably go away. The more things we rely on, and the
more time passes, the higher the chance of misery when it does. We cannot
abolish dependencies these days, since some of them are too good to ignore
(hello SQLite, 241,245 lines of top-quality C). So we need to find a balance:
how can we have dependencies to satisfy the kids, but be mature and strategic
in the long term? We have a few options today:

1. Mirror everything to an internal system that never deletes code, and change
   the package manager to read from there instead. Convenience aside, some
   companies absolutely must keep every line of code of every build for
   decades, and be able to rebuild it. Think about the firmware of your car's
   [ABS][abs] or Boeing's infamous [MCAS][MCAS]. This problem alone is a whole
   B2B business segment and costs big money.
2. Copy the dependency verbatim to `deps/<dependency>`. While easy to do, this
   loses the history of the dependency and rewrites its hashes, also making it
   difficult to distinguish "our" changes from the upstream ones. Upgrades
   become cumbersome, leading to the only obvious outcome: never upgrading
   after the initial import.
3. A step up from (2): use [`git-subtree`][git-subtree] to copy the dependency
   to the application tree, but preserve the history of the dependency. This
   still messes up the hashes, so refs in the dependency's history, like
   `Reverts <commit>`, do not make sense in isolation. Upgrades are somewhat
   easier than with (2), because the history is still sort-of there, but they
   are still cumbersome, leading to the same unfortunate outcome.
4. Download the dependencies at build time and store them in a "safe place",
   like [go-mod-archiver][go-mod-archiver] does. It does not change how
   day-to-day development works with Go modules, but offers a lifeboat when a
   dependency disappears. History-wise it is still the same as (2): it copies
   the dependency tree without its history, so if a dependency does go away,
   bringing it back under our own wing is an exertion. As it does not change
   the development process, it is quite easy to sell to any team.

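Why do options (2) and (3) change the hashes? git names every object by a SHA-1 over a short header ("<type> <size>", then a NUL byte) plus the object's content. A commit's content includes its tree id and its parent ids. The minimal Python sketch below shows this. The function names are mine, and the commits are toy objects with fixed author lines, not commits from a real repository. It also assumes the default SHA-1 object format; SHA-256 repositories hash differently.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of git's empty tree


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return git's SHA-1 id: hash of "<type> <size>", a NUL byte, then body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def toy_commit(tree_id: str, parents: list, message: str) -> str:
    """Id of a minimal commit object; author and committer are fixed here."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {parent}" for parent in parents]
    lines += [
        "author Upstream <dev@example.org> 0 +0000",
        "committer Upstream <dev@example.org> 0 +0000",
        "",
        message,
    ]
    return git_object_id("commit", ("\n".join(lines) + "\n").encode())


# Identical content gets an identical id, wherever it is stored.
print(git_object_id("blob", b"test content\n"))

# The same change recorded on top of a different parent (as happens when a
# dependency's history is re-committed into our repository) gets a different
# id, so a message like "Reverts <id>" points at nothing in our repository.
upstream_base = toy_commit(EMPTY_TREE, [], "Initial import")
vendored_base = toy_commit(EMPTY_TREE, [], "Import deps/foo")
print(toy_commit(EMPTY_TREE, [upstream_base], "Fix overflow"))
print(toy_commit(EMPTY_TREE, [vendored_base], "Fix overflow"))
```

Moving files under a `deps/foo/` prefix changes the tree id too. Every commit after that point therefore gets a new id, even if its message and diff are unchanged.
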
Option (1) is viable for very specific audiences and costs big money. Options
(2) and (3) blur the line between our application and its dependencies, and
rewrite the git history. Option (4) serves a different purpose: it is not a
dependency management system, but a lifeboat for when dependencies inevitably
disappear; they are still downloaded from the Internet on every build.

This number of approaches suggests there is an appetite for protecting
ourselves when dependencies disappear (vendoring of increasing sophistication),
in ways that preserve git history (`git-subtree`) and do not get in the way of
the language's build tool (`go-mod-archiver`). But the problem is not yet
solved for any of the languages I have worked with.

So what about having both of these:
- "smart" vendoring to protect ourselves from things disappearing, and
- no friction when doing it?

Sharing code hygienically
-------------------------

[Avery Pennarun][apenwarr], the creator of `git-subtree`, wrote
[`git-subtrac`][git-subtrac], which vendors dependencies in a special branch
without rewriting their history (i.e. leaving the hashes intact). Wait, stop
here. Repeat after me: _git-subtrac vendors our dependencies, but all refs stay
in our repository_. Since all the dependencies are in our repository, no
external force can make them unavailable. Let's discuss its advantages:

1. The dependency keeps its hashes, so its history is left intact (as a
   side effect, `git show <hash of the dependency>` in our repository will,
   surprisingly, work).
2. The dependency is vendored into our tree, so it will not disappear.
3. Because humans are more sensitive to download times than to build times, it
   will keep a nice check on the overall size of the repository, hopefully
   preventing us (or our kids) from pulling in V8 (over 2M lines of code) just
   to interpret a couple hundred lines of JavaScript[^3].

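Why must the dependency's commits live in a special branch at all? git keeps objects, and transfers them on fetch or push, only if they are reachable from some ref through parent links. A submodule pin (a "gitlink" entry in our tree) is not followed during that walk. The Python toy model below illustrates this. The commit names are made up. The synthetic "vendor" commit, whose parents are all pinned dependency commits, is my simplified illustration of how a separate branch can make those commits reachable. It is an assumption, not git-subtrac's actual implementation.

```python
from collections import deque

# Toy history: commit -> parent commits. Names are made up.
PARENTS = {
    "app1": [], "app2": ["app1"],  # our application
    "dep1": [], "dep2": ["dep1"],  # the vendored dependency
}
# Gitlinks: which dependency commit each app commit pins in deps/.
# git does NOT follow these edges when deciding what is reachable.
GITLINKS = {"app1": "dep1", "app2": "dep2"}


def reachable(parents: dict, tips: list) -> set:
    """All commits reachable from `tips` through parent links only."""
    seen, queue = set(tips), deque(tips)
    while queue:
        commit = queue.popleft()
        for parent in parents[commit]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def add_vendor_branch(parents: dict, gitlinks: dict, branch_tip: str) -> dict:
    """Return the history plus a synthetic "vendor" commit whose parents are
    every dependency commit pinned anywhere in our branch's history."""
    history = reachable(parents, [branch_tip])
    pinned = sorted({gitlinks[c] for c in history if c in gitlinks})
    return {**parents, "vendor": pinned}


# Without the extra branch, the dependency's commits are not reachable, so a
# clone of our repository would not carry them.
print(sorted(reachable(PARENTS, ["app2"])))  # ['app1', 'app2']
# With it, they travel along with our repository and keep their ids.
with_vendor = add_vendor_branch(PARENTS, GITLINKS, "app2")
print(sorted(reachable(with_vendor, ["app2", "vendor"])))
# ['app1', 'app2', 'dep1', 'dep2', 'vendor']
```
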
Some of my friends point out that it has a disadvantage by design: it uses [git
submodules][git-submodules]. Submodules are the only way to convince git to
check out another repository (i.e. the dependency) into our tree without git
thinking it's part of our code. Submodules are infamous for their footguns when
used directly. The higher-level `git-subtrac` shields us from being overly
exposed to git submodules, keeping footguns to a minimum. Oh, this description
also applies to the other 150+ git plumbing commands[^4], so nothing new here.

Andrew meets Git Subtrac (for 5 seconds)
----------------------------------------

A couple of weeks ago, in a park in Milan, I was pitching `git-subtrac` to
Andrew Kelley as the default package manager for Zig
([Zig does not have one yet][zig-pkg-manager]). Our conversation went like
this:

- me: "git-subtrac yadda yadda yadda submodules but better yadda yadda yadda".
- Andrew: "If I clone a repository that uses subtrac with no extra parameters,
  will it work as expected?"
- me: "No, you have to pass `--recursive`, so git will check out submodules...
  even if they are already fetched."
- Andrew: "Then it's a piece-of-shit approach."

And I agree: `git-subtrac` is a tool for managing submodules, and does not try
to be anything more. It is neither a package manager nor a dependency
management tool. As far as potential Zig users are concerned, it should be a
"git plumbing" command.

If we never expose the nitty-gritty of handling git submodules (like
`git-subtrac` does, but more sophisticated), maybe it's OK? I have managed two
projects with `git-subtrac`, and it comes quite close to pretending we are not
using submodules at all.

Zig and subtrac?
----------------

A package manager does much more than just download the dependencies:

- Resolve the right versions of direct dependencies.
- Resolve and download the right transitive dependencies.
- Figure out diamond dependencies and incompatibilities.
- Provide a one-click way to upgrade everything.
- Build the dependencies (this is part of the build system, but usually the
  build system and the package manager are coupled).

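Here is a toy resolver in Python that makes some of these bullets concrete. It walks transitive dependencies and rejects an incompatible diamond. It then computes a build order with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The registry contents are invented. Real package managers resolve version ranges; this sketch assumes every dependency is pinned to an exact version.

```python
from graphlib import TopologicalSorter

# Invented registry: (package, version) -> {dependency: exact version needed}.
# "http" and "json" both depend on "strings": a diamond.
REGISTRY = {
    ("app", "1.0"): {"http": "2.1", "json": "1.4"},
    ("http", "2.1"): {"strings": "0.9"},
    ("json", "1.4"): {"strings": "0.9"},
    ("strings", "0.9"): {},
}


def resolve(root: tuple, registry: dict):
    """Return ({package: version}, build order) for everything `root` needs.

    Raises ValueError if two packages need different versions of the same
    dependency, i.e. the diamond is incompatible.
    """
    chosen = {root[0]: root[1]}
    graph = {}  # package -> the packages it depends on (its predecessors)
    stack = [root]
    while stack:
        name, version = stack.pop()
        needs = registry[(name, version)]
        graph[name] = set(needs)
        for dep, wanted in needs.items():
            if dep not in chosen:
                chosen[dep] = wanted
                stack.append((dep, wanted))
            elif chosen[dep] != wanted:
                raise ValueError(
                    f"{name} {version} needs {dep} {wanted}, "
                    f"but {dep} {chosen[dep]} was already chosen"
                )
    # static_order() yields every package after all of its dependencies.
    order = list(TopologicalSorter(graph).static_order())
    return chosen, order


versions, build_order = resolve(("app", "1.0"), REGISTRY)
print(versions)
print(build_order)  # "strings" first, "app" last
```

A dependency cycle would make `static_order()` raise `graphlib.CycleError`, which a real tool would report to the user.
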
Just like git does not build our code, neither does `git-subtrac`. What if we
make `zig pkg` rely on `git-subtrac` (or, if we are serious, its
reimplementation) to manage directory trees?

Think about it for a minute. Imagine this workflow:

**Step 1: clone**

`git clone https://<...>`

- Download the application source.
- Download all dependencies to `.git/`, but do not check them out (due to the
  nature of git submodules). `deps/` is an empty directory at this point.

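Even while `deps/` is empty, our own tree already records exactly which commit of each dependency it wants. git stores a submodule as a "gitlink" tree entry with mode `160000` and type `commit`. The minimal Python sketch below reads those pins from `git ls-tree -r HEAD` output. The function names are hypothetical, and the sample output (paths and ids) is made up.

```python
import subprocess


def pinned_dependencies(ls_tree_output: str) -> dict:
    """Map each submodule path to the commit our tree pins it to.

    Expects `git ls-tree -r HEAD` output: one entry per line, formatted as
    "<mode> <type> <id>", a tab, then the path. Regular files are blobs and
    are skipped.
    """
    pins = {}
    for line in ls_tree_output.splitlines():
        meta, _, path = line.partition("\t")
        mode, obj_type, object_id = meta.split()
        if mode == "160000" and obj_type == "commit":
            pins[path] = object_id
    return pins


def read_pins(repo: str = ".") -> dict:
    """Run git inside `repo` (requires a repository) and parse its output."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return pinned_dependencies(out)


# Made-up output: one source file and two pinned dependencies.
SAMPLE = (
    "100644 blob 1f2e3d4c5b6a79881f2e3d4c5b6a79881f2e3d4c\tbuild.zig\n"
    "160000 commit 0123456789abcdef0123456789abcdef01234567\tdeps/zlib\n"
    "160000 commit 89abcdef0123456789abcdef0123456789abcdef\tdeps/sqlite\n"
)
print(pinned_dependencies(SAMPLE))
```

git quotes paths with unusual characters in this output. `git ls-tree -z` prints NUL-terminated entries instead, which a sturdier parser would use.
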
**Step 2: build**

`zig pkg build`:

- Check out dependencies in `deps/` using git's plumbing commands. No network
  involved.
- Build dependencies, transitive dependencies and the application.

At this point, the dependencies are checked out in `deps/`, ready for hacking.
If we change the code there, git makes it obvious, but does not forbid us from
doing so, which is nice when hacking.

**Step 3: add a dependency**

`zig pkg get https://git.example.org/repo`

- Record the path of the dependency (just the user's *intent*, as typed) in
  zigpkg's config file.
- Download the latest tagged release (or ref) and put it amongst the other
  dependencies.

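The "download the latest tagged release" step could start from `git ls-remote --tags <url>`, which lists a remote's tags without cloning it. Below is a Python sketch that picks the newest `x.y.z` tag from that output. The function name, the accepted tag pattern and the sample ids are assumptions for illustration. Annotated tags are listed twice. The line ending in `^{}` carries the commit the tag points to, which is what we would pin.

```python
from __future__ import annotations

import re

# Tags we treat as releases: "1.2.3" or "v1.2.3". Anything else is ignored.
RELEASE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def latest_release(ls_remote_output: str) -> tuple[str, str] | None:
    """Return (tag, commit id) of the highest release tag, or None."""
    commits = {}
    for line in ls_remote_output.splitlines():
        object_id, _, ref = line.partition("\t")
        name = ref.removeprefix("refs/tags/")
        if name.endswith("^{}"):
            commits[name[:-3]] = object_id  # peeled line: the tagged commit
        else:
            commits.setdefault(name, object_id)
    best = None
    for name, commit in commits.items():
        match = RELEASE.fullmatch(name)
        if match:
            version = tuple(int(part) for part in match.groups())
            # Compare numerically: v0.10.0 is newer than v0.9.0.
            if best is None or version > best[0]:
                best = (version, name, commit)
    return None if best is None else (best[1], best[2])


SAMPLE = "\n".join([
    "1" * 40 + "\trefs/tags/v0.9.0",
    "2" * 40 + "\trefs/tags/v0.10.0",
    "3" * 40 + "\trefs/tags/v0.10.0^{}",
    "4" * 40 + "\trefs/tags/nightly",
])
print(latest_release(SAMPLE))  # v0.10.0, pinned to the peeled commit "333..."
```

Sorting the tag names as plain strings would wrongly rank `v0.9.0` above `v0.10.0`. That is why the sketch compares integer tuples.
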
**Step 4: upgrade the dependencies**

`zig pkg upgrade`

- Go through the list of dependencies recorded in step 3 and try to fetch the
  updated dependency versions.
- With hand-holding and guardrails:
  - If a dependency no longer exists, inform the user and advise a further
    course of action.
  - If the "newest version" does not descend from what we have now (i.e. our
    current commit is not its ancestor), warn the user: it's not an upgrade,
    but a new thing.
- See, no lock file needed! Just the list of dependency URLs, which the
  submodules in our tree translate to exact refs.

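The "is it really an upgrade?" guardrail boils down to an ancestry question: is the commit we have now an ancestor of the newest one? git answers it with `git merge-base --is-ancestor <current> <newest>`, which exits with status 0 if it is and 1 if it is not. The Python sketch below models the same check on a toy commit graph. The commit names and the returned labels are made up.

```python
from collections import deque

# Toy history: commit -> parents. "r1" was rewritten upstream (say, after a
# force-push), so it shares only "c1" with the history we pinned.
PARENTS = {"c1": [], "c2": ["c1"], "c3": ["c2"], "r1": ["c1"]}


def is_ancestor(parents: dict, old: str, new: str) -> bool:
    """True if `old` is reachable from `new` via parent links (or equal to it)."""
    seen, queue = {new}, deque([new])
    while queue:
        commit = queue.popleft()
        if commit == old:
            return True
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def classify_upgrade(parents: dict, current: str, newest: str) -> str:
    """Label a proposed move from `current` to `newest`."""
    if current == newest:
        return "up to date"
    if is_ancestor(parents, current, newest):
        return "upgrade"
    if is_ancestor(parents, newest, current):
        return "downgrade: warn"
    return "diverged: warn, not an upgrade but a new thing"


for current, newest in [("c2", "c3"), ("c3", "c3"), ("c3", "c2"), ("c2", "r1")]:
    print(current, "->", newest, ":", classify_upgrade(PARENTS, current, newest))
```
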
This sums up the basic workflow.

Drawbacks
---------

There are a few:

- From my experience with `git-subtrac`, git submodules are still a leaky
  abstraction. Since we are using git too, I am not fully convinced we can hide
  *all* of it from the unsuspecting user.
- Obviously, this only works when both our repository and the dependencies are
  in git. This may not be good if you are in Fossil land (SQLite and
  SpatiaLite come to mind).

Did I miss something? Tell me.

Credits
-------

Many thanks to Johny Marler and Anton Lavrik for reading drafts of this.

[^1]: Alphabetically: Erlang, Go, Java, JavaScript, PHP, Perl, Python.
[^2]: Usually written in the same language. The zoo of package managers
    (sometimes a couple of popular ones for the same programming language) is
    a can of worms in and of itself, worth another blog post.
[^3]: True story.
[^4]: git plumbing commands are the ones that users should almost never run
    directly, but that are used by git itself or other low-level tools. E.g.
    `git cat-file` and `git hash-object` are plumbing commands, which, I can
    assure you, 99% of git users have never heard of.

[linux-rust]: https://lwn.net/SubscriberLink/889924/a733d6630e3b5115/
[git-subtrac]: https://github.com/apenwarr/git-subtrac/
[apenwarr-subtrac]: https://apenwarr.ca/log/20191109
[git-subtree]: https://manpages.debian.org/testing/git-man/git-subtree.1.en.html
[go-mod-archiver]: https://github.com/tailscale/go-mod-archiver
[crash-of-leftpad]: https://drewdevault.com/2021/11/16/Cash-for-leftpad.html
[git-submodules]: https://git-scm.com/book/en/v2/Git-Tools-Submodules
[MCAS]: https://en.wikipedia.org/wiki/Maneuvering_Characteristics_Augmentation_System
[ABS]: https://en.wikipedia.org/wiki/Anti-lock_braking_system
[apenwarr]: https://apenwarr.ca/
[zig-pkg-manager]: https://github.com/ziglang/zig/issues/943