diff --git a/config.yaml b/config.yaml
index 84319b7..4819267 100644
--- a/config.yaml
+++ b/config.yaml
@@ -12,7 +12,7 @@ Menus:
   - Name: RSS
     URL: /rss.xml
 permalinks:
-  log: '/:year/:title/'
+  log: '/:year/:slug/'
 Params:
   dateFormat: '2006-01-02'
 outputs:
diff --git a/content/log/2022/dependencies.md b/content/log/2022/dependencies.md
index f49d615..f605e74 100644
--- a/content/log/2022/dependencies.md
+++ b/content/log/2022/dependencies.md
@@ -1,16 +1,13 @@
 ---
 title: "Dependencies, zig and git-subtrac"
 date: 2022-04-23T05:37:51+03:00
-# FIXME: "slug: dependencies" doesn't do what I meant.
-url: 2022/dependencies
-draft: true
+slug: dependencies
 # FIXME: 'list: never' keeps the link in the feed
 #_build:
 #  list: never
+draft: true
 ---
 
-
-
 TLDR
 ----
 
diff --git a/content/log/2022/smart-bundling.md b/content/log/2022/smart-bundling.md
new file mode 100644
index 0000000..f42aabf
--- /dev/null
+++ b/content/log/2022/smart-bundling.md
@@ -0,0 +1,257 @@
---
title: "smart bundling"
date: 2022-05-08T15:52:00+03:00
slug: smart-bundling
draft: true
---

TLDR
----

Could our package managers bundle our dependencies whilst keeping their git
history intact, as if they weren't bundled?

Number of dependencies
----------------------

All of the programming languages I've used professionally whose names do not
start with "c"[^1] have package managers[^2] that make "dependency management"
easy. As part of the project's build process, these package managers download
and build the dependencies. They are easy enough to use that there is virtually
no resistance to adding a dependency whenever a developer deems it necessary.

Dependencies are usually stored outside the project's code repository: they
are either looked up in the system (common for C/C++) or downloaded from the
Internet (common for everything else). Many system dependencies irritate
users, so developers are incentivized to reduce them.
However, there is no incentive to have few statically linked,
downloaded-from-the-internet dependencies (I call them "external"), which
brings us to this post.

Adding external dependencies is like eating candy: the initial cost is nearly
zero and it tastes good while it lasts, but the long-term effects are unhealthy
and dangerous. Why and how to be cautious of external dependencies is a post
for another day; suffice it to say, I have a checklist and am prepared to do
the work to avoid adding a dependency if I can.

If even one external dependency [disappears][crash-of-leftpad], we have
serious problems building our project.

{{House made out of Duplo pieces}}

C++ programs I wrote a decade ago still generally build and run; Erlang, Java
and Python ones generally don't. Judging by the way "modern" languages handle
dependencies, it is fair to say that they optimize for initial development,
not for maintenance. Ten years ago I didn't think this would happen; I am less
naïve now. As [Corbet says][linux-rust], "We can't understand why Kids These
Days just don't want to live that way". Kids want to build, John, not think
about the future. A 4-letter Danish corporation made a fortune by selling toys
that are designed to be disassembled and built anew. Look ma, no maintenance!
Kids are still kids: growing up and sticking to the rules, even if the rules
are our own, requires discipline.

If building our application requires Something On The Internet to be
available, that Something will inevitably go away. The more things we rely on,
and the more time passes, the higher the chance of misery when it does. We
cannot abolish dependencies these days, since some of them are too good to
ignore (hello SQLite, 241,245 lines of top-quality C). So we need to find a
balance: how can we have dependencies that satisfy the kids, yet stay mature
and strategic in the long term? We have a few options today:

1. Mirror everything to an internal system that never deletes code.
   Change the package manager to read from there instead. Convenience aside,
   some companies absolutely must keep every line of code of every build for
   decades, and be able to rebuild it. Think about the firmware of your car's
   [ABS][abs] or Boeing's infamous [MCAS][MCAS]. This problem alone is a whole
   B2B business segment and costs big money.
2. Copy the dependency verbatim to `deps/`. While easy to do, this loses the
   dependency's history and its commit hashes, and makes it difficult to tell
   our changes apart from upstream ones. Upgrades become cumbersome, leading to
   the obvious outcome: never upgrading after the initial import.
3. A step up from (2): use [`git-subtree`][git-subtree] to copy the dependency
   into the application tree, but preserve the dependency's history. This
   messes up the hashes, so references to commit hashes inside the dependency
   (like those in "Reverts" commit messages) do not make sense in isolation.
   Upgrades are somewhat easier than with (2), because the history is still
   sort of there, but they remain cumbersome, leading to the same unfortunate
   outcome.
4. Download the dependencies at build time and store them in a "safe place",
   like [go-mod-archiver][go-mod-archiver] does. It does not change how
   day-to-day development works with Go modules, but offers a lifeboat when a
   dependency disappears. History-wise it is still the same as (2): it copies
   the dependency tree without its history, so if a dependency does go away,
   bringing it back under our own wing takes real effort. As it does not change
   the development process, it is quite easy to sell to any team.

Option (1) is viable for very specific audiences and costs big money. Options
(2) and (3) blur the line between our application and its dependencies, and
rewrite the git history. Option (4) serves a different purpose: it is not a
dependency management system, but a lifeboat for when dependencies inevitably
disappear; they are still downloaded from the internet on every build.
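
Why can't a verbatim copy in `deps/` keep the upstream hashes? Git names every object after the SHA-1 of its contents, and a commit's hash covers its root tree, its parents and its metadata. Here is a minimal Python sketch of that hashing. It assumes git's default SHA-1 object format (not the newer SHA-256 one), and the file names, file contents and author line are made up for illustration.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Object id as git computes it with the default SHA-1 object format:
    SHA-1 over the header "<type> <size>\\0" followed by the raw body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries: list[tuple[str, str, str]]) -> str:
    """Id of a tree built from (mode, name, child_id) entries.

    Each raw entry is "<mode> <name>\\0" plus the child's 20-byte binary id.
    Simplification: entries are sorted by plain name, whereas git sorts
    sub-trees as if their names ended in "/"; that only matters when one
    name is a prefix of another.
    """
    body = b"".join(
        f"{mode} {name}\0".encode() + bytes.fromhex(child)
        for mode, name, child in sorted(entries, key=lambda e: e[1])
    )
    return git_object_id("tree", body)


def commit_id(tree: str, parents: list[str], who: str, message: str) -> str:
    """Id of a commit: it covers the root tree, the parents and the metadata."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return git_object_id("commit", ("\n".join(lines) + "\n").encode())


# Hypothetical author line and file contents, for illustration only.
WHO = "A U Thor <author@example.com> 1652014320 +0300"
answer_c = git_object_id("blob", b"int answer(void) { return 42; }\n")
main_c = git_object_id("blob", b"int main(void) { return answer(); }\n")

# Upstream: answer.c sits at the root of the dependency's repository.
upstream_root = tree_id([("100644", "answer.c", answer_c)])
upstream_commit = commit_id(upstream_root, [], WHO, "Add answer()")

# Ours: the same file copied verbatim into deps/answer/ next to our code.
deps_answer = tree_id([("100644", "answer.c", answer_c)])
deps = tree_id([("40000", "answer", deps_answer)])
our_root = tree_id([("40000", "deps", deps), ("100644", "main.c", main_c)])
our_commit = commit_id(our_root, [], WHO, "Add answer()")

print("dependency subtree unchanged:", deps_answer == upstream_root)
print("commit hashes equal:", our_commit == upstream_commit)
```

The copied subtree keeps its tree hash, but the commit that contains it has a different root tree. That commit is therefore a brand-new object even with identical author, date and message, so upstream commit hashes (and anything that refers to them) do not survive the copy.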

This variety of approaches suggests there is an appetite for tools that
protect us when dependencies disappear (vendoring of increasing
sophistication), preserve git history in some way (`git-subtree`), and do not
get in the way of the language's build tool (`go-mod-archiver`). But the
problem is not yet solved for any of the languages I have worked with.

So how about having both:
- "smart" vendoring to protect ourselves from things disappearing, and
- no friction when doing it?

Sharing code hygienically
-------------------------

[Avery Pennarun][apenwarr], the creator of `git-subtree`, wrote
[`git-subtrac`][git-subtrac], which vendors dependencies in a special branch
without rewriting their history (i.e. leaving the hashes intact). Wait, stop
here. Repeat after me: _git-subtrac vendors our dependencies, but all refs stay
in our repository_. Since all the dependencies are in our repo, no external
force can make them unavailable. Let's discuss its advantages:

1. The dependency keeps its hashes, so its history is left intact (as a side
   effect, `git show` with a commit hash from the dependency will,
   surprisingly, work in our repository).
2. The dependency is vendored into our tree, so it will not disappear.
3. Because humans notice download times more than build times, it keeps a
   nice check on the overall size of the repository, hopefully preventing us
   (or our kids) from pulling in V8 (over 2M lines of code) just to interpret
   a couple hundred lines of JavaScript[^3].

Some of my friends point out that it has a disadvantage by design: it uses
[git submodules][git-submodules]. Submodules are the only way to convince git
to check out another repository (i.e. the dependency) into our tree without
git thinking it is part of our code. Submodules are infamous for their
footguns when used directly.
Being higher-level, `git-subtrac` shields us from direct exposure to git
submodules, keeping the footguns to a minimum. Oh, and "infamous for their
footguns when used directly" also describes the other 150+ git plumbing
commands[^4], so nothing new here.

Andrew meets Git Subtrac (for 5 seconds)
----------------------------------------

A couple of weeks ago, in a park in Milan, I was selling `git-subtrac` to
Andrew Kelley as a default package manager for Zig ([Zig does not have one
yet][zig-pkg-manager]). Our conversation went like this:

- me: "git-subtrac yadda yadda yadda submodules but better yadda yadda yadda".
- Andrew: "If I clone a repository that uses subtrac with no extra parameters,
  will it work as expected?"
- me: "No, you have to pass `--recursive`, so git will check out the
  submodules... even if they are already fetched."
- Andrew: "Then it's a piece-of-shit approach."

And I agree: `git-subtrac` is a tool for managing submodules and does not try
to be anything more: it is neither a package manager nor a dependency
management tool. As far as potential Zig users are concerned, it should be a
"git plumbing" command.

But what if a tool (like `git-subtrac`, but more sophisticated) never exposed
the nitty-gritty of git submodules to the user? Maybe that would be OK? I have
managed two projects with `git-subtrac`, and it comes quite close to letting
me pretend I am not using submodules.

Zig and subtrac?
----------------

A package manager does much more than just download the dependencies:

- Resolve the right versions of direct dependencies.
- Resolve and download the right transitive dependencies.
- Figure out diamond dependencies and incompatibilities.
- Provide one-click means to upgrade everything.
- Build the dependencies (strictly, this is part of the build system, but the
  build system and the package manager are usually coupled).

Just as git will not build our code, neither will `git-subtrac`.
What if we make `zig pkg` rely on `git-subtrac` (or, if we are serious, a
reimplementation of it) to manage directory trees?

Think about it for a minute. Imagine this workflow:

**Step 1: clone**

`git clone https://<...>`

- Download the application source.
- Download all dependencies into `.git/`, but do not check them out (due to
  the nature of git submodules). `deps/` is an empty directory at this point.

**Step 2: build**

`zig pkg build`:

- Check out the dependencies into `deps/` using git's plumbing commands. No
  network involved.
- Build the dependencies, the transitive dependencies and the application.

At this point, the dependencies are checked out in `deps/`, ready for hacking.
If we change the code there, git makes it obvious but does not forbid it,
which is nice when hacking.

**Step 3: add a dependency**

`zig pkg get https://git.example.org/repo`

- Record the dependency's URL (just the user's *intent*, as typed) in
  zigpkg's config file.
- Download the latest tagged release (or ref) and put it among the other
  dependencies.

**Step 4: upgrade the dependencies**

`zig pkg upgrade`

- Go through the list of dependencies recorded in step 3 and try to fetch
  their updated versions.
- With hand-holding and guardrails:
  - If a dependency no longer exists, say so and advise on a further course
    of action.
  - If the "newest version" does not descend from what we have now, warn the
    user: it is not an upgrade, but a new thing.
- See, no lock file needed! Just the list of dependency URLs; the exact refs
  are already pinned by the submodules in our git tree.

This sums up the basic workflow.

Drawbacks
---------

There are a few:

- From my experience with `git-subtrac`, git submodules are still a leaky
  abstraction. Since we are building on git, which users also use directly, I
  am not fully convinced we can hide *all* of it from the unsuspecting user.
- Obviously, this only works when both our repository and the dependencies are
  in git.
  This is a problem if you live in Fossil land (SQLite and SpatiaLite come
  to mind).

Did I miss something? Tell me.

Credits
-------

Many thanks to Johny Marler and Anton Lavrik for reading drafts of this.

[^1]: Alphabetically: Erlang, Go, Java, JavaScript, PHP, Perl, Python.
[^2]: Usually written in the same language. The zoo of package managers
    (sometimes several popular ones for the same programming language) is a
    can of worms in and of itself, worth another blog post.
[^3]: True story.
[^4]: Plumbing commands are ones that users should almost never run directly;
    they are used by git itself or by other low-level tools. For example,
    `git cat-file` and `git hash-object` are plumbing commands which, I can
    assure you, 99% of git users have never heard of.

[linux-rust]: https://lwn.net/SubscriberLink/889924/a733d6630e3b5115/
[git-subtrac]: https://github.com/apenwarr/git-subtrac/
[apenwarr-subtrac]: https://apenwarr.ca/log/20191109
[git-subtree]: https://manpages.debian.org/testing/git-man/git-subtree.1.en.html
[go-mod-archiver]: https://github.com/tailscale/go-mod-archiver
[crash-of-leftpad]: https://drewdevault.com/2021/11/16/Cash-for-leftpad.html
[git-submodules]: https://git-scm.com/book/en/v2/Git-Tools-Submodules
[MCAS]: https://en.wikipedia.org/wiki/Maneuvering_Characteristics_Augmentation_System
[ABS]: https://en.wikipedia.org/wiki/Anti-lock_braking_system
[apenwarr]: https://apenwarr.ca/
[zig-pkg-manager]: https://github.com/ziglang/zig/issues/943