wip dependencies and smart-bundling
This commit is contained in:
parent
d311e0f66d
commit
1943baef2d
@ -12,7 +12,7 @@ Menus:
|
||||
- Name: RSS
|
||||
URL: /rss.xml
|
||||
permalinks:
|
||||
log: '/:year/:title/'
|
||||
log: '/:year/:slug/'
|
||||
Params:
|
||||
dateFormat: '2006-01-02'
|
||||
outputs:
|
||||
|
@ -1,16 +1,13 @@
|
||||
---
|
||||
title: "Dependencies, zig and git-subtrac"
|
||||
date: 2022-04-23T05:37:51+03:00
|
||||
# FIXME: "slug: dependencies" doesn't do what I meant.
|
||||
url: 2022/dependencies
|
||||
draft: true
|
||||
slug: dependencies
|
||||
# FIXME: 'list: never' keeps the link in the feed
|
||||
#_build:
|
||||
# list: never
|
||||
draft: true
|
||||
---
|
||||
|
||||
<!-- o_ -->
|
||||
|
||||
TLDR
|
||||
----
|
||||
|
||||
|
257
content/log/2022/smart-bundling.md
Normal file
@ -0,0 +1,257 @@
|
||||
---
|
||||
title: "smart bundling"
|
||||
date: 2022-05-08T15:52:00+03:00
|
||||
slug: smart-bundling
|
||||
draft: true
|
||||
---
|
||||
|
||||
TLDR
|
||||
----
|
||||
|
||||
Could our package managers bundle our dependencies whilst keeping the git
|
||||
history as if they didn't?
|
||||
|
||||
Number of dependencies
|
||||
----------------------
|
||||
|
||||
All of the programming languages I've used professionally, the names of which
|
||||
do not start with "c"[^1], have package managers[^2], which make "dependency
|
||||
management" easy. These package managers will, as part of the project's build
|
||||
process, download and build dependencies. They are easy enough to use that
|
||||
there is virtually no resistance to adding dependencies whenever developers deem them necessary.
|
||||
|
||||
Dependencies are usually stored outside of the project's code repository;
|
||||
either looked up in the system (common for C/C++) or downloaded from the
|
||||
Internet (common for everything else). Many system dependencies irritate
|
||||
users, so developers are incentivized to reduce them. However, there is no
|
||||
incentive to have few statically linked, downloaded-from-the-internet
|
||||
dependencies (I call them "external"), which brings us to this post.
|
||||
|
||||
Adding external dependencies is like candy: the initial costs are nearly zero,
|
||||
it tastes good while eating, but the long-term effects are ill and dangerous. Why
|
||||
and how to be cautious of external dependencies is a post for another day, but
|
||||
suffice to say, I have a checklist and am prepared to do the work to avoid
|
||||
adding a dependency if I can.
|
||||
|
||||
If at least one external dependency [disappears][crash-of-leftpad], we have
|
||||
serious problems building our project.
|
||||
|
||||
{{<img src="_/2022/brick-house.jpg"
|
||||
alt="House made out of Duplo pieces"
|
||||
caption="Just like this brick house, \"modern\" package managers are optimized for building, not maintenance. House by my sons, photo mine."
|
||||
hint="photo"
|
||||
>}}
|
||||
|
||||
C++ programs I wrote a decade ago still generally build and run; Erlang, Java
|
||||
and Python generally don't. Judging by the way "modern" languages handle
|
||||
dependencies, it is fair to say that they optimize for initial development, but
|
||||
not maintenance. Ten years ago I didn't think this would happen; I am less naïve
|
||||
now. As [Corbet says][linux-rust], "We can't understand why Kids These Days
|
||||
just don't want to live that way". Kids want to build, John, not think about
|
||||
the future. A 4-letter Danish corporation made a fortune by selling toys that
|
||||
are designed to be disassembled and built anew. Look ma', no maintenance! Kids
|
||||
are still kids: growing up and sticking to the rules, even if they are ours,
|
||||
requires discipline.
|
||||
|
||||
If we require Something On The Internet to be available to build our
|
||||
application, it will inevitably go away. The more things we rely on, and the
|
||||
more time passes, the higher the chance of misery when it does. We cannot abolish
|
||||
dependencies these days, since some of them are too good to ignore (hello
|
||||
SQLite, 241,245 lines of top-quality C). So we need to find a balance: how can
|
||||
we have dependencies to satisfy the kids, but be mature and strategic in the
|
||||
long-term? We have a few options today:
|
||||
|
||||
1. Mirror everything to an internal system, which never deletes code. Change
|
||||
package manager to read from there instead. Discounting convenience, some
|
||||
companies must absolutely have every line of code of their every build for
|
||||
decades, and be able to rebuild it. Think about the firmware of your car's
|
||||
[ABS][abs] or Boeing's infamous [MCAS][MCAS]. This problem alone is a
|
||||
whole B2B business segment and costs big money.
|
||||
2. Copy the dependency verbatim to `deps/<dependency>`. While easy to do, this
|
||||
loses history of the dependency and rewrites the hashes, also making it
|
||||
difficult to distinguish "our" from the upstream changes. Upgrades become
|
||||
cumbersome, leading to the only obvious outcome of never upgrading after the
|
||||
initial import.
|
||||
3. Step up from (2): use [`git-subtree`][git-subtree] to copy the dependency to
|
||||
the application tree, but preserve the history of the dependency. This
|
||||
messes up the hashes. Therefore all refs in the dependency, like `Reverts
|
||||
<commit>` do not make sense in isolation. Upgrades are somewhat easier than
|
||||
with (2), because history is still sort-of there, but still cumbersome,
|
||||
leading to the same unfortunate outcome.
|
||||
4. Download the dependencies at build time and store them in a "safe place",
|
||||
like [go-mod-archiver][go-mod-archiver]. It does not change how day-to-day
|
||||
development works with go modules, but offers a lifeboat when a dependency
|
||||
disappears. History-wise it is still same as (2) — copying the dependency
|
||||
tree without its history; if a dependency does go away, bringing it back
|
||||
under our own wing is an exertion. As it does not change the development
|
||||
process, it is quite easy to sell to any team.
|
||||
|
||||
Option (1) is viable for very specific audiences and costs big money. Options
|
||||
(2) and (3) blur the line between our application and dependencies and rewrite
|
||||
the git history. Option (4) serves a different purpose: it is not a dependency
|
||||
management system; it is a lifeboat when they inevitably disappear:
|
||||
dependencies are still downloaded from the internet on every build.
|
||||
|
||||
This number of approaches seems to suggest there is an appetite to protect
|
||||
ourselves when dependencies disappear (vendoring of increasing sophistication),
|
||||
to preserve git history in some way (`git-subtree`) and to stay out of the way
|
||||
of using the language's build tool (`go-mod-archiver`). But the problem is not
|
||||
yet solved for any of the languages that I have worked with.
|
||||
|
||||
So what about all of the below:
|
||||
- "smart" vendoring to protect ourselves from things disappearing, and
|
||||
- no friction when doing it?
|
||||
|
||||
Sharing code hygienically
|
||||
-------------------------
|
||||
|
||||
|
||||
[Avery Pennarun][apenwarr], the creator of `git-subtree`, wrote
|
||||
[`git-subtrac`][git-subtrac], which vendors dependencies in a special branch
|
||||
without rewriting their history (i.e. leaving the hashes intact). Wait, stop
|
||||
here. Repeat after me: _git-subtrac vendors our dependencies, but all refs stay
|
||||
in our repository_. Since all the dependencies are in our repo, no external
|
||||
force can make our dependency unavailable. Let's discuss its advantages:
|
||||
|
||||
1. The dependency keeps its hashes, so its history is left intact (as a
|
||||
side-effect, `git show <hash of the dependency>` in our repository will,
|
||||
surprisingly, work).
|
||||
2. The dependency is vendored to our tree, so it will not disappear.
|
||||
3. Because humans are more sensitive to download times than to build times, it
|
||||
will keep a nice check on the overall size of the repository. Hopefully
|
||||
preventing us (or our kids) from pulling in V8 (over 2M lines of code) just
|
||||
to interpret a couple hundred lines of JavaScript[^3].
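
A minimal sketch of the property in item 1, using only stock git rather than `git-subtrac` itself: once a dependency's objects live in our object store, its commits are addressable under their original hashes. The repository names `dep` and `app` and the branch name `deps-demo` are assumptions made up for this illustration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Fixed identity so commits succeed on machines without a git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# A stand-in "upstream" dependency with a single commit.
git init -q dep
echo 'pub fn answer() u8 { return 42; }' > dep/lib.zig
git -C dep add lib.zig
git -C dep commit -q -m 'dep: initial import'
dep_hash=$(git -C dep rev-parse HEAD)

# Our application, with its own unrelated history.
git init -q app
echo 'hello' > app/main.zig
git -C app add main.zig
git -C app commit -q -m 'app: initial commit'

# Pull the dependency's objects into the application's object store and
# pin them with a branch (hypothetical name) so they are never collected.
git -C app fetch -q ../dep HEAD
git -C app update-ref refs/heads/deps-demo FETCH_HEAD

# The dependency's commit is now visible from inside app/, same hash.
found_type=$(git -C app cat-file -t "$dep_hash")
found_subject=$(git -C app show -s --format=%s "$dep_hash")
echo "$dep_hash is a $found_type: $found_subject"
```

Nothing was rewritten along the way, so a `Reverts <commit>` reference inside the dependency's history would still resolve in `app`.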
|
||||
|
||||
Some of my friends point out that it has a disadvantage by design: it uses [git
|
||||
submodules][git-submodules]. Submodules are the only way to convince git to
|
||||
check out another repository (i.e. the dependency) into our tree without git
|
||||
thinking it's part of our code. Submodules are infamous for their footguns when
|
||||
used directly. Higher-level `git-subtrac` shields us from being overly exposed
|
||||
to git submodules, keeping footguns to a minimum. Oh, this description also
|
||||
applies to the other 150+ git plumbing commands[^4], so nothing new here.
|
||||
|
||||
Andrew meets Git Subtrac (for 5 seconds)
|
||||
----------------------------------------
|
||||
|
||||
A couple of weeks ago in a park in Milan I was selling `git-subtrac` to Andrew
|
||||
Kelley as a default package manager for Zig ([zig does not have one yet][zig-pkg-manager]). Our
|
||||
conversation went like this:
|
||||
|
||||
- me: "git-subtrac yadda yadda yadda submodules but better yadda yadda yadda".
|
||||
- Andrew: "If I clone a repository that uses subtrac with no extra parameters,
|
||||
will it work as expected?"
|
||||
- me: "No, you have to pass `--recursive`, so git will check out submodules...
|
||||
even if they are already fetched."
|
||||
- Andrew: "Then it's a piece-of-shit-approach."
|
||||
|
||||
And I agree: `git-subtrac` is a tool for managing submodules, and does not try
|
||||
to be anything more: it is not a package manager, nor is it a dependency
|
||||
management tool. As far as potential Zig users are concerned, it should
|
||||
be a "git plumbing" command.
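
The behaviour Andrew objected to can be reproduced with plain git submodules. This is a sketch under assumptions: it uses ordinary submodules, not `git-subtrac`, and the repository names (`dep`, `app`, `plain`, `recursive`) are invented for the demo. The `protocol.file.allow=always` override is only there because recent git versions refuse local-path submodule clones by default.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# A stand-in dependency.
git init -q dep
echo 'dep' > dep/README
git -C dep add README
git -C dep commit -q -m 'dep: initial'

# An application that pulls the dependency in as a submodule.
git init -q app
git -C app -c protocol.file.allow=always submodule --quiet add "$work/dep" deps/dep
git -C app commit -q -m 'app: add deps/dep as a submodule'

# A plain clone leaves deps/dep as an empty directory.
git clone -q app plain
plain_files=$(ls -A plain/deps/dep | wc -l | tr -d ' ')

# Only a recursive clone checks the submodule out.
git -c protocol.file.allow=always clone -q --recursive app recursive
recursive_files=$(ls -A recursive/deps/dep | wc -l | tr -d ' ')

echo "plain clone: $plain_files entries, recursive clone: $recursive_files entries"
```

A wrapper that hides this step from users would have to run the equivalent of `git submodule update --init` on their behalf.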
|
||||
|
||||
If we never expose the nitty-gritty handling of git submodules (like
|
||||
`git-subtrac`, but more sophisticated), maybe it's OK? I have tried to manage
|
||||
two projects with `git-subtrac`, and it comes quite close to pretending we are not using
|
||||
submodules.
|
||||
|
||||
Zig and subtrac?
|
||||
----------------
|
||||
|
||||
A package manager does much more than just downloading the dependencies:
|
||||
|
||||
- Resolve the right versions of direct dependencies.
|
||||
- Resolve and download the right transitive dependencies.
|
||||
- Figure out the diamond dependencies and incompatibilities.
|
||||
- Provide one-click means to upgrade everything.
|
||||
- Build the dependencies (it is part of the build system, but usually the build
|
||||
system and package manager are coupled).
|
||||
|
||||
Just like git will not build our code, `git-subtrac` will not either. What if we
|
||||
make `zig pkg` rely on `git-subtrac` (or, if we are serious, its
|
||||
reimplementation) to manage directory trees?
|
||||
|
||||
Think about it for a minute. Imagine this workflow:
|
||||
|
||||
**Step 1: clone**
|
||||
|
||||
`git clone https://<...>`
|
||||
|
||||
- Download the application source.
|
||||
- Download all dependencies to `.git/`, but do not check them out (due to the
|
||||
nature of git submodules). `deps/` is an empty directory at this point.
|
||||
|
||||
**Step 2: build**
|
||||
|
||||
`zig pkg build`:
|
||||
|
||||
- Check out dependencies in `deps/` using git's plumbing commands. No network
|
||||
involved.
|
||||
- Build dependencies, transitive dependencies and the application.
|
||||
|
||||
At this point, the dependency is checked out in `deps/`, ready for hacking. If
|
||||
we change the code there, git makes it obvious, but does not forbid us from
|
||||
doing so, which is nice when hacking.
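
The "no network involved" checkout in step 2 could plausibly be built from stock git plumbing. The sketch below is not how `zig pkg` or `git-subtrac` work; it only shows that a tree already stored in `.git/` can be materialised into `deps/` with `git read-tree` and `git checkout-index`. The ref name `refs/deps/dep` and the temporary index path are assumptions made up for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# A stand-in dependency with a nested source file.
git init -q dep
mkdir dep/src
echo 'pub fn answer() u8 { return 42; }' > dep/src/lib.zig
git -C dep add src/lib.zig
git -C dep commit -q -m 'dep: initial import'
dep_hash=$(git -C dep rev-parse HEAD)

# The application fetches the dependency's objects once and pins them.
git init -q app
echo 'hello' > app/main.zig
git -C app add main.zig
git -C app commit -q -m 'app: initial commit'
git -C app fetch -q ../dep HEAD
git -C app update-ref refs/deps/dep FETCH_HEAD   # hypothetical ref name

# Upstream disappears; from here on nothing touches the network.
rm -rf dep

cd app
# Load the dependency's tree into a scratch index, then write it to deps/dep/.
scratch_index="$PWD/.git/deps-demo.index"
GIT_INDEX_FILE="$scratch_index" git read-tree "$dep_hash"
GIT_INDEX_FILE="$scratch_index" git checkout-index -a --prefix=deps/dep/
rm -f "$scratch_index"

ls deps/dep/src
```

Using a scratch index keeps the application's own index untouched, so the checked-out dependency shows up as ordinary untracked files.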
|
||||
|
||||
**Step 3: add a dependency**
|
||||
|
||||
`zig pkg get https://git.example.org/repo`
|
||||
|
||||
- record the path of the dependency (just the user's *intent*, as typed) to the
|
||||
zigpkg's config file.
|
||||
- download the latest (tagged release|ref) and put it amongst the other dependencies.
|
||||
|
||||
**Step 4: upgrading the dependencies**
|
||||
|
||||
`zig pkg upgrade`
|
||||
|
||||
- Go through the list of dependencies recorded in step 3, try to fetch the
|
||||
updated dependency versions.
|
||||
- With hand holding and guardrails:
|
||||
- If a dependency no longer exists, inform the user and advise a further course of action.
|
||||
- If the "newest version" is not a descendant of what we have now, warn the user
|
||||
as it's not an upgrade, but a new thing.
|
||||
- See, no lock file needed! Just the list of dependency URLs, which translate
|
||||
to the exact refs.
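
The guardrail about "not an upgrade, but a new thing" maps onto an ancestry check that git already offers: `git merge-base --is-ancestor`. A minimal sketch, assuming a made-up `classify` helper and a throwaway repository; a real `zig pkg upgrade` does not exist and may well work differently.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

git init -q dep
cd dep

# The version we currently have vendored.
echo v1 > lib.zig
git add lib.zig
git commit -q -m 'v1'
current=$(git rev-parse HEAD)

# A genuine upgrade: built on top of what we have.
echo v2 > lib.zig
git commit -q -am 'v2'
upgrade=$(git rev-parse HEAD)

# Upstream rewrote history: an unrelated root commit.
git checkout -q --orphan rewritten
echo other > lib.zig
git add lib.zig
git commit -q -m 'rewritten history'
rewritten=$(git rev-parse HEAD)

# Hypothetical helper: is $2 a descendant of $1?
classify() {
  if git merge-base --is-ancestor "$1" "$2"; then
    echo upgrade
  else
    echo not-an-upgrade
  fi
}

result_upgrade=$(classify "$current" "$upgrade")
result_rewritten=$(classify "$current" "$rewritten")
echo "v1 -> v2: $result_upgrade"
echo "v1 -> rewritten: $result_rewritten"
```

Because the check runs against objects we already hold, it works even when the old upstream history is no longer published anywhere.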
|
||||
|
||||
This sums up the basic workflow.
|
||||
|
||||
Drawbacks
|
||||
---------
|
||||
|
||||
There are a few:
|
||||
|
||||
- From my experience with `git-subtrac`, git submodules are still a leaky
|
||||
abstraction. Since we are using git too, I am not fully convinced we can hide
|
||||
*all* of it from the unsuspecting user.
|
||||
- Obviously, this only works when both our repository and the dependencies are
|
||||
in git. This may not be good if you are in the Fossil land (SQLite and
|
||||
SpatiaLite come to mind).
|
||||
|
||||
Did I miss something? Tell me.
|
||||
|
||||
Credits
|
||||
-------
|
||||
|
||||
Many thanks to Johny Marler and Anton Lavrik for reading drafts of this.
|
||||
|
||||
[^1]: Alphabetically: Erlang, Go, Java, JavaScript, PHP, Perl, Python.
|
||||
[^2]: Usually written in the same language. The zoo of package managers (sometimes
|
||||
a couple of popular ones for the same programming language) is a can of worms
|
||||
in and of itself, worth another blog post.
|
||||
[^3]: True story.
|
||||
[^4]: git plumbing commands are ones that the users should almost never use,
|
||||
but are used by git itself or other low-level tools. E.g. `git cat-file` and
|
||||
`git hash-object` are plumbing commands, which, I can assure, 99% of the git
|
||||
users have never heard of.
|
||||
|
||||
[linux-rust]: https://lwn.net/SubscriberLink/889924/a733d6630e3b5115/
|
||||
[git-subtrac]: https://github.com/apenwarr/git-subtrac/
|
||||
[apenwarr-subtrac]: https://apenwarr.ca/log/20191109
|
||||
[git-subtree]: https://manpages.debian.org/testing/git-man/git-subtree.1.en.html
|
||||
[go-mod-archiver]: https://github.com/tailscale/go-mod-archiver
|
||||
[crash-of-leftpad]: https://drewdevault.com/2021/11/16/Cash-for-leftpad.html
|
||||
[git-submodules]: https://git-scm.com/book/en/v2/Git-Tools-Submodules
|
||||
[MCAS]: https://en.wikipedia.org/wiki/Maneuvering_Characteristics_Augmentation_System
|
||||
[ABS]: https://en.wikipedia.org/wiki/Anti-lock_braking_system
|
||||
[apenwarr]: https://apenwarr.ca/
|
||||
[zig-pkg-manager]: https://github.com/ziglang/zig/issues/943
|