jakstys.lt/dependencies.md at 1943baef2d93ea8f854cc060b09bbfa0968d2e68

Motiejus Jakštys 1943baef2d wip dependencies and smart-bundling

2022-05-08 21:49:05 +03:00

title	date	slug	draft
Dependencies, zig and git-subtrac	2022-04-23T05:37:51+03:00	dependencies	true

TLDR

Modern programming languages make it very easy to add many dependencies. That is nice for development, but a nightmare for long-term maintenance. Unfortunately, zig is following suit. I wish we could accept that adding dependencies does not have to be trivial. If we accept that, then thanks to the ubiquity of git, we may have almost solved the dependency problem: not only for zig, but for everyone.

Adding dependencies

All of the programming languages I've used professionally, the names of which do not start with "c"¹, have package managers², which make "dependency management" easy. These package managers will, as part of the project's build process, download and build the dependencies. So there is virtually no resistance to adding dependencies when we need them.

Because C/C++ still does not have a "universal" package manager, not adding external dependencies to C/C++ is the path of least resistance; instead, one relies on libraries already installed in the system. There is a plethora of tools that will discover system dependencies: autotools, cmake, pkg-config, and others. As a result, C/C++ projects I've participated in usually had 0-5 non-system dependencies, whereas non-C/C++ projects -- tens, hundreds, or thousands³. Having many system dependencies is painful for every user of the package (because they have to make sure the libraries, and their correct versions, are installed), so C/C++ projects tend to avoid having too many of them.

In Go and Python, a small number of dependencies is often a sign of care and quality. mattn/go-sqlite3, uber/zap, apenwarr/redo and django are good examples. I've built and used these projects in a number of environments. Conversely, projects with many dependencies, even when pinned, often fail to build even in the environment they are developed in, which is also where they have received the most testing (e.g. a specific OS+architecture, like Ubuntu 16.04 x86_64). It's even worse if the environment differs, no matter how trivially, from the one the developer is working in⁴. Let's not even talk about a different OS or a different build system. Inability to build software, unsurprisingly, leads to user frustration, packagers' frustration, and developers asking themselves why they have chosen a career in software instead of, say, farming.

To recap, the costs of just having dependencies are huge. I haven't done a survey and have only my experience to base this on (read: "many anecdotes of me failing to build stuff I or others wrote a decade ago"). But it is bad enough that I have a dependency checklist and am prepared to do the grunt work to save my future self. Here it is:

1. Does the dependency do what I want, does it work at all?
2. Is it well written? API surface, documentation, tests, error handling, error signaling, logging, metrics, memory usage (if applicable).
3. How easy is it to build, run, and run its tests? Related: can it be used outside the default package manager?
4. Its system dependencies.
5. Its transitive dependencies.

When working with a "programming-language-specific package manager that does what it's advertised to do", the path of least resistance, when it comes to this checklist, is doing (1), and perhaps (2). Why bother with a dependency's transitive dependencies or its build complexity, if the package manager takes care of it all anyway?

Except the package manager will only help during the initial development, when the developer happily adds the package. It will work for a couple of days. The package manager will not help when the dependency disappears, when its API changes, when it stops doing what it advertised, or with the many other problems that come up. When something breaks (and it inevitably will, unless it's SQLite), the work is on the maintainer to fix it.

I am following my checklist. If a dependency is well written, but has more transitive dependencies than I need and there is no good alternative, I will fork and trim it. My recent example is sql-migrate.

Not doing things that are easy to do requires discipline: brushing teeth, limiting candy intake, not adding dependencies all over the place. If adding dependencies is easy (and there is no established discipline of limiting them), the project will tend to gain them; lots of them.

{{<img src="_/2022/brick-house.jpg" alt="House made out of Duplo pieces" caption="Just like this brick house, "modern" package managers are optimized for building, not maintenance. Photo mine, house by my sons." hint="photo" >}}

To sum up, the "modern" languages optimize for initial development experience, not maintenance. And as Corbet says, "We can't understand why Kids These Days just don't want to live that way". Kids want to build, John, not maintain. A 4-letter Danish corporation made a fortune by selling toys that do not need to be maintained: they are designed to be disassembled and built anew. We are still kids. Growing up and sticking to our own rules requires discipline.

If I may combine Corbet's views with mine: if we understand and audit our dependencies (all of them, including transitive ones), we will have fewer dependencies and a more maintainable system. Win.

Which brings us to git submodules and git-subtrac.

git submodules and git-subtrac

A quick primer on git submodules, a prerequisite to understand git-subtrac:

- A submodule is a pointer to a particular ref in a separate repository, optionally checked out in our tree. For example, deps/cmph would contain all the files from cmph. This means that once the repository is fully set up (technically, the submodule is synced/updated), the build system (Makefiles, build.zig or what have you) can use it just like a regular directory.
- The pointer to the submodule in your repository is just a tuple: (git URL, sha1).
- When cloning a repository that has submodules, git will not clone the submodules; it will just leave empty directories. We must pass --recursive for git to clone everything. This makes sense when submodules are external and may not download at all.
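To make the "(git URL, sha1)" tuple concrete, here is a minimal Python sketch. It assumes the two halves of the pointer come from two places: the URL from the .gitmodules file, and the sha1 from the tree entry that `git ls-tree` prints with mode 160000 and type "commit". The URL, the sha1 and the sample inputs are made up for illustration; only the deps/cmph path comes from the example above.

```python
import configparser

# Hypothetical sample data: the URL and the sha1 are placeholders, not real ones.
FAKE_SHA = "ab" * 20
GITMODULES = (
    '[submodule "deps/cmph"]\n'
    "\tpath = deps/cmph\n"
    "\turl = https://example.com/cmph.git\n"
)
# Shaped like `git ls-tree HEAD` output: "<mode> <type> <sha>\t<path>".
LS_TREE = (
    "100644 blob " + "cd" * 20 + "\tbuild.zig\n"
    "160000 commit " + FAKE_SHA + "\tdeps/cmph\n"
)


def submodule_pointers(gitmodules_text, ls_tree_text):
    """Return {path: (url, sha1)} for every submodule entry in the tree."""
    parser = configparser.ConfigParser(interpolation=None)
    # Strip indentation so indented keys are never read as continuation lines.
    stripped = "\n".join(line.strip() for line in gitmodules_text.splitlines())
    parser.read_string(stripped)
    urls = {parser[name]["path"]: parser[name]["url"] for name in parser.sections()}

    pointers = {}
    for line in ls_tree_text.splitlines():
        if not line.strip():
            continue
        meta, _, path = line.partition("\t")
        mode, objtype, sha = meta.split()
        # Mode 160000 with type "commit" marks a submodule (gitlink) entry.
        if mode == "160000" and objtype == "commit":
            pointers[path] = (urls.get(path), sha)
    return pointers


print(submodule_pointers(GITMODULES, LS_TREE))
```

Note that nothing in the tuple contains the dependency's code: if the URL stops resolving, the sha1 alone is of little use.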

Submodules were designed for adding external dependencies to a repository. However, using them incorrectly is way too easy, and it is not fun when that happens. I see at least these significant usability problems:

- It is too easy to commit unintended changes to a submodule, causing misery to others.
- By default, submodule contents (i.e. the code of your dependency) live outside the repository. This means that, with time, if the dependency disappears, we will not be able to compile our code. Gone.

Because of the many usability problems of submodules, very few people use them. So Avery Pennarun (creator of git-subtree, by the way) created git-subtrac. git-subtrac bundles our git dependencies just like "classic" git submodules, but all refs of the dependencies stay in the same repository. Wait, stop here. Repeat after me: it is git submodules, but all refs stay in the same repository. I also call it "good vendoring". Since all the dependencies are in our repo, no external force can make our dependency unavailable. And it will keep the size of the repository in check, because it's all there when we pull it. git-subtrac fixes a few other submodule usability problems along the way.

Because git-subtrac is a vendoring tool, not a package manager, it only vendors dependencies; it does not help build them. Therefore, with git-subtrac it is harder to add a dependency and "make it work" (build, test, add transitive dependencies) than with a language-specific package manager.

git-subtrac, just like git and submodules, does not understand "semantic versions". So we can't ask for "latest foo of version 1.2.X"; the developer will need to figure out, and hardcode, exactly which versions to use. Also, updating dependencies is not as easy as, say, in Gospeak, go get -u ./...; git will need a bit more hand holding.
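As a rough illustration of the "latest foo of version 1.2.X" lookup that falls on the developer, here is a small Python sketch. It assumes the dependency tags its releases as vMAJOR.MINOR.PATCH; the tag list and the function name `latest_patch` are hypothetical. The developer would still have to resolve the chosen tag to a commit and pin it by hand.

```python
import re

# Assumed tag shape: v<major>.<minor>.<patch>; anything else is ignored.
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def latest_patch(tags, major, minor):
    """Return the tag with the highest patch number in major.minor, or None."""
    best = None
    for tag in tags:
        match = TAG_RE.match(tag)
        if not match:
            continue
        version = tuple(int(part) for part in match.groups())
        if version[:2] != (major, minor):
            continue
        # Compare numerically, so that 1.2.10 sorts after 1.2.9.
        if best is None or version > best[0]:
            best = (version, tag)
    return best[1] if best else None


# Hypothetical tag names, e.g. as listed by `git tag` in the dependency.
tags = ["v1.1.4", "v1.2.9", "v1.2.10", "v1.3.0", "nightly"]
print(latest_patch(tags, 1, 2))  # prints v1.2.10
```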

What about Zig?

Zig will have a package manager (ziglang/zig#943). I am not very enthusiastic about it; can we all use git-subtrac and be done with it? A few weeks ago in a park in Milan my conversation with Andrew Kelley was something like:

me: "git-subtrac yadda yadda yadda submodules but better yadda yadda yadda".
Andrew: "If I clone a repository that uses subtrac with no extra parameters, will it work as expected?"
me: "No, you have to pass --recursive, so git will checkout submodules... even if they are already fetched."
Andrew: "Then it's a piece-of-shit-approach."

Uh, I agree. People have not grown the muscle memory to clone repositories with the --recursive flag and never will, so it's impossible to adopt git-subtrac beyond well-controlled silos. Which is why we will have yet another programming-language-specific package manager. Or at least my argument offering git-subtrac as Zig's package manager (thus saving a lot of time for Zig folks, and a lot of inevitable misery for its users) stops right there.

Zig has a rich standard library, therefore it does not need many dependencies by design. Does it really need a package manager?

Conclusion

When all contents of the submodules are already in our repository, can git check out the submodules too? That way, my conversation with Andrew about reconsidering (or not having) a Zig package manager would have a chance to not stop after 5 seconds.

¹ Alphabetically: Erlang, Go, Java, Javascript, PHP, Perl, Python.
² Usually written in the same language. The zoo of package managers (sometimes a couple of popular ones for the same programming language) is a can of worms in and of itself, worth another blog post.
³ go.sum of a project I am currently involved in clocks in at around 6k lines. This is quite a lot for Go, but still peanuts compared to Node.js.
⁴ For example, they would work on Ubuntu 16.04, but fail on Ubuntu 18.04.
