remove dependencies

Motiejus Jakštys 2022-05-09 05:53:04 +03:00
parent d9d8412872
commit bcb49d61c6


@ -1,211 +0,0 @@
---
title: "Dependencies, zig and git-subtrac"
date: 2022-04-23T05:37:51+03:00
slug: dependencies
# FIXME: 'list: never' keeps the link in the feed
#_build:
# list: never
draft: true
---
TLDR
----
Modern programming languages make it very easy to add many dependencies. That
is nice for development, but a nightmare for long-term maintenance.
Unfortunately, zig is following suit. I wish we could accept that adding
dependencies does not have to be trivial. If we accept that, then thanks to the
ubiquity of git we may have almost solved the dependency problem: not only for
zig, but for everyone.
Adding dependencies
-------------------
All of the programming languages I've used professionally, the names of which
do not start with "c"[^1], have package managers[^2], which make "dependency
management" easy. These package managers will, as part of the project's build
process, download and build the dependencies. So there is virtually no
resistance to adding dependencies when we need them.
Because C/C++ still does not have a "universal" package manager, not adding
external dependencies to C/C++ is the path of least resistance; instead, one
relies on libraries already installed in the system. There is a plethora of
tools that will discover system dependencies: autotools, cmake, pkg-config, and
others. As a result, C/C++ projects I've participated in usually had 0-5
non-system dependencies, whereas non-C/C++ projects -- tens, hundreds, or
thousands[^3]. Having many system dependencies is painful for *every user* of
the package (because they have to make sure the libraries, and their correct
versions, are installed), so C/C++ projects tend to avoid having too many of
them.
In Go and Python, a small number of dependencies is often a sign of care and
quality. [mattn/go-sqlite3](https://github.com/mattn/go-sqlite3),
[uber/zap](https://github.com/uber-go/zap),
[apenwarr/redo](https://github.com/apenwarr/redo) and
[django](https://djangoproject.com) are good examples. I've built and used
these projects in a number of environments. Conversely, projects with many
dependencies, even when pinned, often fail to build even in the environment
they are developed in and have thus received the most testing (e.g. a specific
OS+architecture, like `Ubuntu 16.04 x86_64`). It's even worse if the
environment differs, no matter how trivially, from the one the developer is
working in[^4]. Let's forget about a different OS or a different build system.
Inability to build software, unsurprisingly, leads to user frustration,
packagers' frustration, and developers asking themselves why they chose a
career in software instead of, say, farming.
To recap, the costs of just having dependencies are huge. I haven't done a
survey and have only my experience to base this on (read: "many anecdotes of me
failing to build stuff I or others wrote a decade ago"). But it is bad enough
that I have a dependency checklist and am prepared to do the grunt work to save
my future self. Here it is:
1. Does the dependency do what I want, does it work at all?
2. Is it well written? API surface, documentation, tests, error handling, error
signaling, logging, metrics, memory usage (if applicable).
3. How easy is it to build, run, and run its tests? Related: can it be used
outside the default package manager?
4. Its system dependencies.
5. Its transitive dependencies.
When working with a "programming-language-specific package manager that does
what it's advertised to do", the path of least resistance, when it comes to
this checklist, is doing (1), and perhaps (2). Why bother with transitive
dependencies or its build complexity, if the package manager takes care of it
all anyway?
Except that the package manager only helps during initial development, when
the developer happily adds the package. It will work for a couple of days. The
package manager will not help when the dependency disappears, its API changes,
it stops doing what it advertised, or any of many other
[problems][crash-of-leftpad] appear. When something breaks (and it inevitably
will, unless it's SQLite), the work is on the maintainer to fix it.
I am following my checklist. If a dependency is well written, but has more
transitive dependencies than I need and there is no good alternative, I will
fork and trim it. My recent example is
[sql-migrate](https://github.com/motiejus/sql-migrate).
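
For a Go project, item 5 of the checklist can be given a rough number by
counting what `go.sum` lists. The sketch below is only an approximation:
`count_modules` is a hypothetical helper, the hashes in `SAMPLE` are
placeholders, and `go.sum` can list modules that never end up in the final
build, so the result is best treated as an upper bound.

```python
def count_modules(go_sum_text):
    """Return (distinct module paths, distinct module versions) in go.sum."""
    modules = set()
    versions = set()
    suffix = "/go.mod"
    for line in go_sum_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        path, version, _checksum = fields
        # A module version normally has two lines: one for its contents and
        # one for its go.mod file, marked by a "/go.mod" version suffix.
        if version.endswith(suffix):
            version = version[: -len(suffix)]
        modules.add(path)
        versions.add((path, version))
    return len(modules), len(versions)


# Illustrative input only: the checksums are placeholders, not real hashes.
SAMPLE = """\
github.com/mattn/go-sqlite3 v1.14.12 h1:AAAA=
github.com/mattn/go-sqlite3 v1.14.12/go.mod h1:BBBB=
go.uber.org/zap v1.21.0 h1:CCCC=
go.uber.org/zap v1.21.0/go.mod h1:DDDD=
go.uber.org/zap v1.20.0/go.mod h1:EEEE=
"""

modules, versions = count_modules(SAMPLE)
print(f"{modules} modules, {versions} module versions")
# prints: 2 modules, 3 module versions
```

To try it on a real project, pass `open("go.sum").read()` instead of `SAMPLE`.
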
Not doing things that are easy to do requires discipline: brushing teeth,
limiting candy intake, not adding dependencies all over the place. If adding
dependencies is easy (and there is no established discipline of limiting them),
the project will tend to gain them; lots of them.
{{<img src="_/2022/brick-house.jpg"
alt="House made out of Duplo pieces"
caption="Just like this brick house, \"modern\" package managers are optimized for building, not maintenance. Photo mine, house by my sons."
hint="photo"
>}}
To sum up, the "modern" languages optimize for initial development experience,
not maintenance. And as [Corbet says][linux-rust], "We can't understand why
Kids These Days just don't want to live that way". Kids want to build, John,
not maintain. A 4-letter Danish corporation made a fortune by selling toys that
do not need to be maintained: they are designed to be disassembled and built
anew. We are still kids. Growing up and sticking to our own rules requires
discipline.
If I may combine Corbet's views with mine: if we understand and audit our
dependencies (all of them, including transitive ones), we will have fewer
dependencies and a more maintainable system. Win.
Which brings us to git submodules and git-subtrac.
git submodules and git-subtrac
------------------------------
A quick primer on [git submodules][git-submodule], a prerequisite for
understanding `git-subtrac`:
* A submodule is a pointer to a particular ref in a separate repository,
optionally checked out in our tree. For example, `deps/cmph` would contain
all the files from [cmph][cmph]. This means that once the repository is fully
set up (technically, the submodule is synced/updated), the build system
(Makefiles, build.zig or what have you) can use it just like a regular
directory.
* The pointer to the submodule in your repository is just a tuple: `(git URL,
sha1)`.
* When cloning a repository that has submodules, git will not clone the
submodules; it will just leave empty directories. We must pass `--recursive`
for git to clone everything. This makes sense when submodules are external
and may not download at all.
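
The `(git URL, sha1)` tuple can be made visible by joining two pieces of
information: the URL recorded in `.gitmodules`, and the pinned commit that
`git ls-tree -r HEAD` reports as a `160000 commit` entry. This is a minimal
sketch working on captured text rather than a live repository. The helper
names, the `example.com` URL and the hashes are all made up for illustration.

```python
def parse_gitmodules(text):
    """Map submodule path -> URL from the contents of a .gitmodules file."""
    urls = {}
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("[submodule"):
            current = {}  # a new section starts
        elif "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            current[key] = value
            if "path" in current and "url" in current:
                urls[current["path"]] = current["url"]
    return urls


def parse_gitlinks(ls_tree_output):
    """Map submodule path -> pinned commit from `git ls-tree -r HEAD` output."""
    pins = {}
    for line in ls_tree_output.splitlines():
        meta, _, path = line.partition("\t")
        fields = meta.split()
        # Submodules are stored as "gitlink" entries: mode 160000, type commit.
        if len(fields) == 3 and fields[0] == "160000" and fields[1] == "commit":
            pins[path] = fields[2]
    return pins


def submodule_pointers(gitmodules_text, ls_tree_output):
    """Return {path: (url, sha1)} for every submodule present in both inputs."""
    urls = parse_gitmodules(gitmodules_text)
    pins = parse_gitlinks(ls_tree_output)
    return {path: (urls[path], pins[path]) for path in urls if path in pins}


# Placeholder data: the URL and the hashes are not real.
GITMODULES = """\
[submodule "deps/cmph"]
\tpath = deps/cmph
\turl = https://example.com/cmph.git
"""
LS_TREE = (
    "100644 blob " + "a" * 40 + "\tbuild.zig\n"
    "160000 commit " + "b" * 40 + "\tdeps/cmph\n"
)

for path, (url, sha) in submodule_pointers(GITMODULES, LS_TREE).items():
    print(path, url, sha[:12])
```

Note that nothing in either input contains the dependency's files: the
superproject stores only the pointer.
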
Submodules were designed for adding external dependencies to a repository.
However, using them incorrectly is way too easy, and is not fun when it
happens. I see at least these significant usability problems:
- It is too easy to commit unintended changes to a submodule, causing misery to
others.
- By default submodule contents (i.e. the code of your dependency) live
*outside the repository*. This means that, with time, if the dependency
disappears, we will not be able to compile our code. Gone.
Because of the many usability problems of submodules, very few people use them.
So [Avery Pennarun][apenwarr] (creator of [git-subtree][git-subtree], by the
way) created [`git-subtrac`][git-subtrac]. `git-subtrac` bundles our git
dependencies just like "classic" git submodules, but all refs of the
dependencies stay in the same repository. Wait, stop here. Repeat after me: _it
is git submodules, but all refs stay in the same repository_. I also call it
"good vendoring". Since all the dependencies are in our repo, no external force
can make our dependency unavailable. And it will keep the size of the
repository in check, because it's all there when we pull it. [`git-subtrac`
fixes a few other submodule usability problems][apenwarr-subtrac] along the
way.
Because `git-subtrac` is a vendoring tool, not a package manager, it only
vendors but does not help building packages. Therefore, with `git-subtrac` it
is harder to add and "make work" (build, test, add transitive dependencies) a
dependency than with a language-specific package manager.
`git-subtrac`, just like git and submodules, does not understand "semantic
versions". So we can't ask for "latest foo of version 1.2.X"; the developer
will need to figure out, and hardcode, *exactly* which versions to use. Also,
updating dependencies is not as easy as, say, in Go with `go get -u ./...`;
git will need a bit more hand holding.
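
As an illustration of that hand holding, the "latest foo of version 1.2.X"
lookup can be done by hand from the output of `git ls-remote --tags <url>`,
after which the resulting commit is the one to hardcode. This is a sketch
under assumptions: tags are named `vMAJOR.MINOR.PATCH`, `latest_patch` is a
hypothetical helper, and the hashes in `SAMPLE` are placeholders.

```python
import re

# One line of `git ls-remote --tags` output: "<sha>\trefs/tags/<tag>".
# Annotated tags get a second, "peeled" line ending in "^{}" that carries
# the commit the tag points at.
TAG_LINE = re.compile(
    r"^([0-9a-f]{40})\trefs/tags/v(\d+)\.(\d+)\.(\d+)(\^\{\})?$"
)


def latest_patch(ls_remote_output, major, minor):
    """Return (tag, commit sha) of the highest vMAJOR.MINOR.* tag, or None."""
    best = {}  # patch number -> commit sha
    for line in ls_remote_output.splitlines():
        match = TAG_LINE.match(line)
        if not match:
            continue
        if (int(match.group(2)), int(match.group(3))) != (major, minor):
            continue
        patch = int(match.group(4))
        peeled = match.group(5) is not None
        # Prefer the peeled line: it names the commit, not the tag object.
        if peeled or patch not in best:
            best[patch] = match.group(1)
    if not best:
        return None
    patch = max(best)  # numeric comparison, so 10 sorts after 9
    return f"v{major}.{minor}.{patch}", best[patch]


# Placeholder data: these hashes are not real.
SAMPLE = "\n".join([
    "1" * 40 + "\trefs/tags/v1.2.9",
    "2" * 40 + "\trefs/tags/v1.2.10",
    "3" * 40 + "\trefs/tags/v1.2.10^{}",
    "4" * 40 + "\trefs/tags/v1.3.0",
])

print(latest_patch(SAMPLE, 1, 2))
```

A package manager repeats this lookup on every update; with git alone, the
lookup happens once and the exact commit is what gets recorded.
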
What about Zig?
---------------
Zig will have a package manager ([ziglang/zig#943][943]). I am not very
enthusiastic about it; can we all use git-subtrac and be done with it? A few
weeks ago in a park in Milan my conversation with [Andrew
Kelley](https://andrewkelley.me/) was something like:
- me: "git-subtrac yadda yadda yadda submodules but better yadda yadda yadda".
- Andrew: "If I clone a repository that uses subtrac with no extra parameters,
will it work as expected?"
- me: "No, you have to pass `--recursive`, so git will check out submodules...
even if they are already fetched."
- Andrew: "Then it's a piece-of-shit-approach."
Uh, I agree. People have not grown the muscle memory to clone repositories
with the `--recursive` flag and never will, so it's impossible to adopt
git-subtrac beyond well-controlled silos. Which is why we will have
yet-another-programming-language-specific-package-manager. Or at least my
argument offering `git-subtrac` as Zig's package manager (thus saving a lot of
time for Zig folks, and a lot of inevitable misery for its users) stops right
there.
Zig has a rich standard library, therefore it does not need many dependencies
by design. Does it *really* need a package manager?
Conclusion
----------
When all contents of the submodules are in our repository, can git check out
submodules too? That way, my conversation with Andrew about reconsidering (or
not having) a Zig package manager would have a chance of lasting longer than 5
seconds.
[^1]: Alphabetically: Erlang, Go, Java, Javascript, PHP, Perl, Python.
[^2]: Usually written in the same language. The zoo of package managers
(sometimes a couple of popular ones for the same programming language) is a
can of worms in and of itself, worth another blog post.
[^3]: `go.sum` of a project I am currently involved in clocks around 6k lines.
This is quite a lot for Go, but still peanuts to Node.js.
[^4]: For example, they would work on Ubuntu 16.04, but fail on Ubuntu 18.04.
[git-subtrac]: https://github.com/apenwarr/git-subtrac/
[linux-rust]: https://lwn.net/SubscriberLink/889924/a733d6630e3b5115/
[crash-of-leftpad]: https://drewdevault.com/2021/11/16/Cash-for-leftpad.html
[943]: https://github.com/ziglang/zig/issues/943
[git-submodule]: https://git-scm.com/book/en/v2/Git-Tools-Submodules
[cmph]: http://cmph.sourceforge.net/
[git-subtree]: https://git.kernel.org/pub/scm/git/git.git/plain/contrib/subtree/git-subtree.txt
[apenwarr]: https://apenwarr.ca
[apenwarr-subtrac]: https://apenwarr.ca/log/20191109