1
Fork 0

README prose, the way I understand it

nix
Motiejus Jakštys 2022-04-15 15:20:42 +03:00
parent 4d65b80903
commit 0c02a827ae
4 changed files with 207 additions and 106 deletions

298
README.md
View File

@ -3,12 +3,21 @@
# Bazel zig cc toolchain
This is a C/C++ toolchain that can (cross-)compile C/C++ programs. It contains
clang-13, musl, glibc (versions 2-2.34, selectable), all in a ~40MB package.
Read
clang-13, musl, glibc 2-2.35, all in a ~40MB package. Read
[here](https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html)
about zig-cc; the rest of the README will present how to use this toolchain
from Bazel.
Configuring toolchains in Bazel is complex, under-documented, and fraught with
peril. I, the co-author of bazel-zig-cc, am still confused on how this all
works, and often wonder why it works at all. That aside, we made the our best
effort to make bazel-zig-cc usable for your C/C++/CGo projects, with as many
guardrails as we could install.
While copy-pasting the code in your project, attempt to read and understand the
text surrounding the code snippets. This will save you hours of head
scratching, I promise.
# Usage
Add this to your `WORKSPACE`:
@ -25,23 +34,18 @@ http_archive(
load("@bazel-zig-cc//toolchain:defs.bzl", zig_toolchains = "toolchains")
zig_toolchains()
# version, url_formats and host_platform_sha256 are optional, but highly
# recommended. Zig SDK is by default downloaded from dl.jakstys.lt, which is a
# tiny server in the closet of Yours Truly.
zig_toolchains(
version = "<...>",
url_formats = [
"https://example.org/zig/zig-{host_platform}-{version}.tar.xz",
],
host_platform_sha256 = { ... },
)
```
> ### zig sdk download control
>
> If you are using this in production, you probably want more control over
> where the zig sdk is downloaded from:
> ```
> zig_register_toolchains(
> version = "<...>",
> url_formats = [
> "https://example.internal/zig/zig-{host_platform}-{version}.tar.xz",
> ],
> host_platform_sha256 = { ... },
> )
> ```
And this to `.bazelrc`:
```
@ -49,15 +53,20 @@ build --incompatible_enable_cc_toolchain_resolution
```
The snippets above will download the zig toolchain and make the bazel
toolchains available for registration and usage.
toolchains available for registration and usage. If you do nothing else, this
may work. The `.bazelrc` snippet instructs Bazel to use the registered "new
kinds of toolchains". All above are required regardless of how wants to use it.
The next steps depend on how one wants to use bazel-zig-cc. The descriptions
below is a gentle introduction to C++ toolchains from "user's perspective" too.
The next steps depend on your use case.
## Use case: manually build a single target with a specific zig cc toolchain
## I want to manually build a single target with a specific zig cc toolchain
This option is least disruptive to the workflow compared to no hermetic C++
toolchain, and works best when trying out or getting started with bazel-zig-cc
for a subset of targets.
You may explicitly request Bazel to use a specific toolchain (compatible with
the specified platform). For example, if you wish to compile a specific binary
(or run tests) on linux/amd64/musl, you may specify:
To request Bazel to use a specific toolchain (compatible with the specified
platform) for build/tests/whatever on linux-amd64-musl, do:
```
bazel build \
@ -66,13 +75,82 @@ bazel build \
//test/go:go
```
This registers the toolchain `@zig_sdk//toolchain:linux_arm64_musl` for linux
arm64 targets. This toolchains links code statically with musl. We also specify
that we want to build //test/go:go for linux arm64.
There are a few things going on here, let's try to dissect them.
## I want to use zig cc as the default compiler
### Option `--platforms @zig_sdk//platform:linux_arm64`
Specifies that the our target platform is `linux_arm64`, which resolves into:
```
$ bazel query --output=build @zig_sdk//platform:linux_arm64
platform(
name = "linux_arm64",
generator_name = "linux_arm64",
generator_function = "declare_platforms",
generator_location = "platform/BUILD:7:18",
constraint_values = ["@platforms//os:linux", "@platforms//cpu:aarch64"],
)
```
`constraint_values` instructs Bazel to be looking for a **toolchain** that is
compatible with (in Bazelspeak, `target_compatible_with`) **all of the**
`["@platforms//os:linux", "@platforms//cpu:aarch64"]`.
### Option `--toolchains=@zig_sdk//toolchain:linux_arm64_musl`
Inspect first:
```
$ bazel query --output=build @zig_sdk//toolchain:linux_arm64_musl
toolchain(
name = "linux_arm64_musl",
generator_name = "linux_arm64_musl",
generator_function = "declare_toolchains",
generator_location = "toolchain/BUILD:7:19",
toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
target_compatible_with = [
"@platforms//os:linux",
"@platforms//cpu:aarch64",
"@zig_sdk//libc:unconstrained",
],
toolchain = "@zig_sdk//private:aarch64-linux-musl_cc",
)
```
The above means toolchain is compatible with platforms that include
`@platforms//os:linux`, `@platforms//cpu:aarch64` (an alias to
`@platforms//cpu:arm64`) and `@zig_sdk//libc:unconstrained`. For a platform to
pick up the right toolchain, the toolchain's `target_compatible_with` must be
equivalent or a superset to the platforms `constraint_values`. Since the
toolchain is a superset (therefore, `libc:unconstrained` does not matter here),
the platform is compatible with this toolchain. As a result, `--platforms
@zig_sdk//platform:linux_amd64` causes Bazel to select a toolchain
`@zig_sdk//platform:linux_arm64_musl` (because it satisfies all constraints),
which will compile and link the C/C++ code with musl.
`@zig_sdk//libc:unconstrained` will become important later.
### Same as above, less typing (with `--config`)
Specifying the platform and toolchain for every target may become burdensome,
so they can be put used via `--config`. For example, append this to `.bazelrc`:
```
build:linux_arm64 --platforms @zig_sdk//platform:linux_arm64
build:linux_arm64 --extra_toolchains @zig_sdk//toolchain:linux_arm64_musl
```
And then building to linux-arm64-musl boils down to:
```
bazel build --config=linux_arm64_musl //test/go:go
```
## Use case: always compile with zig cc
Instead of adding the toolchains to `.bazelrc`, they can be added
unconditionally. Append this to `WORKSPACE` after `zig_toolchains(...)`:
Replace the call to `zig_register_toolchains` with
```
register_toolchains(
"@zig_sdk//toolchain:linux_amd64_gnu.2.19",
@ -82,100 +160,93 @@ register_toolchains(
)
```
The snippets above will download the zig toolchain and register it for the
following configurations:
Append this to `.bazelrc`:
- `toolchain:linux_amd64_gnu.2.19` for `["@platforms//os:linux", "@platforms//cpu:x86_64", "@zig_sdk//libc:unconstrained"]`.
- `toolchain:linux_arm64_gnu.2.28` for `["@platforms//os:linux", "@platforms//cpu:aarch64", "@zig_sdk//libc:unconstrained"]`.
- `toolchain:darwin_arm64` for `["@platforms//os:macos", "@platforms//cpu:x86_64"]`.
- `toolchain:darwin_arm64` for `["@platforms//os:macos", "@platforms//cpu:aarch64"]`.
> ### Naming
>
> Both Go and Bazel naming schemes are accepted. For convenience with
> Go, the following Go-style toolchain aliases are created:
>
> |Bazel (zig) name | Go name |
> |---------------- | -------- |
> |`x86_64` | `amd64` |
> |`aarch64` | `arm64` |
> |`macos` | `darwin` |
>
> For example, the toolchain `linux_amd64_gnu.2.28` is aliased to
> `x86_64-linux-gnu.2.28`. To find out which toolchains can be registered or
> used, run:
>
> ```
> $ bazel query @zig_sdk//toolchain/...
> ```
> ### Disabling the default bazel cc toolchain
>
> It may be useful to disable the default toolchain that bazel configures for
> you, so that configuration issues can be caught early on:
>
> .bazelrc
> ```
> build:zig_cc --action_env BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1
> ```
>
> This is not documented in bazel, so use at your own peril.
## I want to start using the zig cc toolchain gradually
You can register your zig cc toolchains under a config in your .bazelrc
```
build:zig_cc --extra_toolchains @zig_sdk//toolchain:linux_amd64_gnu.2.19
build:zig_cc --extra_toolchains @zig_sdk//toolchain:linux_arm64_gnu.2.28
build:zig_cc --extra_toolchains @zig_sdk//toolchain:darwin_amd64
build:zig_cc --extra_toolchains @zig_sdk//toolchain:darwin_arm64
build --action_env BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1
```
Then for your builds/tests you need to specify that the `zig_cc` config
should be used:
From Bazel's perspective, this is almost equivalent to always specifying
`--extra_toolchains` on every `bazel <...>` command-line invocation. It also
means there is no way to disable the toolchain with the command line. This is
useful if you find bazel-zig-cc useful enough to compile for all of your
targets and tools.
With `BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1` Bazel stops detecting the default
host toolchain. Configuring toolchains is complicated enough, and the
auto-detection (read: fallback to non-hermetic toolchain) is a footgun best
avoided. This option is not documented in bazel, so may break. If you intend to
use the hermetic toolchain exclusively, it won't hurt.
## Use case: zig-cc for targets for multiple libc variants
When some targets need to be build with different libcs (either different
versions of glibc or musl), use a linux toolchain from
`@zig_sdk//libc_aware/toolchains:<...>`. The toolchain will only be selected
when building for a specific libc. For example, in `WORKSPACE`:
```
bazel build --config zig_cc //test/go:go
register_toolchains(
"@zig_sdk//libc_aware/toolchain:linux_amd64_gnu.2.19",
"@zig_sdk//libc_aware/toolchain:linux_amd64_gnu.2.28",
"@zig_sdk//libc_aware/toolchain:x86_64-linux-musl",
)
```
You can build a target for a different platform like so:
What does `@zig_sdk//libc_aware/toolchain:linux_amd64_gnu.2.19` mean?
```
bazel build --config zig_cc \
--platforms @zig_sdk//platform:linux_arm64 \
//test/go:go
$ bazel query --output=build @zig_sdk//libc_aware/toolchain:linux_amd64_gnu.2.19 |& grep target
target_compatible_with = ["@platforms//os:linux", "@platforms//cpu:x86_64", "@zig_sdk//libc:gnu.2.19"],
```
## I want to use zig to build targets for multiple libc variants
To see how this relates to the platform:
If you have targets that need to be build with different glibc versions or with
musl, you can register a linux toolchain declared under `libc_aware/toolchains`.
It will only be selected when building for a specific libc version. For example
- `libc_aware/toolchain:linux_amd64_gnu.2.19` for `["@platforms//os:linux", "@platforms//cpu:x86_64", "@zig_sdk//libc:gnu.2.19"]`.
- `libc_aware/toolchain:linux_amd64_gnu.2.28` for `["@platforms//os:linux", "@platforms//cpu:x86_64", "@zig_sdk//libc:gnu.2.28"]`.
- `libc_aware/toolchain:x86_64-linux-musl` for `["@platforms//os:linux", "@platforms//cpu:x86_64", "@zig_sdk//libc:musl"]`.
With these toolchains registered, you can build a project for a specific libc
aware platform:
```
$ bazel build --platforms @zig_sdk//libc_aware/platform:linux_amd64_gnu.2.19 //test/go:go
$ bazel build --platforms @zig_sdk//libc_aware/platform:linux_amd64_gnu.2.28 //test/go:go
$ bazel build --platforms @zig_sdk//libc_aware/platform:linux_amd64_musl //test/go:go
$ bazel query --output=build @zig_sdk//libc_aware/platform:linux_amd64_gnu.2.19 |& grep constraint
constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64", "@zig_sdk//libc:gnu.2.19"],
```
You can see the list of libc aware toolchains and platforms by running:
In this case, the platform's `constraint_values` and toolchain's
`target_compatible_with` are identical, causing Bazel to select the right
toolchain for the requested platform. With these toolchains registered, one can
build a project for a specific libc-aware platform; it will select the
appropriate toolchain:
```
$ bazel run --platforms @zig_sdk//libc_aware/platform:linux_amd64_gnu.2.19 //test/c:which_libc
glibc_2.19
$ bazel run --platforms @zig_sdk//libc_aware/platform:linux_amd64_gnu.2.28 //test/c:which_libc
glibc_2.28
$ bazel run --platforms @zig_sdk//libc_aware/platform:linux_amd64_musl //test/c:which_libc
non_glibc
$ bazel run --run_under=file --platforms @zig_sdk//libc_aware/platform:linux_arm64_gnu.2.28 //test/c:which_libc
which_libc: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 2.0.0, stripped
```
To the list of libc aware toolchains and platforms:
```
$ bazel query @zig_sdk//libc_aware/toolchain/...
$ bazel query @zig_sdk//libc_aware/platform/...
```
This is especially useful if you are relying on [transitions][transitions], as
transitioning `extra_platforms` will cause your host tools to be rebuilt with
the specific libc version, which takes time, and your host may not be able to
run them.
Libc-aware toolchains are especially useful when relying on
[transitions][transitions], as transitioning `extra_platforms` will cause the
host tools to be rebuilt with the specific libc version, which takes time; also
the build host may not be able to run them if, say, target glibc version is
newer than on the host. Some tests in this repository (under `test/`) are using
transitions; you may check out how it's done.
The `@zig_sdk//libc:variant` constraint is necessary to select a matching
toolchain. Remember: the toolchain's `target_compatible_with` must be
equivalent or a superset of the platform's `constraint_values`. This is why
both libc-aware platforms and libc-aware toolchains reside in their own
namespace; if we try to mix non-libc-aware to libc-aware, confusion ensues.
To use the libc constraints in the project's platform definitions, add a
`@zig_sdk//libc:variant` constraint to them. See the list of available values:
The `@zig_sdk//libc:variant` constraint is used to select a matching toolchain.
If you are using your own platform definitions, add a `@zig_sdk//libc:variant`
constraint to them. See the list of available values:
```
$ bazel query "attr(constraint_setting, @zig_sdk//libc:variant, @zig_sdk//...)"
```
@ -183,8 +254,28 @@ $ bazel query "attr(constraint_setting, @zig_sdk//libc:variant, @zig_sdk//...)"
`@zig_sdk//libc:unconstrained` is a special value that indicates that no value
for the constraint is specified. The non libc aware linux toolchains are only
compatible with this value to prevent accidental silent fallthrough to them.
This is a guardrail. Thanks, future me!
# UBSAN and "SIGILL: Illegal Instruction"
# Note: Naming
Both Go and Bazel naming schemes are accepted. For convenience with
Go, the following Go-style toolchain aliases are created:
|Bazel (zig) name | Go name |
|---------------- | -------- |
|`x86_64` | `amd64` |
|`aarch64` | `arm64` |
|`macos` | `darwin` |
For example, the toolchain `linux_amd64_gnu.2.28` is aliased to
`x86_64-linux-gnu.2.28`. To find out which toolchains can be registered or
used, run:
```
$ bazel query @zig_sdk//toolchain/...
```
# Note: UBSAN and "SIGILL: Illegal Instruction"
`zig cc` differs from "mainstream" compilers by [enabling UBSAN by
default][ubsan1]. Which means your program may compile successfully and crash
@ -197,6 +288,7 @@ SIGILL: illegal instruction
This is by design: it encourages program authors to fix the undefined behavior.
There are [many ways][ubsan2] to find the undefined behavior.
# Known Issues In bazel-zig-cc
These are the things you may stumble into when using bazel-zig-cc. I am

View File

@ -50,8 +50,17 @@ load(
zig_toolchains()
register_toolchains(
# if no `--platform` is selected, these toolchains will be used.
"@zig_sdk//toolchain:linux_amd64_gnu.2.19",
"@zig_sdk//toolchain:linux_arm64_gnu.2.28",
"@zig_sdk//toolchain:darwin_amd64",
"@zig_sdk//toolchain:darwin_arm64",
# when a libc-aware platform is selected, these will be used. arm64:
"@zig_sdk//libc_aware/toolchain:linux_arm64_gnu.2.28",
"@zig_sdk//libc_aware/toolchain:linux_arm64_musl",
# ditto, amd64:
"@zig_sdk//libc_aware/toolchain:linux_amd64_gnu.2.19",
"@zig_sdk//libc_aware/toolchain:linux_amd64_gnu.2.28",
"@zig_sdk//libc_aware/toolchain:linux_amd64_musl",
)

View File

@ -3,9 +3,9 @@
int main() {
#ifdef __GLIBC__
printf("glibc_%d.%d", __GLIBC__, __GLIBC_MINOR__);
printf("glibc_%d.%d\n", __GLIBC__, __GLIBC_MINOR__);
#else
puts("non-glibc");
printf("non-glibc\n");
#endif
return 0;
}

View File

@ -28,7 +28,7 @@ _GLIBCS = [
"2.34",
]
LIBCS = ["musl", "gnu"] + ["gnu.{}".format(glibc) for glibc in _GLIBCS]
LIBCS = ["musl"] + ["gnu.{}".format(glibc) for glibc in _GLIBCS]
def target_structs():
ret = []