Tfw your kernel makes your linker print to STDERR - Gentoo, Mold, and AArch64

2023-02-10

We’ve been passively experimenting with the mold linker for the past six months or so. We’ve got a Quartz64 that hasn’t needed to do much of anything, and because it’s only got 4 efficiency cores, the linker runs long enough that we can actually watch it and see how it’s doing. We’re not really sure if it’s been worth using, and haven’t done any scientific comparisons on it. That said, here’s how we have it set up.

AArch64 Weirdness

There’s a strange interaction between the default kernel config for aarch64 and mold, or more precisely mimalloc, the allocator mold uses. When I say default, I mean downloaded straight from kernel.org. Every time mold ran, it’d print this to stderr: “unable to allocate aligned OS memory directly, fall back to over-allocation”.

Now what does this have to do with the kernel? Well, looking at the mimalloc code we can find this comment:

on 64-bit systems, use the virtual address area after 2TiB for 4MiB aligned allocations

So for 4MiB-aligned allocations, it’s trying to use virtual memory above the 2TiB mark, which is 2^41 bytes. Looking at the kernel config, I found it set to CONFIG_ARM64_VA_BITS_39. That means every process only gets a 39-bit virtual address space, or 512GiB, so the addresses mimalloc wants are out of reach. 39 is famously known for being less than 41. And what do you know, I set it to CONFIG_ARM64_VA_BITS_48 and the problem went away.
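The arithmetic is easy to sanity-check in a shell. This is only a sketch: hint_fits and its two parameters are names made up for the example, not anything from mimalloc or the kernel.

```shell
#!/bin/sh
# Does an address hint at `hint_tib` TiB fall inside a `va_bits`-bit address space?
# (Both names are hypothetical; this only mirrors the 39-vs-41 comparison above.)
hint_fits() {
    va_bits=$1
    hint_tib=$2
    limit=$((1 << $va_bits))       # size of the address space in bytes
    hint=$(($hint_tib << 40))      # 1 TiB = 2^40 bytes
    [ "$hint" -lt "$limit" ]
}

for bits in 39 48; do
    if hint_fits "$bits" 2; then
        echo "VA_BITS=$bits: 2 TiB hint fits"
    else
        echo "VA_BITS=$bits: 2 TiB hint is out of range"
    fi
done
```

On a running system, something like `zgrep CONFIG_ARM64_VA_BITS /proc/config.gz` should show the current setting, assuming the kernel was built with CONFIG_IKCONFIG_PROC.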

This actually caused at least one package (a couple, I think) to fail to build, because they weren’t expecting the linker to start spuriously printing memory allocation complaints to stderr.
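As a rough illustration of how a harmless warning can break a build, here is a hypothetical configure-style probe that treats any stderr output from the link step as a failure. It is an assumption about the general shape of such checks, not the actual check from any of the affected packages; link_ok, quiet_linker and noisy_linker are made-up names.

```shell
#!/bin/sh
# Hypothetical probe: run a command, succeed only if it exits 0 AND prints nothing to stderr.
link_ok() {
    # Capture stderr only: point stderr at the capture pipe, then discard stdout.
    err=$("$@" 2>&1 >/dev/null)
    status=$?
    [ "$status" -eq 0 ] && [ -z "$err" ]
}

# Stand-ins for a linker invocation; neither one links anything.
quiet_linker() { return 0; }
noisy_linker() {
    echo "unable to allocate aligned OS memory directly, fall back to over-allocation" >&2
    return 0
}

link_ok quiet_linker && echo "quiet: accepted"
link_ok noisy_linker || echo "noisy: rejected despite exit status 0"
```

The second call is the situation described above: the link succeeded, but the extra line on stderr is enough for a strict check to give up.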

Gentoo Config

Aside from using mold, we’ve got one other weird thing going on: we’re also using clang with ThinLTO for as many packages as possible. Not everything works with that, so sometimes we have to fall back to gcc. That’s not news to anyone who does this sort of thing.

So first off, the relevant make.conf lines:

# I'm not really clear on whether omit-frame-pointer is default-on in clang yet for O2+
COMMON_FLAGS="-O3 -mcpu=cortex-a55 -fomit-frame-pointer -pipe -fPIC"
CC="clang"
CXX="clang++"
AR="llvm-ar"
NM="llvm-nm"
RANLIB="llvm-ranlib"

# Save the default LDFLAGS so we can restore them for builds that are broken
OLDLDFLAGS="${LDFLAGS}"
# I don't think -Wl,-O2 does anything here?
LDFLAGS="${LDFLAGS} -fuse-ld=mold -rtlib=compiler-rt -unwindlib=libunwind -Wl,-O2 -Wl,--as-needed"

CFLAGS="${COMMON_FLAGS} -flto=thin"
CXXFLAGS="${COMMON_FLAGS} -flto=thin"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"

There are some other fun flags that might be worth passing to mold (read the man page), but I’m not using any of them.
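To confirm that a binary really was linked by mold, the mold README suggests looking for its identification string in the .comment section. A minimal sketch, assuming readelf is installed; linked_by_mold is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the .comment section of the ELF file in $1 mentions mold.
# Returns failure for non-ELF files, or if readelf is not available.
linked_by_mold() {
    readelf -p .comment "$1" 2>/dev/null | grep -q mold
}

# Check the file given on the command line, if there is one.
if [ -n "${1:-}" ]; then
    if linked_by_mold "$1"; then
        echo "$1: linked by mold"
    else
        echo "$1: no mold string found"
    fi
fi
```

Saved as a script (the name is up to you), it would be run against any freshly built binary or shared library.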

Ok so now we need a couple fallbacks for the things that break.

First, the obligatory “compiler-gcc” env:

CC="gcc"
CXX="g++"
AR="${CHOST}-ar"
NM="${CHOST}-nm"
RANLIB="${CHOST}-ranlib"

COMMON_FLAGS="-O2 -mcpu=cortex-a55 -fomit-frame-pointer -pipe"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"
LDFLAGS="${OLDLDFLAGS} -B/usr/libexec/mold"

Notice this one is still using mold. I haven’t run into any packages yet that need the combination of GCC without mold. Everything I’ve compiled has worked with one of:

clang + mold
clang + lld
gcc + mold

To that end, here’s clang-without-mold:

LDFLAGS="${OLDLDFLAGS} -fuse-ld=lld -rtlib=compiler-rt -unwindlib=libunwind -Wl,-O2 -Wl,--as-needed"

Now, here are the exceptions I have for those two. First, the things I have using gcc:

dev-libs/boost compiler-gcc # everything seems to claim that this is supposed to work so im not sure why its not
sys-devel/gcc compiler-gcc
sci-libs/fftw compiler-gcc # broken for uhhh reasons? this could use clang, its just passing flto=thin to fortran for some reason. also doesnt like mold flags because it uses gnu ld regardless
media-libs/rubberband compiler-gcc # depends on boost, mangling is wrong with clang
dev-util/systemtap compiler-gcc # tapsets.cxx:68:17: error: expected namespace name using namespace __gnu_cxx;
sys-apps/plocate compiler-gcc
dev-java/snappy compiler-gcc # complains about linking libc++ after building with -fPIC, maybe we need to rebuild libc++ with fPIC, but nothing else has complained so idk
sys-devel/binutils compiler-gcc # need to use gcc AR for pgo/lto
games-emulation/mgba compiler-gcc

And here’s what I have using clang with lld:

dev-lang/ruby clang-without-mold    # configure: error: something wrong with LDFLAGS="-Wl,-O1 -Wl,--as-needed -fuse-ld=mold -rtlib=compiler-rt -unwindlib=libunwind -Wl,-O2 -Wl,--as-needed"
x11-libs/cairo clang-without-mold    # cairo can't link with pthread for some reason. i saw imagemagick do it just fine though.
app-emulation/qemu clang-without-mold # sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.
dev-libs/libtomcrypt clang-without-mold # links a file with $CC, then links another file with gcc. ????? might be able to fix with an env that uses -B instead of -fuse-ld for clang too
sys-libs/compiler-rt clang-without-mold


dev-util/cmake clang-without-mold # The C++ compiler does not support C++11 (e.g.  std::unique_ptr).
media-libs/lcms clang-without-mold
net-dns/bind-tools clang-without-mold
dev-util/glslang clang-without-mold
media-libs/mesa clang-without-mold # silent error
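For anyone who hasn’t used per-package environments before: Portage reads the override files from /etc/portage/env/ and maps packages to them in /etc/portage/package.env. The sketch below lays out that structure in a scratch directory so it can be run without touching a real system. DEMO_ROOT is a made-up variable for the example, and only one env file and two package lines are shown.

```shell
#!/bin/sh
# Sketch of the file layout. DEMO_ROOT is hypothetical; on a real system the
# files live directly under /etc/portage.
DEMO_ROOT=${DEMO_ROOT:-$(mktemp -d)}
mkdir -p "$DEMO_ROOT/etc/portage/env"

# One file per environment; the file name is what package.env refers to.
cat > "$DEMO_ROOT/etc/portage/env/clang-without-mold" <<'EOF'
LDFLAGS="${OLDLDFLAGS} -fuse-ld=lld -rtlib=compiler-rt -unwindlib=libunwind -Wl,-O2 -Wl,--as-needed"
EOF

# package.env maps a package to the env file(s) applied when building it.
cat > "$DEMO_ROOT/etc/portage/package.env" <<'EOF'
dev-lang/ruby clang-without-mold
x11-libs/cairo clang-without-mold
EOF

echo "wrote example config under $DEMO_ROOT/etc/portage"
```

The compiler-gcc env would go in the same env/ directory under that name.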

A few caveats here. First, I compiled some of these packages a number of mold releases ago, and they might work now. Second, some of the things I’ve switched to GCC might actually be working around LTO-related bugs rather than clang-related bugs. I don’t have a clang-without-thinlto environment, because I haven’t felt like setting one up and adding another variation to try out. Bear that in mind, though: I highly suspect that turning off ThinLTO would have fixed a number of them.
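A clang-without-thinlto env could look roughly like the following. This is an untested sketch: the file name is an assumption, as is the choice to override only CFLAGS and CXXFLAGS. The flags are copied from COMMON_FLAGS in the make.conf above, minus -flto=thin.

```shell
# Hypothetical /etc/portage/env/clang-without-thinlto (untested sketch).
# Same compiler flags as make.conf, without -flto=thin.
# CC, CXX and LDFLAGS are not overridden, so this would still be clang + mold.
CFLAGS="-O3 -mcpu=cortex-a55 -fomit-frame-pointer -pipe -fPIC"
CXXFLAGS="-O3 -mcpu=cortex-a55 -fomit-frame-pointer -pipe -fPIC"
```

Moving a package from compiler-gcc to this env and rebuilding would be one way to tell an LTO problem apart from a clang problem.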

Compilation Performance

Honestly, the speedup isn’t enough that I’d do it again. I don’t think mold is to blame here; it’s more just the nature of LTO, as far as I can tell. mold will initially use all cores, but for almost everything it very quickly drops to a single core for a very long time. This feels very much like the LTO step to me, and for now mold is at the whim of LLVM’s LTO plugin in that regard. But if you’re into weird toolchains, or you’re not using LTO, I’d say maybe give it a go.