cross-compile with NixOS and deploy that shit continuously

Alright the premise of this one is way simpler than how we got there. We’ve got a raspberry pi 2, and we wanted to set it up to do some system monitoring. Pretty simple stuff ultimately: it’s got an FTDI serial adapter and an ST-Link both plugged in over USB to monitor a long computer that we’re doing work on right now. It’s also got an ethernet connection to that computer for netbooting, and then it’s bridging that connection to the rest of our network over a second ethernet port. We could get most of the way to what we wanted with Alpine Linux or even raspbian, but we’re running humility to do the ST-Link side of things, and there’s no way in hell I’m waiting around for that to compile on a raspi2. So, cross compile right? Yeah, but cross compiling sucks. Unless NixOS can save us? Turns out it can.

Look I’m not really a NixOS girl usually, but I’ve been coming around to it. Some of my roommates are really selling me on it lately, and I’ve been using it for creating x86 live ISOs, so I figured that it had a shot of being good here. It’s uh. Well it’s good once you get there, but very little of what I’m doing is directly documented (main reason I’m writing this, to teach that knowledge forward). And the interaction between cross-compilation and nix flakes still kinda sucks. We’ll be grappling with that a few times in this post. It’s worth it though, it’s way better than dealing with cross toolchains directly.

Anyways due to that lack of documentation and my inexperience with NixOS this would not be at all possible without help from Xe / open skies / ckie. They did most of telling me what to look at and figuring out how to get things to work. I just put the pieces together.

Ok let’s get on with it.

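There’s a few discrete things we want to do here: cross-compile humility, build a bootable raspi image, and deploy changes to the pi without re-flashing it.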

Humility

Let’s start with humility. This is a little unintuitive. Here’s a flake.nix that I dropped into the humility repo at some random commit that happened to be on main at the time:

{
  description = "debugger for Hubris";

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-23.05";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils }:
    let system = flake-utils.lib.system;
    in flake-utils.lib.eachSystem [
      system.x86_64-linux
      system.aarch64-linux
      system.armv7l-linux
    ] (system:
      let
        pkgs = nixpkgs.legacyPackages.${system};
        build-humility = (pkgs:
          pkgs.rustPlatform.buildRustPackage {
            pname = "humility";
            version = "20230526";

            src = ./.;

            cargoSha256 = "sha256-+2JAuY6zQkepLrbRKII6rOUJYQw6Psq92fIiE0Gm1Ns=";
            buildInputs = [ pkgs.libudev-zero ];
            nativeBuildInputs = [ pkgs.pkg-config pkgs.cargo-readme ];

            meta = with pkgs.lib; {
              description = "debugger for Hubris";
              homepage = "https://github.com/oxidecomputer/humility";
              license = licenses.mpl20;
              mainProgram = "humility";
            };
          });
      in rec {
        packages = rec {
          humility = build-humility pkgs;
          humility-cross-armv7l-linux =
            build-humility pkgs.pkgsCross.armv7l-hf-multiplatform;
          humility-cross-aarch64-linux =
            build-humility pkgs.pkgsCross.aarch64-multiplatform;
        };

        defaultPackage = packages.humility;

      });

}

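Ok so remember, we want to be able to do a native compilation and a cross compilation. Because we’re cross compiling we need to think about the distinction between the host system (the thing running the compiler) and the target system (the thing running the code). Heads up: nixpkgs’ own docs call these the “build” platform and the “host” platform respectively. I’m sticking with host and target in this post.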

I’m using flake-utils.lib.eachSystem to iterate over possible host systems. Realistically I should just list every <arch>-linux combo here so someone using my flake could attempt to compile this from any host. I mean really there’s no reason not to just pass it flake-utils.lib.allSystems I guess, if it fails it fails. But yeah, the point is, we’re iterating over all possible host systems.

Next up we have build-humility. This is a function which takes in some version of nixpkgs and defines a build of humility for that nixpkgs. That’s confusing until I explain how we’re using it.

So down below we have this in the packages section:

humility = build-humility pkgs;
humility-cross-armv7l-linux =
  build-humility pkgs.pkgsCross.armv7l-hf-multiplatform;
humility-cross-aarch64-linux =
  build-humility pkgs.pkgsCross.aarch64-multiplatform;

humility here defines a build where the target is the same as the host. So like. You’re on an x86_64 computer, or an aarch64 computer or whatever. you want to just compile and use humility. You use the humility package. It compiles something you can run on the system you’re on right now.

Next we have humility-cross-armv7l-linux which defines a build where the target is armv7l-linux. We use pkgs.pkgsCross.armv7l-hf-multiplatform which gives us an alternate view into nixpkgs where every build is defined as being cross compiled instead of native compiled. The host is whatever system we happen to be running on; it’s any of the systems we passed in to flake-utils.lib.eachSystem up above. So like in my flake here it could be an x86_64 or an aarch64 system, or I could go add riscv or powerpc to the list of potential build hosts if I was feeling ambitious. Really most things should work.

Then we’ve got humility-cross-aarch64-linux. Same thing as the armv7l-linux one, but now we’re targeting aarch64 from whatever our host is. There’s probably some way to iterate over all possible targets to make this better than just listing them out one by one.
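Here’s one way that iteration could look. This is a sketch I haven’t run against this exact flake, so treat it as a starting point: it swaps out the packages attribute from the flake above, and crossTargets is a name I made up for illustration (it’s not part of nixpkgs or flake-utils). build-humility and pkgs are the same ones defined in the flake.

```nix
# Sketch: replacement for the `packages = rec { ... };` attribute in the
# flake above. `crossTargets` is a hypothetical helper name.
packages =
  let
    # package-name suffix -> attribute name under pkgs.pkgsCross
    crossTargets = {
      "armv7l-linux" = "armv7l-hf-multiplatform";
      "aarch64-linux" = "aarch64-multiplatform";
    };
  in
  {
    # native build: target is the same as the host
    humility = build-humility pkgs;
  }
  # one humility-cross-<target> package per entry in crossTargets
  // pkgs.lib.mapAttrs'
    (target: crossName:
      pkgs.lib.nameValuePair
        "humility-cross-${target}"
        (build-humility pkgs.pkgsCross.${crossName}))
    crossTargets;
```

Adding another target is then one more line in crossTargets, as long as pkgsCross has an attribute for it.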

This is pretty cool. You can run

nix build .#humility-cross-armv7l-linux

and it will build the entire dependency chain and then build humility! This will take kind of a while your first time doing it unless you have a lot of computer because when I say it builds the entire dependency chain I mean the entire dependency chain, no binary cache available. This is the downside to using pkgsCross, and so you’ll have some bootstrapping overhead from this. In the case of armv7l that’s hardly a downside though because armv7l doesn’t have an official binary cache anyway and we didn’t feel like trying to figure out how to use a community one.

Why not qemu-user

You may have run into an alternative way to do cross-compilations with flakes wherein you build via for example .#defaultPackage.armv7l-linux. This works very differently: instead of actually cross compiling, it does a “native compile”, but emulates the armv7l instruction set in userspace using qemu-user. This is really cool, and we’ll actually be using qemu-user emulation later in this post, but it’s also slow as dirt because you’re emulating the entire compiler. rustc is slow enough as it is, it doesn’t need help being slower.

Plus, since there’s no binary cache for armv7l, we’d have to build the entire dependency tree this way. That would take me like days. or weeks. I dunno.

Still, in some complex situations, cross-compilation doesn’t work, and your options will lie between qemu-user emulation, full system emulation in a VM, or trying to debug/fix the cross-comp up your dependency tree. In that case, pick your poison.
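If you do end up going the qemu-user route, the build machine has to know how to run armv7l binaries first. A minimal sketch, assuming your build machine itself runs NixOS (on other distros you’d have to set up binfmt_misc and nix’s extra platforms some other way):

```nix
# configuration.nix on the *build machine* (assumed to be NixOS), not the pi
{
  # register qemu-user as the handler for armv7l binaries so that
  # armv7l-linux derivations can be built on this machine under emulation
  boot.binfmt.emulatedSystems = [ "armv7l-linux" ];
}
```

This is the same option we set on the pi later in this post, just pointed in the other direction.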

Bootable raspi image

Time for another flake. I’ll give you the minimal flake that gets something booting and then we’ll go from there:

{
  description = "Build image";
  # update to whatever version
  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-23.05";

  outputs = { self, nixpkgs }: rec {
    nixosConfigurations.vulpix =
      nixpkgs.legacyPackages.x86_64-linux.pkgsCross.armv7l-hf-multiplatform.nixos {

        imports = [
          "${nixpkgs}/nixos/modules/installer/sd-card/sd-image-armv7l-multiplatform.nix"
          nixosModules.vulpix
        ];
      };
    images.vulpix = nixosConfigurations.vulpix.config.system.build.sdImage;

    nixosModules.vulpix = ({ lib, config, pkgs, ... }: {
      environment.systemPackages = with pkgs; [
        neofetch
      ];

      services.openssh.enable = true;

      users.users.root.openssh.authorizedKeys.keys = [
        "ssh-ed25519 AAAAAAAAAAAAAAAAAAsdfgjgkly idk i dont speak bottom"
      ];

      networking.hostName = "vulpix";
    });
  };
}

Ok so you’d put a real SSH key in there, but this is enough to get a built image. Run

nix build .#images.vulpix

and it’ll cross-compile a shitload of packages and a mainline linux kernel and load it up into result/sd-image/somethingorother.img.zst. Do a

zstdcat <that file.img.zst> | sudo dd of=/dev/sdWhatever bs=4M status=progress oflag=direct

and you will have a bootable SD card. It even outputs u-boot and kernel spew to the serial console! fuck yeah. The thing to pay attention to here is nixpkgs.legacyPackages.x86_64-linux.pkgsCross.armv7l-hf-multiplatform.nixos. We’re using this to actually cross-compile everything, similar to how we used pkgsCross in the humility flake.

Here I’ve hardcoded the host system to x86_64-linux because I don’t really care about trying to build this thing on other hosts right now, but we could do the same trick as with humility, using flake-utils to make it generic across multiple host builder architectures. Truthfully I just don’t feel like making that change and re-testing it before finishing this blog post.
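For what it’s worth, here’s roughly what I think that change would look like without pulling in flake-utils. This is untested, same as I said above. hostSystems, mkVulpix and the vulpix-image package name are all names I made up for this sketch, and the module body is cut down to one line to keep it short.

```nix
{
  description = "Build image";
  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-23.05";

  outputs = { self, nixpkgs }:
    let
      # build hosts we are willing to try; this list is an assumption
      hostSystems = [ "x86_64-linux" "aarch64-linux" ];

      # hypothetical helper: the vulpix system, cross-compiled from hostSystem
      mkVulpix = hostSystem:
        nixpkgs.legacyPackages.${hostSystem}.pkgsCross.armv7l-hf-multiplatform.nixos {
          imports = [
            "${nixpkgs}/nixos/modules/installer/sd-card/sd-image-armv7l-multiplatform.nix"
            self.nixosModules.vulpix
          ];
        };
    in {
      # exposed as packages.<host>.vulpix-image, so `nix build .#vulpix-image`
      # resolves to whichever host you are building on
      packages = nixpkgs.lib.genAttrs hostSystems (hostSystem: {
        vulpix-image = (mkVulpix hostSystem).config.system.build.sdImage;
      });

      nixosModules.vulpix = { lib, config, pkgs, ... }: {
        # same module body as in the flake above
        networking.hostName = "vulpix";
      };
    };
}
```

The rest of this post sticks with the hardcoded x86_64-linux version, since that’s the one we actually ran.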

dealing with pi bullshit

For the pi, I need you to hold the fuck up and maybe don’t use the config I just gave you. Mainline kernel might work for you, but for us the st-link would just NOT work on mainline on the pi. I don’t know why. It was causing libusb error spam and breaking shit, so we needed to use the raspi vendor kernel.

BUT! Using the vendor kernel is different in some other exciting ways.

First off, out of the box you’ll get this error somewhere in the steps to building the SD image:

modprobe: FATAL: Module ahci not found in directory /nix/store/gl48ccw2i45p80bkr43fpqpqi3xxw93v-linux-armv7l-unknown-linux-gnueabihf-6.1.21-1.20230405-modules/lib/modules/6.1.21

exciting right? There’s a workaround that we found on github: an overlay that makes makeModulesClosure tolerate missing modules. It’s in the config below.

Ok so the other issue is there’s some kernel bug I don’t understand that caused one or both of the ethernet adapters to fail out and not come up. We found this thread about a similar thing that suggested setting coherent_pool=4M in the kernel parameters. We tried that and it worked so. lol. lmao i guess. whatever.

{
  description = "Build image";
  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-23.05";

  outputs = { self, nixpkgs }: rec {
    nixosConfigurations.vulpix =
      nixpkgs.legacyPackages.x86_64-linux.pkgsCross.armv7l-hf-multiplatform.nixos {

        imports = [
          "${nixpkgs}/nixos/modules/installer/sd-card/sd-image-armv7l-multiplatform.nix"
          nixosModules.vulpix
        ];
      };
    images.vulpix = nixosConfigurations.vulpix.config.system.build.sdImage;

    nixosModules.vulpix = ({ lib, config, pkgs, ... }: {
      environment.systemPackages = with pkgs; [
        neofetch
      ];

      services.openssh.enable = true;

      # deal with that "module ahci not found" error
      nixpkgs.overlays = [
        (final: super: {
          makeModulesClosure = x:
            super.makeModulesClosure (x // { allowMissing = true; });
        })
      ];

      users.users.root.openssh.authorizedKeys.keys = [
        "ssh-ed25519 AAAAAAAAAAAAAAAAAAsdfgjgkly idk i dont speak bottom"
      ];

      networking.hostName = "vulpix";

      # good luck
      # needed for the stlink to work
      boot.kernelPackages = lib.mkForce pkgs.linuxKernel.packages.linux_rpi2;

      # if you don't have this and you have 2 network devices plugged in
      # with the rpi kernel then networking breaks due to kernel bugs. lol.
      boot.kernelParams = [ "coherent_pool=4M" ];
    });
  };
}

Also don’t get me wrong, as annoying as the pi stuff is, this is still shockingly painless compared to what dealing with this sort of problem often looks like with other distros/distro builders.

let’s add humility

Adding humility from the flake we defined previously is easy. We just add that flake as an input, and then add humility.packages.x86_64-linux.humility-cross-armv7l-linux to the environment.systemPackages. I don’t want to drop another full copy of the config with that change, and I want to leave these configs copy-pastable for your personal use, so if you need to see a full example config with humility imported, click this link for flake-with-humility.nix. Again, that x86_64-linux could be made generic across multiple build host architectures, but I didn’t bother.
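In case the shape of that change isn’t obvious, here’s a cut-down sketch of just the parts that differ. The input URL is an assumption: it’s a placeholder path for wherever your humility checkout (with the flake.nix from earlier dropped in) lives, so point it at yours.

```nix
{
  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-23.05";
  # assumption: placeholder path to a humility checkout that contains the
  # flake.nix from the Humility section
  inputs.humility.url = "path:/path/to/your/humility/checkout";

  outputs = { self, nixpkgs, humility }: rec {
    # nixosConfigurations.vulpix and images.vulpix stay exactly as before

    nixosModules.vulpix = ({ lib, config, pkgs, ... }: {
      environment.systemPackages = [
        pkgs.neofetch
        # built on the x86_64 host, runs on the armv7l pi
        humility.packages.x86_64-linux.humility-cross-armv7l-linux
      ];

      # rest of the module unchanged
    });
  };
}
```

I spelled out pkgs.neofetch instead of using with pkgs; here so it’s obvious that humility refers to the flake input and not to something from nixpkgs.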

Deploy Changes

I don’t want to pull the SD card out and re-flash it every time I make changes. It’s annoying, it wipes any persistent data I’ve put on there, it wastes write-cycles on the flash. There is a better way. We’re using deploy-rs because Xe recommended it to us, though we think there’s also something called “Morph” which fills a similar niche.

With deploy-rs, all you have to do is import it as an input and add a new deploy output to your flake, and then you can update the system on the fly by running nix run github:serokell/deploy-rs in the repo your flake is in. Or rather, that’s almost all you have to do. Here’s that section, see if you can spot the catch:

deploy.nodes.vulpix = {
  profiles.system = { 
    user = "root";
    path = deploy-rs.lib.x86_64-linux.activate.nixos nixosConfigurations.vulpix;
  };

  # this is how it ssh's into the target system to send packages/configs over.
  sshUser = "root";
  hostname = "host.of.the.system.that.it.should.ssh.into";
};

Yeah you see that x86_64-linux? That’s a binary that’s going to run on the target. Which is notably armv7l-linux for us. So… sigh, ok look, here’s where cross comp fails us. deploy-rs’s flake doesn’t support armv7l-linux, for no real reason other than the list it uses for supported systems doesn’t include it. We could fork the flake and add it, and then we could actually use armv7l-linux. But that will try and do the qemu-user compile which, as previously mentioned, is utter hell. If you’re targeting an aarch64 system from an x86_64 host maybe you don’t care because I think deploy-rs has a binary cache to cover you there. But in this case, we just left it as x86_64-linux, and took a different option, adding this to our raspi system config:

# needed for deploy-rs
boot.binfmt.emulatedSystems = [ "x86_64-linux" ];

Yes, we’re going to emulate the x86_64-linux binary on the 900MHz processor of the pi. This is actually fine because it doesn’t actually have to do much computationally, and it’s a rust binary so we’re at least emulating native code instead of like, the python interpreter. It’s genuinely not a problem to do this, I 100% recommend it.

At this point if you already flashed your SD card while following along, sorry, you’ll need to re-flash the SD card with the emulatedSystems change before you can start using deploy-rs. AFTER you do that, with a flake like the one below, you can start using deploy-rs to build new packages and send changes over.

{
  description = "Build image";
  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-23.05";
  inputs.deploy-rs.url = "github:serokell/deploy-rs";

  outputs = { self, nixpkgs, deploy-rs }: rec {
    nixosConfigurations.vulpix =
      nixpkgs.legacyPackages.x86_64-linux.pkgsCross.armv7l-hf-multiplatform.nixos {

        imports = [
          "${nixpkgs}/nixos/modules/installer/sd-card/sd-image-armv7l-multiplatform.nix"
          nixosModules.vulpix
        ];
      };
    images.vulpix = nixosConfigurations.vulpix.config.system.build.sdImage;

    nixosModules.vulpix = ({ lib, config, pkgs, ... }: {
      environment.systemPackages = with pkgs; [
        neofetch
      ];

      nixpkgs.overlays = [
        (final: super: {
          makeModulesClosure = x:
            super.makeModulesClosure (x // { allowMissing = true; });
        })
      ];

      services.openssh.enable = true;

      users.users.root.openssh.authorizedKeys.keys = [
        "ssh-ed25519 AAAAAAAAAAAAAAAAAAsdfgjgkly idk i dont speak bottom"
      ];

      networking.hostName = "vulpix";

      # needed for deploy-rs
      boot.binfmt.emulatedSystems = [ "x86_64-linux" ];

      # good luck
      # needed for the stlink to work
      boot.kernelPackages = lib.mkForce pkgs.linuxKernel.packages.linux_rpi2;

      # if you don't have this and you have 2 network devices plugged in
      # with the rpi kernel then networking breaks due to kernel bugs. lol.
      boot.kernelParams = [ "coherent_pool=4M" ];
    });

    
    deploy.nodes.vulpix = {
      profiles.system = { 
        user = "root";
        path = deploy-rs.lib.x86_64-linux.activate.nixos nixosConfigurations.vulpix;
      };

      # this is how it ssh's into the target system to send packages/configs over.
      sshUser = "root";
      hostname = "host.of.the.system.that.it.should.ssh.into";
    };
  };
}

Any time you change this, you just run nix run github:serokell/deploy-rs and your changes are delivered! Basically the same as if you were editing a configuration.nix on a normal NixOS system and doing nixos-rebuild switch or whatever it is (sorry if that’s wrong I don’t use NixOS on my desktop sorry).

The most kick-ass part of this is that it updates the boot configurations properly too. And, since we have a working u-boot console, we can actually choose which boot configuration to use at startup:

switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:2...
Found /boot/extlinux/extlinux.conf
Retrieving file: /boot/extlinux/extlinux.conf
------------------------------------------------------------
1:      NixOS - Default
2:      NixOS - Configuration 4 (2023-06-07 06:31 - 23.05pre-git)
3:      NixOS - Configuration 3 (2023-06-06 01:46 - 23.05pre-git)
4:      NixOS - Configuration 2 (2023-06-06 01:44 - 23.05pre-git)
5:      NixOS - Configuration 1 (1970-01-01 00:00 - 23.05pre-git)
Enter choice: 

It auto-boots into the latest one after a delay, but you can just go ahead and pick something else.

This saved my ass multiple times, both when trying to switch from the mainline kernel to the rpi kernel, and when I accidentally rearranged the network adapter to a different USB port and broke my network bridge configuration, because I was able to just boot back into a previously working config and then re-deploy a new configuration from there. No bullshit having to take the SD card out and chroot into it from my desktop or something to fix stuff.

So there you go. Cross compile shit to your tiny devices. Send new configs to them. Do it all without it being a royal pain and without it feeling like it’ll fall apart at any moment. I swear I’m not a Nix fan but this stuff is undeniably kinda cool.