Copying NixOS's Live CD to RAM: a short look at NixOS's early boot

2023-03-07

Due to Circumstances, I want to boot a NixOS live ISO in such a way that the storage medium can be removed after boot-up. This is at a technical level rather simple. NixOS live images store the Nix store in a SquashFS, and everything else is already in RAM via tmpfs anyway. So in theory all we need to do is copy the SquashFS to ram before it gets mounted. But that does raise the question: how is the SquashFS mounted in the first place? And how can we change that? If you were a reasonable person you would read nixos.wiki/wiki/Bootloader, and you would find your answer: pass copytoram in boot.kernelParams. I am not reasonable, and figured it out from the source code, which is what the remainder of this post is about.

Our first clue is in nixos/modules/installer/cd-dvd/iso-image.nix:

  # store them in lib so we can mkImageMediaOverride the
  # entire file system layout in installation media (only)
  config.lib.isoFileSystems = {
    "/" = mkImageMediaOverride
      {
        fsType = "tmpfs";
        options = [ "mode=0755" ];
      };

    # Note that /dev/root is a symlink to the actual root device
    # specified on the kernel command line, created in the stage 1
    # init script.
    "/iso" = mkImageMediaOverride
      { device = "/dev/root";
        neededForBoot = true;
        noCheck = true;
      };

    # In stage 1, mount a tmpfs on top of /nix/store (the squashfs
    # image) to make this a live CD.
    "/nix/.ro-store" = mkImageMediaOverride
      { fsType = "squashfs";
        device = "/iso/nix-store.squashfs";
        options = [ "loop" ];
        neededForBoot = true;
      };

    "/nix/.rw-store" = mkImageMediaOverride
      { fsType = "tmpfs";
        options = [ "mode=0755" ];
        neededForBoot = true;
      };

    "/nix/store" = mkImageMediaOverride
      { fsType = "overlay";
        device = "overlay";
        options = [
          "lowerdir=/nix/.ro-store"
          "upperdir=/nix/.rw-store/store"
          "workdir=/nix/.rw-store/work"
        ];
        depends = [
          "/nix/.ro-store"
          "/nix/.rw-store/store"
          "/nix/.rw-store/work"
        ];
      };
  };

/ will be a tmpfs. /iso will be the USB stick or DVD or whatever we’re booting from. The SquashFS inside of it gets mounted and combined with a second tmpfs to provide the Nix store from the boot media while alloying temporary in-memory additions. But how does this translate into the actual bootup process?

Let’s look at nixos/modules/system/boot/stage-1-init.sh. This is the script that gets installed as /init in the initramfs, so it’s the very first thing that gets executed during the bootup process. Here’s the section related to mounting our set of file systems, though, don’t bother reading this whole snippet. I’ll highlight the important things after.

exec 3< @fsInfo@

while read -u 3 mountPoint; do
    read -u 3 device
    read -u 3 fsType
    read -u 3 options

    # !!! Really quick hack to support bind mounts, i.e., where the
    # "device" should be taken relative to /mnt-root, not /.  Assume
    # that every device that starts with / but doesn't start with /dev
    # is a bind mount.
    pseudoDevice=
    case $device in
        /dev/*)
            ;;
        //*)
            # Don't touch SMB/CIFS paths.
            pseudoDevice=1
            ;;
        /*)
            device=/mnt-root$device
            ;;
        *)
            # Not an absolute path; assume that it's a pseudo-device
            # like an NFS path (e.g. "server:/path").
            pseudoDevice=1
            ;;
    esac

    if test -z "$pseudoDevice" && ! waitDevice "$device"; then
        # If it doesn't appear, try to mount it anyway (and
        # probably fail).  This is a fallback for non-device "devices"
        # that we don't properly recognise.
        echo "Timed out waiting for device $device, trying to mount anyway."
    fi

    # Wait once more for the udev queue to empty, just in case it's
    # doing something with $device right now.
    udevadm settle

    # If copytoram is enabled: skip mounting the ISO and copy its content to a tmpfs.
    if [ -n "$copytoram" ] && [ "$device" = /dev/root ] && [ "$mountPoint" = /iso ]; then
      fsType=$(blkid -o value -s TYPE "$device")
      fsSize=$(blockdev --getsize64 "$device" || stat -Lc '%s' "$device")

      mkdir -p /tmp-iso
      mount -t "$fsType" /dev/root /tmp-iso
      mountFS tmpfs /iso size="$fsSize" tmpfs

      cp -r /tmp-iso/* /mnt-root/iso/

      umount /tmp-iso
      rmdir /tmp-iso
      if [ -n "$isoPath" ] && [ $fsType = "iso9660" ] && mountpoint -q /findiso; then
       umount /findiso
      fi
      continue
    fi

    if [ "$mountPoint" = / ] && [ "$device" = tmpfs ] && [ ! -z "$persistence" ]; then
        echo persistence...
        waitDevice "$persistence"
        echo enabling persistence...
        mountFS "$persistence" "$mountPoint" "$persistence_opt" "auto"
        continue
    fi

    mountFS "$device" "$(escapeFstab "$mountPoint")" "$(escapeFstab "$options")" "$fsType"
done

exec 3>&-

So a few things of note. The actual mount data is pulled from some source called fsInfo. That’s provided by nixos/modules/system/boot/stage-1.nix:

fsInfo =
  let f = fs: [ fs.mountPoint (if fs.device != null then fs.device else "/dev/disk/by-label/${fs.label}") fs.fsType (builtins.concatStringsSep "," fs.options) ];
  in pkgs.writeText "initrd-fsinfo" (concatStringsSep "\n" (concatMap f fileSystems));

And fileSystems is that list of mount points we saw earlier! This is converting that list into a file that’s easy to parse line by line from bash at early boot.

However, before we go any further, maybe we don’t need to do any work at all to get our squashfs in ram. See this?

    # If copytoram is enabled: skip mounting the ISO and copy its content to a tmpfs.
    if [ -n "$copytoram" ] && [ "$device" = /dev/root ] && [ "$mountPoint" = /iso ]; then
      # ... snip ...

      mountFS tmpfs /iso size="$fsSize" tmpfs
      cp -r /tmp-iso/* /mnt-root/iso/

      # ... snip ..
    fi

That’s doing literally exactly what I want, mounting a tmpfs on /mnt-root/iso and copying the SquashFS (and the rest of the ISO contents) into it. So how can we enable copytoram? Earlier in the script, there’s a little loop that parses any arguments that were passed in as kernel boot parameters:

for o in $(cat /proc/cmdline); do
    case $o in
        # ... snip ...
        copytoram)
            copytoram=1
            ;;
        # ... snip ... 
    esac
done

So all we need to do to boot from ram is to pass copytoram as a kernel parameter? Sweet! That’s an incredibly simple change to our system definition:

boot.kernelParams = [ "copytoram" ]

Hah! Easy! Let’s make sure it actually worked though. The easiest way for me to tell is maybe by booting up the ISO as a remote-attached ISO through the BMC of one of my computers. It gives me a little read-out of how much data has been loaded over the network. My ISO is 284MiB large, so we should expect about that much data transferred, give or take 1000-vs-1024 measurements. But enough talk, time to boot:

<<< NixOS Stage 1 >>>

loading module loop...
loading module overlay...
loading module dm_mod...
running udev...
Starting version 251.12
starting device mapper and LVM...
mounting tmpfs on /...
waiting for device /dev/root to appear.......
mounting tmpfs on /iso...

There it is, mounting tmpfs on /iso..., exactly what we want to see. And then my console sat here for awhile as my KVM counted the ISO bytes transmitted up to a nice 284MiB. Just to be sure, I’ll detach the ISO, check the mount point, and then checksum the SquashFS.

# mount
tmpfs on /iso
/iso/nix-store.squashfs on /nix/.ro-store

# sha256sum /iso/nix-store.squashfs
4bb86abad14682f73105943b710e26864a1c6f063f01d9728e489ef98f034039 /iso/nix-store.squashfs

It does in fact work! Thanks, NixOS.