Oxide at Home: Propolis says Hello

So Oxide is making some cool stuff huh? Big metal boxes with lots of computer in them. Servers as they should be! Too bad I can’t afford to buy one for myself… but wait, they’re open-sourcing the software they’re writing to do it. Mom said we can have Oxide at Home!

Oxide at Home:

[image: an absolutely awful arrangement of computer hardware in an ATX case]

Hardware, with the software haphazardly jammed in.

Let’s be clear, I’m not aiming for elegance here. I’m not aiming for enterprise grade either. I want something dirty, something hacky, something that makes you go “what the fuck, why, no???????”.

To that end I’m choosing right at the start to make my life more interesting. Oxide’s software is mostly written for illumos, a direct descendant of OpenSolaris. There’s a handful of illumos distributions out there, but Oxide develops primarily for their distribution called Helios. Their Omicron README (no relation) also mentions OmniOS. Naturally I’m going to use neither of those and make it work on OpenIndiana instead.

You see, I can’t get a copy of Helios right now unless I commit corporate espionage, and OmniOS describes itself as “enterprise”. As I’ve already stated, I am not an enterprise, nor do I plan on becoming one unless Jean-Luc Picard starts taking estrogen and wants to be my captain. Tribblix was also in the running but I couldn’t get the installer to work, so I landed on OpenIndiana.

Anyways, everything I do, I’ll do with the intention of getting it working, not making it good. Expect awful things along the way.

But first, we need to summon Ferris

Oxide, as their name implies, likes to write software in Rust. Some of that software wants a nightly Rust, too. Might help to have rustup, huh? Well, there's a couple of problems. First, rustup's install script is not actually as universal as they think it is:

vi@box:~$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
sh[455]: local: not found [No such file or directory]
sh[456]: local: not found [No such file or directory]
sh[457]: local: not found [No such file or directory]
sh[458]: local: not found [No such file or directory]
sh[202]: local: not found [No such file or directory]
sh[62]: local: not found [No such file or directory]
sh[65]: local: not found [No such file or directory]
sh: line 72: _ext: parameter not set

Those "local: not found" errors are because /bin/sh here is ksh93, which has no idea what local means. Fine, whatever, let's pipe it to bash then:

vi@box:~$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | bash
ld.so.1: rustup-init: fatal: libgcc_s.so.1: open failed: No such file or directory

Excuse me the fuck? I uh,

vi@box:~$ find / -name 'libgcc_s.so.1'
/usr/sfw/lib/libgcc_s.so.1
/usr/sfw/lib/amd64/libgcc_s.so.1
/usr/pkgsrc/lang/rust/work/rust-1.55.0-x86_64-unknown-illumos/lib/pkgsrc/libgcc_s.so.1
/usr/gcc/7/lib/libgcc_s.so.1
/usr/gcc/7/lib/amd64/libgcc_s.so.1
/usr/gcc/11/lib/amd64/libgcc_s.so.1
/usr/gcc/11/lib/libgcc_s.so.1
/usr/gcc/3.4/lib/libgcc_s.so.1
/usr/gcc/3.4/lib/amd64/libgcc_s.so.1
/usr/gcc/10/lib/libgcc_s.so.1
/usr/gcc/10/lib/amd64/libgcc_s.so.1

What do you want from me, rustup? Well you see, it’s very simple:

vi@box:~$ pkg search file:basename:libgcc_s.so.1
INDEX      ACTION VALUE                                 PACKAGE
basename   file   usr/gcc/8/lib/amd64/libgcc_s.so.1     pkg:/system/library/gcc-8-runtime@8.4.0-2020.0.1.2
basename   file   usr/gcc/8/lib/libgcc_s.so.1           pkg:/system/library/gcc-8-runtime@8.4.0-2020.0.1.2
[... a bunch of other gcc versions skipped ...]
basename   file   usr/lib/amd64/libgcc_s.so.1           pkg:/system/library/gcc-4-runtime@4.9.4-2021.0.0.8
basename   file   usr/lib/libgcc_s.so.1                 pkg:/system/library/gcc-4-runtime@4.9.4-2021.0.0.8

We need gcc-4-runtime! Obviously (/s). Oh, and we also need g++-4-runtime or we get another missing shared library, but I'll spare you the details.

vi@box:~$ sudo pkg install gcc-4-runtime g++-4-runtime
vi@box:~$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | bash
info: downloading installer

Welcome to Rust!

FINALLY. Ok.

[image: i have the cargo]

What’s a Propolis?

Oxide is making racks of lots of computer. My understanding is that they have a control plane that talks to all the sleds (the blades of computer). Each sled runs a sled agent, and one thing that sled agent can do is start virtual machines. This is where Propolis comes in, as a userspace frontend to the bhyve hypervisor.

This is a great place for us to start because Propolis doesn’t depend on any other services to run. It just sits there and exposes an API to make VMs.

Let’s build it!

vi@box:~/oxide-at-home$ git clone https://github.com/oxidecomputer/propolis
vi@box:~/oxide-at-home$ cd propolis
vi@box:~/oxide-at-home/propolis$ cargo build

This will make target/debug/propolis-cli and target/debug/propolis-server. I copied those over to /usr/local/bin and moved on with my life; just get them on your PATH somehow if you're following along at home.
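
Something like this does the trick, assuming you also want them in /usr/local/bin:

sudo cp target/debug/propolis-cli target/debug/propolis-server /usr/local/bin/
export PATH=/usr/local/bin:$PATH    # if /usr/local/bin isn't already on your PATH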

Configuring Propolis

Anyway, how do we use this? First we need a config file, and the README provides this helpful example:

bootrom = "/path/to/bootrom/OVMF_CODE.fd"

[block_dev.alpine_iso]
type = "file"
path = "/path/to/alpine-extended-3.12.0-x86_64.iso"

[dev.block0]
driver = "pci-virtio-block"
block_dev = "alpine_iso"
pci-path = "0.4.0"

[dev.net0]
driver = "pci-virtio-viona"
vnic = "vnic_name"
pci-path = "0.5.0"

First question: what the hell is OVMF_CODE.fd? I did a pkg search for it and not a single package has it, but it's the bootrom used when the VM starts up. Comes from a project called EDK2, I guess? I'm fuzzy on the details, but I followed a trail from the Arch Linux edk2-ovmf package to a GitHub wiki and eventually a Jenkins build artifact folder on the personal website of a QEMU dev.

I grabbed the x64 rpm, extracted it a few times with 7zip, and eventually got my hands on OVMF_CODE-pure-efi.fd. This ended up working out so, cool I guess.
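
If you want to retrace that, the rough shape is below. Treat the filenames as placeholders (the real ones depend on which build you grab), and note that 7z peels off one archive layer per pass, so you may need an extra pass if the cpio payload is itself compressed:

7z x edk2.git-ovmf-x64-<version>.noarch.rpm     # placeholder name; yields the cpio payload
7z x edk2.git-ovmf-x64-<version>.noarch.cpio    # yields usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd, among others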

EDIT: I have since been informed that the Propolis README has a link to a recommended bootrom. As you’ll soon see, my propensity for not reading READMEs all the way through knows no bounds. I pretty much just copied the example config file out and decided I’d come back to the README if I ran into a problem I couldn’t solve, and unfortunately I’m very good at solving problems. Sorry Oxide folks, thanks for putting up with my bullshit <3.

Next, I downloaded a copy of the alpine-extended iso for 3.15 since that’s the latest right now.

Finally, you see that vnic = line? We need to give it a vnic-type network interface. The README actually explains the correct way to do this but I didn’t bother to read that. I just read the man page and threw stuff at the terminal until it did something useful.

vi@box:~/oxide-at-home$ dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g1     phys      1500   down     --         --
e1000g0     phys      1500   up       --         --
vi@box:~/oxide-at-home/propolis$ sudo dladm create-vnic -l e1000g0 propolis
dladm: invalid link name 'propolis'
vi@box:~/oxide-at-home/propolis$ sudo dladm create-vnic -l e1000g0 e1000g9
vi@box:~/oxide-at-home$ dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g1     phys      1500   down     --         --
e1000g0     phys      1500   up       --         --
e1000g9     vnic      1500   up       --         e1000g0

So this totally breaks naming conventions, but I couldn't figure out what constitutes a "valid link name" from the man page (as far as I can tell, the rule is that a link name has to end in a number, which "propolis" doesn't). If I had actually read the README more I would have seen the suggestion of vnic_prop0. You should use that! But my config keeps my best-effort shitpost name, since that's what really happened.
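
For reference, the saner version would look like this, with your own physical link in place of e1000g0, and then vnic = "vnic_prop0" in the config:

sudo dladm create-vnic -l e1000g0 vnic_prop0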

With all that done, my final config file looks a bit like this:

bootrom = "/export/home/vi/oxide-at-home/edk2/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd"

[block_dev.alpine_iso]
type = "file"
path = "/export/home/vi/oxide-at-home/run/alpine-extended-3.15.0-x86_64.iso"

[dev.block0]
driver = "pci-virtio-block"
block_dev = "alpine_iso"
pci-path = "0.4.0"

[dev.net0]
driver = "pci-virtio-viona"
vnic = "e1000g9"
pci-path = "0.5.0"

Using Propolis

With a config file that looked good and all the hubris of a university student on orientation day, I started the Propolis server.

vi@box:~/oxide-at-home/run$ sudo propolis-server run propolis.toml 127.0.0.1:12400

In another terminal, I told Propolis to make a VM.

propolis-cli -s 127.0.0.1 new cirno -m 1024 -c 1

Peeking over at the propolis logs, I saw this:

Mar 13 22:36:20.531 INFO Starting server...
Mar 13 22:36:56.204 INFO accepted connection, remote_addr: 127.0.0.1:46363, local_addr: 127.0.0.1:12400
Mar 13 22:36:56.210 INFO request completed, error_message_external: Internal Server Error, error_message_internal: Cannot build instance: No such file or directory (os error 2), response_code: 500, uri: /instances/3915cdd5-3998-4f42-b728-0f8b594afae0, method: PUT, req_id: 0535a501-4467-4f4d-8da5-029e5ed26a20, remote_addr: 127.0.0.1:46363, local_addr: 127.0.0.1:12400

What do you MEAN “No such file or directory”?????

dtrace pony

I tried poking at the code but that went nowhere fast. My usual debugging strategy here is to use strace, but we're in illumos land, so we need to use dtrace instead, which is like if someone (Bryan Cantrill) decided strace needed awk built in. Now, I'm usually content to just pipe the whole firehose of strace into awk and filter from there, but dtrace is actually pretty neat, if a bit confusing at first. And it's got a pony. Does strace have a pony? I don't think so.

I wanted to see all openat invocations, so I grabbed the probe id.

vi@box:~/oxide-at-home/run$ sudo dtrace -l | grep openat
22431    fbt    genunix    openat entry
22432    fbt    genunix    openat return
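
(As an aside, you don't strictly need the probe id at all; dtrace can also match probes by name, so something like this should do the same job:)

sudo dtrace -n 'fbt::openat:entry { printf("%s", copyinstr(arg1)); }'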

Then I ran propolis-server with dtrace and tried to make another VM.

vi@box:~/oxide-at-home/run$ sudo dtrace -i '22431 { printf("%s", copyinstr(arg1)) }' -c 'propolis-server run /export/home/vi/oxide-at-home/run/propolis.toml 127.0.0.1:12400' 2>&1 | grep -v -e '/proc' -e '/etc/ttysrch' -e /var/adm/utmpx -e '/dev/pts/3'
dtrace: description '22431 ' matched 1 probe
Mar 13 23:15:02.407 INFO Starting server...
Mar 13 23:15:04.980 INFO accepted connection, remote_addr: 127.0.0.1:40237, local_addr: 127.0.0.1:12400
Mar 13 23:15:04.983 INFO request completed, error_message_external: Internal Server Error, error_message_internal: Cannot build instance: No such file or directory (os error 2), response_code: 500, uri: /instances/b745d636-c8b6-46e5-bb08-839af892b702, method: PUT, req_id: 38dc08e3-e2d5-4561-a328-f48984011a8f, remote_addr: 127.0.0.1:40237, local_addr: 127.0.0.1:12400
 13  22431                     openat:entry /var/ld/64/ld.config
 13  22431                     openat:entry /usr/lib/64/libsqlite3.so.0
 [... snip bunch of random dlls ...]
 13  22431                     openat:entry /etc/certs/ca-certificates.crt
  4  22431                     openat:entry /dev/vmmctl

Hmm what’s /dev/vmmctl? Ha. haha. Remember how I said Propolis is a frontend for bhyve? That’s the bhyve control device. Does it exist?

vi@box:~/oxide-at-home/run$ ls /dev/vmmctl
/dev/vmmctl: No such file or directory

No, no, of course it doesn't, because I forgot to install bhyve. Let's do that, shall we?

vi@box:~/oxide-at-home/run$ sudo pkg install system/bhyve bhyve/firmware brand/bhyve system/library/bhyve
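
Assuming the install pulls in and loads the vmm driver, the control device should show up without any further ceremony; worth a quick check before retrying:

ls /dev/vmmctl
/dev/vmmctl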

Using Propolis, for real this time

I restarted Propolis, and finally, FINALLY, we could create a VM.

vi@box:~/oxide-at-home/run$ propolis-cli -s 127.0.0.1 new cirno -m 1024 -c 1

We have to explicitly turn on VMs after they’re created, and then we can interact with them over serial. propolis-cli can give us a serial connection to the VM, but here I ran into a little snag. The alpine image we’re using attaches the console to VGA by default, so I had to attach to serial first in one terminal, start the VM up in the other, then switch back to the serial connection to stop grub from autobooting.

vi@box:~/oxide-at-home/run$ propolis-cli -s 127.0.0.1 serial cirno
vi@box:~/oxide-at-home/run$ propolis-cli -s 127.0.0.1 state cirno run

[ grub appears on the serial connection ]

Once grub came up, I removed quiet from the kernel arguments on the linux line and added console=ttyS0. I hit the button to boot the system, and at long last, I had victory:

[image: alpine welcome message, it worked]

This is where I stopped, to prevent my brain from melting.

What’s next?

I’m not sure! I think sled-agent is the logical next step as we work our way from the ground up trying to build out a fully working deployment (for some definitions of “working” and “deployment”) but we’ll see.

Or maybe I’ll build a server rack out of cardboard. You never know.