An X11 Apologist Tries Wayland
I think it’s only fair to call me an X apologist. I get incredibly frustrated when people talk about dropping support for X11. I fight back against the notion that some day X11 will be dead and unmaintained, a curiosity of a time before. I’ve spoken to people in my circles at length about the accessibility tools that Wayland simply hasn’t been capable of supporting that X11 has. A lot of times, I’ve ended this conversation with “Maybe 5 years from now it’ll be good”. Well, it’s been 5 years since I first said those words, and you know what, I’m actually pleasantly surprised.
I’ve been using sway for most of my testing, since I already had an i3 config handy. Basically all I had to do was copy my config to .config/sway/config, change some of the services it was launching to get rid of X11-specific stuff, and add some output configuration. It’s pretty great how painless it was (and the default configs are just fine too, but I’m particular).
Latency and VSync
Once I actually started sway though, I was not happy to see that my mouse felt laggy. It’s not the sort of thing that makes it unusable, but it felt like I had just switched to using a bluetooth mouse. I’ve been around the block long enough to know that sway was almost certainly using a software cursor. The thing is, sway, or more accurately, wlroots, supports hardware cursors. This is dependent on linux’s Direct Rendering Manager (DRM, but not the bad kind) stack providing a cursor layer when queried, which I guess it’s not doing for my GPU? I’m not really sure why that is, as my GPU definitely has a cursor layer (I checked the kernel code), but I haven’t chased this rabbit hole to conclusion yet.
What following this rabbit did teach me though, is something extremely cool about wayland’s rendering model. Wayland wants every frame to be perfect. That means no screen tearing, no half-drawn frames, none of that stuff. If you’ve ever turned vsync on in a video game you know that trying to match to vblank almost always incurs a latency penalty, and it can make input feel sluggish. Out of the box, there’s a bit of that in wayland too, but sway has a way out: max_render_time.
Sway lets you configure exactly when it starts compositing a frame prior to the vblank interval. If you set max_render_time 1, it will wait to the very last millisecond before the vblank interval to composite a frame- probably a bad idea unless you have a god tier GPU, because if it takes longer than 1ms to composite and misses the vblank interval then you’ve actually added an entire extra frame’s worth of latency to the situation. So what you do is, you keep stepping this number up 1 millisecond at a time while playing a smooth animation, until you’ve eliminated any stuttering in the animation. And now you have done something X11 cannot do- eliminated screen tearing with the absolute minimum latency cost possible.
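To put rough numbers on that tradeoff, here’s a small Python model. It is a simplification of the idea and not sway’s actual scheduling code: the function names are made up for illustration, and it assumes compositing starts exactly max_render_time milliseconds before vblank and that a late frame simply waits for the next refresh.

```python
import math


def frame_latency_ms(refresh_hz, max_render_time_ms, composite_ms):
    """Time from the start of compositing until the frame reaches the screen.

    Simplified model: compositing begins max_render_time_ms before vblank.
    If it finishes in time, the frame is shown at that vblank. If it runs
    over, the frame waits for a later refresh instead.
    """
    period_ms = 1000.0 / refresh_hz
    overrun_ms = max(0.0, composite_ms - max_render_time_ms)
    missed_refreshes = math.ceil(overrun_ms / period_ms)
    return max_render_time_ms + missed_refreshes * period_ms


def smallest_safe_render_time_ms(observed_composite_ms):
    """Mimic stepping up 1 ms at a time until no frame misses vblank."""
    return math.ceil(max(observed_composite_ms))
```

Under this model, on a 60 Hz display with a compositor that needs 3 ms per frame, max_render_time 1 costs roughly 17.7 ms per frame because every frame misses its vblank, while max_render_time 4 costs 4 ms.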
This is fantastic. It feels fantastic. It even made my software cursor not feel so softwarey, which I’ve never experienced with a software cursor before. I have a pretty bad GPU, but on a higher end card you’d get a huge benefit to this in games. If your card can render the game many times faster than your monitor refresh rate, you can unlock your FPS in the game, tune your max_render_time to the absolute minimum, and get EXTREMELY low latency while still having absolutely no screen tearing whatsoever.
And like, this is the first time I’ve ever seen the vsync setting in a game actually sync the game up with the vblank interval in a way that matters. It works for games in wine. It’s amazing. I have never experienced gaming on Linux that looked this smooth in my life.
wlroots
This probably isn’t limited to sway, by the way. Most of the compositing logic is actually handled by a library called wlroots, a project that spawned from sway, but is now used in quite a few other compositors out there. wlroots handles all the business of providing different rendering backends, input handling, screen capture, and so on, so that a developer can do what they actually wanted to do, which is write a window manager. As a result, if one wlroots compositor can do it, there’s a good chance the rest can too.
This is something I’ve noticed over and over while looking into replacements for my X11 tools. Broadly speaking, they all talk about supporting wlroots-based compositors, rather than any compositor in particular. From my perspective, wlroots has sort of become the de facto standard for compositor features, much like how most X software assumes it’s running under Xorg (when really, other X servers exist, and have varying degrees of support for the extensions Xorg supports).
Unifying the Fragments
Unlike the X ecosystem though, there’s no illusion that wlroots is the only game in town. Gnome and KDE are both sort of doing their own thing, and the degree to which software written for them will work on wlroots or vice-versa depends heavily on whether they rely on ecosystem-specific extensions, or use more basic wayland features that all of them support. This is a problem. The solution for higher level pieces of software like browsers, OBS, and anything else that doesn’t want to think about this seems to be something called xdg-desktop-portal.
xdg-desktop-portal was born out of the flatpak project as a way to provide controlled hardware access and desktop integrations to flatpak apps. Flatpak apps are sandboxed in a lot of ways, so this was invented as a broker that can also provide the very nice benefit of asking for user consent before providing functionality to the app requesting it. It even provides more basic features like file-choosers, printing, location services, which aren’t particularly relevant to the wayland experience, but I just want to highlight that this wasn’t originally dreamed up just to address the wayland ecosystem. Critically though, xdg-desktop-portal doesn’t actually provide these features directly, but instead hooks the app up to a backend, like xdg-desktop-portal-gtk, xdg-desktop-portal-gnome, xdg-desktop-portal-kde, and xdg-desktop-portal-wlr. Perhaps you can see where this is going.
You’ve got a unified interface (xdg-desktop-portal) over a common protocol (dbus), with backends for different wayland compositors, and you’re already supporting things that require compositors to implement them separately like screen sharing. It’s only natural that this is basically How This Will Be Done going forward. So for example, as I already mentioned, screen sharing. Both OBS and Google Chrome implement this under wayland through xdg-desktop-portal and pipewire- wait, pipewire?!
Yeah. When pipewire first hit the scene I remember my friends talking about how it could also share video, but they sort of treated it like a novelty thing. Now, its video capabilities have become an important part of the Wayland ecosystem. I’m not even using pipewire for audio- I’m still using pulseaudio. Regardless, I’ve got pipewire working on my system purely for screencasting. The nitty gritty is a bit mired in callbacks, but if you sort out the logic of screencast-portal.c in OBS you’ll find that OBS is using the xdg-desktop-portal interface to negotiate a screencast session, and at the end of that negotiation it gets a pipewire handle it uses to receive the actual screen data. This actually makes a lot of sense. You need some form of IPC to get the pixels from one place to another, you really don’t want to be doing that over dbus, and pipewire was already right there. It’s a perfect fit.
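If the order of that negotiation is hard to picture, here’s a toy model of it in Python. It talks to no real D-Bus service and is not how OBS is written. The step names mirror methods on the org.freedesktop.portal.ScreenCast interface as I understand it, while the class name and the placeholder return value are invented for illustration.

```python
# Toy model only: nothing here touches D-Bus or PipeWire.
PORTAL_STEPS = ["CreateSession", "SelectSources", "Start", "OpenPipeWireRemote"]


class ScreencastNegotiation:
    """Tracks how far a pretend portal screencast handshake has progressed."""

    def __init__(self):
        self.completed = []

    def call(self, method):
        # Each step is only valid once all earlier steps have completed.
        done = len(self.completed)
        expected = PORTAL_STEPS[done] if done < len(PORTAL_STEPS) else None
        if method != expected:
            raise RuntimeError(f"expected {expected}, got {method}")
        self.completed.append(method)
        # Only the last step yields something to hand to PipeWire.
        # The string is a stand-in for the real file descriptor.
        return "pipewire-remote-fd" if method == "OpenPipeWireRemote" else None

    @property
    def has_pipewire_handle(self):
        return self.completed == PORTAL_STEPS
```

The point of the sketch is the division of labor: everything up to the last step is small control messages, which suits dbus fine, and the pixels only start flowing once the app holds the pipewire handle.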
Fragmentation has long been the elephant in the room for wayland in my opinion. Wayland’s history is well worn with ecosystem fragmentation, and devs have to redo their work multiple times in order to support what are in my view the Primary Three of Gnome, KDE, and wlroots. Still, 3 is better than it could be, and standardization is happening through efforts like xdg-desktop-portal and (more slowly) extensions to the wayland protocol. So I have to admit that even this is in a much better state than it was.
Accessibility
I guess we should talk about my final bugbear that I’ve had with wayland now, which is accessibility. And here, well, things still aren’t perfect, but the ecosystem has started figuring things out. To tackle an extremely simple one- gamma control. Gnome and KDE handle this in their own ecosystems ok, and wlroots actually has support for this now too. The actual tools to use it are a little more sparse, but they’re out there. There are some tools like gammastep and wlsunset that support this, but I’m used to redshift, so I’ve been using minus7’s redshift fork to do both redshifting and gamma control for my monitor. (EDIT: The maintainer of gammastep let me know that it too is a fork of redshift with fairly minimal changes, and config files will even work with it if you change the [redshift] section to [general], so that may be a better option! I admit I hadn’t looked too closely at it or wlsunset while writing this post.)
Another easy one is screen reading. This was already handled entirely by AT-SPI through the dbus protocol, so this is really in exactly the same state it’s in on X: Not great, but workable. I don’t really use screen reading though for anything other than text-to-speeching long text posts, so it’s fine enough for me.
I also rely on mouse and keyboard automation quite a bit, but thankfully there’s a lot of software now that uses the uinput interface to do input emulation. I use antimicrox for controller mapping and have for years, so I was pleasantly surprised to learn they support it. There’s also ydotool, which provides xdotool functionality in a more generic way, and also has a uinput backend. The effort you need to go through to actually use these depends on how your distribution handles the file permissions of /dev/uinput. Some of them have it as root:input, in which case you just need to usermod -a -G input <yourusername> and then relog to get it working. Others have it as root:root so you either need to go do some reconfigurations to change its permissions or live with running the software using it as root.
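A quick way to see which of those situations you’re in is to look at the device’s group and mode bits. Here’s a minimal Python sketch of that check. The function names are made up for this post, it only looks at classic Unix permissions, and it will miss distributions that grant access some other way (ACLs, for example).

```python
import grp
import os
import stat


def uinput_advice(mode, group_name, in_group, is_root=False):
    """Suggest a next step from the device's mode bits and owning group.

    Only classic owner/group/other permissions are considered here.
    """
    if is_root or mode & stat.S_IWOTH:
        return "ok"
    group_can_write = bool(mode & stat.S_IWGRP)
    if group_can_write and in_group:
        return "ok"
    if group_can_write and group_name != "root":
        return f"usermod -a -G {group_name} <yourusername>, then relog"
    return "change the device permissions, or run the tool as root"


def inspect_uinput(path="/dev/uinput"):
    """Read the real values on this machine and pass them to uinput_advice."""
    st = os.stat(path)
    group_name = grp.getgrgid(st.st_gid).gr_name
    my_gids = set(os.getgroups()) | {os.getegid()}
    return uinput_advice(st.st_mode, group_name, st.st_gid in my_gids,
                         is_root=(os.geteuid() == 0))
```

Calling inspect_uinput() on a root:input system where you aren’t in the input group yet should hand back the usermod line from above.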
I remember talking at length with the developer of Talon Voice, a voice control/eyetracking tool that works quite well on linux, about the challenges of supporting wayland. The other big thing, aside from how to do input emulation, was whether it was possible to query the list of windows and active focus for context-specific voice commands. I have definitely seen software that does this at this point, since most app panels are their own pieces of software independent of the compositor. And, as you’d expect from them, they can focus windows too. So given that these two problems are solved, it’s now on my list to try and help Talon get Wayland support when I have the energy.
There’s one thing I haven’t been able to solve though, and that’s dwell-click for wlroots. Gnome implements this themselves in the compositor. KDE has KMouseTool, but that’s heavily X-only so I suspect it doesn’t work on wayland. I have my own implementation, rtmouse, which is also X-only. If you’re not aware, this sort of software waits for the user to stop moving their mouse, and then automatically performs a click, typically with audio and/or visual feedback. Naturally, any software doing this needs to know if the user is moving their mouse or performing clicks. There’s not really a good way for me as a software developer to query this data in real time from the compositor. This is a bit more real-time than simple user idle detection, which they do provide. I also need to know when mouse clicks happen so I can cancel my autoclick, to avoid an accidental double-click if the user manually inputs a click. So basically, I’m asking for the mouse equivalent of a keylogger, which is a thing wayland stuff really tries hard not to provide for security reasons.
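To show why both kinds of events matter, here’s the core dwell-click logic boiled down to a Python sketch. This is not rtmouse’s actual code, and the class, method names, and default thresholds are invented for illustration. It assumes something is feeding it pointer motion and click events with timestamps, which is exactly the part I can’t get from a wlroots compositor today.

```python
from dataclasses import dataclass


@dataclass
class DwellClicker:
    dwell_ms: int = 500   # how long the pointer must rest before clicking
    jitter_px: int = 3    # movement within this distance counts as "still"

    def __post_init__(self):
        self.anchor = None      # where the pointer came to rest
        self.rest_since = None  # when it came to rest
        self.armed = False      # True once the pointer has moved since the last click

    def on_motion(self, t_ms, x, y):
        # Real movement restarts the dwell timer and re-arms the autoclick.
        if self.anchor is None or max(abs(x - self.anchor[0]),
                                      abs(y - self.anchor[1])) > self.jitter_px:
            self.anchor = (x, y)
            self.rest_since = t_ms
            self.armed = True

    def on_manual_click(self, t_ms):
        # The user clicked on their own: cancel the pending autoclick so it
        # doesn't turn into an accidental double-click.
        self.armed = False

    def on_tick(self, t_ms):
        """Return True if an automatic click should fire at time t_ms."""
        if self.armed and t_ms - self.rest_since >= self.dwell_ms:
            self.armed = False
            return True
        return False
```

Without on_motion there is no timer to start, and without on_manual_click there is no way to cancel, so idle detection alone can’t drive this.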
I think if I wanted to implement this I’d need to try to add some features into wlroots in a way that doesn’t make people die of infosec, and then modify my software to use that, but even then I don’t think I’d ever be able to provide this in a way that’s universal across gnome, KDE, and wlroots.
Wayland Without a GPU
My final interest then, is support for devices without hardware acceleration. I use an 800MHz netbook without a working GPU. Can wayland deal with this? Yes! Gnome and KDE are sort of out of the question by default here since they’re much more GPU-reliant regardless of X or Wayland, so once again I’ll look at wlroots. wlroots has a handful of rendering backends it can use, the primary being GLES2, with a Vulkan backend that’s in there now. But the third backend, pixman, is what we want here. render/pixman/renderer.c has the goods here- this is a software compositing backend for wlroots that uses the pixman library, which has very optimized software routines for pixel manipulation. It’s even got SSE support, which means my little netbook should be in good shape- that thing has up through SSSE3.
I haven’t had the chance to actually test this out yet, but I’m very much looking forward to trying it to see how wlroots compares against X on my netbook. I suspect it’ll do better. There’s a few TODOs in wlroots’ pixman renderer about some places for potential optimizations too, which I’m interested in implementing once I have a setup where I might actually notice a difference.
I expect video playback to be reasonable here too. mpv has a wlshm (WayLand SHared Memory) backend that copies pixels into a shared memory buffer, skipping the formalities of creating an OpenGL context, calling glTexSubImage2D, and hoping llvmpipe will do the right thing and make it fast enough to be usable (it probably won’t). SDL though, I’m a bit worried about SDL. SDL’s wayland backend only supports creating an OpenGL context, and doesn’t have any support for wlshm, so I’m worried that it might be much slower than it needs to be even with applications that are entirely doing software rendering. I might be able to work around this by forcing it through XWayland, and maybe that path will be a bit faster, but we’ll see.
That’s a lot of guessing by me here, but I have a number of other things I need to get done before I can test this on the actual hardware. I’ll be writing a followup whenever I do that.
Closing Thoughts
All in all, I’m very impressed with the work the wayland community has done since I last took a serious look at the state of things. I’m still waiting for a stacking window manager that scratches the same itch for me that icewm does, but I’m following labwc with great interest. At this point though, I’ve established that I can live my life on wayland, and for the time being I am. Not everyone can yet though, and there’s still work to be done. Part of why I’m feeling the urge to transition to wayland is performance benefits, but the other part is so that I’ll be able to help solve the unsolved problems to make it viable for more people.
I don’t think X is ever going to die. Even if it fades away on Linux, there’s a lot of old video hardware that will probably only ever be well supported with real Xorg, on Linux and other OSes such as NetBSD. That stuff is already seeing support dropped in more recent versions of Xorg, and preservationists will need to do digging to find versions that still take advantage of everything the hardware has to offer. But, I understand now why the wayland folks have been talking so highly of it, and how drastically it simplifies the userland stack, and I’m no longer concerned that I’ll wake up to find my netbook has become unusable for modern software.
If you’re into Gnome, Wayland is probably a good experience today out of the box, even if you aren’t a power user. I’m not into Gnome, which is why I haven’t looked at it in this post. If you’re not into Gnome either, but you want to give Wayland a shot, just know what you’re getting into. KDE, I have heard mixed things about, but can’t speak to.
If you want to try a wlroots-based compositor, know that you can probably do everything you want to do, but it’ll be some effort. You’ll need to do some diving into search engines, but the solutions are out there. arewewaylandyet is a good starting point that I’ve been referencing throughout this process, along with the Arch Linux Wayland Pages and Gentoo’s Wayland Desktop Landscape page. You may also need to hop on IRC and ask some questions. It’s early adopter territory, but if that’s what you’re into, a lot of developers have already put a lot of work into paving the way forward, and it’s worth trying out!