swick's blog

Developing Gnome Shell on Fedora Silverblue


Fedora Silverblue is one of the immutable operating systems. Somewhere the OS is being built, resulting in an ostree commit that gets distributed to the machines running Silverblue. On the machines this results in a new deployment which will become active on the next boot, while the previous deployment is still around, ready to be booted into if anything goes wrong. The deployments are read-only.

The resulting stability and fault tolerance are nothing short of incredible compared with the traditional single directory tree that packages mutate in place.

There are drawbacks, though. The read-only nature of the OS, and /usr in particular, makes some development workflows unusable. Want to install a -devel package to build your software or just need a compiler? Technically you could manage to do so with rpm-ostree’s layering capabilities (man rpm-ostree), but this is slow, requires a reboot (the apply-live option exists, but it comes with restrictions) and is generally frowned upon.

The general solution to this problem has been two-fold:

  1. Embrace containers
  2. Toolbx

Silverblue ships with podman and other container tools out of the box, which allows you to build your software using Dockerfiles or other container-native technologies and to run containers with podman run. For a lot of use cases this is sufficient.
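As a rough illustration of the container route, the function below builds a project in a throwaway Fedora container. The image tag, the package list and the plain make build step are assumptions made for this sketch, not something Silverblue prescribes.

```bash
# Sketch: build a project inside a throwaway Fedora container.
# Assumptions: the fedora:38 image, the gcc/make package list and a
# project that builds with a plain "make".
build_in_container() {
    srcdir="${1:-$PWD}"
    # --rm discards the container afterwards; :Z relabels the bind mount
    # so that SELinux lets the container access it.
    podman run --rm \
        --volume "$srcdir":/src:Z \
        --workdir /src \
        registry.fedoraproject.org/fedora:38 \
        sh -c 'dnf install -y gcc make && make'
}
```

Calling build_in_container ~/src/myproject (a hypothetical path) would leave the build results in the source directory on the host.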

When this is not sufficient or otherwise undesirable, Toolbx will come to the rescue! It allows you to create a pet container with toolbox create $name and to enter it with toolbox enter $name. The great thing about it: it looks and feels just like any Fedora system (other operating systems are supported as well), so you can dnf install your compiler and -devel packages. It also shares your home directory and is integrated with the host.
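A minimal sketch of that workflow could look like this. The container name gnome-dev and the package list are only examples picked for illustration.

```bash
# Sketch: create a Toolbx container and install build tools into it.
# Assumptions: the default name "gnome-dev" and the package list.
setup_toolbox() {
    name="${1:-gnome-dev}"
    # Create the pet container, then run dnf inside it.
    # sudo does not ask for a password inside a toolbox.
    toolbox create "$name" &&
    toolbox run --container "$name" \
        sudo dnf install -y gcc meson ninja-build libei-devel
}
```

After setup_toolbox gnome-dev, toolbox enter gnome-dev drops you into a shell that has the compiler and headers available.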

Building and running mutter and gnome-shell inside a toolbox has been possible for a while now thanks to work both on Toolbx and mutter. I’ve continued making sure everything is working as smoothly as possible. It’s possible to run mutter on a TTY and even run all the complicated testing setups (some patches pending).

As great as this is, there are situations where the developed version of gnome-shell needs to be run in an actual, real session. Maybe even as part of an actual boot, maybe to run it long enough to find some issue that shows up very infrequently. In those cases Toolbx is not enough and we need to somehow get our gnome-shell running on the host.

systemd-sysext and rpm-ostree usroverlay

There are two more mechanisms to modify files in /usr besides rpm-ostree package layering:

  1. rpm-ostree usroverlay
  2. systemd-sysext

Conceptually they are very similar. They use some overlay filesystem on top of the otherwise read-only /usr tree. With rpm-ostree usroverlay this is really all there is to it. You enable it and now you can modify files in /usr.

If we build gnome-shell for example in our toolbox with --prefix=/usr and then call ninja install we’re installing it into /usr of the toolbox container. The host OS is mounted in /run/host in the toolbox, so if we just use DESTDIR=/run/host ninja install (DESTDIR is prepended to the prefix) then it… still doesn’t work because we can’t get the right permissions to modify files on the host from inside the toolbox (root is not mapped to any user, /run/host is owned by nobody). But with some clever use of DESTDIR=/tmp/gnome-shell-install ninja install and rsyncing /tmp/gnome-shell-install/usr to /usr on the host, it is possible to install our gnome-shell from inside the toolbox onto our host OS. At least temporarily. The rpm-ostree usroverlay disappears when the machine powers down.
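The two steps can be sketched roughly as follows. The function names and the default build directory name build are made up for this example, and the sketch assumes the staging directory is visible both in the toolbox and on the host.

```bash
# Step 1, inside the toolbox: install into a staging directory instead of /usr.
# Assumes the build directory was configured with --prefix=/usr.
stage_install() {
    staging="$1"
    builddir="${2:-build}"
    DESTDIR="$staging" ninja -C "$builddir" install
}

# Step 2, on the host: copy the staged tree over /usr.
# Run "sudo rpm-ostree usroverlay" once per boot beforehand so that
# /usr is writable.
deploy_staged() {
    staging="$1"
    # The trailing slashes make rsync copy the contents of usr/ into /usr/.
    sudo rsync -a "$staging/usr/" /usr/
}
```

With the paths from above this would be stage_install /tmp/gnome-shell-install in the toolbox, followed by deploy_staged /tmp/gnome-shell-install on the host.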

If we need persistence across reboots, then systemd-sysext seems like exactly what we need. We can just create a new folder in /var/lib/extensions and drop our files in there (it needs a special extension-release file to be picked up as an extension). systemd-sysext list shows the available extensions, status shows whether they are currently active, and merge and unmerge apply and remove the overlay of the extensions. Enabling systemd-sysext.service makes systemd overlay the extensions on boot. We can use the same process as before to get our installation from the toolbox to the right place on the host.
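To make the layout concrete, here is a small sketch that creates an empty extension skeleton including the release file. The EXT_ROOT override is an addition for this example so the function can be tried out without touching /var/lib/extensions. Writing to the real location requires root.

```bash
# Sketch: create the directory layout systemd-sysext expects for an extension.
# EXT_ROOT is a hypothetical override used only for experimenting;
# by default the real extension directory is used.
make_sysext_skeleton() {
    name="$1"
    ext_root="${EXT_ROOT:-/var/lib/extensions}"
    release_dir="$ext_root/$name/usr/lib/extension-release.d"
    mkdir -p "$release_dir" &&
    # ID=_any makes the extension apply regardless of the host's os-release ID.
    echo "ID=_any" > "$release_dir/extension-release.$name"
}
```

Once files have been installed below the extension's usr directory, sudo systemd-sysext merge overlays them onto /usr and sudo systemd-sysext unmerge removes the overlay again.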

The tiny problem

Now that we’ve successfully installed gnome-shell to our host, we can for example just run gnome-shell --nested and see… missing dependencies?

In this GNOME development cycle mutter grew a dependency on libei and this is no issue in the toolbox. We just dnf install libei-devel and we’re done. The problem though: this is not our host and libei will only be available in the toolbox. We can clone, build and install the dependency just like we did for gnome-shell but this gets ugly fast when the project you’re building has a lot of dependencies that are not installed on the host. In this case, not a big issue though, so we keep pushing ahead and install what we need.

This time gnome-shell starts, just to exit again. This cycle mutter gained the cancel-input-capture keybinding which is defined in a gschema file, which gets installed correctly, but needs to be compiled with all the other gschemas on the host system and that’s not happening. It can’t be happening because the toolbox has its own set of gschemas. As part of the build process, the compiler will build them into the toolbox, instead of compiling the host gschemas and the new gschema to the host. We can be clever and run the compiler on the host and install the results either to /usr for rpm-ostree usroverlay or the extension directory for systemd-sysext.
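Running the compiler on the host could look roughly like the sketch below. It assumes the schemas live in the usual /usr/share/glib-2.0/schemas directory and, for the systemd-sysext case, that the extension is currently merged so the host sees both its own schemas and the new one. The function name is made up.

```bash
# Sketch: compile the gschemas visible on the host and write the result
# either back into /usr (rpm-ostree usroverlay) or into an extension
# directory (systemd-sysext).
compile_host_schemas() {
    schemas=/usr/share/glib-2.0/schemas
    target="${1:-$schemas}"
    # --targetdir chooses where gschemas.compiled is written.
    sudo glib-compile-schemas --targetdir="$target" "$schemas"
}
```

For rpm-ostree usroverlay a plain compile_host_schemas is enough. For an extension with the hypothetical name my-ext it would be compile_host_schemas /var/lib/extensions/my-ext/usr/share/glib-2.0/schemas followed by sudo systemd-sysext refresh.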

This doesn’t seem very robust.

Where is the root issue here? We’re building on one operating system and installing the result to another. This can’t work well.

Embracing containers again

We can’t build gnome-shell in our toolbox! Instead, we have to build it on our host. At least, on something that looks exactly like our host. The nice thing about the immutable system here is that we know exactly what our host looks like and there is an ostree commit which describes it. Not only that, we can also build a native container (OCI) image from that ostree commit and someone already did that.

We can create our own Dockerfile which builds gnome-shell and mutter based on the Silverblue image:

ARG IMAGE_NAME="${IMAGE_NAME:-silverblue}"
ARG SOURCE_IMAGE="${SOURCE_IMAGE:-silverblue}"
ARG BASE_IMAGE="quay.io/fedora-ostree-desktops/${SOURCE_IMAGE}"
ARG FEDORA_MAJOR_VERSION="${FEDORA_MAJOR_VERSION:-38}"

FROM ${BASE_IMAGE}:${FEDORA_MAJOR_VERSION} AS builder

ARG IMAGE_NAME="${IMAGE_NAME}"
ARG FEDORA_MAJOR_VERSION="${FEDORA_MAJOR_VERSION}"

# setup dnf
RUN rpm-ostree install -y dnf
RUN dnf install -y 'dnf-command(builddep)'

# install development packages
RUN dnf groupinstall -y 'Development Tools'
RUN ln -s /usr/bin/ld.gold /usr/bin/ld # ???
RUN dnf install -y meson strace gdb valgrind sysprof

# install gnome shell and mutter specific dependencies
RUN dnf builddep -y gnome-shell mutter
RUN dnf install -y libei-devel libeis-devel asciidoc sassc

# dnf won't work on the running system, let's not confuse ourselves
RUN rpm-ostree uninstall -y dnf

# build mutter
ADD ./mutter /tmp/mutter
RUN cd /tmp/mutter && \
    meson setup _container_build/ --prefix=/usr && \
    ninja -C _container_build/ && \
    ninja -C _container_build/ install

# build gnome-shell
ADD ./gnome-shell /tmp/gnome-shell
RUN cd /tmp/gnome-shell && \
    meson setup _container_build/ --prefix=/usr && \
    ninja -C _container_build/ && \
    ninja -C _container_build/ install

RUN rm -rf /tmp/* /var/*
RUN ostree container commit
RUN mkdir -p /var/tmp && chmod -R 1777 /var/tmp

And then build our own Silverblue based OS which includes our own gnome-shell: podman build -f GnomeShell.containerfile -t silverblue-gnome-shell-main.

Pretty neat, but how do we get the changes between the Silverblue image and our own image deployed onto our host?

OCI images consist of layers. Each layer can add, remove and modify files from the previous layer. When we build our own image based on another image, our changes become new layers in the new image. With a bit of massaging we can extract those layers into a filesystem tree, such as /var/lib/extensions:

#! /bin/bash

set -ouex pipefail

BASE_IMAGE=${BASE_IMAGE-quay.io/fedora-ostree-desktops/silverblue:38}
EXT_IMAGE=${EXT_IMAGE-localhost/silverblue-gnome-shell-main:latest}
SYSEXT_NAME=${SYSEXT_NAME-test-ext}

SYSEXT_PATH=/var/lib/extensions/$SYSEXT_NAME

function fail {
    printf '%s\n' "$1" >&2 ## Send message to stderr.
    exit "${2-1}" ## Return a code specified by $2, or 1 by default.
}

tmpdir="$(mktemp -d)"
trap 'rm -rf -- "$tmpdir"' EXIT

skopeo copy containers-storage:$BASE_IMAGE dir:$tmpdir/base-image
skopeo copy containers-storage:$EXT_IMAGE dir:$tmpdir/ext-image

base_info=$(skopeo inspect dir:$tmpdir/base-image --raw)
ext_info=$(skopeo inspect dir:$tmpdir/ext-image --raw)

base_layer_count=$(jq '.layers | length' <<< "$base_info")
ext_layer_count=$(jq '.layers | length' <<< "$ext_info")

if [[ $ext_layer_count -le $base_layer_count ]]; then
  fail "ext image needs more layers than base image"
fi

for (( i=0; i<$base_layer_count; ++i)); do
  base_digest=$(jq -r ".layers[${i}].digest" <<< "$base_info")
  ext_digest=$(jq -r ".layers[${i}].digest" <<< "$ext_info")

  if [[ "$base_digest" != "$ext_digest" ]]; then
    fail "layer $i digest mismatch: base $base_digest, ext $ext_digest"
  fi
done

sudo rm -rf "$SYSEXT_PATH"
sudo mkdir -p $SYSEXT_PATH/usr/lib/extension-release.d/
echo "ID=_any" | sudo tee $SYSEXT_PATH/usr/lib/extension-release.d/extension-release.$SYSEXT_NAME

for (( i=$base_layer_count; i<$ext_layer_count; ++i)); do
  ext_digest=$(jq -r ".layers[${i}].digest" <<< "$ext_info")
  ext_digest="${ext_digest#sha256:}"
  
  sudo tar xf $tmpdir/ext-image/$ext_digest -C $SYSEXT_PATH
done

This script is horribly inefficient: it copies all files around multiple times. It also doesn’t deal with so-called whiteout files (the real filename, prefixed with .wh.), which are used to indicate that a file from a previous layer was removed.

One could expand on this concept and create a real program which doesn’t have those inefficiencies and handles removal of files properly. However, we can’t handle whiteout files which remove files from the base image (i.e. our Silverblue system) with systemd-sysext simply because it doesn’t support removing files from the base system. It only works with the temporary rpm-ostree usroverlay.
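As a starting point for such a program, here is a sketch of whiteout handling for the layers that belong to the extension itself. It only removes files that earlier extension layers put into the destination, so the limitation regarding files from the base image still applies. The function name is made up, and the sketch assumes GNU tar.

```bash
# Sketch: apply one layer tarball on top of an already extracted tree,
# honoring whiteout markers. Needs root when writing to /var/lib/extensions.
apply_layer() {
    layer="$1"
    dest="$2"
    # Pass 1: whiteouts only affect files from earlier layers, so handle
    # them before extracting anything from this layer.
    tar -tf "$layer" | while IFS= read -r entry; do
        name="${entry##*/}"
        case "$entry" in
            */*) dir="${entry%/*}" ;;
            *)   dir="." ;;
        esac
        case "$name" in
            .wh..wh..opq)
                # Opaque whiteout: hide everything earlier layers put here.
                if [ -d "$dest/$dir" ]; then
                    find "$dest/$dir" -mindepth 1 -delete
                fi
                ;;
            .wh.?*)
                # Regular whiteout: remove the sibling named after the prefix.
                rm -rf -- "$dest/$dir/${name#.wh.}"
                ;;
        esac
    done
    # Pass 2: extract the real content and skip the marker files.
    tar --exclude='.wh.*' -xf "$layer" -C "$dest"
}
```

In the extraction loop of the script above, the plain tar xf call could then be replaced by a call to this function, run with sufficient privileges.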

All in all, this isn’t too bad. With a bit more investment this workflow could become usable. Can we do better?

Embracing new Operating Systems

Usually rpm-ostree pulls updates/commits from an ostree repository, but did you know that it can also pull OCI images from container registries?

Did you notice we have an OCI image lying around which contains exactly everything we want?

#! /bin/bash

set -ouex pipefail

IMAGE_NAME=$1
OCI_IMAGE_STORAGE="/var/development-images/${IMAGE_NAME}"

sudo rm -rf "${OCI_IMAGE_STORAGE}"
sudo mkdir -p "${OCI_IMAGE_STORAGE}"
podman save --format=oci-archive "${IMAGE_NAME}" | sudo tar -x -C "${OCI_IMAGE_STORAGE}"
sudo rpm-ostree rebase "ostree-unverified-image:oci:${OCI_IMAGE_STORAGE}"
# revert: rpm-ostree rebase fedora:fedora/38/x86_64/silverblue

If we save the script above as rpm-ostree-deploy-image, we can deploy this image with rpm-ostree-deploy-image silverblue-gnome-shell-main and reboot into it.

This is easy and robust! The only thing that’s annoying is that everything here takes a bit of time and requires a reboot. In a lot of cases this is fine. You develop in the toolbox and if a specific case comes up that requires testing in the entire session or under real conditions for a long amount of time, you can build your own OS like this and boot into it. When everything is done we can just rebase onto the upstream silverblue repository using rpm-ostree rebase fedora:fedora/38/x86_64/silverblue.

Can we do even better and get rid of the slow development loop?

Instead of installing gnome-shell directly into the image, we can stop after installing the dependencies and build tools, boot into this image, then build gnome-shell on our new host, install it into /var/lib/extensions and then activate the extension with systemd-sysext refresh. After the initial build of the image the development loop is almost the same as on a traditional system and just as fast.

Let’s start by copying the small helper scripts to ~/.local/bin.

rpm-ostree-deploy-image

#! /bin/bash

set -ouex pipefail

if [ "$#" -ne 1 ]; then
    echo "Illegal number of parameters" >&2
    exit 1
fi

IMAGE_NAME=$1
OCI_IMAGE_STORAGE="/var/development-images/${IMAGE_NAME}"

sudo rm -rf "${OCI_IMAGE_STORAGE}"
sudo mkdir -p "${OCI_IMAGE_STORAGE}"
podman save --format=oci-archive "${IMAGE_NAME}" | sudo tar -x -C "${OCI_IMAGE_STORAGE}"
sudo rpm-ostree rebase "ostree-unverified-image:oci:${OCI_IMAGE_STORAGE}"
# revert: rpm-ostree rebase fedora:fedora/38/x86_64/silverblue

sysext-install

#! /bin/bash

set -eux

if [ "$#" -ne 2 ]; then
    echo "Illegal number of parameters" >&2
    exit 1
fi

BUILDDIR="$1"
EXTENSION_NAME="$2"

DESTDIR="/var/lib/extensions/$EXTENSION_NAME"
RELEASE_DIR="$DESTDIR/usr/lib/extension-release.d"

sudo mkdir -p "$DESTDIR"
sudo meson install --destdir="$DESTDIR" -C "$BUILDDIR" --no-rebuild

sudo mkdir -p "$RELEASE_DIR"
echo ID=_any | sudo tee "$RELEASE_DIR/extension-release.$EXTENSION_NAME"

Adjust the Dockerfile a bit and save it as GnomeShellDevelopment.containerfile (make sure to put it in a directory containing both the gnome-shell and mutter checkouts):

ARG IMAGE_NAME="${IMAGE_NAME:-silverblue}"
ARG SOURCE_IMAGE="${SOURCE_IMAGE:-silverblue}"
ARG BASE_IMAGE="quay.io/fedora-ostree-desktops/${SOURCE_IMAGE}"
ARG FEDORA_MAJOR_VERSION="${FEDORA_MAJOR_VERSION:-38}"

FROM ${BASE_IMAGE}:${FEDORA_MAJOR_VERSION} AS base

ARG IMAGE_NAME="${IMAGE_NAME}"
ARG FEDORA_MAJOR_VERSION="${FEDORA_MAJOR_VERSION}"

# setup dnf
RUN rpm-ostree install -y dnf
RUN dnf install -y 'dnf-command(builddep)'

# install development packages
RUN dnf groupinstall -y 'Development Tools'
RUN ln -s /usr/bin/ld.gold /usr/bin/ld # ???
RUN dnf install -y meson strace gdb valgrind sysprof

# install gnome shell and mutter specific dependencies
RUN dnf builddep -y gnome-shell mutter
RUN dnf install -y libei-devel libeis-devel asciidoc sassc

# dnf won't work on the running system, let's not confuse ourselves
RUN rpm-ostree uninstall -y dnf

RUN rm -rf /tmp/* /var/*
RUN ostree container commit
RUN mkdir -p /var/tmp && chmod -R 1777 /var/tmp


FROM base AS build

# build mutter
ADD ./mutter /tmp/mutter
RUN cd /tmp/mutter && \
    meson setup _container_build/ --prefix=/usr && \
    ninja -C _container_build/ && \
    ninja -C _container_build/ install

# build gnome-shell
ADD ./gnome-shell /tmp/gnome-shell
RUN cd /tmp/gnome-shell && \
    meson setup _container_build/ --prefix=/usr && \
    ninja -C _container_build/ && \
    ninja -C _container_build/ install

RUN rm -rf /tmp/*

Then run the entire Dockerfile to make sure we actually got all the dependencies we need, tag the base stage as silverblue-38-gnome-shell-development, deploy this image, and then finally boot into it.

podman build -f GnomeShellDevelopment.containerfile
podman build -f GnomeShellDevelopment.containerfile --target=base -t silverblue-38-gnome-shell-development
rpm-ostree-deploy-image silverblue-38-gnome-shell-development
systemctl reboot

After the reboot we can then actually build mutter and gnome-shell on the host, install them into the extension with the sysext-install script and make it active with systemd-sysext refresh. Finally we can enable systemd-sysext.service to make it persist across reboots.

cd mutter

meson setup buildhost --prefix=/usr
ninja -C buildhost/
sysext-install buildhost/ gnome-shell-test
sudo systemd-sysext refresh

cd ../gnome-shell

meson setup buildhost --prefix=/usr
ninja -C buildhost/
sysext-install buildhost/ gnome-shell-test
sudo systemd-sysext refresh

gnome-shell --version
sudo systemctl enable systemd-sysext.service

Back to the basics

Did you know that rpm-ostree usroverlay is not the only way to get a mutable overlay on top of the read-only /usr tree? Me neither, until Ivan Molodetskikh pointed this out to me. ostree admin unlock --hotfix gets us a persistent overlay even!

This way we can get around the biggest drawback of using systemd-sysext: we can install directly into the filesystem tree and install-time build system integration such as the gschema compiler will do its job perfectly.

So, instead of installing all the dependencies and build tools into our OS, we just layer dnf on top of it, enable the persistent overlay with ostree admin unlock --hotfix, and start using it like a traditional system. We can install the dependencies with dnf builddep, anything else we might need with dnf install and then build and install everything as usual.

rpm-ostree install dnf
systemctl reboot
sudo ostree admin unlock --hotfix
sudo dnf install 'dnf-command(builddep)'
sudo dnf builddep gnome-shell mutter --allowerasing
sudo dnf install libei-devel libeis-devel asciidoc sassc

cd mutter
meson setup buildhost --prefix=/usr
ninja -C buildhost/
sudo ninja -C buildhost/ install

cd ../gnome-shell
meson setup buildhost --prefix=/usr
ninja -C buildhost/
sudo ninja -C buildhost/ install

The dependencies in the dnf repo and in the silverblue image can be slightly different which might require passing --allowerasing when installing new stuff with dnf.

ostree admin unlock --hotfix creates a new rollback target. To abandon the experiment and roll back to our previous state, we can simply use rpm-ostree rollback -r and clean up our messy deployment with rpm-ostree cleanup -r after successfully booting into the previous, good version.

Tomáš Popela pointed out that you can get dnf even without layering it using rpm-ostree: download and install microdnf, then install dnf with it. This way we can keep our base system clean, don’t have to reboot and don’t accidentally run dnf when /usr is read-only.

#!/bin/sh

set -e

tmp=$(mktemp -d)
cleanup () {
	rm -rf $tmp
}
trap cleanup EXIT

set -x

curl --silent --show-error --remote-name-all --output-dir $tmp \
  https://kojipkgs.fedoraproject.org//packages/microdnf/3.9.0/2.fc38/x86_64/microdnf-3.9.0-2.fc38.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/libpeas/1.34.0/3.fc38/x86_64/libpeas-1.34.0-3.fc38.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/dnf/4.14.0/2.fc38/noarch/dnf-data-4.14.0-2.fc38.noarch.rpm \
  https://kojipkgs.fedoraproject.org//packages/libdnf/0.68.0/2.fc38/x86_64/libdnf-0.68.0-2.fc38.x86_64.rpm

successful=$?

if [ $successful != 0 ] ; then
  exit 1
fi

# the glob must stay outside the quotes, otherwise the shell does not expand it
if ! sudo rpm -i "$tmp"/*.rpm; then
  echo "Can't install microdnf!"
  exit 1
fi

if ! sudo microdnf --assumeyes install dnf 'python3-dnf-plugins*'; then
  exit 1
fi

All in all, this seems to be the best option. It temporarily transforms the Silverblue setup into a traditional, package-based system with a mutable directory tree, which can be rolled back at any point. It’s easy to use, works well and doesn’t require any further kind of trickery.

