09 Mar 2025, 17:13

Developing pkgsrc with git

I stopped developing pkgsrc with CVS.

Quick bit of background: NetBSD is still using CVS as its version control system. The decision to move to something else was taken long ago, but the switch has not happened as of today.

Working with CVS is painful for many reasons. For instance, there is no way to see your local changes without waiting several minutes for a cvs up -n. A full tree update (cvs up) churns for quite a while before it even starts updating any files.

I met Taylor (riastradh@) last year, and he told me about his git-based workflow. I must say I have become a convert! I use the GitHub mirror as my main source tree. Yes, it adds GitHub (and thus Microsoft) as an intermediary, but I don’t mind. Below, I am going to describe my workflow and play through an example.

My setup

I have two pkgsrc source trees:

  • a CVS checkout at ~/pkgsrc-cvs, based on the writable repo and only used for writing
  • a git checkout at ~/pkgsrc

The git checkout is relatively quick when skipping history:

$ git clone --depth 1 https://github.com/NetBSD/pkgsrc

The first thing is to create a local/${hostname} branch. Every time I rebase this branch onto the trunk, I run pkgrrxx -u to update all packages to that state. This gives me a stable base from which to work. For work on packages, I create a new branch (e.g. update/${pkgname}) and delete it when done. Or I do the change directly in the local branch if I am lazy :)

For committing to CVS, I use the git-cvsexportcommit script from the devel/git-perlscripts package.
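
The branch routine described above can be sketched as a small shell function. Treat it as an illustration under assumptions, not as my actual tooling: the name sync_local_branch is made up, the upstream branch is assumed to be called trunk, and trunk is assumed to have been updated with git pull beforehand.

```shell
# Hypothetical helper, not part of pkgsrc or git.
# Creates local/<hostname> from trunk on first use;
# on later runs it rebases that branch onto trunk.
sync_local_branch() {
    branch="local/${1:-$(hostname)}"
    if git show-ref --verify --quiet "refs/heads/$branch"; then
        # Branch exists: replay the local commits on top of the current trunk.
        git checkout -q "$branch" && git rebase -q trunk
    else
        # First run: start the local branch at the tip of trunk.
        git checkout -q -b "$branch" trunk
    fi
}
```

After git checkout trunk && git pull, running sync_local_branch followed by pkgrrxx -u would give the stable base mentioned above.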

Worked example: a package update

Here, I am updating the net/gh package (the GitHub CLI) to a new version. Once I have made my changes, I commit:

$ git commit -a
[local/fedorakumori 668ebe1fa] gh: update to 2.68.1
 3 files changed, 473 insertions(+), 509 deletions(-)

Note the commit hash; we will use it for exporting the commit.

$ cd ~/pkgsrc-cvs
$ GIT_DIR=~/pkgsrc/.git git cvsexportcommit -v -c 668ebe1fa
Applying to CVS commit 668ebe1fa32d778ce19fe81bc3529f15488a048d from parent 7fa1acd873b88c6b36796133c7b692023a611f61
Checking if patch will apply
error: patch failed: net/gh/Makefile:2
error: net/gh/Makefile: patch does not apply
error: patch failed: net/gh/distinfo:1
error: net/gh/distinfo: patch does not apply
error: patch failed: net/gh/go-modules.mk:1
error: net/gh/go-modules.mk: patch does not apply
cannot patch at /usr/pkg/libexec/git-core/git-cvsexportcommit line 338.

There has been some other commit in pkgsrc since my last sync. To resolve, let’s first switch over to a new branch, based on the trunk:

$ cd ~/pkgsrc
$ git checkout trunk
$ git pull
$ git checkout -b update/gh
$ git cherry-pick 668ebe1fa
Auto-merging net/gh/Makefile
CONFLICT (content): Merge conflict in net/gh/Makefile
Auto-merging net/gh/distinfo
CONFLICT (content): Merge conflict in net/gh/distinfo
Auto-merging net/gh/go-modules.mk
CONFLICT (content): Merge conflict in net/gh/go-modules.mk
error: could not apply 668ebe1fa... gh: update to 2.68.1
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
$ cd net/gh
$ git status
On branch update/gh
You are currently cherry-picking commit 668ebe1fa.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Unmerged paths:
  (use "git add <file>..." to mark resolution)
        both modified:   Makefile
        both modified:   distinfo
        both modified:   go-modules.mk

no changes added to commit (use "git add" and/or "git commit -a")

Now I open the three files in my editor and merge, thanks to my years-long routine of doing this. Confusingly, the messages above show that there are two ways to continue the cherrypick: either with git commit or with git cherry-pick --continue. As far as I can tell, they do the exact same thing.

So, after resolving the conflict and running git commit -a, I have a commit with a new hash:

$ git log | head -1
commit 2b51308c74912f41974d2267f58bb35987bde7e8

Of course, it is a good idea to build the package again at this point and verify that it still works.

Let’s commit this one to CVS now. You don’t need to copy the whole hash, just the first 6-8 characters.

$ cd ~/pkgsrc-cvs
$ GIT_DIR=~/pkgsrc/.git git cvsexportcommit -v -c 2b51308c7
Applying to CVS commit 2b51308c74912f41974d2267f58bb35987bde7e8 from parent 44a7067c8ab30c8ec0e58668ec86ed0edaaba1de
Checking if patch will apply
Patch applied successfully. Adding new files and directories to CVS
Commit to CVS
Patch title (first comment line): gh: update to 2.68.1
  cvs commit -F .msg 'net/gh/Makefile' 'net/gh/distinfo' 'net/gh/go-modules.mk'
/cvsroot/pkgsrc/net/gh/Makefile,v  <--  net/gh/Makefile
new revision: 1.90; previous revision: 1.89
/cvsroot/pkgsrc/net/gh/distinfo,v  <--  net/gh/distinfo
new revision: 1.45; previous revision: 1.44
/cvsroot/pkgsrc/net/gh/go-modules.mk,v  <--  net/gh/go-modules.mk
new revision: 1.39; previous revision: 1.38
Committed successfully to CVS

The change is now live in CVS. The normal thing to do now is to add to the changelog. I do this directly in CVS for simplicity.

$ cd ~/pkgsrc-cvs/net/gh
$ bmake changes-entry && bmake changes-entry-commit
=> Updating CHANGES-2025 and TODO
=> Adding the change
=> Committing the change
/cvsroot/pkgsrc/doc/CHANGES-2025,v  <--  CHANGES-2025
new revision: 1.2125; previous revision: 1.2124

This works well for cross-cutting changes, like revbumps, too.


In the beginning, I alluded to some of the pain points with CVS. But the git workflow also enables me to do some new things. In CVS, working copies are expensive, so you use the same one each time. It has happened to me more than once that I committed something I did not intend to because I had the change lying around in my tree, uncommitted.

Uncommitted changes are liabilities.

In a DVCS tree, you can commit everything right away and upstream when you are ready to. This means that you write change descriptions while you still remember what you did :) For a cross-cutting or independent change, you can just “reset” your tree to a clean state in a few seconds. This is a great thing.

19 Jan 2025, 21:46

"Founder Mode"

My current work project started last summer, as a bit of experimentation. A few of us sat together in a room and started writing down a hypothetical piece of configuration. Within less than a week, we had actually written a prototype-quality piece of software accepting exactly the configuration we had brainstormed.

A few months later however, this project went through a difficult phase where we realized that we actually needed to write down a plan for bringing the software to a stable, usable state. This ended up a painful couple of weeks of planning ahead and realizing that our goal of shipping this was slipping.

I had a conversation about this with my Engineering Director the other day, and I thought what he told me was intriguing. He said that we were struggling with going from what he called “founder mode” to an actual planned development process.

As an analogy, imagine you are painting a wall green. In the beginning, you can just put paint wherever you like. After a while, you can take a break and proudly exclaim that you already painted 30% of that wall. But sooner or later, you will have to check carefully where there is no paint yet. And you will need to go with a smaller brush, so that you can paint the corners properly.

In software development, some people can only do the first phase. They have awesome ideas and they can churn out an early version in record time, maybe even working nights. But they are not good at doing the other 90%: bug fixes, support, identifying areas for improvement, and continual maintenance.

Some other people can only do that second phase. They are very good at maintaining something, sometimes for years, but they might not have the sort of idea to jump-start something completely new. Of course, that’s perfectly fine.

And some people (hopefully) can do both, but they might need to be reminded when it is time to switch to the refinement phase :)

As a leader, if you have people on your team who are the perpetual “founders” (startup, open source, big tech, it doesn’t matter), there is a trick you can pull to keep development going while not antagonizing that “10x engineer”. The trick is to get them interested in some new, shiny problem at the right moment. They might fall in love with solving the next problem and come back wanting to hand off their current thing to someone who will finish building it. And thus, they can do their magic one more time. Even better is when the people who are part of this rochade don’t even notice it happening. –

… Now that I wrote the previous paragraph, I actually wonder if one of my leads has done that with me before, and if so, how many times it has happened!

23 Dec 2024, 11:41

Emulating *BSD on ARM, Part 3: OpenBSD

This is part 3 of my blog post series about emulating BSD operating systems for 32-bit ARM with QEMU. Buckle up, today we will need to do an actual OS installation!

In OpenBSD/armv7, the miniroot image is an installer, so we also need a new empty drive image to install to. I recommend the qcow2 format, since it consumes only the space that is actually occupied. The 10G image created below is only 192 kilobytes initially. Here is how you create the root image:

qemu-img create -f qcow2 root.qcow2 10G

Next, we will need the pflash0.img and pflash1.img files created in the previous part.

As for the install image, there are several choices on the download page for OpenBSD 7.6/armv7. I went with miniroot-cubox-76.img but I don’t think it matters much which one you choose. They only differ in the platform-specific bootloader, and we will be booting from UEFI instead anyway.

Note that when booting, you might get an error message complaining that the size of the image is not a power of two. If that happens, just resize it to the next larger power of two:

qemu-img resize miniroot-cubox-76.img 64M
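
If you would rather compute the target size than guess it, the rounding is easy to script. This is a minimal sketch; next_pow2_mib is a made-up helper name that takes a size in bytes and prints a size in MiB.

```shell
# Hypothetical helper: print the smallest power of two (in MiB)
# that can hold an image of the given size in bytes.
next_pow2_mib() {
    bytes=$1
    mib=$(( (bytes + 1048575) / 1048576 ))   # round up to whole MiB
    size=1
    while [ "$size" -lt "$mib" ]; do
        size=$(( size * 2 ))
    done
    echo "$size"
}

# Usage (not run here): resize the installer image in place.
#   img=miniroot-cubox-76.img
#   qemu-img resize "$img" "$(next_pow2_mib $(wc -c < "$img"))M"
```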

Running the installer

Here is the command line to launch the installer then:

qemu-system-arm \
	-M virt \
	-m 1024 \
	-nographic \
	-drive file=miniroot-cubox-76.img,format=raw \
	-drive file=root.qcow2,format=qcow2 \
	-drive file=pflash0.img,format=raw,if=pflash,readonly=on \
	-drive file=pflash1.img,format=raw,if=pflash \
	-device virtio-gpu-pci \
	-nic user,model=rtl8139

This time, we don’t have to type anything in the UEFI shell: it loads the bootloader directly, followed by the OpenBSD installer! At the boot> prompt, just press Enter.

Booting OpenBSD from UEFI

I won’t go over the steps to install OpenBSD itself. The only thing to keep in mind is that OpenBSD calls the boot image (the miniroot) sd0 and the root disk we created sd1, so be sure to install onto sd1.

As always, you can get out of QEMU and back to the terminal by “powering down”, e.g. running halt -p from a shell.

After installation

To boot your newly installed system, use the exact same QEMU command line as before, except that you remove the miniroot disk entry:

qemu-system-arm \
	-M virt \
	-m 1024 \
	-nographic \
	-drive file=root.qcow2,format=qcow2 \
	-drive file=pflash0.img,format=raw,if=pflash,readonly=on \
	-drive file=pflash1.img,format=raw,if=pflash \
	-device virtio-gpu-pci \
	-nic user,model=rtl8139
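
Since the two command lines differ only in the installer drive, they can be generated from one place. The following is a sketch, not a script I actually use; openbsd_qemu_args is a hypothetical helper name, and it only emits options that already appear in this post.

```shell
# Hypothetical helper: print the QEMU arguments from this post, one per line.
# With an argument, the installer image is attached as the first drive;
# without one, only the installed root disk is attached.
openbsd_qemu_args() {
    printf '%s\n' -M virt -m 1024 -nographic
    if [ -n "${1:-}" ]; then
        printf '%s\n' -drive "file=$1,format=raw"
    fi
    printf '%s\n' \
        -drive file=root.qcow2,format=qcow2 \
        -drive file=pflash0.img,format=raw,if=pflash,readonly=on \
        -drive file=pflash1.img,format=raw,if=pflash \
        -device virtio-gpu-pci \
        -nic user,model=rtl8139
}

# Usage (not run here):
#   qemu-system-arm $(openbsd_qemu_args miniroot-cubox-76.img)   # installer
#   qemu-system-arm $(openbsd_qemu_args)                         # installed system
```

The unquoted command substitution relies on none of the arguments containing spaces, which holds for the file names used here.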

A dmesg output of this freshly installed system can be found here. Notably, it looks like the virtio GPU is not supported, so I am not sure if you can get X11 to work.

When I did this installation, a short time after the OpenBSD 7.6 release, there were no binary packages available for 32-bit ARM. But it looks like this has changed! There is a full set of packages in https://cdn.openbsd.org/pub/OpenBSD/7.6/packages/arm/ now, so that a command like

pkg_add vim

will just work. That’s nice!

Some thoughts on machine types and emulation

When I started fiddling with this, I tried to get qemu to emulate a real machine corresponding to one of the OpenBSD miniroot images. But that’s fairly limiting: these boards came with too little RAM, too slow storage, too few cores, all of it.

The nice property of an emulator like QEMU is that you can kind of build your own machine from scratch. The virt machine type is a generic thing that you can build out just like you want it. Want 8G of RAM? More cores? No worries!

You can add a bunch of virtio devices that are virtualization-aware; if one of them doesn’t work right, replace with some other emulated hardware. This is also why my command lines above use an rtl8139 network device instead of a virtio-net.

And for booting this contraption, using the EDKII UEFI firmware allows just booting a generic kernel and bootloader, which is supported almost everywhere.

And that concludes my series of blog posts on emulating ARM systems with QEMU. I hope you enjoyed it!

For reference, the other parts in the series are Part 1: Introduction and Part 2: FreeBSD.

11 Nov 2024, 20:52

Emulating *BSD on ARM, Part 2: FreeBSD

In Part 1 of this blog post series, I explained how I recently spent some time getting various BSD OSes to run on QEMU, for 32-bit ARM (ARMv7). This part deals with FreeBSD.

Spoiler: it was easier than the others.

I started by downloading an image from https://download.freebsd.org/releases/arm/armv7/ISO-IMAGES/. Mine was called FreeBSD-14.1-RELEASE-arm-armv7-GENERICSD.img.xz.

Preparing UEFI

Warner Losh has thankfully written a blog post about how to get it running in QEMU, which we are going to follow.

In the post, he explains that QEMU contains a version of Tianocore EDK2 for 32-bit ARM. If you don’t know about Tianocore, it is a UEFI firmware – similar to the BIOS of a PC, except it’s not built in. It abstracts away the low-level hardware bits and presents a standardized interface to the user and the bootloader.

So let’s create two virtual flash devices – one read-only containing EDK2, one r/w for any UEFI variables that should be saved. The read-only one can be shared across VMs; for the other one, you should keep a copy per VM.


dd if=/dev/zero of=pflash0.img bs=1m count=64
dd if=/dev/zero of=pflash1.img bs=1m count=64
dd if=/opt/pkg/share/qemu/edk2-arm-code.fd of=pflash0.img conv=notrunc
dd if=/opt/pkg/share/qemu/edk2-arm-vars.fd of=pflash1.img conv=notrunc

Here, QEMU is installed through pkgsrc in /opt/pkg. If your installation is in a different directory, you may have to adjust the paths.
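
If you are not sure where your QEMU keeps the firmware files, a short loop can look for them. This is only a sketch: find_edk2_dir is a made-up name, and every prefix other than /opt/pkg in the usage comment is a guess at common install locations, not something I have verified.

```shell
# Hypothetical helper: print the first <prefix>/share/qemu directory
# that contains both 32-bit ARM EDK2 images used above.
find_edk2_dir() {
    for prefix in "$@"; do
        dir="$prefix/share/qemu"
        if [ -f "$dir/edk2-arm-code.fd" ] && [ -f "$dir/edk2-arm-vars.fd" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Usage (not run here; all prefixes except /opt/pkg are assumptions):
#   fw=$(find_edk2_dir /opt/pkg /usr/pkg /usr/local /usr) || echo "EDK2 not found"
#   dd if="$fw/edk2-arm-code.fd" of=pflash0.img conv=notrunc
```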

Creating the QEMU startup script

QEMU is fully configured through command-line options. Thus, the command lines are long and difficult to get just right. QEMU command lines are typically passed on from user to user, like a cargo cult. Here is mine for FreeBSD-ARM:


qemu-system-arm \
	-M virt \
	-m 4096 \
	-drive file=pflash0.img,format=raw,if=pflash,readonly=on \
	-drive file=pflash1.img,format=raw,if=pflash \
	-drive file=FreeBSD-14.1-RELEASE-arm-armv7-GENERICSD.img,format=raw,if=virtio,cache=writethrough \
	-device virtio-net,netdev=net0,mac="52:54:00:12:34:55" \
	-netdev type=user,id=net0 \
	-device virtio-gpu-pci \
	-usb -device nec-usb-xhci \
	-device usb-kbd -device usb-mouse

EFI wrangling

Upon launch, we get a UEFI shell prompt:

Welcome to the UEFI shell! Remember DOS?

The EFI shell (here is a primer from Intel) is strangely reminiscent of a DOS prompt. First, set the current device by typing


You can now explore the contents with dir and change directories with cd. To boot, run this command:


To get this setting to stick in future reboots, create a startup.nsh file – the EFI equivalent to autoexec.bat – on the running system, as root:

echo 'fs0:\efi\boot\bootarm' > /boot/efi/startup.nsh

That’s it!

The FreeBSD/arm system from the image is immediately ready for action! No installation is needed. I was able to install some binary packages for armv7 simply by running

pkg update
pkg install python311 go123

All in all, this FreeBSD experience was about as frictionless as it gets, once you have cleared the hurdle of getting QEMU to run with the right settings.

You can quit the emulator and go back to the host shell by running halt -p as root.

In the next installment, we are going to install OpenBSD/armv7.

28 Oct 2024, 20:30

Emulating *BSD on ARM, Part 1: Introduction

In my copious spare time, I maintain the Go CI system for certain platforms. These days, Go uses LUCI, the same CI pipeline that Chromium is using.

My current rabbit hole is making the “swarming bot” work on NetBSD/arm – that’s 32-bit ARM, not aarch64. When building Go code for 32-bit ARM, the GOARM environment variable can be set to the instruction set version, i.e. GOARM=6 (ARMv6, the default) and GOARM=7 (ARMv7). One of the things that ARMv7 has that v6 doesn’t is atomic synchronisation instructions. At least on BSD systems, Go needs those for multi-core systems (which is more or less all systems these days). Thus, ARMv6 Go binaries exit on startup if run on a system with multiple cores.

The Chromium CI system used to only support ARMv6 – they call it armv6l, for little-endian – not ARMv7. This meant that the whole thing was pretty much broken on BSD on ARM.

So I added code for netbsd-armv7l in the LUCI swarming bot. Again, that’s the LUCI platform name, Go calls this netbsd-arm. The feedback I got was basically “Great, now can you also do FreeBSD and OpenBSD?”

Time to install some VMs to find out what they use as uname, or rather what Python’s platform module returns for them. It turns out the three OSes are wildly inconsistent in this!

As a spoiler, here is what I found out:

OS              platform.machine()   platform.processor()
FreeBSD/armv7   arm                  armv7
OpenBSD/armv7   armv7                arm
NetBSD/evbarm   evbarm               earmv7hf
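
To make the inconsistency concrete, here is how a check for "this is 32-bit ARMv7 on a BSD" could look if it had to cover all three rows of the table. It is a sketch in shell rather than the actual LUCI code, and is_bsd_armv7 is a hypothetical name; the two arguments stand for the machine and processor values from the table.

```shell
# Hypothetical check: succeed if the (machine, processor) pair matches
# one of the three BSD ARMv7 combinations from the table.
is_bsd_armv7() {
    case "$1/$2" in
        arm/armv7|armv7/arm|evbarm/earmv7hf) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (not run here):
#   is_bsd_armv7 evbarm earmv7hf && echo "treat as armv7l"
```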

In the next posts, I will detail how I got these platforms to run in qemu.

27 Aug 2024, 11:46

The code is not enough

In my job, I read a lot of code. I read more code than I write. I suppose that’s true for many engineers at the senior level or above.

For instance, a new piece of code integrates with a library, or with another code base, and I want to understand how the integration works. Or I am the supplier of the infrastructure/library/framework and I need to debug someone’s (mis-)use.

  • Why does this not compile?
  • Why does it compile but then crash on startup with a cryptic error message?
  • Why does it seem to work but then all requests return the same error?


Code is nothing without the corresponding configuration, particularly when you are looking at builds or deployments.

For instance, the code may contain a bunch of #ifdef preprocessor macros. Which of the paths is taken and which one is not depends on the settings of the build. So to understand what’s happening, you need the actual build settings. In the case of autotools, that’s config.status – which also contains a ton of noise, but you can find the list of variables easily enough. Or you look at config.h, which specifically has preprocessor defines.
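
As a small illustration of the config.h route, the macros that are actually defined can be listed with a line of awk. This is a sketch with made-up helper names (defined_macros, has_macro); it assumes the usual generated layout where active settings are #define lines and inactive ones are commented-out #undef lines.

```shell
# Hypothetical helpers for inspecting a generated config.h.

# Print the names of all macros that are defined, sorted.
# Handles both "#define NAME" and "# define NAME".
defined_macros() {
    awk '$1 == "#define" { print $2 }
         $1 == "#" && $2 == "define" { print $3 }' "$1" | sort
}

# Succeed if the given macro is defined in the given config.h.
has_macro() {
    defined_macros "$1" | grep -Fqx -- "$2"
}

# Usage (not run here):
#   has_macro config.h HAVE_UNISTD_H && echo "the #ifdef HAVE_UNISTD_H path is taken"
```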

But that’s not the only type of configuration. When a program is running, its behavior changes based on command-line flags, or the contents of a configuration file. Or some settings (like secrets) are passed in from an environment variable, or looked up in some sort of secret storage.

Sometimes, command-line flags contain a whole configuration message, e.g. a bit of JSON as the flag value. I have seen this in libraries that control authentication requirements – i.e. whether a handler is public, or needs a password, or needs some kind of strong user identity. If you are investigating why no one seems to be able to call your HTTP endpoint, that’s where you need to start.

What and why

The point I was trying to make though is this: when reading code, the two main questions are what and why:

  1. What is the code doing?
  2. Why is it doing it?

I found that you need to understand both aspects to understand some code. As for #1, sure you can start looking at source files in less but you can do better.

I personally find working cross-references super useful. As an example, you see that some code calls AddPeer, and you want to know what it does, so you jump to its definition. That particular function adds an entry to a table. But what uses this table? So you highlight that identifier and look for usages.

Getting cross-references within a code base is not hard. For C code, you can use ctags, which is decades old. Both vim and emacs support these.

The modern approach is an LSP server interacting with your text editor. Neovim for example has LSP support built in, as does VS Code. There are LSP implementations for all sorts of programming languages. For instance, for Go code, there is the excellent gopls.

The second question, the why, is best answered from the history of the code. It is vastly better to read the code with history available – e.g. looking at a VCS checkout rather than an extracted release tarball.

When reading a bit of code, wondering why it is doing a particular thing, you can start from the “blame” or “annotate” layer. Some folks have called the existence of this layer a mistake – I disagree! Here is an example from the NetBSD kernel, an excerpt of the result of running cvs annotate sys/dev/pci/pcireg.h:

1.1          (mycroft  09-Aug-94): /*
1.60         (jakllsch 17-Aug-09):  * Size of each function's configuration space.
1.60         (jakllsch 17-Aug-09):  */
1.60         (jakllsch 17-Aug-09):
1.124        (msaitoh  28-Mar-17): #define	PCI_CONF_SIZE		0x100
1.124        (msaitoh  28-Mar-17): #define	PCI_EXTCONF_SIZE	0x1000
1.60         (jakllsch 17-Aug-09):
1.60         (jakllsch 17-Aug-09): /*
1.1          (mycroft  09-Aug-94):  * Device identification register; contains a vendor ID and a device ID.
1.1          (mycroft  09-Aug-94):  */
1.124        (msaitoh  28-Mar-17): #define	PCI_ID_REG		0x00
1.1          (mycroft  09-Aug-94):
1.3          (cgd      18-Jun-95): typedef u_int16_t pci_vendor_id_t;
1.3          (cgd      18-Jun-95): typedef u_int16_t pci_product_id_t;

It shows who last touched a line and when. You see that adjacent lines might have very different dates!

From this, you can go back to the commit that made the change, look at its log message and the other files that are part of the same commit. If your commit log messages are good, they tell a story. They give the context needed to understand the “why”.
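
For a long file, it helps to first boil the annotate output down to the distinct revisions involved. Here is a sketch that does this with awk; annotate_revisions is a made-up name, and it assumes the column layout shown in the excerpt above (revision, author in parentheses, date).

```shell
# Hypothetical helper: read "cvs annotate" output on stdin and print each
# distinct "revision author date" triple once, in order of first appearance.
annotate_revisions() {
    awk '{
        rev = $1
        author = substr($2, 2)            # drop the leading "("
        date = $3
        sub(/\):$/, "", date)             # drop the trailing "):"
        key = rev " " author " " date
        if (!(key in seen)) { seen[key] = 1; print key }
    }'
}

# Usage (not run here):
#   cvs annotate sys/dev/pci/pcireg.h | annotate_revisions
#   cvs log -r1.124 sys/dev/pci/pcireg.h   # then read the message of one revision
```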


Combining cross-references and a history panel is a great way to read and understand code. Many IDE developers understand this and offer a view just like this.

An exemplary UI for this approach is Google Code Search. For example, look at mutex.go from the Go standard library. All identifiers are clickable and serve as cross-references. The “History” panel at the bottom offers all the commits where something was changed.

A similar web UI (but open source) is OpenGrok. Look at the pcireg.h example in NXR, the NetBSD OpenGrok instance. The file history is a separate page but that works too. What’s nice about OpenGrok is clicking on a revision in this view shows the blame layer at that version, so you can see who edited the line before.

These tools really provide a lot of added value when reading and understanding code. Use them.

07 May 2024, 22:21

Fedora Update: btrfs self-destruct

A while ago, I installed Fedora Asahi Remix on my M2 MacBook Air, and I was very positive about it. So positive, in fact, that I ended up making it the default partition in the bootloader. I haven’t used macOS in weeks. But then, a few days ago, something weird happened:

In the middle of some development work, running cvs update on the pkgsrc repository, the screen suddenly filled with a bunch of “read-only file system” errors. It turned out that both / and /home had remounted themselves read-only, without my intervention.

Running journalctl --system showed some warning messages in an alarming orange-red: btrfs, the file system that the system was running from, had found inconsistencies and had chosen to remount itself read-only. Now what?

I tried running btrfsck /dev/nvme0n1p6. Now, by default, btrfsck will only check, not repair. But even the check aborted with an error about “errors found in FS roots”.

From my BSD experience, I thought booting into single-user mode and running fsck with a r/o root file system would do the trick. A quick init S later … it turns out that you can only get a shell in single-user mode if you have a root password set. By default, logins as root are forbidden, even in single-user mode. But good luck setting a password while / is read-only!

After several reboots, I got the system to work r/w for a while, so I could set a root password. So within single-user mode, I ran

# btrfsck --repair --force /dev/nvme0n1p6

Decent idea, but like the check before, the repair option also aborted with an error, saying “parent transid verify failed”. According to some random search results, this means that the file system is likely damaged beyond repair. Or, as @ParadeGrotesque aptly remarked on Mastodon:

It’s dead, Jim.

Don’t blame me, I don’t make the rules.

So in the end, I managed to back up my home directory with Restic (good thing I had restic installed before!), then wiped the entire installation and re-installed. Bummer.

More on btrfs

Why is btrfs the default file system in the first place? The Fedora Asahi Remix installer doesn’t even let you choose the file system type! I wish it did!

The way I understand btrfs is that the developers are aiming for a feature set on par with ZFS, except fully under the GPL and more “Linux-native” – ZFS was originally developed for Solaris, after all, and its licensing situation is at least a bit dodgy.

btrfs has long had a reputation of playing fast and loose with your data. In the early days, I heard several stories of spectacularly losing data. People have assured me that those days are over, and that btrfs is a reliable system these days.

Honestly, this experience has shown me that btrfs, even today, is anything but reliable. The corruption I witnessed is surely not due to broken hardware. The hardware is absolutely fine.

I suspect that the root cause is probably several unclean shutdowns. Asahi Linux does have an unfortunate bug where it will consume too much battery when in standby, so I ran out of battery several times. And, to be fair, finding the corruption and going read-only is sensible. But what I find absolutely infuriating is the utter inability of its fsck to actually repair anything. The fs roots are damaged, so what? There are incorrect transids, so what? Telling me my FS is fucked and I need to recreate it from scratch is not useful!

As it is, what happened with btrfs made me recall the previous low point of Linux file systems: years ago, I ran a file server with ReiserFS in production. It corrupted itself to such a degree that reiserfsck kept segfaulting. We lost so much user data. Good times.

31 Mar 2024, 20:32

The XZ Backdoor

Over the Easter weekend 2024, there was a big kerfuffle around a compression tool named xz. Honestly, the story is so amazing that it could be a gripping novel. In fact, what happened is not dissimilar to the book $ git commit murder by Michael Warren Lucas.

A PostgreSQL developer, Andres Freund, was running some benchmarks and trying to reduce the noise from other programs on the system. While doing so, he noticed that sshd would take more CPU than expected during logins, for about 0.5 seconds at a time.

Side note: The reason that this came up at all is that any system exposed to the internet gets regular failed login attempts via ssh, from potential attackers and security scanners of some sort.

So Andres starts looking into where the CPU time is spent and discovers that most of it is in a part of liblzma (the compression format library) and not covered by a symbol – i.e. without an indication in which function it is. This is very suspect. Digging further, it turns out that sshd is backdoored through the installation of xz version 5.6.0 or 5.6.1!

At this point, I highly recommend heading over to https://www.openwall.com/lists/oss-security/2024/03/29/4 and reading Andres’ original investigation posting. Below, I want to summarize some of the more salient points around this story.

Why does OpenSSH even depend on xz?

Normally, it doesn’t. However, several Linux distributions patch sshd to integrate systemd notifications. The systemd library that does this also happens to link against liblzma from the xz project. So perhaps the folks complaining about how systemd is so pervasive in modern Linux systems do have a point.

Who wrote the backdoor?

The malicious code has not been written by the original author and maintainer of xz, Lasse Collin. In fact, he appears to be the victim here.

The backdoor was inserted by a second xz maintainer named Jia Tan, who joined the project several years ago. The circumstances of him joining are a story that happens all over open source projects:

At some point, the main development of xz was more or less done. As lcamtuf argued, ongoing maintenance of a stable project “is as interesting as watching paint dry”. So I imagine Lasse struggling with motivation, then someone coming along and offering to become the co-maintainer. They show up with some patches ready to merge. At the same time, several people put pressure on Lasse, saying that “patches sit forever in the queue”, that they are doing a bad job and that this person should just be admitted.

Note that this was years before this backdoor.

They then do the following:

  • Jia Tan creates https://github.com/tukaani-project containing Lasse and themselves. The original xz code was hosted off GitHub on git.tukaani.org.
  • The homepage moves from Lasse’s server to a subdomain hosted on GitHub Pages, so that Jia Tan has full control over what’s on the page.
  • A workaround for a Valgrind failure is added to various distributions. Apparently, the Valgrind failure is caused by code that was added in preparation for the backdoor.
  • Jia Tan adds plenty of test files (binary archives) with innocent-sounding names to the repo. Curiously, some of them are not actually used in tests.
  • Jia Tan creates the official release tarballs on their machine.

The last two points are important for how the backdoor is hidden.

xz versions 5.6.0 and 5.6.1, both containing the backdoor, are released in February and March 2024, apparently while Lasse Collin is on vacation.

How is the backdoor hidden?

The payload is effectively contained in some of the binary test archives added to the repository before. But the way they are enabled is amazingly subtle.

In open source projects using Autotools, it is common to not check in generated files (like the configure script) to the repository. After all, they can be generated at any time, right? So what happens is that the maintainer of the code (the perpetrator in this case) checks out the source at a certain tag, runs autoreconf and creates the release archive from the result (e.g. using make dist, which is part of automake). So whatever changes they do to their local copies of autoconf and/or its macros is not checked in to source control! If you look into the git repo, the code enabling the backdoor is not there at all!
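
One way to see this gap for yourself is to compare an extracted release tarball against a checkout of the corresponding tag. The following is a sketch with a made-up helper name (only_in_tarball); it only compares file names, not contents, and generated files such as configure will legitimately show up in the output, which is exactly the noise that makes this hiding place effective.

```shell
# Hypothetical helper: list files that exist in an extracted release tarball
# ($2) but not in a git checkout of the same version ($1).
only_in_tarball() {
    list_git=$(mktemp)
    list_tar=$(mktemp)
    (cd "$1" && find . -type f ! -path './.git/*' | sort) > "$list_git"
    (cd "$2" && find . -type f | sort) > "$list_tar"
    comm -13 "$list_git" "$list_tar"      # lines unique to the tarball listing
    rm -f "$list_git" "$list_tar"
}

# Usage (not run here; directory names are placeholders):
#   only_in_tarball ~/src/project-git ~/src/project-1.2.3
```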

What does the backdoor do?

On a Linux x86 system (and only on Linux x86), it hooks into the signature verification routines used when checking the credentials of someone trying to log in via ssh. Exactly how it works is still not fully understood as of this writing. According to preliminary analysis by Filippo Valsorda, the backdoor becomes active when the login attempt contains a certain client key (I was not even aware that a client key is sent in ssh logins) and runs some code by calling system. So it appears to be Remote Code Execution, a Pre-Auth Bypass, or both.

In this context, it is probably not a coincidence that some folks saw a much larger number of failed ssh login attempts over the internet:

xz 5.6.0 was released 2024-02-24, around one month ago, and it contained a backdoor.


There has been an incredible amount of SSH attacks on my server since early 2024. To give you an idea:

root@udon:~# grep -ic 2024 /etc/hosts.deny

root@udon:~# grep -ic 2023 /etc/hosts.deny

Who is Jia Tan?

In short, we don’t know. It is probably a fake identity, just like the sock puppet accounts that were used to pressure Lasse to admit Jia into the project.

The back story is that they are Chinese and living in California. There is some speculation that they were legitimate and a second person took over their identity. But this is not confirmed, and I think it is unlikely.

Jia Tan’s GitHub account is still active, though the xz-related repos have been blocked. This does not make it easier to follow along the technical write-ups, or to check out the source for ourselves.

Jia also made 9 commits into private repositories in the last month. I wonder what those are?

In the meantime, Lasse found at least one other malicious commit in the xz source, disabling the use of Landlock sandboxing on Linux. Look at that diff and think about whether you would have spotted this in code review!

Jia also made suspicious commits to other projects, such as libarchive. Again, some of this suspicion may be overblown, but we don’t know.

Would SBOM have prevented this?

Haha lol no.

These were normal-looking releases, uploaded and signed by the regular release manager. If anything, it is helpful to be aware that a larger product contains a version of xz with the backdoor.

What are the consequences?

This could be a watershed moment for open source development. Open source development is currently a high-trust environment, and I fear that suspicion about this case is going to color all sorts of future interactions.

The thing is, all the interactions that Jia Tan had look absolutely legit and standard for open source. It is completely normal that someone who you never met shows up and offers some help and patches. Maybe you meet them years later at FOSDEM, or something. Maybe you never do.

For instance, in NetBSD, there are dozens of people who contribute under pseudonyms at this very moment. To become a developer, we require that another developer meet you in person and sign your PGP key. Maybe that’s a good idea after all.

But for instance, multiple times, I have shown up in some other open source project with a fully formed patch, saying “Here is some code to support my OS or my architecture. You probably cannot verify it as you don’t have this machine. Please apply it, I promise it’s fine.” Until now, I assumed that they would do just that, and was annoyed when they didn’t. But now? They don’t know if I am trying to insert a backdoor.

In the end, this is probably going to introduce more friction and more suspicion of contributors’ motivations into the distributed, open development process. This is bad news for all users of Free software. I would hate it if this destroys the trust-first (you might say naive) culture in open source development.

But let me also offer two more hopeful takes.

  • Diversity works. The backdoor targets the majority platform for servers: Linux, with systemd, on x86. Running NetBSD or FreeBSD, or perhaps another architecture such as ARM, prevents the attack from working. So by virtue of being on a less common platform, you can lower your risk. This is a basic economic argument: just like other closed source vendors (hah!), exploit writers primarily target the majority.
  • Finally, the open source community did not actually do too badly in this! Sure, it was found by a stroke of pure luck. But the malicious code had been in the wild for only about a month. And it had not made it into long-term support releases of major distributions, where it would have made it into a ton of server systems. Compared to the lead time for this attack of 3-4 years, this ended up not being a very successful operation by whichever organization is behind this. So in sum, we got away okay this time.

03 Feb 2024, 10:21

Fedora Asahi Remix

I have been following Asahi Linux for a while. Linux for my MacBook Air M2 – sure, why not? But I wasn’t particularly interested in a distribution based on Arch Linux.

In late 2023, the Asahi folks presented a new distro that they called Fedora Asahi Remix. The promise is to combine the ground-breaking kernel development of Asahi with the polish of Fedora Linux. I thought I would give it a go.

tl;dr for the rest of this post: the installer feels very hackish but once installed, it’s great!


The installer follows the infamous curl | sudo bash paradigm. I don’t trust this sort of thing so I took a look at what is downloaded: it is mostly a launcher for the rest of the installation, which is also a collection of shell scripts. It turns out that Fedora has a graphical installer, which appears very late in the installation routine, for about two minutes; the rest happens in the terminal.

The first thing the installer does is shrink the macOS partition to make space. You choose how much to give it – I chose a roughly 60:40 split. Aside: technically, it’s not a partition but more of a pool, as APFS is similar to ZFS in its interface.

Step number two is really the special sauce of Asahi Linux: installing a UEFI environment, the U-Boot bootloader and making it bootable from the boot selector screen, including some firmware setting changes.

Unlike a PC, an ARM Mac has a specialized firmware which is made to boot macOS and nothing else. On other types of ARM machines such as the Pinebook Pro, U-Boot (the Universal Bootloader) takes care of running the OS. On the Mac, Asahi sets some firmware variables that allow this behavior as well. The Linux boot partition needs to be “blessed”.

Here is where the most hackish part of the installation happens: the installer reboots into the macOS recovery system. Before it does that, it prints instructions asking you to run a series of commands inside the rescue system and accept the warnings that it prints. I don’t think you can brick your machine if you do it wrong but I would not give this to someone who is not good at this computer thing.

Scary installer message

The next reboot is the first time that the machine is actually running a Linux kernel. The graphical Fedora installer appears and does a bit of configuration. That’s it.

After installation, my system still boots into macOS by default. To boot Fedora, I hold the power button in the first boot phase, so that the firmware shows the boot selector screen. Then, the options are macOS, Fedora and the macOS rescue system.

Using the system

This is my first time using Fedora. The only RPM-based distro that I used in the past was SuSE, about 25 years ago. But from afar, Fedora always seemed to be fairly polished. So I was curious.

There are several flavors of desktops that can be selected in the installer: KDE, GNOME, or no desktop / do-it-yourself. The documentation says that KDE is the most polished option, so I went with KDE Plasma. Again, many years have passed since I last used KDE.

KDE still feels cluttered to me, as if its designers think what power users want is lots of buttons in lots of toolbars everywhere. But luckily, you can disable them and gain some space.

On the other hand, the UI in general (based on Wayland) looks gorgeous on the 2x Hi-DPI screen of the MacBook Air! I have struggled to configure Hi-DPI X11 desktops properly in the past (both on NetBSD and Debian), but Fedora has really nailed the setup out of the box. Kudos!

Another thing I noticed is the amount of updates. Fedora’s equivalent to apt used to be called yum and is now dnf, though there is a symlink – yum update still works. Unlike apt, where refreshing the package lists and upgrading are two separate steps, dnf update both fetches package descriptions and updates packages.
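Side by side, the difference looks roughly like this. This is only a sketch: update_commands is a made-up helper that prints the usual command lines instead of running them, so it is safe to paste anywhere.

```shell
# Hypothetical helper: print (not run) the commands that bring a system
# fully up to date with the given package manager.
update_commands() {
    case "$1" in
        apt) echo "sudo apt update && sudo apt upgrade" ;;  # two steps
        dnf) echo "sudo dnf update" ;;                      # one step
        *)   echo "unknown package manager: $1" >&2; return 1 ;;
    esac
}

update_commands apt
update_commands dnf
```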

New updates are added all the time. As I quipped on Mastodon the other day:

I have not booted this Fedora system in a few days, so 750 MB of updates it is

This was literally the amount of updates I had at one point. I am not sure if this is good or bad. I guess it’s not a big deal unless you are concerned about the amount of SSD writes, or the speed of your internet connection (FTTH FTW!).

Installing more software

  • Bootstrapping pkgsrc went fine, so that’s another 35000 packages at my fingertips :)
  • I tried installing Steam but it seems like there are no Linux/aarch64 packages available. This works better on macOS with the Game Development kit doing some kind of ad-hoc x86 emulation, though I was able to make it work exactly once and it stopped working after a reboot.
  • I wanted Visual Studio Code (don’t judge me!), and the graphical software catalog proposed a Flatpak version. As I quickly discovered, the VS Code Flatpak is useless, at least for my purposes. I suppose you can make it work with a generous sprinkling of Dev Containers, but I don’t want to.

Hardware support

Almost everything works! I cannot overstate what a big deal this is, and I expected the experience to be much less refined in that regard. The function keys do the right thing, you can control brightness and keyboard backlight, suspend/resume works perfectly, etc.

Using Fedora on this hardware feels super snappy. File system operations are shockingly fast. Compiling software seems much faster than on macOS, though I did not benchmark this. My hunch is that the process management is a bit slower in macOS, due to the Mach / XNU underpinnings. 3D graphics performance is good too.

I only found three minor caveats in my testing:

  • External displays over USB-C do not work, which means I cannot do my conference presentation from Linux. This is a known limitation.
  • The system uses more power while it is suspended. In macOS, you can close the lid and have almost no battery use during that time. On Linux, the battery is empty after a few days.
  • In general, battery life is a bit worse but still amazing. This may be due to me running more demanding workloads while on Linux. I did not do a scientific comparison.


I really like it!

I might find myself using Linux as my main OS on the MacBook – though in the last couple of days, I have been using macOS more, maybe half the time. Still, I had not expected this distro to be so good. And I had not expected that I would like it so much.

So is 2024 finally the year of Linux on the desktop? I guess for me it is.

14 Jan 2024, 18:47

The VS Code Flatpak is useless

I installed Fedora 39 the other day. (More on that in one of the next posts.) It has a nifty software installer thing named “Discover”. When I typed “Visual Studio Code” into the search box, it dutifully installed VS Code. As a Flatpak.

This was the first time I interacted with Flatpak, and it did not go well.

Aside: Why VS Code

I want to use VS Code for editing Go code, with gopls, since it provides a really good integration. It turns out that a majority of Go developers use VS Code, so the language server integration is well tested and complete. In short, it’s the well-lit path.

Specifically for myself, there is a second reason: At work, we use a bespoke web-based IDE based on VS Code, with all sorts of integrations to make developers more productive. I have gotten very familiar with this setup, so it makes sense for me to write open source code in this way too and benefit from the muscle memory, so to speak.

There is a philosophical argument that you should avoid VS Code because it is controlled by Microsoft, and it gives Microsoft a certain leverage over the open source Go ecosystem. For what it’s worth, the same argument applies to using GitHub: Such a large percentage of open source code is developed on GitHub these days, and Microsoft could “enshittify” it at any moment if they wanted to. But for both of these, I personally think that it would be easily doable to switch away from them to something free – for instance, move over to Neovim’s LSP integration. In the meantime, I remain pragmatic and use what works for me.

Why not Flatpak

Back to Fedora. I installed Go and gopls from pkgsrc, as you do. However, the Go plugin tells me that it cannot find gopls, or any of the toolchain. Why!? Issue golang/vscode-go#263 has a bunch of people rather confused about this failure mode.

Ultimately, it comes down to this: Flatpaks run isolated from the host OS, which is fundamentally incompatible with developing code in an IDE. Despite the allowlist for access to OS paths, there is no way to allow access to /usr, /lib, etc. This is fundamental to the Flatpak security model. The app runs in a container that is cordoned off from the main OS.

But an IDE for developing native code does need host OS access! It needs to run a build tool, toolchain, native tools, etc. Not only does the VS Code Flatpak not allow doing this, it also does not provide an easy way to install a toolchain into the container. Never mind that I do not want a second copy of my tools. An IDE also needs to be able to run a shell in a terminal.
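If you want to confirm that this is what is happening, the following is a rough diagnostic sketch to run in the terminal that the IDE itself provides. It relies on the file /.flatpak-info, which exists inside a Flatpak sandbox. The function names and the FLATPAK_INFO variable are my own inventions for illustration, and go and gopls are simply the tools I was missing.

```shell
# /.flatpak-info exists inside a Flatpak sandbox. The variable is a
# hypothetical knob so the check can be pointed elsewhere for testing.
FLATPAK_INFO="${FLATPAK_INFO:-/.flatpak-info}"

in_flatpak() {
    [ -f "$FLATPAK_INFO" ]
}

# Print one line per tool: where it was found, or that it is missing.
check_tools() {
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "$tool: $path"
        else
            echo "$tool: missing"
        fi
    done
}

if in_flatpak; then
    echo "running inside a Flatpak sandbox"
else
    echo "running directly on the host"
fi
check_tools go gopls
```

Run once in a regular terminal and once in the IDE’s terminal, the two outputs should show whether the toolchain is visible from where the editor lives.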

At this point, I am left wondering: who can actually use this Flatpak, if native compilation is all but impossible? Is it made for frontend devs writing only JS, with all the tooling in JS as well?


I am also wondering why the Flatpak is the default type of installation for VS Code on Fedora, given that upstream has a repository of perfectly fine RPM packages. See https://code.visualstudio.com/docs/setup/linux#_rhel-fedora-and-centos-based-distributions.

The other container-like option would be a Snap, if you are into additional packaging systems on your distro. You can install the Snap in “classic” mode, where it has file system access:

sudo snap install --classic code

My choice has been the native package, and now everything is running fine.