03 Feb 2023, 09:54

pkgsrc and a Call for Action

I have been a pkgsrc developer for several years. For what it’s worth, I think pkgsrc is wonderful: a large selection of third-party software, packaged so that it is easy to install with a single command – either building everything from source, or relying on binary packages. pkgsrc supports dozens of OSes – not just NetBSD but also other BSDs, macOS, Linux, Illumos and more.

On the other hand, unfortunately, pkgsrc and NetBSD in general are suffering from what I would call a loss of mindshare. When I joined the NetBSD Foundation as a developer, I had the impression that NetBSD had more users and community than OpenBSD or, say, Dragonfly. In recent years however, it seems to me that people have more or less forgotten about NetBSD and pkgsrc.

Here is where you may come in

Here is a simple idea that occurred to me a while ago and that I finally am getting around to writing up, now that I am sitting in the train to FOSDEM 2023:

These days, a lot of upstream software has a manual with a section on installation that does not give instructions for actually building it. (It seems like building from source is so 1995, or something.) Instead, it goes roughly like this:

If you are using a Mac or a Linux/x86_64 machine, see our own binary packages here. Or use distro packages:

For Debian, run apt-get install somepackage.

For Fedora Linux, run some yum command.

etc. etc.

There is typically a long list of install commands for a bunch of Linux distributions and OSes. pkgsrc is almost never mentioned.

So if you are looking for a simple way of helping out pkgsrc and NetBSD, look for the README of your favorite Open Source tools and add a section for installing from pkgsrc. Typically, for a package from pkgsrc itself, the simplest is to add instructions for using pkgin:

$ pkgin install somepackage

For packages in pkgsrc-wip, you could first ask on the mailing list for an import so that there may be binary packages in the future :) Then, give the typical instructions for installing from source:

$ cd /usr/pkgsrc/wip/somepackage
$ make package-install

I have created a few pull requests for such changes in the past, but I think we need a lot more of that!

07 Nov 2021, 11:27

go-modules.mk

The BSD build system in general, and pkgsrc in particular, have a large number of Makefiles ending in .mk.

Recently, I was looking at a commit message in Gmail and noticed that these names are linkified. At the time, I was looking at a Go module package, where there is a go-modules.mk file containing details about dependencies. This got me thinking: Why is this file name turned into a link?

It turns out that .mk is the ccTLD of the Republic of North Macedonia! So I did what I had to do: I went to the website of a registrar in Skopje and reserved the go-modules.mk domain.

For the DNS, I created a zone in Google Cloud DNS, which was simple and extremely cheap, on the order of a few cents per month. For the contents, I added the domain as a redirect to my existing blog on the Firebase console.

So there you have it: next time, you see one of those emails, clicking the file name will bring you directly to this blog.

I have not checked if other standard Makefile names are still free as domains. So if you are interested in putting your pages at bsd.prog.mk or similar, here is your chance! :D

12 May 2021, 18:31

More Go modules in pkgsrc

This weekend, I made a series of somewhat unusual changes to pkgsrc.

I removed a bunch of Go packages.

Why? Because of Go modules.

What are Go modules?

Since my series of design-ish blog posts(part 1, part 2), Go module builds have fully landed in pkgsrc, to the point that they are now the preferred way to build Go packages.

To recap: There are two ways to use the go tool to build Go code.

  • The old way is to have a tree, below $GOPATH, that has all the dependencies in a directory tree according to their import path. For instance, the golang.org/x/net package would be placed in a $GOPATH/src/golang.org/x/net directory. This is what lang/go/go-package.mk implements in pkgsrc.
  • The new way is to extract the source code wherever you want, just like any C source. The top-level source directory contains a go.mod file that specifies dependencies and their versions. The go tool then downloads a bunch of .zip and .mod files for those dependencies and unpacks them as needed. This is similar to how Cargo works for Rust code.

Like with Rust, in pkgsrc, we specify a list of dependent module files to be downloaded from the module proxy.

In actual practice, a useful pattern has emerged, where the list of modules is in a separate file named go-modules.mk in the package directory. To create or update the file, simply run

$ make patch
$ make show-go-modules > go-modules.mk

and then .include the file from the main Makefile.

But why remove all these packages?

A pkgsrc package built with lang/go/go-module.mk does not install any source code or .a files. Only the binaries are packaged, just like for C. Go packages that just correspond to intermediate libraries and do not contain any useful binaries are simply no longer needed. They can be deleted as soon as nothing depends on them any more.

In particular, I changed all the packages depending on golang.org/x/tools to be modules, then migrated the go-tools package itself. go-tools depends on a number of other libraries that nothing else depends on.

By the way, it is fairly simple to make a non-module into a module, even if the source does not contain a go.mod:

  1. Change go-package.mk to go-module.mk.
  2. Run make patch and change into the top-level source directory.
  3. Run go mod init github.com/foo/bar or whatever the import path is.
  4. Update the file with go get and/or go mod tidy.
  5. Copy the generated go.mod and go.sum files to a files directory and copy them into place in pre-patch.

Future

Some future Go release will deprecate GOPATH builds, so we must convert all Go code in pkgsrc to modules at some point. By the way, if upstream has not made the jump to modules yet, they might be happy about your pull request :)

08 Sep 2020, 18:19

pkgsrc Developer Monotony

Somehow, my contributions to NetBSD and pkgsrc have become monotonous. Because I am busy with work, family and real life, the amount of time I can spend on open source is fairly limited, and I have two commitments that I try to fulfill:

  1. Member of pkgsrc-releng: I process most of the pull-ups to the stable quarterly branch.
  2. Maintainer of Go and its infrastructure.

Unfortunately, these things are always kinda the same.

For the pull-ups, each ticket requires a verification build to see if the package in question actually works. That time tends to be dominated by Firefox builds, of all things: we fortunately have people that maintain the very regular updates to LTS versions of both firefox and tor-browser, but that means regular builds of those.

As for Go, there are regular updates to the two supported branches (1.14 and 1.15 as of now), some of which are security updates. This means: change version, sync PLIST, commit, revbump all Go packages. Maybe file a pull-up.

This becomes somewhat uninteresting after a while.

What To Do

Honestly, I am not sure. Give off some of the responsibility? There is only one person in the pkgsrc-releng team that actually does pull-ups, and they are busy as well.

03 Feb 2020, 11:30

How to do Pull-ups to pkgsrc-stable

I am part of the pkgsrc releng (release engineering) team. My main task there is handling pull-ups into the stable branch.

pkgsrc creates a stable branch every three months and names it after the respective quarter – for example, the last branch was called 2019Q4. Pull-ups are tickets to “pull up” one or more commit from the development branch into the stable branch. Typical justifications for pull-ups are:

  • security updates
  • build fixes
  • important bug fixes (such as when the package crashes on startup)

In addition, sometimes we pull up updates to packages if they are “leaves” and stop working without regular updates. Some web scrapers, for example, need regular updates to keep up with changes in the sites they scrape.

Pull-up tickets can only be sent by pkgsrc developers, to a special mail alias. Unfortunately, the mails come in all kinds of formats, which makes things harder for me and the others on the team.

There is a Python script, in an internal repository, that we use for pull-ups to pkgsrc.

The ideal input is just the commit messages, concatenated, up to and including the blurb about diffs not being public domain. Links to https://mail-index.netbsd.org/ are okay too, but I have to copy/paste the corresponding messages manually. Patches typically mean more manual work because they typically do not contain an appropriate commit message.

When there have been intermediate commits, most of the time they need to be pulled up too. For example, if the stable branch is at version 1.0 and you want version 1.2 pulled up, you typically need to add the 1.1 update commit, for whatever patch or PLIST changes.

If the intermediate commit is a revbump touching a million packages, it is probably better to leave that out. PKGREVISION merge conflicts are almost trivial to fix.

Finally, if you are interested, there is a public web interface tracking the status of pkgsrc pull-ups at http://releng.netbsd.org/cgi-bin/req-pkgsrc.cgi.

15 Jul 2019, 20:44

A Tale of Two Spellcheckers

This is a transcript of the talk I gave at pkgsrcCon 2019 in Cambridge, UK. It is about spellcheckers, but there are much more general software engineering lessons that we can learn from this case study.

The reason I got into this subject at all was my paternal leave last year, when I finally had some more time to spend working on pkgsrc. It was a tiny item in the enormous TODO file at the top of the source tree (“update enchant to version 2.2”) that made me go into this rabbit hole.

A short history of spellchecking

spell

The oldest spellchecker, spell, appeared in version 6 AT&T Unix, but it was actually written before that, in 1975. The great Doug McIlroy (who is also the inventor of the concept of a “pipe”, by the way) worked on spell, added it to UNIX and wrote a 1982 paper. Today, NetBSD still contains a version of spell(1) in the base system.

To say that spell is not user-friendly is an understatement. You give it a text (or troff!) file to check, and it outputs a list of all the misspelled words on stdout. It supports both kinds of languages, British and American. In American mode, it flags all British spelling as incorrect, and vice versa. This includes verbs ending in -ize needing to be written with an -ise ending, which is highly questionable from a linguistic point of view too.

ispell and aspell

Next came a program called ispell, which stands for “interactive spell”. Its main innovation was interactive operation: it would stop when it found a misspelled word and present you with suggestions for what you meant. You choose the correct spelling, and ispell replaces it in the text. It supports different languages, and there is a comprehensive set of dictionaries.

aspell (advanced spell?) set out to replace ispell as the standard spellchecker. Its main distinctive feature was that its suggestions are far better than the ones that ispell provides (“even better than Word 97!”, its documentation claims). It also understands encodings, including UTF-8, which is a big deal for most languages.

Both ispell and aspell are in active use today. Their dictionary formats are different.

A digression on agglutinating languages

Imagine that you would like to spellcheck a text that is written in Finnish. The problem with writing a dictionary for Finnish though is the near-infinite number of words that it would need to contain. It is an agglutinating language, which means that you can stick words together without a space. As an example, consider the word “kasvihuoneilmiƶ”, which is composedof the individual words for “plant”, “room” and “phenomenon” and means “greenhouse effect”. Such composites do not obey vocal harmony rules (indeed, this is one way to see where the separation is). In addition, there are about 15 different cases for the noun, as a number of prepositions is replaced by a case (as in, a suffix). Some cases use the strong stem, some the weak stem of the word. Et cetera.

So what do you do if you would like to keep the dictionary as small as possible? Easy: you take a team of linguists and have them carefully model all the rules of word construction as a library! Such a thing exists in fact for Finnish (voikko), for Turkish (zemberek) and for Hungarian (hunspell).

Hunspell is particularly interesting: while it does contain special word formation rules for Hungarian, it is also an excellent spellchecker for other languages. It can use aspell dictionaries but is faster and gives even better suggestions, apparently. So using hunspell is a fairly popular choice among users, no matter what language.

Needs more abstraction

The consequence of the previous section is:

  • multiple spellcheckers are in active use;
  • users would like to be able to choose which one to use;
  • that choice may depend on the language of the document.

So what should you do as an application developer? You need an abstraction library over all these different programs. Such a library exists, and it is called Enchant.

Enchant gives you a uniform interface over all spellcheckers (including the system one on macOS, and a few more), handles the user choice of backend and the user dictionaries.

The messy Enchant 2 transition

Enchant 2.0.0 was released in August 2017. Its release notes contain this (emphasis mine):

The major version number has been incremented owing to API/ABI changes, but in practice upgrading from 1.6.x should be easy.

Previously-deprecated APIs have been removed.

The little-used enchant_broker_get/set_param calls have been removed.

Some trivial API changes have been made to fix otherwise-unavoidable compilation warnings both in libenchant and in application code. This is strictly an ABI change (although the ABI may not actually have changed, depending on the platform).

So there is a new major release, and it is incompatible with the previous release for ABI and API. In a surprising development, uptake of the new version was extremely low. For someone to adopt the new version, they have to replace the old version with it, at which point all programs that use Enchant stop working until you fix them. Thus, the developers have created a chicken-and-egg problem.

They spent the next couple releases re-adding bits that had been removed and declared that the new release was now API-compatible to 1.x, “except for some really deprecated calls”. It just so happened that many Enchant-using programs actually used these calls, since they were more convenient than their non-deprecated replacements!

Then, in November 2017, Enchant 2.1.3 had this in its release notes (again, emphasis mine):

This release adds support for parallel installation with other major versions of Enchant, and fixes a crash in the Voikko provider when it has no supported languages.

2.2.0 fixed parallel installation fully. You can now install Enchant versions 1 and 2 in parallel, since they go into different subdirectories and have different pkg-config files (enchant.pc vs. enchant-2.pc).

Adoption by other software is still really low: no one checks for the enchant-2 package, and for an application developer, there has never been a compelling reason to use version 2 rather than 1.

Enchant in pkgsrc

Back to pkgsrc. I tried to make Enchant 2 the only Enchant version in our tree.

And failed.

As stated above, almost no software has explicit support for checking enchant-2.pc, so I resorted to a trick to not have to patch all those configure scripts. textproc/enchant2/buildlink3.mk has this bit:

# Lots of older software looks for enchant.pc instead of enchant-2.pc.
${BUILDLINK_DIR}/lib/pkgconfig/enchant.pc:
        ${MKDIR} ${BUILDLINK_DIR}/lib/pkgconfig
        cd ${BUILDLINK_DIR}/lib/pkgconfig && ${LN} -sf enchant-2.pc enchant.pc

buildlink-enchant2-cookie: ${BUILDLINK_DIR}/lib/pkgconfig/enchant.pc

What this does is symlink enchant-2.pc to enchant.pc within the buildlink tree that is created for a single package build. We can do that because no enchant1 files are present in that tree.

But what broke the whole thing was PHP. Of course.

php-enchant supports only enchant1. Worse, it translates the entire API, including those deprecated bits, to PHP. So there is no way to make it use the newer version: if you were to remove the APIs that are no longer provided, software using php-enchant might break, at runtime. This is not acceptable for web applications.

So this is where I am stuck.

General advice for library authors

In lieu of a conclusion, I would like to offer some general advice if you are the author or maintainer of a library.

The most important is this: An incompatible V2 of a library is like a new product.

Importantly, this means that if you stop maintaining V1 the moment you release V2, it is as if you had abandoned your library and created a new one.

Think of other projects that depend on you as customers. Think about migration paths. Think about the cost-benefit ratio of an upgrade by your customers.

Consider sending pull requests to your customers! If you look at pkgsrc, Debian, etc., it is easy to see what other projects depend on your library. Many of them are on github. All of them probably have a way of sending patches. Send them a patch to upgrade the dependency. Do the work for them.

Otherwise, you are developing for no one.

02 Jul 2019, 19:29

pkgsrccon 2019: Talk Announcement

In a few weeks, on the weekend of July 13 and 14, the annual pkgsrc conference, pkgsrcCon 2019, will take place in Cambridge, UK. Whether you are a user or developer of pkgsrc, this is a really nice place to meet the developers and spend some time hacking together and listening to talks.

My talk this year was originally supposed to be about Go module support in pkgsrc, but that work did not get done in time. So instead, I will talk about something entirely different:

A Tale of Two Spellcheckers

This talk is about the obscure and intricate world of spell checkers and how they are packaged in pkgsrc, the NetBSD package collection.

There are many general-purpose spell checkers in existence. ispell and aspell are the most famous ones, but hunspell, originally written for Hungarian, has become the spell checker of choice. In addition, there are a number of specialized checkers tailored to the idiosyncrasies of a single language, e.g. voikko (Finnish) and zemberek (Turkish).

There is a separate library, called Enchant, that aims to abstract away the spell checker implementation from the application code. Enchant went through a botched transition of major versions, from V1 to V2. To this day, most apps only support V1. We’ll talk about general lessons from this case.

30 Apr 2019, 20:19

Supporting Go Modules in pkgsrc (Part 2)

This announcement dropped today:

I realized that this is the missing piece for supporting Go modules in pkgsrc. If you go back and reread the “fetch” section in Supporting Go Modules in pkgsrc, it seems a bit awkward compared to a standard fetch action. The reason is that go mod download re-packs the source into its own zip format archive.

The module proxy (https://proxy.golang.org/) solves this problem and enables a simple solution for modules, very similar to lang/rust/cargo.mk. Basically, a target similar to show-cargo-depends that outputs a Makefile fragment containing the names of modules that the current package depends upon. All these become distfiles fetched from a hypothetical $MASTER_SITES_GOPROXY. Crucially, this means that the distfiles do not have to be stored in a LOCAL_PORTS subdirectory but can use the normal fetch infrastructure.

Now all that remains is implementing this :) There is some more time to do that: Go 1.13 (to be released some time in summer) will use module support by default. What’s more, a bunch of new software (including the various golang.org/x/* repositories) has go.mod files these days, using module-based builds by default.

04 Feb 2019, 17:15

Pkgsrc Buildbots

After talking to Sijmen Mulder on IRC (thanks, TGV Wi-Fi!), I began thinking more about how you could automate the pkgsrc release engineers away.

The basic idea for a buildbot would be this:

  1. Download and unpack latest pkgsrc.tar.gz for the stable branch.
  2. Run the pullup script with the ticket number, then run whatever pullup script it outputs.
  3. Figure out the package that this concerns (perhaps from filenames).
  4. Go to the package in question, install its dependencies from binary packages.
  5. Build (make package is probably enough, or perhaps also install?).
  6. Upload build log to Cloud Storage.
  7. Post an email to the pullup thread with status and a link to the log.

For extra points, do this in a fresh, ephemeral VM, triggered by an incoming mail.

You would also need a buildbot supervisor that receives mails (to know that it should build something) and that launches the VM. I know that Google App Engine could do it, as it can receive emails. But maybe Cloud Functions would be the way to go?

In any case, this would be a cool project for someone, maybe myself :)

Issues with Pull-up Ticket Tracking

This project is largely orthogonal to improvements in the pullup script. Right now, there are a number of issues with it that make it require manual intervention in many cases:

  • The tracker (req) doesn’t do MIME, so sometimes mails are encoded with base64 or quoted-printale. This breaks parsing the commit mails.
  • Sometimes, submitters of tickets insert mail-index.netbsd.org URLs instead of copies of the message.
  • Some pullup tickets include a patch instead of, or in addition to, a list of commits. For instance, this may happen when backporting a fix to an older release instead of pulling up a bigger update.
  • Sometimes, commit messages are truncated, or there are merge conflicts. This mostly happens when there has been a revbump before the change that is to be committed – in the majority of cases, the merge conflicts only concern PKGREVISION lines.

I am wondering how much we could gain, e.g. in terms of MIME support, from changing the request tracking software. admins@ uses RT, which has more features. Perhaps that could be brought to pullup tickets?

29 Dec 2018, 13:12

Supporting Go Modules in pkgsrc, a Proposal

Go 1.11 introduced a new way of building Go code that no longer needs a GOPATH at all. In due course, this will become the default way of building. What’s more, sooner or later, we are going to want to package software that only builds with modules.

There should be some package-settable variable that controls whether you want to use modules or not. If you are going to use modules, then the repo should have a go.mod file. Otherwise (e.g. if there is a dep file or something), the build could start by doing go mod init (which needs to be after make extract).

fetch

There can be two implementations of the fetch phase:

  1. Run go mod download.

    It should download required packages into a cache directory, $GOPATH/pkg/mod/cache/download. Then, I propose tarring up the whole tree into a single .tar.gz and putting that into the distfile directory for make checksum. Alternatively, we could have the individual files from the cache as “distfiles”. Note however (see below) that the filenames alone do not contain the module name, so there will be tons of files named v1.0.zip and so on.

  2. “Regular fetch”

    Download the .tar.gz (or the set of individual files) from above from the LOCAL_PORTS directory on ftp.n.o, as usual.

The files that go mod download creates are different from any of the ones that upstream provides. Notably, the zip files are based on a VCS checkout followed by re-zipping. Here is an example for the piece of a cache tree corresponding to a single dependency (ignore the lock files):

./github.com/nsf/termbox-go/@v:
list                                           v0.0.0-20180613055208-5c94acc5e6eb.lock        v0.0.0-20180613055208-5c94acc5e6eb.ziphash
list.lock                                      v0.0.0-20180613055208-5c94acc5e6eb.mod
v0.0.0-20180613055208-5c94acc5e6eb.info        v0.0.0-20180613055208-5c94acc5e6eb.zip

As an additional complication, (2) needs to run after “make extract”. Method (1) cannot always be the default, as it needs access to some kind of hosting. A non-developer cannot easily upload the distfile.

extract

In a GOPATH build, we do some gymnastics to move the just-extracted source code into the correct place in a GOPATH. This is no longer necessary, and module builds can just use the same $WRKSRC logic as other software.

build

The dependencies tarball (or individual dependencies files) should be extracted into $GOPATH, which in non-mod builds is propagated through buildlink3.mk files of dependent packages. After this, in all invocations of the go tool, we set GOPROXY=file://$GOPATH/pkg/mod/cache/download, as per this comment from the help:

A Go module proxy is any web server that can respond to GET requests for URLs of a specified form. The requests have no query parameters, so even a site serving from a fixed file system (including a file:/// URL) can be a module proxy.

Even when downloading directly from version control systems, the go command synthesizes explicit info, mod, and zip files and stores them in its local cache, $GOPATH/pkg/mod/cache/download, the same as if it had downloaded them directly from a proxy.