Title: New "keyword" system to incorporate various userlands/kernels/archs
Last-Modified: 2004/03/07 02:20:32
Author: Grant Goodyear <g2boojum at>
Type: Standards Track



I'm withdrawing this GLEP. It is clear from the discussions on gentoo-dev that although breaking the keywords into four components is probably a good idea, the four components are not independent. Thus, the "keyword explosion" that this GLEP tries to prevent is inevitable. The real issue, then, is how to make the keyword explosion reasonably manageable, but that's a topic for another GLEP.


This GLEP originated from the concerns that Daniel Robbins had with the x86obsd keyword, and his desire to make the KEYWORDS variable more "feature-rich". Drobbins' original idea was that we should allow compound keywords such as gnu/x86, gnu/ppc, and macos/ppc (which would be explicit versions of the more familiar x86, ppc, and macos keywords). Method noted that userland/arch failed to capture the full range of possibilities (what about a GNU userland on a BSD kernel+libc?), and the issue has languished due to a lack of reasonable solutions.


As Gentoo branches out to support non-Linux and non-GNU systems (such as Hurd or the *BSDs), the potential for an "explosion" of possible keywords becomes rather large, since each new userland/kernel/arch/whatever combination would require a new keyword. This GLEP proposes replacing the current KEYWORDS variable with four variables, ARCH, USERLAND, KERNEL, and LIBC, along with sensible defaults to keep the new system manageable.


Since the beginning, Gentoo Linux has been conceived as a "metadistribution" that combines remarkable flexibility with sensible defaults and exceptional maintainability. The goal of the Gentoo-Alt [1] project has been to extend that flexibility to include systems other than GNU/Linux. For example, the author of this GLEP has been working to create a version [3] of Gentoo that uses OpenBSD [2] as the underlying kernel, userland, and libc. OpenBSD [2] supports a variety of different architectures, so, in principle, we would need a new openbsd-arch keyword for each supported architecture. In fact, the situation is even more complicated, because the Gentoo-Alt [1] project would eventually like to support the option of "mixing-and-matching" GNU/*BSD/whatever userlands and libcs irrespective of the underlying kernel. (Debian [4], for example, has a similar BSD project [5], except that they have replaced the BSD userland with a GNU userland.) The net result is that we would need keywords that specified all possible permutations of arch, userland, kernel, and libc. Not fun.


New Variables

I suggest that we replace the single KEYWORDS variable in ebuilds with four separate variables: ARCH, USERLAND, LIBC, and KERNEL.

ARCH: x86, amd64, cobalt, mips64, arm, hppa, ia64, ppc64, sparc
USERLAND: gnu, bsd
LIBC: glibc, openbsd, freebsd, netbsd, macosx
KERNEL: linux, selinux, openbsd, freebsd, netbsd, macosx

(The above examples are not meant to be complete. Hurd, for example, is not included because I know very little about Hurd.) For each variable the standard "-", "-*", and "~" prefixes would be allowed. Similarly, /etc/make.conf would have ACCEPT_ARCH, ACCEPT_USERLAND, ACCEPT_LIBC, and ACCEPT_KERNEL variables.
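As a rough illustration of how one of these variables might be checked against its ACCEPT_ counterpart, here is a minimal Python sketch. The function name is_accepted is hypothetical, and the matching rule is an assumption rather than anything this GLEP specifies: an entry matches only if it appears verbatim in the ACCEPT_ list, and entries starting with "-" never match.

```python
def is_accepted(ebuild_values, accepted_values):
    """Return True if any ebuild entry appears verbatim in the ACCEPT_* list.

    Assumption: "x" means stable, "~x" means testing, and "-x" / "-*" mark
    a value as excluded, so such entries can never produce a match.
    """
    accepted = set(accepted_values.split())
    for entry in ebuild_values.split():
        if entry.startswith("-"):
            continue  # "-x" and "-*" only exclude; they never match
        if entry in accepted:
            return True
    return False


# An ebuild with KERNEL="openbsd ~linux", checked against two
# hypothetical ACCEPT_KERNEL settings from /etc/make.conf.
print(is_accepted("openbsd ~linux", "linux"))         # False
print(is_accepted("openbsd ~linux", "linux ~linux"))  # True
```

Real Portage keyword handling has more rules than this, so the sketch should be read only as a picture of the per-variable check.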

Reasonable Defaults

To keep this system manageable, we need sensible defaults. An ebuild that has missing USERLAND, KERNEL, or LIBC variables is provided with implicit USERLAND="gnu", KERNEL="linux", and/or LIBC="glibc" variables. However, once a variable is explicitly added (such as KERNEL="openbsd"), the default is no longer assumed. That is, one would need KERNEL="openbsd linux" if the ebuild is stable on both openbsd and linux kernels.
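The defaulting rule can be sketched in a few lines of Python. The dictionary representation of an ebuild's variables and the name apply_defaults are assumptions made for illustration; only the default values themselves come from the text above.

```python
# Defaults named in this GLEP for variables an ebuild does not set.
DEFAULTS = {"USERLAND": "gnu", "KERNEL": "linux", "LIBC": "glibc"}


def apply_defaults(ebuild_vars):
    """Fill in each missing variable; explicitly set variables are left alone."""
    resolved = dict(ebuild_vars)
    for name, default in DEFAULTS.items():
        resolved.setdefault(name, default)
    return resolved


legacy = apply_defaults({"KEYWORDS": "x86 ppc"})
ported = apply_defaults({"KEYWORDS": "x86", "KERNEL": "openbsd"})

print(legacy["KERNEL"])  # linux
print(ported["KERNEL"])  # openbsd
print(ported["LIBC"])    # glibc
```

Note that in this sketch each default is applied independently, so the second ebuild ends up with KERNEL="openbsd" alongside LIBC="glibc" unless LIBC is also set explicitly.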

The ARCH variable, on the other hand, does not have a default, per se. Instead, if no ARCH variable exists then Portage would automatically add the ebuild's KEYWORDS entries to ARCH. Thus, all current ebuilds would still work without changes, allowing for a gradual transition to the new system as the new variables are needed.
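The ARCH fallback might look something like the following Python sketch. The function name resolve_arch and the dictionary representation of ebuild variables are hypothetical; the sketch assumes that an explicit ARCH, when present, is used as-is and that KEYWORDS is consulted only when ARCH is missing.

```python
def resolve_arch(ebuild_vars):
    """Return the ARCH entries, falling back to KEYWORDS for legacy ebuilds."""
    if "ARCH" in ebuild_vars:
        return ebuild_vars["ARCH"].split()
    # Legacy ebuild: reuse its KEYWORDS entries (prefixes included) as ARCH.
    return ebuild_vars.get("KEYWORDS", "").split()


print(resolve_arch({"KEYWORDS": "x86 ~ppc -sparc"}))       # ['x86', '~ppc', '-sparc']
print(resolve_arch({"ARCH": "x86", "KEYWORDS": "x86 ppc"}))  # ['x86']
```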


Along with an explosion of keywords comes a concomitant explosion of potential profiles. The good news is that profiles show up only in a single directory, so an explosion there is easier to contain. I suggest an arch-kernel-userland-libc-version naming scheme, with the kernel-userland-libc terms defaulting to linux-gnu-glibc if absent. (Yes, chemists do tend to be fond of systematic naming systems.)
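To make the naming scheme concrete, here is a small Python sketch that builds a profile name. The function name profile_name and the version string "2004.0" are hypothetical, and the sketch assumes that the kernel-userland-libc triple is omitted only as a whole, when all three terms equal the linux-gnu-glibc default; the text above does not say how a partially default triple should be written.

```python
DEFAULT_TRIPLE = ("linux", "gnu", "glibc")


def profile_name(arch, version, kernel="linux", userland="gnu", libc="glibc"):
    """Build an arch-kernel-userland-libc-version name, eliding the default triple."""
    triple = (kernel, userland, libc)
    if triple == DEFAULT_TRIPLE:
        return f"{arch}-{version}"
    return "-".join((arch, *triple, version))


print(profile_name("x86", "2004.0"))
# x86-2004.0
print(profile_name("x86", "2004.0", kernel="openbsd", userland="bsd", libc="openbsd"))
# x86-openbsd-bsd-openbsd-2004.0
```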

One drawback to having a large number of profiles is that maintenance becomes a significant problem. In fact, one could reasonably argue that the current number of profiles is already too large to be easily maintained. One proposal that has been raised to simplify matters is the idea of stackable, or cascading, profiles, so that only differences between profiles would have to be maintained.
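The cascading idea can be pictured with a short Python sketch in which each profile layer records only what differs from the layers beneath it. The layer contents and the name stack_profiles are hypothetical, and real profiles hold far more than four settings; this is only meant to show why stacking reduces what has to be maintained.

```python
def stack_profiles(*layers):
    """Merge profile layers in order; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical layers: a base profile plus two sets of differences.
base = {"KERNEL": "linux", "USERLAND": "gnu", "LIBC": "glibc"}
openbsd_diff = {"KERNEL": "openbsd", "USERLAND": "bsd", "LIBC": "openbsd"}
x86_diff = {"ARCH": "x86"}

profile = stack_profiles(base, openbsd_diff, x86_diff)
print(profile["KERNEL"])  # openbsd
print(profile["ARCH"])    # x86
```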


The proposed new "keywording" system is far from elegant, which is a substantial drawback. On the other hand, it is simple, it requires relatively minor changes (albeit ones that eventually would impact every ebuild in the portage tree), and the changes can be implemented gradually over time.


Implementation of this GLEP would divide into adding Portage functionality to support the new system and modifying ebuilds to comply with the new system. The Portage support involves hacking Portage to assemble and check a four-part arch-userland-kernel-libc variable instead of the simpler KEYWORDS variable. One might quibble over algorithmic issues, but the actual concept is pretty straightforward. Rewriting ebuilds, on the other hand, is a massive undertaking. Fortunately, it is also a process that can be done over whatever length of time is required, since "legacy" ebuilds should work with no changes.
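One way to picture the "assemble" step is the Python sketch below, which expands an ebuild's four variables into every arch-userland-kernel-libc combination it implicitly claims. The function name supported_tuples and the sample ebuild are hypothetical, prefixes such as "~" are ignored for brevity, and the four variables are treated as fully independent, which is an assumption of the sketch and not something Portage does.

```python
from itertools import product

ORDER = ("ARCH", "USERLAND", "KERNEL", "LIBC")


def supported_tuples(ebuild_vars):
    """Assemble every (arch, userland, kernel, libc) tuple an ebuild claims."""
    return set(product(*(ebuild_vars[name].split() for name in ORDER)))


# Hypothetical ebuild that lists two values for each variable.
ebuild = {
    "ARCH": "x86 ppc",
    "USERLAND": "gnu bsd",
    "KERNEL": "linux openbsd",
    "LIBC": "glibc openbsd",
}

tuples = supported_tuples(ebuild)
print(len(tuples))                                       # 16
print(("x86", "bsd", "openbsd", "openbsd") in tuples)    # True
print(("x86", "gnu", "openbsd", "glibc") in tuples)      # True
```

The last line shows the weakness of treating the variables as independent: listing two values per variable also claims combinations, such as a glibc libc on an OpenBSD kernel, that nobody may have tested.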

Backwards Compatibility

Backwards compatibility has already been addressed in some detail, with the stated goal being a system that would leave all current ebuilds in a still-functioning state after the Portage modifications have been made. However, we are already using an ARCH variable for some arcane purpose in Portage, and that issue would still need to be resolved.


[1](1, 2)
[2](1, 2)