Title: Export PMS's cached VDB information
Author: Anthony G. Basile <email@example.com>
Type: Standards Track
During build time, important information is generated by the package
manager (PM) about the package(s) being built. When Portage is used
as the PM, this information is cached on a per package basis in directories
under /var/db/pkg/<cat>/<pkg> (VDB). While this information can be
regenerated on the fly, doing so may be expensive or impractical. Examples
of such information include a complete list of all files belonging to
a particular installed package or the dynamic linking information about
a package's executables and/or shared objects. To avoid the unnecessary cost of
regenerating it, and to facilitate interoperability between all PMs and other
tools that could use this information, all PMs should cache a standard set
of information and provide a common API for exporting it. In this GLEP, we
specify what information should be cached and exported.
Information generated by the PM at build time spans the spectrum from easy to
difficult to regenerate. Some information, like a package's HOMEPAGE, may be
trivially regenerated by simply grepping the package's ebuild in the Portage tree.
Despite this ease, however, even this information needs to be cached in case
the ebuild is removed from the tree, but the package is still installed on the
system. But even if the installed package and the ebuild in the tree are not
"out of sync", there is yet another reason to cache information generated by
the PM at build time. Some information, like the list of all installed files
belonging to a particular package, cannot be trivially regenerated. If such
a list were not cached, the PM would have to rebuild the package in order to
regenerate it, and even then this regenerated list is not guaranteed to
represent the actual state of the installed package because of possible
changes in the environment of the rest of the system between builds. Apart
from the fact that the PM itself needs this list when uninstalling, and so
must cache it for itself, listing a package's files is useful for other
utilities. At the time of this writing, sys-apps/elfix,
app-portage/gentoolkit, app-portage/portage-utils and app-portage/eix are
some examples of utilities that make use of Portage's VDB to obtain this
information.

Another example of information which is useful and expensive to regenerate,
but perhaps less obvious than the previous example, is linking information
such as that reported by running ``readelf`` or ``scanelf`` on ELF objects, or
similar utilities for other executable formats like Mach-O or COFF. On a
"rolling release" such as Gentoo, tracing forward and reverse dependencies
between executable objects and their libraries is critical to avoid breakage
during upgrade. The need to trace these dependencies is evident in PMS
features like sub-slotting which aim to make sure that executables are always
consistently built against libraries: upgrading a library which breaks
backwards compatibility automatically triggers rebuilding of its dependent
executable(s) [#PMS-SPEC]_ [#SUBSLOTS]_. While sufficient in their own scope,
these PMS features have limitations: 1) this information is calculated to
ensure consistency at build time, but is not cached and exported afterwards
for use by other tools, such as ``revdep-pax`` which uses the same information
to consistently apply PaX markings between executable objects and libraries
[#REVDEP-PAX]_; and, 2) such information is not sufficiently fine grained for
tools which require discrimination on the basis of ABI, SONAME, library path
name, etc. By caching and exporting this information, an entire "linkage graph"
of executable objects and libraries on a system can be constructed
[#LINKAGE-GRAPH]_ to facilitate quick traversal of both forwards and
backwards dependencies. Questions like "what are the path names of all the
executables on this system which link against libssl.so.1.0.0 for ABI=x32?"
can be quickly answered without having to reread the dynamic section of every
object on the system in a search for those which are x32 and need libssl.so.
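
The kind of lookup described above can be sketched as a reverse index over cached linkage records. This is a minimal illustration only: it assumes the semicolon-separated ``arch;path;soname;rpath;needed`` record layout that Portage writes to NEEDED.ELF.2 at the time of writing, the sample records and architecture labels are made up, and the names ``NeededEntry``, ``parse_needed_line`` and ``build_reverse_index`` are hypothetical rather than part of any PM.

```python
from collections import defaultdict
from typing import NamedTuple


class NeededEntry(NamedTuple):
    arch: str      # architecture/ABI label as written by the PM
    path: str      # full path to the installed object
    soname: str    # empty for plain executables
    rpath: list    # run-time library search path entries
    needed: list   # SONAMEs this object links against


def parse_needed_line(line):
    """Parse one 'arch;path;soname;rpath;needed' record (assumed layout)."""
    fields = line.rstrip("\n").split(";")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}")
    arch, path, soname, rpath, needed = fields[:5]
    return NeededEntry(
        arch=arch,
        path=path,
        soname=soname,
        rpath=[p for p in rpath.split(":") if p],
        needed=[n for n in needed.split(",") if n],
    )


def build_reverse_index(entries):
    """Map (arch, needed SONAME) to the sorted paths of its consumers."""
    index = defaultdict(set)
    for entry in entries:
        for soname in entry.needed:
            index[(entry.arch, soname)].add(entry.path)
    return {key: sorted(paths) for key, paths in index.items()}


# Made-up sample records, for illustration only.
SAMPLE = """\
X86_64;/usr/bin/curl;;;libssl.so.1.0.0,libc.so.6
X86_64;/usr/lib64/libssl.so.1.0.0;libssl.so.1.0.0;;libc.so.6
X86_32;/usr/lib32/libfoo.so.1;libfoo.so.1;/usr/lib32;libssl.so.1.0.0
"""

entries = [parse_needed_line(line) for line in SAMPLE.splitlines()]
index = build_reverse_index(entries)

# Which objects of a given architecture link against libssl.so.1.0.0?
consumers = index.get(("X86_64", "libssl.so.1.0.0"), [])
print(consumers)
```

Keying the index on both the architecture label and the SONAME is what allows the query to discriminate by ABI without rereading any object on disk.
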
The above examples motivate us to create a uniform standard for any utility
that would like to make use of this generated information. Below, we specify
a standard minimum set of information that should be generated by any PM at
build time, cached and then exported by a common API.
For each package installed, the following information should be generated
at build time, cached, and later exported:

* All Portage variables specified as part of the Metadata Cache defined
  in PMS 13.2 [#METADATA-CACHE]_. Note that, as with the Metadata Cache, these
  variables should be stored with all the conditionals evaluated.

* A list of all files belonging to the package, along with a designation of
  the file type (regular, directory, symlink, pipe, etc.), an MD5 or other
  checksum, and the mtime.

* A list of all executable or shared objects for each package and the
  corresponding linking information, including the full path to the object, its
  architecture and ABI, SONAME, RPATH and any NEEDED objects it links
  against, as reported by ``readelf`` on ELF systems, or similar tools for
  other executable formats. Currently this information is being cached by
  Portage in NEEDED.ELF.2, NEEDED.MACHO.3, NEEDED.XCOFF, NEEDED.PECOFF, etc.

* Flags affecting the package's build system behavior, including at least
  CHOST, CBUILD, CTARGET, CFLAGS, CXXFLAGS, CPPFLAGS, and LDFLAGS. In case a
  Fortran compiler is used, FFLAGS should also be included. These may be
  empty in the case of packages where compiling/linking is unnecessary.

* Flags affecting the PM's behavior which are not already specified
  in PMS 13.2, including at least USE and KEYWORDS.

* Dependencies between packages calculated by the PM, including at least DEPEND,
  RDEPEND, and PDEPEND.

* Miscellaneous information including the time the package was built, the
  repository name, DEFINED_PHASES, EAPI, INHERITED eclasses and SLOT.
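
As an illustration of the file list item above, the following sketch parses records in the style of Portage's current CONTENTS file, where each line starts with a type tag (``dir``, ``obj``, ``sym``, ``fif`` or ``dev``). It is not a normative format for this GLEP: the sample records are made up, and ``ContentsEntry`` and ``parse_contents_line`` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ContentsEntry(NamedTuple):
    kind: str                      # "dir", "obj", "sym", "fif" or "dev"
    path: str
    md5: Optional[str] = None      # checksum, regular files only
    mtime: Optional[int] = None    # seconds since the epoch
    target: Optional[str] = None   # symlink target, "sym" only


def parse_contents_line(line):
    """Parse one CONTENTS-style record into a ContentsEntry."""
    line = line.rstrip("\n")
    kind, rest = line.split(" ", 1)
    if kind == "obj":
        # Split from the right so that paths containing spaces survive.
        path, md5, mtime = rest.rsplit(" ", 2)
        return ContentsEntry(kind, path, md5=md5, mtime=int(mtime))
    if kind == "sym":
        link, mtime = rest.rsplit(" ", 1)
        path, target = link.split(" -> ", 1)
        return ContentsEntry(kind, path, mtime=int(mtime), target=target)
    if kind in ("dir", "fif", "dev"):
        return ContentsEntry(kind, rest)
    raise ValueError(f"unknown entry type: {kind!r}")


# Made-up sample records, for illustration only.
SAMPLE = """\
dir /usr/bin
obj /usr/bin/timeout d41d8cd98f00b204e9800998ecf8427e 1400000000
sym /usr/lib64/libfoo.so -> libfoo.so.1 1400000000
obj /usr/share/doc/my file.txt d41d8cd98f00b204e9800998ecf8427e 1400000001
"""

contents = [parse_contents_line(line) for line in SAMPLE.splitlines()]
regular_files = [entry.path for entry in contents if entry.kind == "obj"]
print(regular_files)
```
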
It is not the purpose of this GLEP to specify the details of a common API for
exporting the above information. Even less so is it our purpose to delineate
the implementation details for each PM. However, a common API for exporting
the above information should be developed and specified by the PM teams and be
included in future PMS documentation. Any changes to the API should be versioned
to allow for consistency as it develops over time.
As a guide, we recommend a plain CLI API which answers questions as follows:
What is the SLOT number of a particular version of webkit-gtk?
query-installed metadata =net-libs/webkit-gtk-2.4.4-r200 SLOT
What are the ABI of a particular file and the libraries it links against?
query-installed file /usr/bin/timeout ABI NEEDED
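
To make the first query above concrete, here is a rough sketch of how a metadata lookup could be backed by a Portage-style VDB. It assumes the layout Portage uses today, with one file per variable under ``<vdb root>/<cat>/<pkg>``. The functions ``split_atom`` and ``query_metadata`` are hypothetical and are not part of any existing PM or of the recommended CLI. The demonstration runs against a temporary directory with a made-up SLOT value rather than a real /var/db/pkg.

```python
import os
import tempfile


def split_atom(atom):
    """Split an exact '=cat/pkg-ver' atom into ('cat', 'pkg-ver')."""
    if not atom.startswith("="):
        raise ValueError("only exact '=cat/pkg-ver' atoms are handled here")
    category, _, pf = atom[1:].partition("/")
    if not category or not pf:
        raise ValueError(f"malformed atom: {atom!r}")
    return category, pf


def query_metadata(vdb_root, atom, keys):
    """Return {key: value or None} for one installed package."""
    category, pf = split_atom(atom)
    pkg_dir = os.path.join(vdb_root, category, pf)
    result = {}
    for key in keys:
        # Accept only plain variable names, so a key cannot escape pkg_dir.
        if not key.replace("_", "").isalnum():
            raise ValueError(f"invalid variable name: {key!r}")
        try:
            with open(os.path.join(pkg_dir, key), encoding="utf-8") as fh:
                result[key] = fh.read().strip()
        except FileNotFoundError:
            result[key] = None  # variable not cached for this package
    return result


# Demonstration against a throwaway VDB-like tree with made-up data.
with tempfile.TemporaryDirectory() as vdb_root:
    pkg_dir = os.path.join(vdb_root, "net-libs", "webkit-gtk-2.4.4-r200")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "SLOT"), "w", encoding="utf-8") as fh:
        fh.write("2\n")
    answer = query_metadata(
        vdb_root, "=net-libs/webkit-gtk-2.4.4-r200", ["SLOT", "EAPI"]
    )

print(answer)
```

Returning ``None`` for a variable that is not cached, instead of raising, lets a caller distinguish "package not queried correctly" from "PM did not record this value"; other designs are equally plausible.
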
1. Portage has cached all the above information since v2.2_pre7 (2008-05-21);
   however, it is not exported via a consistent API. Versions of Portage with
   the above specified API implemented can make use of caches built as far
   back as 2008.

2. For PMs that do not cache any of the above, a migration scheme should be
   implemented to generate the cache without having to rebuild world.

.. [#COUNCIL-RATIFICATION] This has been ratified by the Council. See
.. [#PMS-SPEC] This is specified in PMS. See
.. [#SUBSLOTS] Sub-slots_and_Slot-Operators
.. [#REVDEP-PAX] http://git.overlays.gentoo.org/gitweb/?p=proj/elfix.git;a=blob;f=scripts/revdep-pax
The man page can be viewed at http://www.linuxhowtos.org/manpages/1/revdep-pax.htm
.. [#LINKAGE-GRAPH] An example of such a class is at
Portage itself constructs such a graph internally when evaluating emerge
.. [#METADATA-CACHE] https://projects.gentoo.org/pms/6/pms.html#x1-16300013
This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
Unported License. To view a copy of this license, visit