GLEP: 44
Title: Manifest2 format
Version: 1.1
Last-Modified: 2005/12/06 03:34:21
Author: Marius Mauch <genone at gentoo.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 04-Dec-2005
Post-History: 05-Dec-2005

Abstract

This GLEP proposes a new format for the Portage Manifest and digest file system by unifying both filetypes into one to improve functional and non-functional aspects of the Portage Tree.

Motivation

Please see [1] for a general overview. The main long term goals of this proposal are to:

Specification

The new Manifest format would change the existing format in the following ways:

Each line in the new format has the following format:

<filetype> <filename> <filesize> <chksumtype1> <chksum1> ... <chksumtypen> <chksumn>

However, these entries will be stored in the existing Manifest files.

An actual example for a (pure) Manifest2 file could look like this (using indentation to indicate line continuation):

AUXFILE ldif-buffer-overflow-fix.diff 5007 RMD160 1354a6bd2687430b628b78aaf43f5c793d2f0704 
        SHA1 424e1dfca06488f605b9611160020227ecdd03ac MD5 06d23c04b3d6ddfb1431c22ecc5b28f6
AUXFILE procmime.patch 977 RMD160 39a51a4d654759b15d1644a79fb6e8921130df3c 
        SHA1 d76929f6dfc2179281f7ccee5789aab4e970ba9e MD5 bf4c9cd9cb7cdc6ece7d4d327910f0cf
EBUILD sylpheed-claws-1.0.5-r1.ebuild 3906 RMD160 cdd546c128db2dea7044437de01ec96e12b4f5bf 
        SHA1 a84b49e76961d7a9100852b64c2bfbf9b053d45e MD5 b9fe79135a475458ef1b2240ee302ebd
EBUILD sylpheed-claws-1.9.100.ebuild 4444 RMD160 89326038bfc694dafd22f10400a08d3f930fb2bd 
        SHA1 8895342f3f0cc6fcbdd0fdada2ad8e23ce539d23 MD5 0643de736b42d8c0e1673e86ae0b7f80
EBUILD sylpheed-claws-1.9.15.ebuild 4821 RMD160 ec0ff811b893084459fe5b17b8ba8d6b35a55687 
        SHA1 358278a43da244e1f4803ec4b04d6fa45c41ab4d MD5 15b5c9348ba0b0a416892588256b4cbc
MISCFILE ChangeLog 25770 RMD160 0e69dd7425add1560d630dd3367342418e9be776 
        SHA1 1210160f7baf0319de3b1b58dc80d7680d316d28 MD5 732cdc3b41403a115970d497a9ec257e
MISCFILE metadata.xml 269 RMD160 39d775de55f9963f8946feaf088aa0324770bacb 
        SHA1 4fd7b285049d0e587f89e86becf06c0fd77bae6d MD5 82e806ed62f0596fb7bef493d225712f
SRCURI sylpheed-claws-1.0.5.tar.bz2 3268626 RMD160 f2708b5d69bc9a5025812511fde04eca7782e367 
        SHA1 d351d7043eef7a875df18a8c4b9464be49e2164b MD5 ef4a1a7beb407dc7c31b4799bc48f12e
SRCURI sylpheed-claws-1.9.100.tar.bz2 3480063 RMD160 72fbcbcc05d966f34897efcc1c96377420dc5544 
        SHA1 47465662b5470af5711493ce4eaad764c5bf02ca MD5 863c314557f90f17c2f6d6a0ab57e6c2
SRCURI sylpheed-claws-1.9.15.tar.bz2 3481018 RMD160 b01d1af2df55806a8a8275102b10e389e0d98e94 
        SHA1 a17fc64b8dcc5b56432e5beb5c826913cb3ad79e MD5 0d187526e0eca23b87ffa4981f7e1765
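
The line format above can be read with a few lines of Python. The following is only a minimal sketch, not code from portage: the names Manifest2Entry and parse_manifest2_line are hypothetical, and it assumes that fields are separated by whitespace, that filenames contain no whitespace, and that each entry occupies a single physical line (the indentation in the example only indicates continuation).

```python
from typing import Dict, NamedTuple


class Manifest2Entry(NamedTuple):
    """One Manifest2 entry (hypothetical helper type, not part of portage)."""
    filetype: str
    filename: str
    size: int
    checksums: Dict[str, str]


def parse_manifest2_line(line: str) -> Manifest2Entry:
    """Parse '<filetype> <filename> <filesize> <chksumtype> <chksum> ...'."""
    fields = line.split()
    # Three fixed fields plus at least one (type, value) checksum pair.
    if len(fields) < 5 or (len(fields) - 3) % 2 != 0:
        raise ValueError(f"malformed Manifest2 line: {line!r}")
    filetype, filename, size = fields[0], fields[1], int(fields[2])
    # Remaining fields alternate between checksum type and checksum value.
    checksums = dict(zip(fields[3::2], fields[4::2]))
    return Manifest2Entry(filetype, filename, size, checksums)


entry = parse_manifest2_line(
    "AUXFILE procmime.patch 977 "
    "RMD160 39a51a4d654759b15d1644a79fb6e8921130df3c "
    "SHA1 d76929f6dfc2179281f7ccee5789aab4e970ba9e "
    "MD5 bf4c9cd9cb7cdc6ece7d4d327910f0cf"
)
print(entry.filetype, entry.filename, entry.size, sorted(entry.checksums))
```

Because the checksum pairs are kept in a dictionary, a line may carry any number of hash types without the parser needing to know them in advance.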

Compatibility Entries

To maintain compatibility with existing portage versions, a transition period is required after the introduction of the Manifest2 format. During this period portage will not only have to be capable of using existing Manifest and digest files but will also have to generate them in addition to the new entries. Fortunately, for the Manifest files this can be accomplished by simply mixing old and new style entries in one file; existing portage versions will simply ignore the new style entries. For the digest files there are no new entries to care about.
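
As a rough illustration of how mixed files could be handled during the transition, the sketch below separates new style entries from everything else by looking at the first field. It is an assumption-laden sketch, not portage code: the function name split_mixed_manifest is hypothetical, the set of filetypes is taken from the example above, and the old style line in the sample assumes the layout "<chksumtype> <chksum> <filename> <filesize>".

```python
# Filetypes that appear in the Manifest2 example of this GLEP.
MANIFEST2_TYPES = {"AUXFILE", "EBUILD", "MISCFILE", "SRCURI"}


def split_mixed_manifest(text):
    """Return (legacy_lines, manifest2_lines) for a mixed Manifest file.

    Any non-empty line whose first field is not a known Manifest2 filetype
    is passed through untouched as legacy content.
    """
    legacy, new = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        (new if fields[0] in MANIFEST2_TYPES else legacy).append(line)
    return legacy, new


# Sample data: the old style line uses an assumed layout (see lead-in).
sample = (
    "MD5 bf4c9cd9cb7cdc6ece7d4d327910f0cf files/procmime.patch 977\n"
    "AUXFILE procmime.patch 977 MD5 bf4c9cd9cb7cdc6ece7d4d327910f0cf\n"
)
legacy, new = split_mixed_manifest(sample)
print(len(legacy), "legacy line(s),", len(new), "Manifest2 line(s)")
```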

Scope

It is important to note that this proposal only deals with a change of the format of the digest and Manifest system.

It does not expand the scope of the system to cover eclasses, profiles or anything else not already covered by the Manifest system. It also doesn't affect the Manifest signing efforts in any way (though the implementations of both might be coupled).

Also, while multiple hash functions will become standard with the proposed implementation, they are not a specific feature of this format [2].

Rationale

The main goals of the proposal have been listed in the Motivation. This section explains why they are improvements and how the proposed format will accomplish them.

Removal of digest files

Normal users who don't use a "tuned" filesystem for the portage tree are wasting several dozen to a few hundred megabytes of disk space with the current system, largely because of the digest files. This is due to the filesystem overhead present in most filesystems, which have a standard block size of four kilobytes, while most digest files are under one kilobyte in size. This results in a waste of approximately three kilobytes per digest file (likely even more). At the time of this writing the tree contains roughly 22,000 digest files, so the overall waste caused by digest files is estimated at about 70-100 megabytes. Furthermore, it is assumed that this will also reduce the disk space wasted by the Manifest files as they now contain more content, but this hasn't been verified yet.

By unifying the digest files with the Manifest, these tiny files are eliminated (in the long run), reducing the apparent tree size by about 20% and benefiting both users and the Gentoo infrastructure.
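
The estimate for the digest files can be reproduced with simple arithmetic. The sketch below is only a back-of-the-envelope calculation under stated assumptions: every file occupies whole blocks of 4096 bytes, inode and directory overhead are ignored, and filesystems with tail packing are not considered. The function name slack_bytes is hypothetical.

```python
import math

BLOCK_SIZE = 4096      # assumed standard block size in bytes
DIGEST_FILES = 22000   # rough number of digest files given in the text


def slack_bytes(file_size, block_size=BLOCK_SIZE):
    """Bytes allocated but unused for a non-empty file of the given size."""
    blocks = max(1, math.ceil(file_size / block_size))
    return blocks * block_size - file_size


# A digest file of exactly one kilobyte wastes three kilobytes of its block.
per_file = slack_bytes(1024)
total = DIGEST_FILES * per_file
print(per_file, "bytes wasted per file")
print(round(total / 1e6, 1), "MB wasted in total")
```

With one-kilobyte files this yields about 68 MB. Since most digest files are smaller than that, the waste per file approaches the full block size, which bounds the total at about 90 MB under the same assumptions.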

Reducing redundancy

When multiple hashes are used with the current system, both the filename and the filesize are repeated for every checksum type, as each checksum entry is standalone. This repetition adds no functionality, so the new format removes it. At the moment this is only a theoretical improvement, as only one hash function is in use, but that is expected to change soon (see [2]).

Removal of checksum collisions

The current system theoretically allows a SRCURI type file to be recorded in multiple digest files with different sizes and/or checksums. In such a case one version of a package would report a checksum violation while another one would not. This could create confusion and uncertainty among users. So far this case hasn't been observed, but it can't be ruled out with the existing system. As the new format lists each file exactly once, this would no longer be possible.
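
The kind of conflict described here could be detected by collecting all records for each source file across the digest files of a package. The following is a minimal sketch with made-up data: the function name find_conflicts, the digest file names and the record values are all hypothetical, and the records are assumed to be already parsed into (filename, size, checksum) tuples.

```python
from collections import defaultdict


def find_conflicts(digests):
    """Map each filename to its distinct (size, checksum) records.

    Only filenames recorded with more than one distinct (size, checksum)
    combination across all digest files are returned.
    """
    seen = defaultdict(set)
    for records in digests.values():
        for filename, size, checksum in records:
            seen[filename].add((size, checksum))
    return {name: recs for name, recs in seen.items() if len(recs) > 1}


# Hypothetical data: two digest files disagree about the same tarball.
digests = {
    "digest-foo-1.0": [("foo-1.0.tar.bz2", 1000, "aaaa")],
    "digest-foo-1.0-r1": [("foo-1.0.tar.bz2", 1001, "bbbb")],
    "digest-foo-1.1": [("foo-1.1.tar.bz2", 2000, "cccc")],
}
conflicts = find_conflicts(digests)
print(sorted(conflicts))
```

With one entry per file, as in the new format, the set for each filename can never hold more than one record, so this check becomes unnecessary.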

Flexible verification system

Right now portage verifies the checksum of every file listed in the Manifest before using any file of the package and all SRCURI files of an ebuild before using that ebuild. This is unnecessary in many cases:

Backwards Compatibility

Switching the Manifest system is a task that will need a long transition period like most changes affecting both portage and the tree. In this case the implementation will be rolled out in several phases:

  1. Add support for verification of Manifest2 entries in portage
  2. Enable generation of Manifest2 entries in addition to the current system
  3. Ignore digests during emerge --sync to get the size-benefit clientside. This step may be omitted if the following steps are expected to follow soon.
  4. Disable generation of entries for the current system
  5. Remove all traces of the current system from the tree (serverside)

Each step has its own issues. While 1) and 2) can be implemented without any compatibility problems, all later steps have a major impact:

Another problem is that some steps affect different targets:

While it is relatively easy to get all devs to use a new portage version, this is practically impossible with users, as some don't update their systems regularly. Six months are probably sufficient to reach 95% coverage, while one year is estimated to be needed for almost complete coverage. All times are relative to the stable-marking of a compatible portage version.

No timeframe for implementation is presented here as it is highly dependent on the completion of each step.

In summary, while a full conversion will take over a year to complete due to the compatibility issues mentioned above, some benefits of the system can be used selectively as soon as step 2) is completed.

Other problems

Impacts on infrastructure

While one long-term goal of this proposal is to reduce the size of the tree and thereby make life easier for the Gentoo Infrastructure, this will only take effect once the implementation is rolled out completely. In the meantime, however, it will increase the tree size because checksums are kept in both formats. It is not possible to give a usable estimate of the increase, as it depends on many variables such as the exact implementation timeframe, the propagation of Manifest2-capable portage versions among devs, and the update rate of the tree. It has been suggested that Manifest files that are not GPG-signed could be mass-converted in one step. This could certainly help, but only to some degree (according to recent research [3], about 40% of all Manifests in the tree are signed, but this number hasn't been verified).

Reference Implementation

A patch for a prototype implementation of Manifest2 verification and partial generation has been posted at [4]. It will be reworked before being considered for inclusion in portage. However, it shows that adding support for verification is quite simple, while generation is a bit tricky and will therefore be implemented later.
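
To give an impression of why verification is the simple part, the sketch below checks a file against the size and checksums of one entry. It is not the patch posted at [4] and makes several assumptions: the function name verify_file is hypothetical, only the MD5 and SHA1 types are mapped to Python's hashlib (RMD160 is skipped because its availability depends on the local OpenSSL build), and an entry with no supported checksum type is treated as a failure.

```python
import hashlib
import os

# Assumed mapping from Manifest2 checksum type names to hashlib names.
HASHLIB_NAMES = {"MD5": "md5", "SHA1": "sha1"}


def verify_file(path, size, checksums):
    """Return True if the file matches the recorded size and checksums."""
    # The size check is cheap, so do it before hashing anything.
    if os.path.getsize(path) != size:
        return False
    checked = 0
    for chk_type, expected in checksums.items():
        algo = HASHLIB_NAMES.get(chk_type)
        if algo is None:
            continue  # type not supported in this sketch, e.g. RMD160
        digest = hashlib.new(algo)
        with open(path, "rb") as handle:
            # Read in chunks so large source archives need little memory.
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected:
            return False
        checked += 1
    # Require at least one supported checksum to have been verified.
    return checked > 0
```

A caller would pass the size and checksum fields of a parsed entry together with the path of the file on disk, for example a source archive in the distfiles directory.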

Options

Some things have been considered for this GLEP but aren't part of the proposal yet for various reasons:

Credits

Thanks to the following persons for their input on or related to this GLEP (even though they might not have known it): Ned Ludd (solar), Brian Harring (ferringb), Jason Stubbs (jstubbs), Robin H. Johnson (robbat2), Aron Griffis (agriffis)

Thanks also to Nicholas Jones (carpaski) for making the current Manifest system robust enough to handle this change without too many transition problems.

References

[1] http://thread.gmane.org/gmane.linux.gentoo.devel/21920
[2] http://thread.gmane.org/gmane.linux.gentoo.devel/33434
[3] gentoo-core mailing list, topic "Gentoo key signing practices and official Gentoo keyring", Message-ID <20051117075838.GB15734@curie-int.vc.shawcable.net>
[4] http://thread.gmane.org/gmane.linux.gentoo.portage.devel/1374