GLEP: 44
Title: Manifest2 format
Version: $Revision: 1.4 $
Last-Modified: $Date: 2006/02/10 23:26:55 $
Author: Marius Mauch <genone@gentoo.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 04-Dec-2005
Post-History: 06-Dec-2005, 23-Jan-2006

Abstract
========

This GLEP proposes a new format for the Portage Manifest and digest file system
by unifying both filetypes into one to improve functional and non-functional
aspects of the Portage Tree.

Motivation
==========

Please see [#reorg-thread]_ for a general overview.
The main long-term goals of this proposal are to:

- Remove the tiny digest files from the tree. They are a major annoyance: on a
  typical configuration they waste a lot of disk space, and merely transmitting
  the names of all digest files during an ``emerge --sync`` needs a substantial
  amount of bandwidth.
- Reduce redundancy when multiple hash functions are used.
- Remove the potential for checksum collisions if a file is recorded in more
  than one digest file.
- Differentiate between filetypes to allow a more flexible verification system.

Specification
=============

The new Manifest format would change the existing format in the following ways:

- Addition of a filetype specifier. The types currently planned are:

  * ``AUX`` for files directly used by ebuilds (e.g. patches or initscripts),
    located in the ``files/`` subdirectory

  * ``EBUILD`` for all ebuilds

  * ``MISC`` for files not directly used by ebuilds, like ``ChangeLog`` or
    ``metadata.xml`` files

  * ``DIST`` for release tarballs recorded in the ``SRC_URI`` variable of an
    ebuild; these were previously recorded in the digest files

  Future portage improvements might extend this list (for example with types
  relevant for eclasses or profiles).

- Only one line per file, listing all information, instead of one line per
  file and checksum type

- Removal of the separate digest-* files in the ``files/`` subdirectory

Each line in the new format has the following form:

::

  <filetype> <filename> <filesize> <chksumtype1> <chksum1> ... <chksumtypen> <chksumn>

However, these entries will be stored in the existing Manifest files.

An `actual example`__ of a (pure) Manifest2 file is available.

.. __: glep-0044-extras/manifest2-example.txt

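The line format above could be parsed roughly as follows. This is only a
minimal sketch, not the portage implementation: the names ``Manifest2Entry``
and ``parse_manifest2_line`` are hypothetical, and the sketch assumes that
filenames contain no whitespace and that every entry carries at least one
checksum pair.

```python
from typing import Dict, NamedTuple

# Filetypes named in this GLEP; future portage versions might extend the list.
FILETYPES = {"AUX", "EBUILD", "MISC", "DIST"}


class Manifest2Entry(NamedTuple):
    filetype: str
    filename: str
    filesize: int
    checksums: Dict[str, str]  # checksum type -> checksum value


def parse_manifest2_line(line: str) -> Manifest2Entry:
    """Split one Manifest2 line into its fields (hypothetical helper)."""
    fields = line.split()
    # Three fixed fields, then (type, checksum) pairs. Requiring at least
    # one pair is an assumption of this sketch.
    if len(fields) < 5 or len(fields) % 2 != 1:
        raise ValueError("malformed Manifest2 line: %r" % line)
    filetype, filename, filesize = fields[:3]
    if filetype not in FILETYPES:
        raise ValueError("unknown filetype: %r" % filetype)
    pairs = fields[3:]
    checksums = dict(zip(pairs[0::2], pairs[1::2]))
    return Manifest2Entry(filetype, filename, int(filesize), checksums)
```

Keeping the checksums in a mapping keyed by checksum type means the parser does
not need to know in advance which hash functions a given entry uses.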
Compatibility Entries
---------------------

To maintain compatibility with existing portage versions, a transition period
is required after the introduction of the Manifest2 format, during which portage
will not only have to be capable of using existing Manifest and digest files but
also generate them in addition to the new entries.
Fortunately, for the Manifest files this can be accomplished by simply mixing
old and new style entries in one file; existing portage versions will simply
ignore the new style entries. For the digest files there are no new entries to
care about.

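A reader of such a mixed file could separate the two kinds of entries by
looking at the first field of each line. The sketch below is only an
illustration: ``split_entries`` is a hypothetical name, anything that does not
start with one of the four filetypes is simply treated as "not Manifest2", and
signature armor lines in signed Manifests are not handled.

```python
# Filetypes introduced by this GLEP; a line starting with one of them is
# treated as a new style (Manifest2) entry.
MANIFEST2_FILETYPES = {"AUX", "EBUILD", "MISC", "DIST"}


def split_entries(lines):
    """Return (old_style, new_style) lists of lines from a mixed Manifest."""
    old_style, new_style = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] in MANIFEST2_FILETYPES:
            new_style.append(line)
        else:
            old_style.append(line)
    return old_style, new_style
```

The old style line used when trying this out (hash type first, then checksum,
filename and size) is an assumption about the existing format, which this
document does not spell out.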
Scope
-----

It is important to note that this proposal only deals with a change of the
format of the digest and Manifest system.

It does not expand the scope of that system to cover eclasses, profiles or
anything else not already covered by the Manifest system. It also doesn't affect
the Manifest signing efforts in any way (though the implementations of both
might be coupled).

Also, while multiple hash functions will become standard with the proposed
implementation, they are not a specific feature of this format [#multi-hash-thread]_.

Number of hashes
----------------

While using multiple hashes for each file is a major feature of this proposal,
we have to make sure that the number of hashes listed is limited to avoid
an explosion of the Manifest size that would negate the main benefit of this
proposal (reducing tree size). Therefore the number of hashes that will be
generated will be limited to three different hash functions. For compatibility,
though, we have to rely on at least one hash function always being present. This
proposal suggests using SHA1 for this purpose, as it is supposed to be more
secure than MD5, currently only SHA1 and MD5 are directly available in python,
and MD5 doesn't have any benefit in terms of compatibility.

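The two rules of this section (at most three hash functions, SHA1 always
present) might be enforced at generation time along these lines. This is a
sketch under assumptions: ``make_entry`` and the constants are hypothetical
names, the checksum type keywords other than SHA1 and MD5 are illustrative,
and it uses the ``hashlib`` module of current Python versions rather than what
was available when this GLEP was written.

```python
import hashlib

REQUIRED_HASH = "SHA1"  # the hash this GLEP suggests to always include
MAX_HASHES = 3          # upper limit proposed in this section

# Manifest keyword -> hashlib algorithm name (keyword spelling is assumed)
HASHLIB_NAMES = {"SHA1": "sha1", "MD5": "md5", "SHA256": "sha256"}


def make_entry(filetype, filename, data, hash_names=("SHA1",)):
    """Build one Manifest2 line for in-memory file content ``data``."""
    names = list(hash_names)
    if REQUIRED_HASH not in names:
        raise ValueError("%s must always be present" % REQUIRED_HASH)
    if len(names) > MAX_HASHES:
        raise ValueError("at most %d hash functions allowed" % MAX_HASHES)
    fields = [filetype, filename, str(len(data))]
    for name in names:
        digest = hashlib.new(HASHLIB_NAMES[name], data).hexdigest()
        fields += [name, digest]
    return " ".join(fields)
```

Filename and size appear once per entry regardless of how many hashes are
listed, so each additional hash only adds its keyword and digest to the line.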
Rationale
=========

The main goals of the proposal have been listed in the `Motivation`_. This
section explains why they are improvements and how the proposed format will
accomplish them.

Removal of digest files
-----------------------

Normal users that don't use a "tuned" filesystem for the portage tree are
wasting several dozen to a few hundred megabytes of disk space with the current
system, largely caused by the digest files.
This is due to the overhead present in most filesystems, which
have a standard blocksize of four kilobytes, while most digest files are under
one kilobyte in size. This results in a waste of approximately three kilobytes
per digest file (likely even more). At the time of this writing the tree contains
roughly 22,000 digest files, so the overall waste caused by digest files is
estimated at about 70-100 megabytes.
Furthermore, it is assumed that this will also reduce the disk space wasted by
the Manifest files as they now contain more content, but this hasn't been
verified yet.

By unifying the digest files with the Manifest these tiny files are eliminated
(in the long run), reducing the apparent tree size by about 20%, benefiting
both users and the Gentoo infrastructure.

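The estimate above can be reproduced with a short calculation. The block size
and the number of digest files are taken from the text; the two example file
sizes (1024 and 64 bytes) are assumptions chosen to bracket "under one
kilobyte", and per-file metadata overhead such as inodes is ignored.

```python
import math

BLOCK_SIZE = 4096      # four kilobyte blocks, as stated in the text
NUM_DIGESTS = 22_000   # rough number of digest files in the tree


def wasted_bytes(file_size, block_size=BLOCK_SIZE):
    """Slack space left in the last block allocated to a file."""
    allocated = math.ceil(file_size / block_size) * block_size
    return allocated - file_size


# Assumed digest sizes: 1024 bytes (upper end) and 64 bytes (very small file)
for size in (1024, 64):
    total = NUM_DIGESTS * wasted_bytes(size)
    print("%4d byte digests: %.1f MB wasted" % (size, total / 10**6))
```

Under these assumptions the slack space comes out at roughly 68 to 89
megabytes, which is in line with the 70-100 megabyte estimate.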
Reducing redundancy
-------------------

When multiple hashes are used with the current system,
both the filename and filesize are repeated for every checksum type used, as each
checksum is standalone. This repetition doesn't add any functionality,
so the new format removes it.
This is a theoretical improvement at the moment as only one hash function is in
use, but that is expected to change soon (see [#multi-hash-thread]_).

Removal of checksum collisions
------------------------------

The current system theoretically allows for a ``DIST`` type file to be recorded
in multiple digest files with different sizes and/or checksums. In such a case
one version of a package would report a checksum violation while another one
would not. This could create confusion and uncertainty among users.
So far this case hasn't been observed, but it can't be ruled out with the
existing system.
As the new format lists each file exactly once, this would no longer be possible.

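To illustrate the kind of inconsistency described here, the following sketch
looks for distfiles that are recorded with differing size or checksum values
across several digest files. The record layout and the name ``find_conflicts``
are hypothetical; reading the digest files themselves is left out.

```python
from collections import defaultdict


def find_conflicts(digest_records):
    """Find distfiles recorded with more than one (size, checksum) pair.

    digest_records: iterable of (digest_file, dist_name, size, checksum)
    tuples, e.g. collected from all digest files of one package.
    """
    seen = defaultdict(set)
    for _digest_file, dist_name, size, checksum in digest_records:
        seen[dist_name].add((size, checksum))
    # Keep only the names for which the recorded values disagree
    return {name: sorted(values) for name, values in seen.items()
            if len(values) > 1}
```

With one entry per file in a single Manifest, there is only one record per
distfile and such a check has nothing left to find.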
Flexible verification system
----------------------------

Right now portage verifies the checksum of every file listed in the Manifest
before using any file of the package and all ``DIST`` files of an ebuild
before using that ebuild. This is unnecessary in many cases:

- During the "depend" phase (when the ebuild metadata is generated) only
  files of type ``EBUILD`` are used, so verifying the other types isn't
  necessary. Theoretically it is possible for an ebuild to include other
  files like those of type ``AUX`` at this phase, but that would be a
  major QA violation and should never occur, so it can be ignored here.
  It is also not a security concern as the ebuild is verified before parsing
  it, so any manipulation would show up.

- Generally files of type ``MISC`` don't need to be verified as they are
  only used in very specific situations, aren't executed (just parsed at most)
  and don't affect the package build process.

- Files of type ``DIST`` only need to be verified directly after fetching and
  before unpacking them (which often will be one step), not every time their
  associated ebuild is used.

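The cases above amount to a mapping from the situation portage is in to the
filetypes that need checking. A minimal sketch of such a policy table follows.
Only the "depend" phase is named in this document; the "fetch" and "build"
labels, the choice to check ``EBUILD`` and ``AUX`` files when building, and the
function name are assumptions made for illustration.

```python
# Situation -> filetypes to verify. MISC is deliberately absent from every
# set, following the reasoning that such files generally need no check.
VERIFY_POLICY = {
    "depend": {"EBUILD"},         # metadata generation, from the text
    "fetch": {"DIST"},            # hypothetical label: after fetching
    "build": {"EBUILD", "AUX"},   # hypothetical label and selection
}


def files_to_verify(phase, entries):
    """Select filenames to check; entries are (filetype, filename) pairs."""
    wanted = VERIFY_POLICY[phase]
    return [name for filetype, name in entries if filetype in wanted]
```

Because the filetype is stored with each entry, this selection needs nothing
beyond the Manifest itself.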
Backwards Compatibility
=======================

Switching the Manifest system is a task that will need a long transition period,
like most changes affecting both portage and the tree. In this case the
implementation will be rolled out in several phases:

1. Add support for verification of Manifest2 entries in portage

2. Enable generation of Manifest2 entries in addition to the current system

3. Ignore digests during ``emerge --sync`` to get the size benefit on the
   client side. This step may be omitted if the following steps are expected
   to follow soon.

4. Disable generation of entries for the current system

5. Remove all traces of the current system from the tree (server side)

Each step has its own issues. While 1) and 2) can be implemented without any
compatibility problems, all later steps have a major impact:

- Step 3) can only be implemented when the whole tree is Manifest2 ready
  (ideally speaking; practically the requirement will be more like 95% coverage,
  with the expectation that for the remaining 5% either bugs will be filed after
  step 3) is completed or they'll be updated at step 5)).

- Steps 4) and 5) will render all portage versions without Manifest2 support
  basically useless (users would have to regenerate the digest and Manifest
  for each package before being able to merge it), so this requires almost
  100% coverage of the userbase with Manifest2 capable portage versions
  (with step 1) completely implemented).

Another problem is that some steps affect different targets:

- Steps 1) and 3) target portage versions used by users

- Steps 2) and 4) target portage versions used by devs

- Step 5) targets the portage tree on the cvs server

While it is relatively easy to get all devs to use a new portage version, this is
practically impossible with users as some don't update their systems regularly.
While six months are probably sufficient to reach 95% coverage, one year is
estimated to be needed to reach almost-complete coverage. All times are relative
to the stable-marking of a compatible portage version.

No timeframe for implementation is presented here as it is highly dependent
on the completion of each step.

In summary, while a full conversion will take over a year
to complete due to the compatibility issues mentioned above, some benefits of the
system can selectively be used as soon as step 2) is completed.

Other problems
==============

Impacts on infrastructure
-------------------------

While one long-term goal of this proposal is to reduce the size of the tree
and therefore make life for the Gentoo Infrastructure easier, this will only
take effect once the implementation is rolled out completely. In the meantime,
however, it will increase the tree size due to keeping checksums in both formats.
It's not possible to give a usable estimate of the degree of the increase as
it depends on many variables such as the exact implementation timeframe,
propagation of Manifest2 capable portage versions among devs or the update
rate of the tree. It has been suggested that Manifest files that are not gpg
signed could be mass converted in one step. This could certainly help, but only
to some degree (according to recent research [#gpg-numbers]_ about 40% of
all Manifests in the tree are signed, but this number hasn't been verified).

Reference Implementation
========================

A patch for a prototype implementation of Manifest2 verification and partial
generation has been posted at [#manifest2-patch]_. It will be reworked before
being considered for inclusion in portage. However, it shows that adding support
for verification is quite simple, while generation is a bit tricky and will
therefore be implemented later.

Options
=======

Some things have been considered for this GLEP but aren't part of the proposal
yet for various reasons:

- timestamp field: the author has considered adding a timestamp field to
  each entry to record the time the entry was created. However, so far no
  practical use for such a feature has been found.

- convert size field into checksum: Another idea was to treat the size field
  like any other checksum. But so far no real benefit (other than a slightly
  more modular implementation) for this has been seen while it has several
  drawbacks: for one, unlike checksums, the size field is definitely required
  for all ``DIST`` files; also it would slightly increase the length of
  each entry by adding a ``SIZE`` keyword.

- removal of the ``MISC`` type: It has been suggested to completely drop
  entries of type ``MISC``. This would result in a minor space reduction
  (it's rather unlikely to free any blocks) but completely remove the ability
  to check these files for integrity. While they don't influence portage
  or packages directly they can contain valuable information for users, so
  the author is of the opinion that at least the option for integrity checks
  should be kept.

Credits
=======

Thanks to the following persons for their input on or related to this GLEP
(even though they might not have known it):
Ned Ludd (solar), Brian Harring (ferringb), Jason Stubbs (jstubbs),
Robin H. Johnson (robbat2), Aron Griffis (agriffis)

Also thanks to Nicholas Jones (carpaski) for making the current Manifest system
resilient enough to be able to handle this change without too many transition
problems.

References
==========

.. [#reorg-thread] http://thread.gmane.org/gmane.linux.gentoo.devel/21920

.. [#multi-hash-thread] http://thread.gmane.org/gmane.linux.gentoo.devel/33434

.. [#gpg-numbers] gentoo-core mailing list, topic "Gentoo key signing practices
   and official Gentoo keyring", Message-ID <20051117075838.GB15734@curie-int.vc.shawcable.net>

.. [#manifest2-patch] http://thread.gmane.org/gmane.linux.gentoo.portage.devel/1374

.. [#manifest2-example] http://www.gentoo.org/proj/en/glep/glep-0044-extras/manifest2-example

Copyright
=========

This document has been placed in the public domain.
