.. Revision 1.3, committed Mon Jan 23 10:24:24 2006 UTC by genone (branch MAIN,
   +14 -14 lines since 1.2): s/SRCURI/DISTFILE/ and a few spelling errors

GLEP: 44
Title: Manifest2 format
Version: $Revision: 1.2 $
Last-Modified: $Date: 2005/12/06 16:19:37 $
Author: Marius Mauch <genone@gentoo.org>,
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 04-Dec-2005
Post-History: 06-Dec-2005, 23-Jan-2006


Abstract
========

This GLEP proposes a new format for the Portage Manifest and digest file system
by unifying both filetypes into one to improve functional and non-functional
aspects of the Portage Tree.


Motivation
==========

Please see [#reorg-thread]_ for a general overview.
The main long-term goals of this proposal are to:

- Remove the tiny digest files from the tree. They are a major annoyance: on a
  typical configuration they waste a lot of disk space, and merely transmitting
  the names of all digest files during an ``emerge --sync`` needs a substantial
  amount of bandwidth.
- Reduce redundancy when multiple hash functions are used
- Remove the potential for checksum collisions if a file is recorded in more
  than one digest file
- Differentiate between filetypes for a more flexible verification system


Specification
=============

The new Manifest format would change the existing format in the following ways:

- Addition of a filetype specifier. The types currently planned are:

  * ``AUXFILE`` for files directly used by ebuilds (e.g. patches or initscripts),
    located in the ``files/`` subdirectory

  * ``EBUILD`` for all ebuilds

  * ``MISCFILE`` for files not directly used by ebuilds, like ``ChangeLog`` or
    ``metadata.xml`` files

  * ``DISTFILE`` for release tarballs recorded in the ``SRC_URI`` variable of an
    ebuild; these were previously recorded in the digest files

  Future portage improvements might extend this list (for example with types
  relevant for eclasses or profiles).

- Only have one line per file listing all information instead of one line per
  file and checksum type

- Remove the separate digest-* files in the ``files/`` subdirectory

Each line in the new format has the following format:

::

  <filetype> <filename> <filesize> <chksumtype1> <chksum1> ... <chksumtypen> <chksumn>


However, these entries will be stored in the existing Manifest files.

An actual example for a (pure) Manifest2 file could look like this (using
indentation to indicate line continuation):

::

  AUXFILE ldif-buffer-overflow-fix.diff 5007 RMD160 1354a6bd2687430b628b78aaf43f5c793d2f0704
    SHA1 424e1dfca06488f605b9611160020227ecdd03ac MD5 06d23c04b3d6ddfb1431c22ecc5b28f6
  AUXFILE procmime.patch 977 RMD160 39a51a4d654759b15d1644a79fb6e8921130df3c
    SHA1 d76929f6dfc2179281f7ccee5789aab4e970ba9e MD5 bf4c9cd9cb7cdc6ece7d4d327910f0cf
  EBUILD sylpheed-claws-1.0.5-r1.ebuild 3906 RMD160 cdd546c128db2dea7044437de01ec96e12b4f5bf
    SHA1 a84b49e76961d7a9100852b64c2bfbf9b053d45e MD5 b9fe79135a475458ef1b2240ee302ebd
  EBUILD sylpheed-claws-1.9.100.ebuild 4444 RMD160 89326038bfc694dafd22f10400a08d3f930fb2bd
    SHA1 8895342f3f0cc6fcbdd0fdada2ad8e23ce539d23 MD5 0643de736b42d8c0e1673e86ae0b7f80
  EBUILD sylpheed-claws-1.9.15.ebuild 4821 RMD160 ec0ff811b893084459fe5b17b8ba8d6b35a55687
    SHA1 358278a43da244e1f4803ec4b04d6fa45c41ab4d MD5 15b5c9348ba0b0a416892588256b4cbc
  MISCFILE ChangeLog 25770 RMD160 0e69dd7425add1560d630dd3367342418e9be776
    SHA1 1210160f7baf0319de3b1b58dc80d7680d316d28 MD5 732cdc3b41403a115970d497a9ec257e
  MISCFILE metadata.xml 269 RMD160 39d775de55f9963f8946feaf088aa0324770bacb
    SHA1 4fd7b285049d0e587f89e86becf06c0fd77bae6d MD5 82e806ed62f0596fb7bef493d225712f
  DISTFILE sylpheed-claws-1.0.5.tar.bz2 3268626 RMD160 f2708b5d69bc9a5025812511fde04eca7782e367
    SHA1 d351d7043eef7a875df18a8c4b9464be49e2164b MD5 ef4a1a7beb407dc7c31b4799bc48f12e
  DISTFILE sylpheed-claws-1.9.100.tar.bz2 3480063 RMD160 72fbcbcc05d966f34897efcc1c96377420dc5544
    SHA1 47465662b5470af5711493ce4eaad764c5bf02ca MD5 863c314557f90f17c2f6d6a0ab57e6c2
  DISTFILE sylpheed-claws-1.9.15.tar.bz2 3481018 RMD160 b01d1af2df55806a8a8275102b10e389e0d98e94
    SHA1 a17fc64b8dcc5b56432e5beb5c826913cb3ad79e MD5 0d187526e0eca23b87ffa4981f7e1765


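The line format above could be read with a small parser along the following
lines. This is only a sketch and not the portage implementation: the names
``Manifest2Entry`` and ``parse_manifest2_line`` are made up for illustration,
and it assumes that filenames contain no whitespace and that a continued entry
has already been joined into a single line.

```python
from typing import Dict, NamedTuple

# Filetypes named in this GLEP; later portage versions might extend the list.
KNOWN_FILETYPES = {"AUXFILE", "EBUILD", "MISCFILE", "DISTFILE"}


class Manifest2Entry(NamedTuple):
    """Hypothetical container for one parsed Manifest2 line."""
    filetype: str
    filename: str
    size: int
    checksums: Dict[str, str]  # e.g. {"SHA1": "...", "MD5": "..."}


def parse_manifest2_line(line: str) -> Manifest2Entry:
    """Parse '<filetype> <filename> <filesize> <chksumtype> <chksum> ...'."""
    fields = line.split()
    if len(fields) < 3 or fields[0] not in KNOWN_FILETYPES:
        raise ValueError(f"not a Manifest2 entry: {line!r}")
    filetype, filename, size, *rest = fields
    if len(rest) % 2 != 0:
        raise ValueError(f"checksum type without value: {line!r}")
    # Pair up the alternating <chksumtype> <chksum> fields.
    checksums = dict(zip(rest[0::2], rest[1::2]))
    return Manifest2Entry(filetype, filename, int(size), checksums)


# Entry taken from the example above, with the continuation line joined.
entry = parse_manifest2_line(
    "AUXFILE procmime.patch 977 "
    "RMD160 39a51a4d654759b15d1644a79fb6e8921130df3c "
    "SHA1 d76929f6dfc2179281f7ccee5789aab4e970ba9e "
    "MD5 bf4c9cd9cb7cdc6ece7d4d327910f0cf"
)
print(entry.filetype, entry.filename, entry.size, sorted(entry.checksums))
```

Because the size is a fixed third field and the checksums follow as pairs, any
number of hash functions can be listed without changing the parser.
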
Compatibility Entries
---------------------

To maintain compatibility with existing portage versions, a transition period
after the introduction of the Manifest2 format is required, during which portage
will not only have to be capable of using existing Manifest and digest files but
will also have to generate them in addition to the new entries.
Fortunately this can be accomplished by simply mixing old and new style entries
in one file for the Manifest files; existing portage versions will simply ignore
the new style entries. For the digest files there are no new entries to care
about.

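A reader of such a mixed Manifest could tell the two styles apart by the first
field of each line. The sketch below is an illustration only: it assumes that
old style entries have the form ``<chksumtype> <chksum> <filename> <filesize>``
(this GLEP does not spell out the old format), the set of hash names is taken
from the example above, and ``classify_entry`` is a made-up helper.

```python
FILETYPES = {"AUXFILE", "EBUILD", "MISCFILE", "DISTFILE"}
# Hash names as used in this GLEP's example; the exact set is an assumption.
HASH_NAMES = {"MD5", "SHA1", "RMD160"}


def classify_entry(line: str) -> str:
    """Return 'manifest2', 'old' or 'unknown' based on the first field."""
    fields = line.split()
    if not fields:
        return "unknown"
    if fields[0] in FILETYPES:
        return "manifest2"
    # Assumed old layout: <chksumtype> <chksum> <filename> <filesize>
    if fields[0] in HASH_NAMES and len(fields) == 4:
        return "old"
    return "unknown"


# One file recorded in both styles, as during the transition period.
mixed = [
    "MD5 732cdc3b41403a115970d497a9ec257e ChangeLog 25770",
    "MISCFILE ChangeLog 25770 MD5 732cdc3b41403a115970d497a9ec257e",
]
new_style = [line for line in mixed if classify_entry(line) == "manifest2"]
old_style = [line for line in mixed if classify_entry(line) == "old"]
print(len(new_style), len(old_style))
```
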
Scope
-----

It is important to note that this proposal only deals with a change of the
format of the digest and Manifest system.

It does not expand the scope of that system to cover eclasses, profiles or
anything else not already covered by the Manifest system. It also doesn't affect
the Manifest signing efforts in any way (though the implementations of both
might be coupled).

Also, while multiple hash functions will become standard with the proposed
implementation, they are not a specific feature of this format [#multi-hash-thread]_.


Rationale
=========

The main goals of the proposal have been listed in the `Motivation`_. This
section explains why they are improvements and how the proposed format will
accomplish them.

Removal of digest files
-----------------------

Normal users that don't use a "tuned" filesystem for the portage tree are
wasting several dozen to a few hundred megabytes of disk space with the current
system, largely caused by the digest files.
This is due to the overhead present in most filesystems, which have a standard
blocksize of four kilobytes, while most digest files are under one kilobyte in
size. This results in a waste of approximately three kilobytes per digest file
(likely even more). At the time of this writing the tree contains
roughly 22,000 digest files, so the overall waste caused by digest files is
estimated at about 70-100 megabytes.
Furthermore it is assumed that this will also reduce the disk space wasted by
the Manifest files as they now contain more content, but this hasn't been
verified yet.

By unifying the digest files with the Manifest these tiny files are eliminated
(in the long run), reducing the apparent tree size by about 20%, benefiting
both users and the Gentoo infrastructure.

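The estimate can be reproduced with a short calculation. The sketch below
assumes that every digest file is exactly one kilobyte (1024 bytes) and
occupies one four-kilobyte block; ``slack_bytes`` is a hypothetical helper, and
per-file metadata such as inodes is not counted.

```python
import math

BLOCK_SIZE = 4096       # bytes, the "standard blocksize of four kilobytes"
DIGEST_FILES = 22_000   # rough number of digest files quoted in the text


def slack_bytes(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes allocated but unused when a file occupies whole blocks."""
    blocks = math.ceil(file_size / block_size)
    return blocks * block_size - file_size


# Assumption: each digest file is exactly 1 KiB; smaller files waste more.
per_file = slack_bytes(1024)
total = per_file * DIGEST_FILES
print(per_file, total, round(total / 10**6, 1))
```

Under these assumptions the waste is about 68 megabytes, close to the lower end
of the range given above. Digest files smaller than one kilobyte would push the
figure higher.
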
Reducing redundancy
-------------------

When multiple hashes are used with the current system,
both the filename and filesize are repeated for every checksum type used, as each
checksum is standalone. However, this doesn't add any functionality and is
therefore useless, so the new format removes this redundancy.
This is a theoretical improvement at this moment as only one hash function is in
use, but this is expected to change soon (see [#multi-hash-thread]_).

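To illustrate the redundancy, the following sketch collapses several old style
lines for the same file into a single new style line. It assumes the old layout
``<chksumtype> <chksum> <filename> <filesize>``, which is not spelled out in this
GLEP, and ``merge_old_entries`` is a hypothetical helper, not part of portage.

```python
from typing import Dict, List, Tuple


def merge_old_entries(filetype: str, old_lines: List[str]) -> List[str]:
    """Collapse per-checksum lines into one Manifest2 line per file."""
    merged: Dict[str, Tuple[str, Dict[str, str]]] = {}
    for line in old_lines:
        # Assumed old layout: <chksumtype> <chksum> <filename> <filesize>
        hash_name, digest, filename, size = line.split()
        known_size, sums = merged.setdefault(filename, (size, {}))
        if known_size != size:
            raise ValueError(f"conflicting sizes for {filename}")
        sums[hash_name] = digest
    result = []
    for filename, (size, sums) in merged.items():
        # Flatten {"MD5": "..."} into ["MD5", "..."] after the fixed fields.
        pairs = [field for pair in sums.items() for field in pair]
        result.append(" ".join([filetype, filename, size] + pairs))
    return result


old = [
    "MD5 82e806ed62f0596fb7bef493d225712f metadata.xml 269",
    "SHA1 4fd7b285049d0e587f89e86becf06c0fd77bae6d metadata.xml 269",
]
new = merge_old_entries("MISCFILE", old)
print(new[0])
```

The filename and size appear twice in the old entries and only once in the
merged line.
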
Removal of checksum collisions
------------------------------

The current system theoretically allows for a ``DISTFILE`` type file to be recorded
in multiple digest files with different sizes and/or checksums. In such a case
one version of a package would report a checksum violation while another one
would not. This could create confusion and uncertainty among users.
So far this case hasn't been observed, but it can't be ruled out with the
existing system.
As the new format lists each file exactly once, this would no longer be possible.

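The kind of conflict described here could be detected as sketched below. The
data layout, the helper ``find_conflicts`` and the sample records (including
the placeholder checksums) are all hypothetical and only serve to show the
problem.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str]  # (size, checksum)


def find_conflicts(digests: Dict[str, Dict[str, Record]]) -> List[str]:
    """Return distfiles recorded with differing size or checksum.

    `digests` maps a digest file name to {distfile name: (size, checksum)}.
    """
    seen: Dict[str, Record] = {}
    conflicts = set()
    for records in digests.values():
        for distfile, record in records.items():
            # Remember the first record and compare later ones against it.
            if seen.setdefault(distfile, record) != record:
                conflicts.add(distfile)
    return sorted(conflicts)


# Made-up data: two digest files disagree about the same tarball.
example = {
    "digest-foo-1.0": {"foo-1.0.tar.bz2": (1000, "aaaa")},
    "digest-foo-1.0-r1": {"foo-1.0.tar.bz2": (1000, "bbbb")},
    "digest-foo-1.1": {"foo-1.1.tar.bz2": (1200, "cccc")},
}
print(find_conflicts(example))
```

With one entry per file in a single Manifest, there is no second record that
could disagree with the first.
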
Flexible verification system
----------------------------

Right now portage verifies the checksum of every file listed in the Manifest
before using any file of the package and all ``DISTFILE`` files of an ebuild
before using that ebuild. This is unnecessary in many cases:

- During the "depend" phase (when the ebuild metadata is generated) only
  files of type ``EBUILD`` are used, so verifying the other types isn't
  necessary. Theoretically it is possible for an ebuild to include other
  files like those of type ``AUXFILE`` at this phase, but that would be a
  major QA violation and should never occur, so it can be ignored here.
  It is also not a security concern as the ebuild is verified before parsing
  it, so any manipulation would show up.

- Generally files of type ``MISCFILE`` don't need to be verified as they are
  only used in very specific situations, aren't executed (just parsed at most)
  and don't affect the package build process.

- Files of type ``DISTFILE`` only need to be verified directly after fetching and
  before unpacking them (which often will be one step), not every time their
  associated ebuild is used.


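The cases above amount to a mapping from the current operation to the filetypes
that need verification. A minimal sketch of such a policy follows. The phase
names other than "depend" and the decision to verify ``AUXFILE`` entries in a
"build" phase are assumptions made for illustration, not something this GLEP
specifies.

```python
# Hypothetical phase names; only "depend" is named in the text above.
TYPES_TO_VERIFY = {
    "depend": {"EBUILD"},            # metadata generation only reads ebuilds
    "fetch": {"DISTFILE"},           # check tarballs directly after fetching
    "unpack": {"DISTFILE"},          # and before unpacking them
    "build": {"EBUILD", "AUXFILE"},  # assumption: patches etc. are used here
}


def needs_verification(filetype: str, phase: str) -> bool:
    """Tell whether files of `filetype` should be checked in `phase`."""
    return filetype in TYPES_TO_VERIFY.get(phase, set())


print(needs_verification("EBUILD", "depend"))
print(needs_verification("AUXFILE", "depend"))
print(any(needs_verification("MISCFILE", p) for p in TYPES_TO_VERIFY))
```

In this sketch ``MISCFILE`` entries are never checked automatically, which
still leaves room for an explicit integrity check on request.
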
Backwards Compatibility
=======================

Switching the Manifest system is a task that will need a long transition period,
like most changes affecting both portage and the tree. In this case the
implementation will be rolled out in several phases:

1. Add support for verification of Manifest2 entries in portage

2. Enable generation of Manifest2 entries in addition to the current system

3. Ignore digests during ``emerge --sync`` to get the size benefit on the client
   side. This step may be omitted if the following steps are expected to follow
   soon.

4. Disable generation of entries for the current system

5. Remove all traces of the current system from the tree (server side)

Each step has its own issues. While 1) and 2) can be implemented without any
compatibility problems, all later steps have a major impact:

- Step 3) can only be implemented when the whole tree is Manifest2 ready.
  That is the ideal; in practice the requirement will be more like 95% coverage,
  with the expectation that for the remaining 5% either bugs will be filed after
  step 3) is completed or they'll be updated at step 5).

- Steps 4) and 5) will render all portage versions without Manifest2 support
  basically useless (users would have to regenerate the digest and Manifest
  for each package before being able to merge it), so this requires an almost
  100% coverage of the userbase with Manifest2 capable portage versions
  (with step 1) completely implemented).

Another problem is that some steps affect different targets:

- Steps 1) and 3) target portage versions used by users

- Steps 2) and 4) target portage versions used by devs

- Step 5) targets the portage tree on the cvs server

While it is relatively easy to get all devs to use a new portage version, this is
practically impossible with users, as some don't update their systems regularly.
While six months are probably sufficient to reach 95% coverage, one year is
estimated to be needed to reach almost-complete coverage. All times are relative
to the stable-marking of a compatible portage version.

No timeframe for implementation is presented here as it is highly dependent
on the completion of each step.

In summary, while a full conversion will take over a year to be completed due to
the compatibility issues mentioned above, some benefits of the system can
selectively be used as soon as step 2) is completed.


Other problems
==============

Impacts on infrastructure
-------------------------

While one long-term goal of this proposal is to reduce the size of the tree
and therefore make life easier for the Gentoo Infrastructure, this will only
take effect once the implementation is rolled out completely. In the meantime,
however, it will increase the tree size due to keeping checksums in both formats.
It's not possible to give a usable estimate of the degree of the increase as
it depends on many variables such as the exact implementation timeframe,
propagation of Manifest2 capable portage versions among devs or the update
rate of the tree. It has been suggested that Manifest files that are not gpg
signed could be mass converted in one step. This could certainly help, but only
to some degree (according to recent research [#gpg-numbers]_ about 40% of
all Manifests in the tree are signed, but this number hasn't been verified).


Reference Implementation
========================

A patch for a prototype implementation of Manifest2 verification and partial
generation has been posted at [#manifest2-patch]_. It will be reworked before
being considered for inclusion in portage. However, it shows that adding support
for verification is quite simple, while generation is a bit tricky and will
therefore be implemented later.


Options
=======

Some things have been considered for this GLEP but aren't part of the proposal
yet for various reasons:

- timestamp field: the author has considered adding a timestamp field for
  each entry to list the time the entry was created. However, so far no practical
  use for such a feature has been found.

- convert size field into checksum: Another idea was to treat the size field
  like any other checksum. But so far no real benefit (other than a slightly
  more modular implementation) for this has been seen, while it has several
  drawbacks: For one, unlike checksums, the size field is definitely required
  for all ``DISTFILE`` files. It would also slightly increase the length of
  each entry by adding a ``SIZE`` keyword.

- removal of the ``MISCFILE`` type: It has been suggested to completely drop
  entries of type ``MISCFILE``. This would result in a minor space reduction
  (it's rather unlikely to free any blocks) but completely remove the ability
  to check these files for integrity. While they don't influence portage
  or packages directly, they can contain valuable information for users, so
  the author is of the opinion that at least the option for integrity checks
  should be kept.

Credits
=======

Thanks to the following persons for their input on or related to this GLEP
(even though they might not have known it):
Ned Ludd (solar), Brian Harring (ferringb), Jason Stubbs (jstubbs),
Robin H. Johnson (robbat2), Aron Griffis (agriffis)

Also thanks to Nicholas Jones (carpaski) for making the current Manifest system
resilient enough to be able to handle this change without too many transition
problems.

References
==========

.. [#reorg-thread] http://thread.gmane.org/gmane.linux.gentoo.devel/21920

.. [#multi-hash-thread] http://thread.gmane.org/gmane.linux.gentoo.devel/33434

.. [#gpg-numbers] gentoo-core mailing list, topic "Gentoo key signing practices
   and official Gentoo keyring", Message-ID <20051117075838.GB15734@curie-int.vc.shawcable.net>

.. [#manifest2-patch] http://thread.gmane.org/gmane.linux.gentoo.portage.devel/1374

Copyright
=========

This document has been placed in the public domain.
