Diff of /xml/htdocs/proj/en/glep/glep-0044.txt

Revision 1.1 Revision 1.7
1GLEP: 44 1GLEP: 44
2Title: Manifest2 format 2Title: Manifest2 format
3Version: $Revision: 1.1 $ 3Version: $Revision: 1.7 $
4Last-Modified: $Date: 2005/12/06 03:34:21 $ 4Last-Modified: $Date: 2006/10/14 02:55:39 $
5Author: Marius Mauch <genone@gentoo.org>, 5Author: Marius Mauch <genone@gentoo.org>,
6Status: Draft 6Status: Accepted
7Type: Standards Track 7Type: Standards Track
8Content-Type: text/x-rst 8Content-Type: text/x-rst
9Created: 04-Dec-2005 9Created: 04-Dec-2005
10Post-History: 05-Dec-2005 10Post-History: 06-Dec-2005, 23-Jan-2006, 3-Sep-2006
11 11
12 12
13Abstract 13Abstract
14======== 14========
15 15
23 23
24Please see [#reorg-thread]_ for a general overview. 24Please see [#reorg-thread]_ for a general overview.
25The main long term goals of this proposal are to: 25The main long term goals of this proposal are to:
26 26
27- Remove the tiny digest files from the tree. They are a major annoyance as on a 27- Remove the tiny digest files from the tree. They are a major annoyance as on a
28 typical configuration they waste a lot of discspace and the simple transmission 28 typical configuration they waste a lot of disk space and the simple transmission
29 of the names for all digest files during a ``emerge --sync`` needs a substantial 29 of the names for all digest files during a ``emerge --sync`` needs a substantial
30 amount of bandwidth. 30 amount of bandwidth.
31- Reduce redundancy when multiple hash functions are used 31- Reduce redundancy when multiple hash functions are used
32- Remove potential for checksum collisions if a file is recorded in more than one 32- Remove potential for checksum collisions if a file is recorded in more than one
33 digest file 33 digest file
39 39
40The new Manifest format would change the existing format in the following ways: 40The new Manifest format would change the existing format in the following ways:
41 41
42- Addition of a filetype specifier, currently planned are 42- Addition of a filetype specifier, currently planned are
43 43
44 * ``AUXFILE`` for files directly used by ebuilds (e.g. patches or initscripts), 44 * ``AUX`` for files directly used by ebuilds (e.g. patches or initscripts),
45 located in the ``files/`` subdirectory 45 located in the ``files/`` subdirectory
46 46
47 * ``EBUILD`` for all ebuilds 47 * ``EBUILD`` for all ebuilds
48 48
49 * ``MISCFILE`` for files not directly used by ebuilds like ``ChangeLog`` or 49 * ``MISC`` for files not directly used by ebuilds like ``ChangeLog`` or
50 ``metadata.xml`` files 50 ``metadata.xml`` files
51 51
52 * ``SRCURI`` for release tarballs recorded in the ``SRC_URI`` variable of an ebuild, 52 * ``DIST`` for release tarballs recorded in the ``SRC_URI`` variable of an ebuild,
53 these were previously recorded in the digest files 53 these were previously recorded in the digest files
54 54
55 Future portage improvements might extend this list (for example with types 55 Future portage improvements might extend this list (for example with types
56 relevant for eclasses or profiles) 56 relevant for eclasses or profiles)
57 57
67 <filetype> <filename> <filesize> <chksumtype1> <chksum1> ... <chksumtypen> <chksumn> 67 <filetype> <filename> <filesize> <chksumtype1> <chksum1> ... <chksumtypen> <chksumn>
68 68
69 69
70However theses entries will be stored in the existing Manifest files. 70However theses entries will be stored in the existing Manifest files.
71 71
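The entry layout above could be read with a small parser along the following lines. This is only a sketch, not portage's actual code: the names `ManifestEntry`, `KNOWN_TYPES` and `parse_manifest2_line` are made up for illustration, the filetype names are those of revision 1.7, and it assumes that filenames contain no whitespace.

```python
from dataclasses import dataclass

# Filetype specifiers as named in revision 1.7 of this GLEP.
KNOWN_TYPES = {"AUX", "EBUILD", "MISC", "DIST"}


@dataclass
class ManifestEntry:
    """Hypothetical container for one Manifest2 entry."""
    filetype: str
    filename: str
    size: int
    checksums: dict


def parse_manifest2_line(line):
    """Split '<filetype> <filename> <filesize> <type1> <sum1> ...' into fields.

    Assumption: fields are separated by whitespace and filenames
    themselves contain no whitespace.
    """
    fields = line.split()
    # Three fixed fields plus at least one (type, checksum) pair.
    if len(fields) < 5 or (len(fields) - 3) % 2 != 0:
        raise ValueError("malformed Manifest2 entry: %r" % line)
    filetype, filename, size = fields[:3]
    if filetype not in KNOWN_TYPES:
        raise ValueError("unknown filetype: %r" % filetype)
    pairs = fields[3:]
    checksums = dict(zip(pairs[0::2], pairs[1::2]))
    return ManifestEntry(filetype, filename, int(size), checksums)


entry = parse_manifest2_line(
    "DIST sylpheed-claws-1.0.5.tar.bz2 3268626 "
    "RMD160 f2708b5d69bc9a5025812511fde04eca7782e367 "
    "SHA1 d351d7043eef7a875df18a8c4b9464be49e2164b "
    "MD5 ef4a1a7beb407dc7c31b4799bc48f12e"
)
```

Because unknown filetypes are rejected here, a real implementation that wants to tolerate future types (eclasses, profiles) would probably need to skip such entries instead of raising.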
72An actual example for a (pure) Manifest2 file could look like this (using 72An `actual example`__ for a (pure) Manifest2 file..
73indentation to indicate line continuation):
74 73
75:: 74.. __: glep-0044-extras/manifest2-example.txt
76
77 AUXFILE ldif-buffer-overflow-fix.diff 5007 RMD160 1354a6bd2687430b628b78aaf43f5c793d2f0704
78 SHA1 424e1dfca06488f605b9611160020227ecdd03ac MD5 06d23c04b3d6ddfb1431c22ecc5b28f6
79 AUXFILE procmime.patch 977 RMD160 39a51a4d654759b15d1644a79fb6e8921130df3c
80 SHA1 d76929f6dfc2179281f7ccee5789aab4e970ba9e MD5 bf4c9cd9cb7cdc6ece7d4d327910f0cf
81 EBUILD sylpheed-claws-1.0.5-r1.ebuild 3906 RMD160 cdd546c128db2dea7044437de01ec96e12b4f5bf
82 SHA1 a84b49e76961d7a9100852b64c2bfbf9b053d45e MD5 b9fe79135a475458ef1b2240ee302ebd
83 EBUILD sylpheed-claws-1.9.100.ebuild 4444 RMD160 89326038bfc694dafd22f10400a08d3f930fb2bd
84 SHA1 8895342f3f0cc6fcbdd0fdada2ad8e23ce539d23 MD5 0643de736b42d8c0e1673e86ae0b7f80
85 EBUILD sylpheed-claws-1.9.15.ebuild 4821 RMD160 ec0ff811b893084459fe5b17b8ba8d6b35a55687
86 SHA1 358278a43da244e1f4803ec4b04d6fa45c41ab4d MD5 15b5c9348ba0b0a416892588256b4cbc
87 MISCFILE ChangeLog 25770 RMD160 0e69dd7425add1560d630dd3367342418e9be776
88 SHA1 1210160f7baf0319de3b1b58dc80d7680d316d28 MD5 732cdc3b41403a115970d497a9ec257e
89 MISCFILE metadata.xml 269 RMD160 39d775de55f9963f8946feaf088aa0324770bacb
90 SHA1 4fd7b285049d0e587f89e86becf06c0fd77bae6d MD5 82e806ed62f0596fb7bef493d225712f
91 SRCURI sylpheed-claws-1.0.5.tar.bz2 3268626 RMD160 f2708b5d69bc9a5025812511fde04eca7782e367
92 SHA1 d351d7043eef7a875df18a8c4b9464be49e2164b MD5 ef4a1a7beb407dc7c31b4799bc48f12e
93 SRCURI sylpheed-claws-1.9.100.tar.bz2 3480063 RMD160 72fbcbcc05d966f34897efcc1c96377420dc5544
94 SHA1 47465662b5470af5711493ce4eaad764c5bf02ca MD5 863c314557f90f17c2f6d6a0ab57e6c2
95 SRCURI sylpheed-claws-1.9.15.tar.bz2 3481018 RMD160 b01d1af2df55806a8a8275102b10e389e0d98e94
96 SHA1 a17fc64b8dcc5b56432e5beb5c826913cb3ad79e MD5 0d187526e0eca23b87ffa4981f7e1765
97 75
98 76
99Compability Entries 77Compability Entries
100------------------- 78-------------------
101 79
120might be coupled). 98might be coupled).
121 99
122Also while multiple hash functions will become standard with the proposed 100Also while multiple hash functions will become standard with the proposed
123implementation they are not a specific feature of this format [#multi-hash-thread]_. 101implementation they are not a specific feature of this format [#multi-hash-thread]_.
124 102
103Number of hashes
104----------------
105
106While using multiple hashes for each file is a major feature of this proposal
107we have to make sure that the number of hashes listed is limited to avoid
108an explosion of the Manifest size that would revert the main benefit of this proposal
109(reduzing tree size). Therefore the number of hashes that will be generated
110will be limited to three different hash functions. For compability though we
111have to rely on at least one hash function to always be present, this proposal
112suggest to use SHA1 for this purpose (as it is supposed to be more secure than MD5
113and currently only SHA1 and MD5 are directly available in python, also MD5 doesn't
114have any benefit in terms of compability).
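The two rules of this section (at most three hash functions, SHA1 always present) might be enforced when generating an entry roughly as follows. This is a sketch under assumptions: `make_entry`, `HASH_FUNCS`, `MAX_HASHES` and `REQUIRED_HASH` are hypothetical names, and the set of available functions is limited to ones that Python's `hashlib` always provides, with `SHA256` included purely as an illustrative third function.

```python
import hashlib

MAX_HASHES = 3          # limit proposed in this section
REQUIRED_HASH = "SHA1"  # hash that must always be present

# Assumption: only functions guaranteed by hashlib are offered here.
HASH_FUNCS = {
    "SHA1": hashlib.sha1,
    "MD5": hashlib.md5,
    "SHA256": hashlib.sha256,
}


def make_entry(filetype, filename, data, hash_names=("SHA1", "MD5")):
    """Build one Manifest2 line for the given file content (bytes)."""
    names = list(dict.fromkeys(hash_names))  # drop duplicates, keep order
    if REQUIRED_HASH not in names:
        raise ValueError("%s must always be recorded" % REQUIRED_HASH)
    if len(names) > MAX_HASHES:
        raise ValueError("at most %d hash functions allowed" % MAX_HASHES)
    parts = [filetype, filename, str(len(data))]
    for name in names:
        if name not in HASH_FUNCS:
            raise ValueError("unsupported hash function: %r" % name)
        parts += [name, HASH_FUNCS[name](data).hexdigest()]
    return " ".join(parts)


line = make_entry("MISC", "metadata.xml", b"<pkgmetadata/>\n")
```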
125 115
126Rationale 116Rationale
127========= 117=========
128 118
129The main goals of the proposal have been listed in the `Motivation`_, here now 119The main goals of the proposal have been listed in the `Motivation`_, here now
132 122
133Removal of digest files 123Removal of digest files
134----------------------- 124-----------------------
135 125
136Normal users that don't use a "tuned" filesystem for the portage tree are 126Normal users that don't use a "tuned" filesystem for the portage tree are
137wasting several dozen to a few hundred megabytes of discspace with the current 127wasting several dozen to a few hundred megabytes of disk space with the current
138system, largely caused by the digest files. 128system, largely caused by the digest files.
139This is due to the filesystem overhead present in most filesystem that 129This is due to the filesystem overhead present in most filesystem that
140have a standard blocksize of four kilobytes while most digest files are under 130have a standard blocksize of four kilobytes while most digest files are under
141one kilobyte in size, so this results in approximately a waste of three kilobytes 131one kilobyte in size, so this results in approximately a waste of three kilobytes
142per digest file (likely even more). At the time of this writing the tree contains 132per digest file (likely even more). At the time of this writing the tree contains
143roughly 22.000 digest files, so the overall waste caused by digest files is 133roughly 22.000 digest files, so the overall waste caused by digest files is
144estimated at about 70-100 megabytes. 134estimated at about 70-100 megabytes.
145Furthermore it is assumed that this will also reduce the discspace wasted by 135Furthermore it is assumed that this will also reduce the disk space wasted by
146the Manifest files as they now contain more content, but this hasn't been 136the Manifest files as they now contain more content, but this hasn't been
147verified yet. 137verified yet.
148 138
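The estimate above can be reproduced with a short calculation. The sketch below only counts block slack (the unused remainder of the last allocated block) and ignores inode and directory overhead, which may be part of why the figure in the text is somewhat higher. The function name `slack_bytes` and the assumption that every digest file is exactly one kilobyte are illustrative only.

```python
def slack_bytes(file_size, block_size=4096):
    """Bytes allocated but unused when a file occupies whole blocks.

    Assumption: an empty file occupies no data block at all.
    """
    if file_size == 0:
        return 0
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size - file_size


DIGEST_FILES = 22000      # rough count given in the text
ASSUMED_FILE_SIZE = 1024  # "under one kilobyte": upper bound taken here

total_waste = DIGEST_FILES * slack_bytes(ASSUMED_FILE_SIZE)
print("wasted: %.1f MiB" % (total_waste / 2**20))
```

Smaller digest files waste more per file (up to almost the full four kilobytes), so this value is a lower bound under the stated assumptions.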
149By unifying the digest files with the Manifest these tiny files are eliminated 139By unifying the digest files with the Manifest these tiny files are eliminated
150(in the long run), reducing the apparent tree size by about 20%, benefitting 140(in the long run), reducing the apparent tree size by about 20%, benefitting
161use, but expected to change soon (see [#multi-hash-thread]_). 151use, but expected to change soon (see [#multi-hash-thread]_).
162 152
163Removal of checksum collisions 153Removal of checksum collisions
164------------------------------ 154------------------------------
165 155
166The current system theoretically allows for a ``SRCURI`` type file to be recorded 156The current system theoretically allows for a ``DIST`` type file to be recorded
167in multiple digest files with different sizes and/or checksums. In such a case 157in multiple digest files with different sizes and/or checksums. In such a case
168one version of a package would report a checksum violation while another one 158one version of a package would report a checksum violation while another one
169would not. This could create confusion and uncertainity among users. 159would not. This could create confusion and uncertainity among users.
170So far this case hasn't been observed, but it can't be ruled out with the 160So far this case hasn't been observed, but it can't be ruled out with the
171existing system. 161existing system.
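To illustrate the problem: with per-version digest files the same tarball name can appear several times, and nothing forces the records to agree. A check for such disagreements could look like the sketch below. The data layout (a mapping from digest file name to its recorded files) and the function name `find_conflicts` are assumptions made for this example, and the checksums are shortened placeholders.

```python
def find_conflicts(digests):
    """Return files whose (size, checksum) records differ between digests.

    digests: {digest_name: {filename: (size, checksum)}}
    Result:  {filename: {digest_name: (size, checksum)}} for conflicts only.
    """
    seen = {}
    for digest_name, entries in digests.items():
        for filename, record in entries.items():
            seen.setdefault(filename, {})[digest_name] = record
    return {
        filename: records
        for filename, records in seen.items()
        if len(set(records.values())) > 1
    }


# Placeholder data, not taken from the tree.
example = {
    "digest-foo-1.0": {"foo-1.0.tar.bz2": (1000, "aaaa")},
    "digest-foo-1.0-r1": {"foo-1.0.tar.bz2": (1000, "bbbb")},
    "digest-foo-1.1": {"foo-1.1.tar.bz2": (2000, "cccc")},
}
conflicts = find_conflicts(example)
```

With a single Manifest per package each file is recorded once, so this situation cannot arise within one package.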
173 163
174Flexible verification system 164Flexible verification system
175---------------------------- 165----------------------------
176 166
177Right now portage verifies the checksum of every file listed in the Manifest 167Right now portage verifies the checksum of every file listed in the Manifest
178before using any file of the package and all ``SRCURI`` files of an ebuild 168before using any file of the package and all ``DIST`` files of an ebuild
179before using that ebuild. This is unnecessary in many cases: 169before using that ebuild. This is unnecessary in many cases:
180 170
181- During the "depend" phase (when the ebuild metadata is generated) only 171- During the "depend" phase (when the ebuild metadata is generated) only
182 files of type ``EBUILD`` are used, so verifying the other types isn't 172 files of type ``EBUILD`` are used, so verifying the other types isn't
183 necessary. Theoretically it is possible for an ebuild to include other 173 necessary. Theoretically it is possible for an ebuild to include other
184 files like those of type ``AUXFILE`` at this phase, but that would be a 174 files like those of type ``AUX`` at this phase, but that would be a
185 major QA violation and should never occur, so it can be ignored here. 175 major QA violation and should never occur, so it can be ignored here.
186 It is also not a security concern as the ebuild is verified before parsing 176 It is also not a security concern as the ebuild is verified before parsing
187 it, so each manipulation would show up. 177 it, so each manipulation would show up.
188 178
189- Generally files of type ``MISCFILE`` don't need to be verified as they are 179- Generally files of type ``MISC`` don't need to be verified as they are
190 only used in very specific situations, aren't executed (just parsed at most) 180 only used in very specific situations, aren't executed (just parsed at most)
191 and don't affect the package build process. 181 and don't affect the package build process.
192 182
193- Files of type ``SRCURI`` only need to be verified directly after fetching and 183- Files of type ``DIST`` only need to be verified directly after fetching and
194 before unpacking them (which often will be one step), not every time their 184 before unpacking them (which often will be one step), not every time their
195 associated ebuild is used. 185 associated ebuild is used.
196 186
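The selective verification described in the list above could be expressed as a simple lookup from phase to filetypes. This is a sketch only: apart from "depend", the phase names (`fetch`, `build`) are hypothetical labels chosen for this example, the decision to check ``AUX`` files at build time is an assumption, and `types_to_verify` is not an existing portage function.

```python
# Filetypes to verify per phase, following the list above.
VERIFY_BY_PHASE = {
    "depend": {"EBUILD"},        # metadata generation uses ebuilds only
    "fetch": {"DIST"},           # directly after fetching, before unpacking
    "build": {"EBUILD", "AUX"},  # assumption: patches etc. are used here
}


def types_to_verify(phase, strict=False):
    """Return the set of filetypes to check for a phase.

    strict=True additionally checks MISC files, keeping the option of
    integrity checks for ChangeLog and metadata.xml.
    """
    if phase not in VERIFY_BY_PHASE:
        raise ValueError("unknown phase: %r" % phase)
    types = set(VERIFY_BY_PHASE[phase])
    if strict:
        types.add("MISC")
    return types


depend_types = types_to_verify("depend")
strict_build_types = types_to_verify("build", strict=True)
```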
197 187
198Backwards Compatibility 188Backwards Compatibility
244No timeframe for implementation is presented here as it is highly dependent 234No timeframe for implementation is presented here as it is highly dependent
245on the completion of each step. 235on the completion of each step.
246 236
247In summary it can be said that while a full conversion will take over a year 237In summary it can be said that while a full conversion will take over a year
248to be completed due to compability issues mentioned above some benefits of the 238to be completed due to compability issues mentioned above some benefits of the
249system can be selectively be used as soon as step 2) is completed. 239system can selectively be used as soon as step 2) is completed.
250 240
251 241
252Other problems 242Other problems
253============== 243==============
254 244
255Impacts on infrastructure 245Impacts on infrastructure
256------------------------- 246-------------------------
257 247
258While one long term goal of this proposal is to reduce the size of the tree 248While one long term goal of this proposal is to reduce the size of the tree
259and therefore make life for the Gentoo Infrastructure this will only take effect 249and therefore make life for the Gentoo Infrastructure easier this will only
260once the implementation is rolled out completely. In the meantime however it 250take effect once the implementation is rolled out completely. In the meantime
261will increase the tree size due to keeping checksums in both formats. It's not 251however it will increase the tree size due to keeping checksums in both formats.
262possible to give a usable estimate on the degree of the increase as it depends 252It's not possible to give a usable estimate on the degree of the increase as
263on many variables such as the exact implementation timeframe, propagation of 253it depends on many variables such as the exact implementation timeframe,
264Manifest2 capable portage versions among devs or the update rate of the tree. 254propagation of Manifest2 capable portage versions among devs or the update
265It has been suggested that Manifest files that are not gpg signed could be 255rate of the tree. It has been suggested that Manifest files that are not gpg
266mass converted in one step, this could certainly help but only to some degree 256signed could be mass converted in one step, this could certainly help but only
267(according to a recent research [#gpg-numbers]_ about 40% of all Manifests in 257to some degree (according to a recent research [#gpg-numbers]_ about 40% of
268the tree are signed, but this number hasn't been verified). 258all Manifests in the tree are signed, but this number hasn't been verified).
269 259
270 260
271Reference Implementation 261Reference Implementation
272======================== 262========================
273 263
290 280
291- convert size field into checksum: Another idea was to treat the size field 281- convert size field into checksum: Another idea was to treat the size field
292 like any other checksum. But so far no real benefit (other than a slightly 282 like any other checksum. But so far no real benefit (other than a slightly
293 more modular implementation) for this has been seen while it has several 283 more modular implementation) for this has been seen while it has several
294 drawbacks: For once, unlike checksums, the size field is definitely required 284 drawbacks: For once, unlike checksums, the size field is definitely required
295 for all ``SRCURI`` files, also it would slightly increase the length of 285 for all ``DIST`` files, also it would slightly increase the length of
296 each entry by adding a ``SIZE`` keyword. 286 each entry by adding a ``SIZE`` keyword.
297 287
298- removal of the ``MISCFILE`` type: It has been suggested to completely drop 288- removal of the ``MISC`` type: It has been suggested to completely drop
299 entries of type ``MISCFILE``. This would result in a minor space reduction 289 entries of type ``MISC``. This would result in a minor space reduction
300 (its rather unlikely to free any blocks) but completely remove the ability 290 (its rather unlikely to free any blocks) but completely remove the ability
301 to check these files for integrity. While they don't influence portage 291 to check these files for integrity. While they don't influence portage
302 or packages directly they can contain viable information for users, so 292 or packages directly they can contain viable information for users, so
303 the author has the opinion that at least the option for integrity checks 293 the author has the opinion that at least the option for integrity checks
304 should be kept. 294 should be kept.
325.. [#gpg-numbers] gentoo-core mailing list, topic "Gentoo key signing practices 315.. [#gpg-numbers] gentoo-core mailing list, topic "Gentoo key signing practices
326 and official Gentoo keyring", Message-ID <20051117075838.GB15734@curie-int.vc.shawcable.net> 316 and official Gentoo keyring", Message-ID <20051117075838.GB15734@curie-int.vc.shawcable.net>
327 317
328.. [#manifest2-patch] http://thread.gmane.org/gmane.linux.gentoo.portage.devel/1374 318.. [#manifest2-patch] http://thread.gmane.org/gmane.linux.gentoo.portage.devel/1374
329 319
320.. [#manifest2-example] http://www.gentoo.org/proj/en/glep/glep-0044-extras/manifest2-example
321
330Copyright 322Copyright
331========= 323=========
332 324
333This document has been placed in the public domain. 325This document has been placed in the public domain.
334 326
