| 1 | GLEP: 44 |
1 | GLEP: 44 |
| 2 | Title: Manifest2 format |
2 | Title: Manifest2 format |
| 3 | Version: $Revision: 1.1 $ |
3 | Version: $Revision: 1.7 $ |
| 4 | Last-Modified: $Date: 2005/12/06 03:34:21 $ |
4 | Last-Modified: $Date: 2006/10/14 02:55:39 $ |
| 5 | Author: Marius Mauch <genone@gentoo.org>, |
5 | Author: Marius Mauch <genone@gentoo.org>, |
| 6 | Status: Draft |
6 | Status: Accepted |
| 7 | Type: Standards Track |
7 | Type: Standards Track |
| 8 | Content-Type: text/x-rst |
8 | Content-Type: text/x-rst |
| 9 | Created: 04-Dec-2005 |
9 | Created: 04-Dec-2005 |
| 10 | Post-History: 05-Dec-2005 |
10 | Post-History: 06-Dec-2005, 23-Jan-2006, 3-Sep-2006 |
| 11 | |
11 | |
| 12 | |
12 | |
| 13 | Abstract |
13 | Abstract |
| 14 | ======== |
14 | ======== |
| 15 | |
15 | |
| … | |
… | |
| 23 | |
23 | |
| 24 | Please see [#reorg-thread]_ for a general overview. |
24 | Please see [#reorg-thread]_ for a general overview. |
| 25 | The main long term goals of this proposal are to: |
25 | The main long term goals of this proposal are to: |
| 26 | |
26 | |
| 27 | - Remove the tiny digest files from the tree. They are a major annoyance as on a |
27 | - Remove the tiny digest files from the tree. They are a major annoyance as on a |
| 28 | typical configuration they waste a lot of discspace and the simple transmission |
28 | typical configuration they waste a lot of disk space and the simple transmission |
| 29 | of the names for all digest files during an ``emerge --sync`` needs a substantial |
29 | of the names for all digest files during an ``emerge --sync`` needs a substantial |
| 30 | amount of bandwidth. |
30 | amount of bandwidth. |
| 31 | - Reduce redundancy when multiple hash functions are used |
31 | - Reduce redundancy when multiple hash functions are used |
| 32 | - Remove potential for checksum collisions if a file is recorded in more than one |
32 | - Remove potential for checksum collisions if a file is recorded in more than one |
| 33 | digest file |
33 | digest file |
| … | |
… | |
| 39 | |
39 | |
| 40 | The new Manifest format would change the existing format in the following ways: |
40 | The new Manifest format would change the existing format in the following ways: |
| 41 | |
41 | |
| 42 | - Addition of a filetype specifier; the currently planned types are: |
42 | - Addition of a filetype specifier; the currently planned types are: |
| 43 | |
43 | |
| 44 | * ``AUXFILE`` for files directly used by ebuilds (e.g. patches or initscripts), |
44 | * ``AUX`` for files directly used by ebuilds (e.g. patches or initscripts), |
| 45 | located in the ``files/`` subdirectory |
45 | located in the ``files/`` subdirectory |
| 46 | |
46 | |
| 47 | * ``EBUILD`` for all ebuilds |
47 | * ``EBUILD`` for all ebuilds |
| 48 | |
48 | |
| 49 | * ``MISCFILE`` for files not directly used by ebuilds like ``ChangeLog`` or |
49 | * ``MISC`` for files not directly used by ebuilds like ``ChangeLog`` or |
| 50 | ``metadata.xml`` files |
50 | ``metadata.xml`` files |
| 51 | |
51 | |
| 52 | * ``SRCURI`` for release tarballs recorded in the ``SRC_URI`` variable of an ebuild, |
52 | * ``DIST`` for release tarballs recorded in the ``SRC_URI`` variable of an ebuild, |
| 53 | these were previously recorded in the digest files |
53 | these were previously recorded in the digest files |
| 54 | |
54 | |
| 55 | Future portage improvements might extend this list (for example with types |
55 | Future portage improvements might extend this list (for example with types |
| 56 | relevant for eclasses or profiles). |
56 | relevant for eclasses or profiles). |
| 57 | |
57 | |
| … | |
… | |
| 67 | <filetype> <filename> <filesize> <chksumtype1> <chksum1> ... <chksumtypen> <chksumn> |
67 | <filetype> <filename> <filesize> <chksumtype1> <chksum1> ... <chksumtypen> <chksumn> |
| 68 | |
68 | |
| 69 | |
69 | |
| 70 | However, these entries will be stored in the existing Manifest files. |
70 | However, these entries will be stored in the existing Manifest files. |
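As a rough illustration of the entry format above, the following Python sketch splits one Manifest2 line into its fields. It is not taken from portage: the ``ManifestEntry`` class and ``parse_entry`` function are hypothetical, and the sketch assumes that fields are separated by whitespace and that filenames never contain whitespace. The sample line reuses the ``sylpheed-claws`` distfile checksums from the example Manifest.

```python
from dataclasses import dataclass, field


@dataclass
class ManifestEntry:
    filetype: str   # e.g. AUX, EBUILD, MISC or DIST
    filename: str
    size: int       # file size in bytes
    checksums: dict = field(default_factory=dict)  # checksum type -> hex digest


def parse_entry(line: str) -> ManifestEntry:
    """Parse '<filetype> <filename> <filesize> <chksumtype1> <chksum1> ...'.

    Assumes whitespace-separated fields and filenames without whitespace.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}: {line!r}")
    filetype, filename, size_text, *rest = fields
    if len(rest) % 2:
        raise ValueError("checksum types and values must come in pairs")
    # rest alternates: type, value, type, value, ...
    checksums = dict(zip(rest[0::2], rest[1::2]))
    return ManifestEntry(filetype, filename, int(size_text), checksums)


entry = parse_entry(
    "DIST sylpheed-claws-1.0.5.tar.bz2 3268626 "
    "SHA1 d351d7043eef7a875df18a8c4b9464be49e2164b "
    "MD5 ef4a1a7beb407dc7c31b4799bc48f12e"
)
print(entry.filetype, entry.size, sorted(entry.checksums))  # prints: DIST 3268626 ['MD5', 'SHA1']
```

Unknown filetypes are passed through rather than rejected in this sketch, since the list of types may be extended later.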
| 71 | |
71 | |
| 72 | An actual example of a (pure) Manifest2 file could look like this (using |
72 | See this `actual example`__ of a (pure) Manifest2 file. |
| 73 | indentation to indicate line continuation): |
|
|
| 74 | |
73 | |
| 75 | :: |
74 | .. __: glep-0044-extras/manifest2-example.txt |
| 76 | |
|
|
| 77 | AUXFILE ldif-buffer-overflow-fix.diff 5007 RMD160 1354a6bd2687430b628b78aaf43f5c793d2f0704 |
|
|
| 78 | SHA1 424e1dfca06488f605b9611160020227ecdd03ac MD5 06d23c04b3d6ddfb1431c22ecc5b28f6 |
|
|
| 79 | AUXFILE procmime.patch 977 RMD160 39a51a4d654759b15d1644a79fb6e8921130df3c |
|
|
| 80 | SHA1 d76929f6dfc2179281f7ccee5789aab4e970ba9e MD5 bf4c9cd9cb7cdc6ece7d4d327910f0cf |
|
|
| 81 | EBUILD sylpheed-claws-1.0.5-r1.ebuild 3906 RMD160 cdd546c128db2dea7044437de01ec96e12b4f5bf |
|
|
| 82 | SHA1 a84b49e76961d7a9100852b64c2bfbf9b053d45e MD5 b9fe79135a475458ef1b2240ee302ebd |
|
|
| 83 | EBUILD sylpheed-claws-1.9.100.ebuild 4444 RMD160 89326038bfc694dafd22f10400a08d3f930fb2bd |
|
|
| 84 | SHA1 8895342f3f0cc6fcbdd0fdada2ad8e23ce539d23 MD5 0643de736b42d8c0e1673e86ae0b7f80 |
|
|
| 85 | EBUILD sylpheed-claws-1.9.15.ebuild 4821 RMD160 ec0ff811b893084459fe5b17b8ba8d6b35a55687 |
|
|
| 86 | SHA1 358278a43da244e1f4803ec4b04d6fa45c41ab4d MD5 15b5c9348ba0b0a416892588256b4cbc |
|
|
| 87 | MISCFILE ChangeLog 25770 RMD160 0e69dd7425add1560d630dd3367342418e9be776 |
|
|
| 88 | SHA1 1210160f7baf0319de3b1b58dc80d7680d316d28 MD5 732cdc3b41403a115970d497a9ec257e |
|
|
| 89 | MISCFILE metadata.xml 269 RMD160 39d775de55f9963f8946feaf088aa0324770bacb |
|
|
| 90 | SHA1 4fd7b285049d0e587f89e86becf06c0fd77bae6d MD5 82e806ed62f0596fb7bef493d225712f |
|
|
| 91 | SRCURI sylpheed-claws-1.0.5.tar.bz2 3268626 RMD160 f2708b5d69bc9a5025812511fde04eca7782e367 |
|
|
| 92 | SHA1 d351d7043eef7a875df18a8c4b9464be49e2164b MD5 ef4a1a7beb407dc7c31b4799bc48f12e |
|
|
| 93 | SRCURI sylpheed-claws-1.9.100.tar.bz2 3480063 RMD160 72fbcbcc05d966f34897efcc1c96377420dc5544 |
|
|
| 94 | SHA1 47465662b5470af5711493ce4eaad764c5bf02ca MD5 863c314557f90f17c2f6d6a0ab57e6c2 |
|
|
| 95 | SRCURI sylpheed-claws-1.9.15.tar.bz2 3481018 RMD160 b01d1af2df55806a8a8275102b10e389e0d98e94 |
|
|
| 96 | SHA1 a17fc64b8dcc5b56432e5beb5c826913cb3ad79e MD5 0d187526e0eca23b87ffa4981f7e1765 |
|
|
| 97 | |
75 | |
| 98 | |
76 | |
| 99 | Compatibility Entries |
77 | Compatibility Entries |
| 100 | --------------------- |
78 | --------------------- |
| 101 | |
79 | |
| … | |
… | |
| 120 | might be coupled). |
98 | might be coupled). |
| 121 | |
99 | |
| 122 | Also, while multiple hash functions will become standard with the proposed |
100 | Also, while multiple hash functions will become standard with the proposed |
| 123 | implementation, they are not a specific feature of this format [#multi-hash-thread]_. |
101 | implementation, they are not a specific feature of this format [#multi-hash-thread]_. |
| 124 | |
102 | |
|
|
103 | Number of hashes |
|
|
104 | ---------------- |
|
|
105 | |
|
|
106 | While using multiple hashes for each file is a major feature of this proposal, |


107 | we have to make sure that the number of hashes listed is limited to avoid |


108 | an explosion of the Manifest size that would negate the main benefit of this proposal |


109 | (reducing tree size). Therefore the number of hashes that will be generated |


110 | will be limited to three different hash functions. For compatibility, though, we |


111 | have to rely on at least one hash function always being present; this proposal |


112 | suggests using SHA1 for this purpose (as it is supposed to be more secure than MD5, |


113 | and currently only SHA1 and MD5 are directly available in Python; also, MD5 doesn't |


114 | have any benefit in terms of compatibility). |
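The following minimal sketch shows how a Manifest generator might enforce both rules (at most three hash functions, SHA1 always present) using Python's ``hashlib``. The ``make_entry`` function is hypothetical, and the ``SHA256`` label is an assumption added only to show a third slot. RMD160 from the example is omitted because it is not among ``hashlib``'s guaranteed algorithms.

```python
import hashlib

# Checksum-type name written to the Manifest -> hashlib constructor.
# SHA1 and MD5 appear in this proposal; SHA256 is an assumed third label.
HASHES = {
    "SHA1": hashlib.sha1,
    "MD5": hashlib.md5,
    "SHA256": hashlib.sha256,
}
MAX_HASHES = 3          # limit proposed above
REQUIRED_HASH = "SHA1"  # always present for compatibility


def make_entry(filetype, filename, data, hash_names=("SHA1", "MD5")):
    """Return one Manifest2 line for a file whose contents are `data` (bytes)."""
    if REQUIRED_HASH not in hash_names:
        raise ValueError(f"{REQUIRED_HASH} must always be listed")
    if len(set(hash_names)) != len(hash_names):
        raise ValueError("each hash function may only be listed once")
    if len(hash_names) > MAX_HASHES:
        raise ValueError(f"at most {MAX_HASHES} hash functions are allowed")
    parts = [filetype, filename, str(len(data))]
    for name in hash_names:
        parts += [name, HASHES[name](data).hexdigest()]
    return " ".join(parts)


print(make_entry("AUX", "hello.patch", b"hello"))
# prints: AUX hello.patch 5 SHA1 aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d MD5 5d41402abc4b2a76b9719d911017c592
```

The checks run before any hashing, so an invalid hash list is rejected without reading the file.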
| 125 | |
115 | |
| 126 | Rationale |
116 | Rationale |
| 127 | ========= |
117 | ========= |
| 128 | |
118 | |
| 129 | The main goals of the proposal have been listed in the `Motivation`_, here now |
119 | The main goals of the proposal have been listed in the `Motivation`_, here now |
| … | |
… | |
| 132 | |
122 | |
| 133 | Removal of digest files |
123 | Removal of digest files |
| 134 | ----------------------- |
124 | ----------------------- |
| 135 | |
125 | |
| 136 | Normal users that don't use a "tuned" filesystem for the portage tree are |
126 | Normal users that don't use a "tuned" filesystem for the portage tree are |
| 137 | wasting several dozen to a few hundred megabytes of discspace with the current |
127 | wasting several dozen to a few hundred megabytes of disk space with the current |
| 138 | system, largely caused by the digest files. |
128 | system, largely caused by the digest files. |
| 139 | This is due to the filesystem overhead present in most filesystems, which |
129 | This is due to the filesystem overhead present in most filesystems, which |
| 140 | have a standard block size of four kilobytes, while most digest files are under |
130 | have a standard block size of four kilobytes, while most digest files are under |
| 141 | one kilobyte in size, so this results in a waste of approximately three kilobytes |
131 | one kilobyte in size, so this results in a waste of approximately three kilobytes |
| 142 | per digest file (likely even more). At the time of this writing, the tree contains |
132 | per digest file (likely even more). At the time of this writing, the tree contains |
| 143 | roughly 22,000 digest files, so the overall waste caused by digest files is |
133 | roughly 22,000 digest files, so the overall waste caused by digest files is |
| 144 | estimated at about 70-100 megabytes. |
134 | estimated at about 70-100 megabytes. |
| 145 | Furthermore it is assumed that this will also reduce the discspace wasted by |
135 | Furthermore it is assumed that this will also reduce the disk space wasted by |
| 146 | the Manifest files as they now contain more content, but this hasn't been |
136 | the Manifest files as they now contain more content, but this hasn't been |
| 147 | verified yet. |
137 | verified yet. |
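The estimate above can be reproduced with a short calculation. This sketch assumes every file occupies whole 4096-byte blocks and ignores inode and directory overhead. The 1000-byte size is an assumed stand-in for a typical digest file "under one kilobyte".

```python
BLOCK_SIZE = 4096  # "standard blocksize of four kilobytes"


def slack_bytes(file_size, block_size=BLOCK_SIZE):
    """Bytes allocated to a file but left unused, assuming whole-block allocation."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size - file_size


per_file = slack_bytes(1000)               # a digest file just under one kilobyte
total_mb = 22_000 * per_file / 1_000_000   # roughly 22,000 digest files
print(per_file, round(total_mb, 1))        # prints: 3096 68.1
```

This gives a lower bound of about 68 MB, in line with the 70-100 megabyte estimate once the "likely even more" per-file waste is taken into account.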
| 148 | |
138 | |
| 149 | By unifying the digest files with the Manifest, these tiny files are eliminated |
139 | By unifying the digest files with the Manifest, these tiny files are eliminated |
| 150 | (in the long run), reducing the apparent tree size by about 20%, benefitting |
140 | (in the long run), reducing the apparent tree size by about 20%, benefitting |
| … | |
… | |
| 161 | use, but expected to change soon (see [#multi-hash-thread]_). |
151 | use, but expected to change soon (see [#multi-hash-thread]_). |
| 162 | |
152 | |
| 163 | Removal of checksum collisions |
153 | Removal of checksum collisions |
| 164 | ------------------------------ |
154 | ------------------------------ |
| 165 | |
155 | |
| 166 | The current system theoretically allows for a ``SRCURI`` type file to be recorded |
156 | The current system theoretically allows for a ``DIST`` type file to be recorded |
| 167 | in multiple digest files with different sizes and/or checksums. In such a case |
157 | in multiple digest files with different sizes and/or checksums. In such a case |
| 168 | one version of a package would report a checksum violation while another one |
158 | one version of a package would report a checksum violation while another one |
| 169 | would not. This could create confusion and uncertainty among users. |
159 | would not. This could create confusion and uncertainty among users. |
| 170 | So far this case hasn't been observed, but it can't be ruled out with the |
160 | So far this case hasn't been observed, but it can't be ruled out with the |
| 171 | existing system. |
161 | existing system. |
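To make the problem concrete, the following sketch scans records gathered from all digest files of one package and reports distfiles whose recorded size or checksum disagree. The record layout, the ``find_conflicts`` name and the sample data are hypothetical. With a single Manifest per package directory, each file has exactly one entry, so this kind of disagreement cannot arise within a package.

```python
from collections import defaultdict


def find_conflicts(records):
    """Return distfile names recorded with more than one (size, checksum) pair.

    `records` is an iterable of (digest_file, filename, size, checksum)
    tuples gathered from all digest files of one package.
    """
    variants = defaultdict(set)
    for _digest_file, filename, size, checksum in records:
        variants[filename].add((size, checksum))
    return sorted(name for name, seen in variants.items() if len(seen) > 1)


# Hypothetical data: two digest files disagree about foo-1.0.tar.gz.
records = [
    ("files/digest-foo-1.0", "foo-1.0.tar.gz", 1024, "aaaa"),
    ("files/digest-foo-1.0-r1", "foo-1.0.tar.gz", 1024, "bbbb"),
    ("files/digest-foo-1.0", "foo-fixes.patch.bz2", 512, "cccc"),
    ("files/digest-foo-1.0-r1", "foo-fixes.patch.bz2", 512, "cccc"),
]
print(find_conflicts(records))  # prints: ['foo-1.0.tar.gz']
```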
| … | |
… | |
| 173 | |
163 | |
| 174 | Flexible verification system |
164 | Flexible verification system |
| 175 | ---------------------------- |
165 | ---------------------------- |
| 176 | |
166 | |
| 177 | Right now portage verifies the checksum of every file listed in the Manifest |
167 | Right now portage verifies the checksum of every file listed in the Manifest |
| 178 | before using any file of the package and all ``SRCURI`` files of an ebuild |
168 | before using any file of the package and all ``DIST`` files of an ebuild |
| 179 | before using that ebuild. This is unnecessary in many cases: |
169 | before using that ebuild. This is unnecessary in many cases: |
| 180 | |
170 | |
| 181 | - During the "depend" phase (when the ebuild metadata is generated) only |
171 | - During the "depend" phase (when the ebuild metadata is generated) only |
| 182 | files of type ``EBUILD`` are used, so verifying the other types isn't |
172 | files of type ``EBUILD`` are used, so verifying the other types isn't |
| 183 | necessary. Theoretically it is possible for an ebuild to include other |
173 | necessary. Theoretically it is possible for an ebuild to include other |
| 184 | files like those of type ``AUXFILE`` at this phase, but that would be a |
174 | files like those of type ``AUX`` at this phase, but that would be a |
| 185 | major QA violation and should never occur, so it can be ignored here. |
175 | major QA violation and should never occur, so it can be ignored here. |
| 186 | It is also not a security concern as the ebuild is verified before parsing |
176 | It is also not a security concern as the ebuild is verified before parsing |
| 187 | it, so any manipulation would be detected. |
177 | it, so any manipulation would be detected. |
| 188 | |
178 | |
| 189 | - Generally files of type ``MISCFILE`` don't need to be verified as they are |
179 | - Generally files of type ``MISC`` don't need to be verified as they are |
| 190 | only used in very specific situations, aren't executed (just parsed at most) |
180 | only used in very specific situations, aren't executed (just parsed at most) |
| 191 | and don't affect the package build process. |
181 | and don't affect the package build process. |
| 192 | |
182 | |
| 193 | - Files of type ``SRCURI`` only need to be verified directly after fetching and |
183 | - Files of type ``DIST`` only need to be verified directly after fetching and |
| 194 | before unpacking them (which often will be one step), not every time their |
184 | before unpacking them (which often will be one step), not every time their |
| 195 | associated ebuild is used. |
185 | associated ebuild is used. |
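The rules above could be expressed as a lookup from processing step to the filetypes that need checking. This is only a sketch. Step names other than "depend" are illustrative labels rather than portage phase names. Verifying ``AUX`` files together with the ebuild before building is also an assumption, because the list above does not say when ``AUX`` files are checked.

```python
# Filetypes to verify before each step, following the list above.
# Step names other than "depend" are illustrative labels, and the
# "build" rule (ebuild plus files/ contents) is an assumption.
VERIFY_BEFORE = {
    "depend": {"EBUILD"},         # metadata generation only reads the ebuild
    "fetch": {"DIST"},            # after fetching, before unpacking
    "build": {"EBUILD", "AUX"},   # assumed: ebuild and files/ before building
}


def entries_to_verify(entries, step):
    """Select the (filetype, filename) entries that must be checked for `step`.

    MISC entries are never selected, since no step lists them.
    Raises KeyError for an unknown step.
    """
    wanted = VERIFY_BEFORE[step]
    return [entry for entry in entries if entry[0] in wanted]


entries = [
    ("EBUILD", "foo-1.0.ebuild"),
    ("AUX", "foo-gcc4.patch"),
    ("MISC", "ChangeLog"),
    ("DIST", "foo-1.0.tar.gz"),
]
print(entries_to_verify(entries, "depend"))  # prints: [('EBUILD', 'foo-1.0.ebuild')]
```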
| 196 | |
186 | |
| 197 | |
187 | |
| 198 | Backwards Compatibility |
188 | Backwards Compatibility |
| … | |
… | |
| 244 | No timeframe for implementation is presented here as it is highly dependent |
234 | No timeframe for implementation is presented here as it is highly dependent |
| 245 | on the completion of each step. |
235 | on the completion of each step. |
| 246 | |
236 | |
| 247 | In summary, it can be said that while a full conversion will take over a year |
237 | In summary, it can be said that while a full conversion will take over a year |
| 248 | to be completed due to the compatibility issues mentioned above, some benefits of the |
238 | to be completed due to the compatibility issues mentioned above, some benefits of the |
| 249 | system can be selectively be used as soon as step 2) is completed. |
239 | system can selectively be used as soon as step 2) is completed. |
| 250 | |
240 | |
| 251 | |
241 | |
| 252 | Other problems |
242 | Other problems |
| 253 | ============== |
243 | ============== |
| 254 | |
244 | |
| 255 | Impacts on infrastructure |
245 | Impacts on infrastructure |
| 256 | ------------------------- |
246 | ------------------------- |
| 257 | |
247 | |
| 258 | While one long term goal of this proposal is to reduce the size of the tree |
248 | While one long term goal of this proposal is to reduce the size of the tree |
| 259 | and therefore make life for the Gentoo Infrastructure this will only take effect |
249 | and therefore make life for the Gentoo Infrastructure easier, this will only |
| 260 | once the implementation is rolled out completely. In the meantime, however, it |
250 | take effect once the implementation is rolled out completely. In the meantime, |
| 261 | will increase the tree size due to keeping checksums in both formats. It's not |
251 | however, it will increase the tree size due to keeping checksums in both formats. |
| 262 | possible to give a usable estimate on the degree of the increase as it depends |
252 | It's not possible to give a usable estimate on the degree of the increase as |
| 263 | on many variables such as the exact implementation timeframe, propagation of |
253 | it depends on many variables such as the exact implementation timeframe, |
| 264 | Manifest2 capable portage versions among devs or the update rate of the tree. |
254 | propagation of Manifest2 capable portage versions among devs or the update |
| 265 | It has been suggested that Manifest files that are not gpg signed could be |
255 | rate of the tree. It has been suggested that Manifest files that are not gpg |
| 266 | mass converted in one step; this could certainly help, but only to some degree |
256 | signed could be mass converted in one step; this could certainly help, but only |
| 267 | (according to recent research [#gpg-numbers]_ about 40% of all Manifests in |
257 | to some degree (according to recent research [#gpg-numbers]_ about 40% of |
| 268 | the tree are signed, but this number hasn't been verified). |
258 | all Manifests in the tree are signed, but this number hasn't been verified). |
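A one-step mass conversion would first need to separate unsigned Manifests from signed ones, because rewriting a signed Manifest invalidates its signature. The sketch below assumes that signed Manifests use an OpenPGP cleartext signature (as produced by ``gpg --clearsign``) and detects them by the armor header line. The function names and sample paths are hypothetical.

```python
# Armor header line of an OpenPGP cleartext-signed message.
CLEARSIGN_HEADER = "-----BEGIN PGP SIGNED MESSAGE-----"


def is_clearsigned(manifest_text):
    """True if the Manifest text starts with a cleartext signature header."""
    return manifest_text.lstrip().startswith(CLEARSIGN_HEADER)


def split_for_mass_conversion(manifests):
    """Split a {path: text} mapping into (unsigned, signed) sorted path lists.

    Only the unsigned ones could be rewritten by a script without
    invalidating a developer's signature.
    """
    unsigned, signed = [], []
    for path, text in sorted(manifests.items()):
        (signed if is_clearsigned(text) else unsigned).append(path)
    return unsigned, signed


manifests = {
    "app-misc/foo/Manifest": "MD5 0123 foo-1.0.ebuild 12\n",
    "app-misc/bar/Manifest": CLEARSIGN_HEADER + "\nHash: SHA1\n\nMD5 4567 bar-1.0.ebuild 34\n",
}
print(split_for_mass_conversion(manifests))
# prints: (['app-misc/foo/Manifest'], ['app-misc/bar/Manifest'])
```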
| 269 | |
259 | |
| 270 | |
260 | |
| 271 | Reference Implementation |
261 | Reference Implementation |
| 272 | ======================== |
262 | ======================== |
| 273 | |
263 | |
| … | |
… | |
| 290 | |
280 | |
| 291 | - convert size field into checksum: Another idea was to treat the size field |
281 | - convert size field into checksum: Another idea was to treat the size field |
| 292 | like any other checksum. But so far no real benefit (other than a slightly |
282 | like any other checksum. But so far no real benefit (other than a slightly |
| 293 | more modular implementation) for this has been seen while it has several |
283 | more modular implementation) for this has been seen while it has several |
| 294 | drawbacks: For one, unlike checksums, the size field is definitely required |
284 | drawbacks: For one, unlike checksums, the size field is definitely required |
| 295 | for all ``SRCURI`` files, also it would slightly increase the length of |
285 | for all ``DIST`` files, also it would slightly increase the length of |
| 296 | each entry by adding a ``SIZE`` keyword. |
286 | each entry by adding a ``SIZE`` keyword. |
| 297 | |
287 | |
| 298 | - removal of the ``MISCFILE`` type: It has been suggested to completely drop |
288 | - removal of the ``MISC`` type: It has been suggested to completely drop |
| 299 | entries of type ``MISCFILE``. This would result in a minor space reduction |
289 | entries of type ``MISC``. This would result in a minor space reduction |
| 300 | (it's rather unlikely to free any blocks) but completely remove the ability |
290 | (it's rather unlikely to free any blocks) but completely remove the ability |
| 301 | to check these files for integrity. While they don't influence portage |
291 | to check these files for integrity. While they don't influence portage |
| 302 | or packages directly, they can contain valuable information for users, so |
292 | or packages directly, they can contain valuable information for users, so |
| 303 | the author has the opinion that at least the option for integrity checks |
293 | the author has the opinion that at least the option for integrity checks |
| 304 | should be kept. |
294 | should be kept. |
| … | |
… | |
| 325 | .. [#gpg-numbers] gentoo-core mailing list, topic "Gentoo key signing practices |
315 | .. [#gpg-numbers] gentoo-core mailing list, topic "Gentoo key signing practices |
| 326 | and official Gentoo keyring", Message-ID <20051117075838.GB15734@curie-int.vc.shawcable.net> |
316 | and official Gentoo keyring", Message-ID <20051117075838.GB15734@curie-int.vc.shawcable.net> |
| 327 | |
317 | |
| 328 | .. [#manifest2-patch] http://thread.gmane.org/gmane.linux.gentoo.portage.devel/1374 |
318 | .. [#manifest2-patch] http://thread.gmane.org/gmane.linux.gentoo.portage.devel/1374 |
| 329 | |
319 | |
|
|
320 | .. [#manifest2-example] http://www.gentoo.org/proj/en/glep/glep-0044-extras/manifest2-example |
|
|
321 | |
| 330 | Copyright |
322 | Copyright |
| 331 | ========= |
323 | ========= |
| 332 | |
324 | |
| 333 | This document has been placed in the public domain. |
325 | This document has been placed in the public domain. |
| 334 | |
326 | |