GLEP: 44
Title: Manifest2 format
Version: $Revision: 1.1 $
Last-Modified: $Date: 2005/12/04 11:49:20 $
Author: Marius Mauch <genone@gentoo.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 04-Dec-2005
Post-History: 05-Dec-2005


Abstract
========

This GLEP proposes a new format for the Portage Manifest and digest file system
by unifying both filetypes into one to improve functional and non-functional
aspects of the Portage Tree.


Motivation
==========

Please see [#reorg-thread]_ for a general overview.
The main long-term goals of this proposal are to:

- Remove the tiny digest files from the tree. They are a major annoyance: on a
  typical configuration they waste a lot of disk space, and merely transmitting
  the names of all digest files during an ``emerge --sync`` needs a substantial
  amount of bandwidth.
- Reduce redundancy when multiple hash functions are used
- Remove the potential for checksum collisions if a file is recorded in more
  than one digest file
- Differentiate between filetypes for a more flexible verification system


Specification
=============

The new Manifest format would change the existing format in the following ways:

- Addition of a filetype specifier. The currently planned types are:

  * ``AUXFILE`` for files directly used by ebuilds (e.g. patches or initscripts),
    located in the ``files/`` subdirectory

  * ``EBUILD`` for all ebuilds

  * ``MISCFILE`` for files not directly used by ebuilds, like ``ChangeLog`` or
    ``metadata.xml`` files

  * ``SRCURI`` for release tarballs recorded in the ``SRC_URI`` variable of an
    ebuild; these were previously recorded in the digest files

  Future portage improvements might extend this list (for example with types
  relevant for eclasses or profiles)

- Only have one line per file listing all information instead of one line per
  file and checksum type

- Remove the separate digest-* files in the ``files/`` subdirectory

Each line in the new format has the following layout:

::

    <filetype> <filename> <filesize> <chksumtype1> <chksum1> ... <chksumtypen> <chksumn>


The entries are not stored in a separate file, however: they go into the
existing Manifest files.

An actual example for a (pure) Manifest2 file could look like this (using
indentation to indicate line continuation):

::

    AUXFILE ldif-buffer-overflow-fix.diff 5007 RMD160 1354a6bd2687430b628b78aaf43f5c793d2f0704
        SHA1 424e1dfca06488f605b9611160020227ecdd03ac MD5 06d23c04b3d6ddfb1431c22ecc5b28f6
    AUXFILE procmime.patch 977 RMD160 39a51a4d654759b15d1644a79fb6e8921130df3c
        SHA1 d76929f6dfc2179281f7ccee5789aab4e970ba9e MD5 bf4c9cd9cb7cdc6ece7d4d327910f0cf
    EBUILD sylpheed-claws-1.0.5-r1.ebuild 3906 RMD160 cdd546c128db2dea7044437de01ec96e12b4f5bf
        SHA1 a84b49e76961d7a9100852b64c2bfbf9b053d45e MD5 b9fe79135a475458ef1b2240ee302ebd
    EBUILD sylpheed-claws-1.9.100.ebuild 4444 RMD160 89326038bfc694dafd22f10400a08d3f930fb2bd
        SHA1 8895342f3f0cc6fcbdd0fdada2ad8e23ce539d23 MD5 0643de736b42d8c0e1673e86ae0b7f80
    EBUILD sylpheed-claws-1.9.15.ebuild 4821 RMD160 ec0ff811b893084459fe5b17b8ba8d6b35a55687
        SHA1 358278a43da244e1f4803ec4b04d6fa45c41ab4d MD5 15b5c9348ba0b0a416892588256b4cbc
    MISCFILE ChangeLog 25770 RMD160 0e69dd7425add1560d630dd3367342418e9be776
        SHA1 1210160f7baf0319de3b1b58dc80d7680d316d28 MD5 732cdc3b41403a115970d497a9ec257e
    MISCFILE metadata.xml 269 RMD160 39d775de55f9963f8946feaf088aa0324770bacb
        SHA1 4fd7b285049d0e587f89e86becf06c0fd77bae6d MD5 82e806ed62f0596fb7bef493d225712f
    SRCURI sylpheed-claws-1.0.5.tar.bz2 3268626 RMD160 f2708b5d69bc9a5025812511fde04eca7782e367
        SHA1 d351d7043eef7a875df18a8c4b9464be49e2164b MD5 ef4a1a7beb407dc7c31b4799bc48f12e
    SRCURI sylpheed-claws-1.9.100.tar.bz2 3480063 RMD160 72fbcbcc05d966f34897efcc1c96377420dc5544
        SHA1 47465662b5470af5711493ce4eaad764c5bf02ca MD5 863c314557f90f17c2f6d6a0ab57e6c2
    SRCURI sylpheed-claws-1.9.15.tar.bz2 3481018 RMD160 b01d1af2df55806a8a8275102b10e389e0d98e94
        SHA1 a17fc64b8dcc5b56432e5beb5c826913cb3ad79e MD5 0d187526e0eca23b87ffa4981f7e1765
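As a minimal sketch, one logical Manifest2 line in the layout given above could be parsed as follows. The names ``Manifest2Entry`` and ``parse_manifest2_line`` are hypothetical, and the sketch assumes that fields are separated by whitespace, that filenames contain no whitespace, and that the indented continuation used for display in the example has already been joined into a single line.

```python
from typing import Dict, NamedTuple

# Filetype specifiers currently planned by this proposal.
FILETYPES = {"AUXFILE", "EBUILD", "MISCFILE", "SRCURI"}


class Manifest2Entry(NamedTuple):
    """Hypothetical container for one parsed entry."""
    filetype: str
    filename: str
    size: int
    checksums: Dict[str, str]


def parse_manifest2_line(line: str) -> Manifest2Entry:
    """Parse one logical Manifest2 line into its fields."""
    fields = line.split()
    if len(fields) < 3 or fields[0] not in FILETYPES:
        raise ValueError(f"not a Manifest2 entry: {line!r}")
    filetype, filename, size = fields[0], fields[1], int(fields[2])
    rest = fields[3:]
    if len(rest) % 2 != 0:
        raise ValueError(f"unpaired checksum field in: {line!r}")
    # The remaining fields alternate between checksum type and checksum value.
    checksums = dict(zip(rest[0::2], rest[1::2]))
    return Manifest2Entry(filetype, filename, size, checksums)


entry = parse_manifest2_line(
    "AUXFILE procmime.patch 977 "
    "RMD160 39a51a4d654759b15d1644a79fb6e8921130df3c "
    "SHA1 d76929f6dfc2179281f7ccee5789aab4e970ba9e "
    "MD5 bf4c9cd9cb7cdc6ece7d4d327910f0cf"
)
print(entry.filetype, entry.filename, entry.size, sorted(entry.checksums))
```

Because the checksum pairs are read generically, a parser of this shape would not need to change when a hash function is added or dropped.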


Compatibility Entries
---------------------

To maintain compatibility with existing portage versions, a transition period
is required after the introduction of the Manifest2 format. During that period
portage must not only be able to use existing Manifest and digest files but
also generate them in addition to the new entries.
Fortunately this can be accomplished by simply mixing old-style and new-style
entries in one Manifest file; existing portage versions will simply ignore
the new-style entries. For the digest files there are no new entries to care
about.
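How a Manifest2-aware reader might separate the two kinds of entries in a mixed file can be sketched as follows. The old-style layout ``<chksumtype> <chksum> <filename> <filesize>`` and the name ``split_mixed_manifest`` are assumptions made for illustration; the sketch does not describe how existing portage versions actually skip new-style lines.

```python
# Filetype specifiers that mark a new-style (Manifest2) entry.
NEW_STYLE_TYPES = {"AUXFILE", "EBUILD", "MISCFILE", "SRCURI"}


def split_mixed_manifest(lines):
    """Return (old_style, new_style) lists of the non-empty lines."""
    old_style, new_style = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] in NEW_STYLE_TYPES:
            new_style.append(line)
        else:
            # Anything else is treated as an old-style entry (assumption).
            old_style.append(line)
    return old_style, new_style


mixed = [
    "MD5 82e806ed62f0596fb7bef493d225712f metadata.xml 269",
    "MISCFILE metadata.xml 269 MD5 82e806ed62f0596fb7bef493d225712f",
]
old_entries, new_entries = split_mixed_manifest(mixed)
print(len(old_entries), len(new_entries))
```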

Scope
-----

It is important to note that this proposal only deals with a change of the
format of the digest and Manifest system.

It does not expand the scope of that system to cover eclasses, profiles or
anything else not already covered by the Manifest system. It also doesn't affect
the Manifest signing efforts in any way (though the implementations of both
might be coupled).

Also, while multiple hash functions will become standard with the proposed
implementation, they are not a specific feature of this format [#multi-hash-thread]_.


Rationale
=========

The main goals of the proposal have been listed in the `Motivation`_. This
section explains why they are improvements and how the proposed format will
accomplish them.

Removal of digest files
-----------------------

Normal users who don't use a "tuned" filesystem for the portage tree are
wasting several dozen to a few hundred megabytes of disk space with the current
system, largely because of the digest files.
This is due to the overhead present in most filesystems, which
have a standard block size of four kilobytes, while most digest files are under
one kilobyte in size. This results in a waste of approximately three kilobytes
per digest file (likely even more). At the time of this writing the tree contains
roughly 22,000 digest files, so the overall waste caused by digest files is
estimated at about 70-100 megabytes.
Furthermore, it is assumed that this will also reduce the disk space wasted by
the Manifest files, as they now contain more content, but this hasn't been
verified yet.
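The estimate can be reproduced with a small calculation. This is only a sketch: it assumes every digest file occupies exactly one four-kilobyte block, uses one kilobyte as the file size (the upper bound given in the text), and ignores inode and directory overhead. The function name ``wasted_bytes`` is hypothetical.

```python
def wasted_bytes(file_size: int, block_size: int = 4096) -> int:
    """Bytes allocated but unused when a file is stored in whole blocks."""
    remainder = file_size % block_size
    return 0 if remainder == 0 else block_size - remainder


DIGEST_FILES = 22_000        # count quoted in the text
TYPICAL_DIGEST_SIZE = 1024   # "under one kilobyte"; the upper bound is used here

per_file = wasted_bytes(TYPICAL_DIGEST_SIZE)
total_mib = DIGEST_FILES * per_file / (1024 * 1024)
print(per_file, round(total_mib, 1))
```

With these inputs the waste is 3072 bytes per file and roughly 64 MiB in total. Smaller digest files push the per-file waste toward the full four kilobytes (about 86 MiB in total), which brackets the lower part of the 70-100 megabyte estimate.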

By unifying the digest files with the Manifest, these tiny files are eliminated
(in the long run), reducing the apparent tree size by about 20% and benefiting
both users and the Gentoo infrastructure.

Reducing redundancy
-------------------

When multiple hashes are used with the current system,
both the filename and filesize are repeated for every checksum type used, as each
checksum is standalone. This repetition doesn't add any functionality, so the
new format removes it.
This is a theoretical improvement at the moment, as only one hash function is in
use, but that is expected to change soon (see [#multi-hash-thread]_).
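The difference can be illustrated by serializing the same file both ways. In this sketch the old-style layout ``<chksumtype> <chksum> <filename> <filesize>`` (one line per checksum type) is an assumption, and both helper names are hypothetical; the checksums are those of ``metadata.xml`` from the example in the `Specification`_.

```python
def old_style_lines(filename, size, checksums):
    """One line per checksum type; filename and size repeat on each line."""
    return [f"{kind} {value} {filename} {size}" for kind, value in checksums.items()]


def new_style_line(filetype, filename, size, checksums):
    """A single Manifest2 line carrying all checksums of the file."""
    pairs = " ".join(f"{kind} {value}" for kind, value in checksums.items())
    return f"{filetype} {filename} {size} {pairs}"


checksums = {
    "RMD160": "39d775de55f9963f8946feaf088aa0324770bacb",
    "SHA1": "4fd7b285049d0e587f89e86becf06c0fd77bae6d",
    "MD5": "82e806ed62f0596fb7bef493d225712f",
}
old = old_style_lines("metadata.xml", 269, checksums)
new = new_style_line("MISCFILE", "metadata.xml", 269, checksums)

old_bytes = sum(len(line) + 1 for line in old)  # +1 for each newline
new_bytes = len(new) + 1
print(len(old), old_bytes, new_bytes)
```

The saving per file is small, but it grows with every additional hash function because the filename and size are written only once.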

Removal of checksum collisions
------------------------------

The current system theoretically allows a ``SRCURI`` type file to be recorded
in multiple digest files with different sizes and/or checksums. In such a case
one version of a package would report a checksum violation while another one
would not. This could create confusion and uncertainty among users.
So far this case hasn't been observed, but it can't be ruled out with the
existing system.
As the new format lists each file exactly once, this would no longer be possible.
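The situation described above could be detected in the current system with a check along these lines. This is a sketch under stated assumptions: the digest files are taken as already parsed into ``(filename, size, checksum)`` tuples, the function name ``find_conflicts`` is hypothetical, and the sample data is invented purely for illustration.

```python
from collections import defaultdict


def find_conflicts(digests):
    """Map each distfile recorded with differing data to its variants.

    ``digests`` maps a digest file name to a list of
    (filename, size, checksum) tuples.
    """
    seen = defaultdict(set)
    for entries in digests.values():
        for filename, size, checksum in entries:
            seen[filename].add((size, checksum))
    # A distfile conflicts if it was recorded with more than one variant.
    return {name: variants for name, variants in seen.items() if len(variants) > 1}


# Invented sample data: two digest files disagree about the same tarball.
digests = {
    "digest-foo-1.0": [("foo-1.0.tar.bz2", 1000, "aaaa")],
    "digest-foo-1.0-r1": [("foo-1.0.tar.bz2", 1001, "bbbb")],
    "digest-foo-1.1": [("foo-1.1.tar.bz2", 2000, "cccc")],
}
conflicts = find_conflicts(digests)
print(sorted(conflicts))
```

With one entry per file in a single Manifest, the dictionary built here could never hold more than one variant per name, which is the property the new format provides by construction.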

Flexible verification system
----------------------------

Right now portage verifies the checksum of every file listed in the Manifest
before using any file of the package, and all ``SRCURI`` files of an ebuild
before using that ebuild. This is unnecessary in many cases:

- During the "depend" phase (when the ebuild metadata is generated) only
  files of type ``EBUILD`` are used, so verifying the other types isn't
  necessary. Theoretically it is possible for an ebuild to include other
  files, like those of type ``AUXFILE``, at this phase, but that would be a
  major QA violation and should never occur, so it can be ignored here.
  It is also not a security concern, as the ebuild is verified before it is
  parsed, so any manipulation would show up.

- Generally, files of type ``MISCFILE`` don't need to be verified, as they are
  only used in very specific situations, aren't executed (just parsed at most)
  and don't affect the package build process.

- Files of type ``SRCURI`` only need to be verified directly after fetching and
  before unpacking them (which often will be one step), not every time their
  associated ebuild is used.
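The cases above amount to a lookup from the current operation to the filetypes that need checking. The sketch below is one possible shape for such a policy; the operation names ``"depend"`` and ``"fetch"``, the fallback of verifying everything for any other operation, and both function names are assumptions, not part of this proposal.

```python
ALL_TYPES = frozenset({"AUXFILE", "EBUILD", "MISCFILE", "SRCURI"})

# Hypothetical policy table: operation -> filetypes that must be verified.
POLICY = {
    "depend": frozenset({"EBUILD"}),  # metadata generation only reads ebuilds
    "fetch": frozenset({"SRCURI"}),   # check distfiles right after fetching
}


def types_to_verify(operation):
    """Return the filetypes to verify; unknown operations verify everything."""
    return POLICY.get(operation, ALL_TYPES)


def entries_to_verify(entries, operation):
    """Filter (filetype, filename) pairs down to those that need checking."""
    wanted = types_to_verify(operation)
    return [entry for entry in entries if entry[0] in wanted]


entries = [
    ("AUXFILE", "procmime.patch"),
    ("EBUILD", "sylpheed-claws-1.0.5-r1.ebuild"),
    ("MISCFILE", "ChangeLog"),
    ("SRCURI", "sylpheed-claws-1.0.5.tar.bz2"),
]
print(entries_to_verify(entries, "depend"))
```

Falling back to all types for unlisted operations keeps the sketch conservative: anything not explicitly relaxed behaves like the current system.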


Backwards Compatibility
=======================

Switching the Manifest system is a task that will need a long transition period,
like most changes affecting both portage and the tree. In this case the
implementation will be rolled out in several phases:

1. Add support for verification of Manifest2 entries in portage

2. Enable generation of Manifest2 entries in addition to the current system

3. Ignore digests during ``emerge --sync`` to get the size benefit on the client side.
   This step may be omitted if the following steps are expected to follow soon.

4. Disable generation of entries for the current system

5. Remove all traces of the current system from the tree (server side)

Each step has its own issues. While 1) and 2) can be implemented without any
compatibility problems, all later steps have a major impact:

- Step 3) can only be implemented when the whole tree is Manifest2 ready.
  That is the ideal; in practice the requirement will be closer to 95% coverage,
  with the expectation that for the remaining 5% either bugs will be filed after
  step 3) is completed or the packages will be updated at step 5).

- Steps 4) and 5) will render all portage versions without Manifest2 support
  basically useless (users would have to regenerate the digest and Manifest
  for each package before being able to merge it), so this requires almost
  100% coverage of the userbase with Manifest2 capable portage versions
  (with step 1) completely implemented).

Another problem is that some steps affect different targets:

- Steps 1) and 3) target portage versions used by users

- Steps 2) and 4) target portage versions used by devs

- Step 5) targets the portage tree on the cvs server

While it is relatively easy to get all devs to use a new portage version, this is
practically impossible with users, as some don't update their systems regularly.
While six months are probably sufficient to reach 95% coverage, one year is
the estimate for reaching almost complete coverage. All times are relative to the
stable-marking of a compatible portage version.

No timeframe for implementation is presented here as it is highly dependent
on the completion of each step.

In summary, while a full conversion will take over a year
to complete due to the compatibility issues mentioned above, some benefits of the
system can be used selectively as soon as step 2) is completed.


Other problems
==============

Impacts on infrastructure
-------------------------

While one long-term goal of this proposal is to reduce the size of the tree
and therefore make life easier for the Gentoo Infrastructure, this will only take
effect once the implementation is rolled out completely. In the meantime, however,
it will increase the tree size due to keeping checksums in both formats. It's not
possible to give a usable estimate of the degree of the increase, as it depends
on many variables such as the exact implementation timeframe, the propagation of
Manifest2 capable portage versions among devs or the update rate of the tree.
It has been suggested that Manifest files that are not gpg signed could be
mass converted in one step. This could certainly help, but only to some degree
(according to recent research [#gpg-numbers]_ about 40% of all Manifests in
the tree are signed, but this number hasn't been verified).


Reference Implementation
========================

A patch for a prototype implementation of Manifest2 verification and partial
generation has been posted at [#manifest2-patch]_. It will be reworked before
being considered for inclusion in portage. However, it shows that adding support
for verification is quite simple, while generation is a bit tricky and will
therefore be implemented later.


Options
=======

Some things have been considered for this GLEP but aren't part of the proposal
yet for various reasons:

- timestamp field: the author has considered adding a timestamp field for
  each entry to list the time the entry was created. However, so far no practical
  use for such a feature has been found.

- convert size field into checksum: Another idea was to treat the size field
  like any other checksum. But so far no real benefit (other than a slightly
  more modular implementation) has been seen for this, while it has several
  drawbacks: for one, unlike checksums, the size field is definitely required
  for all ``SRCURI`` files; it would also slightly increase the length of
  each entry by adding a ``SIZE`` keyword.

- removal of the ``MISCFILE`` type: It has been suggested to completely drop
  entries of type ``MISCFILE``. This would result in a minor space reduction
  (it's rather unlikely to free any blocks) but completely remove the ability
  to check these files for integrity. While they don't influence portage
  or packages directly, they can contain valuable information for users, so
  the author is of the opinion that at least the option for integrity checks
  should be kept.

Credits
=======

Thanks to the following persons for their input on or related to this GLEP
(even though they might not have known it):
Ned Ludd (solar), Brian Harring (ferringb), Jason Stubbs (jstubbs),
Robin H. Johnson (robbat2), Aron Griffis (agriffis)

Also thanks to Nicholas Jones (carpaski) for making the current Manifest system
robust enough to handle this change without too many transition
problems.

References
==========

.. [#reorg-thread] http://thread.gmane.org/gmane.linux.gentoo.devel/21920

.. [#multi-hash-thread] http://thread.gmane.org/gmane.linux.gentoo.devel/33434

.. [#gpg-numbers] gentoo-core mailing list, topic "Gentoo key signing practices
   and official Gentoo keyring", Message-ID <20051117075838.GB15734@curie-int.vc.shawcable.net>

.. [#manifest2-patch] http://thread.gmane.org/gmane.linux.gentoo.portage.devel/1374

Copyright
=========

This document has been placed in the public domain.