Contents of /xml/htdocs/proj/en/glep/glep-0044.html

Revision 1.11
Sun Jan 11 19:40:56 2009 UTC by betelgeuse
Branch: MAIN
Changes since 1.10: +81 -83 lines
File MIME type: text/html
Update Status from Accepted to Final

1 <?xml version="1.0" encoding="utf-8" ?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4
5 <head>
6 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
7 <meta name="generator" content="Docutils 0.5: http://docutils.sourceforge.net/" />
8 <title>GLEP 44 -- Manifest2 format</title>
9 <link rel="stylesheet" href="tools/glep.css" type="text/css" /></head>
10 <body bgcolor="white">
11 <table class="navigation" cellpadding="0" cellspacing="0"
12 width="100%" border="0">
13 <tr><td class="navicon" width="150" height="35">
14 <a href="http://www.gentoo.org/" title="Gentoo Linux Home Page">
15 <img src="http://www.gentoo.org/images/gentoo-new.gif" alt="[Gentoo]"
16 border="0" width="150" height="35" /></a></td>
17 <td class="textlinks" align="left">
18 [<b><a href="http://www.gentoo.org/">Gentoo Linux Home</a></b>]
19 [<b><a href="http://www.gentoo.org/proj/en/glep">GLEP Index</a></b>]
20 [<b><a href="http://www.gentoo.org/proj/en/glep/glep-0044.txt">GLEP Source</a></b>]
21 </td></tr></table>
22 <table class="rfc2822 docutils field-list" frame="void" rules="none">
23 <col class="field-name" />
24 <col class="field-body" />
25 <tbody valign="top">
26 <tr class="field"><th class="field-name">GLEP:</th><td class="field-body">44</td>
27 </tr>
28 <tr class="field"><th class="field-name">Title:</th><td class="field-body">Manifest2 format</td>
29 </tr>
30 <tr class="field"><th class="field-name">Version:</th><td class="field-body">1.7</td>
31 </tr>
32 <tr class="field"><th class="field-name">Last-Modified:</th><td class="field-body"><a class="reference external" href="http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/proj/en/glep/glep-0044.txt?cvsroot=gentoo">2006/10/14 02:55:39</a></td>
33 </tr>
34 <tr class="field"><th class="field-name">Author:</th><td class="field-body">Marius Mauch &lt;genone&#32;&#97;t&#32;gentoo.org&gt;,</td>
35 </tr>
36 <tr class="field"><th class="field-name">Status:</th><td class="field-body">Final</td>
37 </tr>
38 <tr class="field"><th class="field-name">Type:</th><td class="field-body">Standards Track</td>
39 </tr>
40 <tr class="field"><th class="field-name">Content-Type:</th><td class="field-body"><a class="reference external" href="glep-0002.html">text/x-rst</a></td>
41 </tr>
42 <tr class="field"><th class="field-name">Created:</th><td class="field-body">04-Dec-2005</td>
43 </tr>
44 <tr class="field"><th class="field-name">Post-History:</th><td class="field-body">06-Dec-2005, 23-Jan-2006, 3-Sep-2006</td>
45 </tr>
46 </tbody>
47 </table>
48 <hr />
49 <div class="contents topic" id="contents">
50 <p class="topic-title first">Contents</p>
51 <ul class="simple">
52 <li><a class="reference internal" href="#abstract" id="id9">Abstract</a></li>
53 <li><a class="reference internal" href="#motivation" id="id10">Motivation</a></li>
54 <li><a class="reference internal" href="#specification" id="id11">Specification</a><ul>
55 <li><a class="reference internal" href="#compability-entries" id="id12">Compatibility Entries</a></li>
56 <li><a class="reference internal" href="#scope" id="id13">Scope</a></li>
57 <li><a class="reference internal" href="#number-of-hashes" id="id14">Number of hashes</a></li>
58 </ul>
59 </li>
60 <li><a class="reference internal" href="#rationale" id="id15">Rationale</a><ul>
61 <li><a class="reference internal" href="#removal-of-digest-files" id="id16">Removal of digest files</a></li>
62 <li><a class="reference internal" href="#reducing-redundancy" id="id17">Reducing redundancy</a></li>
63 <li><a class="reference internal" href="#removal-of-checksum-collisions" id="id18">Removal of checksum collisions</a></li>
64 <li><a class="reference internal" href="#flexible-verification-system" id="id19">Flexible verification system</a></li>
65 </ul>
66 </li>
67 <li><a class="reference internal" href="#backwards-compatibility" id="id20">Backwards Compatibility</a></li>
68 <li><a class="reference internal" href="#other-problems" id="id21">Other problems</a><ul>
69 <li><a class="reference internal" href="#impacts-on-infrastructure" id="id22">Impacts on infrastructure</a></li>
70 </ul>
71 </li>
72 <li><a class="reference internal" href="#reference-implementation" id="id23">Reference Implementation</a></li>
73 <li><a class="reference internal" href="#options" id="id24">Options</a></li>
74 <li><a class="reference internal" href="#credits" id="id25">Credits</a></li>
75 <li><a class="reference internal" href="#references" id="id26">References</a></li>
76 <li><a class="reference internal" href="#copyright" id="id27">Copyright</a></li>
77 </ul>
78 </div>
79 <div class="section" id="abstract">
80 <h1><a class="toc-backref" href="#id9">Abstract</a></h1>
81 <p>This GLEP proposes a new format for the Portage Manifest and digest file system
82 by unifying both filetypes into one to improve functional and non-functional
83 aspects of the Portage Tree.</p>
84 </div>
85 <div class="section" id="motivation">
86 <h1><a class="toc-backref" href="#id10">Motivation</a></h1>
87 <p>Please see <a class="footnote-reference" href="#reorg-thread" id="id1">[1]</a> for a general overview.
88 The main long term goals of this proposal are to:</p>
89 <ul class="simple">
90 <li>Remove the tiny digest files from the tree. They are a major annoyance as on a
91 typical configuration they waste a lot of disk space and the simple transmission
92 of the names for all digest files during an <tt class="docutils literal"><span class="pre">emerge</span> <span class="pre">--sync</span></tt> needs a substantial
93 amount of bandwidth.</li>
94 <li>Reduce redundancy when multiple hash functions are used</li>
95 <li>Remove potential for checksum collisions if a file is recorded in more than one
96 digest file</li>
97 <li>Differentiate between filetypes for a more flexible verification system</li>
98 </ul>
99 </div>
100 <div class="section" id="specification">
101 <h1><a class="toc-backref" href="#id11">Specification</a></h1>
102 <p>The new Manifest format would change the existing format in the following ways:</p>
103 <ul>
104 <li><p class="first">Addition of a filetype specifier; the currently planned types are:</p>
105 <ul class="simple">
106 <li><tt class="docutils literal"><span class="pre">AUX</span></tt> for files directly used by ebuilds (e.g. patches or initscripts),
107 located in the <tt class="docutils literal"><span class="pre">files/</span></tt> subdirectory</li>
108 <li><tt class="docutils literal"><span class="pre">EBUILD</span></tt> for all ebuilds</li>
109 <li><tt class="docutils literal"><span class="pre">MISC</span></tt> for files not directly used by ebuilds like <tt class="docutils literal"><span class="pre">ChangeLog</span></tt> or
110 <tt class="docutils literal"><span class="pre">metadata.xml</span></tt> files</li>
111 <li><tt class="docutils literal"><span class="pre">DIST</span></tt> for release tarballs recorded in the <tt class="docutils literal"><span class="pre">SRC_URI</span></tt> variable of an ebuild,
112 these were previously recorded in the digest files</li>
113 </ul>
114 <p>Future portage improvements might extend this list (for example with types
115 relevant for eclasses or profiles).</p>
116 </li>
117 <li><p class="first">Only have one line per file listing all information instead of one line per
118 file and checksum type</p>
119 </li>
120 <li><p class="first">Remove the separated digest-* files in the <tt class="docutils literal"><span class="pre">files/</span></tt> subdirectory</p>
121 </li>
122 </ul>
123 <p>Each line in the new format has the following format:</p>
124 <pre class="literal-block">
125 &lt;filetype&gt; &lt;filename&gt; &lt;filesize&gt; &lt;chksumtype1&gt; &lt;chksum1&gt; ... &lt;chksumtypen&gt; &lt;chksumn&gt;
126 </pre>
127 <p>However, these entries will be stored in the existing Manifest files.</p>
128 <p>An <a class="reference external" href="glep-0044-extras/manifest2-example.txt">actual example</a> <a class="footnote-reference" href="#id7" id="id8">[6]</a> of a (pure) Manifest2 file is available.</p>
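<p>As an illustration of the line format above, the following minimal sketch splits one Manifest2 line into its fields. The names <tt class="docutils literal">Entry</tt> and <tt class="docutils literal">parse_manifest2_line</tt> are hypothetical and not part of portage, and the sketch assumes that filenames contain no whitespace.</p>

```python
from typing import NamedTuple

# Filetypes listed in this GLEP; the parser itself is only an illustration.
FILETYPES = ("AUX", "EBUILD", "MISC", "DIST")


class Entry(NamedTuple):
    filetype: str
    filename: str
    size: int
    checksums: dict  # checksum type mapped to its hex digest


def parse_manifest2_line(line):
    """Split one Manifest2 line into its fields (hypothetical helper)."""
    fields = line.split()
    # Three fixed fields followed by one or more type/value pairs means the
    # field count must be odd and at least five.
    if len(fields) % 2 == 0 or len(fields) in (1, 3):
        raise ValueError("malformed Manifest2 line: %r" % line)
    filetype, filename, size = fields[:3]
    if filetype not in FILETYPES:
        raise ValueError("unknown filetype: %r" % filetype)
    pairs = fields[3:]
    checksums = dict(zip(pairs[0::2], pairs[1::2]))
    return Entry(filetype, filename, int(size), checksums)


# Made-up example line; the digest is the SHA1 of empty input.
example = "DIST foo-1.0.tar.gz 1024 SHA1 da39a3ee5e6b4b0d3255bfef95601890afd80709"
entry = parse_manifest2_line(example)
print(entry.filetype, entry.filename, entry.size, sorted(entry.checksums))
```

<p>Keeping the checksums in a mapping means a verifier can pick whichever hash functions it supports without caring about their order on the line.</p>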
129 <div class="section" id="compability-entries">
130 <h2><a class="toc-backref" href="#id12">Compatibility Entries</a></h2>
131 <p>To maintain compatibility with existing portage versions, a transition period
132 is required after the introduction of the Manifest2 format, during which portage
133 will not only have to be capable of using existing Manifest and digest files but
134 also generate them in addition to the new entries.
135 Fortunately this can be accomplished by simply mixing old and new style entries
136 in one file for the Manifest files; existing portage versions will simply ignore
137 the new style entries. For the digest files there are no new entries to care
138 about.</p>
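<p>A reader of such a mixed file could separate the two kinds of entries by looking only at the first token of each line. The sketch below assumes exactly that; the function name <tt class="docutils literal">split_mixed_manifest</tt> is hypothetical, and the old-style sample line is a made-up placeholder because this GLEP does not spell out the old layout.</p>

```python
# Filetype specifiers introduced by this GLEP.
MANIFEST2_TYPES = frozenset(["AUX", "EBUILD", "MISC", "DIST"])


def split_mixed_manifest(text):
    """Separate new-style entries from all other lines of a mixed Manifest."""
    new_style, other = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        first_token = line.split()[0]
        if first_token in MANIFEST2_TYPES:
            new_style.append(line)
        else:
            other.append(line)
    return new_style, other


# Placeholder content: one assumed old-style line and two new-style lines.
sample = (
    "MD5 0123 foo-1.0.ebuild 10\n"
    "EBUILD foo-1.0.ebuild 10 SHA1 abcd\n"
    "DIST foo-1.0.tar.gz 20 SHA1 ef01\n"
)
new_entries, old_entries = split_mixed_manifest(sample)
print(len(new_entries), "new-style,", len(old_entries), "other")
```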
139 </div>
140 <div class="section" id="scope">
141 <h2><a class="toc-backref" href="#id13">Scope</a></h2>
142 <p>It is important to note that this proposal only deals with a change of the
143 format of the digest and Manifest system.</p>
144 <p>It does not expand the scope of it to cover eclasses, profiles or anything
145 else not already covered by the Manifest system; it also doesn't affect
146 the Manifest signing efforts in any way (though the implementations of both
147 might be coupled).</p>
148 <p>Also while multiple hash functions will become standard with the proposed
149 implementation they are not a specific feature of this format <a class="footnote-reference" href="#multi-hash-thread" id="id3">[2]</a>.</p>
150 </div>
151 <div class="section" id="number-of-hashes">
152 <h2><a class="toc-backref" href="#id14">Number of hashes</a></h2>
153 <p>While using multiple hashes for each file is a major feature of this proposal
154 we have to make sure that the number of hashes listed is limited to avoid
155 an explosion of the Manifest size that would negate the main benefit of this proposal
156 (reducing tree size). Therefore the number of hashes that will be generated
157 will be limited to three different hash functions. For compatibility, though, we
158 have to rely on at least one hash function to always be present; this proposal
159 suggests using SHA1 for this purpose (as it is supposed to be more secure than MD5
160 and currently only SHA1 and MD5 are directly available in python; also MD5 doesn't
161 have any benefit in terms of compatibility).</p>
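<p>The limit described above could be enforced at generation time roughly as follows. This is a sketch under assumptions: <tt class="docutils literal">make_entry</tt> is a hypothetical helper, and the choice of SHA256 and SHA512 as optional extra hash functions is made up for the example; only the limit of three and the mandatory SHA1 come from this GLEP.</p>

```python
import hashlib

MAX_HASHES = 3          # limit stated in this GLEP
REQUIRED_HASH = "SHA1"  # always present, per this GLEP

# Checksum keyword mapped to a hashlib constructor. The optional extras
# (SHA256, SHA512) are an assumption of this sketch.
HASH_FUNCS = {
    "SHA1": hashlib.sha1,
    "SHA256": hashlib.sha256,
    "SHA512": hashlib.sha512,
}


def make_entry(filetype, filename, data, extra_hashes=("SHA256",)):
    """Build one Manifest2 line for in-memory file content (hypothetical)."""
    names = [REQUIRED_HASH] + [h for h in extra_hashes if h != REQUIRED_HASH]
    if len(names) > MAX_HASHES:
        raise ValueError("at most %d hash functions are allowed" % MAX_HASHES)
    parts = [filetype, filename, str(len(data))]
    for name in names:
        parts += [name, HASH_FUNCS[name](data).hexdigest()]
    return " ".join(parts)


line = make_entry("DIST", "empty.tar", b"")
print(line)
```

<p>Because filename and size appear once per line, each additional hash function only adds its keyword and digest to the entry.</p>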
162 </div>
163 </div>
164 <div class="section" id="rationale">
165 <h1><a class="toc-backref" href="#id15">Rationale</a></h1>
166 <p>The main goals of the proposal have been listed in the <a class="reference internal" href="#motivation">Motivation</a>; this section
167 explains why they are improvements and how the proposed format will
168 accomplish them.</p>
169 <div class="section" id="removal-of-digest-files">
170 <h2><a class="toc-backref" href="#id16">Removal of digest files</a></h2>
171 <p>Normal users that don't use a &quot;tuned&quot; filesystem for the portage tree are
172 wasting several dozen to a few hundred megabytes of disk space with the current
173 system, largely caused by the digest files.
174 This is due to the filesystem overhead present in most filesystems that
175 have a standard blocksize of four kilobytes while most digest files are under
176 one kilobyte in size, so this results in a waste of approximately three kilobytes
177 per digest file (likely even more). At the time of this writing the tree contains
178 roughly 22,000 digest files, so the overall waste caused by digest files is
179 estimated at about 70-100 megabytes.
180 Furthermore it is assumed that this will also reduce the disk space wasted by
181 the Manifest files as they now contain more content, but this hasn't been
182 verified yet.</p>
183 <p>By unifying the digest files with the Manifest these tiny files are eliminated
184 (in the long run), reducing the apparent tree size by about 20%, benefitting
185 both users and the Gentoo infrastructure.</p>
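<p>The estimate above can be reproduced with a short calculation. The sketch assumes a typical digest file of 1000 bytes (the text only says "under one kilobyte") and counts only the unused remainder of the last data block, ignoring any other per-file overhead.</p>

```python
BLOCK_SIZE = 4096     # bytes, the "standard blocksize of four kilobytes"
DIGEST_FILES = 22000  # "roughly 22,000 digest files"


def wasted_bytes(file_size, block_size=BLOCK_SIZE):
    """Unused space in the last block allocated to a file."""
    remainder = file_size % block_size
    return 0 if remainder == 0 else block_size - remainder


# Assumed typical digest file size of 1000 bytes.
per_file = wasted_bytes(1000)
total = per_file * DIGEST_FILES
print("%d bytes wasted per file, %d bytes in total" % (per_file, total))
```

<p>Under these assumptions the slack is 3096 bytes per file and about 68 million bytes in total, close to the lower end of the 70-100 megabyte estimate.</p>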
186 </div>
187 <div class="section" id="reducing-redundancy">
188 <h2><a class="toc-backref" href="#id17">Reducing redundancy</a></h2>
189 <p>When multiple hashes are used with the current system
190 both the filename and filesize are repeated for every checksum type used as each
191 checksum is standalone. However this doesn't add any functionality and is
192 therefore useless, so the new format removes this redundancy.
193 This is a theoretical improvement at this moment as only one hash function is in
194 use, but this is expected to change soon (see <a class="footnote-reference" href="#multi-hash-thread" id="id4">[2]</a>).</p>
195 </div>
196 <div class="section" id="removal-of-checksum-collisions">
197 <h2><a class="toc-backref" href="#id18">Removal of checksum collisions</a></h2>
198 <p>The current system theoretically allows for a <tt class="docutils literal"><span class="pre">DIST</span></tt> type file to be recorded
199 in multiple digest files with different sizes and/or checksums. In such a case
200 one version of a package would report a checksum violation while another one
201 would not. This could create confusion and uncertainty among users.
202 So far this case hasn't been observed, but it can't be ruled out with the
203 existing system.
204 As the new format lists each file exactly once this would no longer be possible.</p>
205 </div>
206 <div class="section" id="flexible-verification-system">
207 <h2><a class="toc-backref" href="#id19">Flexible verification system</a></h2>
208 <p>Right now portage verifies the checksum of every file listed in the Manifest
209 before using any file of the package and all <tt class="docutils literal"><span class="pre">DIST</span></tt> files of an ebuild
210 before using that ebuild. This is unnecessary in many cases:</p>
211 <ul class="simple">
212 <li>During the &quot;depend&quot; phase (when the ebuild metadata is generated) only
213 files of type <tt class="docutils literal"><span class="pre">EBUILD</span></tt> are used, so verifying the other types isn't
214 necessary. Theoretically it is possible for an ebuild to include other
215 files like those of type <tt class="docutils literal"><span class="pre">AUX</span></tt> at this phase, but that would be a
216 major QA violation and should never occur, so it can be ignored here.
217 It is also not a security concern as the ebuild is verified before parsing
218 it, so any manipulation would show up.</li>
219 <li>Generally files of type <tt class="docutils literal"><span class="pre">MISC</span></tt> don't need to be verified as they are
220 only used in very specific situations, aren't executed (just parsed at most)
221 and don't affect the package build process.</li>
222 <li>Files of type <tt class="docutils literal"><span class="pre">DIST</span></tt> only need to be verified directly after fetching and
223 before unpacking them (which often will be one step), not every time their
224 associated ebuild is used.</li>
225 </ul>
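<p>The per-type policy in the list above can be pictured as a small lookup table. This is only a sketch: the phase names are labels invented for the example rather than portage identifiers, the "build" row is an assumption based on <tt class="docutils literal">AUX</tt> files being used directly by ebuilds, and <tt class="docutils literal">needs_verification</tt> is a hypothetical helper.</p>

```python
# Which filetypes need checking at which point. The phase names and the
# "build" row are assumptions of this sketch.
TYPES_TO_VERIFY = {
    "depend": {"EBUILD"},        # metadata generation only reads ebuilds
    "fetch": {"DIST"},           # right after fetching, before unpacking
    "build": {"EBUILD", "AUX"},  # patches and initscripts are used here
}


def needs_verification(filetype, phase, strict=False):
    """Return True if a file of this type should be checked in this phase.

    With strict=True, MISC files are checked as well, which keeps integrity
    checks for them available as an option.
    """
    if strict and filetype == "MISC":
        return True
    return filetype in TYPES_TO_VERIFY[phase]


for filetype in ("EBUILD", "AUX", "MISC", "DIST"):
    print(filetype, needs_verification(filetype, "depend"))
```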
226 </div>
227 </div>
228 <div class="section" id="backwards-compatibility">
229 <h1><a class="toc-backref" href="#id20">Backwards Compatibility</a></h1>
230 <p>Switching the Manifest system is a task that will need a long transition period
231 like most changes affecting both portage and the tree. In this case the
232 implementation will be rolled out in several phases:</p>
233 <ol class="arabic simple">
234 <li>Add support for verification of Manifest2 entries in portage</li>
235 <li>Enable generation of Manifest2 entries in addition to the current system</li>
236 <li>Ignore digests during <tt class="docutils literal"><span class="pre">emerge</span> <span class="pre">--sync</span></tt> to get the size-benefit clientside.
237 This step may be omitted if the following steps are expected to follow soon.</li>
238 <li>Disable generation of entries for the current system</li>
239 <li>Remove all traces of the current system from the tree (serverside)</li>
240 </ol>
241 <p>Each step has its own issues. While 1) and 2) can be implemented without any
242 compatibility problems, all later steps have a major impact:</p>
243 <ul class="simple">
244 <li>Step 3) can only be implemented when the whole tree is Manifest2 ready
245 (ideally speaking, practically the requirement will be more like 95% coverage
246 with the expectation that for the remaining 5% either bugs will be filed after
247 step 3) is completed or they'll be updated at step 5).</li>
248 <li>Steps 4) and 5) will render all portage versions without Manifest2 support
249 basically useless (users would have to regenerate the digest and Manifest
250 for each package before being able to merge it), so this requires an almost
251 100% coverage of the userbase with Manifest2 capable portage versions
252 (with step 1) completely implemented).</li>
253 </ul>
254 <p>Another problem is that some steps affect different targets:</p>
255 <ul class="simple">
256 <li>Steps 1) and 3) target portage versions used by users</li>
257 <li>Steps 2) and 4) target portage versions used by devs</li>
258 <li>Step 5) targets the portage tree on the cvs server</li>
259 </ul>
260 <p>While it is relatively easy to get all devs to use a new portage version this is
261 practically impossible with users as some don't update their systems regularly.
262 While six months are probably sufficient to reach a 95% coverage, one year is
263 estimated to be needed to reach an almost-complete coverage. All times are relative to the
264 stable-marking of a compatible portage version.</p>
265 <p>No timeframe for implementation is presented here as it is highly dependent
266 on the completion of each step.</p>
267 <p>In summary it can be said that while a full conversion will take over a year
268 to be completed due to the compatibility issues mentioned above, some benefits of the
269 system can selectively be used as soon as step 2) is completed.</p>
270 </div>
271 <div class="section" id="other-problems">
272 <h1><a class="toc-backref" href="#id21">Other problems</a></h1>
273 <div class="section" id="impacts-on-infrastructure">
274 <h2><a class="toc-backref" href="#id22">Impacts on infrastructure</a></h2>
275 <p>While one long term goal of this proposal is to reduce the size of the tree
276 and therefore make life for the Gentoo Infrastructure easier this will only
277 take effect once the implementation is rolled out completely. In the meantime
278 however it will increase the tree size due to keeping checksums in both formats.
279 It's not possible to give a usable estimate on the degree of the increase as
280 it depends on many variables such as the exact implementation timeframe,
281 propagation of Manifest2 capable portage versions among devs or the update
282 rate of the tree. It has been suggested that Manifest files that are not gpg
283 signed could be mass converted in one step; this could certainly help but only
284 to some degree (according to recent research <a class="footnote-reference" href="#gpg-numbers" id="id5">[3]</a>, about 40% of
285 all Manifests in the tree are signed, but this number hasn't been verified).</p>
286 </div>
287 </div>
288 <div class="section" id="reference-implementation">
289 <h1><a class="toc-backref" href="#id23">Reference Implementation</a></h1>
290 <p>A patch for a prototype implementation of Manifest2 verification and partial
291 generation has been posted at <a class="footnote-reference" href="#manifest2-patch" id="id6">[4]</a>; it will be reworked before
292 being considered for inclusion in portage. However it shows that adding support
293 for verification is quite simple, but generation is a bit tricky and will
294 therefore be implemented later.</p>
295 </div>
296 <div class="section" id="options">
297 <h1><a class="toc-backref" href="#id24">Options</a></h1>
298 <p>Some things have been considered for this GLEP but aren't part of the proposal
299 yet for various reasons:</p>
300 <ul class="simple">
301 <li>timestamp field: the author has considered adding a timestamp field for
302 each entry to list the time the entry was created. However so far no practical
303 use for such a feature has been found.</li>
304 <li>convert size field into checksum: Another idea was to treat the size field
305 like any other checksum. But so far no real benefit (other than a slightly
306 more modular implementation) for this has been seen while it has several
307 drawbacks: For one, unlike checksums, the size field is definitely required
308 for all <tt class="docutils literal"><span class="pre">DIST</span></tt> files; also it would slightly increase the length of
309 each entry by adding a <tt class="docutils literal"><span class="pre">SIZE</span></tt> keyword.</li>
310 <li>removal of the <tt class="docutils literal"><span class="pre">MISC</span></tt> type: It has been suggested to completely drop
311 entries of type <tt class="docutils literal"><span class="pre">MISC</span></tt>. This would result in a minor space reduction
312 (it's rather unlikely to free any blocks) but completely remove the ability
313 to check these files for integrity. While they don't influence portage
314 or packages directly they can contain valuable information for users, so
315 the author has the opinion that at least the option for integrity checks
316 should be kept.</li>
317 </ul>
318 </div>
319 <div class="section" id="credits">
320 <h1><a class="toc-backref" href="#id25">Credits</a></h1>
321 <p>Thanks to the following persons for their input on or related to this GLEP
322 (even though they might not have known it):
323 Ned Ludd (solar), Brian Harring (ferringb), Jason Stubbs (jstubbs),
324 Robin H. Johnson (robbat2), Aron Griffis (agriffis)</p>
325 <p>Also thanks to Nicholas Jones (carpaski) for making the current Manifest system
326 robust enough to be able to handle this change without too many transition
327 problems.</p>
328 </div>
329 <div class="section" id="references">
330 <h1><a class="toc-backref" href="#id26">References</a></h1>
331 <table class="docutils footnote" frame="void" id="reorg-thread" rules="none">
332 <colgroup><col class="label" /><col /></colgroup>
333 <tbody valign="top">
334 <tr><td class="label"><a class="fn-backref" href="#id1">[1]</a></td><td><a class="reference external" href="http://thread.gmane.org/gmane.linux.gentoo.devel/21920">http://thread.gmane.org/gmane.linux.gentoo.devel/21920</a></td></tr>
335 </tbody>
336 </table>
337 <table class="docutils footnote" frame="void" id="multi-hash-thread" rules="none">
338 <colgroup><col class="label" /><col /></colgroup>
339 <tbody valign="top">
340 <tr><td class="label">[2]</td><td><em>(<a class="fn-backref" href="#id3">1</a>, <a class="fn-backref" href="#id4">2</a>)</em> <a class="reference external" href="http://thread.gmane.org/gmane.linux.gentoo.devel/33434">http://thread.gmane.org/gmane.linux.gentoo.devel/33434</a></td></tr>
341 </tbody>
342 </table>
343 <table class="docutils footnote" frame="void" id="gpg-numbers" rules="none">
344 <colgroup><col class="label" /><col /></colgroup>
345 <tbody valign="top">
346 <tr><td class="label"><a class="fn-backref" href="#id5">[3]</a></td><td>gentoo-core mailing list, topic &quot;Gentoo key signing practices
347 and official Gentoo keyring&quot;, Message-ID &lt;<a class="reference external" href="mailto:20051117075838.GB15734&#64;curie-int.vc.shawcable.net">20051117075838.GB15734&#64;curie-int.vc.shawcable.net</a>&gt;</td></tr>
348 </tbody>
349 </table>
350 <table class="docutils footnote" frame="void" id="manifest2-patch" rules="none">
351 <colgroup><col class="label" /><col /></colgroup>
352 <tbody valign="top">
353 <tr><td class="label"><a class="fn-backref" href="#id6">[4]</a></td><td><a class="reference external" href="http://thread.gmane.org/gmane.linux.gentoo.portage.devel/1374">http://thread.gmane.org/gmane.linux.gentoo.portage.devel/1374</a></td></tr>
354 </tbody>
355 </table>
356 <table class="docutils footnote" frame="void" id="manifest2-example" rules="none">
357 <colgroup><col class="label" /><col /></colgroup>
358 <tbody valign="top">
359 <tr><td class="label">[5]</td><td><a class="reference external" href="http://www.gentoo.org/proj/en/glep/glep-0044-extras/manifest2-example">http://www.gentoo.org/proj/en/glep/glep-0044-extras/manifest2-example</a></td></tr>
360 </tbody>
361 </table>
362 <table class="docutils footnote" frame="void" id="id7" rules="none">
363 <colgroup><col class="label" /><col /></colgroup>
364 <tbody valign="top">
365 <tr><td class="label"><a class="fn-backref" href="#id8">[6]</a></td><td><a class="reference external" href="glep-0044-extras/manifest2-example.txt">glep-0044-extras/manifest2-example.txt</a></td></tr>
366 </tbody>
367 </table>
368 </div>
369 <div class="section" id="copyright">
370 <h1><a class="toc-backref" href="#id27">Copyright</a></h1>
371 <p>This document has been placed in the public domain.</p>
372 </div>
373
374 </div>
375 <div class="footer">
376 <hr class="footer" />
377 <a class="reference external" href="glep-0044.txt">View document source</a>.
378 Generated on: 2009-01-11 19:35 UTC.
379 Generated by <a class="reference external" href="http://docutils.sourceforge.net/">Docutils</a> from <a class="reference external" href="http://docutils.sourceforge.net/rst.html">reStructuredText</a> source.
380
381 </div>
382 </body>
383 </html>
