--- xml/htdocs/proj/en/glep/glep-0025.html 2004/04/04 22:56:06 1.1
+++ xml/htdocs/proj/en/glep/glep-0025.html 2004/11/11 21:34:45 1.2
@@ -8,7 +8,7 @@
 -->
-
+
|Title:||Distfile Patching Support|
|Author:||Brian Harring <ferringb at gentoo.org>|
This GLEP proposes adding patching support to portage and works through the implementation details.
Reduce the bandwidth load placed on our mirrors by decreasing the number of bytes transferred when upgrading between versions. A side benefit is a significant decrease in download requirements for users lacking broadband.
Most people are familiar with diff patches (unified diff, for example); this GLEP specifically proposes the use of an actual binary differencer. The reason is that diff patches are line based: you change a single
@@ -115,18 +120,18 @@
standard diffs.
The difference between source releases typically isn't very large, especially for minor releases. As an example, kdelibs-3.1.4.tar.bz2 is 10.53 MB, and kdelibs-3.1.5.tar.bz2 is 10.54 MB. A bzip2'ed patch between those versions is 75.6 kB, less than 1% of the size of the 3.1.5 tarball.
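The size comparison above can be checked with a few lines of Python. This is only an arithmetic sketch: the sizes are the ones quoted in the text, while the helper name `patch_ratio` and the choice of 1 MB = 1024 kB are assumptions made for illustration.

```python
# Sizes quoted in the text; treating 1 MB as 1024 kB is an assumption.
KB_PER_MB = 1024


def patch_ratio(patch_kb: float, full_mb: float) -> float:
    """Return the patch size as a fraction of the full download size."""
    return patch_kb / (full_mb * KB_PER_MB)


# 75.6 kB patch versus the 10.54 MB kdelibs-3.1.5 tarball.
ratio = patch_ratio(75.6, 10.54)
print(f"patch is {ratio:.2%} of the full tarball")
```

Using 1 MB = 1000 kB instead shifts the result only slightly; either way the patch stays under 1% of the full download.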
Quite a few sections of Gentoo are affected: mirroring, the portage tree, and portage itself.
To add patch info to the tree, this GLEP proposes a global patch list (stored in profiles as patches.global) and individual patch lists stored in the relevant package directories (named patches). Using the kernel packages as an
@@ -155,11 +160,11 @@
in reverse (originally explained in Binary patches vs GNUDiff patches).
This GLEP proposes that patching support be optional at this stage, enabled via FEATURES="patching".
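As a rough illustration of the opt-in switch, the check could look something like the sketch below. The function name `patching_enabled` is hypothetical and is not part of portage; the sketch also assumes that FEATURES is a whitespace-separated list in which a later `-patching` entry switches the feature back off.

```python
def patching_enabled(features: str) -> bool:
    """Return True if 'patching' is switched on in a FEATURES-style string.

    Hypothetical helper, not portage's API. Assumes whitespace-separated
    tokens, processed in order, where '-patching' disables the feature.
    """
    enabled = False
    for token in features.split():
        if token == "patching":
            enabled = True
        elif token == "-patching":
            enabled = False
    return enabled


print(patching_enabled("sandbox patching"))
print(patching_enabled("sandbox"))
```

Keeping the feature behind a single flag means a system that never sets it falls back to fetching full distfiles, which matches the optional status proposed above.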
When patching is enabled, the global patch list and the package's patch list are read. From there, portage determines which files could be used as a base for patching to the desired file; further, determining if it's
@@ -168,7 +173,7 @@
and md5 verified.
Upon fetching and md5 verification of the patch(es), the desired file is reconstructed. Assuming reconstruction returned no errors, the target file has its uncompressed md5sum calculated and verified, then is recompressed
@@ -180,13 +185,13 @@
(and the issue it addresses) follow.
There will be instances where a file is reconstructed perfectly and recompressed, yet the recompressed md5sum differs from what is stored in the tree. The problem is that the md5sum of a compressed file is inherently tied to the compressor version and options used to compress the original source.
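The effect is easy to reproduce in a small sketch. The example below compresses the same bytes with two different bzip2 compression levels, which stands in here for differing compressor versions or options (an assumption made for illustration; the data and variable names are invented). The compressed md5sums differ even though the uncompressed content, and therefore its md5sum, is identical.

```python
import bz2
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex md5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# Stand-in for an uncompressed source tarball (invented sample data).
source = b"example source tree contents\n" * 2000

# Same input, two different compressor settings.
compressed_a = bz2.compress(source, compresslevel=1)
compressed_b = bz2.compress(source, compresslevel=9)

# The compressed files do not hash the same...
compressed_md5s_match = md5_hex(compressed_a) == md5_hex(compressed_b)

# ...but the uncompressed content is byte-identical to the source.
uncompressed_md5s_match = (
    md5_hex(bz2.decompress(compressed_a))
    == md5_hex(bz2.decompress(compressed_b))
    == md5_hex(source)
)

print("compressed md5sums match:  ", compressed_md5s_match)
print("uncompressed md5sums match:", uncompressed_md5s_match)
```

This is why checking the md5sum of the uncompressed data is the reliable test of a reconstructed file, while the md5sum of the recompressed file may legitimately differ from the one recorded in the tree.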
A good example of this problem is related to the bzip2 versions used for compression. Between bzip2 0.9x and bzip2 1.x, there was a subtle change in the compressor that yields slightly better compression; end result
@@ -204,7 +209,7 @@
source's md5 has already been verified.
One issue of contention is where these files will actually be stored. As of the writing of this GLEP, a full distfiles mirror is roughly 40 GB; a rough estimate by the author places the space requirements for patches for
@@ -271,7 +276,7 @@
greatly appreciated.
As noted in The Proposed Solution, a system using patching and sharing out its distfiles must share out its alternate md5 db. Any system that uses the distfiles share must also support the alternate md5 db. If this is considered
@@ -299,11 +304,11 @@
compatibility issues, depending on what solution is accepted.