Revision 1.5
Mon Nov 7 22:26:59 2005 UTC by ciaranm
Branch: MAIN
Changes since 1.4: +4 -4 lines
File MIME type: text/plain
Fix header typos, GLEP 1 compliance

GLEP: 31
Title: Character Sets for Portage Tree Items
Version: $Revision: 1.4 $
Author: Ciaran McCreesh <ciaranm@gentoo.org>
Last-Modified: $Date: 2005/10/30 21:35:50 $
Status: Approved
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Oct-2004
Post-History: 28-Oct-2004, 1-Nov-2004, 11-Nov-2004

Abstract
========

A set of guidelines regarding what characters are permissible in the
portage tree and how they should be encoded is required.

Status
======

Approved on 8-Nov-2004 assuming that implementation will include
documentation for correctly encoding files within nano.

Motivation
==========

At present we have several developers and many more users whose names
require characters (for example, accented letters) which are not part of
the standard 'safe' 0..127 ASCII range. There is currently no standard on
how these should be represented, leading to inconsistency across the tree.

Although the issues involved have been discussed informally many times, no
official decision has been made.

Specification
=============

ChangeLog and Metadata Character Sets
-------------------------------------

It is proposed that UTF-8 ([1]_) is used for encoding ChangeLog and
metadata.xml files inside the portage tree.

UTF-8 allows the full range of Unicode ([2]_) characters to be expressed,
which is necessary given the diversity of the Gentoo developer- and
user-base. It is character-compatible with ASCII for the 0..127
characters and does not significantly increase the storage requirements
for files which consist mainly of ASCII characters. It is widely
supported, widely used and an official standard.
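
The ASCII compatibility and storage claims can be checked with a short
Python sketch. The helper name ``utf8_overhead`` and the sample strings
are made up for illustration and are not part of any Gentoo tool.

```python
def utf8_overhead(text):
    """Return how many bytes UTF-8 needs beyond one byte per character."""
    return len(text.encode("utf-8")) - len(text)


# Hypothetical sample strings, chosen only for this illustration.
ascii_text = "Version bump"
accented_text = "José"

# Text in the 0..127 range is byte-for-byte identical in ASCII and UTF-8.
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Only characters outside that range cost extra bytes.
print(utf8_overhead(ascii_text))
print(utf8_overhead(accented_text))
```

A file that is mostly ASCII therefore grows only by the extra bytes of its
few non-ASCII characters.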

The ISO-8859-* character sets ([3]_) would *not* be appropriate since they
cannot express the full range of required characters.
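
As a rough illustration of this limitation, the following Python sketch
tests two characters against two ISO-8859 parts. The helper ``encodable``
and the choice of sample characters are assumptions made for this example.

```python
def encodable(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# "ñ" is in ISO-8859-1 but not ISO-8859-2; "ł" is the reverse.
for codec in ("iso-8859-1", "iso-8859-2", "utf-8"):
    print(codec, encodable("ñ", codec), encodable("ł", codec))
```

Neither ISO-8859 part can hold both characters in one file, whereas UTF-8
can.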

Ebuild and Eclass Character Sets
--------------------------------

For the same reasons as above, it is proposed that UTF-8 is used as
the official encoding for ebuild and eclass files.

However, developers should be warned that any code which is parsed by bash
(in other words, non-comments), and any output which is echoed to the
screen (for example, einfo messages) or given to portage (for example any
of the standard global variables) must not use anything outside the
regular ASCII 0..127 range for compatibility purposes.
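
A minimal sketch of how this rule might be checked is given below. It is a
heuristic, not an existing QA tool: the function name
``non_ascii_code_lines`` is hypothetical, only lines whose first non-blank
character is ``#`` are treated as comments, and trailing comments are
counted as code.

```python
def non_ascii_code_lines(source):
    """Return 1-based numbers of non-comment lines with non-ASCII characters.

    Heuristic only: a line is a comment if its first non-blank character
    is '#'. Bash is not actually parsed.
    """
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # Comments may use any UTF-8 character.
        if not line.isascii():
            flagged.append(number)
    return flagged


# Hypothetical ebuild fragment, made up for this illustration.
sample = (
    "# Maintainer: José Example\n"
    'DESCRIPTION="A café locator"\n'
    'einfo "Installation complete"\n'
)
print(non_ascii_code_lines(sample))
```

Here the comment on the first line is accepted, while the ``DESCRIPTION``
assignment on the second line is flagged.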

files/ Entries Character Sets
-----------------------------

Patches must clearly be in the same character set as the file they are
patching. For other files/ entries (for example, GNOME desktop files),
consistency with the upstream-recommended character set is most sensible.

Suitable Characters for File and Directory Names
------------------------------------------------

Characters outside the ASCII 0..127 range cannot safely be used for file
or directory names. (Of course, not all characters inside the ASCII 0..127
range can be used safely either.)
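
The first half of this rule could be checked along the following lines.
The function name ``non_ascii_components`` and the sample paths are
hypothetical, and the sketch deliberately does not try to decide which
ASCII characters are unsafe.

```python
from pathlib import PurePosixPath


def non_ascii_components(paths):
    """Return (path, component) pairs for non-ASCII path components."""
    offenders = []
    for path in paths:
        for part in PurePosixPath(path).parts:
            if not part.isascii():
                offenders.append((path, part))
    return offenders


# Hypothetical tree paths, made up for this illustration.
candidates = [
    "app-misc/foo/foo-1.0.ebuild",
    "app-misc/café/files/fix.patch",
]
print(non_ascii_components(candidates))
```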

Backwards Compatibility
=======================

The existing tree uses a mixture of encodings. It would be straightforward
to fix existing ChangeLogs and metadata files to use UTF-8.
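
One possible shape for such a fix is sketched below. It assumes that any
file which is not already valid UTF-8 is in ISO-8859-1; that is a guess
made for this example, and since the tree uses a mixture of encodings the
real source encoding would have to be established per file. The function
name ``to_utf8`` is hypothetical.

```python
def to_utf8(raw, fallback="iso-8859-1"):
    """Return raw bytes re-encoded as UTF-8.

    Bytes that already decode as UTF-8 are returned unchanged. Anything
    else is ASSUMED to be in the fallback encoding, which may be wrong.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback).encode("utf-8")
    return raw


# A legacy single-byte entry and an already-converted one give the same result.
legacy = "José".encode("iso-8859-1")
modern = "José".encode("utf-8")
assert to_utf8(legacy) == to_utf8(modern) == modern
```

Because valid UTF-8 input passes through unchanged, running the conversion
twice on the same file does no harm.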

The ``echangelog`` tool is character-set agnostic. In order to properly
enter UTF-8, developers would have to switch to a UTF-8 shell session.
This only applies if the developer is entering new text which uses 'fancy'
characters -- existing characters are not mangled.

Certain text editors are incapable of handling UTF-8 cleanly. However,
since the ``echangelog`` tool is generally the correct way to generate
ChangeLog entries, this should not be a major problem. Generating
metadata.xml files correctly in these editors could become problematic.
The ``vim`` and ``emacs`` editors, which appear to be the most widely
used, are both capable of handling UTF-8 cleanly -- for vim, this could be
configured automatically via the ``gentoo-syntax`` ([4]_) package.

References
==========

.. [1] RFC 3629: UTF-8, a transformation format of ISO 10646
   http://www.ietf.org/rfc/rfc3629.txt
.. [2] ISO/IEC 10646 (Universal Multiple-Octet Coded Character Set)
.. [3] ISO/IEC 8859 (8-bit single-byte coded graphic character sets)
.. [4] The app-vim/gentoo-syntax package,
   https://developer.berlios.de/projects/gentoo-syntax/

Copyright
=========

This document has been placed in the public domain.

.. vim: set tw=74 fileencoding=utf-8 :
