--- xml/htdocs/proj/en/glep/glep-0031.html 2005/11/07 22:26:59 1.4
+++ xml/htdocs/proj/en/glep/glep-0031.html 2006/10/10 20:25:14 1.5
@@ -8,9 +8,252 @@
GLEP 31 -- Character Sets for Portage Tree Items
@@ -32,17 +275,17 @@
@@ -51,8 +294,8 @@
Title: Character Sets for Portage Tree Items
-Version: 1.4
+Version: 1.5
Author: Ciaran McCreesh <ciaranm at gentoo.org>
-Last-Modified: 2005/10/30 21:35:50
+Last-Modified: 2005/11/07 22:26:59
Status: Approved
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Oct-2004

Abstract

A set of guidelines regarding what characters are permissible in the portage tree and how they should be encoded is required.

Status

Approved on 8-Nov-2004 assuming that implementation will include documentation for correctly encoding files within nano.

Motivation

At present we have several developers and many more users whose names require characters (for example, accents) which are not part of the standard 'safe' 0..127 ASCII range. There is no current standard on how
@@ -88,10 +331,10 @@

Although the issues involved have been discussed informally many times, no official decision has been made.

Specification

ChangeLog and Metadata Character Sets

It is proposed that UTF-8 ([1]) is used for encoding ChangeLog and metadata.xml files inside the portage tree.

UTF-8 allows the full range of Unicode ([2]) characters to be expressed,
@@ -103,8 +346,8 @@

The ISO-8859-* character sets ([3]) would not be appropriate since they cannot express the full range of required characters.
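As an illustration of the encoding requirement above, the following minimal Python sketch tests whether a byte string decodes cleanly as UTF-8. The function name `is_valid_utf8` and the sample name are hypothetical and are not part of this GLEP or of any Portage tool.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical ChangeLog author name containing an accented character.
name = "Jos\u00e9"

# The same name as it would be stored in UTF-8 and in ISO-8859-1.
utf8_entry = name.encode("utf-8")
latin1_entry = name.encode("iso-8859-1")

print(is_valid_utf8(utf8_entry))    # the UTF-8 bytes decode cleanly
print(is_valid_utf8(latin1_entry))  # the ISO-8859-1 bytes do not
```

Note that a successful decode only shows that the bytes form valid UTF-8. It does not prove that the author intended UTF-8, and pure ASCII content always passes.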

Ebuild and Eclass Character Sets

For the same reasons as previously, it is proposed that UTF-8 is used as the official encoding for ebuild and eclass files.

However, developers should be warned that any code which is parsed by bash
@@ -113,21 +356,21 @@
of the standard global variables) must not use anything outside the regular ASCII 0..127 range for compatibility purposes.
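A reviewer could locate candidate violations of the ASCII restriction with something like the sketch below. It only reports which lines contain characters outside the 0..127 range; deciding whether a given line is actually subject to the restriction is left to the reader. The function name `non_ascii_lines` and the sample text are hypothetical.

```python
from typing import List


def non_ascii_lines(text: str) -> List[int]:
    """Return the 1-based numbers of lines that contain non-ASCII characters."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if not line.isascii()
    ]


# Hypothetical ebuild fragment: line 1 contains an accented character.
sample = 'DESCRIPTION="A caf\u00e9 finder"\nHOMEPAGE="http://example.org/"\n'

print(non_ascii_lines(sample))
```

The sketch assumes the file has already been decoded to text. It makes no attempt to parse bash syntax.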

files/ Entries Character Sets

Patches must clearly be in the same character set as the file they are patching. For other files/ entries (for example, GNOME desktop files), consistency with the upstream-recommended character set is most sensible.

Suitable Characters for File and Directory Names

Characters outside the ASCII 0..127 range cannot safely be used for file or directory names. (Of course, not all characters inside the ASCII 0..127 range can be used safely either.)
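The rule above can be sketched as a simple whitelist check. The set `SAFE_CHARS` below is a deliberately conservative assumption made for this example; this GLEP does not enumerate which ASCII characters are safe, so the list should not be read as policy.

```python
import string

# Assumption for illustration only: letters, digits and a few punctuation
# characters. The GLEP does not define this set.
SAFE_CHARS = set(string.ascii_letters + string.digits + "+-._")


def unsafe_chars(name: str) -> list:
    """Return the sorted, de-duplicated characters in name outside SAFE_CHARS."""
    return sorted({c for c in name if c not in SAFE_CHARS})


print(unsafe_chars("foo-1.0.ebuild"))   # nothing flagged
print(unsafe_chars("caf\u00e9.patch"))  # the accented character is flagged
print(unsafe_chars("my file"))          # the space is flagged
```

A whitelist is used rather than a blacklist because the text notes that some characters inside the ASCII range are unsafe as well.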

Backwards Compatibility

The existing tree uses a mixture of encodings. It would be straightforward to fix existing ChangeLogs and metadata files to use UTF-8.
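A conversion of this kind might look like the following sketch. It assumes that any file which is not already valid UTF-8 was written in ISO-8859-1. That is an assumption made for the example, not something this GLEP states, and real files in other legacy encodings would need to be identified by hand. The function name `to_utf8` is hypothetical.

```python
def to_utf8(data: bytes, legacy_encoding: str = "iso-8859-1") -> bytes:
    """Return data re-encoded as UTF-8.

    Bytes that already decode as UTF-8 are returned unchanged; anything
    else is assumed (without proof) to be in legacy_encoding.
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return data.decode(legacy_encoding).encode("utf-8")


# Hypothetical ChangeLog fragment stored in ISO-8859-1.
legacy = "Jos\u00e9".encode("iso-8859-1")
converted = to_utf8(legacy)

print(converted.decode("utf-8"))
```

Because ISO-8859-1 maps every byte to a character, the fallback never raises an error, so a wrong guess about the legacy encoding produces incorrect text silently. Converted files would still need a manual review.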

The echangelog tool is character-set agnostic. In order to properly
@@ -142,8 +385,8 @@
are both capable of handling UTF-8 cleanly -- for vim, this could be configured automatically via the gentoo-syntax ([4]) package.

References

@@ -171,8 +414,8 @@