Diff of /xml/htdocs/doc/en/utf-8.xml

Revision 1.11 Revision 1.12
1<?xml version='1.0' encoding="UTF-8"?> 1<?xml version='1.0' encoding="UTF-8"?>
2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.11 2005/04/24 12:18:59 bennyc Exp $ --> 2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.12 2005/04/24 14:11:51 bennyc Exp $ -->
3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> 3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
4 4
5<guide link="/doc/en/utf-8.xml"> 5<guide link="/doc/en/utf-8.xml">
6<title>Using UTF-8 with Gentoo</title> 6<title>Using UTF-8 with Gentoo</title>
7 7
8<author title="Author"> 8<author title="Author">
9 <mail link="slarti@gentoo.org">Thomas Martin</mail> 9 <mail link="slarti@gentoo.org">Thomas Martin</mail>
10</author> 10</author>
11<author title="Contributor"> 11<author title="Contributor">
12 <mail link="devil@gentoo.org.ua">Alexander Simonov</mail> 12 <mail link="devil@gentoo.org.ua">Alexander Simonov</mail>
13</author> 13</author>
14 14
15<abstract> 15<abstract>
16This guide shows you how to set up and use the UTF-8 Unicode character set with 16This guide shows you how to set up and use the UTF-8 Unicode character set with
17your Gentoo Linux system, after explaining the benefits of Unicode and more 17your Gentoo Linux system, after explaining the benefits of Unicode and more
180file though, luckily, the usage of this file is well documented in the comments 180file though, luckily, the usage of this file is well documented in the comments
181within it. It is also explained in the <uri 181within it. It is also explained in the <uri
182link="/doc/en/guide-localization.xml#doc_chap3_sect3"> Gentoo Localisation 182link="/doc/en/guide-localization.xml#doc_chap3_sect3"> Gentoo Localisation
183Guide</uri>. 183Guide</uri>.
184</p> 184</p>
185 185
186<p> 186<p>
187Next, we'll need to decide whether a UTF-8 locale is already available for our 187Next, we'll need to decide whether a UTF-8 locale is already available for our
188language, or whether we need to create one. 188language, or whether we need to create one.
189</p> 189</p>
190 190
191<pre caption="Checking for an existing UTF-8 locale"> 191<pre caption="Checking for an existing UTF-8 locale">
192<comment>(Replace "en_GB" with your desired locale setting)</comment> 192<comment>(Replace "en_GB" with your desired locale setting)</comment>
193# <i>locale -a | grep 'en_GB'</i> 193# <i>locale -a | grep 'en_GB'</i>
194en_GB 194en_GB
195en_GB.utf8 195en_GB.UTF-8
196</pre> 196</pre>
197 197
198<p> 198<p>
199From the output of this command line, we need to take the result with a suffix 199From the output of this command line, we need to take the result with a suffix
200similar to <c>.utf8</c>. If there is no result with a suffix similar to 200similar to <c>.UTF-8</c>. If there is no result with a suffix similar to
201<c>.utf8</c>, we need to create a UTF-8 compatible locale. 201<c>.UTF-8</c>, we need to create a UTF-8 compatible locale.
202</p> 202</p>
203 203
204<note> 204<note>
205Only execute the following code listing if you do not have a UTF-8 locale 205Only execute the following code listing if you do not have a UTF-8 locale
206available for your language. 206available for your language.
207</note> 207</note>
208 208
209<pre caption="Creating a UTF-8 locale"> 209<pre caption="Creating a UTF-8 locale">
210<comment>(Replace "en_GB" with your desired locale setting)</comment> 210<comment>(Replace "en_GB" with your desired locale setting)</comment>
211# <i>localedef -i en_GB -f UTF-8 en_GB.utf8</i> 211# <i>localedef -i en_GB -f UTF-8 en_GB.UTF-8</i>
212</pre> 212</pre>
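The check-then-create steps above can be combined in a small shell helper. This is only a sketch: the function name `has_utf8_locale` is hypothetical and not part of the guide. Because `locale -a` may print the suffix as either `.utf8` or `.UTF-8`, the match ignores case and treats the hyphen as optional.

```shell
#!/bin/sh
# has_utf8_locale NAME (hypothetical helper, not from the guide):
# reads a locale list, such as the output of `locale -a`, on stdin and
# succeeds if it contains NAME followed by a UTF-8 suffix.
# The match is case-insensitive and the hyphen is optional, so both
# "en_GB.utf8" and "en_GB.UTF-8" are accepted.
has_utf8_locale() {
    grep -Eiq "^$1\.utf-?8\$"
}

# Possible usage (run as root; replace "en_GB" with your locale):
# if ! locale -a | has_utf8_locale en_GB; then
#     localedef -i en_GB -f UTF-8 en_GB.UTF-8
# fi
```

Reading the list from stdin rather than calling `locale -a` inside the function keeps the helper easy to try out on sample input before using it on a real system.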
213 213
214<p> 214<p>
215Another way to include a UTF-8 locale is to add it to the 215Another way to include a UTF-8 locale is to add it to the
216<path>/etc/locales.build</path> file and rebuild <c>glibc</c> with the 216<path>/etc/locales.build</path> file and rebuild <c>glibc</c> with the
217<c>userlocales</c> USE flag set. 217<c>userlocales</c> USE flag set.
218</p> 218</p>
219 219
220<pre caption="Line in /etc/locales.build"> 220<pre caption="Line in /etc/locales.build">
221en_GB.UTF-8/UTF-8 221en_GB.UTF-8/UTF-8
222</pre> 222</pre>
223 223
224</body> 224</body>
225</section> 225</section>
226<section> 226<section>
227<title>Setting the Locale</title> 227<title>Setting the Locale</title>
228<body> 228<body>
229 229
230<p> 230<p>
231Although by now you might be determined to use UTF-8 system wide, the author 231There are two environment variables that need to be set in order to use
232does not recommend setting UTF-8 for the root user. Instead, it is best to set 232our new UTF-8 locales: <c>LANG</c> and <c>LC_ALL</c>. There are also
233the locale in your user's <path>~/.profile</path> (or, if you are using a C 233many different ways to set them; some people prefer to only have a UTF-8
234shell, <path>~/.login</path>). 234environment for a specific user, in which case they set them in their
235</p> 235<path>~/.profile</path> or <path>~/.bashrc</path>. Others prefer to set the
236 236locale globally. One specific circumstance where the author particularly
237<note> 237recommends doing this is when <path>/etc/init.d/xdm</path> is in use, because
238If you are not sure which file to use, use <path>~/.profile</path>. Also, if 238this init script starts the display manager and desktop before any of the
239you are unsure which code listing to use, use the Bourne version. 239aforementioned shell startup files are sourced, and so before any of the
240</note> 240variables are in the environment.
241
242<pre caption="Setting the locale with environment variables (Bourne version)">
243export LANG="en_GB.utf8"
244export LC_ALL="en_GB.utf8"
245</pre>
246
247<pre caption="Setting the locale with environment variables (C shell version)">
248setenv LANG "en_GB.utf8"
249setenv LC_ALL "en_GB.utf8"
250</pre>
251
252<p> 241</p>
253Now, logout and back in to apply the change. We want these environment 242
254variables in our entire environment, so it is best to logout and back in, or at 243<p>
255the very least to source <path>~/.profile</path> or <path>~/.login</path> in 244Setting the locale globally should be done using
256the console from which you have started other processes. 245<path>/etc/env.d/02locale</path>. The file should look something like the
246following:
247</p>
248
249<pre caption="Demonstration /etc/env.d/02locale">
250<comment>(As always, change "en_GB.UTF-8" to your locale)</comment>
251LC_ALL="en_GB.UTF-8"
252LANG="en_GB.UTF-8"
253</pre>
254
255<p>
256Next, the environment must be updated with the change.
257</p>
258
259<pre caption="Updating the environment">
260# <i>env-update</i>
261>>> Regenerating /etc/ld.so.cache...
262 * Caching service dependencies ...
263 # <i>source /etc/profile</i>
264</pre>
265
266<p>
267Now, run <c>locale</c> with no arguments to see if we have the correct
268variables in our environment:
269</p>
270
271<pre caption="Checking if our new locale is in the environment">
272# <i>locale</i>
273LANG=en_GB.UTF-8
274LC_CTYPE="en_GB.UTF-8"
275LC_NUMERIC="en_GB.UTF-8"
276LC_TIME="en_GB.UTF-8"
277LC_COLLATE="en_GB.UTF-8"
278LC_MONETARY="en_GB.UTF-8"
279LC_MESSAGES="en_GB.UTF-8"
280LC_PAPER="en_GB.UTF-8"
281LC_NAME="en_GB.UTF-8"
282LC_ADDRESS="en_GB.UTF-8"
283LC_TELEPHONE="en_GB.UTF-8"
284LC_MEASUREMENT="en_GB.UTF-8"
285LC_IDENTIFICATION="en_GB.UTF-8"
286LC_ALL=en_GB.UTF-8
287</pre>
288
289<p>
290That's everything. You are now using UTF-8 locales, and the next hurdle is the
291configuration of the applications you use from day to day.
257</p> 292</p>
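Rather than reading the `locale` output by eye, the check can be scripted. The sketch below uses a hypothetical helper, `non_utf8_categories`, that prints every line of `locale` output whose value does not end in a UTF-8 suffix. It assumes the output format shown in the listing above, with values optionally wrapped in double quotes.

```shell
#!/bin/sh
# non_utf8_categories (hypothetical helper, not from the guide):
# reads `locale` output on stdin and prints each line whose value does
# not end in ".utf8" or ".UTF-8" (optionally followed by a closing quote).
# Empty output means every listed variable uses a UTF-8 locale.
non_utf8_categories() {
    # grep -v exits non-zero when nothing is printed; that is the
    # "all good" case here, so the exit status is deliberately ignored.
    grep -Eiv '\.utf-?8"?$' || true
}

# Possible usage:
# locale | non_utf8_categories
```

Note that a variable left unset (for example a bare `LC_ALL=` line) is also reported, which is useful when only `LANG` was meant to be overridden and `LC_ALL` was forgotten.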
258 293
259</body> 294</body>
260</section> 295</section>
261</chapter> 296</chapter>
262 297
263<chapter> 298<chapter>
264<title>Application Support</title> 299<title>Application Support</title>
265<section> 300<section>
266<body> 301<body>
267 302
268<p> 303<p>
269When Unicode first started gaining momentum in the software world, multibyte 304When Unicode first started gaining momentum in the software world, multibyte
270character sets were not well suited to languages like C, in which many of the 305character sets were not well suited to languages like C, in which many of the
271day-to-day programs people use are written. Even today, some programs are not 306day-to-day programs people use are written. Even today, some programs are not
659When I press AltGr and ; at once, 'á' is produced, and when I press AltGr and ; 694When I press AltGr and ; at once, 'á' is produced, and when I press AltGr and ;
660at once, release them, and then press e, 'é' is produced. 695at once, release them, and then press e, 'é' is produced.
661</p> 696</p>
662 697
663<p> 698<p>
664By pressing AltGr, Shift and [ at once, releasing them, and then pressing a, a 699By pressing AltGr, Shift and [ at once, releasing them, and then pressing a, a
665Scandinavian 'å' is produced. Similarly, when I press AltGr, Shift and [ at 700Scandinavian 'å' is produced. Similarly, when I press AltGr, Shift and [ at
666once, release <e>only</e> the [, and then press it again, '˚' is produced. 701once, release <e>only</e> the [, and then press it again, '˚' is produced.
667Although it looks like one, this (U+02DA) is not the same as a degree symbol 702Although it looks like one, this (U+02DA) is not the same as a degree symbol
668(U+00B0). This works for other accents produced by dead keys — AltGr and [, 703(U+00B0). This works for other accents produced by dead keys — AltGr and [,
669releasing only the [, then pressing it again makes '¨'. 704releasing only the [, then pressing it again makes '¨'.
670</p> 705</p>
671 706
672<p> 707<p>
673AltGr can be used with alphabetical keys alone. For example, AltGr and m, a 708AltGr can be used with alphabetical keys alone. For example, AltGr and m, a
674Greek lower-case letter mu is produced: 'µ'. 709Greek lower-case letter mu is produced: 'µ'. AltGr and s produce a
710scharfes s or Eszett: 'ß'. As many European users would expect (because
711it is marked on their keyboard), AltGr and 4 produces a Euro sign, '€'.
675</p> 712</p>
676 713
677</body> 714</body>
678</section> 715</section>
679<section> 716<section>
680<title>Resources</title> 717<title>Resources</title>
681<body> 718<body>
682 719
683<ul> 720<ul>
684 <li> 721 <li>
685 <uri link="http://www.wikipedia.com/wiki/Unicode">The Wikipedia entry for 722 <uri link="http://www.wikipedia.com/wiki/Unicode">The Wikipedia entry for
686 Unicode</uri> 723 Unicode</uri>
687 </li> 724 </li>
688 <li> 725 <li>
689 <uri link="http://www.wikipedia.com/wiki/UTF-8">The Wikipedia entry for 726 <uri link="http://www.wikipedia.com/wiki/UTF-8">The Wikipedia entry for
