Diff of /xml/htdocs/doc/en/utf-8.xml

Revision 1.11 Revision 1.27
1<?xml version='1.0' encoding="UTF-8"?> 1<?xml version='1.0' encoding="UTF-8"?>
2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.11 2005/04/24 12:18:59 bennyc Exp $ --> 2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.27 2005/07/02 11:55:16 swift Exp $ -->
3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> 3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
4 4
5<guide link="/doc/en/utf-8.xml"> 5<guide link="/doc/en/utf-8.xml">
6<title>Using UTF-8 with Gentoo</title> 6<title>Using UTF-8 with Gentoo</title>
7 7
9 <mail link="slarti@gentoo.org">Thomas Martin</mail> 9 <mail link="slarti@gentoo.org">Thomas Martin</mail>
10</author> 10</author>
11<author title="Contributor"> 11<author title="Contributor">
12 <mail link="devil@gentoo.org.ua">Alexander Simonov</mail> 12 <mail link="devil@gentoo.org.ua">Alexander Simonov</mail>
13</author> 13</author>
14<author title="Editor">
15 <mail link="fox2mike@gentoo.org">Shyam Mani</mail>
16</author>
14 17
15<abstract> 18<abstract>
16This guide shows you how to set up and use the UTF-8 Unicode character set with 19This guide shows you how to set up and use the UTF-8 Unicode character set with
17your Gentoo Linux system, after explaining the benefits of Unicode and more 20your Gentoo Linux system, after explaining the benefits of Unicode and more
18specifically UTF-8. 21specifically UTF-8.
19</abstract> 22</abstract>
20 23
24<!-- The content of this document is licensed under the CC-BY-SA license -->
25<!-- See http://creativecommons.org/licenses/by-sa/2.5 -->
21<license /> 26<license />
22 27
23<version>1.8</version> 28<version>2.7</version>
24<date>2005-04-05</date> 29<date>2005-07-02</date>
25 30
26<chapter> 31<chapter>
27<title>Character Encodings</title> 32<title>Character Encodings</title>
28<section> 33<section>
29<title>What is a Character Encoding?</title> 34<title>What is a Character Encoding?</title>
190 195
191<pre caption="Checking for an existing UTF-8 locale"> 196<pre caption="Checking for an existing UTF-8 locale">
192<comment>(Replace "en_GB" with your desired locale setting)</comment> 197<comment>(Replace "en_GB" with your desired locale setting)</comment>
193# <i>locale -a | grep 'en_GB'</i> 198# <i>locale -a | grep 'en_GB'</i>
194en_GB 199en_GB
195en_GB.utf8 200en_GB.UTF-8
196</pre> 201</pre>
197 202
198<p> 203<p>
199From the output of this command line, we need to take the result with a suffix 204From the output of this command line, we need to take the result with a suffix
200similar to <c>.utf8</c>. If there is no result with a suffix similar to 205similar to <c>.UTF-8</c>. If there is no result with a suffix similar to
201<c>.utf8</c>, we need to create a UTF-8 compatible locale. 206<c>.UTF-8</c>, we need to create a UTF-8 compatible locale.
202</p> 207</p>
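The check described above can also be scripted. The following is a minimal sketch, not part of the original guide: the helper names `has_utf8_codeset` and `utf8_locales` are hypothetical, and the sample text stands in for real `locale -a` output. It treats `.utf8` and `.UTF-8` as the same codeset, since the two spellings differ only in case and hyphenation.

```python
def has_utf8_codeset(locale_name):
    """Return True if the locale name carries a UTF-8 codeset suffix."""
    # Split "en_GB.utf8@euro" into the part before and after the first dot.
    _, dot, rest = locale_name.partition(".")
    if not dot:
        return False  # no codeset suffix at all, e.g. "en_GB"
    # Drop an optional modifier such as "@euro".
    codeset = rest.split("@", 1)[0]
    # Treat "utf8", "UTF-8" and "utf-8" as the same codeset.
    return codeset.replace("-", "").lower() == "utf8"


def utf8_locales(locale_a_output, language):
    """Filter `locale -a` style output for UTF-8 variants of one language."""
    return [
        name
        for name in locale_a_output.split()
        if name.startswith(language) and has_utf8_codeset(name)
    ]


# Hypothetical sample standing in for the output of `locale -a`.
sample = "C\nPOSIX\nen_GB\nen_GB.utf8\nen_US.iso88591\n"
print(utf8_locales(sample, "en_GB"))
```

An empty list for your language would mean that a UTF-8 locale still has to be created.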
203 208
204<note> 209<note>
205Only execute the following code listing if you do not have a UTF-8 locale 210Only execute the following code listing if you do not have a UTF-8 locale
206available for your language. 211available for your language.
207</note> 212</note>
208 213
209<pre caption="Creating a UTF-8 locale"> 214<pre caption="Creating a UTF-8 locale">
210<comment>(Replace "en_GB" with your desired locale setting)</comment> 215<comment>(Replace "en_GB" with your desired locale setting)</comment>
211# <i>localedef -i en_GB -f UTF-8 en_GB.utf8</i> 216# <i>localedef -i en_GB -f UTF-8 en_GB.UTF-8</i>
212</pre> 217</pre>
213 218
214<p> 219<p>
215Another way to include a UTF-8 locale is to add it to the 220Another way to include a UTF-8 locale is to add it to the
216<path>/etc/locales.build</path> file and rebuild <c>glibc</c> with the 221<path>/etc/locales.build</path> file and rebuild <c>glibc</c> with the
226<section> 231<section>
227<title>Setting the Locale</title> 232<title>Setting the Locale</title>
228<body> 233<body>
229 234
230<p> 235<p>
231Although by now you might be determined to use UTF-8 system wide, the author 236There is one environment variable that needs to be set in order to use
232does not recommend setting UTF-8 for the root user. Instead, it is best to set 237our new UTF-8 locales: <c>LC_ALL</c> (this variable overrides the <c>LANG</c>
233the locale in your user's <path>~/.profile</path> (or, if you are using a C 238setting as well). There are also many different ways to set it; some people
234shell, <path>~/.login</path>). 239prefer to only have a UTF-8 environment for a specific user, in which case
235</p> 240they set them in their <path>~/.profile</path> (if you use <c>/bin/sh</c>),
236 241<path>~/.bash_profile</path> or <path>~/.bashrc</path> (if you use
237<note> 242<c>/bin/bash</c>).
238If you are not sure which file to use, use <path>~/.profile</path>. Also, if
239you are unsure which code listing to use, use the Bourne version.
240</note>
241
242<pre caption="Setting the locale with environment variables (Bourne version)">
243export LANG="en_GB.utf8"
244export LC_ALL="en_GB.utf8"
245</pre>
246
247<pre caption="Setting the locale with environment variables (C shell version)">
248setenv LANG "en_GB.utf8"
249setenv LC_ALL "en_GB.utf8"
250</pre>
251
252<p> 243</p>
253Now, logout and back in to apply the change. We want these environment 244
254variables in our entire environment, so it is best to logout and back in, or at 245<p>
255the very least to source <path>~/.profile</path> or <path>~/.login</path> in 246Others prefer to set the locale globally. One specific circumstance where
256the console from which you have started other processes. 247the author particularly recommends doing this is when
248<path>/etc/init.d/xdm</path> is in use, because
249this init script starts the display manager and desktop before any of the
250aforementioned shell startup files are sourced, and so before any of the
251variables are in the environment.
252</p>
253
254<p>
255Setting the locale globally should be done using
256<path>/etc/env.d/02locale</path>. The file should look something like the
257following:
258</p>
259
260<pre caption="Demonstration /etc/env.d/02locale">
261<comment>(As always, change "en_GB.UTF-8" to your locale)</comment>
262LC_ALL="en_GB.UTF-8"
263</pre>
264
265<p>
266Next, the environment must be updated with the change.
267</p>
268
269<pre caption="Updating the environment">
270# <i>env-update</i>
271>>> Regenerating /etc/ld.so.cache...
272 * Caching service dependencies ...
273# <i>source /etc/profile</i>
274</pre>
275
276<p>
277Now, run <c>locale</c> with no arguments to see if we have the correct
278variables in our environment:
279</p>
280
281<pre caption="Checking if our new locale is in the environment">
282# <i>locale</i>
283LANG=
284LC_CTYPE="en_GB.UTF-8"
285LC_NUMERIC="en_GB.UTF-8"
286LC_TIME="en_GB.UTF-8"
287LC_COLLATE="en_GB.UTF-8"
288LC_MONETARY="en_GB.UTF-8"
289LC_MESSAGES="en_GB.UTF-8"
290LC_PAPER="en_GB.UTF-8"
291LC_NAME="en_GB.UTF-8"
292LC_ADDRESS="en_GB.UTF-8"
293LC_TELEPHONE="en_GB.UTF-8"
294LC_MEASUREMENT="en_GB.UTF-8"
295LC_IDENTIFICATION="en_GB.UTF-8"
296LC_ALL=en_GB.UTF-8
297</pre>
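To check output like the listing above without reading it by eye, a small script can parse it. This is only a sketch under stated assumptions: the function names `parse_locale_output` and `categories_not_matching` are hypothetical, and the sample string is an abbreviated copy of the listing rather than live output of `locale`.

```python
def parse_locale_output(text):
    """Parse `locale` style KEY=value lines into a dict, dropping quotes."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that are not KEY=value pairs
        settings[key.strip()] = value.strip().strip('"')
    return settings


def categories_not_matching(settings, expected):
    """Return the LC_* names whose value differs from the expected locale."""
    return sorted(
        name
        for name, value in settings.items()
        if name.startswith("LC_") and value != expected
    )


# Abbreviated sample in the same format as the listing above.
sample = (
    "LANG=\n"
    'LC_CTYPE="en_GB.UTF-8"\n'
    'LC_MESSAGES="en_GB.UTF-8"\n'
    "LC_ALL=en_GB.UTF-8\n"
)
settings = parse_locale_output(sample)
print(categories_not_matching(settings, "en_GB.UTF-8"))
```

An empty result means every `LC_*` category reports the expected locale. `LANG` is deliberately ignored here because, as the listing shows, it can stay empty when `LC_ALL` is set.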
298
299<p>
300That's everything. You are now using UTF-8 locales, and the next hurdle is the
301configuration of the applications you use from day to day.
257</p> 302</p>
258 303
259</body> 304</body>
260</section> 305</section>
261</chapter> 306</chapter>
342making the most of Unicode. 387making the most of Unicode.
343</p> 388</p>
344 389
345<p> 390<p>
346The <c>KEYMAP</c> variable, set in <path>/etc/conf.d/keymaps</path>, should 391The <c>KEYMAP</c> variable, set in <path>/etc/conf.d/keymaps</path>, should
347have a Unicode keymap specified. To do this, simply prepend the keymap already 392have a Unicode keymap specified.
348specified there with -u.
349</p> 393</p>
350 394
351<pre caption="Example /etc/conf.d/keymaps snippet"> 395<pre caption="Example /etc/conf.d/keymaps snippet">
352<comment>(Change "uk" to your local layout)</comment> 396<comment>(Change "uk" to your local layout)</comment>
353KEYMAP="-u uk" 397KEYMAP="uk"
354</pre> 398</pre>
355 399
356</body> 400</body>
357</section> 401</section>
358<section> 402<section>
365</note> 409</note>
366 410
367<p> 411<p>
368It is wise to add <c>unicode</c> to your global USE flags in 412It is wise to add <c>unicode</c> to your global USE flags in
369<path>/etc/make.conf</path>, and then to remerge <c>sys-libs/ncurses</c> and 413<path>/etc/make.conf</path>, and then to remerge <c>sys-libs/ncurses</c> and
370also <c>sys-libs/slang</c> if appropriate: 414<c>sys-libs/slang</c> if appropriate:
371</p> 415</p>
372 416
373<pre caption="Emerging ncurses and slang"> 417<pre caption="Emerging ncurses and slang">
374<comment>(We avoid putting these libraries in our world file with --oneshot)</comment> 418<comment>(We avoid putting these libraries in our world file with --oneshot)</comment>
375# <i>emerge --oneshot --verbose --ask sys-libs/ncurses sys-libs/slang</i> 419# <i>emerge --oneshot sys-libs/ncurses sys-libs/slang</i>
376</pre> 420</pre>
377 421
378<p> 422<p>
379We also need to rebuild packages that link to these, now the USE changes have 423We also need to rebuild packages that link to these, now the USE changes have
380been applied. The tool we use (<c>revdep-rebuild</c>) is part of the 424been applied. The tool we use (<c>revdep-rebuild</c>) is part of the
487<section> 531<section>
488<title>Vim, Emacs, Xemacs and Nano</title> 532<title>Vim, Emacs, Xemacs and Nano</title>
489<body> 533<body>
490 534
491<p> 535<p>
492Vim, Emacs and Xemacs provide full UTF-8 support, and also have builtin 536Vim provides full UTF-8 support, and also has builtin detection of UTF-8 files.
493detection of UTF-8 files. For further information in Vim, use <c>:help 537For further information in Vim, use <c>:help mbyte.txt</c>.
494mbyte.txt</c>. 538</p>
539
495</p> 540<p>
541Emacs 22.x and higher has full UTF-8 support as well. Xemacs 22.x does not
542support combining characters yet.
543</p>
544
545<p>
546Lower versions of Emacs and/or Xemacs might require you to install
547<c>app-emacs/mule-ucs</c> and/or <c>app-xemacs/mule-ucs</c>
548and add the following code to your <path>~/.emacs</path> to have support for CJK
549languages in UTF-8:
550</p>
551
552<pre caption="Emacs CJK UTF-8 support">
553(require 'un-define)
554(require 'jisx0213)
555(set-language-environment "Japanese")
556(set-default-coding-systems 'utf-8)
557(set-terminal-coding-system 'utf-8)
558</pre>
496 559
497<p> 560<p>
498Nano currently does not provide support for UTF-8, although it has been planned 561Nano currently does not provide support for UTF-8, although it has been planned
499for a long time. With luck, this will change in future. At the time of writing, 562for a long time. With luck, this will change in future. At the time of writing,
500UTF-8 support is in Nano's CVS, and should be included in the next release. 563UTF-8 support is in Nano's CVS, and should be included in the next release.
560about this than to ask them to configure their client correctly. 623about this than to ask them to configure their client correctly.
561</note> 624</note>
562 625
563<p> 626<p>
564Further information is available from the <uri 627Further information is available from the <uri
565link="http://wiki.mutt.org/index.cgi?MuttFaq/Charset"> Mutt WikiWiki</uri>. 628link="http://wiki.mutt.org/index.cgi?MuttFaq/Charset">Mutt Wiki</uri>.
629</p>
630
631</body>
632</section>
633<section>
634<title>Less</title>
635<body>
636
566</p> 637<p>
638We often pipe command output through <c>more</c> or <c>less</c> to page
639through it, for example
640<c>dmesg | less</c>. While <c>more</c> only needs the shell to be UTF-8 aware,
641<c>less</c> needs the <c>LESSCHARSET</c> environment variable set to ensure
642that Unicode characters are rendered correctly. This can be set in
643<path>/etc/profile</path> or <path>~/.bash_profile</path>. Open the editor
644of your choice and add the following line to one of the files mentioned
645above.
646</p>
647
648<pre caption="Setting up the Environment variable for less">
649LESSCHARSET=utf-8
650</pre>
651
652</body>
653</section>
654<section>
655<title>Man</title>
656<body>
657
658<p>
659Man pages are an integral part of any Linux machine. To ensure that any
660Unicode in your man pages renders correctly, edit <path>/etc/man.conf</path>
661and replace a line as shown below.
662</p>
663
664<pre caption="man.conf changes for Unicode support">
665<comment>(This is the old line)</comment>
666NROFF /usr/bin/nroff -Tascii -c -mandoc
667<comment>(Replace the one above with this)</comment>
668NROFF /usr/bin/nroff -mandoc -c
669</pre>
670
671</body>
672</section>
673<section>
674<title>elinks and links</title>
675<body>
676
677<p>
678These are commonly used text-based browsers, and both can be set up for
679UTF-8. For <c>elinks</c> and <c>links</c>, there are two ways to
680do this: using the Setup menu from within the browser, or editing the
681config file. To set the option through the browser, open a site with
682<c>elinks</c> or <c>links</c> and press <c>Alt+S</c> to enter the Setup Menu, then
683select Terminal options, or press <c>T</c>. Scroll down and select the last
684option, <c>UTF-8 I/O</c>, by pressing Enter. Then save and exit the menu. In
685<c>links</c> you may have to press <c>Alt+S</c> again and then <c>S</c> to
686save. The config file option is shown below.
687</p>
688
689<pre caption="Enabling UTF-8 for elinks/links">
690<comment>(For elinks, edit /etc/elinks/elinks.conf or ~/.elinks/elinks.conf and
691add the following line)</comment>
692set terminal.linux.utf_8_io = 1
693
694<comment>(For links, edit ~/.links/links.cfg and add the following
695line)</comment>
696terminal "xterm" 0 1 0 us-ascii utf-8
697</pre>
567 698
568</body> 699</body>
569</section> 700</section>
570<section> 701<section>
571<title>Testing it all out</title> 702<title>Testing it all out</title>
669releasing only the [, then pressing it again makes '¨'. 800releasing only the [, then pressing it again makes '¨'.
670</p> 801</p>
671 802
672<p> 803<p>
673AltGr can be used with alphabetical keys alone. For example, AltGr and m, a 804AltGr can be used with alphabetical keys alone. For example, AltGr and m, a
674Greek lower-case letter mu is produced: 'µ'. 805Greek lower-case letter mu is produced: 'µ'. AltGr and s produce a
806scharfes S or Eszett: 'ß'. As many European users would expect (because
807it is marked on their keyboard), AltGr and 4 produce a Euro sign, '€'.
675</p> 808</p>
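To see what these characters look like once encoded, the sketch below prints their UTF-8 byte sequences. It is an illustration only and makes one assumption: that the 'µ' shown is U+00B5 (MICRO SIGN), rather than the Greek letter U+03BC. The helper name `utf8_hex` is hypothetical.

```python
import unicodedata

# Assumption: these code points match the characters shown in the text.
# U+00B5 is the micro sign; the Greek letter mu proper would be U+03BC.
CHARACTERS = ["\u00b5", "\u00df", "\u20ac"]


def utf8_hex(char):
    """Return the UTF-8 encoding of one character as spaced hex bytes."""
    return " ".join(f"{byte:02x}" for byte in char.encode("utf-8"))


for char in CHARACTERS:
    print(f"U+{ord(char):04X} {unicodedata.name(char)}: {utf8_hex(char)}")
```

The first two characters take two bytes each and the Euro sign takes three, which is why a terminal or application that is not UTF-8 aware displays them as several unrelated characters.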
676 809
677</body> 810</body>
678</section> 811</section>
679<section> 812<section>
