Diff of /xml/htdocs/doc/en/utf-8.xml

Revision 1.4 Revision 1.5
1<?xml version='1.0' encoding="UTF-8"?> 1<?xml version='1.0' encoding="UTF-8"?>
2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.4 2005/02/23 20:20:24 so Exp $ --> 2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.5 2005/02/24 14:57:18 cam Exp $ -->
3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> 3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
4 4
5<guide link="/doc/en/utf-8.xml"> 5<guide link="/doc/en/utf-8.xml">
6<title>Using UTF-8 with Gentoo</title> 6<title>Using UTF-8 with Gentoo</title>
7 7
8<author title="Author"> 8<author title="Author">
9 <mail link="slarti@gentoo.org">Thomas Martin</mail> 9 <mail link="slarti@gentoo.org">Thomas Martin</mail>
10</author> 10</author>
11<author title="Contributor"> 11<author title="Contributor">
12 <mail link="devil@gentoo.org.ua">Alexander Simonov</mail> 12 <mail link="devil@gentoo.org.ua">Alexander Simonov</mail>
13</author> 13</author>
14 14
15<abstract> 15<abstract>
16This guide shows you how to set up and use the UTF-8 Unicode character set with 16This guide shows you how to set up and use the UTF-8 Unicode character set with
17your Gentoo Linux system, after explaining the benefits of Unicode and more 17your Gentoo Linux system, after explaining the benefits of Unicode and more
18specifically UTF-8. 18specifically UTF-8.
19</abstract> 19</abstract>
20 20
21<license /> 21<license />
22 22
23<version>1.3</version> 23<version>1.4</version>
24<date>2005-02-23</date> 24<date>2005-02-23</date>
25 25
26<chapter> 26<chapter>
27<title>Character Encodings</title> 27<title>Character Encodings</title>
28<section> 28<section>
29<title>What is a Character Encoding?</title> 29<title>What is a Character Encoding?</title>
30<body> 30<body>
31 31
32<p> 32<p>
33Computers do not understand text themselves. Instead, every character is 33Computers do not understand text themselves. Instead, every character is
34represented by a number. Traditionally, each set of numbers used to represent 34represented by a number. Traditionally, each set of numbers used to represent
35alphabets and characters (known as a coding system, encoding or character set) 35alphabets and characters (known as a coding system, encoding or character set)
36was limited in size due to limitations in computer hardware. 36was limited in size due to limitations in computer hardware.
37</p> 37</p>
38 38
449<section> 449<section>
450<title>Window Managers and Terminal Emulators</title> 450<title>Window Managers and Terminal Emulators</title>
451<body> 451<body>
452 452
453<p> 453<p>
454Window managers not built on GTK or Qt generally have very good Unicode 454Window managers not built on GTK or Qt generally have very good Unicode
455support, as they often use the Xft library for handling fonts. If your window 455support, as they often use the Xft library for handling fonts. If your window
456manager does not use Xft for fonts, you can still use the FontSpec mentioned in 456manager does not use Xft for fonts, you can still use the FontSpec mentioned in
457the previous section as a Unicode font. 457the previous section as a Unicode font.
458</p> 458</p>
459 459
460<p> 460<p>
461Terminal emulators that use Xft and support Unicode are harder to come by. 461Terminal emulators that use Xft and support Unicode are harder to come by.
462Aside from Konsole and gnome-terminal, the best options in Portage are 462Aside from Konsole and gnome-terminal, the best options in Portage are
463<c>x11-terms/rxvt-unicode</c>, <c>xfce-extra/terminal</c>, 463<c>x11-terms/rxvt-unicode</c>, <c>xfce-extra/terminal</c>,
464<c>app-gnustep/terminal</c>, <c>x11-terms/mlterm</c>, <c>x11-terms/mrxvt</c> or 464<c>gnustep-apps/terminal</c>, <c>x11-terms/mlterm</c>, <c>x11-terms/mrxvt</c> or
465plain <c>x11-terms/xterm</c> when built with the <c>unicode</c> USE flag and 465plain <c>x11-terms/xterm</c> when built with the <c>unicode</c> USE flag and
466invoked as <c>uxterm</c>. <c>app-misc/screen</c> supports UTF-8 too, when 466invoked as <c>uxterm</c>. <c>app-misc/screen</c> supports UTF-8 too, when
467invoked as <c>screen -u</c> or the following is put into the 467invoked as <c>screen -u</c> or the following is put into the
468<path>~/.screenrc</path>: 468<path>~/.screenrc</path>:
469</p> 469</p>
470 470
471<pre caption="~/.screenrc for UTF-8"> 471<pre caption="~/.screenrc for UTF-8">
472defutf8 on 472defutf8 on
473</pre> 473</pre>
474 474
475</body> 475</body>
476</section> 476</section>
477<section> 477<section>
478<title>Vim, Emacs, Xemacs and Nano</title> 478<title>Vim, Emacs, Xemacs and Nano</title>
479<body> 479<body>
502shell have Unicode support, although there is a concerted effort to add 502shell have Unicode support, although there is a concerted effort to add
503multibyte character set support underway at the moment. 503multibyte character set support underway at the moment.
504</p> 504</p>
505 505
506<p> 506<p>
507The C shell, <c>tcsh</c> and <c>ksh</c> do not provide UTF-8 support at all. 507The C shell, <c>tcsh</c> and <c>ksh</c> do not provide UTF-8 support at all.
508</p> 508</p>
509 509
510</body> 510</body>
511</section> 511</section>
512<section> 512<section>
513<title>Irssi</title> 513<title>Irssi</title>
514<body> 514<body>
515 515
516<p> 516<p>
517Irssi has complete UTF-8 support, although it does require a user to set an 517Since 0.8.10, Irssi has complete UTF-8 support, although it does require a user
518option. 518to set an option.
519</p> 519</p>
520 520
521<pre caption="Enabling UTF-8 in Irssi"> 521<pre caption="Enabling UTF-8 in Irssi">
522/set term_charset UTF-8 522/set term_charset UTF-8
523</pre> 523</pre>
524
525<impo>
526The <c>/recode</c> command is only available on irssi-0.8.10 and above.
527</impo>
528 524
529<p> 525<p>
530For channels where non-ASCII characters are often exchanged in non-UTF-8 526For channels where non-ASCII characters are often exchanged in non-UTF-8
531charsets, the <c>/recode</c> command may be used to convert the characters. 527charsets, the <c>/recode</c> command may be used to convert the characters.
532Type <c>/help recode</c> for more information. 528Type <c>/help recode</c> for more information.
533</p> 529</p>
534 530
535</body> 531</body>
536</section> 532</section>
537<section> 533<section>
538<title>Mutt</title> 534<title>Mutt</title>
539<body> 535<body>
540 536
541<p> 537<p>
542The Mutt mail user agent has very good Unicode support. To use UTF-8 with Mutt, 538The Mutt mail user agent has very good Unicode support. To use UTF-8 with Mutt,
