Diff of /xml/htdocs/doc/en/utf-8.xml

Revision 1.10 (left, removed text) | Revision 1.11 (right, added text); in each row, each side begins with its own line number
1<?xml version='1.0' encoding="UTF-8"?> 1<?xml version='1.0' encoding="UTF-8"?>
2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.10 2005/04/24 03:25:46 bennyc Exp $ --> 2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.11 2005/04/24 12:18:59 bennyc Exp $ -->
3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> 3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
4 4
5<guide link="/doc/en/utf-8.xml"> 5<guide link="/doc/en/utf-8.xml">
6<title>Using UTF-8 with Gentoo</title> 6<title>Using UTF-8 with Gentoo</title>
7 7
8<author title="Author"> 8<author title="Author">
9 <mail link="slarti@gentoo.org">Thomas Martin</mail> 9 <mail link="slarti@gentoo.org">Thomas Martin</mail>
10</author> 10</author>
11<author title="Contributor"> 11<author title="Contributor">
12 <mail link="devil@gentoo.org.ua">Alexander Simonov</mail> 12 <mail link="devil@gentoo.org.ua">Alexander Simonov</mail>
13</author> 13</author>
14 14
15<abstract> 15<abstract>
16This guide shows you how to set up and use the UTF-8 Unicode character set with 16This guide shows you how to set up and use the UTF-8 Unicode character set with
17your Gentoo Linux system, after explaining the benefits of Unicode and more 17your Gentoo Linux system, after explaining the benefits of Unicode and more
18specifically UTF-8. 18specifically UTF-8.
19</abstract> 19</abstract>
20 20
21<license /> 21<license />
22 22
23<version>1.5</version> 23<version>1.8</version>
24<date>2005-04-23</date> 24<date>2005-04-05</date>
25 25
26<chapter> 26<chapter>
27<title>Character Encodings</title> 27<title>Character Encodings</title>
28<section> 28<section>
29<title>What is a Character Encoding?</title> 29<title>What is a Character Encoding?</title>
30<body> 30<body>
31 31
32<p> 32<p>
33Computers do not understand text themselves. Instead, every character is 33Computers do not understand text themselves. Instead, every character is
34represented by a number. Traditionally, each set of numbers used to represent 34represented by a number. Traditionally, each set of numbers used to represent
35alphabets and characters (known as a coding system, encoding or character set) 35alphabets and characters (known as a coding system, encoding or character set)
36was limited in size due to limitations in computer hardware. 36was limited in size due to limitations in computer hardware.
37</p> 37</p>
38 38
39</body> 39</body>
96</p> 96</p>
97 97
98<p> 98<p>
99This has led to confusion, and also to an almost total inability for 99This has led to confusion, and also to an almost total inability for
100multilingual communication, especially across different alphabets. Enter 100multilingual communication, especially across different alphabets. Enter
101Unicode. 101Unicode.
102</p> 102</p>
103 103
104</body> 104</body>
105</section> 105</section>
106<section> 106<section>
107<title>What is Unicode?</title> 107<title>What is Unicode?</title>
108<body> 108<body>
109 109
110<p> 110<p>
111Unicode throws away the traditional single-byte limit of character sets, and 111Unicode throws away the traditional single-byte limit of character sets. It
112even with two bytes per-character this allows a maximum 65,536 characters. 112uses 17 "planes" of 65,536 code points to describe a maximum of 1,114,112
113Although this number is extremely high when compared to seven-bit and eight-bit 113characters. As the first plane, aka. "Basic Multilingual Plane" or BMP,
114encodings, it is still not enough for a character set designed to be used for 114contains almost everything you will ever use, many have made the wrong
115symbols and scripts used only by scholars, and symbols that are only used in 115assumption that Unicode was a 16-bit character set.
116mathematics and other specialised fields.
117</p> 116</p>
118 117
119<p> 118<p>
120Unicode has been mapped in many different ways, but the two most common are 119Unicode has been mapped in many different ways, but the two most common are
121<b>UTF</b> (Unicode Transformation Format) and <b>UCS</b> (Universal Character 120<b>UTF</b> (Unicode Transformation Format) and <b>UCS</b> (Universal Character
122Set). A number after UTF indicates the number of bits in one unit, while the 121Set). A number after UTF indicates the number of bits in one unit, while the
123number after UCS indicates the number of bytes. UTF-8 has become the most 122number after UCS indicates the number of bytes. UTF-8 has become the most
124widespread means for the interchange of Unicode text as a result of its 123widespread means for the interchange of Unicode text as a result of its
125eight-bit clean nature, and it is the subject of this document. 124eight-bit clean nature, and it is the subject of this document.
126</p> 125</p>
127 126
128</body> 127</body>
129</section> 128</section>
130<section> 129<section>
131<title>UTF-8</title> 130<title>UTF-8</title>
138ASCII. UTF-8 means that ASCII and Latin characters are interchangeable with 137ASCII. UTF-8 means that ASCII and Latin characters are interchangeable with
139little increase in the size of the data, because only the first bit is used. 138little increase in the size of the data, because only the first bit is used.
140Users of Eastern alphabets such as Japanese, who have been assigned a higher 139Users of Eastern alphabets such as Japanese, who have been assigned a higher
141byte range are unhappy, as this results in as much as a 50% redundancy in their 140byte range are unhappy, as this results in as much as a 50% redundancy in their
142data. 141data.
143</p> 142</p>
144 143
145</body> 144</body>
146</section> 145</section>
147<section> 146<section>
148<title>What UTF-8 Can Do for You</title> 147<title>What UTF-8 Can Do for You</title>
149<body> 148<body>
150 149
151<p> 150<p>
152UTF-8 allows you to work in a standards-compliant and internationally accepted 151UTF-8 allows you to work in a standards-compliant and internationally accepted
153multilingual environment, with a comparitively low data redundancy. UTF-8 is 152multilingual environment, with a comparatively low data redundancy. UTF-8 is
154the preferred way for transmitting non-ASCII characters over the Internet, 153the preferred way for transmitting non-ASCII characters over the Internet,
155through Email, IRC or almost any other medium. Despite this, many people regard 154through Email, IRC or almost any other medium. Despite this, many people regard
156UTF-8 in online communication as abusive. It is always best to be aware of the 155UTF-8 in online communication as abusive. It is always best to be aware of the
157attitude towards UTF-8 in a specific channel, mailing list or Usenet group 156attitude towards UTF-8 in a specific channel, mailing list or Usenet group
158before using <e>non-ASCII</e> UTF-8. 157before using <e>non-ASCII</e> UTF-8.
159</p> 158</p>
160 159
161</body> 160</body>
162</section> 161</section>
163</chapter> 162</chapter>
164 163
165<chapter> 164<chapter>
166<title>Setting up UTF-8 with Gentoo Linux</title> 165<title>Setting up UTF-8 with Gentoo Linux</title>
167<section> 166<section>
168<title>Finding or Creating UTF-8 Locales</title> 167<title>Finding or Creating UTF-8 Locales</title>
200From the output of this command line, we need to take the result with a suffix 199From the output of this command line, we need to take the result with a suffix
201similar to <c>.utf8</c>. If there is no result with a suffix similar to 200similar to <c>.utf8</c>. If there is no result with a suffix similar to
202<c>.utf8</c>, we need to create a UTF-8 compatible locale. 201<c>.utf8</c>, we need to create a UTF-8 compatible locale.
203</p> 202</p>
204 203
205<note> 204<note>
206Only execute the following code listing if you do not have a UTF-8 locale 205Only execute the following code listing if you do not have a UTF-8 locale
207available for your language. 206available for your language.
208</note> 207</note>
209 208
210<pre caption="Creating a UTF-8 locale"> 209<pre caption="Creating a UTF-8 locale">
211<comment>(Replace "en_GB" with your desired locale setting)</comment> 210<comment>(Replace "en_GB" with your desired locale setting)</comment>
212# <i>localedef -i en_GB -f UTF-8 en_GB.utf8</i> 211# <i>localedef -i en_GB -f UTF-8 en_GB.utf8</i>
213</pre> 212</pre>
214 213
214<p>
215Another way to include a UTF-8 locale is to add it to the
216<path>/etc/locales.build</path> file and rebuild <c>glibc</c> with the
217<c>userlocales</c> USE flag set.
218</p>
219
220<pre caption="Line in /etc/locales.build">
221en_GB.UTF-8/UTF-8
222</pre>
223
215</body> 224</body>
216</section> 225</section>
217<section> 226<section>
218<title>Setting the Locale</title> 227<title>Setting the Locale</title>
219<body> 228<body>
220 229
221<p> 230<p>
222There are two environment variables that need to be set in order to use 231Although by now you might be determined to use UTF-8 system wide, the author
223our new UTF-8 locales: <c>LANG</c> and <c>LC_ALL</c>. There are also 232does not recommend setting UTF-8 for the root user. Instead, it is best to set
224many different ways to set them; some people prefer to only have a UTF-8 233the locale in your user's <path>~/.profile</path> (or, if you are using a C
225environment for a specific user, in which case they set them in their 234shell, <path>~/.login</path>).
226<path>~/.profile</path> or <path>~/.bashrc</path>. Others prefer to set the
227locale globally. One specific circumstance where the author particularly
228recommends doing this is when <path>/etc/init.d/xdm</path> is in use, because
229this init script starts the display manager and desktop before any of the
230aforementioned shell startup files are sourced, and so before any of the
231variables are in the environment.
232</p>
233
234<p> 235</p>
235Setting the locale globally should be done using 236
236<path>/etc/env.d/02local</path>. The file should look something like the 237<note>
237following: 238If you are not sure which file to use, use <path>~/.profile</path>. Also, if
239you are unsure which code listing to use, use the Bourne version.
240</note>
241
242<pre caption="Setting the locale with environment variables (Bourne version)">
243export LANG="en_GB.utf8"
244export LC_ALL="en_GB.utf8"
245</pre>
246
247<pre caption="Setting the locale with environment variables (C shell version)">
248setenv LANG "en_GB.utf8"
249setenv LC_ALL "en_GB.utf8"
250</pre>
251
238</p> 252<p>
239 253Now, logout and back in to apply the change. We want these environment
240<pre caption="Demonstration /etc/env.d/02locale"> 254variables in our entire environment, so it is best to logout and back in, or at
241<comment>(As always, change "en_GB.UTF-8" to your locale)</comment> 255the very least to source <path>~/.profile</path> or <path>~/.login</path> in
242LC_ALL="en_GB.UTF-8" 256the console from which you have started other processes.
243LOCALE="en_GB.UTF-8"
244</pre>
245
246<p>
247Next, the environment must be updated with the change.
248</p>
249
250<pre caption="Updating the environment">
251# <i>env-update</i>
252>>> Regenerating /etc/ld.so.cache...
253 * Caching service dependencies ...
254 # <i>source /etc/profile</i>
255</pre>
256
257<p>
258Now, run <c>locale</c> with no arguments to see if we have the correct
259variables in our environment:
260</p>
261
262<pre caption="Checking if our new locale is in the environment">
263# <i>locale</i>
264LANG=en_GB.UTF-8
265LC_CTYPE="en_GB.UTF-8"
266LC_NUMERIC="en_GB.UTF-8"
267LC_TIME="en_GB.UTF-8"
268LC_COLLATE="en_GB.UTF-8"
269LC_MONETARY="en_GB.UTF-8"
270LC_MESSAGES="en_GB.UTF-8"
271LC_PAPER="en_GB.UTF-8"
272LC_NAME="en_GB.UTF-8"
273LC_ADDRESS="en_GB.UTF-8"
274LC_TELEPHONE="en_GB.UTF-8"
275LC_MEASUREMENT="en_GB.UTF-8"
276LC_IDENTIFICATION="en_GB.UTF-8"
277LC_ALL=en_GB.UTF-8
278</pre>
279
280<p>
281That is all. You are now using UTF-8 locales, and the next hurdle is the
282configuration of the applications you use from day to day.
283</p> 257</p>
284 258
285</body> 259</body>
286</section> 260</section>
287</chapter> 261</chapter>
288 262
289<chapter> 263<chapter>
290<title>Application Support</title> 264<title>Application Support</title>
291<section> 265<section>
292<body> 266<body>
293 267
294<p> 268<p>
295When Unicode first started gaining momentum in the software world, multibyte 269When Unicode first started gaining momentum in the software world, multibyte
296character sets were not well suited to languages like C, in which many of the 270character sets were not well suited to languages like C, in which many of the
297day-to-day programs people use are written. Even today, some programs are not 271day-to-day programs people use are written. Even today, some programs are not
364<p> 338<p>
365To enable UTF-8 on the console, you should edit <path>/etc/rc.conf</path> and 339To enable UTF-8 on the console, you should edit <path>/etc/rc.conf</path> and
366set <c>UNICODE="yes"</c>, and also read the comments in that file -- it is 340set <c>UNICODE="yes"</c>, and also read the comments in that file -- it is
367important to have a font that has a good range of characters if you plan on 341important to have a font that has a good range of characters if you plan on
368making the most of Unicode. 342making the most of Unicode.
369</p> 343</p>
370 344
371<p> 345<p>
372The <c>KEYMAP</c> variable, set in <path>/etc/conf.d/keymaps</path>, should 346The <c>KEYMAP</c> variable, set in <path>/etc/conf.d/keymaps</path>, should
373have a Unicode keymap specified. To do this, simply prepend the keymap already 347have a Unicode keymap specified. To do this, simply prepend the keymap already
374specified there with -u. 348specified there with -u.
375</p> 349</p>
376 350
377<pre caption="Example /etc/conf.d/keymaps snippet"> 351<pre caption="Example /etc/conf.d/keymaps snippet">
378<comment>(Change "uk" to your local layout)</comment> 352<comment>(Change "uk" to your local layout)</comment>
379KEYMAP="uk" 353KEYMAP="-u uk"
380</pre> 354</pre>
381 355
382</body> 356</body>
383</section> 357</section>
384<section> 358<section>
385<title>Ncurses and Slang</title> 359<title>Ncurses and Slang</title>
386<body> 360<body>
387 361
388<note> 362<note>
389Ignore any mention of Slang in this section if you do not have it installed or 363Ignore any mention of Slang in this section if you do not have it installed or
390do not use it. 364do not use it.
391</note> 365</note>
392 366
393<p> 367<p>
394It is wise to add <c>unicode</c> to your global USE flags in 368It is wise to add <c>unicode</c> to your global USE flags in
395<path>/etc/make.conf</path>, and then to remerge <c>sys-libs/ncurses</c> and 369<path>/etc/make.conf</path>, and then to remerge <c>sys-libs/ncurses</c> and
396also <c>sys-libs/slang</c> if appropriate: 370also <c>sys-libs/slang</c> if appropriate:
397</p> 371</p>
398 372
399<pre caption="Emerging ncurses and slang"> 373<pre caption="Emerging ncurses and slang">
400<comment>(We avoid putting these libraries in our world file with --oneshot)</comment> 374<comment>(We avoid putting these libraries in our world file with --oneshot)</comment>
401# <i>emerge --oneshot --verbose --ask sys-libs/ncurses sys-libs/slang</i> 375# <i>emerge --oneshot --verbose --ask sys-libs/ncurses sys-libs/slang</i>
402</pre> 376</pre>
403 377
404<p> 378<p>
405We also need to rebuild packages that link to these, now the USE changes have 379We also need to rebuild packages that link to these, now the USE changes have
406been applied. 380been applied. The tool we use (<c>revdep-rebuild</c>) is part of the
381<c>gentoolkit</c> package.
407</p> 382</p>
408 383
409<pre caption="Rebuilding of programs that link to ncurses or slang"> 384<pre caption="Rebuilding of programs that link to ncurses or slang">
410# <i>revdep-rebuild --soname libncurses.so.5</i> 385# <i>revdep-rebuild --soname libncurses.so.5</i>
411# <i>revdep-rebuild --soname libslang.so.1</i> 386# <i>revdep-rebuild --soname libslang.so.1</i>
412</pre> 387</pre>
413 388
414</body> 389</body>
415</section> 390</section>
416<section> 391<section>
417<title>KDE, GNOME and Xfce</title> 392<title>KDE, GNOME and Xfce</title>
418<body> 393<body>
419 394
420<p> 395<p>
421All of the major desktop environments have full Unicode support, and will 396All of the major desktop environments have full Unicode support, and will
445} 420}
446widget_class "*" style "user-font" 421widget_class "*" style "user-font"
447</pre> 422</pre>
448 423
449<p> 424<p>
450If an application has support for both a Qt and GTK+2 GUI, the GTK+2 GUI will 425If an application has support for both a Qt and GTK+2 GUI, the GTK+2 GUI will
451generally give better results with Unicode. 426generally give better results with Unicode.
452</p> 427</p>
453 428
454</body> 429</body>
455</section> 430</section>
456<section> 431<section>
457<title>X11 and Fonts</title> 432<title>X11 and Fonts</title>
458<body> 433<body>
459 434
435<impo>
436<c>x11-base/xorg-x11</c> has far better support for Unicode than XFree86
437and is <e>highly</e> recommended.
438</impo>
439
460<p> 440<p>
461TrueType fonts have support for Unicode, and most of the fonts that ship with 441TrueType fonts have support for Unicode, and most of the fonts that ship with
462Xorg have impressive character support, although, obviously, not every single 442Xorg have impressive character support, although, obviously, not every single
463glyph available in Unicode has been created for that font. To build fonts 443glyph available in Unicode has been created for that font. To build fonts
464(including the Bitstream Vera set) with support for East Asian letters with X, 444(including the Bitstream Vera set) with support for East Asian letters with X,
465make sure you have the <c>cjk</c> USE flag set. Many other applications utilise 445make sure you have the <c>cjk</c> USE flag set. Many other applications utilise
466this flag, so it may be worthwhile to add it as a permanent USE flag. 446this flag, so it may be worthwhile to add it as a permanent USE flag.
467</p> 447</p>
468 448
469<p> 449<p>
470Also, several font packages in Portage are Unicode aware. 450Also, several font packages in Portage are Unicode aware.
471</p> 451</p>
472 452
473<pre caption="Optional: Install some more Unicode-aware fonts"> 453<pre caption="Optional: Install some more Unicode-aware fonts">
474# <i>emerge terminus-font intlfonts freefonts cronyx-fonts corefonts</i> 454# <i>emerge terminus-font intlfonts freefonts cronyx-fonts corefonts</i>
475</pre> 455</pre>
476 456
477</body> 457</body>
478</section> 458</section>
479<section> 459<section>
480<title>Window Managers and Terminal Emulators</title> 460<title>Window Managers and Terminal Emulators</title>
481<body> 461<body>
482 462
483<p> 463<p>
484Window managers, even those not built on GTK or Qt, generally have very 464Window managers not built on GTK or Qt generally have very good Unicode
485good Unicode support, as they often use the Xft library for handling 465support, as they often use the Xft library for handling fonts. If your window
486fonts. If your window manager does not use Xft for fonts, you can still 466manager does not use Xft for fonts, you can still use the FontSpec mentioned in
487use the FontSpec mentioned in the previous section as a Unicode font. 467the previous section as a Unicode font.
488</p> 468</p>
489 469
490<p> 470<p>
491Terminal emulators that use Xft and support Unicode are harder to come by. 471Terminal emulators that use Xft and support Unicode are harder to come by.
492Aside from Konsole and gnome-terminal, the best options in Portage are 472Aside from Konsole and gnome-terminal, the best options in Portage are
493<c>x11-terms/rxvt-unicode</c>, <c>xfce-extra/terminal</c>, 473<c>x11-terms/rxvt-unicode</c>, <c>xfce-extra/terminal</c>,
494<c>gnustep-apps/terminal</c>, <c>x11-terms/mlterm</c>, <c>x11-terms/mrxvt</c> or 474<c>gnustep-apps/terminal</c>, <c>x11-terms/mlterm</c>, <c>x11-terms/mrxvt</c> or
495plain <c>x11-terms/xterm</c> when built with the <c>unicode</c> USE flag and 475plain <c>x11-terms/xterm</c> when built with the <c>unicode</c> USE flag and
496invoked as <c>uxterm</c>. <c>app-misc/screen</c> supports UTF-8 too, when 476invoked as <c>uxterm</c>. <c>app-misc/screen</c> supports UTF-8 too, when
497invoked as <c>screen -U</c> or the following is put into the 477invoked as <c>screen -u</c> or the following is put into the
498<path>~/.screenrc</path>: 478<path>~/.screenrc</path>:
499</p> 479</p>
500 480
501<pre caption="~/.screenrc for UTF-8"> 481<pre caption="~/.screenrc for UTF-8">
502defutf8 on 482defutf8 on
503</pre> 483</pre>
504 484
505</body> 485</body>
506</section> 486</section>
507<section> 487<section>
508<title>Vim, Emacs, Xemacs and Nano</title> 488<title>Vim, Emacs, Xemacs and Nano</title>
509<body> 489<body>
510 490
511<p> 491<p>
512Vim, Emacs and Xemacs provide full UTF-8 support, and also have builtin 492Vim, Emacs and Xemacs provide full UTF-8 support, and also have builtin
525<section> 505<section>
526<title>Shells</title> 506<title>Shells</title>
527<body> 507<body>
528 508
529<p> 509<p>
530Currently, <c>bash</c> provides full Unicode support through the GNU readline 510Currently, <c>bash</c> provides full Unicode support through the GNU readline
531library. Z Shell users are in a somewhat worse position -- no parts of the 511library. Z Shell users are in a somewhat worse position -- no parts of the
532shell have Unicode support, although there is a concerted effort to add 512shell have Unicode support, although there is a concerted effort to add
533multibyte character set support underway at the moment. 513multibyte character set support underway at the moment.
534</p> 514</p>
535 515
536<p> 516<p>
537The C shell, <c>tcsh</c> and <c>ksh</c> do not provide UTF-8 support at all. 517The C shell, <c>tcsh</c> and <c>ksh</c> do not provide UTF-8 support at all.
538</p> 518</p>
539 519
540<note>
541Although not strictly related to shells, many of the GNU text-processing
542programs in your system (<c>tr</c>, <c>grep</c>, etc.) are much slower
543when processing Unicode. Nonetheless, the difference is not at all
544noticeable in nearly every case, but if you are ever hit by these bugs
545then at least you will know what is causing them. Perl also tends to be
546slower when operating on multibyte characters. The author knows of one
547other gotcha: <c>tr</c> will not convert three-byte UTF-8 characters to
548two-byte UTF-8 characters.
549</note>
550
551</body> 520</body>
552</section> 521</section>
553<section> 522<section>
554<title>Irssi</title> 523<title>Irssi</title>
555<body> 524<body>
556 525
557<p> 526<p>
558Since 0.8.10, Irssi has complete UTF-8 support, although it does require a user 527Since 0.8.10, Irssi has complete UTF-8 support, although it does require a user
559to set an option. 528to set an option.
560</p> 529</p>
561 530
562<pre caption="Enabling UTF-8 in Irssi"> 531<pre caption="Enabling UTF-8 in Irssi">
563/set term_charset UTF-8 532/set term_charset UTF-8
564</pre> 533</pre>
565 534
575<title>Mutt</title> 544<title>Mutt</title>
576<body> 545<body>
577 546
578<p> 547<p>
579The Mutt mail user agent has very good Unicode support. To use UTF-8 with Mutt, 548The Mutt mail user agent has very good Unicode support. To use UTF-8 with Mutt,
580put the following in your <path>~/.muttrc</path>: 549put the following in your <path>~/.muttrc</path>:
581</p> 550</p>
582 551
583<pre caption="~/.muttrc for UTF-8"> 552<pre caption="~/.muttrc for UTF-8">
584set send_charset="utf8" <comment>(outgoing character set)</comment> 553set send_charset="utf8" <comment>(outgoing character set)</comment>
585set charset="utf8" <comment>(display character set)</comment> 554set charset="utf8" <comment>(display character set)</comment>
586</pre> 555</pre>
587 556
588<note> 557<note>
589You may still see '?' in mail you read with Mutt. This is a result of people 558You may still see '?' in mail you read with Mutt. This is a result of people
590using Latin (ISO 8859) or another charset for email transmission. It is best to 559using a mail client which does not indicate the used charset. You can't do much
591tell them to use UTF-8 for mail, and point them to the IETF RFC 2277 (see 560about this than to ask them to configure their client correctly.
592References at the end of this document). Also note that in some lists,
593subscribers may not like UTF-8. Be sure that the group or person you are
594communicating with does not mind UTF-8.
595</note> 561</note>
596 562
597<p> 563<p>
598Further information is available from the <uri 564Further information is available from the <uri
599link="http://wiki.mutt.org/index.cgi?MuttFaq/Charset"> Mutt WikiWiki</uri>. 565link="http://wiki.mutt.org/index.cgi?MuttFaq/Charset"> Mutt WikiWiki</uri>.
600</p> 566</p>
601 567
602</body> 568</body>
603</section> 569</section>
604<section> 570<section>
605<title>Testing it all out</title> 571<title>Testing it all out</title>
606<body> 572<body>
607 573
608<p> 574<p>
609There are numerous UTF-8 test websites around. <c>net-www/w3m</c>, 575There are numerous UTF-8 test websites around. <c>net-www/w3m</c>,
663Section "InputDevice" 629Section "InputDevice"
664 Identifier "Keyboard0" 630 Identifier "Keyboard0"
665 Driver "kbd" 631 Driver "kbd"
666 Option "XkbLayout" "en_US" <comment># Rather than just "us"</comment> 632 Option "XkbLayout" "en_US" <comment># Rather than just "us"</comment>
667 <comment>(Other Xkb options here)</comment> 633 <comment>(Other Xkb options here)</comment>
668EndSection 634EndSection
669</pre> 635</pre>
670 636
671<note> 637<note>
672The preceding change only needs to be applied if you are using a North American 638The preceding change only needs to be applied if you are using a North American
673layout, or another layout where dead keys do not seem to be working. European 639layout, or another layout where dead keys do not seem to be working. European
674users should have working dead keys as is. 640users should have working dead keys as is.
675</note> 641</note>
676 642
677<p> 643<p>
678This change will come into effect when the X server is restarted. To apply the 644This change will come into effect when your X server is restarted. To apply the
679change now, use the <c>setxkbmap</c> tool, for example, <c>setxkbmap en_US</c>. 645change now, use the <c>setxkbmap</c> tool, for example, <c>setxkbmap en_US</c>.
680</p> 646</p>
681 647
682<p> 648<p>
683It is probably easiest to describe dead keys with examples. Although the 649It is probably easiest to describe dead keys with examples. Although the
684results are layout dependent, the concepts should remain the same regardless of 650results are locale dependent, the concepts should remain the same regardless of
685locale. The examples contain UTF-8, so to view them you need to either tell 651locale. The examples contain UTF-8, so to view them you need to either tell
686your browser to view the page as UTF-8, or have a UTF-8 locale already 652your browser to view the page as UTF-8, or have a UTF-8 locale already
687configured. 653configured.
688</p> 654</p>
689 655
690<p> 656<p>
691When I press AltGr and [ at once, release them, and then press a, 'ä' is 657When I press AltGr and [ at once, release them, and then press a, 'ä' is
692produced. When I press AltGr and [ at once, and then press e, 'ë' is 658produced. When I press AltGr and [ at once, and then press e, 'ë' is produced.
693produced. When I press AltGr and ; at once, release them, and press a, 659When I press AltGr and ; at once, 'á' is produced, and when I press AltGr and ;
694'á' is produced, and when I press AltGr and ; at once, release them, and 660at once, release them, and then press e, 'é' is produced.
695then press e, 'é' is produced.
696</p> 661</p>
697 662
698<p> 663<p>
699By pressing AltGr, Shift and [ at once, releasing them, and then pressing a, a 664By pressing AltGr, Shift and [ at once, releasing them, and then pressing a, a
700Scandinavian 'å' is produced. Similarly, when I press AltGr, Shift and [ at 665Scandinavian 'å' is produced. Similarly, when I press AltGr, Shift and [ at
701once, release <e>only</e> the [, and then press it again, '˚' is produced. 666once, release <e>only</e> the [, and then press it again, '˚' is produced.
702Although it looks like one, this (U+02DA) is not the same as a degree symbol 667Although it looks like one, this (U+02DA) is not the same as a degree symbol
703(U+00B0). This works for other accents produced by dead keys — AltGr and [, 668(U+00B0). This works for other accents produced by dead keys — AltGr and [,
704releasing only the [, then pressing it again makes '¨'. 669releasing only the [, then pressing it again makes '¨'.
705</p> 670</p>
706 671
707<p> 672<p>
708AltGr can be used with alphabetical keys alone. For example, AltGr and m, a 673AltGr can be used with alphabetical keys alone. For example, AltGr and m, a
709Greek lower-case letter mu is produced: 'µ'. AltGr and s produce a 674Greek lower-case letter mu is produced: 'µ'.
710Schauffer's s: 'ß'. As many European users would expect (because it is
711marked on their keyboard), AltGr and 4 produces a Euro sign, '€'.
712</p> 675</p>
713 676
714</body> 677</body>
715</section> 678</section>
716<section> 679<section>
717<title>Resources</title> 680<title>Resources</title>
718<body> 681<body>
719 682
720<ul> 683<ul>
721 <li> 684 <li>
722 <uri link="http://www.wikipedia.com/wiki/Unicode">The Wikipedia entry for 685 <uri link="http://www.wikipedia.com/wiki/Unicode">The Wikipedia entry for
723 Unicode</uri> 686 Unicode</uri>
724 </li> 687 </li>
725 <li> 688 <li>
726 <uri link="http://www.wikipedia.com/wiki/UTF-8">The Wikipedia entry for 689 <uri link="http://www.wikipedia.com/wiki/UTF-8">The Wikipedia entry for
727 UTF-8</uri> 690 UTF-8</uri>
728 </li> 691 </li>
729 <li><uri link="http://www.unicode.org">Unicode.org</uri></li> 692 <li><uri link="http://www.unicode.org">Unicode.org</uri></li>
730 <li><uri link="http://www.utf-8.com">UTF-8.com</uri></li> 693 <li><uri link="http://www.utf-8.com">UTF-8.com</uri></li>
731 <li><uri link="http://www.ietf.org/rfc/rfc3629.txt">RFC 3629</uri></li> 694 <li><uri link="http://www.ietf.org/rfc/rfc3629.txt">RFC 3629</uri></li>
732 <li><uri link="http://www.ietf.org/rfc/rfc2277.txt">RFC 2277</uri></li> 695 <li><uri link="http://www.ietf.org/rfc/rfc2277.txt">RFC 2277</uri></li>
696 <li>
697 <uri
698 link="http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF">Characters vs.
699 Bytes</uri>
700 </li>
733</ul> 701</ul>
734 702
735</body> 703</body>
736</section> 704</section>
737</chapter> 705</chapter>
738</guide> 706</guide>
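Revision 1.11 puts Unicode's code space at 17 planes of 65,536 code points (17 × 65,536 = 1,114,112). Both revisions say that Latin text grows little in UTF-8, while Japanese text can carry up to 50% redundancy. The Python sketch below is an illustration and does not come from either revision. It checks the arithmetic and compares encoded sizes for a few sample characters chosen for the purpose. Treating UTF-16 as the baseline for the "50%" figure is an assumption: a BMP ideograph takes three bytes in UTF-8 against two in UTF-16.

```python
# Minimal sketch (not from the guide): Unicode code-space arithmetic and
# encoded sizes of a few sample characters in UTF-8 and UTF-16.

PLANES = 17
CODE_POINTS_PER_PLANE = 65_536


def total_code_points() -> int:
    """17 planes of 65,536 code points each, i.e. U+0000 to U+10FFFF."""
    return PLANES * CODE_POINTS_PER_PLANE


def encoded_sizes(text: str) -> dict:
    """Byte length of `text` in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # plain "utf-16" would prepend a 2-byte byte-order mark
        "utf-16": len(text.encode("utf-16-le")),
    }


# Sample characters chosen for illustration only.
SAMPLES = {
    "ASCII letter": "a",          # U+0061
    "Latin letter": "\u00e4",     # U+00E4
    "CJK ideograph": "\u65e5",    # U+65E5, a Japanese kanji in the BMP
    "emoji": "\U0001F600",        # U+1F600, plane 1, outside the BMP
}

if __name__ == "__main__":
    print("total code points:", total_code_points())
    for label, ch in SAMPLES.items():
        cp = ord(ch)
        sizes = encoded_sizes(ch)
        # cp >> 16 gives the plane number (0 = Basic Multilingual Plane)
        print(f"{label:14} U+{cp:04X} plane {cp >> 16}: "
              f"UTF-8 {sizes['utf-8']} B, UTF-16 {sizes['utf-16']} B")
```

Run as a script, it prints one line per sample. ASCII characters remain single bytes with the high bit clear, so ASCII text is byte-for-byte identical in UTF-8.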

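The two revisions spell the same locale two ways. Revision 1.10 sets LC_ALL="en_GB.UTF-8" in /etc/env.d. Revision 1.11 exports en_GB.utf8 from ~/.profile, which is the name the localedef command in both revisions creates. Both spellings work because glibc normalizes the codeset part of a locale name before looking it up. ASCII letters are lowercased, digits are kept, and other characters are dropped. A result made only of digits gets an "iso" prefix.

The following minimal Python re-implementation of that rule is for illustration only. The names normalize_codeset and normalize_locale_name are hypothetical. The sketch also ignores the alias table (locale.alias) that glibc consults as well.

```python
# Sketch re-implementing glibc's codeset normalization for illustration;
# function names are hypothetical and glibc's alias lookup is not modeled.


def normalize_codeset(codeset: str) -> str:
    """Lowercase ASCII letters, keep ASCII digits, drop everything else.
    A result made only of digits gets an "iso" prefix."""
    result = "".join(c.lower() for c in codeset if c.isascii() and c.isalnum())
    if result.isdigit():
        result = "iso" + result
    return result


def normalize_locale_name(name: str) -> str:
    """Normalize only the codeset in language[_territory][.codeset][@modifier]."""
    base, at, modifier = name.partition("@")
    language, dot, codeset = base.partition(".")
    if dot:
        base = language + "." + normalize_codeset(codeset)
    return base + at + modifier


if __name__ == "__main__":
    # Revision 1.10 uses the first spelling, 1.11 the second.
    for name in ("en_GB.UTF-8", "en_GB.utf8"):
        print(f"{name:12} -> {normalize_locale_name(name)}")
```

Both names print as en_GB.utf8. The same rule explains why a line such as en_GB.UTF-8/UTF-8 in /etc/locales.build is accepted even though `locale -a` reports the locale in lowercase.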

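Both revisions stress that the '˚' produced by the dead-key sequence is U+02DA, not the degree sign U+00B0. A short check with Python's standard unicodedata module confirms the names of the characters in those examples. It also shows that a base letter followed by a combining accent normalizes (NFC) to the precomposed character the guide describes. That the dead keys emit precomposed characters rather than combining sequences is an assumption about the X keyboard setup, not something the guide states. The helper compose is hypothetical.

```python
import unicodedata

# Characters from the guide's dead-key examples, with their names as
# listed in the Unicode Character Database.
DEAD_KEY_RESULTS = {
    "\u00e4": "LATIN SMALL LETTER A WITH DIAERESIS",   # AltGr+[ then a
    "\u00eb": "LATIN SMALL LETTER E WITH DIAERESIS",   # AltGr+[ then e
    "\u00e1": "LATIN SMALL LETTER A WITH ACUTE",       # AltGr+; then a
    "\u00e9": "LATIN SMALL LETTER E WITH ACUTE",       # AltGr+; then e
    "\u00e5": "LATIN SMALL LETTER A WITH RING ABOVE",  # AltGr+Shift+[ then a
    "\u02da": "RING ABOVE",                            # spacing ring, not a degree sign
    "\u00b0": "DEGREE SIGN",
    "\u00a8": "DIAERESIS",                             # spacing diaeresis
}


def compose(base: str, combining: str) -> str:
    """Hypothetical helper: join a base letter and a combining mark, then apply NFC."""
    return unicodedata.normalize("NFC", base + combining)


if __name__ == "__main__":
    for ch in DEAD_KEY_RESULTS:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    # U+0308 COMBINING DIAERESIS, U+0301 COMBINING ACUTE ACCENT,
    # U+030A COMBINING RING ABOVE
    for base, mark in (("a", "\u0308"), ("e", "\u0301"), ("a", "\u030a")):
        result = compose(base, mark)
        print(f"{base} + U+{ord(mark):04X} -> U+{ord(result):04X}")
```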