Diff of /xml/htdocs/doc/en/utf-8.xml

Revision 1.9 Revision 1.10
1<?xml version='1.0' encoding="UTF-8"?> 1<?xml version='1.0' encoding="UTF-8"?>
2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.9 2005/04/05 08:59:28 neysx Exp $ --> 2<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.10 2005/04/24 03:25:46 bennyc Exp $ -->
3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> 3<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
4 4
5<guide link="/doc/en/utf-8.xml"> 5<guide link="/doc/en/utf-8.xml">
6<title>Using UTF-8 with Gentoo</title> 6<title>Using UTF-8 with Gentoo</title>
7 7
8<author title="Author"> 8<author title="Author">
9 <mail link="slarti@gentoo.org">Thomas Martin</mail> 9 <mail link="slarti@gentoo.org">Thomas Martin</mail>
10</author> 10</author>
11<author title="Contributor"> 11<author title="Contributor">
12 <mail link="devil@gentoo.org.ua">Alexander Simonov</mail> 12 <mail link="devil@gentoo.org.ua">Alexander Simonov</mail>
13</author> 13</author>
14 14
15<abstract> 15<abstract>
16This guide shows you how to set up and use the UTF-8 Unicode character set with 16This guide shows you how to set up and use the UTF-8 Unicode character set with
17your Gentoo Linux system, after explaining the benefits of Unicode and more 17your Gentoo Linux system, after explaining the benefits of Unicode and more
18specifically UTF-8. 18specifically UTF-8.
19</abstract> 19</abstract>
20 20
21<license /> 21<license />
22 22
23<version>1.8</version> 23<version>1.5</version>
24<date>2005-04-05</date> 24<date>2005-04-23</date>
25 25
26<chapter> 26<chapter>
27<title>Character Encodings</title> 27<title>Character Encodings</title>
28<section> 28<section>
29<title>What is a Character Encoding?</title> 29<title>What is a Character Encoding?</title>
30<body> 30<body>
31 31
32<p> 32<p>
33Computers do not understand text themselves. Instead, every character is 33Computers do not understand text themselves. Instead, every character is
34represented by a number. Traditionally, each set of numbers used to represent 34represented by a number. Traditionally, each set of numbers used to represent
35alphabets and characters (known as a coding system, encoding or character set) 35alphabets and characters (known as a coding system, encoding or character set)
36was limited in size due to limitations in computer hardware. 36was limited in size due to limitations in computer hardware.
37</p> 37</p>
38 38
39</body> 39</body>
96</p> 96</p>
97 97
98<p> 98<p>
99This has led to confusion, and also to an almost total inability for 99This has led to confusion, and also to an almost total inability for
100multilingual communication, especially across different alphabets. Enter 100multilingual communication, especially across different alphabets. Enter
101Unicode. 101Unicode.
102</p> 102</p>
103 103
104</body> 104</body>
105</section> 105</section>
106<section> 106<section>
107<title>What is Unicode?</title> 107<title>What is Unicode?</title>
108<body> 108<body>
109 109
110<p> 110<p>
111Unicode throws away the traditional single-byte limit of character sets. It 111Unicode throws away the traditional single-byte limit of character sets, and
112uses 17 "planes" of 65,536 code points to describe a maximum of 1,114,112 112even with two bytes per-character this allows a maximum 65,536 characters.
113characters. As the first plane, aka. "Basic Multilingual Plane" or BMP, 113Although this number is extremely high when compared to seven-bit and eight-bit
114contains almost everything you will ever use, many have made the wrong 114encodings, it is still not enough for a character set designed to be used for
115assumption that Unicode was a 16-bit character set. 115symbols and scripts used only by scholars, and symbols that are only used in
116mathematics and other specialised fields.
116</p> 117</p>
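<p>
The figure of 1,114,112 code points follows directly from 17 planes of 65,536
code points each. The short Python sketch below only restates that arithmetic;
the variable names are illustrative and are not taken from the guide.
</p>

```python
import sys

PLANES = 17
CODE_POINTS_PER_PLANE = 65536  # 2**16, the size of one plane such as the BMP

total = PLANES * CODE_POINTS_PER_PLANE
highest = total - 1  # code points are numbered from zero

print(total)         # number of code points across all planes
print(hex(highest))  # highest code point, in hexadecimal
# Python 3 exposes the same upper limit as sys.maxunicode.
print(highest == sys.maxunicode)
```

<p>
A 16-bit value can only address the first plane, which is why treating Unicode
as a 16-bit character set falls short.
</p>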
117 118
118<p> 119<p>
119Unicode has been mapped in many different ways, but the two most common are 120Unicode has been mapped in many different ways, but the two most common are
120<b>UTF</b> (Unicode Transformation Format) and <b>UCS</b> (Universal Character 121<b>UTF</b> (Unicode Transformation Format) and <b>UCS</b> (Universal Character
121Set). A number after UTF indicates the number of bits in one unit, while the 122Set). A number after UTF indicates the number of bits in one unit, while the
122number after UCS indicates the number of bytes. UTF-8 has become the most 123number after UCS indicates the number of bytes. UTF-8 has become the most
123widespread means for the interchange of Unicode text as a result of its 124widespread means for the interchange of Unicode text as a result of its
124eight-bit clean nature, and it is the subject of this document. 125eight-bit clean nature, and it is the subject of this document.
125</p> 126</p>
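<p>
To illustrate what "the number of bits in one unit" means, the following Python
sketch encodes a single ASCII letter, which occupies exactly one code unit in
each format. The codec names are those of the Python standard library; the
choice of letter is arbitrary.
</p>

```python
# The number in a UTF name is the size of one code unit in bits.
# A single ASCII letter takes exactly one code unit in each format,
# so its encoded length in bytes times eight gives the unit size.
letter = "a"

unit_bits = {
    "UTF-8": len(letter.encode("utf-8")) * 8,
    "UTF-16": len(letter.encode("utf-16-be")) * 8,  # -be: no byte order mark
    "UTF-32": len(letter.encode("utf-32-be")) * 8,
}

for name, bits in unit_bits.items():
    print(name, bits)
```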
126 127
127</body> 128</body>
128</section> 129</section>
129<section> 130<section>
130<title>UTF-8</title> 131<title>UTF-8</title>
137ASCII. UTF-8 means that ASCII and Latin characters are interchangeable with 138ASCII. UTF-8 means that ASCII and Latin characters are interchangeable with
138little increase in the size of the data, because only the first bit is used. 139little increase in the size of the data, because only the first bit is used.
139Users of Eastern alphabets such as Japanese, who have been assigned a higher 140Users of Eastern alphabets such as Japanese, who have been assigned a higher
140byte range are unhappy, as this results in as much as a 50% redundancy in their 141byte range are unhappy, as this results in as much as a 50% redundancy in their
141data. 142data.
142</p> 143</p>
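<p>
The size difference can be checked with a small Python sketch. It assumes
EUC-JP as the legacy Japanese encoding for comparison, which is one possible
choice and is not named in the text above; the sample characters are likewise
only illustrations.
</p>

```python
# Sample characters written as escapes so the snippet is encoding-neutral.
samples = {
    "ASCII letter a": "a",
    "Latin e with acute": "\u00e9",
    "Hiragana a": "\u3042",
}

# Number of bytes each character needs in UTF-8.
utf8_lengths = {label: len(ch.encode("utf-8")) for label, ch in samples.items()}

# The same Japanese character in a legacy two-byte encoding (assumed: EUC-JP).
legacy_length = len(samples["Hiragana a"].encode("euc_jp"))

# Relative growth when moving that character from EUC-JP to UTF-8.
overhead = (utf8_lengths["Hiragana a"] - legacy_length) / legacy_length

print(utf8_lengths)
print(legacy_length, overhead)
```

<p>
Under that assumption a kana character grows from two bytes to three, which is
where a figure of about 50% comes from.
</p>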
143 144
144</body> 145</body>
145</section> 146</section>
146<section> 147<section>
147<title>What UTF-8 Can Do for You</title> 148<title>What UTF-8 Can Do for You</title>
148<body> 149<body>
149 150
150<p> 151<p>
151UTF-8 allows you to work in a standards-compliant and internationally accepted 152UTF-8 allows you to work in a standards-compliant and internationally accepted
152multilingual environment, with a comparatively low data redundancy. UTF-8 is 153multilingual environment, with a comparitively low data redundancy. UTF-8 is
153the preferred way for transmitting non-ASCII characters over the Internet, 154the preferred way for transmitting non-ASCII characters over the Internet,
154through Email, IRC or almost any other medium. Despite this, many people regard 155through Email, IRC or almost any other medium. Despite this, many people regard
155UTF-8 in online communication as abusive. It is always best to be aware of the 156UTF-8 in online communication as abusive. It is always best to be aware of the
156attitude towards UTF-8 in a specific channel, mailing list or Usenet group 157attitude towards UTF-8 in a specific channel, mailing list or Usenet group
157before using <e>non-ASCII</e> UTF-8. 158before using <e>non-ASCII</e> UTF-8.
158</p> 159</p>
159 160
160</body> 161</body>
161</section> 162</section>
162</chapter> 163</chapter>
163 164
164<chapter> 165<chapter>
165<title>Setting up UTF-8 with Gentoo Linux</title> 166<title>Setting up UTF-8 with Gentoo Linux</title>
166<section> 167<section>
167<title>Finding or Creating UTF-8 Locales</title> 168<title>Finding or Creating UTF-8 Locales</title>
199From the output of this command line, we need to take the result with a suffix 200From the output of this command line, we need to take the result with a suffix
200similar to <c>.utf8</c>. If there is no result with a suffix similar to 201similar to <c>.utf8</c>. If there is no result with a suffix similar to
201<c>.utf8</c>, we need to create a UTF-8 compatible locale. 202<c>.utf8</c>, we need to create a UTF-8 compatible locale.
202</p> 203</p>
203 204
204<note> 205<note>
205Only execute the following code listing if you do not have a UTF-8 locale 206Only execute the following code listing if you do not have a UTF-8 locale
206available for your language. 207available for your language.
207</note> 208</note>
208 209
209<pre caption="Creating a UTF-8 locale"> 210<pre caption="Creating a UTF-8 locale">
210<comment>(Replace "en_GB" with your desired locale setting)</comment> 211<comment>(Replace "en_GB" with your desired locale setting)</comment>
211# <i>localedef -i en_GB -f UTF-8 en_GB.utf8</i> 212# <i>localedef -i en_GB -f UTF-8 en_GB.utf8</i>
212</pre> 213</pre>
213 214
214<p>
215Another way to include a UTF-8 locale is to add it to the
216<path>/etc/locales.build</path> file and rebuild <c>glibc</c> with the
217<c>userlocales</c> USE flag set.
218</p>
219
220<pre caption="Line in /etc/locales.build">
221en_GB.UTF-8/UTF-8
222</pre>
223
224</body> 215</body>
225</section> 216</section>
226<section> 217<section>
227<title>Setting the Locale</title> 218<title>Setting the Locale</title>
228<body> 219<body>
229 220
230<p> 221<p>
231Although by now you might be determined to use UTF-8 system wide, the author 222There are two environment variables that need to be set in order to use
232does not recommend setting UTF-8 for the root user. Instead, it is best to set 223our new UTF-8 locales: <c>LANG</c> and <c>LC_ALL</c>. There are also
233the locale in your user's <path>~/.profile</path> (or, if you are using a C 224many different ways to set them; some people prefer to only have a UTF-8
234shell, <path>~/.login</path>). 225environment for a specific user, in which case they set them in their
235</p> 226<path>~/.profile</path> or <path>~/.bashrc</path>. Others prefer to set the
236 227locale globally. One specific circumstance where the author particularly
237<note> 228recommends doing this is when <path>/etc/init.d/xdm</path> is in use, because
238If you are not sure which file to use, use <path>~/.profile</path>. Also, if 229this init script starts the display manager and desktop before any of the
239you are unsure which code listing to use, use the Bourne version. 230aforementioned shell startup files are sourced, and so before any of the
240</note> 231variables are in the environment.
241
242<pre caption="Setting the locale with environment variables (Bourne version)">
243export LANG="en_GB.utf8"
244export LC_ALL="en_GB.utf8"
245</pre>
246
247<pre caption="Setting the locale with environment variables (C shell version)">
248setenv LANG "en_GB.utf8"
249setenv LC_ALL "en_GB.utf8"
250</pre>
251
252<p> 232</p>
253Now, logout and back in to apply the change. We want these environment 233
254variables in our entire environment, so it is best to logout and back in, or at 234<p>
255the very least to source <path>~/.profile</path> or <path>~/.login</path> in 235Setting the locale globally should be done using
256the console from which you have started other processes. 236<path>/etc/env.d/02local</path>. The file should look something like the
237following:
238</p>
239
240<pre caption="Demonstration /etc/env.d/02locale">
241<comment>(As always, change "en_GB.UTF-8" to your locale)</comment>
242LC_ALL="en_GB.UTF-8"
243LOCALE="en_GB.UTF-8"
244</pre>
245
246<p>
247Next, the environment must be updated with the change.
248</p>
249
250<pre caption="Updating the environment">
251# <i>env-update</i>
252>>> Regenerating /etc/ld.so.cache...
253 * Caching service dependencies ...
254 # <i>source /etc/profile</i>
255</pre>
256
257<p>
258Now, run <c>locale</c> with no arguments to see if we have the correct
259variables in our environment:
260</p>
261
262<pre caption="Checking if our new locale is in the environment">
263# <i>locale</i>
264LANG=en_GB.UTF-8
265LC_CTYPE="en_GB.UTF-8"
266LC_NUMERIC="en_GB.UTF-8"
267LC_TIME="en_GB.UTF-8"
268LC_COLLATE="en_GB.UTF-8"
269LC_MONETARY="en_GB.UTF-8"
270LC_MESSAGES="en_GB.UTF-8"
271LC_PAPER="en_GB.UTF-8"
272LC_NAME="en_GB.UTF-8"
273LC_ADDRESS="en_GB.UTF-8"
274LC_TELEPHONE="en_GB.UTF-8"
275LC_MEASUREMENT="en_GB.UTF-8"
276LC_IDENTIFICATION="en_GB.UTF-8"
277LC_ALL=en_GB.UTF-8
278</pre>
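<p>
If you would rather check this output mechanically, the Python sketch below
scans text in the format printed by <c>locale</c> and reports any variable that
does not name a UTF-8 locale. The function <c>non_utf8_settings</c> is a
hypothetical helper written for this illustration, not part of any Gentoo tool.
</p>

```python
def non_utf8_settings(locale_output):
    """Return the names of variables whose value is not a UTF-8 locale."""
    bad = []
    for line in locale_output.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        value = value.strip().strip('"')
        if not value:
            continue  # an unset variable, for example an empty LC_ALL
        value = value.split("@")[0]  # drop a modifier such as @euro
        # Accept both spellings seen in practice: ".UTF-8" and ".utf8".
        codeset = value.rpartition(".")[2].lower().replace("-", "")
        if "." not in value or codeset != "utf8":
            bad.append(name.strip())
    return bad


sample = 'LANG=en_GB.UTF-8\nLC_CTYPE="en_GB.UTF-8"\nLC_ALL=en_GB.UTF-8'
print(non_utf8_settings(sample))
```

<p>
An empty list means every variable that is set points at a UTF-8 locale.
</p>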
279
280<p>
281That is all. You are now using UTF-8 locales, and the next hurdle is the
282configuration of the applications you use from day to day.
257</p> 283</p>
258 284
259</body> 285</body>
260</section> 286</section>
261</chapter> 287</chapter>
262 288
263<chapter> 289<chapter>
264<title>Application Support</title> 290<title>Application Support</title>
265<section> 291<section>
266<body> 292<body>
267 293
268<p> 294<p>
269When Unicode first started gaining momentum in the software world, multibyte 295When Unicode first started gaining momentum in the software world, multibyte
270character sets were not well suited to languages like C, in which many of the 296character sets were not well suited to languages like C, in which many of the
271day-to-day programs people use are written. Even today, some programs are not 297day-to-day programs people use are written. Even today, some programs are not
338<p> 364<p>
339To enable UTF-8 on the console, you should edit <path>/etc/rc.conf</path> and 365To enable UTF-8 on the console, you should edit <path>/etc/rc.conf</path> and
340set <c>UNICODE="yes"</c>, and also read the comments in that file -- it is 366set <c>UNICODE="yes"</c>, and also read the comments in that file -- it is
341important to have a font that has a good range of characters if you plan on 367important to have a font that has a good range of characters if you plan on
342making the most of Unicode. 368making the most of Unicode.
343</p> 369</p>
344 370
345<p> 371<p>
346The <c>KEYMAP</c> variable, set in <path>/etc/conf.d/keymaps</path>, should 372The <c>KEYMAP</c> variable, set in <path>/etc/conf.d/keymaps</path>, should
347have a Unicode keymap specified. To do this, simply prepend the keymap already 373have a Unicode keymap specified. To do this, simply prepend the keymap already
348specified there with -u. 374specified there with -u.
349</p> 375</p>
350 376
351<pre caption="Example /etc/conf.d/keymaps snippet"> 377<pre caption="Example /etc/conf.d/keymaps snippet">
352<comment>(Change "uk" to your local layout)</comment> 378<comment>(Change "uk" to your local layout)</comment>
353KEYMAP="-u uk" 379KEYMAP="uk"
354</pre> 380</pre>
355 381
356</body> 382</body>
357</section> 383</section>
358<section> 384<section>
359<title>Ncurses and Slang</title> 385<title>Ncurses and Slang</title>
360<body> 386<body>
361 387
362<note> 388<note>
363Ignore any mention of Slang in this section if you do not have it installed or 389Ignore any mention of Slang in this section if you do not have it installed or
364do not use it. 390do not use it.
365</note> 391</note>
366 392
367<p> 393<p>
368It is wise to add <c>unicode</c> to your global USE flags in 394It is wise to add <c>unicode</c> to your global USE flags in
369<path>/etc/make.conf</path>, and then to remerge <c>sys-libs/ncurses</c> and 395<path>/etc/make.conf</path>, and then to remerge <c>sys-libs/ncurses</c> and
370also <c>sys-libs/slang</c> if appropriate: 396also <c>sys-libs/slang</c> if appropriate:
371</p> 397</p>
372 398
373<pre caption="Emerging ncurses and slang"> 399<pre caption="Emerging ncurses and slang">
374<comment>(We avoid putting these libraries in our world file with --oneshot)</comment> 400<comment>(We avoid putting these libraries in our world file with --oneshot)</comment>
375# <i>emerge --oneshot --verbose --ask sys-libs/ncurses sys-libs/slang</i> 401# <i>emerge --oneshot --verbose --ask sys-libs/ncurses sys-libs/slang</i>
376</pre> 402</pre>
377 403
378<p> 404<p>
379We also need to rebuild packages that link to these, now the USE changes have 405We also need to rebuild packages that link to these, now the USE changes have
380been applied. The tool we use (<c>revdep-rebuild</c>) is part of the 406been applied.
381<c>gentoolkit</c> package.
382</p> 407</p>
383 408
384<pre caption="Rebuilding of programs that link to ncurses or slang"> 409<pre caption="Rebuilding of programs that link to ncurses or slang">
385# <i>revdep-rebuild --soname libncurses.so.5</i> 410# <i>revdep-rebuild --soname libncurses.so.5</i>
386# <i>revdep-rebuild --soname libslang.so.1</i> 411# <i>revdep-rebuild --soname libslang.so.1</i>
387</pre> 412</pre>
388 413
389</body> 414</body>
390</section> 415</section>
391<section> 416<section>
392<title>KDE, GNOME and Xfce</title> 417<title>KDE, GNOME and Xfce</title>
393<body> 418<body>
394 419
395<p> 420<p>
396All of the major desktop environments have full Unicode support, and will 421All of the major desktop environments have full Unicode support, and will
420} 445}
421widget_class "*" style "user-font" 446widget_class "*" style "user-font"
422</pre> 447</pre>
423 448
424<p> 449<p>
425If an application has support for both a Qt and GTK+2 GUI, the GTK+2 GUI will 450If an application has support for both a Qt and GTK+2 GUI, the GTK+2 GUI will
426generally give better results with Unicode. 451generally give better results with Unicode.
427</p> 452</p>
428 453
429</body> 454</body>
430</section> 455</section>
431<section> 456<section>
432<title>X11 and Fonts</title> 457<title>X11 and Fonts</title>
433<body> 458<body>
434 459
435<impo>
436<c>x11-base/xorg-x11</c> has far better support for Unicode than XFree86
437and is <e>highly</e> recommended.
438</impo>
439
440<p> 460<p>
441TrueType fonts have support for Unicode, and most of the fonts that ship with 461TrueType fonts have support for Unicode, and most of the fonts that ship with
442Xorg have impressive character support, although, obviously, not every single 462Xorg have impressive character support, although, obviously, not every single
443glyph available in Unicode has been created for that font. To build fonts 463glyph available in Unicode has been created for that font. To build fonts
444(including the Bitstream Vera set) with support for East Asian letters with X, 464(including the Bitstream Vera set) with support for East Asian letters with X,
445make sure you have the <c>cjk</c> USE flag set. Many other applications utilise 465make sure you have the <c>cjk</c> USE flag set. Many other applications utilise
446this flag, so it may be worthwhile to add it as a permanent USE flag. 466this flag, so it may be worthwhile to add it as a permanent USE flag.
447</p> 467</p>
448 468
449<p> 469<p>
450Also, several font packages in Portage are Unicode aware. 470Also, several font packages in Portage are Unicode aware.
451</p> 471</p>
452 472
453<pre caption="Optional: Install some more Unicode-aware fonts"> 473<pre caption="Optional: Install some more Unicode-aware fonts">
454# <i>emerge terminus-font intlfonts freefonts cronyx-fonts corefonts</i> 474# <i>emerge terminus-font intlfonts freefonts cronyx-fonts corefonts</i>
455</pre> 475</pre>
456 476
457</body> 477</body>
458</section> 478</section>
459<section> 479<section>
460<title>Window Managers and Terminal Emulators</title> 480<title>Window Managers and Terminal Emulators</title>
461<body> 481<body>
462 482
463<p> 483<p>
464Window managers not built on GTK or Qt generally have very good Unicode 484Window managers, even those not built on GTK or Qt, generally have very
465support, as they often use the Xft library for handling fonts. If your window 485good Unicode support, as they often use the Xft library for handling
466manager does not use Xft for fonts, you can still use the FontSpec mentioned in 486fonts. If your window manager does not use Xft for fonts, you can still
467the previous section as a Unicode font. 487use the FontSpec mentioned in the previous section as a Unicode font.
468</p> 488</p>
469 489
470<p> 490<p>
471Terminal emulators that use Xft and support Unicode are harder to come by. 491Terminal emulators that use Xft and support Unicode are harder to come by.
472Aside from Konsole and gnome-terminal, the best options in Portage are 492Aside from Konsole and gnome-terminal, the best options in Portage are
473<c>x11-terms/rxvt-unicode</c>, <c>xfce-extra/terminal</c>, 493<c>x11-terms/rxvt-unicode</c>, <c>xfce-extra/terminal</c>,
474<c>gnustep-apps/terminal</c>, <c>x11-terms/mlterm</c>, <c>x11-terms/mrxvt</c> or 494<c>gnustep-apps/terminal</c>, <c>x11-terms/mlterm</c>, <c>x11-terms/mrxvt</c> or
475plain <c>x11-terms/xterm</c> when built with the <c>unicode</c> USE flag and 495plain <c>x11-terms/xterm</c> when built with the <c>unicode</c> USE flag and
476invoked as <c>uxterm</c>. <c>app-misc/screen</c> supports UTF-8 too, when 496invoked as <c>uxterm</c>. <c>app-misc/screen</c> supports UTF-8 too, when
477invoked as <c>screen -u</c> or the following is put into the 497invoked as <c>screen -U</c> or the following is put into the
478<path>~/.screenrc</path>: 498<path>~/.screenrc</path>:
479</p> 499</p>
480 500
481<pre caption="~/.screenrc for UTF-8"> 501<pre caption="~/.screenrc for UTF-8">
482defutf8 on 502defutf8 on
483</pre> 503</pre>
484 504
485</body> 505</body>
486</section> 506</section>
487<section> 507<section>
488<title>Vim, Emacs, Xemacs and Nano</title> 508<title>Vim, Emacs, Xemacs and Nano</title>
489<body> 509<body>
490 510
491<p> 511<p>
492Vim, Emacs and Xemacs provide full UTF-8 support, and also have builtin 512Vim, Emacs and Xemacs provide full UTF-8 support, and also have builtin
505<section> 525<section>
506<title>Shells</title> 526<title>Shells</title>
507<body> 527<body>
508 528
509<p> 529<p>
510Currently, <c>bash</c> provides full Unicode support through the GNU readline 530Currently, <c>bash</c> provides full Unicode support through the GNU readline
511library. Z Shell users are in a somewhat worse position -- no parts of the 531library. Z Shell users are in a somewhat worse position -- no parts of the
512shell have Unicode support, although there is a concerted effort to add 532shell have Unicode support, although there is a concerted effort to add
513multibyte character set support underway at the moment. 533multibyte character set support underway at the moment.
514</p> 534</p>
515 535
516<p> 536<p>
517The C shell, <c>tcsh</c> and <c>ksh</c> do not provide UTF-8 support at all. 537The C shell, <c>tcsh</c> and <c>ksh</c> do not provide UTF-8 support at all.
518</p> 538</p>
519 539
540<note>
541Although not strictly related to shells, many of the GNU text-processing
542programs in your system (<c>tr</c>, <c>grep</c>, etc.) are much slower
543when processing Unicode. Nonetheless, the difference is not at all
544noticeable in nearly every case, but if you are ever hit by these bugs
545then at least you will know what is causing them. Perl also tends to be
546slower when operating on multibyte characters. The author knows of one
547other gotcha: <c>tr</c> will not convert three-byte UTF-8 characters to
548two-byte UTF-8 characters.
549</note>
550
520</body> 551</body>
521</section> 552</section>
522<section> 553<section>
523<title>Irssi</title> 554<title>Irssi</title>
524<body> 555<body>
525 556
526<p> 557<p>
527Since 0.8.10, Irssi has complete UTF-8 support, although it does require a user 558Since 0.8.10, Irssi has complete UTF-8 support, although it does require a user
528to set an option. 559to set an option.
529</p> 560</p>
530 561
531<pre caption="Enabling UTF-8 in Irssi"> 562<pre caption="Enabling UTF-8 in Irssi">
532/set term_charset UTF-8 563/set term_charset UTF-8
533</pre> 564</pre>
534 565
544<title>Mutt</title> 575<title>Mutt</title>
545<body> 576<body>
546 577
547<p> 578<p>
548The Mutt mail user agent has very good Unicode support. To use UTF-8 with Mutt, 579The Mutt mail user agent has very good Unicode support. To use UTF-8 with Mutt,
549put the following in your <path>~/.muttrc</path>: 580put the following in your <path>~/.muttrc</path>:
550</p> 581</p>
551 582
552<pre caption="~/.muttrc for UTF-8"> 583<pre caption="~/.muttrc for UTF-8">
553set send_charset="utf8" <comment>(outgoing character set)</comment> 584set send_charset="utf8" <comment>(outgoing character set)</comment>
554set charset="utf8" <comment>(display character set)</comment> 585set charset="utf8" <comment>(display character set)</comment>
555</pre> 586</pre>
556 587
557<note> 588<note>
558You may still see '?' in mail you read with Mutt. This is a result of people 589You may still see '?' in mail you read with Mutt. This is a result of people
559using a mail client which does not indicate the used charset. You can't do much 590using Latin (ISO 8859) or another charset for email transmission. It is best to
560about this than to ask them to configure their client correctly. 591tell them to use UTF-8 for mail, and point them to the IETF RFC 2277 (see
592References at the end of this document). Also note that in some lists,
593subscribers may not like UTF-8. Be sure that the group or person you are
594communicating with does not mind UTF-8.
561</note> 595</note>
562 596
563<p> 597<p>
564Further information is available from the <uri 598Further information is available from the <uri
565link="http://wiki.mutt.org/index.cgi?MuttFaq/Charset"> Mutt WikiWiki</uri>. 599link="http://wiki.mutt.org/index.cgi?MuttFaq/Charset"> Mutt WikiWiki</uri>.
566</p> 600</p>
567 601
568</body> 602</body>
569</section> 603</section>
570<section> 604<section>
571<title>Testing it all out</title> 605<title>Testing it all out</title>
572<body> 606<body>
573 607
574<p> 608<p>
575There are numerous UTF-8 test websites around. <c>net-www/w3m</c>, 609There are numerous UTF-8 test websites around. <c>net-www/w3m</c>,
629Section "InputDevice" 663Section "InputDevice"
630 Identifier "Keyboard0" 664 Identifier "Keyboard0"
631 Driver "kbd" 665 Driver "kbd"
632 Option "XkbLayout" "en_US" <comment># Rather than just "us"</comment> 666 Option "XkbLayout" "en_US" <comment># Rather than just "us"</comment>
633 <comment>(Other Xkb options here)</comment> 667 <comment>(Other Xkb options here)</comment>
634EndSection 668EndSection
635</pre> 669</pre>
636 670
637<note> 671<note>
638The preceding change only needs to be applied if you are using a North American 672The preceding change only needs to be applied if you are using a North American
639layout, or another layout where dead keys do not seem to be working. European 673layout, or another layout where dead keys do not seem to be working. European
640users should have working dead keys as is. 674users should have working dead keys as is.
641</note> 675</note>
642 676
643<p> 677<p>
644This change will come into effect when your X server is restarted. To apply the 678This change will come into effect when the X server is restarted. To apply the
645change now, use the <c>setxkbmap</c> tool, for example, <c>setxkbmap en_US</c>. 679change now, use the <c>setxkbmap</c> tool, for example, <c>setxkbmap en_US</c>.
646</p> 680</p>
647 681
648<p> 682<p>
649It is probably easiest to describe dead keys with examples. Although the 683It is probably easiest to describe dead keys with examples. Although the
650results are locale dependent, the concepts should remain the same regardless of 684results are layout dependent, the concepts should remain the same regardless of
651locale. The examples contain UTF-8, so to view them you need to either tell 685locale. The examples contain UTF-8, so to view them you need to either tell
652your browser to view the page as UTF-8, or have a UTF-8 locale already 686your browser to view the page as UTF-8, or have a UTF-8 locale already
653configured. 687configured.
654</p> 688</p>
655 689
656<p> 690<p>
657When I press AltGr and [ at once, release them, and then press a, 'ä' is 691When I press AltGr and [ at once, release them, and then press a, 'ä' is
658produced. When I press AltGr and [ at once, and then press e, 'ë' is produced. 692produced. When I press AltGr and [ at once, and then press e, 'ë' is
659When I press AltGr and ; at once, 'á' is produced, and when I press AltGr and ; 693produced. When I press AltGr and ; at once, release them, and press a,
660at once, release them, and then press e, 'é' is produced. 694'á' is produced, and when I press AltGr and ; at once, release them, and
695then press e, 'é' is produced.
661</p> 696</p>
662 697
663<p> 698<p>
664By pressing AltGr, Shift and [ at once, releasing them, and then pressing a, a 699By pressing AltGr, Shift and [ at once, releasing them, and then pressing a, a
665Scandinavian 'å' is produced. Similarly, when I press AltGr, Shift and [ at 700Scandinavian 'å' is produced. Similarly, when I press AltGr, Shift and [ at
666once, release <e>only</e> the [, and then press it again, '˚' is produced. 701once, release <e>only</e> the [, and then press it again, '˚' is produced.
667Although it looks like one, this (U+02DA) is not the same as a degree symbol 702Although it looks like one, this (U+02DA) is not the same as a degree symbol
668(U+00B0). This works for other accents produced by dead keys — AltGr and [, 703(U+00B0). This works for other accents produced by dead keys — AltGr and [,
669releasing only the [, then pressing it again makes '¨'. 704releasing only the [, then pressing it again makes '¨'.
670</p> 705</p>
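<p>
The difference between these look-alike characters can be confirmed from the
Unicode character database. This Python sketch uses the standard
<c>unicodedata</c> module; the variable names are illustrative.
</p>

```python
import unicodedata

ring_above = "\u02da"   # produced by the dead-key sequence described above
degree_sign = "\u00b0"  # the ordinary degree symbol
diaeresis = "\u00a8"    # the spacing form of the two-dot accent

# Map each code point, written in U+XXXX form, to its official name.
names = {
    "U+%04X" % ord(ch): unicodedata.name(ch)
    for ch in (ring_above, degree_sign, diaeresis)
}

for code_point, name in names.items():
    print(code_point, name)
```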
671 706
672<p> 707<p>
673AltGr can be used with alphabetical keys alone. For example, AltGr and m, a 708AltGr can be used with alphabetical keys alone. For example, AltGr and m, a
674Greek lower-case letter mu is produced: 'µ'. 709Greek lower-case letter mu is produced: 'µ'. AltGr and s produce a
710scharfes S (sharp s): 'ß'. As many European users would expect (because it is
711marked on their keyboard), AltGr and 4 produces a Euro sign, '€'.
675</p> 712</p>
676 713
677</body> 714</body>
678</section> 715</section>
679<section> 716<section>
680<title>Resources</title> 717<title>Resources</title>
681<body> 718<body>
682 719
683<ul> 720<ul>
684 <li> 721 <li>
685 <uri link="http://www.wikipedia.com/wiki/Unicode">The Wikipedia entry for 722 <uri link="http://www.wikipedia.com/wiki/Unicode">The Wikipedia entry for
686 Unicode</uri> 723 Unicode</uri>
687 </li> 724 </li>
688 <li> 725 <li>
689 <uri link="http://www.wikipedia.com/wiki/UTF-8">The Wikipedia entry for 726 <uri link="http://www.wikipedia.com/wiki/UTF-8">The Wikipedia entry for
690 UTF-8</uri> 727 UTF-8</uri>
691 </li> 728 </li>
692 <li><uri link="http://www.unicode.org">Unicode.org</uri></li> 729 <li><uri link="http://www.unicode.org">Unicode.org</uri></li>
693 <li><uri link="http://www.utf-8.com">UTF-8.com</uri></li> 730 <li><uri link="http://www.utf-8.com">UTF-8.com</uri></li>
694 <li><uri link="http://www.ietf.org/rfc/rfc3629.txt">RFC 3629</uri></li> 731 <li><uri link="http://www.ietf.org/rfc/rfc3629.txt">RFC 3629</uri></li>
695 <li><uri link="http://www.ietf.org/rfc/rfc2277.txt">RFC 2277</uri></li> 732 <li><uri link="http://www.ietf.org/rfc/rfc2277.txt">RFC 2277</uri></li>
696 <li>
697 <uri
698 link="http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF">Characters vs.
699 Bytes</uri>
700 </li>
701</ul> 733</ul>
702 734
703</body> 735</body>
704</section> 736</section>
705</chapter> 737</chapter>
706</guide> 738</guide>

