<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.41 2006/07/15 17:22:49 fox2mike Exp $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

<guide link="/doc/en/utf-8.xml">
<title>Using UTF-8 with Gentoo</title>

  <mail link="slarti@gentoo.org">Thomas Martin</mail>
</author>
<author title="Contributor">
  <mail link="devil@gentoo.org.ua">Alexander Simonov</mail>
</author>
<author title="Editor">
  <mail link="fox2mike@gentoo.org">Shyam Mani</mail>
</author>

<abstract>
This guide shows you how to set up and use the UTF-8 Unicode character set with
your Gentoo Linux system, after explaining the benefits of Unicode and more
specifically UTF-8.
</abstract>

<!-- The content of this document is licensed under the CC-BY-SA license -->
<!-- See http://creativecommons.org/licenses/by-sa/2.5 -->
<license />

<version>2.20</version>
<date>2006-07-15</date>

<chapter>
<title>Character Encodings</title>
<section>
<title>What is a Character Encoding?</title>
<section>
<title>What is Unicode?</title>
<body>

<p>
Unicode throws away the traditional single-byte limit of character sets. It
uses 17 "planes" of 65,536 code points each, for a maximum of 1,114,112 code
points. The first plane, also known as the "Basic Multilingual Plane" or BMP,
contains almost everything you will ever use. Because of this, many people
have wrongly assumed that Unicode is a 16-bit character set.
</p>
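<p>
The plane arithmetic can be checked with a short sketch. Python is used here
only for illustration, and the constant and function names are our own rather
than part of any Unicode API. The sketch computes the plane of a code point.
It also shows that a character outside the BMP needs two UTF-16 code units and
four UTF-8 bytes, which is why a 16-bit view of Unicode is wrong.
</p>

```python
# Sketch: Unicode planes, and why Unicode is not a 16-bit character set.
PLANE_SIZE = 0x10000                   # 65,536 code points per plane
PLANE_COUNT = 17                       # planes 0 (the BMP) to 16
MAX_CODE_POINTS = PLANE_COUNT * PLANE_SIZE


def plane_of(ch: str) -> int:
    """Return the plane number (0 = Basic Multilingual Plane) of a character."""
    return ord(ch) // PLANE_SIZE


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units UTF-16 needs for this character."""
    return len(ch.encode("utf-16-le")) // 2


if __name__ == "__main__":
    print(MAX_CODE_POINTS)                       # 1114112
    for ch in ("A", "\u20ac", "\U0001F600"):     # Latin A, Euro sign, an emoji
        print(f"U+{ord(ch):04X}", "plane", plane_of(ch),
              "UTF-16 units:", utf16_units(ch),
              "UTF-8 bytes:", len(ch.encode("utf-8")))
```

<p>
The emoji at U+1F600 lies in plane 1. It therefore cannot fit in a single
16-bit unit.
</p>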

<p>
Unicode has been mapped in many different ways, but the two most common are
<b>UTF</b> (Unicode Transformation Format) and <b>UCS</b> (Universal Character
<title>What UTF-8 Can Do for You</title>
<body>

<p>
UTF-8 allows you to work in a standards-compliant and internationally accepted
multilingual environment, with a comparatively low data redundancy. UTF-8 is
the preferred way for transmitting non-ASCII characters over the Internet,
through Email, IRC or almost any other medium. Despite this, many people regard
UTF-8 in online communication as abusive. It is always best to be aware of the
attitude towards UTF-8 in a specific channel, mailing list or Usenet group
before using <e>non-ASCII</e> UTF-8.

<pre caption="Checking for an existing UTF-8 locale">
<comment>(Replace "en_GB" with your desired locale setting)</comment>
# <i>locale -a | grep 'en_GB'</i>
en_GB
en_GB.UTF-8
</pre>

<p>
From the output of this command, we need the entry with a suffix such as
<c>.UTF-8</c> (glibc may also list it as <c>.utf8</c>). If there is no such
entry, we need to create a UTF-8 compatible locale.
</p>
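<p>
The suffix can also be checked programmatically. The Python sketch below is an
illustration with hypothetical function names. It splits a locale name of the
form <c>language_TERRITORY.codeset@modifier</c> and normalizes the codeset
roughly the way glibc does, by lower-casing it and dropping punctuation. As a
result, both <c>.UTF-8</c> and <c>.utf8</c> are recognized.
</p>

```python
def codeset_of(locale_name: str) -> str:
    """Return the codeset part of e.g. 'en_GB.UTF-8@euro' ('UTF-8'), or ''."""
    if "." not in locale_name:
        return ""
    codeset = locale_name.split(".", 1)[1]
    return codeset.split("@", 1)[0]          # drop an optional @modifier


def is_utf8_locale(locale_name: str) -> bool:
    """True if the codeset normalizes to 'utf8' (UTF-8, utf8, utf-8, ...)."""
    normalized = "".join(c for c in codeset_of(locale_name).lower() if c.isalnum())
    return normalized == "utf8"


if __name__ == "__main__":
    # Names as they might appear in the output of `locale -a`
    for name in ("C", "POSIX", "en_GB", "en_GB.UTF-8", "en_GB.utf8", "en_GB.ISO-8859-1"):
        print(name, is_utf8_locale(name))
```

<p>
To check a real system, you could pass it each line printed by
<c>locale -a</c>, for example after capturing that output with Python's
<c>subprocess</c> module.
</p>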

<note>
Only execute the following code listing if you do not have a UTF-8 locale
available for your language.
</note>

<pre caption="Creating a UTF-8 locale">
<comment>(Replace "en_GB" with your desired locale setting)</comment>
# <i>localedef -i en_GB -f UTF-8 en_GB.UTF-8</i>
</pre>

<p>
Another way to include a UTF-8 locale is to add it to the
<path>/etc/locales.build</path> file and rebuild <c>glibc</c> with the
<c>userlocales</c> USE flag set.
</p>

<pre caption="Line in /etc/locales.build">
en_GB.UTF-8/UTF-8
</pre>
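<p>
The two methods above describe the same locale. To show how they correspond,
the hypothetical Python helper below turns a <path>/etc/locales.build</path>
entry into the equivalent <c>localedef</c> argument list. It assumes entries of
the plain <c>name/charmap</c> form shown above, with no <c>@modifier</c>. It
only builds the list and does not run anything. This is not how the
<c>glibc</c> ebuild itself processes the file.
</p>

```python
def localedef_args(entry: str) -> list:
    """Map an entry such as 'en_GB.UTF-8/UTF-8' to localedef arguments."""
    name, sep, charmap = entry.strip().partition("/")
    if not sep or not name or not charmap:
        raise ValueError(f"not a name/charmap entry: {entry!r}")
    source = name.split(".", 1)[0]          # en_GB.UTF-8 -> en_GB (input definition)
    return ["localedef", "-i", source, "-f", charmap, name]


if __name__ == "__main__":
    print(" ".join(localedef_args("en_GB.UTF-8/UTF-8")))
    # localedef -i en_GB -f UTF-8 en_GB.UTF-8
```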

</body>
</section>
<section>
<title>Setting the Locale</title>
<body>

<p>
There is one environment variable that needs to be set in order to use
our new UTF-8 locales: <c>LC_ALL</c> (this variable also overrides the
<c>LANG</c> setting). There are also many different ways to set it. Some
people prefer to have a UTF-8 environment only for a specific user. In that
case they set it in their <path>~/.profile</path> (if you use
<c>/bin/sh</c>), or in <path>~/.bash_profile</path> or
<path>~/.bashrc</path> (if you use <c>/bin/bash</c>).
</p>
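<p>
<c>LC_ALL</c> alone is enough because of the precedence POSIX defines for
locale variables. A non-empty <c>LC_ALL</c> wins over everything else. Next
comes the variable for the specific category (for example <c>LC_CTYPE</c>),
then <c>LANG</c>, and finally the default <c>C</c> locale. Here is a minimal
Python sketch of that lookup; <c>effective_locale</c> is our own name, not a
library function.
</p>

```python
def effective_locale(category: str, env: dict) -> str:
    """Resolve the locale used for `category` (e.g. 'LC_CTYPE') from `env`.

    Empty values count as unset, as POSIX specifies.
    """
    for variable in ("LC_ALL", category, "LANG"):
        value = env.get(variable, "")
        if value:
            return value
    return "C"   # POSIX default when nothing is set


if __name__ == "__main__":
    env = {"LANG": "en_US.UTF-8", "LC_TIME": "de_DE.UTF-8", "LC_ALL": "en_GB.UTF-8"}
    print(effective_locale("LC_TIME", env))    # en_GB.UTF-8 (LC_ALL overrides)
    del env["LC_ALL"]
    print(effective_locale("LC_TIME", env))    # de_DE.UTF-8
    print(effective_locale("LC_CTYPE", env))   # en_US.UTF-8 (falls back to LANG)
```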

<p>
Others prefer to set the locale globally. One specific circumstance where
the author particularly recommends doing this is when
<path>/etc/init.d/xdm</path> is in use. This init script starts the display
manager and desktop before any of the aforementioned shell startup files are
sourced, and so before any of the variables are in the environment.
</p>

<p>
Setting the locale globally should be done using
<path>/etc/env.d/02locale</path>. The file should look something like the
following:
</p>

<pre caption="Demonstration /etc/env.d/02locale">
<comment>(As always, change "en_GB.UTF-8" to your locale)</comment>
LC_ALL="en_GB.UTF-8"
</pre>

<p>
Next, the environment must be updated with the change.
</p>

<pre caption="Updating the environment">
# <i>env-update</i>
>>> Regenerating /etc/ld.so.cache...
 * Caching service dependencies ...
# <i>source /etc/profile</i>
</pre>

<p>
Now, run <c>locale</c> with no arguments to see if we have the correct
variables in our environment:
</p>

<pre caption="Checking if our new locale is in the environment">
# <i>locale</i>
LANG=
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=en_GB.UTF-8
</pre>
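<p>
On several machines, reading this output by eye gets tedious. The Python
sketch below uses hypothetical function names. It parses text in the format
shown above (<c>KEY=value</c>, with optional quotes) and lists every
<c>LC_*</c> category whose codeset is not UTF-8. On a live system you would
pass it the output of <c>locale</c>, captured for example with
<c>subprocess.run(["locale"], capture_output=True, text=True).stdout</c>.
</p>

```python
def parse_locale_output(text: str) -> dict:
    """Turn `locale` output lines such as LC_CTYPE="en_GB.UTF-8" into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip().strip('"')
    return settings


def non_utf8_categories(settings: dict) -> list:
    """LC_* categories (excluding LC_ALL) whose value is not a UTF-8 locale."""
    bad = []
    for key, value in settings.items():
        if not key.startswith("LC_") or key == "LC_ALL":
            continue
        codeset = value.partition(".")[2].partition("@")[0]
        if codeset.lower().replace("-", "") != "utf8":
            bad.append(key)
    return sorted(bad)


if __name__ == "__main__":
    sample = 'LANG=\nLC_CTYPE="en_GB.UTF-8"\nLC_TIME="POSIX"\nLC_ALL=\n'
    print(non_utf8_categories(parse_locale_output(sample)))   # ['LC_TIME']
```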

<p>
That's everything. You are now using UTF-8 locales, and the next hurdle is the
configuration of the applications you use from day to day.
</p>

</body>
</section>
  your FAT filesystems or Joliet CD-ROMs.)</comment>
</pre>

<p>
If you plan on mounting NTFS partitions, you may need to specify an <c>nls=</c>
option with mount. If you plan on mounting FAT partitions, you may need to
specify a <c>codepage=</c> option with mount. Optionally, you can also set a
default codepage for FAT in the kernel configuration. Note that the
<c>codepage</c> option with mount will override the kernel settings.
</p>

<pre caption="FAT settings in kernel configuration">
File Systems --&gt;
  DOS/FAT/NT Filesystems --&gt;
    (437) Default codepage for fat
</pre>
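<p>
The codepage matters because FAT short (8.3) names are stored as bytes in a
DOS code page, and the same byte means different characters in different code
pages. The Python sketch below uses the standard <c>cp437</c> and
<c>cp850</c> codecs to show what a mismatched codepage does. The helper names
are hypothetical. The sketch is also a simplification: VFAT long names are
stored separately as UTF-16, and their translation is governed by the iocharset
and <c>utf8</c> settings instead.
</p>

```python
def stored_short_name(name: str, codepage: str) -> bytes:
    """Bytes a DOS/Windows system using `codepage` would write for `name`."""
    return name.encode(codepage)


def displayed_name(raw: bytes, codepage: str) -> str:
    """What Linux shows if it decodes those bytes with `codepage`."""
    return raw.decode(codepage)


if __name__ == "__main__":
    raw = stored_short_name("\u00f8", "cp850")          # 'ø' written under codepage 850
    print(raw)                                          # b'\x9b'
    print(displayed_name(raw, "cp850") == "\u00f8")     # True: matching codepage
    print(displayed_name(raw, "cp437") == "\u00a2")     # True: read as '¢' under 437
```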

<p>
You should avoid setting <c>Default iocharset for fat</c> to UTF-8, as this is
not recommended. Instead, you may want to pass the <c>utf8=true</c> option when
mounting your FAT partitions. For further information, see <c>man mount</c> and
the kernel documentation at
<path>/usr/src/linux/Documentation/filesystems/vfat.txt</path>.
</p>

<p>
For changing the encoding of filenames, <c>app-text/convmv</c> can be used.
</p>

<pre caption="Example usage of convmv">
# <i>emerge --ask app-text/convmv</i>
<comment>(Command format)</comment>
# <i>convmv -f &lt;current-encoding&gt; -t utf-8 &lt;filename&gt;</i>
<comment>(Substitute iso-8859-1 with the charset you are converting
from)</comment>
# <i>convmv -f iso-8859-1 -t utf-8 filename</i>
<comment>(By default convmv only prints what it would rename; add --notest to
actually rename the files)</comment>
</pre>
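<p>
Conceptually, converting a filename means decoding its raw bytes with the old
charset and re-encoding them as UTF-8. The Python sketch below shows only that
core step, and its helper names are hypothetical. Unlike convmv, it does not
detect names that are already UTF-8, so running it twice would double-encode
them. Like convmv, it defaults to a dry run that only reports the changes.
</p>

```python
import os


def recode_name(raw: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode one filename from the `source` charset to `target`."""
    return raw.decode(source).encode(target)


def recode_directory(path: str, source: str, dry_run: bool = True) -> list:
    """Rename entries in `path` from `source` to UTF-8; only report if dry_run."""
    directory = os.fsencode(path)           # bytes path -> listdir returns raw names
    changes = []
    for old in os.listdir(directory):
        new = recode_name(old, source)
        if new == old:
            continue                        # nothing to do, e.g. plain ASCII names
        changes.append((old, new))
        if not dry_run:
            os.rename(os.path.join(directory, old), os.path.join(directory, new))
    return changes


if __name__ == "__main__":
    print(recode_name(b"caf\xe9", "iso-8859-1"))   # b'caf\xc3\xa9'
```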

<p>
For changing the <e>contents</e> of files, use the <c>iconv</c> utility,
bundled with <c>glibc</c>:
making the most of Unicode.
</p>

<p>
The <c>KEYMAP</c> variable, set in <path>/etc/conf.d/keymaps</path>, should
have a Unicode keymap specified.
</p>

<pre caption="Example /etc/conf.d/keymaps snippet">
<comment>(Change "uk" to your local layout)</comment>
KEYMAP="uk"
</note>

<p>
It is wise to add <c>unicode</c> to your global USE flags in
<path>/etc/make.conf</path>, and then to remerge <c>sys-libs/ncurses</c> and
<c>sys-libs/slang</c> if appropriate. Portage will do this automatically when
you update your system:
</p>

<pre caption="Updating your system">
# <i>emerge --update --deep --newuse world</i>
</pre>

<p>
Now that the USE changes have been applied, we also need to rebuild packages
that link to these libraries. The tool we use (<c>revdep-rebuild</c>) is part
of the <c>gentoolkit</c> package.
</p>

<pre caption="Rebuilding of programs that link to ncurses or slang">
# <i>revdep-rebuild --soname libncurses.so.5</i>
# <i>revdep-rebuild --soname libslang.so.1</i>
</section>
<section>
<title>X11 and Fonts</title>
<body>

<impo>
<c>x11-base/xorg-x11</c> has far better support for Unicode than XFree86
and is <e>highly</e> recommended.
</impo>

<p>
TrueType fonts have support for Unicode, and most of the fonts that ship with
Xorg have impressive character support, although, obviously, not every single
glyph available in Unicode has been created for that font. To build fonts
(including the Bitstream Vera set) with support for East Asian letters with X,
<section>
<title>Window Managers and Terminal Emulators</title>
<body>

<p>
Window managers not built on GTK or Qt generally have very good Unicode
support, as they often use the Xft library for handling fonts. If your window
manager does not use Xft for fonts, you can still use the FontSpec mentioned in
the previous section as a Unicode font.
</p>

<p>
Terminal emulators that use Xft and support Unicode are harder to come by.
Aside from Konsole and gnome-terminal, the best options in Portage are
<c>x11-terms/rxvt-unicode</c>, <c>xfce-extra/terminal</c>,
<c>gnustep-apps/terminal</c>, <c>x11-terms/mlterm</c>, or plain
<c>x11-terms/xterm</c> when built with the <c>unicode</c> USE flag and invoked
as <c>uxterm</c>. <c>app-misc/screen</c> also supports UTF-8. To enable it,
invoke it as <c>screen -U</c>, or put the following into your
<path>~/.screenrc</path>:
</p>

<pre caption="~/.screenrc for UTF-8">
defutf8 on
</pre>
<section>
<title>Vim, Emacs, Xemacs and Nano</title>
<body>

<p>
Vim provides full UTF-8 support, and also has built-in detection of UTF-8
files. For further information in Vim, use <c>:help mbyte.txt</c>.
</p>

<p>
Emacs 22.x and higher has full UTF-8 support as well. Xemacs 22.x does not
support combining characters yet.
</p>

<p>
Older versions of Emacs and/or Xemacs might require you to install
<c>app-emacs/mule-ucs</c> and/or <c>app-xemacs/mule-ucs</c>. You also need to
add the following code to your <path>~/.emacs</path> to have support for CJK
languages in UTF-8:
</p>

<pre caption="Emacs CJK UTF-8 support">
(require 'un-define)
(require 'jisx0213)
(set-language-environment "Japanese")
(set-default-coding-systems 'utf-8)
(set-terminal-coding-system 'utf-8)
</pre>

<p>
Nano has provided full UTF-8 support since version 1.3.6.
</p>

</body>
</section>
<section>

<p>
The C shell, <c>tcsh</c> and <c>ksh</c> do not provide UTF-8 support at all.
</p>

</body>
</section>
<section>
<title>Irssi</title>
<body>

<p>
Irssi has complete UTF-8 support, although it does require a user
to set an option.
</p>

<pre caption="Enabling UTF-8 in Irssi">
/set term_charset UTF-8
<title>Mutt</title>
<body>

<p>
The Mutt mail user agent has very good Unicode support. To use UTF-8 with
Mutt, you don't need to put anything in your configuration files. Mutt will
work in a Unicode environment without modification, provided all your
configuration files (signature included) are UTF-8 encoded.
</p>
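<p>
To find configuration files that are not yet UTF-8 encoded, it is enough to try
decoding them. The Python sketch below does this, with hypothetical helper
names. The paths <path>~/.muttrc</path> and <path>~/.signature</path> are only
examples; <path>~/.signature</path> is Mutt's usual default signature file.
Pure ASCII files also count as valid UTF-8.
</p>

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_files(paths) -> list:
    """Return the existing files among `paths` that are not valid UTF-8."""
    return [str(p) for p in map(Path, paths)
            if p.is_file() and not is_valid_utf8(p.read_bytes())]


if __name__ == "__main__":
    home = Path.home()
    print(non_utf8_files([home / ".muttrc", home / ".signature"]))
```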

<note>
You may still see '?' in mail you read with Mutt. This happens when the sender
uses a mail client that does not indicate the charset it used. There is little
you can do about this other than asking them to configure their client
correctly.
</note>
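<p>
A correctly configured client declares the charset in the message's
<c>Content-Type</c> header, and that header is what tells the recipient's
client how to decode the body. Python's standard <c>email</c> package
illustrates this. When given non-ASCII text,
<c>EmailMessage.set_content()</c> picks UTF-8 and records it in the header.
This sketch only demonstrates the header mechanism. It says nothing about how
Mutt itself is implemented.
</p>

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "UTF-8 test"
msg.set_content("Gr\u00fc\u00dfe aus K\u00f6ln")      # non-ASCII body text

# set_content() stores the charset in the Content-Type header
print(msg["Content-Type"])
print(msg.get_content_charset())                       # utf-8

# A receiving client parses the raw bytes and decodes them using that charset
parsed = message_from_bytes(msg.as_bytes(), policy=policy.default)
print(parsed.get_content().strip() == "Gr\u00fc\u00dfe aus K\u00f6ln")   # True
```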

<p>
Further information is available from the <uri
link="http://wiki.mutt.org/index.cgi?MuttFaq/Charset">Mutt Wiki</uri>.
</p>

</body>
</section>
<section>
<title>Less</title>
<body>

<p>
We all pipe command output into <c>more</c> or <c>less</c> to read it
comfortably, for example <c>dmesg | less</c>. <c>more</c> only needs the shell
to be UTF-8 aware. <c>less</c>, however, needs the <c>LESSCHARSET</c>
environment variable to be set so that Unicode characters are rendered
correctly. It can be set in <path>/etc/profile</path> or
<path>~/.bash_profile</path>. Fire up the editor of your choice and add the
following line to one of these files.
</p>

<pre caption="Setting up the Environment variable for less">
export LESSCHARSET=utf-8
</pre>
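<p>
Whichever file you use, the variable only reaches <c>less</c> if it is
exported. A plain assignment creates a shell variable that child processes do
not inherit. The Python sketch below demonstrates the difference. It assumes a
POSIX <c>sh</c> is available on the <c>PATH</c>, and it asks a child shell
what value it sees.
</p>

```python
import os
import subprocess

# Child command: print LESSCHARSET as seen by a child process, or "unset"
CHILD = "sh -c 'echo \"${LESSCHARSET:-unset}\"'"


def seen_by_child(setup: str) -> str:
    """Run `setup` in sh, then report LESSCHARSET as a child process sees it."""
    env = {k: v for k, v in os.environ.items() if k != "LESSCHARSET"}
    result = subprocess.run(["sh", "-c", setup + "; " + CHILD],
                            env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(seen_by_child("LESSCHARSET=utf-8"))          # unset
    print(seen_by_child("export LESSCHARSET=utf-8"))   # utf-8
```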

</body>
</section>
<section>
<title>Man</title>
<body>

<p>
Man pages are an integral part of any Linux machine. To ensure that any
Unicode in your man pages renders correctly, edit <path>/etc/man.conf</path>
and replace a line as shown below.
</p>

<pre caption="man.conf changes for Unicode support">
<comment>(This is the old line)</comment>
NROFF /usr/bin/nroff -Tascii -c -mandoc
<comment>(Replace the one above with this)</comment>
NROFF /usr/bin/nroff -mandoc -c
</pre>
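<p>
If you manage several machines, you can script the same edit. The hypothetical
Python helper below removes the <c>-Tascii</c> option from the <c>NROFF</c>
line and leaves every other line untouched. Its result lists <c>-c</c> before
<c>-mandoc</c>, and this option order should not matter compared with the
replacement line above. It also collapses the whitespace on that one line,
including any tabs, into single spaces.
</p>

```python
def drop_tascii(conf_text: str) -> str:
    """Remove '-Tascii' from the NROFF line of man.conf text."""
    lines = []
    for line in conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NROFF" and "-Tascii" in fields:
            line = " ".join(f for f in fields if f != "-Tascii")
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    old = "# nroff settings\nNROFF /usr/bin/nroff -Tascii -c -mandoc\n"
    print(drop_tascii(old), end="")
```

<p>
On a real system you would read <path>/etc/man.conf</path> as root, keep a
backup copy, and write the result back.
</p>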

</body>
</section>
<section>
<title>elinks and links</title>
<body>

<p>
These are commonly used text-based browsers, and we shall see how to enable
UTF-8 support in them. For both <c>elinks</c> and <c>links</c>, there are two
ways to go about this: using the Setup option from within the browser, or
editing the config file. To set the option through the browser, open a site
with <c>elinks</c> or <c>links</c> and press <c>Alt+S</c> to enter the Setup
Menu. Then select Terminal options, or press <c>T</c>. Scroll down and select
the last option, <c>UTF-8 I/O</c>, by pressing Enter. Then save and exit the
menu. On <c>links</c> you may have to press <c>Alt+S</c> again and then press
<c>S</c> to save. The config file option is shown below.
</p>

<pre caption="Enabling UTF-8 for elinks/links">
<comment>(For elinks, edit /etc/elinks/elinks.conf or ~/.elinks/elinks.conf and
add the following line)</comment>
set terminal.linux.utf_8_io = 1

<comment>(For links, edit ~/.links/links.cfg and add the following
line)</comment>
terminal "xterm" 0 1 0 us-ascii utf-8
</pre>

</body>
</section>
<section>
<title>Samba</title>
<body>

<p>
Samba is a software suite which implements the SMB (Server Message Block)
protocol for UNIX systems such as Macs, Linux and FreeBSD. The protocol
is also sometimes referred to as the Common Internet File System (CIFS). Samba
also includes the NetBIOS system, used for file sharing over Windows networks.
</p>

<pre caption="Enabling UTF-8 for Samba">
<comment>(Edit /etc/samba/smb.conf and add the following under the [global] section)</comment>
dos charset = 1255
unix charset = UTF-8
display charset = UTF-8
</pre>

</body>
</section>
<section>
<title>Testing it all out</title>
layout, or another layout where dead keys do not seem to be working. European
users should have working dead keys as is.
</note>

<p>
This change will come into effect when your X server is restarted. To apply the
change now, use the <c>setxkbmap</c> tool, for example, <c>setxkbmap en_US</c>.
</p>

<p>
It is probably easiest to describe dead keys with examples. Although the
results are locale dependent, the concepts should remain the same regardless of
locale. The examples contain UTF-8, so to view them you need to either tell
your browser to view the page as UTF-8, or have a UTF-8 locale already
configured.
</p>

<p>
When I press AltGr and [ at once, release them, and then press a, 'ä' is
produced. When I press AltGr and [ at once, release them, and then press e,
'ë' is produced. When I press AltGr and ; at once, release them, and then
press a, 'á' is produced, and when I press AltGr and ; at once, release them,
and then press e, 'é' is produced.
</p>
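<p>
In Unicode terms, a dead key attaches an accent to the letter typed after it.
Python's <c>unicodedata</c> module can show the relationship. A base letter
followed by a combining mark normalizes (NFC) to the single precomposed
character that the dead key produces. This only illustrates the characters
involved. X itself handles dead keys through its keyboard and compose tables,
not through this normalization step.
</p>

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"
COMBINING_ACUTE = "\u0301"
COMBINING_RING_ABOVE = "\u030a"


def compose(letter: str, mark: str) -> str:
    """Combine a base letter and a combining mark into NFC (precomposed) form."""
    return unicodedata.normalize("NFC", letter + mark)


if __name__ == "__main__":
    for letter, mark in (("a", COMBINING_DIAERESIS), ("e", COMBINING_ACUTE),
                         ("a", COMBINING_RING_ABOVE)):
        ch = compose(letter, mark)
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```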

<p>
By pressing AltGr, Shift and [ at once, releasing them, and then pressing a, a
Scandinavian 'å' is produced. Similarly, when I press AltGr, Shift and [ at
</p>

<p>
AltGr can also be used with alphabetical keys alone. For example, AltGr and m
produce a Greek lower-case letter mu: 'µ'. AltGr and s produce a scharfes s
or Eszett: 'ß'. As many European users would expect (because it is marked on
their keyboard), AltGr and 4 (or E, depending on the keyboard layout) produce
a Euro sign, '€'.
</p>
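<p>
Each of these characters takes more than one byte in UTF-8. The Python sketch
below prints the code point, Unicode name and UTF-8 bytes of each one. It also
illustrates a subtle point. The 'µ' produced by AltGr and m is, on many
layouts, U+00B5 MICRO SIGN. That character looks like U+03BC GREEK SMALL LETTER
MU but is distinct from it, and the two only match after NFKC normalization.
</p>

```python
import unicodedata


def describe(ch: str) -> str:
    """Code point, Unicode name and UTF-8 bytes of a single character."""
    encoded = ch.encode("utf-8")
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {encoded.hex(' ')} ({len(encoded)} bytes)"


if __name__ == "__main__":
    for ch in ("\u00b5", "\u03bc", "\u00df", "\u20ac"):   # micro, mu, sharp s, euro
        print(describe(ch))
    # MICRO SIGN is a compatibility character for GREEK SMALL LETTER MU
    print(unicodedata.normalize("NFKC", "\u00b5") == "\u03bc")   # True
```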

</body>
</section>
<section>
<title>Resources</title>
<body>

<ul>
  <li>
    <uri link="http://en.wikipedia.org/wiki/Unicode">The Wikipedia entry for
    Unicode</uri>
  </li>
  <li>
    <uri link="http://en.wikipedia.org/wiki/UTF-8">The Wikipedia entry for
    UTF-8</uri>
  </li>
  <li><uri link="http://www.unicode.org">Unicode.org</uri></li>
  <li><uri link="http://www.utf-8.com">UTF-8.com</uri></li>
  <li><uri link="http://www.ietf.org/rfc/rfc3629.txt">RFC 3629</uri></li>
  <li><uri link="http://www.ietf.org/rfc/rfc2277.txt">RFC 2277</uri></li>
  <li>
    <uri
    link="http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF">Characters vs.
    Bytes</uri>
  </li>
</ul>

</body>
</section>
</chapter>
