<?xml version="1.0" encoding="UTF-8"?>
<!-- Revision 1.1, Wed Jul 13 05:55:39 2005 UTC, by fox2mike: -->
<!-- Initial version of the debugging guide, originally part of the bugzilla-howto. -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
<!-- $Header$ -->

<guide link="/doc/en/debugging-howto.xml">
<title>Gentoo Linux Debugging Guide</title>

<author title="Author">
<mail link="chriswhite@gentoo.org">Chris White</mail>
</author>
<author title="Editor">
<mail link="fox2mike@gentoo.org">Shyam Mani</mail>
</author>

<abstract>
This document aims to help users debug the various errors they may encounter
during day-to-day usage of Gentoo.
</abstract>

<!-- The content of this document is licensed under the CC-BY-SA license -->
<!-- See http://creativecommons.org/licenses/by-sa/2.5 -->
<license/>

<version>1.0</version>
<date>2005-07-13</date>

<chapter>
<title>Introduction</title>
<section>
<title>Preface</title>
<body>

<p>
One of the factors that delays a bug fix is the way the bug is reported. With
this guide, we hope to improve the communication between developers and users
during bug resolution. Getting bugs fixed is an important, if not crucial, part
of quality assurance for any project, and hopefully this guide will help make
that a success.
</p>

</body>
</section>
<section>
<title>Bugs!!!!</title>
<body>

<p>
You're emerge-ing a package or working with a program and suddenly the worst
happens -- you find a bug. Bugs come in many forms, such as emerge failures or
segmentation faults. Whatever the cause, the fact remains that such a bug
must be fixed. Here are a few examples of such bugs.
</p>

<pre caption="A run time error">
$ <i>./bad_code `perl -e 'print Ax100'`</i>
Segmentation fault
</pre>

<pre caption="An emerge failure">
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/include/g++-v3/backward/backward_warning.h:32:2:
warning: #warning This file includes at least one deprecated or antiquated
header. Please consider using one of the 32 headers found in section 17.4.1.2 of
the C++ standard. Examples include substituting the &lt;X&gt; header for the &lt;X.h&gt;
header for C++ includes, or &lt;sstream&gt; instead of the deprecated header
&lt;strstream.h&gt;. To disable this warning use -Wno-deprecated.
In file included from main.cc:40:
menudef.h:55: error: brace-enclosed initializer used to initialize `
OXPopupMenu*'
menudef.h:62: error: brace-enclosed initializer used to initialize `
OXPopupMenu*'
menudef.h:70: error: brace-enclosed initializer used to initialize `
OXPopupMenu*'
menudef.h:78: error: brace-enclosed initializer used to initialize `
OXPopupMenu*'
main.cc: In member function `void OXMain::DoOpen()':
main.cc:323: warning: unused variable `FILE*fp'
main.cc: In member function `void OXMain::DoSave(char*)':
main.cc:337: warning: unused variable `FILE*fp'
make[1]: *** [main.o] Error 1
make[1]: Leaving directory
`/var/tmp/portage/xclass-0.7.4/work/xclass-0.7.4/example-app'
make: *** [shared] Error 2

!!! ERROR: x11-libs/xclass-0.7.4 failed.
!!! Function src_compile, Line 29, Exitcode 2
!!! 'emake shared' failed
</pre>

<p>
These errors can be quite troublesome. However, once you find them, what do
you do? The following sections look at two important tools for handling
run time errors. After that, we'll take a look at compile errors and how to
handle them. Let's start with the first tool for debugging run time
errors -- <c>gdb</c>.
</p>

</body>
</section>
</chapter>


<chapter>
<title>Debugging using GDB</title>
<section>
<title>Introduction</title>
<body>

<p>
GDB, or the (G)NU (D)e(B)ugger, is a program used to find run time errors that
normally involve memory corruption. First off, let's take a look at what
debugging entails. One of the main things you must do in order to debug a
program is to <c>emerge</c> the program with <c>FEATURES="nostrip"</c>. This
prevents the stripping of debug symbols. Why are programs stripped by default?
The reason is the same as that for having gzipped man pages -- saving space.
Here's how the size of a program varies with and without debug symbol stripping.
</p>

<pre caption="Filesize Comparison">
<comment>(debug symbols stripped)</comment>
-rwxr-xr-x 1 chris users 3140 6/28 13:11 bad_code
<comment>(debug symbols intact)</comment>
-rwxr-xr-x 1 chris users 6374 6/28 13:10 bad_code
</pre>

<p>
Just for reference, <e>bad_code</e> is the program we'll be debugging with
<c>gdb</c> later on. As you can see, the program without debugging symbols is
3140 bytes, while the program with them is 6374 bytes. That's slightly more than
double the size! Two more things can be done for debugging. The first is adding
<c>-ggdb3</c> to your CFLAGS and CXXFLAGS. This flag adds more debugging
information than is generally included. We'll see what that means later on. This
is how <path>/etc/make.conf</path> <e>might</e> look with the newly added flags.
</p>

<pre caption="make.conf settings">
CFLAGS="-O2 -pipe -ggdb3"
CXXFLAGS="${CFLAGS}"
</pre>

<p>
Lastly, you can also enable the debug USE flag for the package. This can be done
with the <path>package.use</path> file.
</p>

<pre caption="Using package.use to add debug USE flag">
# <i>echo "category/package debug" >> /etc/portage/package.use</i>
</pre>

<note>
The directory <path>/etc/portage</path> does not exist by default, and you may
have to create it if you have not already done so. If the package already has
USE flags set in <path>package.use</path>, you will need to modify them
manually in your favorite editor.
</note>
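
<p>
The two caveats in the note can be folded into a small shell helper. This is
only a sketch: the function name <c>add_debug_use</c> and its two arguments are
made up for illustration and are not part of portage. It creates the directory
if needed and refuses to append a second entry for a package that is already
listed.
</p>

<pre caption="A sketch of a guarded package.use update">
```shell
# Hypothetical helper (not a portage tool): append "atom debug" to a
# package.use file unless the atom already has an entry there.
add_debug_use() {
    use_file=$1
    atom=$2
    # Create the parent directory (for example /etc/portage) if it is missing.
    mkdir -p "$(dirname "$use_file")"
    touch "$use_file"
    # Compare the first field of each line with the atom, exactly.
    if awk -v a="$atom" '$1 == a { found = 1 } END { exit !found }' "$use_file"; then
        echo "$atom is already listed in $use_file; edit that line by hand"
        return 1
    fi
    echo "$atom debug" >> "$use_file"
}
```
</pre>

<p>
As root, it could be called as
<c>add_debug_use /etc/portage/package.use category/package</c>.
</p>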

<p>
Then we re-emerge the package with the modifications we've made so far, as shown
below.
</p>

<pre caption="Re-emerging a package with debugging">
# <i>FEATURES="nostrip" emerge package</i>
</pre>

<p>
Now that debug symbols are set up, we can continue with debugging the program.
</p>

</body>
</section>
<section>
<title>Running the program with GDB</title>
<body>

<p>
Let's say we have a program called "bad_code". Someone claims that the
program crashes and provides an example. You go ahead and test it out:
</p>

<pre caption="Breaking The Program">
$ <i>./bad_code `perl -e 'print Ax100'`</i>
Segmentation fault
</pre>

<p>
It seems this person was right. Since the program is obviously broken, we have
a bug at hand. Now it's time to use <c>gdb</c> to help solve this matter. First
we run <c>gdb</c> with <c>--args</c>, followed by the full program command line
with its arguments, as shown:
</p>

<pre caption="Running Our Program Through GDB">
$ <i>gdb --args ./bad_code `perl -e 'print Ax100'`</i>
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
</pre>

<note>
One can also debug with core dumps. A core file records the state of the
program at the moment it crashed, so it gives <c>gdb</c> the same information it
would have when running the program itself. To debug bad_code with a core file,
you would run <c>gdb ./bad_code core</c>, where core is the name of the core
file.
</note>

<p>
You should see a prompt that says "(gdb)" and waits for input. First, we have to
run the program. We type <c>run</c> at the prompt and receive a notice like:
</p>

<pre caption="Running the program in GDB">
(gdb) <i>run</i>
Starting program: /home/chris/bad_code

Program received signal SIGSEGV, Segmentation fault.
0xb7ec6dc0 in strcpy () from /lib/libc.so.6
</pre>

<p>
Here we see the program starting, as well as a notification of SIGSEGV, or
Segmentation Fault. This is GDB telling us that our program has crashed. It
also gives the last function it could trace when the program crashed.
However, this isn't too useful, as there could be several strcpy() calls in the
program, making it hard for developers to find which one is causing the issue.
In order to help them out, we do what's called a backtrace. A backtrace walks
backwards through the chain of function calls that were active when the program
crashed, starting at the function at fault. Functions that have already returned
(without causing a crash) do not show up in the backtrace. To get a backtrace,
type <c>bt</c> at the (gdb) prompt. You will get something like this:
</p>

<pre caption="Program backtrace">
(gdb) <i>bt</i>
#0 0xb7ec6dc0 in strcpy () from /lib/libc.so.6
#1 0x0804838c in run_it ()
#2 0x080483ba in main ()
</pre>

<p>
The call chain is clearly visible. main() is called first, followed by
run_it(), and somewhere in run_it() lies the strcpy() at fault. Information such
as this helps developers narrow down problems. There are a few exceptions to
this output. The first is forgetting to keep debug symbols with
<c>FEATURES="nostrip"</c>. With debug symbols stripped, the output looks
something like this:
</p>

<pre caption="Program backtrace With debug symbols stripped">
(gdb) <i>bt</i>
#0 0xb7e2cdc0 in strcpy () from /lib/libc.so.6
#1 0x0804838c in ?? ()
#2 0xbfd19510 in ?? ()
#3 0x00000000 in ?? ()
#4 0x00000000 in ?? ()
#5 0xb7eef148 in libgcc_s_personality () from /lib/libc.so.6
#6 0x080482ed in ?? ()
#7 0x080495b0 in ?? ()
#8 0xbfd19528 in ?? ()
#9 0xb7dd73b8 in __guard_setup () from /lib/libc.so.6
#10 0xb7dd742d in __guard_setup () from /lib/libc.so.6
#11 0x00000006 in ?? ()
#12 0xbfd19548 in ?? ()
#13 0x080483ba in ?? ()
#14 0x00000000 in ?? ()
#15 0x00000000 in ?? ()
#16 0xb7deebcc in __new_exitfn () from /lib/libc.so.6
#17 0x00000000 in ?? ()
#18 0xbfd19560 in ?? ()
#19 0xb7ef017c in nullserv () from /lib/libc.so.6
#20 0xb7dd6f37 in __libc_start_main () from /lib/libc.so.6
#21 0x00000001 in ?? ()
#22 0xbfd195d4 in ?? ()
#23 0xbfd195dc in ?? ()
#24 0x08048201 in ?? ()
</pre>

<p>
This backtrace contains a large number of ?? entries. Without debug symbols,
<c>gdb</c> cannot map the addresses back to function names. Hence, it is
crucial that debug symbols are <e>not</e> stripped. Now, remember that a while
ago we mentioned the -ggdb3 flag. Let's see what the output looks like with the
flag enabled:
</p>

<pre caption="Program backtrace with -ggdb3">
(gdb) <i>bt</i>
#0 0xb7e4bdc0 in strcpy () from /lib/libc.so.6
#1 0x0804838c in run_it (input=0x0) at bad_code.c:7
#2 0x080483ba in main (argc=1, argv=0xbfd3a434) at bad_code.c:12
</pre>

<p>
Here we see that a lot more information is available for developers. Not only is
function information displayed, but also the exact line numbers in the source
files. This is the preferred method if you can spare the extra space.
Here's how much the file size varies between stripped, debug, and -ggdb3 enabled
programs.
</p>

<pre caption="Filesize differences With -ggdb3 flag">
<comment>(debug symbols stripped)</comment>
-rwxr-xr-x 1 chris users 3140 6/28 13:11 bad_code
<comment>(debug symbols enabled)</comment>
-rwxr-xr-x 1 chris users 6374 6/28 13:10 bad_code
<comment>(-ggdb3 flag enabled)</comment>
-rwxr-xr-x 1 chris users 19552 6/28 13:11 bad_code
</pre>
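
<p>
The growth between the three builds listed above can be checked with plain shell
arithmetic. The variable and function names below are only illustrative; the
byte counts are the ones from the listing.
</p>

<pre caption="Working out the size differences">
```shell
# Sizes in bytes, taken from the listing above.
stripped=3140
symbols=6374
ggdb3=19552

# Integer percentage by which size $2 exceeds size $1 (rounded down).
growth_pct() {
    echo $(( ($2 - $1) * 100 / $1 ))
}

echo "symbols vs stripped: +$(growth_pct "$stripped" "$symbols")%"
echo "-ggdb3 vs symbols: +$(( ggdb3 - symbols )) bytes"
```
</pre>

<p>
This prints a growth of 102% for keeping the symbols, and 13178 extra bytes for
-ggdb3 on top of that.
</p>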

<p>
As you can see, -ggdb3 adds <e>13178</e> more bytes to the file size over the
one with debugging symbols. However, as shown above, this increase in file size
can be worth it when presenting debug information to developers. The backtrace
can be saved to a file by copying and pasting from the terminal. (On a non-X
terminal, you can use gpm. To keep this document simple, we recommend reading
the gpm documentation to see how to copy and paste with it.) Now that we're
done with <c>gdb</c>, we can quit.
</p>

<pre caption="Quitting GDB">
(gdb) <i>quit</i>
The program is running. Exit anyway? (y or n) <i>y</i>
$
</pre>
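
<p>
As an alternative to copying and pasting, the whole session above (run, bt,
quit) can be driven non-interactively and redirected to a file. The sketch
below assumes a <c>gdb</c> that understands the <c>-batch</c> and <c>-ex</c>
options; older releases may lack <c>-ex</c>. The function name
<c>save_backtrace</c> and the <c>GDB</c> override variable are made up for this
example.
</p>

<pre caption="A sketch of saving a backtrace without copy and paste">
```shell
# Hypothetical helper: run a command under gdb in batch mode and write what
# gdb prints on standard output, including the backtrace, to the file $1.
# Set GDB to use a gdb binary other than the one found in PATH.
save_backtrace() {
    out=$1
    shift
    # -ex run starts the program, -ex bt prints the backtrace once it stops.
    "${GDB:-gdb}" -batch -ex run -ex bt --args "$@" > "$out"
}
```
</pre>

<p>
For the example program, this could be called as
<c>save_backtrace backtrace.txt ./bad_code</c> followed by the arguments that
trigger the crash.
</p>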

<p>
This ends the walk-through of <c>gdb</c>. We hope that you will be able to use
it to create better bug reports. However, there are other types of errors that
can cause a program to fail at run time. One of them is improper file access.
We can find those using a nifty little tool called <c>strace</c>.
</p>

</body>
</section>
</chapter>

<chapter>
<title>Finding file access errors using strace</title>
<section>
<title>Introduction</title>
<body>

<p>
Programs often use files to fetch configuration information, access hardware, or
write logs. Sometimes, a program attempts to reach such files incorrectly. A
tool called <c>strace</c> was created to help deal with this. <c>strace</c>
traces system calls (hence the name), which include calls that deal with memory
and files. For our example, we're going to take a program called foobar2. This
is an updated version of foobar. However, during the changeover to foobar2, you
notice that all your configuration is missing! In foobar version 1, you had it
set up to say "foo", but now it's using the default "bar".
</p>

<pre caption="Foobar2 With an invalid configuration">
$ <i>./foobar2</i>
Configuration says: bar
</pre>

<p>
Our previous configuration specifically had it set to foo, so let's use
<c>strace</c> to find out what's going on.
</p>

</body>
</section>
<section>
<title>Using strace to track the issue</title>
<body>

<p>
We make <c>strace</c> log the results of the system calls. To do this, we run
<c>strace</c> with the <c>-o[file]</c> option. Let's use it on foobar2 as shown.
</p>

<pre caption="Running foobar2 through strace">
# <i>strace -ostrace.log ./foobar2</i>
</pre>

<p>
This creates a file called <path>strace.log</path> in the current directory. We
check the file; the relevant parts are shown below.
</p>

<pre caption="A Look At the strace Log">
open(".foobar2/config", O_RDONLY) = 3
read(3, "bar", 3) = 3
</pre>

<p>
Aha! So there's the problem. Someone moved the configuration directory to
<path>.foobar2</path> instead of <path>.foobar</path>. We also see the program
reading in "bar" as it should. In this case, we can suggest that the ebuild
maintainer add a warning about it. For now though, we can copy over the
config file from <path>.foobar</path> and modify it to produce the correct
results.
</p>
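
<p>
A real <path>strace.log</path> is usually much longer than the two lines shown
above, so it helps to filter it. The following sketch pulls out the lines for
file opens, and separately those that failed. The function names are invented
for this example, and it assumes a log written without <c>-f</c>, so that each
line starts with the name of the system call.
</p>

<pre caption="A sketch of filtering strace.log for file opens">
```shell
# Hypothetical helper: print every open() or openat() call in an strace log.
list_opens() {
    grep -E '^(open|openat)\(' "$1"
}

# Hypothetical helper: print only the opens that failed. strace reports a
# failed call with a return value of -1 followed by the error name.
failed_opens() {
    list_opens "$1" | grep -F ' = -1 '
}
```
</pre>

<p>
Running <c>list_opens strace.log</c> on the log from our example would show the
open of <path>.foobar2/config</path>, which is the line that gave the problem
away.
</p>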

</body>
</section>
<section>
<title>Conclusion</title>
<body>

<p>
Now we've taken care of finding run time bugs. These bugs prove to be
problematic when you try to run your programs. However, run time errors are
the least of your concerns if your program won't compile at all. Let's take a
look at how to address <c>emerge</c> compile errors.
</p>

</body>
</section>
</chapter>

<chapter>
<title>Handling emerge Errors</title>
<section>
<title>Introduction</title>
<body>

<p>
<c>emerge</c> errors, such as the one displayed earlier, can be a major cause
of frustration for users. Reporting them is considered crucial for maintaining
the health of Gentoo. Let's take a look at a sample ebuild, foobar2, which
contains some build errors.
</p>

</body>
</section>
<section id="emerge_error">
<title>Evaluating emerge Errors</title>
<body>

<p>
Let's take a look at this very simple <c>emerge</c> error:
</p>

<pre caption="emerge Error">
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-7.o foobar2-7.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-8.o foobar2-8.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-9.o foobar2-9.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2.o foobar2.c
foobar2.c:1:17: ogg.h: No such file or directory
make: *** [foobar2.o] Error 1

!!! ERROR: sys-apps/foobar2-1.0 failed.
!!! Function src_compile, Line 19, Exitcode 2
!!! Make failed!
!!! If you need support, post the topmost build error, NOT this status message
</pre>

<p>
The program is compiling smoothly when it suddenly stops and presents an error
message. This particular error can be split into three sections: the
compilation messages, the build error, and the emerge error message, as shown
below.
</p>

<pre caption="Parts of the error">
<comment>(Compilation Messages)</comment>
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-7.o foobar2-7.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-8.o foobar2-8.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-9.o foobar2-9.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2.o foobar2.c

<comment>(Build Error)</comment>
foobar2.c:1:17: ogg.h: No such file or directory
make: *** [foobar2.o] Error 1

<comment>(emerge Error)</comment>
!!! ERROR: sys-apps/foobar2-1.0 failed.
!!! Function src_compile, Line 19, Exitcode 2
!!! Make failed!
!!! If you need support, post the topmost build error, NOT this status message
</pre>

<p>
The compilation messages are what lead up to the error. It's usually good to
include at least 10 lines of compile information so that the developer knows
where the compilation was when the error occurred.
</p>

<p>
Make errors are the actual error and the information the developer needs. When
you see "make: ***", this is often where the error has occurred. Normally, you
can copy and paste the 10 lines above it and the developer will be able to
address the issue. However, this may not always work, and we'll take a look at
an alternative shortly.
</p>

<p>
The emerge error is what <c>emerge</c> throws out as an error. Sometimes, this
might also contain some important information. People often make the mistake of
posting only the emerge error. This is useless by itself, but together with the
make error and compile information, it tells a developer which application and
which version of the package is failing. As a side note, make is commonly used
as the build system for programs (<b>but not always</b>). If you can't find a
"make: ***" error anywhere, then simply copy and paste the 20 lines before the
emerge error. This should take care of almost all build system error messages.
Now let's say the errors seem to be quite large, and 10 lines won't be enough to
catch everything. That's where PORT_LOGDIR comes into play.
</p>
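
<p>
If the build output has been saved to a file, the rule of thumb above can be
applied mechanically. The sketch below is one way to do it; the function name
<c>build_error_context</c> is invented here, and the line counts are simply the
10 and 20 suggested in this section.
</p>

<pre caption="A sketch of extracting the lines around a build error">
```shell
# Hypothetical helper: print the first "make: ***" line of a saved build log
# together with up to $2 lines before it (default 10). If there is no make
# error, fall back to the 20 lines leading up to the first "!!! ERROR" line.
build_error_context() {
    log=$1
    before=${2:-10}
    # grep -n prefixes each match with its line number; keep the first one.
    line=$(grep -n '^make.*\*\*\*' "$log" | head -n 1 | cut -d: -f1)
    if [ -z "$line" ]; then
        line=$(grep -n '^!!! ERROR' "$log" | head -n 1 | cut -d: -f1)
        before=20
    fi
    # Neither kind of error was found in the log.
    [ -n "$line" ] || return 1
    start=$(( line - before ))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    sed -n "${start},${line}p" "$log"
}
```
</pre>

<p>
The pattern also matches nested errors such as "make[1]: ***", so the first
failure in the log is the one that gets reported.
</p>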

</body>
</section>
<section>
<title>emerge and PORT_LOGDIR</title>
<body>

<p>
PORT_LOGDIR is a portage variable that sets up a log directory for separate
emerge logs. Let's take a look and see what that entails. First, run your emerge
with PORT_LOGDIR set to your favorite log location. Let's say we have a
location <path>/var/log/portage</path>. We'll use that for our log directory:
</p>

<note>
In the default setup, <path>/var/log/portage</path> does not exist, and you will
most likely have to create it. If you do not, portage will fail to write the
logs.
</note>

<pre caption="emerge-ing With PORT_LOGDIR">
# <i>PORT_LOGDIR=/var/log/portage emerge foobar2</i>
</pre>
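
<p>
Since a missing log directory means no logs, creating the directory and running
the command can be combined in one step. The wrapper below is a sketch only;
<c>run_with_logdir</c> is a name made up for this example.
</p>

<pre caption="A sketch of creating the log directory before emerge-ing">
```shell
# Hypothetical wrapper: make sure the log directory $1 exists, then run the
# remaining arguments as a command with PORT_LOGDIR pointing at it.
run_with_logdir() {
    logdir=$1
    shift
    mkdir -p "$logdir" || return 1
    PORT_LOGDIR=$logdir "$@"
}
```
</pre>

<p>
As root, the command above would become
<c>run_with_logdir /var/log/portage emerge foobar2</c>.
</p>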

<p>
Now the emerge fails again. However, this time we have a log we can work with
and attach to the bug later on. Let's take a quick look at our log directory.
</p>

<pre caption="PORT_LOGDIR Contents">
# <i>ls -la /var/log/portage</i>
total 16
drwxrws--- 2 root root 4096 Jun 30 10:08 .
drwxr-xr-x 15 root root 4096 Jun 30 10:08 ..
-rw-r--r-- 1 root root 7390 Jun 30 10:09 2115-foobar2-1.0.log
</pre>

<p>
The log files have the format [counter]-[package name]-[version].log. Counter
is a special variable that marks this package as the n-th package
you've emerged. This prevents duplicate logs from appearing. A quick look at
the log file will show the entire emerge process. This can be attached later
on, as we'll see in the bug reporting section. Now that we've safely obtained
the information needed to report the bug, we can continue to do so. However,
before we get started on that, we need to make sure no one else has reported
the issue.
</p>
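
<p>
Once the log directory fills up, finding the right file to attach takes a
moment. Given the [counter]-[package name]-[version].log format described
above, the log with the highest counter for a package can be picked out as
sketched below. The function name <c>latest_log</c> is invented for this
example, and it assumes that a higher counter means a more recent emerge.
</p>

<pre caption="A sketch of finding the newest log for a package">
```shell
# Hypothetical helper: print the name of the log file in directory $1 that
# has the highest counter for package name $2. Prints nothing if none match.
latest_log() {
    dir=$1
    pkg=$2
    for f in "$dir"/*-"$pkg"-*.log; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$f" ] || continue
        echo "${f##*/}"
    done | sort -n | tail -n 1
}
```
</pre>

<p>
With the directory listing shown earlier, <c>latest_log /var/log/portage
foobar2</c> would print 2115-foobar2-1.0.log.
</p>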

</body>
</section>
</chapter>
</guide>
