
Contents of /xml/htdocs/doc/en/draft/debugging-howto.xml

Revision 1.1
Wed Jul 13 05:55:39 2005 UTC (9 years, 2 months ago) by fox2mike
Branch: MAIN
File MIME type: application/xml
Initial Version of the debugging-guide, originally part of the bugzilla-howto.

1 <?xml version="1.0" encoding="UTF-8"?>
2 <!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
3 <!-- $Header$ -->
4
5 <guide link="/doc/en/debugging-howto.xml">
6 <title>Gentoo Linux Debugging Guide</title>
7
8 <author title="Author">
9 <mail link="chriswhite@gentoo.org">Chris White</mail>
10 </author>
11 <author title="Editor">
12 <mail link="fox2mike@gentoo.org">Shyam Mani</mail>
13 </author>
14
15 <abstract>
16 This document aims to help users debug various errors they may encounter
17 during day-to-day usage of Gentoo.
18 </abstract>
19
20 <!-- The content of this document is licensed under the CC-BY-SA license -->
21 <!-- See http://creativecommons.org/licenses/by-sa/2.5 -->
22 <license/>
23
24 <version>1.0</version>
25 <date>2005-07-13</date>
26
27 <chapter>
28 <title>Introduction</title>
29 <section>
30 <title>Preface</title>
31 <body>
32
33 <p>
34 One of the factors that delay a bug being fixed is the way it is reported. By
35 creating this guide, we hope to help improve the communication between
36 developers and users in bug resolution. Getting bugs fixed is an important, if
37 not crucial, part of the quality assurance for any project and hopefully this
38 guide will help make that a success.
39 </p>
40
41 </body>
42 </section>
43 <section>
44 <title>Bugs!!!!</title>
45 <body>
46
47 <p>
48 You're emerge-ing a package or working with a program and suddenly the worst
49 happens -- you find a bug. Bugs come in many forms like emerge failures or
50 segmentation faults. Whatever the cause, the fact still remains that such a bug
51 must be fixed. Here are a few examples of such bugs.
52 </p>
53
54 <pre caption="A run time error">
55 $ <i>./bad_code `perl -e 'print Ax100'`</i>
56 Segmentation fault
57 </pre>
58
59 <pre caption="An emerge failure">
60 /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/include/g++-v3/backward/backward_warning.h:32:2:
61 warning: #warning This file includes at least one deprecated or antiquated
62 header. Please consider using one of the 32 headers found in section 17.4.1.2 of
63 the C++ standard. Examples include substituting the &lt;X&gt; header for the &lt;X.h&gt;
64 header for C++ includes, or &lt;sstream&gt; instead of the deprecated header
65 &lt;strstream.h&gt;. To disable this warning use -Wno-deprecated.
66 In file included from main.cc:40:
67 menudef.h:55: error: brace-enclosed initializer used to initialize `
68 OXPopupMenu*'
69 menudef.h:62: error: brace-enclosed initializer used to initialize `
70 OXPopupMenu*'
71 menudef.h:70: error: brace-enclosed initializer used to initialize `
72 OXPopupMenu*'
73 menudef.h:78: error: brace-enclosed initializer used to initialize `
74 OXPopupMenu*'
75 main.cc: In member function `void OXMain::DoOpen()':
76 main.cc:323: warning: unused variable `FILE*fp'
77 main.cc: In member function `void OXMain::DoSave(char*)':
78 main.cc:337: warning: unused variable `FILE*fp'
79 make[1]: *** [main.o] Error 1
80 make[1]: Leaving directory
81 `/var/tmp/portage/xclass-0.7.4/work/xclass-0.7.4/example-app'
82 make: *** [shared] Error 2
83
84 !!! ERROR: x11-libs/xclass-0.7.4 failed.
85 !!! Function src_compile, Line 29, Exitcode 2
86 !!! 'emake shared' failed
87 </pre>
88
89 <p>
90 These errors can be quite troublesome. However, once you find them, what do
91 you do? The following sections will look at two important tools for handling
92 run time errors. After that, we'll take a look at compile errors, and how to
93 handle them. Let's start out with the first tool for debugging run time
94 errors -- <c>gdb</c>.
95 </p>
96
97 </body>
98 </section>
99 </chapter>
100
101
102 <chapter>
103 <title>Debugging using GDB</title>
104 <section>
105 <title>Introduction</title>
106 <body>
107
108 <p>
109 GDB, or the (G)NU (D)e(B)ugger, is a program used to find run time errors that
110 normally involve memory corruption. First off, let's take a look at what
111 debugging entails. One of the main things you must do in order to debug a
112 program is to <c>emerge</c> the program with <c>FEATURES="nostrip"</c>. This
113 prevents the stripping of debug symbols. Why are programs stripped by default?
114 The reason is the same as that for having gzipped man pages -- saving space.
115 Here's how the size of a program varies with and without debug symbol stripping.
116 </p>
117
118 <pre caption="Filesize Comparison">
119 <comment>(debug symbols stripped)</comment>
120 -rwxr-xr-x 1 chris users 3140 6/28 13:11 bad_code
121 <comment>(debug symbols intact)</comment>
122 -rwxr-xr-x 1 chris users 6374 6/28 13:10 bad_code
123 </pre>
124
125 <p>
126 Just for reference, <e>bad_code</e> is the program we'll be debugging with
127 <c>gdb</c> later on. As you can see, the program without debugging symbols is
128 3140 bytes, while the program with them is 6374 bytes. That's close to double
129 the size! Two more things can be done for debugging. The first is adding -ggdb3
130 to your CFLAGS and CXXFLAGS. This flag adds more debugging information than is
131 generally included. We'll see what that means later on. This is how
132 <path>/etc/make.conf</path> <e>might</e> look with the newly added flags.
133 </p>
134
135 <pre caption="make.conf settings">
136 CFLAGS="-O2 -pipe -ggdb3"
137 CXXFLAGS="${CFLAGS}"
138 </pre>
139
140 <p>
141 Lastly, you can also add debug to the package's USE flags. This can be done with the
142 <path>package.use</path> file.
143 </p>
144
145 <pre caption="Using package.use to add debug USE flag">
146 # <i>echo "category/package debug" >> /etc/portage/package.use</i>
147 </pre>
148
149 <note>
150 The directory <path>/etc/portage</path> does not exist by default and you may
151 have to create it, if you have not already done so. If the package already has
152 USE flags set in <path>package.use</path>, you will need to manually modify them
153 in your favorite editor.
154 </note>
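
<p>
The check described in the note can be sketched in shell. This is only an
illustration: <c>add_debug_flag</c> is a hypothetical helper, not part of
Portage, and the demonstration writes to a scratch file rather than the real
<path>/etc/portage/package.use</path>.
</p>

```shell
# Append "ATOM debug" to a package.use style file unless ATOM already
# has an entry there. Hypothetical helper, not a Portage command.
# Usage: add_debug_flag ATOM FILE
add_debug_flag() {
    atom=$1
    file=$2
    if [ -f "$file" ]; then
        # awk compares the first field literally, so no regex escaping is needed.
        if awk -v a="$atom" '$1 == a { found = 1 } END { exit !found }' "$file"; then
            echo "$atom already has an entry in $file; edit it by hand"
            return 1
        fi
    fi
    echo "$atom debug" >> "$file"
}

# Demonstration on a scratch file.
scratch=$(mktemp)
add_debug_flag sys-apps/foobar2 "$scratch"
add_debug_flag sys-apps/foobar2 "$scratch" || true
cat "$scratch"
```

<p>
The second call refuses to append a duplicate line, which is the case where
the file has to be edited manually.
</p>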
155
156 <p>
157 Then we re-emerge the package with the modifications we've done so far as shown
158 below.
159 </p>
160
161 <pre caption="Re-emerging a package with debugging">
162 # <i>FEATURES="nostrip" emerge package</i>
163 </pre>
164
165 <p>
166 Now that debug symbols are set up, we can continue with debugging the program.
167 </p>
168
169 </body>
170 </section>
171 <section>
172 <title>Running the program with GDB</title>
173 <body>
174
175 <p>
176 Let's say we have a program here called "bad_code". Some person claims that the
177 program crashes and provides an example. You go ahead and test it out:
178 </p>
179
180 <pre caption="Breaking The Program">
181 $ <i>./bad_code `perl -e 'print Ax100'`</i>
182 Segmentation fault
183 </pre>
184
185 <p>
186 It seems this person was right. Since the program is obviously broken, we have
187 a bug at hand. Now, it's time to use <c>gdb</c> to help solve this matter. First
188 we run <c>gdb</c> with <c>--args</c>, then give it the full program with
189 arguments as shown:
190 </p>
191
192 <pre caption="Running Our Program Through GDB">
193 $ <i>gdb --args ./bad_code `perl -e 'print Ax100'`</i>
194 GNU gdb 6.3
195 Copyright 2004 Free Software Foundation, Inc.
196 GDB is free software, covered by the GNU General Public License, and you are
197 welcome to change it and/or distribute copies of it under certain conditions.
198 Type "show copying" to see the conditions.
199 There is absolutely no warranty for GDB. Type "show warranty" for details.
200 This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
201 </pre>
202
203 <note>
204 One can also debug with core dumps. A core file records the program's state at
205 the moment it crashed, so gdb can inspect it much like a live run. To debug
206 bad_code with a core file, you would run <c>gdb ./bad_code core</c> where
207 core is the name of the core file.
208 </note>
209
210 <p>
211 You should see a prompt that says "(gdb)" and waits for input. First, we have to
212 run the program. We type in <c>run</c> at the prompt and receive a notice like:
213 </p>
214
215 <pre caption="Running the program in GDB">
216 (gdb) <i>run</i>
217 Starting program: /home/chris/bad_code
218
219 Program received signal SIGSEGV, Segmentation fault.
220 0xb7ec6dc0 in strcpy () from /lib/libc.so.6
221 </pre>
222
223 <p>
224 Here we see the program starting, as well as a notification of SIGSEGV, or
225 Segmentation Fault. This is GDB telling us that our program has crashed. It
226 also gives the last function it could trace when the program crashed.
227 However, this isn't too useful, as there could be multiple calls to strcpy() in
228 the program, making it hard for developers to find which one is causing the
229 issue. In order to help them out, we do what's called a backtrace. A backtrace
230 lists the function calls that were still active when the program crashed, from
231 the function at fault back to main(). Functions that had already returned will
232 not show up in it. To get a backtrace, type <c>bt</c> at the (gdb) prompt.
233 You will get something like this:
234 </p>
235
236 <pre caption="Program backtrace">
237 (gdb) <i>bt</i>
238 #0 0xb7ec6dc0 in strcpy () from /lib/libc.so.6
239 #1 0x0804838c in run_it ()
240 #2 0x080483ba in main ()
241 </pre>
242
243 <p>
244 You can see the trace pattern clearly. main() is called first, followed by
245 run_it(), and somewhere in run_it() lies the strcpy() at fault. Things such as
246 this help developers narrow down problems. The output will not always look like
247 this, however. The most common cause is forgetting to keep debug symbols with
248 <c>FEATURES="nostrip"</c>. With debug symbols stripped, the output looks something
249 like this:
250 </p>
251
252 <pre caption="Program backtrace With debug symbols stripped">
253 (gdb) <i>bt</i>
254 #0 0xb7e2cdc0 in strcpy () from /lib/libc.so.6
255 #1 0x0804838c in ?? ()
256 #2 0xbfd19510 in ?? ()
257 #3 0x00000000 in ?? ()
258 #4 0x00000000 in ?? ()
259 #5 0xb7eef148 in libgcc_s_personality () from /lib/libc.so.6
260 #6 0x080482ed in ?? ()
261 #7 0x080495b0 in ?? ()
262 #8 0xbfd19528 in ?? ()
263 #9 0xb7dd73b8 in __guard_setup () from /lib/libc.so.6
264 #10 0xb7dd742d in __guard_setup () from /lib/libc.so.6
265 #11 0x00000006 in ?? ()
266 #12 0xbfd19548 in ?? ()
267 #13 0x080483ba in ?? ()
268 #14 0x00000000 in ?? ()
269 #15 0x00000000 in ?? ()
270 #16 0xb7deebcc in __new_exitfn () from /lib/libc.so.6
271 #17 0x00000000 in ?? ()
272 #18 0xbfd19560 in ?? ()
273 #19 0xb7ef017c in nullserv () from /lib/libc.so.6
274 #20 0xb7dd6f37 in __libc_start_main () from /lib/libc.so.6
275 #21 0x00000001 in ?? ()
276 #22 0xbfd195d4 in ?? ()
277 #23 0xbfd195dc in ?? ()
278 #24 0x08048201 in ?? ()
279 </pre>
280
281 <p>
282 This backtrace contains a large number of ?? marks. This is because without
283 debug symbols, <c>gdb</c> cannot map addresses back to function names. Hence, it is
284 crucial that debug symbols are <e>not</e> stripped. Now remember a while ago we
285 mentioned the -ggdb3 flag. Let's see what the output looks like with the flag
286 enabled:
287 </p>
288
289 <pre caption="Program backtrace with -ggdb3">
290 (gdb) <i>bt</i>
291 #0 0xb7e4bdc0 in strcpy () from /lib/libc.so.6
292 #1 0x0804838c in run_it (input=0x0) at bad_code.c:7
293 #2 0x080483ba in main (argc=1, argv=0xbfd3a434) at bad_code.c:12
294 </pre>
295
296 <p>
297 Here we see that a lot more information is available for developers. Not only is
298 function information displayed, but also the exact line numbers of the source
299 files. This method is preferred if you can spare the extra space.
300 Here's how the file size varies between stripped, unstripped, and -ggdb3 enabled
301 programs.
302 </p>
303
304 <pre caption="Filesize differences With -ggdb3 flag">
305 <comment>(debug symbols stripped)</comment>
306 -rwxr-xr-x 1 chris users 3140 6/28 13:11 bad_code
307 <comment>(debug symbols enabled)</comment>
308 -rwxr-xr-x 1 chris users 6374 6/28 13:10 bad_code
309 <comment>(-ggdb3 flag enabled)</comment>
310 -rwxr-xr-x 1 chris users 19552 6/28 13:11 bad_code
311 </pre>
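
<p>
The differences between these sizes can be checked with a little shell
arithmetic. The variable names below are only illustrative; the byte counts are
copied from the listing above.
</p>

```shell
# File sizes in bytes, taken from the listing above.
stripped=3140
with_symbols=6374
with_ggdb3=19552

# Extra bytes introduced by each level of debugging information.
symbols_extra=$((with_symbols - stripped))
ggdb3_extra=$((with_ggdb3 - with_symbols))

echo "symbols kept: +${symbols_extra} bytes"
echo "-ggdb3:       +${ggdb3_extra} bytes"
```

<p>
This reports 3234 extra bytes for keeping the symbols and a further 13178
bytes for -ggdb3.
</p>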
312
313 <p>
314 As you can see, -ggdb3 adds <e>13178</e> more bytes to the file size over the one
315 with debugging symbols. However, as shown above, this increase in file size can
316 be worth it when presenting debug information to developers. The backtrace can be
317 saved to a file by copying and pasting from the terminal (if it's a non-X based
318 terminal, you can use gpm. To keep this doc simple, I recommend you read up on
319 the documentation for gpm to see how to copy and paste with it). Now that we're
320 done with <c>gdb</c>, we can quit.
321 </p>
322
323 <pre caption="Quitting GDB">
324 (gdb) <i>quit</i>
325 The program is running. Exit anyway? (y or n) <i>y</i>
326 $
327 </pre>
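
<p>
Before attaching a saved backtrace to a bug report, it can help to check that
it does not suffer from the stripped-symbols problem shown earlier. A minimal
sketch, assuming the backtrace was pasted into a text file:
<c>count_unresolved_frames</c> is a hypothetical helper, and the sample input
is just the first three frames of the stripped backtrace above.
</p>

```shell
# Count frames that gdb could not resolve to a function name.
# Usage: count_unresolved_frames FILE
count_unresolved_frames() {
    # -F treats the pattern as a fixed string, -c prints the match count.
    grep -cF 'in ?? ()' "$1"
}

# Sample input: the first three frames of the stripped backtrace shown earlier.
backtrace_file=$(mktemp)
printf '%s\n' \
    '#0 0xb7e2cdc0 in strcpy () from /lib/libc.so.6' \
    '#1 0x0804838c in ?? ()' \
    '#2 0xbfd19510 in ?? ()' > "$backtrace_file"

count_unresolved_frames "$backtrace_file"
```

<p>
A count well above zero suggests that the program was stripped and should be
re-emerged with <c>FEATURES="nostrip"</c> before the backtrace is reported.
</p>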
328
329 <p>
330 This ends the walk-through of <c>gdb</c>. We hope that you will be able to use
331 <c>gdb</c> to create better bug reports. However, there are other types
332 of errors that can cause a program to fail during run time. One of them
333 is improper file access. We can find those using a nifty little
334 tool called <c>strace</c>.
335 </p>
336
337 </body>
338 </section>
339 </chapter>
340
341 <chapter>
342 <title>Finding file access errors using strace</title>
343 <section>
344 <title>Introduction</title>
345 <body>
346
347 <p>
348 Programs often use files to fetch configuration information, access hardware or
349 write logs. Sometimes, a program attempts to reach such files incorrectly. A
350 tool called <c>strace</c> was created to help deal with this. <c>strace</c>
351 traces system calls (hence the name), including calls that deal with memory and
352 files. For our example, we're going to take a program foobar2. This is an
353 updated version of foobar. However, after the changeover to foobar2, you notice
354 all your configurations are missing! In foobar version 1, you had it set up to
355 say "foo", but now it's using the default "bar".
356 </p>
357
358 <pre caption="Foobar2 With an invalid configuration">
359 $ <i>./foobar2</i>
360 Configuration says: bar
361 </pre>
362
363 <p>
364 Our previous configuration specifically had it set to foo, so let's use
365 <c>strace</c> to find out what's going on.
366 </p>
367
368 </body>
369 </section>
370 <section>
371 <title>Using strace to track the issue</title>
372 <body>
373
374 <p>
375 We make <c>strace</c> log the results of the system calls. To do this, we run
376 <c>strace</c> with the -o[file] argument. Let's use it on foobar2 as shown.
377 </p>
378
379 <pre caption="Running foobar2 through strace">
380 # <i>strace -ostrace.log ./foobar2</i>
381 </pre>
382
383 <p>
384 This creates a file called <path>strace.log</path> in the current directory. We
385 check the file, and shown below are the relevant parts from the file.
386 </p>
387
388 <pre caption="A Look At the strace Log">
389 open(".foobar2/config", O_RDONLY) = 3
390 read(3, "bar", 3) = 3
391 </pre>
392
393 <p>
394 Aha! So there's the problem. Someone moved the configuration directory to
395 <path>.foobar2</path> instead of <path>.foobar</path>. We also see the program
396 reading in "bar" as it should. In this case, we can recommend that the ebuild
397 maintainer add a warning about it. For now though, we can copy over the
398 config file from <path>.foobar</path> and modify it to produce the correct
399 results.
400 </p>
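
<p>
Real strace logs are usually much longer than the excerpt above, so it helps to
filter them. A minimal sketch using <c>grep</c>: <c>list_opens</c> is a
hypothetical helper, and the sample log holds the two lines shown above plus
one made-up <c>close()</c> line for contrast.
</p>

```shell
# List every open() call recorded in an strace log.
# Usage: list_opens LOGFILE
list_opens() {
    # -F matches the text literally instead of as a regular expression.
    grep -F 'open(' "$1"
}

# Sample log: two lines from the excerpt above and one hypothetical line.
sample_log=$(mktemp)
printf '%s\n' \
    'open(".foobar2/config", O_RDONLY) = 3' \
    'read(3, "bar", 3) = 3' \
    'close(3) = 0' > "$sample_log"

list_opens "$sample_log"
```

<p>
Only the <c>open()</c> line is printed, which points straight at the file the
program really used. On newer systems the same access may be logged as
<c>openat()</c>, in which case the pattern needs adjusting.
</p>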
401
402 </body>
403 </section>
404 <section>
405 <title>Conclusion</title>
406 <body>
407
408 <p>
409 Now we've taken care of finding run time bugs. These bugs prove to be
410 problematic when you try to run your programs. However, run time errors are
411 the least of your concerns if your program won't compile at all. Let's take a
412 look at how to address <c>emerge</c> compile errors.
413 </p>
414
415 </body>
416 </section>
417 </chapter>
418
419 <chapter>
420 <title>Handling emerge Errors</title>
421 <section>
422 <title>Introduction</title>
423 <body>
424
425 <p>
426 <c>emerge</c> errors, such as the one displayed earlier, can be a major cause
427 of frustration for users. Reporting them is considered crucial for maintaining
428 the health of Gentoo. Let's take a look at a sample ebuild, foobar2, which
429 contains some build errors.
430 </p>
431
432 </body>
433 </section>
434 <section id="emerge_error">
435 <title>Evaluating emerge Errors</title>
436 <body>
437
438 <p>
439 Let's take a look at this very simple <c>emerge</c> error:
440 </p>
441
442 <pre caption="emerge Error">
443 gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-7.o foobar2-7.c
444 gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-8.o foobar2-8.c
445 gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-9.o foobar2-9.c
446 gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2.o foobar2.c
447 foobar2.c:1:17: ogg.h: No such file or directory
448 make: *** [foobar2.o] Error 1
449
450 !!! ERROR: sys-apps/foobar2-1.0 failed.
451 !!! Function src_compile, Line 19, Exitcode 2
452 !!! Make failed!
453 !!! If you need support, post the topmost build error, NOT this status message
454 </pre>
455
456 <p>
457 The program is compiling smoothly when it suddenly stops and presents an error message. This
458 particular error can be split into three sections: the compile messages, the build
459 error, and the emerge error message, as shown below.
460 </p>
461
462 <pre caption="Parts of the error">
463 <comment>(Compilation Messages)</comment>
464 gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-7.o foobar2-7.c
465 gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-8.o foobar2-8.c
466 gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-9.o foobar2-9.c
467 gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2.o foobar2.c
468
469 <comment>(Build Error)</comment>
470 foobar2.c:1:17: ogg.h: No such file or directory
471 make: *** [foobar2.o] Error 1
472
473 <comment>(emerge Error)</comment>
474 !!! ERROR: sys-apps/foobar2-1.0 failed.
475 !!! Function src_compile, Line 19, Exitcode 2
476 !!! Make failed!
477 !!! If you need support, post the topmost build error, NOT this status message
478 </pre>
479
480 <p>
481 The compilation messages are what lead up to the error. Most often, it's good to
482 include at least 10 lines of compile information so that the developer knows
483 how far the compilation had progressed when the error occurred.
484 </p>
485
486 <p>
487 The build error is the actual error and the information the developer needs. When
488 you see "make: ***", this is often where the error has occurred. Normally, you
489 can copy and paste 10 lines above it and the developer will be able to address
490 the issue. However, this may not always work and we'll take a look at an
491 alternative shortly.
492 </p>
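
<p>
Pulling those lines out of saved build output can be sketched with <c>grep</c>.
This assumes the output was captured in a file; <c>make_error_context</c> is a
hypothetical helper, and the sample log is an abridged version of the error
shown earlier in this section.
</p>

```shell
# Print the first "make: ***" line of a build log together with up to
# N lines of context before it (default 10).
# Usage: make_error_context LOGFILE [N]
make_error_context() {
    # -m 1 stops at the first match, -B prints the lines leading up to it.
    grep -m 1 -B "${2:-10}" -F 'make: ***' "$1"
}

# Sample log: abridged from the emerge error above.
build_log=$(mktemp)
printf '%s\n' \
    'gcc -c -o foobar2-9.o foobar2-9.c' \
    'gcc -c -o foobar2.o foobar2.c' \
    'foobar2.c:1:17: ogg.h: No such file or directory' \
    'make: *** [foobar2.o] Error 1' \
    '!!! ERROR: sys-apps/foobar2-1.0 failed.' > "$build_log"

# Two lines of context are enough for this short sample.
make_error_context "$build_log" 2
```

<p>
The output ends at the make error and leaves out the emerge status message, so
it contains the part a developer needs to see first.
</p>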
493
494 <p>
495 The emerge error is the message that <c>emerge</c> itself prints. Sometimes, this
496 might also contain some important information. Often people make the mistake of
497 posting only the emerge error. This is useless by itself, but together with the
498 make error and compile information, a developer can tell which application and
499 which version of the package is failing. As a side note, make is commonly used as
500 the build process for programs (<b>but not always</b>). If you can't find a
501 "make: ***" error anywhere, then simply copy and paste 20 lines before the
502 emerge error. This should take care of most build system error messages. Now
503 let's say the errors seem to be quite large. 10 lines won't be enough to catch
504 everything. That's where PORT_LOGDIR comes into play.
505 </p>
506
507 </body>
508 </section>
509 <section>
510 <title>emerge and PORT_LOGDIR</title>
511 <body>
512
513 <p>
514 PORT_LOGDIR is a portage variable that sets up a log directory for separate
515 emerge logs. Let's take a look and see what that entails. First, run your emerge
516 with PORT_LOGDIR set to your favorite log location. Let's say we have a
517 location <path>/var/log/portage</path>. We'll use that for our log directory:
518 </p>
519
520 <note>
521 In the default setup, <path>/var/log/portage</path> does not exist, and you will
522 most likely have to create it. If you do not, portage will fail to write the
523 logs.
524 </note>
525
526 <pre caption="emerge-ing With PORT_LOGDIR">
527 # <i>PORT_LOGDIR=/var/log/portage emerge foobar2</i>
528 </pre>
529
530 <p>
531 Now the emerge fails again. However, this time we have a log we can work with,
532 and attach to the bug later on. Let's take a quick look at our log directory.
533 </p>
534
535 <pre caption="PORT_LOGDIR Contents">
536 # <i>ls -la /var/log/portage</i>
537 total 16
538 drwxrws--- 2 root root 4096 Jun 30 10:08 .
539 drwxr-xr-x 15 root root 4096 Jun 30 10:08 ..
540 -rw-r--r-- 1 root root 7390 Jun 30 10:09 2115-foobar2-1.0.log
541 </pre>
542
543 <p>
544 The log files have the format [counter]-[package name]-[version].log. Counter
545 is a special variable that marks this package as the n-th package you've
546 emerged, which keeps log file names from colliding. A quick look at
547 the log file will show the entire emerge process. The file can be attached
548 later on when you report the bug. Now that we've safely obtained
549 the information needed to report the bug, we can continue to do so. However,
550 before we get started on that, we need to make sure no one else has reported
551 the issue.
552 </p>
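
<p>
If you would rather log every emerge than set the variable on the command line
each time, PORT_LOGDIR can also be set in <path>/etc/make.conf</path>, the file
edited earlier for CFLAGS. A minimal fragment, assuming the same log directory
as above:
</p>

```shell
# /etc/make.conf (fragment): log every emerge to this directory.
# The directory must already exist, as noted earlier in this section.
PORT_LOGDIR="/var/log/portage"
```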
553
554 </body>
555 </section>
556 </chapter>
557 </guide>
