<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/draft/debugging-howto.xml,v 1.5 2005/09/29 15:11:35 neysx Exp $ -->

<guide>
<title>Gentoo Linux Debugging Guide</title>

<author title="Author">
<mail link="chriswhite@gentoo.org">Chris White</mail>
</author>
<author title="Editor">
<mail link="fox2mike@gentoo.org">Shyam Mani</mail>
</author>

<abstract>
This document aims to help users debug the various errors they may encounter
during day-to-day usage of Gentoo.
</abstract>

<!-- The content of this document is licensed under the CC-BY-SA license -->
<!-- See http://creativecommons.org/licenses/by-sa/2.5 -->
<license/>

<version>1.0</version>
<date>2005-07-13</date>

<chapter>
<title>Introduction</title>
<section>
<title>Preface</title>
<body>

<p>
One of the factors that delays the fixing of a bug is the way it is reported. By
creating this guide, we hope to help improve the communication between
developers and users in bug resolution. Getting bugs fixed is an important, if
not crucial, part of the quality assurance for any project, and hopefully this
guide will help make that a success.
</p>

</body>
</section>
<section>
<title>Bugs!!!!</title>
<body>

<p>
You're emerge-ing a package or working with a program and suddenly the worst
happens -- you find a bug. Bugs come in many forms, like emerge failures or
segmentation faults. Whatever the cause, the fact still remains that such a bug
must be fixed. Here are a few examples of such bugs.
</p>

<pre caption="A run time error">
$ <i>./bad_code `perl -e 'print Ax100'`</i>
Segmentation fault
</pre>

<pre caption="An emerge failure">
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/include/g++-v3/backward/backward_warning.h:32:2:
warning: #warning This file includes at least one deprecated or antiquated
header. Please consider using one of the 32 headers found in section 17.4.1.2 of
the C++ standard. Examples include substituting the &lt;X&gt; header for the &lt;X.h&gt;
header for C++ includes, or &lt;sstream&gt; instead of the deprecated header
&lt;strstream.h&gt;. To disable this warning use -Wno-deprecated.
In file included from main.cc:40:
menudef.h:55: error: brace-enclosed initializer used to initialize `
OXPopupMenu*'
menudef.h:62: error: brace-enclosed initializer used to initialize `
OXPopupMenu*'
menudef.h:70: error: brace-enclosed initializer used to initialize `
OXPopupMenu*'
menudef.h:78: error: brace-enclosed initializer used to initialize `
OXPopupMenu*'
main.cc: In member function `void OXMain::DoOpen()':
main.cc:323: warning: unused variable `FILE*fp'
main.cc: In member function `void OXMain::DoSave(char*)':
main.cc:337: warning: unused variable `FILE*fp'
make[1]: *** [main.o] Error 1
make[1]: Leaving directory
`/var/tmp/portage/xclass-0.7.4/work/xclass-0.7.4/example-app'
make: *** [shared] Error 2

!!! ERROR: x11-libs/xclass-0.7.4 failed.
!!! Function src_compile, Line 29, Exitcode 2
!!! 'emake shared' failed
</pre>

<p>
These errors can be quite troublesome. However, once you find them, what do
you do? The following sections will look at two important tools for handling
run time errors. After that, we'll take a look at compile errors, and how to
handle them. Let's start out with the first tool for debugging run time
errors -- <c>gdb</c>.
</p>

</body>
</section>
</chapter>


<chapter>
<title>Debugging using GDB</title>
<section>
<title>Introduction</title>
<body>

<p>
GDB, or the (G)NU (D)e(B)ugger, is a program used to find run time errors that
normally involve memory corruption. First off, let's take a look at what
debugging entails. One of the main things you must do in order to debug a
program is to <c>emerge</c> the program with <c>FEATURES="nostrip"</c>. This
prevents the stripping of debug symbols. Why are programs stripped by default?
The reason is the same as that for having gzipped man pages -- saving space.
Here's how the size of a program varies with and without debug symbol stripping.
</p>

<pre caption="Filesize Comparison">
<comment>(debug symbols stripped)</comment>
-rwxr-xr-x 1 chris users 3140 6/28 13:11 bad_code
<comment>(debug symbols intact)</comment>
-rwxr-xr-x 1 chris users 6374 6/28 13:10 bad_code
</pre>

<p>
Just for reference, <e>bad_code</e> is the program we'll be debugging with
<c>gdb</c> later on. As you can see, the program without debugging symbols is
3140 bytes, while the program with them is 6374 bytes. That's about double
the size! Two more things can be done for debugging. The first is adding
<c>-ggdb3</c> to your CFLAGS and CXXFLAGS. This flag adds more debugging
information than is generally included. We'll see what that means later on. This
is how <path>/etc/make.conf</path> <e>might</e> look with the newly added flags.
</p>

<pre caption="make.conf settings">
CFLAGS="-O2 -pipe -ggdb3"
CXXFLAGS="${CFLAGS}"
</pre>

<p>
Lastly, you can also add <c>debug</c> to the package's USE flags. This can be
done with the <path>package.use</path> file.
</p>

<pre caption="Using package.use to add debug USE flag">
# <i>echo "category/package debug" >> /etc/portage/package.use</i>
</pre>

<note>
The directory <path>/etc/portage</path> does not exist by default and you may
have to create it, if you have not already done so. If the package already has
USE flags set in <path>package.use</path>, you will need to manually modify them
in your favorite editor.
</note>
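
<p>
The two cases described in the note can also be handled by a small shell
function. This is only a sketch: the name <c>add_debug_use</c> and its arguments
are made up for illustration and are not part of Portage. It creates the
directory if needed and refuses to touch a package that already has an entry,
so that existing USE flags are still edited by hand.
</p>

<pre caption="Sketch: a hypothetical helper for adding the debug USE flag">
```shell
# Hypothetical helper (not part of Portage).
# Usage: add_debug_use FILE CATEGORY/PACKAGE
add_debug_use() {
    use_file=$1
    atom=$2

    # /etc/portage may not exist yet, so create the parent directory.
    mkdir -p "$(dirname "$use_file")"

    # If the package already has a line, leave the file alone so the
    # existing USE flags can be merged manually in an editor.
    if [ -f "$use_file" ]; then
        if awk -v atom="$atom" '$1 == atom { found = 1 } END { exit !found }' "$use_file"; then
            echo "${atom} already has an entry in ${use_file}; edit it by hand"
            return 1
        fi
    fi

    echo "${atom} debug" >> "$use_file"
}
```
</pre>

<p>
Run as root, <c>add_debug_use /etc/portage/package.use category/package</c>
should have the same effect as the <c>echo</c> command shown earlier when the
package has no entry yet.
</p>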

<p>
Then we re-emerge the package with the modifications we've made so far, as shown
below.
</p>

<pre caption="Re-emerging a package with debugging">
# <i>FEATURES="nostrip" emerge package</i>
</pre>

<p>
Now that debug symbols are set up, we can continue with debugging the program.
</p>

</body>
</section>
<section>
<title>Running the program with GDB</title>
<body>

<p>
Let's say we have a program here called "bad_code". Someone claims that the
program crashes and provides an example. You go ahead and test it out:
</p>

<pre caption="Breaking The Program">
$ <i>./bad_code `perl -e 'print Ax100'`</i>
Segmentation fault
</pre>

<p>
It seems this person was right. Since the program is obviously broken, we have
a bug at hand. Now, it's time to use <c>gdb</c> to help solve this matter. First
we run <c>gdb</c> with <c>--args</c>, then give it the full program with
arguments as shown:
</p>

<pre caption="Running Our Program Through GDB">
$ <i>gdb --args ./bad_code `perl -e 'print Ax100'`</i>
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
</pre>

<note>
One can also debug with core dumps. A core file is a snapshot of the program's
memory at the time of the crash, so it provides the same information that
<c>gdb</c> shows when it runs the program itself. To debug bad_code with a core
file, you would run <c>gdb ./bad_code core</c>, where core is the name of the
core file.
</note>

<p>
You should see a prompt that says "(gdb)" and waits for input. First, we have to
run the program. We type in <c>run</c> at the prompt and receive a notice like:
</p>

<pre caption="Running the program in GDB">
(gdb) <i>run</i>
Starting program: /home/chris/bad_code

Program received signal SIGSEGV, Segmentation fault.
0xb7ec6dc0 in strcpy () from /lib/libc.so.6
</pre>

<p>
Here we see the program starting, as well as a notification of SIGSEGV, or
Segmentation Fault. This is GDB telling us that our program has crashed. It
also gives the last function it could trace when the program crashed.
However, this isn't too useful, as there could be multiple strcpy() calls in the
program, making it hard for developers to find which one is causing the issue.
In order to help them out, we do what's called a backtrace. A backtrace lists
the chain of function calls that were still active when the program crashed,
from the function at fault back to main(). Functions that have already returned
(without causing a crash) will not show up in the backtrace. To get a backtrace,
type in <c>bt</c> at the (gdb) prompt. You will get something like this:
</p>

<pre caption="Program backtrace">
(gdb) <i>bt</i>
#0 0xb7ec6dc0 in strcpy () from /lib/libc.so.6
#1 0x0804838c in run_it ()
#2 0x080483ba in main ()
</pre>

<p>
You can see the trace pattern clearly. main() is called first, followed by
run_it(), and somewhere in run_it() lies the strcpy() at fault. Things such as
this help developers narrow down problems. The output will not always look like
this, however. The first exception is forgetting to enable debug symbols with
<c>FEATURES="nostrip"</c>. With debug symbols stripped, the output looks
something like this:
</p>

<pre caption="Program backtrace With debug symbols stripped">
(gdb) <i>bt</i>
#0 0xb7e2cdc0 in strcpy () from /lib/libc.so.6
#1 0x0804838c in ?? ()
#2 0xbfd19510 in ?? ()
#3 0x00000000 in ?? ()
#4 0x00000000 in ?? ()
#5 0xb7eef148 in libgcc_s_personality () from /lib/libc.so.6
#6 0x080482ed in ?? ()
#7 0x080495b0 in ?? ()
#8 0xbfd19528 in ?? ()
#9 0xb7dd73b8 in __guard_setup () from /lib/libc.so.6
#10 0xb7dd742d in __guard_setup () from /lib/libc.so.6
#11 0x00000006 in ?? ()
#12 0xbfd19548 in ?? ()
#13 0x080483ba in ?? ()
#14 0x00000000 in ?? ()
#15 0x00000000 in ?? ()
#16 0xb7deebcc in __new_exitfn () from /lib/libc.so.6
#17 0x00000000 in ?? ()
#18 0xbfd19560 in ?? ()
#19 0xb7ef017c in nullserv () from /lib/libc.so.6
#20 0xb7dd6f37 in __libc_start_main () from /lib/libc.so.6
#21 0x00000001 in ?? ()
#22 0xbfd195d4 in ?? ()
#23 0xbfd195dc in ?? ()
#24 0x08048201 in ?? ()
</pre>

<p>
This backtrace contains a large number of ?? marks. This is because without
debug symbols, <c>gdb</c> cannot map the addresses on the stack back to function
names. Hence, it is crucial that debug symbols are <e>not</e> stripped. Now
remember that a while ago we mentioned the -ggdb3 flag. Let's see what the
output looks like with the flag enabled:
</p>

<pre caption="Program backtrace with -ggdb3">
(gdb) <i>bt</i>
#0 0xb7e4bdc0 in strcpy () from /lib/libc.so.6
#1 0x0804838c in run_it (input=0x0) at bad_code.c:7
#2 0x080483ba in main (argc=1, argv=0xbfd3a434) at bad_code.c:12
</pre>

<p>
Here we see that a lot more information is available for developers. Not only
are the function arguments displayed, but even the source file and exact line
number of each call. This is the preferred method if you can spare the extra
space. Here's how much the file size varies between stripped, debug, and -ggdb3
enabled programs.
</p>

<pre caption="Filesize differences With -ggdb3 flag">
<comment>(debug symbols stripped)</comment>
-rwxr-xr-x 1 chris users 3140 6/28 13:11 bad_code
<comment>(debug symbols enabled)</comment>
-rwxr-xr-x 1 chris users 6374 6/28 13:10 bad_code
<comment>(-ggdb3 flag enabled)</comment>
-rwxr-xr-x 1 chris users 19552 6/28 13:11 bad_code
</pre>

<p>
As you can see, -ggdb3 adds <e>13178</e> more bytes to the file size
over the one with debugging symbols. However, as shown above, this increase
in file size can be worth it when presenting debug information to developers.
The backtrace can be saved to a file by copying and pasting from the
terminal. On a non-X based terminal you can use gpm for this; to keep this
document simple, we recommend reading the gpm documentation to see
how to copy and paste with it. Now that we're done with <c>gdb</c>, we
can quit.
</p>

<pre caption="Quitting GDB">
(gdb) <i>quit</i>
The program is running. Exit anyway? (y or n) <i>y</i>
$
</pre>
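
<p>
As an alternative to copying the backtrace from the terminal, the
<c>run</c> and <c>bt</c> steps can be scripted. The following is a rough sketch
that assumes a <c>gdb</c> recent enough to support the <c>-batch</c> and
<c>-ex</c> options, which may not be the case for the 6.3 release shown above.
The function name <c>save_backtrace</c> is made up for illustration.
</p>

<pre caption="Sketch: saving a backtrace without an interactive session">
```shell
# Hypothetical helper: run a program under gdb non-interactively and save
# everything gdb prints, including the backtrace, to a file.
# Usage: save_backtrace LOGFILE PROGRAM [ARGS...]
save_backtrace() {
    log_file=$1
    shift

    # -batch makes gdb exit once its commands are done; each -ex runs one
    # gdb command, here "run" followed by "bt". --args passes the rest of
    # the command line to the program being debugged.
    gdb -batch -ex run -ex bt --args "$@" > "$log_file"
}
```
</pre>

<p>
For example, <c>save_backtrace backtrace.log ./bad_code "$(perl -e 'print "A" x 100')"</c>
would leave the session output in <path>backtrace.log</path>, ready to be
attached to a bug report.
</p>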

<p>
This ends the walk-through of <c>gdb</c>. We hope that you will be able to use
it to create better bug reports. However, there are other types of errors that
can cause a program to fail during run time. One of them is improper file
access. We can find those using a nifty little tool called <c>strace</c>.
</p>

</body>
</section>
</chapter>

<chapter>
<title>Finding file access errors using strace</title>
<section>
<title>Introduction</title>
<body>

<p>
Programs often use files to fetch configuration information, access hardware or
write logs. Sometimes, a program attempts to reach such files incorrectly. A
tool called <c>strace</c> was created to help deal with this. <c>strace</c>
traces system calls (hence the name), which include the calls a program makes
to access memory and files. For our example, we're going to take a program
called foobar2. This is an updated version of foobar. However, during the
changeover to foobar2, you notice all your configurations are missing! In foobar
version 1, you had it set up to say "foo", but now it's using the default "bar".
</p>

<pre caption="Foobar2 With an invalid configuration">
$ <i>./foobar2</i>
Configuration says: bar
</pre>

<p>
Our previous configuration specifically had it set to foo, so let's use
<c>strace</c> to find out what's going on.
</p>

</body>
</section>
<section>
<title>Using strace to track the issue</title>
<body>

<p>
We make <c>strace</c> log the results of the system calls. To do this, we run
<c>strace</c> with the <c>-o[file]</c> argument. Let's use it on foobar2 as shown.
</p>

<pre caption="Running foobar2 through strace">
# <i>strace -ostrace.log ./foobar2</i>
</pre>

<p>
This creates a file called <path>strace.log</path> in the current directory. We
check the file, and shown below are the relevant parts from the file.
</p>

<pre caption="A Look At the strace Log">
open(".foobar2/config", O_RDONLY) = 3
read(3, "bar", 3) = 3
</pre>

<p>
Aha! So there's the problem. Someone moved the configuration directory to
<path>.foobar2</path> instead of <path>.foobar</path>. We also see the program
reading in "bar" as it should. In this case, we can ask the ebuild
maintainer to add a warning about it. For now though, we can copy over the
config file from <path>.foobar</path> and modify it to produce the correct
results.
</p>
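
<p>
A real <path>strace.log</path> is usually much longer than the two lines shown
above, so it can help to filter it. The sketch below assumes that the log was
written with <c>-o</c> as above, with one system call per line; the function
names are made up for illustration. It lists the file-opening calls and, in
particular, the ones that failed.
</p>

<pre caption="Sketch: filtering an strace log for file access">
```shell
# Hypothetical helpers for reading a log written by "strace -o".

# Show every open() or openat() call in the log.
# Usage: show_open_calls LOGFILE
show_open_calls() {
    grep -E '^(open|openat)\(' "$1"
}

# Show only the open() or openat() calls that returned -1, i.e. the file
# accesses that failed (missing file, no permission, and so on).
# Usage: show_failed_opens LOGFILE
show_failed_opens() {
    grep -E '^(open|openat)\(.*\) = -1 ' "$1"
}
```
</pre>

<p>
Running <c>show_open_calls strace.log</c> on the log above would print the
<path>.foobar2/config</path> line. <c>strace</c> can also limit what it records
in the first place through its <c>-e trace=</c> option.
</p>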

</body>
</section>
<section>
<title>Conclusion</title>
<body>

<p>
<c>strace</c> is a great way of seeing what the kernel is doing with the
filesystem. Another program exists to help users see what the kernel is doing,
and help with kernel debugging. This program is called <c>dmesg</c>.
</p>

</body>
</section>
</chapter>

<chapter>
<title>Kernel Debugging With dmesg</title>
<section>
<title>dmesg Introduction</title>
<body>

<p>
<c>dmesg</c> is a system program created for debugging kernel operation. It
prints the messages that the kernel keeps in its message buffer, letting the
user see them later on. Here's an example of what dmesg output looks like:
</p>

<pre caption="dmesg sample output">
SIS5513: IDE controller at PCI slot 0000:00:02.5
SIS5513: chipset revision 208
SIS5513: not 100% native mode: will probe irqs later
SIS5513: SiS 961 MuTIOL IDE UDMA100 controller
ide0: BM-DMA at 0x4000-0x4007, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0x4008-0x400f, BIOS settings: hdc:DMA, hdd:DMA
Probing IDE interface ide0...
input: ImPS/2 Generic Wheel Mouse on isa0060/serio1
hda: WDC WD800BB-60CJA0, ATA DISK drive
hdb: CD-RW 52X24, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: SAMSUNG DVD-ROM SD-616T, ATAPI CD/DVD-ROM drive
hdd: Maxtor 92049U6, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=65535/16/63,
UDMA(100)
hda: cache flushes not supported
hda: hda1
hdd: max request size: 128KiB
hdd: 39882528 sectors (20419 MB) w/2048KiB Cache, CHS=39566/16/63,
UDMA(66)
hdd: cache flushes not supported
hdd: unknown partition table
hdb: ATAPI 52X CD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
hdc: ATAPI 48X DVD-ROM drive, 512kB Cache, UDMA(33)
ide-floppy driver 0.99.newide
libata version 1.11 loaded.
usbmon: debugfs is not available
</pre>

<p>
The dmesg output displayed here is from my machine's bootup. You can see the
hard disks and input devices being initialized. While what you see here seems
relatively harmless, <c>dmesg</c> is also good at showing when things go wrong.
Take, for example, an IPAQ 1945 I have. After a couple of minutes of inactivity,
the device powers off. I have the device connected to the USB port on the
front of my system and want to copy over some files using libsynCE, so I go
ahead and initiate a connection:
</p>

<pre caption="IPAQ connection attempt">
# <i>synce-serial-start</i>
/usr/sbin/pppd: In file /etc/ppp/peers/synce-device: unrecognized option
'/dev/tts/USB0'

synce-serial-start was unable to start the PPP daemon!
</pre>

<p>
The connection fails, as we see here, and we assume that only the screen is in
powersave mode, and that maybe the connection is faulty. In order to see what
truly happened, we can use <c>dmesg</c>. Now, <c>dmesg</c> tends to give a
rather large amount of output. One can use the <c>tail</c> command to help
keep the output down:
</p>

<pre caption="Adjusting the output amount with tail">
$ <i>dmesg | tail -n 4</i>
usb 1-1.2: PocketPC PDA converter now attached to ttyUSB0
usb 1-1.2: USB disconnect, address 11
PocketPC PDA ttyUSB0: PocketPC PDA converter now disconnected from ttyUSB0
ipaq 1-1.2:1.0: device disconnected
</pre>

<p>
This gives us the last 4 lines of the <c>dmesg</c> output. Now, this is enough
to give us some information on the situation. In the first line, the pocketpc is
recognized as connected. However, in the remaining lines, it appears to have
been disconnected. With this information we check the pocketpc again, and find
out it is powered off, and now know about the powersave mode. We can use this
information to turn the feature off, or be aware of it next time. While this is
a somewhat simple example, it does go to show how well <c>dmesg</c> can work.
However, in more complex examples (such as kernel bugs), the entire <c>dmesg</c>
output may be required. To obtain that, simply redirect the output to a log
file like so:
</p>

<pre caption="Saving dmesg output to a log">
$ <i>dmesg > dmesg.log</i>
</pre>

<p>
You can then attach this log to a bug report, or post it online somewhere for
collaborative debugging sessions.
</p>
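
<p>
Once the output is saved, it can be searched instead of read from top to
bottom. The sketch below combines <c>grep</c> with the <c>tail</c> approach
used earlier; the function name and its arguments are made up for illustration.
</p>

<pre caption="Sketch: searching a saved dmesg log">
```shell
# Hypothetical helper: print the most recent kernel messages in a saved
# dmesg log that mention a given device or driver.
# Usage: recent_kernel_messages LOGFILE PATTERN [COUNT]
recent_kernel_messages() {
    log_file=$1
    pattern=$2
    count=${3:-10}    # show the last 10 matches unless told otherwise

    # -i ignores case, so "ttyusb0" also matches "ttyUSB0".
    grep -i -- "$pattern" "$log_file" | tail -n "$count"
}
```
</pre>

<p>
With the log from the IPAQ example, <c>recent_kernel_messages dmesg.log ttyUSB0</c>
would show the attach message followed by the disconnect message.
</p>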

</body>
</section>
<section>
<title>Conclusion</title>
<body>

<p>
Now that we've taken a look at a few ways to debug runtime and kernel errors,
let's take a look at how to handle emerge errors.
</p>

</body>
</section>
</chapter>

<chapter>
<title>Handling emerge Errors</title>
<section>
<title>Introduction</title>
<body>

<p>
<c>emerge</c> errors, such as the one displayed earlier, can be a major cause
of frustration for users. Reporting them is considered crucial for maintaining
the health of Gentoo. Let's take a look at a sample ebuild, foobar2, which
contains some build errors.
</p>

</body>
</section>
<section id="emerge_error">
<title>Evaluating emerge Errors</title>
<body>

<p>
Let's take a look at this very simple <c>emerge</c> error:
</p>

<pre caption="emerge Error">
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-7.o foobar2-7.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-8.o foobar2-8.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-9.o foobar2-9.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2.o foobar2.c
foobar2.c:1:17: ogg.h: No such file or directory
make: *** [foobar2.o] Error 1

!!! ERROR: sys-apps/foobar2-1.0 failed.
!!! Function src_compile, Line 19, Exitcode 2
!!! Make failed!
!!! If you need support, post the topmost build error, NOT this status message
</pre>

<p>
The program is compiling smoothly when it suddenly stops and presents an error
message. This particular error can be split into 3 different sections: the
compile messages, the build error, and the emerge error message, as shown below.
</p>

<pre caption="Parts of the error">
<comment>(Compilation Messages)</comment>
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-7.o foobar2-7.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-8.o foobar2-8.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-9.o foobar2-9.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2.o foobar2.c

<comment>(Build Error)</comment>
foobar2.c:1:17: ogg.h: No such file or directory
make: *** [foobar2.o] Error 1

<comment>(emerge Error)</comment>
!!! ERROR: sys-apps/foobar2-1.0 failed.
!!! Function src_compile, Line 19, Exitcode 2
!!! Make failed!
!!! If you need support, post the topmost build error, NOT this status message
</pre>

<p>
The compilation messages are what lead up to the error. Most often, it's good to
include at least 10 lines of compile information so that the developer knows
how far the compilation had progressed when the error occurred.
</p>

<p>
Make errors are the actual error and the information the developer needs. When
you see "make: ***", this is often where the error has occurred. Normally, you
can copy and paste 10 lines above it and the developer will be able to address
the issue. However, this may not always work and we'll take a look at an
alternative shortly.
</p>

<p>
The emerge error is what <c>emerge</c> throws out as an error. Sometimes, this
might also contain some important information. Often people make the mistake of
posting the emerge error and nothing else. This is useless by itself, but
together with the make error and compile information, it tells a developer which
application and which version of the package is failing. As a side note, make is
commonly used as the build process for programs (<b>but not always</b>). If you
can't find a "make: ***" error anywhere, then simply copy and paste 20 lines
before the emerge error. This should take care of almost all build system error
messages. Now let's say the errors seem to be quite large. 10 lines won't be
enough to catch everything. That's where PORT_LOGDIR comes into play.
</p>
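
<p>
The rule of thumb above (10 lines before the first "make: ***" line, or the
last 20 lines if there is none) can be sketched as a shell function for build
output that has been saved to a file. The function name is made up for
illustration, and the <c>-m</c> and <c>-B</c> options assume GNU <c>grep</c>.
</p>

<pre caption="Sketch: extracting the lines around a make error">
```shell
# Hypothetical helper: print the lines leading up to the first make error
# in a saved build log.
# Usage: build_error_context LOGFILE [LINES_BEFORE]
build_error_context() {
    log_file=$1
    lines=${2:-10}    # number of lines to show above the make error

    # Match "make: ***" as well as nested "make[1]: ***" lines. -m 1 stops
    # at the first match and -B prints the lines that come before it.
    if grep -E -m 1 -B "$lines" '^make(\[[0-9]+\])?: \*\*\*' "$log_file"; then
        return 0
    fi

    # No make error found: fall back to the last 20 lines of the log.
    tail -n 20 "$log_file"
}
```
</pre>

<p>
The fallback prints the end of the file, so the log should stop at the emerge
error for the 20 lines to cover the right part of the output.
</p>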

</body>
</section>
<section>
<title>emerge and PORT_LOGDIR</title>
<body>

<p>
PORT_LOGDIR is a portage variable that sets up a log directory for separate
emerge logs. Let's take a look and see what that entails. First, run your emerge
with PORT_LOGDIR set to your favorite log location. Let's say we have a
location <path>/var/log/portage</path>. We'll use that for our log directory:
</p>

<note>
In the default setup, <path>/var/log/portage</path> does not exist, and you will
most likely have to create it. If you do not, portage will fail to write the
logs.
</note>

<pre caption="emerge-ing With PORT_LOGDIR">
# <i>PORT_LOGDIR=/var/log/portage emerge foobar2</i>
</pre>

<p>
Now the emerge fails again. However, this time we have a log we can work with,
and attach to the bug later on. Let's take a quick look at our log directory.
</p>

<pre caption="PORT_LOGDIR Contents">
# <i>ls -la /var/log/portage</i>
total 16
drwxrws--- 2 root root 4096 Jun 30 10:08 .
drwxr-xr-x 15 root root 4096 Jun 30 10:08 ..
-rw-r--r-- 1 root root 7390 Jun 30 10:09 2115-foobar2-1.0.log
</pre>

<p>
The log files have the format [counter]-[package name]-[version].log. The
counter indicates that this package is the n-th package you've emerged. This
prevents duplicate logs from appearing. A quick look at the log file will show
the entire emerge process. This can be attached later on as we'll see in the
bug reporting section. Now that we've safely obtained the information needed to
report the bug, we can continue to do so. However, before we get started on
that, we need to make sure no one else has reported the issue.
</p>
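
<p>
When the log directory has collected many files, the naming scheme described
above can be used to pick out the newest log for a package. The sketch below
assumes that the counter grows with every emerge and that the version starts
with a digit; the function name is made up for illustration.
</p>

<pre caption="Sketch: finding the newest log for a package">
```shell
# Hypothetical helper: print the name of the most recent log for a package
# in a PORT_LOGDIR directory.
# Usage: latest_portage_log LOGDIR PACKAGE
latest_portage_log() {
    log_dir=$1
    package=$2

    # Keep only files named [counter]-[package]-[version].log, sort them
    # numerically by their leading counter and print the highest one.
    ls "$log_dir" | grep -E "^[0-9]+-${package}-[0-9]" | sort -n | tail -n 1
}
```
</pre>

<p>
With the directory listing shown earlier, <c>latest_portage_log /var/log/portage foobar2</c>
would print <path>2115-foobar2-1.0.log</path>. Package names containing
characters that are special in regular expressions would need extra care.
</p>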

</body>
</section>
</chapter>
</guide>