One of the factors that delay a bug being fixed is the way it is reported. With this guide, we hope to improve communication between developers and users during bug resolution. Getting bugs fixed is an important, if not crucial, part of quality assurance for any project, and we hope this guide helps make that a success.
You're emerge-ing a package or working with a program and suddenly the worst happens -- you find a bug. Bugs come in many forms, such as emerge failures or segmentation faults. Whatever the cause, the fact remains that such a bug must be fixed. Here are a few examples of such bugs.
$ ./bad_code `perl -e 'print "A"x100'`
Segmentation fault
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/include/g++-v3/backward/backward_warning.h:32:2:
warning: #warning This file includes at least one deprecated or antiquated
header. Please consider using one of the 32 headers found in section 17.4.1.2 of
the C++ standard. Examples include substituting the <X> header for the <X.h>
header for C++ includes, or <sstream> instead of the deprecated header
<strstream.h>. To disable this warning use -Wno-deprecated.
In file included from main.cc:40:
menudef.h:55: error: brace-enclosed initializer used to initialize ` OXPopupMenu*'
menudef.h:62: error: brace-enclosed initializer used to initialize ` OXPopupMenu*'
menudef.h:70: error: brace-enclosed initializer used to initialize ` OXPopupMenu*'
menudef.h:78: error: brace-enclosed initializer used to initialize ` OXPopupMenu*'
main.cc: In member function `void OXMain::DoOpen()':
main.cc:323: warning: unused variable `FILE*fp'
main.cc: In member function `void OXMain::DoSave(char*)':
main.cc:337: warning: unused variable `FILE*fp'
make[1]: *** [main.o] Error 1
make[1]: Leaving directory `/var/tmp/portage/xclass-0.7.4/work/xclass-0.7.4/example-app'
make: *** [shared] Error 2
!!! ERROR: x11-libs/xclass-0.7.4 failed.
!!! Function src_compile, Line 29, Exitcode 2
!!! 'emake shared' failed
These errors can be quite troublesome. However, once you find them, what do
you do? The following sections look at two important tools for handling
run time errors. After that, we'll look at compile errors and how to
handle them. Let's start with the first tool for debugging run time
errors: GDB.
GDB, or the (G)NU (D)e(B)ugger, is a program used to find run time errors that
normally involve memory corruption. First off, let's take a look at what
debugging entails. One of the main things you must do in order to debug a
program is to
(debug symbols stripped)
-rwxr-xr-x 1 chris users 3140 6/28 13:11 bad_code
(debug symbols intact)
-rwxr-xr-x 1 chris users 6374 6/28 13:10 bad_code
Just for reference,
CFLAGS="-O2 -pipe -ggdb3"
CXXFLAGS="${CFLAGS}"
Lastly, you can also add debug to the package's USE flags. This can be done
with the
# echo "category/package debug" >> /etc/portage/package.use
Then we re-emerge the package with the modifications we've made so far, as shown below.
# FEATURES="nostrip" emerge package
Now that debug symbols are set up, we can continue with debugging the program.
Let's say we have a program here called "bad_code". Some person claims that the program crashes and provides an example. You go ahead and test it out:
$ ./bad_code `perl -e 'print "A"x100'`
Segmentation fault
It seems this person was right. Since the program is obviously broken, we have
a bug at hand. Now, it's time to use
$ gdb --args ./bad_code `perl -e 'print "A"x100'`
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
You should see a prompt that says "(gdb)" and waits for input. First, we have to
run the program. We type in
(gdb) run
Starting program: /home/chris/bad_code
Program received signal SIGSEGV, Segmentation fault.
0xb7ec6dc0 in strcpy () from /lib/libc.so.6
Here we see the program starting, as well as a notification of SIGSEGV, or
Segmentation Fault. This is GDB telling us that our program has crashed. It
also gives the last run function it could trace when the program crashes.
However, this isn't too useful, as there could be multiple strcpy's in the
program, making it hard for developers to find which one is causing the issue.
To help them out, we produce what's called a backtrace. A backtrace walks
backwards through the chain of function calls that were still in progress when
the program crashed, ending at the function at fault. Functions that have already
returned (without causing a crash) will not show up in the backtrace. To get a
backtrace, at the (gdb) prompt, type in
(gdb) bt
#0 0xb7ec6dc0 in strcpy () from /lib/libc.so.6
#1 0x0804838c in run_it ()
#2 0x080483ba in main ()
You can notice the trace pattern clearly. main() is called first, followed by
run_it(), and somewhere in run_it() lies the strcpy() at fault. Things such as
this help developers narrow down problems. There are a few exceptions to the
output. First off is forgetting to enable debug symbols with
(gdb) bt
#0 0xb7e2cdc0 in strcpy () from /lib/libc.so.6
#1 0x0804838c in ?? ()
#2 0xbfd19510 in ?? ()
#3 0x00000000 in ?? ()
#4 0x00000000 in ?? ()
#5 0xb7eef148 in libgcc_s_personality () from /lib/libc.so.6
#6 0x080482ed in ?? ()
#7 0x080495b0 in ?? ()
#8 0xbfd19528 in ?? ()
#9 0xb7dd73b8 in __guard_setup () from /lib/libc.so.6
#10 0xb7dd742d in __guard_setup () from /lib/libc.so.6
#11 0x00000006 in ?? ()
#12 0xbfd19548 in ?? ()
#13 0x080483ba in ?? ()
#14 0x00000000 in ?? ()
#15 0x00000000 in ?? ()
#16 0xb7deebcc in __new_exitfn () from /lib/libc.so.6
#17 0x00000000 in ?? ()
#18 0xbfd19560 in ?? ()
#19 0xb7ef017c in nullserv () from /lib/libc.so.6
#20 0xb7dd6f37 in __libc_start_main () from /lib/libc.so.6
#21 0x00000001 in ?? ()
#22 0xbfd195d4 in ?? ()
#23 0xbfd195dc in ?? ()
#24 0x08048201 in ?? ()
This backtrace contains a large number of ?? marks. This is because without
debug symbols,
(gdb) bt
#0 0xb7e4bdc0 in strcpy () from /lib/libc.so.6
#1 0x0804838c in run_it (input=0x0) at bad_code.c:7
#2 0x080483ba in main (argc=1, argv=0xbfd3a434) at bad_code.c:12
Here we see that a lot more information is available for developers. Not only is function information displayed, but even the exact line numbers of the source files. This is the preferred method if you can spare the extra space. Here's how much the file size varies between stripped, debug, and -ggdb3 enabled programs.
(debug symbols stripped)
-rwxr-xr-x 1 chris users 3140 6/28 13:11 bad_code
(debug symbols enabled)
-rwxr-xr-x 1 chris users 6374 6/28 13:10 bad_code
(-ggdb3 flag enabled)
-rwxr-xr-x 1 chris users 19552 6/28 13:11 bad_code
As you can see, -ggdb3 adds about
(gdb) quit
The program is running. Exit anyway? (y or n) y
$
This ends the walk-through of
Programs often use files to fetch configuration information, access hardware or
write logs. Sometimes, a program attempts to reach such files incorrectly. A
tool called
$ ./foobar2
Configuration says: bar
Our previous configuration specifically had it set to foo, so let's use
We make
# strace -ostrace.log ./foobar2
This creates a file called
open(".foobar2/config", O_RDONLY) = 3
read(3, "bar", 3) = 3
Aha! So there's the problem. Someone moved the configuration directory to
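A full trace is usually much longer than the two lines shown above, so it can help to filter it down to the file-opening calls. The following is only a sketch: it assumes the trace was saved to strace.log as in the earlier command, and the function name opened_paths is made up for illustration. Newer strace versions typically log openat rather than open, so both are matched.

```shell
# Hypothetical helper (not part of strace): list the paths a traced
# program tried to open.
# $1 is a log written with "strace -o<file>", for example strace.log.
opened_paths() {
    # Keep only the open()/openat() lines, then print the first quoted
    # string on each line, which is the path argument.
    grep -E '^(open|openat)\(' "$1" | sed -E 's/^[^"]*"([^"]*)".*$/\1/'
}

# Usage: opened_paths strace.log
```

A path that looks out of place in this list is a good candidate to mention in the bug report.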
SIS5513: IDE controller at PCI slot 0000:00:02.5
SIS5513: chipset revision 208
SIS5513: not 100% native mode: will probe irqs later
SIS5513: SiS 961 MuTIOL IDE UDMA100 controller
ide0: BM-DMA at 0x4000-0x4007, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0x4008-0x400f, BIOS settings: hdc:DMA, hdd:DMA
Probing IDE interface ide0...
input: ImPS/2 Generic Wheel Mouse on isa0060/serio1
hda: WDC WD800BB-60CJA0, ATA DISK drive
hdb: CD-RW 52X24, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: SAMSUNG DVD-ROM SD-616T, ATAPI CD/DVD-ROM drive
hdd: Maxtor 92049U6, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes not supported
hda: hda1
hdd: max request size: 128KiB
hdd: 39882528 sectors (20419 MB) w/2048KiB Cache, CHS=39566/16/63, UDMA(66)
hdd: cache flushes not supported
hdd: unknown partition table
hdb: ATAPI 52X CD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
hdc: ATAPI 48X DVD-ROM drive, 512kB Cache, UDMA(33)
ide-floppy driver 0.99.newide
libata version 1.11 loaded.
usbmon: debugs is not available
The dmesg displayed here is my machine's bootup. You can see the hard disks and
input devices being initialized. While what you see here seems relatively
harmless,
# synce-serial-start
/usr/sbin/pppd: In file /etc/ppp/peers/synce-device: unrecognized option '/dev/tts/USB0'
synce-serial-start was unable to start the PPP daemon!
The connection fails, as we see here, and we assume that only the screen is in
powersave mode, and that maybe the connection is faulty. In order to see what
truly happened, we can use
$ dmesg | tail -n 4
usb 1-1.2: PocketPC PDA converter now attached to ttyUSB0
usb 1-1.2: USB disconnect, address 11
PocketPC PDA ttyUSB0: PocketPC PDA converter now disconnected from ttyUSB0
ipaq 1-1.2:1.0: device disconnected
This gives us the last 4 lines of the
$ dmesg > dmesg.log
You can then attach this to a bug report, or post it online somewhere for collaborative debugging sessions.
Now that we've taken a look at a few ways to debug runtime and kernel errors, let's take a look at how to handle emerge errors.
Let's take a look at this very simple
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-7.o foobar2-7.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-8.o foobar2-8.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-9.o foobar2-9.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2.o foobar2.c
foobar2.c:1:17: ogg.h: No such file or directory
make: *** [foobar2.o] Error 1
!!! ERROR: sys-apps/foobar2-1.0 failed.
!!! Function src_compile, Line 19, Exitcode 2
!!! Make failed!
!!! If you need support, post the topmost build error, NOT this status message
The program is compiling smoothly when it suddenly stops and presents an error message. This particular error can be split into three sections: the compile messages, the build error, and the emerge error message, as shown below.
(Compilation Messages)
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-7.o foobar2-7.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-8.o foobar2-8.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2-9.o foobar2-9.c
gcc -D__TEST__ -D__GNU__ -D__LINUX__ -L/usr/lib -I/usr/include -L/usr/lib/nspr/ -I/usr/include/fmod -c -o foobar2.o foobar2.c
(Build Error)
foobar2.c:1:17: ogg.h: No such file or directory
make: *** [foobar2.o] Error 1
(emerge Error)
!!! ERROR: sys-apps/foobar2-1.0 failed.
!!! Function src_compile, Line 19, Exitcode 2
!!! Make failed!
!!! If you need support, post the topmost build error, NOT this status message
The compilation messages are what lead up to the error. It's usually good to include at least 10 lines of compile information so that the developer knows how far the compilation had gotten when the error occurred.
Make errors are the actual error and carry the information the developer needs. A line containing "make: ***" usually marks the point where the build failed. Normally, you can copy and paste the 10 lines above it and the developer will be able to address the issue. However, this may not always work, and we'll take a look at an alternative shortly.
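If the build output has been saved to a file, the "10 lines above" rule of thumb can be applied mechanically. The sketch below is one possible way to do it, not a Portage feature: the function name build_error_context is hypothetical, and it relies on the -m and -B options found in GNU grep.

```shell
# Hypothetical helper: print the first "make: ***" line of a saved build
# log together with the lines leading up to it.
# $1 = log file, $2 = number of context lines (defaults to 10).
build_error_context() {
    # -m 1 stops at the first match; -B prints that many lines before it.
    # The pattern also accepts sub-make prefixes such as "make[1]: ***".
    grep -E -m 1 -B "${2:-10}" 'make(\[[0-9]+\])?: \*\*\*' "$1"
}

# Usage: build_error_context build.log
#        build_error_context build.log 20
```

Stopping at the first match is deliberate: later "make: ***" lines are usually just outer make processes reporting the same failure.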
The emerge error is what
PORT_LOGDIR is a portage variable that sets up a log directory for separate
emerge logs. Let's take a look and see what that entails. First, run your emerge
with PORT_LOGDIR set to your favorite log location. Let's say we have a
location
# PORT_LOGDIR=/var/log/portage emerge foobar2
Now the emerge fails again. However, this time we have a log we can work with, and attach to the bug later on. Let's take a quick look at our log directory.
# ls -la /var/log/portage
total 16
drwxrws--- 2 root root 4096 Jun 30 10:08 .
drwxr-xr-x 15 root root 4096 Jun 30 10:08 ..
-rw-r--r-- 1 root root 7390 Jun 30 10:09 2115-foobar2-1.0.log
The log files have the format [counter]-[package name]-[version].log. The counter is a special variable that marks this package as the n-th package you've emerged. This prevents duplicate logs from appearing. A quick look at the log file will show the entire emerge process. The log can be attached later on, as we'll see in the bug reporting section. Now that we have the information needed to report the bug, we can go ahead and report it. However, before we get started on that, we need to make sure no one else has reported the issue.
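Once a log directory has collected many files, the naming scheme described above makes it easy to pick out the log to attach. The following sketch assumes that scheme and assumes that the highest counter belongs to the most recent emerge of a package; the function name latest_log is hypothetical. The glob is only an approximation, since another package whose name ends in "-foobar2" would match as well.

```shell
# Hypothetical helper: print the name of the most recent log for a
# package, assuming names of the form [counter]-[package name]-[version].log.
# $1 = log directory, $2 = package name (without category or version).
latest_log() {
    for f in "$1"/*-"$2"-*.log; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        basename "$f"
    done | sort -n | tail -n 1
}

# Usage: latest_log /var/log/portage foobar2
```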