Troubleshooting Linux Kernel Modules and Device Drivers - PowerPoint PPT Presentation

About This Presentation
Title:

Troubleshooting Linux Kernel Modules and Device Drivers

Description:

When the Linux kernel determines that a fatal error has occurred, and no ... Typically results in a system reboot on an embedded Linux target ... – PowerPoint PPT presentation

Number of Views:869
Avg rating:3.0/5.0
Slides: 36
Provided by: linux2
Category:

less

Transcript and Presenter's Notes

Title: Troubleshooting Linux Kernel Modules and Device Drivers


1
Troubleshooting Linux Kernel Modules and Device
Drivers
  • Mike Anderson
  • Chief Scientist
  • The PTR Group, Inc.
  • mike_at_ThePTRGroup.com

Source www.crobike.de
2
What Well Talk About
  • How do errors show up in the kernel?
  • Watching kernel/user-space interaction via
    strace
  • Debugging with printk
  • Using the /proc file system
  • Using the kgdb debugger
  • Debugging with hardware ala LEDs or a JTAG unit

3
Challenges of Kernel Debugging
  • There are many features that can make kernel
    debugging especially difficult
  • Optimizing compilers can rearrange code
  • Instruction pointer seems to jump around
  • The use of the MMU can obfuscate addresses
  • Physical vs. virtual addresses
  • Startup code is particularly difficult to debug
    because of its closeness to the metal
  • No equivalent for the user-space gdbserver for
    drivers
  • Early kernel debugging may require hardware
    assistance

4
Device Drivers/Kernel Modules
  • Assuming that your kernel is otherwise working,
    most of the problems that youll encounter are
    related to device drivers
  • Drivers can either be statically linked or
    dynamically loaded to the kernel
  • A dynamically loaded driver takes the form of a
    kernel module
  • Can be dynamically loaded and unloaded at kernel
    run time
  • Frequently handled by daemons such as udev

5
Example Module
  • include
  • include
  • include
  • MODULE_LICENSE("GPL and additional rights")
  • MODULE_AUTHOR( Driver_Author_at_someplace.org )
  • MODULE_DESCRIPTION( My first driver! )
  • int __init mymodule_init_module(void)
  • printk(KERN_DEBUG mymodule_init_module()
    called, ")
  • return 0
  • void __exit mymodule_cleanup_module(void)
  • printk(KERN_DEBUG mymodule_cleanup_module()
    called\n")
  • module_init(mymodule_init_module)
  • module_exit(mymodule_cleanup_module)

6
When Things go Wrong
  • Problems in device drivers typically manifest
    themselves in one of three ways
  • Kernel panic
  • Fatal to the system
  • Kernel oops
  • Near fatal to the system
  • Hardware just doesnt work correctly
  • Could be fatal to you!

Source picasaweb.google.com
7
Kernel Panic
  • When the Linux kernel determines that a fatal
    error has occurred, and no recovery is possible,
    it panics
  • Frequently, an exception in an interrupt context
  • Panic outputs a message to the console
  • The output will help you find the source of the
    bug
  • Typically results in a system reboot on an
    embedded Linux target
  • Or, blinking keyboard LEDs on some desktop
    versions of Linux

Source regmedia.co.uk
8
Sample Panic Output
  • EIP 0060 Not tainted VLI
  • EFLAGS 00010286 (2.6.18)
  • EIP is at nf_queue0x16/0x240
  • eax 00000000 ebx 00000001 ecx 00000000 edx
    00000002
  • esi dece0920 edi c03a3aa8 ebp c0277e70 esp
    cf165e78
  • ds 007b es 007b ss 0068
  • Process sh (pid5278, threadinfocf164000
    taskdc5b4ab0)
  • Stack cf165f00 dae29000 00000000 c0277e70
    00000001 cf165f00 c03a3aa8 c0277e70
  • c02b1f7c cf165f00 dece0920 00000002 00000001
    dae29000 00000000 c0277e70
  • 00000000 dece0920 00000000 da641b40 da54bcde
    cc318abc c0277c10 00000002
  • Call Trace
  • ip_local_deliver_finish0x0/0x230
  • ip_local_deliver_finish0x0/0x230
  • deleted for space
  • do_softirq0x26/0x30
  • do_IRQ0x1e/0x30
  • common_interrupt0x1a/0x20

9
Kernel oops
  • An oops message is displayed when a recoverable
    error has occurred in kernel space
  • Access to bad address, e.g., through a NULL
    pointer
  • Illegal or invalid instruction
  • Etc
  • The calling user process is killed
  • The system should be considered unstable at this
    point
  • The oops message displays
  • The state of the processor at the time of the
    fault, including registers and address of
    faulting instruction
  • function call stack traceback
  • The addresses are replaced with symbols if the
    kallsyms kernel configuration option is selected
    at kernel compile time

10
Example Oops
  • Modules linked in arc4 ecb crypto_blkcipher
    ieee80211_crypt_wep vfat fat ipt_MASQUERADE
    iptable_nat nf_nat bridge bnep rfcomm l2cap
    bluetooth ib_iser rdma_cm ib_cm iw_cm ib_sa
    ib_mad ib_core ib_addr iscsi_tcp libiscsi
    scsi_transport_iscsi sunrpc ipt_REJECT
    nf_conntrack_ipv4 iptable_filter ip_tables
    ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state
    nf_conntrack ip6table_filter ip6_tables x_tables
    cpufreq_ondemand acpi_cpufreq fuse loop
    dm_multipath ipv6 ppdev parport_pc parport video
    8139cp output ac battery firewire_ohci fglrx(P)
    firewire_core crc_itu_t 8139too mii ipw2200
    ieee80211 ieee80211_crypt snd_intel8x0m
    asus_laptop snd_intel8x0 button snd_seq_dummy
    snd_ac97_codec ac97_bus snd_seq_oss
    snd_seq_midi_event joydev snd_seq pcspkr
    serio_raw snd_seq_device snd_pcm_oss
    snd_mixer_oss snd_pcm usb_storage iTCO_wdt
    snd_timer snd iTCO_vendor_support soundcore
    snd_page_alloc sr_mod cdrom sg dm_snapshot
    dm_zero dm_mirror dm_mod pata_acpi ata_generic
    ata_piix libata sd_mod scsi_mod ext3 jbd mbcache
    uhci_hcd ohci_hcd ehci_hcd last unloaded
    microcode
  • Pid 5488, comm Xorg Tainted P
    (2.6.25.3-18.fc9.i686 1)
  • EIP 0060 EFLAGS 00213246 CPU 0
  • EIP is at task_has_capability0x48/0x76
  • EAX 00000030 EBX f6852030 ECX f6a8af28 EDX
    00000000
  • ESI f36e50e0 EDI f354fec8 EBP f354fed4 ESP
    f354fe84
  • DS 007b ES 007b FS 00d8 GS 0033 SS 0068
  • Process Xorg (pid 5488, tif354f000
    taskf35f0e90 task.tif354f000)
  • Stack c06d48b5 f6852030 f35f0e90 00000003
    f35f0e90 f6852030 00000000 00000000
  • 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000
  • 00000000 f6852030 f35f0e90 f66f8000
    f354fee4 c04d03da f35f0e90 f8d52580

11
Example Oops part 2
  • Call Trace
  • ? selinux_capable0x1f/0x23
  • ? security_capable0xc/0xe
  • ? __capable0xb/0x22
  • ? firegl_version0x0/0x1b0 fglrx
  • ? capable0x10/0x12
  • ? firegl_ioctl0xe7/0x220 fglrx
  • ? mutex_lock0x1d/0x2d
  • ? file_has_perm0x7c/0x85
  • ? ip_firegl_ioctl0xe/0x10 fglrx
  • ? vfs_ioctl0x50/0x69
  • ? do_vfs_ioctl0x239/0x24c
  • ? selinux_file_ioctl0xa8/0xab
  • ? sys_ioctl0x40/0x5b
  • ? syscall_call0x7/0xb
  • Code 00 89 d0 f3 ab 8b 4d
    b8 89 d8 b2 04 c1 f8 05 c6 45 bc 03 89 5d c4 89
    4d c0 74 16 48 b2 45 74 11 53 68 b5 48 6d c0 e8
    11 f3 15 00 0b 59 5b eb fe 8b 46 08 83 e3 1f
    0f b7 f2 8d 55 bc 88 d9 52
  • EIP task_has_capability0x48/0x76
    SSESP 0068f354fe84
  • --- end trace 9451402cb10d9c54 ---

12
Enabling Symbolic Kernel Output
13
Module Debugging Techniques
  • Examine the interaction with the kernel via
    strace
  • The next line of defense is printk
  • There may be additional output youre not seeing
  • Next, we can try adding /proc file system
    entries
  • Instrument the driver for debugging
  • Enable source debugging via kgdb
  • Using hardware debuggers and blinky lights

14
Using strace to Watch System Calls
  • When debugging what appears to be a kernel-space
    error, it can be helpful to watch the system
    calls that are made from user-space
  • See what events lead to the error
  • strace displays all system calls made by a
    program
  • Can display timestamp information per system call
    as well

15
Using strace to Watch System Calls 2
  • strace displays each system calls arguments and
    return values
  • string arguments are printed very helpful!
  • errno values displayed symbolically
  • The program being traced runs normally
  • Not under control of a debugger
  • No need to specially compile the user
    application
  • You can attach to a running program
  • And trace forked applications as well

16
Example strace Output
  • / strace ls /dev/labdev
  • execve("/bin/ls", "ls", "/dev/labdev", / 8
    vars /) 0
  • fcntl64(0, F_GETFD) 0
  • fcntl64(1, F_GETFD) 0
  • fcntl64(2, F_GETFD) 0
  • geteuid() 0
  • getuid() 0
  • getegid() 0
  • getgid() 0
  • brk(0)
    0x1028ad68
  • brk(0x1028bd68)
    0x1028bd68
  • brk(0x1028c000)
    0x1028c000
  • ioctl(1, TIOCGWINSZ or TIOCGWINSZ, ws_row0,
    ws_col0, ws_xpixel0, ws_ypixel0
  • ioctl(1, TCGETS or TCGETS, B9600 opost isig
    icanon echo ...) 0
  • ioctl(1, TCGETS or TCGETS, B9600 opost isig
    icanon echo ...) 0
  • lstat("/dev/labdev", st_modeS_IFCHR0644,
    st_rdevmakedev(254, 0), ...) 0
  • open("/etc/localtime", O_RDONLY) -1
    ENOENT (No such file or directory)
  • lstat("/dev/labdev", st_modeS_IFCHR0644,
    st_rdevmakedev(254, 0), ...) 0
  • fstat64(1, st_modeS_IFCHR0600,
    st_rdevmakedev(4, 64), ...) 0

17
Debugging with printk
  • printk debugging is the debug method preferred by
    Linus
  • At least, according to his email traffic
  • Insert messages to be displayed at points of
    interest in kernel-space code
  • E.g., printk(KERN_INFO my_x d\n, my_x)
  • Printk works like printf does in user-space
    except that printk can only print integers,
    strings addresses
  • printk can also be called from within ISRs
  • Messages can have importance settings that
    allow filtering
  • Importance is set by prepending a 3-character
    string to the output message

18
Debugging with printk 2
  • The messages are placed in a circular buffer that
    can be retrieved post mortem if needed
  • The importance string that is prepended to the
    printk message can be found in include/linux/kerne
    l.h
  • define KERN_EMERG "" / system is
    unusable /define KERN_ALERT "" / action
    to be taken immediately /define KERN_CRIT
    "" / critical conditions /define KERN_ERR
    "" / error conditions /define
    KERN_WARNING "" / warning conditions
    /define KERN_NOTICE "" / normal but
    significant condition /define KERN_INFO
    "" / informational /define KERN_DEBUG
    "" / debug-level messages /

19
Debugging with printk 3
  • Some level of control for printk output can be
    found in /proc/sys/kernel/printk
  • Lets look at the following output
  • cat /proc/sys/kernel/printk
  • 7 4 1 7
  • This indicates
  • The console_loglevel is 7, so messages with
    importance of 0..6 will currently go to the
    console
  • The default message log level is 4, so messages
    that do not specify an importance are treated as
    level 4
  • The minimum console log level is 1, so
    console_loglevel cannot be set to any value less
    than 1
  • The default console log level is 7, so
    console_loglevel starts out set to 7

20
Debugging with printk 4
  • You may control the console_loglevel by writing
    to /proc/sys/kernel/printk
  • To enable all printk messages with importance
    levels 0..7
  • echo 8 /proc/sys/kernel/printk
  • If the kernel command line contains the word
    debug, the console_loglevel starts with a value
    of 10

21
Using the /proc File System
  • Use /proc entries for driver instrumentation
  • It is possible to register write functions that
    allow us to dynamically modify values in the
    kernel (or drivers, or modules)
  • Write values to the /proc file system entry
  • Read /proc entries allow us to retrieve
    information from a running kernel entity
  • Information is provided live
  • Look at the driver source, there may already be
    /proc entries that can help you

22
Techniques for Source Debugging
  • The two primary ways to provide source debugging
    in the Linux kernel are based on either kgdb or
    on the use of a hardware JTAG probe
  • Unfortunately, kgdb is not a standard feature of
    the kernel a/o 2.6.25.8
  • Youll have to patch your kernel to enable it
  • Either technique will require the use of a kernel
    image that is compiled with debugging symbols
  • Unless you like debugging in assembly language

23
Compiling the Kernel with Debug Info
  • This will increase the size of the debug kernel
    image by about 30
  • However, you dont need to load the debug version
    of the kernel
  • Load the non-debug version to the target, but use
    the debug version for the debugger/JTAG probe
  • Save off the vmlinux and System.map file because
    these are used by the debugger or by you to find
    key addresses
  • The (b)zImage can be loaded on the target as
    normal

24
Enabling Debugging in the Kernel
25
Kernel gdb (kgdb)
  • If you are using a stock kernel, kgdb is not
    included
  • Linus doesnt believe in a source debugger in the
    kernel
  • Many commercial Linux vendors do include it in
    their distributions though
  • kgdb can be downloaded fromhttp//kgdb.linsyssof
    t.com/downloads.htmor http//sourceforge.net and
    look for kgdb
  • Youll need to patch the kernel
  • A new kgdb light is in the works for 2.6.26
  • Kgdb over the system console

26
Enable kgdb in the Kernel
27
Kgdb Light in 2.6.26-rc8
  • Uses system console for I/O

28
kgdb Lash up
  • kgdb supports debugging via the serial port
  • The gdb debugger is running on a second machine
    using the vmlinux you compiled with debugging
    symbols
  • You attach to the system being debugged using
    gdbs target remote command

Network
Host
Target
Ethernet
RS-232
29
Hardware-assisted Debugging
  • There are a number of devices that can help with
    debugging
  • LEDs, JTAGs, logic analyzers, oscilloscopes, bus
    analyzers and more
  • These can range from a few cents to implement to
    several 10s of thousands of dollars
  • You typically get what you pay for

30
Debugging with LEDs
  • Very simple
  • Blink on/off in various code sections under
    debug
  • Blink in sequences
  • Can display multiple-bit codes if multiple LEDs
    are available
  • Very fast, little impact on run-time performance
  • Adding LED debug code will likely not make the
    problem go away
  • These may be the only option for debugging early
    x86 code

31
Debugging with LEDs, Caveats
  • The LED(s) must be free for use
  • Not tied in hardware to a network PHY or to
    displaying power status, for example
  • LED are not very verbose
  • You must decipher what the blinking means
  • Can be difficult to determine where you are in
    the code
  • You can also attach an oscilloscope to the GPIO
    pins found on many processors for more information

32
Hardware Debuggers
  • In the past, in-circuit emulators (ICE) where the
    debugger of choice
  • You pulled the CPU, plugged the ICE in and
    plugged the CPU into the ICE
  • But, these where 80K each
  • Logic analyzers are also good to have
  • But, they are 35K for an empty mainframe
  • PC-based versions can be had for
  • IEEE 1149.1 (JTAG) has become the debugger da
    jour
  • JTAG uses a boundary-scan protocol
  • These range from 70 to 20K depending on model
    and features
  • At a minimum, a JTAG is really a must-have for
    firmware and board bring up

33
Debugging with a JTAG Probe
  • Debugging with a JTAG unit is much less involved
    than using kgdb
  • Compile the kernel with debugging enabled
  • No need to patch the kernel for kgdb
  • Assumes your platform supports a JTAG interface
  • You connect to the JTAG unit using whatever
    technique your JTAG probe requires
  • Your JTAG GUI is dependent on the vendor
  • For those JTAG units that are gdb-aware, use the
    appropriate target remote commands

34
Example JTAG Usage
  • Connect the JTAG to the target and the host
  • Start the host application to control the JTAG
  • Reset the target and load the register
    configuration settings to the JTAG unit
  • Load code and enjoy!
  • Useful for debugging drivers as well as bringing
    up new firmware and BSPs

35
Summary
  • Real developers use printk or at least Linus
    does ?
  • Tools like strace allow you see the flow of
    execution
  • The /proc filesystem gives you a window into the
    kernel/drivers
  • Kgdb uses familiar gdb technology but at the
    kernel level
  • LEDs are a fast and easy way to get info out of
    the machine as well
  • Hardware JTAG debug tools may be available
  • These can be invaluable if you can get one
Write a Comment
User Comments (0)
About PowerShow.com