Title: LKCD Linux Kernel Crash Dump
The Journey
- Introduction
- LKCD Process
- Design Considerations
- Kernel Implementation
- User Level Analysis (Lcrash)
Introduction
- LKCD is a set of kernel and application code to
configure, implement, and analyze system crash
dumps
- Objectives
- Post-failure kernel analysis
- Kernel problems are resolved more quickly
- As the Linux kernel becomes more complex, the
need for LKCD increases
LKCD - Process
LKCD Kernel Design Considerations
- The biggest design considerations were
- Dump Save Mechanism
- Raw I/O vs. Buffer Cache I/O
- Kernel Code Location
- Dump Storage
LKCD Kernel Design Considerations
- 1. Dump Save Mechanism
- PROM Save Method
- Crash, reset the system, and have the
hardware's PROM save the memory image to disk.
- Kernel Save Method
- Crash, save the memory image to disk, and then
reset the system
LKCD Kernel Design Considerations
- 1. Dump Save Mechanism
- Kernel save method chosen because
- PROM/BIOS is too architecture-specific
- reset/power-off may clear memory
- kernel disk driver restrictions
- kernel code can be modified easily, whereas PROM
code is difficult to change
LKCD Kernel Design Considerations
- 2. Raw I/O vs. Buffer Cache I/O
- Buffer cache locking cannot be worked around for
dumping without a major performance hit on basic
I/O
- Raw I/O was not fully supported in Linux (in the
kernel)
- IDE, RAID, etc., drivers need raw I/O hooks
(current plan is to create driver layer above to
avoid necessary locking)
LKCD Kernel Design Considerations
- 3. Kernel Code Location
- Code changes are separated into generic and
architecture-specific files
- kernel/vmdump.c
- arch//kernel/vmdump.c
- Additional modifications made to
include/linux/sysctl.h, kernel/sysctl.c, and
kernel crash hook functions
LKCD Kernel Design Considerations
- 4. Dump Storage
- Memory dumps are saved to swap space
- Swapping during boot-up is an issue
- Disk partition tables in memory -- could this
cause a data corruption problem?
- Cannot assume filesystem layer will be available
during crash
LKCD - Kernel Implementation
- Dump Process Activation
- Kernel Hooks for executing dump process
- The kernel directly calls panic()
- A kernel exception occurs due to a system fault,
calls die_if_kernel()
- In both instances dump_execute() is called, which
in turn calls the architecture-specific
__dump_execute() to save the dump to disk
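The activation path above can be sketched in user-space C. This is only an illustration of the call chain, not the LKCD source: sim_panic(), sim_die_if_kernel(), arch_dump_execute() (standing in for __dump_execute()), and the counter are hypothetical names, and the "save" step just prints a message.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter so a caller can observe how often a dump ran. */
int arch_dump_calls = 0;

/* Stand-in for the architecture-specific __dump_execute():
 * the real routine writes the memory image to the dump device. */
void arch_dump_execute(const char *reason)
{
    arch_dump_calls++;
    printf("saving dump to disk: %s\n", reason);
}

/* Generic entry point shared by every crash path. */
void dump_execute(const char *reason)
{
    assert(reason != NULL);
    arch_dump_execute(reason);
}

/* Path 1: the kernel calls panic() directly. */
void sim_panic(const char *msg)
{
    dump_execute(msg);
}

/* Path 2: an exception handler calls die_if_kernel(); a dump is
 * taken only if the fault happened in kernel mode. */
void sim_die_if_kernel(const char *msg, int in_kernel_mode)
{
    if (in_kernel_mode)
        dump_execute(msg);
}
```

Both entry points funnel into one generic routine, so the architecture-specific code has a single caller to support.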
LKCD - Kernel Implementation
- Dump layout (diagram)
- Dump Header
- Dump Page Headers
- Dump Pages
LKCD - Kernel Implementation
- Storing Crash Dumps
- The first 64K of the crash dump contains the dump
header, which shows the system state at the time
of the kernel failure
- Memory pages are written next, each with a page
header containing
- virtual address of the page in memory
- size of page (important if compressed)
- page flags (compressed, raw, dump end)
- Finally, a page header with a special end marker
is written and the dump process completes
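The on-disk layout described above can be sketched as follows. The structure, field names, and flag values are hypothetical stand-ins chosen for illustration; the actual LKCD definitions may differ. The sketch writes page records into a memory buffer and then walks them until the end marker.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values; the real LKCD constants may differ. */
enum {
    PAGE_FLAG_COMPRESSED = 0x1,
    PAGE_FLAG_RAW        = 0x2,
    PAGE_FLAG_END        = 0x4
};

/* Hypothetical page header mirroring the three fields on the slide. */
struct page_header {
    uint64_t address;  /* address of the page in memory */
    uint32_t size;     /* payload bytes following this header */
    uint32_t flags;    /* compressed, raw, or dump end */
};

#define DUMP_HEADER_SIZE (64u * 1024u)  /* first 64K holds the dump header */

/* Append one record (header plus payload) at off; returns the new offset. */
size_t write_page(uint8_t *buf, size_t cap, size_t off, uint64_t addr,
                  const void *data, uint32_t size, uint32_t flags)
{
    struct page_header h = { addr, size, flags };

    assert(off + sizeof h + size <= cap);
    memcpy(buf + off, &h, sizeof h);
    if (size > 0)
        memcpy(buf + off + sizeof h, data, size);
    return off + sizeof h + size;
}

/* Walk the records after the dump header. Returns the number of pages
 * seen before the end marker, or -1 if no end marker was found. */
int count_pages(const uint8_t *buf, size_t len)
{
    size_t off = DUMP_HEADER_SIZE;
    int pages = 0;

    while (off + sizeof(struct page_header) <= len) {
        struct page_header h;

        memcpy(&h, buf + off, sizeof h);
        if (h.flags & PAGE_FLAG_END)
            return pages;
        off += sizeof h + h.size;  /* size matters when pages are compressed */
        pages++;
    }
    return -1;
}
```

Because each header carries its own payload size, a reader can skip from record to record even when compressed pages have different lengths.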
Kernel Dump Tunables
- The kernel dump tunables are listed in
/etc/sysconfig/vmdump, which configures the
behavior of the LKCD system
- The tunables are
- DUMP_ACTIVE
- DUMPDEV
- DUMPDIR
- DUMP_LEVEL
- DUMP_COMPRESS_PAGES
- PANIC_TIMEOUT
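As a rough illustration of how a tool might read these tunables, the sketch below parses KEY=VALUE lines into a structure. It assumes the file uses shell-style KEY=VALUE assignments and that the non-path tunables are integers; the structure, the function name, and any sample values are hypothetical and not taken from LKCD.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of the six tunables (types are assumed). */
struct dump_config {
    int  dump_active;
    char dumpdev[128];
    char dumpdir[128];
    int  dump_level;
    int  dump_compress_pages;
    int  panic_timeout;
};

/* Parse one "KEY=VALUE" line. Returns 1 if a known tunable was set,
 * 0 for blank lines, comments, malformed lines, or unknown keys. */
int parse_tunable(struct dump_config *cfg, const char *line)
{
    char key[64], val[128];

    assert(cfg != NULL && line != NULL);
    if (sscanf(line, " %63[A-Z_]=%127[^\n]", key, val) != 2)
        return 0;

    if (strcmp(key, "DUMP_ACTIVE") == 0)
        cfg->dump_active = atoi(val);
    else if (strcmp(key, "DUMPDEV") == 0)
        snprintf(cfg->dumpdev, sizeof cfg->dumpdev, "%s", val);
    else if (strcmp(key, "DUMPDIR") == 0)
        snprintf(cfg->dumpdir, sizeof cfg->dumpdir, "%s", val);
    else if (strcmp(key, "DUMP_LEVEL") == 0)
        cfg->dump_level = atoi(val);
    else if (strcmp(key, "DUMP_COMPRESS_PAGES") == 0)
        cfg->dump_compress_pages = atoi(val);
    else if (strcmp(key, "PANIC_TIMEOUT") == 0)
        cfg->panic_timeout = atoi(val);
    else
        return 0;
    return 1;
}
```

A caller would feed each line of the file to parse_tunable() and ignore lines for which it returns 0.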
User Level Analysis - LCrash
- lcrash is a utility that generates detailed
kernel information about crash dumps. It contains
many features for displaying information about
the events leading up to a system crash in a
clear, easy-to-read manner
- It basically operates in two modes
- Crash Dump Report Generation
- Interactive Crash Dump Analysis
User Level Analysis - LCrash
- Crash Dump Report Generation
- This report contains selected pieces of
information from the kernel considered most
useful when trying to identify the cause of a
crash. The LCRASH report includes the following
information
- General system information
- Type of crash
- Dump of system log_buf
- CPU summary
- Kernel stack trace leading up to the system PANIC
- Disassembly of instructions before and after the
instructions that caused the crash
User Level Analysis - LCrash
- LCRASH Interactive Commands
- For a more detailed examination of the elements
of a crash
- Kernel data displayed in a clear, easy-to-read
manner
- Invoked via an ASCII command line user interface
featuring command line editing and command
history
- Command output can be piped to utilities such as
more and grep
User Level Analysis - LCrash
- LCRASH Interactive Commands example
- stat: Displays pertinent system information and
the contents of the log_buf array
- vtop: Displays virtual to physical address
mappings for both kernel and application virtual
addresses
- symbol: Maps kernel symbols to virtual addresses
User Level Analysis - LCrash
- LCRASH Interactive Commands example
- dump: Dumps the contents of system memory in a
variety of bases (hexadecimal, decimal, or octal)
and data sizes (byte, short, int, or long)
- task: Displays relevant information for selected
tasks or all tasks running at the time of the
crash
- trace: Displays a kernel stack backtrace for
selected tasks, or for all tasks running on the
system
- dis: Disassembles one or more machine instructions
lcrash Example Output
- stat head
- sysname Linux
- nodename crashme.atmyhouse.com
- release 2.4.8
- version #9 SMP Mon Dec 10 00:05:19 PST 2001
- machine i686
- domainname (none)
- LOG_BUF
- dump log_buf 10
- 0xc0332c60 4c3e343c 78756e69 72657620 6e6f6973 Linux version
- 0xc0332c70 342e3220 2820382e 746f6f72 74617740 2.4.8 (root@cra
- 0xc0332c80 79657265 70612e65 shme.atm
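The text column in the dump output is the same memory read byte by byte. On a little-endian machine such as i686, each 32-bit word is printed most significant byte first, so the characters appear reversed within each word. The helper below is a small sketch (words_to_ascii() is a hypothetical name, not part of lcrash) that recovers the text from the printed words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode n little-endian 32-bit words into text, lowest byte first.
 * Non-printable bytes become '.'. out must hold 4 * n + 1 bytes. */
void words_to_ascii(const uint32_t *words, size_t n, char *out)
{
    assert(words != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        for (int b = 0; b < 4; b++) {
            unsigned char c = (words[i] >> (8 * b)) & 0xff;
            out[4 * i + b] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
        }
    }
    out[4 * n] = '\0';
}
```

Applied to the first row of words (4c3e343c 78756e69 72657620 6e6f6973), this yields "<4>Linux version", where "<4>" is the printk log-level prefix stored in log_buf ahead of the message.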
lcrash Example Output
- task
  ADDR        UID   PID  PPID  STATE  FLAGS  CPU  NAME
  0xc02e4000    0     0     0      0      0    -  swapper
  0xdfffc000    0     1     0      0  0x100    -  init
  0xdfff2000    0     2     1      1   0x40    -  keventd
  0xdffee000    0     3     0      0   0x40    -  ksoftirqd_CPU0
  . . .
  0xde47a000    0   867     1      1  0x100    -  mingetty
  0xda0fe000    0  1017   660      0  0x140    -  sshd
  0xd9c06000    0  1018  1017      1  0x100    -  bash
  0xde4b4000    0  1101  1018      0  0x100    0  insmod
- 31 active task structs found
lcrash Example Output
- t 0xda0fe000
- STACK TRACE FOR TASK 0xda0fe000 (sshd)
- 0 schedule+1040 0xc0111250
- 1 schedule_timeout+121 0xc0110d89
- 2 do_select+506 0xc014251a
- 3 sys_select+820 0xc01428c4
- 4 system_call+44 0xc0106ed4
- Reference
- http://lkcd.sourceforge.net
- Contact
- harish_at_motorola.com
Questions/Comments?