Improving Dependability of Commodity Operating Systems with Program Analysis - PowerPoint PPT Presentation

1
Improving Dependability of Commodity Operating
Systems with Program Analysis

Feng Zhou
Ph.D. Dissertation Talk
10/8/2007

2
OS Dependability Challenge

Dependability problems
Problems dealing with the latest or peculiar hardware
Worms exploiting numerous security flaws
More problems for long-running servers or large clusters [Bligh et al. 07]
Reason 1: Unsafe languages used (C/C++)
Buffer overruns can be eliminated with safe languages

3
More Reasons

Reason 2: Complexity
Windows Vista: 50M LOC
Linux kernel: 8.3M LOC (growing 86 lines/hr over the last 2 years)
Will likely keep growing
Multicore → more parallelism
ccNUMA, smarter devices
Reason 3: The cost of defects is increasing
Large portion of computers online → remote exploits
Critical infrastructure using commodity systems
OS monoculture

4
Approach

A holistic approach:
Static analysis
Dynamic analysis/instrumentation
Changes to OS architecture or coding practice
Principles:
Require scalability to the size of commodity OSes
Favor sound analyses (vs. heuristic-based ones)
Favor general techniques (vs. system-specific ones)

5
Roadmap

Static analysis for Linux kernel locking safety
One class of problems: interaction between interrupts and locks
CtxCheck: a tool to statically find these bugs in Linux
Type-safe and recoverable device drivers
Third-party drivers: a big headache for reliability
SafeDrive: a hybrid system for device driver safety

6
Linux Locking Background

Many concurrency control primitives
Semaphores, mutexes, spinlocks, RCU.
Common locking problems
Race conditions
Deadlocks
Context-related problems (kernel-specific)
Context: state of the current processor
Interrupt context vs. process context; also software interrupt context
Lockdep: in-kernel dynamic checker

7
Problem Example
/* Process context */
spinlock_t lock;
foo() {
    spin_lock(&lock);
    ...                    <- interrupt!
    spin_unlock(&lock);
}

/* Hardirq context */
isr() {
    spin_lock(&lock);      <- spins forever for lock
    spin_unlock(&lock);
}

spin_lock() spins until the lock is acquired, but the interrupted holder on the same CPU never resumes to release it

8
Correct Usage
/* Process context */
spinlock_t lock;
foo() {
    spin_lock_irq(&lock);
    ...
    spin_unlock_irq(&lock);
}

/* Hardirq context */
isr() {
    spin_lock(&lock);
    spin_unlock(&lock);
}

Rule: a lock taken while interrupts are ON is not compatible with the same lock being taken in hardirq context
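The two examples above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: one CPU, one lock, and hypothetical names (`struct cpu`, `model_spin_lock`, `raise_irq`, `foo_buggy`, `foo_fixed`). Because a real spinlock would spin forever, the model reports the would-be deadlock as a `false` return instead.

```c
#include <stdbool.h>

/* Hypothetical single-CPU model; not kernel code. */
struct cpu {
    bool irqs_enabled;   /* local interrupt flag */
    bool lock_held;      /* state of the one spinlock */
};

/* Model of spin_lock(): a real spinlock spins while the lock is held.
   With the holder interrupted on the same CPU it would spin forever,
   so the model reports that case by returning false. */
static bool model_spin_lock(struct cpu *c) {
    if (c->lock_held)
        return false;            /* would spin forever */
    c->lock_held = true;
    return true;
}

static void model_spin_unlock(struct cpu *c) { c->lock_held = false; }

/* isr(): hardirq handler that takes the same lock. */
static bool model_isr(struct cpu *c) {
    if (!model_spin_lock(c))
        return false;
    model_spin_unlock(c);
    return true;
}

/* An interrupt fires; it runs now only if interrupts are enabled,
   otherwise it stays pending. */
static bool raise_irq(struct cpu *c, bool *pending) {
    if (!c->irqs_enabled) {
        *pending = true;
        return true;
    }
    return model_isr(c);
}

/* foo() with plain spin_lock(): interrupts stay on while the lock is held. */
static bool foo_buggy(struct cpu *c) {
    bool pending = false;
    model_spin_lock(c);                  /* spin_lock(&lock) */
    bool ok = raise_irq(c, &pending);    /* interrupt! */
    model_spin_unlock(c);                /* spin_unlock(&lock) */
    return ok;
}

/* foo() with spin_lock_irq(): interrupts are off while the lock is held. */
static bool foo_fixed(struct cpu *c) {
    bool pending = false;
    c->irqs_enabled = false;             /* spin_lock_irq(): IRQs off ... */
    model_spin_lock(c);                  /* ... then take the lock */
    bool ok = raise_irq(c, &pending);    /* interrupt is held pending */
    model_spin_unlock(c);                /* spin_unlock_irq(): release ... */
    c->irqs_enabled = true;              /* ... then IRQs back on */
    if (pending)
        ok = ok && model_isr(c);         /* pending interrupt now runs safely */
    return ok;
}
```

Disabling interrupts before taking the lock is what makes the process-context acquisition compatible with the hardirq one: the interrupt waits until the lock is free.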

9
Another Example
mutex_t m;
util() {
    mutex_acquire(&m);
    ...
    mutex_release(&m);
}
isr() {    /* Hardirq context */
    util();
}

Mutexes cannot be used in IRQ context at all, because acquiring one may sleep
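A dynamic check in the spirit of lockdep or the kernel's `might_sleep()` can catch this rule, but only on paths that actually execute. The hypothetical user-space model below (names such as `check_may_sleep`, `model_mutex_acquire` and `model_isr` are illustrative) counts a violation whenever a may-sleep operation runs in modeled hardirq context.

```c
#include <stdbool.h>

/* Hypothetical user-space model; names are illustrative only. */
static bool in_hardirq = false;    /* modeled current context */
static int sleep_violations = 0;   /* "may sleep in IRQ context" reports */

/* Anything that may block calls this first. */
static void check_may_sleep(void) {
    if (in_hardirq)
        sleep_violations++;
}

typedef struct { bool locked; } model_mutex_t;

static void model_mutex_acquire(model_mutex_t *mx) {
    check_may_sleep();             /* a real mutex may put the caller to sleep */
    mx->locked = true;
}

static void model_mutex_release(model_mutex_t *mx) { mx->locked = false; }

static model_mutex_t m;

static void util(void) {
    model_mutex_acquire(&m);
    model_mutex_release(&m);
}

/* Hardirq handler: calling util() here breaks the rule. */
static void model_isr(void) {
    in_hardirq = true;
    util();
    in_hardirq = false;
}
```

The violation is reported only if `model_isr()` actually runs; a static analysis aims to find it without that execution.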
10
CtxCheck Analysis Overview

Goal: find the possible contexts at each program point
Whole-program analysis: flow-sensitive, inter-procedural, context-insensitive
Dataflow analysis for each function → a summary
Flags (in_irq, irqs_enabled): preconditions
must_in_irq: true/false/unknown
irqs_must_enabled: true/false/unknown
Action: enable/disable/unchanged
Propagate among functions to compute a fixed point
Check results against a set of safety rules
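A minimal sketch of the inter-procedural propagation step, assuming contexts are the four combinations of (in_irq, irqs_enabled) encoded as a bit set, and assuming the IRQ effect between a caller's entry and each call site is already known (in CtxCheck that would come from the per-function dataflow summaries). All names here are hypothetical. Sets only grow and there are finitely many, so the loop reaches a fixed point.

```c
#include <stdbool.h>

/* Hypothetical sketch; not CtxCheck's data structures.
   A context is (in_irq, irqs_on); a set of contexts is a 4-bit mask. */
#define CTX_BIT(in_irq, irqs_on) (1u << ((in_irq) * 2 + (irqs_on)))

enum action { ACT_UNCHANGED, ACT_DISABLE, ACT_ENABLE };

struct call_edge {
    int caller, callee;
    enum action before_call;   /* IRQ effect from caller entry to the call */
};

/* Apply an action to every context in a set. */
static unsigned apply_action(unsigned set, enum action a) {
    unsigned out = 0;
    for (int irq = 0; irq < 2; irq++)
        for (int on = 0; on < 2; on++) {
            if (!(set & CTX_BIT(irq, on)))
                continue;
            int new_on = a == ACT_DISABLE ? 0 : a == ACT_ENABLE ? 1 : on;
            out |= CTX_BIT(irq, new_on);
        }
    return out;
}

/* ctx[] holds seed sets on input and the fixed point on output. */
static void propagate(unsigned ctx[], const struct call_edge *edges, int n_edges) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n_edges; i++) {
            const struct call_edge *e = &edges[i];
            unsigned add = apply_action(ctx[e->caller], e->before_call);
            if ((ctx[e->callee] | add) != ctx[e->callee]) {
                ctx[e->callee] |= add;
                changed = true;
            }
        }
    }
}

/* Example safety rule: code that acquires a mutex must never run in IRQ context. */
static bool may_run_in_irq(unsigned set) {
    return (set & (CTX_BIT(1, 0) | CTX_BIT(1, 1))) != 0;
}
```

With seeds for a syscall entry (process context, IRQs on) and an ISR (hardirq, IRQs off), a function reachable from both ends up with an IRQ context in its set, which the mutex rule then flags.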

11
Dataflow Analysis

Input: flags, and summaries of the other functions called
Output: action, and changes to other functions' preconditions
Transfer functions
Calls to other functions: apply their actions
Primitives: local_irq_disable(), spin_lock_irq(), ...
Special IRQ_SAVE annotation for spin_lock_irqsave() etc.
Guards for if statements
E.g. if (in_interrupt()) { ... } else { ... }
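The transfer functions can be sketched over a three-valued lattice (no / yes / maybe) per flag. This is a simplified illustration, not CtxCheck's OCaml implementation: the statement kinds, enum names, and treating `in_interrupt()` as meaning hardirq context (softirq is ignored) are assumptions.

```c
/* Hypothetical sketch; names and statement kinds are illustrative. */
enum tri { TRI_NO, TRI_YES, TRI_MAYBE };

struct ctx_state {
    enum tri in_irq;    /* in hardirq context? */
    enum tri irqs_on;   /* local interrupts enabled? */
};

enum stmt_kind {
    S_IRQ_DISABLE,      /* e.g. local_irq_disable(), spin_lock_irq() */
    S_IRQ_ENABLE,       /* e.g. local_irq_enable(), spin_unlock_irq() */
    S_CALL              /* call to a function with a computed summary */
};

enum summary_action { A_UNCHANGED, A_DISABLE, A_ENABLE };

struct stmt {
    enum stmt_kind kind;
    enum summary_action callee_action;   /* meaningful only for S_CALL */
};

/* Join at merge points: keep a value both paths agree on, else MAYBE. */
static enum tri join_tri(enum tri a, enum tri b) { return a == b ? a : TRI_MAYBE; }

static struct ctx_state join(struct ctx_state a, struct ctx_state b) {
    struct ctx_state r = { join_tri(a.in_irq, b.in_irq),
                           join_tri(a.irqs_on, b.irqs_on) };
    return r;
}

/* Transfer function for one statement. */
static struct ctx_state transfer(struct ctx_state s, const struct stmt *st) {
    switch (st->kind) {
    case S_IRQ_DISABLE:
        s.irqs_on = TRI_NO;
        break;
    case S_IRQ_ENABLE:
        s.irqs_on = TRI_YES;
        break;
    case S_CALL:                          /* apply the callee's action */
        if (st->callee_action == A_DISABLE)
            s.irqs_on = TRI_NO;
        else if (st->callee_action == A_ENABLE)
            s.irqs_on = TRI_YES;
        break;                            /* A_UNCHANGED: no effect */
    }
    return s;
}

/* Guard refinement for `if (in_interrupt()) ... else ...`:
   the then-branch (taken != 0) knows in_irq holds, the else-branch knows it does not. */
static struct ctx_state assume_in_interrupt(struct ctx_state s, int taken) {
    s.in_irq = taken ? TRI_YES : TRI_NO;
    return s;
}
```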

12
Points-to Analysis for Function Ptrs

Need to resolve function pointers for the dataflow analysis
Found standard points-to analysis inadequate
Not accurate enough (tried One-Level Flow [Das 00]), or too slow
Observation: function pointers are simpler to track
No arithmetic, no dynamic allocation
Type-based support for idioms
Vtable-like structs associate functions with fields
Similar for global arrays
Callbacks: special annotations
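The type-based idea for vtable-like structs can be sketched as a lookup over recorded assignments: every function stored into field `f` of struct type `T` is a possible target of an indirect call through `T.f`. The fact format and names (`struct fp_assign`, `resolve_indirect_call`, `nic_ops`) are hypothetical, and the sketch assumes function pointers are not laundered through casts, which is where annotations would be needed.

```c
#include <string.h>

/* Hypothetical fact: function `fn` was stored into field `field` of an object
   of struct type `type`, e.g. from `static struct nic_ops ops = { .open = e1000_open };` */
struct fp_assign {
    const char *type;
    const char *field;
    const char *fn;
};

static int contains(const char **set, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(set[i], name) == 0)
            return 1;
    return 0;
}

/* Possible targets of an indirect call through `type`.`field`: every distinct
   function assigned to that field of that type. Writes at most `max` names
   into out[] and returns how many it wrote. */
static int resolve_indirect_call(const struct fp_assign *facts, int n_facts,
                                 const char *type, const char *field,
                                 const char **out, int max) {
    int n = 0;
    for (int i = 0; i < n_facts && n < max; i++)
        if (strcmp(facts[i].type, type) == 0 &&
            strcmp(facts[i].field, field) == 0 &&
            !contains(out, n, facts[i].fn))
            out[n++] = facts[i].fn;
    return n;
}
```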

13
Annotations and Assertions

Supports 15 annotations and 7 assertions
Seed the analysis (e.g. syscall entries)
Work around limitations of the analysis
Inference to lower the annotation burden
Examples
IRQ_ON/OFF
IN_IRQ/NOT_IN_IRQ
ACT_ENABLED/DISABLED/UNCHANGED
ASSERT_IRQ_ON

14
Checking the Linux Kernel

CtxCheck implementation: 3000 lines of OCaml
Minimally configured 2.6.20.7 kernel: 855K LOC (preprocessed)
SMP, networking stack, 32 NIC drivers, no file systems
Changed 1792 lines (0.2%), added 1277 annotations
Checking took 83 seconds
Annotation coverage

15
Example Output
[Screenshot of CtxCheck output; callout: "3 callsites"]
16
Example Output cont.
17
Bugs Found

Wasn't expecting to find many bugs
2.6.20.7 is a stable version
Lockdep had been merged for 6 months
After adding annotations (3 days of work), the report showed 81 warnings
9 we believe are bugs
All are on error/side paths
4 confirmed (fixed in 2.6.11), all in qla3xxx.c
Forgot to unlock on error path
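The bug pattern can be shown in a hypothetical user-space model (not the qla3xxx.c code). Assuming, as the tool's focus on interrupt state suggests, that the lock in question also disables interrupts (like `spin_lock_irq()`), the buggy error path returns with IRQs still off while the normal path restores them; that path-dependent IRQ state is the kind of thing a context analysis can notice. The fix uses the goto-cleanup idiom common in kernel C. All names are illustrative.

```c
#include <stdbool.h>

/* Hypothetical user-space model; not the qla3xxx.c code. */
struct model_cpu {
    bool irqs_on;
    int locks_held;
};

/* In the spirit of spin_lock_irq()/spin_unlock_irq():
   taking the lock also disables interrupts. */
static void lock_irq(struct model_cpu *c)   { c->irqs_on = false; c->locks_held++; }
static void unlock_irq(struct model_cpu *c) { c->locks_held--; c->irqs_on = true; }

/* Buggy: the error path returns without unlocking. */
static int setup_buggy(struct model_cpu *c, bool hw_ok) {
    lock_irq(c);
    if (!hw_ok)
        return -1;                 /* BUG: lock held, IRQs left off */
    /* ... normal work ... */
    unlock_irq(c);
    return 0;
}

/* Fixed: every exit goes through a single unlock. */
static int setup_fixed(struct model_cpu *c, bool hw_ok) {
    int err = 0;
    lock_irq(c);
    if (!hw_ok) {
        err = -1;
        goto out_unlock;
    }
    /* ... normal work ... */
out_unlock:
    unlock_irq(c);
    return err;
}
```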

18
OS Extension Safety

OS extensions are often buggier than hosts
Device drivers cause a large percentage of Windows crashes
Xbox hacked due to memory bugs in games
Separate hardware protection domains: Nooks [Swift et al.], L4 [LeVasseur et al.], Xen [Fraser et al.]
Relatively high overhead due to cross-domain calls; system-specific
Binary instrumentation: SFI [Wahbe et al.], [Small/Seltzer]

19
A Language-Based Approach to Extension Safety

Light annotations in extension code and host API
Buffer bounds, non-null pointers, null-terminated strings, tagged unions
Deputy source-to-source compiler emits safety checks where necessary
Key: compatible extension-host binary interface
Runtime tracks resource usage and restores system invariants at failure time

[Figure: annotated source → Deputy → C with checks → GCC → driver module; the driver module, the SafeDrive runtime (recovery) and the Linux kernel share the kernel address space]
20
Deputy Bounds Annotations

struct {
    unsigned int len;
    int * count(len) data;
} x;

for (i = 0; i < x.len; i++) {
    if (i < 0 || i >= x.len) abort();   /* check emitted by Deputy */
    ... x.data[i] ...
}

Annotations use bounds information already in the program, or constants
Compiler emits runtime checks
No memory layout change → can be applied to one extension at a time
Many checks can be optimized away
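To make the emitted check concrete with an ordinary C compiler, the sketch below defines `COUNT(n)` as an empty macro; under Deputy the annotation carries meaning, and the uppercase spelling plus the helper names `bounds_ok` and `check_failures` are assumptions here. On failure the sketch records the error and stops rather than aborting, so the behavior can be observed. Unlike the loop above, whose check follows from the loop condition and could be optimized away, here the caller picks `n`, so the check is not redundant.

```c
#include <stddef.h>

/* Hypothetical stand-in so plain GCC accepts the annotation. */
#define COUNT(n)

static int check_failures = 0;   /* sketch only: Deputy would halt or recover */

/* Roughly the check for p[i] when p has COUNT(len): 0 <= i < len
   (i is unsigned here, so i >= 0 holds trivially). */
static int bounds_ok(size_t i, size_t len) {
    if (i >= len) {
        check_failures++;
        return 0;
    }
    return 1;
}

struct buf {
    unsigned int len;
    int * COUNT(len) data;   /* data points to len ints */
};

/* Sum the first n elements, checking each access first. */
static long sum_first(const struct buf *x, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (!bounds_ok(i, x->len))
            break;             /* sketch: stop instead of aborting */
        total += x->data[i];
    }
    return total;
}
```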

21
Deputy Features

Bounds: safe, count(n), bound(lo,hi)
Default: safe
Other annotations
Null-terminated strings/buffers
Tagged unions
Open arrays
Checks for printf() arguments
Automatic bounds variables for local variables → reduced annotation burden

22
Deputy Guarantees

Deputy guarantees type safety if:
Programmer correctly annotates globals and
function parameters used by the extension
Deallocation does not create dangling pointers
Trusted casts are correct
External modules / trusted code establish and
preserve Deputy annotations

23
Failure Handling

Everything runs inside the same protection domain
After a Deputy check failure, we could just halt
More useful: clean up the extension and let the host continue
Assumption: restarts should fix most transient failures

24
Update Tracking and Restarts

Free resources and undo state changes made by the driver
Kernel API functions are wrapped to do update tracking
Compensations: e.g. spin_lock(l) is undone by spin_unlock(l)
After a failure, undo updates in LIFO order
Then restart the driver

[Figure: annotated driver → Deputy → C with checks → GCC → driver module, now shown with Recovery and Update Tracking Wrappers components alongside the Linux kernel, all within the kernel address space]
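Update tracking with LIFO compensation can be sketched as a stack of undo records. The names (`struct update_log`, `track_update`, `untrack`, `undo_all`, `wrapped_lock`) are hypothetical, a toy lock stands in for a kernel spinlock, and the log has a fixed capacity; SafeDrive's real wrappers cover actual kernel API functions.

```c
#include <stddef.h>

/* Hypothetical sketch; all names are illustrative. */
#define MAX_UPDATES 64

typedef void (*compensation_fn)(void *arg);

struct update_log {
    compensation_fn fn[MAX_UPDATES];
    void *arg[MAX_UPDATES];
    size_t n;
};

/* Record how to undo a state change the driver just made. */
static int track_update(struct update_log *ul, compensation_fn fn, void *arg) {
    if (ul->n == MAX_UPDATES)
        return -1;                         /* log full */
    ul->fn[ul->n] = fn;
    ul->arg[ul->n] = arg;
    ul->n++;
    return 0;
}

/* The driver undid a change itself: drop the newest matching record. */
static void untrack(struct update_log *ul, compensation_fn fn, void *arg) {
    for (size_t i = ul->n; i-- > 0; ) {
        if (ul->fn[i] == fn && ul->arg[i] == arg) {
            for (size_t j = i; j + 1 < ul->n; j++) {
                ul->fn[j] = ul->fn[j + 1];
                ul->arg[j] = ul->arg[j + 1];
            }
            ul->n--;
            return;
        }
    }
}

/* After a failure: run outstanding compensations, newest first (LIFO). */
static void undo_all(struct update_log *ul) {
    while (ul->n > 0) {
        ul->n--;
        ul->fn[ul->n](ul->arg[ul->n]);
    }
}

/* Toy lock standing in for a kernel spinlock. */
struct toy_lock { int id; int held; };

static int undo_trace[MAX_UPDATES];        /* lock ids in compensation order */
static size_t undo_count = 0;

/* Compensation for a lock acquisition: release the lock. */
static void compensate_unlock(void *arg) {
    struct toy_lock *l = arg;
    l->held = 0;
    if (undo_count < MAX_UPDATES)
        undo_trace[undo_count++] = l->id;
}

/* Wrappers the driver calls instead of the raw lock operations. */
static void wrapped_lock(struct update_log *ul, struct toy_lock *l) {
    l->held = 1;
    track_update(ul, compensate_unlock, l);
}

static void wrapped_unlock(struct update_log *ul, struct toy_lock *l) {
    l->held = 0;
    untrack(ul, compensate_unlock, l);
}
```

When the driver unlocks normally, the wrapper removes the matching record, so only updates still outstanding at failure time are compensated, newest first.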
25
Return Gracefully from Failure

Invariants
No driver code is executed after failure

[Figure: call chain Kernel foo() → Driver bar1() → Driver bar2(); after a failure, an error code is returned to the kernel caller foo()]
26
Return Gracefully from Failure

Invariants
No driver code is executed after failure
No kernel function is forced to return early

[Figure: call chain Kernel foo1() → Driver bar1() → Kernel foo2() → Driver bar2(), with foo2() calling lock() before and unlock() after its call into the driver]
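One way to honor both invariants, sketched here under our own assumptions rather than as SafeDrive's mechanism, uses standard C `setjmp`/`longjmp`: each kernel-to-driver call records a jump point, a failed check jumps to the innermost one (unwinding only driver frames), and each driver-to-kernel call lets the kernel function finish (so `foo2()` still unlocks) before refusing to resume the failed driver. `call_driver`, `call_kernel`, `EDRIVER_FAILED` and the bodies of the foo/bar functions are hypothetical.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

#define EDRIVER_FAILED (-5)               /* hypothetical error code */

static jmp_buf *current_entry = NULL;     /* innermost kernel->driver entry */
static bool driver_failed = false;

/* Reached when an inserted safety check fails in driver code. */
static void driver_check_failed(void) {
    driver_failed = true;
    longjmp(*current_entry, 1);           /* unwinds driver frames only */
}

/* Wrapper for every kernel->driver call. */
static int call_driver(void (*fn)(int), int arg) {
    jmp_buf env;
    jmp_buf *saved = current_entry;
    current_entry = &env;
    if (setjmp(env) != 0) {
        current_entry = saved;
        return EDRIVER_FAILED;            /* kernel caller gets an error code */
    }
    fn(arg);
    current_entry = saved;
    return 0;
}

/* Wrapper for every driver->kernel call: the kernel function always finishes;
   if the driver failed meanwhile, do not resume driver code. */
static int call_kernel(int (*fn)(int), int arg) {
    int r = fn(arg);
    if (driver_failed)
        longjmp(*current_entry, 1);
    return r;
}

static int foo2_lock_held = 0, bar1_resumed = 0;

static void bar2(int fail) {              /* driver */
    if (fail)
        driver_check_failed();
}

static int foo2(int fail) {               /* kernel */
    foo2_lock_held = 1;                   /* lock() */
    int err = call_driver(bar2, fail);
    foo2_lock_held = 0;                   /* unlock() is never skipped */
    return err;
}

static void bar1(int fail) {              /* driver */
    call_kernel(foo2, fail);
    bar1_resumed = 1;                     /* must not run after a failure */
}

static int foo1(int fail) {               /* kernel */
    return call_driver(bar1, fail);
}
```

A real system would also reset `driver_failed` when it restarts the driver; the sketch leaves it set.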
27
Discussion

Compared to Nooks
Significantly less interception → much simpler overall
Deputy does fine-grained per-allocation checks → no separate heap/stack
No help needed from virtual memory hardware
Works for user-level applications and safe
languages

28
Implementation

Deputy compiler: 20K lines of OCaml
Kernel patch to 2.6.15.5: 1K lines
Kernel headers patch: 1.9K lines
Patch for 6 drivers in 4 categories
Network: e1000, tg3
USB: usb-storage
Sound: intel8x0, emu10k1
Video: nvidia

29
Evaluation: Recovery Rate

Inject random errors with compile-time injection
5 errors from one of 7 categories each time
Faults chosen following empirical studies [Sullivan & Chillarege 91], [Christmansson & Chillarege 96]
Scan overrun, loop fault, corrupt parameter, off-by-one, flipped condition, missing call, missing assignment
Load buggy e1000 driver with and without SafeDrive
Exercise by downloading an 89MB file, verifying it, and unloading the driver
Then rerun with the original driver

30
Recovery Rate Results

140 runs, 20 per fault category

SafeDrive is effective at detecting and recovering from crashing problems, and can detect some of them statically.

31
Annotation Burden

1-4% of lines have Deputy annotations
Recovery wrappers can be automatically generated

32
Annotations Break-down

Common reasons for trusted casts and trusted code
Polymorphic private data, e.g. netdev->priv
Small number of cases where buffer bounds are not available
Code manipulating pointer values directly, e.g. PTR_ERR(x)
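Both idioms can be reproduced in user space. The error-pointer helpers below are modeled on the kernel's `include/linux/err.h` (the exact threshold has varied between kernel versions); they encode a negative errno in a pointer value, which no bounds annotation describes. The private-data example uses hypothetical struct names; the cast back from `void *` is the kind of trusted cast listed above.

```c
#include <stdbool.h>

/* User-space copy modeled on include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

/* True when the pointer value lies in the top MAX_ERRNO addresses,
   i.e. it is an encoded -errno rather than an object address. */
static inline bool IS_ERR(const void *ptr) {
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Polymorphic private data: the generic device keeps a void *, and each
   driver casts it back to its own type. Struct names are hypothetical. */
struct generic_netdev { void *priv; };
struct my_nic_priv { int rx_ring_size; };

static struct my_nic_priv *nic_priv(struct generic_netdev *dev) {
    return (struct my_nic_priv *)dev->priv;   /* a "trusted cast" */
}
```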

33
Performance
[Chart: performance on benchmarks e1000 TCP recv, e1000 UDP recv, e1000 TCP send, e1000 UDP send, tg3 TCP recv, tg3 TCP send, usb-storage untar, emu10k1 aplay, intel8x0 aplay, nvidia xinit]

Nooks (Linux 2.4): e1000 TCP recv 46% (vs. 4%), e1000 TCP send 111% (vs. 12%)

34
Conclusion

We discussed two case studies of using program
analysis to improve OS dependability
It is feasible to understand complicated issues
(concurrency) and provide guarantees (memory
safety) for large systems like Linux
Sound static analysis tools have the benefit of
better coverage than pure runtime tools
A holistic approach combining the language, tools, runtime and OS code itself is a productive way to achieve these