Title: Improving Extension Reliability Using Language-Based Techniques
1Improving Extension Reliability Using Language-Based Techniques
- Ph.D. Qualifying Examination
- Feng Zhou
- CS, UC Berkeley
- 11/21/2006
2Motivation
- OSes and applications often run loadable extensions
- e.g. Linux kernel, Apache, Firefox
- Run in the same protection domain
- Extensions are often buggier than hosts
- Device drivers cause a large percentage of Windows crashes
- Xbox hacked due to memory bugs in games
3Problem Statement
- The Extension Isolation Problem
- Detect extension failures and protect other parts
of the system from these failures
4Why Do Extensions Fail?
- Memory-safety problems
- Null pointer dereference
- Buffer overrun
- Dangling pointers
- Concurrency problems
- Race conditions
- Deadlocks
- Domain-specific problems (e.g. interrupt-related)
- Improper API usage
- Using a file descriptor after closing it
- See [Sullivan91], [Christmansson96], [Chou01]
5Previous Approaches
- User-level drivers (e.g. in Windows Driver Foundation)
- Currently for non-interrupt drivers and non-performance-critical ones only
- Separate hardware protection domain / virtual machines: Nooks [Swift et al], L4 [LeVasseur et al], Xen [Fraser et al]
- Coarse-grained, high overhead, system specific
- Binary instrumentation: SFI [Wahbe et al], [Small/Seltzer]
- Coarse-grained, system specific
- Static analysis + software guards: XFI [Erlingsson et al]
- Verification works at binary level
6Conjecture
- We can get more reliable extensions with:
- a bit more information in the language (C with annotations)
- more advanced, cooperating compiler and runtime support
7A Language-Based Approach to Extension Isolation
- Light-weight annotations in extension code and host API
- A src-to-src compiler tries to verify safety
- Emits runtime checks when necessary
- Hybrid checking
- Runtime tracks resource usage and restores system invariants when an extension fails
[Figure: annotated source -> src-to-src compiler -> C with checks -> GCC -> extension; the extension and the runtime/recovery code run inside the host program's address space]
8Goals and Non-goals
- Fine-grained safety checks and few false positives
- Low performance overhead
- Require few changes and no hardware support
- Detect common bugs, e.g. memory and concurrency bugs
- Do not aim to block malicious code
- Gain knowledge about how to improve APIs for better safety and recoverability
9Outline
- Introduction
- SafeDrive
- Memory Safety Checking
- Extension Recovery
- Future work
- SafeDrive Improvements
- Locking safety
- Better device driver API for Linux
- Timeline
10SafeDrive Overview
- Detects and recovers from memory-safety problems in Linux device drivers [OSDI06]
- Adds fine-grained type safety, to extensions only
- Maintains a compatible kernel-driver binary interface
- Provides a way to recover from detected failures by restarting drivers
11Deputy Compiler
- Deputy compiler by Jeremy Condit et al.
- Compiler emits runtime checks
- No memory layout change -> can be applied to one extension at a time
- Bounds annotations: safe, count(n)
- Null-terminated strings, tagged unions, open arrays, printf
    struct {
        unsigned int len;
        int * count(len) data;
    } x;
    for (i = 0; i < x.len; i++) {
        if (i < 0 || i >= x.len) abort();   /* runtime check emitted by Deputy */
        x.data[i];                          /* access guarded by the check */
    }
    void clear(char * count(size) buf, int size);
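The count(len) annotation and the emitted bounds check on this slide can be tried out in plain C. The following is only a minimal sketch: it assumes count(n) is modeled as an empty macro (so an ordinary compiler accepts it), and the names struct buffer, index_in_bounds and checked_read are hypothetical helpers, not part of Deputy. Where Deputy would call abort(), the sketch returns -1 so the behavior is easy to observe.

```c
#include <assert.h>

/* Assumption: count(n) expands to nothing so that a plain C compiler
   accepts the annotation; the Deputy compiler is what gives it meaning. */
#define count(n)

struct buffer {
    unsigned int len;
    int *count(len) data;   /* data points to len ints */
};

/* Hypothetical helper: the predicate behind the emitted bounds check. */
int index_in_bounds(long i, unsigned int len) {
    return i >= 0 && (unsigned long)i < len;
}

/* Checked element read. Returns 0 on success and -1 where the
   check shown on the slide would call abort(). */
int checked_read(const struct buffer *x, long i, int *out) {
    if (!index_in_bounds(i, x->len))
        return -1;
    *out = x->data[i];
    return 0;
}

/* The slide's clear(): buf is annotated as holding size bytes. */
void clear(char *count(size) buf, int size) {
    assert(size >= 0);      /* a negative count is never valid */
    for (int i = 0; i < size; i++)
        buf[i] = 0;
}
```

Because the annotation ties data to len inside the same struct, the check needs no extra metadata, which is consistent with the "no memory layout change" point above.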
12Deputy Guarantees
- Deputy guarantees type safety if:
- Programmer correctly annotates globals and function parameters used by the extension
- Deallocation does not create dangling pointers
- Trusted casts are correct
- External modules / trusted code establish and preserve invariants specified by existing annotations
- Concurrent accesses are properly synchronized
13SafeDrive's Use of Deputy
- Kernel API functions and data structures used are annotated (header files)
- One-time cost
- Function parameters and global data structures in drivers are annotated
- 1-4% of lines
- Kernel needs no annotations and is trusted
[Figure: annotated driver + annotated kernel headers -> Deputy -> C with checks -> GCC -> instrumented driver module]
14Failure Handling
- Everything runs inside the same protection domain
- After a Deputy check failure, the system could just halt
- More useful: clean up the extension and let the host continue
- Assumption: restarts should fix most transient failures
[Figure: annotated driver -> Deputy -> C with checks -> GCC -> driver module; the driver module and the SafeDrive runtime/recovery code sit next to the Linux kernel in the kernel address space]
15Update Tracking and Restarts
- Free resources and undo state changes done by the driver
- Kernel API functions are wrapped to do update tracking
- Compensations, e.g. spin_lock(l) vs. spin_unlock(l)
- After a failure, undo updates in LIFO order
- Then restart the driver
[Figure: annotated driver -> Deputy -> C with checks -> GCC -> driver module; the driver module reaches the Linux kernel through update-tracking wrappers, with the recovery code alongside, all in the kernel address space]
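Update tracking with compensations can be sketched as an undo log. This is a user-space illustration under stated assumptions, not SafeDrive's actual code: update_log, track_update, untrack_update, undo_all and the toy_lock type are all hypothetical names, and toy_lock merely stands in for a kernel spinlock.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UPDATES 64

typedef void (*undo_fn)(void *arg);

struct update {
    undo_fn undo;   /* compensation to run if the driver fails */
    void *arg;
};

struct update_log {
    struct update entries[MAX_UPDATES];
    size_t n;
};

/* Record a compensation. Returns 0, or -1 if the log is full. */
int track_update(struct update_log *log, undo_fn undo, void *arg) {
    if (log->n == MAX_UPDATES)
        return -1;
    log->entries[log->n].undo = undo;
    log->entries[log->n].arg = arg;
    log->n++;
    return 0;
}

/* Drop the newest matching entry once the driver undoes the update itself. */
void untrack_update(struct update_log *log, undo_fn undo, void *arg) {
    for (size_t i = log->n; i > 0; i--) {
        struct update *u = &log->entries[i - 1];
        if (u->undo == undo && u->arg == arg) {
            for (size_t j = i; j < log->n; j++)
                log->entries[j - 1] = log->entries[j];
            log->n--;
            return;
        }
    }
}

/* After a failure: run the remaining compensations newest-first (LIFO). */
void undo_all(struct update_log *log) {
    while (log->n > 0) {
        struct update u = log->entries[--log->n];
        u.undo(u.arg);
    }
}

/* Toy stand-in for a kernel spinlock; released_at records release order. */
struct toy_lock { int held; int released_at; };
static int release_clock = 0;

void toy_unlock(void *p) {
    struct toy_lock *l = p;
    l->held = 0;
    l->released_at = ++release_clock;
}

/* Wrappers the driver calls instead of the raw lock operations. */
int wrapped_lock(struct update_log *log, struct toy_lock *l) {
    assert(!l->held);
    l->held = 1;
    return track_update(log, toy_unlock, l);
}

void wrapped_unlock(struct update_log *log, struct toy_lock *l) {
    untrack_update(log, toy_unlock, l);
    toy_unlock(l);
}
```

The compensation functions live on the host side of the wrapper, so recovery never has to run driver code; locks the driver already released are simply absent from the log when undo_all runs.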
16Return Gracefully from Failure
- Invariants
- No driver code is executed after failure
[Figure: call stack kernel foo() -> driver bar1() -> driver bar2(); after a failure in the driver, foo() gets an error code]
17Return Gracefully from Failure
- Invariants
- No driver code is executed after failure
- No kernel function is forced to return early
[Figure: call stack kernel foo1() -> driver bar1() -> kernel foo2() -> driver bar2(); foo2() calls lock() and unlock() around its call into the driver]
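The two invariants can be illustrated with a small user-space sketch of the foo1/bar1/foo2/bar2 call stack. The slides do not say how SafeDrive implements the unwinding; setjmp/longjmp is used here only as a stand-in, and call_driver, call_kernel, driver_fail and the flag variables are hypothetical names.

```c
#include <assert.h>
#include <setjmp.h>

#define MAX_DEPTH 8
#define DRIVER_ERR (-1)

static jmp_buf entry_stack[MAX_DEPTH];  /* one slot per active kernel->driver call */
static int depth = 0;
static int driver_failed = 0;

/* Called at the point where a runtime check fails inside driver code. */
void driver_fail(void) {
    assert(depth > 0);                  /* must be inside a driver entry */
    driver_failed = 1;
    longjmp(entry_stack[depth - 1], 1); /* unwind to the nearest kernel caller */
}

/* Kernel -> driver boundary. */
int call_driver(int (*fn)(void)) {
    int slot;
    if (driver_failed || depth == MAX_DEPTH)
        return DRIVER_ERR;              /* invariant 1: no driver code after failure */
    slot = depth++;
    if (setjmp(entry_stack[slot]) != 0) {
        depth = slot;                   /* driver frames above this point are abandoned */
        return DRIVER_ERR;
    }
    int r = fn();
    depth = slot;
    return r;
}

/* Driver -> kernel boundary. */
int call_kernel(int (*fn)(void)) {
    int r = fn();                       /* invariant 2: kernel code runs to completion */
    if (driver_failed) {
        assert(depth > 0);
        longjmp(entry_stack[depth - 1], 1);  /* do not resume the calling driver code */
    }
    return r;
}

/* The call stack from the slide. */
static int lock_held = 0;
static int bar1_resumed = 0;

int bar2(void) { driver_fail(); return 0; }

int foo2(void) {
    lock_held = 1;                      /* lock() */
    int r = call_driver(bar2);          /* sees DRIVER_ERR instead of crashing */
    lock_held = 0;                      /* unlock() still runs */
    return r;
}

int bar1(void) {
    call_kernel(foo2);
    bar1_resumed = 1;                   /* never reached after the failure */
    return 0;
}

int foo1(void) { return call_driver(bar1); }
```

When bar2 fails, control returns to foo2 with an error code, foo2 releases its lock and returns normally, and only then is bar1 skipped on the way back to foo1. In this sketch driver_failed stays set until something resets it, which is where a driver restart would fit.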
18Discussion
- Compared to Nooks
- Significantly less interception -> much simpler overall
- Deputy does fine-grained per-allocation checks -> no separate heap/stack
- No help from virtual memory hardware
- Works for user-level applications and safe languages
- Compared to C++/Java exceptions
- Compensation does not contain any code from the driver
- Only restores host state, not extension state
19Implementation
- Deputy compiler: 20K lines of OCaml
- Kernel patch to 2.6.15.5: 1K lines
- Kernel headers patch: 1.9K lines
- Patches for 6 drivers in 4 categories
- Network: e1000, tg3
- USB: usb-storage
- Sound: intel8x0, emu10k1
- Video: nvidia
20Evaluation: Recovery Rate
- Inject random errors with compile-time injection: 5 errors from one of 7 categories each time
- Faults chosen following empirical studies [Sullivan & Chillarege 91], [Christmansson & Chillarege 96]
- Scan overrun, loop fault, corrupt parameter, off-by-one, flipped condition, missing call, missing assignment
- Load buggy e1000 driver with and without SafeDrive
- Exercise by downloading an 89MB file, verifying it and unloading the driver
- Then rerun with the original driver
21Recovery Rate Results
- 140 runs, 20 per fault category
- SafeDrive is effective at detecting and recovering from crashing problems, and can detect some statically
22More Results
- Annotation burden
- 1-4% of driver code changed for annotations
- A smaller amount of wrapper code; can be automatically generated in the future
- Performance
- <25% overhead for driver micro-benchmarks
- E.g. TCP send with Netperf
- About 1/10 the overhead of Nooks in two comparable experiments
23Outline
- Introduction
- SafeDrive
- Memory Safety Checking
- Extension Recovery
- Future work
- SafeDrive Improvements
- Locking safety
- Better device driver API for Linux
- Timeline
24SafeDrive Improvements
- Separate wrappers from driver/kernel headers
- To evolve with new versions of drivers and kernel
- Needs kernel loader support
- Tools for usability
- Identify driver entry functions and generate wrappers
- List all unwrapped kernel functions called, to help identify API functions to wrap
- Annotate more drivers and run on live servers
- Make a code release
25Common locking problems in the kernel
- Race conditions
- Deadlocks
- Acquiring the same spinlock multiple times
- Acquiring multiple locks in different orders
- Using locks in wrong contexts
- Acquiring a mutex in interrupt context
- Acquiring a spinlock in interrupt context and also in process context with interrupts enabled
26Hybrid lock checking
- Assign a static name to each lock
- Combine dynamically allocated locks into static ones
- Some functions are annotated with process/interrupt contexts
- Inference propagates these annotations
- An analysis checks locking safety
- Context constraints
- Consistent global ordering
27Hybrid lock checking (2)
- When not sure, emit runtime checks
- Runtime checks done with lockdep in the Linux kernel
- Lockdep [Molnar06] builds lock ordering and context constraints at runtime
- Stores lock orderings in a big hash table
- Consumes memory and causes significant slowdown
- With hybrid checking
- Locks verified to be safe do not need to be tracked by lockdep
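The idea of tracking lock orderings at runtime, while skipping locks that were verified statically, can be sketched in a few lines. This is a toy model under stated assumptions, not lockdep: it uses a small matrix instead of a hash table, handles a single thread of control, and detects only direct pairwise inversions and re-acquisition of a held lock. All names (lock_tracker, tracked_acquire, tracked_release, the verified flag) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NUM_LOCKS 8   /* number of static lock names (assumed small) */

enum lock_result { LOCK_OK, LOCK_RECURSIVE, LOCK_ORDER_VIOLATION };

struct lock_tracker {
    unsigned char before[NUM_LOCKS][NUM_LOCKS]; /* before[a][b]: a was held when b was taken */
    unsigned char verified[NUM_LOCKS];          /* proven safe statically: not tracked */
    int held[NUM_LOCKS];                        /* currently held, tracked lock ids */
    int nheld;
};

void tracker_init(struct lock_tracker *t) { memset(t, 0, sizeof *t); }

enum lock_result tracked_acquire(struct lock_tracker *t, int id) {
    assert(id >= 0 && id < NUM_LOCKS);
    if (t->verified[id])
        return LOCK_OK;                         /* no runtime bookkeeping needed */
    for (int i = 0; i < t->nheld; i++)
        if (t->held[i] == id)
            return LOCK_RECURSIVE;              /* same lock acquired twice */
    for (int i = 0; i < t->nheld; i++)
        if (t->before[id][t->held[i]])
            return LOCK_ORDER_VIOLATION;        /* opposite order was seen earlier */
    for (int i = 0; i < t->nheld; i++)
        t->before[t->held[i]][id] = 1;          /* remember the new ordering */
    t->held[t->nheld++] = id;
    return LOCK_OK;
}

void tracked_release(struct lock_tracker *t, int id) {
    assert(id >= 0 && id < NUM_LOCKS);
    if (t->verified[id])
        return;
    for (int i = t->nheld; i > 0; i--) {
        if (t->held[i - 1] == id) {
            for (int j = i; j < t->nheld; j++)
                t->held[j - 1] = t->held[j];
            t->nheld--;
            return;
        }
    }
}
```

Every tracked acquire scans the held set and touches the ordering table, which hints at where the runtime cost comes from; a lock marked verified skips all of it.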
28Better driver API for Linux
- In most OSes, drivers communicate with the kernel through a wide and trusted API
- 2500 symbols exported to drivers in Linux
- Some Linux driver API functions are not checkable for memory safety
- Driver API improvements
- Introduce shims between common functions to do parameter/invariant checking
- Fix legacy functions where memory safety is not checkable
- Change kernel data structures for memory safety
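A shim that does parameter checking might look like the sketch below. It is an illustration only: legacy_copy and shim_copy are hypothetical functions, not Linux kernel APIs, chosen to show a legacy-style signature whose bounds cannot be checked and a replacement that carries explicit sizes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical legacy-style function: it cannot see the size of dst,
   so a checker cannot prove that the copy stays in bounds. */
void legacy_copy(char *dst, const char *src, size_t n) {
    assert(dst != NULL && src != NULL);   /* debug-only; no bounds knowledge */
    memcpy(dst, src, n);
}

/* Shim: the caller passes explicit bounds, and the shim validates the
   parameters before handing off to the unchecked function. */
int shim_copy(char *dst, size_t dst_size,
              const char *src, size_t src_size, size_t n) {
    if (dst == NULL || src == NULL)
        return -EINVAL;
    if (n > dst_size || n > src_size)
        return -EINVAL;                   /* would overrun one of the buffers */
    legacy_copy(dst, src, n);
    return 0;
}
```

Returning an error code rather than aborting keeps the decision about recovery with the caller; with the sizes in the signature, the same bounds could also be stated as annotations and checked by the compiler.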
29Better driver API for Linux (2)
- Gauge of success
- Whether more checking finds problems that went undetected in previous experiments
- How many lines of trusted code are eliminated
- Find real bugs?
- Related work: Windows Driver Foundation (WDF)
- A wrapper API on top of the existing Windows driver API
- Better default values, parameter checking and backward compatibility
- Backward compatibility is not 100% necessary for Linux
30Outline
- Introduction
- SafeDrive
- Memory Safety Checking
- Extension Recovery
- Future work
- SafeDrive Improvements
- Locking safety
- Better device driver API for Linux
- Timeline
31Timeline
- Phase 1: up to Oct. 2006
- Basic SafeDrive design and implementation
- Phase 2: Oct. 2006 to Dec. 2006
- Tools for SafeDrive
- Locking safety
- Phase 3: Jan. 2007 to May 2007
- Modular locking safety
- Better driver API
- Phase 4: June 2007 to Dec. 2007
- Wrap up and dissertation writing
33Classification of OS problems
- Due to [Sullivan91], based on IBM field data
- Problems corrupting program data
- 75% memory-safety related
- 8% concurrency related
- 17% others
- Regular problems
- 30% memory-safety related
- 14% concurrency related
- 56% others
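34How do you change bounds/tags?
    struct {
        unsigned int len;
        int * count(len) data;
    } x;
    x.data = NULL;                                       /* step 1 */
    if (x.data != NULL && (A < 0 || A > len)) abort();   /* check for step 2 */
    x.len = A;                                           /* step 2 */
    if (B < sizeof(int) * x.len) abort();                /* check for step 3 */
    x.data = malloc(B);                                  /* step 3 */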
35Related Work
- Improving memory safety of C
- Safe C-like language: Cyclone [Morrisett et al]
- Hybrid checking (non-modular): CCured [Necula et al]
- Type qualifiers for static checking: CQual [Foster et al], [Johnson/Wagner], Sparse [Torvalds]
- Improving OS/extension reliability
- Hardware protection: Nooks [Swift et al], L4 [LeVasseur et al], Xen [Fraser et al]
- Binary instrumentation: SFI [Wahbe et al], [Small/Seltzer], XFI [Erlingsson]
- Using Cyclone: OKE [Bos/Samwel]
- Static validation of API usage: SLAM [Ball et al]
- Writing an OS in a safe language: Singularity [Patel et al]
36Performance
[Figure: performance chart over the benchmarks e1000 TCP recv, e1000 UDP recv, e1000 TCP send, e1000 UDP send, tg3 TCP recv, tg3 TCP send, usb-storage untar, emu10k1 aplay, intel8x0 aplay, nvidia xinit]
- Nooks (Linux 2.4): e1000 TCP recv 46% (vs. 4%), e1000 TCP send 111% (vs. 12%)
37Annotation Burden
- 1-4% of lines with Deputy annotations
- Recovery wrappers can be automatically generated
38Annotations Break-down
- Common reasons for trusted casts and trusted code
- Polymorphic private data, e.g. netdev->priv
- Small number of cases where buffer bounds are not available
- Code manipulating pointer values directly, e.g. PTR_ERR(x)