Improving Dependability of Commodity Operating Systems with Program Analysis

1
Improving Dependability of Commodity Operating
Systems with Program Analysis
  • Feng Zhou
  • Ph.D. Dissertation Talk
  • 10/8/2007

2
OS Dependability Challenge
  • Dependability problems
  • Problems dealing with latest or peculiar hardware
  • Worms because of so many security flaws
  • More problems for long running servers or large
    clusters [Bligh et al. 07]
  • Reason 1: Unsafe languages used (C/C++)
  • Buffer overruns can be eliminated with safe
    languages

3
More Reasons
  • Reason 2: Complexity
  • Windows Vista: 50M LOC
  • Linux kernel: 8.3M LOC (86 lines/hr for last 2
    years)
  • Will likely keep growing
  • Multicore → more parallelism
  • ccNUMA, smarter devices
  • Reason 3: The cost of defects is increasing
  • Large portion of computers online → remote
    exploits
  • Critical infrastructure using commodity systems;
    OS monoculture

4
Approach
  • A holistic approach:
  • Static analysis
  • Dynamic analysis/instrumentation
  • Changes to OS architecture or coding practice
  • Principles
  • Require scalability to sizes of commodity OSes
  • Favor sound analyses (vs. heuristics-based ones)
  • Favor general techniques (vs. system-specific
    ones)

5
Roadmap
  • Static analysis for Linux kernel locking safety
  • One class of problems: interaction between
    interrupts and locks
  • CtxCheck, a tool to statically find these bugs in
    Linux
  • Type-safe and recoverable device drivers
  • 3rd-party drivers: a big headache for reliability
  • SafeDrive, a hybrid system for device driver
    safety

6
Linux Locking Background
  • Many concurrency control primitives
  • Semaphores, mutexes, spinlocks, RCU.
  • Common locking problems
  • Race conditions
  • Deadlocks
  • Context-related problems (kernel specific)
  • Context: state of the current processor
  • Interrupt context vs. process context vs. software
    interrupt context
  • Lockdep: in-kernel dynamic checker

7
Problem Example
/* Process context */
spinlock_t lock;
foo() {
    spin_lock(&lock);    /* <-- interrupt! */
    spin_unlock(&lock);
}
/* Hardirq context */
isr() {
    spin_lock(&lock);    /* spins forever for lock */
    spin_unlock(&lock);
}
  • spin_lock() spins until the lock is acquired

8
Correct Usage
/* Process context */
spinlock_t lock;
foo() {
    spin_lock_irq(&lock);
    spin_unlock_irq(&lock);
}

/* Hardirq context */
isr() {
    spin_lock(&lock);
    spin_unlock(&lock);
}
  • Rule: "used when interrupts are ON" is not
    compatible with "used in hardirq context"
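
The rule above can be pictured as a check over facts collected per lock. This is a minimal sketch only, not CtxCheck's implementation; `struct lock_usage` and `violates_irq_rule` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-lock facts a checker might collect. */
struct lock_usage {
    bool acquired_with_irqs_on;  /* some acquisition runs with interrupts enabled */
    bool acquired_in_hardirq;    /* some acquisition happens in an interrupt handler */
};

/* Both facts together mean an interrupt can arrive while the lock is held,
 * and the handler then spins on the same lock forever. */
static bool violates_irq_rule(const struct lock_usage *u)
{
    return u->acquired_with_irqs_on && u->acquired_in_hardirq;
}
```

With spin_lock_irq() in process context, the first fact is false for that lock, so the rule is satisfied.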

9
Another Example
mutex_t m;
util() {
    mutex_acquire(&m);
    mutex_release(&m);
}
/* Hardirq context */
isr() {
    util();
}
  • Mutexes cannot be used in IRQ context at all

10
CtxCheck Analysis Overview
  • Goal: find out possible contexts at each program
    point
  • Whole-program analysis: flow-sensitive,
    inter-procedural, context-insensitive
  • Dataflow analysis for each function → a summary
  • flags (in_irq, irqs_enabled) // preconditions
  • must_in_irq: true/false/unknown
  • irqs_must_enabled: true/false/unknown
  • action: enable/disable/unchanged
  • Propagate among functions to compute a fixed point
  • Check results against a set of safety rules
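
One way to picture the per-function summary is as a small record of three-valued facts, with a join used where control-flow paths merge. The encoding below is an assumption made for illustration (type and constant names are not taken from CtxCheck).

```c
/* Three-valued fact: definitely false, definitely true, or unknown. */
enum tri { TRI_FALSE, TRI_TRUE, TRI_UNKNOWN };

/* Effect of a function on the interrupt-enable flag. */
enum action { ACT_UNCHANGED, ACT_ENABLE, ACT_DISABLE };

/* Hypothetical layout of the summary listed on the slide. */
struct summary {
    enum tri must_in_irq;        /* precondition on IRQ context */
    enum tri irqs_must_enabled;  /* precondition on the interrupt-enable flag */
    enum action action;          /* what the function does to that flag */
};

/* Join at a control-flow merge: agreeing facts survive,
 * disagreeing facts become unknown. */
static enum tri tri_join(enum tri a, enum tri b)
{
    return a == b ? a : TRI_UNKNOWN;
}
```

Because the join can only move a fact toward unknown, repeated propagation among functions eventually stops changing, which is what makes a fixed point reachable.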

11
Dataflow Analysis
  • Input: flags, summaries of other functions called
  • Output: action, changes to other functions'
    preconditions
  • Transfer functions
  • Calls to other functions: apply actions
  • Primitives: local_irq_disable(), spin_lock_irq()
  • Special: IRQ_SAVE annotation for
    spin_lock_irqsave() etc.
  • Guards for if statements
  • E.g. if (in_interrupt()) ... else ...
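
The transfer functions listed above might look roughly like the following. This is a sketch under stated assumptions: the names are hypothetical, and the guard handling treats in_interrupt() as deciding the in_irq flag directly, which glosses over the hardirq/softirq distinction.

```c
enum tri { TRI_FALSE, TRI_TRUE, TRI_UNKNOWN };
enum action { ACT_UNCHANGED, ACT_ENABLE, ACT_DISABLE };

/* Abstract state tracked at each program point. */
struct flags {
    enum tri in_irq;
    enum tri irqs_enabled;
};

/* Transfer function for a call: apply the callee's summarized action. */
static struct flags apply_action(struct flags in, enum action act)
{
    struct flags out = in;
    if (act == ACT_ENABLE)
        out.irqs_enabled = TRI_TRUE;
    else if (act == ACT_DISABLE)
        out.irqs_enabled = TRI_FALSE;
    return out;
}

/* Guard refinement: inside the branch where the test held (taken != 0)
 * or failed (taken == 0), the in_irq flag is known. */
static struct flags assume_in_interrupt(struct flags in, int taken)
{
    struct flags out = in;
    out.in_irq = taken ? TRI_TRUE : TRI_FALSE;
    return out;
}
```

A primitive such as local_irq_disable() would be modeled here as a call whose action is ACT_DISABLE.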

12
Points-to Analysis for Function Ptrs
  • Need to resolve function pointers for dataflow
  • Found standard points-to analysis inadequate
  • Not accurate enough (tried One-Level-Flow [Das
    00]), or too slow
  • Observation: function pointers are simpler
  • No arithmetic, no dynamic allocation
  • Type-based support for idioms
  • Vtable-like structs: associate functions with fields
  • Similar for global arrays
  • Callbacks: special annotations
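
The vtable idiom can be sketched as a table keyed by struct type and field: every function ever stored in that field is a possible target of a call through it. The code below is an illustrative model only; `struct fp_binding` and `resolve_targets` are hypothetical names, and strings stand in for whatever program representation a real analysis would use.

```c
#include <stddef.h>
#include <string.h>

/* One recorded assignment "struct <type>.<field> = <func>",
 * e.g. taken from a static initializer. */
struct fp_binding {
    const char *struct_type;
    const char *field;
    const char *func;
};

/* Collect every function that may be called through type.field.
 * Writes at most max targets to out and returns how many were written. */
static size_t resolve_targets(const struct fp_binding *tab, size_t n,
                              const char *type, const char *field,
                              const char **out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < max; i++) {
        if (strcmp(tab[i].struct_type, type) == 0 &&
            strcmp(tab[i].field, field) == 0)
            out[found++] = tab[i].func;
    }
    return found;
}
```

This over-approximates (any instance of the struct type may hold any recorded function), which fits the context-insensitive setting described earlier.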

13
Annotations and Assertions
  • Supports 15 annotations and 7 assertions
  • Seed the analysis (e.g. syscall entries)
  • Work around limitations of the analysis
  • Inference to lower burden
  • Examples
  • IRQ_ON/OFF
  • IN_IRQ/NOT_IN_IRQ
  • ACT_ENABLED/DISABLED/UNCHANGED
  • ASSERT_IRQ_ON

14
Checking the Linux Kernel
  • CtxCheck implementation: 3000 lines of OCaml
  • Minimally configured 2.6.20.7: 855K LOC
    preprocessed
  • SMP, networking stack, 32 NIC drivers, no FS
  • Changed 1792 lines (0.2%), added 1277 annotations
  • Checking took 83 seconds
  • Annotation coverage

15
Example Output
[Figure: example output; callout "3 callsites"]

16
Example Output cont.
17
Bugs Found
  • Wasn't expecting to find many bugs
  • 2.6.20.7 is a stable version
  • Lockdep has been merged for 6 months
  • After adding annotations (3 days of work), the
    report shows 81 warnings
  • 9 we believe are bugs
  • All are on error/side paths
  • 4 confirmed (fixed in 2.6.11), all in qla3xxx.c
  • Forget to unlock on error path

18
OS Extension Safety
  • OS extensions are often buggier than hosts
  • Device drivers cause a large percentage of
    Windows crashes
  • Xbox hacked due to memory bugs in games
  • Separate hardware protection domains: Nooks
    [Swift et al], L4 [LeVasseur et al], Xen [Fraser
    et al]
  • Relatively high overhead due to cross-domain
    calls, system specific
  • Binary instrumentation: SFI [Wahbe et al],
    [Small/Seltzer]

19
A Language-Based Approach to Extension Safety
  • Light annotations in extension code and host API
  • Buffer bounds, non-null pointers, null-terminated
    strings, tagged unions
  • Deputy src-to-src compiler emits safety checks
    when necessary
  • Key: compatible extension-host binary interface
  • Runtime tracks resource usage and restores system
    invariants at fail time

[Diagram: annotated source → Deputy → C w/ checks → GCC → driver module. The driver module, the SafeDrive runtime and recovery, and the Linux kernel share the kernel address space.]

20
Deputy Bounds Annotations
    struct {
        unsigned int len;
        int * count(len) data;
    } x;
    for (i = 0; i < x.len; i++) {
        if (i < 0 || i >= x.len) abort();   /* emitted runtime check */
        ... x.data[i] ...
    }
  • Annotations use existing bounds info in programs,
    or constants
  • Compiler emits runtime checks
  • No memory layout change → can be applied to one
    extension at a time
  • Many checks can be optimized away

21
Deputy Features
  • Bounds: safe, count(n), bound(lo, hi)
  • Default: safe
  • Other annotations
  • Null-terminated string/buffer
  • Tagged unions
  • Open arrays
  • Checks for printf() arguments
  • Automatic bounds variables for local variables →
    reduced annotation burden

22
Deputy Guarantees
  • Deputy guarantees type-safety if:
  • Programmer correctly annotates globals and
    function parameters used by the extension
  • Deallocation does not create dangling pointers
  • Trusted casts are correct
  • External modules / trusted code establish and
    preserve Deputy annotations

23
Failure Handling
  • Everything runs inside the same protection domain
  • After Deputy check failure: could just halt
  • More useful: clean up the extension and let the
    host continue
  • Assumption: restarts should fix most transient
    failures

[Diagram: annotated driver → Deputy → C w/ checks → GCC → driver module. The driver module, the SafeDrive runtime and recovery, and the Linux kernel share the kernel address space.]

24
Update Tracking and Restarts
  • Free resources and undo state changes done by
    driver
  • Kernel API functions wrapped to do update
    tracking
  • Compensations: e.g. spin_lock(l) vs. spin_unlock(l)
  • After failure, undo updates in LIFO order
  • Then restart driver
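
The update-tracking idea can be sketched as a fixed-size undo log: each wrapper records a compensation after the wrapped call succeeds, and recovery runs the compensations newest-first. This is an illustrative model, not SafeDrive's code; all names (`undo_log`, `undo_record`, `undo_all`, `release_one`) are hypothetical.

```c
#include <stddef.h>

#define UNDO_MAX 64

typedef void (*undo_fn)(void *arg);

struct undo_entry {
    undo_fn fn;   /* compensation to run, e.g. an unlock or a free */
    void *arg;    /* resource the compensation applies to */
};

struct undo_log {
    struct undo_entry e[UNDO_MAX];
    size_t n;
};

/* Called by a wrapper after the wrapped call succeeds.
 * Returns 0 on success, -1 if the log is full. */
static int undo_record(struct undo_log *log, undo_fn fn, void *arg)
{
    if (log->n == UNDO_MAX)
        return -1;
    log->e[log->n].fn = fn;
    log->e[log->n].arg = arg;
    log->n++;
    return 0;
}

/* On failure: run compensations newest-first (LIFO), emptying the log. */
static void undo_all(struct undo_log *log)
{
    while (log->n > 0) {
        log->n--;
        log->e[log->n].fn(log->e[log->n].arg);
    }
}

/* Example compensation: drop one unit of a counted resource. */
static void release_one(void *arg)
{
    (*(int *)arg)--;
}
```

LIFO order matters because later updates may depend on earlier ones, for example memory allocated while a lock is held. A real implementation would also remove an entry when the driver itself performs the matching release.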

[Diagram: annotated driver → Deputy → C w/ checks → GCC → driver module. Recovery, update tracking, and wrappers sit between the driver module and the Linux kernel, all within the kernel address space.]

25
Return Gracefully from Failure
  • Invariants
  • No driver code is executed after failure

[Diagram: call chain Kernel foo() → Driver bar1() → Driver bar2(); label: Err code]

26
Return Gracefully from Failure
  • Invariants
  • No driver code is executed after failure
  • No kernel function is forced to return early

[Diagram: call chain Kernel foo1() → Driver bar1() → Kernel foo2() → Driver bar2(); labels: lock(), unlock()]

27
Discussion
  • Compared to Nooks
  • Significantly less interception → much simpler
    overall
  • Deputy does fine-grained per-allocation checks →
    no separate heap/stack
  • No help from virtual memory hardware
  • Works for user-level applications and safe
    languages

28
Implementation
  • Deputy compiler: 20K lines of OCaml
  • Kernel patch to 2.6.15.5: 1K lines
  • Kernel headers patch: 1.9K lines
  • Patch for 6 drivers in 4 categories
  • Network: e1000, tg3
  • USB: usb-storage
  • Sound: intel8x0, emu10k1
  • Video: nvidia

29
Evaluation Recovery Rate
  • Inject random errors with compile-time injection:
    5 errors from one of 7 categories each time
  • Faults chosen following empirical studies
    [Sullivan & Chillarege 91], [Christmansson &
    Chillarege 96]
  • Scan overrun, loop fault, corrupt parameter,
    off-by-one, flipped condition, missing call,
    missing assignment
  • Load buggy e1000 driver w/ and w/o SafeDrive
  • Exercise by downloading an 89MB file, verifying it
    and unloading the driver
  • Then rerun with original driver

30
Recovery Rate Results
  • 140 runs, 20 per fault category
  • SafeDrive is effective at detecting and
    recovering from crashing problems, and can detect
    some statically.

31
Annotation Burden
  • 1-4% of lines with Deputy annotations
  • Recovery wrappers can be automatically generated

32
Annotations Break-down
  • Common reasons for trusted casts and trusted code
  • Polymorphic private data, e.g. netdev->priv
  • Small number of cases where buffer bounds are not
    available
  • Code manipulating pointer values directly, e.g.
    PTR_ERR(x)

33
Performance
[Chart: results per benchmark: e1000 TCP recv, e1000 UDP recv, e1000 TCP send, e1000 UDP send, tg3 TCP recv, tg3 TCP send, usb-storage untar, emu10k1 aplay, intel8x0 aplay, nvidia xinit]
  • Nooks (Linux 2.4): e1000 TCP recv 46% (vs. 4%),
    e1000 TCP send 111% (vs. 12%)

34
Conclusion
  • We discussed two case studies of using program
    analysis to improve OS dependability
  • It is feasible to understand complicated issues
    (concurrency) and provide guarantees (memory
    safety) for large systems like Linux
  • Sound static analysis tools have the benefit of
    better coverage than pure runtime tools
  • A holistic approach combining the language, tools,
    runtime and OS code itself is a productive way of
    achieving these goals