Title: Iterative Context Bounding for Systematic Testing of Multithreaded Programs
Slide 1: Iterative Context Bounding for Systematic Testing of Multithreaded Programs
- Madan Musuvathi
- Shaz Qadeer
- Microsoft Research
Slide 2: Testing multithreaded programs is HARD
- Specific thread interleavings expose subtle errors
  - Testing often misses these errors
- Even when found, errors are hard to debug
  - No repeatable trace
  - Source of the bug is far away from where it manifests
Slide 3: Current practice
- Concurrency testing = stress testing
- Example: testing a concurrent queue
  - Create 100 threads performing queue operations
  - Run for days/weeks
  - Pepper the code with sleep(random())
- Stress increases the likelihood of rare interleavings
  - Makes any error found hard to debug
Slide 4: CHESS: Unit testing for concurrency
- Example: testing a concurrent queue
  - Create 1 reader thread and 1 writer thread
  - Exhaustively try all thread interleavings
- Run the test repeatedly on a specialized scheduler
  - Explore a different thread interleaving each time
  - Use model checking techniques to avoid redundancy
- Check for assertion failures and deadlocks in every run
  - The error trace is repeatable
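The workflow on this slide (re-run the test under a controlled scheduler, one interleaving per run, check every run, replay a failing trace) can be sketched in Python. This is a minimal illustration under stated assumptions, not how CHESS is implemented: each thread is a hypothetical list of atomic step functions, a shared counter incremented non-atomically stands in for the concurrent queue, and interleavings are enumerated by brute force with no redundancy reduction.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Shared = Dict[str, int]
Local = Dict[str, int]
Step = Callable[[Shared, Local], None]   # one atomic step of a thread
Schedule = Tuple[int, ...]               # thread id to run at each step

def all_schedules(lengths: List[int]) -> Iterator[Schedule]:
    """Yield every interleaving of threads with the given step counts."""
    remaining = list(lengths)
    prefix: List[int] = []

    def dfs() -> Iterator[Schedule]:
        if not any(remaining):
            yield tuple(prefix)
            return
        for t in range(len(remaining)):
            if remaining[t] > 0:
                remaining[t] -= 1
                prefix.append(t)
                yield from dfs()
                prefix.pop()
                remaining[t] += 1

    yield from dfs()

def replay(threads: List[List[Step]], schedule: Schedule, init: Shared) -> Shared:
    """Re-execute the test from its initial state under one fixed schedule."""
    shared = dict(init)
    local: List[Local] = [{} for _ in threads]
    pc = [0] * len(threads)
    for t in schedule:
        threads[t][pc[t]](shared, local[t])
        pc[t] += 1
    return shared

def find_bug(threads: List[List[Step]], init: Shared,
             check: Callable[[Shared], bool]) -> Optional[Schedule]:
    """Run every interleaving; return the first schedule whose final state fails check."""
    for schedule in all_schedules([len(steps) for steps in threads]):
        if not check(replay(threads, schedule, init)):
            return schedule
    return None

# Hypothetical test: two threads each increment a shared counter non-atomically.
def read(shared: Shared, local: Local) -> None:
    local["tmp"] = shared["counter"]

def write(shared: Shared, local: Local) -> None:
    shared["counter"] = local["tmp"] + 1

THREADS = [[read, write], [read, write]]
INIT = {"counter": 0}

bug = find_bug(THREADS, INIT, lambda s: s["counter"] == 2)
print(bug)                         # (0, 1, 0, 1)
print(replay(THREADS, bug, INIT))  # {'counter': 1}
```

A schedule is just a tuple of thread ids, so replaying it reproduces the same failure on every run; this is what makes the error trace repeatable.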
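Slide 5: State space explosion
[Figure: Thread 1 executes x = 1; x = 2 and Thread 2 executes y = 1; y = 2, starting from the initial state x = 0, y = 0. The tree of (x, y) states over all interleavings reaches the same states, such as (1,1) and (2,2), along several different paths.]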
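The figure's point can be checked by brute force. The sketch below, with each write modeled as a hypothetical (variable, value) pair, enumerates every interleaving of the two threads and records each (x, y) state visited.

```python
from typing import Iterator, List, Set, Tuple

Op = Tuple[str, int]                     # (variable, value written)
T1: List[Op] = [("x", 1), ("x", 2)]      # Thread 1: x = 1; x = 2
T2: List[Op] = [("y", 1), ("y", 2)]      # Thread 2: y = 1; y = 2

def interleavings(a: List[Op], b: List[Op]) -> Iterator[List[Op]]:
    """Yield every merge of a and b that keeps each thread's own order."""
    if not a or not b:
        yield a + b
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

executions = 0
states: Set[Tuple[int, int]] = {(0, 0)}  # initial state x = 0, y = 0
for run in interleavings(T1, T2):
    executions += 1
    env = {"x": 0, "y": 0}
    for var, value in run:
        env[var] = value
        states.add((env["x"], env["y"]))
print(executions, len(states))           # 6 9
```

Even this tiny program yields 6 complete executions that pass through only 9 distinct states, so a search over executions revisits the same states repeatedly.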
Slide 6: State space explosion
- Number of executions: $O(n^{nk})$
  - Exponential in both n and k
  - Typically n < 10 and k > 100
- Limits scalability to large programs (large k)
[Figure: n threads, Thread 1 through Thread n, each executing k steps such as x = 1; x = 2 or y = 1; y = 2.]
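For n threads with k atomic steps each (assuming no blocking), the exact number of interleavings is the multinomial coefficient $\frac{(nk)!}{(k!)^n}$. Because the multinomial coefficients with n parts sum to $n^{nk}$, this count is at most $n^{nk}$, which is where the $O(n^{nk})$ bound comes from. A quick Python check of both quantities:

```python
from math import factorial

def num_executions(n: int, k: int) -> int:
    """Exact number of interleavings of n threads with k atomic steps each."""
    return factorial(n * k) // factorial(k) ** n

for n, k in [(2, 2), (2, 10), (3, 10), (4, 100)]:
    exact = num_executions(n, k)
    assert exact <= n ** (n * k)  # the O(n^{nk}) bound on this slide
    print(f"n={n}, k={k}: a {len(str(exact))}-digit number of executions")
```

For example, num_executions(2, 2) returns 6, the number of complete executions of the two-thread program on slide 5.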
Slide 7: Techniques
- Iterative context bounding
  - Strategy for searching large state spaces
- State space optimization
  - Reduces the size of the state space
Slide 8: Iterative context bounding
- Prioritize executions with a small number of preemptions
- Two kinds of context switches
  - Preemptions: forced by the scheduler
    - e.g., time-slice expiration
  - Non-preemptions: a thread voluntarily yields
    - e.g., blocking on an unavailable lock, thread end
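Thread 1: x = 1; if (p != 0) x = p->f;
Thread 2: p = 0;
[Figure: Thread 1 executes x = 1 and the test if (p != 0); a preemption switches to Thread 2, which executes p = 0 and ends; a non-preemption then returns to Thread 1, which executes x = p->f.]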
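The distinction can be phrased as a rule on schedules: a switch away from a thread that could still run is a preemption, and a switch away from a thread that cannot run is a non-preemption. The sketch below labels the switches in a schedule under a simplifying assumption: threads never block, so the only non-preemption is a switch after a thread has finished. Thread ids 0 and 1 stand for Thread 1 and Thread 2, and the step counts follow the example code.

```python
from typing import List, Optional, Sequence, Tuple

def classify_switches(schedule: Sequence[int],
                      lengths: Sequence[int]) -> List[Tuple[int, int, str]]:
    """Label each context switch as a preemption or a non-preemption.

    Assumes threads never block: a switch is a non-preemption only when
    the thread being switched away from has no steps left.
    """
    remaining = list(lengths)
    switches: List[Tuple[int, int, str]] = []
    prev: Optional[int] = None
    for t in schedule:
        if prev is not None and t != prev:
            kind = "preemption" if remaining[prev] > 0 else "non-preemption"
            switches.append((prev, t, kind))
        remaining[t] -= 1
        prev = t
    return switches

# Thread 1 (id 0), 3 steps: x = 1; if (p != 0); x = p->f
# Thread 2 (id 1), 1 step:  p = 0
print(classify_switches([0, 0, 1, 0], [3, 1]))
# [(0, 1, 'preemption'), (1, 0, 'non-preemption')]
```

The schedule in the figure therefore uses exactly one preemption, which is all it takes to expose the null dereference.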
Slide 9: Iterative context-bounding algorithm
- The scheduler has a budget of c preemptions
  - Nondeterministically choose the preemption points
- Resort to non-preemptive scheduling after c preemptions
  - Run each thread to the next yield point
- Once all executions with c preemptions have been explored
  - Try with c + 1 preemptions
- Iterative context bounding has desirable properties
  - Property 0: Easy to implement
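A minimal Python sketch of the loop above, under stated assumptions: threads never block (so the only non-preemptive switch happens when a thread finishes), and each thread is a hypothetical list of atomic step functions. For each bound c, a depth-first search enumerates schedules with at most c preemptions, the program is re-run from its initial state under each one, and only schedules using exactly c preemptions are checked, since the rest were checked at a smaller bound. The example reuses the null-dereference scenario from slide 8, with None standing in for a null p.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Shared = Dict[str, object]
Local = Dict[str, object]
Step = Callable[[Shared, Local], None]
Schedule = Tuple[int, ...]

def run(threads: List[List[Step]], schedule: Schedule, init: Shared) -> None:
    """Re-execute from the initial state; an AssertionError signals the bug."""
    shared: Shared = dict(init)
    local: List[Local] = [{} for _ in threads]
    pc = [0] * len(threads)
    for t in schedule:
        threads[t][pc[t]](shared, local[t])
        pc[t] += 1

def bounded_schedules(lengths: List[int], c: int) -> Iterator[Tuple[Schedule, int]]:
    """Yield (schedule, preemptions used) for every schedule with <= c preemptions."""
    remaining = list(lengths)
    prefix: List[int] = []

    def dfs(current: int, used: int) -> Iterator[Tuple[Schedule, int]]:
        if not any(remaining):
            yield tuple(prefix), used
            return
        for t in range(len(remaining)):
            if remaining[t] == 0:
                continue
            # Switching away from a thread that can still run costs a preemption.
            cost = 1 if current >= 0 and t != current and remaining[current] > 0 else 0
            if used + cost > c:
                continue  # budget spent: only non-preemptive choices remain
            remaining[t] -= 1
            prefix.append(t)
            yield from dfs(t, used + cost)
            prefix.pop()
            remaining[t] += 1

    yield from dfs(-1, 0)

def iterative_context_bounding(threads: List[List[Step]], init: Shared,
                               max_bound: int) -> Optional[Tuple[int, Schedule]]:
    """Try c = 0, 1, ..., max_bound; return (c, schedule) of the first failure."""
    lengths = [len(steps) for steps in threads]
    for c in range(max_bound + 1):
        for schedule, used in bounded_schedules(lengths, c):
            if used < c:
                continue  # already checked with a smaller bound
            try:
                run(threads, schedule, init)
            except AssertionError:
                return c, schedule
    return None

# Hypothetical model of the slide-8 example.
def t1_assign(s: Shared, loc: Local) -> None:
    s["x"] = 1                                  # x = 1

def t1_check(s: Shared, loc: Local) -> None:
    loc["taken"] = s["p"] is not None           # if (p != 0)

def t1_deref(s: Shared, loc: Local) -> None:
    if loc["taken"]:
        assert s["p"] is not None, "null dereference"
        s["x"] = s["p"]["f"]                    # x = p->f

def t2_clear(s: Shared, loc: Local) -> None:
    s["p"] = None                               # p = 0

THREADS = [[t1_assign, t1_check, t1_deref], [t2_clear]]
INIT: Shared = {"x": 0, "p": {"f": 7}}
print(iterative_context_bounding(THREADS, INIT, max_bound=2))  # (1, (0, 0, 1, 0))
```

The search reports the failure at c = 1: Thread 1 runs two steps, is preempted by Thread 2, and resumes after Thread 2 finishes. Because bounds are tried in increasing order, the first failing schedule found also uses the fewest preemptions.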
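Slide 10: Property 1: Polynomial state space
- n threads, k steps each, c preemptions
- Number of executions $\le \binom{nk}{c} \cdot (n+c)! = O\left((n^2 k)^c \cdot n!\right)$
  - Exponential in n and c, but not in k
- Choose c preemption points among the nk steps ($\binom{nk}{c}$ ways), then order the resulting n + c contexts (at most $(n+c)!$ ways)
[Figure: Thread 1 executes x = 1; x = 2 and Thread 2 executes y = 1; y = 2; preemption points are chosen among these steps.]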
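The bound can be sanity-checked on tiny instances by brute force. The sketch below again assumes threads never block, so a switch counts as a preemption unless the previous thread has finished. It counts all schedules with at most c preemptions and compares the count with $\binom{nk}{c}\,(n+c)!$.

```python
from math import comb, factorial
from typing import Iterator, List, Optional, Tuple

def schedules(lengths: List[int]) -> Iterator[Tuple[int, ...]]:
    """Yield every interleaving of threads with the given step counts."""
    if not any(lengths):
        yield ()
        return
    for t, r in enumerate(lengths):
        if r > 0:
            rest = lengths[:t] + [r - 1] + lengths[t + 1:]
            for tail in schedules(rest):
                yield (t,) + tail

def preemptions(schedule: Tuple[int, ...], lengths: List[int]) -> int:
    """Count switches away from a thread that still had steps left."""
    remaining, count = list(lengths), 0
    prev: Optional[int] = None
    for t in schedule:
        if prev is not None and t != prev and remaining[prev] > 0:
            count += 1
        remaining[t] -= 1
        prev = t
    return count

def bounded_count(n: int, k: int, c: int) -> int:
    lengths = [k] * n
    return sum(1 for s in schedules(lengths) if preemptions(s, lengths) <= c)

def bound(n: int, k: int, c: int) -> int:
    return comb(n * k, c) * factorial(n + c)

for n, k in [(2, 2), (2, 3), (3, 2)]:
    for c in range(3):
        exact, limit = bounded_count(n, k, c), bound(n, k, c)
        assert exact <= limit
        print(f"n={n} k={k} c={c}: {exact} executions, bound {limit}")
```

For n = 2 and k = 2 the brute-force counts are 2, 4 and 6 for c = 0, 1, 2, against bounds of 2, 24 and 144. The bound is loose, but for a fixed c it grows only polynomially in k.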
Slide 11: Property 2: Deep exploration possible with small bounds
- A context-bounded execution has unbounded depth
  - A thread may execute an unbounded number of steps within each context
- Can reach a terminating state from an arbitrary state with zero preemptions
  - Perform non-preemptive scheduling
  - Leave the number of non-preemptions unbounded
Slide 12: Property 3: Coverage metric
- If the search terminates with c preemptions,
  - any remaining error must require at least c + 1 preemptions
- Intuitive estimate for
  - the complexity of the bugs remaining in the program
  - the chance of their occurrence in practice
Slide 13: Property 4: Finds the simplest error trace
- Finds the error trace with the smallest number of preemptions
- The number of preemptions is a better metric of error complexity than execution length
Slide 14: Property 5: Lots of bugs with a small number of preemptions
Program              KLOC  Max threads  Bugs reachable with preemption count
                                        0    1    2    3    Total
Bluetooth            0.4   3            0    1    0    0    1
Work-Stealing Queue  1.3   3            0    1    2    0    3
Transaction Manager  7.0   2            0    0    2    1    3
APE                  18.9  4            2    1    1    -    4
Dryad Channels       16.0  5            1    5    1    -    7
Slide 15: Most states are covered with a small number of preemptions
Slide 16: Coverage vs. time (Dryad)
Slide 17: Techniques
- Iterative context bounding
  - Strategy for searching large state spaces
- State space optimization
Slide 18: Optimization for race-free programs
- Insert context switches only at synchronization points
  - Massive state-space reduction
  - Number of steps (k) = number of synchronization operations (not memory accesses)
- Run data-race detection to check the race-free assumption
  - Goldilocks algorithm [PLDI 07], implemented for x86
- Theorem: when the search terminates for context bound c,
  - either it finds an erroneous execution,
  - or it finds a data race,
  - or the program has no errors reachable with c preemptions
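The theorem relies on checking each explored execution for data races. The detector named here is Goldilocks; the sketch below instead uses a simpler, standard vector-clock happens-before check, only to illustrate what such a per-execution check does. The event format (acquire, release, read and write tuples) is an assumption for illustration, and the sketch keeps the full access history rather than the compact per-variable state a production detector would use.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Event = Tuple[str, int, str]  # (op, thread id, lock or variable); op: acq/rel/rd/wr

def find_races(trace: List[Event], n: int) -> List[Tuple[str, int, int]]:
    """Return (variable, earlier thread, later thread) for conflicting accesses
    not ordered by happens-before (program order plus release -> later acquire)."""
    clock = [[1 if i == t else 0 for i in range(n)] for t in range(n)]
    lock_clock: Dict[str, List[int]] = {}
    # variable -> list of (thread, that thread's own clock at the access, is_write)
    history: DefaultDict[str, List[Tuple[int, int, bool]]] = defaultdict(list)
    races: List[Tuple[str, int, int]] = []
    for op, t, name in trace:
        if op == "acq":
            if name in lock_clock:  # inherit what the last releaser knew
                clock[t] = [max(a, b) for a, b in zip(clock[t], lock_clock[name])]
        elif op == "rel":
            lock_clock[name] = list(clock[t])
            clock[t][t] += 1        # later accesses are not covered by this release
        elif op in ("rd", "wr"):
            is_write = op == "wr"
            for u, epoch, was_write in history[name]:
                # Conflict: other thread, at least one write, not known to t.
                if u != t and (is_write or was_write) and epoch > clock[t][u]:
                    races.append((name, u, t))
            history[name].append((t, clock[t][t], is_write))
        else:
            raise ValueError(f"unknown op {op!r}")
    return races

# Hypothetical traces, one execution each.
locked = [("acq", 0, "m"), ("wr", 0, "x"), ("rel", 0, "m"),
          ("acq", 1, "m"), ("wr", 1, "x"), ("rel", 1, "m")]
racy = [("wr", 0, "x"), ("rd", 1, "x")]
print(find_races(locked, 2))  # []
print(find_races(racy, 2))    # [('x', 0, 1)]
```

When a check like this reports no race on any explored execution, the race-free assumption behind switching only at synchronization points is supported for those executions.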
Slide 19: Conclusion
- Iterative context-bounding algorithm
  - Effective search strategy for multithreaded bugs
  - Exposes many concurrency bugs
- Implemented in the CHESS model checking tool
  - Applying CHESS to Windows drivers, SQL, Cosmos, Singularity
- Visit http://research.microsoft.com/projects/CHESS/
Slide 20: Extra slides
Slide 21: Partial-order reduction
- Many thread interleavings are equivalent
  - Accesses to separate memory locations by different threads can be reordered
- Avoid exploring equivalent thread interleavings
Example: the interleaving T1: x = 1; T2: y = 2 is equivalent to T2: y = 2; T1: x = 1
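One way to see the equivalence: when every operation is a write, two interleavings are equivalent exactly when they order the writes to each variable the same way, because program order within each thread is fixed. The brute-force sketch below groups interleavings by that key. It only counts equivalence classes; it is not a partial-order reduction algorithm, which would avoid generating the redundant interleavings in the first place. Operations are modeled as hypothetical (thread, variable, value) tuples.

```python
from typing import Dict, List, Tuple

Op = Tuple[int, str, int]  # (thread id, variable written, value written)

def interleavings(threads: List[List[Op]]) -> List[List[Op]]:
    """All merges of the threads' operation lists that respect program order."""
    if all(not ops for ops in threads):
        return [[]]
    result: List[List[Op]] = []
    for i, ops in enumerate(threads):
        if ops:
            rest = threads[:i] + [ops[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                result.append([ops[0]] + tail)
    return result

def trace_key(run: List[Op]) -> Tuple[Tuple[str, Tuple[int, ...]], ...]:
    """Order of writing threads per variable; with write-only operations this
    identifies the equivalence class of the interleaving."""
    writers: Dict[str, List[int]] = {}
    for tid, var, _value in run:
        writers.setdefault(var, []).append(tid)
    return tuple(sorted((var, tuple(tids)) for var, tids in writers.items()))

def count_classes(threads: List[List[Op]]) -> Tuple[int, int]:
    runs = interleavings(threads)
    return len(runs), len({trace_key(r) for r in runs})

print(count_classes([[(0, "x", 1)], [(1, "y", 2)]]))  # (2, 1)
print(count_classes([[(0, "x", 1)], [(1, "x", 2)]]))  # (2, 2)
print(count_classes([[(0, "x", 1), (0, "x", 2)],
                     [(1, "y", 1), (1, "y", 2)]]))    # (6, 1)
```

The slide's example collapses two interleavings into one class, and all six interleavings of the slide-5 program are equivalent. Writes by different threads to the same variable, however, stay in separate classes.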
Slide 22: Optimistic dynamic partial-order reduction
- Algorithm [Bruening 99]
  - Assume the program is data-race free
  - Context switch only at synchronization points
  - Check for data races in each execution
- Theorem [Stoller 00]
  - If the algorithm terminates without reporting races,
  - then the program has no assertion failures
- Massive reduction
  - k = number of synchronization accesses (not memory accesses)
Slide 23: Combining with context bounding
- Algorithm
  - Assume the program is data-race free
  - Context switch only at synchronization points
  - Explore executions with c preemptions
  - Check for data races in each execution
- Theorem
  - If the algorithm terminates without reporting races,
  - then the program has no assertion failures reachable with c preemptions
  - Requires that a thread can block only at synchronization points
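The effect of switching only at synchronization points can be estimated by counting schedules. The sketch below assumes threads never block and encodes each hypothetical thread as a string of 'S' (synchronization operation) and 'M' (data access) steps. Data accesses are merged into the atomic block that starts at the preceding synchronization operation (or at the thread's first step). It then counts schedules with at most c preemptions, once over individual steps and once over blocks. As the theorem states, the coarser search is justified only under the data-race-free assumption that each execution is checked for.

```python
from typing import List

def atomic_blocks(ops: str) -> int:
    """Number of scheduling units when a context switch may happen only right
    before a synchronization op ('S'); data accesses ('M') never start a unit."""
    return sum(1 for i, op in enumerate(ops) if i == 0 or op == "S")

def count_bounded(lengths: List[int], c: int) -> int:
    """Count complete schedules with at most c preemptions, assuming threads
    never block (a switch is free only after the current thread finishes)."""
    remaining = list(lengths)

    def dfs(current: int, used: int) -> int:
        if not any(remaining):
            return 1
        total = 0
        for t in range(len(remaining)):
            if remaining[t] == 0:
                continue
            cost = 1 if current >= 0 and t != current and remaining[current] > 0 else 0
            if used + cost <= c:
                remaining[t] -= 1
                total += dfs(t, used + cost)
                remaining[t] += 1
        return total

    return dfs(-1, 0)

# Hypothetical thread: 10 steps, of which 2 are synchronization operations.
THREAD = "SMMMSMMMMM"
fine = count_bounded([len(THREAD)] * 2, c=1)              # switch before any step
coarse = count_bounded([atomic_blocks(THREAD)] * 2, c=1)  # switch only before 'S'
print(fine, coarse)  # 20 4
```

With two such threads and one preemption, restricting switches to synchronization points shrinks the search from 20 schedules to 4. The gap widens as the number of data accesses between synchronization operations grows.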