Title: Using The RaceFinder Tool PADTAD, Nice, April 03
1 Using The RaceFinder Tool
PADTAD, Nice, April 03
- Heuristics for Finding Concurrent Bugs
Yosi Ben-Asher, Yaniv Eytani and Eitan Farchi
2 Outline
- Java's non-determinism and its implications
- RaceFinder overview
- Decision policies and scheduling heuristics
- Experimental results
3 Why do we need concurrent methodologies and tools?
- Server-side concurrent programming is common
- Non-determinism is hard to analyze and debug
- Race conditions and deadlocks are common, yet difficult to uncover
  - They often remain undetected until after product deployment
4 The interleaving space
- Accesses to shared variables and synchronization operations are called critical events
- A thread schedule is determined by the order in which critical events occur
- A possible thread schedule is called an interleaving
- The set of all possible interleavings is called the interleaving space
5 An example of the interleaving space
- Initially x = 0
- thread1: x++
- thread2: x++
- thread3: x++
- Three threads each advance x by 1; what are the possible outcomes?
- x = 1 or x = 2 or x = 3
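The three outcomes can be checked by brute force. The sketch below is not part of RaceFinder; it assumes each x++ splits into two critical events (read x into a local copy, then write local + 1 back) and explores every interleaving of those events. The class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

class InterleavingSpace {
    // Returns every final value of x reachable when `threads` threads
    // each perform "read x into local" followed by "write local + 1 to x".
    static Set<Integer> outcomes(int threads) {
        Set<Integer> results = new TreeSet<>();
        explore(0, new int[threads], new int[threads], results);
        return results;
    }

    // pc[t]: 0 = has not read yet, 1 = has read, 2 = has written (finished).
    private static void explore(int x, int[] pc, int[] local, Set<Integer> results) {
        boolean allDone = true;
        for (int t = 0; t < pc.length; t++) {
            if (pc[t] == 2) {
                continue;
            }
            allDone = false;
            int savedLocal = local[t];
            int nextX = x;
            if (pc[t] == 0) {
                local[t] = x;          // critical event: read
            } else {
                nextX = local[t] + 1;  // critical event: write
            }
            pc[t]++;
            explore(nextX, pc, local, results);
            pc[t]--;                   // backtrack and try another thread
            local[t] = savedLocal;
        }
        if (allDone) {
            results.add(x);            // one complete interleaving
        }
    }

    public static void main(String[] args) {
        System.out.println(outcomes(3)); // prints [1, 2, 3]
    }
}
```

For three threads with two events each, the search visits 90 complete interleavings. Exhaustive enumeration like this is only practical for toy examples.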
6 An example of the interleaving space (continued)
Global x: 0, 1, 2, 3
- thread1: read x = 0, local x++, write x = 1
- thread2: read x = 1, local x++, write x = 2
- thread3: read x = 2, local x++, write x = 3
7 An example of the interleaving space (continued)
Global x: 0, (0), 1, 1, 2
- thread1: read x = 0, local x++, write x = 1
- thread2: read x = 0, local x++, write x = 1
- thread3: read x = 1, local x++, write x = 2
- thread2 reads a stale copy of x, so one increment is lost
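A single interleaving can also be replayed deterministically. The following sketch is illustrative only: it assumes a schedule is given as a sequence of thread indices, where each entry makes that thread perform its next critical event (first the read, then the write). The names are hypothetical.

```java
class ScheduleReplay {
    // Replays a schedule of critical events on a shared counter x.
    // schedule[i] names the thread that performs its next event:
    // a read of x into its local copy, then a write of local + 1.
    static int replay(int threads, int[] schedule) {
        int x = 0;
        int[] local = new int[threads];
        boolean[] hasRead = new boolean[threads];
        for (int t : schedule) {
            if (!hasRead[t]) {
                local[t] = x;        // read (may become stale)
                hasRead[t] = true;
            } else {
                x = local[t] + 1;    // write back the incremented copy
                hasRead[t] = false;
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // Sequential schedule: every increment is kept.
        System.out.println(replay(3, new int[] {0, 0, 1, 1, 2, 2})); // prints 3
        // thread2 (index 1) reads before thread1 (index 0) writes.
        System.out.println(replay(3, new int[] {0, 1, 0, 1, 2, 2})); // prints 2
    }
}
```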
8 Outline
- Java's non-determinism and its implications
- RaceFinder overview
- Decision policies and scheduling heuristics
- Experimental results
9 Why use heuristic search?
- For a given functional test, the size of the set of possible interleavings is unbounded
- Only a fraction of the interleavings actually produce a race condition
- Generate biased interleavings that produce race conditions, revealing concurrent bugs with high probability
10 Runtime measures
- Total contention: the maximal number of times a shared variable was accessed in a previous run
- Object contention: the maximal number of classes that accessed a shared variable in a previous run
- Thread contention: the maximal number of threads that accessed a shared variable in a previous run
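As a rough illustration of the three measures, the sketch below computes them from the access log of one previous run. The log format (variable name, thread name, accessing class name) and all identifiers are assumptions made for this example; the slides do not describe how RaceFinder records accesses.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ContentionMeasures {
    // Hypothetical log record: one access to a shared variable.
    static final class Access {
        final String variable;
        final String thread;
        final String accessingClass;

        Access(String variable, String thread, String accessingClass) {
            this.variable = variable;
            this.thread = thread;
            this.accessingClass = accessingClass;
        }
    }

    // Total contention: number of accesses to each variable.
    static Map<String, Integer> totalContention(List<Access> log) {
        Map<String, Integer> counts = new HashMap<>();
        for (Access a : log) {
            counts.merge(a.variable, 1, Integer::sum);
        }
        return counts;
    }

    // Thread contention: number of distinct threads that accessed each variable.
    static Map<String, Integer> threadContention(List<Access> log) {
        Map<String, Set<String>> seen = new HashMap<>();
        for (Access a : log) {
            seen.computeIfAbsent(a.variable, k -> new HashSet<>()).add(a.thread);
        }
        return sizes(seen);
    }

    // Object contention: number of distinct classes that accessed each variable.
    static Map<String, Integer> objectContention(List<Access> log) {
        Map<String, Set<String>> seen = new HashMap<>();
        for (Access a : log) {
            seen.computeIfAbsent(a.variable, k -> new HashSet<>()).add(a.accessingClass);
        }
        return sizes(seen);
    }

    private static Map<String, Integer> sizes(Map<String, Set<String>> seen) {
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            result.put(e.getKey(), e.getValue().size());
        }
        return result;
    }
}
```

Taking the maximum of each value over several previous runs would be a small extension; the sketch handles a single run.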
11 How do we rank shared variables?
- One of the three measures is used to order the shared variables of the program under test
- Variables that are more contended are ranked higher
- Noise (randomly introduced context switches) is first applied to variables that have a higher ranking
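The ranking step can be sketched as a sort over a map from variable name to contention value, whichever of the three measures produced it. The class name is hypothetical, and the alphabetical tie-break is an assumption added only to make the order deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ContentionRanking {
    // Orders variables from most to least contended.
    // Ties are broken by name (an assumption, for repeatability).
    static List<String> rank(Map<String, Integer> contention) {
        List<String> variables = new ArrayList<>(contention.keySet());
        variables.sort((a, b) -> {
            int byContention = Integer.compare(contention.get(b), contention.get(a));
            return byContention != 0 ? byContention : a.compareTo(b);
        });
        return variables;
    }
}
```

Noise would then be applied to the first entry of the returned list before any other.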
12 RaceFinder and related work
- ConTest and rstest increase the probability that concurrent bugs occur using un-biased heuristics
- RaceFinder applies a biased approach, which increases the probability of such faults occurring compared with un-biased heuristics
- Applying noise on one variable was more effective than applying noise on a subset of variables
- The probability of observing the concurrent faults without noise application is very low
13 Less noise can be more effective
- Initially x = 0
- thread1: x++
- thread2: y = 1; x++
- Two threads advance x by one; thread2 also assigns one to y
- A buggy run will result in x = 1 at the end!
- x is the raced variable!
14 Less noise can be more effective (continued)
Global x: 0, 0, 1, 2
- Thread1: read x = 0, local x++, make noise, write x = 1, make noise
- Thread2: write y = 1, make noise, read x = 1, local x++, write x = 2
- Making noise causes a context switch!
- The noise on y cancels the effect of the noise on x
15 Outline
- Java's non-determinism and its implications
- RaceFinder overview
- Decision policies and scheduling heuristics
- Experimental results
16 What is a decision policy?
- A decision policy determines at each critical event whether to cause a context switch using the yield() or wait() primitives
- A decision policy aims to change the runtime interleaving of the program so that the bug manifests
- As a result, additional runs introduce new interleavings
17 Contention based decision policies
- One characteristic of a data race is multiple accesses to a global shared variable
- RaceFinder guesses that there is a data race if contention is high
- If there is a data race on a shared variable, changing the timing of accesses to that variable increases the probability that the bug manifests
18 Contention based decision policies (continued)
- The decision procedure
  - Focuses on the variable that has the highest contention
  - Executes seeded primitives at critical events related to that variable
- This procedure is repeated with the variable ranked second in the contention list, and so forth
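The procedure amounts to a loop over the ranked list. In this sketch the test run is abstracted as a predicate that takes the targeted variable and reports whether the bug manifested; that predicate, the runsPerVariable budget, and the class name are assumptions for illustration, not RaceFinder's interface.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class ContentionDecisionProcedure {
    // Targets one variable at a time, most contended first.
    // bugManifests.test(v) stands for "run the functional test with
    // seeded primitives on critical events of v; did the bug show up?"
    static Optional<String> findRacedVariable(List<String> ranked,
                                              int runsPerVariable,
                                              Predicate<String> bugManifests) {
        for (String target : ranked) {
            for (int run = 0; run < runsPerVariable; run++) {
                if (bugManifests.test(target)) {
                    return Optional.of(target);
                }
            }
        }
        return Optional.empty(); // no targeted variable exposed the bug
    }
}
```

Targeting a single variable per pass reflects the observation that noise on one variable can be more effective than noise on several.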
19 Scheduling heuristics
- The decision policy identifies a contended variable
- Thread barriers or context switches are introduced before and/or after accesses to that variable
- This increases contention over the shared variable and uncovers non-atomic code blocks
20 The yield() scheduling heuristic
- At a critical event chosen by the decision policy, the heuristic randomly chooses whether to execute the yield() primitive
- In each run, context switches occur at different critical events, generating new interleavings
- Applying noise to the set of all critical events is called white noise
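A minimal sketch of such a noise maker follows. It assumes instrumented code calls atCriticalEvent with the name of the variable being accessed; the class, its parameters, and the use of a probability as the "noise level" are assumptions for this example. Passing null as the target models white noise.

```java
import java.util.Random;

class YieldNoise {
    private final String target;       // variable to focus on; null = white noise
    private final double probability;  // chance of yielding at a selected event
    private final Random random;

    YieldNoise(String target, double probability, long seed) {
        this.target = target;
        this.probability = probability;
        this.random = new Random(seed);
    }

    // Called at each critical event. Returns true if noise was made.
    boolean atCriticalEvent(String variable) {
        boolean selected = target == null || target.equals(variable);
        if (selected && random.nextDouble() < probability) {
            Thread.yield(); // hint to the scheduler; a context switch is likely, not guaranteed
            return true;
        }
        return false;
    }
}
```

Because Thread.yield() is only a hint to the scheduler, the same seed does not guarantee the same interleaving from run to run.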
21 Crawler bug example
- A method in Thread1 has the following line
  - if (connection != null) connection.setStopFlag()
- The programmer considered this line an atomic code block
- Thread2 has the assignment connection = null
- Interleaving
  - Thread1: if (connection != null)
  - yield()
  - Thread2: connection = null
  - yield()
  - Thread1: connection.setStopFlag()
- The yield() heuristic causes the context switches
- Result: java.lang.NullPointerException
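The interleaving above can be reproduced deterministically. The sketch below is not the crawler's code: the Connection class is a stand-in, and latches replace yield() so that the context switches happen exactly where the slide shows them instead of only with some probability.

```java
import java.util.concurrent.CountDownLatch;

class CrawlerBugDemo {
    // Stand-in for the crawler's connection type (hypothetical).
    static class Connection {
        void setStopFlag() { }
    }

    static volatile Connection connection;

    // Forces the slide's interleaving and reports whether the
    // NullPointerException occurred.
    static boolean runForcedInterleaving() {
        connection = new Connection();
        CountDownLatch checked = new CountDownLatch(1);
        CountDownLatch nulled = new CountDownLatch(1);
        boolean[] sawNpe = {false};

        Thread thread1 = new Thread(() -> {
            try {
                if (connection != null) {      // check
                    checked.countDown();       // "context switch" to thread2
                    nulled.await();
                    connection.setStopFlag();  // act on a now-null reference
                }
            } catch (NullPointerException e) {
                sawNpe[0] = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread thread2 = new Thread(() -> {
            try {
                checked.await();
                connection = null;
                nulled.countDown();            // "context switch" back to thread1
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        thread1.start();
        thread2.start();
        try {
            thread1.join();
            thread2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sawNpe[0];
    }
}
```

One common repair is to copy the field into a local variable and test the local, so the check and the call see the same reference.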
22 The barrier scheduling heuristic
- The barrier is implemented using a counting semaphore
- The semaphore causes threads to wait just before the shared variable is accessed
- When more than one thread is waiting, notifyAll() is used to advance the waiting threads simultaneously
- Thus, threads access the variable simultaneously and the probability of the bug occurring increases
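One way such a barrier might look is sketched below, using a counter guarded by the object's monitor together with wait() and notifyAll(). This is an assumption-laden illustration rather than RaceFinder's implementation: the parties threshold and the timeout (added so a lone thread is not blocked forever) are not taken from the slides.

```java
class NoiseBarrier {
    private final int parties;        // threads to collect before release (assumed parameter)
    private final long timeoutMillis; // give up waiting after this long (assumed parameter)
    private int waiting = 0;
    private long generation = 0;

    NoiseBarrier(int parties, long timeoutMillis) {
        this.parties = parties;
        this.timeoutMillis = timeoutMillis;
    }

    // Call just before accessing the targeted shared variable.
    // Returns true if released together with other threads,
    // false if the wait timed out or was interrupted.
    synchronized boolean await() {
        long myGeneration = generation;
        waiting++;
        if (waiting >= parties) {
            waiting = 0;
            generation++;
            notifyAll();              // advance all waiting threads together
            return true;
        }
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (generation == myGeneration) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    waiting--;        // leave the barrier and proceed alone
                    return false;
                }
                wait(remaining);
            }
            return true;
        } catch (InterruptedException e) {
            if (generation == myGeneration) {
                waiting--;            // still counted, so undo the registration
            }
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The generation counter guards against spurious wakeups: a thread leaves the loop only after a release has actually happened or its deadline has passed.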
23 Barrier example
Global x: 0, 0, 0, 1
- thread1: read x = 0, local x++, write x = 1
- thread2: read x = 0, local x++, write x = 1
- thread3: read x = 0, local x++, write x = 1
- A barrier delays writing local x into global x
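The effect in this example can be demonstrated with the standard java.util.concurrent.CyclicBarrier, used here as a stand-in for RaceFinder's own barrier. Placing the barrier between the read and the write guarantees that every thread has read x = 0 before any thread writes, so the lost update occurs on every run. The class and method names are hypothetical.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class BarrierLostUpdate {
    static volatile int x;

    // Runs `threads` unsynchronized increments of x with a barrier
    // between each thread's read and write, and returns the final x.
    static int run(int threads) {
        x = 0;
        CyclicBarrier barrier = new CyclicBarrier(threads);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                int local = x;          // read global x
                try {
                    barrier.await();    // delay the write until all threads have read
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
                x = local + 1;          // write the incremented local copy
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(run(3)); // prints 1
    }
}
```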
24 Outline
- Java's non-determinism and its implications
- RaceFinder overview
- Decision policies and scheduling heuristics
- Experimental results
25 Experimental results
- We compare the white noise, yield() and barrier heuristics
- A set of programs containing race related bugs is used. Some are industrial examples and some are known examples from the literature
- Each program has a functional test that is run many times (about 1000) using one of the three heuristics
- As the race condition for each program in the experiment set is known, we could easily check whether the decision policy "chose" the right variable
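A repeated-run measurement of this kind could be sketched as follows. The functional test is abstracted as a supplier that returns true when the bug manifested in that run; the class name and this abstraction are assumptions, and no measured values from the experiments are implied.

```java
import java.util.function.BooleanSupplier;

class ManifestationRate {
    // Runs the test `runs` times and returns the fraction of runs
    // in which the bug manifested.
    static double measure(int runs, BooleanSupplier bugManifests) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            if (bugManifests.getAsBoolean()) {
                failures++;
            }
        }
        return (double) failures / runs;
    }
}
```

Computing this rate once per heuristic, on the same functional test, gives directly comparable numbers.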
26 Raced variable!
- The raced variable ranks highest in total and thread contention!
27 Heuristics: Barrier > Yield > White noise
- Yield noise level: high > med > low
28 Conclusion
- A two-level scheme (choosing a variable and applying noise to it) that increases the probability of manifesting race related bugs was developed
- Whenever the biased decision policy focuses on the raced variable, the barrier scheduling heuristic produces the bug more often than the yield() scheduling heuristic
- Both scheduling heuristics produce the bug more often than unbiased white noise
- Biased heuristics are better at finding concurrent bugs than un-biased heuristics!