Title: Survey of Race Condition Analysis Techniques
1Survey of Race Condition Analysis Techniques
- Team Extremely Awesome
- Nels Beckman
- Project Presentation
- 17-654 Analysis of Software Artifacts
2A Goal-Based Literature Search
- This semester we explored many fundamental styles of software analysis.
- How might each one be applied to the same goal?
- (Finding race conditions)
- Purpose
- Analyze strengths of different analysis styles normalized to one defect type.
- See how you might decide among different techniques on a real project.
3What is a Race Condition?
- One Definition
- A race occurs when two threads can access (read or write) a data variable simultaneously and at least one of the two accesses is a write. (Henzinger 04)
- Note
- Locks not specifically mentioned.
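To make the definition concrete, here is a minimal sketch that replays two threads, each performing `x = x + 1` as a separate read step and write step. The function name `replay` and the schedule encoding are assumptions made for illustration; the schedule is fixed by hand so the outcome is deterministic.

```python
def replay(schedule):
    """Replay (thread, step) pairs; each thread does x = x + 1 as a read, then a write."""
    shared = {"x": 0}
    local = {}  # each thread's private copy of x, filled in by its read step
    for thread, step in schedule:
        if step == "read":
            local[thread] = shared["x"]
        else:  # "write"
            shared["x"] = local[thread] + 1
    return shared["x"]

# Thread t finishes before thread u starts: both increments survive.
serial = [("t", "read"), ("t", "write"), ("u", "read"), ("u", "write")]
# Both threads read before either writes: one increment is lost.
racy = [("t", "read"), ("u", "read"), ("t", "write"), ("u", "write")]

print(replay(serial))  # 2
print(replay(racy))    # 1
```

The second schedule is exactly the situation in the definition: two threads access `x` with no ordering between them, and at least one access is a write.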
4Why Race Conditions?
- Race conditions are insidious bugs
- Can corrupt memory.
- Often not detected until later in execution.
- Appearance is non-deterministic.
- Difficult to reason about the interaction of multiple threads.
- My intuition?
- It should be relatively easy to ensure that I am at least locking properly.
5But First Locking Discipline
- Mutual Exclusion Locking Discipline
- A programming discipline that will ensure an absence of race conditions.
- Requires a lock be held on every access to a shared variable.
- Not the only way to achieve freedom from races!
- See example, next slide.
- Some tools check MLD, not race safety.
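A minimal sketch of the discipline using Python's standard `threading` module. The names `counter`, `counter_lock`, `worker`, and `run` are illustrative assumptions. Note that the final read happens without the lock; it is safe only because every worker has been joined first, which is the kind of non-lock ordering that a strict MLD checker would still flag.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # the single lock that protects `counter`

def worker(iterations):
    global counter
    for _ in range(iterations):
        with counter_lock:  # lock held on every access made by a worker thread
            counter += 1

def run(num_threads=4, iterations=10_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Unlocked read: ordered after all writes by the joins above.
    return counter

print(run())  # 40000
```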
6Example (Yu '05)
Threads: t, u, v
t: Fork(u)
t: Lock(a) Write(x) Unlock(a)
u: Lock(a) Write(x) Unlock(a)
t: Join(u) Write(x) Fork(v)
t: Lock(a) Write(x) Unlock(a)
v: Lock(a) Write(x) Unlock(a)
t: Join(v)
7Four Broad Analysis Types
- Type-Based Race Prevention
- Languages that cannot express racy programs.
- Dynamic Race Detectors
- Using instrumented code to detect races.
- Model-Checkers
- Searching for reachable race states.
- Flow-Based Race Detectors
- Of the style seen in this course.
8Dimensions of Comparison
- Ease of Use
- Annotations
- What is the burden associated with annotating the code?
- Expression
- Does the tool restrict my ability to say what I want?
- Scalability
- Could this tool legitimately claim to work on a large code base?
- Soundness
- What level of assurance is provided?
- Precision
- Can I have confidence in the results?
9Type-Based Race Prevention
- Goal
- To prevent race conditions using the language itself.
- Method
- Encode locking discipline into language.
- Relate shared state and the locks that protect them.
- Use typing annotations.
- Recall ownership types; this will seem familiar.
10Example Race-Free Cyclone
- To give a better feel, let's look at Cyclone.
- Other type-based systems are very similar.
11Example Race-Free Cyclone
- Things we want to express
- This lock protects this variable.
int*`l p1 = new 42; int*`loc p2 = new 43;
- p1 is declared as (a pointer to) an integer protected by the lock named `l.
- (`loc is a special lock name. It means this variable is never shared.)
14Example Race-Free Cyclone
- Things we want to express
- This is a new lock.
let lk<`l> = newlock();
- lk is the variable name.
- `l is the lock type name.
17Example Race-Free Cyclone
- Things we want to express
- This function should only be called when in possession of this lock.
void inc<`l::LU>(int*`l p; {`l}) { /* blah blah */ }
- The <`l::LU> part can be ignored for now...
- When passed an int whose protection lock is `l...
- The caller must already possess lock `l...
21Example Race-Free Cyclone
void inc<`l::LU>(int*`l p; {`l}) {
  *p = *p + 1;
}
void inc2<`l::LU>(lock_t<`l> plk, int*`l p; {}) {
  sync(plk) { inc(p); }
}
void f() {
  let lk<`l> = newlock();
  int*`l p1 = new 42;
  int*`loc p2 = new 43;
  spawn(g);
  inc2(lk, p1);
  inc2(nonlock, p2);
}
- It would be a type error to call inc without possessing the lock for the first argument.
- Imagine if the effects clause of inc were empty, as in inc<`l::LU>(int*`l p; {}).
- A dereference would also signal a compiler error, since it is unprotected.
25Type-Based Race Prevention
- Positives
- Soundness
- Programs are race-free by construction.
- Familiarity
- Languages are usually based on well-known languages.
- Locking discipline is a very common paradigm.
- Relatively Expressive
- These type systems have been integrated with polymorphism, object migration.
- Classes can be parameterized by different locks
- Types Can Often be Inferred
- Intra-procedural (thanks to effects clauses)
26Type-Based Race Prevention
- Negatives
- Restrictive
- Not all race-free programs are legal.
- e.g. Object initialization, other forms of synchronization (fork/join, etc.).
- Annotation Burden
- Lots of annotations to write, even for non-shared data.
- Especially to make more complicated features, like polymorphism, work.
- Another Language
27Type-Based Race Prevention
- Open Research Questions
- Reduce Restrictions as Much as Possible
- Initialization phase
- Subclassing without run-time checks in OO
- Encoding of thread starts and stops
- Remove annotations for non-threaded code
28Type-Based Race Prevention
- Open Research Questions
- Personally, sceptical that inference can improve a whole lot.
- Programmer intent still must be specified somehow in locking discipline.
- But escape analysis could infer thread-locals.
29Dynamic Race Detectors
- Find race conditions by
- Instrumenting the source code.
- Running lockset and happens-before analyses.
- Lockset has no false negatives.
- Happens-before has no false positives.
- On the following slides, we will play the part of the instrumented code.
- We see all (inside the program)!
30Lockset Analysis
- Imagine we're watching the program execute
... marbury = 5; madison = 5; makeStuffHappen(); ...
31Lockset Analysis
- Whenever a lock is acquired, add that to the set of held locks.
... roe = 5; wade = 5; synchronize(my_object) { ...
32Lockset Analysis
- Likewise, remove locks when they are released.
... brown = 43; board = yes; } // end synch ...
33Lockset Analysis
- The first time a variable is accessed, set its candidate set to be the set of held locks.
... rob_frost = false; ...
34Lockset Analysis
- The next time that variable is accessed, take the intersection of the candidate set and the set of currently held locks.
... if(!rob_frost) ...
35Lockset Analysis
- If the intersection is empty, flag a potential race condition!
... if(!rob_frost) ...
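The lockset steps on the preceding slides can be sketched over a recorded trace of events. This is a simplified illustration, not the implementation of any particular tool: the event encoding `(thread, op, target)` and the name `lockset_check` are assumptions, fork and join events are ignored, and an access made with no locks held is flagged immediately (production detectors usually refine this to tolerate initialization).

```python
def lockset_check(trace):
    """Return (index, thread, variable) for the first lockset violation on each variable."""
    held = {}        # thread -> set of locks it currently holds
    candidates = {}  # variable -> candidate set of protecting locks
    reported = set()
    warnings = []
    for index, (thread, op, target) in enumerate(trace):
        locks = held.setdefault(thread, set())
        if op == "lock":
            locks.add(target)
        elif op == "unlock":
            locks.discard(target)
        elif op in ("read", "write"):
            if target in candidates:
                candidates[target] &= locks      # intersect with held locks
            else:
                candidates[target] = set(locks)  # first access: copy held locks
            if not candidates[target] and target not in reported:
                reported.add(target)
                warnings.append((index, thread, target))
        # any other op (e.g. "fork", "join") carries no lockset information
    return warnings

# First part of the fork/join example: the last write holds no lock.
trace = [
    ("t", "fork", "u"),
    ("t", "lock", "a"), ("t", "write", "x"), ("t", "unlock", "a"),
    ("u", "lock", "a"), ("u", "write", "x"), ("u", "unlock", "a"),
    ("t", "join", "u"),
    ("t", "write", "x"),
]
print(lockset_check(trace))  # [(8, 't', 'x')]
```

The warning at index 8 is a false positive in the sense of the earlier example: the write is unprotected by a lock but ordered after u's write by the join.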
36Happens-Before Analysis
- More complicated.
- Intuition
- Certain operations define an ordering between operations of threads.
- Establish thread counters to create a partial ordering.
- When a variable access occurs that can't establish itself as being after the previous one, we have detected an actual race.
37Happens-Before on our Example
[Diagram: timelines for threads t and u, each labeled with its clock value.]
t: Fork(u)
t: Lock(a) Write(x) Unlock(a)
u: Lock(a) Write(x) Unlock(a)
t: Join(u) Write(x) Fork(v)
- Each thread keeps a clock value.
- Each variable stores the thread clock value for the most recent access of each thread.
- Also, threads learn about and store the clock values of other threads through synchronization activities.
41Happens-Before on our Example
[Diagram: timelines for threads t and u, with t's stored clock values updated after the join.]
t: Fork(u)
t: Lock(a) Write(x) Unlock(a)
t: Join(u) Write(x) Fork(v)
- If u were to go off, incrementing its count and accessing variables, t would find out after the join.
42Happens-Before on our Example
- When an access to x by thread t does occur, it is a requirement that, for each thread that previously accessed x:
- t's knowledge of that thread's time ≥ x's knowledge of that thread's time
t: Join(u) Write(x) Fork(v)
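The requirement above can be sketched with vector clocks. This is a minimal illustration under stated assumptions, not the scheme of any specific detector: the event encoding `(thread, op, target)` and the names `merge` and `happens_before_check` are hypothetical, and ordering edges are assumed to come from fork, join, and unlock-then-lock on the same lock.

```python
def merge(a, b):
    """Pointwise maximum of two vector clocks (dicts mapping thread -> int)."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def happens_before_check(trace):
    """Return (index, thread, other_thread, variable) for each unordered conflicting access."""
    clocks = {}      # thread -> its vector clock (what it knows of every thread's time)
    released = {}    # lock -> vector clock stored at its last release
    last_write = {}  # variable -> {thread: that thread's clock value at its last write}
    last_read = {}   # variable -> {thread: that thread's clock value at its last read}
    races = []

    def vc(thread):
        return clocks.setdefault(thread, {thread: 1})

    for index, (t, op, target) in enumerate(trace):
        mine = vc(t)
        if op == "fork":      # the child learns everything the parent knows
            clocks[target] = merge(vc(target), mine)
            mine[t] += 1
        elif op == "join":    # the parent learns everything the child knows
            clocks[t] = merge(mine, vc(target))
        elif op == "lock":    # the acquirer learns what the last releaser knew
            clocks[t] = merge(mine, released.get(target, {}))
        elif op == "unlock":
            released[target] = dict(mine)
            mine[t] += 1
        elif op in ("read", "write"):
            earlier = [last_write.get(target, {})]
            if op == "write":  # writes also conflict with earlier reads
                earlier.append(last_read.get(target, {}))
            for seen in earlier:
                for other, time in seen.items():
                    # t's knowledge of `other` must be >= x's knowledge of `other`
                    if other != t and mine.get(other, 0) < time:
                        races.append((index, t, other, target))
            table = last_write if op == "write" else last_read
            table.setdefault(target, {})[t] = mine[t]
    return races

# The fork/join example from the earlier slide, one event per tuple.
example = [
    ("t", "fork", "u"),
    ("t", "lock", "a"), ("t", "write", "x"), ("t", "unlock", "a"),
    ("u", "lock", "a"), ("u", "write", "x"), ("u", "unlock", "a"),
    ("t", "join", "u"), ("t", "write", "x"), ("t", "fork", "v"),
    ("t", "lock", "a"), ("t", "write", "x"), ("t", "unlock", "a"),
    ("v", "lock", "a"), ("v", "write", "x"), ("v", "unlock", "a"),
    ("t", "join", "v"),
]
print(happens_before_check(example))  # []

# Dropping the first join leaves t's unlocked write unordered with u's write.
without_join = [e for e in example if e != ("t", "join", "u")]
print(happens_before_check(without_join))  # [(7, 't', 'u', 'x')]
```

On the full example no race is reported, even though one write holds no lock, because the join orders it after u's write.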
43So, combining the two
- Modern dynamic race detectors use both techniques.
- Lockset analysis will detect any violation of locking discipline.
- This means we will get plenty of false positives when strict locking discipline is not followed.
- Simple: requires less memory and fewer cycles.
44So, combining the two
- Modern dynamic race detectors use both techniques.
- Happens-Before will report actual race conditions that were detected.
- Extremely path sensitive.
- No false positives!
- False negatives can be a problem.
- High memory and CPU overhead.
- As we have seen, happens-before does not merely enforce locking discipline.
- Works when threads are ordered.
45So, combining the two
- Performance-wise
- Use lockset, then switch to happens-before for variables where a race is detected.
- Of course this is dynamic! No guarantee of reoccurrence!
- Similarly, modify detection granularity at runtime.
46Future Research
- Use static tools to limit search space
- We can soundly approximate every location where a race might occur.
- Performance improvements
- Could be used for in-field monitoring.
- Improve chances of HB hitting?
47Model-Checking for Race Conditions
- The Art of Model Checking
- Develop a model of your software system that can be completely explored to find reachable error states.
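As a toy illustration of exploring a model completely, the sketch below enumerates every reachable state of two threads that each run `x = x + 1` as a read followed by a write. The state encoding and the names `successors` and `reachable_states` are assumptions chosen for this example; real model checkers work on far richer models.

```python
from collections import deque

def successors(state):
    """Yield every state reachable by letting one thread take its next step."""
    x, pcs, regs = state  # shared x, per-thread program counter, per-thread register
    for i, pc in enumerate(pcs):
        if pc == 0:    # step 0: read x into the thread's register
            yield (x, pcs[:i] + (1,) + pcs[i + 1:], regs[:i] + (x,) + regs[i + 1:])
        elif pc == 1:  # step 1: write register + 1 back to x
            yield (regs[i] + 1, pcs[:i] + (2,) + pcs[i + 1:], regs)
        # pc == 2: the thread has finished and has no further steps

def reachable_states(initial):
    """Breadth-first search over all thread interleavings."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

initial = (0, (0, 0), (0, 0))
states = reachable_states(initial)
# Final values of x once both threads are done; 1 is the error state (lost update).
finals = sorted({x for (x, pcs, _) in states if all(pc == 2 for pc in pcs)})
print(finals)  # [1, 2]
```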
48Model-Checking for Race Conditions
- Normally, scope of model determines whether or not model checking is feasible.
- Detailed model: model checking takes longer.
- Simple model: must still be detailed enough to capture principles of interest.
49Model-Checking for Race Conditions
- Model-checking concurrent programs is quite a challenge
- Take a large state space
- Add all possible thread interleavings
- Result: very large state space
- Details of specific models would be too much to go into
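To give a rough sense of the growth, assume n threads that each execute k atomic steps with no synchronization at all. The number of distinct interleavings is then (nk)! / (k!)^n. The function name `interleavings` is an illustrative assumption.

```python
from math import factorial

def interleavings(threads, steps):
    """Number of ways to interleave `threads` sequences of `steps` atomic steps each."""
    return factorial(threads * steps) // factorial(steps) ** threads

print(interleavings(2, 2))   # 6
print(interleavings(2, 10))  # 184756
```

Even two threads of ten steps each already give well over a hundred thousand schedules, before the program's own data state is counted.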
50Model-Checking for Race Conditions
- Strategies
- Persistent Sets
- Eliminate pointless thread interleavings
- Sometimes known as partial order reduction
- Contexts
- Represent every other thread with one abstract state machine.
- Like CEGAR, only refine as much as needed.
51Model-Checking for Race Conditions
- Ease of use?
- Annotations
- None
- Expression
- Some tools use model-checking to implement lockset, which does not allow much expression.
- Others allow us to find actual race conditions!
- Scalability
- A question mark: is the state space small enough?
- Previous tools using partial order reduction have been used on large software, but not for races.
52Model-Checking for Race Conditions
- Soundness?
- Yes, model-checking in this manner is sound, as long as it terminates.
- Precision?
- Depends on how your model is used.
- In one model, lockset analysis is used. Tends to be imprecise.
- Another model directly searches for racy states, which makes it very precise, but it doesn't yet work in the presence of aliasing.
53Good 'ole Flow-Based Analysis
- Has been approached in a few ways
- Engineering Approach
- Sacrifice Soundness
- Increase Precision as Much as Possible
- Rank Results
- Use Heuristics and Good Judgement
- Think of PREfix or Coverity
- Rely on Alias Analysis
- Rely on Programmer Annotations
54Good 'ole Flow-Based Analysis
- Engineering Approach
- Start with interprocedural lockset analysis
- Make simple improvements
- use statistical analysis to compute the probability that s ... similar to known locks.
- realize that the first, last or only shared data in a critical section are special.
- if the number of distinct entry locksets in a function exceeds a fixed limit we skip the function
- (Engler 03)
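As a loose illustration of ranking results by a statistical heuristic (this is not the algorithm of the tool cited above), the sketch below scores each variable by how consistently it is accessed under one lock. A variable that is almost always protected but occasionally not is ranked as a more likely bug. The input format and the name `rank_suspicious` are assumptions.

```python
from collections import Counter, defaultdict

def rank_suspicious(accesses):
    """accesses: iterable of (variable, frozenset of locks held at that access).

    Returns (variable, most_common_lock, unprotected_count), most suspicious first.
    """
    by_var = defaultdict(list)
    for var, locks in accesses:
        by_var[var].append(locks)

    report = []
    for var, locksets in by_var.items():
        counts = Counter(lock for held in locksets for lock in held)
        if not counts:
            continue  # never accessed under a lock: assume it is not meant to be shared
        lock, protected = counts.most_common(1)[0]
        unprotected = len(locksets) - protected
        if unprotected:
            # Higher fraction protected = stronger belief that the lock is required.
            report.append((protected / len(locksets), var, lock, unprotected))
    report.sort(reverse=True)
    return [(var, lock, unprotected) for _, var, lock, unprotected in report]

accesses = (
    [("queue", frozenset({"q_lock"}))] * 9 + [("queue", frozenset())]
    + [("flag", frozenset({"f_lock"})), ("flag", frozenset())]
    + [("tmp", frozenset())] * 3
)
print(rank_suspicious(accesses))  # [('queue', 'q_lock', 1), ('flag', 'f_lock', 1)]
```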
55Many Benefits
- Ease of Use?
- Annotations
- None, or a constant number that give immediate precision improvements.
- Expression
- Non-lock-based idioms are 'hard-coded' by heuristics.
- Scalability
- More than any other.
- Linux, FreeBSD, Commercial OS
- 1.8MLOC in 2-14 minutes
56Many Benefits
- Soundness?
- Not sound in a few specific ways.
- Ability to detect some false negatives.
- Precision?
- Fewer false positives than traditional lockset tools.
- 6 when run on Linux 2.5.
- 10s, 100s, 1000s in other static tools on smaller applications.
57Other Flow-Based Tools
- Some Rely on Alias Analysis
- Limited by Current State-of-the-Art
- Still Many False Positives
- May not Scale
- Some Rely on Programmer Annotations to distinguish all the hard cases
- May impose programmer burden
58So, Let's Do a Final Comparison
59Annotations
- Type-Based Systems
- Annotations are a major limiting factor. They can be inferred, but they must be understood by the programmer.
- Dynamic Tools
- Unnecessary
- Model-Checking
- Unnecessary
- Flow-Based Analysis
- Necessary in some form or another
60Expression
- Type-Based Systems
- Limited to strict locking discipline.
- Dynamic Tools
- Thanks to combination of lockset and happens-before, relative freedom.
- Model-Checking
- Can allow great expression (depends on technology).
- Flow-Based Analysis
- Expression can be traded for soundness or annotations.
61Scalability
- Type-Based Systems
- Scalability limited by annotations
- Dynamic Tools
- Getting better, but performance still a major issue (1-3x memory usage, 1.5x CPU usage)
- Model-Checking
- Not extremely scalable. Depends highly on number of processes.
- Flow-Based Analysis
- Has shown the best scalability.
62Soundness
- Type-Based Systems
- Sound
- Dynamic Tools
- Fundamentally unsound, but lockset will catch most possible races in execution.
- Model-Checking
- Also sound. May not terminate.
- Flow-Based Analysis
- Different techniques trade soundness for precision.
63Precision
- Type-Based Systems
- Low precision. Strict MLD.
- Dynamic Tools
- Better precision.
- Model-Checking
- Can be very high. Not complete (undecidability of reachability).
- Flow-Based Analysis
- High precision using an engineering approach.
64Questions