Title: Efficient On-the-Fly Data Race Detection in Multithreaded C++ Programs
1. Efficient On-the-Fly Data Race Detection in Multithreaded C++ Programs
- Eli Pozniansky, Assaf Schuster
2. Table of Contents
- What is a Data-Race?
- Why Are Data-Races Undesired?
- How Can Data-Races Be Prevented?
- Can Data-Races Be Easily Detected?
- Feasible and Apparent Data-Races
- Complexity of Data-Race Detection
3. Table of Contents (Cont.)
- Approaches to Detection of Apparent Data-Races
- Static Methods
- Dynamic Methods
- Post-Mortem Methods
- On-The-Fly Methods
4. Table of Contents (Cont.)
- Closer Look at Dynamic Methods
- DJIT
- Lockset
- Results
- Summary and Conclusions
- Future and Ongoing Work
- References
5. What is a Data-Race?
- Two concurrent accesses to a shared location, at least one of them for writing.
- Indicative of a bug.

    Thread 1        Thread 2
    X++             T = Y
    Z = 2           T = X
6. Why Are Data-Races Undesired?
- Programs that contain data-races usually demonstrate unexpected and even non-deterministic behavior.
- The outcome might depend on the specific execution order (a.k.a. thread interleaving).
- Re-running the program may not always produce the same results.
- Thus, it is hard to debug and hard to write correct programs.
7. Why Are Data-Races Undesired? - Example
- (Thread 1 executes X = 0 and X++; Thread 2 executes T = X.)
- First interleaving:
  1. X = 0
  2. T = X
  3. X++
- Second interleaving:
  1. X = 0
  2. X++
  3. T = X
- Is T 0 or 1?
8. How Can Data-Races Be Prevented? Explicit Synchronization
- Idea: in order to prevent undesired concurrent accesses to shared locations, we must explicitly synchronize between threads.
- The means for explicit synchronization are:
- Locks, Mutexes and Critical Sections
- Barriers
- Binary Semaphores and Counting Semaphores
- Monitors
- Single-Writer/Multiple-Readers (SWMR) Locks
- Others
9. Synchronization - Bad Bank Account Example
- Thread 1:
    Deposit( amount ) {
      balance += amount;
    }
- Thread 2:
    Withdraw( amount ) {
      if (balance < amount)
        print( "Error" );
      else
        balance -= amount;
    }
- Deposit and Withdraw are not atomic!!!
- What is the final balance after a series of concurrent deposits and withdrawals?
10. Synchronization - Good Bank Account Example
- Thread 1:
    Deposit( amount ) {
      Lock( m );
      balance += amount;
      Unlock( m );
    }
- Thread 2:
    Withdraw( amount ) {
      Lock( m );
      if (balance < amount)
        print( "Error" );
      else
        balance -= amount;
      Unlock( m );
    }
- Since the two critical sections can never execute concurrently, this version exhibits no data-races.
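- A minimal compilable sketch of the same idea in modern C++, using std::mutex in place of the slide's Lock/Unlock primitive (the std::mutex choice is an assumption for illustration):

    #include <cstdio>
    #include <mutex>

    static double balance = 0.0;
    static std::mutex m;                        // protects balance

    void Deposit(double amount) {
        std::lock_guard<std::mutex> guard(m);   // Lock(m) ... Unlock(m)
        balance += amount;
    }

    void Withdraw(double amount) {
        std::lock_guard<std::mutex> guard(m);
        if (balance < amount)
            std::printf("Error\n");
        else
            balance -= amount;
    }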
11. Is This Sufficient?
- Yes!
- No!
- It is programmer dependent:
- Correctness: the programmer may forget to synchronize.
  - Need tools to detect data-races.
- Efficiency: to achieve correctness, the programmer may overdo it, and synchronization is expensive.
  - Need tools to remove excessive synchronization.
12. Can Data-Races Be Easily Detected? No!
- Unfortunately, deciding whether a given program contains potential data-races is computationally hard!!!
- There are a lot of execution orders: for t threads of n instructions each, the number of possible orders is about t^(nt).
- In addition to all the different schedulings, all possible inputs should be tested as well.
- Moreover, inserting detection code into a program can change its execution schedule enough to make all errors disappear.
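- For reference, the exact count of interleavings (assuming only that each thread's program order is preserved) is the multinomial coefficient below; the small worked case shows how quickly it grows:

    \[
      \#\text{interleavings} \;=\; \binom{tn}{\,n,\;n,\;\dots,\;n\,} \;=\; \frac{(tn)!}{(n!)^{t}}
    \]
    e.g. for t = 2 threads of n = 3 instructions each: 6!/(3! 3!) = 720/36 = 20 interleavings already.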
13. Feasible Data-Races
- Feasible data-races: races that are based on the possible behavior of the program (i.e. the semantics of the program's computation).
- These are the actual (!) data-races that can possibly happen in any specific execution.
- Locating feasible data-races requires fully analyzing the program's semantics to determine whether the execution could have allowed a and b (accesses to the same shared variable) to execute concurrently.
14. Apparent Data-Races
- Apparent data-races: approximations (!) of feasible data-races that are based only on the behavior of the explicit synchronization performed by some feasible execution (and not on the semantics of the program's computation, i.e. ignoring all conditional statements).
- Important, since data-races are usually a result of improper synchronization. Thus they are easier to detect, but less accurate.
15. Apparent Data-Races (Cont.)
- For example, a and b, accesses to the same shared variable in some execution, are said to be ordered if there is a chain of corresponding explicit synchronization events between them.
- Similarly, a and b are said to have potentially executed concurrently if no explicit synchronization prevented them from doing so.
16. Feasible vs. Apparent - Example 1
- Initially F = false.

    Thread 1          Thread 2
    X++
    F = true
                      if (F == true)
                        X--

- Apparent data-races in the execution above: both 1 and 2.
- Feasible data-races: only 1!!! No feasible execution exists in which X-- is performed before X++ (F is false at start), so the race on X can never materialize.
- Protecting F only will protect X as well.
17. Feasible vs. Apparent - Example 2
- Initially F = false.

    Thread 1          Thread 2
    X++               Lock( m )
    Lock( m )         T = F
    F = true          Unlock( m )
    Unlock( m )       if (T == true)
                        X--

- No feasible or apparent data-races exist under any execution order!!!
- F is protected by a lock, and X++ and X-- are always ordered and properly synchronized:
- Either there is a synchronization chain Unlock(m)-Lock(m) between X++ and X--, or only X++ executes.
18. Complexity of Data-Race Detection
- Exactly locating the feasible data-races is an NP-hard problem. Thus, the apparent races, which are simpler to locate, must be detected for debugging.
- Fortunately, apparent data-races exist if and only if at least one feasible data-race exists somewhere in the execution.
- Yet, the problem of exhaustively locating all apparent data-races still remains NP-hard.
19. Why is Data-Race Detection NP-Hard?
- How can we know whether, in a program P, two accesses a and b to the same shared variable are concurrent?
- Intuitively, we must check all execution orders of P and see. If we discover an execution order in which a and b are concurrent, we can report a data-race and stop. Otherwise we should continue checking.
20. Approaches to Detection of Apparent Data-Races - Static
- There are two main approaches to the detection of apparent data-races (sometimes a combination of both is used):
- Static methods perform a compile-time analysis of the code.
  - Too conservative: they cannot know or understand the semantics of the program, which results in an excessive number of false alarms that hide the real data-races.
  - Test the program globally: they see the full code of the tested program and can warn about all possible errors in all possible executions.
21. Approaches to Detection of Apparent Data-Races - Dynamic
- Dynamic methods use a tracing mechanism to detect whether a particular execution of the program actually exhibited data-races.
  - Detect only those apparent data-races that occur during a feasible execution.
  - Test the program locally: consider only one specific execution path of the program each time.
- Post-mortem methods: after the execution terminates, analyze the trace of the run and warn about possible data-races that were found.
- On-the-fly methods: buffer partial trace information in memory, analyze it, and detect races as they occur.
22. MultiRace Approach
- On-the-fly detection of apparent data-races
- Two detection algorithms (improved versions):
  - Lockset [Savage, Burrows, Nelson, Sobalvarro, Anderson 1997]
  - Djit [Itzkovitz, Schuster, Zeev-Ben-Mordechai 1999]
- Correct even for weak memory systems
- Flexible detection granularity:
  - Variables and objects
  - Especially suited for OO programming languages
- Source-code (C++) instrumentation + memory mappings
- Transparent
- Low overhead
23. Where is Waldo?

    #define N 100
    Type* g_stack = new Type[N];
    int   g_counter = 0;
    Lock  g_lock;

    void push( Type& obj ) { lock(g_lock); ... unlock(g_lock); }
    void pop( Type& obj )  { lock(g_lock); ... unlock(g_lock); }

    void popAll( ) {
      lock(g_lock);
      delete[] g_stack;
      g_stack = new Type[N];
      g_counter = 0;
      unlock(g_lock);
    }

    int find( Type& obj, int number ) {
      lock(g_lock);
      for (int i = 0; i < number; i++)
        if (obj == g_stack[i]) break;   // Found!!!
      if (i == number) i = -1;          // Not found - return -1 to caller
      ...
    }
24. Can You Find the Race?
- The code is the same as on the previous slide; the race is between a write and a read of the same shared location.
- A similar problem was found in java.util.Vector.
25. Apparent Data-Races
- Based only on the behavior of the explicit synchronization, not on the program semantics
- Easier to locate
- Less accurate
- Exist iff a real (feasible) data-race exists
- Detection is still NP-hard
26. Lamport's Happens-Before Partial Order
- The happens-before partial order, denoted →hb, is defined for access events (reads, writes, releases and acquires) that happen in a specific execution, as follows:
- Program order: if a and b are events performed by the same thread, with a preceding b in program order, then a →hb b.
- Release and acquire: let a be a release and b an acquire. If a and b take part in the same synchronization event, then a →hb b.
- Transitivity: if a →hb b and b →hb c, then a →hb c.
- Shared accesses a and b are concurrent if neither a →hb b nor b →hb a holds.
27. Djit [Itzkovitz et al. 1999] - Apparent Data-Races
- Based on Lamport's happens-before partial order.
- a and b are concurrent if neither a →hb b nor b →hb a; then they form an apparent data-race.
- Otherwise, they are synchronized.
- Djit's basic idea: check each access performed against all previously performed accesses.
28. Djit - Local Time Frames (LTF)
- The execution of each thread is split into a sequence of time frames.
- A new time frame starts on each unlock.
- For every access there is a timestamp: a vector of the LTFs known to the thread at the moment the access takes place.
29. Djit - Local Time Frames (LTF)
- A vector ltf_t[.] for each thread t:
  - ltf_t[t] is the LTF of thread t.
  - ltf_t[u] stores the latest LTF of u known to t.
- If u is an acquirer of t's unlock:
    for k = 0 to maxthreads-1:
      ltf_u[k] = max( ltf_u[k], ltf_t[k] )
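- A minimal sketch of these vectors and their transfer on unlock/lock; MAX_THREADS and the function names are assumptions made for illustration, not the tool's actual code:

    #include <algorithm>
    #include <cstddef>

    constexpr std::size_t MAX_THREADS = 16;     // illustrative fixed bound

    struct ThreadClock {
        // ltf[u] = latest local time frame of thread u known to this thread
        unsigned ltf[MAX_THREADS] = {0};
    };

    ThreadClock clocks[MAX_THREADS];

    // Thread t releases a lock: its own time frame advances, and a snapshot
    // of its vector is attached to the lock (snapshot handling not shown).
    void on_unlock(std::size_t t) {
        ++clocks[t].ltf[t];
    }

    // Thread u acquires a lock last released by thread t: u merges in
    // everything t knew at the release (element-wise maximum).
    void on_lock(std::size_t u, const ThreadClock& snapshot_from_t) {
        for (std::size_t k = 0; k < MAX_THREADS; ++k)
            clocks[u].ltf[k] = std::max(clocks[u].ltf[k], snapshot_from_t.ltf[k]);
    }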
30. Djit - Vector Time Frames - Example
- [Figure: two threads exchanging vector time frames. Accesses such as write x, read y and read z are interleaved with lock/unlock operations on m1 and m2; the vectors shown next to the accesses, e.g. (2 1 1) and (2 2 1), illustrate how a thread's local time frame propagates to another thread through each unlock-lock pair.]
31. Realizing the →hb Relation
- Let a be an access by thread ta in time frame Ta, and b an access by thread tb (tb ≠ ta).
- a →hb b iff, at the moment b occurs, Ta < ltf_tb[ta].
- That is, there exists a chain of releases and corresponding acquires through which ta's local time frame propagated to tb.
32. Djit - Checking Concurrency
- P(a,b) ≜ ( a.type = write ∨ b.type = write ) ∧ ( a.ltf ≥ b.timestamp[a.thread_id] )
- a was logged earlier than b.
- P returns TRUE iff a and b are racing.
- Problem: too much logging, too many checks.
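- A sketch of this check as code, with illustrative field names: the two logged accesses race if at least one is a write and a's time frame had not yet reached b's thread through any release/acquire chain:

    #include <vector>

    struct Access {
        bool is_write;
        unsigned thread_id;
        unsigned ltf;                       // local time frame of the access
        std::vector<unsigned> timestamp;    // LTF vector known when it happened
    };

    // a was logged earlier than b; returns true iff a and b form an apparent race.
    bool racing(const Access& a, const Access& b) {
        return (a.is_write || b.is_write) &&
               (a.ltf >= b.timestamp[a.thread_id]);
    }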
33. Djit - Which Accesses to Check?
- Let a be an access in thread t1, and b and c accesses in thread t2 in the same LTF, with b preceding c in program order.
- If a and b are synchronized, then a and c are synchronized as well.
- It is therefore sufficient to record only the first read access and the first write access to a variable in each LTF; later accesses in the same frame need no logging.
34. Djit - Which LTFs to Check?
- a occurs in t1; b and c previously occurred in t2.
- If a is synchronized with c, then it must also be synchronized with b.
- It is therefore sufficient to check a current access against the most recent accesses in each of the other threads.
35. Djit - Access History
- For every variable v, and for each of the threads:
  - The last LTF in which the thread read from v
  - The last LTF in which the thread wrote to v
- On each first read and first write to v in an LTF, the thread updates the access history of v:
  - If the access to v is a read, the thread checks all recent writes to v by other threads.
  - If the access is a write, the thread checks all recent reads as well as all recent writes to v by other threads.
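- A sketch of such a per-variable record and the check performed on a first read; ordered() and report_race() are assumed hooks, and the naming follows the earlier sketches rather than the tool's API (the write case is symmetric and checks recent reads too):

    #include <cstddef>

    constexpr std::size_t MAX_THREADS = 16;           // illustrative fixed bound

    struct AccessHistory {                             // kept per shared variable v
        unsigned last_read_ltf[MAX_THREADS]  = {0};    // 0 = never accessed
        unsigned last_write_ltf[MAX_THREADS] = {0};
    };

    // True iff the access of thread u in time frame ltf_u happens-before the
    // current access of thread t (i.e. ltf_u < the latest LTF of u known to t).
    bool ordered(std::size_t u, unsigned ltf_u, std::size_t t);

    void report_race(std::size_t other_thread);        // assumed reporting hook

    // First read of v by thread t in its current time frame cur_ltf:
    // check against the most recent write of every other thread, then log it.
    void on_first_read(AccessHistory& v, std::size_t t, unsigned cur_ltf) {
        for (std::size_t u = 0; u < MAX_THREADS; ++u)
            if (u != t && v.last_write_ltf[u] != 0 &&
                !ordered(u, v.last_write_ltf[u], t))
                report_race(u);
        v.last_read_ltf[t] = cur_ltf;
    }
    // A first write is handled analogously, checking recent reads as well.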
36. Djit - Pros and Cons
- (+) No false alarms
- (+) No missed races (in a given scheduling)
- (-) Very sensitive to differences in scheduling
- (-) Requires an enormous number of runs; yet cannot prove the tested program is race-free
- Can be extended to support other synchronization primitives, like barriers and counting semaphores
- Correct on relaxed memory systems [Adve & Hill, 1990]: data-race-free-1
37. Lockset - The Basic Algorithm
- C(v): the set of locks that protected all accesses to v so far.
- locks_held(t): the set of locks currently acquired by thread t.
- Algorithm:
  - For each v, initialize C(v) to the set of all possible locks.
  - On each access to v by thread t:
    - lh_v ← locks_held(t)
    - if it is a read, then lh_v ← lh_v ∪ {readers_lock}
    - C(v) ← C(v) ∩ lh_v
    - if C(v) = Ø, issue a warning
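- A sketch of this refinement in C++; the virtual readers_lock value and the flag standing in for the initial "all locks" set are assumptions made for illustration:

    #include <cstdio>
    #include <set>

    using LockId  = int;
    using LockSet = std::set<LockId>;

    const LockId READERS_LOCK = -1;    // virtual lock added on reads (SWMR idea)

    struct Candidates {                // C(v)
        bool    all = true;            // initially "the set of all possible locks"
        LockSet set;
    };

    void on_access(Candidates& Cv, const LockSet& locks_held, bool is_read) {
        LockSet lh = locks_held;       // lh_v <- locks_held(t)
        if (is_read)
            lh.insert(READERS_LOCK);   // lh_v <- lh_v U {readers_lock}

        if (Cv.all) {                  // first access: all-locks ∩ lh = lh
            Cv.all = false;
            Cv.set = lh;
        } else {                       // C(v) <- C(v) ∩ lh_v
            LockSet inter;
            for (LockId l : Cv.set)
                if (lh.count(l))
                    inter.insert(l);
            Cv.set.swap(inter);
        }
        if (Cv.set.empty())
            std::printf("Warning: locking discipline for v is violated\n");
    }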
38. Lockset - Example

    lock( m1 )
    read v          C(v) = { m1, readers_lock }
    unlock( m1 )
    lock( m2 )
    write v         C(v) = { m1, readers_lock } ∩ { m2 } = Ø
    unlock( m2 )

- Warning: the locking discipline for v is violated!!!
39. Lockset [Savage et al. 1997]
- Locking discipline: every shared location is consistently protected by a lock.
- Lockset detects violations of this locking discipline and warns that the discipline for v is violated.
40. Lockset vs. Djit
- Access 1 happens-before access 2 (1 →hb 2), yet there might be a data-race on y under a different scheduling, since the locking discipline is violated.
41. Lockset - Which Accesses to Check?
- If a and b are in the same thread and the same time frame, with a preceding b, then Locks_a(v) ⊆ Locks_b(v), where Locks_u(v) is the set of locks held during access u to v.
- Only first accesses need be checked in every time frame.
- Lockset can therefore use the same logging (access history) as Djit.
42. Lockset - Pros and Cons
- (+) Less sensitive to scheduling
- (+) Detects a superset of all apparently raced locations in an execution of a program: races cannot be missed
- (-) Lots of false alarms
- (-) Still dependent on scheduling: cannot prove the tested program is race-free
43. Combining Djit and Lockset
- Lockset can detect suspected races in more execution orders.
- Djit can filter out the spurious warnings reported by Lockset.
- Lockset can help reduce the number of checks performed by Djit: if C(v) is not empty yet, Djit need not check v for races (see the sketch below).
- The implementation overhead comes mainly from the access logging mechanism, which can be shared by the two algorithms.
44. Disabling Detection
- Obviously, Lockset can report false alarms.
- Djit, too, detects apparent races that are not necessarily feasible races:
  - Intentional races
  - Unrefined granularity
  - Private synchronization
- Detection can be disabled through the use of source-code annotations.
45. Implementing Access Logging
- In order to record only the first accesses (reads and writes) to shared locations in each of the time frames, we use the concept of views.
- A view is a region in virtual memory.
- Each view has its own protection: NoAccess / ReadOnly / ReadWrite.
- Each shared object in physical memory can be accessed through each of the three views.
- This helps to distinguish between reads and writes.
- It enables the realization of the dynamic detection unit and avoids the false sharing problem.
46. Implementing Access Logging - Recording First LTF Accesses
- An access attempt with wrong permissions generates a fault.
- The fault handler activates the logging and the detection mechanisms, and switches views.
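- As a rough illustration of the mechanism only (the tool itself uses source-code instrumentation and Windows memory mappings; this sketch uses POSIX mprotect and a SIGSEGV handler purely as an analogue, and log_first_access is a hypothetical hook):

    #include <cstddef>
    #include <signal.h>
    #include <sys/mman.h>

    static char*             region;        // page-aligned shared region
    static const std::size_t PAGE = 4096;

    // On unlock a new time frame starts, so first accesses must fault again:
    // drop all permissions on the region (the NoAccess view).
    void start_new_time_frame() {
        mprotect(region, PAGE, PROT_NONE);
    }

    // Fault handler: log the faulting access as the first one of this time
    // frame, then raise permissions so later accesses run at full speed.
    // (The real mechanism distinguishes reads from writes by switching first
    // to a ReadOnly view and only then to ReadWrite.)
    static void segv_handler(int, siginfo_t* info, void*) {
        // log_first_access(info->si_addr);  // hypothetical logging hook
        (void)info;
        mprotect(region, PAGE, PROT_READ | PROT_WRITE);
    }

    void install_handler() {
        struct sigaction sa = {};
        sa.sa_sigaction = segv_handler;
        sa.sa_flags     = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, nullptr);
    }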
47. Swizzling Between Views
- [Figure: a thread's accesses to x between unlock operations. The first read of x triggers a read fault and the first write of x triggers a write fault, each logged by the handler; after unlock(m) a new time frame begins, so the next write of x faults again.]
48. Minipages and Dynamic Granularity of Detection
- A minipage is a shared location that can be accessed using the approach of views.
- We detect races on minipages and not on a fixed number of bytes.
- Each minipage is associated with the access history of Djit and the Lockset state.
- The size of a minipage can vary.
49. Detection Granularity
- A minipage (= detection unit) can contain:
  - Objects of primitive types: char, int, double, etc.
  - Objects of complex types: classes and structures
  - Entire arrays of complex or primitive types
- An array can be placed on a single minipage or split across several minipages; the array still occupies contiguous addresses. (A sketch of the per-minipage record follows.)
50. Playing with Detection Granularity to Reduce Overhead
- Larger minipages → reduced overhead (fewer faults).
- A minipage should be refined into smaller minipages when suspicious alarms occur.
  - Replay technology can help (if available).
- When the suspicion is resolved, regroup.
  - Detection may be disabled on the accesses involved.
51. Detection Granularity
52. Overheads
- The overheads are steady for 1-4 threads: we are scalable in the number of CPUs.
- The overheads increase for a high number of threads.
- The number of page faults (both read and write) increases linearly with the number of threads.
- In fact, any on-the-fly tool for data-race detection will not scale in the number of threads when the number of CPUs is fixed.
53. Overheads
- The testing platform:
  - 4-way IBM Netfinity server, 550 MHz
  - 2 GB RAM
  - Microsoft Windows NT
54. Overheads
55. Benchmark Overheads (4-way IBM Netfinity server, 550 MHz, Win-NT)
56. Overhead Breakdown
- Numbers above the bars are write/read faults.
- Most of the overhead comes from page faults.
- The overhead due to the detection algorithms is small.
57. Breakdowns of Overheads
58. Reporting Races in MultiRace
59. Summary: MultiRace is
- Transparent
- Supports two-way and global synchronization primitives: locks and barriers
- Detects races that actually occurred (Djit)
- Usually does not miss races that could occur under a different scheduling (Lockset)
- Correct for weak memory models
- Scalable
- Exhibits flexible detection granularity
60. Conclusions
- MultiRace makes it easier for programmers to trust their programs.
- No need to add synchronization "just in case".
- In case of doubt, MultiRace should be activated each time the program executes.
61. Future/Ongoing Work
- Implement an instrumenting pre-compiler:
  - Higher transparency
  - Higher scalability
- Automatic dynamic granularity adaptation
- Integrate with a scheduling generator
- Integrate with record/replay
- Integrate with the compiler/debugger:
  - May get rid of faults and views
  - Optimizations through static analysis
- Implement extensions for semaphores and other synchronization primitives
- Etc.
62. References
- T. Brecht and H. Sandhu. The Region Trap Library: Handling Traps on Application-Defined Regions of Memory. In USENIX Annual Technical Conference, Monterey, CA, June 1999.
- A. Itzkovitz, A. Schuster, and O. Zeev-Ben-Mordechai. Towards Integration of Data Race Detection in DSM Systems. Journal of Parallel and Distributed Computing (JPDC), 59(2), pp. 180-203, Nov. 1999.
- L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, 21(7), pp. 558-565, Jul. 1978.
- F. Mattern. Virtual Time and Global States of Distributed Systems. In Parallel and Distributed Algorithms, pp. 215-226, 1989.
63. References (Cont.)
- R. H. B. Netzer and B. P. Miller. What Are Race Conditions? Some Issues and Formalizations. ACM Letters on Programming Languages and Systems, 1(1), pp. 74-88, Mar. 1992.
- R. H. B. Netzer and B. P. Miller. On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions. In 1990 International Conference on Parallel Processing, Vol. 2, pp. 93-97, Aug. 1990.
- R. H. B. Netzer and B. P. Miller. Detecting Data Races in Parallel Program Executions. In Advances in Languages and Compilers for Parallel Processing, MIT Press, 1991, pp. 109-129.
64. References (Cont.)
- S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. E. Anderson. Eraser: A Dynamic Data Race Detector for Multithreaded Programs. ACM Transactions on Computer Systems, 15(4), pp. 391-411, 1997.
- E. Pozniansky. Efficient On-the-Fly Data Race Detection in Multithreaded C++ Programs. Research Thesis.
- O. Zeev-Ben-Mordehai. Efficient Integration of On-The-Fly Data Race Detection in Distributed Shared Memory and Symmetric Multiprocessor Environments. Research Thesis, May 2001.
65. The End