Title: Efficient On-the-Fly Data Race Detection in Multithreaded C++ Programs
1. Efficient On-the-Fly Data Race Detection in Multithreaded C++ Programs
- Eli Pozniansky, Assaf Schuster
2. Table of Contents
- What is a Data-Race?
- Why Are Data-Races Undesired?
- How Can Data-Races Be Prevented?
- Can Data-Races Be Easily Detected?
- Feasible and Apparent Data-Races
- Complexity of Data-Race Detection
3. Table of Contents (Cont.)
- Approaches to Detection of Apparent Data-Races
- Static Methods
- Dynamic Methods
- Post-Mortem Methods
- On-The-Fly Methods
4. Table of Contents (Cont.)
- Closer Look at Dynamic Methods
- DJIT
- Lockset
- Results
- Summary and Conclusions
- Future and Ongoing Work
- References
5. What is a Data-Race?
- Two concurrent accesses to a shared location, at least one of them for writing.
- Indicative of a bug.

    Thread 1        Thread 2
    X++             T = Y
    Z = 2           T = X
6. Why Are Data-Races Undesired?
- Programs that contain data-races usually demonstrate unexpected and even non-deterministic behavior.
- The outcome might depend on the specific execution order (a.k.a. thread interleaving).
- Re-running the program may not always produce the same results.
- Thus, it is hard to debug and hard to write correct programs.
7. Why Are Data-Races Undesired? - Example
- (Thread 1 executes X = 0 and X++; Thread 2 executes T = X.)
- First interleaving:
  1. X = 0
  2. T = X
  3. X++
- Second interleaving:
  1. X = 0
  2. X++
  3. T = X
- Is T 0 or 1?
8. How Can Data-Races Be Prevented? Explicit Synchronization
- Idea: in order to prevent undesired concurrent accesses to shared locations, we must explicitly synchronize between threads.
- The means for explicit synchronization are:
- Locks, Mutexes and Critical Sections
- Barriers
- Binary Semaphores and Counting Semaphores
- Monitors
- Single-Writer/Multiple-Readers (SWMR) Locks
- Others
9. Synchronization - Bad Bank Account Example
- Thread 1:
    Deposit( amount ) {
      balance += amount;
    }
- Thread 2:
    Withdraw( amount ) {
      if (balance < amount)
        print( "Error" );
      else
        balance -= amount;
    }
- Deposit and Withdraw are not atomic!!!
- What is the final balance after a series of concurrent deposits and withdrawals?
10. Synchronization - Good Bank Account Example
- Thread 1:
    Deposit( amount ) {
      Lock( m );
      balance += amount;
      Unlock( m );
    }
- Thread 2:
    Withdraw( amount ) {
      Lock( m );
      if (balance < amount)
        print( "Error" );
      else
        balance -= amount;
      Unlock( m );
    }
- Since the two critical sections can never execute concurrently, this version exhibits no data-races.
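- A minimal compilable sketch of the same idea in modern C++, using std::mutex in place of the slide's Lock/Unlock primitive (the std::mutex choice is an assumption for illustration):

    #include <cstdio>
    #include <mutex>

    static double balance = 0.0;
    static std::mutex m;                        // protects balance

    void Deposit(double amount) {
        std::lock_guard<std::mutex> guard(m);   // Lock(m) ... Unlock(m)
        balance += amount;
    }

    void Withdraw(double amount) {
        std::lock_guard<std::mutex> guard(m);
        if (balance < amount)
            std::printf("Error\n");
        else
            balance -= amount;
    }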
11. Is This Sufficient?
- Yes!
- No!
- It is programmer dependent:
- Correctness: the programmer may forget to synchronize.
  - Need tools to detect data-races.
- Efficiency: to achieve correctness, the programmer may overdo it, and synchronization is expensive.
  - Need tools to remove excessive synchronization.
12. Can Data-Races Be Easily Detected? No!
- Unfortunately, deciding whether a given program contains potential data-races is computationally hard!!!
- There are a lot of execution orders: for t threads of n instructions each, the number of possible orders is about t^(nt).
- In addition to all the different schedulings, all possible inputs should be tested as well.
- Moreover, inserting detection code into a program can change its execution schedule enough to make all errors disappear.
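- For reference, the exact count of interleavings (assuming only that each thread's program order is preserved) is the multinomial coefficient below; the small worked case shows how quickly it grows:

    \[
      \#\text{interleavings} \;=\; \binom{tn}{\,n,\;n,\;\dots,\;n\,} \;=\; \frac{(tn)!}{(n!)^{t}}
    \]
    e.g. for t = 2 threads of n = 3 instructions each: 6!/(3! 3!) = 720/36 = 20 interleavings already.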
13. Feasible Data-Races
- Feasible data-races: races that are based on the possible behavior of the program (i.e. the semantics of the program's computation).
- These are the actual (!) data-races that can possibly happen in any specific execution.
- Locating feasible data-races requires fully analyzing the program's semantics to determine whether the execution could have allowed a and b (accesses to the same shared variable) to execute concurrently.
14. Apparent Data-Races
- Apparent data-races: approximations (!) of feasible data-races that are based only on the behavior of the explicit synchronization performed by some feasible execution (and not on the semantics of the program's computation, i.e. ignoring all conditional statements).
- Important, since data-races are usually a result of improper synchronization. Thus they are easier to detect, but less accurate.
15. Apparent Data-Races (Cont.)
- For example, a and b, accesses to the same shared variable in some execution, are said to be ordered if there is a chain of corresponding explicit synchronization events between them.
- Similarly, a and b are said to have potentially executed concurrently if no explicit synchronization prevented them from doing so.
16. Feasible vs. Apparent - Example 1
- Initially F = false.

    Thread 1          Thread 2
    X++
    F = true
                      if (F == true)
                        X--

- Apparent data-races in the execution above: both 1 and 2.
- Feasible data-races: only 1!!! No feasible execution exists in which X-- is performed before X++ (F is false at start), so the race on X can never materialize.
- Protecting F only will protect X as well.
17. Feasible vs. Apparent - Example 2
- Initially F = false.

    Thread 1          Thread 2
    X++               Lock( m )
    Lock( m )         T = F
    F = true          Unlock( m )
    Unlock( m )       if (T == true)
                        X--

- No feasible or apparent data-races exist under any execution order!!!
- F is protected by a lock, and X++ and X-- are always ordered and properly synchronized:
- Either there is a synchronization chain Unlock(m)-Lock(m) between X++ and X--, or only X++ executes.
18. Complexity of Data-Race Detection
- Exactly locating the feasible data-races is an NP-hard problem. Thus, the apparent races, which are simpler to locate, must be detected for debugging.
- Fortunately, apparent data-races exist if and only if at least one feasible data-race exists somewhere in the execution.
- Yet, the problem of exhaustively locating all apparent data-races still remains NP-hard.
19. Why is Data-Race Detection NP-Hard?
- How can we know whether, in a program P, two accesses a and b to the same shared variable are concurrent?
- Intuitively, we must check all execution orders of P and see. If we discover an execution order in which a and b are concurrent, we can report a data-race and stop. Otherwise we should continue checking.
20. Approaches to Detection of Apparent Data-Races - Static
- There are two main approaches to the detection of apparent data-races (sometimes a combination of both is used):
- Static methods perform a compile-time analysis of the code.
  - Too conservative: they cannot know or understand the semantics of the program, which results in an excessive number of false alarms that hide the real data-races.
  - Test the program globally: they see the full code of the tested program and can warn about all possible errors in all possible executions.
21. Approaches to Detection of Apparent Data-Races - Dynamic
- Dynamic methods use a tracing mechanism to detect whether a particular execution of the program actually exhibited data-races.
  - Detect only those apparent data-races that occur during a feasible execution.
  - Test the program locally: consider only one specific execution path of the program each time.
- Post-mortem methods: after the execution terminates, analyze the trace of the run and warn about possible data-races that were found.
- On-the-fly methods: buffer partial trace information in memory, analyze it, and detect races as they occur.
22. MultiRace Approach
- On-the-fly detection of apparent data-races
- Two detection algorithms (improved versions):
  - Lockset [Savage, Burrows, Nelson, Sobalvarro, Anderson 1997]
  - Djit [Itzkovitz, Schuster, Zeev-Ben-Mordechai 1999]
- Correct even for weak memory systems
- Flexible detection granularity:
  - Variables and objects
  - Especially suited for OO programming languages
- Source-code (C++) instrumentation + memory mappings
- Transparent
- Low overhead
23. Where is Waldo?

    #define N 100
    Type* g_stack = new Type[N];
    int   g_counter = 0;
    Lock  g_lock;

    void push( Type& obj ) { lock(g_lock); ... unlock(g_lock); }
    void pop( Type& obj )  { lock(g_lock); ... unlock(g_lock); }

    void popAll( ) {
      lock(g_lock);
      delete[] g_stack;
      g_stack = new Type[N];
      g_counter = 0;
      unlock(g_lock);
    }

    int find( Type& obj, int number ) {
      lock(g_lock);
      for (int i = 0; i < number; i++)
        if (obj == g_stack[i]) break;   // Found!!!
      if (i == number) i = -1;          // Not found - return -1 to caller
      ...
    }
24. Can You Find the Race?
- The code is the same as on the previous slide; the race is between a write and a read of the same shared location.
- A similar problem was found in java.util.Vector.
25. Apparent Data-Races
- Based only on the behavior of the explicit synchronization, not on the program semantics
- Easier to locate
- Less accurate
- Exist iff a real (feasible) data-race exists
- Detection is still NP-hard
26. Lamport's Happens-Before Partial Order
- The happens-before partial order, denoted →hb, is defined for access events (reads, writes, releases and acquires) that happen in a specific execution, as follows:
- Program order: if a and b are events performed by the same thread, with a preceding b in program order, then a →hb b.
- Release and acquire: let a be a release and b an acquire. If a and b take part in the same synchronization event, then a →hb b.
- Transitivity: if a →hb b and b →hb c, then a →hb c.
- Shared accesses a and b are concurrent if neither a →hb b nor b →hb a holds.
27. Djit [Itzkovitz et al. 1999] - Apparent Data-Races
- Based on Lamport's happens-before partial order.
- a and b are concurrent if neither a →hb b nor b →hb a; then they form an apparent data-race.
- Otherwise, they are synchronized.
- Djit's basic idea: check each access performed against all previously performed accesses.
28. Djit - Local Time Frames (LTF)
- The execution of each thread is split into a sequence of time frames.
- A new time frame starts on each unlock.
- For every access there is a timestamp: a vector of the LTFs known to the thread at the moment the access takes place.
29. Djit - Local Time Frames (LTF)
- A vector ltf_t[.] for each thread t:
  - ltf_t[t] is the LTF of thread t.
  - ltf_t[u] stores the latest LTF of u known to t.
- If u is an acquirer of t's unlock:
    for k = 0 to maxthreads-1:
      ltf_u[k] = max( ltf_u[k], ltf_t[k] )
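- A minimal sketch of these vectors and their transfer on unlock/lock; MAX_THREADS and the function names are assumptions made for illustration, not the tool's actual code:

    #include <algorithm>
    #include <cstddef>

    constexpr std::size_t MAX_THREADS = 16;     // illustrative fixed bound

    struct ThreadClock {
        // ltf[u] = latest local time frame of thread u known to this thread
        unsigned ltf[MAX_THREADS] = {0};
    };

    ThreadClock clocks[MAX_THREADS];

    // Thread t releases a lock: its own time frame advances, and a snapshot
    // of its vector is attached to the lock (snapshot handling not shown).
    void on_unlock(std::size_t t) {
        ++clocks[t].ltf[t];
    }

    // Thread u acquires a lock last released by thread t: u merges in
    // everything t knew at the release (element-wise maximum).
    void on_lock(std::size_t u, const ThreadClock& snapshot_from_t) {
        for (std::size_t k = 0; k < MAX_THREADS; ++k)
            clocks[u].ltf[k] = std::max(clocks[u].ltf[k], snapshot_from_t.ltf[k]);
    }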
30. Djit - Vector Time Frames - Example
- [Figure: two threads exchanging vector time frames. Accesses such as write x, read y and read z are interleaved with lock/unlock operations on m1 and m2; the vectors shown next to the accesses, e.g. (2 1 1) and (2 2 1), illustrate how a thread's local time frame propagates to another thread through each unlock-lock pair.]
31. Realizing the →hb Relation
- Let a be an access by thread ta in time frame Ta, and b an access by thread tb (tb ≠ ta).
- a →hb b iff, at the moment b occurs, Ta < ltf_tb[ta].
- That is, there exists a chain of releases and corresponding acquires through which ta's local time frame propagated to tb.
32. Djit - Checking Concurrency
- P(a,b) ≜ ( a.type = write ∨ b.type = write ) ∧ ( a.ltf ≥ b.timestamp[a.thread_id] )
- a was logged earlier than b.
- P returns TRUE iff a and b are racing.
- Problem: too much logging, too many checks.
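- A sketch of this check as code, with illustrative field names: the two logged accesses race if at least one is a write and a's time frame had not yet reached b's thread through any release/acquire chain:

    #include <vector>

    struct Access {
        bool is_write;
        unsigned thread_id;
        unsigned ltf;                       // local time frame of the access
        std::vector<unsigned> timestamp;    // LTF vector known when it happened
    };

    // a was logged earlier than b; returns true iff a and b form an apparent race.
    bool racing(const Access& a, const Access& b) {
        return (a.is_write || b.is_write) &&
               (a.ltf >= b.timestamp[a.thread_id]);
    }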
33. Djit - Which Accesses to Check?
- Let a be an access in thread t1, and b and c accesses in thread t2 in the same LTF, with b preceding c in program order.
- If a and b are synchronized, then a and c are synchronized as well.
- It is therefore sufficient to record only the first read access and the first write access to a variable in each LTF; later accesses in the same frame need no logging.
34. Djit - Which LTFs to Check?
- a occurs in t1; b and c previously occurred in t2.
- If a is synchronized with c, then it must also be synchronized with b.
- It is therefore sufficient to check a current access against the most recent accesses in each of the other threads.
35. Djit - Access History
- For every variable v, and for each of the threads:
  - The last LTF in which the thread read from v
  - The last LTF in which the thread wrote to v
- On each first read and first write to v in an LTF, the thread updates the access history of v:
  - If the access to v is a read, the thread checks all recent writes to v by other threads.
  - If the access is a write, the thread checks all recent reads as well as all recent writes to v by other threads.
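- A sketch of such a per-variable record and the check performed on a first read; ordered() and report_race() are assumed hooks, and the naming follows the earlier sketches rather than the tool's API (the write case is symmetric and checks recent reads too):

    #include <cstddef>

    constexpr std::size_t MAX_THREADS = 16;           // illustrative fixed bound

    struct AccessHistory {                             // kept per shared variable v
        unsigned last_read_ltf[MAX_THREADS]  = {0};    // 0 = never accessed
        unsigned last_write_ltf[MAX_THREADS] = {0};
    };

    // True iff the access of thread u in time frame ltf_u happens-before the
    // current access of thread t (i.e. ltf_u < the latest LTF of u known to t).
    bool ordered(std::size_t u, unsigned ltf_u, std::size_t t);

    void report_race(std::size_t other_thread);        // assumed reporting hook

    // First read of v by thread t in its current time frame cur_ltf:
    // check against the most recent write of every other thread, then log it.
    void on_first_read(AccessHistory& v, std::size_t t, unsigned cur_ltf) {
        for (std::size_t u = 0; u < MAX_THREADS; ++u)
            if (u != t && v.last_write_ltf[u] != 0 &&
                !ordered(u, v.last_write_ltf[u], t))
                report_race(u);
        v.last_read_ltf[t] = cur_ltf;
    }
    // A first write is handled analogously, checking recent reads as well.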
36. Djit - Pros and Cons
- (+) No false alarms
- (+) No missed races (in a given scheduling)
- (-) Very sensitive to differences in scheduling
- (-) Requires an enormous number of runs; yet cannot prove the tested program is race-free
- Can be extended to support other synchronization primitives, like barriers and counting semaphores
- Correct on relaxed memory systems [Adve & Hill, 1990]: data-race-free-1
37. Lockset - The Basic Algorithm
- C(v): the set of locks that protected all accesses to v so far.
- locks_held(t): the set of locks currently acquired by thread t.
- Algorithm:
  - For each v, initialize C(v) to the set of all possible locks.
  - On each access to v by thread t:
    - lh_v ← locks_held(t)
    - if it is a read, then lh_v ← lh_v ∪ {readers_lock}
    - C(v) ← C(v) ∩ lh_v
    - if C(v) = Ø, issue a warning
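- A sketch of this refinement in C++; the virtual readers_lock value and the flag standing in for the initial "all locks" set are assumptions made for illustration:

    #include <cstdio>
    #include <set>

    using LockId  = int;
    using LockSet = std::set<LockId>;

    const LockId READERS_LOCK = -1;    // virtual lock added on reads (SWMR idea)

    struct Candidates {                // C(v)
        bool    all = true;            // initially "the set of all possible locks"
        LockSet set;
    };

    void on_access(Candidates& Cv, const LockSet& locks_held, bool is_read) {
        LockSet lh = locks_held;       // lh_v <- locks_held(t)
        if (is_read)
            lh.insert(READERS_LOCK);   // lh_v <- lh_v U {readers_lock}

        if (Cv.all) {                  // first access: all-locks ∩ lh = lh
            Cv.all = false;
            Cv.set = lh;
        } else {                       // C(v) <- C(v) ∩ lh_v
            LockSet inter;
            for (LockId l : Cv.set)
                if (lh.count(l))
                    inter.insert(l);
            Cv.set.swap(inter);
        }
        if (Cv.set.empty())
            std::printf("Warning: locking discipline for v is violated\n");
    }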
38. Lockset - Example

    lock( m1 )
    read v          C(v) = { m1, readers_lock }
    unlock( m1 )
    lock( m2 )
    write v         C(v) = { m1, readers_lock } ∩ { m2 } = Ø
    unlock( m2 )

- Warning: the locking discipline for v is violated!!!
39. Lockset [Savage et al. 1997]
- Locking discipline: every shared location is consistently protected by a lock.
- Lockset detects violations of this locking discipline and warns that the discipline for v is violated.
40. Lockset vs. Djit
- Access 1 happens-before access 2 (1 →hb 2), yet there might be a data-race on y under a different scheduling, since the locking discipline is violated.
41. Lockset - Which Accesses to Check?
- If a and b are in the same thread and the same time frame, with a preceding b, then Locks_a(v) ⊆ Locks_b(v), where Locks_u(v) is the set of locks held during access u to v.
- Only first accesses need be checked in every time frame.
- Lockset can therefore use the same logging (access history) as Djit.
42. Lockset - Pros and Cons
- (+) Less sensitive to scheduling
- (+) Detects a superset of all apparently raced locations in an execution of a program: races cannot be missed
- (-) Lots of false alarms
- (-) Still dependent on scheduling: cannot prove the tested program is race-free
43. Combining Djit and Lockset
- Lockset can detect suspected races in more execution orders.
- Djit can filter out the spurious warnings reported by Lockset.
- Lockset can help reduce the number of checks performed by Djit: if C(v) is not empty yet, Djit need not check v for races (see the sketch below).
- The implementation overhead comes mainly from the access logging mechanism, which can be shared by the two algorithms.
44. Disabling Detection
- Obviously, Lockset can report false alarms.
- Djit, too, detects apparent races that are not necessarily feasible races:
  - Intentional races
  - Unrefined granularity
  - Private synchronization
- Detection can be disabled through the use of source-code annotations.
45. Implementing Access Logging
- In order to record only the first accesses (reads and writes) to shared locations in each of the time frames, we use the concept of views.
- A view is a region in virtual memory.
- Each view has its own protection: NoAccess / ReadOnly / ReadWrite.
- Each shared object in physical memory can be accessed through each of the three views.
- This helps to distinguish between reads and writes.
- It enables the realization of the dynamic detection unit and avoids the false sharing problem.
46. Implementing Access Logging - Recording First LTF Accesses
- An access attempt with wrong permissions generates a fault.
- The fault handler activates the logging and the detection mechanisms, and switches views.
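- As a rough illustration of the mechanism only (the tool itself uses source-code instrumentation and Windows memory mappings; this sketch uses POSIX mprotect and a SIGSEGV handler purely as an analogue, and log_first_access is a hypothetical hook):

    #include <cstddef>
    #include <signal.h>
    #include <sys/mman.h>

    static char*             region;        // page-aligned shared region
    static const std::size_t PAGE = 4096;

    // On unlock a new time frame starts, so first accesses must fault again:
    // drop all permissions on the region (the NoAccess view).
    void start_new_time_frame() {
        mprotect(region, PAGE, PROT_NONE);
    }

    // Fault handler: log the faulting access as the first one of this time
    // frame, then raise permissions so later accesses run at full speed.
    // (The real mechanism distinguishes reads from writes by switching first
    // to a ReadOnly view and only then to ReadWrite.)
    static void segv_handler(int, siginfo_t* info, void*) {
        // log_first_access(info->si_addr);  // hypothetical logging hook
        (void)info;
        mprotect(region, PAGE, PROT_READ | PROT_WRITE);
    }

    void install_handler() {
        struct sigaction sa = {};
        sa.sa_sigaction = segv_handler;
        sa.sa_flags     = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, nullptr);
    }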
47. Swizzling Between Views
- [Figure: a thread's accesses to x between unlock operations. The first read of x triggers a read fault and the first write of x triggers a write fault, each logged by the handler; after unlock(m) a new time frame begins, so the next write of x faults again.]
48. Minipages and Dynamic Granularity of Detection
- A minipage is a shared location that can be accessed using the approach of views.
- We detect races on minipages and not on a fixed number of bytes.
- Each minipage is associated with the access history of Djit and the Lockset state.
- The size of a minipage can vary.
49. Detection Granularity
- A minipage (= detection unit) can contain:
  - Objects of primitive types: char, int, double, etc.
  - Objects of complex types: classes and structures
  - Entire arrays of complex or primitive types
- An array can be placed on a single minipage or split across several minipages; the array still occupies contiguous addresses. (A sketch of the per-minipage record follows.)
50. Playing with Detection Granularity to Reduce Overhead
- Larger minipages → reduced overhead (fewer faults).
- A minipage should be refined into smaller minipages when suspicious alarms occur.
  - Replay technology can help (if available).
- When the suspicion is resolved, regroup.
  - Detection may be disabled on the accesses involved.
51. Detection Granularity
52. Overheads
- The overheads are steady for 1-4 threads: we are scalable in the number of CPUs.
- The overheads increase for a high number of threads.
- The number of page faults (both read and write) increases linearly with the number of threads.
- In fact, any on-the-fly tool for data-race detection will not scale in the number of threads when the number of CPUs is fixed.
53. Overheads
- The testing platform:
  - 4-way IBM Netfinity server, 550 MHz
  - 2 GB RAM
  - Microsoft Windows NT
54. Overheads
55. Benchmark Overheads (4-way IBM Netfinity server, 550 MHz, Win-NT)
56. Overhead Breakdown
- Numbers above the bars are write/read faults.
- Most of the overhead comes from page faults.
- The overhead due to the detection algorithms is small.
57. Breakdowns of Overheads
58. Reporting Races in MultiRace
59. Summary: MultiRace is
- Transparent
- Supports two-way and global synchronization primitives: locks and barriers
- Detects races that actually occurred (Djit)
- Usually does not miss races that could occur under a different scheduling (Lockset)
- Correct for weak memory models
- Scalable
- Exhibits flexible detection granularity
60. Conclusions
- MultiRace makes it easier for programmers to trust their programs.
- No need to add synchronization "just in case".
- In case of doubt, MultiRace should be activated each time the program executes.
61. Future/Ongoing Work
- Implement an instrumenting pre-compiler:
  - Higher transparency
  - Higher scalability
- Automatic dynamic granularity adaptation
- Integrate with a scheduling generator
- Integrate with record/replay
- Integrate with the compiler/debugger:
  - May get rid of faults and views
  - Optimizations through static analysis
- Implement extensions for semaphores and other synchronization primitives
- Etc.
62. References
- T. Brecht and H. Sandhu. The Region Trap Library: Handling Traps on Application-Defined Regions of Memory. In USENIX Annual Technical Conference, Monterey, CA, June 1999.
- A. Itzkovitz, A. Schuster, and O. Zeev-Ben-Mordechai. Towards Integration of Data Race Detection in DSM Systems. Journal of Parallel and Distributed Computing (JPDC), 59(2), pp. 180-203, Nov. 1999.
- L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, 21(7), pp. 558-565, Jul. 1978.
- F. Mattern. Virtual Time and Global States of Distributed Systems. In Parallel and Distributed Algorithms, pp. 215-226, 1989.
63. References (Cont.)
- R. H. B. Netzer and B. P. Miller. What Are Race Conditions? Some Issues and Formalizations. ACM Letters on Programming Languages and Systems, 1(1), pp. 74-88, Mar. 1992.
- R. H. B. Netzer and B. P. Miller. On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions. In 1990 International Conference on Parallel Processing, Vol. 2, pp. 93-97, Aug. 1990.
- R. H. B. Netzer and B. P. Miller. Detecting Data Races in Parallel Program Executions. In Advances in Languages and Compilers for Parallel Processing, MIT Press, 1991, pp. 109-129.
64. References (Cont.)
- S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. E. Anderson. Eraser: A Dynamic Data Race Detector for Multithreaded Programs. ACM Transactions on Computer Systems, 15(4), pp. 391-411, 1997.
- E. Pozniansky. Efficient On-the-Fly Data Race Detection in Multithreaded C++ Programs. Research Thesis.
- O. Zeev-Ben-Mordehai. Efficient Integration of On-The-Fly Data Race Detection in Distributed Shared Memory and Symmetric Multiprocessor Environments. Research Thesis, May 2001.
65. The End