1
Efficient On-the-Fly Data Race Detection
in Multithreaded C++ Programs
  • Eli Pozniansky, Assaf Schuster

2
Table of Contents
  • What is a Data-Race?
  • Why Are Data-Races Undesired?
  • How Can Data-Races Be Prevented?
  • Can Data-Races Be Easily Detected?
  • Feasible and Apparent Data-Races
  • Complexity of Data-Race Detection

3
Table of Contents (Cont.)
  • Approaches to Detection of Apparent Data-Races
  • Static Methods
  • Dynamic Methods
  • Post-Mortem Methods
  • On-The-Fly Methods

4
Table of Contents (Cont.)
  • Closer Look at Dynamic Methods
  • DJIT
  • Lockset
  • Results
  • Summary & Conclusions
  • Future & Ongoing Work
  • References

5
What is a Data Race?
  • Two concurrent accesses to a shared location, at
    least one of them for writing.
  • Indicative of a bug

Thread 1        Thread 2
X++             T=Y
Z=2             T=X
6
Why Are Data-Races Undesired?
  • Programs which contain data-races usually
    demonstrate unexpected and even non-deterministic
    behavior.
  • The outcome might depend on the specific execution
    order (a.k.a. thread interleaving).
  • Re-running the program may not always produce the
    same results.
  • Thus, hard to debug and hard to write correct
    programs.

7
Why Are Data-Races Undesired? - Example
  • First Interleaving:    Thread 1        Thread 2
  •   1.                   X=0
  •   2.                                   T=X
  •   3.                   X++
  • Second Interleaving:   Thread 1        Thread 2
  •   1.                   X=0
  •   2.                   X++
  •   3.                                   T=X
  • T==0 or T==1?

8
How Can Data-Races Be Prevented? Explicit
Synchronization
  • Idea: in order to prevent undesired concurrent
    accesses to shared locations, we must explicitly
    synchronize between threads.
  • The means for explicit synchronization are
  • Locks, Mutexes and Critical Sections
  • Barriers
  • Binary Semaphores and Counting Semaphores
  • Monitors
  • Single-Writer/Multiple-Readers (SWMR) Locks
  • Others

9
Synchronization - Bad Bank Account Example
  • Thread 1                          Thread 2
  • Deposit( amount )                 Withdraw( amount )
  •   balance += amount                 if (balance < amount)
  •                                       print( Error )
  •                                     else
  •                                       balance -= amount
  • Deposit and Withdraw are not atomic!!!
  • What is the final balance after a series of
    concurrent deposits and withdraws?

10
Synchronization - Good Bank Account Example
  • Thread 1                          Thread 2
  • Deposit( amount )                 Withdraw( amount )
  •   Lock( m )                         Lock( m )
  •   balance += amount                 if (balance < amount)
  •   Unlock( m )                         print( Error )
  •                                     else
  •                                       balance -= amount
  •                                     Unlock( m )
  • Since the critical sections can never execute
    concurrently, this version exhibits no data-races
    (a compilable sketch follows below).
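As a concrete illustration, here is a minimal, compilable C++ sketch of the locked bank account above; the use of std::mutex/std::thread and the specific amounts are illustrative assumptions, not part of the original slides:

    #include <cstdio>
    #include <mutex>
    #include <thread>

    static int balance = 0;
    static std::mutex m;                        // plays the role of the lock 'm' on the slide

    void Deposit(int amount) {
        std::lock_guard<std::mutex> guard(m);   // Lock(m) ... Unlock(m) critical section
        balance += amount;
    }

    void Withdraw(int amount) {
        std::lock_guard<std::mutex> guard(m);
        if (balance < amount)
            std::printf("Error\n");
        else
            balance -= amount;
    }

    int main() {
        std::thread t1(Deposit, 50), t2(Withdraw, 30);
        t1.join();
        t2.join();
        std::printf("final balance = %d\n", balance);   // no data race on 'balance'
    }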

11
Is This Sufficient?
  • Yes!
  • No!
  • Programmer-dependent
  • Correctness: the programmer may forget to synchronize
  • Need tools to detect data-races
  • Expensive
  • Efficiency: to achieve correctness, the programmer
    may overdo it
  • Need tools to remove excessive synchronization

12
Can Data-Races Be Easily Detected? No!
  • Unfortunately, deciding if a given program
    contains potential data-races is computationally
    hard!!!
  • There are a huge number of execution orders: for t
    threads of n instructions each, the number of
    possible orders is about t^(nt) (a worked example
    follows this list).
  • In addition to all different schedulings, all
    possible inputs should be tested as well.
  • Inserting detection code into a program can
    change its execution schedule enough to make all
    errors disappear.
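For a sense of scale, using the slide's own estimate: even t = 2 threads of n = 10 instructions each already give about 2^20 ≈ 10^6 possible orders.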

13
Feasible Data-Races
  • Feasible Data-Races: races that are based on the
    possible behavior of the program (i.e. the semantics
    of the program's computation).
  • These are the actual (!) data-races that can
    possibly happen in any specific execution.
  • Locating feasible data-races requires fully
    analyzing the program's semantics to determine
    whether the execution could have allowed a and b
    (accesses to the same shared variable) to execute
    concurrently.

14
Apparent Data-Races
  • Apparent Data-Races: approximations (!) of
    feasible data-races that are based only on the
    behavior of the explicit synchronization
    performed by some feasible execution (and not on the
    semantics of the program's computation, i.e.
    ignoring all conditional statements).
  • Important, since data-races are usually a result
    of improper synchronization. Thus they are easier to
    detect, but detection is less accurate.

15
Apparent Data-Races Cont.
  • For example, a and b, accesses to the same shared
    variable in some execution, are said to be
    ordered if there is a chain of corresponding
    explicit synchronization events between them.
  • Similarly, a and b are said to have potentially
    executed concurrently if no explicit
    synchronization prevented them from doing so.

16
Feasible vs. Apparent - Example 1
  • Thread 1            [F = false]          Thread 2
  • X++
  • F = true
  •                                          if (F == true)
  •                                            X--
  • Apparent data-races in the execution above: 1 and
    2.
  • Feasible data-races: 1 only!!! No feasible
    execution exists in which X-- is performed
    before X++ (suppose F is false at start).
  • Protecting F only will protect X as well.

17
Feasible vs. Apparent - Example 2
  • Thread 1            [F = false]          Thread 2
  • X++                                      Lock( m )
  • Lock( m )                                T = F
  • F = true                                 Unlock( m )
  • Unlock( m )                              if (T == true)
  •                                            X--
  • No feasible or apparent data-races exist under
    any execution order!!!
  • F is protected by a lock. Also, X++ and X--
    are always ordered and properly synchronized:
  • either there is a sync chain of
    Unlock(m)→Lock(m) between X++ and X--, or
    only X++ executes.

18
Complexity ofData-Race Detection
  • Exactly locating the feasible data-races is an
    NP-hard problem. Thus, the apparent races, which
    are simpler to locate, must be detected for
    debugging.
  • Fortunately, apparent data-races exist if and
    only if at least one feasible data-race exists
    somewhere in the execution.
  • Yet, the problem of exhaustively locating all
    apparent data-races still remains NP-hard.

19
Why is Data-Race Detection NP-Hard?
  • How can we know that in a program P two accesses,
    a and b, to the same shared variable are
    concurrent?
  • Intuitively, we must check all execution orders
    of P and see. If we discover an execution order
    in which a and b are concurrent, we can report a
    data-race and stop. Otherwise we must continue
    checking.

20
Approaches to Detection of Apparent Data-Races -
Static
  • There are two main approaches to detection of
    apparent data-races (sometimes a combination of
    both is used)
  • Static Methods perform a compile-time analysis
    of the code.
  • Too conservative: they can't know or understand the
    semantics of the program, and so produce an
    excessive number of false alarms that hide the real
    data-races.
  • Test the program globally: they see the full code
    of the tested program and can warn about all
    possible errors in all possible executions.

21
Approaches to Detection of Apparent Data-Races -
Dynamic
  • Dynamic Methods use a tracing mechanism to detect
    whether a particular execution of a program
    actually exhibited data-races.
  • They detect only those apparent data-races that
    occur during a feasible execution.
  • They test the program locally - they consider only
    one specific execution path of the program each time.
  • Post-Mortem Methods: after the execution
    terminates, analyze the trace of the run and warn
    about possible data-races that were found.
  • On-The-Fly Methods: buffer partial trace
    information in memory, analyze it and detect
    races as they occur.

22
MultiRace Approach
  • On-the-fly detection of apparent data races
  • Two detection algorithms (improved versions):
  • Lockset [Savage, Burrows, Nelson, Sobalvarro,
    Anderson '97]
  • Djit [Itzkovitz, Schuster, Zeev-Ben-Mordechai
    '99]
  • Correct even for weak memory systems
  • Flexible detection granularity
  • Variables and Objects
  • Especially suited for OO programming languages
  • Source-code (C++) instrumentation + memory
    mappings
  • Transparent
  • Low overhead

23
Where is Waldo?
  • #define N 100
  • Type* g_stack = new Type[N];
  • int g_counter = 0;
  • Lock g_lock;
  • void push( Type obj ) { lock(g_lock); ... unlock(g_lock); }
  • void pop( Type obj )  { lock(g_lock); ... unlock(g_lock); }
  • void popAll( ) {
  •   lock(g_lock);
  •   delete[] g_stack;
  •   g_stack = new Type[N];
  •   g_counter = 0;
  •   unlock(g_lock);
  • }
  • int find( Type obj, int number ) {
  •   lock(g_lock);
  •   for (int i = 0; i < number; i++)
  •     if (obj == g_stack[i]) break;  // Found!!!
  •   if (i == number) i = -1;         // Not found - return -1 to caller
24
Can You Find the Race?
  • #define N 100
  • Type* g_stack = new Type[N];
  • int g_counter = 0;
  • Lock g_lock;
  • void push( Type obj ) { lock(g_lock); ... unlock(g_lock); }
  • void pop( Type obj )  { lock(g_lock); ... unlock(g_lock); }
  • void popAll( ) {
  •   lock(g_lock);
  •   delete[] g_stack;
  •   g_stack = new Type[N];
  •   g_counter = 0;
  •   unlock(g_lock);
  • }
  • int find( Type obj, int number ) {
  •   lock(g_lock);
  •   for (int i = 0; i < number; i++)
  •     if (obj == g_stack[i]) break;  // Found!!!
  •   if (i == number) i = -1;         // Not found - return -1 to caller
Similar problem was found in java.util.Vector
25
Apparent Data Races
  • Based only on the behavior of the explicit
    synchronization,
  • not on the program semantics
  • Easier to locate
  • Less accurate
  • Exist iff a real (feasible) data-race exists
  • Detection is still NP-hard

26
Lamport's Happens-Before Partial Order
  • The happens-before partial order, denoted →hb, is
    defined for access events (reads, writes,
    releases and acquires) that happen in a specific
    execution, as follows:
  • Program Order: If a and b are events performed by
    the same thread, with a preceding b in program
    order, then a →hb b.
  • Release and Acquire: Let a be a release and b be
    an acquire. If a and b take part in the same
    synchronization event, then a →hb b.
  • Transitivity: If a →hb b and b →hb c, then a →hb
    c.
  • Shared accesses a and b are concurrent if neither
    a →hb b nor b →hb a holds.

27
Djit [Itzkovitz et al. 1999] - Apparent Data-Races
  • Based on Lamport's happens-before partial order
  • a and b are concurrent if neither a →hb b nor b →hb a
  • ⇒ Apparent data-race
  • Otherwise, they are synchronized
  • Djit basic idea: check each access performed
    against all previously performed accesses

28
Djit - Local Time Frames (LTF)
  • The execution of each thread is split into a
    sequence of time frames.
  • A new time frame starts on each unlock.
  • For every access there is a timestamp: a vector
    of LTFs known to the thread at the moment the
    access takes place.

29
Djit - Local Time Frames (LTF)
  • A vector ltf_t[.] for each thread t
  • ltf_t[t] is the LTF of thread t
  • ltf_t[u] stores the latest LTF of u known to t
  • If u is an acquirer of t's unlock:
  •   for k = 0 to maxthreads-1
  •     ltf_u[k] = max( ltf_u[k], ltf_t[k] )
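A minimal C++ sketch of this vector bookkeeping; the names (ltf, on_unlock, on_acquire) and the fixed MAX_THREADS bound are illustrative assumptions, not the MultiRace implementation:

    #include <algorithm>
    #include <array>

    constexpr int MAX_THREADS = 4;                        // assumed fixed bound
    using TimeFrames = std::array<unsigned, MAX_THREADS>;

    TimeFrames ltf[MAX_THREADS] = {};                     // ltf[t][u]: latest LTF of u known to t

    // A new local time frame of thread t starts on each unlock.
    void on_unlock(int t) {
        ++ltf[t][t];
    }

    // When thread u acquires a lock last released by thread t,
    // u learns everything t knew: element-wise max of the two vectors.
    void on_acquire(int u, int t) {
        for (int k = 0; k < MAX_THREADS; ++k)
            ltf[u][k] = std::max(ltf[u][k], ltf[t][k]);
    }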

30
Djit - Vector Time Frames Example
(Figure: an example execution of three threads accessing x, y and z and
synchronizing through lock/unlock of m1 and m2; the vector shown next to
each access, e.g. (2 1 1) or (2 2 1), is the vector of local time frames
known to the accessing thread at that moment.)
31
Realizing the →hb Relation
  • a is an access by thread ta at time frame Ta, and b
    is an access by thread tb (≠ ta).
  • a →hb b iff, at the moment b occurs, Ta <
    ltf_tb[ta]
  • i.e., there exists a chain of releases and
    corresponding acquires through which ta's local
    time frame propagates to tb.

32
Djit - Checking Concurrency
  • P(a,b) ≜ ( a.type == write ∨ b.type == write ) ∧
             ( a.ltf ≥ b.timestamp[a.thread_id] )
  • a was logged earlier than b.
  • P returns TRUE iff a and b are racing (a code
    sketch follows below).

Problem: too much logging, too
many checks.
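A minimal C++ rendering of the race-check predicate; the Access struct and field names follow the slide's notation, everything else is an illustrative assumption:

    #include <vector>

    enum class AccessType { Read, Write };

    struct Access {
        AccessType            type;
        int                   thread_id;
        unsigned              ltf;        // local time frame in which the access occurred
        std::vector<unsigned> timestamp;  // vector of LTFs known when the access occurred
    };

    // a was logged earlier than b; returns true iff a and b are racing.
    bool P(const Access& a, const Access& b) {
        bool one_is_write = (a.type == AccessType::Write) || (b.type == AccessType::Write);
        bool not_ordered  = a.ltf >= b.timestamp[a.thread_id];   // a did not happen-before b
        return one_is_write && not_ordered;
    }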
33
Djit - Which Accesses to Check?
  • a in thread t1, and b and c in thread t2 in same
    ltf
  • b precedes c in the program order.
  • If a and b are synchronized, then a and c are
    synchronized as well.

⇒ It is sufficient to record only the first read
access and the first write access to a variable
in each LTF.
34
Djit - Which LTFs to Check?
  • a occurs in t1
  • b and c previously occur in t2
  • If a is synchronized with c then it must also be
    synchronized with b.

⇒ It is sufficient to check a current access
against the most recent accesses in each of the
other threads.
35
Djit - Access History
  • For every variable v, for each of the threads:
  • The last LTF in which the thread read from v
  • The last LTF in which the thread wrote to v
  • On each first read and first write to v in an LTF,
    the thread updates the access history of v:
  • If the access to v is a read, the thread checks
    all recent writes by other threads to v.
  • If the access is a write, the thread checks all
    recent reads as well as all recent writes by
    other threads to v.
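A compact C++ sketch of such a per-variable access history and the read/write checks; the structures, the fixed MAX_THREADS bound and the "LTF 0 means no access yet" convention are illustrative assumptions:

    #include <array>

    constexpr int MAX_THREADS = 4;                         // assumed fixed bound
    using Timestamp = std::array<unsigned, MAX_THREADS>;   // LTFs known to the current thread

    struct VarHistory {                                    // access history of one variable v
        Timestamp last_read_ltf{};    // per thread: LTF of its last logged read of v
        Timestamp last_write_ltf{};   // per thread: LTF of its last logged write of v
    };

    // True iff an access logged at 'logged_ltf' by thread 'other' is not ordered
    // before the current access (LTF 0 = "no access logged yet" in this sketch).
    bool concurrent(unsigned logged_ltf, int other, const Timestamp& now) {
        return logged_ltf != 0 && logged_ltf >= now[other];
    }

    // First read of v in the current LTF: check only recent writes by other threads.
    bool races_on_read(const VarHistory& v, int self, const Timestamp& now) {
        for (int u = 0; u < MAX_THREADS; ++u)
            if (u != self && concurrent(v.last_write_ltf[u], u, now))
                return true;
        return false;
    }

    // First write of v: check recent reads and recent writes by other threads.
    bool races_on_write(const VarHistory& v, int self, const Timestamp& now) {
        for (int u = 0; u < MAX_THREADS; ++u)
            if (u != self && (concurrent(v.last_read_ltf[u], u, now) ||
                              concurrent(v.last_write_ltf[u], u, now)))
                return true;
        return false;
    }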

36
Djit - Pros and Cons
  • + No false alarms
  • + No missed races (in a given scheduling)
  • - Very sensitive to differences in scheduling
  • - Requires an enormous number of runs; yet cannot
    prove the tested program is race free
  • Can be extended to support other synchronization
    primitives, like barriers and counting semaphores
  • Correct on relaxed memory systems [Adve & Hill,
    1990]: data-race-free-1

37
Lockset - The Basic Algorithm
  • C(v): set of locks that protected all accesses to v
    so far
  • locks_held(t): set of locks currently acquired by
    thread t
  • Algorithm:
  •   - For each v, init C(v) to the set of all
      possible locks
  •   - On each access to v by thread t:
  •     - lh_v ← locks_held(t)
  •     - if it is a read, then lh_v ← lh_v ∪
        { readers_lock }
  •     - C(v) ← C(v) ∩ lh_v
  •     - if C(v) = Ø, issue a warning
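A minimal C++ sketch of this refinement loop; representing locks by name, the readers_lock constant and the lazy "first access initializes C(v)" detail are illustrative assumptions:

    #include <cstdio>
    #include <set>
    #include <string>

    using LockSet = std::set<std::string>;            // locks identified by name, for illustration

    const std::string READERS_LOCK = "readers_lock";  // fictitious lock shared by all readers

    struct VarState {
        bool    initialized = false;                  // C(v) conceptually starts as "all locks"
        LockSet C;                                    // candidate set C(v)
    };

    // Called on each access to v by a thread currently holding the locks in 'held'.
    void on_access(VarState& v, LockSet held, bool is_read, const char* var_name) {
        if (is_read)
            held.insert(READERS_LOCK);                // concurrent readers never race with each other
        if (!v.initialized) {                         // first access: C(v) = "all locks" ∩ held = held
            v.C = held;
            v.initialized = true;
        } else {
            LockSet refined;                          // C(v) ← C(v) ∩ lh_v
            for (const auto& l : v.C)
                if (held.count(l))
                    refined.insert(l);
            v.C = refined;
        }
        if (v.C.empty())
            std::printf("Warning: locking discipline for %s is violated\n", var_name);
    }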

38
Lockset - Example

lock( m1 )
read v          lh_v = { m1, readers_lock }    C(v) = { m1, readers_lock }
unlock( m1 )

lock( m2 )
write v         lh_v = { m2 }                  C(v) = { m1, readers_lock } ∩ { m2 } = Ø

Warning: locking discipline for v is violated !!!

unlock( m2 )
39
Lockset [Savage et al. 1997]
  • Locking discipline: every shared location is
    consistently protected by a lock.
  • Lockset detects violations of this locking
    discipline.

40
Lockset vs. Djit
1 →hb 2, yet there might be a data race on y
under a different scheduling ⇒ the locking
discipline is violated
41
Lockset - Which Accesses to Check?
  • a and b in the same thread, same time frame, a
    precedes b: then Locks_a(v) ⊆ Locks_b(v)
  • Locks_u(v) is the set of locks held during access u
    to v.

⇒ Only first accesses need be checked in every
time frame
⇒ Lockset can use the same logging (access history)
as Djit
42
Lockset - Pros and Cons
  • + Less sensitive to scheduling
  • + Detects a superset of all apparently raced
    locations in an execution of a program:
  • races cannot be missed
  • - Lots of false alarms
  • - Still dependent on scheduling:
  • cannot prove the tested program is race free

43
Combining Djit and Lockset
  • Lockset can detect suspected races in more
    execution orders
  • Djit can filter out the spurious warnings
    reported by Lockset
  • Lockset can help reduce number of checks
    performed by Djit
  • If C(v) is not empty yet, Djit should not check
    v for races
  • The implementation overhead comes mainly from the
    access logging mechanism
  • Can be shared by the algorithms

44
Disabling Detection
  • Obviously, Lockset can report false alarms.
  • Also Djit detects apparent races that are not
    necessarily feasible races
  • Intentional races
  • Unrefined granularity
  • Private synchronization
  • Detection can be disabled through the use of
    source code annotations.

45
Implementing Access Logging
  • In order to record only the first accesses (reads
    and writes) to shared locations in each of the
    time frames, we use the concept of views.
  • A view is a region in virtual memory.
  • Each view has its own protection: NoAccess /
    ReadOnly / ReadWrite.
  • Each shared object in physical memory can be
    accessed through each of the three views.
  • Helps to distinguish between reads and writes.
  • Enables the realization of the dynamic detection
    unit and avoids the false-sharing problem.

46
Implementing Access Logging - Recording First LTF
Accesses
  • An access attempt with wrong permissions
    generates a fault
  • The fault handler activates the logging and the
    detection mechanisms, and switches views
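The following is a minimal POSIX C++ sketch of that fault-driven logging idea. It is only an illustration of the mechanism: MultiRace itself ran on Windows NT and used memory-mapped views, whereas this sketch uses mmap/mprotect and a SIGSEGV handler as stand-ins:

    #include <csignal>
    #include <sys/mman.h>
    #include <unistd.h>

    static char* page = nullptr;                       // stands in for one view-mapped minipage

    // Fault handler: this is where logging and the detection checks would run,
    // before permissions are widened (the "switch views" step) and the access retries.
    static void on_fault(int, siginfo_t*, void*) {
        // ... log first access in this time frame, run Djit/Lockset checks ...
        mprotect(page, getpagesize(), PROT_READ | PROT_WRITE);
    }

    int main() {
        const size_t sz = getpagesize();
        page = static_cast<char*>(mmap(nullptr, sz, PROT_NONE,      // "NoAccess" view
                                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
        struct sigaction sa = {};
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = on_fault;
        sigaction(SIGSEGV, &sa, nullptr);

        page[0] = 42;          // first write faults once, gets logged, then succeeds
        return 0;
    }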

47
Swizzling Between Views
(Figure: a thread's timeline. After an unlock(m) the minipage's view is set back
to NoAccess, so the first read of x in the new time frame takes a read fault and
the first write of x takes a write fault; each fault logs the access and switches
to a more permissive view until the next unlock.)
48
Minipages and Dynamic Granularity of Detection
  • Minipage is a shared location that can be
    accessed using the approach of views.
  • We detect races on minipages and not on fixed
    number of bytes.
  • Each minipage is associated with its Djit access
    history and its Lockset state.
  • The size of a minipage can vary.

49
Detection Granularity
  • A minipage (= detection unit) can contain:
  • Objects of primitive types: char, int, double,
    etc.
  • Objects of complex types: classes and structures
  • Entire arrays of complex or primitive types
  • An array can be placed on a single minipage or
    split across several minipages.
  • The array still occupies contiguous addresses.

50
Playing with Detection Granularity to Reduce
Overhead
  • Larger minipages ⇒ reduced overhead
  • Fewer faults
  • A minipage should be refined into smaller
    minipages when suspicious alarms occur
  • Replay technology can help (if available)
  • When the suspicion is resolved - regroup
  • May disable detection on the accesses involved

51
Detection Granularity
52
Overheads
  • The overheads are steady for 1-4 threads - we are
    scalable in the number of CPUs.
  • The overheads increase for a higher number of
    threads.
  • The number of page faults (both read and write)
    increases linearly with the number of threads.
  • In fact, any on-the-fly tool for data race
    detection will be unscalable in the number of
    threads when the number of CPUs is fixed.

53
Overheads
  • The testing platform:
  • 4-way IBM Netfinity, 550 MHz
  • 2GB RAM
  • Microsoft Windows NT

54
Overheads
55
Benchmark Overheads (4-way IBM Netfinity server,
550MHz, Win-NT)
56
Overhead Breakdown
  • Numbers above bars are write/read faults.
  • Most of the overhead comes from page faults.
  • Overhead due to detection algorithms is small.

57
Breakdowns of Overheads
58
Reporting Races in MultiRace
59
Summary - MultiRace is
  • Transparent
  • Supports two-way and global synchronization
    primitives: locks and barriers
  • Detects races that actually occurred (Djit)
  • Usually does not miss races that could occur with
    a different scheduling (Lockset)
  • Correct for weak memory models
  • Scalable
  • Exhibits flexible detection granularity

60
Conclusions
  • MultiRace makes it easier for the programmer to trust
    his programs
  • No need to add synchronization "just in case"
  • In case of doubt - MultiRace should be activated
    each time the program executes

61
Future/Ongoing Work
  • Implement instrumenting pre-compiler
  • Higher transparency
  • Higher scalability
  • Automatic dynamic granularity adaptation
  • Integrate with scheduling-generator
  • Integrate with record/replay
  • Integrate with the compiler/debugger
  • May get rid of faults and views
  • Optimizations through static analysis
  • Implement extensions for semaphores and other
    synch primitives
  • Etc.

62
References
  • T. Brecht and H. Sandhu. The Region Trap Library:
    Handling Traps on Application-Defined Regions of
    Memory. In USENIX Annual Technical Conference,
    Monterey, CA, June 1999.
  • A. Itzkovitz, A. Schuster, and O.
    Zeev-Ben-Mordechai. Towards Integration of Data
    Race Detection in DSM System. In Journal of
    Parallel and Distributed Computing (JPDC), 59(2),
    pp. 180-203, Nov. 1999.
  • L. Lamport. Time, Clocks, and the Ordering of
    Events in a Distributed System. In Communications
    of the ACM, 21(7), pp. 558-565, Jul. 1978.
  • F. Mattern. Virtual Time and Global States of
    Distributed Systems. In Parallel and Distributed
    Algorithms, pp. 215-226, 1989.

63
References (Cont.)
  • R. H. B. Netzer and B. P. Miller. What Are Race
    Conditions? Some Issues and Formalizations. In
    ACM Letters on Programming Languages and Systems,
    1(1), pp. 74-88, Mar. 1992.
  • R. H. B. Netzer and B. P. Miller. On the
    Complexity of Event Ordering for Shared-Memory
    Parallel Program Executions. In 1990
    International Conference on Parallel Processing,
    2, pp. 93-97, Aug. 1990.
  • R. H. B. Netzer and B. P. Miller. Detecting Data
    Races in Parallel Program Executions. In Advances
    in Languages and Compilers for Parallel
    Processing, MIT Press 1991, pp. 109-129.

64
References (Cont.)
  • S. Savage, M. Burrows, G. Nelson, P. Sobalvarro,
    and T. E. Anderson. Eraser: A Dynamic Data Race
    Detector for Multithreaded Programs. In ACM
    Transactions on Computer Systems, 15(4), pp.
    391-411, 1997.
  • E. Pozniansky. Efficient On-the-Fly Data Race
    Detection in Multithreaded C++ Programs. Research
    Thesis.
  • O. Zeev-Ben-Mordehai. Efficient Integration of
    On-The-Fly Data Race Detection in Distributed
    Shared Memory and Symmetric Multiprocessor
    Environments. Research Thesis, May 2001.

65
The End