Programming, Debugging, Profiling and Optimizing Transactional Memory Applications

1
Programming, Debugging, Profiling and Optimizing
Transactional Memory Applications
PhD Thesis Proposal
Ferad Zyulkyarov
  • Department of Computer Architecture
  • Universitat Politècnica de Catalunya
    BarcelonaTech
  • Barcelona Supercomputing Center

01 July 2010
2
Publications
  • Ferad Zyulkyarov, Srdjan Stipic, Tim Harris,
    Osman Unsal, Adrian Cristal, Ibrahim Hur, Mateo
    Valero, Discovering and Understanding Performance
    Bottlenecks in Transactional Applications,
    PACT'10
  • Ferad Zyulkyarov, Tim Harris, Osman Unsal, Adrian
    Cristal, Mateo Valero, Debugging Programs that
    use Atomic Blocks and Transactional Memory,
    PPoPP'10
  • Vladimir Gajinov, Ferad Zyulkyarov, Osman Unsal,
    Adrian Cristal, Eduard Ayguade, Tim Harris, Mateo
    Valero, QuakeTM: Parallelizing a Complex Serial
    Application Using Transactional Memory, ICS'09
  • Ferad Zyulkyarov, Vladimir Gajinov, Osman Unsal,
    Adrian Cristal, Eduard Ayguade, Tim Harris, Mateo
    Valero, Atomic Quake: Using Transactional Memory
    in an Interactive Multiplayer Game Server,
    PPoPP'09
  • Ferad Zyulkyarov, Sanja Cvijic, Osman Unsal,
    Adrian Cristal, Eduard Ayguade, Tim Harris, Mateo
    Valero, WormBench - A Configurable Workload for
    Evaluating Transactional Memory Systems,
    MEDEA'09
  • Ferad Zyulkyarov, Milos Milovanovic, Osman Unsal,
    Adrian Cristal, Eduard Ayguade, Tim Harris, Mateo
    Valero, Memory Management for Transaction
    Processing Core in Heterogeneous
    Chip-Multiprocessors, OSHMA'09
  • Milos Milovanovic, Osman Unsal, Adrian Cristal,
    Ferad Zyulkyarov, Mateo Valero, Compiler Support
    for Using Transactional Memory in C/C++
    Applications, INTERACT'07

3
Work Plan
[Figure: work plan timeline. Task durations: 12m, 11m, 21m, 10m, 15m, 9.5m, 7m, 2m. Date marker: 01/10/2010]
4
Transactional Memory
    atomic {
        statement1;
        statement2;
        statement3;
        statement4;
        ...
    }
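As a rough illustration of what an atomic block provides, the sketch below emulates one in Python using optimistic execution: writes are buffered, reads are validated at commit time, and the block is re-executed when validation fails. The names `TVar`, `Transaction`, and `atomic` are hypothetical helpers invented for this sketch; they are not taken from the talk or from any TM system it discusses.

```python
import threading

_commit_lock = threading.Lock()  # serializes validation and write-back only


class TVar:
    """A shared variable with a version number (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed at first read
        self.writes = {}  # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        # Record the version before reading the value, so a concurrent
        # commit in between is caught by validation.
        self.reads.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value

    def try_commit(self):
        with _commit_lock:
            # Abort if anything we read was changed by another commit.
            if any(tv.version != seen for tv, seen in self.reads.items()):
                return False
            for tv, value in self.writes.items():
                tv.value = value
                tv.version += 1
            return True


def atomic(body):
    """Run body(tx) until it commits; return (result, number_of_aborts)."""
    aborts = 0
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.try_commit():
            return result, aborts
        aborts += 1


# Usage: four threads increment one shared counter inside atomic blocks.
counter = TVar(0)


def increment(tx):
    tx.write(counter, tx.read(counter) + 1)


def worker():
    for _ in range(500):
        atomic(increment)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The abort count returned by `atomic` corresponds loosely to the "Aborts" column reported later for Atomic Quake: it counts re-executions, which is the work a TM system wastes under contention.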
5
The Big Questions
  • Is programming with TM easy?
  • Is TM competitive with locks?
  • Are existing development tools sufficient?

6
Atomic Quake
  • Parallel Quake game server
  • All locks are replaced with atomic blocks
  • 27,400 LOC of C code in 56 files
  • Rich transactional application
  • 63 atomic blocks
  • Rich uses of atomic blocks
  • Library calls, I/O, error handling, memory
    allocation, failure atomicity
  • Various transactional characteristics
  • A workload to drive research in TM

7
Is programming with TM easy?
  • Yes.
  • In large applications with many shared objects
    that need efficient fine-grained
    synchronization
  • Example: region-based locking in tree data
    structures and graphs.

8
Where Do Transactions Fit?
Guarding different types of objects with separate
locks:

    switch (object->type) {  /* Lock phase */
        case KEY:    lock(key_mutex);    break;
        case LIFE:   lock(life_mutex);   break;
        case WEAPON: lock(weapon_mutex); break;
        case ARMOR:  lock(armor_mutex);  break;
    }

    pick_up_object(object);

    switch (object->type) {  /* Unlock phase */
        case KEY:    unlock(key_mutex);    break;
        case LIFE:   unlock(life_mutex);   break;
        case WEAPON: unlock(weapon_mutex); break;
        case ARMOR:  unlock(armor_mutex);  break;
    }

A single atomic block replaces both the lock phase and the unlock phase:

    atomic {
        pick_up_object(object);
    }
9
Is TM Competitive with Locks?
  • No.
  • 4-5x slowdown in the single-threaded version.
  • But TM is promising: the good scalability
    obtained suggests it can become competitive.

Threads  Transactions  Aborts (num)  Aborts (%)  Irrevocable
      1        36,667             0        0.00           17
      2        75,824           241        0.42           31
      4       166,000         2,612        1.58           85
      8       477,519        76,771       25.50          237

Scales OK up to 4 threads.
Sudden increase in aborts at 8 threads.
10
Are Existing Tools Sufficient?
  • No.
  • We need:
      • Richer language-level primitives and integration.
      • Mechanisms to handle I/O.
      • Dynamic error handling.
      • Debuggers.
      • Profilers.

11
Unstructured Use of Locks
Atomic block version:

    bool first_if = false;
    bool second_if = false;
    for (i = 0; i < sv_tot_num_players / sv_nproc; i++) {
        <statements1>
        atomic {
            <statements2>
            if (!c->send_message) {
                <statements3>
                first_if = true;
            } else {
                <statements5>
                if (!sv.paused && !Netchan_CanPacket(c->netchan)) {
                    <statements6>
                    second_if = true;
                } else {
                    <statements8>
                    if (c->state == cs_spawned) {
                        if (frame_threads_num > 1) {
                            atomic {
                                <statements9>
                            }
                        } else {
                            <statements9>
                        }
                    }
                }
            }
        }
        if (first_if) {
            <statements4>
            first_if = false;
            continue;
        }
        if (second_if) {
            <statements7>
            second_if = false;
            continue;
        }
        <statements10>
    }

Lock version:

    for (i = 0; i < sv_tot_num_players / sv_nproc; i++) {
        <statements1>
        LOCK(cl_msg_lock[c - svs.clients]);
        <statements2>
        if (!c->send_message) {
            <statements3>
            UNLOCK(cl_msg_lock[c - svs.clients]);
            <statements4>
            continue;
        }
        <statements5>
        if (!sv.paused && !Netchan_CanPacket(c->netchan)) {
            <statements6>
            UNLOCK(cl_msg_lock[c - svs.clients]);
            <statements7>
            continue;
        }
        <statements8>
        if (c->state == cs_spawned) {
            if (frame_threads_num > 1) LOCK(par_runcmd_lock);
            <statements9>
            if (frame_threads_num > 1) UNLOCK(par_runcmd_lock);
        }
        UNLOCK(cl_msg_lock[c - svs.clients]);
        <statements10>
    }

Issues with the atomic block version: extra variables and code, complicated conditional logic.
Solution: explicit commit.
12
Various Transactional Characteristics
Per-atomic-block runtime statistics from Atomic
Quake.

            Dynamic Length (CPU cycles)               Read Set (bytes)                   Write Set (bytes)
ID      TX        Total      Min        Max      Avg       Total    Min     Max     Avg      Total    Min     Max    Avg
56  26,962  172,872,572      288    112,832    6,412   1,328,536     20     104      49          0      0       0      0
60   5,931    5,810,152      224     41,552      980      76,212     12     640      13        928      0     116      0
61   1,095   20,573,540    4,560     49,984   19,208     723,474     88     776     661         90     84      84     84
59   1,042    3,117,844    1,520     39,344    2,999      29,176      5      28      28     16,672     16      16     16
57   1,038  401,502,152  288,704    522,528  387,552  10,963,719  7,614  15,490  10,562  2,592,367  1,680   3,656  2,497
58   1,002  134,949,344   87,056  1,341,504  134,949   5,054,282  3,028  53,566   5,044    931,445    548  11,161    930
15       3       67,660      720     48,176    1,735          96     32      32      32         18      6       6      6
 5       2       99,988      592     36,384    1,923          64     32      32      32         10      5       5      5
22       2       43,632   12,176     35,504   21,816          72     36      36      36        128     64      64     64
36       2       40,476    6,800     44,880   20,238         249    108     141     125         55     22      33     28
38       2       71,368    2,144     31,504    4,461          90     44      46      45         26     12      14     13

Observations:
  • Very small transactions.
  • Very large transactions.
  • Different execution frequency -> phased behavior.
  • Most frequent atomic block is read-only.
  • Control flow does not reach all atomic blocks.
13
Debugging Transactional Applications
  • Existing debuggers are not aware of atomic blocks
    and transactional memory
  • New principles and approaches
      • Debugging atomic blocks atomically
      • Debugging at the level of transactions
      • Managing transactions at debug-time
  • Extension for WinDbg to debug programs with
    atomic blocks

14
Atomicity in Debugging
  • Step over atomic blocks as if they were a single instruction.
  • Abstracts whether atomic blocks are implemented
    with TM or lock inference.
  • Good for debugging synchronization errors at the
    granularity of atomic blocks rather than individual
    statements inside the atomic blocks.

Non-TM-aware debugger vs. TM-aware debugger (both panels show the same code):

    <statement 1>
    <statement 2>
    atomic {
        <statement 3>
        <statement 4>
        <statement 5>
        <statement 6>
    }
    <statement 7>
    <statement 8>

Debugging becomes frustrating when a transaction
aborts.
15
Isolation in Debugging
  • What if we want to debug incorrect code inside an
    atomic block?
      • Put a breakpoint inside the atomic block.
      • Validate the transaction.
      • Step within the transaction.
  • The user does not observe intermediate results of
    concurrently running transactions.
      • Switch the transaction to irrevocable mode after
        validation.

    atomic {
        <statement 1>
        <statement 2>
        <statement 3>
        <statement 4>
    }
16
Debugging at the Level of Transactions
  • Assumes that atomic blocks are implemented with
    transactional memory.
  • Examine the internal state of the TM
      • Read/write set, re-executions, status
  • TM-specific watchpoints
      • Break when a conflict happens
      • Filters
  • Concurrent work: Herlihy and Lev, PACT'09.

17
TM-Specific Watchpoints
Break when conflict happens
AND
Filter: break if Address = reservation_at_04, Thread = T2

    atomic {
        <statement 1>
        <statement 2>
        <statement 3>
        <statement 4>
    }
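One way to picture a conflict watchpoint combined with a filter is as a predicate over conflict events, as in the sketch below. The `ConflictEvent` record layout, the `ConflictWatchpoint` class, and the address `customer_at_07` are hypothetical names made up for illustration; the talk does not specify how the WinDbg extension represents these.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConflictEvent:
    """A conflict reported by the TM runtime (hypothetical record layout)."""
    address: str       # symbolic name of the conflicting location
    thread: str        # thread whose transaction hit the conflict
    atomic_block: str  # atomic block that was executing


@dataclass
class ConflictWatchpoint:
    """Break on a conflict AND only if every filter field that is set matches."""
    address: Optional[str] = None
    thread: Optional[str] = None

    def should_break(self, event: ConflictEvent) -> bool:
        if self.address is not None and event.address != self.address:
            return False
        if self.thread is not None and event.thread != self.thread:
            return False
        return True


# Usage: the filter from the slide (Address = reservation_at_04, Thread = T2).
watchpoint = ConflictWatchpoint(address="reservation_at_04", thread="T2")
events = [
    ConflictEvent("reservation_at_04", "T1", "AB1"),  # wrong thread
    ConflictEvent("customer_at_07", "T2", "AB1"),     # wrong address
    ConflictEvent("reservation_at_04", "T2", "AB1"),  # matches both
]
hits = [e for e in events if watchpoint.should_break(e)]
```

Leaving a filter field unset makes the watchpoint fire on every conflict for that dimension, which matches the unfiltered "break when conflict happens" case.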
18
Managing Transactions at Debug-Time
  • At the level of atomic blocks
      • Debug-time atomic blocks
      • Splitting atomic blocks
  • At the level of transactions
      • Changing the state of the TM system (e.g., adding
        and removing entries from the read/write set,
        changing the status, aborting)
      • Analogous to the functionality of existing
        debuggers to change the CPU state

19
Example: Debug-Time Atomic Blocks

    <statement 1>
    <statement 2>
    <statement 3>
    <statement 4>
    <statement 5>
    <statement 6>
    <statement 7>
    <statement 8>
    <statement 9>
    <statement 10>
    <statement 11>
    <statement 12>
    <statement 13>
    <statement 14>
20
Example: Debug-Time Atomic Blocks

    <statement 1>
    <statement 2>
    <statement 3>
    StartDebugAtomic
        <statement 4>
        <statement 5>
        <statement 6>
        <statement 7>
        <statement 8>
        <statement 9>
    EndDebugAtomic
    <statement 10>
    <statement 11>
    <statement 12>
    <statement 13>
    <statement 14>

The user marks the start and the end of
the transaction.
21
Issues of Profiling TM Programs
  • TM applications have unanticipated overheads
      • Problem raised in Pankratius's talk at ICSE'09
        and by Rossbach et al., PPoPP'10
  • Difficult to profile TM applications without
    profiling tools and without knowing the
    implementation of the TM system
      • Experience of optimizing QuakeTM, Gajinov et al.,
        ICS'09

22
Profiling TM Programs
  • Design principles
      • Report results in terms of source-language constructs
      • Abstract the underlying TM system
      • Low probe effect and overhead
  • Profiling techniques
      • Conflict point discovery
      • Identifying conflicting data structures
      • Visualizing transactions

23
Conflict Point Discovery
  • Identifies the statements involved in conflicts
  • Provides contextual information
  • Finds the critical path

File:Line         Conf.  Method    Line
Hashtable.cs:51     152  Add       if (_container[hashCode
Hashtable.cs:48      62  Add       uint hashCode = HashSdbm(
Hashtable.cs:53       5  Add       _container[hashCode] = n
Hashtable.cs:83       5  Add       while (entry != null)
ArrayList.cs:79       3  Contains  for (int i = 0; i < count; i++)
ArrayList.cs:52       1  Add       if (count capacity 1)
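A table like the one above can be thought of as an aggregation over a log of conflict events. The sketch below assumes a hypothetical log in which each conflict is recorded as a `(file, line, method)` tuple; the log contents and the `conflict_points` helper are invented for illustration and do not reproduce the profiler's actual data or implementation.

```python
from collections import Counter


def conflict_points(events):
    """Count conflicts per (file, line, method), most frequent first."""
    return Counter(events).most_common()


# Hypothetical conflict log: one tuple per detected conflict.
log = (
    [("Hashtable.cs", 51, "Add")] * 6
    + [("Hashtable.cs", 48, "Add")] * 3
    + [("ArrayList.cs", 79, "Contains")] * 1
)

table = conflict_points(log)
for (file, line, method), count in table:
    print(f"{file}:{line:<4} {count:>5}  {method}")
```

Sorting by count puts the statements most often involved in conflicts at the top, which is the ordering used in the table.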
24
Call Context
Thread 1 and Thread 2 each run:

    for (int i = 0; i < 100; i++) {
        probability80();
        probability20();
    }

    probability80() {
        probability = random() % 100;
        if (probability < 80) {
            atomic {
                increment();
            }
        }
    }

    probability20() {
        probability = random() % 100;
        if (probability >= 80) {
            atomic {
                increment();
            }
        }
    }

    increment() {
        counter++;
    }

Bottom-up view:

    increment (100%)
    ---- probability80 (80%)
    ---- probability20 (20%)

Top-down view:

    main (100%)
    ---- probability80 (80%)
         ---- increment (80%)
    ---- probability20 (20%)
         ---- increment (20%)
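The two views can be derived from the same data if each conflict is recorded together with its call stack. The following sketch assumes that representation (a tuple of function names from `main` down to the conflicting function) and uses idealized counts of 80 and 20 rather than measured ones; `build_view` and `render` are hypothetical helpers, not part of the profiler described in the talk.

```python
from collections import Counter


def build_view(stacks, bottom_up=False):
    """Map every call-path prefix to the number of conflicts under it.

    Top-down paths start at the outermost caller; bottom-up paths start
    at the function where the conflict happened.
    """
    view = Counter()
    for stack in stacks:
        frames = tuple(reversed(stack)) if bottom_up else tuple(stack)
        for depth in range(1, len(frames) + 1):
            view[frames[:depth]] += 1
    return view


def render(view, total):
    """Return indented lines; sorted tuples give a pre-order tree walk."""
    lines = []
    for path in sorted(view):
        share = 100 * view[path] // total
        lines.append("    " * (len(path) - 1) + f"{path[-1]} ({share}%)")
    return lines


# Idealized data: 80 conflicts reached via probability80, 20 via probability20.
stacks = (
    [("main", "probability80", "increment")] * 80
    + [("main", "probability20", "increment")] * 20
)

top_down = build_view(stacks)
bottom_up = build_view(stacks, bottom_up=True)
print("\n".join(render(top_down, len(stacks))))
print("\n".join(render(bottom_up, len(stacks))))
```

Without the call context, all 100 conflicts would be attributed to `increment` alone; keeping the stack is what lets the profiler separate the two callers.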
25
Aborts Graph (Bayes)
There are 15 atomic blocks and only one of them
accounts for most aborts. Which atomic blocks cause AB3 to
abort?
[Figure: aborts graph with nodes AB1, AB2, and AB3 (72% of wasted work). Edge labels: Conf 73%, Wasted 63%; Conf 20%, Wasted 29%]
26
Identifying Conflicting Objects

    List list = new List();
    list.Add(1);
    list.Add(2);
    list.Add(3);
    ...
    atomic {
        list.Replace(2, 33);
    }

Per-object view:

    List.cs:1 list (42%)
    ---- ChangeNode (20%)
         ---- Replace (12%)
         ---- Add (8%)

[Figure: a list with nodes 1, 2, 3 at addresses 0x08 to 0x20. Labels: GC Root 0x08, Object Addr 0x20, Instr Addr 0x446290. Components: GC, Memory Allocator, DbgEng. Result: List.cs:1]
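A simplified way to think about attributing a conflicting address to a source-level object is to search the object graph for the GC root that reaches it, then report where that root was allocated. The sketch below does this with a breadth-first search. The graph shape (a chain 0x08 -> 0x10 -> 0x18 -> 0x20), the `allocation_site` table, and the `find_gc_root` helper are assumptions made for illustration; the talk does not describe the actual mechanism at this level of detail.

```python
from collections import deque


def find_gc_root(roots, references, target):
    """Return the first root from which `target` is reachable, else None."""
    for root in roots:
        seen = {root}
        queue = deque([root])
        while queue:
            obj = queue.popleft()
            if obj == target:
                return root
            for child in references.get(obj, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
    return None


# Assumed heap layout: the list object at 0x08 points to a chain of nodes.
references = {0x08: [0x10], 0x10: [0x18], 0x18: [0x20]}
roots = [0x08]
allocation_site = {0x08: "List.cs:1"}  # hypothetical root -> source location

root = find_gc_root(roots, references, target=0x20)
site = allocation_site.get(root)
```

With this mapping, a conflict on the node at 0x20 is reported against the `list` variable declared at `List.cs:1` rather than against an anonymous heap address.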
27
Transaction Visualizer (Genome)
[Figure: transaction timeline for Genome, with regions labeled Garbage Collection and Wait on barrier]
Aborts occur at the first and last atomic blocks
in program order.
28
Overhead and Probe Effect
Process data offline or during GC.
Legend: + profiling enabled, - profiling disabled.
Gen = Genome, Intrd = Intruder, Labr = Labyrinth, Vac = Vacation, WB = WormBench.

Normalized Execution Time
Threads  Bayes+  Bayes-    Gen+    Gen-  Intrd+  Intrd-   Labr+   Labr-    Vac+    Vac-     WB+     WB-
      1    1.59    1.00    1.27    1.00    1.29    1.00    1.07    1.00    1.26    1.00    0.71    1.00
      2    1.00    0.56    0.97    0.67    0.97    0.58    0.64    0.61    0.83    0.59    0.60    0.55
      4    0.23    0.23    0.73    0.52    0.91    0.36    0.45    0.46    0.58    0.40    0.41    0.33
      8    0.21    0.20    0.73    0.55    1.57    0.38    0.72    0.56    0.53    0.34    0.33    0.22
Standard deviation for the difference: 27%

Abort Rate in %
Threads  Bayes+  Bayes-    Gen+    Gen-  Intrd+  Intrd-   Labr+   Labr-    Vac+    Vac-     WB+     WB-
      1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
      2    4.39    4.69    0.07    0.07    3.69    3.51    0.19    0.15    0.80    0.80    0.00    0.00
      4   16.29   27.31    0.26    0.36   14.90   13.65    0.35    0.36    2.30    2.45    0.00    0.00
      8   53.74   66.08    0.53    0.80   39.64   37.41    0.40    0.47    4.91    5.30    0.02    0.03
Standard deviation for the difference: 3.88%
29
Optimization Techniques
  • Moving statements
  • Atomic block scheduling
  • Checkpoints and nested atomic blocks
  • Pessimistic reads
  • Early release

30
Moving Statements
Will this code execute the same?

    atomic {
        counter++;
        <statement1>
        <statement2>
        <statement3>
    }

    atomic {
        <statement1>
        <statement2>
        <statement3>
        counter++;
    }

No!
31
Checkpoints
    atomic {
        <statement1>
        <statement2>
        <statement3>
        <statement4>
        <statement5>
        <statement6>
        <statement7>
    }

Conflicts: 2, 15, 4, 79
Insert checkpoint.
32
Checkpoints
    atomic {
        <statement1>
        <statement2>
        <statement3>
        <statement4>
        <statement5>
        <statement6>
        <checkpoint>
        <statement7>
    }

Conflicts: 2, 15, 4, 79
Inserting the checkpoint reduced wasted work for the atomic block by 40%.
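The intuition behind checkpoints can be sketched with a simple cost model: if every statement costs one unit and a conflict at statement k discards everything executed since the last restart point, then a checkpoint placed just before a frequently conflicting statement limits how much is thrown away. In the sketch below, the assignment of the four conflict counts to statements 2, 4, 5, and 7 is an assumption (the slide does not state which statement each count belongs to), and the unit-cost model is a simplification, so the result is not expected to reproduce the 40% figure.

```python
def wasted_work(conflicts, checkpoint=None):
    """Estimate discarded work under a unit-cost-per-statement model.

    conflicts:  {statement_index: abort_count}, statements numbered from 1.
    checkpoint: index of the statement the checkpoint is placed just before,
                or None for no checkpoint.

    A conflict at statement k re-executes from the checkpoint if k is at or
    after it, otherwise from statement 1.
    """
    wasted = 0
    for stmt, count in conflicts.items():
        restart = 1
        if checkpoint is not None and stmt >= checkpoint:
            restart = checkpoint
        wasted += count * (stmt - restart + 1)
    return wasted


# Assumed mapping of the slide's conflict counts to statements.
conflicts = {2: 2, 4: 15, 5: 4, 7: 79}

without_checkpoint = wasted_work(conflicts)
with_checkpoint = wasted_work(conflicts, checkpoint=7)
reduction = 1 - with_checkpoint / without_checkpoint
```

Under this model the benefit comes almost entirely from the 79 conflicts at the last statement, which now discard one statement of work each instead of seven.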
33
Conclusion
  • Study the programmability aspects of TM
  • New debugging principles and approaches for TM
    applications
  • New profiling techniques for TM applications
  • Profile-guided optimization approaches for TM
    applications

34
  • ????