Title: Transactional Memory
1. Transactional Memory
- Companion slides for The Art of Multiprocessor Programming
- by Maurice Herlihy and Nir Shavit
2. Traditional Software Scaling
[Figure: speedup of user code on a traditional uniprocessor over time (Moore's law): 1.8x, 3.6x, 7x]
3. Multicore Software Scaling
[Figure: user code running on a multicore]
Unfortunately, not so simple
4. Real-World Multicore Scaling
[Figure: speedup of user code on a multicore: 1.8x, 2x, 2.9x]
Parallelization and synchronization require great care
5. Why?
Amdahl's Law: Speedup = 1 / (ParallelPart/N + SequentialPart)
Pay for N = 8 cores: with SequentialPart = 25%, the speedup is only 2.9 times!
As the number of cores grows, the effect of the 25% becomes more acute: 2.3x on 4 cores, 2.9x on 8, 3.4x on 16, 3.7x on 32.
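The figures above can be checked with a few lines of Java. This is a minimal sketch: the class name Amdahl and the method speedup are illustrative names, not part of the slides.

```java
import java.util.Locale;

// Hypothetical helper (not from the slides): evaluates Amdahl's law.
final class Amdahl {
    /** Speedup on n cores when the given fraction of the work is sequential. */
    static double speedup(double sequentialPart, int n) {
        double parallelPart = 1.0 - sequentialPart;
        return 1.0 / (sequentialPart + parallelPart / n);
    }

    public static void main(String[] args) {
        // 25% sequential work, as on the slide.
        for (int n : new int[] {4, 8, 16, 32}) {
            System.out.printf(Locale.ROOT, "%2d cores: %.1fx%n", n, speedup(0.25, n));
        }
    }
}
```

Even with unlimited cores, the speedup for a 25% sequential part can never exceed 1/0.25 = 4x.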
6. Shared Data Structures
[Figure: coarse-grained vs. fine-grained synchronization; in both cases 25% shared, 75% unshared]
7. A FIFO Queue
[Figure: queue with Head and Tail pointers; Enqueue(d) and Dequeue() => a]
8. A Concurrent FIFO Queue
Simple Code, easy to prove correct
Object lock
Contention and sequential bottleneck
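A coarse-grained queue of this kind might look like the sketch below. The class name CoarseQueue and its enq/deq methods are assumptions made for illustration. The point is that a single lock guards every operation, which makes the code easy to reason about but forces all threads through one bottleneck.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative class (name and API are assumptions, not from the slides).
final class CoarseQueue<T> {
    private final ReentrantLock lock = new ReentrantLock(); // the single object lock
    private final Deque<T> items = new ArrayDeque<>();

    /** Adds x at the tail while holding the object lock. */
    void enq(T x) {
        lock.lock();
        try {
            items.addLast(x);
        } finally {
            lock.unlock();
        }
    }

    /** Removes and returns the item at the head, or null if the queue is empty. */
    T deq() {
        lock.lock();
        try {
            return items.pollFirst();
        } finally {
            lock.unlock();
        }
    }
}
```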
9. Fine Grain Locks
Finer granularity, more complex code
[Figure: queue with Head and Tail; P: Dequeue() => a, Q: Enqueue(d)]
Verification nightmare: worry about deadlock, livelock
10. Fine Grain Locks
Complex boundary cases: empty queue, last item
[Figure: queue with Head and Tail; P: Dequeue() => a, Q: Enqueue(b)]
Worry about how to acquire multiple locks
11. Moreover: Locking Relies on Conventions
- Relation between lock bit and object bits
- Exists only in programmer's mind
Actual comment from Linux Kernel (hat tip: Bradley Kuszmaul):
/*
 * When a locked buffer is visible to the I/O layer BH_Launder is set.
 * This means before unlocking we must clear BH_Launder, mb() on alpha
 * and then clear BH_Lock, so no reader can see BH_Launder set on an
 * unlocked buffer and then risk to deadlock.
 */
12. Lock-Free (JDK 1.5)
Even finer granularity, even more complex code
[Figure: queue with Head and Tail; P: Dequeue() => a, Q: Enqueue(d)]
Worry about starvation, subtle bugs, difficulty of modification
13. Composing Objects
Complex: move data atomically between structures
More than twice the worry
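To see where the extra worry comes from, here is a hedged sketch of a lock-based transfer between two queues. Everything in it (LockedQueue, the id field, transfer) is hypothetical. It assumes one common discipline, acquiring locks in a fixed global order, which avoids deadlock but only works because each queue exposes its lock and every caller follows the same convention.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock-based queue that exposes its lock so operations can be composed.
final class LockedQueue<T> {
    private static final AtomicLong NEXT_ID = new AtomicLong();
    final long id = NEXT_ID.getAndIncrement();      // defines a global lock order
    final ReentrantLock lock = new ReentrantLock();
    private final Deque<T> items = new ArrayDeque<>();

    void enq(T x) { lock.lock(); try { items.addLast(x); } finally { lock.unlock(); } }
    T deq()       { lock.lock(); try { return items.pollFirst(); } finally { lock.unlock(); } }
    int size()    { lock.lock(); try { return items.size(); } finally { lock.unlock(); } }

    /** Moves one item from 'from' to 'to' atomically; returns false if 'from' is empty. */
    static <T> boolean transfer(LockedQueue<T> from, LockedQueue<T> to) {
        // Always lock the queue with the smaller id first, so two opposite
        // transfers cannot each hold one lock while waiting for the other.
        LockedQueue<T> first = from.id < to.id ? from : to;
        LockedQueue<T> second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                T x = from.deq();   // the locks are reentrant, so nested calls are safe
                if (x == null) return false;
                to.enq(x);
                return true;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

No other thread can observe the item as missing from both queues or present in both, but the queue had to give up encapsulation of its lock to make that possible.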
14. Transactional Memory
Great performance, simple code
[Figure: queue with Head and Tail; P: Dequeue() => a, Q: Enqueue(d)]
Don't worry about deadlock, livelock, subtle bugs, etc.
15. Promise of Transactional Memory
Don't worry about which locks need to cover which variables, or when
[Figure: queue with Head and Tail; P: Dequeue() => a, Q: Enqueue(d)]
TM deals with boundary cases under the hood
16. Composing Objects
It will be easy to modify multiple structures atomically
Provides composability
17. The Transactional Manifesto
- Current practice inadequate to meet the multicore challenge
- Research agenda
  - Replace locking with a transactional API
  - Design languages to support this model
  - Implement the run-time to be fast enough
18. Transactions
- Atomic
  - Commit: takes effect
  - Abort: effects rolled back
    - Usually retried
- Serializable
  - Appear to happen in one-at-a-time order
20. Atomic Blocks
atomic {
  x.remove(3);
  y.add(3);
}
atomic {
  y = null;
}
No data race
21. Designing a FIFO Queue
public void LeftEnq(item x) {
  Qnode q = new Qnode(x);
  q.left = this.left;
  this.left.right = q;
  this.left = q;
}
Write sequential code
23. Designing a FIFO Queue
public void LeftEnq(item x) {
  atomic {
    Qnode q = new Qnode(x);
    q.left = this.left;
    this.left.right = q;
    this.left = q;
  }
}
Enclose in atomic block
24. Warning
- Not always this simple
  - Conditional waits
  - Enhanced concurrency
  - Complex patterns
- But often it is
  - Works for sadistic homework
25. Composition
public void Transfer(Queue<T> q1, Queue<T> q2) {
  atomic {
    T x = q1.deq();
    q2.enq(x);
  }
}
Trivial or what?
26. Roll Back
public T LeftDeq() {
  atomic {
    if (this.left == null)
      retry;
    ...
  }
}
Roll back transaction and restart when something changes
27. OrElse Composition
atomic {
  x = q1.deq();
} orElse {
  x = q2.deq();
}
- Run 1st method. If it retries,
- run 2nd method. If it retries,
- the entire statement retries.
28. Transactional Memory
- Software transactional memory (STM)
- Hardware transactional memory (HTM)
- Hybrid transactional memory (HyTM: try in hardware and default to software if unsuccessful)
29. Hardware versus Software
- Do we need hardware at all?
  - Analogies
    - Virtual memory: yes!
    - Garbage collection: no!
  - Probably do need HW for performance
- Do we need software?
  - Policy issues don't make sense for hardware
30. Transactional Consistency
- Memory transactions are collections of reads and writes executed atomically
- Transactions should maintain internal and external consistency
  - External: with respect to the interleavings of other transactions.
  - Internal: the transaction itself should operate on a consistent state.
31. External Consistency
Invariant: x = 2y
Application memory: x = 4, y = 2
Transaction A: Write x, Write y
Transaction B: Read x, Read y, Compute z = 1/(x-y) = 1/2
32. Simple Lock-Based STM
- STMs come in different forms
  - Lock-based
  - Lock-free
- Here we will describe a simple lock-based STM
33. Synchronization
- Transaction keeps
  - Read set: locations and values read
  - Write set: locations and values to be written
- Deferred update
  - Changes installed at commit
- Lazy conflict detection
  - Conflicts detected at commit
34. STM: Transactional Locking
[Figure: a map associates locations in application memory with entries V in an array of versioned write-locks]
35. Reading an Object
[Figure: Mem and Locks, with versioned lock V]
- Put V's value in RS
  - If not already locked
36. To Write an Object
[Figure: Mem and Locks, with versioned lock V]
- Add V and new value to WS
37. To Commit
[Figure: Mem and Locks; locations X and Y are written and their versions change from V to V+1]
- Acquire W locks
- Check Vs unchanged
  - In RS and WS
- Install new values
- Increment Vs
- Release
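The read, write, and commit steps above can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the implementation behind the slides: the names TVar, Txn, and AbortException are hypothetical, each location carries its own version and lock bit instead of going through a map into a lock array, and a transaction that cannot take a lock simply aborts.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical transactional location: a value plus its versioned write-lock.
final class TVar {
    volatile int value;
    volatile long version;                                  // "V" in the slides
    final AtomicBoolean locked = new AtomicBoolean(false);  // write-lock bit
    TVar(int initial) { this.value = initial; }
}

final class AbortException extends RuntimeException { }

final class Txn {
    // Version first seen for every location read or written (RS and WS).
    private final Map<TVar, Long> seen = new IdentityHashMap<>();
    // Deferred update: new values stay private until commit.
    private final Map<TVar, Integer> writeSet = new IdentityHashMap<>();

    int read(TVar v) {
        Integer pending = writeSet.get(v);
        if (pending != null) return pending;                // read our own write
        long before = v.version;
        int val = v.value;
        // Abort if a writer holds the lock or committed while we were reading.
        if (v.locked.get() || v.version != before) throw new AbortException();
        seen.putIfAbsent(v, before);
        return val;
    }

    void write(TVar v, int newValue) {
        seen.putIfAbsent(v, v.version);
        writeSet.put(v, newValue);
    }

    void commit() {
        List<TVar> held = new ArrayList<>();
        try {
            for (TVar v : writeSet.keySet()) {              // acquire write locks
                if (!v.locked.compareAndSet(false, true)) throw new AbortException();
                held.add(v);
            }
            for (Map.Entry<TVar, Long> e : seen.entrySet()) {   // check versions unchanged
                TVar v = e.getKey();
                boolean lockedByOther = v.locked.get() && !writeSet.containsKey(v);
                if (lockedByOther || v.version != e.getValue()) throw new AbortException();
            }
            for (Map.Entry<TVar, Integer> e : writeSet.entrySet()) {
                TVar v = e.getKey();
                v.value = e.getValue();                     // install new value
                v.version = v.version + 1;                  // increment version
            }
        } finally {
            for (TVar v : held) v.locked.set(false);        // release
        }
    }
}
```

A caller would normally wrap the transaction body in a loop that catches AbortException and retries with a fresh Txn. Note that this sketch validates only at commit, so it still allows the zombie problem discussed on the following slides.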
38. Problem: Internal Inconsistency
- A zombie is a currently active transaction that is destined to abort because it saw an inconsistent state
- If zombies see inconsistent states, errors can occur, and the fact that the transaction will eventually abort does not save us
39. Internal Consistency
Invariant: x = 2y
Application memory: x = 4, y = 2
Transaction B: Read x = 4
Transaction A: Write x (kills B), Write y
Transaction B (zombie): Read y = 4, Compute z = 1/(x-y)
DIV by 0 ERROR
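The interleaving above can be replayed deterministically in a single thread. The sketch below is purely illustrative (ZombieDemo and replay are made-up names) and assumes that transaction A writes x = 8 and y = 4, which preserves the invariant and matches the value of y that B reads.

```java
// Deterministic replay of the slide's interleaving, with no validation at all.
final class ZombieDemo {
    static int x, y;   // shared "application memory"; invariant: x == 2 * y

    /** Returns the divisor x - y that the zombie transaction B ends up with. */
    static int replay() {
        x = 4; y = 2;          // initial consistent state
        int bx = x;            // B reads x = 4
        x = 8; y = 4;          // A writes x and y and commits (assumed values); B is doomed
        int by = y;            // zombie B reads y = 4
        return bx - by;        // 0, so computing 1/(x-y) would divide by zero
    }

    public static void main(String[] args) {
        int divisor = replay();
        if (divisor == 0) {
            System.out.println("zombie B would divide by zero before it ever reaches commit");
        } else {
            System.out.println("z = " + 1.0 / divisor);
        }
    }
}
```

Both states that B observed were individually valid, but the mix of old x and new y is not, and commit-time validation comes too late to prevent the fault.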
40. Solution: The Global Clock
- Have one shared global clock
  - Incremented by (small subset of) writing transactions
  - Read by all transactions
- Used to validate that state worked on is always consistent
41. Read-Only Transactions
[Figure: Mem and Locks with versions 12, 32, 56, 19, 17; private read version (RV) 100]
- Copy version clock to RV
- Read lock, V
- Read mem
- Check unlocked
- Recheck V unchanged
- Check V <= RV
Reads form a snapshot of memory. No read set!
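The read steps listed above might be coded as in the sketch below. The names VersionedCell, ReadOnlyTxn, and CLOCK are assumptions made for illustration, and the check is written as V <= RV (a location may have been written by a transaction that committed no later than the moment RV was sampled).

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical memory location with its versioned write-lock.
final class VersionedCell {
    volatile int value;
    volatile long version;    // clock value of the last commit that wrote this cell
    volatile boolean locked;  // write-lock bit
    VersionedCell(int value, long version) { this.value = value; this.version = version; }
}

final class ReadOnlyTxn {
    static final AtomicLong CLOCK = new AtomicLong();  // shared version clock
    private final long rv = CLOCK.get();               // copy version clock to RV

    int read(VersionedCell c) {
        boolean lockedBefore = c.locked;               // read lock bit and V
        long v = c.version;
        int val = c.value;                             // read mem
        // Check unlocked, recheck V unchanged, and check V <= RV.
        if (lockedBefore || c.locked || c.version != v || v > rv) {
            throw new IllegalStateException("abort: read is not part of the RV snapshot");
        }
        return val;
    }
}
```

Because every successful read is validated against the same RV, all values returned belong to one snapshot, which is why no read set has to be kept.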
42. Regular Transactions
[Figure: Mem and Locks with versions 12, 32, 56, 19, 17; private read version (RV) 100]
- Copy version clock to RV
- On read/write, check
  - Unlocked
  - V <= RV
- Add to R/W set
43. Regular Transactions
[Figure: Mem and Locks; x and y are written; the shared version clock advances from 100 to 101; private read version (RV) 100]
- Acquire locks
- WV = F&Inc(version clock)
- Check each V <= RV
- Update memory
- Set write Vs to WV
44. Hardware Transactional Memory
- Exploit cache coherence
  - Already almost does it
    - Invalidation
    - Consistency checking
- Speculative execution
  - Branch prediction = optimistic synch!
45. HW Transactional Memory
[Figure: caches, interconnect, memory; labels: read, active]
46. Transactional Memory
[Figure: caches, memory; labels: read, active, active]
47. Transactional Memory
[Figure: caches, memory; labels: active, committed, active]
48. Transactional Memory
[Figure: caches, memory; labels: write, committed, active]
49. Rewind
[Figure: caches, memory; labels: write, aborted, active, active]
50. Transaction Commit
- At commit point
  - If no cache conflicts, we win.
- Mark transactional entries
  - Read-only: valid
  - Modified: dirty (eventually written back)
- That's all, folks!
  - Except for a few details
51. Not all Skittles and Beer
- Limits to
  - Transactional cache size
  - Scheduling quantum
- Transaction cannot commit if it is
  - Too big
  - Too slow
  - Actual limits platform-dependent
52. TM Design Issues
- Implementation choices
- Language design issues
- Semantic issues
53. Granularity
- Object
  - Managed languages: Java, C#, ...
  - Easy to control interactions between transactional and non-transactional threads
- Word
  - C, C++, ...
  - Hard to control interactions between transactional and non-transactional threads
54. Direct/Deferred Update
- Deferred
  - Modify private copies, install on commit
  - Commit requires work
  - Consistency easier
- Direct
  - Modify in place, roll back on abort
  - Makes commit efficient
  - Consistency harder
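The two update policies can be contrasted with a small sketch. It deliberately leaves out locking and conflict detection and only shows where the work lands; DeferredTxn and DirectTxn are hypothetical names, and memory is modeled as a plain int array.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Deferred update: writes go to a private buffer and are installed at commit.
final class DeferredTxn {
    private final int[] mem;
    private final Map<Integer, Integer> buffer = new HashMap<>();  // private copies
    DeferredTxn(int[] mem) { this.mem = mem; }

    void write(int addr, int v) { buffer.put(addr, v); }
    int read(int addr) { return buffer.getOrDefault(addr, mem[addr]); }
    void commit() { buffer.forEach((a, v) -> mem[a] = v); buffer.clear(); }  // commit does the work
    void abort() { buffer.clear(); }                                         // abort is trivial
}

// Direct update: writes go straight to memory; old values are kept in an undo log.
final class DirectTxn {
    private final int[] mem;
    private final Deque<int[]> undoLog = new ArrayDeque<>();  // entries are {addr, oldValue}
    DirectTxn(int[] mem) { this.mem = mem; }

    void write(int addr, int v) {
        undoLog.push(new int[] {addr, mem[addr]});
        mem[addr] = v;
    }
    int read(int addr) { return mem[addr]; }
    void commit() { undoLog.clear(); }                        // commit is trivial
    void abort() {                                            // undo in reverse order
        while (!undoLog.isEmpty()) {
            int[] entry = undoLog.pop();
            mem[entry[0]] = entry[1];
        }
    }
}
```

With direct update, other threads can see uncommitted values in memory between write and abort, which is one reason consistency is harder to maintain.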
55. Conflict Detection
- Eager
  - Detect before conflict arises
  - Contention manager module resolves
- Lazy
  - Detect on commit/abort
- Mixed
  - Eager write/write, lazy read/write
56. Conflict Detection
- Eager detection may abort a transaction that could have committed.
- Lazy detection discards more computation.
57. Contention Management and Scheduling
- How to resolve conflicts?
- Who moves forward and who rolls back?
- Lots of empirical work, but formal work is in its infancy
58. Contention Manager Strategies
- Exponential backoff
- Priority to
  - Oldest?
  - Most work?
  - Non-waiting?
- None dominates
- But needed anyway
Judgment of Solomon
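As one concrete instance of the first strategy, a transaction that loses a conflict can wait a random time that grows with each consecutive abort. The sketch below is an assumption about how such a policy might be coded (Backoff and its methods are made-up names), not a contention manager from any particular STM.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.LockSupport;

// Hypothetical exponential-backoff policy for retrying aborted transactions.
final class Backoff {
    private final long maxDelayNanos;
    private long limitNanos;          // current upper bound on the random delay

    Backoff(long minDelayNanos, long maxDelayNanos) {
        if (minDelayNanos < 1 || maxDelayNanos < minDelayNanos) {
            throw new IllegalArgumentException("need 1 <= min <= max");
        }
        this.limitNanos = minDelayNanos;
        this.maxDelayNanos = maxDelayNanos;
    }

    /** Picks a random delay in [1, limit], then doubles the limit up to the maximum. */
    long nextDelayNanos() {
        long delay = ThreadLocalRandom.current().nextLong(limitNanos) + 1;
        limitNanos = Math.min(maxDelayNanos, limitNanos * 2);
        return delay;
    }

    /** Called after an abort, before the transaction is retried. */
    void backoff() {
        LockSupport.parkNanos(nextDelayNanos());
    }
}
```

Randomizing the delay keeps two conflicting transactions from retrying in lockstep; the other strategies on the slide need extra bookkeeping such as start timestamps or work counters.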
59. I/O and System Calls?
- Some I/O revocable
  - Provide transaction-safe libraries
  - Undoable file system/DB calls
- Some not
  - Opening cash drawer
  - Firing missile
60. I/O and System Calls
- One solution: make transaction irrevocable
  - If transaction tries I/O, switch to irrevocable mode.
- There can be only one
  - Requires serial execution
- No explicit aborts
  - In irrevocable transactions
63. Exceptions
int i = 0;
try {
  atomic {
    i++;
    node = new Node();   // throws OutOfMemoryException!
  }
} catch (Exception e) {
  print(i);
}
What is printed?
64. Unhandled Exceptions
- Aborts transaction
  - Preserves invariants
  - Safer
- Commits transaction
  - Like locking semantics
  - What if exception object refers to values modified in transaction?
65. Nested Transactions
atomic void foo() {
  bar();
}
atomic void bar() {
  ...
}
66. Nested Transactions
- Needed for modularity
  - Who knew that cosine() contained a transaction?
- Flat nesting
  - If child aborts, so does parent
- First-class nesting
  - If child aborts, partial rollback of child only
67. Open Nested Transactions
- Normally, child commit
  - Visible only to parent
- In open nested transactions
  - Commit visible to all
- Escape mechanism
  - Dangerous, but useful
- What escape mechanisms are needed?
68. Strong vs Weak Isolation
- How do transactional and non-transactional threads synchronize?
- Similar to memory-model theory?
- Efficient algorithms?
69. I, for one, Welcome our new Multicore Overlords
- Multicore forces us to rethink almost everything
70. I, for one, Welcome our new Multicore Overlords
- Multicore forces us to rethink almost everything
- Standard approaches too complex
71. I, for one, Welcome our new Multicore Overlords
- Multicore forces us to rethink almost everything
- Standard approaches won't scale
- Transactions might make life simpler
72. I, for one, Welcome our new Multicore Overlords
- Multicore forces us to rethink almost everything
- Standard approaches won't scale
- Transactions might make life simpler
- Plenty more to do