Title: Software Transactions: A Programming-Languages Perspective
1. Software Transactions: A Programming-Languages Perspective
- Dan Grossman
- University of Washington
- 27 March 2008
2. Atomic
- An easier-to-use and harder-to-implement primitive

// lock acquire/release
void deposit(int x) {
  synchronized(this) {
    int tmp = balance;
    tmp += x;
    balance = tmp;
  }
}

// (behave as if) no interleaved computation
void deposit(int x) {
  atomic {
    int tmp = balance;
    tmp += x;
    balance = tmp;
  }
}
3. Viewpoints
- Software transactions are good for:
- Software engineering (avoiding races and deadlocks)
- Performance (optimistic: no locks needed when there is no conflict)
- Research should be guiding:
- New hardware with transactional support
- Software support
- Semantic mismatch between language and hardware
- Prediction: hardware for the common/simple case
- May be fast enough without hardware
- Lots of nontransactional hardware exists
4. PL Perspective
- Complementary to lower-level implementation work
- Motivation
- The essence of the advantage over locks
- Language design
- Rigorous high-level semantics
- Interaction with rest of the language
- Language implementation
- Interaction with modern compilers
- New optimization needs
- Answers urgently needed for the multicore era
5. Today, part 1
- Language design, semantics
- Motivation: example and the GC analogy [OOPSLA07]
- Semantics: strong vs. weak isolation [PLDI07][POPL08]
- Interaction with other features [ICFP05][SCHEME07][POPL08]
- Joint work with Intel PSL
6. Today, part 2
- Implementation
- On one core [ICFP05][SCHEME07]
- Static optimizations for strong isolation [PLDI07]
- Multithreaded transactions
- Joint work with Intel PSL
7. Code evolution

void deposit()  { synchronized(this) { ... } }
void withdraw() { synchronized(this) { ... } }
int  balance()  { synchronized(this) { ... } }
8. Code evolution

void deposit()  { synchronized(this) { ... } }
void withdraw() { synchronized(this) { ... } }
int  balance()  { synchronized(this) { ... } }

void transfer(Acct from, int amt) {
  if(from.balance() > amt && amt < maxXfer) {
    from.withdraw(amt);
    this.deposit(amt);
  }
}
9. Code evolution

void deposit()  { synchronized(this) { ... } }
void withdraw() { synchronized(this) { ... } }
int  balance()  { synchronized(this) { ... } }

void transfer(Acct from, int amt) {
  synchronized(this) { //race
    if(from.balance() > amt && amt < maxXfer) {
      from.withdraw(amt);
      this.deposit(amt);
    }
  }
}
10. Code evolution

void deposit()  { synchronized(this) { ... } }
void withdraw() { synchronized(this) { ... } }
int  balance()  { synchronized(this) { ... } }

void transfer(Acct from, int amt) {
  synchronized(this) {
    synchronized(from) { //deadlock (still)
      if(from.balance() > amt && amt < maxXfer) {
        from.withdraw(amt);
        this.deposit(amt);
      }
    }
  }
}
11. Code evolution

void deposit()  { atomic { ... } }
void withdraw() { atomic { ... } }
int  balance()  { atomic { ... } }
12. Code evolution

void deposit()  { atomic { ... } }
void withdraw() { atomic { ... } }
int  balance()  { atomic { ... } }

void transfer(Acct from, int amt) {
  //race
  if(from.balance() > amt && amt < maxXfer) {
    from.withdraw(amt);
    this.deposit(amt);
  }
}
13. Code evolution

void deposit()  { atomic { ... } }
void withdraw() { atomic { ... } }
int  balance()  { atomic { ... } }

void transfer(Acct from, int amt) {
  atomic { //correct and parallelism-preserving!
    if(from.balance() > amt && amt < maxXfer) {
      from.withdraw(amt);
      this.deposit(amt);
    }
  }
}
14. But can we generalize?
- So transactions sure look appealing
- But what is the essence of the benefit?

Transactional Memory (TM) is to shared-memory concurrency as Garbage Collection (GC) is to memory management.
15. GC in 60 seconds
- Allocate objects in the heap
- Deallocate objects to reuse heap space
- If too soon, dangling-pointer dereferences
- If too late, poor performance / space exhaustion
- Automate deallocation via a reachability approximation
16. GC bottom line
- Established technology with widely accepted benefits
- Even though it can perform arbitrarily badly in theory
- Even though you can't always ignore how GC works (at a high level)
- Even though it is still an active research area after 40 years
- Now, about that analogy...
17. The problem, part 1
- Why memory management is hard:
- Balance correctness (avoid dangling pointers)
- And performance (space waste or exhaustion)
- Manual approaches require whole-program protocols
- Example: a manual reference count for each object (see the sketch below)
- Must avoid garbage cycles
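
A minimal OCaml sketch (ours, not from the talk) of the manual reference-counting protocol mentioned above: every owner must remember to retain and release, and a cycle of objects keeps every count above zero, so cyclic garbage is never reclaimed.

(* Hypothetical manual reference counting; names and structure are ours. *)
type 'a counted = { mutable count : int; mutable payload : 'a option }

let retain c = c.count <- c.count + 1

let release c =
  c.count <- c.count - 1;
  if c.count = 0 then c.payload <- None   (* stands in for deallocation *)

let () =
  let obj = { count = 1; payload = Some "data" } in
  retain obj;                  (* a second owner appears *)
  release obj;                 (* one owner is done *)
  release obj;                 (* last owner is done: "freed" *)
  assert (obj.payload = None)
  (* Two counted objects pointing at each other would keep each other's
     count at 1 forever: the cycle problem noted above. *)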
18. The problem, part 2
- Manual memory management is non-modular:
- Caller and callee must know what each other accesses or deallocates to ensure the right memory stays live
- A small change can require wide-scale code changes
- Correctness requires knowing what data subsequent computation will access
19. The solution
- Move the whole-program protocol into the language implementation
- One-size-fits-most, implemented by experts
- Usually inside the compiler and run-time system
- The GC system relies on subtle invariants, e.g.:
- Object header-word bits
- No unknown mature pointers to nursery objects
20. So far
21. Incomplete solution
- GC is a bad idea when "reachable" is a bad approximation of "cannot be deallocated"
- Weak pointers overcome this fundamental limitation
- Best used by experts for well-recognized idioms (e.g., software caches; see the sketch below)
- In the extreme, programmers can encode manual memory management on top of GC
- Doing so destroys most of GC's advantages
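
A minimal sketch (ours, not from the talk) of the software-cache idiom using OCaml's Weak module: the cache can still reach its entries, yet the GC may reclaim them anyway, which is exactly the case where plain reachability is too strong an approximation.

(* Hypothetical cache; slot indexing and the recompute function are ours. *)
let cache : string Weak.t = Weak.create 16    (* 16 weakly-held slots *)

let recompute i = "expensive result " ^ string_of_int i

let lookup i =
  match Weak.get cache i with
  | Some v -> v                               (* still cached *)
  | None ->                                   (* reclaimed (or never filled) *)
      let v = recompute i in
      Weak.set cache i (Some v);
      v

let () = print_endline (lookup 3)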
22. Circumventing TM

class SpinLock {
  private boolean b = false;
  void acquire() {
    while(true)
      atomic {
        if(b) continue;
        b = true;
        return;
      }
  }
  void release() {
    atomic { b = false; }
  }
}
23. It really keeps going (see the essay)
24. Lesson
- Transactional memory is to shared-memory concurrency as garbage collection is to memory management
- Huge but incomplete help for correct, efficient software
- The analogy should help guide transactions research
25. Today, part 1
- Language design, semantics
- Motivation: example and the GC analogy [OOPSLA07]
- Semantics: strong vs. weak isolation [PLDI07][POPL08] (Katherine Moore)
- Interaction with other features [ICFP05][SCHEME07][POPL08]
- Joint work with Intel PSL
26. Weak isolation

initially y == 0

Thread 1:
atomic {
  y = 1;
  x = 3;
  y = x;
}

Thread 2:
x = 2;
print(y); //1? 2? 666?

- Widespread misconception: "weak isolation violates the all-at-once property only if the corresponding lock code has a race"
- (May still be a bad thing, but smart people disagree.)
27. It's worse
- Privatization: one of several examples where lock code works and weak-isolation transactions do not

initially ptr.f == ptr.g

Thread 1:
sync(lk) {
  r = ptr;
  ptr = new C();
}
assert(r.f == r.g);

Thread 2:
sync(lk) {
  ptr.f++;
  ptr.g++;
}

(Example adapted from Rajwar/Larus and Hudson et al.)
28. It's worse
- (Almost?) every published weak-isolation system lets the assertion fail!
- Eager-update or lazy-update

initially ptr.f == ptr.g

Thread 1:
atomic {
  r = ptr;
  ptr = new C();
}
assert(r.f == r.g);

Thread 2:
atomic {
  ptr.f++;
  ptr.g++;
}
29. The need for semantics
- Which is wrong: the privatization code or the transactions implementation?
- What other gotchas exist?
- What language/coding restrictions suffice to avoid them?
- Can programmers correctly use transactions without understanding their implementation?
- What makes an implementation correct?
- Only a rigorous source-level semantics can answer these questions
30. What we did
- Formal operational semantics for a collection of similar languages that have different isolation properties
- Program state allows at most one live transaction
- Program state: a; H; e1, ..., en  -->  a'; H'; e1', ..., en'
- Multiple languages, including:
31. What we did
- Formal operational semantics for a collection of similar languages that have different isolation properties
- Program state allows at most one live transaction
- Program state: a; H; e1, ..., en  -->  a'; H'; e1', ..., en'
- Multiple languages, including:
- 1. Strong: if one thread is in a transaction, no other thread may use shared memory or enter a transaction
32. What we did
- Formal operational semantics for a collection of similar languages that have different isolation properties
- Program state allows at most one live transaction
- Program state: a; H; e1, ..., en  -->  a'; H'; e1', ..., en'
- Multiple languages, including:
- 2. Weak-1-lock: if one thread is in a transaction, no other thread may enter a transaction
33. What we did
- Formal operational semantics for a collection of similar languages that have different isolation properties
- Program state allows at most one live transaction
- Program state: a; H; e1, ..., en  -->  a'; H'; e1', ..., en'
- Multiple languages, including:
- 3. Weak-undo: like weak, plus a transaction may abort at any point, undoing its changes and restarting
34. A family
- Now we have a family of languages:
- Strong: other threads can't use memory or start transactions
- Weak-1-lock: other threads can't start transactions
- Weak-undo: like weak, plus undo/restart
- So we can study how family members differ and the conditions under which they are the same
- Oh, and we have a kooky, ooky name: The AtomsFamily
35. Easy theorems
- Theorem: every program behavior in strong is possible in weak-1-lock
- Theorem: weak-1-lock allows behaviors strong does not
- Theorem: every program behavior in weak-1-lock is possible in weak-undo
- Theorem (slightly more surprising): weak-undo allows behavior weak-1-lock does not
36. Hard theorems
- Consider a (formally defined) type system that ensures any mutable memory is either:
- Only accessed inside transactions, or
- Only accessed outside transactions (see the sketch below)
- Theorem: if a program type-checks, it has the same possible behaviors under strong and weak-1-lock
- Theorem: if a program type-checks, it has the same possible behaviors under weak-1-lock and weak-undo
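
A minimal sketch (ours, not the paper's formal calculus) of the discipline these theorems assume: every mutable location is used either only inside transactions or only outside them. Thread.atomic is AtomCaml's primitive; the stand-in below just runs the thunk so the example is self-contained.

let atomic : (unit -> 'a) -> 'a = fun thunk -> thunk ()  (* stand-in, not the real primitive *)

let tx_only  = ref 0     (* intended: accessed only inside atomic *)
let out_only = ref 0     (* intended: accessed only outside atomic *)

(* Well-typed under the partitioning discipline: strong, weak-1-lock,
   and weak-undo all give this code the same behaviors. *)
let ok () =
  atomic (fun () -> tx_only := !tx_only + 1);
  out_only := !out_only + 1

(* Rejected by the type system: reads tx_only outside any transaction.
   Mixed access like this is what lets the weak semantics diverge. *)
(* let bad () = print_int !tx_only *)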
37. A few months in 1 picture
[Diagram relating the family: strong, strong-undo, weak-1-lock, weak-undo]
38. Lesson
- Weak isolation has surprising behavior; formal semantics lets us model the behavior and prove sufficient conditions for avoiding it
- In other words: with a (too) restrictive type system, we get the semantics of strong and the performance of weak
39. Today, part 1
- Language design, semantics
- Motivation: example and the GC analogy [OOPSLA07]
- Semantics: strong vs. weak isolation [PLDI07][POPL08]
- Interaction with other features [ICFP05][SCHEME07][POPL08]
- Joint work with Intel PSL
40. What if...
- Real languages need precise semantics for all feature interactions. For example:
- Native calls [Ringenburg]
- Exceptions [Ringenburg, Kimball]
- First-class continuations [Kimball]
- Thread creation [Moore]
- Java-style class loading [Hindman]
- Open: bad interactions with the memory-consistency model
- See joint work with Manson and Pugh [MSPC06]
41. One cool ML thing
- To the front end, atomic is just a first-class function
- So yes, you can pass it around (see the sketch below)
- Like every other function, it has two run-time versions:
- For outside a transaction (start one)
- For inside a transaction (just call the thunk)

Thread.atomic : (unit -> 'a) -> 'a
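
A small OCaml sketch of the point above: because atomic is an ordinary function value, it can be stored, passed, and mapped like any other. Thread.atomic is AtomCaml's primitive, not stock OCaml, so the stand-in below simply runs the thunk.

let atomic : (unit -> 'a) -> 'a = fun thunk -> thunk ()  (* stand-in for Thread.atomic *)

(* Passing atomic around like any other first-class function. *)
let run_each (thunks : (unit -> unit) list) = List.iter atomic thunks

let () =
  run_each [ (fun () -> print_endline "first transaction");
             (fun () -> print_endline "second transaction") ]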
42. Today, part 2
- Implementation
- On one core [ICFP05][SCHEME07] (Michael Ringenburg, Aaron Kimball)
- Static optimizations for strong isolation [PLDI07]
- Multithreaded transactions
- Joint work with Intel PSL
43. Interleaved execution
- The uniprocessor (and then some) assumption: threads communicating via shared memory don't execute in true parallel
- Important special case:
- Uniprocessors still exist
- Many language implementations assume it (e.g., OCaml, Scheme48)
- Multicore may assign one core to an application
44. Implementing atomic
- Key pieces (sketched below):
- Execution of an atomic block logs writes
- If the scheduler pre-empts a thread during atomic, roll the thread back
- Duplicate code so non-atomic code is not slowed by logging
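
A minimal sketch (ours, not AtomCaml's actual runtime) of the pieces above: writes inside atomic append (cell, old value) pairs to a per-thread log, and a pre-emption rolls the thread back by restoring the old values newest-first. Reads are not logged; a real implementation also avoids duplicate entries and switches the log to a hashtable once it grows, as the next slide describes.

(* Per-thread undo log: each entry pairs a written cell with its old value. *)
let log : (int ref * int) list ref = ref []

let tx_write cell v =
  log := (cell, !cell) :: !log;      (* record the old value before overwriting *)
  cell := v

let rollback () =
  List.iter (fun (cell, old) -> cell := old) !log;  (* newest entry first *)
  log := []

let () =
  let balance = ref 100 in
  tx_write balance 150;
  tx_write balance 175;
  rollback ();                       (* scheduler pre-empted mid-atomic: undo *)
  assert (!balance = 100)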
45. Logging efficiency
- Keep the log small:
- Don't log reads (the key uniprocessor advantage)
- Need not log memory allocated after the atomic block was entered
- In particular, initialization writes
- Need not log an address more than once
- To keep logging fast, switch from an array to a hashtable after many (~50) log entries
46. Representing closures/objects
- The representation of closures is an interesting (and pervasive) design decision
- OCaml: [diagram: caller code ("add 3, push, ...") calls through a closure laid out as header, code ptr, free variables]
47. Representing closures/objects
- The representation of closures is an interesting (and pervasive) design decision
- AtomCaml: bigger closures -- and related GC changes (unnecessary with bytecodes -- but we did it anyway)
- [diagram: closure with a header, two code pointers (code ptr1 for outside atomic, code ptr2 for inside), and the free variables]
48. Representing closures/objects
- The representation of closures is an interesting (and pervasive) design decision
- AtomCaml alternative (slower calls in atomic):
- [diagram: closure keeps its original layout (header, code ptr1, free variables); the in-atomic code pointer (code ptr2) is reached indirectly]
49. Evaluation
- Strong isolation on uniprocessors at little cost
- See the papers for "in the noise" performance
- Memory-access overhead
- Recall that initialization writes need not be logged
- Rollback is rare
50. Lesson
- Implementing transactions in software for a uniprocessor is so efficient it deserves special-casing
- Note: don't run other multicore services on a uniprocessor either
51. Today, part 2
- Implementation
- On one core [ICFP05][SCHEME07]
- Static optimizations for strong isolation [PLDI07] (Steven Balensiefer, Benjamin Hindman)
- Multithreaded transactions
- Joint work with Intel PSL
52. Strong performance problem
- Recall the uniprocessor overhead
- [Chart: memory-access overhead with parallelism]
53. Optimizing away strong's cost
- Data that needs no barriers: thread-local, not accessed in a transaction, immutable
- New static analysis for "not accessed in a transaction"
54. Not-accessed-in-transaction
- Revisit the overhead of not-in-atomic code under strong isolation, given information about how data is used in atomic
- Yet another client of pointer analysis
55. Analysis details
- Whole-program, context-insensitive, flow-insensitive
- Scalable, but needs the whole program
- Can be done before method duplication
- Keeps lazy code generation without losing precision
- Given pointer information, just two more passes:
- How is an abstract object accessed transactionally?
- What abstract objects might a non-transactional access use?
56. Collaborative effort
- UW: static analysis using pointer analysis
- Via Paddle/Soot from McGill
- Intel PSL: high-performance STM
- Via compiler and run-time
- The static analysis annotates bytecodes, so the compiler back-end knows what it can omit
57. Benchmarks
[Chart: Tsp results]
58. Benchmarks
[Chart: JBB results]
59. Lesson
- The cost of strong isolation is in the nontransactional code; compiler optimizations help a lot
60. Today, part 2
- Implementation
- On one core [ICFP05][SCHEME07]
- Static optimizations for strong isolation [PLDI07]
- Multithreaded transactions (Aaron Kimball)
- Caveat: ongoing work
- Joint work with Intel PSL
61. Multithreaded transactions
- Most implementations (hardware or software) assume code inside a transaction is single-threaded
- But isolation and parallelism are orthogonal
- And Amdahl's Law will strike with manycore
- Language design: need nested transactions
- Currently modifying Microsoft's Bartok STM
- Key: correct logging without sacrificing parallelism
- Work perhaps ahead of the technology curve, like concurrent garbage collection
62. Credit
- Semantics: Katherine Moore
- Uniprocessor: Michael Ringenburg, Aaron Kimball
- Optimizations: Steven Balensiefer, Ben Hindman
- Implementing multithreaded transactions: Aaron Kimball
- Memory-model issues: Jeremy Manson, Bill Pugh
- High-performance strong STM: Tatiana Shpeisman, Vijay Menon, Ali-Reza Adl-Tabatabai, Richard Hudson, Bratin Saha

wasp.cs.washington.edu
63. Please read
- High-Level Small-Step Operational Semantics for Transactions [POPL08]. Katherine F. Moore, Dan Grossman
- The Transactional Memory / Garbage Collection Analogy [OOPSLA07]. Dan Grossman
- Software Transactions Meet First-Class Continuations [SCHEME07]. Aaron Kimball, Dan Grossman
- Enforcing Isolation and Ordering in STM [PLDI07]. Tatiana Shpeisman, Vijay Menon, Ali-Reza Adl-Tabatabai, Steve Balensiefer, Dan Grossman, Richard Hudson, Katherine F. Moore, Bratin Saha
- Atomicity via Source-to-Source Translation [MSPC06]. Benjamin Hindman, Dan Grossman
- What Do High-Level Memory Models Mean for Transactions? [MSPC06]. Dan Grossman, Jeremy Manson, William Pugh
- AtomCaml: First-Class Atomicity via Rollback [ICFP05]. Michael F. Ringenburg, Dan Grossman
64. Lessons
- Transactions: the garbage collection of shared memory
- Semantics: prove sufficient conditions for avoiding weak-isolation anomalies
- Must define the interaction with features like exceptions
- Uniprocessor implementations are worth special-casing
- Compiler optimizations help remove the overhead in nontransactional code that results from strong isolation
- Amdahl's Law suggests multithreaded transactions, which we believe we can make scalable