Title: Software Transactions: A Programming-Languages Perspective
1. Software Transactions: A Programming-Languages Perspective
- Dan Grossman
- University of Washington
- 27 March 2008
2. Atomic
- An easier-to-use and harder-to-implement primitive

// lock acquire/release
void deposit(int x) {
  synchronized(this) {
    int tmp = balance;
    tmp += x;
    balance = tmp;
  }
}

// (behave as if) no interleaved computation
void deposit(int x) {
  atomic {
    int tmp = balance;
    tmp += x;
    balance = tmp;
  }
}
3. Viewpoints
- Software transactions are good for:
- Software engineering (avoiding races and deadlocks)
- Performance (optimistic: no locks needed when there is no conflict)
- Research should be guiding:
- New hardware with transactional support
- Software support
- Semantic mismatch between language and hardware
- Prediction: hardware for the common/simple case
- May be fast enough without hardware
- Lots of nontransactional hardware exists
4. PL Perspective
- Complementary to lower-level implementation work
- Motivation
- The essence of the advantage over locks
- Language design
- Rigorous high-level semantics
- Interaction with rest of the language
- Language implementation
- Interaction with modern compilers
- New optimization needs
- Answers urgently needed for the multicore era
5. Today, part 1
- Language design, semantics
- Motivation: example and the GC analogy [OOPSLA07]
- Semantics: strong vs. weak isolation [PLDI07][POPL08]
- Interaction with other features [ICFP05][SCHEME07][POPL08]
- Joint work with Intel PSL
6. Today, part 2
- Implementation
- On one core [ICFP05][SCHEME07]
- Static optimizations for strong isolation [PLDI07]
- Multithreaded transactions
- Joint work with Intel PSL
7. Code evolution

void deposit()  { synchronized(this) { ... } }
void withdraw() { synchronized(this) { ... } }
int  balance()  { synchronized(this) { ... } }
8. Code evolution

void deposit()  { synchronized(this) { ... } }
void withdraw() { synchronized(this) { ... } }
int  balance()  { synchronized(this) { ... } }

void transfer(Acct from, int amt) {
  if(from.balance() > amt && amt < maxXfer) {
    from.withdraw(amt);
    this.deposit(amt);
  }
}
9. Code evolution

void deposit()  { synchronized(this) { ... } }
void withdraw() { synchronized(this) { ... } }
int  balance()  { synchronized(this) { ... } }

void transfer(Acct from, int amt) {
  synchronized(this) { //race
    if(from.balance() > amt && amt < maxXfer) {
      from.withdraw(amt);
      this.deposit(amt);
    }
  }
}
10. Code evolution

void deposit()  { synchronized(this) { ... } }
void withdraw() { synchronized(this) { ... } }
int  balance()  { synchronized(this) { ... } }

void transfer(Acct from, int amt) {
  synchronized(this) {
    synchronized(from) { //deadlock (still)
      if(from.balance() > amt && amt < maxXfer) {
        from.withdraw(amt);
        this.deposit(amt);
      }
    }
  }
}
11. Code evolution

void deposit()  { atomic { ... } }
void withdraw() { atomic { ... } }
int  balance()  { atomic { ... } }
12. Code evolution

void deposit()  { atomic { ... } }
void withdraw() { atomic { ... } }
int  balance()  { atomic { ... } }

void transfer(Acct from, int amt) {
  //race
  if(from.balance() > amt && amt < maxXfer) {
    from.withdraw(amt);
    this.deposit(amt);
  }
}
13. Code evolution

void deposit()  { atomic { ... } }
void withdraw() { atomic { ... } }
int  balance()  { atomic { ... } }

void transfer(Acct from, int amt) {
  atomic { //correct and parallelism-preserving!
    if(from.balance() > amt && amt < maxXfer) {
      from.withdraw(amt);
      this.deposit(amt);
    }
  }
}
14. But can we generalize?
- So transactions sure look appealing
- But what is the essence of the benefit?

Transactional Memory (TM) is to shared-memory concurrency as Garbage Collection (GC) is to memory management.
15. GC in 60 seconds
- Allocate objects in the heap
- Deallocate objects to reuse heap space
- If too soon, dangling-pointer dereferences
- If too late, poor performance / space exhaustion
- Automate deallocation via a reachability approximation
16. GC bottom line
- Established technology with widely accepted benefits
- Even though it can perform arbitrarily badly in theory
- Even though you can't always ignore how GC works (at a high level)
- Even though it is still an active research area after 40 years
- Now, about that analogy...
17. The problem, part 1
- Why memory management is hard:
- Balance correctness (avoid dangling pointers)
- And performance (space waste or exhaustion)
- Manual approaches require whole-program protocols
- Example: a manual reference count for each object (see the sketch below)
- Must avoid garbage cycles
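
A minimal OCaml sketch (ours, not from the talk) of the manual reference-counting protocol mentioned above: every owner must remember to retain and release, and a cycle of objects keeps every count above zero, so cyclic garbage is never reclaimed.

(* Hypothetical manual reference counting; names and structure are ours. *)
type 'a counted = { mutable count : int; mutable payload : 'a option }

let retain c = c.count <- c.count + 1

let release c =
  c.count <- c.count - 1;
  if c.count = 0 then c.payload <- None   (* stands in for deallocation *)

let () =
  let obj = { count = 1; payload = Some "data" } in
  retain obj;                  (* a second owner appears *)
  release obj;                 (* one owner is done *)
  release obj;                 (* last owner is done: "freed" *)
  assert (obj.payload = None)
  (* Two counted objects pointing at each other would keep each other's
     count at 1 forever: the cycle problem noted above. *)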
18. The problem, part 2
- Manual memory management is non-modular:
- Caller and callee must know what each other accesses or deallocates to ensure the right memory stays live
- A small change can require wide-scale code changes
- Correctness requires knowing what data subsequent computation will access
19. The solution
- Move the whole-program protocol into the language implementation
- One-size-fits-most, implemented by experts
- Usually inside the compiler and run-time system
- The GC system relies on subtle invariants, e.g.:
- Object header-word bits
- No unknown mature pointers to nursery objects
20. So far
21. Incomplete solution
- GC is a bad idea when "reachable" is a bad approximation of "cannot be deallocated"
- Weak pointers overcome this fundamental limitation
- Best used by experts for well-recognized idioms (e.g., software caches; see the sketch below)
- In the extreme, programmers can encode manual memory management on top of GC
- Doing so destroys most of GC's advantages
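
A minimal sketch (ours, not from the talk) of the software-cache idiom using OCaml's Weak module: the cache can still reach its entries, yet the GC may reclaim them anyway, which is exactly the case where plain reachability is too strong an approximation.

(* Hypothetical cache; slot indexing and the recompute function are ours. *)
let cache : string Weak.t = Weak.create 16    (* 16 weakly-held slots *)

let recompute i = "expensive result " ^ string_of_int i

let lookup i =
  match Weak.get cache i with
  | Some v -> v                               (* still cached *)
  | None ->                                   (* reclaimed (or never filled) *)
      let v = recompute i in
      Weak.set cache i (Some v);
      v

let () = print_endline (lookup 3)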
22. Circumventing TM

class SpinLock {
  private boolean b = false;
  void acquire() {
    while(true)
      atomic {
        if(b) continue;
        b = true;
        return;
      }
  }
  void release() {
    atomic { b = false; }
  }
}
23. It really keeps going (see the essay)
24. Lesson
- Transactional memory is to shared-memory concurrency as garbage collection is to memory management
- Huge but incomplete help for correct, efficient software
- The analogy should help guide transactions research
25. Today, part 1
- Language design, semantics
- Motivation: example and the GC analogy [OOPSLA07]
- Semantics: strong vs. weak isolation [PLDI07][POPL08] (Katherine Moore)
- Interaction with other features [ICFP05][SCHEME07][POPL08]
- Joint work with Intel PSL
26. Weak isolation

initially y == 0

Thread 1:
atomic {
  y = 1;
  x = 3;
  y = x;
}

Thread 2:
x = 2;
print(y); //1? 2? 666?

- Widespread misconception: "weak isolation violates the all-at-once property only if the corresponding lock code has a race"
- (May still be a bad thing, but smart people disagree.)
27. It's worse
- Privatization: one of several examples where lock code works and weak-isolation transactions do not

initially ptr.f == ptr.g

Thread 1:
sync(lk) {
  r = ptr;
  ptr = new C();
}
assert(r.f == r.g);

Thread 2:
sync(lk) {
  ptr.f++;
  ptr.g++;
}

(Example adapted from Rajwar/Larus and Hudson et al.)
28. It's worse
- (Almost?) every published weak-isolation system lets the assertion fail!
- Eager-update or lazy-update

initially ptr.f == ptr.g

Thread 1:
atomic {
  r = ptr;
  ptr = new C();
}
assert(r.f == r.g);

Thread 2:
atomic {
  ptr.f++;
  ptr.g++;
}
29. The need for semantics
- Which is wrong: the privatization code or the transactions implementation?
- What other gotchas exist?
- What language/coding restrictions suffice to avoid them?
- Can programmers correctly use transactions without understanding their implementation?
- What makes an implementation correct?
- Only a rigorous source-level semantics can answer these questions
30. What we did
- Formal operational semantics for a collection of similar languages that have different isolation properties
- Program state allows at most one live transaction
- Program state: a; H; e1, ..., en  -->  a'; H'; e1', ..., en'
- Multiple languages, including:
31. What we did
- Formal operational semantics for a collection of similar languages that have different isolation properties
- Program state allows at most one live transaction
- Program state: a; H; e1, ..., en  -->  a'; H'; e1', ..., en'
- Multiple languages, including:
- 1. Strong: if one thread is in a transaction, no other thread may use shared memory or enter a transaction
32. What we did
- Formal operational semantics for a collection of similar languages that have different isolation properties
- Program state allows at most one live transaction
- Program state: a; H; e1, ..., en  -->  a'; H'; e1', ..., en'
- Multiple languages, including:
- 2. Weak-1-lock: if one thread is in a transaction, no other thread may enter a transaction
33. What we did
- Formal operational semantics for a collection of similar languages that have different isolation properties
- Program state allows at most one live transaction
- Program state: a; H; e1, ..., en  -->  a'; H'; e1', ..., en'
- Multiple languages, including:
- 3. Weak-undo: like weak, plus a transaction may abort at any point, undoing its changes and restarting
34. A family
- Now we have a family of languages:
- Strong: other threads can't use memory or start transactions
- Weak-1-lock: other threads can't start transactions
- Weak-undo: like weak, plus undo/restart
- So we can study how family members differ and the conditions under which they are the same
- Oh, and we have a kooky, ooky name: The AtomsFamily
35. Easy theorems
- Theorem: every program behavior in strong is possible in weak-1-lock
- Theorem: weak-1-lock allows behaviors strong does not
- Theorem: every program behavior in weak-1-lock is possible in weak-undo
- Theorem (slightly more surprising): weak-undo allows behavior weak-1-lock does not
36. Hard theorems
- Consider a (formally defined) type system that ensures any mutable memory is either:
- Only accessed inside transactions, or
- Only accessed outside transactions (see the sketch below)
- Theorem: if a program type-checks, it has the same possible behaviors under strong and weak-1-lock
- Theorem: if a program type-checks, it has the same possible behaviors under weak-1-lock and weak-undo
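
A minimal sketch (ours, not the paper's formal calculus) of the discipline these theorems assume: every mutable location is used either only inside transactions or only outside them. Thread.atomic is AtomCaml's primitive; the stand-in below just runs the thunk so the example is self-contained.

let atomic : (unit -> 'a) -> 'a = fun thunk -> thunk ()  (* stand-in, not the real primitive *)

let tx_only  = ref 0     (* intended: accessed only inside atomic *)
let out_only = ref 0     (* intended: accessed only outside atomic *)

(* Well-typed under the partitioning discipline: strong, weak-1-lock,
   and weak-undo all give this code the same behaviors. *)
let ok () =
  atomic (fun () -> tx_only := !tx_only + 1);
  out_only := !out_only + 1

(* Rejected by the type system: reads tx_only outside any transaction.
   Mixed access like this is what lets the weak semantics diverge. *)
(* let bad () = print_int !tx_only *)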
37. A few months in 1 picture
[Diagram relating the family: strong, strong-undo, weak-1-lock, weak-undo]
38. Lesson
- Weak isolation has surprising behavior; formal semantics lets us model the behavior and prove sufficient conditions for avoiding it
- In other words: with a (too) restrictive type system, we get the semantics of strong and the performance of weak
39. Today, part 1
- Language design, semantics
- Motivation: example and the GC analogy [OOPSLA07]
- Semantics: strong vs. weak isolation [PLDI07][POPL08]
- Interaction with other features [ICFP05][SCHEME07][POPL08]
- Joint work with Intel PSL
40. What if...
- Real languages need precise semantics for all feature interactions. For example:
- Native calls [Ringenburg]
- Exceptions [Ringenburg, Kimball]
- First-class continuations [Kimball]
- Thread creation [Moore]
- Java-style class loading [Hindman]
- Open: bad interactions with the memory-consistency model
- See joint work with Manson and Pugh [MSPC06]
41. One cool ML thing
- To the front end, atomic is just a first-class function
- So yes, you can pass it around (see the sketch below)
- Like every other function, it has two run-time versions:
- For outside a transaction (start one)
- For inside a transaction (just call the thunk)

Thread.atomic : (unit -> 'a) -> 'a
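
A small OCaml sketch of the point above: because atomic is an ordinary function value, it can be stored, passed, and mapped like any other. Thread.atomic is AtomCaml's primitive, not stock OCaml, so the stand-in below simply runs the thunk.

let atomic : (unit -> 'a) -> 'a = fun thunk -> thunk ()  (* stand-in for Thread.atomic *)

(* Passing atomic around like any other first-class function. *)
let run_each (thunks : (unit -> unit) list) = List.iter atomic thunks

let () =
  run_each [ (fun () -> print_endline "first transaction");
             (fun () -> print_endline "second transaction") ]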
42. Today, part 2
- Implementation
- On one core [ICFP05][SCHEME07] (Michael Ringenburg, Aaron Kimball)
- Static optimizations for strong isolation [PLDI07]
- Multithreaded transactions
- Joint work with Intel PSL
43. Interleaved execution
- The uniprocessor (and then some) assumption: threads communicating via shared memory don't execute in true parallel
- Important special case:
- Uniprocessors still exist
- Many language implementations assume it (e.g., OCaml, Scheme48)
- Multicore may assign one core to an application
44. Implementing atomic
- Key pieces (sketched below):
- Execution of an atomic block logs writes
- If the scheduler pre-empts a thread during atomic, roll the thread back
- Duplicate code so non-atomic code is not slowed by logging
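
A minimal sketch (ours, not AtomCaml's actual runtime) of the pieces above: writes inside atomic append (cell, old value) pairs to a per-thread log, and a pre-emption rolls the thread back by restoring the old values newest-first. Reads are not logged; a real implementation also avoids duplicate entries and switches the log to a hashtable once it grows, as the next slide describes.

(* Per-thread undo log: each entry pairs a written cell with its old value. *)
let log : (int ref * int) list ref = ref []

let tx_write cell v =
  log := (cell, !cell) :: !log;      (* record the old value before overwriting *)
  cell := v

let rollback () =
  List.iter (fun (cell, old) -> cell := old) !log;  (* newest entry first *)
  log := []

let () =
  let balance = ref 100 in
  tx_write balance 150;
  tx_write balance 175;
  rollback ();                       (* scheduler pre-empted mid-atomic: undo *)
  assert (!balance = 100)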
45. Logging efficiency
- Keep the log small:
- Don't log reads (the key uniprocessor advantage)
- Need not log memory allocated after the atomic block was entered
- In particular, initialization writes
- Need not log an address more than once
- To keep logging fast, switch from an array to a hashtable after many (~50) log entries
46. Representing closures/objects
- The representation of closures is an interesting (and pervasive) design decision
- OCaml: [diagram: caller code ("add 3, push, ...") calls through a closure laid out as header, code ptr, free variables]
47. Representing closures/objects
- The representation of closures is an interesting (and pervasive) design decision
- AtomCaml: bigger closures -- and related GC changes (unnecessary with bytecodes -- but we did it anyway)
- [diagram: closure with a header, two code pointers (code ptr1 for outside atomic, code ptr2 for inside), and the free variables]
48. Representing closures/objects
- The representation of closures is an interesting (and pervasive) design decision
- AtomCaml alternative (slower calls in atomic):
- [diagram: closure keeps its original layout (header, code ptr1, free variables); the in-atomic code pointer (code ptr2) is reached indirectly]
49. Evaluation
- Strong isolation on uniprocessors at little cost
- See the papers for "in the noise" performance
- Memory-access overhead
- Recall that initialization writes need not be logged
- Rollback is rare
50. Lesson
- Implementing transactions in software for a uniprocessor is so efficient it deserves special-casing
- Note: don't run other multicore services on a uniprocessor either
51. Today, part 2
- Implementation
- On one core [ICFP05][SCHEME07]
- Static optimizations for strong isolation [PLDI07] (Steven Balensiefer, Benjamin Hindman)
- Multithreaded transactions
- Joint work with Intel PSL
52. Strong performance problem
- Recall the uniprocessor overhead
- [Chart: memory-access overhead with parallelism]
53. Optimizing away strong's cost
- Data that needs no barriers: thread-local, not accessed in a transaction, immutable
- New static analysis for "not accessed in a transaction"
54. Not-accessed-in-transaction
- Revisit the overhead of not-in-atomic code under strong isolation, given information about how data is used in atomic
- Yet another client of pointer analysis
55. Analysis details
- Whole-program, context-insensitive, flow-insensitive
- Scalable, but needs the whole program
- Can be done before method duplication
- Keeps lazy code generation without losing precision
- Given pointer information, just two more passes:
- How is an abstract object accessed transactionally?
- What abstract objects might a non-transactional access use?
56. Collaborative effort
- UW: static analysis using pointer analysis
- Via Paddle/Soot from McGill
- Intel PSL: high-performance STM
- Via compiler and run-time
- The static analysis annotates bytecodes, so the compiler back-end knows what it can omit
57. Benchmarks
[Chart: Tsp results]
58. Benchmarks
[Chart: JBB results]
59. Lesson
- The cost of strong isolation is in the nontransactional code; compiler optimizations help a lot
60. Today, part 2
- Implementation
- On one core [ICFP05][SCHEME07]
- Static optimizations for strong isolation [PLDI07]
- Multithreaded transactions (Aaron Kimball)
- Caveat: ongoing work
- Joint work with Intel PSL
61. Multithreaded transactions
- Most implementations (hardware or software) assume code inside a transaction is single-threaded
- But isolation and parallelism are orthogonal
- And Amdahl's Law will strike with manycore
- Language design: need nested transactions
- Currently modifying Microsoft's Bartok STM
- Key: correct logging without sacrificing parallelism
- Work perhaps ahead of the technology curve, like concurrent garbage collection
62. Credit
- Semantics: Katherine Moore
- Uniprocessor: Michael Ringenburg, Aaron Kimball
- Optimizations: Steven Balensiefer, Ben Hindman
- Implementing multithreaded transactions: Aaron Kimball
- Memory-model issues: Jeremy Manson, Bill Pugh
- High-performance strong STM: Tatiana Shpeisman, Vijay Menon, Ali-Reza Adl-Tabatabai, Richard Hudson, Bratin Saha

wasp.cs.washington.edu
63. Please read
- High-Level Small-Step Operational Semantics for Transactions [POPL08]. Katherine F. Moore, Dan Grossman
- The Transactional Memory / Garbage Collection Analogy [OOPSLA07]. Dan Grossman
- Software Transactions Meet First-Class Continuations [SCHEME07]. Aaron Kimball, Dan Grossman
- Enforcing Isolation and Ordering in STM [PLDI07]. Tatiana Shpeisman, Vijay Menon, Ali-Reza Adl-Tabatabai, Steve Balensiefer, Dan Grossman, Richard Hudson, Katherine F. Moore, Bratin Saha
- Atomicity via Source-to-Source Translation [MSPC06]. Benjamin Hindman, Dan Grossman
- What Do High-Level Memory Models Mean for Transactions? [MSPC06]. Dan Grossman, Jeremy Manson, William Pugh
- AtomCaml: First-Class Atomicity via Rollback [ICFP05]. Michael F. Ringenburg, Dan Grossman
64. Lessons
- Transactions: the garbage collection of shared memory
- Semantics: prove sufficient conditions for avoiding weak-isolation anomalies
- Must define the interaction with features like exceptions
- Uniprocessor implementations are worth special-casing
- Compiler optimizations help remove the overhead in nontransactional code that results from strong isolation
- Amdahl's Law suggests multithreaded transactions, which we believe we can make scalable