Title: Optimistic Intra-Transaction Parallelism on Chip-Multiprocessors
Optimistic Intra-Transaction Parallelism on Chip-Multiprocessors
- Chris Colohan¹, Anastassia Ailamaki¹, J. Gregory Steffan², and Todd C. Mowry¹,³
- ¹Carnegie Mellon University
- ²University of Toronto
- ³Intel Research Pittsburgh
Chip Multiprocessors are Here!
- Examples: AMD Opteron, IBM Power 5, Intel Yonah
- 2 cores now; soon 4, 8, 16, or 32
- Multiple threads per core
- How do we best use them?
Multi-Core Enhances Throughput
[Figure: users sending transactions to a database server]
Cores can run concurrent transactions and improve throughput.
Multi-Core Enhances Throughput
Can multiple cores improve transaction latency?
Parallelizing transactions
    SELECT cust_info FROM customer
    UPDATE district WITH order_id
    INSERT order_id INTO new_order
    foreach(item)
        GET quantity FROM stock
        quantity--
        UPDATE stock WITH quantity
        INSERT item INTO order_line
- Intra-query parallelism
  - Used for long-running queries (decision support)
  - Does not work for short queries
    - Short queries dominate in commercial workloads
Parallelizing transactions
- Intra-transaction parallelism
  - Each thread spans multiple queries
- Hard to add to existing systems!
  - Need to change interface, add latches and locks, worry about correctness of parallel execution
Parallelizing transactions
- Intra-transaction parallelism
  - Breaks transaction into threads
- Hard to add to existing systems!
  - Need to change interface, add latches and locks, worry about correctness of parallel execution
Thread Level Speculation (TLS) makes parallelization easier.
Thread Level Speculation (TLS)
[Figure: the same code executed sequentially and in parallel; the code accesses locations p and q]
Thread Level Speculation (TLS)
- Use epochs
- Detect violations
- Restart to recover
- Buffer state
- Worst case: sequential
- Best case: fully parallel
[Figure: Epoch 1 and Epoch 2 executed sequentially and in parallel; a dependence through p causes a violation]
Data dependences limit performance.
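A minimal software model of the four mechanisms above (epochs, violation detection, restart, buffered state), assuming a toy memory of eight words. Every name here (`epoch_t`, `spec_load`, `spec_store`, `commit`) is hypothetical and only stands in for what the hardware does: loads record which words an epoch has read, a store by an earlier epoch to such a word marks the later epoch violated, and restarting simply discards the epoch's buffered state.

```c
#include <stdbool.h>
#include <string.h>

#define MEM_WORDS 8

/* Hypothetical per-epoch speculative state. */
typedef struct {
    int  buffered[MEM_WORDS];  /* speculative values stored by this epoch */
    bool written[MEM_WORDS];   /* words this epoch has stored to */
    bool loaded[MEM_WORDS];    /* words read before this epoch wrote them */
    bool violated;             /* set when an earlier epoch breaks a read */
} epoch_t;

static int committed_mem[MEM_WORDS];  /* what the rest of the system sees */

/* Start (or restart) an epoch with empty buffers. */
void epoch_init(epoch_t *e) { memset(e, 0, sizeof *e); }

/* Load by epoch `me`: own buffer first, then the nearest earlier epoch's
   buffer, then committed memory. Exposed reads are recorded. */
int spec_load(epoch_t *epochs, int me, int addr) {
    if (epochs[me].written[addr])
        return epochs[me].buffered[addr];
    epochs[me].loaded[addr] = true;
    for (int e = me - 1; e >= 0; e--)
        if (epochs[e].written[addr])
            return epochs[e].buffered[addr];
    return committed_mem[addr];
}

/* Store by epoch `me` (of `n` epochs): buffer the value and flag any later
   epoch that already read this word (a read-after-write violation). */
void spec_store(epoch_t *epochs, int n, int me, int addr, int value) {
    epochs[me].buffered[addr] = value;
    epochs[me].written[addr] = true;
    for (int e = me + 1; e < n; e++)
        if (epochs[e].loaded[addr])
            epochs[e].violated = true;
}

/* Commit in program order: make the epoch's buffered stores visible. */
void commit(epoch_t *e) {
    for (int a = 0; a < MEM_WORDS; a++)
        if (e->written[a])
            committed_mem[a] = e->buffered[a];
}
```

For example, if epoch 2 loads p before epoch 1 stores to p, `spec_store` flags epoch 2, which is re-initialized and re-executed. Epochs that touch disjoint words never violate (the fully parallel best case); if every epoch is violated, execution degrades to sequential.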
A Coordinated Effort
- Transactions: TPC-C
- DBMS: BerkeleyDB
- Hardware: simulated machine
A Coordinated Effort
- Transaction programmer: choose epoch boundaries
- DBMS programmer: remove performance bottlenecks
- Hardware developer: add TLS support to architecture
So what's new?
- Intra-transaction parallelism
  - Without changing the transactions
  - With minor changes to the DBMS
  - Without having to worry about locking
  - Without introducing concurrency bugs
  - With good performance
    - Halve transaction latency on four cores
Related Work
- Optimistic Concurrency Control (Kung 82)
- Sagas (Molina & Salem 87)
- Transaction chopping (Shasha 95)
Outline
- Introduction
- Related work
- Dividing transactions into epochs
- Removing bottlenecks in the DBMS
- Results
- Conclusions
Case Study: New Order (TPC-C)
    GET cust_info FROM customer
    UPDATE district WITH order_id
    INSERT order_id INTO new_order
    foreach(item)
        GET quantity FROM stock WHERE i_id=item
        UPDATE stock WITH quantity-1 WHERE i_id=item
        INSERT item INTO order_line
- The only dependence between iterations is through the quantity field
- Very unlikely to occur (1/100,000)
Case Study: New Order (TPC-C)
Parallelized with TLS (foreach becomes TLS_foreach; nothing else changes):
    GET cust_info FROM customer
    UPDATE district WITH order_id
    INSERT order_id INTO new_order
    TLS_foreach(item)
        GET quantity FROM stock WHERE i_id=item
        UPDATE stock WITH quantity-1 WHERE i_id=item
        INSERT item INTO order_line
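A minimal C model of the New Order loop, to show where the one dependence comes from. It is a hypothetical in-memory stand-in (the `stock_quantity` array and the function names are illustrative, not BerkeleyDB or TPC-C code): each iteration reads and decrements one stock row, so two iterations conflict only when the order contains the same item twice.

```c
#include <stdbool.h>

#define NUM_ITEMS 100000  /* TPC-C stock rows per warehouse */

static int stock_quantity[NUM_ITEMS];

/* One loop iteration (one epoch under TLS_foreach): a read-modify-write
   of a single stock row. The order_line insert is omitted here. */
void process_item(int item_id) {
    int quantity = stock_quantity[item_id];  /* GET quantity ... WHERE i_id=item */
    stock_quantity[item_id] = quantity - 1;  /* UPDATE stock WITH quantity-1 */
}

/* Sequential version of the loop; TLS would run each call as an epoch. */
void new_order_items(const int *items, int n) {
    for (int i = 0; i < n; i++)
        process_item(items[i]);
}

/* True if a later iteration touches a stock row an earlier one writes,
   i.e. if TLS could see a violation on the quantity field. */
bool iterations_conflict(const int *items, int n) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (items[i] == items[j])
                return true;
    return false;
}
```

When `iterations_conflict` returns false, no epoch reads a stock row that an earlier epoch writes, so the loop can run fully in parallel; because items are drawn from 100,000 rows, a repeated item is rare.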
Dependences in DBMS
- Dependences serialize execution!
- Performance tuning:
  - Profile execution
  - Remove bottleneck dependence
  - Repeat
Buffer Pool Management
[Figure: a CPU calls get_page(5) and put_page(5); the buffer pool's reference count (ref) for page 5 goes to 1 and back to 0]
Buffer Pool Management
[Figure: two epochs each call get_page(5) and put_page(5) on the same buffer pool]
TLS ensures the first epoch gets the page first. Who cares?
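Why would two epochs calling get_page(5) interact at all? A minimal sketch, assuming a simplified buffer pool (the names `frame_t`, `bp_get_page`, and `bp_put_page` are hypothetical, not BerkeleyDB's API): both calls read and then write the page's reference count, so a later epoch's get_page(5) reads a value an earlier epoch writes. That is a read-after-write dependence even though neither epoch modifies the page's contents.

```c
#include <stddef.h>

#define POOL_SIZE 16

typedef struct {
    int page_id;  /* page held in this frame, or -1 if empty */
    int ref;      /* pin count: outstanding get_page calls */
} frame_t;

static frame_t pool[POOL_SIZE];

void bp_init(void) {
    for (int i = 0; i < POOL_SIZE; i++) {
        pool[i].page_id = -1;
        pool[i].ref = 0;
    }
}

/* Pin a page: find (or load) its frame and increment ref.
   The increment is a read-modify-write of a shared word. */
frame_t *bp_get_page(int page_id) {
    frame_t *free_frame = NULL;
    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool[i].page_id == page_id) {
            pool[i].ref++;
            return &pool[i];
        }
        if (free_frame == NULL && pool[i].page_id == -1)
            free_frame = &pool[i];
    }
    if (free_frame == NULL)
        return NULL;               /* no eviction in this sketch */
    free_frame->page_id = page_id; /* disk read omitted */
    free_frame->ref = 1;
    return free_frame;
}

/* Unpin a page: another read-modify-write of the same ref word. */
void bp_put_page(frame_t *f) {
    f->ref--;
}
```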
Buffer Pool Management
- Escape speculation
- Invoke operation
- Store undo function
- Resume speculation
[Figure: get_page(5) and put_page(5) calls from several epochs on the same buffer pool]
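The four steps above can be sketched as follows, assuming a hypothetical per-epoch undo log. `escape_speculation()` and `resume_speculation()` are placeholders for the hardware mechanism and do nothing here, and a plain array of reference counts stands in for the buffer pool. get_page runs non-speculatively, so its update of the reference count cannot cause a violation, and put_page is recorded as its undo function.

```c
#define MAX_UNDO 32
#define NUM_PAGES 8

static int ref[NUM_PAGES];  /* simplified buffer pool: one pin count per page */

typedef void (*undo_fn)(int arg);
typedef struct { undo_fn fn; int arg; } undo_entry;

/* Hypothetical per-epoch undo log. */
static undo_entry undo_log[MAX_UNDO];
static int undo_len;

static void escape_speculation(void) { /* placeholder: stop tracking accesses */ }
static void resume_speculation(void) { /* placeholder: resume tracking */ }

void get_page(int id) { ref[id]++; }
void put_page(int id) { ref[id]--; }

/* Escape, invoke, store undo function, resume. Returns -1 if the log is full. */
int spec_get_page(int id) {
    if (undo_len == MAX_UNDO)
        return -1;                      /* not handled in this sketch */
    escape_speculation();
    get_page(id);                       /* non-speculative: no violation on ref[] */
    undo_log[undo_len].fn = put_page;   /* put_page undoes get_page */
    undo_log[undo_len].arg = id;
    undo_len++;
    resume_speculation();
    return 0;
}

/* Violation: run undo functions in reverse order before restarting. */
void on_violation(void) {
    while (undo_len > 0) {
        undo_len--;
        undo_log[undo_len].fn(undo_log[undo_len].arg);
    }
}

/* Commit: the operations stand, so the undo log is discarded. */
void on_commit(void) { undo_len = 0; }
```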
Buffer Pool Management
[Figure: several epochs call get_page(5); one epoch calls put_page(5) speculatively]
put_page(5) is not undoable!
Buffer Pool Management
[Figure: several epochs call get_page(5); put_page(5) happens at the end of the epoch]
- Delay put_page until end of epoch
  - Avoid dependence
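Delaying put_page can be sketched with a hypothetical per-epoch queue of pending releases (the names are illustrative). The speculative epoch only records the release; the reference count is decremented when the epoch commits and becomes non-speculative, so put_page never creates a dependence, and a violated epoch just discards the queue.

```c
#define MAX_DEFERRED 32
#define NUM_PAGES 8

static int ref[NUM_PAGES];  /* simplified pin counts */

/* Hypothetical per-epoch queue of put_page calls delayed until commit. */
static int deferred[MAX_DEFERRED];
static int n_deferred;

void put_page(int id) { ref[id]--; }

/* Called by speculative code instead of put_page: only remember the release.
   Returns -1 if the queue is full. */
int spec_put_page(int id) {
    if (n_deferred == MAX_DEFERRED)
        return -1;
    deferred[n_deferred++] = id;
    return 0;
}

/* Epoch is now non-speculative: perform the delayed releases. */
void epoch_commit(void) {
    for (int i = 0; i < n_deferred; i++)
        put_page(deferred[i]);
    n_deferred = 0;
}

/* Epoch violated: the releases never happened, so drop them. */
void epoch_violated(void) { n_deferred = 0; }
```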
Removing Bottleneck Dependences
- We introduce three techniques:
  - Delay operations until non-speculative
    - Mutex and lock acquire and release
    - Buffer pool, memory, and cursor release
    - Log sequence number assignment
  - Escape speculation
    - Buffer pool, memory, and cursor allocation
  - Traditional parallelization
    - Memory allocation, cursor pool, error checks, false sharing
Experimental Setup
- Detailed simulation
  - Superscalar, out-of-order, 128-entry reorder buffer
  - Memory hierarchy modeled in detail
- TPC-C transactions on BerkeleyDB
  - In-core database
  - Single user
  - Single warehouse
  - Measure interval of 100 transactions
  - Measuring latency, not throughput
Optimizing the DBMS: New Order
[Chart: New Order execution time, normalized to sequential, as DBMS bottlenecks are removed]
- 26% improvement
- Other CPUs not helping
- Can't optimize much more
- Cache misses increase
Optimizing the DBMS: New Order
[Chart: New Order execution time, normalized to sequential]
This process took me 30 days and fewer than 1,200 lines of code.
Other TPC-C Transactions
[Chart: normalized time for New Order, Delivery, Stock Level, Payment, and Order Status, broken down into Busy, Cache Miss, Failed, and Idle CPU]
Conclusions
- A new form of parallelism for databases
- Tool for attacking transaction latency
- Intra-transaction parallelism
- Without major changes to DBMS
- TLS can be applied to more than transactions
- Halve transaction latency by using 4 CPUs
Any questions?
- For more information, see
- www.colohan.com
Backup Slides Follow
TPC-C Transactions on 2 CPUs
[Chart: normalized time for New Order, Delivery, Stock Level, Payment, and Order Status, broken down into Busy, Cache Miss, Failed, and Idle CPU]
LATCHES
Latches
- Mutual exclusion between transactions
- Cause violations between epochs
  - Read-test-write cycle → RAW
- Not needed between epochs
  - TLS already provides mutual exclusion!
Latches: Aggressive Acquire
- Acquire, latch_cnt++, work, latch_cnt--
- latch_cnt++, work, (enqueue release)
- latch_cnt++, work, (enqueue release)
- Commit, work, latch_cnt--
- Commit, work, latch_cnt--, Release
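One way to read the aggressive-acquire timeline above, sketched with hypothetical names: the transaction acquires the real latch once, `latch_cnt` counts the epochs still using it, and each epoch's release (performed at commit for speculative epochs) decrements the count; the real latch is released when the count reaches zero. The sketch assumes these counter updates run non-speculatively and models the latch as a POSIX mutex.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical transaction-level view of one DBMS latch. */
typedef struct {
    pthread_mutex_t latch;  /* the latch shared with other transactions */
    int latch_cnt;          /* epochs of this transaction using the latch */
    bool held;              /* does this transaction hold the real latch? */
} txn_latch_t;

void txn_latch_init(txn_latch_t *l) {
    pthread_mutex_init(&l->latch, NULL);
    l->latch_cnt = 0;
    l->held = false;
}

/* Epoch enters the critical section: only the first one really acquires. */
void epoch_latch_acquire(txn_latch_t *l) {
    if (!l->held) {
        pthread_mutex_lock(&l->latch);
        l->held = true;
    }
    l->latch_cnt++;
}

/* Epoch's (possibly enqueued) release: the last epoch out really releases. */
void epoch_latch_release(txn_latch_t *l) {
    if (--l->latch_cnt == 0) {
        pthread_mutex_unlock(&l->latch);
        l->held = false;
    }
}
```

Other transactions still see ordinary mutual exclusion, while epochs of the same transaction never contend for the latch, since TLS already orders them.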
Latches: Lazy Acquire
- Acquire, work, Release
- (enqueue acquire), work, (enqueue release)
- (enqueue acquire), work, (enqueue release)
- Acquire, Commit, work, Release
- Acquire, Commit, work, Release
HARDWARE
TLS in Database Systems
- Large epochs
  - More dependences
    - Must tolerate
  - More state
    - Bigger buffers
[Figure: epoch sizes in non-database TLS compared with TLS in database systems]
Feedback Loop
    for() do_work()
Violations Feedback
[Figure: sequential and parallel execution; a dependence through p causes a violation]
Eliminating Violations
[Figure: example addresses 0x0FD8 → 0xFD20 and 0x0FC0 → 0xFC18]
Tolerating Violations: Sub-epochs
[Figure: an epoch divided into sub-epochs; a violation involving q]
Sub-epochs
- Started periodically by hardware
  - How many?
  - When to start?
- Hardware implementation
  - Just like epochs
    - Use more epoch contexts
  - No need to check violations between sub-epochs within an epoch
[Figure: a violation involving q within an epoch divided into sub-epochs]
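The benefit of sub-epochs is that a violation rewinds only to the start of the sub-epoch containing the violated load, not to the start of the whole epoch. A small arithmetic sketch, assuming evenly spaced sub-epochs (a later slide reports that even spacing is good enough) and measuring progress in instructions; `restart_point` and `wasted_work` are illustrative names.

```c
/* Start of the sub-epoch containing position `load_pos` in an epoch of
   `epoch_len` instructions split into `n_sub` evenly spaced sub-epochs. */
long restart_point(long epoch_len, int n_sub, long load_pos) {
    long spacing = (epoch_len + n_sub - 1) / n_sub;  /* ceiling division */
    return (load_pos / spacing) * spacing;
}

/* Instructions re-executed when the violated load was at `load_pos`
   and the violation is detected at `detect_pos`. */
long wasted_work(long epoch_len, int n_sub, long load_pos, long detect_pos) {
    return detect_pos - restart_point(epoch_len, n_sub, load_pos);
}
```

With illustrative numbers: for a 10,000-instruction epoch whose violated load is at instruction 7,000 and is detected at 9,000, a single sub-epoch re-executes 9,000 instructions, while eight evenly spaced sub-epochs rewind to 6,250 and re-execute 2,750.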
Old TLS Design
[Figure: four CPUs, each with a private L1 cache, sharing an L2 cache and the rest of the memory system]
- Buffer speculative state in write-back L1 cache
- Detect violations through invalidations
- Restart by invalidating speculative lines
- Rest of system only sees committed data
- Problems:
  - L1 cache not large enough
  - Later epochs only get values on commit
New Cache Design
[Figure: four CPUs with private L1 caches, two L2 caches, and the rest of the memory system]
- Speculative writes immediately visible to L2 (and later epochs)
- Restart by invalidating speculative lines
- Buffer speculative and non-speculative state for all epochs in L2
- Detect violations at lookup time
- Invalidation coherence between L2 caches
New Features
[Figure: four CPUs with private L1 caches, two L2 caches, and the rest of the memory system; new features highlighted]
- Speculative state in L1 and L2 cache
- Cache line replication (versions)
- Data dependence tracking within cache
- Speculative victim cache
Scaling
[Chart: Time (normalized)]
Evaluating a 4-CPU System
[Chart: normalized time for the Sequential, TLS Seq, No Sub-epoch, Baseline, and No Speculation configurations]
- Sequential: original benchmark run on 1 CPU
- TLS Seq: parallelized benchmark run on 1 CPU
- No Sub-epoch: without sub-epoch support
- Baseline: parallel execution
- No Speculation: ignore violations (Amdahl's Law limit)
Sub-epochs: How many/How big?
- Supporting more sub-epochs is better
- Spacing depends on location of violations
- Even spacing is good enough
Query Execution
- Actions taken by a query:
  - Bring pages into buffer pool
  - Acquire and release latches and locks
  - Allocate/free memory
  - Allocate/free and use cursors
  - Use B-trees
  - Generate log entries
These generate violations.
Applying TLS
1. Parallelize loop
2. Run benchmark
3. Remove bottleneck
4. Go to 2
Outline
- Transaction Programmer
- DBMS Programmer
- Hardware Developer
TLS Execution
[Figure: step-by-step TLS execution of epochs accessing p, q, s, and t; a dependence on p causes a violation]
Replication
[Figure: epochs accessing p and q in the same cache line; a violation involving p]
Can't invalidate a line if it contains two epochs' changes
- Makes epochs independent
- Enables sub-epochs
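Replication keeps a separate version of a line for each epoch that writes it, so a violated epoch's changes can be thrown away without destroying another epoch's. A minimal sketch, assuming one word per line and a small fixed number of epoch contexts (`versioned_line_t`, `vl_load`, `vl_store`, `vl_squash`, and `vl_commit` are hypothetical names, not the hardware design): a load by epoch e returns the version written by the nearest epoch at or before e.

```c
#include <stdbool.h>

#define MAX_EPOCHS 4

/* One cache line (modeled as a single word) with a version per epoch context. */
typedef struct {
    int  committed;               /* non-speculative value */
    int  version[MAX_EPOCHS];     /* speculative value written by epoch e */
    bool has_version[MAX_EPOCHS];
} versioned_line_t;

/* Epoch e sees its own version, else the nearest earlier epoch's, else committed data. */
int vl_load(const versioned_line_t *l, int e) {
    for (int i = e; i >= 0; i--)
        if (l->has_version[i])
            return l->version[i];
    return l->committed;
}

void vl_store(versioned_line_t *l, int e, int value) {
    l->version[e] = value;
    l->has_version[e] = true;
}

/* Violation: discard only epoch e's version; other epochs' changes survive. */
void vl_squash(versioned_line_t *l, int e) {
    l->has_version[e] = false;
}

/* Oldest epoch commits: its version becomes the committed value. */
void vl_commit(versioned_line_t *l, int e) {
    if (l->has_version[e]) {
        l->committed = l->version[e];
        l->has_version[e] = false;
    }
}
```

Because each epoch's changes live in its own version, squashing one epoch never requires invalidating data another epoch wrote, which is what makes epochs (and sub-epochs) independent.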
Sub-epochs
[Figure: epoch 1 divided into sub-epochs 1a, 1b, 1c, and 1d, with accesses to p and q]
- Uses more epoch contexts
- Detection/buffering/rewind is free
- More replication
  - Speculative victim cache
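get_page() wrapper
    page_t get_page_wrapper(pageid_t id) {
        static tls_mutex mut;
        page_t ret;
        tls_escape_speculation();    /* run the call non-speculatively */
        check_get_arguments(id);     /* input may come from a speculative epoch */
        tls_acquire_mutex(mut);      /* one epoch per transaction at a time */
        ret = get_page(id);
        tls_release_mutex(mut);
        tls_on_violation(put, ret);  /* undo: release the page if violated */
        tls_resume_speculation();
        return ret;
    }
- Wraps get_page()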
get_page() wrapper
- No violations while calling get_page()
get_page() wrapper
- May get bad input data from speculative thread!
get_page() wrapper
- Only one epoch per transaction at a time
get_page() wrapper
- How to undo get_page()
get_page() wrapper
- Isolated
  - Undoing this operation does not cause cascading aborts
- Undoable
  - Easy way to return system to initial state
- Can also be used for
  - Cursor management
  - malloc()
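The same isolated-and-undoable pattern can be sketched for malloc(), one of the other uses listed above. This assumes a hypothetical per-epoch undo log; `spec_malloc`, `run_undo_log`, and `discard_undo_log` are illustrative names, not the TLS runtime's actual interface, and the escape/resume steps are omitted. The allocation is performed immediately and free() is registered as its undo action, so a violated epoch leaks nothing.

```c
#include <stdlib.h>

#define MAX_UNDO 32

typedef void (*undo_fn)(void *arg);
typedef struct { undo_fn fn; void *arg; } undo_entry;

/* Hypothetical per-epoch undo log. */
static undo_entry undo_log[MAX_UNDO];
static int undo_len;

/* Allocate, then register free() as the compensating action. */
void *spec_malloc(size_t size) {
    if (undo_len == MAX_UNDO)
        return NULL;               /* log full; not handled in this sketch */
    void *p = malloc(size);
    if (p != NULL) {
        undo_log[undo_len].fn = free;
        undo_log[undo_len].arg = p;
        undo_len++;
    }
    return p;
}

/* Violation: undo in reverse order. free() touches no other epoch's
   state, so undoing it cannot cause cascading aborts. */
void run_undo_log(void) {
    while (undo_len > 0) {
        undo_len--;
        undo_log[undo_len].fn(undo_log[undo_len].arg);
    }
}

/* Commit: keep the allocations and forget the undo actions. */
void discard_undo_log(void) { undo_len = 0; }
```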
Sequential B-tree Inserts
[Figure: B-tree pages showing item and free slots during sequential inserts]