Optimistic Intra-Transaction Parallelism on Chip-Multiprocessors - PowerPoint PPT Presentation

Provided by: vldb

Transcript and Presenter's Notes

1
Optimistic Intra-Transaction Parallelism on Chip-Multiprocessors

Chris Colohan1, Anastassia Ailamaki1,
J. Gregory Steffan2 and Todd C. Mowry1,3
1Carnegie Mellon University
2University of Toronto
3Intel Research Pittsburgh

2
Chip Multiprocessors are Here!
AMD Opteron
IBM Power 5
Intel Yonah

2 cores now, soon will have 4, 8, 16, or 32
Multiple threads per core
How do we best use them?

3
Multi-Core Enhances Throughput
[Figure: users sending transactions to a multi-core database server]
Cores can run concurrent transactions and improve throughput
4
Multi-Core Enhances Throughput
Can multiple cores improve transaction latency?
5
Parallelizing transactions
SELECT cust_info FROM customer
UPDATE district WITH order_id
INSERT order_id INTO new_order
foreach(item)
    GET quantity FROM stock
    quantity--
    UPDATE stock WITH quantity
    INSERT item INTO order_line

Intra-query parallelism
Used for long-running queries (decision support)
Does not work for short queries
Short queries dominate in commercial workloads

6
Parallelizing transactions
Intra-transaction parallelism
Each thread spans multiple queries
Hard to add to existing systems!
Need to change interface, add latches and locks, worry about correctness of parallel execution

7
Parallelizing transactions
Intra-transaction parallelism
Breaks transaction into threads
Hard to add to existing systems!
Need to change interface, add latches and locks, worry about correctness of parallel execution

Thread Level Speculation (TLS) makes parallelization easier.
8
Thread Level Speculation (TLS)
[Figure: the same code, with accesses to p and q, executed sequentially vs. in parallel]
9
Thread Level Speculation (TLS)

Use epochs
Detect violations
Restart to recover
Buffer state
Worst case: sequential
Best case: fully parallel

[Figure: Epoch 1 and Epoch 2 executed sequentially vs. in parallel; conflicting accesses to p cause a violation]
Data dependences limit performance.
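The epoch and violation mechanism can be illustrated with a small software model. This is a hypothetical sketch and not the hardware described in this talk. It replays a trace of loads and stores from several epochs in the order they occur in parallel time, and it reports which epochs read a stale value: a later epoch loaded an address before an earlier epoch stored to it (a read-after-write violation). The names `struct access` and `detect_violations()` are illustrative. Real TLS designs may also restart every epoch after the violated one, which this model does not do.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EPOCHS 4
#define MAX_ADDRS 16

enum op_kind { LOAD, STORE };

/* One memory access, listed in the order it happens in (simulated) parallel time. */
struct access {
    int epoch;          /* 0 = oldest (non-speculative) epoch */
    enum op_kind kind;
    int addr;           /* word index into a tiny simulated memory */
};

/* Returns a bitmask with bit j set if epoch j read a stale value and must restart. */
unsigned detect_violations(const struct access *trace, int n)
{
    bool exposed_load[MAX_EPOCHS][MAX_ADDRS] = {{false}};  /* read before own write */
    bool stored[MAX_EPOCHS][MAX_ADDRS] = {{false}};
    unsigned violated = 0;

    for (int k = 0; k < n; k++) {
        const struct access *a = &trace[k];
        assert(a->epoch >= 0 && a->epoch < MAX_EPOCHS);
        assert(a->addr >= 0 && a->addr < MAX_ADDRS);

        if (a->kind == LOAD) {
            /* Reading a value this epoch wrote itself cannot be stale. */
            if (!stored[a->epoch][a->addr])
                exposed_load[a->epoch][a->addr] = true;
        } else {
            stored[a->epoch][a->addr] = true;
            /* RAW violation: a LATER epoch already read this address and so
               missed this store. Earlier epochs are never checked, because a
               later epoch's stores stay buffered until it commits. */
            for (int j = a->epoch + 1; j < MAX_EPOCHS; j++)
                if (exposed_load[j][a->addr])
                    violated |= 1u << j;
        }
    }
    return violated;
}
```

For example, if epoch 1 loads p before epoch 0 stores to p, the function returns a mask with bit 1 set. A store by epoch 1 that epoch 0 never sees is not a violation.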
10
A Coordinated Effort
Transactions: TPC-C
DBMS: BerkeleyDB
Hardware: simulated machine
11
A Coordinated Effort
Transaction Programmer: choose epoch boundaries
DBMS Programmer: remove performance bottlenecks
Hardware Developer: add TLS support to architecture
12
So what's new?

Intra-transaction parallelism
Without changing the transactions
With minor changes to the DBMS
Without having to worry about locking
Without introducing concurrency bugs
With good performance
Halve transaction latency on four cores

13
Related Work

Optimistic Concurrency Control (Kung82)
Sagas (Garcia-Molina & Salem 87)
Transaction chopping (Shasha95)

14
Outline

Introduction
Related work
Dividing transactions into epochs
Removing bottlenecks in the DBMS
Results
Conclusions

15
Case Study: New Order (TPC-C)
GET cust_info FROM customer
UPDATE district WITH order_id
INSERT order_id INTO new_order
foreach(item)
    GET quantity FROM stock WHERE i_id=item
    UPDATE stock WITH quantity-1 WHERE i_id=item
    INSERT item INTO order_line

Only dependence is the quantity field
Very unlikely to occur (1/100,000)
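Here is a rough check of the 1/100,000 figure. TPC-C has 100,000 items, so two order lines pick the same stock row with probability 1/100,000 if each item is chosen independently and uniformly. The sketch below extends this to a whole order using the standard "birthday" product. Uniform selection is an assumption made for illustration only. The TPC-C specification actually draws item ids from a non-uniform distribution, so real collision rates are somewhat higher. `p_same_item()` is a hypothetical helper.

```c
#include <assert.h>

/* Probability that at least two of n order lines pick the same item, when each
   is drawn independently and uniformly from n_items items. */
double p_same_item(int n, int n_items)
{
    assert(n >= 0 && n_items > 0);
    double p_distinct = 1.0;
    for (int k = 0; k < n; k++)
        p_distinct *= (double)(n_items - k) / n_items;  /* k-th line avoids the earlier ones */
    return 1.0 - p_distinct;
}
```

With 2 lines this gives 1/100,000. For the largest New Order transaction (15 lines) it gives about 0.00105, so roughly one order in a thousand would see the dependence at all.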

16
Case Study: New Order (TPC-C)
GET cust_info FROM customer
UPDATE district WITH order_id
INSERT order_id INTO new_order
foreach(item)
    GET quantity FROM stock WHERE i_id=item
    UPDATE stock WITH quantity-1 WHERE i_id=item
    INSERT item INTO order_line

GET cust_info FROM customer
UPDATE district WITH order_id
INSERT order_id INTO new_order
TLS_foreach(item)
    GET quantity FROM stock WHERE i_id=item
    UPDATE stock WITH quantity-1 WHERE i_id=item
    INSERT item INTO order_line
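With TLS_foreach, each loop iteration becomes an epoch. Only iterations that touch the same stock row can conflict through the quantity field. The hypothetical helper below counts how many epochs of an order could be violated: each iteration whose item already appeared in an earlier iteration. This is an upper bound, because a repeated item causes a violation only if the later epoch reads quantity before the earlier epoch updates it.

```c
#include <assert.h>
#include <stddef.h>

/* Number of iterations (epochs) that may be violated: iteration j conflicts
   with an earlier iteration i < j that updates the same stock row. */
size_t count_conflicting_epochs(const int *item_ids, size_t n)
{
    size_t conflicts = 0;
    for (size_t j = 1; j < n; j++) {
        for (size_t i = 0; i < j; i++) {
            if (item_ids[i] == item_ids[j]) {
                conflicts++;   /* possible RAW on stock.quantity between epochs i and j */
                break;         /* count each later epoch at most once */
            }
        }
    }
    return conflicts;
}
```

For items {7, 42, 7, 9, 42}, two epochs (the second 7 and the second 42) may be violated. With all-distinct items, none can be.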
17
Outline

Introduction
Related work
Dividing transactions into epochs
Removing bottlenecks in the DBMS
Results
Conclusions

18
Dependences in DBMS
19
Dependences in DBMS

Dependences serialize execution!
Performance tuning:
    Profile execution
    Remove bottleneck dependence
    Repeat

20
Buffer Pool Management
[Figure: the CPU calls get_page(5) and later put_page(5); the page's reference count in the buffer pool goes from 0 to 1 and back to 0]
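The dependence comes from the reference (pin) count. Every get_page() and put_page() reads the count and then writes it back. If two epochs both call get_page(5), the later epoch's read depends on the earlier epoch's write, so TLS serializes them even though their order does not matter. Below is a toy model of such a buffer pool. The names and the fixed-size array are hypothetical and are not BerkeleyDB's buffer-pool API.

```c
#include <assert.h>

#define NPAGES 8

/* A toy buffer pool: one pin (reference) count per resident page. */
struct buffer_pool {
    int ref[NPAGES];
};

/* Pin a page. ref[id]++ is a load, an add, and a store of the same word,
   which is exactly the read-modify-write that creates an inter-epoch RAW. */
int get_page(struct buffer_pool *bp, int id)
{
    assert(id >= 0 && id < NPAGES);
    bp->ref[id]++;
    return id;              /* stand-in for a pointer to the page */
}

/* Unpin a page: another read-modify-write of the same counter. */
void put_page(struct buffer_pool *bp, int id)
{
    assert(id >= 0 && id < NPAGES && bp->ref[id] > 0);
    bp->ref[id]--;
}
```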
21
Buffer Pool Management
[Figure: two epochs each call get_page(5) and put_page(5) on the same buffer-pool page]
TLS ensures the first epoch gets the page first. Who cares?
22
Buffer Pool Management

Escape speculation
Invoke operation
Store undo function
Resume speculation

[Figure: several epochs each calling get_page(5) and put_page(5) on the same buffer-pool page]
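The four steps (escape speculation, invoke the operation, store an undo function, resume speculation) can be modeled in software as a per-epoch undo log. This is a hypothetical sketch. The comments mark where the escape and resume would happen on TLS hardware, and `struct epoch`, `epoch_get_page()`, `epoch_violated()`, and `epoch_commit()` are illustrative names. Because get_page() runs non-speculatively, no violation can occur on the reference count. If the epoch is later violated, the recorded put_page() calls undo its effect.

```c
#include <assert.h>

#define NPAGES 8
#define MAX_UNDO 16

struct buffer_pool { int ref[NPAGES]; };

/* Undo record: a function plus its argument, run if the epoch is violated. */
typedef void (*undo_fn)(struct buffer_pool *, int);
struct undo_entry { undo_fn fn; int arg; };

struct epoch {
    struct undo_entry undo[MAX_UNDO];
    int n_undo;
};

static int get_page(struct buffer_pool *bp, int id)
{
    assert(id >= 0 && id < NPAGES);
    bp->ref[id]++;
    return id;
}

static void put_page(struct buffer_pool *bp, int id)
{
    assert(id >= 0 && id < NPAGES && bp->ref[id] > 0);
    bp->ref[id]--;
}

int epoch_get_page(struct epoch *e, struct buffer_pool *bp, int id)
{
    /* 1. escape speculation (hardware step, not modeled) */
    int page = get_page(bp, id);                        /* 2. invoke operation */
    assert(e->n_undo < MAX_UNDO);
    e->undo[e->n_undo++] = (struct undo_entry){ put_page, page };  /* 3. store undo */
    /* 4. resume speculation (hardware step, not modeled) */
    return page;
}

/* Violation: run the undo functions newest-first, then the epoch can restart. */
void epoch_violated(struct epoch *e, struct buffer_pool *bp)
{
    while (e->n_undo > 0) {
        struct undo_entry u = e->undo[--e->n_undo];
        u.fn(bp, u.arg);
    }
}

/* Commit: the escaped operations become permanent; drop the undo records. */
void epoch_commit(struct epoch *e) { e->n_undo = 0; }
```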
23
Buffer Pool Management
[Figure: several epochs calling get_page(5) and put_page(5) on the same buffer-pool page]
Not undoable!
24
Buffer Pool Management
[Figure: several epochs calling get_page(5) on the same page, with a single put_page(5)]

Delay put_page until end of epoch
Avoid dependence
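Delaying put_page() can be sketched with a per-epoch queue. A speculative epoch only records which pages it would release, and the real releases run once the epoch commits. The page stays pinned a little longer, and nothing has to be undone on a violation because the queued releases are simply dropped. This is a minimal hypothetical model; `epoch_put_page()` and the fixed-size queue are illustrative and are not the talk's implementation.

```c
#include <assert.h>

#define NPAGES 8
#define MAX_DEFERRED 16

struct buffer_pool { int ref[NPAGES]; };

struct epoch {
    int deferred_put[MAX_DEFERRED];   /* page ids whose release is postponed */
    int n_deferred;
};

static void put_page(struct buffer_pool *bp, int id)
{
    assert(id >= 0 && id < NPAGES && bp->ref[id] > 0);
    bp->ref[id]--;
}

/* Called by speculative code instead of put_page(): only remember the page. */
void epoch_put_page(struct epoch *e, int id)
{
    assert(e->n_deferred < MAX_DEFERRED);
    e->deferred_put[e->n_deferred++] = id;
}

/* At commit the epoch is non-speculative, so the releases can happen for real. */
void epoch_commit(struct epoch *e, struct buffer_pool *bp)
{
    for (int i = 0; i < e->n_deferred; i++)
        put_page(bp, e->deferred_put[i]);
    e->n_deferred = 0;
}

/* On a violation the queued releases are discarded: they never took effect. */
void epoch_violated(struct epoch *e) { e->n_deferred = 0; }
```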

25
Removing Bottleneck Dependences

We introduce three techniques:
1. Delay operations until non-speculative
    Mutex and lock acquire and release
    Buffer pool, memory, and cursor release
    Log sequence number assignment
2. Escape speculation
    Buffer pool, memory, and cursor allocation
3. Traditional parallelization
    Memory allocation, cursor pool, error checks, false sharing

26
Outline

Introduction
Related work
Dividing transactions into epochs
Removing bottlenecks in the DBMS
Results
Conclusions

27
Experimental Setup

Detailed simulation
Superscalar, out-of-order, 128-entry reorder buffer
Memory hierarchy modeled in detail
TPC-C transactions on BerkeleyDB
In-core database
Single user
Single warehouse
Measure interval of 100 transactions
Measuring latency not throughput

28
Optimizing the DBMS: New Order
[Chart: New Order execution time (normalized), starting from Sequential. Annotations: "26% improvement", "Other CPUs not helping", "Can't optimize much more", "Cache misses increase"]
29
Optimizing the DBMS: New Order
[Chart: New Order execution time (normalized), starting from Sequential]
This process took me 30 days and fewer than 1200 lines of code.
30
Other TPC-C Transactions
[Chart: normalized execution time for New Order, Delivery, Stock Level, Payment, and Order Status, broken down into Busy, Cache Miss, Failed, and Idle CPU]
31
Conclusions

A new form of parallelism for databases
Tool for attacking transaction latency
Intra-transaction parallelism
Without major changes to DBMS
TLS can be applied to more than transactions
Halve transaction latency by using 4 CPUs

32
Any questions?

For more information, see
www.colohan.com

33
Backup Slides Follow
34
TPC-C Transactions on 2 CPUs
[Chart: normalized execution time for New Order, Delivery, Stock Level, Payment, and Order Status on 2 CPUs, broken down into Busy, Cache Miss, Failed, and Idle CPU]
35
LATCHES
36
Latches

Mutual exclusion between transactions
Cause violations between epochs
Read-test-write cycle → RAW dependence
Not needed between epochs
TLS already provides mutual exclusion!

37
Latches: Aggressive Acquire
Acquire, latch_cnt++, work, latch_cnt--
latch_cnt++, work, (enqueue release)
latch_cnt++, work, (enqueue release)
Commit, work, latch_cnt--
Commit, work, latch_cnt--, Release
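One way to read the aggressive scheme: the first epoch that needs the latch acquires it for real, latch_cnt counts the epochs currently using it, speculative epochs postpone their decrement (the enqueued release) until they commit, and the latch is released when the count returns to zero. The sketch below is a sequential model under those assumptions. `struct shared_latch`, `latch_enter()`, and `latch_leave()` are hypothetical names, and the model ignores how latch_cnt itself is kept from causing violations.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy latch shared by all epochs of one transaction (hypothetical model). */
struct shared_latch {
    bool held;      /* the real latch, as other transactions see it */
    int latch_cnt;  /* number of epochs currently inside the critical section */
};

/* Entry: only the first user performs the real acquire. */
void latch_enter(struct shared_latch *l)
{
    if (l->latch_cnt == 0) {
        assert(!l->held);
        l->held = true;              /* Acquire */
    }
    l->latch_cnt++;
}

/* Exit: the non-speculative epoch calls this immediately; a speculative
   epoch enqueues it and calls it only when it commits. The real release
   happens when the last user leaves. */
void latch_leave(struct shared_latch *l)
{
    assert(l->latch_cnt > 0);
    if (--l->latch_cnt == 0)
        l->held = false;             /* Release */
}
```

Replaying three epochs in this model: all three enter, epoch 1 leaves at once, and the latch stays held until epochs 2 and 3 have both committed.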
38
Latches: Lazy Acquire
Acquire work Release
(enqueue acquire) work (enqueue release)
(enqueue acquire) work (enqueue release)
Acquire Commit work Release
Acquire Commit work Release
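In the lazy scheme, a speculative epoch never touches the latch while it runs. It only records the acquire and release it would have performed, and it replays them when it commits, which is when it is non-speculative. The following is a hypothetical sequential model. The epoch's buffered writes would become visible between the replayed acquire and release; that step is not modeled here. A real system would wait for a latch held by another transaction instead of asserting.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OPS 8

enum latch_op { ACQUIRE, RELEASE };

struct latch { bool held; };

struct epoch {
    enum latch_op queued[MAX_OPS];   /* latch operations postponed to commit */
    int n_queued;
};

/* Speculative epochs only record what they would do to the latch. */
void epoch_enqueue(struct epoch *e, enum latch_op op)
{
    assert(e->n_queued < MAX_OPS);
    e->queued[e->n_queued++] = op;
}

/* At commit, replay the recorded operations for real, in program order. */
void epoch_commit(struct epoch *e, struct latch *l)
{
    for (int i = 0; i < e->n_queued; i++) {
        if (e->queued[i] == ACQUIRE) {
            assert(!l->held);        /* a real system would wait here */
            l->held = true;
        } else {
            assert(l->held);
            l->held = false;
        }
    }
    e->n_queued = 0;
}
```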
39
HARDWARE
40
TLS in Database Systems

Large epochs
    More dependences: must tolerate
    More state: bigger buffers

[Figure: epoch size in non-database TLS vs. TLS in database systems]
41
Feedback Loop
for() do_work()
42
Violations Feedback
[Figure: sequential vs. parallel execution with a violation on p]
43
Eliminating Violations
[Figure: profiled address pairs: 0x0FD8 → 0xFD20, 0x0FC0 → 0xFC18]
44
Tolerating Violations: Sub-epochs
[Figure: an epoch divided into sub-epochs, with a violation]
45
Sub-epochs

Started periodically by hardware
    How many?
    When to start?
Hardware implementation
    Just like epochs
    Use more epoch contexts
    No need to check violations between sub-epochs within an epoch

[Figure: an epoch divided into sub-epochs, with a violation]
46
Old TLS Design
Buffer speculative state in write-back L1 cache
Restart by invalidating speculative lines
Detect violations through invalidations
Rest of system only sees committed data
Problems
    L1 cache not large enough
    Later epochs only get values on commit

[Figure: four CPUs, each with a private L1 cache, sharing an L2 and the rest of the memory system]
47
New Cache Design
Speculative writes immediately visible to L2 (and later epochs)
Restart by invalidating speculative lines
Buffer speculative and non-speculative state for all epochs in L2
Detect violations at lookup time
Invalidation coherence between L2 caches

[Figure: four CPUs with private L1 caches, two L2 caches, and the rest of the memory system]
48
New Features
Speculative state in L1 and L2 cache
Cache line replication (versions)
Data dependence tracking within cache
Speculative victim cache

[Figure: four CPUs with L1 caches, two L2 caches, and the rest of the memory system]
49
Scaling
[Chart: execution time (normalized)]
50
Evaluating a 4-CPU system
[Chart: execution time (normalized) for five configurations]
Sequential: original benchmark run on 1 CPU
TLS Seq: parallelized benchmark run on 1 CPU
No Sub-epoch: without sub-epoch support
Baseline: parallel execution
No Speculation: ignore violations (Amdahl's Law limit)
51
Sub-epochs: How many/How big?

Supporting more sub-epochs is better
Spacing depends on location of violations
Even spacing is good enough
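Why more sub-epochs help can be shown with a little arithmetic. Assume sub-epochs are evenly spaced and that a violation rewinds execution to the start of the sub-epoch containing the offending load, instead of to the start of the epoch. The work thrown away is then the distance from that restart point to the point where the violation is detected. This is a hypothetical cost model; positions are instruction counts within the epoch.

```c
#include <assert.h>

/* Work (instructions) thrown away when an epoch of `len` instructions, split
   into `n_sub` evenly spaced sub-epochs, learns at position `detect` that its
   load at position `load_pos` read a stale value. */
long wasted_work(long len, int n_sub, long load_pos, long detect)
{
    assert(n_sub >= 1 && len > 0);
    assert(0 <= load_pos && load_pos <= detect && detect <= len);
    long sub_len = (len + n_sub - 1) / n_sub;       /* ceiling division */
    long restart = (load_pos / sub_len) * sub_len;  /* start of the load's sub-epoch */
    return detect - restart;
}
```

For a 1000-instruction epoch whose load at 700 is found stale at 900, the waste is 900 instructions with no sub-epochs, 400 with 4, and 200 with 10.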

52
Query Execution

Actions taken by a query
Bring pages into buffer pool
Acquire and release latches and locks
Allocate/free memory
Allocate/free and use cursors
Use B-trees
Generate log entries

These generate violations.
53
Applying TLS

1. Parallelize loop
2. Run benchmark
3. Remove bottleneck
4. Go to 2

54
Outline
Transaction Programmer
DBMS Programmer
Hardware Developer
55
TLS Execution
[Figure, animated over slides 55-59: TLS execution of epochs accessing p, q, s, and t, with a violation on p]
60
Replication
[Figure: epochs accessing p and q, with a violation on p]
Can't invalidate a line if it contains two epochs' changes
61
Replication
[Figure: the cache line is replicated, one version per epoch]
62
Replication

Makes epochs independent
Enables sub-epochs
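Replication can be modeled as one copy of a cache line per epoch that writes it. A load by epoch e sees the newest copy written by e or by an earlier epoch, and rewinding an epoch discards only its own copy, so other epochs' changes survive. This is a hypothetical software model of the idea and not the talk's cache design. `struct versioned_line` and its functions are illustrative names, and commit is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define N_EPOCHS 4

/* One cache line, replicated: each epoch that writes it gets its own copy. */
struct versioned_line {
    int committed;                /* value visible to the rest of the system */
    bool has_version[N_EPOCHS];   /* did epoch e write its own copy? */
    int version[N_EPOCHS];
};

/* A load by epoch e sees the newest copy from epoch e or any earlier epoch. */
int line_load(const struct versioned_line *l, int e)
{
    assert(e >= 0 && e < N_EPOCHS);
    for (int k = e; k >= 0; k--)
        if (l->has_version[k])
            return l->version[k];
    return l->committed;
}

void line_store(struct versioned_line *l, int e, int value)
{
    assert(e >= 0 && e < N_EPOCHS);
    l->has_version[e] = true;
    l->version[e] = value;
}

/* Rewinding epoch e discards only its copy; other epochs' changes survive. */
void line_squash(struct versioned_line *l, int e)
{
    assert(e >= 0 && e < N_EPOCHS);
    l->has_version[e] = false;
}
```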

63
Sub-epochs
[Figure: epoch 1 divided into sub-epochs 1a, 1b, 1c, and 1d, with accesses to p and q]

Uses more epoch contexts
Detection/buffering/rewind is free
More replication
Speculative victim cache

64
get_page() wrapper

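page_t get_page_wrapper(pageid_t id) {
    static tls_mutex mut;
    page_t ret;
    tls_escape_speculation();     /* run get_page() non-speculatively: no violations inside it */
    check_get_arguments(id);      /* id may be bad data computed by a speculative thread */
    tls_acquire_mutex(mut);       /* only one epoch per transaction at a time */
    ret = get_page(id);
    tls_release_mutex(mut);
    tls_on_violation(put, ret);   /* undo: release the page if this epoch is violated */
    tls_resume_speculation();
    return ret;
}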

Wraps get_page()
65
get_page() wrapper

No violations while calling get_page()
66
get_page() wrapper

May get bad input data from speculative thread!
67
get_page() wrapper

Only one epoch per transaction at a time
68
get_page() wrapper

How to undo get_page()
69
get_page() wrapper

Isolated
    Undoing this operation does not cause cascading aborts
Undoable
    Easy way to return system to initial state
Can also be used for
    Cursor management
    malloc()
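The same isolated-and-undoable pattern applies to malloc(). The allocation runs outside speculation and its undo is a free() of the result. That undo is isolated, because no other epoch or transaction can have seen the new block. Below is a hypothetical sketch of such a wrapper; `struct epoch`, `epoch_malloc()`, and the fixed-size undo list are illustrative names, not the talk's code.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_UNDO 32

/* Per-epoch list of allocations to free if the epoch is violated. */
struct epoch {
    void *allocated[MAX_UNDO];
    int n;
};

/* Allocate outside speculation (escape and resume not modeled) and record
   free() of the block as the undo action. */
void *epoch_malloc(struct epoch *e, size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        assert(e->n < MAX_UNDO);
        e->allocated[e->n++] = p;
    }
    return p;
}

/* Violation: undo every allocation this attempt made, newest first. */
void epoch_violated(struct epoch *e)
{
    while (e->n > 0)
        free(e->allocated[--e->n]);
}

/* Commit: the allocations become permanent, so forget the undo records. */
void epoch_commit(struct epoch *e) { e->n = 0; }
```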

70
Sequential Btree Inserts
[Figure: B-tree pages with item entries and free slots during sequential inserts]
