Title: Tashkent: Uniting Durability
1 Tashkent: Uniting Durability and Ordering in Replicated Databases
Sameh Elnikety (EPFL), Steven Dropsho (EPFL), Fernando Pedone (USI)
2 Write-Many Replicated Database
- All replicas agree on
  - which update transactions commit
  - their commit order
- Total order
  - Determined by middleware
  - Followed by each replica
[Diagram: Replicas 1, 2, and 3, each with its own durability; Tx A arrives at Replica 1 and Tx B at Replica 2.]
separation →
3 Order Determined Outside DB
[Diagram: Tx A is submitted at Replica 1 and Tx B at Replica 2. The replication middleware (global ordering) determines the order A → B, and Replicas 1, 2, and 3 each apply A → B with their own durability.]
One Replica →
4 Enforce External Commit Order
[Diagram: middleware with commit order A → B above a replica that has its own durability; B is shown at the replica.]
Cannot commit A and B concurrently!
Must serialize →
5 Enforce Order: Serial Commit
[Diagram: middleware with commit order A → B above a replica that has its own durability; A is shown at the replica.]
Serialization slow →
6 Commit Serialization is Slow
[Diagram: the middleware order A → B → C reaches the proxy as commit order A → B → C. The proxy issues Commit A, Commit B, and Commit C to the database one at a time. Each commit uses the CPU and then needs its own durability (disk) write: A, then A → B, then A → B → C. The proxy returns Ack A, Ack B, and Ack C.]
Root cause: durability and ordering separated → serial disk writes
Solutions →
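The cost of serial commit can be illustrated with a minimal sketch. It assumes a toy log object that only counts synchronous flushes; `CountingLog` and `commit_serially` are hypothetical names for illustration, not part of Tashkent or of any database API.

```python
class CountingLog:
    """Stand-in for a database log on disk; it only counts synchronous flushes."""

    def __init__(self):
        self.records = []
        self.flushes = 0

    def append(self, record):
        self.records.append(record)

    def flush(self):
        # In a real system this would be a synchronous disk write.
        self.flushes += 1


def commit_serially(log, ordered_txs):
    """Commit in the middleware's order, one transaction at a time."""
    acks = []
    for tx in ordered_txs:
        log.append(tx)
        log.flush()        # the next commit cannot start until this write finishes
        acks.append(tx)    # acknowledge only after the transaction is durable
    return acks


log = CountingLog()
acks = commit_serially(log, ["A", "B", "C"])
print(acks, log.flushes)
```

With three transactions the sketch performs three flushes, so under this model throughput is bounded by the latency of a single disk write.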
7 Solution: Unite Durability and Ordering
1- Pass order info to DB
[Diagram: the middleware (ordering) passes the order to the replicas; durability stays in each replica.]
2- Move durability to MW
[Diagram: the middleware (ordering) also provides durability; durability is OFF in each replica.]
Unite in DB →
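8 1- Unite Durability and Ordering in Database
[Diagram: the middleware order A → B → C reaches the proxy as commit order A → B → C. The proxy issues "Commit A at 1", "Commit B at 2", and "Commit C at 3". The database now knows the order, uses the CPU, and makes A → B → C durable together. The proxy returns Ack A, Ack B, and Ack C.]
Solution 1: pass order info to DB
Durability and ordering in database → group commit
Solutions →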
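One way to picture "Commit A at 1, Commit B at 2, Commit C at 3" is a database that accepts commit requests tagged with a sequence number and flushes the longest in-order run as one group. The sketch below is an assumption about how such an interface could behave; `OrderedGroupCommitDB` and its methods are hypothetical and do not describe the Tashkent-API implementation.

```python
class OrderedGroupCommitDB:
    """Toy database that is told the global commit order via sequence numbers."""

    def __init__(self):
        self.pending = {}    # sequence number -> transaction awaiting durability
        self.next_seq = 1    # next sequence number allowed to commit
        self.log = []        # durable log, in commit order
        self.flushes = 0     # number of synchronous disk writes

    def commit(self, tx, seq):
        """Accept a commit request; requests may arrive concurrently, in any order."""
        self.pending[seq] = tx

    def flush(self):
        """Write the longest in-order run of pending commits with one disk write."""
        group = []
        while self.next_seq in self.pending:
            group.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        if group:
            self.log.extend(group)
            self.flushes += 1
        return group         # transactions that can now be acknowledged


db = OrderedGroupCommitDB()
db.commit("B", 2)
db.commit("C", 3)
first = db.flush()           # A has not arrived yet, so nothing may commit
db.commit("A", 1)
second = db.flush()          # A, B, and C become durable in one write
print(first, second, db.flushes)
```

Because the database itself enforces the order, the proxy no longer has to wait for one commit to become durable before submitting the next.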
9 Solution: Unite Durability and Ordering
1- Pass order info to DB
[Diagram: the middleware (ordering) passes the order to the replicas; durability stays in each replica.]
2- Move durability to MW
[Diagram: the middleware (ordering) also provides durability; durability is OFF in each replica.]
Unite in DB →
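10 2- Unite Durability and Ordering in Middleware
[Diagram: the middleware holds the order A → B → C and makes A → B → C durable itself. The proxy receives commit order A → B → C and issues Commit A, Commit B, and Commit C to the database, where durability is OFF and each commit uses only the CPU. The proxy returns Ack A, Ack B, and Ack C.]
Solution 2: move durability to MW
Durability and ordering in middleware → group commit
Roadmap →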
11 Roadmap
- Durability and ordering
  - Separated → serial commit → slow
  - United → group commit → fast
- Two implementations
  - Tashkent-API: united in DB
  - Tashkent-MW: united in MW
- Tashkent-MW
  - Implementation
  - Recovery
  - Performance
12 Tashkent-MW
[Diagram: Tx A, Tx B, and Tx C are submitted at Replicas 1, 2, and 3. The replication middleware (global ordering) determines the order A → B → C and provides durability; each replica applies A → B → C with durability OFF.]
One Replica →
13 Tashkent-MW: Durability and Ordering in Middleware
- Middleware logs tx effects
- Durability of update tx
  - Guaranteed in middleware
  - Turn durability off at database
- Middleware performs durability and ordering
  - United → group commit → fast
- Database commits update tx serially
  - Commit is a quick main-memory operation
Back to Example →
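The division of labor in these bullets can be sketched as follows. This is a simplified model under stated assumptions: writesets are plain dictionaries, the flush is simulated by a counter, and `MiddlewareLog`, `InMemoryDatabase`, and `process_batch` are hypothetical names rather than Tashkent-MW components.

```python
class MiddlewareLog:
    """Hypothetical middleware log: buffers writesets and flushes them as a group."""

    def __init__(self):
        self.buffer = []     # writesets that are ordered but not yet on disk
        self.durable = []    # writesets on the middleware's disk, in global order
        self.flushes = 0     # number of synchronous disk writes

    def append(self, tx, writeset):
        self.buffer.append((tx, writeset))

    def flush(self):
        """Group commit: one disk write makes every buffered writeset durable."""
        if self.buffer:
            self.durable.extend(self.buffer)
            self.buffer = []
            self.flushes += 1


class InMemoryDatabase:
    """Replica database with durability off: commit touches main memory only."""

    def __init__(self):
        self.data = {}
        self.commit_order = []

    def commit(self, tx, writeset):
        self.data.update(writeset)
        self.commit_order.append(tx)


def process_batch(mw_log, db, ordered_batch):
    """Make a batch durable in the middleware, then commit it serially at the DB."""
    for tx, writeset in ordered_batch:
        mw_log.append(tx, writeset)
    mw_log.flush()                      # single disk write for the whole batch
    for tx, writeset in ordered_batch:
        db.commit(tx, writeset)         # serial, but no disk write per commit
    return [tx for tx, _ in ordered_batch]


mw_log = MiddlewareLog()
db = InMemoryDatabase()
batch = [("A", {"x": 1}), ("B", {"y": 2}), ("C", {"x": 3})]
acked = process_batch(mw_log, db, batch)
print(acked, mw_log.flushes, db.data)
```

The database still commits serially in the sketch, but each commit is only a dictionary update, so the single grouped flush in the middleware is the only disk write.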
14 Recovery in Tashkent-MW
[Diagram: Replicas 1, 2, and 3 run with durability OFF; the replication middleware (global ordering) provides durability.]
DB I/O →
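15 Standard Database I/O
- Log flushed for
  - 1- Durability
  - 2- Allowing dirty data pages to be cleaned (physical integrity)
[Diagram: in memory, Tx A has updated the data pages and the log. On disk, the log contains A, and after a crash the data page for A is marked bad.]
DB recovery →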
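16 Database I/O with Durability OFF
[Diagram: the middleware holds the order A → B → C and provides durability. In memory, Tx A has updated the data pages and the log. After a crash, the data page for A on disk is marked bad.]
- Simple solution: recover from a data dump (checkpoint)
DB recovery →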
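Recovery from a data dump can be sketched as restoring the dump and then re-applying the writesets that the middleware logged after the dump was taken. The replay step and the idea of recording a log position with the dump are assumptions made for this illustration; `recover` and `dump_position` are hypothetical names.

```python
def recover(dump, dump_position, middleware_log):
    """Rebuild a replica from a dump plus the middleware's durable log.

    dump            -- database contents captured at the checkpoint
    dump_position   -- number of logged transactions already reflected in the dump
    middleware_log  -- list of (tx, writeset) pairs in global commit order
    """
    data = dict(dump)                           # start from the checkpoint
    for tx, writeset in middleware_log[dump_position:]:
        data.update(writeset)                   # re-apply later writesets in order
    return data


middleware_log = [("A", {"x": 1}), ("B", {"y": 2}), ("C", {"x": 3})]
dump = {"x": 1}          # dump taken after A committed
dump_position = 1

recovered = recover(dump, dump_position, middleware_log)
print(recovered)
```

Replaying in the logged order matters here: applying C before A would leave `x` at the wrong value.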
17 Roadmap
- Durability and ordering
  - Separated → serial commit → slow
  - United → group commit → fast
- Two implementations
  - Tashkent-API: united in DB
  - Tashkent-MW: united in MW
- Tashkent-MW
  - Implementation
  - Recovery
  - Performance
18 Performance - Setup
- Metrics
  - Throughput
  - Response time
- Workload
  - AllUpdates: tx: 1 update; mix: 100% updates
  - TPC-B: tx: 4 updates, 1 read; mix: 100% updates
  - TPC-W: mix of long and short txs
- System configuration
  - Linux cluster running PostgreSQL
AllUpdates TH →
19 AllUpdates Throughput
Throughput →
20 AllUpdates Throughput
21 AllUpdates Throughput
RT →
22 AllUpdates Response Time
In paper →
23 In the Paper
- Design and implementation
  - Tashkent-API
- Performance results
  - TPC-B and TPC-W
- Recovery times
- Another I/O subsystem
Conclusions →
24 Conclusions
- Durability and ordering
  - Separated → serial commit → slow
  - United → group commit → fast
- Two implementations
  - Tashkent-API: united in DB
  - Tashkent-MW: united in MW
- Tashkent-MW system
  - Pure middleware replication
  - Significant performance improvement
27 Concurrency Control
- Generalized Snapshot Isolation (GSI)
- Conclusions valid whenever replicas agree
  - 1- on which update transactions commit
  - 2- on their commit order
- Example (bank database)
  - T1: set balance = 1000
  - T2: set balance = 2000
  - Replica 1 sees T1 then T2 → balance = 2000
  - Replica 2 sees T2 then T1 → balance = 1000
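The bank example can be reproduced in a few lines. The sketch models each replica as a dictionary and each transaction as a blind write; `apply_in_order` is a hypothetical helper for illustration.

```python
def apply_in_order(updates):
    """Apply (tx, key, value) updates in the given order and return the final state."""
    state = {}
    for tx, key, value in updates:
        state[key] = value
    return state


t1 = ("T1", "balance", 1000)
t2 = ("T2", "balance", 2000)

replica1 = apply_in_order([t1, t2])   # sees T1 then T2
replica2 = apply_in_order([t2, t1])   # sees T2 then T1

print(replica1["balance"], replica2["balance"])   # prints: 2000 1000
```

Both replicas commit the same two transactions, yet they end in different states, which is why agreement on the commit order is required in addition to agreement on which transactions commit.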
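28 Durability and Ordering 1/2
[Diagram: Replica 1 consists of a proxy and a database and runs T4 and T9. Both the database log (DB1 Log: T4, T9) and the certifier log (Cert. Log: T4, T9) record them.]
→ Scalability problem: one write per transaction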
29 Durability and Ordering 2/2
[Diagram: Replica 1 (proxy and database) runs T4 and T9; Replica 2 (proxy and database) runs T3 and T8; further replicas are elided. The certifier log (Cert. Log) holds T1 through T9. The DB1 log at Replica 1 also holds T1 through T9, grouped as (T1, T2, T3), T4, (T5, T6, T7, T8), T9.]
→ One disk write
→ Scalability problem: two writes per transaction
30 AllUpdates 1-Replica Throughput
- Low replication overhead: 1-replica system vs. standalone DB
31 AllUpdates Response Time
In paper →
32 TPC-B Throughput
- Low replication overhead: 1-replica system vs. standalone DB
- Performance scales with multiple replicas
In the Paper →