Tashkent: Uniting Durability & Ordering in Replicated Databases

Sameh Elnikety, EPFL; Steven Dropsho, EPFL; Fernando Pedone, USI

Provided by: labosEpfl

Transcript and Presenter's Notes


1
Tashkent: Uniting Durability & Ordering in Replicated Databases
Sameh Elnikety, EPFL; Steven Dropsho, EPFL; Fernando Pedone, USI
2
Write-Many Replicated Database
  • All replicas agree on
    • which update txs commit
    • their commit order
  • Total order
    • Determined by middleware
    • Followed by each replica

[Figure: Replicas 1, 2 and 3, each providing its own durability; Tx A arrives at Replica 1 and Tx B at Replica 2.]
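The agreement property can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not Tashkent's code: a hypothetical `Sequencer` stands in for the middleware and hands out global sequence numbers, and each hypothetical `Replica` buffers writesets that arrive early and commits strictly in sequence order. Deciding *which* transactions commit (certification) is assumed to have happened already.

```python
from dataclasses import dataclass, field


class Sequencer:
    """Hypothetical middleware component that fixes the total commit order."""

    def __init__(self) -> None:
        self.last = 0

    def order(self) -> int:
        self.last += 1
        return self.last


@dataclass
class Replica:
    """Hypothetical replica that commits update txs strictly in global order."""

    name: str
    state: dict = field(default_factory=dict)
    committed: list = field(default_factory=list)
    next_seq: int = 1                            # next position allowed to commit
    pending: dict = field(default_factory=dict)  # seq -> (tx_id, writeset)

    def deliver(self, seq: int, tx_id: str, writeset: dict) -> None:
        self.pending[seq] = (tx_id, writeset)
        # Commit every transaction whose turn has come; later ones wait.
        while self.next_seq in self.pending:
            tid, ws = self.pending.pop(self.next_seq)
            self.state.update(ws)
            self.committed.append(tid)
            self.next_seq += 1


mw = Sequencer()
txs = [("A", {"x": 1}), ("B", {"x": 2}), ("C", {"y": 3})]
ordered = [(mw.order(), tx_id, ws) for tx_id, ws in txs]

r1, r2 = Replica("r1"), Replica("r2")
for seq, tx_id, ws in ordered:            # r1 receives A, B, C
    r1.deliver(seq, tx_id, ws)
for seq, tx_id, ws in reversed(ordered):  # r2 receives C, B, A
    r2.deliver(seq, tx_id, ws)

print(r1.committed, r2.committed)  # ['A', 'B', 'C'] ['A', 'B', 'C']
```

Even though Replica 2 receives the writesets in reverse, both replicas commit A, B, C and end in the same state.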
3
Order Determined Outside DB
[Figure: Tx A arrives at Replica 1 and Tx B at Replica 2; the replication middleware (global ordering) fixes the order A → B, and every replica, each with its own durability, commits in the order A → B.]
4
Enforce External Commit Order
Middleware commit order: A → B
[Figure: a single replica with durability in the database.]
Cannot commit A and B concurrently!
→ Must serialize
5
Enforce Order: Serial Commit
Middleware commit order: A → B
[Figure: a single replica with durability in the database; A is committed first.]
6
Commit Serialization is Slow
Middleware order: A → B → C
Commit order: A → B → C
[Figure: timeline at one replica. The proxy sends Commit A, Commit B and Commit C to the database one at a time; each commit needs CPU time plus a separate durability step (A, then A → B, then A → B → C) before its ack (Ack A, Ack B, Ack C) returns.]
Root cause: durability & ordering separated → serial disk writes
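A back-of-the-envelope model makes the cost visible. In this sketch (illustrative numbers, not measurements from the talk), the proxy may send commit i+1 only after the ack for commit i, and each ack waits for that transaction's own synchronous log write, so n commits cost n disk writes back to back. `serial_commit`, `cpu_us` and `flush_us` are hypothetical names.

```python
def serial_commit(txs, cpu_us, flush_us):
    """Model commits issued one at a time in the middleware's order.

    Returns (ack_times, disk_writes): the time (in microseconds) at which each
    ack leaves the database, and the number of synchronous log writes.
    """
    clock = 0
    disk_writes = 0
    ack_times = {}
    for tx in txs:
        clock += cpu_us    # in-memory commit processing
        clock += flush_us  # this transaction's own log flush
        disk_writes += 1
        ack_times[tx] = clock
    return ack_times, disk_writes


acks, writes = serial_commit(["A", "B", "C"], cpu_us=100, flush_us=5000)
print(acks, writes)  # {'A': 5100, 'B': 10200, 'C': 15300} 3
```

Because the disk write dominates, commit latency grows linearly with the number of queued transactions while the CPU stays mostly idle.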
7
Solution: Unite Durability & Ordering
1- Pass order info to DB
2- Move durability to MW
[Figure: option 1, the middleware does ordering and passes the order to the replica, which keeps durability; option 2, the middleware does ordering and durability, and the replica runs with durability OFF.]
8
1- Unite Durability & Ordering in Database
Middleware order: A → B → C
Commit order: A → B → C
[Figure: the proxy sends Commit A at 1, Commit B at 2 and Commit C at 3 without waiting; the database uses this order information, does the CPU work, and makes A → B → C durable together before returning Ack A, Ack B, Ack C.]
Solution 1: pass order info to DB
Durability & ordering in database → group commit
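A minimal sketch of why position information enables group commit, assuming a hypothetical DB-side log class (this is not Tashkent-API's actual interface): commits tagged with their global position may arrive concurrently and out of order, the log appends the longest in-order run, makes the whole run durable with one write, and only then acknowledges those transactions.

```python
class OrderedGroupCommitLog:
    """Hypothetical database log that accepts 'commit tx at position p'."""

    def __init__(self) -> None:
        self.next_pos = 1
        self.waiting = {}    # position -> tx, not yet written
        self.on_disk = []    # durable log contents, in global order
        self.disk_writes = 0

    def commit_at(self, tx: str, pos: int) -> None:
        self.waiting[pos] = tx

    def group_flush(self) -> list:
        """Write the contiguous run starting at next_pos in one disk write."""
        batch = []
        while self.next_pos in self.waiting:
            batch.append(self.waiting.pop(self.next_pos))
            self.next_pos += 1
        if batch:
            self.on_disk.extend(batch)  # stands in for one sequential write + fsync
            self.disk_writes += 1
        return batch                    # these txs can now be acknowledged


log = OrderedGroupCommitLog()
for tx, pos in [("C", 3), ("A", 1), ("B", 2)]:  # concurrent, arbitrary arrival
    log.commit_at(tx, pos)
acked = log.group_flush()
print(acked, log.disk_writes)  # ['A', 'B', 'C'] 1
```

Three transactions cost one log write instead of three; a transaction whose predecessor has not yet arrived simply waits for a later flush.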
9
Solution: Unite Durability & Ordering
10
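2- Unite Durability & Ordering in Middleware
Middleware order: A → B → C
Durability: A → B → C (in the middleware)
Commit order: A → B → C
[Figure: the middleware performs durability for A → B → C; the proxy sends Commit A, Commit B and Commit C to the database, which runs with durability OFF, so each commit costs only CPU time; Ack A, Ack B, Ack C return to the proxy.]
Solution 2: move durability to MW
Durability & ordering in middleware → group commit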
11
Roadmap
  • Durability & ordering
    • Separated → serial commit → slow
    • United → group commit → fast
  • Two implementations
    • Tashkent-API: united in DB
    • Tashkent-MW: united in MW
  • Tashkent-MW
    • Implementation
    • Recovery
    • Performance

12
Tashkent-MW
[Figure: Tx A, Tx B and Tx C arrive at Replicas 1, 2 and 3; the replication middleware (global ordering) provides durability and orders them A → B → C; every replica commits A → B → C with durability OFF.]
13
Tashkent-MW: Durability & Ordering in Middleware
  • Middleware logs tx effects
    • Durability of update tx guaranteed in middleware
    • Turn durability off at database
  • Middleware performs durability & ordering
    • United → group commit → fast
  • Database commits update tx serially
    • Commit: quick main-memory operation
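This division of labor can be sketched as follows, under stated assumptions: the hypothetical `MiddlewareLog` appends the writesets (tx effects) of a batch of globally ordered update transactions to a file and makes them durable with a single `os.fsync`, and the hypothetical `commit_in_memory` stands in for a database running with durability off, applying the batch serially as pure memory updates. The JSON record format is invented for illustration.

```python
import json
import os
import tempfile


class MiddlewareLog:
    """Hypothetical middleware log: one fsync per batch of ordered writesets."""

    def __init__(self, path: str) -> None:
        self.f = open(path, "a", encoding="utf-8")
        self.fsyncs = 0

    def append_batch(self, batch) -> None:
        for seq, tx, writeset in batch:
            self.f.write(json.dumps({"seq": seq, "tx": tx, "ws": writeset}) + "\n")
        self.f.flush()
        os.fsync(self.f.fileno())  # the whole batch is durable before any ack
        self.fsyncs += 1

    def close(self) -> None:
        self.f.close()


def commit_in_memory(db: dict, batch) -> None:
    """Database with durability off: serial, main-memory-only commits."""
    for _seq, _tx, writeset in sorted(batch):
        db.update(writeset)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mw.log")
    log = MiddlewareLog(path)
    db = {}
    batch = [(1, "A", {"x": 1}), (2, "B", {"x": 2}), (3, "C", {"y": 3})]
    log.append_batch(batch)      # group commit in the middleware
    commit_in_memory(db, batch)  # no disk I/O at the database
    log.close()
    with open(path, encoding="utf-8") as f:
        logged = [json.loads(line)["tx"] for line in f]

print(logged, db, log.fsyncs)  # ['A', 'B', 'C'] {'x': 2, 'y': 3} 1
```

The single `fsync` covers all three transactions, and the database's serial commits never touch the disk.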

14
Recovery in Tashkent-MW
[Figure: the replication middleware (global ordering) provides durability; Replicas 1, 2 and 3 run with durability OFF.]
15
Standard Database I/O
[Figure: Tx A modifies data pages and log records in memory, then the replica crashes; on disk, the log holds A and a data page holding A is left bad.]
  • Log flushed for
    • 1- Durability
    • 2- Allow cleaning dirty data pages (physical integrity)
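The second reason for flushing the log is the write-ahead logging (WAL) rule: a dirty data page may be written to disk only after the log records describing it are on disk, so a page left bad by a crash can be repaired from the log. The toy buffer pool below uses hypothetical class and field names; real engines similarly track a log sequence number (LSN) per page. It enforces the rule when it cleans a page.

```python
class ToyBufferPool:
    """Toy buffer pool enforcing write-ahead logging when cleaning pages."""

    def __init__(self) -> None:
        self.log = []          # log records in memory, LSN = index + 1
        self.flushed_lsn = 0   # highest LSN known to be on disk
        self.dirty = {}        # page -> (value, LSN of last change)
        self.disk_pages = {}   # page contents on disk
        self.log_flushes = 0

    def update(self, page: str, value: str) -> int:
        self.log.append((page, value))
        lsn = len(self.log)
        self.dirty[page] = (value, lsn)
        return lsn

    def flush_log(self, upto_lsn: int) -> None:
        if upto_lsn > self.flushed_lsn:
            self.flushed_lsn = upto_lsn
            self.log_flushes += 1

    def clean_page(self, page: str) -> None:
        value, page_lsn = self.dirty.pop(page)
        # WAL rule: the log must reach the page's LSN before the page is written.
        self.flush_log(page_lsn)
        self.disk_pages[page] = value


pool = ToyBufferPool()
pool.update("p1", "A")
pool.clean_page("p1")
print(pool.flushed_lsn, pool.disk_pages)  # 1 {'p1': 'A'}
```

When the database's durability is turned off, this guarantee can no longer be relied on after a crash.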
16
Database I/O with Durability Off
Middleware order: A → B → C
[Figure: Tx A modifies data and log in memory while durability is provided by the middleware; after a crash, a data page holding A on disk may be bad.]
  • Simple solution: recover from a data dump (checkpoint)
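Recovery from a dump can be sketched as: restore the checkpoint, then re-apply, in global order, the writesets made durable after that checkpoint. Replaying them from the middleware log is an assumption of this sketch (the slides only state that the middleware provides durability); `recover`, `dump_seq` and the log tuple layout are hypothetical.

```python
def recover(dump_state: dict, dump_seq: int, mw_log: list) -> dict:
    """Rebuild a replica's state after a crash with database durability off.

    dump_state: data from the last dump (checkpoint).
    dump_seq:   global position of the last transaction included in the dump.
    mw_log:     (seq, tx, writeset) records made durable by the middleware.
    """
    state = dict(dump_state)
    for seq, _tx, writeset in sorted(mw_log):
        if seq > dump_seq:  # skip what the dump already contains
            state.update(writeset)
    return state


mw_log = [(1, "A", {"x": 1}), (2, "B", {"x": 2}), (3, "C", {"y": 3})]
restored = recover({"x": 1}, dump_seq=1, mw_log=mw_log)
print(restored)  # {'x': 2, 'y': 3}
```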
17
Roadmap

18
Performance - Setup
  • Metrics
    • Throughput
    • Response time
  • Workload
    • AllUpdates: 1 update per tx; mix: 100% updates
    • TPC-B: 4 updates and 1 read per tx; mix: 100% updates
    • TPC-W: mix of long and short txs
  • System configuration
    • Linux cluster running PostgreSQL

19
AllUpdates Throughput
20
AllUpdates Throughput
21
AllUpdates Throughput
22
AllUpdates Response Time
23
In the Paper
  • Design & implementation
    • Tashkent-API
  • Performance results
    • TPC-B & TPC-W
    • Recovery times
    • Other I/O subsystems

24
Conclusions
  • Durability & ordering
    • Separated → serial commit → slow
    • United → group commit → fast
  • Two implementations
    • Tashkent-API: united in DB
    • Tashkent-MW: united in MW
  • Tashkent-MW system
    • Pure middleware replication
    • Significant performance improvement

25
(No Transcript)
26
(No Transcript)
27
Concurrency Control
  • Generalized Snapshot Isolation (GSI)
  • Conclusions valid whenever replicas agree
    • 1- on which update transactions commit
    • 2- on their commit order
  • Example (bank database)
    • T1: set balance = 1000
    • T2: set balance = 2000
    • Replica 1 sees T1 then T2 → balance 2000
    • Replica 2 sees T2 then T1 → balance 1000

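The bank example takes only a few lines to reproduce: both transactions blindly overwrite the same balance, so the final value depends only on commit order. The helper `apply_in_order` is hypothetical.

```python
def apply_in_order(order, writes, balance=0):
    """Apply blind writes 'set balance = v' in the given commit order."""
    for tx in order:
        balance = writes[tx]
    return balance


writes = {"T1": 1000, "T2": 2000}
replica1 = apply_in_order(["T1", "T2"], writes)
replica2 = apply_in_order(["T2", "T1"], writes)
print(replica1, replica2)  # 2000 1000
```

Both replicas commit the same set of transactions yet diverge; agreeing on the commit order as well is what keeps them identical.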
28
Durability and Ordering 1/2
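[Figure: Replica 1 executes T4 and T9; its proxy sends them to the certifier, which writes T4, T9 to the certifier log, and Replica 1's database writes T4, T9 to its own log (DB1).]
→ Scalability problem: one write per transaction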
29
Durability and Ordering 2/2
[Figure: the certifier log holds T1 T2 T3 T4 T5 T6 T7 T8 T9. Replica 1 executed T4 and T9 and Replica 2 executed T3 and T8; further replicas are not shown. Replica 1's database log (DB1) also holds T1 to T9, written in groups: T1, T2, T3 | T4 | T5, T6, T7, T8 | T9.]
→ One disk write
→ Scalability problem: two writes per transaction
30
AllUpdates 1-Replica Throughput
Low replication overhead: 1-replica system vs. standalone DB
31
AllUpdates Response Time
32
TPC-B Throughput
Low replication overhead: 1-replica system vs. standalone DB
Performance scales with multiple replicas