1. DStore: An Easy-to-Manage Persistent State Store
Andy Huang and Armando Fox, Stanford University
2. Outline
- Project overview
- Consistency guarantees
- Failure detection
- Benchmarks
- Next steps and bigger picture
3. Background: Scalable CHTs

(Diagram: frontends, app servers, and DBs in a three-tier cluster)

Cluster hash tables (CHTs)
- Single-key-lookup data
  - Yahoo! user profiles
  - Amazon catalog metadata
- Underlying storage layer
  - Inktomi: wordID -> docID list, docID -> document metadata
  - DDS/Ninja: atomic compare-and-swap
4. DStore: An easy-to-manage CHT

Challenges
- Capacity planning: high scaling costs necessitate accurate load prediction
- Failure detection: fast detection is at odds with accurate detection

Benefits
- Cheap recovery: predictably fast, with a predictably small impact on availability/performance
- Our online repartitioning algorithm lowers scaling cost; reactive scaling adjusts capacity to match current load
- Lower cost of acting on false positives: effective failure detection is not contingent on accuracy

Manage like stateless frontends
5. Cheap recovery: Principles and costs

Techniques
- Single-phase writes: no locking and no transactional logging
- Quorums: no recovery code to freeze writes and copy missed updates (see the sketch below)

Costs
- Sacrifice some consistency: well-defined guarantees that provide consistent ordering
- Higher replication factor: 2N+1 bricks to tolerate N failures (vs. N+1 in ROWA)

Trade storage and consistency for cheap recovery
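
To make the technique concrete, here is a minimal sketch of a single-phase quorum write over in-memory Brick replicas; the names, the timestamp scheme, and the return values are illustrative stand-ins, not DStore's actual interfaces:

    import time

    class Brick:
        """One replica: an in-memory hash table of key -> (timestamp, value)."""
        def __init__(self):
            self.store = {}    # key -> (timestamp, value)
            self.cookies = {}  # key -> timestamp, used by the cookie sketch below

        def put(self, key, ts, value):
            # Apply the write only if it is newer than what this replica holds.
            cur = self.store.get(key)
            if cur is None or ts > cur[0]:
                self.store[key] = (ts, value)
            return True

    def quorum_write(bricks, key, value):
        """Single-phase write: send to all 2N+1 replicas, succeed on a majority.

        There is no lock and no transaction log; a failure mid-write can
        leave a minority of replicas stale, which read-repair fixes lazily.
        """
        ts = time.time()  # writes to a key are ordered by timestamp
        acks = 0
        for b in bricks:
            try:
                if b.put(key, ts, value):
                    acks += 1
            except IOError:   # an unreachable brick simply does not ack
                pass
        majority = len(bricks) // 2 + 1
        return "SUCCESS" if acks >= majority else "UNKNOWN"

With 2N+1 bricks, any two majorities intersect, so a majority read always sees at least one copy of the latest successful write; this is why tolerating N failures costs 2N+1 replicas instead of ROWA's N+1.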
6. Nothing new under the sun, but...

Technique           | Prior work                                                   | DStore
CHT                 | Scalable performance                                         | Ease of management
Quorums             | Availability during network partitions and Byzantine faults  | Availability during failures and recovery
Relaxed consistency | Availability and performance while nodes are unavailable     | Availability during failures and recovery
Result              | High availability and performance (end goal)                 | Cheap recovery (but that's just the start)
7. Cheap recovery simplifies state management

Challenge             | Prior work                                                          | DStore
Failure detection     | Difficult to make fast and accurate                                 | Effective even if it is not highly accurate
Online repartitioning | Relatively new area (Aqueduct)                                      | Duration and impact are predictably small
Capacity planning     | Predict future load                                                 | Scale reactively based on current load
Data reconstruction   | RAID                                                                | Future work
Result                | State management is costly (administration- and availability-wise) | Manage state with techniques used for stateless frontends
8. Outline
- Project overview
- Consistency guarantees
- Failure detection
- Benchmarks
- Next steps and bigger picture
9. Consistency guarantees

- Usage model
- Guarantee: for a key k, DStore enforces a global order of operations that is consistent with the order seen by individual clients.
- C1 issues w1(k, vnew) to replace the current hash table entry (k, vold)
  - w1 returns SUCCESS: subsequent reads return vnew
  - w1 returns FAIL: subsequent reads return vold
  - w1 returns UNKNOWN (due to Dlib failure): two cases, illustrated below
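
The contract, restated as a hypothetical client-side snippet (dlib, put, and get are made-up stand-ins for the Dlib interface):

    def illustrate_guarantee(dlib, k, v_old, v_new):
        """dlib is a stand-in for the client-side Dlib library."""
        result = dlib.put(k, v_new)           # w1(k, v_new) replacing (k, v_old)
        if result == "SUCCESS":
            assert dlib.get(k) == v_new       # all subsequent reads return v_new
        elif result == "FAIL":
            assert dlib.get(k) == v_old       # all subsequent reads return v_old
        else:                                 # "UNKNOWN": Dlib failed mid-write
            assert dlib.get(k) in (v_old, v_new)  # resolved by the two cases below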
10. Case 1: Another user U2 performs a read

(Diagram: U1's interrupted write of (k1, vnew) over (k1, vold) reaches only some of bricks B1-B3; U2 then reads k1)

- If U2's r(k1) returns vold, no user has read vnew; if it returns vnew, no user will later read vold
- A Dlib failure can cause a partial write, violating the quorum property
- If the timestamps returned by the bricks differ, read-repair restores the majority invariant (see the sketch below)
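
A sketch of read-repair over the Brick replicas from the earlier write sketch; the majority read and the timestamp comparison are the essence, the rest is illustrative:

    def quorum_read(bricks, key):
        """Read from a majority and repair stale replicas in passing.

        After a partial write the replicas disagree on the timestamp;
        writing the newest version back restores the majority invariant,
        spreading recovery over ordinary reads instead of a recovery phase.
        """
        majority = len(bricks) // 2 + 1
        replies = []  # (brick, (timestamp, value)) pairs
        for b in bricks:
            entry = b.store.get(key)
            if entry is not None:
                replies.append((b, entry))
            if len(replies) >= majority:
                break
        if not replies:
            return None
        newest_ts, newest_val = max((e for _, e in replies), key=lambda e: e[0])
        for b, (ts, _) in replies:
            if ts < newest_ts:  # this replica missed the latest write
                b.put(key, newest_ts, newest_val)
        return newest_val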
11. Case 2: U1 performs a read

(Diagram: U1 reads k1 from bricks B1-B3 after its own interrupted write)

- On U1's r(k1), the write is immediately committed or aborted: all future readers see either vold or vnew, consistently
- A write-in-progress cookie can be used to detect partial writes and commit/abort on the next read (see the sketch below)
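
One plausible reading of the cookie mechanism, reusing the Brick.cookies field from the first sketch; the commit-on-read policy shown is illustrative, not necessarily DStore's exact protocol:

    import time

    def cookie_write(bricks, key, value):
        """Leave a cookie on every replica for the duration of the write."""
        ts = time.time()
        for b in bricks:
            b.cookies[key] = ts   # marks: this write may be partial
            b.put(key, ts, value)
        for b in bricks:          # reached only if the writer survives
            b.cookies.pop(key, None)

    def cookie_read(bricks, key):
        """A surviving cookie means the writer died mid-write. Finish the
        write now, committing the newest version everywhere, so every
        later reader sees the same outcome. Assumes the key exists."""
        pending = any(key in b.cookies for b in bricks)
        entries = [b.store[key] for b in bricks if key in b.store]
        ts, val = max(entries, key=lambda e: e[0])
        if pending:
            for b in bricks:
                b.put(key, ts, val)
                b.cookies.pop(key, None)
        return val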
12. Consistency guarantees

- C1 issues w1(k, vnew) to replace the current hash table entry (k, vold)
  - w1 returns SUCCESS: subsequent reads return vnew
  - w1 returns FAIL: subsequent reads return vold
  - w1 returns UNKNOWN (due to Dlib failure):
    - U1 reads: w1 is immediately committed or aborted
    - U2 reads: if vold is returned, no user has read vnew; if vnew is returned, no user will later read vold
13. Versus sequential consistency

(Diagram: bricks B1-B3 hold (k1, vold) and (k2, vold); U1 issues w1(k1, vnew); U2 reads)

- Sequential consistency requires two conditions: atomicity and consistent ordering
- UNKNOWN results cause non-atomic writes, so DStore provides consistent ordering rather than full sequential consistency
14. Two-phase commit vs. single-phase writes

Property     | 2-phase commit                                       | Single-phase writes
Consistency  | Sequential consistency                               | Consistent ordering
Recovery     | Read log to complete in-progress transactions        | No special-case recovery
Availability | Locking may cause requests to block during failures  | No locking
Performance  | 2 synchronous log writes, 2 roundtrips               | 1 synchronous update, 1 roundtrip
Other costs  | None                                                 | Read-repair (spreads the cost of 2PC over later reads, making the common case faster); write-in-progress cookie (spreads the responsibility of 2PC)
15. Recovery behavior
Predictably fast and small impact
16. Application-generic failure detection

Detection pipeline:
- Bricks send operating statistics (CPU load, requests processed, etc.) to a beacon listener
- Median absolute deviation flags anomalies (deviation > threshold)
- An anomalous brick is rebooted

Simple detection techniques work because the resolution mechanism is cheap (see the sketch below)
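
A minimal sketch of the median-absolute-deviation check over per-brick statistics; the threshold value is illustrative:

    import statistics

    def mad_outliers(stats, threshold=3.0):
        """Flag bricks whose statistic strays from the cluster median by
        more than `threshold` median-absolute-deviations. The threshold
        needs no careful tuning: a false positive only costs a reboot."""
        med = statistics.median(stats.values())
        mad = statistics.median(abs(v - med) for v in stats.values())
        if mad == 0:
            return []  # all bricks agree; nothing to flag
        return [brick for brick, v in stats.items()
                if abs(v - med) / mad > threshold]

    # e.g. requests processed in the last window, per brick:
    print(mad_outliers({"b1": 510, "b2": 495, "b3": 505, "b4": 120}))
    # -> ['b4'], a reboot candidate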
17. Failure detection and repartitioning behavior

- Aggressive failure detection
- Low scaling cost
- Low cost of acting on false positives
18. Bigger picture: What is self-managing?

- Indicator: brick performance, a sign of system health
- Monitoring: tests for potential problems
- Treatment: a low-impact resolution mechanism
20. Bigger picture: What is self-managing?

- Indicators: brick performance, system load, disk failures
- Simple detection mechanisms and policies
- Key: low-cost mechanisms
- Constant recovery
25. Big picture

- Use simple metrics to trigger scaling (see the sketch below)
  - Brick load
  - Cache hit rate
- Online data reconstruction
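
A sketch of a reactive scaling policy driven by those metrics; the thresholds and action names are made up for illustration:

    def scaling_action(avg_brick_load, cache_hit_rate,
                       load_high=0.8, hit_low=0.6):
        """React to current load rather than predicting future load;
        cheap online repartitioning makes a wrong decision inexpensive
        to reverse."""
        if avg_brick_load > load_high or cache_hit_rate < hit_low:
            return "add a brick and repartition"
        return "no change"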
26. Simple, aggressive failure detection

- Bricks send operating statistics
  - CPU load, average queue delay, number of requests processed, etc.
- Statistical methods
  - Median absolute deviation compares one brick's behavior with the current behavior of the rest of the bricks
  - Tarzan incorporates the past behavior of each brick and detects anomalies in the patterns of operating statistics
- Why these techniques are effective
  - They are not the best failure detection mechanisms
  - Parameters are not highly tuned
  - Simple, application-generic techniques work because of the low cost of acting on false positives