Replication

Slides: 48

Provided by: Sukuma6

Learn more at: http://homepage.cs.uiowa.edu

1
Replication

Improves reliability
Improves availability
(What good is a reliable system if it is not
available?)
Replication must be transparent and create the
illusion of a single copy.

2
Updating replicated data
[Figure: Alice and Bob either share a single copy of file F or each work on a separate replica of F]
Update and consistency are primary issues.
3
Passive replication

Each client communicates with one
replica called the primary server
Each client maintains a variable L
(leader) that specifies the replica to
which it will send requests. Requests
are queued at the primary server.
Backup servers ignore client requests.

[Figure: clients, each holding L = 3, send requests to the primary server; the other replicas are backups]
4
Primary-backup protocol

Receive. Receive the request from the client and
update the state if appropriate.
Broadcast. Broadcast an update of the state to
all other replicas.
Reply. Send a response to the client.
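
The three steps above can be sketched in a few lines of Python. This is a minimal illustration and not the presenter's code: the class names, the key-value state, and the in-process "broadcast" (a plain loop over backup objects) are all assumptions, and failures are not modelled.

```python
class Replica:
    """One replica holding a copy of a key-value state (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.state = {}

    def apply_update(self, key, value):
        self.state[key] = value


class Primary(Replica):
    """The replica that clients talk to; backups only receive updates."""

    def __init__(self, name, backups):
        super().__init__(name)
        self.backups = backups

    def handle_request(self, key, value):
        # Receive: apply the client's request to the local state.
        self.apply_update(key, value)
        # Broadcast: send the state update to all other replicas.
        for backup in self.backups:
            backup.apply_update(key, value)
        # Reply: answer the client only after the backups are updated.
        return "ok"
```

Replying only after the broadcast means that a backup promoted later already holds every update the client has seen acknowledged.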

[Figure: the client sends req to the primary; the primary sends update to the backup and reply to the client]
5
Primary-backup protocol

If the client fails to get a response due
to the crash of the primary, then the
request is retransmitted until a
backup is promoted as the primary.
The switch should ideally be
instantaneous, but in practice
it is not.
Failover time is the duration when
there is no primary server.

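[Figure: the primary sends heartbeat and update messages to the backup; when the primary crashes, an election is held and a new primary is elected, which then answers the client's retransmitted req with a reply]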
6
Active replication

Each server receives client requests, and
broadcasts them to the other servers. They
collectively implement a fault-tolerant state
machine. In the presence of crashes, all the correct
processes reach the same next state.

[Figure: state machine mapping (State, input) to Next state]
7
Fault-tolerant state machine

This formalism is based on a survey by Fred
Schneider.
The clients must receive a correct response even if
up to m replica servers fail (either fail-stop or
byzantine).
For fail-stop failures, (m + 1) replicas are needed. If a
client queries the replicas, the first one that responds
gives a correct value.
For byzantine failures, (2m + 1) replicas are needed. The m bad
responses can be voted out by the (m + 1) good responses.
But the states of the good processes must be correctly
updated (byzantine consensus is needed).
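
The replica counts and the voting step can be illustrated with a short Python sketch. The function names and the string labels for the failure models are assumptions made for this example; the vote assumes that at most m of the 2m + 1 responses are bad.

```python
from collections import Counter


def replicas_needed(m, failure_model):
    """Number of replicas required to tolerate m faulty replicas."""
    if failure_model == "fail-stop":
        return m + 1          # one surviving replica is enough
    if failure_model == "byzantine":
        return 2 * m + 1      # m + 1 good replicas outvote m bad ones
    raise ValueError("unknown failure model: " + failure_model)


def vote(responses):
    """Return the value reported by a strict majority of the replicas."""
    value, count = Counter(responses).most_common(1)[0]
    if 2 * count <= len(responses):
        raise RuntimeError("no majority among the responses")
    return value
```

With m = 2 byzantine replicas, five responses such as `[7, 7, 7, 9, 1]` still yield the correct value 7, because the three good replicas form a majority.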

[Figure: a fault-intolerant state machine next to its fault-tolerant, replicated version]
8
Replica coordination

Agreement. Every correct replica receives all the
requests.
Order. Every correct replica receives the
requests in the same order.
The agreement part is solved by atomic multicast.
The order part is solved by total order multicast.
The order part solves the consensus problem,
in which the servers agree on the next update.
It requires a synchronous model. Why?

9
Agreement

With fail-stop processors, the agreement part
is solved by reliable atomic multicast.
To deal with byzantine failures, an interactive
consistency protocol needs to be implemented.
Thus, with an oral message protocol, n ≥ 3m + 1
processors will be required.

10
Order

Let timestamps determine the message order.

A request is stable at a server when the server
does not expect to receive any other client
request with a lower timestamp. Assume three
clients are trying to send an update, the
channels are FIFO, and their timestamps are 20,
30, 42. Each server will first update its copy
with the value that has the timestamp 20.
[Figure: three clients send requests with timestamps 20, 30 and 42 to a server]
11
Order
But some clients may not have any update. How
long should the server wait? Require clients to
send null messages (as heartbeat signals) with
some timestamp ts. A message (null, 35) means
that the client will not send any update till
ts = 35. These can be part of periodic heartbeat
messages. An alternative is to use virtual time,
where processes are able to undo actions.
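
The stability test with null messages can be sketched as follows. This is an illustrative model, not the presenter's algorithm: the class name is hypothetical, and it assumes FIFO channels, timestamps that are unique across clients, and timestamps that increase on every channel. Under those assumptions a pending request is stable once every client has been heard from with a timestamp at least as large.

```python
import heapq


class OrderingServer:
    """Applies client updates in timestamp order once they become stable."""

    def __init__(self, clients):
        # Largest timestamp received so far from each client.
        self.latest = {client: 0 for client in clients}
        self.pending = []   # min-heap of (timestamp, client, update)
        self.applied = []   # updates applied so far, in order

    def receive(self, client, timestamp, update=None):
        # update=None models a null (heartbeat) message.
        self.latest[client] = timestamp
        if update is not None:
            heapq.heappush(self.pending, (timestamp, client, update))
        self._apply_stable()

    def _apply_stable(self):
        # No client can still send a timestamp below this horizon.
        horizon = min(self.latest.values())
        while self.pending and self.pending[0][0] <= horizon:
            timestamp, _client, update = heapq.heappop(self.pending)
            self.applied.append((timestamp, update))
```

In the scenario from the slides, the requests stamped 30 and 42 wait until the request stamped 20 arrives; the request stamped 30 is then released only when the first client sends (null, 35).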
[Figure: clients send messages with timestamps 30, (null, 35) and 42 to a server]
12
What is replica consistency?
[Figure: clients accessing a set of replicas]
Consistency models define a contract between the
data manager and the clients regarding the
responses to read and write operations.
13
Replica Consistency

Data-centric
The client communicates with the same replica.
Client-centric
The client communicates with different replicas at
different times. This may be the case with mobile
clients.

14
Data-centric Consistency Models

1. Strict consistency
2. Linearizability
3. Sequential consistency
4. Causal consistency
5. Eventual consistency (as in DNS)
6. Weak consistency
There are many other models.

15
Strict consistency

Strict consistency corresponds to true
replication transparency. If one of the processes
executes x := 5 at real time t and this is the
latest write operation, then at a real time t' >
t, every process trying to read x will receive
the value 5. Too strict! Why?

[Figure: p1 performs W(x:=5) at time t; p2 performs R(x=5) at time t' > t]
Assume the read or write operations are
non-blocking
16
Sequential consistency

Some interleaving of the local temporal order of
events at the different replicas is a consistent
trace.
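
A brute-force check of this definition can be written in Python. The sketch below is illustrative only: the operation encoding `("W", var, value)` / `("R", var, value)` and the function name are assumptions, and the search is exponential, so it is suitable only for tiny traces.

```python
def sequentially_consistent(histories, initial=0):
    """Search for an interleaving that keeps each process's local order
    and in which every read returns the latest value written."""

    def search(positions, memory):
        if all(pos == len(h) for pos, h in zip(positions, histories)):
            return True   # every operation has been scheduled
        for i, history in enumerate(histories):
            pos = positions[i]
            if pos == len(history):
                continue
            kind, var, value = history[pos]
            if kind == "R" and memory.get(var, initial) != value:
                continue  # this read cannot be scheduled next
            new_memory = dict(memory)
            if kind == "W":
                new_memory[var] = value
            new_positions = positions[:i] + (pos + 1,) + positions[i + 1:]
            if search(new_positions, new_memory):
                return True
        return False

    return search((0,) * len(histories), {})
```

For example, if two processes write x := 100 and x := 99 and two readers both observe 100 and then 99, a consistent interleaving exists. If one reader observes 100 then 99 and the other 99 then 100, none exists.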

[Figure: writes W(x:=100) and W(x:=99) and reads R(x=100) and R(x=99) issued at different replicas]
17
Sequential consistency

Is sequential consistency satisfied here?
Initially x = y = 0

[Figure: timeline with the operations W(x:=10), W(x:=8), R(x=10), W(x:=20), R(x=20), R(x=10)]
18
Causal consistency

All writes that are causally related must be
seen by every process in the same order.

[Figure: timeline with the operations W(x:=10), W(x:=20), R(x=10), R(x=20), R(x=10), R(x=20)]
19
Linearizability

Linearizability is a correctness criterion for
concurrent objects (Herlihy & Wing, ACM TOPLAS
1990). It provides the illusion that each
operation on the object takes effect in zero
time, and the results are equivalent to some
legal sequential computation.

20
Linearizability

A trace in a read-write system is consistent
when every read returns the latest value written
into the shared variable preceding that read
operation. A trace is linearizable when (1) it
is consistent, and (2) the temporal ordering
among the reads and writes is respected (may be
based on real time or logical time).

[Figure: trace with the operations W(x:=0), R(x=1), W(x:=0), R(x=1), W(x:=1) and the timestamps ts = 10, 19, 21, 27, 38; initially x = y = 0]
Linearizability is stronger than sequential
consistency, i.e. every linearizable object is
also sequentially consistent.
Is it a linearizable trace?
21
Exercise
What consistency model is satisfied by the above?
22
Implementing consistency models

Why are there so many consistency models?
Each model has a use in some type of
application.
The cost of implementation (as measured by
message complexity) decreases as the models
become weaker.

23
Implementing linearizability
[Figure: concurrent operations W(x:=20), Read x, W(x:=10), Read x]
Needs total order multicast of all reads and
writes
24
Implementing linearizability

The total order multicast forces every process to
accept and handle all reads and writes in the
same temporal order.
The peers update their copies in response to a
write, but only send acknowledgments for reads.
After all updates and acknowledgments are
received, the local copy is returned to the
client.

25
Implementing sequential consistency

Use total order broadcast for writes only;
for reads, immediately return the local copy.

26
Eventual consistency

Only guarantees that all replicas eventually
receive all updates, regardless of the order.
The system does not provide replication
transparency, but large-scale systems like Bayou
allow this. Conflicting updates are resolved
using occasional anti-entropy sessions that
incrementally steer the system towards a
consistent configuration.

27
Implementing eventual consistency

Updates are propagated via epidemic protocols.
Server S1 randomly picks a neighboring server S2,
and passes on the update.
Case 1. S2 did not receive the update before. In
this case, S2 accepts the update, and both S1 and
S2 continue the process.
Case 2. S2 already received the update from
someone else. In that case, S1 loses interest in
sending updates to S2: it reduces the probability of
transmission to S2 to 1/p, where p is a tunable
parameter.
There is always a finite probability that some
servers do not receive all updates. The number
can be controlled by changing p.
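
A small simulation shows why some servers may never receive an update. Note the simplification: this sketch uses the common rumor-mongering variant in which a server that contacts an already-informed peer stops spreading altogether with probability 1/p, rather than tracking a separate probability per neighbor. The function name, the fully connected topology, and the fixed seed are assumptions.

```python
import random


def spread_rumor(n_servers, p, seed=0):
    """Return the set of servers that received the update.
    Server 0 starts with the update; n_servers must be at least 2."""
    rng = random.Random(seed)
    informed = {0}
    active = [0]   # servers still interested in spreading the update
    while active:
        s1 = rng.choice(active)
        s2 = rng.choice([s for s in range(n_servers) if s != s1])
        if s2 not in informed:
            # Case 1: S2 accepts the update and starts spreading it too.
            informed.add(s2)
            active.append(s2)
        elif rng.random() < 1.0 / p:
            # Case 2: S2 already had it, so S1 loses interest.
            active.remove(s1)
    return informed


reached = spread_rumor(n_servers=50, p=3)
print(len(reached), "of 50 servers received the update")
```

Running it with different values of p shows how the parameter trades message traffic against the number of servers left without the update.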

28
Anti-entropy sessions

These sessions minimize the degree of chaos in
the states of the replicas.
During such a session, server S1 will pull the
update from S2, and server S3 can push the
update to S4
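
The pull and push exchanges can be sketched as below. The dictionary layout (`value`, `ts`) and the rule "the copy with the larger timestamp wins" are assumptions for illustration; the timestamps used in the usage lines are hypothetical.

```python
def pull(initiator, peer):
    """The initiator fetches the peer's update and keeps it if it is newer."""
    if peer["ts"] > initiator["ts"]:
        initiator["value"], initiator["ts"] = peer["value"], peer["ts"]


def push(initiator, peer):
    """The initiator sends its update; the peer keeps it if it is newer."""
    if initiator["ts"] > peer["ts"]:
        peer["value"], peer["ts"] = initiator["value"], initiator["ts"]


s1 = {"value": "old", "ts": 24}
s2 = {"value": "new", "ts": 30}
pull(s1, s2)   # S1 now holds the update with timestamp 30
```

Both operations move the newer update in one direction only; they differ in which server initiates the session.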

[Figure: servers S1, S2, S3 and S4, each labelled with the timestamp of its latest update (values between 24 and 32)]
29
Exercise

Let x, y be two shared variables.
Process P                 Process Q
initially x = 0           initially y = 0
x := 1                    y := 1
if y = 0 → x := 2 fi      if x = 0 → y := 2 fi
print x                   print y
If sequential consistency is preserved, then
what are the possible values of the printouts?
List all of them.
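
One way to check an answer to this exercise is to enumerate every interleaving of the two programs. The sketch below is an assumption-laden model, not part of the original slides: each process is split into four atomic steps (write, read the other variable, conditional write, print), and sequential consistency is modelled as a single shared memory on which the steps interleave.

```python
from itertools import combinations


def run(schedule):
    """Execute one interleaving; schedule is a list of 'P'/'Q' entries."""
    memory = {"x": 0, "y": 0}
    seen = {}      # value of the other variable read by each process
    printed = {}

    def steps(mine, other):
        def write_one():
            memory[mine] = 1

        def read_other():
            seen[mine] = memory[other]

        def conditional_write():
            if seen[mine] == 0:
                memory[mine] = 2

        def print_mine():
            printed[mine] = memory[mine]

        return iter([write_one, read_other, conditional_write, print_mine])

    programs = {"P": steps("x", "y"), "Q": steps("y", "x")}
    for who in schedule:
        next(programs[who])()
    return printed["x"], printed["y"]


def all_outcomes():
    """Collect (printed x, printed y) over all 70 interleavings."""
    outcomes = set()
    for p_slots in combinations(range(8), 4):
        schedule = ["P" if i in p_slots else "Q" for i in range(8)]
        outcomes.add(run(schedule))
    return outcomes


print(sorted(all_outcomes()))
```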

30
Client centric consistency model
Relevant in the cloud storage environment
31
Client-centric consistency model

Read-after-read
If a read from A is followed by a read from B, then
the second read should return data that is at
least as recent as the data returned by the previous read.

[Figure: replicas A and B, located in Iowa City and San Francisco]
All the emails read at location A must be marked
as read in location B
32
Client-centric consistency model

Read-after-write (a.k.a. read your writes)
Consider a large distributed store containing a
massive collection of music. Clients set up
password-protected accounts for purchasing and
downloading music.
If Alice changes her password in Iowa City,
travels to Minneapolis, and tries to access
the collection by logging into the account using
her new password, then she must be able to do so.

33
Client-centric consistency model

Write-after-read (a.k.a. write-follows-read)
Each write operation following a read should
take effect on the previously read copy, or a
more recent version of it.

Balance in Iowa City bank after your paycheck was
credited: Balance = 1500
Use your bank card to pay 500 in a store in
Denver: Balance := Balance - 500
The write should take effect on Balance = 1500.
But the payment did not go through!
Write-after-read consistency was violated.
34
Client-centric consistency model

Write-after-write (a.k.a. monotonic write)
When a write at S is followed by a write at a
different server S', the updates at S must be
visible at S' before the data is updated at S'.

[Figure: server S in Dallas and server S' in San Francisco]
Alice gave a raise to each of her 100 employees.
Alice then went to San Francisco.
Only ½ of the updates at S are visible here.
Alice then decided to give a 10% bonus on the new
salary to every employee.
½ of the employees will receive a lower bonus.
Write-after-write consistency was violated.
35
Implementing client-centric consistency
Read set RS, write set WS.
Before an operation at a different server is initiated, the
appropriate RS or WS is fetched from another
server.
36
Quorum-based protocols
A quorum system engages only a designated minimum
number of the replicas for every read or write
operation; this number is called the read or
write quorum. When the quorum is not met, the
operation (read or write) is not performed.
Improves reliability and availability, and reduces the
load on individual servers.
37
Quorum-based protocols
Use 2-phase locking to update all the copies.
Each copy stores a pair (value, version).
Thomas rule
To write, update > N/2 of them, and tag them with a
new version number. To read, access > N/2
replicas, and use the value from the copy with
the largest version number. Otherwise abandon the
read.
[Figure: a write quorum and a read quorum over the replicas]
38
Rationale
N = number of replicas.
[Figure: replicas holding different versions (Ver 2, Ver 3)]
If different replicas store different version
numbers for an item, the state associated with a
larger version number is more recent than the
state associated with a smaller version
number. We require that R + W > N, i.e., read
quorums always intersect with write quorums.
This will ensure that read results always reflect
the result of the most recent write (because the
read quorum will include at least one replica
from the most recent write).
39
How it works
N = number of replicas.
1. Send a write request containing the state and a
new version number to all the replicas, and wait
to receive acknowledgements from a write quorum.
At that point the write operation is complete.
The replicas are locked while the write is in
progress.
2. Send a read request for the version
number to all the replicas, and wait for replies
from a read quorum.
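
The write and read steps above can be sketched as a small in-memory model. Everything here is an assumption made for illustration: the class name, the `reachable` argument (the indices of the replicas that answer), and the choice to derive the new version number from the write quorum itself. Locking is omitted, so the sketch only covers one operation at a time.

```python
class QuorumStore:
    """N replicas of one item, each holding a (value, version) pair."""

    def __init__(self, n, r, w):
        if r + w <= n or 2 * w <= n:
            raise ValueError("need R + W > N and W > N/2")
        self.n, self.r, self.w = n, r, w
        self.replicas = [{"value": None, "version": 0} for _ in range(n)]

    def write(self, value, reachable):
        """Return True if a write quorum acknowledged the update."""
        if len(reachable) < self.w:
            return False   # quorum not met: the write is not performed
        # Any two write quorums overlap, so the largest version seen
        # here is the version of the most recent write.
        version = 1 + max(self.replicas[i]["version"] for i in reachable)
        for i in reachable:
            self.replicas[i] = {"value": value, "version": version}
        return True

    def read(self, reachable):
        """Return (value, version) from the newest copy, or None."""
        if len(reachable) < self.r:
            return None    # quorum not met: abandon the read
        newest = max((self.replicas[i] for i in reachable),
                     key=lambda copy: copy["version"])
        return newest["value"], newest["version"]
```

With N = 5 and R = W = 3, a read from replicas {0, 1, 2} after writes to {0, 1, 2} and then {2, 3, 4} returns the second write, because replica 2 belongs to both quorums.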
40
Quorum-based protocols
After a partition, only the larger segment runs
the consensus protocol. The smaller segment
contains stale data, until the network is
repaired.
[Figure: after a partition, one segment holds Ver. 1 and the other holds Ver. 0]
41
Quorum-based protocols: generalized version
Asymmetric quorum:
W + R > N (no read overlaps with a write)
W > N/2 (no two writes overlap)
R = read quorum, W = write quorum
This generalization is due to Gifford.
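
The two inequalities can be checked by brute force for small N. The sketch below is illustrative; the function names are hypothetical. It enumerates every pair of quorums of the given sizes and confirms that they always share a replica exactly when the corresponding inequality holds.

```python
from itertools import combinations


def quorums_always_intersect(n, size_a, size_b):
    """True if every size_a-subset of n replicas meets every size_b-subset."""
    replicas = range(n)
    return all(set(a) & set(b)
               for a in combinations(replicas, size_a)
               for b in combinations(replicas, size_b))


def valid_quorums(n, r, w):
    """The asymmetric quorum conditions: W + R > N and W > N/2."""
    return w + r > n and 2 * w > n


# Reads meet writes iff R + W > N; writes meet writes iff W > N/2.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert quorums_always_intersect(n, r, w) == (r + w > n)
            assert quorums_always_intersect(n, w, w) == (2 * w > n)
```

Read-one/write-all (R = 1, W = N) and majority quorums are both special cases that satisfy the conditions.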
42
Brewer's CAP Theorem
In an invited talk at the PODC 2000 conference,
Eric Brewer presented a conjecture that it is
impossible for a web service to provide all three
of the following guarantees: consistency (C),
availability (A), and partition-tolerance (P).
Individually, each of these guarantees is highly
desirable; however, a web service can meet at
most two of the three guarantees.
43
A High-level View of CAP Theorem
For consistency and availability, propagate the
update from the left to the right partition. But
how can you do it? So sacrifice partition
tolerance.
If you prefer partition tolerance and
availability, then sacrifice consistency.
Or if you prefer both partition-tolerance and
consistency, then sacrifice availability: users
in the right partition will wait indefinitely
until the partition is restored and the update is
propagated to the right.
44
Amazon Dynamo
Amazon's Dynamo is a highly scalable and highly
available key-value store designed to support
the implementation of its various e-commerce
services. Dynamo serves tens of millions of
customers at peak times using thousands of
servers located across numerous data centers
around the world. Dynamo uses distributed hash
tables (DHT) to map its servers in a circular
key space using consistent hashing, commonly used
in many P2P networks.
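
Consistent hashing on a circular key space can be sketched as follows. This is a generic illustration, not Dynamo's implementation: the SHA-256 hash, the 32-bit ring, the class and function names, and the absence of virtual nodes are all assumptions, and the server names are placeholders.

```python
import bisect
import hashlib


def ring_position(name, ring_size=2**32):
    """Map a server name or a key to a point on the circular key space."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % ring_size


class HashRing:
    def __init__(self, servers):
        # Servers sorted by their position on the ring.
        self.points = sorted((ring_position(s), s) for s in servers)

    def preference_list(self, key, count):
        """The first `count` servers found clockwise from the key's position."""
        start = bisect.bisect_left(self.points, (ring_position(key), ""))
        count = min(count, len(self.points))
        return [self.points[(start + i) % len(self.points)][1]
                for i in range(count)]


ring = HashRing(["SA", "SB", "SC", "SD", "SE"])
print(ring.preference_list("K", 3))   # home server plus two replicas
```

Adding or removing one server only changes the assignment of the keys that fall between that server and its predecessor on the ring, which is the reason for using this scheme.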
45
Amazon Dynamo
[Figure: (a) The key K is stored in the server SG and is
also replicated in servers like SH and SA. (b)
The evolution of multi-version data as reflected
by the values of the vector clocks.]
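
How vector clocks distinguish an older version from a conflicting one can be shown with a short comparison function. The dictionary representation (server name mapped to a counter), the function names, and the server names Sx, Sy, Sz are assumptions for this sketch.

```python
def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(server, 0) >= count for server, count in b.items())


def compare(a, b):
    """Classify the relationship between two versions' vector clocks."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"   # conflicting versions that must be reconciled


d2 = {"Sx": 2}
d3 = {"Sx": 2, "Sy": 1}
d4 = {"Sx": 2, "Sz": 1}
print(compare(d3, d2))   # d3 was derived from d2
print(compare(d3, d4))   # neither version has seen the other's update
```

Only versions whose clocks are concurrent need to be returned together and reconciled; a superseded version can simply be discarded.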
46
Amazon Dynamo
Multiple versions of data are, however, rare. In a
24-hour profile of the shopping cart service,
99.94% of requests saw exactly one version,
and 0.00057% of requests saw 2 versions.
Write: the coordinator generates the vector clock for
the new version, and sends it to the top T
reachable nodes. If at least W nodes respond,
then the write is considered successful.
Read: the coordinator sends a request for all existing
versions to the top T reachable servers. If it
receives R responses, then the read is considered
successful.
Uses sloppy quorum: T, R, and W
are limited to the first set of reachable
non-faulty servers in the consistent hashing ring.
This speeds up the read and the write
operations by avoiding the slow servers.
Typically, (T, R, W) = (3, 2, 2).
47
Amazon Dynamo
Maintains the spirit of "always write". When a
designated server S is inaccessible or down, the
write is directed to a different server S' with a
hint that this update is meant for S. S' later
delivers the update to S when it recovers (hinted
handoff).
Service level agreement: quite
stringent. A typical SLA requires that 99.9% of
the read and write requests execute within 300 ms;
otherwise customers lose interest and business
suffers.
