Title: Distributed Systems
1. Distributed Systems: Principles and Paradigms
Chapter 06: Consistency and Replication
2. Consistency and Replication
- Introduction (what's it all about?)
- Data-centric consistency models
- Client-centric consistency models
- Distribution protocols
- Consistency protocols
- Examples
3. Replication
- What kind of things do we replicate in a distributed system?
  - Data
  - Servers
- Why do we replicate things?
  - To increase:
    - Reliability
    - Performance
- What is the main problem in providing replication?
  - Keeping replicas consistent!
4. Shared Objects
Problem: If objects (or data) are shared, we need to do something about concurrent accesses to guarantee state consistency.
5. Concurrency Control (1/2)
Solution (a): the shared object itself can handle concurrent invocations.
Solution (b): the system in which the object resides is responsible.
6. Concurrency Control (2/2)
Problem: How do we manage replicated shared data objects?
Solution (a): objects are replication-aware; an object-specific replication protocol is used for replica management.
Solution (b): the distributed system is responsible for replica management.
7. Performance and Scalability
- Main issue: To keep replicas consistent, we generally need to ensure that all conflicting operations are done in the same order everywhere.
- Conflicting operations (from the world of transactions):
  - Read-write conflict: a read operation and a write operation act concurrently
  - Write-write conflict: two concurrent write operations
- Guaranteeing a global ordering on conflicting operations may be costly, degrading scalability.
- Solution: weaken the consistency requirements so that, hopefully, global synchronization can be avoided.
8. Weakening Consistency Requirements
- What does it mean to weaken consistency requirements?
  - Relax the requirement that updates need to be executed as atomic operations
  - Do not require global synchronization
  - Copies may not always be the same everywhere
- To what extent can consistency be weakened?
  - Depends highly on the access and update patterns of the replicated data
  - Depends on the use of the replicated data (i.e., the application)
9. Data-Centric Consistency Models (1/2)
Consistency model: a contract between a (distributed) data store and processes, in which the data store specifies precisely what the results of read and write operations are in the presence of concurrency.
A data store is a distributed collection of storage locations accessible to clients.
10. Data-Centric Consistency Models (2/2)
- Strong consistency models: operations on shared data are synchronized (models not using synchronization operations)
  - Strict consistency (related to absolute global time)
  - Linearizability (atomicity)
  - Sequential consistency (what we are used to - serializability)
  - Causal consistency (maintains only causal relations)
  - FIFO consistency (maintains only individual ordering)
- Weak consistency models: synchronization occurs only when shared data is locked and unlocked (models with synchronization operations)
  - General weak consistency
  - Release consistency
  - Entry consistency
- Observation: The weaker the consistency model, the easier it is to build a scalable solution.
11. Strict Consistency (1/2)
Any read to a shared data item x returns the value stored by the most recent write operation on x.
Observation: It doesn't make sense to talk about "the most recent" write in a distributed environment.
- Assume all data items have been initialized to NIL
- W(x)a: the value a is written to x
- R(x)a: reading x returns the value a
- The behavior shown in Figure (a) is correct for strict consistency
- The behavior shown in Figure (b) is incorrect for strict consistency
12. Strict Consistency (2/2)
- Strict consistency is what you get in the normal sequential case, where your program does not interfere with any other program.
- When a data store is strictly consistent, all writes are instantaneously visible to all processes and an absolute global time order is maintained.
- If a data item is changed, all subsequent reads performed on that data item return the new value, no matter how soon after the change the reads are done, and no matter which processes are doing the reading and where they are located.
- If a read is done, it gets the current value, no matter how quickly the next write is done.
- Unfortunately, this is impossible to implement in a distributed system.
13. Sequential Consistency (1/2)
- Sequential consistency is a slightly weaker consistency model than strict consistency. A data store is said to be sequentially consistent when it satisfies the following condition:
- The result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.
- When processes run concurrently on possibly different machines, any valid interleaving of read and write operations is acceptable behavior.
- All processes see the same interleaving of operations.
- Nothing is said about time.
- A process sees writes from all processes, but only its own reads.
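The definition above can be checked mechanically for small histories: an execution is sequentially consistent if *some* interleaving exists that respects every process's program order and in which every read returns the most recent write. The following brute-force sketch (illustrative, not from the slides) encodes the four-process example used in the figures, with W/R tuples as the operation notation:

```python
from itertools import permutations

# Each operation: (process, kind, var, value); each list is in program order.
P1 = [("P1", "W", "x", "a")]
P2 = [("P2", "W", "x", "b")]
P3 = [("P3", "R", "x", "b"), ("P3", "R", "x", "a")]
P4 = [("P4", "R", "x", "b"), ("P4", "R", "x", "a")]

def legal(history):
    """A history is legal if every read returns the most recent write."""
    current = {}
    for proc, kind, var, val in history:
        if kind == "W":
            current[var] = val
        elif current.get(var) != val:
            return False
    return True

def sequentially_consistent(*programs):
    """Search all interleavings that respect each process's program order."""
    ops = [op for prog in programs for op in prog]
    for perm in permutations(ops):
        # does this interleaving keep every process's program order?
        in_order = all(
            [op for op in perm if op in prog] == prog for prog in programs
        )
        if in_order and legal(list(perm)):
            return True
    return False
```

With P3 and P4 both reading b then a (Figure (a)), a valid interleaving exists; if P4 instead reads a then b, the two readers disagree on the write order and no interleaving works.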
14. Sequential Consistency (2/2)
- Figure (a): a sequentially consistent data store
  - P1 first performs W(x)a on x. Later (in absolute time), P2 also performs W(x)b on x.
  - Both P3 and P4 first read value b and later value a. The write operation of P2 appears to have taken place before that of P1 to both P3 and P4.
- Figure (b): a data store that is not sequentially consistent
  - Not all processes see the same interleaving of write operations.
15. Linearizability
- A consistency model that is weaker than strict consistency, but stronger than sequential consistency, is linearizability.
- Operations are assumed to receive a timestamp using a globally available clock, but one with only finite precision.
- A data store is said to be linearizable when each operation is timestamped and the following condition holds:
- The result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program. In addition, if ts_OP1(x) < ts_OP2(y), then operation OP1(x) should precede OP2(y) in this sequence.
- A linearizable data store is also sequentially consistent.
- Linearizability takes its ordering from a set of synchronized clocks.
16. Causal Consistency (1/2)
- The causal consistency model is a weaker model than sequential consistency.
- It makes a distinction between events that are potentially causally related and those that are not.
- If event B is caused or influenced by an earlier event A, causality requires that everyone else first see A, then see B.
- Operations that are not causally related are said to be concurrent.
- A data store is said to be causally consistent if it obeys the following condition:
- Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order by different processes.
- See Figure 6-9 for an example of a causally consistent store.
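The "potentially causally related vs. concurrent" distinction is commonly decided with vector clocks. A minimal sketch, assuming one counter slot per process (the clock values below are illustrative, not taken from the figures):

```python
# Each event carries a vector clock: one counter per process. Comparing two
# clocks tells us whether the events are causally ordered or concurrent.

def happens_before(vc_a, vc_b):
    """True if vc_a causally precedes vc_b: component-wise <=, at least one <."""
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

def concurrent(vc_a, vc_b):
    """Neither clock precedes the other: the writes are concurrent."""
    return not happens_before(vc_a, vc_b) and not happens_before(vc_b, vc_a)

# Two processes. P2 reads P1's write W(x)a before issuing W(x)b,
# so W(x)b's clock dominates W(x)a's:
w_a = (1, 0)   # W(x)a at P1
w_b = (1, 1)   # W(x)b at P2, issued after reading a
w_c = (0, 1)   # a hypothetical write at P2 that never saw W(x)a
```

Causally consistent stores must deliver w_a before w_b everywhere, while w_a and w_c may be seen in either order.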
17. Causal Consistency (2/2)
- Figure (a): a data store that is not causally consistent
  - The two writes W(x)a and W(x)b are causally related, since b may be the result of a computation involving the value read by R(x)a.
- Figure (b): a data store that is causally consistent
18. FIFO Consistency
FIFO consistency is weaker than causal consistency: it removes the requirement that causally related writes must be seen in the same order by all processes.
A data store is said to be FIFO consistent when it satisfies the following condition:
Writes done by a single process are received by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.
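FIFO consistency is cheap to implement because it only needs per-sender sequence numbers: a receiver buffers out-of-order writes and applies each sender's writes in issue order, with no coordination across senders. A sketch under that assumption (class and field names are illustrative):

```python
import heapq
from collections import defaultdict

class FifoReplica:
    """Applies each sender's writes in per-sender sequence-number order."""
    def __init__(self):
        self.next_seq = defaultdict(int)   # next expected seq per sender
        self.pending = defaultdict(list)   # min-heap of (seq, write) per sender
        self.applied = []                  # writes in the order applied locally

    def receive(self, sender, seq, write):
        heapq.heappush(self.pending[sender], (seq, write))
        # drain every write that is now in order for this sender
        heap = self.pending[sender]
        while heap and heap[0][0] == self.next_seq[sender]:
            _, w = heapq.heappop(heap)
            self.applied.append(w)
            self.next_seq[sender] += 1

r = FifoReplica()
r.receive("P1", 1, "W(x)b")   # arrives out of order: buffered
r.receive("P1", 0, "W(x)a")   # in order: releases both writes
```

After both messages arrive, `r.applied` holds P1's writes in issue order; writes from a different sender would be interleaved freely.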
19. Weak Consistency (1/2)
- Although FIFO consistency can give better performance than the stronger consistency models, it is still unnecessarily restrictive for many applications, because it requires that writes originating in a single process be seen everywhere in order.
- Not all applications require seeing all writes, or seeing them in order.
- Solution: use a synchronization variable. Synchronize(S) synchronizes all local copies of the data store.
- Using synchronization variables to partly define consistency is called weak consistency; it has three properties:
  - Accesses to synchronization variables are sequentially consistent.
  - No access to a synchronization variable is allowed to be performed until all previous writes have completed everywhere.
  - No data access is allowed to be performed until all previous accesses to synchronization variables have been performed.
20. Weak Consistency (2/2)
- Figure (a): a weakly consistent data store (i.e., a valid sequence)
  - P1 performs W(x)a and W(x)b and then synchronizes. P2 and P3 have not yet synchronized, so no guarantees are given about what they see.
- Figure (b): a data store that is not weakly consistent - why not?
  - Since P2 has synchronized, R(x) in P2 must read b.
21. Release Consistency (1/2)
- Weak consistency has the problem that, when a synchronization variable is accessed, the data store does not know whether this is being done because the process:
  - has finished writing the shared data, or
  - is about to start reading data.
- Consequently, the data store must take the actions required in both cases:
  - Make sure that all locally initiated writes have been completed (i.e., propagated to the other copies)
  - Gather in all writes from the other copies
- If the data store could tell the difference between entering a critical region and leaving one, a more efficient implementation might be possible.
22. Release Consistency (2/2)
- Idea: Divide access to a synchronization variable into two parts: an acquire phase and a release phase.
  - Acquire (about to start accessing data): forces a requester to wait until the shared data can be accessed.
  - Release (finished accessing the shared data): sends the requester's local values to the other servers in the data store.
Question: Why did P3 get a instead of b when it executed R(x)?
- Since P3 does not do an acquire before reading x, the data store has no obligation to give it the current value of x, so returning a is OK.
23. Entry Consistency (1/3)
- With release consistency, all local updates are propagated to the other copies/servers during the release of shared data.
- With entry consistency, each shared data item is associated with a synchronization variable.
- In order to access consistent data, each synchronization variable must be explicitly acquired.
- Release consistency affects all shared data, but entry consistency affects only the shared data associated with a synchronization variable.
24. Entry Consistency (2/3)
- A data store exhibits entry consistency if it meets all of the following conditions:
  - An acquire access of a synchronization variable is not allowed to perform with respect to a process until all updates to the guarded shared data have been performed with respect to that process.
  - Before an exclusive-mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in nonexclusive mode.
  - After an exclusive-mode access to a synchronization variable has been performed, any other process's next nonexclusive-mode access to that synchronization variable may not be performed until it has been performed with respect to that variable's owner.
25. Entry Consistency (3/3)
- Question: Is this a valid event sequence for entry consistency?
  - Yes.
- Question: Why did P2 get NIL when R(y) was executed?
  - Since P2 did not do an acquire before reading y, P2 may not read the latest value.
- Question: What would be a convenient way of making entry consistency more or less transparent to programmers?
  - By having the distributed system use and handle distributed shared objects (i.e., the system does an acquire on the object's associated synchronization variable whenever a client accesses a shared distributed object).
26. Summary of Consistency Models
- Strong consistency models: models that do not use synchronization operations
- Weak consistency models: models that use synchronization operations
27. Client-Centric Consistency Models
- Data-centric consistency models aim at providing a system-wide consistent view of a data store.
- Client-centric consistency models are generally used for applications that lack simultaneous updates, i.e., where most operations involve reading data.
- The following are very weak, client-centric consistency models:
  - Eventual consistency
  - Monotonic reads
  - Monotonic writes
  - Read your writes
  - Writes follow reads
28. Client-Centric Consistency Models
Goal: Show how we can perhaps avoid system-wide consistency by concentrating on what specific clients want, instead of what should be maintained by servers.
Background: Most large-scale distributed systems (i.e., databases) apply replication for scalability, but can support only weak consistency:
- DNS: Updates are propagated slowly, and inserts may not be immediately visible.
- News: Articles and reactions are pushed and pulled throughout the Internet, such that reactions can be seen before postings.
- Lotus Notes: Geographically dispersed servers replicate documents, but make no attempt to keep (concurrent) updates mutually consistent.
- WWW: Caches all over the place, but there need be no guarantee that you are reading the most recent version of a page.
29. Eventual Consistency
- Systems such as DNS and the WWW can be viewed as applications of large-scale distributed and replicated databases that tolerate a relatively high degree of inconsistency.
- They have in common that, if no updates take place for a long time, all replicas will gradually and eventually become consistent.
- This form of consistency is called eventual consistency.
- Eventual consistency requires only that updates are guaranteed to propagate to all replicas.
- Eventually consistent data stores work fine as long as clients always access the same replica - but what happens when different replicas are accessed?
30. Consistency for Mobile Users
- Example: Consider a distributed database to which you have access through your notebook. Assume your notebook acts as a front end to the database.
  - At location A you access the database, doing reads and updates.
  - At location B you continue your work, but unless you access the same server as the one at location A, you may detect inconsistencies:
    - your updates at A may not yet have been propagated to B
    - you may be reading newer entries than the ones available at A
    - your updates at B may eventually conflict with those at A
- Note: The only thing you really want is that the entries you updated and/or read at A are in B the way you left them in A. In that case, the database will appear to be consistent to you.
31. Basic Architecture
32. Client-Centric Consistency
- For the mobile user example, eventually consistent data stores will not work properly.
- Client-centric consistency provides guarantees for a single client concerning the consistency of accesses to a data store by that client.
- No guarantees are given concerning concurrent accesses by different clients.
33. Monotonic-Read Consistency
A data store is said to be monotonic-read consistent if the following condition holds:
If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value.
That is, if a process has seen a value of x at time t, it will never see an older version of x at a later time.
Notation:
- WS(xi[t]) is the set of write operations (at Li) that led to version xi of x (at time t)
- WS(xi[t1]; xj[t2]) indicates that it is known that WS(xi[t1]) is part of WS(xj[t2])
- Note: the parameter t is omitted from the figures.
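One common way to enforce monotonic reads is a client-side session guarantee: the client remembers the most recent version of x it has seen and refuses to accept reads from replicas that are behind it. A minimal sketch, assuming each replica exposes a single version number for x (the layout is illustrative, not from the slides):

```python
class Client:
    """Enforces monotonic reads by tracking the highest version seen."""
    def __init__(self):
        self.last_seen = 0   # highest version of x this client has read

    def read(self, replica):
        version, value = replica["version"], replica["value"]
        if version < self.last_seen:
            # replica is stale: serving this read would violate monotonic reads
            return None
        self.last_seen = version
        return value

L1 = {"version": 2, "value": "b"}   # has applied writes up to version 2
L2 = {"version": 1, "value": "a"}   # has only applied version 1

c = Client()
first = c.read(L1)    # "b"; last_seen becomes 2
stale = c.read(L2)    # L2 is behind, so the read is refused
```

In a real system the refused read would typically be retried at another replica, or L2 would first be brought up to date (as in the e-mail example on the next slide).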
34. Monotonic Reads (1/2)
Example: The read operations are performed by a single process P at two different local copies (L1 and L2) of the same data store.
- Figure (a): a data store that is monotonic-read consistent
  - P performs a read operation on x at L1, R(x1). Later, P performs a read operation on x at L2, R(x2).
- Figure (b): a data store that is not monotonic-read consistent - why not?
  - Because only the write operations in WS(x2) have been performed at L2; it is not known whether WS(x1) is part of WS(x2).
35. Monotonic Reads (2/2)
Example 1: Automatically reading your personal calendar updates from different servers. Monotonic reads guarantees that the user sees all updates, no matter from which server the automatic reading takes place.
Example 2: Reading (not modifying) incoming mail while you are on the move. Each time you connect to a different e-mail server, that server fetches (at least) all the updates from the server you previously visited.
36. Monotonic-Write Consistency
A data store is said to be monotonic-write consistent if the following condition holds:
A write operation by a process on a data item x is completed before any successive write operation on x by the same process.
That is, a write operation on a copy of data item x is performed only if that copy has been brought up to date by means of any preceding write operations, which may have taken place on other copies of x.
37. Monotonic Writes (1/2)
- Figure (a): a data store that is monotonic-write consistent
  - P performs a write operation on x at L1, W(x1). Later, P performs a write operation on x at L2, W(x2).
  - W(x2) requires that W(x1) be performed at L2 before it.
- Figure (b): a data store that is not monotonic-write consistent - why not?
  - W(x1) has not been propagated to L2.
38. Monotonic Writes (2/2)
Example 1: Updating a program at server S2, and ensuring that all components on which the compilation and linking depend are also placed at S2.
Example 2: Maintaining versions of replicated files in the correct order everywhere (propagate the previous version to the server where the newest version is installed).
39. Read-Your-Writes Consistency
A data store is said to be read-your-writes consistent if the following condition holds:
The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.
That is, a write operation is always completed before a successive read operation by the same process, no matter where that read operation takes place.
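Read-your-writes is another session guarantee: the client records the identifiers of its own writes, and a replica may serve its read only if it has applied all of them. A sketch under that assumption (class and field names are illustrative, not from the slides):

```python
class Replica:
    """A copy of data item x, tracking which writes it has applied."""
    def __init__(self):
        self.applied = set()   # IDs of write operations performed at this copy
        self.value = None

    def apply(self, write_id, value):
        self.applied.add(write_id)
        self.value = value

class Session:
    """Per-client state: the IDs of this client's own writes."""
    def __init__(self):
        self.my_writes = set()

    def write(self, replica, write_id, value):
        replica.apply(write_id, value)
        self.my_writes.add(write_id)

    def read(self, replica):
        if not self.my_writes <= replica.applied:
            return None   # replica is missing one of our own writes: refuse
        return replica.value

L1, L2 = Replica(), Replica()
s = Session()
s.write(L1, "w1", "a")   # W(x1) at L1
ok = s.read(L1)          # L1 has applied w1, so the read is served
bad = s.read(L2)         # w1 was never propagated to L2, so the read is refused
```

This mirrors the Web-page example two slides below: the browser must not serve a cached copy that predates the user's own update.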
40. Read Your Writes (1/2)
- Figure (a): a data store that is read-your-writes consistent
  - P performs a write operation on x at L1, W(x1). Later, P performs a read operation on x at L2, R(x2).
  - WS(x1; x2) states that W(x1) is part of WS(x2).
- Figure (b): a data store that is not read-your-writes consistent
  - W(x1) is left out of WS(x2). That is, the effects of the previous write operation by process P have not been propagated to L2.
41. Read Your Writes (2/2)
Example: Updating your Web page and guaranteeing that your Web browser shows the newest version instead of its cached copy.
42. Writes Follow Reads
A data store is said to be writes-follow-reads consistent if the following condition holds:
A write operation by a process on a data item x, following a previous read operation on x by the same process, is guaranteed to take place on the same or a more recent value of x than the one that was read.
That is, any successive write operation by a process on a data item x will be performed on a copy of x that is up to date with the value most recently read by that process.
43. Writes Follow Reads (1/2)
- Figure (a): a data store that is writes-follow-reads consistent
  - P performs a read operation on x at L1, R(x1).
  - The write operations that led to R(x1) also appear in the write set at L2, where P later performs W(x2).
- Figure (b): a data store that is not writes-follow-reads consistent
  - The write operations that led to R(x1) did not appear in the write set at L2 before P later performed W(x2).
44. Writes Follow Reads (2/2)
Example: See reactions to posted articles only if you have the original posting (a read "pulls in" the corresponding write operation).
45. Distribution Protocols
- Distribution protocols focus on distributing updates to replicas.
- The following are important design issues:
  - Replica placement
  - Update propagation
  - Epidemic protocols
46. Replica Placement (1/2)
- Model: We consider objects (and don't worry whether they contain just data or code, or both).
- Distinguish different processes: a process is capable of hosting a replica of an object:
  - Permanent replicas: processes/machines always having a replica (i.e., the initial set of replicas)
  - Server-initiated replicas: processes that can dynamically host a replica on request of another server in the data store
  - Client-initiated replicas: processes that can dynamically host a replica on request of a client (client cache)
47. Replica Placement (2/2)
48. Server-Initiated Replicas
- Keep track of access counts per file, aggregated by considering the server closest to the requesting clients:
  - Number of accesses drops below threshold D → drop the file
  - Number of accesses exceeds threshold R → replicate the file
  - Number of accesses between D and R → migrate the file
49. Update Propagation (1/3)
- Important design issues in update propagation:
  - Propagate only a notification/invalidation of the update (often used for caches)
  - Transfer the data from one copy to another (distributed databases)
  - Propagate the update operation to the other copies (also called active replication)
- Observation: No single approach is the best; the choice depends highly on the available bandwidth and the read-to-write ratio at the replicas.
50. Update Propagation (2/3)
- Pushing updates: a server-initiated approach, in which an update is propagated regardless of whether the target asked for it.
- Pulling updates: a client-initiated approach, in which a client requests to be updated.
51. Update Propagation (3/3)
- Observation: We can dynamically switch between pulling and pushing using leases: a contract in which the server promises to push updates to the client until the lease expires.
- Issue: Make the lease expiration time dependent on the system's behavior (adaptive leases):
  - Age-based leases: an object that hasn't changed for a long time will not change in the near future, so provide a long-lasting lease
  - Renewal-frequency-based leases: the more often a client requests a specific object, the longer the expiration time for that client (for that object) will be
  - State-based leases: the more loaded a server is, the shorter the expiration times become
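As a concrete illustration of the age-based policy, a server could make the lease duration grow with the time since the object was last modified. The linear rule and the clamping bounds below are illustrative choices, not prescribed by the slides:

```python
def age_based_lease(now, last_modified, min_lease=10, max_lease=3600):
    """Return a lease duration (in seconds) that grows with the object's age.

    An object unmodified for a long time is assumed unlikely to change soon,
    so it earns a longer lease; the bounds keep the duration sane.
    """
    age = now - last_modified
    return max(min_lease, min(max_lease, age // 2))
```

Renewal-frequency-based and state-based leases would follow the same shape, keyed on per-client request rate or server load instead of object age.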
52. Epidemic Algorithms
- General background
- Update models
- Removing objects
53. Principles
- Basic idea: assume there are no write-write conflicts:
  - Update operations are initially performed at one or only a few replicas
  - A replica passes its updated state to a limited number of neighbors
  - Update propagation is lazy, i.e., not immediate
  - Eventually, each update should reach every replica
- Read the theory of epidemics on pages 334-335.
- Anti-entropy: each replica regularly chooses another replica at random and exchanges state differences, leading to identical states at both afterwards.
- Gossiping: a replica that has just been updated (i.e., has been "contaminated") tells a number of other replicas about its update (contaminating them as well).
54. System Model
- We consider a collection of servers, each storing a number of objects.
- Each object O has a primary server at which updates for O are always initiated (avoiding write-write conflicts).
- An update of object O at server S is always timestamped; the value of O at S is denoted VAL(O,S).
- T(O,S) denotes the timestamp of the value of object O at server S.
55. Anti-Entropy
Basic issue: When a server S contacts another server S' to exchange state information, three different strategies can be followed:
- Push: S only forwards its updates to S':
  if T(O,S') < T(O,S) then VAL(O,S') ← VAL(O,S)
- Pull: S only fetches updates from S':
  if T(O,S) < T(O,S') then VAL(O,S) ← VAL(O,S')
- Push-Pull: S and S' exchange their updates by pushing and pulling values.
Observation: If each server periodically and randomly chooses another server for exchanging updates, an update is propagated in O(log(N)) time units.
Question: Why is pushing alone not efficient when many servers have already been updated?
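The push-pull strategy can be simulated directly: each round, every server exchanges state with one random peer, and both end up with the newer timestamp. This sketch tracks a single object and counts the rounds until the update is everywhere (parameters and structure are illustrative):

```python
import random

def anti_entropy_rounds(n_servers, seed=42):
    """Simulate push-pull anti-entropy; return rounds until all servers updated."""
    rng = random.Random(seed)
    ts = [0] * n_servers   # per-server timestamp of the single object
    if n_servers:
        ts[0] = 1          # the update is initiated at server 0
    rounds = 0
    while sum(ts) < n_servers:
        rounds += 1
        for s in range(n_servers):
            peer = rng.randrange(n_servers)
            # push-pull: both copies end up holding the newest value
            newest = max(ts[s], ts[peer])
            ts[s] = ts[peer] = newest
    return rounds
```

Running this for growing N shows the expected logarithmic growth in rounds; it also suggests the answer to the question above: with push alone, a still-ignorant server must wait to be chosen by an updated one, which becomes likely only because nearly everyone is pushing.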
56. Gossiping
Basic model: A server S having an update to report contacts other servers. If S contacts a server to which the update has already propagated, S stops contacting other servers with probability 1/k.
If s is the fraction of "ignorant" servers (i.e., servers that are unaware of the update), it can be shown that with many servers s satisfies s = e^(-(k+1)(1-s)).
Observation: If we really have to ensure that all servers are eventually updated, gossiping alone is not enough; combining anti-entropy with gossiping will solve this problem.
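The gossip model above is easy to simulate, which makes the observation concrete: some servers typically remain ignorant when every spreader has gone quiet. A sketch (the stopping rule is exactly the 1/k rule from the slide; everything else is an illustrative setup):

```python
import random

def gossip_ignorant_fraction(n_servers, k, seed=7):
    """Simulate gossiping with stop-probability 1/k; return the fraction of
    servers that never hear the update once all spreaders have stopped."""
    rng = random.Random(seed)
    knows = [False] * n_servers
    knows[0] = True        # the update starts at server 0
    active = [0]           # servers still spreading the update
    while active:
        s = active.pop()
        spreading = True
        while spreading:
            peer = rng.randrange(n_servers)
            if knows[peer]:
                # peer was already updated: lose interest with probability 1/k
                if rng.random() < 1.0 / k:
                    spreading = False
            else:
                knows[peer] = True
                active.append(peer)   # the peer becomes a spreader too
    return knows.count(False) / n_servers
```

For k = 3 and many servers, the residual ignorant fraction comes out small but nonzero, matching the slide's point that gossiping alone cannot guarantee full coverage.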
57. Deleting Values
- Fundamental problem: We cannot remove an old value from a server and expect the removal to propagate. Instead, a mere removal will be undone in due time by the epidemic algorithms.
- Solution: Removal has to be registered as a special update, by inserting a death certificate.
- Next problem: When to remove a death certificate (it is not allowed to stay forever)?
  - Run a global algorithm to detect whether the removal is known everywhere, and then collect the death certificates (looks like garbage collection)
  - Assume death certificates propagate in finite time, and associate a maximum lifetime with a certificate (can be done at the risk of the removal not reaching all servers)
- Note: It is necessary that a removal actually reach all servers.
- Question: What's the scalability problem here?
58. Consistency Protocols
- A consistency protocol describes the implementation of a specific consistency model. We will concentrate only on sequential consistency.
  - Primary-based protocols
  - Replicated-write protocols
  - Cache-coherence protocols
59. Primary-Based Protocols (1/4)
Primary-based, remote-write protocol with a fixed server.
Example: Used in traditional client-server systems that do not support replication.
60. Primary-Based Protocols (2/4)
Primary-backup protocol.
Example: Traditionally applied in distributed databases and file systems that require a high degree of fault tolerance. Replicas are often placed on the same LAN.
61. Primary-Based Protocols (3/4)
Primary-based, local-write protocol.
Example: Establishes only a fully distributed, non-replicated data store. Useful when writes are expected to come in series from the same client (e.g., mobile computing without replication).
62Primary-Based Protocols (4/4)
Primary-backup protocol with local writes
Example Distributed shared memory systems, but
also mobile computing in disconnected mode (ship
all relevant files to user before disconnecting,
and update later on).
63. Replicated-Write Protocols (1/3)
Active replication: updates are forwarded to multiple replicas, where they are carried out. There are some problems to deal with in the face of replicated invocations.
64. Replicated-Write Protocols (2/3)
Replicated invocations: assign a coordinator on each side (client and server), which ensures that only one invocation and one reply are sent.
65. Replicated-Write Protocols (3/3)
Quorum-based protocols: ensure that each operation is carried out in such a way that a majority vote is established; distinguish a read quorum and a write quorum.
Read the explanation of these examples on page 344.
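The usual constraints on quorum sizes (Gifford-style voting, consistent with the read-quorum/write-quorum distinction above) can be captured in a few lines. With N replicas, a read quorum NR and a write quorum NW must overlap every write (NR + NW > N) and writes must overlap each other (NW > N/2):

```python
def valid_quorum(n, n_read, n_write):
    """Check that the quorum sizes rule out conflicting concurrent operations."""
    overlaps_writes = n_read + n_write > n   # every read set intersects the
                                             # latest write set (read-write)
    majority_write = n_write > n / 2         # two write sets always intersect,
                                             # so two writes cannot both commit
    return overlaps_writes and majority_write
```

For N = 12, the pairs (NR = 3, NW = 10) and ROWA-style (NR = 1, NW = 12) are valid, while (NR = 6, NW = 6) is not, since a read quorum could then miss the most recent write.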
66. Example: Lazy Replication
Basic model: A number of replica servers jointly implement a causally consistent data store. Clients normally talk to front ends, which maintain data to ensure causal consistency.
67. Lazy Replication: Vector Timestamps
- VAL(i): VAL(i)[i] denotes the total number of write operations sent directly by front ends (clients); VAL(i)[j] denotes the number of updates sent from replica j.
- WORK(i): WORK(i)[i] is the total number of write operations directly from front ends, including the pending ones; WORK(i)[j] is the total number of updates from replica j, including pending ones.
- LOCAL(C): LOCAL(C)[j] is (almost) the most recent value of VAL(j)[j] known to front end C (to be refined in just a moment).
- DEP(R): the timestamp associated with a request, reflecting what the request depends on.
68. Operations
- Read operations
- Write operations
69. READING