Title: Replication 1
1Replication (1)
2Topics
- Why Replication?
- Consistency Models: How do we reason about the consistency of the global state?
  - Data-centric consistency
  - Client-centric consistency
- We will examine consistency protocols, which describe an implementation of a specific consistency model.
- Other Implementation Issues
- Examples
3Readings
- Van Steen and Tanenbaum 6.1, 6.2 and 6.3, 6.4
- Coulouris 11,14
4Why Replicate?
- Replication refers to the maintenance of copies of data at multiple sites.
- Reliability
  - If one replica is unavailable or crashes, use another.
  - Avoid single points of failure.
- Performance
  - Placing copies of data close to the processes using them can improve performance by reducing access time.
  - If there is only one copy, the server could become overloaded.
5Common Replication Examples
- DNS naming service
- Web browsers often store a local copy of a previously fetched web page.
  - This is referred to as caching a web page.
- Replication of a database
- Replication of game state
6Replication Problem
- Multiple copies may lead to consistency problems.
- Whenever a copy is modified, that copy becomes different from the rest.
- Modifications have to be carried out on all copies to ensure consistency.
- The type of application has an impact on the consistency requirements and thus on the implementation.
7Consistency Model
- Some applications (e.g., banking) require
  - That update operations are performed in the same order at each copy.
  - This is referred to as sequential consistency.
  - Possible implementation: using Lamport clocks
- Other applications (e.g., bulletin board) require
  - That if one update, U1, causes another update, U2, to occur, then U1 is executed before U2 at each copy.
  - This is referred to as causal consistency.
  - Possible implementation: using vector clocks
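The vector-clock idea can be sketched as follows. This is a minimal illustration, not the slides' own implementation: the class name `Replica`, the function `can_deliver`, and the two-process scenario are assumptions made for the example. An update is applied only when it is the next update from its sender and every update it causally depends on has already been applied.

```python
def can_deliver(local, msg_clock, sender):
    """Causal delivery test: the update is the next one from `sender`,
    and everything it depends on has already been applied locally."""
    for k, count in enumerate(msg_clock):
        if k == sender:
            if count != local[k] + 1:
                return False
        elif count > local[k]:
            return False
    return True

class Replica:
    def __init__(self, n):
        self.clock = [0] * n   # updates applied so far, per originating process
        self.pending = []      # updates still waiting for their causes
        self.applied = []      # updates in the order they were applied

    def receive(self, sender, msg_clock, update):
        self.pending.append((sender, msg_clock, update))
        progress = True
        while progress:        # keep going while some pending update becomes deliverable
            progress = False
            for item in list(self.pending):
                s, vc, u = item
                if can_deliver(self.clock, vc, s):
                    self.pending.remove(item)
                    self.applied.append(u)
                    self.clock[s] += 1
                    progress = True

# U1 is issued by process 0; process 1 sees U1 and then issues U2.
r = Replica(2)
r.receive(1, [1, 1], "U2")   # arrives first, but depends on U1, so it is held
r.receive(0, [1, 0], "U1")
print(r.applied)   # ['U1', 'U2']
```

Even though U2 arrives first, the replica applies U1 before U2, which is the causal-consistency requirement stated above.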
8Consistency Model
- Observe that although there is replication in each case, the type of application determines the consistency model to be used.
- A consistency model describes the rules to be used in updating replicated data.
- There are more consistency models than sequential and causal.
- Other Consistency Models
  - FIFO
  - Strict
9FIFO Consistency
- Writes done by a single process are seen by all other processes in the order in which they were issued,
- but writes from different processes may be seen in a different order by different processes.
- I.e., there are no guarantees about the order in which different processes see writes, except that two or more writes from a single source must arrive in order.
10FIFO Consistency
- Caches in web browsers
  - All updates are made by the page owner.
  - No conflict between two writes
  - Note: If a web page is updated twice in a very short period of time, it is possible that the browser doesn't see the first update.
- Implementation
  - Each process adds the following to an update message: (process id, sequence number)
  - Every other process applies the updates from a single process in the order in which that process issued them, as given by the sequence numbers.
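A minimal sketch of the (process id, sequence number) scheme described above. The class name `FifoReplica` and the choice to start sequence numbers at 1 are assumptions for illustration. Updates that arrive ahead of an earlier update from the same process are held back; nothing is enforced across different processes.

```python
class FifoReplica:
    def __init__(self):
        self.next_seq = {}   # process id -> next sequence number expected
        self.held = {}       # (process id, seq) -> update that arrived early
        self.applied = []    # updates in the order they were applied

    def receive(self, pid, seq, update):
        self.held[(pid, seq)] = update
        expected = self.next_seq.get(pid, 1)   # assumption: numbering starts at 1
        # Apply as many consecutive updates from this sender as are available.
        while (pid, expected) in self.held:
            self.applied.append(self.held.pop((pid, expected)))
            expected += 1
        self.next_seq[pid] = expected

replica = FifoReplica()
replica.receive("p1", 2, "x=2")   # arrives early and is held back
replica.receive("p2", 1, "y=7")
replica.receive("p1", 1, "x=1")   # releases x=1, then x=2
print(replica.applied)   # ['y=7', 'x=1', 'x=2']
```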
11Strict Consistency
- Strict consistency is defined as follows:
  - A read is expected to return the value resulting from the most recent write operation.
  - Assumes absolute global time
  - All writes are instantaneously visible to all processes.
- Suppose that process pi updates the value of x from 4 to 5 at time t1 and multicasts this value to all replicas.
- Process pj reads the value of x at t2 (t2 > t1).
- Process pj should read x as 5 regardless of the size of the (t2-t1) interval.
12Strict Consistency
- What if t2-t1 = 1 nsec and the optical fibre between the host machines with the two processes is 3 meters long?
- The update message would have to travel at 10 times the speed of light.
- Not allowed by Einstein's special theory of relativity.
- Can't have strict consistency.
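The factor of 10 can be checked with a short calculation. The symbols d (fibre length) and v (required signal speed) are introduced here for illustration, and c is taken as roughly 3 x 10^8 m/s.

```latex
v = \frac{d}{t_2 - t_1}
  = \frac{3\ \text{m}}{10^{-9}\ \text{s}}
  = 3 \times 10^{9}\ \text{m/s}
  \approx 10\,c,
\qquad c \approx 3 \times 10^{8}\ \text{m/s}
```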
14Implementation Options Sequential Consistency
- We saw how to use Lamport's logical clocks for sequential consistency.
- Another option is to have a centralized processor that is a sequencer.
- Each update request is sent to the sequencer, which
  - Assigns the request a unique sequence number
  - Forwards the update request to each replica
- Operations are carried out in the order of their sequence numbers.
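A minimal sketch of the centralized sequencer, assuming in-process calls in place of network messages. The names `Sequencer`, `SeqReplica`, `deposit_100`, and `double` are hypothetical and chosen for the banking flavour of the earlier example. Order matters here: depositing and then doubling gives 200, while doubling and then depositing would give 100.

```python
import itertools

class SeqReplica:
    def __init__(self):
        self.balance = 0
        self.next_seq = 1
        self.held = {}     # sequence number -> update that arrived early

    def receive(self, seq, update):
        self.held[seq] = update
        # Carry out operations strictly in sequence-number order.
        while self.next_seq in self.held:
            self.balance = self.held.pop(self.next_seq)(self.balance)
            self.next_seq += 1

class Sequencer:
    def __init__(self, replicas):
        self.replicas = replicas
        self.numbers = itertools.count(1)   # unique, increasing sequence numbers

    def submit(self, update):
        seq = next(self.numbers)
        for replica in self.replicas:       # forward the request to each replica
            replica.receive(seq, update)
        return seq

def deposit_100(balance):
    return balance + 100

def double(balance):
    return balance * 2

a, b = SeqReplica(), SeqReplica()
sequencer = Sequencer([a, b])
sequencer.submit(deposit_100)
sequencer.submit(double)

# A replica that receives the same updates out of order still applies them by number.
late = SeqReplica()
late.receive(2, double)
late.receive(1, deposit_100)
print(a.balance, b.balance, late.balance)   # 200 200 200
```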
15Implementation Options Sequential Consistency
- The use of a sequencer also does not solve the scalability problem.
  - It may become a performance bottleneck.
  - What if it goes down?
- A combination of Lamport timestamps and sequencers may be necessary.
- The approach is summarized as follows:
  - Each process has a unique identifier, pi, and keeps a sent-message counter ci. The process identifier and message counter uniquely identify a message.
  - Active processes (or sequencers) keep an extra counter ti. This is called the ticket number. A ticket is a triplet (pi, ti, (pj, cj)).
  - All other processes are passive.
16Implementation Options Sequential Consistency
- Approach Summary (cont)
  - Passive processes (non-sequencers) send their messages to their sequencer.
  - Lamport's totally ordered multicast algorithm is used among the sequencers to determine the order of update operations.
  - When an operation is allowed, each sequencer sends the ticket to its associated passive processes. It is assumed that the passive processes receive these tickets in the order sent.
17Implementation Options Sequential Consistency
- Approach Summary (cont)
  - If a sequencer terminates abnormally, then one of the passive processes associated with it can become the new sequencer.
  - An election algorithm may be used to choose the new sequencer.
18Implementation Options Sequential Consistency
- Let's say that we have 6 processes: p1, p2, p3, p4, p5, p6.
- Assume that p1 and p2 are sequencers; p3 and p4 are associated with p1, and p5 and p6 are associated with p2.
- Let's say that p3 sends a message which is identified by (p3, 1).
- p1 generates a ticket as follows: (p1, 1, (p3, 1))
- The ticket number (the second field, 1) is generated using the Lamport clock algorithm.
19Implementation Options Sequential Consistency
- Let's say that p5 sends a message which is identified by (p5, 1).
- p2 generates a ticket as follows: (p2, 1, (p5, 1))
- Which update gets done first? Basically, p1 and p2 will apply Lamport's algorithm for totally ordered multicast.
- When an update operation is allowed to proceed, the sequencers send messages to their associated processes.
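The two tickets above carry the same ticket number, so something must break the tie. The sketch below shows only the ordering rule, not the message and acknowledgement exchange of the multicast algorithm. It assumes the usual Lamport convention of breaking ties by process identifier; the slides do not state which tie-break is used, and the names `Ticket` and `total_order_key` are hypothetical.

```python
from typing import NamedTuple, Tuple

class Ticket(NamedTuple):
    sequencer: int              # pi, the sequencer that issued the ticket
    number: int                 # ti, the Lamport-clock ticket number
    message: Tuple[int, int]    # (pj, cj), the sender and its message counter

def total_order_key(ticket):
    # Smaller ticket number first; equal numbers are ordered by sequencer id
    # (assumed tie-break, following the usual Lamport convention).
    return (ticket.number, ticket.sequencer)

tickets = [
    Ticket(sequencer=2, number=1, message=(5, 1)),   # (p2, 1, (p5, 1))
    Ticket(sequencer=1, number=1, message=(3, 1)),   # (p1, 1, (p3, 1))
]
ordered = sorted(tickets, key=total_order_key)
print([t.message for t in ordered])   # [(3, 1), (5, 1)]
```

Under this assumed tie-break, the update from p3 is carried out before the update from p5 at every replica.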
20Data-Centric Consistency Models
- The consistency models just discussed are called data-centric consistency models.
- Assumptions
  - Concurrent processes may be updating the data simultaneously.
  - Updates need to be propagated quickly.
21Eventual Consistency
- In the banking example, an account can have many updates from different sources, e.g., a person at an ATM, the bank adding interest. Updates should be immediate.
- In many applications, only one or a few processes perform updates.
- Example: DNS
  - The DNS name space is divided into domains.
  - Each domain has its own naming authority.
  - Only that authority is allowed to update its part of the name space, e.g., change the IP address associated with a host name.
  - This implies that there is no write-write conflict.
- Does the update have to be done immediately?
  - No.
  - Can propagate an update in a lazy fashion, i.e., it is often acceptable to propagate an update only after some time has passed.
22Eventual Consistency
- Example: WWW
  - Web pages are updated by a single authority.
  - Web pages are cached by browsers for efficiency.
  - The cached page that is returned to the requesting client may be an older version than the one available at the actual web server.
  - This inconsistency is usually acceptable.
- Some applications can tolerate relatively high inconsistency.
- Eventual consistency requires only that updates are guaranteed to propagate to all replicas.
23Eventual Consistency
- The principle of a mobile user accessing different replicas of a distributed database.
24Eventual Consistency
- The mobile user accesses the database by connecting to one of the replicas in a transparent way.
- The application running on the user's portable computer is (ideally) unaware of which replica it is actually operating on.
- Assume the user performs several update operations and then disconnects again.
- Later the user accesses the database again, possibly after moving to a different location or by using a different access device. The user may be connected to a different replica.
- What if the updates have not propagated? This could be confusing to the user.
25Client-Consistency Models
- Often there are some constraints placed on eventual consistency.
- These constraints help define client-centric consistency models.
26Client-Consistency Models
- Monotonic reads
  - If a process reads the value of a data item x, any subsequent read of x by the same process will return the same value or a more recent value.
- Example
  - Consider a distributed e-mail database.
  - In such a database, each user's mailbox may be distributed and replicated across multiple machines.
  - Mail can be inserted in a mailbox at any location.
  - Updates are propagated in a lazy (i.e., on-demand) fashion.
  - Assume that reads don't change the mailbox.
  - Suppose a user reads their e-mail in Vancouver and then flies to Toronto and reads their e-mail.
  - A monotonic read guarantees that the messages that were in the mailbox in Vancouver will also be in the mailbox in Toronto.
27Client-Consistency Models
- Monotonic writes
  - A write operation by a process on data item x is completed before any subsequent write on x by the same process.
- Example: Updating a software library
  - An update may consist of replacing one or more functions, resulting in a new version.
  - Updates performed on a copy of the library should be able to assume that all preceding updates have been performed first.
28Client-Consistency Models
- Read-Your-Writes
  - The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.
  - The absence of this consistency is seen in the following examples.
- Example: Updating Web HTML pages
  - Cached web pages are still read even though the web page has been updated.
- Example: Password updates for a digital library
  - The update may occur at one site but not be immediately propagated to a site where the account/password is actually needed.
29Client-Consistency Models
- Write-Follows-Reads
  - A write operation by a process on data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x than the one that was read.
30Implementing Client-Centric Models
- Globally unique ID per write operation
  - Assigned by the initiating server
  - Global IDs can be generated locally.
  - A server is required to log the write operation so that it can be replayed at another server.
- For each client, we keep track of two sets of write identifiers:
  - Read set
    - Write IDs relevant to the client's read operations
  - Write set
    - IDs of writes performed by the client
- Major performance issue
  - Size of read/write sets
31Implementing Client-Centric Models
- Monotonic read
  - When a client issues a read, the server is given the client's read set to check whether all the identified writes have taken place locally.
  - If not, the server contacts others to ensure that it is brought up-to-date.
  - After the read, the client's read set is updated with the server's relevant writes.
- Monotonic write
  - When a client issues a write, the server is given the client's write set to ensure that all specified writes have been applied (in order).
  - The write operation's ID is appended to the client's write set.
32Implementing Client-Centric Models
- Read-your-writes
  - Before serving a read request, the server fetches (from other servers) all writes in the client's write set.
- Writes-follow-reads
  - The server is brought up-to-date with the writes in the client's read set.
  - After the write, the new ID is added to the client's write set, along with the IDs in the read set, as these have become relevant for the write just performed.
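The read-set and write-set bookkeeping can be sketched roughly as follows. This is a simplified illustration under several assumptions: servers are plain in-process objects, a write ID is the pair (server name, local counter), "relevant writes" for a read are taken to be the logged writes on the key being read, and the names `Server`, `Client`, and `catch_up` are hypothetical. Only monotonic reads, monotonic writes, and read-your-writes are covered.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.counter = 0
        self.log = {}    # write ID -> (key, value), kept so writes can be replayed
        self.data = {}

    def apply(self, wid, key, value):
        if wid not in self.log:
            self.log[wid] = (key, value)
            self.data[key] = value

    def write(self, key, value):
        self.counter += 1
        wid = (self.name, self.counter)   # globally unique, generated locally
        self.apply(wid, key, value)
        return wid

    def catch_up(self, wids, peers):
        """Fetch and replay, in the given order, any listed write not yet applied here."""
        for wid in wids:
            if wid in self.log:
                continue
            for peer in peers:
                if wid in peer.log:
                    self.apply(wid, *peer.log[wid])
                    break

class Client:
    def __init__(self):
        self.read_set = []    # IDs of writes relevant to this client's reads
        self.write_set = []   # IDs of writes performed by this client

    def write(self, server, peers, key, value):
        # Monotonic writes: this client's earlier writes are applied first, in order.
        server.catch_up(self.write_set, peers)
        self.write_set.append(server.write(key, value))

    def read(self, server, peers, key):
        # Monotonic reads and read-your-writes: bring the server up to date first.
        server.catch_up(self.read_set + self.write_set, peers)
        for wid, (k, _) in server.log.items():
            if k == key and wid not in self.read_set:
                self.read_set.append(wid)
        return server.data.get(key)

vancouver, toronto = Server("vancouver"), Server("toronto")
servers = [vancouver, toronto]
user = Client()
user.write(vancouver, servers, "inbox", "msg1")
print(user.read(toronto, servers, "inbox"))   # msg1
```

The Toronto server had never seen the write, but it fetches it from Vancouver before answering because the write's ID is in the client's write set.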
33Impact of Mobility
- Mobility suggests that a user may be disconnected.
- Assume that a user of a mobile device has downloaded their calendar from their workstation.
- The user's device is disconnected.
- The user makes changes to the calendar on the mobile device.
- A secretary makes changes to the calendar on the workstation.
- When the user reconnects, the calendar on the user's device and the calendar on the user's workstation should become the same.
- Some schemes have the user's device be the primary and the workstation be a backup.
- This suggests that the calendar on the user's device is considered the most recent.
34Other Important Implementation Issues
- Important implementation issues include the following:
  - Placement and nature of replicas
  - Distributing updates
35Replica Placement
- Permanent
  - A process/machine always has a replica.
  - Example: Mirroring of a web site
- Server-Initiated
  - Processes that can dynamically host a replica on request of another server.
- Client-Initiated
  - Processes that can dynamically host a replica on request of a client.
  - Example: Web caches
36Server-Initiated Replicas
- Consider a web server placed in Toronto.
- Under normal situations, the server can handle incoming requests easily, but it is predicted that in a couple of days there will be a sudden burst of requests.
- It may be worthwhile to install a number of temporary replicas in the regions where requests are coming from.
37Server-Initiated Replicas
- The ability to optimize the dynamic placement of replicas is of special interest to web hosting services.
- ISPs pay a web hosting company (sometimes called an access-centric content distribution network) to serve popular content from caches close to the ISP's subscribers.
- This model assumes that storage is cheaper than bandwidth, and that customers will not hesitate to move to other ISPs if they perceive their current ISP to be slow.
38Server-Initiated Replicas
- Example Heuristic
  - Keep track of access counts per file.
  - If the number of accesses drops below some threshold value D, the file can be dropped.
  - If the number of accesses exceeds a threshold R, the file should be replicated.
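The threshold heuristic can be sketched as a small decision function. The function name, the file names, the numeric thresholds, and the "keep" outcome for counts between D and R are assumptions for illustration; the slide only specifies the drop and replicate cases.

```python
def placement_decision(access_count, drop_threshold, replicate_threshold):
    """Decide what to do with a file's replica based on its access count."""
    if access_count < drop_threshold:
        return "drop"
    if access_count > replicate_threshold:
        return "replicate"
    return "keep"   # assumed behaviour between the two thresholds

access_counts = {"index.html": 950, "news.html": 120, "old.html": 3}
D, R = 10, 500   # illustrative threshold values only
decisions = {name: placement_decision(n, D, R) for name, n in access_counts.items()}
print(decisions)
# {'index.html': 'replicate', 'news.html': 'keep', 'old.html': 'drop'}
```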
39Client-Initiated Replicas
- Created at the initiative of clients.
- Known as caches
- In essence, a cache is a local storage facility that is used by a client to temporarily store a copy of the data it has just requested.
- Client caches are used to improve access times to data.
- Data is generally kept in a cache for a limited amount of time, e.g., to prevent extremely stale data from being used or to make room for other data.
- Cache placement can be local to a client's machine or in a location that is easily accessible by other machines in the client's organization.
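A minimal sketch of a client cache that keeps data for a limited amount of time. The class name `TtlCache` and the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow) are assumptions for illustration.

```python
class TtlCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}   # key -> (value, time stored)

    def put(self, key, value, now):
        self.entries[key] = (value, now)

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is None:
            return None               # miss: the caller must fetch from the server
        value, stored_at = entry
        if now - stored_at > self.ttl:
            del self.entries[key]     # too stale: evict and force a refetch
            return None
        return value

cache = TtlCache(ttl_seconds=60)
cache.put("/index.html", "<html>v1</html>", now=0)
fresh = cache.get("/index.html", now=30)    # still within the time limit
stale = cache.get("/index.html", now=120)   # expired, so treated as a miss
print(fresh, stale)   # <html>v1</html> None
```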
40Update Propagation
- Update operations are generally initiated by a client and subsequently forwarded to one of the copies.
- There are a number of design issues to consider.
- State or Operation?
  - An important design issue concerns what is actually to be propagated.
  - Three possibilities:
    - Notification of an update
    - New copy of data
    - Copy of operation
  - Trade bandwidth for processing
41Update Propagation
- Push vs Pull
  - Another design issue is whether updates are pulled or pushed.
  - Push by server
    - Server must know replicas
    - Client immediately updated
  - Pull by client
    - Client must poll or delay the response when an item is requested
42Update Propagation
- Push vs. Pull (cont)
- Leases
  - We can dynamically switch between pulling and pushing using leases. A lease is a contract in which the server promises to push updates to the client until the lease expires.
  - Age-based leases: An object that hasn't changed for a long time is unlikely to change in the near future, so provide a long-lasting lease.
  - Renewal-frequency-based leases: The more often a client requests a specific object, the longer the expiration time for that client (for that object) will be.
  - State-based leases: The more loaded a server is, the shorter the expiration times become.
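The three lease policies can be combined in one sketch. Every number and formula below is a made-up assumption chosen only to show the direction of each effect; the slides give no concrete formulas, and the function name `lease_seconds` and its parameters are hypothetical.

```python
def lease_seconds(age_seconds, requests_per_hour, server_load,
                  base=60.0, cap=3600.0):
    """Pick a lease duration; server_load is a fraction between 0.0 and 1.0."""
    duration = base
    duration += 0.1 * age_seconds                 # age-based: long-unchanged objects get longer leases
    duration *= 1.0 + requests_per_hour / 10.0    # renewal-frequency-based: frequent requesters get longer leases
    duration *= max(0.0, 1.0 - server_load)       # state-based: a loaded server hands out shorter leases
    return min(duration, cap)

baseline = lease_seconds(age_seconds=0, requests_per_hour=0, server_load=0.0)
old_object = lease_seconds(age_seconds=1000, requests_per_hour=0, server_load=0.0)
busy_client = lease_seconds(age_seconds=0, requests_per_hour=10, server_load=0.0)
loaded_server = lease_seconds(age_seconds=0, requests_per_hour=0, server_load=0.5)
print(old_object > baseline, busy_client > baseline, loaded_server < baseline)
# True True True
```

When a lease expires, the client falls back to pulling until it obtains a new lease.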
43Consistency Requirements in Applications
- We have looked at several consistency models and possible implementations.
- There are many more out there that are variations of the models described.
- It is important to understand the consistency requirements of the application domain.
- Let's look at some Internet applications.
44Consistency Requirements for Applications
- Bulletin board
  - Replicated message posting service
  - As discussed earlier, causal order is needed. Some bulletin boards may also want total order.
  - There may be a requirement on how fast these updates should be.
- KaZaa
  - Order of updates doesn't matter since downloading a file is a commutative operation, i.e., it doesn't matter if song a is downloaded before song b or if song b is downloaded before song a.
  - Some would say that what is important is that eventually all sites could have the same songs.
45Consistency Requirements for Applications
- Chat Service
  - Chat messages require causal order for discussions to make sense.
- Games
  - Players' moves in a game must be delivered in the same order to all participants for fairness.
- In both these cases, timeliness is important.
- A centralized solution results in a performance bottleneck.
- Games sometimes guess at moves or the position of objects on the game board.
  - E.g., instead of sending and receiving messages for the position of an object, the software predicts what the position would be.
46Consistency Requirements for Applications
- Airline reservation
  - This is representative of replicated e-commerce services that accept inquiries (searches) and purchase orders on a catalog.
  - A measurement of consistency is used: the percentage of requests that access inconsistent results.
  - Example: A user may observe an available seat when in fact the seat has been booked at another replica.
  - Isn't this handled by using one of the approaches to providing total order?
  - Yes, but if a small violation of consistency is tolerated, we can achieve better performance.
47Consistency Requirements for Applications
- Airline reservation (cont.)
  - Consistency requirements change dynamically.
  - Example: The cost of a transaction that must be rolled back is fairly small when a flight is empty but grows as the flight fills.
  - Why? One can likely find an alternate seat on the same flight.
  - Requests when the flight is close to full may require a replica to be more aggressive in enforcing sequential consistency.