Title: CTIS 490 DISTRIBUTED SYSTEMS
1CTIS 490 DISTRIBUTED SYSTEMS
- WEEK 10
- RECOVERY
- OTHER ISSUES
2RECOVERY
- Another issue in fault tolerance is recovery from an error.
- The idea of error recovery is to replace an erroneous state
  with an error-free state.
- The most widely used recovery method is backward recovery.
- Backward recovery brings the system from its present erroneous
  state back into a previously correct state. To do so, the
  system's state must be recorded from time to time, and such a
  recorded state restored when things go wrong.
- Each time the system's present state is recorded, a checkpoint
  is said to be made.
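As a minimal sketch of checkpointing and backward recovery, the toy process below records copies of its state and restores the latest one after an error. The class name, the in-memory list standing in for stable storage, and the example values are all assumptions made for illustration.

```python
import copy


class CheckpointedCounter:
    """Toy process whose state can be checkpointed and rolled back."""

    def __init__(self):
        self.state = {"total": 0, "steps": 0}
        self._checkpoints = []  # hypothetical stand-in for stable storage

    def checkpoint(self):
        # Record a deep copy so later changes cannot alter the saved state.
        self._checkpoints.append(copy.deepcopy(self.state))

    def apply(self, value):
        self.state["total"] += value
        self.state["steps"] += 1

    def recover(self):
        # Backward recovery: replace the present state with the last recorded one.
        if not self._checkpoints:
            raise RuntimeError("no checkpoint available")
        self.state = copy.deepcopy(self._checkpoints[-1])


proc = CheckpointedCounter()
proc.apply(5)
proc.checkpoint()   # records total=5, steps=1
proc.apply(-999)    # pretend this update left the state erroneous
proc.recover()      # roll back to the recorded state
```

A real system would write checkpoints to storage that survives a crash; the list here only illustrates the record-and-restore cycle.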
3CHECKPOINTING
- In distributed systems, a consistent global state is also
  called a distributed snapshot.
- In a distributed snapshot, if a process P has recorded the
  receipt of a message, then there should be a process Q that
  has recorded the sending of that message.
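The snapshot condition can be checked mechanically. The sketch below assumes, purely for illustration, that each local state is summarized by the sets of message identifiers it has recorded as sent and as received; the function name and data layout are hypothetical.

```python
def is_consistent(checkpoints):
    """checkpoints maps a process name to {"sent": set, "received": set}."""
    all_sent = set()
    for cp in checkpoints.values():
        all_sent |= cp["sent"]
    # Every recorded receipt must be matched by a recorded send.
    return all(cp["received"] <= all_sent for cp in checkpoints.values())


consistent_cut = {
    "P": {"sent": set(), "received": {"m1"}},
    "Q": {"sent": {"m1"}, "received": set()},
}
inconsistent_cut = {
    "P": {"sent": set(), "received": {"m1"}},
    "Q": {"sent": set(), "received": set()},  # the send of m1 was never recorded
}
```

Here `is_consistent(consistent_cut)` holds, while `inconsistent_cut` fails the check because P records a receipt that no process records sending.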
4INDEPENDENT CHECKPOINTING
- Each process saves its state from time to time to a locally
  available stable storage (which is designed to survive
  anything except major disasters), and we have to construct a
  consistent global state from these local states.
- A recovery line corresponds to the most recent consistent
  collection of checkpoints.
- The distributed nature of checkpointing may make it difficult
  to find a recovery line. Discovering a recovery line requires
  that each process is rolled back to its most recently saved
  state.
- If these local states jointly do not form a distributed
  snapshot, further rolling back is necessary.
- This process of cascaded rollback may lead to the domino
  effect.
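The search for a recovery line can be sketched as a loop that keeps rolling processes back until no checkpoint records an unmatched receipt. The checkpoint representation (sets of sent and received message identifiers, with index 0 as an empty initial state) and the two-process history are assumptions chosen to make the domino effect visible.

```python
def find_recovery_line(history):
    """history maps a process to its checkpoints, oldest first.

    Each checkpoint is {"sent": set, "received": set}; index 0 is assumed
    to be the initial state with both sets empty. Returns a mapping from
    process to the index of its checkpoint on the recovery line.
    """
    line = {p: len(cps) - 1 for p, cps in history.items()}
    while True:
        all_sent = set()
        for p, idx in line.items():
            all_sent |= history[p][idx]["sent"]
        # Processes whose chosen checkpoint records a receipt with no recorded send.
        orphans = [p for p, idx in line.items()
                   if not history[p][idx]["received"] <= all_sent]
        if not orphans:
            return line
        for p in orphans:
            line[p] -= 1  # cascaded rollback


EMPTY = {"sent": set(), "received": set()}
history = {
    "P1": [EMPTY,
           {"sent": set(), "received": {"a"}},
           {"sent": {"b"}, "received": {"a", "c"}}],
    "P2": [EMPTY,
           {"sent": {"a"}, "received": {"b"}}],
}
recovery_line = find_recovery_line(history)
```

In this history every saved state depends on a message sent after the other process's previous checkpoint, so the rollback cascades all the way to the initial state.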
5INDEPENDENT CHECKPOINTING
- The state saved by P2 indicates the receipt of a message m,
  but no other process can be identified as its sender. So, P2
  needs to be rolled back to an earlier state.
- P1 has recorded the receipt of message m, but there is no
  recorded event of this message being sent, so P1 must be
  rolled back as well.
- In this example, the recovery line is the initial state of
  the system.
6COORDINATED CHECKPOINTING
- In coordinated checkpointing, all processes synchronize to
  jointly write their state to local stable storage. The main
  advantage is that the saved state is globally consistent.
- A coordinator first multicasts a CHECKPOINT_REQUEST message
  to all processes. When a process receives such a message, it
  takes a local checkpoint, queues any subsequent messages, and
  acknowledges that it has taken a checkpoint.
- When the coordinator has received an acknowledgment from all
  processes, it multicasts a CHECKPOINT_DONE message to allow
  the blocked processes to continue.
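The two-phase exchange above can be sketched as a single-threaded simulation. The class and function names are hypothetical, the `network` list stands in for real message delivery, and "queues any subsequent messages" is interpreted here as holding back outgoing application messages until CHECKPOINT_DONE arrives.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.state = 0
        self.saved = None     # stand-in for local stable storage
        self.blocked = False
        self.queue = []       # outgoing messages held back while blocked

    def send(self, msg, network):
        if self.blocked:
            self.queue.append(msg)            # queue until CHECKPOINT_DONE
        else:
            network.append((self.name, msg))

    def on_checkpoint_request(self):
        self.saved = self.state               # take a local checkpoint
        self.blocked = True
        return "ACK"

    def on_checkpoint_done(self, network):
        self.blocked = False
        for msg in self.queue:                # release what was queued
            network.append((self.name, msg))
        self.queue.clear()


def coordinated_checkpoint(processes, network, during_block=None):
    # Phase 1: CHECKPOINT_REQUEST to everyone, collect acknowledgments.
    acks = [p.on_checkpoint_request() for p in processes]
    if during_block is not None:
        during_block()                        # application activity while blocked
    # Phase 2: CHECKPOINT_DONE once every process has acknowledged.
    if all(ack == "ACK" for ack in acks):
        for p in processes:
            p.on_checkpoint_done(network)


network, observed = [], []
procs = [Process("P1"), Process("P2")]
procs[0].state = 7


def app_activity():
    procs[0].send("hello", network)
    observed.append(list(network))            # snapshot of the network while blocked


coordinated_checkpoint(procs, network, during_block=app_activity)
```

While the processes are blocked, the message from P1 stays in its queue; it reaches the network only after CHECKPOINT_DONE.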
7MESSAGE LOGGING
- Many distributed systems combine checkpointing with message
  logging.
- There are two types of message logging:
- Sender-based logging: a process logs its messages before
  sending them off.
- Receiver-based logging: a process logs its messages before
  executing them.
- Message logging allows replay of the messages.
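A minimal sketch of receiver-based logging combined with a checkpoint follows. It assumes that processing a message is deterministic, so replaying the log after a crash reproduces the lost state; the class name and the account-balance example are illustrative only.

```python
class LoggingReceiver:
    def __init__(self):
        self.balance = 0
        self.checkpoint = 0
        self.log = []                 # stand-in for a log on stable storage

    def deliver(self, amount):
        self.log.append(amount)       # log the message first ...
        self.balance += amount        # ... then execute it

    def take_checkpoint(self):
        self.checkpoint = self.balance
        self.log.clear()              # earlier messages are covered by the checkpoint

    def recover(self):
        self.balance = self.checkpoint
        for amount in self.log:       # replay messages logged since the checkpoint
            self.balance += amount


receiver = LoggingReceiver()
receiver.deliver(10)
receiver.take_checkpoint()
receiver.deliver(5)
receiver.deliver(-3)
receiver.balance = None               # simulate losing the in-memory state
receiver.recover()
```

Replay lets the process recover to the state just before the crash rather than only to the last checkpoint.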
8REPLICA MANAGEMENT
- Replica management involves two issues: where to place
  replicas, and which mechanisms to use for keeping them
  consistent.
- Placing replica servers: concerned with finding the best
  locations to place a server that can host part of a data
  store.
- Placing content: concerned with finding the best servers for
  placing content.
9REPLICA-SERVER PLACEMENT
- The optimal placement of replica servers is not an
  intensively studied problem, since it is more of a management
  issue.
- Analysis of client and network properties is useful for
  coming to informed decisions.
- One approach is to consider the topology of the Internet as
  formed by the Autonomous Systems (AS).
- An AS can best be viewed as a network in which the nodes all
  run the same routing protocol and which is managed by a
  single organization, typically an Internet Service Provider
  (ISP).
10CONTENT REPLICATION PLACEMENT
- There are three types of replicas: permanent replicas,
  server-initiated replicas, and client-initiated replicas.
11PERMANENT REPLICAS
- Permanent replicas can be considered as the initial set of
  replicas that constitute a data store.
- For example, distribution of a Web site generally comes in
  one of two forms:
- First, the files that constitute a site are replicated across
  a number of servers at a single location. Whenever a request
  comes in, it is forwarded to one of the servers, for instance
  using a round-robin method.
- Second, mirroring can be used. In this case, a Web site is
  copied to a limited number of servers, called mirror sites,
  which are geographically spread across the Internet. Clients
  choose one of the sites offered to them.
12SERVER-INITIATED REPLICAS
- Server-initiated replicas are used to enhance performance by
  dynamically placing temporary replicas to handle sudden
  bursts of requests.
- They are used mainly by Web hosting services.
- Each server keeps track of access counts per file and of
  where access requests come from.
- Given a client C, each server can determine which of the
  servers in the Web hosting service is closest to C (such
  information can be obtained from routing databases).
- If clients C1 and C2 share the same closest server P, all
  access requests for file F from C1 and C2 are jointly
  registered as a single access count.
- When the number of requests for a specific file F drops below
  a certain threshold, that file can be removed from the
  server.
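The bookkeeping described above might be sketched as follows. The class name, the client-to-closest-server table (which a real service would derive from routing information), and the threshold value are all assumptions.

```python
from collections import Counter


class HostingServer:
    def __init__(self, files, closest_server, delete_threshold):
        self.files = set(files)
        self.closest_server = closest_server      # client -> closest server (assumed given)
        self.delete_threshold = delete_threshold
        self.counts = Counter()                   # (closest server, file) -> access count

    def record_access(self, client, filename):
        # Requests from clients sharing a closest server are counted jointly.
        origin = self.closest_server[client]
        self.counts[(origin, filename)] += 1

    def total_requests(self, filename):
        return sum(n for (_, f), n in self.counts.items() if f == filename)

    def drop_cold_files(self):
        # Remove files whose request count fell below the threshold.
        cold = {f for f in self.files
                if self.total_requests(f) < self.delete_threshold}
        self.files -= cold
        return cold


server = HostingServer(
    files={"F", "G"},
    closest_server={"C1": "P", "C2": "P", "C3": "R"},
    delete_threshold=2,
)
server.record_access("C1", "F")
server.record_access("C2", "F")
server.record_access("C3", "G")
removed = server.drop_cold_files()
```

Both requests for F are registered under P because C1 and C2 share it as their closest server, and G is dropped because it received fewer requests than the threshold.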
13SERVER-INITIATED REPLICAS
- Server-initiated replicas are generally used for
placing read-only copies.
14CLIENT-INITIATED REPLICAS
- Client-initiated replicas are more commonly known as client
  caches. A cache is a local storage facility that is used by a
  client to temporarily store a copy of data.
- Managing the cache is left to the client. However, the client
  can rely on the server to inform it when the cached data has
  become stale.
- When most operations involve only reading data, performance
  can be improved by letting the client store requested data in
  a nearby cache. Such a cache can be located on the client's
  machine or on a separate machine in the same LAN.
- Whenever requested data can be fetched from the local cache,
  a cache hit is said to have occurred.
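A client cache with hit counting and server-driven invalidation can be sketched as below. The class name, the `fetch` callback, and the dictionary standing in for the server are hypothetical.

```python
class ClientCache:
    def __init__(self, fetch):
        self.fetch = fetch      # function that gets data from the server
        self.store = {}
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.store:
            self.hits += 1      # cache hit: served from the local copy
            return self.store[key]
        self.misses += 1
        value = self.fetch(key)
        self.store[key] = value
        return value

    def invalidate(self, key):
        # Called when the server reports that the cached copy is stale.
        self.store.pop(key, None)


origin = {"page": "v1"}                      # stand-in for the server's data
cache = ClientCache(origin.__getitem__)
cache.read("page")                           # miss: fetched from the server
cache.read("page")                           # hit: served locally
origin["page"] = "v2"
cache.invalidate("page")                     # server says the copy is stale
latest = cache.read("page")                  # miss again, fetches the new version
```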
15CONTENT DISTRIBUTION
- There are three ways to propagate updated content to the
  replica servers:
- Propagate only a notification of an update: other copies are
  informed that an update has taken place and that the data
  they contain is no longer valid. The main advantage is that
  it uses little network bandwidth. It works best when there
  are many update operations compared to read operations, that
  is, when the read-to-write ratio is relatively small.
- Transfer data from one copy to another: this is useful when
  the read-to-write ratio is relatively high. In that case, the
  probability is high that an update will be effective, in the
  sense that the modified data will be read before the next
  update takes place.
16CONTENT DISTRIBUTION
- Propagate the update operation from one copy to another: tell
  each replica which update operation it should perform,
  sending only the parameter values that the operation needs.
  This approach, also referred to as active replication,
  assumes that each replica is represented by a process capable
  of actively keeping its associated data up to date.
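The three propagation options can be contrasted in one small sketch. The `Primary` and `Replica` classes and their method names are hypothetical; each replica below receives the same update through a different option.

```python
import operator


class Primary:
    def __init__(self, value):
        self.value = value


class Replica:
    def __init__(self, primary):
        self.primary = primary
        self.value = primary.value
        self.valid = True

    def on_invalidate(self):           # option 1: notification only
        self.valid = False

    def on_data(self, value):          # option 2: transfer the modified data
        self.value, self.valid = value, True

    def on_operation(self, op, arg):   # option 3: ship the operation and its parameter
        self.value = op(self.value, arg)

    def read(self):
        if not self.valid:             # stale copy: fetch the data on demand
            self.value, self.valid = self.primary.value, True
        return self.value


primary = Primary(10)
a, b, c = Replica(primary), Replica(primary), Replica(primary)

primary.value = operator.add(primary.value, 5)   # the update at the primary
a.on_invalidate()                                # sends almost nothing
b.on_data(primary.value)                         # sends the new value
c.on_operation(operator.add, 5)                  # sends only "add" and 5
```

Replica `a` pays the transfer cost only if it is read again before the next update, which is why invalidation suits a small read-to-write ratio.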
17ACTIVE REPLICATION
- Active replication requires that operations be carried out in
  the same order everywhere.
- Such an ordering can be achieved using a central coordinator,
  also called a sequencer.
- Each operation is forwarded to the sequencer, which assigns
  it a unique sequence number and then forwards the operation
  to all replicas.
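A sequencer can be sketched as a counter plus a forwarding loop, with each replica applying operations strictly in sequence-number order. The class names and the buffering of out-of-order deliveries are assumptions made for this illustration.

```python
import itertools


class OrderedReplica:
    def __init__(self):
        self.value = 0
        self.expected = 1       # next sequence number to apply
        self.pending = {}       # operations that arrived early

    def receive(self, seq, op):
        self.pending[seq] = op
        # Apply operations strictly in sequence-number order.
        while self.expected in self.pending:
            self.value = self.pending.pop(self.expected)(self.value)
            self.expected += 1


class Sequencer:
    def __init__(self, replicas):
        self.replicas = replicas
        self._numbers = itertools.count(1)

    def submit(self, op):
        seq = next(self._numbers)        # unique, increasing number
        for replica in self.replicas:
            replica.receive(seq, op)
        return seq


def add_two(x):
    return x + 2


def times_three(x):
    return x * 3


r1, r2 = OrderedReplica(), OrderedReplica()
sequencer = Sequencer([r1])
sequencer.submit(add_two)        # gets number 1
sequencer.submit(times_three)    # gets number 2

# Deliver the same numbered operations to r2 in the opposite order.
r2.receive(2, times_three)
r2.receive(1, add_two)
```

Because both replicas apply operation 1 before operation 2, they end in the same state even though r2 received the messages out of order.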
18PULL versus PUSH PROTOCOLS
- Yet another design issue is whether updates are pulled or
  pushed.
- In a push-based approach, also referred to as a server-based
  protocol, updates are propagated to other replicas without
  those replicas asking for them. A push-based approach is used
  when replicas need to maintain a high degree of consistency,
  i.e., replicas need to be kept identical. The drawback is
  that the server needs to keep track of all client caches; a
  Web server may need to keep track of tens of thousands of
  them.
- In a pull-based approach, a server or client requests another
  server to send it any updates it has at that moment. This
  approach, also called a client-based protocol, is often used
  by client caches, for example by Web caches.
19PULL versus PUSH PROTOCOLS
- A comparison between push-based and pull-based protocols in
  the case of multiple-client, single-server systems.
20ELECTION ALGORITHMS
- Many distributed algorithms require one process to act as
  coordinator, initiator, or otherwise perform some special
  role. There are algorithms for electing a coordinator.
- We will assume that each process has a unique number, for
  example, its network address (for simplicity, we will assume
  one process per machine).
- Furthermore, we also assume that every process knows the
  process number of every other process. What the processes do
  not know is which processes are currently running and which
  ones are down.
- In general, election algorithms attempt to locate the process
  with the highest process number and designate it as
  coordinator.
- The goal of an election algorithm is to ensure that when an
  election starts, it concludes with all processes agreeing on
  who the new coordinator is to be.
21BULLY ALGORITHM
- When any process notices that the coordinator is no longer
  responding to requests, it initiates an election. A process P
  holds an election as follows:
  1. P sends an ELECTION message to all processes with higher
     numbers.
  2. If no one responds, P wins the election and becomes the
     coordinator.
  3. If one of the higher-ups answers, it takes over, and P's
     job is done.
- At any moment, a process can get an ELECTION message from one
  of the lower-numbered processes. It sends an OK message back
  and then holds an election of its own in the same manner.
- Eventually, the highest-numbered running process becomes the
  new coordinator.
- The biggest guy wins, which is why it is called the bully
  algorithm.
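The election steps above can be sketched as a synchronous simulation. The function name and the `log` parameter are hypothetical, failure detection is reduced to membership in an `alive` set, and only one responder's election is followed because all of them converge on the same winner. The example assumes eight processes numbered 0 to 7 with process 7 down.

```python
def bully_election(initiator, alive, all_ids, log=None):
    """Return the coordinator chosen when `initiator` starts an election.

    alive is the set of running process ids; all_ids lists every process id.
    """
    higher = [p for p in sorted(all_ids) if p > initiator]
    responders = [p for p in higher if p in alive]   # these answer OK
    if log is not None:
        log.append((initiator, responders))
    if not responders:
        return initiator      # nobody higher answered: the initiator wins
    # A responder takes over and holds its own election in the same manner.
    return bully_election(responders[0], alive, all_ids, log)


trace = []
winner = bully_election(4, alive={0, 1, 2, 3, 4, 5, 6}, all_ids=range(8), log=trace)
```

The trace records who answered each ELECTION message: processes 5 and 6 answer 4, process 6 answers 5, and nobody answers 6, which therefore becomes coordinator.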
22BULLY ALGORITHM
The bully election algorithm. (a) Process 4 holds
an election. (b) Processes 5 and 6 respond,
telling 4 to stop. (c) Now 5 and 6 each hold an
election.
23BULLY ALGORITHM
- The bully election algorithm. (d) Process 6 tells 5 to stop.
  (e) Process 6 wins and tells everyone.
24RING ALGORITHM
- Another election algorithm is based on the use of a ring.
  Unlike some ring algorithms, this one does not use a token.
- We assume that the processes are physically or logically
  ordered, so that each process knows who its successor is.
- When any process notices that the coordinator is not
  functioning, it builds an ELECTION message containing its own
  process number and sends it to its successor.
- If the successor is down, the sender skips over it and goes
  to the next member along the ring.
- At each step, the sender adds its own process number to the
  list in the message, making itself a candidate to be elected.
25RING ALGORITHM
- Eventually, the message gets back to the process that started
  it all. The process recognizes this event when it receives an
  incoming message containing its own process number.
- The message type is changed to COORDINATOR and circulated
  once again, this time to inform everyone else who the
  coordinator is (the list member with the highest number) and
  who the members of the new ring are.
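The circulation of the ELECTION message can be sketched as a walk around a list. The function name is hypothetical, the starter is assumed to be running, and the eight-process ring with process 7 down is an illustrative assumption.

```python
def ring_election(ring, alive, starter):
    """ring lists process ids in ring order; alive is the set of running ids.

    Returns (coordinator, candidates), where candidates is the list carried
    by the ELECTION message when it returns to the starter.
    """
    n = len(ring)
    pos = ring.index(starter)
    candidates = [starter]                 # body of the ELECTION message
    while True:
        # Move to the next running member, skipping successors that are down.
        pos = (pos + 1) % n
        while ring[pos] not in alive:
            pos = (pos + 1) % n
        if ring[pos] == starter:
            break                          # the message is back at the starter
        candidates.append(ring[pos])       # this member adds its own number
    # The COORDINATOR message would now announce the highest number in the list.
    return max(candidates), candidates


coordinator, members = ring_election(
    ring=[0, 1, 2, 3, 4, 5, 6, 7],
    alive={0, 1, 2, 3, 4, 5, 6},
    starter=2,
)
```

The returned list doubles as the membership of the new ring, since only running processes were able to add themselves.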
26RING ALGORITHM
- Election algorithm using a ring.
27NETWORK TIME PROTOCOL
- The Network Time Protocol (NTP) is a protocol built on top of
  the Internet protocol suite (TCP/IP) and used to synchronize
  the clocks of distributed systems. At the transport layer,
  NTP uses UDP on port 123 to communicate between clients and
  servers.
28THE BERKELEY ALGORITHM
- In many algorithms, such as NTP, the time server is passive:
  other machines ask it for the time, and it responds to their
  queries.
- In the Berkeley algorithm, the time server (time daemon) is
  active, polling every machine from time to time to ask what
  time it is there.
- Based on the answers, it computes an average time and tells
  all the other machines to adjust their clocks.
- The time server's clock is set manually.
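The averaging step of the Berkeley algorithm can be sketched as follows. The function name, the decision to include the daemon's own clock in the average, and the clock values (expressed in minutes) are assumptions for illustration; message delays are ignored.

```python
def berkeley_adjustments(daemon_time, reported_times):
    """Return (average, adjustments).

    reported_times maps a machine name to the clock value it answered with;
    the name "daemon" is reserved for the time daemon itself. adjustments
    maps each name to the amount by which its clock should be changed.
    """
    clocks = {"daemon": daemon_time, **reported_times}
    average = sum(clocks.values()) / len(clocks)
    adjustments = {name: average - t for name, t in clocks.items()}
    return average, adjustments


# Clock values in minutes: the daemon reads 3:00, machine A 3:25, machine B 2:50.
average, adjustments = berkeley_adjustments(180, {"A": 205, "B": 170})
```

With these values the average is 185 minutes (3:05), so the daemon advances by 5, A falls back by 20, and B advances by 15.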
29THE BERKELEY ALGORITHM
30THE BERKELEY ALGORITHM
- (c) The time daemon tells everyone how to adjust
their clock.