Title: ICS362 Distributed Systems
1. ICS362 Distributed Systems
2. Review
- The last topic was Naming
- Names for entities organised in namespaces.
- Addresses
- Identifiers
- Names
- The difficulty of naming with mobile entities
- E.g. Mobile IP
- Garbage Collection
3. This Week
- Synchronisation
- Clock Synchronisation
- Logical Clocks
- Global State
- Election Algorithms
- Mutual Exclusion
- Distributed Transactions
4. Synchronisation
- With a single processor, a timing request can be answered directly.
- A system call to the kernel returns the system time.
- Sequences are easy to identify
- Message m1 arrived before Message m2.
- With distributed systems, there is no global time, and many messages overlap each other.
5. Why Synchronise?
- It is often important to control access to a single, shared resource.
- It is also often important to agree on the ordering of events.
- Synchronisation in Distributed Systems is much more difficult than in uniprocessor systems.
6. Clock Synchronisation
- When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time by a remote machine.
- Example: if output.c is edited on a machine whose clock lags behind the machine that compiled it, make will not call the compiler for the newer version of output.c, even though it is newer.
7. The problem
- Achieving agreement on time in a Distributed System is not trivial.
- Is it even possible to synchronise all the clocks in a Distributed System?
- With multiple computers, clock skew ensures that no two machines have the same value for the current time.
- But how do we measure time?
8. What is Time?
- It has over 60 definitions in dictionary.com alone!
- All of which could be described as human methods of control: a way of coordinating actions, or enabling a landlord to collect the rent.
- Therefore time for humans requires some consensus.
- But humans haven't always agreed on time: consider the conflicts when Pope Gregory decided to remove 10 days from the year 1582, when he introduced the Gregorian calendar.
9. We have all the time in the world.
- How many days in a year?
- 365? Well, not according to early astrologers, and technically not even true today!
- What is a year?
- The period in which the earth makes a complete orbit of the sun.
- What is a day?
- Traditionally, the interval between two consecutive transits of the sun, i.e. the period between the sun reaching its highest point on one day and on the next.
10. We're dragging, baby
- The earth's spin is slowing down, so while there were once 400 days per year, we're making fewer spins per year.
- Solution: the atomic clock.
- Invented in 1948, the atomic clock counts the number of transitions of a cesium 133 atom. The cesium 133 atom is very stable and independent of the earth's motion, which enabled physicists to define a second as 9,192,631,770 transitions of the atom. (This was the number of transitions in a measured second in 1948.)
11. The beginning of Time
- To improve time measurements, 50 cesium clocks were placed in labs around the world; their transitions are counted, divided by 9,192,631,770, and periodically reported to the Bureau International de l'Heure (BIH).
- Since midnight on January 1st 1958, the mean result of these cesium clocks has been considered the right time; well, at least the International Atomic Time (TAI).
12. International Atomic Time
- There is still a problem. Currently 86,400 TAI seconds is about 3 msec shorter than a solar day.
- While we can sleep soundly at night knowing that we have a precisely accurate timing system, the difference accumulates if left uncorrected, so eventually we might wake up in the morning to find that noon has already passed!
- So, whenever the difference between TAI and solar time grows to 800 msec, the BIH introduces a leap second.
13. So what is the time?
- Several shortwave radio stations, such as the one with call letters WWV, broadcast a time signal every second.
- A machine with a suitable radio receiver can track these signals if precise time is needed.
- If one machine in the Distributed System has a WWV receiver, then the goal is keeping all other machines in sync with it.
- If no machine has a WWV receiver, the goal is keeping all machines together as best we can.
14. Drift
- Suppose each machine has a timer, which creates an interrupt H times per second.
- An interrupt handler adds 1 to the software clock, C, each time the timer ticks.
- If t is the correct time, then in an ideal world
- $dC/dt = 1$.
- But in reality a fast clock has $dC/dt = 1 + \rho$ and a slow clock has $dC/dt = 1 - \rho$, where $\rho$ is the maximum drift rate.
- So at time $\Delta t$ after synchronisation, two clocks could be up to $2\rho\,\Delta t$ apart.
- If we need the clocks not to differ by more than $\delta$, then they need to be resynchronised at least every $\delta/(2\rho)$ seconds.
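The resynchronisation bound above is a one-line calculation. The sketch below is only an illustration: the function name and the example figures (1 ms of tolerated skew, a drift rate of 1e-5) are assumptions, not values from the lecture.

```python
def max_resync_interval(delta: float, rho: float) -> float:
    """Longest allowed gap (in seconds) between resynchronisations.

    delta: largest difference we tolerate between two clocks (seconds)
    rho:   maximum drift rate, so each clock satisfies
           1 - rho <= dC/dt <= 1 + rho
    """
    if delta <= 0 or rho <= 0:
        raise ValueError("delta and rho must be positive")
    # Two clocks drifting in opposite directions separate at rate 2 * rho.
    return delta / (2 * rho)


# Hypothetical figures: tolerate 1 ms of skew with a drift rate of 1e-5.
interval = max_resync_interval(delta=1e-3, rho=1e-5)
print(f"resynchronise at least every {interval:.0f} seconds")
```

With these assumed figures the clocks would need to be resynchronised roughly every 50 seconds.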
15. Cristian's Algorithm
Assuming we have a time server (perhaps with a WWV receiver), each machine sends a request to the time server at least every $\delta/(2\rho)$ seconds ($\delta$ being the allowed clock difference and $\rho$ the maximum drift rate).
16. Cristian's Algorithm
- Problem 1
- First, time can't go backwards, so a fast clock can't simply be set back to an earlier time. Imagine the complications of two files being compiled one after the other, but the clock change meaning the second one is timestamped before the first.
- Normally, changes to the time are introduced gradually, by adding to or taking from the amount by which the clock advances at each timer interrupt.
17. Cristian's Algorithm
- Problem 2
- The time server request will take a nonzero amount of time, which can vary according to the network load.
- Cristian's Algorithm attempts to measure this delay, estimating the one-way message time as $(T_1 - T_0 - I)/2$, where $T_0$ (request sent) and $T_1$ (reply received) are measured on the same clock, and $I$ is the server's handling time, estimated by the server.
- Taking a series of measurements over time improves reliability.
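A minimal sketch of the delay correction described above. The function names and sample readings are hypothetical, and picking the exchange with the shortest round trip is just one assumed way of using several measurements; the lecture does not say how they are combined.

```python
def cristian_time(t0, t1, server_time, interrupt_time=0.0):
    """Estimate the current time from one request/reply exchange.

    t0, t1:         client clock readings when the request was sent and
                    when the reply arrived (same clock, same units)
    server_time:    the time carried in the server's reply
    interrupt_time: I, the server's estimate of its own handling time
    """
    one_way_delay = (t1 - t0 - interrupt_time) / 2
    return server_time + one_way_delay


def best_estimate(samples):
    """Assumption: trust the exchange with the shortest round trip most."""
    t0, t1, server_time, interrupt_time = min(samples, key=lambda s: s[1] - s[0])
    return cristian_time(t0, t1, server_time, interrupt_time)


# Hypothetical readings in milliseconds: (T0, T1, server time, I).
samples = [(1000, 1020, 5000, 4), (2000, 2050, 6010, 4)]
print(cristian_time(*samples[0]))
print(best_estimate(samples))
```

For the first sample the round trip is 20 ms, of which 4 ms is server handling, so the reply is assumed to be 8 ms old when it arrives.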
18. Berkeley Algorithm
- While the time server in Cristian's algorithm is passive, the Berkeley algorithm takes a different approach: the time daemon polls every machine from time to time to ask what time it is there. From the answers, an average time can be calculated.
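The averaging step can be sketched as follows. The function name and the clock readings are hypothetical, and returning a per-machine adjustment follows the usual description of the Berkeley algorithm rather than anything stated on the slide.

```python
def berkeley_round(readings):
    """One polling round of the time daemon.

    readings maps a machine name to the clock value it reported
    (the daemon's own clock is included as one of the entries).
    Returns (average, adjustments), where adjustments[name] is the
    amount that machine should add to its clock.
    """
    average = sum(readings.values()) / len(readings)
    adjustments = {name: average - value for name, value in readings.items()}
    return average, adjustments


# Hypothetical readings in minutes since midnight (3:00, 3:25 and 2:50).
average, adjustments = berkeley_round({"daemon": 180, "A": 205, "B": 170})
print(average, adjustments)
```

Here the average is 3:05, so the daemon moves forward 5 minutes, A must lose 20 minutes (gradually, since time cannot run backwards) and B gains 15.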
19. Logical Clocks
- All very interesting, but for many purposes the key is for a consensus or agreement about time to be reached.
- It doesn't matter if it's really 10:00 if we all agree that it is 10:02 or even 0101101.
- What's important is that the processes in the Distributed System agree on the ordering in which certain events occur.
- Such clocks are referred to as Logical Clocks, based on relative time.
20. Lamport's Timestamps
- First point: if two processes do not interact, then their clocks do not need to be synchronised; they can operate concurrently without fear of interfering with each other.
- Second (critical) point: it does not matter that two processes share a common notion of what the real current time is. What does matter is that the processes have some agreement on the order in which certain events occur.
- Lamport used these two observations to define the happens-before relation (also often referred to within the context of Lamport's Timestamps).
21. Happens Before
- If A and B are events in the same process, and A occurs before B, then we can state that
- A happens-before B is true.
- A → B
- Equally, if A is the event of a message being sent by one process, and B is the event of the same message being received by another process, then A happens-before B is also true.
- (Note that a message cannot be received before it is sent, since it takes a finite, nonzero amount of time to arrive and, of course, time is not allowed to run backwards.)
22. Happens Before (2)
- Obviously, if A happens-before B and B happens-before C, then it follows that A happens-before C.
- A → B and B → C, so A → C.
- If the happens-before relation holds, deductions about the current clock value on each DS component can then be made.
- It therefore follows that if C(A) is the time on A, then C(A) < C(B) whenever A → B.
23. Happens Before (3)
- Assume three processes are in a DS: A, B and C.
- All have their own physical clocks (which are running at differing rates due to clock skew, etc.).
- A sends a message to B and includes a timestamp.
- If this sending timestamp is less than the time of arrival at B, things are OK, as the happens-before relation still holds (i.e. A happens-before B is true).
- However, if the timestamp is more than the time of arrival at B, things are NOT OK (as A happens-before B is not true, and this cannot be, as the receipt of a message has to occur after it was sent).
24. Happens Before (4)
- The question to ask is
- How can some event that happens-before some other event possibly have occurred at a later time?
- The answer is: it can't!
- So, Lamport's solution is to have the receiving process, in that case, adjust its clock forward to one more than the sending timestamp value. This allows the happens-before relation to hold, and also keeps all the clocks running in a synchronised state. The clocks are all kept in sync relative to each other.
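The adjustment rule above fits in a few lines. This is a sketch under assumptions: the class and method names are illustrative, and the clock is modelled as a simple integer counter that ticks once per local event.

```python
class LamportClock:
    """Logical clock for one process."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """A local event happened: advance the clock by one."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned value is the message timestamp."""
        return self.tick()

    def receive(self, timestamp):
        """On receipt, make sure the local clock is past the timestamp.

        If the local clock is already ahead it just ticks; otherwise it
        jumps forward to one more than the sending timestamp.
        """
        self.time = max(self.time, timestamp) + 1
        return self.time


a, b = LamportClock(), LamportClock()
for _ in range(5):
    a.tick()          # A is busy while B is idle, so A's clock runs ahead
stamp = a.send()      # A's clock is now 6
b.receive(stamp)      # B was at 0, so it jumps to 7
print(stamp, b.time)
```

After the exchange the send event has timestamp 6 and the receive event has timestamp 7, so the clock values agree with happens-before.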
25. Problem
- Updating a replicated database and leaving it in an inconsistent state: Update 1 adds 100 baht to an account, Update 2 calculates and adds 1% interest to the same account. Due to network delays, the updates may not be applied in the same order at every copy of the database!
26. Solution
- A multicast message is sent to all processes in the group, including the sender, together with the sender's timestamp.
- At each process, the received message is added to a local queue, ordered by timestamp.
- Upon receipt of a message, a multicast acknowledgement/timestamp is sent to the group.
- Because the happens-before relation holds, the timestamp of the acknowledgement is always greater than that of the original message.
27. Totally Ordered Multicasting
- Only when a message is at the head of the queue and marked as acknowledged by all the other processes will it be removed from the queue and delivered to a waiting application.
- Lamport's clocks ensure that each message has a unique timestamp, and consequently, the local queue at each process eventually contains the same contents.
- In this way, all messages are delivered/processed in the same order everywhere, and updates can occur in a consistent manner.
28. Totally Ordered Multicasting
- Update 1 is time-stamped and multicast. Added to local queues.
- Update 2 is time-stamped and multicast. Added to local queues.
- Acknowledgements for Update 2 sent/received. Update 2 can now be processed.
- Acknowledgements for Update 1 sent/received. Update 1 can now be processed.
- (Note: all queues are the same, as the timestamps have been used to ensure the happens-before relation holds.)
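The local queue kept at each process might look like the sketch below. It rests on several assumptions that go beyond the slides: the class and method names are hypothetical, ties between equal timestamps are broken by sender id, every group member (the sender included) acknowledges every message, and the network itself is left out.

```python
import heapq


class HoldbackQueue:
    """Local queue kept by one process in the group."""

    def __init__(self, group):
        self.group = set(group)   # ids of every process in the group
        self.heap = []            # entries: (timestamp, sender, payload)
        self.acks = {}            # (timestamp, sender) -> ids that have ACKed
        self.delivered = []       # payloads handed to the application, in order

    def on_message(self, timestamp, sender, payload):
        heapq.heappush(self.heap, (timestamp, sender, payload))
        self.acks.setdefault((timestamp, sender), set())
        self._deliver_ready()

    def on_ack(self, timestamp, sender, acker):
        self.acks.setdefault((timestamp, sender), set()).add(acker)
        self._deliver_ready()

    def _deliver_ready(self):
        # Deliver from the head only, and only once everyone has ACKed it.
        while self.heap:
            timestamp, sender, payload = self.heap[0]
            if self.acks[(timestamp, sender)] != self.group:
                break
            heapq.heappop(self.heap)
            del self.acks[(timestamp, sender)]
            self.delivered.append(payload)


queue = HoldbackQueue(group={1, 2})
queue.on_message(1, 2, "y")    # arrives first, but (1, 2) sorts after (1, 1)
queue.on_message(1, 1, "x")
queue.on_ack(1, 2, 1)
queue.on_ack(1, 2, 2)          # "y" is fully ACKed but is not at the head
queue.on_ack(1, 1, 1)
queue.on_ack(1, 1, 2)          # "x" is released, and "y" follows it
print(queue.delivered)
```

Because every process sorts by the same (timestamp, sender) key and releases only the head of its queue, arrival order does not affect delivery order.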
29. Global State
- Related to synchronisation is the concept of a global state
- The state of the entire distributed system
- It may be useful to know the entire global state of a system
- For instance when performing garbage collection as introduced last week.
- Or to determine if a system has hung or completed correctly.
- A way of determining global state is to take a distributed snapshot of the system.
30. Distributed Snapshots
- To take a snapshot, we need to know the state of each processor within the system.
- So, any process can request a snapshot, by asking for the current state of the other processors.
- Clearly a snapshot containing a message that has been sent but not yet received is acceptable, but one containing a message received but not yet sent should not be possible.
- Logical clocks can assist with this problem.
31. Election Algorithms
- Many Distributed Systems require a process to act as coordinator (for various reasons). The selection of this process can be performed automatically by an election algorithm.
- For simplicity, we assume the following
- Processes each have a unique, positive identifier.
- All processes know all other process identifiers.
- The process with the highest valued identifier is duly elected coordinator.
- When an election concludes, a coordinator has been chosen and is known to all processes.
32. Why Elections?
- The overriding goal of all election algorithms is to have all the processes in a group agree on a coordinator.
- There are two types of algorithm
- Bully: the biggest guy in town wins.
- Ring: a logical, cyclic grouping.
33. Bully Elections
- When a process notices that the current coordinator is no longer responding (4 deduces that 7 is down), it sends out an ELECTION message to every higher numbered process.
- If none respond, it (i.e. 4) becomes the coordinator (sending out a COORDINATOR message to all other processes informing them of this change of coordinator).
- If a higher numbered process responds to the ELECTION message with an OK message, the initiator's election is cancelled and the higher-up process starts its own election (5 and 6 in this example both start, with 6 eventually winning).
- When the original coordinator (i.e. 7) comes back on-line, it simply sends out a COORDINATOR message, as it is the highest numbered process (and it knows it).
- Simply put: the process with the highest numbered identifier bullies all others into submission.
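The outcome of the steps above can be traced with a small simulation. It is a sketch only: the function name is hypothetical, and message passing is replaced by a set of the process ids that currently respond.

```python
def bully_election(initiator, alive):
    """Return the id of the coordinator chosen when `initiator` starts
    an election; `alive` is the set of ids that currently respond."""
    # ELECTION goes to every higher numbered process; only live ones reply OK.
    responders = [pid for pid in alive if pid > initiator]
    if not responders:
        # Nobody higher answered, so the initiator becomes coordinator.
        return initiator
    # Each responder takes over and holds its own election in turn.
    return max(bully_election(pid, alive) for pid in responders)


# Process 7 (the old coordinator) is down and process 4 notices.
print(bully_election(4, alive={0, 1, 2, 3, 4, 5, 6}))
```

With 7 down, the elections started by 5 and 6 end with 6 as coordinator, matching the example in the text.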
34. Ring Elections
- The processes are ordered in a logical ring, with each process knowing the identifier of its successor (and the identifiers of all the other processes in the ring).
- When a process notices that a coordinator is down, it creates an ELECTION message (which contains its own number) and starts to circulate the message around the ring.
- Each process puts itself forward as a candidate for election by adding its number to this message (assuming it has a higher numbered identifier).
- Eventually, the original process receives its original message back (having circled the ring), determines who the new coordinator is, then circulates a COORDINATOR message with the result to every process in the ring.
- With the election over, all processes can get back to work.
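One circulation of the ELECTION message can be sketched as below. The function name, the ring layout and the set of live processes are all hypothetical; a dead process is simply skipped, and a process adds its number only when it is higher than the last one in the message, as the text describes.

```python
def ring_election(ring, initiator, alive):
    """Circulate an ELECTION message once around the ring.

    ring:  process ids in ring order
    alive: set of ids that currently respond (dead ones are skipped)
    Returns (coordinator, message), where message lists the ids added
    to the ELECTION message on its way round.
    """
    if initiator not in alive:
        raise ValueError("the initiator must be a live process")
    message = [initiator]
    start = ring.index(initiator)
    for step in range(1, len(ring)):
        pid = ring[(start + step) % len(ring)]
        if pid in alive and pid > message[-1]:
            message.append(pid)   # a stronger candidate joins the message
    # Back at the initiator: the last id added is the highest one seen.
    return message[-1], message


ring = [3, 6, 1, 5, 2, 4, 7]
print(ring_election(ring, initiator=2, alive={1, 2, 3, 4, 5, 6}))
```

In this run process 7 is down, so the message grows to 2, 4, 6 and process 6 is announced as the new coordinator.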
35. Mutual Exclusion in Distributed Systems
- It is often necessary to protect a shared resource within a Distributed System using mutual exclusion; for example, it might be necessary to ensure that no other process changes a shared resource while another process is working with it.
- In non-distributed, uniprocessor systems, we can implement critical regions using techniques such as semaphores, monitors and similar constructs, thus achieving mutual exclusion.
36. Distributed Mutual Exclusion Techniques
- Centralised: a single coordinator controls whether a process can enter a critical region.
- Distributed: the group confers to determine whether or not it is safe for a process to enter a critical region.
- Token Ring: all processes are given turns at entering the critical region.
37. Centralised Algorithm
- Process 1 asks the coordinator for permission to enter a critical region. Permission is granted by an OK message (assuming it is, of course, OK).
- Process 2 then asks permission to enter the same critical region. The coordinator does not reply (but adds 2 to a queue of processes waiting to enter the critical region). No reply is interpreted as a busy state for the critical region.
- When process 1 exits the critical region, it tells the coordinator, which then replies to 2 with an OK message.
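The coordinator's bookkeeping in this exchange is small enough to sketch directly. The class and method names are hypothetical, and a return value of `None` stands in for "no reply sent".

```python
from collections import deque


class Coordinator:
    """Grants access to one critical region, one process at a time."""

    def __init__(self):
        self.holder = None        # process currently in the critical region
        self.waiting = deque()    # processes that asked while it was busy

    def request(self, pid):
        """Return "OK" if permission is granted, or None (no reply)."""
        if self.holder is None:
            self.holder = pid
            return "OK"
        self.waiting.append(pid)  # stay silent; the requester blocks
        return None

    def release(self, pid):
        """The holder has left the region. Return the id of the process
        that now receives OK, or None if nobody is waiting."""
        if pid != self.holder:
            raise ValueError("only the current holder can release")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


coordinator = Coordinator()
print(coordinator.request(1))   # process 1 is let in
print(coordinator.request(2))   # process 2 gets no reply and is queued
print(coordinator.release(1))   # process 1 leaves; OK goes to process 2
```

Requests are served in arrival order, which is why the scheme is fair and starvation-free as long as the coordinator stays up.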
38. Centralised Algorithm
- Advantages
- It works.
- It is fair.
- There's no process starvation.
- Easy to implement.
- Disadvantages
- There's a single point of failure!
- The coordinator is a bottleneck on busy systems.
- Critical Question: when there is no reply, does this mean that the coordinator is dead or just busy?
39. Distributed Algorithm
- When a process (the requesting process) decides to enter a critical region, a timestamped message is sent to all processes in the Distributed System (including itself).
- What happens at each process depends on the state of the critical region.
- If not in the critical region (and not waiting to enter it), a process sends back an OK to the requesting process.
- If in the critical region, a process will queue the request and send back no reply to the requesting process.
- If waiting to enter the critical region, a process will
- Compare the timestamp of the new message with that of its own request (note that the lowest timestamp wins).
- If the received timestamp wins, an OK is sent back, otherwise the request is queued (and no reply is sent back).
- When all the processes send OK, the requesting process can safely enter the critical region.
- When the requesting process leaves the critical region, it sends an OK to all the processes in its queue, then empties its queue.
40. Distributed Algorithm
- Process 0 and 2 wish to enter the critical region at the same time.
- Process 0 wins as its timestamp is lower than that of process 2.
- When process 0 leaves the critical region, it sends an OK to 2.
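The per-process decision rule can be sketched as follows. The names are hypothetical, the timestamps 8 and 12 are made up for the run, ties are broken by process id (an assumption), and sending the messages is left out so that only the reply logic remains.

```python
RELEASED, WANTED, HELD = "released", "wanted", "held"


class Process:
    def __init__(self, pid):
        self.pid = pid
        self.state = RELEASED
        self.own_request = None   # (timestamp, pid) of our pending request
        self.deferred = []        # requesters we still owe an OK

    def want(self, timestamp):
        """Record that we have asked to enter the critical region."""
        self.state = WANTED
        self.own_request = (timestamp, self.pid)

    def on_request(self, timestamp, sender):
        """Return True if OK is sent back now, False if the request is queued."""
        if self.state == HELD:
            self.deferred.append(sender)
            return False
        if self.state == WANTED and self.own_request < (timestamp, sender):
            self.deferred.append(sender)   # our own request is older: we win
            return False
        return True

    def enter(self):
        self.state = HELD

    def leave(self):
        """Exit the region; return the processes that are now sent OK."""
        self.state, self.own_request = RELEASED, None
        released, self.deferred = self.deferred, []
        return released


p0, p1, p2 = Process(0), Process(1), Process(2)
p0.want(8)
p2.want(12)
print(p1.on_request(8, 0), p2.on_request(8, 0))    # both send OK to 0
print(p1.on_request(12, 2), p0.on_request(12, 2))  # 1 sends OK, 0 queues 2
p0.enter()
print(p0.leave())                                   # 0 now sends OK to 2
```

Process 0 collects every OK first because its timestamp is lower, and process 2 only gets its last OK when process 0 leaves.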
41. Distributed Algorithm
- The algorithm works because in the case of a conflict, the lowest timestamp wins, as everyone agrees on the total ordering of the events in the distributed system.
- Advantages
- It works.
- There is no single point of failure.
- Disadvantages
- We now have multiple points of failure!!!
- A crash is interpreted as a denial of entry to a critical region.
- (A patch to the algorithm requires all messages to be ACKed.)
- Worse is that all processes must maintain a list of the current processes in the group (and this can be tricky).
- Worse still is that one overworked process in the system can become a bottleneck to the entire system, so everyone slows down.
42. Distributed?
- It isn't always best to implement a distributed algorithm when a reasonably good centralised solution exists.
- What's good in theory (or on paper) may not be so good in practice.
- Think of all the message traffic this distributed algorithm is generating (especially with all those ACKs). Remember: every process is involved in the decision to enter the critical region, whether they have an interest in it or not.
43. Token Ring Algorithm
- An unordered group of processes on a network. Note that each process knows the process that is next in order on the ring after itself.
- A logical ring is constructed in software, around which a token can circulate; a critical region can only be entered when the token is held. When the critical region is exited, the token is released.
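A toy trace of one circulation of the token, assuming a hypothetical ring order and a hypothetical set of processes that want the critical region. Token loss and process failure are ignored.

```python
def token_ring_order(ring, wants):
    """Pass the token once around the ring, starting at ring[0].

    ring:  process ids in logical ring order
    wants: set of ids that wish to enter the critical region
    Returns the order in which processes enter the region.
    """
    entered = []
    for holder in ring:              # the token visits processes in ring order
        if holder in wants:
            entered.append(holder)   # holder enters, does its work, exits
        # either way, the token is then passed to the successor
    return entered


print(token_ring_order([0, 2, 4, 9, 7, 1, 6, 5, 8, 3], wants={7, 2}))
```

Entry order follows the position on the ring rather than the time of the request, and a process that wants nothing simply passes the token on.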
44. Token Ring Algorithm
- Advantages
- It works (as there's only one token, so mutual exclusion is guaranteed).
- It's fair: everyone gets a shot at grabbing the token at some stage.
- Disadvantages
- Lost token! How is the loss detected (is it in use or is it lost)? How is the token regenerated?
- Process failure can cause problems: a broken ring!
- Every process is required to maintain the current logical ring in memory: not easy.
45. Comparison
- None are perfect: they all have their problems!
- The Centralised algorithm is simple and efficient, but suffers from a single point-of-failure.
- The Distributed algorithm has nothing going for it: it is slow, complicated, inefficient of network bandwidth, and not very robust. It sucks!
- The Token-Ring algorithm suffers from the fact that it can sometimes take a long time to reenter a critical region having just exited it.
- All perform poorly when a process crashes, and they are all generally poorer technologies than their non-distributed counterparts. Only in situations where crashes are very infrequent should any of these techniques be considered.
46. Distributed Transactions
- Related to Mutual Exclusion, which protects a shared resource.
- Transactions protect shared data.
- Often, a single transaction contains a collection of data accesses/modifications.
- The collection is treated as an atomic operation: either all the operations in the collection complete, or none of them do.
- Mechanisms exist for the system to revert to a previously good state whenever a transaction prematurely aborts.
47. Transaction Primitives
48. Grouping Transactions
- Suppose a transaction has 2 steps
- Step 1: Withdraw amount a from account 1.
- Step 2: Deposit amount a into account 2.
- What happens if the connection is broken after the first step, but before the second?
- We want either neither or both actions to be processed.
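The all-or-nothing behaviour can be illustrated with an in-memory sketch. Everything here is an assumption made for illustration: the accounts live in a plain dictionary, the function name is hypothetical, and "reverting to a previously good state" is done by restoring a saved copy rather than by any real transaction mechanism.

```python
def transfer(accounts, source, target, amount):
    """Move `amount` between accounts, or leave everything untouched."""
    saved = dict(accounts)            # the previously good state
    try:
        if accounts[source] < amount:
            raise ValueError("insufficient funds")
        accounts[source] -= amount    # Step 1: withdraw
        accounts[target] += amount    # Step 2: deposit (fails if target is unknown)
    except Exception:
        accounts.clear()
        accounts.update(saved)        # abort: undo any partial work
        raise


accounts = {"1": 100, "2": 50}
transfer(accounts, "1", "2", 30)
print(accounts)

try:
    transfer(accounts, "1", "3", 10)  # account "3" does not exist
except KeyError:
    print("aborted:", accounts)
```

The second transfer fails after the withdrawal has already happened, yet the balances afterwards are the same as before it started.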
49. Example
In (a) the process completes. In (b), the final flight is full, so it's likely that we won't want to reserve flights from Chiang Mai to Bangkok, or Bangkok to Nairobi, if we can't book the final connection.
50. Transaction ACID
- Four key transaction characteristics
- Atomic: the transaction is considered to be one thing, even though it may be made up of many different parts.
- Consistent: invariants that held before the transaction must also hold after its successful execution.
- Isolated: if multiple transactions run at the same time, they must not interfere with each other. To the system, it should look like the two (or more) transactions are executed sequentially (i.e., that they are serializable).
- Durable: once a transaction commits, any changes are permanent.
51. Transaction Types
- Flat Transaction: this is the model that we have looked at so far. Disadvantage: it's too rigid, as partial results cannot be committed, i.e., the atomic nature of Flat Transactions can be a downside.
- Nested Transaction: a main, parent transaction spawns child sub-transactions to do the real work. Disadvantage: problems result when a sub-transaction commits and then the parent aborts the main transaction.
- Distributed Transaction: this is sub-transactions operating on distributed data stores. Disadvantage: complex mechanisms are required to lock the distributed data, as well as commit the entire transaction.
52. Nested / Distributed Transactions
- A nested transaction: logically decomposed into a hierarchy of sub-transactions.
- A distributed transaction: logically a flat, indivisible transaction that operates on distributed data.
53. Summary
- Synchronisation: doing the right thing at the right time.
- No real notion of a globally shared physical clock.
- Clock-synchronisation algorithms exist to try and provide such a facility.
- Often not necessary to know the time, more important to be sure that things happen in the correct order. This leads to the notion of logical clocks (Lamport's Timestamps).
- Synchronisation also achieved by one process acting as coordinator. Dynamic election algorithms have been developed to allow Distributed Systems to automatically select the coordinator.
- Distributed Mutual Exclusion is possible (in theory), but performs poorly on real networks.
- Related to Mutual Exclusion is the notion of a Transaction, of which there are three main types: flat, nested and distributed. Important characteristics: A.C.I.D., i.e., Atomic, Consistent, Isolated and Durable.