Coordination

Slides: 67

Provided by: baobao3

1
Chapter 8

Coordination

2
Topics

Election algorithms
Mutual exclusion
Deadlock
Transactions

3
Election Algorithms

Election algorithms are how the nodes in a distributed
system (DS) elect a new coordinator when the old one
has failed or been cut off from the network.
In the following algorithms, each processor
(node) has a unique ID. Communications are
reliable (messages are not dropped or corrupted).

4
Requirements

Safety: each process Pi has coordinator = null or
coordinator = P, where P is the live process with
the highest ID.
Liveness: each process Pi eventually has
coordinator ≠ null, or it has failed.

5
The Bully Algorithm

(Garcia-Molina) Node with highest ID bullies his
way into leadership.
When a process P notices that the coordinator
has failed, it holds an election:
1. P sends an ELECTION (E-message) to all
processes with higher numbers.
2. If no one responds, P wins the election and
becomes coordinator.
3. If one of the higher-ups answers, say Q, it
takes over. P's job is done.
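The three steps above can be simulated in a few lines to see who wins and how many messages an election costs. This is a minimal sketch, assuming reliable messages, that every node knows all IDs, that a live node always answers an ELECTION with an OK, and that each node holds its own election at most once; the function name `bully_election` and the message accounting are illustrative choices, not part of the original algorithm description.

```python
from collections import deque


def bully_election(ids, alive, starter):
    """Simulate one Bully election; return (coordinator, messages_sent).

    ids     -- all process IDs, including crashed ones
    alive   -- set of IDs that are still running
    starter -- the live process that noticed the coordinator failure
    """
    messages = 0
    started = set()              # processes that have already held an election
    pending = deque([starter])   # processes that still have to hold one
    winner = None

    while pending:
        p = pending.popleft()
        if p in started:
            continue
        started.add(p)
        higher = [q for q in ids if q > p]
        messages += len(higher)                      # step 1: ELECTION to every higher ID
        responders = [q for q in higher if q in alive]
        messages += len(responders)                  # each live higher-up answers OK
        if not responders:
            winner = p                               # step 2: nobody higher answered
        pending.extend(responders)                   # step 3: each responder takes over

    messages += len(alive) - 1                       # winner tells every other live node
    return winner, messages
```

Assuming eight processes 0 to 7 with the crashed coordinator being 7, and process 4 noticing the failure first, `bully_election(range(8), set(range(7)), 4)` returns `(6, 15)`: process 6 wins, as in the example that follows.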

6
An Example

Process 4 holds an election
Processes 5 and 6 respond, telling 4 to stop
Now 5 and 6 each hold an election

7
An Example (Cont.)

Process 6 tells 5 to stop
Process 6 wins and tells everyone

8
The Cost

In a network of N nodes, assume the coordinator
with ID N fails.
If the process with ID (N-1) starts an election,
the cost is O(N) messages.
If the lowest-numbered node starts an election,
the cost is O(N^2) messages.
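Counting messages explicitly makes the two bounds above concrete. The accounting is an illustrative assumption, not from the slide: every live node that receives an ELECTION holds its own election exactly once, each live higher node answers with one OK, and the winner sends one COORDINATOR message to each of the other N-2 live nodes.

```latex
% Best case: node N-1 starts. One ELECTION (to the dead node N), no OK,
% then N-2 COORDINATOR messages.
M_{\text{best}} = 1 + 0 + (N-2) = N - 1 = O(N)

% Worst case: node 1 starts, so nodes 1, ..., N-1 each hold an election.
M_{\text{worst}}
  = \underbrace{\sum_{i=1}^{N-1} (N-i)}_{\text{ELECTION}}
  + \underbrace{\sum_{i=1}^{N-1} (N-1-i)}_{\text{OK}}
  + \underbrace{(N-2)}_{\text{COORDINATOR}}
  = \frac{N(N-1)}{2} + \frac{(N-1)(N-2)}{2} + (N-2)
  = N^2 - N - 1 = O(N^2)
```

For N = 10 this gives 9 messages in the best case and 89 in the worst case.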

9
A Ring Election Algorithm

Nodes are physically or logically organized in a
ring.
Nodes know their successors.
Node states are Normal, Election, Leader.
Any node that notices that the leader is not
functioning changes its state to Election,
starts an election message containing its ID, and
sends it to its clockwise neighbor.

10
An Example
11
A Ring Election Algorithm (2)

When a node receives an election message:
It adds its ID to the message and sends it to its
successor.
If the message already contains its own ID, it instead sends a
COORDINATOR message, which names the list
member with the highest ID as the
coordinator. This message circulates once.
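The two rules above can be traced with a short simulation of a single election. This is a minimal sketch with a hypothetical function name; the slide says only that nodes know their successors, so skipping crashed successors (as if a node could detect them and forward to the next live node) is our assumption. With one initiator it uses N messages for the ELECTION round and N for the COORDINATOR round.

```python
def ring_election(ring, alive, starter):
    """Simulate the ring election started by a single node.

    ring    -- all node IDs in clockwise order
    alive   -- set of IDs that are still running
    starter -- the node that noticed the leader failure
    Returns (coordinator, messages_sent).
    """
    live = [p for p in ring if p in alive]   # crashed nodes are skipped (assumption)
    n = len(live)
    pos = live.index(starter)

    # ELECTION round: each node appends its ID and forwards the message.
    members = [starter]
    messages = 1                             # starter -> its successor
    nxt = (pos + 1) % n
    while live[nxt] != starter:
        members.append(live[nxt])            # node adds its own ID ...
        messages += 1                        # ... and forwards to its successor
        nxt = (nxt + 1) % n

    # The message is back at the starter: announce the highest ID.
    coordinator = max(members)
    messages += n                            # COORDINATOR circulates once
    return coordinator, messages
```

For a ring of nodes 0 to 7 where 7 has crashed and node 2 starts, the call `ring_election(list(range(8)), set(range(7)), 2)` elects 6 with 14 messages, i.e. 2N for the N = 7 live nodes.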

12
An Example
13
An Example (Cont.)
14
An Example (Cont.)
15
Complexity

In the best case, only one node starts an
election message, so the number of messages is
2N.
In the worst case, all N nodes start an election
message, resulting in O(N^2) messages.
Improvements:
Drop election messages arriving in less than time
T, where T is the time a message takes to
traverse the ring.
Does it work?

16
LCR Ring Election

Each node sends a message with its ID around the
ring. When a process receives an incoming
message, it compares the ID with its own. If the
incoming ID is greater than its own, it passes it
to the next node; if it is less than its own, it
discards it; if it is equal to its own, it
declares itself leader.
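A round-by-round simulation makes these three rules concrete and shows how the arrangement of IDs decides between the best and the worst case. Minimal sketch, assuming synchronous rounds, unique IDs, and that `ring[i]` always sends clockwise to `ring[(i + 1) % n]`; the function name and the message counting are our own.

```python
def lcr_election(ring):
    """Run LCR on a unidirectional ring; return (leader, messages_sent).

    ring[i] sends to ring[(i + 1) % len(ring)]; IDs must be unique.
    """
    n = len(ring)
    in_flight = dict(enumerate(ring))   # sender index -> ID it sends this round
    messages, leader = 0, None
    while leader is None:
        messages += len(in_flight)      # every in-flight ID is one message
        next_round = {}
        for i, uid in in_flight.items():
            j = (i + 1) % n             # receiver index
            if uid > ring[j]:
                next_round[j] = uid     # larger ID: pass it on
            elif uid == ring[j]:
                leader = uid            # own ID came back: declare leader
            # smaller ID: discard it
        in_flight = next_round
    return leader, messages
```

`lcr_election([0, 1, 2, 3, 4])` returns `(4, 9)`: every ID except the largest dies after one hop, giving 2N - 1 messages. `lcr_election([4, 3, 2, 1, 0])` returns `(4, 15)`: each ID travels until it meets a larger one, giving N(N+1)/2 messages.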

17
Complexity

If messages are passed clockwise, only one
survives after the first round.
If messages are passed counter-clockwise...
Best case O(N), worst case O(N^2).

18
HS (Hirschberg Sinclair) Ring Election (1)

Motivation: O(N^2) is a lot of messages. Improve
it to O(N log N).
Assumptions: the ring size can be unknown. The
communications must be bidirectional. All nodes
start more or less at the same time. Each node
operates in phases and sends out tokens. The
tokens carry hop counts and direction flags in
addition to the ID of the sender.

19
HS Ring Election (2)

Phases are numbered 0, 1, 2, 3, ..., ⌈log2 N⌉. In
each phase k, node j sends out tokens u_j
containing its ID in both directions.
The tokens travel only 2^k hops, then return to their
origin j.
If both tokens make it back, process j continues
with the next phase (increments k). If either
token fails to make it back, process j simply
waits to be told the result of the election.

20
HS Ring Election (3)

All processes always relay inbound tokens.
If a process i receives a token u_j going in the
outbound direction, it compares the token's ID
with its own.
If its own ID is larger, it simply discards the
token.
If its own ID is smaller, it relays the token as
requested.
If its own ID equals the token's ID, it has received
its own token in the outbound direction, so the
token has gone all the way around the ring and the
process declares itself leader.
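The rules on the last two slides combine into an event-driven simulation. Minimal sketch, assuming unique IDs, reliable links and that every node starts phase 0 at once; it processes messages one at a time from a single queue, and the leader's final announcement is not modeled. The token layout `(uid, kind, hops_left)` and the function name are our own choices.

```python
from collections import deque


def hs_election(ring):
    """Simulate Hirschberg-Sinclair on a bidirectional ring.

    ring[i] has neighbors ring[i - 1] and ring[i + 1] (indices mod n).
    Returns (leader, messages_sent).
    """
    n = len(ring)
    phase = [0] * n        # current phase k of each node
    returned = [0] * n     # how many of its tokens came back this phase
    queue = deque()        # (receiver index, direction of travel, token)
    messages = 0
    leader = None

    def send(sender, direction, token):
        nonlocal messages
        messages += 1
        queue.append(((sender + direction) % n, direction, token))

    def start_phase(i):
        returned[i] = 0
        for direction in (+1, -1):                  # one token each way
            send(i, direction, (ring[i], "out", 2 ** phase[i]))

    for i in range(n):
        start_phase(i)

    while queue:
        j, d, (uid, kind, hops_left) = queue.popleft()
        own = ring[j]
        if kind == "out":
            if uid == own:
                leader = own                        # own outbound token went round the ring
            elif uid > own and hops_left > 1:
                send(j, d, (uid, "out", hops_left - 1))   # relay outward
            elif uid > own:
                send(j, -d, (uid, "in", 0))         # 2^k hops done: turn it around
            # uid < own: discard the token
        elif uid != own:
            send(j, d, (uid, "in", 0))              # always relay inbound tokens
        else:
            returned[j] += 1
            if returned[j] == 2:                    # both tokens made it back
                phase[j] += 1
                start_phase(j)
    return leader, messages
```

For example, `hs_election([3, 1, 4, 0, 2])` elects 4; in general the node with the largest ID is the only one whose outbound token survives a full trip around the ring.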

21
Complexity

Communication complexity: in the first phase,
every process sends out 2 tokens, and they go one
hop and return. This is a total of at most 4N messages
for the tokens to go out and return.
In phase k, where k > 0, a node sends out tokens only if
it was not overruled in the previous phase, that
is, by a process within a distance of 2^(k-1) in
either direction. This implies that within any group
of 2^(k-1) + 1 consecutive nodes, at most one goes on
to send out tokens in phase k.
This limits the message complexity to O(N log N).
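The argument above can be written out as an explicit bound. This follows the usual textbook accounting; the constant 8 depends on counting every outbound and inbound hop as one message.

```latex
% Phase 0: each of the N nodes sends 2 tokens, 1 hop out and 1 hop back.
M_0 \le 4N

% Phase k >= 1: at most N / (2^{k-1} + 1) nodes are still active, and each
% sends 2 tokens that travel at most 2^k hops out and 2^k hops back.
M_k \le 4 \cdot 2^k \cdot \frac{N}{2^{k-1} + 1} < 8N

% There are at most 1 + \lceil \log_2 N \rceil phases, so
M \le 4N + 8N \lceil \log_2 N \rceil \le 8N \left(1 + \lceil \log_2 N \rceil\right) = O(N \log N)
```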

22
Mutual Exclusion in DS

Mutual exclusion is needed for restricting access
to a shared resource.
We use semaphores, monitors and similar
constructs to enforce mutual exclusion on a
centralized system.
We need the same capabilities in a DS.
As in the one processor case, we are interested
in safety (mutual exclusion), progress, and
bounded waiting (fairness).

23
Solutions

Centralized lock manager
Token-passing lock manager
Distributed lock manager
Ricart/Agrawala Algorithm
Voting
Quorum

24
A Centralized Algorithm
a) Process 1 asks the coordinator for permission
to enter a critical region. Permission is
granted.
b) Process 2 then asks permission to
enter the same critical region. The coordinator
does not reply.
c) When process 1 exits the
critical region, it tells the coordinator, which
then replies to 2.
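The coordinator's behavior in a) to c) fits in a small class. Minimal sketch for a single critical region, assuming reliable messages; the class and method names are hypothetical, a `True` return stands for the grant reply, and `False` means the coordinator simply does not reply yet.

```python
from collections import deque


class LockCoordinator:
    """Centralized lock manager for a single critical region (sketch)."""

    def __init__(self):
        self.holder = None        # process currently in the critical region
        self.waiting = deque()    # FIFO of processes that got no reply yet

    def request(self, pid):
        """Return True if permission is granted now, False if pid must wait."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)  # no reply: the requester stays blocked
        return False

    def release(self, pid):
        """pid leaves the region; return the process granted next (or None)."""
        if pid != self.holder:
            raise ValueError(f"process {pid} does not hold the lock")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

Replaying the scenario: `request(1)` returns `True`, `request(2)` returns `False`, and `release(1)` returns `2`, the process that is now allowed in. The FIFO queue also gives bounded waiting for free.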
25
Problems with Centralized Locking?
Other issues?
26
The Token Ring Algorithm

Assumptions: processes are ordered in a ring.
Communications are reliable and can be limited to
one direction.
The size of the ring can be unknown, and each process is
only required to know its immediate neighbor.
A single token circulates around the ring (in one
direction only).

27
Algorithm Details

When a process has the token, it can enter the CR
at most once. Then it must pass the token on.
Only the process with the token can enter the CR,
thus mutual exclusion is ensured.
Bounded waiting: ensured, since the token circulates.
Liveness: as long as the process holding the token
does not fail, progress is ensured. Global
snapshots can be used if a lost token is
suspected.
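The "at most once per visit" rule and bounded waiting can be seen in a tiny simulation. Sketch under stated assumptions: processes are numbered 0 to n-1, the token goes from i to (i + 1) mod n, and the `wanted` counts are hypothetical input. A real token keeps circulating even when nobody wants the CR; this simulation stops once every request has been served.

```python
def token_ring(n, wanted, start=0):
    """Pass a token around processes 0..n-1 until all CR requests are served.

    wanted -- dict pid -> number of CR entries the process still wants
    Returns (entry_order, token_passes).
    """
    remaining = dict(wanted)
    holder, passes, order = start, 0, []
    while any(remaining.values()):
        if remaining.get(holder, 0) > 0:
            order.append(holder)        # enter the CR at most once per visit
            remaining[holder] -= 1
        holder = (holder + 1) % n       # then pass the token to the successor
        passes += 1
    return order, passes
```

With `token_ring(4, {1: 2, 3: 1})`, process 1 wants the CR twice but must wait for the token to come round again, so process 3 gets in between: the result is `([1, 3, 1], 6)`.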

28
Problems with Token-Algorithm

1. How can we tell whether the token is lost or just
being held for a very long time?
2. What happens if the token holder crashes for some
time?
3. How do we maintain a logical ring if a
participant drops out of the system (voluntarily or
by failure)?
4. How do we identify and add new participants?
5. The token is passed around the ring perpetually, even
when none of the participants wants to enter its
CS: unnecessary overhead that consumes bandwidth.
6. The ring imposes an average delay of N/2 hops,
limiting scalability.

29
Distributed Algorithm: Ricart and Agrawala
Timestamp Algorithm

Assumption: there is a total ordering of all
events in the system (Lamport's timestamps will
provide this).
Communications are reliable.
Each process must maintain a queue for each
critical region or resource if there is more than
one resource to be shared.

30
Ricart and Agrawala (2)

When a process wants to enter the Critical Region
or obtain a resource, it sends a message with its
ID and a Lamport timestamp (t, pid) to all other
processes.
It can proceed to enter the CR when it gets an
OK message from all other processes.
When it is done with the CR, it sends an OK
message to every process on its wait queue and
removes them from the queue.

31
Ricart and Agrawala (3)

When a process P1 receives a request for the
resource from process P2:
If P1 is not in the CR and does not want the CR,
it sends back an OK message.
If P1 is currently in the CR, it does not reply,
but queues P2's request.
If P1 wants to enter the CR but has not yet
received all the permissions, it compares the
timestamp in P2's message with the one in the
message that P1 sent out to request the CR. The
lowest timestamp wins.
If TS(P1) < TS(P2), then P2's message is put on
the queue.
If TS(P1) > TS(P2), then P1 sends P2 an OK
message.
message.

32
Ricart and Agrawala (4)

Two processes want to enter the same critical
region at the same moment.
Process 0 has the lowest timestamp, so it wins.
When process 0 is done, it sends an OK also, so 2
can now enter the critical region.

33
Analysis

No tokens anymore
Cooperative voting to determine sequence of CSs
Does not rely on an interconnection medium
offering ordered messages
Serialization based on logical timestamps
(total ordering)
If a participant wants to enter its CS, it asks
all others for permission and does not proceed
until all others have agreed
If a participant gets a permission request and is
not interested in its CS, it returns permission
immediately to the requester.
Message complexity: 2(N-1) messages per CS entry.
The algorithm ensures
mutual exclusion (no two processes can both have the lowest timestamp)
progress (some process always has the lowest timestamp)
bounded waiting

34
Voting for Mutual Exclusion

Potential problems: you must be sure you have
more votes than any other process to enter the
CR. If P1 has 4, P2 has 3 and P3 has 2, P1 has
the most votes, but how does it know that without
communicating (costly) with the other contenders?
Just having 4 votes is not enough: what if P1 has
4 and P2 has 5?
Potential solution: require a simple majority to
win. But 4 is not a majority of 9, so in this
example, no one can go. Worse, the processes are
deadlocked.
There must be a way to resolve this kind of deadlock.

35
Timestamp Resolution

When a process makes a request, it attaches a
Lamport timestamp. Voters prefer candidates
with the smaller timestamp.
If voter V has voted for P1 and then receives a
request for a vote from P2 with an earlier
timestamp, V will try to retrieve its vote. V
retrieves its vote by sending an INQUIRE message
to P1. If P1 has not yet received all the votes it
needs, it must relinquish V's vote, in which
case V gives its vote to P2. This avoids
deadlock.
When P1 is finished with the CR, it sends
release messages to all its voters, so they can
give their votes to new candidates.
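The voter's side of this protocol can be sketched as a small state machine; the candidate's side (counting votes, deciding whether to give a vote back after an INQUIRE) is not shown. Assumptions: requests are `(timestamp, pid)` tuples, the returned `("VOTE", ...)` and `("INQUIRE", ...)` tuples stand for outgoing messages, and `on_relinquish` models the candidate handing the vote back. All names are hypothetical.

```python
import heapq


class Voter:
    """A voter that hands out one vote at a time, preferring smaller timestamps."""

    def __init__(self):
        self.granted_to = None   # (timestamp, pid) currently holding our vote
        self.waiting = []        # min-heap of (timestamp, pid) requests

    def on_request(self, req):
        if self.granted_to is None:
            self.granted_to = req
            return ("VOTE", req)
        heapq.heappush(self.waiting, req)
        if req < self.granted_to:
            return ("INQUIRE", self.granted_to)   # try to retrieve our vote
        return None

    def on_relinquish(self):
        """The holder gave our vote back: pass it to the earliest request."""
        heapq.heappush(self.waiting, self.granted_to)
        self.granted_to = heapq.heappop(self.waiting)
        return ("VOTE", self.granted_to)

    def on_release(self):
        """The holder left the CR: vote for the next candidate, if any."""
        self.granted_to = heapq.heappop(self.waiting) if self.waiting else None
        return None if self.granted_to is None else ("VOTE", self.granted_to)
```

If P1's request `(5, 1)` arrives first and P2's earlier `(3, 2)` arrives next, the voter answers P2's request with an INQUIRE to P1. Once P1 relinquishes, the vote moves to P2, and P1 gets it back when P2 releases.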

36
Anti-quorum Resolution

An anti-quorum is any set of nodes that has a
non-empty intersection with all quorums.
A voter votes YES to one process and NO to other
processes seeking the same resource.
When a process gets a quorum of YES votes, it proceeds
to the CR. When it gets an anti-quorum of NO
votes, it knows it will not get enough YES votes,
so it withdraws its candidacy and releases its
votes.
After waiting a specified time, it tries again to
gain enough votes.

37
Quorums

Do we need to get a majority of votes or is there
some smaller set of votes that will do?
Different nodes could have different voting
districts as long as any two districts have a
non-empty intersection.
Quorums have the property that any 2 have a
non-empty intersection.
Simple majorities are quorums. Any 2 sets whose
sizes are simple majorities must have at least
one element in common.

38
Quorums (2)

Grid quorum: arrange the nodes in a logical grid
(square). A quorum is all of a row and all of a
column. Quorum size is 2√N - 1.
Finite Projective Plane (Maekawa): if N = 7, form
quorums of size 3.
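Both constructions can be checked mechanically for the defining quorum property that any two quorums intersect. Sketch only: node numbering, the function names, and the use of the difference set {i, i+1, i+3} mod 7 to generate the seven lines of the Fano plane are our own choices, not taken from the slides.

```python
from itertools import combinations
from math import isqrt


def grid_quorums(n):
    """All row-plus-column quorums for n nodes in a sqrt(n) x sqrt(n) grid."""
    k = isqrt(n)
    if k * k != n:
        raise ValueError("grid quorums need a perfect-square number of nodes")
    return [
        {r * k + j for j in range(k)} | {i * k + c for i in range(k)}  # row r + column c
        for r in range(k)
        for c in range(k)
    ]


# Maekawa's N = 7 case: the lines of the Fano plane, {i, i+1, i+3} mod 7.
fano_quorums = [{i % 7, (i + 1) % 7, (i + 3) % 7} for i in range(7)]


def is_quorum_system(quorums):
    """True if every pair of quorums has a non-empty intersection."""
    return all(a & b for a, b in combinations(quorums, 2))
```

For N = 9, every grid quorum has 2·3 - 1 = 5 members, and any two intersect because one quorum's row always crosses the other's column. In the Fano plane any two quorums share exactly one node, so each process needs only 3 votes out of 7.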

39
Comparison
40
Transaction Properties (ACID)

Atomicity. Either all operations of the
transaction are properly reflected in the
database or none are.
Consistency. Execution of a transaction in
isolation preserves the consistency of the
database.
Isolation. Although multiple transactions may
execute concurrently, each transaction must be
unaware of other concurrently executing
transactions. Intermediate transaction results
must be hidden from other concurrently executed
transactions.
Durability. After a transaction completes
successfully, the changes it has made to the
database persist, even if there are system
failures.

41
Example Funds Transfer

Transaction to transfer 50 from account A to
account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Consistency requirement: the sum of A and B is
unchanged by the execution of the transaction.
Atomicity requirement: if the transaction fails
after step 3 and before step 6, the system
ensures that its updates are not reflected in the
database.
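Python's standard `sqlite3` module can demonstrate the atomicity requirement. Sketch only: the table layout, the starting balances and the `crash_after_write_a` flag, which simulates a failure between step 3 and step 6, are made up for illustration. Using the connection as a context manager commits when the block succeeds and rolls back when an exception escapes it.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, crash_after_write_a=False):
    """Move `amount` from src to dst as one transaction."""
    with conn:  # commit on success, roll back if an exception escapes
        # steps 1-3: read(A), A := A - amount, write(A)
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if crash_after_write_a:
            raise RuntimeError("simulated failure between step 3 and step 6")
        # steps 4-6: read(B), B := B + amount, write(B)
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

After a successful `transfer(conn, "A", "B", 50)` the balances are A = 50 and B = 70. A second transfer with `crash_after_write_a=True` raises, the partial update to A is rolled back, and A + B stays 120.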

42
Example Funds Transfer continued

Durability requirement: once the user has been
notified that the transaction has completed
(i.e., the transfer of the 50 has taken place),
the updates to the DB must persist despite
failures.
Isolation requirement: if, between steps 3 and 6,
another transaction is allowed to access the
partially updated database, it will see an
inconsistent database (the sum A + B will be less
than it should be). Isolation can be ensured by running
transactions serially.

43
The Transaction Model
44
Transaction Types

Flat transactions
No partial results available
A nested transaction is a transaction that is
logically decomposed into a hierarchy of
sub-transactions.
Allow partial results to be committed
A distributed transaction is a logically flat
indivisible transaction that operates on
distributed data.

45
Distributed Transactions Illustration
46
Private Workspace

The file index and disk blocks for a three-block
file
The situation after a transaction has modified
block 0 and appended block 3
After committing

Q: what is the cost of copying data?
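One answer to the cost question is to copy only the file index and to allocate a new block only when a block is written, which is the situation in the figure (block 0 modified, block 3 appended). Sketch with an in-memory dict standing in for the disk and hypothetical class and method names; reclaiming the replaced blocks after a commit, or the shadow blocks after an abort, is not shown.

```python
class ShadowFile:
    """Private-workspace sketch: a transaction copies only the file index."""

    def __init__(self, blocks):
        self.disk = dict(enumerate(blocks))     # physical block id -> contents
        self.index = list(range(len(blocks)))   # logical block -> physical id
        self.next_free = len(blocks)

    def begin(self):
        return list(self.index)                 # private copy of the index only

    def write(self, private_index, logical, data):
        """Write a block into a fresh physical block (a shadow block)."""
        self.disk[self.next_free] = data        # the original block is untouched
        if logical == len(private_index):
            private_index.append(self.next_free)    # appending a new block
        else:
            private_index[logical] = self.next_free
        self.next_free += 1

    def commit(self, private_index):
        self.index = private_index              # one index swap publishes all changes

    def contents(self, index=None):
        return [self.disk[p] for p in (self.index if index is None else index)]
```

Until `commit` is called, readers using the public index still see the original three blocks; the transaction sees its own modified block 0 and the appended block 3 through its private index.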
47
More Efficient Implementation

Two common methods of implementation are
write-ahead logs and before/after images.
With write-ahead logs, the transactions act on
the permanent workspace, but before they can make
a change, a log record is written to stable
storage with the transaction and data item ID and
the old and new values.
This log can then be used if the transaction
aborts and the changes need to be rolled back.
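The write-ahead scheme described above can be sketched as a small in-memory store. Assumptions for illustration: a Python list stands in for the log on stable storage (a real system must force each record to disk before updating the data), data items already exist, and `WALStore` is a hypothetical name. Rolling back walks the log from newest to oldest, so an item written twice ends up with its oldest value.

```python
class WALStore:
    """Write-ahead logging sketch: log (txn, item, old, new) before updating in place."""

    def __init__(self, data):
        self.data = dict(data)   # the permanent workspace
        self.log = []            # stands in for the log on stable storage

    def write(self, txn, item, new):
        self.log.append((txn, item, self.data[item], new))  # 1. log record first
        self.data[item] = new                                # 2. then the real update

    def abort(self, txn):
        """Roll back txn by restoring old values, newest record first."""
        for t, item, old, _new in reversed(self.log):
            if t == txn:
                self.data[item] = old
        self.log.append((txn, "ABORT", None, None))
```

For a transaction x = x + 1; y = y + 2; x = y * y starting from x = y = 0, the log holds (x: 0 → 1), (y: 0 → 2), (x: 1 → 4), and aborting restores x = y = 0.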

48
Write-ahead Log

a) A transaction
b) to d) The log before each statement is executed

49
Before- and After- Images

A before- and after-image is kept for each data
item.
When a data item is changed, the old value is
written to the before-image and the new value is
the after-image.
Other transactions are not allowed to see the
new value until the current transaction commits.
The after-image is made permanent and durable
once the transaction which wrote it commits.
If the transaction aborts, the before-image is
restored.

50
DBMS Organization

General organization of managers for handling
transactions.

51
DBMS Organization
52
Levels of Consistency (SQL92)

Serializable: the default
Repeatable read: only committed records may be
read, and repeated reads of the same record must return
the same value. However, a transaction may not be
serializable.
Read committed: only committed records can be
read, but successive reads of a record may return
different (but committed) values.
Read uncommitted: even uncommitted records may
be read (browse).

53
Serializability
54
Two-Phase Locking (2PL)
55
Strict 2PL
56
Pessimistic Timestamp Ordering

Goal: enforce serializability.
Every transaction gets a (Lamport, totally
ordered) timestamp.
Every data item X has a read timestamp read-ts(X),
a write timestamp write-ts(X), and a commit bit c(X).
The commit bit c(X) is true if and only if the most
recent transaction to write to that item has
committed.
The scheduler maintains the item timestamps and
checks that the reads and writes are
correct.

57
Read Too Late

T1 tries to read X, but ts(T1) < write-ts(X),
meaning X has been written by a later
transaction.
T1 should not be allowed to read X because it was
written by a transaction that occurs later in the
serialization order (transactions are serialized
by start time).
Solution: T1 is aborted.

58
Write Too Late

T1 tries to write X, but ts(T1) < read-ts(X): a
later transaction has already read X and should have
read the value T1 is about to write.
Solution: T1 is aborted.

59
Dirty Reads

T1 reads X, which was last written by T2. The
timestamps are properly ordered, but the commit
bit c(X) = false, so if T2 later aborts, T1 must
abort too.
Solution: we can avoid cascading aborts by
delaying T1's read until T2 has committed (though
this is not necessary to ensure serializability).

60
Thomas Write Rule

T2, which has a later timestamp than T1, has already
written X. When T1 tries to write X, the appropriate
action is to do nothing.
No other transaction T3 that should have read
T1's value of X got T2's value instead, because
such a T3 would have been aborted for a too-late
read. Future reads of X want T2's value or a
later value, not T1's value.
Solution: T1's write can be skipped.

61
TS Ordering Rules

When the scheduler receives a read request on X from
transaction T:
if ts(T) > write-ts(X) and c(X) is true, grant the
request and set read-ts(X) to
max(ts(T), read-ts(X))
if ts(T) > write-ts(X) and c(X) is false, delay
T until c(X) becomes true or the transaction that
wrote X aborts.
If ts(T) < write-ts(X), abort T and restart it with a
new timestamp.

62
TS Ordering Rules, continued

When the scheduler receives a write request on X from
transaction T:
if ts(T) > read-ts(X) and ts(T) > write-ts(X),
grant the request, set write-ts(X) to ts(T) and
c(X) = false
if ts(T) > read-ts(X) and ts(T) < write-ts(X),
don't do the operation but allow T to continue as
if it were done (Thomas write rule).
If ts(T) < read-ts(X), abort T and restart it with a
new timestamp.
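The read and write rules translate almost line for line into a per-item scheduler check. Sketch with hypothetical names: it returns a decision string instead of actually blocking or restarting transactions, does not restore X when its writer aborts, and treats equal timestamps (a transaction touching X again) as allowed, a case the rules above leave open.

```python
from dataclasses import dataclass


@dataclass
class Item:
    read_ts: int = 0
    write_ts: int = 0
    committed: bool = True      # the commit bit c(X)


def on_read(x, ts):
    if ts < x.write_ts:
        return "ABORT"                  # read too late
    if ts > x.write_ts and not x.committed:
        return "DELAY"                  # wait until the writer commits or aborts
    # ts == write_ts: T re-reads its own write (our assumption)
    x.read_ts = max(x.read_ts, ts)
    return "GRANT"


def on_write(x, ts):
    if ts < x.read_ts:
        return "ABORT"                  # write too late
    if ts < x.write_ts:
        return "SKIP"                   # Thomas write rule: ignore, let T continue
    x.write_ts, x.committed = ts, False
    return "GRANT"


def on_commit(x, ts):
    if x.write_ts == ts:
        x.committed = True              # the latest writer of X has committed
```

Starting from a fresh item, a write at timestamp 5 is granted; a read at 7 is delayed until that writer commits, while a read at 3 is aborted as too late. Once 7 has read X, a write at 6 is aborted, a write at 9 is granted, and a later-arriving write at 8 is skipped under the Thomas write rule.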

63
Optimistic Timestamp Ordering

In any optimistic concurrency control scheme, each
transaction does its writes to a private
workspace until completion of a validation phase.
In the validation phase, the scheduler validates
the transaction by comparing its read set and
write set with those of other transactions.
After validation, the write-set values are
written to the database and the transaction
commits.
Validation is frequently done with the help of
timestamps.
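One common timestamp-based variant is backward validation: T is checked against every transaction that committed after T started. The function name and data layout are assumptions, and this is only one option; forward validation, for instance, checks against still-active transactions instead.

```python
def validate(read_set, start_ts, committed):
    """Backward validation for an optimistic transaction T.

    read_set  -- items T read
    start_ts  -- timestamp at which T started
    committed -- list of (commit_ts, write_set) for already committed transactions
    T passes if no transaction that committed after T started wrote an item T read.
    """
    return all(
        not (read_set & write_set)
        for commit_ts, write_set in committed
        if commit_ts > start_ts
    )
```

If validation fails, T's private workspace is discarded and T restarts; if it succeeds, its write set is copied into the database and T commits.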

64
Two-Phase Commit (2PC)

When several databases take part in a single
transaction, a protocol called Two-Phase Commit is
used.
Each database is assumed to have its own local
resource manager
A single system component called the Coordinator
controls the whole process.

65
Steps