Title: Interprocess Communication and Coordination II
1. Inter-process Communication and Coordination (II)
2. Outline
- Message Passing Communication
- Request/Reply Communication (Remote Procedure Call)
- Transaction Communication
- Name and Directory Services
- Distributed Mutual Exclusion
- Leader Election
3. Transaction Communication
4. Introduction
- Transaction communication: service-oriented request/reply communication and multicast
- Transaction: the fundamental unit of interaction between client and server processes in a database system
- Database transaction: a sequence of synchronous request/reply operations that satisfies the ACID properties
- Transaction communication: a set of asynchronous request/reply communications that have the ACID properties but lack the sequential constraint of the operations in a database transaction
- Multicast of the same message to replicated servers
5. The Transaction Model
- A transaction to reserve three flights commits
- The transaction aborts when the third flight is unavailable
6. The ACID Properties
- Achieve concurrency transparency
- Allow sharing of objects without interference
- The execution of a transaction appears to take place in a critical section; however, operations from different transactions may be interleaved in some safe way to achieve more concurrency
- ACID tries to achieve consistency
(figure: execution without interleaving vs. with interleaving)
7. The ACID Properties (Cont.)
- Atomicity: consistency of replicated or partitioned objects
  - Either all of the operations in a transaction are performed or none of them are, in spite of failures
- Consistency (serializability)
  - The execution of interleaved transactions is equivalent to a serial execution of the transactions in some order
- Isolation (do not see something that has never occurred)
  - Partial results of an incomplete transaction are not visible to others before the transaction is successfully committed
- Durability (see something that has actually occurred)
  - The system guarantees that the results of a committed transaction will be made permanent, even if a failure occurs after the commitment
8. Two-Phase Commit Atomic Transaction Protocol
- Coordinator: the processor that initiates the transaction
- Participants: all the remaining processors
- Unanimous voting scheme → atomicity
- Voting is initiated by the coordinator. All participants must come to an agreement about whether to commit or abort the transaction and must wait for the announcement of the decision
- Before a participant can vote to commit, it must be prepared to perform the commit
- A transaction is committed only if all participants agree and are ready to commit
9. Two-Phase Commit Atomic Transaction Protocol (Cont.)
- Each participant and the coordinator maintains a private workspace for keeping track of updated data objects
  - Each update contains the old and new values of a data object
- Updates will not be made permanent until the transaction is finally committed → isolation
- To cope with failures → flush the updates to stable storage
  - Durability and failure recovery
- Two synchronization points: pre-commit and commit
  - Write and flush update logs, and then pre-commit or commit
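The unanimous-voting rule can be sketched as follows. This is a minimal illustration (function and message names are my own, not from the protocol description), which ignores logging and real messaging:

```python
# Minimal sketch of the 2PC decision rule: the coordinator reaches
# GLOBAL_COMMIT only on a unanimous set of VOTE_COMMIT votes; a missing vote
# (a participant that timed out) is treated like a VOTE_ABORT.

def coordinator_decision(votes, num_participants):
    """votes: list of vote strings received before the timeout."""
    if len(votes) < num_participants:
        return "GLOBAL_ABORT"            # a participant did not answer in time
    if all(v == "VOTE_COMMIT" for v in votes):
        return "GLOBAL_COMMIT"
    return "GLOBAL_ABORT"                # at least one VOTE_ABORT
```

For example, three VOTE_COMMIT votes out of three participants yield GLOBAL_COMMIT, while any abort vote or missing reply forces GLOBAL_ABORT.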
10. Two-Phase Commit Atomic Transaction Protocol (Cont.)
11. Two-Phase Commit Atomic Transaction Protocol (Cont.)
(figure: failures and recovery actions for the 2PC protocol)
12. Two-Phase Commit Atomic Transaction Protocol (Cont.)
(figures: finite state machine for the coordinator; finite state machine for the participant)
13. Two-Phase Commit Atomic Transaction Protocol (Cont.)
- The protocol can easily fail when a process crashes, since other processes may wait indefinitely for a message from that process
- A timeout can be used for detecting the failure of other processes
- Coordinator
  - Blocked in state WAIT → GLOBAL_ABORT
- Participant
  - Blocked in state INIT → VOTE_ABORT
  - Blocked in state READY → consult the state of other processes
14. Two-Phase Commit Atomic Transaction Protocol (Cont.)
- Actions taken by a participant P when residing in state READY and having contacted another participant Q
- If all processes are in state READY, block until the coordinator recovers
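The consultation step can be summarized as a small decision table. The sketch below is an illustration (the function name is mine); it encodes the standard recovery actions: if Q already committed or aborted, P does the same; if Q has not even voted, the coordinator cannot have decided commit, so P can safely abort; if Q is also in READY, P must try another participant.

```python
# Sketch of participant P's action in state READY after learning the state of
# another participant Q (state names follow the participant FSM).

def action_when_ready(q_state):
    if q_state == "COMMIT":
        return "transition to COMMIT"
    if q_state in ("INIT", "ABORT"):
        return "transition to ABORT"     # commit cannot have been decided
    if q_state == "READY":
        return "contact another participant"
    raise ValueError("unknown state: " + q_state)
```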
15. Two-Phase Commit Atomic Transaction Protocol (Cont.)
16. Two-Phase Commit Atomic Transaction Protocol (Cont.)
17. Two-Phase Commit Atomic Transaction Protocol (Cont.)
18. Two-Phase Commit Atomic Transaction Protocol (Cont.)
- 2PC is a blocking commit protocol
  - A participant may need to block until the coordinator recovers
- To avoid blocking
  - Use a multicast primitive by which a receiver immediately multicasts a received message to all other processes
  - Allow participants to reach a final decision, even if the coordinator has not yet recovered
  - Three-phase commit protocol (3PC): Section 12.1.1
19. Name and Directory Services
20. Name and Address Resolution (I)
- Name and directory services: look-up operations
  - Given the name or some attribute of an object entity, more attribute information is obtained
- Object entity: services and objects (users, computers, files)
- Name services: how a named object can be addressed and subsequently located by using its address
- Directory services
  - A special name service (e.g., the directory service of a file system)
  - All kinds of attribute look-ups on different object types, not limited to address information
- Sometimes, the terms name service and directory service are used interchangeably
21. Name and Address Resolution (II)
- Two stages of the resolution process
- Name resolution: names → logical addresses
  - Name: application-oriented denotation of an object (who it is)
  - Address: object representation carrying structural information relevant to the OS (where it can be found)
  - Example: map a server name to its port addresses
- Address resolution: logical addresses → network routes
  - An address contains intermediate object identification information between names and routes
  - Routes are the lowest level of location information
  - Example: map a server port to its Ethernet port
22. Object Attributes and Name Structure (I)
- An object entity is characterized by its attributes
  - User: affiliation attribute; file: version number, creation date
- Attributes involved in the name resolution process: name and address
- Name space: a collection of names, recognized by a name service, with their corresponding attributes and addresses
  - A name space containing different object classes → type attribute
- Name structure
  - Flat names: how to achieve a unique name
  - Names of concatenated attributes (hierarchically structured names)
  - Names of collections of attributes (structure-free names)
- Attribute partitions for an object name
  - Physical: <user.host.network> (explicit location information)
  - Organizational: <user.department.organization> (location transparent)
  - Functional: profession=<professor>, specialty=<computer science>
23. Object Attributes and Name Structure (II)
24. Name Space and Information Base
- DIB and DIT
  - Directory Information Base (DIB): conceptual data model for storing and representing object information
  - Directory Information Tree (DIT)
  - An object entity in the DIB → a node in the DIT
- Distinguished attributes: attributes for naming purposes
  - Path from object node to root
  - Suitable for both hierarchical and attribute-based resolution
- Naming domain: a subname space for which there is a single administrative authority for name management
- Naming context: a partial subtree of the DIT
  - The basic unit for distributing the DIB to DSAs
  - Directory Service Agent (DSA): a server for the name service
25. Distribution of a DIT
(figure: DSAs and their naming contexts)
26. Name Resolution Process (I)
- Name resolution process
  - Initiated by a Directory User Agent (DUA)
  - The resolution request is sent from one DSA to another until the object is found in the DIT and returned to the DUA
- Interaction modes among DSAs
  - Recursive chaining: the normal mode for structured name resolution
  - Transitive chaining
    - Fewer messages: each message carries the source DUA address
    - Violates the RPC and client/server programming paradigm
  - Referral: a DSA suggests another suitable DSA to the DUA
    - Can be used for both name and attribute-based resolutions
  - Multicast: the request is sent to multiple DSAs concurrently
    - Suitable when structural information is not available
27. Name Resolution Interaction Modes
28. Name Resolution Process (II)
- Techniques for enhancing name resolution performance
  - Caching: recently used names and their addresses are kept in a cache
  - Replication: naming contexts can be duplicated in different DSAs
  - Both raise inconsistency issues
- Object entries in a directory may be an alias or a group
  - These are pointers to other object names and are leaf nodes in the DIT
29. The DNS Name Space
- DNS: the Internet Domain Name Service
  - Look up host addresses and mail servers
- DNS name space: a hierarchical rooted tree
  - Root<nl, vu, cs, flits> → flits.cs.vu.nl.
- Domain (naming context): a subtree; domain name: the path name to its root node
- Node content: a collection of resource records
- Zone: an administrative unit (naming domain)
  - Each zone is implemented by a primary name server and possibly other secondary name servers
  - Zone updates are done locally in the DNS definition files on the primary name server; secondary name servers request the primary server to transfer its content
30. Types of Resource Records in DNS
31. A Simple Example for DNS
32. Part of the description for the vu.nl domain, which contains the cs.vu.nl domain
33. OSI X.500
- DNS: given a (hierarchically structured) name, resolves it to a node in the naming graph and returns the content of that node in the form of a resource record
  - Similar to a telephone book for looking up phone numbers
- X.500 directory service → a client can look for an entity based on a description of properties instead of a full name
  - Similar to the yellow pages
- A simplified version of X.500: the Lightweight Directory Access Protocol (LDAP)
34. The X.500 Name Space
- X.500 consists of a number of records (directory entries)
- Each record is made up of a collection of (attribute, value) pairs
  - Attribute types
  - Single-valued and multiple-valued attributes
- Directory Information Base (DIB): the collection of all directory entries in an X.500 directory service
- A unique name for each record is formed by a sequence of its naming attributes (Relative Distinguished Names, RDNs)
  - /C=NL/O=Vrije Universiteit/OU=Math. & Comp. Sc.
- Directory Information Tree (DIT): listing RDNs in sequence leads to a hierarchy of the collection of directory entries
35. A Simple Example of an X.500 Directory Entry
36. Part of the DIT
- Each node represents a directory entry and may also act as a directory in the traditional sense (it may have children)
- Use read to read a single record, given its path name in the DIT
- Use list to get the names of all the children of a node in the DIT
37. Two directory entries having Host_Name as RDN
38. X.500 Implementation
- Similar to DNS, but
  - X.500 supports more lookup operations
    - answer = search(&(C=NL)(O=Vrije Universiteit)(OU=*)(CN=Main Server))
    - But expensive: it may access many DSAs
- Directory Service Agent (DSA) ↔ name server
- Naming domain ↔ zone
- Each part of a partitioned DIT ↔ naming context ↔ domain
39. Other Issues in Name Services
- Name services for directories and files
- Locating mobile entities
- Removing unreferenced entities
40. Mutual Exclusion
- Ensure that concurrent processes make serialized accesses to shared resources or data
41. Classification
- Contention-based mutual exclusion
  - Centralized mutual exclusion
  - Distributed mutual exclusion
    - Timestamp priority schemes
    - Voting schemes (a variant of timestamp priority schemes)
- Token-based mutual exclusion
  - Ring structure
  - Tree structure
  - Broadcast structure
42. A Centralized Algorithm (I)
- Process 1 asks the coordinator for permission to enter a critical region. Permission is granted
- Process 2 then asks permission to enter the same critical region. The coordinator does not reply
- When process 1 exits the critical region, it tells the coordinator, which then replies to 2
43. A Centralized Algorithm (II)
- Characteristics
  - Guarantees mutual exclusion
  - Fair: requests are granted in the order in which they are received
    - No starvation
  - Can be used for more general resource allocation
- Drawbacks
  - Single point of failure
    - How to distinguish a dead coordinator from permission denied?
  - Performance bottleneck
44. Timestamp Prioritized Schemes
- 3(N-1) messages for the completion of a CS
- Lamport's logical clocks totally order the requests for entering the CS
- A process requests the CS by broadcasting a timestamped REQUEST message to all other processes (including itself)
- Each process maintains a queue of pending REQUEST messages arranged in ascending timestamp order
- On receiving a REQUEST message, a process inserts the message into its queue and sends a REPLY message to the requesting process
- A process is allowed to enter its CS only if it has gathered all the REPLY messages and its request message is at the top of the request queue
45. Timestamp Prioritized Schemes (Cont.)
- When exiting the CS, a process broadcasts a RELEASE message to every process
- On receiving a RELEASE message, a process removes the completed request from its request queue
  - At that moment, if the process's own request is at the top of the request queue, it enters its CS, provided that all REPLY messages have been received
- When receiving REQUEST, RELEASE, and REPLY messages, a process adjusts its logical clock accordingly
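Since every process sees the same REQUESTs and orders them by (timestamp, process ID), the scheme grants the CS in global timestamp order. A compact single-address-space illustration of that ordering (not a real message-passing implementation):

```python
import heapq

# Compact illustration of the timestamp-priority ordering: all request queues
# converge on the same ascending (timestamp, pid) order, and each RELEASE
# lets the next request at the head of the queue enter the CS.

def cs_entry_order(requests):
    """requests: list of (timestamp, pid) pairs; returns pids in entry order."""
    heap = list(requests)
    heapq.heapify(heap)                  # ascending (timestamp, pid)
    order = []
    while heap:
        _, pid = heapq.heappop(heap)     # head of queue enters, then RELEASEs
        order.append(pid)
    return order
```

For instance, requests (3, "P2"), (1, "P1"), (3, "P0") enter as P1, P0, P2: the tie at timestamp 3 is broken by process ID.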
46. Timestamp Prioritized Schemes: Improved
- When a process receives a REQUEST message
  - Not in the critical region AND does not want to enter → OK (REPLY)
  - Already in the critical region → no REPLY; queue the request
    - After it exits the CS, send the REPLY message
  - Wants to enter but has not yet done so → compare the timestamp of the incoming message with the one it sent to everyone
    - The lowest one wins
    - If the incoming message's timestamp is lower → OK (REPLY)
    - Otherwise → no REPLY; queue the request
- 2(N-1) messages for the completion of a CS
  - REPLY and RELEASE are combined
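The receive-side rule above can be sketched as a single handler (an illustration; the state and return names are mine). Ties on equal timestamps are broken by process ID, which the tuple comparison handles:

```python
# Sketch of the improved scheme's REQUEST handler. `state` is "RELEASED"
# (outside the CS, not interested), "HELD" (inside the CS), or "WANTED"
# (requested but not yet entered); requests are (timestamp, pid) pairs.

def on_request(state, my_request, incoming):
    """Return "REPLY" (send OK now) or "DEFER" (queue until we exit)."""
    if state == "RELEASED":
        return "REPLY"
    if state == "HELD":
        return "DEFER"                   # OK is sent after leaving the CS
    # state == "WANTED": the lower (timestamp, pid) wins
    return "REPLY" if incoming < my_request else "DEFER"
```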
47. Timestamp Prioritized Schemes: Improved (Cont.)
- Two processes want to enter the same critical region at the same moment
- Process 0 has the lowest timestamp, so it wins
- When process 0 is done, it sends an OK also, so 2 can now enter the critical region
48. Timestamp Prioritized Schemes (Cont.)
- Characteristics
  - Guarantee mutual exclusion
  - No deadlock or starvation
- Drawbacks (weird)
  - N points of failure
    - How to distinguish a dead participant from a refusal?
  - N points of bottleneck
- Improvement
  - Require only a simple majority of the other processes → VOTING
  - Avoids N points of failure
49. Voting Schemes
- As soon as a candidate has a majority of the votes → WIN
- When a process receives a REQUEST message, it sends a REPLY (i.e., a vote) only if it has not voted for any other candidate (requesting process)
- Once a process has voted, it is not allowed to send any more REPLY messages until its vote has been returned (e.g., by a RELEASE message)
- A candidate obtains permission to enter the CS when it has received a majority of the votes
- Problem: deadlock (say two processes each win half of the votes)
  - Solved by changing votes when a process hears from a more attractive candidate (judged by timestamp or other criteria)
  - INQUIRE → RELINQUISH
50. Voting Schemes (Cont.)
- Requires O(N) messages per CS entry
  - Plus some messages for deadlock avoidance
- Reduce the message overhead by reducing the number of votes required to enter the CS
  - Each process i has a request set (quorum) Si, and a process needs the vote of every member of its request set to enter the CS
  - To ensure mutual exclusion, Si ∩ Sj ≠ ∅ for all i, j
  - It is possible for each quorum to be of size O(√N)
- See Chapters 6 and 10 for further discussion
51. A Token Ring Algorithm
- Not necessarily in process ID order
- An unordered group of processes on a network
- A logical ring constructed in software
52. A Token Ring Algorithm (Cont.)
- Idea
  - Processes are connected in a logical ring structure
  - A token circulates around the ring
  - A process possessing the token is allowed to enter the CS
  - When finished with the CS → pass the token to the successor node
- Advantages
  - Simple, deadlock-free, fair
  - The token can carry state information such as priority
- Disadvantages
  - The token circulates around the ring even if no process wants to enter the CS
    - Results in unnecessary network traffic
  - A process must wait until the token arrives before entering the CS
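A toy simulation of the idea (the ring order and names are illustrative): the token is passed to the successor on every step, whether or not anyone wants the CS, which is exactly the wasted-traffic drawback noted above.

```python
# Toy token-ring simulation: the token visits processes in ring order; a
# process enters the CS only while holding the token.

def simulate_ring(ring, start, wants, steps):
    """Return the order in which processes enter the CS within `steps` hops."""
    entries = []
    pos = ring.index(start)
    pending = set(wants)
    for _ in range(steps):
        holder = ring[pos]
        if holder in pending:
            entries.append(holder)       # use the token once, then pass it
            pending.discard(holder)
        pos = (pos + 1) % len(ring)      # token moves even if nobody wants it
    return entries
```

For example, on the ring [0, 2, 4, 5, 1] with processes 1 and 4 waiting, the entry order is 4 then 1: the ring order, not the process-ID order, decides who goes first.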
53. Comparison
54. Tree Structure
- Requires a process to explicitly request the token, and to move the token only if it knows of a pending request (unlike the ring)
- Indefinite postponement and deadlock? → with the tree structure, both are avoided
- Imposes a hierarchical structure on the processors → the structure must be maintained
  - Root: the process owning the token
- How to navigate a request to the token
  - There is a unique path between a process and the token holder
- How to navigate the token to the next processor to enter the CS
  - A FIFO queue of pending requests is maintained by each node
  - The heads of the FIFO queues of all nodes form the global FIFO queue
- Algorithm
  - Algorithm Lists 10.9, 10.10, 10.11, and 10.12
  - An example
55. Tree Structure (Cont.)
- Each process has a FIFO request queue and a pointer to its immediate predecessor
- When a process receives a request, it appends the request to its queue
  - If the queue was empty and the process does not have the token, it requests the token from its predecessor
  - Otherwise, the token will arrive soon → no further action is taken
- If a process has the token but is not using it, and has a nonempty queue → it removes the first entry from the queue and sends the token to that process
  - This can occur when a request arrives, when the token arrives, or when the process releases the token
  - It also changes its pointer to point to the process to which it sent the token
  - If its queue is still not empty, the process will need to re-obtain the token, so it sends a request to the new token holder
- If the process itself is the first entry in its FIFO queue, it enters the CS
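A condensed sketch of the pointer-flipping mechanics (illustrative only: it propagates every request hop-by-hop and omits the refinement that an intermediate node forwards a request only when its queue was empty):

```python
from collections import deque

# Condensed tree-token sketch: `parent` points one hop toward the current
# token holder (None means "I hold the token"); requests travel up the tree,
# and the token travels back down, flipping parent pointers as it moves.

class Node:
    def __init__(self, pid, parent):
        self.pid, self.parent = pid, parent
        self.queue = deque()

def request_cs(nodes, pid):
    """Propagate pid's request toward the root; return the token holder."""
    node = nodes[pid]
    node.queue.append(pid)
    while node.parent is not None:
        nodes[node.parent].queue.append(node.pid)
        node = nodes[node.parent]
    return node.pid

def pass_token(nodes, holder):
    """Holder forwards the token one hop toward the head pending request."""
    nxt = nodes[holder].queue.popleft()
    nodes[holder].parent = nxt           # the pointer now follows the token
    nodes[nxt].parent = None
    return nxt
```

On a chain 0 (token) ← 1 ← 2, a request by process 2 climbs to node 0; passing the token twice brings it to node 2, whose own pid now heads its queue, so it enters the CS, and all parent pointers now lead to node 2.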
56. Tree Structure (Cont.)
(figure: example run on a tree of processes; at T0, P4 requests the token, and at T2, P3 requests the token)
57. Broadcast Structure
- Proposed by Suzuki and Kasami
- Uses group communication, without being aware of the topology
- Token
  - Token vector T: the number of completed CS entries by each process
  - Q: the pending request queue
- Each process P: a local sequence number and a sequence vector S
  - Local sequence number: the number of requests for the CS (attached to each REQUEST)
  - S: stores the highest sequence number P has heard from every process
58. Broadcast Structure (Cont.)
- Pi requests the CS by broadcasting a REQUEST (with its local sequence number seq)
- Pj updates its sequence vector when receiving a REQUEST message from Pi → Sj[i] = max(Sj[i], seq)
- If Pj holds an idle token (empty Q), Pj sends the token to Pi if Sj[i] = T[i] + 1 → Pi enters the CS when it receives the token
- Upon completion of the CS,
  - Pi updates T[i] to equal Si[i]
  - Pi appends to Q all processes k (k ≠ i) with Si[k] = T[k] + 1
  - Pi removes the top entry from the request queue and sends it the token
  - If Q is empty, the token stays with Pi
59. Broadcast Structure (Cont.)
(figure: at t0, P1 holds the token and wants to enter the CS; it is OK for P1 to enter. At t1, P2 wants to enter the CS; at t2, P4 wants to enter the CS. The token vector T and queue Q are shown at each step)
60. Broadcast Structure (Cont.)
(figure: at t3, P1 leaves the CS, updates T and Q, and sends the token to P2; at t4, P3 wants to enter the CS)
61. Broadcast Structure (Cont.)
(figure: at t5, P2 leaves the CS)
62. Broadcast Structure (Cont.)
(figure: the sequence vectors Si, token vector T, and token queue Q at the end of the run)
- No central controller: the management of the shared token is distributed
- The contention for mutual exclusion is centrally serialized by the FIFO token queue
- No deadlock or starvation
63. Leader Election
64. Overview
- Elect a process as coordinator or initiator
  - Especially when the existing coordinator (leader) fails
  - Usually detected by timeout
- Leader election criteria
  - Extrema finding: based on a global priority
  - Preference-based: vote for a leader according to a personal preference (locality, reliability)
- Leader election vs. mutual exclusion
  - 3rd paragraph of page 139
- Leader election algorithms depend on the topological structure assumed for the process group
65. Bully Algorithm
- Assumptions
  - Complete topology: processes can reach each other in one message
  - Each process has a unique number (process ID)
    - Election: locate the process with the highest ID and designate it as the new coordinator
  - Every process knows the process ID of every other process
    - But none knows which ones are currently up or down
  - Reliable network: only the processes may fail
  - Failure of a process is detected by timeout
- A failed process can rejoin the group by forcing an election upon its recovery
66. Bully Algorithm (Cont.)
- When any process (say P) notices that the coordinator is no longer responding to requests, it initiates an election
  - P sends an ELECTION message to all processes with higher IDs
  - If no one responds, P wins the election and becomes coordinator
  - If one of the higher processes answers, it takes over; P's job is done
- A process can get an ELECTION message from a process with a lower ID
  - It sends OK back to the sender to indicate that it is alive and will take over
  - It then holds an election, unless it is already holding one
- All processes give up but one, and that one is the new coordinator
  - It sends messages to all processes to tell them it is the new coordinator
- If a process that was previously down comes back up, it holds an election
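The election logic can be sketched as a recursion over the set of live processes (a simplification: liveness is given up front instead of being discovered by timeouts, and the messages are implicit):

```python
# Bully-election sketch: the initiator defers to any live higher-ID process,
# and the election restarts from each responder until the highest live
# process is left, which then announces itself as coordinator.

def bully_election(initiator, all_ids, alive):
    higher_alive = [p for p in all_ids if p > initiator and p in alive]
    if not higher_alive:
        return initiator                 # no one higher answered: we win
    return bully_election(min(higher_alive), all_ids, alive)
```

With processes 0-7 and process 7 crashed, an election started by process 4 ends with 6 as coordinator, matching the figure on the next slide.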
67. Bully Algorithm (Cont.)
- Process 4 holds an election
- Processes 5 and 6 respond, telling 4 to stop
- Now 5 and 6 each hold an election
68. Bully Algorithm (Cont.)
69. Ring Algorithm
- Assumptions
  - Processes are physically or logically ordered
  - Each process knows the current members of the ring and their order
- Election cycle: when any process notices that the coordinator is no longer responding to requests, it initiates an election
  - An ELECTION message containing its own ID is sent to its successor
    - If the successor is down, it is skipped over until a running process is located
  - Any process receiving the ELECTION message adds its ID to the message and resends it to its successor
  - Finally, the ELECTION message gets back to the initiator
- Coordinator announcement cycle
  - A COORDINATOR message is circulated once again
  - It tells who the coordinator is (the one with the highest ID) and who the members of the new ring are
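Both cycles can be sketched in a few lines (a simplification: a fixed set of live processes stands in for timeout-based skipping):

```python
# Ring-election sketch: the ELECTION message collects the IDs of live
# processes in successor order; back at the initiator, the highest collected
# ID is announced as coordinator.

def ring_election(ring, initiator, alive):
    """ring: process IDs in successor order. Returns (collected, coordinator)."""
    n, start = len(ring), ring.index(initiator)
    collected = [initiator]
    i = (start + 1) % n
    while ring[i] != initiator:
        if ring[i] in alive:             # dead successors are skipped over
            collected.append(ring[i])
        i = (i + 1) % n
    return collected, max(collected)
```

On the ring 5 → 6 → 0 → 1 → 2 → 3 → 4 with all of them alive, an election started by 5 collects 5,6,0,1,2,3,4 and announces 6 (the highest ID) as coordinator.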
70. Ring Algorithm (Cont.)
(figure: the ELECTION message grows as it circulates: 5,6,0,1 → 5,6,0,1,2 → 5,6,0,1,2,3 → 5,6,0,1,2,3,4)
- The topological order may not be the same as the process ID order
71. Ring Algorithm (Cont.)
- Improvement: Figure 4.22 (pp. 141)
  - When a process sends a message, it simply forwards the larger of its own ID and the received value to its successor
  - A process that is already involved in the election does not need to forward a message unless the message contains a value higher than its ID
- Time and message complexity
  - Only one initiator: O(N) (time and messages)
  - N simultaneous election initiators
    - Without the optimization: O(N^2) messages
    - With the optimization: O(N) or O(N^2) messages → depending on whether the ring is arranged in ascending or descending order of the nodes' IDs
72. Ring Algorithm (Cont.)
- Further improvement: O(N log N)
  - Idea: disable elections initiated by lower-priority nodes as early as possible, irrespective of the topological order of the nodes
  - Compare a node's ID with those of its left and right neighbors
    - An initiator node remains active if its ID is higher than both neighbors'
    - Otherwise, it becomes passive and only relays messages
  - This effectively eliminates at least half of the active nodes in each round of message exchanges, reducing O(N) rounds to O(log N)
  - Requires a bidirectional ring
    - For a unidirectional ring → buffer two consecutive messages before a node is determined to be in active or passive mode
73. Tree Topologies
- Dynamically build a minimum-weight spanning tree (MST) for a network of N nodes
  - If all edges in a connected graph have unique weights → the MST is unique
- Leader election and building an MST can be reduced to each other
- The Gallager, Humblet, and Spira approach [GHS83]
  - Searching and combining: merge fragments
  - Fragment: a minimum-weight subtree of the final MST
  - Bottom-up, starting from single nodes
  - Each fragment finds the minimum-weight outgoing edge of the fragment and uses it to join with a node in a different fragment
  - The new fragment is still minimum-weight
74. Tree Topologies (Cont.)
- Tree topology and leader election
  - The last node that merges and yields the final MST can be the leader
- Electing a leader after an MST has been constructed
  - An initiator broadcasts a Campaign-For-Leader (CFL) message, which carries a logical timestamp, to all nodes along the MST
  - When the message reaches a leaf, it replies with a Voting (V) message to its parent
  - A parent sends its voting message to its own parent after all of its children's votes have been collected
  - Once a node finishes its reply, it is done → it waits for the announcement of the new leader and accepts no further CFL messages
  - With multiple initiators → the lowest timestamp wins
75. Tree Topologies (Cont.)
- Build a spanning tree by message flooding
  - Robust in a failure-prone network
- Idea
  - Every node repeats a received message (which it has not seen yet) to all neighboring nodes except the sender
  - Eventually every node is reached and a spanning tree is formed
- Steps
  - Initiators flood the system with CFL messages
  - As the messages flood, a spanning forest with each tree rooted at an initiator is built up
  - Reply messages are sent by backtracking the path from leaf to root
  - With multiple initiators, the lowest timestamp wins