Interprocess Communication and Coordination II - PowerPoint PPT Presentation

1
Inter-process Communication and Coordination (II)
2
Outline
  • Message Passing Communication
  • Request/Reply Communication (Remote Procedure
    Call)
  • Transaction Communication
  • Name and Directory Services
  • Distributed Mutual Exclusion
  • Leader Election

3
Transaction Communication
4
Introduction
  • Transaction communication
  • Service-oriented request/reply communication and
    multicast
  • Transaction: the fundamental unit of interaction
    between client and server processes in a database
    system
  • Database transaction: a sequence of synchronous
    request/reply operations that satisfy the ACID
    properties
  • Transaction communication: a set of asynchronous
    request/reply communications that have the ACID
    properties but lack the sequential constraint of
    the operations in a database transaction
  • Multicast of the same message to replicated
    servers

5
The Transaction Model
  • Transaction to reserve three flights commits
  • Transaction aborts when third flight is
    unavailable

6
The ACID Properties
  • Achieve concurrency transparency
  • Allow sharing of objects without interference
  • The execution of a transaction appears to take
    place in a critical section; however, operations
    from different transactions may be interleaved in
    some safe way to achieve more concurrency
  • ACID tries to achieve consistency

[Figure: schedules without and with interleaving]
7
The ACID Properties (Cont.)
  • Atomicity (consistency of replicated or
    partitioned objects)
  • Either all of the operations in a transaction are
    performed or none of them are, in spite of
    failures
  • Consistency (serializability)
  • The execution of interleaved transactions is
    equivalent to a serial execution of the
    transactions in some order
  • Isolation (do not see something that has never
    occurred)
  • Partial results of an incomplete transaction are
    not visible to others before the transaction is
    successfully committed
  • Durability (see something that has actually
    occurred)
  • The system guarantees that the results of a
    committed transaction will be made permanent even
    if a failure occurs after the commitment

8
Two-Phase Commit Atomic Transaction Protocol
  • Coordinator: the processor that initiates the
    transaction
  • Participants: all the remaining processors
  • Unanimous voting scheme → atomicity
  • Voting is initiated by the coordinator. All
    participants must come to an agreement about
    whether to commit or abort the transaction and
    must wait for the announcement of the decision
  • Before a participant can vote to commit, it must
    be prepared to perform the commit
  • A transaction is committed only if all
    participants agree and are ready to commit

9
Two-Phase Commit Atomic Transaction Protocol
(Cont.)
  • Each participant and the coordinator maintains a
    private workspace for keeping track of updated
    data objects
  • Each update contains the old and new value of a
    data object
  • Updates will not be made permanent until the
    transaction is finally committed → isolation
  • To cope with failures → flush the updates to
    stable storage
  • Durability and failure recovery
  • Two synchronization points: pre-commit and commit
  • Write and flush update logs, and then pre-commit
    or commit

10
Two-Phase Commit Atomic Transaction Protocol
(Cont.)
11
Two-Phase Commit Atomic Transaction Protocol
(Cont.)
Failures and recovery actions for the 2PC protocol
12
Two-Phase Commit Atomic Transaction Protocol
(Cont.)
Finite State Machine for Coordinator
Finite State Machine for Participant
13
Two-Phase Commit Atomic Transaction Protocol
(Cont.)
  • The protocol can easily fail when a process
    crashes, since other processes may wait
    indefinitely for a message from that process
  • Timeouts can be used for detecting the failure of
    other processes
  • Coordinator
  • Blocked in state WAIT → GLOBAL_ABORT
  • Participant
  • Blocked in state INIT → VOTE_ABORT
  • Blocked in state READY → consult the state of
    other processes

14
Two-Phase Commit Atomic Transaction Protocol
(Cont.)
  • Actions taken by a participant P when residing in
    state READY and having contacted another
    participant Q

If all processes are in READY, block until the
coordinator recovers
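The commit decision and the READY-state consultation above can be sketched as follows. This is a minimal sketch with hypothetical function names; a real 2PC implementation also writes logs to stable storage and runs the recovery actions listed earlier.

```python
def two_phase_commit(votes):
    """Coordinator side: one vote per participant; a timed-out
    participant is modeled as None. Unanimity is required to commit."""
    if all(v == "VOTE_COMMIT" for v in votes):
        return "GLOBAL_COMMIT"
    return "GLOBAL_ABORT"

def ready_participant_action(peer_state):
    """Participant P blocked in state READY consults another
    participant Q (cf. the actions listed above). If Q is also in
    READY, P must keep waiting (block)."""
    return {"COMMIT": "COMMIT",   # a global decision was already made
            "ABORT": "ABORT",
            "INIT": "ABORT",      # Q never voted, so commit is impossible
            "READY": "BLOCK"}[peer_state]

print(two_phase_commit(["VOTE_COMMIT", "VOTE_COMMIT"]))  # GLOBAL_COMMIT
print(two_phase_commit(["VOTE_COMMIT", None]))           # GLOBAL_ABORT
print(ready_participant_action("INIT"))                  # ABORT
```

Note how a single missing or negative vote aborts everyone: that is the unanimous-voting rule that gives 2PC its atomicity.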
15
Two-Phase Commit Atomic Transaction Protocol
(Cont.)
16
Two-Phase Commit Atomic Transaction Protocol
(Cont.)
17
Two-Phase Commit Atomic Transaction Protocol
(Cont.)
18
Two-Phase Commit Atomic Transaction Protocol
(Cont.)
  • 2PC is a blocking commit protocol
  • A participant may need to block until the
    coordinator recovers
  • To avoid blocking
  • Use a multicast primitive by which a receiver
    immediately multicasts a received message to all
    other processes
  • Allow participants to reach a final decision,
    even if the coordinator has not yet recovered
  • Three-phase commit protocol (3PC)
  • Section 12.1.1

19
Name and Directory Services
20
Name and Address Resolution (I)
  • Name and directory services: look-up operations
  • Given the name or some attribute of an object
    entity, more attribute information is obtained
  • Object entities: services and objects (users,
    computers, files)
  • Name services: how a named object can be
    addressed and subsequently located by using its
    address
  • Directory services
  • A special kind of name service (e.g., the
    directory service of a file system)
  • All kinds of attribute look-ups on different
    object types, not just limited to address
    information
  • Sometimes, the terms name service and directory
    service are used interchangeably

21
Name and Address Resolution (II)
  • Two stages of the resolution process
  • Name resolution: names → logical addresses
  • Name: application-oriented denotation of an object
    (who it is)
  • Address: object representation carrying structural
    information relevant to the OS (where it can be
    found)
  • Ex. map a server name to its port addresses
  • Address resolution: logical addresses → network
    routes
  • Addresses contain intermediate object
    identification information between names and
    routes
  • Routes are the lowest level of location
    information
  • Ex. map a server port to its Ethernet port

22
Object Attributes and Name Structure (I)
  • An object entity is characterized by its
    attributes
  • User: affiliation attribute; file: version
    number, creation date
  • Attributes involved in the name resolution
    process: name and address
  • Name space: a collection of names, recognized by a
    name service, with their corresponding attributes
    and addresses
  • A name space containing different object classes →
    type attribute
  • Name structure
  • Flat names: how to achieve unique names
  • Names of concatenated attributes (hierarchically
    structured names)
  • Names of collections of attributes
    (structure-free names)
  • Attribute partitions for an object name
  • Physical: <user.host.network> (explicit location
    information)
  • Organizational: <user.department.organization>
    (location transparent)
  • Functional: profession=<professor>,
    specialty=<computer science>

23
Object Attributes and Name Structure (II)
24
Name Space and Information Base
  • DIB and DIT
  • Directory Information Base (DIB): conceptual data
    model for storing and representing object
    information
  • DIT: Directory Information Tree
  • Object entity in DIB ↔ a node in DIT
  • Distinguished attributes: attributes for naming
    purposes
  • Path from object node to root
  • Suitable for both hierarchical and
    attribute-based resolution
  • Naming domain: a subname space for which there is
    a single administrative authority for name
    management
  • Naming context: a partial subtree of the DIT
  • The basic unit for distributing the DIB to DSAs
  • DSA: Directory Service Agent (a server for name
    service)

25
Distribution of a DIT
DSA
Naming Context
26
Name Resolution Process (I)
  • Name resolution process
  • Initiated by a Directory User Agent (DUA)
  • The resolution request is sent from one DSA to
    another until the object is found in the DIT and
    returned to the DUA
  • Interaction modes among DSAs
  • Recursive chaining: the normal mode for structured
    name resolution
  • Transitive chaining
  • Fewer messages; each message carries the source
    DUA address
  • Violates the RPC and client/server programming
    paradigm
  • Referral: a DSA suggests another suitable DSA to
    the DUA
  • Can be used for name and attribute-based
    resolutions
  • Multicast: the request is sent to multiple DSAs
    concurrently
  • Suitable when structural information is not
    available
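Recursive chaining can be sketched as follows. The DSA contents and the address below are made up for illustration (192.0.2.10 is a documentation address); in a real deployment each DSA would hold one or more naming contexts of the partitioned DIT.

```python
# Toy model of recursive-chaining name resolution across a
# partitioned DIT: each DSA knows the next DSA for one label.
DSAS = {
    "root":   {"nl": "dsa-nl"},
    "dsa-nl": {"vu": "dsa-vu"},
    "dsa-vu": {"cs": "dsa-cs"},
    "dsa-cs": {"flits": "192.0.2.10"},   # leaf: the object's address
}

def resolve(dsa, labels):
    """Resolve one label per hop; each DSA chains the request to the
    next DSA until a leaf (an address) is reached."""
    entry = DSAS[dsa][labels[0]]
    if entry in DSAS:                    # another DSA: recursive chaining
        return resolve(entry, labels[1:])
    return entry                         # found the object's address

# flits.cs.vu.nl, resolved from the root down
print(resolve("root", ["nl", "vu", "cs", "flits"]))  # 192.0.2.10
```

In referral mode the same table would be returned to the DUA hop by hop instead of being followed by the DSAs themselves.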

27
Name Resolution Interaction Modes
28
Name Resolution Process (II)
  • Techniques for enhancing name resolution
    performance
  • Caching: recently used names and their addresses
    are kept in a cache
  • Replication: naming contexts can be duplicated
    in different DSAs
  • Inconsistency issues
  • Object entries in a directory may be an alias or
    a group
  • These are pointers to other object names and are
    leaf nodes in the DIT

29
The DNS Name Space
  • DNS: Internet Domain Name System
  • Look up host addresses and mail servers
  • DNS name space: hierarchical rooted tree
  • Root → <nl, vu, cs, flits> → flits.cs.vu.nl.
  • Domain (naming context): a subtree; domain name:
    the path name to its root node
  • Node content: a collection of resource records
  • Zone: administrative unit (naming domain)
  • Each zone is implemented by a primary name server
    and possibly other secondary name servers
  • Zone updates are done locally in the DNS
    definition files on the primary name server;
    secondary name servers request the primary server
    to transfer its content

30
Type of Resource Records in DNS
31
A Simple Example for DNS
32
Part of the description for the vu.nl domain
which contains the cs.vu.nl domain
33
OSI X.500
  • DNS: given a (hierarchically structured) name,
    resolves it to a node in the naming graph and
    returns the content of that node in the form of
    resource records
  • Similar to a telephone book for looking up phone
    numbers
  • X.500 directory service → a client can look for
    an entity based on a description of properties
    instead of a full name
  • Similar to the yellow pages
  • A simplified version of X.500: the Lightweight
    Directory Access Protocol (LDAP)

34
The X.500 Name Space
  • X.500 consists of a number of records (directory
    entries)
  • Each record is made up of a collection of
    (attribute, value) pairs
  • Attribute types
  • Single-valued and multiple-valued attributes
  • Directory Information Base (DIB): the collection
    of all directory entries in an X.500 directory
    service
  • A unique name for each record is formed by a
    sequence of its naming attributes (Relative
    Distinguished Names, RDNs)
  • /C=NL/O=Vrije Universiteit/OU=Math. & Comp. Sc.
  • Directory Information Tree (DIT): listing RDNs in
    sequence leads to a hierarchy of the collection
    of directory entries

35
A Simple Example of an X.500 Directory Entry
36
Part of the DIT
  • Each node represents a directory entry and may
    also act as a directory in the traditional sense
    (may have children)
  • Use read to read a single record given its path
    name in the DIT
  • Use list to get the names of all the children of
    a node in the DIT

37
Two directory entries having Host_Name as RDN
38
X.500 Implementation
  • Similar to DNS, but
  • X.500 supports more lookup operations
  • answer = search("&(C=NL)(O=Vrije
    Universiteit)(OU=*)(CN=Main Server)")
  • But expensive: may access many DSAs
  • Directory Service Agent (DSA) ↔ name server
  • Naming domain ↔ zone
  • Each part of a partitioned DIT ↔ naming context ↔
    domain

39
Other Issues in Name Service
  • Name Services for Directories, Files
  • Locating Mobile Entities
  • Removing Unreferenced Entities

40
Mutual Exclusion
  • Ensure that concurrent processes make a
    serialized access to shared resources or data

41
Classification
  • Contention-based mutual exclusion
  • Centralized mutual exclusion
  • Distributed mutual exclusion
  • Timestamp Priority Schemes
  • Voting scheme (variant of timestamp priority
    schemes)
  • Token-based mutual exclusion
  • Ring structure
  • Tree structure
  • Broadcast structure

42
A Centralized Algorithm (I)
  • Process 1 asks the coordinator for permission to
    enter a critical region. Permission is granted.
  • Process 2 then asks permission to enter the same
    critical region. The coordinator does not reply.
  • When process 1 exits the critical region, it
    tells the coordinator, which then replies to 2

43
A Centralized Algorithm (II)
  • Characteristics
  • Guarantees mutual exclusion
  • Fair: requests are granted in the order in which
    they are received
  • No starvation
  • Can be used for more general resource allocation
  • Drawbacks
  • Single point of failure
  • How to distinguish a dead coordinator from
    permission denied?
  • Performance bottleneck
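The centralized algorithm can be sketched in a few lines. This is a single-machine model with a hypothetical `Coordinator` class; in a real system `request` and `release` would be messages, and "no reply" is how the coordinator makes a requester block.

```python
from collections import deque

class Coordinator:
    """Centralized mutual exclusion: the coordinator grants the CS to
    one process at a time and queues the rest (FIFO, so no starvation)."""
    def __init__(self):
        self.holder = None          # process currently in the CS
        self.waiting = deque()      # pending requests, in arrival order

    def request(self, pid):
        if self.holder is None:
            self.holder = pid
            return "GRANT"
        self.waiting.append(pid)
        return None                 # no reply: the requester blocks

    def release(self, pid):
        assert pid == self.holder
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder          # next process to receive GRANT, if any

c = Coordinator()
assert c.request(1) == "GRANT"      # process 1 enters
assert c.request(2) is None         # process 2 blocks
assert c.release(1) == 2            # coordinator now replies to 2
```

The drawbacks above are visible directly in the sketch: every call goes through one object (bottleneck, single point of failure), and a blocked requester cannot tell a dead coordinator from a deliberately withheld reply.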

44
TimeStamp Prioritized Schemes
  • Lamport's logical clocks totally order requests
    for entering the CS
  • A process requests the CS by broadcasting a
    timestamped REQUEST message to all other
    processes (including itself)
  • Each process maintains a queue of pending REQUEST
    messages arranged in ascending timestamp
    order
  • Once it receives a REQUEST message, a process
    inserts the message in its queue and sends a
    REPLY message to the requesting process
  • A process is allowed to enter its CS only if it
    has gathered all the REPLY messages and its
    request message is at the top of the request
    queue.
  • 3(N-1) messages for the completion of a CS

45
TimeStamp Prioritized Schemes (Cont.)
  • When exiting the CS, a process broadcasts a
    RELEASE message to every process
  • Once it receives a RELEASE message, a process
    removes the completed request from its request
    queue.
  • At that moment, if the process's own request is
    at the top of the request queue, it enters its CS
    provided that all REPLY messages have been
    received.
  • When receiving REQUEST, RELEASE, and REPLY
    messages, a process adjusts its logical clock
    accordingly

46
TimeStamp Prioritized Schemes Improved
  • When a process receives a REQUEST message
  • Not in the critical region AND does not want to
    enter → OK (REPLY)
  • Already in the critical region → no REPLY; queue
    the request
  • After it exits the CS, send the REPLY message
  • Wants to enter but has not yet done so → compare
    the timestamp of the incoming message with the
    one that it sent to everyone
  • The lowest one wins
  • If the incoming message is lower → OK (REPLY)
  • Otherwise → no REPLY; queue the request
  • 2(N-1) messages for the completion of a CS
  • REPLY and RELEASE are combined
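The receiver's decision in the improved scheme can be sketched as a single function. This is a sketch of just the three cases above (messaging and the deferred queue are abstracted away); ties on the timestamp are broken by process ID, which is what makes the priority total.

```python
def on_request(state, own_ts, own_id, req_ts, req_id):
    """Receiver's decision on an incoming REQUEST. state is one of:
    'RELEASED' (not in CS and does not want in), 'HELD' (in CS),
    'WANTED' (requested but not yet entered).
    Returns 'REPLY' (send OK now) or 'DEFER' (queue until CS exit)."""
    if state == "RELEASED":
        return "REPLY"
    if state == "HELD":
        return "DEFER"
    # WANTED: the lower (timestamp, id) pair wins the tie
    return "REPLY" if (req_ts, req_id) < (own_ts, own_id) else "DEFER"

print(on_request("WANTED", own_ts=8, own_id=0, req_ts=12, req_id=2))  # DEFER
print(on_request("WANTED", own_ts=8, own_id=0, req_ts=4, req_id=2))   # REPLY
```

Deferred requests are answered in a batch when the process leaves its CS, which is exactly how REPLY and RELEASE get combined into one message.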

47
TimeStamp Prioritized Schemes Improved (Cont.)
  • Two processes want to enter the same critical
    region at the same moment.
  • Process 0 has the lowest timestamp, so it wins.
  • When process 0 is done, it sends an OK also, so 2
    can now enter the critical region.

48
TimeStamp Prioritized Schemes (Cont.)
  • Characteristics
  • Guarantees mutual exclusion
  • No deadlock or starvation
  • Drawbacks
  • N points of failure
  • How to distinguish a dead participant from a
    refusal?
  • N points of bottleneck
  • Improvement
  • A simple majority of other processes → VOTING
  • Avoids N points of failure

49
Voting Schemes
  • As soon as a candidate has a majority of the
    votes → WIN
  • When a process receives a REQUEST message, it
    sends a REPLY (i.e., a vote) only if the process
    has not voted for any other candidate (requesting
    process)
  • Once a process has voted, it is not allowed to
    send any more REPLY messages until its vote has
    been returned (e.g., by a RELEASE message)
  • A candidate obtains permission to enter the CS
    when it has received a majority of the votes
  • Problem: deadlock (say two processes each win
    half of the votes)
  • Solved by changing the vote when a process sees a
    more attractive candidate (judged by timestamp or
    other criteria)
  • INQUIRE → RELINQUISH

50
Voting Schemes (Cont.)
  • Requires O(N) messages per CS entry
  • Plus some messages for deadlock avoidance
  • Reduce the message overhead by reducing the
    number of votes required to enter the CS
  • Each process i has a request set (quorum) Si, and
    a process needs the vote from every member of its
    request set to enter the CS
  • To ensure mutual exclusion, Si ∩ Sj ≠ ∅
  • It is possible for each quorum to be of size
    O(√N)
  • See Chapters 6 and 10 for further discussion
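One classical way to build pairwise-intersecting quorums of size about 2√N − 1 is the grid construction: arrange the N processes in a √N × √N grid and let each quorum be a process's row plus its column. The sketch below assumes N is a perfect square for simplicity.

```python
import math

def grid_quorum(i, n):
    """Quorum S_i = row ∪ column of process i in a k×k grid (k = √n).
    Any two quorums share at least one process, so mutual exclusion
    holds, and |S_i| = 2k - 1 ≈ 2√n."""
    k = math.isqrt(n)
    assert k * k == n, "sketch assumes n is a perfect square"
    r, c = divmod(i, k)
    row = {r * k + j for j in range(k)}
    col = {j * k + c for j in range(k)}
    return row | col

n = 9
quorums = [grid_quorum(i, n) for i in range(n)]
# every pair of quorums intersects -> the safety condition Si ∩ Sj ≠ ∅
assert all(q1 & q2 for q1 in quorums for q2 in quorums)
print(sorted(quorums[4]))  # [1, 3, 4, 5, 7]: process 4's row and column
```

Row i and column j always meet at the grid cell (i, j), which is why any two quorums intersect regardless of which processes built them.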

51
A Token Ring Algorithm
Not necessarily in process ID order
  • An unordered group of processes on a network.
  • A logical ring constructed in software.

52
A Token Ring Algorithm (Cont.)
  • Idea
  • Processes are connected in a logical ring
    structure
  • A token circulates in the ring
  • A process possessing the token is allowed to
    enter the CS
  • Finished with the CS → passes the token to the
    successor node
  • Advantages
  • Simple, deadlock-free, fair
  • The token can carry state information such as
    priority
  • Disadvantages
  • The token circulates in the ring even if no
    process wants to enter the CS
  • Results in unnecessary network traffic
  • A process must wait until the token arrives
    before entering the CS
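The circulation idea can be simulated in a few lines. This is a sketch with a hypothetical helper that models one token making rounds; real implementations also handle lost tokens and dead ring members.

```python
def token_ring(n, wanting, start=0, max_steps=None):
    """Pass the token around a ring of n processes. A process holding
    the token enters its CS if it wants to, then forwards the token to
    its successor. Returns the order of CS entries."""
    order = []
    pos = start
    pending = set(wanting)
    steps = 0
    while pending and steps < (max_steps or 2 * n):
        if pos in pending:
            order.append(pos)        # enter CS, then release
            pending.discard(pos)
        pos = (pos + 1) % n          # forward the token to the successor
        steps += 1
    return order

print(token_ring(5, wanting={3, 1, 4}, start=0))  # [1, 3, 4]
```

The output order follows ring position, not request time: that is the algorithm's notion of fairness, and also why an unlucky process may wait almost a full circulation.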

53
Comparison
54
Tree structure
  • Requires a process to explicitly request the
    token, and to only move the token if it knows of
    a pending request (unlike the ring)
  • Indefinite postponement and deadlock are risks →
    but for the tree structure, both are avoided
  • Imposes a hierarchical structure on the processors
    → need to maintain it
  • Root: the process owning the token
  • How to navigate a request to the token
  • There is a unique path between a process and the
    token holder
  • How to navigate the token to the next processor
    to enter the CS
  • A FIFO queue for pending requests is maintained
    by each node
  • The heads of the FIFO queues of all nodes form
    the global FIFO queue
  • Algorithm
  • Algorithm Lists 10.9, 10.10, 10.11, and 10.12
  • An example

55
Tree Structure (Cont.)
  • Each process has a FIFO request queue and a
    pointer to its immediate predecessor
  • When a process receives a request, it appends it
    to the queue
  • If the queue was empty and the process does not
    have the token, it requests the token from its
    predecessor
  • Otherwise, the token will arrive soon → no
    further action is taken
  • If a process has the token, but is not using it,
    and has a nonempty queue → it removes the first
    entry from the queue and sends the token to that
    process
  • This occurs when a request arrives, when the token
    arrives, or when the process releases the token
  • It also changes its pointer to the process to
    which it sent the token
  • If the queue is still not empty, the process will
    need to re-obtain the token, so it sends a request
    to the new token holder
  • If the process itself is the first entry in the
    FIFO queue, it enters the CS
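The per-node rules above can be sketched as a small class. This is a sketch of the tree-based (Raymond-style) token logic described here, with message delivery abstracted into a `send(dest, msg)` callback; the names are hypothetical.

```python
class RaymondNode:
    """One node in the tree-based token algorithm sketched above."""
    def __init__(self, pid, parent, has_token=False):
        self.pid = pid
        self.parent = parent      # pointer toward the token holder
        self.queue = []           # FIFO of pending requesters
        self.has_token = has_token

    def want_cs(self, send):
        return self._enqueue("self", send)

    def on_request(self, frm, send):
        return self._enqueue(frm, send)

    def on_token(self, send):
        self.has_token = True
        return self._try_grant(send)

    def _enqueue(self, who, send):
        if not self.queue and not self.has_token:
            send(self.parent, ("REQUEST", self.pid))  # ask predecessor once
        self.queue.append(who)
        return self._try_grant(send)

    def _try_grant(self, send):
        if not (self.has_token and self.queue):
            return None
        head = self.queue.pop(0)
        if head == "self":
            return "ENTER_CS"                 # own request is at the head
        self.has_token = False
        send(head, ("TOKEN",))
        self.parent = head                    # pointer follows the token
        if self.queue:                        # still have pending requests
            send(head, ("REQUEST", self.pid))
        return None

# Two nodes: A holds the token; B's predecessor pointer is A
msgs = []
send = lambda dest, msg: msgs.append((dest, msg))
a = RaymondNode("A", parent=None, has_token=True)
b = RaymondNode("B", parent="A")
b.want_cs(send)                 # B queues itself and asks A
a.on_request("B", send)         # A forwards the token, repoints to B
print(msgs)                     # [('A', ('REQUEST', 'B')), ('B', ('TOKEN',))]
print(b.on_token(send))         # ENTER_CS
```

Note how the parent pointer is rewritten every time the token moves: requests always travel along the unique path toward the current root.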

56
Tree Structure (Cont.)
[Figure: token-tree example — at T0, P4 requests the token; at T2, P3 requests]
57
Broadcast Structure
  • Proposed by Suzuki and Kasami
  • Uses group communication without being aware of
    the topology
  • Token
  • Token vector T: the number of completed CS
    entries by each process
  • Q: pending request queue
  • Each process P: a local sequence number and a
    sequence vector S
  • Local sequence number: the number of requests for
    the CS (attached to each REQUEST)
  • S: stores the highest sequence number of every
    process heard by P

58
Broadcast Structure (Cont.)
  • Pi requests the CS by broadcasting REQUEST (with
    its local sequence number seq)
  • Pj updates its sequence vector when receiving a
    REQUEST message from Pi → Sj[i] = max(Sj[i], seq)
  • If Pj holds an idle token (empty Q), Pj sends the
    token to Pi if Sj[i] = T[i] + 1 → Pi enters the
    CS when it receives the token
  • Upon completion of the CS,
  • Pi sets T[i] equal to Si[i]
  • Appends to Q all processes with Si[k] = T[k] + 1
    (where k ≠ i)
  • Pi removes the top entry from the request queue
    and sends it the token
  • If Q is empty, the token stays with Pi
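The rules above can be sketched as one process's token logic. This is a sketch of the Suzuki–Kasami bookkeeping (message delivery is abstracted away); the class name and method names are hypothetical.

```python
class SuzukiKasami:
    """Token logic of the Suzuki-Kasami algorithm for one process."""
    def __init__(self, n, pid, has_token=False):
        self.n, self.pid = n, pid
        self.S = [0] * n                 # highest request seq heard per process
        self.token = {"T": [0] * n, "Q": []} if has_token else None

    def request(self):
        """Broadcast REQUEST with a fresh local sequence number."""
        self.S[self.pid] += 1
        return ("REQUEST", self.pid, self.S[self.pid])

    def on_request(self, i, seq):
        """S[i] = max(S[i], seq); an idle token holder hands the token
        over when the request is outstanding: S[i] == T[i] + 1."""
        self.S[i] = max(self.S[i], seq)
        if self.token and not self.token["Q"] \
                and self.S[i] == self.token["T"][i] + 1:
            tok, self.token = self.token, None
            return ("TOKEN", i, tok)
        return None

    def release(self):
        """On CS exit: record the completion in T, enqueue outstanding
        requesters, and pass the token to the head of Q (if any)."""
        tok = self.token
        tok["T"][self.pid] = self.S[self.pid]
        for k in range(self.n):
            if k != self.pid and self.S[k] == tok["T"][k] + 1 \
                    and k not in tok["Q"]:
                tok["Q"].append(k)
        if tok["Q"]:
            nxt = tok["Q"].pop(0)
            self.token = None
            return ("TOKEN", nxt, tok)
        return None                      # token stays with this process

# P0 holds an idle token; P1 requests and receives it immediately
p0 = SuzukiKasami(3, 0, has_token=True)
p1 = SuzukiKasami(3, 1)
_, pid, seq = p1.request()
print(p0.on_request(pid, seq))  # ('TOKEN', 1, {'T': [0, 0, 0], 'Q': []})
```

The test S[k] == T[k] + 1 is what distinguishes a still-pending request from one that was already served, which is why the token never visits a process that no longer wants it.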

59
Broadcast Structure (Cont.)
[Figure: token (T, Q) and sequence vectors as processes request the CS]
t0: P1 holds the token and wants to enter the CS; it is OK for P1 to enter
t1: P2 wants to enter the CS
t2: P4 wants to enter the CS
60
Broadcast Structure (Cont.)
t3: P1 leaves the CS; update T and Q; send the token to P2
t4: P3 wants to enter the CS
61
Broadcast Structure (Cont.)
t5: P2 leaves the CS
62
Broadcast Structure (Cont.)
[Figure: sequence vectors Si, token vector T, and token queue Q]
  • No central controller; the management of the
    shared token is distributed
  • The contention for mutual exclusion is centrally
    serialized by the FIFO token queue
  • No deadlock or starvation

63
Leader Election
64
Overview
  • Elect a process as coordinator or initiator
  • Especially when the existing coordinator (leader)
    fails
  • Usually detected by time-out
  • Leader election criteria
  • Extrema finding: based on a global priority
  • Preference-based: vote for a leader based on
    personal preference (locality, reliability)
  • Leader election vs. mutual exclusion
  • See the 3rd paragraph of page 139
  • Leader election algorithms depend on the
    topological structure assumption of the process
    group

65
Bully Algorithm
  • Assumptions
  • Complete topology: processes can reach each other
    in one message
  • Each process has a unique number (process ID)
  • Election: locate the process with the highest ID
    and designate it as the new coordinator
  • Every process knows the process ID of every other
    process
  • But does not know which ones are currently up or
    down
  • Reliable network; only the processes may fail
  • Detect the failure of a process by time-out
  • A failed process can rejoin the group by forcing
    an election upon its recovery

66
Bully Algorithm (Cont.)
  • When any process (say P) notices that the
    coordinator is no longer responding to requests,
    it initiates an election
  • P sends an ELECTION message to all processes with
    higher IDs
  • If no one responds, P wins the election and
    becomes coordinator
  • If one of the higher processes answers, it takes
    over. P's job is done.
  • A process can get an ELECTION message from
    processes with lower IDs
  • It sends OK back to the sender to indicate that
    it is alive and will take over
  • It holds an election, unless it is already
    holding one
  • All processes give up but one, and that one is
    the new coordinator
  • It sends messages to all processes to tell them
    that it is the new coordinator
  • If a process that was previously down comes back
    up, it holds an election
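The outcome of the bully algorithm can be sketched as a small recursive function. This is a sketch that models only who ends up as coordinator, not the message exchange; the recursion stands in for the cascade of elections started by higher-ID processes.

```python
def bully_election(alive, initiator):
    """alive: set of live process IDs; initiator notices the
    coordinator is down and starts an election. The highest live
    ID always wins."""
    assert initiator in alive
    higher = [p for p in alive if p > initiator]
    if not higher:
        return initiator        # nobody answered: initiator wins
    # some higher process answers OK and takes over; it then holds
    # its own election, modeled here by recursing upward
    return bully_election(alive, min(higher))

# Process 7 is down, so process 6 is the highest live ID
print(bully_election({0, 1, 2, 4, 5, 6}, initiator=4))  # 6
```

Whatever process starts the election, the result is the same: the live process with the highest ID "bullies" everyone else into submission.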

67
Bully Algorithm (Cont.)
  • Process 4 holds an election
  • Process 5 and 6 respond, telling 4 to stop
  • Now 5 and 6 each hold an election

68
Bully Algorithm (Cont.)
69
Ring Algorithm
  • Assumptions
  • Processes are physically or logically ordered
  • Each process knows the current members of the
    ring and their order
  • Election cycle: when any process notices that the
    coordinator is no longer responding to requests,
    it initiates an election
  • An ELECTION message containing its own ID is sent
    to the successor
  • If the successor is down, it is skipped over
    until a running process is located
  • Any process receiving the ELECTION message adds
    its ID to the message and resends it to its
    successor
  • Finally, the ELECTION message gets back to the
    initiator
  • Coordinator announcement cycle
  • A COORDINATOR message is circulated once again
  • It tells who the coordinator is (the one with the
    highest ID) and who the members of the new ring
    are
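The election cycle can be sketched as one pass around the ring. This is a sketch: the ELECTION message is modeled as a list that collects live IDs, and dead successors are simply skipped.

```python
def ring_election(ring, alive, initiator):
    """ring: process IDs in topological order (not necessarily ID
    order). The ELECTION message collects live IDs around the ring;
    the highest collected ID becomes coordinator."""
    n = len(ring)
    start = ring.index(initiator)
    collected = []
    i = start
    while True:
        p = ring[i]
        if p in alive:
            collected.append(p)   # each live process appends its own ID
        i = (i + 1) % n           # dead successors are skipped over
        if ring[i] == initiator:  # message is back at the initiator
            break
    return max(collected), collected

# The ring from the figure above: 5 initiates, everyone is alive
coord, msg = ring_election([5, 6, 0, 1, 2, 3, 4],
                           alive={0, 1, 2, 3, 4, 5, 6}, initiator=5)
print(coord, msg)  # 6 [5, 6, 0, 1, 2, 3, 4]
```

The second pass (the COORDINATOR message announcing the winner and the new membership) would traverse the same ring once more with the result.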

70
Ring Algorithm (Cont.)
5,6,0,1
5,6,0,1,2
5,6,0,1,2,3
5,6,0,1,2,3,4
The topological order may not be the same as the
process ID order
71
Ring Algorithm (Cont.)
  • Improvement: Figure 4.22 (p. 141)
  • When a process sends a message, it simply
    forwards the larger of its ID and the received
    value to the successor
  • A process that is already involved in the
    election does not need to forward a message
    unless the message contains a value higher than
    its ID
  • Time and message complexity
  • Only one initiator: O(N) (time and messages)
  • N simultaneous election initiators
  • Without optimization: O(N²) messages
  • With optimization: O(N) or O(N²) messages,
    depending on whether the ring is arranged in
    ascending or descending order of node IDs

72
Ring Algorithm (Cont.)
  • Further improvement: O(N log N)
  • Idea: disable elections initiated by
    lower-priority nodes as much as possible,
    irrespective of the topological order of nodes
  • Compare a node's ID with those of its left and
    right neighbors
  • An initiator node remains active if its ID is
    higher than both neighbors'
  • Otherwise, it becomes passive and only relays
    messages
  • This effectively eliminates at least half of the
    active nodes in each round of message exchanges,
    reducing the O(N) rounds to O(log N)
  • Requires a bidirectional ring
  • For a unidirectional ring → buffer two
    consecutive messages before a node is determined
    to be in active or passive mode

73
Tree Topologies
  • Dynamically build a minimum-weight spanning tree
    (MST) for a network of N nodes
  • If all edges in a connected graph have unique
    weights → unique MST
  • Leader election and building an MST can be
    reduced to each other
  • Gallager, Humblet, and Spira approach [GHS83]
  • Searching and combining
  • Merge fragments
  • Fragment: a minimum-weight subtree of the final
    MST
  • Bottom-up, starting from single nodes
  • Each fragment finds the minimum-weight outgoing
    edge of the fragment and uses it to join with a
    node in a different fragment
  • The new fragment is still of minimum weight

74
Tree Topologies (Cont.)
  • Tree topology and leader election
  • The last node that merges and yields the final
    MST can be the leader
  • Electing a leader after an MST has been
    constructed
  • An initiator broadcasts a Campaign-For-Leader
    (CFL) message, which carries a logical timestamp,
    to all nodes along the MST
  • When the message reaches a leaf, it replies with
    a Voting (V) message to its parent
  • A parent will send the voting message to its
    parent after all of its children's votes have
    been collected
  • Once a node finishes its reply, it is done → it
    waits for the announcement of the new leader and
    accepts no further CFL
  • For multiple initiators → the lowest timestamp
    wins

75
Tree Topologies (Cont.)
  • Build a spanning tree by message flooding
  • Robust in a failure-prone network
  • Idea
  • Every node repeats a received message (which it
    has not seen yet) to all neighboring nodes,
    except the sender
  • Eventually every node will be reached and a
    spanning tree is formed
  • Steps
  • Initiators flood the system with CFL messages
  • As messages flood, a spanning forest with each
    tree rooted at an initiator is built up.
  • Reply messages are sent by backtracking the path
    from leaf to root
  • For multiple initiators, the lowest timestamp
    wins