State Machines - PowerPoint PPT Presentation

Provided by: camu7
1
State Machines
  • Sabina Petride

2
General Problems
  • Consensus
  • a particular problem
  • algorithms and different formulations
  • correctness and time analysis
  • Application To Data Replication
  • replica coordination
  • group membership reintegration
  • unique identifiers using logical/real clocks

3
The Paxos Parliament And The Consensus Problem
  • The Paxos Parliament
  • determine the law of the land, defined by the
    sequence of decrees passed
  • each legislator had his own ledger with decrees,
    their unique number and their contents
  • entries in ledgers could not be modified or
    deleted
  • legislators could leave the court for very long
    periods of time and return later
  • communication only by messengers (who could lose
    a message or deliver it several times)
  • Requirements
  • consistency of the ledgers
  • progress to ensure that some decree will
    eventually be passed
  • The Synod
  • basically, the same problem as with the
    Parliament, just that a single decree had to be
    passed
  • the group of priests/legislators asked to vote
    for a decree was called the quorum

4
  • This can be modelled as a consensus problem:
  • Agreement: no two ledgers should contain
    different decrees with the same number (no
    conflicts among ledgers)
  • Validity: any decree should be written in the
    standard form
  • Termination (the progress condition)
  • Agreement and validity are guaranteed, and
    progress is possible, if three conditions are
    satisfied:
  • B1: Each ballot has a unique number.
  • B2: The quorums of any two ballots have at least
    one priest in common.
  • B3: For every ballot, if any priest in its quorum
    has voted in an earlier ballot, then the decree
    equals the decree of the latest of those earlier
    ballots.
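
A minimal sketch of B1 to B3 as checks over a finite set of ballots. The `Ballot` record and the function names are illustrative assumptions, not part of the slides; "voted in an earlier ballot" is read as "is among the voters of a ballot with a smaller number".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ballot:
    number: int        # ballot number
    decree: str        # decree being voted on
    quorum: frozenset  # priests asked to vote
    voters: frozenset  # priests who actually voted

def b1(ballots):
    # B1: ballot numbers are unique.
    nums = [b.number for b in ballots]
    return len(nums) == len(set(nums))

def b2(ballots):
    # B2: any two quorums share at least one priest.
    return all(a.quorum & b.quorum for a in ballots for b in ballots)

def b3(ballots):
    # B3: if a quorum member voted earlier, the decree must equal
    # the decree of the latest such earlier ballot.
    for b in ballots:
        earlier = [e for e in ballots
                   if e.number < b.number and e.voters & b.quorum]
        if earlier:
            latest = max(earlier, key=lambda e: e.number)
            if latest.decree != b.decree:
                return False
    return True

good = [
    Ballot(1, "A", frozenset("pq"), frozenset("p")),
    Ballot(2, "A", frozenset("qr"), frozenset("qr")),
]
bad = good + [Ballot(3, "B", frozenset("pr"), frozenset())]
```

Here `good` satisfies all three conditions, while `bad` violates B3: priest r voted for decree "A" in ballot 2, so ballot 3 may not carry "B".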

5
Assumptions About The System
  • partially synchronous distributed system in which
    processes take actions within time l and messages
    are delivered within time d
  • the system does not necessarily exhibit this
    normal timing behavior
  • each process has a direct communication channel
    to every other process
  • allowed failures:
  • timing failures (the bounds l and d can
    occasionally be exceeded)
  • loss, duplication or reordering of messages
  • process stopping
  • some stable storage is needed
  • process recovery is considered

6
The Synod Algorithm
  • (1) Priest p chooses a new ballot number b. p
    sends message NextBallot(b) to some set of
    priests.
  • (2) When a priest q receives a NextBallot(b), he
    checks the notes in the back of his ledger and
    determines the vote v with the largest ballot
    number less than b that he has voted for. If such
    a vote doesn't exist, then a default value
    null(q) is used.
  • q sends p a LastVote(b,v) message.
  • (3) After p receives a LastVote(b,v) message
    from all the priests in a majority set Q, he
    initiates a new ballot with number b, quorum Q,
    and decree d chosen according to B3.
  • p records the new ballot and sends
    BeginBallot(b,d) to Q.
  • (4) If q receives BeginBallot(b,d) and decides to
    vote, then he records the vote in the back of his
    ledger and sends Voted(b,q) to p.

7
  • (5) If p has received a Voted(b,q) from all q in
    Q, then he writes d in his ledger and sends
    Success(d) to all priests.
  • (6) After receiving Success(d), a priest enters d
    in his ledger.
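
Steps (1) to (6) can be sketched as a failure-free, in-memory run. This is a simplification under stated assumptions: no messages are lost, every quorum member decides to vote, and the names `Priest` and `run_ballot` are hypothetical.

```python
class Priest:
    def __init__(self, name):
        self.name = name
        self.votes = {}     # back of the ledger: ballot number -> decree
        self.ledger = None  # decree entered after Success(d)

    def on_next_ballot(self, b):
        # Step 2: latest vote with ballot number below b, or None
        # (None stands in for the default value null(q)).
        prior = [n for n in self.votes if n < b]
        if not prior:
            return None
        n = max(prior)
        return (n, self.votes[n])

    def on_begin_ballot(self, b, d):
        # Step 4: record the vote, answer Voted(b, q).
        self.votes[b] = d
        return (b, self.name)

def run_ballot(b, proposed, quorum, everyone):
    last_votes = [q.on_next_ballot(b) for q in quorum]          # steps 1-2
    cast = [v for v in last_votes if v is not None]
    # Step 3: B3 forces the decree of the latest earlier vote, if any.
    d = max(cast, key=lambda v: v[0])[1] if cast else proposed
    voted = [q.on_begin_ballot(b, d) for q in quorum]           # step 4
    if len(voted) == len(quorum):                               # step 5
        for q in everyone:
            q.ledger = d                                        # step 6
    return d

p, q, r = Priest("p"), Priest("q"), Priest("r")
first = run_ballot(1, "tax olives", [p, q], [p, q, r])
second = run_ballot(2, "ban wine", [q, r], [p, q, r])
```

In the second ballot q reports his vote from ballot 1, so the decree chosen is again "tax olives" even though "ban wine" was proposed.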
8
Notes on The Synod Algorithm
  • to maintain B1, each ballot has to receive a
    unique number; this can be done by
  • having each priest note the ballots in his
    ledger
  • partitioning the set of possible ballot numbers
    among the priests
  • (later we will talk about different
    implementations)
  • a priest should not cast a vote after receiving
    BeginBallot(b,d) if he has already sent a
    LastVote(b',v') message for some other ballot b'
    with v'.bal < b < b'.
  • It follows that
  • a priest must record
  • the number of every ballot he has initiated
  • every vote he has cast
  • every LastVote message he has sent
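
A small sketch of the voting restriction, assuming it is read as a promise: having sent LastVote(b',v'), a priest does not vote in any ballot b with v'.bal < b < b'. The function name and the tuple encoding of the recorded messages are illustrative assumptions.

```python
def may_vote(b, last_votes_sent):
    """Return True if voting in ballot b breaks no earlier promise.

    last_votes_sent: iterable of (b_prime, v_bal) pairs, one per
    LastVote message the priest has recorded; v_bal is None when
    the reported vote was the default null vote.
    """
    for b_prime, v_bal in last_votes_sent:
        low = float("-inf") if v_bal is None else v_bal
        if low < b < b_prime:
            return False
    return True
```

This is why the priest must record every LastVote message he has sent: without that record the check cannot be made after a long absence.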

9
Stating The Problem in Terms of State Machines
  • a state machine consists of:
  • state variables (encoding the state)
  • commands (which transform the state)
  • each command is implemented by a deterministic
    program, and its execution is atomic with respect
    to other commands
  • clock I/O automaton: a specific state machine
    devised by Lynch and Tuttle for modelling,
    verifying, and analyzing time-based systems
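
As a toy illustration of "state variables plus deterministic commands", here is a minimal sketch; the `total` variable and the `add`/`reset` commands are invented for the example.

```python
def execute(state, command):
    # Each command is a deterministic function of (state, argument).
    name, arg = command
    if name == "add":
        return {**state, "total": state["total"] + arg}
    if name == "reset":
        return {**state, "total": 0}
    raise ValueError(f"unknown command: {name}")

def run(commands):
    # Commands are applied one at a time, so each is atomic
    # with respect to the others.
    state = {"total": 0}
    for c in commands:
        state = execute(state, c)
    return state

log = [("add", 2), ("add", 3), ("reset", None), ("add", 4)]
```

Because every command is deterministic, two runs over the same `log` always end in the same state.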

10
Clock I/O Automata
  • An I/O time automaton A consists of:
  • a set of states states(A)
  • a nonempty set start(A) of start states
  • a set of actions acts(A), partitioned into input,
    output, internal, and time-passage actions and
    specified in the signature of A
  • a transition relation steps(A), a subset of
    states(A) × acts(A) × states(A).
  • No input action can be blocked: for every state s
    and every input action a, there is a state s'
    such that (s,a,s') is a step of A.
  • A time-passage action ν(t) models the passage of
    real time t.
  • A special real variable Clock is included in each
    state to model the local clock of the process. It
    is not necessary that Clock simulates the real
    time.
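
The "no input action can be blocked" condition can be checked mechanically for a finite automaton. A minimal sketch, with a made-up two-state automaton; time-passage actions and the Clock variable are left out.

```python
def input_enabled(states, input_actions, steps):
    # For every state s and input action a there must be some s2
    # with (s, a, s2) in the transition relation.
    return all(
        any((s, a, s2) in steps for s2 in states)
        for s in states
        for a in input_actions
    )

states = {"idle", "busy"}
inputs = {"req"}
steps = {
    ("idle", "req", "busy"),
    ("busy", "req", "busy"),
    ("busy", "done", "idle"),  # "done" is an output action here
}
blocking_steps = steps - {("busy", "req", "busy")}
```

With `steps` the automaton accepts `req` in both states; with `blocking_steps` the input is blocked in state `busy`, so the check fails.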

11
The Synod Algorithm In Terms Of Clock GTA
  • The Distributed Setting
  • relation with the Paxos problem:
  • priest/process
  • law book/state
  • passing a decree/executing a command
  • complete network of n processes with unique
    identifiers in a totally ordered set known by all
    processes
  • clock GT automata are used to model both
    processes and channels; each automaton has a
    local clock, and the local clock of a channel is
    used to detect timing failures
  • The Algorithm
  • idea: propose values until one of them is
    accepted by a majority of processes
  • any process may propose a value by initiating a
    round for that value; it becomes the leader of
    that round
  • the leader and the other processes are agents

12
  • (1) The leader sends a Collect message to all
    agents.
  • (2) If an agent receives a Collect message and it
    is already committed to a round with a bigger
    round number, it sends an OldRound message;
    otherwise, it sends a Last message with its
    information about rounds previously conducted.
  • (3) If the leader receives more than n/2 Last
    messages, it initiates a new round and sends to
    all agents a Begin message.
  • (4) If an agent receives the Begin message and is
    committed to a bigger round, it sends an OldRound
    message; otherwise, it accepts the value proposed
    and responds with an Accept message.
  • (5) If the leader receives more than n/2 Accept
    messages, then the round is successful and its
    own output value is the value of the round.
  • (6) The leader broadcasts the reached decision.
  • Notes
  • the set of agents from which the Last (Accept)
    messages are received is the info-quorum
    (accepting-quorum)
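
A minimal single-process sketch of one round, steps (1) to (5). Two points are assumptions not spelled out on the slide: in step (3) the leader adopts the value of the latest round reported in the Last messages (falling back to its own initial value), and an agent refuses only when committed to a strictly bigger round. Class and function names are hypothetical, and message passing is replaced by direct calls.

```python
class Agent:
    def __init__(self):
        self.commit = 0        # highest round number committed to
        self.last_round = 0    # last round in which a value was accepted
        self.last_value = None

    def on_collect(self, r):
        if self.commit > r:
            return ("OldRound", self.commit)
        self.commit = r
        return ("Last", self.last_round, self.last_value)

    def on_begin(self, r, v):
        if self.commit > r:
            return ("OldRound", self.commit)
        self.last_round, self.last_value = r, v
        return ("Accept", r)

def run_round(r, initial, agents):
    n = len(agents)
    replies = [a.on_collect(r) for a in agents]            # steps 1-2
    lasts = [m for m in replies if m[0] == "Last"]
    if len(lasts) <= n / 2:                                # no info-quorum
        return None
    _, _, val = max(lasts, key=lambda m: m[1])             # step 3
    v = val if val is not None else initial
    replies = [a.on_begin(r, v) for a in agents]           # step 4
    accepts = [m for m in replies if m[0] == "Accept"]
    return v if len(accepts) > n / 2 else None             # step 5

agents = [Agent(), Agent(), Agent()]
r1 = run_round(1, "x", agents)
r2 = run_round(2, "y", agents)
```

Round 2 returns "x" rather than "y": a majority has already accepted "x" in round 1, so the later leader must propose it again.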

13
Implementation(1)
14
Implementation(2)
  • BPLEADER(i) (clock GTA running the leader at
    process i)
  • Input: NewRound(i), Leader(i), NotLeader(i),
    Receive(m)(j,i), m ∈ {Last, Accept, Success,
    OldRound}
  • Output: Send(m)(j,i), m ∈ {Collect, Begin},
    BeginCast(i), RndSuccess(v)(i)
  • Internal: Collect(i), GatherLast(i), ...
  • Time-passage: ...
  • BPAGENT(i) (clock GTA running an agent at process
    i)
  • Input: Receive(m)(j,i), m ∈ {Collect, Begin}
  • Output: Send(m)(j,i), m ∈ {Last, Accept, OldRound}
  • Internal: LastAccept(i), Accept(i), ...
  • Time-passage: ...

15
Correctness Proof
  • execution fragment: a sequence of states and
    actions in which every step is allowed by the
    automaton
  • problem specification: a set of allowable
    behaviors (behavior: the sequence of external
    actions of an execution fragment)
  • an automaton A solves the problem if each of its
    behaviors is contained in the problem
    specification
  • safety properties: must hold in every state of a
    computation
  • liveness properties: specify events that must
    eventually be performed

16
Safety/Liveness Properties
  • safety property: in any execution of the system,
    agreement and validity are guaranteed
  • liveness property: under some conditions,
    termination is guaranteed
  • an execution fragment is nice if
  • no loss or duplication takes place
  • at each time-passage action the local clock is
    incremented by the elapsed real time
  • every process is either stopped or alive
  • a majority of processes are alive
  • Theorem: If a nice execution fragment starts in a
    reachable state, has a unique leader, and
    lasts for more than 16l + 8nl + 9d time units,
    then by time 16l + 8nl + 9d the leader has
    reached a decision.
  • Note: proofs are based on invariants.

17
Other Results On Time Performance
  • If a nice execution fragment starts in a
    reachable state and lasts more than
    24l + 10nl + 13d, then
  • the leader decides by time 21l + 8nl + 11d and at
    most 8n messages are sent
  • all alive processes decide by time
    24l + 10nl + 13d and at most 2n additional
    messages are sent
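
A worked instance of the bounds, reading each expression as a sum of terms in l (step bound), d (delivery bound) and n (number of processes). The sample numbers below are made up for illustration.

```python
def unique_leader_bound(l, d, n):
    # Bound from the theorem for a fragment with a unique leader.
    return 16 * l + 8 * n * l + 9 * d

def leader_decides_by(l, d, n):
    return 21 * l + 8 * n * l + 11 * d

def all_decide_by(l, d, n):
    return 24 * l + 10 * n * l + 13 * d

def max_messages(n):
    # At most 8n messages until the leader decides, plus 2n more
    # until all alive processes decide.
    return 8 * n + 2 * n

# Hypothetical system: n = 5 processes, l = 1 ms, d = 10 ms.
bounds = (
    unique_leader_bound(1, 10, 5),  # 16 + 40 + 90  = 146 ms
    leader_decides_by(1, 10, 5),    # 21 + 40 + 110 = 171 ms
    all_decide_by(1, 10, 5),        # 24 + 50 + 130 = 204 ms
)
```

The bounds grow linearly in n through the nl terms, and the message count is linear in n as well (at most 50 messages for n = 5).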

18
Generalization Of The Synod Protocol: MULTIPAXOS
  • consensus has to be reached on a sequence of
    values
  • for each value we run BASICPAXOS
  • the automata used for each instance of the
    algorithm are like the automata in BASICPAXOS,
    except that an additional parameter (the index of
    the proposed value) is present in each action
  • concurrency: several leaders may concurrently
    initiate rounds, and these rounds are carried out
    concurrently
  • several leaders initiating values concurrently is
    an important difference between the Paxos
    algorithm and the three-phase commit protocol

19
Data Replication
  • problem: providing distributed and concurrent
    access to data objects
  • simple implementation: maintain the object at a
    single process accessed by multiple clients
  • some disadvantages:
  • does not scale well when the number of clients
    increases
  • not fault-tolerant
  • other solution: data replication
  • servers are replicated; each server runs the same
    state machine
  • clients make requests, which are redirected to
    specific servers

20
Replica Coordination(1)
  • Requirements
  • requests should be processed by state machines
    one at a time
  • the order of processing should be consistent with
    potential causality
  • outputs determined only by the sequence of
    requests, independent of time or any other
    activity in the system
  • Replica coordination
  • agreement: every nonfaulty state machine replica
    receives every request
  • order: every nonfaulty state machine replica
    processes the requests it receives in the same
    relative order
  • issues to be considered: fault-tolerance and
    reconfiguration
  • MULTIPAXOS: a possible solution to the problem

21
Replica Coordination(2): MULTIPAXOS For Replica
Coordination
  • each process in the system maintains a copy of
    the data object
  • a client requests an update operation:
  • a process proposes the operation in an instance
    of MULTIPAXOS
  • after some time, the update operation is the
    output value of that instance of MULTIPAXOS
  • the leader of the round updates its local copy;
    by correctness, all the alive processes
    update their copies, too
  • a report is given to the client
  • a client requests a read operation:
  • the request is immediately satisfied based on the
    local copy
  • Note: a majority is required to achieve
    consistency -> majority voting
  • a unique leader is required to achieve
    termination -> primary copy replication
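
A minimal sketch of the replica side, assuming each MULTIPAXOS instance is identified by an integer index and its output is a (key, value) update. Consensus itself is not modelled here: `learn` simply receives the decided output of an instance. All names are hypothetical.

```python
class Replica:
    def __init__(self):
        self.data = {}       # local copy of the data object
        self.decided = {}    # instance index -> decided update
        self.next_index = 0  # next instance to apply

    def learn(self, index, update):
        # Outputs may arrive out of order; apply them strictly by index
        # so every replica processes the same sequence.
        self.decided[index] = update
        while self.next_index in self.decided:
            key, value = self.decided[self.next_index]
            self.data[key] = value
            self.next_index += 1

    def read(self, key):
        # Reads are served immediately from the local copy.
        return self.data.get(key)

r1, r2 = Replica(), Replica()
r1.learn(0, ("x", 1))
r1.learn(1, ("x", 2))
r2.learn(1, ("x", 2))   # arrives early, held back
early = r2.read("x")
r2.learn(0, ("x", 1))
```

Both replicas end with the same copy even though r2 learned the outputs in the opposite order; before instance 0 arrives, r2 serves a read from its older local copy.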

22
Replica Coordination(3): Order and Stability
  • unique identifiers for requests (total order)
  • implementation: a replica next processes the
    stable request with the smallest unique
    identifier (a request is stable once no request
    from a correct client with a lower uid can be
    subsequently delivered to that state machine)
  • using logical clocks to ensure order and
    stability:
  • each process has a local counter
  • the local counter is incremented after each event
    at that process
  • each message sent is timestamped with the local
    clock
  • upon receipt of a message, the local clock of the
    receiver becomes 1 + max(timestamp, local clock)
  • a uid for each event is obtained by appending a
    fixed-length bit string (encoding the process id)
    to the counter value of the process where the
    event takes place
  • using real clocks to ensure order and stability:
  • assumptions:
  • clocks are synchronized to within less than the
    minimum message delivery time
  • a request r will be received by every correct
    process no later than uid(r) + Δ
  • stability test: a request r is stable at a state
    machine if the local clock reads time t and
    t > uid(r) + Δ
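
A short sketch of both schemes. For logical clocks it follows the counter rules listed above; the 8-bit process id width is an arbitrary choice. For real clocks it assumes the stability test reads t > uid(r) + Δ for some fixed delay bound Δ (`delta` below); that reading and all names are assumptions.

```python
class LogicalClock:
    def __init__(self, pid, pid_bits=8):
        self.pid = pid
        self.pid_bits = pid_bits
        self.counter = 0

    def send(self):
        # Sending is an event: advance the counter, then timestamp.
        self.counter += 1
        return self.counter

    def receive(self, timestamp):
        self.counter = 1 + max(timestamp, self.counter)
        return self.counter

    def uid(self):
        # Counter value with the fixed-length process id appended.
        return (self.counter << self.pid_bits) | self.pid

def is_stable_real_clock(uid_r, local_time, delta):
    # Real-clock stability test under the assumed reading.
    return local_time > uid_r + delta

a, b = LogicalClock(pid=1), LogicalClock(pid=2)
ts = a.send()
b.receive(ts)
```

After the exchange `a.uid()` is smaller than `b.uid()`, so the send is ordered before the receive, consistent with potential causality; the appended process id breaks ties between equal counter values.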

23
Replica Coordination(4): Reconfiguration
  • at time t there are P(t) processes, F(t) of them
    faulty
  • necessary condition for correct output:
  • P(t) > 2F(t) if Byzantine failures are possible
  • P(t) > F(t) if only fail-stop failures
  • system described by 3 sets: clients (C), state
    machines (S), and output devices (O);
    information about them is stored in state
    variables and changed by commands
  • C and O make periodic queries -> better to share
    processors
  • messages sent by S always contain information
    about future reconfiguration -> permanent
    communication S <-> C and S <-> O
  • requests to change the configuration of the
    system are made by a failure/recovery detector
    mechanism

24
Replica Coordination(6): Integrating A Repaired
Object
  • goal: integrate element e at request r
  • notation: e[r] is the state a non-faulty system
    element e should be in after processing all the
    requests up to r
  • if processors are fail-stop and logical clocks
    are implemented, then the cooperation of only one
    state machine replica is needed (if the sm has
    not failed, then it is correct, and because of
    consensus among replicas, its information on the
    system is correct and complete with respect to
    other sm) -> the sm used should have access to
    enough information
  • implementation: e[r] is sent to e before the
    output produced by processing any request with
    uid larger than uid(r)
  • e in O: e[r] is usually device-specific setup
    information
  • can be stored in state variables of sm
  • e in C: e[r] is usually based on sensor values
    read
  • use information from C to sm

25
Replica Coordination(7): Integrating A Repaired
State Machine
  • try to use the same algorithm: sm sends to e the
    values of all its state variables before the
    output produced by processing any request with
    uid larger than uid(r)
  • problem: some client request might be received by
    sm after sending e[r], but delivered to e before
    its repair
  • solution: sm must relay to e requests received
    from clients
  • for how long? as soon as e has received a request
    directly from a client c, requests from the same
    c with larger uid need not be relayed to e
  • so, e should inform sm of the uid of requests
    received directly from c
  • algorithm
  • (1) sm sends e the values of its state
    variables and copies of pending requests
  • (2) sm sends to e every subsequent
    request r received from client c s.t.
    uid(r) < uid(r_c) (r_c is the first request e has
    directly received from c after e restarted)
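
The relay rule of step (2) can be sketched as the bookkeeping sm keeps while e is being integrated. This is a sketch under assumptions: uids are comparable integers, e reports to sm the uid of the first request it received directly from each client, and the `Relay` class with its method names is hypothetical.

```python
class Relay:
    """State kept by sm while integrating the repaired replica e."""

    def __init__(self):
        self.first_direct_uid = {}  # client -> uid of r_c, as reported by e

    def e_reports_direct(self, client, uid):
        # Only the first direct request from each client matters.
        self.first_direct_uid.setdefault(client, uid)

    def should_relay(self, client, uid):
        # Relay r from client c while uid(r) < uid(r_c), or while e has
        # not yet reported any direct request from c.
        rc = self.first_direct_uid.get(client)
        return rc is None or uid < rc

relay = Relay()
before_report = relay.should_relay("c", 5)
relay.e_reports_direct("c", 7)
```

Once e has reported uid 7 for client c, sm still relays a late request with uid 5 from c but stops relaying requests with uid 7 or larger, since e receives those directly.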