State Machines - PowerPoint PPT Presentation

About This Presentation

Title:

State Machines

Slides: 26

Provided by: camu7

Learn more at: http://www.cs.cornell.edu

Transcript and Presenter's Notes

Title: State Machines

1
State Machines

Sabina Petride

2
General Problems

Consensus
a particular problem
algorithms and different formulations
correctness and time analysis
Application To Data Replication
replica coordination
group membership reintegration
unique identifiers using logical/real clocks

3
The Paxos Parliament And The Consensus Problem

The Paxos Parliament
determine the law of the land, defined by the
sequence of decrees passed
each legislator had his own ledger with decrees,
their unique number and their contents
entries in ledgers could not be modified or
deleted
legislators could leave the court for very long
periods of time and return later
communication only by messengers (who could lose a
message or deliver it many times)
Requirements
consistency of the ledgers
progress to ensure that some decree will
eventually be passed
The Synod
basically, the same problem as with the
Parliament, just that a single decree had to be
passed
the group of priests/legislators asked to vote
for a decree was called the quorum

This can be modelled as a consensus problem
Agreement: no two ledgers should contain different decrees with the same number (no conflicts among ledgers)
Validity: any decree should be written in the standard form
Termination (the progress condition)
Agreement and validity are guaranteed, and progress is possible, if three conditions are satisfied:
B1: Each ballot has a unique number.
B2: The quorums of any two ballots have at least one priest in common.
B3: For every ballot, if any priest in its quorum has voted in an earlier ballot, then the decree of the ballot equals the decree of the latest of those earlier ballots.
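Conditions B1 to B3 can be checked mechanically on a finite set of ballots. The following is a minimal sketch, not part of the original slides; the `Ballot` record and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Ballot:                      # hypothetical record, not from the slides
    number: int                    # ballot number (B1 requires uniqueness)
    decree: str                    # decree voted on in this ballot
    quorum: FrozenSet[str]         # priests asked to vote
    voters: FrozenSet[str]         # priests who actually cast a vote


def satisfies_b1_b3(ballots: List[Ballot]) -> bool:
    numbers = [b.number for b in ballots]
    # B1: ballot numbers are unique.
    if len(set(numbers)) != len(numbers):
        return False
    for b in ballots:
        # B2: the quorums of any two ballots share at least one priest.
        if any(not (b.quorum & other.quorum) for other in ballots):
            return False
        # B3: find earlier ballots in which some priest of b's quorum voted.
        earlier = [e for e in ballots
                   if e.number < b.number and (e.voters & b.quorum)]
        if earlier:
            latest = max(earlier, key=lambda e: e.number)
            if b.decree != latest.decree:
                return False
    return True
```

For example, two ballots numbered 1 and 2 with overlapping quorums and the same decree pass the check, while changing the decree of ballot 2 after a shared priest voted in ballot 1 violates B3.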

5
Assumptions About The System

partially synchronous distributed system in which processes take actions within time l and messages are delivered within time d
the system does not necessarily exhibit this normal timing behavior
each process has a direct communication channel to every other process
allowed failures
timing failures (the bounds l and d can occasionally be exceeded)
loss, duplication or reordering of messages
process stopping
some stable storage is needed
process recovery is considered

6
The Synod Algorithm

(1) Priest p chooses a new ballot number b. p sends message NextBallot(b) to some set of priests.
(2) When a priest q receives a NextBallot(b), he checks the notes in the back of his ledger and determines the vote v with the largest ballot number less than b that he has voted for. If such a vote doesn't exist, then a default value null(q) is used. q sends p a LastVoted(b,v) message.
(3) After p receives a LastVoted(b,v) message from all the priests in a majority set Q, he initiates a new ballot with number b, quorum Q, and decree d chosen according to B3. p records the new ballot and sends BeginBallot(b,d) to Q.
(4) If q receives BeginBallot(b,d) and decides to vote, then he records the vote in the back of his ledger and sends Voted(b,q) to p.

7
(5) If p has received a Voted(b,q) from all q in Q, then he writes d in his ledger and sends Success(d) to all priests.
(6) After receiving Success(d), a priest enters d in his ledger.
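Step (3) is where B3 is enforced: the initiating priest derives the decree from the LastVoted replies of the quorum. A minimal sketch of that choice follows; the function name `choose_decree` and the tuple encoding of a vote are assumptions for illustration, with `None` standing in for the default null(q) vote.

```python
from typing import List, Optional, Tuple

# A vote is (ballot_number, decree); None models the default null(q) vote.
Vote = Optional[Tuple[int, str]]


def choose_decree(last_votes: List[Vote], own_decree: str) -> str:
    """Pick the decree of a new ballot from the quorum's LastVoted replies."""
    real_votes = [v for v in last_votes if v is not None]
    if not real_votes:
        # No priest in the quorum voted before: B3 leaves the choice free.
        return own_decree
    # Otherwise B3 forces the decree of the latest earlier ballot.
    _, decree = max(real_votes, key=lambda v: v[0])
    return decree
```

With replies `[(3, "a"), None, (5, "b")]` the priest must propose "b", whatever decree he originally wanted.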
8
Notes on The Synod Algorithm

to maintain B1, each ballot has to receive a unique number; this can be done by
having each priest note the ballots in his ledger
partitioning the set of possible ballot numbers among the priests
(later we will talk about different implementations)
a priest should not cast a vote after receiving BeginBallot(b,d) if he has already sent a LastVoted(b',v) message for some other ballot b' and v.bal < b < b'
It follows that a priest must record
the number of every ballot he has initiated
every vote he has cast
every LastVoted message he has sent
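The record-keeping described above can be sketched as a small ledger object for a voting priest. This is an illustrative simplification, not the slides' implementation: the class `PriestLedger` and its method names are hypothetical, it remembers only the largest ballot number it has replied to, and it refuses every smaller ballot, which is stricter than the rule requires but still safe. It also always votes when allowed, whereas step (4) leaves that decision to the priest.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PriestLedger:                       # hypothetical name
    votes: Dict[int, str] = field(default_factory=dict)  # ballot -> decree
    promised: int = -1    # largest ballot number answered with LastVoted

    def last_vote_before(self, b: int) -> Optional[Tuple[int, str]]:
        """Vote with the largest ballot number below b, or None (null vote)."""
        earlier = [n for n in self.votes if n < b]
        if not earlier:
            return None
        n = max(earlier)
        return (n, self.votes[n])

    def on_next_ballot(self, b: int) -> Optional[Tuple[int, str]]:
        """Step (2): reply with the last vote and record the reply."""
        v = self.last_vote_before(b)
        self.promised = max(self.promised, b)
        return v

    def on_begin_ballot(self, b: int, decree: str) -> bool:
        """Step (4): vote unless a reply for a larger ballot was already sent."""
        if b < self.promised:
            return False
        self.votes[b] = decree            # record the vote before answering
        return True
```

In a real deployment both `votes` and `promised` would have to live in stable storage so that they survive a restart.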

9
Stating The Problem in Terms of State Machines

a state machine consists of
state variables (encoded in states)
commands (which transform the states)
each command is implemented by a deterministic
program and its execution is atomic with respect
to other commands
clock I/O automaton: a specific state machine devised by Lynch and Tuttle for modelling, verifying, and analyzing time-based systems

10
Clock I/O Automata

An I/O time automaton A consists of
a set of states states(A)
a nonempty set start(A) of start states
a set of actions partitioned in input, output,
internal, and time-passage actions and specified
in the signature of A
a transition relation steps(A), a subset of states(A) × acts(A) × states(A)
No input action can be blocked: for every state s and every input action a, there is a state s' such that (s,a,s') is a step of A.
A time-passage action (t) models the passage of
real time t.
A special real variable Clock is included in each
state to model the local clock of the process. It
is not necessary that Clock simulates the real
time.
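The requirement that no input action can be blocked is easy to test on a finite automaton. The sketch below assumes the transition relation is given as an explicit set of (state, action, next state) triples; the function name and the string encoding of states and actions are illustrative only.

```python
from typing import Set, Tuple

Step = Tuple[str, str, str]   # (state, action, next_state)


def is_input_enabled(states: Set[str], inputs: Set[str],
                     steps: Set[Step]) -> bool:
    """True if every input action has at least one step from every state."""
    enabled = {(s, a) for (s, a, _) in steps}
    return all((s, a) in enabled for s in states for a in inputs)
```

A two-state automaton that accepts the input `req` only in state `idle` fails the check until a `req` step out of `busy` is added.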

11
The Synod Algorithm In Terms Of Clock GTA

The Distributed Setting
relation with the Paxos problem
priest/process
law book/state
passing a decree/executing a command
complete network of n processes with unique
identifiers in a totally ordered set known by all
processes
clock GT automata are used to model both processes and channels; each automaton has a local clock, and the local clock of a channel is used to detect timing failures
The Algorithm
idea: propose values until one of them is accepted by a majority of processes
any process may propose a value by initiating a round for that value; it becomes the leader of that round
the leader and the other processes are agents

(1) The leader sends a Collect message to all agents.
(2) If an agent receives a Collect message and it is already committed to a round with a bigger round number, it sends an OldRound message; otherwise, it sends a Last message with its information about rounds previously conducted.
(3) If the leader receives more than n/2 Last messages, it initiates a new round and sends a Begin message to all agents.
(4) If an agent receives the Begin message and is committed to a round with a bigger round number, it sends an OldRound message; otherwise, it accepts the value proposed and responds with an Accept message.
(5) If the leader receives more than n/2 Accept messages, then the round is successful and its own output value is the value of the round.
(6) The leader broadcasts the reached decision.
Notes
the set of agents from which the Last (Accept) messages are received is the info-quorum (accepting-quorum)
13
Implementation (1)

14
Implementation (2)

BPLEADER(i) (clock GTA running the leader at process i)
Input: NewRound(i), Leader(i), NotLeader(i), Receive(m)(j,i) with m in {Last, Accept, Success, OldRound}
Output: Send(m)(j,i) with m in {Collect, Begin}, BeginCast(i), RndSuccess(v)(i)
Internal: Collect(i), GatherLast(i), ...
Time-passage: ...

BPAGENT(i) (clock GTA running an agent at process i)
Input: Receive(m)(j,i) with m in {Collect, Begin}
Output: Send(m)(j,i) with m in {Last, Accept, OldRound}
Internal: LastAccept(i), Accept(i), ...
Time-passage: ...

15
Correctness Proof

execution fragment: an alternating sequence of states and actions in which every step is a step of the automaton
problem specification: a set of allowable behaviors (behavior: the sequence of external actions of an execution fragment)
an automaton A solves the problem if each of its behaviors is contained in the problem specification
safety properties: must hold in every state of a computation
liveness properties: specify events that must eventually be performed

16
Safety/Liveness Properties

safety property: in any execution of the system, agreement and validity are guaranteed
liveness property: under some conditions, termination is guaranteed
an execution fragment is nice if
no loss or duplication takes place
at each time-passage action the local clock is incremented by the elapsed real time
every process is either stopped or alive
a majority of processes are alive
Theorem: If a nice execution fragment starts in a reachable state, has a unique leader, and lasts for more than 16l + 8nl + 9d time units, then by time 16l + 8nl + 9d the leader has reached a decision.
Note: proofs are based on invariants.

17
Other Results On Time Performance

If a nice execution fragment starts in a reachable state and lasts more than 24l + 10nl + 13d, then
the leader decides by time 21l + 8nl + 11d and at most 8n messages are sent
all alive processes decide by time 24l + 10nl + 13d and at most 2n additional messages are sent
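To get a feel for the bounds, they can be evaluated for concrete parameters. The sketch assumes the expressions are sums of terms in l, nl and d (the plus signs are a reconstruction), with l the bound on process step time, d the bound on message delay and n the number of processes; the function names are made up for this example.

```python
def leader_decides_unique_leader(l: float, d: float, n: int) -> float:
    """Bound of the theorem with a unique leader: 16l + 8nl + 9d."""
    return 16 * l + 8 * n * l + 9 * d


def leader_decides(l: float, d: float, n: int) -> float:
    """Bound for the leader's decision: 21l + 8nl + 11d."""
    return 21 * l + 8 * n * l + 11 * d


def all_decide(l: float, d: float, n: int) -> float:
    """Bound for all alive processes: 24l + 10nl + 13d."""
    return 24 * l + 10 * n * l + 13 * d


def max_messages(n: int) -> int:
    """At most 8n messages until the leader decides, plus 2n afterwards."""
    return 8 * n + 2 * n
```

With l = 1, d = 10 and n = 5 (arbitrary units) the three time bounds evaluate to 146, 171 and 204, and at most 50 messages are sent in total.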

18
Generalization Of The Synod Protocol MULTIPAXOS

consensus has to be reached on a sequence of values
for each value we run BASICPAXOS
the automata used for each instance of the algorithm are like the automata in BASICPAXOS, except that an additional parameter (the index of the proposed value) is present in each action
concurrency: several leaders may concurrently initiate rounds, and these rounds are carried out concurrently
several leaders proposing values concurrently is an important difference between the Paxos algorithm and the three-phase commit protocol
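Because every action carries the index of its instance, the decisions can be collected in a table keyed by that index. The sketch below is an assumption-laden illustration: the class `DecisionLog`, the 1-based indexing and the rule that only the gap-free prefix is exposed are design choices made here, not taken from the slides.

```python
from typing import Dict, List


class DecisionLog:
    """Decided values keyed by instance index (one consensus run per index)."""

    def __init__(self) -> None:
        self.decided: Dict[int, str] = {}

    def decide(self, index: int, value: str) -> None:
        # Agreement within one instance: a repeated decision must match.
        if index in self.decided and self.decided[index] != value:
            raise ValueError(f"conflicting decision for instance {index}")
        self.decided[index] = value

    def prefix(self) -> List[str]:
        """Values of instances 1, 2, ... up to the first undecided index."""
        out: List[str] = []
        i = 1
        while i in self.decided:
            out.append(self.decided[i])
            i += 1
        return out
```

Instances may finish out of order because they run concurrently: deciding index 2 first yields an empty prefix, and the prefix becomes complete once index 1 is decided as well.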

19
Data Replication

problem: providing distributed and concurrent access to data objects
simple implementation: maintain the object at a single process accessed by multiple clients
some disadvantages
poor scaling when the number of clients increases
not fault-tolerant
other solution: data replication
servers are replicated; each server runs the same state machine
clients make requests, which are redirected to specific servers

20
Replica Coordination (1)

Requirements
requests should be processed by state machines
one at a time
the order of processing should be consistent with
potential causality
outputs determined only by the sequence of
requests, independent of time or any other
activity in the system
Replica coordination
agreement: every nonfaulty state machine replica receives every request
order: every nonfaulty state machine replica processes the requests it receives in the same relative order
issues to be considered: fault-tolerance and reconfiguration
MULTIPAXOS: a possible solution to the problem
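Agreement and order matter because each replica is deterministic: the same request sequence always gives the same state and outputs, while a different order may not. A toy replica makes this concrete; the command format `(operation, key, amount)` and the names `Replica` and `run` are hypothetical.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, str, int]   # (operation, key, amount), made up here


class Replica:
    """A deterministic state machine over a small key-value state."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def apply(self, cmd: Command) -> int:
        op, key, amount = cmd
        if op == "set":
            self.state[key] = amount
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + amount
        else:
            raise ValueError(f"unknown operation {op!r}")
        return self.state[key]           # output depends only on the sequence


def run(requests: List[Command]) -> Tuple[Dict[str, int], List[int]]:
    """Process the requests one at a time on a fresh replica."""
    replica = Replica()
    outputs = [replica.apply(c) for c in requests]
    return replica.state, outputs
```

Two replicas fed `[("set", "x", 1), ("add", "x", 2)]` both end with x = 3, whereas a replica that sees the two requests in the opposite order ends with x = 1.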

21
Replica Coordination (2): MULTIPAXOS For Replica Coordination

each process in the system maintains a copy of the data object
a client requests an update operation
a process proposes the operation in an instance of MULTIPAXOS
after some time, the update operation is the output value of the instance of MULTIPAXOS
the leader of the round updates its local copy
because of correctness, all the alive processes update their copies, too
a report to the client is given
a client requests a read operation
the request is immediately satisfied based on the local copy
Note: a majority is needed to achieve consistency -> majority voting
a unique leader is required to achieve termination -> primary copy replication

22
Replica Coordination (3): Order and Stability

unique identifiers for requests (total order)
implementation: a replica next processes the stable request with the smallest unique identifier (stable request: no request from a correct client and with a lower uid can subsequently be delivered to that state machine)
using logical clocks to ensure order and stability
each process has a local counter
the local counter is incremented after each event at that process
each message sent is timestamped with the local clock
upon receipt of a message, the local clock of the receiver becomes 1 + the maximum of the timestamp and the local clock
a uid for each event is given by appending a fixed-length bit string (encoding the process id) to the counter value of the process where the event takes place
using real clocks to ensure order and stability
assumptions
clocks are synchronized to within less than the minimum message delivery time
a request r will be received by every correct process no later than time uid(r) + Δ
stability test: a request r is stable at a state machine if the local clock reads time t and t > uid(r) + Δ
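The logical-clock rules translate directly into a few lines of code. In this sketch a uid is modelled as the pair (counter, process id) instead of a bit string with the id appended; both give the same total order, but the pair encoding and the class name `LogicalClock` are choices made for this example.

```python
from typing import Tuple

Uid = Tuple[int, int]   # (counter value, process id), compared lexicographically


class LogicalClock:
    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.counter = 0

    def tick(self) -> Uid:
        """Local event: advance the counter and return the event's uid."""
        self.counter += 1
        return (self.counter, self.pid)

    def send(self) -> int:
        """Sending is an event; the message is stamped with the local clock."""
        self.counter += 1
        return self.counter

    def receive(self, timestamp: int) -> Uid:
        """On receipt the clock becomes 1 + max(timestamp, local clock)."""
        self.counter = 1 + max(timestamp, self.counter)
        return (self.counter, self.pid)
```

If process 1 sends its first message (timestamp 1) to a fresh process 2, the receive event gets uid (2, 2), which is ordered after the send; ties in the counter are broken by the process id.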

23
Replica Coordination (4): Reconfiguration

at time t there are P(t) processes, F(t) of them faulty
necessary condition for correct output
P(t) > 2F(t) if Byzantine failures are possible
P(t) > F(t) if only fail-stop failures are possible
system described by 3 sets: clients (C), state machines (S), and output devices (O)
information about them is stored in state variables and changed by commands
C and O make periodic queries -> better sharing of processors
messages sent by S always contain information about future reconfiguration -> permanent communication S <-> C and S <-> O
requests to change the configuration of the system are made by a failure/recovery detector mechanism

24
Replica Coordination (6): Integrating A Repaired Object

goal: integrate element e at request r
notation: e[r] is the state a non-faulty system element e should be in after processing all the requests up to r
if processors are fail-stop and logical clocks are implemented, then the cooperation of only one state machine replica is needed (if the sm has not failed, then it is correct and, because of consensus among replicas, its information on the system is correct and complete with respect to the other sm) -> the sm used should have access to enough information
implementation: e[r] is sent to e before the output produced by processing any request with uid larger than uid(r)
e in O: e[r] is usually device-specific setup information
can be stored in state variables of sm
e in C: e[r] is usually based on sensor values read
use information from C to sm

25
Replica Coordination (7): Integrating A Repaired State Machine

first attempt: sm sends to e the values of all its state variables before the output produced by processing any request with uid larger than uid(r)
problem: some client request might be received by sm after sending e[r], but delivered to e before its repair
solution: sm must relay to e requests received from clients
for how long? as soon as e has received a request directly from a client c, requests from the same c with larger uid need not be relayed to e
so, e should inform sm of the uid of requests received directly from c
algorithm
(1) sm sends e the values of its state variables and copies of pending requests
(2) sm sends to e every subsequent request r received from client c s.t. uid(r) < uid(r_c) (r_c is the first request e has directly received from c, after e restarted)
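The relay rule of step (2) boils down to one comparison per client. The sketch below keeps, on the side of sm, the uid of the first request that e reported receiving directly from each client; the class `RelayFilter`, its method names and the use of integer uids are assumptions for illustration.

```python
from typing import Dict


class RelayFilter:
    """Decides which client requests sm still has to relay to the restarted e."""

    def __init__(self) -> None:
        # client -> uid of the first request e received directly from it
        self.first_direct_uid: Dict[str, int] = {}

    def e_reports_direct(self, client: str, uid: int) -> None:
        # e informs sm; only the first report per client is kept.
        if client not in self.first_direct_uid:
            self.first_direct_uid[client] = uid

    def must_relay(self, client: str, uid: int) -> bool:
        """Relay while e has heard nothing from the client, or the uid is older."""
        first = self.first_direct_uid.get(client)
        return first is None or uid < first
```

Once e reports that its first direct request from client c had uid 7, sm keeps relaying a late request with uid 6 from c but stops relaying requests with uid 7 or larger.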