Title: State Space Exploration
1. State Space Exploration
- Objective
  - Explore the different execution paths that a system might take, in order to check for the presence or absence of certain properties.
  - In this context, a state represents a condition of the system such that, if it were reproduced in another system, the future behaviour of the second system would be indistinguishable from that of the first.
2. Context
- Assumptions
  - There is a set of processes p1, ..., pn.
  - Each process has a queue of finite length where incoming messages are stored until consumed.
3. An exploration state is
- If a process is based on a finite-state machine (FSM), then an exploration state comprises the combination, for all processes, of
  - the FSM state
  - the messages, in order, in the input queue waiting to be received
- Terminology: FSM is interpreted here in its strictest sense, i.e. there are no variables associated with the state machine; there are only states, inputs, outputs, and transitions.
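4. Example
[Figure: state machines for Process A (states A1, A2) and Process B (states B1, B2), each with an input queue. Transitions are labelled input/output: -/y, -/-, x/y, w/x, z/w, y/z.]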
5. Initial system state (0)
[Figure: the example state machines; input queues empty.]
6. State 1
[Figure: the example state machines; input queue contents: y.]
7. State 2
[Figure: the example state machines; input queue contents: y.]
8. State 3
[Figure: the example state machines; input queue contents: z.]
9. State 4
[Figure: the example state machines; input queue contents: w.]
10. State 5
[Figure: the example state machines; input queue contents: x.]
11. State 2 again
[Figure: the example state machines; input queue contents: y.]
12. Start again at state 0
[Figure: the example state machines; input queues empty.]
13. State 6
[Figure: the example state machines; input queues empty.]
14. State 2, yet again
[Figure: the example state machines; input queue contents: y.]
15. Reachable states
16. Potential queue states
- Front of queues: 4 possibilities (w, x, y, or z)
17. Potential system states in the example
- Number of combinations of process states
  - $3 \times 3 = 9$
- Number of combinations of messages in the two queues
  - Four messages could occupy each queue slot, plus combinations of empty queue slots
  - $4^6 + 2 \cdot 4^5 + 3 \cdot 4^4 + 4 \cdot 4^3 + 3 \cdot 4^2 + 2 \cdot 4 + 1 = (1 + 4 + 4^2 + 4^3)^2 = 7225$
- Total potential number of system states: $9 \times 7225 = 65{,}025$
- Only 7 of these are actually reachable from the initial state.
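The count above can be checked mechanically. The following is a minimal sketch, assuming each queue has three slots (an inference from the $4^6$ term, not something the slide states) and three local states per process; the constant and function names are illustrative only. Evaluated directly, the sum comes to 7225 and the product to 65,025.

```python
# Count the potential system states of the two-process example.
MESSAGES = 4            # w, x, y, z
QUEUE_SLOTS = 3         # assumed queue length (inferred from the 4^6 term)
STATES_PER_PROCESS = 3  # taken from the 3 x 3 = 9 line


def queue_configurations(messages: int, slots: int) -> int:
    """Number of distinct queue contents holding 0..slots messages."""
    return sum(messages ** n for n in range(slots + 1))


per_queue = queue_configurations(MESSAGES, QUEUE_SLOTS)  # 1 + 4 + 16 + 64
queue_combinations = per_queue ** 2                       # two independent queues
process_combinations = STATES_PER_PROCESS ** 2
total_states = process_combinations * queue_combinations

print(per_queue, queue_combinations, total_states)
```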
18. Goals of state space exploration
- Determine which states are reachable.
- Of these states, are there any with undesirable properties?
  - Structural problems
    - deadlock
    - unspecified reception
    - queue overflow
  - User may also have application-dependent goals
    - Example: ensure that the only paths to a state that provides access to a secure resource are ones that correctly pass through an authorization procedure.
19. Deadlock
- The system is in a deadlock state if
  - The input queues for all processes are empty
  - No process is in a state that can execute without an input and produce an output.
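The two conditions can be written as a single predicate over a global state. This is a minimal sketch; the dictionary encoding, the `spontaneous` table, and the state name `A0`/`B0` for the unlabelled start states are assumptions made for illustration.

```python
from typing import Dict, List, Set


def is_deadlock(local_states: Dict[str, str],
                queues: Dict[str, List[str]],
                spontaneous: Dict[str, Set[str]]) -> bool:
    """True if every queue is empty and no process can move without input.

    spontaneous[p] is the (hypothetical) set of local states of process p
    that have a transition needing no input, written "-/..." on the slides.
    """
    all_queues_empty = all(len(q) == 0 for q in queues.values())
    nobody_can_start = all(local_states[p] not in spontaneous[p]
                           for p in local_states)
    return all_queues_empty and nobody_can_start


# Only the assumed start states A0 and B0 have spontaneous transitions.
spontaneous = {"A": {"A0"}, "B": {"B0"}}
stuck = is_deadlock({"A": "A1", "B": "B2"}, {"A": [], "B": []}, spontaneous)
initial = is_deadlock({"A": "A0", "B": "B0"}, {"A": [], "B": []}, spontaneous)
print(stuck, initial)
```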
20. Deadlock
[Figure: the example state machines in a deadlocked configuration; input queues empty.]
21. Unspecified Reception
- An unspecified reception occurs when
  - The input at the head of the input queue is not specified for the current state of that process.
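As a sketch of how this check might look during exploration, the function below compares the head of each queue with the inputs accepted in the owning process's current state. The `accepts` table and the sample configuration are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def unspecified_receptions(local_states: Dict[str, str],
                           queues: Dict[str, List[str]],
                           accepts: Dict[Tuple[str, str], Set[str]]) -> List[str]:
    """Return the processes whose head-of-queue message has no transition
    defined in their current state."""
    blocked = []
    for proc, queue in queues.items():
        if not queue:
            continue  # nothing waiting, so nothing can be unspecified
        allowed = accepts.get((proc, local_states[proc]), set())
        if queue[0] not in allowed:
            blocked.append(proc)
    return blocked


# Hypothetical table of inputs accepted per (process, state).
accepts = {("A", "A1"): {"z"}, ("A", "A2"): {"x"},
           ("B", "B1"): {"y"}, ("B", "B2"): {"w"}}
# B is in B1, which only accepts y, but w is at the head of its queue.
result = unspecified_receptions({"A": "A1", "B": "B1"},
                                {"A": [], "B": ["w", "y"]}, accepts)
print(result)
```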
22. Unspecified reception
[Figure: the example state machines illustrating an unspecified reception; messages y and w are shown in the input queues.]
23. Queue overflow
- A process is selected for execution, and it sends a message to another process whose input queue is already full.
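A minimal sketch of detecting this condition at the point of sending, assuming a fixed capacity per queue; the `QueueOverflow` exception and the `send` helper are illustrative names, not part of any tool.

```python
from typing import Dict, List


class QueueOverflow(Exception):
    """Raised when a message is sent to a queue that is already full."""


def send(queues: Dict[str, List[str]], receiver: str,
         message: str, capacity: int) -> None:
    """Append message to the receiver's queue, or report an overflow."""
    if len(queues[receiver]) >= capacity:
        raise QueueOverflow(f"queue of {receiver} is full, cannot accept {message!r}")
    queues[receiver].append(message)


queues = {"A": ["y", "x"], "B": ["y", "y", "y"]}
send(queues, "A", "w", capacity=3)      # fits: A's queue had a free slot
try:
    send(queues, "B", "z", capacity=3)  # B's queue is already full
    overflowed = False
except QueueOverflow:
    overflowed = True
print(queues, overflowed)
```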
24. Queue Overflow
[Figure: the example state machines illustrating a queue overflow; messages y, x, y, y are shown in the input queues.]
25. Exploration Algorithm
- Conduct a depth-first search of the reachability graph
- Need to be able to recognize states seen previously
[Figure: reachability graph with nodes 0 to 6.]
26. Depth-first search
- Start at an initial state, and mark it visited.
- Determine the set of transitions that can be made to a new state.
- Choose one of the transitions and follow it to reach a new state.
- If the new state has already been marked as visited, go back to the start of the transition and choose another.
- If the new state has not already been marked as visited, mark it and continue searching from there.
- If all transitions from a state lead to marked states, retrace the path backward to the previous state and continue.
- Returning to the initial state with all transitions explored terminates the search.
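The search can be sketched on the two-process example. The transition tables below are an assumed reconstruction from the state walkthrough (start states named `A0` and `B0` for illustration, A owning -/y, z/w, x/y and B owning -/-, y/z, w/x); they are not taken verbatim from the diagrams. Under that assumption the search visits seven states.

```python
from typing import Dict, List, Optional, Set, Tuple

# (local state, input) -> (next state, output). Input None is a spontaneous
# transition; output None sends nothing. Assumed reconstruction of the example.
Table = Dict[Tuple[str, Optional[str]], Tuple[str, Optional[str]]]
FSM: Dict[str, Table] = {
    "A": {("A0", None): ("A1", "y"), ("A1", "z"): ("A2", "w"), ("A2", "x"): ("A1", "y")},
    "B": {("B0", None): ("B1", None), ("B1", "y"): ("B2", "z"), ("B2", "w"): ("B1", "x")},
}
PEER = {"A": "B", "B": "A"}  # each process sends to the other one's queue

# Global state: (A state, B state, A queue, B queue)
State = Tuple[str, str, Tuple[str, ...], Tuple[str, ...]]


def successors(state: State) -> List[State]:
    """All global states reachable by executing one transition."""
    local = {"A": state[0], "B": state[1]}
    queues = {"A": list(state[2]), "B": list(state[3])}
    result = []
    for proc in ("A", "B"):
        head = queues[proc][0] if queues[proc] else None
        for inp in [None] + ([head] if head is not None else []):
            move = FSM[proc].get((local[proc], inp))
            if move is None:
                continue
            next_local, output = move
            new_local = dict(local)
            new_local[proc] = next_local
            new_queues = {k: list(v) for k, v in queues.items()}
            if inp is not None:
                new_queues[proc].pop(0)                 # consume the input
            if output is not None:
                new_queues[PEER[proc]].append(output)   # deliver the output
            result.append((new_local["A"], new_local["B"],
                           tuple(new_queues["A"]), tuple(new_queues["B"])))
    return result


def dfs(state: State, visited: Set[State]) -> Set[State]:
    """Mark the state visited, then recurse into each unvisited successor."""
    visited.add(state)
    for nxt in successors(state):
        if nxt not in visited:
            dfs(nxt, visited)
    return visited


INITIAL: State = ("A0", "B0", (), ())
reachable = dfs(INITIAL, set())
print(len(reachable))
```

The `visited` set plays the role of the marks; returning from the outermost call corresponds to arriving back at the initial state with every transition explored.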
27. Practical considerations
- The tree formed by all possible execution states of a system is often extremely large
- Individual system states need to be
  - recognizable
  - markable
- Major constraint: memory
  - The previous example may require recognizing 65,025 distinct states, and marking whether each one of them has been visited.
  - Most real systems are much larger.
28. How to record state information?
- Bit-state hashing
  - Reserve a large block of memory, and set all bits to 0.
  - Use a hash function
    - f(A state, B state, A queue, B queue) → a memory bit
  - When a state is visited, compute the hash function to identify a memory bit, and then check the current value of that bit
    - If 1: state has already been visited
    - If 0: set bit to 1.
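A minimal sketch of such a table follows. The class name, the use of `repr()` to serialise a state, and the choice of `zlib.crc32` as the hash function are assumptions for illustration; the slide does not prescribe a particular function.

```python
import zlib


class BitStateTable:
    """One bit per hash slot; a set bit means 'assumed visited'."""

    def __init__(self, n_bits: int) -> None:
        self.n_bits = n_bits
        self.bits = bytearray((n_bits + 7) // 8)  # all bits start at 0

    def _index(self, state) -> int:
        # repr() gives a stable byte encoding for tuples of strings.
        return zlib.crc32(repr(state).encode("utf-8")) % self.n_bits

    def visit(self, state) -> bool:
        """Set the state's bit; return True if it was already set."""
        i = self._index(state)
        byte, mask = i // 8, 1 << (i % 8)
        seen = bool(self.bits[byte] & mask)
        self.bits[byte] |= mask
        return seen


table = BitStateTable(1 << 20)
state = ("A1", "B1", (), ("y",))
first = table.visit(state)   # bit was 0: not seen before
second = table.visit(state)  # bit is now 1: treated as visited
print(first, second)
```

Because only one bit is kept per state, two different states that hash to the same bit cannot be told apart, which is the price paid for the small memory footprint.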
29. Bit-state hash function considerations
- The hash function has to distribute the range of values effectively over the allocated memory, while minimizing the chance of collision
- Provide an estimate of the collision risk to the user.
- If the user can interactively set the size of memory that is allocated, then the hash function needs to be parameterized.
30. Restricting the Exploration
- Methods to keep exploration manageable
  - Start at a specific internal state and explore a subset of the complete tree
  - Maximum path length from starting point
    - When the maximum is reached, terminate traversal on that branch and begin again from the starting point.
  - Restrict size of memory for states
  - Limits on memory or execution time.
31. Interactive: let user choose paths
32. Generalized state-space exploration
- When processes have variables, and messages have parameters, then additional distinguishable states are
  - Combination of the set of values for all variables in processes, and the set of values for all parameters of all messages in all queues
- While there is not an infinite number of states, there are certainly more than can be reasonably visited.
  - typical range: $10^9$ to $10^{11}$
33. Managing generalized exploration
- Restrictions on message parameters
  - Only specific values allowed
    - Example: restrict an integer to be in the range 0, 1, 2 in any message parameter.
- Equivalence classes
  - Choose a set of representative values
    - Example: -55, 0, 55, representing negative, zero, and positive values.
- Use restrictive data types
  - Enumerated data types, instead of integers.
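The equivalence-class idea can be sketched as a function that maps every integer to the representative of its class before the state is stored. The function name is hypothetical; the representatives -55, 0, and 55 are the ones from the example.

```python
def representative(value: int) -> int:
    """Collapse an integer to the representative of its sign class."""
    if value < 0:
        return -55   # stands for every negative value
    if value > 0:
        return 55    # stands for every positive value
    return 0


# Seven concrete parameter values collapse to three explored values.
samples = [-1000, -55, -1, 0, 1, 7, 99999]
explored = sorted({representative(v) for v in samples})
print(explored)
```

This only preserves behaviour if the system treats all members of a class alike, which has to be argued separately for each model.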
34. Representing the user
- Treat the user as a process that could, at any time, send any message to any process.
  - User process has no state and no input queue.
  - Not counted as part of the global system state.
- If the system has more structure, then restrict the user process to sending only messages that are permitted by the architecture
  - Example: in UML, a user (an actor) would only be able to send messages specified as incoming from the system boundary, and on appropriate channels.
35. Bringing in the User
[Figure: the example state machines for Process A and Process B with an added User process (state U1) whose transitions -/w, -/x, -/y, -/z send a message without requiring input.]
User can send any message to any queue at any time.
36. Constraining the User
- If we leave the user this unrestricted, then no matter how large the input queues are, we could always send enough messages from the user to fill the queues.
- To give the system a chance to execute, we may need to introduce priorities as to what types of transitions we can select as alternatives.
37. Transition Priorities
- Priority rule: if several transitions are potentially executable based on the current global state, we must select the transition to execute from the set of transitions at the highest priority.
- Transitions of equal priority represent alternative selections that must be considered.
[Figure: a reachability graph with states 0 to 3 and transitions marked priority 1 and priority 2, shown next to the effective reachability graph that results from applying the priority rule.]
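A minimal sketch of the priority rule as a filter over the enabled transitions, assuming that priority 1 is the most important level (smaller number means higher priority) and that transitions are given as (priority, name) pairs; both are illustrative choices.

```python
from typing import List, Tuple


def selectable(enabled: List[Tuple[int, str]]) -> List[str]:
    """Keep only the enabled transitions at the highest priority.

    Priority 1 is assumed to be the most important level, so the
    smallest number wins. All survivors must be explored as alternatives.
    """
    if not enabled:
        return []
    best = min(priority for priority, _ in enabled)
    return [name for priority, name in enabled if priority == best]


# Two internal transitions (priority 1) and one user transition (priority 2).
busy = selectable([(1, "A: z/w"), (2, "User: -/x"), (1, "B: y/z")])
# Nothing internal is enabled, so the user transitions become selectable.
idle = selectable([(2, "User: -/x"), (2, "User: -/y")])
print(busy, idle)
```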
38. Effect of priorities
- Suppose we set user transitions at priority 2, and internal transitions at priority 1 (most important).
- Advantages
  - We only ask for user input when the system would otherwise deadlock (be idle)
  - This gives a fair chance for the system to execute and respond to user input
- Disadvantages
  - User may be unable to perform cancel operations
  - Does not account for malicious users
39. Timers
- Timers are also treated as if they were a process
  - Unlike the user, there could be multiple instances of timer processes
- Timers can receive set or cancel messages.
- Timer expiry spontaneously sends the timeout message to the process that started it, but only after a set message that has not been cancelled.
- Priority of timers also needs to be carefully considered, relative to user and internal actions.
  - Want to avoid the situation where a timer always expires or never expires.
40. Timers
[Figure: Timer process with states T1 and T2; transitions labelled -/-, set/-, cancel/-, and -/timeout.]
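41. Partial Order Reduction
- The reachability graph can be reduced, in terms of the number of states, by recognizing equivalent paths and removing states that are not significant.
- Example: changing two data variables
[Figure: from state 1, the transitions a ← 2 and b ← 3 can be taken in either order, through state 2 or state 3, and both orders reach state 4; the reduced graph keeps only one of the two paths from state 1 to state 4.]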
42. Transition independence
- Two transitions in the reachability graph are independent if
  - both can be selected
  - executing one transition will not disable the other
  - the combined effect of executing both transitions is the same no matter which order is taken
- Intermediate states for independent transitions need not be stored
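The three conditions can be checked at a given state as in the sketch below. Representing a transition as a (guard, effect) pair of functions over a dictionary of variables is an assumption for illustration, and the check covers a single state only, whereas a real reduction needs the property to hold wherever both transitions are enabled.

```python
from typing import Callable, Dict, Tuple

Vars = Dict[str, int]
Transition = Tuple[Callable[[Vars], bool], Callable[[Vars], Vars]]


def independent(state: Vars, t1: Transition, t2: Transition) -> bool:
    """Check the three independence conditions at one state."""
    guard1, effect1 = t1
    guard2, effect2 = t2
    if not (guard1(state) and guard2(state)):
        return False                      # both must be selectable
    after1, after2 = effect1(state), effect2(state)
    if not (guard2(after1) and guard1(after2)):
        return False                      # neither may disable the other
    return effect2(after1) == effect1(after2)  # same result in either order


def always_enabled(_: Vars) -> bool:
    return True


set_a_2: Transition = (always_enabled, lambda s: {**s, "a": 2})
set_b_3: Transition = (always_enabled, lambda s: {**s, "b": 3})
set_a_3: Transition = (always_enabled, lambda s: {**s, "a": 3})

start = {"a": 0, "b": 0}
commute = independent(start, set_a_2, set_b_3)   # different variables
conflict = independent(start, set_a_2, set_a_3)  # last writer wins
print(commute, conflict)
```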
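43. Statement Merging
- Straight-line sequences with no alternatives can be merged to reduce the number of states.
- Must be single entry and single exit
[Figure: a graph with states 1 to 6 before merging, and the same graph after states 2 and 3 are merged into a single state 2/3.]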
44. Example of bit-state hashing
- SPIN state space search and model checking algorithm (G. Holzmann)
  - Set up a large range of memory and set all bits to 0
  - Use two different CRC (cyclic redundancy check) error check calculations to identify two bits in the memory range
    - CRC: shift the input left m bits, and determine the remainder when divided by a specific CRC polynomial of degree m
  - If at least one of the two bits is 0, assume that this state has not been visited
  - If both bits are 1, assume that this state has been visited
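The two-bit scheme can be sketched with two CRCs from the Python standard library: `zlib.crc32` (32-bit) and `binascii.crc_hqx` (16-bit CRC-CCITT). These are stand-ins chosen for illustration and are not claimed to be the polynomials SPIN uses. Because `crc_hqx` yields only 16 bits, the sketch is only sensible for tables of up to $2^{16}$ bits.

```python
import binascii
import zlib
from typing import Tuple


class DoubleBitTable:
    """Bit table indexed by two different CRCs of the state encoding."""

    def __init__(self, n_bits: int) -> None:
        self.n_bits = n_bits
        self.bits = bytearray((n_bits + 7) // 8)  # all bits start at 0

    def _indices(self, state) -> Tuple[int, int]:
        data = repr(state).encode("utf-8")
        return (zlib.crc32(data) % self.n_bits,
                binascii.crc_hqx(data, 0) % self.n_bits)

    def _get(self, i: int) -> bool:
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def _set(self, i: int) -> None:
        self.bits[i // 8] |= 1 << (i % 8)

    def visit(self, state) -> bool:
        """Return True only if both bits were already 1, then set both."""
        i, j = self._indices(state)
        seen = self._get(i) and self._get(j)
        self._set(i)
        self._set(j)
        return seen


table2 = DoubleBitTable(1 << 16)
s = ("A2", "B2", (), ("w",))
before = table2.visit(s)  # at least one bit was 0: assumed unvisited
after = table2.visit(s)   # both bits are 1: assumed visited
print(before, after)
```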
45. Hash collisions
- A collision occurs when the hash functions map a state to two addresses that both already contain 1 bits, even though that state has not been visited
- Result: the target state is missed
  - If interesting properties (deadlock, etc.) occur in that state, they will not be reported
- Are successors of the state also missed?
  - Not necessarily: if there is more than one path to the successor states, alternative routes may be found to reach these states.
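46. Probability of Hash Collisions
- If using k hash functions on m bits of memory, the probability of a specific bit being 0 after storing r states is (assuming independent, uniformly distributed hash functions)
  - $\left(1 - \frac{1}{m}\right)^{kr} \approx e^{-kr/m}$
- Probability of hash collision
  - $\left(1 - \left(1 - \frac{1}{m}\right)^{kr}\right)^{k} \approx \left(1 - e^{-kr/m}\right)^{k}$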
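To get a feel for the magnitudes, the sketch below evaluates the standard estimate for k independent, uniformly distributed hash functions: a given bit is still 0 with probability $(1 - 1/m)^{kr}$, and a collision requires all k probed bits to be 1. The independence assumption and the sample figures (two hash functions, $2^{30}$ bits, ten million stored states) are illustrative and not taken from the slides.

```python
import math


def bit_zero_probability(k: int, m: int, r: int) -> float:
    """Probability that one particular bit is still 0 after r states."""
    return (1.0 - 1.0 / m) ** (k * r)


def collision_probability(k: int, m: int, r: int) -> float:
    """Probability that all k bits probed for a new state are already 1."""
    return (1.0 - bit_zero_probability(k, m, r)) ** k


K = 2             # two hash functions, as in the two-CRC scheme
M = 1 << 30       # 2^30 bits of memory (128 MiB), an illustrative size
R = 10_000_000    # ten million stored states, an illustrative load

p_zero = bit_zero_probability(K, M, R)
p_collision = collision_probability(K, M, R)
print(f"P(bit is 0) = {p_zero:.6f}, P(collision) = {p_collision:.3e}")
```

Doubling the memory roughly halves the fraction of set bits, so with two hash functions the collision estimate drops by about a factor of four while the table is sparsely filled.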
47. Logic Model Checking
- Source: G. Holzmann, Caltech course notes
- General approach
  - L(S): the set of behaviours that are possible from a system S
  - L(p): a set of valid/desirable properties for system S
  - We would like to check that $L(S) \subseteq L(p)$
  - It can be shown that the above statement is equivalent to $L(S) \cap L(\neg p) = \emptyset$
    - That is, the system cannot exhibit any undesirable behaviours
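The equivalence between inclusion and an empty intersection with the complement can be illustrated on a toy, finite universe of behaviours. Real model checkers work with languages of infinite runs, so the finite sets and the strings below are purely illustrative.

```python
# Toy universe of behaviours, each written as a short string of actions.
UNIVERSE = {"ab", "ba", "aa", "bb"}

L_S = {"ab", "ba"}            # behaviours the system can exhibit
L_p = {"ab", "ba", "aa"}      # behaviours the property allows
L_not_p = UNIVERSE - L_p      # behaviours the property forbids

violations = L_S & L_not_p    # behaviours that are possible AND forbidden
included = L_S <= L_p         # direct inclusion check

# A second system that can also do "bb" violates the property.
L_S_bad = L_S | {"bb"}
violations_bad = L_S_bad & L_not_p
print(included, violations, violations_bad)
```

Any element of the intersection is a concrete counterexample, which is why checking emptiness is more useful in practice than checking inclusion directly.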
48. What can be checked?
- Best for errors related to concurrency and multi-threading
  - deadlocks, livelocks, starvation
  - race conditions
  - locking problems, priority problems
  - resource allocation errors
  - reliance on relative speeds of execution of threads
  - violations of known system bounds
  - specification incompleteness
  - specification redundancy (dead code)
  - logic problems: missing causal or temporal relations
49. Two parts
- Specification of the system design model
  - Formalism such as finite state machines (and extensions), Petri Nets, Promela, etc.
- Specification of the desired system properties (requirements)
  - Example: Linear Temporal Logic (LTL)
50. Safety and Liveness
- Claims about properties
  - Inevitable: the property must be true at some point in the future
    - Liveness: something good eventually happens
    - For model checking: attempt to postpone the good thing indefinitely
  - Impossible: the property can never be true at any point in the future
    - Safety: nothing bad ever happens
    - For model checking: search for executions in which the bad thing happens.
51. Properties
- Properties of states
  - Assertions: a property is true at a particular state.
  - Invariants: a property is true in every state
  - End states: define the states that are the end points of a successful execution trace.
- Properties of sequences of states
  - Trace assertions: a property about the trace as a whole
  - Acceptance: a system that runs continuously should continue to pass through acceptance states.
  - Progress: starvation has been avoided
  - Never claims: a bad state is never reached
52. SPIN
- Creator: Gerard Holzmann, now at NASA's Jet Propulsion Laboratory
- References
  - G. Holzmann, The Design and Validation of Computer Protocols, 1991
  - G. Holzmann, The Spin Model Checker, 2004
  - www.spinroot.com
- Consists of
  - PROMELA: PRocess MEta LAnguage
    - Used to model a design
  - LTL: Linear Temporal Logic
    - Used to specify requirements
  - SPIN: Simple Promela Interpreter
    - Model checker generator
53. Promela
- Non-deterministic, guarded command language for specifying the possible system behaviours in a distributed system design
  - Systems of interacting, asynchronous threads of execution
- The purpose is not to prevent the specification of bad or unstructured designs (on the contrary)
  - e.g., gotos are supported
- The purpose is to allow the specification of designs in such a way that they can be checked with a model checker
54. Context
[Figure: Spin takes a Promela model and LTL properties and generates model checking C code, which is compiled into an executable model checker.]
55. Central Promela Concepts
- Finite-state models only: Promela models are always bounded
  - Boundedness in our case guarantees decidability
  - Finite state models can still permit infinite executions
- Asynchronous behaviour
  - no hidden global system clock
  - no implied synchronization between processes
- Non-deterministic control structures
  - to support abstraction from implementation level detail
- Executability as a core part of the semantics
  - Every basic and compound statement is defined by a precondition and an effect
  - A statement can be executed, producing the effect, only when its precondition is satisfied; otherwise, the statement is blocked
56. Three types of Promela Objects
- Processes
  - Can be created at system initialization, or during run time.
- Message channels
  - Modelled as queues of finite length
- Data
  - Local and global variables
57. Example
- Create a Promela model for the river crossing problem
- Model consists of a process for each side of the river.
- Syntax of Promela is C-like, with the following notable additions
  - :: choice of executable statements
  - (c) -> execute if c evaluates to non-zero
  - ch!m send a message m on channel ch
  - ch?m receive a message m on channel ch
  - if .. fi one-time choice of alternatives
  - do .. od loop for choice of alternatives
58. Promela Example (1/2)
    /* Define message types */
    mtype = { Wolf, Goat, Cabbage, Farmer };

    /* Define channels */
    chan ch12 = [1] of { mtype };   /* queue length is 1 */
    chan ch21 = [1] of { mtype };

    active proctype River()         /* active: process exists on startup */
    {
        /* One process for each side of the river.
           Parameters: presence of farmer, wolf, goat, cabbage initially,
           plus input and output channels. */
        run side(true, true, true, true, ch21, ch12);
        run side(false, false, false, false, ch12, ch21)
    }
59. Promela process type for river side
    proctype side(bool farmerHere; bool wolfHere; bool goatHere;
                  bool cabbageHere; chan in; chan out)
    {
        do
        :: (farmerHere == true) ->
            /* farmer is here: cross with one passenger, or alone */
            if
            :: (wolfHere == true) ->
                d_step { out!Wolf; wolfHere = false; farmerHere = false }
            :: (goatHere == true) ->
                d_step { out!Goat; goatHere = false; farmerHere = false }
            :: (cabbageHere == true) ->
                d_step { out!Cabbage; cabbageHere = false; farmerHere = false }
            :: out!Farmer -> farmerHere = false
            fi
        :: (farmerHere == false) ->
            /* farmer is on the other side: wait for an arrival */
            if
            :: in?Wolf -> d_step { wolfHere = true; farmerHere = true }
            :: in?Goat -> d_step { goatHere = true; farmerHere = true }
            :: in?Cabbage -> d_step { cabbageHere = true; farmerHere = true }
            /* the slide ends here; the lines below are an assumed completion */
            :: in?Farmer -> farmerHere = true
            fi
        od
    }
60. Running the SPIN simulator
61. Running the state-space exploration
62. Viewing a Promela state machine
63. Correctness Claims
- An assertion formalizes the claim
  - It is impossible for the given expression to evaluate to false when the assertion is reached
- An end-state label formalizes the claim
  - It is impossible for the system to terminate without all active processes having either terminated, or having stopped at a state that was marked with an end-state label
- A progress-state label formalizes the claim
  - It is impossible for the system to execute forever without passing through at least one of the states that was marked with a progress-state label infinitely often.
64. Correctness Claims
- An accept-state label formalizes the claim
  - It is impossible for the system to execute forever while passing through at least one of the states that was marked with an accept-state label infinitely often.
- A never claim formalizes the claim
  - It is impossible for the system to exhibit the behaviour (finite or infinite) that completely matches the behaviour that is specified in the claim.
- A trace assertion formalizes the claim
  - It is impossible for the system to exhibit behaviour that does not completely match the pattern defined in the trace assertion.
65. What SPIN does
- Promela design model is converted to a Büchi automaton
  - Essentially, this is the state space of the system.
- Properties expressed in linear temporal logic (LTL) are negated and converted to a second Büchi automaton.
- The language accepted by the intersection of the two automata is determined.
  - If this language is empty, no requirements were violated.
66. Linear Temporal Logic
- We need a clear, concise, and unambiguous notation for stating desired properties of concurrent systems
- Propositional linear temporal logic (LTL)
  - introduced by Amir Pnueli in the late 1970s
- Example
  - [] ((a != b) -> <> (a == b))
  - It is always the case ([]) that when (a != b), then (logical implication, ->) eventually (<>) we must also have (a == b)
  - Note that this defines a class of executions (runs), rather than an instance of an execution (run)
67. Linear Temporal Logic
- Temporal logic formulae can specify both safety and liveness properties
- LTL = propositional logic + temporal operators
  - []P: always P
  - <>P: eventually P
  - P U Q: P is true until Q becomes true
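The operators can be illustrated by evaluating them over a finite trace of states. This is only an approximation, since LTL is defined over infinite runs: here "always" means "for the rest of the recorded trace" and "eventually" means "somewhere in the rest of the recorded trace". The function names and the sample traces are illustrative.

```python
from typing import Callable, Dict, List

StateVars = Dict[str, int]
Prop = Callable[[StateVars], bool]


def always(p: Prop, trace: List[StateVars], i: int = 0) -> bool:
    """[]p on the finite suffix starting at position i."""
    return all(p(s) for s in trace[i:])


def eventually(p: Prop, trace: List[StateVars], i: int = 0) -> bool:
    """<>p on the finite suffix starting at position i."""
    return any(p(s) for s in trace[i:])


def until(p: Prop, q: Prop, trace: List[StateVars], i: int = 0) -> bool:
    """p U q: q must occur, and p must hold at every step before it."""
    for s in trace[i:]:
        if q(s):
            return True
        if not p(s):
            return False
    return False


def equal(s: StateVars) -> bool:
    return s["a"] == s["b"]


def responds(trace: List[StateVars]) -> bool:
    """[] ((a != b) -> <> (a == b)) on a finite trace."""
    return all(equal(s) or eventually(equal, trace, i)
               for i, s in enumerate(trace))


good = [{"a": 1, "b": 2}, {"a": 2, "b": 2}, {"a": 3, "b": 3}]
bad = [{"a": 1, "b": 1}, {"a": 1, "b": 2}]
print(responds(good), responds(bad))
```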
68. LTL formulae
69. Spin's Algorithms
- Checking safety properties
  - basic depth-first search
    - variant 1: stateless search
    - variant 2: depth-limited search
  - breadth-first search
- Checking liveness properties
  - non-progress cycles
  - acceptance cycles
  - Spin's nested depth-first search algorithm
- Optimization methods
  - partial order reduction, state compression
  - alternate state representation methods
70. Breadth-first vs. Depth-first search
- Pro
  - With the breadth-first search, safety violations are detected at the shortest possible distance from the root
- Con
  - In breadth-first search, we can no longer use the contents of the stack to produce a complete counter-example when a safety violation is found
  - We must now store with each state a pointer to at least one predecessor state to be able to reconstruct the path from root to error state, which increases memory use
    - The state space cannot be lossy
  - No efficient strategy is known for extending a breadth-first search to do cycle detection (to check liveness properties).
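The predecessor-pointer bookkeeping can be sketched as follows. The graph is the seven-state reachability graph from the earlier walkthrough (0 to 1 to 2, 0 to 6 to 2, and the cycle 2, 3, 4, 5, 2); treating state 4 as the "bad" state is purely an assumption for the example.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Optional


def shortest_counterexample(initial: Hashable,
                            successors: Callable[[Hashable], Iterable[Hashable]],
                            is_bad: Callable[[Hashable], bool]) -> Optional[List[Hashable]]:
    """Breadth-first search that stores one predecessor per state so the
    path from the root to the first bad state can be rebuilt."""
    parent: Dict[Hashable, Optional[Hashable]] = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_bad(state):
            path = []
            current: Optional[Hashable] = state
            while current is not None:      # follow predecessor pointers
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:           # parent doubles as the visited set
                parent[nxt] = state
                queue.append(nxt)
    return None


GRAPH = {0: [1, 6], 1: [2], 6: [2], 2: [3], 3: [4], 4: [5], 5: [2]}
trace = shortest_counterexample(0, lambda s: GRAPH[s], lambda s: s == 4)
print(trace)
```

The `parent` dictionary is exactly the extra memory the slide refers to: every stored state carries a pointer that a depth-first search would get for free from its stack.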