Title: The Rare Glitch Project: Scenario Graph Generation and MDP-Based Analysis
1 The Rare Glitch Project: Scenario Graph Generation and MDP-Based Analysis
Jeannette M. Wing
- Computer Science Department, Carnegie Mellon University, Pittsburgh, PA
The work on survivability analysis is funded by
DARPA. It is done jointly with Somesh Jha
(University of Wisconsin) and Oleg Sheyner (CMU
graduate student).
2 Overview of Our Method
3 Relation to ARO Proposal
- Hypothesis: Our two-phase method and tool suite are applicable to analyzing embedded systems.
- Rationale
- Network model is simpler (bus-based, not end-to-end).
- Reliability and cost are important factors in embedded systems design.
- Our plan
- Enrich and make current tool suite robust.
- Apply method to embedded systems examples.
- Pursue foundational issues wrt models for reliability and cost analyses.
4 Survivable Systems
- What if
- a terrorist hacker brings down the nation's power grid?
- an act of Mother Nature causes the international financial network to fail?
- Critical infrastructures
- Utilities: gas, electricity, nuclear, water, ...
- Communications: telephone, networks, ...
- Transportation: airlines, railways, highways, ...
- Medical: emergency services, hospitals, ...
- Financial: banking, trading, ...
5 Survivability
- A system is survivable if it can continue to
provide end services despite the presence of
faults.
6 Modeling for Survivability Analysis
- Our starting point
- Handle both benign and malicious faults.
- Throw out independence assumption.
- Incorporate semantics of end service in model.
- Do not necessarily treat nodes and links the same.
- Include cost in the model from the start.
- Steps in our approach
- Model general network topology (nodes and links).
- Analyze in two phases for
- Functional behavior
- Reliability, cost, etc.
7 Phase 1
- Network Model: A set of concurrently executing Finite State Machines.
- Survivability Property: A predicate in CTL.
- Model Checker: (modified) NuSMV.
- Scenario Graph: A set of related examples.
8 Simple Example: A Banking System
[Figure: banking network. Labels: federal reserve banks FRB 2, FRB 3; money centers MC 1, MC 2, MC 3; links a1, a2, b1, b2, c1.]
9 Network Model
- Processes
- Nodes and links are processes (i.e., FSMs): banks, money centers, federal reserve banks, and links.
- Communication via shared variables (i.e., finite queues) representing channels, and hence interconnections.
- Failures
- Faults represented by special state variable: fault ∈ {normal, failed, intruded}.
- Links and banks can fail at any time.
- Failed link blocks all traffic.
- Failed bank routes all checks to an arbitrarily chosen money center.
- Money centers and federal reserve banks do not fail.
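The failure rules above can be sketched as a one-step routing function. This is only an illustration, not the NuSMV model from the talk: the function name `deliverable_hops`, the `link_faults` mapping, and the money center names used as keys are all hypothetical, and the "intruded" fault value is not modeled because the slide does not describe its behavior.

```python
def deliverable_hops(bank_fault, link_faults, intended):
    """Return the money centers a check can actually reach in one step.

    bank_fault: "normal" or "failed" (any other value is treated like "normal").
    link_faults: hypothetical mapping from money center name to the fault
        state of the link leading to it.
    intended: the money center the check should be routed to.
    """
    # A failed bank may pick any money center; a normal bank picks the intended one.
    candidates = set(link_faults) if bank_fault == "failed" else {intended}
    # A failed link blocks all traffic, so drop candidates behind failed links.
    return {mc for mc in candidates if link_faults[mc] != "failed"}


one_link_down = {"MC-1": "normal", "MC-2": "failed"}
all_links_up = {"MC-1": "normal", "MC-2": "normal"}

normal_case = deliverable_hops("normal", one_link_down, "MC-1")
blocked_case = deliverable_hops("normal", one_link_down, "MC-2")
failed_bank_case = deliverable_hops("failed", all_links_up, "MC-1")
```

Returning a set rather than a single hop mirrors the nondeterminism of the model: each element is one possible successor.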
10 Survivability Properties
- Fault-related
- Money never deposited into wrong account.
- AG(¬error)
- Service-related
- A check issued eventually clears.
- AG(checkIssued → AF(checkCleared))
11 Inputs to Model Checker
- State machines
- MODULE main
- fault : {normal, fail-stop, Byzantine, hacker-attack, terrorist-attack, link-down, ...}
- next(fault) := case
- fault = normal : {normal, fail-stop, ...};
- ...
- Pi(vn) : {hacker-attack, terrorist-attack};
- default : fault;
- esac
- MODULE bank(user, <other input parameters>)
- next(...) := case
- Pj(vm) & fault = normal => <route check to user.destination>
- ...
- Property
- AG(¬error)
12 Scenario Graphs
- Given a state machine, M, and a property, P, a scenario graph is a concise representation of the set of traces of M with respect to P.
- P = fault property
- A fault scenario graph represents all system traces that end in a state that does not satisfy P.
- P = service property
- A service success (fail) scenario graph represents all system traces in which an issued service successfully finishes (fails to finish).
13 Fault Scenario Graph
- Intuition
- Each counterexample spit out by the model checker is a scenario.
- Survivability property gives a slice of the model.
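One way to picture the "slice" intuition is as an explicit-state computation: keep only the states that are reachable from the initial state and can still reach a state violating P. The sketch below assumes a small explicit transition relation and is not how the modified NuSMV works internally (which the slides do not describe); the state names and the `violates` predicate are hypothetical.

```python
from collections import deque


def reachable(starts, edges):
    """Return every state reachable from `starts` by following `edges`."""
    seen, queue = set(starts), deque(starts)
    while queue:
        state = queue.popleft()
        for nxt in edges.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def fault_scenario_graph(init, edges, violates):
    """Keep the states and edges lying on some path from `init` to a violating state."""
    forward = reachable({init}, edges)
    bad = {s for s in forward if violates(s)}
    # Build reversed edges so we can search backward from the violating states.
    reverse = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reverse.setdefault(dst, set()).add(src)
    backward = reachable(bad, reverse)
    keep = forward & backward
    return {s: sorted(d for d in edges.get(s, ()) if d in keep) for s in sorted(keep)}


# Hypothetical toy model: one branch clears the check, the other hits an error.
edges = {
    "issued": ["at_mc1", "at_mc2"],
    "at_mc1": ["cleared"],
    "at_mc2": ["error"],
    "cleared": [],
    "error": [],
}
graph = fault_scenario_graph("issued", edges, lambda s: s == "error")
```

Every path in `graph` from "issued" to "error" corresponds to one counterexample to AG(¬error); the branch through "at_mc1" is sliced away because it never reaches a violation.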
14 Survivability Properties
- Fault-related
- Money never deposited into wrong account.
- AG(¬error)
- Service-related
- A check issued eventually clears.
- AG(checkIssued → AF(checkCleared))
15 A Service Success Scenario Graph
[Figure: service success scenario graph. Node labels: issueCheck(A, C); send(A, MC-1), send(A, MC-2); send(MC-1, FRB-2), send(MC-2, FRB-1); send(FRB-1, FRB-3), send(FRB-2, FRB-3); send(FRB-3, MC-3); send(MC-3, C); debitAccount.]
16 A Service Fail Scenario Graph
[Figure: service fail scenario graph. Node labels: issueCheck(A, C); down(A), up(A); pick(MC-1), pick(MC-2); up(a1), down(a1); up(a2), down(a2); send(A, MC-1), send(A, MC-2); down(c1); FAIL.]
17 Overview of Method
[Figure: method overview. Phase 1: Network Model, Survivability Property, Checker, Scenario Graph. Phase 2: Reliability Query, Cost Query, etc.; Analyzer; Scenario Set.]
18 Phase 2: Reliability Analysis (in a Nutshell)
- Annotations: Probabilities
- Use Bayesian Networks to model dependence of events.
- Symbolic
- Use symbolic probabilities: {high, medium, low}.
- Use NDFA theory to compute scenario set.
- Continuous
- Use numeric probabilities: [0.0, 1.0].
- Use Markov Decision Processes to model both nondeterministic and probabilistic transitions.
19 Phase 2a: Symbolic Analysis
[Figure: Phase 2a pipeline. Labels: Bayesian Network; Scenario Graph; Annotated Scenario Graph (ASG); Reliability Query; Regular Expression (DFA); Composer (ASG, DFA); Scenario Set; High-risk scenarios.]
20 Symbolic Reliability Analysis
- Symbolic values
- high, medium, low
- Operations on symbolic values
- Joint probability of two events, x and y (row: x; column: y)
-        | high         | medium       | low
- high   | high         | high         | high, medium
- medium | high         | high, medium | medium, low
- low    | high, medium | medium, low  | low
- Complement of an event: 1 - x
- 1 - high = low
- 1 - medium = medium
- 1 - low = high
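The symbolic operations above amount to two lookup tables. The sketch below encodes them directly, assuming the joint-probability table is symmetric and that a cell listing two values means either result is possible; that reading of the flattened slide table, and the names `JOINT`, `COMPLEMENT`, `joint`, and `complement`, are assumptions.

```python
# Joint-probability table as read from the slide (upper triangle only, since
# the table is symmetric). Each entry is the set of possible symbolic results.
JOINT = {
    ("high", "high"): {"high"},
    ("high", "medium"): {"high"},
    ("high", "low"): {"high", "medium"},
    ("medium", "medium"): {"high", "medium"},
    ("medium", "low"): {"medium", "low"},
    ("low", "low"): {"low"},
}

# Complement table: 1 - x for each symbolic value.
COMPLEMENT = {"high": "low", "medium": "medium", "low": "high"}


def joint(x, y):
    """Return the set of possible symbolic values for the joint probability of x and y."""
    key = (x, y) if (x, y) in JOINT else (y, x)
    return JOINT[key]


def complement(x):
    """Return the symbolic value of 1 - x."""
    return COMPLEMENT[x]


example_joint = joint("low", "high")
example_complement = complement("high")
```

Storing only the upper triangle and swapping the arguments on lookup keeps the table small and makes the symmetry explicit.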
21 Bayesian Network
[Figure: Bayesian network with nodes a1 and a2; P(a1) = medium.]
22 Annotated Scenario Graph
[Figure: annotated scenario graph. Node labels: issueCheck(A, C); down(A), up(A); pick(MC-1), pick(MC-2); up(a1), down(a1); down(a2) (twice); FAIL.]
23 Phase 2: Continuous Analysis
- Use real values for probabilities.
- May leave probabilities of some events unspecified.
- Markov Decision Processes
- Mix of nondeterministic and probabilistic transitions
- Why? System is not closed.
- Hard to assign probabilities to some faults (e.g., intrusions).
- Environment makes choice (i.e., decision) and can be demonic!
24 Reliability Analysis
- Goal of (malicious) environment: Devise an optimal policy to minimize reliability.
- Assign to each state, s, a value, V(s), computed using a standard policy iteration algorithm from MDP literature.
- Let V be the value function after convergence. Then, for initial state of scenario graph, s0, V(s0) is the worst-case probability of service eventually finishing.
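The value computation can be illustrated on a toy MDP. Note that the sketch below uses value iteration rather than the policy iteration named on the slide, because it is shorter to write down; both compute the same worst-case values on a finite MDP like this one. The MDP itself, the state and action names, and the function `worst_case_reach` are hypothetical.

```python
def worst_case_reach(mdp, good, bad, tol=1e-12, max_iter=10_000):
    """Approximate the minimum probability of eventually reaching `good`.

    mdp maps each non-terminal state to {action: {successor: probability}}.
    The environment picks the action, so each state takes the minimum over actions.
    """
    values = {state: 0.0 for state in mdp}
    values[good], values[bad] = 1.0, 0.0
    for _ in range(max_iter):
        delta = 0.0
        for state, actions in mdp.items():
            if state in (good, bad):
                continue
            # Bellman update: the demonic environment minimizes expected value.
            new = min(
                sum(p * values[succ] for succ, p in dist.items())
                for dist in actions.values()
            )
            delta = max(delta, abs(new - values[state]))
            values[state] = new
        if delta < tol:
            break
    return values


# Hypothetical example: in s0 the environment either moves to s1 or gambles directly.
toy_mdp = {
    "s0": {"go_s1": {"s1": 1.0}, "direct": {"Good": 0.7, "Bad": 0.3}},
    "s1": {"only": {"Good": 0.6, "Bad": 0.4}},
}
V = worst_case_reach(toy_mdp, good="Good", bad="Bad")
```

Here the environment prefers the route through s1, because it lowers the chance of reaching Good from 0.7 to 0.6, so V["s0"] converges to 0.6.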
25 A Typical Example
[Figure: example MDP with transition probabilities 0.6, 0.6, 0.7; V(Bad) = 0.0, V(Good) = 1.0.]
26 Bayesian Network for Bank Example
[Figure: Bayesian network with nodes a1 and a2; P(a1) = 1/2.]
27 A Service Success Scenario Graph
[Figure: service success scenario graph. Node labels: issueCheck(A, C); send(A, MC-1), send(A, MC-2); send(MC-1, FRB-2), send(MC-2, FRB-1); send(FRB-1, FRB-3), send(FRB-2, FRB-3); send(FRB-3, MC-3); send(MC-3, C); debitAccount.]
The worst-case probability that a check issued by Bank A on Bank C clears is (1/2 × 3/8) + (1/2 × 1/4) = 5/16.
28 Cost-Benefit Analysis
- Goal: Choose a set of links to upgrade to achieve higher reliability, given my cost constraints (e.g., fixed budget).
- Identify new actions that correspond to decisions an architect needs to make (e.g., upgrade a1).
- Associate a cost with each action.
- Define constraints on costs.
29 Upgrade Links in Banking System
[Figure: banking network. Labels: federal reserve banks FRB 2, FRB 3; money centers MC 1, MC 2, MC 3; links a1, a2, b1, b2, c1.]
30 Cost Constraint Example
- Assume
- If we upgrade a1 and c1, then P(a1) and P(c1) both increase to 3/4.
- If a2 is upgraded, then P(a2) is
- P(a2 | a1) = 3/4
- P(a2 | ¬a1) = 3/8
- Aim: Maximize the worst-case reliability subject to the constraint that at most two links can be upgraded. Solve this non-linear integer programming problem:
- x_a1 + x_a2 + x_c1 ≤ 2
- Best option: Upgrade a1 and c1.
- x_a1 = 1 and x_a2 = 1: 7/16
- x_a1 = 1 and x_c1 = 1: 39/64
- x_a2 = 1 and x_c1 = 1: 9/16
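With only three candidate links, the constrained choice can be checked by brute-force enumeration. The sketch below does not recompute the reliabilities from the MDP; it takes the three worst-case values quoted on the slide as given and selects the best option within the budget. Reading the slide's third option as upgrading a2 and c1 is an assumption, and the names `RELIABILITY` and `best_upgrade` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

LINKS = ("a1", "a2", "c1")
BUDGET = 2  # at most two links may be upgraded

# Worst-case reliabilities quoted on the slide for each pair of upgrades.
# Assumption: the third option on the slide refers to links a2 and c1.
RELIABILITY = {
    frozenset({"a1", "a2"}): Fraction(7, 16),
    frozenset({"a1", "c1"}): Fraction(39, 64),
    frozenset({"a2", "c1"}): Fraction(9, 16),
}


def best_upgrade(links, budget, reliability):
    """Return the upgrade set within budget that has the highest quoted reliability."""
    candidates = [
        frozenset(combo)
        for size in range(budget + 1)
        for combo in combinations(links, size)
    ]
    # Only options with a quoted reliability can be compared.
    scored = [(reliability[c], c) for c in candidates if c in reliability]
    value, choice = max(scored, key=lambda pair: pair[0])
    return sorted(choice), value


best, value = best_upgrade(LINKS, BUDGET, RELIABILITY)
```

Enumeration is feasible here only because the instance is tiny; the slide frames the general case as a non-linear integer program.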
31 Constrained Markov Decision Processes
- <S, A, P, c, d>
- S is a finite state space.
- A is a finite set of actions.
- P are transition probabilities. P_sas' is the probability of moving from state s to s' if action a is chosen.
- c : (S × A) → ℝ is the immediate cost. c(s, a) is the cost of choosing action a at state s.
- d : (S × A) → ℝ^k is a k-dimensional vector of immediate costs that captures additional cost constraints.
32 Survivability Case Studies
- Somesh Jha
- Trading floor model of major investment bank (being sanitized)
- 10K lines of NuSMV
- half-million nodes in scenario graph
- 50 threat scenarios
- 45 found by system
- 5 new threat scenarios found
- With independence assumption, too many misses.
- B2B e-commerce NYC start-up (Jha)
- 50K lines of Statecharts
- 2 million NuSMV beyond capability of tool
- Oleg Sheyner
- Intrusion detection (ongoing)
33 Intrusion Detection System Case Study
- Done by Oleg Sheyner and Lincoln Labs.
- Motivated by hand-drawn poster of attack scenarios.
- Illustrates only first part of method.
34 Example Attack Tree for Novice Attacker
- Attacker's goal is to corrupt the database on S1.
- Any path from the beginning to end accomplishes this goal.
- Events in red are detected by an intrusion detection system.
- Goal is to generate a complete attack graph and determine visibility automatically.
35 Example of Attack Tree Developed by a Professional Red Team
- Sandia Red Team White Board attack tree from DARPA CC20008 Information battle space preparation experiment
36 Phase 1 Example: Multistage Network Penetration
Goal: Root access to host ip2
37 Model
- Network
- hosts
- services
- connectivity
- trust relationships
- Adversary
- Knowledge about the network
- Privilege levels on hosts
- Attacks
- Preconditions
- Local (adversary)
- Global (network-wide)
- Traces
- Effects
- Local (adversary)
- Global (network-wide)
- Different flavors
- Intrusion detection system
- Network (inter-host)
- Host-based (local)
38 NuSMV Encoding
- Network
- 1 attack host, 2 target hosts with services
- 3x3 connectivity matrix
- existence of routing path
- ability to connect to ftp and ssh services
- 3x3 trust matrix
- Adversary
- Privilege levels for each host
- Attacks
- 4 attacks
- some have multiple flavors
- NuSMV Statistics
- 82 bits of state (2^82 states)
- <40K representation nodes
- 7000 reachable states
- 2 sec runtime on 1GHz Pentium III
- 8MB of memory used
39 Scenario-Generating Properties
- Don't care about detection
- AG (adversary.privilege2 < root)
- Want stealth
- AG ((adversary.privilege2 < root) or (IDS.detected))
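The two properties differ only in which violating traces they admit: the first yields every attack sequence that reaches root on the target, the second only those the IDS never detects. The sketch below illustrates this on an explicit toy model in which each attack has preconditions, effects, and a detection flag. The attack names, the fact strings, and the function `attack_paths` are all hypothetical; the slides do not name the four attacks in the real encoding.

```python
from collections import deque
from typing import NamedTuple


class Attack(NamedTuple):
    name: str
    pre: frozenset      # facts that must hold before the attack
    effect: frozenset   # facts the attack adds
    detected: bool      # whether the IDS sees this attack


def attack_paths(initial, attacks, goal):
    """Enumerate attack sequences reaching `goal`; each step must add a new fact."""
    paths, queue = [], deque([(frozenset(initial), ())])
    while queue:
        facts, path = queue.popleft()
        if goal in facts:
            paths.append(path)
            continue
        for attack in attacks:
            # Applicable if preconditions hold and the attack gains something new.
            if attack.pre <= facts and not attack.effect <= facts:
                queue.append((facts | attack.effect, path + (attack,)))
    return paths


# Hypothetical attacks on a three-host network (attacker ip0, targets ip1 and ip2).
ATTACKS = [
    Attack("exploit_ftp_ip1", frozenset({"user@ip0"}), frozenset({"user@ip1"}), False),
    Attack("local_escalate_ip1", frozenset({"user@ip1"}), frozenset({"root@ip1"}), False),
    Attack("trust_login_ip2", frozenset({"root@ip1"}), frozenset({"root@ip2"}), False),
    Attack("exploit_ssh_ip2", frozenset({"user@ip0"}), frozenset({"root@ip2"}), True),
]

all_paths = attack_paths({"user@ip0"}, ATTACKS, "root@ip2")
# Stealthy paths correspond to violations of the second property: root is
# reached and no step was detected.
stealthy = [[a.name for a in p] for p in all_paths if not any(a.detected for a in p)]
```

Because every step must add at least one new fact, the search terminates on any finite set of facts.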
42 Survivability Analysis Tool Suite
43 The Rare Glitch Tool Suite
[Figure: tool suite. Labels: Specification and Modeling Languages; Checkers and Provers; Analysis Engines; Reliability and Cost Analyzers; NuSMV.]
44 Plan in Relationship to The Rare Glitch Project
- Enrich and make current tool suite robust.
- Integrate with other existing project tools.
- Apply method to embedded systems examples.
- Work with Clarke to add reliability analysis to automotive example.
- Pursue foundational issues wrt models for reliability and cost analyses.
- Understand relationship to other probabilistic models, hybrid models, etc.