Title: Survivability Analysis of Networked Systems
1Survivability Analysis of Networked Systems
Jeannette M. Wing, with Somesh Jha (Wisconsin) and Oleg Sheyner
DARPA co-PI: Tom Longstaff (SEI)
- Computer Science Department
- Carnegie Mellon University
- Pittsburgh, PA
- DARPA OASIS PI Meeting, Norfolk, VA
- 14 February 2001
2Survivability
- What if
- a terrorist hacker brings down the nation's power grid?
- an act of Mother Nature causes the US banking network to fail?
- Critical infrastructures
- Utilities: gas, electricity, nuclear, water, ...
- Communications: telephone, networks, ...
- Transportation: airlines, railways, highways, ...
- Medical: emergency services, hospitals, ...
- Financial: banking, trading, ...
3Survivability
- A system is survivable if it can continue to provide end services despite the presence of faults.
- Faults
- Accidental or malicious
- Not necessarily independent
- Finer-grained reliability analysis is required.
- Service-oriented
- Exploit semantics of application
- Not all network nodes and links are treated
equally.
4Foundational Questions
- What is the difference between models for survivability and those for:
- Fault-tolerant distributed systems?
- Secure systems?
- Our starting point
- Independence assumption goes out the window.
- Cost must be included in the equation.
5Determining Survivability Strategies
[Figure: Survivable Network Analysis. Recoverable labels: System Requirements/Architecture; SEI CERT/CC Intrusion Knowledge; Essential Services; Intrusion Effects; Mitigation Strategies; Improved Requirements/Architecture.]
6Two Parts in Cooperation
- The Survivable Network Analysis Method (SEI)
- Measures existing systems for survivability
- Focuses on user and intruder models
- Applying Formal Methods to SNA
- Applies model checking and other techniques to survivability
- Allows systems that are formally specified to be submitted to survivability analysis
7The Survivable Network Analysis Method
- STEP 1
- SYSTEM DEFINITION
- Mission requirements definition
- Architecture definition and elicitation
- STEP 2
- ESSENTIAL CAPABILITY DEFINITION
- Essential service/asset selection/scenarios
- Essential component identification
- STEP 3
- COMPROMISABLE CAPABILITY DEFINITION
- Intrusion selection/scenarios
- Compromisable component identification
- STEP 4
- SURVIVABILITY ANALYSIS
- Softspot component (essential and compromisable) identification
- Resistance, recognition, and recovery analysis
- Survivability Map development
8Simple Example A Banking System
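[Figure: banking network. Recoverable labels: federal reserve banks FRB 2 and FRB 3; money centers MC 1, MC 2, and MC 3; links a1, a2, b1, b2, and c1.]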
9Overview of Our Formal Method
10Phase 1
- Network Model: a set of concurrently executing Finite State Machines.
- Survivability Property: a predicate in CTL.
- Model Checker: (modified) NuSMV.
- Scenario Graph: a set of related examples.
11Network Model
- Processes
- Nodes and links are processes (i.e., FSMs)
- banks, money centers, federal reserve banks, and links
- Communication via shared variables (i.e., finite queues)
- representing channels, and hence interconnections.
- Failures
- Faults represented by special state variable
- fault ∈ {normal, failed, intruded}
- Links and banks can fail at any time
- Failed link blocks all traffic.
- Failed bank routes all checks to an arbitrarily chosen money center.
- Money centers and federal reserve banks do not fail.
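The failure rules above can be illustrated with a small Python sketch. This is not the NuSMV model from the talk: the class names `Link` and `Bank`, the queue capacity, the mapping of links a1 and a2 to money centers MC-1 and MC-2, and the use of a seeded random choice to stand in for "arbitrarily chosen" are all assumptions made for illustration.

```python
import random
from collections import deque
from dataclasses import dataclass, field

FAULT_VALUES = ("normal", "failed", "intruded")  # values of the special fault variable


@dataclass
class Link:
    """A link process whose channel is a finite queue (capacity is hypothetical)."""
    name: str
    capacity: int = 4
    fault: str = "normal"
    queue: deque = field(default_factory=deque)

    def send(self, check) -> bool:
        # A failed link blocks all traffic; a full queue also refuses the check.
        if self.fault == "failed" or len(self.queue) >= self.capacity:
            return False
        self.queue.append(check)
        return True


@dataclass
class Bank:
    """A bank process that forwards checks to a money center over its links."""
    name: str
    links: dict  # money-center name -> Link (hypothetical layout)
    fault: str = "normal"

    def route(self, check, intended: str, rng: random.Random) -> str:
        # A failed bank routes the check to an arbitrarily chosen money center.
        if self.fault == "failed":
            target = rng.choice(sorted(self.links))
        else:
            target = intended
        delivered = self.links[target].send(check)
        return target if delivered else "blocked"


rng = random.Random(0)
a1, a2 = Link("a1"), Link("a2")
bank_a = Bank("A", {"MC-1": a1, "MC-2": a2})

print(bank_a.route("check#1", "MC-1", rng))  # MC-1: normal bank, normal link
a1.fault = "failed"
print(bank_a.route("check#2", "MC-1", rng))  # blocked: the failed link drops traffic
```

Money centers and federal reserve banks are left out because the slide states that they do not fail.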
12Survivability Properties
- Fault-related
- Money never deposited into wrong account.
- AG(¬error)
- Service-related
- A check issued eventually clears.
- AG(checkIssued → AF(checkCleared))
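To make the two property shapes concrete, here is a minimal explicit-state sketch in Python, assuming a tiny hand-written transition system. The state names (`idle`, `issued`, `in_transit`, `cleared`, `lost`) are hypothetical, and the talk's tool uses a (modified) NuSMV rather than this enumeration. The invariant "AG not error" is checked over all reachable states, and AF is computed as a least fixpoint.

```python
def reachable(init, succ):
    """All states reachable from init."""
    seen, stack = {init}, [init]
    while stack:
        state = stack.pop()
        for nxt in succ[state]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def af(states, succ, goal):
    """States from which every path eventually reaches goal (least fixpoint)."""
    sat = {s for s in states if s in goal}
    changed = True
    while changed:
        changed = False
        for s in states:
            if s not in sat and succ[s] and all(t in sat for t in succ[s]):
                sat.add(s)
                changed = True
    return sat


def check_properties(succ, init, error, issued, cleared):
    """Return (AG not error, AG(issued implies AF cleared))."""
    states = reachable(init, succ)
    af_cleared = af(states, succ, cleared)
    no_error = all(s not in error for s in states)
    always_clears = all(s not in issued or s in af_cleared for s in states)
    return no_error, always_clears


# Hypothetical model in which a check can be lost in transit.
LOSSY = {
    "idle": ["issued"],
    "issued": ["in_transit"],
    "in_transit": ["cleared", "lost"],
    "cleared": ["cleared"],  # terminal states loop so every path is infinite
    "lost": ["lost"],
}
# Same model without the loss transition.
RELIABLE = {
    "idle": ["issued"],
    "issued": ["in_transit"],
    "in_transit": ["cleared"],
    "cleared": ["cleared"],
}

print(check_properties(LOSSY, "idle", set(), {"issued"}, {"cleared"}))     # (True, False)
print(check_properties(RELIABLE, "idle", set(), {"issued"}, {"cleared"}))  # (True, True)
```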
13Inputs to Model Checker
- State machines
- MODULE main
- fault : {normal, fail-stop, Byzantine, hacker-attack, terrorist-attack, link-down, ...};
- next(fault) := case
- fault = normal : {normal, fail-stop, ...};
- ...
- Pi(vn) : {hacker-attack, terrorist-attack};
- default : fault;
- esac
- MODULE bank(user, <other input parameters>)
- next(...) := case
- Pj(vm) & fault = normal => <route check to user.destination>
- ...
- Property
- AG(¬error)
14Output Fault Scenario Graph
- Intuition
- Each counterexample spit out by the model checker is a scenario.
- Survivability property gives a slice of the model.
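The intuition can be sketched in Python: enumerate every loop-free path from the initial state to a failing state (each path is one scenario) and take the union of their edges as the scenario graph. This is an illustrative simplification, not the modified NuSMV algorithm; the function names and the small transition table, which reuses event names from the talk as if they were states, are assumptions.

```python
def scenarios(succ, init, bad):
    """Yield every loop-free path from init to a state in bad."""
    stack = [(init, (init,))]
    while stack:
        state, path = stack.pop()
        if state in bad:
            yield path
            continue
        for nxt in succ.get(state, ()):
            if nxt not in path:
                stack.append((nxt, path + (nxt,)))


def scenario_graph(paths):
    """Union of the edges of all scenarios: a slice of the full model."""
    edges = set()
    for path in paths:
        edges.update(zip(path, path[1:]))
    return edges


# Hypothetical model: a check sent through either money center fails if c1 goes down.
SUCC = {
    "issueCheck(A, C)": ["send(A, MC-1)", "send(A, MC-2)"],
    "send(A, MC-1)": ["down(c1)", "cleared"],
    "send(A, MC-2)": ["down(c1)", "cleared"],
    "down(c1)": ["FAIL"],
}

paths = list(scenarios(SUCC, "issueCheck(A, C)", {"FAIL"}))
graph = scenario_graph(paths)
print(len(paths), "scenarios,", len(graph), "edges")  # 2 scenarios, 5 edges
```

The successful `cleared` branch does not appear in the result, which is the sense in which the property selects a slice of the model.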
15Survivability Properties
- Fault-related
- Money never deposited into wrong account.
- AG(¬error)
- Service-related
- A check issued eventually clears.
- AG(checkIssued → AF(checkCleared))
16A Service Success Scenario Graph
issueCheck(A, C)
send(A, MC-2)
send(A, MC-1)
send(MC-2, FRB-1)
send(MC-1, FRB-2)
send(FRB-1, FRB-3)
send(FRB-2, FRB-3)
send(FRB-3, MC-3)
send(MC-3, C)
debitAccount
17A Service Fail Scenario Graph
issueCheck(A, C)
down(A)
up(A)
pick(MC-2)
pick(MC-1)
down(c1)
FAIL
up(a2)
down(a2)
down(a1)
up(a1)
send(A, MC-2)
send(A, MC-1)
down(c1)
down(c1)
FAIL
FAIL
FAIL
18Overview of Method
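- Phase 1: the Network Model and the Survivability Property go into the Checker, which produces a Scenario Graph.
- Phase 2: the Scenario Graph and a Reliability Query, Cost Query, etc. go into the Analyzer, which produces a Scenario Set.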
19Phase 2 Reliability Analysis (in a Nutshell)
- Annotations: Probabilities
- Use Bayesian Networks to model dependence of events.
- Symbolic
- Use symbolic probabilities
- high, medium, low
- Use NDFA theory to compute scenario set.
- Continuous
- Use numeric probabilities
- [0.0, 1.0]
- Use Markov Decision Processes to model both nondeterministic and probabilistic transitions.
20Phase 2a Symbolic Analysis
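[Figure: symbolic analysis pipeline. Inputs: Bayesian Network, Scenario Graph, Reliability Query. Intermediate artifacts: Annotated Scenario Graph (ASG), Regular Expression (DFA). A Composer combines ASG and DFA. Output: Scenario Set (high-risk scenarios).]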
21Symbolic Reliability Analysis
- Symbolic values
- high, medium, low
- Operations on symbolic values
- Joint probability of two events, x and y

  x \ y  | high         | medium       | low
  high   | high         | high         | high, medium
  medium | high         | high, medium | medium, low
  low    | high, medium | medium, low  | low

- Complement of an event, 1 - x
- 1 - high = low
- 1 - medium = medium
- 1 - low = high
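The two operations can be written as lookup tables. The sketch below is an illustration in Python, assuming the joint-probability table is read row by row from the slide, with entries such as "high, medium" taken to mean a set of possible results. The names `JOINT`, `COMPLEMENT`, `joint`, and `complement` are hypothetical.

```python
LEVELS = ("high", "medium", "low")

# Joint-probability table as read from the slide; each entry is a set of possible results.
JOINT = {
    ("high", "high"): {"high"},
    ("high", "medium"): {"high"},
    ("high", "low"): {"high", "medium"},
    ("medium", "high"): {"high"},
    ("medium", "medium"): {"high", "medium"},
    ("medium", "low"): {"medium", "low"},
    ("low", "high"): {"high", "medium"},
    ("low", "medium"): {"medium", "low"},
    ("low", "low"): {"low"},
}

# Complement table: 1 - x.
COMPLEMENT = {"high": "low", "medium": "medium", "low": "high"}


def joint(xs, ys):
    """Lift the table to sets of symbolic values so results can be chained."""
    out = set()
    for x in xs:
        for y in ys:
            out |= JOINT[(x, y)]
    return out


def complement(xs):
    """Complement each symbolic value in the set."""
    return {COMPLEMENT[x] for x in xs}


print(sorted(joint({"high"}, {"low"})))      # ['high', 'medium']
print(sorted(complement({"high", "low"})))   # ['high', 'low']
```

Working on sets rather than single values is a design choice made here so that an ambiguous result can be fed back into the next operation along a scenario.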
22Bayesian Network
[Figure: Bayesian network over events a1 and a2, annotated with P(a1) = medium.]
23Annotated Scenario Graph
issueCheck(A, C)
down(A)
up(A)
pick(MC-2)
pick(MC-1)
up(a1)
down(a1)
down(a2)
down(a2)
FAIL
24Phase 2b Continuous Analysis
- Use real values for probabilities.
- May leave probabilities of some events unspecified.
- Markov Decision Processes
- Mix of nondeterministic and probabilistic transitions
- Why? System is not closed.
- Hard to assign probabilities to some faults (e.g., intrusions).
- Environment makes choice (i.e., decision) and can be demonic!
25Reliability Analysis
- Goal of (malicious) environment: Devise an optimal policy to minimize reliability.
- Assign to each state, s, a value, V(s), computed using a standard policy iteration algorithm from MDP literature.
- Let V be the value function after convergence. Then, for initial state of scenario graph, s0, V(s0) computes worst-case probability of service eventually finishing.
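A minimal Python sketch of this computation follows. It uses value iteration rather than the policy iteration named on the slide, since value iteration is shorter to write and reaches the same worst-case value on this small example. The MDP encoding, the state and action names, and the probabilities are hypothetical.

```python
def worst_case_reachability(mdp, good, bad, tol=1e-12, max_iter=10_000):
    """Minimum probability, over environment policies, of eventually reaching good.

    mdp maps each state to {action: [(probability, next_state), ...]}.
    Every state outside good and bad must have at least one action.
    """
    V = {s: 0.0 for s in mdp}
    for s in good:
        V[s] = 1.0
    for _ in range(max_iter):
        delta = 0.0
        for s, actions in mdp.items():
            if s in good or s in bad:
                continue
            # The demonic environment picks the action that minimizes reliability.
            new = min(sum(p * V[t] for p, t in outcomes) for outcomes in actions.values())
            delta = max(delta, abs(new - V[s]))
            V[s] = new
        if delta < tol:
            break
    return V


# Hypothetical scenario graph: in s0 the environment chooses between two actions.
MDP = {
    "s0": {
        "attack": [(0.6, "Good"), (0.4, "Bad")],
        "wait": [(0.7, "Good"), (0.3, "s0")],
    },
    "Good": {},
    "Bad": {},
}

V = worst_case_reachability(MDP, good={"Good"}, bad={"Bad"})
print(V["s0"])  # worst-case probability of the service eventually finishing
```

Starting from V = 0 outside the good states makes the iteration converge to the least fixed point, which is the minimal reachability probability.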
26A Typical Example
[Figure: example MDP with transition probabilities 0.6, 0.6, and 0.7; V(Bad) = 0.0 and V(Good) = 1.0.]
27A Service Success Scenario Graph
issueCheck(A, C)
send(A, MC-2)
send(A, MC-1)
send(MC-2, FRB-1)
send(MC-1, FRB-2)
send(FRB-1, FRB-3)
send(FRB-2, FRB-3)
send(FRB-3, MC-3)
send(MC-3, C)
debitAccount
The worst-case probability that a check issued by Bank A on Bank C clears is (1/2 × 3/8) + (1/2 × 1/4) = 5/16.
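The annotation on this slide combines two branches, each taken with probability 1/2, whose worst-case success probabilities are 3/8 and 1/4. The arithmetic can be checked with exact fractions in Python. Reading the two branches as the two routes out of Bank A is an assumption, as is the variable naming.

```python
from fractions import Fraction

# (probability of taking the branch, worst-case success probability on that branch)
branches = [
    (Fraction(1, 2), Fraction(3, 8)),
    (Fraction(1, 2), Fraction(1, 4)),
]

# Total probability: weight each branch's success probability by its likelihood.
worst_case = sum(p_branch * p_success for p_branch, p_success in branches)
print(worst_case)  # 5/16
```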
28Phase 2c Latency and Cost Analysis
- Latency Analysis
- Associate with each edge in scenario graph an immediate cost (e.g., time it takes to execute event).
- Q: What is the worst-case latency scenario?
- Cost Analysis
- Identify new actions that correspond to decisions an architect needs to make.
- Associate a cost with each action.
- Define constraints on costs.
- Q: Which set of links can I afford to upgrade to achieve higher reliability, given my cost constraints?
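For the latency question, one simple reading is a longest-path search over an acyclic scenario graph whose edges carry time costs. The Python sketch below assumes that reading. The edge costs are invented, the graph is a shortened version of the success scenario, and the function name `worst_case_latency` is hypothetical.

```python
from functools import lru_cache

# Hypothetical scenario graph: node -> [(next node, time to execute the event)].
EDGES = {
    "issueCheck(A, C)": [("send(A, MC-1)", 1.0), ("send(A, MC-2)", 2.0)],
    "send(A, MC-1)": [("send(MC-1, FRB-2)", 3.0)],
    "send(A, MC-2)": [("send(MC-2, FRB-1)", 1.0)],
    "send(MC-1, FRB-2)": [("debitAccount", 2.0)],
    "send(MC-2, FRB-1)": [("debitAccount", 4.0)],
    "debitAccount": [],
}


def worst_case_latency(edges, start):
    """Return (total cost, path) of the most expensive path; assumes no cycles."""

    @lru_cache(maxsize=None)
    def longest(node):
        best = (0.0, (node,))
        for nxt, cost in edges[node]:
            total, path = longest(nxt)
            if cost + total > best[0]:
                best = (cost + total, (node,) + path)
        return best

    return longest(start)


latency, path = worst_case_latency(EDGES, "issueCheck(A, C)")
print(latency)          # 7.0
print(" -> ".join(path))
```

With these invented costs the route through MC-2 is the worst case (2 + 1 + 4 = 7), ahead of the MC-1 route (1 + 3 + 2 = 6).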
29Cost-Benefit Analysis
- Goal: Choose a set of links to upgrade to achieve higher reliability, given my cost constraints (e.g., fixed budget).
- Identify new actions that correspond to decisions an architect needs to make (e.g., upgrade a1).
- Associate a cost with each action.
- Define constraints on costs.
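The goal above can be illustrated with a brute-force search over subsets of links in Python. This is a sketch under strong assumptions: the upgrade costs, the budget, and the per-link probabilities are invented, and the toy reliability function treats link failures as independent purely for brevity, which the method in this talk explicitly does not assume. In practice `reliability` would be supplied by the scenario-graph analysis.

```python
from itertools import combinations


def best_upgrade(links, cost, budget, reliability):
    """Try every affordable subset of links and keep the most reliable one."""
    best_set = frozenset()
    best_rel = reliability(best_set)
    for r in range(1, len(links) + 1):
        for subset in combinations(links, r):
            chosen = frozenset(subset)
            if sum(cost[link] for link in chosen) > budget:
                continue  # violates the cost constraint
            rel = reliability(chosen)
            if rel > best_rel:
                best_set, best_rel = chosen, rel
    return best_set, best_rel


LINKS = ["a1", "a2", "c1"]
COST = {"a1": 3, "a2": 3, "c1": 5}   # hypothetical upgrade costs
BUDGET = 6                           # hypothetical fixed budget


def toy_reliability(upgraded):
    """Toy model: the check clears if a1 or a2 is up and c1 is up (independent links)."""
    up = {link: 0.99 if link in upgraded else 0.9 for link in LINKS}
    return (1 - (1 - up["a1"]) * (1 - up["a2"])) * up["c1"]


chosen, rel = best_upgrade(LINKS, COST, BUDGET, toy_reliability)
print(sorted(chosen), round(rel, 4))  # ['c1'] 0.9801
```

Exhaustive search is only practical for a handful of links; it is used here to keep the link between actions, costs, and constraints visible.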
30Simple Example A Banking System
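[Figure: banking network. Recoverable labels: federal reserve banks FRB 2 and FRB 3; money centers MC 1, MC 2, and MC 3; links a1, a2, b1, b2, and c1.]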
31Constrained Markov Decision Processes
- <S, A, P, c, d>
- S is a finite state space.
- A is a finite set of actions.
- P are transition probabilities. P_{sas'} is the probability of moving from state s to s' if action a is chosen.
- c : (S × A) → ℝ is the immediate cost. c(s, a) is the cost of choosing action a at state s.
- d : (S × A) → ℝ^k is a k-dimensional vector of immediate costs; it captures additional cost constraints.
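One way to hold the five components of this tuple in code is sketched below in Python. The class name `CMDP`, the dictionary encodings, the `validate` helper, and the example states, actions, and numbers are all hypothetical. The sketch only stores the tuple and checks it for consistency; it does not solve for a policy.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class CMDP:
    states: FrozenSet[str]                   # S
    actions: FrozenSet[str]                  # A
    P: Dict[Tuple[str, str, str], float]     # P[(s, a, s')]
    c: Dict[Tuple[str, str], float]          # c(s, a): immediate cost
    d: Dict[Tuple[str, str], Tuple[float, ...]]  # d(s, a): k-dimensional cost vector

    def validate(self, k: int) -> None:
        """Check each enabled (s, a): probabilities sum to 1 and d(s, a) has length k."""
        for (s, a) in self.c:
            total = sum(self.P.get((s, a, t), 0.0) for t in self.states)
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"P({s}, {a}, .) sums to {total}, expected 1")
            if len(self.d[(s, a)]) != k:
                raise ValueError(f"d({s}, {a}) must have {k} components")


model = CMDP(
    states=frozenset({"s0", "good", "bad"}),
    actions=frozenset({"keep", "upgrade_a1"}),
    P={
        ("s0", "keep", "good"): 0.6,
        ("s0", "keep", "bad"): 0.4,
        ("s0", "upgrade_a1", "good"): 0.9,
        ("s0", "upgrade_a1", "bad"): 0.1,
    },
    c={("s0", "keep"): 0.0, ("s0", "upgrade_a1"): 5.0},
    d={("s0", "keep"): (0.0,), ("s0", "upgrade_a1"): (5.0,)},
)
model.validate(k=1)  # raises ValueError if the tuple is inconsistent
```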
32Progress To Date Tools
- Trishul tool
- Uses NuSMV model checker, done by Somesh Jha.
- History variable explodes state space, leading to:
- New tool
- Uses SPIN, ongoing by Oleg Sheyner.
- No need for history variable.
33Progress To Date Case Studies
- Trading floor model of major investment bank (being sanitized)
- 10K lines of NuSMV
- half-million nodes in scenario graph
- 50 threat scenarios
- 45 found by system
- 5 new threat scenarios found
- With independence assumption, too many misses.
- B2B e-commerce NYC start-up (Jha)
- 50K lines of Statecharts
- 2 million NuSMV beyond capability of tool
- Lincoln Labs example (Sheyner)
- TBD
34Next Steps
- Show applicability of the CMDP model for other critical infrastructure examples.
- Via Lincoln Labs connection
- Combine with other tools to further automate the analysis.
- Linear programming package, theorem provers, ...
- Integrate with informal SEI Survivable Network Analysis Method
- Via case studies
35References
- Applying Formal Methods to SNA
- Jha and Wing, "Survivability Analysis of Networked Systems," to appear, Proceedings of the International Conference on Software Engineering, 2001.
- Survivable Networking Analysis
- Ellison, Linger, Longstaff, and Mead, "Survivable Network System Analysis: A Case Study," IEEE Software, July/August 1999.
- The Vigilant Healthcare System
- Ellison, Fisher, Linger, Lipson, Longstaff, and Mead, "Survivability: Protecting Your Critical Systems," IEEE Internet Computing, November/December 1999.
- Web site: IEEE article and other reports
- www.sei.cmu.edu/organization/programs/nss/surv-net-tech.html
36Other OASIS Connection
- Recovery service for PASIS (Greg Ganger, Pradeep Khosla, Carnegie Mellon)
- Anticipate intrusion
- Proactive secret-sharing
- Upon intrusion detection
- Reactive secret-sharing