Title: Honeywell Laboratories
1. Honeywell Laboratories
CORTEX: Mission-Aware Closed-Loop Cyber Assessment and Response
- 1/27/05 PI Meeting
- David Musliner
- Christopher Geib
- Mike Pelican
2. Outline
- Project overview.
- Thin-slice initial demo.
- Proactive response planning.
- Planner evaluation tools.
- Quadchart.
3. Project Overview
- Technical Objectives: automated defense systems that
  - Model and understand their changing mission needs.
  - Automatically develop defensive plans to recognize and stop attacks.
  - Automatically regenerate and rebuild system infrastructure.
  - Learn to prevent attacks.
- Resulting in a highly reliable self-regenerative system.
- Existing Practice: very limited condition-action rules within some intrusion detection systems (IDSs).
  - Not mission-aware, not self-aware.
  - No lookahead, no proactive resource testing.
  - No dynamic replanning or performance tradeoffs.
4. Project Overview
- Technical Approach: integrate, extend, and improve
  - Scyllarus state-of-the-art intrusion detection/correlation technology.
  - CIRCADIA's automated planning and controller synthesis.
  - Learning methods to
    - Refine models of attacks.
    - Improve recognition of new attacks.
- Truly New
  - Mission-aware, context-sensitive response and self-regeneration.
  - Planned preemptive self-testing to detect faults in mission-critical assets before they are required.
  - Focused learning to improve the system's performance on its specific mission.
5. The CORTEX Vision
[Diagram components: System Reference Model (mission, behaviors, faults, threats); Mission-Aware Meta Planner; Controller Synthesis Module; custom reactive plan (proactive protection, reactive defense, and healing); Active Security Controller Executive; Dynamic Evidence Aggregator. Labeled flows: sensor inputs; likely security situation; unexpected states, unhandled contingencies.]
6. Overview (cont'd)
- Major Risks and Mitigations
  - Planning domain complexity
    - System demonstrations on a limited-scope domain.
    - Scalable synthetic evaluation domains for planning.
    - Alternative planning approaches.
  - Learning
    - Focused learning techniques for knowledge-rich parts of the problem (e.g., learning size limits for a buffer overflow vulnerability).
  - Aggressive schedule
    - Thin-slice first demonstration emphasizing infrastructure.
    - Cyclic development plan focusing on incremental improvement in each sub-area.
7. Overview (cont'd)
- Quantitative Metrics
  - Measures of attack learning and detection rates.
  - Respond to 100% of detected attacks.
- Expected Major Achievements
  - High-confidence intrusion assessment and diagnosis.
  - Pre-planned responses to contain/recover from faults and attacks.
  - Automatic tradeoffs of security vs. service level accessibility.
  - Learning to recognize and defeat novel attacks.
8. Overview
- Task Schedule
  - Develop thin-slice demonstration (first version complete).
  - Extend scenario (in progress).
  - Develop learning capability experiments (in progress).
  - Model mission phases (in progress).
  - Proactive response planning (in progress).
- Milestones (demos)
  - Thin-slice demo: Dec 2004
  - Mission-aware demo: Apr 2005
  - Learning demo: Dec 2005
9. Thin-Slice Demo: Self-Regenerative MySQL
10. Demo Objectives
- Implement a taste-tester architecture to form a redundant, high-reliability MySQL server system.
- Illustrate detection of, and self-regenerative response to, a successful attack.
- Illustrate (simple) learning to improve immunity.
- Provide a basis for future demonstrations of multi-phase mission awareness and learning.
11. Demo Scenario
- N (here, 8) MySQL servers are available as redundant, replicable assets.
- Queries arrive and are processed by the designated Lead Taster.
- If the Lead Taster has no problem with the query, the query is replicated to each of the servers.
- If the Lead Taster fails:
  - The bad query is not sent to the other servers.
  - A backup server becomes Lead Taster.
  - The bad query is sent to the learning module for generalization.
  - The dead server is restarted.
  - Future occurrences of the same or similar exploits are ineffective.
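The failover logic of the scenario can be sketched in a few lines. This is a minimal illustration, not the demo code: `Taster`, `TasteTester`, the `is_fatal` oracle (standing in for "this query crashes a MySQL server"), and exact-match blocking (standing in for a learned Snort rule) are all hypothetical simplifications, and the dead server is "restarted" instantly by copying a surviving replica's state.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Taster:
    """One redundant MySQL server (hypothetical stand-in)."""
    name: str
    alive: bool = True
    applied: List[str] = field(default_factory=list)  # queries committed so far


@dataclass
class TasteTester:
    tasters: List[Taster]
    is_fatal: Callable[[str], bool]                  # hypothetical oracle: does this query crash a server?
    blocked: Set[str] = field(default_factory=set)   # stand-in for learned Snort rules
    to_learning: List[str] = field(default_factory=list)
    lead: int = 0

    def _next_alive(self) -> int:
        n = len(self.tasters)
        for step in range(1, n + 1):
            i = (self.lead + step) % n
            if self.tasters[i].alive:
                return i
        raise RuntimeError("no live taster available")

    def submit(self, query: str) -> str:
        if query in self.blocked:             # learned rule fires before any server sees the query
            return "blocked"
        lead = self.tasters[self.lead]
        if self.is_fatal(query):              # the Lead Taster processes the query and dies
            lead.alive = False
            self.to_learning.append(query)    # bad query goes to learning, never to the other servers
            self.blocked.add(query)           # simplest generalization: exact-match block
            self.lead = self._next_alive()    # a backup becomes Lead Taster
            lead.applied = list(self.tasters[self.lead].applied)  # restart dead server from a replica
            lead.alive = True
            return "lead-failed"
        for t in self.tasters:                # good query: replicate to every live server
            if t.alive:
                t.applied.append(query)
        return "replicated"
```

For example, with eight tasters a benign query is replicated everywhere, the first exploit kills only the Lead Taster (the backup at index 1 takes over), and a repeat of the same exploit is blocked. The real demo generalizes beyond exact matches via the learning module.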
12. Demo Development Process
- Design architecture for integrating sensor data aggregation, reaction planning, plan execution, and learning.
- Design reduced-scope architecture for Demo 1.
- Survey MySQL vulnerabilities to identify suitable host versions and exploits.
- Build infrastructure and simple visualization machinery.
- Execute demonstration with hand-generated plan.
- Build planning input model of domain.
- Evaluate planner performance on domain model.
13. Demo System Architecture
14. Demo 1 Architecture
[Diagram components: Snort (with Snort rules), Verter, Alert Distributor, RTS, Replicator, Lead Taster, Tasters, Learning. Labeled flows and logic: SQL query; HB_sync (good/bad) and query; if (alert) QQb else QQg; good/bad query result; if (hb_sync_good) replicate to all tasters; if (hb_sync_bad) switch to next taster; if (hb_sync_bad) send bad query to learning; "Are we dead after this good query?"; tail alerts; tail xml; high events; push cache; write new Snort rules via CIRCADIA proto; append new rule; after rule update, kill -HUP.]
15. Survey of MySQL Vulnerabilities
16. Assumptions
- Attacks take the Lead Taster offline.
  - We are now beginning to look at other forms of attacks.
- The query just processed is responsible for failures.
- Queries must be transactional in effect.
  - This required adding synchronous commits for non-transactional administrative commands that did, in fact, contain a vulnerability.
- For binary poisons, we assume that preventing the final step of the attack is sufficient.
17. Before the Attack
[Screenshot labels: Tasters, Bad Guy, Snort, Good Guy, Replicator, RTS (Executive), Verter.]
18. Before the Attack
- Bad Guy enters exploit.
19. After the Attack
- Lead Taster died; a new Lead Taster takes over.
- RTS detects the failure, switches Lead Taster, and sends the bad query to learning.
20. Before the 2nd Attack
- Dead taster is restarted.
- Learner builds a new tailored Snort rule.
21. After the 2nd Attack
- Bad Guy enters exploit again.
- To no avail: the system has learned to block the bad query.
22. Show Movie
23. Proactive Response Planning
24. Simple Planner Model for Demo 1
(def-temporal query-arrives
  preconds ((query F))
  postconds ((query T))
  delay-distribution (uniform-distribution 10 20)
  min-delay 10
)
(def-temporal query-stale
  preconds ((query T))
  postconds ((failure T))
  delay-distribution (uniform-distribution 20 50)
  min-delay 20
)
25. Planner Model (cont'd)
(def-reliable process
  preconds ((taster T) (query T))
  postconds ((.5 (taster F) (query F) (hb-sync F))
             (.5 (current F) (query F) (hb-sync T)))
  delay-distribution (uniform-distribution 1 1)
  cost 1
)
(def-action replicate-to-tasters
  preconds ((current F) (taster T) (backup T))
  postconds ((current T))
  wcet 1
  cost 1
)
26. Planner Model
- Goal: maximize expected utility (EU).
  - Reward: maintain (current T) for 10 utils/tick.
  - Arbitrary duration: 200 ticks.
  - Maximum possible EU < 2000 (200 ticks × 10 utils/tick).
    - Less than 2000 because some queries will arrive, incurring cost.
- Planner uses a goal-driven heuristic to derive a plan.
  - Evaluates safety and EU performance of the plan using simulation (sampling).
  - Backtracks/jumps to create new plans, directed by failures.
  - Not yet well-directed in search after a non-failure plan is found.
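The EU bound and the sampling-based evaluation above can be illustrated with a small Monte Carlo estimator. This is a sketch under stated assumptions, not CIRCADIA's evaluator: `utility`, `estimate_eu`, and `toy_sample` are hypothetical names, and the toy trajectory simply assumes (current T) holds every tick while each arriving query (every 10 to 20 ticks, as in `query-arrives`) costs 1.

```python
import random
from statistics import mean
from typing import Callable, List, Tuple

REWARD_PER_TICK = 10   # utils/tick while (current T) holds
HORIZON = 200          # ticks

Trajectory = Tuple[List[bool], List[int]]  # ((current T)? per tick, costs incurred)


def utility(current_by_tick: List[bool], costs: List[int]) -> int:
    return REWARD_PER_TICK * sum(current_by_tick) - sum(costs)


def estimate_eu(sample: Callable[[random.Random], Trajectory], n: int = 1000, seed: int = 0) -> float:
    """Expected utility estimated as the mean utility of n sampled trajectories."""
    rng = random.Random(seed)
    return mean(utility(*sample(rng)) for _ in range(n))


# Upper bound: (current T) on every tick and nothing ever costs anything.
MAX_EU = utility([True] * HORIZON, [])   # 200 * 10 = 2000


def toy_sample(rng: random.Random) -> Trajectory:
    """Hypothetical trajectory: a query arrives every 10-20 ticks and costs 1 to
    process; (current T) is assumed to hold throughout."""
    costs, t = [], 0
    while True:
        t += rng.randint(10, 20)
        if t >= HORIZON:
            return [True] * HORIZON, costs
        costs.append(1)
```

Because 9 to 19 queries arrive within 200 ticks, every toy trajectory scores between 1981 and 1991, which is why the slide says the maximum possible EU is strictly less than 2000.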
27. Plan EU vs. Elapsed Planning Time
28. First Safe Plan Found
- Blue states satisfy the goal; there are two non-goal states.
- EU = 1880; elapsed planning time: 800 milliseconds.
- If a query kills the taster, wait until the next query arrives to switch tasters and rebuild the dead one.
29. 12th Safe Plan Found
- Only one non-goal state.
- EU = 1940.
- Elapsed planning time: 30 minutes.
- Key: switch tasters and restart the backup server immediately, even though you are in the goal state.
  - Pre-position for the eventuality of being pushed out of the goal state, pre-arranging to speed restoration of the goal state.
30. Improving the Planner
- Local search (plan patching) based on heuristic guidance.
  - E.g., if the current plan includes a multi-step chain to re-establish a maintenance goal, try to move one or more of the steps earlier, before the goal is violated.
  - Random restarts are probably required to escape local maxima.
- Investigate an alternative solution method: map to MDPs.
  - Younes's (CMU) Tempastic-DTP planner maps GSMDP problems to MDPs using phase-type distributions.
  - Exponential state-space growth, but the solution method is non-iterative.
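The plan-patching idea can be sketched as hill climbing with random restarts. This is a generic illustration, not the CORTEX planner: the bit-vector plan encoding (bit i = 1 means recovery step i is moved earlier, before the maintenance goal is violated), the `neighbors` patch operator, and the caller-supplied `score` function (e.g., a simulated EU estimate) are all assumptions.

```python
import random
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical plan encoding: bit i = 1 means recovery step i is moved
# earlier, i.e., executed before the maintenance goal is violated.
Plan = Tuple[int, ...]


def neighbors(plan: Plan) -> Iterator[Plan]:
    """Plan patches: move one step earlier (0 -> 1) or back to reactive (1 -> 0)."""
    for i in range(len(plan)):
        yield plan[:i] + (1 - plan[i],) + plan[i + 1:]


def local_search(score: Callable[[Plan], float], n_steps: int,
                 restarts: int = 10, seed: int = 0) -> Tuple[Optional[Plan], float]:
    rng = random.Random(seed)
    best_plan, best_score = None, float("-inf")
    for _ in range(restarts):                      # random restarts to escape local maxima
        plan = tuple(rng.randint(0, 1) for _ in range(n_steps))
        plan_score = score(plan)
        improved = True
        while improved:                            # first-improvement hill climbing
            improved = False
            for cand in neighbors(plan):
                cand_score = score(cand)
                if cand_score > plan_score:
                    plan, plan_score, improved = cand, cand_score, True
                    break
        if plan_score > best_score:
            best_plan, best_score = plan, plan_score
    return best_plan, best_score
```

With a score that rewards pre-positioning every step (`lambda p: sum(p)`), every restart climbs to `(1, 1, 1, 1)`; restarts only matter when the score landscape has local maxima.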
31. Scalable Planner Evaluation Domains
- In addition to demo-specific domains, we have built scalable test domain generators to provide rigorous evaluation metrics.
  - Expands test coverage to domains where utilities and probabilities determine success.
  - Includes abstractions for important SRS domain characteristics.
- Goal: help drive CORTEX planner development by identifying relevant weaknesses.
32. Basic Abstractions
- Each test consists of "games", each revolving around a single "goal".
  - Dwell goals: per-tick reward for maintaining a feature in the face of clobbering threats, e.g., providing a network service while under attack.
  - Achievement goals: one-time reward for completing a multi-step process, e.g., configuring a network.
- Goals and threats can be combined to test scalability or the ability to make trade-offs.
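The two goal types differ only in how reward accrues over a trajectory. The functions below are a hypothetical sketch of that difference (names and trace encodings are assumptions, not the generator's API): a dwell goal pays every tick the feature holds, while an achievement goal pays once when the multi-step process is complete.

```python
from typing import Iterable, Sequence


def dwell_reward(feature_holds: Iterable[bool], reward_per_tick: float) -> float:
    """Dwell goal: reward for every tick the protected feature holds,
    e.g., a network service that stays up despite clobbering threats."""
    return reward_per_tick * sum(1 for ok in feature_holds if ok)


def achievement_reward(steps_done: Sequence[int], n_steps: int, reward: float) -> float:
    """Achievement goal: one-time reward once all n_steps of a multi-step
    process (e.g., configuring a network) are complete. steps_done[t] is the
    number of completed steps at tick t; the reward is paid once, not per tick."""
    return reward if any(done >= n_steps for done in steps_done) else 0.0
```

A threat that clobbers the feature for one tick costs a dwell goal exactly one tick of reward, whereas a stalled achievement goal earns nothing until the last step completes; combining both kinds in one game is what forces trade-offs.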
33. Example Scalability Baseline
- Domain: single dwell goal subject to N threats.
  - Threat delay: uniform distribution from 1 to 100.
  - Time-to-failure: 20 ticks.
  - Response time: 1 tick.
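A generator for this baseline and a quick check of why N matters can be sketched as follows. The semantics are assumptions for illustration: each threat fires once after its sampled delay, the goal is clobbered `time_to_failure` ticks after a threat fires unless a response finishes first, and a single serial responder handles one threat at a time. `DwellGame` and the helper functions are hypothetical names, not the actual domain generator.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DwellGame:
    """Scalability baseline: one dwell goal subject to n_threats threats."""
    n_threats: int
    threat_delay: Tuple[int, int] = (1, 100)  # uniform, inclusive bounds
    time_to_failure: int = 20
    response_time: int = 1


def sample_fire_times(game: DwellGame, rng: random.Random) -> List[int]:
    lo, hi = game.threat_delay
    return [rng.randint(lo, hi) for _ in range(game.n_threats)]


def serial_responder_survives(game: DwellGame, fire_times: List[int]) -> bool:
    """Assumed semantics: a fired threat clobbers the goal time_to_failure ticks
    later unless a response (response_time ticks, one at a time) finishes first.
    Handling threats in firing order is earliest-deadline-first here, because
    every deadline is fire time + time_to_failure."""
    free_at = 0
    for fire in sorted(fire_times):
        finish = max(fire, free_at) + game.response_time
        if finish > fire + game.time_to_failure:
            return False
        free_at = finish
    return True


def survival_rate(n_threats: int, trials: int = 1000, seed: int = 0) -> float:
    rng = random.Random(seed)
    game = DwellGame(n_threats)
    wins = sum(serial_responder_survives(game, sample_fire_times(game, rng)) for _ in range(trials))
    return wins / trials
```

Under these assumptions any 20 threats can always be handled (20 one-tick responses fit inside a 20-tick window), but 21 threats firing on the same tick cannot, so the baseline stresses the planner precisely as N grows past the ratio of time-to-failure to response time.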
34. CORTEX: Mission-Aware Closed-Loop Cyber Assessment and Response
NEW IDEAS
- The System Reference Model, including mission models, drives intrusion assessment, diagnosis, and response.
- Automatically search for response policies that optimize the tradeoff of security against mission operations.
- Taste-tester server redundancy supports robustness and learning from new attacks.
[Diagram labels: Attacks, intrusions; Security Tradeoff Planner; Computing services; Networks, Computers; Controller Synthesis Module; Scyllarus Intrusion Assessment; Active Security Controller Executive; CIRCADIA.]
IMPACT
- High-confidence intrusion assessment and diagnosis.
- Pre-planned automatic responses to contain and recover from faults and attacks.
- Automatic tradeoffs of security vs. service level accessibility.
- Learns to recognize and defeat novel attacks.
SCHEDULE
- Demos
  - Thin-slice demo: Dec 2004
  - Mission-aware demo: Apr 2005
  - Learning demo: Dec 2005
35. The End
36. How Scyllarus Intrusion Detection Works
[Diagram: the Dynamic Evidence Aggregator combines audit reports (of a communication attempt, an unauthorized user, and a network probe) with the Intrusion Reference Model (intrusions, attacks) to weigh hypotheses about possible situations, e.g., H1 "intrusion in progress" vs. H2 "accidentally misconfigured application", yielding the likely security situation.]
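Weighing evidence for competing hypotheses can be illustrated with a naive-Bayes update. This is a swapped-in textbook technique, not Scyllarus's actual aggregation method, and every prior and likelihood below is a made-up number chosen only for illustration; reports are assumed conditionally independent given the hypothesis.

```python
from typing import Dict, Iterable

# Hypothetical numbers for illustration only; not Scyllarus parameters.
PRIOR = {"intrusion in progress": 0.1, "misconfigured application": 0.9}
LIKELIHOOD = {
    "intrusion in progress": {"network probe": 0.6, "unauthorized user": 0.5, "communication attempt": 0.7},
    "misconfigured application": {"network probe": 0.05, "unauthorized user": 0.2, "communication attempt": 0.7},
}


def posterior(reports: Iterable[str], prior=PRIOR, likelihood=LIKELIHOOD) -> Dict[str, float]:
    """Multiply each hypothesis's prior by P(report | hypothesis) for every
    report (assumed conditionally independent), then normalize."""
    scores = dict(prior)
    for report in reports:
        for h in scores:
            scores[h] *= likelihood[h][report]
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}
```

With these numbers a communication attempt alone leaves the belief in an intrusion at its 10% prior (it is equally likely under both hypotheses), while a network probe plus an unauthorized-user report raises it to about 77%; discounting uninformative reports in this way is one route to fewer false alarms.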
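37. Sifting Key Events from Raw Reports
[Diagram: about 16,000 raw reports from IDS-1, IDS-2, and IDS-3 pass through clustering of reports into events and evidence analysis, which separates uninteresting events from interesting events and finally yields believable interesting events; the figure shows intermediate counts of 4000, 1000, and 10.]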
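The first two reductions (clustering reports into events, then discounting events against non-susceptible targets) can be sketched as below. This is an illustrative assumption, not Scyllarus's algorithm: the `Report` fields, the grouping key `(source, target, signature)`, the time `window`, and the `susceptible` map are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Report:
    time: float
    sensor: str      # e.g., "IDS-1"
    source: str
    target: str
    signature: str


def cluster_reports(reports: List[Report], window: float) -> List[List[Report]]:
    """Group reports with the same (source, target, signature) into one event while
    successive reports are at most `window` time units apart, whichever sensor sent them."""
    events: List[List[Report]] = []
    open_event: Dict[Tuple[str, str, str], List[Report]] = {}
    for r in sorted(reports, key=lambda rep: rep.time):
        key = (r.source, r.target, r.signature)
        ev = open_event.get(key)
        if ev is not None and r.time - ev[-1].time <= window:
            ev.append(r)
        else:
            ev = [r]
            open_event[key] = ev
            events.append(ev)
    return events


def interesting(events: List[List[Report]], susceptible: Dict[str, Set[str]]) -> List[List[Report]]:
    """Discount events whose target is not susceptible to the reported attack."""
    return [ev for ev in events if ev[0].signature in susceptible.get(ev[0].target, set())]
```

For instance, three sensors reporting the same MySQL exploit against `db1` within a few seconds collapse into one event, and the same signature aimed at a web server with no MySQL instance is dropped as uninteresting.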
38. Example of How Scyllarus Reduces Workload
39. Controller Synthesis Module
- The Controller Synthesis Module reasons about models of goals, threats, cyberspace dynamics, and actions to derive new sets of control rules online.
  - Timed automata models capture temporal constraints and probabilities.
  - Game-theoretic view plus time: search for a controller automaton while projecting the adversary's moves.
  - Temporal reasoning derives requirements on sensing/monitoring.
  - Formal methods verify controller behavior against policy requirements.
40. Controlled State Space Graph
- Considers different orders of attacker actions, consistent with preconditions.
- A factored, transition-based attacker model allows CIRCADIA to generalize beyond a single-path characterization of a given attack script.
- Includes sequences of CIRCADIA actions to prevent further damage and recover from current (non-goal) situations.
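The point that a factored attacker model covers more than one script order can be shown by enumerating every precondition-consistent interleaving. This sketch is an untimed, deterministic simplification (CIRCADIA itself uses timed automata), and the four-action attack and the feature names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# (preconditions, features the action makes true)
Action = Tuple[FrozenSet[str], FrozenSet[str]]


def attack_orderings(actions: Dict[str, Action], state: FrozenSet[str],
                     done: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Every maximal order in which the attacker could execute its actions, each at
    most once, respecting preconditions. Untimed simplification: no delays or probabilities."""
    enabled = [name for name, (pre, _) in actions.items() if name not in done and pre <= state]
    if not enabled:
        return [done]
    orders: List[Tuple[str, ...]] = []
    for name in enabled:
        _, adds = actions[name]
        orders.extend(attack_orderings(actions, state | adds, done + (name,)))
    return orders


# Hypothetical attack: probe, then exploit; plant a backdoor and exfiltrate in either order.
ATTACKER = {
    "probe": (frozenset(), frozenset({"mapped"})),
    "exploit": (frozenset({"mapped"}), frozenset({"root"})),
    "plant-backdoor": (frozenset({"root"}), frozenset({"backdoor"})),
    "exfiltrate": (frozenset({"root"}), frozenset({"data-taken"})),
}
```

A recognizer built from one recorded script would see only one of the two orders this model produces; reasoning over the factored preconditions covers both.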
41. Motivation
- The current computational mission (resources, tasks) affects
  - Detection of attacks and failures.
  - Appropriate responses.
- Existing intrusion detection and response does not incorporate knowledge of the mission.
- Thesis: mission awareness will enable Self-Regenerative System behavior.
42. Scyllarus
- A management and analysis system for network security monitoring.
- Correlates reports from many disparate intrusion detectors to provide information useful to operating personnel or administrators.
- Weighs evidence for/against intrusions to reduce false alarms.
- Assesses intrusion events for plausibility and severity.
  - Discounts attacks against non-susceptible targets.
- Consolidates and retains all report data for forensic investigation.
- Maintains detector and system configuration information.
43. Scyllarus Capability Summary
- Process reports from a variety of intrusion detection sensors
  - Network, host, and hybrid.
  - Commercial, open-source, research.
- Process substantial report volume: thousands of reports/hour.
- Provide significant reductions in report volume: thousands to tens.
- Monitor sizeable networks
  - Up to 1000 nodes with one system.
- Cluster and correlate reports from multiple sensors
  - More effective detection of stealthy attacks.
  - Vast reduction in false alarms and noise.
- Categorize events for efficient review
  - Plausibility, severity, utility of events.
  - Discount attacks on unsusceptible targets.
- Retain events and reports in a database for forensic analysis.
44. CIRCADIA
- Cooperative Intelligent Real-time Control Architecture for Dynamic Information Assurance
- Autonomic defense for computing resources.
  - Adaptive monitoring.
  - Real-time reactive control responses.
- Uses control-theoretic methods to automatically synthesize its control strategies, rather than relying on hand-built rules or other knowledge.
45. Automatically Synthesizing Security Control Systems
[Diagram labels: Computational mission services; Security Tradeoff Planner; Networks, computers; Controller Synthesis Module; Intrusion Assessment; Active Security Controller Executive; CIRCADIA.]
NEW IDEAS
- Use control theory to derive appropriate response actions automatically.
- Automatically tailor monitoring and responses according to mission, available resources, varying threats, and policies.
- Reason explicitly about response-time requirements to provide performance guarantees.
IMPACT
- Automatic responses guaranteed to defeat intruders in real time.
- System derives appropriate responses for novel attack combinations.
- Automatic tradeoffs of security and monitoring vs. service and accessibility.
- Easier to deploy and maintain than manual rule bases.
46. CORTEX Advances (Beyond Scyllarus)
- Add mission modeling capability to form the System Reference Model.
- Incorporate propagation models to represent information flow and filtering components.
- Enhance state assessment for mission awareness
  - Mission affects expected sensor behavior.
  - Mission affects criticality of failures and attacks.
- Bring state assessment fully online for soft real-time performance.
- Stretch goal: retrospective revision of alerts based on new information.
47. CORTEX Advances (Beyond CIRCADIA)
- Automatically map System Reference Model elements to a planning problem for controller synthesis.
- Develop new controller synthesis algorithms for qualitative probabilistic models, based on local search.
- Develop meta-level control to focus and adjust response planning algorithms based on mission phasing and urgency of self-reconfiguration.
- Interface to state assessment for real-time response.
48. CORTEX Advances (Learning)
- Adapt existing concept drift algorithms to update surprise levels (qualitative probabilities) within the threat models.
- Adapt performance profiles within the Mission models and Self (meta-level) models.
- Develop strategies for preemptively testing resource capacities based on mission, self, and threat models.
  - Predict and test for failures and adapt before they are critical.
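One simple way to let a surprise level track drifting threat behavior is a sliding-window frequency estimate mapped to an integer level. Both the windowing (one of the simplest drift-adaptive estimators, not a specific published algorithm) and the mapping `surprise ≈ round(-log10 p)` are assumptions for illustration; `SurpriseTracker` is a hypothetical name.

```python
import math
from collections import deque


class SurpriseTracker:
    """Tracks how often a threat transition is observed over a sliding window
    and maps that frequency to an integer surprise level (higher = less expected)."""

    def __init__(self, window: int = 100, max_level: int = 6):
        self.recent = deque(maxlen=window)   # old observations fall out, so the estimate follows drift
        self.max_level = max_level

    def observe(self, occurred: bool) -> None:
        self.recent.append(occurred)

    def surprise(self) -> int:
        if not self.recent:
            return self.max_level
        p = sum(self.recent) / len(self.recent)
        if p == 0:
            return self.max_level            # not seen in the window: maximally surprising
        return min(self.max_level, round(-math.log10(p)))
```

If an exploit that never appeared starts showing up in 10% of recent observations, its surprise level drops from 6 to 1, and to 0 once it appears every time; the window length trades responsiveness against noise.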
49.
(def-action rebuild-taster
  preconds ((backup F))
  postconds ((backup T))
  wcet 5
  cost 1
)
;; problem def
(def-machine system-ops (query-arrives
                         query-stale
                         process
                         )
)
(def-machine manage-system (send-to-learning-switch-tasterdb
50. cortex-taster.lisp
(defun t1 ()
  (load "domains/taster/cortex-taster"))
(set-verifier-mode meu)
(set-search-mode forward)
(setf sim-maxtime 200)
(setf max-utility 2000)
(setf debug-list NIL)
(pushnew top debug-list)
(pushnew csm debug-list)
(pushnew meu debug-list)
(setf max-number-of-intermediate-plans-considered 10000)
(setf TEMPSWITCH-FIX-MC-SIM-CULPRIT-NO-OP-BUG T)
(setf store-all-improved-plans T)
(setf check-all-plans-diff T)
(setf backjump-if-inferior T)
(setf cautious-culprit-match T)
(reset-randoms)

;; testing results stuff....
(setf omit-no-ops nil)
;; a first plan produced...
(setf a (first (last stored-plan-list)))
(setf b (first stored-plan-list))
(diff a b)
(mapcar 'eu stored-plan-list)
(mapcar 'elapsed-time stored-plan-list)
(restore-stored-plan a)
(davinci-draw-sim-reachable-states)
(restore-stored-plan b)
(davinci-draw-sim-reachable-states)

(def-state scenario1-initial-state
  features ((failure F)
            (query F)
            (current T)   ; backups are current
            (taster T)    ; taster is up
            (hb-sync T)   ; last query was good
            (backup T)    ; backup is up
            ))

(def-temporal query-arrives
  preconds ((query F))
  postconds ((query T))
  delay-distribution (uniform-distribution 10 20)
  min-delay 10)

(def-temporal query-stale
  preconds ((query T))
  postconds ((failure T))
  delay-distribution (uniform-distribution 20 50)
  min-delay 20)

(def-reliable process
  preconds ((taster T) (query T))
  postconds ((.5 (taster F) (query F) (hb-sync F))
             (.5 (query F) (hb-sync T) (current F)))
  delay-distribution (uniform-distribution 1 1)
  delay (make-range 1 1)
  cost 1)

;; manage tasters
(def-action send-to-learning-switch-tasterdb
  preconds ((taster F) (backup T))
  postconds ((taster T) (backup F))
  wcet 1
  cost 1)

(def-action replicate-to-tasters
  preconds ((current F) (taster T) (backup T))
  postconds ((current T))
  wcet 1
  cost 1)

(def-action rebuild-taster
  preconds ((backup F))
  postconds ((backup T))
  wcet 5
  cost 1)

;; problem def
(def-machine system-ops
  (query-arrives query-stale process))

(def-machine manage-system
  (send-to-learning-switch-tasterdb replicate-to-tasters rebuild-taster))

(def-maintenance-goal dbcurrent
  ;; features ((current T) (taster T) (backup T))
  features ((current T))
  reward 10)

(def-problem cortex-taster
  version "Revision 1.2 "
  machines (system-ops manage-system)
  initial-states (scenario1-initial-state)
  transitions ()
  goals (dbcurrent))

(solve-problem cortex-taster)
51. GSMDP Solution Method
[Diagram: a GSMDP is approximated by a continuous-time MDP using phase-type distributions; the continuous-time MDP is optionally converted to a discrete-time MDP by uniformization (Jensen 1953; Lippman 1975); the resulting MDP policy is executed as a GSMDP policy by simulating phase transitions.]
52. Continuous Phase-Type Distributions (Neuts 1981)
- Time to absorption in a continuous-time Markov chain with n transient states.
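For reference, the standard form of this definition: write the chain's generator with the n transient states first, so that the n×n block T is the transient sub-generator, α is the initial distribution over transient states, and the exit-rate vector is t = −T1 (1 is the all-ones column vector). The absorption time X then has the following distribution, and the Erlang distribution is the special case of k identical phases in series.

```latex
% Phase-type distribution PH(\alpha, T) with n transient phases
F(x) = \Pr[X \le x] = 1 - \boldsymbol{\alpha}\, e^{T x}\, \mathbf{1}, \qquad
f(x) = \boldsymbol{\alpha}\, e^{T x}\, \mathbf{t}, \qquad
\mathbf{t} = -T\mathbf{1}, \qquad
\mathbb{E}[X] = -\boldsymbol{\alpha}\, T^{-1} \mathbf{1}

% Erlang(k, \lambda): \alpha = (1, 0, \dots, 0), T_{ii} = -\lambda, T_{i,i+1} = \lambda
\mathbb{E}[X] = \frac{k}{\lambda}, \qquad \operatorname{Var}[X] = \frac{k}{\lambda^{2}}
```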
53. Approximating a GSMDP with a Continuous-Time MDP
- Approximate each distribution G_e with a continuous phase-type distribution.
  - Phases become part of the state description.
  - Phases represent a discretization, into random-length intervals, of the time events have been enabled.
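As a concrete instance of this approximation, the `query-arrives` delay (uniform from 10 to 20 ticks) can be fitted with an Erlang phase-type distribution by matching its first two moments. This generic two-moment fit is an assumption for illustration and may differ from what Tempastic-DTP does; `uniform_moments`, `erlang_fit`, and `erlang_cdf` are hypothetical helpers.

```python
import math
from typing import Tuple


def uniform_moments(a: float, b: float) -> Tuple[float, float]:
    """Mean and variance of a uniform(a, b) delay."""
    return (a + b) / 2, (b - a) ** 2 / 12


def erlang_fit(mean: float, var: float, max_phases: int = 100) -> Tuple[int, float]:
    """Match the first two moments with Erlang(k, rate): k exponential phases in series.
    Erlang has mean k/rate and variance k/rate**2, so k = mean**2 / var (rounded),
    capped at max_phases."""
    if var == 0:
        k = max_phases   # deterministic delay: use as many phases as allowed
    else:
        k = min(max_phases, max(1, round(mean ** 2 / var)))
    return k, k / mean


def erlang_cdf(k: int, rate: float, x: float) -> float:
    """P(delay <= x) = 1 - exp(-rate*x) * sum_{n<k} (rate*x)**n / n!."""
    if x <= 0:
        return 0.0
    lx = rate * x
    term = total = 1.0
    for n in range(1, k):
        term *= lx / n
        total += term
    return 1.0 - math.exp(-lx) * total
```

Uniform(10, 20) has mean 15 and variance 100/12, so the fit uses 27 phases at rate 1.8 per tick; each phase becomes part of the state description, which is where the exponential state-space growth comes from. The fit matches mean and variance but, unlike the original distribution, puts some probability below the 10-tick minimum delay.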
54. Policy Execution
- The policy we obtain is a mapping from the modified (phase-augmented) state space to actions.
- To execute a policy, we need to simulate phase transitions.
- Times when the action choice may change:
  - Triggering of an actual event or action.
  - Simulated phase transition.
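Executing such a policy for a single enabled event can be sketched as follows, assuming the event's delay was approximated by Erlang(k, rate). The executor samples an exponential sojourn for each phase, advances the phase counter, and re-consults the policy at every simulated phase transition and when the event actually triggers. `simulate_phases` and the `(phase, triggered)` policy signature are hypothetical simplifications of a full phase-augmented state.

```python
import random
from typing import Callable, List, Optional, Tuple

Decision = Tuple[float, int, str]  # (time, phase, action chosen)


def simulate_phases(policy: Callable[[int, bool], str], k: int, rate: float,
                    horizon: float, rng: random.Random) -> Tuple[List[Decision], Optional[float]]:
    """Execute a phase-augmented policy for one event whose delay is approximated by
    Erlang(k, rate). The policy sees (phase, triggered) and is re-consulted at every
    simulated phase transition and when the event actually triggers (phase == k)."""
    t, phase = 0.0, 0
    decisions: List[Decision] = [(t, phase, policy(phase, False))]
    while phase < k:
        t += rng.expovariate(rate)          # time spent in the current phase
        if t > horizon:
            return decisions, None           # event did not trigger before the horizon
        phase += 1
        triggered = phase == k
        decisions.append((t, phase, policy(phase, triggered)))
    return decisions, t                      # time at which the event actually triggered
```

A policy can, for example, wait in early phases, pre-position a response once the phase counter suggests the event is near, and react when it actually fires; the phases are never observed in the real system, which is why the executor has to simulate them.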