Lucent Technologies - PowerPoint PPT Presentation

Provided by: amies7

Learn more at: https://web.engr.oregonstate.edu

1
Learning Sequential Models for Detecting
Anomalous Protocol Usage (work in progress)

Lloyd Greenwald, Lucent Bell Labs

2
Machine Learning Algorithms for Surveillance and
Event Detection

Surveillance
Network traffic
Event Detection
Unknown vulnerability exploits using sequences
of messages
Machine Learning Algorithms
Learning Markov models to capture recent
sequential protocol usage

3
NIDS Monitors Traffic and Detects Events That
Violate Security Policy
(from Bro user manual)
4
Example Attack Sequence: NIDS Evasion Attack

Fake a missing packet (to cause buffering)
Send two interspersed sequences for the same connection
Even with the same TTLs, it is ambiguous how end systems will re-create the sequence
(from Handley et al. 01)
5
Example Attack: Multi-Step

Apache/mod_ssl worm (aka Slapper)
Probe/scan target for vulnerability by sending an HTTP GET request on TCP port 80 that violates the HTTP/1.1 standard
Response identifies server as Apache
Exploit for SSLv2-enabled OpenSSL 0.9.6d vulnerability sent to TCP port 443
Target sends traffic back to attacker on UDP port 2002
Target begins scanning for other vulnerable hosts

6
Technical Approach

Automatically build sequential models of recent
protocol usage
Analyze models for common and uncommon sequences
Proactively exercise protocol implementation with
uncommon sequences sampled from models
Reactively detect uncommon sequences
Build new defense policies for NIDS
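
The reactive detection step can be sketched as scoring an observed sequence against a learned transition model. This is a minimal sketch under stated assumptions, not the method from the talk: the function name `anomaly_score`, the `floor` value for unseen transitions, and the toy TCP-like event names are all hypothetical.

```python
import math

def anomaly_score(model, sequence, floor=1e-6):
    """Mean negative log-probability per transition; higher means less common."""
    if len(sequence) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(sequence, sequence[1:]):
        p = model.get(prev, {}).get(cur, 0.0)
        # Unseen transitions get a small floor probability (an assumption)
        # so the score stays finite.
        total -= math.log(max(p, floor))
    return total / (len(sequence) - 1)

# Hypothetical first-order model: P(next event | previous event).
MODEL = {
    "SYN": {"SYN_ACK": 0.95, "RST": 0.05},
    "SYN_ACK": {"ACK": 1.0},
}

common = anomaly_score(MODEL, ["SYN", "SYN_ACK", "ACK"])
rare = anomaly_score(MODEL, ["SYN", "RST"])
unseen = anomaly_score(MODEL, ["SYN", "ACK"])
```

A sequence would be flagged when its score exceeds a threshold; choosing that threshold is the false-positive versus false-negative tradeoff.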

7
Prior Work: Machine Learning Algorithms for
Automated Test Case Generation

Surveillance
Web logs
Event Detection
Exercise errors in web applications
Machine Learning Algorithms
Learning Markov models to capture recent
sequential web application usage

8
Prior Work: Automated Test Case Generation

Leverage dynamic user information to
automatically generate NEW test cases for web
applications.

Session Data
Key contribution 1: sequential statistical models built using machine learning techniques.
Key contribution 2: flexible test case generation exploiting probabilistic sampling methods.
9
Web Application Studied

Front end: JSP
Back end: MySQL
10K lines of code, 118 methods, 12 classes
123 user sessions (sequential application usage extracted from web log)
Question: Can we build models that can be used to generate new, valid user sessions?

10
Building Markov Models From Web Logs

Extract User Sessions from Web Log
12.3.40.65 GET index.jsp
12.3.40.65 GET login.jsp
12.3.40.65 GET /apps/bookstore/reg.jsp?member_login=hello&member_password=world&member_password2=world
12.3.40.65 GET myinfo.jsp
Control Model: possible sequences of URLs that are visited
Data Model: possible sets of parameter values (name-value pairs)
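
The session extraction step can be sketched as follows. This is a minimal sketch, assuming each log line has the simplified form `<ip> GET <url>`, that one client IP corresponds to one user session, and that query strings use `=` and `&` as separators. The function name `extract_sessions` is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

def extract_sessions(log_lines):
    """Group requests by client IP; each request becomes (page, params)."""
    sessions = defaultdict(list)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3 or parts[1] != "GET":
            continue  # skip lines that do not match the assumed format
        ip, _, url = parts
        split = urlsplit(url)
        page = split.path                      # input to the control model
        params = dict(parse_qsl(split.query))  # input to the data model
        sessions[ip].append((page, params))
    return dict(sessions)

log = [
    "12.3.40.65 GET index.jsp",
    "12.3.40.65 GET login.jsp",
    "12.3.40.65 GET /apps/bookstore/reg.jsp?member_login=hello&member_password=world",
    "12.3.40.65 GET myinfo.jsp",
]
sessions = extract_sessions(log)
```

The page sequence feeds the control model and the parsed name-value pairs feed the data model.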

11
Control Models

unigram: Probability of a user visiting a given page, independent of the previous page
P(currentPage = X)

[Figure: pages default, search, bookDetail, register; labeled probabilities 0.10, 0.20, 0.65, 0.05]
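
A unigram model can be estimated by relative frequency over all page visits. The sketch below assumes sessions are already reduced to lists of page names; the toy sessions and the function name `unigram_model` are hypothetical and are not data from the studied application.

```python
from collections import Counter

def unigram_model(sessions):
    """Estimate P(currentPage = X) as the fraction of all visits that hit X."""
    counts = Counter(page for session in sessions for page in session)
    total = sum(counts.values())
    return {page: n / total for page, n in counts.items()}

# Hypothetical toy sessions.
sessions = [
    ["default", "search", "bookDetail"],
    ["default", "register"],
    ["search", "bookDetail", "bookDetail"],
]
model = unigram_model(sessions)
```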
12
Control Models

bigram: Conditional probability of a user visiting a page, given the previous page
P(currentPage = X | lastPage = Y)

[Figure: transitions among the pages default, search, bookDetail, register; labeled probability 0.45]
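
A bigram model can be estimated by counting page-to-page transitions and normalizing per previous page. This is a sketch under assumptions: the `<start>` marker for the beginning of a session, the function name `bigram_model`, and the toy sessions are hypothetical.

```python
from collections import Counter, defaultdict

START = "<start>"  # hypothetical marker for the beginning of a session

def bigram_model(sessions):
    """Estimate P(currentPage = X | lastPage = Y) by relative frequency."""
    counts = defaultdict(Counter)
    for session in sessions:
        prev = START
        for page in session:
            counts[prev][page] += 1
            prev = page
    return {
        prev: {page: n / sum(nxt.values()) for page, n in nxt.items()}
        for prev, nxt in counts.items()
    }

# Hypothetical toy sessions.
sessions = [
    ["default", "search", "bookDetail"],
    ["default", "register"],
    ["search", "bookDetail", "bookDetail"],
]
model = bigram_model(sessions)
```

Each row `model[Y]` is a probability distribution over the next page given that the last page was `Y`.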
13
Control Models

trigram: Conditional probability of a user visiting a page, given the previous two pages
P(currentPage = X | lastPage1 = Y1, lastPage2 = Y2)

[Figure: transitions among the pages default, search, bookDetail, register; labeled probabilities 0.30, 0.05, 0.10, 0.55]
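
The same counting scheme generalizes to any history length. The sketch below estimates an n-gram model where the context is the previous n-1 pages, so n = 3 gives the trigram case. The padding marker `<start>`, the function name `ngram_model`, and the toy sessions are hypothetical.

```python
from collections import Counter, defaultdict

def ngram_model(sessions, n):
    """Estimate P(current page | previous n-1 pages) by relative frequency."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Pad so the first pages of a session also have a full-length context.
        padded = ["<start>"] * (n - 1) + list(session)
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            counts[context][padded[i]] += 1
    return {
        ctx: {page: c / sum(nxt.values()) for page, c in nxt.items()}
        for ctx, nxt in counts.items()
    }

# Hypothetical toy sessions.
sessions = [
    ["default", "search", "bookDetail"],
    ["default", "search", "register"],
    ["default", "search", "bookDetail"],
]
trigram = ngram_model(sessions, 3)
```

With n = 1 the context is the empty tuple and the result reduces to the unigram model.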
14
Reliability vs. Discrimination
unigram, bigram, trigram
Greater discrimination (more context): increases from unigram to trigram
Greater reliability (more training data per estimate): increases from trigram to unigram
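
The tradeoff can be made concrete by counting how many distinct contexts a model has to estimate and how many observations each context receives on average. This is an illustrative sketch; the function name `context_stats`, the `<start>` padding, and the toy sessions are hypothetical.

```python
from collections import Counter

def context_stats(sessions, n):
    """Return (number of distinct contexts, mean observations per context)."""
    contexts = Counter()
    for session in sessions:
        padded = ["<start>"] * (n - 1) + list(session)
        for i in range(n - 1, len(padded)):
            contexts[tuple(padded[i - n + 1:i])] += 1
    return len(contexts), sum(contexts.values()) / len(contexts)

# Hypothetical toy sessions (8 page visits in total).
sessions = [
    ["default", "search", "bookDetail"],
    ["default", "register"],
    ["search", "bookDetail", "bookDetail"],
]
stats = {n: context_stats(sessions, n) for n in (1, 2, 3)}
```

On this toy data the same 8 observations are spread over 1, 4, and 5 contexts for n = 1, 2, 3, so each higher-order estimate rests on less data.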
15
Data Models

simple: P(values = X | currentPage = Y)

important parameter

Books.do?category=3
BookDetail.do?category=3&itemId=8

advanced: P(values = X | (lastPage, importantParams) = Y1, currentPage = Y2)
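
The simple data model can be sketched as a per-page distribution over complete sets of name-value pairs. This sketch assumes requests are already parsed into (page, params) pairs; the function name `simple_data_model` and the toy requests are hypothetical.

```python
from collections import Counter, defaultdict

def simple_data_model(requests):
    """Estimate P(values = X | currentPage = Y) by relative frequency."""
    counts = defaultdict(Counter)
    for page, params in requests:
        values = tuple(sorted(params.items()))  # hashable set of name-value pairs
        counts[page][values] += 1
    return {
        page: {v: c / sum(vals.values()) for v, c in vals.items()}
        for page, vals in counts.items()
    }

# Hypothetical toy requests.
requests = [
    ("BookDetail.do", {"itemId": "18"}),
    ("BookDetail.do", {"itemId": "18"}),
    ("BookDetail.do", {"itemId": "36"}),
    ("AddOrder.do", {"quantity": "1", "itemId": "18"}),
]
model = simple_data_model(requests)
```

Because this model conditions only on the current page, it cannot keep a value such as `itemId` consistent from one page to the next.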
16
Simple Data Model
Page1: http://decide.cs/bookstore/BookDetail.do?itemId=18
Page2: http://decide.cs/bookstore/AddOrder.do?quantity=99&itemId=36
17
Advanced Data Model
Page1: http://decide.cs/bookstore/BookDetail.do?itemId=18
Page2: http://decide.cs/bookstore/AddOrder.do?quantity=1&itemId=18
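
One way to sketch the advanced data model is to condition the parameter values on the previous page, the previous page's important parameters, and the current page. The exact conditioning used in the original work is not spelled out here, so the context tuple, the function name `advanced_data_model`, the choice of `itemId` as the important parameter, and the toy sessions are all assumptions.

```python
from collections import Counter, defaultdict

def advanced_data_model(sessions, important):
    """Estimate P(values | last page, its important params, current page)."""
    counts = defaultdict(Counter)
    for session in sessions:
        for (last_page, last_params), (page, params) in zip(session, session[1:]):
            kept = tuple(sorted(
                (k, v) for k, v in last_params.items() if k in important
            ))
            values = tuple(sorted(params.items()))
            counts[(last_page, kept, page)][values] += 1
    return {
        ctx: {v: c / sum(vals.values()) for v, c in vals.items()}
        for ctx, vals in counts.items()
    }

# Hypothetical toy sessions: each is a list of (page, params) requests.
sessions = [
    [("BookDetail.do", {"itemId": "18"}),
     ("AddOrder.do", {"quantity": "1", "itemId": "18"})],
    [("BookDetail.do", {"itemId": "36"}),
     ("AddOrder.do", {"quantity": "2", "itemId": "36"})],
]
model = advanced_data_model(sessions, important={"itemId"})
context = ("BookDetail.do", (("itemId", "18"),), "AddOrder.do")
```

Conditioned this way, a generated `AddOrder.do` request after viewing item 18 only draws values that were observed following item 18.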
18
Generating Test Cases by Combining Control and
Data Models

Generate arbitrary queries about user sessions
and use these queries to build test cases
What are the k most likely user sessions?
What are the k least likely user sessions?
Generate k user sessions randomly, according to
the distribution represented in a web log.
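
The three queries above can be sketched against a bigram control model. This is a minimal sketch under stated assumptions: the model values, the `<start>` and `<end>` markers, and the function names are hypothetical, and the exhaustive enumeration only works because this toy model has no cycles.

```python
import math
import random

START, END = "<start>", "<end>"

# Hypothetical bigram control model; each row sums to 1.
MODEL = {
    START: {"default": 1.0},
    "default": {"search": 0.7, "register": 0.3},
    "search": {"bookDetail": 0.6, END: 0.4},
    "register": {END: 1.0},
    "bookDetail": {END: 1.0},
}

def sample_session(model, rng):
    """Draw one session at random according to the model's distribution."""
    session, page = [], START
    while True:
        pages, probs = zip(*model[page].items())
        page = rng.choices(pages, weights=probs)[0]
        if page == END:
            return session
        session.append(page)

def session_log_prob(model, session):
    """Log-probability of a complete session; -inf if a step was never seen."""
    logp, prev = 0.0, START
    for page in list(session) + [END]:
        p = model.get(prev, {}).get(page, 0.0)
        if p == 0.0:
            return float("-inf")
        logp += math.log(p)
        prev = page
    return logp

def all_sessions(model, prefix=(), page=START):
    """Enumerate every session of an acyclic model with its probability."""
    for nxt, p in model[page].items():
        if nxt == END:
            yield prefix, p
        else:
            for sess, q in all_sessions(model, prefix + (nxt,), nxt):
                yield sess, p * q

ranked = sorted(all_sessions(MODEL), key=lambda sp: sp[1], reverse=True)
k = 1
most_likely = ranked[:k]
least_likely = ranked[-k:]
random_session = sample_session(MODEL, random.Random(0))
```

For models with cycles, exhaustive enumeration does not terminate, so a bounded search or repeated sampling would be needed instead.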

19
Can our models be used to generate valid user
sessions?
20
Network Protocol Modeling Challenges

Using live network data instead of logs
Access to reconstructed traffic in both
directions
Can build models using data from multiple
machines (instead of web log from single server)
What are we generating?
Sequences of packets
Sequence of high-level events that can be turned
into packets
What is a user session?
Single connection
Cluster connections from subset of 5-tuple
(srcIP, dstIP, srcPort, dstPort, Protocol)
What are control and data models?
Can we generate valid new sequences?
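
The question of what counts as a user session can be explored by grouping traffic on different subsets of the 5-tuple. The sketch below is illustrative only: the `Packet` record, the event names, the function name `cluster_sessions`, and the addresses (from the documentation range 192.0.2.0/24) are hypothetical.

```python
from collections import defaultdict, namedtuple

Packet = namedtuple("Packet", "src_ip dst_ip src_port dst_port protocol event")

def cluster_sessions(packets, key_fields):
    """Group events into sessions keyed by the chosen 5-tuple fields."""
    sessions = defaultdict(list)
    for pkt in packets:
        key = tuple(getattr(pkt, field) for field in key_fields)
        sessions[key].append(pkt.event)
    return dict(sessions)

# Hypothetical traffic loosely following the multi-step attack pattern.
packets = [
    Packet("192.0.2.10", "192.0.2.20", 40000, 80, "tcp", "HTTP_GET"),
    Packet("192.0.2.10", "192.0.2.20", 40001, 443, "tcp", "SSL_HELLO"),
    Packet("192.0.2.20", "192.0.2.10", 2002, 2002, "udp", "UDP_DATA"),
]

# One session per connection (full 5-tuple).
by_connection = cluster_sessions(packets, Packet._fields[:5])
# One session per ordered host pair (subset of the 5-tuple).
by_hosts = cluster_sessions(packets, ("src_ip", "dst_ip"))
```

Keying on the full 5-tuple separates the probe on port 80 from the exploit on port 443, while keying on the host pair places them in one sequence where a sequential model can see both steps.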

21
Building Sequential Model to Discover NIDS Evasion Attack

Control model: sequence numbers
Data model: TTLs and payload
How hard is it to discover that this pattern is uncommon?
(from Handley et al. 01)
22
Discussion

Are Markov models sufficient for this task? Too
propositional?
Are data models too sparse? Are state spaces too
large?
How hard is anomaly detection in this framework?
What is a good definition for uncommon traffic
that doesn't produce many false positives or
false negatives? What about emerging new usage
patterns? How to avoid training attacks?
How much protocol knowledge to use in building
models?
Can signature matching events be used in data
model?
Besides generating sequences, what other analyses
can we perform? Entropy of models to determine
level of history-dependence in traffic?
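
The entropy question can be explored by comparing the entropy of the unigram distribution with the conditional entropy of the next message given the previous one. A large drop suggests strong history-dependence. This is a sketch with hypothetical names and a hypothetical two-message toy model in which messages strictly alternate.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a {symbol: probability} distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def conditional_entropy(bigram, context_probs):
    """H(next | previous) = sum over contexts of P(context) * H(next | context)."""
    return sum(context_probs[ctx] * entropy(nxt) for ctx, nxt in bigram.items())

# Hypothetical toy protocol: requests and responses strictly alternate.
unigram = {"request": 0.5, "response": 0.5}
bigram = {
    "request": {"response": 1.0},
    "response": {"request": 1.0},
}

h_unigram = entropy(unigram)
h_bigram = conditional_entropy(bigram, unigram)
```

Here the unigram entropy is 1 bit while the conditional entropy is 0 bits, meaning the previous message fully determines the next one.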

23
Related Work

Host-based and Network-based Intrusion Detection Systems (NIDS)
Signature-based anomaly detection: manual analysis
Packet-based or with context: detect known vulnerabilities and behaviors
Formal verification of protocols: requires extensive protocol knowledge; does not account for implementation variations
Scrubbers and Normalizers: remove TCP/IP ambiguities; do not account for application-layer ambiguities and must make tradeoffs concerning removing ambiguities that change semantics or lead to performance loss
Fuzzing/Fault-injection: random generation of inputs for vulnerability detection; generates invalid sequences