Lucent Technologies - PowerPoint PPT Presentation

Slides: 24
Provided by: amies7
Transcript and Presenter's Notes

Title: Lucent Technologies


1
Learning Sequential Models for Detecting
Anomalous Protocol Usage (work in progress)
  • Lloyd Greenwald, Lucent Bell Labs

2
Machine Learning Algorithms for Surveillance and
Event Detection
  • Surveillance: network traffic
  • Event detection: unknown vulnerability exploits using sequences
    of messages
  • Machine learning algorithms: learning Markov models to capture
    recent sequential protocol usage

3
NIDS Monitors Traffic and Detects Events That
Violate Security Policy
(from Bro user manual)
4
Example Attack Sequence: NIDS Evasion Attack
  • Fake a missing packet (to cause buffering)
  • Send two interspersed sequences for the same connection
  • Even with the same TTLs, it is ambiguous how end
    systems will re-create the sequence
(from Handley et al. 01)
5
Example Attack: Multi-Step
  • Apache/mod_ssl worm (aka Slapper)
  • Probe/scan target for vulnerability by sending an
    HTTP GET request on TCP port 80 that violates the
    HTTP 1.1 standard
  • Response identifies server as Apache
  • Exploit for SSLv2-enabled OpenSSL 0.9.6d
    vulnerability sent to TCP port 443
  • Target sends traffic back to attacker on UDP port
    2002
  • Target begins scanning for other vulnerable hosts

6
Technical Approach
  • Automatically build sequential models of recent
    protocol usage
  • Analyze models for common and uncommon sequences
  • Proactively exercise protocol implementation with
    uncommon sequences sampled from models
  • Reactively detect uncommon sequences
  • Build new defense policies for NIDS
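
The reactive step can be illustrated with a minimal sketch, assuming a first-order Markov model is already available as a table of transition probabilities. The event names, the floor probability for unseen transitions, and the threshold are hypothetical choices for illustration, not values from the slides.

```python
import math

def avg_log_prob(model, sequence, floor=1e-6):
    """Mean log-probability per transition; unseen transitions get `floor` (an assumption)."""
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0
    return sum(math.log(model.get(a, {}).get(b, floor)) for a, b in pairs) / len(pairs)

def is_uncommon(model, sequence, threshold=-3.0):
    """Flag a sequence whose average log-probability falls below a hypothetical threshold."""
    return avg_log_prob(model, sequence) < threshold

# Hypothetical transition probabilities over abstract protocol events.
model = {
    "SYN": {"SYN-ACK": 0.98, "RST": 0.02},
    "SYN-ACK": {"ACK": 0.99, "RST": 0.01},
    "ACK": {"DATA": 0.9, "FIN": 0.1},
    "DATA": {"DATA": 0.7, "FIN": 0.3},
}

common = ["SYN", "SYN-ACK", "ACK", "DATA", "FIN"]
uncommon = ["SYN", "DATA", "SYN", "FIN"]
flags = (is_uncommon(model, common), is_uncommon(model, uncommon))
```

Averaging per transition keeps long sequences from being flagged merely for their length; the right threshold would have to be tuned against real traffic.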

7
Prior Work: Machine Learning Algorithms for
Automated Test Case Generation
  • Surveillance: web logs
  • Event detection: exercise errors in web applications
  • Machine learning algorithms: learning Markov models to capture
    recent sequential web application usage

8
Prior Work: Automated Test Case Generation
  • Leverage dynamic user information to
    automatically generate NEW test cases for web
    applications.

Session Data
  • Key contribution 1: sequential statistical
    models built using machine learning techniques.
  • Key contribution 2: flexible test case
    generation exploiting probabilistic sampling
    methods.
9
Web Application Studied
  • Front end: JSP
  • Back end: MySQL
  • 10K lines of code, 118 methods, 12 classes
  • 123 user sessions (sequential application usage
    extracted from web log)
  • Question: Can we build models that can be used
    to generate new, valid user sessions?

10
Building Markov Models From Web Logs
  • Extract user sessions from the web log:
  • 12.3.40.65 GET index.jsp
  • 12.3.40.65 GET login.jsp
  • 12.3.40.65 GET /apps/bookstore/reg.jsp?member_login=hello&member_password=world&member_password2=world
  • 12.3.40.65 GET myinfo.jsp
  • Control model: possible sequences of URLs that
    are visited
  • Data model: possible sets of parameter values
    (name-value pairs)
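
A minimal sketch of the extraction step, assuming the simplified three-field log format shown on the slide (client IP, method, URL), the usual `=` and `&` separators in query strings, and one session per client IP. A real web log would need a fuller parser and session timeouts; `extract_sessions` is a hypothetical helper name.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

def extract_sessions(log_lines):
    """Group requests by client IP, preserving order, as (page, params) pairs."""
    sessions = defaultdict(list)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the simplified format
        ip, _method, url = parts
        split = urlsplit(url)
        page = split.path.rsplit("/", 1)[-1]    # last path segment, e.g. "reg.jsp"
        params = dict(parse_qsl(split.query))   # name-value pairs for the data model
        sessions[ip].append((page, params))
    return dict(sessions)

log = [
    "12.3.40.65 GET index.jsp",
    "12.3.40.65 GET login.jsp",
    "12.3.40.65 GET /apps/bookstore/reg.jsp?member_login=hello"
    "&member_password=world&member_password2=world",
    "12.3.40.65 GET myinfo.jsp",
]
sessions = extract_sessions(log)
# The page sequence feeds the control model; the params feed the data model.
control = [page for page, _ in sessions["12.3.40.65"]]
data = [params for _, params in sessions["12.3.40.65"]]
```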

11
Control Models
  • Unigram: probability of a user visiting a given
    page, independent of the previous page
  • P(currentPage = X)

[Figure: diagram of the pages default, search, book detail and register, labelled with the probabilities 0.10, 0.20, 0.65 and 0.05]
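
A unigram control model can be estimated as a relative frequency. The sketch below assumes sessions are already available as lists of page names; the sessions shown are hypothetical, with page names loosely based on the slide.

```python
from collections import Counter

def unigram_model(sessions):
    """Estimate P(currentPage = X) as the relative frequency of each page."""
    counts = Counter(page for session in sessions for page in session)
    total = sum(counts.values())
    return {page: n / total for page, n in counts.items()}

# Hypothetical sessions; not data from the studied application.
sessions = [
    ["default", "search", "bookDetail"],
    ["default", "register"],
    ["default", "search", "search", "bookDetail"],
]
probs = unigram_model(sessions)
```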
12
Control Models
  • Bigram: conditional probability of a user
    visiting a page, given the previous page
  • P(currentPage = X | lastPage = Y)

[Figure: diagram of the pages default, search, book detail and register, with one transition labelled 0.45]
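
A bigram control model conditions on the previous page. As a minimal sketch under the same assumptions (sessions as lists of page names, hypothetical example data), the conditional probabilities can be estimated by counting consecutive page pairs and normalising each row.

```python
from collections import Counter, defaultdict

def bigram_model(sessions):
    """Estimate P(currentPage = X | lastPage = Y) from consecutive page pairs."""
    counts = defaultdict(Counter)
    for session in sessions:
        for last, current in zip(session, session[1:]):
            counts[last][current] += 1
    return {
        last: {page: n / sum(row.values()) for page, n in row.items()}
        for last, row in counts.items()
    }

# Hypothetical sessions; not data from the studied application.
sessions = [
    ["default", "search", "bookDetail"],
    ["default", "register"],
    ["default", "search", "search", "bookDetail"],
]
model = bigram_model(sessions)
p_search_given_default = model["default"]["search"]
```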
13
Control Models
  • Trigram: conditional probability of a user
    visiting a page, given the previous two pages
  • P(currentPage = X | lastPage1 = Y1, lastPage2 = Y2)

[Figure: diagram of the pages default, search, book detail and register, labelled with the probabilities 0.30, 0.05, 0.10 and 0.55]
14
Reliability vs. Discrimination
  • Greater discrimination (more context)
  • Greater reliability (more training data)
[Figure: unigram, bigram and trigram models placed along the trade-off between reliability and discrimination]
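
One way to see the trade-off is to count how many distinct contexts a model of each order must estimate and how many observations each context receives on average. This is a minimal sketch with hypothetical sessions; `context_stats` is an illustrative helper, not something from the slides.

```python
from collections import Counter

def context_stats(sessions, order):
    """Return (distinct contexts, mean observations per context) for a model that
    conditions on the previous `order` pages (0 = unigram, 1 = bigram, 2 = trigram)."""
    contexts = Counter()
    for session in sessions:
        for i in range(order, len(session)):
            contexts[tuple(session[i - order:i])] += 1
    n = len(contexts)
    return n, (sum(contexts.values()) / n if n else 0.0)

# Hypothetical sessions; not data from the studied application.
sessions = [
    ["default", "search", "bookDetail"],
    ["default", "register"],
    ["default", "search", "search", "bookDetail"],
]
stats = {order: context_stats(sessions, order) for order in (0, 1, 2)}
```

With the same training data, each additional page of context leaves fewer observations per context, so the estimates discriminate more but are less reliable.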
15
Data Models
  • Simple: P(values = X | currentPage = Y)
  • Important parameter (example):
    Books.do?category=3
    BookDetail.do?category=3&itemId=8
  • Advanced: P(values = X | lastPageImportantParams = Y1,
    currentPage = Y2)
16
Simple Data Model
Page 1: http://decide.cs/bookstore/BookDetail.do?itemId=18
Page 2: http://decide.cs/bookstore/AddOrder.do?quantity=99&itemId=36
17
Advanced Data Model
Page 1: http://decide.cs/bookstore/BookDetail.do?itemId=18
Page 2: http://decide.cs/bookstore/AddOrder.do?quantity=1&itemId=18
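
The difference between the two data models can be sketched as a difference in the conditioning key. The code below assumes sessions are lists of (page, params) pairs and treats `itemId` as the important parameter; the sessions and the helper name `train_data_models` are hypothetical. Sampling uniformly from the observed parameter sets is one simple way to draw from the empirical distribution.

```python
import random
from collections import defaultdict

def train_data_models(sessions, important=("itemId",)):
    """Collect observed parameter sets for a simple model keyed by the current page,
    and an advanced model keyed by (important params of the last page, current page)."""
    simple, advanced = defaultdict(list), defaultdict(list)
    for session in sessions:
        last_params = {}
        for page, params in session:
            key = tuple(sorted((k, v) for k, v in last_params.items() if k in important))
            simple[page].append(params)
            advanced[(key, page)].append(params)
            last_params = params
    return simple, advanced

# Hypothetical sessions modelled on the bookstore example.
sessions = [
    [("BookDetail.do", {"itemId": "18"}), ("AddOrder.do", {"quantity": "1", "itemId": "18"})],
    [("BookDetail.do", {"itemId": "36"}), ("AddOrder.do", {"quantity": "99", "itemId": "36"})],
]
simple, advanced = train_data_models(sessions)
rng = random.Random(0)
# Simple model: any parameter set seen on AddOrder.do, regardless of the previous page.
simple_sample = rng.choice(simple["AddOrder.do"])
# Advanced model: only parameter sets seen after a page with itemId=18.
advanced_sample = rng.choice(advanced[((("itemId", "18"),), "AddOrder.do")])
```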
18
Generating Test Cases by Combining Control and
Data Models
  • Generate arbitrary queries about user sessions
    and use these queries to build test cases
  • What are the k most likely user sessions?
  • What are the k least likely user sessions?
  • Generate k user sessions randomly, according to
    the distribution represented in a web log.
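
As a minimal sketch of the random-generation query, assuming a bigram control model given as transition probabilities with a hypothetical end-of-session marker, sessions can be drawn by walking the model and then scored by their probability. The probabilities below are invented for illustration.

```python
import random

END = "<end>"  # hypothetical end-of-session marker

def sample_session(model, start, rng, max_len=20):
    """Draw one session by walking the transition probabilities from `start`."""
    session, page = [start], start
    while len(session) < max_len:
        row = model.get(page)
        if not row:
            break
        page = rng.choices(list(row), weights=list(row.values()))[0]
        if page == END:
            break
        session.append(page)
    return session

def session_probability(model, session):
    """Probability of a full session (including the end marker) under the model."""
    p = 1.0
    for last, current in zip(session, session[1:] + [END]):
        p *= model.get(last, {}).get(current, 0.0)
    return p

# Hypothetical transition probabilities P(currentPage | lastPage).
model = {
    "default": {"search": 0.7, "register": 0.3},
    "search": {"bookDetail": 0.6, "search": 0.2, END: 0.2},
    "bookDetail": {END: 1.0},
    "register": {END: 1.0},
}
rng = random.Random(42)
generated = [sample_session(model, "default", rng) for _ in range(5)]
ranked = sorted(generated, key=lambda s: session_probability(model, s), reverse=True)
```

Ranking sampled sessions only approximates the most-likely and least-likely queries; an exact answer would need a search over the model rather than sampling.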

19
Can our models be used to generate valid user
sessions?
20
Network Protocol Modeling Challenges
  • Using live network data instead of logs
  • Access to reconstructed traffic in both
    directions
  • Can build models using data from multiple
    machines (instead of web log from single server)
  • What are we generating?
  • Sequences of packets
  • Sequence of high-level events that can be turned
    into packets
  • What is a user session?
  • Single connection
  • Cluster connections from subset of 5-tuple
    (srcIP, dstIP, srcPort, dstPort, Protocol)
  • What are control and data models?
  • Can we generate valid new sequences?
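
The choice of what counts as a user session can be sketched as a choice of grouping key over the 5-tuple. The packet records, addresses and the `cluster` helper below are hypothetical; the ports loosely follow the multi-step example earlier in the deck.

```python
from collections import defaultdict, namedtuple

Packet = namedtuple("Packet", "src_ip dst_ip src_port dst_port protocol")

def cluster(packets, fields):
    """Group packets by the chosen subset of 5-tuple fields, preserving arrival order."""
    groups = defaultdict(list)
    for pkt in packets:
        groups[tuple(getattr(pkt, f) for f in fields)].append(pkt)
    return dict(groups)

# Hypothetical packets (TCP 80 probe, TCP 443 exploit, UDP 2002 reply, unrelated client).
packets = [
    Packet("10.0.0.5", "10.0.0.9", 40001, 80, "tcp"),
    Packet("10.0.0.5", "10.0.0.9", 40002, 443, "tcp"),
    Packet("10.0.0.9", "10.0.0.5", 2002, 2002, "udp"),
    Packet("10.0.0.7", "10.0.0.9", 51000, 80, "tcp"),
]
per_connection = cluster(packets, Packet._fields)        # full 5-tuple: one group per flow
per_host_pair = cluster(packets, ("src_ip", "dst_ip"))   # subset: flows between the same hosts
```

The groups here are directional; treating both directions of a connection as one session would require canonicalising the endpoint order before grouping.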

21
Building Sequential Model to Discover NIDS
Evasion Attack
  • Control model: sequence numbers
  • Data model: TTLs and payload
  • How hard is it to discover that this
    pattern is uncommon?
(from Handley et al. 01)
22
Discussion
  • Are Markov models sufficient for this task? Too
    propositional?
  • Are data models too sparse? Are state spaces too
    large?
  • How hard is anomaly detection in this framework?
    What is a good definition for uncommon traffic
    that doesn't produce many false positives or
    false negatives? What about emerging new usage
    patterns? How to avoid training attacks?
  • How much protocol knowledge to use in building
    models?
  • Can signature matching events be used in data
    model?
  • Besides generating sequences, what other analyses
    can we perform? Entropy of models to determine
    level of history-dependence in traffic?
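
The entropy question can be made concrete with a minimal sketch: compare the entropy of the next event, H(X), with its conditional entropy given the previous event, H(X | Y) = H(X, Y) - H(Y). The sequences below are hypothetical, and the estimates are plain empirical frequencies without smoothing.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of the empirical distribution given by `counts`."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n)

def history_dependence(sequences):
    """Return (H(X), H(X | previous event)) estimated from consecutive pairs."""
    pairs = Counter((a, b) for seq in sequences for a, b in zip(seq, seq[1:]))
    previous, current = Counter(), Counter()
    for (a, b), n in pairs.items():
        previous[a] += n
        current[b] += n
    h_current = entropy(current)
    h_conditional = entropy(pairs) - entropy(previous)  # H(X, Y) - H(Y)
    return h_current, h_conditional

# Strictly alternating events: the previous event fully determines the next one.
alternating = history_dependence([["A", "B", "A", "B", "A", "B"]])
# Every transition equally frequent: the previous event carries no information.
mixed = history_dependence([["A", "A", "B", "B", "A"]])
```

A large gap between the two values suggests strong history dependence, while nearly equal values suggest that a unigram model captures most of the structure.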

23
Related Work
  • Host-based and network-based intrusion detection
    systems (NIDS)
  • Signature-based anomaly detection: manual
    analysis
  • Packet-based or with context: detect known
    vulnerabilities and behaviors
  • Formal verification of protocols: requires
    extensive protocol knowledge; does not account for
    implementation variations
  • Scrubbers and normalizers: remove TCP/IP
    ambiguities; do not account for
    application-layer ambiguities and must make
    tradeoffs concerning removing ambiguities that
    change semantics or lead to performance loss
  • Fuzzing/fault injection: random generation of
    inputs for vulnerability detection; generates
    invalid sequences