1
Machine Learning for Network Anomaly Detection
  • Matt Mahoney

2
Network Anomaly Detection
  • Network: Monitors traffic to protect connected
    hosts
  • Anomaly: Models normal behavior to detect novel
    attacks (some false alarms)
  • Detection: Was there an attack?

3
Host Based Methods
  • Virus Scanners
  • File System Integrity Checkers (Tripwire, DERBI)
  • Audit Logs
  • System Call Monitoring: Self/Nonself (Forrest)

4
Network Based Methods
  • Firewalls
  • Signature Detection (SNORT, Bro)
  • Anomaly Detection (eBayes, NIDES, ADAM, SPADE)

5
User Modeling
  • Source address: unauthorized users of
    authenticated services (telnet, ssh, pop3, imap)
  • Destination address: IP scans
  • Destination port: port scans

6
Frequency Based Models
  • Used by SPADE, ADAM, NIDES, eBayes, etc.
  • Anomaly score = 1/P(event)
  • Event probabilities are estimated by counting
    (see the sketch below)
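
A minimal Python sketch of such a frequency-based scorer, assuming add-one smoothing so unseen events get a large but finite score (the cited systems differ in their exact estimators):

    from collections import Counter

    class FrequencyModel:
        """Estimate P(event) by counting; score an event as 1/P(event)."""

        def __init__(self):
            self.counts = Counter()
            self.total = 0

        def train(self, event):
            self.counts[event] += 1
            self.total += 1

        def score(self, event):
            # Add-one smoothing (an assumption): unseen events get
            # probability 1/(total+1) rather than zero.
            p = (self.counts[event] + 1) / (self.total + 1)
            return 1.0 / p

    model = FrequencyModel()
    for port in [80, 80, 80, 25, 80]:
        model.train(port)
    print(model.score(80))    # frequent event: low anomaly score
    print(model.score(6667))  # never seen: high anomaly score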

7
Attacks on Public Services
  • PHF exploits a CGI script bug on older Apache
    web servers
  • GET /cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd

8
Buffer Overflows
  • 1988 Morris Worm: fingerd
  • 2003 SQL Sapphire Worm
  • char buf[100];
  • gets(buf);  /* no bounds check: overlong input overwrites the stack */

[Diagram: stack layout with buf occupying bytes 0-100 and the return address above it; an overlong input fills buf with exploit code and overwrites the return address]
9
TCP/IP Denial of Service Attacks
  • Teardrop: overlapping IP fragments
  • Ping of Death: IP fragments reassemble to > 64K
  • Dosnuke: urgent data in a NetBIOS packet
  • Land: identical source and destination addresses

10
Protocol Modeling
  • Attacks exploit bugs
  • Bugs are most common in the least tested code
  • Most testing occurs after delivery
  • Therefore, unusual data is more likely to be
    hostile

11
Protocol Models
  • PHAD, NETAD: Packet headers (Ethernet, IP, TCP,
    UDP, ICMP)
  • ALAD, LERAD: Client TCP application payloads
    (HTTP, SMTP, FTP, ...)

12
Time Based Models
  • Training and test phases
  • Values never seen in training are suspicious
  • Score = t/p = tn/r (sketched below), where
  • t = time since last anomaly
  • n = number of training examples
  • r = number of allowed values
  • p = r/n = fraction of values that are novel
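
A minimal Python sketch of this time-based scheme; the exact bookkeeping of t (in particular, how the anomaly clock advances during training) differs between PHAD and LERAD, so this is one plausible reading rather than either system's implementation:

    class TimeBasedModel:
        """Score a test value as t*n/r if it was never seen in
        training, where t is the time since the last anomaly,
        n the number of training examples, and r the number of
        allowed (distinct training) values."""

        def __init__(self):
            self.allowed = set()
            self.n = 0           # training examples seen
            self.clock = 0       # advances once per test value
            self.last_anomaly = 0

        def train(self, value):
            self.allowed.add(value)
            self.n += 1

        def score(self, value):
            self.clock += 1
            if value in self.allowed:
                return 0.0
            t = self.clock - self.last_anomaly  # time since last anomaly
            self.last_anomaly = self.clock
            return t * self.n / len(self.allowed)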

13
Example: tn/r
  • Training: 0000111000 (n/r = 10/2)
  • Testing: 01223
  • 0: no score
  • 1: no score
  • 2: tn/r = 6 x 10/2 = 30
  • 2: tn/r = 1 x 10/2 = 5
  • 3: tn/r = 1 x 10/2 = 5
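
The slide's arithmetic, checked directly in Python (the t values 6, 1, 1 are taken from the slide):

    n, r = 10, 2          # 10 training values, 2 distinct (0 and 1)
    for value, t in [(2, 6), (2, 1), (3, 1)]:
        print(value, t * n / r)   # -> 2 30.0, 2 5.0, 3 5.0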

14
PHAD: Fixed Rules
  • 34 packet header fields
  • Ethernet (address, protocol)
  • IP (TOS, TTL, fragmentation, addresses)
  • TCP (options, flags, port numbers)
  • UDP (port numbers, checksum)
  • ICMP (type, code, checksum)
  • Global model (sketched below)
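
A sketch of how a PHAD-style global model could be organized, reusing the TimeBasedModel sketch from slide 12, assuming (as in the PHAD paper) that a packet's score is the sum of its fields' t*n/r scores; the field names here are illustrative stand-ins for the 34 actual fields:

    # Four illustrative fields standing in for PHAD's 34.
    FIELDS = ["ether_src", "ip_ttl", "tcp_flags", "icmp_type"]

    class PHADModel:
        """One time-based model per packet-header field; a packet's
        score is the sum of its fields' anomaly scores."""

        def __init__(self):
            self.models = {f: TimeBasedModel() for f in FIELDS}

        def train(self, packet):          # packet: dict of field -> value
            for f in FIELDS:
                self.models[f].train(packet[f])

        def score(self, packet):
            return sum(self.models[f].score(packet[f]) for f in FIELDS)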

15
LERAD: Learns Conditional Rules
  • Models inbound client TCP (addresses, ports,
    flags, 8 words in payload)
  • Learns conditional rules
  • If port = 80 then word1 = GET or POST
    (n/r = 10000/2); see the sketch below
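
A sketch of how such a conditional rule could be represented (hypothetical code; the slides do not show LERAD's internal representation):

    class Rule:
        """If attrs[cond_attr] == cond_val, then attrs[target] must
        be one of the values seen in training; n counts matching
        training examples and r = len(allowed)."""

        def __init__(self, cond_attr, cond_val, target):
            self.cond_attr = cond_attr
            self.cond_val = cond_val
            self.target = target
            self.allowed = set()
            self.n = 0

        def matches(self, attrs):
            return attrs.get(self.cond_attr) == self.cond_val

        def train(self, attrs):
            if self.matches(attrs):
                self.allowed.add(attrs[self.target])
                self.n += 1

        def n_over_r(self):
            return self.n / len(self.allowed) if self.allowed else 0.0

    # The slide's example rule: if port = 80 then word1 in {GET, POST}
    http_rule = Rule("port", 80, "word1")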

16
LERAD Rule Learning
  Address   Port   Word1   Word2
  Hume      80     GET     /
  Marx      80     GET     /index.html
  Marx      25     HELO    Pascal
  • If word1 = GET then port = 80 (n/r = 2/1)
  • word1 = GET or HELO (n/r = 3/2)
  • If address = Marx then port = 80 or 25 (n/r = 2/2)

17
LERAD Rule Learning
  • Randomly pick rules based on matching attributes
  • Select nonoverlapping rules with high n/r on a
    sample
  • Train on the full training set (recompute n/r)
  • Discard rules that discover novel values in the
    last 10% of training (known false alarms); a
    sketch of the whole loop follows
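
A high-level Python sketch of this training loop, building on the Rule class from slide 15; the candidate generation and the overlap test here are simplified stand-ins for LERAD's actual procedure:

    import random

    def learn_rules(sample, training, attributes, num_candidates=1000):
        # 1. Randomly propose rules from attribute values seen in a
        #    small sample of training examples.
        candidates = []
        for _ in range(num_candidates):
            ex = random.choice(sample)
            cond, target = random.sample(attributes, 2)
            candidates.append(Rule(cond, ex[cond], target))

        # 2. Evaluate n/r on the sample; greedily keep high-n/r rules,
        #    at most one per condition/target pair (a simplification
        #    of LERAD's non-overlap test).
        for r in candidates:
            for ex in sample:
                r.train(ex)
        candidates.sort(key=lambda r: r.n_over_r(), reverse=True)
        rules, seen = [], set()
        for r in candidates:
            key = (r.cond_attr, r.cond_val, r.target)
            if r.allowed and key not in seen:
                seen.add(key)
                rules.append(r)

        # 3. Retrain the kept rules on the first 90% of training, then
        # 4. discard any rule that flags a novel value in the last 10%
        #    (a known false alarm, since training is attack-free).
        split = int(0.9 * len(training))
        for r in rules:
            for ex in training[:split]:
                r.train(ex)
        return [r for r in rules
                if not any(r.matches(ex) and ex[r.target] not in r.allowed
                           for ex in training[split:])]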

18
DARPA/Lincoln Labs Evaluation
  • 1 week of attack-free training data
  • 2 weeks with 201 attacks

[Diagram: test network; Internet traffic and attacks pass through a router to a sniffer and victim hosts running SunOS, Solaris, Linux, and NT]
19
Attacks Detected (out of 201) at 10 False Alarms
per Day
20
Problems with Synthetic Traffic
  • Attributes are too predictable: TTL, TOS, TCP
    options, TCP window size, HTTP, SMTP command
    formatting
  • Too few sources: client addresses, HTTP user
    agents, ssh versions
  • Too clean: no checksum errors, fragmentation,
    garbage data in reserved fields, malformed
    commands

21
Real Traffic is Less Predictable
[Plot: r (number of values) vs. time; r keeps growing for real traffic but levels off for synthetic traffic]
22
Mixed Traffic: Fewer Detections, but More are
Legitimate
23
Project Status
  • Philip K. Chan: Project leader
  • Gaurav Tandon: Applying LERAD to system call
    arguments
  • Rachna Vargiya: Application payload tokenization
  • Mohammad Arshad: Network traffic outlier
    analysis by clustering

24
Further Reading
  • "Learning Nonstationary Models of Normal Network
    Traffic for Detecting Novel Attacks" by Matthew V.
    Mahoney and Philip K. Chan, Proc. KDD.
  • "Network Traffic Anomaly Detection Based on Packet
    Bytes" by Matthew V. Mahoney, Proc. ACM-SAC.
  • http://cs.fit.edu/~mmahoney/dist/