1
Machine Learning for Network Anomaly Detection
  • Matt Mahoney

2
Network Anomaly Detection
  • Network: Monitors traffic to protect connected
    hosts
  • Anomaly: Models normal behavior to detect novel
    attacks (some false alarms)
  • Detection: Was there an attack?

3
Host Based Methods
  • Virus Scanners
  • File System Integrity Checkers (Tripwire, DERBI)
  • Audit Logs
  • System Call Monitoring: Self/Nonself (Forrest)

4
Network Based Methods
  • Firewalls
  • Signature Detection (SNORT, Bro)
  • Anomaly Detection (eBayes, NIDES, ADAM, SPADE)

5
User Modeling
  • Source address: unauthorized users of
    authenticated services (telnet, ssh, pop3, imap)
  • Destination address: IP scans
  • Destination port: port scans

6
Frequency Based Models
  • Used by SPADE, ADAM, NIDES, eBayes, etc.
  • Anomaly score = 1/P(event)
  • Event probabilities are estimated by counting
    (see the sketch below)
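
A minimal Python sketch of such a frequency-based scorer, assuming add-one smoothing so unseen events get a large but finite score (the cited systems differ in their exact estimators):

    from collections import Counter

    class FrequencyModel:
        """Estimate P(event) by counting; score an event as 1/P(event)."""

        def __init__(self):
            self.counts = Counter()
            self.total = 0

        def train(self, event):
            self.counts[event] += 1
            self.total += 1

        def score(self, event):
            # Add-one smoothing (an assumption): unseen events get
            # probability 1/(total+1) rather than zero.
            p = (self.counts[event] + 1) / (self.total + 1)
            return 1.0 / p

    model = FrequencyModel()
    for port in [80, 80, 80, 25, 80]:
        model.train(port)
    print(model.score(80))    # frequent event: low anomaly score
    print(model.score(6667))  # never seen: high anomaly score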

7
Attacks on Public Services
  • PHF exploits a CGI script bug on older Apache
    web servers
  • GET /cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd

8
Buffer Overflows
  • 1988 Morris Worm: fingerd
  • 2003 SQL Sapphire Worm
  • char buf[100];
  • gets(buf);  /* no bounds check: overlong input overwrites the stack */

[Diagram: stack layout with buf occupying bytes 0-100 and the return address above it; an overlong input fills buf with exploit code and overwrites the return address]
9
TCP/IP Denial of Service Attacks
  • Teardrop: overlapping IP fragments
  • Ping of Death: IP fragments reassemble to > 64K
  • Dosnuke: urgent data in a NetBIOS packet
  • Land: identical source and destination addresses

10
Protocol Modeling
  • Attacks exploit bugs
  • Bugs are most common in the least tested code
  • Most testing occurs after delivery
  • Therefore, unusual data is more likely to be
    hostile

11
Protocol Models
  • PHAD, NETAD: Packet headers (Ethernet, IP, TCP,
    UDP, ICMP)
  • ALAD, LERAD: Client TCP application payloads
    (HTTP, SMTP, FTP, ...)

12
Time Based Models
  • Training and test phases
  • Values never seen in training are suspicious
  • Score = t/p = tn/r (sketched below), where
  • t = time since last anomaly
  • n = number of training examples
  • r = number of allowed values
  • p = r/n = fraction of values that are novel
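
A minimal Python sketch of this time-based scheme; the exact bookkeeping of t (in particular, how the anomaly clock advances during training) differs between PHAD and LERAD, so this is one plausible reading rather than either system's implementation:

    class TimeBasedModel:
        """Score a test value as t*n/r if it was never seen in
        training, where t is the time since the last anomaly,
        n the number of training examples, and r the number of
        allowed (distinct training) values."""

        def __init__(self):
            self.allowed = set()
            self.n = 0           # training examples seen
            self.clock = 0       # advances once per test value
            self.last_anomaly = 0

        def train(self, value):
            self.allowed.add(value)
            self.n += 1

        def score(self, value):
            self.clock += 1
            if value in self.allowed:
                return 0.0
            t = self.clock - self.last_anomaly  # time since last anomaly
            self.last_anomaly = self.clock
            return t * self.n / len(self.allowed)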

13
Example: tn/r
  • Training: 0000111000 (n/r = 10/2)
  • Testing: 01223
  • 0: no score
  • 1: no score
  • 2: tn/r = 6 x 10/2 = 30
  • 2: tn/r = 1 x 10/2 = 5
  • 3: tn/r = 1 x 10/2 = 5
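
The slide's arithmetic, checked directly in Python (the t values 6, 1, 1 are taken from the slide):

    n, r = 10, 2          # 10 training values, 2 distinct (0 and 1)
    for value, t in [(2, 6), (2, 1), (3, 1)]:
        print(value, t * n / r)   # -> 2 30.0, 2 5.0, 3 5.0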

14
PHAD: Fixed Rules
  • 34 packet header fields
  • Ethernet (address, protocol)
  • IP (TOS, TTL, fragmentation, addresses)
  • TCP (options, flags, port numbers)
  • UDP (port numbers, checksum)
  • ICMP (type, code, checksum)
  • Global model (sketched below)
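
A sketch of how a PHAD-style global model could be organized, reusing the TimeBasedModel sketch from slide 12, assuming (as in the PHAD paper) that a packet's score is the sum of its fields' t*n/r scores; the field names here are illustrative stand-ins for the 34 actual fields:

    # Four illustrative fields standing in for PHAD's 34.
    FIELDS = ["ether_src", "ip_ttl", "tcp_flags", "icmp_type"]

    class PHADModel:
        """One time-based model per packet-header field; a packet's
        score is the sum of its fields' anomaly scores."""

        def __init__(self):
            self.models = {f: TimeBasedModel() for f in FIELDS}

        def train(self, packet):          # packet: dict of field -> value
            for f in FIELDS:
                self.models[f].train(packet[f])

        def score(self, packet):
            return sum(self.models[f].score(packet[f]) for f in FIELDS)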

15
LERAD: Learns Conditional Rules
  • Models inbound client TCP (addresses, ports,
    flags, 8 words in payload)
  • Learns conditional rules
  • If port = 80 then word1 = GET or POST
    (n/r = 10000/2); see the sketch below
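
A sketch of how such a conditional rule could be represented (hypothetical code; the slides do not show LERAD's internal representation):

    class Rule:
        """If attrs[cond_attr] == cond_val, then attrs[target] must
        be one of the values seen in training; n counts matching
        training examples and r = len(allowed)."""

        def __init__(self, cond_attr, cond_val, target):
            self.cond_attr = cond_attr
            self.cond_val = cond_val
            self.target = target
            self.allowed = set()
            self.n = 0

        def matches(self, attrs):
            return attrs.get(self.cond_attr) == self.cond_val

        def train(self, attrs):
            if self.matches(attrs):
                self.allowed.add(attrs[self.target])
                self.n += 1

        def n_over_r(self):
            return self.n / len(self.allowed) if self.allowed else 0.0

    # The slide's example rule: if port = 80 then word1 in {GET, POST}
    http_rule = Rule("port", 80, "word1")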

16
LERAD Rule Learning
  Address   Port   Word1   Word2
  Hume      80     GET     /
  Marx      80     GET     /index.html
  Marx      25     HELO    Pascal
  • If word1 = GET then port = 80 (n/r = 2/1)
  • word1 = GET or HELO (n/r = 3/2)
  • If address = Marx then port = 80 or 25 (n/r = 2/2)

17
LERAD Rule Learning
  • Randomly pick rules based on matching attributes
  • Select nonoverlapping rules with high n/r on a
    sample
  • Train on the full training set (recompute n/r)
  • Discard rules that discover novel values in the
    last 10% of training (known false alarms); a
    sketch of the whole loop follows
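
A high-level Python sketch of this training loop, building on the Rule class from slide 15; the candidate generation and the overlap test here are simplified stand-ins for LERAD's actual procedure:

    import random

    def learn_rules(sample, training, attributes, num_candidates=1000):
        # 1. Randomly propose rules from attribute values seen in a
        #    small sample of training examples.
        candidates = []
        for _ in range(num_candidates):
            ex = random.choice(sample)
            cond, target = random.sample(attributes, 2)
            candidates.append(Rule(cond, ex[cond], target))

        # 2. Evaluate n/r on the sample; greedily keep high-n/r rules,
        #    at most one per condition/target pair (a simplification
        #    of LERAD's non-overlap test).
        for r in candidates:
            for ex in sample:
                r.train(ex)
        candidates.sort(key=lambda r: r.n_over_r(), reverse=True)
        rules, seen = [], set()
        for r in candidates:
            key = (r.cond_attr, r.cond_val, r.target)
            if r.allowed and key not in seen:
                seen.add(key)
                rules.append(r)

        # 3. Retrain the kept rules on the first 90% of training, then
        # 4. discard any rule that flags a novel value in the last 10%
        #    (a known false alarm, since training is attack-free).
        split = int(0.9 * len(training))
        for r in rules:
            for ex in training[:split]:
                r.train(ex)
        return [r for r in rules
                if not any(r.matches(ex) and ex[r.target] not in r.allowed
                           for ex in training[split:])]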

18
DARPA/Lincoln Labs Evaluation
  • 1 week of attack-free training data
  • 2 weeks with 201 attacks

[Diagram: test network; Internet traffic and attacks pass through a router to a sniffer and victim hosts running SunOS, Solaris, Linux, and NT]
19
Attacks Detected (out of 201) at 10 False Alarms
per Day
20
Problems with Synthetic Traffic
  • Attributes are too predictable: TTL, TOS, TCP
    options, TCP window size, HTTP, SMTP command
    formatting
  • Too few sources: client addresses, HTTP user
    agents, ssh versions
  • Too clean: no checksum errors, fragmentation,
    garbage data in reserved fields, malformed
    commands

21
Real Traffic is Less Predictable
[Plot: r (number of values) vs. time; r keeps growing for real traffic but levels off for synthetic traffic]
22
Mixed Traffic: Fewer Detections, but More are
Legitimate
23
Project Status
  • Philip K. Chan: Project leader
  • Gaurav Tandon: Applying LERAD to system call
    arguments
  • Rachna Vargiya: Application payload tokenization
  • Mohammad Arshad: Network traffic outlier
    analysis by clustering

24
Further Reading
  • "Learning Nonstationary Models of Normal Network
    Traffic for Detecting Novel Attacks" by Matthew V.
    Mahoney and Philip K. Chan, Proc. KDD.
  • "Network Traffic Anomaly Detection Based on Packet
    Bytes" by Matthew V. Mahoney, Proc. ACM-SAC.
  • http://cs.fit.edu/~mmahoney/dist/