Machine learning in IDS - PowerPoint PPT Presentation

Provided by: Rav26

Transcript and Presenter's Notes

Title: Machine learning in IDS

1
Machine learning in IDS

March 15, 2004

2
Source Papers

T. Lane and C. E. Brodley, "An application of
machine learning to anomaly detection," NIST-NCSC
National Information Systems Security Conference,
1997
J. Ryan, M. Lin, and R. Miikkulainen, "Intrusion
Detection with Neural Networks," MIT Press, 1998
A. K. Ghosh, A. Schwartzbard, and M. Schatz, "Learning
Program Behavior Profiles for Intrusion
Detection," USENIX Workshop on Intrusion Detection
and Network Monitoring, 1999
D. Endler, "Intrusion detection: Applying machine
learning to Solaris audit data," ACSAC'98

3
Two Major Approaches

Misuse detection: define intrusions ahead of
time and watch for their occurrence
Can detect well-known attacks via patterns
Future attacks cannot be preemptively detected
Anomaly detection: detect behavior that deviates
from normal system use
Learn a normal system activity profile
Can abstract information about normal behavior to
detect attacks

4
Basic Terminology

Concept drift: behavioral changes undergone by
valid users during normal use
On-line systems
Run in real-time with users
Computationally expensive
Off-line systems
Run against stored user data at a scheduled time
Cannot respond in real-time

5
Paper 1

IDS must learn characteristic sequences of
actions
These sequences differ on a per-user basis
Characteristic differences between these
sequences differentiate valid users from
intruders
Use the sequence as the fundamental unit of
comparison
Omit filenames for privacy and focus on behavior
instead of content

6
Paper 1

Parse the command stream into a token stream
> ls -laF
> cd /tmp
> gunzip -c foo.tar.gz | (cd \ ; tar xf -)
becomes
ls -laF cd <1> gunzip -c <1> | ( cd <1> ; tar - <1> )
This token stream is stored in the dictionary,
along with a similarity measure and a set of
system parameters
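
The parsing step on this slide can be sketched roughly as follows. This is a simplified illustration, not the paper's parser. It assumes that the first word of each command is the command name, that words starting with `-` are flags, and that every other word is a file name, with each run of file names collapsed into a count token such as `<1>`. The function name `tokenize_commands` and the set of shell metacharacters are hypothetical, and the token order may differ from the paper's in some cases.

```python
import re

# Shell metacharacters kept as separate tokens (an assumed, incomplete set).
SHELL_PUNCTUATION = set("|;()&<>")


def tokenize_commands(lines):
    """Turn shell command lines into one flat token stream.

    File names are dropped and replaced by a count token, so the stream
    describes behavior (commands and flags) rather than content.
    """
    tokens = []
    for line in lines:
        # Pad metacharacters with spaces so split() isolates them.
        words = re.sub(r"([|;()&<>])", r" \1 ", line).split()
        expect_command = True
        pending_files = 0
        for word in words:
            is_punct = word in SHELL_PUNCTUATION
            is_flag = not expect_command and word.startswith("-")
            if is_punct or is_flag or expect_command:
                # Emit the count of file names seen so far, if any.
                if pending_files:
                    tokens.append(f"<{pending_files}>")
                    pending_files = 0
                tokens.append(word)
                # A metacharacter starts a new command; anything else ends
                # the "command name expected" state.
                expect_command = is_punct
            else:
                pending_files += 1
        if pending_files:
            tokens.append(f"<{pending_files}>")
    return tokens


if __name__ == "__main__":
    print(" ".join(tokenize_commands(["ls -laF", "cd /tmp"])))
```

Collapsing file names into counts keeps the dictionary small and avoids storing private path information, which matches the privacy motivation given earlier in the deck.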

7
Paper 1

Compute a numerical similarity measure for pairs
of sequences that have close resemblance
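
The slide does not give the formula, so the following is only a sketch of one plausible measure in this spirit: it compares two equal-length token sequences position by position and rewards runs of adjacent matches more than scattered ones. The exact weighting, and the names `similarity` and `similarity_to_profile`, are assumptions made for illustration.

```python
def similarity(seq_a, seq_b):
    """Score two equal-length sequences; adjacent matches count more.

    Each matching position adds the length of the current run of
    consecutive matches, so "abcd" vs "abcd" scores 1 + 2 + 3 + 4.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    score = 0
    run = 0
    for a, b in zip(seq_a, seq_b):
        run = run + 1 if a == b else 0  # a mismatch resets the run
        score += run
    return score


def similarity_to_profile(seq, dictionary):
    """Similarity of a new sequence to its closest stored sequence."""
    return max(similarity(seq, stored) for stored in dictionary)


if __name__ == "__main__":
    profile = [["ls", "cd", "<1>", "vi"], ["cd", "<1>", "ls", "-l"]]
    print(similarity_to_profile(["ls", "cd", "<1>", "cat"], profile))
```

A low score against every sequence in a user's dictionary would then be treated as evidence of anomalous behavior.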

8
Paper 1

Collected data from four users
Experimented with different analysis methods
Sequence length had a major effect on accuracy
Dictionary must be kept small to avoid false
positives, and for performance reasons
The problem of informed, malicious users
The system performed well, with some caveats:
No concept drift
Novice users

9
Paper 2

Describes the NNID (Neural Network Intrusion
Detector)
Works off-line, identifies behavior using the
distribution of commands a user executes
Selected 100 commands to describe the user's
behavior
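
As a rough illustration of describing a user by the distribution of commands, the sketch below turns one day of commands into a fixed-length vector with one entry per tracked command. The slide does not say how counts are scaled, so normalizing to relative frequencies is an assumption, as are the function name `command_vector` and the tiny three-command list used in place of the 100 selected commands.

```python
from collections import Counter


def command_vector(commands_run, tracked_commands):
    """Return the relative frequency of each tracked command.

    Commands outside the tracked list are ignored. The result has one
    entry per tracked command, in the order given.
    """
    tracked = set(tracked_commands)
    counts = Counter(c for c in commands_run if c in tracked)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(tracked_commands)
    return [counts[c] / total for c in tracked_commands]


if __name__ == "__main__":
    day = ["ls", "ls", "cd", "vi", "emacs"]
    print(command_vector(day, ["ls", "cd", "vi"]))
```

One such vector per user per day is the kind of input a network like NNID would be trained on.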

10
Paper 2

A machine was selected that had 10 users, for a
total of 89 user-days
The network was trained on 8 randomly chosen days
of data and then tested against the remaining 4
days of data
Two separate tests were run
Identifying remaining vectors
Identifying randomly-generated vectors

11
Paper 2

Identified user vectors 93% of the time
False alarm rate of 7%
Rejected 63% of the random user vectors
Had an anomaly detection rate of 96%
All the false alarms were the same user, and were
attributed to lack of data

12
Paper 2

Overall, the system was a success
How well does the system scale with more users?
To what extent does user behavior change over
time?

13
Paper 3

Three algorithms were experimented with
Table lookup
Backpropagation network
Elman network
These three algorithms range from memorization to
generalization

14
Paper 3

Equality matching is simple but effective
Data is partitioned into fixed-size windows
For analysis, results are summarized with a ROC (Receiver
Operating Characteristic) curve
This curve plots the detection rate against the
false-positive rate as the detection threshold varies
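
The table-lookup (equality matching) approach on this slide can be sketched as follows. The sketch assumes overlapping sliding windows and reports the fraction of windows never seen in normal data; the slide only says the data is split into fixed-size windows, so the overlap, the scoring rule, and the names `windows`, `build_table`, and `anomaly_fraction` are assumptions.

```python
def windows(events, size):
    """All contiguous windows of the given size, as hashable tuples."""
    return [tuple(events[i:i + size]) for i in range(len(events) - size + 1)]


def build_table(normal_events, size):
    """Memorize every window observed during normal operation."""
    return set(windows(normal_events, size))


def anomaly_fraction(events, table, size):
    """Fraction of windows in `events` that are absent from the table."""
    seen = windows(events, size)
    if not seen:
        return 0.0
    misses = sum(1 for w in seen if w not in table)
    return misses / len(seen)


if __name__ == "__main__":
    normal = ["open", "read", "write", "close"]
    table = build_table(normal, 2)
    print(anomaly_fraction(["open", "read", "exec", "close"], table, 2))
```

Sweeping a threshold over this fraction and recording detections and false alarms at each setting is what produces the points of a ROC curve.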

15
Paper 3

A backpropagation network attempts to learn
normal program behavior
Multiple networks were trained for each program,
and the best was kept
Networks were also trained on random data labeled
anomalous, so that unseen inputs are classified as anomalous
Tolerates isolated anomalies, but flags sequences
of anomalies
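
One common way to tolerate single anomalies while still reacting to bursts is a leaky-bucket counter, sketched below. The slide does not spell out the mechanism or its settings, so the function name `leaky_bucket_alarm` and the `leak` and `threshold` values are illustrative assumptions.

```python
def leaky_bucket_alarm(anomaly_flags, leak=0.5, threshold=2.0):
    """Return True if anomalies arrive closely enough to fill the bucket.

    The bucket level drains by `leak` at every step and rises by 1 for
    each anomalous event, so isolated anomalies drain away while bursts
    accumulate and eventually reach `threshold`.
    """
    level = 0.0
    for is_anomalous in anomaly_flags:
        level = max(0.0, level - leak)  # drain, never below empty
        if is_anomalous:
            level += 1.0
        if level >= threshold:
            return True
    return False


if __name__ == "__main__":
    print(leaky_bucket_alarm([1, 0, 0, 0, 1, 0, 0, 0]))  # isolated events
    print(leaky_bucket_alarm([0, 1, 1, 1, 0]))           # a burst
```

The leak rate trades off sensitivity against false alarms: a faster leak forgives more isolated misclassifications but needs a denser burst to raise an alarm.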

16
Paper 3

An Elman network can recognize recurrent features
in the input
Perform classification of short sequences of
events as they occur within a larger stream of
events
The Elman network was the least tuned, but most
successful
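
To make the idea of recurrence concrete, here is a minimal, untrained sketch of an Elman-style forward pass: the hidden layer's previous activations are fed back as "context" alongside the next input, so the same event can score differently depending on what preceded it. The class name `ElmanNetwork`, the layer sizes, and the fixed weights are assumptions for illustration; training is omitted and this is not the paper's configuration.

```python
import math


class ElmanNetwork:
    """Tiny Elman-style network with one hidden layer and one output."""

    def __init__(self, w_in, w_ctx, w_out):
        self.w_in = w_in      # hidden x input weights
        self.w_ctx = w_ctx    # hidden x hidden (context) weights
        self.w_out = w_out    # output weights, one per hidden unit
        self.context = [0.0] * len(w_in)

    def reset(self):
        """Clear the context before starting a new event stream."""
        self.context = [0.0] * len(self.w_in)

    def step(self, x):
        """Process one input vector and return a score in (0, 1)."""
        hidden = []
        for row_in, row_ctx in zip(self.w_in, self.w_ctx):
            total = sum(w * v for w, v in zip(row_in, x))
            total += sum(w * c for w, c in zip(row_ctx, self.context))
            hidden.append(math.tanh(total))
        self.context = hidden  # remembered for the next step
        score = sum(w * h for w, h in zip(self.w_out, hidden))
        return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    net = ElmanNetwork(
        w_in=[[0.5, -0.3], [0.2, 0.8]],
        w_ctx=[[0.1, 0.4], [-0.2, 0.3]],
        w_out=[1.0, -1.0],
    )
    print(net.step([1.0, 0.0]), net.step([1.0, 0.0]))
```

Feeding the same input twice yields two different scores, because the second step also sees the context left behind by the first.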

17
Paper 3

Overall results

18
Paper 4

Utilized the Solaris SHIELD Basic Security Module
(BSM) for user audit data
Perl script parsed the BSM data into separate
audit files for four different users

19
Paper 4

Testing data consisted of normal sessions,
interspersed with simulated account break-ins
Number of signal features was reduced to 13 from
488
Ideal window size was determined to be 6

20
Paper 4

21
Paper 4

Ultimately, the best solution was a combination
of both anomaly and misuse detection

22
Common Problems

If an intruder can breach the system during the
learning phase, the system can learn the
malicious behavior
All tests used only a small number of users
No real-world testing was performed

23
Summary

Creating system usage fingerprints is a valid
methodology for IDS
Systems can be run both on-line and off-line
depending on the configuration needed
Real-world testing required before implementation
