Intrusion Detection Modeling Technique and Experiment Design

1
Intrusion Detection Modeling Technique and
Experiment Design

Yuhong Dong
ydong_at_cse.fau.edu
March 19, 2004

2
Table of Contents

Review of IDS systems (anomaly detection and misuse
detection)
IDS Modeling Algorithms
-- Classification Modeling
-- Association Rule
-- Frequent Episode
Feature Construction
Experiments
Conclusion

3
Overview of IDS Systems

Anomaly Detection System
IDES flags observed activities that deviate
significantly from the established normal usage
profiles.
Misuse Detection System
IDIOT and STAT use patterns of well-known
attacks or weak spots of the system to match and
identify known intrusions, patterns, or signatures.

4
Building an IDS Is Hard Work

System builders rely on their intuition and
experience to select the statistical measures for
anomaly detection.
Experts first analyze and categorize attack
scenarios and system vulnerabilities, and
hand-code the corresponding rules and patterns
for misuse detection.

5
Algorithm

Classification
maps a data item into one of several
predefined categories (normal or abnormal)
-- decision trees or rules
Link analysis
determines relations between fields in the
database records, for example correlations of
system features in audit data.
-- A programmer, for example, may have
emacs highly associated with C files
Sequence analysis
models sequential patterns. These algorithms
can discover which time-based sequences of audit
events frequently occur together.
-- patterns from audit data containing
network-based denial-of-service (DOS) attacks
suggest that several per-host and per-service
measures should be included.

6
Classification Modeling: normal / intrusion --
example of telnet records

hot: count of accesses to system directories
compromised: count of file/path "not found"
errors and "Jump to" instructions
7
Classification Modeling -- Example RIPPER Rules
from Telnet Records
RIPPER selects the unique feature values for
identifying the intrusions. These rules can
first be inspected and edited by security experts,
and then incorporated into a misuse detection
system. The accuracy of a classification model
depends directly on the set of features provided
in the training data. For example, if the
features hot, compromised, and root_shell were
removed from the records in Table 1, RIPPER
would not be able to produce accurate rules to
identify buffer overflow connections.
8
Association Rules
The goal of mining association rules is to derive
multi-feature correlations from the database
table. $support(X)$ is defined as the percentage of
records that contain item set $X$. An association
rule is an expression $X \rightarrow Y, [c, s]$, where
$s = support(X \cup Y)$ is the support of the rule, and
$c = support(X \cup Y) / support(X)$ is the confidence.
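
The support and confidence definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the record encoding (each record as a set of "field=value" strings), the function names, and the sample records are assumptions made for the example.

```python
from typing import FrozenSet, Iterable, List, Tuple

Record = FrozenSet[str]


def support(itemset: Iterable[str], records: List[Record]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not records:
        return 0.0
    return sum(1 for r in records if items <= r) / len(records)


def rule_stats(x: Iterable[str], y: Iterable[str],
               records: List[Record]) -> Tuple[float, float]:
    """Return (confidence, support) for the rule X -> Y."""
    s = support(frozenset(x) | frozenset(y), records)
    s_x = support(x, records)
    c = s / s_x if s_x > 0 else 0.0
    return c, s


# Hypothetical audit records for one user (illustrative data only).
records = [
    frozenset({"cmd=emacs", "ext=c"}),
    frozenset({"cmd=emacs", "ext=c"}),
    frozenset({"cmd=emacs", "ext=tex"}),
    frozenset({"cmd=ls"}),
]

c, s = rule_stats({"cmd=emacs"}, {"ext=c"}, records)
print(f"emacs -> C file: confidence={c:.2f}, support={s:.2f}")
```

Here two of the four records contain both items, and three contain the left-hand side, so the rule has support 2/4 and confidence 2/3.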
9
Frequent Episodes

Given a set of time-stamped event records, where
each record is a set of items, an interval $[t_1, t_2]$
is the sequence of event records that starts at
time $t_1$ and ends at time $t_2$.
$support(X)$ is the ratio between the number of
minimal occurrences that contain $X$ and the total
number of event records.
A frequent episode rule is the expression
$X, Y \rightarrow Z, [c, s, w]$, where
$s = support(X \cup Y \cup Z)$ is the support of the rule,
$c = support(X \cup Y \cup Z) / support(X \cup Y)$ is the
confidence, and $w = t_2 - t_1$ is the width of the interval.
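
A brute-force sketch of these definitions follows. It assumes that a minimal occurrence of X is an interval no wider than w whose records together contain X and that has no proper sub-interval with the same property; the slide does not spell this out. The function names and the toy event stream are hypothetical, and a real miner would use a far more efficient algorithm.

```python
from typing import FrozenSet, Iterable, List, Tuple

Event = Tuple[float, FrozenSet[str]]  # (timestamp, items), sorted by timestamp


def minimal_occurrences(itemset: Iterable[str], events: List[Event],
                        w: float) -> List[Tuple[float, float]]:
    """Intervals [t1, t2] with t2 - t1 <= w that minimally contain `itemset`."""
    items = frozenset(itemset)

    def covers(i: int, j: int) -> bool:
        seen = set()
        for _, record in events[i:j + 1]:
            seen |= record
        return items <= seen

    found = []
    for i in range(len(events)):
        for j in range(i, len(events)):
            if events[j][0] - events[i][0] > w:
                break  # interval is wider than the window
            if covers(i, j):
                # j is the earliest end for start i, so the interval is
                # minimal unless dropping record i still covers the itemset.
                if i == j or not covers(i + 1, j):
                    found.append((events[i][0], events[j][0]))
                break
    return found


def episode_support(itemset: Iterable[str], events: List[Event], w: float) -> float:
    """Number of minimal occurrences divided by the number of event records."""
    return len(minimal_occurrences(itemset, events, w)) / len(events)


def episode_rule_stats(x, y, z, events: List[Event], w: float):
    """Return (confidence, support) for the episode rule X, Y -> Z."""
    s = episode_support(set(x) | set(y) | set(z), events, w)
    s_xy = episode_support(set(x) | set(y), events, w)
    return (s / s_xy if s_xy else 0.0), s


# Toy event stream (illustrative data only).
events = [
    (0, frozenset({"a"})),
    (1, frozenset({"b"})),
    (2, frozenset({"a"})),
    (3, frozenset({"c"})),
    (5, frozenset({"a", "b"})),
]

occ_ab = minimal_occurrences({"a", "b"}, events, w=2)
c, s = episode_rule_stats({"a"}, {"b"}, {"c"}, events, w=2)
```

The nested loops cost roughly cubic time in the number of records, which is acceptable only for illustrating the definition on a handful of events.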

10
Feature Construction

Conditions
-- Network Intrusion Detection System
-- Algorithm: frequent episodes
-- Pre-processing: tcpdump data
Experiment
-- Apply the frequent episodes program to
both normal connection data and intrusion data,
and compare the resulting patterns to find the
intrusion-only patterns.
-- Then apply the algorithm to construct features
from the syn flood pattern. The resulting features
are a count of connections to the same dst_host in
the past 2 seconds and, among these connections,
the percentage of those that have the same service
and the percentage of those that have the S0 flag.
Open problems
-- how to decide the right time window
value w
-- how to select the appropriate features to
detect an intrusion
-- how to select the right axis and
reference features to generate the most
distinguishing and useful intrusion patterns
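
The syn flood features described in this slide can be sketched as follows. This is an illustration under stated assumptions: the `Conn` record layout, the function name, and the sample connections are hypothetical; the input is assumed to be sorted by time; and the current connection is counted as part of its own 2-second window.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Conn:
    time: float      # connection start time in seconds
    dst_host: str
    service: str
    flag: str        # e.g. "SF" (normal), "S0" (SYN sent, no reply)


def time_window_features(conns: List[Conn], window: float = 2.0) -> List[Dict[str, float]]:
    """Per-connection features over the past `window` seconds, same dst_host."""
    feats = []
    for i, cur in enumerate(conns):
        recent = [c for c in conns[: i + 1]
                  if c.dst_host == cur.dst_host and cur.time - c.time <= window]
        count = len(recent)  # always >= 1 because `cur` is included
        feats.append({
            "count": count,
            "same_srv_rate": sum(c.service == cur.service for c in recent) / count,
            "s0_rate": sum(c.flag == "S0" for c in recent) / count,
        })
    return feats


# Illustrative data: three half-open connections to one host, then normal traffic.
conns = [
    Conn(0.0, "victim", "http", "S0"),
    Conn(0.5, "victim", "http", "S0"),
    Conn(1.0, "victim", "http", "S0"),
    Conn(1.2, "other", "smtp", "SF"),
    Conn(3.5, "victim", "http", "SF"),
]

feats = time_window_features(conns)
```

A high `count` together with high `same_srv_rate` and `s0_rate` is the signature the slide associates with a syn flood. The 2-second default is the value quoted above; choosing w in general is listed as an open problem.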

11
Experiments

Data Source: DARPA data
-- Data Pre-processing
Misuse Detection
-- Manual and Automatic Feature Construction
-- Detection Models
-- Results
User Anomaly Detection
Conclusion and Future Directions

12
Experiment

Objective of the Experiment
-- survey and evaluate the state of the art
in intrusion detection research.
Procedure
-- Each participating site was required to
build intrusion detection models using the
training data, and send the results on the test
data back to DARPA for performance
evaluation.
The DARPA data
-- 4 gigabytes of compressed tcpdump data from
7 weeks of network traffic.
-- This data can be processed into about 5
million connection records of about 100 bytes
each.
-- The data contains the content of every packet
transmitted between hosts inside and outside a
simulated military base.

13
Experiment

DARPA Data (continued)
Four main categories of attacks were
simulated:
-- DOS, denial-of-service, for example, syn
flood
-- R2L, unauthorized access from a remote
machine, for example, password guessing
-- U2R, unauthorized access to local superuser
privileges by a local unprivileged user, for
example, buffer overflow attacks
-- Probing, surveillance and probing, for
example, port-scan, ping-sweep
Data Pre-processing: each record includes a set
of intrinsic features
Misuse Detection: Feature Construction and
Detection Models

14
Experiment: Feature Construction and Detection
Model

Detection Model
-- traffic model: DOS and Probing attacks
-- host-based traffic model: slow Probing
attacks
-- content model: R2L and U2R attacks
Result

[Figure: detection rate (y-axis) versus false alarm rate (x-axis)]
The false alarm rate is calculated as
the percentage of normal connections classified
as intrusions.
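
The two axes of the result plot can be computed as sketched below. The false alarm rate follows the definition given above. The detection rate is assumed here to be the fraction of intrusion connections that are flagged as any kind of intrusion, which the slide does not define explicitly. The label strings and sample data are hypothetical.

```python
from typing import List, Tuple


def detection_and_false_alarm_rates(true_labels: List[str],
                                    predicted_labels: List[str],
                                    normal: str = "normal") -> Tuple[float, float]:
    """Return (detection_rate, false_alarm_rate) as fractions in [0, 1]."""
    pairs = list(zip(true_labels, predicted_labels))
    normal_total = sum(1 for t, _ in pairs if t == normal)
    intrusion_total = len(pairs) - normal_total
    # Normal connections that the model flagged as an intrusion.
    false_alarms = sum(1 for t, p in pairs if t == normal and p != normal)
    # Intrusion connections that the model flagged as an intrusion.
    detected = sum(1 for t, p in pairs if t != normal and p != normal)
    detection_rate = detected / intrusion_total if intrusion_total else 0.0
    false_alarm_rate = false_alarms / normal_total if normal_total else 0.0
    return detection_rate, false_alarm_rate


# Illustrative labels only.
truth = ["normal", "normal", "normal", "normal", "dos", "dos", "probe", "r2l"]
preds = ["normal", "normal", "normal", "dos", "dos", "normal", "probe", "normal"]

dr, far = detection_and_false_alarm_rates(truth, preds)
print(f"detection rate={dr:.0%}, false alarm rate={far:.0%}")
```

Sweeping a model's decision threshold and plotting each resulting pair of rates would give a curve of the kind the figure describes.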
15
Experiment - Performance

This is a misuse detection system, so it performs
better on known attacks than on unknown
attacks. For all intrusions, an overall detection
rate below 70% is hardly satisfactory in a
mission-critical environment.

16
Experiment: User Anomaly Detection

The initial exploratory approach is to mine the
frequent patterns from user command data, and
merge or add the patterns into an aggregate set
to form the normal usage profile of a user.
A new pattern can be merged with an old pattern
if they have the same left-hand sides and
right-hand sides, their support values are within
5% of each other, and their confidence values
are also within 5% of each other.
To analyze a user login session, we mine the
frequent patterns from the sequence of commands
during this session. This new pattern set is
compared with the profile pattern set and a
similarity score is assigned. Assume that the new
set has n patterns and that, among them, m
patterns have matches in the profile
pattern set. The similarity score is then simply
m/n. A higher similarity score means a higher
likelihood that the user's behavior agrees with
his or her historical profile.
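
The m/n similarity score can be sketched as below. Two points are assumptions rather than something the slide states: a session pattern is taken to "match" a profile pattern under the same criterion used for merging, and "within 5%" is read as an absolute difference of at most 0.05. The `Pattern` class, the function names, and the sample patterns are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Pattern:
    lhs: FrozenSet[str]
    rhs: FrozenSet[str]
    support: float
    confidence: float


def matches(new: Pattern, old: Pattern, tol: float = 0.05) -> bool:
    """Same left- and right-hand sides, support and confidence within `tol`."""
    return (new.lhs == old.lhs and new.rhs == old.rhs
            and abs(new.support - old.support) <= tol
            and abs(new.confidence - old.confidence) <= tol)


def similarity(session: List[Pattern], profile: List[Pattern]) -> float:
    """m / n: fraction of session patterns that have a match in the profile."""
    if not session:
        return 0.0
    m = sum(1 for p in session if any(matches(p, q) for q in profile))
    return m / len(session)


def pat(lhs: str, rhs: str, support: float, confidence: float) -> Pattern:
    return Pattern(frozenset({lhs}), frozenset({rhs}), support, confidence)


# Illustrative patterns only.
profile = [pat("vi", "gcc", 0.30, 0.80), pat("ls", "cd", 0.20, 0.60)]
session = [
    pat("vi", "gcc", 0.32, 0.78),   # matches the first profile pattern
    pat("ls", "cd", 0.20, 0.90),    # same sides, but confidence differs too much
    pat("su", "cat", 0.10, 0.50),   # not in the profile at all
]

score = similarity(session, profile)
```

With one of the three session patterns matched, the score is 1/3, which would suggest a session that departs from the user's usual behavior.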

17
Conclusion and Future Directions

Data generated from network traffic monitoring
tends to have very high volume, dimensionality
and heterogeneity, and there is a need for high
performance modeling algorithms that will scale
to very large network traffic data sets.
Network data is temporal (streaming) in nature,
and the development of algorithms for mining data
streams is necessary for building real-time
intrusion detection systems.
The low frequency of computer attacks requires
modification of standard data mining algorithms
for their detection.
Cyber attacks may be launched from several
different locations and targeted to many
different destinations, thus creating a need to
analyze network data from several network
locations in order to detect these distributed
attacks.