Ensemble-based Adaptive Intrusion Detection

Title: Ensemble-based Adaptive Intrusion Detection

1
Ensemble-based Adaptive Intrusion Detection
Wei Fan, IBM T.J. Watson Research
Salvatore J. Stolfo, Columbia University
2
Data Mining for Intrusion Detection
Connection Records -> Feature Construction -> (telnet, 10,3,...), (ftp,10,20,...)
Label Existing Connections -> Training Data
Training Data -> Inductive Learner -> Intrusion Detection Model
3
Some interesting requirements ... ...

New types of intrusions are constantly invented
by hackers.
Most recent example: the coordinated attacks on many
e-business websites in 2000.
Hackers tend to use new types of intrusions that
the intrusion detection system is unaware of or weak
at detecting.
Data mining for intrusion detection is a very
data-intensive process:
very large data
evolving patterns
real-time detection

4
Question

When new types of intrusions are invented, can we
quickly adapt our existing model so that it can
detect these new intrusions before they cause
more damage?
If we don't have a solution, the new attack will
cause significant damage.
For this kind of problem, having a solution that
is not completely satisfactory is better than
having no solution.

5
Naive Approach - Complete Re-training
Existing Training Data + New Data -> Merged Training Data
Merged Training Data -> Inductive Learner -> NEW Intrusion Detection Model
6
Problem with the Naive Approach

Since the data (existing plus new) will be very
large, it takes a long time to compute a
detection model.
By the time the model is constructed, the new
attack will probably have already done considerable
damage to our system.

7
New Approach
New Data -> Learner -> NEW Model
Existing Model + NEW Model -> Combined Model
Key point: we compute a model only from the data
on new types of intrusions.
8
How do we label connections?
a new connection -> existing model -> normal or previously known intrusion types
if connection type unrecognized -> NEW Model -> normal or new intrusion types
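The labeling flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the label strings, the `label_connection` helper, and the two toy classifiers are hypothetical stand-ins, and the feature tuples follow the (service, duration, count) shape only as an assumption.

```python
from typing import Callable, Tuple

Connection = Tuple[str, int, int]      # assumed shape: (service, duration, count)
Model = Callable[[Connection], str]

ANOMALY = "anomaly"  # hypothetical label meaning "connection type unrecognized"


def label_connection(x: Connection, existing_model: Model, new_model: Model) -> str:
    """Ask the existing model first; consult the new model only when the
    existing model does not recognize the connection."""
    first = existing_model(x)
    if first != ANOMALY:
        return first           # normal or a previously known intrusion type
    return new_model(x)        # normal or a new intrusion type


# Toy stand-ins for the two classifiers (illustration only).
def toy_existing(x: Connection) -> str:
    service, _duration, count = x
    if service == "telnet" and count > 100:
        return "known_intrusion"
    if service in ("telnet", "ftp"):
        return "normal"
    return ANOMALY


def toy_new(x: Connection) -> str:
    _service, _duration, count = x
    return "new_intrusion" if count > 50 else "normal"
```

For example, `label_connection(("smtp", 1, 80), toy_existing, toy_new)` is passed on to the new model because the toy existing model does not recognize the `smtp` service.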
9
Basic Idea

The existing model is built to identify THREE classes:
normal
some type of intrusion
anomaly: some connection that is neither
normal nor a known type of intrusion.
For anomaly detection, we use the artificial
anomaly generation method (Fan et al., ICDM 2001).

10
Anomaly Detection

Generate "artificial anomalies" from the training
data, similar to "near misses".
Artificial anomalies are data points that are
different from the training data.
The algorithm concentrates on feature values that
are infrequent in the training data.
Distribution-based Artificial Anomaly (Fan et al.,
ICDM 2001)
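
The idea of generating near-miss points around infrequent feature values can be sketched as follows. This is a loose illustration under stated assumptions, not the published algorithm: the function name, the exact replacement rule, and the toy records are all hypothetical. The sketch only keeps the two properties named on the slide: rarer values trigger more generated points, and every generated point differs from the training data.

```python
import random
from collections import Counter


def generate_artificial_anomalies(data, seed=0):
    """Return near-miss records built from `data` (a list of equal-length tuples).

    Assumed rule: for each feature, a value v seen count(v) times triggers
    (max_count - count(v)) attempts, so rarer values trigger more attempts.
    Each attempt copies a random record and changes that one feature to
    another observed value (different from v and from the record's own value).
    """
    rng = random.Random(seed)
    seen = set(data)
    anomalies = []
    for f in range(len(data[0])):
        counts = Counter(row[f] for row in data)
        max_count = max(counts.values())
        values = sorted(counts, key=repr)      # fixed order for reproducibility
        for v in values:
            for _ in range(max_count - counts[v]):
                d = rng.choice(data)
                candidates = [u for u in values if u != v and u != d[f]]
                if not candidates:
                    continue                   # too few distinct values to change
                point = d[:f] + (rng.choice(candidates),) + d[f + 1:]
                if point not in seen:          # must differ from the training data
                    anomalies.append(point)
    return anomalies


# Toy records, assumed shape (protocol, service, flag); illustration only.
train = [("tcp", "http", 0), ("tcp", "http", 0), ("tcp", "ftp", 0), ("udp", "dns", 1)]
anomalies = generate_artificial_anomalies(train)
```

The generated points would then be labeled as the anomaly class and added to the training data, so that an ordinary classifier learns a boundary around the known classes.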

11
Four Configurations

H1(x): existing model.
H2(x): new model.
The configurations differ in how H2(x) is computed,
how H1(x) and H2(x) are combined,
and how a connection is processed and classified.

12
Configuration I
13
Configuration II
14
Configuration III
15
Configuration IV
16
Experiment

1998 DARPA Intrusion Detection Evaluation Dataset
22 different types of intrusions.

17
Experiment

Intrusions are introduced into the training data in
sequence to simulate new intrusions
being invented and launched by hackers.
22! unique sequences are possible;
we randomly chose 3 unique sequences.
The results are averaged.
Learner: RIPPER
with unordered rulesets
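
To make the size of the experiment space concrete, the sketch below counts the 22! possible orderings and draws 3 distinct random ones. The intrusion names are placeholders, and the helper names and seed are assumptions for illustration; the slides do not say how the sequences were drawn.

```python
import math
import random

N_TYPES = 22
# Placeholder names; the real intrusion type names are not listed on the slide.
intrusion_types = [f"intrusion_{i:02d}" for i in range(1, N_TYPES + 1)]

total_sequences = math.factorial(N_TYPES)   # 22! possible orderings


def sample_unique_sequences(items, k, seed=0):
    """Draw k distinct random orderings of `items`."""
    if k > math.factorial(len(items)):
        raise ValueError("not enough distinct orderings")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < k:
        candidate = rng.sample(items, len(items))   # one random permutation
        if candidate not in chosen:
            chosen.append(candidate)
    return chosen


def introduction_steps(sequence):
    """Yield (already known types, newly introduced type) for each step."""
    for i, new_type in enumerate(sequence):
        yield sequence[:i], new_type


sequences = sample_unique_sequences(intrusion_types, 3)
```

Each step of `introduction_steps` corresponds to one round in which the listed known types are already in the training data and one new type appears.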

18
3 Unique Sequences
19
Measurements

All results are on the new intrusion types.
Precision
If I catch a potential thief, what is the
probability that it is a real thief?
Recall
What is the probability that real thieves are
detected?
Anomaly Detection Rate: classified as anomaly.
Other: classified as other types of intrusions.
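
The four measurements can be computed from paired true and predicted labels as sketched below. This is one reading of the slide, stated as an assumption: the anomaly and other rates are taken as fractions of the true new-intrusion connections. The function name and label strings are hypothetical.

```python
def new_intrusion_metrics(actual, predicted, new_type,
                          normal="normal", anomaly="anomaly"):
    """Compute the four measurements for one new intrusion type."""
    pairs = list(zip(actual, predicted))
    # Predictions made for connections that truly are the new intrusion type.
    true_new = [p for a, p in pairs if a == new_type]
    # True labels of connections that were flagged as the new intrusion type.
    flagged = [a for a, p in pairs if p == new_type]
    hits = sum(1 for p in true_new if p == new_type)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        # Of the connections flagged as the new type, how many really are?
        "precision": ratio(hits, len(flagged)),
        # Of the real new-type connections, how many were flagged?
        "recall": ratio(hits, len(true_new)),
        # Real new-type connections classified as anomaly.
        "anomaly_rate": ratio(sum(p == anomaly for p in true_new), len(true_new)),
        # Real new-type connections classified as some other intrusion type.
        "other_rate": ratio(
            sum(p not in (new_type, normal, anomaly) for p in true_new),
            len(true_new),
        ),
    }


actual = ["new_attack"] * 4 + ["normal"] * 3 + ["old_attack"]
predicted = ["new_attack", "new_attack", "anomaly", "old_attack",
             "normal", "new_attack", "normal", "old_attack"]
metrics = new_intrusion_metrics(actual, predicted, "new_attack")
```

In this toy example, two of the four real `new_attack` connections are caught, one is classified as anomaly, and one is mistaken for `old_attack`.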

20
Precision Results
21
Recall Results
22
Anomaly Detection Rate
23
Other Detection Rate Results
24
Summary of results

The most accurate is Configuration I, where
the new model is trained from normal connections and the new
intrusion type, and
all connections predicted as normal or anomaly by the old
model are examined by the new model.
Reason:
The existing model's precision in detecting normal
connections influences the combined model's accuracy.
New data is limited in amount, so the artificial
anomalies generated from new data are limited as
well.
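
The routing rule described for the most accurate configuration can be sketched as below. Only the rule stated on this slide is modeled (connections that the old model predicts as normal or anomaly are re-examined by the new model); the function name, label strings, and toy models are hypothetical, and how the models are trained is not shown.

```python
def classify_config_one(x, h1, h2, normal="normal", anomaly="anomaly"):
    """Combine H1 (old model) and H2 (new model) as described on the slide:
    H2 re-examines everything H1 calls normal or anomaly."""
    first = h1(x)
    if first in (normal, anomaly):
        return h2(x)       # H2 decides: normal or the new intrusion type
    return first           # H1 recognized a previously known intrusion type


# Toy models over a single numeric feature (illustration only).
def h1_toy(x):
    if x > 900:
        return "known_intrusion"
    return "normal" if x < 100 else "anomaly"


def h2_toy(x):
    return "new_intrusion" if 50 <= x <= 500 else "normal"
```

With these toy models, `classify_config_one(60, h1_toy, h2_toy)` still reaches the new model even though the old model calls the connection normal, so a new intrusion that the old model mistakes for normal traffic can still be caught.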

25
Training Efficiency
26
Related Work (incomplete list)

Anomaly Detection
SRI's IDES uses the probability distribution of past
activities to measure the abnormality of host events.
We measure network events.
Forrest et al. use the absence of subsequences to
measure abnormality.
Lane and Brodley employ a similar approach but
use incremental learning to update
stored sequences from UNIX shell commands.
Ghosh and Schwartzbard use a neural network to learn
a profile of normality and a distance function to
detect abnormality.
Generating Artificial Data
Nigam et al. assign labels to unlabeled data using
a classifier trained from labeled data.
Chang and Lippmann applied voice transformation
techniques to add artificial training talkers to
increase variability.
Multiple Classifiers
Asker and Maclin, "Ensembles as a sequence of
classifiers"

27
Summary and Future Work

Proposed a two-step, two-classifier approach for
efficient training and fast model deployment.
Empirically tested it in the intrusion detection
domain.
It still needs to be tested in other domains.
