Title: Ensemble-based Adaptive Intrusion Detection
1Ensemble-based Adaptive Intrusion Detection
Wei Fan, IBM T.J. Watson Research
Salvatore J. Stolfo, Columbia University
2Data Mining for Intrusion Detection
[Diagram: Connection Records -> Feature Construction -> feature vectors, e.g. (telnet, 10,3,...) and (ftp,10,20,...) -> Label Existing Connections -> Training Data -> Inductive Learner -> Intrusion Detection Model]
3Some interesting requirements ... ...
- New types of intrusions are constantly invented by hackers.
  - Most recent: coordinated attacks on many e-business websites in 2000.
- Hackers tend to use new types of intrusions that the intrusion detection system is unaware of or weak at detecting.
- Data mining for intrusion detection is a very data-intensive process.
  - very large data
  - evolving patterns
  - real-time detection
4Question
- When new types of intrusions are invented, can we quickly adapt our existing model to detect them before they cause more damage?
- If we don't have a solution, the new attack will cause significant damage.
- For this kind of problem, a solution that is not completely satisfactory is better than no solution.
5Naive Approach - Complete Re-training
[Diagram: Existing Training Data + New Data -> Merged Training Data -> Inductive Learner -> NEW Intrusion Detection Model]
6Problem with the Naive Approach
- Since the data (existing plus new) will be very large, it takes a long time to compute a detection model.
- By the time the model is constructed, the new attack will probably have already done considerable damage to our system.
7New Approach
[Diagram: New Data -> Learner -> NEW Model; Existing Model + NEW Model -> Combined Model]
Key point: we compute the new model only from the data on new types of intrusions.
8How do we label connections?
[Diagram: a new connection -> existing model -> normal or previously known intrusion types; if connection type unrecognized -> NEW Model -> normal or new intrusion types]
9Basic Idea
- Existing model is built to identify THREE classes:
  - normal
  - some type of intrusion
  - anomaly: some connection that is neither normal nor some known type of intrusion.
- Anomaly detection: we use the artificial anomaly generation method (Fan et al., ICDM 2001).
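The two-stage labeling idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `existing_model` and `new_model` are hypothetical callables, the label string "anomaly", the feature names, and the toy rules are invented for the example, and the routing shown (only connections labeled anomaly are passed on) is just one possible way to combine the two models.

```python
from typing import Any, Callable, Dict

Connection = Dict[str, Any]
Model = Callable[[Connection], str]

ANOMALY = "anomaly"  # assumed label for "neither normal nor a known intrusion"


def combined_model(existing: Model, new: Model) -> Model:
    """Build a classifier that consults `new` only for unrecognized connections."""
    def classify(conn: Connection) -> str:
        label = existing(conn)
        if label == ANOMALY:
            # The existing model does not recognize this connection,
            # so the model trained on the new data decides.
            return new(conn)
        return label
    return classify


# Toy stand-ins with hypothetical rules, for illustration only.
def existing_model(conn: Connection) -> str:
    if conn["service"] == "telnet" and conn["failed_logins"] > 5:
        return "guess_passwd"
    if conn["service"] in ("telnet", "ftp"):
        return "normal"
    return ANOMALY


def new_model(conn: Connection) -> str:
    return "new_intrusion" if conn["bytes"] > 10_000 else "normal"


classify = combined_model(existing_model, new_model)
print(classify({"service": "ftp", "failed_logins": 0, "bytes": 20}))
print(classify({"service": "smtp", "failed_logins": 0, "bytes": 50_000}))
```

Because the existing model is left untouched, only the small new model has to be trained when a new intrusion type appears.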
10Anomaly Detection
- Generate "artificial anomalies" from training data, similar to "near misses".
- Artificial anomalies are data points that are different from the training data.
- The algorithm concentrates on feature values that are infrequent in the training data.
- Distribution-based Artificial Anomaly (Fan et al., ICDM 2001)
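One plausible reading of the bullets above can be sketched as follows. This is an illustrative assumption, not a reproduction of the published algorithm: the function name, the tuple-based record format, and the exact sampling rule (perturb one feature of a record that holds an infrequent value, and try more often the rarer the value is) are all choices made for this example.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[object, ...]


def generate_artificial_anomalies(data: Sequence[Record], seed: int = 0) -> List[Record]:
    """Create near-miss records by changing one feature value at a time."""
    rng = random.Random(seed)
    known = set(data)
    anomalies: List[Record] = []
    for f in range(len(data[0])):
        counts = Counter(rec[f] for rec in data)
        if len(counts) < 2:
            continue  # no alternative value to swap in for this feature
        max_count = max(counts.values())
        for value, count in counts.items():
            holders = [rec for rec in data if rec[f] == value]
            others = [v for v in counts if v != value]
            # Infrequent values get more attempts: (max_count - count) of them.
            for _ in range(max_count - count):
                rec = rng.choice(holders)
                variant = rec[:f] + (rng.choice(others),) + rec[f + 1:]
                if variant not in known:  # must differ from the training data
                    anomalies.append(variant)
    return anomalies


training = [
    ("telnet", "SF", 0),
    ("telnet", "SF", 0),
    ("telnet", "SF", 1),
    ("ftp", "SF", 0),
    ("http", "REJ", 0),
]
for record in generate_artificial_anomalies(training):
    print(record, "-> anomaly")
```

The generated records would then be labeled as anomaly and added to the training data, so that the learner sees examples of the third class.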
11Four Configurations
- H1(x): existing model.
- H2(x): new model.
- The configurations differ in:
  - how H2(x) is computed,
  - how H1(x) and H2(x) are combined,
  - how a connection is processed and classified.
12Configuration I
13Configuration II
14Configuration III
15Configuration IV
16Experiment
- 1998 DARPA Intrusion Detection Evaluation Dataset
- 22 different types of intrusions.
17Experiment
- Intrusions are introduced into the training data in sequence, to simulate new intrusions being invented and launched by hackers.
  - There are 22! unique sequences.
  - We randomly used 3 unique sequences.
  - The results are averaged.
- RIPPER
  - unordered rulesets
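To give a sense of scale, the sketch below counts the possible introduction orders and draws a few distinct ones at random. The placeholder type names and the helper `sample_sequences` are hypothetical; the slides do not say how the three sequences were actually drawn.

```python
import math
import random
from typing import List

N_TYPES = 22
# Placeholder names; the real intrusion type names are not listed here.
intrusion_types = [f"intrusion_{i:02d}" for i in range(1, N_TYPES + 1)]

# Number of distinct orders in which 22 types can be introduced (22!).
print(math.factorial(N_TYPES))


def sample_sequences(types: List[str], k: int, seed: int = 0) -> List[List[str]]:
    """Draw k distinct random orderings of `types`."""
    rng = random.Random(seed)
    chosen: List[List[str]] = []
    while len(chosen) < k:
        candidate = rng.sample(types, len(types))  # a random permutation
        if candidate not in chosen:
            chosen.append(candidate)
    return chosen


for seq in sample_sequences(intrusion_types, k=3):
    print(seq[:3], "...")
```

With roughly 1.1 × 10^21 possible orders, exhaustive evaluation is out of reach, which is why a small random sample is averaged instead.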
18Three Unique Sequences
19Measurements
- All results are on the new intrusion types.
- Precision
  - If I catch a potential thief, what is the probability that it is a real thief?
- Recall
  - What is the probability that real thieves are detected?
- Anomaly Detection Rate: classified as anomaly.
- Other: classified as other types of intrusions.
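The four measurements can be computed from true and predicted labels as sketched below. The exact definitions are an interpretation of the slide, not taken from the authors' code: the anomaly and other rates are assumed to be fractions of the true new-intrusion connections, and the function name and label strings are hypothetical.

```python
from typing import Dict, Sequence


def new_intrusion_metrics(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    new_type: str,
    anomaly_label: str = "anomaly",
    normal_label: str = "normal",
) -> Dict[str, float]:
    """Precision, recall, anomaly rate and other rate for one new intrusion type."""
    pairs = list(zip(y_true, y_pred))
    # True labels of everything predicted as the new type.
    caught = [t for t, p in pairs if p == new_type]
    # Predictions for everything that truly is the new type.
    real = [p for t, p in pairs if t == new_type]

    def ratio(hits: int, total: int) -> float:
        return hits / total if total else 0.0

    not_other = (new_type, anomaly_label, normal_label)
    return {
        "precision": ratio(sum(t == new_type for t in caught), len(caught)),
        "recall": ratio(sum(p == new_type for p in real), len(real)),
        "anomaly_rate": ratio(sum(p == anomaly_label for p in real), len(real)),
        "other_rate": ratio(sum(p not in not_other for p in real), len(real)),
    }


y_true = ["new", "new", "new", "new", "normal", "normal"]
y_pred = ["new", "new", "anomaly", "smurf", "new", "normal"]
print(new_intrusion_metrics(y_true, y_pred, new_type="new"))
```

In the thief analogy, precision looks at everyone who was caught, while recall, anomaly rate and other rate all look at the real thieves and ask how each one was labeled.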
20Precision Results
21Recall Results
22Anomaly Detection Rate
23Other Detection Rate Results
24Summary of results
- The most accurate is Configuration I, where
  - the new model is trained from normal and the new intrusion type,
  - all connections predicted as normal or anomaly by the old model are examined by the new model.
- Reason
  - The existing model's precision in detecting normal connections influences the combined model's accuracy.
  - New data is limited in amount. Artificial anomalies generated from new data are limited as well.
25Training Efficiency
26Related Work (incomplete list)
- Anomaly Detection
  - SRI's IDES uses probability distributions of past activities to measure abnormality of host events. We measure network events.
  - Forrest et al. use absence of subsequences to measure abnormality.
  - Lane and Brodley employ a similar approach but use incremental learning to update stored sequences from UNIX shell commands.
  - Ghosh and Schwartzbard use a neural network to learn a profile of normality and a distance function to detect abnormality.
- Generating Artificial Data
  - Nigam et al. assign labels to unlabeled data using a classifier trained from labeled data.
  - Chang and Lippmann applied voice transformation techniques to add artificial training talkers to increase variability.
- Multiple classifiers
  - Asker and Maclin, "Ensembles as a sequence of classifiers"
27Summary and Future Work
- Proposed a two-step, two-classifier approach for efficient training and fast model deployment.
- Empirically tested in the intrusion detection domain.
- Need to test if it works well for other domains.