1. Using Artificial Anomalies to Detect Known and Unknown Network Intrusions
Wei Fan (IBM Research), Matt Miller and Sal Stolfo (Columbia University), Wenke Lee (Georgia Tech), Philip Chan (Florida Tech)
December 1, 2001
2. Anomaly Detection and Classification
- Differences
  - A classification system builds models to detect repeated patterns of known event types.
  - Anomaly detection tracks inconsistencies that deviate from the "known" and "expected".
- Example: Intrusion Detection Systems
  - Misuse detection detects known intrusion types.
  - Anomaly detection detects network events different from normal events and known intrusions. Anomalies are likely to be newly launched intrusions.
- Training Data
  - Classification: clearly labeled examples.
  - Anomaly detection: no labeled anomalous data. Otherwise, they would not be anomalies.
3. Problem
- Problem: how can inductive learning be used for anomaly detection?
  - A wide range of inductive learners is available.
  - They produce comprehensible models.
- Solution: compute artificial anomaly data from the classification training data, converting anomaly detection into classification.
  - All artificial anomalies are assigned the label "anomaly".
  - For example, use normal and known intrusion data to compute artificial anomalies.
4. Some Observations on Inductive Learning Algorithms
- Inductive learners only discover boundaries that separate data with different given labels.
- Data of unknown types will always be misclassified as one of the given classes.
- Example
  - How do we distinguish a bear from a cat?
  - An inductive model might be: if Weight(x) < 5 lb, x is a cat; otherwise, it is a bear.
  - However, if x is a horse, the model will mistakenly predict it to be a bear.
  - The ideal answer would be "I don't know. It is neither bear nor cat."
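The bear/cat example can be sketched in a few lines (the rule and weight threshold are from the slide; the function name is ours):

```python
# Illustrative sketch: an inductive model trained only on bears and cats
# is forced to assign every input, including an unseen class (a horse),
# to one of the two known labels.
def classify(weight_lb):
    """Rule learned from bear/cat training data only."""
    return "cat" if weight_lb < 5 else "bear"

print(classify(4))    # a 4 lb animal -> "cat"
print(classify(900))  # a 900 lb horse is misclassified as "bear"
```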
5. Solution Summary
- Generate artificial anomalies labeled "anomaly" to delineate the boundary between known and unknown.
- How to generate artificial anomalies? Compute examples that are close to, but different from, those with given labels.
- Where to place the artificial anomalies?
  - Put more artificial anomalies around infrequent examples or sparse regions of the training data.
6. Digging into the Dataset
- Assumption: the boundary between known and unknown is close to the known data.
- Randomly change the value of one feature of a given datum while leaving the other features unaltered.
- Concentrate on areas of the training data that are "sparse".
  - Sparse regions are characterized by infrequent feature values.
- Example
  - The panda is a very "sparse" bear within the bear class.
  - A panda has a white body, black eye shades, and black legs.
- Generate more artificial anomalies around sparse regions.
  - Something with a white body, white eye shades, black legs, and a weight above 200 lb is an anomaly.
- Based on the frequency of feature values, we compensate for sparse regions by filling in more artificial anomalies.
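A minimal sketch of the one-feature perturbation described above, assuming categorical features stored in dicts (all names here are illustrative, not from the original system):

```python
import random

# Copy a known example, replace a single feature with a different value
# from that feature's domain, and label the result "anomaly"; all other
# features are left unchanged, so the anomaly stays close to known data.
def perturb(example, feature, domain, rng=random.Random(0)):
    other = [v for v in domain if v != example[feature]]
    mutated = dict(example)
    mutated[feature] = rng.choice(other)
    mutated["label"] = "anomaly"
    return mutated

panda = {"body": "white", "eye_shades": "black", "legs": "black", "label": "bear"}
art = perturb(panda, "eye_shades", ["black", "white", "brown"])
print(art)  # same as the panda except one changed feature, labeled "anomaly"
```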
7. Overall Effect
- Sparse regions receive focus, and very specific rules are generated to cover them.
- For example:
  - IF white body, black eye shades, black legs, and weight > 200 lb, THEN it is a (panda) bear
  - ELSE try other rules
  - ELSE, if none of the rules are satisfied, predict "We don't know what it is based on our limited knowledge"
- Without the artificial anomalies, an animal with "white body, white eye shades and black legs" would, at best, be misclassified as a bear.
8. Distribution-Based Artificial Anomaly Generation Algorithm
- Iterate through every feature value:
  - fmax is the most frequent value of feature F; count(fmax) is its frequency count.
  - fi is another value of the same feature F; count(fi) is its frequency count.
  - countdiff = count(fmax) - count(fi).
  - Generate countdiff artificial anomalies for feature value fi:
    - Take a datum whose feature F has value fi; change fi to any value that is not fi, while leaving all other features unchanged.
    - Change its label to "anomaly".
9. Applications of Artificial Anomalies
- Pure Anomaly Detection
  - Training data contain only one class, such as normal.
  - Detect any data that differ from the given single class.
  - Artificial anomalies are computed from this single class.
- Combined Misuse and Anomaly Detection
  - Classification and anomaly detection are performed at the same time.
  - For example, detect bear, cat, and non-bear-non-cat at the same time.
  - Efficient, since both classification and anomaly detection are done at the same time in one single module.
  - Efficient model deployment.
10. Experiment on Pure Anomaly Detection
- Measurements
  - False alarm rate: predicted anomalies that are actually normal.
  - Detection rate: true anomalies correctly detected.
- Original Dataset
  - 1998 DARPA Intrusion Detection Evaluation Dataset (also the 1999 KDDCUP dataset).
  - The original dataset contains both normal data and intrusion data.
  - There are 4 basic types of intrusions; each type has a few subclasses.
    - U2R: User to Root
    - R2L: Remote to Local
    - DOS: Denial of Service
    - PRB: Probing
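The two measurements can be sketched as follows (function and variable names are ours):

```python
# False alarm rate: fraction of normal events predicted "anomaly".
# Detection rate:   fraction of true anomalies predicted "anomaly".
def false_alarm_rate(y_true, y_pred):
    normal = [p for t, p in zip(y_true, y_pred) if t == "normal"]
    return sum(p == "anomaly" for p in normal) / len(normal)

def detection_rate(y_true, y_pred):
    anom = [p for t, p in zip(y_true, y_pred) if t == "anomaly"]
    return sum(p == "anomaly" for p in anom) / len(anom)

y_true = ["normal", "normal", "anomaly", "anomaly"]
y_pred = ["normal", "anomaly", "anomaly", "normal"]
print(false_alarm_rate(y_true, y_pred))  # 0.5
print(detection_rate(y_true, y_pred))    # 0.5
```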
11. Intrusions and Categories
12. Experiment Setup
- Training Set: normal data and artificial anomalies computed from the normal data. No intrusions are included.
- Test Set: both normal and all intrusion data.
- Goal: can we detect all intrusion data as "anomalies" without having them in the training data?
- Learner
  - RIPPER, an inductive rule learner.
13. Pure Anomaly Detection Result
- False alarm rate: 2%.
- Anomaly detection rate: (per-category results shown as a chart in the original slides).
14. Experiment on Combined Misuse and Anomaly Detection
- One single module detects both known intrusions and unknown events that are neither normal nor known intrusions.
- Group the different types of intrusions into 13 clusters.
  - Similar intrusions are grouped together.
  - Knowledge of intrusions in one cluster may not help detect intrusions in another cluster.
15. Experiment Setup
- Training Set
  - Normal data plus a few clusters of intrusions, PLUS artificial anomalies.
- Test Set
  - All data: normal and all intrusions.
- Goals
  - Can we detect unseen intrusions (excluded intrusion clusters) as anomalies?
  - Do we have to compromise performance on known intrusions (included intrusion clusters)?
- There are 13! ways to order the clusters when forming training and test sets.
  - We use 3 unique sequences to introduce intrusion clusters, adding one cluster at a time.
  - Training: normal data plus clusters 1 to i.
  - Testing: normal data and all types of intrusions.
16. Measurements
- True Class Detection Rate
- Anomaly Detection Rate
17. True Class Detection Rate
- Intrusion i correctly detected as intrusion i.
18. Anomaly Detection Rate
- Percentage of anomalies or unknown intrusions correctly detected as "anomaly".
19. Efficient Model Deployment (summary)
- Efficient learning and deployment of models to detect new attacks.
- When data about new attacks are collected, we do not want to retrain the model on all intrusions from scratch.
- Instead, we train only a lightweight model to detect the new attacks.
- Thanks to artificial anomalies, the older model can already flag such events as anomalies.
- The new lightweight model and the older model are combined to detect both new and existing attacks.
- When an event is detected as an anomaly, it is sent to the new lightweight model to check whether it is the new attack or just an anomaly.
- Experiments show that accuracy remains unchanged, but the process is 150 times faster.
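The combined deployment scheme can be sketched as follows; the stand-in models below are placeholders, not the actual RIPPER models from the experiments:

```python
# Two-stage deployment: the old model handles known classes and flags
# anomalies; only flagged events are passed to a lightweight model
# trained just on the new attack's data.
def old_model(event):
    # stand-in for the original model trained with artificial anomalies
    if event in {"normal", "known_attack"}:
        return event
    return "anomaly"

def new_lightweight_model(event):
    # stand-in for a small model trained only on the new attack
    return "new_attack" if event == "new_attack" else "anomaly"

def combined(event):
    label = old_model(event)
    return new_lightweight_model(event) if label == "anomaly" else label

print(combined("known_attack"))  # handled by the old model
print(combined("new_attack"))    # flagged, then confirmed by the new model
print(combined("weird_event"))   # flagged, remains "anomaly"
```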
20. Related Work (an incomplete list)
- Anomaly Detection
  - SRI's IDES uses probability distributions of past activities to measure the abnormality of host events; we measure network events.
  - Forrest et al. use the absence of subsequences to measure abnormality.
  - Lane and Brodley employ a similar approach, but use incremental learning to update stored sequences of UNIX shell commands.
  - Ghosh and Schwartzbard use a neural network to learn a profile of normality and a distance function to detect abnormality.
- Generating Artificial Data
  - Nigam et al. assign labels to unlabeled data using a classifier trained from labeled data.
  - Chang and Lippmann applied voice transformation techniques to add artificial training talkers and increase variability.
21. Summary and Future Work
- Proposed an artificial anomaly generation algorithm based on feature value distributions.
- Applied this algorithm to both pure anomaly detection and combined misuse and anomaly detection for intrusion detection.
- It remains to be seen whether the same approach works for other domains.
22. Distribution-Based Artificial Anomaly