1. Using Artificial Anomalies to Detect Known and Unknown Network Intrusions
Wei Fan (IBM Research), Matt Miller and Sal Stolfo (Columbia University), Wenke Lee (Georgia Tech), Philip Chan (Florida Tech)
December 1, 2001
2. Anomaly Detection and Classification
- Differences
  - A classification system builds models to detect repeated patterns of known event types.
  - Anomaly detection tracks inconsistencies that deviate from the "known" and "expected".
- Example: Intrusion Detection Systems
  - Misuse detection detects known intrusion types.
  - Anomaly detection detects network events different from normal events and known intrusions. Anomalies are likely to be newly launched intrusions.
- Training Data
  - Classification: clearly labeled examples.
  - Anomaly detection: no labeled anomalous data. Otherwise, they would not be anomalies.
3. Problem
- Problem: how can inductive learning be used for anomaly detection?
  - A wide range of inductive learners is available.
  - They produce comprehensible models.
- Solution: compute artificial anomaly data from the classification training data, converting anomaly detection into classification.
  - All artificial anomalies are assigned the label "anomaly".
  - For example, use normal and known intrusion data to compute artificial anomalies.
4. Some Observations on Inductive Learning Algorithms
- Inductive learners only discover boundaries that separate data with different given labels.
- Data of unknown types will always be misclassified as one of the given classes.
- Example
  - How do we distinguish a bear from a cat?
  - An inductive model might be: if Weight(x) < 5 lb, x is a cat; otherwise, it is a bear.
  - However, if x is a horse, the model will mistakenly predict it to be a bear.
  - The ideal answer would be "I don't know. It is neither bear nor cat."
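The bear/cat example can be sketched in a few lines (the rule and weight threshold are from the slide; the function name is ours):

```python
# Illustrative sketch: an inductive model trained only on bears and cats
# is forced to assign every input, including an unseen class (a horse),
# to one of the two known labels.
def classify(weight_lb):
    """Rule learned from bear/cat training data only."""
    return "cat" if weight_lb < 5 else "bear"

print(classify(4))    # a 4 lb animal -> "cat"
print(classify(900))  # a 900 lb horse is misclassified as "bear"
```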
5. Solution Summary
- Generate artificial anomalies labeled "anomaly" to delineate the boundary between known and unknown.
- How to generate artificial anomalies? Compute examples that are close to, but different from, those with given labels.
- Where to place the artificial anomalies?
  - Put more artificial anomalies around infrequent examples or sparse regions of the training data.
6. Digging into the Dataset
- Assumption: the boundary between known and unknown is close to the known data.
- Randomly change the value of one feature of a given datum while leaving the other features unaltered.
- Concentrate on areas of the training data that are "sparse".
  - Sparse regions are characterized by infrequent feature values.
- Example
  - The panda is a very "sparse" bear within the bear class.
  - A panda has a white body, black eye shades, and black legs.
- Generate more artificial anomalies around sparse regions.
  - Something with a white body, white eye shades, black legs, and a weight above 200 lb is an anomaly.
- Based on the frequency of feature values, we compensate for sparse regions by filling in more artificial anomalies.
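A minimal sketch of the one-feature perturbation described above, assuming categorical features stored in dicts (all names here are illustrative, not from the original system):

```python
import random

# Copy a known example, replace a single feature with a different value
# from that feature's domain, and label the result "anomaly"; all other
# features are left unchanged, so the anomaly stays close to known data.
def perturb(example, feature, domain, rng=random.Random(0)):
    other = [v for v in domain if v != example[feature]]
    mutated = dict(example)
    mutated[feature] = rng.choice(other)
    mutated["label"] = "anomaly"
    return mutated

panda = {"body": "white", "eye_shades": "black", "legs": "black", "label": "bear"}
art = perturb(panda, "eye_shades", ["black", "white", "brown"])
print(art)  # same as the panda except one changed feature, labeled "anomaly"
```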
7. Overall Effect
- Sparse regions receive focus, and very specific rules are generated to cover them.
- For example:
  - IF white body, black eye shades, black legs, and weight > 200 lb, THEN it is a (panda) bear
  - ELSE try other rules
  - ELSE, if none of the rules are satisfied, predict "We don't know what it is based on our limited knowledge"
- Without the artificial anomalies, an animal with "white body, white eye shades and black legs" would, at best, be misclassified as a bear.
8. Distribution-Based Artificial Anomaly Generation Algorithm
- Iterate through every feature value:
  - fmax is the most frequent value of feature F; count(fmax) is its frequency count.
  - fi is another value of the same feature F; count(fi) is its frequency count.
  - countdiff = count(fmax) - count(fi).
  - Generate countdiff artificial anomalies for feature value fi:
    - Take a datum whose feature F has value fi; change fi to any value that is not fi, while leaving all other features unchanged.
    - Change its label to "anomaly".
9. Applications of Artificial Anomalies
- Pure Anomaly Detection
  - Training data contain only one class, such as normal.
  - Detect any data that differ from the given single class.
  - Artificial anomalies are computed from this single class.
- Combined Misuse and Anomaly Detection
  - Classification and anomaly detection are performed at the same time.
  - For example, detect bear, cat, and non-bear-non-cat at the same time.
  - Efficient, since both classification and anomaly detection are done at the same time in one single module.
  - Efficient model deployment.
10. Experiment on Pure Anomaly Detection
- Measurements
  - False alarm rate: predicted anomalies that are actually normal.
  - Detection rate: true anomalies correctly detected.
- Original Dataset
  - 1998 DARPA Intrusion Detection Evaluation Dataset (also the 1999 KDDCUP dataset).
  - The original dataset contains both normal data and intrusion data.
  - There are 4 basic types of intrusions; each type has a few subclasses.
    - U2R: User to Root
    - R2L: Remote to Local
    - DOS: Denial of Service
    - PRB: Probing
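The two measurements can be sketched as follows (function and variable names are ours):

```python
# False alarm rate: fraction of normal events predicted "anomaly".
# Detection rate:   fraction of true anomalies predicted "anomaly".
def false_alarm_rate(y_true, y_pred):
    normal = [p for t, p in zip(y_true, y_pred) if t == "normal"]
    return sum(p == "anomaly" for p in normal) / len(normal)

def detection_rate(y_true, y_pred):
    anom = [p for t, p in zip(y_true, y_pred) if t == "anomaly"]
    return sum(p == "anomaly" for p in anom) / len(anom)

y_true = ["normal", "normal", "anomaly", "anomaly"]
y_pred = ["normal", "anomaly", "anomaly", "normal"]
print(false_alarm_rate(y_true, y_pred))  # 0.5
print(detection_rate(y_true, y_pred))    # 0.5
```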
11. Intrusions and Categories
12. Experiment Setup
- Training Set: normal data and artificial anomalies computed from the normal data. No intrusions are included.
- Test Set: both normal and all intrusion data.
- Goal: can we detect all intrusion data as "anomalies" without having them in the training data?
- Learner
  - RIPPER, an inductive rule learner.
13. Pure Anomaly Detection Result
- False alarm rate: 2%.
- Anomaly detection rate: (per-category results shown as a chart in the original slides).
14. Experiment on Combined Misuse and Anomaly Detection
- One single module detects both known intrusions and unknown events that are neither normal nor known intrusions.
- Group the different types of intrusions into 13 clusters.
  - Similar intrusions are grouped together.
  - Knowledge of intrusions in one cluster may not help detect intrusions in another cluster.
15. Experiment Setup
- Training Set
  - Normal data plus a few clusters of intrusions, PLUS artificial anomalies.
- Test Set
  - All data: normal and all intrusions.
- Goals
  - Can we detect unseen intrusions (excluded intrusion clusters) as anomalies?
  - Do we have to compromise performance on known intrusions (included intrusion clusters)?
- There are 13! ways to order the clusters when forming training and test sets.
  - We use 3 unique sequences to introduce intrusion clusters, adding one cluster at a time.
  - Training: normal data plus clusters 1 to i.
  - Testing: normal data and all types of intrusions.
16. Measurements
- True Class Detection Rate
- Anomaly Detection Rate
17. True Class Detection Rate
- Intrusion i correctly detected as intrusion i.
18. Anomaly Detection Rate
- Percentage of anomalies or unknown intrusions correctly detected as "anomaly".
19. Efficient Model Deployment (summary)
- Efficient learning and deployment of models to detect new attacks.
- When data about new attacks are collected, we do not want to retrain the model on all intrusions from scratch.
- Instead, we train only a lightweight model to detect the new attacks.
- Thanks to artificial anomalies, the older model can already flag such events as anomalies.
- The new lightweight model and the older model are combined to detect both new and existing attacks.
- When an event is detected as an anomaly, it is sent to the new lightweight model to check whether it is the new attack or just an anomaly.
- Experiments show that accuracy remains unchanged, but the process is 150 times faster.
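The combined deployment scheme can be sketched as follows; the stand-in models below are placeholders, not the actual RIPPER models from the experiments:

```python
# Two-stage deployment: the old model handles known classes and flags
# anomalies; only flagged events are passed to a lightweight model
# trained just on the new attack's data.
def old_model(event):
    # stand-in for the original model trained with artificial anomalies
    if event in {"normal", "known_attack"}:
        return event
    return "anomaly"

def new_lightweight_model(event):
    # stand-in for a small model trained only on the new attack
    return "new_attack" if event == "new_attack" else "anomaly"

def combined(event):
    label = old_model(event)
    return new_lightweight_model(event) if label == "anomaly" else label

print(combined("known_attack"))  # handled by the old model
print(combined("new_attack"))    # flagged, then confirmed by the new model
print(combined("weird_event"))   # flagged, remains "anomaly"
```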
20. Related Work (an incomplete list)
- Anomaly Detection
  - SRI's IDES uses probability distributions of past activities to measure the abnormality of host events; we measure network events.
  - Forrest et al. use the absence of subsequences to measure abnormality.
  - Lane and Brodley employ a similar approach, but use incremental learning to update stored sequences of UNIX shell commands.
  - Ghosh and Schwartzbard use a neural network to learn a profile of normality and a distance function to detect abnormality.
- Generating Artificial Data
  - Nigam et al. assign labels to unlabeled data using a classifier trained from labeled data.
  - Chang and Lippmann applied voice transformation techniques to add artificial training talkers and increase variability.
21. Summary and Future Work
- Proposed an artificial anomaly generation algorithm based on feature value distributions.
- Applied this algorithm to both pure anomaly detection and combined misuse and anomaly detection for intrusion detection.
- It remains to be seen whether the same approach works for other domains.
22. Distribution-Based Artificial Anomaly