Data Mining for Intrusion Detection

1
Data Mining for Intrusion Detection

Donghan Li
Alex Pivoshenko

2
Overview

Intrusion Detection
Data Mining for Intrusion Detection
Data Mining for Misuse Detection
Data Mining for Anomaly Detection
Current Research Projects
Commercial Products
References

3
Intrusion Detection

Intrusions
DoS (Denial of Service)
Probing / Scanning
Compromises
Trojan horses / Worms
Why we need Intrusion Detection

4
Intrusion Detection Systems

5
IDS Taxonomy

6
Traditional IDS

Signature-based
Limitations
Revising signature database
Emerging cyber threats
Latency in deployment

7
Key Technical Challenges

Large data size
High Dimensionality
Temporal nature of the data
Skewed class distribution
Data Preprocessing
High Performance Computing

8
Data Mining for Intrusion Detection

Goal: high detection rate and low false alarm rate
Data Mining tries to address limitations and challenges

9
Basic Steps in DM for ID

Convert data
Build Data Mining models
Analysis and summary

10
Feature Construction

Network traffic data is collected
Start time and duration
Protocol type
Source and Dest IP address and port, etc
KDDCup99 example
Content-based features
Time-based traffic features
Connection-based features
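
As a rough illustration of a time-based traffic feature, the sketch below counts, for each connection, how many earlier connections went to the same destination host within the preceding two seconds. The record layout, the function name `same_host_counts`, and the two-second window are assumptions made for illustration and are not taken from the slides.

```python
from collections import deque

def same_host_counts(connections, window=2.0):
    """For each (start_time, dst_ip) record, count earlier connections
    to the same dst_ip that started within the last `window` seconds.
    Assumes `connections` is sorted by start_time."""
    recent = deque()  # records that are still inside the time window
    counts = []
    for start, dst in connections:
        # Drop records that have fallen out of the time window.
        while recent and start - recent[0][0] > window:
            recent.popleft()
        counts.append(sum(1 for _, d in recent if d == dst))
        recent.append((start, dst))
    return counts

# Hypothetical connection records: (start time in seconds, destination IP).
conns = [(0.0, "10.0.0.5"), (0.5, "10.0.0.5"), (1.0, "10.0.0.9"), (3.1, "10.0.0.5")]
print(same_host_counts(conns))  # [0, 1, 0, 0]
```

A burst of connections to one host (for example during a scan or a DoS attempt) would show up as a large count in this feature.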

11
Data sets in Intrusion Detection

DARPA 1998
9 weeks of raw TCP dump data
Labeled connections
DARPA 1999
System call traces
Data set with virus files

12
Current ID Approaches

Misuse detection
Known intrusion patterns
Record patterns -> monitor event sequences -> report matched events
Anomaly detection
Deviation from normal pattern
Normal behavior profiles -> observe current activity -> report deviations

13
Data Mining for Misuse Detection

Rule based techniques
Tree based approaches
Association rules
Bayesian classifiers, genetic algorithms
Neural networks
Cost sensitive modeling

14
PN-rule Learning

P-phase
Cover positive examples with good support
Seek good recall

N-phase
Remove false positives from the examples covered in the P-phase
High accuracy and significant support

15
Boosting based algorithms

RareBoost
Updates the weights differently
SMOTEBoost
Combination of SMOTE (Synthetic Minority Oversampling Technique) and boosting
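
The oversampling half of SMOTEBoost can be sketched as follows: a synthetic minority example is placed at a random position on the line segment between a minority example and one of its nearest minority neighbours. This is a minimal sketch of the SMOTE step only (the boosting loop is not shown); the function name, the parameters `k` and `seed`, and the sample data are assumptions.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between a
    minority example and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical feature vectors of the rare (attack) class.
attacks = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(attacks, n_new=3)
print(new_points)
```

Because every synthetic point lies between two real minority examples, the new points stay inside the region already occupied by the rare class.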

16
CREDOS

First use ripple down rules to overfit the data
Ripple down rules are often used
Then prune to improve generalization
Different mechanism from decision trees

17
Neural Networks

For host-based intrusion detection
Build user profiles
Build profiles of software behavior
For network-based intrusion detection
Hierarchical network intrusion detection
Multi-layer perceptrons (MLP)

18
Cost Sensitive Modeling

Detection rate / False Alarm rate may be misleading
Cost factors: damage cost, response cost, operational cost
Costs for TP, FP, TN, FN
Define cumulative cost
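
A small sketch of why rates alone can mislead: the cumulative cost below sums a per-outcome cost over all events. The cost values and the two detectors are invented for illustration; in practice each cost would combine the damage, response, and operational factors listed above.

```python
def cumulative_cost(outcomes, cost):
    """Sum the per-event cost over a list of outcome labels
    ("TP", "FP", "TN", "FN")."""
    return sum(cost[o] for o in outcomes)

# Hypothetical per-outcome costs: a missed attack (FN) is far more
# expensive than responding to an alarm (TP or FP).
cost = {"TP": 5, "FP": 5, "TN": 0, "FN": 100}

# Two hypothetical detectors evaluated on 100 events each.
detector_a = ["TP"] * 9 + ["FN"] * 1 + ["FP"] * 30 + ["TN"] * 60
detector_b = ["TP"] * 7 + ["FN"] * 3 + ["FP"] * 2 + ["TN"] * 88

print(cumulative_cost(detector_a, cost))  # 295
print(cumulative_cost(detector_b, cost))  # 345
```

Under these assumed costs, detector A has a much higher false alarm rate than detector B but still incurs the lower cumulative cost.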

19
Anomaly Detection

Normal Behavior -> Deviations -> Anomalous Behavior
Major approaches
Outlier detection
Profiling
Others
Two categories
Supervised
Unsupervised

20
Sample Data

MINDS 01/26/03
48 hours after the Slammer worm

21
Outlier Detection Schemes

Detect intrusions (data points) that are very different from the normal activities (rest of the data points)
General Steps
Identify normal behavior
Construct useful set of features
Define similarity function
Use outlier detection algorithm
Statistics based
Distance based
Model based

22
Statistics Based Outlier Detection

Data points are modeled using a stochastic distribution
Points are determined to be outliers depending on their relationship with this model
Major approaches
Finite Mixtures
Using probability distribution
Information Theory measures
Problems
High dimensions -> difficult to estimate distributions

23
Statistics Based - Finite Mixture

Unsupervised Learning Algorithm
Data sources
Categorical (e.g. protocol, service)
Continuous (e.g. duration, src_bytes)
Construct FM model as a representation of the underlying mechanism of data generation
Assign score to new input based on how much the model has changed

24
Statistics Based - Probability Distributions

Supervised Learning Algorithm
Basic assumption for training data
# of normal elements >> # of anomalies
Construct Probability Distribution
M: majority distribution
A: anomalous distribution
D = (1 - c)M + cA
Measure the likelihood L(D) for real data
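
One way to write the likelihood for the mixture D = (1 - c)M + cA is sketched below, assuming every element is assigned to either the majority set or the anomaly set. The probability functions `p_m` and `p_a`, the value of c, and the data are hypothetical; the slides do not specify how M and A are modeled.

```python
import math

def log_likelihood(majority, anomalies, c, p_m, p_a):
    """log L(D) for D = (1 - c)M + cA, with every element assigned
    to either the majority set or the anomaly set."""
    ll = len(majority) * math.log(1 - c)
    ll += sum(math.log(p_m(x)) for x in majority)
    ll += len(anomalies) * math.log(c)
    ll += sum(math.log(p_a(x)) for x in anomalies)
    return ll

# Hypothetical probability functions over connection duration:
# M strongly favours short durations, A treats all values alike.
p_m = lambda x: 0.9 if x < 10 else 1e-6
p_a = lambda x: 0.05

data = [1, 2, 3, 500]
all_normal = log_likelihood(data, [], 0.01, p_m, p_a)
one_anomaly = log_likelihood(data[:3], data[3:], 0.01, p_m, p_a)
print(one_anomaly > all_normal)  # True
```

Here, treating the value 500 as an anomaly raises the likelihood of the data, which is the kind of evidence a detector of this type could use to flag it.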

25
Statistics Based - Information Theory

Supervised Learning Algorithm
Entropy
Measure uncertainty/impurity of data
Smaller when the class distribution is more skewed
Larger when data is partitioned into more regular subsets
Anomaly detector sets entropy threshold
Below threshold -> potential intrusion
Smaller threshold -> More accurate
Conditional entropy H(X|Y)
How much uncertainty remains in a sequence of events
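
The two measures can be computed directly from counts. The sketch below uses base-2 logarithms; the labels and the (next call, previous call) pairs are made-up examples, and pairing each event with its predecessor is only one possible way to apply H(X|Y) to a sequence.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def conditional_entropy(pairs):
    """H(X|Y) for a list of (x, y) pairs: the uncertainty left in x
    once y is known."""
    n = len(pairs)
    by_y = {}
    for x, y in pairs:
        by_y.setdefault(y, []).append(x)
    return sum(len(xs) / n * entropy(xs) for xs in by_y.values())

balanced = ["normal"] * 5 + ["attack"] * 5
skewed = ["normal"] * 9 + ["attack"] * 1
print(entropy(balanced), entropy(skewed))

# Hypothetical (next system call, previous system call) pairs.
pairs = [("read", "open"), ("read", "open"), ("write", "read"), ("close", "read")]
print(conditional_entropy(pairs))  # 0.5
```

The skewed label set has lower entropy than the balanced one, matching the first bullet of the slide.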

26
Distance Based Outlier Detection

Represent data as a vector of features
Major approaches
Nearest neighbor based
Density based
Clustering based
Problem
High dimensionality of data

27
Distance Based - Nearest Neighbor

Not enough neighbors -> Outliers
Compute distance d to the k-th nearest neighbor
Outlier points
Located in more sparse neighborhoods
Have d larger than a certain threshold
Mahalanobis-distance based approach
More appropriate for computing distance with skewed distributions
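
A brute-force sketch of the k-th nearest neighbor rule, using Euclidean distance (not the Mahalanobis variant). The function names, the sample points, and the values of k and the threshold are assumptions chosen for illustration.

```python
import math

def kth_nn_distance(points, k):
    """Distance from each point to its k-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def knn_outliers(points, k, threshold):
    """Points whose k-th nearest neighbour is farther than threshold."""
    distances = kth_nn_distance(points, k)
    return [p for p, d in zip(points, distances) if d > threshold]

# Four points in a tight group and one point far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
print(knn_outliers(points, k=2, threshold=3.0))  # [(8, 8)]
```

This version compares every pair of points, so it is only practical for small data sets.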

28
Distance Based - Density

Local Outlier Factor (LOF)
Average of the ratios of the density of example p and the density of its nearest neighbors
Compute density of local neighborhood for each point
Compute LOF
Larger LOF -> Outliers
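
The steps above can be sketched with a simplified density estimate. Note the substitution: the original LOF definition uses reachability distances, whereas this sketch takes density as the inverse of the mean distance to the k nearest neighbors. It also assumes all points are distinct, since duplicate points would give a zero distance.

```python
import math

def simple_lof(points, k):
    """Simplified LOF: density = 1 / mean distance to the k nearest
    neighbours; score = mean neighbour density / own density."""
    n = len(points)

    def neighbours(i):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        return order[:k]

    knn = [neighbours(i) for i in range(n)]
    density = [
        1.0 / (sum(math.dist(points[i], points[j]) for j in knn[i]) / k)
        for i in range(n)
    ]
    return [sum(density[j] for j in knn[i]) / k / density[i] for i in range(n)]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = simple_lof(points, k=2)
print(scores)
```

Points inside the tight group score 1 (their density matches that of their neighbors), while the isolated point scores far above 1.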

29
Distance Based - Clustering

Radius w of proximity is specified
Two points x1 and x2 are near if d(x1, x2) < w
Define N(x) as the number of points that are within w of x
Points in small clusters -> Outliers
Fixed-width clustering for speedup
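
A minimal single-pass sketch of fixed-width clustering, assuming the first point of each cluster serves as its fixed centre and that clusters below a chosen size are treated as outliers. The function names, `min_size`, and the sample data are illustrative assumptions.

```python
import math

def fixed_width_clusters(points, w):
    """Single pass: assign each point to the nearest existing centre
    within distance w, otherwise start a new cluster at that point."""
    centres, members = [], []
    for p in points:
        best = min(range(len(centres)),
                   key=lambda i: math.dist(p, centres[i]), default=None)
        if best is not None and math.dist(p, centres[best]) < w:
            members[best].append(p)
        else:
            centres.append(p)
            members.append([p])
    return members

def small_cluster_points(points, w, min_size):
    """Points that end up in clusters with fewer than min_size members."""
    return [p for cluster in fixed_width_clusters(points, w)
            if len(cluster) < min_size for p in cluster]

points = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (0.2, 0.2)]
print(small_cluster_points(points, w=1.0, min_size=2))  # [(10, 10)]
```

Each point is compared only against the current cluster centres instead of against every other point, which is where the speedup comes from.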

30
Distance Based - Clustering (cont.)

K-Nearest Neighbor + Canopy Clustering
Compute sum of distances to k nearest neighbors
Small K-NN sum -> point in dense region
Canopy clustering for speedup
WaveCluster
Transform data into multidimensional signals using wavelet transformation
Remove High/Low frequency parts
Remaining parts -> Outliers

31
Model Based Outlier Detection

Similar to Probabilistic Based schemes
Build prediction model for normal behavior
Deviation from model -> potential intrusion
Major approaches
Neural networks
Unsupervised Support Vector Machines (SVMs)

32
Model Based - Neural Networks

Use a replicator 4-layer feed-forward neural network (RNN)
Input variables are the target output during training
RNN forms a compressed model for training data
Outlyingness is measured by the reconstruction error

33
Model Based - SVMs

Attempt to separate the entire set of training data from the origin
Regions where most data lies are labeled as one class

Parameters
Expected outlier rates
Good for high quality controlled training data
Variance of Radial Basis Function (RBF)
Larger -> higher detection rate and more false alarms
Smaller -> lower detection rate and fewer false alarms

34
Profiling Schemes

Profiling methods are usually applied to host-based intrusion detection, where users, programs, etc. are profiled
Profiling sequences of Unix shell command lines
Profiling user behavior
Can also be used to profile alarms produced by other ID methods
Reduce false positives

35
Profiling - Temporal Sequence

Data
Sequence of Unix shell command lines
Set of sequences (user profiles) is reduced and filtered to only critical commands
Build Instance Based Learning (IBL) model that stores historic examples of normal data
Deviations -> Potential intrusions
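
An instance-based profile can be sketched as a set of fixed-length command windows from normal sessions, with new windows scored by their best match against the stored ones. The position-by-position match count used here is a deliberately simple stand-in for whatever similarity measure a real IBL system would use; the window length and the command data are assumptions.

```python
def windows(seq, n):
    """All contiguous windows of length n from a command sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def similarity(a, b):
    """Number of positions at which two equal-length windows agree."""
    return sum(x == y for x, y in zip(a, b))

def deviation_scores(profile, session, n=3):
    """For each window of the new session: n minus its best match
    against the stored normal windows (0 means seen before)."""
    stored = set(windows(profile, n))
    return [n - max(similarity(w, s) for s in stored)
            for w in windows(session, n)]

# Hypothetical normal history and a new session for the same user.
profile = ["ls", "cd", "vi", "make", "ls", "cd", "vi", "make"]
session = ["ls", "cd", "vi", "wget", "chmod", "sh"]
print(deviation_scores(profile, session))  # [0, 1, 2, 3]
```

The scores rise as the session drifts away from anything stored in the profile, and a threshold on them would mark the potential intrusion.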

36
Profiling - Neural Networks

Modeling the behavior of individual users
Data
Audit logs for each user for several days
Form distribution vector
How often user executes each command
Train NN with these vectors
Identify whether the user is regular or illegal for each new command distribution vector, i.e. for each new login session
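
The distribution vector that feeds the network can be sketched as the relative frequency of each command in a session. The vocabulary and the session below are hypothetical, the session is assumed to be non-empty, and commands outside the vocabulary are ignored in the vector; the neural network itself is not shown.

```python
from collections import Counter

def command_distribution(commands, vocabulary):
    """Relative frequency of each vocabulary command in a session."""
    counts = Counter(commands)
    total = len(commands)
    return [counts[c] / total for c in vocabulary]

vocab = ["ls", "cd", "vi", "gcc"]
session = ["ls", "cd", "ls", "vi", "ls", "gcc", "cd", "ls"]
print(command_distribution(session, vocab))  # [0.5, 0.25, 0.125, 0.125]
```

One such vector per login session would serve as a training or test input for the per-user network.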

37
Profiling - NNs (cont.)

Similar techniques can be applied to profiling software behavior in a system
Data: sequence of relevant system calls
Sum of NN output over a certain threshold indicates potential malicious software
Multi-level NN architecture
Feature detection modules can be combined to meet specific IDS needs

38
Profiling - Mining Alarms

Unusual but legitimate behaviors may trigger alarms -> false positives
Over time, false alarms can be modeled using IBL, association rules, among other DM techniques
Can be used to improve the performance of an IDS by reducing false alarms

39
Alternative Approaches

Artificial Anomalies Generation
For sparse regions of data, generate more artificial anomalies than for the dense data regions
Filter artificial anomalies to avoid collision with known instances
Use rule discovery systems (e.g. RIPPER) to form anomaly signatures

40
Current Research Projects

ADAM (Audit Data Analysis and Mining) - GMU
MADAM ID (Mining Audit Data for Automated Models for Intrusion Detection) - Columbia, GT, Florida Tech.
MINDS - Univ. of Minnesota
IIDS (Intelligent Intrusion Detection) - Mississippi State
DM for Network Intrusion Detection - MITRE Corp.
Agent based DM system - Iowa State
IDDM - Dept. of Defense, Australia

41
Commercial IDS

Misuse detection based
SNORT (open source NIDS based on signatures)
Network Flight Recorder (NFR, detect known attacks)
NetRanger (CISCO, traffic analyzer)
Shadow (collect audit data and run tcpdump filters)
P-Best (SRI, rule-based expert system)
NetStat (UCSB, real time IDS using state transition analysis)

42
Commercial IDS (cont.)

Anomaly detection based
IDES, NIDES (statistical)
EMERALD (statistical)
SPADE (Statistical Packet Anomaly Detection Engine) within SNORT
Computer Watch (AT&T, expert system)
Wisdom Sense (rule based)

43
References

D. Barbara, et al., ADAM: A Testbed for Exploring the Use of Data Mining in Intrusion Detection, SIGMOD Record 2001
M. Joshi, et al., PNrule, Mining Needles in a Haystack: Classifying Rare Classes via Two-Phase Rule Induction, ACM SIGMOD 2001
M. Joshi, et al., Predicting Rare Classes: Can Boosting Make Any Weak Learner Strong?, ACM SIGKDD 2002
M. Joshi, V. Kumar, CREDOS: Classification using Ripple Down Structure, ICDE 2003
K. Yamanishi, On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms, KDD 2000
W. Lee, et al., Information-Theoretic Measures for Anomaly Detection, IEEE Symposium on Security and Privacy 2001

44
References (cont.)

E. Eskin, Anomaly Detection over Noisy Data using Learned Probability Distributions, ICML 2000
S. Ramaswamy, R. Rastogi, K. Shim, Efficient Algorithms for Mining Outliers from Large Data Sets, ACM SIGMOD 2000
A. Lazarevic, et al., A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection, SIAM 2003
E. Eskin, et al., A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data, 2002
S. Hawkins, et al., Outlier detection using replicator neural networks, DaWaK 2002
A. Lazarevic, et al., DM for Intrusion Detection, Tutorial at the Pacific-Asia Conference on KDD 2003

45
Intrusion Detection Links

http://www.cs.umn.edu/aleks/intrusion_detection.html
http://www.cc.gatech.edu/wenke/ids-readings.html
http://www.cerias.purdue.edu/coast/intrusion-detection/welcome.html
http://www.cs.ucsb.edu/rsg/STAT/links.html
http://cnscenter.future.co.kr/security/ids.html
http://www.cs.purdue.edu/homes/clifton/cs590m/
http://dmoz.org/Computers/Security/Intrusion_Detection_Systems/
http://www.networkice.com/Advice/Countermeasures/Intrusion_Detection/default.htm
http://www.infosyssec.net/infosyssec/intdet1.htm