Title: Data Mining for Intrusion Detection
- Donghan Li
- Alex Pivoshenko
Overview
- Intrusion Detection
- Data Mining for Intrusion Detection
- Data Mining for Misuse Detection
- Data Mining for Anomaly Detection
- Current Research Projects
- Commercial Products
- References
Intrusion Detection
- Intrusions
- DoS (Denial of Service)
- Probing / Scanning
- Compromises
- Trojan horses / Worms
- Why we need Intrusion Detection
Intrusion Detection Systems
IDS Taxonomy
Traditional IDS
- Signature-based
- Limitations
- Revising signature database
- Emerging cyber threats
- Latency in deployment
Key Technical Challenges
- Large data size
- High Dimensionality
- Temporal nature of the data
- Skewed class distribution
- Data Preprocessing
- High Performance Computing
Data Mining for Intrusion Detection
- Goal: high detection rate and low false alarm rate
- Data mining tries to address the limitations of traditional IDS and the key technical challenges
Basic Steps in DM for ID
- Convert data
- Build Data Mining models
- Analysis and summary
Feature Construction
- Network traffic data is collected
- Start time and duration
- Protocol type
- Source and destination IP addresses and ports, etc.
- KDDCup99 example
- Content-based features
- Time-based traffic features
- Connection-based features
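To make the time-based traffic features concrete, here is a minimal sketch of two of them in the style of KDDCup99's "count" and "srv_count": for each connection, the number of connections in the past 2 seconds to the same destination host and to the same service. The `Connection` record, the 2-second default, and counting the current connection itself are assumptions for illustration, not the KDDCup99 extraction code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Connection:
    start_time: float  # seconds since the start of the capture
    protocol: str      # e.g. "tcp"
    service: str       # e.g. "http"
    dst_host: str      # destination IP address


def time_based_features(conns: List[Connection], window: float = 2.0) -> List[Tuple[int, int]]:
    """For each connection (in start-time order) return (count, srv_count):
    connections in the last `window` seconds, including the current one, that
    go to the same destination host / use the same service."""
    conns = sorted(conns, key=lambda c: c.start_time)
    features = []
    for i, cur in enumerate(conns):
        count = srv_count = 0
        j = i
        # walk backwards while the earlier connection is still inside the window
        while j >= 0 and cur.start_time - conns[j].start_time <= window:
            if conns[j].dst_host == cur.dst_host:
                count += 1
            if conns[j].service == cur.service:
                srv_count += 1
            j -= 1
        features.append((count, srv_count))
    return features
```

A burst of connections to one host or service (as in a DoS or scan) shows up as unusually large counts for the connections inside the burst.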
Data Sets in Intrusion Detection
- DARPA 1998
- 9 weeks of raw TCP dump data
- Labeled connections
- DARPA 1999
- System call traces
- Data set with virus files
Current ID Approaches
- Misuse detection
- Known intrusion patterns
- Record patterns -> monitor event sequences -> report matched events
- Anomaly detection
- Deviation from normal pattern
- Normal behavior profiles -> observe current activity -> report deviations
Data Mining for Misuse Detection
- Rule based techniques
- Tree based approaches
- Association rules
- Bayesian classifiers, genetic algorithms
- Neural networks
- Cost sensitive modeling
PN-rule Learning
- P-phase
- Positive examples with good support
- Seek good recall
- N-phase
- Remove false positives (FP) from the examples covered in the P-phase
- High accuracy and significant support
Boosting based algorithms
- RareBoost
- Updates the weights of positive and negative predictions differently
- SMOTEBoost
- Combination of SMOTE (Synthetic Minority
Oversampling Technique) and boosting
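The SMOTE half of SMOTEBoost creates new minority-class (intrusion) examples by interpolating between a minority example and one of its nearest minority-class neighbors. A minimal sketch of that step only, assuming numeric feature vectors and Euclidean distance; the boosting loop that SMOTEBoost wraps around it is not shown, and the parameter defaults are illustrative.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate synthetic minority examples on the line segments between a
    minority example and one of its k nearest minority-class neighbors.
    Requires at least two minority examples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbors of x among the other minority examples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic
```

Interpolating (rather than duplicating) spreads the minority class into a region instead of piling weight on identical points, which is the motivation for pairing it with boosting.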
CREDOS
- First, use ripple down rules to overfit the data
- Ripple down rules are often used
- Then prune to improve generalization
- Different mechanism from decision trees
Neural Networks
- For host-based intrusion detection
- Build user profiles
- Build profiles of software behavior
- For network-based intrusion detection
- Hierarchical network intrusion detection
- Multi-layer perceptrons (MLP)
Cost Sensitive Modeling
- Detection rate / false alarm rate may be misleading
- Cost factors: damage cost, response cost, operational cost
- Costs for TP, FP, TN, FN
- Define cumulative cost
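A cumulative cost simply adds up the cost of every outcome a detector produces. A minimal sketch, assuming one cost value per outcome type (TP, FP, TN, FN); in a real cost model these values would be derived from the damage, response, and operational costs, and the numbers in the usage note are hypothetical.

```python
from typing import Dict, Sequence


def cumulative_cost(y_true: Sequence[bool], y_pred: Sequence[bool],
                    costs: Dict[str, float]) -> float:
    """Sum the cost of every prediction; True means 'intrusion'.
    `costs` maps each outcome ('TP', 'FP', 'TN', 'FN') to a cost."""
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if predicted:
            outcome = "TP" if actual else "FP"
        else:
            outcome = "FN" if actual else "TN"
        total += costs[outcome]
    return total
```

For example, with hypothetical costs `{"TP": 1, "FP": 2, "TN": 0, "FN": 10}`, two detectors with the same detection rate can still differ sharply in total cost if one raises many more false alarms.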
Anomaly Detection
- Normal behavior -> deviations -> anomalous behavior
- Major approaches
- Outlier detection
- Profiling
- Others
- Two categories
- Supervised
- Unsupervised
Sample Data
- MINDS 01/26/03
- 48 hours after the Slammer worm
Outlier Detection Schemes
- Detect intrusions (data points) that are very different from the normal activities (the rest of the data points)
- General steps
- Identify normal behavior
- Construct useful set of features
- Define similarity function
- Use outlier detection algorithm
- Statistics based
- Distance based
- Model based
Statistics Based Outlier Detection
- Data points are modeled using a stochastic distribution
- Whether a point is an outlier depends on its relationship with this model
- Major approaches
- Finite mixtures
- Using probability distributions
- Information theory measures
- Problems
- High dimensions -> difficult to estimate distributions
Statistics Based - Finite Mixture
- Unsupervised learning algorithm
- Data sources
- Categorical (e.g., protocol, service)
- Continuous (e.g., duration, src_bytes)
- Construct a finite mixture (FM) model as a representation of the underlying mechanism of data generation
- Assign a score to each new input based on how much the model changes after learning it
Statistics Based - Probability Distributions
- Supervised learning algorithm
- Basic assumption for training data
- Number of normal elements >> number of anomalies
- Construct probability distribution
- M: majority distribution
- A: anomalous distribution
- D = (1 - c)M + cA, where c is the (small) prior probability of an anomalous element
- Measure the likelihood L(D) for real data
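A minimal sketch of the likelihood test behind D = (1 - c)M + cA, in the style of Eskin (ICML 2000, in the references): every element starts in M; each element is tentatively moved to A and kept there only if the log-likelihood of the data rises by more than a threshold. Modeling M as a one-dimensional Gaussian refitted to the current majority set and A as a uniform density over a known value range are assumptions made to keep the example short, and the values of `c` and `threshold` are illustrative.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def log_likelihood(M: Sequence[float], A: Sequence[float], c: float, a_density: float) -> float:
    """Log-likelihood of D = (1 - c)M + cA, with M modeled as a Gaussian fitted to
    the current majority set and A as a uniform density (both assumptions)."""
    ll = 0.0
    if M:
        mu = mean(M)
        sigma = max(pstdev(M), 1e-6)  # avoid a zero-width Gaussian
        ll += len(M) * math.log(1 - c)
        ll += sum(-0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
                  for x in M)
    ll += len(A) * (math.log(c) + math.log(a_density))
    return ll


def detect_anomalies(data: Sequence[float], value_range: Tuple[float, float],
                     c: float = 0.05, threshold: float = 5.0) -> List[float]:
    """Return the elements that end up in the anomalous set A.
    All values are assumed to lie inside value_range."""
    lo, hi = value_range
    a_density = 1.0 / (hi - lo)  # uniform anomalous distribution over the range
    M, A = list(data), []
    for x in data:
        before = log_likelihood(M, A, c, a_density)
        M_new = list(M)
        M_new.remove(x)
        after = log_likelihood(M_new, A + [x], c, a_density)
        # keep x in A only if moving it there raises the likelihood enough
        if after - before > threshold:
            M, A = M_new, A + [x]
    return A
```

Because M is re-estimated without the candidate point, a far-away value both tightens the fitted Gaussian when removed and costs little under the flat anomaly density, so its move to A produces a large likelihood gain.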
Statistics Based - Information Theory
- Supervised learning algorithm
- Entropy
- Measures uncertainty/impurity of data
- Smaller when the class distribution is more skewed
- Smaller when data is partitioned into more regular subsets
- Anomaly detector sets entropy threshold
- Below threshold -> potential intrusion
- Smaller threshold -> more accurate
- Conditional entropy H(X|Y)
- How much uncertainty remains in a sequence of events X once Y (e.g., the preceding events) is known
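Both measures can be computed directly from event counts. A minimal sketch, assuming events are hashable tokens (for example system call names) and using base-2 logarithms; in the spirit of Lee et al. (in the references), H(X|Y) is computed here for the next event X given the preceding k events Y, where k is a free parameter chosen for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def entropy(items: Sequence[Hashable]) -> float:
    """H(X) = -sum_x P(x) log2 P(x), estimated from counts."""
    counts = Counter(items)
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(pairs: Sequence[Tuple[Hashable, Hashable]]) -> float:
    """H(X|Y) = -sum_{x,y} P(x,y) log2 P(x|y), with pairs given as (x, y)."""
    joint = Counter(pairs)
    y_counts = Counter(y for _, y in pairs)
    n = len(pairs)
    return -sum((c / n) * math.log2(c / y_counts[y]) for (x, y), c in joint.items())


def sequence_conditional_entropy(events: Sequence[Hashable], k: int) -> float:
    """Uncertainty of the next event given the preceding k events (sliding window)."""
    pairs = [(events[i + k], tuple(events[i:i + k])) for i in range(len(events) - k)]
    return conditional_entropy(pairs)
```

A value near 0 means the next event is almost fully determined by its context, i.e. the audit data is very regular, which is what makes a small deviation easy to flag.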
Distance Based Outlier Detection
- Represent data as a vector of features
- Major approaches
- Nearest neighbor based
- Density based
- Clustering based
- Problem
- High dimensionality of data
Distance Based - Nearest Neighbor
- Not enough neighbors -> outliers
- Compute distance d to the k-th nearest neighbor
- Outlier points
- Are located in sparser neighborhoods
- Have d larger than a certain threshold
- Mahalanobis-distance based approach
- More appropriate for computing distances with skewed distributions
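The k-th nearest neighbor score can be sketched in a few lines: compute each point's distance to its k-th nearest neighbor and report points whose distance exceeds a threshold. This assumes Euclidean distance and a brute-force O(n²) search; the Mahalanobis variant and the faster algorithms from the literature are not shown, and the threshold is an illustrative parameter.

```python
import math
from typing import List, Sequence


def kth_nn_distance(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Distance from each point to its k-th nearest neighbor (k < len(points))."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def knn_outliers(points: Sequence[Sequence[float]], k: int, threshold: float) -> List[int]:
    """Indices of points whose k-th nearest neighbor is farther than threshold."""
    return [i for i, d in enumerate(kth_nn_distance(points, k)) if d > threshold]
```

Ranking points by this score and reporting the top n, instead of using a fixed threshold, is a common alternative when a good threshold is hard to choose.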
Distance Based - Density
- Local Outlier Factor (LOF)
- Average of the ratios of the local density of each of p's nearest neighbors to the local density of example p
- Compute the density of the local neighborhood for each point
- Compute LOF
- Larger LOF -> outliers
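A brute-force sketch of LOF as defined by Breunig et al.: the local reachability density (lrd) of a point is the inverse of its average reachability distance to its k nearest neighbors, and LOF averages the neighbors' lrd divided by the point's own lrd. Euclidean distance and the absence of exact duplicate points (which would make a density infinite) are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def lof(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Local Outlier Factor for every point (brute force, O(n^2))."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    # k-distance and k-neighborhood (ties at the k-distance are included)
    k_distance, neighbors = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_distance.append(kd)
        neighbors.append([j for d, j in others if d <= kd])

    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_distance[o], dist[i][o]) for o in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average ratio of the neighbors' density to the point's own density
    return [sum(lrd[o] for o in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]
```

Points inside a uniformly dense cluster get LOF close to 1; a point much sparser than its neighbors gets a value well above 1.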
Distance Based - Clustering
- A radius w of proximity is specified
- Two points x1 and x2 are near if d(x1, x2) < w
- Define N(x) as the number of points that are within w of x
- Points in small clusters -> outliers
- Fixed-width clustering for speedup
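Fixed-width clustering approximates N(x) in a single pass: each point joins the nearest existing cluster whose center is closer than w, otherwise it starts a new cluster and becomes that cluster's center. A minimal sketch, assuming Euclidean distance and fixed (never recomputed) centers; `min_size` is an illustrative parameter for what counts as a "small" cluster.

```python
import math
from typing import List, Sequence


def fixed_width_clustering(points: Sequence[Sequence[float]], w: float) -> List[List[int]]:
    """Single pass: assign each point to the nearest cluster whose center is within
    distance < w; otherwise the point starts a new cluster as its center."""
    centers: List[Sequence[float]] = []
    members: List[List[int]] = []
    for idx, p in enumerate(points):
        best, best_d = None, w
        for c, center in enumerate(centers):
            d = math.dist(p, center)
            if d < best_d:
                best, best_d = c, d
        if best is None:
            centers.append(p)
            members.append([idx])
        else:
            members[best].append(idx)
    return members


def small_cluster_outliers(points: Sequence[Sequence[float]], w: float, min_size: int) -> List[int]:
    """Indices of points that fall into clusters with fewer than min_size members."""
    clusters = fixed_width_clustering(points, w)
    return sorted(i for m in clusters if len(m) < min_size for i in m)
```

The speedup comes from comparing each point only against cluster centers rather than against every other point; the price is that the result depends on the order of the input.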
Distance Based - Clustering (cont.)
- K-Nearest Neighbor and Canopy Clustering
- Compute the sum of distances to the k nearest neighbors
- Small K-NN sum -> point in dense region
- Canopy clustering for speedup
- WaveCluster
- Transform data into multidimensional signals using wavelet transformation
- Remove high/low frequency parts
- Remaining parts -> outliers
Model Based Outlier Detection
- Similar to probabilistic based schemes
- Build prediction model for normal behavior
- Deviation from model -> potential intrusion
- Major approaches
- Neural networks
- Unsupervised Support Vector Machines (SVMs)
Model Based - Neural Networks
- Use a replicator 4-layer feed-forward neural network (RNN)
- Input variables are the target outputs during training
- The RNN forms a compressed model of the training data
- Outlyingness is measured by the reconstruction error
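As we understand the replicator approach of Hawkins et al. (in the references), each record's outlier factor is its mean squared reconstruction error. Writing x_{ij} for the j-th of n input features of record i and o_{ij} for the corresponding RNN output:

$$
OF_i = \frac{1}{n}\sum_{j=1}^{n}\left(x_{ij} - o_{ij}\right)^2
$$

Records that the network, trained only to reproduce normal data, reconstructs poorly receive a large OF_i and are reported as potential intrusions.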
Model Based - SVMs
- Attempt to separate the entire set of training data from the origin
- Regions where most data lies are labeled as one class
- Parameters
- Expected outlier rates
- Good for high quality controlled training data
- Variance of Radial Basis Function (RBF)
- Larger -> higher detection rate and more false alarms
- Smaller -> lower detection rate and fewer false alarms
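The two parameters above map naturally onto the standard ν-formulation of the one-class SVM (Schölkopf et al.); the slide does not name a formulation, so reading it as this one is an assumption. With feature map Φ, training points x_1, ..., x_ℓ and slack variables ξ_i:

$$
\min_{w,\,\xi,\,\rho}\; \frac{1}{2}\lVert w\rVert^2 + \frac{1}{\nu\ell}\sum_{i=1}^{\ell}\xi_i - \rho
\quad \text{s.t.} \quad \langle w, \Phi(x_i)\rangle \ge \rho - \xi_i,\;\; \xi_i \ge 0
$$

$$
f(x) = \operatorname{sgn}\Big(\sum_{i=1}^{\ell}\alpha_i\, k(x_i, x) - \rho\Big),
\qquad k(x, y) = \exp\!\Big(-\frac{\lVert x - y\rVert^2}{2\sigma^2}\Big)
$$

Here ν is an upper bound on the fraction of training points treated as outliers (the expected outlier rate), σ² is the RBF variance, and points with f(x) = -1 are reported as anomalies.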
Profiling Schemes
- Profiling methods are usually applied to host-based intrusion detection, where users, programs, etc. are profiled
- Profiling sequences of Unix shell command lines
- Profiling users' behavior
- Can also be used to profile alarms produced by other ID methods
- Reduce false positives
Profiling - Temporal Sequence
- Data
- Sequence of Unix shell command lines
- Set of sequences (user profiles) is reduced and filtered to only critical commands
- Build an Instance Based Learning (IBL) model that stores historic examples of normal data
- Deviations -> potential intrusions
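The slide does not specify how an IBL model compares a new command sequence with the stored examples. The sketch below assumes a similarity measure in the spirit of Lane and Brodley's instance-based work on shell command sequences, which rewards runs of adjacent matching commands; the profile contents and the threshold are hypothetical.

```python
from typing import Sequence


def lane_brodley_similarity(x: Sequence[str], y: Sequence[str]) -> int:
    """Position-wise similarity: each match adds 1 + the length of the current
    run of consecutive matches, so long aligned runs score much higher."""
    score = run = 0
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        score += run
    return score


def profile_similarity(seq: Sequence[str], profile: Sequence[Sequence[str]]) -> int:
    """Similarity to the closest stored (historic, normal) sequence."""
    return max(lane_brodley_similarity(seq, s) for s in profile)


def is_anomalous(seq: Sequence[str], profile: Sequence[Sequence[str]], threshold: int) -> bool:
    """Flag sequences that are not similar enough to any stored example."""
    return profile_similarity(seq, profile) < threshold
```

Usage: with a stored profile such as `[["ls", "cd", "ls", "vi"]]`, the sequence `["ls", "cd", "ls", "vi"]` scores 10, while an unrelated sequence like `["rm", "chmod", "wget", "sh"]` scores 0 and would be flagged for any positive threshold.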
Profiling - Neural Networks
- Modeling the behavior of individual users
- Data
- Audit logs for each user for several days
- Form distribution vector
- How often the user executes each command
- Train NN with these vectors
- Identify whether the user is regular or illegitimate for each new command distribution vector, i.e., for each new login session
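The distribution vector that feeds the network can be sketched as the relative frequency of each command in one login session over a fixed command vocabulary. This shows only the input construction, not the neural network itself; the vocabulary is a hypothetical example, and commands outside it still count toward the session total.

```python
from collections import Counter
from typing import List, Sequence


def command_distribution(session_commands: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Fraction of the session's commands that are each vocabulary command."""
    counts = Counter(session_commands)
    total = len(session_commands)
    return [counts[cmd] / total if total else 0.0 for cmd in vocabulary]
```

Fixing the vocabulary order keeps every session's vector the same length and aligned, so the vectors can be used directly as NN inputs.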
Profiling - NNs (cont.)
- Similar techniques can be applied to profiling software behavior in a system
- Data: sequences of relevant system calls
- Sum of NN outputs over a certain threshold indicates potential malicious software
- Multi-level NN architecture
- Feature detection modules can be combined to meet specific IDS needs
Profiling - Mining Alarms
- Unusual but legitimate behaviors may trigger alarms -> false positives
- Over time, false alarms can be modeled using IBL, association rules, and other DM techniques
- Can be used to improve the performance of an IDS by reducing false alarms
Alternative Approaches
- Artificial Anomalies Generation
- For sparse regions of data, generate more artificial anomalies than for dense data regions
- Filter artificial anomalies to avoid collisions with known instances
- Use rule discovery systems (e.g., RIPPER) to form anomaly signatures
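A minimal sketch of the two generation rules on this slide, assuming records made of categorical features: for each feature, rarer values (sparser regions) receive more artificial anomalies, namely the gap between the most frequent value's count and the value's own count, built by copying a random known record and overwriting that feature; any candidate identical to a known instance is filtered out. The exact counts and sampling scheme are assumptions, and training RIPPER on the result is not shown.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def generate_artificial_anomalies(data: Sequence[Record], seed: int = 0) -> List[Record]:
    """Create artificial anomalies, more for rare feature values than for common ones."""
    rng = random.Random(seed)
    known = set(data)
    anomalies: List[Record] = []
    for f in range(len(data[0])):
        counts = Counter(r[f] for r in data)
        max_count = max(counts.values())
        for value, count in counts.items():
            candidates = [r for r in data if r[f] != value]
            if not candidates:
                continue
            # rarer values (sparser regions) get more artificial anomalies
            for _ in range(max_count - count):
                base = rng.choice(candidates)
                fake = base[:f] + (value,) + base[f + 1:]
                if fake not in known:  # filter collisions with known instances
                    anomalies.append(fake)
    return anomalies
```

Usage: for four `("tcp", "http")` records and one `("udp", "dns")` record, the sketch produces three `("udp", "http")` and three `("tcp", "dns")` anomalies, all of which lie near, but not on, observed data.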
Current Research Projects
- ADAM (Audit Data Analysis and Mining) - GMU
- MADAM ID (Mining Audit Data for Automated Models for Intrusion Detection) - Columbia, GT, Florida Tech.
- MINDS - Univ. of Minnesota
- IIDS (Intelligent Intrusion Detection) - Mississippi State
- DM for Network Intrusion Detection - MITRE Corp.
- Agent based DM system - Iowa State
- IDDM - Dept. of Defense, Australia
Commercial IDS
- Misuse detection based
- SNORT (open source NIDS based on signatures)
- Network Flight Recorder (NFR, detects known attacks)
- NetRanger (Cisco, traffic analyzer)
- Shadow (collects audit data and runs tcpdump filters)
- P-Best (SRI, rule-based expert system)
- NetStat (UCSB, real-time IDS using state transition analysis)
Commercial IDS (cont.)
- Anomaly detection based
- IDES, NIDES (statistical)
- EMERALD (statistical)
- SPADE (Statistical Packet Anomaly Detection Engine) within SNORT
- Computer Watch (AT&T, expert system)
- Wisdom & Sense (rule based)
References
- D. Barbara, et al., "ADAM: A Testbed for Exploring the Use of Data Mining in Intrusion Detection," SIGMOD Record, 2001.
- M. Joshi, et al., "PNrule: Mining Needles in a Haystack: Classifying Rare Classes via Two-Phase Rule Induction," ACM SIGMOD 2001.
- M. Joshi, et al., "Predicting Rare Classes: Can Boosting Make Any Weak Learner Strong?," ACM SIGKDD 2002.
- M. Joshi, V. Kumar, "CREDOS: Classification Using Ripple Down Structure," ICDE 2003.
- K. Yamanishi, "On-line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms," KDD 2000.
- W. Lee, et al., "Information-Theoretic Measures for Anomaly Detection," IEEE Symposium on Security and Privacy 2001.
References (cont.)
- E. Eskin, "Anomaly Detection over Noisy Data Using Learned Probability Distributions," ICML 2000.
- S. Ramaswamy, R. Rastogi, K. Shim, "Efficient Algorithms for Mining Outliers from Large Data Sets," ACM SIGMOD 2000.
- A. Lazarevic, et al., "A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection," SIAM 2003.
- E. Eskin, et al., "A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data," 2002.
- S. Hawkins, et al., "Outlier Detection Using Replicator Neural Networks," DaWaK 2002.
- A. Lazarevic, et al., "Data Mining for Intrusion Detection," Tutorial at the Pacific-Asia Conference on KDD 2003.
Intrusion Detection Links
- http://www.cs.umn.edu/aleks/intrusion_detection.html
- http://www.cc.gatech.edu/wenke/ids-readings.html
- http://www.cerias.purdue.edu/coast/intrusion-detection/welcome.html
- http://www.cs.ucsb.edu/rsg/STAT/links.html
- http://cnscenter.future.co.kr/security/ids.html
- http://www.cs.purdue.edu/homes/clifton/cs590m/
- http://dmoz.org/Computers/Security/Intrusion_Detection_Systems/
- http://www.networkice.com/Advice/Countermeasures/Intrusion_Detection/default.htm
- http://www.infosyssec.net/infosyssec/intdet1.htm
Questions?