Seurat: A Pointillist Approach to Anomaly Detection

Slides: 28
Provided by: yxie4
Learn more at: http://www.cs.cmu.edu

1
Seurat: A Pointillist Approach to Anomaly Detection
  • Yinglian Xie
  • Hyang-Ah Kim, David O'Hallaron,
  • Michael K. Reiter, Hui Zhang
  • Carnegie Mellon University

2
Is My Computer Compromised?
  • Signature detection
  • Require known attack signatures
  • Anomaly detection
  • Alarm if deviation from a normal behavior model
  • Define the normal behavior
  • Rule-based Approach
  • Manual rule specification is time-consuming
  • Learning-based Approach
  • High false positive rates

3
An Important Observation
  • Some host state changes have locality
  • Hosts have similar configurations in a network
    system
  • Spatial locality
  • Similar updates tend to occur across many hosts
  • E.g., administrative updates, worm/virus
    propagation
  • Temporal locality
  • Similar updates tend to occur closely in time
  • E.g., viruses and worms usually propagate quickly
  • Attacks that might be difficult to detect on a
    single host might be easier to detect across many
    hosts.

4
Key Idea: Correlate Host State Changes Across
Space and Time
  • Unexpected coincident changes across multiple
    hosts
  • Identical changes to a known compromised host

5
Our Goal
  • Automatic detection of aggregated anomalous
    events in a domain
  • Require no foreknowledge of normal state changes
  • Require no system-specific knowledge

6
The Pointillist Approach
  • Represent host state transitions as feature
    vectors
  • A feature: an attribute of certain changes
  • Plot vectors as points in a high dimensional
    space
  • Point patterns reflect the aggregated host
    activities
  • Cluster points to detect anomalies
  • Abnormal changes will stand out when correlated
    with other points
  • Seurat --- our prototype system!

7
An Example
  • 2-dimensional feature vectors for host state
    changes
  • Normal state changes roughly cluster together
  • A new cluster indicates an anomaly

[Figure: 2-dimensional plot of feature vectors; label: "Normal cluster"]

8
Attack Models
  • Anomalous event
  • An unexpected state change close in time across
    multiple hosts in a network system
  • Focus
  • Attacks/events that take place at multiple hosts
    at a time
  • Internet worms, viruses, zombies
  • Administrative updates

9
Host State Representation
  • File system updates
  • 83% of intrusions result in file insertion,
    deletion, or modification (Pennington et al.,
    2003)
  • Used by many other security tools (e.g.,
    Tripwire)
  • File representation
  • The complete path name

10
Outline
  • Motivation
  • Overview of our approach
  • Algorithm details
  • Feature vector definition
  • Clustering
  • Evaluation
  • Related work
  • Conclusion

11
Windows for Correlation
  • Detection window: period for anomaly detection
  • One day for current prototype
  • Comparison window: period to look back for
    comparison
  • Correlation window = detection window + comparison
    window

[Figure: timeline of days j-t, j-t+1, ..., j-2, j-1, j. Day j is the Detection Window, the earlier days form the Comparison Window, and together they make up the Correlation Window.]
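
The window arithmetic above can be sketched in a few lines. This is only an illustration: it assumes days are numbered with integers, that the detection window is the single day j, and that the comparison window covers the t days before it. The function name `correlation_window` is hypothetical and not part of Seurat.

```python
def correlation_window(j, t):
    """Return (comparison_days, detection_days) for detection day j.

    Assumes a one-day detection window (day j) and a comparison
    window covering the t days before it (days j-t .. j-1).
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    comparison = list(range(j - t, j))
    detection = [j]
    return comparison, detection


comparison, detection = correlation_window(j=10, t=3)
correlation = comparison + detection
print(correlation)  # [7, 8, 9, 10]
```
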
12
Feature Vector Space
  • A binary feature vector $H_{ij} = \langle v_1, v_2, \ldots, v_k, \ldots, v_l \rangle$
  • Corresponds to the file update status at host i
    on day j
  • m hosts and n days: m × n vectors
  • Focus on files updated by at least two hosts
  • An example
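
A small sketch of how such vectors might be built from daily file update reports. The input layout (a dict keyed by host and day), the function name `build_feature_vectors`, and the choice to count "at least two hosts" over all reports passed in are assumptions made for illustration; the slides do not specify them.

```python
def build_feature_vectors(reports):
    """reports maps (host, day) -> set of updated file paths.

    Returns (files, vectors): files is the sorted list of dimensions,
    vectors maps (host, day) -> list of 0/1 flags, one per file.
    """
    # Count how many distinct hosts updated each file.
    hosts_per_file = {}
    for (host, _day), paths in reports.items():
        for path in paths:
            hosts_per_file.setdefault(path, set()).add(host)

    # Keep only files updated by at least two hosts.
    files = sorted(p for p, hosts in hosts_per_file.items() if len(hosts) >= 2)

    vectors = {
        key: [1 if f in paths else 0 for f in files]
        for key, paths in reports.items()
    }
    return files, vectors


# Hypothetical reports for two hosts over two days.
reports = {
    ("hostA", 1): {"/etc/passwd", "/tmp/a.log"},
    ("hostB", 1): {"/etc/passwd"},
    ("hostA", 2): {"/usr/bin/ls"},
    ("hostB", 2): {"/usr/bin/ls", "/home/b/notes"},
}
files, vectors = build_feature_vectors(reports)
print(files)                  # ['/etc/passwd', '/usr/bin/ls']
print(vectors[("hostB", 2)])  # [0, 1]
```

With 2 hosts and 2 days this yields 2 × 2 = 4 vectors, each with one dimension per retained file.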

13
Feature Selection
  • Reduce dimensionality for better clustering
    results
  • Wavelet-based selection
  • Principal component analysis (PCA)

14
Wavelet-based Feature Selection
  • Idea: detect which file updates are suspicious
  • For each file:
  • 1. Construct a time series signal S = cA + cD
  • 2. On day i, if S(i) - cA(i-1) exceeds a threshold,
    select this file

15
Principal Component Analysis (PCA)
  • Idea: identify correlations of file updates
  • Principal components: an ordered set of
    orthogonal vectors
  • Given the feature vectors
  • 1. Find a subspace defined by a few principal
    components
  • 2. Project vectors onto the PCA-selected subspace
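
The two PCA steps can be sketched with NumPy. This is textbook PCA via singular value decomposition on mean-centered data, offered as an illustration; whether Seurat centers the data, and how it chooses the number of components k, is not stated in the slides. The function name `pca_project` is hypothetical.

```python
import numpy as np


def pca_project(vectors, k):
    """Project row vectors onto their top-k principal components."""
    X = np.asarray(vectors, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by
    # decreasing singular value (i.e. decreasing explained variance).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    return centered @ components.T, components


# Hypothetical binary feature vectors: two groups of correlated files.
vectors = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
projected, components = pca_project(vectors, k=1)
print(projected.shape)  # (4, 1)
```

In this toy input the four file dimensions collapse to one, because the files are always updated together in two groups.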

16
Anomaly Detection by Clustering
  • Cluster points based on cosine distances
  • A simple iterative clustering algorithm
  • Other clustering algorithms (e.g., K-means)
  • Raise an alarm if a new cluster is detected
  • A new cluster consists of multiple vectors from
    only the detection window
  • Root cause diagnosis
  • Suspicious hosts whose vectors fall into the new
    cluster
  • Suspicious files identified by wavelet-based
    selection
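
A minimal sketch of the alarm rule, in pure Python. The slides do not describe the "simple iterative clustering algorithm", so a greedy centroid-based grouping with a cosine-distance cutoff stands in for it here; the cutoff `max_dist`, the function names, and the sample data are all assumptions for illustration.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        # Convention chosen here: two zero vectors are identical,
        # a zero vector is maximally far from a non-zero one.
        return 0.0 if na == nb else 1.0
    return 1.0 - dot / (na * nb)


def centroid(members):
    n = len(members)
    return [sum(col) / n for col in zip(*members)]


def cluster(points, max_dist):
    """Greedy clustering: join the nearest cluster whose centroid is
    within max_dist, otherwise start a new cluster.
    points is a list of ((host, day), vector); returns index lists."""
    clusters = []
    for idx, (_label, vec) in enumerate(points):
        best, best_d = None, max_dist
        for c in clusters:
            d = cosine_distance(vec, centroid([points[i][1] for i in c]))
            if d <= best_d:
                best, best_d = c, d
        if best is None:
            clusters.append([idx])
        else:
            best.append(idx)
    return clusters


def new_clusters(points, clusters, detection_day):
    """Clusters with at least two vectors, all from the detection day."""
    return [
        c for c in clusters
        if len(c) >= 2 and all(points[i][0][1] == detection_day for i in c)
    ]


points = [
    (("hostA", 1), [1, 1, 0, 0]),
    (("hostB", 1), [1, 1, 0, 0]),
    (("hostA", 2), [1, 0, 0, 0]),
    (("hostB", 2), [0, 0, 1, 1]),
    (("hostC", 2), [0, 0, 1, 1]),
]
clusters = cluster(points, max_dist=0.5)
alarms = new_clusters(points, clusters, detection_day=2)
print([points[i][0][0] for c in alarms for i in c])  # ['hostB', 'hostC']
```

The hosts whose vectors land in the alarmed cluster are the suspicious hosts for root cause diagnosis.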

17
Outline
  • Motivation
  • Overview of our approach
  • Algorithm details
  • Feature vector definition
  • Clustering
  • Evaluation
  • Related work
  • Conclusion

18
The Prototype Implementation
  • A multi-platform prototype system
  • A data collection tool + a correlation module
  • Scanning host file updates daily (Windows and
    Linux)
  • Deployed in a real teaching cluster (22 hosts)
  • Reports uploaded to a centralized server
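
One way a daily scan could produce a file update report is by comparing two snapshots of the file system. This sketch is not the Seurat data collection tool; comparing size and modification time is an assumption made here (tools such as Tripwire typically also compare checksums), and the function names are hypothetical.

```python
import os


def snapshot(root):
    """Map each file's complete path name to (size, mtime_ns)."""
    state = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable
            state[path] = (st.st_size, st.st_mtime_ns)
    return state


def updated_files(previous, current):
    """Paths inserted, deleted, or modified between two snapshots."""
    changed = {p for p in current if previous.get(p) != current[p]}
    deleted = set(previous) - set(current)
    return changed | deleted


# Hypothetical snapshots from two consecutive days.
yesterday = {"/etc/passwd": (100, 1), "/tmp/x": (5, 1)}
today = {"/etc/passwd": (120, 2), "/usr/bin/new": (9, 2)}
print(sorted(updated_files(yesterday, today)))
# ['/etc/passwd', '/tmp/x', '/usr/bin/new']
```

The resulting set of path names is the per-host, per-day report that a central correlation module would turn into a feature vector.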

19
Effectiveness of Feature Selection
  • Dimensionality reduced by 3 orders of magnitude

[Figure: number of feature dimensions at successive stages of feature selection; panel averages: 2300, 140, 17, and 2]

20
False Positives
  • A total of 9 alarms in 60 days
  • 6 due to system re-configurations
  • 3 due to network-wide experiments by students

[Figure: clustering plots dated 01-21-04 and 01-22-04, annotated "new cluster!" and "No new cluster!"]

21
Detection Rate
  • Simulated attacks
  • Artificial file insertion
  • Real worm file modification
  • Detection rate sensitive to
  • Number of files modified
  • Number of hosts infected

22
Real Worm Detection
  • Test the effectiveness using the Lion worm
  • Manually launch the worm in an isolated cluster
  • Merge file update reports with reports from real
    deployment
  • Identified 22 files as the root cause

[Figure: clustering plots dated 02-10-04, 02-11-04, and 02-12-04, annotated "Abnormal updates!"]

23
Outline
  • Motivation
  • Overview of our approach
  • Algorithm details
  • Feature vector definition
  • Clustering
  • Evaluation
  • Related work
  • Conclusion

24
Related Work
  • File system updates for intrusion detection
  • Tripwire, AIDE, Samhain
  • Pennington et al., 2003
  • Distributed architecture and collective
    information for anomaly detection
  • GrIDS, CSM, DIDS, Emerald, AAFID
  • Correlation-based anomaly detection
  • Heterogeneous sensors to reduce false positives
  • System configuration files for troubleshooting
    (Strider)

25
Future Work
  • Real time monitoring
  • Faster detection speed
  • Different representations of host state changes
  • Detect more stealthy attacks
  • Distributed correlation algorithms
  • A more scalable system

26
Conclusion
  • Clustering-based correlation can automatically
    identify file update anomalies in a network
    system
  • No prior knowledge of normal host states
  • No system-specific rules

27
More Information and Available Software
  • http://www.cs.cmu.edu/seurat
  • Thank you!