Title: Distributed Detection of Network-Wide Traffic Anomalies
1Distributed Detection of Network-Wide Traffic
Anomalies
- Ling Huang, XuanLong Nguyen, Minos Garofalakis, Joe Hellerstein, Michael Jordan, Anthony Joseph, Nina Taft
- UC Berkeley, Intel Research
2Outline
- Introduction and motivation
- The centralized detection algorithm
- The distributed online detection
- Summary
3Introduction Network Monitoring
- Large-scale network monitoring and intrusion detection systems
  - Distributed and collaborative monitoring boxes
  - Continuously generating time series data
- Existing research focuses on data streaming
  - Collect and aggregate network state for correlation and trend analysis
  - Well suited to answering approximate queries and continuously recording system state
Operation Center
4Moving Towards Distr. Online Detection
- Existing streaming protocols are inadequate for detection purposes
  - Support limited functions: SUM, AVG, MIN, ...
  - Always ε-approximate the system state
  - Waste resources if applications only care about 0-1 information
- Monitoring systems call for an online detection component [Ankur04]
  - Maintain system-wide logical predicates/invariants
  - Detect and react to constraint violations/system anomalies
5Anomaly Detection
- Transformed data exceeding a threshold
  - SUM > threshold
  - (Prediction based on normal pattern - incoming pattern) > threshold
  - Energy of residual in PCA transformation > threshold
6Online Distributed Detection
- Is hard
  - Making decisions based on incomplete information
  - Guaranteeing detection accuracy while minimizing communication overhead
- The effort towards this direction
  - Approximation of sum of distributed data streams by Olston et al., SIGMOD 03
  - Detection of sum of distributed data streams exceeding a threshold by Keralapura et al., SIGMOD 06
  - Centralized PCA for network anomaly detection by Lakhina et al., SIGCOMM 04
- We are studying distributed online PCA for network anomaly detection
7Detection of Network-wide Anomalies
- A volume anomaly is a sudden change in an Origin-Destination flow (i.e., point-to-point traffic)
- Given link traffic measurements, diagnose the volume anomalies in flows
[Figure labels: the backbone network; regional network 1; regional network 2]
8An Illustration
Finding common patterns (e.g. volume anomalies) from such high-dimensional, noisy data is very difficult!
Key observations: 1) links have a common dominant pattern; 2) anomalies are small but correlated
9The Centralized Algorithm (Lakhina et al. 2004)
10A Geometric Illustration
[Figure: axes show traffic on Link 1 and traffic on Link 2]
11The Subspace Method
- Principal Components Analysis (PCA): an approach to separate normal from anomalous traffic
- Normal subspace: the space spanned by the first k principal components
- Anomalous subspace: the space spanned by the remaining principal components
- Then, decompose the traffic on all links by projecting onto the normal and anomalous subspaces to obtain the normal and residual traffic components
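The subspace split described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function name `subspace_decompose`, the synthetic data, and the choice k = 2 are assumptions made only for the example.

```python
import numpy as np

def subspace_decompose(Y, k):
    """Split each row of Y into a normal part and a residual (anomalous) part.

    Y is an m x n matrix (m time steps, n links); k is the assumed number
    of principal components spanning the normal subspace.
    """
    Yc = Y - Y.mean(axis=0)                  # centre each link's time series
    eigvals, eigvecs = np.linalg.eigh(Yc.T @ Yc)
    order = np.argsort(eigvals)[::-1]        # sort components by variance, descending
    P = eigvecs[:, order[:k]]                # n x k basis of the normal subspace
    C_normal = P @ P.T                       # projector onto the normal subspace
    C_resid = np.eye(Y.shape[1]) - C_normal  # projector onto the anomalous subspace
    return Yc @ C_normal, Yc @ C_resid

rng = np.random.default_rng(0)
# Hypothetical data: 200 time steps, 6 links driven by 2 shared patterns plus noise.
patterns = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
Y = patterns @ mixing + 0.05 * rng.normal(size=(200, 6))

Y_normal, Y_resid = subspace_decompose(Y, k=2)
```

The two parts add back up to the centred traffic, and each row's residual is orthogonal to its normal part, which is what lets the residual energy be tested on its own.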
12Detection Illustration
13The Centralized Algorithm
- Data matrix Y
  - 1) Each link produces a column of m data points over time.
  - 2) The n links produce a row of data y at each time instance.
- The detection is run on each new row y
- Periodically (e.g. once a week), collect all data at the operation center and perform PCA
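A minimal sketch of this centralized loop follows, under stated assumptions: the helper names, the synthetic link data, and the injected spike are hypothetical, and the threshold is simply the largest training error, a crude stand-in for a statistically derived threshold.

```python
import numpy as np

def fit_residual_projector(Y, k):
    """Return the column means and the projector onto the anomalous subspace."""
    mean = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    P = Vt[:k].T                                # top-k principal directions (n x k)
    return mean, np.eye(Y.shape[1]) - P @ P.T

def spe(y, mean, C_resid):
    """Squared prediction error: squared norm of the residual part of row y."""
    r = C_resid @ (y - mean)
    return float(r @ r)

rng = np.random.default_rng(1)
m, n, k = 500, 8, 2
# Hypothetical traffic: two patterns shared by all links, plus small noise.
mixing = np.array([[1.0] * n, [1.0, -1.0] * (n // 2)])
Y = rng.normal(size=(m, k)) @ mixing + 0.1 * rng.normal(size=(m, n))

# "Periodic" step: refit PCA on the collected matrix Y.
mean, C_resid = fit_residual_projector(Y, k)
train_spe = np.array([spe(row, mean, C_resid) for row in Y])
threshold = train_spe.max()                     # stand-in threshold (assumption)

# Per-row step: test a normal row and a row with a spike on one link.
normal_row = Y[10]
anomalous_row = Y[10].copy()
anomalous_row[3] += 5.0

alarm_normal = spe(normal_row, mean, C_resid) > threshold
alarm_anomalous = spe(anomalous_row, mean, C_resid) > threshold
```

The spike on a single link breaks the shared pattern, so most of its energy lands in the residual and the second check raises an alarm while the first does not.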
14The Distributed Detection
15A Typical Distributed Detection System
- A set of distributed monitors
  - Each produces a time series signal
  - Sends processed signals to the coordinator
  - No communication among monitors
- A coordinator X
  - Is the aggregation, correlation and coordination center
  - Performs detection
  - Informs monitors of the level of accuracy for signal updates
16The Online Detection Procedure
17The Distributed Detection Framework
User inputs
Distr. Monitors
Anomaly
Coordinator
Original monitored time series
Processed time series
18Solution Overview
- Minimize communication cost by
  - Having monitors send as few updates as possible
  - Carefully managing the discrepancy between the coordinator's view of the global state and the actual global state
  - Providing the coordinator with an accurate enough view so that it detects anomalies with prescribed accuracy
- Key idea
  - Filter the monitored signal; don't send an update unless a surprising change has occurred
  - The coordinator informs monitors when and to what extent they should process the local signal.
19The Protocol At Monitors
- Each monitor sends an update to the coordinator if its incoming signal differs from its local prediction by more than its filtering slack
  - The filtering slacks are adaptively computed by the coordinator
  - The prediction can be based on any prediction model for node behavior over time
  - e.g., the average of the last 5 signal values observed locally at the monitor
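The monitor-side rule can be sketched as a small class. This is a sketch under assumptions: the class name `Monitor`, the fixed slack of 2.0, and the sample signal are hypothetical, and the first observation is always sent because no prediction exists yet.

```python
from collections import deque

class Monitor:
    """Local filter: report a value only when it strays from the local prediction."""

    def __init__(self, slack, window=5):
        self.slack = slack                    # filtering slack assigned by the coordinator
        self.history = deque(maxlen=window)   # last `window` locally observed values

    def predict(self):
        # Prediction model from the example above: mean of the recent local values.
        return sum(self.history) / len(self.history) if self.history else None

    def observe(self, value):
        """Return the value if an update must be sent, else None."""
        prediction = self.predict()
        self.history.append(value)
        if prediction is None or abs(value - prediction) > self.slack:
            return value
        return None

monitor = Monitor(slack=2.0)
signal = [10.0, 10.5, 9.8, 10.2, 10.1, 18.0, 10.3]
sent = [monitor.observe(v) for v in signal]
# sent == [10.0, None, None, None, None, 18.0, None]
```

Only the first value and the jump to 18.0 are transmitted; the small fluctuations around 10 stay within the slack and cost no communication.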
20The Protocol At The Coordinator
- No update, do nothing
- When updates come in
  - Update the coordinator's copy of the data
  - Compute the new PCA
  - Perform detection using the new PCA
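The coordinator's control flow can be sketched as follows. To keep the example short, the PCA recomputation is replaced by a plain SUM-over-threshold check passed in as a callback; the class name `Coordinator`, the callback, and the threshold of 30 are all assumptions for illustration.

```python
class Coordinator:
    """Keeps the latest known value per monitor and re-runs detection on updates."""

    def __init__(self, n_monitors, detect):
        self.view = [0.0] * n_monitors   # coordinator's current view of every monitor
        self.detect = detect             # hypothetical detection callback
        self.detections_run = 0

    def step(self, updates):
        """`updates` maps monitor index -> newly reported value (may be empty)."""
        if not updates:
            return None                  # no update: do nothing
        for i, value in updates.items():
            self.view[i] = value         # refresh only the entries that changed
        self.detections_run += 1
        return self.detect(self.view)    # recompute and test on the refreshed view

# Stand-in detector: alarm when the sum of all monitored values exceeds 30.
coord = Coordinator(3, detect=lambda view: sum(view) > 30.0)
results = [
    coord.step({0: 10.0, 1: 9.0, 2: 8.0}),  # sum 27: no alarm
    coord.step({}),                          # nothing received: skipped
    coord.step({1: 15.0}),                   # sum 33: alarm
]
```

Detection work is done only in the steps where at least one monitor reported, which mirrors the "no update, do nothing" rule.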
21The Tradeoff
- The bigger the slack, the less communication.
- How to parameterize the slacks (local thresholds)?
- How are they related to detection accuracy?
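A small simulation makes the tradeoff concrete. It is a sketch under assumptions: the prediction is simply the last value sent (simpler than a moving average), and the Gaussian test signal and the list of slacks are arbitrary choices.

```python
import random

def run_filter(signal, slack):
    """Return (messages sent, worst error in the coordinator's view)."""
    last_sent = None
    messages, worst_error = 0, 0.0
    for x in signal:
        if last_sent is None or abs(x - last_sent) > slack:
            last_sent = x          # send an update; the coordinator's view is exact
            messages += 1
        worst_error = max(worst_error, abs(x - last_sent))
    return messages, worst_error

random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(500)]
slacks = [0.0, 0.5, 1.0, 2.0, 1000.0]
results = {s: run_filter(signal, s) for s in slacks}
```

With this rule the coordinator's error never exceeds the slack, so the slack directly bounds how stale its view can be, while a very large slack reduces traffic to the single initial message.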
22The Analysis
- Let the coordinator's data matrix equal the actual data matrix plus a matrix of filtering errors
- Assume each element of the error matrix is independently generated from a symmetric distribution with mean 0 and a given variance
- Simple examples are the Uniform or Gaussian distribution
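One way to write this setting down, as a sketch: the symbols W, \hat{Y}, \Delta and \delta below are assumed names, since the slide's own notation did not survive in this transcript, and A = Y^T Y follows the covariance definition used in the PCA background slide. The expansion is plain algebra, and the last line is the standard variance of a uniform distribution on [-\delta, \delta].

```latex
\hat{Y} = Y + W, \qquad
\hat{A} = \hat{Y}^{T}\hat{Y} = A + \Delta, \qquad
\Delta = Y^{T}W + W^{T}Y + W^{T}W

W_{ij} \sim \mathrm{Uniform}[-\delta, \delta]
\;\Longrightarrow\;
\mathbb{E}[W_{ij}] = 0, \quad \mathrm{Var}(W_{ij}) = \frac{\delta^{2}}{3}
```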
23The Root Mean Square of Eigen Error
Theorem
24Evaluation
- Given a tolerable error in eigenvalues, we can determine system parameters
- Using system parameters, we can evaluate detection accuracy using simulation
- Using the tolerable error in eigenvalues, we can upper-bound the false alarm rate with theory
- Experiment setup
  - Abilene backbone network data
  - Traffic matrices of size 1008 x 41
  - Set uniform slack for all monitors
25Results
70 synthetic anomalies in total
[Plot values without labels: 0.05, 0.04, 0.1]
26Summary
- High detection accuracy with low overhead
- Open framework
  - Local decision rule
    - Filtering, prediction, adaptive learning, ...
  - Correlation functions
    - Sum, PCA, sequential analysis, ...
  - Constraint definition
    - Fixed value, threshold function, advanced statistics, ...
- Adaptive system
  - Coordinator acts as a correlation, detection and parameter tuning point
27Questions and Future Work
http://www.cs.berkeley.edu/~hling/
hling@cs.berkeley.edu
28Backup Slides
29The Centralized Algorithm
- Capture the size of the residual vector using the squared prediction error (SPE)
- Assuming Gaussian data, we can find bounds which SPE should only exceed 1-α of the time
- Result due to Jackson and Mudholkar, 1979
[Figure: axes show traffic on Link 1 and traffic on Link 2]
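The Jackson and Mudholkar bound can be computed from the eigenvalues of the residual components. The sketch below assumes the commonly quoted form of their Q-statistic; the function name, the `confidence` parameter (the probability mass expected below the threshold under the Gaussian assumption), and the example eigenvalues are hypothetical, and the sketch only covers the case h0 > 0.

```python
from statistics import NormalDist

def q_statistic(residual_eigenvalues, confidence):
    """SPE threshold from the eigenvalues outside the normal subspace.

    residual_eigenvalues: covariance eigenvalues of the residual components.
    confidence: e.g. 0.999, the fraction of normal data expected below the threshold.
    """
    phi1 = sum(residual_eigenvalues)
    phi2 = sum(l ** 2 for l in residual_eigenvalues)
    phi3 = sum(l ** 3 for l in residual_eigenvalues)
    h0 = 1.0 - 2.0 * phi1 * phi3 / (3.0 * phi2 ** 2)
    if h0 <= 0:
        raise ValueError("this sketch only handles h0 > 0")
    c = NormalDist().inv_cdf(confidence)   # standard normal percentile
    base = (c * (2.0 * phi2 * h0 ** 2) ** 0.5 / phi1
            + 1.0
            + phi2 * h0 * (h0 - 1.0) / phi1 ** 2)
    return phi1 * base ** (1.0 / h0)

# Hypothetical residual eigenvalues.
q_95 = q_statistic([2.0, 1.0], 0.95)
q_999 = q_statistic([2.0, 1.0], 0.999)
```

A higher confidence level gives a larger threshold, trading missed detections against false alarms.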
30Distributed Detection
- Maintain ε-accurate PCA decomposition
- Anomaly detection on incomplete information
31Background Matrix Perturbation Theory I
- Perturbation bound on eigenvalues
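The slide's formula is not preserved in this transcript. Two standard bounds of this kind for symmetric matrices are Weyl's inequality (each eigenvalue moves by at most the spectral norm of the perturbation) and the Hoffman-Wielandt inequality (the root of the summed squared shifts is at most the Frobenius norm). Whether these are the exact bounds used on the slide is an assumption; the sketch below checks both numerically on hypothetical data.

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(100, 5))
W = rng.uniform(-0.1, 0.1, size=Y.shape)   # hypothetical filtering errors

A = Y.T @ Y
A_hat = (Y + W).T @ (Y + W)
Delta = A_hat - A                          # symmetric perturbation of A

eig_A = np.linalg.eigvalsh(A)              # eigenvalues in ascending order
eig_A_hat = np.linalg.eigvalsh(A_hat)

max_shift = np.max(np.abs(eig_A_hat - eig_A))
l2_shift = np.sqrt(np.sum((eig_A_hat - eig_A) ** 2))
spectral_norm = np.linalg.norm(Delta, 2)
frobenius_norm = np.linalg.norm(Delta, "fro")
```

Both bounds depend only on the size of the perturbation, not on the data matrix itself, which is what makes them useful for choosing slacks in advance.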
32Background Matrix Perturbation Theory II
For orthogonal projection, we have
33Filtering Error and ?
- Recall that the monitors hold the distributed m x n matrix
34Filtering Error and Eigen Error
35The F-Norm of Perturbation Error I
36The F-Norm of Perturbation Error II
37The Root Mean Square of Eigen Error
38Individual Eigen Errors
39Application to Network Anomaly Detection
Eigenerror and Threshold Error
Detection Performance
40Background PCA
- Principal Component Analysis (PCA) on a continuous time domain involves tracking and calculating the top eigenvalues and corresponding eigenvectors of the covariance matrix A = Y^T Y
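As a quick illustration, the eigenvalues of A = Y^T Y can be obtained with a symmetric eigensolver and cross-checked against the squared singular values of Y. The random matrix here is a hypothetical stand-in for real link data.

```python
import numpy as np

rng = np.random.default_rng(3)
Y = rng.normal(size=(50, 4))               # hypothetical m x n data matrix

A = Y.T @ Y
eigvals, eigvecs = np.linalg.eigh(A)       # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]          # reorder so the top eigenvalue is first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

singular_values = np.linalg.svd(Y, compute_uv=False)
```

The columns of `eigvecs` are the principal directions, and the leading ones span the normal subspace used for detection.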
41Background Matrix Norm
42Computing The Eigengap
43The F-Norm of Perturbation Error III
44Background Matrix Perturbation Theory II
45The F-Norm of Perturbation Error III
46How Good is The Model?
Slack
47Application to Network Anomaly Detection
48Detection of Network-wide Anomalies
- A volume anomaly is a sudden change in an Origin-Destination flow (i.e., point-to-point traffic)
- Given link traffic measurements, diagnose the volume anomalies
- PCA approach to separate normal from anomalous traffic
- Normal traffic is well approximated as occupying a low dimensional subspace
49The Centralized Algorithm (I)
- The data matrix Y
  - Each link produces a column of data Yi(t) of size m
  - The n links produce a row of signals y at each time instance
- The transformation
  - Periodically (e.g. once a week) collect all data to update the Y matrix and perform PCA
  - Use eigenvalues to compute the threshold Qa, and eigenvectors to compute the projection matrix Cab
- The detection on new row y
  - Alarm
50Typical Link Data
Abilene
Sprint-Europe
Finding common patterns (e.g. volume anomalies) from such high-dimensional, noisy data is very difficult
51The Protocol At The Coordinator
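- The coordinator makes a new row from the most recent value it holds for each monitor
- If any element in the row is updated
  - Update the coordinator's copy of the data
  - Compute the new PCA
  - Perform detection using the new row
- Otherwise, do nothing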