Title: Network Tomography and Anomaly Detection
1Network Tomography and Anomaly Detection
Network map from www.opte.org
2Brain mapping (opening it up can disturb the
system)
3Internet Boom
- too complex to measure everywhere, all the time
- traffic measurements expensive (hardware,
bandwidth)
4Brain Tomography
counting projection
MRF model
Poisson
5Link-level Network Tomography
6Link-level Network Tomography
Solely from edge-based traffic measurements, infer internal
- topology / connectivity
- link-level loss probability and delay distribution
7Application Topology Discovery
- Challenges
- 12% never respond, 15% have multiple interfaces (Barford et al., 2000)
- detect layer-2 topology invisible to the IP layer (e.g., switches)
8Application Overlay Voice-over-IP
- Multiple paths to choose from
- select paths with minimal delay or delay variance
- Send a small number of critical packets (vocal transitions) along multiple paths
- Use these packets to estimate the path delays (and the extent of path diversity)
[Figure: overlay network; labels: Access Network, Overlay Link, Service Gateway, Autonomous System(s)]
9Network Monitoring
- Challenges
- Restricted measurement
- High volumes and high rates of data (sampling of traffic on Gb/s routers)
- High dimensional data (source/destination IP addresses, port numbers)
- Goals
- Supply networking protocols with relevant performance information.
- Identify anomalous behaviour and operational transitions.
- Provide network administrators with appropriate notification or visualization.
10Outline
- Inference about network performance based on passive measurements or active probing
- Two components to the talk
- Network tomography
- Network anomaly detection
- Focus on online, sequential approaches
- Account for non-stationary behaviour
- Don't repeat work that has already been done
11Network Tomography Likelihood Formulation
- $A$: routing matrix (graph)
- $\theta$: packet loss probabilities or queuing delays for each link
- $y$: packet losses or delays measured at the edge
- $\epsilon$: randomness inherent in traffic measurements
Statistical likelihood function: $l(\theta) = p(y \mid A, \theta)$
12Classical Problem
Solve the linear system $y = A\theta + \epsilon$
Interesting if $A$, $\theta$, or $\epsilon$ have special structures
or
Maximize the likelihood function $l(\theta) = p(y \mid A, \theta)$
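To make the linear formulation concrete, here is a minimal sketch that assumes the measurement model y = A·theta + noise, with A a 0/1 routing matrix whose rows are measured paths and whose columns are links. The three-path, three-link topology, the delay values, and the noise level are illustrative assumptions, not taken from the talk.

```python
import numpy as np

# Hypothetical routing matrix: rows are measured paths, columns are links.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
theta_true = np.array([2.0, 5.0, 1.0])  # assumed per-link mean delays (ms)

rng = np.random.default_rng(0)
y = A @ theta_true + rng.normal(0.0, 0.01, size=3)  # noisy edge measurements

# Under Gaussian noise, the maximum-likelihood estimate is least squares.
theta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
```

This toy matrix is invertible, so the link delays are identifiable. With a single sender and plain end-to-end measurements on a tree there are fewer paths than links, and the system is underdetermined; correlated measurements are one way to add the missing information.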
13Network Tomography The Basic Idea
sender
receivers
14Network Tomography The Basic Idea
sender
receivers
15Packet-pair measurements
cross-traffic
delay
measurement packet pair
packet(1) and packet(2) experience (nearly)
identical losses and/or delays on shared links
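The value of shared-link correlation can be illustrated with a small simulation. The sketch below assumes a two-receiver tree (one shared link, two branch links), that both packets of a pair meet exactly the same fate on the shared link, and that branch losses are independent. Under those assumptions P(r1) = a0·a1, P(r2) = a0·a2 and P(r1 and r2) = a0·a1·a2, so a0 = P(r1)·P(r2) / P(r1 and r2). The function names and probabilities are hypothetical.

```python
import random

def simulate_pairs(n, a0, a1, a2, seed=0):
    """Simulate n packet pairs on a two-receiver tree.

    a0, a1, a2 are success (non-loss) probabilities of the shared link and
    the two branch links. Assumption: both packets of a pair share one fate
    on the shared link, and branch losses are independent.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n):
        shared_ok = rng.random() < a0
        r1 = shared_ok and rng.random() < a1
        r2 = shared_ok and rng.random() < a2
        outcomes.append((r1, r2))
    return outcomes

def estimate_success_rates(outcomes):
    """Moment estimator based on a0 = P(r1) * P(r2) / P(r1 and r2)."""
    n = len(outcomes)
    p1 = sum(r1 for r1, _ in outcomes) / n
    p2 = sum(r2 for _, r2 in outcomes) / n
    p12 = sum(r1 and r2 for r1, r2 in outcomes) / n
    a0 = p1 * p2 / p12
    return a0, p1 / a0, p2 / a0

pairs = simulate_pairs(200_000, a0=0.95, a1=0.90, a2=0.80)
a0_hat, a1_hat, a2_hat = estimate_success_rates(pairs)
```

Only edge observations (which receiver got its packet) enter the estimator, yet all three link-level success rates are recovered.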
16Modelling time-variations
Cross-traffic
Cross-traffic
- Nonstationary cross-traffic induces time-variation
- Directly model the dynamics (but not the traffic!)
- Goal is to perform online tracking and prediction of network link characteristics
17Non-stationary behaviour
Introduce time-dependence in parameters
Filtering exercise (track $\theta_t$)
(1) Describe dynamic behaviour of $\theta_t$
(2) Form the MMSE estimate $\hat{\theta}_t = E[\theta_t \mid y_{1:t}]$
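The two steps above (a dynamic model, then a posterior-mean estimate) can be sketched with a bootstrap particle filter. The model here is an assumption chosen for brevity: a scalar parameter following a Gaussian random walk, observed in Gaussian noise. For this linear-Gaussian toy a Kalman filter would be exact; the particle form is shown only because it carries over to non-linear or non-Gaussian models.

```python
import numpy as np

def bootstrap_filter(y, n_particles=500, q=0.1, r=0.5, seed=0):
    """Track theta_t from observations y_t = theta_t + noise.

    Assumed model (illustrative only): theta_t is a Gaussian random walk
    with step std q; the observation noise is Gaussian with std r.
    Returns the particle approximation of E[theta_t | y_1..y_t] per step.
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(y[0], r, size=n_particles)  # initialise near y_1
    estimates = []
    for obs in y:
        # (1) propagate particles through the assumed dynamics
        particles = particles + rng.normal(0.0, q, size=n_particles)
        # (2) weight by the observation likelihood (log domain for stability)
        log_w = -0.5 * ((obs - particles) / r) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        estimates.append(float(np.sum(w * particles)))  # posterior-mean estimate
        # (3) multinomial resampling to limit weight degeneracy
        particles = rng.choice(particles, size=n_particles, p=w)
    return np.array(estimates)

rng = np.random.default_rng(1)
theta = 5.0 + np.cumsum(rng.normal(0.0, 0.1, size=200))  # slowly drifting truth
y = theta + rng.normal(0.0, 0.5, size=200)               # noisy measurements
theta_hat = bootstrap_filter(y)
```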
18Particle Filtering
19Delay Distribution Tracking
- Time-varying delay distribution of window size R at time m
- In each window, R probe measurements.
- Form estimates of average delay and jitter over short time intervals
[Figure: delay distribution per window; axes: time, delay units]
20Dynamic Model
- Queue/traffic model
- reflected random walk on [0, max_del]
[Figure: probability versus delay units]
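A reflected random walk on a bounded delay range can be simulated in a few lines. The transition probabilities used in the talk are not recoverable from the slide, so the symmetric unit steps below are an assumption, as are the function and parameter names.

```python
import random

def reflected_random_walk(n_steps, max_del, start=0, p_up=0.5, seed=0):
    """Simulate a +/-1 random walk on {0, ..., max_del} with reflection.

    A step that would leave the interval is reflected back inside.
    The step probabilities are illustrative assumptions.
    """
    rng = random.Random(seed)
    d = start
    path = [d]
    for _ in range(n_steps):
        d += 1 if rng.random() < p_up else -1
        if d < 0:
            d = -d                   # reflect at the lower boundary
        elif d > max_del:
            d = 2 * max_del - d      # reflect at the upper boundary
        path.append(d)
    return path

delays = reflected_random_walk(1000, max_del=10)
```

The walk never leaves the interval, which keeps the tracked delay within the discretized range used for the delay distribution.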
21Observations
22Estimation of Delay Distributions
- Sequential Monte Carlo Approximation to
posterior mean estimate
Message-passing algorithm
Particle weights
- Estimate of time-varying delay distribution
23Analysis
- Complexity per measurement depends on the average number of unique links, the number of particles N, and the max. delay units per link
- Convergence analysis of Crisan and Doucet (2001) and Le Gland and Oudjane (2002) applies.
- The approximation to the posterior mean estimate converges to the true estimate as N → ∞
24Simulation Results ns2
[Figure: true versus tracked delay distributions and mean delay over time]
25Comments
- Dynamic models allow us to account for non-stationarity
- but realistic models are hard to derive and incorporate
- Particle filtering only appropriate when analytical techniques fail
- non-Gaussian or non-linear dynamics or observations
- Sequential structure allows on-line implementation
- Care must be taken to reduce computation at each step
26Network Anomaly Detection
- In tomography, a primary challenge is the restriction on available measurements.
- In anomaly detection, a primary challenge is the abundance of measurements.
- How can we process data at a sufficient rate?
- How should we extract relevant information?
27Netflow Data
- Records of flows.
- A flow is defined by (source IP, dest. IP, source port, dest. port)
- Packets are sampled at configurable rates.
- Exported at 1-minute or 5-minute intervals.
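As a rough illustration of sampled flow accounting, the sketch below samples packets independently with a fixed probability and counts them per 4-tuple flow key. It mimics the idea only: the function name, addresses, and sampling scheme are assumptions, and this is not the NetFlow record format or export mechanism.

```python
import random
from collections import Counter

def sample_and_aggregate(packets, sampling_rate, seed=0):
    """Sample packets with probability `sampling_rate` and count them per flow.

    Each packet is a (src_ip, dst_ip, src_port, dst_port) tuple, which is
    also used as the flow key.
    """
    rng = random.Random(seed)
    flows = Counter()
    for key in packets:
        if rng.random() < sampling_rate:
            flows[key] += 1
    return flows

# Hypothetical traffic: two flows with 1000 and 10 packets.
big = ("10.0.0.1", "10.0.1.1", 40000, 80)
small = ("10.0.0.2", "10.0.1.9", 40001, 443)
packets = [big] * 1000 + [small] * 10
flows = sample_and_aggregate(packets, sampling_rate=0.01)
estimated_big = flows[big] / 0.01  # scale the sampled count back up
```

At low sampling rates small flows frequently disappear from the record altogether, which is one reason sampled data is harder to analyse.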
28Dataset Abilene Network
Abilene Weathermap Indiana University
Thanks to Rick Summerhill and Mark Fullmer at
Abilene for providing access to the data.
29Principal Component Analysis (PCA)
- Goal: identify a low-dimensional subspace that captures the key components of the feature set
- Idea: if (most of) a measurement does not lie in this subspace, then it is anomalous
- PCA
- conduct a linear transformation to choose a new coordinate system
- Projection onto first principal component has greater variance than any other projection (maximum energy).
- Subsequent principal components capture greatest remaining energy
30PCA (2)
- Reduce dimensionality by eliminating principal components that do not contribute significantly to variance in the dataset (small singular value)
- Not optimized for class separability (compare linear discriminant analysis)
- Minimizes reconstruction error under L2 norm.
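The link between singular values, captured energy, and L2 reconstruction error can be checked numerically. This sketch runs PCA through an SVD of centred synthetic data; the data dimensions and the choice of k = 3 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 500 measurements in 20 dimensions with 3 dominant directions.
latent = rng.normal(size=(500, 3)) * np.array([10.0, 5.0, 2.0])
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 20))

Xc = X - X.mean(axis=0)                      # centre each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
energy = s**2 / np.sum(s**2)                 # variance fraction per component

k = 3
components = Vt[:k]                          # first k principal directions
X_hat = (Xc @ components.T) @ components     # rank-k reconstruction
rel_error = np.linalg.norm(Xc - X_hat) ** 2 / np.linalg.norm(Xc) ** 2
```

The relative squared reconstruction error equals the energy in the discarded components, so dropping components with small singular values loses little.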
31Eigenflow Analysis
- Lakhina et al. (2004, 2004b).
- PCA of Origin-Destination (OD) flows
- Eigenflow: set of flows mapped onto a single principal component
- Intrinsic dimensionality: empirical studies for Sprint and Abilene networks indicated that 5-10 principal components sufficed to capture most of the energy.
32PCA-based Anomaly Detection
- Perform PCA on block of OD flow measurements
- Project each measurement onto primary principal components
- Test whether the residual energy exceeds a threshold.
- Squared prediction error (SPE, the Q-statistic) used to test for unusual flow-types.
- Prone to Type-I errors (false positives) when applied to transient operations.
- In these cases, the assumption that the source data is normally distributed is violated.
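The residual-energy test can be sketched as follows. The data are synthetic, and the threshold is an empirical percentile of the training SPE, used here as a stand-in for the Q-statistic threshold, which is not reproduced. All names and parameter values are assumptions made for the example.

```python
import numpy as np

def fit_normal_subspace(X_train, k):
    """Return the training mean and the top-k principal directions."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:k]

def spe(x, mean, components):
    """Squared prediction error: energy of x outside the normal subspace."""
    xc = x - mean
    residual = xc - components.T @ (components @ xc)
    return float(residual @ residual)

rng = np.random.default_rng(0)
mixing = rng.normal(size=(2, 10))
X_train = rng.normal(size=(1000, 2)) @ mixing + 0.05 * rng.normal(size=(1000, 10))
mean, components = fit_normal_subspace(X_train, k=2)

# Empirical threshold: 99.9th percentile of the training SPE (an assumption).
train_spe = np.array([spe(x, mean, components) for x in X_train])
threshold = float(np.quantile(train_spe, 0.999))

normal_point = rng.normal(size=2) @ mixing
anomaly = normal_point + 3.0 * rng.normal(size=10)  # energy off the subspace
is_anomalous = [spe(p, mean, components) > threshold
                for p in (normal_point, anomaly)]
```

Because the subspace and threshold come from one block of data, a shift in normal operating conditions can push ordinary measurements over the threshold, which is the false-positive risk noted above.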
33Online Method
- Don't need to relearn from scratch when new data arrive
- Computational cost per time step should be bounded by a constant independent of time
- Block-based PCA unattractive
- Alternative method: Kernel Recursive Least Squares (KRLS)
34KRLS
- Represent function as $f(x) = \sum_{i} \alpha_i k(x_i, x)$
- where $x_i$ are training points
- Desire a sparse solution (storage and time savings, generalization ability)
- Effective dimensionality of manifold spanned by training feature vectors may be much smaller than feature space dimension
- Identify linearly independent feature vectors that approximately span this manifold.
35KRLS
- Sequentially sample a stream of input/output pairs $(x_t, y_t)$
- At time step t, assume we have collected a dictionary of samples $D_{t-1} = \{\tilde{x}_j\}_{j=1}^{m_{t-1}}$
- where by construction $\{\phi(\tilde{x}_j)\}_{j=1}^{m_{t-1}}$ are linearly independent feature vectors
36KRLS
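- We encounter a new sample $x_t$.
- Test whether $\phi(x_t)$ is approximately linearly dependent on the dictionary feature vectors: $\delta_t = \min_a \| \sum_{j=1}^{m_{t-1}} a_j \phi(\tilde{x}_j) - \phi(x_t) \|^2 \le \nu$, where the sum is the dictionary approximation and $\nu$ is the threshold.
- If not, add it to dictionary.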
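The approximate linear dependence test has a closed form in terms of kernel values: the squared distance from the new feature vector to the span of the dictionary is delta = k(x, x) - k_vec^T K^{-1} k_vec, where K is the dictionary kernel matrix. The sketch below recomputes K at every step for clarity, whereas a recursive implementation would update the inverse incrementally. The Gaussian kernel width, the threshold nu = 0.1, and the data are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    d = a - b
    return float(np.exp(-(d @ d) / (2.0 * sigma**2)))

def ald_delta(dictionary, x, kernel):
    """Squared feature-space distance from phi(x) to the dictionary span."""
    K = np.array([[kernel(u, v) for v in dictionary] for u in dictionary])
    k_vec = np.array([kernel(u, x) for u in dictionary])
    a = np.linalg.solve(K, k_vec)   # best approximation coefficients
    return kernel(x, x) - float(k_vec @ a)

def build_dictionary(samples, nu, kernel=gaussian_kernel):
    """Admit a sample only if it is not approximately linearly dependent."""
    dictionary = [samples[0]]
    for x in samples[1:]:
        if ald_delta(dictionary, x, kernel) > nu:
            dictionary.append(x)
    return dictionary

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(300, 2))
dictionary = build_dictionary(samples, nu=0.1)
```

On a bounded input region the dictionary stops growing after a modest number of elements, since later samples are already well approximated.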
37KRLS Properties
- Provided input set X is compact, the number of dictionary elements is finite.
- Approximate version of kernel PCA
- eigenvectors with eigenvalues significantly larger than $\nu$ are projected almost entirely onto the dictionary set.
- $O(m^2)$ memory and $O(t m^2)$ time
- Compare exact kernel PCA: $O(t^2)$ memory and $O(t^2 p)$ time.
38Application in Networks
- Data set is the Origin-Destination flows (11x11 matrix, i.e. a 121-dimensional vector per measurement interval).
- Normalized, these comprise the features.
- We use the total traffic per measurement interval as the associated value y
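The feature construction can be sketched as below. The talk does not say which normalization was used, so dividing by total traffic is an assumption made for this example, as are the function name and the synthetic packet counts.

```python
import numpy as np

def od_features(od_matrix):
    """Flatten an OD flow matrix into (normalized feature vector, total traffic).

    Normalizing by total traffic is an assumption made for this sketch.
    """
    x = np.asarray(od_matrix, dtype=float).ravel()  # 11x11 -> 121 entries
    total = x.sum()
    if total == 0:
        return x, 0.0
    return x / total, float(total)

rng = np.random.default_rng(0)
od = rng.poisson(lam=50, size=(11, 11))  # synthetic packet counts
x_t, y_t = od_features(od)
```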
39Total traffic
[Figure: number of packets versus measurement interval]
00:00 hrs on Aug 10, 2005 to 23:59 hrs on Aug 21, 2005 at the Chicago router. This gives 3456 five-minute intervals over the 12-day period.
40Origin-Destination Flows
[Figure: OD flow snapshots at t = 1, 100, 1300, and 3000]
41Building the Dictionary
[Figure: δ and dictionary elements versus measurement interval; Gaussian kernel with ν = 0.2, linear kernel with ν = 0.1]
42Dictionary Components
[Figure: dictionary elements 5, 6, 20, and 22]
43KRLS Anomaly Detection Algorithm
- Based on $x_t$, evaluate $\delta_t$.
- If $\delta_t < \nu_1$, green-light traffic.
- If $\delta_t > \nu_2$, raise red alarm.
- If $\nu_1 < \delta_t < \nu_2$, raise orange alarm.
- Test usefulness of $x_t$. (Does $\phi(x_t)$ provide good support for ensuing vectors?)
- If yes, add $x_t$ to the dictionary.
- If no, raise red alarm.
- Remove any obsolete dictionary elements
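The first stage of the algorithm is a two-threshold rule on the projection error. The sketch below reads the red-alarm condition as the error exceeding the upper threshold, which is the only reading consistent with an orange band lying between the two thresholds. The threshold values are hypothetical, and the later usefulness test and dictionary pruning are not shown.

```python
def classify(delta_t, nu1, nu2):
    """Map the projection error delta_t to an alarm level (assumes nu1 < nu2)."""
    if delta_t < nu1:
        return "green"
    if delta_t > nu2:
        return "red"
    return "orange"

levels = [classify(d, nu1=0.1, nu2=0.3) for d in (0.02, 0.2, 0.5)]
```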
44Evaluating Usefulness
[Figure: kernel value versus timestep for normal, obsolete, and anomalous elements]
45Anomaly Detection
[Figure: KRLS, PCA, and OCNM detection statistics versus timestep; axis labels: magnitude of residual, Euclidean distance]
46PCA versus KRLS: Anomaly 1
[Figure: number of IP flows versus timestep]
47PCA versus KRLS: Anomaly 2
[Figure: number of IP flows and magnitude of projection versus timestep]
48Summary and Challenges
- Network monitoring presents challenges on different fronts
- Constraints on available measurements (reconstruction based on partial views)
- High-rate, high-dimensional, distributed data
- (Some of the many) open questions
- Tomography: network models, spatial and temporal correlations, optimal sampling, multiple sources.
- Anomaly detection: thresholds, dictionary control, feature space, dataset
49Fig 3
[Figure: detection rate (%) versus false alarm rate (%)]
50Particle filtering
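Objective: estimate expectations $E_{\pi_t}[f] = \int f(x)\,\pi_t(x)\,dx$
with respect to a sequence of distributions $\{\pi_t\}$ known up to a normalizing constant, i.e. $\pi_t(x) = \gamma_t(x) / Z_t$
Monte Carlo: obtain N weighted samples $\{x_t^{(i)}, w_t^{(i)}\}_{i=1}^N$
where $w_t^{(i)} > 0$ and $\sum_{i=1}^N w_t^{(i)} = 1$,
such that $\sum_{i=1}^N w_t^{(i)} f(x_t^{(i)}) \to E_{\pi_t}[f]$ as $N \to \infty$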
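Estimating an expectation under a distribution known only up to its normalizing constant can be illustrated with self-normalized importance sampling for a single distribution, which is the building block a particle filter applies sequentially. The target, proposal, and function names below are assumptions chosen for the example; the unnormalized target density is written `gamma`.

```python
import numpy as np

def snis_expectation(f, log_gamma, proposal_sample, proposal_logpdf, n, seed=0):
    """Self-normalized importance sampling estimate of E_pi[f(X)].

    pi(x) = gamma(x) / Z with Z unknown; only log_gamma is evaluated.
    """
    rng = np.random.default_rng(seed)
    x = proposal_sample(rng, n)
    log_w = log_gamma(x) - proposal_logpdf(x)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                      # weights are non-negative and sum to one
    return float(np.sum(w * f(x))), w

# Target: N(2, 1) known only up to a constant. Proposal: N(0, 3^2).
log_gamma = lambda x: -0.5 * (x - 2.0) ** 2
proposal_sample = lambda rng, n: rng.normal(0.0, 3.0, size=n)
proposal_logpdf = lambda x: -0.5 * (x / 3.0) ** 2  # constants cancel on normalizing
mean_hat, weights = snis_expectation(lambda x: x, log_gamma,
                                     proposal_sample, proposal_logpdf, n=100_000)
```

Because the weights are normalized, any constant factor in the target or proposal density drops out, which is why the normalizing constant never needs to be computed.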