Network-Wide Traffic Analysis: Methods and Applications

1
Network-Wide Traffic Analysis: Methods and Applications
  • Anukool Lakhina, Ph.D. Proposal

Committee: Azer Bestavros (chair), John Byers, Mark
Crovella (advisor), Christophe Diot, Eric Kolaczyk
2
Network Traffic
  • Fundamental unit of information that computer
    networks carry
  • Networks are built to deliver traffic
  • Important to study in order to
  • Understand performance of network components,
  • Understand user communication patterns,
  • Manage and operate networks

The other traffic.
3
Traditional Traffic Analysis
  • Focus on
  • Short, stationary periods
  • Traffic on a single link in isolation
  • Volume descriptors (i.e., bytes, packets)
  • Principal results
  • Models for single-link traffic

4
What operators care about
  • Focus on
  • Long, non-stationary periods
  • Traffic on multiple links simultaneously
  • Volume and feature metrics
  • bytes, packets, packet header fields, etc.
  • Principal goals
  • Traffic engineering
  • Anomaly detection
  • Capacity planning

5
Need for Network-Wide Analysis
  • Traffic Engineering: How does traffic move
    throughout the network?
  • Capacity Planning: How much and where in the network
    to upgrade?
  • Attack/Anomaly Detection: Which links show
    unusual traffic?

6
Network-Wide Traffic Analysis
  • The simultaneous examination of multi-feature
    traffic from multiple resources (links or
    routers) in a network

7
Outline for rest of talk
  • Challenges of Network-Wide Analysis
  • Approach and Methods
  • Dimension Analysis of Network-Wide Traffic
  • Subspace Methods for Anomaly Diagnosis
  • Applications and Results
  • Volume Anomalies
  • Feature Anomalies
  • Thesis Outline and Plan

8
Network-Wide Analysis is Difficult
  • Measuring and modeling traffic volume on all
    links simultaneously is challenging
  • 100s of links in large IP backbones
  • Even single link modeling is difficult
  • High-dimensional timeseries
  • Curse of dimensionality
  • Correlation and redundancy in link traffic
  • Is there a more fundamental representation?

9
Origin-Destination Traffic Flows
  • Traffic entering the network at the origin and
    leaving the network at the destination

10
Why Origin-Destination Flows?
[Figure: link traffic vs. time]
  • All link traffic arises from the superposition
    of OD flows
  • OD flows capture distinct traffic demands, with no
    redundant traffic
  • A useful primitive for whole-network analysis
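
The superposition claim on this slide can be illustrated with a small routing matrix. This is a minimal sketch on a hypothetical three-node line network; the topology, the name `routing`, and the traffic values are assumptions made for illustration and do not come from the talk.

```python
import numpy as np

# Hypothetical 3-node line network A - B - C with two directed links:
#   link 0: A -> B,  link 1: B -> C
# OD flows (columns): A->B, A->C, B->C
# routing[i, j] = 1 if OD flow j traverses link i.
routing = np.array([
    [1, 1, 0],   # link A->B carries flows A->B and A->C
    [0, 1, 1],   # link B->C carries flows A->C and B->C
])

# Made-up traffic volume of each OD flow in one time bin.
od_traffic = np.array([10.0, 4.0, 7.0])

# Each link's traffic is the sum of the OD flows routed over it.
link_traffic = routing @ od_traffic
print(link_traffic)   # [14. 11.]
```

Flow A->C is counted on both links, which is the redundancy in link traffic that the OD-flow view removes.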

11
But, This is Still Difficult!
How do we extract meaning from such noisy,
high-dimensional traffic data in a systematic
manner?
12
High Dimensionality: A General Strategy
  • Look for a small number of patterns that are
    widely present throughout network
  • Instance of a general strategy: seeking a
    low-dimensional representation preserving the
    most important features of data
  • Commonly used tool: Principal Component Analysis
    (PCA)

13
Principal Component Analysis
For any given dataset, PCA finds a new coordinate
system that maps maximum variability in the data
to a minimum number of axes. The new axes are called
Principal Axes or Components.
[Figure: data plotted on axes x1 and x2]
14
PCA on OD flows
[Figure: traffic matrix X (time × OD pairs) decomposed as $X = U S V^T$; panels labeled "Eigenflow" and "PC"]
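
The decomposition $X = U S V^T$ can be computed with a standard SVD routine. The sketch below uses random data in place of real OD-flow measurements; the sizes and the variable names `eigenflows` and `principal_components` are illustrative assumptions, and reading the columns of U as eigenflows and the columns of V as principal components follows the figure labels above.

```python
import numpy as np

rng = np.random.default_rng(0)
t, p = 200, 6                       # time bins x OD pairs (illustrative sizes)
X = rng.random((t, p))              # stand-in for measured OD-flow traffic
X = X - X.mean(axis=0)              # center each OD flow's timeseries

# Thin SVD: U is (t x p), s holds p singular values, Vt is (p x p).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

eigenflows = U                      # column i: temporal pattern i (length t)
principal_components = Vt.T         # column i: principal axis i (length p)

# The three factors reproduce the traffic matrix.
X_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(X, X_rebuilt))
```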
15
Low Intrinsic Dimensionality of OD Flow Traffic
Studied via Principal Component Analysis.
Key result: OD flow traffic is well approximated by
a low-dimensional subspace.
For example: traffic on 121 OD flows is well approximated
in a space of only 4 dimensions.
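
One way to check for low intrinsic dimensionality is to count how many singular values are needed to capture most of the energy. The sketch below runs on synthetic traffic built from two shared cycles plus noise, so the data, the 95% cutoff, and the assumption of 288 time bins per day are all illustrative. Only the matrix size (2016 × 121) is borrowed from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
t, p = 2016, 121                     # matrix size quoted in the talk; data is synthetic
time = np.arange(t)
daily = np.sin(2 * np.pi * time / 288)     # assumes 288 time bins per day
weekly = np.sin(2 * np.pi * time / 2016)   # one slow cycle over the whole trace

# Every synthetic OD flow mixes the two shared cycles, plus a little noise.
X = (np.outer(daily, rng.random(p))
     + np.outer(weekly, rng.random(p))
     + 0.05 * rng.standard_normal((t, p)))
X -= X.mean(axis=0)

s = np.linalg.svd(X, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)    # fraction of energy in the top-i axes

# Smallest number of dimensions capturing at least 95% of the energy.
k = int(np.searchsorted(energy, 0.95)) + 1
print(k, energy[:4])
```

Because every flow here follows the same few cycles, a handful of dimensions suffices, which mirrors the reason for low dimensionality given on the next slide.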
16
Reasons for Low Dimensionality
  • OD flows tend to vary according to common
    (user-driven) daily and weekly cycles

17
Outline for rest of talk
  • Challenges of Network-Wide Analysis
  • Approach and Methods
  • ✓ Dimension Analysis of Network-Wide Traffic
  • Subspace Methods for Anomaly Diagnosis
  • Applications and Results
  • Volume Anomalies
  • Feature Anomalies
  • Thesis Outline and Plan

18
Network Anomaly Diagnosis
  • Is my customer being attacked?
  • Is someone probing in my network?
  • Are there worms spreading?
  • A sudden traffic shift?
  • An equipment outage?
  • Something never seen before?

A general, unsupervised method for reliably
detecting and classifying network anomalies is
needed
19
Example Network-Wide Anomalies
  • Distributed Attack
  • Traffic Shifts

Working Hypothesis: Effective diagnosis of
network anomalies requires a whole-network
approach
20
Finding a needle in the haystack(s)
Problem: extracting typical behavior and
anomalies from multi-way, high-dimensional traffic
21
Turning High Dimensionality into a Strength
  • Traditional traffic anomaly diagnosis builds
    normality in time
  • Methods exploit temporal correlation
  • Whole-network view is an attempt to examine
    normality in space
  • Make use of spatial correlation
  • Useful for anomaly diagnosis
  • Strong trends exhibited throughout network are
    likely to be normal
  • Network-wide anomalies can be detected

22
The Subspace Method
  • An approach to separate normal from anomalous
    network-wide traffic
  • Designate temporal patterns most common to all
    the OD flows as the normal subspace
  • Remaining temporal patterns form the anomalous
    subspace
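  • Then, decompose traffic in all OD flows by
    projecting onto the two subspaces to obtain
    $\mathbf{x} = \hat{\mathbf{x}} + \tilde{\mathbf{x}}$

where $\mathbf{x}$ is the traffic vector of all OD flows at a particular
point in time, $\hat{\mathbf{x}}$ is the normal traffic vector, and
$\tilde{\mathbf{x}}$ is the residual traffic vector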
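
The decomposition into normal and residual traffic can be sketched as a pair of orthogonal projections. This is a minimal illustration, not the author's implementation: the function name `split_traffic`, the choice of the top `k` principal axes as the normal subspace, and the random demo data are assumptions.

```python
import numpy as np

def split_traffic(X, k):
    """Split each row of X (time x OD flows) into normal and residual parts.

    The normal subspace is taken to be the span of the top-k principal axes.
    """
    Xc = X - X.mean(axis=0)                 # center each OD flow
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                            # (flows x k) orthonormal basis
    normal = Xc @ P @ P.T                   # projection onto the normal subspace
    residual = Xc - normal                  # what is left: the anomalous subspace
    return normal, residual

rng = np.random.default_rng(0)
X = rng.random((50, 8))                     # stand-in traffic matrix
normal, residual = split_traffic(X, k=2)

# At every time bin the two parts add back up to the centered traffic vector.
print(np.allclose(normal + residual, X - X.mean(axis=0)))
```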
23
The Subspace Method, Geometrically
In general, anomalous traffic results in a large
value of $\|\tilde{\mathbf{x}}\|$, the size of the residual traffic vector
[Figure: axes "Traffic on Flow 1" and "Traffic on Flow 2"]
24
Handling Multiway Traffic Data
[Figure: Feature Block 1, Feature Block 2, Feature Block 3, Feature Block 4]
  • Unwrap the multiway matrix into a single
    matrix
  • Then, apply the subspace method on the merged
    matrix, to write all traffic = normal + residual
  • Detect anomalies by monitoring the size of the residual
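
Unwrapping can be sketched as placing the per-feature matrices side by side. In this illustration the four feature names are taken from later slides, while the random data, the helper `unit_energy`, and the choice to rescale each block to unit energy before merging are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
t, p = 100, 5                                   # time bins x OD flows (illustrative)
feature_names = ["SrcIP", "SrcPort", "DstIP", "DstPort"]

# One (t x p) block per traffic feature; random stand-ins for real measurements.
blocks = {name: rng.random((t, p)) for name in feature_names}

def unit_energy(block):
    """Center a block and scale it so its total energy (Frobenius norm) is 1."""
    centered = block - block.mean(axis=0)
    return centered / np.linalg.norm(centered)

# Unwrap: concatenate the blocks column-wise into one (t x 4p) matrix.
merged = np.hstack([unit_energy(blocks[name]) for name in feature_names])
print(merged.shape)   # (100, 20)
```

Rescaling keeps any one feature from dominating the merged matrix purely because of its numeric range.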
25
Outline
  • Challenges of Network-Wide Analysis
  • Approach and Methods
  • ✓ Dimension Analysis of Network-Wide Traffic
  • ✓ Subspace Methods for Anomaly Diagnosis
  • Applications and Sample Results
  • Volume Anomalies
  • Feature Anomalies
  • Thesis Outline and Plan

26
Subspace Detection on Byte Traffic
[Figure: value of $\|\tilde{\mathbf{x}}\|^2$, the squared prediction error (SPE) of the residual traffic, over time]
SPE at anomaly time points clearly stands out
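
Detection by monitoring the SPE can be sketched on synthetic data with one injected spike. Everything here is illustrative: the traffic is a single shared sinusoid plus noise, the spike size and location are made up, a one-dimensional normal subspace is used because the data has one shared pattern, and the threshold (ten times the median SPE) is an ad hoc assumption, not the statistical test used in the original work.

```python
import numpy as np

rng = np.random.default_rng(2)
t, p, k = 500, 20, 1                       # k = 1: one shared pattern in this toy data
time = np.arange(t)
pattern = np.sin(2 * np.pi * time / 100)

# All flows follow the same pattern with different weights, plus noise.
X = np.outer(pattern, 1 + rng.random(p)) + 0.05 * rng.standard_normal((t, p))
X[300, 7] += 2.0                           # injected volume anomaly (hypothetical)

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:k].T                               # basis of the normal subspace
residual = Xc - Xc @ P @ P.T               # residual traffic at every time bin
spe = np.sum(residual**2, axis=1)          # squared prediction error per time bin

threshold = 10 * np.median(spe)            # ad hoc threshold, an assumption
alarms = np.flatnonzero(spe > threshold)
print(alarms)
```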
27
An example PF anomaly (DOS attack)
No dominant source IP; dominant dest. IP: 80% of
P and 92% of F traffic. Cause: DOS attack
28
Re-routing around failure (ingress-shift)
29
Beyond Volume Metrics: Feature Anomalies
  • Many important anomalies do not cause detectable
    disruptions in traffic volume
  • e.g., port scans, network scans, worms
  • To mitigate anomalies, must be able to
    automatically classify anomalies
  • Our thesis: Anomalies can be detected and
    distinguished by inspecting traffic features
    (SrcIP, SrcPort, DstIP, DstPort)
  • Key Challenge: Mining Anomalies in Network-Wide
    Multi-Feature Traffic

30
Harnessing Traffic Features
  • Typical Traffic

31
Feature Entropy Shows Promise
[Figure panels: Bytes, Packets, H(Dst IP), H(DstPort)]
Port scan dwarfed in volume metrics
But stands out in feature entropy, which also
reveals its structure
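
The entropy H(·) of a traffic feature can be computed from a histogram of how often each value is seen. The sketch below uses the standard sample entropy with base-2 logarithms; the function name `sample_entropy` and the packet counts are made up for illustration.

```python
import numpy as np

def sample_entropy(counts):
    """Entropy (in bits) of the empirical distribution given by `counts`."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]            # unseen values contribute nothing
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))

# Hypothetical packets-per-destination-port histogram for typical traffic.
typical_dst_ports = [500, 300, 150, 50]

# A port scan adds one packet to each of 200 extra ports:
# volume grows by only 20%, but the distribution becomes far more dispersed.
scan_dst_ports = typical_dst_ports + [1] * 200

print(sample_entropy(typical_dst_ports))
print(sample_entropy(scan_dst_ports))
```

The scan barely changes the packet count, yet it raises H(DstPort) noticeably, which is the effect the slide describes.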
32
Final Thoughts
  • Network-Wide Traffic Analysis needed for many
    operational problems
  • Central Difficulty: High Dimensionality
  • Dimensional Analysis yields accurate
    low-dimensional descriptions of network-wide
    traffic
  • Subspace Methods leverage low dimensionality for
    network-wide anomaly diagnosis
  • Detection and Classification
  • Promising approach for analyzing network traffic

33
Outline for rest of talk
  • Focus on Anomaly Diagnosis
  • Challenges of Network-Wide Analysis
  • Approach and Methods
  • ✓ Structural Analysis of Network-Wide Traffic
  • ✓ Subspace Methods for Anomaly Diagnosis
  • Applications and Sample Results
  • Volume Anomalies
  • Feature Anomalies
  • Thesis Outline and Plan

34
Thesis Outline
  • Part I Background
  • Part II Methods
  • Part III Applications
  • Part IV Summary

35
Thesis Outline
  • Introduction: Motivating problems, Need for
    Network-Wide Analysis, Challenges, Thesis
    Organization
  • Related Work: Network traffic analysis, Network
    anomalies
  • Part I Background
  • Part II Methods
  • Part III Applications
  • Part IV Summary

36
Thesis Outline
  • Measurement Methods: Measuring network-wide
    traffic, Networks studied [PTL IMC04]
  • Network-Wide Traffic Analysis: Dimension Analysis
    with PCA [LPC SIGMETRICS04]
  • Subspace Methods for Anomaly Diagnosis
  • Traffic Subspaces, Subspace Method, Multi-way
    Subspace Method [LCD SIGCOMM04, LCD SIGCOMM05]
  • Part I Background
  • Part II Methods
  • Part III Applications
  • Part IV Summary

37
Thesis Outline
  • Volume Anomalies: Volume anomalies in link
    traffic, in OD flow traffic
  • [LCD SIGCOMM04, LCD IMC04]
  • Feature Anomalies: Detection and classification of
    network-wide anomalies [LCD SIGCOMM05, LCD FLOCON05]
  • Traffic Matrix Estimation
  • Low dimensionality for TM estimation,
    comparative study [SLT SIGMETRICS05]
  • Part I Background
  • Part II Methods
  • Part III Applications
  • Part IV Summary

38
Thesis Outline
  • Implementation: Issues related to implementation
    and deployment of methods
  • Conclusions: Final thoughts, summary, and future
    work
  • Part I Background
  • Part II Methods
  • Part III Applications
  • Part IV Summary

39
Tentative Schedule
  • September 16: Proposal defense
  • November 1: First draft of thesis delivered to
    committee
  • First week of December: Ph.D. defense

40
Thanks!
41
Backup Slides
42
Discussion: Implementation
  • Subspace method is computed efficiently using the
    singular value decomposition (SVD)
  • SVD of a t × m matrix is O(tm²)
  • Time on a 2016 × 121 matrix: less than 3 sec
  • Many methods for more efficient computation in
    on-line settings, e.g.,
  • A sliding window approach, or
  • Incremental SVD techniques
  • Distributed implementations possible
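
The sliding-window idea mentioned above could look roughly like the following. This is a naive sketch that recomputes a full SVD for every new time bin. The function name `sliding_window_spe`, the window length, and the random data are assumptions, and an incremental SVD would avoid the repeated recomputation.

```python
import numpy as np

def sliding_window_spe(X, window, k):
    """SPE of each row of X, using a normal subspace fit on the previous rows.

    For row i, the subspace comes from rows i-window .. i-1 only, so the
    score never uses future data. The first `window` rows get NaN.
    """
    spe = np.full(len(X), np.nan)
    for i in range(window, len(X)):
        past = X[i - window:i]
        mean = past.mean(axis=0)
        _, _, Vt = np.linalg.svd(past - mean, full_matrices=False)
        P = Vt[:k].T                       # normal subspace from the window
        x = X[i] - mean
        r = x - P @ (P.T @ x)              # residual of the new traffic vector
        spe[i] = r @ r
    return spe

rng = np.random.default_rng(4)
X = rng.random((60, 8))                    # stand-in traffic matrix
spe = sliding_window_spe(X, window=40, k=2)
print(spe[40:45])
```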

43
Our Approach
  • Key Idea
  • Anomalies can be detected and distinguished
    by inspecting traffic features (SrcIP,
    SrcPort, DstIP, DstPort)
  • Overview of Methodology
  • Inspect distributions of traffic features
  • Correlate distributions network-wide to detect
    anomalies
  • Cluster on anomaly features to classify
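
The clustering step can be illustrated with a small k-means routine. The slide does not name the clustering algorithm, so k-means is an assumption of this sketch, as are the function name `kmeans`, the farthest-point initialisation, and the three synthetic groups of two-dimensional points standing in for anomaly feature vectors.

```python
import numpy as np

def kmeans(points, k, iters=50):
    """Plain k-means with a deterministic farthest-point initialisation."""
    points = np.asarray(points, dtype=float)
    centres = [points[0]]
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen centre.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)          # assign to nearest centre
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):      # converged
            break
        centres = new_centres
    return labels, centres

rng = np.random.default_rng(5)
# Three hypothetical groups of anomalies in a 2-D feature space.
group_centres = np.array([[0.8, -0.8], [-0.8, -0.8], [-0.1, 0.9]])
points = np.vstack([c + 0.05 * rng.standard_normal((10, 2)) for c in group_centres])

labels, centres = kmeans(points, k=3)
print(labels)
```

Anomalies that land in the same cluster would then be treated as the same type.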

44
Principal Component Analysis
Coordinate transformation method
[Figure: original data on axes x1 and x2 with directions u1 and u2; transformed data on axes PC1 and PC2]
45
Traffic
46
Network Traffic
  • Fundamental unit of information carried by
    networks organized into data packets
  • Talk about packet traffic features, because we
    need this notion later
  • Talk about why we should study traffic (broadly)

47
Detection Illustration
[Figure: value of $\|\tilde{\mathbf{x}}\|^2$, the squared prediction error (SPE) of the residual traffic, over time]
SPE at anomaly time points clearly stands out
48
Sample Results On Detection
Multi-Source DOS [Hussain et al., 03]
Code Red Scan [Jung et al., 04]
  • Evaluation Methodology
  • Superimpose known anomaly traces into OD flows
  • Test sensitivity at varying anomaly intensities,
    by thinning trace
  • Results are average over a sequence of
    experiments

[Figure: detection rate vs. anomaly intensity (intensity
compared to average flow bytes), one panel per trace; curves: Entropy + Volume, Volume Alone; values shown: 6.3, 0.63, 1.3, 12]
49
Sample Results on Classification
[Figure: known labels vs. cluster results, plotted on axes Hres(Src IP) and Hres(Dst IP). Legend: Code Red scanning, single-source DOS attack, multi-source DOS attack]
Summary: Correctly classified 292 of 296 known
anomalies
50
Identification
  • An anomaly causes a displacement of the link
    traffic vector away from the normal subspace
  • The direction of the displacement gives
    information about the nature of the anomaly
  • Intuition: find the hypothesis that best
    describes the detected anomaly

51
Hypothesis-Based Identification
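  • Denote the set of all candidate anomalies (hypotheses)
  • Each adds link traffic in a direction specified by its hypothesis
  • In the presence of an anomaly, the link traffic vector is the
    normal traffic plus this added traffic
  • The size of the anomaly is found by minimizing the distance to
    the normal subspace in the direction of the anomaly

[Figure label: Normal Subspace]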
52
Selecting the Best Hypothesis
  • 1. For each hypothesized anomaly,
  • compute the residual traffic it accounts for
  • 2. Select the anomaly that accounts for the most

The best hypothesis (OD flow) accounts for
maximum residual traffic
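
The selection rule can be sketched as a least-squares fit of each candidate direction to the residual traffic. The symbols on these slides were lost in the transcript, so this is one reading under stated assumptions: the function name `identify`, the least-squares formulation, the 4-link routing matrix `A`, and the traffic values are all hypothetical.

```python
import numpy as np

def identify(y, candidates, P):
    """Return (index, size) of the hypothesis that best explains the residual of y.

    y          : link traffic vector (length m)
    candidates : (m x n) matrix; column i is the traffic added by hypothesis i
    P          : (m x k) orthonormal basis of the normal subspace
    """
    C_res = np.eye(len(y)) - P @ P.T           # projector onto the anomalous subspace
    y_res = C_res @ y                          # residual traffic
    best, best_err, best_size = None, np.inf, 0.0
    for i in range(candidates.shape[1]):
        theta = C_res @ candidates[:, i]       # hypothesis direction, residual part
        denom = theta @ theta
        if denom == 0:
            continue                           # invisible in the residual
        size = (theta @ y_res) / denom         # least-squares anomaly size
        err = np.linalg.norm(y_res - size * theta)   # residual left unexplained
        if err < best_err:
            best, best_err, best_size = i, err, size
    return best, best_size

# Hypothetical network: 4 links, 3 OD flows (column j marks the links of flow j).
A = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
P = np.ones((4, 1)) / 2.0                      # toy normal subspace: equal load on all links
y = 3.0 * np.ones(4) + 5.0 * A[:, 1]           # normal traffic plus 5 units on flow 1

best, size = identify(y, A, P)
print(best, size)
```

Minimizing the unexplained residual is equivalent to choosing the hypothesis that accounts for the most residual traffic.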
53
But, This Is Still Complicated
  • Each OD flow serves a different customer
    population
  • No two OD flows carry same traffic
  • Are they still correlated?
  • Even more OD flows than links
  • High dimensional, multivariate timeseries
  • Still have curse of dimensionality

54
Whole-Network Diagnosis
  • Working Hypothesis: Effective diagnosis of
    network anomalies requires a whole-network
    approach

55
Outline
  • Subspace Method applied to Link Traffic
  • Problem: Volume Anomaly Diagnosis
  • Detection, Identification, Quantification
  • Validation
  • Subspace Method applied to Flow Traffic
  • Problem: General Anomaly Detection
  • Sample Results
  • Conclusions

56
PCA on OD flows
[Figure: traffic matrix X (time × OD pairs) decomposed as $X = U S V^T$; panels labeled "Eigenflow" and "PC"]
57
PCA on OD flows (2)
Each eigenflow is a weighted sum of all OD
flows. Eigenflows are orthonormal.

Singular values indicate the energy attributable
to a principal component.
Each OD flow is a weighted sum of all eigenflows.
60
All Anomalies
Anomalies visible in traffic
61
PCA on OD flows
  • Each principal axis is in the direction of maximum
    (remaining) energy in the set of OD flows
  • Ordered by amount of energy they capture
  • Eigenflow: set of OD flows mapped onto a
    principal axis; a common pattern
  • Ordered by most common to least common pattern
  • An OD flow is a weighted sum of eigenflows

62
Final Thoughts
  • OD flows are a useful primitive for whole-network
    traffic analysis
  • PCA forms an effective basis for a Structural
    Analysis of OD flows
  • Structural Analysis has many benefits
  • provides insight into nature of OD flows
  • allows feature-based decomposition of OD flows
  • provides leverage on many important problems