New Directions in Traffic Measurement and Accounting, Cristian Estan

1
New Directions in Traffic Measurement and Accounting
Cristian Estan (UCSD), George Varghese (UCSD)
Discussion Leaders: Andrew Levine, Jeff Mitchell
Reviewed by: Michela Becchi
2
Outline
  • Introduction
  • Cisco NetFlow
  • Sample and Hold, Multistage Filters
  • Analytical Evaluation
  • Comparison
  • Measurements
  • Conclusions

3
Introduction
  • Measuring and monitoring network traffic on
    Internet backbones
      • Long-term traffic engineering (traffic rerouting
        and link upgrades)
      • Short-term monitoring (hot-spot and DoS attack
        detection)
      • Accounting (usage-based pricing)
  • Scalability problem
      • FixWest, MCI traces: a million flows/hour between
        end-host pairs

4
Cisco NetFlow
  • Flow: unidirectional stream of data identified by
      • Source IP address and port
      • Destination IP address and port
      • Protocol
      • TOS byte
      • Rx router interface
  • An entry in DRAM for each flow
  • Heuristics for end-of-flow detection
  • Flow data exported via UDP packets from routers
    to collection server for processing
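A minimal Python sketch of the per-flow state described on this slide, assuming one table entry per distinct flow key. The field names, the `account` helper, and the packet/byte counters are illustrative assumptions, not NetFlow's actual record layout; end-of-flow heuristics and UDP export are left out.

```python
from collections import namedtuple

# Hypothetical field names; the seven fields mirror the flow definition on the slide.
FlowKey = namedtuple(
    "FlowKey",
    "src_ip src_port dst_ip dst_port protocol tos rx_interface",
)


def account(flow_table, key, packet_bytes):
    """Add one packet to the flow's entry, creating the entry on first sight."""
    entry = flow_table.setdefault(key, {"packets": 0, "bytes": 0})
    entry["packets"] += 1
    entry["bytes"] += packet_bytes
    return entry


flow_table = {}
key = FlowKey("10.0.0.1", 1234, "10.0.0.2", 80, 6, 0, 1)
account(flow_table, key, 1500)
account(flow_table, key, 40)
print(flow_table[key])
```

Because every distinct key gets its own entry, the table grows with the number of flows, which is the scalability concern raised in the introduction.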

5
Cisco NetFlow - problems
  • Processing overhead
      • Interfaces faster than OC3 (155 Mbps) slowed down
        by memory cache updates
  • Collection overhead
      • Collection server
      • Network connection
  • NetFlow Aggregation (based on IP prefixes, ASes,
    ports)
      • Extra aggregation cache
      • Only aggregated data exported to collection
        server
      • Problem: large number of aggregates

6
Sampled NetFlow
  • Sampling packets
  • Per flow information based on samples
  • Problems
      • Inaccurate (sampling and losses)
      • Memory intensive
      • Slow (DRAM needed)
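To illustrate why per-flow information built from samples can be inaccurate, here is a small Python sketch that counts every x-th packet and scales the sampled bytes by x. Deterministic 1-in-x sampling, the function name, and the toy packet list are assumptions made for illustration; this is not Cisco's implementation.

```python
def sampled_byte_estimates(packets, x):
    """packets: iterable of (flow_id, size). Count every x-th packet, then scale by x."""
    sampled = {}
    for i, (flow_id, size) in enumerate(packets):
        if i % x == 0:
            sampled[flow_id] = sampled.get(flow_id, 0) + size
    return {flow_id: total * x for flow_id, total in sampled.items()}


# Flow A sends 800 bytes in total, flow B sends 200 bytes.
packets = [("A", 100)] * 8 + [("B", 100)] * 2
estimates = sampled_byte_estimates(packets, x=4)
print(estimates)
```

In this toy run the estimate for A happens to match its true 800 bytes, while B is estimated at 400 bytes, twice its real size: a single sampled packet is scaled up as if it stood for four.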

7
Idea
  • A small percentage of flows accounts for a large
    percentage of the traffic
  • Algorithms for identifying large flows
  • Use of SRAM instead of DRAM
  • Categorize algorithms by
      • Memory size and memory references
      • False negatives
      • False positives
      • Expected error in traffic estimates

8
Algorithms
  • Sample and Hold
      • Sample to determine which flows to track
      • Update the flow entry for every subsequent packet
        belonging to the flow
  • Multistage Filters
      • Use multiple tables of counters (stages), each
        indexed by a hash function computed on the flow ID
      • Different stages have independent hash functions
      • For each packet and each stage, compute the hash
        of the flow ID and add the packet size to the
        corresponding counter
      • Add a flow to flow memory only when its counters
        in all stages reach the threshold

9
Sample and Hold
[Figure: transmitted packets and flow memory. Legend: sampled packet (probability 1/3), entry created, entry updated. Flow memory entries shown: F1 (counts 1, 2, 3) and F3 (counts 1, 2).]
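A minimal Python sketch of sample and hold as the slides describe it: a flow without an entry gets one only if one of its packets is sampled, and once the entry exists every later packet of that flow is counted. Sampling each byte with probability `p` (so a packet of `size` bytes is sampled with probability `1 - (1 - p) ** size`), the function name, and the `rng` parameter are assumptions of this sketch.

```python
import random


def sample_and_hold(packets, p, rng=None):
    """packets: iterable of (flow_id, size); p: per-byte sampling probability."""
    rng = rng or random.Random()
    flow_memory = {}
    for flow_id, size in packets:
        if flow_id in flow_memory:
            # Hold: the flow already has an entry, so count every packet.
            flow_memory[flow_id] += size
        elif rng.random() < 1 - (1 - p) ** size:
            # Sample: create the entry, starting with the sampled packet.
            flow_memory[flow_id] = size
    return flow_memory


packets = [("F1", 100), ("F2", 40), ("F1", 100), ("F3", 60), ("F1", 100)]
tracked = sample_and_hold(packets, p=0.01, rng=random.Random(0))
print(tracked)
```

A tracked flow is never overestimated: the count only misses the bytes sent before the entry was created.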
10-18
Multistage Filters
[Figure sequence: each packet's flow ID is hashed (Hash(Pink), Hash(Green)) into an array of counters shown next to the flow memory. Captions, in order: "Collisions are OK"; "Reached threshold" and "Insert" (stream1 enters flow memory with count 1); stream2 enters flow memory with count 1; the final frame labels the counter array "Stage 1".]
Parallel vs. Serial Multistage Filters
  • Threshold for serial filters: T/d (d = number of
    stages)
  • Parallel filters perform better on traces of
    actual traffic
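For contrast with the parallel arrangement, here is a minimal Python sketch of a serial filter, assuming a packet is handed to the next stage only once its counter in the current stage has reached T/d. The helper names and the salted SHA-256 hashing are assumptions of this sketch.

```python
import hashlib


def bucket(stage, flow_id, buckets):
    # Salting with the stage number stands in for independent hash functions.
    digest = hashlib.sha256(f"{stage}:{flow_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def serial_filter_passes(stages, flow_id, size, threshold):
    """Return True if the packet passes every stage of the serial filter."""
    per_stage = threshold / len(stages)  # T/d
    for stage, counters in enumerate(stages):
        i = bucket(stage, flow_id, len(counters))
        counters[i] += size
        if counters[i] < per_stage:
            return False  # later stages never see this packet
    return True


stages = [[0] * 16 for _ in range(2)]  # d = 2 stages
results = [serial_filter_passes(stages, "f", 100, threshold=400) for _ in range(3)]
print(results)
```

With T = 400 and d = 2, each stage uses a threshold of 200 bytes: the first packet stops at stage 1, the second reaches stage 2 but stops there, and the third passes both stages.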

20
Optimizations
  • Preserving entries
      • Nearly exact measurement of long-lived large
        flows
      • Bigger flow memory required
  • Early removal
      • A threshold R < T determines which entries added
        in the current interval are kept
  • Shielding
      • Avoid updating counters for flows already in
        flow memory
      • Reduction of false positives
  • Conservative update of the counters
      • Update normally only the smallest counter
      • No introduction of false negatives
      • Reduction of false positives

21
Conservative update of counters
[Figure legend: gray = all prior packets]
22
Conservative update of counters
23
Conservative update of counters
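A minimal Python sketch of conservative update combined with shielding, under one reading of the optimizations slide: the smallest counter is incremented normally, the other counters are raised only as far as the new value of the smallest, and flows already in flow memory bypass the counters. Function names and the list-of-counters representation are assumptions of this sketch.

```python
def conservative_update(counters, size):
    """counters: the values this flow hashes to, one per stage. Returns the new values."""
    target = min(counters) + size  # the smallest counter is updated normally
    return [max(c, target) for c in counters]  # others rise no higher than needed


def add_packet(flow_memory, flow_id, counters, size, threshold):
    if flow_id in flow_memory:
        # Shielding: tracked flows are counted exactly and leave the counters alone.
        flow_memory[flow_id] += size
        return counters
    updated = conservative_update(counters, size)
    if min(updated) >= threshold:
        flow_memory[flow_id] = size
    return updated


flow_memory = {}
counters = add_packet(flow_memory, "f", [5, 2, 9], size=3, threshold=5)
counters = add_packet(flow_memory, "f", counters, size=10, threshold=5)
print(counters, flow_memory)
```

A plain update of the first packet would have produced [8, 5, 12]; the conservative update leaves the counters at [5, 5, 9], so other flows sharing those counters are less likely to become false positives.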
24
Analytical Evaluation
  • Sample and Hold
      • Prob(false negative) = $(1-p)^T \approx e^{-O}$
      • Best estimate for flow size: $s = c + 1/p$
      • Upper bound for flow memory size: $OC/T$
      • Preserving entries: $2OC/T$
      • Early removal: $OC/T + C/R$
  • Parallel Multistage Filters
      • No false negatives
      • Prob(false positive): $f(1/k)^d$
      • Upper bound for flow size estimate error:
        $f(T, 1/k)$
      • Bound on memory requirement
  • Where
      • T = threshold, p = sample probability ($O/T$), c =
        number of bytes counted for the flow,
      • C = link capacity, O = oversampling factor, d =
        filter depth,
      • k = stage strength ($Tb/C$)
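As a worked example of the sample-and-hold expressions on this slide, the Python sketch below plugs in numbers, reading the formulas as p = O/T, Prob(false negative) = (1-p)^T ≈ e^(-O), and memory bounds OC/T, 2OC/T, and OC/T + C/R. The function name and the sample values are assumptions; C and T are taken as bytes per measurement interval.

```python
import math


def sample_and_hold_bounds(C, T, O, R=None):
    """C: link capacity, T: threshold, O: oversampling factor, R: early-removal threshold."""
    p = O / T
    bounds = {
        "p": p,
        "false_negative_prob": (1 - p) ** T,
        "false_negative_approx": math.exp(-O),
        "memory_entries": O * C / T,
        "memory_preserving": 2 * O * C / T,
    }
    if R is not None:
        bounds["memory_early_removal"] = O * C / T + C / R
    return bounds


bounds = sample_and_hold_bounds(C=1_000_000, T=10_000, O=4, R=5_000)
for name, value in bounds.items():
    print(f"{name}: {value:.6g}")
```

With these numbers a flow that sends exactly T bytes is missed with probability close to e^(-4), about 1.8%, and the memory bounds are 400 entries, 800 with preserved entries, and 600 with early removal.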

25
Comparison w/ Memory Constraint
  • Assumptions
      • Memory constraint M
      • The considered flow produces traffic zC (e.g.
        z = 0.01)
  • Observations and Conclusions
      • Mz: oversampling factor
      • SH and MF: better accuracy but more memory
        accesses
      • SH and MF use SRAM; SNetflow can use DRAM as
        long as x is larger than the ratio of a DRAM
        memory access to an SRAM memory access

26
Comparison w/o Mem Constraint
  • Observations and Conclusions
      • By preserving entries, SH and MF provide
        exact estimation for long-lived large flows
      • SH and MF gain accuracy at the cost of a looser
        memory bound (u = zC/T)
      • Memory accesses as in the constrained-memory case
      • SH provides better accuracy for small
        measurement intervals => faster detection of new
        large flows
      • Increase in memory size => greater resource
        consumption

27
Dynamic threshold adaptation
  • How to dimension the algorithms
      • Conservative bounds vs. accuracy
      • Missing a priori knowledge of flow distribution
  • Dynamic adaptation
      • Keep decreasing the threshold below the
        conservative estimate until the flow memory is
        nearly full
      • Target usage of memory
      • Adjustment ratio of threshold
      • For stability purposes, adjustments made across 3
        intervals
  • NetFlow: fixed sampling rate
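A simplified Python sketch of the adaptation idea on this slide: compare recent flow-memory usage with a target and scale the threshold accordingly, using the last three intervals for stability. The proportional rule, the 0.9 target, and the function name are assumptions of this sketch and are not the paper's exact adjustment rule.

```python
def adapt_threshold(threshold, recent_usage, target_usage=0.9):
    """recent_usage: fraction of flow memory used in each past interval, oldest first."""
    window = recent_usage[-3:]
    if len(window) < 3:
        return threshold  # not enough history yet, keep the current threshold
    average = sum(window) / len(window)
    # Memory under-used: lower the threshold. Over target: raise it.
    return threshold * average / target_usage


print(adapt_threshold(1000, [0.45, 0.45, 0.45]))
print(adapt_threshold(1000, [1.0, 1.0, 1.0]))
```

When the memory is only half as full as the target, the threshold is roughly halved so that more flows are tracked in the next interval; when usage exceeds the target, the threshold goes up.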

28
Measurement setup
  • 3 unidirectional traces of Internet traffic
  • 3 flow definitions
  • Traces use between 13% and 17% of link capacity

29
Measurements
  • SH (threshold 0.025%, oversampling 4)
  • MF (strength = 3)
  • Differences between analytical bounds and actual
    behavior (lightly loaded links)
  • Effect of preserving entries and early removal

30
Measurements
  • Flow IDs: 5-tuple
  • MF always better than SH
  • SNetflow better for medium flows, worse for very
    large ones
  • AS: reduced number of flows (entries in flow
    memory)
  • Flow IDs: destination IP
  • Flow IDs: ASes

31
Conclusions
  • Focus on identifying large flows, which create
    the majority of network traffic
  • Proposal of two techniques
      • Providing higher accuracy than Sampled NetFlow
      • Using limited memory resources (SRAM)
  • Mechanism to make the algorithms adaptable
  • Analytical evaluation providing theoretical
    bounds
  • Experimental measurements showing the validity of
    the proposed algorithms

32
Future work
  • Generalize algorithms to automatically extract
    flow definitions for large flows
  • Deepen analysis, especially to cover discrepancy
    between theory and experimental measurements
  • Explore the commonalities with other research
    areas (e.g. data mining, architecture,
    compilers) where issues related to data volume
    and high speed also hold

33
The End
  • Questions?

34
Zipf distribution
  • Characteristics
      • A few elements score very high
      • A medium number of elements have a medium score
      • A huge number of elements score very low
  • Examples
      • Use of words in a natural language
      • Web use (e.g. www.sun.com website accesses)
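To make the shape of a Zipf distribution concrete, the Python sketch below computes the share of the total held by the top-ranked elements when the score of rank r is proportional to 1/r. The exponent of 1, the population of 10,000 elements, and the function name are assumptions chosen for illustration, not measurements from the traces.

```python
def zipf_top_share(n, top_fraction, exponent=1.0):
    """Share of the total score held by the top `top_fraction` of n ranked elements."""
    weights = [1 / rank ** exponent for rank in range(1, n + 1)]
    top = max(1, int(n * top_fraction))
    return sum(weights[:top]) / sum(weights)


share = zipf_top_share(10_000, 0.01)
print(f"top 1% of elements hold {share:.0%} of the total")
```

Under these assumptions the top 1% of elements hold roughly half of the total, which matches the earlier observation that a small percentage of flows accounts for a large percentage of the traffic.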