Title: Cristian Estan, George Varghese
1Data Streaming in Computer Networking
- Cristian Estan, George Varghese
- University of California, San Diego
2Talk structure
- Traditional streaming in networking
- Rules of the game
- Iteration paradigm: packet scheduling example
- New streaming problems
- Detecting malicious traffic
- Understanding network workloads
3Internet service model
[Figure: a packet consists of a header (destination IP address, source IP address, destination port, source port) and data; a flow crosses the Internet]
4Traditional router functions
IP Lookup
[Figure: router with incoming links 1-3 and outgoing links 1-3; the outgoing link for an arriving packet is not yet known (?)]
5Traditional router functions
IP Lookup
[Figure: router with incoming links 1-3 and outgoing links 1-3; the lookup result for the packet is Out2]
6Traditional router functions
Switching
[Figure: router with incoming links 1-3 and outgoing links 1-3; packets labeled Out1, Out2 and Out3 wait at the incoming links to be switched to their outgoing links]
7Traditional router functions
Scheduling
[Figure: router with incoming links 1-3 and outgoing links 1-3; packets of Flow 1, Flow 2 and Flow 3]
8Traditional router functions
Scheduling
[Figure: router with incoming links 1-3 and outgoing links 1-3; packets of Flow 1, Flow 3 and Flow 2]
9Rules of the game
- Wire speed processing
- At 40 gigabits/s, 8 nanoseconds per packet - need fast SRAM
- Limited SRAM (say 32 megabits) but millions of flows
- What does this mean for algorithms?
- Low worst case complexity bounds
- Low bounds on the amount of memory used
- Differences from databases
- One pass vs. multiple passes
- Worst case vs. average case
- Small constants vs. asymptotic complexity
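The numbers on this slide can be checked with a little arithmetic. The sketch below assumes a minimum-size 40-byte packet (the slide does not state the packet size) and reads "millions of flows" as one million; both values are illustrative assumptions.

```python
# Per-packet time budget at wire speed, and SRAM available per flow.
LINK_RATE_BPS = 40e9        # 40 gigabits/s, from the slide
MIN_PACKET_BYTES = 40       # assumption: minimum-size packet of 40 bytes
SRAM_BITS = 32e6            # 32 megabits, from the slide (taken as 32 * 10^6 bits)
NUM_FLOWS = 1_000_000       # assumption: "millions of flows" read as one million


def per_packet_budget_ns(link_rate_bps, packet_bytes):
    """Time in nanoseconds between back-to-back packets of the given size."""
    return packet_bytes * 8 / link_rate_bps * 1e9


def bits_per_flow(sram_bits, num_flows):
    """SRAM bits available per flow if every flow gets an equal share."""
    return sram_bits / num_flows


budget_ns = per_packet_budget_ns(LINK_RATE_BPS, MIN_PACKET_BYTES)
per_flow = bits_per_flow(SRAM_BITS, NUM_FLOWS)
print(f"{budget_ns:.1f} ns per packet, {per_flow:.0f} bits of SRAM per flow")
```

Under these assumptions the budget is 8 ns per packet and 32 bits of state per flow, which is why per-flow state in fast memory is not an option once the flow count reaches the millions.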
11Iteration paradigm
- Many networking algorithms use iteration in time
- Way to allow multi-pass algorithms without storing input by assuming inputs do not change quickly
- Many examples (MULTOPS for DoS detection [Gil01], CSFQ for scheduling [Stoica98])
- Would be nice to formalize tradeoff between quality of results and drift rate of input
12Example: Core Stateless FQ
- Mark rate R
- If R > F, drop with probability 1 - F/R
- Iteratively compute fair share F
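The drop rule and the iterative computation of F can be sketched as follows. The update step used here (scale F by capacity over accepted rate) is a simplified fixed-point iteration chosen for illustration; the flow rates, the capacity, and all function names are hypothetical, and the slide itself does not specify how F is updated.

```python
def drop_probability(rate, fair_share):
    """Drop probability for a packet marked with rate R, given fair share F."""
    if rate <= fair_share:
        return 0.0
    return 1.0 - fair_share / rate


def accepted_rate(rates, fair_share):
    """Expected forwarded rate: each flow is cut down to min(R, F)."""
    return sum(min(r, fair_share) for r in rates)


def iterate_fair_share(rates, capacity, rounds=60):
    """Iterate in time: rescale F until the accepted rate matches capacity."""
    if sum(rates) <= capacity:
        return max(rates)            # no congestion, nothing needs dropping
    fair_share = max(rates)
    for _ in range(rounds):
        fair_share *= capacity / accepted_rate(rates, fair_share)
    return fair_share


# Hypothetical example: three flows sharing a link of capacity 9.
flow_rates = [10.0, 4.0, 1.0]
F = iterate_fair_share(flow_rates, capacity=9.0)
p_drop = drop_probability(10.0, F)
print(round(F, 6), round(p_drop, 6))
```

Each round plays the role of one measurement interval: the router never stores the input, it only reuses the previous estimate of F, which works as long as the flow rates drift slowly.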
14New streaming problems
- Detecting malicious activity
- Flooding (denial of service attacks)
- Worms
- Scans looking for vulnerable servers
- Understanding workloads
- Billing
- Planning network growth
- Application mix
15Detecting malicious traffic
- Well defined building blocks
- Detecting large aggregates
- Similar to iceberg queries
- Counting active flows in an aggregate
- Similar to counting distinct values
- Many open problems, e.g. detecting worms and DoS attacks (not clear what the right formal problem statement is)
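To make the "detecting large aggregates" building block concrete, here is a minimal sketch of the Misra-Gries frequent-items algorithm, a well-known one-pass technique for this kind of iceberg query. It is swapped in for illustration and is not presented as the method used by the authors; the destination addresses in the example are made up.

```python
def misra_gries(stream, k):
    """One-pass summary with at most k - 1 counters.

    Any key that occurs more than len(stream) / k times is guaranteed
    to be among the returned keys (its counter may undercount).
    """
    counters = {}
    for key in stream:
        if key in counters:
            counters[key] += 1
        elif len(counters) < k - 1:
            counters[key] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for other in list(counters):
                counters[other] -= 1
                if counters[other] == 0:
                    del counters[other]
    return counters


# Hypothetical packet stream keyed by destination address.
packets = ["10.0.0.1"] * 6 + ["10.0.0.2"] * 3 + ["10.0.0.3", "10.0.0.4", "10.0.0.5"]
candidates = misra_gries(packets, k=3)
print(candidates)
```

Memory is bounded by k regardless of the number of distinct keys, which fits the limited-SRAM rule. The decrement step touches every counter, so the per-packet cost is small only in the amortized sense; a router implementation would need a variant with a bounded worst case.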
17Informal problem definition
[Figure: terabytes of measurement data go through analysis to produce traffic reports]
- Applications: 50% of traffic is Kazaa
- Sources: 20% of traffic comes from Steve's PC
18Informal problem definition
[Figure: terabytes of measurement data go through analysis to produce traffic reports]
- 20% is Kazaa from Steve's PC
- 50% is Kazaa from the dorms
19Formal problem definition
- Define clusters
- Atoms: fields 1 to n with hierarchies in each field including
- Cluster: intersection of one set from each field hierarchy
- Example: Source, Destination CS Net, App Email
- Threshold clusters
- Report traffic clusters above threshold T (e.g. 1% of traffic)
- Omit redundant clusters
- Compression rule: remove general clusters from report when their traffic can be inferred (up to error T) from non-overlapping more specific clusters
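The thresholding and compression steps can be sketched for a single field whose hierarchy is the IP prefix tree. The traffic values follow the unidimensional example in the backup slides of this deck; the threshold T = 100, the split of traffic within the address pairs 10.8.0.2/10.8.0.3 and 10.8.0.4/10.8.0.5, and the bottom-up reading of the compression rule are assumptions made for illustration.

```python
import ipaddress


def prefix_totals(traffic_by_address, root):
    """Sum traffic for every prefix from each address up to the root prefix."""
    root = ipaddress.ip_network(root)
    totals = {}
    for addr, volume in traffic_by_address.items():
        net = ipaddress.ip_network(addr)          # a /32 cluster
        if not net.subnet_of(root):
            raise ValueError(f"{addr} is outside {root}")
        while True:
            totals[net] = totals.get(net, 0) + volume
            if net == root:
                break
            net = net.supernet()                  # one level more general
    return totals


def compressed_report(totals, threshold):
    """Keep a cluster only if reported sub-clusters miss >= threshold of its traffic."""
    report, explained = {}, {}
    # Most specific prefixes first, so children are decided before parents.
    for net in sorted(totals, key=lambda n: n.prefixlen, reverse=True):
        children = [] if net.prefixlen == net.max_prefixlen else list(net.subnets())
        inferred = sum(explained.get(child, 0) for child in children)
        if totals[net] - inferred >= threshold:
            report[net] = totals[net]
            explained[net] = totals[net]
        else:
            explained[net] = inferred
    return report


# Per-address traffic; the split inside the first two address pairs is assumed.
traffic = {"10.8.0.2": 15, "10.8.0.3": 35, "10.8.0.4": 30, "10.8.0.5": 40,
           "10.8.0.8": 160, "10.8.0.9": 110, "10.8.0.10": 35, "10.8.0.14": 75}
T = 100  # assumed threshold
totals = prefix_totals(traffic, "10.8.0.0/28")
above_threshold = {str(n): v for n, v in totals.items() if v >= T}
report = {str(n): v for n, v in compressed_report(totals, T).items()}
print(sorted(above_threshold.items()))
print(sorted(report.items()))
```

Under these assumptions seven clusters exceed the threshold, and compression keeps four of them: the root 10.8.0.0/28 is dropped because its two reported children account for all of its traffic, while 10.8.0.8/30 and 10.8.0.8/31 are dropped because the two reported hosts explain them to within T.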
20Solution status
- The good
- Offline tool: AutoFocus (SIGCOMM 2003 paper)
- Detected worm, busy servers, squid cache, etc.
- Network managers like it
- The bad
- Takes long: 3 hours at T = 0.5% for a one-day trace
- Needs much memory: 300 Mbytes
- The wanted
- Streaming algorithm - we invite improvements
21Conclusions
- New rules: strict constraints on algorithms running in routers
- Iteration in time can give simple algorithms, but needs more formalization as to quality of results
- General open problems: many challenges in detecting malicious traffic such as worms and DoS attacks
- Specific open problem: computing traffic cluster reports in streaming fashion
22Thank you!
[Figure: Algorithms, Databases and Networking, with a question mark]
23Unidimensional clusters
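[Figure: traffic per source IP address]
- Addresses: 10.8.0.2, 10.8.0.3, 10.8.0.4, 10.8.0.5, 10.8.0.8, 10.8.0.9, 10.8.0.10, 10.8.0.14
- Traffic values shown: 40, 35, 15, 35, 30, 160, 110, 75 (total 500)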
24Unidimensional clusters
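[Figure: prefix tree over the source addresses, with traffic aggregated at each level]
- 10.8.0.0/28: 500
- 10.8.0.0/29: 120
- 10.8.0.8/29: 380
- 10.8.0.0/30: 50
- 10.8.0.4/30: 70
- 10.8.0.8/30: 305
- 10.8.0.12/30: 75
- 10.8.0.2/31: 50
- 10.8.0.4/31: 70
- 10.8.0.8/31: 270
- 10.8.0.10/31: 35
- 10.8.0.14/31: 75
- Leaf addresses: 10.8.0.2, 10.8.0.3, 10.8.0.4, 10.8.0.5, 10.8.0.8, 10.8.0.9, 10.8.0.10, 10.8.0.14
- Leaf traffic values shown: 40, 35, 15, 35, 30, 160, 110, 75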
26Unidimensional clusters
[Figure: prefix tree reduced to the clusters that remain]
- 10.8.0.0/28: 500
- 10.8.0.0/29: 120
- 10.8.0.8/29: 380
- 10.8.0.8/30: 305
- 10.8.0.8/31: 270
- 10.8.0.8: 160
- 10.8.0.9: 110
28Multidimensional clusters
- Two dimensions
- Source network
- Protocol (traffic type)
- Trees turn into lattice
- Multiple parents
- Nodes overlap
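A small sketch shows why two dimensions give a lattice rather than a tree: a cluster can be generalized along either field, so it has up to two parents. The protocol hierarchy below is hypothetical and stands in for whatever traffic-type hierarchy is actually used.

```python
import ipaddress

# Hypothetical traffic-type hierarchy: each entry maps to its more general parent.
PROTOCOL_PARENT = {
    "tcp/80": "tcp",
    "tcp/25": "tcp",
    "udp/53": "udp",
    "tcp": "*",
    "udp": "*",
}


def parents(source_prefix, protocol):
    """Clusters one generalization step above (source_prefix, protocol)."""
    result = []
    net = ipaddress.ip_network(source_prefix)
    if net.prefixlen > 0:
        # Generalize the source network by one bit.
        result.append((str(net.supernet()), protocol))
    if protocol in PROTOCOL_PARENT:
        # Generalize the traffic type by one level.
        result.append((source_prefix, PROTOCOL_PARENT[protocol]))
    return result


node_parents = parents("10.8.0.8/31", "tcp/80")
print(node_parents)
```

The two parents returned here overlap (both contain the original cluster) without either containing the other, which is the situation the "nodes overlap" bullet refers to and the reason the unidimensional bottom-up pass does not carry over unchanged.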
29Offline solution
30Sample report