Title: Automatically Inferring Patterns of Resource Consumption in Network Traffic
- Cristian Estan, Stefan Savage, George Varghese
- University of California, San Diego
2. Who is using my link?
3. Looking at the traffic
Too much data for a human
Do something smarter!
4. Looking at traffic aggregates
Rank Destination IP Traffic (%)
1 jeff.dorm.bigU.edu 11.9
2 tracy.dorm.bigU.edu 3.12
3 risc.cs.bigU.edu 2.83
- Aggregating on individual packet header fields gives useful results, but
  - traffic reports are not always at the right granularity (e.g. individual IP address, subnet, etc.)
  - they cannot show aggregates defined over multiple fields (e.g. which network uses which application)
- The traffic analysis tool should automatically find aggregates over the right fields at the right granularity
Which network uses web and which one uses Kazaa?
Rank Source port Traffic (%)
1 Web 42.1
2 Kazaa 6.7
3 SSH 6.3
Rank Destination network Traffic (%)
1 library.bigU.edu 27.5
2 cs.bigU.edu 18.1
3 dorm.bigU.edu 17.8
Where does the traffic come from?
What apps are used?
Most traffic goes to the dorms
5. Ideal traffic report
Traffic aggregate Traffic (%)
Web traffic 42.1
Web traffic to library.bigU.edu 26.7
Web traffic from www.schwarzenegger.com 13.4
ICMP traffic from sloppynet.badU.edu to jeff.dorm.bigU.edu 11.9
Web is the dominant application
This is a denial-of-service attack!!
The library is a heavy user of web
That's a big flash crowd!
This paper is about giving the network
administrator insightful traffic reports
6. Contributions of this paper
- Approach
- Definitions
- Algorithms
- System
- Experience
7. Approach
- Characterize the traffic mix by describing all important traffic aggregates
- Multidimensional aggregates (e.g. a flash crowd described by protocol, port number and IP address)
- Aggregates at the right level of granularity (e.g. computer, subnet, ISP)
- Traffic analysis is automated: it finds insightful data without human guidance
8. Definition: traffic clusters
- Traffic clusters are the multidimensional traffic aggregates identified by our reports
- A cluster is defined by a range for each field
- The ranges come from natural hierarchies (e.g. the IP prefix hierarchy), so clusters are meaningful aggregates
- Example
  - Traffic aggregate: incoming web traffic for the CS Dept.
  - Traffic cluster: (SrcIP=*, DestIP in 132.239.64.0/21, Proto=TCP, SrcPort=80, DestPort in [1024,65535])
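A minimal Python sketch of the cluster definition above, assuming a packet header is reduced to the five fields used in the example and that a missing range means "*". The `Cluster` class, its field names and the `matches` method are hypothetical illustrations, not AutoFocus code.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple

PortRange = Tuple[int, int]  # inclusive [low, high]


def _in_range(value: int, rng: Optional[PortRange]) -> bool:
    """A missing range (None) stands for '*' and matches every value."""
    return rng is None or rng[0] <= value <= rng[1]


@dataclass(frozen=True)
class Cluster:
    """Hypothetical traffic cluster: one range per header field, None = '*'."""
    src_ip: Optional[ipaddress.IPv4Network] = None
    dst_ip: Optional[ipaddress.IPv4Network] = None
    proto: Optional[str] = None
    src_port: Optional[PortRange] = None
    dst_port: Optional[PortRange] = None

    def matches(self, src_ip: str, dst_ip: str, proto: str,
                src_port: int, dst_port: int) -> bool:
        """True if the packet header lies inside every range of the cluster."""
        if self.src_ip is not None and ipaddress.IPv4Address(src_ip) not in self.src_ip:
            return False
        if self.dst_ip is not None and ipaddress.IPv4Address(dst_ip) not in self.dst_ip:
            return False
        if self.proto is not None and proto != self.proto:
            return False
        return _in_range(src_port, self.src_port) and _in_range(dst_port, self.dst_port)


# Incoming web traffic for the CS Dept., as in the example cluster.
cs_web = Cluster(dst_ip=ipaddress.IPv4Network("132.239.64.0/21"), proto="TCP",
                 src_port=(80, 80), dst_port=(1024, 65535))
```

For instance, `cs_web.matches("198.51.100.7", "132.239.65.10", "TCP", 80, 40000)` is true, while the same packet sent to 132.239.72.1 (outside the /21) does not match.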
9. Definition: traffic report
- Traffic reports give the volume of chosen traffic clusters
- To keep report size manageable, describe only clusters above a threshold (e.g. H = total traffic/20)
- To avoid redundant data, compress by omitting clusters whose traffic can be inferred (up to error H) from non-overlapping more specific clusters in the report
- To highlight non-obvious aggregates, prioritize them using an unexpectedness label
- Example
  - 50% of all traffic is web
  - Prefix B receives 20% of all traffic
  - The web traffic received by prefix B is 15% instead of 50% × 20% = 10%, so its unexpectedness label is 15/10 = 150%
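The unexpectedness arithmetic in the example can be written as a small Python function. This is a sketch that assumes, as the example does, that the expected share of a two-field cluster is the product of the shares of its one-field parents (i.e. the fields are treated as independent); the function name is hypothetical.

```python
def unexpectedness(cluster_pct: float, field_a_pct: float, field_b_pct: float) -> float:
    """Actual share of a two-field cluster relative to its expected share.

    All arguments are percentages of total traffic. The expected share is
    assumed to be the product of the two one-field shares; the result is
    returned as a percentage (100 means exactly as expected).
    """
    expected_pct = field_a_pct * field_b_pct / 100.0
    return 100.0 * cluster_pct / expected_pct
```

With the numbers above, `unexpectedness(15, 50, 20)` evaluates 15 / (50 × 20 / 100) = 15/10 and returns 150.0.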
10. Contributions of this paper
- Approach
- Definitions
- Algorithms
- System
- Experience
11. Algorithms and theory
- Algorithms and theoretical bounds are in the paper
  - Unidimensional reports are easy to compute
  - Multidimensional reports are exponentially harder as we add more fields
- Next few slides
  - Example of unidimensional compression
  - Example of the structure of the multidimensional cluster space
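12. Unidimensional report example
[Figure: source IP prefix hierarchy with threshold = 100 over hosts 10.0.0.2, 10.0.0.3, 10.0.0.4, 10.0.0.5, 10.0.0.8, 10.0.0.9, 10.0.0.10 and 10.0.0.14, including intermediate prefixes such as 10.0.0.12/30 and 10.0.0.14/31, annotated with per-node traffic]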
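13. Unidimensional report example
Compression
Source IP Traffic
10.0.0.0/29 120
10.0.0.8/29 380
10.0.0.8 160
10.0.0.9 110
[Figure: 10.0.0.8/29 (380) stays in the compressed report because 380 - 270 ≥ 100, where 270 = 160 + 110 is already reported by 10.0.0.8 and 10.0.0.9; a more specific prefix with traffic 305 is omitted because 305 - 270 < 100]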
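The compression rule illustrated above can be sketched in Python as a bottom-up walk over the IPv4 prefix hierarchy. This is a minimal sketch, assuming that a prefix is reported when its traffic minus the traffic of its most specific reported descendants is at least the threshold (the "≥" reading of the slide's arithmetic); `compressed_report` and `_mask` are hypothetical names, and only prefixes that contain some traffic are visited.

```python
import ipaddress
from typing import Dict, Tuple


def _mask(plen: int) -> int:
    """Netmask of an IPv4 prefix of length plen, as a 32-bit integer."""
    return (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF


def compressed_report(traffic: Dict[str, int], threshold: int,
                      root_len: int = 0) -> Dict[str, int]:
    """Compressed unidimensional report over the IPv4 prefix hierarchy.

    A prefix is reported when its traffic minus the traffic already covered
    by its most specific reported descendants is at least `threshold`.
    """
    # Masked network -> (total traffic, traffic covered by reported descendants).
    level: Dict[int, Tuple[int, int]] = {}
    for ip, volume in traffic.items():
        key = int(ipaddress.IPv4Address(ip))
        total, _ = level.get(key, (0, 0))
        level[key] = (total + volume, 0)

    report: Dict[str, int] = {}
    for plen in range(32, root_len - 1, -1):  # hosts (/32) up to the root prefix
        parent_level: Dict[int, Tuple[int, int]] = {}
        for net, (total, covered) in level.items():
            if total - covered >= threshold:
                report[str(ipaddress.IPv4Network((net, plen)))] = total
                covered = total  # this prefix now accounts for all of its traffic
            if plen > root_len:
                parent = net & _mask(plen - 1)
                p_total, p_covered = parent_level.get(parent, (0, 0))
                parent_level[parent] = (p_total + total, p_covered + covered)
        level = parent_level
    return report
```

With hypothetical per-host volumes chosen so that the subtotals match the slide (for example 40, 35, 15 and 30 for 10.0.0.2 through 10.0.0.5, and 160, 110, 35 and 75 for 10.0.0.8, .9, .10 and .14), a threshold of 100 and root 10.0.0.0/28, the sketch returns the same four clusters as the report table: 10.0.0.0/29 (120), 10.0.0.8/29 (380), 10.0.0.8/32 (160) and 10.0.0.9/32 (110).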
14. Multidimensional structure example
Nodes (clusters) have multiple parents
Nodes (clusters) overlap
[Figure: example cluster space with nodes labeled US, CA and Web]
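A small Python sketch of why the multidimensional cluster space is not a tree. It assumes two hypothetical per-field hierarchies modeled on the figure labels (location: * > US > CA; application: * > Web, plus Kazaa as a sibling of Web); `parents` and `overlap` are illustrative names. A cluster gets one parent for every field that can still be generalized, and two clusters overlap when, in every field, one value is an ancestor-or-self of the other.

```python
from typing import Dict, List, Tuple

# Hypothetical one-dimensional hierarchies (child -> parent); "*" is the root.
LOCATION_PARENT: Dict[str, str] = {"CA": "US", "US": "*"}
APP_PARENT: Dict[str, str] = {"Web": "*", "Kazaa": "*"}
HIERARCHIES = (LOCATION_PARENT, APP_PARENT)

Cluster = Tuple[str, ...]  # one value per field, e.g. ("CA", "Web")


def _chain(value: str, hierarchy: Dict[str, str]) -> List[str]:
    """The value followed by all of its ancestors, ending at the root '*'."""
    chain = [value]
    while value in hierarchy:
        value = hierarchy[value]
        chain.append(value)
    return chain


def parents(cluster: Cluster) -> List[Cluster]:
    """Generalize exactly one field by one level; each non-root field yields a parent."""
    result = []
    for i, value in enumerate(cluster):
        up = HIERARCHIES[i].get(value)
        if up is not None:  # the root "*" has no parent
            result.append(cluster[:i] + (up,) + cluster[i + 1:])
    return result


def overlap(a: Cluster, b: Cluster) -> bool:
    """Clusters overlap if, in every field, one value is an ancestor-or-self of the other."""
    return all(x in _chain(y, h) or y in _chain(x, h)
               for x, y, h in zip(a, b, HIERARCHIES))
```

Here `parents(("CA", "Web"))` gives `[("US", "Web"), ("CA", "*")]`, i.e. two parents, and `("US", "*")` overlaps `("*", "Web")` because both contain US web traffic.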
15. Contributions of this paper
- Approach
- Definitions
- Algorithms
- System
- Experience
16. System: AutoFocus
[Figure: AutoFocus components: packet header trace, traffic parser, cluster miner, grapher, web-based GUI]
20. Contributions of this paper
- Approach
- Definitions
- Algorithms
- System
- Experience
21. Structure of regular traffic mix
- Backups from CAIDA to tape server
- Semi-regular time pattern
- FTP from SLAC (Stanford)
- Scripps web traffic
- Web Squid servers
- Large SSH traffic
- Steady ICMP probing from CAIDA
[Figures: SD-NAP (two panels)]
22. Analysis of unusual events
- UCSD to UCLA route change
- Sapphire/SQL Slammer worm
[Figure: Site 2]
23. Conclusions
24. Conclusions
- Multidimensional traffic clusters using natural hierarchies describe traffic aggregates
- Traffic reports using thresholding automatically identify conspicuous resource consumption at the right granularity
- Compression produces compact traffic reports, and unexpectedness labels highlight non-obvious aggregates
- Our prototype system, AutoFocus, provides insights into the structure of regular traffic and unexpected events
25. Thank you!
- Alpha version of AutoFocus downloadable from
- http://ial.ucsd.edu/AutoFocus/
- Any questions?
- Acknowledgements: NIST, NSF, Vern Paxson, David Moore, Liliana Estan, Jennifer Rexford, Alex Snoeren, Geoff Voelker
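26. Bounds and running times
Report | Report size | Running time | Memory usage
Uncompressed unidimensional report | 1 + (d-1)T/H | O(n + m(d-1)) | O(m(d-1))
Unidimensional report | T/H | linear | linear
Unidimensional Δ report | T1/H + T2/H | linear | linear
Uncompressed multidimensional report | (T/H) ∏ d_i | result × n | O(m × result)
Multidimensional report | (T/H) ∏ d_i / max(d_i) | |
Multidimensional Δ report | | e^result |
(T is the total traffic, T1 and T2 the totals of the two intervals compared by a delta report, and H the threshold.)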
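A sketch of where the first row's size bound plausibly comes from, assuming T is the total traffic, H the threshold, and d the number of levels in the hierarchy counting the root: clusters on the same non-root level are disjoint, so their traffic sums to at most T and at most T/H of them can reach H, while the root level contributes a single cluster.

```latex
\[
|R_{\mathrm{unc}}| \;\le\; \underbrace{1}_{\text{root}} \;+\; \sum_{\ell=1}^{d-1} \frac{T}{H} \;=\; 1 + (d-1)\,\frac{T}{H}
\]
```

As an illustration only, with H = T/20 and an IPv4 prefix hierarchy of d = 33 levels (/0 through /32), the bound is 1 + 32 × 20 = 641 clusters.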
27. Open questions
- Are there tighter bounds for the size of the reports?
- Are there algorithms that produce smaller results?
- Are there algorithms that compute traffic reports more efficiently? In streaming fashion?
28. Delta reports
- Why repeat the same traffic report if the traffic doesn't change from one day to the next?
- Delta reports describe the clusters that increased or decreased by more than the threshold from one interval to the next
- On related traffic mixes, delta reports are much smaller than traffic reports
- Multidimensional compression is very hard for delta reports
  - We have only an exponential algorithm for the cluster delta
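A minimal Python sketch of an uncompressed unidimensional delta report over source IP prefixes, assuming per-host volumes for two intervals and reading "more than the threshold" as a strict comparison. The function names are hypothetical, and no compression is attempted, so a single large change shows up at every enclosing prefix.

```python
import ipaddress
from collections import defaultdict
from typing import Dict


def prefix_volumes(traffic: Dict[str, int], min_len: int = 0) -> Dict[str, int]:
    """Traffic of every IPv4 prefix (lengths min_len..32) that contains some host."""
    volumes: Dict[str, int] = defaultdict(int)
    for ip, volume in traffic.items():
        for plen in range(min_len, 33):
            net = ipaddress.IPv4Network(f"{ip}/{plen}", strict=False)
            volumes[str(net)] += volume
    return volumes


def delta_report(before: Dict[str, int], after: Dict[str, int],
                 threshold: int, min_len: int = 0) -> Dict[str, int]:
    """Uncompressed delta report: prefixes whose traffic changed by more than threshold."""
    v1 = prefix_volumes(before, min_len)
    v2 = prefix_volumes(after, min_len)
    changes: Dict[str, int] = {}
    for prefix in set(v1) | set(v2):
        change = v2.get(prefix, 0) - v1.get(prefix, 0)
        if abs(change) > threshold:
            changes[prefix] = change  # negative = decrease, positive = increase
    return changes
```

With hypothetical volumes where 10.0.0.8 drops from 160 to 40 and 10.0.0.9 rises from 110 to 115, a threshold of 100 and prefixes down to /28, the host 10.0.0.8/32 is reported with -120 and each enclosing prefix up to 10.0.0.0/28 with -115; this redundancy is what compression would remove.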
29. Greedy compression algorithm
30. Multidimensional report example
Thresholding
Compression
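The thresholding step of a multidimensional report can be sketched by brute force in Python: every flow adds its volume to each combination of its per-field ancestors, and clusters reaching the threshold are kept. This sketch assumes two hypothetical fields (source prefix at a few fixed lengths, and protocol or "*") and hypothetical flow records; it computes the uncompressed report only, and the number of combinations per flow is the product of the levels per field, which is why cost grows quickly as fields are added.

```python
import ipaddress
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, Tuple

Flow = Tuple[str, str, int]  # hypothetical record: (source IP, protocol, volume)
Cluster = Tuple[str, str]    # (source prefix or "*", protocol or "*")


def src_ancestors(ip: str, lengths: Iterable[int]) -> List[str]:
    """Source prefixes containing ip, one per allowed prefix length ('*' for /0)."""
    out = []
    for plen in lengths:
        if plen == 0:
            out.append("*")
        else:
            out.append(str(ipaddress.IPv4Network(f"{ip}/{plen}", strict=False)))
    return out


def multidim_threshold(flows: Iterable[Flow], threshold: int,
                       lengths=(0, 8, 16, 24, 32)) -> Dict[Cluster, int]:
    """Uncompressed 2-D report: every (source prefix, protocol) cluster with volume >= threshold."""
    volumes: Dict[Cluster, int] = defaultdict(int)
    for ip, proto, volume in flows:
        # Each flow belongs to every combination of its per-field ancestors.
        for cluster in product(src_ancestors(ip, lengths), (proto, "*")):
            volumes[cluster] += volume
    return {c: v for c, v in volumes.items() if v >= threshold}
```

For three hypothetical flows (10.0.0.8 TCP 160, 10.0.0.9 TCP 110, 10.1.0.1 UDP 75) and a threshold of 100, the sketch keeps 12 clusters, such as ("*", "*") with 345 and ("10.0.0.0/24", "TCP") with 270, while ("*", "UDP") with 75 falls below the threshold.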
31. System details
Part | Language | Lines of code | Status
Backend | C | 5400 | stable
GUI | HTML, JavaScript | 1000 | functional
Glue | Perl | 350 | evolving