Title: Experiences With Internet Traffic Measurement and Analysis
1 Experiences With Internet Traffic Measurement and Analysis
- Vern Paxson
- ICSI Center for Internet Research
- International Computer Science Institute
- and
- Lawrence Berkeley National Laboratory
- vern_at_icir.org
- March 5th, 2004
2 Outline
- The 1990s: How is the Internet used?
- Growth and diversity
- Fractal traffic, heavy tails
- End-to-end dynamics
- Difficulties with measurement & analysis
- The 2000s: How is the Internet abused?
- Prevalence of misuse
- Detecting attacks
- Worms
3 The 1990s
- How is the Internet Used?
4 80% growth/year
Data courtesy of Rick Adams
5 Internet Growth: Exponential
- Growth of 80%/year
- Sustained for at least ten years
- before the Web even existed
- The Internet is always changing. You do not have a lot of time to understand it.
7 Characterizing Site Traffic
- Methodology: passively record traffic in/out of a site
- Danzig et al. (1992)
- 3 sites, 24 hrs, all packet headers
- Paxson (1994)
- TCP SYN/FIN/RST control packets
- Gives hosts, sizes, start time, duration, application
- Large filtering win (roughly 10-100:1 in packets, 1000s:1 in bytes)
- 7 month-long traces at Lawrence Berkeley Natl. Laboratory
- 8 day-long traces from 6 other sites
8 Findings from Site Studies
- Traffic mix (which protocols are used & how many connections/bytes they contribute) varies widely from site to site
- Mix also varies at the same site over time
- Most connections have much heavier traffic in one direction than the other
- Even interactive login sessions (20:1)
9 Findings from Site Studies, cont.
- Many random variables associated with connection characteristics (sizes, durations) are best described with log-normal distributions
- But often these are not particularly good fits
- And often their parameters vary significantly between datasets
- The largest connections in bulk transfers are very large
- Tail behavior is unpredictable
- Many of these findings differ from assumptions used in 1990s traffic modeling
10 Theory vs. Measured Reality
- Scaling behavior in Internet traffic
11 Burstiness
- Long-established framework: Poisson modeling
- Central idea: network events (packet arrivals, connection arrivals) are well-modeled as independent
- In its simplest form, there's just a rate parameter, λ
- It then follows that the time between calls (events) is exponentially distributed, and the # of calls is Poisson
- Implications (if the assumptions are correct):
- Aggregated traffic will smooth out quickly
- Correlations are fleeting, bursts are limited
12 Burstiness: Theory vs. Measurement
- For Internet traffic, Poisson models have a fundamental problem: they greatly underestimate burstiness
- Consider an arrival process where X_k gives the # of packets arriving during the k-th interval of length T
- Take a 1-hour trace of Internet traffic (1995)
- Generate (batch) Poisson arrivals with the same mean and variance (a minimal sketch of such a comparison follows below)
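The comparison described above can be reproduced with a short script. This is a minimal sketch, assuming a plain text file of packet arrival timestamps; the filename and interval lengths are illustrative, and the surrogate below matches only the mean rate (a true batch-Poisson fit would also match the variance via a batch-size distribution).

```python
# Minimal sketch: compare a measured packet-count series X_k against a
# Poisson surrogate at several aggregation levels T.
import numpy as np

def counts_per_interval(timestamps, T):
    """X_k = number of packets arriving during the k-th interval of length T."""
    bins = np.arange(timestamps.min(), timestamps.max() + T, T)
    counts, _ = np.histogram(timestamps, bins=bins)
    return counts

ts = np.loadtxt("trace_timestamps.txt")        # hypothetical input file
for T in (0.01, 0.1, 1.0, 10.0):               # seconds
    real = counts_per_interval(ts, T)
    surrogate = np.random.poisson(real.mean(), size=len(real))
    # Index of dispersion (variance-to-mean ratio): ~1 for Poisson,
    # much larger for bursty, long-range dependent traffic.
    print(f"T={T:6.2f}s  measured IDC={real.var() / real.mean():7.1f}  "
          f"poisson IDC={surrogate.var() / surrogate.mean():7.1f}")
```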
14 Previous Region ×10
15 ×100
16 ×600
18 Burstiness Over Many Time Scales
- Real traffic has strong, long-range correlations
- Power spectrum:
- Flat for Poisson processes
- For measured traffic, diverges to ∞ as frequency → 0
- Building Poisson-based models that capture this characteristic takes many parameters
- But due to the great variation in Internet traffic, we are desperate for parsimonious models (few parameters)
19 Describing Traffic with Fractals
- Landmark 1993 paper by Leland et al. proposed capturing such characteristics (in Ethernet traffic) using self-similarity, a form of fractal-based modeling
- Parameterized by mean, variance, and Hurst parameter (see the estimation sketch below)
- Models predict burstiness on all time scales
- Queueing delays / drop probabilities much higher than predicted by Poisson-based models
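One common way to fit the Hurst parameter is the aggregated-variance method. The sketch below is a generic illustration of that method (not the estimator used in the Leland et al. paper), applied to a synthetic count series as a stand-in input.

```python
# Minimal sketch: estimate the Hurst parameter H of a count series with
# the aggregated-variance method.  For self-similar traffic,
# Var(X^(m)) ~ m^(2H - 2), so the slope of log Var vs. log m gives H.
import numpy as np

def hurst_aggregated_variance(x, block_sizes=(1, 2, 4, 8, 16, 32, 64, 128)):
    log_m, log_v = [], []
    for m in block_sizes:
        n = len(x) // m
        if n < 2:
            break
        xm = x[: n * m].reshape(n, m).mean(axis=1)   # aggregate in blocks of size m
        log_m.append(np.log(m))
        log_v.append(np.log(xm.var()))
    slope, _ = np.polyfit(log_m, log_v, 1)
    return 1.0 + slope / 2.0                          # H = 1 + slope / 2

# Independent counts should give H near 0.5; long-range dependent
# traffic typically gives H well above 0.5 (often 0.7-0.9).
x = np.random.poisson(10, size=100_000)
print("H estimate for independent counts:", round(hurst_aggregated_variance(x), 2))
```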
21 Heavy Tails
- Key prediction from fractal modeling
- One way fractal traffic can arise in aggregate is if individual connections have activity periods (durations, sizes) whose distribution has infinite variance
- Infinite variance manifests in a distribution's upper tail
- Consider the Pareto distribution, P[X > x] = (x/a)^-α
- If α < 2, then X has infinite variance
- Can test for a Pareto fit by plotting log P[X > x] vs. log x (sketch below)
- Straight line ⇒ Pareto distribution; the slope estimates -α
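A minimal sketch of the log-log tail test described above; the input file of connection sizes and the choice of tail cutoff (largest 10% of samples) are assumptions for illustration.

```python
# Minimal sketch: plot the empirical complementary CDF P[X > x] on
# log-log axes and estimate the tail slope.  An approximately straight
# upper tail with slope -alpha suggests a Pareto-like tail; alpha < 2
# implies infinite variance.
import numpy as np
import matplotlib.pyplot as plt

sizes = np.loadtxt("connection_sizes.txt")          # hypothetical: bytes per connection
x = np.sort(sizes)
ccdf = 1.0 - np.arange(1, len(x) + 1) / len(x)      # P[X > x] at each sorted value

plt.loglog(x[:-1], ccdf[:-1], marker=".", linestyle="none")
plt.xlabel("connection size x (bytes)")
plt.ylabel("P[X > x]")

tail = slice(int(0.9 * len(x)), len(x) - 1)          # fit over the upper tail only
slope, _ = np.polyfit(np.log(x[tail]), np.log(ccdf[tail]), 1)
print("tail slope (~ -alpha):", round(slope, 2))
plt.show()
```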
22 Web connection sizes (226,386 observations)
- 28,000 observations
- α ≈ 1.3
- ⇒ infinite variance
23 Self-Similarity & Heavy Tails, cont.
- We find heavy-tailed sizes in many types of network traffic. Just a few extreme connections dominate the entire volume.
25 Self-Similarity & Heavy Tails, cont.
- We find heavy-tailed sizes in many types of network traffic. Just a few extreme connections dominate the entire volume.
- Theorems then give us that this traffic aggregates to self-similar behavior.
- While self-similar models are parsimonious, they are not (alas) simple.
- You can have self-similar correlations for which the magnitude of variations is small → still possible to have a statistical multiplexing gain, especially at very high aggregation
- Smaller time scales behave quite differently.
- When very highly aggregated, they can appear Poisson!
26 End-to-End Internet Dynamics
27 End-to-End Dynamics
- Ultimately what the user cares about is not what's happening on a given link, but the concatenation of behaviors along all of the hops in an end-to-end path.
- Measurement methodology: deploy measurement servers at numerous Internet sites, measure the paths between them
- Exhibits N² scaling: as the # of sites grows, the # of paths between them grows rapidly (e.g., 37 sites yield 37 × 36 = 1,332 ordered site pairs)
28 Measurement Infrastructure Sites, 1994-1995: End-to-End Dynamics Study
29 Paths in the Study: The N² Scaling Effect
30 End-to-End Routing Dynamics
- Analysis of 40,000 traceroute measurements between 37 sites, 900 end-to-end paths
- Route prevalence:
- Most end-to-end paths through the Internet are dominated by a single route
- Route persistence:
- 2/3 of routes remain unchanged for days/weeks
- 1/3 of routes change on time scales of seconds to hours
- Route symmetry:
- More than half of all routes visited at least one different city in each direction
- Very important for tracking connection state inside the network!
31 End-to-End Packet Dynamics
- Analysis of 20,000 TCP bulk transfers of 100 KB between 36 sites
- Each traced at both ends using tcpdump
- Benefits of using TCP:
- Real-world traffic
- Can probe fine-grained time scales, but using congestion control
- Drawbacks to using TCP:
- Endpoint TCP behavior a major analysis headache
- TCP's loading of the transfer path also complicates analysis
32 End-to-End Packet Dynamics: Unusual Behavior
- Out-of-order delivery
- Not uncommon: 0.6-2% of all packets
- Strongly site-specific
- Generally little impact on performance
- Replicated packets
- Very rare, but does occur (e.g., 1 packet in, 22 out)
- Corrupted packets (bad checksum; see the checksum sketch below)
- Overall, 1 in 5,000 (!)
- Stone/Partridge (2000): between 1 in 1,100 and 1 in 32,000
- Undetected: between 1 in 16 million and 1 in 10 billion
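For context, the "bad checksum" figures refer to the 16-bit one's-complement checksum carried by TCP and UDP; a minimal sketch of that computation shows why a 16-bit check lets a small fraction of corruptions slip through undetected. The payload string is illustrative.

```python
# Minimal sketch: the 16-bit one's-complement checksum carried by
# TCP/UDP (and IP headers).  Because the check is only 16 bits wide,
# roughly 1 in 65,536 uniformly random corruptions still matches it;
# real error patterns and link-layer CRCs change the observed rates.
def internet_checksum(data: bytes) -> int:
    if len(data) % 2:                    # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)     # fold the carry back in
    return ~total & 0xFFFF

print(hex(internet_checksum(b"example segment payload")))   # illustrative input
```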
33 End-to-End Packet Dynamics: Loss
- Half of all 100 KB transfers experienced no loss
- 2/3 of paths within the U.S.
- The other half experienced significant loss
- Average 4-9%, but with wide variation
- TCP loss is not well described as independent
- Losses dominated by a few long-lived outages
- (Keep in mind this is 1994-1995!)
- Subsequent studies:
- Loss rates have gotten much better
- Loss episodes well described as independent
- Same holds for regions of stable delay, throughput
- Time scales of constancy: minutes or more
34 Issues / Difficulties for Analyzing Internet Traffic
- Measurement, Simulation & Analysis
35 There is No Such Thing as Typical
- Heterogeneity in:
- Traffic mix
- Range of network capabilities
- Bottleneck bandwidth (orders of magnitude)
- Round-trip time (orders of magnitude)
- Dynamic range of network conditions
- Congestion / degree of multiplexing / available bandwidth
- Proportion of traffic that is adaptive/rigid/attack
- Immense size growth
- Rare events will occur
- New applications explode on the scene
36 Doubling every 7-8 weeks for 2 years
37 There is No Such Thing as Typical, cont.
- New applications explode on the scene
- Not just the Web, but Mbone, Napster, KaZaA, etc., IM
- Even robust statistics fail.
- E.g., median size of FTP data transfer at LBL:
- Oct. 1992: 4.5 KB (60,000 samples)
- Mar. 1993: 2.1 KB
- Mar. 1998: 10.9 KB
- Dec. 1998: 5.6 KB
- Dec. 1999: 10.9 KB
- Jun. 2000: 62 KB
- Nov. 2000: 10 KB
- Danger: if you misassume that something is typical, nothing tells you that you are wrong!
38 The Search for Invariants
- In the face of such diversity, identifying things that don't change has immense utility
- Some Internet traffic invariants:
- Daily and weekly patterns
- Self-similarity on time scales of 100s of msec and above
- Heavy tails
- both in activity periods and elsewhere, e.g., topology
- Poisson user session arrivals
- Log-normal sizes (excluding tails)
- Keystroke interarrivals have a Pareto distribution
39 The Danger of Mental Models
Exponential plus a constant offset
40 Not exponential, but Pareto! Heavy tail, α ≈ 1.0
41 Versus the Power of Modeling to Open Our Eyes
- Fowler & Leland, 1991:
- Traffic spikes (which cause actual losses) ride on longer-term ripples, that in turn ride on still longer-term swells
43 Versus the Power of Modeling to Open Our Eyes
- Fowler & Leland, 1991:
- Traffic spikes (which cause actual losses) ride on longer-term ripples, that in turn ride on still longer-term swells
- Lacked the vocabulary that came from self-similar modeling (1993)
- Similarly, the 1993 self-similarity paper:
- "We did so without first studying and modeling the behavior of individual Ethernet users (sources)"
- Modeling led to the suggestion to investigate heavy tails
44 Measurement Soundness
- How well-founded is a given Internet measurement?
- We can often use additional information to help calibrate.
- One source: protocol structure
- E.g., was a packet dropped by the network or by the measurement device?
- For TCP, we can check: did the receiver acknowledge it? (see the sketch below)
- If Yes, then dropped by the measurement device
- If No, then dropped by the network
- Can also calibrate using additional information
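A minimal sketch of the acknowledgment check described above, assuming the monitor has already identified a sequence gap and collected the receiver's cumulative ACKs. The data structures are illustrative, and the logic is simplified: it ignores the case where the segment is only acknowledged after a retransmission.

```python
# Minimal sketch: classify a data segment that is missing from the
# trace, using the receiver's cumulative ACKs.
def classify_missing_segment(seg_end, acks):
    """seg_end: sequence number just past the missing segment's last byte.
    acks: cumulative ACK numbers seen from the receiver, in trace order."""
    if any(ack >= seg_end for ack in acks):
        return "dropped by measurement device (receiver ACKed the data)"
    return "dropped by network (receiver never ACKed the data)"

acks_seen = [1000, 2460, 3920, 3920]
print(classify_missing_segment(3920, acks_seen))   # -> measurement drop
print(classify_missing_segment(5380, acks_seen))   # -> network drop
```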
45 Calibration Using Additional Information: Packet Timings
Routing change?
Clock adjustment
46 Reproducibility of Results (or lack thereof)
- It is rare, though it sometimes occurs, that raw measurements are made available to other researchers for further analysis or for confirmation.
- It is more rare that analysis tools and scripts are made available, particularly in a coherent form that others can actually get to work.
- It is even rarer that measurement glitches, outliers, analysis fudge factors, etc., are detailed.
- In fact, often researchers cannot reproduce their own results.
47 Towards Reproducible Results
- Need to ensure a systematic approach to data reduction and analysis
- I.e., a paper trail for how the analysis was conducted, particularly when bugs are fixed
- A methodology to do this (sketched below):
- Enforce the discipline of using a single (master) script that builds all analysis results from the raw data
- Maintain all intermediary/reduced forms of the data as explicitly ephemeral
- Maintain a notebook of what was done and to what effect
- Use version control for scripts & notebook
- But also really need ways to visualize what's changed in analysis results after a re-run
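A minimal sketch of the master-script discipline, under the assumptions that raw traces live in one directory and that reduction, analysis, and plotting are separate helper scripts; the directory names and helpers (reduce_trace.py, analyze.py, make_plots.py) are hypothetical.

```python
# Minimal sketch: one driver rebuilds every result from the raw traces,
# and all reduced/intermediate files are treated as disposable.
import shutil
import subprocess
from pathlib import Path

RAW = Path("raw_traces")      # never modified
WORK = Path("reduced")        # explicitly ephemeral: rebuilt on every run
OUT = Path("results")

def run(step, cmd):
    print(f"[{step}] {' '.join(cmd)}")
    subprocess.run(cmd, check=True)    # fail loudly, keeping the paper trail honest

if __name__ == "__main__":
    shutil.rmtree(WORK, ignore_errors=True)
    WORK.mkdir()
    OUT.mkdir(exist_ok=True)
    for trace in sorted(RAW.glob("*.pcap")):
        run("reduce", ["python", "reduce_trace.py", str(trace),
                       str(WORK / (trace.stem + ".csv"))])
    run("analyze", ["python", "analyze.py", str(WORK), str(OUT)])
    run("plot", ["python", "make_plots.py", str(OUT)])
```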
48 The 2000s
- How is the Internet Abused?
51 Magnitude of Internet Attacks
- As seen at Lawrence Berkeley National Laboratory, on a typical day in 2004:
- > 70% of Internet connections (20 million out of 28 million) reflect clear attacks
- 60 different remote hosts scan one of LBL's two blocks of 65,536 addresses in its entirety
- More than 10,000 remote hosts engage in scanning activity
- Much of this activity reflects worms
- Much of the rest reflects automated scan-and-exploit tools
52 How is the Internet Abused?
- Detecting Network Attacks
53 Design Goals for the Bro Intrusion Detection System
- Monitor traffic in a very high performance environment
- Real-time detection and response
- Separation of mechanism from policy
- Ready extensibility of both mechanism and policy
- Resistant to evasion
54 How Bro Works
- Taps the GigEther fiber link passively, sends up a copy of all network traffic.
[Diagram: Network]
55 How Bro Works
- Kernel filters down the high-volume stream via the standard libpcap packet capture library.
[Diagram: Network → Packet Stream → libpcap (Tcpdump Filter) → Filtered Packet Stream]
56 How Bro Works
- Event engine distills the filtered stream into high-level, policy-neutral events reflecting underlying network activity
- E.g., connection_attempt, http_reply, user_logged_in
[Diagram: Network → Packet Stream → libpcap (Tcpdump Filter) → Filtered Packet Stream → Event Engine (Event Control) → Event Stream]
57 How Bro Works
- Policy script processes the event stream, incorporating:
- Context from past events
- Site's particular policies
[Diagram: Network → Packet Stream → libpcap (Tcpdump Filter) → Filtered Packet Stream → Event Engine (Event Control) → Event Stream → Policy Script Interpreter (Policy Script) → Real-time Notification / Record To Disk]
58 How Bro Works
- Policy script processes the event stream, incorporating:
- Context from past events
- Site's particular policies
- and takes action:
- Records to disk
- Generates alerts via syslog, paging
- Executes programs as a form of response
- (A conceptual sketch of this mechanism/policy separation follows below.)
[Diagram: Network → Packet Stream → libpcap (Tcpdump Filter) → Filtered Packet Stream → Event Engine (Event Control) → Event Stream → Policy Script Interpreter (Policy Script) → Real-time Notification / Record To Disk]
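The layering shown above is Bro's own architecture; the following is only a conceptual Python sketch of the mechanism/policy separation (an engine that emits policy-neutral events, with site policy expressed as separate handlers), not Bro's actual event engine or policy language.

```python
# Conceptual sketch only (not Bro code): an event engine emits
# policy-neutral events; site policy lives in separate handlers.
from collections import defaultdict

class EventEngine:
    def __init__(self):
        self.handlers = defaultdict(list)

    def on(self, event_name, handler):        # the policy layer registers interest
        self.handlers[event_name].append(handler)

    def emit(self, event_name, **fields):      # the mechanism layer publishes events
        for handler in self.handlers[event_name]:
            handler(**fields)

engine = EventEngine()

# Policy layer: site-specific decisions, kept out of the engine itself.
attempts = defaultdict(int)

def connection_attempt(src, dst, port):
    attempts[src] += 1
    if attempts[src] == 20:                    # illustrative threshold; alert once
        print(f"ALERT: {src} looks like a scanner")

engine.on("connection_attempt", connection_attempt)

# Mechanism layer (stubbed): would distill captured packets into events.
for i in range(25):
    engine.emit("connection_attempt", src="10.0.0.1", dst=f"10.0.1.{i}", port=80)
```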
59 Experiences with Bro
- Exciting research because it is used operationally (24x7) at several open sites (LBL, UCB, TUM)
- Key enabler: the site's threat model
- Occasional break-ins are tolerable
- Jewels are additionally protected (e.g., firewalls)
- Significant real-world concern: policy management
- Dynamic blocking critical to success
- Currently, 100-200 blocks/day
60 The Problem of Evasion
- Fundamental problem with passively measuring traffic on a link: network traffic is inherently ambiguous
- Generally not a significant issue for traffic characterization
- But it is in the presence of an adversary: attackers can craft traffic to confuse/fool the monitor
61 Evading Detection Via Ambiguous TCP Retransmission
62 The Problem of Crud
- There are many such ambiguities attackers can leverage.
- Unfortunately, they occur in benign traffic, too:
- Legitimate tiny fragments, overlapping fragments
- Receivers that acknowledge data they did not receive
- Senders that retransmit different data than originally (see the sketch below)
- In a diverse traffic stream, you will see these
- Approaches for defending against evasion:
- Traffic normalizers that actively remove ambiguities
- Mapping of local hosts to determine their behaviors
- Active participation by local hosts in intrusion detection
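A minimal sketch of one way a monitor can notice this particular ambiguity: remember the payload bytes already seen for each sequence position and flag retransmissions whose content differs. Real analyzers must also bound state and handle sequence wraparound; the payloads below are purely illustrative.

```python
# Minimal sketch: flag an ambiguous TCP retransmission, i.e. a sender
# retransmitting *different* bytes for a sequence range already seen.
def check_retransmission(seen, seq, payload):
    """seen: dict mapping absolute sequence position -> byte already observed."""
    inconsistent = False
    for offset, byte in enumerate(payload):
        pos = seq + offset
        if pos in seen and seen[pos] != byte:
            inconsistent = True        # same sequence position, different content
        seen[pos] = byte
    return inconsistent

seen_bytes = {}
check_retransmission(seen_bytes, 1000, b"GET /index.html")
print(check_retransmission(seen_bytes, 1000, b"GET /evil_.html"))   # -> True
```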
63 How is the Internet Abused?
- The Threat of Internet Worms
64 What is a Worm?
- Self-replicating/self-propagating code.
- Spreads across a network by exploiting flaws in open services.
- As opposed to viruses, which require user action to quicken/spread.
- Not new --- Morris Worm, Nov. 1988
- 6-10% of all Internet hosts infected
- Many more since, but none on that scale ...
- ... until ...
65 Code Red
- Initial version released July 13, 2001.
- Exploited a known bug in Microsoft IIS Web servers.
- 1st through 20th of each month: spread. 20th through end of each month: attack.
- Payload: web site defacement.
- Spread via random scanning of the 32-bit IP address space.
- But failure to seed the random number generator → linear growth.
66 Code Red, cont.
- Revision released July 19, 2001.
- Payload: flooding attack on www.whitehouse.gov.
- Bug led to it dying for date ≥ 20th of the month.
- But this time the random number generator was correctly seeded. Bingo!
68 Network Telescopes
- Idea: monitor a cross-section of the IP address space to measure network traffic involving random addresses (flooding backscatter, worm scanning)
- LBL's cross-section: 1/32,768 of the Internet (two /16 blocks, 2 × 65,536 addresses)
- UCSD's cross-section: 1/256 (a /8, about 16.8 million addresses)
69 Spread of Code Red
- Network telescopes give a lower bound on infected hosts: 360K.
- Course of infection fits the classic logistic (a minimal simulation sketch follows below).
- Note: the larger the vulnerable population, the faster the worm spreads.
- That night (→ the 20th), the worm dies ...
- ... except for hosts with inaccurate clocks!
- It just takes one of these to restart the worm on August 1st ...
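A minimal sketch of the logistic (simple epidemic) model behind that fit; the infection-rate values are illustrative, not fitted to the Code Red data.

```python
# Minimal sketch: logistic model of a random-scanning worm.
# i = fraction infected; di/dt = beta * i * (1 - i), where beta grows
# with scan rate and vulnerable-population density.
def logistic_infection(beta, i0=1e-5, hours=24.0, dt=0.01):
    t, i, curve = 0.0, i0, []
    while t < hours:
        i += beta * i * (1 - i) * dt     # Euler step of the logistic equation
        t += dt
        curve.append((t, i))
    return curve

for beta in (1.0, 2.0, 4.0):             # larger beta -> faster takeover
    t_half = next(t for t, i in logistic_infection(beta) if i >= 0.5)
    print(f"beta={beta}: half the population infected after {t_half:.1f} hours")
```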
71 Striving for Greater Virulence: Code Red 2
- Released August 4, 2001.
- Comment in code: "Code Red 2."
- But in fact a completely different code base.
- Payload: a root backdoor, resilient to reboots.
- Bug: crashes NT; only works on Windows 2000.
- Localized scanning: prefers nearby addresses.
- Kills Code Red I.
- Safety valve: programmed to die Oct 1, 2001.
72 Striving for Greater Virulence: Nimda
- Released September 18, 2001.
- Multi-mode spreading:
- attack IIS servers via infected clients
- email itself to address book as a virus
- copy itself across open network shares
- modifying Web pages on infected servers w/ client exploit
- scanning for Code Red II backdoors (!)
- worms form an ecosystem!
- Leaped across firewalls.
75 Life Just Before Slammer
76 Life Just After Slammer
77 A Lesson in Economy
- Slammer exploits a connectionless UDP service, rather than connection-oriented TCP.
- The entire worm fits in a single packet!
- When scanning, the worm can fire and forget.
- Worm infects 75,000 hosts in 10 minutes (despite a broken random number generator).
- Progress limited by the Internet's carrying capacity!
78 The Usual Logistic Growth
79 Slammer's Bandwidth-Limited Growth
81 Blaster
- Released August 11, 2003.
- Exploits a flaw in an RPC service ubiquitous across Windows.
- Payload: attack Microsoft Windows Update.
- Despite a flawed scanning and secondary infection strategy, rapidly propagates to 100Ks of hosts.
- Actually, the bulk of infections are really Nachia, a Blaster counter-worm.
- Key paradigm shift: firewalls don't help.
82 What if Spreading Were Well-Designed?
- Observation (Weaver): much of a worm's scanning is redundant.
- Idea: coordinated scanning
- Construct a permutation of the address space
- Each new worm instance starts at a random point
- A worm instance that encounters another instance re-randomizes
- Greatly accelerates the worm in later stages
83 What if Spreading Were Well-Designed?, cont.
- Observation (Weaver): accelerate the initial phase using a precomputed hit-list of, say, 1% of vulnerable hosts.
- At 100 scans/worm/sec, can infect a huge population in a few minutes.
- Observation (Staniford): compute a hit-list of the entire vulnerable population, propagate via divide & conquer.
- At 10 scans/worm/sec, infect in 10s of seconds!
84 Defenses
- Detect via honeyfarms: collections of honeypots fed by a network telescope.
- Any outbound connection from the honeyfarm ⇒ worm.
- Distill a signature from inbound/outbound traffic.
- If the telescope covers N addresses, expect detection when the worm has infected 1/N of the population.
- Thwart via scan suppressors: network elements that block traffic from hosts that make failed connection attempts to too many other hosts (see the sketch below).
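A minimal sketch of the scan-suppressor idea, assuming we can observe per-connection success or failure; the threshold is illustrative, and deployed detectors (e.g., sequential hypothesis testing) are more careful about false positives.

```python
# Minimal sketch of a scan suppressor: block any source that accrues
# failed connection attempts to too many distinct destinations.
from collections import defaultdict

FAIL_THRESHOLD = 10
failed = defaultdict(set)     # src -> destinations with failed attempts
blocked = set()

def observe(src, dst, succeeded):
    if src in blocked:
        return "already blocked"
    if not succeeded:
        failed[src].add(dst)
        if len(failed[src]) >= FAIL_THRESHOLD:
            blocked.add(src)
            return f"block {src}"
    return "ok"

for i in range(12):            # a scanner probing unused addresses
    print(observe("198.51.100.7", f"10.0.0.{i}", succeeded=False))
```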
85 Defenses?
- Observation: worms don't need to randomly scan ...
- Meta-server worm: ask a server for hosts to infect. E.g., query Google for "index.html".
- Topological worm: fuel spread with local information from infected hosts (web server logs, email address books, config files, SSH known hosts)
- No scanning signature; with rich inter-connection topology, potentially very fast.
86 Defenses??
- Contagion worm: propagate parasitically along with normally initiated communication.
- E.g., using 2 exploits (Web browser & Web server): infect any vulnerable servers visited by the browser, then any vulnerable browsers that come to those servers.
- E.g., using 1 KaZaA exploit, glide along an immense peer-to-peer network in days/hours.
- No unusual connection activity at all! :-(
87 Some Observations
- Today's worms have significant real-world impact:
- Code Red disrupted routing
- Slammer disrupted elections, ATMs, airline schedules, operations at an off-line nuclear power plant
- Blaster possibly contributed to the North American Blackout of Aug. 2003
- But today's worms are amateurish:
- Frequent bugs, algorithm/attack botches
- Unimaginative payloads
88 Next-Generation Worm Authors
- Potential for major damage with nastier payloads :-(
- Military (cyberwarfare)
- Criminals
- Denial-of-service, spamming for hire
- "Access for Sale: A New Class of Worm" (Schechter/Smith, ACM CCS WORM 2003)
- Money on the table → arms race
89 Summary
- Internet measurement is deeply challenging:
- Immense diversity
- The Internet never ceases to be a moving target
- Our mental models can betray us: the Internet is full of surprises!
- Seek invariants
- Many of the last decade's measurement questions -- What are the basic characteristics and properties of Internet traffic? -- have returned ...
- ... but now regarding Internet attacks
- What on Earth will the next decade hold??