Title: Analyzing Peer-to-Peer Traffic Across Large Networks
1Analyzing Peer-to-Peer Traffic Across Large Networks
- Jia Wang
- Joint work with Subhabrata Sen
- AT&T Labs - Research
2P2P applications
- Distributed file sharing
- Napster, Gnutella, FastTrack, EDonkey, DirectConnect
- Searching vs. data fetching phases
- All the communications occur over default ports
- SuperNodes and Hubs
- Why is this interesting?
- Large and growing traffic volume
3Outline
- Methodology
- Data collection
- Characterization metrics
- Analysis results
- Traffic volume and overlay topology
- System dynamics
- Traffic characterization
- P2P vs Web
4Methodology
- Challenges
- Decentralized system
- Transient peer membership
- Some popular protocols are closed and proprietary
- Large-scale passive measurement
- Flow-level data from routers across a large tier-1 ISP backbone
- Analyze both signaling and data fetching traffic
- 3 levels of granularity: IP, prefix, AS
- P2P protocols
- FastTrack: port 1214 (including Morpheus)
- Gnutella: ports 6346/6347
- DirectConnect: ports 411/412
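The port-based identification above can be sketched as a simple lookup. This is a minimal illustration, not the authors' tooling: the names `P2P_PORTS` and `classify_flow` are hypothetical, and the sketch assumes a flow counts as P2P when either endpoint uses one of the default ports listed on this slide.

```python
# Default ports taken from the slide; the table and function names are illustrative.
P2P_PORTS = {
    1214: "FastTrack",
    6346: "Gnutella",
    6347: "Gnutella",
    411: "DirectConnect",
    412: "DirectConnect",
}


def classify_flow(src_port, dst_port):
    """Return the P2P system name if either port is a known default port, else None."""
    for port in (src_port, dst_port):
        name = P2P_PORTS.get(port)
        if name is not None:
            return name
    return None
```

For example, `classify_flow(1214, 40000)` yields `"FastTrack"`, while a flow between ports 80 and 5000 is left unclassified.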
5Methodology Discussion
- Advantages
- Requires minimal knowledge of P2P protocols: port number
- Large-scale non-intrusive measurement
- More complete view of P2P traffic
- Allows localized analysis
- Limitations
- Flow-level data: no application-level details
- Incomplete traffic flows
- Other issues
- DHCP, NAT, proxy
- Host ≠ IP
- Asymmetric IP routing
6Measurements
- Characterization
- Overlay network topology
- Traffic distribution
- Dynamic behavior
- Metrics
- Host distribution
- Host connectivity
- Traffic volume
- Mean bandwidth usage
- Traffic pattern over time
- Connection duration and on-time
7Data cleaning
- Invalid IPs
- 10.0.0.0-10.255.255.255
- 172.16.0.0-172.31.255.255
- 192.168.0.0-192.168.255.255
- No matched prefixes in routing tables
- Invalid AS numbers
- > 64512
- Removed 4% of flows
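The cleaning rules on this slide could be applied per flow record roughly as follows. This is a sketch under stated assumptions: the record layout (a dict with `src_ip`, `src_prefix`, `src_as` and the matching `dst_*` keys) is hypothetical, a missing routing-table match is represented as `None`, and the AS cutoff follows the slide's "> 64512" rule literally.

```python
import ipaddress

# The three address ranges listed on the slide, written as CIDR blocks.
INVALID_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]
MAX_VALID_AS = 64512  # AS numbers above this value are treated as invalid


def is_valid_ip(addr):
    """True if the address falls outside the ranges listed as invalid."""
    ip = ipaddress.ip_address(addr)
    return not any(ip in net for net in INVALID_NETS)


def keep_flow(flow):
    """Return True if both endpoints of a flow record pass all cleaning rules.

    `flow` is a hypothetical dict with src_ip/src_prefix/src_as and
    dst_ip/dst_prefix/dst_as keys; prefix or AS is None when no match exists.
    """
    for side in ("src", "dst"):
        if not is_valid_ip(flow[side + "_ip"]):
            return False
        if flow[side + "_prefix"] is None:  # no matched prefix in routing tables
            return False
        asn = flow[side + "_as"]
        if asn is None or asn > MAX_VALID_AS:
            return False
    return True
```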
8Overview of P2P traffic
- Total 800 million flow records
- FastTrack is the most popular one
9Host distribution
10Host connectivity
FastTrack (9/14/2001)
- Connectivity is very small for most hosts, very high for a few hosts
- Distribution is less skewed at the prefix and AS levels
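One way to compute a connectivity metric is sketched below. It assumes connectivity means the number of distinct peers a host exchanges flows with; the function name `host_connectivity` and the `(src, dst)` pair input are illustrative choices, not taken from the talk. The same function works at the prefix or AS level if flows are first mapped to those identifiers.

```python
from collections import defaultdict


def host_connectivity(flows):
    """Map each endpoint to its number of distinct peers.

    `flows` is an iterable of (src, dst) pairs; direction is ignored.
    """
    peers = defaultdict(set)
    for src, dst in flows:
        peers[src].add(dst)
        peers[dst].add(src)
    return {host: len(p) for host, p in peers.items()}
```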
11Traffic volume distribution
FastTrack (9/14/2001)
- Significant skews in traffic volume across granularities
- Few entities source most of the traffic
- Few entities receive most of the traffic
12Mean bandwidth usage
FastTrack (9/14/2001)
- Upstream usage < downstream usage. Possible causes are:
- Asymmetric available BW, e.g., DSL, cable
- Users/ISPs rate-limiting upstream data transfers
13Time of day effect
FastTrack (9/14/2001 GMT)
- Traffic volume exhibits very strong time-of-day effect
- Milder time-of-day variation for hosts in the system
14Host connection duration & on-time
FastTrack (9/14/2001), threshold = 30 min
- Substantial transience: most hosts stay in the system for a short time
- Distribution less skewed at the prefix and AS levels
- Using per-cluster or per-AS indexing/caching nodes may help
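The slide does not spell out how on-time is derived from flow records, so the sketch below rests on an assumption: a host is treated as continuously "on" while the gap between the end of one flow and the start of the next is at most the 30-minute threshold, and on-time is the total length of the resulting active periods. The function name and the `(start, end)` tuple input (timestamps in seconds) are hypothetical.

```python
def on_time(intervals, threshold=30 * 60):
    """Total active time for one host, in seconds.

    `intervals` is a list of (start, end) flow timestamps in seconds.
    Assumption: flows separated by a gap <= threshold belong to the same
    active period, and the gap itself counts as active time.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None:
            cur_start, cur_end = start, end
        elif start - cur_end <= threshold:
            # Same active period: extend it if this flow ends later.
            cur_end = max(cur_end, end)
        else:
            # Gap exceeds the threshold: close the period and start a new one.
            total += cur_end - cur_start
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

Under a different definition (for instance, counting only time covered by flows), the merge step would stay the same but the gaps would be subtracted.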
15Traffic characterization
- The power law
- May not be a suitable model for P2P traffic
- Relationship between metrics
- Traffic volume
- Number of IPs
- On-time
- Mean bandwidth usage
16Traffic volume vs. on-time
FastTrack (9/14/2001), top 1% of hosts (73% of volume)
- Volume heavy hitters tend to have long on-times
- Hosts with short on-times contribute small traffic volumes
17Connectivity vs. on-time
FastTrack (9/14/2001), top 1% of hosts (73% of volume)
- Hosts with high connectivity have long on-times
- Hosts with short on-times communicate with few other hosts
18P2P vs Web
- Observations
- 97% of prefixes contributing P2P traffic also contribute Web traffic
- Heavy hitter prefixes for P2P traffic tend to be heavy hitters for Web traffic
- Prefix stability: the daily traffic volume (in %) from the prefix does not change over days
- Experiments: top 0.01%, 0.1%, 1%, 10% heavy hitters account for > 10%, 30%, 50%, 90% of the traffic volume
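The heavy-hitter shares above can be computed along these lines. This is only an illustrative sketch: `top_share` is a hypothetical helper, the input is assumed to be one traffic-volume value per prefix, and rounding the cutoff up to at least one prefix is a choice made here, not something stated in the talk.

```python
import math


def top_share(volumes, fraction):
    """Fraction of total volume contributed by the top `fraction` of entities.

    `volumes` holds one traffic volume per prefix (or host, or AS);
    `fraction` is e.g. 0.01 for the top 1%.
    """
    if not volumes:
        return 0.0
    ranked = sorted(volumes, reverse=True)
    # Round the cutoff up so that at least one entity is always included.
    k = max(1, math.ceil(fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0
```

Running this per day for a fixed set of top prefixes gives the daily percentage series whose variation the stability comparison examines.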
19Traffic stability
March 2002
Top 0.01% prefixes
Top 1% prefixes
P2P traffic contributed by the top heavy hitter prefixes is more stable than either Web or total traffic
20Summary
- Measure and characterize P2P traffic across a large network
- Three popular P2P systems
- Significant increase in both number of users and traffic volume
- Traffic distributions are highly skewed
- High level of system dynamics
- P2P is a significant but stable component of Internet traffic
21Acknowledgement
- AT&T Labs
- Matt Grossglauser, Carsten Lund, Jennifer Rexford, Matt Roughan, Fred True
- External
- Steve Gribble