1
A Comparative Analysis of Web and P2P Traffic
  • Naimul Basher (University of Calgary)
  • Aniket Mahanti (University of Calgary)
  • Anirban Mahanti (IIT, Delhi)
  • Carey Williamson (University of Calgary)
  • Martin Arlitt (U. Calgary and HP Labs)
  • WWW 2008, Beijing

2
Introduction
  • In the recent past, a significant proportion of
    Internet traffic volume was from Web applications
    using HTTP.
  • Web traffic is typically characterized by
    small-sized flows, short-lived connections,
    asymmetric flow volumes, and well-defined TCP
    port usage (e.g., 80, 8080, 443).
  • The advent of Peer-to-Peer (P2P) file sharing
    applications in the past decade has triggered a
    major paradigm shift in Internet data exchange.
  • P2P usage has grown steadily since its inception,
    and recent empirical studies report that Web and
    P2P together dominate today's Internet traffic.

3
Web and P2P Characterization
  • Question: How are they similar/different?
  • We use recent packet traces collected at a large
    university (30,000 students and employees) to
    characterize and compare traffic generated by
    current Web and P2P applications.
  • We also analyze and compare two P2P applications,
    BitTorrent and Gnutella.
  • We primarily focus on characterizing these
    applications at the flow-level and host-level.
  • Our work develops flow-level distributional
    models that may be used to refine Internet
    traffic models for use in network simulations
    and emulation experiments.

4
Preview of Results
5
Trace Collection Methodology
  • Full packet traces were collected using lindump
    from the 100 Mbps full duplex commercial Internet
    connection of the University of Calgary.
  • Since P2P applications frequently use random
    ports, we used payload signatures to identify
    applications.
  • We used bro, a network intrusion detection system
    (IDS), to perform payload signature matching and
    map network flows to traffic types.
  • We used non-contiguous 1-hour traces collected
    each morning and evening on Thursday through
    Sunday between April 6 and April 30, 2006.
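
Payload signature matching can be sketched as a prefix test on the first payload bytes of a flow. The snippet below is a minimal illustration, not the Bro rule set used in the study: the signature list and the `classify_payload` helper are assumptions made for this example, built from the well-known opening bytes of the BitTorrent peer handshake, the Gnutella 0.6 handshake, and HTTP.

```python
# Illustrative signatures only (an assumption for this sketch, not the
# signatures used in the study). Each entry is (traffic type, payload prefix).
SIGNATURES = [
    ("BitTorrent", b"\x13BitTorrent protocol"),  # peer handshake prefix
    ("Gnutella", b"GNUTELLA CONNECT/"),          # Gnutella 0.6 handshake request
    ("Gnutella", b"GNUTELLA/"),                  # Gnutella 0.6 handshake response
    ("Web", b"GET "),                            # HTTP request methods
    ("Web", b"POST "),
    ("Web", b"HTTP/1."),                         # HTTP response status line
]


def classify_payload(payload: bytes) -> str:
    """Return the traffic type whose signature prefixes the payload."""
    for label, prefix in SIGNATURES:
        if payload.startswith(prefix):
            return label
    return "Unknown"
```

A real classifier needs more care than this: for example, Gnutella file transfers reuse HTTP-style requests, so a bare `GET ` prefix alone would label them as Web.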

6
Trace Summary
7
Characterization Metrics
  • Flow-level characterization metrics
    • Flow size: total bytes transferred during a
      connection. Mice transfer < 10 KB. Elephants
      transfer > 5 MB. (Others are called Buffalo.)
    • Flow duration: the time between the start and
      the end of a TCP flow (e.g., SYN and FIN).
    • Flow inter-arrival time (IAT): the time between
      two consecutive flow arrivals.
  • Host-level characterization metrics
    • Flow concurrency: the maximum number of TCP
      flows a single host uses concurrently to transfer
      content to/from one or more hosts.
    • Transfer volume: the total bytes transferred to
      (downstream) and from (upstream) a host.
    • Geographic distribution: the distribution of the
      distance between hosts and U of C along the
      surface of the Earth.
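
As a rough illustration of the flow-level metrics, the sketch below labels a flow by size using the 10 KB and 5 MB thresholds from the slide and derives inter-arrival times from flow start timestamps. The function names are hypothetical, and treating 1 KB as 1024 bytes is an assumption; the slides do not say which convention was used.

```python
MICE_MAX_BYTES = 10 * 1024          # mice: < 10 KB (1 KB = 1024 bytes assumed)
ELEPHANT_MIN_BYTES = 5 * 1024 ** 2  # elephants: > 5 MB


def size_class(total_bytes: int) -> str:
    """Label a flow as mice, buffalo, or elephant by total bytes transferred."""
    if total_bytes < MICE_MAX_BYTES:
        return "mice"
    if total_bytes > ELEPHANT_MIN_BYTES:
        return "elephant"
    return "buffalo"


def inter_arrival_times(start_times):
    """Return gaps between consecutive flow arrivals (same unit as the input)."""
    ordered = sorted(start_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]
```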

8
Flow Sizes: Web and P2P
P2P model: Hybrid Pareto and Weibull
Web model: Hybrid Pareto and Weibull
  • P2P applications generate many small-sized flows
    and many very large-sized flows (many more than
    Web applications generate).
  • Small-sized P2P flows arise from signaling,
    aborted transfers, and connection attempts to
    unresponsive peers.
  • We also find some very large P2P flows, which are
    much larger than the large Web transfers.
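
To show how a hybrid flow-size model might feed a simulation, here is a minimal sampler. Everything numeric in it is a placeholder: this transcript gives neither the fitted parameters nor which distribution covers the body and which the tail, so the sketch simply assumes a Weibull body below a cutoff and a Pareto tail above it.

```python
import random


def sample_hybrid_flow_size(rng, p_body=0.9, cutoff=10_000.0,
                            weibull_scale=2_000.0, weibull_shape=0.7,
                            pareto_shape=1.2):
    """Draw one flow size in bytes from a body/tail hybrid.

    All parameter defaults are hypothetical placeholders, not fitted values.
    """
    if rng.random() < p_body:
        # Body: Weibull truncated to [0, cutoff) by rejection sampling.
        while True:
            size = rng.weibullvariate(weibull_scale, weibull_shape)
            if size < cutoff:
                return size
    # Tail: Pareto scaled so that every tail sample is >= cutoff.
    return cutoff * rng.paretovariate(pareto_shape)


rng = random.Random(2008)  # fixed seed so runs are repeatable
sizes = [sample_hybrid_flow_size(rng) for _ in range(10_000)]
```

With these placeholder settings roughly 90% of samples fall below the cutoff, and the Pareto tail supplies the occasional very large flow.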

9
Flow Sizes: Gnutella and BitTorrent
BitTorrent model: Hybrid Lognormal and Pareto
Gnutella model: Hybrid Lognormal and Pareto
  • Gnutella and BitTorrent generate similar
    percentages of small-sized flows (e.g., control
    info exchanged between peers).
  • Gnutella generates more large-sized flows than
    BitTorrent.
  • Gnutella usually downloads an entire object from
    a single peer.
  • BitTorrent uses file segmentation to split an
    object into multiple equal-sized pieces (e.g.,
    256 KB), and downloads the pieces using parallel
    flows and/or persistent connections.

10
Mice and Elephant Phenomenon
Application   Mice Flows (%)   Mice Bytes (%)   Elephant Flows (%)   Elephant Bytes (%)
Web           76               9                0.04                 15
P2P           93               0.5              1                    93
Gnutella      83               0.1              3                    93
BitTorrent    95               2                0.1                  95
  • Web mice flows account for a relatively higher
    proportion of total Web bytes than P2P mice flows
    do for total P2P bytes.
  • P2P elephant flows are larger than Web elephant
    flows.
  • BitTorrent mice flows, on average, are larger
    than Gnutella mice flows because of BitTorrent's
    signaling activities.
  • BitTorrent elephant flows, on average, are larger
    than Gnutella elephant flows.
  • Gnutella users share mostly audio files, while
    BitTorrent users share more video files
    (CacheLogic P2P Study, 2005).

11
Flow Durations: Web and P2P
P2P model: Hybrid Weibull and Pareto
Web model: Two-mode Pareto
  • Approx. 70% of Web flow durations are < 1 sec,
    indicating low response times for Web requests
    (i.e., good Internet connectivity on campus).
  • Approx. 30% of P2P flows are shorter than 30 sec.
    These are often signaling flows or failed/aborted
    flows.
  • Some P2P mice flows have long durations due to
    repeated unsuccessful connection attempts.
  • Approx. 40% of P2P flow durations are between 20
    and 200 sec. These reflect bandwidth-limited
    connections.

12
Flow Durations: Gnutella and BitTorrent
BitTorrent model: Hybrid Lognormal and Pareto
Gnutella model: Hybrid Lognormal and Pareto
  • BitTorrent flows typically last longer than
    Gnutella flows.
  • Longer BitTorrent flows result from its protocol
    architecture: concurrent flows, a fixed number of
    permitted uploads/downloads, and persistent
    connections.
  • Gnutella can use a single flow for downloading an
    object (no need to share bandwidth with
    concurrent flows).

13
Flow Concurrency: Web and P2P
  • Many P2P hosts in our network maintain only a
    single TCP connection (a surprising result).
  • A significant proportion of internal Web hosts
    maintain more than one concurrent TCP connection.
  • Web browsers often initiate multiple concurrent
    connections to transfer content in parallel.
  • A high degree of Web flow concurrency (> 30) is
    due to Web proxies, browser accelerators, and
    content distribution nodes.
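
Flow concurrency for a single host can be computed with a sweep over flow start and end times. This is a minimal sketch under the assumption that each flow is available as a `(start, end)` pair; the function name is hypothetical, and the study's actual tooling is not described in this transcript.

```python
def max_concurrency(flows):
    """Return the peak number of simultaneously open flows for one host.

    flows: iterable of (start, end) timestamps, one pair per TCP flow.
    """
    events = []
    for start, end in flows:
        events.append((start, 1))   # flow opens
        events.append((end, -1))    # flow closes
    # Tuples sort by time, then by delta, so at equal timestamps a close (-1)
    # is processed before an open (+1) and back-to-back flows do not overlap.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

A host whose flows are `[(0, 10), (2, 5), (4, 8), (10, 12)]` has a peak concurrency of 3, reached while the first three flows overlap.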

14
Distinct IP Addresses for Concurrent Flows
[Figure: distinct IP addresses for concurrent flows; panels: Web, P2P]
  • Web tends to have multiple concurrent flows to
    the same host.
  • P2P hosts use concurrent flows to connect to many
    hosts.
  • P2P protocols encourage connectivity with
    multiple hosts to facilitate widespread sharing
    of data.

15
Flow Concurrency: Gnutella and BitTorrent
  • Most Gnutella hosts connect with only one host at
    a time.
  • We observed a few Gnutella hosts with > 10
    concurrent TCP connections. These hosts acted as
    super-peers in Gnutella's peer hierarchy.
  • Most BitTorrent hosts exhibit a high degree of
    flow concurrency, which is a design feature of
    BitTorrent.

16
Transfer Symmetry: P2P Applications
System       Freeloader (%)   Fair-share (%)   Benefactor (%)
Gnutella     57               10               33
BitTorrent   10               40               50
  • Transfer symmetry is a major concern for P2P
    system developers, who want to encourage fair
    sharing among participating peers.
  • We observe pronounced freeloading in Gnutella,
    and greater fairness in BitTorrent.
  • Gnutella host behavior appears to be dominated by
    extreme upstream and downstream transfers.
  • BitTorrent's tit-for-tat mechanism encourages
    uploading for the opportunity to download.
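
One way to assign hosts to the freeloader, fair-share, and benefactor groups is by the upstream share of each host's total transfer volume. The cut-offs below (20% and 80%) are hypothetical placeholders chosen for this sketch; the transcript does not state the thresholds the authors used.

```python
def symmetry_class(upstream_bytes, downstream_bytes, low=0.2, high=0.8):
    """Classify a host by the upstream fraction of its total bytes.

    low/high are illustrative thresholds, not values from the study.
    """
    total = upstream_bytes + downstream_bytes
    if total == 0:
        return "inactive"
    up_share = upstream_bytes / total
    if up_share < low:
        return "freeloader"   # mostly downloads, uploads little
    if up_share > high:
        return "benefactor"   # mostly uploads
    return "fair-share"
```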

17
Heavy Hitters: Web and P2P
  • Heavy hitters are the few hosts that account for
    much of the traffic volume transferred.
  • Heavy hitters are present in both Web and P2P.
  • Top-ranked P2P hosts transfer an order of
    magnitude more data than top-ranked Web hosts.
  • Most P2P heavy hitters are either freeloaders or
    benefactors.
  • The total amount of data transferred by the top
    10% of Web and P2P hosts follows a power-law
    distribution.
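
The heavy-hitter effect can be quantified as the share of total volume carried by the top-ranked hosts. The helper below is a hypothetical sketch; the `fraction` default of 0.10 is a choice made for this example, and the input is assumed to be one transfer volume per host.

```python
import math


def top_share(volumes, fraction=0.10):
    """Return the fraction of total bytes moved by the top-ranked hosts."""
    total = sum(volumes)
    if not volumes or total == 0:
        return 0.0
    ranked = sorted(volumes, reverse=True)
    # Always include at least one host, even for very small populations.
    k = max(1, math.ceil(len(ranked) * fraction))
    return sum(ranked[:k]) / total
```

For ten hosts where one moves 900 MB and the other nine move 10 MB each, the top host alone carries about 91% of the volume.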

18
Geographic Distribution: Web and P2P
  • Approx. 75% of external Web hosts are in North
    America. Europe and Asia account for another 10%
    each.
  • A majority of our campus Web users are English
    speaking, and thus are likely to visit Web sites
    located in predominantly English-speaking
    countries.
  • Approx. 40% of P2P hosts are located within North
    America.
  • This indicates that connectivity between P2P
    hosts does not rely strongly on host locality;
    rather, it depends on resource availability
    during the connection establishment phase.
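
Distance along the surface of the Earth between a remote host and the campus can be estimated with the haversine formula, once each host has been mapped to a latitude and longitude. This is a sketch only: the transcript does not say how hosts were geolocated or which distance formula was used, and the Calgary coordinates below are approximate values assumed for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0       # mean Earth radius
CALGARY = (51.05, -114.07)     # approximate (lat, lon), assumed for this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Calling `great_circle_km(*CALGARY, lat, lon)` for every external host yields the distance distribution that the geographic metric describes.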

19
Geographic Distribution: Gnutella and BitTorrent
  • Approx. 70% of Gnutella hosts are located in
    North America.
  • This suggests either that Gnutella peers prefer
    to connect with hosts in close proximity or that
    Gnutella clients are widely used in North America
    for file sharing.
  • Approx. 30% of BitTorrent hosts are located in
    North America and approx. 40% are located in
    Europe.
  • We believe that the list of trackers is created
    based on host bandwidth availability in a swarm,
    and we see a bias towards regions with high
    broadband penetration.

20
Effect of Network Traffic Management
  • At the University of Calgary, traffic is managed
    using a commercial packet shaping device.
  • At the time of capture the network policy was to
    group together all identified P2P flows and
    collectively limit their bandwidth to 56 Kbps.
  • We do not observe a strong positive correlation
    between flow size and duration.
  • Some P2P flows are indeed identified and limited
    by the traffic shaper; however, we do see many
    other P2P flows that escaped detection by the
    traffic shaper.
  • Our results provide a snapshot of Web and P2P
    characteristics from a large edge network, and
    should be representative of other edge networks
    with similar user population and network
    management policies.
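
A quick calculation shows why a size-duration correlation would be expected if the shaper caught every P2P flow: under a rate cap, transfer time grows in proportion to flow size. The sketch below computes the minimum duration of a flow that has the entire 56 Kbps allowance to itself. It assumes 1 Kbps = 1000 bits/sec and 1 MB = 1024 × 1024 bytes; neither convention is stated in the slides.

```python
CAP_BITS_PER_SEC = 56 * 1000  # 56 Kbps cap, assuming 1 Kbps = 1000 bits/sec


def min_duration_sec(flow_bytes, cap_bits_per_sec=CAP_BITS_PER_SEC):
    """Lower bound on transfer time for a flow limited to the given rate."""
    return flow_bytes * 8 / cap_bits_per_sec


elephant_bytes = 5 * 1024 ** 2  # a 5 MB flow, the elephant threshold
elephant_min_sec = min_duration_sec(elephant_bytes)  # about 749 seconds
```

Since the cap was shared by all identified P2P flows, shaped flows would take even longer than this bound, so elephant flows that finish faster are consistent with flows escaping detection.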

21
Summary of Results
22
Conclusions and Future Work
  • Our work presented an extensive characterization
    study of Web and P2P traffic using full packet
    traces collected at a large edge network (U of C
    campus).
  • We observed a number of contrasting features
    between Web and P2P traffic using flow-level and
    host-level metrics.
  • Flow-level distributional models were developed
    for Web and P2P traffic. These can be used in
    network simulation and emulation experiments.
  • Traffic from other networks should be studied to
    facilitate development of general models for Web
    and P2P traffic.
  • Impact of other non-Web applications, such as P2P
    VoIP and IPTV, can be studied as well.

23
FLOW MODELS
24
Inter-Arrival Times: Web and P2P
P2P model: Hybrid Weibull and Pareto
Web model: Two-mode Weibull
  • Web flow IATs are much shorter than those of P2P
    flows.
  • Web traffic has a higher arrival rate (80
    flows/sec) compared to P2P traffic (6 flows/sec).
  • Another factor contributing to the lower arrival
    rate and the longer IAT values for P2P flows is
    the persistent nature of their TCP connections.
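
The arrival rates quoted above translate directly into mean inter-arrival times, since the mean IAT is the reciprocal of the average arrival rate. A small worked example (the function name is hypothetical):

```python
def mean_iat_seconds(flows_per_second):
    """Mean inter-arrival time implied by an average flow arrival rate."""
    return 1.0 / flows_per_second


web_mean_iat = mean_iat_seconds(80)  # 0.0125 sec, i.e. 12.5 ms
p2p_mean_iat = mean_iat_seconds(6)   # roughly 0.167 sec
```

On these figures the average gap between P2P flow arrivals is about 13 times longer than for Web flows.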

25
Transfer Volume: Web and P2P
  • Approx. 50% of Web and P2P hosts transfer small
    amounts of data (< 1 MB) and are typically active
    for < 100 sec.
    • P2P: hosts that repeatedly yet unsuccessfully
      attempt to connect to peers.
    • Web: hosts that browse the Web, widgets that
      retrieve information from the Web periodically,
      and downloads of small files.
  • Approx. 35% of Web and 15% of P2P hosts transfer
    < 10 MB of data and are active for < 1000 sec.
    • P2P: hosts that share small objects.
    • Web: hosts that browse the Web for prolonged
      periods, software/multimedia downloads, and
      HTTP-based streaming.
