Title: How do people use the Internet
1. How do people use the Internet?
- CS 7270
- Internet Applications & Services
- Lecture-1
2. Reading
- "The Broadband Fact Book", by the Internet Innovation Alliance
- Not a research paper, but it includes some interesting statistics about application usage trends
- Most interesting part is pages 16-19.
3. Application preferences change over time
4. iTunes explodes
5. Video explodes
6. Online gaming explodes
7. Some interesting statistics
- 46% of Internet users watch an online video once a week (as of Sept. '06)
- 8% of Internet users downloaded a movie during 3Q06 using P2P apps
- 60% adult content, 20% TV content, rest is movies, clips, etc.
- YouTube stats (March '06)
- 50% of users are younger than 20 years old
- 60% of all videos watched online
- 65,000 new videos uploaded daily
- Total viewing time about 10,000 years!
- YouTube consumed as much bandwidth in 2006 as the whole Internet did in 2000
8. How do people use the Web?
- Almost all users do the basics (email, Web browsing)
- 50% of users pay bills online
- 25% hunt for jobs online
- 8% upload videos
- 5% publish blogs
- 4% date online
10. Traffic classification: application identification
- CS 7270
- Internet Applications & Services
- Lecture-2
11. Background
- What does traffic classification mean?
- What does application identification mean?
- Packet monitors
- Which are the important packet header fields?
- Flow monitors (e.g., Cisco's NetFlow)
- Definition of a flow?
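One common answer to the last two questions is to treat a flow as the set of packets that share the same 5-tuple of header fields (source IP, destination IP, source port, destination port, protocol). The sketch below groups packets that way. The `Packet` record and the `aggregate_flows` helper are hypothetical names chosen for illustration. A real flow monitor also applies timeouts to end flows, and NetFlow's own key includes further fields (type of service and input interface), which are left out here.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical, simplified packet record holding header fields only.
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number: 6 = TCP, 17 = UDP
    length: int    # packet size in bytes


def flow_key(pkt):
    """Unidirectional 5-tuple identifying the flow a packet belongs to."""
    return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)


def aggregate_flows(packets):
    """Group packets by 5-tuple and keep per-flow counters (no timeouts)."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0, "first": None, "last": None})
    for pkt in packets:
        rec = flows[flow_key(pkt)]
        rec["packets"] += 1
        rec["bytes"] += pkt.length
        if rec["first"] is None:
            rec["first"] = pkt.timestamp
        rec["last"] = pkt.timestamp
    return dict(flows)


packets = [
    Packet(0.00, "10.0.0.1", "10.0.0.2", 51000, 80, 6, 60),
    Packet(0.05, "10.0.0.1", "10.0.0.2", 51000, 80, 6, 1500),
    Packet(0.07, "10.0.0.3", "10.0.0.2", 40000, 53, 17, 80),
]
flows = aggregate_flows(packets)
```

Here the two TCP packets fall into one flow and the UDP packet into another. Packet sizes and inter-arrival times per flow can be derived from the same records.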
12. Background
- Who is interested in traffic classification and why?
- Performance metrics in traffic classification?
- Accuracy: fraction of correctly classified flows (or bytes) in the trace
- Precision: fraction of flows (or bytes) classified as application X that are truly of application X
- Recall: fraction of flows (or bytes) of application X that were correctly classified as application X
- F-measure: a weighted harmonic mean of precision and recall
- Running time: growing need for real-time classification
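The metric definitions above can be made concrete with a few lines of code. This is a minimal sketch that counts flows (weighting by bytes would work the same way). The function name `classification_metrics` and the `beta` parameter are illustrative choices; `beta = 1` gives the usual F1 score, the unweighted harmonic mean of precision and recall.

```python
def classification_metrics(true_labels, predicted_labels, app, beta=1.0):
    """Accuracy over all flows, plus precision/recall/F-measure for one application."""
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for t, p in pairs if t == p)
    tp = sum(1 for t, p in pairs if t == app and p == app)  # correctly labeled as app
    fp = sum(1 for t, p in pairs if t != app and p == app)  # wrongly labeled as app
    fn = sum(1 for t, p in pairs if t == app and p != app)  # app flows that were missed

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    f_measure = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


truth = ["p2p", "p2p", "web", "web", "mail"]
guess = ["p2p", "web", "web", "p2p", "mail"]
metrics = classification_metrics(truth, guess, app="p2p")
```

In this toy trace three of five flows are labeled correctly (accuracy 0.6), while for "p2p" one flow is found, one is missed, and one web flow is mislabeled as p2p, so precision and recall are both 0.5.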
13. Existing approaches
- Port-based
- Largely ineffective today
- Flow-based signatures/patterns
- Look for certain packet sizes, packet interarrivals, flow sizes
- Supervised machine-learning techniques
- Require accurate classification of some flows (training set)
- Cluster flows based on a group of discriminants?
- Payload-based techniques
- Look for certain strings or byte sequences in layer-4 (or higher) headers
- What does deep packet inspection mean?
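To see why port-based classification is largely ineffective while payload-based techniques still work, compare the two on a flow that runs a P2P protocol over port 80. The sketch below is illustrative only: the port table and signature list are tiny hand-picked examples (the BitTorrent handshake and Gnutella connect strings are the well-known protocol greetings), and the function names are hypothetical.

```python
# Small illustrative tables; real classifiers use far larger ones.
WELL_KNOWN_PORTS = {25: "smtp", 53: "dns", 80: "http", 443: "https"}

# (payload prefix, application), checked in order.
PAYLOAD_SIGNATURES = [
    (b"\x13BitTorrent protocol", "bittorrent"),  # start of the BitTorrent handshake
    (b"GNUTELLA CONNECT", "gnutella"),           # start of the Gnutella handshake
    (b"GET ", "http"),
]


def classify_by_port(src_port, dst_port):
    """Port-based: trust whichever endpoint uses a well-known port."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"


def classify_by_payload(payload):
    """Payload-based: match characteristic byte strings at the start of the payload."""
    for prefix, app in PAYLOAD_SIGNATURES:
        if payload.startswith(prefix):
            return app
    return "unknown"


# A BitTorrent handshake sent to port 80 to evade port-based filters.
first_payload = b"\x13BitTorrent protocol" + bytes(8)
by_port = classify_by_port(src_port=51413, dst_port=80)
by_payload = classify_by_payload(first_payload)
```

The port-based answer is "http" and the payload-based answer is "bittorrent". The payload approach fails in turn when traffic is encrypted or when the payload is not captured, which is where flow-based and machine-learning techniques come in.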
14. How would you do traffic classification?
- A good project topic?
- Some things to consider as you decide on a project topic
- What is the most important related work?
- See Keshav's paper
- Read at least 3-4 papers on a topic before you decide to work on it
- What is the key new idea that I want to explore?
- For example, can I identify individual p2p applications if I have access to the payload of the first packet in a flow (after connection establishment)?
- Which are the available tools I can use?
- Tcpdump or ethereal packet monitors on my laptop
- Install clients of p2p applications on my laptop
- Do we have appropriate datasets?
- OIT may be able to provide us with anonymized packet traces or NetFlow records from GA Tech's edge routers
- You can collect packet traces from your own laptop for validation purposes
- What is the set of questions I want to answer? How will I do so?
- Asking the right questions is 50% of the research!
- Describe your methodology in detail
- E.g., I will examine hypothesis X; if I can accept it, I will move on to hypothesis Y (given X); otherwise, if I reject X, I will move to hypothesis Z
15. Reading-2
- "Is P2P dying or just hiding?", by T. Karagiannis et al.
- Abstract
Recent reports in the popular media suggest a
significant decrease in peer-to-peer (P2P)
file-sharing traffic, attributed to the public's
response to legal threats. Have we reached the
end of the P2P revolution? In pursuit of
legitimate data to verify this hypothesis, we
embark on a more accurate measurement effort of
P2P traffic at the link level. In contrast to
previous efforts we introduce two novel elements
in our methodology. First, we measure traffic of
all known popular P2P protocols. Second, we go
beyond the known port limitation by reverse
engineering the protocols and identifying
characteristic strings in the payload. We find
that, if measured accurately, P2P traffic has
never declined; indeed, we have never seen the
proportion of p2p traffic decrease over time (any
change is an increase) in any of our data sources.
16. Methodology
- They analyzed packet traces (first 44 bytes of each IP packet, leaving only 4 bytes of payload)
- Search for characteristic strings in payload
- They present four heuristics (M1-M4), with increasing p2p estimation aggressiveness
- (btw, this could have been a nice course project for CS 7270)
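The "only 4 bytes of payload" figure follows from the header sizes: a minimal IPv4 header (20 bytes) plus a minimal TCP header (20 bytes) already uses 40 of the 44 captured bytes. The sketch below extracts whatever payload is visible in such a truncated snapshot and matches it against 4-byte prefixes. The function names and the two prefixes are assumptions made for illustration; they are not the paper's signature table, and the M1-M4 heuristics are not reproduced here.

```python
SNAPLEN = 44  # bytes of each IP packet kept in the trace

# Illustrative 4-byte prefixes only (not taken from the paper).
PREFIX_SIGNATURES = {b"\x13Bit": "bittorrent", b"GNUT": "gnutella"}


def visible_payload(snapshot):
    """Return the transport payload bytes that survive in a truncated IPv4 snapshot."""
    if len(snapshot) < 20 or snapshot[0] >> 4 != 4:
        return b""  # too short, or not IPv4
    ip_header_len = (snapshot[0] & 0x0F) * 4  # IHL field, in 32-bit words
    protocol = snapshot[9]
    if protocol == 6:  # TCP: data offset is the high nibble of header byte 12
        if len(snapshot) < ip_header_len + 13:
            return b""
        tcp_header_len = (snapshot[ip_header_len + 12] >> 4) * 4
        offset = ip_header_len + tcp_header_len
    elif protocol == 17:  # UDP: fixed 8-byte header
        offset = ip_header_len + 8
    else:
        return b""
    return snapshot[offset:]


def classify_snapshot(snapshot):
    """Match the visible payload prefix against the known 4-byte signatures."""
    return PREFIX_SIGNATURES.get(visible_payload(snapshot)[:4], "unknown")


# Build a synthetic packet: minimal IPv4 header, minimal TCP header, BitTorrent handshake.
ip_header = bytes([0x45]) + bytes(8) + bytes([6]) + bytes(10)  # version 4, IHL 5, TCP
tcp_header = bytes(12) + bytes([0x50]) + bytes(7)              # data offset 5 words
packet = ip_header + tcp_header + b"\x13BitTorrent protocol"
snapshot = packet[:SNAPLEN]
payload = visible_payload(snapshot)
label = classify_snapshot(snapshot)
```

With TCP options present (for example a 32-byte TCP header), the offset exceeds 44 and no payload is visible at all, which is one reason several heuristics of varying aggressiveness are useful.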
17. P2P did not decrease in '03-'04 (despite the lawsuits by the RIAA that took place during that period)
18. FastTrack decrease (mostly Kazaa), BitTorrent increase by 100%
19. Reading-3
- "Internet Traffic Classification Demystified: Myths, Caveats, and the Best Practices", by H. Kim et al.
- Published at CoNEXT '08
- Comparison of major existing methods
- Link to slides