YouTube Traffic Characterization: A View From the Edge

1
YouTube Traffic Characterization: A View From
the Edge
  • Phillipa Gill¹, Martin Arlitt²,¹,
  • Zongpeng Li¹, Anirban Mahanti³
  • ¹Dept. of Computer Science, University of
    Calgary, Canada
  • ²Enterprise Systems Software Lab, HP Labs, USA
  • ³Dept. of Computer Science and Engineering, IIT
    Delhi, India

2
Introduction
  • The way people use the Web is changing.
  • Creation and sharing of media
  • Fast, easy, cheap!
  • Volume of data associated with extremely popular
    online media.

3
What is Web 2.0?
  • User-generated content
  • Text: WordPress, Blogspot
  • Photos: Flickr, Facebook
  • Video: YouTube, MySpace
  • Social networking: Facebook, MySpace
  • Tagging: Flickr, YouTube

4
YouTube Facts and Figures
  • Founded in February 2005
  • Enabled users to easily share movies by
    converting them to Flash
  • Largest video-sharing Web site on the Internet
    [Alexa 2007]
  • Sold to Google for $1.65 billion in November 2006

5
How YouTube Works (1/2)
GET /watch?v=wQVEPFzkhaM
OK (text/html)
GET /vi/fNaYQ4kM4FE/2.jpg
OK (img/jpeg)
6
How YouTube Works (2/2)
GET swfobject.js
OK (application/x-javascript)
GET /p.swf
OK (application/shockwave-flash)
GET /get_video?video_id=wQVEPFzkhaM
OK (video/flv)
7
Our Contributions
  • Efficient measurement framework
  • One of the first extensive characterizations of
    Web 2.0 traffic
  • File properties
  • File access patterns
  • Transfer properties
  • Implications for network and content providers

8
Outline
  • Introduction & Background
  • Contributions
  • Methodology
  • Results
  • Implications
  • Conclusions

9
Our View Points
  • Edge (University Campus)
  • 28,000 students
  • 5,300 faculty & staff
  • /16 address space
  • 300 Mb/s full-duplex network link
  • Global
  • Most popular videos

10
Campus Data Collection
  • Goals
  • Collect data on all campus YouTube usage
  • Gather data for an extended period of time
  • Protect user privacy
  • Challenges
  • YouTube's popularity
  • Monitor limitations
  • Volume of campus Internet usage

11
Our Methodology
  • Identify servers providing YouTube content
  • Use bro to summarize each HTTP transaction in
    real time
  • Restart bro daily and compress the daily log
  • Map visitor identifier to a unique ID
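
The slides do not say how visitor identifiers are mapped to unique IDs. One possible approach, shown here purely as an assumption, is a keyed hash: the same visitor always receives the same opaque ID, and the mapping cannot be reversed without the key. The names `SECRET_KEY` and `pseudonymize` are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret (an assumption, not from the slides).
# It would be generated once and stored away from the logs.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(visitor_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a visitor identifier (e.g. an IP address) to a stable opaque ID."""
    digest = hmac.new(key, visitor_id.encode("utf-8"), hashlib.sha256)
    # Truncate for compact log entries; 16 hex characters = 64 bits.
    return digest.hexdigest()[:16]
```

Because the output is stable for a given key, per-visitor statistics can still be computed from the summarized logs without storing the original identifier.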

12
Categories of Transactions
  • Complete: the entire transaction was parsed
    successfully
  • Interrupted: TCP connection was reset
  • Gap: monitor missed a packet
  • Failure: transaction could not be parsed

13
Categories of Transactions (2)
14
Our Traces
15
HTTP Response Codes
16
Global Data Collection
  • Crawling all videos is infeasible
  • Focus on top 100 most popular videos
  • Four time frames: daily, weekly, monthly and all
    time
  • Two-step data collection
  • Retrieve pages of most popular videos
  • Use YouTube API to get details on these videos

17
Outline
  • Introduction & Background
  • Contributions
  • Methodology
  • Results
  • Implications
  • Conclusions

18
Results
  • Campus Usage Patterns
  • File Properties
  • File Access Patterns
  • Transfer Properties

19
Campus Usage Patterns
Reading Break
20
Results
  • Campus Usage Patterns
  • File Properties
  • File Access Patterns
  • Transfer Properties

21
Unique File Sizes
  • Video data is significantly larger than the other
    content types

22
Time Since Modification
  • Videos and images rarely modified
  • Text and application data modified more frequently

23
Video Durations
  • Spike around 3 minutes: likely music videos
  • Campus videos are relatively short: µ = 3.3 min

24
Summary of File Properties
  • Video content is much larger than other content
    types
  • Image and video content is more static than
    application and text content
  • Video durations are relatively short
  • Videos viewed on campus tend to be more than 1
    month old

25
Results
  • Campus Usage Patterns
  • File Properties
  • File Access Patterns
  • Transfer Properties

26
Relative Popularity of Videos
  • Video popularity follows a weak Zipf
    distribution
  • Possibly due to edge network point of view

β = 0.56
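
A Zipf exponent β can be estimated by fitting a straight line to log(view count) against log(rank). The sketch below uses ordinary least squares, which is an assumption; the slides do not state which fitting method was used. The function name `zipf_exponent` is hypothetical.

```python
import math


def zipf_exponent(view_counts):
    """Estimate beta in count ~ rank**(-beta) by least squares on log-log data."""
    # Rank videos from most to least viewed, ignoring zero counts.
    counts = sorted((c for c in view_counts if c > 0), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    # The fitted slope is -beta.
    return -slope
```

A value near 1 would indicate classic Zipf behaviour; a smaller value such as the one reported here means popularity is spread more evenly across videos.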
27
Commonality of Videos
  • 10% commonality between consecutive days during
    the week
  • 5% commonality between consecutive days on the
    weekend
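
The slides do not define commonality precisely. A minimal sketch, assuming it means the fraction of the current day's distinct videos that were also requested on the previous day:

```python
def commonality(previous_day, current_day):
    """Fraction of current-day videos also seen on the previous day.

    Both arguments are iterables of video IDs. This definition is an
    assumption; other overlap measures (e.g. Jaccard) are equally plausible.
    """
    prev, cur = set(previous_day), set(current_day)
    if not cur:
        return 0.0
    return len(prev & cur) / len(cur)
```

For example, `commonality(["a", "b", "c"], ["c", "d"])` is 0.5, since one of the two videos seen on the second day was also seen on the first.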

28
Summary of File Referencing
  • Zipf distribution is weak when observed from the
    edge of the network
  • There is some overlap between videos viewed on
    consecutive days
  • Significant amount of content viewed on campus is
    non-unique

29
Results
  • Campus Usage Patterns
  • File Properties
  • File Access Patterns
  • Transfer Properties

30
Transfer Sizes
Flash player (p.swf, player2.swf)
JavaScript files
31
Transfer Durations
  • Video transfers have significantly longer
    durations than other content types

32
Summary of Transfer Properties
  • JavaScript and Flash objects have an impact on
    the size of files transferred
  • Video transfers have significantly larger sizes
    and durations

33
Outline
  • Introduction & Background
  • Contributions
  • Methodology
  • Results
  • Implications
  • Conclusions

34
Implications for Network Providers
  • Web 2.0 poses challenges to caching
  • Larger multimedia files
  • More diversity in content
  • Meta data may be used to improve caching
    efficiency

35
Implications for Content Providers
  • Multimedia content is large!
  • 65,000 videos/day × 10 MB/video = 19.5 TB/month
  • Long tail effect - much of the content will be
    unpopular
  • Cheap storage solutions
  • Longer transfer durations for video files
  • More CPU cycles required for transfers
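
The storage figure above can be checked with a short calculation. The 30-day month and decimal units (1 TB = 1,000,000 MB) are assumptions chosen because they reproduce the 19.5 TB figure on the slide.

```python
VIDEOS_PER_DAY = 65_000   # from the slide
MB_PER_VIDEO = 10         # from the slide
DAYS_PER_MONTH = 30       # assumption
MB_PER_TB = 1_000_000     # assumption: decimal units


def monthly_storage_tb(videos_per_day, mb_per_video, days=DAYS_PER_MONTH):
    """New storage needed per month, in terabytes."""
    return videos_per_day * mb_per_video * days / MB_PER_TB


print(monthly_storage_tb(VIDEOS_PER_DAY, MB_PER_VIDEO))  # 19.5
```

That is 650 GB of new uploads per day, before any replication or transcoding overhead.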

36
Conclusions
  • Multimedia content has much larger transfer sizes
    and durations than other content types
  • From the edge of the network, video popularity
    follows a weak Zipf distribution
  • Web 2.0 facilitates diversity in content which
    poses challenges to caching
  • New approaches are needed to efficiently handle
    the resource demands of Web 2.0 sites

37
Questions?
  • Contact
  • psessini_at_ucalgary.ca

38
Ignore the slides after this one
39
Download to Bitrate-Ratio
40
Time of Day and Day of Week Traffic Patterns
41
Video Ages
  • 73% of campus videos are older than 1 month
  • 5% of campus videos are older than 1 year

42
Absolute Growth in Working Set
  • Half the video content transferred is non-unique
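
The non-unique share can be computed by comparing the total bytes transferred with the bytes of first-time transfers only. This is a sketch under assumptions: the input format (a sequence of `(video_id, size_bytes)` pairs in arrival order) and the function name `working_set_growth` are hypothetical, not taken from the slides.

```python
def working_set_growth(transfers):
    """Return (total_bytes, unique_bytes) for a sequence of video transfers.

    `transfers` yields (video_id, size_bytes) pairs in arrival order.
    unique_bytes counts each video only the first time it is seen.
    """
    seen = set()
    total_bytes = 0
    unique_bytes = 0
    for video_id, size_bytes in transfers:
        total_bytes += size_bytes
        if video_id not in seen:
            seen.add(video_id)
            unique_bytes += size_bytes
    return total_bytes, unique_bytes
```

The non-unique fraction is then `1 - unique_bytes / total_bytes`; a value near 0.5 corresponds to the observation on this slide and gives an upper bound on what an ideal cache at the campus edge could save.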

43
What is different about Web 2.0?
Web 1.0
44
What is different about Web 2.0?
Web 2.0