Title: Understanding User Behavior in Large Scale Video-on-Demand Systems
1Understanding User Behavior in Large Scale
Video-on-Demand Systems
- Hongliang Yu, Dongdong Zheng, Ben Y. Zhao, Weimin Zheng
- Tsinghua University and UC Santa Barbara
- EuroSys Conference 2006
2Motivation
- VOD: the future of media networks
- Select your favorite movies as you like, any time, anywhere: impressive
- In China, as of Jan. 2005: 8 million VOD users, 5 million of them frequent users, growing at a rate of 35% per year (China Telecommunication Newspaper)
- Globally: 90 million VOD users in 2003, 138 million in 2005, 327 million estimated for 2010 (Information Media Group)
- Most current systems are not true VOD. Business reasons? Technical reasons?
3Motivation
- Characteristics of VOD
- Multi data source
- Asynchronous data stream
- High interactivity, VCR
- Challenges
- High Network Bandwidth
- High Random I/O capacity
- Technical approaches
- Caching Policies
- Data replication
- Distributed content delivery
- Providing VOD service to a huge number of clients in a scalable way is still unsolved
4Motivation
- Building a user behavior model for VOD system optimization is challenging
- Little is known about user behavior in deployed large-scale VOD systems: a chicken-and-egg problem?
- Existing studies are based on rental data from video stores, small-scale VOD systems, or web streaming services
- Video rental: too few video titles, limited physical copies
- Web streaming: narrow-band service, smaller file sizes, and poor video quality, all of which strongly affect user behavior
5The focus of this paper
- Things useful to video streaming system design and maintenance
- What does the user arrival rate look like in such a system?
- In what situations do people stay patient?
- Which parts of the content do people tend to visit?
- How do user interests change over time?
- What features should we keep in such services?
6Source of Data
- Log data from an infrastructure-based, large-scale video-on-demand service deployed in China
- The system has over 1.5 million users in total; this study uses data from one region with about 150 thousand users
- 21,498,338 sessions over 219 days, involving 7,036 movies
- Movie length: 38.23% > 90 min, 41.76% 45-90 min
- Average data rate is about 384 Kbps (supported by 512K ADSL)
7Outline
- Motivation
- Source of Data
- Poisson Distribution
- Session Length
- User Interests
- Summary
8User arrival rate
[Figure: probability (PROB) vs. user arrivals per 5 sec]
- 0-27 arrivals per 5 seconds; the distribution does not match Poisson
9User arrival rate
[Figure: probability (PROB) vs. user arrivals per 5 sec]
- 0-27 arrivals per 5 seconds; the distribution does not match Poisson
- Guess: system idle time may be responsible for the failure of the Poisson fit
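One way to run this kind of check is to bin session start times into 5-second windows and compare the empirical distribution of per-window arrival counts with a Poisson distribution of the same mean. The sketch below assumes start times are given in seconds since the beginning of the trace; the function names are illustrative and are not taken from the study.

```python
import math
from collections import Counter

def arrivals_per_window(start_times, window=5.0, duration=None):
    """Count session arrivals in consecutive fixed-size windows."""
    if duration is None:
        duration = max(start_times) + window
    n_windows = int(math.ceil(duration / window))
    counts = [0] * n_windows
    for t in start_times:
        # Clamp so a start time exactly at `duration` lands in the last window.
        counts[min(int(t // window), n_windows - 1)] += 1
    return counts

def empirical_pmf(counts):
    """Fraction of windows that saw exactly k arrivals, for each observed k."""
    freq = Counter(counts)
    return {k: freq[k] / len(counts) for k in sorted(freq)}

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def compare_with_poisson(counts):
    """Return the sample mean and {k: (empirical, poisson)} for each observed k."""
    lam = sum(counts) / len(counts)
    emp = empirical_pmf(counts)
    return lam, {k: (p, poisson_pmf(k, lam)) for k, p in emp.items()}
```

Restricting `start_times` to a busy period (for example the evening hours) before calling `compare_with_poisson` is one way to test whether idle periods are what distorts the fit.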
10User Arrival Pattern
[Figure: probability (PROB) vs. user arrivals per 5 sec]
- Using data from rush hour (6 PM to 9 PM): similar in shape to Poisson
11User Arrival Pattern
[Figure: probability (PROB) vs. user arrivals per 5 sec]
- Using data from rush hour (6 PM to 9 PM): similar in shape to Poisson
12User Arrival Pattern
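[Figure: probability (PROB) vs. user arrivals per 5 sec]
- Using data from rush hour (6 PM to 9 PM): similar in shape to Poisson
- A modified version of Poisson fits the real workload well
[Equation: probability mass function of the modified Poisson model, X = 0, 1, 2, ...]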
13Indication
- The Poisson distribution underestimates the probability of small arrival counts and overestimates the probability of large ones, which leads to inefficient resource reservation
- With the modified model, the maximum user arrival rate (N) can be chosen according to user requirements and the investment plan
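As a rough illustration of sizing from an arrival model, the sketch below finds the smallest N such that the probability of more than N arrivals in a window stays below a target overflow probability, once under a plain Poisson model and once directly from observed per-window counts. It does not implement the modified model from the talk; the function names and the overflow target are assumptions made for the example.

```python
import math

def poisson_capacity(lam, overflow_prob):
    """Smallest N with P(X > N) <= overflow_prob for X ~ Poisson(lam)."""
    if not (0.0 < overflow_prob < 1.0):
        raise ValueError("overflow_prob must be in (0, 1)")
    if not (0.0 < lam <= 700.0):
        # exp(-lam) underflows for very large lam; keep the sketch simple.
        raise ValueError("lam must be in (0, 700]")
    k = 0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    while 1.0 - cdf > overflow_prob:
        k += 1
        term *= lam / k        # P(X = k) from P(X = k - 1)
        cdf += term
    return k

def empirical_capacity(counts, overflow_prob):
    """Smallest N such that at most overflow_prob of windows exceed N arrivals."""
    for n in range(max(counts) + 1):
        exceed = sum(1 for c in counts if c > n)
        if exceed / len(counts) <= overflow_prob:
            return n
    return max(counts)
```

If a model overestimates large arrival counts, `poisson_capacity` will tend to return a larger N than `empirical_capacity` for the same target, which is the over-reservation effect described above.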
14Outline
- Motivation
- Source of Data
- Poisson Distribution
- Session Length
- User Interests
- Summary
15Session length: impatient audience
[Figure: CDF of session length (minutes)]
- 37% of users terminate their session within the first 5 minutes
- 52.55% within 10 minutes
- 75% within 25 minutes
16Session length related to popularity?
[Figure: CDF of NSL]
- NSL: the ratio SessionLength / VideoLength
- Expected: movies with higher popularity have longer session lengths
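The NSL metric and its CDF are straightforward to compute from session logs. The following is a minimal sketch that assumes each log record provides a session length and the length of the video being watched, both in the same unit; the function names are illustrative.

```python
from bisect import bisect_right

def nsl(session_length, video_length):
    """Normalized session length: SessionLength / VideoLength."""
    if video_length <= 0:
        raise ValueError("video_length must be positive")
    return session_length / video_length

def empirical_cdf(values):
    """Return a function x -> fraction of values that are <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf
```

Computing one CDF per popularity group (for example, the most and least requested movies) and comparing the curves is one way to look at how session length relates to popularity.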
17Session length related to popularity?
[Figure: CDF of NSL]
- NSL: the ratio SessionLength / VideoLength
18Session length related to popularity?
[Figure: CDF of NSL]
- NSL: the ratio SessionLength / VideoLength
- Movies with HIGHER popularity tend to have SHORTER session length!
Surprise!!!
19Session length related to popularity?
[Figure: CDF of NSL]
- NSL: the ratio SessionLength / VideoLength
- A relation between movie popularity and session length does exist, but it is not strong
20Example: caching optimization
- Movie A is the most popular movie, movie B second, movie C last
Movie A: A0 A1 A2 A3
Movie B: B0 B1 B2 B3
Movie C: C0 C1 C2 C3
Caching Priority
List 1 (movie by movie): A0 A1 A2 A3 B0 B1 B2 B3 C0 C1 C2 C3
List 2 (segments interleaved across movies): A0 A1 B0 C0 B1 C1 A2 B2 B3 C2 A3 C3
- The latter priority list is more reasonable
- Not every part of the most popular movie should be prioritized
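The difference between the two priority lists can be sketched as two sort orders over per-segment access counts: one ranks whole movies by total hits, the other ranks each segment on its own. The hit counts below are hypothetical values chosen only so that the example reproduces the two orderings on this slide; they are not measurements from the study.

```python
# Hypothetical per-segment hit counts, keyed by (movie, segment index).
SEGMENT_HITS = {
    ("A", 0): 100, ("A", 1): 90, ("A", 2): 40, ("A", 3): 15,
    ("B", 0): 80,  ("B", 1): 60, ("B", 2): 30, ("B", 3): 25,
    ("C", 0): 70,  ("C", 1): 50, ("C", 2): 20, ("C", 3): 10,
}

def movie_level_priority(segment_hits):
    """Rank segments by their movie's total hits, then by position in the movie."""
    totals = {}
    for (movie, _), hits in segment_hits.items():
        totals[movie] = totals.get(movie, 0) + hits
    return sorted(segment_hits, key=lambda seg: (-totals[seg[0]], seg[1]))

def segment_level_priority(segment_hits):
    """Rank each segment by its own hit count, regardless of movie."""
    return sorted(segment_hits, key=lambda seg: -segment_hits[seg])

def labels(segments):
    """Render (movie, index) pairs as 'A0', 'B1', ..."""
    return [f"{movie}{index}" for movie, index in segments]
```

With these counts, movie A still has the highest total, yet its later segments rank below the opening segments of B and C, which is why a per-segment policy spends cache space differently.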
21Example: ALM optimization
- Movies 1, 2, 3, 4 are ordered from least popular to most popular
[Figure: two alternative ALM trees over nodes A, B, C, D, each node viewing one of movies 1-4]
- The right ALM tree has a better chance of being stable
22Indication
- Caching the prefix is effective
- Popularity may not reflect the potential of the content
- On eBay, high-reputation users go on to gain much more reputation than they already had
- In the Powerinfo VOD system, high-reputation movies are not always that attractive; people are drawn in only by the reputation
- A caching policy based on per-segment popularity counts is more effective
- Placing nodes that view relatively colder content near the root of the ALM tree will be effective
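The ALM placement idea can be sketched as follows: sort peers by the popularity of the movie they are viewing, coldest first, and fill a bounded-fanout tree in breadth-first order so that viewers of colder content end up closest to the root. This is only an illustration under simple assumptions (a single tree, a fixed fanout, a known popularity score per movie); the function name and parameters are hypothetical and do not describe the deployed system.

```python
from collections import deque

def build_alm_tree(peers, popularity, fanout=2):
    """Build a child -> parent map, placing viewers of colder movies near the root.

    peers:      dict mapping peer id -> movie id being viewed
    popularity: dict mapping movie id -> popularity score (higher = hotter)
    """
    # Coldest content first; peer id breaks ties deterministically.
    ordered = sorted(peers, key=lambda p: (popularity[peers[p]], p))
    parent = {}
    if not ordered:
        return parent
    root = ordered[0]
    parent[root] = None
    open_slots = deque([root] * fanout)   # each entry is one free child slot
    for peer in ordered[1:]:
        parent[peer] = open_slots.popleft()
        open_slots.extend([peer] * fanout)
    return parent
```

For four peers watching movies 1 to 4, the peer watching the least popular movie becomes the root and the peer watching the most popular movie becomes a leaf.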
23Outline
- Motivation
- Source of Data
- Poisson Distribution
- Session Length
- User Interests
- Summary
24User interests distribution
[Figure: CDF of accesses vs. movie index (sorted by popularity)]
- 10% of objects cover 60% of accesses
- 23% of objects get 80% of the hits
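Concentration figures of this kind can be derived from per-movie access counts. The sketch below assumes a plain list of access counts, one per movie; the function names and the toy numbers in the usage note are illustrative only.

```python
def access_share_of_top(access_counts, fraction):
    """Share of all accesses that go to the top `fraction` of movies."""
    ranked = sorted(access_counts, reverse=True)
    k = max(1, round(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

def fraction_needed_for_share(access_counts, share):
    """Smallest fraction of movies (most popular first) covering `share` of accesses."""
    ranked = sorted(access_counts, reverse=True)
    total = sum(ranked)
    running = 0
    for i, count in enumerate(ranked, start=1):
        running += count
        if running >= share * total:
            return i / len(ranked)
    return 1.0
```

For example, with ten movies whose counts are 50, 20, 10, 5, 5, 4, 3, 1, 1, 1, the top 10% of movies account for half of all accesses.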
25User interests transferring
[Figure: RATE vs. hour]
- User interest changes slowly
26Understanding popularity: recommendation
[Figure: ADA vs. videos sorted by maximum daily access]
- ADA: average daily access / maximum daily access
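The ADA ratio can be computed per movie from its daily access counts. The sketch below assumes one count per day over the observation period, with zero-access days included in the average; that choice and the function name are assumptions of the example.

```python
def ada(daily_accesses):
    """ADA = average daily access / maximum daily access, in [0, 1]."""
    peak = max(daily_accesses)
    if peak == 0:
        return 0.0
    return (sum(daily_accesses) / len(daily_accesses)) / peak
```

Under this definition, a value close to 1 means the movie is requested at a steady rate, while a value close to 0 means most of its accesses fall on a few peak days.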
27Indication
- User interests change slowly
- Interest inducement: user interests can be induced with mechanisms like movie recommendation
- Features like movie recommendation benefit performance
28Summary
- Indications
- Poisson overestimates the probability of large arrivals
- Caching and forwarding with regard to content popularity will be necessary
- Features like content recommendation greatly benefit the caching policy
- Future work
- VCR studies
- Opening the data set
- Deploying the optimizations
29Thanks!!!
30Backup
31Global Infrastructure
[Figure: central servers, regional servers, and edge servers connected over a WAN]
http://www.powerinfo.com.cn
32Session length: a close view
- Three kinds of spikes: at 1 minute, at 5 minutes, and at the whole video length
33Session length related to popularity?
[Figure: ASL]
- There is no strong relation between movie popularity and session length
34User request distribution
- Differs from the fetch-at-most-once model in Gummadi and Gribble's Kazaa log analysis
- Fits Zipf fairly well except for the ending part (big tail)
35User request distribution
- Several other works have cast doubt on the Zipf distribution
- The Kolmogorov-Smirnov test is useful for deciding whether a sample comes from a population with a specific distribution
[Equation: definition of the Kolmogorov-Smirnov statistic]
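In its standard one-sample form, the Kolmogorov-Smirnov statistic is the largest absolute difference between the empirical CDF and the hypothesized CDF. The sketch below applies that definition to movie ranks, comparing the cumulative share of accesses by rank with a Zipf CDF of a given skew factor. It computes only the statistic, not a p-value, and the function names and the rank-based formulation are assumptions of this example rather than the exact procedure used in the talk.

```python
def zipf_cdf(n_items, skew):
    """Cumulative Zipf probabilities over ranks 1..n_items (weight 1 / rank**skew)."""
    weights = [1.0 / (rank ** skew) for rank in range(1, n_items + 1)]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w
        cdf.append(running / total)
    return cdf

def ks_statistic(access_counts, skew):
    """Max absolute gap between the empirical rank CDF and a Zipf CDF."""
    ranked = sorted(access_counts, reverse=True)
    total = sum(ranked)
    model = zipf_cdf(len(ranked), skew)
    running, d = 0, 0.0
    for count, f_model in zip(ranked, model):
        running += count
        d = max(d, abs(running / total - f_model))
    return d
```

A smaller statistic means a closer fit; scanning `skew` over a grid and keeping the value with the smallest statistic is one simple way to estimate a skew factor for a given day's requests.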
36User request distribution
0 am to 12 am, 07/08/2004: 23,484 sessions, coldest day, skew factor 0.18783
11 am to 23 pm, 10/01/2004: 76,771 sessions, hottest day, skew factor 0.21712
- Checking the validity of Zipf using the Kolmogorov-Smirnov goodness-of-fit test
- Over the 219 days in total, the skew factor varies between 0 and 0.34847; the average skew factor is 0.1987
37Understanding popularity: external factors
- Surprise 1: sudden drop from top 15
- Surprise 2: old movie review
38User interests transferring
[Figure: RATE vs. day of the week]
- User interest changes slowly
- In a system with a fixed set of candidate objects, user interest will probably shift very little
39Understanding popularity: recommendation
[Figure: ADA vs. videos sorted by maximum daily access]
- ADA: average daily access / maximum daily access
- Recommendation has a great impact on popularity