2. Lessons from the Netflix Prize
Yehuda Koren, The BellKor team (with Robert Bell and Chris Volinsky)
3. Recommender systems
We Know What You Ought To Be Watching This Summer
4. Collaborative filtering
- Recommend items based on past transactions of users
- Analyze relations between users and/or items
- Specific data characteristics are irrelevant
- Domain-free: user/item attributes are not necessary
- Can identify elusive aspects
6. Movie rating data

Training data:

  score  movie  user
  1      21     1
  5      213    1
  4      345    2
  4      123    2
  3      768    2
  5      76     3
  4      45     4
  1      568    5
  2      342    5
  2      234    5
  5      76     6
  4      56     6

Test data:

  score  movie  user
  ?      62     1
  ?      96     1
  ?      7      2
  ?      3      2
  ?      47     3
  ?      15     3
  ?      41     4
  ?      28     4
  ?      93     5
  ?      74     5
  ?      69     6
  ?      83     6
7. Netflix Prize
- Training data
  - 100 million ratings
  - 480,000 users
  - 17,770 movies
  - 6 years of data: 2000-2005
- Test data
  - Last few ratings of each user (2.8 million)
- Evaluation criterion: root mean squared error (RMSE)
  - Netflix Cinematch RMSE: 0.9514
- Competition
  - 2,700 teams
  - $1 million grand prize for a 10% improvement on the Cinematch result
  - $50,000 2007 progress prize for an 8.43% improvement
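As a quick illustration of the evaluation criterion, here is a minimal RMSE computation; the predictions and true ratings below are made up:

```python
import math

def rmse(predictions, actuals):
    """Root mean squared error between predicted and true ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical predictions against true ratings on the 1-5 star scale.
score = rmse([3.5, 4.0, 2.0, 5.0], [4, 4, 1, 5])
```

On the real test set, a 10% improvement means driving this number from Cinematch's 0.9514 down to about 0.8563.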
8. Overall rating distribution
- A third of the ratings are 4s
- Average rating is 3.68
(From TimelyDevelopment.com)
9. Ratings per movie
10. Ratings per user
11. Average movie rating by movie count
- More ratings go to better movies
(From TimelyDevelopment.com)
12. Most loved movies

  Count   Avg rating  Title
  137812  4.593       The Shawshank Redemption
  133597  4.545       Lord of the Rings: The Return of the King
  180883  4.306       The Green Mile
  150676  4.460       Lord of the Rings: The Two Towers
  139050  4.415       Finding Nemo
  117456  4.504       Raiders of the Lost Ark
  180736  4.299       Forrest Gump
  147932  4.433       Lord of the Rings: The Fellowship of the Ring
  149199  4.325       The Sixth Sense
  144027  4.333       Indiana Jones and the Last Crusade
13. Important RMSEs
Global average: 1.1296
User average: 1.0651
Movie average: 1.0533
Personalization
Cinematch: 0.9514 (baseline)
BellKor: 0.8693 (8.63% improvement)
Grand Prize: 0.8563 (10% improvement)
Inherent noise: ????
14. Challenges
- Size of data
  - Scalability
  - Keeping data in memory
- Missing data
  - 99 percent missing
  - Very imbalanced
- Avoiding overfitting
- Test and training data differ significantly
15. The BellKor recommender system
- Use an ensemble of complementing predictors
- Two half-tuned models are worth more than a single fully tuned model
17. The BellKor recommender system
- Use an ensemble of complementing predictors
- Two half-tuned models are worth more than a single fully tuned model
- But... many seemingly different models expose similar characteristics of the data, and won't mix well
- Concentrate efforts along three axes...
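The ensemble idea can be sketched as a simple linear blend: fit combination weights for two predictors on held-out ratings by least squares. The data and the least-squares blending used here are illustrative, not the team's actual blending procedure:

```python
import numpy as np

# Held-out true ratings and two hypothetical predictors' outputs.
holdout_true = np.array([4.0, 3.0, 5.0, 2.0])
pred_a = np.array([3.8, 3.4, 4.6, 2.5])   # e.g. a neighborhood model
pred_b = np.array([4.3, 2.7, 4.9, 1.8])   # e.g. a factorization model

# Solve min_w || [pred_a pred_b] w - true ||^2 for the blend weights.
X = np.column_stack([pred_a, pred_b])
w, *_ = np.linalg.lstsq(X, holdout_true, rcond=None)
blended = X @ w

def rmse(p, a):
    return float(np.sqrt(np.mean((p - a) ** 2)))
```

By construction the blend can do no worse on the fitting set than either predictor alone, which is one way to see why two half-tuned models can beat a single fully tuned one.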
18. The three dimensions of the BellKor system
- The first axis: multi-scale modeling of the data
- Combine top-level, regional modeling of the data with a refined, local view
  - k-NN: extracting local patterns
  - Factorization: addressing regional effects
[Diagram: an axis running from global effects, through factorization, down to k-NN]
19. Multi-scale modeling, 1st tier: Global effects
- Mean rating: 3.7 stars
- The Sixth Sense is 0.5 stars above average
- Joe rates 0.2 stars below average
- ⇒ Baseline estimation: Joe will rate The Sixth Sense 4 stars
20. Multi-scale modeling, 2nd tier: Factors model
- Both The Sixth Sense and Joe are placed high on the Supernatural Thrillers scale
- ⇒ Adjusted estimate: Joe will rate The Sixth Sense 4.5 stars
21. Multi-scale modeling, 3rd tier: Neighborhood model
- Joe didn't like the related movie Signs
- ⇒ Final estimate: Joe will rate The Sixth Sense 4.2 stars
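The three tiers above can be written as one running sum. The +0.5 factor adjustment and -0.3 neighborhood correction are the illustrative numbers implied by the slides, not learned quantities:

```python
# Tier 1 inputs from the slides.
mean_rating = 3.7    # global mean
movie_bias = 0.5     # The Sixth Sense is 0.5 stars above average
user_bias = -0.2     # Joe rates 0.2 stars below average

baseline = mean_rating + movie_bias + user_bias   # tier 1: baseline estimate
adjusted = baseline + 0.5   # tier 2: both high on "Supernatural Thrillers"
final = adjusted - 0.3      # tier 3: Joe disliked the related movie Signs
```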
22. The three dimensions of the BellKor system
- The second axis: quality of modeling
- Make the best out of a model
- Strive for:
  - Fundamental derivation
  - Simplicity
  - Avoiding overfitting
  - Robustness against iterations, parameter settings, etc.
- Optimizing is good, but don't overdo it!
[Diagram: the global-local axis against the quality axis]
23. The three dimensions of the BellKor system
- The third dimension will be discussed later...
- Next: moving the multi-scale view along the quality axis
[Diagram: the global-local and quality axes, with a third unknown axis marked ???]
24. Local modeling through k-NN
- Earliest and most popular collaborative filtering method
- Derive unknown ratings from those of similar items (movie-movie variant)
- A parallel user-user flavor: rely on ratings of like-minded users (not in this talk)
25. k-NN
[Figure: a movies-by-users ratings matrix (6 movies, 12 users) with many missing entries]
26. k-NN
[Figure: the same ratings matrix, with the entry for movie 1, user 5 marked ?]
- Estimate the rating of movie 1 by user 5
27. k-NN
[Figure: the ratings matrix, with movies similar to movie 1 that were rated by user 5 highlighted]
- Neighbor selection: identify movies similar to movie 1 that were rated by user 5
28. k-NN
[Figure: the ratings matrix with the selected neighbors (movies 3 and 6) highlighted]
- Compute similarity weights: s13 = 0.2, s16 = 0.3
29. k-NN
[Figure: the ratings matrix with the predicted value 2.6 filled in for movie 1, user 5]
- Predict by taking the weighted average: (0.2·2 + 0.3·3) / (0.2 + 0.3) = 2.6
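The prediction step of this walkthrough, as a minimal sketch; the similarity values 0.2 and 0.3 and user 5's neighbor ratings 2 and 3 are the numbers from the slides:

```python
def knn_predict(neighbor_ratings, similarities):
    """Weighted average of the neighbors' ratings, weighted by similarity."""
    numerator = sum(s * r for s, r in zip(similarities, neighbor_ratings))
    return numerator / sum(similarities)

# Movie 1, user 5: neighbors are movie 3 (s13 = 0.2, rated 2)
# and movie 6 (s16 = 0.3, rated 3).
prediction = knn_predict([2, 3], [0.2, 0.3])
```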
30. Properties of k-NN
- Intuitive
- No substantial preprocessing is required
- Easy to explain the reasoning behind a recommendation
- Accurate?
31. k-NN on the RMSE scale
Global average: 1.1296
User average: 1.0651
Movie average: 1.0533
k-NN: 0.96 down to 0.91
Cinematch: 0.9514
BellKor: 0.8693
Grand Prize: 0.8563
Inherent noise: ????
32. k-NN - Common practice
- Define a similarity measure between items: sij
- Select neighbors N(i;u): the items most similar to i that were rated by u
- Estimate the unknown rating, rui, as the weighted average

  $\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in N(i;u)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in N(i;u)} s_{ij}}$

  where $b_{ui}$ is a baseline estimate for rui
33. k-NN - Common practice
- Define a similarity measure between items: sij
- Select neighbors N(i;u): items similar to i, rated by u
- Estimate the unknown rating, rui, as the weighted average

Problems:
- Similarity measures are arbitrary; no fundamental justification
- Pairwise similarities isolate each neighbor; interdependencies among neighbors are neglected
- Taking a weighted average is restricting, e.g., when neighborhood information is limited
34. Interpolation weights
- Use a weighted sum rather than a weighted average (the weights need not sum to 1)
- Model relationships between item i and its neighbors
- Can be learnt through a least squares problem from all other users that rated i
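A minimal sketch of learning interpolation weights by least squares. The residual matrix, its values, and the single prediction are all made-up toy data; the point is only the mechanics of fitting free weights (a weighted sum, not a weighted average) from the other users who rated item i:

```python
import numpy as np

# Rows: users (other than u) who rated item i.
# Columns: those users' baseline-removed ratings (r_vj - b_vj) on the
# neighbors of item i. y: the same users' residuals on item i itself.
X = np.array([[ 0.5, -0.3],
              [ 1.0,  0.2],
              [-0.4,  0.8],
              [ 0.3, -0.1]])
y = np.array([0.4, 0.9, 0.1, 0.2])

# Free interpolation weights: solve min_w ||X w - y||^2.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict for user u: baseline estimate plus a weighted *sum* of u's
# residuals on the neighbors (the weights need not sum to 1).
b_ui = 3.8
u_residuals = np.array([0.6, -0.2])
prediction = b_ui + float(u_residuals @ w)
```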
35. Interpolation weights
- Interpolation weights are derived based on their role; no use of an arbitrary similarity measure
- Explicitly account for interrelationships among the neighbors
- Challenges:
  - Dealing with missing values
  - Avoiding overfitting
  - Efficient implementation
36. Latent factor models
[Figure: movies and users embedded in a 2-D latent space. One axis runs from "geared towards males" to "geared towards females", the other from "serious" to "escapist". Movies plotted include Braveheart, Amadeus, Lethal Weapon, Ocean's 11, The Color Purple, Sense and Sensibility, The Lion King, Dumb and Dumber, The Princess Diaries, and Independence Day; users Dave and Gus are placed in the same space.]
37. Latent factor models
[Figure: the movies-by-users ratings matrix approximated as the product of a movies-by-3 item-factor matrix and a 3-by-users user-factor matrix]
A rank-3 SVD approximation
38. Estimate unknown ratings as inner products of factors
[Figure: the same rank-3 SVD approximation, with a missing entry of the ratings matrix marked ?]
39. Estimate unknown ratings as inner products of factors
[Figure: the marked entry is matched with its item-factor row and user-factor column in the rank-3 SVD approximation]
40. Estimate unknown ratings as inner products of factors
[Figure: the inner product of the factor vectors fills in the marked entry with the estimate 2.4, in the rank-3 SVD approximation]
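The inner-product estimate of slides 37-40, sketched on a small fully observed toy matrix (real Netflix data is mostly missing, so plain SVD does not apply directly, as the next slide notes):

```python
import numpy as np

# Toy items-by-users ratings matrix, fully observed for illustration.
R = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 2.0],
              [1.0, 2.0, 5.0],
              [2.0, 1.0, 4.0]])

# Rank-2 truncated SVD: R is approximated by Q @ P,
# Q = item factors (items x 2), P = user factors (2 x users).
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
Q = U[:, :k] * s[:k]
P = Vt[:k, :]

# Estimate the rating of item 0 by user 2 as an inner product of factors.
estimate = float(Q[0] @ P[:, 2])
```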
41. Latent factor models
[Figure: the rank-3 SVD approximation of the ratings matrix, as on the previous slides]
- Properties:
  - SVD isn't defined when entries are unknown ⇒ use specialized methods
  - Very powerful model ⇒ can easily overfit; sensitive to regularization
- Probably the most popular model among contestants
  - 12/11/2006: Simon Funk describes an SVD-based method
  - 12/29/2006: Free implementation at timelydevelopment.com
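A sketch in the spirit of the SVD-based method Simon Funk described: stochastic gradient descent over only the observed ratings, with L2 regularization. The data, dimensions, and hyperparameters here are illustrative, not Funk's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)
# (user, item, rating) triples; only observed entries are used.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 2
lr, reg = 0.05, 0.02

P = 0.1 * rng.standard_normal((n_users, k))  # user factors
Q = 0.1 * rng.standard_normal((n_items, k))  # item factors

for epoch in range(200):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]                 # residual on one rating
        P[u] += lr * (err * Q[i] - reg * P[u])  # regularized SGD step
        Q[i] += lr * (err * P[u] - reg * Q[i])

train_rmse = float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2
                                    for u, i, r in ratings])))
```

Because the model is fit only on observed entries, it sidesteps the "SVD isn't defined with unknowns" problem; the regularization term is what keeps this very powerful model from overfitting.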
42. Factorization on the RMSE scale
Global average: 1.1296
User average: 1.0651
Movie average: 1.0533
Cinematch: 0.9514
Factorization: 0.93 down to 0.89
BellKor: 0.8693
Grand Prize: 0.8563
Inherent noise: ????
43. Our approach
- User factors: model a user u as a vector pu ~ Nk(µ, Σ)
- Movie factors: model a movie i as a vector qi ~ Nk(µ, Σ)
- Ratings: measure the agreement between u and i: rui ~ N(puTqi, ε²)
- Maximize the model's likelihood:
  - Alternate between recomputing user factors, movie factors, and model parameters
- Special cases:
  - Alternating ridge regression
  - Nonnegative matrix factorization
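The alternating ridge regression special case can be sketched as follows, again on a fully observed toy matrix; the real procedure handles missing entries, and λ and the data here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
R = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 2.0],
              [1.0, 2.0, 5.0]])   # users x items
n_users, n_items, k, lam = 3, 3, 2, 0.1

P = rng.standard_normal((n_users, k))   # user factors
Q = rng.standard_normal((n_items, k))   # item factors
I = np.eye(k)

for _ in range(20):
    # Ridge regression for every user factor, item factors held fixed:
    # (Q^T Q + lam I) p_u = Q^T r_u for each user row r_u of R.
    P = np.linalg.solve(Q.T @ Q + lam * I, Q.T @ R.T).T
    # Then the symmetric step for the item factors.
    Q = np.linalg.solve(P.T @ P + lam * I, P.T @ R).T

fit_rmse = float(np.sqrt(np.mean((R - P @ Q.T) ** 2)))
```

Each half-step is a closed-form ridge regression, which is what makes the alternation cheap and stable.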
44. Combining multi-scale views
- Residual fitting: global effects ⇒ factorization (regional effects) ⇒ k-NN (local effects)
- Weighted average
- A unified model: factorization and k-NN combined
45. Localized factorization model
- Standard factorization: user u is a linear function parameterized by pu: rui = puTqi
- Allow the user factors pu to depend on the item being predicted: rui = pu(i)Tqi
- The vector pu(i) models the behavior of u on items like i
46. Results on Netflix Probe set
[Chart: RMSE of the models on the Probe set; lower is more accurate]
48. Seek alternative perspectives of the data
- Can exploit movie titles and release year
- But the movies side is pretty much covered anyway...
- It's about the users!
- Turning to the third dimension...
[Diagram: the global-local and quality axes, with a third unknown axis marked ???]
49. The third dimension of the BellKor system
- A powerful source of information: characterize users by which movies they rated, rather than how they rated them
- ⇒ A binary representation of the data
[Figure: the movies-by-users ratings matrix alongside its binary version, where each entry is 1 if a rating is present and 0 otherwise]
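The binary view on this slide, sketched on a toy matrix where 0 stands for "not rated":

```python
import numpy as np

# Toy movies-by-users ratings, with 0 meaning the rating is missing.
R = np.array([[4, 0, 5, 0, 3],
              [0, 1, 0, 4, 0],
              [5, 3, 0, 0, 2]])

# Keep only *which* entries were rated, not *how* they were rated.
B = (R > 0).astype(int)
```

Each user's column in B records which movies they touched, which is usable even before they supply a single explicit rating value.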
50. The third dimension of the BellKor system
- Great news for recommender systems:
  - Works even better on real-life datasets (?)
  - Improve accuracy by exploiting implicit feedback
- Implicit behavior is abundant and easy to collect:
  - Rental history
  - Search patterns
  - Browsing history
  - ...
- Implicit feedback allows predicting personalized ratings for users that never rated!
51. The three dimensions of the BellKor system
- Where do you want to be?
  - All over the global-local axis
  - Relatively high on the quality axis
- Can we go in between on the explicit-implicit axis?
- Yes! Relevant methods:
  - Conditional RBMs (ML@UToronto)
  - NSVD (Arek Paterek)
  - Asymmetric factor models
[Diagram: the global-local and quality axes, plus a third axis from explicit (ratings) to implicit (binary)]
52. Lessons
- What it takes to win:
  - Think deeper: design better algorithms
  - Think broader: use an ensemble of multiple predictors
  - Think different: model the data from different perspectives
- At the personal level:
  - Have fun with the data
  - Work hard, long breath
  - Good teammates
- Rapid progress of science:
  - Availability of large, real-life data
  - Challenge, competition
  - Effective collaboration
53.
[Background: the movies-by-users ratings matrix]
Yehuda Koren, AT&T Labs Research, yehuda@att.com
BellKor homepage: www.research.att.com/volinsky/netflix/