1. Filtering and Recommender Systems: Content-based and Collaborative
- Some of the slides are based on Mooney's slides
2. Personalization
- Recommenders are instances of personalization software.
- Personalization concerns adapting to the individual needs, interests, and preferences of each user.
- Includes
  - Recommending
  - Filtering
  - Predicting (e.g. form or calendar appt. completion)
- From a business perspective, it is viewed as part of Customer Relationship Management (CRM).
3. Feedback Prediction/Recommendation
- Traditional IR has a single user, probably working in single-shot mode
  - Relevance feedback
- Web search engines have
  - Users working continually
  - User profiling
    - A profile is a model of the user
    - (and also relevance feedback)
  - Many users
    - Collaborative filtering
      - Propagate user preferences to other users
You know this one
4. Recommender Systems in Use
- Systems for recommending items (e.g. books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences.
- Many on-line stores provide recommendations (e.g. Amazon, CDNow).
- Recommenders have been shown to substantially increase sales at on-line stores.
5. Feedback Detection
Non-intrusive:
- Click certain pages in a certain order while ignoring most pages.
- Read some clicked pages longer than other clicked pages.
- Save/print certain clicked pages.
- Follow some links in clicked pages to reach more pages.
- Buy items / put them in wish-lists or shopping carts.
Intrusive:
- Explicitly ask users to rate items/pages.
6. Justifying Recommendations
- Recommendation systems must justify their recommendations
  - Even if the justification is bogus…
- For search engines, the justifications are the page synopses
- Some recommendation algorithms are better at providing human-understandable justifications than others
  - Content-based ones can justify in terms of classifier features
  - Collaborative ones are harder pressed; they can say little more than "people like you seem to like this stuff"
- In general, giving good justifications is important.
7. Content-based vs. Collaborative Recommendation
8. Collaborative Filtering
- Correlation analysis here is similar to the association-clusters analysis!
9. Item-User Matrix
- The input to the collaborative filtering algorithm is an m x n matrix where rows are items and columns are users
  - Sort of like a term-document matrix (items are terms and documents are users)
- Can think of users as vectors in the space of items (or vice versa)
- Can do vector similarity between users
  - And find which users are most similar
- Can do scalar clusters over items etc.
  - And find which items are most correlated
Think: users → documents, items → keywords (toy example below)
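
A toy illustration (the ratings are invented here, not from the slides) of treating users as column vectors over items and comparing them with cosine similarity:

```python
# Toy item-x-user matrix: rows = items, columns = users; 0 means "not rated".
import numpy as np

M = np.array([[5, 4, 0],
              [3, 0, 1],
              [0, 2, 5],
              [4, 4, 0]], dtype=float)

def cosine(u, v):
    """Cosine similarity between two user (column) vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

print(cosine(M[:, 0], M[:, 1]))  # users 0 and 1 look similar
print(cosine(M[:, 0], M[:, 2]))  # users 0 and 2 much less so
```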
10. A Collaborative Filtering Method (think kNN regression)
- Weight all users with respect to similarity with the active user.
  - How to measure similarity?
  - Could use cosine similarity; normally the Pearson coefficient is used
- Select a subset of the users (neighbors) to use as predictors.
- Normalize ratings and compute a prediction from a weighted combination of the selected neighbors' ratings.
- Present items with the highest predicted ratings as recommendations (a sketch of the whole recipe follows below).
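
A minimal sketch of the recipe above, under the slide's setup: Pearson weights over co-rated items, the k most similar neighbors, and a mean-offset weighted prediction. The function names and the toy ratings are assumptions for illustration, not any particular system's API:

```python
import numpy as np

def pearson(a, u, ratings):
    """Pearson correlation over items co-rated by users a and u (NaN = unrated)."""
    co = ~np.isnan(ratings[a]) & ~np.isnan(ratings[u])
    if co.sum() < 2:
        return 0.0
    ra, ru = ratings[a][co], ratings[u][co]
    ra, ru = ra - ra.mean(), ru - ru.mean()
    denom = np.sqrt((ra**2).sum() * (ru**2).sum())
    return float((ra * ru).sum() / denom) if denom else 0.0

def predict(a, item, ratings, k=2):
    """Predict user a's rating for item from the k most correlated neighbors."""
    others = [u for u in range(len(ratings))
              if u != a and not np.isnan(ratings[u][item])]
    weights = sorted(((pearson(a, u, ratings), u) for u in others), reverse=True)[:k]
    mean_a = np.nanmean(ratings[a])
    num = sum(w * (ratings[u][item] - np.nanmean(ratings[u])) for w, u in weights)
    den = sum(abs(w) for w, _ in weights)
    return mean_a + num / den if den else mean_a

R = np.array([[5, 3, np.nan, 4],      # active user a = 0
              [4, 2, 5, 4],
              [1, 5, 2, np.nan]])
print(predict(0, 2, R))               # predicted rating for the unrated item
```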
11. 3/27
- Today: complete filtering; discuss the Das/Datar paper
- Homework 2 solutions posted
- Midterm on Thursday in class
  - Covers everything covered by the first two homeworks
- Questions?
12. Finding User Similarity with the Pearson Correlation Coefficient
- Typically use the Pearson correlation coefficient between the ratings of the active user, a, and another user, u.
  - r_a and r_u are the ratings vectors over the m items rated by both a and u; r_{i,j} is user i's rating for item j
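
A reconstruction of the formula the slide's (missing) image presumably showed, i.e. the standard Pearson weight over the m co-rated items:

w_{a,u} = \frac{\sum_{j=1}^{m} (r_{a,j} - \bar{r}_a)(r_{u,j} - \bar{r}_u)}{\sqrt{\sum_{j=1}^{m} (r_{a,j} - \bar{r}_a)^2}\;\sqrt{\sum_{j=1}^{m} (r_{u,j} - \bar{r}_u)^2}}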
13. Neighbor Selection
- For a given active user, a, select correlated users to serve as the source of predictions.
- The standard approach is to use the k most similar users, u, based on similarity weights w_{a,u}
- An alternate approach is to include all users whose similarity weight is above a given threshold.
14. Rating Prediction
- Predict a rating, p_{a,i}, for each item i for the active user, a, by using the k selected neighbor users, u ∈ {1, 2, …, k}.
- To account for users' different rating levels, base predictions on differences from a user's average rating.
- Weight each user's rating contribution by their similarity to the active user.
  - r_{i,j} is user i's rating for item j
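
A reconstruction of the prediction formula the slide presumably showed, i.e. the standard mean-offset weighted combination of the k neighbors' ratings:

p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{k} w_{a,u}\,(r_{u,i} - \bar{r}_u)}{\sum_{u=1}^{k} |w_{a,u}|}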
15. Similarity Weighting: User Similarity
- Typically use the Pearson correlation coefficient between the ratings of the active user, a, and another user, u (the same formula as on slide 12).
  - r_a and r_u are the ratings vectors over the m items rated by both a and u; r_{i,j} is user i's rating for item j
16. Significance Weighting
- Important not to trust correlations based on very few co-rated items.
- Include significance weights, s_{a,u}, based on the number of co-rated items, m.
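
The slide does not give the exact form; one commonly used choice (stated here as an assumption about what was shown, in the style of Herlocker et al.) caps the trust by the number of co-rated items:

s_{a,u} = \frac{\min(m, 50)}{50}, \qquad w'_{a,u} = s_{a,u}\, w_{a,u}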
17. Covariance and Standard Deviation
- Covariance
- Standard Deviation
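
The formulas themselves were images on the slide; the standard definitions, which combine to give the Pearson weight above, are:

\mathrm{Cov}(r_a, r_u) = \frac{1}{m} \sum_{j=1}^{m} (r_{a,j} - \bar{r}_a)(r_{u,j} - \bar{r}_u), \qquad \sigma_{r_a} = \sqrt{\frac{1}{m} \sum_{j=1}^{m} (r_{a,j} - \bar{r}_a)^2}

so that w_{a,u} = \mathrm{Cov}(r_a, r_u) / (\sigma_{r_a}\,\sigma_{r_u}).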
18. Problems with Collaborative Filtering
- Cold start: there need to be enough other users already in the system to find a match.
- Sparsity: if there are many items to be recommended, then even with many users the user/ratings matrix is sparse, and it is hard to find users that have rated the same items.
- First rater: cannot recommend an item that has not been previously rated.
  - New items
  - Esoteric items
- Popularity bias: cannot recommend items to someone with unique tastes.
  - Tends to recommend popular items.
  - "WHAT DO YOU MEAN YOU DON'T CARE FOR BRITNEY SPEARS, YOU DUNDERHEAD?"
19. Content-Based Recommending
- Recommendations are based on information about the content of items rather than on other users' opinions.
- Uses machine learning algorithms to induce a profile of the user's preferences from examples, based on a featural description of content.
- Lots of systems
20. Adapting the Naïve Bayes Idea for Book Recommendation
- Vector-of-bags model
  - E.g. books have several different fields that are all text
    - Authors, description, …
  - A word appearing in one field is different from the same word appearing in another
  - Want to keep each bag distinct: a vector of m bags
  - Conditional probabilities for each word w.r.t. each class and bag
- Can give a profile of a user in terms of the words that are most predictive of what they like
  - Odds ratio: P(rel | example) / P(¬rel | example)
    - An example is positive if the odds ratio is > 1
  - Strength of a keyword: log [ P(w | rel) / P(w | ¬rel) ]
  - We can summarize a user's profile in terms of the words that have strength above some threshold (see the sketch below).
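
A small sketch (the toy documents and helper names are invented for illustration) of ranking words by strength = log P(w | rel) / P(w | ¬rel), with the conditional probabilities estimated by Laplace-smoothed counts:

```python
import math
from collections import Counter

def keyword_strengths(liked_docs, disliked_docs):
    """Return {word: strength} from bags of words for liked/disliked items."""
    rel, nrel = Counter(), Counter()
    for doc in liked_docs:
        rel.update(doc.lower().split())
    for doc in disliked_docs:
        nrel.update(doc.lower().split())
    vocab = set(rel) | set(nrel)
    n_rel, n_nrel, v = sum(rel.values()), sum(nrel.values()), len(vocab)
    strengths = {}
    for w in vocab:
        p_rel = (rel[w] + 1) / (n_rel + v)      # P(w | rel), smoothed
        p_nrel = (nrel[w] + 1) / (n_nrel + v)   # P(w | ~rel), smoothed
        strengths[w] = math.log(p_rel / p_nrel)
    return strengths

profile = keyword_strengths(
    ["gritty noir detective thriller", "noir mystery with a detective"],
    ["romantic musical comedy", "feel-good comedy"])
print(sorted(profile.items(), key=lambda kv: -kv[1])[:3])  # most predictive words
```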
21. Advantages of the Content-Based Approach
- No need for data on other users.
  - No cold-start or sparsity problems.
- Able to recommend to users with unique tastes.
- Able to recommend new and unpopular items
  - No first-rater problem.
- Can provide explanations of recommended items by listing the content features that caused an item to be recommended.
- Well-known technology: the entire field of classification learning is at (y)our disposal!
22. Disadvantages of the Content-Based Method
- Requires content that can be encoded as meaningful features.
- Users' tastes must be represented as a learnable function of these content features.
- Unable to exploit quality judgments of other users.
  - Unless these are somehow included in the content features.
23. Movie Domain
- EachMovie Dataset (Compaq Research Labs)
  - Contains user ratings for movies on a 0-5 scale.
  - 72,916 users (avg. 39 ratings each).
  - 1,628 movies.
  - Sparse user-ratings matrix (2.6% full).
- Crawled the Internet Movie Database (IMDb)
  - Extracted content for the titles in EachMovie.
  - Basic movie information
    - Title, Director, Cast, Genre, etc.
  - Popular opinions
    - User comments, newspaper and newsgroup reviews, etc.
24. Content-Boosted Collaborative Filtering
(Diagram: EachMovie ratings combined with IMDb content)
25. Content-Boosted CF - I
26. Content-Boosted CF - II
(Diagram: sparse user-ratings matrix → content-based predictor → pseudo user-ratings matrix)
- Compute the pseudo user-ratings matrix (sketch below)
  - The full matrix approximates the actual full user-ratings matrix
- Perform CF
  - Using Pearson correlation between pseudo user-rating vectors
- This works better than either approach alone!
27. Why can't the pseudo ratings be used to help content-based filtering?
- How about using the pseudo ratings to improve the content-based filter itself? (or: how access to unlabelled examples improves accuracy)
- Learn an NBC classifier C0 using the few items for which we have user ratings
- Use C0 to predict the ratings for the rest of the items
- Loop
  - Learn a new classifier C1 using all the ratings (real and predicted)
  - Use C1 to (re-)predict the ratings for all the unknown items
- Until no change in ratings
- With a small change, this actually works in finding a better classifier!
  - Change: keep the class posterior prediction (rather than just the max class)
  - This means that each (unlabelled) entity could belong to multiple classes, with fractional membership in each
  - We weight the counts by the membership fractions
    - E.g. P(A=v | c) = (sum of class weights of all examples in c that have A=v) / (sum of class weights of all examples in c)
- This is called expectation maximization (EM)
  - Very useful on the web, where you have tons of data but very little of it is labelled
  - Reminds you of k-means, doesn't it? (a sketch follows below)
28. (No transcript)
29. (Boosted) Content Filtering
30. Co-Training: Motivation
- Learning methods need labeled data
  - Lots of <x, f(x)> pairs
  - Hard to get (who wants to label data?)
- But unlabeled data is usually plentiful
  - Could we use it instead?
31. Co-training
- You train me, I train you
- Only a small amount of labeled data is needed
- Suppose each instance has two parts
  - x = (x1, x2)
  - x1, x2 conditionally independent given f(x)
- Suppose each half can be used to classify the instance
  - ∃ f1, f2 such that f1(x1) = f2(x2) = f(x)
- Suppose f1, f2 are learnable
  - f1 ∈ H1, f2 ∈ H2, ∃ learning algorithms A1, A2
(Diagram: A1 learns f1 from labeled instances <x1, x2, f(x)>; f1 labels unlabeled instances, producing <x1, x2, f1(x1)>, which A2 uses to learn hypothesis f2.)
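
A sketch of the co-training loop under the two-view setup above; the classifier choice, number of rounds, and how many examples each view labels per round are illustrative assumptions, not from the slide:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(X1_lab, X2_lab, y_lab, X1_unlab, X2_unlab, rounds=5, per_round=2):
    X1, X2, y = X1_lab.copy(), X2_lab.copy(), y_lab.copy()
    unlab = list(range(len(X1_unlab)))                  # indices still unlabeled
    f1, f2 = MultinomialNB(), MultinomialNB()
    for _ in range(rounds):
        if not unlab:
            break
        f1.fit(X1, y)                                   # A1 learns f1 from view 1
        f2.fit(X2, y)                                   # A2 learns f2 from view 2
        newly = []
        for clf, Xu in ((f1, X1_unlab), (f2, X2_unlab)):
            probs = clf.predict_proba(Xu[unlab])
            # each view labels the unlabeled examples it is most confident about
            for pos in np.argsort(-probs.max(axis=1))[:per_round]:
                idx = unlab[pos]
                if idx in newly:
                    continue                            # already taken this round
                newly.append(idx)
                label = clf.classes_[probs[pos].argmax()]
                X1 = np.vstack([X1, X1_unlab[idx:idx + 1]])
                X2 = np.vstack([X2, X2_unlab[idx:idx + 1]])
                y = np.append(y, label)
        unlab = [i for i in unlab if i not in newly]    # shrink the unlabeled pool
    return f1, f2
```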
32. Observations
- Can apply A1 to generate as much training data as one wants
- If x1 is conditionally independent of x2 given f(x),
  - then the error in the labels produced by A1
  - will look like random noise to A2!
- Thus there is no limit to the quality of the hypothesis A2 can make
33. It Really Works!
- Learning to classify web pages as course pages
  - x1 = bag of words on the page
  - x2 = bag of words from all anchors pointing to the page
- Naïve Bayes classifiers
  - 12 labeled pages
  - 1,039 unlabeled pages
34. (No transcript)
35. Focussed Crawling
- Cho paper
  - Looks at heuristics for managing the URL queue
  - Aim 1: completeness
  - Aim 2: just topic pages
    - Prioritize if the word appears in the anchor / URL
- Heuristics
  - PageRank
  - Backlinks
36. Modified Algorithm
- A page is hot if it
  - Contains the keyword in its title, or
  - Contains 10 instances of the keyword in its body, or
  - Distance(page, hot-page) < 3
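
A tiny sketch of the "hot page" test as stated on the slide; the field names are hypothetical, and the body-count comparison (at least 10) is an assumption, since the original inequality symbol did not survive extraction:

```python
def is_hot(page, keyword, hot_distance):
    """page: dict with 'title' and 'body' text; hot_distance: link distance
    from the nearest already-hot page."""
    kw = keyword.lower()
    if kw in page["title"].lower():
        return True                                   # keyword in title
    if page["body"].lower().split().count(kw) >= 10:  # many keyword occurrences
        return True
    return hot_distance < 3                           # close to a hot page
```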
37. Results
38. More Results
39. Conclusions
- Recommending and personalization are important approaches to combating information overload.
- Machine learning is an important part of systems for these tasks.
- Collaborative filtering has problems.
- Content-based methods address these problems (but have problems of their own).
- Integrating both is best.
- Which leads us to discuss some approaches that wind up using unlabelled data along with labelled data to improve performance.
40. Discussion of the Google News Collaborative Filtering Paper