1. Filtering and Recommender Systems: Content-based and Collaborative
- Some of the slides are based on Mooney's slides
2. Personalization
- Recommenders are instances of personalization software.
- Personalization concerns adapting to the individual needs, interests, and preferences of each user.
- Includes
  - Recommending
  - Filtering
  - Predicting (e.g. form or calendar appt. completion)
- From a business perspective, it is viewed as part of Customer Relationship Management (CRM).
3. Feedback Prediction/Recommendation
- Traditional IR has a single user, probably working in single-shot mode
  - Relevance feedback
- Web search engines have
  - Users working continually
  - User profiling
    - A profile is a model of the user
    - (and also relevance feedback)
  - Many users
    - Collaborative filtering
      - Propagate user preferences to other users
You know this one
4. Recommender Systems in Use
- Systems for recommending items (e.g. books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences.
- Many on-line stores provide recommendations (e.g. Amazon, CDNow).
- Recommenders have been shown to substantially increase sales at on-line stores.
5. Feedback Detection
Non-intrusive:
- Click certain pages in a certain order while ignoring most pages.
- Read some clicked pages longer than other clicked pages.
- Save/print certain clicked pages.
- Follow some links in clicked pages to reach more pages.
- Buy items / put them in wish-lists or shopping carts.
Intrusive:
- Explicitly ask users to rate items/pages.
6. Justifying Recommendations
- Recommendation systems must justify their recommendations
  - Even if the justification is bogus…
- For search engines, the justifications are the page synopses
- Some recommendation algorithms are better at providing human-understandable justifications than others
  - Content-based ones can justify in terms of classifier features
  - Collaborative ones are harder pressed; they can say little more than "people like you seem to like this stuff"
- In general, giving good justifications is important.
7. Content-based vs. Collaborative Recommendation
8. Collaborative Filtering
- Correlation analysis here is similar to the association-clusters analysis!
9. Item-User Matrix
- The input to the collaborative filtering algorithm is an m x n matrix where rows are items and columns are users
  - Sort of like a term-document matrix (items are terms and documents are users)
- Can think of users as vectors in the space of items (or vice versa)
- Can do vector similarity between users
  - And find which users are most similar
- Can do scalar clusters over items etc.
  - And find which items are most correlated
Think: users → documents, items → keywords (toy example below)
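
A toy illustration (the ratings are invented here, not from the slides) of treating users as column vectors over items and comparing them with cosine similarity:

```python
# Toy item-x-user matrix: rows = items, columns = users; 0 means "not rated".
import numpy as np

M = np.array([[5, 4, 0],
              [3, 0, 1],
              [0, 2, 5],
              [4, 4, 0]], dtype=float)

def cosine(u, v):
    """Cosine similarity between two user (column) vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

print(cosine(M[:, 0], M[:, 1]))  # users 0 and 1 look similar
print(cosine(M[:, 0], M[:, 2]))  # users 0 and 2 much less so
```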
10. A Collaborative Filtering Method (think kNN regression)
- Weight all users with respect to similarity with the active user.
  - How to measure similarity?
  - Could use cosine similarity; normally the Pearson coefficient is used
- Select a subset of the users (neighbors) to use as predictors.
- Normalize ratings and compute a prediction from a weighted combination of the selected neighbors' ratings.
- Present items with the highest predicted ratings as recommendations (a sketch of the whole recipe follows below).
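
A minimal sketch of the recipe above, under the slide's setup: Pearson weights over co-rated items, the k most similar neighbors, and a mean-offset weighted prediction. The function names and the toy ratings are assumptions for illustration, not any particular system's API:

```python
import numpy as np

def pearson(a, u, ratings):
    """Pearson correlation over items co-rated by users a and u (NaN = unrated)."""
    co = ~np.isnan(ratings[a]) & ~np.isnan(ratings[u])
    if co.sum() < 2:
        return 0.0
    ra, ru = ratings[a][co], ratings[u][co]
    ra, ru = ra - ra.mean(), ru - ru.mean()
    denom = np.sqrt((ra**2).sum() * (ru**2).sum())
    return float((ra * ru).sum() / denom) if denom else 0.0

def predict(a, item, ratings, k=2):
    """Predict user a's rating for item from the k most correlated neighbors."""
    others = [u for u in range(len(ratings))
              if u != a and not np.isnan(ratings[u][item])]
    weights = sorted(((pearson(a, u, ratings), u) for u in others), reverse=True)[:k]
    mean_a = np.nanmean(ratings[a])
    num = sum(w * (ratings[u][item] - np.nanmean(ratings[u])) for w, u in weights)
    den = sum(abs(w) for w, _ in weights)
    return mean_a + num / den if den else mean_a

R = np.array([[5, 3, np.nan, 4],      # active user a = 0
              [4, 2, 5, 4],
              [1, 5, 2, np.nan]])
print(predict(0, 2, R))               # predicted rating for the unrated item
```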
11. 3/27
- Today: complete filtering; discuss the Das/Datar paper
- Homework 2 solutions posted
- Midterm on Thursday in class
  - Covers everything covered by the first two homeworks
- Questions?
12. Finding User Similarity with the Pearson Correlation Coefficient
- Typically use the Pearson correlation coefficient between the ratings of the active user, a, and another user, u.
  - r_a and r_u are the ratings vectors over the m items rated by both a and u; r_{i,j} is user i's rating for item j
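
A reconstruction of the formula the slide's (missing) image presumably showed, i.e. the standard Pearson weight over the m co-rated items:

w_{a,u} = \frac{\sum_{j=1}^{m} (r_{a,j} - \bar{r}_a)(r_{u,j} - \bar{r}_u)}{\sqrt{\sum_{j=1}^{m} (r_{a,j} - \bar{r}_a)^2}\;\sqrt{\sum_{j=1}^{m} (r_{u,j} - \bar{r}_u)^2}}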
13. Neighbor Selection
- For a given active user, a, select correlated users to serve as the source of predictions.
- The standard approach is to use the k most similar users, u, based on similarity weights w_{a,u}
- An alternate approach is to include all users whose similarity weight is above a given threshold.
14. Rating Prediction
- Predict a rating, p_{a,i}, for each item i for the active user, a, by using the k selected neighbor users, u ∈ {1, 2, …, k}.
- To account for users' different rating levels, base predictions on differences from a user's average rating.
- Weight each user's rating contribution by their similarity to the active user.
  - r_{i,j} is user i's rating for item j
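
A reconstruction of the prediction formula the slide presumably showed, i.e. the standard mean-offset weighted combination of the k neighbors' ratings:

p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{k} w_{a,u}\,(r_{u,i} - \bar{r}_u)}{\sum_{u=1}^{k} |w_{a,u}|}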
15. Similarity Weighting: User Similarity
- Typically use the Pearson correlation coefficient between the ratings of the active user, a, and another user, u (the same formula as on slide 12).
  - r_a and r_u are the ratings vectors over the m items rated by both a and u; r_{i,j} is user i's rating for item j
16. Significance Weighting
- Important not to trust correlations based on very few co-rated items.
- Include significance weights, s_{a,u}, based on the number of co-rated items, m.
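
The slide does not give the exact form; one commonly used choice (stated here as an assumption about what was shown, in the style of Herlocker et al.) caps the trust by the number of co-rated items:

s_{a,u} = \frac{\min(m, 50)}{50}, \qquad w'_{a,u} = s_{a,u}\, w_{a,u}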
17. Covariance and Standard Deviation
- Covariance
- Standard Deviation
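
The formulas themselves were images on the slide; the standard definitions, which combine to give the Pearson weight above, are:

\mathrm{Cov}(r_a, r_u) = \frac{1}{m} \sum_{j=1}^{m} (r_{a,j} - \bar{r}_a)(r_{u,j} - \bar{r}_u), \qquad \sigma_{r_a} = \sqrt{\frac{1}{m} \sum_{j=1}^{m} (r_{a,j} - \bar{r}_a)^2}

so that w_{a,u} = \mathrm{Cov}(r_a, r_u) / (\sigma_{r_a}\,\sigma_{r_u}).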
18. Problems with Collaborative Filtering
- Cold start: there need to be enough other users already in the system to find a match.
- Sparsity: if there are many items to be recommended, then even with many users the user/ratings matrix is sparse, and it is hard to find users that have rated the same items.
- First rater: cannot recommend an item that has not been previously rated.
  - New items
  - Esoteric items
- Popularity bias: cannot recommend items to someone with unique tastes.
  - Tends to recommend popular items.
  - "WHAT DO YOU MEAN YOU DON'T CARE FOR BRITNEY SPEARS, YOU DUNDERHEAD?"
19. Content-Based Recommending
- Recommendations are based on information about the content of items rather than on other users' opinions.
- Uses machine learning algorithms to induce a profile of the user's preferences from examples, based on a featural description of content.
- Lots of systems
20. Adapting the Naïve Bayes Idea for Book Recommendation
- Vector-of-bags model
  - E.g. books have several different fields that are all text
    - Authors, description, …
  - A word appearing in one field is different from the same word appearing in another
  - Want to keep each bag distinct: a vector of m bags
  - Conditional probabilities for each word w.r.t. each class and bag
- Can give a profile of a user in terms of the words that are most predictive of what they like
  - Odds ratio: P(rel | example) / P(¬rel | example)
    - An example is positive if the odds ratio is > 1
  - Strength of a keyword: log [ P(w | rel) / P(w | ¬rel) ]
  - We can summarize a user's profile in terms of the words that have strength above some threshold (see the sketch below).
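
A small sketch (the toy documents and helper names are invented for illustration) of ranking words by strength = log P(w | rel) / P(w | ¬rel), with the conditional probabilities estimated by Laplace-smoothed counts:

```python
import math
from collections import Counter

def keyword_strengths(liked_docs, disliked_docs):
    """Return {word: strength} from bags of words for liked/disliked items."""
    rel, nrel = Counter(), Counter()
    for doc in liked_docs:
        rel.update(doc.lower().split())
    for doc in disliked_docs:
        nrel.update(doc.lower().split())
    vocab = set(rel) | set(nrel)
    n_rel, n_nrel, v = sum(rel.values()), sum(nrel.values()), len(vocab)
    strengths = {}
    for w in vocab:
        p_rel = (rel[w] + 1) / (n_rel + v)      # P(w | rel), smoothed
        p_nrel = (nrel[w] + 1) / (n_nrel + v)   # P(w | ~rel), smoothed
        strengths[w] = math.log(p_rel / p_nrel)
    return strengths

profile = keyword_strengths(
    ["gritty noir detective thriller", "noir mystery with a detective"],
    ["romantic musical comedy", "feel-good comedy"])
print(sorted(profile.items(), key=lambda kv: -kv[1])[:3])  # most predictive words
```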
21. Advantages of the Content-Based Approach
- No need for data on other users.
  - No cold-start or sparsity problems.
- Able to recommend to users with unique tastes.
- Able to recommend new and unpopular items
  - No first-rater problem.
- Can provide explanations of recommended items by listing the content features that caused an item to be recommended.
- Well-known technology: the entire field of classification learning is at (y)our disposal!
22. Disadvantages of the Content-Based Method
- Requires content that can be encoded as meaningful features.
- Users' tastes must be represented as a learnable function of these content features.
- Unable to exploit quality judgments of other users.
  - Unless these are somehow included in the content features.
23. Movie Domain
- EachMovie Dataset (Compaq Research Labs)
  - Contains user ratings for movies on a 0-5 scale.
  - 72,916 users (avg. 39 ratings each).
  - 1,628 movies.
  - Sparse user-ratings matrix (2.6% full).
- Crawled the Internet Movie Database (IMDb)
  - Extracted content for the titles in EachMovie.
  - Basic movie information
    - Title, Director, Cast, Genre, etc.
  - Popular opinions
    - User comments, newspaper and newsgroup reviews, etc.
24. Content-Boosted Collaborative Filtering
(Diagram: EachMovie ratings combined with IMDb content)
25. Content-Boosted CF - I
26. Content-Boosted CF - II
(Diagram: sparse user-ratings matrix → content-based predictor → pseudo user-ratings matrix)
- Compute the pseudo user-ratings matrix (sketch below)
  - The full matrix approximates the actual full user-ratings matrix
- Perform CF
  - Using Pearson correlation between pseudo user-rating vectors
- This works better than either approach alone!
27. Why can't the pseudo ratings be used to help content-based filtering?
- How about using the pseudo ratings to improve the content-based filter itself? (or: how access to unlabelled examples improves accuracy)
- Learn an NBC classifier C0 using the few items for which we have user ratings
- Use C0 to predict the ratings for the rest of the items
- Loop
  - Learn a new classifier C1 using all the ratings (real and predicted)
  - Use C1 to (re-)predict the ratings for all the unknown items
- Until no change in ratings
- With a small change, this actually works in finding a better classifier!
  - Change: keep the class posterior prediction (rather than just the max class)
  - This means that each (unlabelled) entity could belong to multiple classes, with fractional membership in each
  - We weight the counts by the membership fractions
    - E.g. P(A=v | c) = (sum of class weights of all examples in c that have A=v) / (sum of class weights of all examples in c)
- This is called expectation maximization (EM)
  - Very useful on the web, where you have tons of data but very little of it is labelled
  - Reminds you of k-means, doesn't it? (a sketch follows below)
28. (No transcript)
29. (Boosted) Content Filtering
30. Co-Training: Motivation
- Learning methods need labeled data
  - Lots of <x, f(x)> pairs
  - Hard to get (who wants to label data?)
- But unlabeled data is usually plentiful
  - Could we use it instead?
31. Co-training
- You train me, I train you
- Only a small amount of labeled data is needed
- Suppose each instance has two parts
  - x = (x1, x2)
  - x1, x2 conditionally independent given f(x)
- Suppose each half can be used to classify the instance
  - ∃ f1, f2 such that f1(x1) = f2(x2) = f(x)
- Suppose f1, f2 are learnable
  - f1 ∈ H1, f2 ∈ H2, ∃ learning algorithms A1, A2
(Diagram: A1 learns f1 from labeled instances <x1, x2, f(x)>; f1 labels unlabeled instances, producing <x1, x2, f1(x1)>, which A2 uses to learn hypothesis f2.)
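
A sketch of the co-training loop under the two-view setup above; the classifier choice, number of rounds, and how many examples each view labels per round are illustrative assumptions, not from the slide:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(X1_lab, X2_lab, y_lab, X1_unlab, X2_unlab, rounds=5, per_round=2):
    X1, X2, y = X1_lab.copy(), X2_lab.copy(), y_lab.copy()
    unlab = list(range(len(X1_unlab)))                  # indices still unlabeled
    f1, f2 = MultinomialNB(), MultinomialNB()
    for _ in range(rounds):
        if not unlab:
            break
        f1.fit(X1, y)                                   # A1 learns f1 from view 1
        f2.fit(X2, y)                                   # A2 learns f2 from view 2
        newly = []
        for clf, Xu in ((f1, X1_unlab), (f2, X2_unlab)):
            probs = clf.predict_proba(Xu[unlab])
            # each view labels the unlabeled examples it is most confident about
            for pos in np.argsort(-probs.max(axis=1))[:per_round]:
                idx = unlab[pos]
                if idx in newly:
                    continue                            # already taken this round
                newly.append(idx)
                label = clf.classes_[probs[pos].argmax()]
                X1 = np.vstack([X1, X1_unlab[idx:idx + 1]])
                X2 = np.vstack([X2, X2_unlab[idx:idx + 1]])
                y = np.append(y, label)
        unlab = [i for i in unlab if i not in newly]    # shrink the unlabeled pool
    return f1, f2
```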
32. Observations
- Can apply A1 to generate as much training data as one wants
- If x1 is conditionally independent of x2 given f(x),
  - then the error in the labels produced by A1
  - will look like random noise to A2!
- Thus there is no limit to the quality of the hypothesis A2 can make
33. It Really Works!
- Learning to classify web pages as course pages
  - x1 = bag of words on the page
  - x2 = bag of words from all anchors pointing to the page
- Naïve Bayes classifiers
  - 12 labeled pages
  - 1,039 unlabeled pages
34. (No transcript)
35. Focussed Crawling
- Cho paper
  - Looks at heuristics for managing the URL queue
  - Aim 1: completeness
  - Aim 2: just topic pages
    - Prioritize if the word appears in the anchor / URL
- Heuristics
  - PageRank
  - Backlinks
36. Modified Algorithm
- A page is hot if it
  - Contains the keyword in its title, or
  - Contains 10 instances of the keyword in its body, or
  - Distance(page, hot-page) < 3
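
A tiny sketch of the "hot page" test as stated on the slide; the field names are hypothetical, and the body-count comparison (at least 10) is an assumption, since the original inequality symbol did not survive extraction:

```python
def is_hot(page, keyword, hot_distance):
    """page: dict with 'title' and 'body' text; hot_distance: link distance
    from the nearest already-hot page."""
    kw = keyword.lower()
    if kw in page["title"].lower():
        return True                                   # keyword in title
    if page["body"].lower().split().count(kw) >= 10:  # many keyword occurrences
        return True
    return hot_distance < 3                           # close to a hot page
```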
37. Results
38. More Results
39. Conclusions
- Recommending and personalization are important approaches to combating information overload.
- Machine learning is an important part of systems for these tasks.
- Collaborative filtering has problems.
- Content-based methods address these problems (but have problems of their own).
- Integrating both is best.
- Which leads us to discuss some approaches that wind up using unlabelled data along with labelled data to improve performance.
40. Discussion of the Google News Collaborative Filtering Paper