1
Top-N Recommendation Algorithm Based on Item-Graph

Allen, Zhenjiang LIN
CSE, CUHK
Nov 13, 2007

2
Outline

1. Top-N Recommendation Problem
2. Top-N Recommendation Algorithm
3. Item-Graph Model and GCP-based Method
   Item-Graph Model
   Generalized Conditional Probability (GCP)-based Recommendation Algorithm
4. Preliminary Experimental Results
5. Conclusion and Future Work

3
1. Top-N Recommendation Problem

The Top-N Recommendation Problem
Given the preference information of users,
recommend a set of N items to a certain user that
he might be interested in, based on the items he
has selected.
E-commerce system example: Amazon.com (customers vs. products).

User-Item matrix
           Item 1   Item 2   Item 3   ...   Item m
User 1       1        0        1      ...     0
User 2       1        1        0      ...     0
...
User n       0        1        0      ...     1
New User     1        ?        1       ?      ?
4
Example: Amazon.com
5
1. Top-N Recommendation Problem

Challenges in E-commerce Systems
Huge amounts of data: millions of users and/or items.
Real-time: the result set must be returned in real time.
Limited preference information for new users.
Volatile user preference information.

6
2. Top-N Recommendation Algorithm

Two major approaches
Content-based: recommend items based on the content (textual information) of items.
  Examples: Fab system [Balabanovic97], Syskill & Webert system [Pazzani97].
Collaborative Filtering (CF): recommend items by collecting taste information from other users.
  Uses collaborative (correlation) information between users.
  More popular than content-based recommendation, since in many domains (such as music, restaurants) it is hard to extract useful features from items.
  Examples: Tapestry system [Goldberg92], Video Recommender [Hill95], Ringo [Shardanand95], GroupLens [Konstan97], Jester system [Goldberg01], Amazon [Linden03].

7
2. Top-N Recommendation Algorithm

CF algorithms classified by their strategy for using data
Memory-based: make recommendations based on the entire collection of user preferences.
  No pre-computing is needed, but these methods suffer from serious scalability problems.
  E.g., Correlation-based [Resnick94], Cosine-based [Breese98].
Model-based: use the collection of user preferences to learn a model, which is then used to make recommendations.
  The model is built off-line, which makes these methods more scalable.
  E.g., Cluster models [Ungar98], Bayesian network model [Breese98], Association Rule Mining approach [Lin00].

8
2. Top-N Recommendation Algorithm

CF algorithms classified by their strategy for using objects
User-centric: look for similar (like-minded) users first and then make recommendations.
  Similarity between users is relatively dynamic.
  Pre-computing the user neighborhood may lead to poor predictions.
Item-centric: look for similar (or related) items first and then make recommendations.
  Similarity between items is relatively static.
  Enables pre-computing of item-item similarity.
  More scalable.

9
2. Top-N Recommendation Algorithm

Notations
Item set: $I = \{I_1, I_2, \ldots, I_m\}$.
User set: $U = \{U_1, U_2, \ldots, U_n\}$.
User-Item (binary) matrix: $D$, of size $n \times m$.
Basket of the active user: $B \subseteq I$.
Similarity score of $x$ and $y$: $\mathrm{sim}(x, y)$.
Formal definition of the top-N recommendation problem
Given a user-item matrix $D$ and a set of items $B$ that have been selected by the active user, identify an ordered set of items $X$ such that $|X| \le N$ and $X \cap B = \emptyset$.

10
2. Top-N Recommendation Algorithm

Two classical item-item similarity measures
Cosine-based (symmetric):
  $\mathrm{sim}(I_i, I_j) = \cos(D_{*,i}, D_{*,j})$    (1)
Conditional Probability (CP)-based (asymmetric):
  $\mathrm{sim}(I_i, I_j) = P(I_j \mid I_i) = \mathrm{Freq}(I_i I_j) / \mathrm{Freq}(I_i)$    (2)
  where $\mathrm{Freq}(X)$ is the number of users who have purchased the item set $X$.
The ranking score for item $x$:
  $RS(x) = \sum_{b \in B} \mathrm{sim}(b, x)$    (3)
  (the sum of the similarity scores between $x$ and the items in the basket $B$)
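
The two similarity measures and the ranking score above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the presenter's implementation: the function names (`cosine_sim`, `cp_sim`, `top_n`), the 0-based item indices, and the choice to zero the diagonal and break ties by item index are all assumptions made here.

```python
import numpy as np

def cosine_sim(D):
    """Item-item cosine similarity between the columns of D, as in Eq. (1)."""
    D = np.asarray(D, dtype=float)
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0            # items nobody selected: avoid dividing by zero
    S = (D.T @ D) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)           # assumption: an item is not its own neighbor
    return S

def cp_sim(D):
    """S[i, j] = P(I_j | I_i) = Freq(I_i I_j) / Freq(I_i), as in Eq. (2)."""
    D = np.asarray(D, dtype=float)
    co = D.T @ D                       # co[i, j] = number of users who selected both i and j
    freq = np.diag(co).copy()          # freq[i] = number of users who selected item i
    freq[freq == 0] = 1.0
    S = co / freq[:, None]             # row i is divided by Freq(I_i), so S is asymmetric
    np.fill_diagonal(S, 0.0)
    return S

def top_n(S, basket, n):
    """Rank items by RS(x) = sum over b in B of S[b, x], Eq. (3), excluding the basket."""
    basket = list(basket)
    scores = S[basket].sum(axis=0)
    scores[basket] = -np.inf           # enforce that X and B are disjoint
    order = np.argsort(-scores, kind="stable")
    return [int(x) for x in order[:n] if np.isfinite(scores[x])]
```

For example, with the three fully known users from the user-item matrix slide as the rows of `D` and a basket holding Item 1 and Item 3 (indices 0 and 2), `top_n(cp_sim(D), [0, 2], 2)` ranks the two remaining items.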

11
4. Preliminary Experimental Results

Dataset
MovieLens (http://www.grouplens.org/data)
  Collected by a web-based movie recommender system.
  Contains multi-valued ratings that indicate how much each user liked a particular movie.
  Each user has rated at least 20 movies.
  We treat each rating as an indication of whether the user has seen the movie (nonzero) or not (zero).

Table 1: The characteristics of the MovieLens dataset
# of Users   # of Items   Density (1)   Average Basket Size
943          1682         6.31%         106.04
(1) Density: the percentage of nonzero entries in the user-item matrix.
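
The binarization step and the two statistics in Table 1 can be illustrated as follows. This is a minimal sketch under stated assumptions: the input is taken to be a list of `(user, item, rating)` triples with 0-based ids, and the helper names `binarize`, `density`, and `average_basket_size` are hypothetical, not part of any MovieLens tooling.

```python
import numpy as np

def binarize(ratings, n_users, n_items):
    """Build the binary user-item matrix: 1 if the user rated the item, else 0."""
    D = np.zeros((n_users, n_items), dtype=np.int8)
    for user, item, rating in ratings:
        if rating != 0:                # any nonzero rating counts as "seen"
            D[user, item] = 1
    return D

def density(D):
    """Percentage of nonzero entries in the user-item matrix."""
    return 100.0 * np.count_nonzero(D) / D.size

def average_basket_size(D):
    """Mean number of items per user (row sums of the binary matrix)."""
    return float(D.sum(axis=1).mean())
```

Since density equals the average basket size divided by the number of items, the figures in Table 1 can be cross-checked: 106.04 / 1682 is roughly 6.3%.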
12
4. Preliminary Experimental Results-1

Evaluation Design
Split the dataset into training and test sets by randomly selecting one rated movie of each user for the test set and using the remaining rated movies for training.
Compare the Cosine (COS)-based, CP-based, and GCP-based methods, averaged over 10 runs.
Evaluation Metrics
Hit-Rate (HR):
  $HR = (\text{\# of hits}) / n$    (6)
Average Reciprocal Hit-Rate (ARHR):
  $ARHR = \frac{1}{n} \sum_{i=1}^{h} \frac{1}{p_i}$    (7)
Here, # of hits is the number of items in the test set that were also in the top-N lists, $h$ is the number of hits, which occurred at positions $p_1, p_2, \ldots, p_h$ within the top-N lists (i.e., $1 \le p_i \le N$), and $n$ is the number of users.
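
Both metrics can be computed in one pass over the users. The sketch below assumes one held-out test item per user, as in the evaluation design above; the function name `hit_rate_and_arhr` and the list-based inputs are illustrative choices, not taken from the slides.

```python
def hit_rate_and_arhr(top_n_lists, held_out):
    """Return (HR, ARHR) following Eqs. (6) and (7).

    top_n_lists[u] is the ranked top-N list for user u,
    held_out[u] is the single test item withheld from user u.
    """
    n = len(held_out)
    hits = 0
    reciprocal_sum = 0.0
    for recs, target in zip(top_n_lists, held_out):
        if target in recs:
            hits += 1
            position = recs.index(target) + 1   # positions p_i are 1-based
            reciprocal_sum += 1.0 / position
    return hits / n, reciprocal_sum / n
```

HR treats every hit equally, while ARHR rewards hits that appear near the top of the list: a hit at position 1 contributes 1, a hit at position 2 contributes 1/2, and so on.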

13
4. Preliminary Experimental Results-1

Performance of Top-N Recommendation Algorithms
[Figure] HR (left): x-axis is the number of top-N items, y-axis is the hit-rate over all users.
[Figure] ARHR (right): x-axis is the number of top-N items, y-axis is the average reciprocal hit-rate over all users.
(For the GCP-based method, $d = 2$.)

14
4. Preliminary Experimental Results-2

Testing the Parameter d in the GCP Method
Testing the effect of d (d = 1, 2, 3).
Evaluation: Online Shopping Simulation
Randomly select part of the user records as the training set.
Use the remaining user records for testing.
STEP 0: Construct the item-graph based on the training set.
STEP 1: For each user in the test set:
  randomly move one item out of the user's basket and make a recommendation based on the remaining items in the basket;
  compute the position of this item in the recommendation list;
  update the item-graph.
STEP 2: Compute the HR and ARHR metrics.
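
The simulation loop can be sketched as below. The transcript does not include the slides that define the item-graph or the GCP score, so this sketch substitutes a plain co-occurrence count structure (the hypothetical `CooccurrenceGraph` class) and the CP-based score of Eqs. (2) and (3). It also assumes that "updating the item-graph" means adding the evaluated user's full basket. It illustrates the control flow only, not the GCP method itself.

```python
import random
from collections import defaultdict

class CooccurrenceGraph:
    """Stand-in for the item-graph (an assumption): item counts plus pairwise co-occurrence counts."""

    def __init__(self):
        self.freq = defaultdict(int)
        self.co = defaultdict(lambda: defaultdict(int))

    def update(self, basket):
        """Add one user's basket to the counts."""
        items = set(basket)
        for i in items:
            self.freq[i] += 1
            for j in items:
                if i != j:
                    self.co[i][j] += 1

    def rank(self, basket):
        """Rank items outside the basket by the summed CP score, ties broken by item id."""
        scores = defaultdict(float)
        for b in basket:
            for x, count in self.co[b].items():
                if x not in basket:
                    scores[x] += count / self.freq[b]
        return sorted(scores, key=lambda x: (-scores[x], x))

def simulate(train_baskets, eval_baskets, n, seed=0):
    rng = random.Random(seed)
    graph = CooccurrenceGraph()
    for basket in train_baskets:                 # STEP 0: build the graph
        graph.update(basket)
    hits, reciprocal_sum = 0, 0.0
    for basket in eval_baskets:                  # STEP 1: one simulated purchase per user
        basket = list(basket)
        held_out = basket.pop(rng.randrange(len(basket)))
        ranked = graph.rank(set(basket))[:n]
        if held_out in ranked:
            hits += 1
            reciprocal_sum += 1.0 / (ranked.index(held_out) + 1)
        graph.update(basket + [held_out])        # assumption: the full basket is added
    users = len(eval_baskets)
    return hits / users, reciprocal_sum / users  # STEP 2: HR and ARHR
```

Because the graph is updated after every evaluated user, later users benefit from earlier users' baskets, which is what distinguishes this online setting from the static split used in the first experiment.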

15
4. Preliminary Experimental Results-2

Performance of Top-N Recommendation Algorithms
[Figure] HR (left): x-axis is the number of top-N items, y-axis is the hit-rate over all users.
[Figure] ARHR (right): x-axis is the number of top-N items, y-axis is the average reciprocal hit-rate over all users.

16
5. Conclusion and Future Work

Conclusion
Top-N recommendation problem and item-centric algorithms
  Cosine-based, conditional probability-based
Item-Graph model
  Visualizes the relationships among items.
  Easy to update.
Generalized Conditional Probability-based top-N recommendation algorithm
  Item-centric, based on the Item-Graph model
Future Work
Clustering items and measuring item-item similarities based on the Item-Graph model.
Speeding up the GCP method.

17
References

[Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative Recommendation. Commun. ACM, 40(3):66-72, 1997.
[Breese98] J. S. Breese, D. Heckerman and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San Francisco, 1998.
[Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
[Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems. Thesis submitted for the Degree of M.S. in Computer Science.
[Linden03] G. Linden, B. Smith and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003.
[Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proc. Computer Supported Cooperative Work Conf., pages 175-186, 1994.