Title: Top-N Recommendation Algorithm Based on Item-Graph
- Allen, Zhenjiang LIN
- CSE, CUHK
- Nov 13, 2007
Outline
- 1. Top-N Recommendation Problem
- 2. Top-N Recommendation Algorithm
- 3. Item-Graph Model and GCP-based Method
  - Item-Graph Model
  - Generalized Conditional Probability (GCP)-based Recommendation Algorithm
- 4. Preliminary Experimental Results
- 5. Conclusion and Future Work
1. Top-N Recommendation Problem
- The Top-N Recommendation Problem
  - Given the preference information of users, recommend to a certain user a set of N items that he might be interested in, based on the items he has selected.
  - E-commerce system example: Amazon.com, customers vs. products.
            Item 1   Item 2   Item 3   ...   Item m
User 1        1        0        1      ...     0
User 2        1        1        0      ...     0
...
User n        0        1        0      ...     1
New User      1        ?        1       ?      ?

User-Item matrix
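As a rough illustration of the user-item matrix above, the following Python sketch builds a binary matrix from per-user baskets. The toy data mirror the slide's example rows with only four items for brevity. The function name `build_user_item_matrix` and the item labels are assumptions made for illustration, not part of the original slides.

```python
def build_user_item_matrix(baskets, items):
    """Return a binary matrix D where D[u][i] == 1 iff user u selected item i."""
    index = {item: col for col, item in enumerate(items)}
    matrix = []
    for basket in baskets:
        row = [0] * len(items)
        for item in basket:
            row[index[item]] = 1
        matrix.append(row)
    return matrix


# Hypothetical item labels; the slide's "Item m" is shown here as "Item 4".
items = ["Item 1", "Item 2", "Item 3", "Item 4"]
baskets = [
    {"Item 1", "Item 3"},   # User 1
    {"Item 1", "Item 2"},   # User 2
    {"Item 2", "Item 4"},   # User n
]
D = build_user_item_matrix(baskets, items)

# The new user's known selections form the basket B; the "?" entries are the
# candidate items that a top-N algorithm has to rank.
B = {"Item 1", "Item 3"}
candidates = [item for item in items if item not in B]
```

Here `candidates` holds the items whose entries are unknown for the new user; a recommender would score and order exactly these.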
Example: Amazon.com
1. Top-N Recommendation Problem
- Challenges in E-commerce Systems
  - Huge amounts of data: millions of users and/or items
  - Real-time: return the result set in real time
  - Limited: new users' preference information
  - Volatile: users' preference information
2. Top-N Recommendation Algorithm
- Two major approaches
  - Content-based: recommend items based on the content (textual information) of items.
    - Fab system [Balabanovic97], Syskill & Webert system [Pazzani97].
  - Collaborative Filtering (CF): recommend items by collecting taste information from other users.
    - Collaborative (correlation) information between users.
    - More popular than content-based recommendation, since in many domains (such as music, restaurants) it is hard to extract useful features from items.
    - Tapestry system [Goldberg92], Video Recommender [Hill95], Ringo [Shardanand95], GroupLens [Konstan97], Jester system [Goldberg01], Amazon [Linden03].
2. Top-N Recommendation Algorithm
- CF algorithms classified by strategy of using data
  - Memory-based: make recommendations based on the entire collection of preferences of the users.
    - No pre-computing is needed, but suffers from a serious scalability problem.
    - E.g., Correlation-based [Resnick94], Cosine-based [Breese98].
  - Model-based: use the collection of user preferences to learn a model, which is then used to make recommendations.
    - The model is built off-line, so the approach is more scalable.
    - E.g., Cluster models [Ungar98], Bayesian network model [Breese98], Association Rule Mining approach [Lin00].
2. Top-N Recommendation Algorithm
- CF algorithms classified by strategy of using objects
  - User-centric: look for similar (like-minded) users first and then make recommendations.
    - Similarity between users is relatively dynamic.
    - Pre-computing user neighborhoods may lead to poor predictions.
  - Item-centric: look for similar (or related) items first and then make recommendations.
    - Similarity between items is relatively static.
    - Enables pre-computing of item-item similarity.
    - More scalable.
2. Top-N Recommendation Algorithm
- Notations
  - Item set: $I = \{I_1, I_2, \ldots, I_m\}$.
  - User set: $U = \{U_1, U_2, \ldots, U_n\}$.
  - User-Item (binary) matrix: $D = (D_{n,m})$.
  - Basket of the active user: $B \subseteq I$.
  - Similarity score of x and y: $sim(x, y)$.
- Formal definition of the top-N recommendation problem
  - Given a user-item matrix $D$ and a set of items $B$ that have been selected by the active user, identify an ordered set of items $X$ such that $|X| \le N$ and $X \cap B = \emptyset$.
2. Top-N Recommendation Algorithm
- Two classical item-item similarity measures
  - Cosine-based (symmetric)
    - $sim(I_i, I_j) = \cos(D_{*,i}, D_{*,j})$   (1)
  - Conditional Probability (CP)-based (asymmetric)
    - $sim(I_i, I_j) = P(I_j \mid I_i) = Freq(I_i I_j) / Freq(I_i)$   (2)
    - $Freq(X)$: the number of users who have purchased the item set $X$.
- The ranking score for item x
  - $RS(x) = \sum_{b \in B} sim(b, x)$   (3)
  - (the sum of the similarity scores between x and the items in the basket B)
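The two similarity measures and the ranking score can be sketched in plain Python as follows. This is a minimal illustration on a toy binary matrix, not the presenter's implementation: the function names (`freq`, `cosine_sim`, `cp_sim`, `top_n`), the use of column indices to identify items, and the tie-breaking by index are all assumptions.

```python
import math


def freq(D, item_cols):
    """Number of users (rows of D) who purchased every item in item_cols."""
    return sum(1 for row in D if all(row[c] == 1 for c in item_cols))


def cosine_sim(D, i, j):
    """Eq. (1): cosine of columns i and j of the binary matrix D."""
    # For 0/1 columns the dot product equals the co-purchase count.
    dot = freq(D, [i, j])
    norm = math.sqrt(freq(D, [i]) * freq(D, [j]))
    return dot / norm if norm else 0.0


def cp_sim(D, i, j):
    """Eq. (2): P(I_j | I_i) = Freq(I_i I_j) / Freq(I_i)."""
    fi = freq(D, [i])
    return freq(D, [i, j]) / fi if fi else 0.0


def top_n(D, basket, n, sim):
    """Rank items outside the basket by Eq. (3); return the best n column indices."""
    m = len(D[0])
    scores = {x: sum(sim(D, b, x) for b in basket)
              for x in range(m) if x not in basket}
    # Highest score first; ties broken by column index (an arbitrary choice).
    ranked = sorted(scores, key=lambda x: (-scores[x], x))
    return ranked[:n]


D = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]
basket = {0, 2}   # the active user has selected items 0 and 2
print(top_n(D, basket, 2, cosine_sim))   # [1, 3]
print(top_n(D, basket, 2, cp_sim))       # [1, 3]
```

On this toy matrix, `cp_sim(D, 0, 2)` is 0.5 while `cp_sim(D, 2, 0)` is 1.0, which shows the asymmetry of the CP-based measure; the cosine measure gives the same value in both directions.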
4. Preliminary Experimental Results
- Dataset
  - MovieLens (http://www.grouplens.org/data)
  - A web-based movie recommender system.
  - Contains multi-valued ratings that indicate how much each user liked a particular movie.
  - Each user has rated at least 20 movies.
  - We treat the ratings as an indication of whether the users have seen the movies (nonzero) or not (zero).
Table 1: The characteristics of the MovieLens dataset

# of Users   # of Items   Density [1]   Average Basket Size
943          1682         6.31%         106.04

[1] Density: the percentage of nonzero entries in the user-item matrix.
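The binarization of ratings and the two summary statistics in Table 1 can be illustrated with a short Python sketch. The rating matrix below is hypothetical toy data, not MovieLens, and "average basket size" is taken here to mean the mean number of nonzero entries per user, which is an assumption about how the table's figure was computed.

```python
def binarize(ratings):
    """Map multi-valued ratings to a binary matrix: nonzero -> 1 (seen), zero -> 0."""
    return [[1 if r != 0 else 0 for r in row] for row in ratings]


def density_percent(D):
    """Percentage of nonzero entries in the user-item matrix."""
    nonzero = sum(sum(row) for row in D)
    return 100.0 * nonzero / (len(D) * len(D[0]))


def average_basket_size(D):
    """Mean number of selected items per user (assumed definition)."""
    return sum(sum(row) for row in D) / len(D)


# Hypothetical 3-user, 4-movie rating matrix (0 means "not rated").
ratings = [
    [5, 0, 3, 0],
    [4, 2, 0, 0],
    [0, 1, 0, 5],
]
D = binarize(ratings)
print(density_percent(D))       # 50.0
print(average_basket_size(D))   # 2.0
```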
4. Preliminary Experimental Results-1
- Evaluation Design
  - Split the dataset into training and test sets by
    - randomly selecting one rated movie of each user to be part of the test set,
    - using the remaining rated movies for training.
  - Cosine (COS)-based, CP-based, and GCP-based methods; average over 10 runs.
- Evaluation Metrics
  - Hit-Rate (HR)
    - $HR = \#\text{ of hits} / n$   (6)
  - Average Reciprocal Hit-Rate (ARHR)
    - $ARHR = \left( \sum_{i=1}^{h} 1/p_i \right) / n$   (7)
  - # of hits: the number of items in the test set that were also in the top-N lists.
  - h is the number of hits that occurred at positions $p_1, p_2, \ldots, p_h$ within the top-N lists (i.e., $1 \le p_i \le N$).
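The evaluation design and the two metrics above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, baskets are assumed to be Python sets of item labels, and n is taken to be the number of users (each user contributes exactly one held-out item).

```python
import random


def leave_one_out_split(baskets, rng):
    """Hold out one randomly chosen item per user; the rest is training data."""
    train, test = [], []
    for basket in baskets:
        held_out = rng.choice(sorted(basket))   # sorted for reproducibility
        train.append(set(basket) - {held_out})
        test.append(held_out)
    return train, test


def hit_rate(top_n_lists, test_items):
    """Eq. (6): fraction of users whose held-out item appears in their top-N list."""
    hits = sum(1 for recs, item in zip(top_n_lists, test_items) if item in recs)
    return hits / len(test_items)


def arhr(top_n_lists, test_items):
    """Eq. (7): sum of 1/p_i over the hits, divided by the number of users n."""
    total = 0.0
    for recs, item in zip(top_n_lists, test_items):
        if item in recs:
            total += 1.0 / (recs.index(item) + 1)   # positions are 1-based
    return total / len(test_items)


# Hypothetical top-3 lists for four users and their held-out items.
top_n_lists = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["j", "k", "l"]]
test_items = ["a", "e", "x", "l"]
print(hit_rate(top_n_lists, test_items))   # 0.75
print(arhr(top_n_lists, test_items))       # (1 + 1/2 + 1/3) / 4, about 0.458
```

ARHR rewards hits near the top of the list: the hit at position 1 contributes 1, while the hit at position 3 contributes only 1/3, so ARHR is never larger than HR.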
4. Preliminary Experimental Results-1
- Performance of Top-N Recommendation Algorithms
  - HR (left): x-axis: top-N items; y-axis: hit-rate of all users.
  - ARHR (right): x-axis: top-N items; y-axis: average reciprocal hit-rate of all users.
  - (For the GCP-based method, set d = 2.)
4. Preliminary Experimental Results-2
- Testing the Parameter d in the GCP Method
  - Testing the effect of d (d = 1, 2, 3).
- Evaluation: Online Shopping Simulation
  - Randomly select part of the user records to be the training set.
  - Use the remaining user records for testing.
  - STEP 0: Construct the item-graph based on the training set.
  - STEP 1: For each user in the test set:
    - randomly move one item out of the user's basket and make a recommendation based on the remaining items in the basket;
    - compute the position of this item in the recommendation list;
    - update the item-graph.
  - STEP 2: Compute the HR and ARHR metrics.
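The simulation steps can be outlined in Python as below. Several things here are assumptions rather than the presenter's method: the item-graph and the GCP score are not specified in this part of the slides, so a simple co-occurrence count structure (`CooccurrenceGraph`, a hypothetical name) with CP-based ranking stands in for them, and the users processed in STEP 1 are assumed to be those held out from the training records.

```python
import random


class CooccurrenceGraph:
    """Stand-in for the item-graph: node counts and directed co-purchase counts."""

    def __init__(self):
        self.node = {}   # item -> number of baskets containing it
        self.edge = {}   # (a, b) -> number of baskets containing both a and b

    def update(self, basket):
        """Add one user's basket to the counts."""
        for a in basket:
            self.node[a] = self.node.get(a, 0) + 1
            for b in basket:
                if a != b:
                    self.edge[(a, b)] = self.edge.get((a, b), 0) + 1

    def recommend(self, basket):
        """Order unseen items by summed CP-based similarity to the basket."""
        scores = {}
        for x in self.node:
            if x in basket:
                continue
            scores[x] = sum(self.edge.get((b, x), 0) / self.node[b]
                            for b in basket if b in self.node)
        return sorted(scores, key=lambda x: (-scores[x], str(x)))


def simulate(train_baskets, test_baskets, top_n, rng):
    graph = CooccurrenceGraph()
    for basket in train_baskets:             # STEP 0: build from training records
        graph.update(basket)
    hits, reciprocal = 0, 0.0
    for basket in test_baskets:              # STEP 1: one simulated shopper at a time
        removed = rng.choice(sorted(basket))
        ranking = graph.recommend(basket - {removed})
        if removed in ranking[:top_n]:
            hits += 1
            reciprocal += 1.0 / (ranking.index(removed) + 1)
        graph.update(basket)                 # the full basket is now known
    n = len(test_baskets)
    return hits / n, reciprocal / n          # STEP 2: HR and ARHR


train = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
test = [{"a", "b"}]
print(simulate(train, test, 1, random.Random(0)))   # (1.0, 1.0)
```

Updating the structure after each simulated user is what makes the evaluation "online": later users are scored against a graph that already includes earlier users' baskets.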
4. Preliminary Experimental Results-2
- Performance of Top-N Recommendation Algorithms
  - HR (left): x-axis: top-N items; y-axis: hit-rate of all users.
  - ARHR (right): x-axis: top-N items; y-axis: average reciprocal hit-rate of all users.
5. Conclusion and Future Work
- Conclusion
  - Top-N recommendation problem and item-centric algorithms
    - Cosine-based, conditional probability-based
  - Item-Graph model
    - Visualizing the relationship among items.
    - Easy to update.
  - Generalized Conditional Probability-based top-N recommendation algorithm
    - Item-centric, based on the Item-Graph model
- Future Work
  - Clustering items and measuring item-item similarities based on the Item-Graph model
  - Speeding up the GCP method.
References
- [Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative Recommendation. Commun. ACM, 40(3):66-72, 1997.
- [Breese98] J. S. Breese, D. Heckerman and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San Francisco, 1998.
- [Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
- [Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems. Thesis submitted for the Degree of M.S. in Computer Science.
- [Linden03] G. Linden, B. Smith and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003.
- [Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proc. Computer Supported Cooperative Work Conf., pages 175-186, 1994.