Title: Top-N Recommendation Algorithm Based on Item-Graph
- Allen, Zhenjiang LIN
- CSE, CUHK
- June 7, 2007
Outline
- 1. Top-N Recommendation Problem
- 2. Top-N Recommendation Algorithm
- 3. Item-Graph Model and GCP-based Method
  - Item-Graph Model
  - Generalized Conditional Probability (GCP)-based Recommendation Algorithm
- 4. Preliminary Experimental Results
- 5. Conclusion and Future Work
1. Top-N Recommendation Problem
- The Top-N Recommendation Problem
  - Given the preference information of users, recommend to a given user a set of N items that he or she might be interested in, based on the items that user has already selected.
  - E-commerce system example: Amazon.com, customers vs. products.

User-Item matrix
            Item 1   Item 2   Item 3   ...   Item m
User 1        1        0        1              0
User 2        1        1        0              0
...
User n        0        1        0              1
New User      1        ?        1       ?      ?
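As a minimal sketch, the user-item matrix can be held as a list of 0/1 rows and turned into one basket (set of item indices) per user. The names `D`, `baskets_from_matrix`, `active_basket`, and `candidates` are illustrative assumptions, and the sample values simply copy the four item columns shown in the table.

```python
# Rows are users, columns are items; 1 means the user selected the item.
# Values mirror the example table (columns Item 1, Item 2, Item 3, Item m).
D = [
    [1, 0, 1, 0],  # User 1
    [1, 1, 0, 0],  # User 2
    [0, 1, 0, 1],  # User n
]


def baskets_from_matrix(matrix):
    """Return one set of item indices per user (that user's basket)."""
    return [{j for j, value in enumerate(row) if value} for row in matrix]


baskets = baskets_from_matrix(D)

# The new user has selected items 0 and 2; every other item is a candidate
# for recommendation.
active_basket = {0, 2}
candidates = set(range(len(D[0]))) - active_basket
```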
Example: Amazon.com
1. Top-N Recommendation Problem
- Challenges in E-commerce Systems
  - Huge amounts of data: millions of users and/or items.
  - Real-time: the result set must be returned in real time.
  - Limited preference information for new users.
  - Volatile user preference information.
- Contributions
  - Propose the Item-Graph model.
    - Simple and incremental.
    - Reflects the relationships among items.
  - Develop the Generalized Conditional Probability-based top-N recommendation algorithm.
    - Item-centric.
    - Based on the Item-Graph model.
2. Top-N Recommendation Algorithm
- Two main paradigms
  - Content-based: recommend items based on the content (textual information) of items.
    - Fab system [Balabanovic97], Syskill & Webert system [Pazzani97].
  - Collaborative Filtering (CF): recommend items by collecting taste information from other users.
    - Collaboration between users (link information).
    - More popular than content-based recommendation, since in many domains (such as music and restaurants) it is hard to extract useful features from items.
    - Tapestry system [Goldberg92], Video Recommender [Hill95], Ringo [Shardanand95], GroupLens [Konstan97], Jester system [Goldberg01], Amazon [Linden03].
2. Top-N Recommendation Algorithm
- CF algorithms classified by strategy of using data
  - Memory-based: make recommendations based on the entire collection of preferences of the users.
    - No pre-computation is needed, but these methods suffer from serious scalability problems.
    - E.g., Correlation-based [Resnick94], Cosine-based [Breese98].
  - Model-based: use the collection of user preferences to learn a model, which is then used to make recommendations.
    - The model is built off-line, so these methods are more scalable.
    - E.g., Cluster models [Ungar98], Bayesian network model [Breese98], Association Rule Mining approach [Lin00].
2. Top-N Recommendation Algorithm
- CF algorithms classified by strategy of using objects
  - User-centric: look for similar (like-minded) users first and then make recommendations.
    - Similarity between users is relatively dynamic.
    - Pre-computing user neighborhoods may lead to poor predictions.
  - Item-centric: look for similar (or related) items first and then make recommendations.
    - Similarity between items is relatively static.
    - Enables pre-computing of item-item similarity.
    - Therefore, more scalable.
- The aim of our work
  - A model-based, item-centric CF top-N recommendation algorithm.
2. Top-N Recommendation Algorithm
- Notations
  - Item set: $I = \{I_1, I_2, \ldots, I_m\}$.
  - User set: $U = \{U_1, U_2, \ldots, U_n\}$.
  - User-Item matrix: $D = (D_{n,m})$.
  - Basket of the active user: $B \subseteq I$.
  - Similarity score of x and y: $sim(x, y)$.
- Formal definition of the top-N recommendation problem
  - Given a user-item matrix D and a set of items B that have been purchased by the active user, identify an ordered set of items X such that $|X| \le N$ and $X \cap B = \emptyset$.
2. Top-N Recommendation Algorithm
- Two classical item-item similarity measures
  - Cosine-based (symmetric)
    - $sim(I_i, I_j) = \cos(D_{*,i}, D_{*,j})$    (1)
  - Conditional Probability (CP)-based (asymmetric)
    - $sim(I_i, I_j) = P(I_j \mid I_i) = Freq(I_i I_j) / Freq(I_i)$    (2)
    - Freq(X): the number of customers who have purchased the item set X.
- The ranking score for item x
  - $RS(x) = \sum_{b \in B} sim(b, x)$    (3)
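The two similarity measures and the ranking score of Eqs. (1) to (3) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: baskets are stored as sets, the function names (`freq`, `cosine_sim`, `cp_sim`, `ranking_scores`, `top_n`) and the toy data are hypothetical, ties are broken alphabetically, and the cosine uses the fact that for 0/1 columns the dot product equals the co-purchase count.

```python
import math


def freq(baskets, items):
    """Number of users whose basket contains every item in `items`."""
    items = set(items)
    return sum(1 for b in baskets if items <= b)


def cosine_sim(baskets, i, j):
    """Eq. (1): cosine between the binary columns of items i and j."""
    fi, fj = freq(baskets, [i]), freq(baskets, [j])
    if fi == 0 or fj == 0:
        return 0.0
    return freq(baskets, [i, j]) / math.sqrt(fi * fj)


def cp_sim(baskets, i, j):
    """Eq. (2): P(j | i) = Freq(i, j) / Freq(i)."""
    fi = freq(baskets, [i])
    return freq(baskets, [i, j]) / fi if fi else 0.0


def ranking_scores(baskets, basket, sim):
    """Eq. (3): RS(x) = sum of sim(b, x) over b in the basket, for x outside it."""
    all_items = set().union(*baskets)
    return {
        x: sum(sim(baskets, b, x) for b in basket)
        for x in all_items - set(basket)
    }


def top_n(scores, n):
    """Items with the n highest scores (ties broken alphabetically)."""
    return sorted(scores, key=lambda x: (-scores[x], x))[:n]


# Toy purchase data (hypothetical).
baskets = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "d"}]
recommended = top_n(ranking_scores(baskets, {"a", "b"}, cp_sim), 1)
```

On this toy data `cp_sim` is asymmetric (P(c | a) differs from P(a | c)), while `cosine_sim` gives the same value in both directions.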
3. Item-Graph Model and GCP-based Method
- Intuitions behind the Item-Graph
  - The similarity between two items is proportional to the number of times they are co-purchased.
  - The similarity of item pairs is transitive.
  - E.g., [figure]
- Definition of the Item-Graph
  - Given a dataset $D = (D_{n,m})$, the Item-Graph is defined as a weighted undirected graph $G(V, E, W)$, where
    - V is the item set I.
    - An edge $(x, y) \in E$ if and only if items x and y have been co-purchased.
    - The weight of edge (x, y) is the number of co-purchases of items x and y.
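The definition translates directly into a co-purchase counter. The sketch below assumes each user's purchases are available as a set and stores the undirected graph as a mapping from an unordered item pair to its weight; `build_item_graph` and the toy baskets are hypothetical names and data, not part of the original work.

```python
from collections import Counter
from itertools import combinations


def build_item_graph(baskets):
    """Return {frozenset({x, y}): number of users who bought both x and y}."""
    weights = Counter()
    for basket in baskets:
        # Every unordered pair inside one basket is one co-purchase.
        for x, y in combinations(sorted(basket), 2):
            weights[frozenset((x, y))] += 1
    return weights


# Toy purchase data (hypothetical).
baskets = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "d"}]
graph = build_item_graph(baskets)
```

An edge exists exactly when its pair appears as a key, so items that were never bought together (here `a` and `d`) are simply absent.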
3. Item-Graph Model and GCP-based Method
- Updating the Item-Graph is easy
  - Adding a new user's preference information T to the graph needs $O(|T|^2)$ operations, including adding edges and/or increasing the weights of edges.
  - E.g., [figure]
- Potential direct applications of the Item-Graph
  - Clustering the items.
  - Measuring item-item similarity.
  - Measuring importance of items.
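The incremental update can be sketched as follows, assuming the graph is a plain dictionary keyed by unordered item pairs. The function name `add_basket` and the starting weights are hypothetical. A basket T touches exactly |T|(|T|-1)/2 pairs, which is where the $O(|T|^2)$ bound comes from.

```python
from itertools import combinations


def add_basket(graph, basket):
    """Merge one user's basket T into the graph in place.

    Returns the number of edge operations, which is |T| * (|T| - 1) / 2.
    """
    ops = 0
    for x, y in combinations(sorted(basket), 2):
        edge = frozenset((x, y))
        # Either creates the edge with weight 1 or increases its weight.
        graph[edge] = graph.get(edge, 0) + 1
        ops += 1
    return ops


# Hypothetical starting graph with a single edge of weight 3.
graph = {frozenset(("a", "b")): 3}
ops = add_basket(graph, {"a", "b", "c"})
```

No other part of the graph has to be recomputed, so new purchase records can be folded in as they arrive.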
3. Item-Graph Model and GCP-based Method
- Ideas in the Generalized Conditional Probability-based method
  - According to the definition of the top-N recommendation problem, for any $x \in I - B$, we just need to compute the basket-based conditional probability $P(x \mid B) = Freq(xB) / Freq(B)$. However,
    - Freq(xB) or Freq(B) may not exist, or
    - Freq(xB) or Freq(B) may be too small to be meaningful.
  - The CP-based method considers the sum of 1-item-based conditional probabilities $P(x \mid y)$ instead, where $x \in I - B$, $y \in B$.
  - However, the multi-item-based conditional probabilities may also contribute to the recommendation.
    - E.g., suppose the ranking scores of x and y computed by the CP-based method are equal, and we also know $P(x \mid B) > P(y \mid B)$. Which one should be ranked higher, x or y?
3. Item-Graph Model and GCP-based Method
- The Generalized Conditional Probability (GCP)-based recommendation algorithm
  - The ranking score of item x is defined as the sum of all possible multi-item-based conditional probabilities, that is,
    - $GCP(x \mid B) = \sum_{S \subseteq B} P(x \mid S) = \sum_{S \subseteq B} Freq(xS) / Freq(S)$    (4)
  - However, the number of subsets of B is $2^{|B|}$.
  - Use $GCP_d(x \mid B)$ instead (d = 2 in the following experiments):
    - $GCP_d(x \mid B) = \sum_{S \subseteq B, |S| \le d} P(x \mid S)$    (5)
  - Freq(xS) and Freq(S) can be extracted approximately from the Item-Graph.
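A minimal sketch of the truncated score of Eq. (5) is given below. It rests on several assumptions that the slides do not spell out: the sum runs over non-empty subsets S with |S| <= d, subsets that were never purchased together are skipped, and Freq is counted exactly from the raw baskets rather than approximated from the Item-Graph. The names `freq` and `gcp_d` and the toy data are hypothetical.

```python
from itertools import combinations


def freq(baskets, items):
    """Number of users whose basket contains every item in `items`."""
    items = set(items)
    return sum(1 for b in baskets if items <= b)


def gcp_d(baskets, basket, x, d=2):
    """Sum of P(x | S) over non-empty subsets S of the basket with |S| <= d."""
    score = 0.0
    for size in range(1, d + 1):
        for subset in combinations(sorted(basket), size):
            denom = freq(baskets, subset)
            if denom:  # skip subsets that no user purchased together
                score += freq(baskets, subset + (x,)) / denom
    return score


# Toy data (hypothetical): c and d tie on 1-item probabilities, but only c
# has been bought together with both a and b.
baskets = [
    {"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"},
    {"a", "d"}, {"a", "d"}, {"b", "d"}, {"b", "d"},
]
active = {"a", "b"}
score_c = gcp_d(baskets, active, "c", d=2)
score_d = gcp_d(baskets, active, "d", d=2)
```

With d = 1 the score reduces to the CP-based ranking score and the two candidates tie; with d = 2 the pair term P(c | ab) breaks the tie in favour of c, which is the situation raised in the motivating question.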
3. Item-Graph Model and GCP-based Method
- Extracting Freq(A) from the Item-Graph approximately
  - For an item set A, obtaining the exact Freq(A) from the Item-Graph may not be possible.
  - Extract an approximate Freq(A) from the Item-Graph instead.
    - Find the complete sub-graph of A (denoted by CSG(A)) in the Item-Graph; running time $O(|A|^2)$.
    - $Freq(A) \approx$ minimal weight of the edges in CSG(A).
  - E.g., [figure]
    - for $A = \{a, b\}$, $Freq(A) \approx 3$.
    - for $B = \{a, b, c\}$, $Freq(B) \approx 1$.
    - $P(c \mid ab) = Freq(abc) / Freq(ab) \approx 1 / 3$.
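The approximation can be sketched as a minimum over the pairwise edges of A. Several points are assumptions: the graph is a dictionary keyed by unordered item pairs, a missing edge is treated as weight 0 (no complete sub-graph), single items are not handled because they have no edges, and the edge weights below are hypothetical values chosen only so that the numbers of the example (3, 1, and 1/3) come out. The name `approx_freq` is illustrative.

```python
from itertools import combinations


def approx_freq(graph, items):
    """Approximate Freq(A) by the minimum edge weight in the complete sub-graph of A.

    Returns 0 if some pair in A has no edge. Requires at least two items.
    """
    items = sorted(set(items))
    if len(items) < 2:
        raise ValueError("approx_freq needs at least two items")
    # |A| * (|A| - 1) / 2 edge lookups, i.e. O(|A|^2).
    return min(graph.get(frozenset(pair), 0) for pair in combinations(items, 2))


# Hypothetical edge weights consistent with the example values.
graph = {
    frozenset(("a", "b")): 3,
    frozenset(("a", "c")): 1,
    frozenset(("b", "c")): 2,
}
freq_ab = approx_freq(graph, ["a", "b"])
freq_abc = approx_freq(graph, ["a", "b", "c"])
p_c_given_ab = freq_abc / freq_ab
```

The minimum edge weight is an upper bound on the true frequency, since every user who bought all of A contributes to every edge in CSG(A).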
4. Preliminary Experimental Results
- Dataset
  - MovieLens (http://www.grouplens.org/data)
    - A web-based movie recommender system.
    - Contains multi-valued ratings that indicate how much each user liked a particular movie.
    - Each user has rated at least 20 movies.
  - We treat the ratings as an indication of whether the users have seen the movies (nonzero) or not (zero).

Table 1: The characteristics of the MovieLens dataset
# of Users   # of Items   Density(1)   Average Basket Size
943          1682         6.31%        106.04
(1) Density: the percentage of nonzero entries in the user-item matrix.
4. Preliminary Experimental Results-1
- Evaluation Design
  - Split the dataset into a training set and a test set by
    - randomly selecting one rated movie of each user to be part of the test set,
    - using the remaining rated movies for training.
  - Cosine (COS)-based, CP-based, and GCP-based methods; results averaged over 10 runs.
- Evaluation Metrics
  - Hit-Rate (HR)
    - $HR = (\#\text{ of hits}) / n$    (6)
  - Average Reciprocal Hit-Rate (ARHR)
    - $ARHR = \left( \sum_{i=1}^{h} 1 / p_i \right) / n$    (7)
  - # of hits: the number of items in the test set that were also in the top-N lists.
  - h is the number of hits, which occurred at positions $p_1, p_2, \ldots, p_h$ within the top-N lists (i.e., $1 \le p_i \le N$).
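Both metrics of Eqs. (6) and (7) can be computed in one pass over the held-out items. The sketch assumes one held-out item per user, n equal to the number of users, and 1-based positions; the function name `hit_rate_and_arhr` and the toy lists are hypothetical.

```python
def hit_rate_and_arhr(top_n_lists, held_out):
    """Return (HR, ARHR).

    top_n_lists[u] is the ranked top-N list for user u;
    held_out[u] is the single item hidden from user u.
    """
    n = len(held_out)
    hits = 0
    reciprocal_sum = 0.0
    for user, item in held_out.items():
        ranked = top_n_lists.get(user, [])
        if item in ranked:
            hits += 1
            reciprocal_sum += 1.0 / (ranked.index(item) + 1)  # positions start at 1
    return hits / n, reciprocal_sum / n


# Toy example (hypothetical): two of three users get a hit, at positions 1 and 3.
top_n_lists = {
    "u1": ["x", "y", "z"],
    "u2": ["y", "x", "z"],
    "u3": ["x", "y", "z"],
}
held_out = {"u1": "x", "u2": "z", "u3": "w"}
hr, arhr = hit_rate_and_arhr(top_n_lists, held_out)
```

HR treats every hit equally, whereas ARHR rewards hits near the top of the list: here the hit at position 3 contributes only 1/3.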
4. Preliminary Experimental Results-1
- Performance of Top-N Recommendation Algorithms
  - HR (left): x-axis: top-N items; y-axis: hit-rate of all users.
  - ARHR (right): x-axis: top-N items; y-axis: average reciprocal hit-rate of all users.
  - (For the GCP-based method, d = 2.)
4. Preliminary Experimental Results-2
- Testing the Parameter d in the GCP Method
  - Testing the effect of d (d = 1, 2, 3).
- Evaluation: Online Shopping Simulation
  - Randomly select part of the user records to be the training set.
  - Use the remaining user records for testing.
  - STEP 0: Construct the item-graph based on the training set.
  - STEP 1: For each user in the test set,
    - randomly move one item out of the user's basket and make a recommendation based on the remaining items in the basket,
    - compute the position of this item in the recommendation list,
    - update the item-graph.
  - STEP 2: Compute the HR and ARHR metrics.
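The simulation loop can be sketched as below. This is an outline under loud assumptions rather than the evaluated system: the scorer `recommend` is a simple stand-in that ranks items by total edge weight to the basket (it is not the GCP method), the graph is updated with the user's full basket after each recommendation, the simulated users are the records held out of the training set, and all names (`add_basket`, `recommend`, `simulate`) and the toy data are hypothetical.

```python
import random
from itertools import combinations


def add_basket(graph, basket):
    """Add one basket to the item graph in place."""
    for x, y in combinations(sorted(basket), 2):
        edge = frozenset((x, y))
        graph[edge] = graph.get(edge, 0) + 1


def recommend(graph, basket, n):
    """Stand-in scorer: rank unseen items by total edge weight to the basket."""
    scores = {}
    for edge, weight in graph.items():
        x, y = tuple(edge)
        for inside, outside in ((x, y), (y, x)):
            if inside in basket and outside not in basket:
                scores[outside] = scores.get(outside, 0) + weight
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


def simulate(train, test, n, seed=0):
    """Run STEP 0 to STEP 2 and return (HR, ARHR)."""
    rng = random.Random(seed)
    graph = {}
    for basket in train:                      # STEP 0: build the graph
        add_basket(graph, basket)
    positions = []                            # 1-based hit position, None for a miss
    for basket in test:                       # STEP 1: one simulated shopper at a time
        hidden = rng.choice(sorted(basket))
        ranked = recommend(graph, basket - {hidden}, n)
        positions.append(ranked.index(hidden) + 1 if hidden in ranked else None)
        add_basket(graph, basket)             # assumed: merge the full basket back
    hits = [p for p in positions if p is not None]
    hr = len(hits) / len(test)                # STEP 2: metrics
    arhr = sum(1.0 / p for p in hits) / len(test)
    return hr, arhr


# Toy data (hypothetical).
train = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "c"}]
test = [{"a", "b", "c"}]
hr, arhr = simulate(train, test, n=2)
```

Swapping `recommend` for a GCP-based scorer with a chosen d would reproduce the shape of the experiment, since the rest of the loop does not depend on how items are ranked.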
4. Preliminary Experimental Results-2
- Performance of Top-N Recommendation Algorithms
  - HR (left): x-axis: top-N items; y-axis: hit-rate of all users.
  - ARHR (right): x-axis: top-N items; y-axis: average reciprocal hit-rate of all users.
5. Conclusion and Future Work
- Conclusion
  - Top-N recommendation problem and item-centric algorithms
    - Cosine-based, conditional probability-based.
  - Item-Graph model
    - Visualizes the relationships among items.
    - Easy to update.
  - Generalized Conditional Probability-based top-N recommendation algorithm
    - Item-centric, based on the Item-Graph model.
- Future Work
  - Clustering items and measuring item-item similarities based on the Item-Graph model.
  - Speeding up the GCP method.
References
- [Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative Recommendation. Commun. ACM, 40(3):66-72, 1997.
- [Breese98] J. S. Breese, D. Heckerman and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San Francisco, 1998.
- [Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
- [Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems. Thesis submitted for the Degree of M.S. in Computer Science.
- [Linden03] G. Linden, B. Smith and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003.
- [Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proc. Computer Supported Cooperative Work Conf., pages 175-186, 1994.