Top-N Recommendation Algorithm Based on Item-Graph - PowerPoint PPT Presentation

Slides: 23

Provided by: All5162


1
Top-N Recommendation Algorithm Based on Item-Graph

Allen, Zhenjiang LIN
CSE, CUHK
June 7, 2007

2
Outline

1. Top-N Recommendation Problem
2. Top-N Recommendation Algorithm
3. Item-Graph Model and GCP-based Method
Item-Graph Model
Generalized Conditional Probability (GCP)-based Recommendation Algorithm
4. Preliminary Experimental Results
5. Conclusion and Future Work

3
1. Top-N Recommendation Problem

The Top-N Recommendation Problem
Given the preference information of users, recommend to a given user a set of N items that he might be interested in, based on the items he has already selected.
Example: an e-commerce system such as Amazon.com, with customers as users and products as items.

              Item 1   Item 2   Item 3   ...   Item m
User 1          1        0        1      ...     0
User 2          1        1        0      ...     0
...
User n          0        1        0      ...     1
New User 1      ?        1        ?      ...     ?
User-Item matrix
4
Example: Amazon.com
5
1. Top-N Recommendation Problem

Challenges in E-commerce Systems
Huge amounts of data: millions of users and/or items.
Real-time: the result set must be returned in real time.
Limited preference information for new users.
Volatile user preference information.
Contributions
Propose the Item-Graph model:
simple and incremental;
reflects the relationships among items.
Develop the Generalized Conditional Probability-based top-N recommendation algorithm:
item-centric;
based on the Item-Graph model.

6
2. Top-N Recommendation Algorithm

Two main paradigms
Content-based: recommend items based on the content (textual information) of items.
E.g., the Fab system [Balabanovic97] and the Syskill & Webert system [Pazzani97].
Collaborative Filtering (CF): recommend items by collecting taste information from other users.
Collaboration between users (link information).
More popular than content-based recommendation, since in many domains (such as music and restaurants) it is hard to extract useful features from items.
E.g., Tapestry [Goldberg92], Video Recommender [Hill95], Ringo [Shardanand95], GroupLens [Konstan97], Jester [Goldberg01], Amazon [Linden03].

7
2. Top-N Recommendation Algorithm

CF algorithms classified by how they use the data
Memory-based: make recommendations based on the entire collection of preferences of the users.
No pre-computation is needed, but they suffer from serious scalability problems.
E.g., correlation-based [Resnick94], cosine-based [Breese98].
Model-based: use the collection of user preferences to learn a model, which is then used to make recommendations.
The model is built off-line, so these methods are more scalable.
E.g., cluster models [Ungar98], Bayesian network model [Breese98], association rule mining approach [Lin00].

8
2. Top-N Recommendation Algorithm

CF algorithms classified by the objects they are centered on
User-centric: look for similar (like-minded) users first, then make recommendations.
Similarity between users is relatively dynamic.
Pre-computing user neighborhoods may therefore lead to poor predictions.
Item-centric: look for similar (or related) items first, then make recommendations.
Similarity between items is relatively static.
This enables pre-computing of item-item similarities and is therefore more scalable.
The aim of our work
A model-based, item-centric CF top-N recommendation algorithm.

9
2. Top-N Recommendation Algorithm

Notations
Item set: $I = \{I_1, I_2, \ldots, I_m\}$.
User set: $U = \{U_1, U_2, \ldots, U_n\}$.
User-item matrix: $D$ (of size $n \times m$).
Basket of the active user: $B \subseteq I$.
Similarity score of $x$ and $y$: $\mathrm{sim}(x, y)$.
Formal definition of the top-N recommendation problem
Given a user-item matrix $D$ and a set of items $B$ that have been purchased by the active user, identify an ordered set of items $X$ such that $|X| = N$ and $X \cap B = \emptyset$.
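The formal definition can be made concrete with a small helper that turns per-item scores into the ordered set $X$. This is a minimal Python sketch, not taken from the slides: `top_n` and its arguments are hypothetical names, the scores are assumed to be precomputed by some similarity measure, and ties are broken alphabetically (an arbitrary assumption). If fewer than N unpurchased items exist, it returns fewer than N.

```python
def top_n(scores: dict, basket: set, n: int) -> list:
    """Return an ordered list X of at most n items, disjoint from the basket B."""
    # Drop items the active user already has, so that X and B do not intersect.
    candidates = [(item, s) for item, s in scores.items() if item not in basket]
    # Highest score first; ties broken alphabetically by item name (arbitrary choice).
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in candidates[:n]]

print(top_n({"a": 0.9, "b": 0.5, "c": 0.7, "d": 0.7}, basket={"a"}, n=2))  # ['c', 'd']
```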

10
2. Top-N Recommendation Algorithm

Two classical item-item similarity measures
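Cosine-based (symmetric)
$$\mathrm{sim}(I_i, I_j) = \cos(D_{*,i}, D_{*,j}) \quad (1)$$
where $D_{*,i}$ denotes the $i$-th column of $D$.
Conditional Probability (CP)-based (asymmetric)
$$\mathrm{sim}(I_i, I_j) = P(I_j \mid I_i) = \mathrm{Freq}(\{I_i, I_j\}) / \mathrm{Freq}(\{I_i\}) \quad (2)$$
$\mathrm{Freq}(X)$: the number of customers who have purchased the item set $X$.
The ranking score for item $x$
$$RS(x) = \sum_{b \in B} \mathrm{sim}(b, x) \quad (3)$$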
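Equations (1) to (3) can be illustrated on a binary user-item matrix stored as one set of purchased items per user. A minimal sketch, assuming that representation; the toy baskets and function names are hypothetical. For binary columns, the dot product of $D_{*,i}$ and $D_{*,j}$ equals $\mathrm{Freq}(\{I_i, I_j\})$ and each squared norm equals $\mathrm{Freq}(\{I_i\})$, which is what `cos_sim` relies on.

```python
import math

# Hypothetical toy data: each set is the basket of one user (one row of D).
baskets = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "d"},
]

def freq(itemset, baskets):
    """Freq(X): number of users whose basket contains every item of X."""
    return sum(1 for bk in baskets if itemset <= bk)

def cos_sim(i, j, baskets):
    """Eq. (1): cosine between the binary columns D_{*,i} and D_{*,j}."""
    denom = math.sqrt(freq({i}, baskets) * freq({j}, baskets))
    return freq({i, j}, baskets) / denom if denom else 0.0

def cp_sim(i, j, baskets):
    """Eq. (2): P(j | i) = Freq({i, j}) / Freq({i}); note the asymmetry."""
    fi = freq({i}, baskets)
    return freq({i, j}, baskets) / fi if fi else 0.0

def ranking_score(x, basket, sim, baskets):
    """Eq. (3): RS(x) = sum over b in B of sim(b, x)."""
    return sum(sim(b, x, baskets) for b in basket)
```

With these baskets, `cp_sim("a", "c", baskets)` is 2/3 while `cp_sim("c", "a", baskets)` is 1.0, which shows why Eq. (2) is called asymmetric; `ranking_score("c", {"a", "b"}, cp_sim, baskets)` adds $P(c \mid a) = 2/3$ and $P(c \mid b) = 1/3$.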

11
3. Item-Graph Model and GCP-based Method

Intuitions behind the Item-Graph
The similarity between two items is proportional to the number of times they are co-purchased.
The similarity of item pairs is transmissible.
E.g., [example figure]
Definition of the Item-Graph
Given a dataset $D$ (of size $n \times m$), the Item-Graph is defined as a weighted undirected graph $G = (V, E, W)$, where
$V$ is the item set $I$;
an edge $(x, y) \in E$ if and only if items $x$ and $y$ have been co-purchased;
the weight of edge $(x, y)$ is the number of times items $x$ and $y$ have been co-purchased.
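The definition can be turned into a graph builder in a few lines. A minimal sketch, assuming baskets are sets of item IDs and that edges are stored as a dictionary keyed by canonical pairs $(x, y)$ with $x < y$; `build_item_graph` is a hypothetical name. Items that nobody bought would have to be added to $V$ separately, since they never appear in a basket.

```python
from collections import defaultdict
from itertools import combinations

def build_item_graph(baskets):
    """Return (V, W) for the Item-Graph G = (V, E, W); E is the key set of W."""
    vertices = set().union(*baskets)  # V: every item that appears in some basket
    weights = defaultdict(int)
    for basket in baskets:
        # sorted() gives each unordered pair one canonical key (x, y) with x < y
        for x, y in combinations(sorted(basket), 2):
            weights[(x, y)] += 1  # one more user co-purchased x and y
    return vertices, dict(weights)

V, W = build_item_graph([{"a", "b", "c"}, {"a", "b"}, {"a", "b"}, {"b", "d"}])
print(W[("a", "b")])  # 3
```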

12
3. Item-Graph Model and GCP-based Method

Updating the Item-Graph is easy
Adding a new user's preference information $T$ to the graph needs $O(|T|^2)$ operations, including adding edges and/or increasing edge weights.
E.g., [example figure]
Potential direct applications of the Item-Graph
Clustering the items.
Measuring item-item similarity.
Measuring the importance of items.
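The incremental update can be sketched with the same pair-keyed edge dictionary. A minimal sketch, assuming $T$ is the complete basket of a new user (adding items to an existing user's basket would only need the pairs involving the new items); `add_user` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations

def add_user(weights, new_items):
    """Fold a new user's items T into the Item-Graph in place.

    Each of the |T|(|T|-1)/2 item pairs is touched once, i.e. O(|T|^2) work:
    a missing edge is created, an existing edge gets its weight increased.
    """
    for x, y in combinations(sorted(set(new_items)), 2):
        weights[(x, y)] += 1  # defaultdict(int) starts a missing edge at weight 0

graph = defaultdict(int)
add_user(graph, ["a", "b", "c"])
add_user(graph, ["a", "b"])
print(dict(graph))  # {('a', 'b'): 2, ('a', 'c'): 1, ('b', 'c'): 1}
```

No pass over the existing users is needed, which is what makes the model cheap to keep up to date.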

13
3. Item-Graph Model and GCP-based Method

Ideas in the Generalized Conditional Probability-based method
According to the definition of the top-N recommendation problem, for any $x \in I - B$, we would just need to compute the basket-based conditional probability $P(x \mid B) = \mathrm{Freq}(\{x\} \cup B) / \mathrm{Freq}(B)$. However,
$\mathrm{Freq}(\{x\} \cup B)$ or $\mathrm{Freq}(B)$ may be zero, in which case $P(x \mid B)$ is undefined or trivially zero, or
they may be too small to be statistically meaningful.
The CP-based method instead considers the sum of 1-item-based conditional probabilities $P(x \mid y)$, where $x \in I - B$, $y \in B$.
However, multi-item-based conditional probabilities may also contribute to the recommendation.
E.g., suppose the ranking scores of $x$ and $y$ computed by the CP-based method are equal, and we also know $P(x \mid B) > P(y \mid B)$. Which one should be ranked higher, $x$ or $y$?
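To see why the exact basket-based probability is fragile, it can be computed directly from raw baskets. A minimal sketch with hypothetical toy data and function names; it scans every basket per query, so it is for illustration only.

```python
def freq(itemset, baskets):
    """Freq(X): number of users whose basket contains every item of X."""
    return sum(1 for bk in baskets if itemset <= bk)

def basket_cp(x, B, baskets):
    """Exact P(x | B) = Freq({x} ∪ B) / Freq(B); None when no user bought all of B."""
    f_b = freq(set(B), baskets)
    if f_b == 0:
        return None  # P(x | B) does not exist: B never occurs as a whole
    return freq(set(B) | {x}, baskets) / f_b

# Hypothetical toy data
baskets = [{"a", "b", "x"}, {"a", "c"}, {"b", "c", "x"}]
print(basket_cp("x", {"a", "b"}, baskets))       # 1.0, but backed by a single user
print(basket_cp("x", {"a", "b", "c"}, baskets))  # None
```

Both failure modes listed above show up even on three users: a probability of 1.0 estimated from one basket, and a basket that never occurs as a whole.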

14
3. Item-Graph Model and GCP-based Method

The Generalized Conditional Probability (GCP)-based recommendation algorithm
The ranking score of item $x$ is defined as the sum of all possible multi-item-based conditional probabilities, that is,
$$GCP(x \mid B) = \sum_{S \subseteq B} P(x \mid S) = \sum_{S \subseteq B} \frac{\mathrm{Freq}(\{x\} \cup S)}{\mathrm{Freq}(S)} \quad (4)$$
However, the number of subsets of $B$ is $2^{|B|}$.
Use $GCP_d(x \mid B)$ instead ($d = 2$ in the following experiments):
$$GCP_d(x \mid B) = \sum_{S \subseteq B,\ |S| \le d} P(x \mid S) \quad (5)$$
$\mathrm{Freq}(\{x\} \cup S)$ and $\mathrm{Freq}(S)$ can be extracted approximately from the Item-Graph.
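Equation (5) can be sketched on top of an Item-Graph. This is a minimal Python sketch under stated assumptions, not the authors' implementation: only non-empty subsets $S$ are summed; besides edge weights, the graph keeps a per-item purchase count (an assumption, because $\mathrm{Freq}$ of a single item cannot be read from edge weights alone); $\mathrm{Freq}$ of two or more items is approximated by the minimal edge weight of the complete sub-graph, as on the next slide; and subsets with $\mathrm{Freq}(S) = 0$ are skipped. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(baskets):
    """Edge weights w[(x, y)] with x < y, plus per-item counts (an assumed extra)."""
    w, count = defaultdict(int), defaultdict(int)
    for bk in baskets:
        for item in bk:
            count[item] += 1
        for x, y in combinations(sorted(bk), 2):
            w[(x, y)] += 1
    return w, count

def approx_freq(A, w, count):
    """Freq(A): item count if |A| = 1, else the minimal edge weight of CSG(A)."""
    A = sorted(A)
    if len(A) == 1:
        return count.get(A[0], 0)
    # a missing edge means weight 0, i.e. A was never co-purchased as a whole
    return min(w.get(pair, 0) for pair in combinations(A, 2))

def gcp_d(x, B, w, count, d=2):
    """Eq. (5): sum of P(x | S) over non-empty S ⊆ B with |S| <= d."""
    score = 0.0
    for size in range(1, d + 1):
        for S in combinations(sorted(B), size):
            f_s = approx_freq(S, w, count)
            if f_s > 0:  # skip subsets that never occur; P(x | S) is undefined there
                score += approx_freq((x, *S), w, count) / f_s
    return score

# Hypothetical toy data
w, count = build_graph([{"a", "b", "c"}, {"a", "b"}, {"a", "b"}, {"a", "c"}])
print(gcp_d("c", {"a", "b"}, w, count, d=1))  # P(c|a) + P(c|b) = 2/4 + 1/3
print(gcp_d("c", {"a", "b"}, w, count, d=2))  # adds P(c|ab) = min(3, 2, 1) / 3 = 1/3
```

With $d = 1$ the sum reduces to $\sum_{y \in B} P(x \mid y)$, the CP-based score of Eq. (3) with Eq. (2), since pair frequencies are exact edge weights; larger $d$ adds the multi-item terms.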

15
3. Item-Graph Model and GCP-based Method

Extracting Freq(A) from the Item-Graph approximately
For an item set $A$, obtaining the exact $\mathrm{Freq}(A)$ from the Item-Graph may not be possible.
Extract an approximate $\mathrm{Freq}(A)$ from the Item-Graph instead:
Find the complete sub-graph of $A$ (denoted $CSG(A)$) in the Item-Graph; running time $O(|A|^2)$.
$\mathrm{Freq}(A) \approx$ the minimal edge weight in $CSG(A)$.
E.g., [example figure]
for $A = \{a, b\}$, $\mathrm{Freq}(A) = 3$;
for $B = \{a, b, c\}$, $\mathrm{Freq}(B) = 1$;
$P(c \mid ab) = \mathrm{Freq}(\{a, b, c\}) / \mathrm{Freq}(\{a, b\}) = 1/3$.
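The extraction rule can be checked against the numbers quoted in the example. The slide's graph is not in the transcript, so the edge weights below are hypothetical, chosen only so that they reproduce $\mathrm{Freq}(\{a, b\}) = 3$ and $\mathrm{Freq}(\{a, b, c\}) = 1$; `approx_freq` is likewise a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical edge weights (the original example figure is not preserved).
w = {("a", "b"): 3, ("a", "c"): 2, ("b", "c"): 1}

def approx_freq(A, w):
    """Freq(A) ≈ min edge weight over CSG(A); visits |A|(|A|-1)/2 pairs, i.e. O(|A|^2)."""
    pairs = list(combinations(sorted(A), 2))
    if not pairs:
        raise ValueError("CSG(A) needs at least two items")
    # a missing edge means A was never co-purchased as a whole, so Freq(A) = 0
    return min(w.get(p, 0) for p in pairs)

f_ab = approx_freq({"a", "b"}, w)        # 3
f_abc = approx_freq({"a", "b", "c"}, w)  # min(3, 2, 1) = 1
p_c_given_ab = Fraction(f_abc, f_ab)     # 1/3
```

The approximation never underestimates: every user who bought all of $A$ also contributes to each edge of $CSG(A)$, so the true $\mathrm{Freq}(A)$ is at most the minimal edge weight.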

16
4. Preliminary Experimental Results

Dataset
MovieLens (http://www.grouplens.org/data)
A web-based movie recommender system.
Contains multi-valued ratings that indicate how much each user liked a particular movie.
Each user has rated at least 20 movies.
We treat each rating only as an indication of whether the user has seen the movie (nonzero) or not (zero).

Table 1. The characteristics of the MovieLens dataset
# of Users   # of Items   Density¹   Average Basket Size
943          1682         6.31%      106.04
¹Density: the percentage of nonzero entries in the user-item matrix.
17
4. Preliminary Experimental Results-1

Evaluation Design
Split the dataset into a training set and a test set by randomly selecting one rated movie of each user for the test set; the remaining rated movies are used for training.
Methods compared: cosine (COS)-based, CP-based and GCP-based; results are averaged over 10 runs.
Evaluation Metrics
Hit-Rate (HR)
$$HR = \frac{\text{\#hits}}{n} \quad (6)$$
Average Reciprocal Hit-Rate (ARHR)
$$ARHR = \frac{\sum_{i=1}^{h} 1/p_i}{n} \quad (7)$$
#hits: the number of items in the test set that also appear in the top-N lists.
$h$ is the number of hits, which occurred at positions $p_1, p_2, \ldots, p_h$ within the top-N lists (i.e., $1 \le p_i \le N$).
$n$: the number of users (one test item per user).
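Equations (6) and (7) translate directly into code. A minimal sketch, assuming one held-out item per user as in the evaluation design above; `hr_arhr` and the dictionary layout are hypothetical.

```python
def hr_arhr(top_n_lists, held_out):
    """Return (HR, ARHR) per Eq. (6) and (7).

    top_n_lists[u]: user u's ordered top-N list; held_out[u]: u's single test item.
    n, the number of users, is taken as len(held_out).
    """
    n = len(held_out)
    hits, reciprocal_sum = 0, 0.0
    for user, item in held_out.items():
        ranked = top_n_lists.get(user, [])
        if item in ranked:
            hits += 1
            reciprocal_sum += 1.0 / (ranked.index(item) + 1)  # position p_i is 1-based
    return hits / n, reciprocal_sum / n

lists = {"u1": ["x", "y", "z"], "u2": ["y", "x", "w"], "u3": ["a", "b", "c"]}
held = {"u1": "x", "u2": "x", "u3": "q"}
print(hr_arhr(lists, held))  # 2 hits in 3 users; ARHR = (1/1 + 1/2) / 3 = 0.5
```

HR only counts whether the test item appears, while ARHR also rewards placing it near the top of the list.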

18
4. Preliminary Experimental Results-1

Performance of Top-N Recommendation Algorithms
HR (left): x-axis, number of top-N items; y-axis, hit-rate over all users.
ARHR (right): x-axis, number of top-N items; y-axis, average reciprocal hit-rate over all users.
(For the GCP-based method, $d = 2$.)

19
4. Preliminary Experimental Results-2

Testing the Parameter d in the GCP Method
Testing the effect of $d$ ($d = 1, 2, 3$).
Evaluation: online shopping simulation
Randomly select part of the user records as the training set;
use the remaining user records for testing.
STEP 0: construct the Item-Graph from the training set.
STEP 1: for each user in the test set,
randomly move one item out of the user's basket and make a recommendation based on the remaining items in the basket;
compute the rank of this item in the recommendation list;
update the Item-Graph.
STEP 2: compute the HR and ARHR metrics.
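The online shopping simulation can be sketched end to end. This is a minimal sketch under assumptions that go beyond the slide: to stay short it ranks with the CP-based score (GCP with $d = 1$) rather than $d = 2$ or $3$, it keeps per-item counts next to the edge weights, it treats each user record as a set of item IDs, and all names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations

def add_basket(w, count, basket):
    """Incremental Item-Graph update: O(|T|^2) edge updates plus per-item counts."""
    for item in basket:
        count[item] += 1
    for x, y in combinations(sorted(basket), 2):
        w[(x, y)] += 1

def cp_score(x, B, w, count):
    """CP-based score: sum over y in B of P(x | y) = w(x, y) / Freq({y})."""
    return sum(w.get(tuple(sorted((x, y))), 0) / count[y]
               for y in B if count.get(y, 0) > 0)

def simulate(train, test, items, top_n=10, seed=0):
    """Return (HR, ARHR) of the online shopping simulation; baskets are sets of item IDs."""
    rng = random.Random(seed)
    w, count = defaultdict(int), defaultdict(int)
    for basket in train:                   # STEP 0: build the graph from training users
        add_basket(w, count, basket)
    hits, reciprocal_sum = 0, 0.0
    for basket in test:                    # STEP 1: one shopping session per test user
        held = rng.choice(sorted(basket))  # move one random item out of the basket
        rest = basket - {held}
        ranked = sorted((i for i in items if i not in rest),
                        key=lambda i: (-cp_score(i, rest, w, count), i))[:top_n]
        if held in ranked:                 # record the held-out item's position
            hits += 1
            reciprocal_sum += 1.0 / (ranked.index(held) + 1)
        add_basket(w, count, basket)       # then update the graph with the full record
    return hits / len(test), reciprocal_sum / len(test)  # STEP 2: HR and ARHR
```

For example, `simulate([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}], [{"a", "b"}], {"a", "b", "c", "d"}, top_n=2)` returns `(1.0, 1.0)`: whichever of the two items is held out ranks first. Updating the graph after each session is what distinguishes this setup from the static split of the first experiment.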

20
4. Preliminary Experimental Results-2

Performance of Top-N Recommendation Algorithms
HR (left): x-axis, number of top-N items; y-axis, hit-rate over all users.
ARHR (right): x-axis, number of top-N items; y-axis, average reciprocal hit-rate over all users.

21
5. Conclusion and Future Work

Conclusion
Top-N recommendation problem and item-centric algorithms
Cosine-based, conditional probability-based
Item-Graph model
Visualizes the relationships among items.
Easy to update.
Generalized Conditional Probability-based top-N recommendation algorithm
Item-centric, based on the Item-Graph model
Future Work
Clustering items and measuring item-item similarities based on the Item-Graph model
Speeding up the GCP method.

22
References

[Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative Recommendation. Commun. ACM, 40(3):66-72, 1997.
[Breese98] J. S. Breese, D. Heckerman and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San Francisco, 1998.
[Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
[Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems. Thesis submitted for the degree of M.S. in Computer Science.
[Linden03] G. Linden, B. Smith and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003.
[Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proc. Computer Supported Cooperative Work Conf., pages 175-186, 1994.