Title: La Inteligencia Artificial en la calificaci
1A clustering algorithm to find groups with
homogeneous preferences
J. Díez, J.J. del Coz, O. Luaces, A. Bahamonde
Centro de Inteligencia Artificial, Universidad de Oviedo at Gijón
www.aic.uniovi.es
Workshop on Implicit Measures of User Interests and Preferences
2The framework to learn preferences
- People tend to rate their preferences in a relative way
- Which middle circle do you think is larger?
3The framework to learn people's preferences
- Regression is not a good idea
- We will use training sets of preference judgments
- pairs of vectors (v, u) where someone expresses that he or she prefers v to u
Preference judgments $\{v_i > u_i : i \in I_{Me}\}$ -> SVM (linear) -> $f_{Me}$
$f_{Me}$ is a linear ranking function: $f_{Me}(v_i) > f_{Me}(u_i)$ whenever $v_i$ is preferable to $u_i$
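A minimal sketch of how such a ranking function might be learned, assuming the usual reduction of each judgment (v, u) to the difference vector v - u. Plain sub-gradient descent on a hinge loss stands in for the linear SVM solver here; the function names, hyperparameters, and toy data are illustrative and do not come from the slides.

```python
import random

def learn_ranking_function(pairs, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Learn weights w such that dot(w, v) > dot(w, u) for each judgment (v, u).

    Assumption: sub-gradient descent on max(0, 1 - w.(v - u)) with L2 shrinkage,
    used as a simple stand-in for a linear SVM.
    """
    rng = random.Random(seed)
    dim = len(pairs[0][0])
    w = [0.0] * dim
    # Each judgment "v is preferred to u" becomes the difference vector v - u.
    diffs = [[a - b for a, b in zip(v, u)] for v, u in pairs]
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            # L2 regularization: shrink the weights a little at every step.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # The judgment is violated or inside the margin: move w toward d.
                w = [wi + lr * di for wi, di in zip(w, d)]
    return w

def f(w, x):
    """Linear ranking function f(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical toy judgments: this person favours a large first attribute.
pairs = [
    ([3.0, 1.0], [1.0, 1.0]),
    ([2.0, 0.0], [0.0, 2.0]),
    ([5.0, 4.0], [4.0, 5.0]),
]
w_me = learn_ranking_function(pairs)
all_respected = all(f(w_me, v) > f(w_me, u) for v, u in pairs)
```

On this toy set `all_respected` is true: the learned function ranks every preferred vector above its counterpart.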
4The framework to learn people's preferences
Preference judgments $\{v_i > u_i : i \in I_{Me}\}$ -> SVM (linear) -> $f_{Me}$
How useful is this ranking function $f_{Me}$?
- Accuracy, generalization error
- Training examples
- Attributes
- reliable, general,
5The problem addressed
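- To improve ranking functions, we present a new algorithm for clustering preference criteria
- The preference judgments $\{v_i^2 > u_i^2\}$ and $\{v_i^3 > u_i^3\}$ are merged into a single ranking function $f_{2 \cup 3}$ if $f_{2 \cup 3}$ is better than $f_2$ and $f_3$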
6Applications
- Information retrieval
- Optimizing Search Engines Using Clickthrough Data [Joachims, 2002]
- Personalized recommenders
- Adaptive Route Advisor [Fiechter, Rogers, 2000]
- Analysis of sensory data
- Used to test the quality (or the acceptability) of market products
- Panels of experts and consumers
7Baseline approaches
- If $rating_i \approx rating_j$ then merge $P_i$ with $P_j$
- Where $\approx$ uses correlation or cosine
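The baseline can be sketched as follows, assuming each person's ratings are stored as a mapping from object to rating and that similarity is computed only over the objects both people rated. The threshold value and all names are hypothetical; the slides give no merge threshold.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def pearson(a, b):
    """Pearson correlation: the cosine of the mean-centred vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - ma for x in a], [y - mb for y in b])

def co_rated(r_i, r_j):
    """Keep only the objects that both people have rated."""
    common = sorted(set(r_i) & set(r_j))
    return [r_i[k] for k in common], [r_j[k] for k in common]

def should_merge(r_i, r_j, threshold=0.9, similarity=pearson):
    """Baseline rule: merge P_i with P_j when their ratings are similar enough."""
    a, b = co_rated(r_i, r_j)
    if len(a) < 2:
        return False  # too few shared objects to compare
    return similarity(a, b) >= threshold

# Hypothetical ratings on a 1-5 scale.
alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 3, "m3": 2, "m4": 5}
carol = {"m1": 1, "m2": 3, "m3": 5}
```

Here `should_merge(alice, bob)` holds because their co-rated movies are perfectly correlated, while `should_merge(alice, carol)` does not. Note that the comparison silently drops `m4`, which only Bob has rated.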
8Weaknesses of baseline approaches
- Correlation or cosine were devised for prediction purposes in collaborative filtering, and they are not easily extendable to clustering
- Not all people have seen the same objects
- Two samples of preferences of the same person would not be considered homogeneous
- Rating is not a good idea
9Our approach: a clustering algorithm
- Ranking functions are linear maps: $f(x) = w \cdot x$
- Then weight vectors $w$ codify the rationale for these preferences
- Therefore, we will try to merge data sets with similar (cosine) ranking functions (= weight vectors)
- The merge will be accepted if the joint ranking function improves the quality of the individual functions
10Our approach: a clustering algorithm
A set of clusters ClusterPreferencesCriteria(a list of preference judgments (PJ_i : i = 1, ..., N))
  Clusters = ∅
  for each i = 1 to N
    w_i = Learn a ranking hyperplane from (PJ_i)
    Clusters = Clusters ∪ {(PJ_i, w_i)}
  repeat
    let (PJ1, w1) and (PJ2, w2) be the clusters with most similar w1 and w2
    w = Learn a ranking hyperplane from (PJ1 ∪ PJ2)
    if (quality of w > quality of w1 and quality of w2) then
      replace the clusters (PJ1, w1) and (PJ2, w2) by (PJ1 ∪ PJ2, w) in Clusters
  until (no new merges can be tested)
  return Clusters
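The loop above can be sketched in a few lines. Everything below is a set of assumptions, not the authors' implementation: a perceptron-style learner stands in for the ranking SVM, quality is one minus a Wilson upper bound on the error rate measured on the cluster's own judgments (the slides use separate verification sets), and a merge is accepted only when the joint function beats both individual ones.

```python
import math
from itertools import combinations

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def learn(pjs, epochs=100, lr=0.1):
    """Stand-in learner: perceptron updates on the difference vectors v - u."""
    w = [0.0] * len(pjs[0][0])
    for _ in range(epochs):
        for v, u in pjs:
            d = [a - b for a, b in zip(v, u)]
            if dot(w, d) <= 0:
                w = [wi + lr * di for wi, di in zip(w, d)]
    return w

def quality(w, pjs, z=1.96):
    """Stand-in quality: 1 - Wilson upper bound on the error rate of w on pjs."""
    n = len(pjs)
    p = sum(1 for v, u in pjs if dot(w, v) <= dot(w, u)) / n
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return 1.0 - (p + z * z / (2 * n) + spread) / (1 + z * z / n)

def cluster_preference_criteria(pj_list):
    clusters = {i: (pjs, learn(pjs)) for i, pjs in enumerate(pj_list)}
    next_id, rejected = len(pj_list), set()
    while True:
        # Candidate merges: pairs of clusters that have not been rejected yet.
        untested = [p for p in combinations(sorted(clusters), 2) if p not in rejected]
        if not untested:
            return list(clusters.values())
        # Try the pair whose weight vectors are most similar (cosine).
        a, b = max(untested, key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]))
        (pj1, w1), (pj2, w2) = clusters[a], clusters[b]
        merged = pj1 + pj2
        w = learn(merged)
        if quality(w, merged) > max(quality(w1, pj1), quality(w2, pj2)):
            del clusters[a], clusters[b]
            clusters[next_id] = (merged, w)
            next_id += 1
        else:
            rejected.add((a, b))

# Hypothetical groups: the first two favour attribute 0, the third attribute 1.
groups = [
    [([2.0, 0.0], [0.0, 2.0]), ([3.0, 1.0], [1.0, 3.0])],
    [([1.0, 0.0], [0.0, 1.0]), ([4.0, 2.0], [2.0, 3.0])],
    [([0.0, 2.0], [2.0, 0.0]), ([1.0, 3.0], [3.0, 1.0])],
]
clusters = cluster_preference_criteria(groups)
```

On this toy input the first two groups are merged, because the joint function makes no errors on twice as many judgments, and the third group stays on its own, because any joint function must violate some of its judgments.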
11To estimate quality of ranking functions
- The quality of the ranking functions depends on
- Accuracy, generalization errors
- Number of Training examples
- Number of Attributes
12To estimate quality of ranking functions
- If we have enough training data
- divide them into training (itself) and verification sets
- compute the confidence interval [L, R] of the probability of error when we apply each ranking function to the corresponding verification set
- quality is 1 - R: the estimated proportion of successful generalizations in the pessimistic case
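A small sketch of this quality estimate. The slides do not say which confidence interval is used, so the Wilson score interval at roughly 95% confidence (z = 1.96) is an assumption here, as are the function names.

```python
import math

def error_interval(errors, n, z=1.96):
    """Wilson score interval [L, R] for the true error probability,
    given `errors` misordered judgments out of `n` verification judgments."""
    p = errors / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def quality(errors, n, z=1.96):
    """Quality = 1 - R: the success rate in the pessimistic case."""
    _, upper = error_interval(errors, n, z)
    return 1.0 - upper

# Same observed error rate (5%), different verification set sizes.
q_small = quality(5, 100)
q_large = quality(50, 1000)
```

With the same observed error rate, the larger verification set gives the higher quality, since its interval is narrower. This is one way a merged data set can score better than its parts.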
13To estimate quality of ranking functions
- If we don't have much training data
- Xi-alpha estimator [Joachims, 2000] (texts)
- Cross-validation
- Other
14Experimental results
We used a collection of preference judgments
taken from EachMovie to simulate reasonable
situations in the study of preferences of groups
of people
- People: the 100 spectators with the most ratings
- Objects: the ratings of 504 movies (60% train, 20% verification, 20% test) given by either 89 or 808 other spectators
- Training sets: preference judgments
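One plausible way to turn a person's ratings into preference judgments is sketched below. It assumes every pair of movies the person rated differently yields one judgment, and that each movie is described by the vector of ratings it received from the other spectators. The slides do not spell out the exact construction, so the function name and the toy data are hypothetical.

```python
from itertools import combinations

def preference_judgments(person_ratings, movie_vectors):
    """Build (v, u) pairs where v describes the movie the person rated higher.

    person_ratings: dict movie -> rating given by the person under study
    movie_vectors:  dict movie -> list of ratings given by other spectators
    """
    pairs = []
    for a, b in combinations(sorted(person_ratings), 2):
        ra, rb = person_ratings[a], person_ratings[b]
        if ra == rb:
            continue  # a tie expresses no preference
        better, worse = (a, b) if ra > rb else (b, a)
        pairs.append((movie_vectors[better], movie_vectors[worse]))
    return pairs

# Hypothetical data: three movies, each described by two other spectators' ratings.
ratings = {"A": 5, "B": 3, "C": 3}
vectors = {"A": [4, 5], "B": [2, 3], "C": [3, 1]}
judgments = preference_judgments(ratings, vectors)
```

In this example `judgments` holds two pairs, A over B and A over C. The tie between B and C produces no judgment.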
15Experimental results
[Figure: results with movies described by the ratings of 808 spectators and of 89 spectators]