Title: Evaluating/Optimizing Search Engines using Clickthrough Data
1 Evaluating/Optimizing Search Engines using
Clickthrough Data
- Shui-Lung Chuang
- Mar. 28, 2003
2 The Reference Papers
- Evaluating Retrieval Performance using Clickthrough Data
- Technical Report, Cornell U., 2002
- Optimizing Search Engines using Clickthrough Data
- KDD-2002
- Thorsten Joachims
- Department of Computer Science
- Cornell University
3 About the Author
- Thorsten Joachims
- Now Assistant Professor, Dept. of CS, Cornell University
- 2001 Finished his Ph.D. (received a Diplom in 1997)
- 2000-01 Postdoc (Knowledge Discovery Team)
- 1994-96 Visiting scholar of Prof. Tom Mitchell at CMU
- As far as I know, the first to apply support vector machines (SVMs) to text categorization (ECML-1998)
- The author of SVMlight, a major reason for the wide popularity of SVM
- available at http://svmlight.joachims.org/
4 Outline
- Things about clickthrough data
- Evaluating search engines using clickthrough data
- Optimizing search engines using clickthrough data
5 Search Engine Logs
(Diagram: a user wondering "Where is the Web page of ICDM 2002?" submits query terms such as ICDM, ICDM02, ICDM 2002 to the search engine; the query terms and the clicked-through URLs, e.g., http://kis.maebashi-it.ac.jp/icdm02/, http://www.wi-lab.com/icdm02, http://www.computer.org/.../pr01754.htm, are recorded in the logs)
6 Clickthrough Data
- Clickthrough data can be thought of as triplets (q,r,c)
- the query q
- the ranking r presented to the user
- the set c of links the user clicked on
- E.g., q = "support vector machine", r = the presented ranking of links, c = {link1, link3, link7}
- Clickthrough data provide users' feedback for relevance judgment
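As a minimal illustration of this triplet (field names and types are my own, not from the paper), a clickthrough record could be represented as:

```python
# A minimal sketch of a clickthrough record (q, r, c); field names are illustrative.
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class ClickthroughRecord:
    query: str                                          # q: the query
    ranking: List[str] = field(default_factory=list)    # r: links in presented order
    clicks: Set[str] = field(default_factory=set)       # c: links the user clicked on

rec = ClickthroughRecord(
    query="support vector machine",
    ranking=["link1", "link2", "link3", "link4", "link5", "link6", "link7"],
    clicks={"link1", "link3", "link7"},
)
print(rec.clicks)
```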
7 A Mechanism to Record Clickthrough Data
- query-log: the query words and the presented ranking
- click-log: query-ID and clicked URL (recorded via a proxy server); see the sketch below
- This process should be made transparent to the user
- This process should not influence system performance
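A minimal sketch of such a recording mechanism, assuming a redirect-style click endpoint of my own design (the paper only requires that queries and clicks be logged transparently, e.g., via a proxy): result links point at /click?qid=...&url=..., and the handler appends the click to a log before redirecting the user to the target page.

```python
# Hypothetical logging proxy; endpoint and file names are my own, not from the paper.
import csv, time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

QUERY_LOG = "query_log.csv"   # query-log: timestamp, query-ID, query words, presented ranking
CLICK_LOG = "click_log.csv"   # click-log: timestamp, query-ID, clicked URL

def log_query(qid, query, ranking):
    """Called by the search front-end: record the query and the ranking shown to the user."""
    with open(QUERY_LOG, "a", newline="") as f:
        csv.writer(f).writerow([time.time(), qid, query, " ".join(ranking)])

class ClickLogger(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path != "/click":
            self.send_response(404); self.end_headers(); return
        params = parse_qs(parsed.query)
        qid = params.get("qid", ["?"])[0]
        url = params.get("url", [""])[0]
        with open(CLICK_LOG, "a", newline="") as f:
            csv.writer(f).writerow([time.time(), qid, url])
        # Redirect so the detour stays transparent to the user.
        self.send_response(302)
        self.send_header("Location", url)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), ClickLogger).serve_forever()
```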
8 Some Works on Search-Engine Logs
- Real-world search engine
- Direct Hit (http://www.directhit.com)
- Analyzing search vocabularies/subjects/topics
- C. Silverstein, M. Henzinger, H. Marais, and M. Moricz. Analysis of a very large AltaVista query log. Technical Report, Digital Systems Research Center, 1998.
- N. Ross and D. Wolfram. End user searching on the Internet: An analysis of term pair topics submitted to the Excite search engine. JASIS-2000.
- H.-T. Pu, S.-L. Chuang, and C. Yang. Subject categorization of query terms for exploring Web users' search interests. JASIS-2002, 53(8).
- S.-L. Chuang and L.-F. Chien. Enriching Web taxonomies through subject categorization of query terms from search engine logs. DSS-2003.
- Clustering query terms
- D. Beeferman and A. Berger. Agglomerative clustering of a search engine query log. KDD-2000.
- J.-R. Wen, J.-Y. Nie, and H.-J. Zhang. Clustering user queries of a search engine. WWW-2001, ACM TOIS-2002.
- S.-L. Chuang and L.-F. Chien. Towards automatic generation of query taxonomy: A hierarchical query clustering approach. ICDM-2002.
- Further ...?
9 Outline
- Things about clickthrough data
- Evaluating search engines using clickthrough data
- Experiment setup for getting unbiased clickthrough data
- Theoretical analysis
- Optimizing search engines using clickthrough data
10 The Problem
- A problem of statistical inference / hypothesis testing
- Users are only rarely willing to give explicit feedback
- Clickthrough data seem to provide users' implicit feedback. Are they suitable for relevance judgment? I.e., Click → Relevance?
Which search engine provides better results, Google or MSNSearch?
11 EXP1: Regular Clickthrough Data
- Clicks heavily depend on the ranking
(presentation bias)
12 EXP2: Unbiased Clickthrough Data
- The criteria to get unbiased clickthrough data for comparing search engines
- Blind test: The interface should hide the random variables underlying the hypothesis test to avoid biasing the users' responses
- Click → preference: The interface should be designed so that a click demonstrates a particular judgment of the user
- Low usability impact: The interface should not substantially lower the productivity of the user
13 EXP2: Unbiased Clickthrough Data
- The top l links of the combined ranking contain the top ka and top kb links from rankings A and B, with |ka - kb| ≤ 1.
14 Computing the Combined Ranking
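A sketch of one way to compute such a combined ranking (the exact tie-breaking rule here is my assumption; the paper gives its own combination algorithm): alternately take the next unseen link from whichever ranking has contributed fewer links so far, starting with A.

```python
# Sketch of balanced interleaving: every prefix of the combined list contains the
# top ka links of A and the top kb links of B with |ka - kb| <= 1; duplicates are
# shown only once.
def combine_rankings(a, b):
    combined, ka, kb = [], 0, 0
    while ka < len(a) or kb < len(b):
        # Advance the ranking that is currently behind (A first on ties).
        if ka <= kb and ka < len(a):
            link = a[ka]; ka += 1
        elif kb < len(b):
            link = b[kb]; kb += 1
        else:
            link = a[ka]; ka += 1
        if link not in combined:
            combined.append(link)
    return combined

# The user sees one blended list and cannot tell which engine contributed which link.
print(combine_rankings(["a1", "a2", "a3"], ["a2", "b1", "b2"]))
# -> ['a1', 'a2', 'b1', 'a3', 'b2']
```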
15 Experiment
- Google vs. MSNSearch
- Experiment data gathered from 3 users, 9/25-10/18, 2001
- 180 queries and 211 clicks (1.17 clicks/query, 2.31 words/query)
- The top k links for each query are manually judged for relevance
- Questions to examine
- Does the clickthrough evaluation agree with the manual relevance judgments?
- Click → Preference?
- Is the experiment design a blind test?
16 Theoretical Analysis
17 Theoretical Analysis: Assumption 1
- Intuitively, this assumption formalizes that users click on a relevant link more frequently than on a non-relevant link, by a certain margin.
18 Theoretical Analysis: Assumption 2
- Intuitively, the assumption states that the only reason a user clicks on a particular link is the relevance of the link, not other influencing factors connected with a particular retrieval function.
19 Statistical Hypothesis Test
- Two-tailed paired t-test
- Binomial sign test
Please refer to the paper if you are interested
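As a rough illustration (my own code, not from the paper), the binomial sign test can be computed directly: for each query, note which engine received more clicks, drop ties, and compute a two-sided p-value under the null hypothesis that both outcomes are equally likely.

```python
# Two-sided binomial sign test for comparing two retrieval functions.
from math import comb

def sign_test_p_value(wins_a, wins_b):
    """wins_a / wins_b: number of queries on which engine A / B got more clicks (ties dropped)."""
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Using the 77 vs. 63 split from the next slide as example counts:
print(sign_test_p_value(77, 63))   # about 0.27, i.e. not significant at the 5% level
```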
20 Clickthrough vs. Relevance
- Google vs. MSNSearch: 77 vs. 63
- Google vs. Default: 85 vs. 18
- MSNSearch vs. Default: 91 vs. 12
21 Is Assumption 1 Valid?
- Assumption 1: Users click on more relevant links than on non-relevant links on average
22 Is Assumption 2 Valid?
23 Outline
- Things about clickthrough data
- Evaluating search engines using clickthrough data
- Experiment setup for getting unbiased clickthrough data
- Theoretical analysis and experiments
- Optimizing search engines using clickthrough data
- Relevance feedback by clickthrough data
- A framework for learning retrieval functions
- An SVM algorithm for learning ranking functions
- Experiments
24 An Illustrative Scenario
25 Click ≠ Absolute Relevance Judgment
- Clickthrough data as a triplet (q,r,c)
- The presented ranking r depends on the query q, as determined by the retrieval function of the search engine
- The set c of clicked-on links depends on both the query q and the presented ranking r
- E.g., highly ranked links have an advantage in getting clicked
- A click on a particular link cannot be seen as an absolute relevance judgment
26 Click → Relative Preference Judgment
- Assuming that the user scanned the ranking from top to bottom
- E.g., c = {link1, link3, link7} (r* = the ranking preferred by the user)
link3 <r* link2
link7 <r* link2, link7 <r* link4, link7 <r* link5, link7 <r* link6
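A small sketch of this extraction rule (my own formulation): under the top-to-bottom scanning assumption, a clicked link is preferred over every non-clicked link ranked above it.

```python
# Turn a click set into relative preference pairs (winner, loser).
def preference_pairs(ranking, clicked):
    pairs = []
    for i, link in enumerate(ranking):
        if link in clicked:
            # A clicked link is preferred over all earlier, non-clicked links.
            pairs += [(link, above) for above in ranking[:i] if above not in clicked]
    return pairs

r = ["link1", "link2", "link3", "link4", "link5", "link6", "link7"]
c = {"link1", "link3", "link7"}
print(preference_pairs(r, c))
# [('link3', 'link2'), ('link7', 'link2'), ('link7', 'link4'),
#  ('link7', 'link5'), ('link7', 'link6')]
```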
27 A Framework for Learning Retrieval Fun.
- r* is the optimal ordering, rf(q) is the ordering produced by retrieval function f on query q; r* and rf(q) are binary relations over D×D, where D = {d1, ..., dm} is the document collection
- e.g., if di <r* dj, then (di,dj) ∈ r*
- Kendall's τ (vs. average precision)
- For a fixed but unknown distribution Pr(q,r*), the goal is to learn a retrieval function with the maximum expected Kendall's τ
- τ(ra, rb) = (P - Q) / (P + Q), where P = number of concordant pairs and Q = number of discordant pairs
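A small sketch of Kendall's τ as defined here, counting concordant (P) and discordant (Q) pairs between two orderings of the same documents:

```python
# Kendall's tau between two orderings given as lists from best to worst.
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    pos_a = {d: i for i, d in enumerate(rank_a)}
    pos_b = {d: i for i, d in enumerate(rank_b)}
    p = q = 0
    for di, dj in combinations(rank_a, 2):
        # Concordant if both orderings place di and dj in the same relative order.
        if (pos_a[di] - pos_a[dj]) * (pos_b[di] - pos_b[dj]) > 0:
            p += 1
        else:
            q += 1
    return (p - q) / (p + q)

print(kendall_tau(["d1", "d2", "d3", "d4"], ["d2", "d1", "d3", "d4"]))  # (5 - 1) / 6 ≈ 0.67
```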
28 An SVM Algo. for Learning Ranking Fun.
- Given an independently and identically distributed training sample S of size n containing queries q with their target rankings r*
- The learner selects a ranking function f from a family of ranking functions F that maximizes the empirical τ
29 The Ranking SVM Algorithm
- Consider the class of linear ranking functions: (di,dj) ∈ fw(q) ⇔ w·Φ(q,di) > w·Φ(q,dj), where w is a weight vector that is adjusted by learning and Φ(q,d) is a mapping onto features describing the match between q and d
- The goal is to find a w so that the maximum number of the following inequalities is fulfilled: w·Φ(q,di) > w·Φ(q,dj) for all (di,dj) ∈ r*
30 The Categorization SVM
- Learning a hypothesis h(d) = sign(w·Φ(d) + b) that separates the positive and negative examples with maximum margin (figure: example documents separated by the hyperplane defined by w)
- Learning → Optimization problem
- Minimize (1/2) w·w + C Σi ξi
- so that yi (w·Φ(di) + b) ≥ 1 - ξi and ξi ≥ 0, where yi = +1 (-1) if di is in class (+) (class (-))
31 The Ranking SVM Algorithm (cont.)
- Optimization Problem 1 is convex and has no local optima. By rearranging the constraints as w·(Φ(q,di) - Φ(q,dj)) ≥ 1 - ξij, it becomes
- A classification SVM on the pairwise difference vectors Φ(q,di) - Φ(q,dj)
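A rough sketch of this reduction (my own illustration using scikit-learn's LinearSVC rather than the SVMlight implementation used in the paper): each preference "di preferred over dj" contributes the difference vector Φ(q,di) - Φ(q,dj) as a positive training example for a linear classification SVM, and the learned w ranks documents by w·Φ(q,d).

```python
# Ranking SVM via a classification SVM on pairwise difference vectors (sketch).
import numpy as np
from sklearn.svm import LinearSVC   # any linear SVM solver would do

def train_ranking_svm(preference_pairs):
    """preference_pairs: list of (phi_winner, phi_loser) feature-vector pairs."""
    X, y = [], []
    for phi_i, phi_j in preference_pairs:
        X.append(phi_i - phi_j); y.append(+1)   # winner minus loser -> positive example
        X.append(phi_j - phi_i); y.append(-1)   # mirrored pair keeps the classes balanced
    clf = LinearSVC(fit_intercept=False)        # only the direction of w matters for ranking
    clf.fit(np.array(X), np.array(y))
    return clf.coef_.ravel()                    # the learned weight vector w

# Toy example with 2-d feature vectors phi(q, d):
pairs = [(np.array([0.9, 0.2]), np.array([0.1, 0.8])),
         (np.array([0.8, 0.1]), np.array([0.3, 0.7]))]
w = train_ranking_svm(pairs)
print(w @ np.array([0.9, 0.2]) > w @ np.array([0.1, 0.8]))   # True: the preferred document scores higher
```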
32 Experiment Setup: Meta Search
(Diagram: the Striver meta-search engine and the underlying engines it draws results from: Google, MSNSearch, Excite, Altavista, Hotbot)
33 Features
34 Experiment Results
35 Learned Weights of Features
36 Conclusions
- The first work (evaluating search engines) is crucial
- The feasibility of using clickthrough data for evaluating retrieval performance has been verified
- Clickthrough data (less effort) perform as well as manual relevance judgments (more effort) in this task
- The second (the KDD one) presents interesting work on clickthrough data
- Negative comments
- The approaches have not been validated at a larger scale, so whether the techniques work in real cases is still uncertain
- Links that are relevant but ranked low remain invisible
37 What Can Clickthrough Data Help With?
- Problem 1
- How to measure the retrieval quality of a search engine? How to compare the performance of two search engines? E.g., which search engine provides better results, Google or MSNSearch?
- Users are only rarely willing to give explicit feedback.
- Problem 2
- How to improve the ranking function of search engines?
- Can we learn something like "for query q, document a should be ranked higher than document b"?