Title: Extending Relevance Model for Relevance Feedback
Le Zhao Chenmin Liang Jamie Callan
Language Technologies Institute, School of
Computer Science, Carnegie Mellon University,
Pittsburgh, PA 15213, USA
Introduction: The TREC 2008 Relevance Feedback track defines a testbed for
evaluating relevance feedback algorithms. It includes different levels of
feedback, from a single relevant feedback document up to over 100 judgments
with at least 3 relevant documents per topic.
- The Extended Relevance Model
- Problem Setup
- Weight feedback terms according to both the relevant feedback documents and
the pseudo-relevant documents, instead of building two separate queries and
combining them
- Use a single tuning parameter to control how much more important the true
relevant documents should be than the pseudo-relevant ones (see the sketch
after this list)
- Goal: separate out the factors that affect term weights from the two sources
(the number of feedback documents, the number of relevant documents, P(I),
etc.), so that the parameter is stable across topics
- Key problem: modeling P(I), which can no longer be dropped without cost!
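The exact estimator is defined in the paper; the Python sketch below only illustrates the general idea of combining the two sources under one parameter. The names k, p_w_given_d and p_q_given_d are illustrative assumptions, not the paper's notation.

    from collections import defaultdict

    def combined_term_weights(rel_docs, pseudo_docs, p_w_given_d, p_q_given_d, k=10.0):
        """Sketch: weight feedback terms from judged relevant and pseudo-relevant
        documents in a single model. k boosts the judged relevant documents;
        p_w_given_d[d][w] is P(w|D), p_q_given_d[d] is P(Q|D)."""
        weights = defaultdict(float)
        for d in rel_docs:                                   # judged relevant documents
            for w, p in p_w_given_d[d].items():
                weights[w] += k * p_q_given_d[d] * p
        for d in pseudo_docs:                                # pseudo-relevant documents
            for w, p in p_w_given_d[d].items():
                weights[w] += p_q_given_d[d] * p
        total = sum(weights.values()) or 1.0
        return {w: v / total for w, v in weights.items()}    # normalize to a distribution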
Goal: The design of feedback algorithms is most challenging when the amount of
feedback information is minimal. Thus, we aim to design a relevance feedback
algorithm that can exploit even a small number of feedback documents to
achieve robust performance.
- Experiments
- Baseline
- Dependency model queries, for increased top precision (an Indri-style
dependence query is sketched after this list)
- Pseudo relevance feedback (relevance model) for better recall
- Best runs in the 2005 and 2006 Terabyte tracks
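The baseline queries are Indri dependency model queries; the sketch below builds a standard sequential-dependence-style query from the query terms. The 0.8/0.1/0.1 weights and the #uw8 window size are common defaults and only assumptions about what the actual runs used.

    def dependence_model_query(terms, w_t=0.8, w_o=0.1, w_u=0.1):
        """Sketch: sequential-dependence Indri query from a list of query terms,
        combining unigrams, exact bigrams (#1) and unordered windows (#uw8)."""
        if len(terms) < 2:                                   # single-term query
            return "#combine(" + " ".join(terms) + ")"
        unigrams = " ".join(terms)
        bigrams = " ".join("#1({} {})".format(a, b) for a, b in zip(terms, terms[1:]))
        windows = " ".join("#uw8({} {})".format(a, b) for a, b in zip(terms, terms[1:]))
        return ("#weight( {} #combine({}) {} #combine({}) {} #combine({}) )"
                .format(w_t, unigrams, w_o, bigrams, w_u, windows))

    # e.g. dependence_model_query(["relevance", "feedback", "track"])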
- Extended Relevance Model
- Stability of the optimal parameter
- Tuning it on a per-topic basis gives only a 3-4% improvement on feedback set
C or D
- This suggests tuning the interpolation of the extended relevance model with
the original query instead
- The optimal interpolation weight is around 0.7-0.8, significantly better
than relevance feedback alone when only one (the top-ranked) relevant document
is used for feedback (p < 0.004 by a paired sign test)
- No significant difference between the merged model with top relevant
document feedback and PRF
- Performance change as the amount of feedback information increases
- Data Set
- Documents
- GOV2 collection
- Topics
- 50 topics from previous Terabyte tracks
- 150 topics from Million Query tracks
- Feedback
- Top documents ranked by systems from the previous tracks
- Judgments also from the previous tracks
- Training setup (differs from the test setup)
- Training topics come from the previous Terabyte (TB) and MQ tracks, while
the test uses TB topics only
- Training feedback documents are randomly sampled from the judgments, while
the test uses the top documents ranked by previous TREC runs
- Observations: the curve is almost flat, while PRF is gaining a lot; do we
need lower-ranked relevant documents for effective feedback?
- Modeling P(I)
- Generated from the collection model
- P(I | C), approximated with P(Q | C)
- Considering documents in the collection
- max_{D in C} P(I | D), approximated with max_{D in C} P(Q | D)
- Intuition: a relevant document is as good as the best document in C
- avg_{D in TopN} P(I | D), approximated with avg_{D in TopN} P(Q | D)
- Intuition: a relevant document is as good as the average of the TopN
documents in C
- Goal: keep the term weighting stable across topics with different P(I | D)
values (a sketch of these estimators follows this list)
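Given query likelihoods, the three approximations above are straightforward to compute; the sketch below assumes collection term counts and per-document P(Q|D) values are available, and is not the implementation used in the runs.

    import math

    def p_i_from_collection(query_terms, collection_tf, collection_len):
        """P(I|C) ~ P(Q|C): query likelihood under the collection language model."""
        log_p = sum(math.log(collection_tf.get(t, 0.5) / collection_len)   # crude floor for unseen terms
                    for t in query_terms)
        return math.exp(log_p)

    def p_i_from_best_doc(p_q_given_d):
        """max_{D in C} P(Q|D): a relevant document is as good as the best document."""
        return max(p_q_given_d.values())

    def p_i_from_topn(p_q_given_d, n=50):
        """avg_{D in TopN} P(Q|D): a relevant document is as good as the TopN average."""
        top = sorted(p_q_given_d.values(), reverse=True)[:n]
        return sum(top) / len(top)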
- The Relevance Model
- A distribution over terms given the information need I (Lavrenko and Croft
2001). For term r,
  P(r | I) = sum_D P(r | D) P(D | I) = sum_D P(r | D) P(I | D) P(D) / P(I)
- P(I) can be dropped without affecting the relative term weights
- Top n terms → relevance model Indri query:
  #weight( w1 r1 w2 r2 ... wn rn ), where wi = P(ri | I)
- Interpolation with the original query:
  #weight( w Original_Query (1-w) Relevance_Model_Query )
  (an end-to-end sketch of this pipeline follows this list)
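A minimal end-to-end sketch of the relevance model pipeline, assuming precomputed P(w|D) and P(Q|D) and a uniform document prior; the top-n cutoff and the interpolation weight are illustrative.

    from collections import defaultdict

    def relevance_model(feedback_docs, p_w_given_d, p_q_given_d):
        """P(r|I) proportional to sum_D P(r|D) P(Q|D); the 1/P(I) constant cancels."""
        weights = defaultdict(float)
        for d in feedback_docs:
            for w, p in p_w_given_d[d].items():
                weights[w] += p * p_q_given_d[d]
        total = sum(weights.values()) or 1.0
        return {w: v / total for w, v in weights.items()}

    def rm_indri_query(weights, n=20):
        """Top-n terms as an Indri #weight query."""
        top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:n]
        return "#weight( " + " ".join("{:.6f} {}".format(p, w) for w, p in top) + " )"

    def interpolated_query(original_query, rm_query, w=0.7):
        """Interpolate the original query with the relevance model query."""
        return "#weight( {} {} {} {} )".format(w, original_query, 1 - w, rm_query)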
- Conclusions & Future Work
- The extended relevance model works well. (Otherwise, the optimal parameter
would vary with the number of relevant documents.)
- One randomly sampled relevant document is more informative than a top-ranked
relevant document.
- Merging relevance feedback and PRF is significantly better than relevance
feedback alone.
- Top-ranked negative feedback documents probably carry more information for
the system than top-ranked relevant feedback documents; exploring this is
future work.