Title: Relevance Models In Information Retrieval
1. Relevance Models In Information Retrieval
- Victor Lavrenko and W. Bruce Croft
- Center for Intelligent Information Retrieval
- Department of Computer Science, University of Massachusetts Amherst
Presenter: Chia-Hao Lee
2. Outline
- Introduction
- Related Work
- Relevance Models
- Estimating a Relevance Model
- Experimental Results
- Conclusion
3. Introduction
- The field of information retrieval has been primarily concerned with developing algorithms to identify relevant pieces of information in response to a user's information need.
- The notion of relevance is central to information retrieval, and much research in the area has focused on developing formal models of relevance.
4. Introduction (cont.)
- One of the most popular models, introduced by Robertson and Sparck Jones, ranks documents by their likelihood of belonging to the relevant class of documents for a query.
- More recently, the language modeling approach has shifted the focus from developing heuristic weights for representing term importance to estimation techniques for the document model.
5. Related Work
- Classical Probabilistic Approach
- Underlying most research on probabilistic models of information retrieval is the Probability Ranking Principle, advocated by Robertson, which suggests ranking documents D by the odds of their being observed in the relevant class.
6. Related Work (cont.)
- Language Modeling Approach
- Most of these approaches rank the documents in the collection by the probability P(Q|D) that a query Q would be observed during repeated random sampling from the model of document D.
7. Related Work (cont.)
- Cross-Language Approach
- Language-modeling approaches have been extended to cross-language retrieval by Hiemstra and de Jong and by Xu et al.
- The model proposed by Berger and Lafferty applies to the translation of a document into a query in a monolingual environment, but it can readily accommodate a bilingual environment.
8. Relevance Models
- Definitions:
- V : a vocabulary in some language
- C : some large collection of documents
- R : the subset of documents in C that are relevant to a query (R is a subset of C)
- P(w|R) : a relevance model, defined to be the probability distribution over the words w in V for the relevant class R
9. Relevance Models (cont.)
- The primary goal of Information Retrieval systems is to identify a set of documents relevant to some query Q.
- Unigram language models ignore any short-range interactions between the words in a sample of text, so we cannot distinguish between grammatical and non-grammatical samples of text.
- Attempts to use higher-order models have been few and did not lead to noticeable improvements.
10. Relevance Models (cont.)
- Two approaches to document ranking:
- the probability ratio, advocated by the classical probabilistic models,
- and cross-entropy.
11. Relevance Models (cont.)
- Classical probabilistic models
- The Probability Ranking Principle suggests that we should rank the documents in order of decreasing probability ratio P(D|R) / P(D|N), where N denotes the non-relevant class.
- If we assume a document D to be a sequence of independent words d1 ... dn, the probability ranking principle may be expressed as a product of the ratios: prod_i P(di|R) / P(di|N)
12. Relevance Models (cont.)
- Cross-entropy
- Let P(w|R) denote the language model of the relevant class, and for every document D let P(w|D) denote the corresponding document language model.
- Cross-entropy is a natural measure of divergence between two language models, defined as H(R||D) = - sum_{w in V} P(w|R) log P(w|D)
13. Relevance Models (cont.)
- Intuitively, documents with small cross-entropy from the relevance model are likely to be relevant, so we rank the documents by increasing cross-entropy.
- Cross-entropy enjoys a number of attractive theoretical properties.
- One property is of particular importance: suppose we estimate P(w|R) as the relative frequency of the word w in the user query Q; ranking by cross-entropy is then equivalent to ranking by query likelihood.
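Cross-entropy ranking can be sketched in a few lines of Python. This is a minimal illustration, assuming each language model is a dict mapping words to probabilities; the names `rel_model` and `doc_models` are ours, and unseen words get a small floor probability in place of proper smoothing:

```python
import math

def cross_entropy(rel_model, doc_model, eps=1e-12):
    """H(R||D) = -sum_w P(w|R) * log P(w|D).
    Lower cross-entropy means the document is closer to the relevance model."""
    return -sum(p * math.log(doc_model.get(w, eps)) for w, p in rel_model.items())

def rank_by_cross_entropy(rel_model, doc_models):
    """Rank document ids by increasing cross-entropy from the relevance model."""
    return sorted(doc_models, key=lambda d: cross_entropy(rel_model, doc_models[d]))

# Relevance model estimated as relative frequencies of the query words
query = ["relevance", "model"]
rel_model = {w: query.count(w) / len(query) for w in set(query)}

doc_models = {
    "d1": {"relevance": 0.4, "model": 0.4, "retrieval": 0.2},
    "d2": {"sports": 0.7, "news": 0.3},
}
print(rank_by_cross_entropy(rel_model, doc_models))  # ['d1', 'd2']
```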
14. Estimating a Relevance Model
- We discuss a set of techniques that could be used to estimate the set of probabilities P(w|R):
- Estimation from a set of examples.
- Estimation without examples.
- Cross-lingual estimation.
15. Estimating a Relevance Model (cont.)
- Estimation from a Set of Examples
- Let P(D|R) denote the probability of randomly picking document D from the relevant set R.
- We assume each relevant document is equally likely to be picked at random, so the estimate is P(D|R) = 1 / |R|.
- The probability of observing a word w if we randomly pick some word from D is simply the relative frequency of w in D: P(w|D) = #(w, D) / |D|
16. Estimating a Relevance Model (cont.)
- Combining the estimates from the above two equations, the probability of randomly picking a document D and then observing the word w is P(D|R) P(w|D).
- The overall probability of observing the word w in the relevant class is therefore P(w|R) = sum_{D in R} P(D|R) P(w|D)
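Under a uniform prior P(D|R) = 1/|R|, this estimate is a simple average of the documents' relative word frequencies. A minimal sketch (function name and toy documents are illustrative):

```python
from collections import Counter

def relevance_model_from_examples(relevant_docs):
    """P(w|R) = sum_{D in R} P(D|R) P(w|D), with a uniform prior
    P(D|R) = 1/|R| and P(w|D) the relative frequency of w in D."""
    p_d = 1.0 / len(relevant_docs)
    model = Counter()
    for doc in relevant_docs:
        for w, c in Counter(doc).items():
            model[w] += p_d * c / len(doc)
    return dict(model)

docs = [["a", "b", "a", "c"], ["a", "c"]]
model = relevance_model_from_examples(docs)
# P(a|R) = 0.5*(2/4) + 0.5*(1/2) = 0.5
```

Note that the result is a proper distribution: the per-document relative frequencies each sum to one, and so does their weighted average.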
17. Estimating a Relevance Model (cont.)
- Now suppose we have a sufficiently large, but incomplete subset of examples S (a subset of R), and would like to estimate the relevance model P(w|R).
- Indeed, the resulting estimator P_S(w|R) has a number of interesting properties:
- P_S(w|R) is an unbiased estimator of P(w|R) for a random subset S of R.
- P_S(w|R) is the maximum-likelihood estimator with respect to the set of examples S.
- P_S(w|R) is the maximum-entropy probability distribution constrained by S.
18. Estimating a Relevance Model (cont.)
- Most smoothing methods center around a fairly simple idea: P~(w|D) = lambda P(w|D) + (1 - lambda) P(w|C)
- lambda is a parameter that controls the degree of smoothing.
- This connection allows us to interpret smoothing as a way of selecting a different prior distribution.
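This linear interpolation (Jelinek-Mercer style smoothing) is easy to state in code; a sketch assuming both the document and collection models are word-to-probability dicts:

```python
def smooth(p_doc, p_coll, lam=0.6):
    """Linearly interpolated model: P~(w|D) = lam*P(w|D) + (1-lam)*P(w|C).
    lam controls the degree of smoothing (lam=1 means no smoothing)."""
    vocab = set(p_doc) | set(p_coll)
    return {w: lam * p_doc.get(w, 0.0) + (1 - lam) * p_coll.get(w, 0.0)
            for w in vocab}

smoothed = smooth({"a": 1.0}, {"a": 0.5, "b": 0.5}, lam=0.6)
# a: 0.6*1.0 + 0.4*0.5 = 0.8; b: 0.4*0.5 = 0.2
```

Smoothing gives every collection word a nonzero probability under every document, which the cross-entropy and sampling formulas all rely on.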
19. Estimating a Relevance Model (cont.)
- Estimation without Examples
- Estimation of relevance models when no training examples are available.
- As a running example we will consider the task of ad-hoc information retrieval, where we are given a short 2-3 word query, indicative of the user's information need, and no examples of relevant documents.
20. Estimating a Relevance Model (cont.)
- Our best bet is to relate the probability of w to the conditional probability of observing w given that we just observed the query words: P(w|R) is approximated by P(w | q1 ... qk)
- In effect, we are translating a set of words into a single word.
21. Estimating a Relevance Model (cont.)
- Method 1: i.i.d. sampling
- Assume that the query words q1 ... qk and the words w in relevant documents are sampled identically and independently from a unigram distribution M.
- P(w, q1 ... qk) = sum_{M in C} P(M) P(w|M) prod_{i=1..k} P(qi|M)
- We assumed that w and all qi are sampled independently and identically to each other.
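Method 1 can be sketched as follows, treating each document as one unigram model with a uniform prior P(M) = 1/|C| and no smoothing (in practice the document models would be smoothed; the function name and toy data are illustrative):

```python
from collections import Counter

def rm1(query, docs):
    """Method 1 (i.i.d. sampling):
    P(w, q1..qk) = sum_{M in C} P(M) P(w|M) prod_i P(qi|M),
    normalized over w to give P(w|R) ~ P(w | q1..qk)."""
    p_m = 1.0 / len(docs)            # uniform prior over document models
    joint = Counter()
    for doc in docs:
        counts, n = Counter(doc), len(doc)
        q_lik = 1.0                  # prod_i P(qi|M) for this model
        for q in query:
            q_lik *= counts.get(q, 0) / n
        for w, c in counts.items():
            joint[w] += p_m * (c / n) * q_lik
    total = sum(joint.values())
    return {w: v / total for w, v in joint.items()} if total else {}

rel = rm1(["a"], [["a", "b"], ["a", "c"]])
# rel["a"] = 0.5, rel["b"] = rel["c"] = 0.25
```

Documents that explain the query well (high prod_i P(qi|M)) dominate the sum, so the estimate concentrates on words that co-occur with the query.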
23. Estimating a Relevance Model (cont.)
- Method 2: conditional sampling
- We fix a value of w according to some prior P(w). Then we perform the following process k times: pick a distribution M according to P(M|w), then sample the query word qi from M with probability P(qi|M).
- P(w, q1 ... qk) = P(w) prod_{i=1..k} sum_{M in C} P(qi|M) P(M|w)
- The effect of this sampling strategy is that we assume the query words qi to be independent of each other, but keep their dependence on w.
24. Estimating a Relevance Model (cont.)
- We compute the expectation over the universe C of our unigram models, sum_{M in C} P(qi|M) P(M|w), where P(M|w) follows from Bayes' rule: P(M|w) = P(w|M) P(M) / sum_{M' in C} P(w|M') P(M')
- Combination
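Method 2 can be sketched as follows, using Bayes' rule for P(M|w) with a uniform P(M), one unsmoothed unigram model per document; all names are illustrative:

```python
from collections import Counter

def rm2(query, docs):
    """Method 2 (conditional sampling):
    P(w, q1..qk) = P(w) * prod_i sum_{M in C} P(qi|M) P(M|w),
    where P(M|w) = P(w|M) P(M) / sum_{M'} P(w|M') P(M')."""
    models = [(Counter(d), len(d)) for d in docs]
    p_m = 1.0 / len(docs)            # uniform prior over document models
    vocab = set(w for d in docs for w in d)
    scores = {}
    for w in vocab:
        # posterior over document models given w (Bayes' rule)
        post = [p_m * c.get(w, 0) / n for c, n in models]
        z = sum(post)                # z = P(w) under the uniform prior
        post = [p / z for p in post]
        s = z                        # start from P(w)
        for q in query:
            s *= sum(pm * c.get(q, 0) / n for pm, (c, n) in zip(post, models))
        scores[w] = s
    total = sum(scores.values())
    return {w: v / total for w, v in scores.items()}

rel = rm2(["a"], [["a", "b"], ["a", "c"]])
# rel["a"] = 0.5, rel["b"] = rel["c"] = 0.25
```

On this tiny example Methods 1 and 2 agree; on realistic collections they generally differ, since Method 2 ties the query words together only through w.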
26. Estimating a Relevance Model (cont.)
- Cross-lingual Estimation
- Goal: estimate the relevance model in some target language, different from the language of the query.
- Let Q = s1 ... sk be the query in the source language, and
- let R be the unknown set of target documents that are relevant to that query.
27. Estimating a Relevance Model (cont.)
- We need the probability distribution P(t|R) for every word t in the vocabulary of the target language.
- An implicit assumption behind equation (2.22) is that there exists a joint probabilistic model from which we can compute the joint probability P(t, s1 ... sk).
- Note that s1 ... sk and t represent words from different languages, and so will not naturally occur in the same documents.
28. Estimating a Relevance Model (cont.)
- 1. Estimation with a parallel corpus
- Suppose we have at our disposal a parallel corpus C, a collection of document pairs (Ds, Dt), where Ds is a document in the source language and Dt is a document in the target language discussing the same topic as Ds.
- Method 1
- Method 2
29. Estimating a Relevance Model (cont.)
- 2. Estimation with a statistical lexicon
- A statistical lexicon is a special dictionary which gives the translation probability P(s|t) for every source word s and every target word t.
- In this case we let C be the set of target documents.
- In order to compute P(s|Dt) for a source word s in a target document Dt, we marginalize over possible translations: P(s|Dt) = sum_t P(s|t) P(t|Dt)
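This last step can be sketched as follows; a minimal illustration assuming the lexicon is a nested dict `lexicon[t][s] = P(s|t)` (the toy Spanish-English entry is invented for the example):

```python
def source_word_prob(s, target_doc, lexicon):
    """P(s|Dt) = sum_t P(s|t) P(t|Dt): probability of source word s under
    target document Dt, marginalizing over lexicon translations."""
    n = len(target_doc)
    return sum(lexicon.get(t, {}).get(s, 0.0) * target_doc.count(t) / n
               for t in set(target_doc))

lexicon = {"gato": {"cat": 0.9, "feline": 0.1}}
doc = ["gato", "perro"]
p = source_word_prob("cat", doc, lexicon)
# 0.9 * (1/2) = 0.45
```

With P(s|Dt) in hand, the target documents can play the role of the unigram models M in the estimation methods above, even though they share no vocabulary with the query.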
30. Experiment
- Experiment Setup
- English Resources
32. Experiment (cont.)
- Ad-hoc Retrieval Experiments
33. Experiment (cont.)
- Comparison of Ranking Methods
39. Experiment (cont.)
- Cross-Lingual Experiments
41. Conclusions
- In this work we introduced a formal framework for modeling the notion of relevance in Information Retrieval.
- We defined a relevance model to be the language model that reflects word frequencies in the class of documents that are relevant to some given information need.