1
CS 430 / INFO 430 Information Retrieval
Lecture 11 Probabilistic Information Retrieval
2
Course Administration
Assignment 2: Remember to check the FAQ for hints and answers to questions that people have raised.
3
Three Approaches to Information Retrieval
Many authors divide the classical methods of information retrieval into three categories:
Boolean (based on set theory)
Vector space (based on linear algebra)
Probabilistic (based on Bayesian statistics)
In practice, the latter two have considerable overlap.
4
Probability revision: independent random variables
Independent events: Let a, b be two events, with probabilities P(a) and P(b). The events a and b are independent if and only if
P(a ∧ b) = P(a) P(b)
In general, a1, a2, ..., an are independent if and only if
P(a1 ∧ a2 ∧ ... ∧ an) = P(a1) P(a2) ... P(an)
5
Probability revision: conditional probability
Let a, b be two events, with probabilities P(a) and P(b). The conditional probability P(a | b) is the probability of a given that b has occurred.
Conditional independence: The events a1, ..., an are conditionally independent if and only if
P(ai | aj) = P(ai) for all i and j.
6
Example: independent random variables and conditional probability
Independent: a and b are the results of throwing two dice.
P(a = 5 | b = 3) = P(a = 5) = 1/6
Not independent: a and b are the results of throwing two dice; t is the sum of the two dice, t = a + b.
P(t = 8 | a = 2) = 1/6
P(t = 8 | a = 1) = 0
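The dice example can be checked by exhaustive enumeration. The following sketch (illustrative, not from the lecture) counts outcomes over all 36 equally likely rolls to compute the conditional probabilities exactly:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (a, b) of throwing two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def cond_prob(event, given):
    """P(event | given), computed exactly by counting outcomes."""
    given_outcomes = [o for o in outcomes if given(o)]
    hits = [o for o in given_outcomes if event(o)]
    return Fraction(len(hits), len(given_outcomes))

# Independent: P(a = 5 | b = 3) = P(a = 5) = 1/6
assert cond_prob(lambda o: o[0] == 5, lambda o: o[1] == 3) == Fraction(1, 6)

# Not independent (t = a + b): P(t = 8 | a = 2) = 1/6, P(t = 8 | a = 1) = 0
assert cond_prob(lambda o: sum(o) == 8, lambda o: o[0] == 2) == Fraction(1, 6)
assert cond_prob(lambda o: sum(o) == 8, lambda o: o[0] == 1) == 0
```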
7
Probability: conditional probability
[Diagram: overlapping events a and b divide the space into four regions: x = a ∧ b, y = a ∧ ¬b, w = ¬a ∧ b, z = ¬a ∧ ¬b, where ¬a is the event "not a".]
P(a) = x + y
P(b) = w + x
P(a | b) = x / (w + x)
P(a | b) P(b) = P(a ∧ b) = P(b | a) P(a)
8
Probability Theory -- Bayes' Theorem
Notation: Let a, b be two events. P(a | b) is the probability of a given b.
Bayes' Theorem:
P(a | b) = P(b | a) P(a) / P(b)
Derivation:
P(a | b) P(b) = P(a ∧ b) = P(b | a) P(a)
9
Probability Theory -- Bayes' Theorem
Terminology used with Bayes' Theorem:
P(a | b) = P(b | a) P(a) / P(b)
P(a) is called the prior probability of a.
P(a | b) is called the posterior probability of a given b.
10
Example of Bayes' Theorem
Example: a = weight over 200 lb; b = height over 6 ft.
P(a | b) = x / (w + x) = x / P(b)
P(b | a) = x / (x + y) = x / P(a)
where x is P(a ∧ b).
[Diagram: overlapping circles "Over 200 lb" (regions x, y) and "Over 6 ft" (regions w, x), with z outside both.]
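The example can be made concrete with hypothetical numbers for the four regions (the values of x, y, w, z below are assumed, not from the slides); the sketch verifies that Bayes' Theorem holds for them:

```python
# Hypothetical region probabilities (assumed numbers, not from the slides):
# x = P(a and b), y = P(a only), w = P(b only), z = P(neither).
x, y, w, z = 0.10, 0.15, 0.20, 0.55

p_a = x + y                # P(over 200 lb)
p_b = w + x                # P(over 6 ft)
p_a_given_b = x / p_b      # x / P(b)
p_b_given_a = x / p_a      # x / P(a)

# Bayes' Theorem: P(a | b) = P(b | a) P(a) / P(b)
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
```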
11
Probability Ranking Principle
"If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of usefulness to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data is made available to the system for this purpose, then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of that data."
W.S. Cooper
12
Probabilistic Principle
Basic concept: The probability that a document is relevant to a query is assumed to depend only on the terms in the query and the terms used to index the document. Given a user query q, the ideal answer set, R, is the set of all relevant documents. Given a user query q and a document d in the collection, the probabilistic model estimates the probability that the user will find d relevant, i.e., that d is a member of R.
13
Probabilistic Principle
Ranking measure: To rank the probability that d is relevant to q, we use the ratio
S(d, q) = (probability that d is relevant to q) / (probability that d is not relevant to q)
This measure runs from near zero, if the probability is small that the document is relevant, to large as the probability of relevance approaches one. In practice, since S is used only for ranking documents, we can ignore constants that are the same for all documents. Often it is convenient to use S' = log(S).
14
Probabilistic Principle
S(d, q) = P(R | d) / P(¬R | d)
        = [P(d | R) P(R)] / [P(d | ¬R) P(¬R)]    by Bayes' Theorem
        = P(d | R) / P(d | ¬R) × k    where k = P(R) / P(¬R) is constant
P(d | R) is the probability of randomly selecting d from R.
15
Binary Independence Retrieval Model (BIR)
Let x = (x1, x2, ..., xn) be the term incidence vector for d, where xi = 1 if term i is in the document and 0 otherwise. We estimate P(d | R) by P(x | R). If the index terms are independent:
P(x | R) = P(x1 | R) P(x2 | R) ... P(xn | R) = ∏ P(xi | R)
This is known as the Naive Bayes probabilistic model.
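A minimal sketch of the independence assumption, with made-up values for the per-term probabilities P(xi = 1 | R): under Naive Bayes, P(x | R) is a product of one factor per term.

```python
# Toy term incidence vector x for a document over a 4-term vocabulary,
# and made-up per-term probabilities p_i = P(x_i = 1 | R).
x = [1, 0, 1, 1]
p = [0.8, 0.3, 0.6, 0.5]

# Under the independence assumption, P(x | R) is a product of per-term
# factors: p_i where x_i = 1, and (1 - p_i) where x_i = 0.
prob = 1.0
for xi, pi in zip(x, p):
    prob *= pi if xi == 1 else (1 - pi)
# Here: 0.8 * (1 - 0.3) * 0.6 * 0.5 = 0.168
```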
16
Binary Independence Retrieval Model (BIR)
S(d, q) = k × ∏ P(xi | R) / P(xi | ¬R)
Since each xi is either 0 or 1, this can be written:
S = k × ∏[xi = 1] P(xi = 1 | R) / P(xi = 1 | ¬R) × ∏[xi = 0] P(xi = 0 | R) / P(xi = 0 | ¬R)
17
Binary Independence Retrieval Model (BIR)
Let pi = P(xi = 1 | R) and ri = P(xi = 1 | ¬R). Then:
S = k × ∏[xi = 1] pi / ri × ∏[xi = 0] (1 - pi) / (1 - ri)
18
Binary Independence Retrieval Model (BIR)
Let qi = 1 if term i appears in the query and 0 otherwise. For terms that do not appear in the query, assume pi = ri, so terms with qi = 0 have pi / ri equal to 1 and drop out. Then:
S = k × ∏[xi = qi = 1] pi / ri × ∏[xi = 0, qi = 1] (1 - pi) / (1 - ri)
  = k × ∏[xi = qi = 1] pi (1 - ri) / (ri (1 - pi)) × ∏[qi = 1] (1 - pi) / (1 - ri)
The last product is constant for a given query.
19
Binary Independence Retrieval Model (BIR)
Ignoring factors that are constant for a given query and taking logs, we have
S' = log(S) = Σ log [ pi (1 - ri) / ((1 - pi) ri) ]
where the summation is taken over those terms that appear in both the query and the document.
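The similarity S' can be sketched in code. The function name bir_score and the toy estimates for pi and ri below are assumptions for illustration; as on the slide, the sum runs only over terms appearing in both the document and the query.

```python
import math

def bir_score(doc_terms, query_terms, p, r):
    """S' = sum over terms in both the document and the query of
    log[p_i (1 - r_i) / ((1 - p_i) r_i)], where p[t] = P(x_t = 1 | R)
    and r[t] = P(x_t = 1 | not-R)."""
    return sum(
        math.log(p[t] * (1 - r[t]) / ((1 - p[t]) * r[t]))
        for t in doc_terms & query_terms
    )

# Assumed estimates: "retrieval" is much likelier in relevant documents.
p = {"retrieval": 0.7, "model": 0.5}
r = {"retrieval": 0.1, "model": 0.3}
score = bir_score({"retrieval", "model", "cat"}, {"retrieval", "model"}, p, r)
# Only "retrieval" and "model" contribute; "cat" is not in the query.
```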
20
Relationship to Term Vector Space Model
Suppose that, in the term vector space, document d is represented by a vector whose component in dimension i is
log [ pi (1 - ri) / ((1 - pi) ri) ]
and the query q is represented by a vector with value 1 in each dimension that corresponds to a term in the query. Then the Binary Independence Retrieval similarity, S', is the inner product of these two vectors. Thus this approach can be considered a probabilistic way of determining term weights in the vector space model.
21
Practical Application
The probabilistic model is an alternative to the
term vector space model. The Binary Independence
Retrieval measure, S', is used instead of the
cosine similarity measure to rank all documents
against the query q. Techniques such as stoplists
and stemming can be used with either
model. Variations to the model result in slightly
different expressions for the similarity measure.
22
Initial Estimates of P(xi | R)
Problem: How do we estimate the pi and ri? Initial guess, with no information to work from:
pi = P(xi = 1 | R) = c
ri = P(xi = 1 | ¬R) = ni / N
where c is an arbitrary constant, e.g., 0.5; ni is the number of documents that contain term i; and N is the total number of documents in the collection.
23
Initial Similarity Estimates
With these assumptions (taking c = 0.5, so that pi / (1 - pi) = 1):
S'(d, q) = Σ log [ pi (1 - ri) / ((1 - pi) ri) ] = Σ log (N - ni) / ni
where the summation is taken over those terms that appear in both the query and the document.
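A sketch of this initial estimate, assuming c = 0.5 so that each term's weight reduces to log((N - ni)/ni); the collection statistics below are invented for illustration.

```python
import math

def initial_score(doc_terms, query_terms, n, N):
    """S'(d, q) with p_i = 0.5 and r_i = n_i / N, which reduces each term's
    weight to log((N - n_i) / n_i) -- an IDF-like weight. n[t] is the number
    of documents containing t; N is the collection size."""
    return sum(math.log((N - n[t]) / n[t]) for t in doc_terms & query_terms)

N = 1000
n = {"retrieval": 10, "the": 900}
# A rare term gets a large positive weight, log(990/10) = log(99);
# a near-ubiquitous one a negative weight, log(100/900) = log(1/9).
s = initial_score({"retrieval", "the"}, {"retrieval", "the"}, n, N)
```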
24
Probabilistic Ranking using Relevance Feedback
Basic concept: "For a given query, if we know some documents that are relevant, terms that occur in those documents should be given greater weighting in searching for other relevant documents. By making assumptions about the distribution of terms and applying Bayes' Theorem, it is possible to derive weights theoretically."
Van Rijsbergen
25
Probabilistic Ranking using Relevance Feedback
  • Early uses of probabilistic information retrieval were based on relevance feedback.
  • R is a set of documents that are guessed to be relevant, and ¬R is the complement of R.
  • 1. Guess a preliminary probabilistic description of R and use it to retrieve a first set of documents.
  • 2. Interact with the user to refine the description of R (relevance feedback).
  • 3. Repeat, thus generating a succession of approximations to R.

26
Improving the Estimates without Human Intervention
Automatically:
(a) Run query q using the initial values. Consider the t top-ranked documents. Let si be the number of these documents that contain the term xi.
(b) The new estimates are:
pi = P(xi = 1 | R) = si / t
ri = P(xi = 1 | ¬R) = (ni - si) / (N - t)
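Steps (a) and (b) can be sketched as follows; the helper name update_estimates and the toy collection are assumptions for illustration.

```python
def update_estimates(top_docs, term, n_i, N):
    """Step (b): re-estimate p_i and r_i from the t top-ranked documents.
    top_docs is a list of term sets, one per top-ranked document."""
    t = len(top_docs)
    s_i = sum(1 for d in top_docs if term in d)
    p_i = s_i / t                 # P(x_i = 1 | R)      = s_i / t
    r_i = (n_i - s_i) / (N - t)   # P(x_i = 1 | not-R)  = (n_i - s_i) / (N - t)
    return p_i, r_i

# Toy collection: term "q" occurs in 50 of N = 1000 documents,
# and in 4 of the t = 5 top-ranked ones.
top = [{"a", "q"}, {"q"}, {"q", "b"}, {"c"}, {"q"}]
p_i, r_i = update_estimates(top, "q", n_i=50, N=1000)
# p_i = 4/5 = 0.8; r_i = 46/995
```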
27
Discussion of Probabilistic Model
Advantages:
Based on a firm theoretical foundation.
Disadvantages:
The initial definition of R has to be guessed.
Weights ignore term frequency.
Assumes independent index terms (as does the vector model).