If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data.
- Robertson
4 Bag of words? What bag?
A document is a vector of term occurrences
Assumption of exchangeability
What is this really?
A hyperspace where each dimension is represented by a term
Values are term occurrences
5 Can we model this bag?
Binomial Distribution
Bernoulli / success-fail trials
e.g. flipping a coin: the chance of getting a head
Multinomial Distribution
Probabilities over several possible outcomes
e.g. flipping a coin: chance of a head, chance of a tail
e.g. rolling a die: chance of a 1, 2, …, 6
e.g. a document: chance of each term occurring (see the sketch below)
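A minimal Python sketch (not from the slides) of this view: a toy document is "generated" by independent draws from a made-up multinomial over terms, and only the counts survive.

```python
import random
from collections import Counter

# Toy multinomial over a three-term vocabulary (made-up probabilities).
vocab = ["cat", "dog", "fish"]
probs = [0.5, 0.3, 0.2]  # must sum to 1

# "Generate" a 10-term document: each draw is an independent trial, so
# only the term counts survive -- the bag-of-words / exchangeability view.
doc = random.choices(vocab, weights=probs, k=10)
print(Counter(doc))  # e.g. Counter({'cat': 6, 'dog': 3, 'fish': 1})
```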
6 Review
What is the Probability Ranking Principle?
What is the bag of words model?
What is exchangeability?
What is a binomial?
What is a multinomial?
7 Some Terminology
Term: t
Vocabulary: V = {t1, t2, …, tn}
Document: dx = (tdx,1, …, tdx,m), with each term ∈ V
Corpus: C = {d1, d2, …, dk}
Query: Q = (q1, q2, …, qi), with each qj ∈ V
8 Language Modeling
A document is represented by a multinomial distribution over terms
Unigram model
A piece of text is generated by drawing each term independently
p(t1, t2, …, tn) = p(t1) p(t2) … p(tn)
p(t1) + p(t2) + … + p(tn) = 1 (see the sketch below)
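A minimal sketch of the unigram product above, with an assumed toy model:

```python
import math

# Toy unigram model (assumed probabilities; they sum to 1 over the vocabulary).
p = {"the": 0.5, "cat": 0.3, "sat": 0.2}

def text_prob(terms, model):
    """p(t1, t2, ..., tn) = p(t1) p(t2) ... p(tn) under term independence."""
    return math.prod(model[t] for t in terms)

print(text_prob(["the", "cat", "sat"], p))  # 0.5 * 0.3 * 0.2 = 0.03
```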
9 Why Unigram?
Easy to implement
Reasonable performance
Word order and structure not captured
How much benefit would they add?
Open question
More parameters to tune in complex models
Need more data to train
Need more time to compute
Need more space to store
10 Enough! How do I retrieve documents?
p(Q|d) = p(q1|d) p(q2|d) … p(qn|d)
How do we estimate p(q|d)?
Maximum Likelihood Estimate
pMLE(q|d) = freq(q, d) / Σt freq(t, d)
Probability Ranking Principle
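A toy sketch of query-likelihood retrieval as just described; the documents and query are invented:

```python
from collections import Counter

def query_likelihood(query, doc):
    """p(Q|d) as a product of MLE term probabilities p(q|d)."""
    counts, n = Counter(doc), len(doc)
    score = 1.0
    for q in query:
        score *= counts[q] / n  # freq(q, d) / total number of terms in d
    return score

docs = {"d1": "the cat sat on the mat".split(),
        "d2": "the dog ate the cat".split()}
query = "the cat".split()

# Rank documents by p(Q|d), highest first, per the Probability Ranking Principle.
for name in sorted(docs, key=lambda d: query_likelihood(query, docs[d]), reverse=True):
    print(name, query_likelihood(query, docs[name]))
```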
11 Review
What is the unigram model?
Is the language model a binomial or multinomial?
Why use the unigram model?
Given a query, how do we use a language model to retrieve documents?
12 What is wrong with MLE?
Creates 0 probabilities for terms that do not occur in the document
A single 0 probability zeroes out the whole query-likelihood product, breaking the scoring function (see the sketch below)
Is a 0 probability sensible?
Can a word never, ever occur?
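A tiny demonstration of the breakage on toy data: one unseen query term wipes out an otherwise good match.

```python
from collections import Counter

doc = "the cat sat on the mat".split()
counts, n = Counter(doc), len(doc)

score = 1.0
for q in ["cat", "zebra"]:      # "zebra" never occurs in the document
    score *= counts[q] / n      # MLE gives the unseen term probability 0
print(score)                    # 0.0 -- the match on "cat" is wiped out
```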
13 How can we fix this?
How do we get around the zero probabilities?
New similarity function?
Remove zero probabilities?
Build a different model?
14 Smoothing Approaches
Laplace / Additive
Mixture Models
Interpolation
Jelinek Mercer
Dirichlet
Absolute Discounting
Backoff
15 Laplace
Just bump all term frequencies up by 1
Where have you seen this before?
Is this a good idea?
Strengths
Weaknesses
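A minimal sketch of add-one (Laplace) smoothing under the MLE setup above; the vocabulary size of 1000 is an arbitrary assumption:

```python
from collections import Counter

def laplace(term, doc, vocab_size):
    """Add 1 to every count; the normalizer grows by |V| so it stays a distribution."""
    counts, n = Counter(doc), len(doc)
    return (counts[term] + 1) / (n + vocab_size)

doc = "the cat sat on the mat".split()
print(laplace("zebra", doc, vocab_size=1000))  # unseen term: small but nonzero
print(laplace("the", doc, vocab_size=1000))    # seen term: still the largest
```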
16 Interpolation
Mixture model approach
Combine probability models
Traditionally combines the document model with the corpus model
Is this a good idea?
What else is the corpus model used for?
Strengths
Weaknesses
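One concrete instance of interpolation is Jelinek-Mercer, sketched below; the mixing weight lam = 0.5 and the toy texts are assumptions:

```python
from collections import Counter

def jelinek_mercer(term, doc, corpus, lam=0.5):
    """p(t|d) = lam * p_mle(t|d) + (1 - lam) * p_mle(t|C)."""
    d_counts, d_len = Counter(doc), len(doc)
    c_counts, c_len = Counter(corpus), len(corpus)
    return lam * d_counts[term] / d_len + (1 - lam) * c_counts[term] / c_len

doc = "the cat sat".split()
corpus = "the cat sat on the mat the dog ran".split()
print(jelinek_mercer("dog", doc, corpus))  # unseen in doc, nonzero via the corpus
print(jelinek_mercer("cat", doc, corpus))  # seen terms mix both models
```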
17 Backoff
Only add probability mass to terms that are not seen
What does this do to the probability model?
Flatter?
Is this a good idea?
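A simplified backoff sketch (it skips the renormalization of the reserved mass over unseen terms that a full backoff model performs):

```python
from collections import Counter

def backoff(term, doc, corpus, alpha=0.1):
    """Seen terms keep (discounted) document mass; only unseen terms receive
    corpus mass. Simplified: the alpha mass is not renormalized over the
    unseen terms, as a real backoff model would do."""
    d_counts, d_len = Counter(doc), len(doc)
    if d_counts[term] > 0:
        return (1 - alpha) * d_counts[term] / d_len
    c_counts, c_len = Counter(corpus), len(corpus)
    return alpha * c_counts[term] / c_len

doc = "the cat sat".split()
corpus = "the cat sat on the mat the dog ran".split()
print(backoff("cat", doc, corpus))  # seen: slightly discounted
print(backoff("dog", doc, corpus))  # unseen: small backed-off mass
```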
18 Are there other sources of probability mass?
Document Clusters
Document Classes
User Profiles
Topic models
19 Review
What is wrong with 0 probabilities?
How does smoothing fix it?
What is smoothing really doing?
What is Interpolation?
What is that mixture model really representing?
What can we use to mix with the document model?
20 Bored yet? Let's do something complicated
Entropy - Information Theory
H(X) = -Σx p(x) log p(x)
Good for data compression
Relative Entropy
D(p‖q) = Σx p(x) log(p(x)/q(x))
Not a true distance measure
Used to find differences between probability models
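A small sketch of both quantities on toy distributions (log base 2), including a check that relative entropy is not symmetric, which is one reason it is not a true distance measure:

```python
import math

def entropy(p):
    """H(X) = -sum_x p(x) log2 p(x)."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def relative_entropy(p, q):
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(entropy(p))                                      # 1.5 bits
print(relative_entropy(p, q), relative_entropy(q, p))  # unequal: not symmetric
```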
21 OK, that's nice
What does relative entropy give us?
Why not just subtract probabilities?
On your calculators, calculate p(x) log(p(x)/q(x)) for the following (worked below):
p(x) = 0.8, q(x) = 0.6
p(x) = 0.6, q(x) = 0.4
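Worked in Python rather than on a calculator:

```python
import math

for p, q in [(0.8, 0.6), (0.6, 0.4)]:
    print(p, q, p * math.log2(p / q))
# 0.8, 0.6 -> 0.8 * log2(1.333...) ~= 0.332
# 0.6, 0.4 -> 0.6 * log2(1.5)     ~= 0.351
# The gap p - q is 0.2 in both cases, yet the contributions differ:
# relative entropy weights the *ratio* of the probabilities, which is
# why simply subtracting probabilities would lose information.
```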
22 Clarity Score
Calculate the relative entropy between the result set's language model and the corpus model
A high clarity score (relative entropy) correlates positively with query performance
So what is that actually saying?
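A deliberately simplified clarity sketch: it pools the terms of the retrieved documents into one MLE model and takes its relative entropy from the corpus model (the published clarity score estimates a smoothed query model instead, so treat this only as the idea):

```python
import math
from collections import Counter

def mle_model(terms, vocab):
    counts, n = Counter(terms), len(terms)
    return {t: counts[t] / n for t in vocab}

def clarity(result_terms, corpus_terms):
    """Relative entropy D(result || corpus) over the corpus vocabulary."""
    vocab = set(corpus_terms)
    p = mle_model(result_terms, vocab)
    q = mle_model(corpus_terms, vocab)
    return sum(p[t] * math.log2(p[t] / q[t]) for t in vocab if p[t] > 0)

corpus = "the cat sat on the mat the dog ran the bird flew".split()
results = "the cat sat the cat".split()  # terms pooled from retrieved documents
print(clarity(results, corpus))  # higher = result set more focused than corpus
```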
23 Relative Entropy Query Expansion
Relevance Feedback
Blind Relevance Feedback
Expand the query with the terms that contribute most to the relative entropy (see the sketch below)
What are we doing to the query when we do this?
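A sketch of picking expansion terms by their individual contribution to the relative entropy; the feedback text and corpus are toys, and terms absent from the corpus are simply skipped (a real system would smooth):

```python
import math
from collections import Counter

def top_expansion_terms(feedback_terms, corpus_terms, k=3):
    """Rank terms by their contribution p(t|R) log2(p(t|R) / p(t|C)) to the
    relative entropy between the feedback set R and the corpus C."""
    r_counts, r_n = Counter(feedback_terms), len(feedback_terms)
    c_counts, c_n = Counter(corpus_terms), len(corpus_terms)
    scores = {t: (r_counts[t] / r_n) *
                 math.log2((r_counts[t] / r_n) / (c_counts[t] / c_n))
              for t in r_counts if c_counts[t] > 0}  # skip terms absent from C
    return sorted(scores, key=scores.get, reverse=True)[:k]

corpus = "the cat sat on the mat the dog ran the bird flew".split()
feedback = "the cat sat the cat".split()  # pooled top-ranked (blind feedback) docs
print(top_expansion_terms(feedback, corpus))  # ['cat', 'sat', 'the']
```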
24 Controlled Query Generation
Some of my research
p(x) log (p(x)/q(x)) is a good term discrimination function
Regulate the construction of queries for evaluating retrieval algorithms
First real controlled reaction experiments with retrieval algorithms
25 Review
Who is the father of Information Theory?
What is Entropy?
What is Relative Entropy?
What is the Clarity Score?
What are the terms that contribute the most to relative entropy?
Are they useful?
26 You have been a good class
Introduced to the language model for information retrieval
Documents represented as multinomial distributions