Language Modeling

1
Language Modeling
  • Putting a curve to the bag of words

2
What models have we covered in class so far?
  • Boolean
  • Extended Boolean
  • Vector Space
  • TFIDF
  • Probabilistic Modeling
  • log P(D|R) / P(D|N)

3
Probability Ranking Principle
  • If a reference retrieval system's response to
    each request is a ranking of the documents in the
    collection in order of decreasing probability of
    relevance to the user who submitted the request,
    where the probabilities are estimated as
    accurately as possible on the basis of whatever
    data have been made available to the system for
    this purpose, the overall effectiveness of the
    system to its user will be the best that is
    obtainable on the basis of those data.
  • - Robertson

4
Bag of words? What bag?
  • Documents are a vector of term occurrences
  • Assumption of exchangeability
  • What is this really?
  • A hyperspace where each dimension is represented
    by a term
  • Values are term occurrences

5
Can we model this bag?
  • Binomial Distribution
  • Bernoulli / success-fail trials
  • e.g., flipping a coin: chance of getting a head
  • Multinomial
  • Probability of each of several outcomes occurring
  • e.g., flipping a coin: chance of head, chance of
    tail
  • e.g., die roll: chance of 1, 2, ..., 6
  • e.g., document: chance of a term occurring
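As a toy illustration of the multinomial view of a document (all terms and probabilities here are hypothetical, not from the slides), a "document" can be generated by repeated independent draws from a term distribution:

```python
import random

random.seed(0)  # for reproducibility

# a document modeled as a multinomial over its vocabulary (hypothetical probabilities)
term_probs = {"retrieval": 0.5, "model": 0.3, "query": 0.2}
terms, weights = zip(*term_probs.items())

# "generate" a 10-term document by repeated independent draws
doc = random.choices(terms, weights=weights, k=10)
```

Each draw is one multinomial trial, exactly like one roll of a weighted die.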

6
Review
  • What is the Probability Ranking Principle?
  • What is the bag of words model?
  • What is exchangeability?
  • What is a binomial?
  • What is a multinomial?

7
Some Terminology
  • Term t
  • Vocabulary V = {t1, t2, ..., tn}
  • Document dx = (tdx1, ..., tdxm), each tdxi ∈ V
  • Corpus C = {d1, d2, ..., dk}
  • Query Q = (q1, q2, ..., qi), each qj ∈ V

8
Language Modeling
  • A document is represented by a multinomial
  • Unigram model
  • A piece of text is generated term by term, each
    term independently
  • p(t1 t2 ... tn) = p(t1) p(t2) ... p(tn)
  • p(t1) + p(t2) + ... + p(tn) = 1
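A minimal sketch of the unigram independence assumption, using a hypothetical helper that estimates term probabilities from whitespace-tokenized text:

```python
from collections import Counter

def unigram_model(text):
    # MLE term probabilities from whitespace-tokenized text (toy tokenizer)
    counts = Counter(text.split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def sequence_prob(model, terms):
    # p(t1 t2 ... tn) = p(t1) * p(t2) * ... * p(tn), by term independence
    p = 1.0
    for t in terms:
        p *= model.get(t, 0.0)
    return p

model = unigram_model("the cat sat on the mat")
```

The probabilities over the vocabulary sum to 1, as the slide's constraint requires.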

9
Why Unigram
  • Easy to implement
  • Reasonable performance
  • Word order and structure not captured
  • How much benefit would they add?
  • Open question
  • More parameters to tune in complex models
  • Need more data to train
  • Need more time to compute
  • Need more space to store

10
Enough! How do I retrieve documents?
  • p(Q|d) = p(q1|d) p(q2|d) ... p(qn|d)
  • How do we estimate p(q|d)?
  • Maximum Likelihood Estimate
  • MLE(q|d) = freq(q, d) / Σi freq(i, d)
  • Probability Ranking Principle
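The query-likelihood ranking above can be sketched as follows (toy documents and whitespace tokenization are assumptions for illustration):

```python
from collections import Counter

def query_likelihood(query, doc):
    # p(Q|d) = product over query terms of MLE(q|d) = freq(q, d) / sum_i freq(i, d)
    counts = Counter(doc.split())
    total = sum(counts.values())
    p = 1.0
    for q in query.split():
        p *= counts.get(q, 0) / total
    return p

docs = ["the cat sat", "the dog ran fast"]  # toy corpus
scores = [query_likelihood("cat sat", d) for d in docs]
```

Documents are then ranked by score, per the Probability Ranking Principle; note that the second document scores exactly 0 because "cat" never occurs in it, which is the problem the next slides address.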

11
Review
  • What is the unigram model?
  • Is the language model a binomial or multinomial?
  • Why use the unigram model?
  • Given a query, how do we use a language model to
    retrieve documents?

12
What is wrong with MLE?
  • Creates 0 probabilities for terms that do not
    occur
  • 0 probabilities break similarity scoring function
  • Is a 0 probability sensible?
  • Can a word never ever occur?

13
How can we fix this?
  • How do we get around the zero probabilities?
  • New similarity function?
  • Remove zero probabilities?
  • Build a different model?

14
Smoothing Approaches
  • Laplace / Additive
  • Mixture Models
  • Interpolation
  • Jelinek-Mercer
  • Dirichlet
  • Absolute Discounting
  • Backoff

15
Laplace
  • Just add 1 to every term frequency
  • Where have you seen this before?
  • Is this a good idea?
  • Strengths
  • Weaknesses
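A minimal sketch of add-one (Laplace) smoothing, assuming the vocabulary size is known; the counts and vocabulary size below are hypothetical:

```python
def laplace(term, counts, vocab_size):
    # add 1 to every frequency, so unseen terms get a small nonzero probability
    total = sum(counts.values()) + vocab_size
    return (counts.get(term, 0) + 1) / total

counts = {"cat": 2, "sat": 1}  # toy document frequencies
V = 4                          # assumed vocabulary size
```

One often-cited weakness: it shifts a lot of mass to unseen terms when the vocabulary is large relative to the document.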

16
Interpolation
  • Mixture model approach
  • Combine probability models
  • Traditionally combine document model with the
    corpus model
  • Is this a good idea?
  • What else is the corpus model used for?
  • Strengths
  • Weaknesses
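A sketch of Jelinek-Mercer interpolation, mixing the document MLE with the corpus model; the weight `lam` and the toy counts are assumptions for illustration:

```python
def jelinek_mercer(term, doc_counts, corpus_counts, lam=0.5):
    # p(t|d) = lam * p_mle(t|d) + (1 - lam) * p(t|C)
    d_total = sum(doc_counts.values())
    c_total = sum(corpus_counts.values())
    p_d = doc_counts.get(term, 0) / d_total if d_total else 0.0
    p_c = corpus_counts.get(term, 0) / c_total if c_total else 0.0
    return lam * p_d + (1 - lam) * p_c

doc = {"cat": 2}
corpus = {"cat": 2, "dog": 2}
```

An unseen term like "dog" now gets nonzero probability from the corpus component, so the product over query terms no longer collapses to zero.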

17
Backoff
  • Only add probability mass to terms that are not
    seen
  • What does this do to the probability model?
  • Flatter?
  • Is this a good idea?
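One possible backoff sketch (Katz-style, an assumption rather than the slide's exact scheme): seen terms keep a discounted share of their MLE mass, and the reserved mass `alpha` is shared among unseen terms in proportion to the corpus model.

```python
def backoff(term, doc_counts, corpus_probs, alpha=0.1):
    # seen terms: keep (1 - alpha) of their MLE probability
    d_total = sum(doc_counts.values())
    if term in doc_counts:
        return (1 - alpha) * doc_counts[term] / d_total
    # unseen terms: share the reserved alpha in proportion to the corpus model
    unseen_mass = sum(p for t, p in corpus_probs.items() if t not in doc_counts)
    if unseen_mass == 0:
        return 0.0
    return alpha * corpus_probs.get(term, 0.0) / unseen_mass

doc = {"a": 1, "b": 1}
corpus = {"a": 0.5, "b": 0.3, "c": 0.2}
```

Unlike interpolation, probabilities of seen terms are only discounted, not mixed, and the result still sums to 1 over the vocabulary.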

18
Are there other sources of probability mass?
  • Document Clusters
  • Document Classes
  • User Profiles
  • Topic models

19
Review
  • What is wrong with 0 probabilities?
  • How does smoothing fix it?
  • What is smoothing really doing?
  • What is Interpolation?
  • What is that mixture model really representing?
  • What can we use to mix with the document model?

20
Bored yet? Let's do something complicated
  • Entropy - Information Theory
  • H(X) = -Σx p(x) log p(x)
  • Good for data compression
  • Relative Entropy
  • D(p||q) = Σx p(x) log (p(x) / q(x))
  • Not a true distance measure
  • Used to find differences between probability
    models
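The two formulas above, sketched directly in code (base-2 logarithms assumed, per information-theory convention; the distributions are hypothetical):

```python
import math

def entropy(p):
    # H(X) = -sum_x p(x) * log2 p(x), in bits
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def relative_entropy(p, q):
    # D(p || q) = sum_x p(x) * log2(p(x) / q(x))
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

fair = {"h": 0.5, "t": 0.5}
biased = {"h": 0.8, "t": 0.2}
```

Note that D(p||q) and D(q||p) generally differ, which is why the slide calls it "not a true distance measure."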

21
OK, that's nice
  • What does relative entropy give us?
  • Why not just subtract probabilities?
  • On your calculators, calculate
  • p(x) log (p(x) / q(x)) for
  • p(x) = 0.8, q(x) = 0.6
  • p(x) = 0.6, q(x) = 0.4
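For checking against your calculator, the same single-term contribution in code (base-2 logs are an assumption; the slide does not fix a base):

```python
import math

def kl_term(p, q):
    # single-term contribution p(x) * log2(p(x) / q(x)) to relative entropy
    return p * math.log2(p / q)

a = kl_term(0.8, 0.6)
b = kl_term(0.6, 0.4)
```

Both pairs differ by the same 0.2 in absolute terms, yet the contributions differ: the log of the ratio, not the raw gap, is what drives relative entropy. That is the answer to "why not just subtract probabilities?"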

22
Clarity Score
  • Calculate the relative entropy between the result
    set and the corpus
  • Positive correlation between high clarity score /
    relative entropy and query performance
  • So what is that actually saying?

23
Relative Entropy Query Expansion
  • Relevance Feedback
  • Blind Relevance Feedback
  • Expand query with terms that contribute the most
    to relative entropy
  • What are we doing to the query when we do this?
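A hedged sketch of the expansion step: rank candidate terms by their per-term contribution to the relative entropy between a (blind) feedback model and the corpus model. All distributions below are hypothetical:

```python
import math

def top_expansion_terms(feedback_probs, corpus_probs, k=2):
    # score each term by its contribution p(t|F) * log2(p(t|F) / p(t|C))
    contrib = {
        t: p * math.log2(p / corpus_probs[t])
        for t, p in feedback_probs.items()
        if p > 0 and corpus_probs.get(t, 0) > 0
    }
    return sorted(contrib, key=contrib.get, reverse=True)[:k]

# toy feedback-document model vs. corpus model
feedback = {"neural": 0.4, "the": 0.3, "network": 0.3}
corpus = {"neural": 0.01, "the": 0.5, "network": 0.02}
```

Terms that are common in the feedback documents but rare in the corpus score highest, while common stopword-like terms score low or negative, so expansion sharpens the query toward the topic of the top-ranked documents.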

24
Controlled Query Generation
  • Some of my research
  • p(x) log (p(x)/q(x)) is a good term
    discrimination function
  • Regulate the construction of queries for
    evaluating retrieval algorithms
  • First real controlled reaction experiments with
    retrieval algorithms

25
Review
  • Who is the father of Information Theory?
  • What is Entropy?
  • What is Relative Entropy?
  • What is the Clarity Score?
  • What are the terms that contribute the most to
    relative entropy?
  • Are they useful?

26
You have been a good class
  • Introduced to the language model for information
    retrieval
  • Documents represented as multinomial
    distributions
  • Generative model
  • Queries are generated
  • Smoothing
  • Applications in IR

27
Questions for me?
28
Questions for you
  • What is the Maximum Likelihood Estimate?
  • Why is smoothing important?
  • What is interpolation?
  • What is entropy?
  • What is relative entropy?
  • Does language modeling make sense?