Integrating Word Relationships into Language Models

1
Integrating Word Relationships into Language
Models
  • Guihong Cao, Jian-Yun Nie, Jing Bai
  • Département d'Informatique et de Recherche
    Opérationnelle, Université de Montréal

Presenter: Chia-Hao Lee
2
Outline
  • Introduction
  • Previous Work
  • A Dependency Model to Combine WordNet and
    Co-occurrence
  • Parameter estimation
  • Estimating conditional probabilities
  • Estimating mixture weights
  • Experiments
  • Conclusion and future work

3
Introduction
  • In recent years, language models for information
    retrieval (IR) have increased in popularity.
  • The basic idea is to compute the conditional
    probability P(q|d) of generating the query q
    from a document d.
  • In most approaches, the computation is
    conceptually decomposed into two distinct steps
  • (1) Estimating the document model θ_d
  • (2) Computing the query likelihood P(q|θ_d)
    using the estimated document model
    (a small sketch of both steps follows below)
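The following is a minimal Python sketch of this two-step decomposition (an illustration of mine, not from the slides); it uses plain maximum-likelihood estimates and no smoothing.

    from collections import Counter

    def estimate_document_model(doc_tokens):
        # Step (1): a unigram document model theta_d by maximum likelihood
        counts = Counter(doc_tokens)
        total = len(doc_tokens)
        return {w: c / total for w, c in counts.items()}

    def query_likelihood(query_tokens, doc_model):
        # Step (2): P(q|theta_d), assuming independent query terms
        p = 1.0
        for q in query_tokens:
            p *= doc_model.get(q, 0.0)  # unsmoothed; real systems smooth this
        return p

    doc = "the program runs on the computer".split()
    print(query_likelihood(["computer", "program"], estimate_document_model(doc)))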

4
Introduction (cont.)
  • When estimating the document model, the words in
    the document are assumed to be independent of
    one another, leading to the so-called
    bag-of-words model.
  • However, from our own knowledge of natural
    language, we know that the assumption of term
    independence is a matter of mathematical
    convenience rather than a reality.
  • For example, the words computer and program
    are not independent. A query asking for
    computer might well be satisfied by a document
    about program.

5
Introduction (cont.)
  • Some studies have been carried out to relax the
    independence assumption.
  • The first direction is data-driven, capturing
    dependencies among terms with statistical
    information derived directly from the corpus.
  • Another direction is to exploit hand-crafted
    thesauri, such as WordNet.

6
Previous Work
  • In the classical language modeling approach to
    IR, a multinomial model P(w|θ_d) over terms is
    estimated for each document d in the collection
    to be indexed and searched.
  • In most cases, each query term is assumed to be
    independent of the others, so the query
    likelihood is estimated by
    P(q|θ_d) = Π_i P(q_i|θ_d).
  • After the specification of a document prior
    P(d), the posterior probability of a document is
    given by P(d|q) ∝ P(q|θ_d) P(d)
    (a small sketch of this ranking follows below).
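As an illustration only (not from the slides), this ranking can be written in Python; the use of Jelinek-Mercer smoothing with the collection model and of a uniform document prior are assumptions of mine, since the slides do not fix them for the baseline. coll_counts and coll_total are the term counts and total token count of the collection.

    import math
    from collections import Counter

    def ql_score(query, doc, coll_counts, coll_total, lam=0.5):
        # log P(d|q) = log P(d) + sum_i log P(q_i|theta_d) + const.
        # Jelinek-Mercer smoothing with the collection model (assumption).
        doc_counts, doc_len = Counter(doc), len(doc)
        score = 0.0  # uniform prior P(d): constant, dropped from the score
        for q in query:
            p_doc = doc_counts[q] / doc_len if doc_len else 0.0
            p_col = coll_counts.get(q, 0) / coll_total
            score += math.log(max(lam * p_doc + (1 - lam) * p_col, 1e-12))
        return score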

7
Previous Work (cont.)
  • However, the classical language modeling approach
    to IR does not address the problem of dependence
    between words.
  • The term dependence may mean two different
    things
  • Dependence between words within a query or within
    a document
  • Dependence between query words and document words
  • Under the first meaning, one may try to recognize
    the relationships between words in a sentence.
  • Under the second meaning, dependence means any
    relationship that can be exploited during query
    evaluation.

8
Previous Work (cont.)
  • To incorporate term relationships into the
    document language model, previous work proposed a
    translation model t(q_i|w).
  • With the translation model, the document-to-query
    model becomes P(q_i|θ_d) = Σ_w t(q_i|w) P(w|θ_d)
    (a small sketch follows below).
  • Even though their model is more general than
    other language models, it is difficult to
    determine the translation probability t(q_i|w)
    in practice.
  • To solve this problem, an artificial collection
    of synthetic data is generated for training, by
    assuming that a sentence is parallel to the
    paragraph that contains it.
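A minimal sketch of the document-to-query computation above (my own illustration); doc_model is the unigram distribution P(w|θ_d), e.g. as estimated earlier, and trans_table is a hypothetical, pre-trained translation table with trans_table[w][q] = t(q|w).

    def translation_likelihood(query, doc_model, trans_table):
        # P(q_i|theta_d) = sum_w t(q_i|w) * P(w|theta_d)
        p = 1.0
        for q in query:
            p_qi = sum(trans_table.get(w, {}).get(q, 0.0) * p_w
                       for w, p_w in doc_model.items())
            p *= p_qi
        return p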

9
A Dependency Model to Combine WordNet and
Co-occurrence
  • Given a query q and a document d, they can be
    related directly, or indirectly through some
    word relationships.
  • An example of the first case is that the document
    and the query contain the same words.
  • In the second case, the document can contain a
    different word that is synonymous with or related
    to a word in the query.

10
A Dependency Model to Combine WordNet and
Co-occurrence (cont.)
  • In order to take both cases into our modeling, we
    assume that there are two sources generating a
    term from a document, one from a dependency model
    and another from a non-dependency model.
  • The document model thus has two parameter sets,
    one for the dependency model and one for the
    non-dependency model.

11
A Dependency Model to Combine WordNet and
Co-occurrence (cont.)
  • The non-dependency model tries to capture the
    direct generation of the query by the document,
    so we can model it by the unigram document model
    P_UG(q_i|θ_d).
  • For the dependency model, we first select a term
    w in the document randomly.
  • Second, a query term is generated based on the
    observed term. Therefore we have
    P_DEP(q_i|d) = Σ_w P(q_i|w) P(w|d)

12
A Dependency Model to Combine WordNet and
Co-occurrence (cont.)
  • As for the translation model, we also have the
    problem of estimating the dependency between two
    terms, i.e. P(q_i|w).
  • To address the problem, we assume that some word
    relationships have been manually identified and
    stored in a linguistic resource, and some other
    relationships have to be found automatically
    according to co-occurrences.

13
A Dependency Model to Combine WordNet and
Co-occurrence (cont.)
  • So, this combination can be achieved by linear
    interpolation smoothing: P(q_i|w) is interpolated
    between a WordNet-based model and a model of the
    automatically derived relationships.
  • In our study, we only consider co-occurrence
    information besides WordNet.
  • So, the second component is just the
    co-occurrence model.

14
A Dependency Model to Combine WordNet and
Co-occurrence (cont.)
  • For simplicity of expression, we denote the link
    model as P_L(q_i|w) and the co-occurrence model
    as P_CO(q_i|w).
  • Substituting Equations 4 and 5 into Equation 3,
    we obtain Equation 6.

15
A Dependency Model to Combine WordNet and
Co-occurrence (cont.)






16
A Dependency Model to Combine WordNet and
Co-occurrence (cont.)
  • The idea becomes more obvious if we make some
    simplifications in the formula.
  • We then obtain Equation 9, consisting of a link
    model, a co-occurrence model, and a unigram
    model.
17
A Dependency Model to Combine WordNet and
Co-occurrence (cont.)
  • Let λ_L, λ_CO, λ_UG denote the respective weights
    of the link model, co-occurrence model, and
    unigram model.
  • Then Equation 9 can be rewritten as
    P(q_i|d) = λ_L P_L(q_i|d) + λ_CO P_CO(q_i|d)
    + λ_UG P_UG(q_i|d)
    (a small sketch of this scoring follows below).
  • For information retrieval, the most important
    terms are nouns. So, we concentrate on three
    relations related to nouns: synonym, hypernym,
    and hyponym.

NSLM
SLM
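A minimal sketch (my own, not from the slides) of scoring a query with this three-component mixture; p_link, p_cooc and p_unigram stand for the component probabilities P_L, P_CO and P_UG and are assumed to be given as callables.

    import math

    def nslm_score(query, doc, weights, p_link, p_cooc, p_unigram):
        # P(q_i|d) = lam_L*P_L(q_i|d) + lam_CO*P_CO(q_i|d) + lam_UG*P_UG(q_i|d)
        lam_l, lam_co, lam_ug = weights       # the weights should sum to 1
        score = 0.0
        for q in query:
            p = (lam_l * p_link(q, doc)
                 + lam_co * p_cooc(q, doc)
                 + lam_ug * p_unigram(q, doc))
            score += math.log(max(p, 1e-12))  # log of the query likelihood
        return score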
18
Parameter estimation
  • 1. Estimating conditional probabilities
  • For the unigram model P_UG(q_i|d), we use the
    MLE estimate, smoothed by interpolated absolute
    discounting (a small sketch follows below)

(related to D)
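A sketch of the interpolated absolute-discount estimate named above (the standard form; the paper's exact notation may differ). coll_prob is assumed to be a callable returning the collection probability P(w|C).

    from collections import Counter

    def unigram_abs_discount(term, doc, coll_prob, delta=0.7):
        # P(w|d) = max(c(w,d) - delta, 0)/|d| + sigma_d * P(w|C),
        # where sigma_d = delta * |unique(d)| / |d| redistributes the
        # discounted probability mass to the collection model.
        counts, dlen = Counter(doc), len(doc)
        if dlen == 0:
            return coll_prob(term)
        discounted = max(counts[term] - delta, 0.0) / dlen
        sigma_d = delta * len(counts) / dlen
        return discounted + sigma_d * coll_prob(term)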
19
Parameter estimation (cont.)
  • P(w|d) can be approximated by the maximum
    likelihood probability P_ML(w|d).
  • This approximation is motivated by the fact that
    the word w is primarily generated from d in a
    way quite independent of the model θ_d.
  • We also need to estimate P_L(q_i|w), the
    probability of a link between two words
    according to WordNet.

20
Parameter estimation (cont.)
  • Equation 13 defines our estimation of P_L(q_i|w)
    by interpolated absolute discounting.
21
Parameter estimation (cont.)
  • The estimation of the components of the
    co-occurrence model P_CO(q_i|w) is similar to
    that of the link model, except that when counting
    the co-occurrence frequency, the requirement of
    having a link in WordNet is removed (a small
    sketch of the two counting schemes follows below).
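A sketch of the two counting schemes (my own illustration); is_linked_in_wordnet is a hypothetical predicate standing in for the WordNet synonym/hypernym/hyponym check, and windows is an iterable of token windows taken from the corpus.

    from collections import defaultdict
    from itertools import combinations

    def cooccurrence_counts(windows, is_linked_in_wordnet=None):
        # Count how often two terms co-occur within a text window.
        # With is_linked_in_wordnet given, only WordNet-linked pairs are
        # counted (link model); otherwise all pairs are counted
        # (plain co-occurrence model).
        counts = defaultdict(int)
        for window in windows:
            for w1, w2 in combinations(sorted(set(window)), 2):
                if is_linked_in_wordnet is None or is_linked_in_wordnet(w1, w2):
                    counts[(w1, w2)] += 1
                    counts[(w2, w1)] += 1
        return counts

The counts would then be normalized and smoothed (e.g. with the interpolated absolute discounting above) to obtain P_L(q_i|w) and P_CO(q_i|w).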

22
Parameter estimation (cont.)
  • 2.Estimating mixture weights
  • We introduce an EM algorithm to
    estimate the mixture weights in NSLM.
  • Because NSLM is a three-component
    mixture model, the optimal weights should
    maximize the likelihood of the queries.
  • Let λ_L, λ_CO, and λ_UG be the mixture weights;
    we then have the query likelihood to be maximized
    with respect to them.

23
Parameter estimation (cont.)
  • However, some documents having high weights are
    not truly relevant to the query. They contain
    noise.
  • To account for the noise, we further assume that
    there are two distinct sources generating the
    query.
  • One is the relevant documents, and the other is a
    noisy source, which is approximated by the
    collection C.

24
Parameter estimation (cont.)
  • With this setting, the hidden variables and the
    mixture weights can be estimated using the EM
    algorithm.
  • The update formulas are as follows (sketched
    below)
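A generic sketch of EM updates for the three mixture weights (my own illustration; the exact formulas on the original slide were in an image, and the noisy collection source described above is omitted here for brevity).

    def em_mixture_weights(samples, p_link, p_cooc, p_unigram, iters=20):
        # samples: a flat list of (query term, document) pairs.
        lam = [1 / 3, 1 / 3, 1 / 3]
        for _ in range(iters):
            resp = [0.0, 0.0, 0.0]
            for term, doc in samples:
                comps = [p_link(term, doc), p_cooc(term, doc),
                         p_unigram(term, doc)]
                mix = [l * c for l, c in zip(lam, comps)]
                z = sum(mix) or 1e-12
                for k in range(3):                 # E-step: responsibilities
                    resp[k] += mix[k] / z
            lam = [r / max(len(samples), 1) for r in resp]  # M-step
        return lam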

25
Experiments
We evaluated our model described in the previous
sections using three different TREC collections:
WSJ, AP, and SJM.
26
Experiments (cont.)
27
Conclusion and future work
  • In this paper, we integrate word relationships
    into the language modeling framework.
  • We used an EM algorithm to train the parameters.
    This method worked well in our experiments.