Title: Integrating Word Relationships into Language Models
1 Integrating Word Relationships into Language Models
- Guihong Cao, Jian-Yun Nie, Jing Bai
- Département d'Informatique et de Recherche Opérationnelle, Université de Montréal
- Presenter: Chia-Hao Lee
2 Outline
- Introduction
- Previous Work
- A Dependency Model to Combine WordNet and Co-occurrence
- Parameter estimation
  - Estimating conditional probabilities
  - Estimating mixture weights
- Experiments
- Conclusion and future work
3 Introduction
- In recent years, language models for information retrieval (IR) have increased in popularity.
- The basic idea is to compute the conditional probability $P(q|d)$ of a query $q$ given a document $d$.
- In most approaches, the computation is conceptually decomposed into two distinct steps:
  - (1) Estimating the document model
  - (2) Computing the query likelihood using the estimated document model
4 Introduction (cont.)
- When estimating the document model, the words in the document are assumed to be independent of one another, leading to the so-called bag-of-words model.
- However, from our own knowledge of natural language, we know that the assumption of term independence is a matter of mathematical convenience rather than a reality.
- For example, the words "computer" and "program" are not independent. A query asking for "computer" might well be satisfied by a document about "program".
5 Introduction (cont.)
- Some studies have been carried out to relax the independence assumption.
- The first direction is data-driven: it tries to capture dependencies among terms using statistical information derived directly from the corpus.
- Another direction is to exploit hand-crafted thesauri, such as WordNet.
6 Previous Work
- In the classical language modeling approach to IR, a multinomial model over terms is estimated for each document in the collection to be indexed and searched.
- In most cases, each query term is assumed to be independent of the others, so the query likelihood is estimated by $P(q|d) = \prod_{i} P(q_i|d)$.
- After the specification of a document prior $P(d)$, the posterior probability of a document is given by $P(d|q) \propto P(q|d)\,P(d)$.
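To make the two steps concrete, here is a minimal sketch of query-likelihood ranking. The Jelinek-Mercer smoothing, the uniform prior, and all names are illustrative assumptions of this sketch, not the paper's implementation.

```python
import math
from collections import Counter

def collection_model(docs):
    """P(w|C): maximum-likelihood model over the whole collection."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    return lambda w: counts[w] / total

def log_posterior(query, doc, p_c, lam=0.7, log_prior=0.0):
    """log P(d|q) = log P(d) + sum_i log P(q_i|d), up to a query constant.

    Step (1): estimate the document model, here with Jelinek-Mercer smoothing:
        P(q_i|d) = lam * c(q_i; d)/|d| + (1 - lam) * P(q_i|C)
    Step (2): accumulate the query log-likelihood under term independence.
    """
    counts, dlen = Counter(doc), len(doc)
    score = log_prior
    for q in query:
        p = lam * counts[q] / dlen + (1 - lam) * p_c(q)
        score += math.log(p) if p > 0 else float("-inf")
    return score

docs = [["computer", "program", "code"], ["weather", "rain", "cloud"]]
p_c = collection_model(docs)
print(sorted(docs, key=lambda d: log_posterior(["computer"], d, p_c), reverse=True))
```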
7 Previous Work (cont.)
- However, the classical language modeling approach to IR does not address the problem of dependence between words.
- Term dependence may mean two different things:
  - Dependence between words within a query or within a document
  - Dependence between query words and document words
- Under the first meaning, one may try to recognize the relationships between words in a sentence.
- Under the second meaning, dependence means any relationship that can be exploited during query evaluation.
8 Previous Work (cont.)
- To incorporate term relationships into the document language model, a translation model $t(q_i|w)$ has been proposed.
- With the translation model, the document-to-query model becomes $P(q_i|d) = \sum_{w \in d} t(q_i|w)\,P(w|d)$.
- Even though this model is more general than other language models, it is difficult to determine the translation probability $t(q_i|w)$ in practice.
- To solve this problem, an artificial collection of synthetic data is generated for training, by assuming that a sentence is parallel to the paragraph that contains it.
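A sketch of how the translation model changes the scoring; the toy translation table `t` holds hypothetical values purely for illustration, since training it is precisely the difficulty noted above.

```python
from collections import Counter

def translation_likelihood(q_term, doc, t):
    """P(q_i|d) = sum_{w in d} t(q_i|w) * P(w|d), with P(w|d) as the MLE."""
    counts, dlen = Counter(doc), len(doc)
    return sum(t.get((q_term, w), 0.0) * c / dlen for w, c in counts.items())

# Toy translation table (hypothetical values): self-translation dominates,
# with some probability mass moved to a related word.
t = {("computer", "computer"): 0.8, ("computer", "program"): 0.15}
print(translation_likelihood("computer", ["program", "code", "program"], t))
```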
9 A Dependency Model to Combine WordNet and Co-occurrence
- Given a query q and a document d, they can be related directly, or they can be related indirectly through some word relationships.
- An example of the first case is that the document and the query contain the same words.
- In the second case, a document can contain a different word that is synonymous with, or related to, a word in the query.
10 A Dependency Model to Combine WordNet and Co-occurrence (cont.)
- In order to take both cases into our modeling, we assume that there are two sources to generate a term from a document: one is a dependency model and another is a non-dependency model:
  $P(q_i|d) = \lambda\,P(q_i|\theta_R) + (1-\lambda)\,P(q_i|\theta_{\bar{R}})$
  - $\theta_R$: the parameter of the dependency model
  - $\theta_{\bar{R}}$: the parameter of the non-dependency model
11 A Dependency Model to Combine WordNet and Co-occurrence (cont.)
- The non-dependency model tries to capture the direct generation of the query by the document; we can model it by the unigram document model:
  $P(q_i|\theta_{\bar{R}}) = P_{ML}(q_i|d)$
- For the dependency model, we first select a term $w$ in the document at random; then a query term is generated based on the observed term. Therefore we have
  $P(q_i|\theta_R) = \sum_{w \in d} P_R(q_i|w)\,P(w|d)$
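The two-source generation above can be summarized in a few lines of illustrative Python; `p_r` stands in for the relationship model $P_R(q_i|w)$ estimated later in the talk.

```python
from collections import Counter

def term_score(q_term, doc, p_r, lam=0.3):
    """P(q_i|d) = lam * sum_w P_R(q_i|w) P(w|d) + (1 - lam) * P_ML(q_i|d)."""
    counts, dlen = Counter(doc), len(doc)
    p_unigram = counts[q_term] / dlen                  # non-dependency source
    p_dep = sum(p_r.get((q_term, w), 0.0) * c / dlen   # dependency source:
                for w, c in counts.items())            # relate q_i to each w in d
    return lam * p_dep + (1 - lam) * p_unigram

p_r = {("computer", "program"): 0.2}   # hypothetical relationship probability
print(term_score("computer", ["program", "code"], p_r))
```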
12 A Dependency Model to Combine WordNet and Co-occurrence (cont.)
- As for the translation model, we also have the problem of estimating the dependency between two terms, i.e. $P_R(q_i|w)$.
- To address the problem, we assume that some word relationships have been manually identified and stored in a linguistic resource, and some other relationships have to be found automatically according to co-occurrences.
13 A Dependency Model to Combine WordNet and Co-occurrence (cont.)
- So, this combination can be achieved by a linear interpolation smoothing. Thus
  $P_R(q_i|w) = \alpha\,P(q_i|w, \mathrm{WN}) + (1-\alpha)\,P(q_i|w, \mathrm{CO})$
- In our study, we only consider co-occurrence information besides WordNet.
- So, $P(q_i|w, \mathrm{CO})$ is just the co-occurrence model.
14 A Dependency Model to Combine WordNet and Co-occurrence (cont.)
- For simplicity of expression, we denote the probability of the link model as $P_L(q_i|w)$, i.e. $P_L(q_i|w) = P(q_i|w, \mathrm{WN})$, and the co-occurrence model as $P_{CO}(q_i|w)$.
- Substituting Equations 4 and 5 into 3, we obtain Equation 6 (next slide).
15 A Dependency Model to Combine WordNet and Co-occurrence (cont.)
- Equation 6:
  $P(q_i|d) = \lambda \sum_{w \in d} \big[\alpha\,P_L(q_i|w) + (1-\alpha)\,P_{CO}(q_i|w)\big]\,P(w|d) + (1-\lambda)\,P(q_i|\theta_{\bar{R}})$
16 A Dependency Model to Combine WordNet and Co-occurrence (cont.)
- The idea becomes more obvious if we make some simplifications in the formula. So, we can get
  $P(q_i|d) = \lambda\alpha \sum_{w \in d} P_L(q_i|w)\,P(w|d) + \lambda(1-\alpha) \sum_{w \in d} P_{CO}(q_i|w)\,P(w|d) + (1-\lambda)\,P(q_i|\theta_{\bar{R}})$
- a linear mixture consisting of the link model, the co-occurrence model, and the unigram model.
17 A Dependency Model to Combine WordNet and Co-occurrence (cont.)
- Let $\lambda_1$, $\lambda_2$, $\lambda_3$ denote the respective weights of the link model, the co-occurrence model, and the unigram model.
- Then Equation 9 can be rewritten as
  $P(q_i|d) = \lambda_1 \sum_{w \in d} P_L(q_i|w)\,P(w|d) + \lambda_2 \sum_{w \in d} P_{CO}(q_i|w)\,P(w|d) + \lambda_3\,P_{ML}(q_i|d)$, with $\lambda_1 + \lambda_2 + \lambda_3 = 1$.
- For information retrieval, the most important terms are nouns. So, we concentrate on three relations related to nouns: synonym, hypernym, and hyponym.
- The resulting three-component mixture is referred to as NSLM; SLM denotes the baseline language model.
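A sketch of scoring with the final three-component model. The component models are passed in as functions, and the weights shown are placeholders to be learned by the EM procedure described later; all names here are illustrative assumptions.

```python
import math

def nslm_log_likelihood(query, doc_terms, p_l, p_co, p_ml_w, p_ml_q,
                        lam=(0.1, 0.2, 0.7)):
    """log P(q|d) under the three-component mixture:
       P(q_i|d) = l1 * sum_w P_L(q_i|w) P(w|d)
                + l2 * sum_w P_CO(q_i|w) P(w|d)
                + l3 * P_ML(q_i|d),     with l1 + l2 + l3 = 1.
    """
    l1, l2, l3 = lam
    score = 0.0
    for q in query:
        link = sum(p_l(q, w) * p_ml_w(w) for w in doc_terms)
        cooc = sum(p_co(q, w) * p_ml_w(w) for w in doc_terms)
        p = l1 * link + l2 * cooc + l3 * p_ml_q(q)
        score += math.log(p) if p > 0 else float("-inf")
    return score
```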
18 Parameter estimation
- 1. Estimating conditional probabilities
- For the unigram model $P(q_i|d)$, we use the MLE estimate, smoothed by interpolated absolute discounting, that is
  $P(q_i|d) = \frac{\max(c(q_i; d) - \delta,\, 0)}{|d|} + \frac{\delta\,|d|_u}{|d|}\,P(q_i|C)$
  where $c(q_i; d)$ is the count of $q_i$ in $d$, $|d|$ is the document length, $|d|_u$ is the number of distinct terms in $d$, $\delta$ is the discount constant, and $P(q_i|C)$ is the collection model.
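A direct transcription of this smoothing formula into illustrative Python:

```python
from collections import Counter

def absolute_discount(doc, p_c, delta=0.7):
    """Return P(.|d) smoothed by interpolated absolute discounting:
       P(w|d) = max(c(w; d) - delta, 0)/|d| + delta * |d|_u / |d| * P(w|C).
    """
    counts, dlen = Counter(doc), len(doc)
    uniq = len(counts)                                   # |d|_u: distinct terms
    return lambda w: (max(counts[w] - delta, 0.0) / dlen
                      + delta * uniq / dlen * p_c(w))

p = absolute_discount(["computer", "program", "program"], lambda w: 0.01)
print(p("program"), p("weather"))   # seen vs. unseen term
```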
19 Parameter estimation (cont.)
- For $P(w|d)$, it can be approximated by the maximum likelihood probability $P_{ML}(w|d)$.
- This approximation is motivated by the fact that the word $w$ is primarily generated from $d$ in a way quite independent of the dependency model.
- The estimation of $P_L(q_i|w)$: the probability of a link between two words according to WordNet.
20 Parameter estimation (cont.)
- Equation 13 defines our estimation of $P_L(q_i|w)$ by interpolated absolute discounting:
  $P_L(q_i|w) = \frac{\max(c_L(q_i, w) - \delta,\, 0)}{\sum_{q'} c_L(q', w)} + \frac{\delta\,N_w}{\sum_{q'} c_L(q', w)}\,P(q_i|C)$
  where $c_L(q_i, w)$ counts the co-occurrences of $q_i$ and $w$ that are connected by a WordNet link, and $N_w$ is the number of distinct terms linked with $w$.
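A sketch of how the link counts $c_L(q_i, w)$ behind Equation 13 could be collected, assuming NLTK's WordNet interface and a fixed co-occurrence window (both are assumptions of this sketch, not details given in the talk):

```python
from collections import Counter
from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet")

def wordnet_neighbors(word):
    """Lemmas linked to `word` by synonymy, hypernymy, or hyponymy (nouns)."""
    related = set()
    for s in wn.synsets(word, pos=wn.NOUN):
        for syn in [s] + s.hypernyms() + s.hyponyms():
            related.update(name.lower() for name in syn.lemma_names())
    related.discard(word)
    return related

def link_counts(tokens, window=5):
    """c_L(q_i, w): co-occurrences within the window, kept only when the
    pair is connected by one of the three WordNet links."""
    c_l = Counter()
    for i, w in enumerate(tokens):
        neighbors = wordnet_neighbors(w)
        for q in tokens[max(0, i - window): i + window + 1]:
            if q in neighbors:
                c_l[(q, w)] += 1
    return c_l
```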
21 Parameter estimation (cont.)
- The estimation of the components of the co-occurrence model $P_{CO}(q_i|w)$ is similar to that of the link model, except that when counting the co-occurrence frequency, the requirement of having a link in WordNet is removed.
22 Parameter estimation (cont.)
- 2. Estimating mixture weights
- We introduce an EM algorithm to estimate the mixture weights in NSLM.
- Because NSLM is a three-component mixture model, the optimal weights should maximize the likelihood of the queries.
- Let $\Lambda = (\lambda_1, \lambda_2, \lambda_3)$ be the mixture weights; we then have
  $\Lambda^* = \arg\max_{\Lambda} \prod_{q_i \in q} P(q_i|d)$
23 Parameter estimation (cont.)
- However, some documents having high weights are not truly relevant to the query; they contain noise.
- To account for the noise, we further assume that there are two distinct sources to generate the query: one is the relevant documents; the other is a noisy source, which is approximated by the collection C:
  $P(q_i) = \gamma\,P(q_i|d) + (1-\gamma)\,P(q_i|C)$, where $\gamma$ weights the relevant source against the noise.
24 Parameter estimation (cont.)
- With this setting, the hidden $\Lambda$ and $\gamma$ can be estimated using the EM algorithm.
- The update formulas follow the standard mixture-weight EM form: the E-step computes the posterior probability that each query term was generated by each component, and the M-step re-estimates each weight as the average of these posteriors.
25 Experiments
- We evaluated the model described in the previous sections using three different TREC collections: WSJ, AP, and SJM.
26 Experiments (cont.)
27 Conclusion and future work
- In this paper, we integrate word relationships into the language modeling framework.
- We used an EM algorithm to train the parameters. This method worked well in our experiments.