1. Using Markov Chains to Exploit Word Relationships in Information Retrieval
- Guihong Cao, Jian-Yun Nie and Jing Bai
- Department of Computer Science and Operations Research, University of Montreal
2. Outline
- Introduction
- Statistical Language Models (SLMs) for IR
- Smoothing of Language Models
- Previous work to use word relationships for IR
- General Language Model based on Markov Chains
- Conclusion and Future Work
3. SLM for IR
- Model relevance in two ways
- Query likelihood with respect to a language model estimated from the document
- KL-divergence (cross entropy) between the query model and the document model
- Retrieval problem: document/query model estimation
- Smoothing is an important problem: avoid zero probabilities
- Interpolation between the MLE document model and the collection model, i.e., P(qi|D) = α·Pml(qi|D) + (1−α)·Pml(qi|C)
- Collection model: a language model estimated from the whole document collection
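The interpolation smoothing above can be sketched as follows; the function name, the default α = 0.7, and the toy token lists in the usage note are illustrative assumptions, not from the slides:

```python
from collections import Counter

def smoothed_prob(term, doc_tokens, coll_tokens, alpha=0.7):
    """Jelinek-Mercer-style smoothing: interpolate the MLE document model
    with the MLE collection model so unseen terms get nonzero probability.
    P(qi|D) = alpha * Pml(qi|D) + (1 - alpha) * Pml(qi|C)."""
    doc_counts = Counter(doc_tokens)
    coll_counts = Counter(coll_tokens)
    p_doc = doc_counts[term] / len(doc_tokens)     # MLE from the document
    p_coll = coll_counts[term] / len(coll_tokens)  # MLE from the collection
    return alpha * p_doc + (1 - alpha) * p_coll
```

With a document {tsunami, ocean} and a collection that also contains "disaster", the query term "disaster" receives mass from the collection component instead of probability zero.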
4. Effect of smoothing?
- D = {Tsunami, ocean, Asia}, Q = {natural disaster}
- Smoothing = probability redistribution
- Redistribution uniformly / according to the collection (also to unrelated terms)
- (Figure: probability mass over the terms Tsunami, ocean, Asia, computer, disaster)
5. Desired effect
- Using Tsunami → disaster
- Knowledge-based / semantic smoothing
- Relationships between terms
- (Figure: probability mass over the terms Tsunami, ocean, Asia, computer, disaster)
6. Outline
- Introduction
- Previous work to use word relationships for IR
- Document Expansion
- Query Expansion
- Limitations
- General Language Model based on Markov Chains
- Conclusion and Future Work
7. Document Expansion
- Inference from a document term to a different query term
- Translation model
- Inference: w → qi
- Key issue: estimate the translation probability t(qi|w)
- Estimation of the translation model
- Translation model (Berger et al., 1999): IBM1 with synthesized data for training
- Title language model (Jin et al., 2002): a title is viewed as a query relevant to the document; train the translation model with IBM1 on document-title pairs
- Nature: co-occurrence
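The document-expansion step above scores a query term by summing translation probabilities over document terms, p(q|D) = Σw t(q|w)·Pml(w|D). A minimal sketch; the function name and the toy translation table in the test are my assumptions:

```python
from collections import Counter

def translation_prob(query_term, doc_tokens, t):
    """Document expansion via a translation model:
    p(q|D) = sum over w of t(q|w) * Pml(w|D).
    `t` maps pairs (query_term, doc_term) to translation probabilities."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    return sum(t.get((query_term, w), 0.0) * c / n for w, c in counts.items())
```

A document containing "tsunami" can then contribute probability to the query term "disaster" even though "disaster" never occurs in it.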
8. Query Expansion
- Inference from one query term to a new query term sharing more terms with the document
- Using word relationships
- Use co-occurrence and information flow for query expansion (Bai et al., 2005): co-occurrence information
- Use WordNet (Voorhees, 1994; Liu et al., 2004): semantic information
- Pseudo-relevance feedback (Xu et al., 1996)
- Treat the top-n documents from a first retrieval as relevant and update the query model based on these documents
- Significant improvement
- Opposite inference direction to document expansion
- The two should be complementary
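The pseudo-relevance feedback update above can be sketched as a simple mixture of the original query model and a model of the top-ranked documents; the function name and the weight β = 0.5 are illustrative assumptions:

```python
from collections import Counter

def prf_query_model(query_tokens, feedback_docs, beta=0.5):
    """Pseudo-relevance feedback: interpolate the MLE query model with an
    MLE model built from the top-n first-retrieval documents, which are
    treated as relevant.  Returns a dict mapping terms to probabilities."""
    q = Counter(query_tokens)
    fb = Counter(tok for d in feedback_docs for tok in d)
    nq, nf = sum(q.values()), sum(fb.values())
    vocab = set(q) | set(fb)
    return {w: beta * q[w] / nq + (1 - beta) * fb[w] / nf for w in vocab}
```

Terms that appear only in the feedback documents (here "disaster") enter the expanded query model with nonzero weight.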
9. Limitations of previous work
- Limited to one kind of method for extracting word relationships: statistical methods or semantic methods
- Deals with only one of document expansion and query expansion
- The opposite inference processes are complementary
- Limited to one-step inference using direct relationships
- e.g., contract → agree and agree → negotiation ⇒ contract → negotiation
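Multi-step inference such as contract → agree → negotiation amounts to summing over intermediate terms in the relationship graph. A minimal two-step sketch; the nested-dict representation and the toy probabilities are my assumptions:

```python
def two_step(trans, source, target):
    """Two-step inference over word relationships: chain direct relations
    through every intermediate term, p2(target|source) =
    sum over mid of p(mid|source) * p(target|mid).
    `trans[a][b]` holds the one-step probability p(b|a)."""
    return sum(p_mid * trans.get(mid, {}).get(target, 0.0)
               for mid, p_mid in trans.get(source, {}).items())
```

Even when no direct contract → negotiation relation exists, the chained path through "agree" yields a nonzero two-step probability.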
10. Outline
- Introduction
- Previous work to use word relationships for IR
- General Language Model based on Markov Chains
- Model Description
- Parameter Estimation
- Experiments
- Conclusion and Future Work
11. Model Description
- General model combining document expansion and query expansion
- Document ranking formula: the negative cross-entropy between the expanded document model and the query model
- Special cases: document expansion; query expansion
- A Markov Chain (MC) is a mathematical tool for multi-step inference
- Represent the expanded document/query model by the stationary distribution of the corresponding MC
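The negative cross-entropy ranking formula above can be sketched directly; the dict representation of the models and the epsilon floor for unseen terms are my assumptions:

```python
import math

def score(query_model, doc_model, epsilon=1e-12):
    """Rank documents by negative cross-entropy between the query model
    and the (expanded, smoothed) document model:
    score(Q, D) = sum over w of P(w|Q) * log P(w|D).
    Higher (less negative) scores mean a better match."""
    return sum(p * math.log(doc_model.get(w, epsilon))
               for w, p in query_model.items() if p > 0)
```

A document model that gives the query terms more probability mass receives a higher score, which is why expanding the document model toward related terms helps.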
12. Illustration of the General Model
- Query expansion and document expansion are opposite inference processes
- The processes are complementary
- Query expansion: query → document
- Document expansion: document → query
13. Why use MCs?
- Document/query expansion corresponds to an MC: words → states; word relationships → state transition probabilities
- The stationary distribution of an MC is an ideal representation for a query/document
- MC stationary distributions have been used in PageRank (Brin and Page, 1998) and PP-attachment resolution (Toutanova et al., 2004)
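One standard way to approximate the stationary distribution the slides refer to is power iteration, as in PageRank. This sketch assumes a nested-dict transition matrix and is a generic illustration, not the authors' exact formulation:

```python
def stationary(trans, init, steps=200):
    """Power iteration toward the stationary distribution of a Markov chain:
    repeatedly apply p'(w) = sum over v of p(v) * P(w|v) until the
    distribution stops changing.  `trans[v][w]` is the transition
    probability from state v (a word) to state w."""
    p = dict(init)
    for _ in range(steps):
        nxt = {}
        for v, pv in p.items():
            for w, pw in trans[v].items():
                nxt[w] = nxt.get(w, 0.0) + pv * pw
        p = nxt
    return p
```

For an ergodic chain the result no longer depends on the initial distribution, which is why the stationary distribution serves as a stable expanded representation.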
14. The Process to Generate a Query/Document
- (Figure illustrating the process of generating a query)
15. Parameter Estimation
- Three kinds of parameters
- Initial distributions of the query/document expansion models
- Transition probabilities of the query/document expansion models
- Coefficients of the query/document expansion models
- Parameters for document expansion vs. query expansion
- Different initial distributions: P0(wi|D) vs. P0(wi|Q)
- Different state transition probabilities: P(wi|wj,D) vs. P(wi|wj,Q)
- Similar methods to estimate the coefficients: global optimization methods
16. Initial Distributions P0(wi|Q) and P0(wi|D)
- Query model
- The prior distribution of query terms
- Mixture model: interpolate the original query model with the pseudo-relevance feedback model
- Document model
- Interpolate the document model with the collection model
17. Transition Probabilities P(wi|wj,Q) and P(wi|wj,D)
- Query model
- Transition probability = word relationship
- Feedback documents are more informative
- Estimated from feedback documents
- Estimated from the whole collection
- Document model
- Assume the transition probability is independent of the document
18. Defining Word Relationships
- Combine different word relationships (resources) via language model smoothing
- A probabilistic framework makes it possible to estimate the parameters automatically
- Here: combine co-occurrence (statistical information) and WordNet (semantic information) in a two-component mixture model
- Pco is the co-occurrence model and PWN is the WordNet model
- The two models are complementary
- The mixture weight is estimated automatically according to specific contexts
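The two-component mixture above, P(wi|wj) = λ·Pco(wi|wj) + (1−λ)·PWN(wi|wj), can be sketched as follows; the function name and the toy probabilities are my assumptions:

```python
def combined_transition(wj, wi, p_co, p_wn, lam):
    """Two-component mixture for word relationships:
    P(wi|wj) = lam * Pco(wi|wj) + (1 - lam) * Pwn(wi|wj),
    mixing co-occurrence (statistical) and WordNet (semantic) evidence.
    `p_co` and `p_wn` are nested dicts: model[wj][wi] -> probability."""
    return (lam * p_co.get(wj, {}).get(wi, 0.0)
            + (1 - lam) * p_wn.get(wj, {}).get(wi, 0.0))
```

Because λ is a free parameter of a probabilistic model, it can be tuned automatically per context rather than fixed by hand.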
19. Illustration of Different Word Relationships
- (Figure: word wi connected to word wj by a co-occurrence relation, a WordNet relation, and other relationships)
20. Defining Word Relationships (Cont.)
- Estimating the co-occurrence model
- Terms co-occur within a window (8 words)
- Interpolated absolute discount smoothing
- Estimating the WordNet model
- Terms should co-occur within a window (one paragraph)
- Terms should be linked in WordNet
- Interpolated absolute discount smoothing
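The raw statistic behind the co-occurrence model is a count of term pairs within a sliding window (8 words in the slides). A minimal counting sketch; the function name is my assumption, and smoothing of the counts is left out:

```python
from collections import Counter

def cooccurrence_counts(tokens, window=8):
    """Count ordered co-occurrence pairs (w, v) where v follows w within
    `window` words.  These counts would then be normalized and smoothed
    (e.g., with interpolated absolute discounting) to form Pco(v|w)."""
    counts = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            counts[(w, v)] += 1
    return counts
```

The WordNet model would use the same counting over a one-paragraph window, but keep only pairs that are linked in WordNet.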
21. Estimating Coefficients
- Query expansion model
- Combining global and local probabilities
- Combining relations
- Document expansion model
- Combining relations
- Global optimization method to maximize Mean Average Precision on training data: the Simulated Annealing algorithm
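The slides name Simulated Annealing but give no details, so the following is a generic SA sketch over coefficients in [0, 1]; the function name, schedule, and the toy objective standing in for Mean Average Precision are all my assumptions:

```python
import math
import random

def anneal(objective, x0, step=0.1, t0=1.0, cooling=0.98, iters=500, seed=0):
    """Simulated annealing over a coefficient vector: propose a small
    perturbation, always accept improvements, and accept worse moves with
    probability exp(delta / T) so the search can escape local optima.
    `objective` plays the role of mean average precision on training data."""
    rng = random.Random(seed)
    x = list(x0)
    best_x, best_v = list(x0), objective(x0)
    cur_v, t = best_v, t0
    for _ in range(iters):
        # Perturb each coefficient, clipped to the valid range [0, 1].
        cand = [min(1.0, max(0.0, xi + rng.uniform(-step, step))) for xi in x]
        v = objective(cand)
        if v > cur_v or rng.random() < math.exp((v - cur_v) / t):
            x, cur_v = cand, v
            if v > best_v:
                best_x, best_v = cand, v
        t *= cooling  # geometric cooling schedule
    return best_x
```

SA is a sensible choice here because MAP is non-differentiable in the mixture coefficients, ruling out gradient-based optimizers.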
22. Outline
- Introduction
- Previous work to use word relationships for IR
- General Language Model based on Markov Chains
- Model Description
- Parameter Estimation
- Experiments
- Conclusion and Future Work
23. Experimental Setting
- Corpus: three TREC datasets
- WSJ, SJM and AP
- WordNet: WordNet 2.0
- Obtained from http://wordnet.princeton.edu/
- Metrics
- Mean average precision
- Recall
- T-test for statistical significance
24. Experimental Setting (Cont.)
25. Research Problems
- Does the MC model work?
- Experiments on query expansion
- Is a multi-step inference model better than a one-step inference model?
- Experiments on query expansion
- Does the general model (document expansion + query expansion) work?
26. Does the MC Model Work?
- UM: unigram model
- MixM: mixture model, an interpolation between the original model and the pseudo-relevance feedback model
- MC-QE: query expansion based on Markov Chains
27. Is Multi-step Inference Better than One-step Inference?
- Performance with various numbers of iterations
28. Does the General Model (Document Expansion + Query Expansion) Work?
- UM: unigram model
- QE: multi-step query expansion
- DE: one-step document expansion
- GM: general model combining DE and QE
29. Outline
- Introduction
- Previous work to use word relationships for IR
- General Language Model based on Markov Chains
- Conclusion and Future Work
30. Conclusion and Future Work
- Conclusion
- The general model combining query expansion and document expansion is superior to using either alone
- Incorporating multiple word relationships helps IR performance
- The weight of each component should be set appropriately (SA algorithm)
- The multi-step inference model is superior to the one-step inference model
- Future work
- Integrate more relationships: syntactic relationships
- Refine the initial distribution of document expansion with other similar documents: document clustering
31. Thanks!