Title: A Generative Retrieval Model for Structured Documents
1. A Generative Retrieval Model for Structured Documents
- Le Zhao, Jamie Callan
- Language Technologies Institute, School of Computer Science, Carnegie Mellon University
- Oct 2008
2. Background
- Structured documents
  - Author-edited fields
    - Library systems: title, meta-data of books
    - Web documents: HTML, XML
  - Automatic annotations
    - Part of Speech, Named Entity, Semantic Role
- Structured query
  - Human
  - Automatic
3. Example Structured Retrieval Queries
- XML element retrieval
  - NEXI query (Wikipedia XML)
    - a) //article[about(., music)]
    - b) //article[about(.//section, music)]//section[about(., pop)]
- Question Answering
  - Indri query (ASSERT style SRL annotation)
    - #combine[sentence]( #combine[target]( love #combine[./arg0]( #any:person ) #combine[./arg1]( Mary ) ) )
[Figure: an example Wikipedia XML article with sections about music and pop, and an SRL-annotated sentence "John loves Mary" with arg0 = John and arg1 = Mary.]
4. Motivation
- Basis: Language Model + Inference Net (the Indri search engine / Lemur)
  - Already supports field retrieval: indexing and retrieving relations between annotations
  - Flexible query language: new query forms can be tested promptly
- Main problems
  - Approximate matching (structure + keyword)
  - Evidence combination
- Extension from the keyword retrieval model
  - Approximate structure + keyword matching
  - Combining evidence from inner fields
- Goal: outperform keyword retrieval in precision, through
  - A coherent structured retrieval model: better understanding, better smoothing, guidance for query formulation
  - Finer control via accurate, robust structured queries
5. Roadmap
- Brief overview of Indri field retrieval
- Existing problems
- The generative structured retrieval model
  - Term and field level smoothing
  - Evidence combination alternatives
- Experiments
- Conclusions
6. Indri Document Retrieval
- #combine(iraq war)
- Scoring scope is the document
- Returns a list of scores for documents
- A language model is built from the scoring scope and smoothed with the collection model (a rough sketch follows).
- Because of smoothing, partial matches can also be returned.
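- As a rough illustration of this scoring (my own sketch of query likelihood with Dirichlet smoothing against the collection model, not Indri's implementation; the function name and mu value are assumptions):

    import math
    from collections import Counter

    def score_document(query_terms, doc_terms, collection_counts, collection_len, mu=2500):
        """Log query likelihood of a document, Dirichlet-smoothed with the collection model."""
        doc_counts = Counter(doc_terms)
        doc_len = len(doc_terms)
        score = 0.0
        for q in query_terms:
            p_coll = collection_counts.get(q, 0) / collection_len   # collection model P(q|C)
            p = (doc_counts[q] + mu * p_coll) / (doc_len + mu)      # smoothed P(q|D)
            score += math.log(p) if p > 0 else math.log(1e-12)      # guard against unseen terms
        return score

    # Example: #combine(iraq war) over a toy document and collection
    doc = "the iraq war began in 2003".split()
    coll = Counter("iraq war war news report the of in and 2003 peace".split())
    print(score_document(["iraq", "war"], doc, coll, sum(coll.values())))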
7. Indri Field Retrieval
- #combine[title](iraq war)
- Scoring scope is the title field
- Returns a list of scores for titles
- A language model is built from the scoring scope (title), smoothed with the document and collection models.
- Results on the Wikipedia collection (score, document, start, end, content):
    -1.19104  199636.xml   0  2  Iraq War
    -1.58453  1595906.xml  0  3  Anglo-Iraqi War
    -1.58734  14889.xml    0  3  Iran-Iraq War
    -1.87811  184613.xml   0  4  2003 Iraq war timeline
    -2.07668  2316643.xml  0  5  Canada and the Iraq War
    -2.09957  202304.xml   0  5  Protests against the Iraq war
    -2.23997  2656581.xml  0  6  Saddam's Trial and Iran-Iraq War
    -2.35804  1617564.xml  0  7  List of Iraq War Victoria Cross
- Because of smoothing, partial matches can also be returned.
8. Existing Problems
9. Evidence Combination
- Topic: a document with multiple sections about the Iraq war, which discusses Bush's exit strategy.
- #combine( #combine[section](iraq war) bush #1(exit strategy) )
  - The sections could return scores (0.2, 0.2, 0.1, 0.002) for one document
- Some options (see the sketch after this list):
  - max (Bilotti et al. 2007): only considers one match
  - or: favors many matches, even if they are weak
  - and: biased against many matches, even if they are good
  - average: favors many good matches, but is hurt by weak matches
- What about documents that don't contain a section element, but do have a lot of matching terms?
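- A minimal sketch (my own illustration of the four options over per-section match probabilities, not Indri's code):

    import math

    def combine(section_probs, method="avg"):
        """Combine per-section match probabilities (e.g. [0.2, 0.2, 0.1, 0.002]) into one document score."""
        if method == "max":        # only the best-matching section counts
            return max(section_probs)
        if method == "or":         # probability that at least one section matches
            return 1.0 - math.prod(1.0 - p for p in section_probs)
        if method == "and":        # probability that every section matches
            return math.prod(section_probs)
        if method == "avg":        # expected match probability under a uniform prior over sections
            return sum(section_probs) / len(section_probs)
        raise ValueError(method)

    scores = [0.2, 0.2, 0.1, 0.002]
    print({m: round(combine(scores, m), 4) for m in ["max", "or", "and", "avg"]})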
10. Evidence Combination Methods (1)
    AVG      OR       MAX      Result sentences
    -1.545   -0.5075  -0.5085  1) It must take measures.
    -1.881   -0.8467  -0.8520  2) U.S. investments worldwide could be in jeopardy if other countries take up similar measures.
    -2.349   -0.2425  -0.5030  3) Chanting the slogan "take measures before they take our measurements," the Greenpeace activists set up a coffin outside the ministry to draw attention to the deadly combination of atmospheric pollution and rising temperatures in Athens, which are expected to reach 42 degrees centigrade at the weekend.
    -2.401   -0.4817  -0.5012  4) SINGAPORE, May 14 (Xinhua) -- The Singapore government will take measures to discourage speculation in the private and tighten credit, particularly for foreigners, Deputy Prime Minister Lee Hsien Loong announced here today.
11. Bias Toward Short Fields
- Topic: Who loves Mary?
  #combine( #combine[target]( loves #combine[./arg0]( #any:person ) #combine[./arg1]( Mary ) ) )
- P_MLE(q_i | E) = count(q_i, E) / |E|
  - Produces very skewed scores when |E| is small
  - E.g., if |E| = 1, P_MLE(q_i | E) is either 0 or 1
- Biases toward #combine[target](loves)
  - The target field usually has length 1, while arg0/arg1 are longer
  - With Jelinek-Mercer smoothing, the ratio between having and not having a target match is larger than that for arg0/arg1 (illustrated in the sketch below)
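- A small numeric illustration (my own sketch; the lambda and collection probability values are assumptions) of why the length-1 target field dominates under Jelinek-Mercer smoothing:

    def jm_smoothed(count, field_len, p_collection, lam=0.6):
        """Jelinek-Mercer: lam * P_MLE(q|field) + (1 - lam) * P(q|collection)."""
        p_mle = count / field_len if field_len else 0.0
        return lam * p_mle + (1 - lam) * p_collection

    p_c = 0.001   # assumed collection probability of the query term "loves"

    # Ratio of the smoothed probability with a match vs. without one
    ratio_target = jm_smoothed(1, 1, p_c) / jm_smoothed(0, 1, p_c)   # length-1 target field: ~1500x
    ratio_arg    = jm_smoothed(1, 5, p_c) / jm_smoothed(0, 5, p_c)   # length-5 arg field:    ~300x

    print(ratio_target, ratio_arg)   # matching the short target field changes the score far more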
12. The Generative Structured Retrieval Model
- A new framework for structured retrieval
- A new term-level smoothing method
13. A New Framework
- #combine( #combine[section](iraq war) bush #1(exit strategy) )
- Query
  - Traditional: merely the sampled terms
  - New: specifies a graphical model, a generation process
- Scoring scope is the document
  - For one document, calculate the probability of the model
- Sections are used as evidence of relevance for the document
  - A section is a hidden variable in the graphical model
  - In general, inner fields are hidden and are used to score outer fields
  - Hidden variables are summed over to produce the final score
  - This averages the scores from the section fields (uniform prior over sections), as in the sketch below
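- A minimal sketch of this scoring (my own illustration under the slide's uniform-prior assumption; the mu value and function name are assumptions, not Indri's code):

    def score_with_hidden_sections(query_terms, sections, collection_model, mu=100):
        """P(Q|D) = sum over sections s of P(s) * prod_q P(q|s): a uniform prior over sections,
        with each section Dirichlet-smoothed against the collection model."""
        if not sections:
            return 0.0   # field-level smoothing (prior/empty fields) would handle this case
        prior = 1.0 / len(sections)              # uniform prior over the hidden section variable
        total = 0.0
        for section in sections:                 # each section is a list of terms
            p_q = 1.0
            for q in query_terms:
                p_coll = collection_model.get(q, 1e-6)
                p_q *= (section.count(q) + mu * p_coll) / (len(section) + mu)
            total += prior * p_q                 # sum the hidden variable out of the model
        return total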
14. A New Framework: Field-Level Smoothing
- Term level smoothing (traditional)
  - No section contains "iraq" or "war"
  - Add prior terms to the section: a Dirichlet prior from the collection model
- Field level smoothing (new)
  - No section field in the document: add prior fields
[Diagram: the smoothing hierarchy — terms in each section S (length |S|) are smoothed with the collection model P(w|C) via a Dirichlet prior µ, giving P(w|section, D, C) for the sections in document D.]
15. A New Framework: Advantages
- Soft matching of sections
  - Matches documents even without section fields
  - Prior fields; (Bilotti et al. 2007) called these empty fields
- Aggregation of all matching fields
  - P-OR, Max, ...: heuristics
  - From our generative model: Probabilistic-Average
16. Reduction to Keyword Likelihood Model
- Assume a term tag around each term in the collection
- Assume no document level smoothing (µ_d → inf, λ_d → 0)
- Then, no matter how many empty smoothing fields there are, the AVG model degenerates to the keyword retrieval model, in the following way:
  #combine( #combine[term]( u ) #combine[term]( v ) ) = #combine( u v )
  (the same collection level smoothing, Dirichlet or Jelinek-Mercer, is preserved)
17. Term Level Smoothing Revisited
- Two-level Jelinek-Mercer (traditional)
- Equivalently (a more general parameterization),
- Two-level Dirichlet (new) (see the formulas sketched below)
- Corrects Jelinek-Mercer's bias toward shorter fields
  - The relative gain of matching is independent of field length
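- For reference, the standard forms of these two smoothing schemes (a reconstruction using the parameter names from the optimal-parameter table; the exact parameterization on the original slide may differ):
  - Two-level Jelinek-Mercer: P(w|F,D,C) = λ1 * P_MLE(w|F) + λ2 * P_MLE(w|D) + (1 − λ1 − λ2) * P(w|C)
  - Two-level Dirichlet: P(w|D,C) = (c(w;D) + µc * P(w|C)) / (|D| + µc), then P(w|F,D,C) = (c(w;F) + µd * P(w|D,C)) / (|F| + µd), where F is the field being scored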
18. Experiments
- Smoothing
- Evidence combination methods
19. Datasets
- XML retrieval
  - INEX 06, 07 (Wikipedia collection)
  - Goal: evaluate evidence combination (and smoothing)
  - Topics (modified): element retrieval → document retrieval, e.g. #combine( #combine[section](wifi security) )
  - Assessments (modified): any element relevant → document relevant
  - Smoothing parameters
    - Trained on INEX 06, 62 topics
    - Tested on INEX 07, 60 topics
- Question Answering
  - TREC 2002
  - AQUAINT Corpus
  - Topics
    - Training: 55 original topics -> 367 relevant sentences (new topics)
    - Test: 54 original topics -> 250 relevant sentences (new topics)
  - For example:
    Question: Who loves Mary?
    Relevant sentence: John says he loves Mary
    Query: #combine[target]( love #combine[./arg1](Mary) )
  - Relevance feedback setup, stronger than (Bilotti et al. 2007)
20Effects of 2-level Dirichlet smoothing
Table 3. A comparison of two-level
Jelinek-Mercer and two-level Dirichlet smoothing
on the INEX and QA datasets.
Structured query Structured query Structured query Keyword query Keyword query Keyword query
Collections Metric 2-level Jelinek-Mercer 2-level Dirichlet Improvement 2-level Jelinek-Mercer 2-level Dirichlet Improvement
INEX06 (training) MRR P_at_10 MAP 0.73020.46940.2900 0.78720.47100.2927 7.8060.34090.9310 0.70170.47900.2918 0.75600.50810.2956 7.7386.0751.302
INEX07 (test) MRR P_at_10 MAP 0.70610.45170.2838 0.73860.46330.2871 4.6032.5681.163 0.65150.44330.2839 0.77340.46670.2979 18.715.2794.931
QA (training) MRR P_at_10 MAP 0.63710.22530.1402 0.67290.23710.1460 5.6195.2374.137 0.52110.20980.1651 0.52270.23360.1755 0.307011.346.299
QA (test) MRR P_at_10 MAP 0.51280.16840.1634 0.51380.17640.1623 0.19504.751-0.6732 0.16490.08080.1197 0.17910.08080.1189 8.6110.0000-0.6683
significance level lt 0.04, significance
level lt 0.002, significance level lt 0.00001
21. Optimal Smoothing Parameters
    Datasets  Queries     Jelinek-Mercer     Dirichlet
                          λ1      λ2         µd      µc
    INEX      Keyword     0.6     0.2        any     800
    INEX      Structured  0.7     0.1        100     800
    QA        Keyword     0.6     0.2        10      1000
    QA        Structured  0.7     0.1        5       50
- Optimization with grid search (see the sketch below)
- The optimal values for Dirichlet are related to the average length of the fields being queried
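- A minimal sketch of such a parameter sweep (my own illustration; the function names and value ranges are assumptions, not those used in the experiments):

    import itertools

    def grid_search(score_fn, param_grid):
        """Exhaustive grid search; score_fn evaluates one parameter setting (e.g. MAP on training topics)."""
        best_params, best_score = None, float("-inf")
        for values in itertools.product(*param_grid.values()):
            params = dict(zip(param_grid.keys(), values))
            score = score_fn(**params)
            if score > best_score:
                best_params, best_score = params, score
        return best_params, best_score

    # e.g. for two-level Dirichlet:
    # grid_search(evaluate_map, {"mu_d": [5, 10, 100, 1000], "mu_c": [50, 800, 1000, 2500]})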
22Evidence Combination Methods
Structured query Metric AVG MAX OR Keyword query
INEX 06 (training) MRR P_at_10 MAP 0.75600.50810.2956 0.78180.50650.3094 0.80930.46610.2914 0.85380.52740.3538
INEX 07 (test) MRR P_at_10 MAP 0.77340.46670.2979 0.75520.45830.3006 0.73860.46330.2871 0.81290.47830.3463
QA (training) MRR P_at_10 MAP 0.42650.17250.1251 0.64520.23730.1501 0.07620.03130.0225 0.52270.23360.1755
QA (test) MRR P_at_10 MAP 0.39470.14280.1264 0.51380.17520.1617 0.07010.03120.0364 0.17910.08080.1189
- For QA, MAX is best
- For INEX
- Evaluation at document level does not discount
irrelevant text portions - Not clear which combination method performs best
23. Better Evaluation for INEX Datasets
- NDCG
  - Assumptions
    - The degree of relevance is somehow given
    - The user spends a similar amount of effort on each document, and effort decreases with log-rank
- With more informative element level judgments
  - Degree of relevance for a document: relevance density
    - The proportion of relevant text (in bytes) in the document
  - Discount lower ranked relevant documents
    - Not by the number of documents ranked ahead
    - But by the length (in bytes) of the text ranked ahead
  - This effectively discounts irrelevant text ranked ahead (see the sketch after this list)
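- A rough sketch of a gain/discount computed this way (my own illustration; the exact discount function and the 1000-byte effort unit are assumptions, not the formula from the slides):

    import math

    def byte_discounted_dcg(ranked_docs):
        """ranked_docs: list of (relevant_bytes, total_bytes) per retrieved document, in rank order.
        Gain = relevance density; the discount grows with the bytes of text ranked ahead."""
        dcg = 0.0
        bytes_ahead = 0
        for relevant_bytes, total_bytes in ranked_docs:
            density = relevant_bytes / total_bytes           # degree of relevance for the document
            discount = math.log2(2 + bytes_ahead / 1000.0)   # discount by text shown so far, not by rank
            dcg += density / discount
            bytes_ahead += total_bytes
        return dcg                                           # NDCG would divide by the ideal ranking's value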
24Measuring INEX topics with NDCG
Structured retrieval Structured retrieval AVG AVG MAX MAX OR OR Keyword query Keyword query
Smoothing method Smoothing method J-M1 Dir1 J-M Dir J-M Dir J-M Dir
INEX 06 (training) NDCG_at_10 NDCG_at_20 NDCG_at_30 0.48210.54900.5847 0.50170.56690.5889 0.43200.50120.5184 0.44780.51640.5388 0.40480.46710.4903 0.42380.48570.5091 0.45550.53760.5816 0.52810.59770.6514
INEX 06 (training) AvgLen_at_10 AvgLen_at_20 AvgLen_at_30 5818.65327.45140.0 6268.85621.55228.1 8296.67715.27654.7 7981.17708.27086.6 126351122011144 12361104599792.4 5138.95106.35063.1 9906.88717.48065.9
INEX 07 (test) NDCG_at_10 NDCG_at_20 NDCG_at_30 0.47550.54050.5788 0.52910.59740.6390 0.45550.51410.5574 0.47940.55090.5893 0.49840.55370.5895 0.48270.53620.5729 0.46520.55700.6212 0.50630.57970.6388
INEX 07 (test) AvgLen_at_10 AvgLen_at_20 AvgLen_at_30 3819.64135.33875.3 5218.74704.14493.9 6488.66233.75955.0 6943.86576.16066.1 107129759.28853.6 109009553.58668.3 3464.83367.13493.9 8736.77784.57228.8
- p lt 0.007 between AVG and MAX or AVG and OR
- No significant difference between AVG and keyword!
25. Error Analysis for INEX06 Queries and Correcting INEX07 Queries
- Two changes (looking only at the training set)
  - Semantic mismatch with the topic (mainly the keyword query) (22/70)
    - Lacking alternative fields: image → image, figure
    - Wrong AND/OR semantics: (astronaut AND cosmonaut) → (astronaut OR cosmonaut)
    - Misspellings: VanGogh → Van Gogh
    - Over-restricted query terms using phrases: #1(origin of universe) → #uw4(origin universe)
  - All article restrictions → whole document (34/70)
- Proofreading test (INEX07) queries
  - Retrieval results of the queries are not referenced in any way
  - Only looked at the keyword query and topic description
26. Performance after query correction
- df = 30; p < 0.006 for NDCG@10, p < 0.0004 for NDCG@20, p < 0.002 for NDCG@30
27. Conclusions
- A structured query specifies a generative model for P(Q|D); model parameters are estimated from D; rank D by P(Q|D)
- The best evidence combination strategy is task dependent
- Dirichlet smoothing corrects the bias toward short fields, and outperforms Jelinek-Mercer
- Guidance for structured query formulation
- Robust structured queries can outperform keyword queries
28. Acknowledgements
- Paul Ogilvie
- Matthew Bilotti
- Eric Nyberg
- Mark Hoy
29Thanks!