A Generative Retrieval Model for Structured Documents

1
A Generative Retrieval Model for Structured
Documents
  • Le Zhao, Jamie Callan
  • Language Technologies Institute, School of
    Computer Science, Carnegie Mellon University
  • Oct 2008

2
Background
  • Structured documents
  • Author-edited fields
  • Library systems: titles, metadata of books
  • Web documents: HTML, XML
  • Automatic annotations
  • Part of speech, named entities, semantic roles
  • Structured queries
  • Human-written
  • Automatically generated

3
Example Structured Retrieval Queries
  • XML element retrieval
  • NEXI query (Wikipedia XML)
  • a) //article[about(., music)]
    b) //article[about(.//section, music)]//section[about(., pop)]
  • Question answering
  • Indri query (ASSERT-style SRL annotation)
  • #combine[sentence]( #combine[target]( love
    #combine[./arg0]( #any:person )
    #combine[./arg1]( Mary ) ) )

[Figure: example XML articles with sections containing "music" and "pop" (illustrating the NEXI queries), and an SRL-annotated sentence "John loves Mary" with target, arg0, and arg1 extents in a document D (illustrating the Indri query).]
4
Motivation
  • Basis: language model + inference network (the Indri
    search engine / Lemur toolkit)
  • Already supports field retrieval: indexing and
    retrieving relations between annotations
  • Flexible query language: new query forms can be
    tested promptly
  • Main problems
  • Approximate matching (structure + keyword)
  • Evidence combination
  • Extension from the keyword retrieval model
  • Approximate structure and keyword matching
  • Combining evidence from inner fields
  • Goal: outperform keyword retrieval in precision, through
  • A coherent structured retrieval model → better
    understanding
  • Better smoothing; guidance for query formulation
  • Finer control via accurate, robust structured
    queries

5
Roadmap
  • Brief overview of Indri field retrieval
  • Existing problems
  • The generative structured retrieval model
  • Term- and field-level smoothing
  • Evidence combination alternatives
  • Experiments
  • Conclusions

6
Indri Document Retrieval
  • combine(iraq war)
  • Scoring scope is document
  • Return a list of scores for docs
  • Language Model built from Scoring scope, smoothed
    with the collection model.
  • Because of smoothing partial matches can also be
    returned.
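A minimal sketch of the scoring this slide describes: query likelihood under a document language model smoothed with the collection model. Indri's default smoothing is Dirichlet; the exact form written here is illustrative.

  P(Q \mid D) \;=\; \prod_{q_i \in Q} \frac{c(q_i, D) + \mu \, P(q_i \mid C)}{|D| + \mu}

Because P(q_i | C) > 0, a document that matches only some of the query terms still receives a nonzero score, which is why partial matches are returned.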

7
Indri Field Retrieval
  • #combine[title](iraq war)
  • Scoring scope is the title
  • Return a list of scores for titles
  • Language model built from the scoring scope (title),
    smoothed with the document and collection models
  • Results on the Wikipedia collection
    (score, document number, start, end, content):
    -1.19104  199636.xml   0  2  Iraq War
    -1.58453  1595906.xml  0  3  Anglo-Iraqi War
    -1.58734  14889.xml    0  3  Iran-Iraq War
    -1.87811  184613.xml   0  4  2003 Iraq war timeline
    -2.07668  2316643.xml  0  5  Canada and the Iraq War
    -2.09957  202304.xml   0  5  Protests against the Iraq war
    -2.23997  2656581.xml  0  6  Saddam's Trial and Iran-Iraq War
    -2.35804  1617564.xml  0  7  List of Iraq War Victoria Cross
  • Because of smoothing, partial matches can also be
    returned.

8
Existing Problems
9
Evidence Combination
  • Topic: a document with multiple sections about the
    Iraq war that discusses Bush's exit strategy
  • #combine( #combine[section](iraq war)
    bush #1(exit strategy) )
  • The sections could return scores
    (0.2, 0.2, 0.1, 0.002) for one document
  • Some options (sketched below)
  • max (Bilotti et al 2007): only considers one match
  • or: favors many matches, even if they are weak matches
  • and: biased against many matches, even if they are
    good matches
  • average: favors many good matches, but is hurt by
    weak matches
  • What about documents that don't contain a section
    element,
  • but do have a lot of matching terms?
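A minimal sketch of these combination alternatives, assuming each section score has already been turned into a probability; the function names are mine, not Indri operators.

  import math
  from statistics import mean

  def combine_max(p): return max(p)                           # keep only the best match
  def combine_or(p):  return 1 - math.prod(1 - x for x in p)  # rewards many matches, even weak ones
  def combine_and(p): return math.prod(p)                     # one weak match drags everything down
  def combine_avg(p): return mean(p)                          # good matches help, weak matches hurt

  section_scores = [0.2, 0.2, 0.1, 0.002]  # example scores from this slide
  for f in (combine_max, combine_or, combine_and, combine_avg):
      print(f.__name__, round(f(section_scores), 4))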

10
Evidence Combination Methods (1)
AVG     OR       MAX      Result sentences
-1.545  -0.5075  -0.5085  1) It must take ... measures.
-1.881  -0.8467  -0.8520  2) U.S. investments worldwide could be in jeopardy if other countries take up similar measures.
-2.349  -0.2425  -0.5030  3) Chanting the slogan "take ... measures before they take ... our measurements," the Greenpeace activists set up a coffin outside the ministry to draw attention to the deadly combination of atmospheric pollution and rising temperatures in Athens, which are expected to reach 42 degrees centigrade at the weekend.
-2.401  -0.4817  -0.5012  4) SINGAPORE, May 14 (Xinhua) -- The Singapore government will take measures to discourage speculation in the private and tighten credit, particularly for foreigners, Deputy Prime Minister Lee Hsien Loong announced here today.
11
Bias toward short fields
  • Topic: Who loves Mary?
    #combine( #combine[target]( Loves
    #combine[./arg0]( #any:person )
    #combine[./arg1]( Mary ) ) )
  • P_MLE(q_i | E) = count(q_i) / |E|
  • Produces very skewed scores when |E| is small
  • E.g., if |E| = 1, P_MLE(q_i | E) is either 0 or 1
  • Biases toward #combine[target](Loves)
  • target is usually of length 1; arg0/arg1 are longer
  • The ratio between having and not having a target
    match is larger than that of arg0/arg1, with
    Jelinek-Mercer smoothing (worked out below)
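To see the bias, work out that ratio under single-level Jelinek-Mercer smoothing, P(w | E) = λ P_MLE(w | E) + (1 - λ) P(w | C); this derivation is my illustration, not copied from the slide. For a field E containing one occurrence of w versus none:

  \frac{P(w \mid E,\ \text{match})}{P(w \mid E,\ \text{no match})}
  \;=\; \frac{\lambda \frac{1}{|E|} + (1-\lambda)\, P(w \mid C)}{(1-\lambda)\, P(w \mid C)}
  \;=\; 1 + \frac{\lambda}{|E|\,(1-\lambda)\, P(w \mid C)}

The gain from a match grows as |E| shrinks, so a length-1 target field dominates the longer arg0/arg1 fields.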

12
The Generative Structured Retrieval model
  • A new framework for structured retrieval
  • A new term-level smoothing method

13
A New Framework
  • #combine( #combine[section](iraq war) bush
    #1(exit strategy) )
  • Query
  • Traditional: merely the sampled terms
  • New: specifies a graphical model, i.e. a generation
    process
  • Scoring scope is the document
  • For each document, calculate the probability under
    the model
  • Sections are used as evidence of relevance for
    the document
  • Each section is a hidden variable in the graphical model
  • In general, inner fields are hidden, and used to
    score outer fields
  • Hidden variables are summed over to produce the final
    score (in symbols below)
  • Averaging the scores from section fields
    (uniform prior over sections)
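In symbols, my reconstruction of the summation this slide describes, with Q_sec the part of the query aimed at section fields (here, "iraq war") and S_D the sections of document D:

  P(Q_{sec} \mid D) \;=\; \sum_{s \in S_D} P(s \mid D)\, P(Q_{sec} \mid s, D, C)
  \;=\; \frac{1}{|S_D|} \sum_{s \in S_D} P(Q_{sec} \mid s, D, C)

The uniform prior P(s | D) = 1/|S_D| is what makes this the probabilistic average of the section scores.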

14
A New Framework: Field-Level Smoothing
  • Term-level smoothing (traditional)
  • No section contains "iraq" or "war" →
    add prior terms to the section (Dirichlet prior
    from the collection model)
  • Field-level smoothing (new)
  • No section field in the document → add prior fields
    (one possible reading sketched below)

[Diagram: term level, the terms in a section S are smoothed with P(w|C) via a Dirichlet prior µ, giving P(w | section, D, C); field level, the sections in D.]
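One way to read "add prior fields" (an assumption of mine, not stated on the slide): a document with few or no section fields is padded with prior (empty) sections that are scored under the document/collection model, so the section average remains defined, e.g.

  P(Q_{sec} \mid D) \;\approx\; \frac{1}{|S_D| + n_0}
  \Big( \sum_{s \in S_D} P(Q_{sec} \mid s, D, C) \;+\; n_0 \, P(Q_{sec} \mid D, C) \Big)

where n_0 is the number of prior fields. The exact form used in the model may differ.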
15
A New Framework: Advantages
  • Soft matching of sections
  • Matches documents even without section fields
  • Prior fields; (Bilotti et al 2007) called these
    empty fields
  • Aggregation of all matching fields
  • Heuristics: P-OR, Max, ...
  • From our generative model: Probabilistic-Average

16
Reduction to Keyword Likelihood Model
  • Assume a term tag around each term in the collection
  • Assume no document-level smoothing (µ_d → ∞, λ_d → 0);
    then, no matter how many empty smoothing fields there
    are, the AVG model degenerates to the keyword
    retrieval model, in the following way:
  • #combine( #combine[term]( u )
    #combine[term]( v ) )  ≡  #combine( u v )
  • (the same collection-level smoothing,
    Dirichlet/Jelinek-Mercer, is preserved)

17
Term Level Smoothing Revisited
  • Two-level Jelinek-Mercer (traditional)
  • Equivalently (a more general parameterization)
  • Two-level Dirichlet (new)
    (both forms reconstructed below)
  • Corrects J-M's bias toward shorter fields
  • The relative gain of matching becomes independent of
    field length
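The equations themselves did not survive this transcript; the forms below are reconstructed from the parameter names (λ1, λ2, µ_d, µ_c) used on the optimal-smoothing-parameters slide and may differ in detail from the originals. For a term w in field S of document D, with collection C:

  Two-level Jelinek-Mercer:
  P(w \mid S, D, C) \;=\; \lambda_1 P_{MLE}(w \mid S) + \lambda_2 P_{MLE}(w \mid D)
  + (1 - \lambda_1 - \lambda_2)\, P(w \mid C)

  Two-level Dirichlet:
  P(w \mid S, D, C) \;=\; \frac{c(w, S) + \mu_d \, P(w \mid D, C)}{|S| + \mu_d},
  \qquad
  P(w \mid D, C) \;=\; \frac{c(w, D) + \mu_c \, P(w \mid C)}{|D| + \mu_c}

Under the Dirichlet form, the ratio between a single match and no match in field S is 1 + 1/(µ_d · P(w | D, C)), which does not depend on |S|, so short fields no longer get an outsized boost.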

18
Experiments
  • Smoothing
  • Evidence combination methods

19
Datasets
  • XML retrieval
  • INEX 06, 07 (Wikipedia collection)
  • Goal: evaluate evidence combination (and smoothing)
  • Topics (modified): element retrieval → document
    retrieval, e.g. #combine( #combine[section](wifi
    security) )
  • Assessments (modified): any element relevant →
    document relevant
  • Smoothing parameters
  • Trained on INEX 06, 62 topics
  • Tested on INEX 07, 60 topics
  • Question answering
  • TREC 2002
  • AQUAINT corpus
  • Topics
  • Training: 55 original topics → 367 relevant
    sentences (new topics)
  • Test: 54 original topics → 250 relevant
    sentences (new topics)
  • For example, Question: who loves Mary?
    Relevant sentence: John says he loves Mary.
    Query: #combine[target]( love
    #combine[./arg1](Mary) )
  • Relevance feedback setup, stronger than (Bilotti
    et al 2007)

20
Effects of 2-level Dirichlet smoothing
Table 3. A comparison of two-level Jelinek-Mercer and two-level
Dirichlet smoothing on the INEX and QA datasets.

                              Structured query                        Keyword query
Collection         Metric   2-lvl J-M  2-lvl Dir  Improv.(%)   2-lvl J-M  2-lvl Dir  Improv.(%)
INEX06 (training)  MRR      0.7302     0.7872     7.806        0.7017     0.7560     7.738
                   P@10     0.4694     0.4710     0.3409       0.4790     0.5081     6.075
                   MAP      0.2900     0.2927     0.9310       0.2918     0.2956     1.302
INEX07 (test)      MRR      0.7061     0.7386     4.603        0.6515     0.7734     18.71
                   P@10     0.4517     0.4633     2.568        0.4433     0.4667     5.279
                   MAP      0.2838     0.2871     1.163        0.2839     0.2979     4.931
QA (training)      MRR      0.6371     0.6729     5.619        0.5211     0.5227     0.3070
                   P@10     0.2253     0.2371     5.237        0.2098     0.2336     11.34
                   MAP      0.1402     0.1460     4.137        0.1651     0.1755     6.299
QA (test)          MRR      0.5128     0.5138     0.1950       0.1649     0.1791     8.611
                   P@10     0.1684     0.1764     4.751        0.0808     0.0808     0.0000
                   MAP      0.1634     0.1623     -0.6732      0.1197     0.1189     -0.6683
Significance levels: < 0.04, < 0.002, < 0.00001.
21
Optimal Smoothing Parameters
Datasets  Queries     Jelinek-Mercer (λ1, λ2)   Dirichlet (µ_d, µ_c)
INEX      Keyword     0.6, 0.2                  any, 800
INEX      Structured  0.7, 0.1                  100, 800
QA        Keyword     0.6, 0.2                  10, 1000
QA        Structured  0.7, 0.1                  5, 50
  • Optimization with grid search
  • Optimal values for Dirichlet are related to the
    average length of the fields being queried

22
Evidence Combination Methods
Structured-query runs (AVG, MAX, OR) vs the keyword query:

Collection          Metric   AVG      MAX      OR       Keyword
INEX 06 (training)  MRR      0.7560   0.7818   0.8093   0.8538
                    P@10     0.5081   0.5065   0.4661   0.5274
                    MAP      0.2956   0.3094   0.2914   0.3538
INEX 07 (test)      MRR      0.7734   0.7552   0.7386   0.8129
                    P@10     0.4667   0.4583   0.4633   0.4783
                    MAP      0.2979   0.3006   0.2871   0.3463
QA (training)       MRR      0.4265   0.6452   0.0762   0.5227
                    P@10     0.1725   0.2373   0.0313   0.2336
                    MAP      0.1251   0.1501   0.0225   0.1755
QA (test)           MRR      0.3947   0.5138   0.0701   0.1791
                    P@10     0.1428   0.1752   0.0312   0.0808
                    MAP      0.1264   0.1617   0.0364   0.1189
  • For QA, MAX is best
  • For INEX
  • Evaluation at the document level does not discount
    irrelevant text portions
  • It is not clear which combination method performs best

23
Better Evaluation for INEX Datasets
  • NDCG
  • Assumptions
  • Degree of relevance is somehow given
  • The user spends a similar amount of effort on each
    document, and effort decreases with log-rank
  • With more informative element-level judgments
  • Degree of relevance for a document = relevance
    density
  • The proportion of relevant text (in bytes) in
    the document
  • Discount lower-ranked relevant documents
  • Not by the number of docs ranked ahead
  • But by the length (in bytes) of the text ranked ahead
  • Effectively discounts irrelevant text ranked ahead
    (see the sketch below)
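A minimal sketch of this evaluation idea, under my own assumptions about the exact gain and discount functions (the slide does not give them): relevance density as the gain, and a logarithmic discount driven by the bytes ranked ahead rather than by rank.

  import math

  def byte_discounted_dcg(ranked, judged):
      """ranked: doc ids in ranked order.
      judged: dict doc_id -> (relevant_bytes, total_bytes)."""
      bytes_ahead = 0
      dcg = 0.0
      for doc in ranked:
          rel_bytes, total_bytes = judged.get(doc, (0, 1))
          gain = rel_bytes / total_bytes                   # relevance density of this document
          discount = math.log2(2 + bytes_ahead / 1024.0)   # discount by text ranked ahead (assumed form)
          dcg += gain / discount
          bytes_ahead += total_bytes
      return dcg

  def byte_discounted_ndcg(ranked, judged):
      # ideal ordering approximated by sorting on density (also an assumption)
      ideal = sorted(judged, key=lambda d: judged[d][0] / judged[d][1], reverse=True)
      idcg = byte_discounted_dcg(ideal, judged)
      return byte_discounted_dcg(ranked, judged) / idcg if idcg > 0 else 0.0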

24
Measuring INEX topics with NDCG
Structured retrieval (AVG, MAX, OR) vs keyword query (Kw), each with J-M and Dirichlet smoothing:

Collection          Metric     AVG J-M  AVG Dir  MAX J-M  MAX Dir  OR J-M   OR Dir   Kw J-M   Kw Dir
INEX 06 (training)  NDCG@10    0.4821   0.5017   0.4320   0.4478   0.4048   0.4238   0.4555   0.5281
                    NDCG@20    0.5490   0.5669   0.5012   0.5164   0.4671   0.4857   0.5376   0.5977
                    NDCG@30    0.5847   0.5889   0.5184   0.5388   0.4903   0.5091   0.5816   0.6514
                    AvgLen@10  5818.6   6268.8   8296.6   7981.1   12635    12361    5138.9   9906.8
                    AvgLen@20  5327.4   5621.5   7715.2   7708.2   11220    10459    5106.3   8717.4
                    AvgLen@30  5140.0   5228.1   7654.7   7086.6   11144    9792.4   5063.1   8065.9
INEX 07 (test)      NDCG@10    0.4755   0.5291   0.4555   0.4794   0.4984   0.4827   0.4652   0.5063
                    NDCG@20    0.5405   0.5974   0.5141   0.5509   0.5537   0.5362   0.5570   0.5797
                    NDCG@30    0.5788   0.6390   0.5574   0.5893   0.5895   0.5729   0.6212   0.6388
                    AvgLen@10  3819.6   5218.7   6488.6   6943.8   10712    10900    3464.8   8736.7
                    AvgLen@20  4135.3   4704.1   6233.7   6576.1   9759.2   9553.5   3367.1   7784.5
                    AvgLen@30  3875.3   4493.9   5955.0   6066.1   8853.6   8668.3   3493.9   7228.8
  • p < 0.007 between AVG and MAX, or AVG and OR
  • No significant difference between AVG and keyword!

25
Error Analysis for INEX06 Queries and Correcting
INEX07 Queries
  • Two changes (looking only at the training set)
  • Semantic mismatch with the topic (mainly the keyword
    query) (22/70)
  • Lacking alternative fields: image →
    image, figure
  • Wrong AND/OR semantics: (astronaut AND
    cosmonaut) → (astronaut OR cosmonaut)
  • Misspellings: VanGogh → Van Gogh
  • Over-restricted query terms using phrases:
    #1(origin of universe) → #uw4(origin universe)
  • All-article restrictions → whole document
    (34/70)
  • Proofreading the test (INEX07) queries
  • Retrieval results of the queries were not
    referenced in any way
  • Only looked at the keyword query and topic description

26
Performance after query correction
  • df = 30; p < 0.006 for NDCG@10, p < 0.0004 for
    NDCG@20, p < 0.002 for NDCG@30

27
Conclusions
  • A structured query specifies a generative model
    for P(Q|D); model parameters are estimated from D,
    and D is ranked by P(Q|D)
  • The best evidence combination strategy is task
    dependent
  • Dirichlet smoothing corrects the bias toward short
    fields, and outperforms Jelinek-Mercer
  • Guidance for structured query formulation
  • Robust structured queries can outperform keyword
    queries

28
Acknowledgements
  • Paul Ogilvie
  • Matthew Bilotti
  • Eric Nyberg
  • Mark Hoy

29
Thanks!
  • Comments / Questions?