Title: WIDIT at TREC-2005 HARD Track
1. WIDIT at TREC-2005 HARD Track
- Kiduk Yang, Ning Yu, Hui Zhang, Ivan Record, Shahrier Akram
- WIDIT Laboratory
- School of Library and Information Science
- Indiana University at Bloomington
2. Research Questions
- HARD question
- Can user feedback improve retrieval performance?
- What information to get from the user?
- Content of CF (CF_content)
- How to obtain target information from the user?
- User interface of CF (CF_UI)
- How to utilize user feedback?
- Utilization method of CF_content (CF_UM)
- Is the effect of user feedback (CF_content, CF_UM) on retrieval performance affected by topic difficulty and/or baseline performance?
[Diagram: Robust system vs. non-Robust system on HARD topics]
3. Research Questions
- Baseline Run
- How can IR system handle difficult queries?
- Why are HARD topics difficult? (Harman & Buckley, 2004)
- lack of good terms -> add good terms
- misdirection by non-pivotal terms or partial concepts -> identify important terms & phrases
- Clarification Form
- What information to get from user?
- How can user help with difficult queries?
- identify good/important terms
- identify relevant documents
- Final Run
- How to apply CF data to improve search results?
- CF-term expanded query
- Rank boosting
4. WIDIT Approach: Conceptual
- Lack of good terms
- Query Expansion
- synonyms, definition terms
- Web query expansion
- CF (i.e., user feedback) terms
- Fusion
- System misdirection by
- Non-relevant text (nrt) in topics
- nrt exclusion
- Composite concepts in topics
- CF BoolAnd phrase identification
- wd1 AND wd2
- Minipar verb-noun, modifier-noun relations
- Multiple concepts in topics
- important concept identification
- nouns, noun phrases, OSW, CF
5. WIDIT HARD System Architecture
6. QE: Overlapping Sliding Window
- Function
- identify important phrases
- Assumption
- phrases appearing in multiple fields/sources tend to be important
- Algorithm (a sketch in code follows this slide's outline)
- Set the window size and the maximum number of words allowed between window words.
- Slide the window from left to right in a field/source. For each phrase it catches, look for the same or a similar phrase in the other fields/sources.
- Output the Overlapping Sliding Window (OSW) phrase when a match is found.
- Change the field/source and repeat steps 1 to 3 until all fields/sources have been used.
- Application
- Topic
- title, description, narrative
- Definition
- WordIQ, Google, Dictionary.com
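The OSW algorithm above is simple enough to sketch in code. The sketch below is an illustration only, not the WIDIT implementation: it assumes whitespace tokenization, a fixed window size, and treats a phrase as matched when all of its words co-occur within a same-sized window (plus a small gap) in another field/source.

```python
# Minimal sketch of the Overlapping Sliding Window (OSW) idea described above.
# Assumptions (not from the paper): simple whitespace tokenization, a window of
# `window_size` words, and a match when all window words co-occur within a
# window of the same size (plus `max_gap` extra words) in another field/source.

def tokenize(text):
    """Lowercase whitespace tokenization; stopword removal is omitted for brevity."""
    return text.lower().split()

def osw_phrases(fields, window_size=2, max_gap=1):
    """Return phrases (word tuples) that overlap across at least two fields/sources.

    fields: dict mapping a field/source name (e.g. 'title', 'description') to its text.
    """
    tokenized = {name: tokenize(text) for name, text in fields.items()}
    matches = set()
    for src_name, src_tokens in tokenized.items():
        # Slide the window left to right over the source field.
        for i in range(len(src_tokens) - window_size + 1):
            window = src_tokens[i:i + window_size]
            # Look for the same words, allowing up to `max_gap` intervening
            # words, in every other field/source.
            for tgt_name, tgt_tokens in tokenized.items():
                if tgt_name == src_name:
                    continue
                span = window_size + max_gap
                for j in range(len(tgt_tokens) - window_size + 1):
                    if set(window) <= set(tgt_tokens[j:j + span]):
                        matches.add(tuple(window))
                        break
    return matches

# Example: phrases shared by the title and description of a topic.
topic = {
    "title": "human smuggling networks",
    "description": "Identify incidents of human smuggling by organized networks.",
}
print(osw_phrases(topic))   # e.g. {('human', 'smuggling')}
```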
7. QE: NLP1 & NLP2
- NLP1
- Identify nouns & noun phrases
- uses the Brill tagger
- Find synonyms
- queries WordNet (see the sketch after this slide)
- Find definitions
- queries the Web (WordIQ, Google, Dictionary.com)
- NLP2
- Refine noun phrase identification
- uses multiple taggers
- Identify best synset based on term context
- uses sense disambiguation module by NLP group at
UMN
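For the synonym step of NLP1, a minimal illustration using NLTK's WordNet interface is shown below. It is not the WIDIT module; the taggers, definition lookups, and the UMN sense-disambiguation component are not reproduced here.

```python
# Illustration only (not the WIDIT code): looking up WordNet synonyms for a
# query noun, as NLP1 does, using NLTK's WordNet interface.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def wordnet_synonyms(word, pos=wn.NOUN):
    """Collect lemma names from all noun synsets of `word`."""
    synonyms = set()
    for synset in wn.synsets(word, pos=pos):
        for lemma in synset.lemmas():
            if lemma.name().lower() != word.lower():
                synonyms.add(lemma.name().replace("_", " "))
    return synonyms

print(wordnet_synonyms("smuggling"))
```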
8. QE: Noun Phrase Identification
[Diagram: Topics are POS-tagged (Brill's tagger) and parsed (Collins parser, Minipar) to identify noun phrases: simple, dictionary, complex, and proper noun phrases, plus AND-relation phrases.]
9. QE: Web Query Expansion
- Basic Idea
- Use the Web as a type of thesaurus to find related terms (Grunfeld et al., 2004; Kwok et al., 2005)
- Method
- Web Query Construction
- construct a web query by selecting the 5 most salient terms from the HARD topic
- uses NLP-based techniques and a rotating window to identify salient terms
- Web Search
- query Google with the web query
- Result Parsing & Term Selection
- parse the top 100 search results (snippets & document texts)
- extract up to 60 best terms
- uses the PIRCS algorithm to rank the terms (Grunfeld et al., 2004; Kwok et al., 2005) (a simplified sketch follows the diagram below)
[Diagram: Processed Topics -> Web Query Generator -> Web Queries -> Google -> Search Results -> Google Parser -> Term Selector -> Selected expansion terms]
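A much-simplified sketch of the term-selection step is given below. It is not the PIRCS ranking formula: a plain document-frequency score over the top-100 result texts stands in for it, and `web_search` is a hypothetical stub for the Google query.

```python
# Simplified sketch of the Web QE term-selection step. The real system uses the
# PIRCS term-ranking formula (Grunfeld et al., 2004); here a plain
# document-frequency score over the top-100 snippets stands in for it, and
# `web_search` is a hypothetical stub for the Google query.
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "by"}

def web_search(query, k=100):
    """Hypothetical stub: return up to k result texts (snippets/page text)."""
    raise NotImplementedError("plug in a real web search API here")

def select_expansion_terms(web_query, max_terms=60):
    texts = web_search(web_query, k=100)
    doc_freq = Counter()
    for text in texts:
        tokens = {t for t in re.findall(r"[a-z]+", text.lower())
                  if t not in STOPWORDS and t not in web_query.lower().split()}
        doc_freq.update(tokens)           # count each term once per result
    return [term for term, _ in doc_freq.most_common(max_terms)]
```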
10. QE: WebX by Rotating Window
- Rationale
- NLP-based identification of salient/important terms does not always work
- Terms related to salient/important query terms are likely to appear frequently in search results
- Method
- Rotate a 5-word window across the HARD topic description
- generates m queries for a description of m terms (m > 5)
- Query Google with each window query
- Merge all the results
- Rank the documents based on their frequency in the m result lists
- Select the 60 terms with the highest weight (length-normalized frequency) from the top 100 documents (see the sketch below)
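The sketch below illustrates the rotating-window procedure under two assumptions that go beyond the slide: the 5-word window wraps around the end of the description (so a description of m terms yields m queries, as stated above), and term weight is a simple length-normalized frequency. The `web_search` callback is a hypothetical stand-in returning (doc_id, text) pairs.

```python
# Minimal sketch of WebX by rotating window, following the steps above:
# generate one 5-word query per window position over the topic description,
# run each query, merge the result lists, rank documents by how many of the m
# result lists they appear in, then take the 60 highest-weighted terms from the
# top 100 documents.
from collections import Counter

def rotating_window_queries(description, window=5):
    words = description.split()
    m = len(words)
    if m <= window:
        return [" ".join(words)]
    # Rotate: one query per starting position, wrapping around the end,
    # so a description of m terms yields m queries (as on the slide).
    return [" ".join((words + words)[i:i + window]) for i in range(m)]

def webx_rotating_window(description, web_search, top_docs=100, max_terms=60):
    """`web_search(query)` is assumed to return (doc_id, text) pairs."""
    queries = rotating_window_queries(description)
    doc_hits = Counter()        # doc_id -> number of result lists it appears in
    doc_text = {}
    for q in queries:
        for doc_id, text in web_search(q):
            doc_hits[doc_id] += 1
            doc_text[doc_id] = text
    ranked_docs = [d for d, _ in doc_hits.most_common(top_docs)]
    term_weight = Counter()
    for doc_id in ranked_docs:
        tokens = doc_text[doc_id].lower().split()
        for t in set(tokens):
            # length-normalized frequency contribution from this document
            term_weight[t] += tokens.count(t) / max(len(tokens), 1)
    return [t for t, _ in term_weight.most_common(max_terms)]
```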
11. Fusion: Baseline Run
- Fusion Pool
- Query Formulation results
- combination of topic fields (title, description, narrative)
- stemming (simple plural stemmer, combo stemmer)
- term weights (Okapi, SMART)
- Query Expansion results
- NLP, OSW, WQX
- Fusion Formula
- Result merging by Weighted Sum
- FS_ws = Σ (w_i * NS_i), where w_i is the weight of system i (the relative contribution of each system) and NS_i is the normalized score of a document by system i: NS_i = (S_i - S_min) / (S_max - S_min) (see the sketch below)
- Fusion Optimization
- Training data
- 2004 Robust test collection
- Automatic Fusion Optimization by Category
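The weighted-sum formula translates directly into code. The sketch below is only an illustration: the system names and weights are made up, not the tuned WIDIT weights.

```python
# A short sketch of the weighted-sum fusion formula above: each system's scores
# are min-max normalized, then combined as FS_ws = sum_i(w_i * NS_i).

def min_max_normalize(scores):
    """scores: dict doc_id -> raw score for one system."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def weighted_sum_fusion(system_results, weights):
    """system_results: dict system -> {doc_id: score}; weights: dict system -> w_i."""
    fused = {}
    for system, scores in system_results.items():
        for doc, ns in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + weights[system] * ns
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

# Example with two hypothetical runs:
runs = {
    "okapi_title": {"d1": 12.0, "d2": 9.5, "d3": 4.0},
    "smart_desc":  {"d1": 0.61, "d3": 0.58, "d4": 0.20},
}
print(weighted_sum_fusion(runs, {"okapi_title": 0.7, "smart_desc": 0.3}))
```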
12. Automatic Fusion Optimization
[Flowchart: result sets for different categories (Category 1: top 10 systems; Category 2: top system for each query length; ...; Category n) are fetched from the results pool; automatic fusion optimization is applied, checking at each step whether the performance gain exceeds a threshold, and the optimized fusion formula is output.]
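The slide does not spell out the optimization procedure itself, so the sketch below is only a guess at its general shape: a greedy search over system weights that keeps a change while the gain on training data (e.g., MAP on the 2004 Robust collection) exceeds a threshold. The `evaluate` callback, step size, and threshold are assumptions for illustration.

```python
# Hedged sketch of an automatic fusion optimization loop of the kind pictured
# above: starting from a category of candidate runs, greedily adjust one system
# weight at a time and keep the change while the training-set performance gain
# exceeds a threshold.

def optimize_fusion_weights(system_results, evaluate, step=0.1, threshold=1e-4):
    """Return a weight dict; `evaluate(weights)` scores fused results on training data."""
    weights = {s: 1.0 for s in system_results}      # start from equal weights
    best = evaluate(weights)
    improved = True
    while improved:
        improved = False
        for system in system_results:
            for delta in (+step, -step):
                trial = dict(weights)
                trial[system] = max(0.0, trial[system] + delta)
                score = evaluate(trial)
                if score - best > threshold:        # keep only clear gains
                    weights, best, improved = trial, score, True
    return weights
```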
13. Clarification Form
- Objective
- Collect information from the user that can be used to improve the baseline retrieval result
- Strategy
- Ask the user to identify and add relevant/important terms
- validation/filtering of system QE results
- nouns, synonyms & definition terms, OSW & NLP phrases
- manual QE terms that the system missed
- free text box
- Ask the user to identify relevant documents
- Problem
- HARD topics tend to retrieve non-relevant documents in top ranks
- 3-minute time limit for each CF
- Solution
- cluster the top 200 results and select the best sentence from each cluster (see the sketch below)
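One way to realize the clustering step above is sketched below. The slide only states that the top 200 results are clustered and a best sentence is chosen per cluster; the use of TF-IDF vectors with k-means and the "closest to centroid" selection are assumptions for illustration, not the exact WIDIT procedure.

```python
# Illustrative sketch of the CF document-selection step: cluster the top 200
# baseline results and show one representative sentence per cluster, so the
# user can judge many distinct documents within the 3-minute limit.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cf_candidate_sentences(doc_texts, n_clusters=20):
    """doc_texts: list of top-ranked document texts; returns one sentence per cluster."""
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_vectors = vectorizer.fit_transform(doc_texts)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(doc_vectors)

    picks = []
    for c in range(n_clusters):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        # Take the document closest to the cluster centroid ...
        centroid = np.asarray(doc_vectors[members].mean(axis=0))
        best_doc = max(members, key=lambda i: cosine_similarity(
            doc_vectors[i], centroid)[0, 0])
        # ... and its sentence most similar to the whole document.
        sentences = [s.strip() for s in doc_texts[best_doc].split(".") if s.strip()]
        sent_vectors = vectorizer.transform(sentences)
        best_sent = max(range(len(sentences)), key=lambda i: cosine_similarity(
            sent_vectors[i], doc_vectors[best_doc])[0, 0])
        picks.append((best_doc, sentences[best_sent]))
    return picks
```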
14. Clarification Form
15. Reranking
- Objective
- Float low ranking relevant documents to the top
- Method
- Identify reranking factors
- e.g. Phrases (OSW, CF), CF-reldocs
- Compute reranking factor scores (rf_sc) for the top k documents
- Boost the ranks of documents with rf_sc > threshold above rank R
- doc_score = rf_sc + doc_score_at_rankR (see the sketch at the end of this slide)
- Application
- Post-retrieval compensation
- e.g. phrase matching
- Force rank-boosting for trusted input
- e.g. CF-UM
- Implication
- No new relevant documents are retrieved
- i.e. no recall improvement
- High precision improvement
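The rank-boosting step can be sketched as below, under one plausible reading of the slide's formula (an additive boost by the score of the document currently at rank R). The threshold, k, and R values are illustrative, not the WIDIT settings.

```python
# Sketch of the rank-boosting step described above: documents in the top k
# whose reranking-factor score rf_sc exceeds a threshold are lifted above
# rank R by assigning them doc_score = rf_sc + score_of_document_at_rank_R.
# No new documents enter the list, so recall cannot improve.

def rerank(results, rf_scores, k=1000, threshold=0.0, R=10):
    """results: list of (doc_id, score) sorted by score desc;
    rf_scores: dict doc_id -> reranking factor score."""
    rank_R_score = results[min(R, len(results)) - 1][1]
    boosted = []
    for rank, (doc, score) in enumerate(results, start=1):
        rf = rf_scores.get(doc, 0.0)
        if rank <= k and rf > threshold:
            score = rf + rank_R_score          # boost above rank R
        boosted.append((doc, score))
    return sorted(boosted, key=lambda x: x[1], reverse=True)
```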
16. Results: Indexing
- Document Processing
- Simple Stemmer (SS) vs. Combo Stemmer (CS)
- consistently superior performance of CS
- Term weight
- consistently superior performance of Okapi
- Topic Processing
- Exclusion of non-relevant text (nrt)
- consistent but negligible improvement
- Implication
- existence of an nrt effect, but ineffective utilization?
- Q: use nrt as NOT terms? (active utilization)
- Noun identification
- helps retrieval in general
17. Results: Query Expansion
- Web Query Expansion (WebX)
- Most effective QE method
- Most gain in performance for title query
- Adverse effect for description-query QE, except for the rotating window
- Non-WebX Query Expansion
- Synonym & Definition terms (SynDef)
- adverse effect on retrieval performance (noise?)
- Proper Noun Phrases
- helps retrieval performance for longer queries
- Overlapping Sliding Window (OSW)
- helps retrieval performance for longer queries
- CF terms: Nouns, SynDefs & Phrases
- help for longer queries
18. Results: Composite Effects
- Query Length Effect
- Without QE
- The longer the query, the better the performance
- With QE
- Title
- positive impact on WebX
- Description
- negative impact on WebX, except for the rotating window
- Long query (title + description + narrative)
- positive impact on NP, OSW, CF
19. Results: Fusion
- Query Fusion
- Methods
- Top Systems (i.e., best expansion methods)
- WebX, OSW, NP (Dictionary, Proper Noun)
- By Category
- WebX, OSW, NP, SynDef
- Not better than best overall run
- more selective fusion (e.g. fusion optimization)?
- WebX domination effect?
- Result Fusion
- Improves retrieval performance across the board
- fusion optimization effect
20. Results: Reranking
- Reranking Factors
- OSW phrases
- CF terms (nouns, SynDef terms, noun BoolAnd phrases)
- CF relevant documents (CF-reldoc)
- Effect
- CF-reldoc > OSW > CF terms
[Chart legend: O = OSW, D = CF-reldoc, C = CF-term]
21. Results: Overall Baseline

Title only:
rank  run            MRP     MAP
1     massbasetee3   0.3291  0.3039
2     uiuc05hardb0   0.2723  0.2132
3     massbasetrm3   0.2660  0.2451
4     wdoqsz1d2      0.2416  0.1694
5     ncarhard05b    0.2184  0.1598
6     stra1          0.2155  0.1598
7     uwatbaset      0.2002  0.1235
8     twenbase1      0.1116  0.0721

Title + desc + narr:
rank  run            MRP     MAP
1     saicbase2      0.3152  0.2876
2     saicbase1      0.3021  0.2435
3     pittbtdn225    0.2981  0.2637
4     wdf1t10q1      0.2965  0.2317
5     wdf1t3qf2      0.2961  0.2324
6     nlprb          0.2942  0.2586
7     york05hb3      0.2260  0.1670
8     york05hb2      0.2258  0.1622
9     meijihilbl2    0.2236  0.1654
10    york05hb1      0.1936  0.1253
22. Results: Overall Final

Title only:
rank  run            MRP     MAP
1     masstrms       0.3547  0.3223
2     uiuchcfb3      0.3355  0.3017
3     masstrmr       0.3353  0.3019
4     uiuchtfb3      0.3295  0.2928
5     uiuchcfb6      0.3245  0.2914
6     uiuchtfb1      0.3237  0.2891
7     uiuchcfb1      0.3221  0.2900
8     uiuchtfb6      0.3180  0.2813
9     masspsgrm3r    0.3082  0.2766
10    masspsgrm3     0.3024  0.2688
11    wf2t3qs1rodx   0.3020  0.2513
12    wf2t3qs1rcx    0.2838  0.2375
13    ncarhard05f1   0.2833  0.2346
14    ncarhard05f3   0.2785  0.2277
15    ncarhard05f2   0.2677  0.2193
16    straxprfb      0.2635  0.2088
17    uwathardexp1   0.2375  0.1635
18    uwathardexp2   0.2342  0.1666
19    dublf          0.1953  0.1448
20    straxmtg       0.1765  0.1322
21    straxmta       0.1740  0.1316
22    twendiff1      0.1040  0.0642
23    twenblind2     0.1022  0.0591
24    twenblind1     0.1014  0.0550

Title + desc + narr:
rank  run            MRP     MAP
1     nlprcf1cf2     0.3514  0.3179
2     wf1t10q1rcdx   0.3451  0.2914
3     wf1t10q1rodx   0.3442  0.2918
4     nlprcf1s2cf2   0.3441  0.3088
5     nlprcf1s1cf2   0.3429  0.3105
6     nlprcf1        0.3336  0.3007
7     nlprcf1wcf2    0.3318  0.2876
8     pitthdcomb1    0.3242  0.2771
9     nlprcf2        0.3234  0.2745
10    nlprcf1s2      0.3186  0.2818
11    nlprcf1w       0.3154  0.2631
12    nlprcf1s1      0.3074  0.2745
13    york05ha1      0.2907  0.2524
14    york05ha2      0.2904  0.2503
15    saicfinal3     0.2881  0.2488
16    york05ha4      0.2849  0.2419
17    saicfinal1     0.2826  0.2415
18    saicfinal6     0.2814  0.2469
19    saicfinal4     0.2806  0.2391
20    saicfinal2     0.2657  0.2265
21    saicfinal5     0.2636  0.2343
22    york05ha5      0.2634  0.2167
23    york05ha3      0.2344  0.1937
23. Results: Overall Improvement

Title only:
rank  run            delta    MRPb    MRPf
1     ncarhard05f1   0.0649   0.2184  0.2833
2     uiuchcfb3      0.0632   0.2723  0.3355
3     wf2t3qs1rodx   0.0604   0.2416  0.3020
4     ncarhard05f3   0.0601   0.2184  0.2785
5     uiuchtfb3      0.0572   0.2723  0.3295
6     uiuchcfb6      0.0522   0.2723  0.3245
7     uiuchtfb1      0.0514   0.2723  0.3237
8     uiuchcfb1      0.0498   0.2723  0.3221
9     ncarhard05f2   0.0493   0.2184  0.2677
10    straxprfb      0.0480   0.2155  0.2635
11    uiuchtfb6      0.0457   0.2723  0.3180
12    wf2t3qs1rcx    0.0422   0.2416  0.2838
13    uwathardexp1   0.0373   0.2002  0.2375
14    uwathardexp2   0.0340   0.2002  0.2342
15    masstrms       0.0256   0.3291  0.3547
16    masstrmr       0.0062   0.3291  0.3353
17    twendiff1     -0.0076   0.1116  0.1040
18    twenblind1    -0.0102   0.1116  0.1014
19    twenblind2    -0.0125   0.1147  0.1022
20    masspsgrm3r   -0.0209   0.3291  0.3082
21    masspsgrm3    -0.0267   0.3291  0.3024
22    straxmtg      -0.0390   0.2155  0.1765
23    straxmta      -0.0415   0.2155  0.1740

Title + desc + narr:
rank  run            delta    MRPb    MRPf
1     nlprcf1cf2     0.0572   0.2942  0.3514
2     nlprcf1s2cf2   0.0499   0.2942  0.3441
3     nlprcf1s1cf2   0.0487   0.2942  0.3429
4     wf1t10q1rcdx   0.0486   0.2965  0.3451
5     wf1t10q1rodx   0.0477   0.2965  0.3442
6     nlprcf1        0.0394   0.2942  0.3336
7     nlprcf1wcf2    0.0376   0.2942  0.3318
8     pitthdcomb1    0.0339   0.2903  0.3242
9     nlprcf2        0.0292   0.2942  0.3234
10    nlprcf1s2      0.0244   0.2942  0.3186
11    nlprcf1w       0.0212   0.2942  0.3154
12    nlprcf1s1      0.0132   0.2942  0.3074
13    saicfinal3    -0.0140   0.3021  0.2881
14    saicfinal1    -0.0195   0.3021  0.2826
15    saicfinal6    -0.0207   0.3021  0.2814
16    saicfinal4    -0.0215   0.3021  0.2806
17    saicfinal2    -0.0364   0.3021  0.2657
18    saicfinal5    -0.0385   0.3021  0.2636
24. WIDIT Performance by Topic
25. References
- Grunfeld, L., Kwok, K.L., Dinstl, N., & Deng, P. (2004). TREC 2003 Robust, HARD, and QA track experiments using PIRCS. Proceedings of the 12th Text REtrieval Conference, 510-521.
- Harman, D., & Buckley, C. (2004). The NRRC Reliable Information Access (RIA) workshop. Proceedings of the 27th Annual International ACM SIGIR Conference, 528-529.
- Kwok, K.L., Grunfeld, L., Sun, H.L., & Deng, P. (2005). TREC 2004 Robust track experiments using PIRCS. Proceedings of the 13th Text REtrieval Conference.
- Yang, K., & Yu, N. (in press). WIDIT Fusion-based Approach to Web Search Optimization. Asian Information Retrieval Symposium 2005.
- Yang, K., Yu, N., George, N., Loehrlen, A., MaCaulay, D., Zhang, H., Akram, S., Mei, J., & Record, I. (in press). WIDIT in TREC2005 HARD, Robust, and SPAM tracks. Proceedings of the 14th Text REtrieval Conference.
26. Stemmer Comparison
27. Term Weight Comparison
31. Non-relevant Text Effect
32. Query Length Effect
33. Baseline vs. Re-ranked Baseline
34. Reranking Effect
[Chart legend: O = OSW, D = CF-reldoc, C = CF-term]
35. WIDIT Approach: Strategic
- Baseline Run
- Automatic Query Expansion
- add related terms
- synonym identification, definition term extraction, Web query expansion
- identify important query terms
- noun phrase extraction, keyword extraction by overlapping sliding window
- Fusion
- Clarification Form
- User Feedback
- identify related/important query terms
- identify relevant documents
- Final Run
- Query Expansion by CF terms
- Fusion
- Post-retrieval Reranking