Title: WIDIT at TREC-2005 HARD Track
1. WIDIT at TREC-2005 HARD Track
- Kiduk Yang, Ning Yu, Hui Zhang, Ivan Record, Shahrier Akram
- WIDIT Laboratory
- School of Library and Information Science
- Indiana University at Bloomington
2. Research Questions
- HARD question
- Can user feedback improve retrieval performance?
- What information to get from the user?
- Content of CF (CF_content)
- How to obtain target information from the user?
- User interface of CF (CF_UI)
- How to utilize user feedback?
- Utilization method of CF_content (CF_UM)
- Is the effect of user feedback (CF_content, CF_UM) on retrieval performance affected by topic difficulty and/or baseline performance?
[Diagram: Robust system vs. non-Robust system on HARD topics]
3. Research Questions
- Baseline Run
- How can IR system handle difficult queries?
- Why are HARD topics difficult? (Harman & Buckley, 2004)
- lack of good terms -> add good terms
- misdirection by non-pivotal terms or partial concepts -> identify important terms & phrases
- Clarification Form
- What information to get from user?
- How can user help with difficult queries?
- identify good/important terms
- identify relevant documents
- Final Run
- How to apply CF data to improve search results?
- CF-term expanded query
- Rank boosting
4. WIDIT Approach: Conceptual
- Lack of good terms
- Query Expansion
- synonyms, definition terms
- Web query expansion
- CF (i.e., user feedback) terms
- Fusion
- System misdirection by
- Non-relevant text (nrt) in topics
- nrt exclusion
- Composite concepts in topics
- CF BoolAnd phrase identification
- wd1 AND wd2
- Minipar verb-noun, modifier-noun relations
- Multiple concepts in topics
- important concept identification
- nouns, noun phrases, OSW, CF
5. WIDIT HARD System Architecture
6. QE: Overlapping Sliding Window
- Function
- identify important phrases
- Assumption
- phrases appearing in multiple fields/sources tend to be important
- Algorithm (a sketch in code follows this slide's outline)
- Set the window size and the maximum number of words allowed between window words.
- Slide the window from left to right in a field/source. For each phrase it catches, look for the same or a similar phrase in the other fields/sources.
- Output the Overlapping Sliding Window (OSW) phrase when a match is found.
- Change the field/source and repeat steps 1 to 3 until all fields/sources have been used.
- Application
- Topic
- title, description, narrative
- Definition
- WordIQ, Google, Dictionary.com
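The OSW algorithm above is simple enough to sketch in code. The sketch below is an illustration only, not the WIDIT implementation: it assumes whitespace tokenization, a fixed window size, and treats a phrase as matched when all of its words co-occur within a same-sized window (plus a small gap) in another field/source.

```python
# Minimal sketch of the Overlapping Sliding Window (OSW) idea described above.
# Assumptions (not from the paper): simple whitespace tokenization, a window of
# `window_size` words, and a match when all window words co-occur within a
# window of the same size (plus `max_gap` extra words) in another field/source.

def tokenize(text):
    """Lowercase whitespace tokenization; stopword removal is omitted for brevity."""
    return text.lower().split()

def osw_phrases(fields, window_size=2, max_gap=1):
    """Return phrases (word tuples) that overlap across at least two fields/sources.

    fields: dict mapping a field/source name (e.g. 'title', 'description') to its text.
    """
    tokenized = {name: tokenize(text) for name, text in fields.items()}
    matches = set()
    for src_name, src_tokens in tokenized.items():
        # Slide the window left to right over the source field.
        for i in range(len(src_tokens) - window_size + 1):
            window = src_tokens[i:i + window_size]
            # Look for the same words, allowing up to `max_gap` intervening
            # words, in every other field/source.
            for tgt_name, tgt_tokens in tokenized.items():
                if tgt_name == src_name:
                    continue
                span = window_size + max_gap
                for j in range(len(tgt_tokens) - window_size + 1):
                    if set(window) <= set(tgt_tokens[j:j + span]):
                        matches.add(tuple(window))
                        break
    return matches

# Example: phrases shared by the title and description of a topic.
topic = {
    "title": "human smuggling networks",
    "description": "Identify incidents of human smuggling by organized networks.",
}
print(osw_phrases(topic))   # e.g. {('human', 'smuggling')}
```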
7. QE: NLP1 & NLP2
- NLP1
- Identify nouns & noun phrases
- uses the Brill tagger
- Find synonyms
- queries WordNet (see the sketch after this slide)
- Find definitions
- queries the Web (WordIQ, Google, Dictionary.com)
- NLP2
- Refine noun phrase identification
- uses multiple taggers
- Identify best synset based on term context
- uses sense disambiguation module by NLP group at
UMN
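For the synonym step of NLP1, a minimal illustration using NLTK's WordNet interface is shown below. It is not the WIDIT module; the taggers, definition lookups, and the UMN sense-disambiguation component are not reproduced here.

```python
# Illustration only (not the WIDIT code): looking up WordNet synonyms for a
# query noun, as NLP1 does, using NLTK's WordNet interface.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def wordnet_synonyms(word, pos=wn.NOUN):
    """Collect lemma names from all noun synsets of `word`."""
    synonyms = set()
    for synset in wn.synsets(word, pos=pos):
        for lemma in synset.lemmas():
            if lemma.name().lower() != word.lower():
                synonyms.add(lemma.name().replace("_", " "))
    return synonyms

print(wordnet_synonyms("smuggling"))
```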
8. QE: Noun Phrase Identification
[Diagram: Topics are POS-tagged (Brill's tagger) and parsed (Collins parser, Minipar) to identify noun phrases: simple, dictionary, complex, and proper noun phrases, plus AND-relation phrases.]
9. QE: Web Query Expansion
- Basic Idea
- Use the Web as a type of thesaurus to find related terms (Grunfeld et al., 2004; Kwok et al., 2005)
- Method
- Web Query Construction
- construct a web query by selecting the 5 most salient terms from the HARD topic
- uses NLP-based techniques and a rotating window to identify salient terms
- Web Search
- query Google with the web query
- Result Parsing & Term Selection
- parse the top 100 search results (snippets & document texts)
- extract up to 60 best terms
- uses the PIRCS algorithm to rank the terms (Grunfeld et al., 2004; Kwok et al., 2005) (a simplified sketch follows the diagram below)
[Diagram: Processed Topics -> Web Query Generator -> Web Queries -> Google -> Search Results -> Google Parser -> Term Selector -> Selected expansion terms]
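A much-simplified sketch of the term-selection step is given below. It is not the PIRCS ranking formula: a plain document-frequency score over the top-100 result texts stands in for it, and `web_search` is a hypothetical stub for the Google query.

```python
# Simplified sketch of the Web QE term-selection step. The real system uses the
# PIRCS term-ranking formula (Grunfeld et al., 2004); here a plain
# document-frequency score over the top-100 snippets stands in for it, and
# `web_search` is a hypothetical stub for the Google query.
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "by"}

def web_search(query, k=100):
    """Hypothetical stub: return up to k result texts (snippets/page text)."""
    raise NotImplementedError("plug in a real web search API here")

def select_expansion_terms(web_query, max_terms=60):
    texts = web_search(web_query, k=100)
    doc_freq = Counter()
    for text in texts:
        tokens = {t for t in re.findall(r"[a-z]+", text.lower())
                  if t not in STOPWORDS and t not in web_query.lower().split()}
        doc_freq.update(tokens)           # count each term once per result
    return [term for term, _ in doc_freq.most_common(max_terms)]
```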
10. QE: WebX by Rotating Window
- Rationale
- NLP-based identification of salient/important terms does not always work
- Terms related to salient/important query terms are likely to appear frequently in search results
- Method
- Rotate a 5-word window across the HARD topic description
- generates m queries for a description of m terms (m > 5)
- Query Google with each window query
- Merge all the results
- Rank the documents based on their frequency in the m result lists
- Select the 60 terms with the highest weight (length-normalized frequency) from the top 100 documents (see the sketch below)
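The sketch below illustrates the rotating-window procedure under two assumptions that go beyond the slide: the 5-word window wraps around the end of the description (so a description of m terms yields m queries, as stated above), and term weight is a simple length-normalized frequency. The `web_search` callback is a hypothetical stand-in returning (doc_id, text) pairs.

```python
# Minimal sketch of WebX by rotating window, following the steps above:
# generate one 5-word query per window position over the topic description,
# run each query, merge the result lists, rank documents by how many of the m
# result lists they appear in, then take the 60 highest-weighted terms from the
# top 100 documents.
from collections import Counter

def rotating_window_queries(description, window=5):
    words = description.split()
    m = len(words)
    if m <= window:
        return [" ".join(words)]
    # Rotate: one query per starting position, wrapping around the end,
    # so a description of m terms yields m queries (as on the slide).
    return [" ".join((words + words)[i:i + window]) for i in range(m)]

def webx_rotating_window(description, web_search, top_docs=100, max_terms=60):
    """`web_search(query)` is assumed to return (doc_id, text) pairs."""
    queries = rotating_window_queries(description)
    doc_hits = Counter()        # doc_id -> number of result lists it appears in
    doc_text = {}
    for q in queries:
        for doc_id, text in web_search(q):
            doc_hits[doc_id] += 1
            doc_text[doc_id] = text
    ranked_docs = [d for d, _ in doc_hits.most_common(top_docs)]
    term_weight = Counter()
    for doc_id in ranked_docs:
        tokens = doc_text[doc_id].lower().split()
        for t in set(tokens):
            # length-normalized frequency contribution from this document
            term_weight[t] += tokens.count(t) / max(len(tokens), 1)
    return [t for t, _ in term_weight.most_common(max_terms)]
```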
11. Fusion: Baseline Run
- Fusion Pool
- Query Formulation results
- combination of topic fields (title, description, narrative)
- stemming (simple plural stemmer, combo stemmer)
- term weights (Okapi, SMART)
- Query Expansion results
- NLP, OSW, WQX
- Fusion Formula
- Result merging by Weighted Sum
- FS_ws = Σ (w_i * NS_i), where w_i is the weight of system i (the relative contribution of each system) and NS_i is the normalized score of a document by system i: NS_i = (S_i - S_min) / (S_max - S_min) (see the sketch below)
- Fusion Optimization
- Training data
- 2004 Robust test collection
- Automatic Fusion Optimization by Category
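The weighted-sum formula translates directly into code. The sketch below is only an illustration: the system names and weights are made up, not the tuned WIDIT weights.

```python
# A short sketch of the weighted-sum fusion formula above: each system's scores
# are min-max normalized, then combined as FS_ws = sum_i(w_i * NS_i).

def min_max_normalize(scores):
    """scores: dict doc_id -> raw score for one system."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def weighted_sum_fusion(system_results, weights):
    """system_results: dict system -> {doc_id: score}; weights: dict system -> w_i."""
    fused = {}
    for system, scores in system_results.items():
        for doc, ns in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + weights[system] * ns
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

# Example with two hypothetical runs:
runs = {
    "okapi_title": {"d1": 12.0, "d2": 9.5, "d3": 4.0},
    "smart_desc":  {"d1": 0.61, "d3": 0.58, "d4": 0.20},
}
print(weighted_sum_fusion(runs, {"okapi_title": 0.7, "smart_desc": 0.3}))
```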
12. Automatic Fusion Optimization
[Flowchart: result sets for different categories (Category 1: top 10 systems; Category 2: top system for each query length; ...; Category n) are fetched from the results pool; automatic fusion optimization is applied, checking at each step whether the performance gain exceeds a threshold, and the optimized fusion formula is output.]
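The slide does not spell out the optimization procedure itself, so the sketch below is only a guess at its general shape: a greedy search over system weights that keeps a change while the gain on training data (e.g., MAP on the 2004 Robust collection) exceeds a threshold. The `evaluate` callback, step size, and threshold are assumptions for illustration.

```python
# Hedged sketch of an automatic fusion optimization loop of the kind pictured
# above: starting from a category of candidate runs, greedily adjust one system
# weight at a time and keep the change while the training-set performance gain
# exceeds a threshold.

def optimize_fusion_weights(system_results, evaluate, step=0.1, threshold=1e-4):
    """Return a weight dict; `evaluate(weights)` scores fused results on training data."""
    weights = {s: 1.0 for s in system_results}      # start from equal weights
    best = evaluate(weights)
    improved = True
    while improved:
        improved = False
        for system in system_results:
            for delta in (+step, -step):
                trial = dict(weights)
                trial[system] = max(0.0, trial[system] + delta)
                score = evaluate(trial)
                if score - best > threshold:        # keep only clear gains
                    weights, best, improved = trial, score, True
    return weights
```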
13. Clarification Form
- Objective
- Collect information from the user that can be used to improve the baseline retrieval result
- Strategy
- Ask the user to identify and add relevant/important terms
- validation/filtering of system QE results
- nouns, synonyms & definition terms, OSW & NLP phrases
- manual QE terms that the system missed
- free text box
- Ask the user to identify relevant documents
- Problem
- HARD topics tend to retrieve non-relevant documents in top ranks
- 3-minute time limit for each CF
- Solution
- cluster the top 200 results and select the best sentence from each cluster (see the sketch below)
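One way to realize the clustering step above is sketched below. The slide only states that the top 200 results are clustered and a best sentence is chosen per cluster; the use of TF-IDF vectors with k-means and the "closest to centroid" selection are assumptions for illustration, not the exact WIDIT procedure.

```python
# Illustrative sketch of the CF document-selection step: cluster the top 200
# baseline results and show one representative sentence per cluster, so the
# user can judge many distinct documents within the 3-minute limit.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cf_candidate_sentences(doc_texts, n_clusters=20):
    """doc_texts: list of top-ranked document texts; returns one sentence per cluster."""
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_vectors = vectorizer.fit_transform(doc_texts)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(doc_vectors)

    picks = []
    for c in range(n_clusters):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        # Take the document closest to the cluster centroid ...
        centroid = np.asarray(doc_vectors[members].mean(axis=0))
        best_doc = max(members, key=lambda i: cosine_similarity(
            doc_vectors[i], centroid)[0, 0])
        # ... and its sentence most similar to the whole document.
        sentences = [s.strip() for s in doc_texts[best_doc].split(".") if s.strip()]
        sent_vectors = vectorizer.transform(sentences)
        best_sent = max(range(len(sentences)), key=lambda i: cosine_similarity(
            sent_vectors[i], doc_vectors[best_doc])[0, 0])
        picks.append((best_doc, sentences[best_sent]))
    return picks
```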
14. Clarification Form
15. Reranking
- Objective
- Float low ranking relevant documents to the top
- Method
- Identify reranking factors
- e.g. Phrases (OSW, CF), CF-reldocs
- Compute reranking factor scores (rf_sc) for the top k documents
- Boost the ranks of documents with rf_sc > threshold above rank R
- doc_score = rf_sc + doc_score_at_rankR (see the sketch at the end of this slide)
- Application
- Post-retrieval compensation
- e.g. phrase matching
- Force rank-boosting for trusted input
- e.g. CF-UM
- Implication
- No new relevant documents are retrieved
- i.e. no recall improvement
- High precision improvement
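The rank-boosting step can be sketched as below, under one plausible reading of the slide's formula (an additive boost by the score of the document currently at rank R). The threshold, k, and R values are illustrative, not the WIDIT settings.

```python
# Sketch of the rank-boosting step described above: documents in the top k
# whose reranking-factor score rf_sc exceeds a threshold are lifted above
# rank R by assigning them doc_score = rf_sc + score_of_document_at_rank_R.
# No new documents enter the list, so recall cannot improve.

def rerank(results, rf_scores, k=1000, threshold=0.0, R=10):
    """results: list of (doc_id, score) sorted by score desc;
    rf_scores: dict doc_id -> reranking factor score."""
    rank_R_score = results[min(R, len(results)) - 1][1]
    boosted = []
    for rank, (doc, score) in enumerate(results, start=1):
        rf = rf_scores.get(doc, 0.0)
        if rank <= k and rf > threshold:
            score = rf + rank_R_score          # boost above rank R
        boosted.append((doc, score))
    return sorted(boosted, key=lambda x: x[1], reverse=True)
```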
16. Results: Indexing
- Document Processing
- Simple Stemmer (SS) vs. Combo Stemmer (CS)
- consistently superior performance of CS
- Term weight
- consistently superior performance of Okapi
- Topic Processing
- Exclusion of non-relevant text (nrt)
- consistent but negligible improvement
- Implication
- existence of an nrt effect, but ineffective utilization?
- Q: use nrt as NOT terms? (active utilization)
- Noun identification
- helps retrieval in general
17. Results: Query Expansion
- Web Query Expansion (WebX)
- Most effective QE method
- Most gain in performance for title query
- Adverse effect for description-query QE, except for the rotating window
- Non-WebX Query Expansion
- Synonym & Definition terms (SynDef)
- adverse effect on retrieval performance (noise?)
- Proper Noun Phrases
- helps retrieval performance for longer queries
- Overlapping Sliding Window (OSW)
- helps retrieval performance for longer queries
- CF terms: Nouns, SynDefs & Phrases
- help for longer queries
18. Results: Composite Effects
- Query Length Effect
- Without QE
- The longer the query, the better the performance
- With QE
- Title
- positive impact on WebX
- Description
- negative impact on WebX, except for the rotating window
- Long query (title + description + narrative)
- positive impact on NP, OSW, CF
19. Results: Fusion
- Query Fusion
- Methods
- Top Systems (i.e., best expansion methods)
- WebX, OSW, NP (Dictionary, Proper Noun)
- By Category
- WebX, OSW, NP, SynDef
- Not better than best overall run
- more selective fusion (e.g. fusion optimization)?
- WebX domination effect?
- Result Fusion
- Improves retrieval performance across the board
- fusion optimization effect
20. Results: Reranking
- Reranking Factors
- OSW phrases
- CF terms (nouns, SynDef terms, noun BoolAnd phrases)
- CF relevant documents (CF-reldoc)
- Effect
- CF-reldoc > OSW > CF terms
[Chart legend: O = OSW, D = CF-reldoc, C = CF-term]
21. Results: Overall Baseline

Title only:
rank  run            MRP     MAP
1     massbasetee3   0.3291  0.3039
2     uiuc05hardb0   0.2723  0.2132
3     massbasetrm3   0.2660  0.2451
4     wdoqsz1d2      0.2416  0.1694
5     ncarhard05b    0.2184  0.1598
6     stra1          0.2155  0.1598
7     uwatbaset      0.2002  0.1235
8     twenbase1      0.1116  0.0721

Title + desc + narr:
rank  run            MRP     MAP
1     saicbase2      0.3152  0.2876
2     saicbase1      0.3021  0.2435
3     pittbtdn225    0.2981  0.2637
4     wdf1t10q1      0.2965  0.2317
5     wdf1t3qf2      0.2961  0.2324
6     nlprb          0.2942  0.2586
7     york05hb3      0.2260  0.1670
8     york05hb2      0.2258  0.1622
9     meijihilbl2    0.2236  0.1654
10    york05hb1      0.1936  0.1253
22. Results: Overall Final

Title only:
rank  run            MRP     MAP
1     masstrms       0.3547  0.3223
2     uiuchcfb3      0.3355  0.3017
3     masstrmr       0.3353  0.3019
4     uiuchtfb3      0.3295  0.2928
5     uiuchcfb6      0.3245  0.2914
6     uiuchtfb1      0.3237  0.2891
7     uiuchcfb1      0.3221  0.2900
8     uiuchtfb6      0.3180  0.2813
9     masspsgrm3r    0.3082  0.2766
10    masspsgrm3     0.3024  0.2688
11    wf2t3qs1rodx   0.3020  0.2513
12    wf2t3qs1rcx    0.2838  0.2375
13    ncarhard05f1   0.2833  0.2346
14    ncarhard05f3   0.2785  0.2277
15    ncarhard05f2   0.2677  0.2193
16    straxprfb      0.2635  0.2088
17    uwathardexp1   0.2375  0.1635
18    uwathardexp2   0.2342  0.1666
19    dublf          0.1953  0.1448
20    straxmtg       0.1765  0.1322
21    straxmta       0.1740  0.1316
22    twendiff1      0.1040  0.0642
23    twenblind2     0.1022  0.0591
24    twenblind1     0.1014  0.0550

Title + desc + narr:
rank  run            MRP     MAP
1     nlprcf1cf2     0.3514  0.3179
2     wf1t10q1rcdx   0.3451  0.2914
3     wf1t10q1rodx   0.3442  0.2918
4     nlprcf1s2cf2   0.3441  0.3088
5     nlprcf1s1cf2   0.3429  0.3105
6     nlprcf1        0.3336  0.3007
7     nlprcf1wcf2    0.3318  0.2876
8     pitthdcomb1    0.3242  0.2771
9     nlprcf2        0.3234  0.2745
10    nlprcf1s2      0.3186  0.2818
11    nlprcf1w       0.3154  0.2631
12    nlprcf1s1      0.3074  0.2745
13    york05ha1      0.2907  0.2524
14    york05ha2      0.2904  0.2503
15    saicfinal3     0.2881  0.2488
16    york05ha4      0.2849  0.2419
17    saicfinal1     0.2826  0.2415
18    saicfinal6     0.2814  0.2469
19    saicfinal4     0.2806  0.2391
20    saicfinal2     0.2657  0.2265
21    saicfinal5     0.2636  0.2343
22    york05ha5      0.2634  0.2167
23    york05ha3      0.2344  0.1937
23. Results: Overall Improvement

Title only:
rank  run            delta    MRPb    MRPf
1     ncarhard05f1   0.0649   0.2184  0.2833
2     uiuchcfb3      0.0632   0.2723  0.3355
3     wf2t3qs1rodx   0.0604   0.2416  0.3020
4     ncarhard05f3   0.0601   0.2184  0.2785
5     uiuchtfb3      0.0572   0.2723  0.3295
6     uiuchcfb6      0.0522   0.2723  0.3245
7     uiuchtfb1      0.0514   0.2723  0.3237
8     uiuchcfb1      0.0498   0.2723  0.3221
9     ncarhard05f2   0.0493   0.2184  0.2677
10    straxprfb      0.0480   0.2155  0.2635
11    uiuchtfb6      0.0457   0.2723  0.3180
12    wf2t3qs1rcx    0.0422   0.2416  0.2838
13    uwathardexp1   0.0373   0.2002  0.2375
14    uwathardexp2   0.0340   0.2002  0.2342
15    masstrms       0.0256   0.3291  0.3547
16    masstrmr       0.0062   0.3291  0.3353
17    twendiff1     -0.0076   0.1116  0.1040
18    twenblind1    -0.0102   0.1116  0.1014
19    twenblind2    -0.0125   0.1147  0.1022
20    masspsgrm3r   -0.0209   0.3291  0.3082
21    masspsgrm3    -0.0267   0.3291  0.3024
22    straxmtg      -0.0390   0.2155  0.1765
23    straxmta      -0.0415   0.2155  0.1740

Title + desc + narr:
rank  run            delta    MRPb    MRPf
1     nlprcf1cf2     0.0572   0.2942  0.3514
2     nlprcf1s2cf2   0.0499   0.2942  0.3441
3     nlprcf1s1cf2   0.0487   0.2942  0.3429
4     wf1t10q1rcdx   0.0486   0.2965  0.3451
5     wf1t10q1rodx   0.0477   0.2965  0.3442
6     nlprcf1        0.0394   0.2942  0.3336
7     nlprcf1wcf2    0.0376   0.2942  0.3318
8     pitthdcomb1    0.0339   0.2903  0.3242
9     nlprcf2        0.0292   0.2942  0.3234
10    nlprcf1s2      0.0244   0.2942  0.3186
11    nlprcf1w       0.0212   0.2942  0.3154
12    nlprcf1s1      0.0132   0.2942  0.3074
13    saicfinal3    -0.0140   0.3021  0.2881
14    saicfinal1    -0.0195   0.3021  0.2826
15    saicfinal6    -0.0207   0.3021  0.2814
16    saicfinal4    -0.0215   0.3021  0.2806
17    saicfinal2    -0.0364   0.3021  0.2657
18    saicfinal5    -0.0385   0.3021  0.2636
24. WIDIT Performance by Topic
25. References
- Grunfeld, L., Kwok, K.L., Dinstl, N., & Deng, P. (2004). TREC 2003 Robust, HARD, and QA track experiments using PIRCS. Proceedings of the 12th Text REtrieval Conference, 510-521.
- Harman, D., & Buckley, C. (2004). The NRRC Reliable Information Access (RIA) workshop. Proceedings of the 27th Annual International ACM SIGIR Conference, 528-529.
- Kwok, K.L., Grunfeld, L., Sun, H.L., & Deng, P. (2005). TREC 2004 Robust track experiments using PIRCS. Proceedings of the 13th Text REtrieval Conference.
- Yang, K., & Yu, N. (in press). WIDIT Fusion-based Approach to Web Search Optimization. Asian Information Retrieval Symposium 2005.
- Yang, K., Yu, N., George, N., Loehrlen, A., MaCaulay, D., Zhang, H., Akram, S., Mei, J., & Record, I. (in press). WIDIT in TREC2005 HARD, Robust, and SPAM tracks. Proceedings of the 14th Text REtrieval Conference.
26. Stemmer Comparison
27. Term Weight Comparison
31. Non-relevant Text Effect
32. Query Length Effect
33. Baseline vs. Re-ranked Baseline
34. Reranking Effect
[Chart legend: O = OSW, D = CF-reldoc, C = CF-term]
35. WIDIT Approach: Strategic
- Baseline Run
- Automatic Query Expansion
- add related terms
- synonym identification, definition term extraction, Web query expansion
- identify important query terms
- noun phrase extraction, keyword extraction by overlapping sliding window
- Fusion
- Clarification Form
- User Feedback
- identify related/important query terms
- identify relevant documents
- Final Run
- Query Expansion by CF terms
- Fusion
- Post-retrieval Reranking