Title: Collection Fusion
Slide 1: Collection Fusion
- Parallel retrieval on different information sources (e.g., different search engines or different collections)
- Merging of results
- MetaCrawler, University of Washington (Selberg & Etzioni, 1995)
- Towell & Voorhees
Slide 2: Collection Fusion
Merging results from different search engines (web brokers)
[Figure: result lists from Lycos, AltaVista, Infoseek, Excite, and Joe's Bot, each scored on its own scale (e.g., .99 down to .92 for one engine, 4 down to 2.1 for another), merged into a single ranking of positions 1 to 12]
- Different methods (a good thing)
- Merge by downloading all results and reranking them with a private relevance scheme, e.g.:
  - Bayes nets
  - Bag of words
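Here is a minimal sketch of the "download all and rerank" strategy. It assumes the broker has already fetched each result's text. A bag-of-words cosine scorer stands in for the private relevance scheme. The function names and the `(doc_id, text)` input format are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokens with their counts (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(query: str, results_by_engine: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Pool (doc_id, text) results from every engine, then rerank them by
    bag-of-words cosine similarity to the query, ignoring engine scores."""
    q = bag_of_words(query)
    pooled: dict[str, str] = {}
    for results in results_by_engine.values():
        for doc_id, text in results:
            pooled.setdefault(doc_id, text)  # keep the first copy of a duplicate
    return sorted(pooled, key=lambda d: cosine(q, bag_of_words(pooled[d])), reverse=True)
```

The engines' own scores are discarded entirely. That sidesteps the scale mismatch, but the broker pays the cost of fetching every document.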
Slide 3: Collection Fusion
- Issues
  - Different weighting and relevance scales (logarithmic, linear, different ranges)
  - No ranking or weighting in some cases
  - Different sizes of response sets
  - Different biases of collections
  - Duplicate identification and removal
  - Cost (money) or latency/bandwidth (time) as a factor in relevance ranking
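The first two issues are different scales and engines that return no scores at all. They can be sketched with min-max rescaling plus a rank-based fallback. This sketch rests on several assumptions:
- Min-max is only one common normalization.
- `log_scale=True` assumes the engine reports log-valued scores that `exp` undoes.
- The function names are hypothetical.

```python
import math


def normalize_scores(scores: list[float], log_scale: bool = False) -> list[float]:
    """Rescale one engine's scores to [0, 1], higher = more relevant.

    If log_scale is True, the scores are assumed to be logarithms and are
    exponentiated first so that all engines are compared on a linear scale.
    """
    if not scores:
        return []
    values = [math.exp(s) for s in scores] if log_scale else list(scores)
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread to rescale: treat every result as equally relevant.
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def scores_from_ranks(n: int) -> list[float]:
    """Pseudo-scores for an engine that returns an ordered list with no scores:
    linearly spaced from 1.0 (rank 1) down to 0.0 (rank n)."""
    if n <= 0:
        return []
    if n == 1:
        return [1.0]
    return [1.0 - i / (n - 1) for i in range(n)]
```

For example, an engine scoring 4 down to 2.1 and one scoring .99 down to .92 both end up on the same 0-to-1 range.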
Slide 4: Goal
- Learn:
  - Ranking scale
  - Ranking reliability
  - Relevance ratio
  - Function:
    $\mathrm{Rank}(CF) = a_1 f(\mathrm{Rank}(A_1)) + a_2 f(\mathrm{Rank}(A_2)) + \cdots$, with $a_i = 1/k$, where $k$ is the number of collections
    May need a log transform and/or scale shift
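The equal-weight function above ($a_i = 1/k$) can be sketched as follows. The sketch assumes each collection's output has already been put on a common 0-to-1 scale, higher is better. It also assumes a document missing from a collection contributes 0 for that term. `f` defaults to the identity, and the function name is hypothetical.

```python
from typing import Callable


def fuse_equal_weights(
    score_lists: list[dict[str, float]],
    f: Callable[[float], float] = lambda s: s,
) -> list[tuple[str, float]]:
    """Rank(CF) = a_1 f(x_1) + ... + a_k f(x_k) with a_i = 1/k.

    Each dict maps doc_id -> that collection's (already normalized) value.
    A document missing from a collection contributes 0 for that term.
    Returns (doc_id, fused value) pairs, best first.
    """
    if not score_lists:
        return []
    a = 1.0 / len(score_lists)  # a_i = 1/k for every collection
    fused: dict[str, float] = {}
    for scores in score_lists:
        for doc, value in scores.items():
            fused[doc] = fused.get(doc, 0.0) + a * f(value)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

A document found by several collections accumulates several terms. That is one simple way for agreement between collections to raise a document's fused rank.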
Slide 5: Issues
- Duplicate identification and removal
- Link checking (reliability)
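Duplicate identification often means recognizing that two engines returned the same page under slightly different URLs. Here is a minimal sketch, assuming URL canonicalization is enough; it will miss mirrors and near-duplicate content. The normalization rules are illustrative assumptions: lowercase the host, drop a leading `www.`, drop default ports, strip the trailing slash and the fragment. The function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Reduce a URL to a comparison key (assumed rules, see lead-in)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    port = parts.port
    default_port = (scheme, port) in {("http", 80), ("https", 443)}
    netloc = host if port is None or default_port else f"{host}:{port}"
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))  # fragment dropped


def dedupe(urls: list[str]) -> list[str]:
    """Keep the first occurrence of each canonical URL, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```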
Slide 6: Impact on Service Provider
- Charge per access
- Advertising solutions?
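Slide 7: Rank-Driven Collection Fusion
$$\mathrm{Rank}(CF, d_i) = \sum_{j \in \text{collections}} a_j \, f\big(\mathrm{Rank}(\text{collection}_j, d_i)\big)$$
- $\mathrm{Rank}(\text{collection}_j, d_i)$: rank of document $i$ in collection $j$
- $f$: may need a log transform or scale shift
- $a_j$: will depend on the collection's overall relevance and the reliability of its rankings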
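Here is a minimal sketch of rank-driven fusion under two assumptions. First, it takes $f(r) = 1/\log_2(r + 1)$ as the log transform, a DCG-style discount chosen for illustration. Second, the weights $a_j$ are supplied by the caller and stand in for each collection's overall relevance and ranking reliability. Higher fused values are better, so sorting in descending order yields the fused ranking. The function name is hypothetical.

```python
import math


def rank_driven_fusion(
    rankings: dict[str, list[str]], weights: dict[str, float]
) -> list[tuple[str, float]]:
    """Fused value of d_i = sum_j a_j * f(rank of d_i in collection j),
    with f(r) = 1 / log2(r + 1). Documents a collection did not return
    contribute nothing for that collection."""
    fused: dict[str, float] = {}
    for name, docs in rankings.items():
        a_j = weights.get(name, 0.0)  # unknown collections get zero weight
        for rank, doc in enumerate(docs, start=1):
            fused[doc] = fused.get(doc, 0.0) + a_j / math.log2(rank + 1)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

With equal weights, a document ranked near the top by both collections wins. Shifting weight toward the more reliable collection can reorder the list.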
Slide 8
[Figure: Collection 1 and Collection 2 merged into the fused (CF) list]
Assuming:
- Relevance ∝ rank
- Collection sizes are equal
- Smaller returned set ⇒ more selective
Slide 9
[Figure: Collection 1 and Collection 2 merged into the fused (CF) list; axis: total returned]
Assuming:
- Relevance ∝ rank
- Equal selectivity
- Smaller returned set ⇒ smaller collection
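Slide 9 assumes equal selectivity, so a smaller returned set signals a smaller collection. Relevant documents should then be spread through each list in proportion to its length. One way to sketch this is to merge by relative depth $r / N_j$, where $r$ is a document's rank and $N_j$ is the size of collection $j$'s returned set. This is an assumption, not necessarily the method behind the slide, and `merge_by_relative_rank` is a hypothetical name.

```python
def merge_by_relative_rank(lists: dict[str, list[str]]) -> list[str]:
    """Merge ranked lists by relative depth rank / len(list), shallowest first.

    Ties keep the order in which the collections were given. A document that
    appears in several lists is kept only at its first (shallowest) position.
    """
    entries: list[tuple[float, str]] = []
    for docs in lists.values():
        n = len(docs)
        for rank, doc in enumerate(docs, start=1):
            entries.append((rank / n, doc))
    entries.sort(key=lambda entry: entry[0])  # stable sort preserves ties
    merged: list[str] = []
    seen: set[str] = set()
    for _, doc in entries:
        if doc not in seen:
            seen.add(doc)
            merged.append(doc)
    return merged
```

A collection that returned twice as many results contributes about twice as many documents at every depth of the fused list.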
Slide 10
Merge using the relevance judgments of different search engines:
My rank or relevance = f(service provider; their rankings and relevance judgments; their past performance; nature of the scale used)
Weight per engine = f(correlation with my judgments), e.g.:
- Excite: .01
- AltaVista: .50
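The weights on slide 10 are a function of each engine's correlation with the user's own judgments. Here is a minimal sketch under three assumptions:
- The correlation is Pearson, computed on a shared sample of judged documents.
- Negative correlations are clamped to zero.
- `engine_weights` is a hypothetical name, and the sample data in any usage are made up for illustration.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either list has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def engine_weights(samples: dict[str, tuple[list[float], list[float]]]) -> dict[str, float]:
    """samples maps engine -> (engine's scores, my judgments) for the same
    sampled documents. Weight = correlation, clamped at 0 so an engine that
    anti-correlates with my judgments gets zero weight, not a negative one."""
    return {
        engine: max(0.0, pearson(scores, judged))
        for engine, (scores, judged) in samples.items()
    }
```

An engine whose ordering barely tracks the user's judgments receives a weight near zero, like the .01 shown for Excite.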
Slide 11: Sample-Based Relevance
[Figure: samples from Collection 1 and Collection 2 (system scores .99 down to .91); system ranking vs. user ranking (collective judgment) for Systems 1 to 4 against an "ideal" reference, with scales 0 to 100 and 0.0 to 1.00]
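Slide 11 compares each system's ranking with the users' collective judgment on a sample. One hedged way to use such a sample is to fit a per-system linear map from system score to user judgment by ordinary least squares, then apply it to unseen scores. The linear model and the function names are assumptions, not the slide's method. The 0-to-100 system scale and 0.0-to-1.00 user scale in the usage mirror the figure's apparent axes.

```python
def fit_calibration(system: list[float], user: list[float]) -> tuple[float, float]:
    """Least-squares fit of user ~ alpha + beta * system on a judged sample.
    Returns (alpha, beta)."""
    n = len(system)
    mx, my = sum(system) / n, sum(user) / n
    sxx = sum((x - mx) ** 2 for x in system)
    if sxx == 0:
        return my, 0.0  # no spread in system scores: predict the mean judgment
    beta = sum((x - mx) * (y - my) for x, y in zip(system, user)) / sxx
    return my - beta * mx, beta


def calibrate(scores: list[float], alpha: float, beta: float) -> list[float]:
    """Map raw system scores onto the user's 0.0-1.0 scale, clipped to that range."""
    return [min(1.0, max(0.0, alpha + beta * s)) for s in scores]
```

Fitting one map per system puts all systems on the users' scale, so their calibrated scores can then be merged directly.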