1
Developments in Evaluation of Search Engines
  • Mark Sanderson,
  • University of Sheffield

2
Evaluation in IR
  • Use a test collection: a set of
  • Documents
  • Topics
  • Relevance Judgements
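
As a minimal sketch, these three components can be pictured as simple Python structures (the IDs, topic, and judgements below are invented for illustration, not a real collection):

    # Documents: doc ID -> text
    docs = {
        "d1": "Plans for a nuclear waste dump in Utah ...",
        "d2": "Annual rainfall statistics for Peru ...",
    }

    # Topics: topic ID -> statement of an information need
    topics = {"t1": "nuclear waste dumping"}

    # Relevance judgements (qrels): topic ID -> set of relevant doc IDs
    qrels = {"t1": {"d1"}}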

3
How to get lots of judgements?
  • Do you check all documents for all topics?
  • In the old days
  • Yes
  • But this doesn't scale

4
To form larger test collections
  • Get your relevance judgements from pools
  • How does that work?

5
Pooling many participants
[Diagram: each run below is retrieved from the collection; the top documents of all runs are merged into a single judging pool]
  • Run 1
  • Run 2
  • Run 3
  • Run 4
  • Run 5
  • Run 6
  • Run 7
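
In code, pooling amounts to taking the union of the top-ranked documents from every participating run. A minimal sketch, assuming each run is just a ranked list of document IDs (the runs below are invented):

    def form_pool(runs, depth=100):
        """Union of the top-`depth` documents across all runs."""
        pool = set()
        for run in runs:
            pool.update(run[:depth])
        return pool

    run1 = ["d3", "d1", "d7", "d2"]
    run2 = ["d1", "d9", "d3", "d4"]
    pool = form_pool([run1, run2], depth=2)  # {"d3", "d1", "d9"}
    # Only pooled documents are judged; everything outside the
    # pool is treated as non-relevant.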

6
Classic pool formation
  • 10-100 runs
  • Judge 1-2K documents per topic
  • 10-20 hours per topic
  • 50 topics: too much effort for one person

7
Look at the two problem areas
  • Pooling requires many participants
  • Relevance assessment requires many person-hours

8
Query pooling
  • Don't have multiple runs from groups
  • Have one person create multiple queries

9
Query pooling
  • First proposed by
  • Cormack, G.V., Palmer, C.R., Clarke, C.L.A.
    (1998) Efficient Construction of Large Test
    Collections, in Proceedings of the 21st annual
    international ACM SIGIR conference on Research
    and Development in Information Retrieval, 282-289
  • Confirmed by
  • Sanderson, M., Joho, H. (2004) Forming Test
    Collections with No System Pooling, in Proceedings
    of the 27th annual international ACM SIGIR
    conference

10
Query pooling
[Diagram: each query variant below is run against the collection and its results are merged into the pool]
  • Nuclear waste dumping
  • Radioactive waste
  • Radioactive waste storage
  • Hazardous waste
  • Nuclear waste storage
  • Utah nuclear waste
  • Waste dump
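
Query pooling swaps the union over runs for a union over one assessor's query variants. A minimal sketch, assuming a hypothetical search(query, depth) function that returns a ranked list of document IDs:

    queries = [
        "nuclear waste dumping", "radioactive waste",
        "radioactive waste storage", "hazardous waste",
        "nuclear waste storage", "utah nuclear waste", "waste dump",
    ]

    def query_pool(queries, search, depth=100):
        """Pool the top-`depth` results of each query variant."""
        pool = set()
        for q in queries:
            pool.update(search(q, depth))
        return pool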

11
Another approach
  • Maybe your assessors
  • Can read very fast,
  • but can't search very well
  • Form different queries with relevance feedback

12
Query pooling, relevance feedback
[Diagram: the initial query and each feedback iteration below retrieve from the collection into the pool]
  • Nuclear waste dumping
  • Feedback 1
  • Feedback 2
  • Feedback 3
  • Feedback 4
  • Feedback 5
  • Feedback 6

13
Relevance feedback
  • Use relevance feedback to form queries
  • Soboroff, I., Robertson, S. (2003) Building a
    Filtering Test Collection for TREC 2002, in
    Proceedings of the ACM SIGIR conference
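
One illustrative way to derive such queries is Rocchio-style expansion from documents already judged relevant; this sketch shows the general idea, not necessarily the exact procedure of Soboroff and Robertson:

    from collections import Counter

    def feedback_query(original_terms, relevant_texts, n_expand=5):
        """Expand a query with frequent terms from judged-relevant docs."""
        counts = Counter()
        for text in relevant_texts:
            counts.update(text.lower().split())
        expansion = [t for t, _ in counts.most_common()
                     if t not in original_terms][:n_expand]
        return list(original_terms) + expansion

    # e.g. feedback_query(["nuclear", "waste", "dumping"],
    #                     ["utah site proposed for radioactive waste storage"])
    # -> ["nuclear", "waste", "dumping", "utah", "site", ...]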

14
Both options save time
  • With query pooling
  • 2 hours per topic
  • With system pooling
  • 10-20 hours per topic?

15
Notice, we didn't get everything
  • How much was missed?
  • Attempts to estimate
  • Zobel,
  • ACM SIGIR 1998
  • Manmatha,
  • ACM SIGIR 2001

[Graph: estimated probability of relevance P(r) declining with document rank]
16
Do missing Rels matter?
  • For conventional IR testing?
  • No, not interested in such things
  • Just want to know whether
  • A > B
  • A = B
  • A < B
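
A minimal sketch of that comparative question, using invented per-topic scores (e.g. average precision) for systems A and B:

    scores_a = [0.42, 0.31, 0.55]  # hypothetical per-topic scores for A
    scores_b = [0.38, 0.35, 0.50]  # hypothetical per-topic scores for B

    mean_a = sum(scores_a) / len(scores_a)
    mean_b = sum(scores_b) / len(scores_b)
    if mean_a > mean_b:
        print("A > B")
    elif mean_a < mean_b:
        print("A < B")
    else:
        print("A = B")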

17
Not good enough?
  • 1-2 hours per topic is still a lot of work
  • Hints that 50 topics are too few
  • Million Query Track of TREC
  • What can we do?

18
Test collections are
  • Reproducible
  • Reusable
  • Encourage collaboration
  • Enable cross-system comparison
  • Tell you if your new idea works
  • Help you publish your work

19
How do you do this?
  • Focus on reducing number of relevance assessments

20
Simple approach
  • TREC/CLEF judge down to
  • top 100 (sometimes 50)
  • Instead, judge down to top 10
  • Far fewer documents
  • 1/11th to 1/14th of the relevance assessor effort
  • Compared to judging to top 100
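
The saving can be seen directly in the ratio of pool sizes at the two judging depths. A sketch on invented runs, reusing the pooling idea from earlier:

    def pool_size(runs, depth):
        return len({doc for run in runs for doc in run[:depth]})

    # Ten hypothetical runs of 100 documents each over a 500-doc space
    runs = [["d%d" % ((i * j) % 500) for i in range(1, 101)]
            for j in range(1, 11)]
    ratio = pool_size(runs, 100) / pool_size(runs, 10)
    print("judging to top 100 costs %.1fx the effort of top 10" % ratio)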

21
Impact of saving
  • Save a lot of time
  • Lose a little in measurement accuracy

22
Use time saved
  • To work on more topics
  • Measurement accuracy improves.
  • Sanderson, M., Zobel, J. (2005) Information
    Retrieval System Evaluation: Effort, Sensitivity,
    and Reliability, in Proceedings of the 28th
    ACM SIGIR conference

23
Questions?
  • m.sanderson@shef.ac.uk
  • dis.shef.ac.uk/mark