Effectiveness Measures - PowerPoint PPT Presentation

1
Effectiveness Measures and Relevance Functions in
Ranking INEX Systems
  • Huyen-Trang Vu
  • Patrick Gallinari
  • (LIP6, UPMC, Paris, France)

2
Outline
  • Motivation
  • Evaluation setting
  • Experiments
      • graded relevance scales
      • discrimination power
      • interval performance
  • Conclusion
  • Perspectives

3
Two TREC-style assumptions
  • tie-free
      • the answers delivered by a retrieval system
        are strictly ordered by relevance score.
  • binary relevance
      • a document is either relevant or non-relevant
        to a topic.

→ simplifies the relevance judgment and measurement
procedures → BUT are the measured results
appropriate?
4
INEX campaign
  • INitiative for the Evaluation of XML Retrieval
  • retrieval unit: an arbitrary XML element
  • more ties than in usual document retrieval?
  • graded scales to describe finer relevance levels?
  • an opportunity to revise the two assumptions
    together!

5
Analysis Methods
  • traditional techniques on system ranking
      • correlation statistics, e.g. Kendall's tau
      • observing swap occurrences
  • we also adopt
      • multiple-comparison statistical tests
      • interval performance
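As a rough illustration of the traditional correlation analysis, the sketch below computes Kendall's tau between the system rankings produced by two evaluation settings and also counts the swapped (discordant) pairs. It assumes each setting is given as a dict from a run id to one effectiveness score; the run names and scores in the usage line are made up. It uses the tau-a variant, where tied pairs count as neither concordant nor discordant; the slides do not say which variant was used, and libraries such as SciPy default to tau-b, which adjusts for ties.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between the system rankings of two evaluation settings.

    scores_a, scores_b: dicts mapping a run id to its effectiveness score
    (e.g. MAP) under each setting. Returns (tau, number_of_swapped_pairs).
    """
    runs = sorted(scores_a)
    assert set(runs) == set(scores_b), "both settings must score the same runs"
    concordant = discordant = 0
    for x, y in combinations(runs, 2):
        da = scores_a[x] - scores_a[y]
        db = scores_b[x] - scores_b[y]
        if da * db > 0:
            concordant += 1
        elif da * db < 0:
            discordant += 1  # the pair's order is swapped between the settings
        # pairs tied in either setting count as neither (tau-a)
    n_pairs = len(runs) * (len(runs) - 1) // 2
    return (concordant - discordant) / n_pairs, discordant


# Hypothetical scores for three runs under two settings.
tau, swaps = kendall_tau({"run1": 0.30, "run2": 0.25, "run3": 0.10},
                         {"run1": 0.28, "run2": 0.31, "run3": 0.12})
```

Here only the run1/run2 pair is swapped, so tau is (2 - 1) / 3.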

6
Outline
  • Motivation
  • Evaluation setting: relevance functions & measures
  • Experiments
      • graded relevance scales
      • discrimination power
      • interval performance
  • Conclusion
  • Perspectives

7
Experiment setting
  • 4 INEX relevance functions
      • binary: trec, s3e321
      • graded: gen (5 levels), sog (7 levels)
  • 4 measure versions
      • MAP
      • inex_eval
      • Q-measure with β = 0.1 and β = 10
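To make the role of β concrete, here is a minimal per-topic sketch of average precision (averaged over topics to give MAP) and of Q-measure, following Sakai's usual definition: each relevant answer at rank r contributes (β·cg(r) + count(r)) / (β·cg_ideal(r) + r), and the sum is divided by the number of relevant answers R. It assumes gains have already been produced by a relevance function (0/1 for a binary function, fractional for a graded one); the gain lists in the usage lines are hypothetical, and inex_eval is not sketched here.

```python
def average_precision(ranked_gains, num_relevant):
    """Non-interpolated AP for one topic; any gain > 0 counts as relevant."""
    hits = 0
    total = 0.0
    for rank, g in enumerate(ranked_gains, start=1):
        if g > 0:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / num_relevant if num_relevant else 0.0


def q_measure(ranked_gains, ideal_gains, beta=1.0):
    """Q-measure for one topic.

    ranked_gains: gain of each retrieved answer, in system order.
    ideal_gains: gains of all relevant answers (any order); sorted in
    decreasing order they form the ideal ranking, padded with zeros.
    beta: scales the gains; beta = 0 reduces Q-measure to AP.
    """
    ideal = sorted((g for g in ideal_gains if g > 0), reverse=True)
    num_rel = len(ideal)
    if num_rel == 0:
        return 0.0
    cg = cg_ideal = 0.0
    count = 0
    total = 0.0
    for rank, g in enumerate(ranked_gains, start=1):
        cg += g
        if rank <= num_rel:
            cg_ideal += ideal[rank - 1]  # beyond R the ideal gain stays flat
        if g > 0:
            count += 1
            total += (beta * cg + count) / (beta * cg_ideal + rank)
    return total / num_rel


# Hypothetical graded gains for one topic: the system finds two of three
# relevant answers; MAP would be the mean of AP over all topics.
ap = average_precision([1.0, 0.0, 0.5], num_relevant=3)
q_small = q_measure([1.0, 0.0, 0.5], [1.0, 0.5, 0.25], beta=0.1)
q_large = q_measure([1.0, 0.0, 0.5], [1.0, 0.5, 0.25], beta=10)
```

Since β multiplies every gain, a small β pushes Q-measure towards AP while a large β makes it weigh graded gains more heavily.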

8
Performance measures
9
Outline
  • Motivation
  • Evaluation setting
  • Experiments
      • graded relevance scales
      • discrimination power
      • interval performance
  • Conclusion
  • Perspectives

10
(No Transcript)
11
Outline
  • Motivation
  • Evaluation setting
  • Experiments
      • graded relevance scales
      • discrimination power
      • interval performance
  • Conclusion
  • Perspectives

12
Discrimination power
  • A single-value indicator of the sensitivity of
    an evaluation setting, i.e. its capacity, given a
    test collection, to distinguish between systems
  • e.g.
      • REER (Voorhees & Buckley)
          • P(swap of a system pair)
      • size of group A (Tague-Sutcliffe & Blustein)
          • runs that are not statistically different
            from the top run
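One way to estimate P(swap of a system pair) is to split the topics repeatedly into two disjoint halves and count how often a pair of runs is ordered differently on the two halves. The sketch below does exactly that; it is a simplified reading of the Voorhees & Buckley procedure (their analysis also bins pairs by score difference and extrapolates over topic-set sizes, which is omitted here). It assumes per-topic scores are given as a dict from run id to a list of scores in the same topic order for every run; the function name and parameters are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def swap_rate(per_topic, n_trials=100, seed=0):
    """Estimate P(swap of a system pair) between disjoint random topic halves.

    per_topic: dict mapping a run id to its per-topic scores (same topic
    order for every run). Returns swapped comparisons / all comparisons.
    """
    rng = random.Random(seed)
    runs = sorted(per_topic)
    n_topics = len(per_topic[runs[0]])
    half = n_topics // 2
    swaps = comparisons = 0
    for _ in range(n_trials):
        topics = list(range(n_topics))
        rng.shuffle(topics)
        set1, set2 = topics[:half], topics[half:2 * half]
        for a, b in combinations(runs, 2):
            d1 = mean(per_topic[a][t] for t in set1) - mean(per_topic[b][t] for t in set1)
            d2 = mean(per_topic[a][t] for t in set2) - mean(per_topic[b][t] for t in set2)
            comparisons += 1
            if d1 * d2 < 0:  # the two halves disagree on which run is better
                swaps += 1
    return swaps / comparisons
```

A lower swap rate means the evaluation setting orders runs more consistently across topic samples.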

13
Size of group A
  • binary scales are more stable than graded ones.
  • β affects the discriminative power of Q-measure
    on graded scales
  • low discriminative power (12)
      • difficult task?
      • small test collection?
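The size of group A, defined on the previous slide as the runs not statistically different from the top run, can be sketched as below. This is not the Tague-Sutcliffe & Blustein procedure, which relies on ANOVA with multiple comparisons; a paired randomization (sign-flip) test against the top run is swapped in instead, with no multiple-comparison correction. The per-topic score layout, the alpha level and the function names are assumptions for illustration.

```python
import random
from statistics import mean


def paired_randomization_p(x, y, n_perm=2000, seed=0):
    """Two-sided paired randomization (sign-flip) test on per-topic scores."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(x, y)]
    observed = abs(mean(diffs))
    extreme = 0
    for _ in range(n_perm):
        # Under H0 each per-topic difference is equally likely to have either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


def group_a(per_topic, alpha=0.05):
    """Top run plus all runs whose difference from it is not significant."""
    top = max(per_topic, key=lambda r: mean(per_topic[r]))
    members = [top]
    for run, scores in per_topic.items():
        if run != top and paired_randomization_p(per_topic[top], scores) >= alpha:
            members.append(run)
    return members
```

A large group A signals low discriminative power: many runs cannot be separated from the best one.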

14
Outline
  • Motivation
  • Evaluation setting
  • Experiments
      • graded relevance scales
      • discrimination power
      • interval performance
  • Conclusion
  • Perspectives

15
Interval performance
  • so far: point estimators of performance
  • interval performance expresses robustness across
    different topics
  • confidence intervals obtained via the bootstrap
    technique
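A minimal sketch of interval performance: resample the topics with replacement, recompute the mean score each time, and read a confidence interval off the resulting distribution. It uses the simple percentile bootstrap; the slides do not say which bootstrap variant was used, so that choice, like the function name and defaults, is an assumption.

```python
import random
from statistics import mean


def bootstrap_ci(per_topic_scores, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a run's mean over topics."""
    rng = random.Random(seed)
    n = len(per_topic_scores)
    # Each replicate resamples n topics with replacement and takes their mean.
    boot_means = sorted(mean(rng.choices(per_topic_scores, k=n)) for _ in range(n_boot))
    alpha = (1 - level) / 2
    lo_idx = round(alpha * (n_boot - 1))
    hi_idx = round((1 - alpha) * (n_boot - 1))
    return boot_means[lo_idx], boot_means[hi_idx]
```

A wide interval means the run's performance varies strongly from topic to topic, even when its point estimate looks good.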

16
Interval Performance calculated by inex_eval
17
Conclusion
  • binary relevance scales: high resemblance among
    these measures.
  • graded relevance scales: Q-measure with very small
    gain values is the most discriminative.
  • from an efficiency viewpoint (relevance judgment,
    β setting, computational complexity), binary MAP on
    the liberal judgment seems an appropriate trade-off.
  • these evaluation settings aren't suitable for
    describing system robustness.

18
Perspectives
  • System ranking by other statistics: median,
    geometric mean, etc.
  • REER (Voorhees & Buckley) and statistical tests via
    the bootstrap.
  • robust retrieval methods.
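Ranking systems by statistics other than the arithmetic mean can reorder them, as this small sketch shows. The geometric mean adds a small ε to each score so that one zero-score topic does not zero out the whole product, in the spirit of GMAP; the ε value, the function names and the two hypothetical runs are assumptions for illustration.

```python
import math
from statistics import mean, median


def geometric_mean(scores, eps=1e-5):
    """Geometric mean of per-topic scores; eps guards against zero scores."""
    return math.exp(mean(math.log(s + eps) for s in scores))


def rank_systems(per_topic, stat):
    """Run ids sorted by a per-topic summary statistic, best first."""
    return sorted(per_topic, key=lambda r: stat(per_topic[r]), reverse=True)


# Hypothetical runs: "spiky" is excellent on one topic and poor elsewhere.
runs = {"steady": [0.3, 0.3, 0.3], "spiky": [0.9, 0.0, 0.05]}
by_mean = rank_systems(runs, mean)             # spiky ranks first
by_median = rank_systems(runs, median)         # steady ranks first
by_geo = rank_systems(runs, geometric_mean)    # steady ranks first
```

The median and geometric mean reward consistent performance across topics, which is closer to the notion of robustness than the arithmetic mean.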

19
Open questions on retrieval methods
  • modelling graded retrieval status (ADMs
    motivation)?
  • using Q-measure or inex_eval as retrieval
    criteria?