Retrieval Evaluation (Chapter 3) - Presentation Transcript
1
Retrieval Evaluation (Chapter 3)

2
Introduction
  • Before the final implementation of an IR system, an evaluation is carried out
  • an IR system requires an evaluation of how precise its answer set is
  • known as retrieval performance evaluation
  • such an evaluation is based on a test reference collection and on an evaluation measure

3
  • A test reference collection consists of
  • a collection of documents
  • a set of example information requests
  • a set of relevant documents (provided by specialists) for each example information request
  • Example test reference collections
  • TIPSTER/TREC (Section 3.3.1)
  • CACM, ISI (Section 3.3.2)

4
  • Given a retrieval strategy S, an evaluation measure quantifies (for each example query)
  • the similarity between the set of documents retrieved by S and the set of relevant documents provided by the specialists
  • Evaluation measures
  • recall
  • precision

5
Retrieval Performance Evaluation
  • Criteria to consider in retrieval performance evaluation
  • nature of the query request (batch or interactive)
  • nature of the setting (laboratory or real-life situation)
  • nature of the interface (batch mode or interactive mode)

6
Recall and Precision
  • Consider an information request I and its set R of relevant documents
  • let |R| be the no. of documents in this set
  • assume that the retrieval strategy being evaluated processes I and generates a document answer set A
  • let |A| be the no. of documents in this set
  • let |Ra| be the no. of documents in the intersection of sets R and A

7
Recall and Precision (Cont.)
  • Recall is the fraction of the relevant documents (set R) which has been retrieved
  • recall = |Ra| / |R|
  • Precision is the fraction of the retrieved documents (set A) which are relevant
  • precision = |Ra| / |A|
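As an illustration (not part of the original slides), the two measures can be computed directly from the sets R, A and their intersection Ra. A minimal Python sketch, with hypothetical document ids:

    # recall and precision for a single query
    def recall_precision(relevant, answer):
        ra = len(relevant & answer)                  # |Ra|
        return ra / len(relevant), ra / len(answer)  # |Ra|/|R|, |Ra|/|A|

    R = {"d3", "d56", "d129"}      # relevant documents (hypothetical)
    A = {"d123", "d84", "d56"}     # answer set returned by strategy S
    print(recall_precision(R, A))  # (0.333..., 0.333...)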

8
  • This assumes that all documents in set A have been examined
  • However, the user is not usually presented with all documents in set A at once
  • The documents in set A are ranked and the user examines the ranked list starting from the top
  • Thus, the recall and precision measures vary as the user proceeds with the examination of set A

9
  • Proper evaluation requires plotting a precision versus recall curve
  • Assume a set Rq, defined below, containing the relevant documents for a query q
  • Rq = {d3, d5, d9, d25, d39, d44, d56, d71, d89, d123}
  • Hence, according to the group of specialists, there are ten documents relevant to query q

10
  • Assume that a retrieval algorithm returns, for query q, the following ranking of the documents in the answer set
  • Ranking for query q (relevant documents marked with *)
  •  1. d123 *     6. d9 *      11. d38
  •  2. d84        7. d511      12. d48
  •  3. d56 *      8. d129      13. d250
  •  4. d6         9. d187      14. d113
  •  5. d8        10. d25 *     15. d3 *

11
  • Document d123 (rank 1) is relevant
  • precision of 100% (1 out of 1 document examined in the answer set is relevant)
  • 10% recall (1 of the 10 relevant documents in set Rq)
  • Document d56 (rank 3) is the next relevant document
  • precision of 66% (2 out of 3 documents examined in the answer set are relevant)
  • 20% recall (2 of the 10 relevant documents in set Rq)
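A small sketch (an assumed helper, not from the slides) that walks the ranked list above and reports a (recall, precision) point each time a relevant document is found; its first two points reproduce the figures quoted on this slide:

    def recall_precision_points(ranking, relevant):
        points, found = [], 0
        for i, doc in enumerate(ranking, start=1):   # i = documents examined so far
            if doc in relevant:
                found += 1
                points.append((found / len(relevant), found / i))  # (recall, precision)
        return points

    Rq = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
    ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
               "d187", "d25", "d38", "d48", "d250", "d113", "d3"]
    for r, p in recall_precision_points(ranking, Rq):
        print(f"recall {r:.0%}  precision {p:.1%}")
    # prints: 10% / 100.0%, 20% / 66.7%, 30% / 50.0%, 40% / 40.0%, 50% / 33.3%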

12
  • So far, the precision and recall figures are for a single query (see Fig. 3.2)
  • However, retrieval algorithms are usually evaluated over several distinct queries
  • To evaluate the retrieval performance of an algorithm over all test queries, the precision at each recall level is averaged

13
  • average precision at recall level r:  P(r) = (1/Nq) Σ_{i=1..Nq} Pi(r)
  • where P(r) is the avg. precision at recall level r, Nq is the no. of queries used, and Pi(r) is the precision at recall level r for the i-th query
  • since the recall levels for each query may be distinct from the 11 standard recall levels, interpolation is often necessary
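A sketch of the averaging step (the function name and the per-query inputs below are assumptions; the interpolated per-query precision values are taken as given):

    RECALL_LEVELS = [j / 10 for j in range(11)]   # the 11 standard recall levels

    def average_precision(per_query):
        # per_query: one dict per query mapping recall level r -> interpolated Pi(r)
        nq = len(per_query)
        return {r: sum(q[r] for q in per_query) / nq for r in RECALL_LEVELS}

    # two hypothetical queries' interpolated precision figures
    q1 = {r: max(0.0, 1.0 - r) for r in RECALL_LEVELS}
    q2 = {r: 0.5 for r in RECALL_LEVELS}
    print(average_precision([q1, q2]))            # P(r) at each standard level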

14
  • Assume that the relevant document set Rq for query q is changed to
  • Rq = {d3, d56, d129}
  • The answer set is still the set of 15 ranked documents above
  • Document d56 is the first relevant document
  • recall level of 33.3% (1 of the 3 relevant docs. in set Rq)
  • precision of 33.3% (1 of the 3 docs. examined is relevant)

15
  • Document d129 is the next relevant doc.
  • recall level of 66.6% (2 of the 3 relevant docs. in set Rq)
  • precision of 25% (2 of the 8 docs. examined are relevant)
  • Document d3 is the next relevant doc.
  • recall level of 100% (3 of the 3 relevant docs. in set Rq)
  • precision of 20% (3 of the 15 docs. examined are relevant)
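Reusing the recall_precision_points sketch from earlier with this reduced relevant set reproduces the three (recall, precision) points just listed:

    Rq = {"d3", "d56", "d129"}
    print(recall_precision_points(ranking, Rq))
    # approximately [(0.33, 0.33), (0.67, 0.25), (1.0, 0.2)]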

16
Interpolation
  • Let rj, j ∈ {0, 1, 2, ..., 10}, be a reference to the j-th standard recall level (i.e. r5 is a reference to recall level 50%)
  • P(rj) = max_{rj ≤ r ≤ rj+1} P(r)
  • the interpolated precision at the j-th standard recall level is the maximum known precision at any recall level between the j-th recall level and the (j+1)-th recall level

17
Interpolation (Cont.)
  • In our last example, the interpolation rule yields the following precision and recall figures (see Fig. 3.3)
  • at recall levels 0%, 10%, 20% and 30%, the interpolated precision is 33.3% (the known precision at recall level 33.3%)
  • at recall levels 40%, 50% and 60%, the interpolated precision is 25% (the known precision at recall level 66.6%)
  • at recall levels 70%, 80%, 90% and 100%, the interpolated precision is 20%
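A sketch that reproduces these interpolated figures; it assumes the usual monotone reading of the rule (take the maximum known precision at any recall level at or above the standard level):

    def interpolate(points, levels=None):
        # points: (recall, precision) pairs observed for one query
        levels = levels if levels is not None else [j / 10 for j in range(11)]
        return {rj: max((p for r, p in points if r >= rj), default=0.0)
                for rj in levels}

    # known points for Rq = {d3, d56, d129}: (33.3%, 33.3%), (66.6%, 25%), (100%, 20%)
    points = [(1/3, 1/3), (2/3, 2/8), (1.0, 3/15)]
    for rj, p in sorted(interpolate(points).items()):
        print(f"recall {rj:.0%}: interpolated precision {p:.1%}")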

18
  • Average precision versus recall figures are used to compare the retrieval performance of distinct retrieval algorithms
  • the standard evaluation strategy for IR systems
  • Fig. 3.4 illustrates average precision versus recall figures for two distinct retrieval algorithms
  • one algorithm has higher precision at lower recall levels
  • the second algorithm is superior at higher recall levels

19
TREC Collection
  • Document Collection
  • TREC-3: 2 gigabytes
  • TREC-6: 5.8 gigabytes
  • documents come from the Wall Street Journal, AP, ZIFF, FR, DOE, San Jose Mercury News, US Patents, Financial Times, CR, FBIS, LA Times (see Table 3.1 for TREC-6)
  • an example TREC document, numbered WSJ880406-0090, is shown in Fig. 4.7

20
  • Example Information Requests (Topics)
  • each request is a description of an information need in natural language
  • each test information request is referred to as a topic
  • the topics prepared for the first six TREC conferences number 350
  • an example TREC information request, for topic number 168, is shown in Fig. 3.8
  • converting a topic into a system query (e.g. a Boolean query) is done by the system

21
  • Relevant Documents
  • at the TREC conferences, the set of relevant documents for each topic is obtained from a pool of possibly relevant documents
  • this pool is created by taking the top K (usually K = 100) documents from the rankings generated by the various participating retrieval systems
  • the documents in the pool are shown to human assessors, who then decide on the relevance of each document
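A minimal sketch of the pooling step (the pool construction only; relevance is still judged by the human assessors). The function name and the sample runs are illustrative:

    def build_pool(rankings, k=100):
        # rankings: one ranked document-id list per participating system
        pool = set()
        for ranking in rankings:
            pool.update(ranking[:k])      # top-K documents from each system
        return pool                       # shown to the assessors for judging

    runs = [["d1", "d2", "d3"], ["d3", "d4", "d5"], ["d2", "d6", "d1"]]
    print(sorted(build_pool(runs, k=2)))  # ['d1', 'd2', 'd3', 'd4', 'd6']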