Retrieval Evaluation (Chapter 3) - Presentation Transcript
1
Retrieval Evaluation (Chapter 3)

2
Introduction
  • Before the final implementation of an IR system, an evaluation is carried out
  • an IR system requires an evaluation of how precise its answer set is
  • known as retrieval performance evaluation
  • such an evaluation is based on a test reference collection and on an evaluation measure

3
  • A test reference collection consists of
  • a collection of documents
  • a set of example information requests
  • a set of relevant documents (provided by specialists) for each example information request
  • Example test reference collections
  • TIPSTER/TREC (Section 3.3.1)
  • CACM, ISI (Section 3.3.2)

4
  • Given a retrieval strategy S, an evaluation measure quantifies (for each example query)
  • the similarity between the set of documents retrieved by S and the set of relevant documents provided by the specialists
  • Evaluation measures
  • recall
  • precision

5
Retrieval Performance Evaluation
  • Criteria to consider in retrieval performance evaluation
  • nature of the query request (batch or interactive)
  • nature of the setting (laboratory or real-life situation)
  • nature of the interface (batch mode or interactive mode)

6
Recall and Precision
  • Consider an information request I and its set R of relevant documents
  • let |R| be the no. of documents in this set
  • assume that the retrieval strategy being evaluated processes I and generates a document answer set A
  • let |A| be the no. of documents in this set
  • let |Ra| be the no. of documents in the intersection of sets R and A

7
Recall and Precision (Cont.)
  • Recall is the fraction of the relevant documents (set R) which has been retrieved
  • recall = |Ra| / |R|
  • Precision is the fraction of the retrieved documents (set A) which are relevant
  • precision = |Ra| / |A|
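As an illustration (not part of the original slides), the two measures can be computed directly from the sets R, A and their intersection Ra. A minimal Python sketch, with hypothetical document ids:

    # recall and precision for a single query
    def recall_precision(relevant, answer):
        ra = len(relevant & answer)                  # |Ra|
        return ra / len(relevant), ra / len(answer)  # |Ra|/|R|, |Ra|/|A|

    R = {"d3", "d56", "d129"}      # relevant documents (hypothetical)
    A = {"d123", "d84", "d56"}     # answer set returned by strategy S
    print(recall_precision(R, A))  # (0.333..., 0.333...)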

8
  • This assumes that all documents in set A have been examined
  • However, the user is not usually presented with all documents in set A at once
  • The documents in set A are ranked and the user examines the ranked list starting from the top
  • Thus, the recall and precision measures vary as the user proceeds with the examination of set A

9
  • Proper evaluation requires plotting a precision versus recall curve
  • Assume a set Rq, defined below, containing the relevant documents for a query q
  • Rq = {d3, d5, d9, d25, d39, d44, d56, d71, d89, d123}
  • Hence, according to the group of specialists, there are ten documents relevant to query q

10
  • Assume that a retrieval algorithm returns, for query q, the following ranking of the documents in the answer set
  • Ranking for query q (relevant documents marked with *)
  •  1. d123 *     6. d9 *      11. d38
  •  2. d84        7. d511      12. d48
  •  3. d56 *      8. d129      13. d250
  •  4. d6         9. d187      14. d113
  •  5. d8        10. d25 *     15. d3 *

11
  • Document d123 (rank 1) is relevant
  • precision of 100% (1 out of 1 document examined in the answer set is relevant)
  • 10% recall (1 of the 10 relevant documents in set Rq)
  • Document d56 (rank 3) is the next relevant document
  • precision of 66% (2 out of 3 documents examined in the answer set are relevant)
  • 20% recall (2 of the 10 relevant documents in set Rq)
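A small sketch (an assumed helper, not from the slides) that walks the ranked list above and reports a (recall, precision) point each time a relevant document is found; its first two points reproduce the figures quoted on this slide:

    def recall_precision_points(ranking, relevant):
        points, found = [], 0
        for i, doc in enumerate(ranking, start=1):   # i = documents examined so far
            if doc in relevant:
                found += 1
                points.append((found / len(relevant), found / i))  # (recall, precision)
        return points

    Rq = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
    ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
               "d187", "d25", "d38", "d48", "d250", "d113", "d3"]
    for r, p in recall_precision_points(ranking, Rq):
        print(f"recall {r:.0%}  precision {p:.1%}")
    # prints: 10% / 100.0%, 20% / 66.7%, 30% / 50.0%, 40% / 40.0%, 50% / 33.3%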

12
  • So far, the precision and recall figures are for a single query (see Fig. 3.2)
  • However, retrieval algorithms are usually evaluated over several distinct queries
  • To evaluate the retrieval performance of an algorithm over all test queries, the precision at each recall level is averaged

13
  • average precision at recall level r:  P(r) = (1/Nq) Σ_{i=1..Nq} Pi(r)
  • where P(r) is the avg. precision at recall level r, Nq is the no. of queries used, and Pi(r) is the precision at recall level r for the i-th query
  • since the recall levels for each query may be distinct from the 11 standard recall levels, interpolation is often necessary
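A sketch of the averaging step (the function name and the per-query inputs below are assumptions; the interpolated per-query precision values are taken as given):

    RECALL_LEVELS = [j / 10 for j in range(11)]   # the 11 standard recall levels

    def average_precision(per_query):
        # per_query: one dict per query mapping recall level r -> interpolated Pi(r)
        nq = len(per_query)
        return {r: sum(q[r] for q in per_query) / nq for r in RECALL_LEVELS}

    # two hypothetical queries' interpolated precision figures
    q1 = {r: max(0.0, 1.0 - r) for r in RECALL_LEVELS}
    q2 = {r: 0.5 for r in RECALL_LEVELS}
    print(average_precision([q1, q2]))            # P(r) at each standard level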

14
  • Assume that the relevant document set Rq for query q is changed to
  • Rq = {d3, d56, d129}
  • The answer set is still the set of 15 ranked documents above
  • Document d56 is the first relevant document
  • recall level of 33.3% (1 of the 3 relevant docs. in set Rq)
  • precision of 33.3% (1 of the 3 docs. examined is relevant)

15
  • Document d129 is the next relevant doc.
  • recall level of 66.6% (2 of the 3 relevant docs. in set Rq)
  • precision of 25% (2 of the 8 docs. examined are relevant)
  • Document d3 is the next relevant doc.
  • recall level of 100% (3 of the 3 relevant docs. in set Rq)
  • precision of 20% (3 of the 15 docs. examined are relevant)
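Reusing the recall_precision_points sketch from earlier with this reduced relevant set reproduces the three (recall, precision) points just listed:

    Rq = {"d3", "d56", "d129"}
    print(recall_precision_points(ranking, Rq))
    # approximately [(0.33, 0.33), (0.67, 0.25), (1.0, 0.2)]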

16
Interpolation
  • Let rj, j ∈ {0, 1, 2, ..., 10}, be a reference to the j-th standard recall level (i.e. r5 is a reference to recall level 50%)
  • P(rj) = max_{rj ≤ r ≤ rj+1} P(r)
  • the interpolated precision at the j-th standard recall level is the maximum known precision at any recall level between the j-th recall level and the (j+1)-th recall level

17
Interpolation (Cont.)
  • In our last example, the interpolation rule yields the following precision and recall figures (see Fig. 3.3)
  • at recall levels 0%, 10%, 20% and 30%, the interpolated precision is 33.3% (the known precision at recall level 33.3%)
  • at recall levels 40%, 50% and 60%, the interpolated precision is 25% (the known precision at recall level 66.6%)
  • at recall levels 70%, 80%, 90% and 100%, the interpolated precision is 20%
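A sketch that reproduces these interpolated figures; it assumes the usual monotone reading of the rule (take the maximum known precision at any recall level at or above the standard level):

    def interpolate(points, levels=None):
        # points: (recall, precision) pairs observed for one query
        levels = levels if levels is not None else [j / 10 for j in range(11)]
        return {rj: max((p for r, p in points if r >= rj), default=0.0)
                for rj in levels}

    # known points for Rq = {d3, d56, d129}: (33.3%, 33.3%), (66.6%, 25%), (100%, 20%)
    points = [(1/3, 1/3), (2/3, 2/8), (1.0, 3/15)]
    for rj, p in sorted(interpolate(points).items()):
        print(f"recall {rj:.0%}: interpolated precision {p:.1%}")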

18
  • Average precision versus recall figures are used to compare the retrieval performance of distinct retrieval algorithms
  • the standard evaluation strategy for IR systems
  • Fig. 3.4 illustrates average precision versus recall figures for two distinct retrieval algorithms
  • one algorithm has higher precision at lower recall levels
  • the second algorithm is superior at higher recall levels

19
TREC Collection
  • Document Collection
  • TREC-3: 2 gigabytes
  • TREC-6: 5.8 gigabytes
  • documents come from the Wall Street Journal, AP, ZIFF, FR, DOE, San Jose Mercury News, US Patents, Financial Times, CR, FBIS, LA Times (see Table 3.1 for TREC-6)
  • an example TREC document, numbered WSJ880406-0090, is shown in Fig. 4.7

20
  • Example Information Requests (Topics)
  • each request is a description of an information need in natural language
  • each test information request is referred to as a topic
  • the topics prepared for the first six TREC conferences number 350
  • an example TREC information request, for topic number 168, is shown in Fig. 3.8
  • converting a topic into a system query (e.g. a Boolean query) is done by the system

21
  • Relevant Documents
  • at the TREC conferences, the set of relevant documents for each topic is obtained from a pool of possibly relevant documents
  • this pool is created by taking the top K (usually K = 100) documents from the rankings generated by the various participating retrieval systems
  • the documents in the pool are shown to human assessors, who then decide on the relevance of each document
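A minimal sketch of the pooling step (the pool construction only; relevance is still judged by the human assessors). The function name and the sample runs are illustrative:

    def build_pool(rankings, k=100):
        # rankings: one ranked document-id list per participating system
        pool = set()
        for ranking in rankings:
            pool.update(ranking[:k])      # top-K documents from each system
        return pool                       # shown to the assessors for judging

    runs = [["d1", "d2", "d3"], ["d3", "d4", "d5"], ["d2", "d6", "d1"]]
    print(sorted(build_pool(runs, k=2)))  # ['d1', 'd2', 'd3', 'd4', 'd6']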