Evaluating Cross-language Information Retrieval Systems

Transcript and Presenter's Notes
1
Evaluating Cross-language Information Retrieval
Systems
  • Carol Peters
  • IEI-CNR

2
Outline
  • Why IR System Evaluation is Important
  • Evaluation programs
  • An Example

3
What is an IR System Evaluation Campaign?
  • An activity which tests the performance of
    different systems on a given task (or set of
    tasks) under standard conditions
  • Permits contrastive analysis of
    approaches/technologies

4
How well does a system meet the information need?
  • System evaluation
  • how good are document rankings?
  • User-based evaluation
  • how satisfied is the user?

5
Why we need Evaluation
  • evaluation permits hypotheses to be validated and
    progress assessed
  • evaluation helps to identify areas where more R&D
    is needed
  • evaluation saves developers time and money
  • CLIR systems are still in experimental stage
  • Evaluation is particularly important!

6
CLIR System Evaluation is Complex
  • CLIR systems consist of integration of
    components and technologies
  • need to evaluate single components
  • need to evaluate overall system performance
  • need to distinguish methodological aspects from
    linguistic knowledge

7
Technology vs. Usage Evaluation
  • Usage Evaluation
  • shows value of a technology for user
  • determines the technology thresholds that are
    indispensable for specific usage
  • provides directions for choice of criteria for
    technology evaluation
  • Influence of language and culture on usability of
    technology needs to be understood

8
Organising an Evaluation Activity
  • select control task(s)
  • provide data to test and tune systems
  • define protocol and metrics to be used in results
    assessment
  • Aim is an objective comparison between systems
    and approaches

9
Test Collection
  • Set of documents - must be representative of the
    task of interest and must be large
  • Set of topics - statement of user needs from
    which the system data structure (query) is extracted
  • Relevance judgments - judgments vary by assessor,
    but there is no evidence that differences affect
    the comparative evaluation of systems

10
Using Pooling to Create Large Test Collections
Ellen Voorhees CLEF 2001 Workshop
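
A minimal sketch of the pooling idea (an illustration only, not the exact
TREC/CLEF procedure): the set of documents judged for each topic is the
union of the top-k documents from every submitted run. All names and
parameters below are illustrative assumptions.

def build_pool(runs, k=100):
    """Pool of documents to judge per topic: union of every run's top-k."""
    pool = {}
    for run in runs:                       # each run: {topic_id: [doc_id, ...]}
        for topic_id, ranked_docs in run.items():
            pool.setdefault(topic_id, set()).update(ranked_docs[:k])
    return pool

# Two hypothetical runs for one topic:
run_a = {"C041": ["doc3", "doc7", "doc9"]}
run_b = {"C041": ["doc7", "doc2", "doc5"]}
print(build_pool([run_a, run_b], k=2))   # pool for C041: doc2, doc3, doc7
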
11
Cross-language Test Collections
  • Consistency harder to obtain than for monolingual
  • parallel or comparable document collections
  • multiple assessors per topic creation and
    relevance assessment (for each language)
  • must take care when comparing different language
    evaluations (e.g., cross run to mono baseline)
  • Pooling harder to coordinate
  • need to have large, diverse pools for all
    languages
  • retrieval results are not balanced across
    languages
  • Taken from Ellen Voorhees CLEF 2001 Workshop

12
Evaluation Measures
  • Recall measures ability of system to find all
    relevant items
  • recall = no. of relevant items retrieved / no. of
    relevant items in collection
  • Precision measures ability of system to find
    only relevant items
  • precision = no. of relevant items retrieved /
    total no. of items retrieved
Recall-Precision Graph is used to compare systems
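
As a concrete illustration, a minimal sketch of the two measures above,
assuming a ranked result list and a set of judged relevant documents
(all names are illustrative):

def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against a set of relevant docs."""
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = ["d1", "d4", "d7", "d9"]      # system output, best first
relevant = {"d4", "d9", "d12"}            # relevance judgments for the topic
print(precision_recall(retrieved, relevant))   # (0.5, 0.666...)
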
13
Main CLIR Evaluation Programs
  • TIDES sponsors TREC (Text REtrieval Conference)
    and TDT (Topic Detection and Tracking) -
    Chinese-English tracks in 2000; TREC focusing on
    English/French - Arabic in 2001
  • NTCIR - Nat. Inst. of Informatics, Tokyo;
    Chinese-English and Japanese-English C-L tracks
  • AMARYLLIS - focused on French; the 98-99 campaign
    included a C-L track; 3rd campaign begins Sept. 01
  • CLEF - Cross-Language Evaluation Forum; C-L
    evaluation for European languages

14
Cross-Language Evaluation Forum
  • Funded by the DELOS Network of Excellence for
    Digital Libraries and the US National Institute of
    Standards and Technology (2000-2001)
  • Extension of CLIR track at TREC (1997-1999)
  • Coordination is distributed - national sites for
    each language in multilingual collection

15
CLEF Partners (2000-2001)
  • Eurospider, Zurich, Switzerland (Peter Schäuble,
    Martin Braschler)
  • IEEC-UNED, Madrid, Spain (Felisa Verdejo, Julio
    Gonzalo)
  • IEI-CNR, Pisa, Italy (Carol Peters)
  • IZ Sozialwissenschaften, Bonn, Germany (Michael
    Kluck)
  • NIST, Gaithersburg MD, USA (Donna Harman, Ellen
    Voorhees)
  • University of Hildesheim, Germany (Christa
    Womser-Hacker)
  • University of Twente, The Netherlands (Djoerd
    Hiemstra)

16
CLEF - Main Goals
  • Promote research by providing an appropriate
    infrastructure for
  • CLIR system evaluation, testing and tuning
  • comparison and discussion of results
  • building of test-suites for system developers

17
CLEF 2001: Task Description
  • Four main evaluation tracks in CLEF 2001
  • multilingual information retrieval
  • bilingual IR
  • monolingual (non-English) IR
  • domain-specific IR
  • plus
  • experimental track for interactive C-L systems

18
CLEF 2001: Data Collection
  • Multilingual comparable corpus of news agencies
    and newspaper documents for six languages
    (DE,EN,FR,IT,NL,SP). Nearly 1 million documents
  • Common set of 50 topics (from which queries are
    extracted) created in 9 European languages
    (DE,EN,FR,IT,NL,SP,FI,RU,SV) and 3 Asian
    languages (JP,TH,ZH)

19
CLEF 2001: Creating the Queries
  • Title: European Industry
  • Description: What factors damage the
    competitiveness of European industry on the
    world's markets?
  • Narrative: Relevant documents discuss factors
    that render European industry and manufactured
    goods less competitive with respect to the rest
    of the world, e.g. North America or Asia.
    Relevant documents must report data for Europe as
    a whole rather than for single European nations.
  • Queries are extracted from topics using 1 or more
    fields (see the sketch below)
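
A minimal sketch of extracting a query from topic fields, assuming a
simplified TREC/CLEF-style tagged format (the real topic files differ in
markup details; the topic text is the example above):

import re

TOPIC = """<top>
<num> C041 </num>
<title> European Industry </title>
<desc> What factors damage the competitiveness of
European industry on the world's markets? </desc>
<narr> Relevant documents discuss factors that render
European industry less competitive ... </narr>
</top>"""

def extract_query(topic_text, fields=("title", "desc")):
    """Build a flat query string from the chosen topic fields."""
    terms = []
    for field in fields:
        match = re.search(rf"<{field}>(.*?)</{field}>", topic_text, re.S)
        if match:
            terms.extend(match.group(1).split())
    return " ".join(terms)

print(extract_query(TOPIC))                     # title + description run
print(extract_query(TOPIC, fields=("title",)))  # title-only run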

20
CLEF 2001: Creating the Queries
  • Distributed activity (Bonn, Gaithersburg, Pisa,
    Hildesheim, Twente, Madrid)
  • Each group produced 13-15 queries (topics), 1/3
    local, 1/3 European, 1/3 international
  • Topic selection at meeting in Pisa (50 topics)
  • Topics were created in DE, EN,FR,IT,NL,SP and
    additionally translated to SV,RU,FI and TH,JP,ZH
  • Cleanup after topic translation

21
CLEF 2001: Multilingual IR
  • Topics in any of DE, EN, FR, IT, FI, NL, SP, SV,
    RU, ZH, JP, TH
  • Documents in English, German, French, Italian and
    Spanish
  • The participant's cross-language information
    retrieval system returns one result list of DE,
    EN, FR, IT and SP documents, ranked in decreasing
    order of estimated relevance
22
CLEF 2001: Bilingual IR
  • Task: query the English or Dutch target document
    collections
  • Goal: retrieve documents in the target language,
    listing results in a ranked list
  • Easier task for beginners!

23
CLEF 2001: Monolingual IR
  • Task: querying document collections in
    FR, DE, IT, NL, SP
  • Goal: acquire a better understanding of language-
    dependent retrieval problems
  • different languages present different retrieval
    problems
  • issues involved include word order, morphology,
    diacritic characters, language variants (see the
    diacritic-folding sketch below)
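
One concrete example of a language-dependent issue: a minimal sketch of
diacritic folding. Whether folding helps or hurts retrieval depends on the
language; this is only an illustration, not a recommendation.

import unicodedata

def strip_diacritics(text):
    """Decompose characters and drop combining marks (é -> e, ü -> u)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("élection"), strip_diacritics("Müller"))  # election Muller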

24
CLEF 2001: Domain-Specific IR
  • Task: querying a structured database from a
    vertical domain (social sciences) in German
  • German/English/Russian thesaurus and English
    translations of document titles
  • Monolingual or cross-language task
  • Goal: understand the implications of querying in a
    domain-specific context

25
CLEF 2001: Interactive C-L
  • Task: interactive document selection in an
    unknown target language
  • Goal: evaluation of results presentation rather
    than system performance

26
CLEF 2001: Participation
34 participants, 15 different countries
[Chart: distribution of participants across Europe, North America and Asia]
27
Details of Experiments
28
Runs per Topic Language
29
Topic Fields
30
CLEF 2001: Participation
  • CMU
  • Eidetica
  • Eurospider
  • Greenwich U
  • HKUST
  • Hummingbird
  • IAI
  • IRIT
  • ITC-irst
  • JHU-APL
  • Kasetsart U
  • KCSL Inc.
  • Medialab
  • Nara Inst. of Tech.
  • National Taiwan U
  • OCE Tech. BV
  • SICS/Conexor
  • SINAI/U Jaen
  • Thomson Legal
  • TNO TPD
  • U Alicante
  • U Amsterdam
  • U Exeter
  • U Glasgow
  • U Maryland (interactive only)
  • U Montreal/RALI
  • U Neuchâtel
  • U Salamanca
  • U Sheffield (interactive only)
  • U Tampere
  • U Twente (*)
  • UC Berkeley (2 groups)
  • UNED (interactive only)

(* also participated in 2000)
31
CLEF 2001: Approaches
  • All traditional approaches used
  • commercial MT systems (Systran, Babelfish,
    Globalink Power Translator, ...)
  • both query and document translation tried
  • bilingual dictionary look-up (on-line and
    in-house tools)
  • aligned parallel corpora (web-derived)
  • comparable corpora (similarity thesaurus)
  • conceptual networks (Eurowordnet, ZH-EN wordnet)
  • multilingual thesaurus (domain-specific task)

32
CLEF 2001: Techniques Tested
  • Text processing for multiple languages
  • Porter stemmer, Inxight commercial stemmer,
    on-site tools
  • simple generic quick-and-dirty stemming
  • language independent stemming
  • separate stopword lists vs. a single list
  • morphological analysis
  • n-gram indexing, word segmentation, decompounding
    (e.g. Chinese, German; see the n-gram sketch
    after this list)
  • use of NLP methods, e.g. phrase identification,
    morphosyntactic analysis
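
A minimal sketch of character n-gram indexing as a language-independent
alternative to stemming/decompounding. The n-gram length and the whitespace
tokenisation are illustrative assumptions, not the settings of any
particular group.

def char_ngrams(text, n=4):
    """Split each token into overlapping character n-grams; keep short tokens as-is."""
    grams = []
    for token in text.lower().split():
        if len(token) <= n:
            grams.append(token)
        else:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams

# German compound example: overlapping 4-grams let a query on "Versicherung"
# match documents containing "Versicherungsgesellschaft" without a decompounder.
print(char_ngrams("Versicherungsgesellschaft"))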

33
CLEF 2001: Techniques Tested
  • Cross-language strategies included
  • integration of methods (MT, corpora and MRDs)
  • pivot language to translate from L1 -> L2 (DE ->
    FR, SP, IT via EN)
  • N-gram based technique to match untranslatable
    words
  • prior and post-translation pseudo-relevance
    feedback (query expanded by associating frequent
    co-occurrences; see the sketch after this list)
  • vector-based semantic analysis (query expanded by
    associating semantically similar terms)
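
A minimal sketch of the pseudo-relevance-feedback idea: expand the query
with terms that occur frequently in the top-ranked documents, which are
assumed to be relevant. Plain term frequency is used here for simplicity;
real systems use more principled term-selection weights.

from collections import Counter

def expand_query(query_terms, top_docs, n_expansion_terms=5):
    """Add the most frequent non-query terms from the assumed-relevant top documents."""
    counts = Counter()
    for doc_text in top_docs:                 # texts of the top-ranked hits
        counts.update(t for t in doc_text.lower().split() if t not in query_terms)
    expansion = [term for term, _ in counts.most_common(n_expansion_terms)]
    return list(query_terms) + expansion

query = ["european", "industry", "competitiveness"]
top_docs = ["european industry exports decline as labour costs rise",
            "labour costs and exports hurt european manufacturers"]
print(expand_query(query, top_docs, n_expansion_terms=3))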

34
CLEF 2001: Techniques Tested
  • Different strategies were experimented with for
    results merging (one simple strategy is sketched
    below)
  • This still remains an unsolved problem
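
A minimal sketch of one simple merging strategy (normalised-score merging
of per-language result lists); this is only an illustration, not the
approach of any particular CLEF participant.

def merge_result_lists(lists):
    """Normalise scores within each language's list, then rank the union."""
    merged = []
    for results in lists:                      # results: [(doc_id, raw_score), ...]
        if not results:
            continue
        top = max(score for _, score in results) or 1.0   # avoid dividing by zero
        merged.extend((doc_id, score / top) for doc_id, score in results)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)

german = [("de_doc1", 12.4), ("de_doc2", 9.8)]
italian = [("it_doc7", 3.1), ("it_doc3", 2.0)]
print(merge_result_lists([german, italian]))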

35
CLEF 2001 Workshop
  • Results of CLEF 2001 campaign presented at
    Workshop, 3-4 September 2001, Darmstadt, Germany
  • 50 researchers and system developers from
    academia and industry participated.
  • Working Notes containing preliminary reports and
    statistics on the CLEF 2001 experiments were
    distributed.

36
CLEF-2001 vs. CLEF-2000
  • Most participants were back
  • Less MT
  • More Corpus-Based
  • People really started to try each other's
    ideas/methods
  • corpus-based approaches (parallel web,
    alignments)
  • n-grams
  • combination approaches

37
Effect of CLEF
  • Many more European groups
  • Dramatic increase of work in stemming/decompounding
    (for languages other than English)
  • Work on mining the web for parallel texts
  • Work on merging (breakthrough still missing?)
  • Work on combination approaches

38
CLEF 2002
  • Accompanying Measure under the IST programme,
    Contract No. IST-2000-31002, October 2001
  • CLEF Consortium: IEI-CNR, Pisa; ELRA/ELDA, Paris;
    Eurospider, Zurich; UNED, Madrid; NIST, USA;
    IZ Sozialwissenschaften, Bonn
  • Associated Members: University of Hildesheim,
    University of Twente, University of Tampere (?)
39
CLEF 2002: Task Description
  • Similar to CLEF 2001
  • multilingual information retrieval
  • bilingual IR (not to English!)
  • monolingual (non-English) IR
  • domain-specific IR
  • interactive track
  • Plus feasibility study for a spoken document track
    (within DELOS; results reported at CLEF)
  • Possible coordination with AMARYLLIS

40
CLEF 2002: Schedule
  • Call for Participation - November 2001
  • Document release - 1 February 2002
  • Topic release - 1 April 2002
  • Runs received - 15 June 2002
  • Results communicated - 1 August 2002
  • Paper for Working Notes - 1 September 2002
  • Workshop - 19-20 September 2002

41
Evaluation - Summing up
  • system evaluation is not a competition to find
    the best system
  • evaluation provides an opportunity to test, tune,
    and compare approaches in order to improve system
    performance
  • an evaluation campaign creates a community
    interested in examining the same issues and
    comparing ideas and experiences

42
Cross-Language Evaluation Forum
  • For further information see
  • http://www.clef-campaign.org
  • or contact
  • Carol Peters - IEI-CNR
  • E-mail: carol@iei.pi.cnr.it