Evaluating Cross-language Information Retrieval Systems

1
Evaluating Cross-language Information Retrieval
Systems

Carol Peters
IEI-CNR

2
Outline

Why IR System Evaluation is Important
Evaluation programs
An Example

3
What is an IR System Evaluation Campaign?

An activity which tests the performance of
different systems on a given task (or set of
tasks) under standard conditions
Permits contrastive analysis of
approaches/technologies

4
How well does system meet information need?

System evaluation
how good are document rankings?
User-based evaluation
how satisfied is the user?

5
Why we need Evaluation

evaluation permits hypotheses to be validated and
progress assessed
evaluation helps to identify areas where more R&D
is needed
evaluation saves developers time and money
CLIR systems are still in experimental stage
Evaluation is particularly important!

6
CLIR System Evaluation is Complex

CLIR systems consist of integration of
components and technologies
need to evaluate single components
need to evaluate overall system performance
need to distinguish methodological aspects from
linguistic knowledge

7
Technology vs. Usage Evaluation

Usage Evaluation
shows value of a technology for user
determines the technology thresholds that are
indispensable for specific usage
provides directions for choice of criteria for
technology evaluation
Influence of language and culture on usability of
technology needs to be understood

8
Organising an Evaluation Activity

select control task(s)
provide data to test and tune systems
define protocol and metrics to be used in results
assessment
Aim is an objective comparison between systems
and approaches

9
Test Collection

Set of documents: must be representative of the task
of interest; must be large
Set of topics: statements of user needs from
which the system data structure (query) is extracted
Relevance judgments: judgments vary by assessor,
but there is no evidence that the differences affect
comparative evaluation of systems

10
Using Pooling to Create Large Test Collections
(Ellen Voorhees, CLEF 2001 Workshop)

11
Cross-language Test Collections

Consistency is harder to obtain than for monolingual collections
parallel or comparable document collections
multiple assessors per topic (creation and
relevance assessment, for each language)
must take care when comparing different language
evaluations (e.g., cross-language run to monolingual baseline)
Pooling is harder to coordinate
need to have large, diverse pools for all
languages
retrieval results are not balanced across
languages
(Taken from Ellen Voorhees, CLEF 2001 Workshop)
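
The pooling idea can be sketched as follows. This is only an illustration of the general technique (judge the union of the top-ranked documents from every submitted run), not the actual CLEF procedure; the pool depth, the function names and the "language-number" document ID scheme are assumptions made for the example.

```python
from collections import defaultdict

def build_pools(runs, depth=100):
    """Union the top `depth` documents of every run, per topic.

    runs: list of dicts mapping topic id -> ranked list of document ids.
    Returns a dict mapping topic id -> set of document ids to be judged.
    """
    pools = defaultdict(set)
    for run in runs:
        for topic, ranking in run.items():
            pools[topic].update(ranking[:depth])
    return dict(pools)

def pool_size_by_language(pool):
    """Count pooled documents per language (hypothetical 'lang-number' ids)."""
    counts = defaultdict(int)
    for doc_id in pool:
        counts[doc_id.split("-")[0]] += 1
    return dict(counts)

# Two toy runs for one topic; only the top 2 documents of each run are pooled.
run_a = {"T1": ["de-1", "en-7", "fr-3"]}
run_b = {"T1": ["en-7", "it-2", "de-1"]}
pools = build_pools([run_a, run_b], depth=2)
print(sorted(pools["T1"]))
print(pool_size_by_language(pools["T1"]))
```

Counting the pool per language is one simple way to see whether the pools are large and diverse enough for every language in the collection.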

12
Evaluation Measures

Recall measures the ability of a system to find all
relevant items

$$\text{recall} = \frac{\text{no. of relevant items retrieved}}{\text{no. of relevant items in collection}}$$

Precision measures the ability of a system to find
only relevant items

$$\text{precision} = \frac{\text{no. of relevant items retrieved}}{\text{total no. of items retrieved}}$$

A Recall-Precision Graph is used to compare systems

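
As a small worked example of the two measures, the sketch below computes precision and recall for one ranked result list, plus the (recall, precision) points that a recall-precision graph would be drawn from. The function names and the toy document IDs are illustrative assumptions, and the ranking is assumed to contain no duplicates.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def recall_precision_points(ranking, relevant):
    """Return (recall, precision) measured at each relevant document in rank order."""
    relevant = set(relevant)
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points

ranking = ["d1", "d2", "d3", "d4"]   # system output, best first
relevant = {"d1", "d3", "d9"}        # relevance judgments for the topic

p, r = precision_recall(ranking, relevant)
print(p, r)
print(recall_precision_points(ranking, relevant))
```

Here two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall 2/3).
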
13
Main CLIR Evaluation Programs

TIDES: sponsors TREC (Text REtrieval Conferences)
and TDT (Topic Detection and Tracking);
Chinese-English tracks in 2000; TREC focusing on
English/French-Arabic in 2001
NTCIR: Nat. Inst. for Informatics, Tokyo;
Chinese-English and Japanese-English C-L tracks
AMARYLLIS: focused on French; 98-99 campaign
included C-L track; 3rd campaign begins Sept. 01
CLEF: Cross-Language Evaluation Forum; C-L
evaluation for European languages

14
Cross-Language Evaluation Forum

Funded by DELOS Network of Excellence for Digital
Libraries and US National Institute of Standards
and Technology (2000-2001)
Extension of CLIR track at TREC (1997-1999)
Coordination is distributed - national sites for
each language in multilingual collection

15
CLEF Partners (2000-2001)

Eurospider, Zurich, Switzerland (Peter Schäuble,
Martin Braschler)
IEEC-UNED, Madrid, Spain (Felisa Verdejo, Julio
Gonzalo)
IEI-CNR, Pisa, Italy (Carol Peters)
IZ Sozialwissenschaften, Bonn, Germany (Michael
Kluck)
NIST, Gaithersburg MD, USA (Donna Harman, Ellen
Voorhees)
University of Hildesheim, Germany (Christa
Womser-Hacker)
University of Twente, The Netherlands (Djoerd
Hiemstra)

16
CLEF - Main Goals

Promote research by providing an appropriate
infrastructure for
CLIR system evaluation, testing and tuning
comparison and discussion of results
building of test-suites for system developers

17
CLEF 2001 Task Description

Four main evaluation tracks in CLEF 2001
multilingual information retrieval
bilingual IR
monolingual (non-English) IR
domain-specific IR
plus
experimental track for interactive C-L systems

18
CLEF 2001 Data Collection

Multilingual comparable corpus of news agency
and newspaper documents for six languages
(DE, EN, FR, IT, NL, SP): nearly 1 million documents
Common set of 50 topics (from which queries are
extracted) created in 9 European languages
(DE, EN, FR, IT, NL, SP, FI, RU, SV) and 3 Asian
languages (JP, TH, ZH)

19
CLEF 2001 Creating the Queries

Title: European Industry
Description: What factors damage the
competitiveness of European industry on the
world's markets?
Narrative: Relevant documents discuss factors
that render European industry and manufactured
goods less competitive with respect to the rest
of the world, e.g. North America or Asia.
Relevant documents must report data for Europe as
a whole rather than for single European nations.
Queries are extracted from one or more topic
fields
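
To make the step from topic to query concrete, here is a minimal sketch that builds a bag-of-words query from one or more fields of the topic above. The dictionary layout, the tiny stopword list and the `build_query` helper are assumptions for illustration; participating systems used their own extraction methods.

```python
import re

# The example topic from this slide, stored as a plain dictionary.
topic = {
    "title": "European Industry",
    "description": ("What factors damage the competitiveness of European "
                    "industry on the world's markets?"),
    "narrative": ("Relevant documents discuss factors that render European "
                  "industry and manufactured goods less competitive with "
                  "respect to the rest of the world, e.g. North America or "
                  "Asia. Relevant documents must report data for Europe as a "
                  "whole rather than for single European nations."),
}

# Deliberately tiny, hypothetical stopword list.
STOPWORDS = {"what", "the", "of", "on", "and", "or", "a", "to", "for", "that"}

def build_query(topic, fields=("title",)):
    """Lowercase and tokenize the chosen fields, then drop stopwords."""
    text = " ".join(topic[field] for field in fields)
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [w for w in words if w not in STOPWORDS]

print(build_query(topic, ("title",)))
print(build_query(topic, ("title", "description")))
```

A title-only query is short and precise, while adding the description or narrative gives the system more terms to work with.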

20
CLEF 2001 Creating the Queries

Distributed activity (Bonn, Gaithersburg, Pisa,
Hildesheim, Twente, Madrid)
Each group produced 13-15 queries (topics), 1/3
local, 1/3 European, 1/3 international
Topic selection at meeting in Pisa (50 topics)
Topics were created in DE, EN, FR, IT, NL, SP and
additionally translated to SV, RU, FI and TH, JP, ZH
Cleanup after topic translation

21
CLEF 2001 Multilingual IR
Topics: in any of DE, EN, FR, IT, FI, NL, SP, SV, RU, ZH, JP, TH
Documents: English, German, French, Italian, Spanish
Both are input to the participant's Cross-Language Information
Retrieval System
Output: one result list of DE, EN, FR, IT and SP documents
ranked in decreasing order of estimated relevance

22
CLEF 2001 Bilingual IR

Task: query English or Dutch target document
collections
Goal: retrieve documents in the target language,
listing results in a ranked list
Easier task for beginners!

23
CLEF 2001 Monolingual IR

Task: querying document collections in
FR, DE, IT, NL, SP
Goal: acquire a better understanding of
language-dependent retrieval problems
different languages present different retrieval
problems
issues involved include word order, morphology,
diacritic characters, language variants

24
CLEF 2001 Domain-Specific IR

Task: querying a structured database from a
vertical domain (social sciences) in German
German/English/Russian thesaurus and English
translations of document titles
Monolingual or cross-language task
Goal: understand implications of querying in a
domain-specific context

25
CLEF 2001 Interactive C-L

Task: interactive document selection in an
unknown target language
Goal: evaluation of results presentation rather
than system performance

26
CLEF 2001 Participation
34 participants from 15 different countries
(chart by region: North America, Asia, Europe)

27
Details of Experiments

28
Runs per Topic Language

29
Topic Fields

30
CLEF 2001 Participation

CMU
Eidetica
Eurospider
Greenwich U
HKUST
Hummingbird
IAI
IRIT
ITC-irst
JHU-APL
Kasetsart U
KCSL Inc.

Medialab
Nara Inst. of Tech.
National Taiwan U
OCE Tech. BV
SICS/Conexor
SINAI/U Jaen
Thomson Legal
TNO TPD
U Alicante
U Amsterdam
U Exeter

U Glasgow
U Maryland (interactive only)
U Montreal/RALI
U Neuchâtel
U Salamanca
U Sheffield (interactive only)
U Tampere
U Twente ()
UC Berkeley (2 groups)
UNED (interactive only)

( also participated in 2000)
31
CLEF 2001 Approaches

All traditional approaches used
commercial MT systems (Systran, Babelfish,
Globalink Power Translator, ...)
both query and document translation tried
bilingual dictionary look-up (on-line and
in-house tools)
aligned parallel corpora (web-derived)
comparable corpora (similarity thesaurus)
conceptual networks (Eurowordnet, ZH-EN wordnet)
multilingual thesaurus (domain-specific task)

32
CLEF 2001 Techniques Tested

Text processing for multiple languages
Porter stemmer, Inxight commercial stemmer,
on-site tools
simple generic quick-and-dirty stemming
language independent stemming
separate stopword lists vs single list
morphological analysis
n-gram indexing, word segmentation, decompounding
(e.g. Chinese, German)
use of NLP methods, e.g. phrase identification,
morphosyntactic analysis
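
Of the techniques listed, character n-gram indexing is easy to illustrate. The sketch below splits words into overlapping 4-grams so that related word forms share index terms without any language-specific stemmer. The choice of n = 4 and the function name are assumptions for the example, not a description of any particular participant's system.

```python
def char_ngrams(word, n=4):
    """Return the overlapping character n-grams of a lowercased word."""
    word = word.lower()
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]

german = char_ngrams("Industrie")
english = char_ngrams("industry")
print(german)
print(english)

# N-grams shared by the two word forms act as common index terms.
print(sorted(set(german) & set(english)))
```

The same overlap is what makes n-grams useful for matching words that a translation resource does not cover, such as proper names with slightly different spellings.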

33
CLEF 2001 Techniques Tested

Cross-language strategies included
integration of methods (MT, corpora and MRDs)
pivot language to translate from L1 -> L2 (DE ->
FR, SP, IT via EN)
N-gram based technique to match untranslatable
words
pre- and post-translation pseudo-relevance
feedback (query expanded by associating frequent
co-occurrences)
vector-based semantic analysis (query expanded by
associating semantically similar terms)
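
Pseudo-relevance feedback can be sketched in a few lines: assume the top-ranked documents of a first retrieval pass are relevant, and add their most frequent new terms to the query. The parameter values, the tokenized toy documents and the `expand_query` name are assumptions for illustration; real systems weight candidate terms more carefully than by raw frequency.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, top_docs=2, new_terms=2):
    """Add the most frequent non-query terms of the top-ranked documents.

    ranked_docs: list of tokenized documents, best first.
    """
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(term for term in doc if term not in query_terms)
    expansion = [term for term, _ in counts.most_common(new_terms)]
    return list(query_terms) + expansion

query = ["european", "industry"]
ranked_docs = [
    ["european", "industry", "exports", "competition", "exports"],
    ["industry", "competition", "exports", "tariffs"],
    ["football", "results"],   # below the feedback cut-off, ignored
]
expanded = expand_query(query, ranked_docs)
print(expanded)
```

Applied before translation, the expansion runs on a source-language collection; applied after translation, it runs on the target collection.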

34
CLEF 2001 Techniques Tested

Different strategies were tried for results
merging
This still remains an unsolved problem
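
Two simple, commonly used merging strategies are sketched below for illustration: round-robin interleaving, and merging on min-max normalized scores. Neither is presented here as what any CLEF participant actually ran; the function names, scores and document IDs are made up for the example.

```python
from itertools import zip_longest

def round_robin_merge(result_lists):
    """Interleave per-language rankings: rank 1 of each list, then rank 2, ..."""
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for item in tier:
            if item is not None and item[0] not in seen:
                seen.add(item[0])
                merged.append(item[0])
    return merged

def normalized_score_merge(result_lists):
    """Rescale each list's scores to [0, 1], then sort all documents together."""
    scored = []
    for results in result_lists:
        if not results:
            continue
        scores = [score for _, score in results]
        low, high = min(scores), max(scores)
        span = (high - low) or 1.0   # avoid division by zero
        scored.extend((doc, (score - low) / span) for doc, score in results)
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in scored]

# Per-language result lists of (document id, raw score); scales differ.
german = [("de-1", 12.0), ("de-2", 6.0), ("de-3", 3.0)]
french = [("fr-1", 9.0), ("fr-2", 8.0), ("fr-3", 5.0)]

print(round_robin_merge([german, french]))
print(normalized_score_merge([german, french]))
```

Round-robin ignores scores entirely, while score normalization assumes that rescaled scores are comparable across collections. Neither assumption holds in general, which is one reason merging is hard.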

35
CLEF 2001 Workshop

Results of CLEF 2001 campaign presented at
Workshop, 3-4 September 2001, Darmstadt, Germany
50 researchers and system developers from
academia and industry participated.
Working Notes containing preliminary reports and
statistics on CLEF2001 experiments distributed.

36
CLEF-2001 vs. CLEF-2000

Most participants were back
Less MT
More Corpus-Based
People really start to try each other's
ideas/methods
corpus-based approaches (parallel web,
alignments)
n-grams
combination approaches

37
Effect of CLEF

Many more European groups
Dramatic increase of work in stemming/decompounding
(for languages other than English)
Work on mining the web for parallel texts
Work on merging (breakthrough still missing?)
Work on combination approaches

38
CLEF 2002
Accompanying Measure under IST programme,
Contract No. IST-2000-31002. October 2001
CLEF Consortium: IEI-CNR, Pisa; ELRA/ELDA, Paris;
Eurospider, Zurich; UNED, Madrid; NIST, USA;
IZ Sozialwissenschaften, Bonn
Associated Members: University of Hildesheim,
University of Twente, University of Tampere (?)

39
CLEF 2002 Task Description

Similar to CLEF 2001
multilingual information retrieval
bilingual IR (not to English!)
monolingual (non-English) IR
domain-specific IR
interactive track
Plus feasibility study for spoken document track
(within DELOS; results reported at CLEF)
Possible coordination with Amaryllis

40
CLEF 2002 Schedule

Call for Participation - November 2001
Document release - 1 February 2002
Topic release - 1 April 2002
Runs received - 15 June 2002
Results communicated - 1 August 2002
Paper for Working Notes - 1 September 2002
Workshop - 19-20 September

41
Evaluation - Summing up

system evaluation is not a competition to find
the best
evaluation provides opportunity to test, tune,
and compare approaches in order to improve system
performance
an evaluation campaign creates a community
interested in examining the same issues and
comparing ideas and experiences

42
Cross-Language Evaluation Forum