Title: Slide 1
1 PAROLE & ELRA
(Some) (Shared) views on sharing resources over the last 15 years!!
Khalid CHOUKRI, ELRA/ELDA, choukri_at_elda.org
http://www.elda.org/ or http://www.elra.info/
2 Outline
- ELRA/ELDA missions, activities, and more
- From archiving to Open Language Resources Infrastructure(s)
- Some facts & figures about the outcomes of Parole
- Open issues ... triggered by Parole resources
- Where to head ... New types of corpora ... ORI
3 European Language Resources Association
An improved infrastructure for Data sharing & HLT evaluation
(1) An Association of users of Language Resources
- A Repository Center
- Technical & Logistic issues
- Commercial issues (prices, fees, royalties)
- Legal issues (Licensing, IPR)
- Information Dissemination
(2) Infrastructure for the evaluation of Human Language Technologies, providing resources, tools, methodologies, logistics
- Exit strategies / Capitalization on evaluation packages
4 The Association
- Membership Drive
- ELRA is open to European & Non-European Institutions
- Resources are available to Members & Non-Members
- Pay per Resource. Many are free for R&D
- Some of the benefits of becoming a member
- Substantial discounts on LR prices (over 70%),
- Substantial discounts on LREC registration fees
- Legal and contractual assistance with respect to LR matters
- Access to Validation and production manuals (Quality assessment)
- Figures and facts about the Market (results of ELRA surveys)
- Newsletter and other publications inc. JLREC ?
- Since 2005: Fidelity program, earn miles and get more benefits
5 ELRA: An efficient infrastructure to serve the HLT Community
5 years of establishment
- 1994-1995: RELATOR, a European-wide consortium with the support of the European Commission, striving to establish a European repository of linguistic resources.
- 1995: Set up of the European Language Resources Association (ELRA) as an Archiving House, set up of a Distribution Agency (ELDA)
- 1996: Our first List of identified Language Resources, followed by a first Catalogue
- 1997: Negotiation of Distribution rights, Start of Distribution activities -> PAROLE
- R&D projects on issues related to LRs (co-funded by EU, French agencies, etc.)
- 1998: 1st LREC
- Introduction of the BLARK Concept (with ELSNET), later on ELARK
- Production: commissioning the production of LRs (Internal network of production units)
- Market Analysis, Technologies-State of the Art
- Evaluation Activities (Else project, Jan 98), Integration of ASR & MT (Systran & Dragon), TTS
- 1999: Strong focus on Production (SpeechDat family, LRs-PP, SpeeCon, etc.), 1st PAROLE CONTRACT
- Validation of LRs and LRs quality assessment, Set up of validation units, Bug reporting & Correction patches
6 ELRA: An efficient infrastructure to serve the HLT Community
10 years of consolidation
- 2000-: The European HLT landscape, HLT benchmark (HLT players Directory - EUROMAP)
- Start of an active partnership with LDC: Coordination of legal & production issues
- Launch of EVALUATION Campaigns (Amaryllis, Aurora, ...)
- 2001-: Launch of CLEF, the Cross-Language Evaluation Forum
- Identification of New types of Resources ... Multimodal Resources (Isle project)
- HLT LRs roadmaps, survey of existing national activities, ...
- 2003-: Evaluation projects: Technolangue, CHIL, TC-STAR, ... Over 20 technology components
- LangTech2003 (Paris)
- Boost of networking actions/Partnership Europe & Mediterranean countries (NEMLAR)
- Meta-Data for LRs cataloguing (Intera)
- 2004-: Universal Catalogue
- 2005: HLT Portal: Information, evaluation packages, evaluation services
- LRs Unification Project
- 2006-2010: more resources on evaluation and an evaluation service department
7 ELRA: An efficient infrastructure to serve the HLT Community
Strategies for the next Decade: New ELRA status
- 2005-: Extension of ELRA's official mission to promote LRs, Related Resources and evaluation for the Human Language Technology (HLT)
- The mission of the Association is to promote language resources (henceforth LRs) and evaluation for the Human Language Technology (HLT) sector in all their forms and all their uses
- -> to coordinate and carry out identification, production, validation, distribution, standardization of LRs, as well as support for evaluation of systems, products, tools, etc. related to language resources. Other resources will be considered as well if developments of the field make this desirable, e.g. multimedia resources both with and without language.
8 ELRA/ELDA Role(s)
9 Identification and distribution of LRs
- Identification of LRs
- Rights and licences/Legal issues
- ELRA Catalogue: http://catalogue.elra.info/
- Universal Catalogue: http://universal.elra.info/
- Types of LRs: All modalities associated (or not) with language
10 ELRA Catalogue over the years
11 Moderate growth and increased share of publicly traded LRs: most likely future scenario
- Spending on LRs will be driven by
- New applications
- Changing environments
- Number of languages
- Number of new customers
- Share of publicly traded LRs will be driven by
- Availability
- Differentiation rationales
12 Production of LRs
- Production (internal) or coordination of LRs production
- National, European & International projects
- Provide support to institutions (commercial & academic)
- Consultancy role
- Standards & validation: PCom & VCom
- Technical aspects: Format, Coding, Storage, Description/Metadata
- Recent Productions
- Speech & Audio (LILA, Orientel...), parallel corpora (MT & SMT)
- Multimodal (annotations of videos: CHIL),
- corpus packages for evaluation (TC-STAR, CHIL, EVALDA...)
- -> New trends: New languages, e.g. SMS
13 Evaluation Missions
- Infrastructure for technology Evaluation
- Production of LRs for Evaluation
- Conducting Evaluation Campaigns
- Capitalization & ROI (Exit strategies)
14 PAROLE & ELRA or PAROLE @ ELRA
15 PAROLE LRs
- Elaboration of the linguistic specifications
- developing and making available large, generic and re-usable harmonised written language resources,
- corpora for 14 languages
- electronic lexica for 12 languages of the European Union.
- at least 14 + 12 LRs!!
- In practice more
- Elaboration of Validation Procedures
- ELRA Validation Manuals, VCom, Validation Centers
16 PAROLE Resources distributed via ELRA
- PAROLE Dutch Distributable Corpus
- PAROLE Dutch lexicon
- PAROLE English lexicon
- PAROLE French Corpus
- PAROLE Irish Distributable Corpus
- PAROLE Italian Corpus
- PAROLE-SIMPLE-CLIPS PISA Italian Lexicon - Phonetic layer
- PAROLE Portuguese Corpus - 250,000 annotated words
- PAROLE Portuguese Corpus - The whole corpus
- PAROLE Portuguese Lexicon
- PAROLE Spanish Lexicon
- (PAROLE) STO Danish Lexicon
- PAROLE Greek lexicon
17 (Almost) 15 years of Parole Resources distribution
- PAROLE Project (1996-1998)
- Licences & Distribution agreement
- First contract signed on 05/05/1999
- Last one signed on 07/09/2006
- Distributions
- First copy sold in 2000-06
- Last copy sold in 2009-05
18 15 years of Parole Resources distributions
- 10 copies of the Monolingual Lexicon
- 28 copies of the Written corpus/corpora
- -> Over 80K in revenues
- -> Mainly from commercial institutions
- Not all (catalogued) languages were sold
- ... not all generated revenues
19 Profiles of users
- Lexica
- 5 Copies for Research organizations
- 5 Copies for Commercial organizations
- Corpora
- 16 Copies for Research organizations
- 12 Copies for Commercial organizations
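The per-profile counts on this slide can be cross-checked against the totals reported on the previous slide (10 lexicon copies, 28 corpus copies). A minimal sketch, assuming the figures are exactly as shown on the slides; the dictionary layout and function names are illustrative only:

```python
# Copies of PAROLE resources distributed via ELRA, by resource type and
# buyer profile. Figures are copied from the slides; the structure and
# names below are assumptions made for this illustration.
copies = {
    "lexica": {"research": 5, "commercial": 5},
    "corpora": {"research": 16, "commercial": 12},
}


def totals_by_type(table):
    """Sum the copies sold for each resource type."""
    return {kind: sum(by_profile.values()) for kind, by_profile in table.items()}


def totals_by_profile(table):
    """Sum the copies sold for each buyer profile across resource types."""
    out = {}
    for by_profile in table.values():
        for profile, n in by_profile.items():
            out[profile] = out.get(profile, 0) + n
    return out


type_totals = totals_by_type(copies)
profile_totals = totals_by_profile(copies)
print(type_totals)     # lexica and corpora totals
print(profile_totals)  # research and commercial totals
```

The per-type sums come out at 10 and 28, matching the totals quoted for the lexicon and the corpora, and the split by profile is 21 copies for research organizations against 17 for commercial ones.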
20 More profiles: Not for profit
- Fundação da Faculdade de Ciências da Universidade de Lisboa
- National Institute of Informatics
- ELLEPAP, Ioannina Branch (Hellenic Society of Disabled Children)
- University of Erlangen
- Polderland Language & Speech Technology
- SINEQUA
- University of Hong Kong
- Faculté Polytechnique de Mons - T.C.T.S.
- IRISA/ENSSAT, Université de Rennes 1
- IRIT - Institut de Recherche en Informatique de Toulouse, Université Paul Sabatier
- Nagoya University
- National Institute of Informatics
- Open University
- Radboud University Nijmegen
- Toshiba Research Europe Ltd.
- T.D. Europe BV
21 More profiles
- Ask Jeeves, Inc.
- Lexiquest
- Panasonic Speech Technology Laboratory
- Canon Inc.
- Vlingo Corporation
- IBM France
- Lexiquest
- Microsoft Corporation
- Panasonic Speech Technology Laboratory
- IBM Deutschland Entwicklung GmbH
22 Details per language
23 Some basic outcomes
- No bug reports!!
- Maintenance very rarely provided
- Validation initiated the quality trend within LR producers
- Unification & Merging of LRs
24 Where to head: new trends, new fashions
- New types of resources
- New annotation schemas (and features)
- Best practices & Standards
- New production approaches
- Rover ... Use of technologies to produce new data and annotations (Combination of several approaches)
- (Semi)-automatic / (un)-supervised / ...
- New distribution & sharing mechanisms
- More open resources ...
- Free versus for-a-fee resources
- -> ORI
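
The "Rover" bullet above refers to producing annotations by combining the outputs of several systems. The slides give no detail on how this would be done, so the following is only a simplified sketch of the voting idea: it assumes the systems' outputs are already aligned token by token (ROVER proper, as used for speech recognition, first aligns the word sequences), and the function name, tagger outputs and labels are hypothetical.

```python
from collections import Counter


def vote(annotations):
    """Combine label sequences from several systems by per-token majority vote.

    `annotations` is a list of equal-length label sequences, one per system.
    Ties go to the label proposed by the earliest-listed system.
    """
    if not annotations:
        return []
    length = len(annotations[0])
    if any(len(seq) != length for seq in annotations):
        raise ValueError("all sequences must have the same length")
    combined = []
    for labels in zip(*annotations):
        counts = Counter(labels)
        best = max(counts.values())
        # The first label (in system order) that reaches the top count wins.
        combined.append(next(lab for lab in labels if counts[lab] == best))
    return combined


# Hypothetical part-of-speech outputs from three automatic taggers.
system_a = ["DET", "NOUN", "VERB", "ADJ"]
system_b = ["DET", "NOUN", "NOUN", "ADJ"]
system_c = ["DET", "VERB", "VERB", "ADV"]
merged = vote([system_a, system_b, system_c])
print(merged)
```

A real production pipeline would probably weight systems by confidence and route low-agreement tokens to human annotators, which is where the "(semi)-automatic" option in the list comes in.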