Diapositive 1 - PowerPoint PPT Presentation

Provided by: khal46

Transcript and Presenter's Notes

1
PAROLE & ELRA
(Some) (shared) views on sharing resources over
the last 15 years
Khalid CHOUKRI, ELRA/ELDA, choukri_at_elda.org
http://www.elda.org/ or http://www.elra.info/
2
Outline
  • ELRA/ELDA: missions, activities, and more
  • From archiving to Open Language Resources
    Infrastructure(s)
  • Some facts and figures about the outcomes of PAROLE
  • Open issues triggered by PAROLE resources
  • Where to head: new types of corpora, ORI

3
European Language Resources Association: an
improved infrastructure for data sharing and HLT
evaluation
(1) An association of users of Language Resources
  • A repository center
  • Technical and logistic issues
  • Commercial issues (prices, fees, royalties)
  • Legal issues (licensing, IPR)
  • Information dissemination

(2) Infrastructure for the evaluation of Human
Language Technologies, providing resources,
tools, methodologies and logistics;
exit strategies / capitalization on
evaluation packages
4
The Association
  • Membership drive
  • ELRA is open to European and non-European
    institutions
  • Resources are available to members and non-members
  • Pay per resource; many are free for R&D
  • Some of the benefits of becoming a member:
  • Substantial discounts on LR prices (over 70%)
  • Substantial discounts on LREC registration fees
  • Legal and contractual assistance with respect to
    LR matters
  • Access to validation and production manuals
    (quality assessment)
  • Figures and facts about the market (results of
    ELRA surveys)
  • Newsletter and other publications, incl. JLREC?
  • Since 2005: fidelity program (earn miles
    and get more benefits)

5
ELRA: an efficient infrastructure to serve the
HLT community. 5 years of establishment
  • 1994-1995: RELATOR, a European-wide consortium
    with the support of the European Commission,
    striving to establish a European repository of
    linguistic resources
  • 1995: Set-up of the European Language Resources
    Association (ELRA) as an archiving house, and
    set-up of a distribution agency (ELDA)
  • 1996: Our first list of identified Language
    Resources, followed by a first catalogue
  • 1997: Negotiation of distribution rights, start
    of distribution activities; PAROLE
  • R&D projects on issues related to LRs
    (co-funded by the EU, French agencies, etc.)
  • 1998: 1st LREC
  • Introduction of the BLARK concept (with
    ELSNET), later on ELARK
  • Production: commissioning the production of
    LRs (internal network of production units)
  • Market analysis, state of the art of
    technologies
  • Evaluation activities (ELSE project, Jan. 98),
    integration of ASR and MT (Systran, Dragon), TTS
  • 1999: Strong focus on production (SpeechDat
    family, LRs-PP, SpeeCon, etc.), 1st PAROLE
    contract
  • Validation of LRs and LR quality assessment,
    set-up of validation units, bug reporting and
    correction patches

6
ELRA: an efficient infrastructure to serve the
HLT community. 10 years of consolidation
  • 2000-: The European HLT landscape, HLT benchmark
    (HLT players directory - EUROMAP)
  • Start of an active partnership with LDC:
    coordination of legal and production issues
  • Launch of evaluation campaigns
    (Amaryllis, Aurora, ...)
  • 2001-: Launch of CLEF, the Cross-Language
    Evaluation Forum
  • Identification of new types of resources:
    multimodal resources (ISLE project)
  • HLT and LRs roadmaps, survey of
    existing national activities
  • 2003-: Evaluation projects (Technolangue, CHIL,
    TC-STAR), over 20 technology components
  • LangTech2003 (Paris)
  • Boost of networking actions / partnership
    between Europe and Mediterranean countries
    (NEMLAR)
  • Metadata for LR cataloguing (INTERA)
  • 2004-: Universal Catalogue
  • 2005: HLT Portal (information, evaluation
    packages, evaluation services)
  • LRs Unification Project
  • 2006-2010: more resources on evaluation and an
    evaluation service department

7
ELRA: an efficient infrastructure to serve the
HLT community. Strategies for the next decade:
new ELRA status
  • 2005-: Extension of ELRA's official mission to
    promote LRs, related resources and evaluation
    for Human Language Technology (HLT)
  • The mission of the Association is to promote
    language resources (henceforth LRs) and
    evaluation for the Human Language Technology
    (HLT) sector in all their forms and all their
    uses
  • To coordinate and carry out identification,
    production, validation, distribution and
    standardization of LRs, as well as support for
    evaluation of systems, products, tools, etc.
    related to language resources. Other resources
    will be considered as well if developments in the
    field make this desirable, e.g. multimedia
    resources both with and without language.

8
ELRA/ELDA Role(s)
9
Identification and distribution of LRs
  • Identification of LRs
  • Rights and licences/Legal issues
  • ELRA Catalogue: http://catalogue.elra.info/
  • Universal Catalogue: http://universal.elra.info/
  • Types of LRs: all modalities associated (or not)
    with language

10
ELRA Catalogue over the years
11
Moderate growth and increased share of publicly
traded LRs: most likely future scenario
  • Spending on LRs will be driven by:
  • New applications
  • Changing environments
  • Number of languages
  • Number of new customers
  • Share of publicly traded LRs will be driven by:
  • Availability
  • Differentiation rationales

12
Production of LRs
  • Production (internal) or coordination of LRs
    production
  • National, European and international projects
  • Provide support to institutions (commercial
    and academic)
  • Consultancy role
  • Standards and validation: PCom, VCom
  • Technical aspects: format, coding, storage,
    description/metadata
  • Recent productions:
  • Speech and audio (LILA, Orientel, ...), parallel
    corpora (MT, SMT)
  • Multimodal (annotations of videos, CHIL)
  • Corpus packages for evaluation (TC-STAR, CHIL,
    EVALDA, ...)
  • New trends: new languages, e.g. SMS

13
Evaluation Missions
  • Infrastructure for technology Evaluation
  • Production of LRs for Evaluation
  • Conducting Evaluation Campaigns
  • Capitalization and ROI (exit strategies)

14
PAROLE & ELRA or PAROLE @ ELRA
15
PAROLE LRs
  • Elaboration of the linguistic specifications
  • Developing and making available large, generic
    and re-usable harmonised written language
    resources:
  • corpora for 14 languages
  • electronic lexica for 12 languages of the
    European Union
  • At least 14 + 12 LRs!
  • In practice more
  • Elaboration of Validation Procedures
  • ELRA validation manuals, VCom, validation
    centers

16
PAROLE resources distributed via ELRA
  • PAROLE Dutch Distributable Corpus
  • PAROLE Dutch lexicon
  • PAROLE English lexicon
  • PAROLE French Corpus
  • PAROLE Irish Distributable Corpus
  • PAROLE Italian Corpus
  • PAROLE-SIMPLE-CLIPS PISA Italian Lexicon -
    Phonetic layer
  • PAROLE Portuguese Corpus - 250,000 annotated
    words
  • PAROLE Portuguese Corpus - The whole corpus
  • PAROLE Portuguese Lexicon
  • PAROLE Spanish Lexicon
  • (PAROLE) STO Danish Lexicon
  • PAROLE Greek lexicon

17
(Almost) 15 years of Parole
Resources distribution
  • PAROLE Project (1996-1998)
  • Licences / distribution agreements
  • First contract signed on 05/05/1999
  • Last one signed on 07/09/2006
  • Distributions
  • First copy sold in 2000-06
  • Last copy sold in 2009-05

18
15 years of Parole Resources distributions
  • 10 copies of the Monolingual Lexicon
  • 28 Copies of the Written corpus/corpora
  • Over 80K in revenues
  • Mainly from commercial institutions
  • Not all (catalogued) languages were sold
  • ... and not all generated revenues

19
Profiles of users
  • Lexica
  • 5 Copies for Research organizations
  • 5 Copies for Commercial organizations
  • Corpora
  • 16 Copies for Research organizations
  • 12 Copies for Commercial organizations

20
More profiles: not for profit
  • Fundação da Faculdade de Ciências da Universidade
    de Lisboa
  • National Institute of Informatics
  • ELLEPAP, Ioannina Branch (Hellenic Society of
    Disabled Children)
  • University of Erlangen
  • Polderland Language & Speech Technology
  • SINEQUA
  • University of Hong Kong
  • Faculté Polytechnique de Mons - T.C.T.S.
  • IRISA/ENSSAT, Université de Rennes 1
  • IRIT - Institut de Recherche en Informatique de
    Toulouse, Université Paul Sabatier
  • Nagoya University
  • National Institute of Informatics
  • Open University
  • Radboud University Nijmegen
  • Toshiba Research Europe Ltd.
  • T.D. Europe BV

21
More profiles
  • Ask Jeeves, Inc.
  • Lexiquest
  • Panasonic Speech Technology Laboratory
  • Canon Inc.
  • Vlingo Corporation
  • IBM France
  • Lexiquest
  • Microsoft Corporation
  • Panasonic Speech Technology Laboratory
  • IBM Deutschland Entwicklung GmbH

22
Details per language
23
Some basic outcomes
  • No bug reports !!
  • Maintenance very rarely provided
  • Validation initiated the quality trend among
    LR producers
  • Unification / merging of LRs

24
Where to head: new trends, new fashions
  • New types of resources
  • New annotation schemas (and features)
  • Best practices and standards
  • New production approaches
  • ROVER: use of technologies to produce new data
    and annotations (combination of several
    approaches)
  • (Semi-)automatic / (un)supervised / ...
  • New distribution and sharing mechanisms
  • More open resources
  • Free versus for-a-fee resources
  • ORI