Title: PREVIOUS EVENTS
1PREVIOUS EVENTS
- Panel on International Co-operation (LREC - Granada)
- Panel of the Funding Agencies (LREC - Granada)
- Post-LREC Workshop on Multilingual Information Management: Current Levels and Future Abilities (Closing Session - Granada)
2- International Conference on New Vistas in Transatlantic Scientific and Technical Cooperation (Washington DC - June 98)
- HLT Session
- LREC (Athens) 2000 - Panel on International Cooperation
- COLING (Saarbruecken) 2000 - Panel on International Cooperation
3- INTERNATIONAL CO-OPERATION
- In previous discussions (Granada, Washington, 1998; Athens, 2000) the following areas of HLT have emerged as being in urgent need of international co-operation:
- standards (de facto, best practices)
- Language Resources (LR)
- evaluation
- core technologies
- selected vertical domains.
4LR are a central infrastructural component and a central issue for international coordination
- LR are essential components of HLT activity, supporting research, system development and training, and evaluation in both the mono- and multilingual context
- Political/cultural aspects of LR: endangered languages
- A key enabling condition for the integration of different technologies and languages is that LR are shared among different sectors and applications
- The richness of the multilingual capabilities associated with a language depends on the number of languages for which adequate LR exist
5The high cost and effort of the production of LR
should be shared, in order to make them more
affordable. The creation of multilingual LR
requires agreement on a coordination policy, to
ensure the reuse of existing monolingual
resources and to facilitate access to native
speakers of the various languages.
6- MULTILINGUAL LR
- In particular, the production of multilingual LR poses
- research issues and challenges
- organisational problems
- ? who has the responsibility of promoting - and how - the co-operation of R&D communities speaking different languages?
7- Different situation for
- types of LR: corpora, lexicons, etc.
- large/general multilingual LR
- application-specific LR
- customisation
- different types of information (data vs. analytical/interpretative features).
8- for different phases of LR development (research, standards, specifications, construction, maintenance, acquisition, (pre)tuning, customization, updating, technology transfer, etc.)
- DIFFERENT strategies in various continents
- ROLES/RESPONSIBILITIES of various actors
- CHALLENGES
9- MODALITIES
- ? Structured international co-operation (e.g. US-EU Transatlantic agreement: current/past initiatives, lessons to be learned, perspectives for the future, perspectives/opportunities)
10- ? Other forms of international co-operation
- experiences
- advantages/disadvantages/recovery measures/consequences
- suggestions for the short/medium/long term
- ETC.
11- Why is international co-operation more difficult than in other sciences, and why does it need institutional support?
- What are the consequences of the links with social goals, national identities and commercial interests which characterize HLT?
- In which areas is international co-operation necessary / appropriate / inappropriate?
- Which activities need international co-operation (for instance, evaluation, multilingual LR etc.)?
12- What can we learn from recent experiences of co-operation supported by the Funding Agencies in the frame of the Transatlantic Scientific and Technical Co-operation Agreement? What are the obstacles? Which strategies have been successful?
- What kind of initiatives for co-operation with other countries (Asian, South-American etc.) can be taken, and what could their possibilities / benefits / priorities be?
13European Language Resources Association
An improved infrastructure for data sharing
Centralized not-for-profit organization for the collection, distribution, and validation of speech, text, and terminology resources and tools
An Association of users of Language Resources
- A Repository Center
- Technical Logistic issues
- Commercial issues (prices, fees, royalties)
- Legal issues (Licensing, IPR)
- Information Dissemination
An operational company: European Language Resources Distribution Agency (ELDA)
14The Association
- Membership Drive
- ELRA is open to European and non-European institutions
- Resources are available to members and non-members
- Pay per Resource
- Some of the benefits of becoming a member
- Substantial discounts on LR prices (over 70)
- Legal and contractual assistance with respect to LR matters
- Access to validation and production manuals (quality assessment)
- Figures and facts about the market (results of ELRA surveys)
- Newsletter and other publications
15ELRA CATALOGUE -- Identification of LRs
16Legal Issues- Licensing
Provider-User Agreements
17Legal Issues- Licensing
Distribution Agreement
18DISTRIBUTION ACTIVITIES: distributed resources
Number of resources distributed to members and non-members, in periods of 12 months
19Other Technical ACTIVITIES
- Market analysis surveys
- LR production, packaging and funding
- Validation of resources
- Validation networks
- Language Resources for Evaluation purposes
20Some aspects of LREC2000
Connecting industrials with academic partners
21Some facts on LREC2000 Participants
22Some facts on LREC2000
23International Standards for Language Engineering
- Project Participants
- EU
- Consorzio Pisa Ricerche (I, CO)
- University of Southern Denmark (DK, CR)
- Université de Genève (CH, CR)
- USA
- University of Pennsylvania, Computer and Information Sciences
- University of Pennsylvania, Linguistic Data Consortium
- New York University
- Information Sciences Institute, University of Southern California
24Objectives
- To develop HLT standards within EU-US International Research Cooperation
- To build on the successful EAGLES (Expert Advisory Group for Language Engineering Standards)
- To tackle innovative areas where standards are strongly and urgently required
- To support HLT RTD and National projects, and HLT industry
- To promote EAGLES as an internationally active body for HLT standardisation
- To contribute to all IST thematic programmes
25Organisation
- Work is organised in
- 3 EAGLES Working Groups and
- several subgroups of experts, drawn from academia and industry,
- to build consensus around international workshops
- ISLE standards and guidelines will be
- validated in RTD and National projects,
- disseminated widely, with exemplary data
- to yield maximum impact for minimum cost, and
- enhance user experience of the information
society through standards-based HLT
26Multilingual Computational Lexicons - CLWG
- extend EAGLES work on lexical semantics, necessary to establish inter-language links
- design standards for multilingual lexicons
- develop a prototype tool to implement lexicon guidelines and standards
- create exemplary EAGLES-conformant sample lexicons and tag exemplary corpora for validation purposes
- develop standardised evaluation procedures for lexicons
27Natural Interaction and Multimodality - NIMM WG
- A rapidly innovating domain urgently requiring early standardisation. ISLE will develop guidelines for
- the creation of NIMM data resources
- multilayer annotation of NIMM data, including spoken dialogue in NIMM contexts
- meta-data descriptions of multimodal resources
- a specification - and first implementation - of the extension of annotation tools to multimodal capability
28Evaluation of HLT Systems - EWG
- Evaluation methodology of HLT products and systems
- based on ISO standards
- accompanied by practical case studies
- quality models for Machine Translation systems
- maintenance of previous guidelines - in an ISO-based framework (ISO 9126, ISO 14598)
29- 86 Grosseto Workshop
- LEX Projects (Acquilex)
- EuroWordNet
- Common Top Ontologies
- Common Basic Concepts
- ILI Interlingual Index to English WordNet
- Italian, Spanish, Dutch,
- WordNet Association announced in Athens (2nd LREC, June 2000)
- Speechdat, Speechdat-Car, Speecon, Sala, etc.
30PAROLE/SIMPLE
- A set of comparable corpora (20M words) and computational lexica (20K entries)
- Encoded at the morphological, syntactic and semantic levels according to common specifications compatible with EAGLES recommendations
- For all the EC languages
- Subsidiarity principle: initial harmonized nuclei (financially supported by the EC) to be enlarged to real-life size through national funds (already 7 national projects)
31ENABLER European National Activities for Basic
Language Engineering Resources
- A new initiative
- Identification of existing resources (Universal Catalogue)
- The Basics (e.g. standards, tools, evaluation procedures)
- Survey of existing national activities
- Fostering common research and compatibility of LR
- Suggestion for and contribution to international cooperation
32- INITIAL TOPICS for an INTERNATIONAL COOPERATION STRUCTURE in the FIELD OF LR
- to identify LR available for different languages
- to define a minimal set of basic LR to be promoted for as many languages as possible
- to establish an international effort vs. the WIPO for promoting an adequate legislation for the provision of LR
- to establish for written LR an initiative parallel to COCOSDA and an umbrella for the two initiatives
33To set up a truly international umbrella organization for LR
- To define the form, nature, ...
- To formulate a complete workplan
- To identify possible affiliation/funding sources
- We will call a meeting in the framework of ELSNET LR Task Group Enabler before the end of the year, taking into account the overview emerging from today's survey