Title: Using OLIF, The Open Lexicon Interchange Format
1. Using OLIF, The Open Lexicon Interchange Format
- Susan McCormick
- OLIF2 Consortium
- October 1, 2004
2. The OLIF Format
- The Open Lexicon Interchange Format
- XML-compliant standard
- Supports exchange of lexical and terminological data for language technology applications
- Handles basic exchange as well as more complex applications such as MT lexicons
3. The OLIF2 Consortium
- OLIF v.2 was developed by the OLIF2 Consortium, a group of language technology companies and organizations interested in issues of MT data/term data exchange
- Led by SAP
- Members include Xerox, Microsoft, Trados, IBM, Systran, IAI, DFKI and Comprendium
4. Developing OLIF v.2
- Based on OLIF prototype
  - Developed in EC-funded OTELO project proposing standards for users of disparate language tools
  - Original purpose of OLIF was to facilitate terminology exchange for industrial users of MT
5. Developing OLIF v.2
- Version 2 adapted from OLIF prototype using input from
  - Developers/users of 3 MT systems
  - Developers/users of terminology management systems
  - Other language standards projects
    - EAGLES
    - SALT
    - ISLE
    - MARTIF, TBX
6. OLIF Version 2
- Released as open standard in 2002
- XML-compliant
- Covers 6 European languages
  - English, German, French, Spanish, Danish, Portuguese
- Includes options for modeling administrative, morphological, syntactic and semantic data
7. Available to Users
- XML implementation of OLIF specification in a DTD
- Available from OLIF2 Consortium web site
  - www.olif.net
8. The OLIF File
- Follows Terminology Markup Framework (TMF) structure
  - Header
  - Body
  - Shared resources
9. The OLIF Entry
- Collection of monolingual data on a specified sense of a word or phrase
- Optional links for cross-reference and transfer
  - Transfer is bilingual and unidirectional
  - Multiple transfers in multiple languages possible for single word sense
10. Key Data Categories
- The OLIF entry is uniquely identified by 5 key data categories
  - Canonical form
  - Language
  - Part of speech
  - Subject field
  - Semantic reading
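How the five key data categories act as a unique identifier can be illustrated with a short Python sketch. `EntryKey` and `index_entries` are hypothetical names chosen for illustration and are not part of any OLIF tooling; the fields simply mirror the five categories listed on this slide, and the sample values are taken from the example entries in this deck.

```python
from typing import NamedTuple


class EntryKey(NamedTuple):
    """The five key data categories that together identify an OLIF entry."""
    can_form: str      # canonical form
    language: str
    pt_of_speech: str  # part of speech
    subj_field: str    # subject field
    sem_reading: str   # semantic reading


def index_entries(keys):
    """Map each key to its position; raise if the same key occurs twice."""
    index = {}
    for position, key in enumerate(keys):
        if key in index:
            raise ValueError(f"duplicate OLIF entry key: {key}")
        index[key] = position
    return index


# Key values as they appear in the sample entries of this presentation.
table_general = EntryKey("table", "en", "noun", "general", "86")
row_general = EntryKey("row", "en", "noun", "general", "69")
```

Because a named tuple is hashable, two entries count as the same entry only if all five values match, which is the behavior the slide describes.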
11. Basic Well-Formed OLIF Entry
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
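As a minimal sketch, an entry of this shape could be produced with Python's standard `xml.etree.ElementTree` module. The element names come from the slide; `build_entry` is a hypothetical helper, and the result is not validated against the OLIF DTD.

```python
import xml.etree.ElementTree as ET


def build_entry(can_form, language, pt_of_speech, subj_field, sem_reading):
    """Return an <entry> element holding only the five key data categories."""
    entry = ET.Element("entry")
    mono = ET.SubElement(entry, "mono")
    key_dc = ET.SubElement(mono, "keyDC")
    # Element order follows the sample entry on the slide.
    for tag, value in (
        ("canForm", can_form),
        ("language", language),
        ("ptOfSpeech", pt_of_speech),
        ("subjField", subj_field),
        ("semReading", sem_reading),
    ):
        ET.SubElement(key_dc, tag).text = value
    return entry


entry = build_entry("table", "en", "noun", "general", "86")
xml_text = ET.tostring(entry, encoding="unicode")
```

Here `xml_text` holds the serialized entry as a single string without indentation.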
12.
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
    <monoDC>
      <monoAdmin>
        <originator>Weber</originator>
        <adminStatus>ver</adminStatus>
      </monoAdmin>
      <monoMorph>
        <inflection>like book, books</inflection>
      </monoMorph>
      <monoSyn>
        <synType>cnt</synType>
        <synFrame>gencomp-opt</synFrame>
      </monoSyn>
      <monoSem>
        <semType>inform</semType>
      </monoSem>
    </monoDC>
  </mono>
</entry>
13. OLIF Entry with Cross-Reference
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <crossRefer>
    <keyDC>
      <canForm>row</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>69</semReading>
    </keyDC>
    <crLinkType>has-meronym</crLinkType>
  </crossRefer>
</entry>
14. OLIF Entry with Transfer
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <transfer>
    <keyDC>
      <canForm>Tabelle</canForm>
      <language>de</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </transfer>
</entry>
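To show how the unidirectional transfer link might be read, here is a small Python sketch that pairs the source key of an entry with the key of each of its transfers. `read_key` and `transfer_pairs` are hypothetical helpers; the element names are those shown on the slide, and the closing `</entry>` tag is assumed.

```python
import xml.etree.ElementTree as ET

KEY_TAGS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

ENTRY_XML = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <transfer>
    <keyDC>
      <canForm>Tabelle</canForm>
      <language>de</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </transfer>
</entry>
"""


def read_key(key_dc):
    """Return the five key data category values of a <keyDC> element."""
    return tuple(key_dc.findtext(tag) for tag in KEY_TAGS)


def transfer_pairs(entry):
    """List (source key, target key) pairs, one per <transfer> in the entry."""
    source = read_key(entry.find("mono/keyDC"))
    return [(source, read_key(t.find("keyDC"))) for t in entry.findall("transfer")]


pairs = transfer_pairs(ET.fromstring(ENTRY_XML.strip()))
```

Each pair runs from the monolingual entry to the transfer target only. A link in the opposite direction would be a separate transfer on the German entry.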
15. Data Category Values
- Allowed values specified by OLIF
- Administrative, terminological, linguistic values based on
  - General industry standards
    - E.g., allowed values for date derived from recommendations from ISO 8601:1988
  - MT/Terminology standards
    - E.g., suggested values for subject field adapted from EC
  - Widely-recognized linguistic standards
    - E.g., allowed values for gender based on longstanding gender description for European languages
16. User Extensions: The OLIF Data Category Registry
- Users may declare and use their own values for certain data categories
  - Subject field
  - Semantic reading
  - Morphological structure
  - Part of speech
  - Inflection
  - Aspect
  - Syntactic type
  - Syntactic frame
  - Semantic type
  - Concept hierarchy
17. Organizing Based on Concept
- Users may link monolingual entries via a concept identifier
- These IDs can be used to organize entries as equivalent word senses associated with the same concepts rather than source word senses associated with transfers.
18. Entries Linked by Concept
<entry ConceptUserId="0731F16CCCD2D3119B4D">
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
<entry ConceptUserId="0731F16CCCD2D3119B4D">
  <mono>
    <keyDC>
      <canForm>Tabelle</canForm>
      <language>de</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
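Grouping entries by concept could look like the following Python sketch. It assumes the concept identifier is carried as a `ConceptUserId` attribute on `<entry>`, as in the two entries on this slide; `group_by_concept` is a hypothetical helper, and the entries are parsed one at a time so that no enclosing file structure has to be assumed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ENTRY_TEMPLATE = (
    '<entry ConceptUserId="0731F16CCCD2D3119B4D"><mono><keyDC>'
    "<canForm>{form}</canForm><language>{lang}</language>"
    "<ptOfSpeech>noun</ptOfSpeech><subjField>general</subjField>"
    "<semReading>86</semReading></keyDC></mono></entry>"
)

# The two sample entries from the slide, parsed individually.
entries = [
    ET.fromstring(ENTRY_TEMPLATE.format(form="table", lang="en")),
    ET.fromstring(ENTRY_TEMPLATE.format(form="Tabelle", lang="de")),
]


def group_by_concept(entry_elements):
    """Collect (language, canonical form) pairs under each concept identifier."""
    groups = defaultdict(list)
    for entry in entry_elements:
        concept_id = entry.get("ConceptUserId")
        if concept_id is None:
            continue  # entry is not linked to a concept
        language = entry.findtext("mono/keyDC/language")
        can_form = entry.findtext("mono/keyDC/canForm")
        groups[concept_id].append((language, can_form))
    return dict(groups)


by_concept = group_by_concept(entries)
```

In this arrangement the English and German entries are peers under one concept, with neither acting as the source of a transfer.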
19. What's Available to the OLIF User?
- On www.olif.net
  - Complete XML DTD for download
  - Hyperlinked DTD for viewing
  - Graphical view of structure of DTD
  - Current specification for OLIF v.2
  - Formalization of OLIF data categories
  - Alphabetic list of XML elements and attributes
  - Fixed and recommended values for elements and attributes
  - Guidelines for formulating canonical forms
  - Sample OLIF entries
21. Using OLIF
- Some applications
  - SAP has implemented an OLIF converter to exchange terminological data from its central termbase SAPterm
  - MT developers in OLIF2 Consortium currently developing OLIF converters (Comprendium, Systran)
- OLIF User Forum: 60 members
22. What's New: XML Schema
- OLIF XSD offers
  - 40 built-in data types
  - Allows creation of user-defined data types
  - Supports inheritance
23. What's New: The OLIF API
- Based on OLIF XSD, Java classes created
- Supports
  - Converting .csv files to OLIF
  - Converting from XML format to OLIF
  - Creating OLIF documents from scratch
  - Modifying OLIF documents
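The slides do not show the classes of the Java-based OLIF API, so the following Python sketch only illustrates what a CSV-to-OLIF conversion involves and is not that API. The CSV column names are an assumption (they reuse the `keyDC` element names), and `csv_to_entries` is a hypothetical helper.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed CSV header: one column per key data category element.
KEY_TAGS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")


def csv_to_entries(csv_text):
    """Turn each CSV row into an <entry> holding the five key data categories."""
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.Element("entry")
        key_dc = ET.SubElement(ET.SubElement(entry, "mono"), "keyDC")
        for tag in KEY_TAGS:
            ET.SubElement(key_dc, tag).text = row[tag]
        entries.append(entry)
    return entries


SAMPLE_CSV = (
    "canForm,language,ptOfSpeech,subjField,semReading\n"
    "table,en,noun,general,86\n"
    "Tabelle,de,noun,general,86\n"
)
entries = csv_to_entries(SAMPLE_CSV)
```

A fuller converter would also need to wrap the entries in the file structure described earlier (header, body, shared resources) and validate the result against the OLIF DTD or XSD.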
24. What to Expect this Year from OLIF
- OLIF XSD and API are available to the user from www.olif.net
- OLIF web site upgraded, updated
- Requirements for modeling Japanese entries integrated
25. OLIF User Forum
- Users of OLIF can access and post questions, messages and sample data from the OLIF group site
  - http://groups.yahoo.com/group/olifConsortium/