Hypermedia Lexica and Lexicon Metadata

The MetaLex model in the ModeLex project. Dafydd Gibbon, U Bielefeld, Europe. E-MELD Workshop, Detroit, August 2002.


Transcript and Presenter's Notes


1
Hypermedia Lexica and Lexicon Metadata
The MetaLex model in the ModeLex project
Dafydd Gibbon, U Bielefeld, Europe
E-MELD Workshop, Detroit, August 2002
2
Overview
  • Metalex goals
  • Background: DATR, HyprLex, speech, language
    documentation
  • Metalex design: theory and practice
  • Lexical documents and metadocuments
  • Lexical objects, properties, structures
  • Metalex implementation
  • Ivory Coast encyclopaedia project
  • Ega documentation model project
  • The Modelex (multimodal lexicon) project
  • Ivory Coast and Nigeria documentation curriculum
    project
  • Extending Metalex
  • Modalities and submodalities
  • Data-driven lexicography
  • Data structures and algorithms: trees, lattices,
    induction, inference
3
Metalex: goals and background
  • General objectives
  • Versatile, high-quality spoken language
    lexicography
  • Motivated balance of high-tech and low-tech
  • Good resources are data-driven and
    theory-informed
  • Specific project objectives
  • DATR/ILEX: formal lexicon theory and
    implementation
  • VerbMobil: integrated HyprLex dissemination model
  • HyprLex: encyclopaedia model for Ivory Coast
    languages
  • Ega: endangered language documentation model
  • Modelex: theory and design of multimodal lexica
  • Ivory Coast and Nigeria curricula for language
    documentation

4
(No Transcript)
5
(No Transcript)
6
Metalex design: data and theory
  • Data-driven data and metadata acquisition
  • Systematic metatext derived from and supporting
    ...
  • Computational fieldwork
  • Induction of lexica
  • Theory-informed data and metadata acquisition
  • Integrated Lexicon (ILEX) consisting of ...
  • Abstract Lexicon (ALEX) - "theory" in the
    mathematical sense
  • Object Lexicon (OLEX) - "model" in the
    mathematical sense

7
Metalex design: data
  • Data-driven acquisition
  • Computational fieldwork
  • Portable metadatabase with restricted vocabulary
    and general metatext
  • Definition of and support for transcription
    and annotation
  • Portable support for scenarios, scripts
  • Portable support for lexicon processing
  • Induction of lexica
  • Lexicon tools for
  • Extraction of macrostructural elements (lexeme
    elements)
  • Induction of microstructural information (media
    concordance, POS, ...)
  • Induction of mesostructural regularities and
    subregularities (grammar, ...)

8
Metalex design: theory
  • Theory-informed formalisation
  • Abstract Lexicon (ALEX) - "theory" in the
    mathematical sense
  • Decomposition (componential A-V description)
  • Generalisation (inheritance)
  • Composition (multilinear operations)
  • Object Lexicon (OLEX) - "model" in the
    mathematical sense
  • XML archiving and dissemination formats
  • object-relational database acquisition and
    processing formats
  • Integrated Lexicon (ILEX)
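
The three ALEX operations listed on this slide (decomposition into attribute-value pairs, generalisation by inheritance, composition) can be illustrated with a minimal sketch. It is not the DATR/ILEX implementation; the `Node` class, the `plural` function and the toy English data are hypothetical and serve only to show the idea of default inheritance with local overrides.

```python
from typing import Dict, Optional


class Node:
    """A lexical node: attribute-value pairs plus an optional parent node."""

    def __init__(self, name: str, parent: Optional["Node"] = None, **attrs: str):
        self.name = name
        self.parent = parent
        self.attrs: Dict[str, str] = dict(attrs)

    def get(self, attr: str) -> str:
        # Decomposition: look for a local attribute-value pair first.
        # Generalisation: otherwise climb to the parent (default inheritance).
        node: Optional[Node] = self
        while node is not None:
            if attr in node.attrs:
                return node.attrs[attr]
            node = node.parent
        raise KeyError(f"{self.name} has no value for {attr!r}")


# Hypothetical toy hierarchy, not taken from the Metalex lexica.
noun = Node("Noun", pos="noun", plural_suffix="s")
dog = Node("Dog", parent=noun, stem="dog")
ox = Node("Ox", parent=noun, stem="ox", plural_suffix="en")  # overrides the default


def plural(node: Node) -> str:
    # Composition: the plural form is assembled from stem and suffix.
    return node.get("stem") + node.get("plural_suffix")
```

Here `dog` inherits both its part of speech and its plural suffix from `noun`, while `ox` states only the exceptional suffix locally.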

9
(No Transcript)
10
(No Transcript)
11
Metalex implementation: architecture
  • Data model ∩ Theory: shared lexicon
    architecture
  • Macrostructure: declarative and procedural
    components
  • Lexicon architecture: relational, inheritance,
    text, ...
  • Lexical objects: entry types
  • Lexical access: fact query, semasiological /
    onomasiological indexing
  • Mesostructure
  • Generalisations: grammar, phonetics, cultural
    background, ...
  • Composition of lexicon object types: idioms,
    words, morphemes, ...
  • Lexical access: inferential query
  • Microstructure
  • Lexical entry (article, lemma structure - atom,
    string, tree, ...)
  • Types of lexical information - standardly
    "lexicon model"

12
Metalex implementation: microstructure
  • Microstructure specification philosophy
  • Anybody can specify any kind of unpredictable
    detail
  • Questionnaire / Experiment / Corpus / Archive
    dependence
  • Lexicon architecture: relational, inheritance,
    text, ...
  • Intelligent (semi-)automatic classification, not
    fixed attributes
  • Theory-informed coarse grouping is possible
  • Media attributes: visual, auditory, tactile, ...
  • Meaning attributes: definition, gloss, lexical
    relations, ...
  • Composition attributes: context/category, parts,
    operations
  • Use attributes: style, register, concordance,
    media illustrations, ...
  • Micrometadata attributes: lexicographer DB
    indices, source (e.g. fieldwork metadata) DB
    indices, modification, ...

13
Metalex implementation: fieldwork metadata source
(1)
  • Situation dimensions
  • participant: fieldworker, partners, contacts
  • channel: modalities, media
  • locale: indoor/outdoor, spatial configuration
  • temporal: date, time, calendar event
  • functional: affiliation, role, occasion;
    observation (prompt, metadata management)
  • Language dimension
  • affiliation
  • discourse level: discourse type, genre; prosody
  • phrase level: recursive phrasal
    categories/relations; prosody
  • word level: clitics, inflexion, word formation;
    prosody

14
Metalex implementation: fieldwork metadata source
(2)
  • Technical dimension
  • physical characteristics of participants: age,
    sex, health
  • physical characteristics of locale:
    indoor/outdoor, spatial configuration, temporal
    sequence, date (season), time (of day)
  • audio: mike type, position, room; A/D
    channels, fsample, resolution; formats
  • video: camera, microphone type,
    analogue/digital; filters, lenses; audio
    formats
  • other sensors: laryngograph, airflow, data glove,
    ...
  • Metalinguistic dimension
  • empirical method: introspection, experiment,
    corpus elicitation
  • materials: questionnaire, experiment layout,
    corpus scenario
  • metadata specification: index, metatext type,
    metacatalogue type
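
A fieldwork metadata record along the dimensions of these two slides could be sketched as nested record types with a portable export. This is an illustration only: the class names, the field selection and every value are assumptions, not the schema of the Metalex metadatabase.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Situation:
    participants: List[str]
    channel: str
    locale: str
    date: str


@dataclass
class Technical:
    mike_type: str
    sample_rate_hz: int
    resolution_bits: int


@dataclass
class SessionMetadata:
    situation: Situation
    technical: Technical
    empirical_method: str
    materials: List[str] = field(default_factory=list)


# All values below are invented placeholders, not real session data.
record = SessionMetadata(
    situation=Situation(["fieldworker", "partner"], "audio", "outdoor", "2002-03-01"),
    technical=Technical("headset", 44100, 16),
    empirical_method="corpus elicitation",
    materials=["questionnaire"],
)

# Export as JSON text so the record can be moved between devices.
exported = json.dumps(asdict(record), indent=2)
```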

15
Metalex implementation: fieldwork metadata entry
tool
  • LREC 2002, Workshop on Portability Issues

16
Metalex implementation: fieldwork metadata entry
tool
HanDBase DBMS for PalmOS
17
Metalex objects: in conjunction with work in ISLE
CLWG (Computational Lexicon Working Group)
  • (see Gibbon in reading list)
  • LEXICON
  • < Macrostructure > , < Mesostructure >
  • Macrostructure: Ordering( ENTRY, ... )
  • Mesostructure: < FrontmatterMetadata,
    Descriptions >
  • ENTRY
  • < Microstructure, HousekeepingMetadata >
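
The object definitions above can be read as a small data model. The sketch below is one possible rendering, under the assumption that the macrostructure is an ordering over entries and that micro- and mesostructure are simple key-value containers; the class layout and the `lemma` key are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    microstructure: Dict[str, str]
    housekeeping_metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Mesostructure:
    frontmatter_metadata: Dict[str, str] = field(default_factory=dict)
    descriptions: List[str] = field(default_factory=list)


@dataclass
class Lexicon:
    entries: List[Entry]
    mesostructure: Mesostructure

    def macrostructure(self, key: str = "lemma") -> List[Entry]:
        # Macrostructure as an ordering over entries: here, sorting by one
        # microstructure field (an assumption made for this sketch).
        return sorted(self.entries, key=lambda e: e.microstructure[key])


# Toy entries, invented for the example.
lexicon = Lexicon(
    entries=[Entry({"lemma": "tree"}), Entry({"lemma": "bird"})],
    mesostructure=Mesostructure({"title": "Demo lexicon"}),
)
ordered_lemmas = [e.microstructure["lemma"] for e in lexicon.macrostructure()]
```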

18
The LEXICON object
  • Front Matter Metadata
  • Bibliographical: creator, publisher, title, date,
    ...
  • Medium / format: paper, CD-ROM/DVD, web, ...
  • Macrostructure type
  • access: semasiological/onomasiological,
  • n-lingual/langue(s),
  • special: taxonomy (thesaurus), concordance
  • structure, e.g. tabular: f(type, attrib) = value
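
The two access types named above differ only in the direction of the index: semasiological access goes from form to meaning, onomasiological access from meaning to form. A minimal sketch, using invented English pairs and a hypothetical `build_indices` helper:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical (form, meaning) pairs, for illustration only.
ENTRIES: List[Tuple[str, str]] = [
    ("bank", "financial institution"),
    ("bank", "river edge"),
    ("shore", "river edge"),
]


def build_indices(entries: List[Tuple[str, str]]):
    """Return (semasiological, onomasiological) indices over the entries."""
    form_to_meanings: Dict[str, List[str]] = defaultdict(list)
    meaning_to_forms: Dict[str, List[str]] = defaultdict(list)
    for form, meaning in entries:
        form_to_meanings[form].append(meaning)  # form -> meanings
        meaning_to_forms[meaning].append(form)  # meaning -> forms
    return dict(form_to_meanings), dict(meaning_to_forms)


semasiological, onomasiological = build_indices(ENTRIES)
```

Both indices are built in one pass over the same entry list, which is the sense in which the access type is a property of the macrostructure and not of the individual entries.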

19
The ENTRY object: metadata
  • Entry Metadata (see Gibbon et al. in reading
    list)
  • Entry type (wrt macrostructure specification)
  • encyclopaedic
  • multiword unit, word, ...
  • Microstructure data model specification
  • entry structure: flat, tree, graph (net), ...
  • data categories specification (attribute, field,
    information type)
  • DC groups - structural skeleton
  • DCs
  • DC substructure - homography, homophony, polysemy
    ...

20
The ENTRY object: DC groups
  • Media ("surface")
  • acoustic (phonetic, earcon, sonification),
    visual (orthography, icon, gesture, ...)
  • Composition (structure)
  • part (e.g. morphology for words), context (e.g.
    POS, subcat for words)
  • Meaning (definition, illustration)
  • semantic (components, relations, senses,
    ontology)
  • pragmatic (speech act, dialogue, disfluency, ...)
  • Use: typically media (e.g. audio) concordance,
    ...
  • Metadata: lexicographer, ...

21
The ENTRY object: DCs
  • Countless Data Category models (see reading
    list)
  • every existing dictionary
  • linguistic "types of lexical information"
  • several European projects
  • (GENELEX, MULTILEX, ACQUILEX, ...)
  • ISO terminology norms (cf. MARTIF etc. ...)

22
The ENTRY object: DC structures
  • Computationally relevant properties of fields
  • type (atomic, complex tree, string,
    xyz-formatted text)
  • character encoding spec.: ASCII, Unicode, xyz
  • tree (or other graph/net)
  • finite depth
  • flat, disjunctive disjunctive tree
  • recursive graph (net)
  • table, non-tree graph, anchor/link/index
    structure
  • generated text
  • print, hypertext (compiled vs. dynamic (generated
    on the fly))

23
Metalex microstructure: application
  • Media ("surface")
  • phonemic and tonemic transcription (SAMPA ASCII -
    still waiting for Unicode...)
  • Composition (structure)
  • morphemic substructure, category and subcategory
  • Meaning (definition, illustration)
  • glosses (English, French, German)
  • definitions, senses, relations, components;
    audio-visual illustration
  • Use: genres; examples (e.g. concordance link);
    free text notes
  • Metadata: first record, last field

24
Metalex field lexicon: microstructure
  • Anouman_1
  • Media attributes
  • Phonemic tier: an'Um'a
  • Skeletal tier: VNVNV
  • Tonal tier: L H LH
  • Signal tier: Audio
  • Meaning attributes
  • F-gloss: Oiseau
  • E-gloss: Bird
  • G-gloss: Vogel
  • Definition: avis
  • Homophone (full): Anouman_2, grandchild
  • Homophone (phonemic): Anouman_3, yesterday
  • Use
  • <Concordance pointer>
  • Genre: narrative
  • Metadata
  • Lexicographer: S. Adouakou
  • Source: Bielefeld-Anyi-Corpus, Adaou village, CI
  • Date: March 2002
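
To show how such an entry might look in an XML archiving format, the sketch below serialises part of the Anouman_1 entry with Python's standard library. The field values come from the slide; the element names (`entry`, `Media`, `PhonemicTier`, ...) and the `entry_to_xml` helper are assumptions and do not reproduce the Metalex XML format.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Values from the slide; grouping and element names are hypothetical.
ANOUMAN_1: Dict[str, Dict[str, str]] = {
    "Media": {"PhonemicTier": "an'Um'a", "SkeletalTier": "VNVNV", "TonalTier": "L H LH"},
    "Meaning": {"FGloss": "Oiseau", "EGloss": "Bird", "GGloss": "Vogel"},
    "Use": {"Genre": "narrative"},
    "Metadata": {"Lexicographer": "S. Adouakou", "Date": "March 2002"},
}


def entry_to_xml(entry_id: str, groups: Dict[str, Dict[str, str]]) -> ET.Element:
    """Build one <entry> element with a child element per DC group."""
    root = ET.Element("entry", id=entry_id)
    for group, fields in groups.items():
        group_el = ET.SubElement(root, group)
        for name, value in fields.items():
            ET.SubElement(group_el, name).text = value
    return root


xml_text = ET.tostring(entry_to_xml("Anouman_1", ANOUMAN_1), encoding="unicode")
```

The DC groups become the structural skeleton of the element tree, and each data category becomes a leaf element.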

25
Metalex: portable lexical database
  • Relational database
  • Metalex specs flattened
  • structure re-constitution via metalex specs
  • HanDBase for PalmOS
  • Features
  • standard full RelDBMS
  • XML, CSV, text export
  • export/import via GSM
  • inexpensive (wrt laptop)
  • stylus, keyboard, sync input
  • light weight
  • low power consumption
  • inconspicuous in use
  • interfaces to Scheme, C
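
The idea of flattening a structured entry into relational rows and reconstituting the structure afterwards can be sketched as follows. This is not the HanDBase schema: the row layout `(entry_id, path, value)` and both helper functions are assumptions, and the sketch assumes field names contain no "/".

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]


def flatten(entry_id: str, nested: Dict, prefix: Tuple[str, ...] = ()) -> List[Row]:
    """Turn a nested entry into flat (entry_id, path, value) rows."""
    rows: List[Row] = []
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            rows.extend(flatten(entry_id, value, path))
        else:
            rows.append((entry_id, "/".join(path), value))
    return rows


def reconstitute(rows: List[Row]) -> Dict[str, Dict]:
    """Rebuild the nested entries from the flat rows."""
    entries: Dict[str, Dict] = {}
    for entry_id, path, value in rows:
        node = entries.setdefault(entry_id, {})
        *parents, leaf = path.split("/")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return entries


# Small example with values taken from the Anouman_1 slide.
entry = {"Meaning": {"EGloss": "Bird", "FGloss": "Oiseau"}, "Use": {"Genre": "narrative"}}
rows = flatten("Anouman_1", entry)
```

The path column plays the role of the specification here: it records where each flat value belongs in the structure.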


26
Metalex extension: The Modelex project, "Theory
and Design of Multimodal Lexica"
  • Goals
  • Data-driven, theory-informed lexicon models
  • Formal properties of abstract data models for
    multimodal lexica
  • Interpretation of abstract data models in XML
  • Integration of parallel annotation lattices for
    modalities and submodalities
  • Development of a prototype multimodal lexicon

27
The Modelex domain: modalities and submodalities
28
Modelex: data-driven lexicography
29
Modelex: gesture annotation
  • Time Aligned Signal
  • Corpus System
  • (Java, GPL)
  • Jan-Torsten Milde, U Bielefeld
  • TASX annotator
  • Phonological tier
  • ToBI tiers
  • Gesture tier
  • Speech Act tier
  • Anyi, Ega, German

30
Model-theoretic compilation in ILEX:
INTERPRETATION ( ALEX ) = OLEX
31
Metalex in the Modelex project: Multimodal
concordance as microstructure DC
  • Prototype: http://www.spectrum.uni-bielefeld.de/langdoc/PAX/

32
Metalex in the Modelex project: underspecified
ALEX microstructure for gesture coordinates
  • Hand:
  • <parts> == "Palm" "Digit"
  • <vector> == "<name>" <coord "<name>">
  • <coord> == "<x1>" "<y1>" "<x2>" "<y2>"
  • <>
  • .
  • Palm:
  • <parts> == <vector>
  • <name> == palm
  • <width> == pw
  • <height> == ph
  • <x1 fore> == <x1>
  • <x1 middle> == ( <x1> + ( <x2> - <x1> ) / 3 )
  • <x1 ring> == ( <x1> + ( <x2> - <x1> ) * 2 / 3 )
  • <x1 pinky> == <x2>
  • <x1> == px1
  • <y1> == py1
  • <x2> == ( <x1> + <width> )
  • <y2> == ( <y1> + <height> )

33
Metalex in the Modelex project: fully specified
ALEX microstructure for gesture coordinates
  • Hand:<parts>
  • palm px1 py1 ( px1 + pw ) ( py1 + ph )
  • thumb px1 py1 ( px1 - lt ) py1
  • fore px1 py1 px1 ( py1 - lf )
  • middle ( px1 + ( ( px1 + pw ) - px1 ) / 3 ) py1
    ( px1 + ( ( px1 + pw ) - px1 ) / 3 ) ( py1 - lm )
  • ring ( px1 + ( ( px1 + pw ) - px1 ) * 2 / 3 )
    py1 ( px1 + ( ( px1 + pw ) - px1 ) * 2 / 3 ) (
    py1 - lr )
  • pinky ( px1 + pw ) py1 ( px1 + pw ) ( py1 - lp )

34
Metalex: conclusion and prospects
  • User complexity
  • demands an open, data-driven approach
  • Domain
  • demands a theory-informed approach
  • with computational acquisition and inference
  • Data-driven and theory-informed lexica
  • are possible (METALEX)
  • need integrated model-theoretic approach (ILEX)
  • INTERPRETATION (ALEX) = OLEX
  • a formal problem remains: differing complexity of
  • trees (archive): simulation of other graphs via
    semantics only
  • annotation lattices (data), tables (lexica)
  • regular relations if non-recursive, indexed
    grammars if recursive?