A Common Ontology for Linguistic Concepts

1
A Common Ontology for Linguistic Concepts
  • Scott Farrar
  • University of Arizona

2
Endangered Languages
  • As many as half of the world's languages are in
    danger of disappearing (LaPolla 1998)
  • These include many languages in the Americas (Hopi),
    Africa, Australia (), and Southeast Asia (Biao
    Min).

3
EMELD
  • EMELD (Electronic Metastructure for Endangered
    Languages Data)
  • One application of EMELD: make endangered
    languages data available on the Semantic Web

4
Linguistic Field Work
  • Linguists collect data
  • Datasets (grammars, dictionaries, or glossed
    corpora)
  • Hopi example (name of a kachina):
  • sivu-ikwiw-ta-qa
  • vessel-carry on back-DUR-REL
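
The glossed Hopi form above pairs each morpheme with a label, one hyphen-separated segment at a time. A minimal sketch of that alignment follows, assuming hyphens are the only segment separator; the function name `align_gloss` is hypothetical and not part of EMELD.

```python
def align_gloss(form: str, gloss: str, sep: str = "-") -> list[tuple[str, str]]:
    """Pair each morpheme of `form` with the gloss label at the same position.

    ASSUMPTION: both lines use `sep` only as a morpheme boundary.
    """
    morphemes = form.split(sep)
    labels = gloss.split(sep)
    if len(morphemes) != len(labels):
        # A mismatch means the two lines are not segmented the same way.
        raise ValueError(
            f"{len(morphemes)} morphemes but {len(labels)} gloss labels"
        )
    return list(zip(morphemes, labels))


# The Hopi example from the slide.
pairs = align_gloss("sivu-ikwiw-ta-qa", "vessel-carry on back-DUR-REL")
for morpheme, label in pairs:
    print(f"{morpheme:8} {label}")
```

Raising an error on a length mismatch, rather than silently truncating, makes segmentation problems in a dataset visible early.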

5
Problems Concerning Data Interoperability
  • Datasets can vary according to:
  • markup
  • theoretical style
  • natural language semantics
  • Az épület-be megy-ek.
  • the building-IllativeCase go-1P/SING
  • I am going into the building.

6
Problems Concerning Data Interoperability
  • Linguistic Data is Dynamic
  • New data is collected.
  • Datasets are revised.
  • Theory changes.

7
Standardization is not Viable
  • Text Encoding Initiative (TEI) (Sperberg-McQueen
    and Burnard 1994)
  • Corpus Encoding Standard (CES) (Ide and Romary
    2000)

8
Towards a Solution
  • Data storage and distribution: local or
    distributed?
  • Data model for linguistic datasets
  • Linguistic ontology

9
EMELD Architecture
10
Linguistic Ontology
  • Conceptual Model for the Linguistics domain
  • (special focus on morpho-syntax)
  • Built on top of the Suggested Upper Merged
    Ontology (SUMO) (Niles and Pease 2001)
  • already includes a number of concepts relating to
    semiotics and linguistics
  • incorporates concepts from a number of top-level
    ontologies
  • peer-reviewed and freely available

11
Backbone Taxonomy
  • Entity
  • Physical
  • Object
  • ContentBearingObject
  • Icon
  • SymbolicString
  • LinguisticExpression
  • WrittenLinguisticExpression
  • Text
  • Sentence
  • Phrase
  • Word
  • Morpheme
  • SpokenLinguisticExpression
  • Dialogue
  • Sentence
  • Phrase
  • Word
  • Morpheme

12
Backbone Taxonomy (continued)
  • Abstract
  • Class
  • Relation
  • Predicate
  • GrammaticalRelation
  • Aspect
  • Tense
  • Case
  • Agreement
  • Attribute
  • GrammaticalAttribute
  • Gender
  • Person
  • Number

13
Morphosyntactic Case
  • Case
  • InherentCase
  • Spatio-KineticCase
  • PositionalCase
  • InessiveCase
  • DirectionalCase
  • IllativeCase
  • ExistentialCase
  • AbessiveCase
  • PartitiveCase
  • InstrumentalCase
  • StructuralCase
  • GenitiveCase
  • ErgativeCase
  • NominativeCase
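
A taxonomy like this supports subsumption queries, for example asking whether `IllativeCase` is a kind of `Spatio-KineticCase`. The sketch below stores each concept's parent and walks upward. The transcript lists the concepts without indentation, so the nesting encoded in `PARENT` is an assumption made for illustration and may not match the original slide. The helper names `ancestors` and `is_a` are likewise hypothetical.

```python
# ASSUMPTION: this nesting is a guess at the slide's hierarchy; the
# transcript flattened it, so treat every parent link as illustrative.
PARENT = {
    "InherentCase": "Case",
    "StructuralCase": "Case",
    "Spatio-KineticCase": "InherentCase",
    "ExistentialCase": "InherentCase",
    "InstrumentalCase": "InherentCase",
    "PositionalCase": "Spatio-KineticCase",
    "DirectionalCase": "Spatio-KineticCase",
    "InessiveCase": "PositionalCase",
    "IllativeCase": "DirectionalCase",
    "AbessiveCase": "ExistentialCase",
    "PartitiveCase": "ExistentialCase",
    "GenitiveCase": "StructuralCase",
    "ErgativeCase": "StructuralCase",
    "NominativeCase": "StructuralCase",
}


def ancestors(concept: str) -> list[str]:
    """Return the chain of parents from `concept` up to the root."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def is_a(concept: str, candidate: str) -> bool:
    """True if `concept` equals `candidate` or lies below it in the taxonomy."""
    return concept == candidate or candidate in ancestors(concept)


print(ancestors("IllativeCase"))
print(is_a("IllativeCase", "Spatio-KineticCase"))
print(is_a("IllativeCase", "StructuralCase"))
```

With such a structure, a search for spatio-kinetic case markers would also match data glossed with the more specific `IllativeCase` or `InessiveCase`.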

14
Future directions
  • Include the domains of phonology and discourse
    analysis.
  • The linguistics ontology has applications beyond
    the immediate EMELD project
  • as part of an expert system for reasoning about
    language data
  • as part of an interlingua designed for machine
    translation systems

15
Contact Info
  • Scott Farrar
  • Will Lewis
  • Terry Langendoen
  • {farrar, wlewis, langendoen}@u.arizona.edu