OSIS linguistic annotation

OSIS linguistic annotation: definitions and requirements. Kirk E. Lowery, Westminster Hebrew Institute, SBL Computer-Assisted Research Group. The context: why OSIS ...
1
OSIS linguistic annotation
  • definitions and requirements

Kirk E. Lowery
Westminster Hebrew Institute
SBL Computer-Assisted Research Group
2
the context
  • why OSIS linguistic annotation?

3
the goal of OSIS
  • to exchange electronic Bibles
  • any language, medium, presentation style
  • to add meta-information to those texts
  • keywords: link, hierarchy, pyramid
  • to easily transform these texts
  • the target transformation is unknown
  • to cut costs: production, presentation,
    distribution of Bibles plus meta-data
  • time, money, people

4
why exchange Bible texts?
  • coordination within organizations
  • cooperation between organizations and between
    individuals
  • publish in multiple formats and media from one
    canonical source
  • long-term archival
  • the changing definition of "publish"
  • documents have a life cycle!

5
who wants to exchange texts?
  • Bible publishers
  • commercial publishing houses
  • denominations & Bible societies
  • Bible translators
  • translation teams & editors
  • consultants & supervisors
  • Bible scholars
  • original languages, text criticism
  • text analysis and commentary

6
text meta-data
  • what information needs to be captured?

7
translators: managing the translation process
  • document versions & responsibility
  • comments & corrections by editors
  • handling presentation issues
  • script direction
  • rubies
  • linking source, relay & target translations
  • linking supplementary information
  • notes, glossaries, maps

8
translators & scholars: focus on the text
  • manuscript collation & description
  • text criticism: establishment of the original
  • linguistic analysis
  • text segmentation
  • segment id: from phoneme to text structures
  • linguistic mapping of source and target
  • alignment: parallel & synoptic texts

9
linguistic annotation
  • how can we capture the information?

10
required
  • a way to segment the text
  • a mechanism for associating labels with an
    arbitrary text-span
  • a means to declare labels used in analysis
  • a common linguistic vocabulary
  • language-specific grammar terms
  • a protocol for user redefinition

11
segmenting text
    <seg id="gn11,1.1">B.</seg>
    <seg id="gn11,1.2">R")IYT</seg>
    <seg id="gn11,2.1">B.FRF)</seg>
    <seg id="gn11,3.1">)ELOHIYM</seg>
    <seg id="gn11,4.1">)"T</seg>
    <seg id="gn11,5.1">HA</seg>
    <seg id="gn11,5.2">.FMAYIM</seg>
    <seg id="gn11,6.1">W</seg>
    <seg id="gn11,6.2">)"T</seg>
    <seg id="gn11,7.1">HF</seg>
    <seg id="gn11,7.2">)FREC</seg>

(slide callouts: start tag, unique identification, Hebrew text, end tag)
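
The segmentation on this slide can be read back with a standard XML parser. The following is a minimal sketch, not part of the original presentation: the `<verse>` wrapper element and the `read_segments` helper are hypothetical, and the interpretation of the id suffix as "word.part" is an assumption drawn from the numbering pattern on the slide.

```python
import xml.etree.ElementTree as ET

# Four segments copied from the slide, wrapped in a hypothetical <verse>
# root element so that the sample parses as one well-formed document.
SEGMENTS = """<verse>
<seg id="gn11,1.1">B.</seg>
<seg id="gn11,1.2">R")IYT</seg>
<seg id="gn11,2.1">B.FRF)</seg>
<seg id="gn11,3.1">)ELOHIYM</seg>
</verse>"""


def read_segments(xml_text):
    """Return (word, part, text) tuples for every <seg> element.

    Assumes each id ends in ',<word>.<part>' as in the slide's examples.
    """
    rows = []
    for seg in ET.fromstring(xml_text).iter("seg"):
        # Split off everything after the last comma, e.g. "1.2".
        _, position = seg.get("id").rsplit(",", 1)
        word, part = position.split(".")
        rows.append((int(word), int(part), seg.text))
    return rows


segments = read_segments(SEGMENTS)
```

Keeping the word and part numbers as integers makes it straightforward to regroup the parts of a single word, which is one reason a structured segment id is useful.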
12
adding annotation (1)
<seg id="gn11,1.1">B.
  <lemma>B.</lemma>
  <particle type="preposition" />
</seg>
<seg id="gn11,1.2">R")IYT
  <lemma>R")IYT</lemma>
  <noun type="common" features="fsa" />
</seg>
<seg id="gn11,2.1">B.FRF)
  <lemma homonym="1">B.R)</lemma>
  <verb stem="q" conjugation="p" pgn="3ms" />
</seg>
<seg id="gn11,3.1">)ELOHIYM
  <lemma>)ELOHIYM</lemma>
  <noun type="common" features="mpa" />
</seg>
<seg id="gn11,4.1">)"T
  <lemma homonym="1">)"T</lemma>
  <particle type="object_marker" />
</seg>

(slide callouts: content tag, milestone tag)
13
adding annotation (2)
<seg id="gn11,5.1">HA
  <lemma>H</lemma>
  <particle type="article" />
</seg>
<seg id="gn11,5.2">.FMAYIM
  <lemma>FMAYIM</lemma>
  <noun type="common" features="mpa" />
</seg>
<seg id="gn11,6.1">W
  <lemma>W</lemma>
  <particle type="conjunction" />
</seg>
<seg id="gn11,6.2">)"T
  <lemma homonym="1">)"T</lemma>
  <particle type="object_marker" />
</seg>
<seg id="gn11,7.1">HF
  <lemma>H</lemma>
  <particle type="article" />
</seg>
<seg id="gn11,7.2">)FREC
  <lemma>)EREC</lemma>
  <noun type="common" features="fsa" />
</seg>

(slide callouts: content tag, milestone tag)
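
To show how the annotated segments might be consumed, here is a small sketch that pulls the surface form, lemma and grammatical tag out of each `<seg>`. It is an illustration only: the `<verse>` wrapper and the `read_annotation` helper are hypothetical, and the code assumes each segment holds exactly one `<lemma>` (the content tag) and one empty grammatical element (the milestone tag), as on the slides.

```python
import xml.etree.ElementTree as ET

# Two annotated segments copied from the slides, inside a hypothetical
# <verse> wrapper so the sample is a single well-formed document.
ANNOTATED = """<verse>
<seg id="gn11,2.1">B.FRF) <lemma homonym="1">B.R)</lemma> <verb stem="q" conjugation="p" pgn="3ms" /></seg>
<seg id="gn11,5.1">HA <lemma>H</lemma> <particle type="article" /></seg>
</verse>"""


def read_annotation(seg):
    """Collect the annotation carried by one <seg> element."""
    lemma = seg.find("lemma")
    # The grammatical tag is the child element that is not <lemma>.
    tag = next(child for child in seg if child.tag != "lemma")
    return {
        "id": seg.get("id"),
        "surface": (seg.text or "").strip(),
        "lemma": lemma.text,
        "homonym": lemma.get("homonym"),  # None when the attribute is absent
        "pos": tag.tag,
        "features": dict(tag.attrib),
    }


records = [read_annotation(seg) for seg in ET.fromstring(ANNOTATED).iter("seg")]
```

Each record is a plain dictionary, so the same data could be filtered, counted or written back out in another format without touching the source markup.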
14
the hard part: linguistic labels
  • must be standard
  • must be applicable to any conceivable language
  • labels are the linguistic inventory
  • must be compatible with current and future
    linguistic theories
  • labels must be linguistic theory-neutral
  • must be redefinable by the user

15
standard solutions: labels
  • Expert Advisory Group on Language Engineering
    Standards (EAGLES)
  • <http://www.ilc.pi.cnr.it/EAGLES/home.html>
  • an initiative of the European Commission (1993)
  • standard grammar labels of morphology and syntax
    for European languages
  • create OSIS standard labels for Hebrew, Aramaic
    and Greek

16
standard solutions: mechanism
  • the Text Encoding Initiative (TEI) guidelines
  • chapter 14: linking, segmentation, alignment
  • chapter 16: feature structures
  • chapter 26: feature system declaration
  • stand-off markup (XLink) or up-close-and-personal
    (inline)?
  • separate meta-data about the text from the text
    itself?
  • either-or or both-and?
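
The stand-off versus inline question can be made concrete with a small sketch. This is an illustration under stated assumptions, not the mechanism proposed in the talk: it uses plain Python dictionaries rather than XLink, and the names `text`, `standoff` and `to_inline` are hypothetical. The segment ids and labels are taken from the Genesis examples on the earlier slides.

```python
# Text layer: segment id -> surface text (values from the earlier slides).
text = {
    "gn11,1.1": "B.",
    "gn11,1.2": 'R")IYT',
}

# Stand-off layer: annotations stored apart from the text, each pointing
# at its segment by id.
standoff = [
    {"target": "gn11,1.1", "label": "particle", "type": "preposition"},
    {"target": "gn11,1.2", "label": "noun", "type": "common", "features": "fsa"},
]


def to_inline(text, standoff):
    """Merge the stand-off annotations into one per-segment structure."""
    merged = {
        seg_id: {"text": surface, "annotations": []}
        for seg_id, surface in text.items()
    }
    for note in standoff:
        target = note["target"]
        if target not in merged:
            raise KeyError(f"annotation points at unknown segment {target!r}")
        # Drop the pointer; inline annotations sit next to their text.
        payload = {key: value for key, value in note.items() if key != "target"}
        merged[target]["annotations"].append(payload)
    return merged


inline = to_inline(text, standoff)
```

Because one form can be derived from the other as long as segment ids are stable, a "both-and" answer is at least technically plausible: the text can be distributed untouched while several competing annotation layers point into it.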

17
formal requirements
  • what we must do, exactly

18
labels
  • claims made about the data itself vs claims about
    the claims that can be made!
  • the linguistic model vs the analysis allowed by
    the model
  • example: does Hebrew have adverbs?
  • a library of labels as comprehensive as possible
  • definitions to clarify what thing is being
    labeled
  • labels are names for grammatical objects

19
labels as objects
  • grammatical objects have "attributes" or
    "features"
  • features can vary over a range of values
  • objects & features have defaults that could be
    changed
  • objects & features could be easily extended
  • objects & features can be arranged linearly or
    hierarchically
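
The idea of labels as objects with features, value ranges and changeable defaults can be sketched in a few lines. This is a hypothetical model for illustration only: the class names `Feature` and `GrammaticalObject` are invented here, the value codes (`gm`, `gf`, `gn` and so on) are borrowed from the feature structure example later in the deck, and the defaults chosen are arbitrary assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Feature:
    """A named feature that ranges over a fixed set of values."""
    name: str
    values: Tuple[str, ...]
    default: Optional[str] = None  # hypothetical default, can be changed

    def check(self, value):
        if value not in self.values:
            raise ValueError(f"{value!r} is not a value of feature {self.name!r}")
        return value


@dataclass
class GrammaticalObject:
    """A label (e.g. 'noun') together with the features it carries."""
    label: str
    features: Dict[str, Feature] = field(default_factory=dict)

    def add_feature(self, feature):
        # Extending an object is just registering another feature.
        self.features[feature.name] = feature

    def describe(self, **chosen):
        """Start from the defaults, then apply and validate the chosen values."""
        result = {name: feat.default for name, feat in self.features.items()}
        for name, value in chosen.items():
            result[name] = self.features[name].check(value)
        return result


noun = GrammaticalObject("noun")
noun.add_feature(Feature("gender", ("gm", "gf", "gn"), default="gm"))
noun.add_feature(Feature("number", ("ns", "np", "nd"), default="ns"))
noun.add_feature(Feature("state", ("sa", "sc"), default="sa"))

analysis = noun.describe(gender="gf")
```

A hierarchical arrangement could be modeled by letting a feature's value be another `GrammaticalObject`; that is left out here to keep the sketch short.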

20
mechanism
  • user language declaration
  • all labels and their relationships
  • done by exclusion, not inclusion
  • sensitive to linguistic theory
  • levels of language resolution of ambiguity
  • lexical, semantic, phonemic, morphologic,
    phrase-, clause-, discourse-, theological levels
  • context-free and context-bound analysis
  • part-of-speech resolution

21
TEI feature structures
  • the feature element
  • the most basic markup
  • requires a label and any number of values
  • <f t="feature name" value="feature value">
  • the feature structure element
  • <fs name="feature structure name">
  • may contain any number of nested <f> and <fs>
  • models some grammatical object
22
TEI feature example
<f name="conjugation">
  <vAlt mutExcl="Y">
    <sym id="pf" value="perfect" />
    <sym id="impf" value="imperfect" />
    <sym id="qppt" value="qal_passive_participle" />
    <sym id="wc" value="wayyiqtol" />
    <sym id="impv" value="imperative" />
    <sym id="inf" value="infinitive" />
    <sym id="pt" value="participle" />
  </vAlt>
</f>
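
A declaration like this can drive validation of an analyst's choices. The sketch below is illustrative only: the `check_choice` helper is hypothetical, only three of the slide's seven symbols are reproduced to keep it short, and it assumes `mutExcl="Y"` means that at most one of the listed values may be selected at a time.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the conjugation feature declared on the slide.
CONJUGATION = """<f name="conjugation">
  <vAlt mutExcl="Y">
    <sym id="pf" value="perfect"/>
    <sym id="impf" value="imperfect"/>
    <sym id="wc" value="wayyiqtol"/>
  </vAlt>
</f>"""


def check_choice(feature_xml, chosen_ids):
    """Raise ValueError if the chosen ids violate the declaration."""
    valt = ET.fromstring(feature_xml).find("vAlt")
    allowed = {sym.get("id") for sym in valt.iter("sym")}
    unknown = set(chosen_ids) - allowed
    if unknown:
        raise ValueError(f"undeclared value(s): {sorted(unknown)}")
    # Assumed reading of mutExcl="Y": the alternatives exclude each other.
    if valt.get("mutExcl") == "Y" and len(set(chosen_ids)) > 1:
        raise ValueError("values are mutually exclusive; choose only one")
    return True


single_ok = check_choice(CONJUGATION, ["pf"])
```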
23
TEI feature structure example
<fs type="common noun features">
  <f name="gender" org="set" fVal="gm gf gn" />
  <f name="number" org="set" fVal="ns np nd" />
  <f name="state" org="set" fVal="sa sc" />
</fs>
24
TEI feature library example
<fvLib id="g" type="gender feature values">
  <vAlt mutExcl="N">
    <sym id="gm" value="masculine" />
    <sym id="gf" value="feminine" />
    <sym id="gn" value="neuter" />
  </vAlt>
</fvLib>
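
The short codes used in a feature structure (`gm`, `gf`, `gn`) only become meaningful once they are looked up in a feature value library. The following sketch shows one way that lookup might work; the `load_library` and `resolve` helpers are hypothetical, and the feature structure is reduced to the gender feature so that it matches the gender library shown here.

```python
import xml.etree.ElementTree as ET

# The gender library from this slide.
FV_LIB = """<fvLib id="g" type="gender feature values">
  <vAlt mutExcl="N">
    <sym id="gm" value="masculine"/>
    <sym id="gf" value="feminine"/>
    <sym id="gn" value="neuter"/>
  </vAlt>
</fvLib>"""

# The gender feature from the common noun feature structure.
FS = """<fs type="common noun features">
  <f name="gender" org="set" fVal="gm gf gn"/>
</fs>"""


def load_library(xml_text):
    """Map each symbol id to its human-readable value."""
    return {sym.get("id"): sym.get("value")
            for sym in ET.fromstring(xml_text).iter("sym")}


def resolve(fs_xml, library):
    """Expand the fVal id lists of every <f> using the library."""
    resolved = {}
    for f in ET.fromstring(fs_xml).iter("f"):
        ids = f.get("fVal").split()
        missing = [i for i in ids if i not in library]
        if missing:
            raise KeyError(f"ids not declared in the library: {missing}")
        resolved[f.get("name")] = [library[i] for i in ids]
    return resolved


library = load_library(FV_LIB)
expanded = resolve(FS, library)
```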
25
a different approach
Dictionary of Packard-Style Greek Morphology Codes
<div type="x-tag" osisID="A_APFC" divTitle="A APFC">
  <p>Part of speech: adjective</p>
  <p>Case: accusative</p>
  <p>Number: plural</p>
  <p>Gender: feminine</p>
  <p>Degree: comparative</p>
</div>
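
A dictionary entry of this kind amounts to a lookup from a compact code to a set of grammatical properties. The sketch below decodes the one code shown on the slide. It is an assumption-laden illustration: the tables contain only the letters attested in this entry, the positional layout (case, number, gender, degree) is inferred from the entry rather than taken from a published code table, and `decode` is a hypothetical helper.

```python
# Lookup tables limited to the letters attested in the slide's example.
PART_OF_SPEECH = {"A": "adjective"}

# Assumed positional layout of the parsing code for an adjective.
PARSE_FIELDS = [
    ("case", {"A": "accusative"}),
    ("number", {"P": "plural"}),
    ("gender", {"F": "feminine"}),
    ("degree", {"C": "comparative"}),
]


def decode(osis_id):
    """Expand an id such as 'A_APFC' into named grammatical properties."""
    pos_code, parse_code = osis_id.split("_", 1)
    result = {"part of speech": PART_OF_SPEECH[pos_code]}
    # Pair each letter of the parsing code with the field at that position.
    for letter, (name, table) in zip(parse_code, PARSE_FIELDS):
        result[name] = table[letter]
    return result


decoded = decode("A_APFC")
```

Compared with feature structures, this approach is simple to implement, but the meaning of each letter depends on its position and on the part of speech, so the code table itself must be documented somewhere outside the markup.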
26
what can we do with feature structure marked up
text?
  • self-organizing topic maps
  • compare linguistic hypotheses with actual usage
  • XSLT transforms
  • automated tagging of new features
  • comparative linguistic study
  • source→target language grammar mapping

27
conclusions
  • where do we go from here?

28
in the short-term
  • complete a first pass of language modeling
  • mark up real biblical text with annotation
  • distribute to translators and scholars for
    feedback
  • does this meet your needs?
  • is it practical enough that you will use it?
  • is it flexible enough for your language(s) and
    linguistic theories?

29
in the long-term
  • determine if TEI feature structures are
    sufficient
  • decide whether to require inline or standoff
    markup, or to allow either
  • determine the best way of integrating linguistic
    markup with the OSIS core tag set
  • explore ideas for authoring software or, at
    least, linguistic annotation utility programs