1
Treebanks: Layering the Annotation
  • Jan Hajic
  • Institute of Formal and Applied Linguistics
  • School of Computer Science
  • Faculty of Mathematics and Physics
  • Charles University, Prague
  • Czech Republic

2
Layering the PDT 2.0
  • 4 (5) stand-off layers
  • Deep structure (t)
  • Syntax and semantics
  • Dependency and non-dependency links
  • Surface structure (a)
  • Dependency, function
  • Morphology (m)
  • Lemma, tag (detailed)
  • Word (token) (w)
  • Audio/auto transcript (z)
  • PML schema (XML-based)
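The layering above can be pictured as a small data structure. This is a minimal sketch under stated assumptions, not the PML schema: the class and field names (`Word`, `MToken`, `ANode`, `TNode`, `w_refs`, `m_ref`, `a_refs`) are hypothetical, the z-layer is left out, and the labels Sb, Pred, ACT and PRED are PDT-style values used only for illustration. Each layer keeps its own units keyed by ID and reaches the layer below only through ID references, which is what makes the annotation stand-off.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical, simplified stand-off layers (NOT the real PML schema).
# Each layer stores its own units keyed by ID and refers to the layer
# below only by ID, never by copying that layer's content.

@dataclass
class Word:                # w-layer: a token as it appears in the text
    id: str
    form: str

@dataclass
class MToken:              # m-layer: lemma and tag over one or more words
    id: str
    w_refs: list[str]
    lemma: str
    tag: str               # placeholder, not a real PDT positional tag

@dataclass
class ANode:               # a-layer: surface dependency plus function
    id: str
    m_ref: str
    parent: str | None     # ID of the governing a-node (None = root)
    afun: str

@dataclass
class TNode:               # t-layer: deep structure
    id: str
    a_refs: list[str]      # may point to several a-nodes
    parent: str | None
    functor: str

# "Jan spí" ("Jan sleeps"): one unit per word on each layer.
w_layer = {"w1": Word("w1", "Jan"), "w2": Word("w2", "spí")}
m_layer = {"m1": MToken("m1", ["w1"], "Jan", "N"),
           "m2": MToken("m2", ["w2"], "spát", "V")}
a_layer = {"a1": ANode("a1", "m1", "a2", "Sb"),
           "a2": ANode("a2", "m2", None, "Pred")}
t_layer = {"t1": TNode("t1", ["a1"], "t2", "ACT"),
           "t2": TNode("t2", ["a2"], None, "PRED")}

def surface_forms(t_id: str) -> list[str]:
    """Follow ID references from a t-node down through the a, m and w layers."""
    forms: list[str] = []
    for a_id in t_layer[t_id].a_refs:
        m_tok = m_layer[a_layer[a_id].m_ref]
        forms.extend(w_layer[w_id].form for w_id in m_tok.w_refs)
    return forms
```

Because only the w-layer in this sketch stores text, revising a higher layer never requires rewriting the ones below it.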

3
The Links
  • Within t-layer
  • Co-reference links
  • Pronoun to antecedent (future: full coreference
    chains)
  • Complement to 2nd governor, etc.
  • Lexicon links
  • Verbs, nouns, adjectives, adverbs to dictionary
    entry
  • Word sense disambiguated, valency/frame-based
  • t-layer to a-layer
  • Which a-node the t-node comes from
  • No restrictions (crossing, many-to-many, …)
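The three kinds of links can be kept in one table of `(source, target, type)` triples, indexed in both directions, so that the representation itself imposes no restriction on crossing or many-to-many mappings. A minimal sketch: the node IDs, the link-type names (`coref`, `lex`, `t2a`) and the lexicon entry ID `lex:sleep-1` are hypothetical. The example sentence is "John said he will sleep", where "will sleep" is a single t-node over two a-nodes.

```python
from collections import defaultdict

# Hypothetical link table for "John said he will sleep":
#   t-nodes: t1 John, t2 say, t3 he, t4 sleep
#   a-nodes: a1 John, a2 said, a3 he, a4 will, a5 sleep
LINKS = [
    ("t3", "t1", "coref"),         # within t-layer: pronoun -> antecedent
    ("t4", "lex:sleep-1", "lex"),  # lexicon link to a hypothetical entry ID
    ("t1", "a1", "t2a"),           # t-layer -> a-layer
    ("t2", "a2", "t2a"),
    ("t3", "a3", "t2a"),
    ("t4", "a4", "t2a"),           # one t-node, two a-nodes:
    ("t4", "a5", "t2a"),           # auxiliary + lexical verb
]

def index_links(links):
    """Index links by (source, type) and by (target, type).

    Both sides hold lists, so a node may have any number of outgoing
    and incoming links of the same type (many-to-many).
    """
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for src, tgt, kind in links:
        outgoing[(src, kind)].append(tgt)
        incoming[(tgt, kind)].append(src)
    return outgoing, incoming

outgoing, incoming = index_links(LINKS)
```

Here `outgoing[("t4", "t2a")]` yields both a-nodes of "will sleep", and `incoming[("t1", "coref")]` finds the pronoun pointing at "John".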

4
The Questions I
  • Did choices made in the underlying annotation
    influence upper-layer choices?
  • Minimal or none
  • thanks to the stand-off annotation style and the
    many-to-many references/links it allows (XML IDs)
  • Added annotation (over surface syntax)
  • Node order (information structure), deep
    dependencies, 30 node labels (time, modalities,
    semantic POS, number, pronoun classes, …),
    co-reference, valency dictionary (analogous to frame
    files) links (word sense annotation), empty
    nodes (arguments), …
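A minimal sketch of why stand-off references let an upper layer add labels and many-to-many links without editing the layer below, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`anode`, `tnode`, `refs`, `form`, `tense`) are hypothetical and only loosely echo PDT labels; real PML files are organised differently, and the word form is stored on the a-node here purely for brevity.

```python
import xml.etree.ElementTree as ET

# Lower layer: written once, not edited when the upper layer is added.
A_LAYER = """<alayer>
  <anode id="a1" form="John" afun="Sb"/>
  <anode id="a2" form="will" afun="AuxV"/>
  <anode id="a3" form="sleep" afun="Pred"/>
</alayer>"""

# Upper layer in a separate document: new labels (functor, tense) plus
# space-separated ID references; t2 covers two a-nodes.
T_LAYER = """<tlayer>
  <tnode id="t1" functor="ACT" refs="a1"/>
  <tnode id="t2" functor="PRED" tense="post" refs="a2 a3"/>
</tlayer>"""

def resolve(a_xml, t_xml):
    """Map each t-node ID to the forms of the a-nodes it references."""
    a_by_id = {n.get("id"): n for n in ET.fromstring(a_xml).iter("anode")}
    return {
        t.get("id"): [a_by_id[ref].get("form") for ref in t.get("refs").split()]
        for t in ET.fromstring(t_xml).iter("tnode")
    }

print(resolve(A_LAYER, T_LAYER))
# {'t1': ['John'], 't2': ['will', 'sleep']}
```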

5
The Questions II
  • Hard to circumvent syntactic choices?
  • Not really (again, thanks to XML stand-off)
  • Only one label at the surface-syntactic level
    (function)
  • Dependency-only structure is no problem (no need to
    refer to phrases; all are represented by subtrees)
  • But there will be a problem with the t-layer
  • When referring to it from some higher (logical)
    layer
  • We will (probably) need to refer to labels (attributes)
  • Solution
  • Add IDs to attributes (should be easy; in fact, just
    an XML ID)
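The proposed solution can be sketched by promoting each label to a child element with its own ID, so that a higher layer can point at one label rather than at a whole node. A minimal sketch with Python's `xml.etree.ElementTree`; the `attr` element, the ID scheme `t2-tense` and the `logic`/`claim` layer are hypothetical illustrations, not part of PDT 2.0.

```python
import xml.etree.ElementTree as ET

# t-layer node whose labels each carry their own ID (hypothetical encoding).
T_LAYER = """<tlayer>
  <tnode id="t2">
    <attr id="t2-functor" name="functor">PRED</attr>
    <attr id="t2-tense" name="tense">post</attr>
  </tnode>
</tlayer>"""

# Hypothetical higher (logical) layer referring to a single label by ID.
LOGIC_LAYER = """<logic>
  <claim id="l1" about="t2-tense"/>
</logic>"""

def labels_by_id(xml_text):
    """Index every labelled attribute as ID -> (name, value)."""
    return {a.get("id"): (a.get("name"), a.text)
            for a in ET.fromstring(xml_text).iter("attr")}

def resolve_claims(t_xml, logic_xml):
    """Resolve each claim's reference to the label it is about."""
    labels = labels_by_id(t_xml)
    return {c.get("id"): labels[c.get("about")]
            for c in ET.fromstring(logic_xml).iter("claim")}
```

The trade-off in this sketch is that labels stop being plain XML attributes; a composite reference (node ID plus attribute name) would avoid that, but it would not be an XML ID and could not be checked as one.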

6
The Questions III
  • Desirable characteristics for adding layers
  • Stand-off annotation
  • Proper IDs for within- and between-layer references
  • Assigned in advance if possible, but they can usually be
    added later
  • Quality Control
  • !! Easier with layers: cross-layer constraints
  • Invisible to annotators → catch random errors
  • Links (between-layer type) can be pre-annotated
  • Impact of phrase structure (PS) vs. dependency on additional annotation
  • Not observed
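Cross-layer constraints of the kind listed under quality control can run as automatic checks that annotators never see. A minimal sketch with two example constraints, both chosen here for illustration: every t-to-a reference must resolve to an existing a-node, and every a-node must be referenced by at least one t-node. The function name `check_cross_layer` and its input shapes are hypothetical.

```python
def check_cross_layer(a_nodes, t_refs):
    """Return readable violations of two example cross-layer constraints.

    a_nodes: set of a-node IDs
    t_refs:  dict mapping t-node ID -> list of referenced a-node IDs
    """
    errors = []
    covered = set()
    # Constraint 1: every reference from the t-layer must resolve.
    for t_id, refs in sorted(t_refs.items()):
        for a_id in refs:
            if a_id not in a_nodes:
                errors.append(f"{t_id}: dangling reference to {a_id}")
            else:
                covered.add(a_id)
    # Constraint 2: no a-node may be left without a t-layer counterpart.
    for a_id in sorted(a_nodes - covered):
        errors.append(f"{a_id}: not referenced by any t-node")
    return errors

print(check_cross_layer({"a1", "a2", "a3"}, {"t1": ["a1"], "t2": ["a3", "a4"]}))
# ['t2: dangling reference to a4', 'a2: not referenced by any t-node']
```

A mistyped ID and a forgotten token both surface without any change to the annotators' tools.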