Transcribing and annotating spoken language with EXMARaLDA

1
Transcribing and annotating spoken language with
EXMARaLDA
LREC-Workshop on XML-based richly annotated
corpora, Lisbon, 29 May 2004
  • Thomas Schmidt
  • Sonderforschungsbereich 538 Mehrsprachigkeit
  • University of Hamburg

2
  • Richly annotated corpora?
  • Richly annotable corpora?
  • Corpus creation
  • Exchangeability
  • Framework for things to be annotated?
  • → Framework for annotations

[Diagram labels: CHAT Corpus, HIAT-DOS Corpus, WordBase Corpus, Verbmobil Corpus, syncWriter Corpus; Transcription framework; Annotation framework]
3
Partitur Transcriptions
4
Partitur Transcriptions
  • Structural relations
  • Temporal sequence

5
Partitur Transcriptions
  • Structural relations
  • Temporal sequence
  • Simultaneity

6
Partitur Transcriptions
  • Structural relations
  • Temporal sequence
  • Simultaneity
  • Equivalence (Flat annotation)

7
Single timeline, multiple tiers
8
Single timeline, multiple tiers
9
Single timeline, multiple tiers
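The structural relations listed on the preceding slides (temporal sequence, simultaneity, equivalence) can be illustrated with a small sketch of a "single timeline, multiple tiers" model. This is only an illustration under stated assumptions, not the EXMARaLDA implementation: the class names, the integer timeline indices and the category labels "v" and "en" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    start: int  # index of the timeline point where the event begins
    end: int    # index of the timeline point where the event ends
    text: str


@dataclass
class Tier:
    speaker: str
    category: str  # hypothetical labels, e.g. "v" (verbal), "en" (translation)
    events: list = field(default_factory=list)


@dataclass
class Transcription:
    timeline: list  # ordered timeline points shared by all tiers
    tiers: list = field(default_factory=list)


def precedes(a: Event, b: Event) -> bool:
    """Temporal sequence: a ends before (or exactly when) b starts."""
    return a.end <= b.start


def simultaneous(a: Event, b: Event) -> bool:
    """Simultaneity: the two timeline intervals intersect."""
    return a.start < b.end and b.start < a.end


def equivalent(a: Event, b: Event) -> bool:
    """Equivalence (flat annotation): identical start and end points."""
    return (a.start, a.end) == (b.start, b.end)


# Speaker A talks from T0 to T2, speaker B overlaps from T1 to T3,
# and a translation tier annotates A's utterance over the same span.
utt_a = Event(0, 2, "Okay, so")
utt_b = Event(1, 3, "Yes.")
trans_a = Event(0, 2, "Also gut")
later_a = Event(3, 4, "Right.")

doc = Transcription(
    timeline=["T0", "T1", "T2", "T3", "T4"],
    tiers=[
        Tier("A", "v", [utt_a, later_a]),
        Tier("B", "v", [utt_b]),
        Tier("A", "en", [trans_a]),
    ],
)
```

Because every tier refers to the same ordered list of points, all three relations reduce to comparisons of indices.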
10
EXMARaLDA Partitur-Editor
Graphical User Interface
11
EXMARaLDA Partitur-Editor
Manipulating tiers, the timeline and events
12
EXMARaLDA Partitur-Editor
Visualization as a wrapped partitur
... as a line transcript
... in column notation
13
TASX-Annotator
14
PRAAT
15
ELAN
16
Variants of single timeline, multiple tiers
                    | EXMARaLDA                      | TASX                       | Praat                 | ELAN
Tier classification | Types, Categories and speakers | Tier names                 | Tier names            | Stereotypes, Linguistic Types and Participants
Timeline            | Relative and/or absolute       | Absolute                   | Absolute              | Relative and/or absolute
Overlap within tier | No                             | Yes                        | No                    | Yes (Bulldozer mode)
Link to media       | Optional (Audio only)          | Required (Video and Audio) | Required (Audio only) | Optional (Video and Audio)
Extensions          | Segmented Transcription        | TASX Level 2               | None                  | Symbolic subdivisions, symbolic associations
17
Beyond the single timeline
18
Beyond the single timeline
  • Simple annotation: Part of speech tagging
      • each word a single entity
      • add suitable points to the timeline
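One way to picture "adding suitable points to the timeline" is the sketch below. It assumes a relative timeline stored as an ordered list of point ids and an event given as a (start, end, text) tuple. The function name, the id scheme ("T0.1", "T0.2") and the POS tags are hypothetical, not taken from EXMARaLDA.

```python
def split_event_into_words(timeline, event):
    """Insert new timeline points so that each word becomes its own event.

    timeline: ordered list of point ids
    event:    (start_id, end_id, text)
    Returns (new_timeline, word_events).
    """
    start, end, text = event
    i, j = timeline.index(start), timeline.index(end)
    if j != i + 1:
        # Points from other tiers lie inside the event; the position of the
        # words relative to those points cannot be decided automatically.
        raise ValueError("event must span two adjacent timeline points")

    words = text.split()
    # One new point between each pair of neighbouring words.
    new_points = [f"{start}.{k}" for k in range(1, len(words))]
    new_timeline = timeline[:j] + new_points + timeline[j:]

    bounds = [start] + new_points + [end]
    word_events = [(bounds[k], bounds[k + 1], w) for k, w in enumerate(words)]
    return new_timeline, word_events


timeline, words = split_event_into_words(["T0", "T1"], ("T0", "T1", "I see you"))

# A POS tier is then a flat annotation: same start and end as each word.
pos_tier = [(s, e, tag) for (s, e, _), tag in zip(words, ["PRON", "VERB", "PRON"])]
```

The `ValueError` branch marks the case the single timeline cannot handle on its own: when another tier already has points inside the event, the order of words relative to those points is unknown.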

19
Beyond the single timeline
Determine order of words (syllables, phonemes, ...) in overlaps
or
Allow bifurcations of the timeline
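The second option can be sketched by treating the timeline as a directed graph rather than a list. The example below is an assumption-laden illustration (the point ids and the adjacency-dict representation are hypothetical): the timeline forks where an overlap starts and joins where it ends, so word boundaries of the two speakers are each ordered relative to the shared points but not relative to each other.

```python
def reachable(edges, src, dst):
    """Depth-first search: can dst be reached from src along timeline edges?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, []))
    return False


def comparable(edges, a, b):
    """Two points are temporally ordered only if one is reachable from the other."""
    return reachable(edges, a, b) or reachable(edges, b, a)


# T1 is the start of an overlap, T2 its end. A1 is a word boundary inside
# speaker A's contribution, B1 one inside speaker B's contribution.
edges = {
    "T0": ["T1"],
    "T1": ["A1", "B1"],  # bifurcation
    "A1": ["T2"],
    "B1": ["T2"],        # branches join again
    "T2": ["T3"],
}
```

With this representation nobody has to decide whether A's word boundary came before or after B's; the question is simply left open.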
20
Segmentation
  • EXMARaLDA Basic Transcription
      • Single timeline, multiple tiers
      • Intuitive transcription of verbal and non-verbal behaviour
      • Visualization
      • Exchange with TASX, PRAAT and ELAN
      • Simple (utterance level) annotation, e.g.
          • Utterance translation
          • Prosody (Dynamic Modulation etc.)

Finite State Machine (HIAT, GAT, DIDA, CHAT, ...)
  • EXMARaLDA Segmented Transcription
      • Bifurcated timeline, multiple tiers
      • Advanced (word, syllable, phoneme level) annotation, e.g.
          • POS-Tagging
          • Morphological transliteration
          • Intonation contour
          • Tone
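The role of the finite state machine in the step from basic to segmented transcription can be sketched as follows. The conventions used here are deliberately simplified assumptions and do not reproduce the actual rules of HIAT, GAT, DIDA or CHAT: whitespace separates words, the characters . ? ! , count as punctuation, and text inside double parentheses is treated as a non-verbal description.

```python
def segment(text):
    """Split transcribed text into (type, content) segments with a small FSM.

    States: BETWEEN (outside any unit), WORD, COMMENT (inside "(( ... ))").
    """
    segments = []
    state, buf = "BETWEEN", ""
    i = 0
    while i < len(text):
        ch = text[i]
        if state == "COMMENT":
            if text.startswith("))", i):
                segments.append(("non-verbal", buf))
                buf, state = "", "BETWEEN"
                i += 2
                continue
            buf += ch
        elif text.startswith("((", i):
            if state == "WORD":
                segments.append(("word", buf))
            buf, state = "", "COMMENT"
            i += 2
            continue
        elif ch.isspace():
            if state == "WORD":
                segments.append(("word", buf))
                buf = ""
            state = "BETWEEN"
        elif ch in ".?!,":
            if state == "WORD":
                segments.append(("word", buf))
                buf = ""
            segments.append(("punctuation", ch))
            state = "BETWEEN"
        else:
            buf += ch
            state = "WORD"
        i += 1

    if state == "WORD":
        segments.append(("word", buf))
    elif state == "COMMENT":
        raise ValueError("unclosed (( ... )) description")
    return segments


result = segment("Well, I ((coughs)) see.")
```

Supporting a different transcription system would then mean swapping the transition rules while keeping the same basic and segmented data models.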

21
Meta Data
EXMARaLDA Corpus Manager (CoMa): Annotation of speakers and whole interactions
22
Summary
  • EXMARaLDA Transcription Framework
  • Single timeline, multiple tiers data model
      • Common basis for different existing transcription systems
      • Intuitive, efficient data model suitable for:
          • User-friendly input
          • Flexible visualization
          • Simple flat annotations
          • Exchange with other tools
  • Extended data model: Segmented transcription
      • Automatically generated from Basic transcription
      • More advanced flat annotations
  • Meta data annotation

23
Open questions 1
  • Limitations
  • Hierarchical annotation (e.g. phrase structure)?
  • Discontinuous constituents (e.g. German particle verbs)?
  • Cross-level (cross-tier) annotation?
  • Visualization?

[Diagram: Exchange between EXMARaLDA Basic Transcription, TASX Level 1, PRAAT, ELAN Abstract Corpus Model, EXMARaLDA Segmented Transcription, TASX Level 2 and Annotation graphs; several exchange paths are marked "?"]
24
Open questions 2
  • Hierarchy-based data models
      • XML: standardized storage
      • DTDs/Schemas: validity check
      • XSLT: transformation
      • XPath / XQuery: query
      • DOM / NOM: in-memory representation
  • Time-based data models
      • XML: standardized storage
      • How to check validity?
      • How to transform?
      • How to query?
      • AGLIB?
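The validity question for time-based models can be made concrete with a small sketch. The XML below is a hypothetical, simplified layout invented for this example (it is not the EXMARaLDA, TASX or ELAN format). A DTD can say which elements and attributes are allowed, but a constraint such as "an event's start point must precede its end point on the timeline" needs extra code. The sketch uses Python's standard `xml.etree.ElementTree` module, whose `findall` supports a limited XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one shared timeline, two tiers, one faulty event.
DOC = """
<transcription>
  <timeline>
    <point id="T0"/><point id="T1"/><point id="T2"/>
  </timeline>
  <tier speaker="A" category="v">
    <event start="T0" end="T2">Okay, so</event>
  </tier>
  <tier speaker="B" category="v">
    <event start="T1" end="T2">Yes.</event>
    <event start="T2" end="T1">(bad)</event>
  </tier>
</transcription>
"""


def check_events(root):
    """Return a list of (speaker, start, end, message) for invalid events."""
    # Position of each point in document order defines the temporal order.
    order = {p.get("id"): i for i, p in enumerate(root.findall("./timeline/point"))}
    problems = []
    for tier in root.findall("./tier"):
        for ev in tier.findall("./event"):
            s, e = ev.get("start"), ev.get("end")
            if s not in order or e not in order:
                problems.append((tier.get("speaker"), s, e, "unknown timeline point"))
            elif order[s] >= order[e]:
                problems.append((tier.get("speaker"), s, e, "start does not precede end"))
    return problems


root = ET.fromstring(DOC)

# Hierarchical query: all events on speaker B's tier.
speaker_b = [ev.text for ev in root.findall("./tier[@speaker='B']/event")]

problems = check_events(root)
```

The query by speaker follows the element hierarchy and is easy to express as a path, while the temporal check has to resolve references into the timeline first. This is one way to read the contrast between the two kinds of data model on this slide.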

First step: Understand differences and commonalities between existing time-based data models
Second step: Harmonize existing time-based models