Sign Language Representation for Machine Translation

Transcript and Presenter's Notes

1
Sign Language Representation for Machine
Translation
  • Sara Morrissey
  • NCLT/CNGL Seminar Series
  • 1st April, 2009

2
Why is there no writing system?
  • Social reasons
  • Variation and demographic spread
  • Political reasons
  • Recognition
  • Linguistic reasons
  • Visual-gestural-spatial languages, simultaneous
    phoneme production

3
Implications of the lack of writing system
  • for Deaf people
  • Forced to use a language that is not their native one
  • for the languages
  • social acceptance and standardisation (Pizzuto,
    2006)
  • for MT
  • Limits availability of domain-specific corpora
  • No standards, difficult to compare systems
  • Significance of results on small datasets
  • Difficult to use NLP tools developed for spoken
    languages

4
Sign Language Representation Formats
  • Linear
  • Stokoe Notation, HamNoSys
  • Multi-level
  • Gloss, Partition/Constitute, Movement-Hold, SiGML
  • Iconic
  • SignWriting

5
Linear Symbolic Notations
Stokoe Notation: "don't know"
HamNoSys Notation: "nineteen"

6
Multi-level Representations
Movement-Hold
Partition/Constitute
SiGML
    <?xml version="1.0" encoding="iso-8859-1"?>
    <!DOCTYPE sigml SYSTEM "http://...">
    <sigml>
    <hamgestural_sign gloss="going to DGS">
    <sign_manual both_hands="true">
    <handconfig handshape="finger2" thumbpos="out"/>
    <handconfig extfidir="uo" palmor="1"/>
Gloss Annotation
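
The SiGML excerpt on this slide is plain XML, so it can be read with a standard XML parser. The sketch below is only an illustration: the closing tags are assumptions added so that the fragment parses, the DOCTYPE line is left out because its URL is elided on the slide, and `read_signs` is a hypothetical helper name, not part of any SiGML toolkit.

```python
import xml.etree.ElementTree as ET

# Assumed completion of the slide's excerpt: closing tags are added here so
# the fragment is well-formed; the DOCTYPE is omitted (its URL is elided).
SIGML = """<?xml version="1.0" encoding="iso-8859-1"?>
<sigml>
  <hamgestural_sign gloss="going to DGS">
    <sign_manual both_hands="true">
      <handconfig handshape="finger2" thumbpos="out"/>
      <handconfig extfidir="uo" palmor="1"/>
    </sign_manual>
  </hamgestural_sign>
</sigml>"""


def read_signs(sigml_text):
    """Return one dict per sign: its gloss plus merged handconfig attributes."""
    # Encode to bytes so the parser honours the declared iso-8859-1 encoding.
    root = ET.fromstring(sigml_text.encode("iso-8859-1"))
    signs = []
    for sign in root.iter("hamgestural_sign"):
        features = {}
        for cfg in sign.iter("handconfig"):
            features.update(cfg.attrib)
        signs.append({"gloss": sign.get("gloss"), "features": features})
    return signs


signs = read_signs(SIGML)
```

Each sign comes back as a gloss plus a flat dictionary of hand-configuration features, which is roughly the "double representation" view: a gloss label alongside the HamNoSys-derived description.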

7
Iconic
SignWriting

8
But different groups, different requirements
  • (Pizzuto et al., 2006)
  • the aspect of a language chosen for its
    representation, is largely dictated by the
    society and culture developing the writing system
    and what purpose and settings such communication
    is required for.
  • Deaf, linguists, language processors

9
Requirements for MT
  • large bilingual domain-specific corpus of good
    quality digital data
  • gold standard reference
  • segmentation algorithms for separating words,
    phrases and sentences
  • alignment methodologies for these units.
  • searching the source and target texts
  • acceptable capturing of the language for output

10
Discussion of current methods
  • Stokoe (Stokoe, 1960)
  • Difficult to capture classifiers and NMFs
  • Decontextualised signs only
  • ASCII version (Mandel, 1993)
  • HamNoSys (Prillwitz, 1989)
  • NMFs included
  • Subsection of 150 symbols for handwriting
    purposes
  • Mac usage, Windows font

11
Discussion of current methods (2)
  • Gloss Annotation (Leeson et al., 2006, Neidle et
    al., 2002)
  • Most commonly used in MT and by linguists
  • No universal conventions
  • Extensible
  • Using one language to describe another
  • Allows for simultaneous timed logging of features
  • Tools widely available
  • SL and linguistic knowledge a requirement
  • No knowledge of supplementary symbolic system
    required
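
The "simultaneous timed logging of features" point can be pictured as several annotation tiers sharing one timeline. The following is a minimal sketch under stated assumptions: the tier names, glosses and timings are invented for illustration, and `Annotation` and `overlapping` are hypothetical names rather than the API of any annotation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    tier: str       # illustrative tier name, e.g. "gloss" or "nmf"
    start_ms: int   # start time in milliseconds
    end_ms: int     # end time in milliseconds
    value: str      # the label logged on that tier


def overlapping(annotations, target):
    """Annotations on other tiers whose time span overlaps the target's."""
    return [
        a for a in annotations
        if a.tier != target.tier
        and a.start_ms < target.end_ms
        and target.start_ms < a.end_ms
    ]


# Invented example data: two glosses and one non-manual feature (NMF).
flight = Annotation("gloss", 0, 500, "FLIGHT")
when = Annotation("gloss", 500, 900, "WHEN")
brows = Annotation("nmf", 500, 900, "raised-brows")
timeline = [flight, when, brows]

co_occurring = overlapping(timeline, when)
```

Here the non-manual feature is recovered as co-occurring with the second gloss only, which is the kind of simultaneity that a single linear string of glosses cannot express on its own.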

12
Discussion of current methods (3)
  • Partition/Constitute (Huenerfauth, 2005)
  • Captures movement, classifier and spatial info
  • Comprehensive, hierarchical representation
  • Implicit use of gloss terms
  • Movement-Hold (Liddell & Johnson, 1989)
  • Numerically-encoded handshapes
  • Multi-layer
  • Used with recognition technology (Vogler &
    Metaxas, 2004)

13
Discussion of current methods (4)
  • SiGML (Elliott et al., 2004)
  • Describes HamNoSys for animation (ViSiCAST)
  • Double representation
  • SignWriting (Sutton, 1995)
  • Compact icons
  • Information displayed in one place
  • Advocated by SL linguists and growing Deaf
  • Not currently machine readable

14
Worked Example
  • Data-driven Machine Translation for Sign
    Languages (Morrissey, 2008)
  • MaTrEx MT system
  • Glossed Annotations of Irish Sign Language (ISL)
    and German Sign Language (DGS)
  • Air Travel Information System (ATIS) corpus of 600
    sentences
  • Translated and signed by native Deaf signers

15
Hand-crafted gloss annotation corpus

16
Translation Directions

17
MaTrEx Experiments
  • ISL gloss-to-English text
  • Baseline
  • SMT
  • EBMT 1
  • EBMT 2
  • Distortion limit

18
ISL-EN MaTrEx Experiments

19
EN-ISL MaTrEx Experiments

20
Other experiments
  • ISL→DE, DGS→DE, DGS→EN
  • ISL→EN best scores, by 6.38 BLEU
  • EBMT 1 chunks improve ISL→DE only
  • EBMT 2 chunks improve ISL→DE only
  • DE→ISL, DE→DGS, EN→DGS
  • EN→DGS best scores, by 1.3 BLEU
  • EBMT 1 chunks improve EN→DGS & EN→ISL
  • EBMT 2 chunks improve all
  • Comparison with RWTH system
  • We're better! 2-6 BLEU
  • ISL video recognition
  • Speech output
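
For readers unfamiliar with the BLEU figures quoted on this slide, here is a simplified sketch of the metric: modified n-gram precision up to 4-grams against a single reference, combined by geometric mean and multiplied by a brevity penalty. It is not the scoring setup used in the experiments (evaluations normally use corpus-level BLEU from an established tool), and the example sentences are invented.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU in [0, 1].

    Returns 0.0 if any n-gram order has no match (no smoothing is applied).
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precision += math.log(clipped / total) / max_n
    # Brevity penalty: penalise candidates shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_precision)


# Invented example: one wrong word in a five-word sentence.
score = bleu("the flight leaves at nine", "the flight leaves at ten")
```

Scores are usually reported multiplied by 100, so a difference of "6.38 BLEU" corresponds to 0.0638 on this 0-1 scale.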

21
ISL Animation
  • Poser software
  • Hand-crafted 66 videos, 50 sentences
  • Played in sequence
  • 4 Deaf evaluators
  • 2 x 4-point scale
  • 82% - intelligibility
  • 72% - fidelity
  • Questionnaire

Demo

22
Thesis Conclusions
  • Good results can be obtained
  • Glossing was the most appropriate representation so far,
    but not going forward
  • Allowed linguistic-based alignment
  • Linear, easily accessible format
  • Lack of NMF detail, time-consuming, not
    considered adequate representation of language
  • EBMT chunks show potential but require more
    development
  • Development of animation module

23
Where do we go from here? (the words are coming
out all weird)
  • What is the most appropriate SL representation
    for MT?
  • Adequately represents the language,
  • Animation production,
  • Facilitates the translation process.

24
Representation overview, redux
  • Glossing: machine readable, doesn't adequately
    represent the language or facilitate animation
  • Stokoe: ASCII version, not an adequate representation
  • Partition/Constitute: multi-layered, uses glosses
  • Movement-Hold: multi-layered, uses glosses
  • SignWriting: compact icons, accepted, potential
    readability, not machine readable at present
  • HamNoSys & SiGML: machine readable, comprehensive
    description, adapted for animation, suited to SMT

25
The Future
  • Explore HamNoSys in practice
  • MT in medical domain, Health Ireland Partner GP
    work group questionnaire
  • Human Factors
  • Minority Language MT

26
Thank you for listening
Yep, it's the end!
I hope it wasn't too long
Any questions?