Title: Sign Language Representation for Machine Translation
1Sign Language Representation for Machine Translation
- Sara Morrissey
- NCLT/CNGL Seminar Series
- 1st April, 2009
2Why is there no writing system?
- Social reasons
- Variation and demographic spread
- Political reasons
- Recognition
- Linguistic reasons
- Visual-gestural-spatial languages, simultaneous phoneme production
3Implications of the lack of a writing system
- for Deaf people
- Forced to use a language that is not their native one
- for the languages
- social acceptance, standardisation (Pizzuto et al., 2006)
- for MT
- Limits availability of domain-specific corpora
- No standards, difficult to compare systems
- Significance of results on small datasets
- Difficult to use NLP tools developed for spoken languages
4Sign Language Representation Formats
- Linear
- Stokoe Notation, HamNoSys
- Multi-level
- Gloss, Partition/Constitute, Movement-Hold, SiGML
- Iconic
- SignWriting
5Linear Symbolic Notations
Stokoe Notation example: "don't know"
HamNoSys Notation example: "nineteen"
6Multi-level Representations
Movement-Hold
Partition/Constitute
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE sigml SYSTEM "http://...">
<sigml>
<hamgestural_sign gloss="going to DGS">
<sign_manual both_hands="true">
<handconfig handshape="finger2" thumbpos="out"/>
<handconfig extfidir="uo" palmor="1"/>
SiGML
Gloss Annotation
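Because SiGML is XML, the features of a sign can be read with standard tooling. The following is a minimal sketch, not part of the original talk: it assumes the element and attribute names as reconstructed from the fragment on this slide, and it adds closing tags (which the slide's fragment does not show) so that the example is well-formed. The function name `sign_features` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: closing tags are added here for well-formedness;
# the fragment shown on the slide is truncated before them.
SIGML = """<sigml>
  <hamgestural_sign gloss="going to DGS">
    <sign_manual both_hands="true">
      <handconfig handshape="finger2" thumbpos="out"/>
      <handconfig extfidir="uo" palmor="1"/>
    </sign_manual>
  </hamgestural_sign>
</sigml>"""


def sign_features(sigml_text):
    """Map each sign's gloss to the merged attributes of its handconfig elements."""
    root = ET.fromstring(sigml_text)
    signs = {}
    for sign in root.iter("hamgestural_sign"):
        features = {}
        for cfg in sign.iter("handconfig"):
            features.update(cfg.attrib)
        signs[sign.get("gloss")] = features
    return signs


features = sign_features(SIGML)
print(features)
```

Flattening the attributes into one dictionary per sign is only one possible view; a real pipeline would probably keep the element order, since SiGML encodes sequences of movements.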
7Iconic
SignWriting
8But different groups, different requirements
- (Pizzuto et al., 2006)
- the aspect of a language chosen for its representation is largely dictated by the society and culture developing the writing system and what purpose and settings such communication is required for.
- Deaf, linguists, language processors
9Requirements for MT
- large bilingual domain-specific corpus of good quality digital data
- gold standard reference
- segmentation algorithms for separating words, phrases and sentences
- alignment methodologies for these units
- searching the source and target texts
- acceptable capturing of the language for output
10Discussion of current methods
- Stokoe (Stokoe, 1960)
- Difficult to capture classifiers and non-manual features (NMFs)
- Decontextualised signs only
- ASCII version (Mandel, 1993)
- HamNoSys (Prillwitz, 1989)
- NMFs included
- Subset of 150 symbols for handwriting purposes
- Mac usage, Windows font
11Discussion of current methods (2)
- Gloss Annotation (Leeson et al., 2006; Neidle et al., 2002)
- Most commonly used in MT and by linguists
- No universal conventions
- Extensible
- Using one language to describe another
- Allows for simultaneous timed logging of features
- Tools widely available
- SL and linguistic knowledge a requirement
- No knowledge of supplementary symbolic system required
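The "simultaneous timed logging of features" point can be illustrated with a small sketch, assuming a multi-tier annotation in which every entry carries a tier name and start and end times. The tier names, glosses and timings below are invented for illustration and are not taken from the corpus; `Annotation`, `linearise` and `overlapping` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    tier: str       # hypothetical tier names, e.g. "gloss" or "nmf"
    start_ms: int
    end_ms: int
    value: str


def linearise(annotations, tier="gloss"):
    """Return one tier's values in time order as a space-separated token string."""
    selected = sorted(
        (a for a in annotations if a.tier == tier),
        key=lambda a: a.start_ms,
    )
    return " ".join(a.value for a in selected)


def overlapping(annotations, target, tier):
    """Return the values on `tier` whose time span overlaps that of `target`."""
    return [
        a.value
        for a in annotations
        if a.tier == tier
        and a.start_ms < target.end_ms
        and a.end_ms > target.start_ms
    ]


# Invented example data, not from the ISL or DGS corpus.
flight = Annotation("gloss", 400, 900, "FLIGHT")
annotations = [
    flight,
    Annotation("gloss", 0, 350, "WHEN"),
    Annotation("nmf", 0, 900, "brow-furrow"),
    Annotation("gloss", 950, 1400, "ARRIVE"),
]

gloss_line = linearise(annotations)
nmfs = overlapping(annotations, flight, "nmf")
print(gloss_line, nmfs)
```

Linearising only the gloss tier gives a string that spoken-language MT tools can consume, but the non-manual information on the other tiers is lost at that point unless it is merged back in.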
12Discussion of current methods (3)
- Partition/Constitute (Huenerfauth, 2005)
- Captures movement, classifier and spatial info
- Comprehensive, hierarchical representation
- Implicit use of gloss terms
- Movement-Hold (Liddell & Johnson, 1989)
- Numerically-encoded handshapes
- Multi-layer
- Used with recognition technology (Vogler & Metaxas, 2004)
13Discussion of current methods (4)
- SiGML (Elliott et al., 2004)
- Describes HamNoSys for animation (ViSiCAST)
- Double representation
- SignWriting (Sutton, 1995)
- Compact icons
- Information displayed in one place
- Advocated by SL linguists and growing Deaf
- Not currently machine readable
14Worked Example
- Data-driven Machine Translation for Sign Languages (Morrissey, 2008)
- MaTrEx MT system
- Glossed Annotations of Irish Sign Language (ISL) and German Sign Language (DGS)
- Air Traffic Information System corpus of 600 sentences
- Translated and signed by native Deaf signers
15Hand-crafted gloss annotation corpus
16Translation Directions
17MaTrEx Experiments
- ISL gloss-to-English text
- Baseline
- SMT
- EBMT 1
- EBMT 2
- Distortion limit
18ISL-EN MaTrEx Experiments
19EN-ISL MaTrEx Experiments
20Other experiments
- ISL→DE, DGS→DE, DGS→EN
- ISL→EN best scores, by 6.38 BLEU
- EBMT 1 chunks improve for ISL→DE only
- EBMT 2 chunks improve for ISL→DE only
- DE→ISL, DE→DGS, EN→DGS
- EN→DGS best scores, by 1.3 BLEU
- EBMT 1 chunks improve for EN→DGS and EN→ISL
- EBMT 2 chunks improve for all
- Comparison with RWTH system
- We're better! By 2-6 BLEU
- ISL video recognition
- Speech output
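The score differences on this slide are given in BLEU points. As a reminder of what the metric measures, here is a minimal sketch of corpus-level BLEU with one reference per sentence, clipped n-gram counts up to 4-grams and a brevity penalty. It is an illustration only: the talk does not say which BLEU implementation or settings were used, and the example sentences are invented.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipping: an n-gram is credited at most as often as it
            # occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with no match gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Invented example sentences.
score = corpus_bleu(["the flight arrives at ten"], ["the flight arrives at nine"])
print(round(score, 2))
```

On a corpus of a few hundred sentences, a single changed word can move the score noticeably, which is why the significance of results on small datasets was raised earlier in the talk.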
21ISL Animation
- Poser software
- Hand-crafted 66 videos, 50 sentences
- Played in sequence
- 4 Deaf evaluators
- 2 x 4-point scale
- 82% - intelligibility
- 72% - fidelity
- Questionnaire
Demo
22Thesis Conclusions
- Good results can be obtained
- Glossing most appropriate, but not going forward
- Allowed linguistic-based alignment
- Linear, easily accessible format
- Lack of NMF detail, time-consuming, not considered adequate representation of language
- EBMT chunks show potential but require more development
- Development of animation module
23Where do we go from here? (the words are coming out all weird)
- What is the most appropriate SL representation for MT?
- Adequately represents the language,
- Animation production,
- Facilitates the translation process.
24Representation overview, redux
- Glossing: machine readable, doesn't adequately represent the language or facilitate animation
- Stokoe: ASCII version, not an adequate representation
- Partition/Constitute: multi-layered, uses glosses
- Movement-Hold: multi-layered, uses glosses
- SignWriting: compact icons, accepted, potential readability, not machine readable at present
- HamNoSys and SiGML: machine readable, comprehensive description, adapted for animation, suited to SMT
25The Future
- Explore HamNoSys in practice
- MT in medical domain, Health Ireland Partner GP work group questionnaire
- Human Factors
- Minority Language MT
26Thank you for listening
Yep, it's the end!
I hope it wasn't too long
Any questions?