Title: Parsing and Analysis for Speech-to-Speech MT
1 Parsing and Analysis for Speech-to-Speech MT
- Discussion Leader: Alon Lavie
- Carnegie Mellon University
2 Main Issues
- Target Representation
- Robustness Issues
- Speech disfluencies and spontaneity
- Dealing with output from Speech Recognizers
- Portability and Adaptation Issues
- Domain portability
- Language portability
- Language Diversity Issues
3 What is the Target Representation?
- Different MT approaches require different
analysis representations
- Interlingua: deep and detailed vs. shallow
- Transfer approaches: syntactic representation
- Some MT approaches don't require real parsing
at all
- Statistical MT
- Example-based MT
4 Robustness Issues
- Speech disfluencies and spontaneity
- Spoken language is very different from written
text: short turns, back-channels, looser notion
of grammar, fixed expressions/constructions,
elided words and constituents
- ("wanna go eat?" -> "Do you want to go to eat?")
- Disfluencies are common
- Filled pauses: uh, um, you know
- False starts and repairs: "I got to I think I
have to be there at"
- Dealing with output of Speech Recognizers
- High word error rates
- Lack of punctuation and sentence boundaries
5 Robustness Issues
- How do we build parsers/analyzers that can deal
with such language?
- Focus on semantics rather than syntax
- Focus on language properties of the domain (fixed
expressions, task-oriented language)
- Clean up the language
- Detect and remove disfluencies
- Detect SDUs (semantic dialogue units)
- Use prosodic information: pauses, intonation
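The clean-up steps above can be sketched as follows. This is a minimal illustration, not the method of any particular system: the filler inventory, the 0.5-second pause threshold, the input format of (word, pause-after) pairs, and all function names are assumptions. Multi-word fillers such as "you know" and real repairs are not handled.

```python
# Assumed single-word filler inventory (illustrative only).
FILLED_PAUSES = {"uh", "um"}


def remove_filled_pauses(words):
    """Drop words that are in the assumed filler inventory."""
    return [w for w in words if w.lower() not in FILLED_PAUSES]


def collapse_repetitions(words):
    """Collapse immediate word repetitions ("i i want" -> "i want")."""
    out = []
    for w in words:
        if not out or out[-1].lower() != w.lower():
            out.append(w)
    return out


def segment_on_pauses(timed_words, min_pause=0.5):
    """Split (word, pause_after_seconds) pairs at pauses >= min_pause.

    A long pause is used here as a crude stand-in for an SDU boundary.
    """
    segments, current = [], []
    for word, pause in timed_words:
        current.append(word)
        if pause >= min_pause:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def clean_up(timed_words, min_pause=0.5):
    """Segment first, then remove fillers and repetitions in each segment."""
    cleaned = []
    for segment in segment_on_pauses(timed_words, min_pause):
        segment = collapse_repetitions(remove_filled_pauses(segment))
        if segment:
            cleaned.append(segment)
    return cleaned


timed = [("um", 0.1), ("i", 0.0), ("i", 0.0), ("want", 0.0), ("a", 0.0),
         ("room", 0.8), ("uh", 0.2), ("for", 0.0), ("two", 0.0)]
print(clean_up(timed))  # [['i', 'want', 'a', 'room'], ['for', 'two']]
```

Segmenting before removing fillers keeps the pause that follows a filler available as a boundary cue; a real system would combine pauses with intonation and lexical cues.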
6 Portability Issues
- Can we develop parsing/analysis methods that can
be rapidly adapted to new languages and new
domains?
- Some trends:
- Focus on machine learning and trainable
approaches: data-driven MT approaches, grammar
induction, transfer-rule learning
- Multi-engine systems that can adapt to the
resources that are available
7 Language Diversity Issues
- Different languages have very different
characteristics: how do we deal with the
diversity of phenomena?
- Morphology: synthetic to analytic
- Word order: free to fixed
- Diverse dialects within languages
- Some approaches are more portable in nature, less
sensitive to these differences
- Related to language and domain portability
8 Example: CMU's JANUS
- Target Representation: shallow task-oriented
interlingua representation
- "I would like to take a vacation in
val-di-fiemme"
- c:give-information+disposition+trip
- (disposition=(who=i, desire),
- visit-spec=(identifiability=no, vacation),
- location=(place-name=val_di_fiemme))
- Hybrid Statistical/Rule-based analysis approach
- Semantic phrases are parsed using a robust parser
- Trainable classifier used for selection of domain
action
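The second stage of the hybrid approach, selecting a domain action from the parsed semantic phrases, can be illustrated with a toy trainable classifier. The slides do not say which classifier JANUS uses, so the co-occurrence counting below is only a stand-in. The training examples, the argument labels other than those in the example above, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training data: argument labels found by the robust parser,
# paired with the domain action assigned to the utterance.
EXAMPLES = [
    (["disposition", "visit-spec", "location"], "give-information+disposition+trip"),
    (["room-spec", "price"], "request-information+price+room"),
    (["location", "time"], "give-information+disposition+trip"),
]


def train(examples):
    """Count how often each argument label co-occurs with each domain action."""
    counts = defaultdict(Counter)
    for labels, action in examples:
        for label in labels:
            counts[label][action] += 1
    return counts


def classify(counts, labels):
    """Pick the domain action with the highest summed co-occurrence count."""
    scores = Counter()
    for label in labels:
        scores.update(counts.get(label, {}))
    if not scores:
        return None  # no label was seen in training
    return scores.most_common(1)[0][0]


counts = train(EXAMPLES)
print(classify(counts, ["disposition", "location"]))
# give-information+disposition+trip
```

The design point is the division of labor: the parser only needs to find argument-level phrases robustly, and the choice of domain action is learned from annotated data rather than written into the grammar.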
9 Example: CMU's MilliRADD: Rapidly Adaptable Data
Driven MT With Limited Resources
[Figure: Multi-Engine Integration of EBMT, SMT, and iRBMT engines. Example sentence: "The reporter interviewed the ambassador in America."]
Automatic Learning of Xfer Rules:
NB::NB : [PP "DE" NB] -> [NB PP]
(
 (X3::Y1)             ; Alignments
 (X1::Y2)
 ((X0 ppadj) = X1)    ; X-side constraints
 (X0 = X3)
 (Y0 = X0)            ; Transfer constraints
 (x2 = (x0 ppadj))    ; Generation constraints
 (x1 = x0)
)
PP::PP : ["ZAI" NP PREP] -> [PREP NP]
(
 (X3::Y1)             ; Alignments
 (X2::Y2)
 ((X3 loc) =c +)      ; X-side constraints
 ((X0 obj) = X2)
 (X0 = X3)
 (Y0 = X0)            ; Transfer constraints
 (X2 = (x0 obj))      ; Generation constraints
 (x1 = x0)
)
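The alignment part of such a transfer rule can be read as a reordering instruction: each alignment pairs an x-side (source) constituent position with a y-side (target) position. The sketch below shows that reading only. It ignores the constraint equations, assumes the constituents have already been translated, and uses a hypothetical function name and illustrative English glosses.

```python
def apply_alignments(x_constituents, alignments, y_length):
    """Reorder x-side constituents into y-side order.

    x_constituents: list of x-side constituents (position 1 is index 0).
    alignments: list of (x_position, y_position) pairs, both 1-based.
    y_length: number of constituents on the y-side of the rule.
    Unaligned x-side constituents (e.g. a function word) are dropped.
    """
    y_side = [None] * y_length
    for x_pos, y_pos in alignments:
        y_side[y_pos - 1] = x_constituents[x_pos - 1]
    return y_side


# First rule: x-side PP "DE" NB, y-side NB PP, alignments X3-Y1 and X1-Y2.
# The strings are English glosses standing in for translated constituents.
x_side = ["in America", "DE", "the ambassador"]
print(apply_alignments(x_side, [(3, 1), (1, 2)], y_length=2))
# ['the ambassador', 'in America']
```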
10 Example: Parsing the CHILDES Database
- Goal: syntactic annotation of transcribed
conversations of children and their parents
- Target Representation: complete syntactic
feature-structures
- *MOT: you kicked it .
- %mor: pro|you v|kick-PAST pro|it .
- %fst: ((mood declarative) (tense past) (index 2)
    (subject ((cat pro) (num sg) (pers 2) (case nom)
              (index 1) (root you)))
    (object ((cat pro) (num sg) (pers 3) (case acc)
             (index 3) (root it)))
    (root kick) (cat v))
- %cst: (sentence (decl (np (pro you))
                        (vp (vbar (v kicked)
                                  (np (pro it)))))
                  (period .))
- Analysis Approach: multi-pass robust parser
(LCFlex) that gradually relaxes constraints
- Sagae, Lavie & MacWhinney, IWPT-01
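The idea of gradually relaxing constraints can be illustrated with a toy multi-pass loop. This is not LCFlex itself: the flat tag template, the use of skipped input words as the only relaxation, and all names are assumptions chosen for illustration.

```python
# Hypothetical flat "grammar": a single template of part-of-speech tags.
TEMPLATE = ["pro", "v", "pro"]


def match_with_skips(tags, template, max_skips):
    """Match template as a subsequence of tags, skipping at most max_skips tags.

    Returns the indices of the matched tags, or None if no match fits the limit.
    """
    matched, skipped, t = [], 0, 0
    for i, tag in enumerate(tags):
        if t < len(template) and tag == template[t]:
            matched.append(i)
            t += 1
        else:
            skipped += 1
            if skipped > max_skips:
                return None
    return matched if t == len(template) else None


def multi_pass_parse(tags, template=TEMPLATE, skip_limits=(0, 1, 2)):
    """Try the strictest pass first and relax only when it fails."""
    for limit in skip_limits:
        matched = match_with_skips(tags, template, limit)
        if matched is not None:
            return limit, matched
    return None


# "you uh kicked it": the filled pause forces one relaxation step.
print(multi_pass_parse(["pro", "fill", "v", "pro"]))  # (1, [0, 2, 3])
```

Running the strict pass first means well-formed input gets the most constrained analysis, and the looser passes are paid for only on utterances that need them.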
11 Discussion Topics
- Sharing of available data resources and
components
- Robust parsers
- Annotated data and bilingual corpora
- Coding schemes: interlingua, other, ...
12 Discussion Topics
- Approaches to Parsing/Analysis
- Focus on specific approaches or support a wide
variety of different approaches?
- Common target representations?
- Example: the C-STAR model: common interlingua
representation, but independent analysis
approaches
- Multi-Engine systems: how do we put together an
effective combined system?
- Models of collaboration
13 Discussion Topics
- Language and Domain Portability
- How do we encourage approaches that are more
suited for fast adaptation to new languages and
domains?