Parsing and Analysis for Speech-to-Speech MT

1
Parsing and Analysis for Speech-to-Speech MT
  • Discussion Leader: Alon Lavie
  • Carnegie Mellon University

2
Main Issues
  • Target Representation
  • Robustness Issues
      • Speech disfluencies and spontaneity
      • Dealing with output from speech recognizers
  • Portability and Adaptation Issues
      • Domain portability
      • Language portability
  • Language Diversity Issues

3
What is the Target Representation?
  • Different MT approaches require different
    analysis representations
      • Interlingua: deep and detailed vs. shallow
      • Transfer approaches: syntactic representation
  • Some MT approaches don't require real parsing
    at all
      • Statistical MT
      • Example-based MT

4
Robustness Issues
  • Speech disfluencies and spontaneity
      • Spoken language is very different from written
        text: short turns, back-channels, looser notion
        of grammar, fixed expressions/constructions,
        elided words and constituents
      • ("wanna go eat?" => "Do you want to go to eat?")
      • Disfluencies are common
      • Filled pauses: "uh", "um", "you know"
      • False starts and repairs: "I got to... I think I
        have to be there at"
  • Dealing with output of speech recognizers
      • High word error rates
      • Lack of punctuation and sentence boundaries
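
The "word error rate" mentioned above is conventionally computed as the word-level edit distance between the recognizer's hypothesis and a reference transcript, divided by the number of reference words. A minimal Python sketch of that standard definition follows; the function name is illustrative and does not come from the slides.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.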

5
Robustness Issues
  • How do we build parsers/analyzers that can deal
    with such language?
  • Focus more on semantics than on syntax
  • Focus on language properties of the domain (fixed
    expressions, task-oriented language)
  • Clean up the language
      • Detect and remove disfluencies
      • Detect SDUs (semantic dialogue units)
      • Use prosodic information: pauses, intonation
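
As a toy illustration of the clean-up steps listed above, the Python sketch below strips filled pauses, collapses immediate word repetitions, and splits an utterance into segments at long pauses. Everything here is an assumption made for illustration: the `<pause>` token standing in for prosodic information, the repetition heuristic, and all function names are hypothetical, and deployed systems would more likely rely on trained models than on rules like these.

```python
import re

FILLED_PAUSES = ("you know", "uh", "um")  # the examples named on the slides
PAUSE_TOKEN = "<pause>"  # hypothetical marker for a long silence in the input


def remove_filled_pauses(text: str) -> str:
    """Delete filled pauses; crude, since it also removes a literal 'you know'."""
    pattern = r"\b(?:" + "|".join(re.escape(p) for p in FILLED_PAUSES) + r")\b"
    cleaned = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    return " ".join(cleaned.split())


def collapse_repetitions(words):
    """Drop a word that immediately repeats the previous one ('the the')."""
    out = []
    for word in words:
        if not out or out[-1].lower() != word.lower():
            out.append(word)
    return out


def segment_and_clean(utterance: str):
    """Split at pause markers, then clean each resulting segment."""
    segments = []
    for chunk in utterance.split(PAUSE_TOKEN):
        words = collapse_repetitions(remove_filled_pauses(chunk).split())
        if words:
            segments.append(" ".join(words))
    return segments
```

For example, `segment_and_clean("uh I I want to um go <pause> you know to the the hotel")` yields two cleaned segments, one per side of the pause marker.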

6
Portability Issues
  • Can we develop parsing/analysis methods that can
    be rapidly adapted to new languages and new
    domains?
  • Some trends:
      • Focus on machine learning and trainable
        approaches: data-driven MT approaches, grammar
        induction, transfer-rule learning
      • Multi-engine systems that can adapt to the
        resources that are available

7
Language Diversity Issues
  • Different languages have very different
    characteristics: how do we deal with the
    diversity of phenomena?
      • Morphology: synthetic to analytic
      • Word order: free to fixed
      • Diverse dialects within languages
  • Some approaches are more portable in nature, less
    sensitive to these differences
  • Related to language and domain portability

8
Example: CMU's JANUS
  • Target Representation: shallow task-oriented
    interlingua representation
  • "I would like to take a vacation in
    val-di-fiemme"
  • c:give-information+disposition+trip
    (disposition=(who=i, desire),
     visit-spec=(identifiability=no, vacation),
     location=(place-name=val_di_fiemme))
  • Hybrid statistical/rule-based analysis approach
      • Semantic phrases are parsed using a robust parser
      • Trainable classifier used for selection of domain
        action
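
One way to picture the interlingua example above is as a small data structure: a speaker tag, a speech act, a list of concepts, and nested arguments. The Python sketch below is illustrative only. The class and function names are hypothetical, and the surface syntax it prints (`c:` for the speaker, `+` between concepts, `=` before argument values) is an assumption about the notation, since the punctuation is not reliably preserved in this transcript.

```python
from dataclasses import dataclass, field


@dataclass
class DomainAction:
    speaker: str       # e.g. "c", as in the slide's example
    speech_act: str    # e.g. "give-information"
    concepts: list = field(default_factory=list)
    arguments: dict = field(default_factory=dict)


def render_value(value) -> str:
    """Render nested arguments; a value of None marks a bare item like 'desire'."""
    if isinstance(value, dict):
        parts = []
        for key, sub in value.items():
            parts.append(key if sub is None else f"{key}={render_value(sub)}")
        return "(" + ", ".join(parts) + ")"
    return str(value)


def render(da: DomainAction) -> str:
    """Serialize a domain action in the assumed surface syntax."""
    head = f"{da.speaker}:" + "+".join([da.speech_act, *da.concepts])
    return head if not da.arguments else f"{head} {render_value(da.arguments)}"


# The slide's example, rebuilt under the assumptions stated above.
example = DomainAction(
    speaker="c",
    speech_act="give-information",
    concepts=["disposition", "trip"],
    arguments={
        "disposition": {"who": "i", "desire": None},
        "visit-spec": {"identifiability": "no", "vacation": None},
        "location": {"place-name": "val_di_fiemme"},
    },
)
```

Under this view, the trainable classifier on the slide would be responsible for choosing the speech act and concepts, while the robust parser supplies the arguments.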

9
Example: CMU's MilliRADD (Rapidly Adaptable Data-Driven MT With Limited Resources)
  • [Figure: Multi-Engine Integration of EBMT, SMT, and
    iRBMT engines, with Automatic Learning of Xfer Rules;
    some symbols in the figure text were lost in extraction]
  • Example sentence: "The reporter interviewed the
    ambassador in America." (its non-English counterpart
    did not survive extraction)
  • Example learned transfer rules:
      • NB::NB: PP "DE" NB -> NB PP
        ((X3::Y1) Alignments (X1::Y2) ((X0 ppadj) X1)
        X-side constraints (X0 X3) (Y0 X0) Transfer
        constraints (x2 (x0 ppadj)) Generation
        constraints (x1 x0))
      • PP::PP: "ZAI" NP PREP -> PREP NP
        ((X3::Y1) Alignments (X2::Y2) ((X3 loc) c )
        X-side constraints ((X0 obj) X2) (X0 X3) (Y0 X0)
        Transfer constraints (X2 (x0 obj)) Generation
        constraints (x1 x0))

10
Example: Parsing the CHILDES Database
  • Goal: syntactic annotation of transcribed
    conversations of children and their parents
  • Target Representation: complete syntactic
    feature-structures
  • *MOT: you kicked it .
  • %mor: pro|you v|kick-PAST pro|it .
  • %fst: ((mood declarative) (tense past) (index 2)
           (subject ((cat pro) (num sg) (pers 2) (case nom)
                     (index 1) (root you)))
           (object ((cat pro) (num sg) (pers 3) (case acc)
                    (index 3) (root it)))
           (root kick) (cat v))
  • %cst: (sentence (decl (np (pro you))
                          (vp (vbar (v kicked)
                                    (np (pro it)))))
                    (period .))
  • Analysis Approach: multi-pass robust parser
    (LCFlex) that gradually relaxes constraints
  • Sagae, Lavie & MacWhinney, IWPT-01
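
The feature structure in the example above is a parenthesized list of attribute-value pairs, so it can be read into nested dictionaries with a few lines of code. The Python sketch below is only an illustration of that representation, not part of the CHILDES tooling or of LCFlex; all function names are hypothetical, and it assumes well-balanced parentheses.

```python
def tokenize(text):
    """Split an S-expression string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read_sexpr(tokens, pos=0):
    """Parse one S-expression starting at tokens[pos]; return (tree, next_pos)."""
    if tokens[pos] != "(":
        return tokens[pos], pos + 1
    items, pos = [], pos + 1
    while tokens[pos] != ")":
        item, pos = read_sexpr(tokens, pos)
        items.append(item)
    return items, pos + 1


def to_feature_structure(tree):
    """Turn a list of [name, value] pairs into a dict, recursing into values."""
    result = {}
    for name, value in tree:
        if isinstance(value, list):
            result[name] = to_feature_structure(value)
        else:
            result[name] = value
    return result


# The slide's feature structure; the object's number feature is taken to be
# "num", matching the subject (an assumption about the intended attribute).
FST = """((mood declarative) (tense past) (index 2)
          (subject ((cat pro) (num sg) (pers 2) (case nom) (index 1) (root you)))
          (object ((cat pro) (num sg) (pers 3) (case acc) (index 3) (root it)))
          (root kick) (cat v))"""

tree, _ = read_sexpr(tokenize(FST))
fs = to_feature_structure(tree)
```

After this runs, `fs["subject"]["root"]` holds the subject's root form and `fs["tense"]` the clause tense, both as strings.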

11
Discussion Topics
  • Sharing of available data resources and
    components
      • Robust parsers
      • Annotated data and bilingual corpora
      • Coding schemes: interlingua, other

12
Discussion Topics
  • Approaches to Parsing/Analysis
      • Focus on specific approaches or support a wide
        variety of different approaches?
      • Common target representations?
      • Example: the C-STAR model, with a common interlingua
        representation but independent analysis
        approaches
      • Multi-engine systems: how do we put together an
        effective combined system?
  • Models of collaboration

13
Discussion Topics
  • Language and Domain Portability
      • How do we encourage approaches that are more
        suited for fast adaptation to new languages and
        domains?