SCILL: Spoken Conversational Interaction for Language Learning

1
SCILL: Spoken Conversational Interaction for Language Learning
  • Stephanie Seneff (seneff_at_csail.mit.edu)
  • Jim Glass (jrg_at_csail.mit.edu)
  • Spoken Language Systems Group
  • MIT Computer Science and Artificial Intelligence
    Lab
  • Steve Young (sjy_at_eng.cam.ac.uk)
  • Speech Group
  • CUED Machine Intelligence Lab

2
Conversational Interfaces
3
Conversational Interfaces
  • Language Generation
  • Speech Synthesis
  • Dialogue Management
  • Audio
  • Database
  • Speech Recognition
  • Context Resolution
  • Language Understanding
4
Bilingual Weather Domain Video Clip
5
Computer Aids through Conversational Interaction
  • Language teachers have limited time to interact
    with students in dialogue exchanges
  • Computers provide a non-threatening environment
    in which to practice communicating
  • A three-phase interaction framework is
    envisioned:
  • Preparation: practice phrases, simulated
    dialogues
  • Conversational interaction: telephone
    conversation with graphical support, seamless
    translation aid
  • Assessment: review of the dialogue interaction,
    feedback and fluency scores

6
SCILL: A Spoken Computer Interface for Language
Learning
Conversational systems provide an interactive
environment for language learning
Can provide translations for both user queries
and system responses
  • Speaks only target language.
  • Has access to information sources.

7
Technology Requirements
  • Robust recognition and understanding of
    foreign-accented speech
  • If recognition is too poor, student may become
    frustrated
  • Customize vocabulary and linguistic constructs to
    lesson plans
  • High quality cross-lingual language generation
  • Natural and fluent speech synthesis
  • Ability to automatically generate simulated
    dialogues
  • System should be able to generate multiple
    dialogues based on a given lesson
    topic on the fly
  • Allows the student to see example sentence
    constructs for a particular lesson
  • Ability to reconfigure quickly and easily to new
    lessons
  • Automatic scoring for fluency, pronunciation,
    tone quality, use of vocabulary, etc.

8
SCILL System Overview
9
Bilingual Spoken Dialogue Interaction: Current
Status
  • Initial version of end-to-end system is in place
    for the weather domain
  • Rain, snow, wind, temperature, warnings (e.g.,
    tornado), etc.
  • MIT Recognizer supports both English and Mandarin
  • Seamless language switching
  • English queries are translated into Mandarin
  • Mandarin queries are answered in Mandarin
  • User can ask for a translation into English of
    the response at any time
  • Currently using off-the-shelf Mandarin
    synthesizer from ITRI
  • Plan to develop high quality domain-dependent
    Mandarin synthesis using our Envoice tools
  • System can be configured as telephone-only or as
    telephone augmented with a Web-based GUI

10
Bilingual Recognizer Construction
  • Start from an existing English corpus
  • Create a Mandarin corpus by automatically
    translating the existing English corpus
  • Automatically induce language models for both the
    English and Mandarin recognizers using the NL
    grammar
  • The two recognizers compete in a common search
    space
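
The idea of two recognizers competing can be illustrated with a minimal sketch. It is a simplification: a true common search space would be a single decoder searching both languages jointly, whereas this example only merges finished hypotheses and keeps the best one. The `Hypothesis` class, the `log_score` field, and the example scores are hypothetical and not taken from the MIT system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    language: str      # "english" or "mandarin"
    words: tuple       # recognized word sequence
    log_score: float   # hypothetical combined acoustic + language-model log score

def best_hypothesis(hypotheses):
    """Return the top-scoring hypothesis regardless of its language."""
    if not hypotheses:
        raise ValueError("no hypotheses to choose from")
    return max(hypotheses, key=lambda h: h.log_score)

# Made-up scores for one utterance, one hypothesis per recognizer.
candidates = [
    Hypothesis("english", ("will", "it", "rain", "tomorrow"), -42.0),
    Hypothesis("mandarin", ("ming2_tian1", "hui4", "xia4_yu3", "ma5"), -57.5),
]
winner = best_hypothesis(candidates)
print(winner.language)
```

Because the winning hypothesis carries its language label, the language of the utterance is identified as a side effect of recognition, which is one way to read "seamless language switching".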
11
HTK Mandarin Speech Recognizer
  • Except
  • Standard PLP front-end augmented with F0 and
    its derivatives (F0 added after HLDA
    transformation)
  • 46-phone acoustic model set with long final
    phones split, e.g., uang -> ua ng
  • Questions about tone added to decision tree
    context clustering

12
HMM-Based Pronunciation Scoring
  • Basic approach:
  • Estimate posterior probabilities (i.e., confidence
    scores) of each phone or syllable given the
    acoustics
  • Map confidence scores to a good/bad decision using
    data labelled by experts
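
The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the scoring method of the HTK system: the posterior is approximated by a softmax over per-phone log-likelihoods with uniform priors, and the good/bad mapping is a single threshold chosen to agree best with expert labels. The function names and all numbers are hypothetical.

```python
import math

def phone_posterior(log_likelihoods, target):
    """Posterior of `target` via a softmax over log-likelihoods (uniform priors assumed)."""
    m = max(log_likelihoods.values())  # subtract the max for numerical stability
    total = sum(math.exp(v - m) for v in log_likelihoods.values())
    return math.exp(log_likelihoods[target] - m) / total

def fit_threshold(scores, labels):
    """Pick the score threshold that agrees most often with expert good/bad labels."""
    best_t, best_correct = 0.0, -1
    for t in sorted(set(scores)):
        # A score at or above the threshold is predicted "good" (True).
        correct = sum((s >= t) == lab for s, lab in zip(scores, labels))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

# Made-up log-likelihoods for one speech segment under three competing phones.
posterior = phone_posterior({"a": -10.0, "o": -12.0, "e": -15.0}, "a")

# Made-up confidence scores with expert judgments (True = acceptable pronunciation).
scores = [0.2, 0.35, 0.6, 0.8, 0.9]
labels = [False, False, True, True, True]
threshold = fit_threshold(scores, labels)
print(round(posterior, 3), threshold)
```

A single global threshold is the simplest possible mapping; per-phone thresholds or a trained classifier would follow the same pattern with more parameters.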

13
Multilingual Translation Framework
  • Common meaning representation: semantic frame

14
Content Understanding and Translation
English: Some thunderstorms may be accompanied by
gusty winds and hail
Spanish: Algunas tormentas posiblemente acompañadas
por vientos racheados y granizo
Frame indexed under weather, wind, rain, storm,
and hail
15
Audio Demonstration
  • User asks: "Will it rain tomorrow in Boston?"
  • System paraphrases the query, then responds in
    Chinese
  • "Please repeat that" in English or Chinese is
    interpreted identically
  • System repeats the response in Chinese
  • User speaks a query in English: seamless language
    switching
  • System paraphrases, then translates the query into
    Chinese
  • User attempts to repeat the translation
  • A recognition error "hallucinates" an erroneous
    date (February 30), which will be remembered
  • System supplies known cities in England
  • User chooses London
  • System has no weather for London on February 30
  • User asks: "How about today?"
  • System provides London's weather today
  • User asks for a translation into English, which
    is provided

16
Proposed Translation Procedure
c wh_question topic q name poss you auxil link complement q object trace what
c wh_question topic q name pro you verb call complement q object trace what
If generated query fails to parse, simplify
interlingua and generation
English: what is your name
Mandarin: ni3 jiao4 shen2_me5 ming2_zi4
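
The fallback rule on this slide (generate, check that the result parses, simplify if it does not) can be sketched as a small loop. This is only a sketch under assumptions: frames are modelled as plain dictionaries, and `translate_with_fallback`, `drop_key`, `toy_generate`, and `toy_parses` are hypothetical stand-ins, not components of the actual system.

```python
def translate_with_fallback(frame, generate, parses, simplifications):
    """Generate from the full frame first, then from simplified variants of it.

    Each simplification is applied to the original frame, in the order given.
    Returns the first generated sentence that parses, or None.
    """
    candidates = [frame] + [simplify(frame) for simplify in simplifications]
    for candidate in candidates:
        sentence = generate(candidate)
        if sentence is not None and parses(sentence):
            return sentence
    return None

def drop_key(key):
    """Build a simplification step that removes one key from a frame."""
    return lambda frame: {k: v for k, v in frame.items() if k != key}

# Toy target-language "grammar": it accepts exactly one sentence.
KNOWN_SENTENCES = {"ni3 jiao4 shen2_me5 ming2_zi4"}

def toy_generate(frame):
    # Stand-in generator: an unsupported key yields an unparsable string.
    base = "ni3 jiao4 shen2_me5 ming2_zi4"
    return base + " ???" if "unsupported" in frame else base

def toy_parses(sentence):
    return sentence in KNOWN_SENTENCES

frame = {"clause": "wh_question", "topic": "name", "pro": "you",
         "verb": "call", "unsupported": True}
result = translate_with_fallback(frame, toy_generate, toy_parses,
                                 [drop_key("unsupported")])
print(result)
```

Parsing the generated sentence with the target-language grammar acts as a cheap well-formedness check before the translation is shown to the student.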
17
Proposed Exercise using Typed Inputs
Input: Da2 la2 si4 hui4 xia4 yu3 ming2 tian1 ma5?
System is able to parse query in spite of tone
errors and (limited) syntax errors
Next: Los Angeles, wind, Saturday
Next: Dallas, rain, tomorrow
Query: Da2 la1 si1 ming2 tian1 hui4 xia4 yu3 ma5?
System color-codes errors in tone and in
syntactic constructs
Response: Da2 la1 si1 ming2 tian1 xia4 wu3 xia4
te4 da4 yu3
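
A rough sketch of how tone and word-order errors in typed pinyin might be located is given below. It reads the "Input" line on the slide as the student's attempt and the "Query" line as the corrected form, which is an interpretation. The real system is described as parsing the input with a grammar; the sketch instead aligns the two syllable sequences with Python's `difflib`, so `find_errors` and its return format are hypothetical.

```python
import re
from difflib import SequenceMatcher

def syllables(pinyin):
    """Split tone-numbered pinyin into (base, tone) pairs, ignoring case and punctuation."""
    return [(base.lower(), tone)
            for base, tone in re.findall(r"([A-Za-z]+)([1-5])", pinyin)]

def find_errors(student, reference):
    """Return (tone_errors, syntax_errors) as lists of student syllable indices."""
    stu, ref = syllables(student), syllables(reference)
    # Align on base syllables only, so a wrong tone does not break the alignment.
    matcher = SequenceMatcher(None, [b for b, _ in stu], [b for b, _ in ref],
                              autojunk=False)
    tone_errors, syntax_errors = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for offset in range(i2 - i1):
                if stu[i1 + offset][1] != ref[j1 + offset][1]:
                    tone_errors.append(i1 + offset)
        elif tag in ("delete", "replace"):
            # Student syllables with no aligned counterpart: misplaced or wrong.
            syntax_errors.extend(range(i1, i2))
    return tone_errors, syntax_errors

student = "Da2 la2 si4 hui4 xia4 yu3 ming2 tian1 ma5?"
reference = "Da2 la1 si1 ming2 tian1 hui4 xia4 yu3 ma5?"
tone_errors, syntax_errors = find_errors(student, reference)
print(tone_errors, syntax_errors)  # [1, 2] [6, 7]
```

Indices 1 and 2 are the syllables "la2" and "si4" (tone errors), and indices 6 and 7 are "ming2 tian1", which appears in the wrong position. A display layer could then map each index list to a color.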
18
Testing the Effectiveness of Training on Typed
Input: Proposed Measures
  • Compare the quality of spoken dialogue recorded
    before and after a Web-based training session
  • Measures of fluency
  • Syntactic well-formedness
  • Tone production accuracy
  • Frequency of pauses, edits, and filler words
  • Phonetic quality, etc.
  • Measures of communication success
  • Frequency of usage of translation assistance
  • Understanding error rate
  • Task completion
  • Time to completion, etc.

19
Technology Goal: Automated Language Understanding
Once translation ability exists from English to the
target language, a reverse system can be created
almost effortlessly
English Sentence
Corpus Pairs
Utilizes the English parse tree and the Mandarin
generation lexicon to induce a Mandarin parse tree
20
Building NxN Translation Efficiently
Languages: Japanese, Mandarin, Arabic, French, Spanish, Urdu, Korean
Automatic Grammar Induction
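
One way to see why a common meaning representation could make NxN translation efficient is to count components. The sketch below assumes that every language is connected through a shared semantic frame, as on the earlier "Multilingual Translation Framework" slide, and that English is included alongside the seven languages listed here; both points are assumptions, since this slide does not spell them out.

```python
def direct_systems(n):
    """Directed translation systems needed if every language pair gets its own."""
    return n * (n - 1)

def interlingua_components(n):
    """One understanding and one generation component per language."""
    return 2 * n

# English is added to the slide's list as an assumption.
languages = ["English", "Japanese", "Mandarin", "Arabic",
             "French", "Spanish", "Urdu", "Korean"]
n = len(languages)
print(direct_systems(n), interlingua_components(n))  # 56 16
```

Under these assumptions the pairwise approach grows quadratically with the number of languages, while the shared-frame approach grows linearly.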
21
Future Plans (Near Term and Long Term)
  • Install current version of system at Cambridge
    University
  • Incorporate CU Mandarin recognizer
  • Add support for audio input at the computer
  • Build high quality synthesis capability
  • Improve understanding, dialogue, and translation
    performance
  • Collect and transcribe data from language
    learners and assess both system and students
  • Develop various scoring algorithms for student
    fluency
  • Refine all aspects of system based on collected
    data
