Title: Developing Spoken Dialogue Systems in the Communicator / RavenClaw Framework
1Developing Spoken Dialogue Systems in the
Communicator / RavenClaw Framework
- Sphinx Lunch Talk
- Carnegie Mellon University, October 2004
- Presented by Dan Bohus
- Special appearances: Antoine Raux, Jahanzeb Sherwani, Thomas Harris
2Examples
- RoomLine: conference room reservations within SCS; the system can access the schedules of 13 conference rooms in Wean Hall and NSH
- Let's Go! Bus Information System (Let's Go! project): bus schedule information system for Port Authority buses in Oakland and Squirrel Hill
- Sublime: personalized information management system
- TeamTalk: an investigation into human and multi-robot spoken language communication in unstructured environments
6More Systems
- LARRI (Symphony): multimodal system that assists F/A-18 aircraft maintenance personnel throughout the execution of procedural tasks
- Madeleine (MITRE workshop): text-based prototype for a medical diagnosis system
- Eureka: dialogue interface to the Vivisimo web search engine
7The Communicator / RavenClaw Spoken Dialogue
Systems Framework
- Examples
- Overall Architecture
- System Development
- Components & Resources
- Miscellaneous
- Current Research
examples architecture development
components miscellaneous research
8Overall Architecture
- Classical pipeline architecture
[Diagram: pipeline of Language Understanding (PHOENIX/HELIOS), Dialog Management (RAVENCLAW), Back-end (various), and Language Generation (ROSETTA)]
9Galaxy HUB
- Generic centralized, message-passing communication architecture
- Developed at MIT, used in the Communicator program
- Competitor: OAA
[Diagram: Galaxy HUB at the center, connected to Recognition (SPHINX), Language Understanding (PHOENIX/HELIOS), Dialog Management (RAVENCLAW), Back-end (various), Language Generation (ROSETTA), and Synthesis (THETA)]
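The centralized, message-passing idea can be sketched in a few lines of Python. This is a toy router under stated assumptions, not the Galaxy API: the `Hub` class, the routing table, and the frame keys are all hypothetical names chosen for illustration. The point it shows is that servers never call each other; every hop goes through the hub.

```python
class Hub:
    """Toy centralized message router (illustrative only, not the Galaxy API)."""

    def __init__(self, routes):
        self.routes = routes   # server name -> name of the next server (or None)
        self.servers = {}      # server name -> handler(frame) -> new frame
        self.log = []          # servers visited, in order

    def register(self, name, handler):
        self.servers[name] = handler

    def dispatch(self, start, frame):
        # The hub, not the servers, decides where a frame goes next.
        name = start
        while name is not None:
            self.log.append(name)
            frame = self.servers[name](frame)
            name = self.routes.get(name)
        return frame


# Hypothetical routing table mirroring the pipeline on the slide.
hub = Hub({
    "recognition": "understanding",
    "understanding": "dialog",
    "dialog": "generation",
    "generation": "synthesis",
    "synthesis": None,
})
hub.register("recognition", lambda f: {**f, "text": "i need a room"})
hub.register("understanding", lambda f: {**f, "parse": "[NeedRoom]"})
hub.register("dialog", lambda f: {**f, "act": "request"})
hub.register("generation", lambda f: {**f, "prompt": "For what day?"})
hub.register("synthesis", lambda f: {**f, "spoken": True})

result = hub.dispatch("recognition", {"audio": "<samples>"})
```

Swapping a component (say, a different parser) then only means registering another handler under the same name; the other servers are untouched.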
10Getting Even Closer
[Diagram: HUB connected to Recognition (SPHINX), Language Understanding (PHOENIX/HELIOS), Dialog Management (RAVENCLAW), Language Generation (ROSETTA), and Synthesis (THETA); the back-end is written in Perl]
11Getting Even Closer
- Multiple, parallel decoders
[Diagram: three SPHINX decoders behind a single Recognition Server, connected through the HUB to Dialog Management (RAVENCLAW), Back-end (Perl), Language Generation (ROSETTA), and Synthesis (THETA)]
12The Communicator / RavenClaw Spoken Dialogue
Systems Framework
- Examples
- Overall Architecture
- System Development
- Components & Resources
- Miscellaneous
13Building a Spoken Dialogue System
[Diagram: resources to author for each component: language, acoustic, and lexical models; grammar for Language Understanding (PHOENIX/HELIOS); RavenClaw dialog task specification for Dialog Management (RAVENCLAW); back-end (Perl); templates for Language Generation (ROSETTA); (limited-domain) voice]
14So How Long Will It Take?
- MITRE Workshop on Dialogue Management (Fall 2003)
- Develop a text-based SDS for medical diagnosis (provided backend)
- Madeleine (22 hours)
15Okay, How Long Will It Really Take?
- To get a system running with reasonable performance (poll amongst 3 RavenClaw developers):
- 1 month to get a working system up and running
- 1 month to fine-tune performance
- Further iterative improvements will continue as more data accumulates
16The Communicator / RavenClaw Spoken Dialogue
Systems Framework
- Examples
- Overall Architecture
- System Development
- Components & Resources
- Miscellaneous
17Components & Resources
[Diagram: components and their resources: language and acoustic models; grammar for Language Understanding (PHOENIX/HELIOS); RavenClaw dialog task specification for Dialog Management (RAVENCLAW); back-end (Perl); templates for Language Generation (ROSETTA); limited-domain voice]
18Components & Resources
[Diagram: components and their resources: language and acoustic models for Recognition (SPHINX); grammar for Language Understanding (PHOENIX/HELIOS); RavenClaw dialog task specification for Dialog Management (RAVENCLAW); back-end (Perl); templates for Language Generation (ROSETTA); limited-domain voice for Synthesis (THETA)]
19SPHINX II
- Semi-continuous acoustic models
- Off-the-shelf 8 kHz, 11.025 kHz, and 16 kHz models
- Scripts for building your own
- PLSA adapted models perform better
- Language models
- 2-gram and 3-gram models
- CMU-Cambridge SLM Toolkit
- Generate from Phoenix grammar
- Finite state grammar
- Sphinx supports state-specific LMs
- Dictionary (lexical models)
- CMU Dictionary
20Sphinx II - continued
- Multiple parallel decoders, e.g., male / female
- Multiple hypotheses forwarded, selection done later
- Typical WER: 15-30%
- With pronounced differences between native and non-native speakers
- Lowered by retuning acoustic and language models to the domain
- Migration to SPHINX 3.x in the near future
- Expected big improvement in WER
- Concern: real-time performance
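For readers new to the metric: word error rate is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words, WER = (S + D + I) / N. A minimal Python sketch follows; the function name and the sample sentences are illustrative, not part of Sphinx or any scoring tool.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion (reference word missing)
                curr[j - 1] + 1,         # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)


# One deletion ("a") and one substitution ("larger" -> "large") over 7 words.
wer = word_error_rate("do you have something a bit larger",
                      "do you have something bit large")
```

Note that WER can exceed 100% when the hypothesis contains many insertions.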
22Phoenix Parser / Grammar
[room_size_spec]
    ([rss_large])
    ([rss_small])
    ([rss_larger])
    ([rss_smaller])
    ([rss_smallest])
    ([rss_largest])
[rss_large]
    (large)
    (big)
    (huge)
[rss_larger]
    (the larger)
    (the bigger)
    (too small)
[rss_largest]
    (the largest)
    (the biggest)
[rss_small]
    (small)
    (little)
- Phoenix Robust Parser
- CFG Grammar
- Manually-generated domain-specific grammar rules
- Reusable, generic sub-grammars
- Yes, No, Number, DateTime, Help, Repeat, Suspend, etc.
Utterance: DO YOU HAVE SOMETHING A BIT LARGER?
Parse:
    [NeedRoom] ( [_i_want] ( DO YOU HAVE SOMETHING ) )
    [RoomSizeSpec] ( [room_size_spec] ( [rss_larger] ( LARGER ) ) )
- Parses all incoming hypotheses and passes all parses along
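What "robust" buys you can be illustrated with a toy phrase spotter in Python. This is not the Phoenix algorithm (Phoenix matches full CFG nets); it is a flat longest-match scan, assumed here only to show how words outside the grammar are skipped rather than causing the parse to fail. The net names come from the slide; `GRAMMAR` and `spot_phrases` are hypothetical names.

```python
# Phrase lists copied from the slide's room-size nets.
GRAMMAR = {
    "rss_large": ["large", "big", "huge"],
    "rss_larger": ["the larger", "the bigger", "too small"],
    "rss_largest": ["the largest", "the biggest"],
    "rss_small": ["small", "little"],
}


def spot_phrases(utterance, grammar=GRAMMAR):
    """Return [(net, matched phrase)] for every grammar phrase in the utterance."""
    words = utterance.lower().split()
    # Flatten to (phrase words, net) pairs, longest phrases first.
    patterns = sorted(
        ((phrase.split(), net)
         for net, phrases in grammar.items() for phrase in phrases),
        key=lambda p: -len(p[0]),
    )
    parse, i = [], 0
    while i < len(words):
        for phrase, net in patterns:
            if words[i:i + len(phrase)] == phrase:
                parse.append((net, " ".join(phrase)))
                i += len(phrase)
                break
        else:
            i += 1  # no net matches here: skip the word instead of failing
    return parse


parse = spot_phrases("this room is too small i want a big one")
```

Here "too small" is matched as a whole phrase before the shorter "small" gets a chance, and the filler words around it are simply ignored.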
23Helios / Confidence Annotation
- Builds accurate confidence scores using features from 3 sources of knowledge:
- Speech recognition
- Language understanding
- Dialogue management
- Selects the hypothesis with the maximum confidence score
- Research in progress on hypothesis selection and transferability across domains
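A minimal sketch of the selection step, in Python. It assumes a logistic model over one feature per knowledge source; the feature names, weights, and bias are invented for illustration, and a real annotator would learn them from labeled data. Nothing here is the Helios implementation.

```python
import math

# Hypothetical weights, one feature per knowledge source.
WEIGHTS = {
    "acoustic_score": 2.0,       # speech recognition
    "parse_coverage": 1.5,       # language understanding
    "matches_expectation": 1.0,  # dialogue management
}
BIAS = -2.0


def confidence(features):
    """Logistic combination of the features: a score strictly between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def select_hypothesis(hypotheses):
    """Pick the hypothesis whose confidence score is highest."""
    return max(hypotheses, key=lambda h: confidence(h["features"]))


hypotheses = [
    {"text": "a bit larger",
     "features": {"acoustic_score": 0.7, "parse_coverage": 1.0,
                  "matches_expectation": 1.0}},
    {"text": "a bit lager",
     "features": {"acoustic_score": 0.8, "parse_coverage": 0.3,
                  "matches_expectation": 0.0}},
]
best = select_hypothesis(hypotheses)
```

The second hypothesis has the better acoustic score, but the first one wins because it parses fully and matches what the dialogue manager expected.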
25RavenClaw Architecture
Dialog Task (Specification)
- Captures all domain-specific dialog (task) logic using a hierarchical description
- The authoring effort is focused entirely here
Domain-independent Dialog Engine
- Manages dialog by executing the dialog task specification
- Provides a large number of domain-independent conversational strategies
27RavenClaw Dialogue Task Specification
Madeleine
    I: Welcome
    E: LoadSymptoms
    GeneralFeel
        R: HowAreYou?
        I: Glad
        I: Sorry
    Diagnose
        Fever
            R: AskFever
            E: MeasureTemp
            I: InformFever
        Travel
(I = Inform, R = Request, E = Execute)
- Tree of dialog agents
- Terminals: Inform, Request, Expect, Execute
- Non-terminals / dialog agencies: plan the execution of child nodes
- Basically a Hierarchical Task Execution Network
- Each agent has:
- Preconditions
- Success / failure criteria
- Trigger (focus) criteria
- Effects
28Sample DTS Code
[Diagram: the GeneralFeel subtree: R: HowAreYou?, I: Glad, I: Sorry]
    // /Madeleine/GeneralFeel
    DEFINE_AGENCY(CGeneralFeel,
        DEFINE_CONCEPTS(
            STRING_USER_CONCEPT(general_feeling, none))
        DEFINE_SUBAGENTS(
            SUBAGENT(HowAreYou, CHowAreYou)
            SUBAGENT(Glad, CGlad)
            SUBAGENT(Sorry, CSorry))
        SUCCEEDS_WHEN(COMPLETED(Glad) || COMPLETED(Sorry)))

    // /Madeleine/GeneralFeel/HowAreYou
    DEFINE_REQUEST_AGENT(CHowAreYou,
        REQUEST_CONCEPT(general_feeling)
        GRAMMAR_MAPPING("![Yes]>good, ![FeelingGood]>good, "
                        "![FeelingSoSo]>soso, ![FeelingBad]>bad"))

    // /Madeleine/GeneralFeel/Glad
    DEFINE_INFORM_AGENT(CGlad,
        PRECONDITION(C("general_feeling") == CString("good"))
        ...
29RavenClaw Execution
[Diagram: the Madeleine dialog task tree next to the Dialog Stack and the Expectation Agenda, both still empty]
30RavenClaw Execution
[Diagram: Madeleine dialog task tree]
Dialog Stack (top first): Madeleine
31RavenClaw Execution
[Diagram: Madeleine dialog task tree]
Dialog Stack (top first): Welcome, Madeleine
32RavenClaw Execution
[Diagram: Madeleine dialog task tree]
System: Hi, this is Madeleine, the automated ...
Dialog Stack (top first): Madeleine
33RavenClaw Execution
[Diagram: Madeleine dialog task tree, which now also contains R: Headache and three more Request agents]
System: Hi, this is Madeleine, the automated ...
Dialog Stack (top first): LoadSymptoms, Madeleine
34RavenClaw Execution
[Diagram: Madeleine dialog task tree, including R: Headache and three more Request agents]
System: Hi, this is Madeleine, the automated ...
Dialog Stack (top first): Madeleine
35RavenClaw Execution
[Diagram: Madeleine dialog task tree, including R: Headache and three more Request agents]
System: Hi, this is Madeleine, the automated ...
Dialog Stack (top first): GeneralFeel, Madeleine
36RavenClaw Execution / Input Pass
[Diagram: Madeleine dialog task tree, including R: Headache and three more Request agents]
System: Hi, this is Madeleine, the automated ...
System: How are you feeling today?
User: Not so good, I think I have a fever
Parse: soso(not so good) fever(I think I have a fever)
Dialog Stack (top first): HowAreYou, GeneralFeel, Madeleine
Expectation Agenda (one level per agent on the stack):
- HowAreYou: general_feeling: good, bad, soso
- GeneralFeel: general_feeling: good, bad, soso
- Madeleine: general_feeling: good, bad, soso; have_fever: fever, !yes, !no; headache: headache, !yes, !no; cough: cough, !yes, !no
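The input pass can be sketched as a top-down search through the agenda levels. The Python below is a simplification under stated assumptions: the per-agent level labels are inferred from the stack order, the `!yes` / `!no` expectations are left out, and `AGENDA` and `bind` are hypothetical names, not RavenClaw code.

```python
# Agenda levels, top of the dialog stack first.
# Each level maps a concept to the grammar slots that can fill it.
AGENDA = [
    ("HowAreYou", {"general_feeling": {"good", "bad", "soso"}}),
    ("GeneralFeel", {"general_feeling": {"good", "bad", "soso"}}),
    ("Madeleine", {
        "general_feeling": {"good", "bad", "soso"},
        "have_fever": {"fever"},
        "headache": {"headache"},
        "cough": {"cough"},
    }),
]


def bind(parse, agenda=AGENDA):
    """parse: [(slot, text)]. Returns {concept: (slot, agent whose level matched)}."""
    bindings = {}
    for slot, _text in parse:
        for agent, expectations in agenda:  # search from the top level down
            concept = next(
                (c for c, slots in expectations.items() if slot in slots), None)
            if concept is not None:
                bindings[concept] = (slot, agent)
                break
    return bindings


parse = [("soso", "not so good"), ("fever", "I think I have a fever")]
bindings = bind(parse)
```

The answer to the question that was asked binds at the top level, while the volunteered "I have a fever" still finds a home further down the agenda, which is what lets the user over-answer.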
37RavenClaw Execution
[Diagram: Madeleine dialog task tree, including R: Headache and three more Request agents]
System: Hi, this is Madeleine, the automated ...
System: How are you feeling today?
User: Not so good, I think I have a fever
Parse: soso(not so good) fever(I think I have a fever)
Dialog Stack (top first): GeneralFeel, Madeleine
38RavenClaw Execution
[Diagram: Madeleine dialog task tree, including R: Headache and three more Request agents]
System: Hi, this is Madeleine, the automated ...
System: How are you feeling today?
User: Not so good, I think I have a fever
Parse: soso(not so good) fever(I think I have a fever)
System: Oh, I'm sorry to hear that
System: Let me take your temperature
Dialog Stack (top first): Sorry, GeneralFeel, Madeleine
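The walkthrough above can be reproduced with a small stack machine. This Python sketch is an assumption-laden toy, not the RavenClaw engine: `Agent` and `execute` are hypothetical names, the user's answer is hard-coded, and only part of the Madeleine tree is modeled. It records the stack (top first) at every step so the trace can be compared with the slides.

```python
class Agent:
    def __init__(self, name, children=(), precondition=None, action=None):
        self.name = name
        self.children = list(children)
        self.precondition = precondition or (lambda state: True)
        self.action = action      # only terminal agents carry an action
        self.completed = False


def execute(root, state):
    """Run the task tree with an explicit stack; return the stack snapshots."""
    stack, trace = [root], []
    while stack:
        trace.append([a.name for a in reversed(stack)])  # top of the stack first
        top = stack[-1]
        if top.action is not None:        # terminal agent: act, then pop
            top.action(state)
            top.completed = True
            stack.pop()
            continue
        ready = [c for c in top.children
                 if not c.completed and c.precondition(state)]
        if ready:
            stack.append(ready[0])        # focus moves to the first ready child
        else:
            top.completed = True          # nothing left to plan: agency is done
            stack.pop()
    return trace


def inform(name):
    return Agent(name, action=lambda s: s["said"].append(name))


glad, sorry = inform("Glad"), inform("Sorry")
glad.precondition = lambda s: s.get("general_feeling") == "good"
sorry.precondition = lambda s: s.get("general_feeling") in ("bad", "soso")

tree = Agent("Madeleine", children=[
    inform("Welcome"),
    Agent("GeneralFeel", children=[
        # Simulated user answer: "Not so good" -> soso
        Agent("HowAreYou", action=lambda s: s.update(general_feeling="soso")),
        glad,
        sorry,
    ]),
])

state = {"said": []}
trace = execute(tree, state)
```

Because the preconditions are re-evaluated every time GeneralFeel regains focus, Sorry only becomes eligible after HowAreYou has filled in the concept, and Glad never runs.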
39RavenClaw Other features
- Dialogue Engine transparently provides a set of conversational skills
- Universal dialogue mechanisms
- Repeat, Suspend / Resume, Quit
- Help
- Help!, Where are we?, What can I say?
- Error handling
- Explicit and implicit confirmations
- Strategies for recovering from non-understandings
- Dynamic dialogue task generation
- Dynamic dialogue control policy
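As an illustration of how explicit and implicit confirmations might be chosen, here is a fixed-threshold policy in Python. The thresholds and the function name are assumptions made for this sketch; the deck does not say how RavenClaw makes this decision, and a learned policy could replace the thresholds entirely.

```python
def choose_error_handling(confidence,
                          reject_below=0.3,
                          explicit_below=0.6,
                          implicit_below=0.9):
    """Map a confidence score in [0, 1] to an error-handling action.

    All thresholds are hypothetical defaults, not values from RavenClaw.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence < reject_below:
        return "reject"            # treat the input as a non-understanding
    if confidence < explicit_below:
        return "explicit_confirm"  # "Did you say you have a fever?"
    if confidence < implicit_below:
        return "implicit_confirm"  # "A fever ... let me take your temperature."
    return "accept"


actions = [choose_error_handling(c) for c in (0.1, 0.5, 0.8, 0.95)]
```

Because the policy lives in the engine rather than in the dialog task, the same thresholds apply to every concept in every system unless the author overrides them.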
41Backend Domain Agents
- Various problem-specific solutions
- RoomLine
- Connects to a static Perl database or to the CMU CorporateTime server
- Let's Go! Bus Information System
- Connects to a Postgres database
- Sublime
- Connects to a MySQL database; also functions as a web server (DTW search domain agent)
- Basically, build your own; we provide a stub for interfacing with the Galaxy Hub
43Rosetta Language Generation
- Template- and stochastic-based language generation
- Input: (act, object, slot/value)
- Output: text (tagged with concepts)
    # welcome to the system
    "welcome" => "Welcome to RoomLine, the automated conference room " .
                 "reservation system.",
    # greet user
    "greet_user" => ("Hi, <user_name>.",
                     "Hi, <user_name>, good to hear from you again."),
    # inform the user that the system has misunderstood the times (order)
    "wrong_time_order" => sub {
        my %args = @_;
        my $time_interval_as_string =
            get_wrong_time_interval_as_string(\%args, "room_query.date_time.time");
        my $answer = "I'm sorry, I must have misunderstood the " .
                     "time you needed the room. ";
        $answer .= "I heard $time_interval_as_string. ";
        return ($answer . "So, let's see ... ",
                $answer . "So, let's try this again ... ",
                $answer . "So, let's try this once more ... ");
    },
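The template mechanism can be sketched in Python as follows. This is an illustrative stand-in, not Rosetta itself (which is written in Perl): `TEMPLATES` and `generate` are hypothetical names, and picking a variant at random is an assumption about how alternative templates are used. The prompt texts are taken from the slide.

```python
import random
import re

# Each act maps to one or more alternative templates; <name> marks a slot.
TEMPLATES = {
    "welcome": [
        "Welcome to RoomLine, the automated conference room reservation system.",
    ],
    "greet_user": [
        "Hi, <user_name>.",
        "Hi, <user_name>, good to hear from you again.",
    ],
}


def generate(act, slots, rng=random):
    """Pick one template for the act and fill in its <slot> placeholders."""
    template = rng.choice(TEMPLATES[act])
    return re.sub(r"<(\w+)>", lambda m: str(slots[m.group(1)]), template)


greeting = generate("greet_user", {"user_name": "Dan"}, random.Random(0))
welcome = generate("welcome", {})
```

Passing a seeded `random.Random` makes the choice reproducible in tests, while the default keeps the variation that stops prompts from sounding canned.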
45Synthesis
- Cepstral Theta synthesis
- Open-domain unit-selection synthesis
- SSML tags
- Currently working on barge-in location
- Festival synthesis
- Diphone synthesis; open-domain and limited-domain unit-selection synthesis
- SABLE tags
- Server running separately on a Linux box
46The Communicator / RavenClaw Spoken Dialogue
Systems Framework
- Examples
- Overall Architecture
- System Development
- Components & Resources
- Miscellaneous
- Current Research
47Miscellaneous Documentation
- Transmitted largely by oral tradition :)
- A bit of documentation available
- Research papers, slides
- Wiki: http://hap.speech.cs.cmu.edu/commwiki
- mostly for developers: postings of updates, recent developments
- hopefully more introductory materials soon
- More in the works
- Tutorials: 2 available, but a bit outdated
48Miscellaneous Portability
- Current systems work on PC Windows platforms
- Galaxy has Linux version
- Components are C, C++ (Visual Studio 6.0, Visual Studio .NET), Perl
- How about using different input / output components?
- Modify the RavenClaw DMInterface class
- Has been done for the Gemini parser / language generator
49Miscellaneous Research Platform
- Communicator / RavenClaw framework is a research platform!
- Constantly evolving
- Modular
- Easy to change, develop and test new technologies
- Research on a variety of topics in a real-world, full-blown system
- Recognition, language understanding, dialogue management, language generation, synthesis
- Your work can be evaluated / reused easily across multiple existing systems
50Miscellaneous - Download
- www.cs.cmu.edu/dbohus/RavenClaw
- Download a version of RoomLine
- An installation script can seed your own project
from this RoomLine version
51Miscellaneous RavenClaw Team
- RavenClaw Team
- Dan Bohus (dbohus_at_cs)
- Antoine Raux (antoine_at_cs)
- Jahanzeb Sherwani (jsherwan_at_cs)
- Thomas Harris (tkharris_at_cs)
- Satanjeev Banerjee (satanjeev_at_cs)
- Brian Langner (blangner_at_cs)
- More users / developers / documentation writers are always welcome!
- Dialogs on Dialogs Reading Group
- www.cs.cmu.edu/dod
52The Communicator / RavenClaw Spoken Dialogue
Systems Framework
- Examples
- Overall Architecture
- System Development
- Components & Resources
- Miscellaneous
- Current Research
53Error awareness and recovery
- Problem: lack of robustness when faced with understanding errors
- Solution: build mechanisms for acting robustly at the dialogue management level
- Error awareness
- Building better confidence annotators, hypothesis selection, transference across domains
- Error recovery strategies
- Recovery from non-understandings
- Error handling decision process
- Scalable, adaptable, task-independent architecture for making error handling decisions
54Let's Go! Research
- Speech recognition: acoustic adaptation on non-native speech (WER 50% -> 30%)
- Speech synthesis: flexible and natural F0 modeling (F0 unit selection); emphasis on erroneous/uncertain words for utterance confirmation
55Sublime
- Interface for personalized information management
- Narrow functionality in unrestricted domains
- Currently, handles information without understanding it
- Eventually, learn relationships and a shallow ontology
56That's all, folks!