Title: SIMONE: Spoken Interaction for Mobile Networked Ecosystems
NRC Cambridge and MIT CSAIL Spoken Language Systems
October 2, 2007
The Premise
- Small devices need speech
- Current interfaces are strained by small screens and limited input
- Spoken language is natural and efficient
- Dialogue is the crucial element
- Interaction is more than recognition
- Understanding, dialogue, and generation must all be incorporated
Example requests:
"Cancel my Thursday meeting with Tom"
"Play another song by that group"
"Find the pictures I took at Michelle's wedding"
Project Summary
- Spoken dialogue to simplify the mobile device interface to:
- Structured information (e.g., calendar)
- Loosely structured data (e.g., photos)
- Technology requirements
- Portability (e.g., applications, platforms)
- Personalization (e.g., adapting to the user)
- Flexibility (e.g., open-ended input/retrieval)
- Multilinguality
[Figure: spoken dialogue components: Speech Recognition, Language Understanding, Context Resolution, Dialogue Planning, Language Generation, Speech Synthesis]
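The six components above can be pictured as stages that each enrich the same per-turn state. The Python below is a minimal sketch under stated assumptions. Every stage is a toy stub, and all names (`Turn`, `run_turn`, and the stage functions) are hypothetical. The strict linear order is also a simplification: Galaxy-style systems typically route messages between servers through a central hub rather than along a fixed chain.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Turn:
    """State for one user turn; field names are illustrative only."""
    audio: str                                   # stand-in: "audio" is already text here
    words: str = ""
    frame: Dict[str, str] = field(default_factory=dict)
    reply: str = ""
    speech: str = ""


# Each stage updates the turn in place; ctx persists across turns (discourse state).
Stage = Callable[[Turn, Dict[str, str]], None]


def recognize(turn: Turn, ctx: Dict[str, str]) -> None:
    turn.words = turn.audio.lower().strip()      # toy recognizer: just normalise text


def understand(turn: Turn, ctx: Dict[str, str]) -> None:
    action, _, obj = turn.words.partition(" ")   # toy parse: first word is the action
    turn.frame = {"action": action, "object": obj}


def resolve_context(turn: Turn, ctx: Dict[str, str]) -> None:
    # Replace a pronoun with the most recently mentioned object.
    if turn.frame["object"] in ("it", "that"):
        turn.frame["object"] = ctx.get("last_object", turn.frame["object"])
    ctx["last_object"] = turn.frame["object"]


def plan_dialogue(turn: Turn, ctx: Dict[str, str]) -> None:
    turn.frame["reply_type"] = "confirm"          # this toy always asks for confirmation


def generate(turn: Turn, ctx: Dict[str, str]) -> None:
    if turn.frame["reply_type"] == "confirm":
        turn.reply = f"Do you want me to {turn.frame['action']} {turn.frame['object']}?"


def synthesize(turn: Turn, ctx: Dict[str, str]) -> None:
    turn.speech = f"<tts>{turn.reply}</tts>"      # placeholder for a real synthesizer


PIPELINE: List[Stage] = [recognize, understand, resolve_context,
                         plan_dialogue, generate, synthesize]


def run_turn(audio: str, ctx: Dict[str, str]) -> Turn:
    turn = Turn(audio)
    for stage in PIPELINE:
        stage(turn, ctx)
    return turn


if __name__ == "__main__":
    ctx: Dict[str, str] = {}
    run_turn("Show the Thursday meeting", ctx)
    print(run_turn("Cancel it", ctx).reply)   # Do you want me to cancel the thursday meeting?
```

The context dictionary outlives a single turn. This lets a follow-up request say "it" and still be resolved, which is the job the slide assigns to Context Resolution.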
Outline
Spoken Access to Applications
Content Annotation and Retrieval
Small Platforms
Spoken Access to Applications
- Personalized vocabularies
- Data collection platform
- Portability developments
Personalized Vocabularies
[Figure: the user's Contacts and Events populate Dynamic Classes used by Recognition and Understanding]
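One reading of this slide is that the user's own contacts and calendar events fill dynamic word classes, which both recognition and understanding consult. The sketch below assumes that reading. The class names (`CONTACT`, `EVENT`), the `$CLASS` template syntax, and `DynamicClasses` itself are hypothetical. A real recognizer would compile the classes into its own grammar or language-model format rather than enumerate phrases.

```python
import itertools
from typing import Dict, Iterable, List, Set


class DynamicClasses:
    """Per-user word classes (e.g. CONTACT, EVENT) refilled from the user's own data."""

    def __init__(self) -> None:
        self.classes: Dict[str, Set[str]] = {}

    def update(self, name: str, entries: Iterable[str]) -> None:
        # Replace the class contents wholesale whenever contacts or events change.
        self.classes[name] = {e.lower() for e in entries}

    def expand(self, template: str) -> List[str]:
        """Recognition side: expand e.g. 'cancel my $EVENT with $CONTACT' into phrases."""
        options = [sorted(self.classes[tok[1:]]) if tok.startswith("$") else [tok]
                   for tok in template.split()]
        return [" ".join(words) for words in itertools.product(*options)]

    def tag(self, utterance: str) -> Dict[str, str]:
        """Understanding side: label any class member found in the utterance."""
        padded = f" {utterance.lower()} "
        found: Dict[str, str] = {}
        for name, entries in self.classes.items():
            hits = [e for e in entries if f" {e} " in padded]   # whole-word matches only
            if hits:
                found[name] = max(hits, key=len)                # prefer the longest entry
        return found


if __name__ == "__main__":
    dc = DynamicClasses()
    dc.update("CONTACT", ["Tom", "Michelle"])
    dc.update("EVENT", ["staff meeting", "lunch"])
    print(dc.tag("Cancel my staff meeting with Tom"))   # {'CONTACT': 'tom', 'EVENT': 'staff meeting'}
```

In this sketch, making a new contact speakable only requires another `update` call. The templates themselves stay fixed.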
Example Dialogue
[Figure: calendar display for May 23, May 24, and May 26 showing meeting times 11-12, 1-2, 2-3, and 2-4]
- Spoken language technology capabilities
- Speaker-independent speech understanding
- Speech generation to support display
- Dialogue support for complex queries
- Confirmation sub-dialogues
- Negotiation for conflict resolution
- Support for anaphoric references (e.g., "this meeting")
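Two of these dialogue capabilities, anaphoric reference and negotiation for conflict resolution, can be illustrated with a small calendar model. This is a hypothetical sketch. `Meeting`, `CalendarDialogue`, and the reply wording are invented for illustration, and times are whole hours on a 24-hour clock. The clarifying question is returned as text rather than driving a real confirmation sub-dialogue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Meeting:
    title: str
    day: str      # e.g. "May 24"
    start: int    # hour on a 24-hour clock
    end: int


class CalendarDialogue:
    def __init__(self, meetings: List[Meeting]) -> None:
        self.meetings = list(meetings)
        self.focus: Optional[Meeting] = None     # most recently mentioned meeting

    def mention(self, title: str) -> Meeting:
        for m in self.meetings:
            if m.title == title:
                self.focus = m                   # a named meeting becomes the focus
                return m
        raise KeyError(title)

    def resolve(self, phrase: str) -> Meeting:
        # Anaphoric phrases refer to whatever meeting is currently in focus.
        if phrase in ("this meeting", "that meeting", "it"):
            if self.focus is None:
                raise LookupError("no meeting in focus to refer back to")
            return self.focus
        return self.mention(phrase)

    def conflicts(self, target: Meeting, day: str, start: int, end: int) -> List[Meeting]:
        # Half-open intervals [start, end) overlap iff each starts before the other ends.
        return [m for m in self.meetings
                if m is not target and m.day == day and m.start < end and start < m.end]

    def move(self, phrase: str, day: str, start: int, end: int) -> str:
        meeting = self.resolve(phrase)
        clashes = self.conflicts(meeting, day, start, end)
        if clashes:
            # A real system would open a negotiation sub-dialogue here.
            names = ", ".join(c.title for c in clashes)
            return f"That overlaps with {names}. Should I move it anyway?"
        meeting.day, meeting.start, meeting.end = day, start, end
        return f"Moved {meeting.title} to {day}, {start}:00-{end}:00."


if __name__ == "__main__":
    cal = CalendarDialogue([Meeting("budget review", "May 24", 14, 15),
                            Meeting("team lunch", "May 24", 12, 13)])
    cal.mention("team lunch")
    print(cal.move("this meeting", "May 24", 14, 15))  # That overlaps with budget review. Should I move it anyway?
    print(cal.move("this meeting", "May 24", 15, 16))  # Moved team lunch to May 24, 15:00-16:00.
```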
Content Annotation and Retrieval
- Flexible understanding
- Data collection platform
Speech-based Photo Tagging and Retrieval
Creating (spoken annotation): "Julia with Pluto at Disney World"
Finding (spoken query): "Show me the photo of Julia and Pluto at Disney World from December of 2006."
Photo Tagger/Browser Architecture
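[Figure: photo tagger/browser architecture]
- Tagging: Verbal Annotation -> Annotation Recognizer -> Speech Hypotheses -> Annotation Indexer
- Indexing: the Annotation Indexer combines Annotation Terms with Meta-Data Terms from the photo plus meta-data (date taken, owner, etc.) into a Term Index
- Browsing: Spoken Query -> Query Recognizer -> Term Index lookup -> List of Photos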
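The Term Index in this architecture can be sketched as an inverted index that maps both annotation terms and meta-data terms to photo ids. The sketch makes two assumptions: the annotation recognizer returns several text hypotheses per annotation, and the query recognizer has already reduced a spoken query to index terms. `PhotoIndex`, the `field:value` meta-data term format, and the count-of-matches ranking are illustrative choices, not the system's documented design.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set


class PhotoIndex:
    """Inverted index from terms to photo ids (a sketch of the 'Term Index')."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, photo_id: str, hypotheses: Iterable[str],
            metadata: Mapping[str, object]) -> None:
        # Index words from every recognition hypothesis, so a photo can still be
        # found when the top hypothesis contains a recognition error.
        terms = {w for h in hypotheses for w in h.lower().split()}
        # Meta-data terms use a "field:value" form so they never collide with words.
        terms |= {f"{k}:{v}".lower() for k, v in metadata.items()}
        for term in terms:
            self.postings[term].add(photo_id)

    def search(self, query_terms: Iterable[str]) -> List[str]:
        """Rank photos by how many query terms they match (ties broken by id)."""
        scores: Dict[str, int] = defaultdict(int)
        for term in query_terms:
            for photo_id in self.postings.get(term.lower(), ()):
                scores[photo_id] += 1
        return sorted(scores, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    idx = PhotoIndex()
    idx.add("img_001", ["julia with pluto at disney world",
                        "julia with pluto at disney whirled"],
            {"month": "december", "year": 2006})
    idx.add("img_002", ["julia at the beach"], {"year": 2007})
    print(idx.search(["julia", "pluto", "disney", "year:2006"]))  # ['img_001', 'img_002']
```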
Small Platforms
- Speech recognition
- N800 infrastructure
- Future plans
Small Platform Development
- We are migrating our Galaxy spoken dialogue
components from x86 workstations to small devices
such as the N800
[Figure: Recognition, Understanding, Generation, Synthesis, Audio, and Dialogue components, with a Galaxy Proxy on each side linked via XML-RPC]
- Current progress:
- Proxies on workstation and N800 support hybrid dialogue systems
- Access to streamed audio for recording and playback on N800
- Integrated small-platform speech recognizer
- Other Galaxy messages accessible via an event-based interface
- Debian packages and Python wrappers support application development
- Prototype weather forecasts with local speech recognition
Demo
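The proxies exchange Galaxy messages over XML-RPC, and the device exposes them through an event-based interface. The sketch below shows both ideas without opening a network connection, using only Python's standard `xmlrpc.client.dumps` and `xmlrpc.client.loads`. The message name `recognition_result`, its frame fields, and the `EventProxy` class are hypothetical. They do not reproduce the actual Galaxy proxy or its Python wrappers.

```python
import xmlrpc.client
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], Any]


def encode(message_name: str, frame: Dict[str, Any]) -> str:
    """Sending side: marshal a message name plus its frame as an XML-RPC call."""
    return xmlrpc.client.dumps((frame,), methodname=message_name)


class EventProxy:
    """Receiving side: unmarshal XML-RPC calls and fire callbacks by message name."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Handler]] = {}

    def on(self, message_name: str, handler: Handler) -> None:
        self.handlers.setdefault(message_name, []).append(handler)

    def deliver(self, payload: str) -> List[Any]:
        params, message_name = xmlrpc.client.loads(payload)
        frame = params[0]
        # Messages with no registered handler are silently ignored.
        return [handler(frame) for handler in self.handlers.get(message_name, [])]


if __name__ == "__main__":
    proxy = EventProxy()
    proxy.on("recognition_result", lambda frame: f"heard: {frame['text']}")
    wire = encode("recognition_result", {"text": "what is the weather in boston", "score": 87})
    print(proxy.deliver(wire))   # ['heard: what is the weather in boston']
```

Keeping marshalling (`encode`) separate from dispatch (`deliver`) means the same handlers could be driven by a real XML-RPC server on the device. That wiring is omitted here.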
Small Platform Development: Next Steps
- Plan to port understanding and generation components
- Leverage the Nokia speech synthesis development effort, if possible
[Figure: Understanding, Generation, Recognition, Synthesis, Audio, and Dialogue components, with a Galaxy Proxy on each side linked via XML-RPC]
- Understanding involves parsing and semantic frame creation
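Parsing followed by semantic frame creation can be shown with a single hand-written rule. This is a toy stand-in for a grammar-based parser. The regular expression, the frame keys (`action`, `topic`, `day`, `with`), and `parse` are hypothetical, and the rule covers only requests shaped like "Cancel my Thursday meeting with Tom".

```python
import re
from typing import Dict, Optional

DAYS = "monday|tuesday|wednesday|thursday|friday|saturday|sunday"

# Hypothetical single rule: <action> my [<day>] meeting [with <person>]
MEETING_RULE = re.compile(
    rf"^(?P<action>cancel|show|move) my (?:(?P<day>{DAYS}) )?meeting"
    r"(?: with (?P<person>[a-z]+))?$"
)


def parse(utterance: str) -> Optional[Dict[str, object]]:
    """Parse an utterance, then build a nested semantic frame from the match."""
    match = MEETING_RULE.match(utterance.lower().strip())
    if match is None:
        return None                              # outside this toy grammar
    topic: Dict[str, str] = {"name": "meeting"}
    if match["day"]:
        topic["day"] = match["day"]
    if match["person"]:
        topic["with"] = match["person"].capitalize()
    return {"action": match["action"], "topic": topic}


if __name__ == "__main__":
    print(parse("Cancel my Thursday meeting with Tom"))
    # {'action': 'cancel', 'topic': {'name': 'meeting', 'day': 'thursday', 'with': 'Tom'}}
```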
The Next Steps
- Spoken dialogue is a viable modality for mobile devices
- A natural and efficient means of communication for small devices
- Many possible applications, on the local device or via the network
- Transferring the technology requires many resources:
- Technology development and multilingual support
- User interface developers and application integrators