Title: Media Manager Mail Access Unified Messaging
1 Media Manager Mail Access Unified Messaging
- Barbara Hohlt
- UC Berkeley
- Ericsson Presentation
- August 22, 2000
2 Messages from many sources
???
3 Project Overview
- Make messages more accessible
- Get all types of messages
- Access from different devices with different capabilities
- Enable faster browsing of many voicemails
- Media Mail services
- A unified messaging infrastructure
- Voicemail is email encoded in MIME
- Transcoding services
- Enhance voicemail interaction
- Includes skimmed audio, transcript, text/audio summary, and outline
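As an illustration of "voicemail is email encoded in MIME", here is a minimal Python sketch using the standard `email` package. The part layout (a text part plus one `audio/wav` attachment), the subject line, and the helper names `build_voicemail` and `audio_parts` are assumptions made for this example; the slides do not specify how the Media Manager lays out its messages.

```python
from email.message import EmailMessage


def build_voicemail(sender: str, recipient: str, audio: bytes,
                    transcript: str = "") -> EmailMessage:
    """Wrap a voicemail recording in a MIME email (hypothetical layout)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Voicemail"
    # Text part: a transcript or placeholder that any mail client can show.
    msg.set_content(transcript or "(voicemail attached)")
    # Audio part: the recording itself, carried as an audio/* attachment.
    msg.add_attachment(audio, maintype="audio", subtype="wav",
                       filename="voicemail.wav")
    return msg


def audio_parts(msg: EmailMessage) -> list[bytes]:
    """Return the decoded payload of every audio/* attachment."""
    return [part.get_content() for part in msg.iter_attachments()
            if part.get_content_maintype() == "audio"]
```

Because the recording travels as an ordinary MIME part, an existing mail store can hold it unchanged, and a transcoder can add further parts (transcript, summary) next to it.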
4 Related Work
- Universal Inboxes/Unified Messaging
- onebox.com
- CoolMail.net
- Lucent/Octel Unified Messenger
- Stanford Mobile People Architecture
- Audio Content Extraction Techniques
- SpeechSkimmer, MIT Media Lab [Arons95]
- Auto-Summarization, Microsoft Research
- CueVideo, IBM
5 Architecture
6 Applications
- Conventional GUIs
- Context-Aware Applications
- Iceberg Universal Inbox Component
A conventional desktop GUI can contact the Media Manager directly and request messages as text.
The Media Manager will return emails and voicemails as text.
7 Context-Aware Application
Redirection Proxy
8 Iceberg Universal Inbox
9 Architecture
(Diagram labels: Client (x3), Folder Store, Media Manager Interface, Media Manager Service, Transcoder Service)
- Transcoder Service
- Voicemail -> Text Transcript
- Voicemail -> Text Summary
- Voicemail -> Text Outline
- Email -> Plain Audio
- Email -> GSM Audio
- Voicemail -> GSM Summary
- Voicemail -> Audio Summary
- Voicemail -> Skimmed Audio
10 MediaManagerServiceIF
- getFolders( ) and getFoldersAs( )
- Given a username, returns a list of folder names
- The As variant returns the list as audio or GSM
- getList( ) and getListAs( )
- Given a username, folder name, and count
- Returns a list of messages (sender name, title, date)
- The As variant returns the list as audio or GSM
- getMessage( )
- Given a Message Ref, returns the entire message
- getMessageContent( )
- Given a Content ID and return type
- Returns one part of the message as the return type
11 Messages and Content Objects
- Media Message
- Media Reference id
- Array of Content Objects
- Content Object
- Content ID
- Data
- Content ID
- Media Reference id
- Content Part index
- Content Type
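The three structures listed above can be sketched as Python dataclasses. The field names follow the slide, but the concrete field types and the helper methods `add_part` and `part` are assumptions added for illustration, not part of the original system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContentID:
    media_ref: str      # Media Reference id of the owning message
    part_index: int     # Content Part index within that message
    content_type: str   # Content Type, e.g. "text" or "audio" (assumed values)


@dataclass
class ContentObject:
    content_id: ContentID
    data: bytes


@dataclass
class MediaMessage:
    media_ref: str
    contents: list[ContentObject] = field(default_factory=list)

    def add_part(self, content_type: str, data: bytes) -> ContentID:
        """Append a part and return the Content ID that addresses it."""
        cid = ContentID(self.media_ref, len(self.contents), content_type)
        self.contents.append(ContentObject(cid, data))
        return cid

    def part(self, cid: ContentID) -> ContentObject:
        """Look a part up by its Content ID."""
        if cid.media_ref != self.media_ref:
            raise KeyError(f"{cid} does not belong to message {self.media_ref}")
        return self.contents[cid.part_index]
```

Since a Content ID carries both the message reference and the part index, a client can hand back just the ID to request one part, without holding the whole message.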
12 Interface Example
- User asks for list of messages as GSM
- Media Manager returns a list of message headers
- Cell Phone sends a Content ID back
- Media Manager sends a voicemail Content Object
13 Audio Tools
- Speech Recognition/Synthesis
- Transcribe voicemail to text
- IBM ViaVoice SDK and custom audio libs
- Natural Language Processing
- Directed word spotting by understanding content
- ViaVoice SRCL
- Pitch
- Detecting important words by emphasized pitch
- Pause
- Compression through pause removal
- Spurts
- Retrieve sentence structure of voicemail
14 Transcoding Techniques
Transcoding                    Technique
Voicemail -> Text Transcript   Speech recognition
Voicemail -> Text Summary      NLP, pitch detection, and speech recognition
Voicemail -> Text Outline      Pause detection and speech recognition
Email -> Plain Audio           Speech synthesis
Email -> GSM Audio             Speech synthesis and toast
Voicemail -> Skimmed Audio     Pause detection
Voicemail -> Audio Summary     Text summary and speech synthesis
Voicemail -> GSM Summary       Audio summary and toast
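Several rows of this table build on earlier ones: the audio summary starts from the text summary, and the GSM summary starts from the audio summary. The Python sketch below shows that layering with placeholder stages that only record the order in which they ran. The stage names, the string-based data, and the strictly linear ordering inside the text summary are assumptions for illustration; the real transcoders operate on audio and text.

```python
from typing import Callable

Stage = Callable[[str], str]


def stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    return lambda data: f"{name}({data})"


# Placeholder stages named after the techniques in the table.
speech_recognition = stage("speech_recognition")
summarize = stage("nlp_and_pitch_summary")
speech_synthesis = stage("speech_synthesis")
toast = stage("toast")

# Each target format is an ordered list of stages; later rows reuse earlier ones.
TEXT_TRANSCRIPT = [speech_recognition]
TEXT_SUMMARY = TEXT_TRANSCRIPT + [summarize]
AUDIO_SUMMARY = TEXT_SUMMARY + [speech_synthesis]
GSM_SUMMARY = AUDIO_SUMMARY + [toast]


def run(pipeline: list[Stage], data: str) -> str:
    """Apply each stage in order and return the final result."""
    for step in pipeline:
        data = step(data)
    return data
```

Calling `run(GSM_SUMMARY, "vm")` threads the input through all four stages in table order, which is why adding a new output format often only means appending one stage to an existing pipeline.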
15 Examples
Original Voicemail
Hello, This is Barbara. How are you and the
cats doing? I was wondering if you would feed
them a little more the first time in case they
eat too much. My number is (713) 465-5155. You
can call me anytime. Have a very good holiday.
Bye bye
Processed Voicemail
(Skimmed)
(Just pitch)
(Pitch emphasized words in green)
16 Examples continued...
Original Voicemail
Faced with a seemingly inevitable engineering
task, authors tend to adopt one of two strategies
for adding new services to the Internet
landscape: inflexible, highly tuned,
hand-constructed services.
Processed Voicemail
(Skimmed)
(Just pitch)
- Faced with a seemingly inevitable engineering task authors tend to adopt what it to strategies for adding new services to the internet landscape.
- Inflexible, highly Tate, had constructed services.
(Pitch emphasized words in green)
17 Results
- Pause detection
- Worked well for given applications
- Playback speedup by 50-70%
- Pitch detection
- Problems due to high pitch sounds and transitions
- Speech recognition
- Performance decrease in conversational settings
- Natural Language Processing
- Performed well with small grammar
18 Example: Adding GSM Access
- Define specific types, i.e., GSMAudio, GSMSummary
- Optionally create new Content Objects
- Add Content Object definition to MediaManager
- Add GSM transcoder to TranscoderService
19 Detail: Adding GSM Access
- Add Content Object definition to MediaManager
- Define GSMAUDIO and GSMSUMMARY
- Add cases to createObject() in Content Object
- Add cases to Media Manager
- Add GSM to Transcoder
- Add method toGSM() to Transcoder
- Edit .config file
- External.transcoder.gsm rungsm
- Edit related transcoders
- speechSynthesizer and audioSummary()
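The steps above can be sketched in Python as follows. Only the names `GSMAUDIO`, `GSMSUMMARY`, `createObject()` and `toGSM()` come from the slides; the other type tags, the `KNOWN_TYPES` set, and passing the encoder in as a callable (instead of launching the external program named in the .config file) are assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable

# Content type tags. GSMAUDIO and GSMSUMMARY are the newly defined ones;
# the remaining tags are assumed for this sketch.
TEXT, AUDIO, AUDIOSUMMARY = "TEXT", "AUDIO", "AUDIOSUMMARY"
GSMAUDIO, GSMSUMMARY = "GSMAUDIO", "GSMSUMMARY"
KNOWN_TYPES = {TEXT, AUDIO, AUDIOSUMMARY, GSMAUDIO, GSMSUMMARY}


@dataclass
class ContentObject:
    content_type: str
    data: bytes


def create_object(content_type: str, data: bytes) -> ContentObject:
    """Stand-in for createObject(): rejects types it has no case for."""
    if content_type not in KNOWN_TYPES:
        raise ValueError(f"no Content Object case for {content_type!r}")
    return ContentObject(content_type, data)


class Transcoder:
    def __init__(self, gsm_encoder: Callable[[bytes], bytes]):
        # In the slides the encoder is an external program named in the
        # .config file; here it is any bytes -> bytes callable (assumption).
        self._gsm_encoder = gsm_encoder

    def to_gsm(self, source: ContentObject) -> ContentObject:
        """Encode AUDIO as GSMAUDIO and AUDIOSUMMARY as GSMSUMMARY."""
        targets = {AUDIO: GSMAUDIO, AUDIOSUMMARY: GSMSUMMARY}
        if source.content_type not in targets:
            raise ValueError(f"cannot encode {source.content_type!r} as GSM")
        return create_object(targets[source.content_type],
                             self._gsm_encoder(source.data))
```

Keeping the encoder behind a callable means the same `to_gsm` logic can be exercised with a fake encoder in tests and with the real external tool in deployment.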
20 Implementing Other Mail Stores
- Examples: IMAP, POP, Microsoft Exchange Server
- Implement MailAccessIF
- String getMAFolders( userName )
- MediaMessage getMAList( userName, folderName, count )
- MediaMessage getMAMessage( MediaRef )
- ContentObject getMAMessageContent( ContentID )
- Add new protocol to Media Manager protocol table
- Optionally add protocol for users into FolderStore
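A minimal Python sketch of how a new mail store might plug in, assuming the protocol table maps a protocol name to a store object and the FolderStore maps each user to a protocol name. Only two of the four MailAccessIF methods are shown. The list return types, the in-memory store, and the `MediaManager` method names are assumptions for illustration.

```python
from abc import ABC, abstractmethod


class MailAccess(ABC):
    """Python stand-in for MailAccessIF (two of its four methods)."""

    @abstractmethod
    def get_ma_folders(self, user_name: str) -> list[str]:
        """Return the folder names for a user."""

    @abstractmethod
    def get_ma_list(self, user_name: str, folder_name: str,
                    count: int) -> list[str]:
        """Return up to `count` message titles from a folder."""


class InMemoryStore(MailAccess):
    """Toy mail store standing in for an IMAP or POP backend."""

    def __init__(self, mailboxes: dict[str, dict[str, list[str]]]):
        self._mailboxes = mailboxes  # user -> folder -> message titles

    def get_ma_folders(self, user_name: str) -> list[str]:
        return sorted(self._mailboxes[user_name])

    def get_ma_list(self, user_name: str, folder_name: str,
                    count: int) -> list[str]:
        return self._mailboxes[user_name][folder_name][:count]


class MediaManager:
    def __init__(self) -> None:
        self._protocol_table: dict[str, MailAccess] = {}
        self._folder_store: dict[str, str] = {}  # user -> protocol name

    def add_protocol(self, name: str, store: MailAccess) -> None:
        self._protocol_table[name] = store

    def add_user(self, user_name: str, protocol: str) -> None:
        if protocol not in self._protocol_table:
            raise KeyError(f"unknown protocol: {protocol}")
        self._folder_store[user_name] = protocol

    def _store_for(self, user_name: str) -> MailAccess:
        return self._protocol_table[self._folder_store[user_name]]

    def get_folders(self, user_name: str) -> list[str]:
        return self._store_for(user_name).get_ma_folders(user_name)

    def get_list(self, user_name: str, folder_name: str,
                 count: int) -> list[str]:
        return self._store_for(user_name).get_ma_list(
            user_name, folder_name, count)
```

With this shape, supporting another mail store means writing one more `MailAccess` subclass and registering it; client-facing calls such as `get_folders` stay unchanged.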
21 Conclusion
- Overall
- System useful as navigational hints
- To achieve total comprehension, need better voice recognition
- What works well
- Skimming using pause removal
- Detecting spurts for structure
- What needs work
- Speech detection in conversational settings
- Pitch emphasis needs refining
- Future Directions
- Implementing more mail stores
- Enhancing interfaces
- Pause detection/word boundaries using speech detection
- Developing voicemail grammars
- Using NLP feedback with pitch emphasis detection
- Improved speech detection in noisy environments
23 MediaManagerServiceIF
- String getFolders( userName )
- byte getFoldersAs( userName, returnType )
- MediaMessage getList( userName, folderName, count )
- byte getListAs( userName, folderName, count, returnType )
- MediaMessage getMessage( MediaRef )
- ContentObject getMessageContent( ContentID, returnType )
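The relationship between a plain call and its "As" variant can be sketched in Python for the folder pair. The sketch assumes that `getFoldersAs` produces the same list as `getFolders` and then renders it through a transcoder chosen by `returnType`. The list and bytes return types, the `renderers` mapping, and the comma-joined text are assumptions; the slides give only the signatures.

```python
from typing import Callable

# Maps a return type name to a function that renders text as bytes
# in that format (hypothetical stand-in for the Transcoder Service).
Renderer = Callable[[str], bytes]


class MediaManagerService:
    def __init__(self, folders: dict[str, list[str]],
                 renderers: dict[str, Renderer]):
        self._folders = folders      # user name -> folder names
        self._renderers = renderers  # return type -> renderer

    def get_folders(self, user_name: str) -> list[str]:
        """Return the user's folder names as text."""
        return list(self._folders[user_name])

    def get_folders_as(self, user_name: str, return_type: str) -> bytes:
        """Same list as get_folders(), rendered in the requested format."""
        try:
            render = self._renderers[return_type]
        except KeyError:
            raise ValueError(
                f"unsupported return type: {return_type}") from None
        return render(", ".join(self.get_folders(user_name)))
```

A phone client would call `get_folders_as(user, "gsm")` and play the returned bytes, while a desktop client would call `get_folders(user)` and display the names directly.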
24 Pitch Detection
- The Idea
- A speaker's pitch naturally changes when introducing topics or emphasizing words [Hirshberg92]
- Use pitch increases as hints for important words
- Algorithm [Arons95]
- Determine pitch for each 20 ms frame (FFT with SHS)
- Set emphasis threshold to be top 1% of pitch values (by histogram)
- Mark a 1 s interval as emphasized if it contains more than 3 emphasized frames
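The last two steps of the algorithm can be sketched in Python, taking a per-frame pitch track as given. Pitch estimation itself (FFT with SHS) is out of scope here. The function name, the convention that a pitch of 0 marks an unvoiced frame, and the use of a sorted list in place of a histogram are assumptions for this example.

```python
def emphasized_intervals(pitch_hz: list[float], frame_ms: int = 20,
                         top_percent: int = 1,
                         min_frames: int = 3) -> list[int]:
    """Return indices of 1 s intervals with more than `min_frames`
    emphasized frames."""
    # Assumption: a pitch of 0 marks an unvoiced frame and is ignored.
    voiced = sorted(p for p in pitch_hz if p > 0)
    if not voiced:
        return []
    # Threshold such that about the top 1% of voiced values reach it.
    k = max(1, len(voiced) * top_percent // 100)
    threshold = voiced[-k]
    frames_per_interval = 1000 // frame_ms  # 50 frames of 20 ms per second
    hits = []
    for start in range(0, len(pitch_hz), frames_per_interval):
        window = pitch_hz[start:start + frames_per_interval]
        count = sum(1 for p in window if p >= threshold)
        if count > min_frames:
            hits.append(start // frames_per_interval)
    return hits
```

Note that a perfectly flat pitch track would put every frame at the threshold, so a practical version would need an extra guard for ties; this sketch leaves that out.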
25 Pause Detection
- Why is pause detection useful?
- Removing pauses speeds up playback
- Typically, 50-70% of original time [Foulke71]
- Long pauses signify groups (talk spurts)
- Noise and soft sounds create difficulties
- Algorithm: Smoothed Histogram [Lamel81]
- Calculate energy per 10 ms frame
- Threshold based on smoothed histogram (5 dB after first peak)
- Use heuristics to remove artifacts
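A minimal Python sketch of the three algorithm steps, under stated assumptions: a 1 dB histogram bin width, a 3-bin moving average for smoothing, and a single artifact heuristic that treats quiet runs shorter than `min_pause_frames` as speech. None of these parameters or function names come from the slides or from [Lamel81]; they are placeholders to make the idea concrete.

```python
import math


def frame_energies_db(samples: list[float], rate: int,
                      frame_ms: int = 10) -> list[float]:
    """Mean energy of each complete frame, in dB."""
    n = rate * frame_ms // 1000
    out = []
    for i in range(0, len(samples) - n + 1, n):
        power = sum(s * s for s in samples[i:i + n]) / n
        out.append(10 * math.log10(power + 1e-12))  # offset avoids log(0)
    return out


def pause_threshold_db(energies_db: list[float], bin_db: float = 1.0,
                       margin_db: float = 5.0) -> float:
    """First peak of the smoothed energy histogram plus margin_db."""
    lo = min(energies_db)
    nbins = int((max(energies_db) - lo) / bin_db) + 1
    hist = [0] * nbins
    for e in energies_db:
        hist[int((e - lo) / bin_db)] += 1
    # 3-bin moving average; bins outside the histogram count as zero.
    padded = [0] + hist + [0]
    smooth = [(padded[i] + padded[i + 1] + padded[i + 2]) / 3
              for i in range(nbins)]
    # First local maximum, scanning upwards from the quietest bin.
    for i in range(nbins):
        left = smooth[i - 1] if i > 0 else 0
        right = smooth[i + 1] if i + 1 < nbins else 0
        if smooth[i] > 0 and smooth[i] >= left and smooth[i] > right:
            return lo + i * bin_db + margin_db
    return lo + margin_db  # defensive fallback


def label_pauses(energies_db: list[float],
                 min_pause_frames: int = 10) -> list[bool]:
    """True for frames inside a pause; short quiet runs stay speech."""
    threshold = pause_threshold_db(energies_db)
    quiet = [e < threshold for e in energies_db]
    labels = [False] * len(quiet)
    i = 0
    while i < len(quiet):
        if not quiet[i]:
            i += 1
            continue
        j = i
        while j < len(quiet) and quiet[j]:
            j += 1
        if j - i >= min_pause_frames:  # heuristic: drop very short gaps
            labels[i:j] = [True] * (j - i)
        i = j
    return labels
```

Dropping the frames labeled `True` gives the skimmed audio, and the boundaries of long pauses are natural candidates for talk-spurt boundaries.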
28 Works Cited
- [Arons95] B. Arons. Interactively Skimming Recorded Speech. Ph.D. dissertation, MIT, 1985.
- [Foulke71] E. Foulke. The Perception of Time Compressed Speech. Ch. 4 in Perception of Language, edited by P.M. Kjeldergaard, D.L. Horton, and J.J. Jenkins. Charles E. Merrill Publishing Company, 1971, pp. 79-107.
- [Hirshberg92] J. Hirschberg and B. Grosz. Intonational Features of Local and Global Discourse. In Proceedings of the Speech and Natural Language Workshop (Harriman, NY, Feb. 23-26). Morgan Kaufmann Publishers, 1992, pp. 441-446.
- [Lamel81] L.F. Lamel, L.R. Rabiner, A.E. Rosenberg, and J.G. Wilpon. An Improved Endpoint Detector for Isolated Word Recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-29, 4 (Aug. 1981), 771-785.