Automatic Content Extraction for Voicemail Using Ninja - PowerPoint PPT Presentation

1
Automatic Content Extraction for Voicemail Using Ninja
Steven Czerwinski and Barbara Hohlt
Goal: Make voicemail more accessible
  • Enable faster browsing of many voicemails
  • Access from different devices with different
    capabilities

Our solution
  • Use Ninja infrastructure
  • Augment existing Ninja email services to treat
    voicemails as MIME-encoded email
  • Create Ninja services to process and interpret
    voicemails
  • Specialized transcoding services
    • Extract high-level information from voicemails
    • Outputs include audio, skimmed audio, transcript,
      text/audio summary, and outline
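Treating a voicemail as MIME-encoded email amounts to wrapping the recording as an audio attachment next to a text part. The minimal sketch below uses Python's standard `email` package; the function name `voicemail_to_email`, the WAV format, the addresses, and the choice of putting a transcript in the text part are illustrative assumptions, not details of the Ninja implementation.

```python
from email.message import EmailMessage


def voicemail_to_email(audio: bytes, caller: str, mailbox: str,
                       transcript: str = "") -> EmailMessage:
    """Hypothetical helper: package a voicemail recording as a MIME email."""
    msg = EmailMessage()
    msg["From"] = caller
    msg["To"] = mailbox
    msg["Subject"] = f"Voicemail from {caller}"
    # Text part: lets text-only clients show something useful.
    msg.set_content(transcript or "(no transcript available)")
    # Audio part: the raw recording; the email package base64-encodes it and
    # converts the message to multipart/mixed.
    msg.add_attachment(audio, maintype="audio", subtype="wav",
                       filename="voicemail.wav")
    return msg
```

A client that cannot play audio can fall back to the text part, while an audio-capable client plays the attachment; existing mail services then store and deliver the voicemail like any other message.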

2
Voicemail Processing Techniques
  • Speech recognition/synthesis
    • Transcribe voicemail to text
    • IBM ViaVoice SDK and custom audio libraries
  • Natural language processing
    • Directed word spotting by understanding content
    • ViaVoice SRCL
  • Pitch
    • Detect important words by their emphasized pitch
  • Pause
    • Compress audio by removing pauses
  • Spurts
    • Recover the sentence structure of the voicemail
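Directed word spotting looks only for a small set of expected items instead of transcribing every word. The project did this with ViaVoice SRCL; as a rough, text-only stand-in (not the SRCL approach), the hypothetical sketch below spots a phone number and a few assumed cue words in an already transcribed message using regular expressions. The keyword list and the US-style phone pattern are assumptions.

```python
import re
from typing import Dict, List

# Hypothetical targets: a US-style phone number and a few cue words.
PHONE = re.compile(r"\(?\b\d{3}\)?[ -]?\d{3}-\d{4}\b")
KEYWORDS = ("call", "number", "urgent", "tomorrow")


def spot(transcript: str) -> Dict[str, List[str]]:
    """Return only the targeted items found in a transcript, ignoring the rest."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    return {
        "phone_numbers": PHONE.findall(transcript),
        "keywords": [k for k in KEYWORDS if k in words],
    }


print(spot("My number is (713) 465-5155. You can call me anytime."))
```

Because only a few targets are searched for, misrecognized filler words elsewhere in the message do not affect the result.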

3
Architecture
(Diagram not reproduced. Components shown: clients, transcoder service, folder store, and mail access interfaces for NinjaMail, POP, and IMAP.)
Transcoder service conversions:
  • Voicemail -> Text Transcript
  • Voicemail -> Text Summary
  • Voicemail -> Text Outline
  • Email -> Plain Audio
  • Voicemail -> Audio Summary
  • Voicemail -> Skimmed Audio
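One way to organize the transcoder service's conversions is a registry keyed by (source type, target type), so a client can request whichever output its device supports. The sketch below is hypothetical: the type names, `register`, `transcode`, and `pick_target` are illustrative, and the registered functions are placeholder stubs standing in for the real speech processing.

```python
from typing import Callable, Dict, List, Optional, Tuple

Transcoder = Callable[[bytes], bytes]
# Hypothetical registry: (source type, target type) -> conversion function.
REGISTRY: Dict[Tuple[str, str], Transcoder] = {}


def register(src: str, dst: str) -> Callable[[Transcoder], Transcoder]:
    """Decorator that records a conversion from src to dst."""
    def wrap(fn: Transcoder) -> Transcoder:
        REGISTRY[(src, dst)] = fn
        return fn
    return wrap


def transcode(data: bytes, src: str, dst: str) -> bytes:
    fn = REGISTRY.get((src, dst))
    if fn is None:
        raise ValueError(f"no transcoder for {src} -> {dst}")
    return fn(data)


def pick_target(src: str, accepted: List[str]) -> Optional[str]:
    """First type, in the client's preference order, that src can be converted to."""
    return next((dst for dst in accepted if (src, dst) in REGISTRY), None)


# Placeholder stubs; real versions would run recognition, pause removal, etc.
@register("voicemail", "text/transcript")
def _transcript(audio: bytes) -> bytes:
    return b"[transcript]"


@register("voicemail", "audio/skimmed")
def _skim(audio: bytes) -> bytes:
    return audio  # stub: a real skimmer would drop pauses
```

A text-only device would list text types first in `accepted`, while a phone might list audio types first; the service itself does not need to know about device classes.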
4
System Components
  • Media manager service
    • Ninja iSpace service that handles email,
      voicemail encoded in MIME, and specialized
      transcoding of voicemail
    • Users can access all their mail across different
      mail protocols with different types of devices
  • Transcoder service
    • Ninja iSpace service that transforms data
  • Folder store
    • Stores user protocol information
  • Mail access interface
    • A common interface to generalize access to
      different mail protocols
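The mail access interface hides protocol differences behind one set of calls, and the folder store records which protocol each user's mail uses. A minimal sketch of that split, assuming a two-method interface (`list_ids`, `fetch`) and an in-memory backend for testing; all names are hypothetical, and real backends would wrap POP, IMAP, or NinjaMail.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MailAccess(ABC):
    """Hypothetical protocol-neutral mail interface."""

    @abstractmethod
    def list_ids(self) -> List[str]:
        """Identifiers of the messages in the user's mailbox."""

    @abstractmethod
    def fetch(self, msg_id: str) -> bytes:
        """Raw bytes of one message."""


class InMemoryMail(MailAccess):
    """Test backend; a real backend would wrap poplib, imaplib, or NinjaMail."""

    def __init__(self, messages: Dict[str, bytes]) -> None:
        self._messages = messages

    def list_ids(self) -> List[str]:
        return sorted(self._messages)

    def fetch(self, msg_id: str) -> bytes:
        return self._messages[msg_id]


def open_mailbox(user: str, folder_store: Dict[str, str],
                 backends: Dict[str, MailAccess]) -> MailAccess:
    """Look up the user's protocol in the folder store and return that backend."""
    return backends[folder_store[user]]


def fetch_all(box: MailAccess) -> List[bytes]:
    # Callers never see which protocol is underneath.
    return [box.fetch(i) for i in box.list_ids()]
```

Adding a new protocol then means writing one more `MailAccess` subclass; the transcoding and client code stay unchanged.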

5
Pitch Detection
  • The idea
    • A speaker's pitch naturally rises when
      introducing topics or emphasizing words
      [hirshberg92]
    • Use pitch increases as hints for important
      words
  • Algorithm [aaron95]
    • Determine the pitch of each 20 ms frame (FFT with
      subharmonic summation, SHS)
    • Set the emphasis threshold at the top 1% of pitch
      values (by histogram)
    • Mark a 1 s interval as emphasized if it contains
      more than 3 emphasized frames
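The thresholding and interval-marking steps of the pitch algorithm can be sketched as below. This is a minimal sketch that assumes a separate pitch tracker (for example, FFT with SHS, not shown) has already produced one value per 20 ms frame, with `None` for unvoiced frames; it uses a sorted-value percentile in place of an explicit histogram, and the function names are hypothetical.

```python
from typing import List, Optional

FRAME_MS = 20                      # pitch frame length from the slide
FRAMES_PER_SEC = 1000 // FRAME_MS  # 50 frames per 1 s interval


def emphasis_threshold(pitches: List[Optional[float]],
                       top_fraction: float = 0.01) -> float:
    """Pitch value at which the top `top_fraction` of voiced frames begins."""
    voiced = sorted(p for p in pitches if p is not None)
    if not voiced:
        raise ValueError("no voiced frames")
    # Sorting stands in for a cumulative histogram: index k starts the top 1%.
    k = min(len(voiced) - 1, int(len(voiced) * (1 - top_fraction)))
    return voiced[k]


def emphasized_intervals(pitches: List[Optional[float]],
                         top_fraction: float = 0.01,
                         min_frames: int = 3) -> List[bool]:
    """One flag per 1 s interval: True if it has more than min_frames emphasized frames."""
    threshold = emphasis_threshold(pitches, top_fraction)
    flags = []
    for start in range(0, len(pitches), FRAMES_PER_SEC):
        window = pitches[start:start + FRAMES_PER_SEC]
        hits = sum(1 for p in window if p is not None and p >= threshold)
        flags.append(hits > min_frames)
    return flags
```

Counting frames per interval, rather than flagging single frames, means an isolated pitch spike cannot mark an interval on its own.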

6
Pause Detection
  • Why is pause detection useful?
    • Removing pauses speeds up playback
      • Typically 50-70% of the original time [foulke71]
    • Long pauses signify groups (talk spurts)
  • Noise and soft sounds create difficulties
  • Algorithm: smoothed histogram [lamet81]
    • Calculate energy per 10 ms frame
    • Set the threshold from the smoothed histogram
      (5 dB above the first peak)
    • Use heuristics to remove artifacts
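The pause-detection steps can be sketched as follows, under stated assumptions: samples are floats at a known rate, the histogram uses 1 dB bins smoothed by a moving sum over neighboring bins, and the unspecified "heuristics" are replaced by two hypothetical rules (speech runs under 3 frames are treated as clicks; pauses of at least 15 frames, 150 ms, separate talk spurts). Only the 10 ms frame and the 5 dB margin come from the slides; all function names are illustrative.

```python
import math
from typing import List, Tuple

FRAME_MS = 10  # energy frame length from the slide


def frame_energies_db(samples: List[float], rate: int) -> List[float]:
    """Mean-square energy of each full 10 ms frame, in dB (small floor avoids log(0))."""
    n = rate * FRAME_MS // 1000
    return [10 * math.log10(sum(s * s for s in samples[i:i + n]) / n + 1e-10)
            for i in range(0, len(samples) - n + 1, n)]


def silence_threshold(db: List[float], margin_db: float = 5.0, radius: int = 2) -> float:
    """First peak of a smoothed 1 dB energy histogram, plus margin_db."""
    lo = math.floor(min(db))
    hist = [0] * (math.floor(max(db)) - lo + 1)
    for e in db:
        hist[math.floor(e) - lo] += 1
    # Smooth with a moving sum over 2*radius+1 bins.
    sm = [sum(hist[max(0, i - radius):i + radius + 1]) for i in range(len(hist))]
    # First local maximum from the quiet end = background-noise peak.
    for i, v in enumerate(sm):
        left = sm[i - 1] if i > 0 else -1
        right = sm[i + 1] if i + 1 < len(sm) else -1
        if v > 0 and v >= left and v >= right:
            return lo + i + margin_db  # lower edge of the peak bin
    return lo + margin_db  # not reached for non-empty input


def speech_mask(db: List[float], threshold: float, min_run: int = 3) -> List[bool]:
    """True for speech frames; speech runs shorter than min_run frames are dropped as clicks."""
    mask = [e > threshold for e in db]
    i = 0
    while i < len(mask):
        j = i
        while j < len(mask) and mask[j] == mask[i]:
            j += 1
        if mask[i] and j - i < min_run:
            mask[i:j] = [False] * (j - i)
        i = j
    return mask


def talk_spurts(mask: List[bool], min_pause: int = 15) -> List[Tuple[int, int]]:
    """(start, end) frame ranges of speech separated by pauses of at least min_pause frames."""
    spurts: List[Tuple[int, int]] = []
    start, end, gap = None, 0, 0
    for i, speech in enumerate(mask):
        if speech:
            if start is None:
                start = i
            end, gap = i + 1, 0
        elif start is not None:
            gap += 1
            if gap >= min_pause:
                spurts.append((start, end))
                start = None
    if start is not None:
        spurts.append((start, end))
    return spurts


def remove_pauses(samples: List[float], rate: int, mask: List[bool]) -> List[float]:
    """Keep only the samples of speech frames (the skimmed audio)."""
    n = rate * FRAME_MS // 1000
    return [x for i, speech in enumerate(mask) if speech for x in samples[i * n:(i + 1) * n]]
```

The same speech mask feeds both uses on the slide: dropping non-speech frames gives the skimmed audio, and grouping speech across long pauses gives the talk spurts.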

7
Examples
Original Voicemail
Hello, this is Barbara. How are you and the
cats doing? I was wondering if you would feed
them a little more the first time in case they
eat too much. My number is (713) 465-5155. You
can call me anytime. Have a very good holiday.
Bye bye
Processed Voicemail
(Audio clips in the original slide: skimmed; pitch only)
Transcribed talk spurts (speech recognizer output)
  • Phyllis Barbara
  • Area in the cat staring
  • And then if you run but feed them
  • A little more the first time in case they eat too
    much
  • On my number is (713) 465-5155
  • You can call me anytime.
  • Have every holiday
  • Of light

(In the original slide, pitch-emphasized words were highlighted in green.)
8
Results
  • Pause detection
    • Worked well for the target applications
    • Playback speedup of 50-70%
  • Pitch detection
    • Problems with high-pitched sounds and transitions
  • Speech recognition
    • Performance decreased in conversational settings
  • Natural language processing
    • Performed well with a small grammar

9
Conclusion
  • Overall
    • The system is useful for navigational hints
    • Full comprehension requires better speech
      recognition
  • What works well
    • Skimming by pause removal
    • Detecting talk spurts to recover structure
  • What needs work
    • Speech recognition in conversational settings
    • Pitch emphasis detection needs refinement
  • Future directions
    • Pause detection/word boundaries using speech
      detection
    • Developing voicemail grammars
    • Using NLP feedback with pitch emphasis detection
    • Improved speech detection in noisy environments