Technical Aspects of the CALO Recorder

About This Presentation

Title:

Technical Aspects of the CALO Recorder

Description:

One of the components of CAMPER. The four: CALO recorder. Speechalizer. End-pointing Information ... Several processing tasks need to run concurrently. VU meter ...

Slides: 20

Provided by: csC76

Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

1
Technical Aspects of the CALO Recorder

By
Satanjeev Banerjee
Thomas Quisel
Jason Cohen
Arthur Chan
Yitao Sun
David Huggins-Daines
Alex Rudnicky

2
Role of the CALO recorder

A centralized mechanism to collect all perceptual events.
Speech, Text
CMU provides technology for
Event Recording
Speech Recognition

3
Role of the CALO Recorder

One of the components of CAMPER
The four components
CALO recorder
Speechalizer
End-pointing Information
Prosodic Information
Speech Recognition
CAMSeg
Speech Segmentation
Understanding

4
An Architecture Diagram (Client Side)
Audio Capturing
Text Capturing through Keyboard
Other Events
Ring Buffers
End-Pointer
VU Meter
Speech Decoder
Storage
5
Persistence of Data

Background Intelligent Transfer Service (BITS)
Used to transfer data off-line

6
Technical Challenges in the Recorder

Threading
Audio Buffering
Time-synchronization
Real-time processing
End-pointing
Speech processing
Portability
Maintenance/Distribution

7
Threading

Several processing tasks need to run concurrently
VU meter
Speech Processing and Higher-level Understanding
Graphical User Interface
Considerable development time was invested in making the communication between threads correct.
(By Thomas Quisel) See the architecture diagram on the next slide.
Example issue: on some platforms, the WX implementation does not allow threads other than the GUI thread to call its drawing functions.
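
One common way to work around a GUI toolkit that only lets its own thread draw is to have worker threads queue their drawing requests and let the GUI thread execute them. The sketch below illustrates that pattern only; it is not taken from the CALO recorder, it uses no real GUI toolkit, and the names UiDispatcher, post, drain, draw_vu_level and demo are hypothetical.

```python
import queue
import threading


class UiDispatcher:
    """Collects drawing requests from any thread; runs them on the owner thread."""

    def __init__(self):
        self._pending = queue.Queue()          # thread-safe FIFO of (func, args)
        self._owner = threading.get_ident()    # thread that created the dispatcher

    def post(self, func, *args):
        """Safe to call from any thread: only enqueues, never draws."""
        self._pending.put((func, args))

    def drain(self):
        """Run all queued requests; must be called from the owner (GUI) thread."""
        if threading.get_ident() != self._owner:
            raise RuntimeError("drain() must run on the GUI thread")
        handled = 0
        while True:
            try:
                func, args = self._pending.get_nowait()
            except queue.Empty:
                return handled
            func(*args)
            handled += 1


def demo():
    """A worker thread posts VU-meter levels; the main thread does the drawing."""
    drawn = []

    def draw_vu_level(level):
        # Stand-in for a real drawing call; records which thread executed it.
        drawn.append((level, threading.get_ident()))

    dispatcher = UiDispatcher()

    def worker():
        for level in (0.1, 0.5, 0.9):
            dispatcher.post(draw_vu_level, level)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    dispatcher.drain()
    return drawn
```

In a real toolkit, drain() would be driven by the event loop (for example from a timer or idle handler) rather than called by hand.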

8
(No Transcript)
9
Audio Buffering

Sphinx 2 and 3.X libaudio require the caller to
Capture audio
Do processing on the audio buffer.
If the processing thread is even slightly slower than real time (1xRT), audio will be lost.
(By Jason Cohen) A ring buffer structure is implemented.
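
A ring buffer gives the processing thread some slack: the capture thread keeps writing frames while the processing thread catches up. The following is a minimal sketch of the idea, not the recorder's actual implementation. The class name AudioRingBuffer, the frame-based interface, and the policy of overwriting the oldest frame when full are all assumptions made for illustration.

```python
import threading


class AudioRingBuffer:
    """Fixed-capacity circular buffer shared by a capture and a processing thread."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._capacity = capacity
        self._head = 0        # index of the oldest unread frame
        self._count = 0       # number of unread frames
        self.dropped = 0      # frames overwritten before they were read
        self._cond = threading.Condition()

    def write(self, frame):
        """Called by the capture thread; never blocks on the reader."""
        with self._cond:
            tail = (self._head + self._count) % self._capacity
            self._slots[tail] = frame
            if self._count == self._capacity:
                # Buffer was full: the oldest frame was just overwritten.
                self._head = (self._head + 1) % self._capacity
                self.dropped += 1
            else:
                self._count += 1
            self._cond.notify()

    def read(self, timeout=None):
        """Called by the processing thread; returns None if nothing arrives in time."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._count > 0, timeout):
                return None
            frame = self._slots[self._head]
            self._slots[self._head] = None
            self._head = (self._head + 1) % self._capacity
            self._count -= 1
            return frame
```

With enough capacity, short bursts of slow processing cost nothing; the dropped counter only grows when the reader falls behind by more than the whole buffer.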

10
Time Synchronization

By David Huggins-Daines
Simple NTP (SNTP) is used to get Coordinated Universal Time (UTC) from an arbitrary NTP server
Clone of standard NTP implementation
Internal Synchronization
Synchronization time between machines: 50-60 ms
The major challenge is the delay imposed by the OS and the audio capturing software.
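
For reference, the standard SNTP on-wire calculation estimates the local clock offset from four timestamps taken during one request/response exchange. The sketch below shows only that arithmetic; the slides do not say how the recorder computes or applies the offset, and the function name and example values are hypothetical.

```python
def sntp_offset_and_delay(t0, t1, t2, t3):
    """Standard SNTP clock-offset and round-trip-delay estimate, in seconds.

    t0: client clock when the request was sent
    t1: server clock when the request was received
    t2: server clock when the reply was sent
    t3: client clock when the reply was received
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0   # add this to the client clock
    delay = (t3 - t0) - (t2 - t1)            # network round trip, minus server time
    return offset, delay


# Example: client clock is 50 ms behind the server, 10 ms each way on the
# network, 1 ms of server processing.
offset, delay = sntp_offset_and_delay(10.000, 10.060, 10.061, 10.021)
```

The estimate assumes the outbound and return network delays are equal. Delays added by the OS or the audio capture path are outside this exchange and are not corrected by it.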

11
Real-time Processing

Role of End-pointing and Recognition
After a long debate, a two-stage end-pointing and recognition architecture was chosen
By Ziad
A high-performance end-pointing routine was created
Gaussian Mixture Model-based
End-pointer implemented as a frame voter within segments
The parameters were further tuned manually.
Speed optimized.
Now in s3ep, a customized version of Sphinx
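
The idea of a GMM-based end-pointer that votes over frames within a segment can be sketched as follows. This is an illustration under stated assumptions, not the s3ep code: it uses a single 1-D feature per frame, two toy GMMs (speech and silence), fixed-length segments, and a simple majority threshold. All function names and parameters are hypothetical.

```python
import math


def gmm_log_likelihood(x, components):
    """Log-likelihood of scalar x under a 1-D GMM.

    components: list of (weight, mean, variance) tuples.
    """
    terms = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for w, mean, var in components
    ]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify_frames(features, speech_gmm, silence_gmm):
    """Label each frame True (speech) if the speech model is more likely."""
    return [
        gmm_log_likelihood(x, speech_gmm) > gmm_log_likelihood(x, silence_gmm)
        for x in features
    ]


def vote_segments(frame_is_speech, segment_len, min_speech_fraction=0.5):
    """Majority vote over fixed-length segments of frame labels."""
    decisions = []
    for start in range(0, len(frame_is_speech), segment_len):
        segment = frame_is_speech[start:start + segment_len]
        decisions.append(sum(segment) / len(segment) >= min_speech_fraction)
    return decisions


# Toy models: speech frames cluster near 10, silence frames near 2.
speech_gmm = [(1.0, 10.0, 4.0)]
silence_gmm = [(1.0, 2.0, 4.0)]
features = [1, 2, 11, 3, 9, 10, 2, 12, 1, 2, 3]
labels = classify_frames(features, speech_gmm, silence_gmm)
segments = vote_segments(labels, segment_len=4)
```

Voting over a segment smooths out isolated frame errors, so a single misclassified frame does not start or end an utterance. The threshold and segment length stand in for the kind of parameters the slide describes as manually tuned.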

12
(No Transcript)
13
Speech Recognizer

The resulting output is fed to the recognizer
Speech recognition in meetings
Regarded as one of the biggest challenges in the field
Results vary widely with meeting style, number of attendees, topics, and speaker disfluencies.

14
Accuracy Performance, still under heavy work,
Currently