Title: Sphinx 3.4 Development Progress Report in February
1. Sphinx 3.4 Development Progress Report in February
- Arthur Chan, Jahanzeb Sherwani
- Carnegie Mellon University
- Mar 1, 2004
2. This Presentation
- S3.4 Development Progress
- Speed-up
- Language Model facilities
- CALO and S3.5 Development
- Which features should be added to better support CALO?
- Schedule for the next three months
3. Review of Last Month's Progress
- Last month
- Wrote a speed-up version of s3.
- Completed some coding of the s3.4 speed-up task.
- This month
- Backbone of the s3.4 speed-up functionality completed and tested.
- Basic LM facilities completed and smoke-tested.
4. Current System Specifications (without Gaussian Selection)
5. Speed-up Facilities in s3.3

Search
- Lexicon Structure: Tree
- Pruning: Standard
- Heuristic Search Speed-up: Not implemented

GMM Computation
- Frame-Level: Not implemented
- Senone-Level: Not implemented
- Gaussian-Level: SVQ-based GMM Selection (number of sub-vectors constrained to 3)
- Component-Level: SVQ code removed
6. Speed-up Facilities in s3.4

Search
- Lexicon Structure: Tree
- Pruning: (New) Improved Word-end Pruning
- Heuristic Search Speed-up: (New) Phoneme Look-ahead

GMM Computation
- Frame-Level: (New) Naïve Down-Sampling; (New) Conditional Down-Sampling
- Senone-Level: (New) CI-based GMM Selection (sketched below)
- Gaussian-Level: (New) VQ-based GMM Selection; (New) Unconstrained number of sub-vectors in SVQ-based GMM Selection
- Component-Level: (New) SVQ code enabled
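To make the senone-level technique above concrete, here is a minimal C sketch of CI-based GMM selection. All structure, field, and function names (senone_map_t, ci_gmm_eval, cd_gmm_eval, cd2ci, ci_beam) are assumptions for illustration, not the actual s3.4 interfaces. The idea: the cheap context-independent (CI) senones are scored every frame, and a context-dependent (CD) senone only receives a full GMM evaluation when its base CI senone scores within a beam of the best CI score; otherwise the CI score is reused.

#include <float.h>

#define MAX_CI 512            /* illustrative upper bound on CI senones */

typedef struct {
    int  n_sen;               /* number of context-dependent (CD) senones */
    int  n_ci;                /* number of context-independent (CI) senones */
    int *cd2ci;               /* base CI senone for each CD senone */
} senone_map_t;

/* Assumed to exist elsewhere: full GMM evaluation of one senone,
 * returning a log likelihood (higher is better). */
extern float ci_gmm_eval(int ci_id, const float *feat);
extern float cd_gmm_eval(int cd_id, const float *feat);

void ci_gmm_select(const senone_map_t *map, const float *feat,
                   float ci_beam, float *senscore)
{
    float ciscore[MAX_CI];
    float best_ci = -FLT_MAX;
    int ci, sen;

    /* Pass 1: score every CI senone (cheap; only a few hundred GMMs). */
    for (ci = 0; ci < map->n_ci; ci++) {
        ciscore[ci] = ci_gmm_eval(ci, feat);
        if (ciscore[ci] > best_ci)
            best_ci = ciscore[ci];
    }

    /* Pass 2: a CD senone gets a full GMM evaluation only if its base CI
     * senone falls within the beam of the best CI score; otherwise the
     * CI score is reused as an approximation. */
    for (sen = 0; sen < map->n_sen; sen++) {
        ci = map->cd2ci[sen];
        senscore[sen] = (ciscore[ci] > best_ci - ci_beam)
            ? cd_gmm_eval(sen, feat)
            : ciscore[ci];
    }
}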
7. S3.4 Speed Performance in Communicator Task
8. Issues in Speed Optimization
- Implementation Issues
- Beams applied on GMM computation make many techniques hard to implement
- Some facilities were hardwired for specific purposes
- Performance Issues
- Each technique reduced computation by 40-50% with <5% degradation
- However, the savings did not add up: the techniques largely remove the same computation, so combining two 40-50% reductions falls well short of 90%
- Reduction in computation has a lower bound (usually 75-80% reduction is the maximum)
- Overhead is huge in some techniques
- E.g. VQ-based Gaussian Selection takes 0.25 xRT
9. Language Model Facilities
- S3.3 only accepts a single LM, without classes, in binary format
- So far, S3.4 is able to accept multiple class-based LMs in binary format (class expansion sketched below)
- One major modification of the code
- Affects 6-7 files
- Caveats
- Not a perfect implementation
- Text format is not yet supported; backward compatibility is an issue
- Lack of test cases; only lightly smoke-tested
- About 1 more week of work
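For reference, the sketch below shows one common way a class-based trigram probability is assembled: the class-level n-gram probability combined with the within-class word probability, in log space. The names (classlm_t, lm_tg_score, word2class, in_class_logprob) are hypothetical and only illustrate the idea, not the s3.4 LM API.

typedef struct {
    int    n_words;
    int   *word2class;         /* class id for each word id */
    float *in_class_logprob;   /* log P(word | its class) */
} classlm_t;

/* Assumed to exist: class-level trigram lookup in the binary (DMP) LM,
 * returning a log probability. */
extern float lm_tg_score(int c1, int c2, int c3);

/* P(w3 | w1, w2) = P(class(w3) | class(w1), class(w2)) * P(w3 | class(w3)),
 * computed here in log space. */
float classlm_tg_score(const classlm_t *lm, int w1, int w2, int w3)
{
    int c1 = lm->word2class[w1];
    int c2 = lm->word2class[w2];
    int c3 = lm->word2class[w3];

    return lm_tg_score(c1, c2, c3) + lm->in_class_logprob[w3];
}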
10. Problems with s3.4 (as of Feb 29, 2004)
- Only accepts DMP files.
- The text-format reader in Sphinx 2 is very complex.
- A straight conversion is not clean.
- LMs are all loaded into memory
- We can work on this.
- Lexical trees are all built at the beginning
- We tried to avoid the overhead of rebuilding the tree for every utterance.
11. Summary of Sphinx 3.4 Development
- A derivative of s3.3
- With Speed Optimization
- Better LM facilities
- Algorithmic optimization is 90% complete
- Still need to reduce overhead; tree-based GMM selection is desirable
- Improvements for individual techniques
- Got through the major hurdle of multiple LMs and class-based LMs
- Need more time to make it more stable
- Expected internal release: March 8, 2004
12. Sphinx 3.4 and CALO
- Which pieces are missing?
- Sphinx 3.4's decoding is still not streamlined -> continuous listening is not yet enabled
- Sphinx's speed may still not be ideal
- From s3 to s3.3, 10% degradation
- Sphinx 3.4 doesn't learn from data yet
13. Sphinx 3.5: What should we do in the next 3 months?
- Expected release time: May-June
- Interfaces
- Streamlined front-end and decoding
- (?) PortAudio-based audio routine
- Speed/Accuracy
- Improved lexical tree search
- Machine optimization of Gaussian computation.
- Combination of multiple recognizers
- Learning
- Acoustic Model adaptation
- (?) Language Model adaptation
- (In Phoenix) Better semantic parsing
- Resource Acquisition and Load Balancing
14. Highlight I: Speed/Accuracy
- Improved lexical tree search
- The current implementation uses a single lexical tree.
- It may be desirable to create tree copies.
- Machine Optimization of Gaussian Computation (inner loop sketched below)
- SIMD (Single Instruction, Multiple Data)
- Requires help from assembly-language experts (Jason/Thomas)
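To make the SIMD target concrete, the sketch below shows the diagonal-covariance Gaussian log-likelihood inner loop that machine optimization would vectorize. The variable names are illustrative and the real s3 code differs (it works on precomputed integer log scores), but the hot loop has this shape.

/* Diagonal-covariance Gaussian log likelihood for one mixture component:
 *   log N(x; mu, var) = logconst - 0.5 * sum_d (x_d - mu_d)^2 / var_d
 * With precision[d] precomputed as 0.5 / var[d], the loop is a sum of
 * weighted squared differences -- the kind of dense loop that SIMD
 * instructions (SSE, AltiVec) can evaluate several floats at a time. */
float diag_gauss_logl(const float *x, const float *mean,
                      const float *precision, float logconst, int n_dim)
{
    float d = 0.0f;
    int i;

    for (i = 0; i < n_dim; i++) {
        float diff = x[i] - mean[i];
        d += diff * diff * precision[i];   /* the hot loop to vectorize */
    }
    return logconst - d;
}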
15. Highlight II: Multiple Recognizer Combination and Resource Acquisition
- Research by Rong suggests that combining multiple recognizers can improve accuracy
- Speed worsens by 100% if we run two recognizers
- An interesting solution
- Computation can be shared by other machines in the meeting
- Inspired by routing implementations
- A very natural solution in the meeting scenario because usually only one person is speaking at a time
- Challenges: bandwidth and load balancing
16. Highlight III
- Learning
- Acoustic Model
- Maximum Likelihood Linear Regression (MLLR) (mean transform sketched below)
- Jahanzeb will be responsible for this
- (?) Language Model
- How?
- Cache-based LM?
- (?) Improved Robust Parsing
- Better parsing based on previous command history
- (?) Phoenix's source code is not easy to trace
- Thomas Harris's implementation may be a good place to start
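As a reminder of what MLLR adaptation involves, the sketch below applies a single regression-class mean transform mu' = A*mu + b to a set of Gaussian means. The layout, the MAX_DIM bound, and the function name are assumptions for this sketch, not the planned Sphinx 3.5 adaptation code.

#define MAX_DIM 64   /* assumed bound; 39-dimensional MFCC features fit easily */

/* Apply one MLLR regression-class transform to Gaussian means:
 *   mu' = A * mu + b
 * A is n_dim x n_dim, b has n_dim entries, mean[g] is the mean of Gaussian g. */
void mllr_transform_means(float **mean, int n_gauss, int n_dim,
                          float **A, const float *b)
{
    float tmp[MAX_DIM];
    int g, i, j;

    for (g = 0; g < n_gauss; g++) {
        for (i = 0; i < n_dim; i++) {
            float acc = b[i];
            for (j = 0; j < n_dim; j++)
                acc += A[i][j] * mean[g][j];
            tmp[i] = acc;
        }
        for (i = 0; i < n_dim; i++)
            mean[g][i] = tmp[i];   /* write back the transformed mean */
    }
}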
17. Arthur and Jahanzeb's Proposed Schedule
18. Proposed Schedule (cont.)