Title: Sphinx 3.4 Development Progress Report in February
1. Sphinx 3.4 Development Progress Report in February
- Arthur Chan, Jahanzeb Sherwani
- Carnegie Mellon University
- Mar 1, 2004
2. This Presentation
- S3.4 Development Progress
- Speed-up
- Language Model facilities
- CALO and S3.5 Development
- Which features should be added to better support CALO?
- Schedule for the next three months
3. Review of Last Month's Progress
- Last month
- Wrote a speed-up version of s3.
- Completed some coding of the s3.4 speed-up task.
- This month
- Backbone of the s3.4 speed-up functionality completed and tested.
- Basic LM facilities completed and smoke-tested.
4. Current System Specifications (without Gaussian Selection)
5. Speed-up Facilities in s3.3

Search
- Lexicon Structure: Tree
- Pruning: Standard
- Heuristic Search Speed-up: Not implemented

GMM Computation
- Frame-Level: Not implemented
- Senone-Level: Not implemented
- Gaussian-Level: SVQ-based GMM Selection (number of sub-vectors constrained to 3)
- Component-Level: SVQ code removed
6. Speed-up Facilities in s3.4

Search
- Lexicon Structure: Tree
- Pruning: (New) Improved Word-end Pruning
- Heuristic Search Speed-up: (New) Phoneme Look-ahead

GMM Computation
- Frame-Level: (New) Naïve Down-Sampling; (New) Conditional Down-Sampling
- Senone-Level: (New) CI-based GMM Selection (sketched below)
- Gaussian-Level: (New) VQ-based GMM Selection; (New) Unconstrained number of sub-vectors in SVQ-based GMM Selection
- Component-Level: (New) SVQ code enabled
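To make the senone-level technique above concrete, here is a minimal C sketch of CI-based GMM selection. All structure, field, and function names (senone_map_t, ci_gmm_eval, cd_gmm_eval, cd2ci, ci_beam) are assumptions for illustration, not the actual s3.4 interfaces. The idea: the cheap context-independent (CI) senones are scored every frame, and a context-dependent (CD) senone only receives a full GMM evaluation when its base CI senone scores within a beam of the best CI score; otherwise the CI score is reused.

#include <float.h>

#define MAX_CI 512            /* illustrative upper bound on CI senones */

typedef struct {
    int  n_sen;               /* number of context-dependent (CD) senones */
    int  n_ci;                /* number of context-independent (CI) senones */
    int *cd2ci;               /* base CI senone for each CD senone */
} senone_map_t;

/* Assumed to exist elsewhere: full GMM evaluation of one senone,
 * returning a log likelihood (higher is better). */
extern float ci_gmm_eval(int ci_id, const float *feat);
extern float cd_gmm_eval(int cd_id, const float *feat);

void ci_gmm_select(const senone_map_t *map, const float *feat,
                   float ci_beam, float *senscore)
{
    float ciscore[MAX_CI];
    float best_ci = -FLT_MAX;
    int ci, sen;

    /* Pass 1: score every CI senone (cheap; only a few hundred GMMs). */
    for (ci = 0; ci < map->n_ci; ci++) {
        ciscore[ci] = ci_gmm_eval(ci, feat);
        if (ciscore[ci] > best_ci)
            best_ci = ciscore[ci];
    }

    /* Pass 2: a CD senone gets a full GMM evaluation only if its base CI
     * senone falls within the beam of the best CI score; otherwise the
     * CI score is reused as an approximation. */
    for (sen = 0; sen < map->n_sen; sen++) {
        ci = map->cd2ci[sen];
        senscore[sen] = (ciscore[ci] > best_ci - ci_beam)
            ? cd_gmm_eval(sen, feat)
            : ciscore[ci];
    }
}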
7. S3.4 Speed Performance in Communicator Task
8. Issues in Speed Optimization
- Implementation Issues
- Beams applied on GMM computation make many techniques hard to implement
- Some facilities were hardwired for specific purposes
- Performance Issues
- Each technique reduced computation by 40-50% with <5% degradation
- However, the savings did not add up: the techniques largely remove the same computation, so combining two 40-50% reductions falls well short of 90%
- Reduction in computation has a lower bound (usually 75-80% reduction is the maximum)
- Overhead is huge in some techniques
- E.g. VQ-based Gaussian Selection takes 0.25 xRT
9. Language Model Facilities
- S3.3 only accepts a single LM, without classes, in binary format
- So far, S3.4 is able to accept multiple class-based LMs in binary format (class expansion sketched below)
- One major modification of the code
- Affects 6-7 files
- Caveats
- Not a perfect implementation
- Text format is not yet supported; backward compatibility is an issue
- Lack of test cases; only lightly smoke-tested
- About 1 more week of work
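For reference, the sketch below shows one common way a class-based trigram probability is assembled: the class-level n-gram probability combined with the within-class word probability, in log space. The names (classlm_t, lm_tg_score, word2class, in_class_logprob) are hypothetical and only illustrate the idea, not the s3.4 LM API.

typedef struct {
    int    n_words;
    int   *word2class;         /* class id for each word id */
    float *in_class_logprob;   /* log P(word | its class) */
} classlm_t;

/* Assumed to exist: class-level trigram lookup in the binary (DMP) LM,
 * returning a log probability. */
extern float lm_tg_score(int c1, int c2, int c3);

/* P(w3 | w1, w2) = P(class(w3) | class(w1), class(w2)) * P(w3 | class(w3)),
 * computed here in log space. */
float classlm_tg_score(const classlm_t *lm, int w1, int w2, int w3)
{
    int c1 = lm->word2class[w1];
    int c2 = lm->word2class[w2];
    int c3 = lm->word2class[w3];

    return lm_tg_score(c1, c2, c3) + lm->in_class_logprob[w3];
}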
10. Problems with s3.4 (as of Feb 29, 2004)
- Only accepts DMP files.
- The text-format reader in Sphinx 2 is very complex.
- A straight conversion is not clean.
- LMs are all loaded into memory
- We can work on this.
- Lexical trees are all built at the beginning
- We tried to avoid the overhead of rebuilding the tree for every utterance.
11. Summary of Sphinx 3.4 Development
- A derivative of s3.3
- With Speed Optimization
- Better LM facilities
- Algorithmic optimization is 90% complete
- Still need to reduce overhead; tree-based GMM selection is desirable
- Improvements for individual techniques
- Got through the major hurdle of multiple LMs and class-based LMs
- Need more time to make it more stable
- Expected internal release: March 8, 2004
12. Sphinx 3.4 and CALO
- Which pieces are missing?
- Sphinx 3.4's decoding is still not streamlined -> continuous listening is not yet enabled
- Sphinx's speed may still not be ideal
- From s3 to s3.3, 10% degradation
- Sphinx 3.4 doesn't learn from data yet
13. Sphinx 3.5: What should we do in the next 3 months?
- Expected release time: May-June
- Interfaces
- Streamlined front-end and decoding
- (?) PortAudio-based audio routine
- Speed/Accuracy
- Improved lexical tree search
- Machine optimization of Gaussian computation.
- Combination of multiple recognizers
- Learning
- Acoustic Model adaptation
- (?) Language Model adaptation
- (In Phoenix) Better semantic parsing
- Resource Acquisition and Load Balancing
14. Highlight I: Speed/Accuracy
- Improved lexical tree search
- The current implementation uses a single lexical tree.
- It may be desirable to create tree copies.
- Machine Optimization of Gaussian Computation (inner loop sketched below)
- SIMD (Single Instruction, Multiple Data)
- Requires help from assembly-language experts (Jason/Thomas)
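To make the SIMD target concrete, the sketch below shows the diagonal-covariance Gaussian log-likelihood inner loop that machine optimization would vectorize. The variable names are illustrative and the real s3 code differs (it works on precomputed integer log scores), but the hot loop has this shape.

/* Diagonal-covariance Gaussian log likelihood for one mixture component:
 *   log N(x; mu, var) = logconst - 0.5 * sum_d (x_d - mu_d)^2 / var_d
 * With precision[d] precomputed as 0.5 / var[d], the loop is a sum of
 * weighted squared differences -- the kind of dense loop that SIMD
 * instructions (SSE, AltiVec) can evaluate several floats at a time. */
float diag_gauss_logl(const float *x, const float *mean,
                      const float *precision, float logconst, int n_dim)
{
    float d = 0.0f;
    int i;

    for (i = 0; i < n_dim; i++) {
        float diff = x[i] - mean[i];
        d += diff * diff * precision[i];   /* the hot loop to vectorize */
    }
    return logconst - d;
}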
15. Highlight II: Multiple Recognizer Combination and Resource Acquisition
- Research by Rong suggests that combining multiple recognizers can improve accuracy
- Speed worsens by 100% if we run two recognizers
- An interesting solution
- Computation can be shared by other machines in the meeting
- Inspired by routing implementations
- A very natural solution in the meeting scenario because usually only one person is speaking at a time
- Challenges: bandwidth and load balancing
16. Highlight III
- Learning
- Acoustic Model
- Maximum Likelihood Linear Regression (MLLR) (mean transform sketched below)
- Jahanzeb will be responsible for this
- (?) Language Model
- How?
- Cache-based LM?
- (?) Improved Robust Parsing
- Better parsing based on previous command history
- (?) Phoenix's source code is not easy to trace
- Thomas Harris's implementation may be a good place to start
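As a reminder of what MLLR adaptation involves, the sketch below applies a single regression-class mean transform mu' = A*mu + b to a set of Gaussian means. The layout, the MAX_DIM bound, and the function name are assumptions for this sketch, not the planned Sphinx 3.5 adaptation code.

#define MAX_DIM 64   /* assumed bound; 39-dimensional MFCC features fit easily */

/* Apply one MLLR regression-class transform to Gaussian means:
 *   mu' = A * mu + b
 * A is n_dim x n_dim, b has n_dim entries, mean[g] is the mean of Gaussian g. */
void mllr_transform_means(float **mean, int n_gauss, int n_dim,
                          float **A, const float *b)
{
    float tmp[MAX_DIM];
    int g, i, j;

    for (g = 0; g < n_gauss; g++) {
        for (i = 0; i < n_dim; i++) {
            float acc = b[i];
            for (j = 0; j < n_dim; j++)
                acc += A[i][j] * mean[g][j];
            tmp[i] = acc;
        }
        for (i = 0; i < n_dim; i++)
            mean[g][i] = tmp[i];   /* write back the transformed mean */
    }
}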
17. Arthur and Jahanzeb's Proposed Schedule
18. Proposed Schedule (cont.)