Title: 2nd Progress Meeting For Sphinx 3.6 Development
- Arthur Chan,
- David Huggins-Daines,
- Yitao Sun
- Carnegie Mellon University
- Jun 7, 2005
2. This meeting (2nd Progress Meeting of 3.6)
- Purpose of this meeting
- A working progress report on various aspects of the development
- A briefing on embedded Sphinx 2 (by David)
- A briefing on Sphinx 3's crazy branch (by Arthur)
- As a branch in CVS
- Includes several interesting features
- Includes a number of mild changes
- Discussion before another check-in.
3. Outline of this talk
- Review of 1st Progress Meeting
- Progress of the embedded version of Sphinx 2 (by Dave, 7-10 pages)
- Progress of Sphinx 3's crazy branch (15-20 pages)
- Architecture diagram of Sphinx 3.6
- Changes in search abstraction (7 pages)
- Progress on search implementation (8 pages)
- GMM computation
- FSG mode, Word Switching Tree Search mode
- Mild re-factoring (not gentle any more) (3 pages)
- LM
- S3.0 family of tools
- Hieroglyph (1 page)
4. Review of 1st Progress Meeting
- Last time...
- Two separate layers were defined
- Low-level implementation of search, and
- Possible abstractions of search
- The abstraction had just been introduced; its advantage was not yet revealed.
- Implementation of Mode 5 was still under development (only 10% complete)
- libs3decoder had just been modularized into 8 sub-modules
5. Progress of Architecture in Sphinx 3.6
6. Motivation for Architecting Sphinx 3.X
- Need for new search algorithms
- Developing a new search algorithm carries risk.
- We don't want to throw away the old one.
- Mere replacement could cause backward-compatibility problems.
- Code has grown to a stage where
- Some changes could be very hard.
- Multiple programmers become active at the same time
- CVS conflicts could become frequent if things are controlled by if-else structures
7. Architecture of Sphinx 3.X (X < 6)
- Batch sequential Architecture (Shaw 96)
- Each executable would customize the sub-routines
[Diagram: the executables decode, livepretend, decode_anytopo, align and allphone each use their own copies of the following stages]
- Initialization 1 (kb and kbcore), Initialization 2, 3 and 4
- GMM Computation 1 (approx_cont_mgau); GMM Computation 2, 3 and 4 (using gauden and senone, Methods 1, 2 and 3)
- Search 1, 2, 3 and 4
- Process Controller 1, 2, 3 and 4
- Command Line 1, 2, 3 and 4
8. Pros/Cons of Batch Sequential Architecture
- Pros
- Great flexibility for individual programmers
- No assumptions; data structures are usually optimized for the application.
- Align and allphone have their own optimizations.
- Crafting in individual applications is of high quality
- Cons
- Tremendous difficulty in maintenance
- Most changes need to be carried out 5-6 times.
- Spreads the disease of code duplication
- Code with the same functionality was duplicated multiple times
- Scared a lot of programmers in the past
- Beginners tend to love general architecture
9. Big Picture of Software Architecture in Sphinx 3.6
- Layered and object-oriented
- Implemented in C
- Major high-level routines
- Initializer (kb.c or kbcore.c)
- A kind of clipboard for other controllers
- Process controller (corpus.c)
- Governs the protocol of processing a sentence
- Search abstraction routine (srch.c)
- Governs how search is done
- Implemented as pipelines and filters with shared memory
- Each filter can be overridden, similar to what OO languages do
- Command-line processor (cmd_ln_macro.c and cmd_ln.c), implemented as macros.
10. Software Architecture Diagram of Sphinx 3.6
[Diagram: four layers, with the boxes in each layer listed below]
- Applications: User Defined Applications, livedecode API, livepretend, decode, decode (anytopo), align, allphone, dag, astar
- Controllers/Abstractions: Search Initializer, Search Controller, Process Controller, Command Line Processor
- Implementations: Fast Single Stream GMM Computation, Multi Stream GMM Computation, Mode 0 Align, Mode 1 Allphone, Mode 2 FSG, Mode 3 Anytopo, Mode 4 Magic Wheel, Mode 5 WSFT
- Libraries: Dictionary Library, Search Library, LM Library, AM Library, Feature Library, Utility Library, Miscellaneous Library
11. Search Abstraction
- Search abstraction is implemented as objects
- Search operations are implemented as filters with shared memory
- Each filter is a kind of unique operation for search
- Ideally, each filter or a set of filters can be replaced.
[Diagram: "Search For One Frame" is made up of the following filters]
- Select Active CD Senone
- Compute Detail GMM Score (CD senone)
- Compute Detail HMM Score (CD)
- Propagate Graph (Phone-Level)
- Rescoring at Word End using High-Level KS (e.g. LM)
- Propagate Graph (Word-Level)
- Compute Approx. GMM Score (CI senone)
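The idea of filters over shared memory can be sketched in C as a table of function pointers that all act on one state structure. This is only an illustration: every name below (`toy_state_t`, `toy_pipeline_t`, `toy_search_one_frame`, the filter functions) is hypothetical and not taken from srch.c, and the order in which the filters run is an assumption, since the diagram text does not record it.

```c
/* Hypothetical filter slots, one per box in the diagram.
 * The execution order is an assumption, not taken from srch.c. */
enum {
    F_APPROX_GMM,        /* compute approximate GMM score (CI senone) */
    F_SELECT_SENONE,     /* select active CD senones */
    F_DETAIL_GMM,        /* compute detailed GMM score (CD senone) */
    F_HMM_SCORE,         /* compute detailed HMM score (CD) */
    F_PROP_PHONE,        /* propagate graph at phone level */
    F_RESCORE_WORD_END,  /* rescore at word end, e.g. with the LM */
    F_PROP_WORD,         /* propagate graph at word level */
    N_FILTERS
};

/* Shared memory that every filter reads and writes (toy fields). */
typedef struct {
    int frame;
    int n_active_senones;
    int trace[N_FILTERS];   /* ids of the filters run for this frame */
    int n_trace;
} toy_state_t;

typedef void (*toy_filter_fn)(toy_state_t *s);

typedef struct {
    toy_filter_fn filter[N_FILTERS];
} toy_pipeline_t;

/* Record that a filter ran; stands in for real work. */
static void mark(toy_state_t *s, int id) { s->trace[s->n_trace++] = id; }

static void approx_gmm(toy_state_t *s)       { mark(s, F_APPROX_GMM); }
static void select_senone(toy_state_t *s)    { s->n_active_senones = 100; mark(s, F_SELECT_SENONE); }
static void detail_gmm(toy_state_t *s)       { mark(s, F_DETAIL_GMM); }
static void hmm_score(toy_state_t *s)        { mark(s, F_HMM_SCORE); }
static void prop_phone(toy_state_t *s)       { mark(s, F_PROP_PHONE); }
static void rescore_word_end(toy_state_t *s) { mark(s, F_RESCORE_WORD_END); }
static void prop_word(toy_state_t *s)        { mark(s, F_PROP_WORD); }

/* A replacement filter: same slot, different behaviour. */
void select_fewer_senones(toy_state_t *s)
{
    s->n_active_senones = 10;
    mark(s, F_SELECT_SENONE);
}

/* Fill every slot with the default filter. */
void toy_pipeline_init(toy_pipeline_t *p)
{
    p->filter[F_APPROX_GMM]       = approx_gmm;
    p->filter[F_SELECT_SENONE]    = select_senone;
    p->filter[F_DETAIL_GMM]       = detail_gmm;
    p->filter[F_HMM_SCORE]        = hmm_score;
    p->filter[F_PROP_PHONE]       = prop_phone;
    p->filter[F_RESCORE_WORD_END] = rescore_word_end;
    p->filter[F_PROP_WORD]        = prop_word;
}

/* Run all filters once, in slot order, on the shared state. */
void toy_search_one_frame(const toy_pipeline_t *p, toy_state_t *s, int frame)
{
    s->frame = frame;
    s->n_trace = 0;
    for (int i = 0; i < N_FILTERS; i++)
        p->filter[i](s);
}
```

Replacing one filter is then a single assignment, for example `p.filter[F_SELECT_SENONE] = select_fewer_senones;`, and the rest of the pipeline is untouched.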
12. Different ways to implement a search
- 1. Use the default implementation
- Just specify all the atomic search operations (ASOs) provided
- 2. Override search_one_frame
- Only need to specify GMM computation and how to search_one_frame
- 3. Override the whole mechanism
- For people who dislike the default so much
- Override how to search
13. Concrete Examples
- Mode 4 (Magic Wheel) and Mode 5 (WST) use the default implementation
- Mode 2 (FSG)
- Overrides the search_one_frame implementation
- But shares the GMM implementation.
- Mode 0 (align), Mode 1 (allphone) and Mode 3 (flat lexicon decoding) will likely do the same.
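The three levels of overriding might be expressed as optional function pointers that a driver checks from the most specific to the most general. The sketch below is a toy under stated assumptions: `toy_mode_t`, `toy_decode` and all the mode functions are hypothetical names, the "scores" are placeholder integers, and nothing here is claimed to match the actual srch.c interface.

```c
#include <stddef.h>

typedef struct toy_mode toy_mode_t;

struct toy_mode {
    const char *name;
    /* Atomic search operations (ASOs) used by the default frame loop. */
    int (*compute_gmm)(int frame);
    int (*propagate)(int frame, int gmm_score);
    /* Optional: replaces the default per-frame sequence (way 2). */
    int (*search_one_frame)(const toy_mode_t *m, int frame);
    /* Optional: replaces the whole decoding mechanism (way 3). */
    int (*decode_utt)(const toy_mode_t *m, int n_frames);
};

/* Way 1: the default frame step simply chains the ASOs. */
static int default_search_one_frame(const toy_mode_t *m, int frame)
{
    return m->propagate(frame, m->compute_gmm(frame));
}

/* Driver: use the most specific override that the mode provides. */
int toy_decode(const toy_mode_t *m, int n_frames)
{
    int total = 0;
    if (m->decode_utt != NULL)
        return m->decode_utt(m, n_frames);
    for (int f = 0; f < n_frames; f++) {
        if (m->search_one_frame != NULL)
            total += m->search_one_frame(m, f);
        else
            total += default_search_one_frame(m, f);
    }
    return total;
}

/* Toy operations; the returned numbers are placeholders, not real scores. */
static int shared_gmm(int frame) { (void)frame; return 1; }
static int tree_propagate(int frame, int gmm_score) { (void)frame; return gmm_score; }

/* An FSG-like mode overrides the frame step but reuses the shared GMM. */
static int fsg_search_one_frame(const toy_mode_t *m, int frame)
{
    return 2 * m->compute_gmm(frame);
}

/* A mode that dislikes the default and replaces everything. */
static int custom_decode(const toy_mode_t *m, int n_frames)
{
    (void)m;
    (void)n_frames;
    return -1;
}

const toy_mode_t TOY_TREE   = { "tree",   shared_gmm, tree_propagate, NULL, NULL };
const toy_mode_t TOY_FSG    = { "fsg",    shared_gmm, NULL, fsg_search_one_frame, NULL };
const toy_mode_t TOY_CUSTOM = { "custom", NULL, NULL, NULL, custom_decode };
```

In this sketch `TOY_TREE` plays the role of a mode that uses the default implementation, `TOY_FSG` overrides only the frame step while sharing the GMM computation, and `TOY_CUSTOM` bypasses the frame loop entirely.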
14. Future work
- Re-factoring of align, allphone and decode_anytopo is not yet complete.
- Search abstraction needs to consider
- More flexible mechanisms
- Doing the search backward (for backward search)
- Approximate search in the first stage (for phoneme and word look-ahead)
- (Optional) Parallel and distributed decoding
- Command-line and internal modules could still have mismatches
- Might learn from the mechanisms of Sphinx 2 and Sphinx 4
- Controlling how an utterance is processed could require 5 different files
- A better control format?
- Does not yet fully anticipate the fixed-point front-end and GMM computation in Sphinx 2
15. Progress of Search Implementation in Sphinx 3.6
16. GMM Computation
- Decode can now use SCHMM
- Specified by .semi.
- Implemented and tested by Dave
- GMM computation in align, allphone, decode and livepretend is now common
- Does not yet incorporate the Sphinx 2 fixed-point version of GMM computation
- It looks very delicious.
17. Finite State Machine Search (Mode 2)
- Implementation
- Largely completed (70% complete)
- Recipe
- Search function pointer implementation
- Adapted from the Sphinx 2 FSG_ family of routines
- GMM computation
- Uses Sphinx 3 GMM computation
- Already allows CIGMMS
18. Finite State Machine Search (Mode 2): Problems for the Users
- Not yet seriously tested
- Finding test cases is hard
- Still don't have a way to write grammars
- Yitao's goal in Q3 and Q4 2005
- Either directly incorporate the CFG's scores into the search
- Or implement an approximate converter from CFG to FSM (HTK's method)
19. Finite State Machine Search (Mode 2): Other Problems
- Problems inherited from Sphinx 2 (copied from Ravi's slide)
- No lextree implementation (What?)
- Static allocation of all HMMs; not allocated on demand (Oh, no!)
- FSG transitions represented by an NxN matrix (You can't be serious!!)
- Other wish list
- No histogram pruning (Houston, we've got a problem.)
- No state-based implementation (Wilson! I am sorry!!)
- We need it for unification of BW, alignment, allphone and FSG search.
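To make the NxN complaint concrete: a dense matrix reserves one cell for every pair of states, whereas per-state arc lists store only the transitions that exist. The following is a minimal sketch of the list alternative, not a description of the Sphinx 2 or Sphinx 3 code; `toy_fsg_t`, `toy_arc_t` and the functions are hypothetical names, and `n_state` is assumed to be positive.

```c
#include <stdlib.h>

/* One outgoing transition, kept in a singly linked list per source state. */
typedef struct toy_arc {
    int to;               /* destination state */
    int word_id;          /* word emitted on this transition */
    int logp;             /* transition log-probability (toy integer) */
    struct toy_arc *next;
} toy_arc_t;

typedef struct {
    int n_state;
    size_t n_arc;         /* arcs actually stored */
    toy_arc_t **out;      /* out[s] = head of the arc list leaving state s */
} toy_fsg_t;

/* Allocate a graph with n_state states (assumed > 0) and no arcs. */
toy_fsg_t *toy_fsg_new(int n_state)
{
    toy_fsg_t *g = malloc(sizeof(*g));
    if (g == NULL)
        return NULL;
    g->out = calloc((size_t)n_state, sizeof(*g->out));
    if (g->out == NULL) {
        free(g);
        return NULL;
    }
    g->n_state = n_state;
    g->n_arc = 0;
    return g;
}

/* Add one transition; returns 0 on success, -1 on bad input or no memory. */
int toy_fsg_add_arc(toy_fsg_t *g, int from, int to, int word_id, int logp)
{
    toy_arc_t *a;
    if (from < 0 || from >= g->n_state || to < 0 || to >= g->n_state)
        return -1;
    a = malloc(sizeof(*a));
    if (a == NULL)
        return -1;
    a->to = to;
    a->word_id = word_id;
    a->logp = logp;
    a->next = g->out[from];
    g->out[from] = a;
    g->n_arc++;
    return 0;
}

/* Number of transitions leaving a state. */
int toy_fsg_out_degree(const toy_fsg_t *g, int from)
{
    int n = 0;
    for (const toy_arc_t *a = g->out[from]; a != NULL; a = a->next)
        n++;
    return n;
}

/* Cells a dense NxN transition matrix would need for the same graph. */
size_t toy_matrix_cells(int n_state)
{
    return (size_t)n_state * (size_t)n_state;
}

void toy_fsg_free(toy_fsg_t *g)
{
    for (int s = 0; s < g->n_state; s++) {
        toy_arc_t *a = g->out[s];
        while (a != NULL) {
            toy_arc_t *next = a->next;
            free(a);
            a = next;
        }
    }
    free(g->out);
    free(g);
}
```

For a grammar with 1000 states, `toy_matrix_cells(1000)` is one million cells regardless of how many transitions exist, while the list form grows only with `n_arc`.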
20. Time Switching Tree Search (Mode 4)
- Name changes
- It was "lucky wheel"
- Now it is "magic wheel"
- In the last check-in, after test-full, results are exactly the same for 6 corpora
- We could sleep.
- Future work
- Change the word-end triphone implementation
- From composite triphones to full triphones
21. Word Switching Tree Search (Mode 5)
- Can now run on the Communicator task
- With the same performance as Mode 4
- Major reasons why it doesn't approach decode_anytopo's result
- Bigram probability is not yet factored
- Not an easy task. Still considering how to do it.
- Triphone implementation is not yet exact
- Completion: 30%
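One common way to factor LM probabilities into a lexical tree is to give each node the best log-probability of any word reachable below it, so the LM can influence pruning before the word end is reached. The sketch below shows that max-propagation for a single fixed history; it is not the planned Mode 5 design. All names (`toy_node_t`, `toy_factor_lm`, `TOY_LOG_ZERO`) are hypothetical, and the tree is assumed to be stored so that every parent has a smaller index than its children.

```c
#include <limits.h>

#define TOY_LOG_ZERO INT_MIN   /* stands for log(0) */

/* One lexical tree node. Node 0 is the root. */
typedef struct {
    int parent;    /* index of the parent node, -1 for the root */
    int word_id;   /* word ending at this node, -1 if none */
} toy_node_t;

/*
 * Fill factored[i] with the best word log-probability reachable from node i.
 * word_logp[w] is the LM log-probability of word w for one fixed history.
 * Assumes node[i].parent < i for every i > 0.
 */
void toy_factor_lm(const toy_node_t *node, int n_node,
                   const int *word_logp, int *factored)
{
    /* Leaves start from their own word score, other nodes from log(0). */
    for (int i = 0; i < n_node; i++)
        factored[i] = (node[i].word_id >= 0) ? word_logp[node[i].word_id]
                                             : TOY_LOG_ZERO;

    /* Children come after parents, so one backward pass pushes maxima up. */
    for (int i = n_node - 1; i > 0; i--) {
        int p = node[i].parent;
        if (factored[i] > factored[p])
            factored[p] = factored[i];
    }
}
```

With a unigram the word scores are the same for every history, so this pass can run once. With a bigram, `word_logp` changes with the predecessor word, so the factored values would have to be recomputed or cached per history, which is one plausible reason the slide calls the task hard.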
22. Future work on Mode 5
- N-gram look-ahead
- Full trigram tree implementation
- Phoneme and word look-ahead
- Share the full triphone implementation with Mode 4 in future.
23. Big picture of All Search Implementations
- A finite state machine data structure could unify
- align,
- allphone,
- Baum-Welch,
- FSG search
- Time will show whether it is also applicable to tree search.
- Search implementation has more short-term demand.
- Mode 5 will be our new flagship
- By October, 3 out of 4 goals in Mode 5 should be completed.
- Between different searches, code should be shared as much as possible
24. Some other mild refactorings
25. Summary of Re-factorings
- Not gentle any more
- But it is mild
- Several useful things to know
- Language model routine revamping
- S3.0 family of tools
- Overall status of merging
26. LM routine
- Current capability
- Reads both text-based and DMP-based LMs
- Allows switching of LMs
- Allows inter-conversion between the text and DMP formats of an LM
- Provides a single interface to all applications
- Tool of the month: lm_convert
- lm3g2dmp
- Will be the application for future language model inter-conversion
- Other formats? CMULMTK's format?
27. S3.0 family of tools
- Architecture drives many changes in the code
- Align, allphone and decode_anytopo now use
- kbcore
- The same version of the multi-stream GMM computation routine
- Simplified search structure.
- ctl_process mechanism
- Next step is to use the srch.c interface.
- All tools now share
- Sets of common command-line macros
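Sharing sets of command-line macros could look roughly like the sketch below, where each macro expands to a group of argument definitions and each tool composes the groups it needs. This is an assumption about the general technique, not the contents of cmd_ln_macro.c: the `toy_arg_t` type, the macro names, the flag names and the help strings are all illustrative placeholders.

```c
#include <string.h>

/* Hypothetical argument definition: flag name, default value, help text. */
typedef struct {
    const char *name;
    const char *deflt;
    const char *doc;
} toy_arg_t;

/* Groups of definitions shared by several tools (placeholder flags). */
#define TOY_COMMON_MODEL_ARGS \
    { "-mdef", NULL, "Model definition file" }, \
    { "-mean", NULL, "Gaussian means file" }

#define TOY_COMMON_CTL_ARGS \
    { "-ctl", NULL, "Control file listing the utterances" }

/* Each tool builds its table from the shared groups plus its own flags. */
const toy_arg_t toy_decode_args[] = {
    TOY_COMMON_MODEL_ARGS,
    TOY_COMMON_CTL_ARGS,
    { "-lm", NULL, "Language model file" },
    { NULL, NULL, NULL }   /* terminator */
};

const toy_arg_t toy_align_args[] = {
    TOY_COMMON_MODEL_ARGS,
    TOY_COMMON_CTL_ARGS,
    { "-transcript", NULL, "Transcript file to align against" },
    { NULL, NULL, NULL }   /* terminator */
};

/* Look up a flag in a NULL-terminated table; returns NULL if absent. */
const toy_arg_t *toy_find_arg(const toy_arg_t *defs, const char *name)
{
    for (; defs->name != NULL; defs++) {
        if (strcmp(defs->name, name) == 0)
            return defs;
    }
    return NULL;
}
```

Because both tables expand the same macros, a change to a shared flag's default or help text is made once and picked up by every tool.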
28. Code Merging
- Sphinx 3.0, Sphinx 3.X and share are now unified.
- Alex: It's time to fix the training algorithms!
- Ravi: It's time to add full n-gram and full n-phones to the recognizer!!
- Dave: It's time to work on pronunciation modeling!
- Yitao: It's time to implement a CFG-based search!!
- Evandro: It's time to do more regression tests!
- Alan: Don't merge Sphinx with Festival!!
- Next step
- It's time to clean up SphinxTrain.
- We will keep the pace at < 4 tool check-ins/month.
29. Hieroglyphs
- Halves of Chapters 3 and 5 are finished
- Chapter 3: Introduction to Speech Recognition
- Missing: description of DTW, HMM and LM
- Chapter 5: Roadmap of building a speech recognition system
- Missing
- How to evaluate the system?
- How to train a system? (Evandro's tutorial will be perfect)
- Still 4 chapters (out of 12) of material to go before the 1st draft is written
30. Conclusion
- We have done something.
- Embedded Sphinx 2
- Its completion will benefit both Sphinx 2 and Sphinx 3
- Sphinx 3.6
- Its completion will benefit
- Long-term development
- Short-term needs in funded projects
- Tentative deadline: beginning of October