Title: 3rd Progress Meeting For Sphinx 3.6 Development
1. 3rd Progress Meeting for Sphinx 3.6 Development
- Arthur Chan,
- David Huggins-Daines,
- Yitao Sun
- Carnegie Mellon University
- Jan 25, 2006
2. This meeting
- 3rd Progress report on 3.6 development (40 pages)
- Agenda
- What happened in Fall 2005? (4 slides)
- Progress of Sphinx Development in Fall 2005 (17 slides)
- Summary of Progress in 2005 (10 slides)
- Discussion: Should we create one release candidate? (1 slide)
3. What happened in Fall 2005?
4. What happened in Fall 2005?
- Major Events in Sphinx Development
- We participated in GALE in Oct 2005
- Conformance of the recognizers (Sphinx 3 and Sphinx 4) became an issue
- Lack of advanced acoustic modeling techniques became very glaring
- Sphinx 3 and 4 have gone through bug fixes.
- CALO effort is now split in two
- Off-line recognizer requires major improvement in LM and AM.
- AM issue is shared with GALE
- On-line recognizer (CALO jargon: Smartnote)
- Now has new LM and AM
- Requires significant development work
5. Time distribution (Estimated)
- Arthur
- 50% on GALE, 20% on CALO, 30% on Sphinx
- Dave
- 65% on CALO, 30% on PocketSphinx, 5% on Sphinx
- Yitao
- 90% on CALO, 10% on Sphinx
6. The Two Funded Projects
- Upside
- They point to issues that need to be solved
- Need significant reprioritization of tasks
- Balance of effort on the 2 projects is now achieved
- Downside
- Code development of Sphinx becomes a slower process
- Also, we haven't released s3 for a while
- -> Should we release the code now?
- Tired students and staff can be found everywhere
7. Progress of Sphinx 3.6 in Fall 2005
8. Overview
- Work on second-stage
- Merging of bestpath search in the 2nd stage of tree search
- IBM lattice generation
- Word confidence estimation
- Behavior changes and bug fixes
- Treatment of acoustic scores
- Assertion in vithist.c
- Attempts in search algorithm improvements
- Mode 3: Flat lexicon decoding
- Mode 4: Tree lexicon decoding
- Sphinx on Mandarin and coded language.
- New tools: conf, dp
9. Work Schedule
- Sep 1 to Oct 1
- Implementation of triphones in flat lexicon decoder
- Oct 1 to Nov 1
- Implementation of triphones in tree lexicon decoder (incomplete)
- Nov 1 to Dec 8
- IBM lattice generation
- Confidence score generation
- Fixed issues in scores
- Dec 8 to Jan 3: Concept of vacation was tried
- Jan 3 to now
- Fixed bugs, prepared release.
10. Second-stage Processing
- Best-path search can now be specified in decode
- Implementation requires write back. (urgh.)
- Recognizer can now generate lattices in IBM format
- Word is attached to the link
- Sphinx format attaches the word to the node.
- Scores are normalized with best senone scores
- Rong's confidence-based routine is now in Sphinx
- conf
- Goodies: uses the Sphinx logs3 routine -> significantly reduces alpha-beta score mismatch.
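The difference between the two lattice layouts can be sketched in a few lines of Python. This is a toy illustration only, not the actual Sphinx or IBM file format: the `nodes` and `links` structures, the function name `words_to_links`, and the choice that a link carries the word of its destination node are all assumptions made for the example.

```python
# Toy node-labelled lattice: each node carries a word (Sphinx-style layout).
# Structure and names are hypothetical, chosen only for illustration.
nodes = {0: "<s>", 1: "hello", 2: "yellow", 3: "world", 4: "</s>"}
links = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]


def words_to_links(nodes, links):
    """Return link-labelled triples (src, dst, word).

    Assumption: each link takes the word of its destination node, so the
    start node's word is implied rather than stored.
    """
    return [(src, dst, nodes[dst]) for src, dst in links]


link_lattice = words_to_links(nodes, links)
for src, dst, word in link_lattice:
    print(src, dst, word)
```

With words on links, two links entering the same node can in principle carry different words, which is one reason a converter has to be explicit about where each word goes.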
11. Second-stage Processing (cont.)
- Further work
- Best-path generation doesn't conform to past 3.5
- -> Bugs caused by 3.6 development
- Also, the best path is not always in the lattice
- -> Legacy bug
- Confidence-based method
- Lattice-based method can currently only be used off-line
- 10% of the data still have alpha-beta mismatch
- Consensus network generation needs special focus
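To make "alpha-beta mismatch" concrete, here is a minimal forward-backward sketch on a toy lattice. It assumes the usual definition: the total forward score at the final node should equal the total backward score at the start node, and a mismatch is the gap between the two. The lattice, its scores, and all function names are made up for the example and are not taken from the conf tool.

```python
import math

NEG_INF = float("-inf")


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without leaving the log domain."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def forward_backward(links, n_nodes):
    """links: (src, dst, log_score) with src < dst (topological numbering)."""
    alpha = [NEG_INF] * n_nodes
    beta = [NEG_INF] * n_nodes
    alpha[0] = 0.0
    beta[n_nodes - 1] = 0.0
    # Forward pass: visit links in order of their source node.
    for src, dst, score in sorted(links, key=lambda l: l[0]):
        alpha[dst] = log_add(alpha[dst], alpha[src] + score)
    # Backward pass: visit links in reverse order of their destination node.
    for src, dst, score in sorted(links, key=lambda l: l[1], reverse=True):
        beta[src] = log_add(beta[src], score + beta[dst])
    return alpha, beta


# Hypothetical lattice with two competing paths from node 0 to node 3.
links = [
    (0, 1, math.log(0.6)),
    (0, 2, math.log(0.4)),
    (1, 3, math.log(0.5)),
    (2, 3, math.log(0.9)),
]
alpha, beta = forward_backward(links, 4)
total = alpha[-1]
mismatch = abs(alpha[-1] - beta[0])
posteriors = {(s, d): math.exp(alpha[s] + w + beta[d] - total) for s, d, w in links}
print(mismatch, posteriors)
```

In exact arithmetic the mismatch is zero; in practice it can come from log-add approximations or from inconsistent lattice scores, so using one log routine for both passes removes one source of disagreement.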
12. Scores we see (Change 1)
- Tree search now truly generates un-normalized scores.
- was normalized by the ending frame only
- Caused by bug introduced in mid-2005
- All 1st-stage searches use the same score logging functions
- Includes align, allphone, decode_anytopo, decode
- matchseg_write, match_write are the current versions
- log_ is still used but will soon be totally replaced
13. Scores we see (Change 2)
- Multi-stream GMM computation (ms_gauden)
- By default, it won't quantize log pdf to 8 bits now
- Single-stream GMM computation
- Vectors with zero means and variances are removed (-remove_zero_var_gau)
- Scores and performance will change
- Testing resource has changed.
- (Evandro grins at this point)
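One possible reading of the zero-mean/zero-variance removal is a simple filter over mixture components. The sketch below assumes that a component is dropped only when both its mean and variance vectors are entirely zero; the function name and the list-of-lists layout are hypothetical and do not reflect how Sphinx stores its models.

```python
def drop_zero_gaussians(means, variances):
    """Return indices of components that are not all-zero in mean and variance."""
    kept = []
    for i, (mean, var) in enumerate(zip(means, variances)):
        if any(m != 0.0 for m in mean) or any(v != 0.0 for v in var):
            kept.append(i)
    return kept


# Hypothetical 3-component mixture; the middle component is an empty placeholder.
means = [[0.1, -0.3], [0.0, 0.0], [1.2, 0.4]]
variances = [[1.0, 0.5], [0.0, 0.0], [0.8, 0.9]]
kept = drop_zero_gaussians(means, variances)
print(kept)
```

Dropping components changes how many densities contribute to each score, which is consistent with the note that scores and performance will change.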
14. Scores we see (Change 3)
- Sphinx now supports generation of different hypseg formats (-hypseg_fmt)
- SPHINX 2-format
- SPHINX 3-format
- ctm format
- Always requires more processing, but it is better than nothing.
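As an example of the extra processing a segmentation format can need, the sketch below turns frame-based word segments into CTM-style lines (file, channel, start time, duration, word). It assumes 100 frames per second and inclusive end frames; the function name, the segment tuples, and the utterance ID are invented for the example and are not the decoder's own output routine.

```python
FRAMES_PER_SEC = 100  # assumed frame rate


def to_ctm(utt_id, segments, channel="1"):
    """Convert (word, start_frame, end_frame) tuples to CTM-style lines."""
    lines = []
    for word, start_frame, end_frame in segments:
        start = start_frame / FRAMES_PER_SEC
        duration = (end_frame - start_frame + 1) / FRAMES_PER_SEC
        lines.append(f"{utt_id} {channel} {start:.2f} {duration:.2f} {word}")
    return lines


# Hypothetical two-word segmentation of one utterance.
ctm_lines = to_ctm("utt1", [("hello", 0, 49), ("world", 50, 120)])
for line in ctm_lines:
    print(line)
```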
15. Scores: a summary
- Unnormalized (true) acoustic and language scores generated by (-hypsegscore_unscale)
- 1st-stage search and
- Best path search right after the 1st stage
- Normalized acoustic score would be generated by
- Lattice generation
- If developers want to have true scores in the lattice
- Developers could get the best scores from the decoder (bestsenscrdir) and do their own processing
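The "do their own processing" step can be sketched as follows, assuming normalization means subtracting the best senone score in each frame. Under that assumption, the true score of a segment is its normalized score plus the sum of the per-frame best scores over the segment. The function names and numbers are illustrative; reading the files written under bestsenscrdir is not modeled here.

```python
def normalize_path(raw_scores, best_senscr):
    """Sum of per-frame scores after subtracting each frame's best senone score."""
    return sum(raw - best for raw, best in zip(raw_scores, best_senscr))


def unnormalize(norm_score, best_senscr, start_frame, end_frame):
    """Add back the per-frame best scores over an inclusive frame range."""
    return norm_score + sum(best_senscr[start_frame:end_frame + 1])


# Hypothetical integer log scores for a 4-frame segment.
best_senscr = [-120, -95, -110, -130]
raw = [-150, -100, -110, -170]

norm = normalize_path(raw, best_senscr)
true_score = unnormalize(norm, best_senscr, 0, 3)
print(norm, true_score)
```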
16. Other important bug fixes
- Bug in vithist.c
- Caused an assertion failure and stopped the recognizer
- Now fixed; it returns an error message to the search abstraction routine.
17. Attempts in search algorithm improvements (Mode 3)
- Flat-lexicon decoder
- Search implementation is completed
- decode can now use flat-lexicon decoding
- -op_mode 3
- Decoder revamping is completed
- Mode 2 (FST)
- Mode 3 (Flat-lexicon)
- Mode 4 (Ravi's Tree-Lexicon)
- Mode 5 (Arthur's Tree-Lexicon)
- decode_anytopo is still there for backward compatibility
- decode_anytopo decodes in mode 3
18. No Further Re-factoring
- Avoid re-factoring before next check-in
- Align and allphone have different input/output file formats
- It doesn't make sense to stuff them into a single executable.
- Using XML configuration and control files would be an option
- But it takes too much time to implement
19. Algorithmic Work - Flat Lexicon Decoder
- Full triphone completed in flat-lexicon decoding
- 2.5% relative improvement in accuracy
- But requires 100xRT (urgh)
- Useful for debugging
- Also considered full trigram implementation
- Would result in another 5-10 times slowdown
- Conclusion
- Flat lexicon search has come to its limit
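For a sense of scale, the arithmetic behind a relative improvement can be worked through in a short sketch. It reads the slide's figure as 2.5% relative and assumes the gain is measured as a relative reduction in word error rate; the 20% baseline is a hypothetical number chosen only to make the arithmetic visible.

```python
def relative_improvement(baseline_wer, new_wer):
    """Relative error reduction, e.g. 0.025 for a 2.5% relative gain."""
    return (baseline_wer - new_wer) / baseline_wer


def wer_after(baseline_wer, relative_gain):
    """Word error rate after applying a relative gain to a baseline."""
    return baseline_wer * (1.0 - relative_gain)


baseline = 20.0  # hypothetical baseline WER in percent
new_wer = wer_after(baseline, 0.025)
print(new_wer, relative_improvement(baseline, new_wer))
```

On that hypothetical baseline the gain is half a point of absolute WER, bought at roughly 100 times real time.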
20. Algorithmic Work - Tree Lexicon Decoder
- Current full triphone implementation
- Has flaws in score propagation
- Tree copies
- -> No time to do it at all; Q4's workload nearly killed AC
- Benchmarking results
- GALE results
- Full Lexicon Tree Lexicon
- CALO/Communicator results
- Tree Lexicon 5% relative poorer.
- Conclusion
- Half a year on search is expected to give us another 5%
21. Conclusion on Search
- Need to seriously consider
- Is working on search a good idea?
- In both CALO/GALE, gains come from
- SAT and cross adaptation
- Second-stage processing
- Confusion network
- Confidence annotation
- First-stage SD -> Second-stage SA
- VTLN
- also only gives 5% rel
- but it only takes 5 days to implement
22. Sphinx on Different Text Encodings
- There is already non-CMU work for
- Spanish
- French
- Big question mark
- Could it work on other encodings?
23. Sphinx on Mandarin (gb2312)
24. Sphinx on Mandarin (cont.)
- Thanks to Ravi
- Bugs we fixed to get it through
- 1236322 libutil\str2words special character bug
- 1236166 special character wasn't supported
- This should give us a fairly good foundation to start on most languages
25. Summary of Sphinx in Fall 2005
- We have done something
- Strong focus on search research doesn't seem to get us far.
- Fire to fight on the modeling side
- Sounds like the time to check in and move on
26. Progress of Sphinx 3.X (From X=5 to X=6)
27. Progress of Sphinx 3.X (From X=5 to X=6)
- New Features (4 slides)
- Items that are significant
- Gentle, mild and simple re-factoring and its consequences (4 slides)
- Documentation (1 slide)
- Regression testing (1 slide)
- Pruned Features ?
28. New Features (Search)
- Speed
- Further enhancement of CIGMMS
- BBI tree implementation (by Dave, in SphinxTrain)
- Search
- FST search
- Full triphone implementation in decode_anytopo
- Separation of search abstraction/implementation in 3.X
29. New Features (Adaptation)
- Adaptation
- Multiple classes for MLLR (by Dave)
- MAP adaptation (by Dave, in SphinxTrain)
30. New Features (Others)
- New executables
- lm_convert
- lm3g2dmp
- dp
- If Evandro asks, "Why do we need dp in Sphinx 3?"
- Say this: "I don't know, we found the executable at ./s3/src/misc/dp.c"
- conf
- Off-line word-level confidence annotation program
- Mismatch dict-LM
- Unmatched entries could be automatically generated (-lts_mismatch)
31. Gentle, mild and simple re-factoring (GMM computation)
- GMM computation is now shared among
- decode, decode_anytopo, align, allphone
- So e.g.
- decode_anytopo could use fast GMM computation
- decode could use SCHMM
32. Gentle, mild and simple re-factoring (Search)
- Its consequences in search programming
- FST, Flat, Tree search now share the same interface (decode)
- Just like Sphinx 2 and 4
- Writing a new search won't mean replacing a search
- 2nd stage now works for decode
- Alright, not for FST search
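The idea of several searches behind one interface can be sketched as below. This is a minimal illustration of the design, not the Sphinx 3 code: the class names, the `decode` method signature, and the `make_search` factory are hypothetical, and only the mode numbers 3 (flat lexicon) and 4 (tree lexicon) are taken from the slides.

```python
from abc import ABC, abstractmethod


class Search(ABC):
    """Hypothetical shared interface that every search implements."""

    @abstractmethod
    def decode(self, frames):
        """Return a hypothesis string for a sequence of feature frames."""


class FlatLexiconSearch(Search):
    def decode(self, frames):
        return f"flat-lexicon hypothesis over {len(frames)} frames"


class TreeLexiconSearch(Search):
    def decode(self, frames):
        return f"tree-lexicon hypothesis over {len(frames)} frames"


# New searches are registered alongside existing ones, not in place of them.
SEARCH_BY_MODE = {3: FlatLexiconSearch, 4: TreeLexiconSearch}


def make_search(op_mode):
    """Pick a search implementation from an operation mode number."""
    try:
        return SEARCH_BY_MODE[op_mode]()
    except KeyError:
        raise ValueError(f"unsupported op_mode: {op_mode}") from None


decoder = make_search(3)
result = decoder.decode([[0.0] * 13] * 5)
print(result)
```

Adding another search then means adding one class and one table entry, while callers keep using the same `decode` call.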
33. Gentle, mild and simple re-factoring (Others)
- Score output is now rationalized
- Several bugs causing seg faults are eliminated
- vithist.c bugs
- Class-based LM is now working correctly
- Command-line options among applications are now synchronized and re-factored
34. Documentation/Tutorial
- Hieroglyph
- Now writing 2nd draft
- Doxygen documentation
- (by Evandro) Tutorial now works
- archive_s3
- Sphinx 2
- Sphinx 3
- Sphinx 4
35. Regression Testing
- Our weakest link
- Now daily
- Standard regression test is done
- Performance check on Communicator/TIDIGITS/TI46
- doxygen documentation will be made and tested
- make check now has 50 tests (3.5 had 11)
- fairly robust to careless mistakes
36. Expected Trimmed Features
- Search
- Mode 0 alignment
- (?) Mode 1 allphone
- Mode 5 word tree copies
- If full triphone in Ravi's tree search can't be completed quickly, trim it as well
- (?) Yitao's PCFG rescoring
37. Conclusion of Sphinx 3.X (From X=5 to X=6)
- We have done something
- Development last year
- has enriched the code
- made a lot of things internal to the code nicer
- There are hiccups in our development
- Not perfect
- Well, compare this with NASDAQ.
38. Discussion: What should we do now?
- Option 1, keep on working without release
- Option 2, merge the crazy branch with the trunk without release
- Option 3, merge the crazy branch with the trunk and create release-candidate Sphinx 3.6 RCI
39. End