3rd Progress Meeting For Sphinx 3.6 Development

Provided by: Arthu61

Learn more at: http://www.cs.cmu.edu

1
3rd Progress Meeting For Sphinx 3.6 Development

Arthur Chan,
David Huggins-Daines,
Yitao Sun
Carnegie Mellon University
Jan 25, 2006

2
This meeting

3rd Progress report on 3.6 development (40 pages)
Agenda
What happened in Fall 2005? (4 slides)
Progress of Sphinx Development in Fall 2005 (17 slides)
Summary of Progress in 2005 (10 slides)
Discussion: Should we create one release candidate? (1 slide)

3
What happened in FALL 2005?
4
What happened in Fall 2005?

Major Events in Sphinx Development
We joined GALE in Oct 2005
Conformance of the recognizers (Sphinx 3 and Sphinx 4) becomes an issue
Lack of advanced acoustic modeling techniques becomes very glaring
Sphinx 3 and 4 have gone through bug fixes.
CALO effort is now split in two
Off-line recognizer requires major improvement in LM and AM.
AM issue is shared with GALE
On-line recognizer (CALO jargon: Smartnote)
Now has new LM and AM
Requires significant development work

5
Time distribution (Estimated)

Arthur
50% on GALE, 20% on CALO, 30% on Sphinx
Dave
65% on CALO, 30% on PocketSphinx, 5% on Sphinx
Yitao
90% on CALO, 10% on Sphinx

6
The Two Funded Projects

Upside
They point to issues that need to be solved
Need significant reprioritization of tasks
Balance of effort on the 2 projects is now achieved
Downside
Code development of Sphinx becomes a slower process
Also, we haven't released s3 for a while
-> Should we release the code now?
Tired students and staff can be found everywhere

7
Progress of Sphinx 3.6 in FALL 2005
8
Overview

Work on second-stage
Merging of best-path search into the 2nd stage of tree search
IBM lattice generation
word confidence estimation
Behavior changes and bug fixes
Treatment of acoustic scores
Assertion in vithist.c
Attempts in search algorithm improvements
Mode 3: Flat lexicon decoding
Mode 4: Tree lexicon decoding
Sphinx on Mandarin and coded language.
New tools: conf, dp

9
Work Schedule

Sep 1 to Oct 1
Implementation of triphones in flat lexicon decoder
Oct 1 to Nov 1
Implementation of triphones in tree lexicon decoder (incomplete)
Nov 1 to Dec 8
IBM lattice generation
Confidence score generation
Fixed issues in scores
Dec 8 to Jan 3
Concept of vacation was tried
Jan 3 to now
Fixed bugs, prepared release.

10
Second-stage Processing

Best-path search can now be specified in decode
Implementation requires write back. (urgh.)
Recognizer can now generate lattices in IBM format
IBM format attaches the word to the link
Sphinx format attaches the word to the node.
Scores are normalized with best senone scores
Rong's confidence-based routine is now in Sphinx: conf
Goodies: uses the Sphinx logs3 routine -> significantly reduces alpha-beta score mismatch.
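
The difference between the two lattice conventions can be sketched with a small in-memory example. This is only an illustration: the data layout is hypothetical (it is not the actual Sphinx or IBM file format), and taking the link's word from its destination node is an assumption made for the sketch.

```python
# Hypothetical node-labeled lattice: node id -> word, plus (src, dst) links.
def node_to_link_labeled(node_words, links):
    """Return links as (src, dst, word).

    Assumption for this sketch: each link takes the word of the node it enters.
    """
    return [(src, dst, node_words[dst]) for src, dst in links]

node_words = {0: "<s>", 1: "hello", 2: "hollow", 3: "world", 4: "</s>"}
links = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
link_labeled = node_to_link_labeled(node_words, links)
```

With words on links, two competing hypotheses between the same pair of time points become two parallel links rather than two separate nodes.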

11
Second-stage Processing (cont.)

Further work
Best-path generation doesn't conform to past 3.5
-> Bugs caused by 3.6 development
Also, the best path is not always in the lattice
-> Legacy bug
Confidence-based method
Lattice-based method can currently only be used off-line
10% of the data still have alpha-beta mismatch
Consensus network generation needs special focus
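
For readers unfamiliar with the alpha-beta check, here is a minimal sketch of forward-backward scoring on a tiny lattice. It uses natural-log floats and a plain log-add, not the Sphinx logs3 routine, and the lattice and its scores are made up for illustration. A "mismatch" in this sense would be the total forward score at the end node disagreeing with the total backward score at the start node.

```python
import math

def log_add(a, b):
    """Return log(exp(a) + exp(b)), computed stably in the log domain."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def forward_backward(n_nodes, links):
    """links: (src, dst, log_score) with src < dst; node 0 is the start
    node and node n_nodes - 1 is the end node."""
    alpha = [-math.inf] * n_nodes
    beta = [-math.inf] * n_nodes
    alpha[0] = 0.0
    beta[-1] = 0.0
    for src, dst, score in sorted(links):  # forward pass, by source node
        alpha[dst] = log_add(alpha[dst], alpha[src] + score)
    for src, dst, score in sorted(links, key=lambda l: -l[1]):  # backward pass
        beta[src] = log_add(beta[src], score + beta[dst])
    return alpha, beta

def link_posteriors(n_nodes, links):
    """Posterior of each link: alpha(src) + score + beta(dst) - total."""
    alpha, beta = forward_backward(n_nodes, links)
    total = alpha[-1]
    return [math.exp(alpha[s] + w + beta[d] - total) for s, d, w in links]

links = [(0, 1, math.log(0.6)), (0, 2, math.log(0.4)),
         (1, 3, math.log(0.5)), (2, 3, math.log(0.5))]
alpha, beta = forward_backward(4, links)
posteriors = link_posteriors(4, links)
```

On a well-formed lattice `alpha[-1]` and `beta[0]` agree up to rounding, and the posteriors of the links leaving the start node sum to one.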

12
Scores we see (Change 1)

Tree search now truly generates un-normalized scores.
Previously they were normalized by the ending frame only
Caused by a bug introduced in mid-2005
All 1st-stage searches use the same score logging functions
Including align, allphone, decode_anytopo, decode
matchseg_write, match_write are the current versions
log_ is still used but will soon be totally replaced

13
Scores we see (Change 2)

Multi-stream GMM computation (ms_gauden)
By default, it no longer quantizes log pdf to 8 bits
Single-stream GMM computation
Vectors with zero means and variances are removed
(-remove_zero_var_gau)
Scores and performance will change
Testing resource has changed.
(Evandro grins at this point)

14
Scores we see (Change 3)

Sphinx now supports generation of different hypseg formats (-hypseg_fmt)
SPHINX 2 format
SPHINX 3 format
ctm format
Always requires more processing, but it is better than nothing.
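
As an illustration of the "more processing" involved, the sketch below turns frame-based word segments into NIST CTM lines (file, channel, start time, duration, word). The segment tuples, the utterance id, and the frame rate of 100 frames per second are assumptions for the example, not the decoder's actual output structures.

```python
def to_ctm(utt_id, segments, frames_per_sec=100, channel="1"):
    """segments: (word, start_frame, end_frame), end frame inclusive.

    frames_per_sec=100 is an assumed frame rate for this sketch.
    """
    lines = []
    for word, start_frame, end_frame in segments:
        start = start_frame / frames_per_sec
        duration = (end_frame - start_frame + 1) / frames_per_sec
        lines.append(f"{utt_id} {channel} {start:.2f} {duration:.2f} {word}")
    return lines

ctm_lines = to_ctm("utt001", [("hello", 0, 49), ("world", 50, 119)])
```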

15
Scores: a summary

Unnormalized (true) acoustic and language scores are generated (-hypsegscore_unscale) by
1st-stage search and
Best-path search right after the 1st stage
Normalized acoustic scores are generated by
Lattice generation
If developers want to have true scores in the lattice
Developers can get the best senone scores from the decoder (bestsenscrdir) and do their own processing
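
The "own processing" could look roughly like the sketch below. It assumes that normalization subtracts the best senone score in each frame, so the true score of a segment is the normalized score plus the sum of the per-frame best senone scores over that segment. The function name and the in-memory list are hypothetical; the actual layout of the files written under bestsenscrdir is not described here.

```python
def unnormalize(norm_score, best_senone_scores, start_frame, end_frame):
    """Add back the per-frame best senone scores over an inclusive frame range.

    Assumes normalized = true - sum(best senone score per frame).
    """
    return norm_score + sum(best_senone_scores[start_frame:end_frame + 1])

# Made-up integer log scores, one best senone score per frame.
best_senone_scores = [-100, -120, -90, -110]
true_score = unnormalize(-30, best_senone_scores, 1, 2)
```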

16
Other important bug fixes

Bug in vithist.c
Caused an assertion failure and stopped the recognizer
Now fixed; an error message is returned to the search abstraction routine.

17
Attempts in search algorithm improvements (Mode 3)

Flat-lexicon decoder
Search implementation is completed
decode can now use flat-lexicon decoding
-op_mode 3
Decoder revamping is completed
Mode 2 (FST)
Mode 3 (Flat-lexicon)
Mode 4 (Ravi's Tree-Lexicon)
Mode 5 (Arthur's Tree-Lexicon)
decode_anytopo is still there for backward compatibility
decode_anytopo decodes in mode 3

18
No Further Re-factoring

Avoid re-factoring before next check-in
Align and allphone have different input/output file formats
It doesn't make sense to stuff them into a single executable.
Using an XML configuration and control file would be an option
But it takes too much time to implement

19
Algorithmic Work - Flat Lexicon Decoder

Full triphone completed in flat-lexicon decoding
2.5% relative improvement in accuracy
But requires 100xRT (urgh)
Useful for debugging
Also considered full trigram implementation
Will result in another 5-10 times slowdown
Conclusion
Flat lexicon search has come to its limit
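
To make the two figures above concrete, here is a small worked sketch of how a relative improvement and a real-time factor are computed. The numbers are hypothetical and chosen only to reproduce the magnitudes quoted (2.5% relative, 100xRT); expressing the improvement through word error rate is an assumption.

```python
def relative_improvement(baseline_wer, new_wer):
    """Fraction of the baseline error rate that was removed."""
    return (baseline_wer - new_wer) / baseline_wer

def real_time_factor(decode_seconds, audio_seconds):
    """Decoding time divided by audio duration (xRT)."""
    return decode_seconds / audio_seconds

# Hypothetical numbers: WER drops from 20.0% to 19.5%,
# and 10 s of audio takes 1000 s to decode.
gain = relative_improvement(20.0, 19.5)
xrt = real_time_factor(1000.0, 10.0)
```

A further 5-10 times slowdown from full trigrams would put the same decoder at 500 to 1000xRT.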

20
Algorithmic Work - Tree Lexicon Decoder

Current full triphone implementation
Has flaws in score propagation
Tree copies
-> No time to do it at all; Q4's workload nearly killed AC
Benchmarking results
GALE results
Full Lexicon Tree Lexicon
CALO/Communicator results
Tree Lexicon 5% relative poorer.
Conclusion
Half a year on search is expected to give us another 5%
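
As background on why the two decoders differ in cost, the sketch below contrasts a flat lexicon (one phone chain per word) with a tree lexicon (words sharing phone prefixes). It is a toy illustration with a made-up three-word lexicon, not the decoder's data structures.

```python
def count_flat_nodes(lexicon):
    """Flat lexicon: every word keeps its own chain of phones."""
    return sum(len(phones) for phones in lexicon.values())

def count_tree_nodes(lexicon):
    """Tree lexicon: words with a common phone prefix share those nodes."""
    root = {}
    count = 0
    for phones in lexicon.values():
        node = root
        for phone in phones:
            if phone not in node:
                node[phone] = {}
                count += 1
            node = node[phone]
    return count

# Made-up pronunciations for illustration.
lexicon = {
    "start": ["S", "T", "AA", "R", "T"],
    "star": ["S", "T", "AA", "R"],
    "stop": ["S", "T", "AA", "P"],
}
flat_nodes = count_flat_nodes(lexicon)
tree_nodes = count_tree_nodes(lexicon)
```

The tree needs fewer nodes, but the word identity is unknown until deep in the tree, which is what makes exact cross-word triphone and language model handling harder there.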

21
Conclusion on Search

Need to seriously consider
Is working on search a good idea?
In both CALO/GALE, gains come from
SAT and cross adaptation
Second-stage processing
Confusion network
Confidence annotation
First-stage SD -> Second-stage SA
VTLN
also only gives 5% rel.
but it only takes 5 days to implement
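
For context on why VTLN is cheap to implement, here is a minimal sketch of one common piecewise-linear frequency warp. It is a generic textbook form, not necessarily what was or would be implemented in Sphinx; the function name, the 8000 Hz upper frequency, and the 0.85 cutoff ratio are all assumptions.

```python
def vtln_warp(freq, alpha, f_max=8000.0, cut_ratio=0.85):
    """Scale by alpha below the cutoff, then follow a straight line so that
    f_max still maps to f_max (keeps the warp inside the band)."""
    f_cut = cut_ratio * f_max
    if freq <= f_cut:
        return alpha * freq
    slope = (f_max - alpha * f_cut) / (f_max - f_cut)
    return alpha * f_cut + slope * (freq - f_cut)

warped = [vtln_warp(f, 1.1) for f in (0.0, 1000.0, 8000.0)]
```

The warp factor alpha is typically chosen per speaker from a small grid of candidates, which is why the front-end change itself is small.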

22
Sphinx on Different Text Encodings

There is already non-CMU work for
Spanish
French
Big question mark
Could it work on other encodings?

23
Sphinx on Mandarin (gb2312)
24
Sphinx on Mandarin (cont.)

Thanks to Ravi
Bugs we fixed to get it through
1236322 libutil\str2words special character bug
1236166 special character wasn't supported
This should give us a fairly good foundation to start on most languages
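
The details of the two fixes are not given here, so the following is only a sketch of the general requirement: word splitting must treat the bytes of a multi-byte character as ordinary word content. In GB2312 (EUC-CN) both bytes of a two-byte character have the high bit set, so splitting on ASCII whitespace alone never cuts a character in half. The function name is hypothetical.

```python
def split_words(line_bytes):
    """Split on ASCII whitespace only; multi-byte characters stay intact."""
    return line_bytes.split()

line = "你好 世界\tsphinx".encode("gb2312")
tokens = [token.decode("gb2312") for token in split_words(line)]
```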

25
Summary of Sphinx in Fall 2005

We have done something
Strong focus on search research doesn't seem to get us far.
Fire to fight on the modeling side
Sounds like the time to check in and move on

26
Progress of Sphinx 3.X (From X5 to X6)
27
Progress of Sphinx 3.X (From X5 to X6)

New Features (4 slides)
Items that are significant
Gentle, mild and simple re-factoring and its consequence (4 slides)
Documentation (1 slide)
Regression testing (1 slide)
Pruned Features ?

28
New Features (Search)

Speed
Further enhancement of CIGMMS
BBI tree implementation (by Dave, in SphinxTrain)
Search
FST search
Full triphone implementation in decode_anytopo
Separation of search abstraction/implementation in 3.X

29
New Features (Adaptation)

Adaptation
Multiple classes for MLLR (by Dave)
MAP adaptation (by Dave, in SphinxTrain)
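
As a reminder of what multiple regression classes mean for MLLR mean adaptation, the sketch below applies a separate affine transform (mu' = A mu + b) to each Gaussian mean according to its class. The function name, the plain-list data layout, and the numbers are hypothetical; this is not the Sphinx or SphinxTrain code.

```python
def apply_mllr(means, classes, transforms):
    """means: list of mean vectors; classes[i]: regression class of mean i;
    transforms[c]: (A, b), a square matrix and a bias vector for class c."""
    adapted = []
    for mu, c in zip(means, classes):
        A, b = transforms[c]
        adapted.append([sum(a * m for a, m in zip(row, mu)) + bias
                        for row, bias in zip(A, b)])
    return adapted

# Made-up 2-dimensional example with two regression classes.
means = [[1.0, 2.0], [3.0, 4.0]]
classes = [0, 1]
transforms = {
    0: ([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]),
    1: ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]),
}
adapted_means = apply_mllr(means, classes, transforms)
```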

30
New Features (Others)

New executables
lm_convert
lm3g2dmp
dp
If Evandro asks, "Why do we need dp in Sphinx 3?"
Say this: "I don't know, we found the executable at ./s3/src/misc/dp.c"
conf
Off-line word-level confidence annotation program
Mismatch dict-LM
Unmatched entries can be automatically generated (-lts_mismatch)
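
The first half of handling a dictionary/LM mismatch is simply finding the LM words that have no pronunciation. The sketch below shows only that step, with hypothetical names and data; generating the missing pronunciations with letter-to-sound rules is not shown.

```python
def unmatched_words(lm_vocab, dictionary):
    """Return LM words that have no entry in the pronunciation dictionary."""
    return sorted(word for word in lm_vocab if word not in dictionary)

# Made-up vocabulary and dictionary for illustration.
lm_vocab = {"hello", "world", "sphinx"}
dictionary = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}
missing = unmatched_words(lm_vocab, dictionary)
```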

31
Gentle, mild and simple re-factoring (GMM computation)

GMM computation is now shared among
decode, decode_anytopo, align, allphone
So e.g.
decode_anytopo could use fast GMM computation
decode could use SCHMM

32
Gentle, mild and simple re-factoring (Search)

Its consequence in search programming
FST, Flat, Tree search now share the same
interface (decode)
Just like Sphinx 2 and 4
Writing a new search no longer means replacing an existing search
2nd stage now works for decode
Alright, not for FST search
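
The idea of several searches behind one interface can be sketched as follows. The class and function names are hypothetical and the bodies are placeholders; only the mode numbers (3 for flat lexicon, 4 for tree lexicon) come from the mode list given earlier.

```python
class Search:
    """Common interface that every search implementation follows."""
    name = "abstract"

    def decode(self, utterance):
        raise NotImplementedError

class FlatLexiconSearch(Search):
    name = "flat-lexicon"

    def decode(self, utterance):
        return f"[{self.name}] decoded {utterance}"  # placeholder result

class TreeLexiconSearch(Search):
    name = "tree-lexicon"

    def decode(self, utterance):
        return f"[{self.name}] decoded {utterance}"  # placeholder result

# Adding a new search means adding an entry, not replacing an existing one.
SEARCH_BY_MODE = {3: FlatLexiconSearch, 4: TreeLexiconSearch}

def make_search(op_mode):
    """Pick the search implementation for a given operation mode."""
    if op_mode not in SEARCH_BY_MODE:
        raise ValueError(f"unsupported op_mode: {op_mode}")
    return SEARCH_BY_MODE[op_mode]()

result = make_search(3).decode("utt001")
```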

33
Gentle, mild and simple re-factoring (Others)

Score output is now rationalized
Several bugs causing seg faults are eliminated
vithist.c bugs
Class-based LM is now working correctly
Command lines among applications are now synchronized and re-factored

34
Documentation/Tutorial

Hieroglyph
Now writing 2nd draft
Doxygen documentation
(by Evandro) Tutorial now works
archive_s3
Sphinx 2
Sphinx 3
Sphinx 4

35
Regression Testing

Our weakest link
Now daily
Standard regression test is done
Performance check on Communicator/TIDIGITS/TI46
Doxygen documentation will be made and tested
make check now has 50 tests (3.5 had 11)
fairly robust to careless mistakes

36
Expected Trimmed Features

Search
Mode 0 alignment
(?) Mode 1 allphone
Mode 5 word tree copies
If full triphone in Ravi's tree search can't be done quickly, trim it as well
(?) Yitao's PCFG rescoring

37
Conclusion of Sphinx 3.X (From X5 to X6)

We have done something
Development last year
has enriched the code
Nicified a lot of things internal to the code
There are hiccups in our development
Not perfect
Well, compare this with NASDAQ.

38
Discussion: What should we do now?

Option 1, keep on working without release
Option 2, merge the crazy branch with the trunk without release
Option 3, merge the crazy branch with the trunk and create release-candidate Sphinx 3.6 RCI

39
End
