CALO Decoder Progress Report for March - PowerPoint PPT Presentation

1
CALO Decoder Progress Report for March
  • Arthur (Decoder and ICSI Training)
  • Jahanzeb (Decoder)
  • Ziad (ICSI Training)
  • Moss (ICSI Training)
  • Carnegie Mellon University
  • Apr 13, 2004

2
This Presentation
  • Progress report for March
  • In February
  • Batch mode recognizer completed
  • Live-mode recognizer didn't work
  • In March
  • More decoder work
  • Speed, Accuracy, Interface.
  • ICSI transcription conversion task
  • Resources, Conversion Scripts
  • Miscellaneous efforts in improving the decoder
  • Contact with other groups, web page(s), manual.

3
Decoder work (Speed)
  • By Arthur and Jahanzeb
  • Sphinx 3.4 starts to work reasonably in
    Communicator task
  • 1G 1.1xRT, 2G 0.48xRT
  • Phoneme look-ahead research completed
  • 15-20% gain when CI GMMs are applied
  • Will incorporate as a functionality
  • Outlook for April
  • Machine Optimization (Still there!)
  • WSJ evaluation
  • Publish a technical-report version of the
    results.
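
For readers unfamiliar with the "xRT" figures on this slide: a real-time factor is decoding time divided by audio duration, so values below 1.0 mean faster than real time. A minimal sketch, where the function name and the timings in the usage note are illustrative assumptions and not measurements from this report:

```python
def real_time_factor(decode_seconds, audio_seconds):
    """Return decoding time divided by audio duration (the "xRT" figure)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds
```

For example, decoding a 10-second utterance in 4.8 seconds gives `real_time_factor(4.8, 10.0)`, roughly 0.48xRT.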

4
Decoder work (Accuracy)
  • First comparison between s2 and s3.4
  • S3.0 S2 > S3.3 > S3.4
  • Not the fairest comparison
  • S3 model is trained by female speakers only
  • S3 model is less tuned
  • Outlook for April
  • Learn how to do training. Do a fairer comparison.
  • Change search structure.

5
Decoder work (Interface)
  • Live-mode decoder works
  • Live-mode recognizer interface is still poorer
    than S2
  • No config file yet.
  • Many users complained (Well, actually 2-3 of
    them)
  • Outlook for April
  • Focus on building better API-interface and
    command-line interface.
  • Jahanzeb will be there while Arthur is working on
    training.

6
ICSI Training
  • Transcription Conversion Task
  • By Moss, Ziad and Arthur
  • Completion of Resource
  • <VocalSound> mapping (100%)
  • <NonVocalSound> mapping (100%)
  • OOV (20%)
  • Conversion script (90%)

7
ICSI Transcription: What does it look like?
  • <Segment StartTime="41.311" EndTime="43.773"
    Participant="me013" DigitTask="true">
  • three six two four three zero seven
    <Comment Description="Digits"/>
  • </Segment>
  • <Segment StartTime="0.931" EndTime="3.611"
    Participant="me034">
  • <VocalSound Description="whistling"/>
  • </Segment>
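
The two segments on this slide can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the enclosing `<Transcript>` root element and the `read_segments` helper are assumptions made for illustration, since the slide shows only the `<Segment>` elements and not the full file layout.

```python
import xml.etree.ElementTree as ET

# The two segments from the slide, wrapped in a hypothetical root element
# so that the snippet is a well-formed XML document.
SAMPLE = """<Transcript>
<Segment StartTime="41.311" EndTime="43.773" Participant="me013" DigitTask="true">
three six two four three zero seven <Comment Description="Digits"/>
</Segment>
<Segment StartTime="0.931" EndTime="3.611" Participant="me034">
<VocalSound Description="whistling"/>
</Segment>
</Transcript>"""


def read_segments(xml_text):
    """Return one dict per <Segment>: timing, speaker, words, and child tags."""
    root = ET.fromstring(xml_text)
    segments = []
    for seg in root.iter("Segment"):
        segments.append({
            "start": float(seg.get("StartTime")),
            "end": float(seg.get("EndTime")),
            "participant": seg.get("Participant"),
            # Collapse all text inside the segment into single-spaced words.
            "text": " ".join("".join(seg.itertext()).split()),
            # Names of the direct child tags, e.g. Comment or VocalSound.
            "tags": [child.tag for child in seg],
        })
    return segments
```

Calling `read_segments(SAMPLE)` yields two entries: the digit string spoken by `me013`, and an empty-text segment from `me034` that carries only a `VocalSound` tag.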

8
XML tags conversion
  • The transcription is more detailed than necessary.
  • Current Treatment
  • <Comment>: Ignore whole sentence. Too many
    occurrences, too many varieties.
  • <Emphasis>: Ignore.
  • <Pronounce>: Replace by GARBAGE
  • <Foreign>: Ignore whole sentence. Too few
    occurrences. Don't want to care
  • <Uncertain>: Replace by GARBAGE
  • <VocalSound>, <NonVocalSound>: Use mapping.
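
The treatment rules above could be sketched roughly as follows. This is not the team's conversion script: `SOUND_MAP` and its entries are hypothetical stand-ins for the real mapping resource, "ignore" for Emphasis is read here as dropping the tag while keeping its words, and falling back to GARBAGE for unmapped sounds is also an assumption.

```python
import xml.etree.ElementTree as ET

GARBAGE = "GARBAGE"

# Hypothetical mapping entries; the real mapping resource is not shown in the slides.
SOUND_MAP = {"whistling": "++WHISTLE++", "door slam": "++NOISE++"}


def convert_segment(seg):
    """Return the converted word string, or None if the whole segment is ignored."""
    # Comment and Foreign: ignore the whole sentence.
    if seg.find(".//Comment") is not None or seg.find(".//Foreign") is not None:
        return None

    words = []

    def walk(node):
        if node.text:
            words.extend(node.text.split())
        for child in node:
            if child.tag in ("Pronounce", "Uncertain"):
                # Replace the tagged material by the GARBAGE token.
                words.append(GARBAGE)
            elif child.tag in ("VocalSound", "NonVocalSound"):
                # Use the mapping; the GARBAGE fallback is an assumption.
                words.append(SOUND_MAP.get(child.get("Description"), GARBAGE))
            else:
                # Emphasis (and any other tag): drop the tag, keep the words.
                walk(child)
            if child.tail:
                words.extend(child.tail.split())

    walk(seg)
    return " ".join(words)
```

Under these assumptions, the digit segment from the previous slide would be dropped because it contains a `<Comment>`, while the whistling segment would become a single mapped token.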

9
Plain-text Normalization
  • After XML Conversion
  • I I am no- , I mean C-zero
  • "-" can mean:
  • "-": Interruption/Interjection marks
  • "-XXX" or "XXX-": Broken words
  • "XXX-XXX": Hyphenated words
  • AM transcription
  • Get rid of all pronunciations and leave broken
    words alone
  • LM transcription
  • Interruption marks and broken words will be
    removed
  • (Optional) Leave interruption marks there.
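
A minimal sketch of how the three uses of "-" might be told apart and how the AM and LM outputs could differ. The function names are illustrative, and two details are assumptions not stated on the slide: tokens are split on whitespace, and punctuation-only tokens such as "," are dropped in both outputs.

```python
def classify(token):
    """Classify a whitespace-delimited token by how it uses '-'."""
    if token == "-":
        return "interruption"
    if token.startswith("-") or token.endswith("-"):
        return "broken"
    if "-" in token:
        return "hyphenated"
    return "word"


def is_punctuation_only(token):
    """True when the token contains no letter or digit at all."""
    return not any(ch.isalnum() for ch in token)


def normalize_am(text):
    """AM transcription: keep broken words, drop punctuation-only tokens."""
    return " ".join(t for t in text.split() if not is_punctuation_only(t))


def normalize_lm(text, keep_interruptions=False):
    """LM transcription: remove broken words and, by default, interruption marks."""
    out = []
    for tok in text.split():
        kind = classify(tok)
        if kind == "broken":
            continue
        if kind == "interruption":
            if keep_interruptions:
                out.append(tok)
            continue
        if is_punctuation_only(tok):
            continue
        out.append(tok)
    return " ".join(out)
```

On the slide's example, `normalize_am("I I am no- , I mean C-zero")` keeps the broken word `no-`, while `normalize_lm` on the same string drops it; the hyphenated `C-zero` survives in both.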

10
XML conversion script
  • Functionalities
  • Optional conversion
  • Resource (dict/mapping/rules) read-in
  • XML parser
  • Generate both transcription and control file for
    close-talking microphones
  • Generate both LM and AM transcription
  • TODO
  • Incorporate Ziad's script
  • Correct timing information
  • Generation of far-field channels
  • Fix small bugs.

11
Outlook for the ICSI training task in April
  • Complete OOVs transcription (Arthur, Moss and
    Ziad)
  • Fix bugs in conversion script (Arthur)
  • Learn AM training (Ziad and Arthur)
  • LM training (Moss)
  • Fix potential problems in SphinxTrain.

12
Miscellaneous (Contact with other groups)
  • Want to seek a better interface for Sphinx
  • Try to contact other groups to see what's up
  • XVoice-sphinx,
  • command-and-control application that tried to
    use Sphinx.
  • Actually it does dictation.
  • Not very happy with Sphinx after using Sphinx's
    default AM and LM in command-and-control
  • OSSRI
  • No clear goal yet
  • Start to gather funding.
  • Don't really like Sphinx because Sphinx is
    poorer than ViaVoice in C&C

13
We need to help them more
  • We need better
  • Release (to replace s3.3)
  • After WSJ evaluation, S3.4 will be officially
    released to replace the current S3.3
  • Sphinx web page (also CMU web page)
  • Sphinx's web page needs to have a more unified
    theme.
  • Task force will be gathered after ICSLP 2004.
  • Manual
  • Need to provide basic education to developers and
    hard-core hackers.
  • Wrote the first outline of the manual.
  • First draft will appear within a quarter.

14
Summary
  • Still need to build good model for ICSI first.
    (Arthur/Ziad/Moss)
  • Training is also critical to understanding why
    S2 > S3.3.
  • Better everything for the decoder
  • Arthur/Jahanzeb -> 50/50
  • Other items are always on my priority queue and
    will pop up at the right time.