Less is More

Description:

There is no data like more data! Goal: use less to perform more. Partial results excerpt (truncated in the source):

                            CCTV    NTDTV   RFA     ALL
  Random (150h)             13.6    22.2    44.1    25.0
  Max-entropy (word char)   12.2    21.8    ...

1
Less is More?
  • Yi Wu
  • Alex Rudnicky

2
People
  • There is no data like more data!

3
Goal: Use less to perform more
  • Identify an informative subset of a large
    corpus for Acoustic Model (AM) training.
  • Expectations for the selected set:
      • Good performance
      • Fast selection

4
Motivation
  • System improvement becomes increasingly
    small as we keep adding data.
  • Training an acoustic model is time-consuming.
  • We need guidance on which data is most
    needed.

5
Approach Overview
  • Applied to well-transcribed data
  • Selection based on transcription
  • Choose a subset that has a uniform distribution over
    speech units (word, phoneme, character)

6
How to sample data wisely? A simple example
  • k Gaussian distributions with known priors π_i and
    unknown density functions f_i(μ_i, σ_i)

7
How to sample wisely? A simplified example
  • We are given access to at most N examples.
  • We may choose how many examples to draw from
    each class.
  • We train the model using maximum likelihood
    estimation (MLE).
  • When a new sample is generated, we use our model to
    determine its class.
  • Question: how should we sample to achieve minimum error?

8
The optimal Bayes Classifier
  • If we know the exact form of f_i(x), the Bayes
    classification rule is optimal.
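
For reference, the standard Bayes decision rule for this setup can be written with the priors π_i and densities f_i from the earlier slides. The symbol \hat{c}(x) for the predicted class is an assumption introduced here; it does not appear in the slides.

```latex
\hat{c}(x) = \arg\max_{i \in \{1, \dots, k\}} \; \pi_i \, f_i(x)
```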

9
To approximate the optimal
  • We use our MLE estimates.
  • The true error is bounded by the optimal Bayes
    error plus an error bound for our worst estimated
10
Sample Uniformly
  • We want to sample each class equally.
  • The selected data will have good coverage of each
    class.
  • This gives robust estimates for each class.
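
The toy setting from the preceding slides can be simulated to compare equal per-class sampling with sampling in proportion to the prior. The following is a minimal sketch, not the authors' experiment: the function names, the one-dimensional Gaussians, and the priors, means, and sample sizes are all hypothetical values chosen for illustration.

```python
import math
import random


def allocate(n_total, priors, uniform):
    """Return how many training examples to draw from each class."""
    k = len(priors)
    if uniform:
        return [n_total // k] * k
    # Proportional allocation; keep at least 2 examples per class.
    return [max(2, round(n_total * p)) for p in priors]


def fit_gaussian(samples):
    """MLE of the mean and standard deviation of a 1-D Gaussian."""
    mu = sum(samples) / len(samples)
    var = sum((x - mu) ** 2 for x in samples) / len(samples)
    # Floor the deviation so the log-density below stays finite.
    return mu, max(math.sqrt(var), 1e-6)


def log_density(x, mu, sigma):
    """Gaussian log-density up to an additive constant."""
    return -math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def error_rate(counts, priors, true_params, n_test, rng):
    """Train per-class MLE models, then measure the classification error."""
    models = [fit_gaussian([rng.gauss(mu, sd) for _ in range(n)])
              for n, (mu, sd) in zip(counts, true_params)]
    errors = 0
    for _ in range(n_test):
        # Draw the true class from the known prior, then a sample from it.
        c = rng.choices(range(len(priors)), weights=priors)[0]
        x = rng.gauss(*true_params[c])
        # Classify with the known priors and the estimated densities.
        scores = [math.log(p) + log_density(x, mu, sd)
                  for p, (mu, sd) in zip(priors, models)]
        errors += scores.index(max(scores)) != c
    return errors / n_test


# Hypothetical example values (assumptions, not from the slides).
priors = [0.9, 0.05, 0.05]
true_params = [(0.0, 1.0), (3.0, 1.0), (-3.0, 1.0)]
rng = random.Random(0)
for uniform in (True, False):
    counts = allocate(60, priors, uniform)
    print(counts, error_rate(counts, priors, true_params, 2000, rng))
```

Averaging the printed error rates over many random seeds would give a fairer comparison than a single run, since each run trains on only a handful of samples per class.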

11
The Real ASR System

12
Data Selection for ASR System
  • The prior has been estimated independently by
    the language model.
  • To make the acoustic model accurate, we want to
    sample W uniformly.
  • The unit can be a phoneme, character, or word.
    We want the distribution over units to be uniform.

13
Entropy Measure for Uniformity
  • Use the entropy of the word (phoneme) distribution as
    the evaluation measure.
  • Suppose the words (phonemes) have sample
    distribution p_1, p_2, ..., p_n.
  • Choose the subset that maximizes
    H = -p_1 log(p_1) - p_2 log(p_2) - ... - p_n log(p_n).
  • Maximizing entropy is equivalent to minimizing the KL
    divergence from the uniform distribution u:
    KL(p || u) = log(n) - H.
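
The entropy measure and its relation to the KL divergence from the uniform distribution can be checked numerically. This is a small sketch under two assumptions: the function names are made up for illustration, and n is taken to be the number of distinct units observed in the sample.

```python
import math
from collections import Counter


def entropy(units):
    """Shannon entropy (in nats) of the empirical distribution of `units`."""
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def kl_from_uniform(units):
    """KL(p || u), with u uniform over the distinct units seen in `units`."""
    counts = Counter(units)
    total = sum(counts.values())
    n = len(counts)
    return sum((c / total) * math.log((c / total) * n)
               for c in counts.values())


sample = ["a", "a", "a", "b", "c"]
n = len(set(sample))
# The two quantities differ only by the constant log(n).
print(entropy(sample) + kl_from_uniform(sample), math.log(n))
```

Because log(n) is fixed for a given unit inventory, ranking candidate subsets by entropy and ranking them by closeness to uniform give the same order.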

14
Computational Issue
  • It is computationally intractable to find the
    transcription set that maximizes the entropy.
  • Solution: forward greedy search
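
A forward greedy search of this kind can be sketched as follows. This is a naive illustration, not the authors' implementation: the function names are hypothetical, the budget is counted in utterances (the experiments in the slides count hours of audio), and each step rescans every remaining utterance.

```python
import math
from collections import Counter


def entropy_from_counts(counts, total):
    """Shannon entropy (in nats) of a count table with `total` tokens."""
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)


def greedy_select(utterances, budget):
    """Pick up to `budget` utterances, greedily maximizing word entropy.

    `utterances` is a list of transcriptions, each a list of word tokens.
    Returns the indices of the selected utterances in selection order.
    """
    selected = []
    remaining = set(range(len(utterances)))
    counts = Counter()
    total = 0
    while remaining and len(selected) < budget:
        best_idx, best_h = None, -1.0
        for idx in sorted(remaining):
            # Entropy of the selected set if this utterance were added.
            trial = counts + Counter(utterances[idx])
            h = entropy_from_counts(trial, total + len(utterances[idx]))
            if h > best_h:  # ties go to the lowest index
                best_idx, best_h = idx, h
        selected.append(best_idx)
        remaining.remove(best_idx)
        counts.update(utterances[best_idx])
        total += len(utterances[best_idx])
    return selected


corpus = [["a", "a"], ["a", "b"], ["c", "d"], ["a", "a", "a"]]
print(greedy_select(corpus, 2))  # -> [1, 2]
```

Each greedy step is locally optimal only, so the result is an approximation of the maximum-entropy subset rather than a guaranteed optimum.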

15
Combination
  • There are multiple entropies we want to maximize.
  • Combination methods:
      • Weighted sum
      • Add sequentially
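
The weighted-sum option can be sketched as a single score over the three unit types. The slides do not give the weights or the exact objective, so the function names and the default equal weights below are assumptions for illustration only.

```python
import math
from collections import Counter


def entropy(units):
    """Shannon entropy (in nats) of the empirical distribution of `units`."""
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def combined_score(words, phonemes, chars, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of word, phoneme, and character entropies.

    The weights are hypothetical tuning parameters, not values from the slides.
    """
    w_word, w_phone, w_char = weights
    return (w_word * entropy(words)
            + w_phone * entropy(phonemes)
            + w_char * entropy(chars))
```

A score like this could replace the single word entropy as the objective of a greedy selection loop, so that one candidate subset is judged on all three distributions at once.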

16
Experiment Setup
  • System: Sphinx III
  • Features: 39-dimensional MFCC
  • Training corpus: Chinese BN 97 (30 hr),
    GaleY1 (810 hr)
  • Test set: RT04 (60 min)

17
Experiment 1 (using word distribution)
Table 1

18
More Results

19
Experiment 2 (adding sequentially with phoneme and
character, 150 hr)
Table 2

20
Experiments 1 and 2

21
Experiment 3 (with VTLN)
Table 3

22
Summary
  • Choose data uniformly according to speech unit
  • Maximize entropy using greedy algorithm
  • Add data sequentially

Future Work
  • Combine Multiple Sources
  • Select Un-transcribed Data