Title: Covariation and weighting of harmonically decomposed streams for ASR
1. Covariation and weighting of harmonically decomposed streams for ASR
- Introduction
- Pitch-scaled harmonic filter
- Recognition experiments
- Results
- Conclusion
[Figure: production of /z/, with its aperiodic and periodic components labelled]
2. Motivation and aims
- Most speech sounds are either voiced or unvoiced, which have very different properties
  - voiced: quasi-periodic signal from phonation
  - unvoiced: aperiodic signal from turbulence noise
- Do these properties allow humans to recognize speech in noise?
- Maybe we can use this information to help ASR...
  - by computing separate features for the two parts.
- Are their two contributions complementary?
INTRODUCTION
http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/
3. Voiced and unvoiced parts of a speech signal
[Figure: aperiodic contribution and periodic contribution of a speech signal]
4. Pitch-scaled harmonic filter
[Diagram: the input s(n) is passed through the PSHF at successive time shifts, producing an aperiodic waveform and a periodic waveform]
METHOD
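The slides do not spell out the filter itself, so the following is only a minimal sketch of the underlying idea, not the PSHF as used in these experiments. It assumes the pitch period is already known, takes a frame spanning exactly four pitch periods (so that harmonics of the fundamental fall on every fourth DFT bin) and uses a rectangular window. The function name `harmonic_split`, the frame length and the toy signal are all illustrative assumptions.

```python
import numpy as np

def harmonic_split(frame, periods=4):
    """Split a frame spanning `periods` whole pitch periods into
    periodic and aperiodic parts (simplified, rectangular window)."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)
    # With `periods` pitch periods in the frame, harmonics of the
    # fundamental land on every `periods`-th DFT bin. Bin 0 (DC) is
    # kept with the periodic part here purely for simplicity.
    harmonic = np.zeros_like(spectrum)
    harmonic[::periods] = spectrum[::periods]
    periodic = np.fft.irfft(harmonic, n=len(frame))
    aperiodic = frame - periodic  # everything between the harmonics
    return periodic, aperiodic

# Toy signal (assumed): a sinusoid with a 40-sample pitch period plus noise.
rng = np.random.default_rng(0)
T0 = 40
n = np.arange(4 * T0)
voiced = np.sin(2 * np.pi * n / T0)
noise = 0.1 * rng.standard_normal(n.size)
periodic, aperiodic = harmonic_split(voiced + noise)
```

Sliding such a frame along the signal (the time shifting in the diagram) and recombining the per-frame outputs would give the two full-length waveforms; that step is omitted here.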
5. Decomposition example (waveforms)
[Figure panels: Original, Periodic, Aperiodic]
6. Decomposition example (spectrograms)
[Figure panels: Original, Periodic, Aperiodic]
7. Decomposition example (MFCC specs.)
[Figure panels: Original, Periodic, Aperiodic]
8. Speech database: Aurora 2.0
- From the TIdigits database of connected English digit strings (male and female speakers), filtered with G.712 at 8 kHz.
[Table panels: TRAIN, TEST]
9. Description of the experiments
- Baseline experiment: base
  - standard parameterisation of the original waveforms (i.e., MFCC, Δ, ΔΔ)
- PCA experiments: pca26, pca78, pca13 and pca39
  - decorrelation of the feature vectors, and reduction of the number of coefficients
- Split experiments: split, split1
  - adjustment of stream weights (periodic vs. aperiodic)
- Caveat: pitch values were derived from clean speech files, for the entire database!
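As a rough illustration of the PCA step, the sketch below stacks per-frame periodic and aperiodic coefficient vectors, decorrelates them with PCA and optionally truncates the result. The data are random stand-ins, and the layout of 13 coefficients per stream and the reading of the experiment names as output dimensions are assumptions. `fit_pca` and `project` are hypothetical helpers, not the tooling used in the experiments.

```python
import numpy as np

def fit_pca(features):
    """Fit PCA on frames (rows). Returns the mean, the principal
    directions (columns) and their variances, largest first."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    variances, vectors = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(variances)[::-1]
    return mean, vectors[:, order], variances[order]

def project(features, mean, components, keep):
    """Decorrelate the frames and keep the first `keep` components."""
    return (features - mean) @ components[:, :keep]

# Stand-in data (assumed): 500 frames, 13 coefficients per stream.
rng = np.random.default_rng(0)
periodic = rng.standard_normal((500, 13))
aperiodic = 0.5 * periodic + rng.standard_normal((500, 13))
stacked = np.hstack([periodic, aperiodic])  # 26 coefficients per frame

mean, comps, var = fit_pca(stacked)
pca26 = project(stacked, mean, comps, keep=26)  # decorrelation only
pca13 = project(stacked, mean, comps, keep=13)  # decorrelation + reduction
```

The sorted variances in `var` are the quantity one would plot to see how quickly the principal components fall off, and hence how many coefficients can be dropped.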
10. Parameterisations
11. Full-sized PCA results
RESULTS
12. Variance of Principal Components
[Figure panels: PCA26 and PCA39, each for clean and multi]
13. PCA26 experiments results
[Results panels: CLEAN, MULTI]
14. Summary of best PCA results
15. Split experiments results
16. Sample Split results
Note: for Split, the same stream weights were used in training as in testing.
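The slides do not say how the stream weights enter the recogniser. One common multi-stream HMM formulation raises each stream's output probability to the power of its weight, which amounts to a weighted sum of per-stream log-likelihoods. The sketch below assumes that formulation with a single diagonal Gaussian per stream; all names and numbers are illustrative.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def weighted_loglik(streams, weights):
    """Weighted sum of per-stream log-likelihoods.

    streams: list of (observation, mean, variance) tuples, one per stream.
    weights: one exponent weight per stream.
    """
    return sum(
        w * log_gauss_diag(x, m, v)
        for (x, m, v), w in zip(streams, weights)
    )

# Toy two-stream frame (assumed values): (observation, mean, variance).
periodic = ([0.2, -0.1], [0.0, 0.0], [1.0, 1.0])
aperiodic = ([1.5, 0.7], [0.0, 0.0], [1.0, 1.0])

equal = weighted_loglik([periodic, aperiodic], [1.0, 1.0])
periodic_heavy = weighted_loglik([periodic, aperiodic], [1.0, 0.5])
```

Lowering one stream's weight reduces its influence on the combined score, so varying the pair of weights is one way to trade the periodic stream off against the aperiodic one.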
17. Split1 experiments results
18. Summary of PCA and Split results
19. Conclusions
- PSHF module split Aurora's speech waveforms into two synchronous streams (periodic and aperiodic)
  - large improvements over the single-stream Baseline
- Split was better than all PCA combinations
  - PCA26/13 better than PCA78/39, and PCA13 best
  - Split1 marginally better than Split
- Periodic speech segments give robustness to noise.
- Further work
  - Modeling how best to combine the streams?
  - LVCSR: evaluate front end on TIMIT (phone recognition).
  - Robust pitch tracking
CONCLUSION
20. COLUMBO PROJECT: Harmonic decomposition applied to ASR
Philip J.B. Jackson (1), p.jackson@surrey.ac.uk
David M. Moreno (2), davidm@talp.upc.es
Javier Hernando (2), javier@talp.upc.es
Martin J. Russell (3), m.j.russell@bham.ac.uk