Title: Tonal Speech without Pitch
1 Tonal Speech without Pitch
- Jerry Zhu
- zhuxj_at_cs.cmu.edu
- 2003/7/3
2 What's in your mouth
Tony Robinson, http://mi.eng.cam.ac.uk/~ajr/SA95/node15.html
3 MFCC
Tony Robinson, http://mi.eng.cam.ac.uk/~ajr/SA95/node15.html
Focus on vocal tract shape (e.g. different vowels). No pitch.
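A rough sketch of why MFCC is usually said to carry vocal tract shape but little pitch: the spectrum is pooled into wide mel-spaced bands, and only the first few cepstral coefficients are kept, which smooths over the fine harmonic spacing set by pitch. Everything specific here is an assumption for illustration and not taken from the slides: the 256-sample frame, the Hamming window, 20 filters, 13 coefficients, and the common mel formula 2595·log10(1 + f/700). The smoothing is not perfect, especially at low frequencies where the filters are narrow.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spec.append(abs(s) ** 2)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    # Fractional FFT-bin position of each filter edge/center.
    bins = [mel_to_hz(m) * n_fft / sample_rate for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=20, n_ceps=13):
    """MFCC of one frame: power spectrum -> mel bands -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    # Keeping only the first n_ceps coefficients retains the smooth
    # spectral envelope and drops most of the fine structure.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for c in range(n_ceps)]

# Usage on a synthetic frame (two sinusoids, 8 kHz sampling; assumed values).
sr = 8000
frame = [math.sin(2 * math.pi * 200 * i / sr)
         + 0.5 * math.sin(2 * math.pi * 1200 * i / sr) for i in range(256)]
ceps = mfcc_frame(frame, sr)
```

The naive DFT keeps the sketch dependency-free; a real front end would use an FFT and process overlapping frames.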
4 Tonal languages
- Tone: variation in pitch, e.g. Mandarin, Thai
http://kca.org/education/ImageView.asp?ImageID=179
5 MFCC disastrous for tones?
- MFCC should have no pitch info.
- Bad for Mandarin speech recognition?
- Not really.
- Why?
6 Hypothesis 1
- Language context helps a lot?
- e.g. singing overrides pitch
- people do understand the lyrics (sort of)
7 Hypothesis 2
- MFCC retains some pitch?
- by imperfection
- residual pitch info used by speech recognizers
- Test: convert MFCC back to speech, listen for tones. (TBD)
8 Hypothesis 3
- Do we really need pitch to perceive tones?
- Test: whispered speech
- Can native speakers perceive tones in whispered speech?
Tony Robinson, http://mi.eng.cam.ac.uk/~ajr/SA95/node15.html
9 Minimum pairs
- A minimum pair: two 2-char words with only 1 tonal difference.
- Why not use one-char words? To prevent over-articulating.
- Why not use multi-char words? Hard to find min pairs.
10 Listener listens for the ORDER within each minimum pair
Whisperer file
Listener file
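One way the two files could be generated, as a minimal sketch: each pair is whispered in a randomized order, and the listener sheet always lists the pair in a fixed order so the listener only marks whether the order heard was as listed or reversed. The function names, file layout, and the two example pairs (toned pinyin) are hypothetical illustrations, not the word list or format used in the experiment.

```python
import random

# Illustrative pairs only (not the study's list): each differs in one tone.
PAIRS = [
    ("mai3 cai4", "mai4 cai4"),
    ("jiao4 shi4", "jiao4 shi1"),
]

def make_files(pairs, seed=0):
    """Return (whisperer lines, listener lines, answer key).

    key[i] is True when pair i is whispered in reversed order.
    """
    rng = random.Random(seed)
    whisperer, listener, key = [], [], []
    for a, b in pairs:
        swapped = rng.random() < 0.5
        first, second = (b, a) if swapped else (a, b)
        whisperer.append(f"{first} , {second}")
        listener.append(f"{a} / {b}   [ ] as listed   [ ] reversed")
        key.append(swapped)
    return whisperer, listener, key

def score(key, answers):
    """Count listener answers (True = 'reversed') that match the key."""
    correct = sum(k == a for k, a in zip(key, answers))
    return correct, len(key)

whisperer_lines, listener_lines, answer_key = make_files(PAIRS, seed=1)
```

Because the listener sees both words, the task isolates the tonal contrast: only the order has to be recovered from the whisper.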
11 Experiment setup
- Each whisperer/listener group works on about 100 different minimum pairs.
- In a quiet room, 1 meter apart. Each pair whispered once.
- Native speakers. (Liu J., Yu H., Zhang Y., Zhu X.)
12 What to expect
- If there is no tonal info in whisper, listeners would guess the order with 50% accuracy.
13 Result
14 Result significant?
- Flip a coin 3 times, 2 heads 1 tail. A biased coin?
- Chi-square test
- Accuracy significantly better than random at p < 0.0001 (that's really significant).
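The test against chance can be sketched as a chi-square goodness-of-fit with one degree of freedom, comparing correct/wrong counts to a 50/50 split. The function name is hypothetical, and the 70-of-100 count is a made-up illustration, not the experiment's result. The p-value uses the identity that, for one degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(x/2)).

```python
import math

def chi_square_vs_chance(correct, total):
    """Chi-square statistic and p-value against a 50/50 guessing rate."""
    expected = total / 2.0
    wrong = total - correct
    chi2 = ((correct - expected) ** 2 + (wrong - expected) ** 2) / expected
    # Upper-tail probability of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# The coin from the slide: 2 heads, 1 tail. Far from significant.
coin_chi2, coin_p = chi_square_vs_chance(2, 3)

# Hypothetical count for illustration only: 70 correct out of 100.
demo_chi2, demo_p = chi_square_vs_chance(70, 100)
```

With counts as small as the three coin flips, the chi-square approximation is poor and an exact binomial test would be preferable; the coin is included only to mirror the slide's intuition that a small sample cannot distinguish bias from luck.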
15 Accuracy breakdown
[Table: correct/total]
16 Accuracy breakdown
[Table: accuracy (%), significant at p < 0.002]
17 Summary
- People do perceive tonal differences without pitch.
- How?
- Strength (power)?
- Duration?
- Subtle vocal tract shape difference?
18 While we are whispering...
- Tonal difference (we've seen that)
- Voiced/unvoiced consonant? (time vs. dime)
- voice onset time
http://www.indiana.edu/~hlw/PhonUnits/consonants2.html
19 Voiced/unvoiced consonant
- p/b, t/d, k/g
- Mandarin speakers
- 94% accuracy
- Aspiration
20 Other languages?
- Thai
- Is tonal too: 5 tones.
- Has ph, p, b
- would be interesting!