Title: The Whisper Effect
1 - The Whisper Effect
11752 - Spring 2004 - Frank Lin
2 - Shhhhh
Why do people whisper? People whisper when they're telling others a scandalous secret. People whisper when they're asked to speak softly so as not to disturb others (remember your elementary school librarian?). People whisper when they're too weak to speak normally. People whisper when... (can you think of other reasons?) Whispering seems to be the most effective and efficient form of vocal communication when only people within a very short range of the speaker should hear what is said.
3 - Shhhhh
- So what exactly is whispering, or whisper speech?
- Is it just a softer, less intense version of regular speech?
- Why is whisper speech harder to understand, even when it is spoken right next to your ear?
- Would it be easier or more difficult to build a speech recognizer for whisper speech?
- Can different voices be recognized in whisper speech?
- How is a word stressed, or emphasized, in whisper speech?
- In an attempt to answer these questions, we have conducted a small experiment.
4 - The Experiment
- Four subjects were asked to:
  - Speak 10 medium-length sentences (5 to 12 words) as naturally and as clearly as possible.
  - Repeat the 10 sentences, but this time in whisper speech.
- The first 9 sentences cover each of the phonemes of American English at least once. The 10th sentence is repeated three times (in both regular and whisper speech), with a different word stressed each time.
5 - The Subjects
- This experiment was made possible by four dear volunteers:
  - A young female native English speaker from the Midwest
  - A young male native English speaker from eastern Canada
  - A young male native English speaker from the Southeast
  - A young male native English speaker from Texas
- These four subjects should give us a good idea of the differences between regular and whisper speech in North American English.
- Without further delay, let us look at the results.
6 - General Appearance of the Spectrograms
Spectrogram of the phrase "stole my house" in regular speech
7 - General Appearance of the Spectrograms
Spectrogram of the phrase "stole my house" in whisper speech
8 - General Appearance of the Spectrograms
At first glance, the spectrograms of whisper speech look like a string of fricative noises. Whisper speech is also much less intense than regular speech, which is consistent with whispering taking less energy than speaking. Now let's take a closer look at what happens to each type of phoneme when we whisper, starting with vowels.
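The "string of fricative noises" impression can be put into a number with spectral flatness: the geometric mean of the power spectrum divided by its arithmetic mean. It is close to 0 for a harmonic (tonal) spectrum and much higher for noise. The sketch below is not taken from the experiment. It assumes a 16 kHz sample rate, uses equal-amplitude harmonics of a hypothetical 120 Hz f0 as a stand-in for a voiced source, and uses white noise as a stand-in for the turbulent whisper source.

```python
import numpy as np

def spectral_flatness(x: np.ndarray) -> float:
    """Geometric mean / arithmetic mean of the power spectrum (near 0 = tonal, higher = noise-like)."""
    power = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    power = power[1:] + 1e-12  # drop the DC bin and avoid log(0)
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

fs = 16000                      # assumed sample rate
n = np.arange(int(0.05 * fs))   # one 50 ms frame
rng = np.random.default_rng(0)

# Assumed stand-in for a voiced source: harmonics of a hypothetical 120 Hz f0 up to ~7 kHz.
voiced = sum(np.cos(2 * np.pi * 120 * k * n / fs) for k in range(1, 60))
# Assumed stand-in for a whispered source: turbulence modeled as white noise.
whispered = rng.standard_normal(len(n))

print(f"voiced flatness:    {spectral_flatness(voiced):.3f}")
print(f"whispered flatness: {spectral_flatness(whispered):.3f}")
```

The harmonic frame leaves deep valleys between harmonics, which drags the geometric mean (and hence the flatness) toward zero; the noise frame has no such valleys.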
9 - Vowels: What's all that hissing noise?
"I lied a lot on Saturday" in whisper speech
10 - Vowels: What's all that hissing noise?
"Chang is not a China man" in whisper speech
11 - Vowels: What's all that hissing noise?
Regular speech: "stole my house" again, but this time notice the HH.
12 - Vowels: What's all that hissing noise?
Whisper speech: "stole my house" again. Can you tell where the HH starts and stops?
13 - Vowels: What's all that hissing noise?
A closer look at the vowels shows us something interesting: they all look like HHs! We know that HH is a very transparent phoneme: it does not warp the vowels around it. In fact, vowels seem to pass through HH, because we can still make out their formants. Now it seems as if all the whispered vowels are just HHs with different vowels passing through. Can you guess what the word "is" would sound like in whisper speech? Did you notice anything peculiar about the formants?
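One way to read "HHs with different vowels passing through" is the source-filter view: the vocal tract is still shaped for the vowel, and only the source changes from glottal pulses to turbulent noise. The sketch below illustrates that idea under stated assumptions and does not model the recordings: white noise stands in for the whisper source, a single two-pole resonator at a hypothetical 700 Hz with a 100 Hz bandwidth stands in for one formant, and the sample rate is assumed to be 16 kHz. Averaging the noisy spectrum over many frames shows that the formant peak survives without any voicing.

```python
import numpy as np

def resonator(x, freq_hz, bw_hz, fs):
    """Two-pole resonator (a single-formant filter):
    H(z) = 1 / (1 - 2 r cos(theta) z^-1 + r^2 z^-2)."""
    r = np.exp(-np.pi * bw_hz / fs)     # pole radius sets the bandwidth
    theta = 2 * np.pi * freq_hz / fs    # pole angle sets the center frequency
    a1, a2 = 2 * r * np.cos(theta), -r * r
    y = np.zeros(len(x))
    y1 = y2 = 0.0                       # y[n-1] and y[n-2]
    for n, xn in enumerate(x):
        y[n] = xn + a1 * y1 + a2 * y2
        y1, y2 = y[n], y1
    return y

def averaged_spectrum(x, frame=512):
    """Average Hann-windowed power spectra over non-overlapping frames."""
    frames = x[: len(x) // frame * frame].reshape(-1, frame) * np.hanning(frame)
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)

fs = 16000                                                # assumed sample rate
rng = np.random.default_rng(1)
noise = rng.standard_normal(4 * fs)                       # whisper-like source: white noise (assumed)
vowel = resonator(noise, freq_hz=700, bw_hz=100, fs=fs)   # hypothetical formant at 700 Hz

spectrum = averaged_spectrum(vowel)
freqs = np.fft.rfftfreq(512, d=1 / fs)
peak_hz = float(freqs[np.argmax(spectrum)])
print(f"spectral peak near {peak_hz:.0f} Hz")
```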
14 - Vowels: What's all that hissing noise?
"The boy will eat oat, pit, or soot, but only in small doses."
15 - Vowels: What's all that hissing noise?
A second look shows us that a low f1 in vowels seems to disappear entirely, which is also an attribute of HHs. Fortunately, we can infer a low f1 on a whisper spectrogram from its absence, and f2 and f3 are good enough indicators of labial, velar, and dental phonemes. But what about voicing? Isn't f1 going down usually an indicator of voicing? Let's look at voicing in fricatives and stops.
16 - Fricatives and Stops: Why we don't say "bzzzd"
"The fish thief stole my house"
17 - Fricatives and Stops: Why we don't say "bzzzd"
"Can I pay tickets with tacos and pork?"
18 - Fricatives and Stops: Why we don't say "bzzzd"
The whispered fricatives and stops seem to be relatively easy to spot in the spectrogram, just as in regular speech. Now let's take a look at the voiced fricatives and stops.
19 - Fricatives and Stops: Why we don't say "bzzzd"
"The very vexed zebra" in regular speech
20 - Fricatives and Stops: Why we don't say "bzzzd"
"The very vexed zebra" in whisper speech
21 - Fricatives and Stops: Why we don't say "bzzzd"
"Beat the good dog, boy!" in whisper speech
22 - Fricatives and Stops: Why we don't say "bzzzd"
What happened? The voiced fricatives and stops look just like their voiceless counterparts! They seem to have lost their voicing. So how do we hear words like "dog" and "zebra"? We rely on high-level knowledge, such as which words are plausible in context. If we play just the whispered voiced consonant by itself, we can hear that its voiceless counterpart is what is actually pronounced (for example, /s/ rather than /z/).
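The loss of voicing also shows up in a very simple frame measure. Zero-crossing rate (ZCR) is a classic, if crude, voiced/unvoiced cue: a frame dominated by low-frequency harmonics crosses zero only a few times per pitch period, while a noise-excited frame crosses zero roughly every other sample. The sketch below assumes synthetic frames (harmonics of a hypothetical 120 Hz f0 with a 1/k^2, roughly -12 dB/octave, roll-off as a stand-in for a voiced consonant, and white noise for its whispered version), a 16 kHz sample rate, and a hand-picked threshold of 0.1. Real voiced fricatives such as /z/ mix voicing with frication noise, so in practice this cue would be combined with energy and periodicity measures.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    negative = np.signbit(frame)
    return float(np.mean(negative[1:] != negative[:-1]))

def looks_voiced(frame, zcr_threshold=0.1):
    """Heuristic: voiced frames cross zero rarely, noise-excited frames often.
    The 0.1 threshold is an illustrative assumption, not a tuned value."""
    return zero_crossing_rate(frame) < zcr_threshold

fs = 16000                     # assumed sample rate
n = np.arange(int(0.03 * fs))  # one 30 ms frame
rng = np.random.default_rng(2)

# Stand-in for a voiced consonant: 120 Hz harmonics with a 1/k^2 amplitude roll-off.
voiced = sum(np.cos(2 * np.pi * 120 * k * n / fs) / k**2 for k in range(1, 30))
# Stand-in for the same consonant whispered: no glottal source, only noise.
whispered = rng.standard_normal(len(n))

for name, frame in (("voiced", voiced), ("whispered", whispered)):
    print(f"{name:>9}: ZCR {zero_crossing_rate(frame):.3f}, voiced? {looks_voiced(frame)}")
```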
23 - Fricatives and Stops: Why we don't say "bzzzd"
Fricatives and stops are relatively easy to spot in a whisper spectrogram, but they can be confusing, which is exactly the opposite of
24 - Nasals: Barely there
"Chang is not a China man"
25 - Nasals: Barely there
It seems nasals follow suit with the other phonemes: no voice bars and no low f1 formants. Additionally, nasals are so faint that they almost look like pauses. However, we can see from the spectrogram that it isn't difficult to identify which nasal it is: we can see the formants going up for N, going down for M, and a velar pinch for NG. What about liquids and glides? They actually behave pretty well in whisper speech, and identifying them is usually easier.
26 - Liquids and Glides
"Look, you wet your red leather boots!"
27 - Try this at home!
- Now that we have gone through the different types of phonemes, we can compile our results:
  - Vowels resemble HHs.
  - Voiced fricatives and stops lose their voicing.
  - Nasals become faint but can still be differentiated.
  - Liquids and glides do not change much.
- Much high-level knowledge is required to recognize whisper speech.
- We can do a little test to demonstrate this.
28 - F0 and Pitch
What sort of f0 and pitch does whisper speech have? (Can you guess?) First, we can try using the Emu Labeler to do the pitch analysis for us.
29 - F0 and Pitch
Pitch analysis for "Somebody set up us the bomb!" (stress on "us")
30 - F0 and Pitch
It seems that the Emu Labeler has failed us (not too surprisingly). But that's all right: we can still do it ourselves. Let's turn the broadband spectrograms into narrowband spectrograms.
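Turning a broadband spectrogram into a narrowband one means lengthening the analysis window: as a rule of thumb the window lasts roughly 1 / bandwidth, so narrower bandwidths trade time resolution for frequency resolution fine enough to separate individual harmonics of f0. The sketch below assumes that the slide bandwidths (70, 40, 20) are in Hz, a 16 kHz sample rate, a hypothetical speaker f0 of 120 Hz, and a Hann window; the exact constant relating window length to bandwidth depends on the window shape and on how a given tool defines bandwidth. It checks that a 40 Hz narrowband analysis of a voiced-like frame puts a distinct peak at the second harmonic (240 Hz).

```python
import numpy as np

def window_samples(bandwidth_hz, fs):
    """Rule of thumb: analysis window duration is roughly 1 / bandwidth.
    The exact constant depends on the window shape (assumption)."""
    return int(round(fs / bandwidth_hz))

fs = 16000   # assumed sample rate
f0 = 120.0   # hypothetical speaker f0 in Hz

for bw in (300, 70, 40, 20):  # 300 Hz as a typical broadband setting; 70/40/20 as on the slides
    n_win = window_samples(bw, fs)
    print(f"bandwidth {bw:>3} Hz -> {1000 * n_win / fs:5.1f} ms window, "
          f"harmonic spacing = {f0 / bw:.1f} x bandwidth")

# Narrowband (40 Hz) analysis of a voiced-like frame: equal-amplitude harmonics of f0.
n_win = window_samples(40, fs)
t = np.arange(n_win) / fs
frame = sum(np.cos(2 * np.pi * f0 * k * t) for k in range(1, 20))
spectrum = np.abs(np.fft.rfft(frame * np.hanning(n_win), 4096))  # zero-padded FFT
freqs = np.fft.rfftfreq(4096, d=1 / fs)
band = (freqs > 150) & (freqs < 300)
peak_hz = float(freqs[band][np.argmax(spectrum[band])])
print(f"strongest component between 150 and 300 Hz: {peak_hz:.0f} Hz")
```

With a whispered (noise) source there are no harmonics to resolve, so narrowing the bandwidth only makes the noise texture finer.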
31 - F0 and Pitch
"Somebody set up us the bomb!" (stress on "us")
Bandwidth: 70
32 - F0 and Pitch
"Somebody set up us the bomb!" (stress on "us")
Bandwidth: 40
33 - F0 and Pitch
"Somebody set up us the bomb!" (stress on "us")
Bandwidth: 20
34 - F0 and Pitch
Even as we make the bandwidth smaller and smaller, we still cannot make out the f0: no harmonic structure appears. But pitch is so important in stressing and emphasizing parts of speech, so how are stress and emphasis produced in whisper speech?
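The absence of f0 can also be checked with a time-domain pitch tracker. The sketch below is a basic autocorrelation estimator, not the algorithm used by the Emu Labeler: it searches for the lag with the strongest self-similarity within a plausible pitch range and reports no f0 when that peak is weak. The 60 to 400 Hz search range, the 0.3 threshold, the 16 kHz sample rate, and the synthetic frames (harmonics of a hypothetical 120 Hz f0 versus white noise) are all assumptions.

```python
import numpy as np

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0, threshold=0.3):
    """Autocorrelation f0 estimate in Hz, or None when no lag shows strong periodicity.
    fmin, fmax and threshold are illustrative assumptions, not tuned values."""
    x = frame - np.mean(frame)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0 .. len(x) - 1
    if ac[0] <= 0:
        return None
    ac = ac / ac[0]                                     # normalize so lag 0 == 1
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag if ac[lag] >= threshold else None

fs = 16000                     # assumed sample rate
n = np.arange(int(0.04 * fs))  # one 40 ms frame
rng = np.random.default_rng(3)

voiced = sum(np.cos(2 * np.pi * 120 * k * n / fs) / k for k in range(1, 20))  # periodic stand-in
whispered = rng.standard_normal(len(n))                                       # aperiodic stand-in

print("voiced f0:   ", estimate_f0(voiced, fs))
print("whispered f0:", estimate_f0(whispered, fs))
```

The noise frame has no lag at which it resembles itself, so the tracker has nothing to lock onto, which matches what the narrowband spectrograms show.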
35 - F0 and Pitch
"Somebody set up us the bomb!" (stress on "somebody")
36 - F0 and Pitch
"Somebody set up us the bomb!" (stress on "us")
37 - F0 and Pitch
"Somebody set up us the bomb!" (stress on "bomb")
38 - F0 and Pitch
As you may have expected, because they cannot change the pitch, speakers use the other two methods, more energy and longer duration, to emphasize the parts they want to stress in whisper speech. Try singing in a whisper: can you do it?
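Since whispered stress has to rely on energy and duration, both cues can be measured per word once word boundaries are known. The sketch below is illustrative only: the segment list stands in for hand-labeled word boundaries, the "utterance" is synthetic noise with invented gains and durations (not measurements from the experiment), and scoring words by integrated energy (mean power times duration) is a crude assumption rather than a validated stress detector.

```python
import numpy as np

def segment_cues(signal, fs, segments):
    """Return (word, rms_db, duration_s, energy) for each labeled segment.
    `segments` is a hypothetical list of (word, start_s, end_s) tuples."""
    cues = []
    for word, start, end in segments:
        chunk = signal[int(round(start * fs)):int(round(end * fs))]
        power = float(np.mean(chunk ** 2))
        cues.append((word, 10 * np.log10(power + 1e-12), end - start, power * (end - start)))
    return cues

def guess_stressed(cues):
    """Crude cue: the word carrying the most integrated energy (power x duration)."""
    return max(cues, key=lambda c: c[3])[0]

# Synthetic whispered utterance: every word is white noise; durations and gains are invented.
fs = 16000
rng = np.random.default_rng(4)
plan = [("somebody", 0.45, 1.0), ("set", 0.15, 1.0), ("up", 0.15, 1.0),
        ("us", 0.35, 2.0), ("the", 0.10, 1.0), ("bomb", 0.30, 1.0)]
pieces, segments, t = [], [], 0.0
for word, dur, gain in plan:
    pieces.append(gain * rng.standard_normal(int(round(dur * fs))))
    segments.append((word, t, t + dur))
    t += dur
signal = np.concatenate(pieces)

cues = segment_cues(signal, fs, segments)
for word, level, dur, _ in cues:
    print(f"{word:>9}: {level:5.1f} dB, {dur * 1000:4.0f} ms")
print("stressed guess:", guess_stressed(cues))
```

Integrated energy favors long words, so a more careful version would compare each word against its own typical level and duration (for example across the three renditions of the 10th sentence).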
39 - One Last Thought: Variability in Whisper Speech
One thing we noticed throughout the experiment is that many characteristics of regular speech are lost in whisper speech. On the other hand, this means that some sources of variability, such as age, regional accent, and emotion, may also be reduced to some extent in whisper speech.
40 - One Last Thought: Variability in Whisper Speech
Which speaker whispered the sentence at the bottom?
Speaker A
Speaker B
"Chang is not a China man." in whisper speech
"They treasured the very vexed zebra." in whisper speech
41 - One Last Thought: Variability in Whisper Speech
Now can you tell?
Speaker A
Speaker B
"Chang is not a China man." in regular speech
"They treasured the very vexed zebra." in regular speech
42 - One Last Thought: Variability in Whisper Speech
Whispering seems to make speech lose some of its variability. What can you guess about the speaker from this speech (sex, age, nationality, region, who the person is)?
"The fish thief stole my house." in whisper speech
"The fish thief stole my house." in regular speech
43 - Conclusion
- Whisper speech introduces more ambiguity into speech; therefore, recognizing whisper speech requires a great deal of high-level knowledge.
- There are no detectable pitch dynamics in whisper speech.
- Whisper speech seems to reduce some variability in speech.
44 - Conclusion
Would we ever need automatic speech recognition for whisper speech?
- For use in quiet places (libraries)
- For people with speech difficulties (e.g., due to throat cancer)
- Can you think of others? (A secret agent watch?)
Would it be more difficult than automatic speech recognition for regular speech?
- More ambiguity
- Needs more high-level language modeling
- Less variability?
45 - The End