Title: CS 551/651: Structure of Spoken Language
Lecture 7: Syllable Structure, Vowel Neutralization, and Coarticulation
John-Paul Hosom, Fall 2008

NOTE
- There's a tutorial on the web that allows you to hear the effect of different formant values: http://www.asel.udel.edu/speech/tutorials/synthesis/ceevees.html
- You can enter start time, end time, amplitude, and formant values for the beginning, middle, and end of a syllable, then generate a waveform and hear the result.

Syllables
- Words are composed of phonetic clusters: syllables.
- Each syllable has a nucleus; typically the nucleus is a vowel or diphthong, sometimes a syllabic nasal or lateral ("button", "bottle") or retroflex ("bird").
- The nucleus is a syllabic nasal or lateral only when following an alveolar consonant in the previous syllable of a word.
- Syllable boundaries are sometimes ambiguous: "tasty" = tas/ty, tast/y, ta/sty; "bottling" = bott/l/ing, bott/ling.
- A syllable can be broken into components: a syllable contains an onset and a rhyme; the rhyme contains a nucleus and a coda. The onset and coda are consonants; the nucleus is typically a vowel.

Syllables
Limitations on consonant clusters: not all CCC combinations are possible in syllable-initial position. Of those that are possible, almost half are very rare.
  possibly only one word in English: "spew"
  only a few English words pronounced (optionally) with /s t y/: "Stewart", "steward", "stew"
  very few English words/roots with /s k l/: "sclerosis"
  very few English words with /s k y/: "skew", "askew", "obscure"
(graphic from http://www.arts.uwa.edu.au/LingWWW/LIN101-102)

Syllables
- Sonority corresponds roughly to the degree of constriction along the vocal and/or nasal tract (less constriction means higher sonority).
- Ordering of sonority, from most to least sonorous: vowels, glides (/w/, /y/), liquids (/l/, /r/), nasals, fricatives, affricates, plosives.
- Under a binary classification (sonorant/non-sonorant), the sonorants consist of all vowels, glides, liquids, and nasals.
- Fricatives, affricates, and plosives may be clustered into one category, obstruents, for purposes of sonority.
- Syllabification can be done according to the sonority principle: the sonority must rise and fall in a syllable.
- Also, there's the Maximal Onset Principle: put a consonant in the onset rather than the coda when possible.

Syllables
- Because of the rise and fall of sonority in syllables, the following restrictions occur:
  (a) a glide (/w/, /y/) must be immediately adjacent to a vowel,
  (b) /r/ is the next closest consonant to the vowel,
  (c) /l/ is the next closest consonant to the vowel,
  (d) a nasal is next closest,
  (e) an obstruent is farthest from the vowel (but there may be more than one obstruent in the onset or coda).
- Obstruents in a cluster must have the same voicing.
- In a series of obstruents between two vowels, voicing can change only once, at the syllable boundary.
- English allows up to 3 consonants in syllable-initial position and 4 consonants in syllable-final position.
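
Two of the restrictions above (the length limits and voicing agreement among obstruents) are easy to check mechanically. The following is a rough sketch only: the phoneme sets and the name `cluster_ok` are assumptions for the example, and the ordering restrictions (a) to (e) are not checked.

```python
# Obstruent inventories are assumptions for this sketch.
VOICED_OBSTRUENTS = set("b d g jh v dh z zh".split())
VOICELESS_OBSTRUENTS = set("p t k ch f th s sh hh".split())
OBSTRUENTS = VOICED_OBSTRUENTS | VOICELESS_OBSTRUENTS


def obstruents_agree_in_voicing(cluster):
    """True if every obstruent in the cluster has the same voicing."""
    voiced = [p in VOICED_OBSTRUENTS for p in cluster if p in OBSTRUENTS]
    return all(voiced) or not any(voiced)


def cluster_ok(cluster, position):
    """Check the length limit (3 for an onset, 4 for a coda) and voicing
    agreement. `position` is either "onset" or "coda"."""
    max_len = 3 if position == "onset" else 4
    return len(cluster) <= max_len and obstruents_agree_in_voicing(cluster)


print(cluster_ok("s t r".split(), "onset"))    # True  ("streak")
print(cluster_ok("k s t s".split(), "coda"))   # True  ("texts")
print(cluster_ok("s d".split(), "onset"))      # False (voicing mismatch)
```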

Syllables
- Examples: "sphere" /s f iy r/, "streak" /s t r iy k/, "texts" /t eh k s t s/, "helms" /h eh l m z/, but not /t l iy/ or /p w iy/.
- The ordering of glides and liquids doesn't matter for our purposes (applying to syllabification), because glides and liquids cannot occur sequentially within the same syllable in English. (However, two liquids in the same syllable are possible, e.g. "Carl" and "girl", as long as /r/ is closer to the vowel than /l/.)
- In English, most burst-fricative pairs are represented as distinct phonemes (/ch/, /jh/), although there are some other cases of burst-fricative pairs (e.g. "tsunami", "bishops").
- It's also possible to have two or more adjacent fricatives: "eleven twelfths".

Vowel Neutralization
- When speech is uttered very quickly (or is not well enunciated), the formants tend to shift toward those of a neutral vowel.
(from Daniloff, p. 320)
(from van Bergem 1993, p. 8)
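
As a rough numeric picture of this shift, one can interpolate a vowel's formants toward a neutral target. Everything specific here is an assumption for illustration: the neutral values (about 500 Hz and 1500 Hz for F1 and F2, the usual uniform-tube approximation), the approximate /iy/ values, and the linear form itself, which is the simplest possible view and not a model from the lecture.

```python
# Assumed neutral-vowel formants (F1, F2) in Hz: uniform-tube approximation.
NEUTRAL = (500.0, 1500.0)


def reduce_vowel(target, amount):
    """Move (F1, F2) toward NEUTRAL.

    amount = 0.0 leaves the vowel fully realized; amount = 1.0 gives the
    neutral vowel. Linear interpolation is an assumption of this sketch.
    """
    return tuple(f + amount * (n - f) for f, n in zip(target, NEUTRAL))


iy = (270.0, 2290.0)            # approximate /iy/ values, illustrative only
print(reduce_vowel(iy, 0.0))    # (270.0, 2290.0)
print(reduce_vowel(iy, 0.5))    # (385.0, 1895.0)
print(reduce_vowel(iy, 1.0))    # (500.0, 1500.0)
```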

Vowel Neutralization
- Target undershoot
[Figure labels: /m ih pc ph ih eh/]

Target undershoot: /ih/ extracted and concatenated from "mip"
[Figure labels: /m ih pc ph ih eh/]

Vowel Neutralization
- However, neutralization is not always so simple: sometimes vowel formants shift away from the neutral position, depending on their context, and vowels tend toward slightly different neutral targets.
- Neutralization is to some extent an artifact of averaging over speakers and contexts (van Bergem 1993).
[Figure: vowels from one speaker in different phonetic contexts, and in reduced and isolated speaking conditions]

Coarticulation
- Coarticulation is the blending of adjacent speech sounds, due to gradual movement of the articulators.
- Coarticulation makes automatic speech recognition and text-to-speech synthesis difficult, but humans use coarticulation to conserve effort while speaking and to provide robustness during recognition.
- There is Right-to-Left (RL), or anticipatory, coarticulation and Left-to-Right (LR), or carry-over, coarticulation.
- Models of coarticulation and syllabification:
  - Locus Theory
  - Modified Locus Theory (Klatt)
  - Öhman's Theory
  - Kozhevnikov-Chistovich (KC) Theory
  - Wickelgren's Theory, etc.

Coarticulation
RL coarticulation occurs due to high-level planning of phonetic sequences.
Example: "spoon" /s p uw n/
[Figure: rounding in isolation vs. rounding in context]
More observable if neighboring sounds are not specified with respect to the potentially coarticulated feature, e.g. /s/, /p/, /n/ are not specified with respect to lip rounding.
(from Daniloff, pp. 323-324)

Coarticulation: Locus Theory
Locus Theory (Delattre, Liberman, and Cooper, 1955): "there are, for each consonant, characteristic frequency positions, or loci, at which the formant transitions begin, or to which they may be assumed to point. On this basis, the transitions may be regarded simply as movements of the formants from their respective loci to the frequency levels appropriate for the next phone ... The spectrographic patterns ..., which produce /d/ before /iy/, /aa/, and /ow/, show how these transitions seem to be pointing to a F2 locus in the vicinity of 1800 Hz."
- Each consonant has target frequencies independent of the neighboring vowels.
- Formants transition from these target frequencies to the vowel target frequencies.

Coarticulation: Locus Theory
- Locus Theory
  - Consonants and vowels both have targets of articulator positions and therefore formant frequency locations.
  - Given sufficient duration of a syllable, all phonemes reach their targets.
  - The slope of the formants during a transition from a consonant to a vowel is relatively constant until reaching the target.
  - If the syllable duration doesn't allow enough time for the formants to reach their targets, target undershoot occurs and the formants change direction before fully realizing the intended vowel.
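
The constant-slope assumption makes undershoot easy to compute: the formant moves from the consonant locus toward the vowel target at a fixed rate and stops wherever it is when time runs out. The sketch below assumes the 1800 Hz /d/ locus quoted earlier; the slope (10 Hz/ms), the vowel F2 targets, and the function name `f2_reached` are illustrative assumptions, not values from the lecture.

```python
def f2_reached(locus_hz, target_hz, slope_hz_per_ms, available_ms):
    """F2 value reached after moving from the locus toward the vowel target
    at a constant slope for `available_ms` milliseconds."""
    distance = target_hz - locus_hz
    travel = slope_hz_per_ms * available_ms
    if travel >= abs(distance):
        return float(target_hz)          # enough time: target is reached
    direction = 1.0 if distance > 0 else -1.0
    return locus_hz + direction * travel  # target undershoot


# /d/ locus at 1800 Hz; approximate F2 targets for /iy/ and /aa/ (assumed).
print(f2_reached(1800, 2290, 10, 60))   # 2290.0 (target reached)
print(f2_reached(1800, 2290, 10, 20))   # 2000.0 (undershoot)
print(f2_reached(1800, 1090, 10, 20))   # 1600.0 (undershoot, falling F2)
```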

Coarticulation: Locus Theory
- Locus Theory
(From Klatt 1987, p. 753)

Coarticulation: Modified Locus Theory
- Problems with Locus Theory
  - A transition may have both rapid and slow components: rapid release of obstruction via the tongue tip, followed by slow movement of the tongue body.
  - The preceding vowel can influence the F2 onset of a CV transition (Öhman, 1966).
  - F2 may be insensitive to oral constrictions (obstruents) if the tongue position is toward the front of the mouth (as in /iy/).
  - (as reported by Fant 1973, Klatt 1987)

Coarticulation: Modified Locus Theory
- Modified Locus Theory
  - Klatt hypothesized that the main effects of the vowel on the articulation of consonants are front/back position and lip rounding.
  - Vowels are divided into three sets: +front; +round; and -front, -round (because there are no rounded front vowels in English, sets 1 and 2 are mutually exclusive):
    - +front: /iy ih eh ae/
    - +round: /uw ao ow er/
    - -front, -round: /uh ah aa aw/
  - Predicted Fonset from Ftarget for these 3 classes (locus theory).
  - Achieved 95% intelligibility for CVC nonsense syllables.
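
One way to picture "predicting Fonset from Ftarget per vowel class" is a linear locus equation with separate parameters for each class. The sketch below is a guess at the general shape only: the linear form, the `(locus_hz, k)` parameter pairs, and the name `predict_f2_onset` are all hypothetical, and none of the numbers are Klatt's. Only the vowel classes are taken from the list above.

```python
# Vowel classes as listed above.
VOWEL_CLASS = {}
for name, vowels in [("+front", "iy ih eh ae"),
                     ("+round", "uw ao ow er"),
                     ("-front,-round", "uh ah aa aw")]:
    for v in vowels.split():
        VOWEL_CLASS[v] = name

# HYPOTHETICAL (locus_hz, k) pairs for a single consonant, one per class.
# k = 0 means the onset sits at the locus; k = 1 means it sits at the target.
PARAMS = {
    "+front": (1800.0, 0.5),
    "+round": (1700.0, 0.4),
    "-front,-round": (1750.0, 0.3),
}


def predict_f2_onset(vowel, f2_target_hz, params=PARAMS):
    """Assumed linear rule: F_onset = F_locus + k * (F_target - F_locus)."""
    locus_hz, k = params[VOWEL_CLASS[vowel]]
    return locus_hz + k * (f2_target_hz - locus_hz)


print(VOWEL_CLASS["iy"])                 # +front
print(predict_f2_onset("iy", 2290.0))    # 2045.0
```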

Coarticulation: Locus Theory
- Modified Locus Theory
[Figure panels: -front, -round; +front; +round]
(From Klatt 1987, p. 754)

Coarticulation: Öhman's Theory
Öhman (1965) found that the loci of consonants are NOT independent of neighboring vowels, and that for /g/ more than one locus is required.
Conclusion: consonant gestures are superimposed on vowel gestures that are present during the consonant. Even when the consonant is being uttered in a VCV sequence, both vowels have an effect on C.

Coarticulation: Öhman's Theory
Öhman (1966) proposed a model of coarticulation based on vocal-tract shape evolving over time. It assumes that vocal-tract shapes can be mapped to formant frequencies. For VCV utterances:
  s(x,t) = v(x) + k(t) [c(x) - v(x)] wc(x)
where s(x,t) is the vocal tract shape at position x and time t, v(x) is the vocal tract shape at position x for a given vowel, c(x) is the vocal tract shape of the consonant, k(t) is an interpolation value (from 0 to 1), and wc(x) describes the degree to which c(x) resists coarticulation. v(x) describes the shape of the vocal tract, which may be a combination of two vowels if V1 ≠ V2 (v(x) will then vary over time from V1 to V2).
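
A small numeric sketch can make the superposition idea concrete. It assumes the commonly cited form of Öhman's model, in which the consonant shape is blended into the vowel shape with weight k(t) wc(x); the three-point "vocal tract", the shape values, and the name `ohman_shape` are invented for illustration.

```python
def ohman_shape(v, c, wc, k):
    """Vocal-tract shape at one instant, assuming
    s(x) = v(x) + k * wc(x) * (c(x) - v(x)).

    v, c, wc are equal-length lists sampled along the tract (position x);
    k is the interpolation value k(t) at this instant, from 0 to 1.
    """
    return [vi + k * wi * (ci - vi) for vi, ci, wi in zip(v, c, wc)]


# Toy shapes at three positions along the tract (arbitrary units).
vowel = [1.0, 2.0, 3.0]
consonant = [0.0, 0.0, 0.0]        # full closure everywhere, for simplicity
weight = [1.0, 0.5, 0.0]           # consonant dominates only near x = 0

print(ohman_shape(vowel, consonant, weight, 0.0))   # [1.0, 2.0, 3.0]
print(ohman_shape(vowel, consonant, weight, 0.5))   # [0.5, 1.5, 3.0]
print(ohman_shape(vowel, consonant, weight, 1.0))   # [0.0, 1.0, 3.0]
```

Where wc(x) is 0 the vowel shape passes through the consonant unchanged, which is how the model lets vowel gestures remain present during the consonant.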

Coarticulation: Kozhevnikov-Chistovich (KC) Theory
- (1) Syllabification using a CnV pattern: CV, CCV, CCCV, ...
  phrase: "give true answers"
    g ih   v t r uw   ae   n s er   z
    ----   --------   --   ------   -
    S1     S2         S3   S4       S5
- (2) Measured relative durations of words, syllables, vowels
  relative duration of vowel = Dvow / Dsyll, of syllable = Dsyll / Dword, of word = Dword / Dphrase
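
The CnV grouping is simple enough to state as code: collect consonants until a vowel arrives, then close the unit. This is a minimal sketch; the vowel inventory and the name `cnv_units` are assumptions, and any consonants left over at the end of the phrase are returned as their own final unit, as in the example above.

```python
# Vowel inventory is an assumption for this sketch.
VOWELS = set("iy ih eh ae aa ah ao uw uh ow er ay aw ey oy".split())


def cnv_units(phones):
    """Group a phoneme sequence into CnV units (zero or more consonants
    followed by one vowel)."""
    units, current = [], []
    for p in phones:
        current.append(p)
        if p in VOWELS:
            units.append(current)
            current = []
    if current:
        units.append(current)   # trailing consonants, e.g. the final /z/
    return units


print(cnv_units("g ih v t r uw ae n s er z".split()))
# [['g', 'ih'], ['v', 't', 'r', 'uw'], ['ae'], ['n', 's', 'er'], ['z']]
```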

Coarticulation: Kozhevnikov-Chistovich (KC) Theory
Found coarticulation within a syllable but not across syllables: C1 V1 C2 C3 V2
- articulatory gestures for consonant(s) and vowel begin nearly simultaneously with the onset of the initial consonant in the syllable
- Example: lip rounding in /uw/ begins with /v/ in "give true answers", but nasalization of /ae/ does not occur.
- assumes little or no LR coarticulation
- assumes motor programming of speech is discontinuous at the VC boundary
- counter-examples showing LR coarticulation (Moll and Daniloff 1971; Kent, Carney, and Severeid 1974; Öhman 1966)

Coarticulation: Wickelgren's Theory
Speech units are mentally coded as context-sensitive units: in the phonetic string /X Y Z/, Y is encoded as XYZ, i.e. Y together with its left and right neighbors.
"By assuming (context-sensitive) allophones to be the basic unit of articulation, it is trivial to account for how the same phoneme in different phonemic environments can be different in some respects at all levels of the speech process" (Wickelgren 1969, p. 11)
However, coarticulation can spread over more than one phone (up to seven phones distance).
Other criticisms: MacNeilage 1970, Whitaker 1970, Halwes and Jenkins 1971
"Allophonic richness may only beget strategic poverty" (Kent and Minifie 1977)
However, Wickelgren's is the only model currently used in ASR and concatenative text-to-speech (exceptions: Wouters 2001, Wrede 2001).
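
Context-sensitive coding of this kind amounts to labeling each phone with its immediate neighbors, much like the triphone units used in ASR. A minimal sketch follows; the `#` boundary symbol and the name `context_units` are assumptions made for the example.

```python
def context_units(phones, boundary="#"):
    """Return (left, phone, right) triples for each phone in the sequence,
    padding the edges with a boundary symbol."""
    padded = [boundary] + list(phones) + [boundary]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


print(context_units("s p uw n".split()))
# [('#', 's', 'p'), ('s', 'p', 'uw'), ('p', 'uw', 'n'), ('uw', 'n', '#')]
```

Each unit sees only one neighbor on each side, which is exactly the limitation raised above: coarticulation that spreads across several phones is not captured.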

Coarticulation: Gay's Theory
- Gay, 1977: The syllabic unit of motor organization is the CV unit
- Based on X-ray motion pictures of VCV utterances:
  - anticipatory tongue movements for V2 in a V1CV2 sequence don't begin until closure of C has been attained
  - movement toward V2 occurs during closure of C, having a large effect on the position and shape of the tongue during release of closure
  - V1 has little effect on the position of the tongue at the moment of closure
- supports KC theory; conflicts with Öhman's findings

Coarticulation
- Other models: MacNeilage, Henke, Benguerel and Cowan, Moll and Daniloff, Liberman, Tatham, etc.
- Some are feature-based, in that each phonetic segment is assigned distinctive features which can then be modified in regular ways.
- Some are hierarchical models, with several levels of organization and complex interaction between levels.
- However, coarticulatory patterns are not explained adequately by any theories or models (Kent and Minifie, 1977).
- Conflicting evidence (Öhman and Kent and Moll vs. KC and Gay)