CS 551/651: Structure of Spoken Language
Lecture 7: Syllable Structure, Vowel Neutralization, and Coarticulation
John-Paul Hosom, Fall 2008

NOTE
There's a tutorial on the web that lets you hear the effect of different formant values:
http://www.asel.udel.edu/speech/tutorials/synthesis/ceevees.html
You can enter start time, end time, amplitude, and formant values for the beginning, middle, and end of a syllable, then generate a waveform and hear the result.

Syllables
Words are composed of phonetic clusters: syllables.
Each syllable has a nucleus. Typically the nucleus is a vowel or diphthong, sometimes a syllabic nasal or lateral ('button', 'bottle') or retroflex ('bird').
The nucleus is a syllabic nasal or lateral only when it follows an alveolar consonant in the previous syllable of the word.
Syllable boundaries are sometimes ambiguous:
  tasty: tas/ty, tast/y, ta/sty
  bottling: bott/l/ing, bott/ling
A syllable can be broken into components:
  a syllable contains an onset and a rhyme;
  the rhyme contains a nucleus and a coda;
  the onset and coda are consonants, and the nucleus is typically a vowel.

Syllables
Limitations on consonant clusters: not all CCC combinations are possible in syllable-initial position. Of those that are possible, almost half are very rare:
  /s p y/: possibly only one word in English (spew)
  /s t y/: only a few English words, (optionally) pronounced this way (Stewart, steward, stew)
  /s k l/: very few English words/roots (sclerosis)
  /s k y/: very few English words (skew, askew, obscure)
(graphic from http://www.arts.uwa.edu.au/LingWWW/LIN101-102)

Syllables
Sonority corresponds roughly to the degree of constriction along the vocal and/or nasal tract.
Ordering of sonority (most to least sonorous): vowels, glides (/w/, /y/), liquids (/l/, /r/), nasals, fricatives, affricates, plosives.
In a binary classification (sonorant/non-sonorant), the sonorants are all vowels, glides, liquids, and nasals.
Fricatives, affricates, and plosives may be clustered into one category, obstruents, for purposes of sonority.
Syllabification can be done according to the sonority principle: sonority must rise and then fall within a syllable.
There is also the Maximal Onset Principle: put a consonant in the onset rather than the coda when possible.
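
The Maximal Onset Principle can be sketched in a few lines of Python. This is only an illustration: the phone symbols are ARPAbet-like, and both `VOWELS` and `LEGAL_ONSETS` are small hypothetical subsets chosen for the example, not a full inventory of English onsets.

```python
# Sketch of Maximal Onset syllabification over ARPAbet-like phone symbols.
# VOWELS and LEGAL_ONSETS are small illustrative subsets (assumptions),
# not a complete description of English phonotactics.
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uw", "uh", "ow", "ey", "er"}
LEGAL_ONSETS = {
    (), ("t",), ("s",), ("k",), ("r",), ("l",),
    ("s", "t"), ("t", "r"), ("s", "t", "r"),
}


def syllabify(phones):
    """Split a phone list into syllables, giving each nucleus the longest legal onset."""
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    if not nuclei:
        return [list(phones)]
    starts = [0]  # index where each syllable begins
    for prev, nxt in zip(nuclei, nuclei[1:]):
        cluster = tuple(phones[prev + 1:nxt])  # consonants between two nuclei
        # Try the longest candidate onset first; the empty onset always matches.
        for split in range(len(cluster) + 1):
            if cluster[split:] in LEGAL_ONSETS:
                break
        starts.append(prev + 1 + split)
    ends = starts[1:] + [len(phones)]
    return [list(phones[a:b]) for a, b in zip(starts, ends)]


tasty = syllabify(["t", "ey", "s", "t", "iy"])
print(tasty)
```

With this onset inventory, 'tasty' comes out as ta/sty, because /s t/ is accepted as an onset and so both consonants go to the second syllable.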

Syllables
Because sonority rises and falls within a syllable, the following restrictions occur:
  (a) a glide (/w/, /y/) must be immediately adjacent to a vowel,
  (b) /r/ is the next closest consonant to the vowel,
  (c) /l/ is the next closest consonant to the vowel,
  (d) a nasal is the next closest,
  (e) an obstruent is farthest from the vowel (but there may be more than one obstruent in the onset or coda).
Obstruents in a cluster must have the same voicing.
In a series of obstruents between two vowels, voicing can change only once, at the syllable boundary.
English allows up to 3 consonants in syllable-initial position and 4 consonants in syllable-final position.
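
The ordering in (a) through (e) can be checked mechanically. The sketch below assigns each phone a sonority level, with all obstruents sharing one level, and tests that the levels never fall before the peak and never rise after it. The numeric levels and the phone inventory are assumptions made for illustration. The check covers sonority only, so it accepts sequences such as /t l iy/ that English excludes for other phonotactic reasons.

```python
# Sonority levels follow the ordering (a)-(e), with all obstruents collapsed
# into one level. The numbers and the phone inventory are illustrative assumptions.
LEVEL = {"obstruent": 0, "nasal": 1, "l": 2, "r": 3, "glide": 4, "vowel": 5}

PHONE_CLASS = {"l": "l", "r": "r"}
for _p in "p b t d k g f v s z sh zh th dh h ch jh".split():
    PHONE_CLASS[_p] = "obstruent"
for _p in "m n ng".split():
    PHONE_CLASS[_p] = "nasal"
for _p in "w y".split():
    PHONE_CLASS[_p] = "glide"
for _p in "iy ih eh ae aa ah ao uw uh ow ey er".split():
    PHONE_CLASS[_p] = "vowel"


def sonority_profile(phones):
    """Map each phone of one syllable to its sonority level."""
    return [LEVEL[PHONE_CLASS[p]] for p in phones]


def rises_then_falls(phones):
    """True if sonority is non-decreasing up to the peak and non-increasing after it."""
    levels = sonority_profile(phones)
    if not levels:
        return False
    peak = levels.index(max(levels))
    before, after = levels[:peak + 1], levels[peak:]
    rising = all(a <= b for a, b in zip(before, before[1:]))
    falling = all(a >= b for a, b in zip(after, after[1:]))
    return rising and falling


print(rises_then_falls("s t r iy k".split()))  # streak
print(rises_then_falls("k aa l r".split()))    # /l/ closer to the vowel than /r/
```

Because obstruents share a level, clusters like /s t/ in 'streak' and /k s t s/ in 'texts' pass, while a coda with /l/ closer to the vowel than /r/ is rejected.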

Syllables
Examples: sphere /s f iy r/, streak /s t r iy k/, texts /t eh k s t s/, helms /h eh l m z/, but not /t l iy/ or /p w iy/.
The ordering of glides and liquids doesn't matter for our purposes (applying to syllabification), because glides and liquids cannot occur sequentially within the same syllable in English. (However, two liquids in the same syllable are possible, e.g. 'Carl' and 'girl', as long as /r/ is closer to the vowel than /l/.)
In English, most burst-fricative pairs are represented as distinct phonemes (/ch/, /jh/), although there are some other cases of burst-fricative pairs (e.g. 'tsunami', 'bishops').
It's also possible to have two or more adjacent fricatives: 'eleven twelfths'.

Vowel Neutralization
When speech is uttered very quickly (or is not well enunciated), the formants tend to shift toward those of a neutral vowel.

(figures from Daniloff, p. 320, and van Bergem 1993, p. 8)
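
As a rough numerical picture of this shift, the sketch below moves a vowel's formants part of the way toward a neutral vowel. The neutral values (500, 1500, 2500 Hz, the usual uniform-tube approximation), the /iy/ targets, and the single `amount` parameter are all assumptions for illustration. Real reduction is not a simple linear interpolation.

```python
# Assumed neutral-vowel formants F1-F3 in Hz (uniform-tube approximation).
NEUTRAL = (500.0, 1500.0, 2500.0)


def neutralize(target, amount, neutral=NEUTRAL):
    """Shift formants toward the neutral vowel.

    amount = 0.0 leaves the target unchanged; amount = 1.0 gives the neutral vowel.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    return tuple(f + amount * (n - f) for f, n in zip(target, neutral))


iy_target = (300.0, 2300.0, 3000.0)  # illustrative /iy/ formants, not measured data
print(neutralize(iy_target, 0.5))
```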

Vowel Neutralization
Target undershoot
(figure labeled /m ih pc ph ih eh/)

Vowel Neutralization
(figure labeled /m ih pc ph ih eh/)
Target undershoot: /ih/ extracted and concatenated from 'mip'

Vowel Neutralization
However, neutralization is not always so simple: sometimes vowel formants shift away from the neutral position, depending on their context, and vowels tend toward slightly different neutral targets.
Neutralization is to some extent an artifact of averaging over speakers and contexts (van Bergem 1993).

(figure: vowels from one speaker in different phonetic contexts, and in reduced and isolated speaking conditions)

Coarticulation
Coarticulation is the blending of adjacent speech sounds, due to gradual movement of the articulators.
Coarticulation makes automatic speech recognition and text-to-speech synthesis difficult, but humans use coarticulation to conserve effort while speaking and to provide robustness during recognition.
There is Right-to-Left (RL), or anticipatory, coarticulation and Left-to-Right (LR), or carry-over, coarticulation.
Models of coarticulation and syllabification:
  Locus Theory
  Modified Locus Theory (Klatt)
  Öhman's Theory
  Kozhevnikov-Chistovich (KC) Theory
  Wickelgren's Theory, etc.

Coarticulation
RL coarticulation occurs due to high-level planning of phonetic sequences.
Example: 'spoon' /s p uw n/ (figure compares rounding in isolation with rounding in context).
It is more observable if neighboring sounds are not specified with respect to the potentially coarticulated feature, e.g. /s/, /p/, /n/ are not specified with respect to lip rounding (from Daniloff, pp. 323-324).

Coarticulation: Locus Theory
Locus Theory (Delattre, Liberman, and Cooper, 1955): there are, for each consonant, characteristic frequency positions, or loci, at which the formant transitions begin, or to which they may be assumed to point. On this basis, the transitions may be regarded simply as movements of the formants from their respective loci to the frequency levels appropriate for the next phone.
The spectrographic patterns, which produce /d/ before /iy/, /aa/, and /ow/, show how these transitions seem to be pointing to an F2 locus in the vicinity of 1800 Hz.
  Each consonant has target frequencies independent of the neighboring vowels.
  Formants transition from these target frequencies to the vowel target frequencies.

Coarticulation: Locus Theory
Locus Theory
Consonants and vowels both have targets of articulator positions and therefore formant frequency locations.
Given sufficient duration of a syllable, all phonemes reach their targets.
The slope of the formants during a transition from a consonant to a vowel is relatively constant until reaching the target.
If the syllable duration doesn't allow enough time for the formants to reach their targets, target undershoot occurs and the formants change direction before fully realizing the intended vowel.
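
A constant-slope transition with undershoot can be sketched as follows. The 1800 Hz F2 locus for /d/ is the value quoted from Delattre, Liberman, and Cooper. The /aa/ target of 1100 Hz, the slope of 10 Hz/ms, and the function names are assumptions chosen only to make the arithmetic easy to follow.

```python
def f2_reached(locus_hz, target_hz, slope_hz_per_ms, time_ms):
    """F2 after time_ms, moving at a constant slope from the locus toward the target."""
    distance = abs(target_hz - locus_hz)
    travelled = min(slope_hz_per_ms * time_ms, distance)  # stop at the target
    direction = 1 if target_hz >= locus_hz else -1
    return locus_hz + direction * travelled


def undershoot_hz(locus_hz, target_hz, slope_hz_per_ms, time_ms):
    """How far F2 remains from the vowel target when time runs out."""
    return abs(target_hz - f2_reached(locus_hz, target_hz, slope_hz_per_ms, time_ms))


D_LOCUS = 1800      # F2 locus for /d/, from the text
AA_TARGET = 1100    # assumed F2 target for /aa/
SLOPE = 10          # assumed transition slope in Hz per ms

print(f2_reached(D_LOCUS, AA_TARGET, SLOPE, 100))    # enough time: target reached
print(undershoot_hz(D_LOCUS, AA_TARGET, SLOPE, 40))  # too little time: undershoot
```

Under these assumed numbers, 100 ms is enough to cover the 700 Hz distance, while 40 ms covers only 400 Hz and leaves the formant 300 Hz short of the target.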

Coarticulation: Locus Theory
Locus Theory

(figure from Klatt 1987, p. 753)

Coarticulation: Modified Locus Theory
Problems with Locus Theory:
A transition may have both rapid and slow components: rapid release of the obstruction via the tongue tip, followed by slow movement of the tongue body.
The preceding vowel can influence the F2 onset of a CV transition (Öhman, 1966).
F2 may be insensitive to oral constrictions (obstruents) if the tongue position is toward the front of the mouth (as in /iy/).
(as reported by Fant 1973, Klatt 1987)

Coarticulation: Modified Locus Theory
Modified Locus Theory
Klatt hypothesized that the main effects of the vowel on the articulation of consonants are front/back position and lip rounding.
Vowels are divided into three sets: +front; +round; -front, -round (because there are no rounded front vowels in English, sets 1 and 2 are mutually exclusive).
  +front: /iy ih eh ae/
  +round: /uw ao ow er/
  -front, -round: /uh ah aa aw/
Fonset is predicted from Ftarget for each of these 3 classes (locus theory).
Achieved 95% intelligibility for CVC nonsense syllables.
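
One way to picture predicting Fonset from Ftarget per vowel class is a linear relation of the form Fonset = Flocus + k (Ftarget - Flocus), with a separate (Flocus, k) pair for each class. The slides do not give the formula or any parameter values, so the linear form, the class label "neither", and every number in `PARAMS` below are assumptions made for illustration. Only the three vowel sets come from the text.

```python
# Vowel classes as listed in the text.
VOWEL_CLASS = {}
for _v in "iy ih eh ae".split():
    VOWEL_CLASS[_v] = "front"
for _v in "uw ao ow er".split():
    VOWEL_CLASS[_v] = "round"
for _v in "uh ah aa aw".split():
    VOWEL_CLASS[_v] = "neither"  # -front, -round

# HYPOTHETICAL (locus_hz, k) pairs for the F2 of a single consonant.
# These are placeholders, not values from Klatt (1987).
PARAMS = {
    "front": (1800.0, 0.5),
    "round": (1600.0, 0.25),
    "neither": (1700.0, 0.75),
}


def predict_onset(vowel, f_target_hz):
    """Assumed linear rule: onset lies between the class locus and the vowel target."""
    locus, k = PARAMS[VOWEL_CLASS[vowel]]
    return locus + k * (f_target_hz - locus)


print(predict_onset("iy", 2300.0))
print(predict_onset("uw", 1000.0))
```

In this form, k = 0 would give a fixed locus regardless of the vowel (the original locus theory), and k = 1 would make the onset equal to the vowel target.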

Coarticulation: Locus Theory
Modified Locus Theory

(figure with panels -front, -round; +front; +round. From Klatt 1987, p. 754)

Coarticulation: Öhman's Theory
Öhman (1965) found that the loci of consonants are NOT independent of neighboring vowels, and that for /g/ more than one locus is required.
Conclusion: consonant gestures are superimposed on vowel gestures that are present during the consonant. Even while the consonant is being uttered in a VCV, there is an effect of both Vs on C.

Coarticulation: Öhman's Theory
Öhman (1966) proposed a model of coarticulation based on vocal-tract shape evolving over time. It assumes that vocal-tract shapes can be mapped to formant frequencies.
For VCV utterances:
  s(x,t) = v(x) + k(t) [c(x) - v(x)] wc(x)
where s(x,t) is the vocal tract shape at position x and time t, v(x) is the vocal tract shape at position x for a given vowel, c(x) is the vocal tract shape of the consonant, k(t) is an interpolation value (from 0 to 1), and wc(x) describes the degree to which c(x) resists coarticulation.
v(x) describes the shape of the vocal tract, which may be a combination of two vowels if V1 ≠ V2. (v(x) will vary over time from V1 to V2.)
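
Öhman's model is commonly written as s(x,t) = v(x) + k(t) [c(x) - v(x)] wc(x). The sketch below evaluates that expression at one instant, with the shapes sampled at a few positions x. The sample values and the function name are made up for illustration, and the sketch assumes wc(x) also lies between 0 and 1.

```python
def ohman_shape(v, c, wc, k):
    """Vocal-tract shape s(x) at one instant.

    v, c, wc are equal-length lists sampled along x (vowel shape, consonant
    shape, and how strongly the consonant imposes its shape at each position).
    k is the interpolation value k(t) for this instant, from 0 to 1.
    """
    if not (len(v) == len(c) == len(wc)):
        raise ValueError("v, c and wc must have the same length")
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must be between 0 and 1")
    return [vx + k * (cx - vx) * wx for vx, cx, wx in zip(v, c, wc)]


# Made-up shapes at three positions along the tract (arbitrary units).
vowel = [4.0, 2.0, 1.0]
consonant = [0.0, 2.0, 3.0]
weights = [1.0, 0.5, 0.0]  # the consonant dominates at the first position only

print(ohman_shape(vowel, consonant, weights, k=1.0))
```

Where wc(x) is 0 the vowel shape passes through unchanged even at full consonant closure (k = 1), which is how the model lets the vowel show through the consonant.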

Coarticulation: Kozhevnikov-Chistovich (KC) Theory
(1) Syllabification using the CnV pattern (CV, CCV, CCCV, ...)
    phrase: 'give true answers'
      g ih | v t r uw | ae | n s er | z
       S1  |    S2    | S3 |   S4   | S5
(2) Measured relative durations of words, syllables, vowels
    relative duration of vowel = Dvow / Dsyll
    relative duration of syllable = Dsyll / Dword
    relative duration of word = Dword / Dphrase
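
The CnV grouping and the relative-duration ratios can be sketched as below. The vowel set is an illustrative subset, the function names are made up, and the durations in the usage line are invented numbers, not measurements.

```python
# Illustrative subset of vowel symbols (assumption).
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uw", "uh", "ow", "ey", "er"}


def cnv_units(phones):
    """Group phones into CnV units: any number of consonants plus the next vowel.

    Consonants left over after the last vowel form a final unit of their own.
    """
    units, current = [], []
    for p in phones:
        current.append(p)
        if p in VOWELS:
            units.append(current)
            current = []
    if current:
        units.append(current)
    return units


def relative_durations(d_vowel, d_syll, d_word, d_phrase):
    """Return (Dvow/Dsyll, Dsyll/Dword, Dword/Dphrase)."""
    return (d_vowel / d_syll, d_syll / d_word, d_word / d_phrase)


print(cnv_units("g ih v t r uw ae n s er z".split()))
print(relative_durations(50, 100, 400, 1600))  # invented durations in ms
```

For 'give true answers' this grouping yields five units, matching S1 through S5.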

Coarticulation: Kozhevnikov-Chistovich (KC) Theory
Found coarticulation within a syllable but not across syllables (figure: C1 V1 C2 C3 V2).
Articulatory gestures for the consonant(s) and vowel begin nearly simultaneously with the onset of the initial consonant in the syllable.
Example: lip rounding for /uw/ begins with /v/ in 'give true answers', but nasalization of /ae/ does not occur.
Assumes little or no LR coarticulation.
Assumes motor programming of speech is discontinuous at the VC boundary.
Counter-examples show LR coarticulation (Moll and Daniloff 1971; Kent, Carney, and Severeid 1974; Öhman 1966).

Coarticulation: Wickelgren's Theory
Speech units are mentally coded as context-sensitive units: in the phonetic string /X Y Z/, Y is encoded as xYz (Y in the context of X and Z).
"By assuming (context-sensitive) allophones to be the basic unit of articulation, it is trivial to account for how the same phoneme in different phonemic environments can be different in some respects at all levels of the speech process" (Wickelgren 1969, p. 11)
However, coarticulation can spread over more than one phone (up to seven phones' distance).
Other criticisms: MacNeilage 1970, Whitaker 1970, Halwes and Jenkins 1971.
"Allophonic richness may only beget strategic poverty" (Kent and Minifie 1977)
However, Wickelgren's is the only model currently used in ASR and concatenative text-to-speech (exceptions: Wouters 2001, Wrede 2001).
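
Context-sensitive units of this kind are easy to generate from a phone string. In the sketch below, each phone is paired with its left and right neighbors. The `#` boundary symbol and the tuple representation are assumptions made for the example.

```python
def context_units(phones, boundary="#"):
    """Encode each phone Y as (X, Y, Z): its left neighbor, itself, its right neighbor.

    The boundary symbol stands in for the missing neighbor at either end.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


for unit in context_units(["s", "p", "uw", "n"]):  # 'spoon'
    print(unit)
```

Each unit only sees one phone on either side, which is why coarticulation that spreads over several phones falls outside what this encoding can represent.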

Coarticulation: Gay's Theory
Gay, 1977: the syllabic unit of motor organization is the CV unit.
Based on X-ray motion pictures of VCV utterances:
  anticipatory tongue movements for V2 in a V1CV2 sequence don't begin until closure of C has been attained
  movement toward V2 occurs during closure of C, having a large effect on the position and shape of the tongue during release of closure
  V1 has little effect on the position of the tongue at the moment of closure
Supports KC theory; conflicts with Öhman's findings.

Coarticulation
Other models: MacNeilage, Henke, Benguerel and Cowan, Moll and Daniloff, Liberman, Tatham, etc.
Some are feature-based, in that each phonetic segment is assigned distinctive features which can then be modified in regular ways.
Some are hierarchical models, with several levels of organization and complex interaction between levels.
However, coarticulatory patterns are not explained adequately by any theories or models (Kent and Minifie, 1977).
Conflicting evidence (Öhman and Kent & Moll vs. KC and Gay).