Title: Models of phonetic variation and selection
1Models of phonetic variation and selection
- Bjorn Lindblom
- Perilus XI, 1990, pp. 65-100
2How Do Phonetic Systems Evolve?
- Three Accounts
- (with comments)
- I am grateful to Bjorn Lindblom, who sent me a manuscript on this subject (Lindblom, B. (2007). Draft of Evolution of Phonology. In P. Hogan (Ed.), The Cambridge Encyclopedia of the Language Sciences). The opinions, additions, and oversights are my own.
3Account 1 (Just-So): The Big Brain Story (M. Donald)
- Evolution of human cognition allowed vocal/auditory communication
- Supposedly adaptive
- Omnidirectional
- Can be done in the dark
- Does not impede movement
- Epistemological issue: ex post facto stories can't be falsified
4Account 2 (Just-So): The Vocal Tract Story (P. Lieberman)
- Descent of the larynx and reduction in jaw size
- a. Adaptive advantages
- Full range of vowels (chimps make a single vowel)
- Non-nasal sounds
- b. Offsetting disadvantages
- Choking risk (newborns can breathe and eat simultaneously)
- Risk of impacted teeth and less efficient chewing
- Epistemological issue: ex post facto stories can't be falsified
5Account 3 (Existence Proof): The Challenge of the Phonetic Inventory and Its Implications
- Models the mismatch between
- Sounds that the vocal tract CAN produce
- Sounds that the vocal tract DOES produce
- As a tug-of-war between perceptual ease and articulatory cost
- Epistemological issue: suppose we ran this model on a computer. What is the truth-value of an existence proof?
6What is this Challenge?
- UPSID database contains information on 317 languages
- Average consonant inventory of 20-25 per language
- 500-600 distinct segment types
- Sounds like a lot?
- UPSID: UCLA Phonological Segment Inventory Database (as of 1990)
7Suppose
- We are constructing languages
- Our language is to have 25 consonants
- We have an inventory of 500 consonants to choose from
- How many distinct languages (not counting vowels) can we produce?
8A Combination Problem
- Classic Formulation
- We have 7 people
- How many committees of 3 can we form?
- Particular phrasing: 7 objects, 3 at a time
- General phrasing: N objects, R at a time
- General formula: $\frac{N!}{(N-R)!\,R!}$
- Particular formula: $\frac{7!}{(7-3)!\,3!} = 35$
9In the Case of Languages
- $\frac{500!}{(500-25)!\,25!} \approx 1.043912884 \times 10^{42}$
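Both counts can be checked directly. The following is a minimal sketch in Python that relies on the standard library's `math.comb`; the function name `count_inventories` is made up for illustration.

```python
import math

def count_inventories(n_segments: int, inventory_size: int) -> int:
    """Number of ways to choose inventory_size items out of n_segments,
    ignoring order: N! / ((N - R)! R!)."""
    return math.comb(n_segments, inventory_size)

committees = count_inventories(7, 3)     # 7 people, committees of 3
languages = count_inventories(500, 25)   # 500 consonants, inventories of 25

print(committees)
print(f"{languages:.4e}")                # scientific notation for the huge count
```

Python integers have arbitrary precision, so the 43-digit result is exact rather than a floating-point approximation.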
10How Big Is This Number? A First Pass
- Suppose
- We have a computer that generates descriptions of 1,000,000 phonetic systems per second
- It runs round the clock for a year
- Then
- Our computer generates about $10^{14}$ phonetic systems in a year (more exactly about $3.2 \times 10^{13}$, rounded up to the next power of ten)
11How Big Is This Number? A Second Pass
- Let P be the number of possible systems, $10^{42}$
- Let R be the number of systems generated in a year on our computer, $10^{14}$
- Let U be the age of the universe, $13.7 \times 10^{9}$ years
- Let F be the time required to generate the full inventory of languages on our computer
- Let N be the number of universe ages required to generate the full inventory
- $F = P/R = 10^{42}/10^{14} = 10^{28}$ years needed using our computer
- $N = F/U = 10^{28}/(13.7 \times 10^{9}) \approx 7 \times 10^{17}$
- Roughly 700 quadrillion times the age of our universe
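The arithmetic of the first two passes can be replayed in a few lines. This is a rough sketch, assuming a 365-day year; the constant names are illustrative. It computes the figures both with the unrounded yearly output and with the yearly output rounded up to 10^14 as in the slides.

```python
SYSTEMS_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365    # assumes a 365-day year
POSSIBLE_SYSTEMS = 1e42                  # P, rounded as in the slides
AGE_OF_UNIVERSE_YEARS = 13.7e9           # U

systems_per_year = SYSTEMS_PER_SECOND * SECONDS_PER_YEAR   # R, unrounded
years_needed = POSSIBLE_SYSTEMS / systems_per_year         # F
universe_ages = years_needed / AGE_OF_UNIVERSE_YEARS       # N

# Same calculation with R rounded up to 10^14.
years_needed_rounded = POSSIBLE_SYSTEMS / 1e14
universe_ages_rounded = years_needed_rounded / AGE_OF_UNIVERSE_YEARS

print(f"R = {systems_per_year:.2e} systems per year")
print(f"F = {years_needed:.2e} years (rounded R: {years_needed_rounded:.2e})")
print(f"N = {universe_ages:.2e} universe ages (rounded R: {universe_ages_rounded:.2e})")
```

Either way the conclusion is the same: exhaustive generation is out of reach by many orders of magnitude.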
12How Big Is This Number? A Third Pass
- Suppose our consonant inventories were placed in a great hat
- The probability of putting your hand in the hat and pulling out one of the 317 consonant inventories in the UPSID database is
- $317/(1.043912884 \times 10^{42}) \approx 0.303 \times 10^{-39}$
- 0.000000000000000000000000000000000000000303
- By comparison
- Winning Hot Lotto on 9/22/07 with 6 10 21 27 32 15
- Probability of picking this is $1/10^{12}$
- 0.000000000001
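The comparison can be made concrete with a short calculation. This sketch takes the lottery probability of 1 in 10^12 as given in the slides rather than deriving it; the variable names are illustrative.

```python
import math

possible_inventories = math.comb(500, 25)   # 25 consonants chosen from 500
attested_inventories = 317                  # languages in UPSID, per the slides
lotto_probability = 1e-12                   # the slides' figure, taken as given

# Chance that a random draw from the hat is one of the attested inventories.
p_attested = attested_inventories / possible_inventories

# How many times less likely that is than the lottery win.
times_less_likely = lotto_probability / p_attested

print(f"p(attested inventory) = {p_attested:.3e}")
print(f"lottery win is {times_less_likely:.1e} times more likely")
```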
13Implications
- The uniformity of the languages is striking
- "Speech makes extremely fastidious use of the phonatory and articulatory dimensions in principle available" (p. 67).
- There must be selection criteria involved in the evolution of phonetic systems
14Plausible Assumptions
- Tug-of-war between
- Articulatory simplicity (speech production)
- Perceptual distinctiveness (speech perception)
15Speech Production: Articulatory Simplicity
- How to make [i]
- Jaw raised
- Tongue forms a palatal constriction
- Suppose we prevent the jaw from raising (by means of a 20 mm bite block)
- Tongue must compensate to produce [i]
16Question and Answer
- If speakers can make [i] in two ways
- Why is [i] universally produced as a high vowel (p. 68)?
- Why does the normal position involve a raised jaw (p. 68)?
- Answer: extreme articulations are avoided in speech (p. 68)
17Vowel Reduction
- "I said Will, not Bill"
- "Robert will do it"
- /i/ is part of a stressed syllable in "Will"
- /i/ is part of an unstressed syllable in "will" and is shorter
- /i/ in "Will" is pronounced [I]
- /i/ in "will" approaches [U] in color (p. 69)
18The Undershoot Hypothesis
- As vowels in a CVC sequence get shorter, there is less time for articulators to move from the first to the second consonant.
- The articulators (for both consonants and the vowel) undershoot their targets.
- The reason is mechanical: duration-dependent undershoot (p. 69)
19And Can Be Made Precise
- Assumption: articulators behave like damped springs with fixed time constants
- Consider /bab/
- Force applied to raise the jaw for /b/
- Force applied to lower the jaw for /a/
- Force applied to close the jaw for /b/
- If these forces don't change but are applied closer in time, the jaw gesture will exhibit more and more undershoot with respect to the opening for the vowel (p. 70).
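The damped-spring assumption can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Lindblom's model: the jaw is treated as a critically damped spring pulled toward an "open" target (1.0) for the duration of the vowel and then back toward "closed" (0.0). The stiffness parameter `omega`, the time step, and the durations are invented values.

```python
def peak_jaw_opening(vowel_duration_s, omega=40.0, dt=0.0005):
    """Peak opening reached by a critically damped spring that is driven
    toward an open target (1.0) for vowel_duration_s seconds and then
    back toward the closed target (0.0)."""
    position, velocity, peak = 0.0, 0.0, 0.0
    total_time = vowel_duration_s + 0.5        # keep simulating after the switch
    steps = int(total_time / dt)
    for step in range(steps):
        t = step * dt
        target = 1.0 if t < vowel_duration_s else 0.0
        # Spring force toward the target plus critical damping (fixed time constant).
        acceleration = omega ** 2 * (target - position) - 2.0 * omega * velocity
        velocity += acceleration * dt          # semi-implicit Euler step
        position += velocity * dt
        peak = max(peak, position)
    return peak

for duration in (0.30, 0.20, 0.10, 0.05):
    print(f"vowel of {duration:.2f} s: peak opening {peak_jaw_opening(duration):.3f}")
```

With the same forces and the same time constant, the shorter the vowel, the further the peak opening falls short of the target, which is the duration-dependent undershoot described above.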
20How to Avoid Undershoot
- Let
- w = work in joules
- f = force in newtons
- d = distance in meters
- p = power in watts
- t = time in seconds
- $w = f \cdot d$ and $p = w/t = (f \cdot d)/t$
- Where undershoot occurs, d is reduced
- Eliminating undershoot requires an increase in force over the same time interval. This results in an increase in work per unit time, that is, in power.
21Argument for Adaptive Benefit
- Undershoot is a type of coarticulation
- Ubiquitous in speech
- Why?
- The Motor Theory of Speech (A. Liberman, not the P. Lieberman of Account 2)
- Without it, each phoneme becomes a syllable
- Talkers could speak only as fast as they could spell (p. 70)
- Imagine spending your social life in a chat room. The adaptive benefits do not seem great (to make up a just-so story of my own)
22Articulatory Simplicity: Some Generalizations
- Articulatory configurations that
- Represent increased displacement from rest positions
- Require movements with greater velocities
- Become more rare
- (Example: [di] requires less displacement and velocity than the retroflex [ɖi])
23Articulatory Simplicity: Conclusion
- Articulators can be modeled as springs
- On-line speech production appears to operate as if physiological processes were governed by a power constraint limiting energy expenditure per unit time (p. 72)
- In a felicitous phrase
- Speech production prefers the physiological pianissimo (p. 72)
24The Other Side of the Coin: Speech Perception
- Premise
- Speakers are extremely good at adapting their pronunciations to the varying demands of the speaking situation (p. 74)
- Implication
- Phonetic gestures are an adaptive and malleable means to more global communicative ends (p. 74)
25Redundancy
- All languages exhibit structural redundancy
- Consequently words and phonemes of individual
utterances show short-term variations in
predictability. (p. 74)
26Consider Minimal Pair Triads
- Sa mère s'est fait beaucoup de soucis
- Sa mère s'est fait beaucoup de soucis
- Sa mère se fait beaucoup de soucis
- French speakers have no problems
- Swedes without French perform poorly
- When s'est/se is presented in isolation, Swedes improve dramatically relative to the French
27Implication
- Speech perception is a product of
- Signal-driven information and
- Signal-independent information
- Knowledge of French is signal-independent information
- Speech signals are perceptually adequate
- when they match the listener's signal-independent info
- They need not be acoustically invariant, only perceptually sufficiently contrastive (p. 75).
28Modeling the Evolution of Phonetic Systems: Vowels
- Observations
- Languages favor high-low contrasts
- (over front-back, rounded-unrounded)
- A survey of 209 languages indicates a consistent preference for peripheral vowels
29Vowel Hypothesis
- Drawn from typological data and three previous attempts at modeling
- A preference for high-low contrasts originates in the interaction
- Between a need for maximal perceptual contrast, the dispersion principle (p. 77)
- And the idiosyncratic shape of phonetic space for vowels
30Idiosyncratic Space
- When both acoustic and motor dimensions are considered, the vowel space
- Offers more room for high-low than for front-back or rounding gestures
31Modeling the Evolution of Phonetic Systems: Consonants
- Inventory size is a predictor of how consonants pattern
- Lindblom divides consonants into three categories
- Basic (e.g., alveolar stop)
- Elaborated (e.g., click)
- Complex (e.g., breathy voiced palatal click)
- Where elaborated articulations are departures from basic ones, and complex articulations are combinations of elaborated ones.
32Patterning
- If we graph the number of systems using each type of articulation as a function of consonant inventory size, we find that
- Basic articulations are used in small systems
- Basic + Elaborated articulations are used in larger systems
- Basic + Elaborated + Complex articulations are used in the largest systems
33Size (of inventory) Principle
- The larger the system inventory, the more complex
the articulatory elaborations needed to retain
contrast
34Vowels Pattern Similarly
- Lending support not to Maximal Contrast (as suggested earlier)
- But to Sufficient Contrast
- As the system grows in size, the use of vowels drawn from the corners of the chart increases in frequency.
- Slightly more central or neutral articulations appear to be permitted in the small systems. (p. 86)
35Conclusion
- Both vowel and consonant inventories show evidence of having been molded by similar processes. (p. 87)
- Articulatory simplification
- Perceptual distinctiveness
36A Quantitative Model of the Observations
- A model consists of
- A state space (i.e., the domain of possible sounds)
- A set of constraints (i.e., why take this path rather than another through the space)
- Criteria for finding optimal solutions (i.e., how do we know one candidate solution is better than another)
37Define a Vowel Articulation Space
- Begin with the variables
- Labial width and height
- Mandible position
- Tongue blade elevation and front-back position
- Tongue body front-back position
- Larynx position
- The set of all values for all variables defines a possible vowel articulation space
- Clearly, this is hypothetical. Were we to implement this, we would need some criteria to select particular sounds from the infinite number of sounds in the articulation space.
- Suppose we do this.
- From Lindblom, B. (1986). Phonetic universals in vowel systems. In J. J. Ohala & J. J. Jaeger (Eds.), Experimental Phonology (pp. 13-44). Orlando: Academic Press.
38Define a Vowel Perception Space 1
- Determine F1 and F2 for each point of the articulatory space. Call this the acoustic space.
- Map these frequencies onto the mel scale
- Mel scale
- Scale of pitches judged to be equally distant from one another (e.g., the difference between 500 mels and 1000 mels is perceived as equal to the difference between 1500 mels and 2000 mels).
- Now, for each of the N possible vowels, we have two mel parameters, M1 and M2, corresponding to F1 and F2.
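The mapping from formant frequencies to mel parameters might look like the sketch below. The slides do not say which mel formula Lindblom used, so the conversion here is an assumption: a widely used textbook approximation. The formant values are rough illustrative figures, not measured data.

```python
import math

def hz_to_mel(frequency_hz: float) -> float:
    """Convert Hz to mels with a common approximation (an assumption here;
    the source does not specify the conversion it used)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)

# Illustrative (F1, F2) values in Hz for three vowels; invented round figures.
formants_hz = {"i": (280.0, 2250.0), "a": (710.0, 1100.0), "u": (310.0, 870.0)}

# (M1, M2) for each vowel: the two mel parameters corresponding to F1 and F2.
vowels_mel = {
    vowel: (hz_to_mel(f1), hz_to_mel(f2))
    for vowel, (f1, f2) in formants_hz.items()
}

for vowel, (m1, m2) in vowels_mel.items():
    print(f"{vowel}: M1 = {m1:.0f} mel, M2 = {m2:.0f} mel")
```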
39Define a Vowel Perception Space 2
- Set up an N × N matrix with one row and one column per vowel, built from these M1 and M2 values
- Define $L_{ij}$, the perceptual distance between any two vowels i and j, as follows
- $L_{ij} = [(M1_i - M1_j)^2 + (M2_i - M2_j)^2]^{1/2}$
- (notice that this is just the Pythagorean theorem)
40Define A Vowel Perception Space 3
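- The distance from i to j equals the distance from j to i, and each vowel is at distance zero from itself, so we want to eliminate the diagonal and half of the cells.
- So, the number of cells is $N(N-1)/2$
- Apply the distance formula to the remaining cells and fill them in.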
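Filling in the remaining cells can be sketched as follows. The (M1, M2) coordinates are hypothetical values chosen for easy arithmetic, and `pairwise_distances` is an illustrative name; `math.dist` computes the Euclidean distance used in the formula.

```python
import math
from itertools import combinations

# Hypothetical (M1, M2) coordinates in mels for N = 4 candidate vowels.
vowels_mel = {
    "i": (300.0, 2000.0),
    "e": (500.0, 1800.0),
    "a": (800.0, 1200.0),
    "o": (500.0, 900.0),
}

def pairwise_distances(points):
    """Return {(v, w): L_vw} for every unordered pair of vowels,
    i.e. one triangle of the N x N matrix without the diagonal."""
    return {
        (v, w): math.dist(points[v], points[w])
        for v, w in combinations(sorted(points), 2)
    }

distances = pairwise_distances(vowels_mel)
n = len(vowels_mel)

print(len(distances), "cells; expected", n * (n - 1) // 2)
for pair, distance in distances.items():
    print(pair, round(distance, 1))
```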
41Constructing a Vowel System
- We have N potential vowels in our system
- We want to construct all possible K-vowel systems such that K < N
- Technique
- Compute the intervowel distance across each possible K-vowel system
- Choose the K-vowel system that maximizes the distance
42Reciprocals and Squares
- Recall the distance formula stored in each of our $N(N-1)/2$ cells
- $L_{ij} = [(M1_i - M1_j)^2 + (M2_i - M2_j)^2]^{1/2}$
- Equivalent to
- $L_{ij}^2 = (M1_i - M1_j)^2 + (M2_i - M2_j)^2$
- Maximizing $L_{ij}^2$ has the same effect as minimizing $1/L_{ij}^2$
43Suppose N = 4, K = 3
- How many candidate systems do we have?
- $4!/((4-3)!\,3!) = 4$: {a, e, i}, {a, e, o}, {e, i, o}, {a, i, o}
- The best one is the one whose perceptual distance among all members is maximized
44Lindblom's Listener Formula
- Minimize the sum of the reciprocals of the squared distances across the entire system, where i is the row counter and j is the column counter
- $\sum_{i > j} 1/L_{ij}^2 \rightarrow \min$
45Some Omitted Details
- Lindblom's formula will sum over the perceptual distances in each of the uncolored cells, 6 in total.
- But is this what we want?
- For the {a, e, i} set we want to sum over the distances a-e, a-i, and e-i, and so on for the other 3 sets.
- So, for each candidate vowel set we have to sum three values and choose the set with the minimum sum.
46An Alternative Minimization Procedure
- vowelSet computeOptimumVowelSet(k, n)
- {
- minSum ← ∞
- minVowelSet ← {}
- numVowelSets ← number of combinations of n vowels taken k at a time
- do numVowelSets times
- {
- vowelSet ← generate a set of k vowels not yet generated
- sum ← sum of 1/Lij^2 over each pair in vowelSet
- if (sum < minSum)
- {
- minSum ← sum
- minVowelSet ← vowelSet
- }
- }
- return minVowelSet
- }
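The procedure above can be turned into a short runnable sketch. It assumes the cost of a candidate set is the sum of 1/L² over the pairs inside that set, and it uses hypothetical (M1, M2) coordinates for four vowels; the function names are illustrative.

```python
import math
from itertools import combinations

# Hypothetical (M1, M2) coordinates in mels for N = 4 candidate vowels.
vowels_mel = {
    "i": (300.0, 2000.0),
    "e": (500.0, 1800.0),
    "a": (800.0, 1200.0),
    "o": (500.0, 900.0),
}

def dispersion_cost(vowel_set, points):
    """Sum of 1 / L_ij^2 over every pair within one candidate set."""
    return sum(
        1.0 / math.dist(points[v], points[w]) ** 2
        for v, w in combinations(vowel_set, 2)
    )

def compute_optimum_vowel_set(points, k):
    """Try every k-vowel subset and keep the one with the lowest cost."""
    best_set, best_cost = None, math.inf     # start at infinity, not 0
    for candidate in combinations(sorted(points), k):
        cost = dispersion_cost(candidate, points)
        if cost < best_cost:
            best_set, best_cost = candidate, cost
    return best_set, best_cost

best_set, best_cost = compute_optimum_vowel_set(vowels_mel, 3)
print(best_set, best_cost)
```

With these invented coordinates the winner is the set {a, i, o}, the three vowels that lie furthest apart. Exhaustive enumeration is fine for small N but grows combinatorially, as the earlier slides on inventory counts make clear.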
47What's a Reader to Do?
- If I'm wrong (and I probably am)
- Continue with Lindblom's formula
- If I'm right, Lindblom's formulas can be easily adjusted. The adjustments don't change his argument
48Articulatory Cost
- Recall the Listener Formula, where $L_{ij}$ is the perceptual distance between vowels i and j
- Experimentally determine another variable, $T_{ij}$, that represents the articulatory cost of the vowel pair i and j
49Minimize the Ratio
- Of articulatory cost
- To perceptual distinctiveness
- Over each vowel pair in the candidate k-vowel system. Call this the Talker Formula
- In plain English, this balances the costs of production against the benefits of perception
50Social Factors
- A huge literature supports the idea that perceptual, articulatory and social factors interact in shaping phonetic structure (p. 89)
- Introduce a social factor, $S_{ij}$
- Where $0 \le S_{ij} \le 1$
- Giving a socially weighted version of the Talker Formula
- Notice
- If $S_{ij} = 1$, the formula is identical to the talker-oriented criterion
- If $S_{ij} = 0$, all k-vowel sets will be equally attractive
- If $S_{ij}$ is in the open interval (0, 1), it becomes an adjustable parameter
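The talker-oriented and socially weighted criteria can be sketched in code. The formulas themselves did not survive in these slides, so the forms used here are assumptions: the talker score is taken to be the sum of T/L² over the pairs in a candidate set, and the social factor is taken to multiply each term. For simplicity one scalar weight stands in for the per-pair $S_{ij}$. All distances and costs are invented numbers.

```python
from itertools import combinations

# Hypothetical squared perceptual distances L_ij^2 (in mels squared).
dist_sq = {
    ("a", "e"): 450000.0, ("a", "i"): 890000.0, ("a", "o"): 180000.0,
    ("e", "i"): 80000.0, ("e", "o"): 810000.0, ("i", "o"): 1250000.0,
}
# Hypothetical articulatory costs T_ij for the same pairs.
cost = {
    ("a", "e"): 1.0, ("a", "i"): 3.0, ("a", "o"): 1.5,
    ("e", "i"): 0.5, ("e", "o"): 2.0, ("i", "o"): 3.5,
}

def talker_score(vowel_set, social_weight=1.0):
    """Assumed criterion: sum of S * T_ij / L_ij^2 over each pair in the set.
    social_weight = 1 reduces to the plain talker-oriented score."""
    return sum(
        social_weight * cost[pair] / dist_sq[pair]
        for pair in combinations(sorted(vowel_set), 2)
    )

candidates = list(combinations("aeio", 3))
for candidate in candidates:
    print(candidate, talker_score(candidate), talker_score(candidate, 0.0))

talker_best = min(candidates, key=talker_score)
print("lowest talker score:", talker_best)
```

Under these assumptions a weight of 0 makes every candidate score 0, so no set is preferred, and a weight of 1 leaves the talker score unchanged, which matches the two limiting cases noted above.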
51Major Problem
- We have developed a set of models that would be very hard to test (p. 89).
- It turns out that social factors are much less important than articulatory and perceptual factors
52How Do We Know?
- Lindblom
- Divided all consonants in the UPSID database into obstruents and sonorants
- Divided the languages represented into 13 language groups
- Almost all language groups had a roughly 70/30 percent ratio of obstruents to sonorants.
53Take Home Point
- This stability indicates that socio-cultural
factors are not powerful enough to drastically
overrule motoric and communicative constraints.
(p. 93)
54Conclusion
- It has been known since Fant's work in the early sixties that the space of possible speech sounds has significant disjunctions
- Theory of Adaptive Dispersal
- Functional selective pressure resulted in the phonemic inventories of the world's languages
- At the same time
- Maximize perceptual differences
- Minimize articulatory costs