Title: Dynamics of Gestures: Temporal Patterning
1. Dynamics of Gestures: Temporal Patterning
Work supported by NIH grant DC-03663
- Elliot Saltzman
- Boston University
- Haskins Laboratories
2. Colleagues
- Dani Byrd
- University of Southern California, USA
- Louis Goldstein
- Yale University and Haskins Laboratories, USA
- Hosung Nam
- Yale University and Haskins Laboratories, USA
3. Question: What is being learned when we learn a skilled behavior?
- Answer: The dynamical system, or coordinative structure, that shapes functional, coordinated activity defined across animal and environment
- But what is a dynamical system?
- Roughly, it is a system of interacting variables whose changes over time are shaped by laws or rules of motion
  - what types of variables?
  - what types of rules of motion?
4. System states, parameters, and graphs, and their dynamics
- Any dynamical system can be completely characterized according to three types of variables (state, parameter, and graph) and their dynamics (Farmer, 1986)
- State variables: a system's active degrees of freedom
  - defined by the number of autonomous 1st-order equations used to describe the system
  - Ex) position and velocity of the mass in a damped mass-spring system
  - Ex) activations of nodes in a connectionist network
- State dynamics: the forces (velocity vector field) defined in the space of state variables (state space) that shape motion patterns of the state variables
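As a minimal sketch of the damped mass-spring example, the second-order equation m·x'' + b·x' + k·x = 0 can be written as two autonomous 1st-order equations, one per state variable. The function name `velocity_field` and the default parameter values are illustrative assumptions, not part of the original slides.

```python
def velocity_field(state, m=1.0, b=2.0, k=1.0):
    """Velocity vector field of a damped mass-spring system.

    state is (x, v): position and velocity, the two state variables.
    m, b, k are the system parameters (mass, damping, stiffness);
    the default values are arbitrary illustrative choices.
    Returns (dx/dt, dv/dt), i.e. the "force" acting at that point of state space.
    """
    x, v = state
    dx_dt = v
    dv_dt = (-b * v - k * x) / m
    return (dx_dt, dv_dt)


# At rest, one unit away from equilibrium, the field points back toward x = 0.
print(velocity_field((1.0, 0.0)))
```

Here the state variables are the arguments that change from moment to moment, while m, b, and k are parameters that stay fixed unless some separate parameter dynamics changes them.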
5. System states, parameters, and graphs, and their dynamics (cont.)
- System parameters
  - Ex) m, b, k, and escapement strength in a limit cycle equation
  - Ex) target position in a point attractor equation
  - Ex) pendulum length
  - Ex) inter-node synaptic connection strength in a connectionist network
- Parameter dynamics: the forces/processes that shape motion patterns of the system parameters
  - Ex) intentional changes in oscillation frequency in a finger-wiggling experiment
  - Ex) actor-environment field equation for specifying target position in reaching
  - Ex) changing system eigenfrequency due to alteration of pendulum lengths in a pendulum-swinging experiment
  - Ex) connectionist learning algorithms for changing system weights to solve a given computational task
6. System states, parameters, and graphs, and their dynamics (cont.)
- System graph: architecture of the system's equation of motion
  - the parameterized set of relationships defined among a system's state variables
  - Ex) circuit diagram (e.g., Simulink) representation of the m-b-k equation of motion
  - Ex) node/connection diagram in a connectionist network
7. System states, parameters, and graphs, and their dynamics (cont.)
- Graph dynamics: the forces/processes that change the system graph
- state variables (i.e., system dimensionality)
  - Ex) recruitment/selection/assembly of degrees of freedom appropriate for the task in a particular actor-environment context
  - e.g., recruitment of trunk leaning or body twisting for reaching, depending on distance to target
- interconnection/linkage structure defined across state variables
  - Ex) learning/discovering appropriate interlimb oscillator coupling functions to perform bimanual m:n rhythms
  - Ex) constructivist connectionist learning algorithms that add/delete nodes and/or connections to implement a grammar appropriate for learning a given class of functions
8. Outline of Remaining Presentation
- Part 1: Overview and review of the task-dynamic model of speech production
  - Four types of timing phenomena: intragestural, transgestural, intergestural, and global
  - Hybrid dynamical model: task dynamics + recurrent connectionist network
- Part 2: Focus on system graphs and intergestural timing/phasing in speech production
  - Influence of system graph on patterns of relative timing between vowels and consonants in syllables
  - Competitive, coupled oscillator model of syllable structure
  - Task-dynamic model of intergestural phasing (Saltzman & Byrd, 2000)
9. Outline of Remaining Presentation (cont.)
- Part 3: State and/or parameter dynamics and transgestural timing
  - Phrasal boundary effects on local speaking rate
  - Prosodic gestures (p-gestures) induce local slowings of the central clock
- Part 4: Intragestural timing: gestural anticipation intervals
  - Self-organization of gestural onsets given required times of target attainment
  - Constrained temporal elasticity of anticipation intervals
10. Part 1: Overview and Review
- General Theoretical Question
- How can we characterize the dynamics that
underlie the temporal coordination among the
units (gestures) of speech?
11. Dynamics Defined
- Dynamics
- Laws or rules that specify the forces that change a system's variables (system state) from one moment to the next
12. Speech Gestures
- Equivalence classes of goal-directed actions by different sets of articulators in the vocal tract
- Examples
  - /p/, /b/, /m/: Upper lip, lower lip, and jaw work together to close the lips.
  - /a/, /o/: Tongue body and jaw work together to position and shape the tongue dorsum (surface) for the vowel.
13. Articulatory Phonology (Catherine Browman and Louis Goldstein)
- Speech can be described with a unitary structure that captures both phonological and physical properties.
- The act of speaking can be decomposed into atomic units, or gestures.
  - Units of information: linguistic primitives of speech production
  - Units of action: dynamically controlled constriction actions of distinct vocal tract organs (e.g., lips, tongue tip, tongue body, velum, glottis)
  - Coordinated into larger molecular structures
14. Four Aspects of Speech Timing
- Intragestural: variations of temporal patterns of individual gestures
  - Ex. Temporal asymmetry of velocity profiles
- Intergestural: relative phasing among gestures
  - Ex. Sequencing and partial temporal overlap (coproduction) of vowel and consonant gestures in the word (and syllable) /pub/
- Transgestural: modulations of temporal patterns of all active gestures during a relatively localized portion of an utterance
  - Ex. Temporally localized slowing of all gestures in the neighborhood of phrasal boundaries
- Global: temporal pattern of the entire utterance
  - Ex. Overall speaking rate or style
15. Overview: Hybrid Dynamical Model
- Modeling the dynamics of speech production: a hybrid dynamical model with 2 components
  - Task-dynamic component: shapes articulatory trajectories given gestural timing information as input. Uses tract-variable and model articulator coordinates.
  - Recurrent neural network: provides a dynamics of gestural timing. Uses activation coordinates.
16. Tract Variable and Model Articulator Coordinates
17. Gestural Activation
- A gesture's dynamics influence vocal tract activity for a discrete interval of time.
- Activations wax and wane gradually at edges.
- A gesture's strength is defined by its activation level (range 0-1)
[Figure: gestural activations plotted against time; panel label "bad"]
18. Gestures as Dynamical Systems
- Gestural activations are used to define gesture-specific control dynamics in goal/task space coordinates
  - point attractor dynamics of damped mass-spring systems in the task space
  - constriction space (tract variables): closing the lips, raising the tongue tip, etc.
  - constriction target is approached regardless of initial conditions or perturbations along the way
19. Gestural Equation of Motion
- Total gestural acceleration is the sum of the constriction gesture and neutral gesture acceleration components.
- Constriction gesture
- Neutral gesture (governs return to neutral posture)
20. Hybrid Model: Three Coordinate Systems
21. Hybrid Dynamical Model: Overall Structure
22. Part 2: Intergestural Timing, System Graphs, and Syllable Structure
- Phenomenon: Vowel and consonant gestures within syllables show characteristic signatures of relative timing/phasing
- We hypothesized that these different patterns were due to corresponding differences in intergestural coupling graphs
  - coupling graphs were implemented in simulations
  - simulations were compared with actual data
23. Syllable Structure: Some Definitions
- The vowel and consonant gestures in a syllable can be partitioned into three components: Onset, Nucleus, Coda
24. Relative Timing in Syllables
- There is an asymmetry in patterns of relative timing displayed within syllable-initial (onset) and syllable-final (coda) consonant clusters
- C-center effect on mean values of intergestural relative phase
  - c-center pattern occurs syllable-initially in onsets but not syllable-finally in codas
  - Browman & Goldstein (1988), Byrd (1995)
- Stability of relative phasing
  - Greater stability (lower standard deviation) of relative phasing occurs syllable-initially in onsets than syllable-finally in codas
  - Byrd (1996), Cho (2001)
- Both effects are hypothesized to emerge from appropriate dynamic coordination of gestures viewed in an oscillatory framework
25. C-center Effect in Onsets, not Codas: Hypothetical Model
- C-V phasing
- What if we add an additional coordination (C-C phasing)?
- C-C phasing separates CC in timing
- CV and CC phasings are in competition
- But C-V phasing is preserved as global c-center-to-V coordination
[Figure label: C-center]
26. Why C-center Effect in Onsets and not Codas?
- Browman & Goldstein's (2000) hypothesis
  - there are different coupling structures (system graphs) for onsets (C1,o C2,o V) and codas (V C1,c C2,c)
  - there is C1,o-V coupling in onsets, but there is no V-C2,c coordination (coupling) in codas
  - as a result, there is competition between VC and CC phasings for onsets, but not for codas
27. Proposed Coupling Graphs: CCV vs. VCC
- CCV: competitive coupling structure
- VCC: no V-C2 coordination, so no competition
28. Stability of Relative Phasing
- Browman & Goldstein (2000) additionally hypothesized that competitive coupling structures in syllable-initial position may also help explain the greater stability of intergestural phasing in onsets than in codas
29. Outline of Simulation Experiments
- C-center effect in CCV but not VCC?
- Greater stability (lower variability) between consonants in CCV than VCC?
- Effect of syllable boundary in heterosyllabic CC sequences
30. What do Oscillators Have to do with Speech?
- Oscillatory units have a well-defined variable representing time: phase
- the dynamics of coupled limit cycle oscillators allow their relative timing to emerge in a self-organized manner due to intrinsic oscillator dynamics and the nature of the coupling.
- the best developed theories of inter-unit timing come from work in (non-speech) rhythmic movement
31. What do oscillators have to do with speech? (cont.)
- Phase has also been adopted as a measure of intrinsic gestural time in speech gestures (Browman & Goldstein; Kröger et al.)
  - although point attractor models have been used to model these gestures, intrinsic gestural phase has been defined relative to an associated abstract, underlying gestural oscillator
- Previously, the coordination of gestures in terms of their relative phase has been specified by hand in models of word production
  - we have been pursuing a model of speech timing that allows relative phasing to self-organize as it does in oscillatory systems
32. Task-dynamics of Intergestural Phasing
- We assume that rhythmic and non-rhythmic speech behavior have a common underlying dynamical organization
  - here, we attempt to reconcile work in coupled oscillator dynamics and intergestural timing in speech.
- Saltzman & Byrd (2000) implemented a task-dynamic approach to controlling (generalized) relative phase and (m:n) frequency ratio in a single pair of coupled nonlinear oscillators
  - For a pair of oscillators in 1:1 frequency locking, the component oscillators must be coupled to one another in a manner specific to the desired relative phasing
- We have generalized the Saltzman & Byrd (2000) model to implement intergestural coupling among multiple (>2) gestures (Nam, Saltzman, & Goldstein, 2003)
33. Control of Relative Phase: General Approach
- Intergestural coupling is defined in a pairwise manner among a set of oscillators in three steps
  - 1st: define a set of task-space potential functions, V(ψ)
    - the state variable represents relative phase (ψ = φi - φj)
    - the point minimum corresponds to the desired relative phase value, ψ0
  - 2nd: define the corresponding task-space (relative phase) dynamics
  - 3rd: transform these dynamics into the required coupling forces between the component oscillators
- see Saltzman & Byrd (2000) for details
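The three steps can be sketched for a single pair of oscillators. This is a simplified illustration only: it assumes pure phase oscillators (not the nonlinear limit-cycle oscillators of the original model), a cosine potential V(ψ) = -a cos(ψ - ψ0), and an even split of the task-space force between the two oscillators. The function names and default values are hypothetical.

```python
import math


def relative_phase_force(psi, psi0, a=1.0):
    """Steps 1 and 2: task-space force -dV/dpsi for V(psi) = -a*cos(psi - psi0)."""
    return -a * math.sin(psi - psi0)


def simulate_pair(phi_i, phi_j, psi0, omega=2 * math.pi, a=1.0,
                  dt=0.001, steps=20000):
    """Step 3: apply the task-space force as equal and opposite coupling terms.

    Both oscillators share the natural frequency omega (1:1 locking), and
    psi = phi_i - phi_j then obeys d(psi)/dt = -a*sin(psi - psi0).
    """
    for _ in range(steps):
        force = relative_phase_force(phi_i - phi_j, psi0, a)
        phi_i += dt * (omega + 0.5 * force)
        phi_j += dt * (omega - 0.5 * force)
    return phi_i, phi_j


phi_i, phi_j = simulate_pair(0.0, 0.0, psi0=math.radians(50))
print(math.degrees(phi_i - phi_j))  # settles close to the 50-degree target
```

Because the coupling is derived from a potential over relative phase, the target phasing emerges from the dynamics instead of being imposed as a fixed timing offset.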
34. Simulation Experiment 1: C-center effect in CCV (competition)
- Target relative phase
  - C1-V: 50°
  - C2-V: 50°
  - C1-C2: 30°
- Resultant relative phase (final output)
  - C1-V: 59.94°
  - C2-V: 39.96°
  - C1-C2: 19.98°
[Figure: timing of C1, C2, and V for the target and resultant phasings; labels: Competition, C-centers, Mean of c-centers, C-center effect]
35. Simulation Experiment 1: No C-center effect in VCC (no competition)
- Target relative phase
  - V-C1: 50°
  - V-C2: none
  - C1-C2: 30°
- Resultant relative phase (final output)
  - V-C1: 49.96°
  - V-C2: 79.90°
  - C1-C2: 29.94°
[Figure: timing of V, C1, and C2 for the target and resultant phasings; labels: No competition, C-center, Mean of c-centers, No c-center effect]
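The qualitative outcome of Simulation Experiment 1 can be illustrated with a much simpler stand-in for the actual model. The sketch below assumes pure phase oscillators viewed in a co-rotating frame, equal coupling strengths, and cosine potentials for every coupled pair; `settle` and `rel_deg` are hypothetical helper names, and the numbers it produces are not the published simulation output.

```python
import math


def settle(targets_deg, a=1.0, dt=0.01, steps=20000):
    """Relax a set of phase oscillators under pairwise relative-phase coupling.

    targets_deg maps a pair (i, j) to the desired value of phi_i - phi_j
    in degrees. Pairs that are absent from the dict are simply not coupled.
    """
    names = sorted({n for pair in targets_deg for n in pair})
    phi = {n: 0.0 for n in names}
    for _ in range(steps):
        dphi = {n: 0.0 for n in names}
        for (i, j), target in targets_deg.items():
            # Force from V = -a*cos(psi - psi0), applied with opposite signs.
            force = -a * math.sin(phi[i] - phi[j] - math.radians(target))
            dphi[i] += force
            dphi[j] -= force
        for n in names:
            phi[n] += dt * dphi[n]
    return phi


def rel_deg(phi, i, j):
    """Relative phase phi_i - phi_j in degrees."""
    return math.degrees(phi[i] - phi[j])


# Onset graph (CCV): three couplings that cannot all be satisfied at once.
ccv = settle({("C1", "V"): 50, ("C2", "V"): 50, ("C1", "C2"): 30})
# Coda graph (VCC): no V-C2 coupling, so nothing competes.
vcc = settle({("V", "C1"): 50, ("C1", "C2"): 30})

print([round(rel_deg(ccv, *p), 2) for p in [("C1", "V"), ("C2", "V"), ("C1", "C2")]])
print([round(rel_deg(vcc, *p), 2) for p in [("V", "C1"), ("V", "C2"), ("C1", "C2")]])
```

Under these assumptions the onset graph settles on a compromise in which no target is met exactly but the mean of the two C-V phasings stays at the C-V target, while the coda graph simply meets both of its targets.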
36. Adding noise: Simulation Experiment 2
- Source of noise
  - slight differences in frequencies of oscillators (detuning)
- Noise modeled by adding a linear function to the potential energy function
  - V(ψ) = -a cos(ψ - ψ0) + b(ψ - ψ0)
  - b represents the amount of inter-oscillator detuning, which perturbs the location of the potential minimum
  - b randomly varied across simulation trials within conditions defined by a given standard deviation
  - standard deviation of b manipulated across simulation conditions
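To see how the linear term perturbs the location of the minimum, the derivation below reads the potential as V(ψ) = -a cos(ψ - ψ0) + b(ψ - ψ0) and assumes a > 0 and |b| < a (so that a minimum still exists). It is a worked illustration for a single coupled pair, not a result taken from the slides.

```latex
\begin{align*}
V(\psi) &= -a\cos(\psi - \psi_0) + b\,(\psi - \psi_0) \\
\frac{dV}{d\psi} &= a\sin(\psi - \psi_0) + b = 0 \\
\psi^{*} &= \psi_0 - \arcsin\!\left(\frac{b}{a}\right), \qquad |b| < a \\
\psi^{*} &\approx \psi_0 - \frac{b}{a} \quad \text{for } |b| \ll a
\end{align*}
```

For small detuning the shift is roughly proportional to b and inversely proportional to the coupling strength a, which is why a more strongly constrained relative phase varies less for the same spread of b.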
37. Results: Simulation Experiment 2
- Interconsonant phasing is more variable in syllable-final position
[Figure: standard deviation of CC phase (radians, axis up to 1.0) against standard deviation of detuning b (.05, .25, .45, .65, .85), for Onsets and Codas]
- Browman & Goldstein's hypothesis was borne out in the simulations
- Onsets in competition show greater stability
38. Simulation Experiment 3: Generalizing the Model to Hetero-Syllabic Consonant Sequences
- e.g., "a scab"
- e.g., "mask amp"
- e.g., "bag sab"
39. Results: Simulation Experiment 3
- C-to-C phasing is more variable across boundaries
[Figure: standard deviation of CC phase (radians, axis up to 1.0) against standard deviation of detuning b (.05, .25, .45, .65, .85), for Onsets, Codas, and X-bound (cross-boundary) sequences]
- The result (V#CCV < VCC#V < VC#CV) corresponds to Byrd's (1994) findings
40. Conclusion: Importance of System Graph
- Dynamic structure (the system graph's coupling structure) generates observed phonetic asymmetries of intergestural phasing (mean patterns and their stability)
- C-center effect (mean relative phasing) and greater temporal stability: competitive coupling structure in onsets
- Effect of boundaries (greater variability): consonants not directly coupled across boundaries
41. Future Directions: Where are the Underlying Oscillators?
- Hypothesis: Underlying oscillators live at the state-unit level of the hybrid model's recurrent network as members of an entrained oscillatory ensemble
- Question: Is there a 1:1 association between oscillators and gestures?
- Question: How are the mappings learned between oscillators and gestural activations?
42. Part 3: Transgestural Effects of Phrasal Boundaries
- It has been shown that prosodic boundaries induce temporally local contextual variation in ongoing articulation
  - prosodic boundaries are boundaries between words and higher-order phrases in speech
- Boundary effects on articulation include
  - lengthening of gestural durations
  - decreased overlap (coarticulation) between adjacent gestures
  - spatially larger gestures in phrase-initial positions
- Boundary effects appear to be graded
  - stronger boundaries induce greater lengthening
43. Boundary Adjacent Slowing
- It has been shown that speech gestural durations lengthen in the region of word and phrase boundaries
- It also appears that stronger boundaries induce greater lengthening
- Example (Byrd & Saltzman 1998)
44. Boundary Adjacent Slowing (Byrd & Saltzman 1998)
45. Boundary Adjacent Slowing (Byrd & Saltzman 1998)
[Figure: pre-boundary lip opening duration and post-boundary lip closing duration (ms, 0 to 300) by Boundary Type (none, word, list, vocative, utterance) for Speaker J and Speaker K; panel label "mmi"]
46. Boundary Adjacent Relative Timing
- Additionally, evidence exists suggesting that phrase boundaries affect the relative timing (i.e., overlap) between gestures.
  - Chitoran, Goldstein, & Byrd (to appear); Byrd (1996); Hardcastle (1985); Byrd, Kaun, Narayanan, & Saltzman (2000); Jun (1993); Keating et al. (in press)
[Figure: time between displacement extrema in CC]
47. Approach: Prosodic (p)-gestures
- Question: How can we account for the variations of gestural timing associated with prosodic context?
- p-gestures (prosodic gestures) influence the expression of all constriction gestures that are concurrently active with the p-gestures
  - Transgestural effect
  - Effect in proportion to the activation level of the p-gesture.
  - p-gesture activation determined by boundary strength.
- Byrd, Kaun, Narayanan, & Saltzman (2000); Byrd (2000); Byrd & Saltzman (submitted)
48. Two constrictions spanning a phrase boundary
49. How is this Prosodic Action Effected? Parameter Dynamics: Stiffness Lowering
- Lowering of gestural stiffness values has been hypothesized to underlie gestural lengthening adjacent to phrasal boundaries.
  - Beckman et al. 1992, Byrd & Saltzman 1997
- Local, transgestural on-line modulation of gestural parameter values.
  - E.g., locally lower stiffness leads to local slowing
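The link between lower stiffness and slowing can be checked numerically for a critically damped mass-spring point attractor. This is a sketch under stated assumptions: critical damping (b = 2·sqrt(m·k)), a 5% settling criterion, and arbitrary stiffness values; `settling_time` is a hypothetical helper, not part of the task-dynamic model's code.

```python
import math


def settling_time(k, m=1.0, tol=0.05, dt=1e-4):
    """Time for a critically damped mass-spring system to come within tol of its target.

    The system starts at rest one unit away from a target at x = 0.
    Critical damping (b = 2*sqrt(m*k)) is an assumption of this sketch.
    """
    b = 2.0 * math.sqrt(m * k)
    x, v, t = 1.0, 0.0, 0.0
    while abs(x) > tol:
        acc = (-b * v - k * x) / m
        v += dt * acc   # semi-implicit Euler: update velocity first,
        x += dt * v     # then position with the new velocity
        t += dt
    return t


stiff = settling_time(k=1.0)
soft = settling_time(k=0.25)
print(soft / stiff)  # lowering stiffness by a factor of 4 roughly doubles the duration
```

The movement time scales with 1/sqrt(k/m), so stiffness lowering stretches a gesture's own time course but, by itself, leaves the interval over which the gesture is active unchanged.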
50. But...
- Changes in both duration and relative timing occur at phrase boundaries.
- Stiffness scaling does not account for changes in relative timing.
  - It modulates point-attractor parameter values, but does not specifically influence the domain of gestural activation.
51. How is this Prosodic Action Effected? Central Clock Slowing
- Hypothesis: Prosodic effects are induced by time slowing at the gestural control level.
  - slowing the timecourse of gestural activation (Byrd & Saltzman, submitted)
- Slowing the central clock has both intragestural and intergestural timing consequences.
- Related work: Vatikiotis-Bateson, Hirayama, Honda, & Kawato, 1992; Bailly, Laboissière, & Schwartz, 1991; O'Dell & Nieminen, 1999; and especially Port & Cummins, 1992, and Barbosa & Bailly, 1994
52. Gestural Activation
53. Slowing Activation Timecourse
[Figure: gestural activation (0 to 1) against time (0 to 0.25), comparing "No time slowing" with "Stretched with time slowing"]
Equation for time scaling/stretching/slowing
- τ is scaled time,
- t is unscaled time, whose flow rate = 1, and
- a(τ), gestural activations (constriction and p-gestures), are functions of scaled time.
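A minimal sketch of clock slowing follows. The rate law used here (scaled time advances at rate 1, reduced to 1 - `slowing` while a box-shaped p-gesture is active) is an assumption made for illustration and is not the equation from the slide; `unscaled_interval` and all numeric values are hypothetical.

```python
def unscaled_interval(tau_on, tau_off, pi_on, pi_off, slowing, dt=1e-4):
    """Map a gesture's activation interval from scaled to unscaled time.

    The gesture is active for scaled time tau in [tau_on, tau_off).
    The p-gesture is active for tau in [pi_on, pi_off); like the constriction
    gesture, it is defined in scaled time.
    Assumed rate law: d(tau)/dt = 1 - slowing while the p-gesture is active,
    and 1 otherwise. Requires 0 <= slowing < 1.
    Returns (t_start, t_end) in unscaled time.
    """
    t, tau = 0.0, 0.0
    t_start = None
    while tau < tau_off:
        if t_start is None and tau >= tau_on:
            t_start = t
        pi_active = pi_on <= tau < pi_off
        rate = 1.0 - slowing if pi_active else 1.0
        tau += rate * dt
        t += dt
    return t_start, t


plain = unscaled_interval(0.05, 0.15, pi_on=0.10, pi_off=0.20, slowing=0.0)
slowed = unscaled_interval(0.05, 0.15, pi_on=0.10, pi_off=0.20, slowing=0.5)
print(plain[1] - plain[0], slowed[1] - slowed[0])
```

Because every activation is a function of the same scaled time, slowing the clock stretches all concurrently active gestures and also pushes later gestural onsets later in unscaled time, which is how a single mechanism can affect both duration and relative timing.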
54. Simulation data: No p-gesture
[Figure: three panels (Activation, Position, Velocity) against time (0 to 0.25) for GESTURE 1 and GESTURE 2, with gesture 1 duration and gesture 2 duration marked]
55. Simulation: p-gesture realized via clock slowing
[Figure: two panels against time (0 to 0.25). Activation (faint = unslowed, bold = slowed) for GESTURE 1 (phrase-final), GESTURE 2 (phrase-initial), and the p-gesture; Position (faint = unslowed, bold = slowed)]
56. Initial Strengthening
- Initial strengthening: apparently spatially larger gestures in phrase-initial positions.
  - E.g., more linguapalatal contact in lingual consonants; longer linguapalatal seal durations; longer VOTs (Keating, Jun, Fougeron, Cho, Hsu, others); more breathy h's (Pierrehumbert & Talkin, 1992); more lip rounding in rounded vowels (van Lieshout et al., 1995)
- BUT what is the articulatory foundation for these very different types of effects?
- Can we unite slowing, lesser overlap, and strengthening in terms of articulatory dynamics, specifically clock slowing?
57. Simulation: Clock slowing with two (same constriction) phrase-initial gestures
- Gesture 1: closing (e.g., lingual C)
- Gesture 2: opening (e.g., following V)
- Quantities marked: Gesture 1 duration, Gesture 2 duration, time between peak velocities, spatial strengthening (phrase initial)
[Figure: two panels against time (0 to 0.25). Activation (faint = unslowed, bold = slowed) for gest 1 (consonant), gest 2 (vowel), and the p-gesture; Position (-2 to 2; faint = unslowed, bold = slowed) with a reference line for plausible linguapalatal contact]
58. Summary: p-gestures
- Local slowing of a central clock appears to be a plausible way to capture prosodically driven shaping of articulatory behavior.
- Unlike stiffness modulation, which only affects gestural durations, clock rate modulation generates several experimentally observed prosodic effects:
  - gestural lengthening
  - reduced intergestural overlap
  - spatial strengthening
59. Theoretical Implications of Prosodic Gestures
- First step in conceiving a dynamical implementation of phrasal structure.
- Just like articulatory gestures, phrasal junctures are viewed as
  - Having inherent durational properties
  - Being temporally coordinated with other gestures
- Provides a theoretical reconciliation of what in the past has been an inconsistency in the manner in which prosodic structure and segmental structure have been conceptualized in Articulatory Phonology (Browman & Goldstein, 1992 and elsewhere).
60. Part 4: Anticipatory Behavior of Speech Gestures
- Question
  - When does gestural motion begin relative to its required time of target attainment in an utterance?
- Answer: Controversial
  - Look-ahead model: as early as possible, given no other conflicting demands
  - Frame model: time-locked to the time of target attainment
61. Intragestural Effects: Gestural Anticipation Intervals
- Intragestural shaping of gestural anticipation intervals
  - Self-organization of gestural onsets given required times of target attainment
  - Emergent behavior from a bidirectionally coupled set of dynamical systems
- Activation dynamics (recurrent neural network)
  - Primary responsibility: shaping gestural activation patterns
  - Acts as sequence-specific central controller (clock, c.p.g.); drives the task-dynamic model (feedforward)
- Interarticulator coordination dynamics (task dynamics)
  - Primary responsibility: shaping articulator trajectories
  - Ongoing state modulates the recurrent controller (feedback)
62. Architecture of a Simple Hybrid Model
[Figure legend: task-dynamic elements; sequential network elements; inter-element synapses; delay lines; numbers and symbols denote fixed weights assigned to some synapses]
63. Network Training: Side Constraints and Interval Types
- Network training/programming.
  - backpropagation-in-time, distal supervised learning
- Two constraint types during training
  - Task constraints: specific to the current task, e.g., reach target at a specified time
  - Side constraints: generic constraints, e.g., maximize smoothness, minimize effort, etc.
- Two types of training interval
  - Care: task and side constraints
  - Don't care: only side constraints
- We used a side constraint that minimized gestural activation.
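One way to picture how the two constraint types combine over the two interval types is as a single training cost. The sketch below is an illustration under assumptions: squared-error task cost, squared-activation side cost, and a scalar `side_weight` are all choices made here, and `training_cost` is a hypothetical name, not the cost function used to train the network.

```python
def training_cost(positions, activations, care_mask, target, side_weight):
    """Illustrative cost combining task and side constraints.

    positions, activations: per-time-step tract-variable position and
    gestural activation. care_mask[i] is True inside a care interval.
    Task cost (assumed squared error) counts only during care intervals;
    side cost (assumed squared activation) counts at every time step.
    """
    task = sum((p - target) ** 2
               for p, care in zip(positions, care_mask) if care)
    side = sum(a ** 2 for a in activations)
    return task + side_weight * side


# Same trajectory, early versus late activation onset in the don't-care interval.
early = training_cost([0.2, 0.6, 1.0], [1.0, 1.0, 1.0], [False, False, True], 1.0, 0.1)
late = training_cost([0.0, 0.6, 1.0], [0.0, 1.0, 1.0], [False, False, True], 1.0, 0.1)
print(early, late)
```

With `side_weight` set to zero nothing penalizes early activation in the don't-care interval, whereas a positive weight favors delaying activation until it is needed to reach the target on time.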
64. Anticipatory Behavior: Effect of Side Constraints
[Figure: activation level and tract variable position across the don't-care and care intervals, in two columns]
- Left column: Look-ahead behavior occurs when side constraints are absent; gestural onset occurs near the beginning of the don't-care interval, regardless of its length.
- Right column: Frame-model behavior occurs when side constraints are present; regardless of the don't-care interval's length, gestural onsets are approximately time-locked to the care interval.
65. Constrained Temporal Elasticity in Speech
- Data on anticipatory lip protrusion in French speakers (e.g., Abry & Lallouache, 1995) suggest that anticipatory behavior may be neither rigidly time-locked nor totally unconstrained. This suggests a constrained temporal elasticity, intermediate between these two extremes.
- Abry & Lallouache's Movement Expansion Model: a gesture's anticipatory interval lengthens as the preceding don't-care interval lengthens, but only fractionally. Different speakers show different lengthening fractions.
- We generated temporally elastic behavior using intermediate values of the side constraints.
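The idea of a lengthening fraction can be written compactly. The linear form below is an assumption made for illustration (the slides state only that lengthening is fractional), and the symbol f is introduced here.

```latex
\Delta T_{\text{anticipation}} = f \,\Delta T_{\text{don't care}},
\qquad 0 \le f \le 1
```

On this reading, f = 0 corresponds to frame-model behavior (onset time-locked to target attainment), f = 1 to look-ahead behavior (onset tracks the start of the don't-care interval), and intermediate, speaker-specific values of f give constrained temporal elasticity.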
66. Constrained Elasticity in the Hybrid Network
67. Constrained Elasticity: Lengthening Fractions