Speech production - PowerPoint PPT Presentation

1
Speech production and phonetics
EEM.ssr Speaker and Speech Recognition
by Dr Philip Jackson

Lecturer in speech and audio, Centre for Vision,
Speech and Signal Processing, Department of
Electronic Engineering.
http://www.ee.surrey.ac.uk/Teaching/Courses/eem.ssr
2
Speech production
3
The human vocal tract
4
Speech production 1
  • Air pressure from lungs builds up behind closed
    vocal cords
  • Vocal cords are repeatedly forced apart and
    pulled together again, producing a series of
    small pulses of air
  • Air in vocal tract vibrates quasi-periodically
  • Rate of vibration of the vocal folds determines
    fundamental frequency, f0, which contributes to
    the perceived pitch of the voice
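To illustrate how the vibration rate sets f0, here is a minimal sketch that assumes an idealised source (one unit impulse per glottal cycle; real glottal pulses are smoother and vary from cycle to cycle) and recovers f0 from the strongest autocorrelation peak. The helper names, the 8 kHz sampling rate and the 50 to 400 Hz search range are assumptions for illustration, and the code measures f0 rather than perceived pitch.

```python
import math


def pulse_train(f0_hz, fs_hz, duration_s):
    """Idealised glottal source: one unit impulse per glottal cycle (period 1/f0)."""
    n = int(duration_s * fs_hz)
    period = fs_hz / f0_hz  # samples per glottal cycle (may be fractional)
    x = [0.0] * n
    k = 0
    while int(round(k * period)) < n:
        x[int(round(k * period))] = 1.0
        k += 1
    return x


def estimate_f0(x, fs_hz, fmin_hz=50.0, fmax_hz=400.0):
    """Estimate f0 as fs / lag of the strongest autocorrelation peak in range."""
    lag_min = int(fs_hz / fmax_hz)
    lag_max = min(int(fs_hz / fmin_hz), len(x) - 1)
    best_lag, best_r = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs_hz / best_lag


# 0.1 s of a 100 Hz pulse train sampled at 8 kHz
signal = pulse_train(100.0, 8000, 0.1)
print(estimate_f0(signal, 8000))  # 100.0
```

The autocorrelation peaks at a lag of one glottal period (80 samples here), so f0 is the sampling rate divided by that lag.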

5
Speech production 2
  • The vocal tract forms a resonator with a complex
    shape, which admits some harmonics of the
    fundamental, while suppressing others
  • A region of high-amplitude harmonics is called a
    formant
  • The first two formants, F1 and F2, are important
    for vowel discrimination
  • Speech sounds are classified according to the way
    the sound is generated and the position of the
    vocal apparatus
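As a worked example of the vocal tract acting as a resonator, the neutral vowel is often approximated as a uniform tube, closed at the glottis and open at the lips, whose resonances fall at odd multiples of the quarter-wave frequency: F_n = (2n - 1)c / (4L). A minimal sketch; the 17.5 cm tract length and the speed of sound c = 350 m/s are assumed textbook values, not figures from these slides.

```python
def tube_formants(length_m, n_formants=3, c_m_per_s=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L), i.e. odd multiples of the quarter-wave frequency.
    """
    return [(2 * n - 1) * c_m_per_s / (4.0 * length_m)
            for n in range(1, n_formants + 1)]


# Assumed adult vocal tract length of 17.5 cm
print([round(f) for f in tube_formants(0.175)])  # [500, 1500, 2500]
```

A longer tract lowers every resonance in proportion, which is one reason formant values differ between speakers.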

6
Anatomy of the vocal system
7
Articulators
  • Jaw
  • Lips
  • Tongue tip
  • Tongue blade
  • Velum
  • Larynx

mid-sagittal MRI for vowel /i/
8
Anatomy of the larynx
9
Voicing/phonation
10
Articulatory trajectories
from West (2000)
11
Kuda Mimi pada?
  • Acoustic waveform
  • Y-coord tongue dorsum
  • Y-coord lower lip
  • Lips and tongue

www.humnet.ucla.edu
12
Dynamic imaging
from Mohammad (1999)
13
Phonetics
14
The sounds of language
  • International Phonetic Association (IPA)
  • By manner
  • Origin of the airstream, and whether it flows inward or outward
  • Whether vocal folds are vibrating (i.e.,
    voiced/unvoiced)
  • Whether the velum is raised/lowered
  • By place
  • Which part of the vocal tract is involved, i.e.,
    the place of articulation
  • Shape of the lips (rounded/spread)
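The classification above can be made concrete by describing each sound as a bundle of manner and place features. A minimal sketch with hypothetical names (`Segment`, `differing_features`, `cognate_pairs`); it covers only a few English consonants from the following slides and leaves out the airstream mechanism and lip shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    symbol: str          # SAMPA-style symbol, as used on these slides
    manner: str          # e.g. "plosive", "nasal", "fricative"
    place: str           # e.g. "labial", "alveolar", "velar"
    voiced: bool         # are the vocal folds vibrating?
    velum_lowered: bool  # lowered velum -> air flows out through the nose


SEGMENTS = [
    Segment("p", "plosive", "labial", False, False),
    Segment("b", "plosive", "labial", True, False),
    Segment("m", "nasal", "labial", True, True),
    Segment("t", "plosive", "alveolar", False, False),
    Segment("d", "plosive", "alveolar", True, False),
    Segment("n", "nasal", "alveolar", True, True),
    Segment("s", "fricative", "alveolar", False, False),
    Segment("z", "fricative", "alveolar", True, False),
]


def differing_features(a, b):
    """List the feature names on which two segments differ."""
    return [f for f in ("manner", "place", "voiced", "velum_lowered")
            if getattr(a, f) != getattr(b, f)]


def cognate_pairs(segments):
    """(unvoiced, voiced) pairs that differ only in voicing, e.g. /p/-/b/."""
    return [(a.symbol, b.symbol) for a in segments for b in segments
            if not a.voiced and b.voiced and differing_features(a, b) == ["voiced"]]


print(cognate_pairs(SEGMENTS))  # [('p', 'b'), ('t', 'd'), ('s', 'z')]
```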

15
Manner of articulation 1
  • Plosive (aka. stop)
  • Produced by the abrupt release of a constriction
    somewhere along the vocal tract
  • English has plosives at the following places
  • Labial (lips) /p, b/
  • Alveolar (alveolar ridge) /t, d/
  • Velar (soft palate) /k, g/
  • Characterised by a short period of silence, a
    burst and then turbulence noise after release
  • Timing of voice onset after release distinguishes
    voiced/unvoiced cognates
  • Characteristics depend on context
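The voice-onset cue can be sketched as a simple threshold on voice onset time (VOT), the interval from the release burst to the start of voicing. The 25 ms boundary is an assumed, approximate figure for word-initial English stops; the real boundary shifts with place of articulation, speaking rate and context, and `classify_stop_voicing` is a hypothetical helper.

```python
def classify_stop_voicing(burst_time_s, voice_onset_s, boundary_ms=25.0):
    """Label a word-initial stop as voiced or unvoiced from its VOT.

    VOT = time of voicing onset minus time of the release burst. Negative values
    (voicing before the release) are treated as voiced.
    """
    vot_ms = (voice_onset_s - burst_time_s) * 1000.0
    return "voiced" if vot_ms < boundary_ms else "unvoiced"


print(classify_stop_voicing(0.100, 0.110))  # voiced   (VOT about 10 ms)
print(classify_stop_voicing(0.100, 0.170))  # unvoiced (VOT about 70 ms)
```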

16
Manner of articulation 2
  • Trill: repeated vibration of one articulator
    against another
  • Tap (aka. flap): a single brief touch
  • Nasal
  • Oral cavity is occluded, and velum is lowered, so
    air flows out through the nose
  • Characterised by vowel-like structure, but weaker
    energy
  • Identity cued by transitions from surrounding
    sounds
  • English has only voiced nasals
  • Labial /m/, alveolar /n/, velar /N/

17
Manner of articulation 3
  • Fricative
  • Airstream is forced through a constriction,
    causing turbulence
  • Characterised by non-periodic (noisy) sound
  • Frequency cut-off is inversely proportional to
    the length of the cavity in front of the
    constriction
  • English has voiced and unvoiced fricatives
  • Labio-dental /f, v/
  • Inter-dental /T, D/
  • Alveolar /s, z/
  • Palato-alveolar /S, Z/
  • Glottal /h/
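The inverse relation between cut-off frequency and front-cavity length follows if the cavity in front of the constriction is treated as a quarter-wave resonator (closed at the constriction, open at the lips), giving f ≈ c / (4 L_front). A minimal sketch under that simplifying assumption, with an assumed c = 350 m/s and example lengths chosen purely for illustration.

```python
def front_cavity_resonance(front_length_m, c_m_per_s=350.0):
    """Approximate lowest resonance (Hz) of the cavity in front of a constriction.

    Models the front cavity as a tube closed at the constriction and open at the
    lips: f = c / (4 * L_front), so halving the length doubles the frequency.
    """
    return c_m_per_s / (4.0 * front_length_m)


for length_cm in (1.0, 2.0, 4.0):
    f = front_cavity_resonance(length_cm / 100.0)
    print(f"{length_cm:.0f} cm front cavity -> about {f / 1000:.1f} kHz")
```

A constriction further forward leaves a shorter front cavity and so a higher cut-off.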

18
Consonants
19
Vowel space
[Figure: vowel space. F1 (0.3 to 0.7 kHz, HIGH to LOW) plotted against F2 (0.4 to 2.0 kHz, BACK to FRONT), showing the vowels of heed, who'd, hid, hood, heard, hoard, head, hod, hut, hard and had]
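Since F1 and F2 are the main cues for vowel discrimination, a chart like this suggests a nearest-neighbour classifier in the (F1, F2) plane. A minimal sketch; the reference points are rough placeholders near the corners of the chart (high front, high back, low), not measured formant values, and a practical system would use measured data, more vowels and some form of speaker normalisation.

```python
import math

# PLACEHOLDER (F1, F2) positions in kHz, chosen only to sit near the corners of
# the vowel chart (high front, high back, low). NOT measured formant data.
REFERENCE_VOWELS = {
    "heed": (0.3, 2.0),
    "who'd": (0.3, 0.8),
    "hard": (0.7, 1.1),
}


def classify_vowel(f1_khz, f2_khz, references=REFERENCE_VOWELS):
    """Return the reference vowel whose (F1, F2) point is nearest (Euclidean, kHz)."""
    return min(references, key=lambda v: math.dist((f1_khz, f2_khz), references[v]))


print(classify_vowel(0.35, 1.9))  # heed
```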
20
Vowels
21
English phonemes
22
Phonemes (ideal)
  • minimal unit of speech discrimination between
    words
  • language (even dialect) dependent
  • defined in terms of their distribution
  • described in terms of manner and place of
    production
  • useful in ASR because they allow access to
    standard on-line dictionaries
  • English has approximately 44 phonemes
  • but, they are ideal, not real objects
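Because a phoneme is the minimal unit that distinguishes words, candidate phonemes can be found through minimal pairs: words whose transcriptions differ in exactly one position. A minimal sketch over a hypothetical toy lexicon (a real system would load a pronunciation dictionary instead); the SAMPA-style symbols follow the slides' /N/, /T/ convention, with /I/ as in pin and /Q/ as in pot.

```python
# Hypothetical toy lexicon with SAMPA-style transcriptions (not a real dictionary).
LEXICON = {
    "pin": ("p", "I", "n"),
    "bin": ("b", "I", "n"),
    "pit": ("p", "I", "t"),
    "pod": ("p", "Q", "d"),
    "pot": ("p", "Q", "t"),
}


def minimal_pairs(lexicon):
    """Word pairs whose pronunciations differ in exactly one phoneme position."""
    words = sorted(lexicon)
    pairs = []
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            pa, pb = lexicon[a], lexicon[b]
            if len(pa) == len(pb) and sum(x != y for x, y in zip(pa, pb)) == 1:
                pairs.append((a, b))
    return pairs


print(minimal_pairs(LEXICON))
# [('bin', 'pin'), ('pin', 'pit'), ('pit', 'pot'), ('pod', 'pot')]
```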

23
Phones (real)
  • Speech is inherently variable.
  • Generic variation
  • rate of speaking, loudness, context
  • Inter-speaker variation
  • physical, age, gender, dialect
  • Intra-speaker variation
  • health, mood, external factors

24
Phonetics
  • Linguistic units have no boundaries
  • Features are asynchronous, e.g.,
  • nasality
  • lip-rounding (the effect can extend beyond the syllable)
  • Cues to identity may be in surrounding sounds
  • plosives and nasals are cued by transitions
  • pod vs. pot, cued by the length of the
    preceding vowel

25
Speech is not acoustic text
26
Contextual effects
  • Co-articulation affects how each sound is
    realized in its context
  • the vowel /{/ (SAMPA for IPA /æ/) realized differently in cab and cat
  • /{/ in can may be nasalized
  • /r/ sound in train different to that in arrow
  • /p/ may be different in each of pin, spin,
    and apt
  • 'can be' may be assimilated to 'cam be'
  • Allophones, variants of a phoneme
  • differences are caused by context, and are not
    contrastive (e.g., dark /l/ in milk vs. clear /l/ in leak)
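Assimilation such as 'can be' to 'cam be' can be written as a context-sensitive rewrite rule on phone sequences. A minimal, hypothetical sketch: it applies the rule deterministically, whereas in real speech it is optional and often partial, and it ignores other cases such as /n/ before velars.

```python
BILABIALS = {"p", "b", "m"}


def assimilate_nasal_place(phones):
    """Apply one place-assimilation rule: /n/ -> /m/ before a bilabial.

    `phones` is a flat list of SAMPA-style symbols with word boundaries removed,
    so the rule also fires across words, as in "can be" -> "cam be".
    """
    out = list(phones)
    for i in range(len(out) - 1):
        if out[i] == "n" and out[i + 1] in BILABIALS:
            out[i] = "m"
    return out


can_be = ["k", "{", "n", "b", "i:"]  # "can be", hypothetical transcription
print(assimilate_nasal_place(can_be))  # ['k', '{', 'm', 'b', 'i:']
```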

27
Fluent speech
  • Reduction - target positions may not be reached
  • vowels tend to be neutralised (centralised)
  • consonants may not be fully articulated
  • Elision - sounds get missed out in normal fluent
    speech
  • fish and chips to fish n chips
  • temporary to tempry
  • bread and butter to brem budder
  • Epenthesis - sounds may be inserted
  • law and order to Laura Norder
  • something to sumpfing
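Comparing a canonical (dictionary) form with what was actually said can be framed as an alignment problem. A minimal sketch using standard Levenshtein (edit-distance) alignment, where deletions correspond to elision and insertions to epenthesis; orthographic strings stand in for phone sequences purely for illustration, and `align_ops` is a hypothetical helper.

```python
def align_ops(canonical, realised):
    """Levenshtein alignment; returns counts of substitutions, deletions, insertions.

    Deletions model elision (canonical units missing from the realisation);
    insertions model epenthesis (extra units in the realisation).
    """
    n, m = len(canonical), len(realised)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if canonical[i - 1] == realised[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    # Backtrace one optimal path and count each operation type.
    counts = {"sub": 0, "del": 0, "ins": 0}
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and canonical[i - 1] != realised[j - 1]
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + mismatch:
            counts["sub"] += int(mismatch)
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            counts["del"] += 1
            i -= 1
        else:
            counts["ins"] += 1
            j -= 1
    return counts


print(align_ops("temporary", "tempry"))  # {'sub': 0, 'del': 3, 'ins': 0}
```

In this framing, reduction would mostly surface as substitutions rather than deletions or insertions.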

28
Beyond the phoneme
  • homophones, or homonyms
  • to, too, two
  • hear, here
  • glasses, for seeing or drinking
  • ambiguity of segmentation
  • grey tape or great ape
  • how to wreck a nice beach (cf. how to recognise speech)
  • intonation changes meaning
  • 'He's gone' or 'He's gone?'
  • emphasis (aka. stress)
  • the cat sat on the mat
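Segmentation ambiguity can be reproduced by enumerating every way of splitting a phone sequence into dictionary words. A minimal sketch using memoised recursion over a hypothetical four-word lexicon with SAMPA-style transcriptions; a recogniser would additionally need a language model to choose between the readings.

```python
from functools import lru_cache

# Hypothetical toy lexicon with SAMPA-style transcriptions.
TOY_WORDS = {
    "grey": ("g", "r", "eI"),
    "great": ("g", "r", "eI", "t"),
    "tape": ("t", "eI", "p"),
    "ape": ("eI", "p"),
}


def segmentations(phones, lexicon=TOY_WORDS):
    """Return every way of splitting a phone sequence into lexicon words."""
    phones = tuple(phones)

    @lru_cache(maxsize=None)
    def from_index(i):
        # All word sequences that exactly cover phones[i:].
        if i == len(phones):
            return ((),)
        results = []
        for word, pron in lexicon.items():
            if phones[i:i + len(pron)] == pron:
                for rest in from_index(i + len(pron)):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(seg) for seg in from_index(0)]


print(segmentations(["g", "r", "eI", "t", "eI", "p"]))
# [['grey', 'tape'], ['great', 'ape']]
```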

29
External factors
  • Noise - Lombard effect: speakers raise their vocal effort in noise
  • Vibration
  • vibrations in the chest, oral and nasal cavities
    interfere with speech production
  • Fatigue
  • speaking rate decreases, slurring occurs due to
    loss of control
  • Fear
  • speaking rate increases, pitch rises due to
    muscle tightening
  • Cognitive loading
  • interaction with other tasks, stress

30
Summary of speech sounds
  • The speech signal
  • varies from person to person, and occasion to
    occasion
  • is not broken up into convenient units
  • is altered by external physical factors
  • Speech sounds
  • are inherently confusable (i.e., articulated in
    largely similar ways)
  • may be inserted or missing altogether
  • change with context, so context is needed to make
    an unambiguous interpretation