Acquiring and implementing phonetic knowledge

Louis C.W. Pols, Institute of Phonetic Sciences (IFA), http://www.fon.hum.uva.nl/, Amsterdam Center for Language and Communication (ACLC) / LOT, Faculty of Humanities ...

Slides: 32

Transcript and Presenter's Notes

Title: Acquiring and implementing phonetic knowledge

1
Acquiring and implementing phonetic knowledge

Louis C.W. Pols
Institute of Phonetic Sciences (IFA)
http://www.fon.hum.uva.nl/
Amsterdam Center for Language and
Communication (ACLC) / LOT
Faculty of Humanities, University of Amsterdam
Herengracht 338, Amsterdam, The Netherlands

Eurospeech 2001 - Scandinavia, Aalborg, Sept. 3,
2001, Keynote
2
why so excited?

speech and speech research are beautiful

doing, supervising, talking, publishing is fun
speech community is wonderful
ISCA, former ESCA, is the best

Paul Dalsgaard c.s., Aalborg, Denmark, and
Eurospeech 2001-Scandinavia are unique

so..
what better could happen to me than getting this
ISCA medal, here and now, in the year that I
became 60! and 75 years of the chair of Phonetics in Amsterdam
3
outline

phonetic knowledge
acquiring and implementing that knowledge
30 years ago, 7th ICA in Budapest, Sept. 1971
nothing compared to G. Fant and K. Stevens (ESCA
medallists) who can easily talk about half a
century of experience in speech research!
speech production and speech perception
supervising some 25 Ph.D. projects
speech acquisition (L1 and L2)
speech technology
speech databases
what might the future bring us?

4
acquiring and implementing phonetic knowledge

from speech production and speech perception
via speech analysis
via experimental procedures
via data mining in speech databases
via literature
formalizing and generalizing knowledge
applying knowledge via rules, statistical
procedures, proper selections, etc.

5
phonetic knowledge is indispensable for

language acquisition (both L1 and L2)
education and training
aids for the handicapped
speech technology (analysis, coding, synthesis,
recognition, dialogs, translation, spotting)

but see Eurospeech Special Event 7, Friday,
9:00-12:30: Integration of Phonetic Knowledge in
Speech Technology; a) Experiments and
Experiences (presentations), b) Is Phonetic
Knowledge any Use? (panel discussion)
6
7th ICA, Budapest

17th ICA now in Rome, Italy (Sept. 2-7)
every 3 years; first one in 1953 in Delft, Neth.
7th ICA in Budapest, Hungary (Sept. 1971), plus
subsequent Speech Symposium in Szeged
my first active participation in a major
(speech) conference
substantial international participation on speech
proper view of state-of-the-art 30 years ago

7
state-of-the-art 30 years ago (1)

speech perception
Kasuya: effect of context on vowel perception
Rao: plosive - vowel interaction
Kozhevnikov: perception of AM vowel-like stimuli
Chistovich: vowel discrimination, plus keynote on
importance of psycho-acoustics for speech
perception
followed by Symposium on Auditory Analysis and
Perception of Speech, Leningrad, Aug. 1973
speech production
Fujimura: dynamic palatography, electromyography,
and Tokyo x-ray microbeam system

8
state-of-the-art 30 years ago (2)

speech processing
Velichko: dynamic programming
Atal: initial ideas about predictive coding
speech synthesis (no rule synthesis, no diphones)
Liljencrants and Fant: OVE III formant synthesizer
Coker: articulatory synthesis
Mermelstein and Atal: vocal tract transfer
functions
Rabiner: digital formant synthesizer ("we
were away a year ago", "may we all learn a yellow
lion roar")
Denes: word concatenation
Itakura: digital filters of ladder form for
synthesis

9
state-of-the-art 30 years ago (3)

speech recognition (only template matching,
simple time normalization, no probabilistic
approach)
isolated word recognition (some 50 words)
Erman: over telephone
Neely: in noise
(carefully spoken by one male speaker: Ken Stevens)
Pols: dimensional representation of BF spectra
Rao: diad matching
Bonner: DAWID-II system
Sakoe: dynamic processing for time normalization
Dreyfus-Graf: artificial language to simplify
recogn.
Flanagan: keynote on focal points in sp. comm.
res.

10
state-of-the-art 30 years ago (4)

musical acoustics
Sundberg: real time pitch extraction in folk
music
Mathews: music synthesis
psycho-acoustics
Houtgast: psychophysical evidence for lateral
inhibition
Evans and Wilson: neurophysiological evidence
Julesz: critical bands in vision and audition
de Boer: reverse-correlation method

11
speech production and perception

three representative events
Speech Recognition As pAttern Classification
(SPRAAC), MPI-workshop July 11-13, 2001
van Son and Pols: Phoneme recognition as a
function of task and context
Moore and Cutler: Constraints on theories of human
vs. machine recognition of speech
MIT Symposium on Invariance and variability of
speech processes, Cambridge, Oct. 1983
Symposium on Auditory analysis and perception of
speech, Leningrad, Aug. 1973

12
supervising some 25 Ph.D. projects

ideas and productivity via these students
Dutch habit: good-looking booklet of each thesis,
plus reports at conf., workshops, and in open
lit.
in 3 main fields of research
early speech acquisition (normal/pathological)
speech production and perception
(normal/pathological)
speech technology
joint responsibility for several projects
daily supervision by Florien Koopmans-van Beinum
with colleague promotores

13
(No Transcript)
14
Univ. of Amsterdam Sept. 26, 2001
15
coarticulatory effects on the schwa
D. van Bergem (1995)

stylized F2-tracks with second order polynomials
F2-track of the schwa via model prediction

t-n
w-l
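
The stylization step named on this slide, replacing a measured F2 track by a second-order polynomial, can be sketched as an ordinary least-squares fit. This is only an illustration and not van Bergem's actual procedure: the function names, the normalized time axis, and the F2 values in the usage note are assumptions.

```python
def fit_quadratic(times, values):
    """Least-squares fit of values ~ a + b*t + c*t**2; returns (a, b, c)."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    # Normal equations: sums of t**(i+j) on the left, sums of t**i * y on the right.
    power_sums = [sum(t ** k for t in times) for k in range(5)]
    rhs = [sum((t ** k) * y for t, y in zip(times, values)) for k in range(3)]
    rows = [[power_sums[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("need at least three distinct time points")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * p for x, p in zip(rows[r], rows[col])]
    return tuple(rows[i][3] / rows[i][i] for i in range(3))


def stylize(times, values):
    """Return the fitted (stylized) track sampled at the original time points."""
    a, b, c = fit_quadratic(times, values)
    return [a + b * t + c * t * t for t in times]
```

For instance, `stylize([0.0, 0.25, 0.5, 0.75, 1.0], [1400, 1480, 1520, 1500, 1450])` returns five smoothed F2 values in Hz; the input numbers are invented for illustration.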
16
Gradual Learning Algorithm (GLA)
P. Boersma (1998)
Boersma and Hayes, Linguistic Inquiry 32(1),
2001, 45-86
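
The slide only names the Gradual Learning Algorithm, so the following is a minimal sketch of its update rule as it is commonly described: constraints carry numeric ranking values, evaluation adds Gaussian noise, and an error demotes constraints that penalize the adult form and promotes those that penalize the learner's form. The function names, the default plasticity and noise values, and the toy constraints `A` and `B` in the usage note are assumptions made for illustration.

```python
import random


def evaluate(ranking, candidates, noise_sd, rng):
    """Pick the winning candidate under one noisy (stochastic) ranking."""
    # Selection point = ranking value plus Gaussian evaluation noise.
    points = {c: value + rng.gauss(0.0, noise_sd) for c, value in ranking.items()}
    order = sorted(points, key=points.get, reverse=True)
    # Compare violation profiles from the highest-ranked constraint down.
    return min(candidates, key=lambda cand: [candidates[cand].get(c, 0) for c in order])


def gla_step(ranking, candidates, adult_form, plasticity=0.1, noise_sd=2.0, rng=random):
    """One learning step; returns True if the ranking values were adjusted."""
    learner_form = evaluate(ranking, candidates, noise_sd, rng)
    if learner_form == adult_form:
        return False
    for c in ranking:
        adult_v = candidates[adult_form].get(c, 0)
        learner_v = candidates[learner_form].get(c, 0)
        if adult_v > learner_v:
            ranking[c] -= plasticity  # penalizes the correct form: demote
        elif learner_v > adult_v:
            ranking[c] += plasticity  # penalizes the learner's error: promote
    return True
```

With `ranking = {"A": 102.0, "B": 98.0}`, `candidates = {"x": {"A": 1}, "y": {"B": 1}}` and repeated calls to `gla_step(ranking, candidates, "x")`, the value of `B` gradually rises above that of `A`, after which errors become increasingly rare.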
17
(No Transcript)
18
speech signal processing package praat

mainly developed and maintained by P. Boersma
meanwhile >4000 registered users in 85 countries
freely available upon request (http://www.fon.hum.uva.nl/praat/)
for all common platforms: Macintosh, Windows,
Linux, SGI, Solaris, HP-UX
user friendly, excellent graphical output,
scriptable
see demo at Educational Arena (Thu. afternoon)
praat: doing phonetics by computer
a.o. used for transcriptions in Spoken Dutch
Corpus

19
phonetic knowledge and early speech acquisition

source filter description system (FvB-JvdSt)
early indicators for dyslexia (C. Schwippert)
early hearing screening with babies
but, early detection requires early intervention
optimizing digital hearing aids
objective adaptation of hearing aids for babies
cochlear implants, also for young babies

20
early speech development
vB, Cl, vdD, Developmental Sc. 4(1), 2001, 61-70
see poster, sess. C26
21
phonetic knowledge and speech technology (1)

speech technology barely existed 30 years ago
ideal test bed for all acquired speech knowledge
speech synthesis
fully natural synthetic speech (including
multilingual and in various speaking styles) =>
text interpretation and speech generation problem
solved
even better if optimized for noisy and
reverberant conditions and for non-natives and
elderly people
speech understanding
full performance => speaker adaptation, robust
word recognition, and speech understanding
problem solved

22
predicting prominence

Ph.D. project Barbertje Streefkerk (oral, sess.
B32)
acoustical and/or textual features to predict
prom.
(for ASR and rule synthesis purposes,
respectively)
prominence judgment by listeners at word level
textual feat.: POS (11 categ.), syll, word pos.,
co-occ.
rule set to predict prom. (level 0-4); for
results see paper
            acoustical features (7)         additional (5)
F0          median, range; syll., word      median sent.
duration    vowel, syllable                 Vnorm., sent. rate
intensity   vowel                           Vnorm., sentence
neural net predictor: 82% best score (prom. 0 /
1)

23
phonetic knowledge and speech technology (2)

speech technological needs for handicapped
artificial voice for laryngectomized speakers
better digital hearing aid for hearing impaired
better cochlear implant for deaf
natural speech output for visually impaired
training aids for speech and language impaired
speech technology in education and training

24
phonetic knowledge in speech databases

speech databases potentially are a wealth of
phonetic knowledge
requires annotation (manual or automatic) at
various levels (from segmental to prosodic and
linguistic)
requires SQL-type access and intelligent data
mining
new ways of defining knowledge, e.g.
duration modeling
pronunciation variants
concatenative synthesis (best match)

25
2 examples

Spoken Dutch Corpus
Dutch-Flemish project, start June 1998, 5 years
10M words, 1000 hrs of speech, many
styles/speakers
for all 10M: orthography, lemmas, POS
for 1M: phonetic and syntactic annotation
for 250k: prosodic annotation
IFA corpus (Dutch), R. van Son (poster, sess.
D36)
few speakers (4 M and 4 F), but >30 min./speaker
various speaking styles per speaker, and
all material phonemically segmented and labeled
free access via SQL query language
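
The SQL-type access mentioned for the IFA corpus can be illustrated with a small in-memory database. The table name `segments`, its columns, and the rows in the usage note are hypothetical and do not reflect the actual IFA corpus schema; the sketch only shows the kind of grouped duration query that such access makes possible.

```python
import sqlite3


def build_demo_db(rows):
    """Create an in-memory table of labeled segments (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE segments ("
        "speaker TEXT, style TEXT, stressed INTEGER, phoneme TEXT, duration_ms REAL)"
    )
    conn.executemany("INSERT INTO segments VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def mean_duration_by_style(conn, phonemes):
    """Return (style, stressed, count, mean duration) for the given phonemes."""
    placeholders = ",".join("?" for _ in phonemes)
    query = (
        "SELECT style, stressed, COUNT(*), AVG(duration_ms) "
        f"FROM segments WHERE phoneme IN ({placeholders}) "
        "GROUP BY style, stressed ORDER BY style, stressed"
    )
    return conn.execute(query, list(phonemes)).fetchall()
```

A call such as `mean_duration_by_style(build_demo_db(rows), ["n", "m"])` would then give one row per speaking style and stress condition for the nasals in `rows`, which is the shape of data needed for duration modeling.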

26
Spoken Dutch Corpus

W. Levelt (chairman Board), J.P. Martens (overall
coordinator), Nijmegen Univ. (Dutch coordination)
so far, mainly project-internal results, e.g.
optimizing transcription protocols, e.g.
orthographic (using praat)
phonetic: "doe ik" du-w-Ik, "is zes" Is_sEs
determining consistency and efficiency (costs)
optimizing automatic procedures for
POS-tagging and lemmatization
syntactic annotation (semi-automatic)
grapheme-to-phoneme conversion
word alignment

27
IFA corpus: consonant duration
Intervocalic nasals, fricatives, stops, and
glides in spontaneous and read connected speech
(2 or more syllable words), accounting for the
effects of speaker (8), style, and phoneme
identity; word freq. < 1/4000 (CELEX); words not
at sentence boundary
                 I      M      F
spont. str.      202    295    20
spont. unstr.    96     810    94
read str.        715    837    75
read unstr.      285    2586   317
28
some conclusions (1)

let speech speak for itself (speech databases)
25 Ph.D. students can do much more than one
(administratively overloaded) senior
despite skepticism, much progress in last 30 yrs
over 10,000 active in spoken language community
~700 papers at E01 > all speech papers in 1971
JASA: speech 2nd (14.4%) in 1999 (N<700); 6th in
1970 (5.1%)
joint phonetic knowledge is insufficient to solve
today's communicative demands

?
29
some conclusions (2)

speech is most natural form of communication,
however, natural HC dialog is far away
synthetic speech is intelligible, but no proper
control over naturalness and speaker/style char.
ASR requires greater robustness and quicker
adaptation
speech and language technology could be used more
in education, language training and aids for the
handicapped
much basic knowledge about sp. perc. still missing

30
some intriguing questions