AN ACOUSTIC PROFILE OF SPEECH EFFICIENCY

1
AN ACOUSTIC PROFILE OF SPEECH EFFICIENCY
  • R.J.J.H. van Son, Barbertje M. Streefkerk, and
    Louis C.W. Pols

Institute of Phonetic Sciences / ACLC, University of Amsterdam
Herengracht 338, 1016 CG Amsterdam, The Netherlands
tel: 31 20 5252183; fax: 31 20 5252197; email: Rob.van.Son_at_hum.uva.nl
ICSLP2000, Beijing, China, Oct. 20, 2000
2
INTRODUCTION
  • Speech is "efficient": important components are
    emphasized
  • Less important ones are de-emphasized
  • Two mechanisms:
  • 1) Prosody: Lexical Stress and Sentence Accent
    (Prominence)
  • 2) Predictability: Frequency of Occurrence
    (tested) and Context (not tested)

3
MECHANISMS FOR EFFICIENT SPEECH
  • Speech emphasis should mirror importance,
    which largely corresponds to unpredictability
  • Prosodic structure distributes emphasis according
    to importance (lexical stress, sentence accent /
    prominence)
  • Speakers can (de-)emphasize according to supposed
    (un)importance
  • Speech production mechanisms can facilitate
    redundant speech or hamper unpredictable speech

4
QUESTIONS
  • Can the distribution of emphasis or reduction be
    completely explained from Prosody (Lexical
    stress and Sentence Accent / Prominence)?
  • If not, can we identify a speech production
    mechanism that would assist efficiency in speech,
    e.g. preprogrammed articulation of redundant and
    / or high-frequency syllable-like segments?

5
SPEECH MATERIAL (DUTCH)
  • Single Male Speaker: Vowels and Consonants.
    Matched Informal and Read speech, 791 matched VCV
    pairs
  • Polyphone: Vowels only. 273 speakers (out of
    5000), telephone speech, 1244 read sentences.
    Segmented with a modified HMM recognizer (Xue
    Wang)
  • Corpora sizes: Number of realizations of vowels
    and consonants

Corpus                      Unstressed       Stressed        Total
                            Accent: ?    ?   Accent: ?    ?
Single Speaker consonants     550   180       569   283       1582
Single Speaker vowels         812   461       528   224       2025
Polyphone vowels             4435  4942      9603  3516      22496
  • Accent: Sentence accent / Prominence
  • Stressed/Unstressed: Lexical stress
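
As a quick consistency check on the corpus sizes above, the four stress/accent cells of each row should add up to the row total. The sketch below only copies the numbers from the slide; the dictionary layout and the function name `totals_match` are illustrative choices, not part of the original material.

```python
# Counts copied from the corpus-size table: the four stress/accent cells
# of each row, followed by the reported row total.
corpus_sizes = {
    "Single Speaker consonants": ([550, 180, 569, 283], 1582),
    "Single Speaker vowels": ([812, 461, 528, 224], 2025),
    "Polyphone vowels": ([4435, 4942, 9603, 3516], 22496),
}


def totals_match(sizes):
    """Return, per corpus, whether the cell counts sum to the reported total."""
    return {name: sum(cells) == total for name, (cells, total) in sizes.items()}


if __name__ == "__main__":
    for name, ok in totals_match(corpus_sizes).items():
        print(name, "consistent" if ok else "MISMATCH")
```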

6
METHODS: SPEECH PREPARATION
  • Single speaker corpus: all 2 x 791 VCV segments
    hand-labeled; sentence accent also determined by
    hand; 22 native listeners identified consonants
    from this corpus
  • Polyphone corpus: automatically labeled using a
    pronunciation lexicon and a modified HMM
    recognizer; 10 judges marked prominent words
    (prominence 1-10)
  • Word and Syllable -log2(Frequencies) for both
    corpora were determined from Dutch CELEX
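
The -log2(Frequency) measure mentioned above can be sketched as follows. This is a minimal illustration, assuming that "frequency" means relative frequency within a count list; the toy counts are made up and are not CELEX data, and `information_content` is a hypothetical helper name.

```python
import math


def information_content(counts):
    """Map each item to -log2(relative frequency), i.e. its cost in bits."""
    total = sum(counts.values())
    return {item: -math.log2(n / total) for item, n in counts.items()}


# Hypothetical toy counts for illustration only (not taken from CELEX).
toy_counts = {"de": 8, "van": 4, "huis": 2, "fiets": 2}
bits = information_content(toy_counts)

if __name__ == "__main__":
    for word, value in bits.items():
        print(f"{word}: {value:.2f} bits")
```

Frequent items get low values and rare items get high values, so a positive correlation with an acoustic measure means that rarer items are realized with more of that measure.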

7
METHODS: ANALYSIS
Single Speaker Corpus: Consonants and Vowels
  • Duration: in ms (vowels and consonants)
  • Contrast (vowels only): F1 / F2 distance to (300,
    1450) Hz in semitones
  • Spectral Center of Gravity (CoG) (V and C):
    weighted mean frequency in semitones at point
    of maximum energy
  • Log2(Perplexity) from consonant identification:
    calculated from confusion matrices
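
The two spectral measures above can be sketched as below. The slides do not give the exact formulas, so several things here are assumptions: the contrast is taken as the Euclidean distance in a semitone-scaled F1/F2 plane, the CoG weights are taken to be spectral power, the averaging is done on the semitone scale, and the reference frequency `ref_hz` is arbitrary. All function names are illustrative.

```python
import math


def semitones(freq_hz, ref_hz):
    """Interval between freq_hz and ref_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(freq_hz / ref_hz)


def vowel_contrast(f1_hz, f2_hz, center=(300.0, 1450.0)):
    """Assumed metric: Euclidean F1/F2 distance to the center point, in semitones."""
    d1 = semitones(f1_hz, center[0])
    d2 = semitones(f2_hz, center[1])
    return math.hypot(d1, d2)


def center_of_gravity(freqs_hz, powers, ref_hz=1.0):
    """Assumed weighting: power-weighted mean of frequencies on a semitone scale."""
    total = sum(powers)
    weighted = sum(p * semitones(f, ref_hz) for f, p in zip(freqs_hz, powers))
    return weighted / total


if __name__ == "__main__":
    # A vowel with F1 one octave above the center and F2 at the center.
    print(vowel_contrast(600.0, 1450.0))
    # Two equally strong spectral components, two octaves apart.
    print(center_of_gravity([100.0, 400.0], [1.0, 1.0], ref_hz=100.0))
```

Under these assumptions a fully reduced vowel sits at the center point and has contrast 0, and the contrast grows as the formants move away from (300, 1450) Hz.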

8
METHODS: ANALYSIS
Polyphone Corpus: Vowels only
  • Loudness: in sone
  • Spectral Center of Gravity (CoG): weighted mean
    frequency in semitones, averaged over the segment
  • Prominence (1-10): the number of 'PROMINENT'
    listener judgements; 0-5 is considered
    Unaccented, 6-10 is considered Accented
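
The prominence score and the accent split described above amount to counting judgements and applying a threshold. The sketch below assumes each judge gives a yes/no mark per word; the function names and the example marks are hypothetical.

```python
def prominence_score(judgements):
    """Count how many judges marked the word as prominent."""
    return sum(1 for marked in judgements if marked)


def is_accented(score, threshold=6):
    """Scores of 6-10 count as Accented, 0-5 as Unaccented (as on the slide)."""
    return score >= threshold


if __name__ == "__main__":
    # Hypothetical marks from 10 judges for a single word.
    marks = [True, True, False, True, True, True, False, True, False, True]
    score = prominence_score(marks)
    print(score, "Accented" if is_accented(score) else "Unaccented")
```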

9
CONSISTENCY OF MEASUREMENTS: Correlation
coefficients between factors

[Figure: correlation coefficients between factors for the Single Speaker and Polyphone corpora. Filled symbols: p < 0.01]
  • Duration in ms; Loudness in sones
  • CoG: Spectral Center of Gravity (semitones)
  • Px: log2(Perplexity), plotted is R
  • Contrast: F1 / F2 distance to (300, 1450) Hz
    (semitones)

10
CONSONANT REDUCTION VERSUS FREQUENCY OF
OCCURRENCE (correlation coefficients)
Single speaker corpus (n = 1582)
[Figure: correlation coefficients. Filled symbols: p < 0.01]
  • CoG: Spectral Center of Gravity (semitones)
  • Perplexity: log2(Perplexity), plotted is R
  • Syllable and word frequencies were correlated
    (R = 0.230, p0.01)
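
For readers unfamiliar with the log2(Perplexity) measure in the legend above, here is one way it could be derived from a row of a confusion matrix. This is a sketch under the assumption that perplexity is 2 raised to the entropy of the listeners' response distribution, so that log2(Perplexity) equals that entropy in bits; the slides do not spell out the exact procedure, and the response counts are invented.

```python
import math


def log2_perplexity(response_counts):
    """Entropy (in bits) of the response distribution for one stimulus.

    Perplexity is 2 ** entropy, so this value is log2(perplexity).
    """
    total = sum(response_counts)
    entropy = 0.0
    for n in response_counts:
        if n > 0:  # zero counts contribute nothing to the entropy
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical confusion-matrix rows: 22 listener responses per stimulus.
confusions = {
    "k": [0, 0, 22],   # always identified as the same consonant
    "t": [11, 11, 0],  # split evenly between two responses
}

if __name__ == "__main__":
    for stimulus, row in confusions.items():
        print(stimulus, log2_perplexity(row))
```

A consonant that every listener identifies in the same way scores 0 bits; the more the responses scatter, the higher the value.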

11
VOWEL REDUCTION VERSUS FREQUENCY OF
OCCURRENCE (correlation coefficients)
Single speaker corpus (n = 2025)
[Figure: correlation coefficients. Filled symbols: p < 0.01]
  • Duration: in ms
  • Contrast: F1 / F2 distance to (300, 1450) Hz
    (semitones)
  • CoG: Spectral Center of Gravity (semitones)
  • Syllable and word frequencies were correlated
    (R = 0.280, p < 0.01)

12
DISCUSSION OF SINGLE SPEAKER DATA
  • There are consistent correlations between
    frequency of occurrence and acoustic reduction
    (duration, CoG and contrast), but not for
    consonant identification (perplexity)
  • Correlations for syllable frequencies tend to be
    larger than those for word frequencies (p?0.01)
  • Correlations were found after accounting for
    Phoneme identity, Lexical Stress and Sentence
    Accent
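
The slides do not say how phoneme identity, stress and accent were "accounted for". One common approach, shown here purely as an assumed illustration, is to subtract the mean of each (phoneme, stress, accent) cell from both variables and correlate what remains. The toy data and function names are hypothetical.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def center_within_groups(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def correlation_within_groups(xs, ys, groups):
    """Correlation that remains after removing per-group mean differences."""
    return pearson(center_within_groups(xs, groups),
                   center_within_groups(ys, groups))


# Hypothetical toy data: -log2(frequency) and duration for two phoneme/stress cells.
info = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
duration = [5.0, 4.0, 3.0, 25.0, 24.0, 23.0]
cell = ["a", "a", "a", "b", "b", "b"]

if __name__ == "__main__":
    print("raw:", pearson(info, duration))
    print("within cells:", correlation_within_groups(info, duration, cell))
```

In the toy data the raw correlation is positive only because the two cells differ in their means; within each cell the relation is negative, which is why the grouping factors have to be removed before interpreting a frequency effect.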

13
PROMINENCE VERSUS VOWEL REDUCTION AND FREQUENCY
OF OCCURRENCE (correlation coefficients)
Polyphone corpus (n = 22496)
[Figure: correlation coefficients for Loudness, CoG, Syllable freq. and Word freq. Filled symbols: p < 0.01]
  • Loudness (sone)
  • CoG: Spectral Center of Gravity (semitones)
  • Syllable and word frequencies (-log2(freq))

14
VOWEL REDUCTION VERSUS FREQUENCY OF
OCCURRENCE (correlation coefficients)
Polyphone corpus (n = 22496)
[Figure: correlation coefficients, split by Accent (Prom > 5 versus Prom < 5). Filled symbols: p < 0.01]
  • Loudness (sone)
  • CoG: Spectral Center of Gravity (semitones)
  • Syllable and word frequencies were correlated
    (R = 0.316, p < 0.01)
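
To see why even a modest correlation is significant in a corpus of this size, the standard t statistic for a Pearson correlation can be computed as below. The slides do not state which test was used, and it is an assumption here that the correlation was computed over all n = 22496 vowel realizations; the p-value uses a normal approximation, which is reasonable only for large n.

```python
import math


def correlation_t(r, n):
    """t statistic for testing a Pearson correlation r against zero (n - 2 df)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def approx_two_sided_p(t):
    """Two-sided p-value from the normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


if __name__ == "__main__":
    # Values from the slide; the sample size is assumed to apply to this correlation.
    t = correlation_t(0.316, 22496)
    print(f"t = {t:.1f}, p < 0.01: {approx_two_sided_p(t) < 0.01}")
```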

15
DISCUSSION OF POLYPHONE DATA
  • Perceived prominence correlates with acoustic
    vowel reduction (loudness, CoG) and frequency of
    occurrence (syllable and word)
  • There are small but consistent correlations
    between acoustic vowel reduction and frequency
    of occurrence
  • Correlations were found after accounting for
    Vowel identity, Lexical Stress and Prominence

16
CONCLUSIONS
  • LEXICAL STRESS and SENTENCE ACCENT / PROMINENCE
    cannot explain all of the efficiency of speech
  • FREQUENCY OF OCCURRENCE and possibly CONTEXT in
    general are needed for a full account
  • A SYLLABARY which speeds up (and reduces) the
    articulation of stored, high-frequency syllables
    with respect to computed, rare syllables might
    explain at least part of our data

17
SPOKEN LANGUAGE CORPUS: How Efficient is Speech
  • 8-10 speakers, 60 minutes of speech each
    (fixed and variable materials)
  • Informal story telling and retold stories: 15
    min
  • Reading continuous texts: 15 min
  • Reading isolated (pseudo-)sentences: 20 min
  • Word lists: 5 min
  • Syllable lists: 5 min

18
MEASURING SPEECH EFFICIENCY
  • Speaking Style differences (Informal, Retold,
    Read, Sentences, Lists)
  • Predictability: Frequency of Occurrence (words
    and syllables); In Context (language models);
    Cloze-tests; Shadowing (RT or delay)
  • Acoustic Reduction: Segment identification;
    Duration; Spectral reduction
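
The slide does not specify which language models are meant for predictability in context. As one possible illustration, a bigram model with add-one smoothing gives -log2 P(word | previous word); everything in this sketch, including the smoothing choice, the `<s>` start marker, the toy sentences and the function name, is an assumption rather than the authors' setup.

```python
import math
from collections import Counter


def bigram_surprisal(sentences):
    """Build a function returning -log2 P(word | previous word), add-one smoothed."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for words in sentences:
        padded = ["<s>"] + words  # "<s>" marks the sentence start
        vocab.update(padded)
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    vocab_size = len(vocab)

    def surprisal(prev, word):
        p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
        return -math.log2(p)

    return surprisal


# Hypothetical toy corpus of tokenized Dutch sentences.
toy_sentences = [
    ["de", "kat", "slaapt"],
    ["de", "hond", "slaapt"],
    ["de", "kat", "eet"],
]
surprisal = bigram_surprisal(toy_sentences)

if __name__ == "__main__":
    print("kat after de:", round(surprisal("de", "kat"), 2))
    print("hond after de:", round(surprisal("de", "hond"), 2))
```

Words that are more predictable from their left context receive fewer bits, which is the contextual counterpart of the frequency-of-occurrence measure used in the analyses.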