Informative Dialect Identification - PowerPoint PPT Presentation

About This Presentation
Title:

Informative Dialect Identification

Description:

Dialects, Accents, and Languages. 3. MIT Lincoln Laboratory. Language Recognizer or L1 detector? ... Accent classification (Angkititrakul, Hansen 2006) Language ... – PowerPoint PPT presentation

Slides: 66

Transcript and Presenter's Notes

1
Informative Dialect Identification
  • Nancy Chen
  • Oct. 31, 2008

2
Dialects, Accents, and Languages
3
Language Recognizer or L1 detector?
Language Recognizer
Indian English
Hindi
4
Automatic Speech Recognizers
I only understand English. You are speaking a
foreign language.
Indian English
5
Traditional Automatic Recognizers
[Diagram: speech in, numeric scores out (18, 53, ...)]
  • Big black box
  • Input features not intuitive
  • Not F0, F1, F2
  • Thousands of Gaussians, each with 40 dimensions
  • Efficiently process lots of data
  • Hard to interpret models and results
  • Training data 100 hrs

Traditional automatic recognizers
6
Linguistic Studies
Spread the peanut butter
  • Few speakers
  • 20-30 at most
  • Perceptual analysis takes much time and effort
  • Phonological rules

Linguistic studies
7
American English Speaker
Spread the peanut butter
  • Voiceless stop consonants are unaspirated when
    preceded by fricatives
  • p in spread sounds more like b
  • Intervocalic /t/ flapped when followed by
    unstressed syllable
  • t in butter does not produce intra-oral pressure

Linguistic studies
8
Indian English Speaker
I can't spread the peanut butter with Harr
  • Voiceless stop consonants are always unaspirated
  • /p/, /t/, /k/ sound like /b/, /d/, /g/
  • Interdental fricatives become stop-like
  • "the" sounds like "de"
  • Alveolar consonants /t/, /d/, /n/ are retroflex
  • /w/ → /v/
  • British English influence
  • Rhoticity gone in vowel + /r/
  • /ae/ → /a/, e.g., bath, can't

Linguistic studies
9
Goal
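[Diagram: for the utterance "Spread the peanut butter", traditional automatic recognizers turn speech into numeric scores (18, 53, ...), while linguistic studies analyze it by hand. Informative dialect identification turns speech into a dialect label (American English) together with the rule behind it ("ter" → dx+er).]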
10
Potential Applications
  • Forensic phonetics
  • Speaker recognition and characterization
  • Automated speech recognition and synthesis
  • Accent training education
  • Articulatory and phonological disorder diagnosis

11
Challenges
  • Automatic phone recognition limitations
  • State-of-the-art phone recognition accuracy
    only 50-60%
  • Commercial speech recognizers rely heavily on
    grammar and social context
  • Inadequately capture dialect differences
  • e.g. retroflex t recognized as typical t,
    r, ax, ...
  • Sub-dialects within Indian English

12
Related Research
  • Automatic speech recognition for non-native
    speech (Fung 2005, Livescu 2000)
  • Accent classification (Angkititrakul, Hansen
    2006)
  • Language identification (Li, Ma, Lee 2007)

13
Techniques
  • Acoustic modeling (e.g., Torres-Carrasquillo et
    al. 2004)
  • Gaussian mixture models, hidden Markov models
  • N-grams of phonetic units (e.g., Zissman 1995)
  • Models the grammar of phones
  • PRLM (Phone Recognition followed by Language
    Modeling)
  • Our approach: acoustic modeling of
    dialect-discriminating phonetic contexts

14
Terminology and Notation
  • Monophone
  • e.g. t, a
  • Biphone: a monophone in the context of other
    phones
  • Phonetic notation
  • e.g. k-r is an r preceded by k
  • e.g. t+a is t followed by a
  • Mathematical notation: biphone variable α+β is
    phone α followed by phone β; α, β ∈ monophone
    set
  • Only consider two dialects: d ∈ {d1, d2}
  • d1: American English
  • d2: Indian English
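
A minimal sketch of the biphone notation, assuming the convention in which t+a marks t followed by a and k-r marks r preceded by k. The function names are illustrative and do not come from the presentation.

```python
def right_context_biphones(phones):
    """Label each phone with the phone that follows it, e.g. 't+a'."""
    return [f"{cur}+{nxt}" for cur, nxt in zip(phones, phones[1:])]


def left_context_biphones(phones):
    """Label each phone with the phone that precedes it, e.g. 'k-r'."""
    return [f"{prev}-{cur}" for prev, cur in zip(phones, phones[1:])]


# Decoded phone labels for "butter", as used later in the slides.
decoded = ["b", "ah", "dx", "er"]
print(right_context_biphones(decoded))
print(left_context_biphones(decoded))
```

The last phone has no right context and the first has no left context, so each list is one element shorter than the input.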

15
Finding Dialect-Specific Phonological Rules
  • Supervised Learning
  • If phone transcriptions are available
  • Unsupervised Learning
  • If no phone transcriptions are available

16
Supervised Classification
  1. Extract phonological rules
  2. Adapt biphone models
  3. Dialect recognition task via likelihood ratio test

17
Supervised Rule Extraction Example 1
Indian English (word: wine)
  Decoded phone   v   ai   n
  Ground truth    w   ai   n
American English (word: vine)
  Decoded phone   v   ai   n
  Ground truth    v   ai   n
  • Recognition accuracy of the recognizer-hypothesized
    v is 0% for Indian English, but 100% for
    American English.
  • Recognition accuracy of v differs across
    dialects

18
Supervised Rule Extraction Example 2
Indian English (word: pat)
  Decoded phone   b   ae   t
  Ground truth    p   ae   t
American English (word: bat)
  Decoded phone   b   ae   t
  Ground truth    b   ae   t
  • Recognition accuracy of the recognizer-hypothesized
    b is 0% for Indian English, but 100% for
    American English.
  • Recognition accuracy of b differs across
    dialects

19
Supervised Rule Extraction Example 3
Indian English (word: beats)
  Decoded phone   dx (flap)   er
  Ground truth    t           s
American English (word: butter)
  Decoded phone   dx (flap)   er
  Ground truth    dx (flap)   er
  • Recognition accuracy of the recognizer-hypothesized
    dx+er is 0% for Indian English, but 100%
    for American English.
  • Recognition accuracy of dx+er differs across
    dialects

20
Rule Extraction Criteria
  • Biphone α+β is dialect-discriminating for dialects
    d1 and d2 if
  • The recognition accuracy of biphone α+β in dialect
    d1 is different from that in dialect d2
  • The occurrence frequency of biphone α+β is
    sufficient

[Equations not preserved in this transcript]
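
The two criteria can be sketched as follows. This is an illustration under stated assumptions, not the presenter's implementation: the thresholds `min_gap` and `min_count`, the function names, and the toy alignments are all hypothetical, since the slide's equations and threshold values are not available.

```python
from collections import defaultdict


def biphone_accuracy(aligned):
    """aligned: (decoded_biphone, ground_truth_biphone) pairs for one dialect.

    Returns {decoded_biphone: (accuracy, count)}, i.e. how often each
    recognizer-hypothesized biphone matches the ground truth.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for decoded, truth in aligned:
        totals[decoded] += 1
        hits[decoded] += decoded == truth
    return {b: (hits[b] / totals[b], totals[b]) for b in totals}


def discriminating_biphones(aligned_d1, aligned_d2, min_gap=0.3, min_count=2):
    """Keep biphones that are frequent enough in both dialects and whose
    recognition accuracy differs by at least min_gap (assumed thresholds)."""
    acc1, acc2 = biphone_accuracy(aligned_d1), biphone_accuracy(aligned_d2)
    rules = {}
    for b in acc1.keys() & acc2.keys():
        (a1, n1), (a2, n2) = acc1[b], acc2[b]
        if min(n1, n2) >= min_count and abs(a1 - a2) >= min_gap:
            rules[b] = (a1, a2)
    return rules


# Toy alignments (hypothetical): dx+er is always correct in American English
# and never correct in Indian English; b+ah is correct in both.
american = [("dx+er", "dx+er")] * 3 + [("b+ah", "b+ah")] * 3
indian = [("dx+er", "t+er")] * 3 + [("b+ah", "b+ah")] * 3
print(discriminating_biphones(american, indian))
```

On the toy data only dx+er is selected, because b+ah is recognized equally well in both dialects.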
21
Adapt Biphone Models
[Diagram: dialect-neutral monophone model → adapt → American-English-specific monophone model]
22
Adapt Biphone Models
[Diagram: dialect-neutral monophone model → adapt → American-English-specific monophone model → adapt → American-English-specific biphone model]
23
Dialect Recognition: likelihood scores
[Diagram: the test utterance is scored against the American-English biphone models and the Indian-English biphone models, giving one log likelihood from each.]
24
Dialect Recognition: likelihood ratio test
[Diagram: the two log likelihoods are combined in a log likelihood ratio test.]
25
Dialect Recognition: decision making
[Diagram: the log likelihood ratio is compared with a threshold, determined through detection error analysis, to give the dialect decision.]
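
A minimal sketch of the decision step, assuming the two log likelihoods have already been computed by the dialect-specific biphone models. The function names, the default threshold of 0, and the example scores are illustrative assumptions. In the presentation the threshold comes from detection error analysis.

```python
def log_likelihood_ratio(ll_american, ll_indian):
    """Difference of the two log likelihoods for one test utterance."""
    return ll_american - ll_indian


def decide_dialect(ll_american, ll_indian, threshold=0.0):
    """Return the dialect whose models explain the utterance better,
    relative to an assumed decision threshold."""
    llr = log_likelihood_ratio(ll_american, ll_indian)
    return "American English" if llr > threshold else "Indian English"


# Hypothetical utterance-level log likelihoods.
print(decide_dialect(-4200.0, -4350.0))
print(decide_dialect(-4200.0, -4350.0, threshold=200.0))
```

Raising the threshold makes the system more reluctant to declare American English, which is the trade-off a detection error trade-off curve traces out.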
26
Unsupervised Classification
  • Unsupervised rule extraction
  • Adapt all biphone models
  • Prune out non-dialect-specific biphone models
  • Dialect recognition via likelihood ratio test

27-32
Retaining Biphone Models Example
American English speech, decoded with the dialect-neutral monophone model

Decoded phone labels      b    ah   dx   er
American biphone model    12   16   20   18
Indian biphone model      11   17  -10   15
Log likelihood ratio       1   -1   30    3

The larger the log likelihood ratio of biphone
dx+er, the more specific dx+er is to
American English
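
The log likelihood ratios in this example are the per-label differences between the American and Indian biphone model scores. The short sketch below reproduces that arithmetic from the slide's numbers. The variable names are illustrative.

```python
labels = ["b", "ah", "dx", "er"]
ll_american = [12, 16, 20, 18]   # scores under the American biphone models
ll_indian = [11, 17, -10, 15]    # scores under the Indian biphone models

# Log likelihood ratio per decoded label: American score minus Indian score.
llr = [a - i for a, i in zip(ll_american, ll_indian)]
for label, value in zip(labels, llr):
    print(f"{label:>2}: {value:+d}")

# The label with the largest ratio is the most specific to American English.
most_specific = labels[max(range(len(llr)), key=llr.__getitem__)]
print("most American-specific:", most_specific)
```

The dx segment stands out because the Indian model scores it far lower (-10) than the American model does (20).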
33-38
Quantifying Dialect Discriminability
[Diagram: American English speech and Indian English speech are each scored against the American biphone models and the Indian biphone models. The two log likelihoods give a log likelihood ratio, which determines the biphone models to keep. The keep criterion and its equations are not preserved in this transcript.]
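
The exact keep criterion is not available here, so the following is only a stand-in to show the shape of the pruning step. It assumes each biphone has a mean log likelihood ratio measured on American English data and another measured on Indian English data, scores a biphone by the gap between the two, and prunes a fixed fraction of the lowest-scoring models. The scoring rule, the function name, and the toy values are all assumptions.

```python
def prune_biphones(llr_on_american, llr_on_indian, prune_fraction=0.25):
    """Rank biphones by an assumed discriminability score and drop the
    lowest-scoring fraction.

    Each argument maps biphone -> mean log likelihood ratio (American models
    versus Indian models) measured on that dialect's development data.
    """
    shared = llr_on_american.keys() & llr_on_indian.keys()
    # Assumed score: a discriminating biphone has a high ratio on American
    # speech and a low ratio on Indian speech.
    score = {b: llr_on_american[b] - llr_on_indian[b] for b in shared}
    ranked = sorted(score, key=score.get, reverse=True)
    n_keep = len(ranked) - int(len(ranked) * prune_fraction)
    return ranked[:n_keep]


# Hypothetical development-set statistics.
dev_american = {"dx+er": 30.0, "er+sil": 3.0, "b+ah": 1.0, "ah+dx": -1.0}
dev_indian = {"dx+er": -25.0, "er+sil": -2.0, "b+ah": 0.5, "ah+dx": -0.5}
print(prune_biphones(dev_american, dev_indian))
```

With four biphones and a prune fraction of 0.25, one model is dropped and the three with the largest gaps are retained.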
39
Experimental Setup
  • Training set
  • 104 hrs of dialect-marked data without
    transcriptions
  • Test set
  • 1298 American English trials
  • 200 Indian English trials
  • Each trial is 30 seconds
  • Dialect-neutral monophone HMM models
  • trained on 23 hrs of transcribed data
  • 47 English monophones

40
Pilot Study: Dialect-Specific r Biphones
  • Recognizer-decoded r instances were manually
    labeled in both dialects

Dialect    Labeled instances   Accuracy of decoded r
Indian     200                 50%
American   500                 80%
41
Detection Error Trade-off Curve
System      EER (%)
Monophone   10.5
r-biphone   10.4
EER: Equal Error Rate
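
A minimal sketch of how an equal error rate can be estimated from trial scores, assuming one score per trial (for example, a log likelihood ratio) and a simple threshold sweep. The function name and the toy scores are illustrative. Evaluation toolkits typically interpolate between operating points, which this sketch does not do.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Sweep candidate thresholds and return the error rate at the point
    where the miss rate and the false-alarm rate are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # Miss: a target trial scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (miss + fa) / 2
    return best_eer


# Hypothetical scores: higher means "more like the target dialect".
target = [2.0, 1.5, 0.4, 3.0]
nontarget = [-1.0, 0.5, -0.3, -2.0]
print(f"EER = {100 * equal_error_rate(target, nontarget):.1f}%")
```

In the toy data one target trial scores below one non-target trial, so the miss and false-alarm rates meet at one error in four on each side.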
42
Discussion
  • r-biphones perform at least as well as
    monophones
  • r-biphones perform better when false alarms
    are penalized more
  • r-biphones not necessarily interpretable
  • Phone recognition errors
  • Rules only learned from minimal transcriptions
    (1 min of speech)
  • Sub-dialect issues with Indian English. Rules
    derived from speakers with Hindi as first
    language, but distribution of first language of
    speakers in test data is unknown.
  • → Study more data with unsupervised algorithm

43
Unsupervised Learning Experiment
  • A developmental set (instead of test set) was
    used to determine the biphone models to retain
  • The proposed filtered-biphone system uses 25%
    fewer biphone models, while EER performance is
    still comparable to the baseline
    unfiltered-biphone system

System                EER (%)
Monophone             10.5
r-biphone             10.4
Unfiltered-biphones    9.4
Filtered-biphones      9.7
44
Equal Error Rate (EER) Results
System                         EER (%)
Biphone models
  Monophone                    10.5
  r-biphone                    10.4
  Unfiltered-biphones           9.4
  Filtered-biphones             9.7
Fusion experiments
  PRLM                          9.7
  PRLM + unfiltered biphones    6.6
  PRLM + filtered biphones      6.9
  • Biphone systems are all superior to baseline
    monophone system
  • Filtered-biphone system is comparable with
    unfiltered-biphone system, with or without
    fusion with PRLM
  • 29.3% relative gain obtained when proposed
    unfiltered-biphone system is fused with PRLM.

45
Detection Error Trade-off
46
Discussion of Learned Rules
  • Dialect-discriminating biphones
  • Flap biphones dx+r, dx+axr, dx+er
  • Biphones ae+s, ae+th occurring in class,
    bath
  • Biphones learned in supervised method, e.g. r+s
  • Non-dialect-discriminating biphones
  • No-speech sounds (e.g., filled pauses, coughing)
  • /zh/ biphones

47
What if more biphones are pruned?
[Figure: EER of test set (%) versus amount of pruned biphone models determined by developmental set (%)]
48
Contributions
  • We present systematic approaches to discovering
    dialect-discriminating biphones, with and without
    using phone transcriptions
  • The proposed filtered-biphone system achieves
    comparable performance to a baseline
    unfiltered-biphone system despite using 25% fewer
    biphone models
  • Our approach complements other systems. When the
    filtered-biphone system is fused with a PRLM
    system, we obtain a 29% relative gain
  • This is a first step towards a linguistically
    informative dialect recognition system

49
Future Work
  • Investigate corpora with transcriptions to
    enhance interpretability of phonological rules
  • Model dialect-specific biphones in other dialects
    to ensure approach is language/dialect
    independent
  • Incorporate more sophisticated techniques to
    enhance recognition performance
  • Potential clinical applications: diagnosing
    articulatory and phonological disorders

50
Additional Slides
51
Rule Extraction Criteria
  • Biphone m+n is dialect-discriminating of dialect
    d1 if
  • The occurrence frequency of biphone m+n is
    sufficient
  • The recognition accuracy of biphone m+n in dialect
    d1 is different from that in dialect d2

[Equations not preserved in this transcript; their annotations were: phone recognizer, ground truth, threshold]
52
Unsupervised Rule Extraction
  • Adapt all biphones first, and then prune out less
    dialect-specific biphones.
  • The log likelihood ratio of biphone m+n in dialect
    d is [equation not preserved in this transcript]
  • Biphone m+n is dialect-discriminating if
    [equation not preserved in this transcript]
53
Unsupervised Rule Extraction
  • The log likelihood ratio of biphone m+n in dialect
    d1 is [equation not preserved in this transcript]
  • Annotations on the equation: the duration of
    y_{d1, m+n}, and the acoustic observation of biphone
    m+n in dialect d1
54
Comparison with PRLM
  • The difference in biphone recognition accuracy
    across dialects is caused by an acoustic
    difference that is dialect-specific
  • Directly used in supervised method
  • Implicitly used in unsupervised method
  • No method to date has used this dialect-specific
    info
  • PRLM models the difference between biphone
    occurrence frequency across dialects
  • Our approach and PRLM complement each other

55
Old Slides
56
Not All Phonetic Contexts are Created Equal
Indian English
  Decoded phone   dx (flap)     er
  Ground truth    retroflex t   er
57
Not All Phonetic Contexts are Created Equal
Indian English
  Decoded phone   v   ai   n
  Ground truth    w   ai   n
American English
  Decoded phone   w   ai   n
  Ground truth    w   ai   n
58-60
Not All Phonetic Contexts are Created Equal
Dialect            Decoded phone sequence   dx accuracy
Indian English     dx er                    0%
American English   dx er                    90%
  • Segments decoded as dx+er are not acoustic
    realizations of dx in Indian English, but
    most likely are dx in American English
  • The difference in recognition accuracy of biphone
    dx+er across dialects is caused by an acoustic
    difference that is dialect-specific

61
Not All Phonetic Contexts are Created Equal
American English
  Decoded phone   dx (flap)   er
  Ground truth    dx (flap)   er
62
Flap consonants in American English
  • Flap is like a stop consonant except
  • No intra-oral pressure buildup
  • No release burst
  • Intervocalic /t/ or /d/ → flap consonant
  • before unstressed vowels and syllabic /l/
  • e.g., butter, party, bottle
  • at the end of a word before a vowel
  • e.g., what else, whatever
  • Do these pairs sound different?
  • Ladder/latter, metal/medal, coating/coding,
    bitter/bidder, better/bedder
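
The first flapping context can be sketched as a rewrite rule over phone sequences. This assumes lowercase ARPAbet-style labels with a stress digit on each vowel (0 marks an unstressed vowel). The vowel set and function names are illustrative. The sketch covers only /t/ or /d/ between two vowels within a word, so the syllabic /l/ case (bottle), the word-final case (what else) and the r-preceded case (party) are left out.

```python
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "axr", "ay", "eh",
          "er", "ey", "ih", "iy", "ow", "oy", "uh", "uw"}


def is_vowel(phone):
    """True if the label, with any stress digit removed, is a vowel."""
    return phone.rstrip("012") in VOWELS


def is_unstressed(phone):
    """True if the label carries the unstressed marker 0."""
    return phone.endswith("0")


def flap(phones):
    """Replace t or d with dx when it sits between two vowels and the
    following vowel is unstressed."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in ("t", "d")
                and is_vowel(phones[i - 1])
                and is_vowel(phones[i + 1])
                and is_unstressed(phones[i + 1])):
            out[i] = "dx"
    return out


print(flap(["b", "ah1", "t", "er0"]))   # butter
print(flap(["l", "ae1", "t", "er0"]))   # latter
print(flap(["l", "ae1", "d", "er0"]))   # ladder
print(flap(["ax0", "t", "ae1", "k"]))   # attack: stressed vowel follows
```

Under this rule "latter" and "ladder" map to the same phone sequence, which is the point of the word pairs listed on the slide.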

63
Indian English
  • No aspiration in /p, t, k/
  • /p,t,k/ sound more like /b, d, g/
  • Retroflex alveolar consonants /t/, /d/, /n/
  • Stop-like interdental fricatives
  • e.g., thin, that
  • /w/ → /v/
  • British English influence
  • Rhoticity gone in vowel + /r/
  • /ae/ → /a/, e.g., class, bath, can't
  • Audio demo

64
Language Recognizer or L1 detector?
Language Recognizer
American speaking Mandarin
American English
65
Pilot Study: Dialect-Specific r Biphones
  • Recognizer-decoded r instances were manually
    labeled in both dialects
  • Examples of some extracted rules
  • dx-r, v-r, w-r, r+dx, r+s, r+sil,
    r+n, r+r

Dialect    Labeled instances   Accuracy of decoded r
Indian     200                 50%
American   500                 80%