Title: Building a sentential model for automatic prosody evaluation
Part A
- Kyuchul Yoon
- School of English Language & Literature
- Yeungnam University
- 2009.06.19
- Korea University
English pronunciation evaluation
Introduction
- English pronunciation proficiency evaluation
  - Ultimate goals: evaluation at
    - The segmental level
    - The suprasegmental level
  - Current goals: evaluation at
    - The suprasegmental level
English pronunciation evaluation
Introduction
- The goal of the present study
  - Prosody evaluation of a single target utterance
    - Produced by a Korean student
  - Given
    - An English target sentence
    - A sentential model for prosody evaluation
Manual vs. automatic
Introduction
- Problems of manual evaluation
  - What to evaluate
  - How to evaluate
  - Consistency
- Problems of automatic evaluation
  - How to reflect human knowledge
Manual vs. automatic
Introduction
- A possible solution?
  - Avoid knowledge-based abstraction
  - Compare a target utterance with native speakers' utterances
  - Use multiple utterances for comparison
    - Multiple good utterances from native speakers
  - Adopt raw values
  - Calculate difference values between the target and the good utterances in terms of
    - The three prosodic aspects: F0, intensity, durations → 3D coordinates
How to build the model
Introduction
- Use multivariate statistical analysis
  - A discriminant analysis
- The components of the model
  - (The segmental proficiency scores controlled)
  - The manual prosody evaluation scores (response)
  - The automatic prosody evaluation scores (factors)
- The requirements of the model
  - The correlation between the two levels: manual scores vs. automatic scores
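The slide names a discriminant analysis without giving its exact form. Below is a minimal sketch that assumes a linear discriminant analysis with equal priors and one pooled within-group covariance matrix. Under those assumptions, classifying an automatic (F0, Int, Dur) score reduces to picking the manual-score group whose centroid is nearest in Mahalanobis distance [2]. The names `fit_lda` and `predict_score` are hypothetical.

```python
from collections import defaultdict


def _invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_lda(points, scores):
    """points: automatic (F0, Int, Dur) triples; scores: the manual score of each point."""
    groups = defaultdict(list)
    for point, score in zip(points, scores):
        groups[score].append(point)
    dim = len(points[0])
    means = {s: [sum(p[j] for p in rows) / len(rows) for j in range(dim)]
             for s, rows in groups.items()}
    # Pooled within-group covariance: one matrix shared by all manual-score groups.
    pooled = [[0.0] * dim for _ in range(dim)]
    for s, rows in groups.items():
        for p in rows:
            d = [p[j] - means[s][j] for j in range(dim)]
            for i in range(dim):
                for j in range(dim):
                    pooled[i][j] += d[i] * d[j]
    dof = len(points) - len(groups)
    pooled = [[v / dof for v in row] for row in pooled]
    return means, _invert(pooled)


def predict_score(model, point):
    """Return the manual score whose group centroid is nearest in Mahalanobis distance."""
    means, inv_cov = model

    def mahalanobis_sq(mean):
        d = [x - m for x, m in zip(point, mean)]
        return sum(d[i] * inv_cov[i][j] * d[j]
                   for i in range(len(d)) for j in range(len(d)))

    return min(means, key=lambda s: mahalanobis_sq(means[s]))
```

With equal priors and a shared covariance, the LDA decision rule and this nearest-centroid Mahalanobis rule pick the same group. That equivalence is why the sketch needs no explicit discriminant functions.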
How to build the model
Introduction
- The manual prosody scores (an ideal case)
  - The good utterance versions (point 5) by many native speakers of English
  - The utterance versions by Korean students whose prosodic proficiencies are
    - High (point 5)
    - Intermediate (point 3)
    - Low (point 1)
  - On a scale of 1 (worst) to 5 (best)
How to build the model
Introduction
- The automatic prosody scores
  - Use of Praat scripts
  - Comparison between a single target utterance and multiple native speakers' utterances to yield scores for
    - The F0 difference
    - The intensity difference
    - The duration difference
  - In the form of 3D coordinates (x, y, z) = (F0, Int, Dur)
  - One utterance yields as many coordinates as there are good native speakers
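Below is a minimal sketch of how one target utterance yields one (F0, Int, Dur) coordinate per native model utterance. It assumes each utterance has already been reduced to time-aligned F0 and intensity values plus matched segment durations. It also assumes that each difference is the Euclidean distance named later in the Methods. The `Utterance` layout and the function names are hypothetical; the original work computes these values with Praat scripts.

```python
import math
from dataclasses import dataclass


@dataclass
class Utterance:
    # Hypothetical layout: values already matched point-to-point / segment-to-segment.
    f0: list          # F0 values (Hz) at matching time points
    intensity: list   # intensity values (dB) at matching time points
    durations: list   # segment durations (s), in the same segment order


def score_triple(target, native):
    """(x, y, z) = (F0, Int, Dur) differences between a target and one native utterance."""
    return (math.dist(target.f0, native.f0),
            math.dist(target.intensity, native.intensity),
            math.dist(target.durations, native.durations))


def score_array(target, natives):
    """The score array: one (x, y, z) coordinate per native model utterance."""
    return [score_triple(target, n) for n in natives]
```

With 10 native model utterances, `score_array` returns 10 coordinates for a single learner utterance. All 10 can then be plotted as points in the 3D model.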
How to build the model
Introduction
- Evaluation by comparisons
A 3D sentential model for prosody evaluation
Introduction
- A 3D model
  - 3D axes: F0, intensity, durations
  - (F0, Int, Dur) coordinates = (x, y, z)
  - Automatic scores as scatterplot points
  - Manually evaluated scores group the points
A 3D sentential model for prosody evaluation
Introduction
- Validity of the model
  - Sufficient separation of groups with different manual scores
  - Colors: manual scores
  - Arrowheads: automatic scores
Sentential prosody evaluation [7]
Methods
Before & after duration manipulation (figure panels: native, learner before, learner after)
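This slide compares the learner's utterance before and after its segment durations were manipulated toward the native model (cf. the PSOLA technique [3] and prosody cloning [6]). Below is a minimal sketch of the bookkeeping only, assuming segments are matched one-to-one. It computes a relative duration factor for each segment and the piecewise-linear map from learner time to manipulated time. The function names are hypothetical, and the resynthesis step itself (e.g., in Praat) is not shown.

```python
def duration_factors(learner_durs, native_durs):
    """Relative factor that stretches or compresses each learner segment to the native duration."""
    if len(learner_durs) != len(native_durs):
        raise ValueError("segments must match one-to-one")
    return [n / l for l, n in zip(learner_durs, native_durs)]


def warp_time(t, learner_durs, native_durs):
    """Map a time point in the learner's utterance to the manipulated (native-timed) utterance."""
    start_l = start_n = 0.0
    for dl, dn in zip(learner_durs, native_durs):
        if t <= start_l + dl:
            # Linear within the segment: scale the offset by this segment's factor.
            return start_n + (t - start_l) * dn / dl
        start_l += dl
        start_n += dn
    raise ValueError("time lies beyond the last segment")
```

After manipulation, every learner segment has the native duration. This is what makes the point-to-point F0 and intensity comparisons on the following slides meaningful.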
Sentential prosody evaluation [7]
Methods
F0 point-to-point comparison between native and learner after normalization
(figure panels: native, learner after)
Automatic score (F0, Int, Dur) = (x, y, z)
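Below is a minimal sketch of a point-to-point contour comparison under several assumptions. First, "normalization" means the learner's utterance has already been time-aligned to the native one, as in the duration manipulation above. Second, each contour is a list of (time, value) pairs with strictly increasing times, covering voiced stretches only. Third, both contours are sampled at a hypothetical `n_points` equally spaced times and compared by Euclidean distance.

```python
import bisect
import math


def interpolate(track, t):
    """Linearly interpolate a (time, value) track at time t; clamp outside its range."""
    times = [time for time, _ in track]
    i = bisect.bisect_left(times, t)
    if i == 0:
        return track[0][1]
    if i == len(track):
        return track[-1][1]
    (t0, v0), (t1, v1) = track[i - 1], track[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def point_to_point_distance(native, learner, n_points=50):
    """Euclidean distance between two contours sampled at the same equally spaced times."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    start = max(native[0][0], learner[0][0])
    end = min(native[-1][0], learner[-1][0])
    step = (end - start) / (n_points - 1)
    times = [start + k * step for k in range(n_points)]
    return math.dist([interpolate(native, t) for t in times],
                     [interpolate(learner, t) for t in times])
```

The same function applies unchanged to the intensity contours compared on the next slide. Only the values change, from Hz to dB.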
Sentential prosody evaluation [7]
Methods
Intensity point-to-point comparison between native and learner after normalization
(figure panels: native, learner after)
Automatic score (F0, Int, Dur) = (x, y, z)
Sentential prosody evaluation [7]
Methods
Duration segment-to-segment comparison between native and learner
(figure panels: native, learner before)
Automatic score (F0, Int, Dur) = (x, y, z)
Euclidean distance metric as the evaluation measure
For P = (p1, p2, p3, ..., pn) and Q = (q1, q2, q3, ..., qn) in Euclidean n-dimensional space:
$d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
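Below is a minimal sketch of the segment-to-segment duration comparison using the Euclidean distance. It assumes the native and learner segments have already been matched one-to-one and carry the same labels. It uses raw durations in seconds, following the "learner before" panel. The function names are hypothetical.

```python
import math


def euclidean(p, q):
    """d(P, Q) = sqrt(sum_i (p_i - q_i)^2) for two n-dimensional points."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def duration_distance(native, learner):
    """native/learner: lists of (segment_label, duration_s) in utterance order."""
    if [lab for lab, _ in native] != [lab for lab, _ in learner]:
        raise ValueError("segment labels must match one-to-one")
    return euclidean([d for _, d in native], [d for _, d in learner])
```

Each segment contributes one dimension, so n equals the number of matched segments in the sentence.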
Manual evaluation of sentential prosody
Methods
Manual scores for Set B utterances: "The dancing queen likes only the apple pies."
Sentential prosody evaluation [7]
Methods
A sample score array for one utterance from group K5: one learner utterance vs. 10 model native utterances
Automatic prosody score for K5.U1: (899, 142, 408), (360, 92, 190), (716, 178, 183)
A prosody evaluation model by a Korean phonetician
Results
Korean phonetician's model
A prosody evaluation model by a Korean phonetician
Results
Korean phonetician's model
A sample prosody evaluation with a discriminant analysis
Results
To make this fully automatic
Discussion
- For manual evaluation of the training model
  - The number of Korean learners
    - The more the better
  - The levels of English proficiency
    - The more diverse the better (scores 1 through 5)
- For automatic evaluation of the trainees
  - Need automatic segmentation (ASR)
  - Need to deal with redundant/missing segments
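Automatic segmentation may return redundant (inserted) or missing (deleted) segments relative to the expected transcription. One possible way to detect them is a minimum-edit-distance alignment of the expected and recognized segment labels. This is sketched here as an assumption, not as the author's method; `align_segments` is a hypothetical helper.

```python
def align_segments(expected, recognized):
    """Align two label sequences; return (operation, expected_label, recognized_label) tuples."""
    n, m = len(expected), len(recognized)
    # cost[i][j]: minimum edits to turn expected[:i] into recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if expected[i - 1] == recognized[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j] + 1,        # missing segment
                             cost[i][j - 1] + 1,        # redundant segment
                             cost[i - 1][j - 1] + sub)  # match or substitution
    # Trace back from the bottom-right corner to recover one optimal alignment.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = expected[i - 1] == recognized[j - 1]
            if cost[i][j] == cost[i - 1][j - 1] + (0 if same else 1):
                ops.append(("match" if same else "substitute",
                            expected[i - 1], recognized[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("missing", expected[i - 1], None))
            i -= 1
        else:
            ops.append(("redundant", None, recognized[j - 1]))
            j -= 1
    ops.reverse()
    return ops
```

Only "match" (and possibly "substitute") pairs would then feed the segment-to-segment comparisons. Redundant and missing segments would be set aside or penalized separately.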
Building a sentential model for automatic evaluation of pronunciation proficiency
- What about segmental evaluation?
Part B
Segmental evaluation by spectral comparison
Methods
- Sex/age controlled (no normalization was used)
  - Adult male (native/Korean) speakers were selected
- Spectral comparison
  - Three equally spaced spectral slices were used for each matching segment
  - The Euclidean distance was computed between each pair of matching spectral envelopes
- Four coordinates for pronunciation proficiency evaluation
  - Segments, F0, intensity, durations
  - (w, x, y, z) becomes one entry of the score array
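Below is a minimal sketch of the spectral comparison under three assumptions. The three equally spaced slices sit at 1/4, 1/2, and 3/4 of each matched segment. Each spectral envelope is a list of dB values on a shared frequency grid. The segmental coordinate w is the sum of slice distances over all matched segments. All function names are hypothetical.

```python
import math


def slice_times(start, end, n_slices=3):
    """Equally spaced slice times strictly inside the segment [start, end]."""
    step = (end - start) / (n_slices + 1)
    return [start + step * (k + 1) for k in range(n_slices)]


def segment_distance(native_envelopes, learner_envelopes):
    """Sum of Euclidean distances between matching slice envelopes of one segment."""
    if len(native_envelopes) != len(learner_envelopes):
        raise ValueError("each segment needs the same number of slices")
    return sum(math.dist(a, b) for a, b in zip(native_envelopes, learner_envelopes))


def segmental_score(native_segments, learner_segments):
    """w: total spectral distance over all matched segments of an utterance."""
    if len(native_segments) != len(learner_segments):
        raise ValueError("segments must match one-to-one")
    return sum(segment_distance(n, l) for n, l in zip(native_segments, learner_segments))
```

Prepending w to the prosodic triple gives one (w, x, y, z) entry per native model utterance.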
Manual evaluation of overall proficiency
Methods
Manual scores for Set C utterances: "Put your toys away right now."
<Table 4> The overall scores of the 34 utterances for the Set C sentence "Put your toys away right now." The manual evaluation was performed by a Korean phonetician. Note that the subjects were all adult males.
A pronunciation proficiency evaluation model by a Korean phonetician
Results
Korean phonetician's models
(Intensity axis not shown)
A prosody evaluation model by a Korean phonetician
Results
Korean phonetician's model
A discriminant analysis
Results
<Table 5> The classification table from the discriminant analysis of one test item. The number in each cell represents the probability of the automatic pronunciation proficiency score being classified into the predicted group.
<Table 6> The confusion matrix for the classification table.
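A confusion matrix like Table 6 can be tallied from the true (manual) scores and the predicted scores. Below is a minimal sketch; the helper names and the example labels used to exercise them are hypothetical and do not come from the study's data.

```python
from collections import Counter


def confusion_matrix(true_scores, predicted_scores, classes):
    """Rows: manual (true) score; columns: predicted score, both in `classes` order."""
    counts = Counter(zip(true_scores, predicted_scores))
    return [[counts[(t, p)] for p in classes] for t in classes]


def per_class_accuracy(matrix, classes):
    """Fraction of each true class predicted correctly (diagonal cell / row total)."""
    return {c: row[i] / sum(row)
            for i, (c, row) in enumerate(zip(classes, matrix)) if sum(row)}
```

The diagonal holds the correct classifications. Off-diagonal cells show which score groups the model confuses with each other.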
Discriminant analyses with leave-one-out cross-validation
Results
Testing for score 4: 6 out of 9 correct
Testing for score 2: 12 out of 15 correct
Discriminant analyses with leave-one-out cross-validation
Results
- For the N4 and K2 groups, evaluation models were built by using
  - Discriminant analysis with
    - Leave-one-out cross-validation
- The number of models (built by discriminant analyses) was 24
  - Group N4: 9 subjects
  - Group K2: 15 subjects
- Success rate
  - Group N4: 6 out of 9 predicted correctly
  - Group K2: 12 out of 15 predicted correctly
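Leave-one-out cross-validation builds one model per held-out subject (9 + 15 = 24 models here) and tests each model on the subject it left out. Below is a minimal sketch of that loop. To stay short, it plugs in a nearest-centroid classifier (Euclidean distance to each group's mean) as a stand-in for the discriminant analysis, so the classifier and the function names are assumptions, not the author's setup. On the slide's figures, the overall success rate is (6 + 12) / (9 + 15) = 18/24 = 75%.

```python
import math
from collections import defaultdict


def nearest_centroid(points, scores):
    """Stand-in classifier: predict the score whose group mean is nearest (Euclidean)."""
    groups = defaultdict(list)
    for p, s in zip(points, scores):
        groups[s].append(p)
    centroids = {s: [sum(col) / len(rows) for col in zip(*rows)]
                 for s, rows in groups.items()}
    return lambda q: min(centroids, key=lambda s: math.dist(q, centroids[s]))


def leave_one_out(points, scores, fit=nearest_centroid):
    """Return {score: (n_correct, n_tested)}; one model is built per held-out item."""
    correct, tested = defaultdict(int), defaultdict(int)
    for i in range(len(points)):
        # Train on every item except item i, then test on item i.
        predict = fit(points[:i] + points[i + 1:], scores[:i] + scores[i + 1:])
        tested[scores[i]] += 1
        if predict(points[i]) == scores[i]:
            correct[scores[i]] += 1
    return {s: (correct[s], tested[s]) for s in tested}
```

Any `fit(points, scores)` function that returns a predictor can replace `nearest_centroid`, including a discriminant-analysis fit.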
Automatic evaluation of pronunciation proficiency
Discussion
- Viability of sentential models for the evaluation of
  - Segmental proficiency: spectral comparison
  - Prosodic proficiency: F0/intensity/durations
  - In the form of multiple score-array coordinates (segments, F0, intensity, durations) = (w, x, y, z)
- Comparison seems to work
  - A target utterance vs. multiple model native utterances
- Better models can be built with
  - More (controlled) utterances
  - More score resolution
    - Current: score 2 (bad), score 4 (good)
    - Future: score 1 (worst), score 3 (fair), score 5 (best)
References
[1] Boersma, P., Praat, a system for doing phonetics by computer, Glot International 5(9/10), pp. 341-345, 2001.
[2] Mahalanobis, P. C., On the generalized distance in statistics, Proceedings of the National Institute of Science of India 12, pp. 49-55, 1936.
[3] Moulines, E. & F. Charpentier, Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones, Speech Communication 9, pp. 453-467, 1990.
[4] Ramus, F., M. Nespor & J. Mehler, Correlates of linguistic rhythm in the speech signal, Cognition 73, pp. 265-292, 1999.
[5] Rhee, S., S. Lee, Y. Lee & S. Kang, Design and construction of Korean-Spoken English Corpus (K-SEC), Malsori 46, pp. 159-174, 2003.
[6] Yoon, K., Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody, Journal of the Modern British & American Language & Literature 25(4), pp. 197-215, 2007.
[7] Yoon, K., Synthesis and evaluation of prosodically exaggerated utterances, Unpublished manuscript, 2008.