Building a sentential model for automatic prosody evaluation

Slides: 32

Provided by: lingOhio

Transcript and Presenter's Notes

1
Building a sentential model for automatic prosody
evaluation
Part A

Kyuchul Yoon
School of English Language & Literature
Yeungnam University
2009.06.19
Korea University

2
English pronunciation evaluation
Introduction

English pronunciation proficiency evaluation
Ultimate goals
Evaluation at
The segmental level
The suprasegmental level
Current goals
Evaluation at
The suprasegmental level

3
English pronunciation evaluation
Introduction

The goal of the present study
Prosody evaluation of a single target utterance
Produced by a Korean student
Given
An English target sentence
A sentential model for prosody evaluation

4
Manual vs. automatic
Introduction

Problems of manual evaluation
What to evaluate
How to evaluate
Consistency
Problems of automatic evaluation
How to reflect human knowledge

5
Manual vs. automatic
Introduction

A possible solution?
Avoid knowledge-based abstraction
Compare a target utterance with native speakers'
utterances
Use multiple utterances for comparison
Multiple good utterances from native speakers
Adopt raw values
Calculate difference values between the target
and the good utterances in terms of
The three prosodic aspects: F0, intensity,
durations → 3D coordinates

6
How to build the model
Introduction

Use multivariate statistical analysis
A discriminant analysis
The components of the model
(The segmental proficiency scores controlled)
The manual prosody evaluation scores (response)
The automatic prosody evaluation scores (factors)
The requirements of the model
The correlation between the two levels: manual
scores vs. automatic scores

7
How to build the model
Introduction

The manual prosody scores (an ideal case)
The good utterance versions (point 5) by many
native speakers of English
The utterance versions by Korean students whose
prosodic proficiencies are
High (point 5)
Intermediate (point 3)
Low (point 1)
On a scale of 1 (worst) to 5 (best)

8
How to build the model
Introduction

The automatic prosody scores
Use of Praat scripts
Comparison between a single target utterance and
multiple native speakers' utterances to yield
scores for
The F0 difference
The intensity difference
The duration difference
in the form of 3D coordinates (x, y, z) = (F0,
Int, Dur)
One utterance yields as many coordinates as the
number of good native speakers
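
The scoring step on this slide could be sketched in Python as below. The study itself used Praat scripts, so this is only an illustration: the function name `score_array`, the dictionary layout of an utterance, the sample values, and the choice of Euclidean distance as the per-aspect difference are assumptions, and the contours are assumed to be already aligned to the same number of points.

```python
import math

ASPECTS = ("f0", "intensity", "durations")  # hypothetical key names

def score_array(target, natives):
    """Return one (F0, Int, Dur) difference coordinate per native utterance."""
    coords = []
    for native in natives:
        # One difference value per prosodic aspect gives one 3D point.
        coords.append(tuple(math.dist(target[a], native[a]) for a in ASPECTS))
    return coords

# Made-up values: three aligned measurement points per aspect.
target = {"f0": [180.0, 200.0, 170.0],
          "intensity": [62.0, 65.0, 60.0],
          "durations": [0.12, 0.30, 0.21]}
natives = [
    {"f0": [190.0, 230.0, 160.0],
     "intensity": [64.0, 70.0, 58.0],
     "durations": [0.10, 0.25, 0.20]},
    {"f0": [175.0, 215.0, 165.0],
     "intensity": [63.0, 68.0, 61.0],
     "durations": [0.11, 0.27, 0.19]},
]

scores = score_array(target, natives)
print(len(scores))  # one coordinate per native utterance
```

With ten native model utterances, the same call would return ten coordinates for the single target utterance.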

9
How to build the model
Introduction

Evaluation by comparisons

10
A 3D sentential model for prosody evaluation
Introduction

A 3D model
3D axes: F0, intensity, durations
(F0, Int, Dur) coordinates = (x, y, z)
Automatic scores as scatterplot points
Manually evaluated scores group the points

11
A 3D sentential model for prosody evaluation
Introduction

Validity of the model
Sufficient separation of groups with different
manual scores
Colors: manual scores
Arrowheads: automatic scores

12
Sentential prosody evaluation [7]
Methods
Before & after duration manipulation
Figure panels: native, learner before, learner after
13
Sentential prosody evaluation [7]
Methods
F0 point-to-point comparison between native and
learner after normalization
Figure panels: native, learner after
Automatic score: (F0, Int, Dur) = (x, y, z)
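
A point-to-point comparison needs both contours sampled at the same number of points. The slides achieve this by duration manipulation of the learner utterance; the sketch below swaps in plain linear resampling as a simplified stand-in, so it should be read as an illustration only. The function names, the default of 50 points, and the sample F0 values are assumptions.

```python
def resample(contour, n_points):
    """Linearly interpolate `contour` onto `n_points` equally spaced positions."""
    if n_points < 2 or len(contour) < 2:
        raise ValueError("need at least two input points and two output points")
    last = len(contour) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)   # fractional index into the contour
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out

def point_to_point_differences(native, learner, n_points=50):
    """Return native-minus-learner differences at matching normalized positions."""
    a = resample(native, n_points)
    b = resample(learner, n_points)
    return [x - y for x, y in zip(a, b)]

# Made-up F0 contours (Hz) of different lengths.
native_f0 = [210.0, 240.0, 220.0, 180.0, 160.0]
learner_f0 = [200.0, 205.0, 195.0]

diffs = point_to_point_differences(native_f0, learner_f0, n_points=5)
print(diffs)
```

The same routine applies unchanged to intensity contours. Unvoiced stretches, where F0 is undefined, are not handled here.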
14
Sentential prosody evaluation [7]
Methods
Intensity point-to-point comparison between native
and learner after normalization
Figure panels: native, learner after
Automatic score: (F0, Int, Dur) = (x, y, z)
15
Sentential prosody evaluation [7]
Methods
Duration segment-to-segment comparison between
native and learner
Figure panels: native, learner before
Automatic score: (F0, Int, Dur) = (x, y, z)
Euclidean distance metric as the evaluation measure
P = (p1, p2, p3, ..., pn) and Q = (q1, q2, q3, ...,
qn) in Euclidean n-dimensional space
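
For reference, the standard Euclidean distance between P and Q is the square root of the summed squared differences (pi - qi)^2 over all n dimensions. A minimal Python version is sketched below; the function name and the sample segment durations are made up for illustration.

```python
import math

def euclidean_distance(p, q):
    """Distance between P = (p1..pn) and Q = (q1..qn) in n-dimensional space."""
    if len(p) != len(q):
        raise ValueError("P and Q must have the same number of dimensions")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical segment durations (seconds) for matching segments.
native_durations = [0.08, 0.15, 0.11, 0.20]
learner_durations = [0.12, 0.14, 0.16, 0.27]

duration_score = euclidean_distance(native_durations, learner_durations)
print(round(duration_score, 4))
```

A larger value means the learner's segment durations lie farther from the native model's.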
16
Manual evaluation of sentential prosody
Methods
Manual scores for Set B utterances: "The dancing
queen likes only the apple pies"
17
Sentential prosody evaluation [7]
Methods
A sample score array for one utterance from group
K5: one learner utterance vs. 10 model native
utterances
Automatic prosody score for K5.U1:
(899,142,408), (360,92,190), (716,178,183)
18
A prosody evaluation model by a Korean phonetician
Results
Korean phonetician's model
19
A prosody evaluation model by a Korean phonetician
Results
Korean phonetician's model
20
A sample prosody evaluation with a discriminant
analysis
Results
21
To make this fully automatic
Discussion

For manual evaluation of the training model
The number of Korean learners
The more the better
The levels of English proficiency
The more diverse the better (scores 1 through 5)
For automatic evaluation of the trainees
Need automatic segmentation (ASR)
Need to deal with redundant/missing segments

22
Building a sentential model for automatic
evaluation of pronunciation proficiency

What about segmental evaluation?

Part B
23
Segmental evaluation by spectral comparison
Methods

Sex/age controlled (no normalization was used)
Adult male (native/Korean) speakers were selected
Spectral comparison
Three equally-spaced spectral slices were used
for each pair of matching segments
A Euclidean distance measure was computed from each
pair of matching spectral envelopes
Four coordinates for pronunciation proficiency
evaluation
Segments, F0, intensity, durations
(w, x, y, z) becomes one element of the score array
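
The spectral comparison might look roughly like the Python sketch below. The slides do not say where the three slices fall within a segment or how the three distances are combined, so placing them at 25%, 50% and 75% of the segment and averaging the distances are assumptions, as are the function names and the toy envelope values.

```python
import math

def slice_times(start, end, n_slices=3):
    """Return n_slices equally spaced time points inside a segment."""
    step = (end - start) / (n_slices + 1)
    return [start + step * (i + 1) for i in range(n_slices)]

def segment_distance(native_envelopes, learner_envelopes):
    """Mean Euclidean distance over paired spectral envelopes of one segment."""
    if len(native_envelopes) != len(learner_envelopes):
        raise ValueError("both segments need the same number of slices")
    distances = [math.dist(n, l)
                 for n, l in zip(native_envelopes, learner_envelopes)]
    return sum(distances) / len(distances)

# A segment from 0.0 s to 1.0 s is sliced at three interior points.
times = slice_times(0.0, 1.0)

# Toy envelopes: three slices, each with four made-up amplitude values (dB).
native_env = [[60.0, 55.0, 40.0, 30.0],
              [62.0, 57.0, 42.0, 31.0],
              [59.0, 54.0, 39.0, 29.0]]
learner_env = [[58.0, 50.0, 45.0, 33.0],
               [61.0, 52.0, 44.0, 35.0],
               [57.0, 49.0, 41.0, 30.0]]

w = segment_distance(native_env, learner_env)
print(times, round(w, 3))
```

The resulting value would play the role of w in the (w, x, y, z) coordinate.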

24
Manual evaluation of overall proficiency
Methods
Manual scores for Set C utterances: "Put your toys
away right now"
<Table 4> The overall scores of the 34 utterances
for the Set C sentence "Put your toys away right
now." The manual evaluation was performed by a
Korean phonetician. Note that the subjects were
all male adults.
25
A pronunciation proficiency evaluation model by a
Korean phonetician
Results
Korean phonetician's models
(Intensity axis not shown)
26
A prosody evaluation model by a Korean phonetician
Results
Korean phonetician's model
27
A discriminant analysis
Results
<Table 5> The classification table from the
discriminant analysis of one test data. The
number in each cell represents the probability of
the automatic pronunciation proficiency score
being classified into the predicted group.
<Table 6> The confusion matrix for the
classification table.
28
Discriminant analyses with leave-one-out
cross-validation
Results
Testing for score 4: 6 out of 9 correct
Testing for score 2: 12 out of 15 correct
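
The two results above can be tallied into a confusion matrix and an overall success rate. The Python sketch below assumes that, with only the score 4 and score 2 groups in play, every misclassified utterance was assigned to the other group; the label lists are rebuilt from the counts on this slide, not taken from the original data.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) label pairs."""
    return Counter(zip(true_labels, predicted_labels))

# Score 4: 6 of 9 correct. Score 2: 12 of 15 correct.
true_labels = [4] * 9 + [2] * 15
predicted_labels = [4] * 6 + [2] * 3 + [2] * 12 + [4] * 3

matrix = confusion_matrix(true_labels, predicted_labels)
correct = sum(n for (true, pred), n in matrix.items() if true == pred)
accuracy = correct / len(true_labels)

for true in (4, 2):
    print(true, [matrix[(true, pred)] for pred in (4, 2)])
print(accuracy)  # 18 of 24 correct
```

Under that assumption the overall success rate is 18 out of 24, or 75%.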
29
Discriminant analyses with leave-one-out
cross-validation
Results

For the N4 and K2 groups, evaluation models were
built by using
The discriminant analysis with
Leave-one-out cross-validation
The number of models (built by discriminant
analyses) was 24
Group N4: 9 subjects
Group K2: 15 subjects
Success rate
Group N4: 6 out of 9 predicted correct
Group K2: 12 out of 15 predicted correct
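
Leave-one-out cross-validation builds one model per subject, which is why 9 + 15 subjects give 24 models. The Python sketch below shows the procedure with a nearest-centroid classifier standing in for the discriminant analysis, which is a deliberate simplification. The function names and the (w, x, y, z) values are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def nearest_centroid_predict(train, sample):
    """Return the label whose class centroid lies closest to `sample`."""
    by_label = {}
    for coords, label in train:
        by_label.setdefault(label, []).append(coords)
    return min(by_label,
               key=lambda lab: math.dist(centroid(by_label[lab]), sample))

def leave_one_out(data):
    """Hold out each item once, train on the rest, return (true, predicted)."""
    results = []
    for i, (coords, label) in enumerate(data):
        train = data[:i] + data[i + 1:]   # one model per held-out item
        results.append((label, nearest_centroid_predict(train, coords)))
    return results

# Made-up (w, x, y, z) coordinates for two score groups.
data = [
    ((10.0, 100.0, 20.0, 30.0), "N4"),
    ((12.0, 110.0, 25.0, 28.0), "N4"),
    ((11.0, 95.0, 22.0, 33.0), "N4"),
    ((40.0, 400.0, 90.0, 120.0), "K2"),
    ((45.0, 380.0, 100.0, 110.0), "K2"),
    ((42.0, 420.0, 95.0, 130.0), "K2"),
]

results = leave_one_out(data)
n_correct = sum(1 for true, pred in results if true == pred)
print(len(results), n_correct)
```

Each class needs at least two items so that holding one out never empties a group.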

30
Automatic evaluation of pronunciation proficiency
Discussion

Viability of sentential models for the evaluation
of
Segmental proficiency: spectral comparison
Prosodic proficiency: F0/intensity/durations
in the form of multiple score array
coordinates (segments, F0, intensity,
durations) = (w, x, y, z)
Comparison seems to work
A target utterance vs. multiple model native
utterances
Better models can be built with
More (controlled) utterances
More score resolution
Current: score 2 (bad), score 4 (good)
Future: score 1 (worst), score 3 (fair), score
5 (best)

31
References
[1] Boersma, Paul, Praat, a system for doing phonetics by computer, Glot International 5(9/10), pp.341-345, 2001.
[2] Mahalanobis, P.C., On the generalized distance in statistics, Proceedings of the National Institute of Science of India 12, pp.49-55, 1936.
[3] Moulines, E. & F. Charpentier, Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones, Speech Communication 9, pp.453-467, 1990.
[4] Ramus, F., M. Nespor, J. Mehler, Correlates of linguistic rhythm in the speech signal, Cognition 73, pp.265-292, 1999.
[5] Rhee, S., S. Lee, Y. Lee & S. Kang, Design and construction of Korean-Spoken English Corpus (K-SEC), Malsori 46, pp.159-174, 2003.
[6] Yoon, K., Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody, Journal of the Modern British & American Language & Literature 25(4), pp.197-215, 2007.
[7] Yoon, K., Synthesis and evaluation of prosodically exaggerated utterances, unpublished manuscript, 2008.
