Context in Multilingual Tone and Pitch Accent Recognition


1
Context in Multilingual Tone and Pitch Accent Recognition

Gina-Anne Levow
University of Chicago
September 7, 2005

2
Roadmap

Motivating Context
Data Collection & Processing
Modeling Context for Tone and Pitch Accent
Context in Recognition
Conclusion

3
Challenges

Tone and pitch accent recognition
Key component of language understanding
Lexical tone carries word meaning
Pitch accent carries semantic, pragmatic, and discourse meaning
Non-canonical form (Shen 90, Shih 00, Xu 01)
Tonal coarticulation modifies surface realization
In extreme cases, a fall becomes a rise
Tone is relative
To speaker range: high for a male speaker may be low for a female speaker
To phrase range and other tones, e.g. downstep

4
Strategy

Common model across languages: SVM classifier
Acoustic-prosodic model: no word label, POS, or lexical stress info
No explicit tone label sequence model
English, Mandarin Chinese (also Cantonese)
Exploit contextual information
Features from adjacent syllables
Height, shape: direct, relative
Compensate for phrase contour
Analyze impact of context position, context encoding, context type
>20% relative improvement over no context
Preceding context gives greater enhancement than following context

5
Data Collection & Processing

English (Ostendorf et al., 95)
Boston University Radio News Corpus, speaker f2b
Manually ToBI annotated, aligned, syllabified
Pitch accent aligned to syllables
Unaccented, High, Downstepped High, Low (Sun 02, Ross & Ostendorf 95)
Mandarin
TDT2 Voice of America Mandarin Broadcast News
Automatically force-aligned to anchor scripts (CUSonic)
High, Mid-rising, Low, High-falling, Neutral

6
Local Feature Extraction

Uniform representation for tone, pitch accent
Motivated by Pitch Target Approximation Model
Tone/pitch accent target exponentially approached
Linear target: height, slope (Xu et al., 99)
Scalar features
Pitch, intensity: max, mean (Praat, speaker-normalized)
Pitch at 5 points across voiced region
Duration
Position: initial, final in phrase
Slope
Linear fit to last half of pitch contour
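The scalar pitch features listed on this slide can be sketched in a few lines. This is a minimal illustration, not the author's extraction code: it assumes the pitch values for one syllable's voiced region are already available as a list (for example, exported from Praat and speaker-normalized), and the function names `sample_points`, `linear_slope`, and `local_features` are hypothetical.

```python
def sample_points(values, n=5):
    """Pick n roughly evenly spaced values across the sequence."""
    if len(values) < n:
        raise ValueError("need at least n pitch values")
    last = len(values) - 1
    return [values[round(i * last / (n - 1))] for i in range(n)]


def linear_slope(ys):
    """Least-squares slope of ys against frame index 0..len(ys)-1."""
    n = len(ys)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def local_features(pitch):
    """pitch: normalized f0 values over the voiced region of one syllable."""
    last_half = pitch[len(pitch) // 2:]  # slope is fit to the last half only
    return {
        "pitch_max": max(pitch),
        "pitch_mean": sum(pitch) / len(pitch),
        "pitch_points": sample_points(pitch, 5),
        "slope": linear_slope(last_half),
    }
```

Intensity max and mean, duration, and phrase position would be added to the same dictionary in the same way; they are omitted here to keep the sketch short.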

7
Context Features

Local context
Extended features
Pitch max, mean, adjacent points of preceding, following syllables
Difference features
Difference between current syllable and preceding, following syllables in
Pitch max, mean, mid, slope
Intensity max, mean
Phrasal context
Compute collection average phrase slope
Compute scalar pitch values, adjusted for slope
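As a rough illustration of the difference encoding, the sketch below subtracts a neighboring syllable's value from the current syllable's value for each listed feature. The sign convention (current minus neighbor), the feature key names, and the function name `difference_features` are assumptions made for this example; the slides do not specify them.

```python
DIFF_KEYS = ("pitch_max", "pitch_mean", "pitch_mid", "slope",
             "intensity_max", "intensity_mean")


def difference_features(cur, prev=None, nxt=None, keys=DIFF_KEYS):
    """Return current-minus-neighbor differences for each feature key.

    cur, prev, nxt are dicts of scalar features for the current,
    preceding, and following syllables; a missing neighbor is skipped.
    """
    feats = {}
    for key in keys:
        if prev is not None:
            feats[key + "_minus_prev"] = cur[key] - prev[key]
        if nxt is not None:
            feats[key + "_minus_next"] = cur[key] - nxt[key]
    return feats
```

Passing only `prev` gives a preceding-context configuration, only `nxt` a following-context one, and both gives the two-sided configuration.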

8
Classification Experiments

Classifier: Support Vector Machine
Linear kernel
Multiclass formulation
SVMlight (Joachims), LibSVM (Chang & Lin 01)
4:1 training / test splits
Experiments: effects of
Context position: preceding, following, none, both
Context encoding: extended, difference
Context type: local, phrasal
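One way to obtain a 4:1 training-to-test ratio is to divide the data into five folds and hold out one fold at a time. The sketch below shows that scheme; whether the experiments rotated through folds like this or used a different partitioning is not stated in the slides, so both the rotation and the name `four_to_one_splits` are assumptions.

```python
def four_to_one_splits(items, n_folds=5):
    """Yield (train, test) pairs with a 4:1 size ratio.

    Items are dealt round-robin into n_folds folds; each fold serves
    once as the test set while the remaining folds form the training set.
    """
    folds = [items[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [x for i, fold in enumerate(folds) if i != held_out
                 for x in fold]
        yield train, test
```

Each (train, test) pair would then be handed to the SVM toolkit of choice, with accuracy averaged over the splits.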

9
Results: Local Context

Context          Mandarin Tone (%)   English Pitch Accent (%)
Full             74.5                81.3
Extend PrePost   74.0                80.7
Extend Pre       74.0                79.9
Extend Post      70.5                76.7
Diffs PrePost    75.5                80.7
Diffs Pre        76.5                79.5
Diffs Post       69.0                77.3
Both Pre         76.5                79.7
Both Post        71.5                77.6
No context       68.5                75.9
13
Discussion Local Context

Any context information improves over none
Preceding context information consistently improves over none or following context information
English: generally, more context features are better
Mandarin: following context can degrade
Little difference in encoding (Extend vs. Diffs)
Consistent with phonological analysis (Xu) that coarticulation is carryover, not anticipatory

14
Results & Discussion: Phrasal Context

Phrase Context   Mandarin Tone (%)   English Pitch Accent (%)
Phrase           75.5                81.3
No Phrase        72.0                79.9

Phrase contour compensation enhances recognition
Simple strategy
Non-linear slope compensation may improve results further
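The linear compensation strategy can be sketched as follows: estimate an average phrase slope over the collection, then remove that trend from each pitch value according to its time offset within the phrase. The exact procedure is not given in the slides, so the details here are assumptions: per-phrase least-squares slopes averaged with equal weight, a subtraction of slope times elapsed time, and the hypothetical names `phrase_slope`, `average_phrase_slope`, and `compensate`.

```python
def phrase_slope(points):
    """Least-squares slope of pitch over time for one phrase.

    points: list of (time_from_phrase_start, pitch) pairs.
    """
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_p = sum(p for _, p in points) / n
    num = sum((t - mean_t) * (p - mean_p) for t, p in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return num / den if den else 0.0


def average_phrase_slope(phrases):
    """Mean of the per-phrase slopes across the collection."""
    return sum(phrase_slope(p) for p in phrases) / len(phrases)


def compensate(pitch, time_from_phrase_start, slope):
    """Remove the linear phrase trend from a scalar pitch value."""
    return pitch - slope * time_from_phrase_start
```

With a falling (negative) average slope, values late in the phrase are raised back toward the phrase-initial range, so a late high target is less likely to look like an early mid target. A non-linear variant would replace the single slope with a fitted curve.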

15
Conclusion

Employ common acoustic representation
Tone (Mandarin), pitch accent (English)
Cantonese: recent experiments
SVM classifiers, linear kernel: 76%, 81%
Local context effects
Up to >20% relative reduction in error
Preceding context makes the greatest contribution
Carryover vs. anticipatory
Phrasal context effects
Compensation for phrasal contour improves recognition
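The relative error reduction can be checked against the local-context results reported earlier (no context: 68.5 for Mandarin and 75.9 for English; best with context: 76.5 and 81.3). The helper name below is hypothetical; the arithmetic is the standard definition of relative error reduction.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Percent reduction in error rate, given accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err


mandarin = relative_error_reduction(68.5, 76.5)  # error 31.5 -> 23.5
english = relative_error_reduction(75.9, 81.3)   # error 24.1 -> 18.7
print(f"Mandarin: {mandarin:.1f}%  English: {english:.1f}%")
```

Both values come out above 20% (roughly 25% for Mandarin and 22% for English).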

16
Current & Future Work

Application of model to different languages
Cantonese, Dschang (Bantu family)
Cantonese: 65% acoustic only, 85% w/ segmental
Integration of additional contextual influence
Topic, turn, discourse structure
HMSVM, GHMM models
http://people.cs.uchicago.edu/levow/projects/tai
Supported by NSF Grant 0414919

17
Confusion Matrix (English)

Rows: recognized class. Columns: manually labeled class. Entries: % of column (count/column total).

Recognized   Unaccented     High           Low           D.S. High
Unaccented   95 (888/934)   25 (110/440)   100 (12/12)   53.5 (61/114)
High         4.6 (43/934)   73 (322/440)   0             38.5 (44/114)
Low          0              0              0             0
D.S. High    0.3 (3/934)    2 (8/440)      0             8 (9/114)
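The percentages in the English matrix are column-normalized: each count is divided by the number of syllables carrying that manual label. The sketch below recomputes them from the counts given above and derives overall accuracy from the diagonal; the variable and function names are illustrative only.

```python
LABELS = ["Unaccented", "High", "Low", "D.S. High"]

# COUNTS[recognized][labeled], taken from the English confusion matrix.
COUNTS = [
    [888, 110, 12, 61],  # recognized as Unaccented
    [43, 322, 0, 44],    # recognized as High
    [0, 0, 0, 0],        # recognized as Low
    [3, 8, 0, 9],        # recognized as D.S. High
]


def column_totals(counts):
    """Number of syllables per manually assigned label."""
    return [sum(row[j] for row in counts) for j in range(len(counts[0]))]


def column_percentages(counts):
    """Each cell as a percentage of its manual-label column total."""
    totals = column_totals(counts)
    return [[100.0 * c / totals[j] for j, c in enumerate(row)]
            for row in counts]


def overall_accuracy(counts):
    """Percent of syllables on the diagonal (recognized == labeled)."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    total = sum(sum(row) for row in counts)
    return 100.0 * correct / total
```

On these counts the overall accuracy works out to about 81.3% (1219 of 1500), which agrees with the best English result reported for local context. The classifier never outputs Low, and most Downstepped High syllables are recognized as Unaccented or High.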
19
Confusion Matrix (Mandarin)

Rows: recognized tone. Columns: manually labeled tone. Entries: % of column (count/column total).

Recognized     High         Mid-Rising     Low          High-Falling   Neutral
High           84 (38/45)   9 (5/56)       5 (1/20)     13 (9/68)      0
Mid-Rising     6.7 (3/45)   78.6 (44/56)   10 (2/20)    7 (5/68)       27.3 (3/11)
Low            0            3.6 (2/56)     70 (14/20)   7 (5/68)       27.3
High-Falling   7.4 (4/45)   3.6 (2/56)     10 (2/20)    70 (48/68)     0
Neutral        0            5.3 (3/56)     5 (1/20)     1.5 (1/68)     45
21
Related Work