Context in Multilingual Tone and Pitch Accent Recognition

1
Context in Multilingual Tone and Pitch Accent
Recognition
  • Gina-Anne Levow
  • University of Chicago
  • September 7, 2005

2
Roadmap
  • Motivating Context
  • Data Collections & Processing
  • Modeling Context for Tone and Pitch Accent
  • Context in Recognition
  • Conclusion

3
Challenges
  • Tone and Pitch Accent Recognition
  • Key component of language understanding
  • Lexical tone carries word meaning
  • Pitch accent carries semantic, pragmatic,
    discourse meaning
  • Non-canonical form (Shen 90, Shih 00, Xu 01)
  • Tonal coarticulation modifies surface realization
  • In extreme cases, fall becomes rise
  • Tone is relative
  • To speaker range
  • High for male may be low for female
  • To phrase range, other tones
  • E.g. downstep

4
Strategy
  • Common model across languages, SVM classifier
  • Acoustic-prosodic model: no word label, POS, or
    lexical stress info
  • No explicit tone label sequence model
  • English, Mandarin Chinese (also Cantonese)
  • Exploit contextual information
  • Features from adjacent syllables
  • Height, shape: direct, relative
  • Compensate for phrase contour
  • Analyze impact of
  • Context position, context encoding, context type
  • > 20% relative improvement over no context
  • Preceding context greater enhancement than
    following

5
Data Collection & Processing
  • English (Ostendorf et al, 95)
  • Boston University Radio News Corpus, f2b
  • Manually ToBI annotated, aligned, syllabified
  • Pitch accent aligned to syllables
  • Unaccented, High, Downstepped High, Low
  • (Sun 02, Ross & Ostendorf 95)
  • Mandarin
  • TDT2 Voice of America Mandarin Broadcast News
  • Automatically force aligned to anchor scripts
    (CUSonic)
  • High, Mid-rising, Low, High falling, Neutral

6
Local Feature Extraction
  • Uniform representation for tone, pitch accent
  • Motivated by Pitch Target Approximation Model
  • Tone/pitch accent target exponentially approached
  • Linear target height, slope (Xu et al, 99)
  • Scalar features
  • Pitch, intensity: max, mean (Praat, speaker
    normalized)
  • Pitch at 5 points across voiced region
  • Duration
  • Initial, final in phrase
  • Slope
  • Linear fit to last half of pitch contour
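
The pitch-based scalar features above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation. The names `linear_slope` and `local_features`, the z-score form of speaker normalization, and the even spacing of the five sample points are all assumptions; the slide only states that pitch is speaker normalized, sampled at 5 points across the voiced region, and that slope is a linear fit to the last half of the contour. Intensity and phrase-position features are left out for brevity.

```python
from statistics import mean

def linear_slope(ys):
    """Least-squares slope of ys against frame index 0..n-1."""
    n = len(ys)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den

def local_features(pitch, spk_mean, spk_std):
    """pitch: voiced-region f0 values for one syllable, in time order."""
    # Assumed normalization: z-score against the speaker's mean and std.
    z = [(p - spk_mean) / spk_std for p in pitch]
    n = len(z)
    # Five samples across the voiced region (even spacing is an assumption).
    points = [z[round(i * (n - 1) / 4)] for i in range(5)]
    return {
        "pitch_max": max(z),
        "pitch_mean": mean(z),
        "pitch_points": points,
        "slope": linear_slope(z[n // 2:]),  # fit to the last half of the contour
        "duration_frames": n,
    }
```

Normalizing before taking max, mean, and slope keeps the features comparable across speakers, which is the point of the "tone is relative to speaker range" observation earlier in the talk.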

7
Context Features
  • Local context
  • Extended features
  • Pitch max, mean, adjacent points of preceding,
    following syllables
  • Difference features
  • Difference between
  • Pitch max, mean, mid, slope
  • Intensity max, mean
  • Of preceding, following and current syllable
  • Phrasal context
  • Compute collection average phrase slope
  • Compute scalar pitch values, adjusted for slope
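
The two local-context encodings can be sketched as follows. This is an illustration under stated assumptions, not the presenter's code: the function name `context_features`, the feature keys, and the sign convention for differences (current minus neighbor) are hypothetical, and the "adjacent points" of the extended encoding are omitted for brevity.

```python
DIFF_KEYS = ("pitch_max", "pitch_mean", "pitch_mid", "slope",
             "intensity_max", "intensity_mean")
EXTEND_KEYS = ("pitch_max", "pitch_mean")

def context_features(syllables, i, use_pre=True, use_post=True):
    """Return extended and difference features for syllable i.

    syllables: list of dicts of scalar features, one per syllable in order.
    Neighbors that fall outside the list are skipped.
    """
    cur = syllables[i]
    neighbors = []
    if use_pre and i > 0:
        neighbors.append(("pre", syllables[i - 1]))
    if use_post and i + 1 < len(syllables):
        neighbors.append(("post", syllables[i + 1]))
    feats = {}
    for side, nb in neighbors:
        for k in EXTEND_KEYS:
            # Extended encoding: copy the neighbor's own values.
            feats[f"ext_{side}_{k}"] = nb[k]
        for k in DIFF_KEYS:
            # Difference encoding: current minus neighbor (assumed sign).
            feats[f"diff_{side}_{k}"] = cur[k] - nb[k]
    return feats
```

Switching `use_pre` and `use_post` on and off gives the preceding, following, both, and no-context conditions compared in the experiments.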

8
Classification Experiments
  • Classifier: Support Vector Machine
  • Linear kernel
  • Multiclass formulation
  • SVMlight (Joachims), LibSVM (Chang & Lin 01)
  • 4:1 training / test splits
  • Experiments: Effects of
  • Context position: preceding, following, none,
    both
  • Context encoding: Extended / Difference
  • Context type: local, phrasal
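
A sketch of the data splitting, assuming the training / test split on this slide means a 4:1 ratio (80% training, 20% test). The function name `four_to_one_splits` and the rotation of the held-out fifth are assumptions; the slide does not say how many splits were drawn or how. The SVM training itself (SVMlight or LibSVM) is not shown.

```python
import random

def four_to_one_splits(n_items, seed=0):
    """Yield (train_idx, test_idx) pairs, each with a 4:1 train/test ratio.

    Assumption: the held-out fifth rotates so every item is tested once.
    """
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::5] for k in range(5)]
    for k in range(5):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```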

9
Results: Local Context (accuracy, %)
Context          Mandarin Tone   English Pitch Accent
Full             74.5            81.3
Extend PrePost   74.0            80.7
Extend Pre       74.0            79.9
Extend Post      70.5            76.7
Diffs PrePost    75.5            80.7
Diffs Pre        76.5            79.5
Diffs Post       69.0            77.3
Both Pre         76.5            79.7
Both Post        71.5            77.6
No context       68.5            75.9

13
Discussion: Local Context
  • Any context information improves over none
  • Preceding context information consistently
    improves over none or following context
    information
  • English: Generally, more context features are
    better
  • Mandarin: Following context can degrade accuracy
  • Little difference in encoding (Extend vs Diffs)
  • Consistent with phonological analysis (Xu) that
    coarticulation is carryover, not anticipatory

14
Results & Discussion: Phrasal Context
Phrase Context   Mandarin Tone   English Pitch Accent
Phrase           75.5            81.3
No Phrase        72.0            79.9
  • Phrase contour compensation enhances recognition
  • Simple strategy
  • Non-linear slope compensation may improve results further
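
The simple linear strategy can be sketched as follows: fit a slope to each phrase, average the slopes over the collection, and remove that average trend from each pitch value. The function names and the exact form of the adjustment (subtracting the average slope times the time since phrase start) are assumptions; the slides only say that a collection average phrase slope is computed and that scalar pitch values are adjusted for it.

```python
def fit_slope(times, values):
    """Least-squares slope of values against times."""
    n = len(times)
    t_bar = sum(times) / n
    v_bar = sum(values) / n
    den = sum((t - t_bar) ** 2 for t in times)
    if den == 0:
        return 0.0
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return num / den

def average_phrase_slope(phrases):
    """phrases: list of (times, pitches) pairs, one pair per phrase."""
    slopes = [fit_slope(t, p) for t, p in phrases]
    return sum(slopes) / len(slopes)

def compensate(times, pitches, avg_slope):
    """Remove the collection-average trend from one phrase (assumed form)."""
    t0 = times[0]
    return [p - avg_slope * (t - t0) for t, p in zip(times, pitches)]
```

With a falling average slope, later syllables in a phrase are shifted upward, so a high tone late in the phrase is not mistaken for a lower one.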

15
Conclusion
  • Employ common acoustic representation
  • Tone (Mandarin), pitch accent (English)
  • Cantonese, recent experiments
  • SVM classifiers, linear kernel: 76%, 81%
  • Local context effects
  • Up to > 20% relative reduction in error
  • Preceding context greatest contribution
  • Carryover vs anticipatory
  • Phrasal context effects
  • Compensation for phrasal contour improves
    recognition
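
The relative reduction in error can be checked against the local-context results, taking the "No context" row as the baseline. The helper name `relative_error_reduction` is hypothetical, and pairing the baseline with the best Mandarin row (76.5) and the full English feature set (81.3) is an assumption about which figures the claim refers to.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Fraction of baseline errors removed; accuracies are in percent."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return (baseline_err - improved_err) / baseline_err

mandarin = relative_error_reduction(68.5, 76.5)  # no context vs. best local-context row
english = relative_error_reduction(75.9, 81.3)   # no context vs. full feature set
print(f"Mandarin: {mandarin:.1%}, English: {english:.1%}")
```

Both values come out above 20%, roughly a quarter of the errors for Mandarin and a little over a fifth for English.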

16
Current & Future Work
  • Application of model to different languages
  • Cantonese, Dschang (Bantu family)
  • Cantonese: 65% acoustic only, 85% w/segmental
  • Integration of additional contextual influence
  • Topic, turn, discourse structure
  • HMSVM, GHMM models
  • http://people.cs.uchicago.edu/levow/projects/tai
  • Supported by NSF Grant 0414919

17
Confusion Matrix (English)
Rows: recognized tone. Columns: manually labeled tone. Cells: % of column (count / column total).
Recognized   Unaccented      High            Low            D.S. High
Unaccented   95 (888/934)    25 (110/440)    100 (12/12)    53.5 (61/114)
High         4.6 (43/934)    73 (322/440)    0              38.5 (44/114)
Low          0               0               0              0
D.S. High    0.3 (3/934)     2 (8/440)       0              8 (9/114)
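
As a consistency check, overall accuracy and per-class recall can be recomputed from the English counts. The counts below are copied from the confusion matrix slide; the function names are hypothetical.

```python
LABELS = ["Unaccented", "High", "Low", "D.S. High"]

# counts[r][c]: syllables recognized as LABELS[r] whose manual label is LABELS[c]
counts = [
    [888, 110, 12, 61],
    [43, 322, 0, 44],
    [0, 0, 0, 0],
    [3, 8, 0, 9],
]

def overall_accuracy(m):
    """Share of all syllables on the diagonal (recognized == manual label)."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total

def per_class_recall(m):
    """Fraction of each manually labeled class that was recognized correctly."""
    out = []
    for c in range(len(m)):
        col_total = sum(row[c] for row in m)
        out.append(m[c][c] / col_total if col_total else 0.0)
    return out
```

The diagonal holds 1219 of 1500 syllables, about 81.3%, which agrees with the English figure reported for the full feature set. The per-class view shows that accuracy is carried by the Unaccented and High classes, while Low is never recognized.
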
19
Confusion Matrix (Mandarin)
Rows: recognized tone. Columns: manually labeled tone. Cells: % of column (count / column total, where given).
Recognized     High          Mid-Rising      Low           High-Falling   Neutral
High           84 (38/45)    9 (5/56)        5 (1/20)      13 (9/68)      0
Mid-Rising     6.7 (3/45)    78.6 (44/56)    10 (2/20)     7 (5/68)       27.3 (3/11)
Low            0             3.6 (2/56)      70 (14/20)    7 (5/68)       27.3
High-Falling   7.4 (4/45)    3.6 (2/56)      10 (2/20)     70 (48/68)     0
Neutral        0             5.3 (3/56)      5 (1/20)      1.5 (1/68)     45

21
Related Work
  • Tonal coarticulation
  • Xu & Sun 02; Xu 97; Shih & Kochanski 00
  • English pitch accent
  • X. Sun 02; Hasegawa-Johnson et al 04; Ross &
    Ostendorf 95
  • Lexical tone recognition
  • SVM recognition of Thai tone: Thubthong 01
  • Context-dependent tone models
  • Wang & Seneff 00; Zhou et al 04

22
Pitch Target Approximation Model
  • Pitch target
  • Linear model
  • Exponentially approximated
  • In practice, assume target well-approximated by
    mid-point (Sun, 02)
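
A sketch of the model, assuming the commonly cited form in which a linear target T(t) = a*t + b is approached through a decaying exponential, y(t) = beta * exp(-lam * t) + a*t + b. The parameter names `a`, `b`, `beta`, `lam` and the function names are assumptions; the slides state only that the target is linear and exponentially approached.

```python
import math

def pitch_target(t, a, b):
    """Linear underlying target T(t) = a*t + b."""
    return a * t + b

def surface_pitch(t, a, b, beta, lam):
    """Surface f0 approaching the target exponentially (lam > 0).

    beta is the offset from the target at syllable onset (t = 0).
    """
    return beta * math.exp(-lam * t) + pitch_target(t, a, b)
```

Because the exponential term shrinks with time, the surface contour lies closer to the target later in the syllable than at onset. This fits with taking slope from the last half of the contour and with the mid-point approximation noted above.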