Letter-to-phoneme conversion
Transcript and Presenter's Notes
1
Letter-to-phoneme conversion
  • Sittichai Jiampojamarn
  • sj@cs.ualberta.ca
  • CMPUT 500 / HUCO 612
  • September 26, 2007

2
Outline
  • Part I
  • Introduction to letter-phoneme conversion
  • Part II
  • Applying Many-to-Many Alignments and Hidden Markov
    Models to Letter-to-Phoneme Conversion (NAACL-HLT 2007)
  • Part III
  • On-going work: discriminative approaches for
    letter-to-phoneme conversion
  • Part IV
  • Possible term projects for CMPUT 500 / HUCO 612

3
The task
  • Converting words to their pronunciations
  • study → s t ʌ d I
  • band → b æ n d
  • phoenix → f i n I k s
  • king → k I ŋ
  • Words are sequences of letters.
  • Pronunciations are sequences of phonemes.
  • Syllabification and stress are ignored.

4
Why is it important?
  • Major component in speech synthesis systems
  • Word similarity based on pronunciation
  • Spelling correction. (Toutanova and Moore, 2001)
  • Linguistic interest in the relationship between
    letters and phonemes.
  • Not a trivial task, but tractable.

5
Trivial solutions?
  • Dictionary look-up: search for answers in a database
  • Requires great effort to construct such a large
    lexicon database.
  • Can't handle new words and misspellings.
  • Rule-based approaches
  • Work well on non-complex languages
  • Fail on complex languages
  • Each word creates its own rules, which ends up as
    memorizing word-phoneme pairs.

6
John Kominek and Alan W. Black, Learning
Pronunciation Dictionaries: Language Complexity
and Word Selection Strategies, in Proceedings of
HLT-NAACL 2006, June 4-9, pp. 232-239.
7
Learning-based approaches
  • Training data
  • Examples of words and their phonemes.
  • Hidden structure
  • band → b æ n d
  • b → b, a → æ, n → n, d → d
  • abode → ə b o d
  • a → ə, b → b, o → o, d → d, e → _

8
Alignments
  • To train L2P, we need alignments between letters
    and phonemes

a → ə   b → b   o → o   d → d   e → _
9
Overview: standard process
10
Letter-to-phoneme alignments
  • Previous work assumed one-to-one alignment for
    simplicity (Daelemans and Bosch, 1997; Black et
    al., 1998; Damper et al., 2005).
  • Expectation-Maximization (EM) algorithms are used
    to optimize the alignment parameters.
  • Matching all possible letters and phonemes
    iteratively until the parameters converge.

11
1-to-1 alignments
  • Initially, the alignment parameters can start from a
    uniform distribution, or from counting all possible
    letter-phoneme mappings (see the counting sketch
    below). Ex. abode → ə b o d

P(a, ə) = 4/5   P(b, b) = 3/5
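
The counting-based initialization can be made concrete with a small sketch (Python; the function name and the '@' symbol standing in for the schwa are my own, not from the slides): enumerate every monotone 1-to-1 alignment in which the surplus letters map to the null phoneme '_', count the letter-phoneme pairings, and normalize per letter. For abode → ə b o d this reproduces the 4/5 and 3/5 figures above.

```python
from collections import defaultdict
from itertools import combinations

def init_alignment_probs(pairs):
    """Initialize P(phoneme | letter) by counting letter-phoneme pairings
    over all monotone 1-to-1 alignments (extra letters map to null '_').
    Assumes every word has at least as many letters as phonemes."""
    counts = defaultdict(lambda: defaultdict(float))
    for letters, phonemes in pairs:
        n_nulls = len(letters) - len(phonemes)
        # choose which letter positions map to the null phoneme
        for null_pos in combinations(range(len(letters)), n_nulls):
            p = 0
            for i, letter in enumerate(letters):
                if i in null_pos:
                    counts[letter]['_'] += 1
                else:
                    counts[letter][phonemes[p]] += 1
                    p += 1
    # normalize per letter
    probs = {}
    for letter, cs in counts.items():
        total = sum(cs.values())
        probs[letter] = {ph: c / total for ph, c in cs.items()}
    return probs

# Example: abode -> @ b o d (using '@' for the schwa)
probs = init_alignment_probs([(list("abode"), ["@", "b", "o", "d"])])
print(probs["a"]["@"])   # 4/5 = 0.8
print(probs["b"]["b"])   # 3/5 = 0.6
```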
12
1-to-1 alignments
  • Find the best possible alignments based on
    current alignment parameters.
  • Based on the alignments found, update the
    parameters.

13
Finding the best possible alignments
  • Dynamic programming
  • In the style of the standard weighted minimum edit
    distance algorithm.
  • The alignment parameter P(l, p) serves as the
    mapping score.
  • Find the alignment that gives the maximum score
    (see the sketch after this slide).
  • Allow null phonemes but not null letters:
  • it is hard to incorporate null letters in the test
    data.

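A minimal sketch of this dynamic-programming search (Python; it assumes a score table like the probs built in the previous sketch, and it simply adds scores for illustration rather than multiplying probabilities): it finds a maximum-score monotone alignment in which a letter may map to the null phoneme '_' but every phoneme must be consumed by some letter.

```python
def best_alignment(letters, phonemes, P):
    """Weighted edit-distance-style DP, maximizing the sum of P[l][p] scores.
    A letter may align to the null phoneme '_', but there are no null letters,
    so len(letters) >= len(phonemes) is assumed."""
    NEG = float("-inf")
    n, m = len(letters), len(phonemes)
    score = [[NEG] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if score[i][j] == NEG:
                continue
            if i < n and j < m:          # letter i -> phoneme j
                s = score[i][j] + P.get(letters[i], {}).get(phonemes[j], 0.0)
                if s > score[i + 1][j + 1]:
                    score[i + 1][j + 1] = s
                    back[i + 1][j + 1] = (i, j, phonemes[j])
            if i < n:                    # letter i -> null phoneme
                s = score[i][j] + P.get(letters[i], {}).get("_", 0.0)
                if s > score[i + 1][j]:
                    score[i + 1][j] = s
                    back[i + 1][j] = (i, j, "_")
    # trace back the best alignment
    align, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj, ph = back[i][j]
        align.append((letters[pi], ph))
        i, j = pi, pj
    return list(reversed(align)), score[n][m]

# e.g. best_alignment(list("abode"), ["@", "b", "o", "d"], probs)
# -> ([('a','@'), ('b','b'), ('o','o'), ('d','d'), ('e','_')], ...)
```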
14-23
Visualization (animated walkthrough of the dynamic-programming alignment; the slides contain no transcript text)
24
Problems with 1-to-1 alignments
  • Double letters: two letters map to one phoneme
    (e.g., ng → ŋ, sh → ʃ, ph → f).

25
Problem with 1-to-1 alignments
  • Double phonemes: one letter maps to two phonemes
    (e.g., x → k s, u → j u).

26
Previous solutions for double phonemes
  • Preprocess using a fixed list of phonemes.
  • k s → X
  • j u → U

Lose "j" and "u"
27
Applying many-to-many alignments and Hidden
Markov Models to Letter-to-Phoneme conversion
  • Sittichai Jiampojamarn, Grzegorz Kondrak and
    Tarek Sherif
  • Proceedings of the Annual Conference of the North
    American Chapter of the Association for
    Computational Linguistics (NAACL-HLT 2007),
    Rochester, NY, April 2007, pp.372-379.

28
Overview: system
29
Many-to-many alignments
  • EM-based method.
  • Extended from the forward-backward training of a
    one-to-one stochastic transducer (Ristad and
    Yianilos, 1998).
  • Allow one or two letters to map to null, one, or
    two phonemes.

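A rough sketch of how the search space widens for many-to-many alignment (Python; only a forward maximum-score pass over a hypothetical score table delta is shown, whereas the actual method trains these scores with EM using forward-backward counts over a stochastic transducer):

```python
def m2m_forward(letters, phonemes, delta, max_l=2, max_p=2):
    """delta[(ls, ps)] scores mapping a letter substring ls to a phoneme
    substring ps; ps may be '' (null), but letters may not be skipped."""
    NEG = float("-inf")
    n, m = len(letters), len(phonemes)
    score = [[NEG] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if score[i][j] == NEG:
                continue
            for dl in range(1, max_l + 1):        # consume 1 or 2 letters
                for dp in range(0, max_p + 1):    # produce 0, 1, or 2 phonemes
                    if i + dl > n or j + dp > m:
                        continue
                    ls = "".join(letters[i:i + dl])
                    ps = " ".join(phonemes[j:j + dp])
                    s = score[i][j] + delta.get((ls, ps), 0.0)
                    score[i + dl][j + dp] = max(score[i + dl][j + dp], s)
    return score[n][m]
```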
30-32
Many-to-many alignments (worked alignment examples; the slides contain no transcript text)
33
Prediction problem
  • Should the prediction model generate phonemes
    from one or two letters?
  • gash → g æ ʃ    gasholder → g æ s h o l d ə r

34
Letter chunking
  • A bigram letter chunking prediction automatically
    discovers double letters.

Ex. longs
35
Overview: system
36
Phoneme prediction
  • Once the training examples are aligned, we need a
    phoneme prediction model.
  • Classification task or sequence prediction?

37
Instance based learning
  • Store the training examples.
  • The predicted class is assigned by searching for
    the most similar training instance.
  • Similarity functions:
  • Hamming distance, Euclidean distance, etc. (see the
    sketch after this slide).

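A toy sketch of instance-based prediction (Python; the actual experiments use the IB1 algorithm in TiMBL, and the window size and padding symbol here are arbitrary illustrative choices): each letter together with its surrounding letter context forms an instance, and the predicted phoneme comes from the most similar stored instance under Hamming distance.

```python
def extract_instances(aligned_word, window=2):
    """Turn an aligned word (list of (letter, phoneme) pairs) into
    fixed-width instances: letter context around position i -> phoneme."""
    letters = [l for l, _ in aligned_word]
    padded = ["#"] * window + letters + ["#"] * window
    instances = []
    for i, (_, phoneme) in enumerate(aligned_word):
        context = tuple(padded[i:i + 2 * window + 1])
        instances.append((context, phoneme))
    return instances

def hamming(a, b):
    """Number of positions where two equal-length contexts differ."""
    return sum(x != y for x, y in zip(a, b))

def predict_phoneme(context, training_instances):
    """1-nearest-neighbour prediction with Hamming distance."""
    return min(training_instances, key=lambda inst: hamming(context, inst[0]))[1]
```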
38
Basic HMMs
  • A basic sequence-based prediction method.
  • In L2P,
  • letters are observations
  • phonemes are states
  • Output phoneme sequences depend on both emission
    and transition probabilities.

39
Applying HMM
  • Use instance-based learning to produce a list of
    candidate phonemes with confidence values
    conf(phoneme_i) for each letter_i (the emission
    probability).
  • Use a language model of phoneme sequences in the
    training data to obtain the transition probability
  • P(phoneme_i | phoneme_i-1, ..., phoneme_i-n).
  • (A decoding sketch follows this slide.)

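A sketch of this decoding step (Python; the function names and the bigram back-off constant are assumptions, and the real system may combine the two scores differently): the classifier's candidate confidences play the role of emission scores, a phoneme bigram model supplies the transition scores, and a Viterbi search in log space picks the best sequence. This kind of combination is what lets "b E r I d" outscore "b E r aI d" in the "buried" example on the next slide.

```python
import math

def decode(candidates, bigram):
    """candidates[i]: dict {phoneme: conf > 0} for letter i (emission-like
    scores); bigram[(prev, cur)]: P(cur | prev). Returns the phoneme
    sequence maximizing the combined score."""
    prev = {"<s>": (0.0, [])}          # state -> (log score, path)
    for cands in candidates:
        cur = {}
        for ph, conf in cands.items():
            best = None
            for p, (score, path) in prev.items():
                s = score + math.log(conf) + math.log(bigram.get((p, ph), 1e-9))
                if best is None or s > best[0]:
                    best = (s, path + [ph])
            cur[ph] = best
        prev = cur
    return max(prev.values(), key=lambda v: v[0])[1]
```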
40
Visualization
Buried → b E r aI d : 2.38 x 10^-8
Buried → b E r I d : 2.23 x 10^-6
41
Evaluation
  • Data sets
  • English: CMUDict (112K), Celex (65K).
  • Dutch: Celex (116K).
  • German: Celex (49K).
  • French: Brulex (27K).
  • The IB1 algorithm implemented in the TiMBL package
    is used as the classifier (Daelemans et al., 2004).
  • Results are reported as word accuracy based on
    10-fold cross-validation.

42-45
(No Transcript)
46
Messages
  • Many-to-many alignments show significant
    improvements over one-to-one traditional
    alignments.
  • The HMM-like approach helps when the local
    classifier has difficulty predicting phonemes.

47
Criticism
  • Joint models:
  • alignments, chunking, prediction, and HMM.
  • Error propagation:
  • errors pass from one model to the next and are
    unlikely to be corrected later.
  • Can we combine and optimize at once? Or at least
    allow the system to correct past errors?

48
On-going work
  • Discriminative approaches for letter-to-phoneme
    conversion

49
Online discriminative learning
  • Let x be an input word and y its output phoneme
    sequence.
  • Φ(x, y) represents features describing x and y.
  • α is a weight vector for Φ(x, y).

50
Online training algorithm
  • Initially, α = 0
  • For k iterations:
  • For all letter-phoneme sequence pairs (x, y):
  • update the weights according to Φ(x, y) and Φ(x, ŷ),
    where ŷ is the model's current prediction for x.

51
Perceptron update (Collins, 2002)
  • Simple update training method.
  • Tries to move the weights in the direction of the
    correct answer when the prediction is wrong (see
    the sketch after this slide).

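A compact sketch of the structured perceptron update described above (Python; phi and predict are placeholders for a feature function and a decoder, which the slides do not spell out):

```python
from collections import defaultdict

def perceptron_train(data, phi, predict, iterations=10):
    """Collins-style structured perceptron (a sketch, not the exact system).
    data: list of (word, gold phoneme sequence); phi(x, y) returns a sparse
    feature dict; predict(x, w) returns the highest-scoring y under weights w."""
    w = defaultdict(float)
    for _ in range(iterations):
        for x, y in data:
            y_hat = predict(x, w)
            if y_hat != y:
                # move weights toward the correct answer, away from the wrong one
                for f, v in phi(x, y).items():
                    w[f] += v
                for f, v in phi(x, y_hat).items():
                    w[f] -= v
    return w
```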
52
Examples
  • Separable case

Adapted from Dan Klein's tutorial slides at NAACL 2007.
53
Examples
  • Non-separable case

Adapted from Dan Klein's tutorial slides at NAACL 2007.
54
Issues with Perceptron
  • Overtraining: test/held-out accuracy usually
    rises, then falls.
  • Regularization:
  • if the data isn't separable, weights often thrash
    around.
  • Finds a barely separating solution

Taken from Dan Klein's tutorial slides at NAACL 2007.
55
Margin Infused Relaxed Algorithm (MIRA) (Crammer
and Singer, 2003)
  • Uses an n-best list to update weights.
  • Separate the correct output from the incorrect ones
    by a margin at least as large as the loss function,
  • while keeping the weight changes as small as
    possible (a 1-best sketch follows this slide).

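The talk uses MIRA with an n-best list, which requires solving a small quadratic program per example; as a simpler illustration, here is the closed-form 1-best variant (Python; phi and loss are placeholders), which enforces a margin of at least loss(y_gold, y_hat) while keeping the weight change as small as possible:

```python
def mira_update(w, x, y_gold, y_hat, phi, loss, C=1.0):
    """1-best MIRA: smallest change to w such that the gold output outscores
    y_hat by a margin of at least loss(y_gold, y_hat) (step clipped at C)."""
    # feature difference between the correct and predicted outputs
    diff = dict(phi(x, y_gold))
    for f, v in phi(x, y_hat).items():
        diff[f] = diff.get(f, 0.0) - v
    margin = sum(w.get(f, 0.0) * v for f, v in diff.items())
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return w                      # identical feature vectors: nothing to do
    tau = min(C, max(0.0, (loss(y_gold, y_hat) - margin) / norm_sq))
    for f, v in diff.items():
        w[f] = w.get(f, 0.0) + tau * v
    return w
```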
56
Loss function in letter-to-phoneme
  • Describe the loss of an incorrect prediction
    compared to the correct one.
  • Word error (0/1), phoneme error, or combination.

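A sketch of the loss functions listed above (Python; treating "phoneme error" as the Levenshtein distance between phoneme sequences is my assumption, and the combination shown is simply their sum):

```python
def word_loss(y_gold, y_hat):
    """0/1 word error."""
    return 0.0 if y_gold == y_hat else 1.0

def phoneme_loss(y_gold, y_hat):
    """Levenshtein distance between phoneme sequences (one possible
    definition of phoneme error)."""
    d = list(range(len(y_hat) + 1))
    for i, g in enumerate(y_gold, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(y_hat, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (g != h))
    return float(d[-1])

def combined_loss(y_gold, y_hat):
    """Combination loss: word error plus phoneme error."""
    return word_loss(y_gold, y_hat) + phoneme_loss(y_gold, y_hat)
```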
57
Results
  • Incomplete!
  • MIRA outperforms the Perceptron.
  • Using the 0/1 loss or the combination loss is better
    than the phoneme loss function alone.
  • Overall, results show better performance than
    previous work.

58
Possible term projects
59
Possible term projects
  • Explore more linguistic features.
  • Explore machine translation systems for
    letter-to-phoneme conversion.
  • Unsupervised approaches for letter-to-phoneme
    conversion.
  • Other cool ideas to improve on a partial system
  • Data for evaluation are provided
  • Alignments are provided.
  • L2P models are provided.

60
Linguistic features
  • Looking for linguistic features to help L2P
  • Most systems incorporate letter (n-gram) features
    in some way.
  • The new features must be obtained using only word
    information.
  • Work already done:
  • Syllabification: Susan's thesis
  • Finds syllabification breaks on letters using an SVM
    approach.

61
Machine translation approach
  • The L2P problem can be seen as a (simple) machine
    translation problem,
  • where we'd like to translate letters to phonemes.
  • Consider L2P → MT:
  • Letters → words
  • Words → sentences
  • Phonemes → target sentences
  • Moses -- a baseline SMT system, ACL 2007
  • http://www.statmt.org/wmt07/baseline.html
  • May need to also look at GIZA, Pharaoh, Carmel,
    etc. (see the data-preparation sketch after this
    slide).

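A small sketch of the data preparation this reformulation implies (Python; the file names and lexicon format are illustrative, not part of any toolkit): each word becomes a source "sentence" of space-separated letters and each pronunciation a target "sentence" of space-separated phonemes, the plain parallel-text format that Moses/GIZA++ training expects.

```python
def write_moses_corpus(lexicon, src_path="train.letters", tgt_path="train.phonemes"):
    """lexicon: list of (word, phoneme list) pairs, e.g. ("band", ["b","ae","n","d"]).
    Writes one 'sentence' per line: letters separated by spaces on the source
    side, phonemes separated by spaces on the target side."""
    with open(src_path, "w") as src, open(tgt_path, "w") as tgt:
        for word, phonemes in lexicon:
            src.write(" ".join(word.lower()) + "\n")
            tgt.write(" ".join(phonemes) + "\n")

# write_moses_corpus([("band", ["b", "ae", "n", "d"]),
#                     ("phoenix", ["f", "i", "n", "I", "k", "s"])])
```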
62
Unsupervised approaches
  • Assume we don't have examples of word-phoneme
    pairs to train a model.
  • We can start from a list of possible
    letter-phoneme mappings,
  • or assume we have only a small set of example pairs
    (100 pairs).
  • Don't expect to outperform the supervised
    approach, but take advantage of being an
    unsupervised method.

63
References
  • Collins, M. 2002. Discriminative training methods
    for hidden Markov models: theory and experiments
    with perceptron algorithms. In Proceedings of the
    ACL-02 Conference on Empirical Methods in Natural
    Language Processing, Volume 10, pp. 1-8.
    Association for Computational Linguistics,
    Morristown, NJ.
  • Crammer, K. and Singer, Y. 2003.
    Ultraconservative online algorithms for
    multiclass problems. J. Mach. Learn. Res. 3 (Mar.
    2003), 951-991.
  • Kristina Toutanova and Robert C. Moore. 2001.
    Pronunciation modeling for improved spelling
    correction. In Proceedings of ACL-02, pp. 144-151.
  • John Kominek and Alan W. Black. 2006. Learning
    Pronunciation Dictionaries: Language Complexity
    and Word Selection Strategies. In Proceedings of
    HLT-NAACL 2006, pp. 232-239.
  • Walter M. P. Daelemans and Antal P. J. van den
    Bosch. 1997. Language-independent data-oriented
    grapheme-to-phoneme conversion. In Progress in
    Speech Synthesis, pages 77-89. Springer, New
    York.
  • Alan W Black, Kevin Lenzo, and Vincent Pagel.
    1998. Issues in building general letter to sound
    rules. In The Third ESCA Workshop in Speech
    Synthesis, pages 77-80.

64
References
  • Robert I. Damper, Yannick Marchand, John D.S.
    Marsters, and Alexander I. Bazin. 2005. Aligning
    text and phonemes for speech technology
    applications using an EM-like algorithm.
    International Journal of Speech Technology,
    8(2):147-160, June 2005.
  • Eric Sven Ristad and Peter N. Yianilos. 1998.
    Learning string-edit distance. IEEE
    Transactions on Pattern Analysis and Machine
    Intelligence, 20(5):522-532.
  • Walter Daelemans, Jakub Zavrel, Ko Van Der Sloot,
    and Antal Van Den Bosch. 2004. TiMBL: Tilburg
    Memory Based Learner, version 5.1, reference
    guide. ILK Technical Report Series 04-02.