Title: Letter-to-phoneme conversion
1. Letter-to-phoneme conversion
- Sittichai Jiampojamarn
- sj_at_cs.ualberta.ca
- CMPUT 500 / HUCO 612
- September 26, 2007
2. Outline
- Part I: Introduction to letter-to-phoneme conversion
- Part II: Applying many-to-many alignments and Hidden Markov Models to letter-to-phoneme conversion (NAACL 2007)
- Part III: On-going work: discriminative approaches for letter-to-phoneme conversion
- Part IV: Possible term projects for CMPUT 500 / HUCO 612
3. The task
- Converting words to their pronunciations:
- study → s t ʌ d I
- band → b æ n d
- phoenix → f i n I k s
- king → k I ŋ
- Words = sequences of letters.
- Pronunciations = sequences of phonemes.
- Ignoring syllabification and stress.
4. Why is it important?
- Major component in speech synthesis systems.
- Word similarity based on pronunciation.
- Spelling correction (Toutanova and Moore, 2001).
- Linguistic interest in the relationships between letters and phonemes.
- Not a trivial task, but tractable.
5. Trivial solutions?
- Dictionary: look up answers in a database.
- Great effort to construct such a large lexicon database.
- Can't handle new words and misspellings.
- Rule-based approaches:
- work well on non-complex languages,
- fail on complex languages:
- each word ends up creating its own rules, which amounts to memorizing word-phoneme pairs.
6. John Kominek and Alan W. Black, Learning Pronunciation Dictionaries: Language Complexity and Word Selection Strategies, in Proceedings of HLT-NAACL 2006, June 4-9, pp. 232-239.
7. Learning-based approaches
- Training data:
- examples of words and their phonemes.
- Hidden structure:
- band → b æ n d
- b → b, a → æ, n → n, d → d
- abode → ə b o d
- a → ə, b → b, o → o, d → d, e → _
8. Alignments
- To train L2P, we need alignments between letters and phonemes:
a → ə, b → b, o → o, d → d, e → _
9. Overview: standard process
10. Letter-to-phoneme alignments
- Previous work assumed one-to-one alignment for simplicity (Daelemans and Bosch, 1997; Black et al., 1998; Damper et al., 2005).
- Expectation-Maximization (EM) algorithms are used to optimize the alignment parameters.
- All possible letters and phonemes are matched iteratively until the parameters converge.
11. 1-to-1 alignments
- Initially, the alignment parameters can start from a uniform distribution, or from counts of all possible letter-phoneme mappings.
- Ex. abode → ə b o d: P(a, ə) = 4/5, P(b, b) = 3/5
12. 1-to-1 alignments
- Find the best possible alignments based on the current alignment parameters.
- Based on the alignments found, update the parameters.
13. Finding the best possible alignments
- Dynamic programming:
- in the style of the standard weighted minimum-edit-distance algorithm.
- The alignment parameter P(l, p) serves as the mapping score.
- Find the alignments that give the maximum score.
- Null phonemes are allowed, but null letters are not:
- it is hard to incorporate null letters in the testing data.
- (A code sketch of this alignment loop follows.)
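A minimal sketch of slides 11-13 in Python, assuming a toy lexicon of (word, phoneme-sequence) pairs. This is hard (Viterbi-style) EM with additive scores; every name here is illustrative rather than the original implementation:

```python
from collections import defaultdict

NULL = "_"  # phoneme symbol for a letter aligned to nothing

def best_alignment(word, phons, score):
    """Weighted minimum-edit-distance DP: each letter maps to one
    phoneme or to NULL; null letters are not allowed."""
    n, m = len(word), len(phons)
    NEG = float("-inf")
    best = [[NEG] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == NEG:
                continue
            if i < n and j < m:  # letter i consumes phoneme j
                s = best[i][j] + score[(word[i], phons[j])]
                if s > best[i + 1][j + 1]:
                    best[i + 1][j + 1] = s
                    back[i + 1][j + 1] = (i, j, phons[j])
            if i < n:            # letter i maps to the null phoneme
                s = best[i][j] + score[(word[i], NULL)]
                if s > best[i + 1][j]:
                    best[i + 1][j] = s
                    back[i + 1][j] = (i, j, NULL)
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):      # recover the letter-phoneme pairs
        i, j, p = back[i][j]
        pairs.append((word[i], p))
    return list(reversed(pairs))

def em_align(lexicon, iterations=10):
    """Alternate between finding the best alignments under the current
    parameters and re-estimating the parameters from the counts."""
    score = defaultdict(lambda: 1.0)  # uniform start
    for _ in range(iterations):
        counts = defaultdict(float)
        for word, phons in lexicon:
            for l, p in best_alignment(word, phons, score):
                counts[(l, p)] += 1.0
        total = sum(counts.values())
        score = defaultdict(lambda: 1e-6,
                            {lp: c / total for lp, c in counts.items()})
    return score

lexicon = [("abode", ["@", "b", "o", "d"]),  # "@" stands in for schwa
           ("band", ["b", "ae", "n", "d"])]
score = em_align(lexicon)
print(best_alignment("abode", ["@", "b", "o", "d"], score))
```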
14-23. Visualization (step-by-step animation of the alignment dynamic programming; figures not transcribed)
24. Problems with 1-to-1 alignments
- Double letters: two letters map to one phoneme (e.g., ng → ŋ, sh → ʃ, ph → f).
25. Problems with 1-to-1 alignments
- Double phonemes: one letter maps to two phonemes (e.g., x → k s, u → j u).
26. Previous solutions for double phonemes
- Preprocess using a fixed list of phonemes:
- k s → X
- j u → U
- This loses the identity of "j" and "u".
27. Applying many-to-many alignments and Hidden Markov Models to letter-to-phoneme conversion
- Sittichai Jiampojamarn, Grzegorz Kondrak and Tarek Sherif.
- Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2007), Rochester, NY, April 2007, pp. 372-379.
28. Overview: system
29. Many-to-many alignments
- EM-based method.
- Extends the forward-backward training of a one-to-one stochastic transducer (Ristad and Yianilos, 1998).
- Allows one or two letters to map to null, one, or two phonemes (see the DP sketch below).
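As a sketch, the alignment DP from before generalizes by letting each step consume one or two letters and emit zero to two phonemes. Here `score` is a hypothetical dict from (letter chunk, phoneme chunk) to a real-valued score; note the paper itself trains these scores with forward-backward EM rather than the hard maximum shown:

```python
def best_m2m_score(word, phons, score, max_letters=2, max_phons=2):
    """Best many-to-many alignment score: each step maps 1-2 letters
    to 0-2 phonemes (an empty phoneme chunk means deletion)."""
    n, m = len(word), len(phons)
    NEG = float("-inf")
    dp = [[NEG] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == NEG:
                continue
            for dl in range(1, max_letters + 1):     # letters consumed
                for dph in range(0, max_phons + 1):  # phonemes emitted
                    if i + dl > n or j + dph > m:
                        continue
                    chunk = word[i:i + dl]           # e.g. "ph"
                    ph = tuple(phons[j:j + dph])     # e.g. ("f",)
                    s = dp[i][j] + score.get((chunk, ph), NEG)
                    if s > dp[i + dl][j + dph]:
                        dp[i + dl][j + dph] = s
    return dp[n][m]
```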
30-32. Many-to-many alignments (worked example slides; figures not transcribed)
33. Prediction problem
- Should the prediction model generate phonemes from one letter or from two?
- gash → g æ ʃ vs. gasholder → g æ s h o l d ə r
34. Letter chunking
- A bigram letter-chunking predictor automatically discovers double letters.
- Ex. longs
35. Overview: system
36. Phoneme prediction
- Once the training examples are aligned, we need a phoneme prediction model.
- Classification task or sequence prediction?
37. Instance-based learning
- Store the training examples.
- The predicted class is assigned by finding the most similar training instance.
- Similarity functions: Hamming distance, Euclidean distance, etc. (a minimal sketch follows).
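A minimal instance-based classifier along these lines, assuming fixed-width letter-context windows as features. This stands in for the IB1 algorithm in TiMBL that the talk actually uses:

```python
def hamming(a, b):
    """Number of positions where two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))

class NearestNeighbor:
    """Store all training instances; predict by the most similar one."""
    def __init__(self):
        self.examples = []                 # (features, phoneme) pairs

    def train(self, features, label):
        self.examples.append((features, label))

    def predict(self, features):
        best = min(self.examples, key=lambda ex: hamming(ex[0], features))
        return best[1]

# Context window of one letter on each side; "_" pads word boundaries.
nn = NearestNeighbor()
nn.train(("_", "b", "a"), "b")      # the 'b' in "band"
nn.train(("b", "a", "n"), "ae")     # the 'a' in "band"
print(nn.predict(("_", "b", "o")))  # -> "b" (closest stored window)
```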
38. Basic HMMs
- A basic sequence-based prediction method.
- In L2P:
- letters are observations,
- phonemes are states.
- Output phoneme sequences depend on both emission and transition probabilities.
39. Applying HMMs
- Use instance-based learning to produce a list of candidate phonemes with confidence values conf(phone_i) for each letter_i (the emission probabilities).
- Use a language model of the phoneme sequences in the training data to obtain the transition probabilities: P(phone_i | phone_{i-1}, ..., phone_{i-n}) (a decoding sketch follows).
40. Visualization
- Buried → b E r aI d : 2.38 × 10^-8
- Buried → b E r I d : 2.23 × 10^-6
41. Evaluation
- Data sets:
- English: CMUDict (112K), Celex (65K)
- Dutch: Celex (116K)
- German: Celex (49K)
- French: Brulex (27K)
- The classifier is the IB1 algorithm implemented in the TiMBL package (Daelemans et al., 2004).
- Results are reported as word accuracy based on 10-fold cross-validation.
42-45. (Results slides; no transcript available)
46. Messages
- Many-to-many alignments show significant improvements over traditional one-to-one alignments.
- The HMM-like approach helps when the local classifier has difficulty predicting phonemes.
47. Criticism
- Joint models:
- alignments, chunking, prediction, and HMM.
- Error propagation:
- errors pass from one model to the others and are unlikely to be corrected later.
- Can we combine and optimize everything at once? Or at least allow the system to correct past errors?
48. On-going work
- Discriminative approaches for letter-to-phoneme conversion.
49. Online discriminative learning
- Let x be an input word and y its output phoneme sequence.
- Φ(x, y) represents features describing x and y.
- α is a weight vector for Φ(x, y).
50. Online training algorithm
- Initially, α = 0.
- For k iterations:
- for all letter-phoneme sequence pairs (x, y):
- ŷ = argmax_{y'} α · Φ(x, y')
- update the weights according to Φ(x, y) and Φ(x, ŷ) (a sketch follows).
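A sketch of this training loop for the perceptron update described on the next slide, with `phi` and `decode` as simplified stand-ins; the real system decodes whole phoneme sequences, so the greedy decoder here is only for illustration:

```python
from collections import defaultdict

def phi(x, y):
    """Sparse features over a word x and phoneme sequence y:
    letter-phoneme pairs plus phoneme bigrams (an illustrative choice).
    Assumes the pair is already 1-to-1 aligned."""
    f = defaultdict(float)
    for l, p in zip(x, y):
        f[("emit", l, p)] += 1.0
    for p1, p2 in zip(y, y[1:]):
        f[("trans", p1, p2)] += 1.0
    return f

def decode(x, w, phonemes):
    """Stub argmax: greedily picks the best phoneme per letter.
    (A real decoder searches over whole phoneme sequences.)"""
    return [max(phonemes, key=lambda p: w[("emit", l, p)]) for l in x]

def train(data, phonemes, k=5):
    w = defaultdict(float)               # initially, all weights are 0
    for _ in range(k):                   # k passes over the training data
        for x, y in data:                # each (word, phonemes) pair
            y_hat = decode(x, w, phonemes)
            if y_hat != y:               # perceptron update on a mistake:
                for feat, v in phi(x, y).items():
                    w[feat] += v         # reward features of the truth
                for feat, v in phi(x, y_hat).items():
                    w[feat] -= v         # penalize features of the guess
    return w

w = train([("band", ["b", "ae", "n", "d"])], phonemes={"b", "ae", "n", "d"})
```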
51. Perceptron update (Collins, 2002)
- A simple update rule for training.
- Move the weights in the direction of the correct answer whenever a prediction is wrong (the update is written out below).
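In the notation of slide 49, Collins's update when the predicted sequence differs from the correct one is:

```latex
\hat{y} = \arg\max_{y'} \, \alpha \cdot \Phi(x, y'), \qquad
\alpha \leftarrow \alpha + \Phi(x, y) - \Phi(x, \hat{y})
```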
52-53. Examples
Adapted from Dan Klein's tutorial slides at NAACL 2007. (Figures not transcribed.)
54. Issues with the perceptron
- Overtraining: test / held-out accuracy usually rises, then falls.
- Regularization: if the data isn't separable, the weights often thrash around.
- It finds only a barely separating solution.
Taken from Dan Klein's tutorial slides at NAACL 2007.
55. Margin Infused Relaxed Algorithm (MIRA) (Crammer and Singer, 2003)
- Uses an n-best list to update the weights:
- separate the correct answer from each candidate by a margin at least as large as the loss function (written out below),
- while keeping the weight changes as small as possible.
56. Loss functions in letter-to-phoneme
- Describe the loss of an incorrect prediction compared to the correct one.
- Word error (0/1), phoneme error, or a combination (sketched below).
57. Results
- Incomplete!
- MIRA outperforms the perceptron.
- The 0/1 loss and the combination loss perform better than the phoneme loss alone.
- Overall, the results improve on previous work.
58. Possible term projects
59. Possible term projects
- Explore more linguistic features.
- Explore machine translation systems for letter-to-phoneme conversion.
- Unsupervised approaches for letter-to-phoneme conversion.
- Other cool ideas to improve on a partial system:
- data for evaluation are provided,
- alignments are provided,
- an L2P model is provided.
60. Linguistic features
- Look for linguistic features that help L2P.
- Most systems incorporate letter n-gram features in some way.
- The new features must be obtained using only word information.
- Work already done:
- Syllabification (Susan's thesis):
- finding syllabification breaks in letter strings using an SVM approach.
61. Machine translation approach
- The L2P problem can be seen as a (simple) machine translation problem:
- we'd like to translate letters into phonemes.
- Consider L2P as MT:
- letters → words
- words → sentences
- phonemes → target sentences
- Moses: a baseline SMT system (ACL 2007)
- http://www.statmt.org/wmt07/baseline.html
- May also need to look at GIZA, Pharaoh, Carmel, etc. (a data-preparation sketch follows).
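Under this analogy, preparing a training corpus is just writing space-separated letters and phonemes to a pair of parallel files, one word per line, which is the usual input format for Moses-style pipelines; the file names here are made up:

```python
pairs = [("band", ["b", "ae", "n", "d"]),
         ("phoenix", ["f", "i", "n", "I", "k", "s"])]

# Each word becomes one "sentence": letters on the source side,
# phonemes on the target side, aligned line by line.
with open("corpus.letters", "w") as src, open("corpus.phonemes", "w") as tgt:
    for word, phons in pairs:
        src.write(" ".join(word) + "\n")
        tgt.write(" ".join(phons) + "\n")
```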
62. Unsupervised approaches
- Assume we don't have examples of word-phoneme pairs to train a model.
- We can start from a list of possible letter-phoneme mappings.
- Or assume we have only a small set of example pairs (100 pairs).
- Don't expect to outperform the supervised approaches, but take advantage of being an unsupervised method.
63. References
- Collins, M. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, pages 1-8. Association for Computational Linguistics, Morristown, NJ.
- Crammer, K. and Singer, Y. 2003. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951-991.
- Kristina Toutanova and Robert C. Moore. 2001. Pronunciation modeling for improved spelling correction. In Proceedings of ACL-02, pages 144-151.
- John Kominek and Alan W. Black. 2006. Learning pronunciation dictionaries: Language complexity and word selection strategies. In Proceedings of HLT-NAACL 2006, pages 232-239.
- Walter M. P. Daelemans and Antal P. J. van den Bosch. 1997. Language-independent data-oriented grapheme-to-phoneme conversion. In Progress in Speech Synthesis, pages 77-89. Springer, New York.
- Alan W. Black, Kevin Lenzo, and Vincent Pagel. 1998. Issues in building general letter to sound rules. In The Third ESCA Workshop on Speech Synthesis, pages 77-80.
64. References
- Robert I. Damper, Yannick Marchand, John D. S. Marsters, and Alexander I. Bazin. 2005. Aligning text and phonemes for speech technology applications using an EM-like algorithm. International Journal of Speech Technology, 8(2):147-160.
- Eric Sven Ristad and Peter N. Yianilos. 1998. Learning string-edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5):522-532.
- Walter Daelemans, Jakub Zavrel, Ko Van Der Sloot, and Antal Van Den Bosch. 2004. TiMBL: Tilburg Memory Based Learner, version 5.1, reference guide. ILK Technical Report Series 04-02.