Title: Letter-to-phoneme conversion
1. Letter-to-phoneme conversion
- Sittichai Jiampojamarn
- sj_at_cs.ualberta.ca
- CMPUT 500 / HUCO 612
- September 26, 2007
2. Outline
- Part I: Introduction to letter-to-phoneme conversion
- Part II: Applying many-to-many alignments and Hidden Markov Models to letter-to-phoneme conversion (NAACL 2007)
- Part III: On-going work: discriminative approaches for letter-to-phoneme conversion
- Part IV: Possible term projects for CMPUT 500 / HUCO 612
3. The task
- Converting words to their pronunciations:
- study → s t ʌ d I
- band → b æ n d
- phoenix → f i n I k s
- king → k I ŋ
- Words = sequences of letters.
- Pronunciations = sequences of phonemes.
- Ignoring syllabification and stress.
4. Why is it important?
- Major component in speech synthesis systems.
- Word similarity based on pronunciation.
- Spelling correction (Toutanova and Moore, 2001).
- Linguistic interest in the relationships between letters and phonemes.
- Not a trivial task, but tractable.
5. Trivial solutions?
- Dictionary: look up answers in a database.
- Great effort to construct such a large lexicon database.
- Can't handle new words and misspellings.
- Rule-based approaches:
- work well on non-complex languages,
- fail on complex languages:
- each word ends up creating its own rules, which amounts to memorizing word-phoneme pairs.
6. John Kominek and Alan W. Black, Learning Pronunciation Dictionaries: Language Complexity and Word Selection Strategies, in Proceedings of HLT-NAACL 2006, June 4-9, pp. 232-239.
7. Learning-based approaches
- Training data:
- examples of words and their phonemes.
- Hidden structure:
- band → b æ n d
- b → b, a → æ, n → n, d → d
- abode → ə b o d
- a → ə, b → b, o → o, d → d, e → _
8. Alignments
- To train L2P, we need alignments between letters and phonemes:
a → ə, b → b, o → o, d → d, e → _
9. Overview: standard process
10. Letter-to-phoneme alignments
- Previous work assumed one-to-one alignment for simplicity (Daelemans and Bosch, 1997; Black et al., 1998; Damper et al., 2005).
- Expectation-Maximization (EM) algorithms are used to optimize the alignment parameters.
- All possible letters and phonemes are matched iteratively until the parameters converge.
11. 1-to-1 alignments
- Initially, the alignment parameters can start from a uniform distribution, or from counts of all possible letter-phoneme mappings.
- Ex. abode → ə b o d: P(a, ə) = 4/5, P(b, b) = 3/5
12. 1-to-1 alignments
- Find the best possible alignments based on the current alignment parameters.
- Based on the alignments found, update the parameters.
13. Finding the best possible alignments
- Dynamic programming:
- in the style of the standard weighted minimum-edit-distance algorithm.
- The alignment parameter P(l, p) serves as the mapping score.
- Find the alignments that give the maximum score.
- Null phonemes are allowed, but null letters are not:
- it is hard to incorporate null letters in the testing data.
- (A code sketch of this alignment loop follows.)
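A minimal sketch of slides 11-13 in Python, assuming a toy lexicon of (word, phoneme-sequence) pairs. This is hard (Viterbi-style) EM with additive scores; every name here is illustrative rather than the original implementation:

```python
from collections import defaultdict

NULL = "_"  # phoneme symbol for a letter aligned to nothing

def best_alignment(word, phons, score):
    """Weighted minimum-edit-distance DP: each letter maps to one
    phoneme or to NULL; null letters are not allowed."""
    n, m = len(word), len(phons)
    NEG = float("-inf")
    best = [[NEG] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == NEG:
                continue
            if i < n and j < m:  # letter i consumes phoneme j
                s = best[i][j] + score[(word[i], phons[j])]
                if s > best[i + 1][j + 1]:
                    best[i + 1][j + 1] = s
                    back[i + 1][j + 1] = (i, j, phons[j])
            if i < n:            # letter i maps to the null phoneme
                s = best[i][j] + score[(word[i], NULL)]
                if s > best[i + 1][j]:
                    best[i + 1][j] = s
                    back[i + 1][j] = (i, j, NULL)
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):      # recover the letter-phoneme pairs
        i, j, p = back[i][j]
        pairs.append((word[i], p))
    return list(reversed(pairs))

def em_align(lexicon, iterations=10):
    """Alternate between finding the best alignments under the current
    parameters and re-estimating the parameters from the counts."""
    score = defaultdict(lambda: 1.0)  # uniform start
    for _ in range(iterations):
        counts = defaultdict(float)
        for word, phons in lexicon:
            for l, p in best_alignment(word, phons, score):
                counts[(l, p)] += 1.0
        total = sum(counts.values())
        score = defaultdict(lambda: 1e-6,
                            {lp: c / total for lp, c in counts.items()})
    return score

lexicon = [("abode", ["@", "b", "o", "d"]),  # "@" stands in for schwa
           ("band", ["b", "ae", "n", "d"])]
score = em_align(lexicon)
print(best_alignment("abode", ["@", "b", "o", "d"], score))
```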
14-23. Visualization (step-by-step animation of the alignment dynamic programming; figures not transcribed)
24. Problems with 1-to-1 alignments
- Double letters: two letters map to one phoneme (e.g., ng → ŋ, sh → ʃ, ph → f).
25. Problems with 1-to-1 alignments
- Double phonemes: one letter maps to two phonemes (e.g., x → k s, u → j u).
26. Previous solutions for double phonemes
- Preprocess using a fixed list of phonemes:
- k s → X
- j u → U
- This loses the identity of "j" and "u".
27. Applying many-to-many alignments and Hidden Markov Models to letter-to-phoneme conversion
- Sittichai Jiampojamarn, Grzegorz Kondrak and Tarek Sherif.
- Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2007), Rochester, NY, April 2007, pp. 372-379.
28. Overview: system
29. Many-to-many alignments
- EM-based method.
- Extends the forward-backward training of a one-to-one stochastic transducer (Ristad and Yianilos, 1998).
- Allows one or two letters to map to null, one, or two phonemes (see the DP sketch below).
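As a sketch, the alignment DP from before generalizes by letting each step consume one or two letters and emit zero to two phonemes. Here `score` is a hypothetical dict from (letter chunk, phoneme chunk) to a real-valued score; note the paper itself trains these scores with forward-backward EM rather than the hard maximum shown:

```python
def best_m2m_score(word, phons, score, max_letters=2, max_phons=2):
    """Best many-to-many alignment score: each step maps 1-2 letters
    to 0-2 phonemes (an empty phoneme chunk means deletion)."""
    n, m = len(word), len(phons)
    NEG = float("-inf")
    dp = [[NEG] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == NEG:
                continue
            for dl in range(1, max_letters + 1):     # letters consumed
                for dph in range(0, max_phons + 1):  # phonemes emitted
                    if i + dl > n or j + dph > m:
                        continue
                    chunk = word[i:i + dl]           # e.g. "ph"
                    ph = tuple(phons[j:j + dph])     # e.g. ("f",)
                    s = dp[i][j] + score.get((chunk, ph), NEG)
                    if s > dp[i + dl][j + dph]:
                        dp[i + dl][j + dph] = s
    return dp[n][m]
```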
30-32. Many-to-many alignments (worked example slides; figures not transcribed)
33. Prediction problem
- Should the prediction model generate phonemes from one letter or from two?
- gash → g æ ʃ vs. gasholder → g æ s h o l d ə r
34. Letter chunking
- A bigram letter-chunking predictor automatically discovers double letters.
- Ex. longs
35. Overview: system
36. Phoneme prediction
- Once the training examples are aligned, we need a phoneme prediction model.
- Classification task or sequence prediction?
37. Instance-based learning
- Store the training examples.
- The predicted class is assigned by finding the most similar training instance.
- Similarity functions: Hamming distance, Euclidean distance, etc. (a minimal sketch follows).
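A minimal instance-based classifier along these lines, assuming fixed-width letter-context windows as features. This stands in for the IB1 algorithm in TiMBL that the talk actually uses:

```python
def hamming(a, b):
    """Number of positions where two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))

class NearestNeighbor:
    """Store all training instances; predict by the most similar one."""
    def __init__(self):
        self.examples = []                 # (features, phoneme) pairs

    def train(self, features, label):
        self.examples.append((features, label))

    def predict(self, features):
        best = min(self.examples, key=lambda ex: hamming(ex[0], features))
        return best[1]

# Context window of one letter on each side; "_" pads word boundaries.
nn = NearestNeighbor()
nn.train(("_", "b", "a"), "b")      # the 'b' in "band"
nn.train(("b", "a", "n"), "ae")     # the 'a' in "band"
print(nn.predict(("_", "b", "o")))  # -> "b" (closest stored window)
```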
38. Basic HMMs
- A basic sequence-based prediction method.
- In L2P:
- letters are observations,
- phonemes are states.
- Output phoneme sequences depend on both emission and transition probabilities.
39. Applying HMMs
- Use instance-based learning to produce a list of candidate phonemes with confidence values conf(phone_i) for each letter_i (the emission probabilities).
- Use a language model of the phoneme sequences in the training data to obtain the transition probabilities: P(phone_i | phone_{i-1}, ..., phone_{i-n}) (a decoding sketch follows).
40. Visualization
- Buried → b E r aI d : 2.38 × 10^-8
- Buried → b E r I d : 2.23 × 10^-6
41. Evaluation
- Data sets:
- English: CMUDict (112K), Celex (65K)
- Dutch: Celex (116K)
- German: Celex (49K)
- French: Brulex (27K)
- The classifier is the IB1 algorithm implemented in the TiMBL package (Daelemans et al., 2004).
- Results are reported as word accuracy based on 10-fold cross-validation.
42-45. (Results slides; no transcript available)
46. Messages
- Many-to-many alignments show significant improvements over traditional one-to-one alignments.
- The HMM-like approach helps when the local classifier has difficulty predicting phonemes.
47. Criticism
- Joint models:
- alignments, chunking, prediction, and HMM.
- Error propagation:
- errors pass from one model to the others and are unlikely to be corrected later.
- Can we combine and optimize everything at once? Or at least allow the system to correct past errors?
48. On-going work
- Discriminative approaches for letter-to-phoneme conversion.
49. Online discriminative learning
- Let x be an input word and y its output phoneme sequence.
- Φ(x, y) represents features describing x and y.
- α is a weight vector for Φ(x, y).
50. Online training algorithm
- Initially, α = 0.
- For k iterations:
- for all letter-phoneme sequence pairs (x, y):
- ŷ = argmax_{y'} α · Φ(x, y')
- update the weights according to Φ(x, y) and Φ(x, ŷ) (a sketch follows).
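A sketch of this training loop for the perceptron update described on the next slide, with `phi` and `decode` as simplified stand-ins; the real system decodes whole phoneme sequences, so the greedy decoder here is only for illustration:

```python
from collections import defaultdict

def phi(x, y):
    """Sparse features over a word x and phoneme sequence y:
    letter-phoneme pairs plus phoneme bigrams (an illustrative choice).
    Assumes the pair is already 1-to-1 aligned."""
    f = defaultdict(float)
    for l, p in zip(x, y):
        f[("emit", l, p)] += 1.0
    for p1, p2 in zip(y, y[1:]):
        f[("trans", p1, p2)] += 1.0
    return f

def decode(x, w, phonemes):
    """Stub argmax: greedily picks the best phoneme per letter.
    (A real decoder searches over whole phoneme sequences.)"""
    return [max(phonemes, key=lambda p: w[("emit", l, p)]) for l in x]

def train(data, phonemes, k=5):
    w = defaultdict(float)               # initially, all weights are 0
    for _ in range(k):                   # k passes over the training data
        for x, y in data:                # each (word, phonemes) pair
            y_hat = decode(x, w, phonemes)
            if y_hat != y:               # perceptron update on a mistake:
                for feat, v in phi(x, y).items():
                    w[feat] += v         # reward features of the truth
                for feat, v in phi(x, y_hat).items():
                    w[feat] -= v         # penalize features of the guess
    return w

w = train([("band", ["b", "ae", "n", "d"])], phonemes={"b", "ae", "n", "d"})
```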
51. Perceptron update (Collins, 2002)
- A simple update rule for training.
- Move the weights in the direction of the correct answer whenever a prediction is wrong (the update is written out below).
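In the notation of slide 49, Collins's update when the predicted sequence differs from the correct one is:

```latex
\hat{y} = \arg\max_{y'} \, \alpha \cdot \Phi(x, y'), \qquad
\alpha \leftarrow \alpha + \Phi(x, y) - \Phi(x, \hat{y})
```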
52-53. Examples
Adapted from Dan Klein's tutorial slides at NAACL 2007. (Figures not transcribed.)
54. Issues with the perceptron
- Overtraining: test / held-out accuracy usually rises, then falls.
- Regularization: if the data isn't separable, the weights often thrash around.
- It finds only a barely separating solution.
Taken from Dan Klein's tutorial slides at NAACL 2007.
55. Margin Infused Relaxed Algorithm (MIRA) (Crammer and Singer, 2003)
- Uses an n-best list to update the weights:
- separate the correct answer from each candidate by a margin at least as large as the loss function (written out below),
- while keeping the weight changes as small as possible.
56. Loss functions in letter-to-phoneme
- Describe the loss of an incorrect prediction compared to the correct one.
- Word error (0/1), phoneme error, or a combination (sketched below).
57. Results
- Incomplete!
- MIRA outperforms the perceptron.
- The 0/1 loss and the combination loss perform better than the phoneme loss alone.
- Overall, the results improve on previous work.
58. Possible term projects
59. Possible term projects
- Explore more linguistic features.
- Explore machine translation systems for letter-to-phoneme conversion.
- Unsupervised approaches for letter-to-phoneme conversion.
- Other cool ideas to improve on a partial system:
- data for evaluation are provided,
- alignments are provided,
- an L2P model is provided.
60. Linguistic features
- Look for linguistic features that help L2P.
- Most systems incorporate letter n-gram features in some way.
- The new features must be obtained using only word information.
- Work already done:
- Syllabification (Susan's thesis):
- finding syllabification breaks in letter strings using an SVM approach.
61. Machine translation approach
- The L2P problem can be seen as a (simple) machine translation problem:
- we'd like to translate letters into phonemes.
- Consider L2P as MT:
- letters → words
- words → sentences
- phonemes → target sentences
- Moses: a baseline SMT system (ACL 2007)
- http://www.statmt.org/wmt07/baseline.html
- May also need to look at GIZA, Pharaoh, Carmel, etc. (a data-preparation sketch follows).
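Under this analogy, preparing a training corpus is just writing space-separated letters and phonemes to a pair of parallel files, one word per line, which is the usual input format for Moses-style pipelines; the file names here are made up:

```python
pairs = [("band", ["b", "ae", "n", "d"]),
         ("phoenix", ["f", "i", "n", "I", "k", "s"])]

# Each word becomes one "sentence": letters on the source side,
# phonemes on the target side, aligned line by line.
with open("corpus.letters", "w") as src, open("corpus.phonemes", "w") as tgt:
    for word, phons in pairs:
        src.write(" ".join(word) + "\n")
        tgt.write(" ".join(phons) + "\n")
```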
62. Unsupervised approaches
- Assume we don't have examples of word-phoneme pairs to train a model.
- We can start from a list of possible letter-phoneme mappings.
- Or assume we have only a small set of example pairs (100 pairs).
- Don't expect to outperform the supervised approaches, but take advantage of being an unsupervised method.
63. References
- Collins, M. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, pages 1-8. Association for Computational Linguistics, Morristown, NJ.
- Crammer, K. and Singer, Y. 2003. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951-991.
- Kristina Toutanova and Robert C. Moore. 2001. Pronunciation modeling for improved spelling correction. In Proceedings of ACL-02, pages 144-151.
- John Kominek and Alan W. Black. 2006. Learning pronunciation dictionaries: Language complexity and word selection strategies. In Proceedings of HLT-NAACL 2006, pages 232-239.
- Walter M. P. Daelemans and Antal P. J. van den Bosch. 1997. Language-independent data-oriented grapheme-to-phoneme conversion. In Progress in Speech Synthesis, pages 77-89. Springer, New York.
- Alan W. Black, Kevin Lenzo, and Vincent Pagel. 1998. Issues in building general letter to sound rules. In The Third ESCA Workshop on Speech Synthesis, pages 77-80.
64. References
- Robert I. Damper, Yannick Marchand, John D. S. Marsters, and Alexander I. Bazin. 2005. Aligning text and phonemes for speech technology applications using an EM-like algorithm. International Journal of Speech Technology, 8(2):147-160.
- Eric Sven Ristad and Peter N. Yianilos. 1998. Learning string-edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5):522-532.
- Walter Daelemans, Jakub Zavrel, Ko Van Der Sloot, and Antal Van Den Bosch. 2004. TiMBL: Tilburg Memory Based Learner, version 5.1, reference guide. ILK Technical Report Series 04-02.