Title: Korean Phoneme Discrimination
1. Korean Phoneme Discrimination
2. Motivation
- Certain Korean phonemes are very difficult for English speakers to distinguish:
  - ㅅ (IPA s)
  - ㅆ (IPA s͈)
3. Cepstral Analysis
- Sounds must be converted into a format meaningful to the network.
- Mel Frequency Cepstral Coefficients (MFCCs) are a popular method of feature extraction.
- MFCCs are computed from a discrete Fourier transform whose frequency axis is warped onto a modified scale (the mel scale).
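The feature extraction described on this slide can be sketched as follows. This is a minimal illustration, not the pipeline used in the presentation: the `2595 * log10(1 + f / 700)` mel formula, the Hamming window, the 26 triangular filters, and the 13 retained coefficients are all common defaults assumed here, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Hz -> mel, using one widely used formula (an assumption, not from the slides)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs for a single frame: DFT -> mel filterbank -> log -> DCT-II."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    log_energy = np.log(mel_filterbank(n_filters, n_fft, sample_rate) @ power + 1e-10)
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (n + 0.5) / n_filters)  # unnormalized DCT-II
    return dct_basis @ log_energy

# Example: one 25 ms frame of a 440 Hz tone sampled at 16 kHz.
t = np.arange(400) / 16000.0
coeffs = mfcc_frame(np.sin(2 * np.pi * 440.0 * t), 16000)
```

Each frame of audio yields one short coefficient vector, so an utterance becomes a sequence of vectors that a network can consume one time step at a time.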
4. Publication of Interest
- Recurrent Neural Networks for Phoneme Recognition
- Takuya Koizumi, Mikio Mori, Shuji Taniguchi, and Mitsutoshi Maruya
- Dept. of Information Science, Fukui University, Japan
- Applied recurrent neural networks to classify phonemes from a Japanese word database
5. Overview of recurrent neural networks
- In contrast with feed-forward networks, recurrent neural networks can have cycles.
- This means that the input can be split up among multiple time steps.
- In this publication, two types of recurrent neural networks were studied.
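To make the idea of cycles concrete, here is a minimal sketch of a generic recurrent forward pass in which the hidden state computed at one time step feeds back into the next. It is a textbook Elman-style layout assumed for illustration only; it is not claimed to match either the Type 1 or the Type 2 network from the publication, and all names and sizes are hypothetical.

```python
import numpy as np

def rnn_forward(frames, W_in, W_rec, W_out, b_h, b_out):
    """Feed a sequence of feature frames through a simple recurrent layer.

    The hidden state h carries information from earlier frames to later
    ones through the recurrent weights W_rec (the cycle in the network).
    Returns class probabilities computed from the final hidden state.
    """
    h = np.zeros(W_rec.shape[0])
    for x_t in frames:                      # one input frame per time step
        h = np.tanh(W_in @ x_t + W_rec @ h + b_h)
    logits = W_out @ h + b_out
    exp = np.exp(logits - logits.max())     # numerically stable softmax
    return exp / exp.sum()

# Example with random weights: 10 frames of 13 features, 8 hidden units, 5 classes.
rng = np.random.default_rng(0)
n_feat, n_hidden, n_classes = 13, 8, 5
probs = rnn_forward(
    rng.normal(size=(10, n_feat)),
    rng.normal(scale=0.1, size=(n_hidden, n_feat)),
    rng.normal(scale=0.1, size=(n_hidden, n_hidden)),
    rng.normal(scale=0.1, size=(n_classes, n_hidden)),
    np.zeros(n_hidden),
    np.zeros(n_classes),
)
```

A feed-forward network would have no `W_rec @ h` term, so each frame would be processed without any memory of the frames before it.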
6. Type 1 RNN
7. Type 2 RNN
8. Benefits of recurrent neural networks
- Feed-forward multi-layer neural networks are inherently unable to deal with time-varying information.
- In particular, some consonants are difficult to distinguish.
9. Group Classification Scheme
- In addition to having a single network classify all phonemes, a two-level hierarchy was developed:
  - Classify to which phonetic group a phoneme belongs (unvoiced plosives, voiced plosives, unvoiced fricatives, voiced fricatives/glides, nasals, vowels).
  - Classify phonemes within a specific phonetic group.
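The two-level scheme can be sketched as a simple dispatch: one classifier picks the phonetic group, then a group-specific classifier picks the phoneme. The function and the toy stand-in classifiers below are hypothetical and only illustrate the control flow; in the publication each stage is a trained network.

```python
from typing import Callable, Dict, Sequence, Tuple

Classifier = Callable[[Sequence[float]], str]

def classify_two_level(
    features: Sequence[float],
    group_classifier: Classifier,
    phoneme_classifiers: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Stage 1 selects a phonetic group; stage 2 selects a phoneme within it."""
    group = group_classifier(features)
    phoneme = phoneme_classifiers[group](features)
    return group, phoneme

# Toy stand-ins for trained networks (illustrative thresholds, not real models).
def toy_group(features):
    return "vowels" if features[0] > 0 else "nasals"

toy_phonemes = {
    "vowels": lambda f: "a" if f[1] > 0 else "i",
    "nasals": lambda f: "m" if f[1] > 0 else "n",
}

result = classify_two_level([0.5, -1.0], toy_group, toy_phonemes)
```

One consequence of this design is that a mistake at the group stage cannot be corrected at the second stage, so the overall accuracy depends on both stages.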
10. Results
- Overall, recurrent neural networks were superior to feed-forward neural networks (MLNN).
- Overall, the group classification scheme was more effective than a single RNN.
- In most cases, the Type 1 RNN outperformed the Type 2 RNN.
  - Training affects the weights of all the connections in the Type 1 RNN, while it affects only part of the connections in the Type 2 RNN.
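The difference between training all connections and training only part of them can be illustrated with a masked gradient step. This is a generic sketch under assumed names; the slides do not say which connections are held fixed in the Type 2 RNN, so the mask below is purely hypothetical.

```python
import numpy as np

def sgd_step(weights, grads, trainable_mask, lr=0.1):
    """Gradient step that only updates weights where trainable_mask is 1."""
    return weights - lr * grads * trainable_mask

weights = np.ones((2, 3))
grads = np.full((2, 3), 0.5)

# All connections trainable (as described for the Type 1 RNN).
all_trainable = sgd_step(weights, grads, np.ones((2, 3)))

# Only some connections trainable (hypothetical mask: first row frozen).
mask = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
partly_trainable = sgd_step(weights, grads, mask)
```

Frozen connections keep their initial values, which leaves the partly trainable network with fewer adjustable parameters to fit the data.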
11. Detailed Results
Accuracies (%)                        Type 1 RNN   Type 2 RNN   MLNN
Single Network                        84.9         75.1         68.5
Group Classification                  91.9         88.1         81.3
Intra-group Recognition (average)     95.2         92.2         89.8
Overall Group Classification Scheme   88.1         --           --
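As a rough consistency check on the Type 1 RNN column, the overall accuracy of a two-stage scheme can be approximated by multiplying the two stage accuracies. This assumes the stages err independently and that the unweighted intra-group average is representative of every group, neither of which is stated in the source, so the estimate is only expected to land near the reported figure.

```python
group_acc = 0.919   # Group Classification, Type 1 RNN
intra_acc = 0.952   # Intra-group Recognition (average), Type 1 RNN

estimate = group_acc * intra_acc
print(f"{estimate:.3f}")  # prints 0.875
```

The estimate of about 87.5% is close to the reported 88.1%, and both are above the 84.9% obtained by the single Type 1 network.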
12. Application to Korean Classification Problem
- For unvoiced fricatives, the group to which ㅅ and ㅆ belong, the networks performed as follows:
               Type 1 RNN   Type 2 RNN   MLNN
Accuracy (%)   87.6         84.0         81.1
13. Questions?