Korean Phoneme Discrimination

Provided by: ben2180
Learn more at: https://www.cs.hmc.edu
Transcript and Presenter's Notes

Title: Korean Phoneme Discrimination


1
Korean Phoneme Discrimination
  • Ben Lickly

2
Motivation
  • Certain Korean phonemes are very difficult for
    English speakers to distinguish:
  • ㅅ (IPA: s)
  • ㅆ (IPA: s͈)

3
Cepstral Analysis
  • Need to modify sounds into a format meaningful to
    the network
  • Mel Frequency Cepstral Coefficients (MFCC) are a
    popular method of feature extraction.
  • MFCCs are computed from a discrete Fourier
    transform whose frequency axis is warped onto a
    modified scale (the mel scale).

[Figure: the mel scale]
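
As a rough illustration of the mel scale used in MFCC extraction, the sketch below converts between Hz and mels and spaces filter center frequencies evenly in mels. The formula is one commonly used variant, not necessarily the one used in this project, and the filter count and frequency range are arbitrary assumptions.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mels (one common formula; other variants exist)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, low_hz, high_hz):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(n_filters)]


# Assumed values for illustration: 10 filters between 0 Hz and 8 kHz.
centers = mel_center_frequencies(n_filters=10, low_hz=0.0, high_hz=8000.0)
print([round(c) for c in centers])
```

The centers are packed closely at low frequencies and spread out at high frequencies, which is the warping that the mel scale applies to the spectrum.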
4
Publication of Interest
  • Recurrent Neural Networks for Phoneme Recognition
  • Takuya Koizumi, Mikio Mori, Shuji Taniguchi, and
    Mitsutoshi Maruya
  • Dept. of Information Science, Fukui University,
    Japan
  • Applied recurrent neural networks to classify
    phonemes from a Japanese word database

5
Overview of recurrent neural networks
  • In contrast with feed-forward networks, recurrent
    neural networks can have cycles.
  • This means that the input can be split up among
    multiple time steps.
  • In this publication, two types of recurrent
    neural networks were studied.
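
To make the idea of cycles concrete, here is a minimal sketch of a generic simple recurrent network that consumes one feature frame per time step. It is not the Type 1 or Type 2 architecture from the publication, whose details are not given in these slides; the layer sizes, random weights, and function names are all assumptions for illustration.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weights; a real network would learn these by training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def run_rnn(frames, w_in, w_rec, w_out):
    """Feed one frame per time step; return output scores after the last frame."""
    hidden = [0.0] * len(w_rec)
    for frame in frames:
        from_input = matvec(w_in, frame)
        from_past = matvec(w_rec, hidden)  # the cycle: previous hidden state feeds back
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_past)]
    return matvec(w_out, hidden)


rng = random.Random(0)
n_features, n_hidden, n_classes = 3, 4, 2  # assumed toy sizes
w_in = make_matrix(n_hidden, n_features, rng)
w_rec = make_matrix(n_hidden, n_hidden, rng)
w_out = make_matrix(n_classes, n_hidden, rng)

# Three made-up feature frames (for example, MFCC vectors) for one sound.
frames = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
scores = run_rnn(frames, w_in, w_rec, w_out)
reversed_scores = run_rnn(frames[::-1], w_in, w_rec, w_out)
print(scores, reversed_scores)
```

Because the hidden state carries information from earlier frames, presenting the same frames in a different order generally gives different scores, which a network that sees each frame in isolation cannot reproduce.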

6
Type 1 RNN
7
Type 2 RNN
8
Benefits of recurrent neural networks
  • Feedforward multi-layer neural networks are
    inherently unable to deal with time-varying
    information
  • In particular, some consonants are difficult to
    distinguish.

9
Group Classification Scheme
  • In addition to having a single network classify
    all phonemes, a two-level hierarchy was
    developed:
  • First, classify the phonetic group to which a
    phoneme belongs (unvoiced plosives, voiced
    plosives, unvoiced fricatives, voiced fricatives,
    glides, nasals, vowels).
  • Then, classify the phoneme within that specific
    phonetic group.
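
The two-level scheme can be sketched as a dispatch from a group classifier to a per-group phoneme classifier. In the publication each stage is a trained network; the threshold rules, group subset, and phoneme labels below are hypothetical stand-ins that only show the control flow.

```python
def classify_two_level(features, group_classifier, phoneme_classifiers):
    """Stage 1 picks the phonetic group; stage 2 picks the phoneme within it."""
    group = group_classifier(features)
    phoneme = phoneme_classifiers[group](features)
    return group, phoneme


# Hypothetical stand-in for the trained group-level network.
def toy_group_classifier(features):
    return "vowels" if features[0] > 0.5 else "unvoiced fricatives"


# Hypothetical stand-ins for the trained intra-group networks.
toy_phoneme_classifiers = {
    "vowels": lambda f: "a" if f[1] > 0.5 else "i",
    "unvoiced fricatives": lambda f: "s" if f[1] > 0.5 else "h",
}

result = classify_two_level([0.2, 0.9], toy_group_classifier, toy_phoneme_classifiers)
print(result)  # ('unvoiced fricatives', 's')
```

One consequence of this design is that a mistake at the group stage cannot be corrected by the intra-group stage, so both stages need to be accurate.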

10
Results
  • Overall, recurrent neural networks were superior
    to feed-forward neural networks (MLNN).
  • Overall, the group classification scheme was more
    effective than a single RNN.
  • In most cases, the Type 1 RNN outperformed the
    Type 2 RNN.
  • Training affects the weights of all connections
    in the Type 1 RNN, but only some of the
    connections in the Type 2 RNN.

11
Detailed Results
Accuracy (%)                          Type 1 RNN   Type 2 RNN   MLNN
Single Network                        84.9         75.1         68.5
Group Classification                  91.9         88.1         81.3
Intra-group Recognition (average)     95.2         92.2         89.8
Overall Group Classification Scheme   88.1         --           --
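
As a sanity check on the Type 1 RNN figures, the overall accuracy of a two-stage scheme can be roughly estimated as the product of the stage accuracies. This is a back-of-the-envelope sketch, not the publication's method: it assumes the two stages err independently, and it uses the intra-group average as if it applied equally to every phoneme.

```python
def estimate_overall_accuracy(group_acc_pct, intra_group_acc_pct):
    """Product of the two stage accuracies, both given in percent."""
    return group_acc_pct * intra_group_acc_pct / 100.0


# Type 1 RNN: group classification 91.9%, intra-group recognition 95.2%.
estimate = estimate_overall_accuracy(91.9, 95.2)
print(round(estimate, 1))  # 87.5
```

The estimate of about 87.5% is close to the reported 88.1%. The small gap is plausibly because the intra-group figure is an average over groups rather than a figure weighted by how often each group occurs.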
12
Application to Korean Classification Problem
  • For unvoiced fricatives, the group to which ㅅ and
    ㅆ belong, the network performed as follows:

               Type 1 RNN   Type 2 RNN   MLNN
Accuracy (%)   87.6         84.0         81.1
13
Questions?