PRONUNCIATION DICTIONARIES - PowerPoint PPT Presentation

About This Presentation
Slides: 13
Provided by: kk13

Transcript and Presenter's Notes



1
PRONUNCIATION DICTIONARIES
  • Dr. Bali RANAIVO-MALANÇON
  • Unit Terjemahan Melalui Komputer (UTMK, Computer-Aided Translation Unit)
  • Universiti Sains Malaysia

2
Definition: What is a pronunciation dictionary?
  • A pronunciation dictionary (or phonetic
    dictionary) is a list of words, each followed by
    its phonetic transcription(s).
  • Phonetic transcriptions
      • Canonical pronunciation
      • Variant pronunciations
  • Phonological rules to generate variant
    pronunciations
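The link between a canonical pronunciation, its variants and the phonological rules that generate them can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: transcriptions are space-separated SAMPA-style symbols, the transcription of semak is illustrative, and the single rule (optionally deleting the glottal stop "?") is a placeholder chosen to show the mechanism, not a claim about Malay phonology. The names `OPTIONAL_DELETIONS` and `variants` are hypothetical.

```python
from itertools import product

# Hypothetical rule set: phones that may optionally be dropped.
# "?" is the SAMPA glottal stop; this rule is a placeholder, not a claim about Malay.
OPTIONAL_DELETIONS = {"?"}


def variants(canonical: str) -> list[str]:
    """Return the canonical transcription first, followed by every variant
    obtained by optionally deleting the phones in OPTIONAL_DELETIONS."""
    phones = canonical.split()
    # Each phone is kept as is, or (if optional) either kept or dropped.
    choices = [[p, ""] if p in OPTIONAL_DELETIONS else [p] for p in phones]
    results: list[str] = []
    for combo in product(*choices):
        form = " ".join(p for p in combo if p)
        if form not in results:  # avoid duplicates, keep first-seen order
            results.append(form)
    return results


# A pronunciation dictionary: word -> [canonical, variant, ...]
lexicon = {"semak": variants("s @ m a ?")}
print(lexicon)
```

Because `itertools.product` yields the "keep every phone" combination first, the canonical form always comes first in each entry's list.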
3
Some basic linguistic knowledge
Notation
  • < >  orthographic representation, e.g. <buah>
  • character representation, e.g. b, u, a, h
  • / /  phonemic representation, e.g. /buah/
  • [ ]  phonetic representation, e.g. [buwah]
PHONOLOGY (or phonemics) studies the distinctive sound
units, the patterns they form, and the rules that
regulate their use.
  • Phonemes, e.g. /r/
PHONETICS studies the inventory and structure of the
sounds of language.
  • Phones: allophones of /r/, e.g. r, R, ?
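The notations for <buah> can be reproduced with a short Python sketch. It assumes one phoneme per letter for this particular word, and it generalizes the slide's single example /buah/ → [buwah] into a hypothetical glide-insertion rule (insert w after u when another vowel follows); the exact conditions of the rule in Malay are not given in the slides.

```python
import re

VOWELS = "aeiou@"  # assumed vowel symbols ("@" is schwa in SAMPA)


def glide_insertion(phonemes: str) -> str:
    """Hypothetical rule generalized from /buah/ -> [buwah]:
    insert w after u whenever another vowel follows."""
    return re.sub(rf"u(?=[{VOWELS}])", "uw", phonemes)


word = "buah"
orthographic = f"<{word}>"               # <buah>
characters = list(word)                  # ['b', 'u', 'a', 'h']
phonemic = f"/{word}/"                   # /buah/, one phoneme per letter assumed
phonetic = f"[{glide_insertion(word)}]"  # [buwah]
print(orthographic, characters, phonemic, phonetic)
```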
4
Examples of pronunciation dictionaries
  • Verbmobil
      "Ubernachtungen          Qyb6n'axtUN@n
      "Ubernachtungskosten     Qyb6n'axtUNsk"Ost@n
      "Ubernachtungsm"oglichk  Qyb6n'axtUNsm"2klICk
  • PHONOLEX
      "Ubernachtungsgeldes  CLnom ORsb TPptra  Qyb6naxtUNsgEld@s
      "Ubernachtungskosten  ORvm TPmanu        Qyb6n'axtUNsk"Ost@n
                                               yb6naxtUNskOst@n  1 VM MAUS
                                               yb6naxtUNskOsn    1 VM MAUS
  • CMUdict (Carnegie Mellon Pronouncing Dictionary)
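CMUdict is distributed as plain text with one entry per line: the word, then its phones in an ARPAbet-based set, with stress digits (0, 1, 2) on vowels. Alternative pronunciations repeat the word with a parenthesized number. A minimal Python reader is sketched below. The sample lines are illustrative and not copied from the distributed file, and skipping comments assumes the ";;;" convention of the classic release.

```python
import re
from collections import defaultdict

# Illustrative lines in CMUdict's plain-text format (not copied from the file).
SAMPLE = """\
;;; comment lines start with three semicolons
TOMATO  T AH0 M EY1 T OW2
TOMATO(2)  T AH0 M AA1 T OW2
"""


def parse_cmudict(text: str) -> dict[str, list[list[str]]]:
    """Map each word to the list of its pronunciations (lists of phones)."""
    lexicon: dict[str, list[list[str]]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue
        head, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", head)  # drop the variant marker, e.g. "(2)"
        lexicon[word].append(phones)
    return dict(lexicon)


def strip_stress(phones: list[str]) -> list[str]:
    """Remove the stress digits attached to vowels."""
    return [p.rstrip("012") for p in phones]


lex = parse_cmudict(SAMPLE)
print(lex["TOMATO"])
```

Grouping variants under one key mirrors the canonical/variant distinction of the definition slide: the first pronunciation can be treated as canonical.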
5
Applications: Why do we need pronunciation
dictionaries?
  • Speech technologies, and as an aid to phonetic labeling
      • Automatic Speech Recognition (ASR): Tan Tien Pieng
      • Text-to-Speech (TTS): Nur Hana Samsudin
  • Pronunciations can be added to the Malay dictionary

6
Simplified Speech Recognition Architecture
(Jurafsky D., Martin J. H. (2000) Speech and Language
Processing, Prentice-Hall, Inc.)
[Figure: simplified speech recognition architecture, starting from the speech waveform]
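In a recognizer of this kind, the pronunciation dictionary is what links phone-level acoustic models to words. The toy Python sketch below shows only that role: given a phone sequence assumed to be already recognized without errors, it lists the word sequences from a small hypothetical lexicon whose pronunciations exactly cover it. A real recognizer searches over acoustic and language-model scores instead, so this illustrates the lexicon constraint, not decoding; the transcriptions are illustrative.

```python
from functools import lru_cache

# Hypothetical toy lexicon: word -> phone sequence (SAMPA-like symbols).
LEXICON = {
    "saya": ("s", "a", "j", "a"),
    "suka": ("s", "u", "k", "a"),
    "buah": ("b", "u", "w", "a", "h"),
}


def segment(phones: tuple[str, ...]) -> list[list[str]]:
    """Return every word sequence whose concatenated pronunciations
    exactly cover the phone sequence."""

    @lru_cache(maxsize=None)
    def from_index(i: int) -> tuple[tuple[str, ...], ...]:
        if i == len(phones):
            return ((),)  # one way to cover nothing: the empty sequence
        results = []
        for word, pron in LEXICON.items():
            if phones[i:i + len(pron)] == pron:
                for rest in from_index(i + len(pron)):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(seq) for seq in from_index(0)]


print(segment(tuple("sajasukabuwah")))
```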
7
MBROLA Malay Diphone Database
  • Diphones
      • Speech units that begin in the middle of a phone
        and end in the middle of the following one
  • Concatenative synthesis
      • Minimize concatenation problems
      • Require an affordable amount of memory
  • MBROLA (Multi-Band Resynthesis OverLap-Add)
      • Speech synthesizer based on the concatenation of
        diphones
      • Faculté Polytechnique de Mons, Belgium, 1996
      • Synthesizers for many languages, e.g. Indonesian,
        British and American English, Arabic
      • Diphone databases are free for non-commercial
        applications and available online
  • As MBROLA provides all the facilities (programs,
    guidelines, assistance, etc.) needed to build a
    synthesizer, we can focus our research solely on
    preparing the diphone data to build the Malay
    synthesizer
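MBROLA turns phone, duration and pitch specifications into audio, so besides the diphone database it needs .pho input files: one phone per line, giving the phone name, its duration in milliseconds, then optional pairs of (position within the phone in percent, pitch in Hz), with "_" for silence. The helper below is a hypothetical sketch that writes such a file with constant durations and a flat pitch. The phone names must match whatever symbols the Malay diphone database ends up using, so "s a j a" is only an assumption.

```python
def to_pho(phones: list[str], duration_ms: int = 90, pitch_hz: int = 120,
           pause_ms: int = 100) -> str:
    """Return MBROLA .pho text for a phone list.

    Each line: phone, duration in ms, then one (position %, pitch Hz) pair.
    Durations and pitch are flat placeholders, not a prosody model.
    """
    lines = [f"_ {pause_ms}"]  # leading silence
    for phone in phones:
        # single pitch target placed at 50% of the phone's duration
        lines.append(f"{phone} {duration_ms} 50 {pitch_hz}")
    lines.append(f"_ {pause_ms}")  # trailing silence
    return "\n".join(lines) + "\n"


print(to_pho(["s", "a", "j", "a"]), end="")
```

The resulting file would typically be passed to the mbrola program together with a diphone database and an output file name (`mbrola <database> input.pho output.wav`).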

8
Building the diphone database
  • Pronunciation dictionary entry: saya → saja
  • List of phones: a, j, s
  • Diphones occurring in the transcription: aj, ja, sa
  • Combining every two phones gives the full list of
    diphones: aa, aj, as, ja, jj, js, sa, sj, ss
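The arithmetic of this slide can be checked with a short script. The dictionary entry saya → saja yields three phones and three diphones that actually occur in the transcription. Combining every ordered pair of phones gives 3 × 3 = 9 diphones. This sketch assumes single-character phone symbols, so that joining two symbols into one string is unambiguous.

```python
from itertools import product

# Pronunciation dictionary from the slide: word -> phone sequence.
dictionary = {"saya": ["s", "a", "j", "a"]}

# Phone inventory, sorted for a stable order.
phones = sorted({p for pron in dictionary.values() for p in pron})

# Diphones that actually occur: each pair of adjacent phones.
attested = sorted({a + b for pron in dictionary.values()
                   for a, b in zip(pron, pron[1:])})

# Every ordered pair of phones: n phones give n * n candidate diphones.
all_pairs = [a + b for a, b in product(phones, repeat=2)]

print(phones)     # ['a', 'j', 's']
print(attested)   # ['aj', 'ja', 'sa']
print(all_pairs)  # ['aa', 'aj', 'as', 'ja', 'jj', 'js', 'sa', 'sj', 'ss']
```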
9
Resources: What do we have today to build the
Malay pronunciation dictionary?
  • Linguistic resources
      • List of Malay words: ? 60,000 words or tokens
      • List of Malay abbreviations and their expansions
      • List of Malay proper names
      • Malay corpus (novels, academic texts)
      • Phonological rules (Dr Tajul)
  • Programs, techniques, algorithms
      • Grapheme-to-phoneme converter
      • Statistical techniques
  • UTMK's future research on speech: applications of
    the Malay pronunciation dictionary
  • From readings (books, reports, etc.)
      • Knowledge about pronunciation dictionaries:
        applications, needs, techniques, algorithms,
        implementation

10
Building the pronunciation dictionary
  • Define phoneme inventories and use machine-readable
    phonetic alphabets (ASCII encodings of the IPA),
    e.g. SAMPA, TIMIT
      IPA   SAMPA   TIMIT   Example
      ʃ     S       sh      she
      dʒ    dZ      jh      joke
      ŋ     N       ng      sing
  • Define phonological rules in a form adapted to
    computation
      • Etymology information
          • Arabic: <maaf> [maʔaf]
          • Malay: <gunaan> [gunaʔan]
      • Morphological analysis
          • <pakai> [paka?]
          • <diketuai> [dikətuwaji]
      • Rewriting rules, with rule ordering
      • Two-level morphology, without rule ordering
      • Implementation using finite-state transducers
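Ordered rewriting rules can be illustrated with a small grapheme-to-phoneme sketch. The rule list is an assumption covering a few regular correspondences of standard Malay spelling, with output in SAMPA. It leaves the ambiguity of <e> between [e] and [ə] unresolved and ignores the etymology and morphology inputs listed on the slide. It applies rules one after another with regular expressions, not with finite-state transducers or two-level morphology.

```python
import re

# Assumed ordered rules (pattern, replacement); output symbols are SAMPA.
RULES = [
    ("ng", "N"),   # <ng> -> velar nasal
    ("ny", "J"),   # <ny> -> palatal nasal
    ("sy", "S"),   # <sy> -> postalveolar fricative
    ("kh", "x"),   # <kh> -> voiceless velar fricative
    ("gh", "G"),   # <gh> -> voiced velar fricative
    ("c", "tS"),   # <c>  -> voiceless postalveolar affricate
    ("j", "dZ"),   # <j>  -> voiced postalveolar affricate; must precede the <y> rule
    ("y", "j"),    # <y>  -> palatal glide
    (r"k$", "?"),  # word-final <k> -> glottal stop
]


def apply_rules(word: str, rules=RULES) -> str:
    """Apply each rewrite rule, in order, to the whole word."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


# Swapping the <j> and <y> rules shows why rule order matters.
WRONG_ORDER = RULES[:6] + [("y", "j"), ("j", "dZ")] + RULES[8:]

for w in ["saya", "nyanyi", "bunga", "cari", "semak"]:
    print(w, apply_rules(w))
print("saya", apply_rules("saya", WRONG_ORDER))  # saja -> sadZa
```

With the correct order, saya becomes saja; with the swapped order, the glide produced by the <y> rule is rewritten again by the <j> rule, giving sadZa.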

11
Building the pronunciation dictionary (continued)
  • Differentiate homographs
      • semak_Noun [səmaʔ]
      • semak_Verb [semaʔ]
  • Pronunciation of
      • proper names
      • abbreviations, e.g. Proton
      • numbers, e.g. Boeing 747
      • some characters, e.g. @ and . in
        ranaivo@cs.usm.my
  • Grapheme-to-phoneme converter
  • Expert checking
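Before grapheme-to-phoneme conversion, digits and symbols have to be turned into words. The sketch below reads digit strings one digit at a time using the Malay digit names, and replaces "@" and "." with spoken forms. Both choices are assumptions: whether "747" should be read digit by digit, and how symbols should be spoken (the English-style "at" and "dot" here are placeholders), are decisions for the dictionary's authors. Proper names and abbreviations such as Proton would instead need lookup lists like those listed among the resources.

```python
import re

# Malay names of the digits 0-9.
DIGITS = ["kosong", "satu", "dua", "tiga", "empat",
          "lima", "enam", "tujuh", "lapan", "sembilan"]

# Placeholder spoken forms for symbols (hypothetical choice).
SYMBOLS = {"@": "at", ".": "dot"}


def normalize(text: str) -> str:
    """Spell out digit strings one digit at a time and verbalize symbols."""
    def spell(match: re.Match) -> str:
        return " ".join(DIGITS[int(d)] for d in match.group())

    text = re.sub(r"[0-9]+", spell, text)
    for symbol, spoken in SYMBOLS.items():
        text = text.replace(symbol, f" {spoken} ")
    return " ".join(text.split())  # collapse repeated spaces


print(normalize("Boeing 747"))         # Boeing tujuh empat tujuh
print(normalize("ranaivo@cs.usm.my"))  # ranaivo at cs dot usm dot my
```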

12
Conclusion
  • Structure of the Malay pronunciation dictionary
      • Fields: word, lexcat, etym, pht, nbph
      • lexcat: lexical category
      • etym: etymology, one of MAL(ay), IND(onesian),
        ENG(lish), AR(a)B, OTH(er)
      • pht: phonetic transcription, using one ASCII-IPA
        alphabet (not yet defined)
      • nbph: number of phones
  • Set of phonological rules to derive variant
    pronunciations
  • TERIMA KASIH (Thank you)
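The proposed entry structure maps naturally onto a record type. In this Python sketch, nbph is derived from pht rather than entered by hand. Since the ASCII-IPA alphabet is not yet defined, the transcriptions use space-separated SAMPA symbols as an assumption, and the two semak entries are illustrative. Keying the dictionary on (word, lexcat) keeps homographs apart.

```python
from dataclasses import dataclass, field

ETYM_CODES = {"MAL", "IND", "ENG", "ARB", "OTH"}


@dataclass
class Entry:
    word: str
    lexcat: str  # lexical category, e.g. "Noun", "Verb"
    etym: str    # etymology code, one of ETYM_CODES
    pht: str     # phonetic transcription, space-separated phones (assumed)
    nbph: int = field(init=False)  # number of phones, computed from pht

    def __post_init__(self) -> None:
        if self.etym not in ETYM_CODES:
            raise ValueError(f"unknown etymology code: {self.etym}")
        self.nbph = len(self.pht.split())


entries = [
    Entry("semak", "Noun", "MAL", "s @ m a ?"),
    Entry("semak", "Verb", "MAL", "s e m a ?"),
]
# Keying on (word, lexcat) keeps homographs apart.
lexicon = {(e.word, e.lexcat): e for e in entries}
print(lexicon[("semak", "Noun")])
```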