Title: 4.5 Machine learning of phonological rules
Presenter: Ester
- Machine learning systems automatically induce a model for some domain, given some data and potentially other information.
- Supervised algorithms are given correct answers for some of the data and use these answers to induce generalizations to apply to further data.
- Unsupervised algorithms work only from data, plus potentially some learning biases.
Learning rules (cont.)
- e.g. Johnson (1984) and Touretzky et al. (1990) learn SPE-style (Sound Pattern of English) rules from a corpus of input/output pairs.
- e.g. Gildea and Jurafsky (1996) specialize a learning algorithm (based on the OSTIA algorithm) for a subtype of FSTs, called subsequential transducers, to learn two-level phonological transducers from a corpus of input/output pairs.
- Required learning biases: Faithfulness and Community
  -- Faithfulness: underlying segments tend to be realized similarly on the surface.
  -- Community: similar segments behave similarly.
- If SPE-style rules can be compiled into FSTs automatically, why learn the FSTs directly?
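The two learning biases can be made concrete with a toy supervised learner. This is a minimal Python sketch, not OSTIA or Johnson's algorithm: it assumes one-to-one aligned underlying/surface strings of ARPAbet-style phones (with `dx` for a flap), implements Faithfulness by letting every segment surface unchanged unless a learned rule fires, and approximates Community by collapsing all vowels into a single context class `V`. The function names and training pairs are hypothetical.

```python
# Toy supervised learner for context-dependent rewrite rules (a sketch, not OSTIA).
# Assumes underlying and surface strings are aligned one segment to one segment.
VOWELS = {"aa", "ae", "ah", "ao", "eh", "er", "ih", "iy", "ow", "uw"}


def context_class(seg):
    """Community bias (simplified): all vowels share one class 'V'."""
    if seg is None:
        return "#"  # word boundary
    return "V" if seg in VOWELS else seg


def contexts(segs, i):
    """Left and right context classes of position i."""
    left = context_class(segs[i - 1] if i > 0 else None)
    right = context_class(segs[i + 1] if i + 1 < len(segs) else None)
    return left, right


def learn_rules(pairs):
    """Record (left, input, right) -> output for every segment that changed."""
    rules = {}
    for underlying, surface in pairs:
        u, s = underlying.split(), surface.split()
        if len(u) != len(s):
            raise ValueError("this sketch needs one-to-one aligned strings")
        for i, (a, b) in enumerate(zip(u, s)):
            if a != b:
                left, right = contexts(u, i)
                rules[(left, a, right)] = b
    return rules


def apply_rules(rules, underlying):
    """Faithfulness bias: a segment surfaces unchanged unless a rule matches."""
    u = underlying.split()
    out = []
    for i, a in enumerate(u):
        left, right = contexts(u, i)
        out.append(rules.get((left, a, right), a))
    return " ".join(out)


training = [
    ("b ah t er", "b ah dx er"),  # butter: flapped t
    ("w ao t er", "w ao dx er"),  # water: flapped t
    ("t ao p", "t ao p"),         # top: word-initial t unchanged
]
rules = learn_rules(training)
print(rules)                            # {('V', 't', 'V'): 'dx'}
print(apply_rules(rules, "k ah t er"))  # k ah dx er
```

Because the contexts are stored as vowel classes rather than specific vowels, the rule learned from butter and water generalizes to cutter, while the word-initial t of top stays faithful.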
4.6 Mapping text to phones for TTS
- Text-to-speech (TTS) steps:
  -- Map orthography to a phonetic transcription
  -- Add in prosody
  -- Map the phonetic transcription plus prosody to an acoustic signal
- Pronunciation dictionaries are used for both text-to-speech (TTS) and automatic speech recognition (ASR) systems. They give the pronunciation of words as strings of phones, sometimes including syllabification and stress.
Pronunciation dictionaries (cont.)
- A list of words and their pronunciations
- No morphological or phonological rules: each word form is listed in full
- Three large pronunciation dictionaries for English: PRONLEX, CMUdict and CELEX
- Designed for ASR, but can be adapted for speech synthesis.
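To make "strings of phones, sometimes including stress" concrete, here is a minimal Python sketch that loads entries in the plain-text CMUdict format (`WORD  PH1 PH2 ...`, with stress digits 0/1/2 on vowels, `WORD(1)` marking an alternate pronunciation, and `;;;` starting a comment line). The sample entries are written in that format for illustration rather than quoted from the dictionary, and the helper names are hypothetical.

```python
import re
from collections import defaultdict


def parse_entry(line):
    """Split one CMUdict-format line into (word, phones, stresses)."""
    word, pron = line.split(maxsplit=1)
    word = re.sub(r"\(\d+\)$", "", word).lower()   # READ(1) -> read
    phones = pron.split()
    stresses = [int(p[-1]) for p in phones if p[-1].isdigit()]
    bare = [p.rstrip("012") for p in phones]       # drop stress digits
    return word, bare, stresses


def load_dictionary(lines):
    """Map each word to a list of pronunciations (a word may have several)."""
    lexicon = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(";;;"):  # blank or comment line
            continue
        word, phones, _ = parse_entry(line)
        lexicon[word].append(phones)
    return lexicon


sample = [
    ";;; comment lines start with three semicolons",
    "TABLE  T EY1 B AH0 L",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
]
lexicon = load_dictionary(sample)
print(lexicon["table"])      # [['T', 'EY', 'B', 'AH', 'L']]
print(len(lexicon["read"]))  # 2
```

Keeping a list of pronunciations per word is what lets an ASR-oriented dictionary record variants such as the two readings of read.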
ARPAbet phone set and corresponding IPA symbols
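Because ARPAbet spells each phone with one or two ASCII letters, converting it to IPA is a table lookup. The minimal Python sketch below covers the 39-phone set used by CMUdict and assumes two common conventions: R is written ɹ, and an unstressed AH (AH0) is written as schwa. Other transcription conventions differ in details, and the function name is hypothetical.

```python
# ARPAbet (CMUdict's 39 phones) -> IPA, under common but not universal conventions.
ARPABET_TO_IPA = {
    # vowels
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "EH": "ɛ", "ER": "ɝ", "EY": "eɪ", "IH": "ɪ", "IY": "i", "OW": "oʊ",
    "OY": "ɔɪ", "UH": "ʊ", "UW": "u",
    # consonants
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f", "G": "ɡ",
    "HH": "h", "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ŋ", "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ", "T": "t",
    "TH": "θ", "V": "v", "W": "w", "Y": "j", "Z": "z", "ZH": "ʒ",
}


def to_ipa(arpabet):
    """Convert a space-separated ARPAbet string (stress digits optional)."""
    out = []
    for phone in arpabet.split():
        base = phone.rstrip("012")   # strip the stress digit, if any
        stress = phone[len(base):]
        if base == "AH" and stress == "0":
            out.append("ə")          # convention: unstressed AH is a schwa
        else:
            out.append(ARPABET_TO_IPA[base])
    return "".join(out)


print(to_ipa("T EY1 B AH0 L"))  # teɪbəl
```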
Problems for TTS and ASR dictionaries
- (1) Homographs (two distinct words spelled the same)
  -- TTS dictionaries: essential to distinguish them
  -- ASR dictionaries: usually ignored
- (2) Proper names and other non-standard words: dictionaries often don't include many proper names, nor abbreviations and numbers such as Dr. and 2/3.
- (3) Function words: highly variable pronunciations, e.g. and, I, a, the, of, etc.
- One significant difference between TTS and ASR dictionaries:
  -- TTS dictionaries don't represent dialectal variation
  -- ASR dictionaries do represent dialectal variation
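For homographs, a TTS dictionary needs more than the spelling as a key. One minimal way to sketch this, assuming a part-of-speech tag is already supplied by an upstream tagger (not shown), is to key ambiguous entries on (word, POS) and fall back to an ordinary entry otherwise. The entries, tag names and function name are illustrative.

```python
# Hypothetical lexicon: homographs keyed by (word, part of speech).
HOMOGRAPHS = {
    ("lead", "NOUN"): "L EH1 D",   # the metal
    ("lead", "VERB"): "L IY1 D",   # to guide
    ("live", "ADJ"): "L AY1 V",
    ("live", "VERB"): "L IH1 V",
}
PLAIN = {"the": "DH AH0", "pipe": "P AY1 P"}


def pronounce(word, pos):
    """Prefer a POS-specific entry; otherwise use the ordinary entry."""
    key = (word.lower(), pos)
    if key in HOMOGRAPHS:
        return HOMOGRAPHS[key]
    return PLAIN.get(word.lower())  # None if the word is unknown


print(pronounce("lead", "NOUN"))  # L EH1 D
print(pronounce("lead", "VERB"))  # L IY1 D
```

An ASR dictionary, by contrast, can simply list both pronunciations under lead, since the recognizer only needs to map either sound back to the same spelling.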
Beyond dictionary lookup: text analysis
- Text analysis
- The text-analysis component of a
text-to-speech system maps from orthography to
strings of phones. This is usually done with a
large dictionary augmented with a system (such as
a transducer) for handling productive morphology,
pronunciation changes, names, numbers, and
acronyms.
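Abbreviations show why text analysis must look beyond the word itself: Dr. is read as doctor before a name but as drive after a street name. The rule below (a capitalized next word means doctor) is a deliberately crude heuristic assumed for illustration, not a method from the text, and the function name is hypothetical.

```python
def expand_dr(tokens):
    """Expand 'Dr.' using only the next token (a crude, hypothetical heuristic)."""
    out = []
    for i, tok in enumerate(tokens):
        if tok == "Dr.":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            out.append("doctor" if nxt[:1].isupper() else "drive")
        else:
            out.append(tok)
    return out


print(" ".join(expand_dr("Dr. Smith lives on Elm Dr. near the park".split())))
# doctor Smith lives on Elm drive near the park
```

The heuristic fails, for example, when a street abbreviation ends a sentence and the next sentence starts with a capital letter, which is why more context is needed in practice.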
Problems for pronunciation modeling
- Names
  -- Impossible to list all proper names in advance.
  -- Come from any language and have variable spellings.
  -- Include not only people's names but also company names and product names.
  -- 21 of 33 million words of AP newswire were names.
- Morphological productivity: e.g. names, acronyms and new words are often inflected.
- Numbers with different possible pronunciations
  -- Serial, combined, paired, hundreds, trailing unit
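The same digit string can be read in several ways. The Python sketch below implements readings under one plausible interpretation of the labels (serial: digit by digit; combined: an ordinary cardinal number; paired: two-digit pairs, as in years; hundreds: a count of hundreds, as in twelve hundred). The trailing-unit case is not covered, only 0 to 9999 is handled, and the function names are hypothetical.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n):
    """0-99 as words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def serial(digits):
    """Digit by digit, e.g. for phone or account numbers."""
    return " ".join(ONES[int(d)] for d in digits)


def combined(n):
    """Ordinary cardinal reading for 0-9999."""
    if n < 100:
        return two_digits(n)
    words = []
    thousands, rest = divmod(n, 1000)
    if thousands:
        words += [ONES[thousands], "thousand"]
    hundreds_digit, rest = divmod(rest, 100)
    if hundreds_digit:
        words += [ONES[hundreds_digit], "hundred"]
    if rest:
        words.append(two_digits(rest))
    return " ".join(words)


def hundreds(n):
    """Multiples of 100 read as a count of hundreds: 1200 -> twelve hundred."""
    if n % 100:
        raise ValueError("hundreds reading needs a multiple of 100")
    return f"{two_digits(n // 100)} hundred"


def paired(n):
    """Four digits read as two pairs, as in years: 1750 -> seventeen fifty."""
    first, second = divmod(n, 100)
    if second == 0:
        return hundreds(n)
    if second < 10:
        return f"{two_digits(first)} oh {ONES[second]}"
    return f"{two_digits(first)} {two_digits(second)}"


print(serial("1750"))  # one seven five zero
print(combined(1750))  # one thousand seven hundred fifty
print(paired(1750))    # seventeen fifty
print(hundreds(1200))  # twelve hundred
```

Choosing among these readings is the hard part; the functions only generate the candidates.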
FST-based approach
- Five components
  1. a large morpheme pronunciation dictionary, encoded as an FST
  2. FSAs for morphology (possible sequencing of morphemes)
  3. FSTs for morphophonology (like spelling-change rules), e.g. the pronunciation of -s in different contexts
  4. heuristics and letter-to-sound (LTS) rules/transducers for names and acronyms
  5. default LTS rules/transducers for other unknown words
Architecture
- Lexical (underlying), intermediate and surface levels all contain two tapes, one for orthography and one for pronunciation, separated by |, e.g. c|k.
- Lexicon-FST: composed of a two-level lexicon (lexical level and intermediate level) plus FSAs/FSTs for morphology (the two levels separated by :)
  -- e.g. +PL:s|z (see Fig. 4.21-4.23)
  -- The automaton adds the morphological features +N and +PL at the lexical level, and also adds the plural suffix s|z at the intermediate level.
- FST1 ... FSTn: orthographic (spelling) rules and phonological rules, all run in parallel to map between this intermediate level and the surface level.
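The architecture can be imitated in a few lines of Python without real transducers. In this hypothetical sketch each lexicon entry is a list of orthography/phone symbol pairs, the lexicon step appends the plural suffix pair (s, z) at the intermediate level, and two rules then map to the surface independently: a spelling rule that reads only the orthographic tape (e-insertion, fox → foxes) and a phonological rule that reads only the phone tape (z → s after voiceless consonants, z → ih z after sibilants). The entries, function names and the choice of `ih z` for the syllabic plural are assumptions.

```python
# Each lexicon entry: (orthography, phones) pairs, one pair per letter.
LEXICON = {
    "cat": [("c", "k"), ("a", "ae"), ("t", "t")],
    "dog": [("d", "d"), ("o", "ao"), ("g", "g")],
    "fox": [("f", "f"), ("o", "aa"), ("x", "k s")],
}
SIBILANTS = {"s", "z", "sh", "zh", "ch", "jh"}
VOICELESS = {"p", "t", "k", "f", "th"}


def to_intermediate(stem, plural=False):
    """Lexicon step: stem pairs, plus the suffix pair (s, z) for +PL."""
    pairs = list(LEXICON[stem])
    if plural:
        pairs.append(("s", "z"))
    return pairs


def spelling_rule(orth_stem, suffix):
    """Orthographic tape only: e-insertion after s, x, z, ch, sh."""
    if orth_stem.endswith(("s", "x", "z", "ch", "sh")):
        return "e" + suffix
    return suffix


def phonological_rule(stem_phones, suffix):
    """Phone tape only: plural z surfaces as ih z, s or z."""
    last = stem_phones[-1]
    if last in SIBILANTS:
        return ["ih", suffix]
    if last in VOICELESS:
        return ["s"]
    return [suffix]


def to_surface(stem, plural=False):
    """Map a stem (optionally +PL) to its (spelling, pronunciation) surface pair."""
    pairs = to_intermediate(stem, plural)
    stem_pairs = pairs[:-1] if plural else pairs
    orth = "".join(o for o, _ in stem_pairs)
    phones = [p for _, ph in stem_pairs for p in ph.split()]
    if plural:
        suffix_orth, suffix_phone = pairs[-1]
        # The two rules read different tapes, so neither depends on the other.
        orth += spelling_rule(orth, suffix_orth)
        phones += phonological_rule(phones, suffix_phone)
    return orth, " ".join(phones)


for stem in LEXICON:
    print(to_surface(stem, plural=True))
# ('cats', 'k ae t s')
# ('dogs', 'd ao g z')
# ('foxes', 'f aa k s ih z')
```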
Architecture (cont.)
- Text-to-speech applications: map between the lexicon and the surface form for orthography and phonology simultaneously, e.g. text generation and read-aloud applications.
Names
- Liberman and Church (1992) attempt to handle the most frequent 250,000 name tokens; coverage after each step:
  -- Dictionary of pronunciations of 50,000 names: 59%
  -- Stress-neutral suffixes (-s, -son, -ville): 84%
  -- Name-name compounds and rhyming heuristics: 89%
  -- Prefixes, stress-changing suffixes and suffix exchanges: ??
  -- Letter-to-sound (LTS) rules for the remainder
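The approach is a cascade of fallbacks: each method is tried only on names the previous ones could not handle. Below is a minimal Python sketch of that control flow with a tiny hypothetical name dictionary. It covers the dictionary, stress-neutral suffix (-son, -ville) and name-name compound steps; -s is left out because its pronunciation depends on the stem, rhyming and the stress-changing steps are skipped, and the final step is a crude one-phone-per-letter stand-in for real LTS rules.

```python
# Tiny hypothetical resources (Liberman and Church used 50,000 names).
NAME_DICT = {
    "robert": "r aa b er t", "smith": "s m ih th",
    "stock": "s t aa k", "bridge": "b r ih jh",
}
STRESS_NEUTRAL = {"son": "s ah n", "ville": "v ih l"}
# Crude stand-in for letter-to-sound rules: one phone per letter ("x" -> "k s").
LTS = dict(zip("abcdefghijklmnopqrstuvwyz",
               "ae b k d eh f g hh ih jh k l m n aa p k r s t ah v w y z".split()))
LTS["x"] = "k s"


def pronounce_name(name):
    """Return (method used, ARPAbet-style pronunciation)."""
    n = name.lower()
    if n in NAME_DICT:
        return "dictionary", NAME_DICT[n]
    for suffix, suffix_pron in STRESS_NEUTRAL.items():
        stem = n[: -len(suffix)]
        if n.endswith(suffix) and stem in NAME_DICT:
            # The suffix does not move stress, so the stem's pronunciation is kept.
            return "stress-neutral suffix", f"{NAME_DICT[stem]} {suffix_pron}"
    for i in range(1, len(n)):
        left, right = n[:i], n[i:]
        if left in NAME_DICT and right in NAME_DICT:
            return "name-name compound", f"{NAME_DICT[left]} {NAME_DICT[right]}"
    return "letter-to-sound", " ".join(LTS.get(ch, ch) for ch in n)


print(pronounce_name("Robertson"))    # ('stress-neutral suffix', 'r aa b er t s ah n')
print(pronounce_name("Stockbridge"))  # ('name-name compound', 's t aa k b r ih jh')
```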
4.7 Prosody in TTS
- Prosody operates on longer linguistic units than phones, and is the study of suprasegmental phenomena.
- Phonological aspects of prosody: prominence, structure and tune.
- (1) Prominence: stress (lexical and sentential) and accent
  -- lexical stress, e.g. table (stress on the first syllable)
  -- accent
    - unaccented, e.g. function words (there, the, a)
    - larger accent patterns, e.g. new truck (accent on the right word)
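Both accent points can be illustrated with a baseline heuristic, assuming a hand-written function-word list (hypothetical and far from complete): leave function words unaccented, accent content words, and put the main (nuclear) accent on the rightmost accented word, as in new truck. This is a minimal Python sketch, not a full accent-prediction model.

```python
FUNCTION_WORDS = {"a", "an", "the", "of", "and", "to", "in", "there", "is", "i"}


def assign_accents(words):
    """Mark each word unaccented, accented, or nuclear (rightmost accent)."""
    marks = ["unaccented" if w.lower() in FUNCTION_WORDS else "accented"
             for w in words]
    accented = [i for i, m in enumerate(marks) if m == "accented"]
    if accented:
        marks[accented[-1]] = "nuclear"
    return list(zip(words, marks))


print(assign_accents("there is a new truck".split()))
# [('there', 'unaccented'), ('is', 'unaccented'), ('a', 'unaccented'),
#  ('new', 'accented'), ('truck', 'nuclear')]
```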
Phonological aspects of prosody
- (2) Prosodic structure: some words group naturally together, while others have a noticeable break or disjuncture between them.
- Prosodic phrasing
  -- Intonational phrases: larger prosodic units
    - e.g. I wanted to go to London, but could only get tickets for France.
  -- Intermediate phrases: smaller units, separated by weaker prosodic boundaries
    - e.g. I wanted to go to London
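A common punctuation-based baseline (an assumption here, not a method from the text) treats each sentence as an intonational phrase and each comma as an intermediate-phrase boundary inside it, which reproduces the London example. A minimal Python sketch with a hypothetical function name:

```python
import re


def predict_phrasing(text):
    """Return a list of intonational phrases, each a list of intermediate phrases."""
    phrases = []
    # Sentence-final punctuation closes an intonational phrase.
    for sentence in re.split(r"(?<=[.?!])\s+", text.strip()):
        body = sentence.rstrip(".?!")
        # Commas mark intermediate-phrase boundaries (crude heuristic).
        intermediate = [p.strip() for p in body.split(",") if p.strip()]
        if intermediate:
            phrases.append(intermediate)
    return phrases


print(predict_phrasing(
    "I wanted to go to London, but could only get tickets for France."))
# [['I wanted to go to London', 'but could only get tickets for France']]
```

Many real phrase breaks are not marked by punctuation, so this baseline misses them.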
Phonological aspects of prosody (cont.)
- (3) Tune: the intonational melody of an utterance
- Intonational tunes include pitch accents
- Pitch accents occur on stressed syllables and
form a characteristic pattern in the F0 contour.
English pitch accents (Pierrehumbert 1980)
- H*: high (on a stressed syllable)
- L*: low (on a stressed syllable)
- L*+H: rise, starting on a stressed syllable
- L+H*: rise, ending on a stressed syllable
- H+L*: fall, ending on a stressed syllable
- (H*+L apparently not needed)
- e.g. "oh, really?"
  -- excited version (LH)
  -- sceptical version (LH)
  -- angry version (L)
Other components of the English system
- The ToBI model has two phrase accents (L- and H-) and two boundary tones (L% and H%).
- Phrase accents occur at an intermediate phrase boundary.
  -- L-
  -- H-
- Boundary tones are used at the ends of intonational phrases to control whether the intonational tune rises or falls.
  -- L%
  -- H%
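In ToBI notation a simple tune is one or more pitch accents followed by a phrase accent and a boundary tone, e.g. H* L-L% for a plain statement and L* H-H% for a typical yes/no question. The Python sketch below checks that shape with a regular expression and reads off the final direction. The accent inventory is a subset of ToBI's, and the direction labels (L-L% fall, H-H% and L-H% rise, H-L% level) are the usual simplified descriptions rather than anything stated on the slides; function names are hypothetical.

```python
import re

# A subset of ToBI pitch accents; longer alternatives are listed first.
ACCENT = r"(?:L\+H\*|L\*\+H|H\+!H\*|!H\*|H\*|L\*)"
TUNE = re.compile(rf"^((?:{ACCENT}\s+)+)([LH]-)([LH]%)$")


def parse_tune(tune):
    """Split e.g. 'L+H* H* L-L%' into (accents, phrase accent, boundary tone)."""
    m = TUNE.match(tune.strip())
    if m is None:
        raise ValueError(f"not a well-formed tune: {tune!r}")
    return m.group(1).split(), m.group(2), m.group(3)


def final_direction(tune):
    """Simplified reading of the phrase-accent + boundary-tone combination."""
    _, phrase_accent, boundary = parse_tune(tune)
    if boundary == "H%":
        return "rise"   # H-H% (high rise) or L-H% (low rise)
    return "level" if phrase_accent == "H-" else "fall"  # H-L% vs L-L%


print(parse_tune("H* L-L%"))       # (['H*'], 'L-', 'L%')
print(final_direction("L* H-H%"))  # rise
```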
Phonetic or acoustic aspects of prosody
- Prominence: prominent syllables are generally louder and longer than non-prominent syllables.
- Prosodic structure: prosodic phrase boundaries are often accompanied by pauses, by lengthening of the syllable just before the boundary, and sometimes by lowering of pitch at the boundary.
- Tune: tune is manifested in the fundamental frequency (F0) contour.
Prosody in speech synthesis
- Major tasks for a TTS system
  -- to generate linguistic representations of prosody
  -- to generate acoustic patterns which will be manifested in the output speech waveform
- Output of a TTS system
  -- a sequence of phones, each of which has a duration and an F0 (pitch) value
  -- the duration of each phone depends on its phonetic context
  -- the F0 value is influenced by lexical stress, accented elements and the intonational tune (e.g. a final rise for questions)
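The output representation can be sketched as a list of phone records, each with a duration and an F0 value. The rules below are illustrative assumptions with made-up constants, not FESTIVAL's models: stressed phones are lengthened and raised in pitch, the final phone gets pre-boundary lengthening, F0 declines linearly across the utterance, and questions end in a final rise. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Made-up constants for illustration only.
BASE_MS = 80.0          # default phone duration
STRESS_LENGTHEN = 1.5   # stressed phones are longer
FINAL_LENGTHEN = 1.3    # lengthening just before the phrase boundary
ACCENT_BOOST_HZ = 20.0  # pitch accent on stressed phones
QUESTION_RISE_HZ = 40.0


@dataclass
class Phone:
    name: str
    stressed: bool = False
    duration_ms: float = 0.0
    f0_hz: float = 0.0


def assign_prosody(phones, question=False, start_f0=120.0, end_f0=90.0):
    """Fill in duration and F0 for each phone, in place."""
    n = len(phones)
    for i, p in enumerate(phones):
        dur = BASE_MS
        if p.stressed:
            dur *= STRESS_LENGTHEN
        if i == n - 1:
            dur *= FINAL_LENGTHEN
        p.duration_ms = dur
        # Declination: F0 falls linearly from start_f0 to end_f0.
        frac = i / (n - 1) if n > 1 else 0.0
        p.f0_hz = start_f0 + (end_f0 - start_f0) * frac
        if p.stressed:
            p.f0_hz += ACCENT_BOOST_HZ
    if question and phones:
        phones[-1].f0_hz = start_f0 + QUESTION_RISE_HZ  # final rise
    return phones


utt = [Phone("s"), Phone("iy", stressed=True)]  # "see"
for p in assign_prosody(utt, question=True):
    print(p.name, round(p.duration_ms), round(p.f0_hz))
# s 80 120
# iy 156 160
```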
The FESTIVAL speech synthesis system
- Sample: "Do you really want to see all of it?"
  (Fig. 4.26: the F0 contour of this sample as synthesized by FESTIVAL)
4.8 Human processing of phonology and morphology
- The distinction between representation via rules and representation via lexical listing is wrong.
- Data-driven models of morphological learning and representation, e.g. the English past tense -ed
  -- Connectionist model: the regular morpheme -ed emerges from its frequent interaction with other forms.
  -- Dual processing model: regular forms like -ed are represented as symbolic rules, but subregular examples (broke, brought) are represented by connectionist-style pattern associators.
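The dual processing model's division of labour can be sketched as two routes. In this minimal Python version the "pattern associator" is only a lookup table of memorised forms (a real connectionist associator would also generalize by similarity, which this sketch does not attempt), and the symbolic rule adds -ed with one spelling adjustment; consonant doubling (stop → stopped) is not handled. Names are hypothetical.

```python
# Route 1: memorised (sub)regular forms; a stand-in for a pattern associator.
MEMORISED = {"break": "broke", "bring": "brought", "go": "went"}


def past_tense(verb):
    """Return (past-tense form, route used)."""
    if verb in MEMORISED:
        return MEMORISED[verb], "memory"
    # Route 2: symbolic rule, verb + -ed (add only -d after a final e).
    return (verb + "d" if verb.endswith("e") else verb + "ed"), "rule"


print(past_tense("bring"))  # ('brought', 'memory')
print(past_tense("blick"))  # ('blicked', 'rule')
```

A novel verb such as blick has no stored form, so it falls through to the rule route; that is the behaviour the dual processing model predicts for regular and novel verbs.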