4.5 Machine learning of phonological rules

Provided by: MCS100

1
4.5 Machine learning of phonological rules
Presenter ??? (Ester)

Machine learning systems automatically induce a
model for some domain, given some data and
potentially other information.
Supervised algorithms are given correct answers
for some of the data and use these answers to
induce generalizations to apply to further data.
Unsupervised algorithms work only from data, plus
potentially some learning biases.

2
Learning rules (cont.)

Ex.: Johnson (1984) and Touretzky et al. (1990) learn
SPE-style (Sound Pattern of English) rules
from a corpus of input/output pairs.
Ex.: Gildea & Jurafsky (1996) specialize a
learning algorithm (based on the OSTIA algorithm) for
a subtype of FSTs (called subsequential
transducers) to learn two-level phonological
transducers from a corpus of input/output pairs.
Required learning biases: Faithfulness and
Community
-- Faithfulness: underlying segments tend to
be realized similarly on the surface.
-- Community: similar segments behave
similarly.
If SPE-style rules can be implemented as FSTs
automatically, why learn the FSTs directly?
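
To make the idea of learning from input/output pairs concrete, here is a minimal Python sketch. It is not OSTIA and not the algorithm of any of the cited papers: it is a toy learner that assumes each underlying form and its surface form have the same length, so segments align one to one. The faithfulness bias appears as the default that identical segments need no rule; only mismatches are recorded, together with their left and right context. The function name induce_changes and the example strings are hypothetical.

```python
from collections import defaultdict

def induce_changes(pairs):
    """Collect observed changes and the contexts they occur in.

    Returns a dict mapping (underlying, surface) segment pairs to the
    set of (left, right) contexts in which the change was seen.
    """
    changes = defaultdict(set)
    for underlying, surface in pairs:
        if len(underlying) != len(surface):
            raise ValueError("toy learner assumes one-to-one alignment")
        padded = "#" + underlying + "#"  # '#' marks the word boundary
        for i, (u, s) in enumerate(zip(underlying, surface)):
            if u != s:  # faithfulness: unchanged segments need no rule
                changes[(u, s)].add((padded[i], padded[i + 2]))
    return dict(changes)

# Hypothetical data: 't' surfaces as a flap 'D' between vowels.
pairs = [("bata", "baDa"), ("tab", "tab"), ("mitu", "miDu")]
rules = induce_changes(pairs)
```

A community bias would be the next step: generalizing the contexts ('a' _ 'a' and 'i' _ 'u') to the natural class of vowels, which this sketch does not attempt.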

3
4.6 Mapping text to phones for TTS

Text-To-Speech (TTS):
-- Map orthography to phonetic transcription
-- Add in prosody
-- Map phonetic transcription + prosody to acoustic
signal
Pronunciation dictionaries are used for both
text-to-speech (TTS) and automatic speech
recognition (ASR) systems. They give the
pronunciation of words as strings of phones,
sometimes including syllabification and stress.

4
Pronunciation dictionaries (cont.)

A list of words and their pronunciations
No morphological or phonological rules
Three large pronunciation dictionaries
Designed for ASR, but can be adapted for speech
synthesis.

5
ARPAbet phone set and corresponding IPA symbols

6
Problems for TTS and ASR dictionaries

(1) Homographs (two distinct words spelled the same)
-- TTS dictionaries: essential
-- ASR dictionaries: usually ignored
(2) Proper names: dictionaries often don't
include many proper names, e.g. Dr., 2/3.
(3) Function words: highly variable
pronunciations, e.g. and, I, a, the, of.
One significant difference between TTS and ASR:
-- TTS dictionaries don't represent dialectal
variation
-- ASR dictionaries represent dialectal
variation
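
The homograph problem can be illustrated with a small Python sketch. It assumes the TTS front end already knows the part of speech of each word, and keys the dictionary on (word, part of speech) rather than on the spelling alone. The table LEXICON, the tag names "N" and "V", and the function pronounce are hypothetical; the entries use ARPAbet symbols.

```python
# Toy pronunciation dictionary keyed by (word, part of speech).
LEXICON = {
    ("use", "N"): ["y", "uw", "s"],   # "a good use"
    ("use", "V"): ["y", "uw", "z"],   # "to use it"
    ("cat", "N"): ["k", "ae", "t"],
}

def pronounce(word, pos):
    """Return the phone list for a word with the given tag, or None."""
    return LEXICON.get((word.lower(), pos))
```

A dictionary keyed on spelling alone would have to pick one of the two pronunciations of "use", which is acceptable for recognition but audibly wrong for synthesis.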

7
Beyond dictionary lookup: Text analysis

Text analysis
The text-analysis component of a
text-to-speech system maps from orthography to
strings of phones. This is usually done with a
large dictionary augmented with a system (such as
a transducer) for handling productive morphology,
pronunciation changes, names, numbers, and
acronyms.

8
Problems for pronunciation modeling

Names
-- Impossible to list all proper names in
advance.
-- Come from any language and have variable
spellings.
-- Include not only people's names but also
company names and product names.
-- 21% of 33 million words of AP newswire
were names.
Morphological productivity: e.g. names, acronyms,
and new words are often inflected.
Numbers with different possible pronunciations
-- Serial, combined, paired, hundreds,
trailing unit
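
The different number readings can be sketched in Python. This is an illustrative sketch only: it covers three of the five strategies (serial, paired, hundreds), handles digit strings only, and the function names are hypothetical. It also ignores details such as reading "05" as "oh five".

```python
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]

def two_digit(n):
    """Spell out an integer between 0 and 99."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"

def serial(digits):
    """Read every digit separately."""
    return " ".join(ONES[int(d)] for d in digits)

def paired(digits):
    """Read the digits two at a time."""
    if len(digits) % 2:
        raise ValueError("paired reading needs an even number of digits")
    return " ".join(two_digit(int(digits[i:i + 2]))
                    for i in range(0, len(digits), 2))

def hundreds(digits):
    """Read four digits as '<first two> hundred <last two>'."""
    if len(digits) != 4:
        raise ValueError("hundreds reading is defined here for four digits")
    head, tail = int(digits[:2]), int(digits[2:])
    words = f"{two_digit(head)} hundred"
    return words if tail == 0 else f"{words} {two_digit(tail)}"
```

For the string "1776" the three functions give "one seven seven six", "seventeen seventy six" and "seventeen hundred seventy six". Which reading is right depends on whether the number is a code, a year or a quantity.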

9
FST-based approach

Five components:
1. large morpheme pronunciation dictionary,
encoded as an FST
2. FSAs for morphology (possible sequencing of
morphemes)
3. FSTs for morphophonology (like spelling change
rules), e.g., pronunciation of -s in different
contexts
4. heuristics and letter-to-sound (LTS)
rules/transducers for names and acronyms
5. default LTS rules/transducers for other
unknown words
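
Component 3 can be illustrated with the pronunciation of the plural -s. The Python sketch below is a plain function, not an FST, and only approximates the English pattern: [ih z] after sibilants, [s] after other voiceless consonants, [z] elsewhere. Phones are ARPAbet symbols; the set names and the function plural are hypothetical.

```python
SIBILANTS = {"s", "z", "sh", "zh", "ch", "jh"}
VOICELESS = {"p", "t", "k", "f", "th"}

def plural(phones):
    """Append the context-dependent pronunciation of plural -s."""
    last = phones[-1]
    if last in SIBILANTS:
        return phones + ["ih", "z"]   # fox -> foxes
    if last in VOICELESS:
        return phones + ["s"]         # cat -> cats
    return phones + ["z"]             # dog -> dogs
```

The sibilant check comes first because [s], [sh] and [ch] are themselves voiceless and would otherwise take the plain [s] suffix.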

10
Architecture

Lexical (underlying), intermediate and surface
levels all contain two tapes, one for orthography
and one for pronunciation, separated by |, e.g.
c|k.
Lexicon-FST: composed of two-level lexicon
(lexical level and intermediate level) plus
FSAs/FSTs for morphology (two levels separated by
a colon)
e.g. +PL:s|z (see Fig. 4.21-4.23)
The automaton adds the morphological features
+N and +PL at the lexical level, and also
adds the plural suffix s|z at the intermediate
level.
FST1 ... FSTn: orthographic rules (or spelling
rules) and phonological rules, all run in
parallel so as to map between this intermediate
level and the surface level.

11
Architecture (cont.)

Text-to-speech applications: mapping between
lexicon and surface form for orthography and
phonology simultaneously, e.g. text generation
and applications that read text out loud.

12
Names

Liberman & Church (1992) attempt to handle the most
frequent 250,000 name tokens:
-- Dictionary of pronunciations of 50,000
names: 59%
-- Stress-neutral suffixes (-s, -son,
-ville): 84%
-- Name-name compounds and rhyming
heuristics: 89%
-- Prefixes, stress-changing suffixes and
suffix-exchanges: ??
-- Letter-to-sound (LTS) rules for the
remainder.
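
The cascade above (dictionary first, then suffix stripping, then letter-to-sound rules) can be sketched in Python. This is a toy illustration under stated assumptions, not Liberman & Church's system: the tables NAME_DICT, SUFFIXES and NAIVE_LTS are tiny and hypothetical, the phones are approximate ARPAbet, and only two of the stress-neutral suffixes are handled.

```python
NAME_DICT = {"john": ["jh", "aa", "n"], "jack": ["jh", "ae", "k"]}
SUFFIXES = {"son": ["s", "ax", "n"], "ville": ["v", "ih", "l"]}
# Crude one-letter-one-phone fallback; unknown letters map to themselves.
NAIVE_LTS = {"a": "ae", "e": "eh", "i": "ih", "o": "ow", "u": "ah"}

def pronounce_name(name):
    """Return (phones, source), source being 'dictionary', 'suffix' or 'lts'."""
    name = name.lower()
    if name in NAME_DICT:
        return list(NAME_DICT[name]), "dictionary"
    for suffix, suffix_phones in SUFFIXES.items():
        stem = name[:-len(suffix)]
        if name.endswith(suffix) and stem:
            phones, source = pronounce_name(stem)
            if source != "lts":  # only trust stems that were really found
                return phones + suffix_phones, "suffix"
    return [NAIVE_LTS.get(ch, ch) for ch in name], "lts"
```

Because the suffix step calls the function recursively, a name like Jacksonville is resolved as jack + son + ville even though only "jack" is listed.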

13
4.7 Prosody in TTS

Prosody operates on longer linguistic units than
phones, and is the study of suprasegmental
phenomena.
Phonological aspects of prosody: prominence,
structure and tune.
(1) Prominence: stress (lexical and sentential) and
accent
-- lexical stress: e.g. table (stress on the
first syllable)
-- accent
   - unaccented: e.g. function words (there,
the, a)
   - larger accent patterns: e.g. new truck
(accent on the right-hand word)

14
Phonological aspects of prosody

(2) Prosodic structure: some words group
naturally together and some words have a
noticeable break or disjuncture between them.
Prosodic phrasing:
-- Intonational phrases: larger prosodic units
e.g. I wanted to go to London, but could only
get tickets for France.
-- Intermediate phrases: lesser prosodic
phrase boundaries
e.g. I wanted to go to London

15
Phonological aspects of prosody (cont.)

(3) Tune: intonational melody of an utterance
Intonational tunes include pitch accents.
Pitch accents occur on stressed syllables and
form a characteristic pattern in the F0 contour.

16
English pitch accents (Pierrehumbert 1980)

H* high (on a stressed syllable)
L* low (on a stressed syllable)
L*+H rise, starting on a stressed syllable
L+H* rise, ending on a stressed syllable
H+L* fall, ending on a stressed syllable
(H*+L apparently not needed)
e.g. oh, really
   - excited version (LH)
   - sceptical version (LH)
   - angry version (L*)
17
Other components of the English system

ToBI model has two phrase accents (L- and H-) and
two boundary tones (L% and H%)
Phrase accents occur at an intermediate phrase
boundary.
   - L-
   - H-
Boundary tones are used at the ends of phrases
to control whether the intonational tune rises or
falls.
   - L%
   - H%
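
As a small illustration of how these pieces combine, the Python sketch below checks a tune written in the usual ToBI spelling, where pitch accents are followed by a phrase accent fused with a boundary tone (for example H* L-L%). The pitch-accent inventory is the Pierrehumbert set listed on the previous slide; the function parse_tune and the exact string format it accepts are assumptions made for this example.

```python
PITCH_ACCENTS = {"H*", "L*", "L*+H", "L+H*", "H+L*"}
PHRASE_ACCENTS = {"L-", "H-"}
BOUNDARY_TONES = {"L%", "H%"}

def parse_tune(tune):
    """Split e.g. 'H* L-L%' into (pitch accents, phrase accent, boundary tone).

    Raises ValueError if any part is not in the inventories above.
    """
    *accents, ending = tune.split()
    phrase_accent, boundary = ending[:2], ending[2:]
    if not accents or any(a not in PITCH_ACCENTS for a in accents):
        raise ValueError(f"bad pitch accent in {tune!r}")
    if phrase_accent not in PHRASE_ACCENTS or boundary not in BOUNDARY_TONES:
        raise ValueError(f"bad phrase ending in {tune!r}")
    return accents, phrase_accent, boundary
```

For instance, parse_tune("L* H-H%") describes a low accent followed by a high phrase accent and a high boundary tone, a pattern that ends in a rise.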

18
Phonetic or acoustic aspects of prosody

Prominence: Prominent syllables are generally
louder and longer than non-prominent
syllables.
Prosodic structure: Prosodic phrase boundaries
are often accompanied by pauses, by lengthening
of the syllable just before the boundary, and
sometimes by lowering of pitch at the boundary.
Tune: Tune is manifested in the fundamental
frequency (F0) contour.

19
Prosody in speech synthesis

Major tasks for a TTS system:
-- To generate linguistic representations of
prosody
-- To generate acoustic patterns which will be
manifested in the output speech waveform
Output of a TTS system:
A sequence of phones, each of which has a
duration and an F0 (pitch) value.
-- Duration of each phone: dependent on
phonetic context
-- F0 value: influenced by lexical stress,
accented element, intonational tune (e.g. a
final rise for questions)
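
The output described above, a sequence of phones each carrying a duration and an F0 value, can be sketched as a small Python data structure. This is an assumed representation for illustration only, not the format of any particular synthesizer. The class Phone, the function add_final_rise, and all durations and F0 values are hypothetical; the function imitates a final rise for questions by raising F0 linearly over the last few phones.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Phone:
    symbol: str        # ARPAbet symbol
    duration_ms: int   # duration in milliseconds
    f0_hz: float       # target fundamental frequency in Hz

def add_final_rise(phones, span=2, rise_hz=40.0):
    """Return a copy with F0 raised linearly over the last `span` phones."""
    out = list(phones)
    start = max(0, len(out) - span)
    n = len(out) - start
    for k, i in enumerate(range(start, len(out)), start=1):
        out[i] = replace(out[i], f0_hz=out[i].f0_hz + rise_hz * k / n)
    return out

# Hypothetical flat contour for the word "it".
flat = [Phone("ih", 60, 120.0), Phone("t", 50, 120.0)]
rising = add_final_rise(flat)
```

With the default settings the two phones end up at 140 Hz and 160 Hz, while the original list is left unchanged.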

20
The FESTIVAL speech synthesis system

Sample: Do you really want to see all of it?

Fig. 4.26 The F0 contour of the FESTIVAL

21
4.8 Human processing of phonology and morphology

The distinction of representation via rules from
representation via lexical listing: wrong.
Data-driven models of morphological learning and
representation, e.g. English past tense -ed
-- Connectionist model: the regular morpheme -ed
emerges from its frequent interaction with other
forms.
-- Dual processing model: regular forms like
-ed are represented as symbolic rules, but
subregular examples (broke, brought) are
represented by connectionist-style pattern
associators.
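
The division of labour in the dual processing model can be sketched in Python. This is only a schematic illustration: a lookup table stands in for the pattern associator (a real one would be a trained network that generalizes to similar-sounding verbs), and the regular route is a simplified spelling rule. The names IRREGULAR and past_tense are hypothetical.

```python
# Stand-in for the pattern associator: memorized subregular forms.
IRREGULAR = {"break": "broke", "bring": "brought"}

def past_tense(verb):
    """Use the memorized form if there is one, else apply the -ed rule."""
    if verb in IRREGULAR:
        return IRREGULAR[verb]
    if verb.endswith("e"):
        return verb + "d"     # bake -> baked
    return verb + "ed"        # walk -> walked
```

A purely connectionist model would replace both routes with a single network, so that the -ed pattern is never stated as a rule at all.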