4.5 Machine learning of phonological rules - PowerPoint PPT Presentation

Slides: 22
Provided by: MCS100
Transcript and Presenter's Notes

1
4.5 Machine learning of phonological rules
Presenter ??? (Ester)
  • Machine learning systems automatically induce a
    model for some domain, given some data and
    potentially other information.
  • Supervised algorithms are given correct answers
    for some of the data and use these answers to
    induce generalizations to apply to further data.
  • Unsupervised algorithms work only from data, plus
    potentially some learning biases.

2
Learning rules (cont.)
  • Ex.: Johnson (1984) and Touretzky et al. (1990)
    learn SPE (Sound Pattern of English)-style rules
    from a corpus of input/output pairs.
  • Ex.: Gildea & Jurafsky (1996) specialize a
    learning algorithm (based on the OSTIA algorithm)
    for a subtype of FSTs (called subsequential
    transducers) to learn two-level phonological
    transducers from a corpus of input/output pairs.
  • Required learning biases: Faithfulness and
    Community
  • -- Faithfulness: underlying segments tend to
    be realized similarly on the surface.
  • -- Community: similar segments behave
    similarly.
  • If SPE-style rules can be implemented as FSTs
    automatically, why learn the FSTs directly?
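
To make concrete what an SPE-style rule does with an input/output pair, the sketch below applies one context-sensitive rewrite rule of the form "a -> b / left _ right" to a phone sequence. It illustrates rule application only, not the learning algorithms of Johnson or Gildea & Jurafsky. The flapping rule, the ARPAbet-like symbols, and the names Rule and apply_rule are assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical phone classes for the example (ARPAbet-like symbols).
STRESSED_VOWELS = frozenset({"aa1", "ae1", "eh1", "ih1", "iy1", "ao1", "ah1"})
UNSTRESSED_VOWELS = frozenset({"ax", "er", "iy0", "ih0"})


@dataclass(frozen=True)
class Rule:
    """An SPE-style rule: target -> output / left _ right."""
    target: str
    output: str
    left: frozenset
    right: frozenset


def apply_rule(rule, phones):
    """Rewrite every occurrence of rule.target found in the given context.

    Contexts are checked against the underlying (input) sequence, so all
    matches are rewritten simultaneously.
    """
    result = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] == rule.target
                and phones[i - 1] in rule.left
                and phones[i + 1] in rule.right):
            result[i] = rule.output
    return result


# Flapping: t -> dx / stressed vowel _ unstressed vowel
flapping = Rule("t", "dx", STRESSED_VOWELS, UNSTRESSED_VOWELS)

underlying = ["b", "ah1", "t", "er"]        # "butter"
surface = apply_rule(flapping, underlying)  # ['b', 'ah1', 'dx', 'er']
```

The underlying and surface strings differ in a single segment, which is the kind of regularity the Faithfulness bias lets a learner exploit.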

3
4.6 Mapping text to phones for TTS
  • Text-To-Speech (TTS)
  • Map orthography to phonetic transcription
  • Add in prosody
  • Map phonetic transcription and prosody to acoustic
    signal
  • Pronunciation dictionaries are used for both
    text-to-speech (TTS) and automatic speech
    recognition (ASR) systems. They give the
    pronunciation of words as strings of phones,
    sometimes including syllabification and stress.

4
Pronunciation dictionaries (cont.)
  • A list of words and their pronunciations
  • No morphological or phonological rules
  • Three large pronunciation dictionaries
  • Designed for ASR, but can be adapted for speech
    synthesis.
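
A pronunciation dictionary of this kind can be sketched as a plain mapping from words to phone strings. The entries, the stress-digit convention (1 = primary stress, 0 = unstressed), and the helper names below are illustrative assumptions, not taken from any of the dictionaries the slide refers to.

```python
# Toy pronunciation dictionary: word -> list of ARPAbet-like phones.
# Digits on vowels mark lexical stress (1 = primary, 0 = unstressed).
# The entries are made up for illustration.
PRON_DICT = {
    "cat": ["k", "ae1", "t"],
    "table": ["t", "ey1", "b", "ax0", "l"],
    "speech": ["s", "p", "iy1", "ch"],
}


def lookup(word):
    """Return the phone list for a word, or None if it is not listed."""
    return PRON_DICT.get(word.lower())


def stress_pattern(phones):
    """Extract the stress digits carried by the vowels of a pronunciation."""
    return [int(p[-1]) for p in phones if p[-1].isdigit()]
```

Because the dictionary has no morphological rules, lookup("tables") returns None even though "table" is listed; every inflected form would need its own entry.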

5
ARPAbet phone set and corresponding IPA symbols

6
Problems for TTS and ASR dictionaries
  • (1) Homographs (two distinct words spelled the
    same)
  • -- TTS dictionaries: essential
  • -- ASR dictionaries: usually ignored
  • (2) Proper names: dictionaries often don't
    include many proper names, e.g. Dr., 2/3.
  • (3) Function words: highly variable
    pronunciations, e.g. and, I, a, the, of, etc.
  • One significant difference between TTS and ASR:
  • -- TTS dictionaries don't represent dialectal
    variation
  • -- ASR dictionaries represent dialectal
    variation
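
One way a TTS dictionary can keep homographs apart is to key entries on spelling plus some disambiguating label. The sketch below uses a part-of-speech tag, assumed to come from an upstream tagger that is not shown; the entries, tag names, and the pronounce function are hypothetical.

```python
# Toy homograph entries keyed by (spelling, part of speech).
# Phones are ARPAbet-like and illustrative only.
HOMOGRAPHS = {
    ("use", "NOUN"): ["y", "uw", "s"],
    ("use", "VERB"): ["y", "uw", "z"],
    ("live", "VERB"): ["l", "ih", "v"],
    ("live", "ADJ"): ["l", "ay", "v"],
}


def pronounce(word, pos):
    """Return the pronunciation for this reading of the word, or None."""
    return HOMOGRAPHS.get((word.lower(), pos))
```

A TTS system has to commit to one pronunciation, so the key needs the extra label; a recognizer that only maps sound back to spelling can list both pronunciations under the same word.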

7
Beyond dictionary lookup: text analysis
  • Text analysis
  • The text-analysis component of a
    text-to-speech system maps from orthography to
    strings of phones. This is usually done with a
    large dictionary augmented with a system (such as
    a transducer) for handling productive morphology,
    pronunciation changes, names, numbers, and
    acronyms.

8
Problems for pronunciation modeling
  • Names
  • -- Impossible to list all proper names in
    advance.
  • -- Come from any language and have variable
    spellings.
  • -- Include not only people's names but also
    company names and product names.
  • -- 21% of 33 million words of AP newswire
    were names.
  • Morphological productivity: e.g. names, acronyms
    and new words are often inflected.
  • Numbers with different possible pronunciations
  • -- Serial, combined, paired, hundreds,
    trailing unit
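
The sketch below shows how one digit string can receive several of these readings. The interpretations are assumptions: "serial" reads digit by digit, "paired" reads two-digit chunks, and "hundreds" reads a four-digit string as "NN hundred NN". The combined and trailing-unit readings are left out, and pairs with a leading zero (such as "05") are not treated specially.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digit(n):
    """Spell out an integer in the range 0-99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def serial(digits):
    """Read each digit separately, as in a phone number."""
    return " ".join(ONES[int(d)] for d in digits)


def paired(digits):
    """Read the digits as two-digit pairs (an odd leading digit stands alone)."""
    start = len(digits) % 2
    chunks = [digits[:start]] if start else []
    chunks += [digits[i:i + 2] for i in range(start, len(digits), 2)]
    return " ".join(two_digit(int(c)) for c in chunks)


def hundreds(digits):
    """Read a four-digit string as 'NN hundred NN'."""
    high, low = int(digits[:2]), int(digits[2:])
    words = f"{two_digit(high)} hundred"
    return words if low == 0 else f"{words} {two_digit(low)}"
```

For the string "1750", serial gives "one seven five zero", paired gives "seventeen fifty", and hundreds gives "seventeen hundred fifty". Choosing among them depends on context (phone number, year, price) that the digit string alone does not carry.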

9
FST-based approach
  • Five components
  • 1. large morpheme pronunciation dictionary,
    encoded as an FST
  • 2. FSAs for morphology (possible sequencing of
    morphemes)
  • 3. FSTs for morphophonology (like spelling change
    rules), e.g., pronunciation of -s in different
    contexts
  • 4. heuristics and letter-to-sound (LTS) rules/
    transducers for names and acronyms
  • 5. default LTS rules/transducers for other
    unknown words
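
Component 3 can be illustrated with the pronunciation of plural -s, which depends on the final phone of the stem. The sketch below is a plain function rather than an FST, and the phone classes and names are simplified assumptions.

```python
# ARPAbet-like phone classes (illustrative, not exhaustive).
SIBILANTS = {"s", "z", "sh", "zh", "ch", "jh"}
VOICELESS = {"p", "t", "k", "f", "th"}


def plural_suffix(stem_phones):
    """Pick the pronunciation of plural -s from the stem's final phone."""
    last = stem_phones[-1]
    if last in SIBILANTS:
        return ["ih", "z"]   # foxes, churches
    if last in VOICELESS:
        return ["s"]         # cats
    return ["z"]             # dogs, and stems ending in a vowel


def pluralize(stem_phones):
    """Append the context-appropriate plural suffix to a stem."""
    return stem_phones + plural_suffix(stem_phones)
```

The sibilant check has to come first, because "s" and "ch" would otherwise fall through to the voiceless or default case and receive the wrong suffix.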

10
Architecture
  • Lexical (underlying), intermediate and surface
    levels all contain two tapes, one for orthography
    and one for pronunciation, separated by , e.g.
    ck.
  • Lexicon-FST: composed of a two-level lexicon
    (lexical level and intermediate level) plus
    FSAs/FSTs for morphology (two levels separated by
    the )
  • e.g. PL? sz (see Fig. 4.21-4.23)
  • The automaton adds the morphological features
    N and PL at the lexical level, and also
    adds the plural suffix sz at the intermediate
    level.
  • FST1 ... FSTn: orthographic rules (or spelling
    rules) and phonological rules, all run in
    parallel so as to map between this intermediate
    level and the surface level.

11
Architecture (cont.)
  • Text-to-speech applications: mapping between
    lexicon and surface form for orthography and
    phonology simultaneously, e.g. text generation,
    applications that read text aloud

12
Names
  • Liberman & Church (1992) attempt to handle the
    most frequent 250,000 name tokens (coverage
    reached after each step):
  • -- Dictionary of pronunciations of 50,000
    names: 59%
  • -- Stress-neutral suffixes (-s, -son,
    -ville): 84%
  • -- Name-name compounds and rhyming
    heuristics: 89%
  • -- Prefixes, stress-changing suffixes and
    suffix-exchanges: ??
  • -- Letter-to-sound (LTS) rules for the
    remainder.
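
The back-off order above can be sketched as a cascade: try the name dictionary, then strip a stress-neutral suffix and retry, and only then fall back to letter-to-sound rules. This is a simplified illustration, not Liberman & Church's system. The dictionary entries and suffix pronunciations are assumptions, and letter_to_sound is a placeholder that returns one symbol per letter.

```python
# Toy name dictionary and stress-neutral suffixes (assumed pronunciations).
NAME_DICT = {
    "walter": ["w", "ao1", "l", "t", "er0"],
    "jack": ["jh", "ae1", "k"],
}
SUFFIXES = {
    "son": ["s", "ax0", "n"],
    "ville": ["v", "ih0", "l"],
}


def letter_to_sound(name):
    """Placeholder for real LTS rules: one symbol per letter."""
    return list(name)


def pronounce_name(name):
    """Return (phones, source), trying the cheapest reliable source first."""
    name = name.lower()
    if name in NAME_DICT:
        return list(NAME_DICT[name]), "dictionary"
    for suffix, suffix_phones in SUFFIXES.items():
        if name.endswith(suffix) and len(name) > len(suffix):
            stem_phones, source = pronounce_name(name[:-len(suffix)])
            if source != "lts":
                # Stress-neutral: the stem's stress pattern is kept as is.
                return stem_phones + suffix_phones, "suffix"
    return letter_to_sound(name), "lts"
```

With these entries, "Jacksonville" resolves through two suffix steps back to the dictionary entry for "jack", while "Smith" falls through to the LTS placeholder.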

13
4.7 Prosody in TTS
  • Prosody operates on longer linguistic units than
    phones, and is the study of suprasegmental
    phenomena.
  • Phonological aspects of prosody: prominence,
    structure and tune.
  • (1) Prominence: stress (lexical and sentential)
    and accent
  • -- lexical stress: e.g. table (stress on the
    first syllable)
  • -- accent:
  • ---- unaccented: e.g. function words (there,
    the, a)
  • ---- larger accent patterns: e.g. new truck
    (accent on the right word)

14
Phonological aspects of prosody
  • (2) Prosodic structure: some words group
    naturally together and some words have a
    noticeable break or disjuncture between them.
  • Prosodic phrasing:
  • -- Intonational phrases: larger prosodic units
  • e.g. I wanted to go to London, but could only
    get tickets for France.
  • -- Intermediate phrases: lesser prosodic
    phrase boundaries
  • e.g. I wanted to go to London

15
Phonological aspects of prosody (cont.)
  • (3) Tune: the intonational melody of an utterance
  • Intonational tunes include pitch accents.
  • Pitch accents occur on stressed syllables and
    form a characteristic pattern in the F0 contour.

16
English pitch accents (Pierrehumbert 1980)
  • H*: high (on a stressed syllable)
  • L*: low (on a stressed syllable)
  • L*+H: rise, starting on a stressed syllable
  • L+H*: rise, ending on a stressed syllable
  • H+L*: fall, ending on a stressed syllable
  • (H*+L apparently not needed)
  • e.g. "oh, really":
  • -- excited version (LH)
  • -- sceptical version (LH)
  • -- angry version (L)

17
Other components of the English system
  • The ToBI model has two phrase accents (L- and H-)
    and two boundary tones (L% and H%)
  • Phrase accents occur at an intermediate phrase
    boundary.
  • -- L-
  • -- H-
  • Boundary tones are used at the ends of phrases
    to control whether the intonational tune rises or
    falls.
  • -- L%
  • -- H%

18
Phonetic or acoustic aspects of prosody
  • Prominence: prominent syllables are generally
    louder and longer than non-prominent
    syllables.
  • Prosodic structure: prosodic phrase boundaries
    are often accompanied by pauses, by lengthening
    of the syllable just before the boundary, and
    sometimes by lowering of pitch at the boundary.
  • Tune: tune is manifested in the fundamental
    frequency (F0) contour.

19
Prosody in speech synthesis
  • Major tasks for a TTS system:
  • -- To generate linguistic representations of
    prosody
  • -- To generate acoustic patterns which will be
    manifested in the output speech waveform
  • Output of a TTS system:
  • A sequence of phones, each of which has a
    duration and an F0 (pitch) value.
  • -- Duration of each phone: dependent on
    phonetic context
  • -- F0 value: influenced by lexical stress,
    accented element, intonational tune (e.g. a
    final rise for questions)
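
The output described above can be sketched as a list of phone records, each carrying a duration and an F0 value. The Phone class, the add_final_rise helper, and all numeric values below are assumptions made for illustration. The linear rise is only one simple way to model a question-final rise and is not how FESTIVAL or any specific system computes F0.

```python
from dataclasses import dataclass


@dataclass
class Phone:
    symbol: str
    duration_ms: float
    f0_hz: float


def add_final_rise(phones, rise_hz=40.0, span=3):
    """Return a copy with F0 raised linearly over the last `span` phones."""
    span = min(span, len(phones))
    start = len(phones) - span
    out = []
    for i, p in enumerate(phones):
        # Phones before `start` are unchanged; the last phone gets the full rise.
        boost = rise_hz * (i - start + 1) / span if i >= start else 0.0
        out.append(Phone(p.symbol, p.duration_ms, p.f0_hz + boost))
    return out


# Flat 120 Hz contour with made-up durations for "do you".
flat = [Phone("d", 60, 120.0), Phone("uw", 90, 120.0),
        Phone("y", 50, 120.0), Phone("uw", 110, 120.0)]
question = add_final_rise(flat)
```

Here the first phone stays at 120 Hz and the last one ends at 160 Hz, while the durations are left untouched. A fuller model would also adjust durations, for example lengthening the syllable before a phrase boundary.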

20
The FESTIVAL speech synthesis system
  • Sample: "Do you really want to see all of it?"

Fig. 4.26 The F0 contour of the FESTIVAL

21
4.8 Human processing of phonology and morphology
  • The distinction between representation via rules
    and representation via lexical listing: wrong
  • Data-driven models of morphological learning and
    representation, e.g. English past tense -ed:
  • -- Connectionist model: the regular morpheme -ed
    emerges from its frequent interaction with other
    forms.
  • -- Dual processing model: regular forms like
    -ed are represented as symbolic rules, but
    subregular examples (broke, brought) are
    represented by connectionist-style pattern
    associators.
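
The two routes of the dual processing model can be caricatured in a few lines. In this sketch a plain lookup table stands in for the connectionist-style pattern associator, which is a deliberate simplification, and the -ed rule ignores spelling changes such as consonant doubling. The table entries and the function name are assumptions.

```python
# Stand-in for the pattern associator: a small table of listed forms.
IRREGULAR_PAST = {"break": "broke", "bring": "brought", "go": "went"}


def past_tense(verb):
    """Dual-route sketch: listed forms first, otherwise the symbolic -ed rule."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"
```

The rule route applies to any verb not in the table, including unseen ones, which is the property the dual processing model attributes to regular -ed.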