Title: Goals and Objectives


1
From Here to Utility: Melding Phonetic Insight
With Speech Technology
Steven Greenberg
International Computer Science Institute
1947 Center Street, Berkeley, CA 94704
http://www.icsi.berkeley.edu/steveng
steveng@icsi.berkeley.edu
2
Acknowledgements and Thanks
Automatic Feature Classification and Analysis:
Joy Hollenback, Shawn Chang, Leah Hitchcock
Research Funding: U.S. National Science
Foundation, U.S. Department of Defense
8
Road Map of the Presentation
  • What is Truth?
  • The story of Rashomon, a film by Akira Kurosawa
  • Its application to spoken language
  • The Varieties of Scientific Experience
  • The Fundamental Duality
  • The Eternal Pentangle
  • The Inner Triangle
  • The Importance of Being Phonetically Annotated
  • A Corpus-Centric Perspective on Spoken Language
  • Phonetic Annotation of Spontaneous American
    English Discourse
  • Phonetic Dissection of Automatic Speech
    Recognition Systems
  • Stress Accent and Word Error Rate
  • Syllable Structure and Word Error Rate
  • The Relation Between Stress Accent and Vocalic
    Identity
  • The Relation Between Segmental Duration and Vowel
    Height
  • Durational Differences Between Stressed and
    Unstressed Vowels
  • The Relation Between Vowel Height and Stress
    Accent
  • Spoken Language: What is Truth?
  • Fundamental Questions Remain Unanswered

9
Part One: WHAT IS TRUTH?
  • The Story of Rashomon
  • Its Moral for the Study of Spoken Language

10
Rashomon: What is Truth?
It is twelfth-century Japan, and a nobleman has
died.
11
Rashomon: What is Truth?
This we learn from a conversation between a
woodcutter, a priest and a peasant under a gate
in the ancient city of Kyoto.
13
Rashomon: What is Truth?
The woodcutter and the priest have just come from
a judicial inquest into the death, and are
telling the peasant what they have heard. The
woodcutter testified at the inquest, having
witnessed the sequence of events resulting in the
nobleman's death.
14
Rashomon: What is Truth?
The story begins with the capture of the
notorious bandit, Tajomaru, who is the accused in
the nobleman's death.
15
Rashomon: What is Truth?
The nobleman and his wife had been traveling
through the forest.
17
Rashomon: What is Truth?
When, all of a sudden, they are confronted by
Tajomaru, who halts their progress.
18
Rashomon: What is Truth?
The nobleman and bandit go off alone into a
thicket, where the former winds up being subdued
by the latter.
19
Rashomon: What is Truth?
The nobleman is tied to a tree and forced to
watch as his wife is violated by the bandit.
20
Rashomon: What is Truth?
The wife, at first, resists.
21
Rashomon: What is Truth?
But eventually drops the dagger and submits.
22
Rashomon: What is Truth?
So far, all parties concerned agree (roughly) as
to the course of events, but from this point on
the picture becomes murky, with each participant
telling a somewhat different version of the
story.
23
Rashomon: What is Truth?
In two versions (Tajomaru's and the woodcutter's)
the wife insists that her husband and the bandit
fight for her honor. The nobleman's death results
from losing the duel.
24
Rashomon: What is Truth?
In the wife's version, the bandit departs, with
the husband still tied to the tree. The husband
proceeds to taunt his wife, telling her how
ashamed he is of her!
25
Rashomon: What is Truth?
She cuts the rope binding her husband to the tree
and asks to be killed! The wife promptly faints
and, when she awakens, finds the dagger in the
chest of her (now very dead) husband.
26
Rashomon: What is Truth?
In yet another version (the husband's, through a
spirit medium) his wife betrays him and tries to
convince the bandit to kill the husband.
29
Rashomon: What is Truth?
However, the bandit is repulsed by this
suggestion and quickly departs. The nobleman,
still tied to the tree, picks up the dagger and
plunges it into his chest, thus taking his own
life. Some time later the (now very dead) nobleman
is aware of someone (it is not clear who)
removing the dagger from his chest.
32
Rashomon: What is Truth?
The film ends as the priest, woodcutter and
peasant mull over the significance of the
disparate accounts of the nobleman's death,
seeking some kernel of truth in the morass of
ambiguity and uncertainty. It is unclear whether
ANY witness has been entirely truthful (probably
not).
37
Rashomon: What is Truth?
The story of Rashomon is cited often in
philosophical discussions of truth. As nothing
is known (or knowable) with absolute certainty,
all knowledge is relative (and hence ephemeral).
The concept of truth is a chimera and therefore
unworthy of pursuit.
40
Rashomon: What is Truth?
Yet, there is an alternative interpretation, one
that questions not the concept of truth itself,
but rather the capacity of its assimilation
through a single vantage point. Perhaps the true
message of Rashomon is that deep and everlasting
knowledge can only be gained through exposure to
a variety of perspectives, no single source
providing sufficient depth and detail to
comprehend a situation as complex (and as tragic)
as the murder of a man.
45
Spoken Language: What is Truth?
Can an intellectual domain as complex as spoken
language be fully understood through the
testimony of a single perspective? Or must
orthogonal varieties of evidence be sought with
which to reconstruct the truth? How does true
insight proceed from objective study of spoken
language? Is it possible to fully comprehend the
multivocal nature of a scientific domain from the
sole vantage point of a laboratory? Or does the
spirit of Rashomon compel us to seek testimony
from other sources in the pursuit of objective
knowledge?
46
Part Two: THE VARIETIES OF SCIENTIFIC
EXPERIENCE
  • The Fundamental Duality
  • The Eternal Pentangle
  • The Inner Triangle

51
The Fundamental Duality
  • Technology and science appear to oppose each
    other in perspective
  • Technology is concerned with what works (and can
    sell)
  • Science is concerned with what is (and can be
    published)

The Art of the Sellable
The Art of the Workable
The Art of the Soluble
The Art of the Publishable
55
The Fundamental Duality
  • There is an essential tension between Science
    and Technology
  • Science is often deemed pure
  • Technology is usually perceived as applied (and
    therefore not quite as pure)

The Art of the Sellable
The Art of the Workable
The Art of the Soluble
The Art of the Publishable
61
The Eternal Pentangle
  • Speech Research Provides an Excellent Example of
    the Tension between Science and Technology
  • Phonetic insight is on the side of the angels
    (a.k.a. science)
  • While speech technology is on the side of the
    apes (a.k.a. the real world)

The Real World
Phonetic Insight
67
The Inner Triangle
  • The Inner Triangle of the Eternal Pentangle Can
    Potentially Shed Light on this Philosophical (and
    Methodological) Conundrum
  • Manual annotation provides the empirical
    foundation with which to train machine
    algorithms
  • Statistical characterization of the annotated
    material provides the basis for structuring the
    machine learning regime
  • Machine learning provides a method for evaluating
    phonetic knowledge
  • Phonetic knowledge can be used to efficiently
    train machine algorithms
  • Statistical characterization can serve as a
    reality check on phonetic knowledge
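
As a rough illustration of what "statistical characterization of the annotated material" could look like in practice, the sketch below groups hand-labeled vowel tokens by stress level and reports the mean duration of each group. The token list, the labels, the durations and the function name `mean_duration_by_stress` are all invented for this example and are not taken from the presentation or from any corpus.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hand-labeled vowel tokens: (vowel label, stress level, duration in ms).
# Every value here is made up for illustration.
ANNOTATED_VOWELS = [
    ("iy", "stressed", 120.0),
    ("iy", "unstressed", 70.0),
    ("ae", "stressed", 150.0),
    ("ae", "unstressed", 90.0),
    ("iy", "stressed", 100.0),
]


def mean_duration_by_stress(tokens):
    """Group tokens by stress level and return the mean duration of each group."""
    groups = defaultdict(list)
    for _label, stress, duration_ms in tokens:
        groups[stress].append(duration_ms)
    return {stress: mean(durations) for stress, durations in groups.items()}


summary = mean_duration_by_stress(ANNOTATED_VOWELS)
for stress, avg in sorted(summary.items()):
    print(f"{stress}: {avg:.1f} ms")
```

A summary of this kind is one way a statistical view of annotated data might act as a reality check on a phonetic expectation, for example that stressed vowels tend to be longer than unstressed ones.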

74
The Inner Triangle
  • Thus, the three apices of the Inner Triangle feed
    into each other and provide insight and
    perspective difficult to achieve from a single
    vantage point
  • In a manner analogous to Rashomon, insight may be
    gained from this multi-dimensional perspective
    that deepens our knowledge of spoken language
  • And thus enables the development of superior
    technology that truly works in the real world
  • The development of sterling technology provides
    (in principle) a means to fund further basic
    technology-driven research
  • And that, in turn, results in further
    technological advances
  • And so on (forever after)

75
Part Three: THE IMPORTANCE OF BEING PHONETICALLY
ANNOTATED
  • A Corpus-Centric Perspective on Spoken Language
  • Phonetic Annotation of Spontaneous American
    English Discourse

89
Phonetic Annotation is Useful, Because
  • Many Properties of Spontaneous Spoken Language
    Differ from Those of Laboratory and Citation
    Speech
  • There are systematic patterns in real speech
    that potentially reveal underlying principles
    of linguistic organization
  • Such Corpora Provide Empirical Material for the
    Study of Spoken Language
  • Such data provide an important basis for
    scientific insight and understanding
  • And facilitate development of new models of
    spoken language
  • They Also Provide Training Material for
    Technology Applications in
  • Automatic speech recognition, particularly
    pronunciation models
  • Speech synthesis, in pronunciation models as well
    as in
  • Cross-linguistic transfer of technology
    algorithms, etc.
  • They Promote Development of NOVEL Algorithms for
    Speech Technology
  • Including pronunciation models and lexical
    representations for automatic speech
    recognition and speech synthesis, as well as
  • Multi-tier representations of spoken language
  • All of Which Can be Used for Gaining Further
    Insight into Spoken Language

94
Corpus-Centric View of Spoken Language
Each Tier of Linguistic Organization Provides a
Unique Perspective. However, integrating the
annotated material across levels is tricky. And
a lot of work!! Let's Focus on a Specific Aspect
of Linguistic Organization in Order to Exemplify
the Concepts Involved. In order to do so, we first
consider the nature of the transcription material
used.
104
Phonetic Transcription of Spontaneous English
  • Telephone dialogues of 5-10 minutes duration,
    from the SWITCHBOARD corpus, have been
    phonetically annotated (labeled and segmented)
  • Most of this Material has been Manually
    Annotated
  • 4 hours labeled at the phone level and
    segmented at the syllabic level
  • 1 hour labeled and segmented at the
    phonetic-segment level
  • The remaining material has been segmented at
    the phonetic-segment level using automatic
    methods
  • 45 minutes of stress-accent-labeled material
  • An additional four hours of material
    automatically labeled with respect to accent
    (this latter material was not used in the
    current analysis, but will be available soon)
  • There is a Lot of Diversity in the Material
    Transcribed
  • Spans speech of both genders (ca. 50/50),
    reflecting a wide range of American dialectal
    variation, speaking rate and voice quality
  • Transcription System: a variant of Arpabet,
    with phonetic diacritics such as _gl, _cr,
    _fr, _n, _vl, _vd
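
Labels of this kind can be handled in software by splitting each one into a base phone and its diacritic suffixes. The following is only a minimal sketch, assuming that diacritics are appended to the base symbol with an underscore, as the examples on the slide suggest. The function name `split_label` and the sample labels are hypothetical and are not taken from the corpus tools.

```python
from typing import List, Tuple

# Diacritic suffixes named on the slide; the full inventory may be larger.
KNOWN_DIACRITICS = {"gl", "cr", "fr", "n", "vl", "vd"}


def split_label(label: str) -> Tuple[str, List[str]]:
    """Split a label such as 't_gl' into ('t', ['gl']).

    Assumes (hypothetically) that diacritics follow the base phone,
    each introduced by an underscore.
    """
    base, *suffixes = label.split("_")
    unknown = [s for s in suffixes if s not in KNOWN_DIACRITICS]
    if unknown:
        raise ValueError(f"unrecognized diacritic(s) in {label!r}: {unknown}")
    return base, suffixes
```

Rejecting unknown suffixes, rather than passing them through silently, makes labeling typos visible early.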
113
Phonetic Transcription of Spontaneous English
The Data are Available at
http://www.icsi.berkeley.edu/real/stp
This Means there is Phonetically Validated
Material at the Level of the
  • WORD
  • SYLLABLE
  • PHONETIC SEGMENT
  • ARTICULATORY-ACOUSTIC FEATURE
  • and STRESS ACCENT
(as well as at the utterance level)
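
The tiers listed above nest naturally: words contain syllables, syllables contain phonetic segments, and segments carry articulatory-acoustic features. The following is a rough sketch of one way to hold such material in memory. All class and field names, the sample word and its timings are hypothetical; the corpus's actual file format is not described in this presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Phone:
    label: str    # phonetic-segment label
    start: float  # segment onset, in seconds
    end: float    # segment offset, in seconds
    # Articulatory-acoustic features, e.g. {"manner": "stop"} (hypothetical keys)
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class Syllable:
    phones: List[Phone]
    stress: str = "unmarked"  # stress-accent label, where one is available


@dataclass
class Word:
    text: str
    syllables: List[Syllable]

    def phones(self) -> List[Phone]:
        """Flatten the syllable tier into a single phone sequence."""
        return [p for s in self.syllables for p in s.phones]

    def span(self) -> Tuple[float, float]:
        """Word boundaries derived from its first and last phone."""
        ps = self.phones()
        return ps[0].start, ps[-1].end


# Hypothetical example: a two-syllable word with invented timings.
about = Word("about", [
    Syllable([Phone("ax", 0.00, 0.05)], stress="unstressed"),
    Syllable([Phone("b", 0.05, 0.11), Phone("aw", 0.11, 0.25),
              Phone("t", 0.25, 0.31)], stress="stressed"),
])
```

Deriving word boundaries from the phone tier, instead of storing them separately, keeps the tiers consistent with one another by construction.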
114
The Eternal Pentangle (Redux)
Let's re-examine the eternal triangle from the
perspective of manual annotation for three
linguistic tiers.
123
Phonetic Transcription
How was the Labeling and Segmentation Performed?
VERY carefully ... by UC-Berkeley linguistics
students
  • Using a display of the signal waveform,
    spectrogram, word transcription and forced
    alignments (automatic estimates of phones and
    boundaries)
  • audio (listening at multiple time scales -
    phone, word, utterance)
  • on Sun workstations
Additionally, automatic segmentation and labeling
of articulatory manner was used as a guide for
phonetic labeling and segmentation in the current
year
125
Phonetic Transcription
In addition to phonetic labels and syllabic
segmentation, 45 minutes of this material was
labeled with respect to stress accent for each
syllable
Three levels of stress were marked - Fully
Stressed, Unstressed and Intermediate Stress
129
Phonetic Transcription
Such material can be used to perform statistical
characterization of spontaneous speech, as well
as to train machine algorithms to label and
segment additional material
In addition, the transcription material can be
used to evaluate the performance of automatic
speech recognition systems
Let's first consider how this transcription can
be used for ASR evaluation
We'll focus on stress accent, but then relate
this to syllable structure
130
Part Four: PHONETIC DISSECTION OF AUTOMATIC
SPEECH RECOGNITION SYSTEMS
A Case Study
  • Stress Accent and Word Error Rate
  • Syllable Structure and Word Error Rate
In Collaboration with Shawn Chang
131
The Eternal Pentangle (Redux)
Let's re-examine the eternal triangle from the
perspective of automatic speech recognition.
132
Generation of Evaluation Data - 1
A complex sequence of data formatting was
required to place the speech recognition data
of 8 separate sites into register with the
transcription material (and vice versa)
134
Generation of Evaluation Data - 2
Let's not sweat the details during this
presentation
Interested parties may consult the relevant
papers (Greenberg, Hollenback and Chang, 2000;
Greenberg and Chang, 2000)
at www.icsi.berkeley.edu/steveng
135
Generation of Evaluation Data - 3
Recognition performance was analyzed with
reference to ca. 50 separate acoustic,
linguistic and structural parameters
136
Summary of Corpus Acoustic Properties
  • LEXICAL PROPERTIES
      • Lexical Identity
      • Unigram Frequency
      • Number of Syllables in Word
      • Number of Phones in Word
      • Word Duration
      • Speaking Rate
      • Prosodic Prominence
      • Energy Level
      • Lexical Compounds
      • Non-Words
      • Word Position in Utterance
  • SYLLABLE PROPERTIES
      • Syllable Structure
      • Syllable Duration
      • Syllable Energy
      • Prosodic Prominence
      • Prosodic Context
  • PHONE PROPERTIES
      • Phonetic Identity
      • Phone Frequency
      • Position within the Word
      • Position within the Syllable
      • Phone Duration
      • Speaking Rate
      • Phonetic Context
      • Contiguous Phones Correct
      • Contiguous Phones Wrong
      • Phone Segmentation
      • Articulatory Features
      • Articulatory Feature Distance
      • Phone Confusion Matrices
  • OTHER PROPERTIES
      • Speaker (Dialect, Gender)
      • Utterance Difficulty
      • Utterance Energy
      • Utterance Duration

142
What is (usually) Meant by Stress Accent?
  • Prosody is supposed to pertain to extra-phonetic
    cues in the acoustic signal
  • The pattern of variation over a sequence of
    SYLLABLES pertaining to syllabic DURATION,
    AMPLITUDE and PITCH (f0) variation over time
  • But the plot thickens (considerably) ... as
    we'll shortly see

144
Stress Accent and Word Error Rate
The effect of stress accent is most discernible
among word-deletion errors
There is no essential relation between accent and
word-substitution errors
  • Data are averaged across all eight sites
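
A breakdown of this kind can be computed by tallying recognition outcomes per stress level at each site and then averaging the per-site rates. The following is a minimal sketch under stated assumptions: each reference word is taken to have already been tagged with a stress level and a recognition outcome, and the record layout, the function name `error_rates` and the sample data are all hypothetical. It is not the procedure used in the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (site, stress level, outcome); outcome is "correct", "substitution" or "deletion"
Record = Tuple[str, str, str]


def error_rates(records: Iterable[Record], outcome: str) -> Dict[str, float]:
    """Rate of `outcome` per stress level, computed per site, then averaged over sites."""
    counts = defaultdict(lambda: [0, 0])  # (site, stress) -> [hits, total]
    for site, stress, result in records:
        cell = counts[(site, stress)]
        cell[0] += int(result == outcome)
        cell[1] += 1

    per_stress = defaultdict(list)  # stress -> list of per-site rates
    for (_site, stress), (hits, total) in counts.items():
        per_stress[stress].append(hits / total)

    return {stress: sum(rates) / len(rates) for stress, rates in per_stress.items()}


# Invented toy data for two sites, purely to show the call pattern.
records = [
    ("A", "unstressed", "deletion"), ("A", "unstressed", "correct"),
    ("A", "stressed", "correct"), ("A", "stressed", "correct"),
    ("B", "unstressed", "deletion"), ("B", "unstressed", "deletion"),
    ("B", "stressed", "substitution"), ("B", "stressed", "correct"),
]
deletion_by_stress = error_rates(records, "deletion")
substitution_by_stress = error_rates(records, "substitution")
```

Averaging per-site rates, rather than pooling all words, gives each site equal weight regardless of how many words it recognized.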

146
Syllable Structure and Word Error Rate
Let's now consider syllable structure with
respect to ASR word error
There is a certain similarity with the pattern
observed for stress accent ...
150
Syllable Structure and Word Error Rate
Vowel-initial forms show the greatest error,
particularly for word deletions
Polysyllabic forms manifest the lowest error,
especially for word deletions
The vowel-initial forms tend to be unstressed,
so ... perhaps the similarity in pattern is not
so surprising after all
  • C = Consonant
  • V = Vowel
  • Data are averaged across all eight sites
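
The C/V notation above can be derived mechanically from a phone sequence. The following is a minimal sketch, assuming standard Arpabet vowel symbols. The corpus uses a variant of Arpabet, so the vowel set here may not match it exactly, and the function name `cv_pattern` and the diacritic-stripping rule are hypothetical.

```python
# Vowel symbols from the standard Arpabet inventory (assumed, not corpus-specific).
ARPABET_VOWELS = {
    "aa", "ae", "ah", "ao", "aw", "ax", "axr", "ay", "eh", "er",
    "ey", "ih", "ix", "iy", "ow", "oy", "uh", "uw", "ux",
}


def cv_pattern(phones):
    """Map a phone sequence to a C/V string, e.g. ['k', 'ae', 't'] -> 'CVC'."""
    pattern = []
    for phone in phones:
        # Drop any diacritic suffix such as '_gl' (assumed underscore format).
        base = phone.lower().split("_")[0]
        pattern.append("V" if base in ARPABET_VOWELS else "C")
    return "".join(pattern)


def is_vowel_initial(phones):
    """True when the syllable or word begins with a vowel."""
    return cv_pattern(phones).startswith("V")
```

Syllabic consonants (for example `el`, `em`, `en`) are treated as consonants in this sketch; whether the original analysis did the same is not stated.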

151
The Plot So Far
  • The Proportion of Word (Deletion) Errors is Much
    Higher Among Unstressed Syllables
