speechrec09 - PowerPoint PPT Presentation

About This Presentation

Title:

speechrec09

Slides: 41

Provided by: wende1

Transcript and Presenter's Notes

1
Speech Recognition: How far have we got now,
after 13 weeks?
Look ... no hands!
2
Last week
1. Overview: speech recognition systems & architectures
2. Acoustic modeling, feature extraction (1)
3. Feature extraction (2)
4. LDA, fundamentals of classification
5. Probability theory, classification
6. Vector quantization, ML, Kullback-Leibler
7. Mixture distributions, EM, word models (tying), search
8. Continuous HMMs
9. Word graphs, confidence measures, acoustic adaptation
10. MLLR derivation, grammar
11. Language model, LM adaptation, phonology
12. Large vocabulary, speech understanding, dialogue control
Design of computer speech recognition systems
3
Source: lecture notes by D. Klakow, RWTH, 1999
4
(No Transcript)
5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
This week
1. Overview: speech recognition systems & architectures
2. Acoustic modeling, feature extraction (1)
3. Feature extraction (2)
4. LDA, fundamentals of classification
5. Probability theory, classification
6. Vector quantization, ML, Kullback-Leibler
7. Mixture distributions, EM, word models (tying), search
8. Continuous HMMs
9. Word graphs, confidence measures, acoustic adaptation
10. MLLR derivation, grammar
11. Language model, LM adaptation, phonology
12. Large vocabulary, speech understanding, dialogue control
Design of computer speech recognition systems
10
Large-vocabulary continuous speech recognition
11
(No Transcript)
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
Speech understanding, dialogue control
25
System Architecture
Telephone network
Network interface and I/O control
Speech recognition
Language understanding
Dialogue control
Speech output
Database (db)
26
Language Understanding

Attributed stochastic context-free grammar for meaningful phrases (concepts)
Stochastic bigram language model for 'filler' phrases
→ Concept graph
Separation of task knowledge and site knowledge: database-specific non-terminals and their likelihoods are resolved by calling the directory database
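A database-specific non-terminal such as <name> can be resolved roughly as follows. The grammar only knows that the slot exists, and the concrete values come from the directory at run time. This is a minimal sketch. It assumes, purely for illustration, that a value's likelihood is its relative frequency among directory entries. `DIRECTORY` and `resolve_nonterminal` are hypothetical names, not part of the original system.

```python
from collections import Counter

# Hypothetical directory entries (illustrative only, not real site data).
DIRECTORY = [
    {"fname": "Dave", "name": "Davis"},
    {"fname": "Samuel", "name": "Davis"},
    {"fname": "Mary", "name": "White"},
]


def resolve_nonterminal(attribute, directory=DIRECTORY):
    """Return {terminal: likelihood} for a database-specific non-terminal.

    Assumption: a value's likelihood is its relative frequency among the
    directory entries. The grammar itself stores no site-specific values,
    which keeps task knowledge separate from site knowledge.
    """
    counts = Counter(entry[attribute] for entry in directory)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


# Expand the <name> non-terminal from the (hypothetical) directory.
name_probs = resolve_nonterminal("name")
```

Swapping in a different site only requires a new directory. The grammar rules stay unchanged.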

27
Language Model P(W)

Stochastic N-gram:
cannot model the underlying structure of language
Syntactic grammar:
many sentences cannot be fully covered
parses are highly ambiguous
sentences often cannot be parsed in real time
But: a simple grammar may be used to describe meaningful phrases (concepts) like <date> or <time>
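A stochastic bigram approximates P(W) as the product of P(w_i | w_{i-1}) over the sentence, with sentence-boundary tokens. The minimal sketch below uses unsmoothed maximum-likelihood estimates from a tiny hypothetical corpus. It shows that such a model only sees adjacent word pairs and none of the phrase structure discussed above.

```python
from collections import Counter
from math import prod


def train_bigram(sentences):
    """Maximum-likelihood bigram estimates P(w_i | w_{i-1}) with <s>/</s> padding."""
    history = Counter()
    pairs = Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        history.update(tokens[:-1])                 # counts of each word as a history
        pairs.update(zip(tokens[:-1], tokens[1:]))  # counts of adjacent word pairs

    def prob(prev, word):
        if history[prev] == 0:
            return 0.0
        return pairs[(prev, word)] / history[prev]

    return prob


def sentence_prob(prob, words):
    """P(W) as the product of bigram probabilities over the padded sentence."""
    tokens = ["<s>"] + words + ["</s>"]
    return prod(prob(p, w) for p, w in zip(tokens[:-1], tokens[1:]))


# Hypothetical toy corpus.
corpus = [["i", "want", "the", "email"], ["i", "want", "mike"]]
prob = train_bigram(corpus)
p = sentence_prob(prob, ["i", "want", "mike"])  # 1.0 * 1.0 * 0.5 * 1.0 → 0.5
```

Any unseen bigram drives P(W) to zero here. A real recognizer would smooth these estimates, for example by backing off to lower-order models.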

28
Hybrid Approaches

Combination of phrase-structure grammars and stochastic N-gram language models
Stochastic context-free grammar (SCFG) to model concepts and to provide production probabilities
Ways of integrating SCFGs and N-grams:
smoothing the N-gram with an SCFG
partial parsing with an SCFG, N-gram model for non-grammatical phrases (fillers)
partial parsing with an SCFG, top-level N-gram for the sequence of words and phrases
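An SCFG provides production probabilities as follows. The rules for each left-hand side carry probabilities that sum to 1, and the probability of a derivation is the product of the probabilities of the rules it uses. The grammar below is a hypothetical toy <date> concept, not a grammar from the original system. `derivation_prob` simply scores a given parse tree and does not parse.

```python
from math import isclose

# Hypothetical toy SCFG for a <date> concept; rule probabilities per LHS sum to 1.
GRAMMAR = {
    "DATE": [(("am", "WEEKDAY"), 0.7), (("am", "DAY", "MONTH"), 0.3)],
    "WEEKDAY": [(("Montag",), 0.5), (("Dienstag",), 0.5)],
    "DAY": [(("ersten",), 1.0)],
    "MONTH": [(("Mai",), 1.0)],
}


def check_grammar(grammar):
    """Raise if the rule probabilities of any left-hand side do not sum to 1."""
    for lhs, rules in grammar.items():
        total = sum(p for _, p in rules)
        if not isclose(total, 1.0):
            raise ValueError(f"rules for {lhs} sum to {total}, not 1")


def rule_prob(grammar, lhs, rhs):
    """Look up the production probability of lhs -> rhs."""
    for candidate, p in grammar[lhs]:
        if candidate == rhs:
            return p
    raise KeyError(f"no rule {lhs} -> {' '.join(rhs)}")


def derivation_prob(grammar, tree):
    """tree is (lhs, children); each child is a terminal string or a subtree."""
    lhs, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob(grammar, lhs, rhs)
    for child in children:
        if not isinstance(child, str):
            p *= derivation_prob(grammar, child)  # multiply in the subtree's rules
    return p


check_grammar(GRAMMAR)
monday = ("DATE", ["am", ("WEEKDAY", ["Montag"])])
p_monday = derivation_prob(GRAMMAR, monday)  # 0.7 * 0.5
```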

29
Phrase Grammar & Stochastic Filler LM

SCFG for each concept
Partial parsing of the input
Bigram language model for those parts of the input that are not covered by the grammar (fillers)
Top-level concept bigram
Example: Wie sieht es bei Ihnen am Montag aus ("How does Monday look for you?")
FILLER <date> FILLER
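The partial-parsing step can be sketched as follows. As a deliberate simplification, a hypothetical rule ("am" followed by a weekday) stands in for the <date> SCFG. Any words it does not cover are merged into FILLER segments. The resulting label sequence is what the top-level concept bigram would score, and the words inside each FILLER segment would go to the filler bigram. Neither language model is implemented here.

```python
WEEKDAYS = {"Montag", "Dienstag", "Mittwoch", "Donnerstag", "Freitag", "Samstag", "Sonntag"}


def match_date(words, i):
    """Hypothetical stand-in for the <date> SCFG: match 'am <weekday>' at position i.

    Returns the number of words covered, or 0 if nothing matches.
    """
    if i + 1 < len(words) and words[i] == "am" and words[i + 1] in WEEKDAYS:
        return 2
    return 0


def partial_parse(words):
    """Split the input into concept segments and merged FILLER segments."""
    segments = []
    i = 0
    while i < len(words):
        n = match_date(words, i)
        if n:
            segments.append(("<date>", words[i:i + n]))
            i += n
        else:
            if segments and segments[-1][0] == "FILLER":
                segments[-1][1].append(words[i])  # extend the current filler
            else:
                segments.append(("FILLER", [words[i]]))
            i += 1
    return segments


words = "Wie sieht es bei Ihnen am Montag aus".split()
segments = partial_parse(words)
top_level = [label for label, _ in segments]  # input to the top-level concept bigram
```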

30
Grammar & Phrase LM

SCFG for the concepts
Partial parsing of the input
Combined word/phrase language model
Example: Wie sieht es bei Ihnen am Montag aus
→ Wie sieht es bei Ihnen <date> aus
This approach also models the conceptual context of the filler words
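For the combined word/phrase model, each parsed concept span is replaced by a single token, and one N-gram runs over the mixed sequence of words and concept tokens. The sketch below again uses a hypothetical "am" + weekday rule as a stand-in for the <date> grammar. It lists the bigrams such a model would be trained on. The filler words "Ihnen" and "aus" now appear in bigrams together with the <date> token, which is how the conceptual context of fillers gets modeled.

```python
WEEKDAYS = {"Montag", "Dienstag", "Mittwoch", "Donnerstag", "Freitag", "Samstag", "Sonntag"}


def to_phrase_tokens(words):
    """Replace each hypothetical 'am <weekday>' span by one <date> token; keep other words."""
    out = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and words[i] == "am" and words[i + 1] in WEEKDAYS:
            out.append("<date>")
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out


def bigrams(tokens):
    """Adjacent pairs over the padded word/phrase sequence."""
    padded = ["<s>"] + tokens + ["</s>"]
    return list(zip(padded[:-1], padded[1:]))


tokens = to_phrase_tokens("Wie sieht es bei Ihnen am Montag aus".split())
pairs = bigrams(tokens)  # includes ("Ihnen", "<date>") and ("<date>", "aus")
```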

31
Word and Concept Graph
Figure: the parser turns a word graph over nodes 1 to 8 (word hypotheses: I, want, the, email, of, Mike, Daves, Dave, Samuel, Davis) into a concept graph over the same nodes, with filler arcs and the concepts request:email, fname:Mike, name:Daves, fname:Dave, fname:Samuel, name:Davis.
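Once the parser has produced the concept graph, the best-scoring path through it has to be found. The sketch below assumes a hypothetical version of the graph in the figure, with invented log scores on each arc. It finds the best path by dynamic programming over the directed acyclic graph (a Viterbi-style recursion). Node ids are assumed to increase along every arc, so processing arcs in order of their start node is a valid order.

```python
import math

# Hypothetical concept graph: (start_node, end_node, label, log_score); scores are invented.
ARCS = [
    (1, 2, "filler", math.log(0.9)),
    (2, 3, "filler", math.log(0.9)),
    (3, 4, "filler", math.log(0.8)),
    (4, 5, "request:email", math.log(0.9)),
    (5, 6, "filler", math.log(0.9)),
    (6, 7, "fname:Mike", math.log(0.5)),
    (6, 7, "fname:Dave", math.log(0.3)),
    (6, 7, "fname:Samuel", math.log(0.2)),
    (7, 8, "name:Daves", math.log(0.4)),
    (7, 8, "name:Davis", math.log(0.6)),
]


def best_path(arcs, start, end):
    """Return (total_log_score, labels) of the best path from start to end, or None."""
    best = {start: (0.0, [])}  # node -> (best score so far, labels on that path)
    for s, e, label, score in sorted(arcs, key=lambda arc: arc[0]):
        if s not in best:
            continue  # start node not reachable
        candidate = best[s][0] + score
        if e not in best or candidate > best[e][0]:
            best[e] = (candidate, best[s][1] + [label])
    return best.get(end)


score, labels = best_path(ARCS, 1, 8)
```

With these invented scores, the best path combines the individually best first-name and last-name arcs into "Mike Davis". Nothing in the graph checks whether such a person exists. The database integration on the next slide addresses this kind of error.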
32
Integrating the Database

Incorporate database knowledge in the search for the best path [Seide et al., ICSLP 96]
Model the a-priori distribution of information items
Reject a hypothesis if it is
contradictory in itself
inconsistent with the system prompt
inconsistent with the system's 'belief'
using an N-best method [Tran et al., ICSLP 96]
→ Significant decrease of the attribute error rate
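The rejection criteria above can be illustrated with a generic N-best filter. This is not the cited method. The sketch walks a hypothetical N-best list in score order and returns the first hypothesis that meets two conditions. First, it must not contradict attribute values the system already believes. Second, it must match at least one entry of a hypothetical directory, which is one possible reading of "contradictory in itself". Consistency with the system prompt and the a-priori distribution are not modeled. `DIRECTORY`, `consistent` and `first_consistent` are illustrative names.

```python
# Hypothetical directory of (first name, last name) entries.
DIRECTORY = {("Mike", "Anderson"), ("Dave", "Davis"), ("Samuel", "Davis"), ("Mary", "White")}


def consistent(hyp, directory, belief):
    """Check one hypothesis, a dict such as {"fname": "Mike", "name": "Davis"}."""
    # Inconsistent with the system's belief: contradicts an already accepted value.
    for key, value in belief.items():
        if key in hyp and hyp[key] != value:
            return False
    # Contradictory in itself (one possible reading): the attribute
    # combination matches no entry in the directory.
    fname, name = hyp.get("fname"), hyp.get("name")
    return any(
        (fname is None or fname == f) and (name is None or name == n)
        for f, n in directory
    )


def first_consistent(nbest, directory=DIRECTORY, belief=None):
    """Walk the N-best list in score order; return the first acceptable hypothesis."""
    belief = belief or {}
    for hyp in nbest:
        if consistent(hyp, directory, belief):
            return hyp
    return None
```

For example, given the N-best list "Mike Davis", "Dave Davis", "Mike Daves", the filter skips "Mike Davis" (no such entry) and returns "Dave Davis".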

33
Dialogue Control

Mixed-initiative dialogue
Slot filling strategy
information can be given in arbitrary order
not all slots have to be filled
dialogue continues as long as additional
information is needed to disambiguate database
entries
Implicit verification
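The slot-filling strategy can be sketched as a function that picks the next dialogue move from the slots filled so far. It queries the database with whatever slots are known, in any order. If exactly one entry remains, it answers. Otherwise it asks for a slot that still distinguishes the remaining entries. The entries, slot names and `next_action` are hypothetical, loosely modeled on the example dialogue later in the deck. Implicit verification is not modeled.

```python
# Hypothetical directory entries; slot names are illustrative assumptions.
DB = [
    {"fname": "Mike", "name": "Anderson", "group": "sales"},
    {"fname": "Mary", "name": "White", "group": "speech recognition"},
    {"fname": "Mary", "name": "Smith", "group": "accounting"},
]

SLOTS = ["fname", "name", "group"]


def next_action(filled, db=DB):
    """Continue the dialogue only while the filled slots leave several entries."""
    matches = [e for e in db if all(e[k] == v for k, v in filled.items())]
    if not matches:
        return ("no_match", None)
    if len(matches) == 1:
        return ("answer", matches[0])
    # Ask for an unfilled slot whose value still differs among the matches.
    for slot in SLOTS:
        if slot not in filled and len({e[slot] for e in matches}) > 1:
            return ("ask", slot)
    return ("ambiguous", matches)  # no remaining slot separates the entries
```

Because the query uses only the slots the user actually supplied, "Mary" followed by "speech recognition group" resolves just as well as "Mary White", and not every slot has to be filled.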

34
Experimental Results

Training:
acoustics: 12.1 h of train timetable information data
grammar and LMs: 2348 dialogues (8214 turns)
Test: 1164 dialogues (3198 turns)
(AER = attribute error rate, WER = word error rate)
Average search depth: 3.8
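WER is the Levenshtein distance between the reference and recognized word sequences, divided by the number of reference words. The sketch below computes it. The slide does not define AER, so the sketch assumes AER is computed the same way but over attribute-value pairs instead of words, and applies the same function to attributes. All example sequences are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance; substitutions, deletions and insertions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion: reference token missing
                cur[j - 1] + 1,          # insertion: extra hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1]


def error_rate(ref, hyp):
    """Errors per reference token (WER for words; assumed AER for attributes)."""
    return edit_distance(ref, hyp) / len(ref)


# Made-up examples: one deletion plus one substitution gives 2/7 WER,
# while only the name attribute is wrong, giving 1/3 AER.
wer = error_rate("i want the email of mike davis".split(), "i want email of mike daves".split())
aer = error_rate(["request:email", "fname:Mike", "name:Davis"],
                 ["request:email", "fname:Mike", "name:Daves"])
```

The contrast shows why a dialogue system may care more about AER than WER. A misrecognized filler word like "the" costs nothing at the attribute level.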

35
Directory Assistance Lexicon

Lexicon contains 1580 words, including
683 last names
301 first names
117 names of groups, departments, ...

36
Example Dialogue

S: Hello, this is PADIS. How can I help you?
U: I'd like to talk to Mike.
S: So, you want to talk to Mike Anderson?
U: Yes, please.
S: The extension 506 is busy right now. Can I do anything else for you?
U: Oh yes. Can you give me the email address of Mary, please?
S: Which Mary do you want the email address of?
U: Uh, she is in the speech recognition group.
S: So you want the email address of Mary White from the speech recognition group?
U: Exactly.
S: The email address is ...

37
Spoken Dialogue Systems over the Telephone

Applications: train timetable information, travel information, directory assistance, home banking, etc.
Real-time processing and a high speech understanding rate
Speech understanding under the constraints of speech recognition (recognition errors)
Dialogue management takes the uncertainty of understanding into account
Extensions: from operator to agent, from information to transactions

38
Demonstration: train timetable information prototype (Philips)