Title: speechrec09
1Speech Recognition: How far have we got now, after 13 weeks?
Look ... no hands!
2Last week
1. Overview: speech recognition systems, architectures
2. Acoustic modeling, feature extraction (1)
3. Feature extraction (2)
4. LDA, fundamentals of classification
5. Probability theory, classification
6. Vector quantization, ML, Kullback-Leibler
7. Mixture distributions, EM, word models (tying), search
8. Continuous HMMs
9. Word graphs, confidence measures, acoustic adaptation
10. MLLR derivation, grammar
11. Language model, LM adaptation, phonology
12. Large vocabulary, speech understanding, dialogue control
Design of computer speech recognition systems
3Source: lecture notes by D. Klakow, RWTH 1999
9This week
1. Overview: speech recognition systems, architectures
2. Acoustic modeling, feature extraction (1)
3. Feature extraction (2)
4. LDA, fundamentals of classification
5. Probability theory, classification
6. Vector quantization, ML, Kullback-Leibler
7. Mixture distributions, EM, word models (tying), search
8. Continuous HMMs
9. Word graphs, confidence measures, acoustic adaptation
10. MLLR derivation, grammar
11. Language model, LM adaptation, phonology
12. Large vocabulary, speech understanding, dialogue control
Design of computer speech recognition systems
10Large-vocabulary continuous speech recognition
24Speech understanding, dialogue control
25System Architecture
- Telephone Network
- Network Interface and I/O Control
- Speech Recognition
- Language Understanding
- Dialogue Control
- Speech Output
- Database
26Language Understanding
- Attributed stochastic context-free grammar for meaningful phrases (concepts)
- Stochastic bigram language model for 'filler' phrases
- => Concept graph
- Separation of task knowledge and site knowledge: database-specific non-terminals and their likelihoods are resolved by calling the directory database
27Language Model P(W)
- Stochastic N-gram
  - cannot model the underlying structure of language
- Syntactic grammar
  - many sentences cannot be fully covered
  - parses are highly ambiguous
  - sentences often cannot be parsed in real time
- But: a simple grammar may be used to describe meaningful phrases (concepts) like <date> or <time>
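A minimal sketch of the stochastic N-gram idea, assuming a bigram model with add-one smoothing. The toy corpus, the `<s>` and `</s>` boundary markers and the smoothing method are illustrative assumptions, not taken from the lecture.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        unigrams.update(padded[:-1])                  # history counts
        bigrams.update(zip(padded[:-1], padded[1:]))
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return unigrams, bigrams, vocab

def log_prob(words, model):
    """log P(W) as a sum of add-one smoothed bigram log-probabilities."""
    unigrams, bigrams, vocab = model
    padded = ["<s>"] + words + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded[:-1], padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))
        total += math.log(p)
    return total

# Toy training data (assumed for illustration only).
corpus = [
    "i want the email of mike".split(),
    "i want to talk to mike".split(),
]
model = train_bigram(corpus)
seen = log_prob("i want the email of mike".split(), model)
scrambled = log_prob("mike of email the want i".split(), model)
```

The model prefers the seen word order (`seen > scrambled`), but it only ever looks at adjacent word pairs, which is the limitation the slide points out.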
28Hybrid Approaches
- Combination of phrase-structure grammars and stochastic N-gram language models
- Stochastic context-free grammar (SCFG) to model concepts and to provide production probabilities
- Ways of integrating SCFGs and N-grams
  - smoothing of the N-gram with an SCFG
  - partial parsing with an SCFG, N-gram model for non-grammatical phrases (fillers)
  - partial parsing with an SCFG, top-level N-gram for the sequence of words and phrases
29Phrase Grammar + Stochastic Filler LM
- SCFG for each concept
- Partial parsing of the input
- Bigram language model for those parts of the input which are not covered by the grammar (fillers)
- Top-level concept bigram
- Example: "Wie sieht es bei Ihnen am Montag aus" ("How does Monday look for you?") => FILLER <date> FILLER
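The partial parsing step can be sketched as follows. This is a simplification under stated assumptions: the hypothetical `CONCEPT_PHRASES` table is a plain phrase list standing in for the SCFG, matching is greedy longest-match, and no probabilities are involved.

```python
# Hypothetical concept "grammar": each concept maps to phrases it covers.
# A real system would use a stochastic context-free grammar here.
CONCEPT_PHRASES = {
    "<date>": [["am", "montag"], ["am", "dienstag"], ["morgen"]],
    "<time>": [["um", "zehn", "uhr"]],
}

def partial_parse(words):
    """Split words into (label, words) segments: concepts and FILLER runs."""
    segments, filler, i = [], [], 0
    while i < len(words):
        match = None
        for concept, phrases in CONCEPT_PHRASES.items():
            for phrase in phrases:
                if words[i:i + len(phrase)] == phrase:
                    # Keep the longest phrase that matches at position i.
                    if match is None or len(phrase) > len(match[1]):
                        match = (concept, phrase)
        if match:
            if filler:                      # close the pending filler run
                segments.append(("FILLER", filler))
                filler = []
            segments.append(match)
            i += len(match[1])
        else:
            filler.append(words[i])         # word not covered by any concept
            i += 1
    if filler:
        segments.append(("FILLER", filler))
    return segments

utterance = "wie sieht es bei ihnen am montag aus".split()
segments = partial_parse(utterance)
labels = [label for label, _ in segments]
```

For the slide's example utterance, `labels` is the sequence FILLER, `<date>`, FILLER. In the approach described above, the filler runs would then be scored by the filler bigram and the label sequence by the top-level concept bigram.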
30Grammar + Phrase LM
- SCFG for the concepts
- Partial parsing of the input
- Combined word/phrase language model
- Example: "Wie sieht es bei Ihnen am Montag aus" ("How does Monday look for you?") becomes "Wie sieht es bei Ihnen <date> aus"
- This approach also models the conceptual context of the filler words
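A small sketch of the combined word/phrase sequence, assuming that concept phrases are simply replaced by their concept token before bigram counting. The hypothetical `DATE_PHRASES` list stands in for the SCFG, and the single training sentence is the slide's example.

```python
from collections import Counter

# Hypothetical concept phrases; a real system would parse these with an SCFG.
DATE_PHRASES = [("am", "montag"), ("am", "dienstag")]

def to_phrase_sequence(words):
    """Replace every date phrase by the single token '<date>'."""
    out, i = [], 0
    while i < len(words):
        if tuple(words[i:i + 2]) in DATE_PHRASES:
            out.append("<date>")
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out

def bigram_counts(sentences):
    """Bigram counts over the mixed word/phrase token sequences."""
    counts = Counter()
    for words in sentences:
        tokens = ["<s>"] + to_phrase_sequence(words) + ["</s>"]
        counts.update(zip(tokens[:-1], tokens[1:]))
    return counts

training = ["wie sieht es bei ihnen am montag aus".split()]
counts = bigram_counts(training)
mixed = to_phrase_sequence("wie sieht es bei ihnen am dienstag aus".split())
```

Because "am Dienstag" maps to the same `<date>` token as "am Montag", the bigrams ("ihnen", "<date>") and ("<date>", "aus") learned from the Monday sentence also cover the Tuesday sentence. That is one way to read the remark that the filler words are modeled in their conceptual context.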
31Word and Concept Graph
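[Figure: a word graph over nodes 1 to 8 with the word edges I, want, the, email, of, Mike, Daves, Dave, Samuel and Davis is fed to the parser, which outputs a concept graph over the same nodes with the edges filler, request email, fname Mike, name Daves, fname Dave, fname Samuel and name Davis.]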
32Integrating the Database
- Incorporate database knowledge in the search for the best path (Seide et al., ICSLP 96)
- Model the a-priori distribution of information items
- Reject a hypothesis if it is
  - contradictory in itself
  - inconsistent with the system prompt
  - inconsistent with the system's 'belief'
- using an N-best method (Tran et al., ICSLP 96)
- => Significant decrease of the attribute error rate
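The rejection idea can be sketched on an N-best list as below. This is an illustration under assumptions, not the method of the cited papers: the hypothetical `DIRECTORY` table, the attribute names and the scores are invented, and only one kind of check is shown (whether any database entry matches all attributes of a hypothesis). Checks against the system prompt or the system's belief are left out.

```python
# Hypothetical directory database: (first name, last name, group).
DIRECTORY = [
    ("mike", "anderson", "speech recognition"),
    ("mary", "white", "speech recognition"),
    ("mary", "jones", "dialogue"),
]

def consistent(attributes):
    """True if at least one database entry matches all given attributes."""
    keys = ("fname", "name", "group")
    return any(
        all(attributes.get(k) in (None, value) for k, value in zip(keys, entry))
        for entry in DIRECTORY
    )

def best_consistent(nbest):
    """Return the best-scoring hypothesis that the database does not reject.

    nbest is a list of (score, attributes) pairs; a higher score is better.
    """
    for score, attributes in sorted(nbest, key=lambda h: h[0], reverse=True):
        if consistent(attributes):
            return score, attributes
    return None

nbest = [
    (-10.2, {"fname": "mike", "name": "white"}),   # no such person
    (-11.0, {"fname": "mary", "name": "white"}),
    (-12.5, {"fname": "mary", "name": "jones"}),
]
result = best_consistent(nbest)
```

Here the top-scoring hypothesis is rejected because no directory entry is called Mike White, so the second hypothesis is returned instead.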
33Dialogue Control
- Mixed-initiative dialogue
- Slot-filling strategy
  - information can be given in arbitrary order
  - not all slots have to be filled
  - dialogue continues as long as additional information is needed to disambiguate database entries
- Implicit verification
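A minimal sketch of the slot-filling strategy, assuming a hypothetical three-entry directory and the slot names `fname`, `name` and `group`. It only decides the next system move and does not model implicit verification.

```python
# Hypothetical directory entries used to decide when to stop asking.
DIRECTORY = [
    {"fname": "mary", "name": "white", "group": "speech recognition"},
    {"fname": "mary", "name": "jones", "group": "dialogue"},
    {"fname": "mike", "name": "anderson", "group": "speech recognition"},
]

def candidates(slots):
    """Database entries compatible with every slot filled so far."""
    return [e for e in DIRECTORY
            if all(e[k] == v for k, v in slots.items())]

def next_action(slots):
    """Decide the next system move from the current slot values."""
    matches = candidates(slots)
    if not matches:
        return ("reject", None)
    if len(matches) == 1:
        return ("answer", matches[0])     # unique entry: other slots may stay empty
    # Ask for an open slot whose value differs among the remaining matches.
    for slot in ("fname", "name", "group"):
        if slot not in slots and len({e[slot] for e in matches}) > 1:
            return ("ask", slot)
    return ("ask", None)

slots = {}
slots.update({"fname": "mary"})                 # user names only a first name
first = next_action(slots)
slots.update({"group": "speech recognition"})   # information in arbitrary order
second = next_action(slots)
```

With only the first name filled, two entries remain and the system asks for more information. Once the group is known the entry is unique, so the last name never has to be filled.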
34Experimental Results
- Training
  - acoustics: 12.1 h train timetable information
  - grammar and LMs: 2348 dialogues (8214 turns)
- Test: 1164 dialogues (3198 turns)
- (AER: attribute error rate, WER: word error rate)
- Average search depth: 3.8
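For reference, word error rate is commonly computed as the word-level edit distance between reference and hypothesis, divided by the reference length. The sketch below follows that common definition; the slides do not spell out the scoring procedure used, and the example sentences are invented. The attribute error rate is not sketched because its exact definition is not given here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumes a non-empty reference string.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] is the edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[-1][-1] / len(ref)

# One deletion ("the") and one substitution ("mike" -> "mark") in 6 words.
wer = word_error_rate("i want the email of mike", "i want email of mark")
```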
35Directory assistance: lexicon
- Lexicon contains 1580 words
  - 683 last names
  - 301 first names
  - 117 names of groups, departments, ...
36Example Dialogue
- S: Hello, this is PADIS. How can I help you?
- U: I'd like to talk to Mike.
- S: So, you want to talk to Mike Anderson?
- U: Yes, please.
- S: The extension 506 is busy right now. Can I do anything else for you?
- U: Oh yes. Can you give me the email address of Mary, please?
- S: Which Mary do you want the email address of?
- U: Uh, she is in the speech recognition group.
- S: So you want the email address of Mary White from the speech recognition group?
- U: Exactly.
- S: The email address is ...
37Spoken dialogue systems over the telephone
- Applications: train timetable information, travel information, directory assistance, home banking, etc.
- Real-time processing and a high speech understanding rate
- Speech understanding under the constraints of speech recognition (recognition errors)
- Dialogue management takes the uncertainty of understanding into account
- Extensions: from operator to agent, from information to transaction
38Demonstration: train timetable information prototype (Philips)
- Publicly reachable at (0241) 60 40 20
- Train connections between 1200 German stations
- Speaker-independent, over the telephone, 1800-word vocabulary
- Free dialogue, asks follow-up questions (in contrast to IVR, interactive voice response)
- Confirmation of what was understood during the dialogue
- Further systems for Swiss German, Dutch and French
39Some German demo telephone numbers (no guarantee!)
- Train timetable information (Deutsche Bahn): (01805) 99 66 22
- Train timetable information (Philips): (0241) 60 40 20
- Switchboard (Philips Research): (0241) 6003 666
- Some of the systems are IVR (interactive voice response) systems
40The week after next
- Exam: Process and Computer Architecture, Image Processing, Speech Processing
- 19.07.01, 09:00 to 12:00, room G22a / 13, of which 1 h is speech processing
- No materials allowed
- Comprehension questions on the topics of the lecture
- Possibly calculations on topics from the exercises and on frequently recurring concepts