Transcript and Presenter's Notes

Title: speechrec09


1
Speech Recognition: How far have we got now, after 13 weeks?
Look ... no hands!
2
Last Week
1. Overview: speech recognition systems & architectures
2. Acoustic modeling, feature extraction (1)
3. Feature extraction (2)
4. LDA, fundamentals of classification
5. Probability theory, classification
6. Vector quantization, ML, Kullback-Leibler
7. Mixture distributions, EM, word models (tying), search
8. Continuous HMMs
9. Word graphs, confidence measures, acoustic adaptation
10. MLLR derivation, grammar
11. Language model, LM adaptation, phonology
12. Large vocabulary, speech understanding, dialogue control
Design of computer speech recognition systems
3
Source: script by D. Klakow, RWTH 1999
4
(No Transcript)
5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
This Week
1. Overview: speech recognition systems & architectures
2. Acoustic modeling, feature extraction (1)
3. Feature extraction (2)
4. LDA, fundamentals of classification
5. Probability theory, classification
6. Vector quantization, ML, Kullback-Leibler
7. Mixture distributions, EM, word models (tying), search
8. Continuous HMMs
9. Word graphs, confidence measures, acoustic adaptation
10. MLLR derivation, grammar
11. Language model, LM adaptation, phonology
12. Large vocabulary, speech understanding, dialogue control
Design of computer speech recognition systems
10
Recognition of Continuous Speech with a Large Vocabulary
11
(No Transcript)
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
Speech Understanding, Dialogue Control
25
System Architecture
[Block diagram: Telephone Network → Network Interface and I/O Control → Speech Recognition → Language Understanding → Dialogue Control → Speech Output, with access to a database (db)]
26
Language Understanding
  • Attributed stochastic context-free grammar for meaningful phrases (concepts)
  • Stochastic bigram language model for 'filler' phrases
  • → Concept graph
  • Separation of task knowledge and site knowledge: database-specific non-terminals and their likelihoods are resolved by calling the directory database (see the sketch below)
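
A minimal Python sketch of this separation, assuming a hypothetical data layout: the concept rules are task knowledge, while the terminals and likelihoods of a database-specific non-terminal such as <lastname> are filled in from the directory database at startup.

```python
# Sketch only (all names and the data layout are hypothetical): an attributed
# concept grammar whose database-specific non-terminals come from the directory.
from collections import defaultdict

# Task knowledge: concept rules with production probabilities and an attribute.
CONCEPT_RULES = {
    "<name>": [(["<firstname>", "<lastname>"], 0.7, "name"),
               (["<lastname>"], 0.3, "name")],
}

def resolve_site_nonterminals(db_entries):
    """Site knowledge: terminals and their likelihoods are relative
    frequencies taken from the directory database."""
    counts = defaultdict(lambda: defaultdict(int))
    for entry in db_entries:                  # e.g. {"first": "Mike", "last": "Anderson"}
        counts["<firstname>"][entry["first"]] += 1
        counts["<lastname>"][entry["last"]] += 1
    lexical_rules = {}
    for nt, words in counts.items():
        total = sum(words.values())
        lexical_rules[nt] = [([w], c / total, nt.strip("<>")) for w, c in words.items()]
    return lexical_rules

# Usage: merge the site-independent task grammar with database-derived rules.
db = [{"first": "Mike", "last": "Anderson"}, {"first": "Mary", "last": "White"}]
grammar = {**CONCEPT_RULES, **resolve_site_nonterminals(db)}
print(grammar["<lastname>"])  # [(['Anderson'], 0.5, 'lastname'), (['White'], 0.5, 'lastname')]
```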

27
Language Model P(W)
  • Stochastic N-gram
  • can not model underlying structure of language
  • Syntactic grammar
  • many sentences can not be fully covered
  • parses are highly ambiguous
  • sentences can often not be parsed in real-time
  • But A simple grammar may be used to describe
    meaningful phrases (concepts) like ltdategt or
    lttimegt
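
For the stochastic N-gram, P(W) factorizes into conditional word probabilities given a limited history. A minimal bigram sketch, assuming a toy corpus and add-one smoothing (real systems use more refined discounting):

```python
# Toy bigram language model; the corpus and add-one smoothing are assumptions.
from collections import Counter

corpus = [["ich", "moechte", "mit", "Mike", "sprechen"],
          ["ich", "moechte", "die", "email", "von", "Mary"]]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    toks = ["<s>"] + sent + ["</s>"]
    unigrams.update(toks[:-1])                 # history counts
    bigrams.update(zip(toks[:-1], toks[1:]))   # adjacent word pairs

V = len(set(w for s in corpus for w in s)) + 2  # vocabulary incl. <s>, </s>

def p_bigram(w, h):
    """P(w | h) with add-one smoothing."""
    return (bigrams[(h, w)] + 1) / (unigrams[h] + V)

def p_sentence(words):
    """P(W) as a product of bigram probabilities."""
    p, prev = 1.0, "<s>"
    for w in words + ["</s>"]:
        p *= p_bigram(w, prev)
        prev = w
    return p

print(p_sentence(["ich", "moechte", "mit", "Mary", "sprechen"]))
```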

28
Hybrid Approaches
  • Combination of phrase-structure grammars and
    stochastic N-gram language models
  • Stochastic context-free grammar (SCFG) to model
    concepts and to provide production probabilities
  • Ways of integrating SCFGs and N-grams
  • smoothing of N-gram with a SCFG
  • partial parsing with a SCFG, N-gram model for
    non-grammatical phrases (fillers)
  • partial parsing with a SCFG, top-level N-gram for
    sequence of words and phrases
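
The first integration variant, smoothing the N-gram with a SCFG, can be sketched as a linear interpolation of the two word predictors; the interpolation weight and the dummy component models below are assumptions, not values from the slides.

```python
# Smoothing variant: interpolate the N-gram prediction with a SCFG-derived
# word probability (LAMBDA and both component models are assumptions).
LAMBDA = 0.7

def p_interpolated(word, history, p_ngram, p_scfg):
    """P(w | h) = lambda * P_ngram(w | h) + (1 - lambda) * P_scfg(w | h)."""
    return LAMBDA * p_ngram(word, history) + (1.0 - LAMBDA) * p_scfg(word, history)

# Toy usage with constant dummy models:
print(p_interpolated("Montag", ("am",), lambda w, h: 0.02, lambda w, h: 0.10))  # 0.044
```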

29
Phrase Grammar + Stochastic Filler LM
  • SCFG for each concept
  • Partial parsing of the input (see the sketch below)
  • Bigram language model for those parts of the input which are not covered by the grammar (fillers)
  • Top-level concept bigram
  • Example: "Wie sieht es bei Ihnen am Montag aus" → FILLER <date> FILLER
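
A minimal sketch of the partial-parsing step for the example above, with a toy phrase list standing in for the real SCFG; the resulting concept sequence would then be scored by the top-level concept bigram.

```python
# Partial parsing with a phrase grammar plus fillers; the phrase patterns
# below are illustrative stand-ins for the SCFG.
DATE_PHRASES = {("am", "Montag"), ("am", "Dienstag"), ("naechste", "Woche")}

def partial_parse(words):
    """Map a word sequence to a concept sequence; uncovered words become FILLER."""
    concepts, i = [], 0
    while i < len(words):
        if tuple(words[i:i + 2]) in DATE_PHRASES:
            concepts.append("<date>")
            i += 2
        else:
            # merge adjacent filler words into one FILLER segment
            if not concepts or concepts[-1] != "FILLER":
                concepts.append("FILLER")
            i += 1
    return concepts

print(partial_parse("Wie sieht es bei Ihnen am Montag aus".split()))
# -> ['FILLER', '<date>', 'FILLER']
```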

30
Grammar + Phrase LM
  • SCFG for the concepts
  • Partial parsing of the input
  • Combined word/phrase language model (see the sketch below)
  • Example: "Wie sieht es bei Ihnen am Montag aus" → "Wie sieht es bei Ihnen <date> aus"
  • This approach also models the conceptual context of the filler words
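
A small sketch of how the parsed concept replaces its words in place, so that a single word/phrase bigram sees the filler words next to the concept token; the span representation is an assumption.

```python
# Build the combined word/phrase token sequence from parsed phrase spans.
def to_word_phrase_tokens(words, spans):
    """spans: list of (start, end, concept) covering the parsed phrases."""
    covered = {k: c for (s, e, c) in spans for k in range(s, e)}
    tokens, i = [], 0
    while i < len(words):
        if i in covered:
            concept = covered[i]
            tokens.append(concept)            # emit the concept token once
            while i in covered and covered[i] == concept:
                i += 1                        # skip the words it covers
        else:
            tokens.append(words[i])
            i += 1
    return tokens

words = "Wie sieht es bei Ihnen am Montag aus".split()
print(to_word_phrase_tokens(words, [(5, 7, "<date>")]))
# -> ['Wie', 'sieht', 'es', 'bei', 'Ihnen', '<date>', 'aus']
```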

31
Word and Concept Graph
[Figure: a word graph over nodes 1-8 for the hypothesis "I want the email of Mike Daves", with competing word edges Dave, Samuel and Davis; the parser maps it to a concept graph over the same nodes with edges filler, request:email, fname:Mike, name:Daves and the alternatives fname:Dave, fname:Samuel, name:Davis. See the edge-list sketch below.]
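
One way to represent such graphs is as lists of labelled edges between the numbered nodes. The sketch below rebuilds the figure's word graph this way and rewrites word edges into attributed concept edges; the exact node numbering and the word-to-concept mapping are assumptions read off the figure.

```python
# Word graph as (start node, end node, label) edges, roughly as in the figure.
word_graph = [
    (1, 2, "I"), (2, 3, "want"), (3, 4, "the"), (4, 5, "email"),
    (5, 6, "of"), (6, 7, "Mike"), (7, 8, "Daves"),
    (6, 7, "Dave"), (6, 7, "Samuel"), (7, 8, "Davis"),
]

# Hypothetical word-to-concept mapping; anything unknown becomes a filler edge.
CONCEPTS = {"email": "request:email", "Mike": "fname:Mike", "Dave": "fname:Dave",
            "Samuel": "fname:Samuel", "Daves": "name:Daves", "Davis": "name:Davis"}

# The parser rewrites each word edge into an attributed concept edge.
concept_graph = [(s, e, CONCEPTS.get(w, "filler")) for (s, e, w) in word_graph]
print(concept_graph)
```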
32
Integrating the Database
  • Incorporate database knowledge in the search for the best path [Seide et al., ICSLP 96]
  • Model the a-priori distribution of information items
  • Reject a hypothesis if it is
  • contradictory in itself
  • inconsistent with the system prompt
  • inconsistent with the system's 'belief'
  • using an N-best method [Tran et al., ICSLP 96] (see the sketch below)
  • → Significant decrease of the attribute error rate
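
A minimal sketch of an N-best consistency check in the spirit of the rejection criteria above; the data layout and the specific tests are assumptions for illustration.

```python
# Filter an N-best list of attribute hypotheses against the database and the
# system's current belief (hypothetical data layout).
def consistent(hyp, database, belief):
    """Accept a hypothesis only if some database entry carries all of its
    attributes and it agrees with already confirmed slots."""
    matches = [e for e in database if all(e.get(k) == v for k, v in hyp.items())]
    if not matches:
        return False                           # contradictory / not in the database
    return all(hyp.get(k, v) == v for k, v in belief.items())

def best_consistent(nbest, database, belief):
    """Return the first consistent hypothesis from the score-sorted N-best list."""
    for hyp in nbest:
        if consistent(hyp, database, belief):
            return hyp
    return None

database = [{"fname": "Mike", "name": "Anderson", "ext": "506"},
            {"fname": "Mary", "name": "White", "group": "speech recognition"}]
nbest = [{"fname": "Mike", "name": "Daves"},       # not in the database -> rejected
         {"fname": "Mike", "name": "Anderson"}]
print(best_consistent(nbest, database, belief={}))  # -> Mike Anderson entry
```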

33
Dialogue Control
  • Mixed-initiative dialogue
  • Slot-filling strategy (see the sketch below)
  • information can be given in arbitrary order
  • not all slots have to be filled
  • the dialogue continues as long as additional information is needed to disambiguate database entries
  • Implicit verification
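
A minimal sketch of the slot-filling loop, assuming slots are attribute-value pairs matched against directory entries (function and slot names are illustrative).

```python
# Slot filling: slots may arrive in any order; keep asking while the filled
# slots still match more than one database entry.
def matching_entries(slots, database):
    return [e for e in database if all(e.get(k) == v for k, v in slots.items())]

def dialogue_step(slots, new_info, database):
    """Merge newly understood attributes and decide whether to keep asking."""
    slots.update(new_info)                     # information in arbitrary order
    candidates = matching_entries(slots, database)
    if len(candidates) == 1:
        return "answer", candidates[0]         # unambiguous: give the answer
    if not candidates:
        return "reject", None                  # nothing matches: re-ask / clarify
    return "ask_more", candidates              # still ambiguous: request more info

database = [{"fname": "Mary", "name": "White", "group": "speech recognition"},
            {"fname": "Mary", "name": "Miller", "group": "dialogue"}]
slots = {}
print(dialogue_step(slots, {"fname": "Mary"}, database)[0])                # ask_more
print(dialogue_step(slots, {"group": "speech recognition"}, database)[0])  # answer
```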

34
Experimental Results
  • Training:
  • acoustics: 12.1 h of train timetable information
  • grammar and LMs: 2348 dialogues (8214 turns)
  • Test: 1164 dialogues (3198 turns)
  • (AER = attribute error rate, WER = word error rate)
  • Average search depth: 3.8

35
Directory Assistance: Lexicon
  • Lexicon contains 1580 words
  • 683 last names
  • 301 first names
  • 117 names of groups, departments, ...

36
Example Dialogue
  • S: Hello, this is PADIS. How can I help you?
  • U: I'd like to talk to Mike.
  • S: So, you want to talk to Mike Anderson?
  • U: Yes, please.
  • S: The extension 506 is busy right now. Can I do anything else for you?
  • U: Oh yes. Can you give me the email address of Mary, please?
  • S: Which Mary do you want the email address of?
  • U: Uh, she is in the speech recognition group.
  • S: So you want the email address of Mary White from the speech recognition group?
  • U: Exactly.
  • S: The email address is ...

37
Spoken Dialogue Systems over the Telephone
  • Applications: train timetable information, travel information, directory assistance, home banking, etc.
  • Real-time processing and a high speech understanding rate
  • Speech understanding under the constraints of speech recognition (recognition errors)
  • Dialogue management takes the uncertainty of understanding into account
  • Extensions: from operator to agent, from information to transaction

38
Demonstration: Train Timetable Prototype (Philips)
  • Publicly reachable at (0241) 60 40 20
  • Train connections between 1200 German stations
  • Speaker-independent, over the telephone, 1800-word vocabulary
  • Free dialogue, asks follow-up questions (in contrast to IVR, interactive voice response)
  • Confirmation of what was understood during the dialogue
  • Further systems for Swiss German, Dutch and French

39
Some German demo telephone numbers (no guarantee!)
  • Train information (Deutsche Bahn): (01805) 99 66 22
  • Train information (Philips): (0241) 60 40 20
  • Switchboard (Philips Research): (0241) 6003 666
  • Some of the systems are IVR systems (interactive voice response)

40
The Week After Next
  • Exam: process and computer architecture, image processing, speech processing
  • 19.07.01, 09:00-12:00, Room G22a / 13; 1 h of this is speech processing
  • No materials allowed
  • Comprehension questions on the topics of the lecture
  • Possibly calculations on topics from the exercises and on frequently recurring concepts