VoiceXML: Speech Recognition Grammars

Provided by: Michael2145

1
VoiceXML: Speech Recognition Grammars
2
Acknowledgements
  • Prof. McTear, Natural Language Processing,
    http://www.infj.ulst.ac.uk/nlp/index.html,
    University of Ulster.
  • BeVocal documentation

3
Overview
  • Types of grammar
  • Grammar design and use
  • Optional items in a grammar
  • Semantic tags
  • DTMF grammars
  • Grammar rules
  • Built-in grammars
  • Grammar scope

4
What is a grammar?
  • A grammar defines the words and patterns of words
    that a user can say at any particular point in a
    dialogue
  • Uses:
  • speech recognition: to constrain the speech
    recognition process by specifying permissible
    sequences of words
  • language understanding: to determine the
    structure and/or meaning of a sequence of words,
    e.g.
  • "Transfer one hundred dollars from my checking to
    my savings account"
  • might be parsed and transformed into the
    structure
      <transfer>
        <command> transfer </command>
        <destination> savings </destination>
        <source> checking </source>
        <amount> 100 </amount>
      </transfer>

5
Types of grammar
  • Finite-state and phrase structure
  • take the form of rules with a left-hand and
    right-hand side, e.g.
      noun_phrase -> determiner adjective noun
      flight -> <destination> <date> <time>
  • used in language understanding and speech
    recognition
  • N-gram (used in speech recognition)
  • based on probabilities of word combinations,
    e.g. bigrams, trigrams
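
To make "probabilities of word combinations" concrete, the standard textbook bigram approximation is shown below. It is general background rather than anything specific to VoiceXML; the start-of-sentence marker w_0 is a notational assumption. A trigram model conditions each word on the two preceding words instead of one.

```latex
P(w_1, \dots, w_n) \;\approx\; \prod_{i=1}^{n} P(w_i \mid w_{i-1})
\qquad \text{(bigram model, with } w_0 \text{ a start-of-sentence marker)}
```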

6
Grammar in VoiceXML
  • May be specified
  • Inline, i.e. embedded in a VoiceXML page
  • External, i.e. stored as files on Web servers,
    etc.
  • Grammar formats
  • XML, ABNF (Augmented BNF syntax), Java Speech
    Grammar Format (JSGF), GSL (Nuance's Grammar
    Specification Language)
  • The W3C specification defines the XML and ABNF
    forms
  • IBM Voice Toolkit supports the XML and ABNF
    grammar formats
  • BeVocal Café, Voxpilot and Tellme support the XML
    and GSL grammar formats
  • For further details on the W3C Speech Recognition
    Grammar Specification, see
    http://www.w3.org/TR/speech-grammar/

7
Inline and External Grammar Definitions
  • An external grammar is defined in an external
    file and referenced in the VoiceXML document
  • In an external grammar document, all rules must
    be named
  • In an external GSL grammar file, the contents
    should not be inside a CDATA section
    and should not contain a <grammar> element:
      ;GSL2.0
      ...grammar rule definitions...
  • An inline grammar is defined within the <grammar>
    element in a VoiceXML document.
  • In an inline grammar, if the grammar consists of
    exactly one rule, that rule does not need a
    name.
  • GSL grammars use special characters, so wrap your
    inline grammar in a CDATA section:
      <grammar ...usage attributes...>
        <![CDATA[
          ...grammar header...
          ...grammar rule definitions...
        ]]>
      </grammar>
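
The two styles can be sketched side by side in one VoiceXML page. This is a minimal sketch, assuming a platform that accepts GSL (such as the BeVocal Café); the file name city.gsl, the form and field names, and the city names are all hypothetical.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="travel">

    <!-- Inline grammar: a single unnamed GSL rule, wrapped in CDATA
         so that GSL's special characters are not parsed as XML. -->
    <field name="origin">
      <prompt>Which city are you leaving from?</prompt>
      <grammar type="application/x-nuance-gsl">
        <![CDATA[
          [belfast dublin london]
        ]]>
      </grammar>
    </field>

    <!-- External grammar: the rules live in a separate file
         (hypothetical name) and are referenced through src. -->
    <field name="destination">
      <prompt>Which city are you going to?</prompt>
      <grammar type="application/x-nuance-gsl" src="city.gsl"/>
    </field>

  </form>
</vxml>
```

In this sketch city.gsl would hold only named rule definitions, with no CDATA wrapper and no <grammar> element, as described above.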

8
<option> element
  • Specifies a set of possible responses for a field
  • If the number of possible responses is small,
    then a set of <option> elements can be used
    instead of a <grammar> element
      <form>
        <field name="choice">
          <prompt>
            Say students, courses, or reports
          </prompt>
          <option>students</option>
          <option>courses</option>
          <option>reports</option>
        </field>
      </form>

<option> can also be used for alternative DTMF
input, e.g.
      <option dtmf="1" value="balance"> balance </option>
9
Grammar Design
  • A grammar should cover all the ways that a user
    might say something
  • Alternative choices within a category, e.g.
      studentname [john rosemary etc]
  • Alternative words for the same concept, e.g.
      [comms communications]
  • Alternative sentences that have the same meaning,
    e.g.
      [ (student john scott taking databases)
        (databases john scott)
        (john scott taking the course databases) ]
  • Note: careful wording of prompts can constrain
    the user to saying what has been predicted by the
    grammar designer

These examples use the GSL grammar format, which
is more suitable than the XML format for the
presentation of examples
10
Grammars for words
  • Simple words (or touch-tone strings): tokens

Alternative words
11
Making items optional
12
Making items optional - 2
  • ( [news weather sports] ?please )
  • ( ?[(i'd like) (tell me)] ?the [news weather
    sports] ?please )

13
Repeating items
  • XML
  • repeat="0-1" means the item is optional, i.e.
    zero or one time
  • repeat="n-" means the item is repeated n or
    more times, e.g. "0-" is zero or more times
  • repeat="m-n" means the item is repeated between
    m and n times (inclusive), e.g. "1-3" is between
    one and three times
  • repeat="n" means the item is repeated exactly n
    times
  • GSL
  • +(item) - the item is repeated 1 or more times
  • *(item) - the item is repeated 0 or more times
  • ?(item) - the item is optional
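
The repeat values listed above might be used in an XML grammar roughly as follows. This is an illustrative sketch only: the rule ids request and thanks and the phrases are made up for the example, and the first rule mirrors the news/weather/sports request from the earlier slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-GB" mode="voice" root="request">

  <rule id="request" scope="public">
    <!-- repeat="0-1": an optional opening phrase -->
    <item repeat="0-1">
      <one-of>
        <item>i'd like</item>
        <item>tell me</item>
      </one-of>
    </item>
    <!-- repeat="0-1": an optional article -->
    <item repeat="0-1">the</item>
    <one-of>
      <item>news</item>
      <item>weather</item>
      <item>sports</item>
    </one-of>
    <!-- repeat="0-": "please" zero or more times -->
    <item repeat="0-">please</item>
  </rule>

  <rule id="thanks" scope="public">
    <!-- repeat="1-3": "very" between one and three times -->
    thank you <item repeat="1-3">very</item> much
  </rule>

</grammar>
```

With these rules "tell me the weather please" and "thank you very very much" are covered, while "thank you much" is not, because "very" must occur at least once.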

14
Grammar Slots (Tags)
  • Grammar slots are used in grammars to return a
    value representing the meaning of the word(s)
    recognised, e.g. "checking account" and "checking"
    should return the same value.
  • GSL
      <field name="MainMenu">
        <grammar type="application/x-nuance-gsl">
          <![CDATA[
            ( ?[ (i'd like) (tell me) ] ?the
              [ (news ?reports)               {<selection news>}
                (weather ?[info information]) {<selection weather>}
                (sports ?[updates news])      {<selection sports>}
              ]
              ?please )
          ]]>
        </grammar>
        <filled>
          <assign name="selected" expr="MainMenu.selection"/>
        </filled>
      </field>

15
Grammar rules: sentences
  • Grammars often consist of sub-grammars, e.g.
      ;GSL2.0
      ColoredObject:public (Color Object)
      Color [
        [red pink]      {<color red>}
        [yellow canary] {<color yellow>}
        [green khaki]   {<color green>}
      ]
      Object [
        [truck car]    {<object vehicle>}
        [ball block]   {<object toy>}
        [shirt blouse] {<object clothing>}
      ]
  • "yellow shirt" or "canary blouse" => color = yellow,
    object = clothing

[Diagram: Colored Object, made up of Color and Object]
16
Grammar with sub-rules
  • Sub-grammars and rules are referenced in XML form
    using a rule reference. A rule reference can
    point to a local grammar, or an external grammar
    rule contained in another file or even on another
    server on the Internet.
  • Design of a grammar consisting of sub-grammars
    requires considerable planning to ensure that all
    possible utterances are covered and also to avoid
    redundancies as well as repetitions in the
    grammar.
  • It is often useful to map out the grammar
    diagrammatically or using a simple format such as
    GSL or ABNF before attempting to code the rules
    in XML format.
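
A rule reference in XML form might look like the sketch below, which recasts the colour and object example in the W3C format. The file name objects.grxml and the rule ids are assumptions made for illustration, and semantic tags are left out to keep the structure visible.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="coloredObject">

  <!-- Top-level rule: a colour followed by an object -->
  <rule id="coloredObject" scope="public">
    <!-- Local reference: a rule in this same grammar -->
    <ruleref uri="#color"/>
    <!-- External reference: a rule in another file (hypothetical URI) -->
    <ruleref uri="objects.grxml#object"/>
  </rule>

  <!-- Sub-rule used only by coloredObject -->
  <rule id="color">
    <one-of>
      <item>red</item>
      <item>pink</item>
      <item>yellow</item>
      <item>canary</item>
      <item>green</item>
      <item>khaki</item>
    </one-of>
  </rule>

</grammar>
```

For the external reference to resolve, the object rule in the other file would need to be declared with scope="public".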

17
Rule Scope - GSL
  • Each defined rule has a scope of either private
    or public.
  •  A rule with public scope is
  • visible outside its grammar and can be
    referenced by name from other grammars
  • can be activated for recognition (can serve as a
    top-level rule)
  •  A rule with private scope is
  • visible only within its containing grammar
  • may be referenced only by other rules within the
    same grammar.
  • To mark a rule as public, the format is
      RuleName:public ruleExpansion
  • If no rules in the grammar are explicitly marked
    with :public, then all rules in the grammar are
    public.
  • If any rule in the grammar is marked with
    :public, then all public rules must be so marked.
  • The root rule in a GSL grammar is always the
    first public rule.
  • For example, the following set of definitions
    creates one public rule named Snapper and two
    private rules named SnapperType and FishColors
      SnapperType [mutton FishColors]
      FishColors [black gray red]
      Snapper:public (SnapperType snapper)

18
Rule scope - XML
  • By default, VoiceXML 2.0 grammar rules are
    private. This means that the rules can only be
    referenced within the same grammar file.
  • To allow a grammar rule to be referenced from an
    external source, such as a VoiceXML document or
    another grammar, the rule needs to be scoped as
    public using the scope attribute
      <!-- public: can be referenced from outside the grammar -->
      <rule id="choice" scope="public">
        <!-- references a rule in the same grammar -->
        <ruleref uri="#studentname"/>
      </rule>
      <!-- not public: can only be referenced by a rule in
           the same grammar -->
      <rule id="studentname">
        <one-of>
          <item> john </item>
          <item> rosemary </item>
        </one-of>
      </rule>
19
Grammar Headers - GSL
  • Inline
      <grammar type="application/x-nuance-gsl">
  • External
      ;GSL2.0
      ...grammar rule definitions...
  • No definition of top-level rule
  • Referencing an external grammar or a top-level
    rule in a grammar
      <grammar src="foo.gsl"/>
      <grammar src="foo.gsl#Month"/>

20
Grammar Headers - XML
  • Inline
      <grammar type="application/srgs+xml"
               root="source" version="1.0">
        <!-- grammar rule(s) -->
      </grammar>
  • External
      <?xml version="1.0" encoding="iso-8859-1"?>
      <!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
        "http://www.w3.org/TR/speech-grammar/grammar.dtd">
      <grammar version="1.0"
               xmlns="http://www.w3.org/2001/06/grammar"
               tag-format="semantics/1.0"
               mode="voice" root="transfer">
        <!-- grammar rule(s) -->
      </grammar>
  • Note: the root rule for the grammar must be
    defined

21
Grammar Scope
  • Grammar elements can be included within any
    VoiceXML element that receives user input
  • field
  • link: for transitions to other documents, e.g.
    operator.vxml
  • menu: grammar implicitly specified by the
    <choice> elements
  • form: for mixed-initiative dialogues
  • By default the scope of a grammar is limited to
    the element in which it is defined
  • Scope can be set using the scope attribute, e.g.
    grammars defined within forms or menus can be
    given document scope
  • Grammars defined in the root document have scope
    over the entire application
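
The contrast between a document-wide grammar and a field-level grammar can be sketched as follows. The target operator.vxml comes from the list above; the form id, field name, rule ids and vocabulary are hypothetical.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-GB">

  <!-- Link grammar: a child of <vxml>, so it is active in every
       dialog of this document. -->
  <link next="operator.vxml">
    <grammar mode="voice" version="1.0" root="op">
      <rule id="op" scope="public">
        <one-of>
          <item>operator</item>
          <item>help desk</item>
        </one-of>
      </rule>
    </grammar>
  </link>

  <form id="menu">
    <!-- Field grammar: active only while this field is collecting input. -->
    <field name="choice">
      <prompt>Say students, courses, or reports.</prompt>
      <grammar mode="voice" version="1.0" root="pick">
        <rule id="pick" scope="public">
          <one-of>
            <item>students</item>
            <item>courses</item>
            <item>reports</item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <prompt>You chose <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>

</vxml>
```

Placing the same <link> in the application root document instead would make "operator" available across the whole application, which is where the risk of overlapping grammars starts to matter.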

22
Using Grammar Effectively
  • A grammar should cover effectively the range of
    responses that can be encountered to a prompt
  • this can include the essential input as well as
    extraneous words and phrases
  • a grammar that is too large will hinder speech
    processing and lead potentially to more
    misrecognitions
  • scope is important: grammars should not overlap
  • excessive use of global grammars (defined in the
    root document) can increase the possibility of
    overlapping

23
Tutorial Exercise 1. Using tags
  • Integrate the following rule and its grammar into
    an application that takes in the name of a
    student and the name of a course and outputs the
    student's name along with a course code.
      <rule id="rule2" scope="public">
        <one-of>
          <item>
            <one-of>
              <item> comms </item>
              <item> communications </item>
            </one-of>
            <tag>"01"</tag>
          </item>
          <item> algorithms <tag>"02"</tag></item>
          <item> programming <tag>"03"</tag></item>
          <item> databases <tag>"04"</tag></item>
        </one-of>
      </rule>

24
DTMF
  • DTMF (touch-tone) can be used as an alternative
    to speech input, particularly when speech
    recognition is unreliable or problematic.
  • In VoiceXML 2.0, dtmf is a value of
    the mode attribute of the <grammar> element
      <grammar mode="dtmf" type="application/srgs+xml"
               version="1.0" root="digit">
        <rule id="digit" scope="public">
          <one-of>
            <item> 1 <tag> "students" </tag> </item>
            <item> 2 <tag> "courses" </tag> </item>
            <item> 3 <tag> "reports" </tag> </item>
          </one-of>
        </rule>
      </grammar>

25
DTMF and / or speech in GSL
      ;GSL2.0
      Rating (
        ?[ (i feel ?like) (it is ?a) (its ?a) ]
        [
          [one dtmf-1]   {<numRating 1>}
          [two dtmf-2]   {<numRating 2>}
          [three dtmf-3] {<numRating 3>}
          ...

26
DTMF after counts
  • Prompt counts can be used, e.g. to give the user
    an opportunity to choose using speech, then
    advise use of keypad if speech is unsuccessful
      <nomatch count="1">
        <reprompt/>
      </nomatch>
      <nomatch count="2">
        please use your keypad
      </nomatch>
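
In context, the two handlers sit inside the field whose input they guard. The sketch below assumes a simple rating field built with <option> elements so that each choice accepts either speech or a key press; the field name and wording are illustrative.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-GB">
  <form id="survey">
    <field name="rating">
      <prompt>How would you rate the service: one, two, or three?</prompt>

      <!-- Each option can be spoken or entered as a DTMF key -->
      <option dtmf="1" value="1">one</option>
      <option dtmf="2" value="2">two</option>
      <option dtmf="3" value="3">three</option>

      <!-- First failed recognition: just repeat the prompt -->
      <nomatch count="1">
        <reprompt/>
      </nomatch>

      <!-- Second failure onwards: steer the caller to the keypad -->
      <nomatch count="2">
        Please use your keypad. Press 1, 2, or 3.
      </nomatch>

      <filled>
        <prompt>You gave a rating of <value expr="rating"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Because no handler with a higher count is defined, the count="2" handler is also the one selected for any later nomatch events in this field.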

27
Tutorial Exercise 2. DTMF and speech
  • Create a file with choices (student details,
    course details, reports) that allows speech as
    well as DTMF input
  • Include a nomatch (or noinput) event that asks
    the user to use the keypad on the second time
    that speech input is unsuccessful.
  • The system should confirm with words rather than
    DTMF
      <grammar mode="dtmf" type="application/srgs+xml"
               version="1.0" root="digit">
        <rule id="digit" scope="public">
          <one-of>
            <item> 1 <tag> "student details" </tag> </item>
            ...
      <grammar type="application/srgs+xml"
               root="choice" version="1.0">
        <rule id="choice" scope="public">
          <one-of>
            <item> student details <tag> "student details"
              </tag> </item>
            ...

28
Built-In Grammars
  • Built-in grammars are provided in VoiceXML
  • boolean (true or false; in DTMF, 1 is true and 2 is
    false)
  • date
  • digits (e.g. three four seven)
  • currency
  • number (e.g. three hundred and forty seven)
  • phone
  • time
  • specified within the <field> element, e.g.
      <field name="age" type="number">

29
Built-In Grammar: Digits
  • Digit recognition is performed in VoiceXML by
    using a built-in grammar for digits that is
    declared as a field type. For example
      <field name="pin" type="digits">
  • The user can say one or more digits between 0 and
    9 and the result will be a string of digits.
  • If the field value is used in a prompt, it will
    be spoken as a sequence of digits, e.g. one five
    six four.
  • You can also parameterise the digits built-in
    grammar as follows
  • digits?minlength=n - a string of at least n
    digits
  • digits?maxlength=n - a string of at most n digits
  • digits?length=n - a string of exactly n digits
  • e.g.
      <field type="digits?minlength=3;maxlength=5">

30
Digits grammar example
      <form>
        <field name="pin" type="digits?length=4">
          <prompt>what is your pin?</prompt>
        </field>
        <block>
          <prompt>
            Confirming your pin is
            <say-as interpret-as="vxml:digits">
              <value expr="pin"/></say-as>
          </prompt>
        </block>
      </form>

31
Built-in grammar: boolean
  • The boolean grammar contains ways of saying yes
    or no
  • The particular words within the boolean grammar
    depend on the locale, i.e. the language
    variety, e.g. US English, UK English, etc.
  • The words may also vary from one platform to
    another
  • IBM Voice Toolkit, UK English:
  • yes, true, positive, right, ok, sure,
    affirmative, check, yep, correct; no, false,
    negative, wrong, not, nope, incorrect
  • The return value is a boolean true or false.
  • If the field name is subsequently used in a value
    element within a prompt, the TTS engine will
    speak either yes or no.
  • Users can also provide DTMF input: 1 is yes, and
    2 is no.

32
Boolean grammar example
      <form scope="dialog">
        <field name="pin" type="digits?length=4"
               modal="false">
          <prompt version="1.0">
            what is your pin?
          </prompt>
        </field>
        <field name="confirm" type="boolean"
               modal="false">
          <prompt version="1.0">
            Please confirm your pin is
            <say-as interpret-as="vxml:digits"><value
              expr="pin"/></say-as>
          </prompt>
        </field>
      </form>

33
Sample input for built-in field types
34
Sample input for UK English built-in field types
(continued)
35
Tutorial Exercise 3. Built-in grammars
  • Aim: to include built-in grammars
  • Create an application in which the user has to
    speak their account number, which consists of 6
    digits (use built-in digit grammar).
  • Extend the application with other built-in
    grammars, such as date.
  • Experiment with the use of the DTMF simulator to
    enter the values for account number, date, etc.