Title: Overview of Language Model Classes and Release Progress
[Title-slide figure labels: XML, ABNF, IHD, BNF, JSGF]
Daniel May
Intelligent Electronic Systems
Human and Systems Engineering
Department of Electrical and Computer Engineering
- Language Model Classes
  - LanguageModelIHD: explanation of the IHD -> BNF and BNF -> IHD conversions
  - LanguageModelABNF: explanation and example of the ABNF -> BNF conversion algorithm
  - LanguageModelBNF: explanation of the graph minimization algorithm
  - LanguageModelXML and LanguageModelJSGF
- Network Utilities: isip_network_builder, isip_network_converter
- Release Progress
  - Outstanding Issues
  - Plan
  - Deadline
- What is Normalized BNF?
  - Normalized BNF consists only of the following three rule forms:
    1. (RULE_NAME) -> (TERMINAL),(NON_TERMINAL)
    2. (RULE_NAME) -> (NON_TERMINAL)
    3. (RULE_NAME) -> (EPSILON)
- IHD -> BNF
  - Straightforward conversion process
  - Each IHD arc is converted to a normalized BNF rule
  - Example (figure panels: IHD, BNF)
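The arc-to-rule mapping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Arc` and `Rule` types, the `START`/`END` node ids, the start rule name `S`, and the `N_<node>` naming scheme are all hypothetical and are not taken from LanguageModelIHD. Weights are left out here for brevity.

```python
from typing import NamedTuple, Optional

class Arc(NamedTuple):
    src: str  # id of the source node
    dst: str  # id of the destination node

class Rule(NamedTuple):
    name: str                    # (RULE_NAME)
    terminal: Optional[str]      # (TERMINAL), None for forms 2 and 3
    non_terminal: Optional[str]  # (NON_TERMINAL), None for form 3

# Hypothetical ids for the graph's start and end nodes.
START, END = "START", "END"

def ihd_to_bnf(symbols, arcs):
    """Emit one normalized BNF rule per arc (assumed mapping).

    symbols maps a node id to the terminal symbol stored on that node.
    """
    rules = []
    for arc in arcs:
        if arc.src == START:
            # Form 2: (RULE_NAME) -> (NON_TERMINAL)
            rules.append(Rule("S", None, "N_" + arc.dst))
        else:
            # Form 1: (RULE_NAME) -> (TERMINAL),(NON_TERMINAL)
            rules.append(Rule("N_" + arc.src, symbols[arc.src], "N_" + arc.dst))
    # Form 3: the end node expands to epsilon.
    rules.append(Rule("N_" + END, None, None))
    return rules

symbols = {"a": "hello", "b": "world"}
arcs = [Arc(START, "a"), Arc("a", "b"), Arc("b", END)]
for rule in ihd_to_bnf(symbols, arcs):
    print(rule)
```

Each node gets its own rule name in this sketch, so every form-1 rule pairs exactly one rule name with one terminal.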
- BNF -> IHD
  - Straightforward conversion process
  - Simply the reverse of the IHD -> BNF process
  - Unique nodes are identified by unique instances of (RULE_NAME) -> (TERMINAL)
  - Concatenation tokens (,) correspond to arcs and are weighted
  - Example (figure panels: BNF, IHD)
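A minimal sketch of the reverse direction, covering only rules of form 1. The `Rule` tuple and the choice to draw an arc to every node that shares the referenced rule name are assumptions made for illustration; start and epsilon rules are omitted.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    name: str          # (RULE_NAME)
    terminal: str      # (TERMINAL)
    non_terminal: str  # (NON_TERMINAL) after the concatenation token
    weight: float      # weight carried by the concatenation token

def bnf_to_ihd(rules):
    """Return (nodes, arcs) for a list of form-1 rules."""
    # One node per unique (rule name, terminal) pair, in first-seen order.
    nodes = list(dict.fromkeys((r.name, r.terminal) for r in rules))
    arcs = []
    for r in rules:
        src = (r.name, r.terminal)
        # The weighted concatenation becomes an arc to each node
        # that belongs to the referenced rule.
        for dst in nodes:
            if dst[0] == r.non_terminal:
                arcs.append((src, dst, r.weight))
    return nodes, arcs

rules = [
    Rule("A", "a", "B", 0.5),
    Rule("B", "b", "A", 1.0),
    Rule("B", "c", "A", 1.0),
]
nodes, arcs = bnf_to_ihd(rules)
print(nodes)
print(arcs)
```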
- ABNF -> BNF
  - Complicated!
  - Accomplished using a recursive algorithm that extracts sets of right symbols and left symbols and builds a set of normalized BNF rules.
  - A set of right and left symbols is found when a concatenation, Kleene star (*), or Kleene plus (+) is encountered.
  - If n left symbols and m right symbols are found, n x m BNF rules are created.
  - ABNF rules are processed one at a time.
  - We iterate over the tokens in each rule from left to right and look for concatenation, Kleene star, and Kleene plus tokens.
  - When one of these tokens is encountered, the recursive methods findLeftSymbols() and findRightSymbols() are called. Each returns a set of symbols.
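The n x m rule count follows from pairing every left symbol with every right symbol. The sketch below assumes each symbol is a (rule name, terminal) pair and that the resulting rule has the form (left name) -> (left terminal),(right name); the function name `build_rules` and the node names are hypothetical.

```python
from itertools import product

def build_rules(left_symbols, right_symbols):
    """Pair every left symbol with every right symbol.

    Each symbol is a (rule_name, terminal) pair. The rule produced is
    (left_name) -> (left_terminal),(right_name), stored as a 3-tuple.
    """
    return [
        (left_name, left_terminal, right_name)
        for (left_name, left_terminal), (right_name, _right_terminal)
        in product(left_symbols, right_symbols)
    ]

left = [("N1", "A")]                # n = 1 left symbol
right = [("N2", "B"), ("N3", "C")]  # m = 2 right symbols
rules = build_rules(left, right)
print(len(rules), rules)            # n x m = 2 rules
```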
- Example
  - We must first construct a set of nodes using unique combinations of (RULE_NAME) -> (TERMINAL)
  - (figure panels: IHD, ABNF, Nodes)
[Figure panels for the remaining steps of this example: IHD, ABNF, BNF Rules]
This rule contains no tokens of interest, so we move on to the next rule.
As we iterate from left to right, we encounter a concatenation token. The findLeftSymbols method returns A.
When findRightSymbols is called, we encounter a Kleene star.
Because the starred group can be skipped, findRightSymbols must also be called on the token following the next concatenation at this nesting level.
Next, findRightSymbols is called on the token following the Kleene star. In this case, it's an opening parenthesis.
For an opening parenthesis, we call findRightSymbols on the token following it.
We also look for alternation tokens and call findRightSymbols on the tokens following them.
The Kleene plus is ignored since it isn't currently relevant, and findRightSymbols is called on the open parenthesis.
Now we can construct a set of BNF rules from the right and left symbols.
The next token of interest is a Kleene star. For these, we want a self loop on all rule segments following alternations.
Since the following token is an open parenthesis, we find all rule segments separated by alternation tokens.
A different set of rules is created for each segment.
findRightSymbols is called on the first token of each segment, and findLeftSymbols is called on the last.
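Finding the rule segments inside a starred group amounts to splitting the token list at alternation tokens that sit at the group's own nesting level. The sketch below assumes a simple token list with `(`, `)`, `|` and `,` as the grouping, alternation and concatenation tokens; this token spelling and the function name `split_segments` are hypothetical, not the ISIP tokenizer.

```python
def split_segments(tokens):
    """Split a token list at alternation tokens found at nesting depth 0."""
    segments, current, depth = [], [], 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        if tok == "|" and depth == 0:
            # Alternation at this nesting level closes the current segment.
            segments.append(current)
            current = []
        else:
            current.append(tok)
    segments.append(current)
    return segments

# Contents of a starred group such as ( B , C | D | ( E | F ) , G )*
group = ["B", ",", "C", "|", "D", "|", "(", "E", "|", "F", ")", ",", "G"]
segments = split_segments(group)

# First and last token of each segment: the tokens on which
# findRightSymbols and findLeftSymbols would be called.
first_and_last = [(seg[0], seg[-1]) for seg in segments]
print(first_and_last)
```

When the first token of a segment is itself an open parenthesis, as in the third segment here, the symbol search has to recurse into that nested group.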
The next token of interest is another concatenation. Again, we find a set of right and left symbols and build rules.
The next token of interest is another concatenation, but this time the right symbol is a non-terminal.
When findRightSymbols is called on a non-terminal, findRightSymbols is called on the first token of the referenced rule.
BNF start rules are found by calling findRightSymbols on the first token of the ABNF start rules.
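The cases walked through in this example (terminal, group, alternation, Kleene star, Kleene plus, non-terminal) can be sketched as a recursive search for the symbols that can begin an expression. Note that this sketch swaps in a standard FIRST-set computation over a small parsed tree rather than the token-stream traversal described on the slides, and it assumes the grammar has no left recursion. All class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:   # terminal symbol
    name: str

@dataclass(frozen=True)
class Ref:    # non-terminal: reference to another rule
    rule: str

@dataclass(frozen=True)
class Seq:    # concatenation
    items: tuple

@dataclass(frozen=True)
class Alt:    # alternation
    options: tuple

@dataclass(frozen=True)
class Star:   # Kleene star (zero or more)
    body: object

@dataclass(frozen=True)
class Plus:   # Kleene plus (one or more)
    body: object

def nullable(node, rules):
    """True if the expression can match the empty string."""
    if isinstance(node, Term):
        return False
    if isinstance(node, Star):
        return True
    if isinstance(node, Plus):
        return nullable(node.body, rules)
    if isinstance(node, Ref):
        return nullable(rules[node.rule], rules)
    if isinstance(node, Alt):
        return any(nullable(o, rules) for o in node.options)
    return all(nullable(i, rules) for i in node.items)  # Seq

def right_symbols(node, rules):
    """Terminals that can appear first when expanding the expression."""
    if isinstance(node, Term):
        return {node.name}
    if isinstance(node, Ref):
        # Non-terminal: look at the start of the referenced rule.
        return right_symbols(rules[node.rule], rules)
    if isinstance(node, (Star, Plus)):
        return right_symbols(node.body, rules)
    if isinstance(node, Alt):
        return set().union(*(right_symbols(o, rules) for o in node.options))
    found = set()
    for item in node.items:  # Seq
        found |= right_symbols(item, rules)
        if not nullable(item, rules):
            break  # later items are only reachable if this one can be skipped
    return found

# Hypothetical grammar: S = A , (B | C)* , T   and   T = D | E+
GRAMMAR = {
    "S": Seq((Term("A"), Star(Alt((Term("B"), Term("C")))), Ref("T"))),
    "T": Alt((Term("D"), Plus(Term("E")))),
}
after_a = Seq(GRAMMAR["S"].items[1:])  # everything that follows A
print(right_symbols(GRAMMAR["S"], GRAMMAR))
print(right_symbols(after_a, GRAMMAR))
```

In the second call the starred group can be skipped, so the symbols that begin rule T are included alongside B and C.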
- Weights
  - ABNF has no mechanism for defining weights on arcs because ABNF has no knowledge of arcs. Arcs are only implied by the grammar representation.
  - When converting from IHD to any other format that uses ABNF as an intermediate, weights are included on the open parenthesis tokens preceding non-terminal and terminal symbols.
  - In some cases, the ABNF rules must be restructured to support weights. This will only be the case if the source of the grammar is not ISIP internal.
- Testing
  - The ABNF -> BNF algorithm has been thoroughly tested on ABNF grammars derived from XML, but more testing needs to be done on arbitrary ABNF grammars.
- Graph Minimization
  - Converting from XML introduces redundancy. Although the resulting graphs are equivalent to the originals, they're much larger and nearly impossible to interpret visually.
  - The minimize method in LanguageModelBNF can be used to remove redundancy once the language model is in BNF representation.
  - The algorithm iterates over all rule pairs and determines whether the two rules can be merged into a single rule.
  - Rules can be merged if the non-terminals of both rules reference the same terminal and if the weights on the concatenation tokens are the same. When two rules are merged, all other rules must be updated.
  - Example
- Example
- Testing
  - Currently, this minimization algorithm has been tested by visually inspecting the original graph and the resulting graph and verifying that they are equivalent.
  - The isip_lm_tester tool will be able to test it more thoroughly once the language model parsing capability is complete.
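One way to read the pairwise merge step is sketched below. It is an interpretation, not the minimize method in LanguageModelBNF: two rule names are treated as mergeable when their sets of (terminal, next rule, weight) productions are identical, and every reference to the dropped name is then rewritten. The dictionary layout, the function signature, and the `start` parameter are assumptions.

```python
def minimize(rules, start="S"):
    """Merge rule names whose production sets are identical.

    rules maps a rule name to a set of (terminal, next_rule, weight)
    tuples. The start rule is never dropped.
    """
    rules = {name: set(prods) for name, prods in rules.items()}
    changed = True
    while changed:
        changed = False
        names = sorted(rules)
        for i, keep in enumerate(names):
            for drop in names[i + 1:]:
                if drop != start and rules[keep] == rules[drop]:
                    del rules[drop]
                    # Update every remaining reference to the dropped rule.
                    rules = {
                        name: {(t, keep if nxt == drop else nxt, w)
                               for t, nxt, w in prods}
                        for name, prods in rules.items()
                    }
                    changed = True
                    break
            if changed:
                break  # rule set changed; restart the pairwise scan
    return rules

redundant = {
    "S": {("a", "X", 1.0), ("a", "Y", 1.0)},
    "X": {("b", "END", 0.5)},
    "Y": {("b", "END", 0.5)},  # duplicate of X
    "END": set(),
}
print(minimize(redundant))
```

Restarting the scan after each merge keeps the sketch simple, since one merge can make two previously different rules identical.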
- Class LanguageModelXML and LanguageModelJSGF
  - LanguageModelXML
    - Wesley has completed this class and checked it in. Minor changes are made every once in a while, but overall, the conversions from BNF to XML and XML to ABNF are working fine.
  - LanguageModelJSGF
    - This class will be implemented similarly to LanguageModelXML.
    - The underlying JSGF representation is ABNF.
    - JSGF parsing algorithms already exist, but currently, the JSGF tokens are converted directly to IHD.
    - This was supposed to be finished several weeks ago, but issues regarding ABNF to BNF conversion and graph minimization have caused delays.
- Other Language Model Related Utilities
  - isip_network_converter
    - Changes have been made to incorporate XML, BNF, and ABNF.
    - A minimize option has been added that invokes the minimization routine when the language model is in BNF representation.
  - isip_network_builder
    - The changes to allow network_builder to save in other formats are pending.
  - isip_lm_tester
    - Won is in the process of adding parsing capability to this tool. Currently, the tool can only generate random transcriptions.
    - Soon, it will be able to parse transcriptions and verify that they are valid given a particular language model.
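The two tester capabilities mentioned above, random generation and validity checking, can be illustrated on a toy normalized-BNF model. This is not the isip_lm_tester implementation; the model layout, the rule names, and both function names are hypothetical.

```python
import random

# Hypothetical model: rule name -> list of (terminal, next_rule).
# A next_rule of None marks the end of a transcription.
MODEL = {
    "S": [("call", "NAME"), ("dial", "NAME")],
    "NAME": [("alice", None), ("bob", None)],
}

def generate(model, start="S", rng=random):
    """Random walk from the start rule; returns a list of words."""
    words, rule = [], start
    while rule is not None:
        terminal, rule = rng.choice(model[rule])
        words.append(terminal)
    return words

def is_valid(model, words, start="S"):
    """True if the word sequence can be produced by the model."""
    active = {start}  # rules that could produce the next word
    for word in words:
        active = {
            nxt
            for rule in active if rule is not None
            for terminal, nxt in model[rule] if terminal == word
        }
        if not active:
            return False
    return None in active  # must stop exactly at an end marker

sample = generate(MODEL)
print(sample, is_valid(MODEL, sample))
```

Tracking a set of active rules, rather than a single rule, lets the check handle models where the same word can lead to more than one rule.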
- Outstanding Issues
  - LanguageModelJSGF (Daniel)
  - Diagnose methods and documentation (Daniel, Seungchan, Ted)
  - isip_lm_tester parsing capability (Won)
  - isip_transform and isip_transform_builder (Sridhar)
  - Varmint backlog (Everyone)
- Schedule/Deadline
  - March 10: All code and documentation will be completed, tested, and checked in (code freeze).
  - After March 10, we will begin running regression and code integrity tests.
  - March 31: Release Date