Title: AudioMath: Speaking Mathematics with MathML
1. AudioMath: Speaking Mathematics with MathML
- Helder Filipe Ferreira hfilipe_at_fe.up.pt
- Laboratory of Speech Processing, Electro-acoustics, Signals and Instrumentation
- Laboratory of Signals and Systems
- Faculty of Engineering, University of Porto, Portugal
- Second European Workshop on MathML & Scientific e-Contents
- 16 to 18 September 2004, Kuopio, Finland
2. This Presentation
- Introduction
- Audio Rendering of MathML
- Mathematical Audio Rendering Tools Overview
- AudioMath
- Final conclusions and remarks
3. This Presentation
- Introduction
  - Preliminaries
  - Background on publication of mathematical e-contents
  - Web Accessibility and Mathematics
- Audio Rendering of MathML
- Mathematical Audio Rendering Tools Overview
- AudioMath
- Final conclusions and remarks
4. Preliminaries (1/2)
- Provocative questions
  - How can a blind person overcome the difficulty of reading online documents with mathematical expressions?
  - Why hasn't this been completely solved yet?
  - Is it not necessary? Is it not easy?
- These questions are only the tip of the iceberg of a larger accessibility problem on the Internet. It concerns technical, scientific, or even simple documents containing math expressions.
5. Preliminaries (2/2)
- One possible solution
  - Use Text-to-Speech (TTS) technology to create audio versions of the mathematical contents.
  - The audio medium is accessible, and general-purpose TTS engines are available.
  - In principle, math can be spoken aloud.
- This raises several questions about how to do it
  - How should math be read?
  - What are the cognitive problems behind it?
  - What technique should be used to encode the math information?
7. Background on publication of mathematical e-contents (1/4)
- Publishing scientific documents that contain mathematical formulae is extremely demanding.
- The TeX system by Donald Knuth solved the majority of problems with printed documents.
- Then came WYSIWYG editors, and subsequently markup languages for the Internet, such as HTML.
- However, HTML per se doesn't allow a mathematical description language to be used directly in the document.
- So other solutions were envisioned.
8. Background on publication of mathematical e-contents (2/4)
- ...
- (X)HTML + Images (jpg, gif, png, SVG): math expressions as images (raster or vector types); not accessible.
- HTML + Symbol Fonts / (X)HTML + CSS: tables are used to structure the information. No semantic meaning of the math expression. Ex: the TtH translator.
- Applets: applets generate the mathematical expressions. A slow and non-accessible process. Ex: WebEQ.
- Word / RTF / PDF / PostScript / TeX / LaTeX: the HTML documents produced from these sources represent math expressions as images (not accessible). Ex: TeX4ht.
9. Background on publication of mathematical e-contents (3/4)
- ...
- MathML (Mathematical Markup Language): it's one of the most accessible solutions
  - Allows the visualization of math expressions on the web
  - Dynamic and interactive contents
  - Publishing technical information in electronic format
  - Exchanging math data between applications
  - Interpretation of math expressions in non-visual media
  - Tools exist that convert between LaTeX/TeX and MathML
10. Background on publication of mathematical e-contents (4/4)
- Why use MathML?
  - Developed by the W3C and becoming a standard.
  - Its rapidly growing use by several relevant organizations associated with the teaching and learning of mathematical contents, as well as the involvement of software houses.
  - The emergence of editors and applications that create and manipulate MathML documents.
  - The existence of conversion tools for the main publishing formats.
  - Being a markup language, it can be parsed, interpreted and converted to other formats, which brings higher accessibility, portability and platform independence.
- This presentation will focus on MathML as the supporting technology for speaking mathematics.
12. Web Accessibility and Mathematics (1/3)
- "Though non-technical publications are now readily available to blind people the degree of access diminishes as the amount of technical information increases. ... It is also true to say that once the technical documents have been acquired their reading is a struggle and far from a pleasurable experience." Fitzpatrick & Monaghan, BULAG 1999.
[Figure: access to information by blind people plotted against the amount of technical information]
13. Web Accessibility and Mathematics (2/3)
- "How hard can it be to communicate about math on the Internet? The truth is, it's a fairly difficult task." The Math Forum, http://mathforum.org/typesetting
- Nowadays the accessibility of technical and scientific documents online is very limited.
- Elements that can compose a web document
  - Text
  - Images: graphics or pictures
  - Math expressions
  - Applets
  - Scripts
  - Multimedia (Flash, Shockwave, ...)
14. Web Accessibility and Mathematics (3/3)
- Is there web accessibility in the currently available technical representations of mathematical formulae?
  - Images
  - Applets
  - SVG
  - MathML
  - (X)HTML + CSS
- How can blind people read online documents containing mathematical expressions, then?
  - Usually they can't... except for a few cases to be seen later (topic: Mathematical Audio Rendering Tools Overview).
[Figure: star rating for each technique. Stars indicate the degree of accessibility, effectiveness and manipulation we can achieve with the technique.]
15. This Presentation
- Introduction
- Audio Rendering of MathML
  - Speaking Mathematics
  - Which MathML markup to use?
  - Interpretation of MathML tag set elements and attributes
  - Math Formulae Navigation
- Mathematical Audio Rendering Tools Overview
- AudioMath
- Final conclusions and remarks
16. Speaking Mathematics (1/4)
- It's known that people read text letter by letter, beginning at or near the leftmost symbol in a text line, and often engage in backward scans to retrieve information that was previously read.
- But reading mathematics is very different from reading simple text.
- To start with, mathematics is a two-dimensional written language. Almost all other ordinary languages are primarily spoken and only later written, and in both cases in a one-dimensional form (a clearly defined sequence).
17. Speaking Mathematics (2/4)
- For a sighted person, understanding a mathematical formula requires repeated scans, jumping over secondary portions.
- Example
  - 1st scan: it's a square root
  - Another scan: it has a fraction in it
  - Another scan: ab/c in numerator and dxe in denominator
- However, this can be a very complex task for blind people. Therefore studies should be done to understand how to provide proper access to math expressions.
18. Speaking Mathematics (3/4)
- The few studies we know of that can give us some clues are
  - Research being done by Arthur Karshmer
  - PhD thesis by Stevens (related to math prosody)
  - PhD thesis by Raman (related to audio rendering of LaTeX)
  - MathSpeak project (related to Nemeth Code)
  - AudioMath project
- Problems, far from solved, in speaking mathematics include
  - Navigation within a mathematical formula
  - A complete study of mathematical prosody
  - The near-absence of studies on the cognitive issues of reading mathematics aloud to humans
19. Speaking Mathematics (4/4)
- "No standard protocol exists for articulating mathematical expressions as it does for articulating the words of an English sentence." The MathSpeak Project.
21. Which MathML markup to use? (1/7)
- The representation of a math expression is perceived through two distinct but associated concepts
  - Visual structure or notation
    - Ex: a/b, ab^-1
    - MathML Presentation Markup
22. Which MathML markup to use? (2/7)
- and
  - The meaning that it represents
    - Ex: a divided by b
    - MathML Content Markup
- The relationship between notation (Presentation Markup) and meaning (Content Markup) is not one-to-one.
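To make the distinction concrete, the sketch below holds two hand-written MathML fragments for the same expression a/b, one in Presentation Markup and one in Content Markup, and lists the element names found in each. The fragments and the helper name `element_names` are illustrative assumptions, not material from the talk.

```python
import xml.etree.ElementTree as ET

# Presentation Markup: describes the layout (a fraction with a over b).
PRESENTATION = "<math><mfrac><mi>a</mi><mi>b</mi></mfrac></math>"

# Content Markup: describes the meaning (apply "divide" to a and b).
CONTENT = "<math><apply><divide/><ci>a</ci><ci>b</ci></apply></math>"


def element_names(mathml):
    """Return the element names of a MathML fragment in document order."""
    return [element.tag for element in ET.fromstring(mathml).iter()]


print(element_names(PRESENTATION))
print(element_names(CONTENT))
```

The presentation version only says "fraction", while the content version names the operation, which is why the second is easier to turn into speech.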
23. Which MathML markup to use? (3/7)
- Presentation Markup
  - Notation of a math expression
  - Amaya and Mozilla browsers have native support
  - About 30 elements and 50 attributes
  - Ambiguous on the semantics
  - Not the best choice for audio rendering; however, adapting it is possible
  - Transforming Presentation markup into Content markup is not advised; however, the OpenMath group has been working on such stylesheets.
24. Which MathML markup to use? (4/7)
- Content Markup
  - Meaning of a math expression
  - Netscape and Internet Explorer support this
  - About 150 elements and 12 attributes
  - Mostly used to transfer MathML between applications
  - Ambiguous in notation
  - Best to use for audio rendering
  - Can be converted into Presentation markup
  - Limited in the tag set (only covers basic algebra, arithmetic, logic, set theory, ...)
25. Which MathML markup to use? (5/7)
- It's obvious that Content markup would be better suited to audio rendering...
- However, there are 2 major open issues
  - MathML Content markup only covers basic math. Its operator dictionary and support are not very large. (OpenMath seems to be doing a better job on that.)
  - And, so far, the majority of MathML published online uses Presentation Markup (authors use WYSIWYG editors).
- Therefore, MathML Presentation Markup is for the moment the best choice.
26. Which MathML markup to use? (6/7)
- However, take the following example
  - In MathML Presentation Markup the indexes 2 and 3 will be coded as subscript and superscript.
  - This gives no information on whether the expression refers to the cubic power of the element A_2 or to the permutation of 3 elements taken 2 at a time.
- So the downside is that MathML Presentation Markup requires considerably more effort in the interpretation of mathematical expressions.
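The ambiguity can be shown with a small sketch. It assumes the example is encoded with an `msubsup` element holding A, 2 and 3, and it simply returns both readings named on the slide; the sample string, the function name `candidate_readings` and the English wording are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed encoding of the example: base A, subscript 2, superscript 3.
SAMPLE = "<math><msubsup><mi>A</mi><mn>2</mn><mn>3</mn></msubsup></math>"


def candidate_readings(mathml):
    """Return the plausible spoken readings of the first msubsup element."""
    node = ET.fromstring(mathml).find(".//msubsup")
    base, sub, sup = (child.text for child in node)
    return [
        f"{base} sub {sub}, to the power of {sup}",
        f"permutations of {sup} elements taken {sub} at a time",
    ]


for reading in candidate_readings(SAMPLE):
    print(reading)
```

Nothing in the markup selects one reading over the other, so an audio renderer has to rely on context or on user settings.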
27. Which MathML markup to use? (7/7)
- Conclusions
  - An application that does MathML audio rendering should support both Content and Presentation Markup.
  - Starting with Presentation Markup, we need to develop some kind of interpretation that allows us to construct the corresponding Content Markup.
29. Interpretation of MathML tag set elements and attributes (1/8)
- Not all elements and attributes need to be processed for audio rendering...
- Styles and visual attributes do not always provide extra information about the expression, and they rarely enhance the audio description.
- However, if Presentation Markup is used, some style attributes might be important to disambiguate meaning.
- Example
30. Interpretation of MathML tag set elements and attributes (2/8)
- Comments on Presentation Markup elements and attributes for audio rendering (1/5)
- Special attributes
  - class, style, id, xref, xlink:href, other: usually not used.
- Token element attributes
  - mathbackground, mathsize, fontsize, fontweight, fontfamily, color: usually not used.
  - mathcolor: might be interesting to know the color of the symbol.
  - mathvariant: style variants are important.
    - Italic might indicate a function
    - Bold might indicate a vector
    - Fraktur might indicate a Lie algebra
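As a rough illustration of how `mathvariant` could feed the spoken output, the sketch below prefixes an identifier with the hint suggested on the slide. The mapping table and the function name `describe_identifier` are assumptions; a real renderer would treat these hints as guesses to be confirmed by context.

```python
import xml.etree.ElementTree as ET

# Hints from the slide: each variant only "might" indicate this meaning.
VARIANT_HINTS = {
    "italic": "function",
    "bold": "vector",
    "fraktur": "Lie algebra",
}


def describe_identifier(mi_xml):
    """Speak an <mi> element, adding a hint when mathvariant suggests one."""
    element = ET.fromstring(mi_xml)
    hint = VARIANT_HINTS.get(element.get("mathvariant", ""))
    name = element.text
    return f"{hint} {name}" if hint else name


print(describe_identifier('<mi mathvariant="bold">v</mi>'))
print(describe_identifier("<mi>x</mi>"))
```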
31. Interpretation of MathML tag set elements and attributes (3/8)
- Comments on Presentation Markup elements and attributes for audio rendering (2/5)
- Elements
  - maction: not used in general; however, it might be interesting to know the actions if we want to give access to formula manipulation and interaction.
  - maligngroup, malignmark: not used.
  - menclose: important to identify the operator.
  - merror: not used. The MathML should be validated before the audio rendering.
  - mfenced: important to know the delimiters.
32. Interpretation of MathML tag set elements and attributes (4/8)
- Comments on Presentation Markup elements and attributes for audio rendering (3/5)
  - mfrac: used.
    - linethickness: important to identify whether it's a fraction or a combinatorial number.
  - mglyph: used.
    - alt: important to speak out the description.
  - mi, mn: important.
  - mlabeledtr: used to speak out the table label.
  - mmultiscripts: used, but its attributes aren't needed.
  - mo: important.
    - movablelimits, fence, separator, accent: might be used to perceive the operator's behavior.
  - mover: used.
33. Interpretation of MathML tag set elements and attributes (5/8)
- Comments on Presentation Markup elements and attributes for audio rendering (4/5)
  - mpadded: not needed.
  - mphantom: all the contents inside should be ignored.
  - mprescripts: used.
  - mroot: used.
  - mrow: important to give clues about sub-expressions (where to place pauses, for instance).
  - ms: used.
  - mspace: not needed.
  - msqrt: used.
34. Interpretation of MathML tag set elements and attributes (6/8)
- Comments on Presentation Markup elements and attributes for audio rendering (5/5)
  - mstyle: used.
    - displaystyle: needed because it can change some behaviors of <mo>.
  - msub, msubsup, msup: all used.
  - mtable, mtd, mtr: used, but attributes not needed.
  - mtext: used.
  - munder, munderover: used.
  - none: not needed.
35. Interpretation of MathML tag set elements and attributes (7/8)
- Comments on Content Markup elements and attributes for audio rendering
- Special attributes
  - class, style, id, xref, xlink:href, other: usually not used.
  - encoding: needed for interpretation.
  - definitionURL: this seems to be the W3C's escape route for the lack of a larger list of operators. But there is no standard for what can be found at the URL, so it's unpredictable how it should render in audio.
- Elements
  - annotation, annotation-xml, semantics: might be used if the application is able to process other types of markup or languages.
  - apply: important to delimit the operator's scope.
  - Operator elements: needed for interpretation.
36. Interpretation of MathML tag set elements and attributes (8/8)
- Comments on MathML special characters for audio rendering
- &ApplyFunction;
  - Ex: f(x)
  - Without &ApplyFunction;, renders "f open parenthesis x close parenthesis".
  - With &ApplyFunction;, renders "f of x".
- &InvisibleTimes;
  - Ex: xy
  - Without &InvisibleTimes;, renders "x y".
  - With &InvisibleTimes;, renders "x times y".
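The effect of the two invisible operators can be sketched on a flat token list. The code below is a simplified illustration, not AudioMath's implementation: it assumes the tokens have already been extracted from the markup, uses the Unicode characters U+2061 and U+2062 for the two entities, and drops parentheses whenever a function application is present.

```python
APPLY_FUNCTION = "\u2061"   # character behind &ApplyFunction;
INVISIBLE_TIMES = "\u2062"  # character behind &InvisibleTimes;

SPOKEN = {
    APPLY_FUNCTION: "of",
    INVISIBLE_TIMES: "times",
    "(": "open parenthesis",
    ")": "close parenthesis",
}


def speak(tokens):
    """Turn a flat list of tokens into a spoken phrase."""
    # With an explicit function application the parentheses only delimit
    # the argument, so this simplified sketch does not read them aloud.
    skip_parentheses = APPLY_FUNCTION in tokens
    words = []
    for token in tokens:
        if skip_parentheses and token in ("(", ")"):
            continue
        words.append(SPOKEN.get(token, token))
    return " ".join(words)


print(speak(["f", "(", "x", ")"]))
print(speak(["f", APPLY_FUNCTION, "(", "x", ")"]))
print(speak(["x", INVISIBLE_TIMES, "y"]))
```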
38. Math Formulae Navigation (1/4)
- Reminder: the more complex the expression, the more difficult it is to read it or to understand a spoken version of it.
- Therefore some kind of navigation has to be provided.
- AudioMath hypothesis (under study)
  - Use of content layers controlled by the user
  - Controlled by keyboard
39. Math Formulae Navigation (2/4)
- Example
  - Level 0: This is a fraction.
  - Level 1: Fraction with numerator minus b plus minus and denominator 2 times a.
  - Level 2: numerator minus b plus minus
    - Level 2.1: minus b
    - Level 2.2: square root of b squared minus
  - Level 3: denominator 2 times a
40. Math Formulae Navigation (3/4)
- Interaction with the user (proposal)
  - Some kind of navigation by keyboard.
  - There are 4 directions in the navigation
    - Up arrow: climb one level
    - Down arrow: go down one level
    - Left arrow: go to the next selection inside a level
    - Right arrow: go back one selection inside that level
  - Special keys
    - Enter: read level
    - Esc: go back home (level 0)
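The proposed key bindings can be prototyped over a small tree of levels. The sketch below is only one possible reading of the proposal: the class names `Level` and `FormulaNavigator` are hypothetical, the tree is a simplified version of the fraction example (the summary level is omitted), and the left/right mapping follows the proposal as stated.

```python
class Level:
    """One layer of a formula: its spoken text plus nested sub-levels."""

    def __init__(self, text, children=()):
        self.text = text
        self.children = list(children)


class FormulaNavigator:
    def __init__(self, root):
        self.root = root
        self.path = []  # child indexes leading from the root to the current level

    def _node_at(self, path):
        node = self.root
        for index in path:
            node = node.children[index]
        return node

    def press(self, key):
        """Apply one key press and return the text that Enter would read."""
        current = self._node_at(self.path)
        if key == "down" and current.children:
            self.path.append(0)
        elif key == "up" and self.path:
            self.path.pop()
        elif key in ("left", "right") and self.path:
            siblings = self._node_at(self.path[:-1]).children
            step = 1 if key == "left" else -1  # left = next, as proposed
            target = self.path[-1] + step
            if 0 <= target < len(siblings):
                self.path[-1] = target
        elif key == "esc":
            self.path.clear()
        # "enter" (or any unhandled key) just reads the current level.
        return self._node_at(self.path).text


FRACTION = Level("This is a fraction.", [
    Level("numerator minus b plus minus", [
        Level("minus b"),
        Level("square root of b squared minus"),
    ]),
    Level("denominator 2 times a"),
])

navigator = FormulaNavigator(FRACTION)
for key in ["enter", "down", "left", "right", "down", "esc"]:
    print(key, "->", navigator.press(key))
```

Keeping the position as a path of child indexes makes Esc trivial (clear the path) and leaves room for the open questions about how many layers to expose.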
41. Math Formulae Navigation (4/4)
- Questions remaining unanswered
  - How many layers/levels should there be?
  - Who defines the layers? The user agent, the external plug-in, or the application that reads the math document?
  - What should be the level of detail in the layers?
  - Will this greatly improve the intelligibility of the audio rendering?
  - Will the users adapt to this?
42. This Presentation
- Introduction
- Audio Rendering of MathML
- Mathematical Audio Rendering Tools Overview
  - ASTER
    - Introduction
    - Brief analysis on its audio rendering of LaTeX
  - MathPlayer 2.0
    - Introduction
    - Brief analysis on its audio rendering of MathML
- AudioMath
- Final conclusions and remarks
43. ASTER: Introduction
- ASTER (Audio System for Technical Readings) is an application that accepts TeX notation as input and produces audio rendering as output.
- Developed by T.V. Raman in 1994 during his PhD.
- Uses the Emacs front-end (Linux).
- ASTER has 3 main components
  - LaTeX parser: creates an internal representation that is easier for the program to manipulate.
  - AFL (Audio Formatting Language): used to render the parsed text using speech and other sounds.
  - Browser: used to help the audio rendering.
45. ASTER: Brief analysis on its audio rendering of LaTeX (1/2)
- Demos from the ASTER site
  - http://www.cs.cornell.edu/Info/People/raman/aster/demo.html
46. ASTER: Brief analysis on its audio rendering of LaTeX (2/2)
- A few comments
  - ASTER was a breakthrough. T.V. Raman's work is considered a bible in mathematics audio rendering.
  - It supports a large number of mathematical formulae in LaTeX.
  - No math formulae navigation support. It gets complicated with complex math expressions. However, ASTER uses a variable substitution process to address this problem, and complex expressions can be divided into sub-expressions at the user's request.
48. MathPlayer 2.0: Introduction
- MathPlayer is a mathematics display engine for Microsoft's Internet Explorer 6.0, developed by Design Science.
- Uses MathML Presentation markup as input and produces visual rendering (versions 1.0 and 2.0) and audio rendering (only in version 2.0, released in 2004) as output.
- Download: http://www.dessci.com/en/products/mathplayer/download.htm
50. MathPlayer 2.0: Brief analysis on its audio rendering of MathML (1/2)
- Demos created using MathType, Microsoft Word, MathPlayer and the Microsoft SAM TTS Engine (demomathplayer.htm)
51. MathPlayer 2.0: Brief analysis on its audio rendering of MathML (2/2)
- A few comments
  - The table problem: why not detect that it's a matrix?
  - The lack of some keywords ("begin <operator>" and "end <operator>") can result in ambiguous readings.
  - No math formulae navigation support. Once again, it gets complicated with complex math expressions.
  - No user-mode options provided.
52. This Presentation
- Introduction
- Audio Rendering of MathML
- Mathematical Audio Rendering Tools Overview
- AudioMath
  - The AudioMath Project
  - AudioMath Architecture
  - The AudioMath Process
  - Database of Vocabulary and Speech Rules
  - Current Status and Future Work
- Final conclusions and remarks
53. The AudioMath Project (1/4)
- An initiative of the Laboratory of Speech Processing, Signals and Instrumentation of the Faculty of Engineering, University of Porto, Portugal.
- This laboratory is dedicated to the research and development of Voice User Interfaces and Speech Technology for Accessibility Solutions.
- Aims
  - To build a tool (AudioMath), operating either standalone or integrated in a speech interface (TTS, text-to-speech), that provides
    - Mathematics audio rendering
    - Parsing, interpretation and conversion of MathML into plain text format
    - Generation of the appropriate prosodic contour for reading the text of the math formulas
    - An intra-formula browsing device (navigation)
    - Recognition and conversion of any other text or markup elements not directly understandable by the TTS engine (numerals, abbreviations, ...)
54. The AudioMath Project (2/4)
- The rationale is
  - Once the formula is described in MathML or an equivalent technique, the basis is there to create the textual description of the mathematical formula and to read the resulting text with the best perceptual and communication effectiveness.
- AudioMath is an ActiveX dynamic link library (DLL) that can be used by any program through internal calls.
- Developed in Perl for Windows 9x/Me/2K/XP. However, its main code can also be used on Linux/Unix (since it's in Perl...).
55. The AudioMath Project (3/4)
- AudioMath's main applications
  - Reading technical and scientific documents online in an accessible way, with particular benefit for visually impaired persons.
  - Teaching or learning how to read mathematical formulae.
  - Enhancing general accessibility to computer-based applications, when applied to a TTS engine.
56. The AudioMath Project (4/4)
58. AudioMath Architecture (1/2)
- AudioMath has been built with a modular, extensible and configurable architecture.
- AudioMath contains 6 major conversion modules
  - Numerals (conversion of several types of numeric forms)
  - Abbreviations (conversion of abbreviations in a text)
  - Acronyms (conversion of acronyms in the document)
  - Network References (conversion of IPs, emails and URI/URLs)
  - Mathematical (conversion of MathML expressions)
  - Auto-Discovery (the brain of the operation, which recognizes or identifies elements in the document and calls the respective conversion modules)
59. AudioMath Architecture (2/2)
60. This Presentation
- Introduction
- Audio Rendering of MathML
- Mathematical Audio Rendering Tools Overview
- AudioMath
  - The AudioMath Project
  - AudioMath Architecture
  - The AudioMath Process
    - Text Analysis
    - Parsing and Interpreting MathML
    - Converting and Speaking Mathematical Contents
  - Database of Vocabulary and Speech Rules
  - Current Status and Future Work
- Final conclusions and remarks
61. Text Analysis (1/3)
- There are several types of text in a document
  - Acronym
    - (ex: UN, United Nations)
  - Abbreviation
    - (ex: EQ., equation; nº, number)
  - Numeral
    - (ex: 1.2, 1,3, XV, 23º, 1,333...)
  - Web Reference
    - (ex: hfilipe_at_fe.up.pt)
  - Math expression
    - (ex: <math><mi>x</mi><mo></mo><mn>3</mn></math>)
  - Special Unicode character or a math glyph
    - (ex: &eacute; represents é)
62. Text Analysis (2/3)
- Steps to follow
  - 1. In the case of European Portuguese, convert all the Unicode into Latin-1.
  - 2. Auto-discovery process, based on regular expressions, that recognizes the types of elements (numerals, acronyms, ...).
  - 3. Calls to the modules that convert the recognized elements into a full plaintext form.
  - 4. Go to 2 and repeat until everything is converted.
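Steps 2 to 4 can be sketched as a loop over (recognizer, converter) pairs. This is a Python illustration under stated assumptions, not AudioMath's Perl code: the two tiny tables, the rule list and the function name `to_plain_text` are hypothetical, only acronyms and single digits are handled, and step 1 (the Latin-1 conversion) is left out.

```python
import re

# Tiny illustrative tables; a real system would load far larger ones.
ACRONYMS = {"UN": "United Nations"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

# Each rule pairs a recognizer (regular expression) with a conversion module.
RULES = [
    (re.compile(r"\b(?:" + "|".join(map(re.escape, ACRONYMS)) + r")\b"),
     lambda match: ACRONYMS[match.group(0)]),
    (re.compile(r"\b\d\b"),
     lambda match: DIGITS[int(match.group(0))]),
]


def to_plain_text(text):
    """Repeat discovery and conversion until nothing more is recognized."""
    changed = True
    while changed:
        changed = False
        for pattern, convert in RULES:
            converted = pattern.sub(convert, text, count=1)
            if converted != text:
                text, changed = converted, True
    return text


print(to_plain_text("The UN approved 3 resolutions"))
```

The loop terminates here because no converter produces text that another rule would match again; a larger rule set would need the same property.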
63. Text Analysis (3/3)
- To speed up the process, the document should be divided into blocks of text, splitting the MathML markup from the rest of the text.
- Text processing is strongly based on regular expressions and databases (for acronyms and abbreviations).
  - Ex: if ($n =~ m/^(?:(?:\-|\--|\+)?[0-9]+)?[\.,]?[0-9]+\%$/igs) { # it's a percentage number }
- Dictionaries and databases were included inside the code as hash tables (more speed but less flexibility in updates).
- Supports user-mode options.
  - Ex: the user may prefer to hear the decimal part of decimal numbers spelled out
    - 1.25: "one point twenty five" or "one point two five"
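The user-mode option for decimals can be illustrated as follows. This is a minimal English-language sketch with assumed names (`read_decimal`, `spell_digits`): it expects input of the form "whole.fraction" with a whole part below 100, and it falls back to digit-by-digit reading when the fraction has a leading zero or more than two digits.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def two_digit_words(number):
    """Spell out an integer between 0 and 99."""
    if number < 20:
        return ONES[number]
    tens, ones = divmod(number, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def read_decimal(text, spell_digits=True):
    """Read a decimal such as '1.25' according to the user's preference."""
    whole, _, fraction = text.partition(".")
    digit_by_digit = (spell_digits or len(fraction) > 2
                      or fraction.startswith("0"))
    if digit_by_digit:
        spoken_fraction = " ".join(ONES[int(digit)] for digit in fraction)
    else:
        spoken_fraction = two_digit_words(int(fraction))
    return f"{two_digit_words(int(whole))} point {spoken_fraction}"


print(read_decimal("1.25", spell_digits=True))
print(read_decimal("1.25", spell_digits=False))
```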
65. Parsing and Interpreting MathML (1/3)
- MathML code is parsed using the XML::Parser module, which acts as a SAX-type (event-based) parser, supporting the ISO-8859-1 (Latin-1) encoding and discarding XML namespaces.
- AudioMath works with MathML Presentation Markup, so relatively more effort and computation are needed to interpret the mathematical expressions.
- Currently, in AudioMath, interpreting MathML is
  - A process of raising flags each time a start or end tag is detected. This keeps track of the markup history and allows information to be retrieved, making it possible to understand the structure of the math expression and convert it.
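The flag-raising idea maps naturally onto an event-based handler. The sketch below uses Python's standard `xml.sax` module rather than the Perl parser named above, and it is only an approximation of the approach: the class name `SpeechHandler`, the operator table and the English phrases are assumptions, and only `msqrt`, `mi`, `mn` and `mo` are handled.

```python
import xml.sax

OPERATOR_WORDS = {"+": "plus", "-": "minus"}


class SpeechHandler(xml.sax.ContentHandler):
    """Collect spoken words while tracking which tags are currently open."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # the "flags": history of start tags not yet closed
        self.words = []

    def startElement(self, name, attrs):
        self.open_tags.append(name)
        if name == "msqrt":
            self.words.append("square root of")

    def endElement(self, name):
        self.open_tags.pop()
        if name == "msqrt":
            self.words.append("end of radicand")

    def characters(self, content):
        text = content.strip()
        # Only token elements carry text that should be spoken.
        if text and self.open_tags and self.open_tags[-1] in ("mi", "mn", "mo"):
            self.words.append(OPERATOR_WORDS.get(text, text))


def speak(mathml):
    """Parse a MathML string and return its spoken form."""
    handler = SpeechHandler()
    xml.sax.parseString(mathml.encode("utf-8"), handler)
    return " ".join(handler.words)


print(speak("<math><msqrt><mi>a</mi><mo>+</mo><mi>b</mi></msqrt></math>"))
```

Because the handler sees both the start and the end of `msqrt`, it can emit the closing phrase that removes ambiguity about where the radicand stops.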
66. Parsing and Interpreting MathML (2/3)
- As the expression is discovered, the conversion takes place by calling several algorithms as well as the Unicode and MathML dictionaries.
- AudioMath uses 2 kinds of dictionaries
  - MathML entities to Unicode
    - Ex: &InvisibleTimes; is converted to U02062
  - Unicode to European Portuguese full plaintext form
    - Ex: U02062 is converted into the Portuguese word "vezes" (times).
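The two-stage lookup can be sketched with two small hash tables. Only the `InvisibleTimes` entry comes from the slide; the function name `speak_entities` and the choice to leave unknown entities untouched are assumptions made for illustration.

```python
import re

# Stage 1: MathML entity name -> Unicode character.
ENTITY_TO_UNICODE = {"InvisibleTimes": "\u2062"}

# Stage 2: Unicode character -> European Portuguese word.
UNICODE_TO_WORD = {"\u2062": "vezes"}


def speak_entities(text):
    """Replace every known &Entity; in the text with its spoken word."""
    def replace(match):
        character = ENTITY_TO_UNICODE.get(match.group(1))
        word = UNICODE_TO_WORD.get(character)
        # Unknown entities are left as they are for a later stage to handle.
        return f" {word} " if word else match.group(0)

    return re.sub(r"&(\w+);", replace, text)


print(speak_entities("x&InvisibleTimes;y"))
```

Keeping the Unicode layer in the middle means only the second table has to change to support another language.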
67. Parsing and Interpreting MathML (3/3)
- The conversion to full plaintext form is done according to a database of vocabulary and speech rules.
  - To be seen later
- Once again, the bigger the expression, the more complex the interpretation and conversion process becomes. Navigational mechanisms to browse the math expression should be provided (work in progress).
- This browsing must happen both externally (already seen in the topic Math Formulae Navigation) and internally (to infer the meaning of the mathematical contents).
69. Converting and Speaking Mathematical Contents (1/2)
- Speaking mathematical contents automatically has to deal not only with the non-trivial issues of text generation and phrasing, but also with generating the prosody to impose on the synthetic speech.
- For example, consider the possible readings of the expression shown alongside
  - Square root of a squared plus b squared, end of radicand.
  - Square root of power base a exponent two, end of power, plus power base b exponent two, end of power, end of radicand.
- Which of these forms is the most correct, unambiguous and efficient?
70. Converting and Speaking Mathematical Contents (2/2)
- Try the experiment: read a math expression in a monotone to someone who is not looking at it, and ask for a written version after the dictation.
- Result: if one is not careful enough, the expressions will all sound much alike and quite ambiguous. Identification is therefore difficult.
- The solution must involve adopting formal ways of text generation that preserve the structural information of the formula, combined with the right prosodic information (f0 contour, pauses, ...).
72 Database of Vocabulary and Speech Rules (1/5)
- One of AudioMath's current tasks is the definition of a database of vocabulary and speech rules for several subsets of math formulae.
- These rules and the intonation are applied at conversion time by tagging the text with prosodic marks, which instruct the TTS engine to produce the required pauses and f0 modulations.
- A math corpus is being defined and categorized by operation type: algebra, trigonometry, integrals, ...
- Each type has its structure analyzed and defined in a formal plain-text way.
- Different readings of the same formula are spoken and analyzed for prosody and perceptibility.
- Pitch patterns and pauses are inferred from speech.
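One way to picture such a database is as a table of spoken templates grouped by operation type, with placeholders for the prosodic marks. The Python sketch below is only an assumption about how this could look: `SPEECH_RULES`, `MARKS` and `apply_rule` are hypothetical names, and the marks are kept as plain text instead of real TTS engine commands.

```python
# Hypothetical rule database: operation type -> rule name -> spoken template.
# {PAUSE} and {DPAUSE} are prosodic marks; {0} and {1} are spoken sub-expressions.
SPEECH_RULES = {
    "algebra": {
        "sqrt": "square root of {PAUSE} {0} {PAUSE} end of radicand",
        "frac": ("fraction {PAUSE} on top {PAUSE} {0} {DPAUSE} "
                 "on bottom {1} {PAUSE} end of fraction"),
    },
}

# Plain-text marks; a real system would map these to TTS engine commands.
MARKS = {"PAUSE": "(pause)", "DPAUSE": "(double pause)"}


def apply_rule(category, name, *parts):
    """Fill a speech-rule template with already-spoken sub-expressions."""
    template = SPEECH_RULES[category][name]
    return template.format(*parts, **MARKS)


radicand = apply_rule("algebra", "sqrt", "b squared minus four times a times c")
print(apply_rule("algebra", "frac", "minus b plus minus " + radicand, "two times a"))
```

Keeping the marks symbolic in the templates means the same rule can later be rendered for different TTS engines by changing only the `MARKS` table.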
73 Database of Vocabulary and Speech Rules (2/5)
Square root of (pause) a squared plus b squared (pause) end of radicand
Fraction (pause) on top (pause) minus b plus minus square root of (pause) b squared minus four times a times c (pause) end of radicand (double pause) on bottom two times a (pause) end of fraction
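The formulae themselves did not survive in this text version. Judging from the two readings, they are presumably the following (a reconstruction from the spoken forms, not a copy of the original slide):

```latex
\[
\sqrt{a^2 + b^2}
\qquad\qquad
\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\]
```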
74 Database of Vocabulary and Speech Rules (3/5)
75 Database of Vocabulary and Speech Rules (4/5)
- Prosody conclusions inferred from speech waveforms (1/2):
- There are two distinct types of pauses:
- Large pause. Example: "Square root of (pause) a squared plus b squared (pause) end of radicand"
- Short pause (optional). Example: "Square root of (pause) a squared (pause) plus b squared (pause) end of radicand"
76 Database of Vocabulary and Speech Rules (5/5)
- Prosody conclusions inferred from speech waveforms (2/2):
- The speaker's intonation shows rising and falling movements of f0 that classify the boundaries introduced by the pauses:
- Rising tone: used when a lower hierarchical level is starting. Example: "root of"
- Falling tone: used when a level has ended. Example: "b squared"
- Falling and rising tone: used to mark the smaller separating pause. Example: "a squared"
- Emphatic falling tone: used at the end of the expression. Example: "end of radicand"
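These observations could be turned into machine-readable prosodic marks in several ways. The Python sketch below uses SSML `<break>` and `<prosody contour="...">` elements as one possibility. This is an assumption: the slides do not say which markup AudioMath sends to its TTS engine, the durations and contour values are invented for illustration, and TTS engines differ in how well they honour `contour`.

```python
from xml.sax.saxutils import escape

# Illustrative values only; the slides give no numeric durations or f0 targets.
PAUSES = {"large": "500ms", "short": "200ms"}
TONES = {
    "rising": "(0%,+0%) (100%,+15%)",            # a lower level is starting
    "falling": "(0%,+0%) (100%,-15%)",           # a level has ended
    "fall-rise": "(0%,+0%) (50%,-10%) (100%,+10%)",  # before a short pause
    "emphatic-falling": "(0%,+5%) (100%,-30%)",  # end of the expression
}


def to_ssml(segments):
    """Build SSML from (text, tone, pause_after) tuples; pause_after may be None."""
    out = ['<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
           'xml:lang="en">']
    for text, tone, pause_after in segments:
        out.append(f'<prosody contour="{TONES[tone]}">{escape(text)}</prosody>')
        if pause_after is not None:
            out.append(f'<break time="{PAUSES[pause_after]}"/>')
    out.append("</speak>")
    return "".join(out)


reading = [
    ("square root of", "rising", "large"),
    ("a squared", "fall-rise", "short"),
    ("plus b squared", "falling", "large"),
    ("end of radicand", "emphatic-falling", None),
]
print(to_ssml(reading))
```

Each segment carries its own tone and the pause that follows it, mirroring the pairing of pauses and f0 movements described above.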
78 Current Status and Future Work (1/3)
- Current Status (1/2)
- Unicode support (MathML and Unicode Dictionary for European Portuguese)
- MathML Presentation Markup tags supported:
- Token elements: <mo>, <mi>, <mn>, <mtext>, <mspace>, <ms>, <mglyph>
- General Layout Schemata: <mrow>, <msqrt>, <mroot>, <mfrac>, <mstyle>, <merror>, <mphantom>, <mpadded>
- Tables and Matrices: <mlabeledtr>, <maligngroup>, <malignmark>
- Enlivening Expressions: <maction>, <none>
- MathML Presentation Markup tags partially supported: <msup>, <mfenced>, <menclose>, <mtable>, <mtr>, <mtd>
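A support list like this lends itself to a quick pre-flight check of a document before audio rendering. The Python sketch below is a hypothetical helper, not part of AudioMath: it copies the tag lists from this slide into two sets and reports the status of every element found in a MathML fragment. Treating the `<math>` root as a container that is skipped is an assumption.

```python
import xml.etree.ElementTree as ET

# Tag lists copied from the slide above.
SUPPORTED = {
    "mo", "mi", "mn", "mtext", "mspace", "ms", "mglyph",           # token elements
    "mrow", "msqrt", "mroot", "mfrac", "mstyle", "merror",
    "mphantom", "mpadded",                                         # general layout
    "mlabeledtr", "maligngroup", "malignmark",                     # tables and matrices
    "maction", "none",                                             # enlivening expressions
}
PARTIAL = {"msup", "mfenced", "menclose", "mtable", "mtr", "mtd"}


def support_report(mathml):
    """Map each element name used in the fragment to its support status."""
    report = {}
    for element in ET.fromstring(mathml).iter():
        tag = element.tag.split("}")[-1]   # drop an XML namespace, if present
        if tag == "math":
            continue                       # root container, not on the slide's list
        if tag in SUPPORTED:
            report[tag] = "supported"
        elif tag in PARTIAL:
            report[tag] = "partially supported"
        else:
            report[tag] = "not listed"
    return report


sample = ("<math><msqrt><msup><mi>a</mi><mn>2</mn></msup></msqrt>"
          "<munder><mi>x</mi><mi>y</mi></munder></math>")
print(support_report(sample))
```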
79 Current Status and Future Work (2/3)
- Current Status (2/2)
- Also detects and converts:
- Numerals: cardinals, ordinals, decimals, Roman numerals, percentages, dates, times, scales, sports results, fractions, currency, powers, telephone numbers and engineering formats.
- Abbreviations: social, currency, chemistry and physics (in the database).
- Acronyms: several (in the database).
- Network references: e-mail addresses, URLs/URIs and IP addresses.
- Browser for functionality testing and TTS integration.
- A few studies on mathematical prosody and on speech database rules for mathematics.
- User mode support (rendering preferences).
- User evaluation is performed in each iteration of AudioMath's development.
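Detection of this kind is commonly done with regular expressions applied in order of specificity. The Python sketch below shows the idea for a few of the categories listed above. The patterns are deliberately simplified assumptions (they are not AudioMath's rules and would accept some invalid inputs, such as out-of-range IP octets), and `classify_token` is a hypothetical name.

```python
import re

# Ordered from most to least specific: the first full match wins.
PATTERNS = [
    ("email", re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")),
    ("url", re.compile(r"https?://\S+")),
    ("ip", re.compile(r"(\d{1,3}\.){3}\d{1,3}")),
    ("percentage", re.compile(r"\d+([.,]\d+)?%")),
    ("time", re.compile(r"\d{1,2}:\d{2}")),
    ("decimal", re.compile(r"\d+[.,]\d+")),
    ("cardinal", re.compile(r"\d+")),
]


def classify_token(token):
    """Return the name of the first pattern that matches the whole token."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return name
    return "word"


for token in ["name@example.org", "http://example.org/math", "192.168.0.1",
              "12.5%", "16:30", "3.14", "42", "radicand"]:
    print(token, "->", classify_token(token))
```

Once a token is classified, a category-specific converter can expand it into words before the text reaches the TTS engine.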
80 Current Status and Future Work (3/3)
- Future Work
- Complete the support for MathML Presentation Markup.
- Add support for MathML Content Markup.
- Learn more about how to read mathematical formulae.
- Develop modules that support HTML, XHTML, SSML and others.
- Provide mechanisms for navigating inside mathematical formulae (eventually a special audio browser).
- Add support for new languages (English, French, ...).
- Extend the study of the prosody of reading mathematical formulae.
- "We can only see a short distance ahead, but we can see plenty there that needs to be done." Alan Turing.
81 This Presentation
- Introduction
- Audio Rendering of MathML
- Mathematical Audio Rendering Tools Overview
- AudioMath
- Final conclusions and remarks
- Final conclusions
- Appendixes
- Suggested Readings
- Web References
- Knowledge Requirements
82 Final Conclusions (1/2)
- About speaking mathematics
- Reading mathematics is very different from reading ordinary text.
- Navigational mechanisms should be provided.
- More studies on prosody are needed.
- There are still no standards on how to read math.
- About using MathML to speak mathematics
- MathML is becoming a standard.
- Content markup should be better suited to audio rendering; however, Presentation markup is more widely used.
- If Presentation markup is used, more computation is needed to interpret it and transform it for audio rendering.
- Not all elements and attributes of MathML need to be used in audio rendering.
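The difference between the two markups can be seen on a very small example. In Presentation markup, `<msup>` only says that "2" is drawn as a superscript of "a", so a reader has to guess that a power is meant. In Content markup, `<power/>` names the operation directly. The Python sketch below reads a tiny subset of Content MathML. The function `speak_content` and its wording are assumptions made for illustration; AudioMath lists Content markup support as future work.

```python
import xml.etree.ElementTree as ET

PRESENTATION = "<msup><mi>a</mi><mn>2</mn></msup>"         # layout: a with superscript 2
CONTENT = "<apply><power/><ci>a</ci><cn>2</cn></apply>"   # meaning: a raised to the power 2


def speak_content(node):
    """Read a tiny subset of Content MathML (apply with power or plus)."""
    if node.tag in ("ci", "cn"):
        return (node.text or "").strip()
    if node.tag == "apply":
        operator, *operands = list(node)
        spoken = [speak_content(operand) for operand in operands]
        if operator.tag == "power" and len(spoken) == 2:
            return f"power base {spoken[0]} exponent {spoken[1]}, end of power"
        if operator.tag == "plus":
            return " plus ".join(spoken)
    raise ValueError(f"element not covered by this sketch: {node.tag}")


print(speak_content(ET.fromstring(CONTENT)))
```

With Content markup the operator is looked up by name, with no need to infer the operation from the layout, which is the extra computation Presentation markup demands.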
83 Final Conclusions (2/2)
- About AudioMath
- It is an accessibility tool (audio rendering of MathML and of numerals, abbreviations, acronyms and network references).
- It uses special dictionaries to adapt to several situations: MathML entities and Unicode entities.
- It is built on a modular, extensible and configurable architecture.
- User mode options are supported.
- The AudioMath project studies mathematical prosody and builds a database of speech rules.
85 Suggested Readings (1/2)
- AudioMath related
- Ferreira, Helder et al. Enhancing the Accessibility of Mathematics for Blind People: The AudioMath Project. ICCHP'04.
- Ferreira, Helder et al. Audio Rendering of Mathematical Formulae using MathML and AudioMath. UI4ALL'04.
- Ferreira, Helder. Contribute to the automatic reading of scientific documents (Portuguese version only). Final year project, 2003.
- Other publications
- Gillan, Douglas et al. Cognitive Analysis of Equation Reading: Application to the Development of the Math Genie. ICCHP'04.
- Fitzpatrick, D. et al. Multi-modal Mathematics: Conveying Math Using Synthetic Speech and Speech Recognition. ICCHP'04.
86 Suggested Readings (2/2)
- Stöger, B. et al. Mathematical Working Environment for the Blind: Motivation and Basic Ideas. ICCHP'04.
- Karshmer, A. et al. How well can we read equations to blind mathematics students: some answers from psychology. HCII'03.
- Rotard, M. et al. Access to Mathematical Expressions in MathML for the Blind. HCII'03.
- Stevens, R. Principles for the Design of Auditory Interfaces to Present Complex Information to Blind People. PhD Thesis, 1996.
- Raman, T.V. Audio System For Technical Readings. PhD Dissertation, 1994.
88 Web References (1/2)
- MathML, W3C: http://www.w3c.org/Math/
- Accessible Mathematics: http://www.latrobe.edu.au/webaccess/maths.html
- Guidelines for Topic Specific Accessibility (including Mathematics): http://ncam.wgbh.org/salt/guidelines/sec11.html
- Mathematical Access for Technology and Science for Visually Disabled People: http://www.cs.york.ac.uk/maths/
89 Web References (2/2)
- Math Speak Project: http://www.rit.edu/easi/easisem/talkmath.htm
- Math Computerized, Spoken and Braille: http://www.rit.edu/~easi/math.htm
- Mathematics for Computer Generated Spoken Documents: http://www.cs.cornell.edu/Info/People/raman/aster/demo.html
- The AudioMath Project: http://lpf-esi.fe.up.pt/audiomath
91 Knowledge Requirements
- To develop any application that speaks MathML, you might need:
- Knowledge of XML and related technologies
- Concepts in text processing, finite-state machines (FSM) and regular expressions
- Vocabulary for the math expressions
- TTS concepts
- A prosody database
- Lots of patience and hard work!
92 The end
- Thank you for your attention!
- Any questions?