Title: Methods for Improving Readability of Speech Recognition Transcripts
John McCoey
Department of Computer Science
Villanova University
Proposed Improvement to VUST System
The Pause Recognition software is our proposed
addition to the VUST System created by Kheir and
Way [1]. The VUST System adds two pieces of
software, DiBS and a Training Engine, to enhance
the results obtained from the Microsoft Speech
Recognition Engine (MSRE). The goal of the added
Pause Recognition software is to format the
transcript produced by the VUST System into
paragraph form, based on the pauses in the
dictation. To do this, we need to know more about
the way a person speaks, and how often and for how
long pauses are taken. This requires research into
the average length of a speaker's pauses in
various situations, and into what can be
determined from those pauses. We propose to study
various speakers and implement a system that
classifies pauses by the relative ratio of their
lengths, rather than by an inaccurate measure
based on absolute seconds. Because every speaker
pauses for different lengths of time, depending on
how fast or slowly they speak, some of our
information about relative pause length must come
from the supplied Training Engine. For this
software to work, we must modify both the DiBS
software and the Training Engine.
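As a minimal sketch of the relative-ratio idea, the following Python snippet classifies a pause by comparing it with a per-speaker baseline instead of a fixed number of seconds. It assumes the baseline is the median pause observed during training. The function names and the ratio cut-offs (2.0 and 5.0) are hypothetical placeholders, not values taken from the VUST System or from any study.

```python
from statistics import median


def speaker_baseline(training_pauses):
    """Return a speaker's typical pause length in seconds.

    Assumption: the median of the pauses measured during training
    is a reasonable per-speaker baseline.
    """
    if not training_pauses:
        raise ValueError("at least one training pause is required")
    return median(training_pauses)


def classify_pause(duration, baseline, short_ratio=2.0, long_ratio=5.0):
    """Classify a pause relative to the speaker's baseline.

    short_ratio and long_ratio are hypothetical cut-offs; real values
    would have to come from the proposed research on pause lengths.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = duration / baseline
    if ratio >= long_ratio:
        return "long"
    if ratio >= short_ratio:
        return "short"
    return "none"


# Two illustrative speakers with made-up training data (seconds).
fast_talker = speaker_baseline([0.10, 0.12, 0.08, 0.11, 0.09])
slow_talker = speaker_baseline([0.40, 0.45, 0.35, 0.50, 0.42])
```

With these made-up numbers, the same 0.6-second silence counts as a long pause for the fast speaker but as no significant pause for the slow one, which is the behavior a fixed threshold in seconds cannot provide.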
The DiBS software from the VUST System allows the
user to add new words to the searchable library
that the MSRE uses when looking up a spoken word.
This is useful for adding newly coined words,
which often appear in computer science, and it
also allows the engine's search to be narrowed to
a select set of words that have been added. For
our Pause Recognition software to work, we must
create a new group of symbols that represent
pauses and their approximate lengths. When the
proposed Pause Recognition software runs, it will
replace these symbols with punctuation and create
sentences and paragraphs. In addition, long pauses
may be indicated to the reader to increase the
overall readability of the document.
The VUST Training Engine has the user read
sentences into a microphone for thirty to
forty-five minutes before attempting to run the
Recognition Engine. This allows the MSRE to adapt
to the sound of the user's voice and to any accent
that would hinder the system from accurately
transcribing the dialog. To incorporate the new
Pause Recognition software, the Training Engine
must be able to recognize pauses between words as
well as the words themselves. By modifying the
Training Engine for this task, we can combine the
information gathered from the training session
with research on average pause times to determine
the length of a short pause or a long pause.
Because everyone speaks at a different pace, this
information is stored in the speaker's profile and
used by the MSRE to insert pauses into the
transcript. The timing of the pauses can then be
used to determine the end of a sentence or
paragraph.
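The replacement of pause symbols with punctuation could look roughly like the sketch below. It assumes the recognizer emits a flat list of tokens in which two hypothetical symbols, `<pause:short>` and `<pause:long>`, mark pauses. The mapping used here (short pause ends a sentence, long pause ends a paragraph) is an illustrative assumption, not a rule established by the VUST System.

```python
SHORT_PAUSE = "<pause:short>"  # hypothetical symbol for a short pause
LONG_PAUSE = "<pause:long>"    # hypothetical symbol for a long pause


def format_transcript(tokens):
    """Turn words and pause symbols into punctuated paragraphs.

    Assumption: a short pause closes a sentence and a long pause
    closes a paragraph. Paragraphs are separated by a blank line.
    """
    paragraphs = []  # finished paragraphs
    sentences = []   # finished sentences of the current paragraph
    words = []       # words of the current sentence

    def end_sentence():
        if words:
            sentence = " ".join(words)
            sentences.append(sentence[0].upper() + sentence[1:] + ".")
            words.clear()

    def end_paragraph():
        end_sentence()
        if sentences:
            paragraphs.append(" ".join(sentences))
            sentences.clear()

    for token in tokens:
        if token == SHORT_PAUSE:
            end_sentence()
        elif token == LONG_PAUSE:
            end_paragraph()
        else:
            words.append(token)

    end_paragraph()  # flush whatever remains at the end of the input
    return "\n\n".join(paragraphs)


raw = ("today we cover sorting <pause:short> first is quicksort "
       "<pause:long> next week we cover graphs").split()
formatted = format_transcript(raw)
```

Keeping the pause symbols as ordinary tokens means this step can run as a separate pass over the recognizer output, so the cut-offs that decide which symbol is emitted can be tuned per speaker without touching the formatting logic.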
Example screenshot of DiBS [1] adding a custom
dictionary
VUST System [1] with Pause Recognition
Other Areas of Improvement
Save a personal profile to avoid re-training and
to recognize speaker changes
Confidence percentages or phonetic spelling of
unknown words, e.g.
Onomatopoeia: on-uh-mat-uh-pee-uh, -mah-tuh-
Accessibility for classroom use by disabled
students
Different language dictionaries / translation
[1] Richard Kheir and Thomas Way. Inclusion of
Deaf Students in Computer Science Classes using
Real-Time Speech Transcription. ITiCSE '07.
Applied Computing Technology Laboratory,
Department of Computing Sciences, Villanova
University, 2007.