Title: AudioSense: A Simulation
AudioSense: A Simulation
- Progress Report
- EECS 578
- Allan Spale
Background of Concept
- Taking the train home and listening to the sounds around me
- How would deaf people be able to perceive the environment?
- What assistance would be useful in helping people adapt to the environment?
Project Goals
- Develop a CAVE application that will simulate aspects of audio perception
- Display the text of speaking objects in space
- Display the description text of non-speaking objects in space
- Display visual cues of multiple sound sources
- Allow the user to selectively listen to different sound sources
Topics in the Project
- Augmented reality
  - Illustrated by objects in a virtual environment
- 3D sound
  - Simulated by an object's interaction property
- Speech recognition
  - Simulated by text near the object
  - Text will remain static during the simulation
- Virtual reality / CAVE
  - Method for presenting the project
  - Not discussed in this presentation
Augmented Reality
- Definition
  - "...provides means of intuitive information presentation for enhancing situational awareness and perception by exploiting the natural and familiar human interaction modalities with the environment." -- Behringer et al. 1999
Augmented Reality: Device Diagnostics
- Architecture components aid in performing diagnostic tests
- Computer vision used to track the object in space
- Speech recognition (command-style) used for the user interface
- 3D graphics (wireframe and shaded objects) illustrate an object's internal structure
- 3D audio emitted from an item allows the user to find its location within the object
Augmented Reality
- [Image slides]
Augmented Reality: Device Diagnostics
- Summary
  - Providing 3D graphics and sound helps the user better diagnose items
  - Might also want text information on the display
  - Tracking methodology still needs improvement
  - Speech recognition of commands could be expanded to include annotation
  - Utilize an IP connection to distribute computing power from the wearable computer
Augmented Reality: Multimedia Presentations in the Real World
- Mobile Augmented Reality System (MARS)
  - Tracking performed by the Global Positioning System (GPS) and another device
  - Display is see-through and head-mounted
  - Interaction based on location and gaze
  - Additional interaction provided by a hand-held device
Augmented Reality: Multimedia Presentations in the Real World
- System overview
  - Selection occurs through proximity or gaze direction, followed by a menu system (sketched after this slide)
- Information presentation
  - Video (on the hand-held device) or images accompanied by narration (on the head-mounted display)
  - Virtual reality (for places that cannot be visited)
  - Augmented reality (illustrates where items were)
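A minimal sketch of the two selection triggers described above: proximity (the user is near an item) and gaze (the item falls within a viewing cone). All types, thresholds, and function names are illustrative assumptions, not the MARS implementation.

#include <cmath>
#include <cstdio>

struct Vec3 { double x, y, z; };

static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static double len(Vec3 a)         { return std::sqrt(dot(a, a)); }

// Proximity trigger: the item lies within an assumed radius of the user.
bool selectedByProximity(Vec3 user, Vec3 item, double radius) {
    Vec3 d{item.x - user.x, item.y - user.y, item.z - user.z};
    return len(d) <= radius;
}

// Gaze trigger: the item lies within an assumed cone around the gaze direction.
bool selectedByGaze(Vec3 user, Vec3 gazeDir, Vec3 item, double coneDeg) {
    const double kPi = 3.14159265358979323846;
    Vec3 d{item.x - user.x, item.y - user.y, item.z - user.z};
    double cosAngle = dot(gazeDir, d) / (len(gazeDir) * len(d));
    return cosAngle >= std::cos(coneDeg * kPi / 180.0);
}

int main() {
    Vec3 user{0, 0, 0}, gaze{0, 0, -1}, item{0.5, 0, -10};
    std::printf("proximity: %d, gaze: %d\n",
                selectedByProximity(user, item, 5.0),
                selectedByGaze(user, gaze, item, 10.0));
}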
Augmented Reality: Multimedia Presentations in the Real World
- [Image slides]
Augmented Reality: Multimedia Presentations in the Real World
- Conclusions
  - Current system is too heavy and visually undesirable
  - Might want to make the hand-held display a palm-top computer
  - Permit authoring of content
  - Create a collaboration between indoor and outdoor system users
3D Sound: Audio-Only Web Browsing
- Must overcome difficulties with utilizing 3D sound
  - Sounds along the X axis are identifiable; sounds along the Y and Z axes are not
- A need exists to create structure in audio-rendered web pages
  - Document reading appears spatially from left to right in an adequate amount of time (sketched after this slide)
  - Utilize earcons and selective listening
  - Provide meta-content for a quick document overview
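A minimal sketch of that left-to-right spatial reading, spreading document elements along the X axis only, since that was the one reliably identifiable axis. The element names, arc width, and coordinate convention are illustrative assumptions, not the browser's actual layout.

#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

struct PageElement { std::string text; };

// Map element i of n to an azimuth across an assumed 120-degree frontal arc,
// left to right, then to a unit-distance (x, z) position for a 3D sound source.
void placeElements(const std::vector<PageElement>& elems) {
    const double kPi = 3.14159265358979323846;
    const double arcDeg = 120.0;                 // assumed frontal arc
    const int n = static_cast<int>(elems.size());
    for (int i = 0; i < n; ++i) {
        double az = (n > 1) ? -arcDeg / 2 + arcDeg * i / (n - 1) : 0.0;
        double x = std::sin(az * kPi / 180.0);   // left-right: the reliable cue
        double z = -std::cos(az * kPi / 180.0);  // in front of the listener
        std::printf("\"%s\" -> azimuth %+.1f deg, position (%.2f, 0.00, %.2f)\n",
                    elems[i].text.c_str(), az, x, z);
    }
}

int main() {
    placeElements({{"heading"}, {"paragraph"}, {"link list"}});
}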
3D Sound
- [Image slide]
3D Sound: Audio-Only Web Browsing
- Future work
  - Improve link information so it extends beyond web page title and time duration
- Benefits of auditory browsing aids
  - Improved comprehension
  - Better browsing experience for visually impaired and sighted users
3D Sound: Interactive 3D Sound Hyperstories
- Hyperstories
  - Story occurring in a hypermedia context
  - Forms a nested context model
  - World objects can be passive, active, static, or dynamic
3D Sound: Interactive 3D Sound Hyperstories
- AudioDoom
  - Like the computer game Doom, but different
  - All world objects represented with sound
  - Sound represented in a volume almost parallel to the user's eyes
  - User interacts with world objects using an ultrasonic joystick with haptic functionality
  - Organized by partitioned spaces
3D Sound: Interactive 3D Sound Hyperstories
- [Image slides]
3D Sound: Interactive 3D Sound Hyperstories
- Despite elapsed time between sessions, users remembered the world structure well
- Authors illustrate "the possibility of rendering a spatial navigable structure by using only spatialized sound"
- Opens possibilities for educational software for the blind within the hyperstory context
Speech Recognition: Media Retrieval and Indexing
- Problems with media retrieval and indexing
  - Lots of media being generated; too costly and time-consuming to index manually
- Ideal system design
  - Speaker independence
  - Capability in noisy recording environments
  - Open vocabulary
Speech Recognition: Media Retrieval and Indexing
- Using Hidden Markov Models, the system achieved the results in Table 1
- To improve the results, string matching techniques help overcome errors in the recognition stream
Speech Recognition: Media Retrieval and Indexing
- String matching strategy (final steps sketched after this slide)
  - Develop the search term
  - Divide the recognition stream into a set of sub-strings
  - Implement an initial filter process
  - Identify edit operations for the remaining sub-strings in the recognition stream
  - Calculate the similarity measure for the search term and matched strings
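A minimal sketch of the last two steps: counting edit operations (Levenshtein distance) between the search term and a candidate sub-string, then converting the count into a similarity measure. The length-based normalization is an illustrative assumption, not the measure from the paper.

#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Count the insertions, deletions, and substitutions needed to turn a into b.
int editDistance(const std::string& a, const std::string& b) {
    std::vector<std::vector<int>> d(a.size() + 1, std::vector<int>(b.size() + 1));
    for (size_t i = 0; i <= a.size(); ++i) d[i][0] = static_cast<int>(i);
    for (size_t j = 0; j <= b.size(); ++j) d[0][j] = static_cast<int>(j);
    for (size_t i = 1; i <= a.size(); ++i)
        for (size_t j = 1; j <= b.size(); ++j)
            d[i][j] = std::min({d[i - 1][j] + 1,                            // deletion
                                d[i][j - 1] + 1,                            // insertion
                                d[i - 1][j - 1] + (a[i - 1] != b[j - 1])}); // substitution
    return d[a.size()][b.size()];
}

// Similarity in [0, 1]: 1 is an exact match; assumed normalization by length.
double similarity(const std::string& term, const std::string& candidate) {
    size_t longest = std::max(term.size(), candidate.size());
    if (longest == 0) return 1.0;
    return 1.0 - static_cast<double>(editDistance(term, candidate)) / longest;
}

int main() {
    // A classic mis-recognition pair, for illustration only.
    std::printf("%.2f\n", similarity("recognize speech", "wreck a nice beach"));
}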
Speech Recognition: Media Retrieval and Indexing
- [Image slide]
Speech Recognition: Media Retrieval and Indexing
- Results of implementing the string matching strategy
  - Permitting more edit operations improved recall but degraded precision
  - Despite low performance rates, a system performing these tasks would be commercially viable
Speech Recognition: Continuous Speech Recognition
- Problems with continuous speech recognition
  - Produces unpredictable errors, unlike other, more predictable user input errors
  - The absence of context aids makes recognition difficult for the computer
  - Speech user interfaces are still in a developmental stage and will improve over time
Speech Recognition: Continuous Speech Recognition
- Two modes
  - Keyboard-mouse and speech
- Two tasks
  - Composition and transcription
- Results
  - Keyboard-mouse tasks were faster and more efficient than speech tasks
Speech Recognition: Continuous Speech Recognition
- Correction methods
  - Two general correction methods: inline correction and separate proofreading
  - Speech inline correction methods: select text and re-enter, delete text and re-enter, use a correction box, or correct problems during correction
Speech Recognition: Continuous Speech Recognition
- [Image slides]
Speech Recognition: Continuous Speech Recognition
- Discussion of errors
  - Inline correction is preferred by users regardless of modality
  - Proofreading saw increased usage with speech because of unpredictable system errors
  - Keyboard-mouse correction involved deleting and re-entering the word
  - Despite the ability to correct inline with speech, errors typically occurred during correction
  - Dialog boxes were used as a last resort
Speech Recognition: Continuous Speech Recognition
- Discussion of results
  - Users still do not feel that they can be productive using a speech interface for continuous recognition
  - More studies must be conducted to improve the speech interface for users
Project Implementation
- Write a CAVE application using YG
  - 3D objects simulate sound-producing objects (see the state sketch after this slide)
  - No speech recognition will occur; predefined text will be attached to each object
  - Objects will move in space
  - Objects will not always produce sound
  - Objects may not be in the line of sight
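A minimal sketch of the per-object state this implies: a predefined caption standing in for recognized speech, a movement pattern, and a sound source that can be silent. All names are assumptions about the eventual YG implementation, not actual YG APIs.

#include <cstdio>
#include <string>

struct Vec3 { float x, y, z; };

struct SoundObject {
    std::string caption;    // predefined text, shown instead of recognized speech
    Vec3 position;          // objects move in space
    Vec3 velocity;
    bool emittingSound;     // objects do not always produce sound
    float volume;           // amplitude, later mapped to cue size
    float pitchHz;          // frequency, later mapped to cue color

    void update(float dt) { // simplest possible movement pattern: linear drift
        position.x += velocity.x * dt;
        position.y += velocity.y * dt;
        position.z += velocity.z * dt;
    }
};

int main() {
    SoundObject train{"train approaching", {0, 0, -10}, {0, 0, 1}, true, 0.8f, 200.0f};
    train.update(1.0f / 60.0f);                 // one 60 Hz frame
    std::printf("%s at z = %.3f\n", train.caption.c_str(), train.position.z);
}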
Project Implementation
- Write a CAVE application using YG
- Sound location (sketched after this slide)
  - Show directional vectors for each object that emits a sound
  - The longer the vector, the farther away the object is from the user
  - X and Y will use arrowheads; Z will use a dot / "X" symbol
  - The dot is for an object behind the user; the "X" symbol is for an object in front of the user
  - Only visible if the sound can be heard by the user
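A minimal sketch of that cue logic: the user-to-object vector scaled so longer means farther, with the Z component collapsed to a dot (behind) or an "X" (in front). The scale factor and the assumption that -Z points in front of the user (as in OpenGL) are illustrative choices.

#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

// Draw a directional cue for one sounding object; cues for inaudible sounds
// are suppressed, matching the "only visible if heard" rule.
void drawSoundCue(Vec3 user, Vec3 object, bool audible) {
    if (!audible) return;
    Vec3 d{object.x - user.x, object.y - user.y, object.z - user.z};
    float dist = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    float arrowLen = 0.1f * dist;               // assumed scale: longer = farther
    // X and Y render as arrowheads; Z collapses to a depth symbol,
    // assuming -Z is in front of the user.
    const char* zSymbol = (d.z > 0.0f) ? "." : "X";
    std::printf("arrows (%.2f, %.2f), length %.2f, depth symbol %s\n",
                d.x, d.y, arrowLen, zSymbol);
}

int main() {
    drawSoundCue({0, 0, 0}, {2, 1, -4}, true);  // audible object ahead and right
}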
Project Implementation
- Write a CAVE application using YG
- Sound properties (sketched after this slide)
  - Represented using a square
  - Size represents volume/amplitude (probably will not consider the effect of distance on volume)
  - Color represents pitch/frequency
  - Only visible if the sound can be heard by the user
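A minimal sketch of the square's two mappings. The size range and the blue-to-red color ramp are illustrative assumptions; the slide only fixes that size tracks volume and color tracks pitch.

#include <algorithm>
#include <cstdio>

struct Rgb { float r, g, b; };

// Edge length grows with volume; assumed range keeps even quiet sounds visible.
float squareSize(float volume) {
    return 0.1f + 0.9f * std::clamp(volume, 0.0f, 1.0f);
}

// Assumed ramp: low frequencies blue, high frequencies red.
Rgb squareColor(float pitchHz, float minHz = 100.0f, float maxHz = 4000.0f) {
    float t = std::clamp((pitchHz - minHz) / (maxHz - minHz), 0.0f, 1.0f);
    return {t, 0.0f, 1.0f - t};
}

int main() {
    Rgb c = squareColor(880.0f);
    std::printf("size %.2f, color (%.2f, %.2f, %.2f)\n",
                squareSize(0.5f), c.r, c.g, c.b);
}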
Project Implementation
- Write a CAVE application using YG
- Simulate the cocktail party effect (filter sketched after this slide)
  - Allow the user to enlarge text from an object that is far away
  - Provide a configuration section to ignore certain sound properties
    - Volume/amplitude
    - Pitch/frequency
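A minimal sketch of that configuration: the user sets ranges for volume and pitch, and can switch either property off entirely so it is ignored when deciding which sources to hear. Field names and defaults are assumptions for illustration.

#include <cstdio>

struct ListenFilter {
    bool  useVolume  = true;     // set false to ignore volume/amplitude
    bool  usePitch   = true;     // set false to ignore pitch/frequency
    float minVolume  = 0.2f;
    float minPitchHz = 100.0f;
    float maxPitchHz = 2000.0f;

    // A source is heard only if it passes every property still switched on.
    bool passes(float volume, float pitchHz) const {
        if (useVolume && volume < minVolume) return false;
        if (usePitch && (pitchHz < minPitchHz || pitchHz > maxPitchHz)) return false;
        return true;
    }
};

int main() {
    ListenFilter f;
    f.usePitch = false;          // user chose to ignore pitch entirely
    std::printf("%s\n", f.passes(0.5f, 5000.0f) ? "heard" : "filtered out");
}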
Project Tasks Completed
- Basic project design
- Have read some documentation about YG
- Tested functionality of YG in my account
- Established contacts with people who have programmed CAVE applications using YG
  - They will provide 3D models and code demonstrating some YG features upon request
  - They will help with answering questions and with demonstrating and explaining YG features
Project Timeline
- Week of March 25
  - Practice modifying existing YG programs
  - Collect needed 3D models for the program
- Week of April 1
  - Code objects and their accompanying text
  - Implement movement patterns for objects
Project Timeline
- Week of April 8
  - Attempt to turn the sound of objects on and off
  - Work with the interaction properties of objects that will determine how sound properties are visualized
- Week of April 15
  - Continue working on visualizing sound properties
  - Work on enlarging/reducing the text of an object
Project Timeline
- Week of April 22
  - Create simple sound filtering menus
  - Test the program in the CAVE
- Week of April 29 (exam week)
  - Practice presentation
  - Present project
Bibliography
- Behringer, R., Chen, S., Sundareswaran, V., Wang, K., and Vassiliou, M. (1999). A Novel Interface for Device Diagnostics Using Speech Recognition, Augmented Reality Visualization, and 3D Audio Auralization, in Proceedings of the IEEE International Conference on Multimedia Computing and Systems, Vol. I, Institute of Electrical and Electronics Engineers, Inc., 427-432.
- Goose, S. and Moller, C. (1999). A 3D Audio-Only Interactive Web Browser: Using Spatialization to Convey Hypermedia Document Structure, in Proceedings of the Seventh ACM International Conference on Multimedia (Orlando, FL, October 1999), ACM Press, 363-371.
Bibliography
- Hollerer, T., Feiner, S., and Pavlik, J. (1999). Situated Documentaries: Embedding Multimedia Presentations in the Real World, in Proceedings of the 3rd International Symposium on Wearable Computers (San Francisco, CA, October 1999), Institute of Electrical and Electronics Engineers, Inc., 1-8.
- Karat, C.-M., Halverson, C., Horn, D., and Karat, J. (1999). Patterns of Entry and Correction in Large Vocabulary Continuous Speech Recognition Systems, in CHI '99: Proceedings of the CHI 99 Conference on Human Factors in Computing Systems: The CHI Is the Limit (Pittsburgh, PA, May 1999), ACM Press, 568-575.
Bibliography
- Lumbreras, M. and Sanchez, J. (1999). Interactive 3D Sound Hyperstories for Blind Children, in CHI '99: Proceedings of the CHI 99 Conference on Human Factors in Computing Systems: The CHI Is the Limit (Pittsburgh, PA, May 1999), ACM Press, 318-325.
- Robertson, J., Wong, W. Y., Chung, C., and Kim, D. K. (1998). Automatic Speech Recognition for Generalised Time Based Media Retrieval and Indexing, in Proceedings of the Sixth ACM International Conference on Multimedia (Bristol, UK, September 1998), ACM Press, 241-246.