Title: AudioSense: A Simulation
AudioSense: A Simulation
- Progress Report
- EECS 578
- Allan Spale
Background of Concept
- Taking the train home and listening to the sounds around me
- How would deaf people be able to perceive the environment?
- What assistance would be useful in helping people adapt to the environment?
Project Goals
- Develop a CAVE application that will simulate aspects of audio perception
- Display the text of speaking objects in space
- Display the description text of non-speaking objects in space
- Display visual cues of multiple sound sources
- Allow the user to selectively listen to different sound sources
Topics in the Project
- Augmented reality
  - Illustrated by objects in a virtual environment
- 3D sound
  - Simulated by an object's interaction property
- Speech recognition
  - Simulated by text near the object
  - Text will remain static during the simulation
- Virtual reality / CAVE
  - Method for presenting the project
  - Not discussed in this presentation
Augmented Reality
- Definition
  - "...provides means of intuitive information presentation for enhancing situational awareness and perception by exploiting the natural and familiar human interaction modalities with the environment." -- Behringer et al. 1999
Augmented Reality: Device Diagnostics
- Architecture components aid in performing diagnostic tests
- Computer vision used to track the object in space
- Speech recognition (command-style) used for the user interface
- 3D graphics (wireframe and shaded objects) illustrate an object's internal structure
- 3D audio emitted from an item allows the user to find its location within the object
Augmented Reality
- [Image slides]
Augmented Reality: Device Diagnostics
- Summary
  - Providing 3D graphics and sound helps the user better diagnose items
  - Might also want text information on the display
  - Tracking methodology still needs improvement
  - Speech recognition of commands could be expanded to include annotation
  - Utilize an IP connection to distribute computing power from the wearable computer
Augmented Reality: Multimedia Presentations in the Real World
- Mobile Augmented Reality System (MARS)
  - Tracking performed by the Global Positioning System (GPS) and another device
  - Display is see-through and head-mounted
  - Interaction based on location and gaze
  - Additional interaction provided by a hand-held device
Augmented Reality: Multimedia Presentations in the Real World
- System overview
  - Selection occurs through proximity or gaze direction, followed by a menu system (sketched after this slide)
- Information presentation
  - Video (on the hand-held device) or images accompanied by narration (on the head-mounted display)
  - Virtual reality (for places that cannot be visited)
  - Augmented reality (illustrates where items were)
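A minimal sketch of the two selection triggers described above: proximity (the user is near an item) and gaze (the item falls within a viewing cone). All types, thresholds, and function names are illustrative assumptions, not the MARS implementation.

#include <cmath>
#include <cstdio>

struct Vec3 { double x, y, z; };

static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static double len(Vec3 a)         { return std::sqrt(dot(a, a)); }

// Proximity trigger: the item lies within an assumed radius of the user.
bool selectedByProximity(Vec3 user, Vec3 item, double radius) {
    Vec3 d{item.x - user.x, item.y - user.y, item.z - user.z};
    return len(d) <= radius;
}

// Gaze trigger: the item lies within an assumed cone around the gaze direction.
bool selectedByGaze(Vec3 user, Vec3 gazeDir, Vec3 item, double coneDeg) {
    const double kPi = 3.14159265358979323846;
    Vec3 d{item.x - user.x, item.y - user.y, item.z - user.z};
    double cosAngle = dot(gazeDir, d) / (len(gazeDir) * len(d));
    return cosAngle >= std::cos(coneDeg * kPi / 180.0);
}

int main() {
    Vec3 user{0, 0, 0}, gaze{0, 0, -1}, item{0.5, 0, -10};
    std::printf("proximity: %d, gaze: %d\n",
                selectedByProximity(user, item, 5.0),
                selectedByGaze(user, gaze, item, 10.0));
}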
Augmented Reality: Multimedia Presentations in the Real World
- [Image slides]
Augmented Reality: Multimedia Presentations in the Real World
- Conclusions
  - Current system is too heavy and visually undesirable
  - Might want to make the hand-held display a palm-top computer
  - Permit authoring of content
  - Create a collaboration between indoor and outdoor system users
3D Sound: Audio-Only Web Browsing
- Must overcome difficulties with utilizing 3D sound
  - Sounds along the X axis are identifiable; sounds along the Y and Z axes are not
- A need exists to create structure in audio-rendered web pages
  - Document reading appears spatially from left to right in an adequate amount of time (sketched after this slide)
  - Utilize earcons and selective listening
  - Provide meta-content for a quick document overview
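A minimal sketch of that left-to-right spatial reading, spreading document elements along the X axis only, since that was the one reliably identifiable axis. The element names, arc width, and coordinate convention are illustrative assumptions, not the browser's actual layout.

#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

struct PageElement { std::string text; };

// Map element i of n to an azimuth across an assumed 120-degree frontal arc,
// left to right, then to a unit-distance (x, z) position for a 3D sound source.
void placeElements(const std::vector<PageElement>& elems) {
    const double kPi = 3.14159265358979323846;
    const double arcDeg = 120.0;                 // assumed frontal arc
    const int n = static_cast<int>(elems.size());
    for (int i = 0; i < n; ++i) {
        double az = (n > 1) ? -arcDeg / 2 + arcDeg * i / (n - 1) : 0.0;
        double x = std::sin(az * kPi / 180.0);   // left-right: the reliable cue
        double z = -std::cos(az * kPi / 180.0);  // in front of the listener
        std::printf("\"%s\" -> azimuth %+.1f deg, position (%.2f, 0.00, %.2f)\n",
                    elems[i].text.c_str(), az, x, z);
    }
}

int main() {
    placeElements({{"heading"}, {"paragraph"}, {"link list"}});
}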
3D Sound
- [Image slide]
3D Sound: Audio-Only Web Browsing
- Future work
  - Improve link information so it extends beyond web page title and time duration
- Benefits of auditory browsing aids
  - Improved comprehension
  - Better browsing experience for visually impaired and sighted users
3D Sound: Interactive 3D Sound Hyperstories
- Hyperstories
  - Story occurring in a hypermedia context
  - Forms a nested context model
  - World objects can be passive, active, static, or dynamic
3D Sound: Interactive 3D Sound Hyperstories
- AudioDoom
  - Like the computer game Doom, but different
  - All world objects represented with sound
  - Sound represented in a volume almost parallel to the user's eyes
  - User interacts with world objects using an ultrasonic joystick with haptic functionality
  - Organized by partitioned spaces
3D Sound: Interactive 3D Sound Hyperstories
- [Image slides]
3D Sound: Interactive 3D Sound Hyperstories
- Despite elapsed time between sessions, users remembered the world structure well
- Authors illustrate "the possibility of rendering a spatial navigable structure by using only spatialized sound"
- Opens possibilities for educational software for the blind within the hyperstory context
Speech Recognition: Media Retrieval and Indexing
- Problems with media retrieval and indexing
  - Lots of media being generated; too costly and time-consuming to index manually
- Ideal system design
  - Speaker independence
  - Capability in noisy recording environments
  - Open vocabulary
Speech Recognition: Media Retrieval and Indexing
- Using Hidden Markov Models, the system achieved the results in Table 1
- To improve the results, string matching techniques help overcome errors in the recognition stream
Speech Recognition: Media Retrieval and Indexing
- String matching strategy (final steps sketched after this slide)
  - Develop the search term
  - Divide the recognition stream into a set of sub-strings
  - Implement an initial filter process
  - Identify edit operations for the remaining sub-strings in the recognition stream
  - Calculate the similarity measure for the search term and matched strings
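A minimal sketch of the last two steps: counting edit operations (Levenshtein distance) between the search term and a candidate sub-string, then converting the count into a similarity measure. The length-based normalization is an illustrative assumption, not the measure from the paper.

#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Count the insertions, deletions, and substitutions needed to turn a into b.
int editDistance(const std::string& a, const std::string& b) {
    std::vector<std::vector<int>> d(a.size() + 1, std::vector<int>(b.size() + 1));
    for (size_t i = 0; i <= a.size(); ++i) d[i][0] = static_cast<int>(i);
    for (size_t j = 0; j <= b.size(); ++j) d[0][j] = static_cast<int>(j);
    for (size_t i = 1; i <= a.size(); ++i)
        for (size_t j = 1; j <= b.size(); ++j)
            d[i][j] = std::min({d[i - 1][j] + 1,                            // deletion
                                d[i][j - 1] + 1,                            // insertion
                                d[i - 1][j - 1] + (a[i - 1] != b[j - 1])}); // substitution
    return d[a.size()][b.size()];
}

// Similarity in [0, 1]: 1 is an exact match; assumed normalization by length.
double similarity(const std::string& term, const std::string& candidate) {
    size_t longest = std::max(term.size(), candidate.size());
    if (longest == 0) return 1.0;
    return 1.0 - static_cast<double>(editDistance(term, candidate)) / longest;
}

int main() {
    // A classic mis-recognition pair, for illustration only.
    std::printf("%.2f\n", similarity("recognize speech", "wreck a nice beach"));
}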
Speech Recognition: Media Retrieval and Indexing
- [Image slide]
Speech Recognition: Media Retrieval and Indexing
- Results of implementing the string matching strategy
  - Permitting more edit operations improved recall but degraded precision
  - Despite low performance rates, a system performing these tasks would be commercially viable
Speech Recognition: Continuous Speech Recognition
- Problems with continuous speech recognition
  - Produces unpredictable errors, unlike other, more predictable user input errors
  - The absence of context aids makes recognition difficult for the computer
  - Speech user interfaces are still in a developmental stage and will improve over time
Speech Recognition: Continuous Speech Recognition
- Two modes
  - Keyboard-mouse and speech
- Two tasks
  - Composition and transcription
- Results
  - Keyboard-mouse tasks were faster and more efficient than speech tasks
Speech Recognition: Continuous Speech Recognition
- Correction methods
  - Two general correction methods: inline correction and separate proofreading
  - Speech inline correction methods: select text and re-enter, delete text and re-enter, use a correction box, or correct problems during correction
Speech Recognition: Continuous Speech Recognition
- [Image slides]
Speech Recognition: Continuous Speech Recognition
- Discussion of errors
  - Inline correction is preferred by users regardless of modality
  - Proofreading saw increased usage with speech because of unpredictable system errors
  - Keyboard-mouse correction involved deleting and re-entering the word
  - Despite the ability to correct inline with speech, errors typically occurred during correction
  - Dialog boxes were used as a last resort
Speech Recognition: Continuous Speech Recognition
- Discussion of results
  - Users still do not feel that they can be productive using a speech interface for continuous recognition
  - More studies must be conducted to improve the speech interface for users
Project Implementation
- Write a CAVE application using YG
  - 3D objects simulate sound-producing objects (see the state sketch after this slide)
  - No speech recognition will occur; predefined text will be attached to each object
  - Objects will move in space
  - Objects will not always produce sound
  - Objects may not be in the line of sight
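A minimal sketch of the per-object state this implies: a predefined caption standing in for recognized speech, a movement pattern, and a sound source that can be silent. All names are assumptions about the eventual YG implementation, not actual YG APIs.

#include <cstdio>
#include <string>

struct Vec3 { float x, y, z; };

struct SoundObject {
    std::string caption;    // predefined text, shown instead of recognized speech
    Vec3 position;          // objects move in space
    Vec3 velocity;
    bool emittingSound;     // objects do not always produce sound
    float volume;           // amplitude, later mapped to cue size
    float pitchHz;          // frequency, later mapped to cue color

    void update(float dt) { // simplest possible movement pattern: linear drift
        position.x += velocity.x * dt;
        position.y += velocity.y * dt;
        position.z += velocity.z * dt;
    }
};

int main() {
    SoundObject train{"train approaching", {0, 0, -10}, {0, 0, 1}, true, 0.8f, 200.0f};
    train.update(1.0f / 60.0f);                 // one 60 Hz frame
    std::printf("%s at z = %.3f\n", train.caption.c_str(), train.position.z);
}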
Project Implementation
- Write a CAVE application using YG
- Sound location (sketched after this slide)
  - Show directional vectors for each object that emits a sound
  - The longer the vector, the farther away the object is from the user
  - X and Y will use arrowheads; Z will use a dot / "X" symbol
  - The dot is for an object behind the user; the "X" symbol is for an object in front of the user
  - Only visible if the sound can be heard by the user
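A minimal sketch of that cue logic: the user-to-object vector scaled so longer means farther, with the Z component collapsed to a dot (behind) or an "X" (in front). The scale factor and the assumption that -Z points in front of the user (as in OpenGL) are illustrative choices.

#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

// Draw a directional cue for one sounding object; cues for inaudible sounds
// are suppressed, matching the "only visible if heard" rule.
void drawSoundCue(Vec3 user, Vec3 object, bool audible) {
    if (!audible) return;
    Vec3 d{object.x - user.x, object.y - user.y, object.z - user.z};
    float dist = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    float arrowLen = 0.1f * dist;               // assumed scale: longer = farther
    // X and Y render as arrowheads; Z collapses to a depth symbol,
    // assuming -Z is in front of the user.
    const char* zSymbol = (d.z > 0.0f) ? "." : "X";
    std::printf("arrows (%.2f, %.2f), length %.2f, depth symbol %s\n",
                d.x, d.y, arrowLen, zSymbol);
}

int main() {
    drawSoundCue({0, 0, 0}, {2, 1, -4}, true);  // audible object ahead and right
}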
Project Implementation
- Write a CAVE application using YG
- Sound properties (sketched after this slide)
  - Represented using a square
  - Size represents volume/amplitude (probably will not consider the effect of distance on volume)
  - Color represents pitch/frequency
  - Only visible if the sound can be heard by the user
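A minimal sketch of the square's two mappings. The size range and the blue-to-red color ramp are illustrative assumptions; the slide only fixes that size tracks volume and color tracks pitch.

#include <algorithm>
#include <cstdio>

struct Rgb { float r, g, b; };

// Edge length grows with volume; assumed range keeps even quiet sounds visible.
float squareSize(float volume) {
    return 0.1f + 0.9f * std::clamp(volume, 0.0f, 1.0f);
}

// Assumed ramp: low frequencies blue, high frequencies red.
Rgb squareColor(float pitchHz, float minHz = 100.0f, float maxHz = 4000.0f) {
    float t = std::clamp((pitchHz - minHz) / (maxHz - minHz), 0.0f, 1.0f);
    return {t, 0.0f, 1.0f - t};
}

int main() {
    Rgb c = squareColor(880.0f);
    std::printf("size %.2f, color (%.2f, %.2f, %.2f)\n",
                squareSize(0.5f), c.r, c.g, c.b);
}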
Project Implementation
- Write a CAVE application using YG
- Simulate the cocktail party effect (filter sketched after this slide)
  - Allow the user to enlarge text from an object that is far away
  - Provide a configuration section to ignore certain sound properties
    - Volume/amplitude
    - Pitch/frequency
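A minimal sketch of that configuration: the user sets ranges for volume and pitch, and can switch either property off entirely so it is ignored when deciding which sources to hear. Field names and defaults are assumptions for illustration.

#include <cstdio>

struct ListenFilter {
    bool  useVolume  = true;     // set false to ignore volume/amplitude
    bool  usePitch   = true;     // set false to ignore pitch/frequency
    float minVolume  = 0.2f;
    float minPitchHz = 100.0f;
    float maxPitchHz = 2000.0f;

    // A source is heard only if it passes every property still switched on.
    bool passes(float volume, float pitchHz) const {
        if (useVolume && volume < minVolume) return false;
        if (usePitch && (pitchHz < minPitchHz || pitchHz > maxPitchHz)) return false;
        return true;
    }
};

int main() {
    ListenFilter f;
    f.usePitch = false;          // user chose to ignore pitch entirely
    std::printf("%s\n", f.passes(0.5f, 5000.0f) ? "heard" : "filtered out");
}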
Project Tasks Completed
- Basic project design
- Have read some documentation about YG
- Tested functionality of YG in my account
- Established contacts with people who have programmed CAVE applications using YG
  - They will provide 3D models and code demonstrating some YG features upon request
  - They will help with answering questions and with demonstrating and explaining YG features
Project Timeline
- Week of March 25
  - Practice modifying existing YG programs
  - Collect needed 3D models for the program
- Week of April 1
  - Code objects and their accompanying text
  - Implement movement patterns for objects
Project Timeline
- Week of April 8
  - Attempt to turn the sound of objects on and off
  - Work with the interaction properties of objects that will determine how sound properties are visualized
- Week of April 15
  - Continue working on visualizing sound properties
  - Work on enlarging/reducing the text of an object
Project Timeline
- Week of April 22
  - Create simple sound filtering menus
  - Test the program in the CAVE
- Week of April 29 (exam week)
  - Practice presentation
  - Present project
Bibliography
- Behringer, R., Chen, S., Sundareswaran, V., Wang, K., and Vassiliou, M. (1999). A Novel Interface for Device Diagnostics Using Speech Recognition, Augmented Reality Visualization, and 3D Audio Auralization, in Proceedings of the IEEE International Conference on Multimedia Computing and Systems, Vol. I, Institute of Electrical and Electronics Engineers, Inc., 427-432.
- Goose, S. and Moller, C. (1999). A 3D Audio-Only Interactive Web Browser: Using Spatialization to Convey Hypermedia Document Structure, in Proceedings of the Seventh ACM International Conference on Multimedia (Orlando, FL, October 1999), ACM Press, 363-371.
Bibliography
- Hollerer, T., Feiner, S., and Pavlik, J. (1999). Situated Documentaries: Embedding Multimedia Presentations in the Real World, in Proceedings of the 3rd International Symposium on Wearable Computers (San Francisco, CA, October 1999), Institute of Electrical and Electronics Engineers, Inc., 1-8.
- Karat, C.-M., Halverson, C., Horn, D., and Karat, J. (1999). Patterns of Entry and Correction in Large Vocabulary Continuous Speech Recognition Systems, in CHI '99: Proceedings of the CHI 99 Conference on Human Factors in Computing Systems: The CHI Is the Limit (Pittsburgh, PA, May 1999), ACM Press, 568-575.
Bibliography
- Lumbreras, M. and Sanchez, J. (1999). Interactive 3D Sound Hyperstories for Blind Children, in CHI '99: Proceedings of the CHI 99 Conference on Human Factors in Computing Systems: The CHI Is the Limit (Pittsburgh, PA, May 1999), ACM Press, 318-325.
- Robertson, J., Wong, W. Y., Chung, C., and Kim, D. K. (1998). Automatic Speech Recognition for Generalised Time Based Media Retrieval and Indexing, in Proceedings of the Sixth ACM International Conference on Multimedia (Bristol, UK, September 1998), ACM Press, 241-246.