Title: Interactive Audio
1. Interactive Audio
- Sound, Waves, the Ear
- 3D audio
2. Overview
- Fundamentals of Sound
- Psychoacoustics
- Interactive Audio
- Applications
3. What is sound?
- Sound is the sensation perceived by the sense of hearing
- Audio is acoustic, mechanical, or electrical frequencies corresponding to normally audible sound waves
4. Dual Nature of Sound
- Transfer of sound and physical stimulation of the ear
- Physiological and psychological processing in the ear and brain (psychoacoustics)
5. Transmission of Sound
- Requires a medium with elasticity and inertia (air, water, steel, etc.)
- Movements of air molecules result in the propagation of a sound wave
6. Particle Motion
7. Longitudinal Motion of Air
8. Wavefronts and Rays
9. Reflection of Sound
10. Absorption of Sound
- Some materials readily absorb the energy of a sound wave
- Example: carpet, or the curtains at a movie theater
11. Refraction of Sound
12. Refraction of Sound
13. Diffusion of Sound
- Not analogous to diffusion of light
- Naturally occurring diffusion of sound typically affects only a small subset of audible frequencies
- Nearly full diffusion of sound requires a reflection phase grating (Schroeder Diffuser)
14. The Inverse-Square Law (Attenuation)
- I = W / (4πr²), computed in the sketch below
- I is the sound intensity in W/cm²
- W is the sound power of the source in W
- r is the distance from the source in cm
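A minimal C sketch of the relation above; the function name is illustrative.
```c
/* Sound intensity from the inverse-square law, I = W / (4 * pi * r^2):
   power in watts, distance in centimeters, result in W/cm^2. */
double sound_intensity(double power_w, double distance_cm)
{
    const double pi = 3.14159265358979323846;
    return power_w / (4.0 * pi * distance_cm * distance_cm);
}
```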
15. Psychoacoustics
- Physiological Interactions with audio
- Psychological processing
16. Ear Anatomy
17. Idealized Ear
18. Mechanical Model of Middle Ear
19. The Skull
- Occludes wavelengths that are small relative to the skull
- Causes diffraction around the head (helps amplify sounds)
- Wavelengths much larger than the skull are not affected (which explains why low frequencies are not directional)
20. The Pinna
21. The Pinna
- Directs sound into the ear
- Provides cues which indicate sound direction
22. Importance of the Pinna
23. Ear Canal
- About 0.7 cm in diameter and 3 cm long
- Amplifies sound at its quarter-wavelength resonant frequency (~3 kHz)
24. Ear Canal and Skull
- (A) Dark line: ear canal only
- (B) Dashed line: ear canal and skull diffraction
25. Middle Ear
- Eardrum vibrates from sound pressure changes
- Ossicles transfer the vibration to the oval window
- The impedance difference between air and the inner-ear fluid is matched by the ratio of the eardrum's surface area to the oval window's surface area
26. Inner Ear
27. The Cochlea
- Mechanical-to-electrical transducer
- Frequency-selective analyzer
- The tectorial and basilar membranes rub together to stimulate the hair cells
28. Place Theory
29. Place Theory
- The position of maximum vibration of the basilar membrane corresponds to the perceived pitch of pure tones
- Each hair cell and each nerve fiber has very sharp bandpass characteristics
30. Auditory Area (20 Hz - 20 kHz)
31. Spatial Hearing
- Ability to determine direction and distance from a sound source
- Not a fully understood process
- However, some cues have been identified as useful
32. The Duplex Theory of Localization
- Interaural Intensity Differences (IIDs)
- Interaural Arrival-Time Differences (ITDs)
33. Interaural Intensity Difference
- The skull produces a sound shadow
- The intensity difference results from one ear being shadowed and the other not
- The IID does not apply to frequencies below 1000 Hz (wavelengths similar to or larger than the size of the head)
- Sound shadowing can result in drops of up to 20 dB for frequencies above 6000 Hz
- The Inverse-Square Law can also affect intensity
34. Interaural Intensity Difference
35. Interaural Arrival-Time Difference
- Perception of a phase difference between the ears caused by an arrival-time delay (ITD)
- The ear closest to the sound source hears the sound before the other ear
36. Interaural Arrival-Time Difference
37. Cones of Confusion
- Binaural difference cues (IIDs and ITDs) result in a locus of points for which the measurements will be the same
- This results in ambiguity in the determination of sound source position
38. Cones of Confusion
39. How do humans resolve the Cones of Confusion problem?
- Cues used for localization are embodied in the free-field to the eardrum
- The free-field is affected by sound shadowing from the head and torso as well as diffraction from the pinna
40. Pinna's Effect on the Free-field
41. Head-related Transfer Function (HRTF)
- The acoustic transfer function between a point in space and the eardrum of the listener
- Encompasses all free-field effects
42. HRTF Effect on IID
43. Monaural and Dynamic Cues
- Spectral cues
- Distance cues
- Direct-to-reverberant energy ratio
- High-to-low frequency energy ratio
- Head rotation or tilt
44. Spectral Cues
- Comparison of a known source spectrum with the received spectrum
- If the spectrum is not known, cues can still be obtained by assuming the spectrum is locally flat (or of constant slope)
45. Pinna's Effect on Spectrum
46. Distance Cues
- Variation of signal level with distance (attenuation)
- Useful only with regard to changes in distance, or if the sound source has a known signal level
47. Direct-to-Reverberant Energy Ratio
- Results from the observation that the reverberation level is constant over position in an enclosed space
- But the direct sound energy level decreases with increasing source-to-listener distance
48. High-to-Low Frequency Energy Ratio
- Observation that air attenuates high frequencies
more rapidly than low frequencies over distance
49. Head Rotation or Tilt
- Rotation or tilt can alter the interaural spectrum in a predictable manner
- Can resolve positional ambiguities on a cone of confusion
50. The Haas (or Precedence) Effect
- The perceptual weighting of the binaural cues of the first-arriving sound over reflections of the same sound
- Generally reveals the true location of the sound source while filtering out contradictory reflections
- Hypothesized to be important from an evolutionary standpoint
51. Interactive Audio
- Virtual Sound Space
- Facilitate the perception of monaural, binaural, and dynamic cues within the virtual environment
- Model the virtual sound space in real time
52. Digital Recording
- Audio can be digitized and processed on a computer
- Digital formats have frequency and dynamic range limitations
53. Digital Problems
- Current formats do not completely cover the full frequency range of human hearing (especially low frequencies)
- Representing 0-120 dB would require too many bits!
54. Review
- Distance cues (attenuation)
- Direct-to-reverberant energy ratio
- High-to-low frequency energy ratio
- Doppler Effect
- IID, ITD
- Spectral cues (effects from head, pinna)
55. Attenuation
- Inverse-Square Law
- Overkill: sounds fall off too fast
- Solution: add an ambient term (just like in graphics)
56. Static Attenuation
- Set sample volume based on distance (see the sketch below)
- Volume level is only calculated at the beginning of the sample
- Low CPU usage
- Best for short-duration samples
- Bad for long-duration samples
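A minimal C sketch of static attenuation under these assumptions; the engine_* hooks and the falloff constants are hypothetical placeholders, not a real API.
```c
/* Hypothetical engine hooks -- names are illustrative. */
extern void engine_set_sample_volume(int sample_id, float gain);
extern void engine_play_sample(int sample_id);

/* Static attenuation: the distance-based gain is computed once, when the
   sample starts, and never updated afterwards. */
void play_sample_static(int sample_id, float distance, float min_gain)
{
    /* Inverse-square style falloff plus an ambient floor, as suggested above. */
    float gain = 1.0f / (1.0f + distance * distance);
    if (gain < min_gain)
        gain = min_gain;

    engine_set_sample_volume(sample_id, gain);
    engine_play_sample(sample_id);
}
```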
57. Dynamic Attenuation
- Sample volume based on distance
- Volume level recalculated every frame (see the sketch below)
- Good for long-duration samples
- Higher CPU usage (3 multiplies every frame, per sample)
- Temporal aliasing
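A minimal C sketch of the per-frame recalculation; engine_set_sample_volume and the Sound struct are hypothetical.
```c
extern void engine_set_sample_volume(int sample_id, float gain);

typedef struct {
    int   sample_id;
    float x, y, z;              /* sound source position */
} Sound;

/* Dynamic attenuation: recompute the distance-based gain every frame. */
void update_sound_volume(const Sound *s,
                         float lx, float ly, float lz,   /* listener position */
                         float min_gain)
{
    float dx = s->x - lx, dy = s->y - ly, dz = s->z - lz;
    float dist2 = dx * dx + dy * dy + dz * dz;   /* the three multiplies per frame */

    float gain = 1.0f / (1.0f + dist2);
    if (gain < min_gain)
        gain = min_gain;

    engine_set_sample_volume(s->sample_id, gain);
}
```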
58. Temporal Aliasing
- You will hear "stair stepping" (discrete volume levels) as the attenuation is recalculated
- Solution: increase the update rate
- Rule of thumb: at least 20 Hz (twice as much as needed for VR graphics)
- Some cues are more susceptible than others
59. Stereo Attenuation
- Set sample volume based on distance, per channel (left and right); see the sketch below
- Even higher CPU usage (3 multiplies plus trigonometry per frame, per sample)
- Gross approximation of IID
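A minimal C sketch of per-channel stereo attenuation: distance falloff plus a constant-power pan derived from the source azimuth. The engine hook is hypothetical, and this is only the gross IID approximation described above.
```c
#include <math.h>

extern void engine_set_channel_volumes(int sample_id, float left, float right);

/* Stereo attenuation: distance gain plus a constant-power left/right pan,
   using listener-relative coordinates (x to the right, z straight ahead). */
void update_stereo_volume(int sample_id, float src_x, float src_z, float min_gain)
{
    const float half_pi = 1.57079632679f;

    float dist2 = src_x * src_x + src_z * src_z;
    float gain  = 1.0f / (1.0f + dist2);
    if (gain < min_gain)
        gain = min_gain;

    /* Azimuth: 0 straight ahead, +pi/2 fully to the right. */
    float azimuth = atan2f(src_x, src_z);
    float pan     = 0.5f * (1.0f + sinf(azimuth));   /* 0 = hard left, 1 = hard right */

    engine_set_channel_volumes(sample_id,
                               gain * cosf(pan * half_pi),    /* left  */
                               gain * sinf(pan * half_pi));   /* right */
}
```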
60. Stereo Attenuation
61. Multiple Channel Audio
- More than 2 speakers
- Typically oriented in a horizontal plane around the user
- Usually 4 or 5 directional speakers (Surround Sound or Dolby Digital)
- Good for directional cues
- Expensive to calculate (probably needs hardware support, especially for Surround Sound or Dolby Digital)
62. Stereo Extenders
- Processing techniques for increasing the stereo spread
- Processed after the stereo attenuation is calculated (usually by a DSP inside the speakers)
- Example: QSound
63. Stereo Extenders
64. Solution to the Dynamic Range Problem
- Assume that an individual sound will not have much dynamic range
- Scale the attenuation function to fit a minimum and maximum distance (see the sketch below)
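A minimal C sketch of attenuation rescaled to a minimum and maximum distance, loosely modeled on the min/max-distance convention common in 3D audio APIs; the exact falloff shape is an assumption.
```c
/* Distance attenuation limited to [min_dist, max_dist]: full volume at or
   inside min_dist, inverse-square falloff in between, and the quietest
   level held constant beyond max_dist. */
float ranged_attenuation(float distance, float min_dist, float max_dist)
{
    if (distance <= min_dist)
        return 1.0f;
    if (distance >= max_dist)
        distance = max_dist;

    float ratio = min_dist / distance;
    return ratio * ratio;        /* == 1 at min_dist, falls off with distance^2 */
}
```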
65. Solution to the Dynamic Range Problem
66. Sound Source With Limited Dynamic Range
67. Modeling Interaural Arrival-Time Difference
- Want to introduce a phase difference between the left and right ear
- PROBLEM: the left ear must hear only what was meant for the left ear (and the same for the right ear)!
68. How to control what the ears hear?
- Easy solution: headphones
- Hard solution: cross-talk cancellation
69. Headphone Solution
- Precise control of what each ear hears
- Good for VR (immersive)
- Not good for multi-user VR (CAVE)
- Cumbersome
- Need to track the user's head for proper HRTF calculations
- If using HRTFs, ear buds are ideal (they remove the effect of the pinna)
70. Cross-talk Cancellation
- The left speaker plays the left channel plus a cancellation signal for the right channel (and the same for the right speaker)
- Results in a sweet spot where the left ear hears only the left channel and the right ear hears only the right channel
71. Cross-talk Cancellation
72. Problems with Cross-Talk Cancellation
- The sweet spot makes it a single-user experience
- Implementation requires intimate knowledge of advanced calculus and Fourier analysis
- Speakers must be accurately placed and oriented
- Needs dedicated DSP hardware
73. Calculating ITD Effects
- Determine the distance from the sound source to each ear
- Simple physics determines the arrival time of the sound at each ear (see the sketch below)
- Heavy-duty math is required to smoothly interpolate phase changes
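A minimal C sketch of the arrival-time difference from the two source-to-ear distances; the ear positions are assumed to be given, and the smooth phase interpolation is not shown.
```c
#include <math.h>

#define SPEED_OF_SOUND_M_S 343.0f   /* in air, roughly room temperature */

/* ITD: distance to each ear divided by the speed of sound.  Positive when
   the sound reaches the right ear first. */
float interaural_time_difference(const float src[3],
                                 const float left_ear[3],
                                 const float right_ear[3])
{
    float dl = 0.0f, dr = 0.0f;
    for (int i = 0; i < 3; i++) {
        float l = src[i] - left_ear[i];
        float r = src[i] - right_ear[i];
        dl += l * l;
        dr += r * r;
    }
    return (sqrtf(dl) - sqrtf(dr)) / SPEED_OF_SOUND_M_S;
}
```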
74. Pinna, Head, and Shoulders
- Determine the HRTF from spectral analysis of the head-related impulse response (HRIR)
- Filter sounds by scaling intensities at each frequency (see the sketch below)
- Definitely need dedicated hardware
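The per-frequency scaling can equivalently be done as a time-domain convolution with the measured HRIR; here is a minimal C sketch for one ear, ignoring frequency-domain processing and interpolation between measured directions.
```c
/* Convolve the dry signal with one ear's HRIR (an FIR filter).  The output
   buffer is assumed to be zero-initialized and as long as the input. */
void apply_hrir(const float *input, int input_len,
                const float *hrir,  int hrir_len,
                float *output)
{
    for (int n = 0; n < input_len; n++) {
        float acc = 0.0f;
        for (int k = 0; k < hrir_len && k <= n; k++)
            acc += hrir[k] * input[n - k];
        output[n] = acc;
    }
}
```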
75. Determining the HRTF from the head-related impulse response (HRIR)
Microphone for recording HRIRs
76. HRIRs
77. HRIRs
78. Generic HRTF
- Use of an average HRIR to determine the HRTF
- Works fairly well for 80% of people
- Custom HRTFs are quite often impractical
79. Environmental Effects
- Obstruction/Occlusion
- Reverberation
- Doppler Shift
- Atmospheric Effects
80. Obstruction
- Same as sound shadowing
- Generally approximated by a ray test and a low-pass filter (see the sketch below)
- High frequencies should get shadowed while low frequencies diffract
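A minimal C sketch of that approximation; ray_blocked, engine_set_lowpass_cutoff, and the cutoff values are hypothetical stand-ins for whatever geometry test and filter the engine provides.
```c
/* Hypothetical hooks -- names are illustrative. */
extern int  ray_blocked(const float from[3], const float to[3]);
extern void engine_set_lowpass_cutoff(int sample_id, float cutoff_hz);

/* Obstruction: if the straight path from source to listener is blocked,
   lower the low-pass cutoff so high frequencies are shadowed while low
   (diffracting) frequencies still pass. */
void update_obstruction(int sample_id, const float src[3], const float listener[3])
{
    if (ray_blocked(src, listener))
        engine_set_lowpass_cutoff(sample_id, 800.0f);     /* shadowed, muffled */
    else
        engine_set_lowpass_cutoff(sample_id, 20000.0f);   /* effectively open */
}
```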
81. Obstruction
82. Occlusion
- A completely blocked sound
- Example: a sound that penetrates a closed door or a wall
- The sound will be muffled (low-pass filter)
83. Reverberation
- Effects from sound reflection
- Similar to echo
- Static reverberation
- Dynamic reverberation
84. Static Reverberation
- Relies on the closed-container assumption
- Parameters are used to specify approximate environment conditions (decay, room size, etc.); see the sketch below
- Example: Microsoft DirectSound3D with EAX
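A small C sketch of what such a parameter set might look like; the struct and its field names are hypothetical, only loosely inspired by the kind of controls EAX-style presets expose.
```c
/* Hypothetical static-reverberation preset. */
typedef struct {
    float decay_time_s;     /* how long reflections take to die away */
    float room_size;        /* 0..1, scales early-reflection delays */
    float wet_dry_mix;      /* 0 = dry only, 1 = reverb only */
    float damping;          /* 0..1, extra high-frequency decay */
} ReverbPreset;

/* Example preset for a large hall under the closed-container assumption. */
static const ReverbPreset large_hall = { 2.8f, 0.9f, 0.35f, 0.4f };
```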
85. Static Reverberation
86. Dynamic Reverberation
- Calculation of reflections off of surfaces, taking surface properties into account
- Diffusion and diffraction are typically ignored
- Wave Tracing
- Example: Aureal A3D 2.0, or the Beam Tracing paper
87. Dynamic Reverberation
88. Comparison
- Static reverberation: less expensive computationally, simple to implement
- Dynamic reverberation: very expensive computationally, difficult to implement, but potentially superior results
89. Doppler Shift
- Change in frequency due to velocity (see the sketch below)
- Very susceptible to temporal aliasing
- The faster the update rate, the better
- Requires dedicated hardware
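A minimal C sketch of the standard Doppler relation f' = f (c + v_listener) / (c - v_source), where each velocity is the component along the line between source and listener.
```c
#define SPEED_OF_SOUND 343.0f   /* m/s in air, roughly room temperature */

/* Pitch factor to apply to a playing sample.  Velocities are positive when
   the listener moves toward the source and the source moves toward the
   listener, respectively. */
float doppler_pitch_factor(float listener_vel_toward_source,
                           float source_vel_toward_listener)
{
    return (SPEED_OF_SOUND + listener_vel_toward_source) /
           (SPEED_OF_SOUND - source_vel_toward_listener);
}
```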
90. Atmospheric Effects
- The air attenuates high frequencies faster than low frequencies (see the sketch below)
- Moisture in the air increases this effect
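A minimal C sketch of one way to approximate this: lower the low-pass cutoff as distance grows. The engine hook and the constants are illustrative assumptions, not measured absorption figures.
```c
extern void engine_set_lowpass_cutoff(int sample_id, float cutoff_hz);

/* Air absorption approximation: drop the cutoff with distance so distant
   sounds lose their high frequencies first. */
void update_air_absorption(int sample_id, float distance_m)
{
    float cutoff = 20000.0f - 50.0f * distance_m;   /* ~50 Hz lost per meter */
    if (cutoff < 2000.0f)
        cutoff = 2000.0f;                           /* keep some high end */
    engine_set_lowpass_cutoff(sample_id, cutoff);
}
```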
91. Applications and Current Research
- Beam Tracing
- NAVE
- Effect of Audio on visual quality
- Audio Spotlight
92. Beam Tracing
- Video! (from the SIGGRAPH 98 Conference Proceedings video tape)
- From the paper "A Beam Tracing Approach to Acoustic Modeling for Interactive Virtual Environments", Thomas Funkhouser
93. NAVE
- HRTF (ITD, IID) via cross-talk cancellation of the two front speakers (SBLive! DS3D)
- Two rear speakers provide directional and intensity cues
- Discrete bass channel (2nd sound card)
- Static reverberation (EAX)
94. Effect of Audio on Visual Quality
- A GT study shows that ambient sounds enhance the sense of presence, as well as the subjective quality of 3D graphics
- Enhanced recall and recognition of visual objects
- Dr. Russell Storms' study showed enhanced subjective quality of 2D graphics
95. Audio Spotlight
- Produces an audio beam (like a flashlight)
- Makes use of interference from ultrasonic waves
- Potentially great dynamic range (better than speaker cones)
96. Audio Spotlight
97. Audio Spotlight
98. Audio Spotlight Compared to Speaker
99. Audio Spotlight Beam Dimensions
100. Audio Spotlight Distortion
101. Audio Spotlight
- The Holy Grail of interactive audio?
- Avoids cross-talk cancellation
- Track the user's ears and aim the spotlight at the head
- AR: aim it at objects
102. Open Research
- Diffusion (some work with radiosity)
- Diffraction
- HRTFs with audio spotlights
- Integration of graphics and audio hardware for
wave tracing (Nvidia?)