Title: Multi-Microphone Speech Processing
1. Multi-Microphone Speech Processing
Prof. Parham Aarabi
Canada Research Chair in Multi-Sensor Information Systems
Founder and Director of the Artificial Perception Lab
Department of Electrical and Computer Engineering, University of Toronto
2. The why of speech recognition
Speech recognition is great for:
- Interfacing with handheld/tablet computers
- Hands-free car function control
- Interactive man-machine conversation
3. The why not of speech recognition
Speech recognition systems need further research since:
- noise significantly degrades their performance
- they are easily confused by multiple speakers
- they are computationally demanding
4. My team's research goal
Make real-time, robust speech recognition possible using multiple sensors and hardware acceleration.
Things we can do with multiple sensors:
- Multi-Microphone Sound Localization
- Multi-Microphone Speech Enhancement
- Audiovisual Speech Enhancement
5. The Artificial Perception Lab
- An experimental environment with microphone arrays, camera arrays, …
- 1 postdoc, 8 graduate students, 25 undergraduate research students
- Funded by Dell, Microsoft, …
6. Our approach
1. Localize the speech source of interest (distributed sound localization)
2. Enhance the speech source of interest
3. Recognize the enhanced signal
7. Basic Sound Localization
Microphone arrays can localize sound using time-of-arrival and intensity differences.
Applications:
- smart rooms
- automatic teleconferencing
- robust speech recognition
- robotics
- other applications
8. Sound Localization
- Sound localization can be expressed as a search for the location that maximizes a Spatial Likelihood Function (SLF) $F(x)$: $\hat{x} = \arg\max_x F(x)$
- Most basic example: Steered Response Power (SRP), a.k.a. delay-and-sum,
  $F(x) = \int \big| \sum_{i=1}^{N} m_i(t + \tau_i(x)) \big|^2 \, dt$,
  where $\tau_i(x)$ is the propagation delay from a source at $x$ to microphone $i$.
9. Sound Localization
- The filter-and-sum version is better (i.e., using Generalized Cross-Correlations [Knapp 76])
- The SRP-PHAT algorithm [DiBiase 01] uses the Phase Transform, which whitens the cross-spectrum so that only phase information is used (see the sketch below)
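To make slides 8-9 concrete, here is a minimal NumPy sketch of SRP-PHAT localization over a grid of candidate source locations. It is an illustrative reconstruction of the standard algorithm, with made-up geometry and variable names, not the lab's actual implementation.

```python
import numpy as np

def gcc_phat(x1, x2, n_fft=1024):
    """Generalized cross-correlation with the Phase Transform (PHAT)
    weighting [Knapp 76]: whiten the cross-spectrum so only phase
    information contributes."""
    X1 = np.fft.rfft(x1, n_fft)
    X2 = np.fft.rfft(x2, n_fft)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12       # PHAT: unit-magnitude cross-spectrum
    cc = np.fft.irfft(cross, n_fft)
    return np.fft.fftshift(cc)           # lag 0 now sits at index n_fft // 2

def srp_phat_slf(signals, mic_pos, grid, fs, c=343.0, n_fft=1024):
    """SRP-PHAT spatial likelihood F(x) over candidate locations,
    accumulated from all microphone pairs [DiBiase 01]."""
    n_mics = len(signals)
    slf = np.zeros(len(grid))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cc = gcc_phat(signals[i], signals[j], n_fft)
            for g, x in enumerate(grid):
                # TDOA (seconds) a source at x would produce for this pair
                tdoa = (np.linalg.norm(x - mic_pos[i]) -
                        np.linalg.norm(x - mic_pos[j])) / c
                lag = int(round(tdoa * fs)) + n_fft // 2
                if 0 <= lag < n_fft:
                    slf[g] += cc[lag]
    return slf  # argmax over the grid gives the location estimate
```

Summing each pair's PHAT-weighted cross-correlation at the TDOA predicted for every candidate location is exactly the steered-response search of slide 8, with the PHAT weighting of slide 9 making the peak robust to reverberation.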
10. Distributed Microphone Arrays (DMAs)
- DMA advantage: higher localization accuracy
- Where could DMAs be used?
  - Suitable for cars, smart rooms, large environments (airports, …)
  - In situations with networked cell phones, PDAs, etc.
- Are current techniques suitable for DMAs?
  - Perhaps not!
11. Prior Work
- Account for the different levels of access of different microphones [Aarabi 01]
[Figure: a sound source surrounded by five distributed microphones, A through E, each with a different level of access to the source.]
12. Modeling the Environment, for a presumed speaker location and orientation
- Three attenuation factors:
  - Source directivity, $a(\theta)$
  - Microphone directivity, $b(\phi)$
  - Source-microphone distance, $d(x_1, x_2)$
13. Modeling the Environment, for a presumed speaker location and orientation (cont.)
[Figure: geometry of the presumed source and a microphone.]
14. Modeling the Environment, for a presumed speaker location and orientation (cont.)
- The overall attenuation is $\alpha_i = \frac{a(\theta_i)\, b(\phi_i)}{d(x, x_i)}$
- The time-delays of arrival are $\tau_i = d(x, x_i)/c$, where $c$ is the speed of sound (see the sketch below)
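A small sketch of this model, combining the three factors from slide 12. The cardioid directivity patterns and the unit-vector conventions are illustrative assumptions, since the slides do not specify the actual $a(\cdot)$ and $b(\cdot)$:

```python
import numpy as np

def attenuation_and_delay(src_pos, src_dir, mic_pos, mic_dir, c=343.0):
    """Overall attenuation alpha_i and arrival delay tau_i for one
    microphone under the free-field model above. src_dir and mic_dir
    are unit "look direction" vectors; the cardioid patterns for a()
    and b() are assumptions for illustration."""
    v = mic_pos - src_pos
    d = np.linalg.norm(v)                   # distance d(x, x_i)
    u = v / d
    a = 0.5 * (1.0 + np.dot(src_dir, u))    # source directivity a(theta)
    b = 0.5 * (1.0 + np.dot(mic_dir, -u))   # microphone directivity b(phi)
    alpha = a * b / d                       # overall attenuation alpha_i
    tau = d / c                             # time of arrival tau_i (seconds)
    return alpha, tau
```

These $\alpha_i$ and $\tau_i$ are exactly the quantities that the enhanced localizer of slides 15-17 plugs into its spatial likelihood function.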
15. Enhanced Sound Localization
- Enhanced sound localization searches jointly over location and orientation: $(\hat{x}, \hat{\theta}) = \arg\max_{x,\theta} F(x, \theta)$
- $F(x, \theta)$ is the new Spatial Likelihood Function
- The attenuations $\alpha_i$ weight the contributions from each individual microphone
16. Enhanced Sound Localization
- $N$ microphones, each observing a signal $m_i(t)$
- Define the SLF to be proportional to the log-likelihood of the observations given a source at $(x, \theta)$
17. SLF Generation
- Assuming that the noise is Gaussian, the log-likelihood-based SLF reduces to an attenuation-weighted steered response power:
  $F(x, \theta) \propto \int \big| \sum_{i=1}^{N} \alpha_i\, m_i(t + \tau_i) \big|^2 \, dt$
  (a brute-force sketch follows)
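Putting slides 14-17 together, here is a brute-force sketch of the orientation-aware SLF. It reuses attenuation_and_delay() from the previous sketch and assumes the attenuation-weighted steered-response form written above; the grid search, orientation set, and integer-sample circular delays are illustrative simplifications.

```python
import numpy as np
# assumes attenuation_and_delay() from the previous sketch is in scope

def enhanced_slf(signals, mic_pos, mic_dir, grid, orientations, fs, c=343.0):
    """Attenuation-weighted steered-response SLF F(x, theta), evaluated
    by brute force over candidate locations and source orientations."""
    slf = np.zeros((len(grid), len(orientations)))
    for g, x in enumerate(grid):
        for o, src_dir in enumerate(orientations):
            steered = np.zeros(len(signals[0]))
            for i, m in enumerate(signals):
                alpha, tau = attenuation_and_delay(x, src_dir,
                                                   mic_pos[i], mic_dir[i], c)
                shift = int(round(tau * fs))
                # advance each signal by its predicted delay (circular
                # shift is a simplification) and weight by its attenuation
                steered += alpha * np.roll(m, -shift)
            slf[g, o] = np.sum(steered ** 2)   # steered response power
    return slf  # argmax over (grid, orientations): location and heading
```

The nested loops over locations and orientations make the cost of the extra orientation dimension explicit, which is precisely the brute-force inefficiency noted on slide 24.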
18-23. Sound Localization Example Using 24 Mics.
[Figure sequence: spatial likelihood maps from a 24-microphone setup, with the high- and low-likelihood regions indicated.]
24. Remarks
- Can be extended to the filter-and-sum sound localization technique
- Results in over a 60% reduction in percent anomalies over SRP-PHAT
- Brute-force search is inefficient, especially with the extra orientation dimension
25. So now, we can localize a sound source…
[Figure: microphone array]
26. …but how do we use the location information to remove noise?
Multi-Microphone Phase-Based Speech Enhancement
27. Current multi-microphone systems improve SNR
28. Our goal: reduce the perceptual quality of the noise
29. Spectrogram of one speaker with no noise
[Figure: spectrogram $X_k(\omega)$; axes: frequency ($\omega$) vs. time segment index ($k$).]
30. Spectrograms with two speakers and two microphones
[Figure: spectrograms of the Microphone 1 and Microphone 2 recordings.]
31. The counterpart of spectrograms
Besides the magnitude spectrograms $X_{1k}(\omega)$ and $X_{2k}(\omega)$, we also have the corresponding phase spectrograms, $\angle X_{1k}(\omega)$ and $\angle X_{2k}(\omega)$.
[Figure: phase spectrograms; axes: frequency ($\omega$) vs. time index ($k$).]
32. Basic approach
- Ideally, after aligning the two microphones to the source location, the phases agree: $\angle X_{1k}(\omega) = \angle X_{2k}(\omega)$
- But in reality, with noise and reverberation, we have a phase error $\varepsilon_k(\omega) = \angle X_{1k}(\omega) - \angle X_{2k}(\omega) \neq 0$ (sketched below)
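A minimal SciPy sketch of this phase error, assuming the two recordings are first aligned using the target's TDOA from the localization stage; the integer-sample alignment and default STFT parameters are illustrative simplifications:

```python
import numpy as np
from scipy.signal import stft

def phase_error(x1, x2, tdoa, fs, nperseg=256):
    """Phase error eps_k(omega) between two microphone recordings,
    after coarsely aligning microphone 2 to the target's TDOA."""
    x2_aligned = np.roll(x2, -int(round(tdoa * fs)))
    _, _, X1 = stft(x1, fs, nperseg=nperseg)
    _, _, X2 = stft(x2_aligned, fs, nperseg=nperseg)
    # angle of X1 * conj(X2) is the phase difference wrapped to (-pi, pi]
    eps = np.angle(X1 * np.conj(X2))
    return X1, X2, eps
```

For time-frequency blocks dominated by the (aligned) target, eps stays near 0; blocks dominated by noise or other speakers show large phase errors, which is the separation the distributions of slide 33 illustrate.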
33. [Figure: power distributions vs. phase error $\varepsilon(\omega)$, comparing the signal of interest (concentrated near a phase error of 0) with the noise (spread over the full range).]
34. Goal: scale each time-frequency (TF) block in order to damage the noise signal
[Figure: the proposed perceptually motivated phase-error filter, plotted against the power distributions over phase error $\varepsilon_k(\omega)$.]
35. Hence, we get a TF mask…
36. …which can be applied to either spectrogram, as in the sketch below.
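A sketch of the masking step. The exponential mask shape and the gamma parameter are stand-in assumptions for the perceptually motivated filter of slide 34, whose exact form the slides do not give:

```python
import numpy as np
from scipy.signal import istft

def apply_phase_error_mask(X1, eps, fs, gamma=4.0, nperseg=256):
    """Build a TF mask from the phase error and apply it to one of the
    spectrograms; gamma and the exponential shape are illustrative."""
    mask = np.exp(-gamma * np.abs(eps))   # gain ~1 where phases agree,
                                          # small where they disagree
    Y = mask * X1                         # damage noise-dominated TF blocks
    _, y = istft(Y, fs, nperseg=nperseg)  # back to the time domain
    return y
```

Used together with phase_error() from the previous sketch, this scales each time-frequency block by how well the two microphones' phases agree, damaging the noise as slide 37 shows.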
37. Resulting in damaged noise
[Figure: spectrograms of the original signal and of the result.]
38. Phase-error filter design choices
[Figure: the proposed perceptually motivated phase-error filter, plotted against the power distribution over phase error $\varepsilon(\omega)$.]
39. Comparison with other speech enhancement methods
40. Speech recognition experiments
- Using the Bell Labs CARVUI multi-microphone speech database (56 speakers)
- SNR = -5 dB
[Figure: recording geometry; two microphones 6 cm apart, with sources at 45° on either side.]
41-47. Speech recognition accuracy rates
[Figure sequence: accuracy-rate plots, beginning with the 2-microphone case.]
48. [Figure: comparison of superdirective beamforming (4 mics.) and Time-Frequency Speech Separation (4 mics.).]
49. Ongoing work: speech recognition with feedback
[Diagram: Multi-Mic. Speech Processing → Speech Recognition Front-End → Speech Recognition Back-End → "hello".]
50. Ongoing work: probabilistic speech separation
- By Frey, Kristjansson, Deng, Attias, and others.
[Diagram: graphical model for a single speaker; a Speaker 1 Speech Class node generates per-frequency magnitudes (Freq. 1 through Freq. N), which combine with per-frequency noise to produce the per-frequency Mic. 1 observations.]
51. Ongoing work: probabilistic speech separation
[Diagram: two-speaker extension; Speaker 1 and Speaker 2 Speech Class nodes each generate per-frequency magnitudes (Freq. 1 through Freq. N), which are combined through a speaker-location-based mixture, with per-frequency noise, to produce the per-frequency Mic. 1 and Mic. 2 observations.]
52. Multi-microphone localization and enhancement problems:
- Cannot be performed in real-time on standard processors
- Scalability (not appropriate for 10 mics., even with DSPs)
- Power requirements (not good for mobile applications)
- Space requirements (multiple chips, etc.)
53. Solution: Hardware Acceleration
- Initially implemented a TDOA-estimation VLSI chip for sound localization
- First implemented on an FPGA [Nguyen et al.], and then in 0.18 µm CMOS
54. Solution: Hardware Acceleration
- Currently working on a low-power joint localization and enhancement IC core
55. Solution: Hardware Acceleration
- Eventually, the goal is to have a low-power localization and enhancement co-processor
56. Concluding Remarks
- Multi-microphone speech processing is useful for robust speech recognition
- Other work at the APL includes:
  - Distributed Processing for Microphone Arrays (from a Sensor Networks view)
  - Camera Arrays and Audiovisual Speech Processing
57. This research has led to a spin-off company
58. Please visit www.apl.utoronto.ca