Transcript and Presenter's Notes

Title: Multi-Microphone Speech Processing


1
Multi-Microphone Speech Processing
Prof. Parham Aarabi
Canada Research Chair in Multi-Sensor Information Systems
Founder and Director of the Artificial Perception Lab
Department of Electrical and Computer Engineering, University of Toronto
2
The why of speech recognition
Speech recognition is great for:
  • Interfacing with handheld/tablet computers
  • Hands-free car function control
  • Interactive man-machine conversation
3
The why not of speech recognition
Speech recognition systems need further research because:
  • noise significantly degrades their performance
  • they can be confused by multiple speakers
  • they are computationally demanding
4
My team's research goal
Make real-time, robust speech recognition possible using multiple sensors and hardware acceleration.
Things we can do with multiple sensors:
  • Multi-Microphone Sound Localization
  • Multi-Microphone Speech Enhancement
  • Audiovisual Speech Enhancement
5
The Artificial Perception Lab
  • an experimental environment with microphone arrays, camera arrays, …
  • 1 postdoc, 8 graduate students, 25 undergraduate research students
  • funded by Dell, Microsoft, …
6
Our approach
  1. Localize the speech source of interest (distributed sound localization)
  2. Enhance the speech source of interest
  3. Recognize the enhanced signal
7
Basic Sound Localization
Microphone arrays can localize sound using time-of-arrival and intensity differences.
Applications:
  • smart rooms
  • automatic teleconferencing
  • robust speech recognition
  • robotics
  • other applications
8
Sound Localization
  • Sound localization can be expressed as finding the location that maximizes a spatial function: x̂ = argmax_x F(x)
  • F(x) is a Spatial Likelihood Function (SLF)
  • Most basic example: Steered Response Power (SRP), a.k.a. delay-and-sum (a code sketch follows this list)
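A minimal time-domain sketch of delay-and-sum SRP; all names (srp, mic_positions, grid) are illustrative, not from the talk, and a real system would work on short frames with interpolation:

```python
import numpy as np

def srp(mic_positions, signals, fs, grid, c=343.0):
    """Delay-and-sum steered response power over candidate source locations.

    mic_positions: (N, 3) array of microphone coordinates;
    signals: (N, T) array of recordings; fs: sample rate in Hz;
    grid: (G, 3) array of candidate source locations.
    """
    n_mics = signals.shape[0]
    powers = np.empty(len(grid))
    for g, x in enumerate(grid):
        # Propagation delay from candidate location x to each microphone
        delays = np.linalg.norm(mic_positions - x, axis=1) / c
        shifts = np.round((delays - delays.min()) * fs).astype(int)
        # Advance each signal by its relative delay so a source at x aligns,
        # then sum the aligned signals (delay-and-sum beamforming)
        aligned = np.stack([np.roll(signals[i], -shifts[i]) for i in range(n_mics)])
        powers[g] = np.sum(aligned.sum(axis=0) ** 2)
    # powers plays the role of F(x); the estimate is its maximizer
    return grid[np.argmax(powers)], powers
```

The location estimate is the grid point whose steered response power is highest, i.e. the maximizer of F(x).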

9
Sound Localization
  • Filter-and-sum version is better (i.e. using
    Generalized Cross Correlations Knapp76)
  • The SRP-PHAT algorithm Dibiase01 uses the
    Phase Transform
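For reference, the standard GCC-PHAT correlation between microphones i and j; this is the well-known textbook form rather than a transcription of the slide:

```latex
R^{\mathrm{PHAT}}_{ij}(\tau) = \int_{-\infty}^{\infty}
  \frac{X_i(\omega)\, X_j^{*}(\omega)}{\bigl|X_i(\omega)\, X_j^{*}(\omega)\bigr|}
  \, e^{j\omega\tau}\, d\omega
```

The Phase Transform whitens the cross-spectrum so that only phase (i.e., delay) information drives the correlation; SRP-PHAT sums these correlations over all microphone pairs at the delays implied by each candidate location.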

10
Distributed Microphone Arrays (DMAs)
  • DMA advantage: higher localization accuracy
  • Where could DMAs be used?
  • Suitable for cars, smart rooms, large environments (airports, …)
  • In situations with networked cell phones, PDAs, etc.
  • Are current techniques suitable for DMAs?
  • Perhaps not!

11
  • Prior Work
  • Account for the different levels of access of different microphones [Aarabi01]

[Diagram: a sound source surrounded by microphones A through E at varying distances and orientations]
12
Modeling the Environment, for a presumed speaker location and orientation
  • Three attenuation factors:
  • Source directivity, a(θ)
  • Microphone directivity, b(φ)
  • Source-microphone distance, d(x1, x2)

13
Modeling the Environment, for a presumed speaker location and orientation
14
Modeling the Environment, for a presumed speaker location and orientation
  • Overall attenuation combines the three factors above (see the reconstruction below)
  • Time-delays of arrival follow from the source-microphone distances (see the reconstruction below)
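A plausible reconstruction of the slide's lost equations, assuming free-field propagation with speed of sound c; the original notation is not recoverable from the transcript:

```latex
\alpha_i = \frac{a(\theta_i)\, b(\phi_i)}{d(x, x_i)},
\qquad
\tau_i = \frac{d(x, x_i)}{c}
```

Here x is the presumed speaker location, x_i the position of microphone i, θ_i the emission angle relative to the speaker's orientation, and φ_i the arrival angle relative to the microphone's axis.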

15
Enhanced Sound Localization
  • F(x, θ) is the new Spatial Likelihood Function, defined over both location x and source orientation θ
  • Attenuations α_i weight the contributions from each individual microphone

16
Enhanced Sound Localization
  • N microphones, each observing a signal m_i(t)
  • Define the SLF to be proportional to the log-likelihood (sketched below)
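A hedged sketch of this definition; the exact conditioning used on the slide is an assumption here:

```latex
F(x, \theta) \propto \log p\bigl(m_1(t), \ldots, m_N(t) \,\big|\,
  \text{source at } x \text{ with orientation } \theta \bigr)
```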

17
SLF Generation
  • Assuming that the noise is Gaussian, the log-likelihood-based SLF reduces to the form sketched below
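One plausible closed form, assuming additive Gaussian noise and the attenuation/delay model above; this follows the general shape of maximum-likelihood localization results rather than transcribing the slide:

```latex
F(x, \theta) \propto \sum_{i=1}^{N} \sum_{j=i+1}^{N}
  \alpha_i\, \alpha_j \int m_i(t + \tau_i)\, m_j(t + \tau_j)\, dt
```

Intuitively, each microphone pair contributes a delay-aligned cross-correlation, weighted by how strongly both microphones are presumed to hear a source at (x, θ).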

18-23
Sound Localization Example Using 24 Mics.
[Sequence of spatial likelihood maps from the 24-microphone array, shaded from low likelihood to high likelihood]
24
Remarks
  • Can be extended to the filter-and-sum sound localization technique
  • Results in over 60% reduction in percent anomalies compared to SRP-PHAT
  • Brute-force search is inefficient, especially with the extra orientation dimension

25
So now, we can localize a sound source.
[Figure: microphone array]
26
But how do we use the location information to remove noise?
Multi-Microphone Phase-Based Speech Enhancement
27
Current multi-mic. systems improve SNR
28
Our goal: reduce the perceptual quality of the noise
29
Spectrogram of one speaker with no noise
[Figure: magnitude spectrogram X_k(ω); axes: frequency (ω) vs. time segment index (k)]
30
Spectrograms with two speakers and two microphones
[Figure: two spectrograms, Microphone 1 Recording and Microphone 2 Recording]
31
The counterpart of spectrograms
Besides the magnitude spectrograms X_1k(ω) and X_2k(ω), we also have the corresponding phase spectrograms ∠X_1k(ω) and ∠X_2k(ω).
[Figure: phase spectrograms; axes: frequency (ω) vs. time index (k)]
32
Basic approach: ideally, the two microphone signals differ only by the target's time delay. But in reality, with noise and reverberation, we have phase error (reconstructed below).
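A plausible reconstruction of the missing equations, assuming microphone 1 as the reference and a target TDOA of τ; the notation is assumed, not transcribed:

```latex
\text{Ideally:}\quad X_{2k}(\omega) = X_{1k}(\omega)\, e^{-j\omega\tau}
\qquad\Rightarrow\qquad
\psi_k(\omega) = \angle X_{1k}(\omega) - \angle X_{2k}(\omega) - \omega\tau = 0
```

With noise and reverberation, the phase error ψ_k(ω) (wrapped to [-π, π]) becomes nonzero, and its magnitude indicates how inconsistent each time-frequency block is with the target location.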
33
[Plot: power distribution vs. phase error ψ(ω), from 0 to π, shown separately for the signal of interest and for the noise]
34
Goal: scale each time-frequency (TF) block in order to damage the noise signal
[Plot: proposed perceptually motivated phase-error filter; power distribution vs. phase error ψ_k(ω), from 0 to π]
35
Hence, we get a TF mask (a code sketch follows)
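A minimal sketch of a phase-error-based TF mask, using an illustrative quadratic penalty; the talk's perceptually motivated filter shape is a specific design choice and is not reproduced here, and all names (phase_error_mask, gamma) are assumptions:

```python
import numpy as np

def phase_error_mask(X1, X2, omega, tau, gamma=10.0):
    """Illustrative phase-error-based time-frequency mask.

    X1, X2: STFTs of the two microphones, shape (freqs, frames);
    omega: angular frequency of each STFT bin, shape (freqs,);
    tau: presumed TDOA of the target source; gamma: aggressiveness.
    """
    # Phase error: deviation of the observed inter-microphone phase
    # difference from the one the target's TDOA alone would produce
    psi = np.angle(X1 * np.conj(X2)) - (omega * tau)[:, None]
    psi = np.angle(np.exp(1j * psi))  # wrap to [-pi, pi]
    # Attenuate TF blocks whose phase error is large (likely noise)
    return 1.0 / (1.0 + gamma * psi ** 2)

# Example: damage the noise in microphone 1's spectrogram
# Y = phase_error_mask(X1, X2, omega, tau) * X1
```

The mask is applied multiplicatively to a spectrogram before resynthesis, which matches the next slide's point that it can be applied to either microphone's spectrogram.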
36
Which can be applied to either spectrogram
37
Resulting in damaged noise
[Figure: spectrograms of the original signal and of the result]
38
Phase-error filter design choices
[Plot: proposed perceptually motivated phase-error filter; power distribution vs. phase error ψ(ω), from 0 to π]
39
Comparison with other speech enhancement methods
40
Speech recognition experiments: using the Bell Labs CARVUI multi-microphone speech database (56 speakers), at SNR = -5 dB
[Diagram: experimental geometry, with angles of 45° and 45° and a 6 cm microphone spacing]
41-47
Speech recognition accuracy rates
[Series of charts of speech recognition accuracy rates, starting with the 2-microphone case]
48
[Comparison: superdirective beamforming (4 mics.) vs. Time-Frequency Speech Separation (4 mics.)]
49
Ongoing work: speech recognition with feedback
[Block diagram: Multi-Mic. Speech Processing → Speech Recognition Front-End → Speech Recognition Back-End → "hello", with feedback to the earlier stages]
50
Ongoing work: probabilistic speech separation (by Frey, Kristjansson, Deng, Attias, and others)
[Graphical model: a Speaker 1 Speech Class node governs per-frequency magnitude nodes (Freq. 1 … Freq. N Magnitude); each observation node (e.g., Freq. N Mic. 1 Observation) combines the corresponding magnitude with a noise node (Freq. N Noise)]
51
Ongoing work: probabilistic speech separation
[Graphical model, extended to two speakers: Speaker 1 and Speaker 2 Speech Class nodes each govern their own per-frequency magnitude nodes (Freq. 1 … Freq. N Magnitude); a Speaker Location Based Mixture combines the speakers, and each microphone's observation (Freq. N Mic. 1 Observation, Freq. N Mic. 2 Observation) includes a per-frequency noise node (Freq. N Noise)]
52
  • Multi-microphone localization and enhancement problems:
  • Cannot be performed on standard processors in
    real-time
  • Scalability (not appropriate for 10 mics., even
    with DSPs)
  • Power requirements (not good for mobile
    applications)
  • Space requirements (multiple chips, etc.)

53
Solution: Hardware Acceleration
  • Initially implemented a TDOA-estimation VLSI chip for sound localization
  • First implemented on an FPGA [Nguyen et al.], then in 0.18 µm CMOS

54
Solution: Hardware Acceleration
  • Currently working on a low-power joint
    localization and enhancement IC core

55
Solution: Hardware Acceleration
  • Eventually, the goal is to have a low-power
    localization and enhancement co-processor

56
Concluding Remarks
  • Multi-microphone speech processing is useful for
    robust speech recognition
  • Other work at the APL includes:
  • Distributed Processing For Microphone Arrays
    (from a Sensor Networks view)
  • Camera Arrays and Audiovisual Speech Processing

57
Research has led to a spin-off company
58
Please visit
www.apl.utoronto.ca