Title: Acoustic impulse response measurement using speech and music signals
1. Acoustic impulse response measurement using speech and music signals
John Usher
Barcelona Media Innovation Centre, Av. Diagonal, 177, planta 9, 08018 Barcelona
2. Using adaptive filters to estimate acoustic IRs
- In-situ acquisition of the electro-acoustic IR, with an audience present.
- Continuous.
- Fast enough to track changing environmental conditions.
- Uses the speech and music signals radiated from the loudspeaker.
- Adaptive filtering (AF) for IR estimation is nothing new!
- It is used for:
  - Acoustic echo and feedback cancellation.
  - Upmixing (2 → 5.1, 2 → 3D).
  - Active noise control (ANC).
  - Room EQ (using noise).
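The slides do not spell out the update rule. The following is a minimal sketch of textbook time-domain NLMS system identification. It assumes `x` is the loudspeaker feed and `d` is the microphone signal, both of the same length. The deck's own implementation uses an overlapped-window update, which this sample-by-sample sketch does not reproduce. The names `nlms_identify`, `mu` and `eps` are hypothetical.

```python
import random


def nlms_identify(x, d, num_taps, mu=0.5, eps=1e-8):
    """Estimate FIR taps h such that d[n] is approximately sum_k h[k] * x[n - k].

    x: loudspeaker (reference) signal, d: microphone signal (same length).
    Returns the final tap estimates and the a-priori error signal e = d - y.
    """
    h = [0.0] * num_taps
    buf = [0.0] * num_taps            # buf[k] holds x[n - k]
    errors = []
    for n in range(len(x)):
        buf.insert(0, x[n])           # shift in the newest input sample
        buf.pop()                     # and drop the oldest one
        y = sum(hk * bk for hk, bk in zip(h, buf))   # filter output
        e = d[n] - y                                   # estimation error
        norm = eps + sum(bk * bk for bk in buf)        # input energy in the buffer
        step = mu * e / norm                           # normalized step size
        for k in range(num_taps):
            h[k] += step * buf[k]
        errors.append(e)
    return h, errors


if __name__ == "__main__":
    rng = random.Random(0)
    room = [1.0, 0.5, -0.3, 0.1]      # hypothetical short "room" response
    x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    d = [sum(room[k] * x[n - k] for k in range(len(room)) if n - k >= 0)
         for n in range(len(x))]
    h_est, _ = nlms_identify(x, d, num_taps=len(room))
    print([round(v, 3) for v in h_est])
```

The normalized update is stable for step sizes 0 < `mu` < 2. Smaller values converge more slowly but average out more background noise.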
5. Localizing objects in a room
- Emit a spoken warning from a loudspeaker in the room.
- Extract the RIR using an adaptive filter.
- Detect reflection onset times, e.g. using running kurtosis.
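Below is a minimal sketch of reflection-onset detection by running kurtosis, assuming the RIR is a list of samples. An isolated reflection makes a short window strongly peaked, so the window's kurtosis jumps well above the Gaussian value of 3. Several choices here are assumptions, not taken from the slides:

- the 32-sample window and the threshold of 10;
- a co-located loudspeaker and microphone;
- `fs` = 48 kHz and c = 343 m/s.

All function names are hypothetical.

```python
import math


def running_kurtosis(h, win):
    """Kurtosis m4 / m2**2 over a causal window of `win` samples ending at each index.

    The first win - 1 outputs are 0.0 because the window is not yet full.
    """
    out = [0.0] * len(h)
    for n in range(win - 1, len(h)):
        seg = h[n - win + 1:n + 1]
        mean = sum(seg) / win
        m2 = sum((v - mean) ** 2 for v in seg) / win
        m4 = sum((v - mean) ** 4 for v in seg) / win
        out[n] = m4 / (m2 * m2) if m2 > 0.0 else 0.0
    return out


def detect_onsets(h, win=32, threshold=10.0):
    """Indices where the running kurtosis rises above `threshold`."""
    k = running_kurtosis(h, win)
    return [n for n in range(1, len(k)) if k[n] > threshold >= k[n - 1]]


def reflector_distance(direct_idx, refl_idx, fs=48000.0, c=343.0):
    """Distance to a reflector, assuming the loudspeaker and mic are co-located."""
    extra_path = c * (refl_idx - direct_idx) / fs   # reflected minus direct path (m)
    return extra_path / 2.0                          # sound travels out and back


if __name__ == "__main__":
    # Synthetic IR: weak tonal background, direct sound at 50, reflection at 200.
    ir = [0.01 * math.sin(2 * math.pi * n / 8) for n in range(400)]
    ir[50] += 1.0
    ir[200] += 0.5
    onsets = detect_onsets(ir)
    print(onsets, reflector_distance(onsets[0], onsets[1]))
```

Measuring the reflection delay relative to the detected direct sound, rather than to sample 0, removes any fixed playback/recording latency from the distance estimate.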
8. Empirical experiment with small-room configuration
- Set-up:
  - Single microphone.
  - Single loudspeaker.
  - Small room (RT60 0.5 s).
  - Noise, speech or music radiated.
- Reference measurement using exponential swept-sine deconvolution.
- Further test using live (spoken) voice, with close and far lavalier mics.
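Below is a minimal sketch of exponential swept-sine deconvolution. It assumes the standard formulation, in which the inverse filter is the time-reversed sweep with an amplitude envelope that falls 6 dB per octave as the reversed sweep descends in frequency. The sweep range, duration and function names are illustrative, not taken from the slides. A direct convolution is used for clarity; a real measurement would use FFT convolution.

```python
import math


def exp_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz lasting `duration` seconds."""
    n = int(round(duration * fs))
    rate = math.log(f2 / f1)
    k = 2.0 * math.pi * f1 * duration / rate
    return [math.sin(k * (math.exp(i / fs * rate / duration) - 1.0)) for i in range(n)]


def inverse_filter(sweep, f1, f2):
    """Time-reversed sweep with an envelope falling 6 dB/octave toward low frequencies.

    This compensates for the sweep spending more time (energy) at low
    frequencies, so sweep convolved with this filter is roughly a band-limited impulse.
    """
    n = len(sweep)
    rate = math.log(f2 / f1)
    return [sweep[n - 1 - i] * math.exp(-rate * i / (n - 1)) for i in range(n)]


def convolve(a, b):
    """Direct linear convolution, O(len(a) * len(b))."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        if av == 0.0:
            continue
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out
```

Convolving the recording with the inverse filter places the linear IR starting at index `len(sweep) - 1`. Harmonic distortion products land before that index, which is one of the reasons for using an exponential sweep.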
10. Results: error criterion
- Start with the reference RIR (measured using the swept-sine technique).
- Allow the adaptive filter to converge for 10 seconds to get the AF spectrum.
- Calculate the misalignment: the mean difference (in dB) between the reference and AF spectra over 80 Hz–12 kHz.
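Below is a minimal sketch of the misalignment measure. It assumes the measure is the mean absolute difference of the two magnitude spectra in dB over the 80 Hz to 12 kHz band. The slide says only "mean of difference". Three choices are assumptions:

- taking the absolute value, so that positive and negative deviations do not cancel;
- the FFT size;
- using a plain DFT.

The function names are hypothetical.

```python
import cmath
import math


def magnitude_db(h, nfft):
    """Magnitude spectrum of h in dB, zero-padded to nfft points (bins 0..nfft/2)."""
    if len(h) > nfft:
        raise ValueError("nfft must be at least len(h)")
    x = list(h) + [0.0] * (nfft - len(h))
    spec = []
    for k in range(nfft // 2 + 1):
        s = sum(x[n] * cmath.exp(-2j * math.pi * k * n / nfft) for n in range(nfft))
        spec.append(20.0 * math.log10(abs(s) + 1e-12))   # floor avoids log10(0)
    return spec


def spectral_misalignment_db(h_ref, h_af, fs, f_lo=80.0, f_hi=12000.0, nfft=1024):
    """Mean absolute dB difference between two IR spectra over [f_lo, f_hi] Hz."""
    ref = magnitude_db(h_ref, nfft)
    est = magnitude_db(h_af, nfft)
    bins = [k for k in range(nfft // 2 + 1) if f_lo <= k * fs / nfft <= f_hi]
    return sum(abs(ref[k] - est[k]) for k in bins) / len(bins)
```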
11. Rate of convergence
12. RIR using noise, music and voice (no obvious difference in the time domain!)
Reference RIR from sine sweep
13. RIR from live voice and two lavalier mics
Reference RIR from sine sweep
14. Comparison of filter spectra using noise, speech and music (high SNR)
15. Robustness to SNR (25, 12 and 3 dB SNR): noise masker
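To reproduce a given SNR offline, the masker (noise or babble) can be scaled relative to the signal's mean power. Below is a minimal sketch, assuming SNR is defined as the ratio of mean powers over the whole excerpt. The slides do not say how the masker level was set in the room, and `add_masker` and `mean_power` are hypothetical names.

```python
import math


def mean_power(sig):
    return sum(v * v for v in sig) / len(sig)


def add_masker(signal, masker, snr_db):
    """Scale `masker` so that mean_power(signal) / mean_power(scaled masker)
    equals `snr_db` (in dB), add it to `signal`, and return (mix, gain)."""
    gain = math.sqrt(mean_power(signal) / (mean_power(masker) * 10.0 ** (snr_db / 10.0)))
    mix = [s + gain * m for s, m in zip(signal, masker)]
    return mix, gain
```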
16. Robustness to SNR: babble masker
17. Comparison with DCFFT
- Dual-channel FFT (DCFFT) method.
- Following an AES reviewer's recommendation, compared with a commercial DCFFT system (SMAART).
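A dual-channel FFT analyser estimates the transfer function from two averaged spectra: the cross-spectrum of input and output, divided by the input auto-spectrum. This is the H1 estimator. Below is a minimal sketch, assuming Hann-windowed, overlapped frames and a plain DFT. It illustrates the principle and is not a description of what SMAART computes internally. The function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Plain O(n^2) DFT, sufficient for a small illustration."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def h1_transfer_function(x, y, nfft, hop):
    """H1 estimate H[k] = G_xy[k] / G_xx[k] from Hann-windowed, overlapped frames.

    x: signal fed to the system (e.g. the loudspeaker feed), y: measured output (mic).
    """
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / nfft) for t in range(nfft)]
    gxx = [0.0] * nfft      # accumulated input auto-spectrum
    gxy = [0j] * nfft       # accumulated input/output cross-spectrum
    for start in range(0, len(x) - nfft + 1, hop):
        X = dft([w * v for w, v in zip(window, x[start:start + nfft])])
        Y = dft([w * v for w, v in zip(window, y[start:start + nfft])])
        for k in range(nfft):
            gxx[k] += abs(X[k]) ** 2
            gxy[k] += X[k].conjugate() * Y[k]
    # The frame count cancels in the ratio, so sums stand in for averages.
    return [gxy[k] / gxx[k] if gxx[k] > 0.0 else 0j for k in range(nfft)]
```

Averaging before dividing is the key step. Output noise that is uncorrelated with the input tends to average out of the cross-spectrum, so the estimate is not biased by it.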
18. Comparison of NLMS vs DCFFT
19. Effectiveness of the AF RIR acquisition method with long RIRs
- 6 RIRs:
  - Obtained by feeding a Dirac impulse into Altiverb.
  - (NB: no background noise was simulated.)
  - Football stadium, Caen Cathedral, church, EMT plate, Filmorch. Stage Berlin, Castle.
  - RT60 values: 1.2, 2.3, 3.5, 6.0, 7.8 and 9.6 s.
20. What happens if we model only the early part of the IR?
Not much: most of the spectral detail is in the early part.
For longer IRs, the adaptive filter should be longer.
(Figure annotation: longer RT)
24. Rate of convergence for different RTs (340 ms window, 32× overlap)
(Figure annotation: longer RT)
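The window and overlap on this slide can be translated into samples. Below is a minimal sketch, assuming a 48 kHz sample rate (not stated in the slides) and reading "32× overlap" as a hop of one 32nd of the window. `frame_params` is a hypothetical helper.

```python
def frame_params(window_ms, overlap_factor, fs):
    """Window length and hop in samples, reading 'N x overlap' as hop = window / N."""
    win = int(round(window_ms * 1e-3 * fs))
    hop = max(1, win // overlap_factor)
    return win, hop


if __name__ == "__main__":
    win, hop = frame_params(340.0, 32, 48000)    # 48 kHz is an assumed rate
    print(win, hop, 48000 / hop)                  # window, hop, updates per second
```

At the assumed 48 kHz this gives a 16320-sample window and a 510-sample hop, i.e. roughly 94 window updates per second.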
25. Conclusions
- RIR acquisition for small and large rooms.
- Adaptive filter updated using NLMS and an overlapped window.
- Tested with RT60 from 0.5 to 10 s.
- Using music, speech and noise as excitation signals.
- Less accurate using live voice and two mics.
- Convergence in < 3 s (< 2 dB mean error).
- Little change in performance with SNRs down to 0 dB.
26. Conclusions
- Music vs speech:
  - Music AF matches the RIR over 60 Hz–12 kHz.
  - Speech AF matches the RIR over 100 Hz–8 kHz.
- No considerable improvement for filter sizes > 340 ms.
  - I.e. we only need to model the first 1/8 of the RIR to get a good approximation of its spectrum.
- An adaptive whitening algorithm (LPC residuals) can speed up convergence for highly coloured signals, but only at low SNRs.
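Pre-whitening works because the same LPC residual (prediction-error) filter is applied to both the loudspeaker signal `x` and the microphone signal `d`. Filtering is linear and time-invariant, so the whitened pair is still related by the same RIR. Meanwhile the whitened input is closer to white noise, which speeds up NLMS convergence.

Below is a minimal sketch using Levinson-Durbin on a single block, assuming a fixed LPC order. An adaptive version would re-estimate the coefficients block by block. All names are hypothetical.

```python
def autocorr(x, order):
    """Biased autocorrelation r[0..order] of x."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n)) / n for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Prediction-error filter a = [1, a1, ..., ap] from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction-error power
    return a


def fir_filter(b, x):
    """Causal FIR filtering y[n] = sum_k b[k] * x[n - k], zero initial state."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]


def lpc_whiten(x, d, order=12):
    """Apply the LPC residual filter estimated from x to both x and d."""
    a = levinson_durbin(autocorr(x, order), order)
    return fir_filter(a, x), fir_filter(a, d), a
```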
27. Applications
- In-situ continuous room EQ using a filtered-x approach.
- Object localization using a speech message (e.g. using running kurtosis).
- Re-mixing live music: ambient sound separation using the filter output and error signal (e.g. to get the clean signal, room ambiance and audience applause).
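Below is a minimal sketch of the separation, assuming an IR estimate `h_est` from the adaptive filter. The filter output `y` predicts the loudspeaker programme as picked up by the microphone, including its room response. The error `e = d - y` keeps whatever cannot be predicted from the loudspeaker feed, such as audience applause. How the slides split these into clean signal, ambiance and applause is not specified, and `separate` is a hypothetical name.

```python
def separate(x, d, h_est):
    """Split mic signal d into y = h_est * x (the part predicted from the loudspeaker
    feed x) and the residual e = d - y (everything the filter cannot predict)."""
    y = [sum(h_est[k] * x[n - k] for k in range(len(h_est)) if 0 <= n - k < len(x))
         for n in range(len(d))]
    e = [dv - yv for dv, yv in zip(d, y)]
    return y, e
```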
28. Cheers! John Usher