HIWIRE meeting - PowerPoint PPT Presentation

Title:

HIWIRE meeting


Slides: 18
Provided by: cvspC

Transcript and Presenter's Notes

Title: HIWIRE meeting


1
  • HIWIRE meeting
  • ITC-irst
  • Activity report
  • Marco Matassoni,
  • Piergiorgio Svaizer
  • March 9-10, 2006
  • Torino

2
Outline
  • Beamforming and Adaptive Noise Cancellation
  • Environmental Acoustics Estimation
  • Audio-Video data collection
  • Multi-channel pitch estimation
  • Fixed-platform prototype acquisition module

3
Beamforming: Delay-and-Sum (DS)
The availability of multi-channel signals makes it
possible to selectively capture the desired source
  • Issues
  • reliable estimation of TDOAs (time differences of
    arrival)
  • Method
  • CSP (crosspower-spectrum phase) analysis over
    multiple frames
  • Advantages
  • robustness
  • reduced computational cost
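
The method named on this slide can be illustrated with a minimal sketch. It assumes integer-sample delays, equal-length frames, and a plain sum of the CSP functions across frames; the slide does not say how frames are combined, so that choice is an assumption. The function names (`dft`, `csp`, `tdoa_over_frames`, `delay_and_sum`) are hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def csp(ref, sig):
    """Phase-only cross-correlation of two equal-length frames."""
    whitened = []
    for r, s in zip(dft(ref), dft(sig)):
        cross = s * r.conjugate()
        mag = abs(cross)
        # Keep only the phase of the cross-power spectrum.
        whitened.append(cross / mag if mag > 1e-12 else 0j)
    return [v.real for v in dft(whitened, inverse=True)]


def tdoa_over_frames(ref_frames, sig_frames, max_lag):
    """Lag (in samples) by which `sig` trails `ref`, using CSP summed over frames."""
    n = len(ref_frames[0])
    total = [0.0] * n
    for r, s in zip(ref_frames, sig_frames):
        for i, v in enumerate(csp(r, s)):
            total[i] += v
    # Negative lags sit at the end of the circular buffer.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: total[lag % n])


def delay_and_sum(channels, delays):
    """Advance each channel by its integer delay, then average the channels."""
    n = len(channels[0])
    out = []
    for t in range(n):
        vals = [ch[t + d] for ch, d in zip(channels, delays) if 0 <= t + d < n]
        out.append(sum(vals) / len(channels))
    return out
```

In use, `tdoa_over_frames` would be called once per microphone against a reference channel, and the resulting delays passed to `delay_and_sum`. A real front-end would use an FFT and possibly fractional delays.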

4
DS with MarkIII
  • Test set
  • set N1_SNR0 of MC-TIDIGITS (cockpit noise),
    MarkIII channels
  • clean models, trained on original TIDIGITS
  • Results (WRR, %)

Configuration   WRR (%)
C_1             38.5
C_32            50.8
DS_C8           79.9
DS_C16          83.0
DS_C32          85.3
DS_C64          85.4

5
Adaptive Noise Cancellation
A remote microphone can be used as a reference for
noise estimation
6
NLMS
  • The tested algorithm is Normalized Least Mean
    Squares (NLMS), which iteratively estimates an FIR
    filter that minimizes the difference between the
    primary channel and the filtered reference
  • We implemented two versions
  • time domain
  • frequency domain (subband)
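
As an illustration of the time-domain variant only, here is a minimal NLMS sketch. The function name `nlms_cancel` and the parameter values (`taps`, `mu`, `eps`) are assumptions chosen for the example, not settings taken from the slides; the subband version is not shown.

```python
def nlms_cancel(primary, reference, taps=8, mu=0.5, eps=1e-8):
    """Return the primary channel minus the adaptively filtered reference."""
    w = [0.0] * taps      # FIR coefficients, updated at every sample
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))   # current noise estimate
        e = d - y                                     # enhanced output sample
        # Normalising by the reference energy makes the step size scale-free.
        norm = sum(xi * xi for xi in buf) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        out.append(e)
    return out
```

The error signal `e` is both the quantity being minimized and the enhanced output: whatever part of the primary channel can be predicted from the reference is removed.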

7
DS + ANC
  • Test set
  • set N1_SNR0 of MC-TIDIGITS (cockpit noise),
    MarkIII channels
  • clean models, trained on original TIDIGITS
  • Results (WRR, %); T = time-domain ANC, F =
    frequency-domain ANC

Configuration   WRR (%)
C_32 (T)        64.7
C_32 (F)        72.4
DS_C64 (T)      81.8
DS_C64 (F)      88.4

8
Acoustics estimation
  • Idea
  • Realistically simulate an environment (and its
    noise)
  • Method
  • Measure several impulse responses in the
    environment with multi-channel equipment (by
    playing back chirp signals), preserving relative
    amplitudes and mutual delays
  • Generate appropriate noisy signals starting from
    clean data
  • The derived acoustic models perform better in
    the given environment, also on real data.
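
The data-generation step can be sketched as follows, assuming the impulse responses have already been measured, have equal length, and that a recorded multi-channel noise segment is available. The names `convolve` and `simulate_channels` are hypothetical, and scaling the noise with a single gain computed on the first channel is one possible way (an assumption) to keep relative amplitudes between channels intact.

```python
import math


def convolve(signal, impulse_response):
    """Direct-form linear convolution."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def simulate_channels(clean, impulse_responses, noises, snr_db):
    """Convolve clean speech with each channel's response and add scaled noise.

    Assumes equal-length impulse responses and non-silent noise recordings
    at least as long as the reverberated signal.
    """
    reverberated = [convolve(clean, h) for h in impulse_responses]
    n = len(reverberated[0])
    speech_pow = sum(v * v for v in reverberated[0]) / n
    noise_pow = sum(v * v for v in noises[0][:n]) / n
    # One common gain for all channels preserves inter-channel level ratios.
    gain = math.sqrt(speech_pow / (noise_pow * 10 ** (snr_db / 10)))
    return [[r + gain * v for r, v in zip(rev, noise)]
            for rev, noise in zip(reverberated, noises)]
```

Because each channel is convolved with its own measured response, the mutual delays between microphones carry over to the simulated data without further processing.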

9
AudioVideo Data Collection
  • Idea
  • In a noisy environment, exploit additional
    features from video data
  • (collaboration with NTUA and TUC)
  • Design of AV corpus
  • Task: English connected digits, HIWIRE
    commands/keywords
  • Channels: 4 audio, 3 video
  • Environment: acoustically treated room, noise
    diffusion

10
AudioVideo Setup
11
AudioVideo Setup
  • Audio
  • 4 omnidirectional PZM Shure microphones, 16
    kHz/16 bits
  • background noise diffused by 2 loudspeakers
  • Video
  • Webcam: 640x480, 30 fps color, Unix timestamps
  • Stereoscopic camera pair: 640x480, 30 fps
    black-and-white or 15 fps color, perfectly
    synchronous
  • Current data sets
  • 8 speakers / connected digits
  • 2 speakers / HIWIRE keyword lists

12
Fixed prototype acquisition device
  • Hardware platform
  • 8 Shure microphones, RME Hammerfall
  • Software environment
  • Linux, ALSA driver
  • Acquisition module
  • synchronously acquires multiple channels (8)
  • writes (to its standard output or a file) the
    enhanced signal plus additional
    information/features (start/end-of-speech
    hypotheses, voiced/unvoiced, pitch, ...)

13
Multi-channel pitch analysis
The basic principle is that we can exploit many
observations of the same speech process. Once the
speaker has been located, we can take into account
the different propagation times to the microphones
and perform a time alignment.
Pitch analysis can be performed using adjacent
time intervals extracted from different
microphone signals. Basic correlation
techniques: AMDF, AUTOC, WAUTOC
14
Single-Channel Method: Weighted Autocorrelation
  • For every frame of length N
  • (see Shimamura-Kobayashi, Trans. on SAP, 2001)
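
The formula for this slide did not survive in the transcript. As a hedged sketch, the weighted autocorrelation described in the cited reference divides the autocorrelation (AUTOC) at a given lag by the AMDF at the same lag plus a small positive constant `k`. The function names, the value of `k`, and the pitch search range below are assumptions for illustration.

```python
def wautoc(frame, lag, n, k=1.0):
    """Autocorrelation at `lag` divided by (AMDF at `lag` + k), over n samples."""
    autoc = sum(frame[t] * frame[t + lag] for t in range(n)) / n
    amdf = sum(abs(frame[t] - frame[t + lag]) for t in range(n)) / n
    return autoc / (amdf + k)


def estimate_pitch(frame, n, fs, f_min=60.0, f_max=400.0):
    """Return the pitch (Hz) whose lag maximises WAUTOC.

    `frame` must hold at least n + int(fs / f_min) samples so that every
    shifted sample frame[t + lag] exists.
    """
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best = max(range(lag_min, lag_max + 1), key=lambda lag: wautoc(frame, lag, n))
    return fs / best
```

At the true pitch period the autocorrelation peaks while the AMDF dips, so the ratio sharpens the peak compared with either function alone.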

[Figure: plot with axes labeled Hz and Samples]

15
A Multichannel WAUTOC Method
  • WAUTOC is computed for each channel, and summed
    over the M channels.
  • For a given frame
  • Issues
  • Weights w_i may represent the channel reliability
  • Possible use of interframe smoothing of the
    resulting fundamental frequency contour, which
    could improve the overall accuracy
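
The channel combination can be sketched as a weighted sum of per-channel WAUTOC values computed on time-aligned signals. This is an illustrative reading of the slide, not the authors' exact formula, which is missing from the transcript. The names `wautoc` and `multichannel_pitch_lag`, the constant `k`, and the use of integer alignment delays are assumptions.

```python
def wautoc(frame, lag, n, k=1.0):
    """Autocorrelation at `lag` divided by (AMDF at `lag` + k), over n samples."""
    autoc = sum(frame[t] * frame[t + lag] for t in range(n)) / n
    amdf = sum(abs(frame[t] - frame[t + lag]) for t in range(n)) / n
    return autoc / (amdf + k)


def multichannel_pitch_lag(channels, delays, weights, n, lag_min, lag_max):
    """Return the lag maximising sum_i w_i * WAUTOC_i(lag).

    `delays` are the integer propagation delays (in samples) of each channel,
    e.g. from a prior speaker localisation step; each channel must hold at
    least delay + n + lag_max samples.
    """
    # Time-align the channels by discarding each channel's leading delay.
    aligned = [ch[d:] for ch, d in zip(channels, delays)]

    def score(lag):
        return sum(w * wautoc(ch, lag, n) for ch, w in zip(aligned, weights))

    return max(range(lag_min, lag_max + 1), key=score)
```

Setting a weight to zero removes an unreliable channel from the decision, which is one simple way to use the weights as reliability indicators.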

16
Video example: distant-talking speech recognition
17
Video example: multi-channel pitch estimation
18
Forthcoming activities
  • more effective combination of beamforming and
    ANC
  • also test ANC before DS beamforming
  • test post-filtering after DS
  • audio-video collection: improved audio/video
    synchronization would be advisable
  • audio-video collection: select the best balance
    between quality and frame rate
  • acoustically characterize the target environment
    (prototype)
  • integrate the selected features in the
    multi-channel front-end