Challenges in Speech Processing for Man-Machine Communication

Slides: 61
Provided by: yegnanara
2
Nonspectral Features for Speech Processing
  • B.Yegnanarayana
  • Dept of Computer Science and Engineering
  • Indian Institute of Technology Madras
  • Chennai-600036, India
  • yegna_at_cs.iitm.ernet.in
  • Talk at NOLISP2005
  • April 22, 2005

3
Images
  • Digital Image - Matrix of numbers
  • Types of Images
  • line sketches, binary, gray level and color
  • Still images, video, multimedia

4
Objective: To examine whether spectral
information alone is adequate, or whether there is
information in speech that we are not using in
many speech applications
5
Outline of the Talk
  • Illustration of nonspectral information: line
    sketch of an image
  • What is the spectrum of a signal?
  • Speech and language
  • Information in speech and nature of speech signal
  • Speech message: medium vs message, system vs
    source
  • Progress in speech processing: need for
    nonspectral processing
  • Some illustrations of nonspectral processing
  • Need for new tools/approaches

6
Speech and Language
  • Link between speech and language
  • Language is a sequence of events at different
    levels
  • Sound units (spectral), pairs of units (spectral
    transition) and prosody (duration and intonation)
  • Illustration of speech in different languages

7
Information in Speech Signal
  • Message, speaker, language, health, environment,
    emotion, etc
  • Nature of speech in relation to text: signal,
    spectrogram, epochs, formant contours, pitch
    contours, source characteristics, glottal pulse,
    duration, modulation frequency

8
Features of Speech Signal: Waveform and Spectrogram
[Figure panels: speech signal, spectrogram, pitch contour]
9
Segmental Features: Short-time Spectra
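
A short-time spectrum is obtained by windowing a short frame of the signal and taking the magnitude of its Fourier transform, frame by frame. The following is a minimal sketch using only the Python standard library. The 8 kHz sampling rate, 160-sample Hamming window, 80-sample hop and 1 kHz test tone are assumed values chosen for illustration; the slide does not state any analysis settings.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitude of the DFT of one frame, bins 0 .. N/2 (direct O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def short_time_spectra(signal, frame_len, hop):
    """Slide a Hamming window over the signal; return one spectrum per frame."""
    win = hamming(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * win[i] for i in range(frame_len)]
        spectra.append(magnitude_spectrum(frame))
    return spectra


# Toy input (assumed): a 1 kHz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(400)]
spectra = short_time_spectra(tone, frame_len=160, hop=80)

# Strongest bin of the first frame, converted to Hz.
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
peak_hz = peak_bin * fs / 160
```

Each row of `spectra` is one column of a spectrogram. The direct DFT is used here only to keep the sketch dependency-free; an FFT routine would normally replace it.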
10
Medium vs Message
  • System vs source
  • Excite a system to convey information
  • A time-varying system also carries the message
  • But we focus mostly on the system, i.e., the spectrum
  • Spectrum is related to the distribution of energy
    with frequency
  • What is processed is the sequence of pulses in
    excitation, not frequency
  • Then why do spectrum analysis? We get the message
    even if the spectrum is degraded by channel and
    noise

11
Progress in Speech Processing
  • Humans do processing other than spectrum also
  • Demo: listening to LP residuals of different
    sounds
  • Information in speech: 2nd order (spectrum),
    higher order (residual) and long-term (prosody)
    relations
  • Leftovers in speech processing: phase, residual
    and suprasegmental
  • Why were these not addressed? Lack of tools
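
The LP residual mentioned above is what remains after the second-order (spectral envelope) relations are removed by linear prediction. The sketch below shows one common way to compute it: the autocorrelation method with the Levinson-Durbin recursion, followed by inverse filtering. It is an illustration only, not the procedure used in the talk. The synthetic two-pole "voiced" signal, the pitch period of 80 samples and the LP order of 2 are all assumptions.

```python
import math


def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n - k] for k = 0 .. max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..p] such that x[n] ~ sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(x, coeffs):
    """Residual e[n] = x[n] - sum_k a[k] * x[n - k] (inverse filtering)."""
    residual = []
    for n in range(len(x)):
        pred = sum(c * x[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        residual.append(x[n] - pred)
    return residual


# Toy "voiced" signal (assumed): impulses every 80 samples exciting a
# two-pole resonator at 500 Hz for an 8 kHz sampling rate.
radius, theta, period = 0.9, 2 * math.pi * 500 / 8000, 80
true_a = [2 * radius * math.cos(theta), -radius * radius]
signal = []
for n in range(400):
    excitation = 1.0 if n % period == 0 else 0.0
    past1 = signal[n - 1] if n >= 1 else 0.0
    past2 = signal[n - 2] if n >= 2 else 0.0
    signal.append(true_a[0] * past1 + true_a[1] * past2 + excitation)

coeffs = levinson_durbin(autocorrelation(signal, 2), 2)
residual = lp_residual(signal, coeffs)

# The five largest residual magnitudes mark the excitation instants.
instants = sorted(sorted(range(len(residual)), key=lambda n: -abs(residual[n]))[:5])
```

In this toy case the residual is close to the original impulse train, so the resonance (the "system") has been removed and the excitation (the "source") is left. Real speech would need a higher LP order and frame-wise analysis.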

12
Perception-based Features
  • Most of the features are volume (amplitude and
    spectrum) based
  • Performance may be limited
  • Significance of perceptual attributes
  • Original signal
  • LP residual

13
Some Illustrations of Sounds of Silence
Signal
Residual
Instants
More examples: signal, residual and instants
Some more examples: signal, residual and instants
14
Comment on Speech Recognition
Note that SR is achieved by using language and
other constraints, not by recognition of sound
units by spectral means. Speech is perceived, even
by human beings, as a sequence of acoustic hints
(F. S. Cooper)
15
Nonspectral Processing of Speech (Some
Illustrations)
  • Perceptual listening: demo
  • Analysis: epoch extraction, LP residual, Hilbert
    transform
  • Periodic-aperiodic decomposition of LP residual
  • Prosody manipulation: epochs and LP residual
  • Prosody (duration and intonation) modeling using
    ANN: language and speaker ID
  • Speaker recognition: complementary information in
    LP residual
  • Speech enhancement: single channel, multiple
    channel, multispeaker

16
Perceptual Importance of LP Residual
(a) Speech signal, (b) random excitation, (c) LP
residual excitation
18
Speech Production System
19
Nature of Excitation of Voiced Speech
(a) Glottal volume velocity, (b) Speech Waveform
20
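Principle of Group Delay Processing
Consider a unit sample sequence $\delta(t - \tau)$
FT: $e^{-j\omega\tau}$
FT phase: $\theta(\omega) = -\omega\tau$
Negative group delay (phase slope): $\theta'(\omega) = -\tau$
As the window is moved to the right, $\theta'(\omega)$
increases linearly with time: $\theta'(\omega) = f(t)$
[Figure: unit sample at $t = \tau$; phase $\theta(\omega)$ against $\omega$; phase slope against window position $t$]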
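
The principle on this slide can be checked numerically: for a unit sample, the phase slope measured in a sliding window grows linearly with the window position and crosses zero when the window is centred on the sample. The sketch below is a hedged illustration, not the algorithm from the talk. It obtains group delay from the standard identity Re(Y/X), where X is the DFT of the frame and Y is the DFT of the frame weighted by its time index, which avoids phase unwrapping. The window length of 17 and the sample position of 40 are assumed values.

```python
import cmath
import math


def dft(x):
    """Direct DFT of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def average_phase_slope(frame):
    """Mean over frequency of d(theta)/d(omega), time origin at the window centre."""
    n = len(frame)
    centre = (n - 1) / 2
    X = dft(frame)
    Y = dft([(i - centre) * v for i, v in enumerate(frame)])
    # Group delay is Re(Y / X); the phase slope is its negative.
    slopes = [-(yk / xk).real for xk, yk in zip(X, Y) if abs(xk) > 1e-9]
    return sum(slopes) / len(slopes) if slopes else 0.0


# Toy input (assumed): a single unit sample at n = 40.
signal = [0.0] * 64
signal[40] = 1.0

win_len = 17
starts = list(range(24, 41))  # every window position that contains the sample
slopes = [average_phase_slope(signal[s:s + win_len]) for s in starts]

# The zero crossing of the phase slope locates the sample.
zero_idx = min(range(len(slopes)), key=lambda i: abs(slopes[i]))
detected = starts[zero_idx] + (win_len - 1) // 2
```

Here `slopes` runs linearly from -8 to 8 as the window moves right, and `detected` is the window centre at the zero crossing.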
21
Principle of Group Delay (contd.)
Consider a damped sinusoid (resonant system)
[Figure panels: low damping, high damping, shifted; axes $t$ and $\omega$, with $\omega_0$ marked]
23
Speech, Glottal Waveform and Instants of
Excitation
(a) Segment of voiced speech signal, (b) linear
prediction residual for (a), (c) derivative of
the EGG (electroglottograph) signal, (d) instants
of significant excitation from the proposed algorithm.
24
Instants of Significant Excitation for Male Speech
[Figure panels for the utterance "ANY DICTIONARY": speech, LP residual, phase slope function, zero-crossing (ZC) instants, gain plot]
25
Instants of Significant Excitation for Female
Speech
[Figure panels: speech, LP residual, phase slope function, zero-crossing (ZC) instants, gain plot]
26
Formant Extraction using Knowledge of Instants
31
Demonstration of Pitch period modification
(Indian Male speaker)
Speech waveforms and narrowband spectrograms for
(a) original, (b) pitch period scaled by a factor
of 1.33 (increased) and (c) pitch period scaled by
a factor of 0.66 (decreased)
32
Demonstration of Duration modification
(Indian Male speaker)
Speech waveforms and narrowband spectrograms for
(a) original, (b) duration scaled by a factor of
1.5 (increased) and (c) duration scaled by a
factor of 0.75 (decreased)
34
  • Duration modeling using ANN
  • Language and speaker ID using duration
  • Intonation modeling using ANN
  • Language and Speaker ID using intonation

35

Performance of the duration model
36
Language and speaker identification using
duration models
Language identification
Speaker identification
37
Performance of the intonation model
38
Language and speaker identification using
intonation models
Language identification
Speaker identification
40
Speaker Recognition using LP Residual
  • Linear prediction (LP) residual as a feature for
    characterizing speaker-specific information
  • Learning speaker-specific characteristics using
    AANN models
  • Significance of regions of LP residual around the
    instants of glottal closure
  • Relative significance of excitation sources
    corresponding to different sound units
  • Complementary nature of excitation source features

41
Speaker Recognition using Spectral Features
  • Features linear prediction cepstral coefficients
    (LPCC)
  • Cepstral mean subtraction for channel
    compensation
  • AANN models for estimating the density of feature
    vectors
  • Modeling the distribution of LPCC features using
    AANN models
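
Cepstral mean subtraction works because a fixed channel multiplies the spectrum, which becomes an additive constant in the cepstral domain; subtracting the per-coefficient mean over an utterance removes that constant. A minimal sketch follows. The three-dimensional feature vectors and the channel offset are made-up numbers for illustration, not LPCC values from the talk.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of an utterance."""
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n_frames for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


# Hypothetical cepstral vectors for three frames of "clean" speech.
clean = [[1.0, 0.5, -0.2],
         [0.8, 0.1, 0.3],
         [1.2, -0.3, 0.1]]

# Hypothetical fixed channel: a constant offset added to every frame.
channel = [0.4, -0.1, 0.05]
observed = [[c + h for c, h in zip(f, channel)] for f in clean]

cms_clean = cepstral_mean_subtraction(clean)
cms_observed = cepstral_mean_subtraction(observed)
```

After subtraction, `cms_observed` matches `cms_clean` up to rounding, so the channel offset no longer affects the features. The speech's own long-term mean is removed as well, which is the usual trade-off of this method.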

42
Combining Evidence from Spectral and Source
Features
  • Database: NIST 2003
  • Training data: 149 male and 191 female speakers,
    duration 2 minutes
  • Verification data: 1343 male and 2257 female
    tests, duration 15 - 45 seconds
  • Sampling rate: 8 kHz

44
Enhancement of Speech from Single Channel
(Example 1)
DEGRADED SPEECH
ENHANCED SPEECH
45
Enhancement of Speech from Single
Channel (Example 2)
DEGRADED SPEECH
ENHANCED SPEECH
46
Speech Signals of Different Microphones
(a), (d) and (g) are waveforms at three
microphone locations. (b), (e) and (h) are
extracted instants of significant
excitation. (c), (f) and (i) are short-time
spectra for the marked regions
47
Characteristics of Hilbert Envelope
(a), (b) and (c) are LP residual, its Hilbert
transform and Hilbert Envelope for the signal at
mic-0. (d), (e) and (f) are LP residual, its
Hilbert transform and Hilbert Envelope for the
signal at mic-1.
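
The Hilbert envelope is the magnitude of the analytic signal, that is, of the signal plus j times its Hilbert transform. The sketch below builds the analytic signal in the frequency domain by doubling positive-frequency bins and zeroing negative-frequency bins. It is a standard-library illustration only; the test tone and the Gaussian-shaped bipolar burst stand in for an LP residual and are assumptions.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct DFT, or inverse DFT (with 1/N scaling) when inverse=True."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def hilbert_envelope(x):
    """Magnitude of the analytic signal of x."""
    n = len(x)
    spectrum = dft(x)
    for k in range(1, n):
        if k < n / 2:
            spectrum[k] *= 2   # keep positive frequencies, doubled
        elif k > n / 2:
            spectrum[k] = 0    # discard negative frequencies
    analytic = dft(spectrum, inverse=True)
    return [abs(v) for v in analytic]


n = 64
# A steady tone has a flat envelope.
tone = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
env_tone = hilbert_envelope(tone)

# A bipolar burst centred on sample 32 (stand-in for the residual near an epoch).
burst = [math.exp(-((t - 32) / 4) ** 2) * math.cos(math.pi * t / 2) for t in range(n)]
env_burst = hilbert_envelope(burst)
peak = max(range(n), key=lambda t: env_burst[t])
```

The burst itself swings between positive and negative values, while its envelope is a single unipolar peak at the burst centre.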
48
Excitation Characteristics of Speech Signal
(a) Waveform, (b) LP residual and (c) Hilbert
Envelope for the speech collected over a
close-speaking microphone
(a) Waveform, (b) LP residual and (c) Hilbert
Envelope for the speech collected over a distant
microphone
49
Cross-Correlation of Hilbert Envelopes
50
Coherent and Incoherent Addition of Hilbert
Envelopes
Hilbert envelope for the signal at (a) mic-1, (b)
mic-2 and (c) mic-3. Result of (d) Incoherent
addition and (e) Coherent addition.
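
A minimal sketch of the idea behind the last two slides: estimate the delay between two Hilbert envelopes from the peak of their cross-correlation, then add the envelopes with and without compensating for that delay. "Coherent" is taken here to mean addition after delay compensation, which is an assumption about the slide's terminology. The toy envelopes, the 5-sample delay and the search range of 10 lags are made-up values.

```python
def best_lag(ref, other, max_lag):
    """Lag d in [-max_lag, max_lag] maximising sum_n ref[n] * other[n + d]."""
    def corr(d):
        return sum(ref[n] * other[n + d]
                   for n in range(len(ref)) if 0 <= n + d < len(other))
    return max(range(-max_lag, max_lag + 1), key=corr)


def shift(x, d):
    """Return y with y[n] = x[n + d], zero-padded at the ends."""
    return [x[n + d] if 0 <= n + d < len(x) else 0.0 for n in range(len(x))]


# Toy envelope at mic-1 (assumed): small peaks around samples 20 and 60.
env1 = [0.0] * 100
for centre in (20, 60):
    env1[centre - 1], env1[centre], env1[centre + 1] = 0.5, 1.0, 0.5

# Toy envelope at mic-2 (assumed): the same pattern arriving 5 samples later.
env2 = shift(env1, -5)

lag = best_lag(env1, env2, max_lag=10)

# Addition without and with delay compensation.
incoherent = [a + b for a, b in zip(env1, env2)]
coherent = [a + b for a, b in zip(env1, shift(env2, lag))]
```

With compensation the peaks line up and reinforce each other (height 2.0), while without it they stay separate at height 1.0.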
51
Enhancement of Speech from Multiple
Channels (Example 1)
MICROPHONE-1
MICROPHONE-2
WAVEFORM ADDITION
ENHANCED SPEECH
52
Enhancement of Speech using Multiple
Channels (Example 2)
MICROPHONE-1
MICROPHONE-2
WAVEFORM ADDITION
ENHANCED SPEECH
53
Enhancement of Speech in Multispeaker
Environment: Results (Ex. 2)
mic-1
mic-2
Sp1g
Sp1p
Sp2g
Sp2p
54
Enhancement of Speech in Multispeaker
Environment: Results (Ex. 1)
MICROPHONE-1
MICROPHONE-2
SPEAKER-1 ENHANCED
SPEAKER-2 ENHANCED
55
Enhancement of Speech in Multispeaker
Environment: Results (Ex. 3)
MICROPHONE-1
MICROPHONE-2
SPEAKER-1 ENHANCED
SPEAKER-2 ENHANCED
56
Enhancement of Speech in Multispeaker Environment
MICROPHONE-1
MICROPHONE-2
ENHANCED SPEAKER-1
ENHANCED SPEAKER-2
57
Conclusions
  • Speech signal contains significant nonspectral
    information
  • Need to develop new tools
  • Breakthroughs are difficult because of a heavy
    bias toward thinking in spectral terms
  • Finally, the motivation for this talk is:

58
"Do not follow where the path may lead. Go
instead where there is no path and leave a
trail." (Ralph Waldo Emerson)
59
  • Thank you very much for your attention
