Time-Domain Methods for Speech Processing

Provided by: aimm02Cs

1
Time-Domain Methods for Speech Processing

2
Contents

Introduction
Time-Dependent Processing of Speech
Short-Time Energy and Average Magnitude
Short-Time Average Zero Crossing Rate
Speech vs. Silence Discrimination Using Energy and Zero-Crossing
The Short-Time Autocorrelation Function
The Short-Time Average Magnitude Difference Function

3
Time-Domain Methods for Speech Processing

Introduction

4
Speech Processing Methods

Time-Domain Method
Involves the waveform of the speech signal directly.
Frequency-Domain Method
Involves some form of spectrum representation.

5
Time-Domain Measurements

Average zero-crossing rate, energy, and the
autocorrelation function.
Very simple to implement.
Provide a useful basis for estimating important
features of the speech signal, e.g.,
Voiced/unvoiced classification
Pitch estimation

6
Time-Domain Methods for Speech Processing

Time-Dependent Processing of Speech

7
Time-Dependent Nature of Speech
This is a test.
8
Time-Dependent Nature of Speech
9
Short-Time Behavior of Speech

Assumption
The properties of the speech signal change slowly
with time.
Analysis Frames
Short segments of the speech signal.
Usually overlap one another.

10
Time-Dependent Analyses

Analyzing each frame may produce either a single
number, or a set of numbers, e.g.,
Energy (a single number)
Vocal tract parameters (a set of numbers)
This will produce a new time-dependent sequence.

11
General Form
$Q_n = \sum_{m=-\infty}^{\infty} T[x(m)]\, w(n-m)$
n: Frame index
x(m): Speech signal
T: A linear or nonlinear transformation.
w(m): A window function (finite or infinite).
12
General Form
Qn is a sequence of local weighted average values
of the sequence T[x(m)].
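As a rough illustration of the general form, the sketch below evaluates Qn as the sum over m of T[x(m)] w(n - m) for a finite causal window. The function name `short_time_measure`, its arguments, and the toy signal are assumptions made for this example. Samples before the start of the signal are treated as zero.

```python
def short_time_measure(x, window, transform):
    """Q_n = sum_m T[x(m)] * w(n - m) for a finite causal window w(0..N-1).

    Samples before the start of x are treated as zero.
    """
    N = len(window)
    t = [transform(v) for v in x]          # T[x(m)]
    q = []
    for n in range(len(x)):
        total = 0.0
        # w(n - m) is nonzero only for n - N + 1 <= m <= n
        for m in range(max(0, n - N + 1), n + 1):
            total += t[m] * window[n - m]
        q.append(total)
    return q


x = [1.0, -2.0, 3.0, -4.0]
rect = [1.0, 1.0]                           # rectangular window, N = 2
energy = short_time_measure(x, rect, lambda v: v * v)   # T[x] = x^2
magnitude = short_time_measure(x, rect, abs)            # T[x] = |x|
```

Choosing a different `transform` or `window` yields a different short-time measurement without changing the structure of the computation.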
13
Example
Energy
Short-Time Energy
14
Example
Short-Time Energy
15
Example
Short-Time Energy
16
General Short-Time-Analysis Scheme
Depending on the choice of window
17
Time-Domain Methods for Speech Processing

Short-Time Energy and Average Magnitude

18
Applications

Silence Detection
Segmentation
Lip Sync

19
Short-Time Energy
20
Short-Time Average Magnitude
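For reference, the two measurements can be written out as follows. This is a sketch of the usual textbook definitions, assuming the window w(n) of the general form; the symbol h(n) for the squared window is introduced here only for illustration.

```latex
E_n = \sum_{m=-\infty}^{\infty} \left[ x(m)\, w(n-m) \right]^2
    = \sum_{m=-\infty}^{\infty} x^2(m)\, h(n-m), \qquad h(n) = w^2(n)

M_n = \sum_{m=-\infty}^{\infty} |x(m)|\, w(n-m)
```

Both are instances of the general form, with T[x] = x^2 for the energy and T[x] = |x| for the average magnitude.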
21
Block Diagram Representation
22
Block Diagram Representation
What is the effect of windows?
23
The Effects of Windows

Window length
Window function

24
Rectangular Window
25
Rectangular Window
26
Rectangular Window
What is this?
Discuss the effect of window duration.
Discuss the effect of mainlobe width and sidelobe
peak.
27
Commonly Used Windows
28
Commonly Used Windows
Rectangular
Bartlett (Triangular)
Hanning
Hamming
Blackman
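The windows listed above can be sketched in a few lines. The definitions below are the common symmetric N-point forms; the function name `make_window` and the string labels are assumptions made for this example, and other texts normalize the windows slightly differently.

```python
import math


def make_window(name, N):
    """Return N samples of a common analysis window (N >= 2)."""
    out = []
    for n in range(N):
        r = n / (N - 1)                     # position in [0, 1]
        if name == "rectangular":
            v = 1.0
        elif name == "bartlett":
            v = 1.0 - abs(2.0 * r - 1.0)    # triangle peaking at the center
        elif name == "hanning":
            v = 0.5 - 0.5 * math.cos(2 * math.pi * r)
        elif name == "hamming":
            v = 0.54 - 0.46 * math.cos(2 * math.pi * r)
        elif name == "blackman":
            v = (0.42 - 0.5 * math.cos(2 * math.pi * r)
                 + 0.08 * math.cos(4 * math.pi * r))
        else:
            raise ValueError("unknown window: " + name)
        out.append(v)
    return out


hamming = make_window("hamming", 5)
bartlett = make_window("bartlett", 5)
```

All of these except the rectangular window taper toward the edges of the frame, which trades a wider mainlobe for lower sidelobes.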
29
Commonly Used Windows
Least mainlobe width: the rectangular window.
30
Examples: Short-Time Energy
Rectangular Window
Hamming Window
31
Examples: Average Magnitude
Rectangular Window
Hamming Window
32
The Effects of Window Length

Increasing the window length N decreases the
bandwidth.
If N is too small, e.g., less than one pitch
period, En and Mn will fluctuate very rapidly.
If N is too large, e.g., on the order of several
pitch periods, En and Mn will change very slowly.

33
The Choice of Window Length

No single value of N is entirely satisfactory.
This is because the duration of a pitch period
varies from about 2 ms for a high-pitched female or
child voice up to 25 ms for a very low-pitched male voice.

34
Sampling Rate

The bandwidth of both En and Mn is just that of
the lowpass filter.
So, they need not be sampled as frequently as
speech signals.
For example
Frame size 20 ms
Sample period 10 ms
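The frame size and sample period in this example translate into frame and hop lengths in samples. The sketch below assumes a rectangular window and an 8 kHz sampling rate; the rate, the function name `frame_energies`, and the toy signal are assumptions for illustration only.

```python
def frame_energies(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Short-time energy of rectangular frames, one value per hop."""
    frame_len = int(round(fs * frame_ms / 1000.0))   # samples per frame
    hop = int(round(fs * hop_ms / 1000.0))           # samples between frames
    energies = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energies.append(sum(v * v for v in frame))
    return energies


fs = 8000                       # assumed sampling rate in Hz
x = [1.0] * fs                  # one second of a constant toy signal
E = frame_energies(x, fs)
```

With these values each frame holds 160 samples and consecutive frames overlap by half, so one second of signal yields about 100 energy values instead of 8000 samples.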

35
Main Applications of En and Mn

To provide the basis for distinguishing voiced
speech segments from unvoiced segments.
Silence detection.

36
Differences of En and Mn
En: Emphasizes large sample-to-sample variations in
x(n), because the samples are squared.
Mn: The dynamic range (max/min) is approximately the
square root of that of En.
Mn: The differences in level between voiced and
unvoiced regions are not as pronounced as for En.
37
FIR and IIR

All the windows discussed so far are FIR filters.
Each of them is a lowpass filter.
The window can also be an IIR filter.

38
IIR Example
Recursive formulas
Short-Time Energy
Short-Time Average magnitude
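A minimal sketch of such recursions, assuming the exponential window h(n) = a^n for n >= 0 with 0 < a < 1. The slide's exact formulas are not reproduced here, and some texts include a one-sample delay in the input term. The function names are assumptions made for this example.

```python
def recursive_energy(x, a):
    """E_n = a * E_{n-1} + x(n)^2, i.e. the window h(n) = a**n for n >= 0."""
    e, out = 0.0, []
    for v in x:
        e = a * e + v * v
        out.append(e)
    return out


def recursive_magnitude(x, a):
    """M_n = a * M_{n-1} + |x(n)| with the same exponential window."""
    m, out = 0.0, []
    for v in x:
        m = a * m + abs(v)
        out.append(m)
    return out


E = recursive_energy([1.0, 0.0, 0.0, 2.0], 0.5)
M = recursive_magnitude([1.0, -1.0, 2.0], 0.5)
```

Each output needs only the previous output and the current sample, so no buffer of past samples is required. Values of a closer to 1 correspond to a longer effective window.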
39
Time-Domain Methods for Speech Processing

Short-Time Average Zero-Crossing Rate

40
Voiced and Unvoiced Signals
41
The Short-Time Average Zero-Crossing Rate
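A minimal sketch of the measurement, assuming the usual definition in which a zero crossing is a sign change between successive samples, sgn[x] is 1 for x >= 0 and -1 otherwise, and the window is rectangular with weight 1/(2N). The function names and test signal are assumptions made for this example.

```python
def sgn(v):
    """sgn[x] = 1 for x >= 0 and -1 otherwise."""
    return 1 if v >= 0 else -1


def zero_crossing_rate(x, n, N):
    """Average zero crossings per sample over the N samples ending at n."""
    total = 0
    for m in range(max(1, n - N + 1), n + 1):
        # |sgn[x(m)] - sgn[x(m-1)]| is 2 at a sign change and 0 otherwise
        total += abs(sgn(x[m]) - sgn(x[m - 1]))
    return total / (2.0 * N)


x = [1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
z_fast = zero_crossing_rate(x, 4, 4)   # window covers m = 1..4
z_slow = zero_crossing_rate(x, 7, 3)   # window covers m = 5..7
```

The alternating part of the signal gives the maximum rate of one crossing per sample, while the constant part gives zero.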
42
Distribution of Zero-Crossings
43
Example
44
Time-Domain Methods for Speech Processing

Speech vs. Silence Discrimination Using Energy
and Zero-Crossing

45
Speech vs. Silence Discrimination

Locating the beginning and end of a speech
utterance in an environment with background
noise.
Applications
Segmentation of isolated words
Automatic speech recognition
Saving bandwidth in speech transmission

46
Examples

In some cases, we can locate the beginning and
end of a speech utterance using energy alone.

47
Examples

In other cases, we can locate the beginning and
end of a speech utterance using zero-crossing
rate alone.

48
Examples

Sometimes, we cannot do it using one criterion
alone.

Actual beginning
49
Difficulties

In general, it is difficult to locate the
boundaries if we encounter the following cases
Weak fricatives (/f/, /th/, /h/) at the beginning
or end.
Weak plosive bursts (/p/, /t/, /k/) at the
beginning or end.
Nasals at the end.
Voiced fricatives which become devoiced at the
end of words.
Trailing off of vowel sounds at the end of an
utterance.

50
Rabiner and Sambur

10 ms frames are used, with the measurements
computed 100 times per second.
The algorithm assumes that the first 100 msec of
the interval contains no speech.
The means and standard deviations of the average
magnitude and zero-crossing rate of this interval
are computed to characterize the background noise.
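The noise-characterization step can be sketched as below. This covers only the statistics described on this slide, not the full endpoint algorithm. It assumes per-frame average magnitude and zero-crossing values are already available, and it uses the population standard deviation; the function names and sample values are assumptions made for this example.

```python
import math


def mean_std(values):
    """Mean and (population) standard deviation of a list of numbers."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var)


def noise_statistics(magnitudes, zero_crossings, frame_ms=10, silence_ms=100):
    """Characterize background noise from the leading silent frames."""
    k = silence_ms // frame_ms            # 10 frames for the values above
    return mean_std(magnitudes[:k]), mean_std(zero_crossings[:k])


mags = [2.0, 4.0] * 5 + [50.0, 60.0]      # toy per-frame average magnitudes
zcs = [10.0] * 10 + [40.0, 45.0]          # toy per-frame zero-crossing counts
(mag_mu, mag_sd), (zc_mu, zc_sd) = noise_statistics(mags, zcs)
```

Only the first 100 ms of frames enter the statistics, so the later, louder frames in the toy data do not affect them.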

51
The Algorithm
52
The Algorithm
Steps 1, 2, 3
No more than 25 frames
53
Examples
54
Examples
55
Time-Domain Methods for Speech Processing

The Short-Time Autocorrelation Function

56
Autocorrelation Functions
57
Properties
1. Even: $\phi(k) = \phi(-k)$.
2. $\phi(k) \le \phi(0)$ for all k.
3. $\phi(0)$ is equal to the energy of x(m).
58
Properties
4. If x(m) has period P, i.e., $x(m) = x(m+P)$, then $\phi(k) = \phi(k+P)$.
59
Properties
4. If x(m) has period P, i.e., $x(m) = x(m+P)$, then $\phi(k) = \phi(k+P)$.
This motivates us to use autocorrelation for
pitch detection.
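These properties can be checked numerically on a toy periodic signal. The sketch below computes the autocorrelation of a finite-length sequence, so the peak at the period is slightly lower than the value at lag zero rather than exactly equal to it. The function names, lag range, and pulse train are assumptions made for this example.

```python
def autocorrelation(x, k):
    """phi(k) = sum_m x(m) * x(m + k) for a finite-length sequence x."""
    k = abs(k)                              # phi is even in k
    return sum(x[m] * x[m + k] for m in range(len(x) - k))


def estimate_period(x, min_lag, max_lag):
    """Pick the lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda k: autocorrelation(x, k))


pulse_train = ([1.0] + [0.0] * 4) * 8      # period of 5 samples
period = estimate_period(pulse_train, 2, 8)
energy = autocorrelation(pulse_train, 0)   # phi(0) equals the signal energy
```

Restricting the search to a plausible lag range excludes the trivial maximum at lag zero.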
60
Short-Time Version
61
Property
$R_n(-k) = R_n(k)$
62
Property
$h_k(n-m)$
$y_k(m)$
63
Property
$h_k(n-m)$
$y_k(m)$
64
Property
65
Another Formulation
66
Another Formulation
A noncausal formulation
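A minimal sketch of a noncausal form, assuming the common definition in which the frame starts at sample n and R_n(k) is the sum over m = 0..N-1-k of [x(n+m) w'(m)] [x(n+m+k) w'(m+k)]. The symbol w' (passed as `window`), the function name, and the toy signal are assumptions made for this example.

```python
def short_time_autocorrelation(x, n, k, window):
    """R_n(k) = sum_{m=0}^{N-1-k} [x(n+m) w'(m)] [x(n+m+k) w'(m+k)]."""
    N = len(window)
    k = abs(k)                              # R_n(-k) = R_n(k)
    return sum(x[n + m] * window[m] * x[n + m + k] * window[m + k]
               for m in range(N - k))


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
rect = [1.0] * 4                            # rectangular window, N = 4
r0 = short_time_autocorrelation(x, 1, 0, rect)   # 1 + 4 + 9 + 16
r2 = short_time_autocorrelation(x, 1, 2, rect)   # 1*3 + 2*4
```

Note that the sum has only N - k terms, so fewer products contribute as the lag grows.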
67
Examples
N = 401
Voiced
Unvoiced
Rectangular Window
Hamming Window
68
Examples
Less data is involved for larger lags k.
N = 401
N = 251
N = 125
69
Modified Short-Time Autocorrelation Function
Original Version
Modified Version
70
Modified Short-Time Autocorrelation Function
Max. lag
71
Modified Short-Time Autocorrelation Function
Max. lag
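A minimal sketch of the modified version, assuming the usual definition with rectangular windows: the sum always runs over N products, and the second factor is allowed to reach up to the maximum lag beyond the end of the frame. The function name and toy signal are assumptions made for this example.

```python
def modified_autocorrelation(x, n, k, N):
    """R^_n(k) = sum_{m=0}^{N-1} x(n+m) * x(n+m+k), needing x up to n+N-1+k."""
    return sum(x[n + m] * x[n + m + k] for m in range(N))


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
m0 = modified_autocorrelation(x, 1, 0, 4)   # 1 + 4 + 9 + 16
m2 = modified_autocorrelation(x, 1, 2, 4)   # 1*3 + 2*4 + 3*5 + 4*6
```

Because every lag uses the same number of products, the result does not taper off with increasing lag. Strictly speaking it is a cross-correlation of two different-length segments, so it is no longer guaranteed to be even in k.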
72
Examples
N = 401
Similar
Voiced
Unvoiced
Rectangular Window
Modified Version
73
Examples
N = 401
N = 251
N = 125
Rectangular Window
Modified Version
74
Time-Domain Methods for Speech Processing