Title: Time-Domain Methods for Speech Processing
1Time-Domain Methods for Speech Processing
2Contents
- Introduction
- Time-Dependent Processing of Speech
- Short-Time Energy and Average Magnitude
- Short-Time Average Zero Crossing Rate
- Speech vs. Silence Discrimination Using Energy and Zero-Crossing
- The Short-Time Autocorrelation Function
- The Short-Time Average Magnitude Difference Function
3Time-Domain Methods for Speech Processing
4Speech Processing Methods
- Time-Domain Method
- Involving the waveform of speech signal directly.
- Frequency-Domain Method
- Involving some form of spectrum representation.
5Time-Domain Measurements
- Average zero-crossing rate, energy, and the autocorrelation function.
- Very simple to implement.
- Provide a useful basis for estimating important features of the speech signal, e.g.,
- Voiced/unvoiced classification
- Pitch estimation
6Time-Domain Methods for Speech Processing
- Time-Dependent Processing of Speech
7Time-Dependent Nature of Speech
This is a test.
8Time-Dependent Nature of Speech
9Short-Time Behavior of Speech
- Assumption
- The properties of the speech signal change slowly with time.
- Analysis Frames
- Short segments of the speech signal.
- Usually overlap one another.
10Time-Dependent Analyses
- Analyzing each frame may produce either a single number or a set of numbers, e.g.,
- Energy (a single number)
- Vocal tract parameters (a set of numbers)
- This produces a new time-dependent sequence.
11General Form
n: frame index
x(m): speech signal
T[ ]: a linear or nonlinear transformation
w(m): a window function (finite or infinite)
12General Form
Qn is a sequence of locally weighted average values of the sequence T[x(m)].
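The equation on this slide did not survive extraction. Given the symbols defined on the previous slide, the general form is presumably the standard textbook short-time representation; this reconstruction is an assumption, not a copy of the slide:

$$
Q_n = \sum_{m=-\infty}^{\infty} T[x(m)]\, w(n-m)
$$

Read this as: transform the signal sample by sample with T, then lowpass filter the result with the window w, and evaluate the output at frame index n.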
13Example
Energy
Short-Time Energy
14Example
Short-Time Energy
15Example
Short-Time Energy
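The formulas for this example are missing from the extracted text. A likely reading, assuming the transformation T[x] = x^2 and a rectangular window of N samples ending at n (both assumptions), is:

$$
E = \sum_{m=-\infty}^{\infty} x^2(m), \qquad
E_n = \sum_{m=n-N+1}^{n} x^2(m)
$$

The first sum is the total energy of the signal; the second restricts the sum to the N most recent samples, so that E_n follows how the energy changes over time.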
16General Short-Time-Analysis Scheme
Depending on the choice of window
17Time-Domain Methods for Speech Processing
- Short-Time Energy and Average Magnitude
18Applications
- Silence Detection
- Segmentation
- Lip Sync
19Short-Time Energy
20Short-Time Average Magnitude
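As a minimal sketch of the two measurements, the Python below computes En and Mn frame by frame. The function name, the `hop` parameter, and the convention of applying the window weights directly to x^2 and |x| are assumptions made for illustration; the slide formulas themselves are not available in this text.

```python
def short_time_energy_and_magnitude(x, window, hop):
    """Return (energy, magnitude) lists with one value per frame.

    Frame r covers samples x[r*hop : r*hop + len(window)].
    The window weights are applied to x^2 (energy) and |x| (magnitude).
    """
    n_win = len(window)
    energy, magnitude = [], []
    for start in range(0, len(x) - n_win + 1, hop):
        frame = x[start:start + n_win]
        # En: weighted sum of squared samples
        energy.append(sum((s * s) * w for s, w in zip(frame, window)))
        # Mn: weighted sum of absolute sample values
        magnitude.append(sum(abs(s) * w for s, w in zip(frame, window)))
    return energy, magnitude


# Toy signal: a quiet segment followed by a loud segment.
signal = [0.1, -0.1] * 4 + [1.0, -1.0] * 4
rect = [1.0] * 4  # rectangular window, N = 4
E, M = short_time_energy_and_magnitude(signal, rect, hop=4)
```

In this toy case the amplitude rises by a factor of 10, so Mn rises by a factor of 10 while En rises by a factor of 100, which illustrates why En exaggerates level differences.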
21Block Diagram Representation
22Block Diagram Representation
What is the effect of windows?
23The Effects of Windows
- Window length
- Window function
24Rectangular Window
25Rectangular Window
26Rectangular Window
What is this?
Discuss the effect of window duration.
Discuss the effect of mainlobe width and sidelobe
peak.
27Commonly Used Windows
28Commonly Used Windows
Rectangular
Bartlett (Triangular)
Hanning
Hamming
Blackman
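The window definitions on this slide were lost in extraction. The sketch below uses the standard symmetric textbook definitions over n = 0, ..., N-1; the function name `make_window` is hypothetical, and the exact form on the slide may differ (for example in how the endpoints are handled).

```python
import math


def make_window(name, n):
    """Return an n-point window (n >= 2) as a list of floats."""
    d = n - 1
    idx = range(n)
    if name == "rectangular":
        return [1.0 for _ in idx]
    if name == "bartlett":  # triangular
        return [1.0 - abs(2.0 * i / d - 1.0) for i in idx]
    if name == "hanning":
        return [0.5 - 0.5 * math.cos(2 * math.pi * i / d) for i in idx]
    if name == "hamming":
        return [0.54 - 0.46 * math.cos(2 * math.pi * i / d) for i in idx]
    if name == "blackman":
        return [0.42 - 0.5 * math.cos(2 * math.pi * i / d)
                + 0.08 * math.cos(4 * math.pi * i / d) for i in idx]
    raise ValueError(f"unknown window: {name}")


hamming = make_window("hamming", 5)
bartlett = make_window("bartlett", 5)
blackman = make_window("blackman", 5)
```

All of these except the rectangular window taper toward the ends of the frame, which trades a wider mainlobe for lower sidelobes.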
29Commonly Used Windows
Least mainlobe width
30Examples Short-Time Energy
Rectangular Window
Hamming Window
31Examples Average Magnitude
Rectangular Window
Hamming Window
32The Effects of Window Length
- Increasing the window length N decreases the bandwidth.
- If N is too small, e.g., less than one pitch period, En and Mn will fluctuate very rapidly.
- If N is too large, e.g., on the order of several pitch periods, En and Mn will change very slowly.
33The Choice of Window Length
- No single value of N is entirely satisfactory.
- This is because the duration of a pitch period varies from about 2 ms for a high-pitched female or a child, up to 25 ms for a very low-pitched male.
34Sampling Rate
- The bandwidth of both En and Mn is just that of the lowpass filter.
- So, they need not be sampled as frequently as the speech signal.
- For example
- Frame size 20 ms
- Sampling period 10 ms (i.e., En and Mn are computed 100 times per second)
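To make the frame arithmetic concrete, the sketch below converts the 20 ms frame size and 10 ms spacing into sample counts. The 8 kHz sampling rate and the function name `frame_layout` are assumptions chosen for illustration; the slides do not state a sampling rate.

```python
def frame_layout(num_samples, sample_rate_hz, frame_ms=20, shift_ms=10):
    """Return (frame_len, hop, num_frames), all measured in samples/frames."""
    frame_len = sample_rate_hz * frame_ms // 1000
    hop = sample_rate_hz * shift_ms // 1000
    if num_samples < frame_len:
        return frame_len, hop, 0
    # Count only frames that fit completely inside the signal.
    num_frames = 1 + (num_samples - frame_len) // hop
    return frame_len, hop, num_frames


# One second of speech sampled at an assumed 8 kHz.
layout = frame_layout(num_samples=8000, sample_rate_hz=8000)
```

With these assumed values each frame holds 160 samples and consecutive frames overlap by half, so the 8000 speech samples are summarized by roughly 100 values of En or Mn.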
35Main Applications of En and Mn
- To provide the basis for distinguishing voiced speech segments from unvoiced segments.
- Silence detection.
36Differences of En and Mn
En: emphasizes large sample-to-sample variations in x(n), because the samples are squared.
Mn: the dynamic range (max/min) is approximately the square root of that of En.
Mn: the differences in level between voiced and unvoiced regions are not as pronounced as with En.
37FIR and IIR
- All the windows that we have discussed are FIR filters.
- Each of them is a lowpass filter.
- The window can also be an IIR filter.
38IIR Example
Recursive formulas
Short-Time Energy
Short-Time Average magnitude
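The recursive formulas are missing from the extracted slide. As a sketch, assume an exponential window w(n) = a^n for n >= 0 with 0 < a < 1; this gives the one-pole recursions shown in the docstring below. The window choice, the function name, and the absence of a one-sample delay are all assumptions, so the slide's exact formulas may differ.

```python
def recursive_energy_and_magnitude(x, a):
    """Exponentially weighted running energy and average magnitude.

    Assumed recursions (one-pole IIR lowpass filter):
        E[n] = a * E[n-1] + x[n]**2
        M[n] = a * M[n-1] + |x[n]|
    """
    energy, magnitude = [], []
    e_prev = m_prev = 0.0
    for s in x:
        e_prev = a * e_prev + s * s
        m_prev = a * m_prev + abs(s)
        energy.append(e_prev)
        magnitude.append(m_prev)
    return energy, magnitude


E_iir, M_iir = recursive_energy_and_magnitude([1.0, 0.0, 0.0, 2.0], a=0.5)
```

The appeal of the recursive form is that each new output needs only the previous output and the current sample, rather than a sum over a whole frame.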
39Time-Domain Methods for Speech Processing
- Short-Time Average Zero-Crossing Rate
40Voiced and Unvoiced Signals
41The Short-Time Average Zero-Crossing Rate
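The defining equation is not present in the extracted text. A common textbook definition counts sign changes between adjacent samples and weights them with a rectangular window of height 1/(2N), so that the result is the fraction of sample pairs that cross zero. The sketch below assumes that definition; the function names and the `hop` parameter are hypothetical.

```python
def sgn(v):
    """Sign function with sgn(0) taken as +1."""
    return 1 if v >= 0 else -1


def zero_crossing_rate(x, n_win, hop):
    """Return the average zero-crossing rate for each frame.

    Each frame sums |sgn(x[m]) - sgn(x[m-1])| over n_win samples and
    divides by 2 * n_win, giving crossings per sample (0.0 to 1.0).
    """
    rates = []
    # Start at 1 so that x[m - 1] always exists.
    for start in range(1, len(x) - n_win + 1, hop):
        total = sum(abs(sgn(x[m]) - sgn(x[m - 1]))
                    for m in range(start, start + n_win))
        rates.append(total / (2 * n_win))
    return rates


# Rapidly alternating samples followed by a constant-sign stretch.
toy = [1, -1, 1, -1, 1, 1, 1, 1, 1]
zcr = zero_crossing_rate(toy, n_win=4, hop=4)
```

High-frequency content such as unvoiced speech tends to give a high rate, while voiced speech, dominated by low frequencies, tends to give a low rate.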
42Distribution of Zero-Crossings
43Example
44Time-Domain Methods for Speech Processing
- Speech vs. Silence Discrimination Using Energy
and Zero-Crossing
45Speech vs. Silence Discrimination
- Locating the beginning and end of a speech utterance in an environment with background noise.
- Applications
- Segmentation of isolated words
- Automatic speech recognition
- Saving bandwidth in speech transmission
46Examples
- In some cases, we can locate the beginning and
end of a speech utterance using energy alone.
47Examples
- In other cases, we can locate the beginning and
end of a speech utterance using zero-crossing
rate alone.
48Examples
- Sometimes, we cannot do it using one criterion
alone.
Actual beginning
49Difficulties
- In general, it is difficult to locate the boundaries in the following cases:
- Weak fricatives (/f/, /th/, /h/) at the beginning or end.
- Weak plosive bursts (/p/, /t/, /k/) at the beginning or end.
- Nasals at the end.
- Voiced fricatives which become devoiced at the end of words.
- Trailing off of vowel sounds at the end of an utterance.
50Rabiner and Sambur
- A 10 ms frame is used, with measurements computed 100 times per second.
- The algorithm assumes that the first 100 ms of the interval contains no speech.
- The means and standard deviations of the average magnitude and zero-crossing rate over this interval are computed to characterize the background noise.
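The noise-characterization step can be sketched as follows. This is a deliberately simplified illustration, not the Rabiner and Sambur algorithm itself: the single "mean plus k standard deviations" threshold, the value k = 2, and the function names are all assumptions, and the multi-step search shown on the algorithm slides is not reproduced.

```python
import statistics


def noise_thresholds(avg_magnitude, zero_crossings, n_silence_frames=10, k=2.0):
    """Estimate thresholds from the leading frames, assumed to be silence.

    Both inputs hold one measurement per 10 ms frame, so the default
    n_silence_frames=10 corresponds to the first 100 ms.
    """
    mag = avg_magnitude[:n_silence_frames]
    zc = zero_crossings[:n_silence_frames]
    mag_thr = statistics.mean(mag) + k * statistics.pstdev(mag)
    zc_thr = statistics.mean(zc) + k * statistics.pstdev(zc)
    return mag_thr, zc_thr


def find_endpoints(avg_magnitude, mag_thr):
    """Return (first, last) frame indices above mag_thr, or None."""
    active = [i for i, m in enumerate(avg_magnitude) if m > mag_thr]
    return (active[0], active[-1]) if active else None


# Made-up per-frame measurements: 10 noise frames, 4 speech frames, 2 noise frames.
mags = [1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.0, 1.2, 0.8,
        5.0, 9.0, 8.0, 6.0, 1.0, 0.9]
zcs = [20, 22, 18, 20, 21, 19, 20, 20, 22, 18, 5, 4, 6, 5, 20, 21]
mag_thr, zc_thr = noise_thresholds(mags, zcs)
endpoints = find_endpoints(mags, mag_thr)
```

A magnitude-only search like this would miss the weak fricatives and plosives listed under Difficulties, which is the reason the zero-crossing statistics are collected as well.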
51The Algorithm
52The Algorithm
No more than 25 frames
53Examples
54Examples
55Time-Domain Methods for Speech Processing
- The Short-Time Autocorrelation Function
56Autocorrelation Functions
57Properties
1. Even: $\phi(k) = \phi(-k)$.
2. $\phi(k) \le \phi(0)$ for all k.
3. $\phi(0)$ is equal to the energy of x(m).
58Properties
4. If x(m) has period P, i.e., $x(m) = x(m+P)$, then $\phi(k) = \phi(k+P)$.
59Properties
4. If x(m) has period P, i.e., $x(m) = x(m+P)$, then $\phi(k) = \phi(k+P)$.
This motivates us to use autocorrelation for pitch detection.
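The periodicity property suggests a simple pitch estimator: compute the autocorrelation and pick the lag with the largest value inside a plausible range. The sketch below applies this to a synthetic periodic signal; the function names, the lag range, and the test pattern are assumptions, and real speech would need windowing and a voicing decision first.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation of a finite sequence for lags 0..max_lag."""
    n = len(x)
    return [sum(x[m] * x[m + k] for m in range(n - k))
            for k in range(max_lag + 1)]


def estimate_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    r = autocorrelation(x, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda k: r[k])


# Synthetic signal: an 8-sample pattern repeated 10 times (period P = 8).
pattern = [3, 1, 0, -1, -2, -1, 0, 1]
periodic = pattern * 10
r = autocorrelation(periodic, 20)
period = estimate_period(periodic, min_lag=2, max_lag=20)
```

Because the sequence is finite, the peaks at multiples of the period shrink as the lag grows, so the first peak at lag P remains the largest one after lag 0.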
60Short-Time Version
61Property
$R_n(-k)$
$R_n(k)$
62Property
$h_k(n-m)$
$y_k(m)$
63Property
$h_k(n-m)$
$y_k(m)$
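Only the labels of these Property slides survived extraction. Under the standard textbook definition of the short-time autocorrelation (an assumption about what the slides showed), the labels fit together as follows:

$$
\begin{aligned}
R_n(k) &= \sum_{m=-\infty}^{\infty} x(m)\,w(n-m)\,x(m+k)\,w(n-k-m) \\
R_n(-k) &= R_n(k) \\
R_n(k) &= \sum_{m=-\infty}^{\infty} \underbrace{x(m)\,x(m-k)}_{y_k(m)}\;\underbrace{w(n-m)\,w(n+k-m)}_{h_k(n-m)},
\qquad h_k(n) = w(n)\,w(n+k)
\end{aligned}
$$

The last line says that, for each lag k, the short-time autocorrelation is the product sequence y_k(m) filtered by a lowpass filter with impulse response h_k(n).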
64Property
65Another Formulation
66Another Formulation
A noncausal formulation
67Examples
[Figure: voiced and unvoiced examples, N = 401, with rectangular and Hamming windows]
68Examples
Less data will be involved for larger lag k.
[Figure: examples for N = 401, N = 251, and N = 125]
69Modified Short-Time Autocorrelation Function
Original Version
Modified Version
70Modified Short-Time Autocorrelation Function
Max. lag
71Modified Short-Time Autocorrelation Function
Max. lag
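The two versions can be compared with a short sketch. It assumes the usual textbook forms: the original version multiplies samples only within one N-sample rectangular frame, while the modified version lets the shifted samples extend beyond the frame, so it needs N plus the maximum lag samples. The function names and the test signal are assumptions.

```python
def original_autocorrelation(x, n, n_win, max_lag):
    """Original version: both factors come from the same n_win-sample frame."""
    frame = x[n:n + n_win]
    return [sum(frame[m] * frame[m + k] for m in range(n_win - k))
            for k in range(max_lag + 1)]


def modified_autocorrelation(x, n, n_win, max_lag):
    """Modified version: always sums n_win products, using n_win + max_lag samples."""
    if n + n_win + max_lag > len(x):
        raise ValueError("need n_win + max_lag samples starting at n")
    return [sum(x[n + m] * x[n + m + k] for m in range(n_win))
            for k in range(max_lag + 1)]


# Periodic test signal with period 8.
pattern = [3, 1, 0, -1, -2, -1, 0, 1]
sig = pattern * 4
orig_r = original_autocorrelation(sig, n=0, n_win=16, max_lag=8)
mod_r = modified_autocorrelation(sig, n=0, n_win=16, max_lag=8)
```

In the original version the peak at the period is reduced because fewer products contribute at larger lags; in the modified version the peak at the period is as high as the value at lag 0.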
72Examples
[Figure: voiced and unvoiced examples, N = 401, rectangular window and modified version (annotated "Similar")]
73Examples
[Figure: examples for N = 401, N = 251, and N = 125, rectangular window and modified version]
74Time-Domain Methods for Speech Processing
- The Short-Time Average Magnitude Difference
Function
75The AMDF
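If x(n) is periodic with period P, then the difference $d(n) = x(n) - x(n-k)$ is zero for $k = 0, \pm P, \pm 2P, \ldots$
Computationally more efficient than autocorrelation, since it requires no multiplications.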
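A minimal AMDF sketch follows. It assumes rectangular windows and uses a forward shift x(n+m+k) instead of a backward one to keep the indices non-negative; the normalization by the frame length and the function name are also assumptions rather than the slide's definition.

```python
def amdf(x, n, n_win, max_lag):
    """Average magnitude difference function for lags 0..max_lag.

    Uses only subtractions and absolute values: for each lag k, average
    |x[n+m] - x[n+m+k]| over m = 0..n_win-1.
    """
    if n + n_win + max_lag > len(x):
        raise ValueError("need n_win + max_lag samples starting at n")
    return [sum(abs(x[n + m] - x[n + m + k]) for m in range(n_win)) / n_win
            for k in range(max_lag + 1)]


# Periodic test signal with period 8.
pattern = [3, 1, 0, -1, -2, -1, 0, 1]
sig = pattern * 4
d = amdf(sig, n=0, n_win=16, max_lag=12)
# The pitch period shows up as a deep null rather than a peak.
amdf_period = min(range(2, 13), key=lambda k: d[k])
```

Where autocorrelation looks for a maximum at the pitch period, the AMDF looks for a minimum, which for an exactly periodic signal is zero.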
76Example
voiced
Unvoiced
77Exercise
- Record a piece of your own speech and perform voiced/unvoiced segmentation.
- Design an efficient algorithm to compute the autocorrelation.