Title: Efficient Time-Scale Modification of Speech and Clear Voice Systems
1 Efficient Time-Scale Modification of Speech and Clear Voice Systems
2 Time-Scale Modification
3 Frequency-Scale Modification
4 APPLICATIONS
Speech-related applications include:
SA1 Speech synthesis - based on acoustical unit concatenation
SA2 Foreign language learning
SA3 Audio-typing, and the training thereof
SA4 Accelerated aural reading for the blind
SA5 Voice mail speed up / slow down
SA6 Voice transformation - e.g. making a female voice sound male
SA7 Speech recognition
SA8 Film/speech synchronization
SA9 Speech compression
SA10 Noise reduction
5 APPLICATIONS (cont'd)
Music-related applications include:
MA1 Music transposition
MA2 Music study and editing
MA3 Film/soundtrack synchronization
MA4 Audio compression
MA5 Noise reduction
6 Existing Approaches
1. Time-domain techniques. Most of the early algorithms fall into this category. They are based on overlap-add (OLA) methods.
2. Frequency-domain techniques. Most of the algorithms grouped in this category are based on short-time Fourier transform (STFT) or phase vocoder methods and as such might strictly be considered joint time-frequency techniques.
3. Parametric techniques. These algorithms model the audio signal production mechanism and then modify the resulting model parameters, realising the required TSM/FSM of the signal by resynthesis from the modified parameters. The majority of the algorithms in this group are based on the linear predictive (LP) model of speech production and as such have been mainly directed at speech-related applications.
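The overlap-add idea behind category (1) can be sketched in a few lines. This is only a minimal illustration of plain OLA, not any of the specific algorithms reviewed in these slides: the function names `hann` and `ola_tsm`, the frame length and the Hann window are all assumptions made for the example. Frames are read from the input at one hop size and written to the output at another, so the ratio of the two hops sets the time-scale factor. Plain OLA makes no attempt to align the overlapping waveforms.

```python
import math

def hann(n):
    """Periodic Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def ola_tsm(x, alpha, frame_len=256):
    """Time-scale x by alpha with plain overlap-add.

    alpha > 1 lengthens the signal, alpha < 1 shortens it.
    Frames are written every frame_len // 2 samples and read
    every round(synth_hop / alpha) samples.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    synth_hop = frame_len // 2
    ana_hop = max(1, round(synth_hop / alpha))
    win = hann(frame_len)
    n_frames = max(1, (len(x) - frame_len) // ana_hop + 1)
    out_len = (n_frames - 1) * synth_hop + frame_len
    y = [0.0] * out_len
    norm = [0.0] * out_len  # running sum of window weights
    for m in range(n_frames):
        a = m * ana_hop    # read position in the input
        s = m * synth_hop  # write position in the output
        for i in range(frame_len):
            if a + i < len(x):
                y[s + i] += win[i] * x[a + i]
                norm[s + i] += win[i]
    # Divide by the accumulated window weight to keep the gain flat.
    return [v / w if w > 1e-8 else 0.0 for v, w in zip(y, norm)]
```

For example, `ola_tsm(x, 2.0)` returns a signal roughly twice as long as `x`, and `ola_tsm(x, 0.5)` one roughly half as long. Because the frames are overlapped without regard to the local pitch period, this sketch will generally produce audible artefacts on voiced speech.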
7 Existing Approaches (cont'd)
1. Time-domain overlap-add methods, also called sampling methods, splice methods or circular-buffer methods. This category is the same as category (1) above.
2. STFT/vocoder methods. This is the same as category (2) above.
3. LPC (linear predictive coding)-based methods. This is very similar to category (3) above, except that it now excludes non-LPC-based parametric models.
4. Methods based on modelling the signal as a sum of sinusoids with time-varying parameters.
5. Methods based on decomposing the signal into a sinusoidal part and a stochastic part.
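Categories (4) and (5) can be written compactly as signal models. The notation below is an assumption made for illustration (the slides do not define symbols): $K$ is the number of sinusoidal components, $A_k(n)$ and $\theta_k(n)$ are the time-varying amplitude and phase of component $k$, and $e(n)$ is the stochastic (noise-like) part.

```latex
% Category (4): sum of sinusoids with time-varying parameters
s(n) = \sum_{k=1}^{K} A_k(n)\,\cos\bigl(\theta_k(n)\bigr)

% Category (5): sinusoidal part plus stochastic part
s(n) = \sum_{k=1}^{K} A_k(n)\,\cos\bigl(\theta_k(n)\bigr) + e(n)
```

In both cases, modification is carried out on the estimated parameter tracks rather than on the waveform itself, and the signal is then resynthesised from the modified parameters.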
8 Time-Domain Overlap-Add Methods
Dudley's pitch-synchronous gating, 1938
9 Gabor's modified sound-film projector, Gabor 46
10 Fairbanks' modified tape recorder, Fairbanks 54
We hasten the boy off my garage path to show
which edge young owls could view
11 Shift register implementation of time-scale compression by the sampling method, Lee 72
12 RAM-based implementation of TSM by the sampling method, Lee 72
13 Synchronized Overlap-and-Add (SOLA), Roucos 85
14 STFT/VOCODER Methods
Dudley's vocoder, Dudley 39
15 Flanagan's phase vocoder analysis, Flanagan 66
16 Flanagan's phase vocoder synthesis, Flanagan 66
17 Linear Prediction Methods
Dudley's speech production model
18 Simulating a female voice from parameters derived from a male voice, Atal 71
19 Sinusoidal Modelling Methods
Sinusoidal analysis/synthesis system, Quatieri 85
20 Sinusoidal Plus Stochastic Modelling Methods
Analysis part of the SMS system, Serra 90
21 Review Conclusion
TSM/FSM Approach Comparison
22 Synchronized Overlap-Add (SOLA)
SOLA Time-Scale Expansion and Compression
23 Normalised cross-correlation measure
Simplified normalised cross-correlation measure
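As a rough illustration of how a normalised cross-correlation measure is used in SOLA-style alignment, the sketch below scores candidate shifts of a new frame against the existing output and keeps the best one. It uses the textbook normalised form, not the simplified measure named on this slide, whose exact definition is not given here. The function names `normalised_xcorr` and `best_shift` and all parameters are hypothetical.

```python
import math

def normalised_xcorr(a, b):
    """Normalised cross-correlation of two equal-length segments.

    Returns a value in [-1, 1]; 0.0 if either segment has no energy.
    """
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0

def best_shift(y, frame, nominal, k_max, overlap):
    """Search shifts k in [-k_max, k_max] around the nominal position.

    For each k, the first `overlap` samples of `frame` are compared
    with y[nominal + k : nominal + k + overlap]. Returns (k, score)
    for the highest-scoring shift that fits inside y.
    """
    best_k, best_r = 0, -2.0  # -2.0 is below any attainable score
    for k in range(-k_max, k_max + 1):
        pos = nominal + k
        if pos < 0 or pos + overlap > len(y):
            continue  # candidate region falls outside the output
        r = normalised_xcorr(y[pos:pos + overlap], frame[:overlap])
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r
```

In a complete SOLA loop, the frame would then be cross-faded into the output at `nominal + k`. The two energy sums in the denominator are recomputed for every candidate shift, which is the kind of cost that simplified measures aim to reduce.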
24 A Novel Time-Scale Modification Algorithm
Adaptive Overlap-Add (AOLA)
25 AOLA vs SOLA computational load comparison
26 TSM results for factors 0.5 and 2.0. Utterance "water" from the TIMIT Speech Corpus (DARPA). Signal name: \TIMIT\TEST\DR1\FELC0\SA1.WAV