Microcomputer Systems 2

Provided by: vkepuska

1
Microcomputer Systems 2

Time Stretching & Pitch Shifting of Audio Signals

2
Time Stretching & Pitch Shifting of Audio Signals

Outline
Introduction
Techniques Used for Time Compression/Expansion
and Pitch Shifting
Comparison
Timbre and Formants

3
Outline

Introduction
Frequency Shift vs. Pitch Shift Audio Examples
Time Compression/Expansion
Techniques Used for Time Compression/Expansion
and Pitch Shifting
The Phase Vocoder
Related Topics
Why Phase
Time Domain Harmonic Scaling (TDHS)
More recent approaches
Comparison
Which Method to Use
Pitch Shifting Considerations
Audio Examples
Timbre and Formants
Phase Vocoder and Formants
Time Domain Harmonic Scaling and Formants

4
Introduction

Time Stretching and Pitch Shifting
are two dominant techniques used for speech
and sound manipulation.
Typical applications entail
Changing the speed of playback (altering the
length of the signal) without altering the pitch
of the voice and/or instruments
Changing the pitch of the voice and/or
instruments without changing the length of the
signal.

5
Pitch Shifting
6
Pitch Shifting

As opposed to the process of pitch transposition
achieved using (a simple) sample rate conversion,
Pitch Shifting is a way to change the pitch of a
signal without changing its length.
In practical applications, this is usually
achieved by changing the length of a sound using
one of the methods discussed next and then
performing a sample rate conversion to change the
pitch.
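
The link between the pitch shift and the intermediate change in length can be worked out numerically. The following is a minimal sketch, assuming equal-tempered semitones (a frequency ratio of 2^(n/12), which the slide does not state) and using hypothetical variable names:

```matlab
% Pitch-shift ratio for a given number of semitones (equal temperament assumed).
semitones = [5 12];                 % example shifts: 5 semitones and one octave
ratio = 2 .^ (semitones / 12);      % frequency ratio for each shift

% To raise the pitch by 'ratio' without changing the length:
%   1) time-stretch the signal to 'ratio' times its original length, then
%   2) resample so that it plays back 'ratio' times faster.
stretched_length = ratio;                   % relative length after step 1
final_length = stretched_length ./ ratio;   % relative length after step 2

fprintf('%2d semitones: ratio %.4f\n', [semitones; ratio]);
```

Under these assumptions the final length is back at 1x for every shift, while the pitch has moved by the chosen ratio.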

7
Introduction

Pitch Shifting is NOT Frequency Shifting
There exists a certain confusion in terminology
in the literature, as Pitch Shifting is often
also incorrectly named 'Frequency Shifting'.
A true Frequency Shift (as obtainable by
modulating an analytic signal by a complex
exponential) will shift the spectrum of a sound,
while Pitch Shifting will dilate it, upholding
the harmonic relationship of the sound.
Frequency Shifting yields a metallic, inharmonic
sound which may well be an interesting special
effect but which is a totally inadequate process
for changing the pitch of any harmonic sound
except a single sine wave.
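
The difference is easy to see on the partials of a harmonic tone. A small sketch with hypothetical numbers (a 100 Hz fundamental, a 50 Hz shift, and a ratio of 1.5 are assumptions chosen for illustration):

```matlab
% Harmonic partials of a hypothetical tone with a 100 Hz fundamental.
partials = 100 * (1:4);             % 100, 200, 300, 400 Hz

freq_shifted  = partials + 50;      % frequency shift: add a constant offset
pitch_shifted = partials * 1.5;     % pitch shift: multiply by a constant ratio

% Ratio of each partial to the lowest one. Integer ratios mean the partials
% still form a harmonic series; non-integer ratios mean they do not.
freq_ratios  = freq_shifted  / freq_shifted(1);
pitch_ratios = pitch_shifted / pitch_shifted(1);

disp(freq_ratios);
disp(pitch_ratios);
```

The shifted partials (150, 250, 350, 450 Hz) are no longer integer multiples of the lowest one, while the scaled partials (150, 300, 450, 600 Hz) still are.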

8
Audio Examples of Pitch Shifting vs. Frequency
Shifting

Original Sound
Pitch Shifted
Frequency Shifted

9
Time Compression/Expansion
10
Time Compression/Expansion

Time Compression/Expansion, also known as "Time
Stretching" is the reciprocal process to Pitch
Shifting.
It leaves the pitch of the signal intact while
changing its speed (tempo).
This is a useful application when you wish to
change the speed of a voiceover without messing
with the timbre of the voice.

11
Time Compression/Expansion

There are several fairly good methods to do time
compression/expansion and pitch shifting but most
of them will not perform well on all different
kinds of signals and for any desired amount of
shift/stretch ratio.
Typically, good algorithms allow pitch shifting
up to 5 semitones on average or stretching the
length by 130%.
When time stretching and pitch shifting single
instrument recordings you might even be able to
achieve a 200% time stretch, or a one-octave
pitch shift with no audible loss in quality.

12
Time Compression/Expansion of Speech

Typical Goals
To either speed up or slow down a speech signal
while maintaining the approximate pitch
Applications
Change voice mail playback speed
Court stenographers: play proceedings quicker
Sound effects
Etc.

13
Techniques Used for Time Compression/Expansion
Pitch Shifting

Option 1: Change sample rate
If you modify the sample rate, you can change the
speed but the pitch is also changed
Increase sample rate: higher pitch (chipmunk
sound)
Decrease sample rate: lower pitch (drawn out
echo sound)
Option 2: Decimate or Interpolate Signal
If you change the number of samples, the result
is the same as modifying the sample rate
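
A short sketch of Option 2, assuming a 200 Hz test tone sampled at 10,000 samples/sec (both values are illustrative assumptions). Keeping every second sample halves the duration and doubles the frequency when the result is played at the original rate:

```matlab
fs = 10000;                         % assumed sample rate (samples/sec)
f0 = 200;                           % assumed test-tone frequency (Hz)
n  = (0:fs-1)';                     % one second of samples
x  = sin(2*pi*f0*n/fs);

% Keep every second sample. No anti-alias filter is applied here, which is
% acceptable for this single low tone; real decimation should low-pass first.
x_dec = x(1:2:end);

% Played back at the original rate fs, the result is half as long...
duration_sec = numel(x_dec) / fs;

% ...and its spectral peak sits at twice the original frequency.
spectrum = abs(fft(x_dec));
[~, idx] = max(spectrum(1:floor(numel(x_dec)/2)));
f_peak = (idx - 1) * fs / numel(x_dec);

fprintf('duration %.2f s, peak at %g Hz\n', duration_sec, f_peak);
```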

14
Techniques Used for Time Compression/Expansion
Pitch Shifting

Option 3: Use more complex methods
These change the speed of the signal while
preserving the pitch data
Short Time Fourier Transform
Short Time Fourier Transform Magnitude
Sinusoidal Synthesis
Linear Prediction Synthesis

15
Techniques Used for Time Compression/Expansion
Pitch Shifting

Currently, there are two different principal time
compression/expansion and pitch shifting schemes
employed in most of today's applications
Phase Vocoder.
Time Domain Harmonic Scaling (TDHS).

16
Phase Vocoder
17
Phase Vocoder

Phase Vocoder. This method was introduced by
Flanagan and Golden in 1966 and digitally
implemented by Portnoff ten years later.
Portnoff, M.R. 1981a. "Short-Time Fourier
Analysis of Sampled Speech." IEEE Transactions on
Acoustics, Speech and Signal Processing ASSP-29(3):
364-373.
Portnoff, M.R. 1981b. "Time-Scale Modification of
Speech Based on Short-Time Fourier
Analysis." IEEE Transactions on Acoustics, Speech
and Signal Processing ASSP-29(3): 374-390.

18
Phase Vocoder

It uses a Short Time Fourier Transform
(abbreviated STFT from here on) to convert the
audio signal to the complex Fourier
representation.
Since the STFT returns the frequency domain
representation of the signal on a fixed frequency
grid, the actual frequencies of the partial bins
have to be found by converting the relative phase
change between two STFT outputs to actual
frequency changes.
Note the term 'partial' has nothing to do with
the signal harmonics. In fact, an STFT will never
readily give you any information about true
harmonics unless you match the STFT length
to the fundamental frequency of the signal, and
even then the frequency domain resolution is
quite different from what our ear and auditory
system perceive.
The timebase of the signal is changed by
calculating the frequency changes in the Fourier
domain on a different time basis, and then an
inverse STFT (iSTFT) is done to regain the time
domain representation of the signal.
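
The conversion from phase change to actual frequency can be sketched for a single bin. This is an illustrative example rather than a full phase vocoder; the sample rate, FFT length, hop size, and test frequency are all assumptions, and the Hann window is computed by hand to avoid toolbox functions:

```matlab
fs  = 8000;                         % assumed sample rate (Hz)
N   = 256;                          % assumed STFT frame length
hop = 64;                           % assumed spacing between the two frames
f0  = 1234;                         % test frequency, between two bin centres

n = (0:N+hop-1)';
x = cos(2*pi*f0*n/fs);
w = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N); % Hann window

X1 = fft(x(1:N) .* w);              % first STFT frame
X2 = fft(x(hop+1:hop+N) .* w);      % second frame, 'hop' samples later

[~, idx] = max(abs(X1(1:N/2)));     % strongest bin (MATLAB index)
k = idx - 1;                        % zero-based bin number

% Phase advance expected if the signal sat exactly on the bin centre.
expected = 2*pi*k*hop/N;

% Measured phase change minus the expected one, wrapped to [-pi, pi].
dev = angle(X2(idx)) - angle(X1(idx)) - expected;
dev = dev - 2*pi*round(dev/(2*pi));

% Bin-centre frequency plus the deviation implied by the phase change.
f_est = (k/N + dev/(2*pi*hop)) * fs;

fprintf('bin centre %.2f Hz, estimated %.2f Hz\n', k*fs/N, f_est);
```

The bin centres are 31.25 Hz apart in this setup, so the bin number alone cannot locate the tone; the phase difference supplies the missing offset.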

19
Phase Vocoder

Phase vocoder algorithms are used mainly in
scientific and educational software products (to
show the use and limitations of the Fourier
Transform) but have gained in popularity over the
past few years due to improvements that made it
possible to greatly reduce the artifacts of the
"original" phase vocoder algorithm.
The basic phase vocoder suffers from a severe
drawback because it introduces a considerable
amount of artifacts audible as 'smearing' and
'reverberation' (even at low expansion ratios)
due to the non-synchronized vertical coherence
of the sine and cosine basis functions that are
used to change the timebase.

20
Phase Vocoder

Puckette, Laroche and Dolson have shown that the
phasiness can be greatly reduced by picking peaks
in the Fourier spectrum and keeping the relative
phases around the peaks unchanged. Even though
this improves the quality considerably it still
renders the result somewhat phasey and diffuse
when compared to time domain methods.
Current research focuses on improving the phase
vocoder by applying intra-frame sinusoidal sweep
and ramp rate correction (Bristow-Johnson and
Bogdanowicz) and multi-resolution phase vocoder
concepts (Bonada).

21
Links to Publicly Available Vocoders

Pointers - Phase Vocoder
The MIT Lab Phase Vocoder
WaveMasher - GPL/Open Source Phase Vocoder by
Kenneth Sturgis
Sculptor: A Real Time Phase Vocoder by Nick
Bailey
A Phase Vocoder implementation using Matlab
More reading on the Phase Vocoder
The IRCAM "Super Phase Vocoder"
S.M. Bernsee's "Pitch Shifting Using The Fourier
Transform" article (with C code)

22
Time Domain Harmonic Scaling (TDHS).
23
Time Domain Harmonic Scaling (TDHS).

Time Domain Harmonic Scaling (TDHS). This is
based on a method proposed by Rabiner and Schafer
in 1978. It is heavily based on a correct
estimate of the fundamental frequency of the
sound processed.

24
Theory

Short Time Fourier Transform Methods
Chapter 7 in our text (Discrete-Time Speech
Signal Processing)
Refer to the class notes for the mathematical
theory of operation
I will pick up from where Dr. Kepuska stopped in
his notes

25
How is the Speech/Sound Signal Processed

Link
Ch7-Short-Time_Fourier_Transform_Analysis_and_Synthesis.ppt

26
Terminology Basic Idea
Frame Rate
Window Size
27
Short Time Fourier Transform

Short Time Fourier Transform
Also called the Fairbanks method
Extract successive short-time segments and then
discard the following ones

28
Short Time Fourier Transform

Frame rate factor: L
In the frequency domain, after taking the STFT, you
get
$X(nL, \omega)$
Form a new signal by
$Y(nL, \omega) = X(snL, \omega)$
where $s$ = compression factor
Take Inverse Fourier Transform
Use Overlap and Add method to form new signal
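
The frame selection and overlap-add steps can be sketched as follows. This is a simplified illustration, not the code from these slides: it works directly on the windowed time-domain frames (an unmodified frame survives the FFT and inverse FFT unchanged), and the test tone, hop, and window are assumptions.

```matlab
L  = 100;                           % frame spacing (hop), assumed
Nw = 2 * L;                         % window length, assumed
s  = 2;                             % compression factor (2 = twice as fast)
x  = sin(2*pi*150*(0:1999)'/10000); % hypothetical test tone at 10 kHz

% Triangular window whose copies at spacing L sum to one.
w = 1 - abs((0:Nw-1)' - L) / L;

num_frames = floor((numel(x) - Nw) / L) + 1;    % frames available in x
num_out    = floor((num_frames - 1) / s) + 1;   % frames that are kept
y = zeros((num_out - 1) * L + Nw, 1);

for m = 0:num_out-1
    src = s * m * L;                % start of input frame s*m (zero-based)
    dst = m * L;                    % start of output frame m (zero-based)
    frame = x(src+1:src+Nw) .* w;   % keep this frame, skip the next s-1
    y(dst+1:dst+Nw) = y(dst+1:dst+Nw) + frame;   % overlap and add
end

fprintf('input %d samples, output %d samples\n', numel(x), numel(y));
```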

29
Short Time Fourier Transform
$X(nL, \omega)$
$Y(nL, \omega) = X(2nL, \omega)$
30
Short Time Fourier Transform
New Sequence
Original Windowed Sequence
31
Short Time Fourier Transform

Problems
Pitch Synchronization
It is highly likely that the pitch periods will
not line up properly

32
Short Time Fourier Transform Magnitude

Short Time Fourier Transform Magnitude
Problems with the STFT method relate directly to the
linear phase component of the STFT
Time shift => phase change
An alternate approach is to use only the magnitude
portion of the STFT: the Short Time Fourier
Transform Magnitude (STFTM)
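
The relation between a time shift and a linear phase term can be checked numerically. A small sketch using a circular shift on one frame (the frame length, shift, and test signal are assumptions):

```matlab
N = 64;                             % frame length, assumed
d = 5;                              % circular time shift in samples, assumed
x = cos(2*pi*3*(0:N-1)'/N) + 0.5*cos(2*pi*7*(0:N-1)'/N + 0.3);
x_shift = circshift(x, d);          % delay by d samples (circularly)

X  = fft(x);
Xs = fft(x_shift);

% A delay of d samples multiplies bin k by exp(-j*2*pi*k*d/N):
% the phase changes linearly with k, the magnitude does not change.
k = (0:N-1)';
predicted = X .* exp(-1j*2*pi*k*d/N);

mag_error   = max(abs(abs(Xs) - abs(X)));   % magnitude difference
phase_error = max(abs(Xs - predicted));     % mismatch to linear-phase model

fprintf('magnitude error %.2e, model error %.2e\n', mag_error, phase_error);
```

Both errors are at the level of floating-point rounding, which is why working with the magnitude alone removes the sensitivity to where each frame happens to start.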

33
Short Time Fourier Transform Magnitude

Compression
With the Fairbanks method, time slices were
discarded
Now we can just compress the time slices
Form a new signal by
$Y(nM, \omega) = X(nL, \omega)$ where
$M$ = compression factor = $L$ / speed
i.e. for speeding up by two => $M = L/2$

34
Short Time Fourier Transform Magnitude

Compression
Take Inverse Fourier Transform
Use Overlap and Add method to form new signal
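
The re-spacing of frames from L to M can be sketched as below. This illustrates only the change of frame spacing: it overlap-adds the windowed time-domain frames directly and does not perform the magnitude-only reconstruction, so adjacent frames will generally not be phase aligned. All values are assumptions.

```matlab
L     = 100;                        % analysis frame spacing, assumed
Nw    = 2 * L;                      % window length, assumed
speed = 2;                          % speed-up factor
M     = L / speed;                  % synthesis frame spacing
x     = sin(2*pi*150*(0:1999)'/10000);   % hypothetical test tone

w  = 1 - abs((0:Nw-1)' - L) / L;    % triangular window
W0 = sum(w);                        % window energy term

num_frames = floor((numel(x) - Nw) / L) + 1;
y = zeros((num_frames - 1) * M + Nw, 1);

for n = 0:num_frames-1
    frame = x(n*L+1:n*L+Nw) .* w;               % frame taken at spacing L
    y(n*M+1:n*M+Nw) = y(n*M+1:n*M+Nw) + frame;  % placed at spacing M
end

% More frames overlap at spacing M than at spacing L, so rescale by M/W0.
y = y * (M / W0);

fprintf('input %d samples, output %d samples\n', numel(x), numel(y));
```

Every frame is kept here, unlike the Fairbanks approach, and the shorter output comes entirely from the tighter spacing.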

35
Short Time Fourier Transform Magnitude
$X(nL, \omega)$
$Y(nM, \omega) = X(nL, \omega)$, $M = L/2$
36
Short Time Fourier Transform Magnitude
New Sequence
Original Windowed Sequence
37
Other Methods

Sinusoidal Synthesis (Chapter 9)
Time-warp the sinewave frequency track and the
amplitude function
This technique has been successful with not only
speech but also music, biological, and mechanical
signals
Problems
Does not maintain the original phase relations
Suffers from reverberance

38
Other Methods

Linear Prediction Synthesis
Use Homomorphic and Linear Prediction results to
modify the time base
The book briefly mentions that this is possible,
but I ran out of time before I could investigate
this process further

39
Other Methods

New Techniques
Internet search showed several methods trying to
improve on what is out there now
Software
Different software programs that will change
speed for you
Adobe Audition is one of the most all
encompassing right now

40
Matlab Code-Prepare the Workspace

% Prepare Workspace
close all;
clear all;
window_size_1 = 200;
frame_rate_1 = 100;
% Speed to slow down by
speed = 2;

41
Matlab Code-Load the Speech Signal

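% Load Data File
% (wavread is the call used in the original code; newer MATLAB
% releases replace it with audioread)
filename = input('Please enter the file name to be used. ');
[sample_data, sample_rate, nbits] = wavread(filename);
loop_time = floor(max(size(sample_data))/frame_rate_1);
% Zero-pad so the last frame is complete; start one past the
% last sample so that no data is overwritten
sample_data((max(size(sample_data))+1):(loop_time+1)*frame_rate_1) = 0;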

42
Matlab Code-Develop the Window

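% Create Windows
% Want windows of 25ms
% File sampled at 10,000 samples/sec
% Want a window of size 10000 * 25ms (10ms)
% (window_size_1 = 200 samples at 10,000 samples/sec is 20 ms)
triangle_30ms = triang(window_size_1);
% Alternative window; uncomment to use Hamming instead
% triangle_30ms = hamming(window_size_1);
W0 = sum(triangle_30ms);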

43
Matlab Code-Window the Entire Speech Signal

% Window the speech
for i = 0:loop_time-1
    window_data(:,i+1) = sample_data((frame_rate_1*i)+1:((i+2)*frame_rate_1)) ...
        .* triangle_30ms;
end

44
Matlab Code-Perform the Fast Fourier Transform

% Create FFT
for i = 1:loop_time
    window_data_fft(:,i) = fft(window_data(:,i),1024);
end

45
Matlab Code-Recreate the Modified Signal

% Recreate Original Signal
% Initialize the recreated signals
reconstructed_signal(1:(loop_time+1)*frame_rate_1) = 0;
real_reconstructed_signal(1:(loop_time+1)*frame_rate_1) = 0;
modified_reconstructed_signal(1:(loop_time+3)*(frame_rate_1/speed)) = 0;
modified_reconstructed_signal_compressed(1:(loop_time+3)*(frame_rate_1/speed)) = 0;

46
Matlab Code-Recreate the Modified Signal

% Perform the ifft
for i = 1:loop_time
    % Full complex spectrum (STFT) and magnitude only (STFTM)
    recreated_data_ifft(:,i) = ifft(window_data_fft(:,i),1024);
    real_recreated_data_ifft(:,i) = ifft(abs(window_data_fft(:,i)),1024);
    % Drop the zero padding and scale by frame rate / window energy
    truncated_recreated_data_ifft(:,i) = ...
        recreated_data_ifft(1:window_size_1,i) .* (frame_rate_1/W0);
    real_truncated_recreated_data_ifft(:,i) = ...
        real_recreated_data_ifft(1:window_size_1,i) .* (frame_rate_1/W0);
end

47
Matlab Code-Recreate the Modified Signal

% Get back to the original signal
for i = 0:loop_time-1
    % Samples covered by frame i
    idx = (frame_rate_1*i)+1:((i+2)*frame_rate_1);
    reconstructed_signal(idx) = reconstructed_signal(idx) ...
        + truncated_recreated_data_ifft(:,i+1)';
    real_reconstructed_signal(idx) = real_reconstructed_signal(idx) ...
        + real_truncated_recreated_data_ifft(:,i+1)';
end

48
Matlab Code-Recreate the Modified Signal

% Get a modified signal by deleting certain parts (STFT)
for i = 0:(loop_time-1)/speed
    % Output frame i is built from input frame i*speed
    idx = (frame_rate_1*i)+1:((i+2)*frame_rate_1);
    modified_reconstructed_signal(idx) = modified_reconstructed_signal(idx) ...
        + real_truncated_recreated_data_ifft(:,i*speed+1)';
end

49
Matlab Code-Recreate the Modified Signal

% Initialize the compressed sequence (STFTM)
modified_reconstructed_signal_compressed(1:frame_rate_1+frame_rate_1/speed+1) = ...
    truncated_recreated_data_ifft(frame_rate_1-frame_rate_1/speed:window_size_1,1)';
% Get a modified signal by compressing
for i = 0:(loop_time-2)
    % Frames are placed frame_rate_1/speed samples apart
    idx = (frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1;
    modified_reconstructed_signal_compressed(idx) = ...
        modified_reconstructed_signal_compressed(idx) ...
        + real_truncated_recreated_data_ifft(:,i+2)';
end

50
Matlab Code-Plot Results

% Plot Results
figure; subplot(211);
plot(sample_data);
title('Original Speech'); v1 = axis;
hold on; subplot(212);
plot(real(modified_reconstructed_signal));
title(['STFT Synthesis w/ Speed ',num2str(speed),'X']); v2 = axis;
% Use the same axis limits on both plots
if speed > 1
    subplot(211); axis(v1);
    subplot(212); axis(v1);
else
    subplot(211); axis(v2);
    subplot(212); axis(v2);
end

51
Matlab Code-Write Sound Files

% Write sound files
wavwrite(modified_reconstructed_signal,sample_rate,nbits, ...
    'C:\Classes\ECE_5525\tea party fairbanks 2x.wav');

52
Examples Baseline Samples
STFT Sound file
Sample Rate 2X
Original File
STFTM Sound file
Sample Rate .5X
53
Examples: STFT, Speed 0.5X
Sound file
54
Examples: STFT, Speed 2X
Sound file
55
Examples: STFT, Speed 4X
Sound file
56
Examples: STFTM, Speed 0.5X
Sound file
57
Examples: STFTM, Speed 2X
Sound file
58
Examples: STFTM, Speed 4X
Sound file
59
More Results

Change in window size
If the window size becomes too small, then a
change in pitch will occur
Need window to be 2 to 3 pitch periods long
I generally used 20 to 30 ms windows
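
The window length rule can be turned into numbers. A sketch assuming a pitch of 100 Hz (a hypothetical value for the speaker) and the 10,000 samples/sec rate mentioned for the sound file:

```matlab
fs = 10000;                         % sample rate of the speech file
pitch_hz = 100;                     % assumed pitch of the speaker (hypothetical)

period_samples = fs / pitch_hz;               % samples per pitch period
window_samples = [2 3] * period_samples;      % 2 to 3 pitch periods
window_ms = 1000 * window_samples / fs;       % same range in milliseconds

fprintf('window: %d to %d samples (%g to %g ms)\n', ...
    window_samples(1), window_samples(2), window_ms(1), window_ms(2));
```

A higher-pitched voice has a shorter period, so the same rule would allow a shorter window.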

60
More Results

Change in frame rate
If the frame rate decreases too much, then there
will be too many samples overlapping to get an
intelligible signal

61
More Results

Change filter (window) type
Tried Hamming: not much perceptual difference
Using the window energy becomes important here
Frame Rate/W0 is not equal to one
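
The scale factor Frame Rate/W0 can be compared for the two windows. A sketch assuming a window of 200 samples and a frame rate of 100 samples; the windows are written out by hand (matching the even-length triang and symmetric hamming definitions) so that no toolbox is required:

```matlab
N = 200;                            % window size in samples
L = 100;                            % frame rate (frame spacing) in samples

n = (1:N)';
w_tri = 1 - abs(2*n - N - 1) / N;               % triangular window, even N
w_ham = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));   % symmetric Hamming window

W0_tri = sum(w_tri);                % window energy term for each window
W0_ham = sum(w_ham);

gain_tri = L / W0_tri;              % Frame Rate / W0
gain_ham = L / W0_ham;

fprintf('triangular: W0 = %.2f, gain = %.4f\n', W0_tri, gain_tri);
fprintf('Hamming:    W0 = %.2f, gain = %.4f\n', W0_ham, gain_ham);
```

With the triangular window the factor works out to one, so leaving it out goes unnoticed; with the Hamming window it is below one and the overlap-added output would be too loud without it.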

62
Conclusion

Optimum area
Frame rate is one half of the window size
Window size needs to be 2 to 3 pitch periods long
It is possible to easily change the time scale
and still maintain the original pitch although
the result is not always natural sounding

63
Conclusion

Further investigation
What to do when you want to slow down over half.
Using the STFTM means there will be gaps between
the sequences

64
Conclusion

Further investigation
What to do when you want to slow down over half
Could replicate windowed segments

65
Conclusion

Further investigation
Use the other methods to determine quality
Implement Sinusoidal Synthesis
Implement Linear Predictive Synthesis using
linear prediction and homomorphic methods
Work on synchronizing pitch periods
Shift samples so that the peaks line up
Scott and Gerber
Synchronized Overlap and Add (SOLA)
Cross-correlation of two samples to find peak
Use the peaks to line up samples
Align the window at same relative location within
a pitch period
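
The cross-correlation step can be sketched on a synthetic periodic signal. This is only an illustration of finding the alignment lag, not a full synchronized overlap-add implementation; the period, segment positions, and search range are assumptions, and the correlation is written as a loop to avoid toolbox functions:

```matlab
period = 50;                        % assumed pitch period in samples
x = sin(2*pi*(0:399)'/period);      % hypothetical periodic signal

ref  = x(1:100);                    % segment already placed in the output
cand = x(94:293);                   % next segment, starting mid-period

max_lag = 20;                       % search range in samples, assumed
scores = zeros(max_lag + 1, 1);
for lag = 0:max_lag
    seg = cand(lag+1:lag+numel(ref));   % candidate shifted by 'lag'
    scores(lag+1) = sum(ref .* seg);    % cross-correlation at this lag
end

[~, idx] = max(scores);             % peak of the cross-correlation
best_lag = idx - 1;                 % shift that lines up the pitch periods

fprintf('best alignment lag: %d samples\n', best_lag);
```

The candidate segment begins 7 samples before a period boundary, and the peak of the correlation recovers that offset.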

66
Questions

Are there any questions?

67
References

Quatieri, Thomas F. Discrete-Time Speech Signal
Processing. Prentice Hall, Upper Saddle River,
NJ, 2002.
Rabiner, L.R. and Schafer, R.W. Digital
Processing of Speech Signals. Prentice Hall,
Upper Saddle River, NJ, 1978.
Oppenheim, A.V. and Schafer, R.W. Digital Signal
Processing. Prentice Hall, Englewood Cliffs, NJ,
1975.
Scott, R. and Gerber, S. "Pitch Synchronous
Time-Compression of Speech," Proc. Conf. Speech
Communications Processing, pp. 63-85, April 1972.

68
References

Fairbanks, G., Everitt, W.L., and Jaeger, R.P.
"Method for Time or Frequency Compression-Expansion
of Speech," IEEE Transaction Audio and
Electroacoustics, vol. AU-2, pp. 7-12, Jan 1954.
