Microcomputer Systems 2 - PowerPoint PPT Presentation

Provided by: vkepuska

1
Microcomputer Systems 2
  • Time Stretching & Pitch Shifting of Audio Signals

2
Time Stretching & Pitch Shifting of Audio Signals
  • Outline
  • Introduction
  • Techniques Used for Time Compression/Expansion
    and Pitch Shifting
  • Comparison
  • Timbre and Formants

3
Outline
  • Introduction
  • Frequency Shift vs. Pitch Shift Audio Examples
  • Time Compression/Expansion
  • Techniques Used for Time Compression/Expansion
    and Pitch Shifting
  • The Phase Vocoder
  • Related Topics
  • Why Phase
  • Time Domain Harmonic Scaling (TDHS)
  • More recent approaches
  • Comparison
  • Which Method to Use
  • Pitch Shifting Considerations
  • Audio Examples
  • Timbre and Formants
  • Phase Vocoder and Formants
  • Time Domain Harmonic Scaling and Formants

4
Introduction
  • Time Stretching & Pitch Shifting
  • Are two dominant techniques used for speech
    and sound manipulation.
  • Typical applications entail
  • Changing the speed of play-back (altering the
    length of the signal) without altering the pitch
    of the voice and/or instruments
  • Changing the pitch of the voice and/or
    instruments without changing the length of the
    signal.

5
Pitch Shifting
6
Pitch Shifting
  • As opposed to the process of pitch transposition
    achieved using (a simple) sample rate conversion,
    Pitch Shifting is a way to change the pitch of a
    signal without changing its length.
  • In practical applications, this is usually
    achieved by changing the length of a sound using
    one of the methods discussed next and then
    performing a sample rate conversion to change the
    pitch.

7
Introduction
  • Pitch Shifting is NOT Frequency Shifting
  • There exists a certain confusion in terminology
    in the literature, as Pitch Shifting is often
    also incorrectly named 'Frequency Shifting'.
  • A true Frequency Shift (as obtainable by
    modulating an analytic signal by a complex
    exponential) will shift the spectrum of a sound,
    while
  • Pitch Shifting will dilate it, upholding the
    harmonic relationship of the sound.
  • Frequency Shifting yields a metallic, inharmonic
    sound which may well be an interesting special
    effect but which is a totally inadequate process
    for changing the pitch of any harmonic sound
    except a single sine wave.
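
The difference between shifting and dilating a spectrum can be checked with a few numbers. The following is a minimal sketch, assuming a 100 Hz tone with four harmonics, a 50 Hz shift and a ratio of 1.5 (all illustrative values that do not come from the slides); `is_harmonic` is a hypothetical helper defined only for this example.

```matlab
% Harmonic partials of a 100 Hz tone (illustrative values)
f0 = 100;
partials = f0 * (1:4);            % 100 200 300 400 Hz

% Frequency shifting adds the same offset to every partial
shift_hz = 50;
shifted = partials + shift_hz;    % 150 250 350 450 Hz

% Pitch shifting multiplies every partial by the same ratio
ratio = 1.5;
scaled = partials * ratio;        % 150 300 450 600 Hz

% A harmonic series has partials at integer multiples of the lowest one
is_harmonic = @(f) all(abs(f / f(1) - round(f / f(1))) < 1e-9);

fprintf('frequency shifted is harmonic: %d\n', is_harmonic(shifted));
fprintf('pitch shifted is harmonic:     %d\n', is_harmonic(scaled));
```

After the shift, the partials are no longer integer multiples of the lowest one, which is the inharmonic result described above; after scaling they still are.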

8
Audio Examples of Pitch Shifting vs. Frequency
Shifting
  • Original Sound
  • Pitch Shifted
  • Frequency Shifted

9
Time Compression/Expansion
10
Time Compression/Expansion
  • Time Compression/Expansion, also known as "Time
    Stretching" is the reciprocal process to Pitch
    Shifting.
  • It leaves the pitch of the signal intact while
    changing its speed (tempo).
  • This is a useful application when you wish to
    change the speed of a voiceover without messing
    with the timbre of the voice.

11
Time Compression/Expansion
  • There are several fairly good methods to do time
    compression/expansion and pitch shifting but most
    of them will not perform well on all different
    kinds of signals and for any desired amount of
    shift/stretch ratio.
  • Typically, good algorithms allow pitch shifting
    up to 5 semitones on average or stretching the
    length by 130%.
  • When time stretching and pitch shifting single
    instrument recordings you might even be able to
    achieve a 200% time stretch, or a one-octave
    pitch shift with no audible loss in quality.
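
To relate semitones to frequency ratios, the sketch below assumes equal-tempered tuning, where one semitone is a ratio of 2^(1/12). That tuning assumption is ours; the slides do not state it.

```matlab
semitones = [5 12];                    % five semitones and one octave
ratio = 2 .^ (semitones / 12);         % equal-tempered frequency ratios (assumed tuning)

for i = 1:numel(semitones)
    fprintf('%2d semitones: frequency ratio %.4f\n', semitones(i), ratio(i));
end
```

If a pitch shift is implemented as a time stretch followed by sample rate conversion, as described on slide 6, the stretch factor equals this ratio. A five-semitone shift therefore corresponds to a length ratio of about 1.33, and an octave to a ratio of 2.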

12
Time Compression/Expansion of Speech
  • Typical Goals
  • To either speed up or slow down a speech signal
    while maintaining the approximate pitch
  • Applications
  • Change voice mail playback
  • Court stenographers: play proceedings back faster
  • Sound effects
  • Etc

13
Techniques Used for Time Compression/Expansion
and Pitch Shifting
  • Option 1: Change sample rate
  • If you modify the sample rate, you can change the
    speed, but the pitch also changes
  • Increase sample rate => higher pitch (chipmunk
    sound)
  • Decrease sample rate => lower pitch (drawn out
    echo sound)
  • Option 2: Decimate or Interpolate Signal
  • If you change the number of samples, the result
    is the same as modifying the sample rate
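
A minimal sketch of Option 2, assuming an 8 kHz sample rate and a 200 Hz test tone (both illustrative). Dropping every second sample halves the duration and, when the result is played at the same rate, doubles the frequency. A practical decimator would low-pass filter first; that step is omitted here.

```matlab
fs = 8000;                        % sample rate in Hz (assumed)
n  = 0:fs-1;                      % one second of samples
x  = sin(2*pi*200*n/fs);          % 200 Hz test tone (assumed)

y  = x(1:2:end);                  % naive decimation: keep every second sample

% Strongest FFT bin of each signal, both interpreted at sample rate fs
X = abs(fft(x));
[~, kx] = max(X(1:floor(numel(x)/2)));
fx = (kx - 1) * fs / numel(x);

Y = abs(fft(y));
[~, ky] = max(Y(1:floor(numel(y)/2)));
fy = (ky - 1) * fs / numel(y);

fprintf('original:  %.2f s, peak at %.0f Hz\n', numel(x)/fs, fx);
fprintf('decimated: %.2f s, peak at %.0f Hz\n', numel(y)/fs, fy);
```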

14
Techniques Used for Time Compression/Expansion
and Pitch Shifting
  • Option 3: Use more complex methods
  • This will change the speed of the sample while
    preserving the pitch data
  • Short Time Fourier Transform
  • Short Time Fourier Transform Magnitude
  • Sinusoidal Synthesis
  • Linear Prediction Synthesis

15
Techniques Used for Time Compression/Expansion
and Pitch Shifting
  • Currently, there are two different principal time
    compression/expansion and pitch shifting schemes
    employed in most of today's applications:
  • Phase Vocoder.
  • Time Domain Harmonic Scaling (TDHS).

16
Phase Vocoder
17
Phase Vocoder
  • Phase Vocoder. This method was introduced by
    Flanagan and Golden in 1966 and digitally
    implemented by Portnoff ten years later.
  • Portnoff, M.R. 1981a. "Short-Time Fourier
    Analysis of Sampled Speech." IEEE Transactions on
    Acoustics, Speech and Signal Processing ASSP-29(3):
    364-373.
  • Portnoff, M.R. 1981b. "Time-Scale Modification of
    Speech Based on Short-Time Fourier
    Analysis." IEEE Transactions on Acoustics, Speech
    and Signal Processing ASSP-29(3): 374-390.

18
Phase Vocoder
  • It uses a Short Time Fourier Transform
    (abbreviated STFT from here on) to convert the
    audio signal to the complex Fourier
    representation.
  • Since the STFT returns the frequency domain
    representation of the signal at a fixed frequency
    grid, the actual frequencies of the partial bins
    have to be found by converting the relative phase
    change between two STFT outputs to actual
    frequency changes.
  • Note: the term 'partial' has nothing to do with
    the signal harmonics. In fact, an STFT will never
    readily give you any information about true
    harmonics unless you match the STFT length
    to the fundamental frequency of the signal, and
    even then the frequency domain resolution is
    quite different from what our ear and auditory
    system perceive.
  • The timebase of the signal is changed by
    calculating the frequency changes in the Fourier
    domain on a different time basis, and then an
    inverse STFT (iSTFT) is done to regain the time
    domain representation of the signal.
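
The conversion from a phase change between two STFT frames to an actual frequency can be sketched as follows. This is an illustration under assumed values (8 kHz sample rate, 1024-point frames, a hop of 256 samples, a 440 Hz cosine and a Hann window computed by hand), not the implementation of any particular phase vocoder.

```matlab
fs = 8000;  N = 1024;  H = 256;          % sample rate, frame length, hop (assumed)
f_true = 440;                            % test frequency lying between two bins (assumed)
n = (0:N+H-1)';
x = cos(2*pi*f_true*n/fs);
w = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);      % Hann window, computed directly

X1 = fft(x(1:N) .* w);                   % frame starting at sample 0
X2 = fft(x(H+1:H+N) .* w);               % frame starting H samples later

[~, idx] = max(abs(X1(1:N/2)));          % strongest bin (1-based index)
k = idx - 1;                             % zero-based bin number

expected = 2*pi*k*H/N;                   % phase advance of the bin centre over one hop
dphi = angle(X2(idx)) - angle(X1(idx)) - expected;
dphi = mod(dphi + pi, 2*pi) - pi;        % wrap the deviation to [-pi, pi)

f_bin = k * fs / N;                      % frequency of the fixed grid point
f_est = (k/N + dphi/(2*pi*H)) * fs;      % frequency implied by the phase change

fprintf('bin centre %.2f Hz, estimate %.2f Hz\n', f_bin, f_est);
```

The grid alone places the tone at 437.5 Hz; the phase deviation between the two frames recovers a value close to 440 Hz. The estimate is unambiguous only while the true frequency stays within fs/(2H) of the bin centre.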

19
Phase Vocoder
  • Phase vocoder algorithms are used mainly in
    scientific and educational software products (to
    show the use and limitations of the Fourier
    Transform) but have gained in popularity over the
    past few years due to improvements that made it
    possible to greatly reduce the artifacts of the
    "original" phase vocoder algorithm.
  • The basic phase vocoder suffers from a severe
    drawback because it introduces a considerable
    amount of artifacts audible as 'smearing' and
    'reverberation' (even at low expansion ratios)
    due to the non-synchronized vertical coherence
    of the sine and cosine basis functions that are
    used to change the timebase.

20
Phase Vocoder
  • Puckette, Laroche and Dolson have shown that the
    phasiness can be greatly reduced by picking peaks
    in the Fourier spectrum and keeping the relative
    phases around the peaks unchanged. Even though
    this improves the quality considerably it still
    renders the result somewhat phasey and diffuse
    when compared to time domain methods.
  • Current research focuses on improving the phase
    vocoder by applying intra-frame sinusoidal sweep
    and ramp rate correction (Bristow-Johnson and
    Bogdanowicz) and multi-resolution phase vocoder
    concepts (Bonada).

21
Links to Publicly Available Vocoders
  • Pointers - Phase Vocoder
  • The MIT Lab Phase Vocoder
  • WaveMasher - GPL/Open Source Phase Vocoder by
    Kenneth Sturgis
  • Sculptor A Real Time Phase Vocoder by Nick
    Bailey
  • A Phase Vocoder implementation using Matlab
  • More reading on the Phase Vocoder
  • The IRCAM "Super Phase Vocoder"
  • S.M.Bernsee's Pitch Shifting Using The Fourier
    Transform article (with C code)

22
Time Domain Harmonic Scaling (TDHS).
23
Time Domain Harmonic Scaling (TDHS).
  • Time Domain Harmonic Scaling (TDHS). This is
    based on a method proposed by Rabiner and Schafer
    in 1978. It relies heavily on a correct
    estimate of the fundamental frequency of the
    sound being processed.
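
One common way to estimate a fundamental frequency is to look for the strongest autocorrelation peak. The slides do not say which estimator TDHS implementations use, so the following is only a sketch under assumed values: a synthetic two-harmonic tone at 125 Hz, a 10 kHz sample rate and a 60 to 400 Hz search range.

```matlab
fs = 10000;  f0 = 125;                   % assumed test values: period = 80 samples
n = (0:799)';
x = sin(2*pi*f0*n/fs) + 0.5*sin(2*pi*2*f0*n/fs);   % fundamental plus one harmonic

min_lag = round(fs/400);                 % shortest period searched (400 Hz, assumed)
max_lag = round(fs/60);                  % longest period searched (60 Hz, assumed)

r = zeros(max_lag, 1);                   % autocorrelation; lags below min_lag stay 0
for lag = min_lag:max_lag
    r(lag) = sum(x(1:end-lag) .* x(1+lag:end));
end

[~, best_lag] = max(r);                  % lag with the strongest self-similarity
f0_est = fs / best_lag;

fprintf('estimated period %d samples, f0 = %.1f Hz\n', best_lag, f0_est);
```

On real speech the peak is less clean than on this synthetic tone, and octave errors (picking twice or half the true period) are a known failure mode of simple estimators like this one.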

24
Theory
  • Short Time Fourier Transform Methods
  • Chapter 7 in our text (Discrete-Time Speech
    Signal Processing)
  • Refer to the class notes for the mathematical
    theory of operation
  • I will pick up from where Dr. Kepuska stopped in
    his notes

25
How is the Speech/Sound Signal Processed
  • Link:
    Ch7-Short-Time_Fourier_Transform_Analysis_and_Synthesis.ppt

26
Terminology: Basic Idea
[Figure: windowed signal segments, labeled "Frame Rate" and "Window Size"]
27
Short Time Fourier Transform
  • Short Time Fourier Transform
  • Also called the Fairbanks method
  • Extract successive short-time segments and then
    discard the following ones

28
Short Time Fourier Transform
  • Frame Rate factor: $L$
  • In the frequency domain, after taking the STFT, you
    get
  • $X(nL, \omega)$
  • Form a new signal by
  • $Y(nL, \omega) = X(snL, \omega)$
  • where $s$ = compression factor
  • Take the Inverse Fourier Transform
  • Use the Overlap and Add method to form the new signal
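
The steps on this slide can be sketched in a few lines: keep every s-th analysis frame and overlap-add the kept frames at the original frame rate. The window size of 200 and frame rate of 100 match the values in the Matlab slides later in this deck; the Hann window, the 100 Hz test tone and all variable names are assumptions made for this example. The tone is chosen so that one hop spans a whole number of periods, which sidesteps the pitch synchronization problem described on slide 31.

```matlab
fs = 10000;  L = 100;  N = 2*L;  s = 2;  % sample rate, frame rate, window size, compression
x = sin(2*pi*100*(0:fs-1)'/fs);          % one second of a 100 Hz tone (assumed)

w = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);      % Hann window; copies at hop N/2 sum to 1
num_in  = floor((numel(x) - N)/L) + 1;   % analysis frames available
num_out = floor((num_in - 1)/s) + 1;     % frames kept: 0, s, 2s, ...

y = zeros((num_out - 1)*L + N, 1);
for n = 0:num_out-1
    src = s*n*L + (1:N);                 % analysis position s*n*L
    dst = n*L + (1:N);                   % synthesis position n*L
    X = fft(x(src) .* w);                % frame taken at time s*n*L
    y(dst) = y(dst) + real(ifft(X));     % placed at time n*L, then overlap-added
end

fprintf('input %d samples, output %d samples\n', numel(x), numel(y));
```

The output is about 1/s as long as the input. With this particular tone, the fully overlapped part of `y` equals the original sine, so the pitch is unchanged. For signals whose period does not divide the hop, adjacent frames no longer line up.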

29
Short Time Fourier Transform
[Figure: $X(nL, \omega)$ and $Y(nL, \omega) = X(2nL, \omega)$]
30
Short Time Fourier Transform
[Figure: "New Sequence" compared with "Original Windowed Sequence"]
31
Short Time Fourier Transform
  • Problems
  • Pitch Synchronization
  • It is highly likely that the pitch periods will
    not line up properly

32
Short Time Fourier Transform Magnitude
  • Short Time Fourier Transform Magnitude
  • Problems with the STFT method relate directly to the
    linear phase component of the STFT
  • Time shift => phase change
  • An alternate approach is to use only the magnitude
    portion of the STFT: the Short Time Fourier Transform
    Magnitude (STFTM)
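
The link between a time shift and a linear phase term can be verified numerically. This is a small sketch with an arbitrary 64-sample test sequence and a circular delay of 5 samples (both assumed): the delay leaves the DFT magnitude untouched and multiplies bin k by exp(-j*2*pi*k*d/N).

```matlab
N = 64;  d = 5;                          % sequence length and delay in samples (assumed)
x = exp(-(0:N-1)'/10);                   % arbitrary test sequence (decaying exponential)
y = circshift(x, d);                     % circular delay by d samples

X = fft(x);
Y = fft(y);
k = (0:N-1)';

mag_err   = max(abs(abs(Y) - abs(X)));                 % magnitudes should match
phase_err = max(abs(Y - X .* exp(-1j*2*pi*k*d/N)));    % phase should differ linearly in k

fprintf('magnitude error %.2e, linear-phase error %.2e\n', mag_err, phase_err);
```

Both errors are at the level of floating-point rounding, which is why discarding the phase makes the representation insensitive to where a frame sits relative to the pitch period.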

33
Short Time Fourier Transform Magnitude
  • Compression
  • With the Fairbanks method, time slices were
    discarded
  • Now we can just compress the time slices
  • Form a new signal by
  • $Y(nM, \omega) = X(nL, \omega)$, where
  • $M$ = compression factor = $L$ / speed
  • i.e. for speeding up by two => $M = L/2$
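
The effect of the synthesis spacing M on the output can be tabulated directly. The window size of 200 and frame rate of 100 are the values used in the Matlab slides later in this deck; the frame count of 99 and the list of speeds are assumptions chosen for illustration.

```matlab
L = 100;                                 % analysis frame rate in samples
N = 200;                                 % window size in samples
num_frames = 99;                         % number of analysis frames (assumed)

speeds  = [4 2 0.5 0.25];
M       = L ./ speeds;                   % synthesis spacing for each speed
out_len = (num_frames - 1) .* M + N;     % length of the overlap-added output
overlap = N - M;                         % samples shared by adjacent frames

for i = 1:numel(speeds)
    fprintf('speed %.2f: M = %3d, output = %5d samples, overlap = %4d\n', ...
            speeds(i), M(i), out_len(i), overlap(i));
end
```

At half speed the frames just touch (overlap 0), and at anything slower the overlap becomes negative, meaning gaps open between consecutive frames. This is the situation raised under "Further investigation" in the conclusion slides.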

34
Short Time Fourier Transform Magnitude
  • Compression
  • Take Inverse Fourier Transform
  • Use Overlap and Add method to form new signal

35
Short Time Fourier Transform Magnitude
[Figure: $X(nL, \omega)$ and $Y(nM, \omega) = X(nL, \omega)$ with $M = L/2$]
36
Short Time Fourier Transform Magnitude
[Figure: "New Sequence" compared with "Original Windowed Sequence"]
37
Other Methods
  • Sinusoidal Synthesis (Chapter 9)
  • Time-warp the sinewave frequency track and the
    amplitude function
  • This technique has been successful with not only
    speech but also music, biological, and mechanical
    signals
  • Problems
  • Does not maintain the original phase relations
  • Suffers from reverberance

38
Other Methods
  • Linear Prediction Synthesis
  • Use Homomorphic and Linear Prediction results to
    modify the time base
  • The book briefly mentions that this is possible,
    but I ran out of time before I could investigate
    this process further

39
Other Methods
  • New Techniques
  • Internet search showed several methods trying to
    improve on what is out there now
  • Software
  • Different software programs that will change
    speed for you
  • Adobe Audition is one of the most all
    encompassing right now

40
Matlab Code-Prepare the Workspace
  • % Prepare Workspace
  • close all
  • clear all
  • window_size_1 = 200;
  • frame_rate_1 = 100;
  • % Speed factor (2 plays back twice as fast)
  • speed = 2;

41
Matlab Code-Load the Speech Signal
  • % Load Data File
  • filename = input('Please enter the file name to be used. ');
  • [sample_data, sample_rate, nbits] = wavread(filename);
  • loop_time = floor(max(size(sample_data))/frame_rate_1);
  • sample_data((max(size(sample_data))):(loop_time+1)*frame_rate_1) = 0;

42
Matlab Code-Develop the Window
  • % Create Windows
  • % Want windows of 25ms
  • % File sampled at 10,000 samples/sec
  • % Want a window of size 10000 * 25ms (10ms)
  • triangle_30ms = triang(window_size_1);
  • triangle_30ms = hamming(window_size_1);
  • W0 = sum(triangle_30ms);

43
Matlab Code-Window the Entire Speech Signal
  • % Window the speech
  • for i = 0:loop_time-1
  • window_data(:,i+1) = sample_data((frame_rate_1*i)+1:((i+2)*frame_rate_1)).*triangle_30ms;
  • end

44
Matlab Code-Perform the Fast Fourier Transform
  • % Create FFT
  • for i = 1:loop_time
  • window_data_fft(:,i) = fft(window_data(:,i),1024);
  • end

45
Matlab Code-Recreate the Modified Signal
  • % Recreate Original Signal
  • % Initialize the recreated signals
  • reconstructed_signal(1:(loop_time+1)*frame_rate_1) = 0;
  • real_reconstructed_signal(1:(loop_time+1)*frame_rate_1) = 0;
  • modified_reconstructed_signal(1:(loop_time+3)*(frame_rate_1/speed)) = 0;
  • modified_reconstructed_signal_compressed(1:(loop_time+3)*(frame_rate_1/speed)) = 0;

46
Matlab Code-Recreate the Modified Signal
  • % Perform the ifft
  • for i = 1:loop_time
  • recreated_data_ifft(:,i) = ifft(window_data_fft(:,i),1024);
  • real_recreated_data_ifft(:,i) = ifft(abs(window_data_fft(:,i)),1024);
  • truncated_recreated_data_ifft(:,i) = recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
  • real_truncated_recreated_data_ifft(:,i) = real_recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
  • end

47
Matlab Code-Recreate the Modified Signal
  • % Get back to the original signal
  • for i = 0:loop_time-1
  • reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + truncated_recreated_data_ifft(:,i+1)';
  • real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + real_truncated_recreated_data_ifft(:,i+1)';
  • end

48
Matlab Code-Recreate the Modified Signal
  • % Get a modified signal by deleting certain parts (STFT)
  • for i = 0:(loop_time-1)/speed
  • modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + real_truncated_recreated_data_ifft(:,i*speed+1)';
  • end

49
Matlab Code-Recreate the Modified Signal
  • % Initialize the compressed sequence (STFTM)
  • modified_reconstructed_signal_compressed(1:frame_rate_1+frame_rate_1/speed+1) = truncated_recreated_data_ifft(frame_rate_1-frame_rate_1/speed:window_size_1,1)';
  • % Get a modified signal by compressing
  • for i = 0:(loop_time-2)
  • modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) = modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) + real_truncated_recreated_data_ifft(:,i+2)';
  • end

50
Matlab Code-Plot Results
  • % Plot Results
  • figure; subplot(211)
  • plot(sample_data)
  • title('Original Speech'); v1 = axis;
  • hold on; subplot(212)
  • plot(real(modified_reconstructed_signal))
  • title(['STFT Synthesis w/ Speed ',num2str(speed),'X']); v2 = axis;
  • if speed > 1
  • subplot(211); axis(v1)
  • subplot(212); axis(v1)
  • else
  • subplot(211); axis(v2)
  • subplot(212); axis(v2)
  • end

51
Matlab Code-Write Sound Files
  • % Write sound files
  • wavwrite(modified_reconstructed_signal,sample_rate,nbits,'C:\Classes\ECE_5525\tea party fairbanks 2x.wav');

52
Examples: Baseline Samples
STFT Sound file
Sample Rate 2X
Original File
STFTM Sound file
Sample Rate .5X
53
Examples: STFT, Speed 0.5X
Sound file
54
Examples: STFT, Speed 2X
Sound file
55
Examples: STFT, Speed 4X
Sound file
56
Examples: STFTM, Speed 0.5X
Sound file
57
Examples: STFTM, Speed 2X
Sound file
58
Examples: STFTM, Speed 4X
Sound file
59
More Results
  • Change in window size
  • If the window size becomes too small, then a
    change in pitch will occur
  • Need window to be 2 to 3 pitch periods long
  • I generally used 20-30 ms windows
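
The rule of thumb can be turned into numbers. The 10 kHz sample rate is the one quoted in the Matlab slides; the 100 Hz pitch is an assumed example value.

```matlab
fs = 10000;                              % samples per second, as quoted in the Matlab slides
pitch_hz = 100;                          % example speaking pitch (assumed)

period      = fs / pitch_hz;             % samples per pitch period
win_samples = [2 3] * period;            % window spanning 2 to 3 pitch periods
win_ms      = 1000 * win_samples / fs;   % the same span in milliseconds

fprintf('window: %d to %d samples (%d to %d ms)\n', ...
        win_samples(1), win_samples(2), win_ms(1), win_ms(2));
```

For this pitch the range is 200 to 300 samples, or 20 to 30 ms. A higher-pitched voice has a shorter period, so the same rule gives a shorter window.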

60
More Results
  • Change in frame rate
  • If the frame rate decreases too much, then there
    will be too many samples overlapping to get an
    intelligible signal

61
More Results
  • Change filter type
  • Tried Hamming: not much perceptual difference
  • Using the window energy becomes important here
  • Frame Rate/W0 is not equal to one
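
Why the Frame Rate/W0 factor matters can be seen by overlap-adding the window alone. The sketch below assumes a periodic Hamming window written out by hand so that no toolbox is needed. Matlab's `hamming(N)` returns the symmetric form, for which the result is only approximately constant.

```matlab
N = 200;  L = 100;                       % window size and frame rate from the slides
w = 0.54 - 0.46*cos(2*pi*(0:N-1)'/N);    % periodic Hamming window (assumed form)
W0 = sum(w);                             % window sum, here 108

ola = zeros(10*L + N, 1);                % overlap-add of 11 shifted window copies
for n = 0:10
    idx = n*L + (1:N);
    ola(idx) = ola(idx) + w;
end

steady = ola(N+1:end-N);                 % region covered by two windows everywhere
fprintf('raw overlap sum %.4f, scaled by L/W0 %.4f\n', ...
        steady(1), steady(1) * L / W0);
```

The overlapped Hamming windows add up to 1.08 rather than 1, so the reconstructed signal would be about 8% too loud without the correction. Multiplying by L/W0 = 100/108 restores unity gain.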

62
Conclusion
  • Optimum area
  • Frame rate is one half of the window size
  • Window size needs to be 2 to 3 pitch periods long
  • It is possible to easily change the time scale
    and still maintain the original pitch although
    the result is not always natural sounding

63
Conclusion
  • Further investigation
  • What to do when you want to slow down over half.
  • Using the STFTM means there will be gaps between
    the sequences

64
Conclusion
  • Further investigation
  • What to do when you want to slow down over half
  • Could replicate windowed segments

65
Conclusion
  • Further investigation
  • Use the other methods to determine quality
  • Implement Sinusoidal Synthesis
  • Implement Linear Predictive Synthesis using
    linear prediction and homomorphic methods
  • Work on synchronizing pitch periods
  • Shift samples so that the peaks line up
  • Scott and Gerber: Synchronized Overlap and Add
    (SOLA)
  • Cross-correlation of two samples to find peak
  • Use the peaks to line up samples
  • Align the window at same relative location within
    a pitch period
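
The alignment step can be sketched as a search for the shift that maximizes the normalized cross-correlation between the end of the output built so far and the start of the next segment. This illustrates the idea only, under assumed values (a 125 Hz tone at 10 kHz, a 100-sample overlap, a search of plus or minus 40 samples). It is not the published SOLA algorithm.

```matlab
fs = 10000;  f0 = 125;                   % assumed: pitch period of 80 samples
x = sin(2*pi*f0*(0:999)'/fs);

ov        = 100;                         % overlap length in samples (assumed)
tail      = x(301:300+ov);               % end of the output built so far
nominal   = 450;                         % nominal start of the next segment
max_shift = 40;                          % search range: half a period each way

best = -Inf;  best_k = 0;
for k = -max_shift:max_shift
    cand = x(nominal + k + (0:ov-1));
    c = (tail' * cand) / (norm(tail) * norm(cand));   % normalized cross-correlation
    if c > best
        best = c;
        best_k = k;
    end
end

fprintf('best shift %d samples, correlation %.4f\n', best_k, best);
```

Starting the next segment at `nominal + best_k` puts it at the same position within a pitch period as `tail`, so the two can be cross-faded without the pitch periods colliding.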

66
Questions
  • Are there any questions?

67
References
  • Quatieri, Thomas E. Discrete-Time Speech Signal
    Processing. Prentice Hall, Upper Saddle River,
    NJ, 2002.
  • Rabiner, L.R. and Schafer, R.W. Digital
    Processing of Speech Signals. Prentice Hall,
    Upper Saddle River, NJ, 1978.
  • Oppenheim, A.V and Schafer, R.W. Digital Signal
    Processing. Prentice Hall, Englewood Cliffs, NJ,
    1975.
  • Scott, R. and Gerber, S. Pitch Synchronous
    Time-Compression of Speech, Proc. Conf. Speech
    Communications Processing, p63-85, April 1972.

68
References
  • Fairbanks, G., Everitt, W.L., and Jaeger, R.P.
    Method for Time or Frequency Compression-Expansion
    of Speech, IEEE Transactions on Audio and
    Electroacoustics, vol. AU-2, pp. 7-12, Jan 1954.

69
Reference Material
  • http://www.dspdimension.com/ (Stephan M. Bernsee)