Speech/Audio Signal Processing in MATLAB/Simulink


1
Speech/Audio Signal Processing in MATLAB/Simulink, 2006
  • J.-S. Roger Jang
  • CS Dept, Tsing-Hua Univ, Taiwan
  • http://www.cs.nthu.edu.tw/~jang
  • jang@cs.nthu.edu.tw

2
About Me
  • Experiences
  • 1993-1995 The MathWorks, Inc.
  • 1995-now CS Dept., Tsing Hua Univ., Taiwan
  • Research interests
  • Speech/Audio Signal Processing, Fuzzy Logic,
    Neural Networks, Pattern Recognition, Biometric
    Identification, Document Classification,
    Web-based Technologies
  • Programming languages
  • MATLAB, C, JavaScript, VBScript, Perl

3
Outline
  • Wave file manipulation
  • Reading, writing, recording ...
  • Time-domain processing
  • Delay, filtering, sptool
  • Frequency-domain processing
  • Spectrogram
  • Pitch determination
  • Auto-correlation, SIFT, AMDF, HPS ...
  • Others
  • Formant estimation, speech coding

4
Toolbox/Blockset Used
  • MATLAB
  • Simulink
  • Signal Processing Toolbox
  • DSP Blockset

5
MATLAB Primer
  • Before you start, you need to be familiar with
    MATLAB. Please read the MATLAB Primer at the
    following page:
  • http://neural.cs.nthu.edu.tw/jang/demo/demoDownload.asp
  • Exercise
  • Plot two curves, y = sin(2t) and y = cos(3t),
    in the same figure.
  • Plot x vs. y, where x = sin(2t) and
    y = cos(3t).

6
To Read a Wave File
  • To read an MS .wav file (PCM format only): wavread
  • y = wavread(file)
  • y = wavread(file, [n1, n2])
  • [y, fs, nbits, opts] = wavread(file)
  • y = wavread(file, n)
  • [y, fs, nbits] = wavread(file)
  • If the wav file is stereo, y will be a two-column
    matrix.
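In newer MATLAB releases, audioread and audiowrite take over the role of wavread and wavwrite. The following is a minimal sketch of a write-then-read round trip under that assumption. The synthetic stereo tone, the temporary file name, and the sample range are illustration values, not files or settings from the slides.

```matlab
% Round trip: write a short synthetic stereo tone, then read it back.
% Assumption: audiowrite/audioread/audioinfo are available (newer MATLAB
% releases); on the older releases used in these slides, wavwrite/wavread
% play this role.
fs = 8000;                                    % sampling rate in Hz
t = (0:fs-1)'/fs;                             % one second of time stamps
y = 0.5*[sin(2*pi*440*t), sin(2*pi*660*t)];   % two columns = stereo
wavFile = [tempname, '.wav'];                 % hypothetical temporary file name
audiowrite(wavFile, y, fs, 'BitsPerSample', 16);

[yAll, fsRead] = audioread(wavFile);          % whole file
yPart = audioread(wavFile, [101 200]);        % samples 101 through 200 only
hdr = audioinfo(wavFile);                     % header information, including bit depth

fprintf('fs = %d Hz, %d bits, size = %d x %d\n', ...
    fsRead, hdr.BitsPerSample, size(yAll, 1), size(yAll, 2));
delete(wavFile);                              % clean up the temporary file
```

The two-column result mirrors the stereo case described above: each column is one channel.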

7
To Read a Wav File
  • Example (wavRead01.m)
  • [y, fs] = wavread('singapore.wav');
  • plot((1:length(y))/fs, y);
  • xlabel('Time in seconds');
  • ylabel('Amplitude');
  • Exercise
  • Plot the waveform of rrrrr.wav. Use MATLAB's
    zoom button to find where the consecutive curling R
    sounds occur.
  • Plot the two-channel waveform in flanger.wav.

8
Solution to the Previous Exercise
  • wavRead02.m
  • [y, fs] = wavread('flanger.wav');
  • subplot(2,1,1), plot((1:length(y))/fs, y(:,1));
  • subplot(2,1,2), plot((1:length(y))/fs, y(:,2));

9
To Play Wav Files
  • To play sound using the Windows audio output device:
    wavplay, sound, soundsc
  • wavplay(y, fs)
  • wavplay(y, fs, 'async'): non-blocking call
  • wavplay(y, fs, 'sync'): blocking call
  • sound(y, fs)
  • soundsc(...): autoscales the sound before playing
  • Example (wavPlay01.m)
  • [y, fs] = wavread('rrrrr.wav');
  • wavplay(y, fs);
  • Exercise
  • Follow the example to play flanger.wav.

10
To Read/Play Using DSP Blocks
  • To read/play sound using DSP Blockset
  • DSP Blockset/DSP Sources/From Wave File
  • DSP Blockset/DSP Sinks/To Wave Device
  • Example
  • Exercise
  • Create a model as shown above.

Frame-based operation!
11
Solution
  • Solution to the previous exercise
  • slWavFilePlay01.mdl

12
To Write a Wave File
  • To write MS wave files: wavwrite
  • wavwrite(y, fs, nbits, wavefile)
  • nbits must be 8 or 16.
  • y must have two columns for stereo data.
  • Amplitude values outside [-1, 1] are clipped.
  • Example (wavWrite01.m)
  • [y, fs] = wavread('rrrrr.wav');
  • wavwrite(y, fs*1.2, 8, 'testout.wav');
  • !start testout.wav
  • Exercise
  • Try out the above example.
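The idea behind wavWrite01.m (saving the same samples with a sampling rate 1.2 times higher) can be sketched on newer MATLAB releases as follows. This assumes audiowrite and audioinfo are available. The synthetic tone, the 0.99 scaling, and the temporary file name are illustration choices, not taken from the slides.

```matlab
% Save a tone with a sampling rate 1.2 times higher and check the duration.
% Assumption: audiowrite and audioinfo are available; older releases would
% use wavwrite instead.
fs = 8000;
t = (0:2*fs-1)'/fs;                           % two seconds
y = 1.5*sin(2*pi*440*t);                      % deliberately exceeds [-1, 1]
y = 0.99*y/max(abs(y));                       % rescale to just inside [-1, 1] to avoid clipping
outFile = [tempname, '.wav'];                 % hypothetical temporary file name
audiowrite(outFile, y, round(1.2*fs), 'BitsPerSample', 8);
hdr = audioinfo(outFile);
fprintf('Original: %.2f s, written file: %.2f s\n', length(y)/fs, hdr.Duration);
delete(outFile);                              % clean up the temporary file
```

Because the sample count is unchanged, the file plays back shorter and at a higher pitch.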

13
To Record a Wave File
  • To record wave files
  • 1. Use the recording utility under WinXP.
  • 2. Use wavrecord under MATLAB.
  • 3. Use From Wave Device under Simulink, under
    DSP Blocksets/Platform Specific IO/Windows
    (Win32)
  • Example
  • 1. Go ahead and try WinXP recording utility!
  • 2. Try wavRecord01.m
  • 3. Try slWavFileRecord01.mdl
  • Exercise
  • Try out the above examples.

14
Time-Domain Speech Signals
  • A typical time-domain plot of speech signals
  • Amplitude: volume or intensity
  • Frequency: pitch

15
Changing Wave Playback Param.
  • To control the play of a sound
  • Normal: wavplay(y, fs)
  • High volume: wavplay(2*y, fs)
  • Low volume: wavplay(0.5*y, fs)
  • High pitch (and faster): wavplay(y, 1.2*fs)
  • Low pitch (and slower): wavplay(y, 0.8*fs)
  • Exercise
  • Try wavPlay01.m and trace the code.
  • Create wavPlay02.m such that you can record
    your own voice on the fly.

16
Time-Domain Signal Processing
  • Take-home exercise
  • How to get a high pitch with the same time span?

17
Synthetic Sounds
  • Use a sine wave generator (under DSP blocksets)
    to produce sounds
  • Single frequency
  • Multiple frequencies
  • Amplitude modulation
  • Exercise
  • Create the above models.
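The same three kinds of signal can also be generated directly in MATLAB, without the Simulink sine wave block. This is only a sketch. The frequencies, the 5 Hz modulation rate, and the variable names are assumptions made for illustration.

```matlab
% Three synthetic signals in plain MATLAB: single frequency, multiple
% frequencies, and amplitude modulation. All numeric values are arbitrary.
fs = 8000;                                    % sampling rate in Hz
t = (0:fs-1)'/fs;                             % one second

tone1 = sin(2*pi*440*t);                              % single frequency
tone2 = (sin(2*pi*440*t) + sin(2*pi*660*t))/2;        % two frequencies, scaled to stay in [-1, 1]
toneAM = (0.5 + 0.5*sin(2*pi*5*t)).*sin(2*pi*440*t);  % 5 Hz amplitude modulation of a 440 Hz carrier

% Locate the strongest spectral peak of the single tone as a sanity check.
mag = abs(fft(tone1));
[~, idx] = max(mag(1:fs/2));
peakHz = (idx - 1)*fs/length(tone1);
fprintf('Strongest component: %g Hz\n', peakHz);
% To listen: sound(tone1, fs); sound(tone2, fs); sound(toneAM, fs);
```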

18
Solution
  • Solution to the previous exercise
  • sineSource01
  • sineSource02
  • sineSource03

19
Delay in Speech/Audio
  • What is a delay in a signal?
  • y(n) --> y(n-k)
  • What effects can delay generate?
  • Echo
  • Reverberation
  • Chorus
  • Flanging

20
Single Delay in Audio Signal
  • Block diagram: input u(n), gain a, output y(n)
  • y(n) = u(n) + a u(n-k)
  • Simulink model
  • Exercise
  • Create the above model.
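A single delay added back to the input is a two-tap FIR filter, so it can be tried in MATLAB with filter. The sketch below uses a unit impulse as input so the echo is easy to see. The gain and delay are illustration values, not taken from the Simulink model.

```matlab
% Single echo: y(n) = u(n) + a*u(n-k), written as an FIR filter.
% The gain and delay are illustration values, not taken from the slides.
fs = 8000;                                    % sampling rate in Hz
a = 0.5;                                      % echo gain
k = round(0.25*fs);                           % delay in samples (0.25 s)
u = [1; zeros(fs - 1, 1)];                    % unit impulse as a test input
b = [1, zeros(1, k - 1), a];                  % coefficients: 1 at lag 0, a at lag k
y = filter(b, 1, u);
echoIdx = find(y ~= 0);                       % the impulse plus one echo
fprintf('Nonzero outputs at n = %s\n', mat2str(echoIdx' - 1));
```

Replacing u with a recorded signal gives an audible echo 0.25 s after the original.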
21
Multiple Delay in Audio Signal
  • How to create karaoke effects

  • Block diagram: input u(n), feedback gain a, output y(n)
  • y(n) = u(n) + a u(n-k) + a^2 u(n-2k) + a^3 u(n-3k) + ...
  • Simulink model
22
Multiple Delay in Audio Signal
  • Parameter values
  • Feedback gain: a < 1
  • Actual delay time: k/fs
  • Exercise
  • Create the above model and change some parameters
    to see their effects.
  • Modify the model to take microphone input (so you
    can start singing karaoke now!)
  • Use a configurable subsystem to include all
    possible input files and the microphone. (See
    next page.)
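The series of repeated echoes is what a feedback loop y(n) = u(n) + a y(n-k) produces, which is a recursive (IIR) filter. A minimal MATLAB sketch, with an impulse as input and illustrative values for a and k:

```matlab
% Repeated echoes via feedback: y(n) = u(n) + a*y(n-k), which expands to
% u(n) + a*u(n-k) + a^2*u(n-2k) + ...  Values below are illustrative only.
fs = 8000;
a = 0.6;                                      % feedback gain, must satisfy |a| < 1
k = 1000;                                     % delay in samples, i.e. k/fs = 0.125 s
u = [1; zeros(4*k, 1)];                       % unit impulse as a test input
den = [1, zeros(1, k - 1), -a];               % denominator of the feedback (IIR) filter
y = filter(1, den, u);
echoes = y(1:k:end)';                         % samples at n = 0, k, 2k, ...
disp(echoes);                                 % successive powers of a
```

With a gain of 1 or more the echoes would never die out, which is why the feedback gain has to stay below 1.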

23
Multiple Delay in Audio Signal
  • How to use configurable subsystem block?
  • 1. Create a library (say, wavinput.mdl)
  • 2. Get a block of configurable subsystem
  • 3. Fill the dialog box with the library name

24
Audio Flanging
  • Flanging sound
  • A sound similar to the sound of a jet plane
    flying overhead, or a "whooshing" sound
  • Pitch modulation due to a variable delay
  • Simulink demo
  • dspafxf.mdl (all platforms)
  • dspafxf_nt.mdl (for 95/98/NT)
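Outside Simulink, the variable-delay idea can be sketched in a few lines of MATLAB. This is not the dspafxf.mdl model. It is a simplified illustration with an integer delay swept by a slow sine, and every parameter value is an assumption.

```matlab
% Flanger sketch: mix the input with a copy whose delay varies slowly,
% y(n) = u(n) + g*u(n - d(n)). All parameter values are assumptions.
fs = 8000;
t = (0:2*fs-1)'/fs;
u = sin(2*pi*440*t);                          % test input; replace with real audio
g = 0.7;                                      % mix gain of the delayed copy
maxDelay = round(0.003*fs);                   % sweep the delay between 0 and 3 ms
rate = 0.5;                                   % sweep rate in Hz
d = round(maxDelay/2*(1 + sin(2*pi*rate*t))); % delay in samples at each time step
n = (1:length(u))';
src = max(n - d, 1);                          % index of the delayed sample (clamped at the start)
y = u + g*u(src);
y = y/max(abs(y));                            % normalize to avoid clipping on playback
% To listen: sound(y, fs);
```

Rounding the delay to whole samples keeps the sketch short. A smoother sweep would interpolate between samples.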

25
Audio Flanging
  • Simulink model

  • Figure panels: original spectrogram, modified spectrogram
26
Signal Processing Using sptool
  • To invoke sptool, type sptool.

27
Speech Production
  • How is speech produced?
  • Speech is produced when air is forced from the
    lungs through the vocal cords (glottis) and along
    the vocal tract.
  • Analogy to System Theory
  • Input: air forced into the vocal cords
  • Output: media vibration
  • System (or filter): vocal tract
  • Pitch frequency: frequency of the input
  • Formant frequency: resonant frequency

28
Source Filter Model of Speech
  • The source-filter model of speech production
  • Speech is split into a rapidly varying excitation
    signal and a slowly varying filter. The envelope
    of the power spectra contains the vocal tract
    information.

Two important characteristics of the model are
fundamental (pitch) frequency (f0) and formants
(F1, F2, F3, ...)
29
Frame Analysis of Speech Signal
  • Figure: speech waveform, zoomed in to show frames and the overlap between neighboring frames
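Frame analysis cuts the waveform into short, overlapping segments that are then processed one at a time. A minimal sketch of the splitting step follows. The frame size, the overlap, and the test tone are assumptions, since the slide gives no numbers.

```matlab
% Split a signal into overlapping frames, one frame per column.
% Frame size and overlap are illustration values (assumptions).
fs = 8000;
s = sin(2*pi*200*(0:fs-1)'/fs);               % one second of a 200 Hz tone
frameSize = 256;
overlap = 128;                                % samples shared by neighboring frames
hop = frameSize - overlap;                    % shift between frame starts
numFrames = floor((length(s) - overlap)/hop);
frames = zeros(frameSize, numFrames);
for i = 1:numFrames
    first = (i - 1)*hop + 1;                  % index of the first sample of frame i
    frames(:, i) = s(first:first + frameSize - 1);
end
fprintf('%d frames of %d samples each\n', numFrames, frameSize);
```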
30
Spectrogram
  • Spectrogram (specgram.m) displays short-time
    frequency contents

  • Figure panels: waveform, spectrogram
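What a spectrogram computes can be shown without any toolbox: window each frame, take its FFT, and stack the magnitudes as columns. The sketch below does this for a synthetic chirp. The frame size, hop, and test signal are assumptions. In newer releases of the Signal Processing Toolbox the function is named spectrogram rather than specgram.

```matlab
% Short-time spectrum by hand: window each frame, take its FFT, stack the
% magnitudes as columns. Frame size, hop, and the test signal are assumptions.
fs = 8000;
t = (0:fs-1)'/fs;
x = sin(2*pi*(500*t + 750*t.^2));             % chirp: frequency rises from 500 Hz to 2000 Hz
frameSize = 256;
hop = 128;
win = 0.54 - 0.46*cos(2*pi*(0:frameSize-1)'/(frameSize - 1));   % Hamming window
numFrames = floor((length(x) - frameSize)/hop) + 1;
S = zeros(frameSize/2 + 1, numFrames);
for i = 1:numFrames
    seg = x((i - 1)*hop + (1:frameSize)');    % samples of frame i
    X = fft(seg.*win);
    S(:, i) = abs(X(1:frameSize/2 + 1));      % keep 0 .. fs/2
end
freqs = (0:frameSize/2)'*fs/frameSize;        % frequency of each row in Hz
[~, peakRow] = max(S);                        % strongest bin in each frame
peakHz = freqs(peakRow);
fprintf('Peak moves from %.0f Hz to %.0f Hz\n', peakHz(1), peakHz(end));
% To display: imagesc((0:numFrames-1)*hop/fs, freqs, 20*log10(S + eps)); axis xy;
```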
31
Real-time Spectrogram
  • Try dspstfft_win32

  • Figure panels: spectrogram, spectrum
32
Pitch and Formants
  • Pitch and formants can be defined visually

  • Figure labels: pitch period = 1/f0, first formant F1, second formant F2
33
Spectrogram Reading
  • Spectrogram Reading
  • http://cslu.cse.ogi.edu/tutordemos/SpectrogramReading/spectrogram_reading.html
  • Figure: a waveform and the spectrogram computed from it
34
Pitch Determination Algorithms
  • Time-domain
  • Auto-correlation
  • AMDF (Average Magnitude Difference Function)
  • Gold-Rabiner algorithm (1969)
  • Frequency-domain
  • Cepstrum (Noll 1964)
  • Harmonic product spectrum (Schroeder 1968)
  • Others
  • SIFT (Simple inverse filter tracking)
  • Maximum likelihood
  • Neural network approach
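Of the time-domain methods listed, AMDF is the quickest to sketch: for each candidate lag, average the absolute difference between the frame and its shifted copy, then take the lag with the deepest valley. The test signal and the search range below are assumptions made for illustration.

```matlab
% AMDF pitch estimate for one frame: average |s(n) - s(n-h)| over the overlap,
% then pick the lag with the deepest valley. Signal and search range are assumptions.
fs = 8000;
f0 = 200;                                     % true pitch of the synthetic frame
n = (0:511)';
s = sin(2*pi*f0*n/fs) + 0.5*sin(2*pi*2*f0*n/fs);   % fundamental plus one harmonic
minLag = round(fs/400);                       % search pitch between 125 Hz ...
maxLag = round(fs/125);                       % ... and 400 Hz
amdf = zeros(maxLag, 1);
for h = 1:maxLag
    amdf(h) = mean(abs(s(h + 1:end) - s(1:end - h)));
end
[~, idx] = min(amdf(minLag:maxLag));
period = idx + minLag - 1;                    % pitch period in samples
fprintf('Estimated pitch: %.1f Hz\n', fs/period);
```

The search range is kept narrow on purpose. AMDF also dips at multiples of the true period, so a wider range can return half the true pitch.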

35
Autocorrelation of Each Frame
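  • Let s(k) be a frame of size 128.
  • Shift the frame by h samples to get s(k-h); the autocorrelation x(h) is the dot product of the overlapping parts.
  • Example with h = 30: x(30) = sum(s(31:128).*s(1:98))
  • Plotted against h, the autocorrelation x(h) peaks at the pitch period (h = 30 in the figure).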
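The dot-product definition translates into a short MATLAB loop. The sketch below builds a synthetic frame of size 128 with a period of 30 samples and recovers that period from the autocorrelation peak. The pulse-train test signal, the lag range, and the sampling rate are assumptions.

```matlab
% Autocorrelation pitch estimate for one frame of size 128, using the
% dot product of the overlapping parts. The synthetic frame is an assumption.
fs = 8000;
period = 30;                                  % true pitch period in samples
pulses = zeros(128, 1);
pulses(1:period:end) = 1;                     % one pulse every 30 samples
s = filter(1, [1, -0.8], pulses);             % decaying response after each pulse
maxLag = 64;
x = zeros(maxLag, 1);
for h = 1:maxLag
    x(h) = sum(s(h + 1:128).*s(1:128 - h));   % overlap after shifting by h
end
minLag = 10;                                  % skip the large values near h = 0
[~, idx] = max(x(minLag:maxLag));
pitchPeriod = idx + minLag - 1;
fprintf('Pitch period = %d samples (%.1f Hz)\n', pitchPeriod, fs/pitchPeriod);
```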
36
Autocorrelation via DSP Blockset
  • Real-time autocorrelation demo
  • Exercise
  • Construct the above model and try it.

37
Pitch Tracking via Autocorrelation
  • Real-time pitch tracking via autocorrelation:
    pitch2.mdl

38
Formant Analysis
  • Characteristics of formants
  • Formants are perceptually defined.
  • The corresponding physical property is the
    frequencies of resonances of the vocal tract.
  • Formant analysis is useful as the position of the
    first two formants pretty much identifies a
    vowel.
  • Computation methods
  • Peak picking on the smoothed spectrum
  • Peak picking on the LP spectrum
  • Factoring for the LP roots
  • Fitting of mixture of Gaussians
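As an illustration of the "factoring for the LP roots" method, the sketch below fits a linear predictor by the autocorrelation method and converts the root angles to frequencies. The test signal is the impulse response of a synthetic two-resonance filter, not real speech, and all numeric values are assumptions. The normal equations are solved with base MATLAB. The Signal Processing Toolbox function lpc could be used for that step instead.

```matlab
% Formant estimate by factoring the LP polynomial (autocorrelation method).
% The two-resonance test signal and all numeric values are assumptions.
fs = 8000;
trueF = [700, 1200];                          % resonance frequencies in Hz
bw = 100;                                     % bandwidth of each resonance in Hz
r = exp(-pi*bw/fs);                           % pole radius for that bandwidth
poles = r*exp(1i*2*pi*trueF/fs);
A = real(poly([poles, conj(poles)]));         % all-pole "vocal tract" filter 1/A(z)
x = filter(1, A, [1; zeros(1023, 1)]);        % impulse response as a stand-in for a speech frame

p = 4;                                        % LP order: two poles per formant
ac = zeros(p + 1, 1);
for h = 0:p
    ac(h + 1) = sum(x(h + 1:end).*x(1:end - h));   % autocorrelation at lag h
end
a = [1; -(toeplitz(ac(1:p)) \ ac(2:p + 1))];  % solve the normal equations
z = roots(a);
z = z(imag(z) > 0);                           % one root per conjugate pair
formants = sort(angle(z)*fs/(2*pi))';         % root angle -> frequency in Hz
fprintf('Estimated formants: %s Hz\n', mat2str(round(formants)));
```

Real speech frames would normally be windowed and use a higher LP order, and roots with large bandwidths would be discarded.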

39
Formant Analysis
  • Track Draw
  • A package for formant synthesis with options to
    sketch formant tracks on a spectrogram.
  • http://www.utdallas.edu/assmann/TRACKDRAW/trackdraw.html
  • Formant Location Algorithm
  • MATLAB code by Michelle Jamrozik
  • http://ece.clemson.edu/speech/files.htm

40
Speech Waveform Coding
  • Time domain coding
  • PCM: Pulse Code Modulation
  • DPCM: Differential PCM
  • ADPCM: Adaptive Differential PCM (dspadpcm.mdl)
  • Frequency domain coding
  • Sub-band coding
  • Transform coding
  • Speech Coding in MATLAB
  • http://www.eas.asu.edu/speech/education/educ1.html
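To make the difference between PCM and DPCM concrete, here is a minimal DPCM sketch. Instead of quantizing each sample, it quantizes the difference between the sample and the previous reconstructed sample. The first-order predictor, the step size, and the test tone are assumptions for illustration. This is not the dspadpcm.mdl demo.

```matlab
% DPCM sketch: quantize the difference between each sample and the previous
% reconstructed sample. Step size and test signal are assumptions.
fs = 8000;
x = 0.8*sin(2*pi*200*(0:799)'/fs);            % 0.1 s test tone
stepSize = 0.02;                              % quantizer step size
codes = zeros(size(x));                       % integer codes that would be transmitted
xHat = zeros(size(x));                        % decoder output, tracked by the encoder too
prev = 0;                                     % prediction = previous reconstructed sample
for n = 1:length(x)
    codes(n) = round((x(n) - prev)/stepSize); % quantized prediction error
    xHat(n) = prev + codes(n)*stepSize;       % reconstruction shared by both ends
    prev = xHat(n);
end
fprintf('Max |code| = %d, max error = %.4f\n', max(abs(codes)), max(abs(x - xHat)));
```

The codes stay much smaller than direct quantization of x with the same step size would need, which is where the bit savings come from. ADPCM additionally adapts the step size over time.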

41
Conclusions
  • Ideal tools for speech/audio signal processing
  • MATLAB
  • Simulink
  • Signal Processing Toolbox
  • DSP Blockset
  • Advantages
  • Reliable functions: well-established and tested
  • Visible graphical algorithm design tools
  • High-level programming language yet C-compatible
  • Powerful visualization capabilities
  • Easy debugging
  • Integrated environment

42
References
  • [1] Discrete-Time Processing of Speech Signals,
    by Deller, Proakis and Hansen, Prentice
    Hall, 1993
  • [2] Fundamentals of Speech Recognition, by
    Rabiner and Juang, Prentice Hall, 1993
  • [3] Effects Explained,
    http://www.harmony-central.com/Effects/effects-explained.html
  • [4] TrackDraw,
    http://www.utdallas.edu/assmann/TRACKDRAW/trackdraw.html
  • [5] Speech Coding in MATLAB,
    http://www.eas.asu.edu/speech/education/educ1.html