Wide/Narrow Band Spectrograms

Slides: 41
Provided by: Harv55
Learn more at: http://cs.sou.edu
Transcript and Presenter's Notes


1
Wide/Narrow Band Spectrograms
  • Wide band (left)
  • Combines harmonics
  • Voiced speech vocal fold pulses (glottis air
    puffs) show as vertical lines
  • Narrow band (right)
  • Individual harmonics
  • Narrow-band displays formants horizontally
  • No vocal pulses shown
  • Display parameters
  • Generally log power, $\log(\text{amplitude}^2)$
  • Frame shift 1 ms typical

Spectrogram for a vowel sound
Spectrograms: vowel with varying pitch
2
Frame Positioning
  • Pitch-synchronous
  • Centered around a pitch period
  • Varied size frames
  • Unvoiced sections assume fixed pitch period
  • Challenge: determine exact pitch period locations
  • Pitch-asynchronous
  • Fixed frames and shifts
  • typically 25-30 ms frame width with a 10 ms frame
    shift
  • Tradeoffs
  • Too large: the frame contains more than one phoneme
  • Too small: F0 and the harmonics cannot be determined
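The pitch-asynchronous scheme above can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the function name `frame_signal`, the 16 kHz sample rate, and the placeholder signal are assumptions chosen for the example, while the 25 ms width and 10 ms shift come from the slide.

```python
def frame_signal(samples, sample_rate, width_ms=25.0, shift_ms=10.0):
    """Split samples into fixed-width, overlapping frames (pitch-asynchronous)."""
    width = int(round(sample_rate * width_ms / 1000.0))  # samples per frame
    shift = int(round(sample_rate * shift_ms / 1000.0))  # samples between frame starts
    frames = []
    start = 0
    # Only keep complete frames; a trailing partial frame is dropped.
    while start + width <= len(samples):
        frames.append(samples[start:start + width])
        start += shift
    return frames


# Placeholder input (assumption): one second of dummy samples at 16 kHz.
signal = list(range(16000))
frames = frame_signal(signal, 16000)
```

At 16 kHz each frame holds 400 samples and consecutive frames start 160 samples apart, so neighboring frames overlap by 60 percent.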

3
Source Filter Separation
  • Source: F0, which correlates with pitch and intonation
  • Filter: the spectral envelope
  • Three separation approaches: filter bank,
    cepstral analysis, and linear prediction
  • Importance: spectrum and pitch need to be studied
    separately

4
Filter Bank
  • Time Domain
  • Series of linear band pass filters
  • Frequency Domain
  • Window a frame (e.g., Hamming)
  • Perform Fourier Transform
  • Warp frequencies (e.g., Mel scale)
  • Compute weighted sum of each bin
  • Advantage
  • simple and robust for finding spectral envelope
  • Okay for ASR (unless language is tonal)
  • Disadvantage
  • Lose too much detail to find pitch.
  • Peaks can fall between harmonics, so not good for TTS
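The last two frequency-domain steps (warp the frequencies, then take a weighted sum of the bins) might look like the sketch below. It assumes the frame has already been windowed and transformed into a power spectrum. The mel formula used here is one common convention rather than something stated on the slide, and the triangular filter shape and the name `mel_filter_bank` are also assumptions.

```python
import math


def hz_to_mel(f_hz):
    # One widely used mel-scale formula (an assumption; the slide gives none).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(power_spectrum, sample_rate, num_filters):
    """Weighted sum of power-spectrum bins under triangular, mel-spaced filters."""
    num_bins = len(power_spectrum)      # bins are assumed to cover 0 .. sample_rate / 2
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # num_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    energies = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        total = 0.0
        for b, power in enumerate(power_spectrum):
            f = b * nyquist / (num_bins - 1)
            if lo < f <= mid:           # rising slope of the triangle
                total += power * (f - lo) / (mid - lo)
            elif mid < f < hi:          # falling slope of the triangle
                total += power * (hi - f) / (hi - mid)
        energies.append(total)
    return energies


# A flat placeholder spectrum: wider (higher) filters collect more energy.
energies = mel_filter_bank([1.0] * 257, 16000, 10)
```

Because each output is a broad weighted sum, individual harmonics are smoothed away, which is why the slide notes that too much detail is lost to find pitch.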

5
The Cepstrum
  • Definition: $c_n = F^{-1}\{\log |F\{x_n\}|\}$
  • Note: Sometimes the cepstrum is taken on the
    square of the spectrum rather than on the log of
    the spectrum
  • Treat the spectrum as a wave
  • Formant frequency is slow
  • Glottal pulses are fast
  • Cepstrum separates the two
  • Cepstral Terminology
  • "Cepstrum" is "spectrum" with its first four letters reversed
  • Quefrency instead of frequency
  • Lifter instead of filter

6
Separating Source from Vocal Filter
  • Source
  • Excites particular fundamental frequencies
  • The glottis source sometimes is noisy
  • Filter
  • The source is filtered resulting in vocal tract
    resonances
  • Goal: separate excitation frequencies from the
    filter
  • Process
  • In the time domain, the source is convolved with the filter ($u_n * v_n$)
  • Convolution becomes multiplication in the frequency domain
    ($U \cdot V$)
  • The log converts multiplication to a sum ($\log(UV) =
    \log(U) + \log(V)$)
  • The V (filter) varies slowly; the U (excitation)
    varies quickly.
  • The inverse transform separates $u_n$ and $v_n$
    into different quefrencies
  • Observations
  • There are no pitch excitations in unvoiced speech
  • Cepstral analysis works well for speech
    recognition applications
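The process above can be tried on a synthetic frame. The sketch below is an idealized illustration under several assumptions: a direct (slow) DFT is used to stay dependency-free, the excitation is a perfect impulse train with a 40-sample period, the four-tap `vocal_filter` is made up, and a tiny floor is added before the log to avoid `log(0)`.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame):
    """Inverse transform of the log magnitude spectrum."""
    log_mag = [math.log(abs(s) + 1e-12) for s in dft(frame)]
    return [c.real for c in dft(log_mag, inverse=True)]


def pitch_period(cepstrum, min_q, max_q):
    """Quefrency (in samples) of the largest cepstral value in a search range."""
    return max(range(min_q, max_q + 1), key=lambda q: cepstrum[q])


# Source: impulse train (period 40). Filter: a short made-up impulse response.
period, n_samples = 40, 240
vocal_filter = [1.0, 0.8, 0.5, 0.2]
frame = [0.0] * n_samples
for start in range(0, n_samples, period):
    for i, h in enumerate(vocal_filter):
        frame[start + i] += h       # convolution of the source with the filter

cepstrum = real_cepstrum(frame)
estimated_period = pitch_period(cepstrum, 20, 60)
```

The filter contributes only to the lowest quefrencies, while the excitation shows up as a peak at the pitch period, so searching a plausible pitch range recovers the 40-sample period.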

7
Cepstrum Process Illustration
Spectral envelope on the left, F0 is one of the
excitations
8
Cepstrum Samples
Note: Band passing frequencies below 100 or
greater than 900 can help
9
Cepstral Mean Normalization (CMN)
For Automatic Speech Recognition
  • For each window we perform a Cepstral analysis
  • Mel scaled Quefrencies summed into 13 to 39 bins
  • Each window t yields a cepstral vector $x_t$, giving the
    sequence $X = x_0, x_1, \ldots, x_{T-1}$
  • Compute the mean of each vector coefficient:
    $\mu_k = \frac{1}{T} \sum_{t=0}^{T-1} x_t[k]$, where k indexes a vector coefficient
  • Subtract $\mu_k$ from coefficient k of each vector $x_t$
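The mean-and-subtract steps can be written directly. In this sketch, `vectors` is assumed to be a list of T cepstral vectors (one per window), each with the same number of coefficients; the function name and the tiny example values are illustrative.

```python
def cepstral_mean_normalize(vectors):
    """Subtract the per-coefficient mean (taken over all windows) from every vector."""
    num_windows = len(vectors)
    num_coeffs = len(vectors[0])
    means = [sum(v[k] for v in vectors) / num_windows for k in range(num_coeffs)]
    return [[v[k] - means[k] for k in range(num_coeffs)] for v in vectors]


# Two windows with two coefficients each (made-up values).
normalized = cepstral_mean_normalize([[1.0, 10.0], [3.0, 14.0]])
```

After normalization every coefficient has zero mean across the utterance, which removes any constant offset that is the same in all windows.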

10
Cepstral Evaluation
  • The Cepstral process eliminates phase data.
    However, human perception largely, but not
    totally, ignores phase
  • Use the lower quefrencies to study the vocal
    filter
  • Use the peak to study pitch and glottis behavior
  • Zeroing the pitch portion of the Cepstrum and
    transforming back to the frequency domain is an
    approach for speech recognition
  • Disadvantage of cepstra: they are difficult to
    interpret using a visual plot

11
Time Domain Pitch Detection
  • Recall the autocorrelation pitch detection
    algorithm
  • Correlate a window of speech with a previous
    window
  • Find the best match
  • Problem: too many false peaks
  • Peak and center clipping
  • Algorithm to reduce false peaks
  • clip the top/bottom of a signal
  • Center the remainder around 0
  • Other alternatives
  • Researchers propose many other pitch detection
    algorithms
  • There is much debate as to which is the best
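A rough sketch of autocorrelation pitch detection with center clipping follows. The clipping rule here (zero everything within a threshold of 0 and shift the rest toward 0) is one common form of center clipping and is an assumption about what the slide intends; the 30 percent threshold, the lag range, and the synthetic test signal are also illustrative.

```python
import math


def center_clip(frame, ratio=0.3):
    """Zero samples near 0 and shift the remaining peaks toward 0."""
    limit = ratio * max(abs(s) for s in frame)
    clipped = []
    for s in frame:
        if s > limit:
            clipped.append(s - limit)
        elif s < -limit:
            clipped.append(s + limit)
        else:
            clipped.append(0.0)
    return clipped


def autocorr_pitch_period(frame, min_lag, max_lag):
    """Return the lag whose autocorrelation score is largest."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic voiced frame: a 50-sample period plus its second harmonic.
frame = [math.sin(2 * math.pi * n / 50) + 0.5 * math.sin(4 * math.pi * n / 50)
         for n in range(400)]
period = autocorr_pitch_period(center_clip(frame), 20, 120)
```

Clipping removes the low-amplitude detail between the main peaks, so fewer spurious lags correlate well.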

12
Epoch Detection
  • Simply determining the pitch is not sufficient
    for synthesis
  • Unit selection requires accurate anchors to be
    able to merge segments of speech
  • Otherwise clicks and other artifacts will be
    heard
  • Pitch-marking, or epoch-detection, algorithms attempt to
    accurately mark pitch points
  • Mark peaks or troughs
  • Mark the instant of glottal closure (large negative
    pulse)
  • There are many algorithms proposed, but this
    remains an open research area

13
Linear Prediction Coding (LPC)
  • Originally developed to compress (code) speech
  • Although coding pertains to compression, the term
    LPC has much broader implications in NLP
  • LPC is equivalent to the vocal tract model (Week
    6)
  • LPC is another computational method to:
  • Compute vocal tract reflection coefficients
  • Compute vocal tract filter coefficients
  • LPC is useful for separating source (glottis) from
    filter (vocal tract)

14
Linear Predictive Encoding (LPC)
One approach; there are many others with better
compression
  • Pseudo Code
      WHILE not EOF
        READ sample n (s_n)
        x = prediction()
        error = x - s_n
        IF error too large to fit in compressed size
          WRITE special code
          WRITE s_n
        ELSE
          WRITE error
  • Concept
  • Guess at the next value using a set of previous
    values
  • Instead of outputting the actual data, output the
    error from the guess
  • Fewer bits should be needed if the guess is good
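The pseudo code can be made runnable once a predictor is chosen. The slide leaves `prediction()` unspecified, so the sketch below assumes a simple linear extrapolation from the two previous samples; the `ESCAPE` marker, the `max_error` limit, and the sample values are likewise illustrative.

```python
ESCAPE = "ESC"  # stand-in for the "special code" in the pseudo code


def predict(history):
    """Assumed predictor: extrapolate a straight line through the last two samples."""
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    if history:
        return history[-1]
    return 0


def encode(samples, max_error=7):
    out, history = [], []
    for s in samples:
        error = predict(history) - s
        if abs(error) > max_error:   # error does not fit in the compressed size
            out.append(ESCAPE)
            out.append(s)            # write the raw sample instead
        else:
            out.append(error)
        history.append(s)
    return out


def decode(codes):
    samples = []
    stream = iter(codes)
    for code in stream:
        if code == ESCAPE:
            s = next(stream)         # raw sample follows the escape code
        else:
            s = predict(samples) - code
        samples.append(s)
    return samples


samples = [100, 102, 104, 107, 109, 50, 52]
codes = encode(samples)
```

The decoder runs the same predictor on the samples it has already rebuilt, so only the small errors need to be stored whenever the guess is good.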

15
Linear Algebra Background
  • N equations and P unknowns
  • If N < P, there are infinitely many potential solutions
  • Example: $x + y = 5$
  • Solutions are along the line $y = 5 - x$
  • If N = P, there is at most one unique solution
  • Example: $x + y = 5$ and $x - y = 3$ give $x = 4$,
    $y = 1$
  • If N > P, there is generally no exact solution
  • No solutions for $x + y = 4$, $x - y = 3$, $2x + 7 = 7$
  • The best we can do is find the closest fit

16
Least Squares: minimize error
  • First approach (linear algebra): find orthogonal
    projections of vectors onto the best fit
  • Second approach (calculus): use the derivative with
    zero slope to find the best fit
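The calculus approach can be written compactly. As a sketch, assume the N equations are stacked in matrix form as $A\mathbf{x} \approx \mathbf{b}$; the symbols $A$, $\mathbf{x}$, and $\mathbf{b}$ are introduced here for illustration and do not appear on the slide.

```latex
E(\mathbf{x}) = \lVert A\mathbf{x} - \mathbf{b} \rVert^2, \qquad
\frac{\partial E}{\partial \mathbf{x}} = 2A^T(A\mathbf{x} - \mathbf{b}) = 0
\;\Longrightarrow\; A^T A\,\mathbf{x} = A^T \mathbf{b}
```

Setting the derivative to zero turns N equations in P unknowns into P equations in P unknowns, and $A^T A$ is symmetric.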

17
Solving n equations and n unknowns
  • Gaussian Elimination
  • Complexity $O(n^3)$
  • Successive Iteration
  • Complexity varies
  • Cholesky Decomposition
  • More efficient, still $O(n^3)$
  • Levinson-Durbin
  • Complexity $O(n^2)$
  • Works for symmetric Toeplitz matrices

Definitions for any matrix A:
  • Transpose ($A^T$): replace $a_{ij}$ by $a_{ji}$ for all i and j
  • Symmetric: $A^T = A$
  • Positive definite: $x^T A x > 0$ for every nonzero x, so no complex values arise in the solution
  • Toeplitz: diagonals to the right all have equal values
  • Lower/upper triangular: no nonzero values above/below the diagonal
18
Symmetric Toeplitz Matrices
Example
  • Flipping rows and columns produces the same
    matrix
  • Every diagonal to the right contains the same
    value

19
Levinson Durbin Algorithm
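With $r_0, \ldots, r_4 = 1, 2, 3, 4, 5$ (the values implied by the steps):
Step 0: $E_0 = r_0 = 1$ (initial value)
Step 1: $k_1 = r_1/E_0 = 2$, $E_1 = (1 - k_1^2)E_0 = -3$
Step 2: $k_2 = (r_2 - a_{11}r_1)/E_1 = 1/3$, $E_2 = (1 - k_2^2)E_1 = -8/3$
Step 3: $k_3 = (r_3 - a_{21}r_2 - a_{22}r_1)/E_2 = 1/4$, $E_3 = (1 - k_3^2)E_2 = -5/2$
Step 4: $k_4 = (r_4 - a_{31}r_3 - a_{32}r_2 - a_{33}r_1)/E_3 = 1/5$, $E_4 = (1 - k_4^2)E_3 = -12/5$
Coefficients after each step:
$a_{11} = k_1 = 2$
$a_{21} = a_{11} - k_2 a_{11} = 4/3$, $a_{22} = k_2 = 1/3$
$a_{31} = a_{21} - k_3 a_{22} = 5/4$, $a_{32} = a_{22} - k_3 a_{21} = 0$, $a_{33} = k_3 = 1/4$
$a_{41} = a_{31} - k_4 a_{33} = 6/5$, $a_{42} = a_{32} - k_4 a_{32} = 0$, $a_{43} = a_{33} - k_4 a_{31} = 0$, $a_{44} = k_4 = 1/5$
  • Verify results by plugging $a_{41}, a_{42}, a_{43}, a_{44}$
    back into the equations
  • $\frac{6}{5}(1) + 0(2) + 0(3) + \frac{1}{5}(4) = 2$, $\frac{6}{5}(2) + 0(1) +
    0(2) + \frac{1}{5}(3) = 3$
  • $\frac{6}{5}(3) + 0(2) + 0(1) + \frac{1}{5}(2) = 4$, $\frac{6}{5}(4) + 0(3) +
    0(2) + \frac{1}{5}(1) = 5$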

20
Levinson-Durbin Pseudo Code
  • E[0] = r[0]
  • FOR step = 1 TO P
  •   k[step] = r[step]
  •   FOR i = 1 TO step-1: k[step] = k[step] - a[step-1,i] *
      r[step-i]
  •   k[step] = k[step] / E[step-1]
  •   E[step] = (1 - k[step]^2) * E[step-1]
  •   a[step,step] = k[step]
  •   FOR i = 1 TO step-1: a[step,i] = a[step-1,i] -
      k[step] * a[step-1,step-i]

Note: r[i] are the row 1 matrix coefficients
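The recursion can be checked against the worked example from the previous slide. This sketch uses exact fractions so the results can be compared digit for digit; the function name `levinson_durbin` and its return layout are choices made for the example.

```python
from fractions import Fraction


def levinson_durbin(r, order):
    """Solve sum_k a_k * r[|j-k|] = r[j] for j = 1..order.

    Returns (a, ks, E): final coefficients, the k value from each step,
    and the final error term.
    """
    E = Fraction(r[0])
    a = []    # a[i-1] holds a_{step,i} for the most recent step
    ks = []
    for step in range(1, order + 1):
        k = Fraction(r[step])
        for i in range(1, step):
            k -= a[i - 1] * r[step - i]
        k /= E
        E = (1 - k * k) * E
        # a_{step,i} = a_{step-1,i} - k * a_{step-1,step-i}, then a_{step,step} = k
        a = [a[i - 1] - k * a[step - i - 1] for i in range(1, step)] + [k]
        ks.append(k)
    return a, ks, E


a, ks, E = levinson_durbin([1, 2, 3, 4, 5], 4)
```

Each step reuses the coefficients from the step before, which is where the $O(n^2)$ cost comes from.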
21
Cholesky Decomposition
  • Requirements
  • Symmetric (same matrix if flip rows and columns)
  • Positive definite matrix (no complex solutions)
  • Solution
  • Factor matrix A into $A = LL^T$ where L is lower
    triangular
  • Perform forward substitution to solve $L x = b_k$,
    where $x = L^T a_k$
  • Use the resulting vector x from the above step
    to perform a backward substitution to solve
    $L^T a_k = x$
  • Complexity
  • Factoring step $O(n^3/3)$
  • Forward substitution $O(n^2)$
  • Backward substitution $O(n^2)$

22
Cholesky Factorization
Result
23
Cholesky Factorization Pseudo Code
  • FOR k = 1 TO n-1
  •   l[k,k] = sqrt(a[k,k])
  •   FOR j = k+1 TO n
  •     l[j,k] = a[j,k] / l[k,k]
  •   FOR j = k+1 TO n
  •     FOR i = j TO n
  •       a[i,j] = a[i,j] - l[i,k] * l[j,k]
  • l[n,n] = sqrt(a[n,n])
  • Column index: k
  • Row index: j
  • Elements of matrix A: a[i,j]
  • Elements of matrix L: l[i,j]
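A runnable version of the factorization, together with the forward and backward substitution steps, might look like this. The 3x3 matrix is a standard textbook example chosen because its factor has whole-number entries; it is not taken from the slides.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L * L^T (A symmetric positive definite)."""
    n = len(A)
    a = [row[:] for row in A]            # work on a copy; only the lower triangle is used
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        L[k][k] = math.sqrt(a[k][k])
        for j in range(k + 1, n):
            L[j][k] = a[j][k] / L[k][k]
        for j in range(k + 1, n):
            for i in range(j, n):
                a[i][j] -= L[i][k] * L[j][k]
    return L


def solve_cholesky(A, b):
    """Solve A x = b by forward then backward substitution."""
    L = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):                   # forward: L y = b
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):       # backward: L^T x = y
        x[i] = (y[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))) / L[i][i]
    return x


A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
L = cholesky(A)
x = solve_cholesky(A, [0.0, 6.0, 39.0])
```

Here the loop runs through the last column as well, so the final diagonal entry needs no separate statement.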

24
Illustration Linear Prediction
  • 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
    15, 16

Goal: estimate $y_n$ using the three previous
values: $y_n \approx a_1 y_{n-1} + a_2 y_{n-2} + a_3 y_{n-3}$
Three $a_k$ coefficients, frame size of 16
Thirteen equations and three unknowns
Note: the equation is an IIR filter
25
LPC Basics
  • Predict $y_n$ from $y_{n-1}, \ldots, y_{n-P}$
  • $e_n = y_n - \sum_{k=1}^{P} a_k y_{n-k}$
  • $e_n$ is the error between the prediction and the
    actual value
  • The goal is to find the coefficients that produce
    the smallest $e_n$ values
  • Concept
  • Square the error
  • Take the partial derivative with respect to each
    $a_k$
  • Find the solution with zero derivative (the
    minimum).
  • Result: P equations with P unknowns

26
Finding the Best LPC Estimate
  • One linear prediction equation: $e_n = y_n - \sum_{k=1}^{P} a_k y_{n-k}$
  • Over a whole frame we have N equations and P
    unknowns
  • Square each error and sum over the entire frame:
    $E = \sum_{n=0}^{N-1} e_n^2 = \sum_{n=0}^{N-1} \left(y_n - \sum_{k=1}^{P} a_k y_{n-k}\right)^2$
  • Taking the partial derivative with respect to each $a_j$
    generates P equations ($E_j$)
  • Like a regular derivative, treating only $a_j$ as a
    variable: $E_j = \partial E / \partial a_j = -2 \sum_{n=0}^{N-1} \left(y_n - \sum_{k=1}^{P} a_k y_{n-k}\right) y_{n-j}$
  • Calculus chain rule: if $y = y(u(x))$ then $dy/dx =
    (dy/du)(du/dx)$
  • Set each $E_j$ to zero (zero derivative) to find the
    minimum error: for j = 1 to P, $0 = \sum_{n=0}^{N-1} \left(y_n - \sum_{k=1}^{P} a_k y_{n-k}\right) y_{n-j}$
    (j indicates the equation)
  • Rearrange terms: for each j of the P equations,
    $\sum_{n=0}^{N-1} y_n y_{n-j} = \sum_{n=0}^{N-1} \sum_{k=1}^{P} a_k y_{n-k} y_{n-j} = \sum_{k=1}^{P} a_k \sum_{n=0}^{N-1} y_{n-k} y_{n-j}$,
    that is, $f(j,0) = \sum_{k=1}^{P} a_k f(j,k)$
  • Result: P equations and P unknowns, where $f(j,k) =
    \sum_{n=0}^{N-1} y_{n-k} y_{n-j}$

27
Covariance Method
  • Result from previous slide (equation j):
    $\sum_{n=0}^{N-1} y_n y_{n-j} = \sum_{k=1}^{P} \sum_{n=0}^{N-1} a_k y_{n-k} y_{n-j}$
  • A more concise notation, with $f(j,k) = \sum_{n=0}^{N-1}
    y_{n-k} y_{n-j}$, is $f(j,0) = \sum_{k=1}^{P} a_k f(j,k)$
  • Now we have P equations and P unknowns
  • Because $f(j,k) = f(k,j)$, the matrix is symmetric
  • Solution requires $O(n^3)$ operations (e.g.,
    Cholesky decomposition)
  • Why covariance? It's not probabilistic, but the
    matrix looks similar

28
Covariance Example
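Recall $f(j,k) = \sum_{n=start}^{start+N-1} y_{n-k} y_{n-j}$,
where equation j is $f(j,0) = \sum_{k=1}^{P} a_k f(j,k)$
  • Signal: ..., 3, 2, -1, -3, -5, -2, 0, 1, 2, 4, 3,
    1, 0, -1, -2, -4, -1, 0, 3, 1, 0, ...
  • Frame: -5, -2, 0, 1, 2, 4, 3, 1. Number of
    coefficients: 3
  • f(1,1) = (-3)(-3) + (-5)(-5) + (-2)(-2) + (0)(0) + (1)(1) + (2)(2)
    + (4)(4) + (3)(3) = 68
  • f(2,1) = (-1)(-3) + (-3)(-5) + (-5)(-2) + (-2)(0) + (0)(1) + (1)(2)
    + (2)(4) + (4)(3) = 50
  • f(3,1) = (2)(-3) + (-1)(-5) + (-3)(-2) + (-5)(0) + (-2)(1) + (0)(2)
    + (1)(4) + (2)(3) = 13
  • f(1,2) = (-3)(-1) + (-5)(-3) + (-2)(-5) + (0)(-2) + (1)(0) + (2)(1)
    + (4)(2) + (3)(4) = 50
  • f(2,2) = (-1)(-1) + (-3)(-3) + (-5)(-5) + (-2)(-2) + (0)(0) + (1)(1)
    + (2)(2) + (4)(4) = 60
  • f(3,2) = (2)(-1) + (-1)(-3) + (-3)(-5) + (-5)(-2) + (-2)(0) + (0)(1)
    + (1)(2) + (2)(4) = 36
  • f(1,3) = (-3)(2) + (-5)(-1) + (-2)(-3) + (0)(-5) + (1)(-2) + (2)(0)
    + (4)(1) + (3)(2) = 13
  • f(2,3) = (-1)(2) + (-3)(-1) + (-5)(-3) + (-2)(-5) + (0)(-2) + (1)(0)
    + (2)(1) + (4)(2) = 36
  • f(3,3) = (2)(2) + (-1)(-1) + (-3)(-3) + (-5)(-5) + (-2)(-2) + (0)(0)
    + (1)(1) + (2)(2) = 48
  • f(1,0) = (-3)(-5) + (-5)(-2) + (-2)(0) + (0)(1) + (1)(2) + (2)(4)
    + (4)(3) + (3)(1) = 50
  • f(2,0) = (-1)(-5) + (-3)(-2) + (-5)(0) + (-2)(1) + (0)(2) + (1)(4)
    + (2)(3) + (4)(1) = 23
  • f(3,0) = (2)(-5) + (-1)(-2) + (-3)(0) + (-5)(1) + (-2)(2) + (0)(4)
    + (1)(3) + (2)(1) = -12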
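The table of sums can be reproduced with a short function. In this sketch the visible part of the signal is stored in a list, so the frame starts at list index 4; the function name `covariance_terms` and that indexing are assumptions made for the example.

```python
def covariance_terms(signal, start, frame_len, order):
    """Build the covariance-method matrix f(j,k) and right-hand side f(j,0)."""
    def f(j, k):
        # Samples before the frame are taken from the signal itself, not assumed zero.
        return sum(signal[n - k] * signal[n - j]
                   for n in range(start, start + frame_len))

    matrix = [[f(j, k) for k in range(1, order + 1)] for j in range(1, order + 1)]
    rhs = [f(j, 0) for j in range(1, order + 1)]
    return matrix, rhs


signal = [3, 2, -1, -3, -5, -2, 0, 1, 2, 4, 3, 1, 0, -1, -2, -4, -1, 0, 3, 1, 0]
matrix, rhs = covariance_terms(signal, start=4, frame_len=8, order=3)
```

The resulting matrix is symmetric but not Toeplitz (its main diagonal reads 68, 60, 48), which is why this method needs a general symmetric solver such as Cholesky decomposition.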

29
Auto-Correlation Method
  • Assume all values of the signal outside of
    $0 \le n \le N-1$ are zero
  • Correlate from $-\infty$ to $\infty$ (most values are 0)
  • The LPC formula for f becomes
    $f(j,k) = \sum_{n=0}^{N-1-(j-k)} y_n y_{n+(j-k)} = R(j-k)$
  • The matrix is now in the Toeplitz format
  • The Levinson-Durbin algorithm applies
  • Implementation complexity $O(n^2)$

30
Auto Correlation Example
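Recall $f(j,k) = \sum_{n=0}^{N-1-(j-k)} y_n y_{n+(j-k)} = R(j-k)$,
where equation j is $R(j) = \sum_{k=1}^{P} R(j-k) a_k$
Notation: j is the row, k is the column
  • Signal: ..., 3, 2, -1, -3, -5, -2, 0, 1, 2, 4, 3,
    1, 0, -1, -2, -4, -1, 0, 3, 1, 0, ...
  • Frame: -5, -2, 0, 1, 2, 4, 3, 1. Number of
    coefficients: 3
  • R(0) = (-5)(-5) + (-2)(-2) + (0)(0) + (1)(1) + (2)(2) + (4)(4)
    + (3)(3) + (1)(1) = 60
  • R(1) = (-5)(-2) + (-2)(0) + (0)(1) + (1)(2) + (2)(4) + (4)(3) + (3)(1)
    = 35
  • R(2) = (-5)(0) + (-2)(1) + (0)(2) + (1)(4) + (2)(3) + (4)(1) = 12
  • R(3) = (-5)(1) + (-2)(2) + (0)(4) + (1)(3) + (2)(1) = -4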
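The same frame gives the autocorrelation values with a few lines of code. This sketch treats everything outside the frame as zero, as the method assumes; the function name and the way the Toeplitz matrix is assembled are illustrative.

```python
def autocorrelation(frame, max_lag):
    """R(lag) for lag = 0..max_lag, with samples outside the frame treated as zero."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


frame = [-5, -2, 0, 1, 2, 4, 3, 1]
R = autocorrelation(frame, 3)

# Row j, column k of the system matrix holds R(|j - k|), so every diagonal is constant.
order = 3
toeplitz = [[R[abs(j - k)] for k in range(order)] for j in range(order)]
rhs = R[1:order + 1]
```

Only the values R(0) through R(P) are needed to fill the whole matrix, in contrast to the covariance method where every entry is a separate sum.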

31
LPC Transfer Function
  • Predict the value of the next sample
  • $\hat{s}_n = \sum_{k=1}^{p} a_k s_{n-k}$
  • The error signal ($e_n$) is the LPC residual
  • $e_n = s_n - \hat{s}_n = s_n - \sum_{k=1}^{p} a_k s_{n-k}$
  • Perform a Z-transform of both sides
  • $E(z) = S(z) - \sum_{k=1}^{p} a_k S(z) z^{-k}$
  • Factor $S(z)$: $E(z) = S(z)\left[1 - \sum_{k=1}^{p} a_k z^{-k}\right] =
    S(z)A(z)$
  • Compute the transfer function: $S(z) = E(z)/A(z)$
  • Conclusion: LPC provides us with an all-pole
    filter
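The two directions of the transfer function, $E(z) = S(z)A(z)$ and $S(z) = E(z)/A(z)$, correspond to a pair of simple loops. In this sketch samples before the start of the signal are treated as zero, and the coefficients `[2, -1]` are a made-up predictor that happens to fit a straight-line ramp exactly.

```python
def lpc_residual(s, a):
    """Inverse filter: e_n = s_n - sum_k a_k * s_{n-k} (earlier samples taken as 0)."""
    return [s[n] - sum(a[k - 1] * s[n - k]
                       for k in range(1, len(a) + 1) if n - k >= 0)
            for n in range(len(s))]


def lpc_synthesize(e, a):
    """All-pole filter: s_n = e_n + sum_k a_k * s_{n-k}, rebuilt sample by sample."""
    s = []
    for n in range(len(e)):
        s.append(e[n] + sum(a[k - 1] * s[n - k]
                            for k in range(1, len(a) + 1) if n - k >= 0))
    return s


signal = [1, 2, 3, 4, 5]
coeffs = [2, -1]            # predicts s_n = 2*s_{n-1} - s_{n-2}
residual = lpc_residual(signal, coeffs)
rebuilt = lpc_synthesize(residual, coeffs)
```

The synthesis loop feeds its own output back in, which is what makes the filter all-pole; running it on the residual reproduces the original signal.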

32
LPC Coding and Synthesis Models
Coding Model
  • Synthesis Model

Conclusion: the LPC all-pole model can code and
synthesize speech
33
The LPC Model
  • The LPC estimate
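  • An all-pole IIR filter: $y_n = G x_n - \sum_{k=1}^{P} a_k y_{n-k}$
    (in this form the $a_k$ carry the opposite sign from the predictor coefficients)
  • The $G x_n$ residual attempts to model the glottal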
    source
  • LPC estimates the separation of source from
    filter
  • Challenges (Problems in synthesis)
  • The residual does not accurately model the source
    (glottis)
  • The filter does not model radiation from the lips
  • The model does not account for nasal resonances
  • Possible solutions
  • Additional poles can increase the accuracy to a
    point
  • 1 pole pair for each 1 kHz of sampling rate
  • 2 more pairs can better estimate the source and
    lips
  • Introduce zeros into the model
  • More robust analysis of the glottal source and
    lip radiation

34
The LPC Spectrum
  1. Perform a LPC analysis
  2. Find the poles
  3. Plot the spectrum around the z-plane unit circle
  • What do we find concerning the LPC spectrum?
  • Adding poles better matches speech up to about 22
    poles for a 16 kHz sampling rate
  • The peaks tend to be overly sharp (spiky)
    because small radius changes greatly alter pole
    skirt widths
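The effect of pole radius on peak sharpness can be seen by evaluating $1/|A(z)|$ on the unit circle. This sketch assumes the convention $A(z) = 1 - \sum_k a_k z^{-k}$ and builds a single made-up pole pair with the helper `resonator`; both names and the radius and angle values are illustrative.

```python
import cmath
import math


def lpc_spectrum(a, num_points=256, gain=1.0):
    """Magnitude of gain / A(z) at num_points frequencies from 0 up to (not including) pi."""
    mags = []
    for i in range(num_points):
        w = math.pi * i / num_points
        A = 1 - sum(a[k - 1] * cmath.exp(-1j * w * k) for k in range(1, len(a) + 1))
        mags.append(gain / abs(A))
    return mags


def resonator(radius, angle):
    """Coefficients for one complex pole pair at radius * exp(+/- j * angle)."""
    return [2 * radius * math.cos(angle), -radius * radius]


wide = lpc_spectrum(resonator(0.95, math.pi / 4))
sharp = lpc_spectrum(resonator(0.99, math.pi / 4))
peak_index = max(range(len(wide)), key=lambda i: wide[i])
```

Moving the pole radius from 0.95 to 0.99 leaves the peak at the same frequency but makes it several times taller and much narrower, which illustrates the spiky behavior noted above.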

35
PARCOR
  • Definition: PARtial auto-CORrelation
    coefficients
  • LPC coefficients are $a_1, a_2, \ldots, a_P$
  • PARCOR coefficients are $k_1, k_2, \ldots, k_P$
  • It is easy to compute PARCOR from LPC and vice
    versa
  • Review
  • Rectangular tubes have reflection coefficients
    $r_k = (A_{k+1} - A_k)/(A_{k+1} + A_k)$
  • With algebra, the ratio of areas between tubes
    is $A_k/A_{k+1} = (1 - r_k)/(1 + r_k)$
  • Importance
  • LPC is equivalent to the tube model of the vocal
    tract
  • $\log(A_{k+1}/A_k) = \log\left((1 - k_i)/(1 + k_i)\right)$
  • We can adjust the LPC parameters based on PARCOR

36
Relationship to Tube Model
  • Given LPC, compute PARCOR
  •   FOR j = 1 TO P: x[P,j] = a[j]
  •   k[P] = x[P,P]
  •   FOR i = P TO 2 STEP -1
  •     FOR j = 1 TO i-1
  •       x[i-1,j] = (x[i,j] + k[i] * x[i,i-j]) / (1 - k[i]^2)
  •     k[i-1] = x[i-1,i-1]
  • Given PARCOR, compute LPC
  •   FOR i = 1 TO P
  •     x[i,i] = k[i]
  •     IF i > 1 THEN FOR j = 1 TO i-1
  •       x[i,j] = x[i-1,j] - k[i] * x[i-1,i-j]
  •   FOR j = 1 TO P: a[j] = x[P,j]

Notes: k[i] are PARCOR coefficients, a[i] are LPC
coefficients, x[i,j] is a temporary work array
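Both conversions can be written compactly by keeping only the current row of the work array. The sketch below uses exact fractions and checks itself against the Levinson-Durbin worked example earlier in the deck (k = 2, 1/3, 1/4, 1/5 and a = 6/5, 0, 0, 1/5); the sign convention is the one used in that recursion, and the function names are illustrative.

```python
from fractions import Fraction


def parcor_to_lpc(k):
    """Step-up recursion: x[i,j] = x[i-1,j] - k_i * x[i-1,i-j], with x[i,i] = k_i."""
    a = []
    for i, ki in enumerate(k, start=1):
        a = [a[j - 1] - ki * a[i - j - 1] for j in range(1, i)] + [ki]
    return a


def lpc_to_parcor(a):
    """Step-down recursion: x[i-1,j] = (x[i,j] + k_i * x[i,i-j]) / (1 - k_i^2)."""
    x = list(a)
    k = [None] * len(a)
    for i in range(len(a), 0, -1):
        ki = x[i - 1]                 # the last entry of row i is k_i
        k[i - 1] = ki
        x = [(x[j - 1] + ki * x[i - j - 1]) / (1 - ki * ki) for j in range(1, i)]
    return k


ks = [Fraction(2), Fraction(1, 3), Fraction(1, 4), Fraction(1, 5)]
lpc = parcor_to_lpc(ks)
round_trip = lpc_to_parcor(lpc)
```

The step-down direction divides by $1 - k_i^2$, so it breaks down if any $k_i$ equals plus or minus one.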
37
Line Spectrum Pairs
  • Overview
  • Filter with an additional coefficient
  • Uses the equations on the right
  • The New filter models
  • A completely closed glottis
  • Completely open lips
  • Characteristics
  • Spectrum shown as lines because of infinite
    amplitudes of formants
  • Forces zeros and poles to be interspersed on the
    unit circle
  • Advantages
  • Easier to estimate formants
  • Less sensitive to quantization errors

38
LPC and the Source Signal
  • Experiments show
  • Glottis requires both zeros and poles
  • It requires fewer poles than the vocal tract function
  • LPC combines the glottal and vocal tract poles
  • If $U(z) = I(z)G(z)$
  • $U(z)$: source function
  • $I(z)$: impulse sequence
  • $G(z)$: glottal filter
  • Transfer function
  • Goal: separate glottal poles from the LPC
    predictor

39
Closed Phase Analysis
  • Find the instant of glottal closure
  • Epoch detection algorithm
  • Divide signal
  • closed phase (glottis does not affect LPC
    predictors)
  • open phases (glottis has significant impact)
  • Strategy
  • Compute the G filter over a number of pitch
    periods
  • Perform an inverse filter to obtain the glottal
    signal

40
Open Phase Analysis
  • Problem: it is not easy to find the instant of
    glottal closure
  • Goal: add extra poles to the model
  • Advantages
  • Human hearing is more sensitive to peaks than to
    valleys
  • The tube model and LPC are all-pole systems
  • Disadvantages
  • Relationships between the poles and the formants
    become obscure
  • Extra poles can approximate a zero, but not
    perfectly
  • How can extra poles approximate zeros?
  • For example, if $x, y \ne -1$, then consider the
    following derivation
  • $1 - x = 1/(1 + y)$
  • $1 = (1 - x)(1 + y) = 1 + y - x - xy = 1 + y(1 - x) - x$
  • Therefore $y = x/(1 - x)$
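One way to see why extra poles can stand in for a zero is the geometric series. This is a sketch under the assumption $|a| < 1$; the symbol $a$ and the truncation order $M$ are illustrative and do not appear on the slide.

```latex
1 - a z^{-1} = \frac{1}{1 + a z^{-1} + a^2 z^{-2} + \cdots}
\approx \frac{1}{\sum_{m=0}^{M} a^m z^{-m}}
```

The truncated denominator is an all-pole term, so the match improves as M grows but is never exact for any finite M.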