Speech Analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Speech Analysis

Description:

Speech spectrogram. Derived formant tracks. A/D conversion ... Wide-band and narrow-band spectrograms. Mel-frequency filter bank ...

Slides: 31
Provided by: philipj5

Transcript and Presenter's Notes

Title: Speech Analysis


1
Speech Analysis
EEM.ssr Speaker & Speech Recognition
  • by Dr Philip Jackson

Lecturer in speech & audio, Centre for Vision,
Speech & Signal Processing, Department of
Electronic Engineering.
http://www.ee.surrey.ac.uk/Teaching/Courses/eem.ssr
2
What's the point of analysing speech?
  • Speech analysis, or speech processing, transforms
    a speech waveform into a representation that is
    suitable for extracting its features
  • Human visual inspection
  • e.g., by a speech scientist, speech therapist, or
    forensic phonetician
  • Computer analysis
  • e.g., for automatic speech recognition, speaker
    recognition, or paralinguistic processing

3
And what does that mean?
  • "Suitable" could mean
  • amenable to human visual inspection
  • using a small number of bits per second (for
    transmission or storage)
  • compatible with the models in a speech recognizer
  • in line with our understanding of human auditory
    processing

4
Cochlear section
  • Cochlea, or inner ear, has a spiral form
  • vestibular canal
  • basilar membrane
  • tympanic canal
  • auditory nerve

5
Response of the cochlea
6
Basilar membrane
  • sound enters at the stapes
  • travels along the basilar membrane
  • makes the membrane vibrate at the position matching
    its frequency
  • activates the auditory nerve at that position

7
Short-term spectrum
  • Represents the distribution of power with respect
    to frequency over a time interval centred at
    time, t, like a vertical slice through the
    spectrogram
  • From a source-filter perspective, it gives us
    some information about the shape of the vocal
    tract at time t
  • From a human speech perception view, it provides
    similar information to that sent from the cochlea
    to the auditory nerve

8
Computing the ST-spectrum
  • Analogue-to-Digital (A/D) Conversion
  • convert the analogue signal from the microphone
    into a digital signal
  • Windowing
  • select a short section of speech, centred at time
    t, and smooth
  • Frequency analysis
  • estimate the distribution of power with respect
    to frequency
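
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name, the Hamming window, and the test tone standing in for the A/D converter output are all assumptions, and a direct DFT is used only to keep the example within the standard library.

```python
import cmath
import math

def short_term_power_spectrum(signal, fs, t, win_dur):
    """Power spectrum of a Hamming-windowed frame centred at time t (seconds)."""
    n_win = int(round(win_dur * fs))        # window length in samples
    centre = int(round(t * fs))             # sample index nearest to time t
    start = centre - n_win // 2
    if start < 0 or start + n_win > len(signal):
        raise ValueError("window extends beyond the signal")
    # Windowing: select the frame and taper it with a Hamming window (assumed).
    frame = [
        signal[start + n] * (0.54 - 0.46 * math.cos(2 * math.pi * n / (n_win - 1)))
        for n in range(n_win)
    ]
    # Frequency analysis: direct DFT, then modulus squared.
    power = []
    for k in range(n_win):
        acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_win)
                  for n in range(n_win))
        power.append(abs(acc) ** 2)
    return power

# Stand-in for A/D conversion: a 1 kHz sine sampled at 8 kHz for 50 ms.
fs = 8000
signal = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(400)]
power = short_term_power_spectrum(signal, fs, t=0.025, win_dur=0.010)

# Strongest bin in the lower half of the spectrum, converted to Hz.
peak_bin = max(range(len(power) // 2), key=lambda k: power[k])
peak_hz = peak_bin * fs / len(power)
```

With a 10 ms window at 8 kHz the frame has 80 samples, so the bins are 100 Hz apart and the test tone falls on a bin centre.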

9
Waterfall display
10
Speech spectrogram
11
Derived formant tracks
12
A/D conversion
  • Sampling measures the speech signal at regular
    intervals, indexed by n
  • Quantisation encodes each sample x(n) with a
    discrete value

[Figure: sample values x(n) plotted against sample index n]
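
As an illustration of sampling and quantisation, the sketch below samples a test tone and passes it through a uniform quantiser. The function names, the 440 Hz tone, and the mid-point reconstruction rule are assumptions made for the example; real converters and codecs (e.g. companded telephone PCM) differ in detail.

```python
import math

def sample(analogue, fs, duration):
    """Measure a continuous-time function at regular intervals of 1/fs seconds."""
    n_samples = int(round(duration * fs))
    return [analogue(n / fs) for n in range(n_samples)]

def quantise(x, n_bits, full_scale=1.0):
    """Uniform quantiser: map each sample to one of 2**n_bits integer codes."""
    levels = 2 ** n_bits
    step = 2 * full_scale / levels
    codes = []
    for v in x:
        code = int(math.floor((v + full_scale) / step))
        codes.append(min(max(code, 0), levels - 1))   # clip to the valid range
    return codes, step

def reconstruct(codes, step, full_scale=1.0):
    """Map integer codes back to the mid-point of each quantisation interval."""
    return [(c + 0.5) * step - full_scale for c in codes]

def tone(t):
    """Hypothetical analogue input: a 440 Hz sine at 90% of full scale."""
    return 0.9 * math.sin(2 * math.pi * 440 * t)

x = sample(tone, fs=8000, duration=0.01)
codes, step = quantise(x, n_bits=8)
x_hat = reconstruct(codes, step)
max_error = max(abs(a - b) for a, b in zip(x, x_hat))
```

For an input that stays inside full scale, the reconstruction error is bounded by half a quantisation step.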
13
Sample rate
  • Nyquist's theorem: for a signal band-limited to B
    Hz, a rate of at least 2B samples per second is
    needed to encode the signal faithfully
  • The human ear is sensitive up to about 20 kHz (hence
    the 44.1 kHz rate for CDs)
  • But for speech
  • high quality needs 10 kHz bandwidth, i.e., 20 kHz
    sample rate
  • bandwidth can be reduced to 4 kHz (8 kHz rate),
    for telephone quality
  • e.g., 8-bit PCM at 8 kHz = 64 kbps
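
The arithmetic behind these figures is simple enough to check directly. The helper names below are hypothetical and the calculation assumes uncompressed, single-channel PCM.

```python
def nyquist_rate(bandwidth_hz):
    """Minimum sample rate (samples/s) for a signal band-limited to bandwidth_hz."""
    return 2 * bandwidth_hz

def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

telephone_rate = nyquist_rate(4000)        # 4 kHz bandwidth
high_quality_rate = nyquist_rate(10000)    # 10 kHz bandwidth
telephone_kbps = pcm_bit_rate(telephone_rate, 8) / 1000
```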

14
CD-quality: fS = 44 kHz
15
High-quality speech: fS = 20 kHz
16
Telephone speech: fS = 8 kHz
17
Window functions
18
Frequency analysis
  • Discrete Fourier Transform (DFT) is applied to
    the windowed digital waveform x(n), n = 1, ..., N.
  • With an N-sample window, an N-point complex
    spectrum is obtained: X(k), k = 1, ..., N.
  • The modulus squared gives the power spectrum,
    |X(k)|^2
  • The logarithm gives the log-power spectrum,
    log |X(k)|^2
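
A direct, unoptimised sketch of these three quantities follows. It uses 0-based indices (Python convention) rather than the slide's 1, ..., N, and the decibel scaling and the small floor that guards against log(0) are assumptions of this example; an FFT routine would normally replace the O(N^2) sum.

```python
import cmath
import math

def dft(x):
    """N-point DFT of a sequence by direct summation (O(N^2), for illustration)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]

def power_spectrum(x):
    """Modulus squared of each DFT coefficient."""
    return [abs(c) ** 2 for c in dft(x)]

def log_power_spectrum(x, floor=1e-12):
    """Log-power in dB; `floor` avoids log(0) for empty bins."""
    return [10 * math.log10(max(p, floor)) for p in power_spectrum(x)]

# A cosine with exactly 2 cycles in a 16-sample window.
N = 16
x = [math.cos(2 * math.pi * 2 * n / N) for n in range(N)]
P = power_spectrum(x)
log_P = log_power_spectrum(x)
```

The power concentrates in bin 2 and its mirror image, bin 14, each with value (N/2)^2.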

19
Discrete Fourier transform
  • over a finite period of time
  • sampled at regular intervals

Forward transform
Inverse transform
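
The equations for this slide did not survive in the transcript. For reference, the standard DFT pair is given here with the 1-based indices used on the previous slide; the placement of the 1/N factor on the inverse transform is one common convention and is an assumption about the original.

```latex
\begin{align}
X(k) &= \sum_{n=1}^{N} x(n)\, e^{-j 2\pi (k-1)(n-1)/N}, & k &= 1, \dots, N \\
x(n) &= \frac{1}{N} \sum_{k=1}^{N} X(k)\, e^{+j 2\pi (k-1)(n-1)/N}, & n &= 1, \dots, N
\end{align}
```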
20
Frequency analysis
  • Alternative methods include
  • filter-bank analysis (based on a set of band-pass
    filters)
  • approximations of the spectral envelope, e.g.,
    Linear predictive coding (LPC)
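
To give a flavour of LPC, the sketch below estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion. The slide does not say which LPC formulation the course uses, so this choice, the function names, and the synthetic second-order test signal are all assumptions.

```python
import random

def autocorrelation(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]

def lpc(x, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n-k],
    found by the Levinson-Durbin recursion on the autocorrelation sequence."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)      # a[0] is unused
    err = r[0]                   # prediction error power
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err            # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

# Synthetic test signal: white noise through a known 2nd-order all-pole filter.
random.seed(0)
true_a = [1.3, -0.6]
x = [0.0, 0.0]
for _ in range(5000):
    x.append(true_a[0] * x[-1] + true_a[1] * x[-2] + random.gauss(0.0, 1.0))

est_a, err = lpc(x, order=2)
```

The estimated coefficients should lie close to the ones used to generate the signal; for speech, the resulting all-pole filter approximates the spectral envelope.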

21
Time-frequency resolution 1
  • If the window is long then
  • the time resolution is poor
  • the number of points, N, is large
  • there are N points in the spectrum
  • so there is fine frequency resolution
  • narrow-band frequency analysis, or narrow-band
    spectrum

22
Narrow-band spectrum
23
Time-frequency resolution 2
  • If the window is short then
  • the time resolution is good
  • the number of points, N, is small
  • there are N points in the spectrum
  • so the frequency resolution is coarse
  • broad-band frequency analysis, or broad-band
    spectrum

24
Wide-band spectrum
25
Time-frequency resolution 3
  • In summary
  • long window, narrow-band spectrum
  • short window, broad-band spectrum.
  • Indeed, the two resolutions trade off: the
    bandwidth-time product cannot be made smaller than
    a half
  • where ... and fS is the sample
    rate
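
The trade-off can be made concrete by computing the window duration and the DFT bin spacing for a long and a short window. The window lengths and sample rate are illustrative choices, and bin spacing is used here as a simple stand-in for frequency resolution; it is not necessarily the bandwidth measure the slide had in mind.

```python
def resolution(n_points, fs):
    """Window duration (s) and DFT bin spacing (Hz) for an N-point window."""
    duration = n_points / fs
    bin_spacing = fs / n_points
    return duration, bin_spacing

fs = 8000
long_dur, long_df = resolution(256, fs)    # 32 ms window, 31.25 Hz bins
short_dur, short_df = resolution(32, fs)   # 4 ms window, 250 Hz bins
```

For this measure the product of duration and bin spacing is the same for every N, so finer frequency resolution can only be bought with a longer window.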

26
Wide-band and narrow-band spectrograms
27
Mel-frequency filter bank
  • Allocation of DFT bins to filters, spaced
    according to the Mel scale
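
One way to allocate DFT bins to Mel-spaced filters is sketched below. The Mel formula used (2595 log10(1 + f/700)) is a widely used variant but is an assumption here, since the slide does not state one; the triangular-filter layout and the function names are likewise illustrative.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to Mel (one common formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bins(n_filters, n_fft, fs, f_low=0.0, f_high=None):
    """For each filter return (start, centre, end) DFT bin indices,
    with the edge frequencies equally spaced on the Mel scale."""
    if f_high is None:
        f_high = fs / 2.0
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    # n_filters + 2 edge points, evenly spaced in Mel.
    edges_mel = [lo + i * (hi - lo) / (n_filters + 1) for i in range(n_filters + 2)]
    # Bin k of an n_fft-point DFT corresponds to k * fs / n_fft Hz.
    edges_bin = [int(round(n_fft * mel_to_hz(m) / fs)) for m in edges_mel]
    return [(edges_bin[i], edges_bin[i + 1], edges_bin[i + 2])
            for i in range(n_filters)]

bins = mel_filter_bins(n_filters=10, n_fft=512, fs=8000)
```

Each filter overlaps its neighbours by half, and the filters widen with frequency, mirroring the coarser frequency resolution of hearing at high frequencies.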

28
The real cepstrum
  • Procedure for computing cepstral coefficients
    from the magnitude spectrum
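
The slide's diagram is not in the transcript, so the sketch below follows the usual definition of the real cepstrum: the inverse DFT of the log magnitude spectrum. The function names and the floor that guards against log(0) are assumptions of this example.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct DFT (or inverse DFT with 1/N scaling) of a sequence."""
    n_pts = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / n_pts)
            for n in range(n_pts))
        for k in range(n_pts)
    ]
    return [v / n_pts for v in out] if inverse else out

def real_cepstrum(x, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of x."""
    log_mag = [math.log(max(abs(c), floor)) for c in dft(x)]
    return [c.real for c in dft(log_mag, inverse=True)]

# A unit impulse has a flat magnitude spectrum of 1, so its cepstrum is zero.
impulse = [1.0] + [0.0] * 15
cep = real_cepstrum(impulse)

# Doubling the amplitude only changes the zeroth coefficient, to log(2).
cep_scaled = real_cepstrum([2.0] + [0.0] * 15)
```

The second example shows why the zeroth coefficient is often treated separately: it carries overall level rather than spectral shape.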

29
Mel-frequency cepstrum
  • Procedure for computing cepstral coefficients,
    based on the output from Mel-frequency binning
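
In the usual formulation, Mel-frequency cepstral coefficients are obtained by taking the log of the filter-bank energies and applying a discrete cosine transform. The slide's diagram is missing, so the DCT-II used below, the unnormalised scaling, and the function name are assumptions.

```python
import math

def mfcc_from_filterbank(energies, n_coeffs, floor=1e-12):
    """Cepstral coefficients as the (unnormalised) DCT-II of log filter-bank energies."""
    log_e = [math.log(max(e, floor)) for e in energies]
    m = len(log_e)
    return [
        sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / m) for j in range(m))
        for i in range(n_coeffs)
    ]

# Eight equal filter outputs: a flat spectrum puts everything in coefficient 0.
flat = mfcc_from_filterbank([math.e] * 8, n_coeffs=4)
```

Keeping only the first few coefficients retains the smooth spectral envelope and discards fine detail.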

30
Summary of Fourier analysis
  • Fourier leads to frequency representation
  • good for visualisation
  • is reversible
  • continuous and discrete time forms
  • Wide- and narrow-band spectra obtained by
    adjusting frame size
  • Windowing
  • reduces spectral smearing
  • allows for adaptation