Multimedia: Representation, Compression and Transmission



Provided by: compHk


1
Chapter 2

Multimedia Representation, Compression and
Transmission

2
Contents

2. Audio
2.1 Human Perception
2.2 Audio Bandwidth
2.3 Digitization
2.4 Audio Compression
2.4.1 Differential PCM
2.4.2 Adaptive Differential PCM
2.4.3 MP3

3
2.1 Human Perception

Audio: speech, music or synthesized audio.
Audio signals are analog.
Audio Perception
1. Sound waves generate air pressure oscillations.
2. The oscillations stimulate the human auditory system.
3. The auditory system transforms them into neural signals recognizable by the brain.

4
2.1 Human Perception

Features of human auditory system
1. Frequency range: Humans can hear audio signals within the typical frequency range of 20 to 20,000 Hz.
2. Dynamic range: The range from the softest to the loudest audio amplitude that humans can hear.
Different people may have different frequency and dynamic ranges.

5
2.2 Audio Bandwidth

Period and Frequency
A periodic signal consists of a continuously repeated waveform pattern. If its period is T, its frequency is f = 1/T.
Example: The following signals are periodic with period T and frequency f = 1/T.

6
2.2 Audio Bandwidth
7
2.2 Audio Bandwidth

Signal Characteristic
A signal can be decomposed into many sinusoidal
signal components such that different components
1. have different frequencies and
2. may have different amplitudes.
(This decomposition can be done by mathematical
techniques called Fourier series and Fourier
transform.)

8
2.2 Audio Bandwidth
Frequency of 1st component (1st harmonic): f1 = 1/T
Frequency of 2nd component (2nd harmonic): 3 f1
Frequency of 3rd component (3rd harmonic): 5 f1
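The decomposition can be illustrated numerically. The sketch below assumes the signal is a unit square wave, whose sinusoidal components sit at f1, 3 f1, 5 f1, ... with amplitudes 4/(pi k); the function name and the 1 ms period are hypothetical choices for illustration, not taken from the slides.

```python
import math

def square_wave_partial_sum(t, period, n_components):
    """Sum the first n_components sinusoidal components of a unit square wave."""
    f1 = 1.0 / period                      # frequency of the 1st component
    total = 0.0
    for i in range(n_components):
        k = 2 * i + 1                      # component frequencies are 1, 3, 5, ... times f1
        amplitude = 4.0 / (math.pi * k)    # higher components are weaker
        total += amplitude * math.sin(2 * math.pi * k * f1 * t)
    return total

T = 0.001            # assumed period of 1 ms, so f1 = 1000 Hz
quarter = T / 4      # the ideal square wave equals +1 at this instant
approx_3 = square_wave_partial_sum(quarter, T, 3)
approx_50 = square_wave_partial_sum(quarter, T, 50)
```

With only 3 components the sum overshoots the ideal value of 1 noticeably; with 50 components it is much closer, which is why more components give a less distorted signal.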
9
2.2 Audio Bandwidth

Frequency Domain
After decomposing a signal into its components,
we can analyze the properties of this signal in
the frequency domain.
Example
It is difficult to visualize the energy content
of a signal in the time domain, but it is easy to
do so in the frequency domain.

10
2.2 Audio Bandwidth

Bandwidth
Bandwidth is the range of component frequencies.
Example
A signal may have an infinite number of components. In this case, bandwidth is defined to be the frequency range over which x% (say, 99%) of the energy of the signal lies.
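As a rough sketch of this definition, the code below finds the frequency up to which a chosen fraction of the energy lies. It assumes the energy of each component is proportional to its squared amplitude and reuses square-wave amplitudes as test data; `energy_bandwidth` and the component list are hypothetical names introduced here.

```python
import math

def energy_bandwidth(components, fraction):
    """Return the lowest frequency limit that captures `fraction` of the energy.

    components: list of (frequency_hz, amplitude) pairs sorted by frequency.
    """
    total = sum(amp * amp for _, amp in components)
    running = 0.0
    for freq, amp in components:
        running += amp * amp               # accumulate energy from low to high frequency
        if running >= fraction * total:
            return freq
    return components[-1][0]

f1 = 1000.0  # assumed fundamental frequency in Hz
# Assumed test signal: the first 1000 components of a square wave.
components = [(k * f1, 4.0 / (math.pi * k)) for k in range(1, 2001, 2)]
bw_90 = energy_bandwidth(components, 0.90)
bw_99 = energy_bandwidth(components, 0.99)
```

Raising the fraction from 90% to 99% pushes the bandwidth well above the first few components, because the many weak high-frequency components together still carry some energy.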

11
2.2 Audio Bandwidth

Effect of Limited Bandwidth
If a network does not have sufficient bandwidth to send all the frequency components of a signal, some frequency components are omitted and the signal is distorted.
If a network has a larger bandwidth and can send more frequency components of an audio signal, the audio signal is relatively less distorted.

12
(No Transcript)
13
2.3 Digitization

Digitization: convert an analog audio signal to digital form via sampling and quantization.
Sampling
Sample the magnitude of the audio signal at a
certain rate.

14
2.3 Digitization
Nyquist Theorem: A signal that has no frequency components higher than x Hz can be completely reproduced from its samples taken at the rate of 2x samples per second.
Illustration of Nyquist sampling rate
15
2.3 Digitization
Example: Telephone systems transmit voice signal components of at most 4000 Hz, so the sampling rate should be 8000 samples/sec.
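A small sketch of why the sampling rate matters: at 8000 samples/sec, a 5000 Hz tone (above the 4000 Hz limit) produces exactly the same samples as a 3000 Hz tone, so the two cannot be told apart after sampling. The helper `sample_tone` and the chosen tone frequencies are assumptions for illustration.

```python
import math

def sample_tone(freq_hz, rate_hz, n_samples):
    """Sample cos(2*pi*f*t) at the instants t = n / rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]

rate = 2 * 4000                          # Nyquist rate for components up to 4000 Hz
tone_3k = sample_tone(3000, rate, 16)    # within the limit
tone_5k = sample_tone(5000, rate, 16)    # above the limit: aliases onto 3000 Hz
max_gap = max(abs(a - b) for a, b in zip(tone_3k, tone_5k))
```

Here `max_gap` is zero up to floating-point rounding, which is the distortion that the Nyquist rate is chosen to avoid.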
16
2.3 Digitization

Quantization
If N bits are used to represent a sample value, there are 2^N distinct quantization values.
Each sample value is rounded to the nearest
quantization value, so there may be quantization
error.

17
2.3 Digitization
If the first sample value is 24.1, it is
quantized to 24 (0001 1000), so the quantization
error is 0.1.
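The example above can be reproduced with a minimal sketch. It assumes the quantization values are the integers 0 to 2^N - 1 (a step size of 1); the function name `quantize` is hypothetical.

```python
def quantize(sample, n_bits):
    """Round a sample to the nearest of 2**n_bits integer quantization values."""
    levels = 2 ** n_bits
    level = int(round(sample))                 # nearest integer level
    level = max(0, min(levels - 1, level))     # clamp to the representable range
    codeword = format(level, "0{}b".format(n_bits))
    error = abs(sample - level)                # quantization error
    return level, codeword, error

level, codeword, error = quantize(24.1, 8)
```

For the sample value 24.1 with 8 bits this gives level 24, codeword 00011000 and an error of about 0.1.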
18
2.3 Digitization

Pulse Code Modulation (PCM)
PCM performs sampling and quantization on audio signals.
PCM is used in
1. Digital telephone networks: Use a sampling rate of 8000 samples per second and 8 bits per sample, so the data rate is 64 kbps (adopted in ITU-T G.711).
2. Audio CD: Use a sampling rate of 44100 samples per second and 16 bits per sample, so the data rate for stereo audio is 1.411 Mbps.
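Both data rates follow from one multiplication, sketched below. The function name `pcm_bit_rate` is a hypothetical helper.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

telephone_bps = pcm_bit_rate(8000, 8)           # 64,000 bps = 64 kbps
cd_bps = pcm_bit_rate(44100, 16, channels=2)    # 1,411,200 bps, about 1.411 Mbps
```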

19
2.4 Audio Compression

2.4.1 Differential PCM
Differential PCM is a compressed version of PCM. It has a lower bit rate, but its voice quality may be poorer.
Differential PCM
A voice signal changes slowly compared with the sampling rate.
Successive sample values have a small difference.
Use fewer bits to encode the difference between the current sample value and the previous one.
The bit rate is lower, but voice quality may be degraded when the voice amplitude changes abruptly.

20
2.4 Audio Compression

Example
For PCM in digital telephony, sampling rate is
8000 samples/sec and 8 bits are used for each
sample. Data rate is 64 kbps.
If differential PCM is adopted and 6 bits are
used to encode the difference between successive
sample values, data rate is reduced to 48 kbps.
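A minimal sketch of the idea, assuming integer sample values, a 6-bit signed difference (range -32 to 31) and both ends starting from 0. The names `dpcm_encode` and `dpcm_decode` are hypothetical, and this is not the coding used by any particular standard.

```python
def dpcm_encode(samples, diff_bits):
    """Encode each sample as a clamped difference from the previous reconstructed value."""
    lo, hi = -(2 ** (diff_bits - 1)), 2 ** (diff_bits - 1) - 1
    codes = []
    prediction = 0                               # assumed shared starting value
    for s in samples:
        diff = max(lo, min(hi, s - prediction))  # clamp to what diff_bits can hold
        codes.append(diff)
        prediction += diff                       # track what the decoder will reconstruct
    return codes

def dpcm_decode(codes):
    """Rebuild the samples by accumulating the differences."""
    out, value = [], 0
    for diff in codes:
        value += diff
        out.append(value)
    return out

smooth = [10, 14, 19, 25, 30, 28, 24]            # slowly changing signal
abrupt = [10, 12, 120, 118, 121]                 # sudden jump in amplitude
smooth_out = dpcm_decode(dpcm_encode(smooth, 6))
abrupt_out = dpcm_decode(dpcm_encode(abrupt, 6))
```

The slowly changing signal is reconstructed exactly, while the abrupt one comes out as [10, 12, 43, 74, 105]: the decoder can only climb 31 per sample, which is the quality degradation mentioned earlier.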

21
2.4 Audio Compression
2.4.2 Adaptive Differential PCM
Adaptive differential PCM is an improved version of differential PCM.
Main idea: When the voice amplitude changes steeply for a significant duration, change to use a larger quantization step (i.e., a larger difference between successive quantization values).
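The main idea can be sketched as follows. The adaptation rule here is an assumption made for illustration (double the step whenever the code saturates, halve it when the difference is tiny); it is not the rule used in ITU-T G.721, and all function names are hypothetical.

```python
def _adapt(step, code, lo, hi, grow, shrink):
    """Assumed rule, applied identically by encoder and decoder."""
    if code in (lo, hi):                 # saturated: amplitude is changing steeply
        return step * grow
    if abs(code) <= 1:                   # tiny difference: signal is nearly flat
        return max(1.0, step * shrink)
    return step

def adpcm_encode(samples, diff_bits=4, grow=2.0, shrink=0.5):
    lo, hi = -(2 ** (diff_bits - 1)), 2 ** (diff_bits - 1) - 1
    codes, prediction, step = [], 0.0, 1.0
    for s in samples:
        code = max(lo, min(hi, int(round((s - prediction) / step))))
        codes.append(code)
        prediction += code * step        # what the decoder will reconstruct
        step = _adapt(step, code, lo, hi, grow, shrink)
    return codes

def adpcm_decode(codes, diff_bits=4, grow=2.0, shrink=0.5):
    lo, hi = -(2 ** (diff_bits - 1)), 2 ** (diff_bits - 1) - 1
    out, value, step = [], 0.0, 1.0
    for code in codes:
        value += code * step
        out.append(value)
        step = _adapt(step, code, lo, hi, grow, shrink)
    return out

ramp = [0, 40, 80, 120, 160, 200, 200, 200]      # steep rise, then flat
adaptive = adpcm_decode(adpcm_encode(ramp))
# grow = shrink = 1 keeps the step fixed, i.e. plain differential PCM with 4 bits.
fixed = adpcm_decode(adpcm_encode(ramp, grow=1.0, shrink=1.0), grow=1.0, shrink=1.0)
```

With the same 4 bits per sample, the adaptive version catches up with the ramp and ends close to 200, while the fixed-step version is still far below it.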
22
2.4 Audio Compression
23
2.4 Audio Compression

ITU-T G.721 adopts adaptive differential PCM, a sampling rate of 8000 samples per second, and 4 bits for encoding the difference between successive sample values.
The bit rate is 32 kbps, but voice quality is only slightly worse than that of PCM at 64 kbps.

24
2.4 Audio Compression

2.4.3 MP3
CD audio has a data rate of 1.411 Mbps.
A well-known compression method for CD audio: MP3.
MP3: MPEG audio layer 3. (MPEG specifies three audio compression layers.)
MP3 adopts perceptual coding to attain a high compression ratio and provide very good audio quality.

25
2.4 Audio Compression

Perceptual Coding
It is based on the science of psychoacoustics,
which studies how people perceive sound.
It exploits certain flaws in the human auditory system for compression, such that the compressed audio sounds about the same to humans even though its signal waveform may become quite different.

26
2.4 Audio Compression

1st Flaw: Threshold of Audibility
When a frequency component is very weak (i.e., its power is below a threshold), humans cannot hear it.
Threshold of audibility (averaged over many people)

Compression: Omit the frequency components whose power falls below the threshold of audibility.
27
2.4 Audio Compression

2nd Flaw: Frequency Masking
Some sounds can mask other sounds: a loud sound in one frequency band hides a softer sound in another frequency band.
Masking effect

Compression: Omit the masked frequency components.
28
2.4 Audio Compression

3rd Flaw: Temporal Masking
After a masking sound ends, it takes a short time before the masked sound can be heard again.
Masking effect

Compression: If the amplitudes of the masked frequency components are less than the decay envelope, omit these components.
29
2.4 Audio Compression

To use MP3 for compression, we select two options:
1. Sampling rate: We can sample the waveform at 32 kHz, 44.1 kHz or 48 kHz on one or two channels.
2. Bit rate: Typically, we choose the bit rate to be 96 kbps, 128 kbps or 160 kbps.
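To put these bit rates in context, the sketch below compares them with the uncompressed stereo CD rate of 1.411 Mbps. The helper `compression_ratio` is a hypothetical name, and the comparison assumes the source is stereo CD audio.

```python
CD_BIT_RATE_BPS = 44100 * 16 * 2     # stereo CD audio, 1,411,200 bps

def compression_ratio(target_kbps):
    """Ratio of the CD data rate to the chosen MP3 bit rate."""
    return CD_BIT_RATE_BPS / (target_kbps * 1000)

ratios = {kbps: round(compression_ratio(kbps), 1) for kbps in (96, 128, 160)}
```

This gives ratios of roughly 14.7, 11.0 and 8.8 for 96, 128 and 160 kbps respectively.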

30
2.4 Audio Compression

Main Steps for Compression
1. Perform sampling on the audio signal. Divide the samples into groups of 1152 samples each.
2. Pass each group through (i) 32 digital filters to get 32 frequency subbands, and (ii) a psychoacoustic model to determine the masked frequencies.
3. Based on the available "bit budget" (which depends on the chosen bit rate), allocate more bits to the subbands with larger unmasked spectral power.
4. Finally, use Huffman coding to encode the bits (i.e., assign shorter codewords to numbers that appear frequently).
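The last step can be illustrated with a small Huffman coder. This is only a sketch of the principle (shorter codewords for frequent numbers): it builds a code from the data itself, whereas an actual MP3 encoder follows the tables and bitstream format of the standard. The function name and the sample values are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code that gives frequent symbols shorter codewords."""
    counts = Counter(symbols)
    if len(counts) == 1:                          # degenerate case: one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie_breaker, {symbol: partial codeword}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        n0, _, code0 = heapq.heappop(heap)        # two least frequent groups
        n1, _, code1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (n0 + n1, next_id, merged))
        next_id += 1
    return heap[0][2]

values = [0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2]   # assumed quantized values
code = huffman_code(values)
encoded = "".join(code[v] for v in values)
```

The most frequent value, 0, gets a 1-bit codeword, and the 12 values are encoded in 21 bits instead of the 24 bits a fixed 2-bit code would need.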