Title: Multimedia: Representation, Compression and Transmission
Chapter 2
- Multimedia Representation, Compression and Transmission
Contents
- 2. Audio
- 2.1 Human Perception
- 2.2 Audio Bandwidth
- 2.3 Digitization
- 2.4 Audio Compression
- 2.4.1 Differential PCM
- 2.4.2 Adaptive Differential PCM
- 2.4.3 MP3
2.1 Human Perception
- Audio: speech, music or synthesized audio.
- Audio signals are analog.
- Audio Perception
- Sound waves generate air pressure oscillations.
- These oscillations stimulate the human auditory system.
- The auditory system transforms them into neural signals recognizable by the brain.
- Features of human auditory system
- 1. Frequency range: Humans can typically hear audio signals within the frequency range 20 to 20,000 Hz.
- 2. Dynamic range: the range from the softest to the loudest audio amplitude that humans can hear.
- Different people may have different frequency and dynamic ranges.
2.2 Audio Bandwidth
- Period and Frequency
- A periodic signal consists of a continuously repeated waveform pattern. If its period is T, its frequency is $f = 1/T$.
- Example: [Figure: periodic signals with period T and frequency $f = 1/T$]
- Signal Characteristic
- A signal can be decomposed into many sinusoidal
signal components such that different components
- 1. have different frequencies and
- 2. may have different amplitudes.
- (This decomposition can be done with the mathematical techniques known as the Fourier series and the Fourier transform.)
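As a small illustration of this decomposition, the sketch below takes one period of a square wave (an assumed example signal, sampled at an arbitrarily chosen N = 64 points) and computes a direct discrete Fourier transform. Bin k corresponds to frequency k·f1 with f1 = 1/T; the names `dft_magnitudes`, `square` and `harmonics` are hypothetical.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Direct O(N^2) discrete Fourier transform; returns |X[k]| / N for k = 0..N-1."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))) / n
        for k in range(n)
    ]

# One period T of a square wave sampled at N points (N = 64 is an arbitrary choice).
N = 64
square = [1.0 if t < N // 2 else -1.0 for t in range(N)]

mags = dft_magnitudes(square)
# Bin k corresponds to frequency k * f1, where f1 = 1/T.
# Keep the bins (below N/2) whose amplitude is not numerically zero.
harmonics = [k for k in range(1, N // 2) if mags[k] > 1e-9]
print(harmonics)  # only odd k appear: 1, 3, 5, ..., 31
```

The square wave contains only components at f1, 3f1, 5f1, ..., and their amplitudes shrink as the frequency grows.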
[Figure: a periodic signal and its components: 1st component (fundamental) at frequency $f_1 = 1/T$, 2nd component at $3f_1$, 3rd component at $5f_1$]
- Frequency Domain
- After decomposing a signal into its components,
we can analyze the properties of this signal in
the frequency domain.
- Example
- It is difficult to visualize the energy content
of a signal in the time domain, but it is easy to
do so in the frequency domain.
- Bandwidth
- Bandwidth is the range of component frequencies.
- A signal may have an infinite number of components.
- In this case, bandwidth is defined as the frequency range over which x% (say, 99%) of the energy of the signal lies.
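The energy-based definition can be sketched as follows. This assumes the band starts at the lowest-frequency component and that a sinusoidal component's energy is proportional to its squared amplitude; the test signal is a square wave with a hypothetical fundamental of 1000 Hz whose infinite series (amplitude 4/(πk) at odd multiples of f1) is truncated at k = 999. The function name `bandwidth_for_energy` is hypothetical.

```python
import math

def bandwidth_for_energy(components, fraction=0.99):
    """Return (f_low, f_high): the band, starting from the lowest-frequency component,
    that holds at least `fraction` of the signal's energy.

    components: (frequency_hz, amplitude) pairs. The energy of a sinusoidal
    component is taken as proportional to amplitude ** 2.
    """
    comps = sorted(components)
    total = sum(a * a for _, a in comps)
    acc = 0.0
    for f, a in comps:
        acc += a * a
        if acc >= fraction * total:
            return comps[0][0], f
    return comps[0][0], comps[-1][0]

# Square wave with a hypothetical fundamental f1 = 1000 Hz; its infinite series
# (amplitude 4/(pi*k) at odd multiples k*f1) is truncated at k = 999.
f1 = 1000.0
square = [(k * f1, 4 / (math.pi * k)) for k in range(1, 1000, 2)]
low, high = bandwidth_for_energy(square, 0.99)
print(f"99% of the energy lies between {low:.0f} Hz and {high:.0f} Hz")
```

Raising `fraction` toward 1 widens the reported band, since the remaining energy is spread thinly over many high-frequency components.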
- Effect of Limited Bandwidth
- If a network does not have sufficient bandwidth
to send all the frequency components of a signal
- some frequency components are omitted
- the signal is distorted.
- If a network has a larger bandwidth to send more
frequency components of an audio signal
- the audio signal is relatively less distorted.
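To see the effect of limited bandwidth numerically, the sketch below models a hypothetical channel that passes only the square wave's components up to a given harmonic and measures the mean squared error against the ideal waveform. The period (1 s), the number of evaluation points and all function names are assumptions made for illustration.

```python
import math

def square_wave(t, period=1.0):
    """Ideal square wave: +1 in the first half of each period, -1 in the second."""
    return 1.0 if (t % period) < period / 2 else -1.0

def band_limited_square(t, max_harmonic, period=1.0):
    """Sum of the square wave's Fourier components up to max_harmonic (odd k only)."""
    f1 = 1.0 / period
    return sum(4 / (math.pi * k) * math.sin(2 * math.pi * k * f1 * t)
               for k in range(1, max_harmonic + 1, 2))

def mse(max_harmonic, points=2000):
    """Mean squared distortion over one period when only components up to
    max_harmonic get through the channel."""
    ts = [i / points for i in range(points)]
    return sum((square_wave(t) - band_limited_square(t, max_harmonic)) ** 2
               for t in ts) / points

for h in (1, 3, 9, 27):
    print(h, round(mse(h), 4))
```

The error drops as the channel passes more harmonics, matching the statement that a larger bandwidth gives less distortion.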
2.3 Digitization
- Digitization converts an analog audio signal to digital form via sampling and quantization.
- Sampling
- Measure the amplitude of the audio signal at regular intervals, i.e., at a fixed sampling rate.
Nyquist Theorem: For a signal that has no frequency components higher than x Hz, the analog signal can be completely reproduced from its samples taken at a rate of 2x samples per second.
[Figure: illustration of the Nyquist sampling rate]
Example: Telephone systems transmit voice signal components of at most 4000 Hz, so the sampling rate should be 2 × 4000 = 8000 samples/sec.
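A minimal sketch of the Nyquist rate and of what goes wrong below it: at 8000 samples/sec, a 5000 Hz cosine (above the 4000 Hz limit) produces exactly the same samples as a 3000 Hz cosine, so the two cannot be told apart after sampling (aliasing). The tone frequencies and function names are chosen only for illustration.

```python
import math

def nyquist_rate(max_freq_hz):
    """Sampling rate (samples/sec) needed for a signal with no components above max_freq_hz."""
    return 2 * max_freq_hz

def sample_cosine(freq_hz, rate_hz, count):
    """Sample cos(2*pi*f*t) at t = n / rate_hz for n = 0 .. count-1."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

print(nyquist_rate(4000))  # telephone voice band: 8000

# Aliasing: at 8000 samples/sec, a 5000 Hz tone yields the same samples as a 3000 Hz tone.
high = sample_cosine(5000, 8000, 16)
low = sample_cosine(3000, 8000, 16)
print(all(abs(a - b) < 1e-9 for a, b in zip(high, low)))  # True
```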
- Quantization
- If N bits are used to represent a sample value, there are $2^N$ distinct quantization values.
- Each sample value is rounded to the nearest
quantization value, so there may be quantization
error.
Example: If the first sample value is 24.1, it is quantized to 24 (0001 1000 in 8 bits), so the quantization error is 0.1.
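The rounding step can be sketched as below, assuming 8-bit unsigned levels 0, 1, ..., 255 with a step of 1 (consistent with 24 being coded as 0001 1000). Clipping out-of-range values to the end levels is an added assumption, and the name `quantize` is hypothetical.

```python
def quantize(value, n_bits, step=1.0):
    """Round value to the nearest of the 2**n_bits levels 0, step, 2*step, ...

    Out-of-range values are clipped to the first/last level (an assumption).
    Python's round() sends exact .5 ties to the even level.
    Returns (level_index, quantized_value, quantization_error).
    """
    levels = 2 ** n_bits
    index = max(0, min(levels - 1, round(value / step)))
    q = index * step
    return index, q, value - q

index, q, err = quantize(24.1, 8)
print(format(index, "08b"), q, round(err, 6))  # 00011000 24.0 0.1
```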
- Pulse Code Modulation (PCM)
- PCM performs sampling and quantization on audio signals.
- PCM is used in
- Digital telephone networks: a sampling rate of 8000 samples per second and 8 bits per sample give a data rate of 64 kbps (adopted in ITU-T G.711).
- Audio CD: a sampling rate of 44100 samples per second and 16 bits per sample give a data rate of 1.411 Mbps for stereo (two-channel) audio.
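Both PCM data rates follow from one multiplication (samples per second × bits per sample × channels), as this small helper shows; the function name is hypothetical.

```python
def pcm_bit_rate(sample_rate, bits_per_sample, channels=1):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate * bits_per_sample * channels

print(pcm_bit_rate(8000, 8))       # G.711 telephony: 64000 bit/s
print(pcm_bit_rate(44100, 16, 2))  # stereo audio CD: 1411200 bit/s (about 1.411 Mbps)
```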
2.4 Audio Compression
- 2.4.1 Differential PCM
- Differential PCM is a compressed version of PCM. It has a lower bit rate, but its voice quality may be poorer.
- Differential PCM
- Voice signal changes slowly compared with the
sampling rate.
- Successive sample values have a small
difference.
- Use fewer bits to encode the difference between
the current sample value and the previous one.
- Lower bit rate, but voice quality may be degraded
when voice amplitude changes abruptly.
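A minimal sketch of differential PCM on already-quantized integer samples, assuming encoder and decoder both start from 0 and the difference is clipped to a signed 4-bit range (-8..7). One design choice differs slightly from the wording above: the encoder takes the difference from the previously *reconstructed* value rather than the previous input sample, so that encoder and decoder stay in step. Function names are hypothetical.

```python
def dpcm_encode(samples, diff_bits):
    """Encode each sample as its difference from the previously reconstructed
    sample, clipped to a signed diff_bits-bit integer range."""
    lo, hi = -(2 ** (diff_bits - 1)), 2 ** (diff_bits - 1) - 1
    codes, prev = [], 0
    for s in samples:
        d = max(lo, min(hi, s - prev))
        codes.append(d)
        prev += d  # track exactly what the decoder will reconstruct
    return codes

def dpcm_decode(codes):
    """Rebuild samples by accumulating the transmitted differences."""
    out, prev = [], 0
    for d in codes:
        prev += d
        out.append(prev)
    return out

smooth = [0, 2, 3, 5, 6, 6, 4, 3]  # slowly changing signal
print(dpcm_encode(smooth, 4))  # [0, 2, 1, 2, 1, 0, -2, -1]

abrupt = [0, 2, 30, 31]  # sudden jump larger than a 4-bit difference can express
print(dpcm_decode(dpcm_encode(abrupt, 4)))  # [0, 2, 9, 16]
```

The slowly changing signal is reproduced exactly, while the abrupt jump cannot be followed, which is the quality loss described above.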
- Example
- For PCM in digital telephony, the sampling rate is 8000 samples/sec and 8 bits are used for each sample, so the data rate is 64 kbps.
- If differential PCM is adopted and 6 bits are used to encode the difference between successive sample values, the data rate is reduced to 8000 × 6 = 48 kbps.
- 2.4.2 Adaptive Differential PCM
- Adaptive differential PCM is an improved version of differential PCM.
- Main idea: when the voice amplitude changes steeply for a significant duration, switch to a larger quantization step (i.e., a larger difference between successive quantization values).
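The adaptive-step idea can be sketched as follows. The adaptation rule here is a hypothetical one chosen for clarity (double the step when the 4-bit code saturates, halve it when the code is small) and is not the rule specified in ITU-T G.721. The decoder repeats the same rule, so no step size needs to be transmitted. All names and parameters are assumptions.

```python
def _adapt(step, code, lo, hi, min_step, max_step):
    """Hypothetical step-size rule: grow on saturation, shrink on small codes."""
    if code in (lo, hi):
        return min(max_step, step * 2)
    if abs(code) <= 1:
        return max(min_step, step // 2)
    return step

def adpcm_encode(samples, code_bits=4, min_step=1, max_step=1024):
    """Quantize the difference from the previous reconstructed sample in units of step."""
    lo, hi = -(2 ** (code_bits - 1)), 2 ** (code_bits - 1) - 1
    codes, prev, step = [], 0, min_step
    for s in samples:
        code = max(lo, min(hi, round((s - prev) / step)))
        codes.append(code)
        prev += code * step  # value the decoder will reconstruct
        step = _adapt(step, code, lo, hi, min_step, max_step)
    return codes

def adpcm_decode(codes, code_bits=4, min_step=1, max_step=1024):
    """Mirror of adpcm_encode: applies the same step adaptation to rebuild samples."""
    lo, hi = -(2 ** (code_bits - 1)), 2 ** (code_bits - 1) - 1
    out, prev, step = [], 0, min_step
    for code in codes:
        prev += code * step
        out.append(prev)
        step = _adapt(step, code, lo, hi, min_step, max_step)
    return out

ramp = [0, 20, 40, 60, 80, 100, 120]  # amplitude rising steeply
print(adpcm_decode(adpcm_encode(ramp)))  # [0, 7, 21, 49, 81, 97, 121]
```

Passing `max_step=1` freezes the step size and reduces the sketch to plain differential PCM, which only reaches 42 by the last sample of the same ramp.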
- ITU-T G.721 adopts adaptive differential PCM with a sampling rate of 8000 samples per second and 4 bits for encoding the difference between successive sample values.
- The bit rate is 8000 × 4 = 32 kbps, yet the voice quality is only slightly worse than that of PCM at 64 kbps.
- 2.4.3 MP3
- CD audio has a data rate of 1.411 Mbps. A well-known compression method for CD audio is MP3.
- MP3: MPEG audio layer 3. (MPEG specifies three audio compression layers.)
- MP3 adopts perceptual coding to attain a high compression ratio while providing very good audio quality.
- Perceptual Coding
- It is based on the science of psychoacoustics,
which studies how people perceive sound.
- It exploits certain flaws in the human auditory system for compression, so that the compressed audio sounds about the same to humans even though its signal waveform may become quite different.
- 1st Flaw: Threshold of Audibility
- When a frequency component is very weak (i.e., its power is below a threshold), humans cannot hear it.
- [Figure: threshold of audibility, averaged over many people]
Compression: Omit the frequency components whose power falls below the threshold of audibility.
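A sketch of this first compression rule. Since the threshold curve itself is only shown as a figure, the code assumes a widely used curve fit for the threshold in quiet (due to Terhardt), in dB SPL; the component list is hypothetical (frequency in Hz, level in dB SPL).

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate threshold of audibility in dB SPL (assumed curve fit, not from the text)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4

def drop_inaudible(components):
    """Keep only (freq_hz, level_db_spl) components that lie above the threshold."""
    return [(f, level) for f, level in components if level > threshold_in_quiet_db(f)]

components = [(100, 20), (1000, 10), (3300, 0), (15000, 30)]
print(drop_inaudible(components))  # [(1000, 10), (3300, 0)]
```

The 20 dB component at 100 Hz and the 30 dB component at 15 kHz fall below the curve and are dropped, while a much weaker 0 dB component near 3.3 kHz, where hearing is most sensitive, survives.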
- 2nd Flaw: Frequency Masking
- Some sounds can mask other sounds: a loud sound in one frequency band hides a softer sound in a nearby frequency band.
- [Figure: masking effect]
Compression: Omit the masked frequency components.
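A toy version of frequency masking, for illustration only: a component is treated as masked if another component within `width_hz` is louder by more than `margin_db`. Both parameters and the rule itself are hypothetical simplifications; real psychoacoustic models, including the one in MP3, work with critical bands and level-dependent masking curves.

```python
def apply_frequency_masking(components, width_hz=200.0, margin_db=15.0):
    """Toy masking rule (hypothetical parameters): drop a (freq_hz, level_db) component
    if some component within width_hz is louder by more than margin_db.
    A component never masks itself because its level difference is 0."""
    kept = []
    for f, level in components:
        masked = any(abs(f - g) <= width_hz and other - level > margin_db
                     for g, other in components)
        if not masked:
            kept.append((f, level))
    return kept

components = [(1000, 70), (1100, 40), (1500, 40), (2000, 60)]
print(apply_frequency_masking(components))  # [(1000, 70), (1500, 40), (2000, 60)]
```

The 40 dB component at 1100 Hz sits next to a 70 dB component and is removed; the equally soft component at 1500 Hz has no loud neighbor and is kept.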
- 3rd Flaw: Temporal Masking
- When a masking sound ends, it takes a short time before the masked sound can be heard.
- [Figure: masking effect]
Compression: If the amplitudes of the masked frequency components are below the decay envelope, omit these components.
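A sketch of the temporal-masking rule under assumed numbers: after the masker stops, its masking threshold (the decay envelope) is taken to fall linearly in dB (i.e., exponentially in amplitude) at a hypothetical 0.5 dB per millisecond. Components that follow the masker are kept only if they reach the envelope. Names and rates are illustrative.

```python
def decay_envelope_db(t_ms, masker_end_ms, masker_level_db, decay_db_per_ms=0.5):
    """Hypothetical masking threshold after the masker stops: it starts at the
    masker's level and falls linearly in dB."""
    if t_ms < masker_end_ms:
        return masker_level_db
    return masker_level_db - decay_db_per_ms * (t_ms - masker_end_ms)

def drop_temporally_masked(events, masker_end_ms, masker_level_db):
    """events: (time_ms, level_db) components after the masker; omit those under the envelope."""
    return [(t, lvl) for t, lvl in events
            if lvl >= decay_envelope_db(t, masker_end_ms, masker_level_db)]

events = [(10, 50), (40, 50), (120, 30)]
print(drop_temporally_masked(events, masker_end_ms=0, masker_level_db=60))
```

With a 60 dB masker ending at 0 ms, the 50 dB component at 10 ms lies under the envelope (55 dB) and is omitted, while the later components are kept.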
- To compress audio with MP3, we select two options:
- Sampling rate: the waveform can be sampled at 32 kHz, 44.1 kHz or 48 kHz on one or two channels.
- Bit rate: typically 96 kbps, 128 kbps or 160 kbps.
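For a stereo CD source (44.1 kHz, 16 bits, two channels), the chosen MP3 bit rate directly fixes the compression ratio. A quick calculation, with a hypothetical helper name:

```python
def compression_ratio(original_bps, compressed_bps):
    """How many times smaller the compressed stream is."""
    return original_bps / compressed_bps

cd_rate = 44100 * 16 * 2  # stereo CD audio: 1,411,200 bit/s
for kbps in (96, 128, 160):
    print(kbps, round(compression_ratio(cd_rate, kbps * 1000), 1))
# 96 14.7
# 128 11.0
# 160 8.8
```

At 128 kbps, MP3 is therefore about 11 times smaller than the uncompressed CD stream.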
- Main Steps for Compression
- Sample the audio signal and divide the samples into groups of 1152 samples each.
- Pass each group through (i) 32 digital filters to get 32 frequency subbands, and (ii) a psychoacoustic model to determine the masked frequencies.
- Based on the available "bit budget" (which depends on the chosen bit rate), allocate more bits to the subbands with larger unmasked spectral power.
- Finally, use Huffman coding to encode the quantized values (i.e., assign shorter codewords to values that appear frequently).
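The final step can be illustrated with a generic Huffman code built from symbol counts. This is a sketch of the technique only: the MP3 standard defines its own fixed Huffman tables from which the encoder chooses, whereas this code derives a table from the data. The list of quantized subband values is hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie_breaker, {symbol: codeword_so_far})
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

# Hypothetical quantized subband values: small magnitudes dominate.
values = [0] * 8 + [1] * 4 + [-1] * 2 + [3] + [-4]
code = huffman_code(values)
bits = "".join(code[v] for v in values)
print(code)
print(len(bits))  # 30 bits, versus 16 * 3 = 48 bits with a fixed 3-bit code
```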