Audio Signal Processing MPEG1 Audio

1
Audio Signal Processing-- MPEG-1 Audio

Shyh-Kang Jeng
Department of Electrical Engineering/
Graduate Institute of Communication Engineering

2
History

Moving Picture Experts Group (MPEG)
Established in 1988
Part of ISO/IEC Joint Technical Committee 1 (JTC 1)
Develop standards for coded representation of
moving pictures and associated audio
Original work items
MPEG-1, up to 1.5 Mb/s (ISO/IEC 11172)
MPEG-2, up to 10 Mb/s (ISO/IEC 13818)
MPEG-3, up to 40 Mb/s
MPEG-3 was dropped in July 1992

3
History (cont.)

MPEG-4
First proposed in 1991
Approved in July 1993
Targets audiovisual coding at very low bit rates
Scalability, 3-D, etc.
ISO/IEC FDIS in 1999 (ISO/IEC 14496)
MPEG-7
Started in the Fall of 1996
Standardize the description of multimedia
content for multimedia database search
Scheduled to become ISO/IEC standard in 2001

4
MPEG-1 Audio Goals

Define coded representation of high quality audio
for storage media and a method for decoding high
quality audio signals
Input of the encoder and output of the decoder
are compatible with existing PCM standards such
as CD and DAT
Support one or two main audio channels
Sampling frequencies
32 kHz, 44.1 kHz, 48 kHz

5
MPEG-2 Audio Goals

Define the multichannel extension to MPEG-1 audio
(MPEG-2 BC, backward compatible)
Define an audio coding standard at lower sampling
frequencies (16 kHz, 22.05 kHz, 24 kHz) than
MPEG-1
Define a higher quality multichannel standard
than MPEG-1 extensions (MPEG-2 non-backward
compatible, NBC, later MPEG-2 Advanced Audio
Coding, AAC)

6
MPEG-4 Audio Goals

Provide high coding efficiency
Data rates from 2 kb/s/ch to 64 kb/s/ch
Provide content based interactivity
Flexible access and manipulation (e.g.,
pitch/speed modification)
Support synthetic audio and speech
Structured audio (SA)
Text to speech (TTS)
Provide additional effects
Post-processing, e.g., reverb, 3D

7
MPEG-1 Audio

Three-part compression standard
Compression of synchronized video and audio at a
total data rate of 1.5 Mb/s
Specification
Syntax of the coded bit stream
Decoding process
Compliance tests for assessing the accuracy of
the decoder
Encoder is not specified

8
MPEG-1 Audio (cont.)

Audio channel configurations
Monophonic mode for a single channel
Dual monophonic mode for two independent channels
Stereo mode in which the two channels are sharing
bits
Joint-stereo mode
Data rates
32 kb/s/ch to 224 kb/s/ch (depending on the
sampling rate, compression ratios up to 24:1)
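
The compression ratios quoted for MPEG-1 audio can be checked against the uncompressed PCM rate. A minimal sketch, assuming 16-bit PCM per channel (the figure the slides use for CD-quality input); the function names are illustrative and not part of the standard.

```python
def pcm_rate_kbps(sample_rate_hz, bits_per_sample=16):
    """Uncompressed PCM data rate for one channel, in kb/s."""
    return sample_rate_hz * bits_per_sample / 1000.0


def compression_ratio(sample_rate_hz, coded_kbps_per_ch, bits_per_sample=16):
    """Ratio of the PCM rate to the coded rate, both per channel."""
    return pcm_rate_kbps(sample_rate_hz, bits_per_sample) / coded_kbps_per_ch


# 48 kHz, 16-bit PCM is 768 kb/s per channel.
print(compression_ratio(48000, 32))   # lowest coded rate: 24.0
print(compression_ratio(48000, 128))  # 128 kb/s/ch: 6.0
```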

9
MPEG-1 Audio Layers

Layer I
Simplest configuration, 32 to 224 kb/s/ch
Best for data rates above 128 kb/s/ch
Used in Philips's DCC at 192 kb/s/ch
Layer II
Intermediate complexity, 32 to 192 kb/s/ch
Best for data rates around 128 kb/s/ch
Used in DAB, CD-Interactive, etc.
Layer III
Highest quality and complexity, 32 to 160 kb/s/ch
Best for data rates below 128 kb/s/ch
Used for transmission over ISDN, Internet, etc.

10
MPEG-1 Audio Layers (cont.)

Single-chip, real-time decoders exist for all
three layers
Layers II and III
Perceptually lossless at 128 kb/s/ch (compression
ratio of 6:1, 16 bits per sample, 48 kHz sampling
rate)
Selected by ITU-R TG 10/2 for broadcast
applications

11
MPEG-1 Encoder Building Blocks

32 sub-bands (Layers I and II), 576 sub-bands (Layer III)

12
MPEG-1 Decoder Building Blocks

13
MPEG-1 Filter Bank

A 512-tap OTDAC filter bank is common to all
layers
32 filters subdivide the audio signal into 32
equal-width bands
At low frequencies a single sub-band covers
several critical bands
The critical band with the highest SMR dictates
the number of quantization bits needed for the
entire sub-band
Small error (in absence of quantization, the
total ripple is

14
MPEG-1 Filter Bank (cont.)

Layer III filter bank
Cascaded with a 36-point MDCT for a total of 576
frequency lines in steady state conditions
A 12-point MDCT is used for transients
Frequency resolution at 48 kHz
Layers I and II: 750 Hz
Layer III: 42 Hz
Time resolution at 48 kHz
Layers I and II: 0.66 ms
Layer III: 4 ms
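
The resolution figures above follow from simple arithmetic. A minimal sketch, assuming frequency resolution is the Nyquist bandwidth divided by the number of bands or lines, and time resolution is a block length in samples divided by the sampling rate. The 192-sample block used for Layer III is an assumption, chosen because it reproduces the 4 ms figure; the function and parameter names are illustrative.

```python
def resolution(sample_rate_hz, num_bands, samples_per_block):
    """Return (frequency resolution in Hz, time resolution in ms)."""
    nyquist_hz = sample_rate_hz / 2
    freq_res_hz = nyquist_hz / num_bands
    time_res_ms = 1000 * samples_per_block / sample_rate_hz
    return freq_res_hz, time_res_ms


# Layers I and II: 32 sub-bands, one new sample per sub-band every 32 input samples.
print(resolution(48000, 32, 32))

# Layer III: 576 frequency lines; 192-sample block assumed for the time figure.
print(resolution(48000, 576, 192))
```

The first call gives 750 Hz and about 0.67 ms; the second gives about 41.7 Hz and 4 ms, in line with the rounded values on the slide.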

15
MPEG-1 Psychoacoustic Models

Two models
Model 1 (less complex)
Model 2 (includes specifics to support Layer III)
Either works for all layers
For low level of compression the psychoacoustic
models can be bypassed
Perform a separate analysis of the input signal
via a Hann-windowed FFT
Model 1: 512 points for Layer I, 1024 for Layers
II and III
Model 2: 1024 points for all layers
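
The windowed analysis can be illustrated with a short, self-contained sketch. It assumes a periodic Hann window and uses a direct O(N^2) DFT from the standard library in place of an FFT, so it shows the windowing step only and is not the analysis defined in the standard. The test tone, its bin, and the function names are hypothetical.

```python
import cmath
import math


def hann(n_points):
    """Periodic Hann window of length n_points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points)
            for n in range(n_points)]


def windowed_spectrum(frame):
    """Magnitudes of bins 0..N/2 of a Hann-windowed frame (direct DFT)."""
    n_points = len(frame)
    x = [s * w for s, w in zip(frame, hann(n_points))]
    mags = []
    for k in range(n_points // 2 + 1):
        acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                  for n in range(n_points))
        mags.append(abs(acc))
    return mags


N = 512          # Model 1 analysis length for Layer I
tone_bin = 40    # hypothetical test tone, centered exactly on a bin
frame = [math.cos(2 * math.pi * tone_bin * n / N) for n in range(N)]
mags = windowed_spectrum(frame)
peak = max(range(len(mags)), key=mags.__getitem__)
print(peak)  # index of the strongest bin
```

With a tone centered on a bin, the Hann window spreads energy only into the two neighboring bins, which makes local peaks easy to pick out.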

16
MPEG-1 Psychoacoustic Models (cont.)

The analysis needs to be time-aligned with the
main time to frequency mapping of the input
signal
Model 1 provides one evaluation per frame
Model 2 provides two evaluations per frame
The first centered on the first half of the main
path data
The second centered on the second half of the
main path data
The highest SMR is chosen

17
MPEG-1 Psychoacoustic Models (cont.)

Separation of noise-like vs. tone-like signal
components
Model 1: based on local peaks; the remaining
spectral values are added into a non-tonal
component per critical band
Model 2: uses data from the two previous windows
to predict the component values for the current
window
Spreading of masking through adjacent bands

18
MPEG-1 Layers Frames

19
Bitstreams for MPEG-1 Audio Layers

20
MPEG-1 Layer I

Frame size
12 samples/subband × 32 subbands = 384 samples
(corresponds to 8 ms at 48 kHz)
1 scale factor and 1 bit allocation for each
12-sample sub-block
Bit allocation
4 bits per allocation (values 0, ..., 15)
4 × 32 = 128 bits per channel
Scale factors
6 bits per scale factor
Up to 6 × 32 = 192 bits per channel

21
MPEG-1 Layer II

Frame size
3 × 12 samples/subband × 32 subbands = 1152 samples
(corresponds to 24 ms at 48 kHz)
1 bit allocation and up to 3 scale factors for
each 36 subband samples
Bit allocation
2 to 4 bits per subband (more bits for lower
subbands), totaling 26 to 188 bits/frame
Scale factors
Up to 6 × 32 × 3 = 576 bits per channel
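
The Layer I and Layer II frame figures can be reproduced with a few lines of arithmetic. A minimal sketch using only the numbers on these slides (32 subbands, 12-sample blocks, 6-bit scale factors); the function and constant names are illustrative.

```python
SUBBANDS = 32
SCALE_FACTOR_BITS = 6
LAYER1_ALLOC_BITS = 4 * SUBBANDS  # 4-bit allocation per subband: 128 bits


def frame_layout(layer, sample_rate_hz=48000):
    """Return (samples per frame, duration in ms, max scale-factor bits per channel)."""
    # Number of 12-sample blocks per subband: 1 for Layer I, 3 for Layer II.
    blocks = {1: 1, 2: 3}[layer]
    samples = blocks * 12 * SUBBANDS
    duration_ms = 1000 * samples / sample_rate_hz
    max_scale_factor_bits = SCALE_FACTOR_BITS * SUBBANDS * blocks
    return samples, duration_ms, max_scale_factor_bits


print(frame_layout(1))  # (384, 8.0, 192)
print(frame_layout(2))  # (1152, 24.0, 576)
```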

22
MPEG-1 Layer III

Frame size
Same as Layer II, 1152 samples
Non-uniform quantization
Raises input to the 3/4 power (the decoder
relinearizes the values by raising them to 4/3),
i.e., larger values are quantized with less accuracy
Scale factor bands cover approximately critical
bandwidths
Dynamic noise allocation

23
MPEG-1 Layer III (cont.)

Entropy coding
At high frequencies a series of zeros is coded by
run length coding
The count1 region codes values no greater than 1
in absolute value with 4-D Huffman coding
The big values region covers the remaining values
with 2-D Huffman coding
Bit reservoir
Uses a 9-bit pointer to the starting byte of the
data for that frame
Buffer length: 7680 bits
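
The three entropy-coding regions can be located by scanning the quantized spectrum from the high-frequency end. A minimal sketch, assuming an even number of lines (as with 576), zeros trimmed in pairs and count1 values grouped in quadruples. It only finds the region boundaries and does no Huffman or run-length coding; the function name and the `rzero` label are illustrative.

```python
def partition_regions(quantized):
    """Return (big_values, count1, rzero) region lengths, low to high frequency."""
    end = len(quantized)
    # Trailing zeros, removed in pairs so the rest splits into pairs and quadruples.
    while end >= 2 and quantized[end - 1] == 0 and quantized[end - 2] == 0:
        end -= 2
    rzero = len(quantized) - end
    # Below the zeros: quadruples whose magnitudes are all at most 1.
    count1_end = end
    while end >= 4 and all(abs(v) <= 1 for v in quantized[end - 4:end]):
        end -= 4
    count1 = count1_end - end
    # Everything that is left is coded in pairs as big values.
    return end, count1, rzero


spectrum = [7, -3, 4, 2, 1, 0, -1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(partition_regions(spectrum))  # (6, 4, 6)
```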

24
Stereo Coding

Two types
Layers I, II, and III: intensity stereo coding
Layer III: Mid/Side (M/S) coding
Intensity stereo
Upper-frequency subbands are coded with a
single summed signal, and a scale factor for each
channel subband is transmitted
M/S stereo
In certain frequency ranges the sum and difference
of the two channels are coded instead of the
channels themselves.
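
M/S coding is a lossless change of basis before quantization. A minimal sketch, assuming the common 1/sqrt(2) normalization for the sum and difference signals; the function names are illustrative, and quantization is left out.

```python
import math

SCALE = 1 / math.sqrt(2)


def ms_encode(left, right):
    """Convert left/right samples to mid (sum) and side (difference) signals."""
    mid = [(l + r) * SCALE for l, r in zip(left, right)]
    side = [(l - r) * SCALE for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right samples from mid and side signals."""
    left = [(m + s) * SCALE for m, s in zip(mid, side)]
    right = [(m - s) * SCALE for m, s in zip(mid, side)]
    return left, right


left = [0.5, -0.25, 0.75]
right = [0.5, -0.25, 0.75]   # identical channels: the side signal is all zeros
mid, side = ms_encode(left, right)
print(side)
```

When the two channels are highly correlated, the side signal carries little energy, so fewer bits are needed for it than for a separate second channel.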