Title: Concepts of Multimedia Processing and Transmission
1. Concepts of Multimedia Processing and Transmission
- IT 481, Lecture 7
- Dennis McCaughey, Ph.D.
- 19 March, 2007
2. Digital Video Broadcasting (DVB) Systems
Slide Courtesy, Hung Nguyen
3. Processing of the Streams in the Set-Top Box (STB)
Slide Courtesy, Hung Nguyen
4. Multimedia Communications Standards and Applications
Slide Courtesy, Hung Nguyen
5. Video Coding Standards
- ITU H.261 for Video Teleconference (VTC)
- ITU H.263 for VTC over POTS
- ITU H.262 for VTC over ATM/broadband and digital TV networks
- ISO MPEG-1 for movies on CD-ROM (VCD)
  - 1.2 Mbps for video coding and 256 Kbps for audio coding
- ISO MPEG-2 for broadcast-quality video on DVD
  - 2-15 Mbps allocated for audio and video coding
- Low-bit-rate telephony over POTS
  - 10 Kbps for video and 5.3 Kbps for audio
- MPEG-4 for the Internet and mobile communication
  - Very Low Bit Rate (VLBR) coding designed to be compatible with H.263
- MPEG-7, the multimedia content description interface
  - Description schemes and a description definition language for integrated multimedia search engines
Slide Courtesy, Hung Nguyen
6. History
- H.261
  - First video coding standard, targeted at video conferencing over ISDN
  - Uses a block-based hybrid coding framework with integer-pixel MC
- H.263
  - Improved quality at lower bit rates, to enable video conferencing/telephony below 54 kbps (modems, desktop conferencing)
  - Half-pixel MC and other improvements
- MPEG-1 video
  - Video on CD and video on the Internet (good quality at 1.5 Mbps)
  - Half-pixel MC and bidirectional MC
- MPEG-2 video
  - SDTV/HDTV/DVD (4-15 Mbps)
  - Extended from MPEG-1 to handle interlaced video
Slide Courtesy, Hung Nguyen
7. H.261 Video Coding Standard
- For video conferencing/video phone
  - Video coding standard in H.320 (VTC over the switched phone network), which is an umbrella recommendation
  - Low delay (real-time, interactive)
  - Slow motion in general
- For transmission over ISDN
  - Fixed bandwidth of p×64 Kbps, p = 1, 2, ..., 30
- Video format
  - CIF (Common Intermediate Format, 352x288, above 128 Kbps)
  - QCIF (Quarter CIF, 176x144, 64-128 Kbps)
  - 4:2:0 color format, progressive scan
- Published in 1990
- Each macroblock can be coded in intra- or inter-mode
  - Periodic insertion of intra-mode macroblocks to eliminate error propagation due to network impairments
  - Integer-pixel accuracy motion estimation in inter-mode
Slide Courtesy, Hung Nguyen
8. H.261 Encoder
- In the encoder diagram, F denotes the loop filter and P the motion estimation and compensation
- The loop filter applies a low-pass filter to smooth the quantization noise in previously reconstructed frames before motion estimation and compensation
Slide Courtesy, Hung Nguyen
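As a rough illustration of the loop-filter idea, the sketch below applies a separable [1 2 1]/4 low-pass kernel to a reconstructed 8x8 block, leaving border pixels unfiltered; the kernel and function name are illustrative assumptions, not the exact H.261 filter definition.

```python
import numpy as np

def loop_filter(block):
    """Separable [1 2 1]/4 low-pass filter in the spirit of the H.261
    loop filter; border pixels are left unfiltered."""
    src = block.astype(np.float64)
    out = src.copy()
    # horizontal pass on interior columns
    out[:, 1:-1] = (src[:, :-2] + 2 * src[:, 1:-1] + src[:, 2:]) / 4.0
    # vertical pass on interior rows, using the horizontally filtered result
    tmp = out.copy()
    out[1:-1, :] = (tmp[:-2, :] + 2 * tmp[1:-1, :] + tmp[2:, :]) / 4.0
    return out

block = np.random.randint(0, 256, (8, 8))   # a reconstructed 8x8 block
smoothed = loop_filter(block)               # used for prediction instead of the raw block
```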
9. Picture Frames - Overview
- Three frame types: I-Picture (intra-frame picture), P-Picture (inter-frame predicted picture) and B-Picture (bi-directional predicted/interpolated picture)
- An I-Picture is coded by intra-frame coding: only the spatial redundancy within the picture is reduced, without referencing other pictures. The coding process is very similar to the JPEG standard, so encoding an I-Picture is less complex than encoding a P-frame or B-frame.
- The basic coding unit is an 8 by 8 block. A macroblock consists of six blocks: 4 blocks of luminance (Y), one block of Cb chrominance, and one block of Cr chrominance.
Slide Courtesy, Hung Nguyen
10. Frame Types
- Intra-coded frames (I-frames)
  - Level of compression is relatively small: 10:1 to 20:1
  - Present at regular intervals to limit the extent of errors
  - The number of frames between I-frames is known as the Group of Pictures (GOP)
- Inter-coded frames
  - Predicted frames (P-frames)
    - Significant compression level achieved here
    - Errors are propagated
    - 20:1 to 30:1 compression ratio
  - Bidirectional frames (B-frames)
    - Highest levels of compression achieved
    - B-frames are not used for prediction, thus errors are not propagated
    - 30:1 to 50:1 compression ratio
11. Macroblocks and Color Sub-sampling Schemes
A macroblock consists of four 8x8 luminance blocks plus the associated chrominance blocks (two in the 4:2:0 scheme)
Slide Courtesy, Hung Nguyen
12. Sub-sampling of Chrominance Information
- Transforming (R,G,B) → (Y,Cb,Cr) provides two advantages:
  - 1) The human visual system (HVS) is more sensitive to the Y component than to the Cb or Cr components.
  - 2) Cb and Cr are far less correlated with Y than R is with G, R with B, and B with G, thus reducing TV transmission bandwidth.
- Cb and Cr both require far less bandwidth and can be sampled more coarsely (Shannon).
- By doing so we can reduce the data rate without noticeably affecting perceived visual quality.
Slide Courtesy, Hung Nguyen
13. Color Space Conversion
- In general, each pixel in a picture consists of three components: R (red), G (green), B (blue). (R,G,B) must be converted to (Y,Cb,Cr) in MPEG-1 before processing.
- We can view the color value of each pixel in either the RGB color space or the YCbCr color space.
- Because the (Y,Cb,Cr) components are less correlated than (R,G,B), coding with (Y,Cb,Cr) is more efficient.
- (Y,U,V) is also used to denote (Y,Cb,Cr); strictly, however, (Y,U,V) refers to the analog TV equivalent.
Slide Courtesy, Hung Nguyen
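A minimal sketch of the conversion and chroma sub-sampling described on the last two slides, assuming BT.601-style coefficients and simple 2x2 averaging for 4:2:0; real encoders may use different rounding and filtering.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range RGB -> YCbCr using BT.601-style coefficients (assumed)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b + 128.0
    cr =  0.500 * r - 0.419 * g - 0.081 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

def subsample_420(plane):
    """4:2:0 sub-sampling: average each 2x2 block of chroma samples."""
    h, w = plane.shape
    return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rgb = np.random.randint(0, 256, (288, 352, 3)).astype(np.float64)  # one CIF frame
ycbcr = rgb_to_ycbcr(rgb)
cb = subsample_420(ycbcr[..., 1])   # 144x176 chroma plane
cr = subsample_420(ycbcr[..., 2])
```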
14. RGB Image
15. Compressed Image (QSF = 24)
16. Luminance Plane (Y)
17. Blue Chrominance Plane (Cb)
18. Red Chrominance Plane (Cr)
19. Red
20. Green
21. Blue
22. DCT (Discrete Cosine Transform)
- The DCT converts data from the spatial domain to the frequency domain. The higher-frequency coefficients can be quantized more coarsely without a perceived loss of image quality, because the HVS is less sensitive to the higher frequencies and they contain less energy.
- The DCT coefficient at location (0,0) is called the DC coefficient; the other values are called AC coefficients. In general, a larger quantization step is used for the higher AC coefficients. Higher precision is required for the DC term in order to avoid blocking in the reconstructed image.
- MPEG-1 uses an 8x8 DCT. This transform converts an 8 by 8 pixel block into another 8 by 8 block of coefficients. In general most of the energy is concentrated in the top-left corner.
- After quantizing the transformed matrix, most entries are typically zero, so a zig-zag scan followed by run-length coding can achieve a high compression ratio.
Slide Courtesy, Hung Nguyen
23. Transform Coding (TC)
- Pack the signal energy into as few transform coefficients as possible
- The DCT yields nearly optimal energy concentration
- A 2-dimensional DCT with a block size of 8x8 pixels is commonly used in today's image coders
- The transform is followed by quantization and entropy coding
Slide Courtesy, Hung Nguyen
24. 2D DCT and IDCT

Forward DCT:
F(u,v) = (1/4) C(u) C(v) Σ_{x=0..7} Σ_{y=0..7} f(x,y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16]

Inverse DCT:
f(x,y) = (1/4) Σ_{u=0..7} Σ_{v=0..7} C(u) C(v) F(u,v) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16]

where C(k) = 1/√2 for k = 0, C(k) = 1 otherwise, and u, v, x, y = 0, 1, 2, ..., 7
Slide Courtesy, Hung Nguyen
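A small numerical check of the 8x8 DCT/IDCT pair above, using SciPy's type-II/III DCT with orthonormal scaling, which matches the formulas up to the standard normalization.

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    """2-D type-II DCT with orthonormal scaling (rows, then columns)."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    """2-D inverse DCT, the exact inverse of dct2 above."""
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.random.randint(0, 256, (8, 8)).astype(np.float64)
coeffs = dct2(block - 128.0)                      # level shift, then transform
assert np.allclose(idct2(coeffs) + 128.0, block)  # round trip recovers the block
print(coeffs[0, 0])                               # DC term carries most of the block energy
```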
25. DCT Scan Modes
- The zig-zag scan used in MPEG-1 is suitable for progressive images, where frequency components have equal importance in the horizontal and vertical directions. (Frame pictures only)
- In MPEG-2, an alternate scan is introduced because interlaced images tend to have higher-frequency components in the vertical direction. Thus, the scanning order weighs the higher vertical frequencies more than the corresponding horizontal frequencies. Selection between these two scan orders can be made on a picture basis. (Frame and field pictures allowed)
Slide Courtesy, Hung Nguyen
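To make the scan-order idea concrete, this sketch generates the classic zig-zag order for an 8x8 block by walking the anti-diagonals; the MPEG-2 alternate scan is a fixed table in the standard and is not reproduced here.

```python
import numpy as np

def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    coords = [(r, c) for r in range(n) for c in range(n)]
    # sort by anti-diagonal; alternate the traversal direction on each diagonal
    return sorted(coords, key=lambda rc: (rc[0] + rc[1],
                                          rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def zigzag_scan(block):
    """Flatten a quantized coefficient block in zig-zag order."""
    return [block[r, c] for r, c in zigzag_order(block.shape[0])]

q = np.zeros((8, 8), dtype=int)
q[0, 0], q[0, 1], q[1, 0] = 42, 5, -3     # a few surviving low-frequency coefficients
print(zigzag_scan(q)[:6])                 # [42, 5, -3, 0, 0, 0]
```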
26. Motion Compensation
- Try to match each block in the current picture to content in the previous picture. Matching is done by shifting each of the 8 x 8 blocks of the two successive pictures pixel by pixel in each direction → motion vector
- Subtract the two blocks → difference block
- Transmit the motion vector and the difference block
Slide Courtesy, Hung Nguyen
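A brute-force block-matching sketch for one block, using the sum of absolute differences (SAD) over a small search window; block size, search range and function names are illustrative (MPEG encoders typically match 16x16 macroblocks and use faster search strategies).

```python
import numpy as np

def full_search(cur, ref, top, left, block=8, search=7):
    """Exhaustive motion search for one block: return the motion vector with
    the smallest sum of absolute differences (SAD), plus the residual block."""
    target = cur[top:top + block, left:left + block].astype(np.int64)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + block > ref.shape[0] or c + block > ref.shape[1]:
                continue                                   # candidate outside the picture
            cand = ref[r:r + block, c:c + block].astype(np.int64)
            sad = int(np.abs(target - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    dy, dx = best_mv
    residual = target - ref[top + dy:top + dy + block,
                            left + dx:left + dx + block].astype(np.int64)
    return best_mv, residual      # transmit these instead of the raw block

ref = np.random.randint(0, 256, (64, 64))
cur = np.roll(ref, (2, -1), axis=(0, 1))       # simulate a small uniform motion
mv, res = full_search(cur, ref, top=16, left=16)
print(mv)                                      # (-2, 1): offset of the best match in ref
```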
27. Quantization
- In MPEG-1, a matrix called the quantizer, Q[i,j], defines the quantization step. If X[i,j] is the DCT matrix with the same size as Q[i,j], X[i,j] is divided by Q[i,j] × QSF to obtain the quantized value matrix Xq[i,j]. QSF is the Quantization Scale Factor.
- Quantization equation:
  - Xq[i,j] = Round( X[i,j] / (Q[i,j] × QSF) )
- Inverse quantization (dequantization) reconstructs an approximation of the original value.
- Inverse quantization equation:
  - X'[i,j] = QSF × Xq[i,j] × Q[i,j]
- The difference between the actual value and the value reconstructed from the quantized value is called the quantization error. In general, if Q[i,j] is carefully designed, visual quality will not be affected.
Slide Courtesy, Hung Nguyen
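The two equations above written out directly; the all-16 quantizer matrix below is a placeholder rather than the actual MPEG-1 visually weighted matrix.

```python
import numpy as np

def quantize(X, Q, qsf):
    """Xq[i,j] = Round( X[i,j] / (Q[i,j] * QSF) )"""
    return np.round(X / (Q * qsf)).astype(int)

def dequantize(Xq, Q, qsf):
    """X'[i,j] = QSF * Xq[i,j] * Q[i,j]"""
    return qsf * Xq * Q

Q = np.full((8, 8), 16.0)              # placeholder quantizer matrix (not the MPEG-1 one)
X = np.random.randn(8, 8) * 200.0      # stand-in DCT coefficient block
Xq = quantize(X, Q, qsf=2.0)
err = X - dequantize(Xq, Q, qsf=2.0)   # quantization error, at most Q*QSF/2 per coefficient
print(np.abs(err).max() <= 16.0)       # True for Q = 16, QSF = 2
```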
28. Quantization (cont'd)
Slide Courtesy, Hung Nguyen
29. Average Distribution of AC Coefficients
30. MPEG (Moving Picture Experts Group)
- Established in January 1988
- Operates in the framework of the joint ISO/IEC technical committee
  - ISO: International Organization for Standardization
  - IEC: International Electrotechnical Commission
- First meeting was in May 1988, with 25 experts participating
- Has grown to 350 experts from 200 companies in some 20 countries
- As a rule, MPEG meets in March, July and November; it can meet more often as needed
Slide Courtesy, Hung Nguyen
31. MPEG-1: Coding of Moving Pictures and Associated Audio
- Request for Proposal (RFP) July 1989
- Adopted in 1993
- Coding of the audiovisual signal at 1.5 Mbps
- Audio coding (separate from speech coding) at 256 Kbps per channel PCM
- Five parts: systems, video, audio, conformance testing and software simulation
Slide Courtesy, Hung Nguyen
32. MPEG-1 Overview
- In MPEG-1, video is represented as a sequence of pictures, and each picture is treated as a two-dimensional array of pixels
- The color of each pixel consists of three components: Y (luminance), Cb and Cr (two chrominance components)
- Composite video, aka baseband video or RCA video, is the analog waveform that conveys the image data in a conventional National Television System Committee (NTSC) television signal
- Composite video contains chrominance (hue and saturation) and luminance (brightness) information, along with synchronization and blanking pulses
- To achieve a high compression ratio, MPEG-1 must use hybrid coding techniques to reduce both spatial redundancy and temporal redundancy
Slide Courtesy, Hung Nguyen
33. MPEG-1 Overview
- Audio/video on CD-ROM (1.5 Mbps, SIF 352x240)
- Maximum 1.856 Mbps, 768x576 pixels
- Work started late 1988, test in 10/89, Committee Draft 9/90
- ISO/IEC 11172 parts 1-5 (systems, video, audio, compliance, software)
- Prompted an explosion of digital video applications: MPEG-1 video CD and downloadable video over the Internet
- Software-only decoding, made possible by the introduction of Pentium chips, was key to its success in the commercial market
- MPEG-1 Audio
  - Offers 3 coding options (3 layers); higher layers have higher coding efficiency at the cost of more computation
  - MP3 = MPEG-1 Layer 3 audio
Slide Courtesy, Hung Nguyen
34. MPEG-2 vs. MPEG-1
- MPEG-2 is a superset of MPEG-1.
- Generally, MPEG-1 is used for CD-ROM or Video CD (VCD) and MPEG-2 is used for broadcast or DVD.
- One difference between MPEG-1 and MPEG-2 is that MPEG-2 implements variable bit rate.
- MPEG-2 is also what's known as a closed format, meaning that a license fee must be paid to use the decoding algorithms, whereas MPEG-1 can be implemented free of charge.
Slide Courtesy, Hung Nguyen
35. MPEG-2 vs. MPEG-1 (cont'd)
- MPEG-1 only handles progressive sequences, specified by the Source Input Format (SIF).
- MPEG-2 is targeted primarily at interlaced sequences (as opposed to progressive for MPEG-1) and at higher resolutions.
- Different DCT modes and scanning methods are defined for interlaced sequences.
- More sophisticated motion estimation methods (frame/field prediction modes) are defined to improve estimation accuracy for interlaced sequences.
- MPEG-2 has various scalability modes.
- MPEG-2 has various profiles and levels, each combination targeted at a different application.
Slide Courtesy, Hung Nguyen
36. MPEG Encoding
- Frame types
  - I (Intra): encode the complete image, similar to JPEG
  - P (Forward Predicted): motion relative to previous I's and P's
  - B (Backward Predicted): motion relative to previous and future I's and P's
Slide Courtesy, Hung Nguyen
37. Frame Reconstruction (I and P Frames Only)
- I frame: complete image
- P frames: provide a series of updates to the most recent I frame
Slide Courtesy, Hung Nguyen
38. Using Forward-Backward Prediction
- If only forward prediction is used, there are uncovered areas (such as the block behind the car in Frame N) for which we may not be able to find a good match in the previous reference picture (Frame N-1).
- On the other hand, backward prediction can properly predict these uncovered areas, since they are available in the future reference picture, i.e. frame N+1 in this example.
- New objects, such as an airplane moving into the picture, cannot be predicted from the previous picture, but can be predicted from the future picture.

(Figure: backward prediction vs. forward prediction)
Slide Courtesy, Hung Nguyen
39. Frame Reconstruction (cont'd)
- B frames interpolate between frames represented by I's and P's
Slide Courtesy, Hung Nguyen
40. Transmission Order of the Frames
Slide Courtesy, Hung Nguyen
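Because a B frame cannot be decoded until both of its reference frames have arrived, the coded (transmission) order differs from the display order. Below is a small sketch of that reordering for a generic I/B/B/P sequence; the frame pattern and function name are illustrative.

```python
def transmission_order(display_frames):
    """Reorder display-order frames so every B frame follows the two
    reference frames (I or P) it is predicted from."""
    out, held_b = [], []
    for frame in display_frames:              # frame = (type, display index)
        if frame[0] == 'B':
            held_b.append(frame)              # hold until the next reference arrives
        else:                                 # I or P: emit it, then the held B frames
            out.append(frame)
            out.extend(held_b)
            held_b = []
    return out + held_b

display = [('I', 0), ('B', 1), ('B', 2), ('P', 3), ('B', 4), ('B', 5), ('P', 6)]
print(transmission_order(display))
# [('I', 0), ('P', 3), ('B', 1), ('B', 2), ('P', 6), ('B', 4), ('B', 5)]
```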
41. Intra-frame Encoding Process
- Decompose the image into its three components in RGB space
- Convert RGB to YCbCr
- Divide the image into macroblocks (each macroblock has 6 blocks: 4 for Y, 1 for Cb, 1 for Cr)
- Apply the DCT to each block
- After the DCT, quantize each coefficient
- Use a zig-zag scan to gather the AC values
- Use DPCM to encode the DC value, then encode it with VLC
- Use RLE to encode the AC values, then encode them with VLC (a sketch of this run/level step follows below)
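A sketch of the run/level step named in the last bullet: the zig-zag scanned AC coefficients are turned into (zero-run, level) pairs before the VLC table lookup, which is omitted here since the tables live in the standard.

```python
def run_length_encode(ac_coeffs):
    """Convert zig-zag scanned AC coefficients into (zero_run, level) pairs,
    finishing with an end-of-block marker once only zeros remain."""
    pairs, run = [], 0
    for level in ac_coeffs:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append('EOB')          # end of block: all remaining coefficients are zero
    return pairs

print(run_length_encode([5, -3, 0, 0, 2, 0, 0, 0]))
# [(0, 5), (0, -3), (2, 2), 'EOB']
```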
42. I-Picture Encoding Flow Chart
Slide Courtesy, Hung Nguyen
44. Inter-frame Coding
- The pictures that use inter-frame coding are the P pictures and B pictures
- Coding of P pictures is more complex than coding of I pictures, since motion-compensated macroblocks may be constructed
- The difference between the motion-compensated macroblock and the current macroblock is transformed with a 2-dimensional DCT, giving an array of 8 by 8 transform coefficients
- The coefficients are quantized to produce a set of quantized coefficients. The quantized coefficients are then encoded using a run-length/value technique.
Slide Courtesy, Hung Nguyen
45. Inter-frame Encoding Process
- Decompose the image into its three components in RGB space
- Convert RGB to YCbCr
- Perform motion estimation to record the difference between the frame being encoded and the reference frame stored in the frame buffer
- Divide the image into macroblocks (each macroblock has 6 blocks: 4 for Y, 1 for Cb, 1 for Cr)
- Apply the DCT to each block
- Quantize each coefficient
- Use a zig-zag scan to gather the AC values
- Reconstruct the frame and store it in the frame buffer if necessary
- Apply DPCM to encode the DC value, then encode it with VLC
- Use RLE to encode the AC values, then encode them with VLC
Slide Courtesy, Hung Nguyen
46. Predictive Coding
- Predictive coding is a technique to reduce statistical redundancy: the current value is used to predict the next value, and their difference (called the prediction error) is coded. The more precisely the next value is predicted, the smaller the prediction error.
- The prediction error can therefore be encoded with fewer bits than the actual value. MPEG-1 uses DPCM (Differential Pulse Code Modulation), a kind of predictive coding, and applies it only to the DC coefficients.
Slide Courtesy, Hung Nguyen
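A minimal DPCM sketch for the DC coefficients of consecutive blocks, as described above: each DC value is predicted by the previous one and only the difference is passed on to the VLC stage. Resetting the predictor to zero at the start is an assumption made for the example.

```python
def dpcm_encode(dc_values):
    """Differences between consecutive DC coefficients (predictor = previous DC)."""
    prev, diffs = 0, []          # predictor assumed to start at 0
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def dpcm_decode(diffs):
    """Invert dpcm_encode by accumulating the differences."""
    prev, out = 0, []
    for d in diffs:
        prev += d
        out.append(prev)
    return out

dc = [120, 122, 119, 119, 125]
assert dpcm_decode(dpcm_encode(dc)) == dc   # lossless round trip
```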
47. Motion Compensation (MC) and Motion Estimation (ME)
- Motion estimation predicts the value of a block of pixels in the next picture using a block in the current picture. The location difference between these blocks is called the motion vector, and the difference between the two blocks is called the prediction error.
- In MPEG-1, the encoder must calculate the motion vector and the prediction error. When the decoder obtains this information, it can use it together with the current picture to reconstruct the next picture.
- This reconstruction process is usually called motion compensation. In general, motion compensation is the inverse process of motion estimation.
Slide Courtesy, Hung Nguyen
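The decoder-side counterpart of motion estimation, sketched under the same assumptions as the block-matching example on the Motion Compensation slide: given the motion vector and the decoded prediction error, the block is rebuilt from the reference picture. Variable and function names are illustrative.

```python
import numpy as np

def motion_compensate(ref, top, left, mv, residual):
    """Rebuild one block: prediction taken from the reference picture at the
    motion-vector offset, plus the decoded prediction error."""
    dy, dx = mv
    n = residual.shape[0]
    pred = ref[top + dy:top + dy + n, left + dx:left + dx + n].astype(np.int64)
    return np.clip(pred + residual, 0, 255).astype(np.uint8)

# usage with the full_search() sketch shown earlier:
#   mv, res = full_search(cur, ref, top, left)
#   rebuilt = motion_compensate(ref, top, left, mv, res)   # equals the current block
```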
48. Motion Estimation (ME)
Slide Courtesy, Hung Nguyen
49. Motion Compensation (MC)
Slide Courtesy, Hung Nguyen
50. P-Frame Encoding: Macroblock Structure
51. P-Frame Encoding: Encoding Procedure
52. Example Frame Sequences
- I and P frames only
- I, P and B frames
53. Coding of P Pictures
- As with I pictures, the encoder needs to store the decoded P picture, since it may be used as the starting point for motion compensation. Therefore, the encoder reconstructs the image from the quantized coefficients.
- In coding P pictures, the encoder has more decisions to make than in the case of I pictures:
  - Selection of macroblock type: there are 8 types of macroblock in P pictures.
  - Motion compensation decision: the encoder can choose whether or not to transmit motion vectors for predictive-coded macroblocks.
  - Intra/non-intra coding decision.
  - Coded/not coded decision: after quantization, if all the coefficients in a block are zero, the block is not coded.
  - Quantizer/no quantizer decision: the quantizer scale can be altered, which affects picture quality.
Slide Courtesy, Hung Nguyen
54. The Inter-frame Encoding Flow Chart
Slide Courtesy, Hung Nguyen
55. Coding of P Pictures (cont'd)
Slide Courtesy, Hung Nguyen
56. Coding of B Pictures
- B pictures are divided into slices in the same way as I and P pictures. Since B pictures are not used as a reference for motion compensation, errors in B pictures are slightly less important than in I or P pictures. Consequently, it may be appropriate to use fewer slices for B pictures.
57. Decisions to be Made When Coding B Pictures
- Selection of macroblock type: there are 12 types of macroblock in B pictures. Compared with P pictures, there are extra types due to the introduction of the backward motion vector. If both the forward and backward motion vectors are present, then motion-compensated macroblocks are constructed from both the previous and future pictures, and the result is averaged to form the "interpolated" motion-compensated macroblock.
- Selection of motion compensation mode
- Intra/non-intra coding decision
- Coded/not coded decision
Slide Courtesy, Hung Nguyen
58. Coding of B-Pictures
59. Variable Length Coding (VLC)
- In MPEG-1, the last of the encoding processes is to use a Huffman code to reduce data redundancy, and the first step in the decoding process is to decode the VLC to reconstruct the image data.
- Encoding and decoding with a Huffman code must refer to a code table with two entries: the original data and the corresponding codeword.
- In the MPEG-1 standard, multiple code tables are defined (MPEG-1 Part 2, Annex C). The use of multiple code tables improves the compression ratio.
Slide Courtesy, Hung Nguyen
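A compact Huffman-table construction over symbol frequencies, illustrating why frequent symbols (e.g. short zero runs with small levels) get short codewords; MPEG-1 itself uses the fixed tables of Annex C rather than building codes on the fly, so the frequencies below are made up.

```python
import heapq

def huffman_code(freqs):
    """Build a Huffman codeword table {symbol: bitstring} from frequencies."""
    heap = [[f, i, {sym: ''}] for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)     # two least frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in t1.items()}
        merged.update({s: '1' + c for s, c in t2.items()})
        heapq.heappush(heap, [f1 + f2, tie, merged])
        tie += 1
    return heap[0][2]

# made-up frequencies for a few (run, level) symbols and the end-of-block marker
table = huffman_code({'EOB': 40, (0, 1): 30, (0, -1): 20, (1, 1): 7, (2, 3): 3})
print(table)          # frequent symbols receive the shortest codewords
```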
60. MPEG-2 vs. MPEG-1
- MPEG-2 is a superset of MPEG-1.
- Generally, MPEG-1 is used for CD-ROM or Video CD (VCD) and MPEG-2 is used for broadcast or DVD.
- One difference between MPEG-1 and MPEG-2 is that MPEG-2 implements variable bit rate.
- MPEG-2 is also what's known as a closed format, meaning that a license fee must be paid to use the decoding algorithms, whereas MPEG-1 can be implemented free of charge.
Slide Courtesy, Hung Nguyen
61. MPEG-2 vs. MPEG-1 (cont'd)
- MPEG-1 only handles progressive sequences, specified by the Source Input Format (SIF).
- MPEG-2 is targeted primarily at interlaced sequences (as opposed to progressive for MPEG-1) and at higher resolutions.
- Different DCT modes and scanning methods are defined for interlaced sequences.
- More sophisticated motion estimation methods (frame/field prediction modes) are defined to improve estimation accuracy for interlaced sequences.
- MPEG-2 has various scalability modes.
- MPEG-2 has various profiles and levels, each combination targeted at a different application.
Slide Courtesy, Hung Nguyen
62. MPEG-2 Overview
- A/V broadcast (TV, HDTV, terrestrial, cable, satellite, high-speed Inter/Intranet) as well as DVD video
- 4-8 Mbps for TV quality, 10-15 Mbps for better quality at SDTV resolutions (BT.601)
- 18-45 Mbps for HDTV applications
- MPEG-2 video High Profile at High Level is the video coding standard used in HDTV
- Test in 11/91, Committee Draft 11/93
- ISO/IEC 13818 parts 1-6 (systems, video, audio, compliance, software, DSM-CC)
- Consists of various profiles and levels
- Backward compatible with MPEG-1
- MPEG-2 Audio
  - Supports 5.1-channel audio
  - MPEG-2 AAC (Advanced Audio Coding) requires about 30% fewer bits than, and is not backward compatible with, MPEG-1 Layer 3 (MP3)
Slide Courtesy, Hung Nguyen
63. Features Supported by the MPEG-2 Algorithm
- Different chrominance sampling formats (i.e., 4:2:0, 4:2:2, and 4:4:4) can be represented
- Video in both the progressive and interlaced scan formats can be encoded
- The decoder can use 3:2 pulldown to represent 24 fps film as 30 fps video
- The displayed video can be selected by a movable pan-scan window within a larger raster
- A wide range of picture qualities can be used
- Both constant and variable bit rate channels are supported
- ISO/IEC 11172-2 bit streams are decodable
- Bit streams for high- and low-complexity (hardware) decoders can be generated
- Editing of encoded video is supported
- The encoded bit stream is resilient to errors
Slide Courtesy, Hung Nguyen
64. MPEG-2 Slice and Macroblock Structure
Slide Courtesy, Hung Nguyen
65. MPEG-2 Bit Stream Syntax
GOF = Group of Frames
Slide Courtesy, Hung Nguyen
66. Progressive vs. Interlaced Scanning
- In interlaced video, each displayed frame consists of two interlaced fields, with the scanning lines of Field 1 located between the lines of Field 2.
- In contrast, progressive video has all the lines of a picture displayed in one frame. Thus, progressive video requires a higher picture rate than the frame rate of interlaced video to avoid a flickering display.

(Figure: (a) progressive scan, (b) interlaced scan)
Slide Courtesy, Hung Nguyen
67. Disadvantage of Interlaced Scanning
- A moving object may appear distorted when two fields are merged into a frame.
- Since a moving ball will be at different locations in the two fields of the interlaced format, the ball will look distorted when the two fields are put into one frame.
- Interlaced video also tends to cause horizontal picture details to dither, thus introducing more high-frequency noise.

(Figure: (a) progressive scan, (b) interlaced scan)
Slide Courtesy, Hung Nguyen
68. Field vs. Frame DCT
- Frame-based DCT: suitable for blocks in the background or in a still image having little motion, because these blocks have high correlation between pixel values from adjacent scan lines.
- Field-based DCT: suitable for blocks having motion, because motion causes distortion and may introduce high-frequency noise into the interlaced frame.
Slide Courtesy, Hung Nguyen
69. HDTV Standards

Standard                                      | Samples/Line | Number of Lines | Aspect Ratio
----------------------------------------------|--------------|-----------------|-------------
Advanced Television (ATV)                     | 1280         | 720             | 16:9
Digital Video Broadcasting (DVB)              | 1440         | 1152            | 4:3
Multiple Sub-Nyquist Sampling Encoding (MUSE) | 1920         | 1035            | 16:9
ITU-R HDTV                                    | 1920         | 1152            | 16:9
70. Summary
- H.261
  - First video coding standard, targeted at video conferencing over ISDN
  - Uses a block-based hybrid coding framework with integer-pixel MC
- H.263
  - Improved quality at lower bit rates, to enable video conferencing/telephony below 54 kbps (modems, desktop conferencing)
  - Half-pixel MC and other improvements
- MPEG-1 video
  - Video on CD and video on the Internet (good quality at 1.5 Mbps)
  - Half-pixel MC and bidirectional MC
- MPEG-2 video
  - SDTV/HDTV/DVD (4-15 Mbps)
  - Extended from MPEG-1 to handle interlaced video
Slide Courtesy, Hung Nguyen