Title: Concepts of Multimedia Processing and Transmission
1. Concepts of Multimedia Processing and Transmission
- IT 481, Lecture 7
- Dennis McCaughey, Ph.D.
- 19 March, 2007
2. Digital Video Broadcasting (DVB) Systems
Slide Courtesy, Hung Nguyen
3. Processing of the Streams in the Set-Top Box (STB)
Slide Courtesy, Hung Nguyen
4. Multimedia Communications Standards and Applications
Slide Courtesy, Hung Nguyen
5. Video Coding Standards
- ITU H.261 for Video Teleconference (VTC)
- ITU H.263 for VTC over POTS
- ITU H.262 for VTC over ATM/broadband and digital TV networks
- ISO MPEG-1 for movies on CD-ROM (VCD)
  - 1.2 Mbps for video coding and 256 Kbps for audio coding
- ISO MPEG-2 for broadcast-quality video on DVD
  - 2-15 Mbps allocated for audio and video coding
- Low-bit-rate telephony over POTS
  - 10 Kbps for video and 5.3 Kbps for audio
- MPEG-4 for the Internet and mobile communication
  - Very Low Bit Rate (VLBR) coding designed to be compatible with H.263
- MPEG-7, the multimedia content description interface
  - Description schemes and a description definition language for integrated multimedia search engines
Slide Courtesy, Hung Nguyen
6. History
- H.261
  - First video coding standard, targeted at video conferencing over ISDN
  - Uses a block-based hybrid coding framework with integer-pixel MC
- H.263
  - Improved quality at lower bit rates, to enable video conferencing/telephony below 54 kbps (modems, desktop conferencing)
  - Half-pixel MC and other improvements
- MPEG-1 video
  - Video on CD and video on the Internet (good quality at 1.5 Mbps)
  - Half-pixel MC and bidirectional MC
- MPEG-2 video
  - SDTV/HDTV/DVD (4-15 Mbps)
  - Extended from MPEG-1 to handle interlaced video
Slide Courtesy, Hung Nguyen
7. H.261 Video Coding Standard
- For video conferencing/video phone
  - Video coding standard in H.320 (VTC over the switched phone network), which is an umbrella recommendation
  - Low delay (real-time, interactive)
  - Slow motion in general
- For transmission over ISDN
  - Fixed bandwidth of p×64 Kbps, p = 1, 2, ..., 30
- Video format
  - CIF (Common Intermediate Format, 352x288, above 128 Kbps)
  - QCIF (Quarter CIF, 176x144, 64-128 Kbps)
  - 4:2:0 color format, progressive scan
- Published in 1990
- Each macroblock can be coded in intra- or inter-mode
  - Periodic insertion of intra-mode macroblocks to eliminate error propagation due to network impairments
  - Integer-pixel accuracy motion estimation in inter-mode
Slide Courtesy, Hung Nguyen
8. H.261 Encoder
- In the encoder diagram, F denotes the loop filter and P the motion estimation and compensation
- The loop filter applies a low-pass filter to smooth the quantization noise in previously reconstructed frames before motion estimation and compensation
Slide Courtesy, Hung Nguyen
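As a rough illustration of the loop-filter idea, the sketch below applies a separable [1 2 1]/4 low-pass kernel to a reconstructed 8x8 block, leaving border pixels unfiltered; the kernel and function name are illustrative assumptions, not the exact H.261 filter definition.

```python
import numpy as np

def loop_filter(block):
    """Separable [1 2 1]/4 low-pass filter in the spirit of the H.261
    loop filter; border pixels are left unfiltered."""
    src = block.astype(np.float64)
    out = src.copy()
    # horizontal pass on interior columns
    out[:, 1:-1] = (src[:, :-2] + 2 * src[:, 1:-1] + src[:, 2:]) / 4.0
    # vertical pass on interior rows, using the horizontally filtered result
    tmp = out.copy()
    out[1:-1, :] = (tmp[:-2, :] + 2 * tmp[1:-1, :] + tmp[2:, :]) / 4.0
    return out

block = np.random.randint(0, 256, (8, 8))   # a reconstructed 8x8 block
smoothed = loop_filter(block)               # used for prediction instead of the raw block
```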
9. Picture Frames - Overview
- Three frame types: I-Picture (intra-frame picture), P-Picture (inter-frame predicted picture) and B-Picture (bi-directional predicted/interpolated picture)
- An I-Picture is coded by intra-frame coding: only the spatial redundancy within the picture is reduced, without referencing other pictures. The coding process is very similar to the JPEG standard, so encoding an I-Picture is less complex than encoding a P-frame or B-frame.
- The basic coding unit is an 8 by 8 block. A macroblock consists of six blocks: 4 blocks of luminance (Y), one block of Cb chrominance, and one block of Cr chrominance.
Slide Courtesy, Hung Nguyen
10. Frame Types
- Intra-coded frames (I-frames)
  - Level of compression is relatively small: 10:1 to 20:1
  - Present at regular intervals to limit the extent of errors
  - The number of frames between I-frames is known as the Group of Pictures (GOP)
- Inter-coded frames
  - Predicted frames (P-frames)
    - Significant compression level achieved here
    - Errors are propagated
    - 20:1 to 30:1 compression ratio
  - Bidirectional frames (B-frames)
    - Highest levels of compression achieved
    - B-frames are not used for prediction, thus errors are not propagated
    - 30:1 to 50:1 compression ratio
11. Macroblocks and Color Sub-sampling Schemes
A macroblock consists of four 8x8 luminance blocks plus the associated chrominance blocks (two in the 4:2:0 scheme)
Slide Courtesy, Hung Nguyen
12. Sub-sampling of Chrominance Information
- Transforming (R,G,B) → (Y,Cb,Cr) provides two advantages:
  - 1) The human visual system (HVS) is more sensitive to the Y component than to the Cb or Cr components.
  - 2) Cb and Cr are far less correlated with Y than R is with G, R with B, and B with G, thus reducing TV transmission bandwidth.
- Cb and Cr both require far less bandwidth and can be sampled more coarsely (Shannon).
- By doing so we can reduce the data rate without noticeably affecting perceived visual quality.
Slide Courtesy, Hung Nguyen
13. Color Space Conversion
- In general, each pixel in a picture consists of three components: R (red), G (green), B (blue). (R,G,B) must be converted to (Y,Cb,Cr) in MPEG-1 before processing.
- We can view the color value of each pixel in either the RGB color space or the YCbCr color space.
- Because the (Y,Cb,Cr) components are less correlated than (R,G,B), coding with (Y,Cb,Cr) is more efficient.
- (Y,U,V) is also used to denote (Y,Cb,Cr); strictly, however, (Y,U,V) refers to the analog TV equivalent.
Slide Courtesy, Hung Nguyen
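A minimal sketch of the conversion and chroma sub-sampling described on the last two slides, assuming BT.601-style coefficients and simple 2x2 averaging for 4:2:0; real encoders may use different rounding and filtering.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range RGB -> YCbCr using BT.601-style coefficients (assumed)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b + 128.0
    cr =  0.500 * r - 0.419 * g - 0.081 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

def subsample_420(plane):
    """4:2:0 sub-sampling: average each 2x2 block of chroma samples."""
    h, w = plane.shape
    return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rgb = np.random.randint(0, 256, (288, 352, 3)).astype(np.float64)  # one CIF frame
ycbcr = rgb_to_ycbcr(rgb)
cb = subsample_420(ycbcr[..., 1])   # 144x176 chroma plane
cr = subsample_420(ycbcr[..., 2])
```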
14. RGB Image
15. Compressed Image (QSF = 24)
16. Luminance Plane (Y)
17. Blue Chrominance Plane (Cb)
18. Red Chrominance Plane (Cr)
19. Red
20. Green
21. Blue
22. DCT (Discrete Cosine Transform)
- The DCT converts data from the spatial domain to the frequency domain. The higher-frequency coefficients can be quantized more coarsely without a perceived loss of image quality, because the HVS is less sensitive to the higher frequencies and they contain less energy.
- The DCT coefficient at location (0,0) is called the DC coefficient; the other values are called AC coefficients. In general, a larger quantization step is used for the higher AC coefficients. Higher precision is required for the DC term in order to avoid blocking in the reconstructed image.
- MPEG-1 uses an 8x8 DCT. This transform converts an 8 by 8 pixel block into another 8 by 8 block of coefficients. In general most of the energy is concentrated in the top-left corner.
- After quantizing the transformed matrix, most entries are typically zero, so a zig-zag scan followed by run-length coding can achieve a high compression ratio.
Slide Courtesy, Hung Nguyen
23. Transform Coding (TC)
- Pack the signal energy into as few transform coefficients as possible
- The DCT yields nearly optimal energy concentration
- A 2-dimensional DCT with a block size of 8x8 pixels is commonly used in today's image coders
- The transform is followed by quantization and entropy coding
Slide Courtesy, Hung Nguyen
24. 2D DCT and IDCT

Forward DCT:
F(u,v) = (1/4) C(u) C(v) Σ_{x=0..7} Σ_{y=0..7} f(x,y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16]

Inverse DCT:
f(x,y) = (1/4) Σ_{u=0..7} Σ_{v=0..7} C(u) C(v) F(u,v) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16]

where C(k) = 1/√2 for k = 0, C(k) = 1 otherwise, and u, v, x, y = 0, 1, 2, ..., 7
Slide Courtesy, Hung Nguyen
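A small numerical check of the 8x8 DCT/IDCT pair above, using SciPy's type-II/III DCT with orthonormal scaling, which matches the formulas up to the standard normalization.

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    """2-D type-II DCT with orthonormal scaling (rows, then columns)."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    """2-D inverse DCT, the exact inverse of dct2 above."""
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.random.randint(0, 256, (8, 8)).astype(np.float64)
coeffs = dct2(block - 128.0)                      # level shift, then transform
assert np.allclose(idct2(coeffs) + 128.0, block)  # round trip recovers the block
print(coeffs[0, 0])                               # DC term carries most of the block energy
```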
25. DCT Scan Modes
- The zig-zag scan used in MPEG-1 is suitable for progressive images, where frequency components have equal importance in the horizontal and vertical directions. (Frame pictures only)
- In MPEG-2, an alternate scan is introduced because interlaced images tend to have higher-frequency components in the vertical direction. Thus, the scanning order weighs the higher vertical frequencies more than the corresponding horizontal frequencies. Selection between these two scan orders can be made on a picture basis. (Frame and field pictures allowed)
Slide Courtesy, Hung Nguyen
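To make the scan-order idea concrete, this sketch generates the classic zig-zag order for an 8x8 block by walking the anti-diagonals; the MPEG-2 alternate scan is a fixed table in the standard and is not reproduced here.

```python
import numpy as np

def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    coords = [(r, c) for r in range(n) for c in range(n)]
    # sort by anti-diagonal; alternate the traversal direction on each diagonal
    return sorted(coords, key=lambda rc: (rc[0] + rc[1],
                                          rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def zigzag_scan(block):
    """Flatten a quantized coefficient block in zig-zag order."""
    return [block[r, c] for r, c in zigzag_order(block.shape[0])]

q = np.zeros((8, 8), dtype=int)
q[0, 0], q[0, 1], q[1, 0] = 42, 5, -3     # a few surviving low-frequency coefficients
print(zigzag_scan(q)[:6])                 # [42, 5, -3, 0, 0, 0]
```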
26. Motion Compensation
- Try to match each block in the current picture to content in the previous picture. Matching is done by shifting each of the 8 x 8 blocks of the two successive pictures pixel by pixel in each direction → motion vector
- Subtract the two blocks → difference block
- Transmit the motion vector and the difference block
Slide Courtesy, Hung Nguyen
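A brute-force block-matching sketch for one block, using the sum of absolute differences (SAD) over a small search window; block size, search range and function names are illustrative (MPEG encoders typically match 16x16 macroblocks and use faster search strategies).

```python
import numpy as np

def full_search(cur, ref, top, left, block=8, search=7):
    """Exhaustive motion search for one block: return the motion vector with
    the smallest sum of absolute differences (SAD), plus the residual block."""
    target = cur[top:top + block, left:left + block].astype(np.int64)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + block > ref.shape[0] or c + block > ref.shape[1]:
                continue                                   # candidate outside the picture
            cand = ref[r:r + block, c:c + block].astype(np.int64)
            sad = int(np.abs(target - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    dy, dx = best_mv
    residual = target - ref[top + dy:top + dy + block,
                            left + dx:left + dx + block].astype(np.int64)
    return best_mv, residual      # transmit these instead of the raw block

ref = np.random.randint(0, 256, (64, 64))
cur = np.roll(ref, (2, -1), axis=(0, 1))       # simulate a small uniform motion
mv, res = full_search(cur, ref, top=16, left=16)
print(mv)                                      # (-2, 1): offset of the best match in ref
```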
27. Quantization
- In MPEG-1, a matrix called the quantizer, Q[i,j], defines the quantization step. If X[i,j] is the DCT matrix with the same size as Q[i,j], X[i,j] is divided by Q[i,j] × QSF to obtain the quantized value matrix Xq[i,j]. QSF is the Quantization Scale Factor.
- Quantization equation:
  - Xq[i,j] = Round( X[i,j] / (Q[i,j] × QSF) )
- Inverse quantization (dequantization) reconstructs an approximation of the original value.
- Inverse quantization equation:
  - X'[i,j] = QSF × Xq[i,j] × Q[i,j]
- The difference between the actual value and the value reconstructed from the quantized value is called the quantization error. In general, if Q[i,j] is carefully designed, visual quality will not be affected.
Slide Courtesy, Hung Nguyen
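The two equations above written out directly; the all-16 quantizer matrix below is a placeholder rather than the actual MPEG-1 visually weighted matrix.

```python
import numpy as np

def quantize(X, Q, qsf):
    """Xq[i,j] = Round( X[i,j] / (Q[i,j] * QSF) )"""
    return np.round(X / (Q * qsf)).astype(int)

def dequantize(Xq, Q, qsf):
    """X'[i,j] = QSF * Xq[i,j] * Q[i,j]"""
    return qsf * Xq * Q

Q = np.full((8, 8), 16.0)              # placeholder quantizer matrix (not the MPEG-1 one)
X = np.random.randn(8, 8) * 200.0      # stand-in DCT coefficient block
Xq = quantize(X, Q, qsf=2.0)
err = X - dequantize(Xq, Q, qsf=2.0)   # quantization error, at most Q*QSF/2 per coefficient
print(np.abs(err).max() <= 16.0)       # True for Q = 16, QSF = 2
```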
28. Quantization (cont'd)
Slide Courtesy, Hung Nguyen
29. Average Distribution of AC Coefficients
30. MPEG (Moving Picture Experts Group)
- Established in January 1988
- Operates in the framework of the joint ISO/IEC technical committee
  - ISO: International Organization for Standardization
  - IEC: International Electrotechnical Commission
- First meeting was in May 1988, with 25 experts participating
- Has grown to 350 experts from 200 companies in some 20 countries
- As a rule, MPEG meets in March, July and November; it can meet more often as needed
Slide Courtesy, Hung Nguyen
31. MPEG-1: Coding of Moving Pictures and Associated Audio
- Request for Proposal (RFP) July 1989
- Adopted in 1993
- Coding of the audiovisual signal at 1.5 Mbps
- Audio coding (separate from speech coding) at 256 Kbps per channel PCM
- Five parts: systems, video, audio, conformance testing and software simulation
Slide Courtesy, Hung Nguyen
32. MPEG-1 Overview
- In MPEG-1, video is represented as a sequence of pictures, and each picture is treated as a two-dimensional array of pixels
- The color of each pixel consists of three components: Y (luminance), Cb and Cr (two chrominance components)
- Composite video, aka baseband video or RCA video, is the analog waveform that conveys the image data in a conventional National Television System Committee (NTSC) television signal
- Composite video contains chrominance (hue and saturation) and luminance (brightness) information, along with synchronization and blanking pulses
- To achieve a high compression ratio, MPEG-1 must use hybrid coding techniques to reduce both spatial redundancy and temporal redundancy
Slide Courtesy, Hung Nguyen
33. MPEG-1 Overview
- Audio/video on CD-ROM (1.5 Mbps, SIF 352x240)
- Maximum 1.856 Mbps, 768x576 pixels
- Work started late 1988, test in 10/89, Committee Draft 9/90
- ISO/IEC 11172 parts 1-5 (systems, video, audio, compliance, software)
- Prompted an explosion of digital video applications: MPEG-1 video CD and downloadable video over the Internet
- Software-only decoding, made possible by the introduction of Pentium chips, was key to its success in the commercial market
- MPEG-1 Audio
  - Offers 3 coding options (3 layers); higher layers have higher coding efficiency at the cost of more computation
  - MP3 = MPEG-1 Layer 3 audio
Slide Courtesy, Hung Nguyen
34. MPEG-2 vs. MPEG-1
- MPEG-2 is a superset of MPEG-1.
- Generally, MPEG-1 is used for CD-ROM or Video CD (VCD) and MPEG-2 is used for broadcast or DVD.
- One difference between MPEG-1 and MPEG-2 is that MPEG-2 implements variable bit rate.
- MPEG-2 is also what's known as a closed format, meaning that a license fee must be paid to use the decoding algorithms, whereas MPEG-1 can be implemented free of charge.
Slide Courtesy, Hung Nguyen
35. MPEG-2 vs. MPEG-1 (cont'd)
- MPEG-1 only handles progressive sequences, specified by the Source Input Format (SIF).
- MPEG-2 is targeted primarily at interlaced sequences (as opposed to progressive for MPEG-1) and at higher resolutions.
- Different DCT modes and scanning methods are defined for interlaced sequences.
- More sophisticated motion estimation methods (frame/field prediction modes) are defined to improve estimation accuracy for interlaced sequences.
- MPEG-2 has various scalability modes.
- MPEG-2 has various profiles and levels, each combination targeted at a different application.
Slide Courtesy, Hung Nguyen
36. MPEG Encoding
- Frame types
  - I (Intra): encode the complete image, similar to JPEG
  - P (Forward Predicted): motion relative to previous I's and P's
  - B (Backward Predicted): motion relative to previous and future I's and P's
Slide Courtesy, Hung Nguyen
37. Frame Reconstruction (I and P Frames Only)
- I frame: complete image
- P frames: provide a series of updates to the most recent I frame
Slide Courtesy, Hung Nguyen
38. Using Forward-Backward Prediction
- If only forward prediction is used, there are uncovered areas (such as the block behind the car in Frame N) for which we may not be able to find a good match in the previous reference picture (Frame N-1).
- On the other hand, backward prediction can properly predict these uncovered areas, since they are available in the future reference picture, i.e. frame N+1 in this example.
- New objects, such as an airplane moving into the picture, cannot be predicted from the previous picture, but can be predicted from the future picture.

(Figure: backward prediction vs. forward prediction)
Slide Courtesy, Hung Nguyen
39. Frame Reconstruction (cont'd)
- B frames interpolate between frames represented by I's and P's
Slide Courtesy, Hung Nguyen
40. Transmission Order of the Frames
Slide Courtesy, Hung Nguyen
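Because a B frame cannot be decoded until both of its reference frames have arrived, the coded (transmission) order differs from the display order. Below is a small sketch of that reordering for a generic I/B/B/P sequence; the frame pattern and function name are illustrative.

```python
def transmission_order(display_frames):
    """Reorder display-order frames so every B frame follows the two
    reference frames (I or P) it is predicted from."""
    out, held_b = [], []
    for frame in display_frames:              # frame = (type, display index)
        if frame[0] == 'B':
            held_b.append(frame)              # hold until the next reference arrives
        else:                                 # I or P: emit it, then the held B frames
            out.append(frame)
            out.extend(held_b)
            held_b = []
    return out + held_b

display = [('I', 0), ('B', 1), ('B', 2), ('P', 3), ('B', 4), ('B', 5), ('P', 6)]
print(transmission_order(display))
# [('I', 0), ('P', 3), ('B', 1), ('B', 2), ('P', 6), ('B', 4), ('B', 5)]
```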
41. Intra-frame Encoding Process
- Decompose the image into its three components in RGB space
- Convert RGB to YCbCr
- Divide the image into macroblocks (each macroblock has 6 blocks: 4 for Y, 1 for Cb, 1 for Cr)
- Apply the DCT to each block
- After the DCT, quantize each coefficient
- Use a zig-zag scan to gather the AC values
- Use DPCM to encode the DC value, then encode it with VLC
- Use RLE to encode the AC values, then encode them with VLC (a sketch of this run/level step follows below)
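A sketch of the run/level step named in the last bullet: the zig-zag scanned AC coefficients are turned into (zero-run, level) pairs before the VLC table lookup, which is omitted here since the tables live in the standard.

```python
def run_length_encode(ac_coeffs):
    """Convert zig-zag scanned AC coefficients into (zero_run, level) pairs,
    finishing with an end-of-block marker once only zeros remain."""
    pairs, run = [], 0
    for level in ac_coeffs:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append('EOB')          # end of block: all remaining coefficients are zero
    return pairs

print(run_length_encode([5, -3, 0, 0, 2, 0, 0, 0]))
# [(0, 5), (0, -3), (2, 2), 'EOB']
```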
42. I-Picture Encoding Flow Chart
Slide Courtesy, Hung Nguyen
44. Inter-frame Coding
- The pictures that use inter-frame coding are the P pictures and B pictures
- Coding of P pictures is more complex than coding of I pictures, since motion-compensated macroblocks may be constructed
- The difference between the motion-compensated macroblock and the current macroblock is transformed with a 2-dimensional DCT, giving an array of 8 by 8 transform coefficients
- The coefficients are quantized to produce a set of quantized coefficients. The quantized coefficients are then encoded using a run-length/value technique.
Slide Courtesy, Hung Nguyen
45. Inter-frame Encoding Process
- Decompose the image into its three components in RGB space
- Convert RGB to YCbCr
- Perform motion estimation to record the difference between the frame being encoded and the reference frame stored in the frame buffer
- Divide the image into macroblocks (each macroblock has 6 blocks: 4 for Y, 1 for Cb, 1 for Cr)
- Apply the DCT to each block
- Quantize each coefficient
- Use a zig-zag scan to gather the AC values
- Reconstruct the frame and store it in the frame buffer if necessary
- Apply DPCM to encode the DC value, then encode it with VLC
- Use RLE to encode the AC values, then encode them with VLC
Slide Courtesy, Hung Nguyen
46. Predictive Coding
- Predictive coding is a technique to reduce statistical redundancy: the current value is used to predict the next value, and their difference (called the prediction error) is coded. The more precisely the next value is predicted, the smaller the prediction error.
- The prediction error can therefore be encoded with fewer bits than the actual value. MPEG-1 uses DPCM (Differential Pulse Code Modulation), a kind of predictive coding, and applies it only to the DC coefficients.
Slide Courtesy, Hung Nguyen
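A minimal DPCM sketch for the DC coefficients of consecutive blocks, as described above: each DC value is predicted by the previous one and only the difference is passed on to the VLC stage. Resetting the predictor to zero at the start is an assumption made for the example.

```python
def dpcm_encode(dc_values):
    """Differences between consecutive DC coefficients (predictor = previous DC)."""
    prev, diffs = 0, []          # predictor assumed to start at 0
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def dpcm_decode(diffs):
    """Invert dpcm_encode by accumulating the differences."""
    prev, out = 0, []
    for d in diffs:
        prev += d
        out.append(prev)
    return out

dc = [120, 122, 119, 119, 125]
assert dpcm_decode(dpcm_encode(dc)) == dc   # lossless round trip
```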
47. Motion Compensation (MC) and Motion Estimation (ME)
- Motion estimation predicts the value of a block of pixels in the next picture using a block in the current picture. The location difference between these blocks is called the motion vector, and the difference between the two blocks is called the prediction error.
- In MPEG-1, the encoder must calculate the motion vector and the prediction error. When the decoder obtains this information, it can use it together with the current picture to reconstruct the next picture.
- This reconstruction process is usually called motion compensation. In general, motion compensation is the inverse process of motion estimation.
Slide Courtesy, Hung Nguyen
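The decoder-side counterpart of motion estimation, sketched under the same assumptions as the block-matching example on the Motion Compensation slide: given the motion vector and the decoded prediction error, the block is rebuilt from the reference picture. Variable and function names are illustrative.

```python
import numpy as np

def motion_compensate(ref, top, left, mv, residual):
    """Rebuild one block: prediction taken from the reference picture at the
    motion-vector offset, plus the decoded prediction error."""
    dy, dx = mv
    n = residual.shape[0]
    pred = ref[top + dy:top + dy + n, left + dx:left + dx + n].astype(np.int64)
    return np.clip(pred + residual, 0, 255).astype(np.uint8)

# usage with the full_search() sketch shown earlier:
#   mv, res = full_search(cur, ref, top, left)
#   rebuilt = motion_compensate(ref, top, left, mv, res)   # equals the current block
```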
48. Motion Estimation (ME)
Slide Courtesy, Hung Nguyen
49. Motion Compensation (MC)
Slide Courtesy, Hung Nguyen
50. P-Frame Encoding: Macroblock Structure
51. P-Frame Encoding: Encoding Procedure
52. Example Frame Sequences
- I and P frames only
- I, P and B frames
53. Coding of P Pictures
- As with I pictures, the encoder needs to store the decoded P picture, since it may be used as the starting point for motion compensation. Therefore, the encoder reconstructs the image from the quantized coefficients.
- In coding P pictures, the encoder has more decisions to make than in the case of I pictures:
  - Selection of macroblock type: there are 8 types of macroblock in P pictures.
  - Motion compensation decision: the encoder can choose whether or not to transmit motion vectors for predictive-coded macroblocks.
  - Intra/non-intra coding decision.
  - Coded/not coded decision: after quantization, if all the coefficients in a block are zero, the block is not coded.
  - Quantizer/no quantizer decision: the quantizer scale can be altered, which affects picture quality.
Slide Courtesy, Hung Nguyen
54. The Inter-frame Encoding Flow Chart
Slide Courtesy, Hung Nguyen
55. Coding of P Pictures (cont'd)
Slide Courtesy, Hung Nguyen
56. Coding of B Pictures
- B pictures are divided into slices in the same way as I and P pictures. Since B pictures are not used as a reference for motion compensation, errors in B pictures are slightly less important than in I or P pictures. Consequently, it may be appropriate to use fewer slices for B pictures.
57. Decisions to be Made When Coding B Pictures
- Selection of macroblock type: there are 12 types of macroblock in B pictures. Compared with P pictures, there are extra types due to the introduction of the backward motion vector. If both the forward and backward motion vectors are present, then motion-compensated macroblocks are constructed from both the previous and future pictures, and the result is averaged to form the "interpolated" motion-compensated macroblock.
- Selection of motion compensation mode
- Intra/non-intra coding decision
- Coded/not coded decision
Slide Courtesy, Hung Nguyen
58. Coding of B-Pictures
59. Variable Length Coding (VLC)
- In MPEG-1, the last of the encoding processes is to use a Huffman code to reduce data redundancy, and the first step in the decoding process is to decode the VLC to reconstruct the image data.
- Encoding and decoding with a Huffman code must refer to a code table with two entries: the original data and the corresponding codeword.
- In the MPEG-1 standard, multiple code tables are defined (MPEG-1 Part 2, Annex C). The use of multiple code tables improves the compression ratio.
Slide Courtesy, Hung Nguyen
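A compact Huffman-table construction over symbol frequencies, illustrating why frequent symbols (e.g. short zero runs with small levels) get short codewords; MPEG-1 itself uses the fixed tables of Annex C rather than building codes on the fly, so the frequencies below are made up.

```python
import heapq

def huffman_code(freqs):
    """Build a Huffman codeword table {symbol: bitstring} from frequencies."""
    heap = [[f, i, {sym: ''}] for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)     # two least frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in t1.items()}
        merged.update({s: '1' + c for s, c in t2.items()})
        heapq.heappush(heap, [f1 + f2, tie, merged])
        tie += 1
    return heap[0][2]

# made-up frequencies for a few (run, level) symbols and the end-of-block marker
table = huffman_code({'EOB': 40, (0, 1): 30, (0, -1): 20, (1, 1): 7, (2, 3): 3})
print(table)          # frequent symbols receive the shortest codewords
```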
60. MPEG-2 vs. MPEG-1
- MPEG-2 is a superset of MPEG-1.
- Generally, MPEG-1 is used for CD-ROM or Video CD (VCD) and MPEG-2 is used for broadcast or DVD.
- One difference between MPEG-1 and MPEG-2 is that MPEG-2 implements variable bit rate.
- MPEG-2 is also what's known as a closed format, meaning that a license fee must be paid to use the decoding algorithms, whereas MPEG-1 can be implemented free of charge.
Slide Courtesy, Hung Nguyen
61. MPEG-2 vs. MPEG-1 (cont'd)
- MPEG-1 only handles progressive sequences, specified by the Source Input Format (SIF).
- MPEG-2 is targeted primarily at interlaced sequences (as opposed to progressive for MPEG-1) and at higher resolutions.
- Different DCT modes and scanning methods are defined for interlaced sequences.
- More sophisticated motion estimation methods (frame/field prediction modes) are defined to improve estimation accuracy for interlaced sequences.
- MPEG-2 has various scalability modes.
- MPEG-2 has various profiles and levels, each combination targeted at a different application.
Slide Courtesy, Hung Nguyen
62. MPEG-2 Overview
- A/V broadcast (TV, HDTV, terrestrial, cable, satellite, high-speed Inter/Intranet) as well as DVD video
- 4-8 Mbps for TV quality, 10-15 Mbps for better quality at SDTV resolutions (BT.601)
- 18-45 Mbps for HDTV applications
- MPEG-2 video High Profile at High Level is the video coding standard used in HDTV
- Test in 11/91, Committee Draft 11/93
- ISO/IEC 13818 parts 1-6 (systems, video, audio, compliance, software, DSM-CC)
- Consists of various profiles and levels
- Backward compatible with MPEG-1
- MPEG-2 Audio
  - Supports 5.1-channel audio
  - MPEG-2 AAC (Advanced Audio Coding) requires about 30% fewer bits than, and is not backward compatible with, MPEG-1 Layer 3 (MP3)
Slide Courtesy, Hung Nguyen
63. Features Supported by the MPEG-2 Algorithm
- Different chrominance sampling formats (i.e., 4:2:0, 4:2:2, and 4:4:4) can be represented
- Video in both the progressive and interlaced scan formats can be encoded
- The decoder can use 3:2 pulldown to represent 24 fps film as 30 fps video
- The displayed video can be selected by a movable pan-scan window within a larger raster
- A wide range of picture qualities can be used
- Both constant and variable bit rate channels are supported
- ISO/IEC 11172-2 bit streams are decodable
- Bit streams for high- and low-complexity (hardware) decoders can be generated
- Editing of encoded video is supported
- The encoded bit stream is resilient to errors
Slide Courtesy, Hung Nguyen
64. MPEG-2 Slice and Macroblock Structure
Slide Courtesy, Hung Nguyen
65. MPEG-2 Bit Stream Syntax
GOF = Group of Frames
Slide Courtesy, Hung Nguyen
66. Progressive vs. Interlaced Scanning
- In interlaced video, each displayed frame consists of two interlaced fields, with the scanning lines of Field 1 located between the lines of Field 2.
- In contrast, progressive video has all the lines of a picture displayed in one frame. Thus, progressive video requires a higher picture rate than the frame rate of interlaced video to avoid a flickering display.

(Figure: (a) progressive scan, (b) interlaced scan)
Slide Courtesy, Hung Nguyen
67. Disadvantage of Interlaced Scanning
- A moving object may appear distorted when two fields are merged into a frame.
- Since a moving ball will be at different locations in the two fields of the interlaced format, the ball will look distorted when the two fields are put into one frame.
- Interlaced video also tends to cause horizontal picture details to dither, thus introducing more high-frequency noise.

(Figure: (a) progressive scan, (b) interlaced scan)
Slide Courtesy, Hung Nguyen
68. Field vs. Frame DCT
- Frame-based DCT: suitable for blocks in the background or in a still image having little motion, because these blocks have high correlation between pixel values from adjacent scan lines.
- Field-based DCT: suitable for blocks having motion, because motion causes distortion and may introduce high-frequency noise into the interlaced frame.
Slide Courtesy, Hung Nguyen
69. HDTV Standards

Standard                                      | Samples/Line | Number of Lines | Aspect Ratio
----------------------------------------------|--------------|-----------------|-------------
Advanced Television (ATV)                     | 1280         | 720             | 16:9
Digital Video Broadcasting (DVB)              | 1440         | 1152            | 4:3
Multiple Sub-Nyquist Sampling Encoding (MUSE) | 1920         | 1035            | 16:9
ITU-R HDTV                                    | 1920         | 1152            | 16:9
70. Summary
- H.261
  - First video coding standard, targeted at video conferencing over ISDN
  - Uses a block-based hybrid coding framework with integer-pixel MC
- H.263
  - Improved quality at lower bit rates, to enable video conferencing/telephony below 54 kbps (modems, desktop conferencing)
  - Half-pixel MC and other improvements
- MPEG-1 video
  - Video on CD and video on the Internet (good quality at 1.5 Mbps)
  - Half-pixel MC and bidirectional MC
- MPEG-2 video
  - SDTV/HDTV/DVD (4-15 Mbps)
  - Extended from MPEG-1 to handle interlaced video
Slide Courtesy, Hung Nguyen