Title: CHAPTER 9 H.264/MPEG-4 Part 10
1CHAPTER 9 H.264/MPEG-4 Part 10
2H.264 The Emerging Video Coding Standard
3Basic Macroblock Coding Structure
4Profiles and Levels
- H.264 defines a set of three Profiles:
- Baseline Profile
- Main Profile
- Extended Profile
- Baseline Profile
- Supports intra- and inter-coding (using I-slices
and P-slices) and entropy coding with
context-adaptive variable-length codes (CAVLC).
- Potential applications of the Baseline Profile
include videotelephony, videoconferencing and
wireless communications.
5Profiles and Levels (Cont.)
- Main Profile
- Supports interlaced video, inter-coding using
B-slices, inter-coding using weighted prediction
and entropy coding using context-based adaptive
binary arithmetic coding (CABAC).
- Potential applications include television
broadcasting and video storage.
- Extended Profile
- Does not support CABAC, but adds modes to enable
efficient switching between coded bitstreams
(SP- and SI-slices) and improved error resilience
(data partitioning).
- Potential applications include streaming media.
6Profiles and Levels (Cont.)
[Figure: coding tools of the Baseline, Main and
Extended Profiles. Labels: I Slices, P Slices,
CAVLC, Slice Groups and ASO, Redundant Slices,
B Slices, Interlace, Weighted Prediction, CABAC,
SP and SI Slices, Data Partitioning.]
7Video Format
- H.264 supports coding and decoding of 4:2:0
progressive or interlaced video.
- An H.264 encoder may use one or two of a number
of previously encoded pictures as a reference for
motion-compensated prediction of each inter-coded
macroblock or macroblock partition.
- Motion Compensation
- Seven block sizes: 16 × 16, 16 × 8, 8 × 16,
8 × 8, 8 × 4, 4 × 8, 4 × 4
- Multiple reference pictures
8Coded Data Format
- H.264 makes a distinction between a Video Coding
Layer (VCL) and a Network Abstraction Layer
(NAL).
- The output of the encoding process is VCL data,
which are mapped to NAL units prior to
transmission or storage.
- A coded video sequence is represented by a
sequence of NAL units that can be transmitted
over a packet-based network or a bitstream
transmission link, or stored in a file.
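As a rough illustration of the NAL layer, the sketch below decodes the one-byte header that begins each NAL unit. The field layout (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type) is taken from the H.264 syntax rather than from these slides; the names `NalHeader`, `parse_nal_header` and `NAL_TYPE_NAMES` are hypothetical, and the type table is deliberately incomplete.

```python
from typing import NamedTuple

# A few nal_unit_type values, listed for illustration only (not exhaustive).
NAL_TYPE_NAMES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
}


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # must be 0 in a conforming stream
    nal_ref_idc: int         # 0 means the unit is not used for reference
    nal_unit_type: int       # identifies the payload carried by the unit


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the first byte of a NAL unit into its three header fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("expected a single byte value")
    return NalHeader(
        forbidden_zero_bit=first_byte >> 7,
        nal_ref_idc=(first_byte >> 5) & 0x3,
        nal_unit_type=first_byte & 0x1F,
    )
```

For example, `parse_nal_header(0x67)` reports `nal_unit_type` 7, which the table above maps to a sequence parameter set.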
9Multiple Reference Frames
10Variable Block Size
11Example of Variable Block Size
12Motion Vectors
- Each partition or sub-macroblock partition in an
inter-coded macroblock is predicted from an area
of the same size in a reference picture.
- The offset between the two areas (the motion
vector) has quarter-sample resolution for the
luma component and one-eighth-sample resolution
for the chroma components.
- If one or both vector components are fractional
values, the prediction samples are generated by
interpolation between adjacent samples in the
reference frame.
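The resolution rule above can be sketched in a few lines. This assumes a motion vector component is stored as an integer in quarter-sample luma units and that, for 4:2:0 video, the same integer is read in eighth-sample units for chroma. The function names are hypothetical.

```python
from typing import Tuple


def split_luma_mv(component: int) -> Tuple[int, int]:
    """Return (integer samples, quarter-sample fraction 0..3)."""
    # Python's >> floors toward minus infinity, so negative vectors
    # still yield a non-negative fraction.
    return component >> 2, component & 3


def split_chroma_mv(component: int) -> Tuple[int, int]:
    """Return (integer samples, eighth-sample fraction 0..7) for 4:2:0 chroma."""
    return component >> 3, component & 7


def needs_interpolation(mv: Tuple[int, int]) -> bool:
    """True if either luma component points between integer samples."""
    return any(c & 3 for c in mv)
```

A luma component of 9 means 2 whole samples plus 1/4; read as a chroma offset it is 1 whole sample plus 1/8, which is half the luma displacement, as expected when chroma has half the resolution.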
13Interpolation of Luma Half-Pixel Positions
14Interpolation of Luma Quarter-Pel Positions
- Once all the half-pel samples are available, the
samples at quarter-pel positions are produced by
linear interpolation.
- $a = \mathrm{round}((G + b)/2)$
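A minimal sketch of the two luma interpolation stages follows. The half-pel stage uses the six-tap filter with weights (1, -5, 20, 20, -5, 1)/32 from the H.264 standard; that filter is not spelled out in the surviving slide text, so treat it as an assumption here. G and b follow the slide's notation; E, F, H, I, J are assumed names for the other integer samples on the same row, and 8-bit samples are assumed.

```python
def clip8(x: int) -> int:
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, x))


def half_pel(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """Half-sample value between g and h from six neighbouring integer samples."""
    # Adding 16 before the shift by 5 rounds the division by 32.
    return clip8((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p: int, q: int) -> int:
    """Quarter-sample value: rounded average of two adjacent samples."""
    return (p + q + 1) >> 1


# Sample a lies between integer sample G and half-pel sample b.
G = 100
b = half_pel(0, 0, G, 100, 0, 0)
a = quarter_pel(G, b)
```

The integer form `(p + q + 1) >> 1` rounds halves upward, which is one common reading of round((G + b)/2).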
15Interpolation of Chroma Eighth-Sample Positions
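- Quarter-pel resolution motion vectors in the luma
component require eighth-pel resolution vectors
in the chroma components (assuming 4:2:0
sampling).
$$a = \mathrm{round}\left(\frac{(8 - d_x)(8 - d_y)A + d_x(8 - d_y)B + (8 - d_x)d_y C + d_x d_y D}{64}\right)$$
In this example $d_x$ is 2 and $d_y$ is 3, so that
$$a = \mathrm{round}\left(\frac{30A + 10B + 18C + 6D}{64}\right)$$
[Figure: interpolated sample a between chroma
samples A, B, C and D, with offsets dx and 8 - dx
horizontally and dy and 8 - dy vertically.]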
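The weighted average of the four neighbouring chroma samples can be checked numerically. The sketch below assumes integer samples A, B, C, D and offsets dx, dy in eighth-sample units (0 to 7); the function names are hypothetical.

```python
from typing import Tuple


def chroma_weights(dx: int, dy: int) -> Tuple[int, int, int, int]:
    """Weights applied to A, B, C, D; they always sum to 64."""
    if not (0 <= dx < 8 and 0 <= dy < 8):
        raise ValueError("dx and dy must be in 0..7")
    return ((8 - dx) * (8 - dy), dx * (8 - dy), (8 - dx) * dy, dx * dy)


def chroma_interp(A: int, B: int, C: int, D: int, dx: int, dy: int) -> int:
    """Bilinear interpolation at eighth-sample offset (dx, dy)."""
    wa, wb, wc, wd = chroma_weights(dx, dy)
    # Adding 32 before the shift by 6 rounds the division by 64.
    return (wa * A + wb * B + wc * C + wd * D + 32) >> 6
```

With dx = 2 and dy = 3, `chroma_weights` gives (30, 10, 18, 6), matching the coefficients in the worked example.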
16Motion Vector Prediction
- Encoding a motion vector for each partition can
cost a significant number of bits, especially if
small partition sizes are chosen.
- Motion vectors for neighbouring partitions are
often highly correlated, so each motion vector
is predicted from vectors of nearby, previously
coded partitions.
- A predicted vector, MVp, is formed based on
previously calculated motion vectors, and MVD, the
difference between the current vector and the
predicted vector, is encoded and transmitted.
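The prediction step can be sketched as follows. The slides do not say how MVp is formed; this example assumes the component-wise median of three neighbouring vectors (here called A, B and C), which is the rule H.264 uses for most partition shapes. Special cases such as 16 × 8 and 8 × 16 partitions and skipped macroblocks are ignored, and all function names are hypothetical.

```python
from typing import Tuple

MV = Tuple[int, int]  # (x, y) in quarter-sample units


def median3(a: int, b: int, c: int) -> int:
    """Middle value of three integers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a: MV, mv_b: MV, mv_c: MV) -> MV:
    """MVp as the component-wise median of three neighbouring vectors."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


def encode_mvd(mv: MV, mvp: MV) -> MV:
    """MVD = current vector minus predicted vector (this is what is sent)."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(mvd: MV, mvp: MV) -> MV:
    """The decoder forms the same MVp and adds the received MVD."""
    return (mvd[0] + mvp[0], mvd[1] + mvp[1])
```

Because the decoder can form the same MVp from vectors it has already decoded, only the usually small MVD needs to be transmitted.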
17Current and Neighbouring Partitions
[Figure: current partition E (16 × 16) with
neighbouring partitions A (8 × 4), B (4 × 8) and
C (16 × 8).]
18Intra Prediction
- In intra mode a prediction block P is formed
based on previously encoded and reconstructed
blocks and is subtracted from the current block
prior to encoding.
- For the luma samples, P is formed for each 4 × 4
block or for a 16 × 16 block.
- There are a total of nine optional prediction
modes for each 4 × 4 luma block and four modes
for a 16 × 16 block.
- One macroblock mode for chroma
- Similar to intra 16 × 16.
19 4 × 4 Luma Prediction Modes
20 4 × 4 Luma Prediction Modes (Cont.)
- Mode 2: DC prediction
- Modes 0-8 (except 2): directional prediction
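To make the idea of a prediction block P concrete, here is a minimal sketch of three of the nine 4 × 4 luma modes: mode 0 (vertical), mode 1 (horizontal) and mode 2 (DC). It assumes the four reconstructed samples above the block and the four to its left are both available; the remaining directional modes and the unavailable-neighbour cases are omitted. The function names are hypothetical.

```python
from typing import List, Sequence

Block = List[List[int]]


def predict_4x4(mode: int, above: Sequence[int], left: Sequence[int]) -> Block:
    """Form the 4x4 prediction block P from neighbouring reconstructed samples."""
    if len(above) != 4 or len(left) != 4:
        raise ValueError("expected four samples above and four to the left")
    if mode == 0:  # vertical: copy the row above into every row
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: copy each left sample across its row
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:  # DC: rounded mean of the eight neighbouring samples
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise NotImplementedError("only modes 0, 1 and 2 are sketched here")


def residual(block: Block, pred: Block) -> Block:
    """Current block minus P; this difference is what gets transformed and coded."""
    return [[block[r][c] - pred[r][c] for c in range(4)] for r in range(4)]
```

An encoder would typically try each mode and keep the one whose residual is cheapest to code; that selection step is not shown.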
21DC Prediction
22Mode 0 and Mode 1
23Mode 3 and Mode 4
24Mode 5 and Mode 6
25Mode 7 and Mode 8
26 16 × 16 Luma Prediction Modes
27Intra 16 × 16 Plane Prediction
28Bitrate Saving (Real Time)
29Bitrate Saving (Storage)
30Objective Evaluation
31Objective Evaluation (Cont.)
32Subjective Evaluation
33Computation Complexity of H.264
34Analysis, Fast Algorithm, and VLSI Architecture
Design for H.264/AVC Intra Frame Coder
- Yu-Wen Huang, Bing-Yu Hsieh, Tung-Chien Chen, and
Liang-Gee Chen, Fellow, IEEE
- IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR
VIDEO TECHNOLOGY, VOL. 15, NO. 3, MARCH 2005
35Core Techniques Adopted in Different Image Coding
Standards
36Comparisons Between Different Image Coding
Standards
37Intraprediction Modes
- In H.264/AVC intra coding, two intra-macroblock
modes are supported. One is the intra 4 × 4
prediction mode, denoted as I4MB, and the other is
the intra 16 × 16 prediction mode, denoted as
I16MB.
[Figure: I4MB. Left: illustration of the nine
4 × 4 luma prediction modes. Right: examples of
real images.]
38 16 × 16 Luma Prediction Modes
39Transform and Quantization
- H.264/AVC still adopts transform coding for the
prediction error signals. The block size is
chosen as 4 × 4 instead of 8 × 8.
- The 4 × 4 transform is an integer approximation
of the original floating-point DCT. An
additional 2 × 2 transform is applied to the four
dc coefficients of each chroma component.
- If a luma macroblock is coded as I16MB, a
two-dimensional (2-D) 4 × 4 Hadamard transform is
further applied to the 16 dc coefficients of the
luma blocks.
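The integer transforms described above can be sketched with plain matrix products. The core matrix `CF` below is the forward 4 × 4 matrix commonly given for H.264 and `H4` is the 4 × 4 Hadamard matrix; neither appears in the slides, so both are stated here as assumptions. The scaling that the standard folds into quantization (and the division by 2 in the DC Hadamard stage) is left out, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]

# Forward core transform matrix (integer approximation of the 4x4 DCT).
CF: Matrix = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

# 4x4 Hadamard matrix, applied to the 16 luma dc coefficients in I16MB.
H4: Matrix = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def forward_core_transform(x: Matrix) -> Matrix:
    """Y = CF . X . CF^T on a 4x4 residual block (unscaled)."""
    return matmul(matmul(CF, x), transpose(CF))


def hadamard_4x4(dc: Matrix) -> Matrix:
    """H4 . DC . H4 on the 4x4 array of dc coefficients (unscaled)."""
    return matmul(matmul(H4, dc), H4)  # H4 is symmetric
```

Both transforms use only integer multiplications by 1 and 2, so they can be implemented with additions and shifts.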
40Transform and Quantization (Cont.)
- H.264/AVC uses scalar quantization without
dead-zone. The quantization parameter QP ranges
from 0 to 51.
- An increase of 1 in QP reduces the bit-rate by
approximately 12.5%.
- The quantized transform coefficients of a block
are scanned in a zigzag fashion. The decoding
process can be realized with only additions and
shifting operations in 16-bit arithmetic. For
more details, please refer to [13].
- In H.264/AVC, two entropy coding schemes are
supported.
- One is VLC-based coding, and the other is
arithmetic coding.
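The QP range and the zigzag scan mentioned above can be illustrated with a short sketch. The step-size table (six base values, doubling every 6 QP) and the 4 × 4 frame-scan order are taken from the H.264 standard, not from these slides, so treat both as assumptions. The names `qstep`, `QSTEP_BASE`, `ZIGZAG_4X4` and `zigzag_scan` are hypothetical.

```python
from typing import List, Sequence

# Quantizer step sizes for QP 0..5; the step size doubles for every +6 in QP.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Quantizer step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


# Zigzag order for a 4x4 block as (row, column) pairs.
ZIGZAG_4X4 = [
    (0, 0), (0, 1), (1, 0), (2, 0),
    (1, 1), (0, 2), (0, 3), (1, 2),
    (2, 1), (3, 0), (3, 1), (2, 2),
    (1, 3), (2, 3), (3, 2), (3, 3),
]


def zigzag_scan(block: Sequence[Sequence[int]]) -> List[int]:
    """Reorder a 4x4 block of quantized coefficients into a 1-D list."""
    return [block[r][c] for r, c in ZIGZAG_4X4]
```

Doubling every 6 steps means the step size grows by a factor of about 2^(1/6), roughly 12% per QP increment, which is in line with the bit-rate reduction quoted above.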