Title: YaoChung Lin
1Introduction to H.264/SVC: Differences, Possibilities, and Limits
- Yao-Chung Lin
- Image, Video, and Multimedia Systems Group
- Information Systems Laboratory
- May 10, 2006
2Scalable Video Coding
- A research topic for over 20 years
- A single bitstream serves diversified clients
- Display resolutions (QCIF, CIF, ..., HDTV)
- Frame rates (15 Hz, 30 Hz, ...)
- Bit rates/Qualities
- Developing Standard
- October 2003, MPEG Call for Proposal
- March 2004, 14 proposals submitted and evaluated
- 12 proposals are wavelet-based
- 2 proposals are extensions of H.264/AVC
- October 2004, MPEG selected the HHI proposal as the starting point for H.264/MPEG-4 AVC Amd.1
- 2007, final draft will be released
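As a rough illustration of how a single bitstream can serve diversified clients, the sketch below tags each packet with the layer it belongs to and drops everything above what a given client needs. The `Packet` class, its field names, and `extract_substream` are hypothetical names introduced here for illustration; they are not the SVC bitstream syntax.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet tagged with the layer it belongs to."""
    spatial: int    # 0 = lowest display resolution (e.g. QCIF)
    temporal: int   # 0 = lowest frame rate
    quality: int    # 0 = base quality
    payload: bytes = b""


def extract_substream(packets: Iterable[Packet], max_spatial: int,
                      max_temporal: int, max_quality: int) -> List[Packet]:
    """Keep only the packets at or below the requested layers."""
    return [p for p in packets
            if p.spatial <= max_spatial
            and p.temporal <= max_temporal
            and p.quality <= max_quality]


# Toy stream: 2 spatial x 2 temporal x 2 quality layers.
stream = [Packet(s, t, q) for s in range(2) for t in range(2) for q in range(2)]

# A low-end client: lowest resolution, full frame rate, base quality.
low_end = extract_substream(stream, max_spatial=0, max_temporal=1, max_quality=0)
```

The point of the sketch is that adaptation is a pure filtering operation on the stream; no re-encoding is involved.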
3Current Draft
- Based on H.264 main profile
- MCTF/Hierarchical B-picture (MCTF w/o update step) for temporal scalability
- Layered pyramid prediction structure for spatial scalability
- Layered, sub bit-plane, and (run,level) coding for SNR scalability
4H.264 Profiles
5Overall Architecture of SVC
A two layer example
6Outline
- Introduction
- Scalabilities
- Temporal Scalability
- Spatial Scalability
- SNR (Quality) Scalability
- Other Details
- Simulation Results
- Conclusion and Discussion
7Temporal Scalability
- Group of pictures (GOP)
- Concepts of motion compensated temporal filtering (MCTF)
- Hierarchical B-picture
8Group of Pictures
- Instantaneous decoding refresh (IDR) pictures
- Intra coded picture
- Also a key picture
- A GOP with only one picture
- Provide random access ability
- Key pictures
- The last picture in a GOP
- Either intra coded
- Or inter coded, predicted from the previous key picture
- Provide lowest temporal resolution
- Non-key pictures
- Hierarchically predicted B pictures
- High pass signal of MCTF
- Provide various temporal resolutions
- Note: the number of reference frames cannot be greater than 16
9Group of Pictures
An example of a group of pictures: dyadic, 4 temporal levels
ITU, 2006 January, R202
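The dyadic structure can be sketched in a few lines. The code below assumes a GOP of 8 pictures (which yields the 4 temporal levels of the example), numbers the pictures 1..8 in display order, and treats the last picture as the key picture at level 0. The function names are illustrative and do not come from the draft.

```python
def temporal_level(index: int, gop_size: int) -> int:
    """Temporal level of picture `index` (1..gop_size) in a dyadic GOP.

    The last picture of the GOP (index == gop_size) is the key picture
    and gets level 0; pictures at odd positions get the highest level.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    if not 1 <= index <= gop_size:
        raise ValueError("index must be in 1..gop_size")
    log2_gop = gop_size.bit_length() - 1
    trailing_zeros = (index & -index).bit_length() - 1
    return log2_gop - trailing_zeros


def frame_rate(full_rate: float, gop_size: int, max_level: int) -> float:
    """Frame rate left after dropping all pictures above `max_level`."""
    kept = sum(1 for i in range(1, gop_size + 1)
               if temporal_level(i, gop_size) <= max_level)
    return full_rate * kept / gop_size


levels = [temporal_level(i, 8) for i in range(1, 9)]
rates = [frame_rate(30.0, 8, lvl) for lvl in range(4)]
```

Dropping the highest remaining level halves the frame rate each time, which is how the non-key pictures provide the various temporal resolutions.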
10Concepts of MCTF
- Based on lifting scheme
- Ensures perfect reconstruction
- Even if non-linear operations are used
- Open loop
- Non-recursive temporal decomposition
- Prevents drift error
- Improves the efficiency of scalable coding, especially with FGS
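The perfect-reconstruction property of lifting can be checked with a minimal sketch. The code below is a Haar-style lifting pair on plain integer samples, with no motion compensation; each sample stands in for a whole picture, and the function names are illustrative. The update step uses a rounding shift, which is a non-linear operation, yet the inverse recovers the input exactly because it undoes the same steps in reverse order.

```python
from typing import List, Tuple


def haar_lifting_forward(samples: List[int]) -> Tuple[List[int], List[int]]:
    """Split into even/odd samples, then apply a prediction and an update step."""
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    even, odd = samples[0::2], samples[1::2]
    # Prediction step: high-pass signal = odd sample minus its prediction.
    high = [o - e for o, e in zip(odd, even)]
    # Update step: low-pass signal; the shift rounds, so this is non-linear.
    low = [e + (h >> 1) for e, h in zip(even, high)]
    return low, high


def haar_lifting_inverse(low: List[int], high: List[int]) -> List[int]:
    """Undo the update step, then the prediction step, then interleave."""
    even = [l - (h >> 1) for l, h in zip(low, high)]
    odd = [h + e for h, e in zip(high, even)]
    out: List[int] = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out


original = [10, 13, 7, 2, 255, 0]
low, high = haar_lifting_forward(original)
restored = haar_lifting_inverse(low, high)
```

Omitting the update step (so that `low` is simply the even samples) leaves the prediction step alone, which corresponds to the "MCTF w/o update step" case.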
11Lifting Scheme
r: reference index, m: motion vector
Similar to P-picture
Similar to B-picture
ITU, 2006 January, R202
12Motion Modes
- Variable block-size inter modes from 16x16 to 4x4
- Intra modes: 16x16, 8x8, 4x4
- Direct modes: 16x16, 8x8
13Decomposition Structure
HHI Webpage Scalable Extension of H.264/AVC
14Decomposition Structure
- A dyadic decomposition structure has a delay of 2^N - 1 frames, where N is the number of temporal decomposition levels
- Update steps do not cross the GOP border
HHI Webpage Scalable Extension of H.264/AVC
15Low Delay Support
ITU, 2006 January, R202
16Removal of update step
- The update step introduces high complexity at the decoder
- Derivation of the motion information for the update step
- Smaller block sizes
- 9-bit residual motion compensation
- The update step provides an insignificant coding-efficiency gain over closed-loop coding with hierarchical B pictures (HB)
- Rate-distortion performance of closed-loop coding with HB is higher than or similar to that of MCTF-based coding for all test sequences
- Except for the City sequence, where MCTF has a 0.5 dB gain
- After temporal pre-filtering of the sequence, the MCTF gain becomes insignificant
ITU, 2005 July, P059
17Two Closed Loops
FGS Layer
ITU, 2005 July, P059
18Spatial Scalability
- Layered pyramid prediction structure
- Inter-layer intra texture prediction
- Inter-layer motion prediction
- Inter-layer residual prediction
- Extended Spatial Scalability
- Cropping
- Generic upsampling (non-dyadic spatial resampling)
19Layered Pyramid Prediction Structure
- Same concepts used in H.262/MPEG-2, H.263, MPEG-4, with additional inter-layer prediction
- Each spatial resolution is coded as a new layer with texture and motion refinement
- Same mechanism for coarse grain SNR scalability (spatial downsampling ratio = 1)
20Inheritance of modes
Previous Spatial Layer
Current Layer
For spatial scaling ratio 2
21Inter-layer Intra Texture Prediction
- Unrestricted inter-layer intra texture prediction
- Decode and predict from all lower layers in the bitstream
- Not supported in the standard
- Constrained inter-layer intra texture prediction
- For MBs in non-key pictures
- The co-located blocks in the previous layer are intra coded
- Not supported in the standard
- Constrained inter-layer intra texture prediction for single-loop decoding
- For MBs in all pictures (including key pictures)
- The co-located blocks in the previous layer are intra coded
- Allows decoding (motion compensation) of the current layer only
- Supported by the current SVC draft
22Generation of Inter-layer Texture Prediction
- Directly de-block filtering
- 4-sample border extension
- Interpolation
- For 2x: half-pel interpolation filter of AVC
- Otherwise: quarter-pel interpolation filter
Schwarz, ICIP 2005
23Inter-layer Motion Prediction
- Intra base layer
- If previous layer is inter, use scaled partitioning and motion vectors of base layer
- If previous layer is intra, predict from previous layer
- Quarter pel refinement
- Only for reduced spatial resolution
- Refine the scaled motion vector of previous layer by 1, 0, or -1 in quarter-sample precision
- Send the refinement
- None
- Motion vector prediction from neighbor blocks
- Motion vector prediction from previous layer
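The scaling and quarter-pel refinement described above can be sketched as follows. This is a simplified illustration, assuming a spatial scaling ratio of 2 and motion vectors stored as integer pairs in quarter-sample units. The function names are hypothetical, and the encoder-side choice shown (clamping the difference to the allowed range) is one simple possibility, not the JSVM mode decision.

```python
from typing import Tuple

MotionVector = Tuple[int, int]  # (x, y) in quarter-sample units


def scale_base_layer_mv(mv: MotionVector, ratio: int = 2) -> MotionVector:
    """Scale a previous-layer motion vector to the current resolution."""
    return (mv[0] * ratio, mv[1] * ratio)


def apply_refinement(scaled_mv: MotionVector,
                     refinement: MotionVector) -> MotionVector:
    """Add a transmitted refinement; each component must be -1, 0, or 1."""
    if any(c not in (-1, 0, 1) for c in refinement):
        raise ValueError("refinement components must be -1, 0, or 1")
    return (scaled_mv[0] + refinement[0], scaled_mv[1] + refinement[1])


def _clamp_unit(delta: int) -> int:
    """Clamp a difference to the allowed refinement range [-1, 1]."""
    return max(-1, min(1, delta))


def choose_refinement(scaled_mv: MotionVector,
                      target_mv: MotionVector) -> MotionVector:
    """Encoder side: move the scaled vector as close to the target as allowed."""
    return (_clamp_unit(target_mv[0] - scaled_mv[0]),
            _clamp_unit(target_mv[1] - scaled_mv[1]))


scaled = scale_base_layer_mv((3, -5))
refinement = choose_refinement(scaled, target_mv=(7, -13))
refined = apply_refinement(scaled, refinement)
```

Only the small refinement needs to be sent, since the decoder can derive the scaled vector from the previous layer on its own.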
24Inter-layer Residual Prediction
- Predict the residual from previous layer residual
- Upsample the residual
- For 2x: separable bi-linear filter (1, 1/2)
- Otherwise: quarter-pel interpolation
- Helpful when the motion information is unchanged or only slightly changed from the previous layer
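To make the 2x separable bi-linear upsampling concrete, here is a minimal sketch. It is not the normative SVC filter: the sample phase, the border handling (replicating the last sample), and the rounding are assumptions made for illustration. Each input sample is copied, and the sample inserted after it is the rounded average of its two neighbours.

```python
from typing import List


def upsample_1d(samples: List[int]) -> List[int]:
    """2x bi-linear upsampling of one row of residual samples."""
    out: List[int] = []
    for i, cur in enumerate(samples):
        # Replicate the last sample at the border (an assumption).
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(cur)                    # original sample, weight 1
        out.append((cur + nxt + 1) >> 1)   # midpoint, weights 1/2 and 1/2
    return out


def upsample_2d(block: List[List[int]]) -> List[List[int]]:
    """Separable version: filter the rows first, then the columns."""
    rows = [upsample_1d(row) for row in block]
    cols = [upsample_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


row = upsample_1d([0, 4, 8])
block = upsample_2d([[0, 4], [8, 12]])
```

The upsampled residual then serves as the prediction for the current layer's residual, so only the difference has to be coded.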
25SNR Scalability
- Coarse grain scalability (CGS)
- Layered coding
- The same mechanism as spatial scalability
- Re-quantize the coefficients with finer step
- Fine grain scalability (FGS)
- Sub-bitplane arithmetic coding
- Re-quantize the coefficients with finer step
- Provide a continuous refinement from a quality
base layer
26Coarse Grain Scalability
- Same mechanism as spatial scalability
- Except no upsampling
- Provide discrete quality refinement
- Close to single-layer RD performance if dQP > 6
27Fine Grain SNR Scalability
- Represent the residual between the original prediction error and base layer representation
- Quantized to a bisection step size (dQP = 6)
- Coded in transform domain for single inverse transform at decoder
- Adaptive references for FGS (AR-FGS) provide leaky prediction attenuating drift error
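The link between dQP = 6 and a bisected step size follows from the H.264/AVC quantizer design, where the step size doubles for every increase of 6 in QP. The sketch below uses the commonly tabulated step sizes for QP 0 to 5; the function name `qstep` is illustrative.

```python
# Quantizer step sizes for QP 0..5 in H.264/AVC; the pattern repeats,
# doubled, for every further increase of 6 in QP.
_BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantizer step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("qp must be in 0..51")
    return _BASE_QSTEP[qp % 6] * (1 << (qp // 6))


base_qp = 28
fgs_qp = base_qp - 6                      # dQP = 6 for the refinement layer
ratio = qstep(base_qp) / qstep(fgs_qp)    # how much finer the FGS step is
```

Each refinement layer with dQP = 6 therefore halves the quantization step of the layer below it.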
28Illustration of AR-FGS
Zero Coef. Block
ITU, 2006 January, R202
29Outline
- Introduction
- Scalabilities
- Temporal Scalability
- Spatial Scalability
- SNR (Quality) Scalability
- Other details
- Simulation Results
- Discussion
30Other Details
- Fidelity range extensions (FRExt)
- Support 8x8 transform (High Profile)
- Increase coding efficiency, especially for high-resolution sources
- Motion search block segment size down to 8x8 only
- Weighted prediction
- Scale the reference pictures for prediction
- Find the weights at encoder
- Explicitly send in syntax
- Implicitly derive from temporal distance (an
option for B-picture)
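The implicit mode can be illustrated with a simplified floating-point sketch. It assumes the current B picture lies strictly between its two reference pictures in display order, and that the nearer reference gets the larger weight. The standard derives the weights in fixed-point arithmetic; the function names and the use of floats here are illustrative only.

```python
from typing import List, Tuple


def implicit_weights(t_ref0: int, t_cur: int, t_ref1: int) -> Tuple[float, float]:
    """Weights for two references, derived from temporal distances."""
    if not t_ref0 < t_cur < t_ref1:
        raise ValueError("expected t_ref0 < t_cur < t_ref1")
    span = t_ref1 - t_ref0
    w1 = (t_cur - t_ref0) / span   # grows as the picture approaches ref1
    return 1.0 - w1, w1


def weighted_bipred(ref0: List[int], ref1: List[int],
                    w0: float, w1: float) -> List[float]:
    """Sample-wise weighted average of two reference blocks."""
    return [w0 * a + w1 * b for a, b in zip(ref0, ref1)]


w0, w1 = implicit_weights(t_ref0=0, t_cur=1, t_ref1=4)
pred = weighted_bipred([100], [200], w0, w1)
```

In the explicit mode the encoder would instead choose `w0` and `w1` itself and send them in the syntax.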
31Other Details
- FGS motion
- Progressive refinement slice (FGS slice) contains motion data
- Provides better prediction
- Adaptive GOP Structure (AGS)
- Divide a GOP into several sub-GOPs by appropriate mode decision
- Decreases the distance between two low-pass pictures
- 0.62 dB gain
- Details in ITU O018
- Loss-aware rate-distortion optimization
- The mode/parameter decision considers packet loss
- Details in ITU P057
32JSVM
- Written in C++
- Accessible via CVS
- Current version: 5.2
- Last update: May 2, 2006
33Simulation Results
- Temporal Scalability
- GOP sizes (ITU, 2005 July, P014)
- Open loop MCTF vs. closed loop HB (ITU, 2005 July, P059)
- Spatial
- Given the same base layer
- Examine the inter-layer prediction
- SNR
- CGS, dQP 2 or 6
- FGS
- Key pictures predict from base representation
- FGS motion optimized at 1/3 bit rate
- Is open loop MCTF helpful? (ITU, P059)
34GOP Sizes
35GOP Sizes
36Open Loop vs. Closed Loop
37Open Loop vs. Closed Loop
38Summary of Temporal Scalability Features
- Hierarchical B pictures
- B pictures give 0.5 to 1 dB (IPP -> IBBPBBP)
- Hierarchical prediction gives an additional 0.5 to 1 dB
- MCTF
- Only CITY has a 0.5 dB gain compared to closed-loop HB
- The gain is diminished by encoder MCTF pre-filtering
- Improvement comes from the hierarchical prediction structure
39Simulation Results
- Temporal Scalability
- GOP sizes (ITU, 2005 July, P014)
- Open loop MCTF vs. closed loop HB (ITU, 2005 July, P059)
- Spatial
- Given the same base layer, examine the inter-layer prediction
- Multiple-loop decoding vs. single-loop decoding (constrained inter-layer prediction) (ITU, O074)
- SNR
- CGS, dQP 2 or 6
- FGS
- Key pictures predict from base representation
- FGS motion optimized at 1/3 bit rate
40Spatial Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
41Spatial Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
42Constrained Inter-Layer Prediction
Foreman, Munich test points (figure labels: CIF@30, CIF@15, QCIF@15, QCIF@7.5)
43Constrained Inter-Layer Prediction
Crew, Munich test points (figure labels: 4CIF@60, 4CIF@30, CIF@30, QCIF@15)
44Summary of Inter-layer prediction tools
- Inter-layer prediction tools bring a 2 dB gain
- Intra prediction: 1 dB
- Motion prediction: 0.5 to 1 dB
- Residual prediction: 0.5 dB
- Constrained inter-layer intra prediction for single layer decoding
- Provides low-complexity decoding
- Costs less than 0.5 dB
45Simulation Results
- Temporal Scalability
- GOP sizes (ITU, 2005 July, P014)
- Open loop MCTF vs. closed loop HB (ITU, 2005 July, P059)
- Spatial
- Given the same base layer, examine the inter-layer prediction
- Multiple-loop decoding vs. single-loop decoding (constrained inter-layer prediction) (ITU, O074)
- SNR
- CGS, dQP 2 or 6
- FGS
- Key pictures predict from base representation
- FGS motion optimized at 1/3 bit rate
- Is open loop MCTF helpful? (ITU, P059)
46SNR Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
47SNR Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
48SNR Scalability
49SNR Scalability
50SNR Scalability
51SNR Scalability
- SNR scalability gives rate adaptation with 1 dB quality loss (30% rate loss)
- CGS with dQP 6 has the least loss in rate-distortion performance
- FGS with an appropriate choice of reference quality gives near-CGS performance
52Conclusion and Discussion
- Differences from H.264/AVC
- Layered pyramid prediction coding structure
- Inter-layer prediction
- Progressive quality refinement (FGS)
- Possibilities for low complexity encoding
- Use previous layer motion information for ME
- Develop prediction of motion vector candidates for hierarchical prediction structure
- Utilize the Philips H.264 encoder on the TriMedia platform
- Limits
- Encoding needs multiple loops
- Picture buffer size increases due to hierarchical prediction
- SVC is still under development
53Reference
- Julien Reichel, Heiko Schwarz, and Mathias Wien, "Joint Scalable Video Model JSVM-5," (R202) ITU-T VCEG 18th Meeting, January 2006
- http://ip.hhi.de/imagecom_G1/savce/index.htm
- Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, "Comparison of MCTF and closed-loop hierarchical B pictures," (P059) ITU-T VCEG 16th Meeting, July 2005
- Heiko Schwarz, Tobias Hinz, Detlev Marpe, and Thomas Wiegand, "Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability," ICIP 2005
- Gwang Hoon Park, Min Woo Park, Seyoon Jeong, Kyuheon Kim, and Jinwoo Hong, "Improve SVC Coding Efficiency by Adaptive GOP Structure (SVC CE2)," (O018) ITU-T VCEG 15th Meeting, April 2005
- Yiliang Bao and Marta Karczewicz, "Implementation of close-loop coding in JSVM," (P057) ITU-T VCEG 16th Meeting, July 2005
- Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, "Hierarchical B pictures," (P014) ITU-T VCEG 16th Meeting, July 2005
- H. Schwarz, D. Marpe, and T. Wiegand, "Basic Concepts for Supporting Spatial and SNR Scalability in the Scalable H.264/MPEG-AVC Extension," IWSSIP 05
- Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, "Further results on constrained inter-layer prediction," (O074) ITU-T VCEG 15th Meeting, April 2005