Title: Perceptual Video Coding: How H.264 can do better
Koohyar Minoo (kminoo_at_ucsd.edu) and Truong Nguyen
(nguyen_at_ucsd.edu)
Video Processing Group, ECE Department,
University of California at San Diego
http://videoprocessing.ucsd.edu
- Introduction
- Today's state-of-the-art video/image coders optimize
  compression quality by minimizing either
  - some distortion measure, or
  - some joint rate-distortion measure.
- Traditionally, mathematical distortion models such as MSE
  (Mean Squared Error) or MAD (Mean Absolute Difference) have
  been used for video/image coding optimization [1].
  - These models do not accurately represent the distortion
    perceived by the HVS (Human Visual System) [2].
- Perceptually Enhanced H.264
- We have introduced appropriate perceptual models for the
  following tasks in H.264:
  - Mode Selection: Instead of MSE (in RDO mode selection) or
    MAD (in simple mode selection) as the measure of
    distortion, we propose a perceptual distortion model that
    allows us to
    - select a mode that lowers the consumed coding bits when
      the distortion associated with that mode is not
      noticeable (saving bit budget);
    - select a mode that preserves edges and the boundary
      integrity of scene objects despite an increase in
      consumed bits (maintaining quality).
  - Bit Allocation: This part has applications in the
    following two scenarios:
    - Rate control, or coding with Hypothetical Reference
      Decoder (HRD) buffer constraints. In this scenario the
      bit budget is assigned based on the predicted perceptual
      severity of the coding loss, not just the predicted MAD
      (saving bits for where they are more needed).
    - Variable Bit Rate (VBR) coding for storage. In this
      scenario the quantization parameter is assigned based on
      the amount of noise that can be tolerated in each
      macroblock (maintaining the same perceptual quality
      across frames).
- Perceptually tuned quantization parameter assignment: In this
  experiment, the quantization parameter is assigned based on
  the perceptual characteristics of the MBs.
  [Figure: panels (a) and (b)]
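The idea of assigning the quantization parameter from the noise a macroblock can tolerate can be illustrated with a small sketch. It is not the authors' assignment rule. It relies on two standard facts: the H.264 quantizer step size doubles for every increase of 6 in QP, and the noise power of a uniform quantizer grows roughly with the square of the step size. The function name `qp_for_tolerance` and the notion of a `tolerance_ratio` (tolerable noise power of this MB relative to a reference MB) are assumptions introduced here for illustration.

```python
import math

QP_MIN, QP_MAX = 0, 51  # QP range for 8-bit H.264 video


def qp_for_tolerance(base_qp, tolerance_ratio):
    """Shift base_qp so that quantization noise power scales by tolerance_ratio.

    Assumes noise power ~ Qstep**2 and Qstep doubling every 6 QP steps,
    which gives delta_qp = 3 * log2(tolerance_ratio).
    `tolerance_ratio` is a hypothetical input: how much more (or less)
    noise power this macroblock can hide than a reference macroblock.
    """
    if tolerance_ratio <= 0:
        raise ValueError("tolerance_ratio must be positive")
    delta_qp = 3.0 * math.log2(tolerance_ratio)
    qp = round(base_qp + delta_qp)
    # Clamp to the legal QP range.
    return max(QP_MIN, min(QP_MAX, qp))


if __name__ == "__main__":
    for ratio in (0.25, 1.0, 4.0):
        print(ratio, qp_for_tolerance(28, ratio))
```

Under these assumptions, a macroblock that can hide four times the noise power is coded with a QP that is 6 higher, while a visually sensitive macroblock gets a lower QP.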
- Amongst the non-normative tools in the reference H.264 codec
  by HHI, the following items significantly influence the
  performance of the codec:
  - Rate-distortion optimized mode selection: R-D optimized
    mode selection uses MSE to compare distortion across
    different modes.
  - Motion estimation for inter modes: Motion estimation uses
    MAD or SA(T)D (Sum of Absolute (Transformed) Differences)
    to decide which translational motion vector better
    predicts the current block.
  - Rate control for VBR or CBR operation: Currently, the rate
    control algorithm in the reference H.264 implementation
    uses a linear prediction of the MAD of the current block,
    based on that of the co-located block in the previous
    frame, to estimate the quantization parameter for a given
    rate.
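For reference, the mathematical measures named above can be written out in a few lines. This is a plain-Python sketch, not the reference codec's code. The 4x4 Hadamard matrix is the usual choice for SATD, but the normalisation is left out here (implementations differ on it), and the coefficients `a1` and `a2` of the linear MAD predictor are placeholders, since the source does not give their values or how they are updated.

```python
def _diff(cur, ref):
    """Element-wise difference of two equally sized 2-D blocks."""
    return [[c - r for c, r in zip(rc, rr)] for rc, rr in zip(cur, ref)]


def mse(cur, ref):
    """Mean squared error between two blocks."""
    d = [v for row in _diff(cur, ref) for v in row]
    return sum(v * v for v in d) / len(d)


def sad(cur, ref):
    """Sum of absolute differences."""
    return sum(abs(v) for row in _diff(cur, ref) for v in row)


def mad(cur, ref):
    """Mean absolute difference (SAD divided by the number of samples)."""
    n = sum(len(row) for row in cur)
    return sad(cur, ref) / n


# 4x4 Hadamard matrix (sequency order).
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def _matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def satd4x4(cur, ref):
    """Sum of absolute Hadamard-transformed differences of a 4x4 block.

    Unnormalised: scaling conventions vary between implementations.
    """
    d = _diff(cur, ref)
    h_t = [list(col) for col in zip(*H4)]
    t = _matmul(_matmul(H4, d), h_t)
    return sum(abs(v) for row in t for v in row)


def predict_mad(prev_mad, a1=1.0, a2=0.0):
    """Linear MAD prediction from the co-located block of the previous frame.

    a1 and a2 are placeholder coefficients (assumption, not from the source).
    """
    return a1 * prev_mad + a2


if __name__ == "__main__":
    cur = [[10] * 4 for _ in range(4)]
    ref = [[8] * 4 for _ in range(4)]
    print(mse(cur, ref), mad(cur, ref), satd4x4(cur, ref))
```

None of these measures looks at where in the picture the error falls or how visible it is, which is the gap the perceptual models aim to close.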
- Objectives
- Devise perceptually suitable distortion measures to enhance
  the coding efficiency of the H.264 video coder.
  - These measures will be used by the following units to make
    the coding perceptually more efficient:
    - R-D optimized mode selection
    - Motion estimation
    - Quantization parameter assignment to each MB
    - Rate control
- The devised models need to be as general as possible and not
  application dependent (e.g. not tied to viewing conditions
  and/or display type). In the future, application-based
  fine-tuning can be applied to these general models.
- Keep the computational complexity of the perceptual
  distortion measurements low, so that they do not affect the
  overall encoding time of the H.264 encoder.
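To make the first objective concrete, the sketch below shows how swapping the distortion term in a Lagrangian mode decision, J = D + lambda * R, can change the selected mode. The cost form is the standard rate-distortion formulation from [1]. The mode names, bit counts, distortion values and the value of lambda are invented for illustration and are not measurements from this work.

```python
def select_mode(candidates, lam, distortion_key):
    """Return the name of the candidate minimizing J = D + lam * R.

    candidates: list of dicts with a "name", a "bits" count and one or
    more distortion entries; distortion_key picks which one acts as D.
    """
    best = min(candidates, key=lambda c: c[distortion_key] + lam * c["bits"])
    return best["name"]


if __name__ == "__main__":
    # Hypothetical numbers: the large partition has a higher MSE, but in a
    # heavily textured block most of that error is assumed to be masked.
    candidates = [
        {"name": "inter16x16", "bits": 40, "mse": 90.0, "perceptual": 20.0},
        {"name": "inter4x4", "bits": 120, "mse": 30.0, "perceptual": 18.0},
    ]
    print(select_mode(candidates, 0.5, "mse"))         # MSE-driven choice
    print(select_mode(candidates, 0.5, "perceptual"))  # perceptual choice
```

With these made-up values, the MSE-driven decision pays 80 extra bits to remove error that the perceptual measure rates as barely visible, while the perceptual decision keeps the cheaper mode.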
- Properties of HVS
- Properties of the Human Visual System can be used to correct
  shortcomings of mathematical models such as MSE. Under
  certain conditions the HVS can tolerate more distortion than
  MSE predicts. On the other hand, there are some types of
  distortion that MSE does not weight as heavily as they are
  perceived. Here are some of the HVS properties that form the
  basis of our model:
  - Texture Masking: The HVS is less sensitive to details in
    areas with a high amount of texture activity. This means
    that more noise can be tolerated in MBs with a high amount
    of texture activity.
    - Example: In the following two images the noise energy
      added to both images is the same. In (a) the noise is
      added randomly, while in (b) the noise power is weighted
      based on the texture activity of each MB.
      [Figure: panels (a) and (b). Flower sequence, SIF size;
      white noise with MSE = 208 added to frame 1.]
  - Intensity Contrast Masking: In lower/medium contrast areas,
    more noise can be hidden in darker areas.
    - Example: In the following two images the noise energy
      added to both images is the same. In (a) the noise is
      added randomly, while in (b) the noise power is weighted
      based on the intensity contrast of each MB.
      [Figure: panels (a) and (b)]
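The "same noise energy, different placement" experiment described in the examples above can be sketched as a simple allocation step: keep the average noise power over the frame fixed and redistribute it across macroblocks in proportion to a per-MB masking weight. The proportional rule and the function name `weighted_noise_power` are assumptions made for illustration; the source does not specify how the weighting was done.

```python
def weighted_noise_power(weights, mean_power):
    """Distribute noise power over macroblocks in proportion to `weights`.

    weights: one non-negative masking weight per macroblock, e.g. a
             texture-activity value (hypothetical input).
    mean_power: target average noise power (MSE) over all macroblocks.
    Returns a list of per-macroblock noise powers whose mean is mean_power.
    """
    n = len(weights)
    if n == 0:
        return []
    total = sum(weights)
    if total <= 0:
        # No masking information: fall back to uniform noise, as in panel (a).
        return [float(mean_power)] * n
    return [mean_power * n * w / total for w in weights]


if __name__ == "__main__":
    activity = [1, 3, 4, 8]  # made-up texture activities for four MBs
    powers = weighted_noise_power(activity, 208)
    print(powers, sum(powers) / len(powers))
```

Both allocations give the same frame-level MSE, so a purely mathematical measure cannot tell them apart, even though the weighted one places the noise where it is less visible.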
- DCT Domain Implementation
- To construct the proposed distortion models, two constraints
  were imposed:
  - Accuracy of the perceptual model: The proposed model needs
    to measure distortion objectively, as close as possible to
    the subjective measure of an average human observer.
  - Low computational complexity of the model: So that it can
    be used for real-time video applications.
- These two constraints resulted in a perceptual model in the
  transform domain, which uses transform coefficients to
  measure perceptual distortion according to the following
  rules:
  - Texture activity is proportional to the variance of the AC
    coefficients of the DCT.
  - The average intensity of a block can be represented by the
    DC coefficient of the DCT.
  - Spatial frequency sensitivity can be accounted for by
    weighting the coefficients of the error signal in the DCT
    domain based on the position of the coefficients.
  - Edge detection: There are simple edge detection routines in
    the DCT domain for horizontal and vertical edges.
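The four rules above can be sketched on a single 4x4 block as follows. Several choices here are assumptions rather than the authors' definitions: a floating-point orthonormal DCT-II is used (H.264 itself uses an integer approximation of the DCT), "variance of the AC coefficients" is taken as their mean squared value, the frequency weights are supplied by the caller, and the edge test is one simple heuristic with arbitrary thresholds.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        alpha = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([alpha * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dct2(block):
    """2-D DCT of a square block: C * block * C^T."""
    c = dct_matrix(len(block))
    c_t = [list(col) for col in zip(*c)]
    return _matmul(_matmul(c, block), c_t)


def average_intensity(coeffs):
    """Mean pixel value recovered from the DC coefficient (DC / n)."""
    return coeffs[0][0] / len(coeffs)


def texture_activity(coeffs):
    """Mean squared AC coefficient (assumed reading of 'AC variance')."""
    ac = [v for u, row in enumerate(coeffs)
          for w, v in enumerate(row) if (u, w) != (0, 0)]
    return sum(v * v for v in ac) / len(ac)


def weighted_error(err_coeffs, weights):
    """Position-weighted squared error in the DCT domain.

    `weights` has the same shape as the block; its values are up to the
    caller (hypothetical, e.g. smaller for high frequencies).
    """
    return sum(w * e * e for e_row, w_row in zip(err_coeffs, weights)
               for e, w in zip(e_row, w_row))


def edge_orientation(coeffs, ratio=4.0, min_energy=1.0):
    """Crude edge test; ratio and min_energy are arbitrary thresholds."""
    # First row: purely horizontal frequencies -> a vertical edge.
    horiz = sum(v * v for v in coeffs[0][1:])
    # First column: purely vertical frequencies -> a horizontal edge.
    vert = sum(row[0] ** 2 for row in coeffs[1:])
    if max(horiz, vert) < min_energy:
        return "none"
    if horiz > ratio * vert:
        return "vertical"
    if vert > ratio * horiz:
        return "horizontal"
    return "none"


if __name__ == "__main__":
    edge = [[0, 0, 100, 100] for _ in range(4)]
    x = dct2(edge)
    print(average_intensity(x), texture_activity(x), edge_orientation(x))
```

All of these features are read directly from coefficients the encoder already computes, which is what keeps the added complexity low.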
- Background on H.264
- H.264 employs many normative and non-normative tools and
  features to achieve its superior video compression
  performance [3]. In this section we consider those tools and
  features that make H.264 a good candidate for benefiting
  from the perceptual aspects of the HVS.
- Amongst the normative tools, these new features make a
  considerable contribution to coding performance:
  - Variable block sizes for block prediction: Block sizes from
    16x16 down to 4x4 capture the properties of regions of a
    video frame more efficiently (e.g. smoother areas will be
    encoded with bigger blocks).
  - Smaller (4x4) DCT transform: This feature makes the
    transform coefficients more localized in space, so it is
    easier to judge the visual properties of a region of a
    frame from its DCT coefficients.
  - Quarter-pixel motion estimation: This results in a more
    accurate prediction of translational motion. This feature
    also helps cancel noise added to the reference frame from
    different sources (e.g. quantization noise, inaccurate
    motion estimation of reference frames).
  - In-loop deblocking filter: This helps rid block-based video
    coders of blocking artifacts, which are the main
    perceptual artifacts, especially at low bit rates.
  - Flexible Macroblock Ordering: This feature facilitates
    grouping macroblocks into slices, which can be used either
    for error resiliency or for more efficient video coding.
    This grouping can be based on the perceptual importance of
    MBs (for error resiliency) or on similar coding properties
    (e.g. quantization parameter) of different MBs of the
    coded frame (for coding efficiency).
- Future Work
- Develop application-oriented distortion models, e.g. the
  distortion model can be fine-tuned or enhanced based on the
  video frame size, the display type, and the viewing distance
  (to account for contrast sensitivity).
- Perform optimal rate-distortion mode selection for lossy
  channels based on the perceptual attributes of the coded
  macroblock. Currently, this part of mode selection is based
  on the MSE of the error at a number of different virtual
  decoders on the encoder side.
- Results: Quality Enhancement
- We used our algorithm to encode many sequences from the test
  databases of different video coding standards.
- We found that the quality gain from the perceptual coding
  algorithm depends on
  - Video content: The dominance of the aforementioned
    perceptual features in the coded sequence is a deciding
    factor in the quality enhancement.
  - Target bit rate: At low and mid-range bit rates the quality
    gain is more noticeable than at high bit rates.
- Here we show the results for two scenarios.
- References
  [1] G. J. Sullivan and T. Wiegand, "Rate-distortion
      optimization for video compression," IEEE Signal
      Processing Magazine, vol. 15, no. 6, pp. 74-90,
      Nov. 1998.
  [2] B. Girod, "What's wrong with mean-squared error?" in
      A. B. Watson (ed.), Digital Images and Human Vision,
      MIT Press, Cambridge, MA, 1993, pp. 207-220.
  [3] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and
      A. Luthra, "Overview of the H.264/AVC video coding
      standard," IEEE Transactions on Circuits and Systems for
      Video Technology, vol. 13, no. 7, pp. 560-576,
      July 2003.
- Acknowledgements
- This work is supported by a grant from CalIT2 and the Navy.