Title: User-Oriented Approach in Spatial and Temporal Domain Video Coding
1User-Oriented Approach in Spatial and Temporal
Domain Video Coding
- 2003/12/18
- Chia-Chiang Ho, Wei-Ta Chu, Chen-Hsiu Huang and Ja-Ling Wu
- Communication and Multimedia Laboratory
- Department of Computer Science and Information Engineering
- National Taiwan University
2Introduction
- Video encoding challenge: reducing storage or transmission bandwidth while preserving as much of the perceived quality as possible.
- Typical video encoding schemes treat all parts of the source video as equally important.
- By combining user attention and foveation techniques, we develop both scalable and non-scalable coding schemes that preserve perceived quality as far as possible.
3User Attention Model
- Attention refers to a person's ability to focus and concentrate on some visual or auditory object.
- Attention can be modeled in two directions: bottom-up and top-down.
- Bottom-up attention models what people are attracted to see.
- Top-down attention is usually modeled by detecting meaningful objects or features (it models what people are willing to see).
4Foveation Model
- The retina is responsible for detecting light.
- It contains two kinds of photoreceptor cells: rods and cones. Cones are responsible for daylight vision.
- The density of cone cells is highest at the fovea and drops with increasing eccentricity (the viewing angle).
5Foveation Function
- According to empirical experiments:
  - The larger the viewing distance, the larger the region that can be foveated.
  - The larger the contrast threshold, the larger the region that can be foveated.
- The foveation model is defined as a function of viewing distance (D) and pixel contrast.
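The slides do not give the foveation function itself. As a rough sketch, the Python below assumes a Geisler-Perry style contrast threshold model, CT(f, e) = CT0 * exp(ALPHA * f * (e + E2) / E2), and derives from it the highest spatial frequency still visible at a given distance from the fixation point. The constants, the function name `cutoff_cycles_per_pixel`, and measuring the viewing distance D in image widths are all assumptions made for illustration, not values taken from this work.

```python
import math

CT0 = 1.0 / 64.0  # minimum contrast threshold (assumed value)
ALPHA = 0.106     # spatial frequency decay constant (assumed value)
E2 = 2.3          # half-resolution eccentricity in degrees (assumed value)


def cutoff_cycles_per_pixel(dist_px, view_dist, image_width, ct0=CT0):
    """Highest visible spatial frequency, in cycles per pixel, for a pixel
    dist_px away from the fixation point. view_dist is D in image widths."""
    # Eccentricity: viewing angle between the pixel and the fixation point.
    ecc = math.degrees(math.atan(dist_px / (view_dist * image_width)))
    # Frequency (cycles/degree) at which the threshold reaches full contrast.
    fc_deg = E2 * math.log(1.0 / ct0) / (ALPHA * (ecc + E2))
    # Approximate display resolution in pixels per degree of visual angle.
    pixels_per_degree = math.pi * image_width * view_dist / 180.0
    # The display cannot show more than 0.5 cycles per pixel (Nyquist).
    return min(0.5, fc_deg / pixels_per_degree)
```

Under these assumptions the cutoff drops as a pixel moves away from the fixation point, as D grows, or as the minimum contrast threshold is raised, which matches the trends listed above. Frequencies above the cutoff are candidates for removal.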
6Foveation in Brief
- The foveation model can be regarded as a kind of region-of-interest (ROI) concept.
- For ROI description, object segmentation techniques are widely applied. However, satisfactory results are not easy to obtain.
- The foveation model implicitly relaxes the object boundary restriction, and we think it may be a workable compromise for object-based applications.
7User-Oriented Video Coding
- Based on MPEG-4 FGS, foveation is exploited to perform spatially selective enhancement, and the user attention model is used to facilitate temporal scalability.
8Spatial Domain Approach
- The proposed architecture for the user-oriented
video encoding
9Proposed coding schemes
- Non-scalable Coding
  - With the foveation model, encoders can discard unimportant visual information as much as possible.
  - Thus, the compression gain can be increased without sacrificing perceived quality.
- Scalable Coding
  - Encoders can selectively preserve higher quality for focused regions.
10Scalable Coding
- Foveation-model-based coding is applied on the base layer.
- The difference between the original video and the foveated video is then compressed as enhancement bitstream(s).
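The layer split described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: a 3x3 mean outside a circular focus region stands in for the real foveation filter, frames are plain lists of integer rows, and the names `foveate`, `split_layers`, and `reconstruct` are hypothetical. The actual compression of each layer (MPEG-4 FGS in this work) is not shown.

```python
def foveate(frame, fx, fy, radius):
    """Return a copy of frame (list of rows of ints) in which pixels farther
    than radius from the fixation point (fx, fy) are replaced by a 3x3 mean."""
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for y in range(h):
        for x in range(w):
            if (x - fx) ** 2 + (y - fy) ** 2 <= radius ** 2:
                continue  # the focused region keeps full detail
            neighbours = [frame[j][i]
                          for j in range(max(0, y - 1), min(h, y + 2))
                          for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(neighbours) // len(neighbours)
    return out


def split_layers(frame, fx, fy, radius):
    """Base layer = foveated frame; enhancement layer = original minus base."""
    base = foveate(frame, fx, fy, radius)
    enhancement = [[o - b for o, b in zip(orow, brow)]
                   for orow, brow in zip(frame, base)]
    return base, enhancement


def reconstruct(base, enhancement):
    """Adding the enhancement layer back restores the original frame."""
    return [[b + e for b, e in zip(brow, erow)]
            for brow, erow in zip(base, enhancement)]
```

A decoder that receives only the base layer shows the foveated frame. Adding the enhancement layer recovers the detail outside the focused region.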
11User Attention based Temporal Coding
- According to the user attention model, the saliency value of a video segment is obtained from intensity, color, motion, and face features.
- The segments with small saliency variations should be preserved when transmission bandwidth is insufficient.
12Temporal Domain Approach
- In our work, the saliency value of each video frame is calculated from different features.
- We can construct a saliency curve to illustrate the saliency variation of a video clip.
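How the per-feature values are combined is not stated in the slides. As a minimal sketch, the Python below assumes each frame already has intensity, color, motion, and face scores in [0, 1] and combines them with a weighted average. The equal default weights and the names `frame_saliency` and `saliency_curve` are hypothetical choices for illustration.

```python
FEATURES = ("intensity", "color", "motion", "face")


def frame_saliency(scores, weights=None):
    """Combine per-feature scores (each in [0, 1]) into one saliency value."""
    if weights is None:
        # Equal weights are an assumption, not the authors' setting.
        weights = {name: 1.0 for name in FEATURES}
    total = sum(weights[name] for name in FEATURES)
    return sum(weights[name] * scores[name] for name in FEATURES) / total


def saliency_curve(per_frame_scores):
    """Saliency value for every frame of a clip, in display order."""
    return [frame_saliency(scores) for scores in per_frame_scores]
```

Plotting the returned list against the frame index gives the saliency curve of the clip.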
13Temporal Reduction Steps
- Quantization: quantize the saliency curve into several levels, mainly according to its standard deviation.
- Variance Calculation: the variance of the frames within a window is calculated to form the basis of saliency.
- Scalable Coding: if the variance of a video shot is larger than a predefined threshold, we say that it dazzles users and doesn't possess high semantic meaning. This video segment is then encoded in the enhancement layer because of storage or transmission restrictions.
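The three steps above can be sketched as follows. The details are assumptions made for illustration: the quantization step size is taken to be the standard deviation of the curve, windows are fixed-length and non-overlapping rather than true shots, and the names `quantize`, `windowed_variance`, and `assign_layers` are hypothetical.

```python
import statistics


def quantize(curve):
    """Map each saliency value to an integer level; the step size is the
    standard deviation of the whole curve (an assumed choice)."""
    step = statistics.pstdev(curve)
    if step == 0:
        return [0] * len(curve)
    lowest = min(curve)
    return [int((value - lowest) // step) for value in curve]


def windowed_variance(levels, window):
    """Variance of the quantized levels in each non-overlapping window."""
    return [statistics.pvariance(levels[i:i + window])
            for i in range(0, len(levels), window)]


def assign_layers(curve, window, threshold):
    """Windows whose variance exceeds the threshold go to the enhancement
    layer; the remaining windows stay in the base layer."""
    variances = windowed_variance(quantize(curve), window)
    return ["enhancement" if v > threshold else "base" for v in variances]
```

For a clip whose first window has flat saliency and whose second window alternates between low and high saliency, this sketch keeps the first window in the base layer and moves the second to the enhancement layer.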
14Experimental Results Non-scalable Spatial Coding
Figure panels: Original; D = 1, k = 2; D = 1, k = 6; D = 6, k = 2; D = 6, k = 6.
We increase the minimum contrast threshold by modifying CT0 as CT1(k)CT0kS. D is the viewing distance.
15Non-Scalable Experimental Results
- Bitrate savings of applying foveation filters to
various MPEG-1 encoded sequences
Sequence  | Original bitstream size (bytes) | Original bitstream bitrate (kbps) | Foveated bitstream size (bytes) | Foveated bitstream bitrate (kbps) | Bitrate saving ratio (%)
foreman   | 1329545                         | 831                               | 1230589                         | 769                               | 7.4
mobile    | 3913246                         | 2445                              | 3326153                         | 2078                              | 15.0
butterfly | 392447                          | 721                               | 374541                          | 688                               | 4.5
About 9% bitrate saving on average.
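As a quick check of the table, the saving ratio can be recomputed from the bitstream sizes. The sketch below assumes the ratio is the relative reduction in bitstream size, which the slides do not state explicitly. The function name `bitrate_saving_percent` is hypothetical.

```python
def bitrate_saving_percent(original_bytes, foveated_bytes):
    """Relative size reduction of the foveated bitstream, in percent."""
    return 100.0 * (original_bytes - foveated_bytes) / original_bytes


# (original size, foveated size) in bytes, copied from the table above.
SIZES = {
    "foreman": (1329545, 1230589),
    "mobile": (3913246, 3326153),
    "butterfly": (392447, 374541),
}

savings = {name: bitrate_saving_percent(orig, fov)
           for name, (orig, fov) in SIZES.items()}
average_saving = sum(savings.values()) / len(savings)
```

Under this assumption the recomputed values agree with the tabulated ratios to within 0.1 percentage point, and their mean is close to the quoted average.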
16Experimental Results Scalable Temporal Coding
- In our preliminary experiments, we found that this approach provides satisfactory results for some categories of video.
- For example, in a news video, segments with smooth frames, such as anchorperson scenes and close-up shots, are preserved in the base layer. Other segments with frequent scene changes are encoded in the enhancement layer.
17Conclusion
- We proposed a user-oriented approach that combines user attention and foveation models to facilitate scalable coding in the spatial and temporal domains.
- This framework could be extended to develop a transcoder that selectively transcodes part of a video frame to meet the different requirements of different devices.