User-Oriented Approach in Spatial and Temporal Domain Video Coding

1
User-Oriented Approach in Spatial and Temporal
Domain Video Coding
  • 2003/12/18
  • Chia-Chiang Ho, Wei-Ta Chu, Chen-Hsiu Huang and
    Ja-Ling Wu
  • Communication and Multimedia Laboratory
  • Department of Computer Science and Information
    Engineering
  • National Taiwan University

2
Introduction
  • Video encoding challenge: reducing storage or
    transmission bandwidth while preserving as much
    of the perceived quality as possible.
  • Typical video encoding schemes treat all parts
    of the source video as equally important.
  • By combining user attention and foveation
    techniques, we develop both scalable and
    non-scalable coding schemes that preserve
    perceived quality as far as possible.

3
User Attention Model
  • Attention refers to a person's ability to focus
    and concentrate on a particular visual or
    auditory object.
  • Attention can be modeled from two directions:
    bottom-up and top-down.
  • Bottom-up attention models what visually
    attracts people.
  • Top-down attention is usually modeled by
    detecting meaningful objects or features
    (it models what people want to see).

4
Foveation Model
  • The retina is responsible for detecting light.
  • It contains two kinds of photoreceptor cells:
    rods and cones. Cones are responsible for
    daylight vision.
  • The density of cone cells is highest at the fovea
    and drops with increasing eccentricity (the
    visual angle away from the point of fixation).

5
Foveation Function
  • According to empirical experiments:
      • The larger the viewing distance, the larger the
        region that can be foveated.
      • The larger the contrast threshold, the larger
        the region that can be foveated.
  • The foveation model is defined as a function of
    the viewing distance (D) and the pixel contrast.
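The slides do not give the foveation formula. As a minimal sketch, assuming the Geisler and Perry contrast-threshold model in the form used by Wang and Bovik for foveated image coding, the contrast threshold is CT(f, e) = CT0 · exp(α · f · (e + e2) / e2), and setting CT = 1 gives the highest visible frequency f_c(e) = e2 · ln(1/CT0) / (α · (e + e2)) cycles/degree. The parameter values (α = 0.106, e2 = 2.3°, CT0 = 1/64), the assumption that D is measured in image widths, and the function names are all assumptions made for illustration.

```python
import math

# Assumed Geisler-Perry parameters (not given in the slides).
ALPHA = 0.106      # spatial-frequency decay constant
E2 = 2.3           # half-resolution eccentricity, in degrees
CT0 = 1.0 / 64.0   # minimum contrast threshold


def eccentricity_deg(dist_px: float, width_px: int, view_dist: float) -> float:
    """Visual angle (degrees) between a pixel and the foveation point.

    view_dist is the viewing distance D, assumed to be in image widths.
    """
    return math.degrees(math.atan(dist_px / (width_px * view_dist)))


def normalized_cutoff(dist_px: float, width_px: int, view_dist: float,
                      ct0: float = CT0) -> float:
    """Fraction of the display's Nyquist frequency the eye can still resolve.

    1.0 means no detail may be removed at this pixel; smaller values mean the
    pixel can be low-pass filtered (foveated) more aggressively.
    """
    e = eccentricity_deg(dist_px, width_px, view_dist)
    f_c = E2 * math.log(1.0 / ct0) / (ALPHA * (e + E2))  # cycles/degree
    f_m = math.pi * width_px * view_dist / 360.0           # display Nyquist
    return min(f_c, f_m) / f_m
```

For a 352-pixel-wide frame, a pixel 150 pixels from the foveation point keeps full resolution at D = 1 (the display, not the eye, is the limit), but at D = 6 only about 77% of the display's Nyquist frequency is visible there. Raising ct0 lowers the cutoff further, which reproduces both empirical trends listed above.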

6
Foveation in Brief
  • The foveation model can be regarded as a kind of
    region-of-interest (ROI) concept.
  • Object segmentation techniques are widely applied
    to describe ROIs. However, satisfactory results
    are not easy to obtain.
  • The foveation model implicitly relaxes the need
    for exact object boundaries, so we think it may
    serve as a compromise mechanism for object-based
    applications.

7
User-Oriented Video Coding
  • Based on MPEG-4 FGS (Fine Granularity
    Scalability), foveation is exploited to perform
    spatially selective enhancement, and the user
    attention model is used to facilitate temporal
    scalability.

8
Spatial Domain Approach
  • The proposed architecture for user-oriented
    video encoding

9
Proposed coding schemes
  • Non-scalable coding
      • With the foveation model, encoders can discard
        unimportant visual information as much as
        possible.
      • Thus, the compression gain can be increased
        without sacrificing perceived quality.
  • Scalable coding
      • Encoders can selectively preserve higher
        quality for focused regions.
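The slides do not specify the foveation filter used before encoding. A minimal sketch, assuming a spatially varying box blur whose width grows as the normalized cutoff frequency drops: the helper names and the toy cutoff function below are hypothetical, and in practice the cutoff would come from a foveation model such as the one described under "Foveation Function".

```python
import math
from typing import Callable, List

Image = List[List[float]]


def box_mean(img: Image, x: int, y: int, r: int) -> float:
    """Mean of the (2r+1)x(2r+1) window around (x, y), clipped at borders."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for yy in range(max(0, y - r), min(h, y + r + 1)):
        for xx in range(max(0, x - r), min(w, x + r + 1)):
            total += img[yy][xx]
            count += 1
    return total / count


def foveate(img: Image, fx: int, fy: int,
            cutoff: Callable[[float], float]) -> Image:
    """Hypothetical foveation filter centred on the foveation point (fx, fy).

    cutoff(d) returns the normalized cutoff c (0 < c <= 1) for a pixel d
    pixels away. A box of width about 1/c pixels has its half-power point
    roughly at c times the Nyquist frequency (heuristic choice).
    """
    out: Image = []
    for y, row in enumerate(img):
        new_row = []
        for x in range(len(row)):
            c = cutoff(math.hypot(x - fx, y - fy))
            r = max(0, round((1.0 / c - 1.0) / 2.0))
            new_row.append(box_mean(img, x, y, r) if r else float(row[x]))
        out.append(new_row)
    return out


def toy_cutoff(d: float) -> float:
    """Hypothetical falloff: full resolution at the fovea, half at 4 px."""
    return 1.0 / (1.0 + d / 4.0)


# 9x9 checkerboard: maximal high-frequency content everywhere.
checker = [[float((x + y) % 2) for x in range(9)] for y in range(9)]
fov = foveate(checker, 4, 4, toy_cutoff)
```

The foveation point keeps its original value, while the corners are averaged toward mid-gray; removing that peripheral detail before encoding is what lets the encoder spend fewer bits.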

10
Scalable Coding
  • The foveation model is applied to produce the
    base layer.
  • The difference between the original video and the
    foveated video is then compressed as enhancement
    bitstream(s).
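To illustrate the base/enhancement split, here is a minimal sketch under simplifying assumptions: the foveated samples form the base layer and the residual (original minus foveated) is coded as sign plus magnitude bit-planes, most significant first, so that any prefix of planes is decodable. Real MPEG-4 FGS codes DCT coefficients of the residual in bit-planes; this sketch works directly on pixel values, and the function names are hypothetical.

```python
from typing import List, Tuple


def encode_enhancement(original: List[int],
                       foveated: List[int]) -> Tuple[List[int], List[List[int]]]:
    """Split the residual (original - foveated) into signs and bit-planes.

    Bit-planes are ordered from most to least significant, so the stream can
    be truncated after any plane and still be decoded.
    """
    residual = [o - f for o, f in zip(original, foveated)]
    signs = [1 if r >= 0 else -1 for r in residual]
    mags = [abs(r) for r in residual]
    n_planes = max(mags, default=0).bit_length()
    planes = [[(m >> p) & 1 for m in mags]
              for p in range(n_planes - 1, -1, -1)]
    return signs, planes


def decode(base: List[int], signs: List[int],
           planes: List[List[int]], keep: int) -> List[int]:
    """Add back the residual using only the first `keep` (<= len) planes."""
    n_planes = len(planes)
    out = []
    for j, b in enumerate(base):
        mag = sum(planes[i][j] << (n_planes - 1 - i) for i in range(keep))
        out.append(b + signs[j] * mag)
    return out


original = [120, 64, 30, 200]
base = [118, 70, 30, 190]  # foveated (low-passed) samples = base layer
signs, planes = encode_enhancement(original, base)
```

Decoding with zero planes returns the base layer, and decoding with all planes restores the original exactly; intermediate truncation points trade bitrate for fidelity, which is the property fine-granularity scalability relies on.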

11
User Attention based Temporal Coding
  • According to the user attention model, the
    saliency value of a video segment is obtained from
    intensity, color, motion, and face features.
  • Segments with small saliency variations should be
    preserved when transmission bandwidth is
    insufficient.
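The slides list the features but not how they are fused. A minimal sketch, assuming a weighted linear combination of min-max normalized per-frame feature scores; the weights, the normalization, and the function names are hypothetical choices for illustration.

```python
from typing import Dict, List

# Hypothetical weights; the slides do not specify how features are combined.
WEIGHTS = {"intensity": 0.2, "color": 0.2, "motion": 0.4, "face": 0.2}


def normalize(values: List[float]) -> List[float]:
    """Min-max normalize to [0, 1]; a constant feature contributes 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def saliency_curve(features: Dict[str, List[float]]) -> List[float]:
    """Per-frame saliency as a weighted sum of normalized feature scores."""
    norm = {name: normalize(vals) for name, vals in features.items()}
    n_frames = len(next(iter(norm.values())))
    return [sum(WEIGHTS[name] * norm[name][i] for name in norm)
            for i in range(n_frames)]


frames = {
    "intensity": [0.0, 1.0, 2.0],
    "color": [0.0, 0.0, 0.0],
    "motion": [2.0, 1.0, 0.0],
    "face": [0.0, 1.0, 1.0],
}
curve = saliency_curve(frames)
```

Normalizing each feature first keeps one feature with a large numeric range from dominating the curve; the resulting per-frame values are what a saliency curve over a clip would plot.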

12
Temporal Domain Approach
  • In our work, the saliency value of each video
    frame is calculated from several features.
  • Plotting these values over time gives a saliency
    curve that illustrates the saliency variation of a
    video clip.

13
Temporal Reduction Steps
  • Quantization: quantize the saliency curve into
    several levels, mainly according to its standard
    deviation.
  • Variance calculation: the variance of the
    saliency values of the frames within a window is
    calculated and forms the basis of the saliency
    evaluation.
  • Scalable coding: if the variance of a video shot
    is larger than a predefined threshold, we consider
    that the shot dazzles users and does not carry
    much semantic meaning. Such a segment is then
    encoded in the enhancement layer, so that it can
    be dropped under storage or transmission
    restrictions.
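The three steps above can be sketched as follows, under stated assumptions: the quantization step is a multiple of the curve's standard deviation, the variance is taken over a centered sliding window of quantized levels per frame (rather than per detected shot), and the window size, threshold, and function names are hypothetical.

```python
import statistics
from typing import List


def quantize(curve: List[float], step_scale: float = 1.0) -> List[int]:
    """Map saliency values to integer levels, step = step_scale * std."""
    mean = statistics.fmean(curve)
    step = step_scale * statistics.pstdev(curve)
    if step == 0:
        return [0] * len(curve)
    return [round((s - mean) / step) for s in curve]


def window_variance(levels: List[int], window: int) -> List[float]:
    """Population variance of the levels in a centered window per frame."""
    half = window // 2
    return [statistics.pvariance(levels[max(0, i - half): i + half + 1])
            for i in range(len(levels))]


def assign_layers(curve: List[float], window: int = 5,
                  threshold: float = 0.5,
                  step_scale: float = 1.0) -> List[str]:
    """'base' where saliency is stable, 'enhancement' where it fluctuates."""
    variances = window_variance(quantize(curve, step_scale), window)
    return ["enhancement" if v > threshold else "base" for v in variances]


# Hypothetical clip: a stable shot followed by rapidly changing frames.
curve = [0.5] * 10 + [0.1, 0.9] * 5
layers = assign_layers(curve)
```

With this input the first ten frames land in the base layer and the last ten in the enhancement layer. Quantizing before computing the variance makes small jitter in the raw saliency values irrelevant, so only changes of at least about one standard deviation count as fluctuation.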

14
Experimental Results: Non-scalable Spatial Coding
Figure panels: Original; D = 1, k = 2; D = 1, k = 6; D = 6, k = 2; D = 6, k = 6
We increase the minimum contrast threshold by
replacing CT0 with CT1(k), derived from CT0, k, and
S. D is the viewing distance.

15
Non-Scalable Experimental Results
  • Bitrate savings from applying foveation filters
    to various MPEG-1 encoded sequences

Sequence  | Original size (bytes) | Original bitrate (kbps) | Foveated size (bytes) | Foveated bitrate (kbps) | Bitrate saving (%)
foreman   | 1329545               | 831                     | 1230589               | 769                     | 7.4
mobile    | 3913246               | 2445                    | 3326153               | 2078                    | 15.0
butterfly | 392447                | 721                     | 374541                | 688                     | 4.5
About 9% bitrate saving on average.

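As an arithmetic check, the saving column can be reproduced from the byte counts as (original - foveated) / original × 100. This formula is our reading of the table, not one stated on the slide; it gives about 7.44%, 15.00%, and 4.56%, which match the reported values when truncated to one decimal place, and the three ratios average about 9.0%.

```python
# Bitstream sizes in bytes, copied from the table: (original, foveated).
SIZES = {
    "foreman": (1329545, 1230589),
    "mobile": (3913246, 3326153),
    "butterfly": (392447, 374541),
}


def saving_ratio(original: int, foveated: int) -> float:
    """Percentage of the original bitstream size removed by foveation."""
    return 100.0 * (original - foveated) / original


ratios = {name: saving_ratio(orig, fov) for name, (orig, fov) in SIZES.items()}
average = sum(ratios.values()) / len(ratios)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}%")
print(f"average: {average:.2f}%")
```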
16
Experimental Results: Scalable Temporal Coding
  • In our preliminary experiments, we found that
    this approach provides satisfactory results for
    some categories of video.
  • For example, in a news video, segments with
    smooth frames, such as anchorperson scenes and
    close-up shots, are kept in the base layer. Other
    segments, with frequent scene changes, are encoded
    in the enhancement layer.

17
Conclusion
  • We proposed a user-oriented approach combining
    user attention and foveation models to facilitate
    scalable coding in spatial and temporal domains.
  • This framework could be extended to develop a
    transcoder that selectively transcodes parts of
    a video frame to meet the requirements of
    different devices.