Title: Video coding
1. Video coding
2. Video coding
Types of redundancies:
- Spatial: correlation between neighboring pixel values
- Spectral: correlation between different color planes or spectral bands
- Temporal: correlation between different frames in a video sequence
In video coding, temporal correlation is also exploited, typically using motion compensation (predictive coding based on motion estimation).
3. Video standards review
4. H.261
- For video-conferencing/video phone: low delay (real-time, interactive); slow motion in general
- For transmission over ISDN: fixed bandwidth of p×64 Kbps, p = 1, 2, ..., 30
5. H.261
- Video formats: CIF (352x288, above 128 Kbps) and QCIF (176x144, 64-128 Kbps)
- 4:2:0 color format, progressive scan
- Published in 1990
- Each macroblock can be coded in intra- or inter-mode
- Periodic insertion of intra-mode to eliminate error propagation due to network impairments
6. DCT coefficient quantization
- DC coefficient in intra-mode: uniform quantization
- Others: uniform with deadzone (to avoid coding too many small coefficients, which are typically due to noise)
- MVs coded differentially (DMV)
- DCT coefficients are converted into run-length representations and then coded using VLC (Huffman coding for each pair of symbols)
- - Symbol: (zero run-length, non-zero value range)
- Other information is also coded using VLC (Huffman coding)
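The two steps on this slide, deadzone quantization followed by (zero-run, value) symbol formation, can be sketched as follows. This is an illustration of the idea only; the function names and the simple quantizer are assumptions, not the H.261 specification:

```python
def quantize_deadzone(coef, step):
    """Uniform quantizer with a deadzone around zero: coefficients with
    magnitude below the step size map to 0, suppressing noise-like values."""
    if abs(coef) < step:
        return 0
    return int(coef / step)   # truncating division keeps the deadzone wide

def run_length(coefs):
    """Convert a 1-D (already zigzag-scanned) coefficient list into
    (zero-run, nonzero-value) symbol pairs for subsequent VLC coding."""
    symbols, run = [], 0
    for c in coefs:
        if c == 0:
            run += 1
        else:
            symbols.append((run, c))
            run = 0
    return symbols

print(run_length([5, 0, 0, 3, 0, 1, 0, 0]))   # -> [(0, 5), (2, 3), (1, 1)]
```

Trailing zeros produce no symbol; in the real standard they are signaled by an end-of-block code.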
7. MPEG-1
- Finalized in 1991
- Audio/video on CD-ROM (1.5 Mbps, SIF 352x240, 30 fps); maximum 1.856 Mbps, 768x576 pels
- Progressive frames only
- Prompted an explosion of digital video applications: MPEG-1 video CD and downloadable video over the Internet
- Software-only decoding, made possible by the introduction of Pentium chips, was key to its success in the commercial market
- MPEG-1 Audio: offers 3 coding options (3 layers); higher layers achieve higher coding efficiency with more computation
- - MP3 = MPEG-1 layer 3 audio
8. MPEG-1 vs H.261
- Developed at about the same time
- Must enable random access (fast forward/rewind): uses a GOP structure with periodic I- and P-pictures
- Not for interactive applications: does not have as stringent a delay requirement
- Fixed rate (1.5 Mbps), good quality (VHS equivalent)
- SIF video format (similar to CIF): CIF 352x288, SIF 352x240
- Uses more advanced motion compensation: half-pel accuracy motion estimation, range up to +/-64
- Uses bi-directional temporal prediction, important for handling uncovered regions
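Half-pel motion compensation requires samples at half-integer positions, which are obtained by averaging neighboring integer-pel samples. A minimal sketch of bilinear half-pel interpolation (the helper below is illustrative, not standard code, and assumes interior positions):

```python
def half_pel(frame, x2, y2):
    """Sample a frame at half-pel resolution.
    frame: 2-D list of pixel values; (x2, y2): coordinates in half-pel
    units, so integer pixel (x, y) sits at (2x, 2y)."""
    x, y = x2 // 2, y2 // 2
    fx, fy = x2 % 2, y2 % 2   # 0 = integer position, 1 = half position
    # bilinear average of the (up to four) surrounding integer samples
    s = frame[y][x] + frame[y][x + fx] + frame[y + fy][x] + frame[y + fy][x + fx]
    return s / 4.0

f = [[10, 20, 30],
     [30, 40, 50]]
print(half_pel(f, 2, 0))   # integer position: 20.0
print(half_pel(f, 1, 0))   # horizontal half position between 10 and 20: 15.0
print(half_pel(f, 1, 1))   # center of 10, 20, 30, 40: 25.0
```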
9. MPEG-1 GOP
Encoding order: 1 4 2 3 8 5 6 7
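The encoding order follows from the rule that each B-picture needs both its past and future anchor (I or P) decoded first, so every anchor is moved ahead of the B-pictures that precede it in display order. A small sketch, assuming a display order of I B B P B B B P (the frame types are inferred so as to reproduce the slide's numbers):

```python
def encoding_order(types):
    """Given frame types in display order, return the encoding order
    (1-based frame numbers) for a GOP whose B-pictures reference the
    nearest past and future anchor pictures."""
    order, pending_b = [], []
    for i, t in enumerate(types, start=1):
        if t == 'B':
            pending_b.append(i)       # must wait for the next anchor
        else:                         # I or P: emit it, then its B's
            order.append(i)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)           # any trailing B-pictures
    return order

print(encoding_order(['I', 'B', 'B', 'P', 'B', 'B', 'B', 'P']))
# -> [1, 4, 2, 3, 8, 5, 6, 7]
```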
10. MPEG-1 coder
11. H.263
- Targeted at visual telephony over PSTN or the Internet: enables video phone over regular phone lines (28.8 Kbps) or wireless modem
- Developed later than H.261, so it can accommodate computationally more intensive options
- Initial version (H.263 baseline) 1995; H.263+ 1997; H.263++ 2000
- Result: significantly better quality at lower rates; better video at 18-24 Kbps than H.261 at 64 Kbps
12. H.263
13. (Some of the) H.263 improvements over H.261
- Better motion estimation:
- - Half-pel accuracy motion estimation with a bilinear interpolation filter
- - Larger motion search range [-31.5, 31.5], and unrestricted MVs at boundary blocks
- - More efficient predictive coding for MVs (median prediction using three neighbors)
- - Overlapped block motion compensation (option)
- - Variable block size: 16x16 -> 8x8, 4 MVs per MB (option)
- Bidirectional temporal prediction (PB-picture) (option)
- 3-D VLC for DCT coefficients: (run-length, value, EOB)
- Syntax-based arithmetic coding (option, at 50% more computation)
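The median MV prediction mentioned above takes, for each vector component independently, the median of the three neighboring blocks' motion vectors, and codes only the difference from that predictor. A minimal sketch (function names are illustrative, and neighbor availability rules at picture edges are ignored):

```python
def median_mv_predictor(left, above, above_right):
    """Predict a motion vector as the component-wise median of the
    left, above, and above-right neighbors' MVs (each an (mvx, mvy) pair)."""
    def med3(a, b, c):
        return sorted([a, b, c])[1]
    return (med3(left[0], above[0], above_right[0]),
            med3(left[1], above[1], above_right[1]))

pred = median_mv_predictor((2, -1), (3, 0), (-4, 1))
print(pred)                       # -> (2, 0)
# the encoder then codes only the difference MV for a current MV of (3, 1):
dmv = (3 - pred[0], 1 - pred[1])  # -> (1, 1)
```

The component-wise median makes the predictor robust to a single outlier neighbor, which is why it outperforms the single-neighbor prediction used in H.261-era schemes.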
14. H.263 and beyond
- Aimed particularly at video coding for low bit rates (typically 20-30 Kbps and above)
- Similar to H.261, but with improvements and changes for better performance and error recovery
- Main differences:
- - Half-pixel precision is used for motion compensation
- - Four negotiable options:
- - - Unrestricted motion vectors
- - - Syntax-based arithmetic coding
- - - Advanced prediction
- - - Forward and backward frame prediction (similar to MPEG, called PB-frames)
- - Five resolutions instead of two
- Further improvements in H.263+ and H.264
15. H.263
Example: Miss America sequence

Description        Average PSNR (dB)   Bitrate (Kbit/s)   Compr. Ratio
Original, 30 fps   n/a                 9124               1:1
10 fps, 20 Kbps    29.79               21.83              139.1
10 fps, 100 Kbps   36.0                105.47             29.1
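The raw rate and compression ratios in the table are consistent with a QCIF 4:2:0 source at 30 fps. A quick sanity check (the format parameters are assumptions about the Miss America sequence, not stated on the slide):

```python
# QCIF in 4:2:0 carries 12 bits/pixel of raw data (8 luma + 2 + 2 chroma)
w, h, bits_per_pixel = 176, 144, 12
raw_30fps = w * h * bits_per_pixel * 30 / 1000   # Kbit/s
print(round(raw_30fps))                          # ~9124, the table's raw rate

# at 10 fps the raw rate is one third; coding at 21.83 Kbit/s then gives
raw_10fps = raw_30fps / 3
print(round(raw_10fps / 21.83, 1))               # compression ratio ~139.3
```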
16. MPEG-2
- Finalized in 1994
- Field-interlaced video
- Levels and profiles:
- - Profiles: define bit stream scalability and color space resolutions
- - Levels: define image resolutions and maximum bit-rate per profile
17. MPEG-2
- A/V broadcast (TV, HDTV, terrestrial, cable, satellite, high-speed Inter/Intranet) as well as DVD video
- - 4-8 Mbps for TV quality, 10-15 Mbps for better quality at SDTV resolutions (BT.601)
- - 18-45 Mbps for HDTV applications
- MPEG-2 video Main Profile at High Level is the video coding standard used in HDTV
- Test in 11/91, Committee Draft 11/93
- Consists of various profiles and levels
- Backward compatible with MPEG-1
- MPEG-2 Audio: supports 5.1 channels; MPEG-2 AAC requires 30% fewer bits than MPEG-1 layer 3
18. MPEG-2 vs MPEG-1
- MPEG-1 only handles progressive sequences (SIF)
- MPEG-2 is targeted primarily at interlaced sequences and at higher resolution (BT.601 = 4CIF)
- More sophisticated motion estimation methods (frame/field prediction modes) are developed to improve estimation accuracy for interlaced sequences:
- - Frame motion vectors: one motion vector is generated per MB in each direction, corresponding to a 16x16-pel luminance area
- - Field motion vectors: two motion vectors per MB are generated in each direction, one for each field; each vector corresponds to a 16x8-pel luminance area
- Different DCT modes and scanning methods are developed for interlaced sequences
- MPEG-2 has various scalability modes
- MPEG-2 has various profiles and levels, each combination targeted at a different application
19. MPEG-2 scalability
- Data partitioning: all headers, MVs, and the first few DCT coefficients go in the base layer; can be implemented at the bit stream level; simple
- SNR scalability: the base layer contains coarsely quantized DCT coefficients; the enhancement layer further quantizes the base layer's quantization error; relatively simple
- Spatial scalability: complex
- Temporal scalability: simple
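The SNR scalability scheme can be sketched as two quantization stages applied to the same DCT coefficient; the step sizes below are arbitrary examples, not values from the standard:

```python
def quantize(x, step):
    """Quantize and immediately reconstruct: the value a decoder sees."""
    return round(x / step) * step

coef = 37.0
base = quantize(coef, 16)        # coarse base layer -> 32
enh = quantize(coef - base, 4)   # enhancement layer refines the residual -> 4

# a base-only decoder reconstructs 32; base + enhancement reconstructs 36,
# a smaller error against the original 37 -- hence "SNR" scalability
print(base, base + enh)
```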
20. SNR scalability

21. Spatial scalability

22. Temporal scalability
23. MPEG-2 profiles and levels
- Profiles: tools
- Levels: parameter range for a given profile
- Main Profile at Main Level (MP@ML) is the most popular, used for digital TV
- Main Profile at High Level (MP@HL): HDTV
- 4:2:2 at Main Level (4:2:2@ML) is used for studio production
24. MPEG-4
- New features:
- - Provides technologies to view, access, and manipulate objects rather than pixels
- - The entire scene is decomposed into multiple objects
- - Object segmentation is the most difficult task! But this does not need to be standardized
- Each object is specified by its shape, motion, and texture (color)
- - Shape and texture both change in time (specified by motion)
- - Texture encoding is done with DCT (8x8 pixel blocks) or wavelets
- MPEG-4 assumes the encoder has a segmentation map available; it specifies how to code (actually, decode!) shape, motion, and texture
25. MPEG-4

26. Example of Scene Composition

27. Object-Based Coding

28. MPEG-4
29. MPEG-4
- Coding tools:
- - Shape coding: binary or gray scale
- - Motion compensation: similar to H.263; overlapped mode is supported
- - Texture coding: block-based DCT, and wavelets for static texture
- Types of Video Object Planes (VOPs):
- - I-VOP: encoded independently of any other VOP
- - P-VOP: predicted from a previous VOP using motion compensation
- - B-VOP: bidirectionally interpolated from other I-VOPs or P-VOPs
- - Similar concept to MPEG-2
30. Mesh Animation
- An object can be described by an initial mesh and the MVs of its nodes in the following frames
- MPEG-4 defines coding of mesh geometry, but not mesh generation
31. Body and Face Animation
- MPEG-4 defines a default 3-D body model (including its geometry and possible motion) through body definition parameters (BDP)
- The body can be animated using body animation parameters (BAP)
- Similarly, face definition parameters (FDP) and face animation parameters (FAP) are specified for a face model and its animation
- - E.g., eye blink (FAP 19)
32. Text-to-Speech Synthesis with Face Animation
33. Others
- Sprite:
- - Code a large background at the beginning of the sequence, plus affine mappings that map parts of the background to the displayed scene at different time instances
- - The decoder can vary the mapping to zoom in/out or pan left/right
- Global motion compensation:
- - Uses an 8-parameter projective mapping
- - Effective for sequences with large global motion
- Quarter-pixel motion estimation
- DivX:
- - Based on MPEG-4
- - Can reduce an MPEG-2 video (the same format used for DVD and pay-per-view) to 10 percent of its original size (so that a DVD can be recorded on a CD)
- - Audio is normally coded using MP3
34. MPEG-7
- MPEG-1/2/4 make content available, whereas MPEG-7 allows you to find the content you need!
- A content description standard:
- - Video/images: shape, size, texture, color, movements and positions, etc.
- - Audio: key, mood, tempo, changes, position in sound space, etc.
- Applications:
- - Digital libraries
- - Multimedia directory services
- - Broadcast media selection
- - Editing, etc.
- Examples:
- - Draw an object and find objects with similar characteristics
- - Play a note of music and find similar music
35. MPEG-21
- Aims at standardizing interfaces and tools to facilitate the exchange of multimedia resources across heterogeneous devices, networks, and users
- More specifically, it standardizes the requisite elements for packaging, identifying, adapting, and processing these resources, as well as managing their usage rights
- This framework will benefit the entire consumption chain, from creators and rights holders to service providers and consumers
- The basic unit of transaction in the MPEG-21 Multimedia Framework is the Digital Item, which packages resources along with identifiers, metadata, licenses, and methods that enable interaction with the Digital Item
- Another key concept is the User: any entity that interacts in the MPEG-21 environment or makes use of Digital Items
36. MPEG-21
- MPEG-21 can be seen as providing a framework in which one User interacts with another User, and the object of that interaction is a Digital Item
- Example interactions include content creation, management, protection, archiving, adaptation, delivery, and consumption
37. MPEG-A
- MPEG's Multimedia Application Formats (MAFs) provide a framework for integrating elements from several MPEG standards into a single specification suitable for specific but widely usable applications
- Typically, MAFs specify how to combine metadata with timed media information for presentation in a well-defined format that facilitates interchange, management, editing, and presentation of the media; the presentation may be local to the system or delivered via a network or other stream delivery mechanism
38. MPEG-A
- MAF specifications integrate elements from different MPEG standards into a single specification that is useful for specific but very widely used applications; examples are delivering music, pictures, or home videos
- MAF specifications may use elements from MPEG-1, MPEG-2, MPEG-4, MPEG-7, and MPEG-21; typically, they include:
- - The ISO file format family for storage
- - A simple MPEG-7 tool set for metadata
- - One or more coding profiles for representing the media
- - Tools for encoding metadata in either binary or XML form
39. MPEG-A
- MAFs may specify use of:
- - The MPEG-21 Digital Item Declaration Language for representing the structure of the media and the metadata
- - Other MPEG-21 tools
- - Non-MPEG coding tools (e.g., JPEG) for representation of "non-MPEG" media
- - Elements from non-MPEG standards that are required to achieve full interoperability
40. MPEG-A: 2 examples
- 3on4:
- - MP3 is one of the most widely used MPEG standards; currently, ID3 simply appends basic metadata tags such as Artist, Album, Song Title, etc.
- - MPEG-4 specifies what MPEG expects to be another very successful specification, the MPEG-4 File Format, while MPEG-7 specifies not only signal-derived metadata but also archival metadata such as Artist, Album, and Song Title
- - As such, MPEG-4 and MPEG-7 represent an ideal environment to support the current MP3 music library user experience and, moreover, to extend that experience in new directions
41. MPEG-A: 2 examples
- Jon4:
- - Digital cameras -> libraries with thousands of digital photos
- - Searching for photographs of interest can be difficult -> need for suitable metadata (photo content, e.g., the subject being photographed; author; shoot location; imaging parameters; etc.) stored in a standardized format
- - The EXIF standard (commonly adopted by camera manufacturers) does not support advanced metadata
- MPEG-7 defines rich metadata descriptions for still images and audio, and also provides associated systems tools (file formats, etc.)
- As such, MPEG-7 and the MPEG-4 file format represent an ideal environment to support the current digital photo library user experience
42. Summary (1/2)
- H.261
- - First video coding standard, targeted at video conferencing over ISDN
- - Uses a block-based hybrid coding framework with integer-pel MC
- H.263, H.264
- - Improved quality at lower bit rates, to enable video conferencing/telephony below 54 Kbps (modems or Internet access, desktop conferencing); half-pixel MC
- MPEG-1 video
- - Video on CD and video on the Internet (good quality at 1.5 Mbps)
- - Half-pixel MC and bidirectional MC
- MPEG-2 video
- - TV/HDTV/DVD (4-15 Mbps)
- - Extended from MPEG-1, considering interlaced video
43. Summary (2/2)
- MPEG-4
- - Enables object manipulation and scene composition at the decoder -> interactive TV/virtual reality
- - Object-based video coding: shape coding
- - Coding of synthetic video and audio: animation
- MPEG-7
- - Enables search and browsing of multimedia documents
- - Defines the syntax for describing structural and conceptual content
- MPEG-21: beyond MPEG-7, considering intellectual property protection, etc.
- MPEG-A: integration of elements from different MPEG standards into a single specification that is useful for specific but very widely used applications