Title: Video Coding
1Video Coding
2Introductions
- Inter-frame redundancy
- HVS acts as a low-pass filter
- Very insensitive to high spatial and temporal frequencies
3Introductions
- Conditional Replenishment
- Motion-compensated Coding
- 3-D Transform Coding
- Hierarchical Coding
- MPEG-X
4Conditional Replenishment
5Conditional Replenishment
63-D Transformation
73-D Transformation
8Motion Compensated Coding
9Motion Compensated Coding
10Motion Compensated Coding
11Motion Compensated Coding
12Motion Compensated Coding
13Motion Compensated Coding
- Hierarchical search - Zenith
14Motion Compensated Coding
- 0.125 bpp with SNR = 35.8 dB
15MPEG-X
- MPEG-1 (ISO/IEC 11172, Nov 92)
- Compression standard for progressive frame-based video in SIF (360×240), targeted at 1.5 Mbits/s
- 1.2 Mbits/s for video, 250 Kbits/s for audio
- Applications: VCD, MP3
- MPEG-2 (ISO/IEC 13818, Nov 94)
- Compression standard for interlaced frame-based video in CCIR-601 (720×480) and high definition format (1920×1088), wide range of bit rates: 4 to 80 Mbits/s
- Optimized around 4 Mbits/s
- Applications: DVD, HDTV Studio, etc.
16MPEG-X
- MPEG-4 (ISO/IEC 14496, Oct 98)
- Multimedia standard for object-based video from natural or synthetic sources
- Coding for various bandwidths (5 Kbps to 270 Mbps)
- Applications: Internet, cable TV, 3G wireless communication, etc.
- MPEG-7 (ongoing)
- Multimedia content description interface
- Applications: Internet, video search engine, digital library
- Specify only bitstream syntax and decoding
17Why do companies work on standards?
- Interoperability
- War of formats
- VHS vs. Beta, DIVX vs. DVD
- Patent royalties
- Licensing fee for an MPEG-2 box is US$4 from MPEG LA
- Total licensing fee for DVD is around US$10
- Big companies can avoid being taxed by other companies
- 250 million per year for RCA patent profiles
- Create new markets
- VCD: Video Compact Disk
- DVD: Digital Versatile Disk
- DBS: Direct Broadcast System
- HDTV: Grand Alliance in the US, DVB in Europe
18MPEG-1
- Requirements (Part)
- Coding of generic video with good quality (about VHS video) at 1 to 1.5 Mbits/s
- Random access to a frame in limited time
- Frequent access points
- Fast forward/reverse
- Seek and play in FF/FR using access points
- System supporting audio-visual synchronized play/access
- A practical/implementable decoder
- Etc.
19MPEG-1
- New Features (w.r.t H.261)
- Flexible picture sizes, picture rates, etc.
- Picture size up to 4096×4096 supported
- Normally at 360×240
- Picture rates: 23.976, 24 (movies), 25 (PAL), 29.97, 30, 50, 59.94, 60
- Bi-directional motion compensation
- Half-pel motion compensation
- VLC for MV difference
- Etc.
20MPEG-1
- Motion compensation
- Same as H.263
- Half-pel resolution for motion vectors
- Differential coding of motion vectors
- Motion compensation on 16×16 luminance blocks
- Motion vectors divided by 2 for chrominance
- Different from H.263
- VLC for MV difference
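
The differential coding of motion vectors listed above can be sketched as follows. This is only an illustration, not the MPEG-1 bitstream syntax: the function names are hypothetical, vectors are given in half-pel units, and the predictor is assumed to start at (0, 0) and to follow the previous vector. Neighbouring vectors tend to be similar, so the differences are small, which is what lets a VLC table spend short codewords on them.

```python
def encode_mv_differences(mvs):
    """Turn half-pel motion vectors into differences from the previous vector."""
    prev_x, prev_y = 0, 0  # assumed predictor start value
    diffs = []
    for x, y in mvs:
        diffs.append((x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return diffs


def decode_mv_differences(diffs):
    """Inverse operation: accumulate the differences back into vectors."""
    x, y = 0, 0
    mvs = []
    for dx, dy in diffs:
        x, y = x + dx, y + dy
        mvs.append((x, y))
    return mvs


# (3, -1) in half-pel units is a displacement of 1.5 pels and -0.5 pel.
vectors = [(2, 0), (3, -1), (3, -1), (5, 2)]
differences = encode_mv_differences(vectors)
print(differences)
```

Decoding the differences with `decode_mv_differences` returns the original vectors, so no information is lost by sending differences instead of absolute values.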
21MPEG-2
- Requirements (Part)
- ITU-R 601 interlaced video with high quality at 4 to 9 Mbits/s
- Scalable video coding for multi-quality video applications
- Maximum interoperability/compatibility with MPEG-1
- Support coding of non-interlaced and interlaced formats of many frame rates
- Support video formats of various aspect ratios
- Etc.
22MPEG-2
- New features
- Allow 4:2:2 and 4:4:4 formats for chrominance
- Frame-pictures and field pictures
- Frame/field/dual-prime adaptive motion compensation
- New VLC table for DCT coefficients
- Nonlinear quantization table
- Slices always start and end in the same row of macroblocks
- Motion vectors always coded in half-pel
- Etc.
23MPEG-2
24MPEG-2
25MPEG-2
- Coding of interlaced video
- Frame-pictures or field pictures
- Motion compensation
- Frame prediction for frame-pictures: frame motion vectors
- Same as MPEG-1
- Field prediction for field pictures: field motion vectors
- Field prediction for frame-pictures
- Prediction from either field of the previous frame
- Good for fast motion
- Dual-prime: two predictions are formed for each field from the two recent fields. They are then averaged for final prediction
- Field-pictures or frame-pictures
- Only for P-pictures
- 16×8 MC for field pictures: top/bottom halves of each macroblock
26MPEG-1/2
- Data structure: M = 3, N = 15 (tradeoff on M)
27MPEG-1/2
- I-frames
- No temporal redundancy reduction
- Has the highest bit count
- For random access, FF, REW features
- P-frames
- Forward motion-compensated prediction
- B-frames
- Both forward and backward motion-compensated prediction
- Usually results in the lowest bit count
- Increases delay
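
To make the I/P/B structure concrete, here is a small sketch for M = 3 (distance between anchor pictures) and N = 15 (pictures per group). The function names are hypothetical, and the group is assumed to start with its I-picture in display order. The second function shows why B-frames add delay: each I or P anchor must be sent before the B-pictures that are displayed ahead of it.

```python
def gop_display_order(n=15, m=3):
    """Picture types of one group of pictures, in display order."""
    return ["I" if i == 0 else "P" if i % m == 0 else "B" for i in range(n)]


def gop_coding_order(n=15, m=3):
    """Display indices in transmission order.

    Returns (order, waiting): 'waiting' holds trailing B-pictures that
    cannot be sent until the next group's I-picture is available.
    """
    order, pending_b = [], []
    for i, picture_type in enumerate(gop_display_order(n, m)):
        if picture_type == "B":
            pending_b.append(i)      # needs a future anchor first
        else:
            order.append(i)          # send the anchor...
            order.extend(pending_b)  # ...then the B-pictures it closes
            pending_b = []
    return order, pending_b


print("".join(gop_display_order()))
order, waiting = gop_coding_order()
print(order, waiting)
```

A larger M places more B-pictures between anchors, which tends to lower the bit count but lengthens the reordering delay and buffer; that is the tradeoff on M.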
28MPEG-1/2
- I-frame
- JPEG DCT like
- Highest data rate
- Random access
- FF/FR
29MPEG-1/2
- P-frame
- Motion-compensated coding
- Predicted from the previous I or P frame
- Prediction error is then coded and transmitted
30MPEG-1/2
- B-frame
- delay
- buffer
- Lowest data rate
- Higher coding efficiency
- No error propagation
31MPEG-1/2
intraframe processing
Variable Length Encoder
Buffer Control Strategy
Predictive frame processing
interpolative frame processing
32MPEG-1/2 Results
Bit rate (Mbits/s)          | SIF-30 (CVGA) | CCIR 601 29.97 FPS (VGA) | HDTV 29.97 FPS | HDTV 60 FPS (SVGA)
Pels                        | 352           | 720                      | 1920           | 1280
Lines                       | 240           | 480                      | 1080           | 720
Original bit rates (Mbps)   | 30.4          | 121.5                    | 745.7          | 663.6
1.1 Mbps Good Poor
4.0 Mbps Excellent Good
9.0 Mbps Excellent Excellent
18.0 Mbps Excellent Good Good
28.0 Mbps Excellent Excellent
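
The "original bit rates" in the table can be reproduced with a short calculation. The sketch below assumes 12 bits per pixel (8-bit 4:2:0 sampling), which the slide does not state; the function name and the format labels are illustrative. Under that assumption the SIF and HDTV figures match exactly, while the 121.5 Mbps figure for CCIR 601 corresponds to 704 pels per line rather than the 720 listed.

```python
def raw_bit_rate_mbps(pels, lines, fps, bits_per_pixel=12):
    """Uncompressed bit rate in Mbit/s (1 Mbit = 10**6 bits)."""
    return pels * lines * fps * bits_per_pixel / 1e6


formats = {
    "SIF-30": (352, 240, 30.0),
    "CCIR 601 (720 pels)": (720, 480, 29.97),
    "CCIR 601 (704 pels)": (704, 480, 29.97),
    "HDTV 1920x1080": (1920, 1080, 29.97),
    "HDTV 1280x720": (1280, 720, 60.0),
}

for name, (pels, lines, fps) in formats.items():
    rate = raw_bit_rate_mbps(pels, lines, fps)
    print(f"{name:22s} {rate:7.1f} Mbit/s")
```

Dividing these raw rates by a coded rate from the table gives the compression ratio, for example roughly 30:1 for CCIR 601 at 4.0 Mbps.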
33MPEG-1 Demo
- Original: Y (720×480), UV (360×480)
- Acer
- original 20 60 96
- Bike
- original 33 49 66 81 97
- Foot
- original 33 50 66 108 109
- Table Tennis
- Original 40 49 98 147 150
34MPEG-2 Demo
35TV, Telecom & Computer Convergence
- Past
- Television: RCA, TCI, GI etc.
- Telecommunications: AT&T, Hughes etc.
- Computer: IBM, Microsoft etc.
- Now
- Television
- Hughes + Thomson (RCA) + Sarnoff -> DIRECTV
- Microsoft + Philips -> WebTV
- Telecommunications
- AT&T + TCI + @Home -> Local Telephone Service + Internet
- Computer
- Windows CE + TV set-top box -> Venus
36MPEG-4
- What existing standards can do
- MPEG-1: Frame-based non-interlaced video (1.5 Mbps)
- MPEG-2: Frame-based interlaced video (4 Mbps to 270 Mbps)
- H.261: Low bit rate video conference (64×p Kbps)
- H.263: Very low bit rate video conference (10 Kbps)
- What the existing standards cannot do
- Coding of video objects with content information (metadata)
- Coding of images for progressive transmission
- Coding of multimedia information for various bandwidths and media (5 Kbps to 270 Mbps)
- Interactive
37What applications are relevant to us?
38Internet Image Retrieval: JPEG vs. EZW
After 1 second
After 4 seconds
After 8 seconds
39Multiresolution Feature Search Using Wavelet
40Internet Commerce Using Metadata
41Telemedicine Using Wavelet Compression
42Consumer Videophone - Modes & Applications
Family / Home
Stand-alone wired videophone
PC-based videophone
POTS/ISDN
LAN
Good spatio-temporal quality
Low end-to-end delay
Channel error resilience
LAN ISDN
Virtual classroom Virtual meeting
43Virtual Set Example
44MPEG-4 Virtual Set Compositing
45MPEG-4 Virtual Set
46MPEG-4 Key Functionalities
Compression
Content-based interactivity
Universal access
47MPEG-4 Key Functionalities
- Content-based interactivity
- A scene is composed of audio-visual objects
- Not just pixels or moving blocks
- Objects can be of different nature
- Text or images
- Rectangular or arbitrary shape
- 2D or 3D objects
- Natural or synthetic
- Different coding schemes applied to different objects
- Composer puts objects back in a scene
48MPEG-4 Key Functionalities
- Universal access
- Robustness in error-prone environments
- Allow applications over wired and wireless networks
- Robust for severe error conditions, e.g. long error bursts
- Content-based scalability
- Allowing scalability in content, quality and complexity
- Achieving content-based scaling of visual information
49MPEG-4 Key Functionalities
- Compression
- Improved coding efficiency
- MPEG-4 video shall provide subjectively better visual quality at comparable bit rates compared to existing or emerging standards
- 5-64 Kbps for mobile applications
- Up to 20 Mbps for TV/film applications
- Coding of multiple concurrent data streams
- Can code multiple views of a scene efficiently, e.g. stereo video
50MPEG-4 AV Objects
- Audiovisual scene is composed of objects (AV)
- Compositor puts objects in scene (AV, 2D/3D)
- Objects can be of different nature
- Natural or synthetic AV, text & graphics, animated faces, arbitrary shapes or rectangular
- Encoding the object independently
- Coding scheme can differ for individual objects
- From low bitrates to (virtually) lossless quality
51Object Functionalities in MPEG-4
52Video object planes (VOPs)
- Image = different objects + text + background (VOPs)
- Single VOP: backward compatible to MPEG-1/2
- Composition or segmentation
- Note: segmentation is outside the scope of MPEG-4
- Characteristics of VOP
- Separate object coding
- Separate object manipulation
- May have different spatial and temporal resolutions
- May be associated with different degrees of accessibility: sub-VOPs
- May be separated or overlapping
53Separated and overlapping VOP
54Content-based object manipulation
- Change of the spatial position of a VOP
- Application of a spatial scaling factor to a VOP
- Change of the speed with which a VOP moves
- Insertion of new VOPs
- Deletion of a video object (VO) in the scene
- Successive VOPs belonging to the same physical object in a scene are referred to as video objects (VO)
- Change of the scene area
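
The manipulations listed above become simple once each object carries its own shape and texture. The sketch below is a toy compositor, not the MPEG-4 composition process: the dictionary layout (`left`, `top`, `alpha`, `texture`), the function name, and the binary alpha are all assumptions made for illustration.

```python
def compose(width, height, vops, background=0):
    """Paint VOPs onto a frame; later VOPs in the list appear on top."""
    frame = [[background] * width for _ in range(height)]
    for vop in vops:
        for y, alpha_row in enumerate(vop["alpha"]):
            for x, opaque in enumerate(alpha_row):
                fx, fy = vop["left"] + x, vop["top"] + y
                if opaque and 0 <= fx < width and 0 <= fy < height:
                    frame[fy][fx] = vop["texture"][y][x]
    return frame


# Two hypothetical objects with binary shape (alpha) and constant texture.
ball = {"left": 1, "top": 0, "alpha": [[0, 1], [1, 1]], "texture": [[9, 9], [9, 9]]}
logo = {"left": 0, "top": 1, "alpha": [[1, 1]], "texture": [[5, 5]]}

original = compose(4, 2, [ball, logo])
moved = compose(4, 2, [dict(ball, left=2), logo])  # change spatial position
without_logo = compose(4, 2, [ball])               # delete a video object
print(original, moved, without_logo, sep="\n")
```

Because each object is coded separately, moving or deleting one only changes how the scene is composed; the other object's data is untouched.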
55Example of bit stream manipulation (1)
56Example of bit stream manipulation (2)
57Segmentation process
- Depending on applications, segmentation can be performed
- Online (real-time) or offline (non-real-time)
- Automatic or semi-automatic
- Examples
- Video conferencing
- Real-time, automatic
- Separate foreground (communication partner) from background
- Object tracking in video
- May allow off-line and semi-automatic
- Separate moving object from others
58Video object plane formation
59Demo
- Segmentation
- Bit stream manipulation
60Video object-based coding
- Each video object in a scene is coded and transmitted separately
61Data structure in visual part of MPEG-4
Visual object Sequence (VS)
Video Object (VO)
Video Object Layer (VOL) Still Object Layer (SOL)
SOL0
Group of Video Object Plane (GOV)
Video Object Plane (VOP)
62MPEG-4 Video Decoder
63MPEG-4 Video Decoder
64MPEG-4 Video Decoder
- Scene description is necessary.
- A language called the Binary Format for Scenes (BIFS), based on the Virtual Reality Modeling Language (VRML), has been developed by MPEG for scene description.
- The decoder can use the scene description and additional input from the user to combine or compose the objects to reconstruct the original scene or create a variation on it.
65MPEG-4 standard video tools
66MPEG-4 standard video tools
- The glue that will bind these tools together is the MPEG-4 systems description language (MSDL), which will have several components, including
- Definitions for the interfaces between the coding tools,
- A mechanism to combine coding tools,
- A mechanism to download new tools.
- The MSDL will transmit to the decoder the bit stream and the manner in which the tools have to be used at the decoder to reconstruct the audio and video.
67Shape coding tool
- Every VOP is coded macroblock by macroblock
- The bounding rectangle of the VOP is extended on
the right-bottom side to multiples of 16x16
blocks (macroblock).
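
The extension of the bounding rectangle amounts to rounding its width and height up to the next multiple of 16 while keeping the top-left corner fixed. A minimal sketch, with a hypothetical function name and an (left, top, width, height) convention chosen for illustration:

```python
def extend_to_macroblocks(left, top, width, height, mb=16):
    """Grow a bounding box on its right and bottom sides to multiples of mb."""
    ext_width = -(-width // mb) * mb    # ceiling division, then scale back
    ext_height = -(-height // mb) * mb
    return left, top, ext_width, ext_height


box = extend_to_macroblocks(40, 24, 50, 35)
macroblocks = (box[2] // 16) * (box[3] // 16)
print(box, macroblocks)
```

A 50×35 shape therefore occupies a 64×48 rectangle, i.e. 4×3 = 12 macroblocks, some of which may lie entirely outside the object.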
68Description of VOP
69Shape Coding
- Binary alpha planes (shape information) are encoded by context-based arithmetic encoding (CAE).
- Gray scale alpha planes are encoded by motion compensated DCT similar to texture coding.
- An alpha plane is bounded by a rectangle that includes the shape of a VOP.
- Intra (I-VOPs and P-VOPs) or inter (P-VOPs and B-VOPs) shape coding at macroblock level
- Inter: motion compensated shape
70Shape Coding (cont.)
- Motion vectors from texture motions or shape motion of neighboring blocks
- Coding modes
- Opaque
- Transparent
- No-update
- Intra Context based Arithmetic Encoding
- Inter Context based Arithmetic Encoding
- Lossless
- Lossy
- Motion compensation without update
- Sub-sampling by factor 2 or 4
71Shape Coding Tools - CAE
- Context-based Arithmetic Encoding (CAE) of the pixel ?
- Intra
- Inter
72Shape Coding Tools - CAE
- Compute a context number.
- Index a probability table using the context number
- Use the indexed probability to drive an arithmetic encoder.
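
The three steps above can be illustrated with a simplified model. Several things here are assumptions and not the standard's definition: the 10-pixel neighbour layout in `TEMPLATE` and its bit order, the treatment of pixels outside the plane as 0, and the use of an adaptive count table in place of fixed probability tables. The arithmetic encoder itself is replaced by its ideal code length, -log2(p), so the sketch estimates the bit cost rather than producing a bitstream.

```python
import math

# Assumed intra template: (dx, dy) offsets of already coded neighbours.
TEMPLATE = [(-1, 0), (-2, 0),
            (2, -1), (1, -1), (0, -1), (-1, -1), (-2, -1),
            (1, -2), (0, -2), (-1, -2)]


def context_number(alpha, x, y):
    """Step 1: pack the neighbouring binary pixels into one integer."""
    ctx = 0
    for k, (dx, dy) in enumerate(TEMPLATE):
        xx, yy = x + dx, y + dy
        inside = 0 <= yy < len(alpha) and 0 <= xx < len(alpha[0])
        ctx |= (alpha[yy][xx] if inside else 0) << k
    return ctx


def estimate_bits(alpha):
    """Steps 2 and 3: look up a probability per context and sum -log2(p)."""
    counts = {}  # context number -> [count of 0s, count of 1s], hypothetical
    bits = 0.0
    for y, row in enumerate(alpha):
        for x, pixel in enumerate(row):
            c = counts.setdefault(context_number(alpha, x, y), [1, 1])
            bits -= math.log2(c[pixel] / (c[0] + c[1]))
            c[pixel] += 1  # adapt the table after coding the pixel
    return bits


empty = [[0] * 8 for _ in range(8)]
print(estimate_bits(empty))
```

A uniform 8×8 plane costs only about 6 bits in this model instead of 64, because every pixel shares one context whose probability quickly becomes very skewed.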
73Motion Compensated DCT
- A hybrid coding scheme used in H.261, H.263, MPEG-1 and MPEG-2
- Reduces the temporal/spatial correlation of video objects in two steps
- Temporal: by motion compensation
- Spatial: by Discrete Cosine Transform (DCT) coding.
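
The spatial step can be illustrated with a direct (unoptimized) 2-D DCT. This sketch uses the orthonormal DCT-II definition; the function name is hypothetical, and real codecs use fast integer approximations rather than this quadruple loop. It shows the energy compaction that makes the transform useful: a flat 8×8 block collapses into a single DC coefficient.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency
        for v in range(n):      # horizontal frequency
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out


flat = [[10] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))
```

In a hybrid coder the input block would be the prediction error left after motion compensation (or the pixels themselves for intra blocks), and the coefficients would then be quantized and entropy coded.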
74Block Based Motion Compensation
- Models translational motion of a block between frames with a motion vector.
- Motion compensation is performed block by block.
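
A minimal full-search block matcher illustrates how one motion vector per block is found. The sum of absolute differences (SAD) criterion, the 4×4 block size, and the ±2 search range are choices made here for brevity; the standards discussed in this lecture use 16×16 macroblocks and leave the search strategy to the encoder. All names are illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block and a displaced reference block."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n) for x in range(n)
    )


def full_search(cur, ref, bx, by, n=4, search=2):
    """Try every displacement in the window and keep the lowest-SAD one."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + n <= width
                      and 0 <= by + dy and by + dy + n <= height)
            if inside:
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if cost < best_cost:
                    best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Reference frame, and a current frame whose content moved 2 right, 1 down.
ref = [[x + 16 * y for x in range(12)] for y in range(12)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(12)]
       for y in range(12)]

mv, cost = full_search(cur, ref, bx=4, by=4)
print(mv, cost)
```

The vector points from the current block back to where its content sits in the reference frame, so content that moved right and down yields a negative vector and a zero residual.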
75Motion Compensation Tools
Motion compensated coding modes (I, B, P)
76Motion Compensation Tools - Motion Computation
77Motion Compensation Tools - Padding
Process of normal padding of a block
Process of padding of a VOP
Process of extended padding of a block
78Padding
79Motion estimation & compensation
80Block-based compatibility for VOP
81Texture Coding Tools (1/3)
- The intra VOPs, as well as residual errors after motion compensated prediction, are coded using DCT on 8×8 blocks, in a manner similar to that employed in MPEG-1, MPEG-2, H.261, and H.263.
- Backward compatible to MPEG-1 and MPEG-2
- Efficient prediction of DC and AC coefficients for intra and inter-coded blocks can also be employed (this approach is not available in MPEG-1 and MPEG-2).
82Texture Coding Tools (2/3)
83Texture Coding Tools (3/3)
84Adaptive DC prediction (texture coding)
Block (8x8)
85Adaptive AC prediction (texture coding)
86Coefficients Scanning (texture coding)
Alternate-Horizontal scan
Alternate-Vertical scan
zig-zag scan (H.263/MPEG-1)
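
For reference, the classic zig-zag order can be generated by walking the anti-diagonals of the block and alternating direction. This sketch covers only the zig-zag scan; the alternate-horizontal and alternate-vertical orders are defined by tables in the standard and are not reproduced here. The function name is hypothetical and positions are (row, column) pairs.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + column indexes one anti-diagonal
        diagonal = [(row, s - row) for row in range(n) if 0 <= s - row < n]
        # Even diagonals are walked bottom-left to top-right, odd ones the reverse.
        order.extend(diagonal[::-1] if s % 2 == 0 else diagonal)
    return order


scan = zigzag_order()
print(scan[:6])
```

Scanning in this order places the low-frequency coefficients first, so the quantized block typically ends in a long run of zeros that is cheap to code.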
87Quantization (texture coding)
- Method 1: Similar to that of H.263
- Method 2: Similar to that of MPEG-2
- Optimized non-linear quantization of DC coefficients
- Quantization matrices and loading mechanism
88Scalability
- Object scalability
- Achieved by the data structure used and the shape coding
- Temporal scalability
- Achieved by generalized scalability mechanism
- Spatial scalability
- Achieved by generalized scalability mechanism
89Object scalability
90Temporal scalability
91Spatial scalability
92Static Sprite Coding Tools (1/5)
- A sprite is an image composed of pixels belonging to a video object visible throughout a video segment.
- For instance, a sprite generated from a panning sequence will contain all the visible pixels of the background object throughout the sequence.
- Portions of this background may not be visible in certain frames due to the occlusion of the foreground objects or the camera motion.
- Thus, the sprite contains all parts of the background that were visible at least once.
93Static Sprite Coding Tools (2/5)
- The sprite encoding syntax can be utilized for the transmission of any still image to the decoder since a sprite is essentially just a still image.
- Static sprites are those that are directly copied (including appropriate warping and cropping) to generate a particular rendition of the sprite at a particular time instant.
- Sprite: the panoramic view of the background.
- Improves the coding efficiency for video sequences whose backgrounds are revisited often.
94Static Sprite Coding Tools (3/5)
The main idea of static sprite coding technique
is to generate the reconstructed VOPs by directly
warping the quantized sprite using specified
motion parameters. Residual error between the
original VOP and the warped sprite is not added
to the warped sprite.
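
In the simplest case, a camera pan, the warp reduces to a translation, and reconstructing a VOP is just cropping a window out of the sprite. The sketch below makes that assumption explicitly; general warping with more motion parameters is not shown, and the function name and the (dx, dy) parameters are illustrative.

```python
def vop_from_sprite(sprite, dx, dy, width, height):
    """Cut the window at offset (dx, dy) out of the sprite; no residual is added."""
    if dx < 0 or dy < 0 or dy + height > len(sprite) or dx + width > len(sprite[0]):
        raise ValueError("window falls outside the sprite")
    return [row[dx:dx + width] for row in sprite[dy:dy + height]]


# A 3-line, 10-pel wide background sprite and three frames of a pan to the right.
sprite = [[10 * y + x for x in range(10)] for y in range(3)]
frames = [vop_from_sprite(sprite, dx, 0, 4, 3) for dx in (0, 3, 6)]
print(frames[1][0], frames[2][2])
```

The sprite is sent once; after that each frame of the background costs only its motion parameters, which is where the coding gain for revisited backgrounds comes from.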
95Static Sprite Coding Tools (4/5)
- Basic sprite (a large static image) coding
- Low latency sprite coding (sent hierarchically)
- Scalable sprite coding
96Static Sprite Coding Tools (5/5)
Sprite
Foreground Object
Decoded Frame
97Wavelet Tool (1/3)
- Discrete Wavelet Transform (DWT)
- Still image coding mode
- Separate DC band Coding
- Zero-Tree Scanning (ZTS) and Multiscale Zero-Tree
Entropy (MZTE) coding
98Wavelet Tool ZTS (2/3)
- A general architecture for zerotree coding.
- Provides tradeoffs between scalability, complexity and efficiency.
Zero-Tree Scanning (ZTS)
99Wavelet Tool Multiscale ZTE(3/3)
100Arbitrary Shaped Wavelet Tool
- Shape-adaptive wavelet: a more general case of rectangular wavelet.
- Zerotree coding encodes only the interior nodes
- Downsampling the original shape to obtain the shapes in different resolutions.
- Using a shape coding scheme to include the shape information.
101Shape adaptive wavelet coding - SNR Scalability
bitstream
30kbits
8kbits
5kbits
102Shape adaptive wavelet coding - Spatial Scalability
10312-Bit Video Coding Tool
- Allows compression of video data with precision of up to 12 bits/pixel
- The syntax, semantics, and coding tools are extended
- Bit-precision
- Extended DC VLC tables
- Extended quantization mechanism
- Insertion of marker bits to avoid start code emulations
104MPEG-4 demo
- Coastguard
- Original 128 Kbits/sec
- Foreman
- Original 128 Kbits/sec
- Hall_monitor
- Original 128 Kbits/sec
105Summary (1/2)
- MPEG-4
- The first content-based standard, addressing multimedia.
- Object-based representation of a scene.
- Both natural and synthetic.
- Compression + many other features.
- Normative decoder (i.e. bitstream syntax and decoding algorithm).
106Summary (2/2)
- MPEG-4 Visual
- Tool box approach, i.e. consists of many tools.
- One tool, one functionality.
- Set of tools -> Object.
- Set of objects -> Combination Profile.
- Conformance points on combination profiles.
107SNHC Tools
- SNHC: Synthetic/Natural Hybrid Coding
- An MPEG-4 subgroup working on the synthetic tools.
- Tools for Version 1
- Face Animation
- Dynamic 2D Meshes
- Scalable Textures
108SNHC Tool Face Animation
- Face: an object capable of facial geometry ready for rendering and animation
- A synthetic representation of a human face
- Visual manifestations of speech are intelligible
- Facial expressions allow recognition of moods
- Specified by the parameters in the incoming bitstream
109SNHC Tool Face Animation
Defines a specific face via
- 3D feature points
- 3D mesh/scene graph
- Face texture
- Face animation table
110SNHC Tool Dynamic Meshes
- Specifically refers to triangular meshes
- Tessellation of a 2D visual object plane into a connection of triangular patches
- No addition and deletion of nodes, i.e. no change in topology.
- Used for video object manipulation
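
Because the topology is fixed, animating a mesh only requires moving its nodes; every pixel inside a triangular patch follows through the affine map defined by the patch's three nodes. The sketch below expresses that map with barycentric coordinates. The function names and the example triangles are hypothetical.

```python
def barycentric(p, a, b, c):
    """Weights (wa, wb, wc) such that p = wa*a + wb*b + wc*c and wa + wb + wc = 1."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


def warp_point(p, src_tri, dst_tri):
    """Map a point of the source triangle to the moved (destination) triangle."""
    weights = barycentric(p, *src_tri)
    x = sum(w * node[0] for w, node in zip(weights, dst_tri))
    y = sum(w * node[1] for w, node in zip(weights, dst_tri))
    return x, y


src = ((0, 0), (4, 0), (0, 4))
shifted = ((1, 2), (5, 2), (1, 6))    # all three nodes moved by (1, 2)
stretched = ((0, 0), (6, 1), (0, 4))  # only the second node moved

print(warp_point((1, 1), src, shifted))
print(warp_point((4, 0), src, stretched))
```

Warping the texture of each patch this way lets a decoder deform or replace a video object by transmitting only node motion vectors.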
112SNHC Tool Dynamic Meshes
113SNHC Tool Dynamic Meshes
114Conclusions
- Refer to exercise for further information
- Other search strategies
- Motion JPEG (MJPEG)
- DV
- Already used for high quality video coding
- Motion JPEG2000 (MJPEG2000)
- JPEG 2000 for video
- Collaborative or Competitive ?
- Compression ratio may be higher than MPEG-1/2
- Symmetric algorithm, however
- Part III of JPEG2000 standard
- original 40 53 67 80 100
115Conclusions
- Standard is usually not the best
- Demo
- Avxing
- Windows Media Video
116Acknowledgement
- I would like to thank Prof. Tihao Chiang
of National Chiao-Tung University and his
ex-colleagues Dr. Ya-Qin Zhang, Iraj Sodagar, and
Sriram Sethuraman. Professor Chiang generously
gave me his transparency master for MPEG-4
tutorial and this helped me very much in
preparing this lecture.