Title:

Video Coding

Provided by: MsWi4

1
Video Coding

2
Introductions

Inter-frame redundancy
The HVS (human visual system) acts as a low-pass filter: it is very insensitive to high spatial and temporal frequencies

3
Introductions

Conditional Replenishment
Motion-compensated Coding
3-D Transform Coding
Hierarchical Coding
MPEG-X

4
Conditional Replenishment
5
Conditional Replenishment

1 bpp with SNR = 31.02 dB

6
3-D Transformation
7
3-D Transformation

0.5 bpp with SNR = 30.04 dB

8
Motion Compensated Coding

Block diagram

9
Motion Compensated Coding

Motion estimation

10
Motion Compensated Coding

Exhaustive search
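As a rough illustration of exhaustive (full) search, the sketch below tests every displacement in a square window and keeps the one with the smallest sum of absolute differences (SAD). The SAD criterion, the 8×8 block size, the ±4 window, and all function names are assumptions made for this example; the slide itself does not specify them.

```python
def sad(cur, ref, top, left, dy, dx, block):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    return sum(
        abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
        for r in range(block)
        for c in range(block)
    )


def in_bounds(ref, top, left, dy, dx, block):
    """True if the displaced block lies entirely inside the reference frame."""
    return (0 <= top + dy and top + dy + block <= len(ref)
            and 0 <= left + dx and left + dx + block <= len(ref[0]))


def exhaustive_search(cur, ref, top, left, block=8, search=4):
    """Try every (dy, dx) in [-search, search]^2 and return the best vector and its SAD."""
    best_mv, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not in_bounds(ref, top, left, dy, dx, block):
                continue
            cost = sad(cur, ref, top, left, dy, dx, block)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

With a ±4 window this evaluates up to 81 candidate positions per block, which is why the faster search strategies on the following slides exist.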

11
Motion Compensated Coding

Logarithmic search

12
Motion Compensated Coding

Three step search
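A minimal sketch of a three-step search, assuming a SAD matching criterion and an initial step of 4 (so displacements up to ±7 are reachable). At each step the centre and its eight neighbours at the current step size are tested, the search recentres on the best one, and the step is halved. Function names and defaults are illustrative, not taken from the slides.

```python
def block_sad(cur, ref, top, left, dy, dx, block):
    """SAD for one candidate displacement; infinite if the block leaves the frame."""
    y, x = top + dy, left + dx
    if y < 0 or x < 0 or y + block > len(ref) or x + block > len(ref[0]):
        return float("inf")
    return sum(abs(cur[top + r][left + c] - ref[y + r][x + c])
               for r in range(block) for c in range(block))


def three_step_search(cur, ref, top, left, block=8, step=4):
    """Coarse-to-fine search: step sizes 4, 2, 1 around a moving centre."""
    cy, cx = 0, 0
    best = block_sad(cur, ref, top, left, 0, 0, block)
    while step >= 1:
        best_pt = (cy, cx)
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                cost = block_sad(cur, ref, top, left, cy + dy, cx + dx, block)
                if cost < best:
                    best, best_pt = cost, (cy + dy, cx + dx)
        cy, cx = best_pt
        step //= 2
    return (cy, cx), best
```

This visits roughly 25 positions instead of the 225 an exhaustive ±7 search would need. The trade-off is that it can settle on a local minimum when the matching error does not decrease smoothly toward the true displacement.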

13
Motion Compensated Coding

Hierarchical search - Zenith

14
Motion Compensated Coding

0.125 bpp with SNR = 35.8 dB

15
MPEG-X

MPEG-1 (ISO/IEC 11172, Nov 92)
Compression standard for progressive frame-based video in SIF (360×240), targeted at 1.5 Mbits/s
1.2 Mbits/s for video, 250 Kbits/s for audio
Applications: VCD, MP3
MPEG-2 (ISO/IEC 13818, Nov 94)
Compression standard for interlaced frame-based video in CCIR-601 (720×480) and high-definition format (1920×1088), with a wide range of bit rates: 4 to 80 Mbits/s
Optimized around 4 Mbits/s
Applications: DVD, HDTV Studio, etc.

16
MPEG-X

MPEG-4 (ISO/IEC 14496, Oct 98)
Multimedia standard for object-based video from natural or synthetic sources
Coding for various bandwidths (5 Kbps to 270 Mbps)
Applications: Internet, cable TV, 3G wireless communication, etc.
MPEG-7 (ongoing)
Multimedia content description interface
Applications: Internet, video search engines, digital libraries
Specify only bitstream syntax and decoding

17
Why do companies work on standards?

Interoperability
War of formats
VHS vs. Beta, DIVX vs. DVD
Patent royalties
Licensing fee for an MPEG-2 box is US$4 from MPEG LA
Total licensing fee for DVD is around US$10
Big companies can avoid being taxed by other companies
US$250 million per year for RCA patent profiles
Create new markets
VCD: Video Compact Disc
DVD: Digital Versatile Disc
DBS: Direct Broadcast Satellite
HDTV: Grand Alliance in the US, DVB in Europe

18
MPEG-1

Requirements (partial)
Coding of generic video with good quality (about VHS quality) at 1 to 1.5 Mbits/s
Random access to a frame in limited time
Frequent access points
Fast forward/reverse
Seek and play in FF/FR using access points
System supporting synchronized audio-visual play/access
A practical/implementable decoder
Etc.

19
MPEG-1

New features (w.r.t. H.261)
Flexible picture sizes, picture rates, etc.
Picture sizes up to 4096×4096 supported
Normally 360×240
Picture rates: 23.976, 24 (movies), 25 (PAL), 29.97, 30, 50, 59.94, 60
Bi-directional motion compensation
Half-pel motion compensation
VLC for MV difference
Etc.

20
MPEG-1

Motion compensation
Same as H.263:
Half-pel resolution for motion vectors
Differential coding of motion vectors
Motion compensation on 16×16 luminance blocks
Motion vectors divided by 2 for chrominance
Different from H.263:
VLC for MV difference

21
MPEG-2

Requirements (partial)
ITU-R 601 interlaced video with high quality at 4 to 9 Mbits/s
Scalable video coding for multi-quality video applications
Maximum interoperability/compatibility with MPEG-1
Support coding of non-interlaced and interlaced formats at many frame rates
Support video formats of various aspect ratios
Etc.

22
MPEG-2

New features
Allow 4:2:2 and 4:4:4 formats for chrominance
Frame-pictures and field-pictures
Frame/field/dual-prime adaptive motion compensation
New VLC table for DCT coefficients
Nonlinear quantization table
Slices always start and end in the same row of macroblocks
Motion vectors always coded in half-pel
Etc.

23
MPEG-2

Chrominance Sampling

24
MPEG-2
25
MPEG-2

Coding of interlaced video
Frame-pictures or field-pictures
Motion compensation
Frame prediction for frame-pictures: frame motion vectors
Same as MPEG-1
Field prediction for field-pictures: field motion vectors
Field prediction for frame-pictures
Prediction from either field of the previous frame
Good for fast motion
Dual-prime: two predictions are formed for each field from the two most recent fields; they are then averaged to form the final prediction
Field-pictures or frame-pictures
Only for P-pictures
16×8 MC for field-pictures: top and bottom halves of each macroblock

26
MPEG-1/2

Data structure: M = 3, N = 15 (tradeoff on M)
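To make the M/N notation concrete, the sketch below assumes the usual convention that N is the number of pictures in a group of pictures (GOP) and M is the spacing between anchor pictures (I or P). It lists the picture types in display order and then reorders them so that each anchor is sent before the B-pictures that precede it in display order. Function names are illustrative.

```python
def gop_display_order(n=15, m=3):
    """Picture types of one GOP in display order."""
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


def coding_order(display):
    """Move each anchor (I or P) ahead of the B-pictures displayed just before it."""
    out, pending_b = [], []
    for idx, t in enumerate(display):
        if t == "B":
            pending_b.append((idx, t))
        else:
            out.append((idx, t))
            out.extend(pending_b)
            pending_b = []
    # Trailing B-pictures would wait for the next GOP's I-picture.
    return out + pending_b


print(gop_display_order())  # IBBPBBPBBPBBPBB
```

A larger M inserts more B-pictures between anchors, which tends to lower the bit count but increases reordering delay and buffering.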

27
MPEG-1/2

I-frames
No temporal redundancy reduction
Have the highest bit count
For random access, FF, and REW features
P-frames
Forward motion-compensated prediction
B-frames
Both forward and backward motion-compensated prediction
Usually result in the lowest bit count
Increase delay

28
MPEG-1/2

I-frame
JPEG-like (DCT-based)
Highest data rate
Random access
FF/FR

29
MPEG-1/2

P-frame
Motion-compensated coding
Predicted from the previous I- or P-frame
Prediction error is then coded and transmitted

30
MPEG-1/2

B-frame
Delay
Buffer
Lowest data rate
Higher coding efficiency
No error propagation

31
MPEG-1/2
intraframe processing
Variable Length Encoder
Buffer Control Strategy
Predictive frame processing
interpolative frame processing
32
MPEG-1/2 Results
Bit rate (Mbits/s) SIF-30 CVGA CCIR 601 29.97 FPS VGA HDTV 29.97 FPS HDTV 60 FPS SVGA
pels 352 720 1920 1280
Lines 240 480 1080 720
Original bit rates (Mbps) 30.4 121.5 745.7 663.6
1.1 Mbps Good Poor
4.0 Mbps Excellent Good
9.0 Mbps Excellent Excellent
18.0 Mbps Excellent Good Good
28.0 Mbps Excellent Excellent
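The "original bit rates" row can be checked with a short calculation, assuming 8-bit samples with 4:2:0 chroma subsampling, i.e. 12 bits per pixel on average. That assumption is not stated on the slide, but it reproduces the listed figures.

```python
def raw_bit_rate_mbps(width, height, fps, bits_per_pixel=12):
    """Uncompressed bit rate in Mbit/s (12 bits/pixel assumes 8-bit 4:2:0 video)."""
    return width * height * bits_per_pixel * fps / 1e6


print(round(raw_bit_rate_mbps(352, 240, 30), 1))      # 30.4
print(round(raw_bit_rate_mbps(1920, 1080, 29.97), 1))  # 745.7
print(round(raw_bit_rate_mbps(1280, 720, 60), 1))      # 663.6
print(round(raw_bit_rate_mbps(704, 480, 29.97), 1))    # 121.5
```

The 121.5 Mbps entry for CCIR 601 matches 704 active pels per line rather than the 720 shown in the table; this reading is inferred from the arithmetic and is not stated in the slides.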
33
MPEG-1 Demo

Original: Y (720×480), UV (360×480)
Acer
original 20 60 96
Bike
original 33 49 66 81 97
Foot
original 33 50 66 108 109
Table Tennis
Original 40 49 98 147 150

34
MPEG-2 Demo
35
TV, Telecom & Computer Convergence

Past
Television: RCA, TCI, GI, etc.
Telecommunications: AT&T, Hughes, etc.
Computer: IBM, Microsoft, etc.
Now
Television
Hughes + Thomson (RCA) + Sarnoff -> DIRECTV
Microsoft + Philips -> WebTV
Telecommunications
AT&T + TCI + @Home -> Local telephone service + Internet
Computer
Windows CE + TV set-top box -> Venus

36
MPEG-4

What existing standards can do
MPEG-1: frame-based non-interlaced video (1.5 Mbps)
MPEG-2: frame-based interlaced video (4 Mbps to 270 Mbps)
H.261: low-bit-rate video conferencing (64×p Kbps)
H.263: very-low-bit-rate video conferencing (10 Kbps)
What existing standards cannot do
Coding of video objects with content information (metadata)
Coding of images for progressive transmission
Coding of multimedia information for various bandwidths and media (5 Kbps to 270 Mbps)
Interactivity

37
What applications are relevant to us?
38
Internet Image Retrieval: JPEG vs. EZW
After 1 second
After 4 seconds
After 8 seconds
39
Multiresolution Feature Search Using Wavelet
40
Internet Commerce Using Metadata
41
Telemedicine Using Wavelet Compression
42
Consumer Videophone - Modes & Applications
Family / Home
Stand-alone wired videophone
PC-based videophone
POTS/ISDN
LAN
Good spatio-temporal quality
Low end-to-end delay
Channel error resilience
LAN, ISDN
Virtual classroom, virtual meeting
43
Virtual Set Example
44
MPEG-4 Virtual Set Compositing
45
MPEG-4 Virtual Set
46
MPEG-4 Key Functionalities
Compression
Content-based interactivity
Universal access
47
MPEG-4 Key Functionalities

Content-based interactivity
A scene is composed of audio-visual objects
Not just pixels or moving blocks
Objects can be of different nature
Text or images
Rectangular or arbitrary shape
2D or 3D objects
Natural or synthetic
Different coding schemes applied to different
objects
Composer puts objects back in a scene

48
MPEG-4 Key Functionalities

Universal access
Robustness in error-prone environments
Allow applications over wired and wireless
networks
Robust for severe error conditions, e.g. long
error bursts
Content-based scalability
Allowing scalability in content, quality and
complexity
Achieving content based scaling of visual
information

49
MPEG-4 Key Functionalities

Compression
Improved coding efficiency
MPEG-4 video shall provide subjectively better
visual quality at comparable bit rates compared
to existing or emerging standards
5-64 Kbps for mobile applications
Up to 20 Mbps for TV/film applications
Coding of multiple concurrent data streams
Can code multiple views of a scene efficiently,
e.g. stereo video

50
MPEG-4 AV Objects

Audiovisual scene is composed of objects (A&V)
Compositor puts objects in the scene (A&V, 2D/3D)
Objects can be of different natures
Natural or synthetic A&V, text & graphics, animated faces, arbitrary or rectangular shapes
Each object is encoded independently
Coding scheme can differ for individual objects
From low bit rates to (virtually) lossless quality

51
Object Functionalities in MPEG-4
52
Video object planes (VOPs)

Image = different objects + text + background (VOPs)
Single VOP: backward compatible with MPEG-1/2
Composition or segmentation
Note: segmentation is outside the scope of MPEG-4
Characteristics of VOPs
Separate object coding
Separate object manipulation
May have different spatial and temporal resolutions
May be associated with different degrees of accessibility: sub-VOPs
May be separated or overlapping

53
Separated and overlapping VOP
54
Content-based object manipulation

Change of the spatial position of a VOP
Application of a spatial scaling factor to a VOP
Change of the speed with which a VOP moves
Insertion of new VOPs
Deletion of a video object (VO) from the scene
Successive VOPs belonging to the same physical object in a scene are referred to as a video object (VO)
Change of the scene area

55
Example of bit stream manipulation (1)
56
Example of bit stream manipulation (2)
57
Segmentation process

Depending on the application, segmentation can be performed
Online (real-time) or offline (non-real-time)
Automatic or semi-automatic
Examples
Video conferencing
Real-time, automatic
Separate the foreground (communication partner) from the background
Object tracking in video
May allow offline and semi-automatic segmentation
Separate a moving object from others

58
Video object plane formation

Rectangular or arbitrary

59
Demo

Segmentation
Bit stream manipulation

60
Video object-based coding

Each Video Object in a Scene is Coded and
Transmitted Separately

61
Data structure in visual part of MPEG-4
Visual object Sequence (VS)
Video Object (VO)
Video Object Layer (VOL) Still Object Layer (SOL)
SOL0
Group of Video Object Plane (GOV)
Video Object Plane (VOP)
62
MPEG-4 Video Decoder
63
MPEG-4 Video Decoder
64
MPEG-4 Video Decoder

Scene description is necessary.
A language called the Binary Format for Scenes
(BIFS) based on the Virtual Reality Modeling
Language (VRML) has been developed by MPEG for
scene description.
The decoder can use the scene description and
additional input from the user to combine or
compose the objects to reconstruct the original
scene or create a variation on it.

65
MPEG-4 standard video tools
66
MPEG-4 standard video tools

The glue that will bind these tools together is
the MPEG-4 systems description language (MSDL)
which will have several components, including
Definitions for the interfaces between the coding
tools,
A mechanism to combine coding tools,
A mechanism to download new tools.
The MSDL will transmit to the decoder the bit
stream and the manner in which the tools have to
be used at the decoder to reconstruct the audio
and video.

67
Shape coding tool

Every VOP is coded macroblock by macroblock
The bounding rectangle of the VOP is extended on its right and bottom sides to a multiple of 16×16 blocks (macroblocks).
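The extension of the bounding rectangle amounts to rounding its width and height up to the next multiple of 16. A small sketch, with an illustrative function name:

```python
def extend_to_macroblocks(width, height, mb=16):
    """Round a bounding rectangle up to whole macroblocks (growing right and down)."""
    ext_w = -(-width // mb) * mb   # ceiling division, then back to pixels
    ext_h = -(-height // mb) * mb
    return ext_w, ext_h


w, h = extend_to_macroblocks(100, 50)
print(w, h, (w // 16) * (h // 16))  # 112 64 28
```

A 100×50 bounding rectangle therefore becomes 112×64, i.e. 7×4 = 28 macroblocks.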

68
Description of VOP
69
Shape Coding

Binary alpha planes (shape information) are encoded by context-based arithmetic encoding (CAE).
Gray-scale alpha planes are encoded by motion-compensated DCT, similar to texture coding.
An alpha plane is bounded by a rectangle that includes the shape of a VOP.
Intra (I-VOPs and P-VOPs) or inter (P-VOPs and B-VOPs) shape coding at the macroblock level
Inter: motion-compensated shape

70
Shape Coding (cont.)

Motion vectors from texture motions or shape
motion of neighboring blocks
Coding modes
Opaque
Transparent
No-update
Intra Context based Arithmetic Encoding
Inter Context based Arithmetic Encoding
Lossless
Lossy
Motion compensation without update
Sub-sampling by factor 2 or 4

71
Shape Coding Tools - CAE

Context-based Arithmetic Encoding (CAE) of the current pixel
Intra
Inter

72
Shape Coding Tools - CAE

Compute a context number.
Index a probability table using the context
number
Use the indexed probability to drive an
arithmetic encoder.
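The three steps above can be sketched as follows. Several parts are assumptions for illustration only: the 10-pixel causal template layout, the treatment of pixels outside the mask as transparent, and the use of adaptive per-context counts in place of a fixed probability table. Instead of a real arithmetic coder, the sketch reports the ideal code length (the sum of -log2 p), which is what an arithmetic coder approaches.

```python
import math

# Assumed causal template: (dy, dx) offsets from the current pixel, bit 0 first.
# Illustrative layout; the normative template is defined in the standard.
TEMPLATE = [(0, -1), (0, -2), (-1, 2), (-1, 1), (-1, 0),
            (-1, -1), (-1, -2), (-2, 1), (-2, 0), (-2, -1)]


def pixel(mask, y, x):
    """Binary alpha value; positions outside the mask count as transparent (0)."""
    if 0 <= y < len(mask) and 0 <= x < len(mask[0]):
        return mask[y][x]
    return 0


def context_number(mask, y, x):
    """Step 1: pack the already-coded neighbours into a 10-bit context number."""
    return sum(pixel(mask, y + dy, x + dx) << k
               for k, (dy, dx) in enumerate(TEMPLATE))


def ideal_code_length(mask):
    """Steps 2 and 3: look up a probability per context and accumulate its cost in bits."""
    counts = {}  # context -> [zeros seen + 1, ones seen + 1] (hypothetical adaptive table)
    bits = 0.0
    for y in range(len(mask)):
        for x in range(len(mask[0])):
            c = counts.setdefault(context_number(mask, y, x), [1, 1])
            p = c[mask[y][x]] / (c[0] + c[1])
            bits += -math.log2(p)
            c[mask[y][x]] += 1
    return bits
```

Because neighbouring shape pixels are strongly correlated, most contexts predict the current pixel with high probability, so the cost per pixel falls well below one bit on smooth shapes.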

73
Motion Compensated DCT

A hybrid coding scheme used in H.261, H.263, MPEG-1, and MPEG-2
Reduces the temporal and spatial correlation of video objects in two steps
Temporal: by motion compensation
Spatial: by Discrete Cosine Transform (DCT) coding

74
Block Based Motion Compensation

Models the translational motion of a block between frames with a motion vector.
Motion compensation is performed block by block.

75
Motion Compensation Tools
Motion compensated coding modes (I, B, P)
76
Motion Compensation Tools - Motion Computation
77
Motion Compensation Tools - Padding
Process of normal padding of a block
Process of padding of a VOP
Process of extended padding of a block
78
Padding
79
Motion estimation & compensation
80
Block-based compatibility for VOP
81
Texture Coding Tools (1/3)

The intra VOPs, as well as the residual errors after motion-compensated prediction, are coded using DCT on 8×8 blocks, in a manner similar to that employed in MPEG-1, MPEG-2, H.261, and H.263.
Backward compatible with MPEG-1 and MPEG-2
Efficient prediction of DC and AC coefficients for intra- and inter-coded blocks can also be employed (this approach is not available in MPEG-1 and MPEG-2).
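For reference, a plain (unoptimized) 8×8 two-dimensional DCT can be written with an orthonormal DCT-II basis matrix C, so that the forward transform is C·X·Cᵀ and the inverse is Cᵀ·Y·C. This is a textbook sketch, not the fast integer arithmetic a real codec would use, and the function names are illustrative.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct2(block):
    """Forward 2-D DCT of a square block."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2-D DCT (exact inverse because the basis is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

A flat block of value 100 transforms to a single DC coefficient of 800 with all AC coefficients at zero, which is the energy compaction that the quantization and entropy-coding stages exploit.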

82
Texture Coding Tools (2/3)
83
Texture Coding Tools (3/3)
84
Adaptive DC prediction (texture coding)
Block (8x8)
85
Adaptive AC prediction (texture coding)
86
Coefficients Scanning (texture coding)

Alternate-Horizontal scan

Alternate-Vertical scan

zig-zag scan (H.263/MPEG-1)
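The classic zig-zag order can be generated by walking the anti-diagonals of the block and alternating direction on each one. The sketch below covers only the zig-zag scan; the alternate-horizontal and alternate-vertical scans use different, table-defined orders that are not reproduced here. Function names are illustrative.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Sort by anti-diagonal; odd diagonals run top-to-bottom, even ones bottom-to-top.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def scan(block):
    """Flatten a square block of coefficients into zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


print(zigzag_order(8)[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Scanning this way places low-frequency coefficients first, so the quantized high-frequency zeros cluster at the end of the sequence where run-length coding handles them cheaply.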
87
Quantization (texture coding)

Method 1: similar to that of H.263
Method 2: similar to that of MPEG-2
Optimized non-linear quantization of DC
coefficients
Quantization matrices and loading mechanism

88
Scalability

Object scalability
Achieved by the data structure used and the shape
coding
Temporal scalability
Achieved by a generalized scalability mechanism
Spatial scalability
Achieved by a generalized scalability mechanism

89
Object scalability
90
Temporal scalability
91
Spatial scalability
92
Static Sprite Coding Tools (1/5)

A sprite is an image composed of pixels belonging to a video object visible throughout a video segment.
For instance, a sprite generated from a panning sequence will contain all the visible pixels of the background object throughout the sequence.
Portions of this background may not be visible in certain frames due to occlusion by foreground objects or camera motion.
Thus, the sprite contains all parts of the background that were visible at least once.

93
Static Sprite Coding Tools (2/5)

The sprite encoding syntax can be utilized for
the transmission of any still image to the
decoder since a sprite is essentially just a
still image.
Static sprites are those that are directly copied
(including appropriate warping and cropping) to
generate a particular rendition of the sprite at
a particular time instant.
Sprite: the panoramic view of the background.
Improves the coding efficiency for video sequences with frequently revisited backgrounds.

94
Static Sprite Coding Tools (3/5)
The main idea of static sprite coding technique
is to generate the reconstructed VOPs by directly
warping the quantized sprite using specified
motion parameters. Residual error between the
original VOP and the warped sprite is not added
to the warped sprite.
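A toy sketch of the idea, under a strong simplifying assumption: the "warp" is a pure translation, so warping reduces to cropping the sprite at a given offset. Real sprite coding uses more general motion parameters. The helper names and the "first-seen pixel wins" rule for building the sprite are illustrative choices, not part of the standard.

```python
def build_sprite(frames, offsets, width, height):
    """Paste each frame into the sprite at its (top, left) offset; first-seen pixels win."""
    sprite = [[None] * width for _ in range(height)]
    for frame, (top, left) in zip(frames, offsets):
        for r, row in enumerate(frame):
            for c, value in enumerate(row):
                if sprite[top + r][left + c] is None:
                    sprite[top + r][left + c] = value
    return sprite


def reconstruct_vop(sprite, top, left, vop_h, vop_w):
    """Static-sprite reconstruction: copy (here, crop) the sprite; no residual is added."""
    return [row[left:left + vop_w] for row in sprite[top:top + vop_h]]


background = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]]
views = [[row[o:o + 3] for row in background] for o in (0, 2, 3)]
sprite = build_sprite(views, [(0, 0), (0, 2), (0, 3)], width=6, height=2)
print(sprite == background)  # True
```

Once the sprite has been transmitted, each later frame of the panning sequence needs only its motion parameters (here, one offset) rather than new texture data.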
95
Static Sprite Coding Tools (4/5)

Basic sprite (a large static image) coding
Low latency sprite coding (sent hierarchically)
Scalable sprite coding

96
Static Sprite Coding Tools (5/5)
Sprite
Foreground Object

Decoded Frame
97
Wavelet Tool (1/3)

Discrete Wavelet Transform (DWT)
Still image coding mode
Separate DC band Coding
Zero-Tree Scanning (ZTS) and Multiscale Zero-Tree
Entropy (MZTE) coding
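To show what one level of a separable 2-D wavelet decomposition produces, the sketch below uses the Haar filter purely because it is the simplest to write down; it is a stand-in, not the filter used by the MPEG-4 wavelet tool. The top-left quadrant of the result is the low-pass (DC) band that the slide says is coded separately; the other three quadrants hold detail coefficients.

```python
def haar_1d(signal):
    """One Haar level: pairwise averages (low band) followed by pairwise differences (high band)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low + high


def haar_2d(image):
    """Apply the 1-D transform to every row, then to every column."""
    rows = [haar_1d(r) for r in image]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


print(haar_1d([1, 3, 5, 9]))  # [2.0, 7.0, -1.0, -2.0]
```

For smooth image regions the detail quadrants are close to zero across several scales, which is the structure zerotree coding is designed to exploit.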

98
Wavelet Tool: ZTS (2/3)

A general architecture for zerotree coding.
Provides tradeoffs between scalability,
complexity and efficiency.

Zero Tree Scanning (ZTE)
99
Wavelet Tool: Multiscale ZTE (3/3)
100
Arbitrary Shaped Wavelet Tool

Shape-adaptive wavelet: a more general case of the rectangular wavelet.
Zerotree coding encodes only the interior nodes
Downsampling the original shape to obtain the shapes at different resolutions.
Using a shape coding scheme to include the shape information.

101
Shape adaptive wavelet coding - SNR Scalability
bitstream
30kbits
8kbits
5kbits
102
Shape adaptive wavelet coding - Spatial
Scalability
103
12-Bit Video Coding Tool

Allows compression of video data with a precision of up to 12 bits/pixel
The syntax, semantics, and coding tools are
extended
bit-precision
extended DC VLC tables
extended quantization mechanism
Insertion of marker bits to avoid start code
emulations

104
MPEG-4 demo

Coastguard
Original 128 Kbits/sec
Foreman
Original 128 Kbits/sec
Hall_monitor
Original 128 Kbits/sec

105
Summary (1/2)

MPEG-4
The first content-based standard, addressing
multimedia.
Object-based representation of a scene.
Both natural and synthetic.
Compression + many other features.
Normative decoder (i.e. bitstream syntax and decoding algorithm).

106
Summary (2/2)

MPEG-4 Visual
Tool box approach, i.e. consists of many tools.
One tool, one functionality.
Set of tools -> Object.
Set of objects -> Combination Profile.
Conformance points on combination profiles.

107
SNHC Tools

SNHC: Synthetic/Natural Hybrid Coding
An MPEG-4 subgroup working on the synthetic
tools.
Tools for Version 1
Face Animation
Dynamic 2D Meshes
Scalable Textures

108
SNHC Tool: Face Animation

Face: an object capable of facial geometry ready for rendering and animation
A synthetic representation of a human face
Visual manifestations of speech are intelligible
Facial expressions allow recognition of moods
Specified by the parameters in the incoming bitstream

109
SNHC Tool: Face Animation
Defines a specific face via
3D feature points
3D mesh/scene graph
Face texture
Face animation table
110
SNHC Tool: Dynamic Meshes

Specifically refers to triangular meshes
Tessellation of a 2D visual object plane into a connection of triangular patches
No addition or deletion of nodes, i.e. no change in topology
Used for video object manipulation

112
SNHC Tool: Dynamic Meshes
113
SNHC Tool: Dynamic Meshes
114
Conclusions

Refer to exercise for further information
Other search strategies
Motion JPEG (MJPEG)
DV
Already used for high quality video coding
Motion JPEG2000 (MJPEG2000)
JPEG 2000 for video
Collaborative or competitive?
Compression ratio may be higher than MPEG-1/2
Symmetric algorithm, however
Part III of JPEG2000 standard
original 40 53 67 80 100

115
Conclusions

The standard is usually not the best
Demo
Avxing
Windows Media Video

116
Acknowledgement

I would like to thank Prof. Tihao Chiang
of National Chiao-Tung University and his
ex-colleagues Dr. Ya-Qin Zhang, Iraj Sodagar, and
Sriram Sethuraman. Professor Chiang generously
gave me his transparency master for MPEG-4
tutorial and this helped me very much in
preparing this lecture.
