Video Compression and MPEG

Provided by: B206

1
Video Compression and MPEG

B. Acharya

2
Video Basics
3
Image
Video Cable
Video Monitor
4
Video Basics - Scanning

Scanning is the process of sampling a
continuously varying 2-D signal.
Raster scanning converts 2-D image intensity into
a 1-D waveform.

5
Video Basics - The Scanning Raster
625 lines (PAL- Europe)
525 lines (NTSC)
Horizontal Blanking
Vertical Blanking
Active Video
6
Video Basics - The Progressive Raster
[Figure: progressive raster with scan lines viewed edge-on; axes x, y and time; regions labeled Active Video and Vertical Blanking.]
Note: All scan lines are sampled at each time
instant.
7
Video Basics - The Interlaced Raster
8
Video Basics Interlaced Raster Scan

Interlaced raster scanning (IRS) scans a picture by sampling two fields
at different times, such that consecutive
lines of a frame belong to alternate fields.
This allows slow-moving objects to be perceived
with higher vertical detail and fast-moving
objects at higher temporal rates.
It is used extensively in TV because of bandwidth,
flicker and resolution considerations.

9
Common Rasters for Video Coding
10
Interlacing

Background
In the 1930s, interlaced scanning was developed as a
bandwidth-saving technique.
Persistence of vision causes two fields to fuse
into single image, without flicker.
All broadcasting today uses interlaced scanning.
Advantages
High vertical detail retained for still portions
of the scene.
Drawbacks
Reduced vertical detail for moving areas
Flicker at edges of objects (e.g., text), which
is why computer industry uses progressive
scanning for monitors.
More complicated signal processing for resizing,
frame rate conversion, etc.

11
Human Vision Basics

Human Visual System (HVS) has limitations that
can be exploited for video system design
limited response to black-and-white detail
even more limited response to color detail
image motion appears fluid at rates above 24 Hz
limited ability to track rapidly moving objects
insensitivity to noise
at object edges
in highly detailed areas of a scene
in bright areas of a scene
immediately after scene changes

12
Colorimetry Basics

In broadcast and studio applications, the
gamma-corrected RGB taking primaries are
transformed to YC1C2 transmission primaries.
Y is the luminance (luma) component; C1 and C2
are the chrominance (chroma, or color difference)
components.
To exploit the HVS reduced spatial response to
chroma, C1 and C2 are further bandlimited in
spatial frequency compared to Y.
The exact transformation matrix is
system-dependent.

13
Colorimetry Basics

In 8-bit implementations,
Y occupies 220 levels [16, 235]
Cr and Cb occupy 225 levels [16, 240]
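The 8-bit ranges above can be illustrated with a small sketch. The slides say the exact matrix is system-dependent, so the BT.601 luma weights (0.299, 0.587, 0.114) used here are an assumption, and `rgb_to_ycbcr_601` is a hypothetical helper name, not something from the presentation.

```python
def rgb_to_ycbcr_601(r, g, b):
    """Convert gamma-corrected R'G'B' in [0.0, 1.0] to 8-bit (Y, Cb, Cr).

    Assumption: BT.601 luma weights. Other systems use other matrices.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma in [0, 1]
    cb = (b - y) / 1.772                    # blue difference in [-0.5, 0.5]
    cr = (r - y) / 1.402                    # red difference in [-0.5, 0.5]
    # Scale luma to the 220 levels [16, 235] and chroma to the 225 levels [16, 240].
    return (round(16 + 219 * y), round(128 + 224 * cb), round(128 + 224 * cr))


for name, rgb in [("black", (0, 0, 0)), ("white", (1, 1, 1)), ("red", (1, 0, 0))]:
    print(name, rgb_to_ycbcr_601(*rgb))
```

Black and white land on the ends of the luma range, while a saturated primary pushes one chroma component to the end of its range.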

14
Compression

Data = Information + Redundancy
I need a glass of water, which is scientifically
called H2O
I need a glass of water
Compression = Reduce Redundancy

15
Redundancy

Spatial
Similarity in pattern due to position
Temporal
Similarity in pattern over time
Statistical
Similarity due to pattern of occurrence

16
Image Compression Standards

Binary (Bi-level, BW) images
ITU-T Group 3, Group 4 (Fax) (1980), JBIG (1994), JBIG2
Continuous Tone Still Images
Both Gray and Colour Image
JPEG (1992)
JPEG 2000
Moving Pictures
MPEG-1 (1994), MPEG-2 (1995)
MPEG-4 (96-03), MPEG-7, MPEG-21
H.261 (1990), H.263 (1995), H.264 (ongoing)

17
Image Compression -- Needs

Image (Signal) Processing
Decorrelation, Transformation
Reduce redundancy, compact representation
Quantization (Psychovisual analysis)
Mask redundant data, loss of information
Reduce entropy
Entropy Encoding (Information Theory)
Encode data losslessly
Compact representation for compression
Variable-length (Run-length, Huffman, Arithmetic,
etc.)

18
Entropy

E = average amount of information contained per
source symbol
$E = -\sum_k p(a_k) \log_2 p(a_k)$
Limit of compression
Example
Pre-processing can improve compression

19
Example (entropy)

Data 1 2 0 1 1 2 3 1 2 3 1 1 1 2 2 2
Symbols 0, 1, 2, 3
Probability 0.0625, 0.4375, 0.375, 0.125
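$E = -\sum_i p(a_i) \log_{10} p(a_i)$
$= -((-1.204)(0.0625) + (-0.359)(0.4375) + (-0.426)(0.375) + (-0.903)(0.125))$
$= 0.505$ (computed with base-10 logarithms; with $\log_2$ this is about 1.677 bits per symbol)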

20
Pre-processing (Entropy)

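Pre-processing: $a_k \rightarrow a_k - a_{k-1}$,
where $k \geq 1$, $a_0 = 0$
Data: 1 1 -2 1 0 1 1 -2 1 1 -2 0 0 1 0 0
Symbols: 0, 1, -2
P: 5/16, 8/16, 3/16 = 0.3125, 0.5, 0.1875
E = 0.445 (base-10 logarithms; about 1.477 bits per symbol with $\log_2$)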
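The two entropy examples can be checked with a short sketch. It computes the entropy in bits (base-2 logarithm); the slide values 0.505 and 0.445 correspond to the same sums taken with base-10 logarithms. The function names are illustrative only.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Average information per symbol, in bits (log base 2)."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def difference(symbols):
    """Pre-processing step: replace a_k by a_k - a_(k-1), with a_0 = 0."""
    out, prev = [], 0
    for s in symbols:
        out.append(s - prev)
        prev = s
    return out


data = [1, 2, 0, 1, 1, 2, 3, 1, 2, 3, 1, 1, 1, 2, 2, 2]
diffs = difference(data)
print(round(entropy_bits(data), 3), round(entropy_bits(diffs), 3))
# Multiplying by log10(2) converts bits to the base-10 figures quoted on the slides.
print(round(entropy_bits(data) * math.log10(2), 3))
```

The differenced sequence has three symbols instead of four and a more peaked distribution, which is why its entropy is lower.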

21
What do we want in video?

Real time (Live viewing)
Low delay (No jitter)
Good quality (Minimal loss of information)
Easy and useful interactivity
Play, pause, random access, fast forward
Something more?
Content based retrieval, Editable, Movie quality
(high motion, spatial scalability)

22
Target area of DVT

Broadcasting
High bandwidth
Better quality
No delay
Internet (I/P Network)
Low Bandwidth
Restricted quality
Delay
Jitter
Loss of data
Quality degradation

Wireless
Low bandwidth
Small resolution
Future Technology
Interactive
Broadcasting
Advertisement
Games
Multimedia

23
Solutions

Decrease size of source
Compression
Retain quality
Eat the cake and have it too
Better Delivery
Handle delay
Conceal error
Post-processing

24
Video Compression
25
What is Video Compression?...Orange Juice
Analogy...
26
So? What to do?

Exploit limitations in Human Visual System
Limited color sensitivity (downscale CB and CR)
Limited sensitivity to edges (reduce high
frequency)
Can attain 50:1 or more compression efficiency
Remove spatial and temporal redundancy that exist
in natural video imagery
correlation itself can be removed in a lossless
fashion
only realizes about 2:1 compression efficiency

27
Step 1 Pre-processing

Pre-processing
Color conversion
RGB -> YCbCr
Downsizing color components
4:2:0, 4:2:2
-> Reduction in source size
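A rough sketch of the size reduction from chroma downsizing. Averaging each 2x2 group is one simple way to produce 4:2:0 chroma and is an assumption here; real systems may filter and position the chroma samples differently. Both function names are hypothetical.

```python
def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 groups.

    Assumes even width and height; rounds to the nearest integer.
    """
    return [
        [(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
         for x in range(0, len(plane[0]), 2)]
        for y in range(0, len(plane), 2)
    ]


def samples_per_frame(width, height, chroma_format):
    """Total number of Y, Cb and Cr samples in one frame."""
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[chroma_format]
    luma = width * height
    return int(luma + 2 * luma * chroma_fraction)


print(subsample_420([[10, 20, 30, 40], [10, 20, 30, 40]]))
for fmt in ("4:4:4", "4:2:2", "4:2:0"):
    print(fmt, samples_per_frame(720, 576, fmt))
```

Relative to 4:4:4, 4:2:2 keeps two thirds of the samples and 4:2:0 keeps half, before any further compression is applied.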

28
Chroma Formats and Picture Sizes
29
Macroblock Structures
30
Step 2 Transformation

Transformation
Want to discard high frequency components
Little visual quality loss
Spatial domain to frequency domain
Discrete Fourier Transform, Discrete Cosine
Transform

31
DFT

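Any periodic function F(t), with period T, may be
represented by an infinite series of the form
$F(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left[ a_n \cos\frac{2\pi n t}{T} + b_n \sin\frac{2\pi n t}{T} \right]$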

32
Cosine Transform

Original image M x N
A(i,j): intensity at (i,j) location
B(k1, k2): DCT coefficients

33
DCT and IDCT Formulas
34
DCT

DCT is an orthogonal transformation
2-D DCT is separable in x and y dimensions
Has good energy compaction properties
Efficient hardware realization
Theoretically lossless, but slightly lossy in
practice due to round off errors
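The separability and energy-compaction properties listed above can be tried out with a minimal pure-Python sketch. It uses the orthonormal DCT-II scaling, which is an assumption (the formula slide did not survive in this transcript and may use a different normalization), and applies the 1-D transform to rows and then to columns.

```python
import math


def dct_1d(v):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)))
    return out


def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def transpose(m):
    return [list(row) for row in zip(*m)]


def apply_2d(block, transform):
    """Separable 2-D transform: rows first, then columns."""
    rows = [transform(r) for r in block]
    cols = [transform(c) for c in transpose(rows)]
    return transpose(cols)


def dct_2d(block):
    return apply_2d(block, dct_1d)


def idct_2d(coeffs):
    return apply_2d(coeffs, idct_1d)


flat = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # all energy of a flat block ends up in the DC term
```

With this scaling a flat 8x8 block of value 100 yields a DC coefficient of 800 and AC coefficients that are zero up to floating-point round-off, which is the small practical loss the slide mentions.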

35
DCT (contd)
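[Figure: an 8x8 forward DCT maps a block of pixels to a block of DCT coefficients. After the DCT, the DC coefficient is in the corner; horizontal frequency runs from low to high along one axis and vertical frequency from low to high along the other.]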
36
DCT Example
[Figure: blocks of 8x8 pixels from the Flower Garden sequence and their DCT coefficients, with the DC term marked. Example blocks: flat area, vertical edge, horizontal edge, diagonal line, single pixel.]
37
2-D DCT Basis Images
38
Advantage DCT

Separates the image into parts
Spectral sub-bands of differing importance (with
respect to the image's visual quality).
All DCT multiplications are real
lowers the number of required multiplications
compared to DFT
For most images, much of the signal energy lies
at low frequencies

39
Step 3 Quantization
STEPS

Dividing DCT-coefficients by a number
Divisor is frequency-dependent value
Rounding or truncating to the nearest integer

Inverse quantization multiplies each quantized value by the same divisor
Quantization coefficients can be tailored to
noise sensitivity of Human Visual System
Quantization is LOSSY!
Quantization causes information to be
irretrievably lost
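The quantization steps above can be sketched as follows. The quantization matrix here is a made-up example whose step size grows with frequency; it is not a table from any standard, and the function names are hypothetical.

```python
def make_quant_matrix(base=8, slope=4, n=8):
    """Hypothetical frequency-dependent divisors: larger steps at higher frequencies."""
    return [[base + slope * (u + v) for v in range(n)] for u in range(n)]


def quantize(coeffs, q):
    """Divide each coefficient by its divisor and round to an integer level."""
    return [[round(c / s) for c, s in zip(crow, qrow)] for crow, qrow in zip(coeffs, q)]


def dequantize(levels, q):
    """Inverse quantization: multiply each level by the same divisor."""
    return [[l * s for l, s in zip(lrow, qrow)] for lrow, qrow in zip(levels, q)]


q = make_quant_matrix()
coeffs = [[300 - 37 * (u + v) for v in range(8)] for u in range(8)]
levels = quantize(coeffs, q)
recon = dequantize(levels, q)
print(coeffs[0][:4], recon[0][:4])
```

The reconstructed values differ from the originals by at most half a step, and that difference cannot be recovered by the decoder. Small coefficients divided by large steps round to zero, which is what later makes the scanned block mostly zeros.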

40
Quantization - Example
41
Quantization Effect
42
Quantization Artifacts
43
Artifacts - Example

44
Step 4 Spatial Prediction

Neighbouring pixels have similarity
DCT coefficients of neighboring blocks have
correlation
Consider the Left (L), Top (T) and Left-Top (L-T) neighbours

Differential coefficients are smaller
Fewer bits required to encode
Encode the difference coefficients

[Figure: current block with its similar neighbours T, L-T and L.]
45
Difference Image
Pixel wise difference

46
Step 5 Scanning Order

It is a rearrangement of the coefficients
Most of the coefficients after quantization
become zero
Zigzag Scan Order

Quantized 8x8 block (DC = 35 at top-left):

35  1  2  1  0  0  0  0
 3  2 -1  0  0  0  0  0
 1  0 -1  0  0  0  0  0
 0  1  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

Zigzag output: 35, 1, 3, 1, 2, 2, 1, -1, 0, 0, 0, 1, -1, 0, 0,
0, ..., 0
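A minimal sketch of the zigzag scan. The 8x8 block below is chosen so that its scan reproduces the sequence on this slide; the sort-based construction of the scan order is one convenient way to write it, not necessarily how a real codec implements it.

```python
def zigzag_order(n=8):
    """(row, col) positions in zigzag order for an n x n block.

    Positions are grouped by anti-diagonal (row + col); odd diagonals are
    walked with the row increasing, even diagonals with the column increasing.
    """
    return sorted(
        ((i, j) for i in range(n) for j in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def zigzag_scan(block):
    return [block[i][j] for i, j in zigzag_order(len(block))]


block = [
    [35, 1, 2, 1, 0, 0, 0, 0],
    [3, 2, -1, 0, 0, 0, 0, 0],
    [1, 0, -1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(4)]

print(zigzag_scan(block)[:16])
```

Because the non-zero values cluster near the DC corner, the scan places them at the start of the sequence and leaves one long run of zeros at the end.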
47
DC Coefficients

DC is average luminance/chrominance
Largest of 64 block coefficients
Kept as high as possible
DC moves slowly between blocks
-> Differential encoding
Example: DC values 12, 13, 11, 11, 10, ...
Differences: 12, 1, -2, 0, -1, ...

48
Differential Encoding

Values are not sent as-is (raw bits)
Coded as a (length, value) pair
Length: number of bits used
Value: actual bits used to represent the value
Negative values are coded as the bitwise complement of their magnitude
Example

Value  Length  Code
12     4       1100
1      1       1
-2     2       01
0      0       (none)
-1     1       0
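The (length, value) coding of DC differences can be sketched as below. The handling of negative values follows the JPEG-style convention (bitwise complement of the magnitude), which is an assumption about what the slide intends; the function names are illustrative.

```python
def dc_differences(dc_values):
    """Differential encoding: each DC value minus the previous one (first minus 0)."""
    out, prev = [], 0
    for v in dc_values:
        out.append(v - prev)
        prev = v
    return out


def length_and_bits(value):
    """Return (length, bit string) for one difference value.

    Assumption: negative values are sent as the bitwise complement of their
    magnitude, so a leading 0 bit marks a negative value.
    """
    if value == 0:
        return 0, ""
    length = abs(value).bit_length()
    if value < 0:
        value += (1 << length) - 1
    return length, format(value, "0{}b".format(length))


for d in dc_differences([12, 13, 11, 11, 10]):
    print(d, length_and_bits(d))
```

Under this convention every non-zero value of a given length has a distinct bit pattern, so the decoder can recover the sign from the first bit once it knows the length.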
49
AC Coefficients

Smaller values
Compared to DC values
Contain zeros, even after zigzag scanning
35, 1, 3, 1, 2, 2, 1, -1, 0, 0, 0, 1, -1, 0, 0,
0, ..., 0
Skip the Zeros
Run Length Encoding

50
Run Length Encoding

Sequence (run of zeros) encoded as pairs of (run,
value)
Run: number of zeros in the run
Value: next non-zero value

Example sequence: 35, 1, 3, 1, 2, 2, 1, -1, 0,
0, 0, 1, -1, 0, ..., 0
RLE of the AC coefficients (the DC value 35 is coded separately): (0,1), (0,3), (0,1),
(0,2), (0,2), (0,1), (0,-1), (3,1), (0,-1), (0,0)
-> (0,0) indicates end of block data
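A short sketch of the (run, value) encoding, applied to the AC part of the example sequence. The DC value is left out because it is coded separately; `run_length_encode` is a hypothetical name.

```python
EOB = (0, 0)  # end-of-block marker


def run_length_encode(ac):
    """Encode AC coefficients as (run of zeros, next non-zero value) pairs."""
    last = len(ac)
    while last > 0 and ac[last - 1] == 0:  # trailing zeros are covered by EOB
        last -= 1
    pairs, run = [], 0
    for v in ac[:last]:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append(EOB)
    return pairs


zigzag = [35, 1, 3, 1, 2, 2, 1, -1, 0, 0, 0, 1, -1] + [0] * 51
print(run_length_encode(zigzag[1:]))
```

The 63 AC coefficients collapse to ten pairs, most of the saving coming from the single end-of-block marker that replaces the final run of zeros.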
51
Further Encoding
Oops! Which way?

Replace long binary strings by shorter strings
(code words)
Length of code word depends on frequency of
occurrence
Frequently occurring values get short codes
Huffman Coding
Provides tables of sequence and codeword
Has prefix property

52
Huffman Coding

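Build a binary tree bottom-up, starting from the least frequent symbols
Assign 1 to right edge and 0 to left edge

Sequence: AAAABBCD

Character  Frequency     Code
A          4/8 = 0.5     1
B          2/8 = 0.25    01
C          1/8 = 0.125   001
D          1/8 = 0.125   000

[Figure: Huffman tree. The root (1.0) splits into A (0.5, edge 1) and an internal node (0.5, edge 0); that node splits into B (0.25, edge 1) and an internal node (0.25, edge 0), which splits into C (0.125, edge 1) and D (0.125, edge 0).]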
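A minimal Huffman sketch for the sequence AAAABBCD. It repeatedly merges the two least frequent subtrees using a heap. The exact 0/1 labels depend on tie-breaking and on which edge gets which bit, so they may differ from the slide; the code lengths (1, 2, 3, 3) do not.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Return {symbol: bit string} for the symbols in text."""
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in Counter(text).items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)   # least frequent subtree
        f1, _, right = heapq.heappop(heap)  # next least frequent subtree
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]


codes = huffman_codes("AAAABBCD")
encoded = "".join(codes[ch] for ch in "AAAABBCD")
print(codes, len(encoded))
```

The eight symbols take 14 bits instead of the 16 needed with a fixed 2-bit code, and no code word is a prefix of another.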
53
Step 6 Encoding

Length fields of differentially encoded DC
coefficients are Huffman coded
The prefix property helps the decoder determine
each code unambiguously
Length and Run fields of AC coefficients are
grouped together and Huffman coded
These codes also have the prefix property

54
Let's Recall

Sub-sample chrominance components

DCT of each 8 x 8 block

Quantize DCT coefficients

Scan each block in particular order

Code coefficients using Variable Length Coding

These steps give Intra-coded (I) frames

DCT -> Q -> Scan -> VLC
55
Temporal Prediction

Similarity between consecutive frames
Most of the regions do not change
Small region changes due to motion
Use information of previous frame to predict
present frame

56
Gray-Scale Statistics of Prediction Error
[Figure: one frame of the original image pair and the prediction error image, each shown with its gray-scale histogram.]
57
How Does Motion Compensated Prediction Save Bits?
[Figure: the current macroblock X in the current picture is predicted from block F in the previous picture, located by the motion vector MV_F.]

Good prediction means small prediction error
Needs fewer bits to code
Send DCT coefficients of the (X - F) block
Motion vectors are differentially coded
Difference with motion vectors of neighbouring
blocks

50-80% savings in bits
58
Prediction Direction (Forward)
[Figure: the current picture is predicted from the previous picture (forward prediction).]
59
Prediction Direction (Backward)
[Figure: the previous picture is not a good match, so the current picture is predicted from the next picture (backward prediction).]
60
Predictive Frames

Depends on direction that gives better prediction
P-frame (predictive)
B-frame (bi-directional predictive)

61
Motion Estimation motion vector
62
ME - MAD

MAD: Mean Absolute Distortion

A search area is chosen in the reference picture, and the MAD is computed at each candidate position
The position with the minimum MAD in the search area is chosen, which
essentially gives the closest macroblock.
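The MAD search can be sketched as an exhaustive (full) search. Real encoders usually use faster search patterns, so this is an illustration rather than a typical implementation. The test frames are a synthetic ramp image shifted by a known amount, and all names are hypothetical.

```python
def mad(cur, ref, cx, cy, rx, ry, n):
    """Mean absolute distortion between an n x n block of cur at (cx, cy)
    and an n x n block of ref at (rx, ry)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
    return total / (n * n)


def full_search(cur, ref, cx, cy, n, search):
    """Return (min MAD, dx, dy) over all displacements within +/- search."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                cost = mad(cur, ref, cx, cy, rx, ry, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best


# Synthetic frames: every reference pixel is distinct, and the current frame
# is the reference moved 2 pixels right and 1 pixel down (with wrap-around).
ref = [[16 * y + x for x in range(16)] for y in range(16)]
cur = [[ref[(y - 1) % 16][(x - 2) % 16] for x in range(16)] for y in range(16)]

print(full_search(cur, ref, cx=6, cy=6, n=4, search=3))
```

With a search range of 3 the true displacement is found with zero distortion. With a range of 1 the best candidate has non-zero distortion, which mirrors the later slide about matches lying outside the search area.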

63
Forward Motion Estimation... used in P and B
frames ...
64
Example of Forward Motion Estimation. Case: Good
prediction for still objects.
Inter-coded means predictive-coded or not-coded
65
Example of Forward Motion Estimation. Case: Dealing
with featureless regions.
Macroblock Grid
Search Area
Previous I or P Picture. Within the search area,
many good matches are found. Encoder must pick
one and send appropriate motion vector.
Current P Picture. Current MB is shown with heavy
outline. Since a match is found, this MB is
intercoded.
66
Example of Forward Motion Estimation. Case: Good
prediction for linearly translating objects.
Macroblock Grid
Search Area
Current P Picture. Current MB is shown with heavy
outline. Since a match is found, this MB is
intercoded.
Previous I or P Picture. Within the search area,
a good match is found for this moving object.
Encoder sends appropriate forward motion vector.
67
Example of Forward Motion Estimation. Case: A good
prediction may be missed because it is outside
the search area.
Macroblock Grid
Search Area
Current P Picture. Current MB is shown with heavy
outline. Since no match is found, this MB is
intracoded.
Previous I or P Picture. Within the search area,
no good match is found. Note that a good match
would be found with a larger search area. Search
area is an important encoder design parameter.
68
Example of Forward Motion Estimation. Case: A good
prediction may come from an unrelated object.
Macroblock Grid
Search Area
Current P Picture. Current MB is shown with heavy
outline. Since a match is found, this MB is
intercoded.
Previous I or P Picture. Within the search area,
a good match is found, but within a different
object. There is no requirement that
motion vectors represent true motion of objects.
69
Example of Forward Motion Estimation. Case:
Prediction Error should have low energy.
Macroblock Grid
Prediction Error Picture, with MB Type and Motion
Vectors Superimposed. (I = Intra, P = Inter)
Previous I or P Picture
Current P Picture
70
Group of Pictures (GOP)

Intra (I) pictures: intraframe-only spatial DCT
Predicted (P) pictures: DCT with forward
prediction
Bi-directional (B) pictures: DCT with
bi-directional prediction

71
Anchor Pictures

I and P pictures
stored in two frame buffers in encoder and
decoder
form the basis for prediction of P and B pictures

72
I Pictures

DCT coded without reference to any other pictures
stored in a frame buffer in encoder and decoder
used as basis of prediction for entire GOP

73
P Pictures
Forward Prediction

DCT coded with reference to the preceding anchor
picture
stored in a frame buffer in encoder and decoder
use forward prediction only

74
B Pictures

DCT coded with reference to either the preceding
anchor picture, the following anchor picture, or
both
use forward, backward or bi-directional prediction

75
Forward Prediction

a forward-predicted macroblock depends on decoded
pixels from the immediately preceding anchor
picture
can be used to code macroblocks in P and B
pictures

76
Backward Prediction
Time

a backward-predicted macroblock depends on
decoded pixels from the immediately following
anchor picture
can only be used to code macroblocks in B pictures

77
Bi-directional (Interpolated) Prediction

a bi-directionally-predicted macroblock depends
on decoded pixels from the anchor pictures
immediately following and immediately preceding
can only be used to code macroblocks in B pictures

78
Review Encoding Steps
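[Figure: encoder loop. Residual Image = Original Image - Predicted Image; the residual passes through DCT, Q, Scan and VLC to give the Encoded Image. The quantized data also passes through Q^-1 and DCT^-1 to form the Reconstructed Image, which feeds Motion Estimation and Motion Compensation; these produce the Predicted Image and the Motion Vectors.]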
79
Remember

Motion compensation uses decoded picture as
reference image

WHY????
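One way to see the answer is a toy one-dimensional predictive coder, sketched below under simplifying assumptions (scalar samples instead of pictures, previous-sample prediction instead of motion compensation, a uniform quantizer). If the encoder predicts from the original samples, which the decoder never sees, quantization errors accumulate in the decoder; if it predicts from the decoded samples, both sides stay in step.

```python
def dpcm_decode_result(signal, step, closed_loop):
    """Return what the decoder reconstructs for each sample of signal.

    closed_loop=True : the encoder predicts from the decoded previous sample.
    closed_loop=False: the encoder predicts from the original previous sample.
    """
    encoder_reference = 0
    decoded = 0
    out = []
    for x in signal:
        level = round((x - encoder_reference) / step)  # quantized prediction error
        decoded += level * step                        # decoder adds it to its own prediction
        out.append(decoded)
        encoder_reference = decoded if closed_loop else x
    return out


signal = [3 * t for t in range(11)]
for mode in (False, True):
    recon = dpcm_decode_result(signal, step=4, closed_loop=mode)
    print(mode, max(abs(r - x) for r, x in zip(recon, signal)))
```

In the open-loop case the decoder drifts further from the source with every sample, while in the closed-loop case the error stays within half a quantizer step.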
80
A Typical Motion Estimation Architecture
81
Few More Terms

Group of Pictures (GOP)
Slice
Field Coding
Skipped Macroblocks
Rate Control

82
Picture Orderings
Group of Pictures

Two Distinct Picture Orderings
Display Order (input to encoder, output of
decoder)
Coding Order (output of encoder, input to
decoder)
These are different if B frames are present
B frames must be reordered so that future
anchor pictures are available for prediction.
Note that reordering causes DELAY!
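The reordering can be sketched as follows, assuming pictures are labeled with their type letter and display index (a labeling invented for this example). Each anchor picture is moved ahead of the B pictures that precede it in display order.

```python
def display_to_coding_order(display):
    """Reorder pictures so each B picture follows both of its anchors."""
    coding, pending_b = [], []
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)     # must wait for the next anchor
        else:
            coding.append(pic)        # I or P anchor goes out first
            coding.extend(pending_b)  # then the B pictures that depend on it
            pending_b = []
    coding.extend(pending_b)          # trailing B pictures, if any
    return coding


print(display_to_coding_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
```

The encoder has to hold B1 and B2 until P3 has been coded, which is the source of the delay noted above.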

83
Slice Structures

A slice is a collection of macroblocks in raster
scan order.
Restriction on slice sizes
MPEG-1 has none. Can be single MB or entire
picture.
MPEG-2 restricts a slice to be contained within a
row of macroblocks
MPEG-2 allows gaps between slices in General
Slice Structure
MPEG-2 defines Restricted Slice Structure, in
which no gaps are allowed. This is used in most
Profiles and Levels.

84
MPEG-2 Field/Frame DCT Coding

Frame DCT: Normal MPEG-1 mode of coding
Field DCT: Split into top and bottom fields
MPEG-2 encoder may choose Field DCT on any
macroblock.
Decoder must interpret coding flag correctly,
or severe errors will occur.

85
Skipped Macroblocks

MBs cannot be skipped in I Pictures
MBs can be skipped in P and B pictures if
certain rules apply

86
Rate Control

There may be delay between encoding and decoding
There should not be delay during displaying

Solutions
Introduce a buffer
Rate control

87
Rate Control

A buffer is used to smooth out the bit rate
Rate controller adjusts quantizer
Must avoid overflow and underflow of the decoder's buffer
(Video Buffer Verifier)
Buffer size affects image quality and overall
delay
Rate control algorithm is crucial for high
quality compression

88
MPEG Encoder Block
[Figure: MPEG encoder block diagram. Blocks shown: Video In, subtractor, DCT, Q, RLC, VLC, MUX, Buffer, Video Out, Rate Control, Q^-1, DCT^-1, SUM, Motion Compensator, Motion Estimator; signals shown: Prediction, Prediction Picture, Motion Vectors.]
89
MPEG-2 Video Decoding Process
NOTE: This is a simplified, high-level
functional diagram that integrates several
separate diagrams in the MPEG-2 Video Spec
(ISO/IEC 13818-2).
90
Video Buffer Verifier (VBV)

The VBV is a hypothetical input rate buffer for
the video decoder,
connected to the output of an encoder.
The encoder keeps track of the VBV fullness and
must ensure that it does not overflow or
underflow.
Assuming constant end-to-end delay, the encoder
buffer is the mirror image of the VBV.
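A toy model of the fullness tracking described above, under simplifying assumptions: a constant number of bits arrives per picture period, each picture is removed from the buffer instantly at its decode time, and the checks are made once per picture. The function name and the numbers are made up for illustration.

```python
def simulate_vbv(picture_bits, bits_per_period, buffer_size, initial_fullness):
    """Track decoder buffer fullness picture by picture.

    Returns (fullness after each picture period, status string).
    """
    fullness = initial_fullness
    trace = []
    for n, bits in enumerate(picture_bits):
        if bits > fullness:
            # The picture has not fully arrived by its decode time.
            return trace, "underflow at picture {}".format(n)
        fullness -= bits             # decoder removes the picture
        fullness += bits_per_period  # channel delivers more bits
        if fullness > buffer_size:
            return trace, "overflow after picture {}".format(n)
        trace.append(fullness)
    return trace, "ok"


print(simulate_vbv([300, 100, 100, 100], 150, 400, 300))
print(simulate_vbv([300, 300], 150, 400, 300))
print(simulate_vbv([50, 50], 150, 400, 300))
```

Pictures that are too large drain the buffer faster than the channel refills it (underflow), while a run of very small pictures lets it fill past its capacity (overflow); a rate controller steers the quantizer to stay between the two.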

91
MPEG's VBV Water Tank Analogy(Normal Operation)
92
MPEG's VBV Water Tank Analogy(Overflow Condition)
93
MPEG's VBV Water Tank Analogy(Underflow
Condition)
94
VBV Buffer Size and VBV Delay
95
CBR vs. VBR VBV Models
[Figure: VBV fullness over time for the CBR and VBR models.]
96
MUX- Video Bitstream
97
Sequence

For CD-ROM applications, sequences can be used to
indicate relatively long clips (e.g. shots,
scenes or entire movies)
For broadcast applications, sequence headers are
usually sent frequently (e.g., every GOP) so that
key bitstream info is obtained at channel changes

98
Major Application Areas

MPEG-1 Video
1 - 3 Mbps CD-ROM Multimedia
Telecommunications and Near Video on Demand
MPEG-2 Video
3 - 15 Mbps SDTV Broadcast (e.g., ATSC and DVB)
Digital Video Disk (DVD)
15 - 20 Mbps HDTV Broadcast (e.g., ATSC)
25 - 50 Mbps SDTV Production
100 - 300 Mbps HDTV Production

99
Concluding Remarks

The MPEG video compression standard is the result
of many years of competitive and, ultimately,
collaborative effort among many commercial and
academic laboratories
MPEG video compression can increase a
broadcaster's channel capacity by 8x or more
MPEG video compression is being used successfully
in many application areas, such as
CD-ROM and DVD multimedia, Satellite Broadcast,
Terrestrial Broadcast, Cable Broadcast, Telco
Video-on-Demand Systems