Title: Video Compression and MPEG
1Video Compression and MPEG
2Video Basics
3Image
Video Cable
Video Monitor
4Video Basics - Scanning
- Scanning is the process of sampling a continuously varying 2-D signal.
- Raster scanning converts a 2-D image intensity into a 1-D waveform.
5Video Basics - The Scanning Raster
625 lines (PAL- Europe)
525 lines (NTSC)
Horizontal Blanking
Vertical Blanking
Active Video
6Video Basics - The Progressive Raster
[Figure: scan lines viewed edge-on; axes x, y and time; regions labelled Active Video and Vertical Blanking]
Note: All scan lines are sampled at each time instant.
7Video Basics - The Interlaced Raster
8Video Basics Interlaced Raster Scan
- Interlaced raster scan (IRS) scans the picture by sampling two fields at different times, such that consecutive lines of a frame belong to alternate fields.
- This allows slow-moving objects to be perceived with higher vertical detail and fast-moving objects at a higher temporal rate.
- It is used extensively in TV because of bandwidth, flicker, and resolution considerations.
9Common Rasters for Video Coding
10Interlacing
- Background
  - In the 1930s, interlaced scanning was developed as a bandwidth-saving technique.
  - Persistence of vision causes the two fields to fuse into a single image, without flicker.
  - All broadcasting today uses interlaced scanning.
- Advantages
  - High vertical detail is retained for still portions of the scene.
- Drawbacks
  - Reduced vertical detail in moving areas
  - Flicker at the edges of objects (e.g., text), which is why the computer industry uses progressive scanning for monitors.
  - More complicated signal processing for resizing, frame rate conversion, etc.
11Human Vision Basics
- The Human Visual System (HVS) has limitations that can be exploited in video system design:
  - limited response to black-and-white detail
  - even more limited response to color detail
  - image motion appears fluid at rates above 24 Hz
  - limited ability to track rapidly moving objects
  - insensitivity to noise
    - at object edges
    - in highly detailed areas of a scene
    - in bright areas of a scene
    - immediately after scene changes
12Colorimetry Basics
- In broadcast and studio applications, the gamma-corrected RGB taking primaries are transformed to YC1C2 transmission primaries.
- Y is the luminance (luma) component; C1 and C2 are the chrominance (chroma, or color difference) components.
- To exploit the HVS's reduced spatial response to chroma, C1 and C2 are further band-limited in spatial frequency compared to Y.
- The exact transformation matrix is system-dependent.
13Colorimetry Basics
- In 8-bit implementations,
  - Y occupies 220 levels: [16, 235]
  - Cr and Cb occupy 225 levels: [16, 240]
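The level ranges above can be illustrated with a small conversion sketch. Since the exact matrix is system-dependent, this example assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114); the function name `rgb_to_ycbcr_601` is made up for illustration and is not from the slides.

```python
def rgb_to_ycbcr_601(r, g, b):
    """Map gamma-corrected 8-bit R'G'B' (0..255) to 8-bit Y, Cb, Cr.

    Assumption: BT.601 luma weights. Other systems use a different matrix.
    """
    # Normalise the inputs to the range 0..1.
    rn, gn, bn = r / 255.0, g / 255.0, b / 255.0
    y = 0.299 * rn + 0.587 * gn + 0.114 * bn   # luma, 0..1
    pb = (bn - y) / 1.772                      # blue difference, -0.5..0.5
    pr = (rn - y) / 1.402                      # red difference, -0.5..0.5
    # Y spans 219 steps above 16 (16..235); chroma spans 224 steps
    # centred on 128 (16..240).
    return (round(16 + 219 * y),
            round(128 + 224 * pb),
            round(128 + 224 * pr))


black = rgb_to_ycbcr_601(0, 0, 0)
white = rgb_to_ycbcr_601(255, 255, 255)
blue = rgb_to_ycbcr_601(0, 0, 255)
```

Black and white land on the ends of the luma range (16 and 235) with neutral chroma (128), and a saturated blue reaches the top of the Cb range (240).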
14Compression
- Data = Information + Redundancy
  - "I need a glass of water, which is scientifically called H2O"
  - carries the same information as "I need a glass of water"
- Compression = Reducing Redundancy
15Redundancy
- Spatial
- Similarity in pattern due to position
- Temporal
- Similarity in pattern over time
- Statistical
- Similarity due to pattern of occurrence
16Image Compression Standards
- Binary (Bi-level, BW) images
- ITU-T Group 3, Group 4 (Fax) (1980), JBIG (1994), JBIG2
- Continuous Tone Still Images
- Both Gray and Colour Image
- JPEG (1992)
- JPEG 2000
- Moving Pictures
- MPEG-1 (1994), MPEG-2 (1995)
- MPEG-4 (96-03), MPEG-7, MPEG-21
- H.261 (1990), H.263 (1995), H.264 (ongoing)
17Image Compression -- Needs
- Image (Signal) Processing
- Decorrelation, Transformation
- Reduce redundancy, compact representation
- Quantization (Psychovisual analysis)
- Mask redundant data, loss of information
- Reduce entropy
- Entropy Encoding (Information Theory)
- Encode data losslessly
- Compact representation for compression
- Variable-length (Run-length, Huffman, Arithmetic,
etc.)
18Entropy
- E = average amount of information contained per source symbol
  - $E = -\sum_k p(a_k)\,\log_2 p(a_k)$
- Limit of compression
- Example
- Pre-processing can improve compression
19Example (entropy)
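- Data: 1 2 0 1 1 2 3 1 2 3 1 1 1 2 2 2
- Symbols: 0, 1, 2, 3
- Probability: 0.0625, 0.4375, 0.375, 0.125
- $E = -\sum_i p(a_i)\,\log p(a_i)$
  - $= -\big((-1.204)(0.0625) + (-0.359)(0.4375) + (-0.426)(0.375) + (-0.903)(0.125)\big)$
  - $= 0.505$ (the logarithms in this calculation are base 10; with $\log_2$ the same data give $E \approx 1.677$ bits per symbol)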
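The entropy calculation can be checked with a short sketch. The helper name `entropy` and its `base` parameter are illustrative choices, not from the slides; base 10 reproduces the 0.505 figure, while base 2 gives the result in bits per symbol.

```python
import math
from collections import Counter


def entropy(symbols, base=2):
    """Return -sum(p * log(p)) over the empirical symbol probabilities."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


data = [1, 2, 0, 1, 1, 2, 3, 1, 2, 3, 1, 1, 1, 2, 2, 2]
h_bits = entropy(data, base=2)     # about 1.677 bits per symbol
h_base10 = entropy(data, base=10)  # about 0.505, the figure on the slide
```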
20Pre-processing (Entropy)
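- Pre-processing: $a_k \to a_k - a_{k-1}$,
  - where $k \ge 1$, $a_0 = 0$
- Data: 1 1 2 1 0 1 1 2 1 1 2 0 0 1 0 0 (magnitudes; the three differences of 2 are $-2$ when the sign is kept)
- S: 0, 1, 2
- P: 5/16, 8/16, 3/16 = 0.3125, 0.5, 0.1875
- E = 0.445 (computed with base-10 logarithms; about 1.477 bits per symbol with $\log_2$)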
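As a rough check of the pre-processing step, the sketch below differences the sample data and compares entropies before and after. The function names are illustrative. It keeps the sign of each difference so the step stays invertible; the magnitudes match the data listed above and the entropy is the same either way.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Empirical entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def difference(data):
    """Replace each sample by its difference from the previous one (a_0 = 0)."""
    out, prev = [], 0
    for a in data:
        out.append(a - prev)
        prev = a
    return out


data = [1, 2, 0, 1, 1, 2, 3, 1, 2, 3, 1, 1, 1, 2, 2, 2]
diffs = difference(data)
before = entropy_bits(data)   # about 1.677 bits per symbol
after = entropy_bits(diffs)   # about 1.477 bits per symbol
```

Differencing concentrates the probability mass on fewer values, which is why the entropy drops.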
21What do we want in video?
- Real time (Live viewing)
- Low delay (No jitter)
- Good quality (Minimal loss of information)
- Easy and useful interactivity
- Play, pause, random access, fast forward
- Something more?
  - Content-based retrieval, editable, movie quality (high motion, spatial scalability)
22Target area of DVT
- Broadcasting
- High bandwidth
- Better quality
- No delay
- Internet (IP Network)
- Low Bandwidth
- Restricted quality
- Delay
- Jitter
- Loss of data
- Quality degradation
- Wireless
- Low bandwidth
- Small resolution
- Future Technology
- Interactive
- Broadcasting
- Advertisement
- Games
- Multimedia
23Solutions
- Decrease size of source
- Compression
- Retain quality
- Eat the cake and have it too
- Better Delivery
- Handle delay
- Conceal error
- Post-processing
24Video Compression
25What is Video Compression?...Orange Juice
Analogy...
26So? What to do?
- Exploit limitations in the Human Visual System
  - Limited color sensitivity (downscale CB and CR)
  - Limited sensitivity to edges (reduce high frequency)
  - Can attain 50:1 or more compression efficiency
- Remove spatial and temporal redundancy that exists in natural video imagery
  - correlation itself can be removed in a lossless fashion
  - only realizes about 2:1 compression efficiency
27Step 1 Pre-processing
- Pre-processing
  - Color conversion
    - RGB → YCBCR
  - Downsizing color components
    - 4:2:0, 4:2:2
- ⇒ Reduction in source size
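To make the size reduction concrete, here is a minimal sketch of chroma downsizing. It assumes 4:2:0 is produced by averaging each 2x2 neighbourhood of a chroma plane; real systems may use other filters and sample positions. Both function names are illustrative.

```python
def subsample_420(plane):
    """Average each 2x2 neighbourhood of a chroma plane (a list of rows).

    Assumption: the plane has even width and height.
    """
    out = []
    for y in range(0, len(plane) - 1, 2):
        row = []
        for x in range(0, len(plane[0]) - 1, 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append(total / 4)
        out.append(row)
    return out


def samples_per_frame(width, height, chroma_format):
    """Count Y + Cb + Cr samples per frame for a chroma format string."""
    # Fraction of luma resolution kept by EACH chroma plane.
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[chroma_format]
    return int(width * height * (1 + 2 * chroma_fraction))


small = subsample_420([[0, 1, 2, 3],
                       [4, 5, 6, 7],
                       [8, 9, 10, 11],
                       [12, 13, 14, 15]])
full = samples_per_frame(720, 576, "4:4:4")
reduced = samples_per_frame(720, 576, "4:2:0")
```

Under these assumptions a 4:2:0 frame carries half as many samples as the 4:4:4 original, before any transform coding is applied.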
28Chroma Formats and Picture Sizes
29Macroblock Structures
30Step 2 Transformation
- Transformation
- Want to discard high frequency components
- Little visual quality loss
- Spatial domain to frequency domain
- Discrete Fourier Transform, Discrete Cosine
Transform
31DFT
- Any periodic function F(t), with period T, may be
represented by an infinite series of the form.
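The formula from this slide did not survive in the text. For reference, the standard Fourier series of a function $F(t)$ with period $T$ is usually written as below; the coefficient names $a_n$ and $b_n$ are the conventional ones and are an assumption about the notation the slide used.

```latex
F(t) = \frac{a_0}{2}
     + \sum_{n=1}^{\infty} \left[ a_n \cos\!\left(\frac{2\pi n t}{T}\right)
                                + b_n \sin\!\left(\frac{2\pi n t}{T}\right) \right],
\qquad
a_n = \frac{2}{T}\int_{0}^{T} F(t)\cos\!\left(\frac{2\pi n t}{T}\right) dt,
\quad
b_n = \frac{2}{T}\int_{0}^{T} F(t)\sin\!\left(\frac{2\pi n t}{T}\right) dt .
```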
32Cosine Transform
- Original image: M x N
- A(i,j): intensity at location (i,j)
- B(k1, k2): DCT coefficients
33DCT and IDCT Formulas
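The formulas from this slide are not present in the text. As a stand-in, the sketch below implements the widely used orthonormal 2-D DCT-II and its inverse, using the names A(i,j) and B(k1,k2) from the previous slide. The scaling convention is an assumption and the slide's own normalisation may differ. It is a direct, unoptimised evaluation meant only to show the structure.

```python
import math


def _scale(k, n):
    """Orthonormal scaling factor for frequency index k along a length-n axis."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct2(a):
    """Forward 2-D DCT of an M x N block A(i, j), giving B(k1, k2)."""
    m, n = len(a), len(a[0])
    b = [[0.0] * n for _ in range(m)]
    for k1 in range(m):
        for k2 in range(n):
            s = 0.0
            for i in range(m):
                for j in range(n):
                    s += (a[i][j]
                          * math.cos((2 * i + 1) * k1 * math.pi / (2 * m))
                          * math.cos((2 * j + 1) * k2 * math.pi / (2 * n)))
            b[k1][k2] = _scale(k1, m) * _scale(k2, n) * s
    return b


def idct2(b):
    """Inverse 2-D DCT, recovering A(i, j) from B(k1, k2)."""
    m, n = len(b), len(b[0])
    a = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            s = 0.0
            for k1 in range(m):
                for k2 in range(n):
                    s += (_scale(k1, m) * _scale(k2, n) * b[k1][k2]
                          * math.cos((2 * i + 1) * k1 * math.pi / (2 * m))
                          * math.cos((2 * j + 1) * k2 * math.pi / (2 * n)))
            a[i][j] = s
    return a


flat = [[100] * 8 for _ in range(8)]
flat_coeffs = dct2(flat)  # all energy ends up in the DC term B(0, 0)
```

With this normalisation a flat 8x8 block of value 100 produces a DC coefficient of 800 and AC coefficients that are zero up to rounding error.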
34DCT
- DCT is an orthogonal transformation
- 2-D DCT is separable in x and y dimensions
- Has good energy compaction properties
- Efficient hardware realization
- Theoretically lossless, but slightly lossy in
practice due to round off errors
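35DCT (contd)
[Figure: an 8x8 block of pixels passes through the 8x8 forward DCT to give an 8x8 block of DCT coefficients. After the DCT, the block is labelled with the DC coefficient, horizontal frequency running from low to high, and vertical frequency running from low to high.]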
36DCT Example
[Figure: "Flower Garden" frame with sample blocks of 8x8 pixels and their DCT coefficients. Block types shown: DC, flat area, vertical edge, horizontal edge, diagonal line, single pixel.]
37 2-D DCT Basis Images
38Advantage DCT
- Separates the image into parts
  - Spectral sub-bands of differing importance (with respect to the image's visual quality).
- All DCT multiplications are real
  - lowers the number of required multiplications compared to DFT
- For most images, much of the signal energy lies at low frequencies
39Step 3 Quantization
STEPS
- Divide the DCT coefficients by a number
  - The divisor is a frequency-dependent value
- Round the result to the nearest integer, or truncate it
- Inverse quantization is like multiplication
- Quantization coefficients can be tailored to the noise sensitivity of the Human Visual System
- Quantization is LOSSY!
  - Quantization causes information to be irretrievably lost
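The quantization steps can be sketched in a few lines. The divisor values below are arbitrary illustrative numbers, not a table from any standard, and the function names are made up for this example.

```python
def quantize(coeffs, qmatrix):
    """Divide each coefficient by its frequency-dependent divisor and round."""
    # Note: Python's round() sends exact .5 ties to the nearest even integer.
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qmatrix)]


def dequantize(levels, qmatrix):
    """Inverse quantization: multiply each level by the same divisor."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qmatrix)]


coeffs = [[800, 37],
          [-22, 5]]
qmatrix = [[16, 11],   # hypothetical divisors, coarser at higher frequency
           [12, 14]]
levels = quantize(coeffs, qmatrix)
restored = dequantize(levels, qmatrix)
```

Only the coefficient that is an exact multiple of its divisor comes back unchanged; the other three are altered, which is the irretrievable loss described above.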
40Quantization - Example
41Quantization Effect
42Quantization Artifacts
43Artifacts - Example
44Step 4 Spatial Prediction
- Neighbouring pixels have similarity
- DCT coefficients of neighbouring blocks are correlated
  - Consider the Left (L), Top (T), and Left-Top (L-T) neighbours
- Differential coefficients are smaller
  - Fewer bits required to encode
- Encode the difference coefficients
[Figure: the current block with its similar neighbours T, L-T, and L]
45Difference Image
46Step 5 Scanning Order
- It is a rearrangement of the coefficients
- Most of the coefficients become zero after quantization
- Zigzag Scan Order

Quantized 8x8 block (DC = 35):

 35   1   2   1   0   0   0   0
  3   2  -1   0   0   0   0   0
  1   0  -1   0   0   0   0   0
  0   1   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

Zigzag output: 35, 1, 3, 1, 2, 2, 1, -1, 0, 0, 0, 1, -1, 0, 0, 0, ..., 0
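The zigzag rearrangement can be reproduced with a short sketch. The function names are illustrative, and the block is the example from this slide, with the DC value 35 in the top-left position.

```python
def zigzag_indices(n=8):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col, one anti-diagonal at a time
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run from bottom-left to top-right, odd ones the other way.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def zigzag(block):
    """Flatten a square block into a list following the zigzag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]


block = [
    [35, 1, 2, 1, 0, 0, 0, 0],
    [3, 2, -1, 0, 0, 0, 0, 0],
    [1, 0, -1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(4)]
scanned = zigzag(block)
```

The scan groups the non-zero low-frequency values at the front and leaves one long run of zeros at the end.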
47DC Coefficients
- DC is the average luminance/chrominance
- Largest of the 64 block coefficients
- Kept as high as possible
- DC moves slowly between blocks
- ⇒ Differential encoding
- Example DC values: 12, 13, 11, 11, 10, ...
- Differences: 12, 1, -2, 0, -1, ...
48Differential Encoding
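- Values are not sent as they are (as raw bits)
- Coded as a (length, value) pair
  - Length: number of bits used
  - Value: actual bits used to represent the value
- Negative values are sent as the one's complement of their magnitude, so -2 (magnitude 10) becomes 01 and -1 becomes 0
- Example

Value  Length  Code
12     4       1100
1      1       1
-2     2       01
0      0       (none)
-1     1       0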
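A minimal sketch of the (length, value) coding follows. It assumes the JPEG-style convention in which a negative difference is sent as the one's complement of its magnitude; the function names are illustrative.

```python
def dc_differences(dc_values):
    """Difference each DC value from the previous block's DC (start from 0)."""
    out, prev = [], 0
    for dc in dc_values:
        out.append(dc - prev)
        prev = dc
    return out


def length_and_bits(value):
    """Return (length, bit string) for one differential value.

    Assumption: negative values are stored as value + 2**length - 1,
    i.e. the one's complement of the magnitude.
    """
    if value == 0:
        return 0, ""
    length = abs(value).bit_length()
    code = value if value > 0 else value + (1 << length) - 1
    return length, format(code, "0{}b".format(length))


diffs = dc_differences([12, 13, 11, 11, 10])
pairs = [length_and_bits(d) for d in diffs]
```

The decoder can tell positive from negative values by the leading bit: within a given length, codes starting with 1 are positive and codes starting with 0 are negative.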
49AC Coefficients
- Smaller values
  - Compared to DC values
- Contain zeros, even after zigzag scanning
  - 35, 1, 3, 1, 2, 2, 1, -1, 0, 0, 0, 1, -1, 0, 0, 0, ..., 0
- Skip the zeros
  - Run Length Encoding
50Run Length Encoding
- A sequence (run of zeros) is encoded as (run, value) pairs
  - Run: number of zeros in the run
  - Value: next non-zero value
Example
Sequence: 35, 1, 3, 1, 2, 2, 1, -1, 0, 0, 0, 1, -1, 0, ..., 0
RLE of the AC coefficients (the DC value 35 is coded separately): (0,1), (0,3), (0,1), (0,2), (0,2), (0,1), (0,-1), (3,1), (0,-1), (0,0)
(0,0) indicates end of block data
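The run-length step can be sketched as follows. The function name is illustrative. The DC coefficient is left out because it is coded differentially, and (0,0) is used as the end-of-block marker as on the slide.

```python
def run_length_encode(ac):
    """Encode AC coefficients as (run of zeros, next non-zero value) pairs."""
    # Trailing zeros are not coded; the end-of-block marker covers them.
    last = len(ac)
    while last > 0 and ac[last - 1] == 0:
        last -= 1
    pairs, run = [], 0
    for v in ac[:last]:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))  # end of block
    return pairs


zigzag_seq = [35, 1, 3, 1, 2, 2, 1, -1, 0, 0, 0, 1, -1] + [0] * 51
rle = run_length_encode(zigzag_seq[1:])  # skip the DC coefficient
```

For the example block, 63 AC coefficients collapse to ten pairs, including the end-of-block marker.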
51Further Encoding
- Replace long binary strings by shorter strings (code words)
- The length of a code word depends on the frequency of occurrence
  - Frequently occurring symbols get short codes
- Huffman Coding
  - Provides tables of sequence and codeword
  - Has the prefix property
52Huffman Coding
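- Build a binary tree starting from the least frequent symbols
- Assign 1 to the right edge and 0 to the left edge
Sequence: AAAABBCD

Character  Frequency     Code
A          4/8 = 0.5     1
B          2/8 = 0.25    01
C          1/8 = 0.125   001
D          1/8 = 0.125   000

[Figure: Huffman tree. The root (1.0) splits into A (0.5, edge 1) and an internal node (0.5, edge 0); that node splits into B (0.25, edge 1) and an internal node (0.25, edge 0), which splits into C (0.125, edge 1) and D (0.125, edge 0).]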
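The tree construction can be sketched with a priority queue. This is a generic Huffman builder, not the slide's exact procedure. How ties are broken and which edge gets 0 or 1 are arbitrary here, so the bit patterns may differ from the table above while the code lengths match. The function name is illustrative.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return {symbol: bit string} built by repeatedly merging the two
    least frequent nodes."""
    counts = Counter(text)
    if len(counts) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


codes = huffman_codes("AAAABBCD")
total_bits = sum(len(codes[ch]) for ch in "AAAABBCD")
```

For AAAABBCD the code lengths come out as 1, 2, 3 and 3 bits for A, B, C and D, so the eight symbols take 14 bits instead of 16 with a fixed 2-bit code.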
53Step 6 Encoding
- The Length field of differentially encoded DC coefficients is Huffman coded
  - The prefix property helps the decoder determine each code unambiguously
- The Length and Run fields of AC coefficients are grouped together and Huffman coded
  - This code also has the prefix property
54Let's Recall
- Sub-sample chrominance components
- Apply the DCT to each block
- Quantize DCT coefficients
- Scan each block in a particular order
- Code coefficients using Variable Length Coding
These steps give Intra-coded (I) frames
[Figure: DCT → Q → Scan → VLC]
55Temporal Prediction
- Similarity between consecutive frames
- Most of the regions do not change
- Small region changes due to motion
- Use information of previous frame to predict
present frame
56Gray-Scale Statistics of Prediction Error
[Figure: one frame of the original image pair with its histogram, and the prediction error with its histogram]
57How Does Motion Compensated Prediction Save Bits?
[Figure: the current macroblock X in the current picture is predicted from block F in the previous picture, displaced by the motion vector MVF]
- Good prediction means small prediction error
  - Needs fewer bits to code
- Send DCT coefficients of the (X - F) block
- Motion vectors are differentially coded
  - Difference from the motion vectors of neighbouring blocks
- 50 - 80% savings in bits
58Prediction Direction (Forward)
[Figure: Previous and Current pictures, with the prediction labelled Forward]
59Prediction Direction (Backward)
[Figure: Previous, Current, and Next pictures, with the label "Not a good match"]
60Predictive Frames
- Depends on direction that gives better prediction
- P-frame (predictive)
- B-frame (bi-directional predictive)
61Motion Estimation motion vector
62ME - MAD
- MAD: Mean Absolute Distortion
- A search area is chosen over which the MADs are computed
- The candidate with the minimum MAD in the search area is chosen; it is essentially the closest-matching macroblock.
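A minimal full-search sketch of MAD-based matching is shown below. Pictures are plain lists of rows, and the block size and search range are free parameters. The function names and the exhaustive search strategy are illustrative assumptions; practical encoders often use faster search patterns.

```python
def mad(cur, ref, cx, cy, rx, ry, size):
    """Mean absolute difference between a block of cur at (cx, cy)
    and a block of ref at (rx, ry)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total / (size * size)


def full_search(cur, ref, cx, cy, size, search_range):
    """Try every displacement within +/- search_range and return
    (best MAD, mvx, mvy) for the current block at (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = cx + mvx, cy + mvy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = mad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, mvx, mvy)
    return best


# Synthetic test picture: every pixel value is unique.
ref = [[16 * y + x for x in range(16)] for y in range(16)]
# Current picture: the same content moved 2 pixels right and 1 pixel down.
cur = [[ref[(y - 1) % 16][(x - 2) % 16] for x in range(16)] for y in range(16)]
best = full_search(cur, ref, cx=8, cy=8, size=4, search_range=3)
```

Here the search finds a perfect match (MAD 0) at displacement (-2, -1), pointing back to where the block's content sat in the reference picture.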
63Forward Motion Estimation... used in P and B
frames ...
64Example of Forward Motion Estimation, Case: Good prediction for still objects.
Inter-coded means predictive-coded or not-coded
65Example of Forward Motion Estimation, Case: Dealing with featureless regions.
Macroblock Grid
Search Area
Previous I or P Picture. Within the search area,
many good matches are found. Encoder must pick
one and send appropriate motion vector.
Current P Picture. Current MB is shown with heavy
outline. Since a match is found, this MB is
intercoded.
66Example of Forward Motion Estimation, Case: Good prediction for linearly translating objects.
Macroblock Grid
Search Area
Current P Picture. Current MB is shown with heavy
outline. Since a match is found, this MB is
intercoded.
Previous I or P Picture. Within the search area,
a good match is found for this moving object.
Encoder sends appropriate forward motion vector.
67Example of Forward Motion Estimation, Case: A good prediction may be missed because it is outside the search area.
Macroblock Grid
Search Area
Current P Picture. Current MB is shown with heavy
outline. Since no match is found, this MB is
intracoded.
Previous I or P Picture. Within the search area,
no good match is found. Note that a good match
would be found with a larger search area. Search
area is an important encoder design parameter.
68Example of Forward Motion Estimation, Case: A good prediction may come from an unrelated object.
Macroblock Grid
Search Area
Current P Picture. Current MB is shown with heavy
outline. Since a match is found, this MB is
intercoded.
Previous I or P Picture. Within the search area,
a good match is found, but within a different
object. There is no requirement that
motion vectors represent true motion of objects.
69Example of Forward Motion Estimation, Case: Prediction Error should have low energy.
Macroblock Grid
Prediction Error Picture, with MB Type and Motion Vectors Superimposed. (I = Intra, P = Inter)
Previous I or P Picture
Current P Picture
70Group of Pictures (GOP)
- Intra (I) pictures → intraframe-only spatial DCT
- Predicted (P) pictures → DCT with forward prediction
- Bi-directional (B) pictures → DCT with bi-directional prediction
71Anchor Pictures
- I and P pictures
- stored in two frame buffers in encoder and decoder
- form the basis for prediction of P and B pictures
72I Pictures
- DCT coded without reference to any other pictures
- stored in a frame buffer in encoder and decoder
- used as basis of prediction for entire GOP
73P Pictures
Forward Prediction
- DCT coded with reference to the preceding anchor picture
- stored in a frame buffer in encoder and decoder
- use forward prediction only
74B Pictures
- DCT coded with reference to either the preceding anchor picture, the following anchor picture, or both
- use forward, backward or bi-directional prediction
75Forward Prediction
- a forward-predicted macroblock depends on decoded pixels from the immediately preceding anchor picture
- can be used to code macroblocks in P and B pictures
76Backward Prediction
- a backward-predicted macroblock depends on decoded pixels from the immediately following anchor picture
- can only be used to code macroblocks in B pictures
77Bi-directional (Interpolated) Prediction
- a bi-directionally-predicted macroblock depends on decoded pixels from the anchor pictures immediately following and immediately preceding
- can only be used to code macroblocks in B pictures
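78Review Encoding Steps
[Figure: encoder block diagram. The Predicted Image is subtracted from the Original Image to give the Residual Image, which passes through DCT → Q → Scan → VLC to produce the Encoded Image. A feedback path through Q-1 and DCT-1 rebuilds the Reconstructed Image, which feeds Motion Estimation and Motion Compensation. Motion Estimation outputs the Motion Vectors.]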
79Remember
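- Motion compensation uses the decoded picture as reference image
WHY? The decoder only has decoded pictures, so predicting from them keeps encoder and decoder in step. Predicting from the original pictures would let quantization errors accumulate from picture to picture.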
80A Typical Motion Estimation Architecture
81Few More Terms
- Group of Pictures (GOP)
- Slice
- Field Coding
- Skipped Macroblocks
- Rate Control
82Picture Orderings
Group of Pictures
- Two Distinct Picture Orderings
  - Display Order (input to encoder, output of decoder)
  - Coding Order (output of encoder, input to decoder)
- These are different if B frames are present
- B frames must be reordered so that future anchor pictures are available for prediction.
Note that reordering causes DELAY!
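The reordering can be sketched as follows, assuming pictures are given as labels such as "I0" or "B1" whose first character is the picture type. The labels and function name are illustrative.

```python
def display_to_coding_order(display):
    """Move each anchor (I or P) picture ahead of the B pictures that
    precede it in display order, since those B pictures need it."""
    coding, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)     # hold until the next anchor is sent
        else:
            coding.append(pic)        # anchor goes out first
            coding.extend(pending_b)  # then the B pictures that depend on it
            pending_b = []
    coding.extend(pending_b)          # trailing B pictures with no later anchor here
    return coding


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
coding = display_to_coding_order(display)
```

With two B pictures between anchors, each anchor is sent ahead of the two B pictures that precede it in display order. This is the source of the delay noted above.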
83Slice Structures
- A slice is a collection of macroblocks in raster scan order.
- Restriction on slice sizes
  - MPEG-1 has none. A slice can be a single MB or an entire picture.
  - MPEG-2 restricts a slice to be contained within a row of macroblocks
- MPEG-2 allows gaps between slices in General Slice Structure
- MPEG-2 defines Restricted Slice Structure, in which no gaps are allowed. This is used in most Profiles and Levels.
84MPEG-2 Field/Frame DCT Coding
- Frame DCT: Normal MPEG-1 mode of coding
- Field DCT: Split into top and bottom fields
- An MPEG-2 encoder may choose Field DCT on any macroblock.
- The decoder must interpret the coding flag correctly, or severe errors will occur.
85Skipped Macroblocks
- MBs cannot be skipped in I Pictures
- MBs can be skipped in P and B pictures if
certain rules apply
86Rate Control
- There may be delay between encoding and decoding
- There should not be delay during displaying
- Solutions
  - Introduce a buffer
  - Rate control
87Rate Control
- A buffer is used to smooth out the bit rate
- Rate controller adjusts the quantizer
- Overflow and underflow of the decoder's buffer (Video Buffer Verifier) must be avoided
- Buffer size affects image quality and overall delay
- Rate control algorithm is crucial for high quality compression
88MPEG Encoder Block
[Figure: MPEG encoder block diagram. Blocks: Video In, subtractor, DCT, Q, RLC, VLC, MUX, Buffer, Video Out, Rate Control, Q-1, DCT-1, SUM, Prediction, Motion Compensator, Motion Estimator. Signals: Prediction Picture, Motion Vectors.]
89MPEG-2 Video Decoding Process
NOTE: This is a simplified, high-level functional diagram that integrates several separate diagrams in the MPEG-2 Video Spec (ISO/IEC 13818-2).
90Video Buffer Verifier (VBV)
- The VBV is a hypothetical input rate buffer for the video decoder
  - connected to the output of an encoder.
- The encoder keeps track of the VBV fullness
  - must ensure that it does not overflow or underflow.
- Assuming constant end-to-end delay, the encoder buffer is the mirror image of the VBV.
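The bookkeeping an encoder might do to track VBV fullness can be sketched as below. This is a simplified constant-bit-rate model: bits arrive at a fixed rate, and each picture's bits are removed all at once at its decode time. The function name, parameters, and numbers are illustrative and are not taken from the MPEG specification.

```python
def simulate_vbv(picture_bits, bit_rate, frame_rate, buffer_size, initial_fullness):
    """Track buffer fullness picture by picture.

    Returns (trace of fullness after each picture period, status), where
    status is "ok", "underflow" or "overflow".
    """
    fullness = initial_fullness
    bits_per_period = bit_rate / frame_rate  # bits arriving per picture period
    trace = []
    for bits in picture_bits:
        if bits > fullness:
            return trace, "underflow"        # picture not fully received in time
        fullness -= bits                     # decoder removes the whole picture
        fullness += bits_per_period          # channel keeps filling the buffer
        if fullness > buffer_size:
            return trace, "overflow"         # pictures too small for the channel rate
        trace.append(fullness)
    return trace, "ok"


trace, status = simulate_vbv(
    picture_bits=[400_000, 100_000, 100_000, 160_000],
    bit_rate=4_000_000, frame_rate=25,
    buffer_size=1_000_000, initial_fullness=500_000,
)
```

A large picture (for example an I picture) drains the buffer, and the smaller pictures that follow let it refill. A rate controller would raise or lower the quantizer to keep the trace away from both limits.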
91MPEG's VBV Water Tank Analogy (Normal Operation)
92MPEG's VBV Water Tank Analogy (Overflow Condition)
93MPEG's VBV Water Tank Analogy (Underflow Condition)
94VBV Buffer Size and VBV Delay
95CBR vs. VBR VBV Models
[Figure: two plots of VBV Fullness, one for each model]
96MUX- Video Bitstream
97Sequence
- For CD-ROM applications, sequences can be used to indicate relatively long clips (e.g. shots, scenes or entire movies)
- For broadcast applications, sequence headers are usually sent frequently (e.g., every GOP) so that key bitstream info is obtained at channel changes
98Major Application Areas
- MPEG-1 Video
  - 1 - 3 Mbps CD-ROM Multimedia
  - Telecommunications and Near Video on Demand
- MPEG-2 Video
  - 3 - 15 Mbps SDTV Broadcast (e.g., ATSC and DVB)
  - Digital Video Disk (DVD)
  - 15 - 20 Mbps HDTV Broadcast (e.g., ATSC)
  - 25 - 50 Mbps SDTV Production
  - 100 - 300 Mbps HDTV Production
99Concluding Remarks
- The MPEG video compression standard is the result of many years of competitive and, ultimately, collaborative effort among many commercial and academic laboratories
- MPEG video compression can increase a broadcaster's channel capacity by 8x or more
- MPEG video compression is being used successfully in many application areas, such as
  - CD-ROM and DVD multimedia, Satellite Broadcast, Terrestrial Broadcast, Cable Broadcast, Telco Video-on-Demand Systems