Image and Video Compression - PowerPoint PPT Presentation

About This Presentation

Slides: 16

Provided by: PadmaMund

Learn more at: https://redirect.cs.umbc.edu

Tags: compression | image | jpeg | standard | video

Transcript and Presenter's Notes

1
Image and Video Compression

An image is a two-dimensional array of pixel values
Spatial redundancy and temporal redundancy
Human eye is less sensitive to chrominance signal
than to luminance signal (U and V can be coarsely
coded)
Human eye is less sensitive to the higher spatial
frequency components
Human eye is less sensitive to quantizing
distortion at high luminance levels

2
JPEG Encoder

International standards body -- Joint
Photographic Experts Group
JPEG encoder schematic
Image/block preparation
DCT computation
Quantization
Entropy coding -- vectoring, differential
encoding, run-length encoding, Huffman encoding
Frame building

3
Image/block Preparation

Source image as 2-D matrix of pixel values
R, G, B format requires three matrices, one each
for R, G, B quantized values
In Y, U, V representation, the U and V matrices
can be smaller than the Y matrix (for example,
half the width and half the height)
Source image matrix is divided into blocks of 8 X 8
submatrices
A small block size keeps the DCT computation
manageable. Individual blocks are fed sequentially
to the DCT, which transforms each block separately
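
The block preparation step can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular encoder: the function names split_into_blocks and subsample_2x2 are hypothetical, the image is assumed to have dimensions that are multiples of 8, and averaging 2 X 2 neighbourhoods is only one possible way to shrink the U and V matrices.

```python
def split_into_blocks(image, n=8):
    """Split a 2-D list of pixel values into n x n blocks, in row-major order.

    Assumes height and width are multiples of n; real encoders pad the edges.
    """
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(0, height, n):
        for left in range(0, width, n):
            blocks.append([row[left:left + n] for row in image[top:top + n]])
    return blocks


def subsample_2x2(plane):
    """Halve a chrominance plane in each dimension by averaging 2 x 2 groups."""
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) // 4
            for c in range(0, len(plane[0]), 2)
        ]
        for r in range(0, len(plane), 2)
    ]


# A synthetic 16 x 16 plane used only for illustration.
image = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
blocks = split_into_blocks(image)   # four 8 x 8 blocks
small = subsample_2x2(image)        # one 8 x 8 plane
```

Each entry of blocks would then be passed to the DCT on its own, in the order produced here.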

4
DCT Computation

Each pixel value in the 2-D matrix is quantized
using 8 bits, which produces a value in the range
0 to 255 for the intensity/luminance values
and in the range -128 to 127 for the
chrominance values. All values are shifted to the
range -128 to 127 (by subtracting 128 from values
in the 0 to 255 range) before computing the DCT
All 64 values in the input matrix contribute to
each entry in the transformed matrix
The value in location F[0,0] of the
transformed matrix is called the DC coefficient
and is proportional to the average of all 64
values in the matrix (8 times the mean with the
usual DCT scaling)
The other 63 values are called the AC
coefficients; each corresponds to a particular
horizontal and vertical spatial frequency
Spatial frequencies increase as we
move from left to right (horizontally) or from
top to bottom (vertically). Low spatial
frequencies are clustered in the top-left corner.
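
A direct, unoptimized sketch of the forward DCT for one 8 X 8 block follows. It assumes the scaling commonly used with JPEG, F[i,j] = 1/4 C(i) C(j) sum over x,y of P[x,y] cos((2x+1)i pi/16) cos((2y+1)j pi/16), with C(0) = 1/sqrt(2) and C(k) = 1 otherwise. The function name dct_8x8 is hypothetical, and practical encoders use fast factorizations rather than this four-level loop.

```python
import math


def dct_8x8(block):
    """Forward 2-D DCT of an 8 x 8 block of level-shifted values."""
    n = 8

    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * n for _ in range(n)]
    for i in range(n):          # vertical frequency index
        for j in range(n):      # horizontal frequency index
            total = 0.0
            # Every one of the 64 inputs contributes to every output entry.
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * i * math.pi / 16)
                              * math.cos((2 * y + 1) * j * math.pi / 16))
            out[i][j] = 0.25 * scale(i) * scale(j) * total
    return out


# Level shift: map 0..255 samples to -128..127 before the transform.
pixels = [[138] * 8 for _ in range(8)]            # a flat grey block
shifted = [[p - 128 for p in row] for row in pixels]
coeffs = dct_8x8(shifted)
# For a flat block only the DC coefficient is non-zero: 8 * 10 = 80.
```

The flat block shows the relation between the DC coefficient and the block mean: every shifted value is 10, and F[0,0] comes out as 80.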

5
Quantization

The human eye responds to the DC coefficient and
the lower spatial frequency coefficients
If the magnitude of a higher frequency
coefficient is below a certain threshold, the eye
will not detect it
Set the frequency coefficients in the transformed
matrix whose amplitudes are less than a defined
threshold to zero (these coefficients cannot be
recovered during decoding)
During quantization, the magnitude of the DC and AC
coefficients is reduced
Each coefficient is divided by the predefined
threshold value and the result is rounded to the
nearest integer

6
Quantization Table

Threshold values vary for each of the 64 DCT
coefficients and are held in a 2-D matrix
Trade off between the level of compression
required and the information loss that is
acceptable
JPEG standard includes two default quantization
tables -- one for the luminance coefficients and
the other for use with the two sets of
chrominance coefficients. Customized tables may
be used
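
Quantization by table lookup can be sketched as below. The table here is purely illustrative (thresholds that grow with spatial frequency); it is not one of the default JPEG tables, and the helper names are hypothetical.

```python
# Illustrative table only: thresholds grow towards the high-frequency corner.
ILLUSTRATIVE_TABLE = [[10 + 5 * (i + j) for j in range(8)] for i in range(8)]


def quantize_value(value, step):
    """Divide by the threshold and round to the nearest integer."""
    magnitude = int(abs(value) / step + 0.5)
    return magnitude if value >= 0 else -magnitude


def quantize_block(coeffs, table):
    return [[quantize_value(coeffs[i][j], table[i][j]) for j in range(8)]
            for i in range(8)]


def dequantize_block(levels, table):
    """Decoder side: multiply back. The rounding error is not recoverable."""
    return [[levels[i][j] * table[i][j] for j in range(8)] for i in range(8)]


coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 80.0     # DC coefficient, threshold 10
coeffs[0][1] = -26.0    # low-frequency AC coefficient, threshold 15
coeffs[7][7] = 30.0     # high-frequency AC coefficient, threshold 80
levels = quantize_block(coeffs, ILLUSTRATIVE_TABLE)
restored = dequantize_block(levels, ILLUSTRATIVE_TABLE)
```

In this example the high-frequency coefficient falls below half its threshold and becomes zero, while the low-frequency ones survive with a small rounding error. Larger thresholds give more zeros and more compression at the cost of more information loss.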

7
Entropy Coding

Vectoring -- the 2-D matrix of quantized DCT
coefficients is represented in the form of a
single-dimensional vector
After quantization, most of the high frequency
coefficients (lower right corner) are zero.
To exploit the number of zeros, a zig-zag scan of
the matrix is used
Zig-zag scan allows the DC coefficient and the
lower frequency AC coefficients to be scanned
first
DC coefficients are encoded using differential
encoding and AC coefficients are encoded using
run-length encoding. Huffman coding is then used
to encode both.
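
The zig-zag scan can be sketched by walking the anti-diagonals of the block and alternating direction. The function names are hypothetical; encoders normally use a precomputed index table instead.

```python
def zigzag_order(n=8):
    """Return (row, col) index pairs of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):          # s = row + col selects one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run from bottom-left to top-right, odd ones the reverse.
        if s % 2 == 0:
            diagonal.reverse()
        order.extend(diagonal)
    return order


def zigzag_scan(block):
    """Flatten a quantized 8 x 8 block into a 64-element vector."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][0] = 8, -2, 3, 1
vector = zigzag_scan(block)
# vector starts 8, -2, 3, 1 and the remaining 60 entries are zero
```

Because the non-zero values sit near the top-left corner, the scan groups them at the start of the vector and leaves one long run of zeros at the end.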

8
Differential Encoding

DC coefficient is usually the largest in the
transformed matrix.
DC coefficient varies slowly from one block to
the next.
Only the difference in value of the DC
coefficients is encoded. Number of bits required
to encode is reduced.
The difference values are encoded in the form
(SSS, value) where SSS field indicates the number
of bits needed to encode the value and the value
field indicates the binary form.
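
A minimal sketch of the (SSS, value) form follows. It assumes the JPEG convention that a negative difference is written as the ones' complement of its magnitude, and that the predictor for the first block is 0. The function names are hypothetical.

```python
def dc_differences(dc_values):
    """Replace each DC coefficient by its difference from the previous block."""
    previous = 0                      # assumed predictor for the first block
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def sss_and_bits(diff):
    """Return (SSS, bit string) for one difference value."""
    if diff == 0:
        return 0, ""
    sss = abs(diff).bit_length()      # number of bits needed for the magnitude
    if diff < 0:
        diff += (1 << sss) - 1        # ones' complement of the magnitude
    return sss, format(diff, "0{}b".format(sss))


diffs = dc_differences([12, 13, 11, 11, 10])
encoded = [sss_and_bits(d) for d in diffs]
# diffs   -> [12, 1, -2, 0, -1]
# encoded -> [(4, '1100'), (1, '1'), (2, '01'), (0, ''), (1, '0')]
```

After the first block the differences are small, so most blocks need only one or two value bits instead of the full width of the DC coefficient.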

9
Run-length Encoding

63 values of the AC coefficients
Long strings of zeros because of the zig-zag scan
Each AC coefficient encoded as a pair of values
-- (skip, value), skip indicates the number of
zeros in the run and value is the next non-zero
coefficient
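
The (skip, value) pairs can be produced as sketched below. The trailing (0, 0) pair as an end-of-block marker follows the JPEG convention; this sketch ignores the limit JPEG places on the length of a single run, and the function name is hypothetical.

```python
def run_length_encode(ac_coefficients):
    """Encode the 63 zig-zag ordered AC coefficients as (skip, value) pairs."""
    pairs = []
    run = 0
    for value in ac_coefficients:
        if value == 0:
            run += 1                  # extend the current run of zeros
        else:
            pairs.append((run, value))
            run = 0
    pairs.append((0, 0))              # end of block: only zeros remain
    return pairs


ac = [5, 0, 0, -3, 0, 1] + [0] * 57
pairs = run_length_encode(ac)
# pairs -> [(0, 5), (2, -3), (1, 1), (0, 0)]
```

Here 63 coefficients collapse to four pairs, because the final run of 57 zeros is covered by the end-of-block marker alone.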

10
Huffman Encoding

Frequently occurring strings of binary digits are
replaced by shorter codewords
The prefix property of the Huffman codewords
enables the encoded bitstream to be decoded
unambiguously
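
The sketch below builds a Huffman code from symbol counts and decodes a bitstream using only the prefix property. It is a generic textbook construction, not the table format JPEG uses (JPEG carries either default or custom code tables in the frame). The function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each symbol to its codeword (a bit string)."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker, {symbol: partial codeword}).
    heap = [(count, index, {sym: ""})
            for index, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        count1, _, codes1 = heapq.heappop(heap)   # two least frequent subtrees
        count2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (count1 + count2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_decode(bits, code):
    """Decode by emitting a symbol as soon as the bits read form a codeword."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


message = list("aaaaaaaabbbccd")
code = huffman_code(message)
bits = "".join(code[s] for s in message)
decoded = huffman_decode(bits, code)
```

The most frequent symbol receives the shortest codeword, and since no codeword is a prefix of another, the decoder never needs separators between codewords.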

11
Frame Building

12
Video Compression

Video as a sequence of pictures (or frames)
JPEG algorithm applied to each frame -- moving
JPEG (MJPEG). Exploits only spatial redundancy.
High correlation between successive frames. Only
small portion of each frame is involved with any
motion that is taking place.
A combination of actual frame contents and
predicted frame contents is used.
Motion estimation and motion compensation

13
Frame/Picture Types

Interframe and intraframe coding. High
compression ratios can be achieved by using both.
Random access requirement of image retrieval is
satisfied by pure intraframe coding.
I-frames are coded without reference to other
frames. Serve as reference pictures for
predictive-coded frames.
P-frames are coded using motion compensated
prediction from a past I-frame or P-frame.
B-frames are bidirectionally predictive-coded.
Highest degree of compression, but require both
past and future reference pictures for motion
compensation.
D-frames are DC-coded. Of the DCT coefficients
only the DC coefficients are present. Used in
interactive applications like VoD for rewind and
fast-forward operations.
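
Because a B-frame needs a future reference picture, frames are usually coded and transmitted in a different order from the one in which they are displayed. The sketch below shows that reordering on frame labels. The labels and the function name coding_order are hypothetical, and the particular I/B/P pattern is only an example.

```python
def coding_order(display_order):
    """Reorder frames so each B-frame follows both of its reference frames."""
    out = []
    pending_b = []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)   # wait for the next I- or P-frame
        else:
            out.append(frame)         # reference frame is sent first
            out.extend(pending_b)     # then the B-frames that depend on it
            pending_b = []
    out.extend(pending_b)             # trailing B-frames, if any
    return out


display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
coded = coding_order(display)
# coded -> ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```

The decoder holds P4 until B2 and B3 have been decoded and displayed, which is why B-frames add delay as well as compression.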

14
Picture Sequence

15
MPEG-video Encoding

Input frames are preprocessed (color space
conversion and spatial resolution adjustment).
Frame types are decided for each frame/picture
Each picture is divided into macroblocks of 16 X
16 pixels.
Macroblocks are intracoded for I frames and
predictive coded or intracoded for P and B frames
Macroblocks are divided into six blocks of 8 X 8
values (4 luminance and 2 chrominance). The DCT is
applied to each block, and the transform
coefficients are quantized, zig-zag scanned and
variable-length coded.
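
The division of a macroblock into six blocks can be sketched as follows. It assumes the two chrominance planes have already been subsampled to 8 X 8 for each 16 X 16 luminance area, which is what the count of 4 luminance and 2 chrominance blocks implies. The function name and the Cb/Cr parameter names are hypothetical.

```python
def macroblock_to_blocks(y, cb, cr):
    """Split one macroblock into six 8 x 8 blocks.

    y is the 16 x 16 luminance area; cb and cr are the matching
    8 x 8 chrominance areas (already subsampled).
    """
    luminance = [
        [row[left:left + 8] for row in y[top:top + 8]]
        for top in (0, 8) for left in (0, 8)
    ]
    return luminance + [cb, cr]


# Synthetic data used only for illustration.
y = [[16 * r + c for c in range(16)] for r in range(16)]
cb = [[1] * 8 for _ in range(8)]
cr = [[2] * 8 for _ in range(8)]
six_blocks = macroblock_to_blocks(y, cb, cr)
```

Each of the six blocks would then go through the DCT, quantization, zig-zag scan and variable-length coding steps described for JPEG.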