Title: CHAPTER 3 Fundamentals of Lossy Image Compression
1CHAPTER 3 Fundamentals of Lossy Image Compression
2Lossy Compression System
- Lossy compression of images deals with
compression processes where decompression yields
an imperfect reconstruction of the original image
data.
- There is always a bound on the minimum bit rate
of the compressed bit stream.
- Image data tend to have a high degree of spatial
redundancy.
- Within such a system, compression is achieved by
exploiting both the spatial redundancies within
the image and the perceptual characteristics of
the human visual system so that the loss due to
compression may not be discernible to the viewer.
3Sample-Based Coding
- There are two classes of lossy compression
schemes for images:
  - Sample-based coding
  - Block-based coding
    - Spatial-domain block coding
    - Transform-domain block coding
- In sample-based coding, the image samples are
compressed on a sample-by-sample basis. The
samples can be either in the spatial domain or in
the frequency domain.
- Differential pulse code modulation (DPCM)
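[Figure: DPCM encoder and decoder block diagrams. Signals: xij (input sample), Pij (prediction), eij (residue, xij - Pij), qij (quantizer output). Blocks: Quantizer, Predictor (one in the encoder, one in the decoder).]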
4Quantizer
- If the image is highly correlated, Pij will track
xij, and eij will consequently be quite small.
- The residue signal eij is quantized. The
quantizer maps several of its inputs into a
single output. This process is irreversible and
is the main cause of information loss.
- For a uniform quantizer, the quantization process
can be expressed as
qij = [equation not recovered]
- Since the variance of eij is lower than the
variance of xij, quantizing eij will not introduce
significant distortion. Furthermore, the lower
variance corresponds to lower entropy and thus to
higher compression.
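The DPCM loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the predictor is taken to be the previously reconstructed sample, the uniform quantizer is assumed to round the residue to the nearest multiple of a step size, and the names `dpcm_encode`, `dpcm_decode`, and `step` are hypothetical, not taken from the slides.

```python
def dpcm_encode(samples, step):
    """Return one quantizer index per sample (assumed rule: round(e / step))."""
    indices = []
    prediction = 0  # assumed initial prediction before the first sample
    for x in samples:
        e = x - prediction          # residue e = x - P
        q = round(e / step)         # uniform quantizer: many residues map to one index
        indices.append(q)
        prediction += q * step      # predictor follows the decoder's reconstruction
    return indices


def dpcm_decode(indices, step):
    """Rebuild the samples by adding each dequantized residue to the prediction."""
    reconstructed = []
    prediction = 0
    for q in indices:
        prediction += q * step
        reconstructed.append(prediction)
    return reconstructed


row = [100, 102, 105, 104, 101, 99, 98, 100]   # a correlated run of pixel values
step = 4
codes = dpcm_encode(row, step)
decoded = dpcm_decode(codes, step)
max_error = max(abs(a - b) for a, b in zip(row, decoded))
print(codes)
print(decoded)
print(max_error)
```

Because the encoder predicts from the reconstructed value rather than from the original sample, the quantization error does not accumulate: under these assumptions each reconstructed sample stays within half a step of the original.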
5Block-Based Coding
- In spatial-domain block coding, the pixels are
grouped into blocks, and the blocks are then
compressed in the spatial domain.
- In transform-domain block coding, the pixels are
grouped into blocks, and the blocks are then
transformed to another domain, such as the
frequency domain.
- The motivation for transform coding is a more
compact representation of the data.
- Some of the most commonly used transforms include
the discrete Fourier transform (DFT), the
discrete cosine transform (DCT), the discrete
sine transform (DST), the discrete Hadamard
transform (DHT), and the Karhunen-Loeve transform
(KLT).
6Compaction Efficiency for Various Image Transforms
7Compaction Efficiency for Various Image
Transforms (Cont.)
- The KLT basis is the most efficient in terms of
compaction efficiency, since the energy is
compacted into the top-left corner.
  - It packs the most energy into the fewest
  elements in Y.
  - It minimizes the total entropy of the sequence.
  - It completely decorrelates the elements in X.
- The KLT has several implementation-related
deficiencies:
  - The basis functions are image dependent. The
  other basis functions (DFT, DCT, DST, and DHT)
  are image independent.
- The compaction efficiency of the DCT basis is close
to that produced by the KLT. Therefore, it is
widely used in image and video compression
standards.
8Basic Transformation Forms
9Transform Coding
- Spatial image data (image or motion-compensated
residual image) are transformed into a different
representation, the transform domain.
- The goal is to make the image data easier to compress.
- Techniques
  - Discrete cosine transform (DCT)
    - Usually applied to small regular blocks of the image,
    e.g., 8 x 8 squares.
    - JPEG, H.26x, MPEG-x
  - Discrete wavelet transform (DWT)
    - Usually applied to larger image sections, e.g.,
    tiles, or to the complete image.
    - JPEG 2000, MPEG-4 still texture
10Blocks
- Process the data in blocks of 8 x 8 samples
- Convert Red-Green-Blue into Luminance (greyscale)
and Chrominance (Blue colour difference and Red
colour difference)
- Use half resolution for Chrominance (because the eye
is more sensitive to greyscale than to colour)
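A minimal sketch of the colour conversion and chrominance reduction listed above. The slides do not name a conversion matrix, so the JFIF (BT.601-derived) coefficients are used here as an assumption; averaging each 2 x 2 neighbourhood is likewise only one possible way to obtain "half resolution". The function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Assumed JFIF-style conversion for 8-bit samples (not specified in the slides)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                # luminance (greyscale)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b     # blue colour difference
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b     # red colour difference
    return y, cb, cr


def subsample_half(plane):
    """Halve the resolution in both directions by averaging 2 x 2 neighbourhoods.

    Assumes the plane has an even number of rows and columns.
    """
    height, width = len(plane), len(plane[0])
    return [[(plane[i][j] + plane[i][j + 1]
              + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
             for j in range(0, width, 2)]
            for i in range(0, height, 2)]


# A small synthetic 4 x 4 RGB image.
image = [[(100 + 10 * i, 50 + 20 * j, 128) for j in range(4)] for i in range(4)]
converted = [[rgb_to_ycbcr(*pixel) for pixel in row] for row in image]
y_plane = [[p[0] for p in row] for row in converted]
cb_plane = [[p[1] for p in row] for row in converted]
cr_plane = [[p[2] for p in row] for row in converted]

cb_half = subsample_half(cb_plane)   # chrominance kept at half resolution
cr_half = subsample_half(cr_plane)
print(len(y_plane), len(y_plane[0]), len(cb_half), len(cb_half[0]))
```

The luminance plane keeps its full 4 x 4 size while each chrominance plane shrinks to 2 x 2, so the three planes together hold half as many samples as the RGB original.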
11Discrete Cosine Transform
- Transform each block of 8 x 8 samples into a
block of 8 x 8 spatial frequency coefficients
12Discrete Cosine Transform
13An Example of Energy Compaction
14Two-Dimensional DCT (1974)
15Discrete Cosine Transform
- Any 8 x 8 block of pixels can be represented as a
sum of 64 basis patterns (black and white patterns).
- The output of the DCT is the set of weights for these
basis patterns (the DCT coefficients).
- Multiply each basis pattern by its weight and add them
together: the result is the original image block.
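The basis-pattern view above can be checked numerically. The sketch below assumes the orthonormal DCT-II scaling (the slides' own definition did not survive extraction), and the helper names `basis`, `dct_weights`, and `rebuild` are hypothetical.

```python
import math

N = 8


def alpha(k):
    """Orthonormal DCT-II scale factor (assumed normalization)."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def basis(u, v):
    """8 x 8 basis pattern for vertical frequency u and horizontal frequency v."""
    return [[alpha(u) * alpha(v)
             * math.cos((2 * i + 1) * u * math.pi / (2 * N))
             * math.cos((2 * j + 1) * v * math.pi / (2 * N))
             for j in range(N)] for i in range(N)]


def dct_weights(block):
    """Weight of each basis pattern: inner product of the block with the pattern."""
    weights = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            pattern = basis(u, v)
            weights[u][v] = sum(block[i][j] * pattern[i][j]
                                for i in range(N) for j in range(N))
    return weights


def rebuild(weights):
    """Multiply each basis pattern by its weight and add the 64 results together."""
    block = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            pattern = basis(u, v)
            for i in range(N):
                for j in range(N):
                    block[i][j] += weights[u][v] * pattern[i][j]
    return block


block = [[100 + (3 * i + 5 * j) % 17 for j in range(N)] for i in range(N)]
weights = dct_weights(block)
restored = rebuild(weights)
worst = max(abs(block[i][j] - restored[i][j]) for i in range(N) for j in range(N))
print(worst)   # only floating-point rounding error remains
```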
16Discrete Cosine Transform
- Most image blocks only contain a few significant
coefficients (usually the lowest frequencies)
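To illustrate the point about few significant coefficients, the sketch below transforms a smooth synthetic block and measures how much of the energy sits in the four lowest-frequency coefficients. The block contents, the threshold of 1.0, and the orthonormal scaling are assumptions chosen for illustration only.

```python
import math

N = 8


def dct_1d(vec):
    """Orthonormal N-point DCT-II of a sequence (assumed normalization)."""
    n = len(vec)
    return [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * sum(vec[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i in range(n))
            for k in range(n)]


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]   # back to [vertical][horizontal] order


# A smooth block: brightness rises gently from the top-left to the bottom-right.
smooth = [[100 + 2 * i + 3 * j for j in range(N)] for i in range(N)]
coeffs = dct_2d(smooth)

total_energy = sum(c * c for row in coeffs for c in row)
low_energy = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
low_fraction = low_energy / total_energy
significant = sum(1 for row in coeffs for c in row if abs(c) > 1.0)

print(significant, "of", N * N, "coefficients exceed the threshold")
print(low_fraction)
```

For this block almost all of the energy lands in the top-left corner, which is what makes coarse quantization of the remaining coefficients cheap in terms of visible error.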
17Hardware Architectures of Discrete Cosine
Transform
18Hardware/Software Tradeoff
- For low-end applications, a software approach is
powerful enough.
- For high-end applications, a hardware approach
must be used.
- For mid-range applications, either a software or a
hardware approach is possible, depending on the
target design platform.
19DCT Algorithm Classification
- Direct 2-D method
  - The 2-D transforms, DCT and IDCT, are applied
  directly to the N x N input data items.
- Row-column method
  - The 2-D transform can be carried out with two
  passes of 1-D transforms.
  - The separability property of the 2-D DCT/IDCT allows
  the transform to be applied along one dimension
  (row) and then along the other (column).
  - Requires 2N instances of an N-point 1-D DCT to
  implement an N x N 2-D DCT.
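The two classes above can be compared with a short sketch: one function applies the 2-D definition directly, the other runs N row transforms followed by N column transforms and counts the 1-D calls. The orthonormal scaling and all function names are assumptions made for this illustration.

```python
import math


def scale(k, n):
    """Orthonormal DCT-II scale factor (assumed normalization)."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct_1d(vec):
    n = len(vec)
    return [scale(k, n) * sum(vec[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                              for i in range(n))
            for k in range(n)]


def dct_2d_direct(block):
    """Direct 2-D method: every output is a double sum over the whole block."""
    n = len(block)
    return [[scale(u, n) * scale(v, n)
             * sum(block[i][j]
                   * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                   * math.cos((2 * j + 1) * v * math.pi / (2 * n))
                   for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]


def dct_2d_row_column(block):
    """Row-column method: N row transforms, then N column transforms."""
    n = len(block)
    calls = 0
    rows_done = []
    for row in block:                                  # first pass over the rows
        rows_done.append(dct_1d(row))
        calls += 1
    cols_done = []
    for c in range(n):                                 # second pass over the columns
        cols_done.append(dct_1d([rows_done[r][c] for r in range(n)]))
        calls += 1
    result = [[cols_done[c][r] for c in range(n)] for r in range(n)]
    return result, calls


block = [[(7 * i + 11 * j) % 23 for j in range(8)] for i in range(8)]
direct = dct_2d_direct(block)
separable, calls = dct_2d_row_column(block)
difference = max(abs(direct[u][v] - separable[u][v])
                 for u in range(8) for v in range(8))
print(calls)        # 2N one-dimensional transforms for an N x N block
print(difference)   # the two methods agree up to rounding
```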
20Straightforward Approach
- Carry out the computation as full matrix-vector
multiplications.
  - A 1-D transform requires N x N multiplications and
  N x (N-1) additions.
  - A 2-D transform requires N^4 multiplications and
  N x N x (N x N - 1) additions.
- Although it requires the largest number of operations,
this method is very regular.
- Most suitable for vector processors or deeply
pipelined architectures, for high PE utilization.
- 1-D fast algorithm: O(N log N)
- 2-D fast algorithm: O(N^2 log N)
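The operation counts above are easy to tabulate. The sketch below evaluates them for N = 8 and, for comparison, adds the row-column count obtained by running 2N straightforward 1-D transforms; the function names are hypothetical.

```python
def direct_1d_ops(n):
    """Matrix-vector 1-D transform: N x N multiplications, N x (N-1) additions."""
    return n * n, n * (n - 1)


def direct_2d_ops(n):
    """Direct 2-D transform: N^4 multiplications, N x N x (N x N - 1) additions."""
    return n ** 4, n * n * (n * n - 1)


def row_column_ops(n):
    """2N straightforward 1-D transforms (N rows, then N columns)."""
    mults, adds = direct_1d_ops(n)
    return 2 * n * mults, 2 * n * adds


n = 8
print(direct_1d_ops(n))    # (64, 56)
print(direct_2d_ops(n))    # (4096, 4032)
print(row_column_ops(n))   # (1024, 896)
```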
21 1-D DCT Definition
22 4-Point DCT (N = 4)
23 4-Point DCT Matrix Form
24 4-Point DCT
25 4-Point DCT
16 multiplications reduced to 6
26Butterfly First DCT Stage
P0 = x(0) + x(3)    M0 = x(0) - x(3)
P1 = x(1) + x(2)    M1 = x(1) - x(2)
[Figure: first-stage butterflies; the second half of the input is taken in reversed order.]
27Butterfly Second Stage
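X(0) = (P0 + P1) × c2    X(1) = M0 × c1 + M1 × c3
X(2) = (P0 - P1) × c2    X(3) = M0 × c3 - M1 × c1
[Figure: second-stage butterfly flow graph from P0, M0, P1, M1 to X(0), X(1), X(2), X(3), with coefficient c1 labeled.]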
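The two butterfly stages for the 4-point DCT can be checked against the plain matrix form. The definition of the constants did not survive in the slides, so this sketch assumes ck = cos(k × pi / 8) and leaves out the overall scale factor; under that assumption the butterfly needs 6 multiplications where the matrix form needs 16.

```python
import math

# Assumed constants: ck = cos(k * pi / 8); note that c2 = 1 / sqrt(2).
C1 = math.cos(math.pi / 8)
C2 = math.cos(2 * math.pi / 8)
C3 = math.cos(3 * math.pi / 8)


def dct4_butterfly(x):
    """4-point DCT with two butterfly stages (6 multiplications)."""
    # First stage: sums and differences with the second half in reversed order.
    p0, m0 = x[0] + x[3], x[0] - x[3]
    p1, m1 = x[1] + x[2], x[1] - x[2]
    # Second stage: combine the sums for even outputs, the differences for odd ones.
    return [(p0 + p1) * C2,
            m0 * C1 + m1 * C3,
            (p0 - p1) * C2,
            m0 * C3 - m1 * C1]


def dct4_matrix(x):
    """Same transform as a full 4 x 4 matrix-vector product (16 multiplications)."""
    out = []
    for k in range(4):
        row = [C2 if k == 0 else math.cos((2 * n + 1) * k * math.pi / 8)
               for n in range(4)]
        out.append(sum(c * v for c, v in zip(row, x)))
    return out


samples = [12.0, 7.0, -3.0, 5.0]
fast = dct4_butterfly(samples)
reference = dct4_matrix(samples)
mismatch = max(abs(a - b) for a, b in zip(fast, reference))
print(fast)
print(mismatch)   # rounding error only
```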
28 4-Point DCT
[Figure: 4-point DCT flow graph with intermediate terms P0, M0, P1, M1.]
29 8-Point DCT
30Row-Column Method Example
- A. Madisetti and A. N. Willson Jr., "A 100 MHz
2-D 8 x 8 DCT/IDCT Processor for HDTV
Applications," IEEE Transactions on Circuits and
Systems for Video Technology, vol. 5, no. 2,
pp. 158-165, Apr. 1995.
31Description of Algorithms
32Description of Algorithms (Cont.)
- A straightforward implementation requires N^4
multiplications for the evaluation of the DCT and
of the IDCT.
- Decomposition into a triple matrix product reduces
the computational complexity to 2N^3
multiplications.
- Since 2N^3 multiplications must be performed in N^2
clock cycles (or input sample periods), the
computational requirement of such an
implementation is 2N multiplies per input sample.
- For an input sample rate of 100 MHz and N = 8, the
computation requirement is 1.6 GOPS, where each
operation is a multiply-accumulate.
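The throughput figure above follows from simple arithmetic, sketched here for N = 8 and a 100 MHz sample rate. The function names are hypothetical.

```python
def multiplies_per_sample(n):
    """2N^3 multiplications spread over the N^2 samples of one block."""
    total_multiplies = 2 * n ** 3     # triple matrix product
    samples_per_block = n ** 2        # one sample per clock cycle
    return total_multiplies // samples_per_block


def required_gops(n, sample_rate_hz):
    """Multiply-accumulate operations per second, in units of 10^9."""
    return multiplies_per_sample(n) * sample_rate_hz / 1e9


print(multiplies_per_sample(8))    # 16, i.e. 2N for N = 8
print(required_gops(8, 100e6))     # 1.6
```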
33Row-Column Method
- Basic concept
  - 2-D DCT = 1-D DCT (row) followed by 1-D DCT (column)
  - Each 1-D DCT unit must be capable of computing N
  multiplies per input sample.
Y = AX
Z = YA^T
[Figure: row-column data path: X, 1-D DCT/IDCT, Transpose Memory, 1-D DCT/IDCT, Z; the two 1-D units are labeled "DCT for row" and "DCT for column".]
34Row-Column Method (Cont.)
- Let us first consider the computation of the triple
matrix product Z = AXA^T for the DCT or Z = A^T XA
for the IDCT. This is computed as Y = AX and
Z = YA^T for the DCT, and as Y = A^T X and Z = YA
for the IDCT.
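The two-step evaluation of the triple matrix product can be sketched with plain nested lists. The matrix A is assumed here to be the orthonormal 8 x 8 DCT-II matrix, which the surviving slide text does not spell out; `dct_2d` and `idct_2d` are hypothetical names.

```python
import math

N = 8


def dct_matrix(n=N):
    """Assumed A: orthonormal DCT-II matrix, row k holds the k-th basis vector."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def transpose(m):
    return [list(col) for col in zip(*m)]


A = dct_matrix()
AT = transpose(A)


def dct_2d(x):
    y = matmul(A, x)        # Y = A X
    return matmul(y, AT)    # Z = Y A^T


def idct_2d(x):
    y = matmul(AT, x)       # Y = A^T X
    return matmul(y, A)     # Z = Y A


pixels = [[(5 * i + 9 * j) % 31 for j in range(N)] for i in range(N)]
coefficients = dct_2d(pixels)
recovered = idct_2d(coefficients)
round_trip_error = max(abs(pixels[i][j] - recovered[i][j])
                       for i in range(N) for j in range(N))
print(round_trip_error)   # rounding error only, since A is orthonormal here
```

Each of the two matrix products costs N^3 multiplications, which is where the 2N^3 total comes from.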
35Computation of the DCT
- Even rows of A are even-symmetric and odd rows
are odd-symmetric.
36Matrix Decomposition
- Reduces an 8 x 8 matrix computation to two 4 x 4
matrix computations.
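The symmetry of the rows of A is what allows the split into two 4 x 4 products. The sketch below assumes the orthonormal 8-point DCT-II matrix for A: even-indexed outputs are computed from the sums x(n) + x(7-n), odd-indexed outputs from the differences x(n) - x(7-n), so 32 multiplications replace 64. Function names are hypothetical.

```python
import math

N = 8
HALF = N // 2

# Assumed A: orthonormal 8-point DCT-II matrix.
A = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]


def dct8_full(x):
    """Reference: one 8 x 8 matrix-vector product (64 multiplications)."""
    return [sum(A[k][n] * x[n] for n in range(N)) for k in range(N)]


def dct8_even_odd(x):
    """Two 4 x 4 matrix-vector products (32 multiplications)."""
    sums = [x[n] + x[N - 1 - n] for n in range(HALF)]    # feeds the even rows
    diffs = [x[n] - x[N - 1 - n] for n in range(HALF)]   # feeds the odd rows
    out = [0.0] * N
    for k in range(N):
        source = sums if k % 2 == 0 else diffs
        # Only the left half of row k is needed; the right half mirrors it.
        out[k] = sum(A[k][n] * source[n] for n in range(HALF))
    return out


samples = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
full = dct8_full(samples)
split = dct8_even_odd(samples)
gap = max(abs(a - b) for a, b in zip(full, split))
print(gap)   # rounding error only
```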
37Computation of the IDCT
38System Architecture
39System Architecture (Cont.)
[Figure: system architecture with signals X, Y, and Z.]
40Architecture of Data Reorder Unit (DRU)
[Figure: DRU architecture; control signal INSEL.]
41Data Flow of DRU
[Figure: DRU data flow. Sequences shown: X(3) X(2) X(1) X(0); Y(3) Y(2) Y(1) Y(0); x0 x1 x2 x3.]
42Data Flow of DRU (Cont.)
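[Figure: DRU data flow during the first four clock cycles. Inputs X0 X1 X2 X3 and X7 X6 X5 X4 are paired to form the sums X0+X7, X1+X6, X2+X5, X3+X4 and the differences X0-X7, X1-X6, X2-X5, X3-X4. Other sequences shown: X0 X6 X2 X4 and X7 X1 X5 X3.]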
43Data Flow of DRU (Cont.)
The next four clock cycles
44ACF Matrix-Vector Multiplication
45ACF Matrix-Vector Multiplier
[Figure: ACF matrix-vector multiplier. Labels: input xe, broadcast to the a, c, and f multipliers (Mult a, Mult c, Mult f); accumulators ACC 0 to ACC 3; MUX 4:1; output Ye; Timing and Control.]
46BDEG Matrix-Vector Multiplication
47BDEG Matrix-Vector Multiplier
48Hardwired Multiplier
Signed Digit Representation of the DCT
Coefficients
49Accumulator
50Transpose Memory
51Transpose Memory (Cont.)
52Finite Wordlength Analysis
53Implementation Results
54 1-D Approach with DA
55DCT Algorithm
56DCT Algorithm (Cont.)
57DCT Algorithm (Cont.)
58Block Diagram
59Input Data Format Converter
60PreAdd and Postadd
61DA-Based DCT Core
62DA-Based DCT Core (Cont.)
63DA-Based DCT Core (Cont.)
64Transpose Memory
65 1-D Approach with Systolic Array
- IEEE Transactions on Circuits and Systems for
Video Technology, vol. 5, no. 2, pp. 150-157,
Apr. 1995.
66DCT Algorithm
67Three Steps
68Systolic Array
69Systolic Array (Cont.)
70Features of 1-D Approach with Systolic Array
71Direct 2-D DCT Architecture
72Direct 2-D DCT Architecture
73Data Flow Graph