Title: Video compression part 2
1Video compression (part 2)
- "Since it's digital, there's no generation loss"
- WRONG!
- Nearly all practical video compression is lossy
2Video compression concepts
- Spatial vs. temporal compression
- spatial: intra-frame, just image compression
- temporal: inter-frame
- based on motion compensation
- Two kinds of frames
- key frames: spatial compression only
- difference frames: spatial and temporal
- relative to some other frame
3Motion JPEG
- Just a sequence of JPEG images
- no temporal compression
- Supported by many capture cards
- 7:1 compression is possible
4Software Codecs
- Cinepak, Intel Indeo, Sorenson
- use vector quantization and temporal compression
- Cinepak and Intel Indeo use simple differencing
- Sorenson's temporal compression is more sophisticated
- highly asymmetric
- VHS quality for 320x240, 12 fps
- Compression to 50 kBps
5MPEG-1
- Two ways to deal with moving material
- reuse the background in multiple frames
- move the foreground actor across the scene
- in practice, these are the same thing (why?)
- MPEG uses motion compensation
- attempts to find nearby areas with similar pixel patterns
- 16x16 macroblocks plus motion vectors describing how they shift
- not trying for perfect representation
6MPEG sequencing
- Three types of pictures (frames)
- I-picture: intra-frame compression only
- P-picture: predicted, coded as a difference from an earlier I- or P-picture
- 3:1 compression
- B-picture: bidirectional prediction
- based on earlier I/P, later I/P
- 4.5:1 compression, but reconstruction is complex
- Group of pictures
- begins with I-picture
- IBBPBBPBB is a common pattern
7Display vs. bitstream ordering
- Display order
- the order in which the images should be shown
- requires decoder to buffer B pictures
- Bitstream order
- requires buffering of 2 I/P images
- first series has unusual order to bootstrap the process
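The reordering between display order and bitstream order can be sketched as follows. This is only an illustration under simple assumptions: pictures are labelled with a type letter plus a display index (labels such as "B1" are made up for this example), and every B picture is sent right after the later I/P picture it depends on.

```python
def display_to_bitstream(display_order):
    """Reorder pictures so that each B picture follows both of its anchors."""
    out = []
    pending_b = []
    for pic in display_order:
        if pic[0] == "B":
            pending_b.append(pic)   # hold until the next I/P anchor has been sent
        else:
            out.append(pic)         # the anchor (I or P) goes first ...
            out.extend(pending_b)   # ... then the B pictures that needed it
            pending_b = []
    out.extend(pending_b)           # trailing B pictures with no later anchor
    return out


# One IBBPBBPBB group followed by the I picture of the next group.
gop = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]
print(display_to_bitstream(gop))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5', 'I9', 'B7', 'B8']
```

The decoder only ever needs the two most recent I/P pictures in memory, which is the "buffering of 2 I/P images" mentioned in the slide.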
8Which method provides higher compression: JPEG or MPEG?
9Bytes of Data
Resolution (pixels)   Raw      Compressed
176 x 112             42 KB    13 KB
352 x 240             169 KB   43 KB
704 x 480             676 KB   70 KB
[Images at 176 x 112, 352 x 240 and 704 x 480]
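The compression ratio implied by each row of the table is simply raw size divided by compressed size. A small sketch using the figures from the slide (the function name is illustrative):

```python
# (resolution, raw KB, compressed KB) as listed on the slide
rows = [("176 x 112", 42, 13), ("352 x 240", 169, 43), ("704 x 480", 676, 70)]


def compression_ratio(raw_kb, compressed_kb):
    """Raw size divided by compressed size, e.g. 3.2 means 3.2:1."""
    return raw_kb / compressed_kb


for resolution, raw_kb, compressed_kb in rows:
    print(f"{resolution}: {compression_ratio(raw_kb, compressed_kb):.1f}:1")
# 176 x 112: 3.2:1
# 352 x 240: 3.9:1
# 704 x 480: 9.7:1
```

In these figures the ratio grows with resolution: the largest image compresses roughly three times better than the smallest.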
10Define Compression in video ?
Information is captured at the source and is
encoded (compressed) by an encoder. The
compressed data can then be transmitted across a
network or telecommunications link and decoded
(decompressed) by a decoder. The decoded
information can then be displayed. The
encoder/decoder, or codec, can be software,
hardware, or both.
11Lossy Compression
Types of Lossy Video Compression
- Single Frame Compression
- JPEG
- Wavelet
- Motion Based Compression
- Motion JPEG
- MPEG-1
- MPEG-2
- H-dot (H.26x)
12JPEG
- Joint Photographic Experts Group
- Compression ranges from 5:1 to 25:1
- Strengths
- Good quality
- Independent compression
- Standard across industries
- Limitations
- File size
- Edge or block artifacts
13JPEG Artifacts (Lowest Quality setting)
14Wavelet
- Frequency-based compression algorithm
- Compression ranges from 5:1 to 30:1
- Strengths
- Good quality
- Independent compression
- Limitations
- Smearing
- Non-standard algorithms
15Wavelet vs. JPEG
This picture is known as Lena. Why is it
controversial?
30:1 Compression
16Motion JPEG
- JPEG-based
- Compression ranges from 5:1 to 25:1
- Strengths
- Good quality
- Series of JPEG-compressed images
- Limitations
- File size
- Edge or block artifacts
- No standard algorithms
You will frequently see references to "motion
JPEG" or "M-JPEG" for video. There is no such
standard. Various vendors have applied JPEG to
individual frames of a video sequence, and have
called the result "M-JPEG". Unfortunately, in the
absence of any recognized standard, they've each
done it differently. The resulting files are
usually not compatible across different
vendors. (http://www.faqs.org/faqs/jpeg-faq/part1/section-20.html)
17MPEG-1
- Moving Picture Experts Group
- Compression ranges from 50:1 to 100:1
- Strengths
- Good quality
- Uses inter-frame compression
- Limitations
- Resolution limited to 352 x 240
- Blockiness when compression is too high
- Quality of video depends on amount of change
18MPEG-2
- Moving Picture Experts Group
- Works by analysing the video picture for repetition, called redundancy
- Compression ranges from 10:1 to 40:1
- Strengths
- Excellent quality
- Uses inter-frame compression
- 720 x 480 resolution
- Limitations
- Blockiness when compression is too high
- Quality of video depends on amount of change
19What is H.26x?
- Called "H dot" compression.
- H.261 and H.263 are part of the H.323 family of standards.
- Similar to MPEG, the H series also uses motion prediction when compressing.
- It is intended for real-time video teleconferencing.
- Most common resolutions are 352x288, known as CIF (Common Intermediate Format), and 176x144, known as QCIF (Quarter CIF).
- The H series is most suitable for video transmission over ISDN or PSTN lines, but some CCTV manufacturers also use it for recording.
20Principles of compression
- Compression (or source coding) is achieved by suppressing information
- redundant information
- irrelevant information
- Suppression of redundant information -> lossless compression
- Example: PCM to DPCM, DCT
- The original signal and the one obtained after encoding and decoding are identical
21Principles of compression
- Suppression of irrelevant information -> lossy compression
- Example: bandwidth limitation, masking in audio
- The original signal and the one obtained after encoding and decoding are different but are perceived as identical
22The compression trade-off
- Compression techniques are still making progress
- Trade-off: complexity / quality / bit rate
- A new technique may result in a new trade-off
[Figure: complexity, quality and bitrate axes, with MPEG Layer 1, MPEG Layer 2, MPEG Layer 3, MPEG AAC and other techniques (speech coding) marked]
23Video compression in MPEG (1/6)
- Principles
- removal of intrapicture redundancy
- the image is decomposed into 8x8-pixel sub-images; each sub-image contains redundant information
- the DCT (a transformation to the frequency domain) decorrelates the input signal (most energy is in the low spatial frequencies)
- removal of interpicture redundancy
- coding of the difference with an interpolated picture (motion vectors)
- high spatial frequencies are quantized with lower resolution than low ones (removes irrelevancy)
- zig-zag scan and VLC (remove redundancy)
24Video compression in MPEG (3/6)
- Spatial redundancy reduction (DCT example)
25Video compression in MPEG (4/6)
- Temporal redundancy reduction
263 Possible questions
- Explain the DCT process in detail ?
- What is spatial redundancy reduction ?
- What is temporal redundancy reduction
27DCT ?
- A discrete cosine transform (DCT) is a
Fourier-related transform similar to the discrete
Fourier transform (DFT), but using only real
numbers. DCTs are equivalent to DFTs of roughly
twice the length, operating on real data with
even symmetry (since the Fourier transform of a
real and even function is real and even), where
in some variants the input and/or output data are
shifted by half a sample. There are eight
standard DCT variants, of which four are common.
28DCT ?
- The most common variant of discrete cosine
transform is the type-II DCT, which is often
called simply "the DCT"; its inverse, the
type-III DCT, is correspondingly often called
simply "the inverse DCT" or "the IDCT".
29DCT ?
- Two related transforms are the discrete sine
transform (DST), which is equivalent to a DFT of
real and odd functions, and the modified discrete
cosine transform (MDCT), which is based on a DCT
of overlapping data.
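To make the type-II DCT and its type-III inverse concrete, here is a minimal one-dimensional sketch in plain Python. The function names and the orthonormal scaling are choices made for this illustration (scaling conventions differ between texts); with this scaling the type-III transform undoes the type-II transform exactly.

```python
import math


def dct2(x):
    """Type-II DCT of a real sequence, with orthonormal scaling."""
    n_points = len(x)
    out = []
    for k in range(n_points):
        total = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_points)
                    for n in range(n_points))
        scale = math.sqrt(1 / n_points) if k == 0 else math.sqrt(2 / n_points)
        out.append(scale * total)
    return out


def dct3(coeffs):
    """Type-III DCT with matching scaling, i.e. the inverse of dct2 above."""
    n_points = len(coeffs)
    out = []
    for n in range(n_points):
        total = math.sqrt(1 / n_points) * coeffs[0]
        total += sum(math.sqrt(2 / n_points) * coeffs[k]
                     * math.cos(math.pi * (n + 0.5) * k / n_points)
                     for k in range(1, n_points))
        out.append(total)
    return out


samples = [12, 16, 19, 12, 11, 27, 51, 47]
coeffs = dct2(samples)
restored = dct3(coeffs)
print([round(c, 2) for c in coeffs])
print([round(v, 6) for v in restored])   # matches samples up to rounding error
```

A constant input puts all of its energy into the first (DC) coefficient, which is why smooth image regions compress well after the transform.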
30Video compression in MPEG
- Model of a possible encoder
31Synchronisation
- Synchronisation in the multimedia context
- refers to the mechanism that ensures a
temporal consistent presentation of the
audio-visual information to the user
32Video Compression
- A video consists of a time-ordered sequence of frames, i.e., images.
- An obvious solution to video compression would be predictive coding based on previous frames.
- Compression proceeds by subtracting images: subtract in time order and code the residual error.
- It can be done even better by searching for just the right parts of the image to subtract from the previous frame.
33Video Compression with Motion Compensation
- Consecutive frames in a video are similar: temporal redundancy exists.
- Temporal redundancy is exploited so that not every frame of the video needs to be coded independently as a new image.
- The difference between the current frame and other frame(s) in the sequence is coded instead: small values and low entropy, good for compression.
34Steps of Video Compression Based on Motion Compensation (MC)
- 1. Motion Estimation (motion vector search).
- 2. MC-based Prediction.
- 3. Derivation of the prediction error, i.e., the difference.
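The three steps can be sketched on a toy example. The code below is a minimal illustration, not any standard's actual search: it assumes a full search over a window of +/-p pixels, the sum of absolute differences (SAD) as the matching criterion, and a motion vector that points from the current block to its best match in the reference frame. All function names are made up for this sketch.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between an n x n block of ref and of cur."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(n) for i in range(n))


def motion_search(ref, cur, cx, cy, n, p):
    """Step 1: full search in a +/-p window; returns (dx, dy) with lowest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                cost = sad(ref, cur, rx, ry, cx, cy, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]


def prediction_error(ref, cur, cx, cy, n, mv):
    """Steps 2 and 3: predict the block from ref using mv, then subtract."""
    dx, dy = mv
    return [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
             for i in range(n)] for j in range(n)]


# Toy frames: the current frame is the reference shifted right by 2 pixels.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y][x - 2] if x >= 2 else 0 for x in range(8)] for y in range(8)]

mv = motion_search(ref, cur, cx=4, cy=2, n=2, p=2)
print(mv)                                          # (-2, 0)
print(prediction_error(ref, cur, 4, 2, 2, mv))     # [[0, 0], [0, 0]]
```

Because the match is perfect here, the prediction error is all zeros; in real video it is small but nonzero, and that residual is what gets transformed and coded.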
35H.261 Video Coding
H.261 is an early digital video compression
standard; its principle of MC-based compression
is retained in all later video compression
standards. The standard was designed for
videophone, video conferencing and other
audiovisual services over ISDN. The video codec
supports bit rates of p x 64 kbps, where p ranges
from 1 to 30 (hence also known as p x 64). It requires
that the delay of the video encoder be less than
150 msec so that the video can be used for
real-time bidirectional video conferencing.
37H.261 Frame Sequence
Two types of image frames are defined:
Intra-frames (I-frames) and Inter-frames
(P-frames). Motion vectors in H.261 are always
measured in units of full pixels and they have a
limited range of ±15 pixels, i.e., p = 15.
38Inter-frame (P-frame) Predictive Coding in H.261
- H.261 P-frame coding scheme based on motion compensation
- For each macroblock in the Target frame, a motion vector is allocated by one of the search methods discussed earlier.
- After the prediction, a difference macroblock is derived to measure the prediction error.
- Each of these 8x8 blocks goes through DCT, quantization, zigzag scan and entropy coding procedures.
39- The P-frame coding encodes the difference macroblock (not the Target macroblock itself).
- Sometimes a good match cannot be found, i.e., the prediction error exceeds a certain acceptable level.
- The MB itself is then encoded (treated as an Intra MB), and in this case it is termed a non-motion-compensated MB.
- For the motion vector, the difference MVD is sent for entropy coding
- MVD = MVPreceding - MVCurrent
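A short sketch of the motion vector difference as written on the slide, MVD = MVPreceding - MVCurrent, applied along a row of macroblocks. Using (0, 0) as the predictor for the first macroblock is an assumption made for this example, and the function name is illustrative.

```python
def motion_vector_differences(motion_vectors):
    """MVD = MV of the preceding macroblock minus MV of the current one."""
    preceding = (0, 0)            # assumed predictor for the first macroblock
    differences = []
    for current in motion_vectors:
        differences.append((preceding[0] - current[0],
                            preceding[1] - current[1]))
        preceding = current
    return differences


vectors = [(3, 1), (3, 1), (4, 1), (4, 2)]
print(motion_vector_differences(vectors))
# [(-3, -1), (0, 0), (-1, 0), (0, -1)]
```

Neighbouring macroblocks tend to move together, so most differences are zero or close to it, which suits entropy coding.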
40P-frame coding
41H.261 Encoder
42H.261 Decoder
43H.261 Bitstream
44MPEG frame sequence
45Video Parsing (1)
- A shot is a sequence of continuous frames representing a continuous action in time and space.
- Video parsing is the process of dividing a video stream into shots based on the contents of the video.
- Video parsing enables subsequent video processing, such as video indexing and video editing.
- The two main types of shot transitions are sharp transitions and dissolve transitions.
46Video Parsing (2)
- A sharp transition is an abrupt transition between two shots, which lasts only two frames.
- A dissolve transition is a gradual transition, involving fading in of one shot and fading out of another shot.
[Figure: plots of w against t for shot 1 and shot 2, for a sharp transition and for a dissolve transition]
47MPEG Video Stream (3)
- MPEG allows video frames to be coded in one of three formats: intra-frame (I), forward predicted (P), and bidirectionally predicted (B).
- The frames from one I frame up to the next form a group of pictures (GOP).
- I frames are coded independently of other frames.
- P and B frames are coded using macroblock-based motion compensation.
- Each macroblock in a P frame can be skipped, forward predicted, or intra-coded.
- Each macroblock in a B frame can be skipped, forward predicted, backward predicted, bidirectionally predicted, or intra-coded.
48Video Frame Coding Techniques
I P P P P P P
I B B P B B P
49What is a Codec?
- A video codec is a device or software module that
enables video compression or decompression for
digital video.
50Why are codecs important?
- Today, digital video codecs are found in DVDs and on the Internet. Online video material is encoded in a variety of codecs, and this has led to the availability of codec packs.
- A codec pack is a pre-assembled set of commonly used codecs combined with an installer, available as a software package for computers.
51Terms to know for video codecs
- Video compression refers to reducing the quantity of data used to represent video content without excessively reducing the quality of the picture.
- Bitrate or data rate is the number of bits used per unit of time to represent a continuous medium such as audio or video after compression. It is measured in bits per second (bps).
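As a worked example of bitrate, the sketch below computes the uncompressed bitrate of a video stream and compares it with a compressed target. The frame size is taken from elsewhere in these slides; the frame rate (30 fps), colour depth (24 bits per pixel) and the 1.5 Mb/s target are assumptions chosen for illustration.

```python
def raw_bitrate(width, height, fps, bits_per_pixel):
    """Bits per second needed to send every pixel of every frame uncompressed."""
    return width * height * bits_per_pixel * fps


raw = raw_bitrate(352, 240, fps=30, bits_per_pixel=24)   # assumed 30 fps, 24 bpp
target = 1_500_000                                        # assumed 1.5 Mb/s target

print(raw)                      # 60825600 bits per second
print(f"{raw / target:.1f}:1")  # 40.6:1
```

So fitting this stream into 1.5 Mb/s would need a compression ratio of roughly 40:1 under these assumptions.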
52Types of compression
- Temporal compression: compression by removing information that is redundant between frames.
- Spatial compression: compression by removing redundant information among the pixels within a frame.
53Common video formats
- AVI-Audio Video Interleave is a multimedia
container format made by Microsoft in November
1992 as part of the Video for Windows technology.
AVI files contain both audio and video data in a
standard container that allows simultaneous
playback.
54Common video formats
- DivX, Xvid, and 3ivx: popular codecs that are commonly used today. These codecs implement MPEG-4 Part 2 and use both spatial and temporal compression.
55Common video codecs
- H.264 (also known as MPEG-4 AVC): developed by the ITU-T jointly with ISO/IEC MPEG. The earlier H.261, also from the ITU-T, was the first practical digital video compression standard. This format uses a type of temporal compression.
56Common video codecs
- WMV (Windows Media Video): Microsoft's video codec designs, including WMV 7, WMV 8, and WMV 9. WMV can do high-resolution video, but it is known for heavily lossy compression due to its use of spatial compression.
57Common video codecs
- .rmvb (Real Media Variable Bitrate): an increasingly popular format used for video today. Its increasing popularity is due to its surprisingly good compression ratio. It achieves this by varying the bitrate of the video according to the amount of action: the less action there is, the smaller the bitrate at that point. This is a type of temporal compression.
58Common video formats
- MKV(Matroska) is a project to develop an open
multimedia container format similar to MPEG's
MP4. MKV is a unique format because it allows
multiple streams of audio and video to be encoded
together. Each stream is also encoded in another
format such as MPEG4 or avi.
59More video codecs
- VOB (DVD-Video Object) a container format
contained in a DVD. VOB files are very similar to
MPEG-2 files and can even have their extensions
changed to mpeg2 and still play.
60Revisiting JPEG
61Definitions in the JPEG Standard
- Three levels of definition
- Baseline system (every codec must implement it)
- Extended system (methods to extend the baseline system)
- Special lossless function (ensures lossless compression/decompression)
62Overview of JPEG Components
- Components describing four levels of JPEG compression are
- Baseline sequential codec (consists of three steps: formation of DCT coefficients, quantization, and entropy encoding)
- DCT progressive mode (multiple scans refining the image)
- Predictive lossless encoding (simple predictive method)
- Hierarchical mode (provides multiple resolutions)
63Sequential JPEG Encoder and Decoder
Encoder: Source Image Data (8x8 blocks) -> Forward Discrete Cosine Transform -> Quantizer -> Entropy Encoder -> Compressed Image Data
(the Quantizer and the Entropy Encoder each use a Table Specification)
Decoder: Compressed Image Data -> Entropy Decoder -> Dequantizer -> Inverse DCT -> Reconstructed Image Data
(the Entropy Decoder and the Dequantizer each use a Table Specification)
64Benefits Provided by DCT
- DCT is close to the optimal transform (the Karhunen-Loève transform) for large classes of images
- DCT is an orthogonal transform: it allows conversion of the spatial representation of an 8x8 image to the frequency domain
- therefore reducing the number of data points
- DCT coefficients are easily quantized to achieve good compression
- DCT algorithm is efficient and easy to implement
- DCT algorithm is symmetrical
65DCT Calculation
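The formula for the discrete cosine transform
(creating DCT coefficients) of an 8x8 block f(x,y), in the form used by JPEG, is
$$F(u,v) = \frac{1}{4}\,C(u)\,C(v)\sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,\cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16}$$
where $C(k) = 1/\sqrt{2}$ for $k = 0$ and $C(k) = 1$ otherwise.
The formula for the inverse discrete cosine transform
(to restore the original pixel information) is
$$f(x,y) = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\,C(v)\,F(u,v)\,\cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16}$$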
66Quantization
- Quantization is a process that attempts to determine what information can be safely discarded without a significant loss in visual fidelity (lossy stage)
- Based on a set of quantization tables derived from empirical experimentation
- The quantized coefficient is described by the following equation
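The equation itself did not survive in this transcript. The numbers in the JPEG encoding example in this deck are consistent with dividing each DCT coefficient by the matching quantization table entry and dropping the fractional part; the JPEG standard itself rounds to the nearest integer instead. The sketch below reproduces the first three rows of that example under the truncation assumption. The table formula 3 + 2(i + j) is inferred from the table values shown in the example, and the function names are illustrative.

```python
def quantization_table(n=8):
    """Table from the example: entry (i, j) equals 3 + 2 * (i + j)."""
    return [[3 + 2 * (i + j) for j in range(n)] for i in range(n)]


def quantize(coefficient, step):
    """Divide and drop the fractional part (truncate toward zero).

    Assumption: truncation matches the numbers in the slides;
    the JPEG standard specifies rounding to the nearest integer.
    """
    return int(coefficient / step)


# First three rows of the "block after FDCT" in the example.
dct_rows = [
    [185, -17, 14, -8, 23, -9, -13, -18],
    [20, -34, 26, -9, -10, 10, 13, 6],
    [-10, -23, -1, 6, -18, 3, -20, 0],
]

table = quantization_table()
quantized_rows = [[quantize(c, q) for c, q in zip(row, table[i])]
                  for i, row in enumerate(dct_rows)]
for row in quantized_rows:
    print(row)
# [61, -3, 2, 0, 2, 0, 0, -1]
# [4, -4, 2, 0, 0, 0, 0, 0]
# [-1, -2, 0, 0, -1, 0, -1, 0]
```

Larger table entries at high frequencies push most of those coefficients to zero, which is where the loss (and the compression) comes from.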
67JPEG Encoding Example
(a) Original 8x8 block
140 144 147 140 140 155 179 175
144 152 140 147 140 148 167 179
152 155 136 167 163 162 152 172
168 145 156 160 152 155 136 160
162 148 156 148 140 136 147 162
147 167 140 155 155 140 136 162
136 156 123 167 162 144 140 147
148 155 136 155 152 147 147 136
(b) Shifted block
12 16 19 12 11 27 51 47
16 24 12 19 12 20 39 51
24 27 8 39 35 34 24 44
40 17 28 32 24 27 8 32
34 20 28 20 12 8 19 34
19 39 12 27 27 12 8 34
8 28 -5 39 34 16 12 19
20 27 8 27 24 19 19 8
(c) Block after FDCT
185 -17 14 -8 23 -9 -13 -18
20 -34 26 -9 -10 10 13 6
-10 -23 -1 6 -18 3 -20 0
-8 -5 14 -14 -8 -2 -3 8
-3 9 7 1 -11 17 18 15
3 -2 -18 8 8 -3 0 -6
8 0 -2 3 -1 -7 -1 -1
0 -7 -2 1 1 1 -6 0
(d) Quantization Table
3 5 7 9 11 13 15 17
5 7 9 11 13 15 17 19
7 9 11 13 15 17 19 21
9 11 13 15 17 19 21 23
11 13 15 17 19 21 23 25
13 15 17 19 21 23 25 27
15 17 19 21 23 25 27 29
17 19 21 23 25 27 29 31
(e) Block after quantization
61 -3 2 0 2 0 0 -1
4 -4 2 0 0 0 0 0
-1 -2 0 0 -1 0 -1 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 -1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
(f) Zig-zag sequence
61,-3,4,-1,-4,2,0,2,-2,0,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,-1,0,0,-1,0,0,0,
0,-1,0,0,0,0,0,0,0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
(g) Intermediate symbol sequence
(6)(61),(0,2)(-3),(0,3)(4),(0,1)(-1),(0,3)(-4),(0,2)(2),(1,2)(2),(0,2)(-2),
(5,2)(2),(3,1)(1),(6,1)(-1),(2,1)(-1),(4,1)(-1),(7,1)(-1),(0,0)
(h) Encoded bit sequence (total 98 bits)
11101111010010010000010001101101101110010111 11111011 11011101011111011011100011101101111101001 010
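The zig-zag scan and the intermediate symbols of this example can be reproduced with a short sketch. It is a simplified illustration: symbols are written as (zero run, size, amplitude), the end-of-block marker is the string "EOB" (written (0,0) on the slide), and the special handling JPEG needs for runs longer than 15 zeros is left out because the example has none. Function names are made up for this sketch.

```python
def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):          # s = row + col along one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked upwards (row decreasing), odd ones downwards.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def zigzag(block):
    """Read a square block in zig-zag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]


def ac_symbols(sequence):
    """(zero_run, size, amplitude) for each nonzero AC value, then 'EOB'."""
    symbols, run = [], 0
    for value in sequence[1:]:          # sequence[0] is the DC coefficient
        if value == 0:
            run += 1
        else:
            symbols.append((run, abs(value).bit_length(), value))
            run = 0
    symbols.append("EOB")
    return symbols


# Block after quantization, from the example.
quantized = [
    [61, -3, 2, 0, 2, 0, 0, -1],
    [4, -4, 2, 0, 0, 0, 0, 0],
    [-1, -2, 0, 0, -1, 0, -1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, -1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

sequence = zigzag(quantized)
print(sequence[:15])   # [61, -3, 4, -1, -4, 2, 0, 2, -2, 0, 0, 0, 0, 0, 2]
print(ac_symbols(sequence))
```

The scan groups the nonzero low-frequency values at the front and leaves a long tail of zeros, which the run-length symbols and the end-of-block marker then encode cheaply.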
68Video Compression
- Utilizes two basic compression techniques
- Interframe compression (temporal)
- compression between frames
- designed to minimize data redundancy in successive pictures
- Intraframe compression (spatial)
- occurs within individual frames
- designed to minimize the duplication of data in each picture
69Classification of Scalable Video Compression
Techniques
- DCT-based schemes
- H.261
- H.263
- MPEG1
- MPEG2
- Wavelet/sub-band
- Fractal-based
- Image segmentation/region based
- MPEG4
70Various MPEG Standards
- MPEG-1
- 320x240 full-motion video
- 1.5 Mb/s
- MPEG-2
- higher resolution and transmission rate
- defines different levels (profiles) for scalability
- MPEG-4
- full-motion video at low bitrate (9-40 Kbps)
- intended for interactive multimedia, video telephony
71MPEG Compression Standards
- Implements both intraframe and interframe coding
- Intraframe is DCT-based and very similar to JPEG
- 1. Conversion to YUV color space
- 2. DCT
- 3. Scalar quantization
- 4. RLE and Huffman encoding
- Interframe uses block-based motion compensation
- utilized for reducing temporal redundancy
- MPEG is an asymmetric algorithm
72MPEG Picture Types
- Three types of pictures
- Intrapictures (I)
- Unidirectional predicted pictures (P)
- Bidirectional predicted pictures (B)
- Grouped together (typically 12 pictures) in GOPs
73Motion Compression for Coding MPEG
I B B B P B B B I
Forward prediction: P = f(I)
Bidirectional prediction: B = f(I, P)
74Motion Compensation
- Based on the assumption that the current picture is some translation of the previous one
- Generates a motion vector for each block
- Matching is done by
- prediction (requires only current and reference frame)
- interpolation (in relation to two frames, one from the past and one from the future)
- In MPEG (unlike H.261) motion compensation is applied bidirectionally
75Non-DCT Based Compression Techniques
- Fractal Compression
- A digitized image is broken into segments by applying fractal mathematics
- Works by reducing an image to a set of mathematical functions
- Asymmetric algorithm
- Main advantage: dimension is independent of the scale of the object
- Implemented within Progressive Networks RealVideo
- Subband/Wavelet Coding
- Most widely used technique is wavelet transform
- Used in VDONet and Vxtreme products