Title: Overview of Digital Video

1 Overview of Digital Video (1)
- Digital video refers to video information that is stored and transmitted in digital form.
- Analog video uses a bandwidth of only a few megahertz, but the bit rate of the digital signal for transmitting the same video content is typically over 100 Mbps, which is too high to be feasible for most of today's networks.
- Video compression is the solution for both stored video and video transmission over networks. Video compression techniques have been evolving for the past two decades.
- Due to advances in processor technology and the development of international video compression standards, a wide range of video communication applications has been developed in recent years.
2 Overview of Digital Video (2)
- Representation of Video Information
- A color image is represented in terms of component signals.
- A color can be synthesized by combining three color primaries: red, green and blue (RGB).
- Each of the three primaries contains information about luminance (brightness) and chrominance (color), which can be represented separately.
- A luminance signal (Y) can be produced from a weighted sum of the R, G, and B components.
- The chromaticity of the color can be represented by the color difference signals (a small numeric sketch follows below):
  - Cr = wr (R - Y)
  - Cb = wb (B - Y)
  - Cg = wg (G - Y)
- where wr, wb, and wg are weighting factors.
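A minimal numeric sketch of these relations in Python, assuming the BT.601 luma weights (0.299, 0.587, 0.114) and the common choice of wr and wb that scales Cr and Cb into [-0.5, 0.5]; the function name is illustrative, not from any standard library:

    # Luminance as a weighted sum of R, G, B (BT.601 weights), and the
    # Cb/Cr color differences scaled into [-0.5, 0.5].
    def rgb_to_ycbcr(r, g, b):
        """r, g, b normalized to [0, 1]."""
        y = 0.299 * r + 0.587 * g + 0.114 * b
        cb = 0.564 * (b - y)   # wb = 0.5 / (1 - 0.114)
        cr = 0.713 * (r - y)   # wr = 0.5 / (1 - 0.299)
        return y, cb, cr

    print(rgb_to_ycbcr(1.0, 0.0, 0.0))  # pure red: Y = 0.299, Cr at its maximum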
3 Overview of Digital Video (3)
- Standards for Analog Color TV
- NTSC (National Television System Committee) is used in the USA and Japan. NTSC uses 525 lines per frame, and its field rate is 60 Hz (i.e. 30 frames per second).
- The PAL (Phase Alternation Line) system is used in most of Western Europe.
- SECAM (Sequentiel Couleur avec Memoire) is used in France and in parts of Eastern Europe.
- In both PAL and SECAM systems, each frame consists of 625 lines, and the field rate is 50 Hz (i.e. 25 frames per second).
- All the above systems use three components: luminance Y, blue color difference U (equivalent to Cb), and red color difference V (equivalent to Cr).
4 Overview of Digital Video (4)
- ITU-R (International Telecommunication Union - Radiocommunication) Standard
- The ITU-R (formerly CCIR) provides a standard method of encoding television information in digital form.
- The luminance and color difference components are sampled with a precision of 8 bits.
- The luminance component of an NTSC frame is sampled to produce an image of 525 lines, each containing 858 samples. The active area of the digitized frame is 720 x 486 picture elements (pixels).
- The luminance component of a PAL or SECAM frame is sampled with 625 lines and 864 samples, and the active area is 720 x 576 pixels.
- The color difference signals are sampled with the same vertical resolution (486 or 576 active lines) but with the horizontal resolution halved.
- Only the odd-numbered luminance pixels in each line have associated color difference pixels.
5 Overview of Digital Video (5)
Bit rate of the CCIR 601 digital TV signal

NTSC original bit rate = 30 x 8 x ((858 x 525) + (429 x 525 x 2)) = 216.216 Mbps
- 30 frames per second
- 858 x 525 luminance samples
- 429 x 525 x 2 chrominance samples
- 8 bits per sample

PAL/SECAM original bit rate = 25 x 8 x ((864 x 625) + (432 x 625 x 2)) = 216.0 Mbps
- 25 frames per second
- 864 x 625 luminance samples
- 432 x 625 x 2 chrominance samples
- 8 bits per sample
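The arithmetic above can be checked directly; a small Python sketch (the function name and parameter names are ours):

    def ccir601_bit_rate(fps, luma_w, lines, chroma_w):
        luma = luma_w * lines              # luminance samples per frame
        chroma = chroma_w * lines * 2      # two color-difference components
        return fps * 8 * (luma + chroma)   # 8 bits per sample

    print(ccir601_bit_rate(30, 858, 525, 429) / 1e6)  # 216.216 (NTSC, Mbps)
    print(ccir601_bit_rate(25, 864, 625, 432) / 1e6)  # 216.0 (PAL/SECAM, Mbps)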
6 Why Video Compression?
A single digital TV signal in CCIR 601 format needs a 216 Mbps bit rate, which is unacceptable for most practical network transmission. On the other hand, a traditional analog TV signal of similar quality only requires 6 to 7 MHz of bandwidth. Obviously, network transmission of the original digital TV signal is too expensive to be practical. Before digital TV or video can be transmitted through the network, the data rate needs to be reduced. This means compressing (encoding) the digital video information prior to network transmission and decompressing (decoding) the received video information before displaying it. Digital video information contains a considerable amount of redundancy: video data is usually highly correlated both spatially and temporally. This redundancy can be removed by coding the data in a more efficient way.
7 Overview of Compression Methods (1)
- Types of redundancy
- Spatial redundancy - the values of neighboring pixels are strongly correlated in almost all natural images.
- Redundancy in scale - important image features such as straight edges and constant regions are invariant under rescaling.
- Redundancy in frequency - in images composed of more than one spectral band, the spectral values for the same pixel location are often correlated, and an audio signal can completely mask a sufficiently weak signal in its frequency vicinity.
- Temporal redundancy - adjacent frames in a video sequence often show very little change, and a strong audio signal in a given time block can mask a suitably weaker distortion in a previous or future block.
- Stereo redundancy - audio coding methods can take advantage of the correlations between stereo channels.
8 Overview of Compression Methods (2)
Characteristics of Compression Methods
9 Overview of Compression Methods (3)
[Figure: relation between perceptible quality and required bandwidth - lossless compression preserves quality but still needs high bandwidth, while lossy compression trades quality for lower bandwidth.]
10 Overview of Compression Methods (4)
Coding Techniques for Multimedia Systems
11 Overview of Compression Methods (5)
- Entropy coding (lossless)
- Entropy is defined as the average information content of given data. It defines the minimum number of bits needed to represent the information content without information loss.
- Entropy coding is a lossless technique that tries to achieve this theoretical lower limit (see the sketch at the end of this slide).
- Source coding (lossy)
- It distinguishes relevant and irrelevant data.
- It takes the semantics of the data into account and removes the irrelevant data, so that the original data stream can be compressed.
- Hybrid coding (lossy)
- This is a combination of entropy coding and source coding.

[Figure: uncompressed data passes through a preparation stage and source coding steps, then entropy coding steps, to produce compressed data.]
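As a concrete illustration of the entropy bound mentioned above, a short Python sketch computing Shannon entropy for a toy symbol distribution (the probabilities are the ones reused in the Huffman example on the next slide):

    import math

    def entropy(probs):
        """Average information content in bits per symbol."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # The lossless lower bound for the source used in the Huffman example:
    print(entropy([3/4, 1/8, 1/16, 1/16]))  # ~1.186 bits/symbol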
12 Basic Coding Methods (1)
- Entropy coding does not use the semantics of the data; only the bit stream is considered.
- Run-Length Coding
  - Uncompressed data: ABCCCCCCCDEFGH
  - Run-length coded: AB!7CDEFGH
- Huffman Coding
  - p(A) = 3/4, p(B) = 1/8, p(C) = p(D) = 1/16
  - w(A) = 1, w(B) = 01, w(C) = 001, w(D) = 000

[Figure: Huffman code tree whose branches are labeled 0 and 1; reading the labels from root to leaf gives each codeword. Both techniques are sketched in code below.]
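Both techniques can be sketched in a few lines of Python; the "!" run marker and the exact 0/1 labeling of the Huffman tree are illustrative conventions, so the generated codewords may differ from the slide's by bit relabeling while the code lengths match:

    import heapq
    from itertools import groupby

    # Run-length coding: replace runs of >= 4 identical characters by
    # "!<count><char>"; the '!' marker convention is just an illustration.
    def rle_encode(data):
        out = []
        for ch, group in groupby(data):
            n = len(list(group))
            out.append(f"!{n}{ch}" if n >= 4 else ch * n)
        return "".join(out)

    print(rle_encode("ABCCCCCCCDEFGH"))  # -> "AB!7CDEFGH", as on the slide

    # Huffman coding for the probabilities on this slide.
    def huffman(probs):
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c0.items()}
            merged.update({s: "1" + w for s, w in c1.items()})
            heapq.heappush(heap, (p0 + p1, count, merged))
            count += 1
        return heap[0][2]

    print(huffman({"A": 3/4, "B": 1/8, "C": 1/16, "D": 1/16}))
    # e.g. {'A': '1', 'B': '00', 'C': '010', 'D': '011'};
    # the code lengths 1, 2, 3, 3 match the slide's codewords.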
13 Basic Coding Methods (2)
- Arithmetic Coding
- This coding method often generates slightly better results in audio and video coding because it works with floating-point intervals instead of the per-character codes used in Huffman coding.
- The floating-point arithmetic is computationally more expensive.
- It has been shown that the average compression achieved by arithmetic and Huffman coding is very similar.
- Algorithms for arithmetic coding are covered by patents held by IBM, AT&T and Mitsubishi.
14 Basic Coding Methods (3)
- Discrete Cosine Transform Coding
- Pixels are grouped into blocks, which are transformed into another domain to form a set of coefficients; these coefficients are coded and transmitted.
- Compression is achieved by quantizing the coefficients so that the useful coefficients are transmitted and the remaining coefficients are discarded.
- The most effective compaction is achieved using the Karhunen-Loeve Transform (KLT), but it is very computationally intensive; the discrete cosine transform (DCT) is a widely used alternative to the KLT.
- A DCT-based image coding system usually consists of the following steps:
  - Separate the image into blocks
  - Discrete Cosine Transform
  - Quantization
  - Encoding
15 Basic Coding Methods (4)
Discrete Cosine Transform
The DCT converts a block of pixels into a block of transform coefficients of the same dimensions. These coefficients represent the spatial frequency components that make up the original pixel block. Each coefficient can be thought of as a weight that is applied to an appropriate basis function.

[Figure: the 64 DCT basis functions; the top-left pattern is the DC basis function, horizontal frequency increases to the right, and vertical frequency increases downward.]
16 Basic Coding Methods (5)
- A gray-scale 8 x 8 pixel block can be fully represented by a weighted sum of these 64 basis functions.
- The appropriate weights that are required to produce a particular block are the DCT coefficients for that block.
- The two-dimensional DCT of an N x N block of pixel values is

  F(u,v) = (2/N) C(u) C(v) sum_{i=0..N-1} sum_{j=0..N-1} f(i,j) cos[(2i+1)u*pi/2N] cos[(2j+1)v*pi/2N]

  and the inverse DCT is

  f(i,j) = (2/N) sum_{u=0..N-1} sum_{v=0..N-1} C(u) C(v) F(u,v) cos[(2i+1)u*pi/2N] cos[(2j+1)v*pi/2N]

  where F(u,v) is the transform coefficient, f(i,j) is the pixel value, and C(k) = 1/sqrt(2) for k = 0, C(k) = 1 otherwise.
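A direct transcription of the forward transform into Python (a naive O(N^4) sketch purely to mirror the formula; real codecs use fast algorithms):

    import math

    def dct2(block):
        """Direct 2-D DCT of an N x N block, straight from the formula above."""
        n = len(block)
        c = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
        out = [[0.0] * n for _ in range(n)]
        for u in range(n):
            for v in range(n):
                s = sum(block[i][j]
                        * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * j + 1) * v * math.pi / (2 * n))
                        for i in range(n) for j in range(n))
                out[u][v] = (2 / n) * c(u) * c(v) * s
        return out

    # A flat block has all its energy in the DC coefficient F(0,0).
    flat = [[100] * 8 for _ in range(8)]
    print(round(dct2(flat)[0][0], 3))  # 800.0; every other coefficient is ~0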
17 Basic Coding Methods (6)
- Each coefficient in the forward transform is calculated by summing 64 separate terms; this results in a total of 64 x 64 = 4,096 calculations for transforming an 8 x 8 block of pixels.
- The above calculation process can be replaced by a one-dimensional transform along all the rows and then all the columns of the block.
- Since each coefficient in the one-dimensional transform requires 8 calculations, either a row or a column needs 8 x 8 = 64 calculations; this results in 64 x 8 (for 8 rows) + 64 x 8 (for 8 columns) = 1,024 calculations.
- The computational complexity can be further reduced by replacing the cosine form of the transform with an algorithm that only performs a series of multiplication and addition operations.
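The row-column decomposition can be checked against the direct formula with a short sketch (assuming the same normalization as on the previous slide):

    import math

    def dct1(vec):
        """1-D DCT of a length-N vector (normalization matching the 2-D formula)."""
        n = len(vec)
        c = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
        return [math.sqrt(2 / n) * c(u)
                * sum(x * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                      for i, x in enumerate(vec))
                for u in range(n)]

    def dct2_separable(block):
        """Row-column decomposition: 1-D DCT on every row, then every column."""
        rows = [dct1(row) for row in block]          # 8 transforms of 64 ops each
        cols = [dct1(list(col)) for col in zip(*rows)]
        return [list(r) for r in zip(*cols)]

    flat = [[100] * 8 for _ in range(8)]
    print(round(dct2_separable(flat)[0][0], 3))      # 800.0, same as the direct 2-D DCT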
18 Basic Coding Methods (7)
[Figure: how block features map to the frequency distribution of DCT coefficients - the DC coefficient sits at the top-left of the block; vertical, diagonal and horizontal block features concentrate energy in different regions, with frequency increasing from low through medium to high away from the DC corner.]
19 Basic Coding Methods (8)
Zigzag Sequence: the AC coefficients are scanned in order of increasing frequency (a sketch that generates this order follows below).
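A sketch that generates this scan order for an 8 x 8 block; coefficients on the same anti-diagonal share the "frequency" u + v, and the scan direction alternates between diagonals:

    def zigzag_order(n=8):
        order = []
        for d in range(2 * n - 1):                   # anti-diagonal index u + v
            cells = [(u, d - u) for u in range(n) if 0 <= d - u < n]
            order.extend(cells if d % 2 else cells[::-1])
        return order

    print(zigzag_order()[:10])
    # [(0,0), (0,1), (1,0), (2,0), (1,1), (0,2), (0,3), (1,2), (2,1), (3,0)]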
20 Basic Coding Methods (9)
- Differential Pulse Code Modulation (DPCM)
- The image is scanned in a raster fashion.
- Each pixel is represented as a number with a limited precision.
- A pixel value predicted from the previously transmitted pixels is used in place of the actual value.
- The prediction error between the predicted pixel value and the actual value is quantized and transmitted.
- Encoding the quantized error using variable-length codes achieves further compression.
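A minimal DPCM sketch, assuming a previous-pixel predictor, a uniform quantizer with step size 4, and an 8-bit mid-gray start value (all illustrative choices):

    def dpcm_encode(pixels, step=4):
        pred, errors = 128, []               # assumed mid-gray start value
        for x in pixels:
            q = round((x - pred) / step)     # quantized prediction error
            errors.append(q)
            pred = pred + q * step           # track the decoder's reconstruction
        return errors

    def dpcm_decode(errors, step=4):
        pred, out = 128, []
        for q in errors:
            pred = pred + q * step
            out.append(pred)
        return out

    line = [100, 102, 104, 109, 120, 140, 150, 150]
    print(dpcm_decode(dpcm_encode(line)))    # tracks the input to within step/2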
21 Basic Coding Methods (10)
- Motion-Compensated Prediction
- Temporal redundancies between two frames in a video sequence can be exploited.
- The idea is to look for a certain area in a previous or subsequent frame that very closely matches an area of the same size in the current frame.
- If successful, a best-matching block can be found, and the difference signal between the intensity values of the block in the current frame and the block in the reference frame is calculated.
- The motion vector, which represents the translation of corresponding blocks in both the x- and y-direction, is determined.
- The difference signal and the motion vector represent the deviation between the reference block and the predicted block; together they are called the prediction error. A block-matching sketch follows below.
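A toy full-search block-matching sketch in Python/NumPy; the block size, search range and sum-of-absolute-differences criterion are common choices but are not mandated by any standard:

    import numpy as np

    def best_match(prev, cur, top, left, bs=8, search=4):
        """Return the motion vector and difference signal for one block."""
        target = cur[top:top + bs, left:left + bs].astype(int)
        best = (None, np.inf)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if 0 <= y <= prev.shape[0] - bs and 0 <= x <= prev.shape[1] - bs:
                    cand = prev[y:y + bs, x:x + bs].astype(int)
                    sad = np.abs(target - cand).sum()   # sum of absolute differences
                    if sad < best[1]:
                        best = ((dy, dx), sad)
        (dy, dx), _ = best
        ref = prev[top + dy:top + dy + bs, left + dx:left + dx + bs].astype(int)
        return (dy, dx), target - ref

    rng = np.random.default_rng(0)
    prev = rng.integers(0, 256, (32, 32), dtype=np.uint8)
    cur = np.roll(prev, (2, -1), axis=(0, 1))      # frame content shifted by (2, -1)
    mv, err = best_match(prev, cur, 8, 8)
    print(mv, int(np.abs(err).sum()))              # (-2, 1) 0: exact match found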
22 Basic Coding Methods (11)
- There are three types of motion-compensated prediction:
  - Unidirectional motion-compensated prediction
  - Bidirectional motion-compensated prediction
  - Interpolative motion-compensated prediction

[Figure: forward motion-compensated prediction.]
23 Still Image Coding (1)
Image Coding

[Figure: generic image coding chain - the encoder applies an encoder model followed by an entropy encoder to the input image, producing compressed data; the decoder applies an entropy decoder followed by a decoder model to produce the image output.]
- Image coding techniques
- Predictive coding
- Discrete Cosine Transform Coding
- Subband coding
- Fractal coding
24 Still Image Coding (2)
- JPEG (Joint Photographic Experts Group)
- The JPEG standard defines 4 coding modes:
- Sequential DCT-based encoding - each component is encoded in a single scan, based on the DCT.
- Progressive DCT-based encoding - it uses multiple scans, each of which contains a partially encoded version of the image. The scans can be decoded sequentially, so that a rough image is quickly decoded and is then built up using further scans.
- Hierarchical encoding - each component is encoded at multiple resolutions, differing by a factor of two in each dimension.
- Lossless encoding - it is based on a DPCM system. This mode provides compression without any loss of quality, using more time-consuming algorithms.
25 Still Image Coding (3)
[Figure: the four coding modes of JPEG - sequential DCT, progressive DCT, sequential lossless, and hierarchical.]
26 Still Image Coding (4)
27 Still Image Coding (5)
- Image preparation
- A source image must have a rectangular format and consist of 1 to 255 planes or components, such as RGB or YUV. After the separation of components, each component is divided into data units of 8 x 8 pixel blocks.
- Picture processing
- The baseline mode compresses the data by applying a two-dimensional DCT, then quantizing and entropy coding the resulting DCT coefficients. A 64-element quantization table is associated with the 64 DCT coefficients (a quantization sketch follows below).
- Entropy encoding
- JPEG specifies both Huffman and arithmetic encoding for entropy coding.
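A sketch of the quantization step, using the example luminance table given in the informative annex of the JPEG specification (the table is not mandatory; codecs may scale it or supply their own):

    import numpy as np

    # Example luminance quantization table from the JPEG spec's informative
    # annex; step sizes grow toward high frequencies, which the eye resolves
    # poorly, so high-frequency coefficients tend to quantize to zero.
    Q = np.array([
        [16, 11, 10, 16, 24, 40, 51, 61],
        [12, 12, 14, 19, 26, 58, 60, 55],
        [14, 13, 16, 24, 40, 57, 69, 56],
        [14, 17, 22, 29, 51, 87, 80, 62],
        [18, 22, 37, 56, 68, 109, 103, 77],
        [24, 35, 55, 64, 81, 104, 113, 92],
        [49, 64, 78, 87, 103, 121, 120, 101],
        [72, 92, 95, 98, 112, 100, 103, 99]])

    def quantize(F):
        return np.rint(F / Q).astype(int)   # divide by the table and round

    def dequantize(Fq):
        return Fq * Q                       # decoder-side reconstruction

    F = np.zeros((8, 8)); F[0, 0], F[0, 1], F[7, 7] = 800.0, -44.0, 30.0
    print(quantize(F))   # 800/16 = 50 and -44/11 = -4 survive; 30/99 rounds to 0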
28 Coding of Moving Images
A video CODEC can be anything from the simplest A/D device through to something that does picture pre-processing and even has network adapters built into it. A CODEC usually does most of its work in hardware, but there is no reason not to implement everything in software on a reasonably fast processor. The most expensive and complex component of a CODEC is the compression/decompression part. There are a number of international standards and many proprietary compression techniques for video.
29 Moving Image Coding (1)
- H.261
- H.261 is the most widely used international video compression standard for video conferencing.
- This ITU (formerly CCITT) standard describes the video coding and decoding methods for the moving picture component of an audiovisual service at rates of p x 64 kbps, where p is in the range of 1 to 30.
- The standard targets, and is really only suitable for, applications using circuit-switched networks as their transmission channels, since ISDN with both basic and primary rate access was the communication channel considered within the framework of the standard.
- H.261 is usually used in conjunction with other control and framing standards such as H.221, H.230, H.242 and H.320, of which more later.
30 Moving Image Coding (2)
[Figure: processing steps of the H.261 video coder - the video signal enters the source coder, whose output passes through the video multiplex coder and the transmission buffer to the transmission coder, which emits the coded bit stream; a coding control block, driven by external control, steers the source coder and the buffer.]
31 Moving Image Coding (3)
- H.261 Image preparation
- The source coder operates only on non-interlaced pictures. Pictures are coded as a luminance and two color difference components (Y, Cb, Cr). The Cb and Cr matrices are half the size of the Y matrix.
- H.261 supports two image resolutions, CIF (Common Intermediate Format) and QCIF:

  Component   CIF         QCIF
  Y           352 x 288   176 x 144
  Cb          176 x 144   88 x 72
  Cr          176 x 144   88 x 72

- CIF and QCIF frames are divided into a hierarchical block structure consisting of picture, group of blocks (GOB), macro blocks, and blocks.
32 Moving Image Coding (4)
- Hierarchical block structure of H.261

  Structure Element   Description
  Picture (frame)     1 video picture
  Group of blocks     33 macro blocks
  Macro block         16 x 16 Y, 8 x 8 Cb, Cr
  Block               8 x 8 pixels (coding unit for DCT)

- H.261 uses two types of coding for macro blocks, intraframe and interframe. Intraframe coding takes no advantage of the redundancy between frames; beyond this, H.261 tries to make use of temporal redundancies by means of motion-compensated prediction.
33 Moving Image Coding (5)
- The first frame to be transmitted is always an intraframe-coded frame (i.e. all macro blocks are intraframe coded).
- The entire picture is divided into nonoverlapping 8 x 8 pixel blocks on which the forward DCT is applied.
- The resulting 64 DCT coefficients are quantized and zigzag-reordered.
- For interframe coding, the recently coded frame is decoded again within the encoder, using inverse quantization and inverse DCT.
- For the next frame to be encoded, the last previously coded and stored frame is used for deciding whether to intraframe- or interframe-code each macro block.
- The algorithm performs a unidirectional motion-compensated prediction, which uses the four luminance blocks of each macro block to find a close match in the previous frame for the macro block currently being encoded.
- If it cannot find a close match, it employs the same coding for the macro block as in intraframe coding.
34 Moving Image Coding (6)
Block Transformation
H.261 supports motion compensation in the encoder as an option. In motion compensation, a search area is constructed in the previous (recovered) frame to determine the best reference macroblock. Both the prediction error and the motion vectors specifying the value and direction of displacement between the encoded macroblock and the chosen reference are sent. The search area, as well as how to compute the motion vectors, is not subject to standardization. Both horizontal and vertical components of the vectors must have integer values in the range -15 to +15, though.
In block transformation, INTRA-coded frames as well as prediction errors are composed into 8 x 8 blocks. Each block is processed by a two-dimensional FDCT function. If this sounds expensive, there are fast table-driven algorithms, and it can be done quite easily in software, as well as very easily in hardware.
35 Moving Image Coding (7)
- The motion estimation process results in three possible decisions for coding a macro block:
  - Intracoding, where blocks of 8 x 8 pixels each are coded only with reference to themselves and are sent directly to the block transformation process.
  - Intercoding without motion compensation (the motion vector has zero value).
  - Intercoding with motion compensation.
- There is an optional filter between the DCT and the entropy coding process, which can be used to improve the image quality by removing high-frequency noise as needed.
- Quantization in H.261 is a linear function; the quantization step size depends on the amount of data in the transmission buffer, thereby generating a constant data rate at the output of the coder.
36 Moving Image Coding (8)
- A prediction error is calculated between a 16 x 16 pixel region (macroblock) and the corresponding (recovered) macroblock in the previous frame.
- Prediction errors of transmitted blocks (the transmission criterion is not standardized) are then sent to the block transformation process.
- Blocks are inter- or intra-coded:
  - Intra-coded blocks stand alone.
  - Inter-coded blocks are based on the prediction error between the previous frame and the current one.
- Intra-coded frames must be sent with a minimum frequency to avoid loss of synchronization between sender and receiver.
37 Moving Image Coding (9)
Quantization and Entropy Coding
The purpose of this step is to achieve further compression by representing the DCT coefficients with no greater precision than is necessary to achieve the required quality. The number of quantizers is 1 for the INTRA DC coefficients and 31 for all others. Entropy coding achieves extra (lossless) compression by assigning shorter code-words to frequent events and longer code-words to less frequent events. Huffman coding is usually used to implement this step.
In other words, for a given quality we can lose coefficients of the transform by using fewer bits than would be needed for all the values; this leads to a "coarser" picture. We can then entropy code the final set of values by using shorter words for the most common values and longer ones for rarer ones (like using 8 bits for three-letter words in English).
38 Moving Image Coding (11)
H.263
H.263 is a newer addition to the ITU H series and is aimed at extending the repertoire to video coding for low bit rate communication. This makes it suitable for a wide variety of Internet access line speeds, and therefore also probably reasonably friendly to many Internet Service Providers' backbone speeds.
Existing A/V standards, the basic technology of the CCD camera, and that of television and the general CRT dictate frame grabbing at some particular resolution and rate. The choice of resolution is complex: one could have a fixed number of pixels and a fixed aspect ratio, or allow a range of choices of line rate and sample rates. H.261 and MPEG chose the latter.
39 Moving Image Coding (12)
The line rate (a.k.a. Picture Clock Frequency, PCF) is 30,000/1001, or about 29.97 Hz, but one can also use multiples of this. The chosen resolution for H.263 is dx x dy for luminance, and the chrominance is just one half of this in both dimensions. H.263 then allows for sub-QCIF, which is 128 x 96 pixels, QCIF (176 x 144 pixels), CIF (352 x 288 pixels), 4CIF (SCIF in the INRIA IVS tool, 704 x 576 pixels) and 16CIF (1408 x 1152 pixels). The designer can also choose a pixel aspect ratio; the default is (288/3)/(352/4), which is 12:11 (as per H.261). The picture area covered by the standard formats has an aspect ratio of 4:3. Luminance and chrominance sample positions are as per H.261, discussed earlier in this chapter. The structure of the coder is just the same too, although there are now two additional modes, called the "slice" and "picture block" modes.
40 Moving Image Coding (13)
A block is 16 x 16 Y and 8 x 8 each of Cb and Cr. The Group of Blocks, or GOB, refers to k x 16 lines, where k is 1 when the picture has fewer than 400 lines. GOBs are numbered using a vertical scan starting from 0, the count depending on the number of lines in the picture. The number of GOBs per picture is then 6 for sub-QCIF, 9 for QCIF, and 18 for CIF (and for 4CIF and 16CIF, because of special rules). Prediction works on Intra, Inter, B, PB, EI or EP pictures (where the reference picture is smaller). The macroblock is 16 lines of Y and the corresponding 8 each of Cb and Cr, and we can receive one motion vector per macroblock.
H.263 extends H.261 to lower bit rates (not just the p x 64 kbps design goal) and adds more features for better quality and services, but the basic ideas are the same: intra- and interframe compression, and DCT block transform plus quantization.
41Moving Image Coding (14)
There are then a number of basic enhancements in
H.263 including 1. Continuous Presence
Multi-point and Video Multiplex mode - basically
4 in 1 sub-bit-stream transmission. This may
be useful for conferences, tele-presence,
surveillance and so on 2. Motion Vectors can
point outside picture 3. Arithmetic as well as
variable length coding (VLC) 4. Advanced
Prediction Mode which is also known as
Overlapped Block Motion Compensation'
uses 4 88 blocks instead of 1 1616, This gives
better detail. 5. PB Frames known as
combined Predictive and Bi-Directional frames
(like MPEG II). 6. FEC to help with
transmission loss Advanced Intra coding to help
with interpolation Deblocking Filter mode,
to remove blocking artifacts
42Moving Image Coding (15)
7. Slice Structured Mode (re-order blocks so
Slice layer instead of GOB layer is more
delay and loss tolerant for packet transport 8.
Supplemental Enhancement Information,
Freeze/Freeze Release and Enhancement and
Chroma Key (use external picture as
merge/background etc...for mixing). 9.
Improved PB mode, including 2 way motion vectors
in PB mode 10. Reference Picture Selection 11.
Temporal, SNR and Spatial Scalability mode this
allows receivers to drop B frames for
example - gives potential heterogeneity amongst
receivers of multicast. 12. Reduced
Resolution Update Independent Segment decoding
Alternate INTER VLC mode 13. Modified
Quantization mode (can adjust up or down the
amount of quantization to give fine
quality/bit-rate control.
43 Moving Image Coding (16)
Chroma keying is a commonly used technology in TV, e.g. for picture-in-picture and superimposing, for weather presenters and so on. The idea is to define some pixels in an image as "transparent" or "semi-transparent" and, instead of showing these, use a reference background image (c.f. transparent GIFs on the WWW). We need an octet per pixel to define the keying color for each of Y, Cb and Cr. The actual choice when there isn't an exact match is implementor-defined.
44 Moving Image Coding (17)
- MPEG
- The aim of the MPEG-2 video compression standard is to cater for the growing need for generic coding methods for moving images in various applications, such as digital storage and communication. So unlike the H.261 standard, which was specifically designed for the compression of moving images for video conferencing systems at p x 64 kbps, MPEG considers a wider scope of applications.
- Aimed at storage as well as transmission
- Higher cost and quality than H.261
- Higher minimum bandwidth
- Decoder is just about implementable in software
- Target 2 Mbps to 8 Mbps really
- The "CD" of video?
45Moving Image Coding (18)
MPEG Source Images format The source pictures
consist of three rectangular matrices of
integers a luminance matrix (Y) and two
chrominance matrices (Cb and Cr). The MPEG
supports three formats 420 format - In this
format the Cb and Cr matrices shall be one half
the size of the Y matrix in both horizontal and
vertical dimensions. 422 format - In this
format the Cb and Cr matrices shall be one half
the size of the Y matrix in horizontal
dimension and the same size in the vertical
dimension. 444 format - In this format the Cb
and Cr matrices will be of the same size as the
Y matrix in both vertical and horizontal
dimensions. It may be hard to convert to this,
but then this is targeted at digital video tape
and video on demand really.
46Moving Image Coding (19)
MPEG frames The output of the decoding process,
for interlaced sequences, consists of a series
of fields that are separated in time by a field
period. The two fields of a frame may be coded
independently (field-pictures) or can be coded
together as a frame (frame pictures). An MPEG
source encoder will consist of the following
elements - Prediction (3 frame times)
- Block Transformation - Quantization and
Variable Length Encoding The diagram in the
following shows the intra, predictive and
bi-directional frames that MPEG supports
47 Moving Image Coding (20)
[Figure: MPEG GOP structure over frames 1 to 9 - forward prediction runs from I and P frames to later P frames, while bidirectional prediction supplies the B frames in between.]
48 Moving Image Coding (21)
Structure of MPEG bitstream
Sequence layer
GOP layer
Picture layer
Slice layer
Macroblock layer
Block layer
49 Moving Image Coding (22)
- MPEG Prediction
- MPEG defines three types of pictures:
- Intrapictures (I-pictures)
  - These pictures are encoded only with respect to themselves. Each picture is decomposed into blocks of 8 x 8 pixels that are encoded only with respect to themselves and are sent directly to the block transformation process.
- Predictive pictures (P-pictures)
  - These are pictures encoded using motion-compensated prediction from a past I-picture or P-picture. A prediction error is calculated between a 16 x 16 pixel region (macroblock) in the current picture and the past reference I- or P-picture.
50Moving Image Coding (23)
A motion vector is also calculated to determine
the value and direction of the prediction. For
progressive sequences and interlaced sequences
with frame-coding only one motion vector will be
calculated for the P-pictures. For interlace
sequences with field-coding two motion vectors
will be calculated. The prediction error is then
composed to 8x8 pixels blocks and sent to the
block transformation Bi-directional pictures
(B-pictures) These are pictures encoded using
motion compensates predictions from a past
and/or future I-picture or P-picture. A
prediction error is calculated between a 16x16
pixels region in the current picture and the past
as well as future reference I-picture or
P-picture. Two motion vectors are calculated.
One to determine the value and direction of the
forward prediction the other to determine the
value and direction of the backward prediction.
51Moving Image Coding (24)
For field-coding pictures in interlaced sequences
four motion vectors will thus be calculated.
It should be noted that a B-picture can never
be used as a prediction picture. The method of
calculating the motion vectors as well as the
search area for the best predictor is left to be
determined by the encoder. MPEG Block
Transformation In block transformation, INTRA
coded blocks as well as prediction errors are
processed by a two-dimensional DCT function.
Quantization The purpose of this step is to
achieve further compression by representing the
DCT coefficients with no greater precision than
is necessary to achieve the required quality.
52 Moving Image Coding (25)
Variable Length Encoding
Here extra (lossless) compression is achieved by assigning shorter code-words to frequent events and longer code-words to less frequent events. Huffman coding is usually used to implement this step.
MPEG Picture Order
It must be noted that in MPEG the order of the pictures in the coded stream is the order in which the decoder processes them. The reconstructed frames are not necessarily in display order.
53 Moving Image Coding (26)
Multiplexing and Synchronizing
In networked multimedia standards, the multiplexing function defines the way that multiple streams of different (or the same) media are carried from source to sink over a channel. There are at least three completely different points in this path where we can perform this function: we can design a multimedia codec which mixes together the digitally coded (and possibly compressed) streams as it generates them, possibly interleaving media at a bit-by-bit level of granularity; we can design a multiplexing layer that mixes together the different media as it packetizes them, possibly interleaving samples of different media in the same packets; or we can let the network do the multiplexing, packetizing different media streams completely separately.
54Moving Image Coding (27)
MPEG-1 Video MPEG-1 consists of several parts
System, Video and Audio etc. Beyond simple
playback, the MPEG-1 system is responsible for
multiplexing and synchronization. MPEG-1 video
distinguishes between four different coding types
for Images I frames, P frames, B frames and D
frames. I frames (intracoded frames) are coded
without reference to other images. MPEG makes use
of JPEG for I frames. The compression rate for I
frames is the lowest for all defined coding
types. P frames (predictively coded frames) need
information from the previous I and/or P frame
for encoding and decoding. The achievable
compression is higher than that for I frames.
55Moving Image Coding (28)
B frames (bidirectionally predictively coded
frames) require information from the previous and
following I and/or P frames for encoding and
decoding. The highest compression ration can be
achieved, also a bidirectional
motion-compensation prediction can be used. D
frames (DC coded frame) are encoded intraframe,
whereby the AC coefficient are neglected. D
frames can never be used with the other picture
types. Reference frames must be transmitted
first. The transmission order and the display
order may differ. At the beginning, there is
always an I frame. The first I frame and the
first P frame is also the reference for the
first two B frames. The first I frame is also the
reference of the P frame. Thus the I frame must
be transmitted first, followed next by the P
frame and then the B frame.
56 Moving Image Coding (29)
Display and Transmission Order in MPEG-1 Video

  Display order:      I B B P B B I
  Transmission order: I P B B I B B

The second I frame must be transmitted early since it serves as the reference for the second pair of B frames.
57 Moving Image Coding (30)
[Figure: the MPEG data hierarchy - a video sequence is divided into groups of pictures, each picture into slices, each slice into macro blocks, and each macro block into 8 x 8 pixel blocks.]
58 Moving Image Coding (31)
MPEG-1 Constrained Parameter Set

  Parameter               Restriction
  Horizontal resolution   <= 768 pixels
  Vertical resolution     <= 576 lines
  Macro blocks/s          <= 396 x 25 macro blocks/s
  Frames/s                <= 30 Hz
  Motion vector range     (-64 ... +63.5) pixels
  Input buffer size       <= 327,680 bits
  Bit rate                <= 1.856 Mbps

MPEG-1 video uses the same image format as H.261 but allows a greater choice of image sizes.
59 Moving Image Coding (32)
- MPEG-2 Video
- MPEG-2 video conforms with MPEG-1. It allows data rates up to 100 Mbps, and it supports interlaced video formats as well as HDTV.
- MPEG-2 can be used for the digital transmission of video over satellite, cable, and other broadcast channels.
- MPEG-2 builds upon the completed MPEG-1 standard and was cooperatively developed by ISO/IEC and the ITU (as H.262).
- MPEG-2 video was defined in terms of extensible profiles, each of which supports the features needed by an important class of applications.
- Initially, MPEG-3 was intended to support HDTV applications. During development, MPEG-2 video proved adequate when scaled up to meet HDTV requirements. As a result, MPEG-3 was dropped.
60 Moving Image Coding (33)
MPEG-2 Video Profiles and Levels
61Moving Image Coding (34)
MPEG-4 Video MPEG-4 Video supports low-bit-rate
applications. ISO expert group Developing the
MPEG 4 standard has decided to stop the
development of A new video coding method for low
bit rates. Instead, they focus on Providing
enhanced functionality based on existing
compression methods. For example, the coding of
audio-visual objects. It encodes objects with Any
shape in a video scene. Instead of each image in
the video clip coded As a whole, the stationary
background and the tennis player in the
Foreground can be coded independently with
different methods or Parameter sets. On the
audio side, audio objects are identified and
coded Depending on their contents. One of the
existing video coding methods under study for
MPEG-4 is H.263.
62 Moving Image Coding (35)
- H.263
- The ITU-T Recommendation H.263 defines a codec for the compression of the moving picture component of audio-visual services at low bit rates.
- A typical application is the transmission of video over a V.34 modem connection using 20 kbps for video and 6.5 kbps for audio.
- H.263 is based on H.261, but it supports 5 image formats, it has refined motion compensation, and the standard supports B frames.
- B frames in H.263 have only P frames as a reference.
- Up-to-date video compression methods studied in connection with H.263 are:
  - Wavelet Image Compression
  - Fractal Image Compression
63Moving Image Coding (36)
The approaches have different performance
benefits and costs, and all three approaches are
in use for Internet Multimedia. Some of the costs
are what engineers call non-functional'' ones,
which derive from business cases of the
organizations defining the schemes. There are a
lot of players (stakeholders'') in the
multimedia market place. Many of them have
devised their own system architectures - none the
least of these are the ITU, ISO, DAVIC and the
IETF. The ITU has largely been concerned with
video telephony, whilst DAVIC has concerned
itself with Digital Broadcast technology, and the
IETF has slowly added multimedia (store and
forward and real time) to its repertoire.
64Moving Image Coding (37)
Each group has its own mechanism or family of
mechanisms for identifying media in a stream or
on a store, and for multiplexing over a stream.
The design criteria were different in each case,
as were the target networks and underlying
infrastructure. This has led to some confusion
which will probably persist for a few years now.
Here we look at the 4 major players and their
three major architectures for a multimedia
stream. Two earlier attempts to make sense out of
this jungle were brave goals of Applet and
Microsoft, and we briefly discuss their earlier
attempts to unravel this puzzle - Microsoft have
made recent changes to their architecture at many
levels and this is discussed in their product
specifications and we will not cover it here.
65Moving Image Coding (38)
To cut to the chase, the ITU defines a bit level
interleave or multiplex appropriate to low cost,
low latency terminals and a bit pipe model of the
network, while ISO MPEG group defines a CODEC
level interleave appropriate to digital
multimedia devices with high quality, but
possibly higher cost terminals (it is hard to
leave out a function) finally, the DAVIC and
Internet communities define the multiplexer to be
the network, although DAVIC assume an ATM network
whereas the Internet community obviously assume
an IP network as the fundamental layer.
66Moving Image Coding (39)
The Internet community try to make use of
anything that its possible to use, so that if an
ITU or DAVIC or ISO CODEC is available on an
Internet capable host, someone, somewhere will
sometime devise a way to packetize its output
into IP datagrams. The problem with this is that
it means that for the non-purist approaches of
separate media in separate packets, there are
potentially then several layers of the technical
multiplexing. In a classic paper, David
Tennenhouse describes reasons why this is a very
bad architecture for communicating software
systems. Note that this is not a critique of the
ISO MPEG, DAVIC or ITU H.320 architectures they
are beautiful pieces of design fit for a
particular purpose it is merely an observation
that it is better to unpick their multiplex in
an Internet based system. It certainly leads to
more choice for where to carry out other
functions (e.g. mixing, re- synchronization,
trans-coding, etc etc).
67 Digital Signal Processing (1)
Analog to Digital Conversion - Sampling
An input signal is converted from some continuously varying physical value into a continuously varying electrical signal. This electrical signal can then be converted to a sequence of digital values, called samples, by some analog-to-digital conversion circuit.
68Digital Signal Processing (2)
There are two factors which determine the
accuracy with which the digital sequence of
values captures the original continuous signal
the maximum rate at which we sample, and the
number of bits used in each sample. This latter
value is known as the quantization level.
69Digital Signal Processing (3)
- The raw (uncompressed) digital data rate
associated with a signal then is - simply the sample rate times the number of bits
per sample. -
- To capture all possible frequencies in the
original signal, Nyquist's - theorem shows that the digital rate must be
twice the highest frequency - component in the continuous signal.
- It is often not necessary to capture all
frequencies in the original signal - for example, voice is comprehensible with a
much smaller range of - frequencies than we can actually hear.
- When the sample rate is much lower than the
highest frequency in the - continuous signal, a band-pass filter which
only allows frequencies in - the range actually needed, is usually put
before the sampling circuit. - This avoids possible ambiguous samples
(aliases'').
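A small NumPy sketch of why the filter is needed: a 7 kHz tone sampled at 8 kHz (well below the 14 kHz that the Nyquist criterion would require) produces samples identical to those of a 1 kHz tone:

    import numpy as np

    fs, f = 8000, 7000
    t = np.arange(64) / fs                       # 64 sample instants
    undersampled = np.cos(2 * np.pi * f * t)     # 7 kHz tone, sampled too slowly
    alias = np.cos(2 * np.pi * (fs - f) * t)     # the 1 kHz alias
    print(np.allclose(undersampled, alias))      # True: the two are indistinguishable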
70Audio Coding (1)
Audio Input and Output Audio signals vary
depending on the application. Human speech has a
well understood spectrum, and set of
characteristics, whereas musical input is much
more varied, and the human ear and perception and
cognition systems behave rather differently in
each case. For example, when a speech signal
degrades badly, humans make use of comprehension
to interpolate. Basically, for speech, the
analog signal from a microphone is passed through
several stages. Firstly a band pass filter is
applied eliminating frequencies in the signal
that we are not interested in (e.g. for telephone
quality speech, above 3.6Khz). Then the signal
is sampled, converting the analog signal into a
sequence of values, each of which represents the
amplitude of the analogue signal over a small
discrete time interval. This is then quantized,
or mapped into one of a set of fixed values
These values are then coded for transmission.
The process at the receiver is simply the reverse.
71 Audio Coding (2)
- Audio compression methods differ in the trade-offs between:
  - Encoder and decoder complexity,
  - Quality of the compressed audio, and
  - Amount of data.
- A basic audio compression technique employed in digital telephony is based on a logarithmic transformation (sketched below):
  - The A-law transformation maps from 13-bit linearly quantized PCM values to 8 bits; it is commonly used in Europe.
  - The mu-law transformation maps from 14 bits to 8 bits; it is used in North America and Japan.
- Both specifications are covered in ITU recommendation G.711.
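A sketch of the continuous mu-law companding curve with mu = 255; G.711 itself uses a piecewise-linear (segmented) approximation of this curve, so this is an idealized illustration rather than a bit-exact codec:

    import math

    MU = 255

    def mulaw_compress(x):
        """x in [-1, 1] -> y in [-1, 1]; quiet signals get expanded."""
        return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

    def mulaw_expand(y):
        """Inverse mapping used at the receiver."""
        return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

    # Quiet signals keep much finer effective resolution than loud ones:
    for x in (0.01, 0.1, 1.0):
        print(x, round(mulaw_compress(x), 3))   # 0.01->0.228, 0.1->0.591, 1.0->1.0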
72 Audio Coding (3)
- Adaptive Differential Pulse Code Modulation (ADPCM)
- ADPCM overcomes the disadvantages of DPCM.
- It is a lossy method that codes the differences between PCM-coded audio signals using only a small number of bits.
- It can change the step size of the quantizer and the predictor, adapting to the characteristics of the signal.
- It is able to code either the high- or the low-frequency portion of a signal exactly, and always operates in one of these two modes.
- It reduces the data rate of high-quality audio from 1.4 Mbps to 32 kbps.
- The ADPCM standard is covered by ITU G.721.
73 Audio Coding (4)
- MPEG-1 Audio
- MPEG-1 Audio compression is lossy, but it can achieve transparent, perceptually lossless compression.
- The algorithm exploits the perceptual limitations of the human hearing threshold and auditory masking to determine which parts of an audio signal are acoustically irrelevant and can be removed in the compression.

[Figure: the hearing threshold for the human ear - amplitude in dB (0 to 80) against frequency in kHz (0.02 to 20); components below the threshold curve are inaudible.]
74 Audio Coding (5)
Auditory Masking
Auditory masking is a perceptual weakness of the ear that occurs whenever the presence of a strong audio signal makes a spectral neighborhood of weaker audio signals imperceptible. The threshold for noise masking at any given frequency depends solely on the signal activity within a critical band of that frequency.

[Figure: a strong tonal signal raises the masking threshold in its spectral neighborhood, so weaker signals at nearby frequencies are masked.]
75 Audio Coding (6)
- MPEG-1 Audio defines three layers.
- The compression techniques for each layer are similar, but coder complexity increases with each layer.
- Each layer uses a separate but related way of compressing audio.
- Each layer's decoder must decode audio from any layer below it. For example, a Layer III decoder must also decode Layer II and Layer I audio, while a Layer II decoder must decode Layer I audio, but not Layer III.
- The input audio stream passes simultaneously through a filter bank and through a psychoacoustic model:
  - The filter bank divides the input into multiple subbands.
  - The psychoacoustic model determines the signal-to-mask ratio of each subband, which drives the bit allocation (a toy allocation sketch follows below).
76 Audio Coding (7)
[Figure: MPEG-1 Audio encoder and decoder. Encoder: the PCM audio input feeds a time-to-frequency mapping filter bank and, in parallel, a psychoacoustic model; the model drives the bit/noise allocation, quantizer and coding stage, and bit-stream formatting emits the encoded bit stream. Decoder: the encoded bit stream is unpacked, the frequency samples are reconstructed, and a frequency-to-time mapping produces the decoded PCM audio.]
77 Audio Coding (8)
- The MPEG-2 Audio standard extends the functionality of MPEG-1 with multichannel coding of up to five channels (left, right, center, and two surround channels), plus an additional low-frequency enhancement channel, and/or up to seven commentary/multilingual channels.
- It extends the stereo and mono coding of the MPEG-1 Audio standard with further sampling rates.
- MPEG-2 Audio is backward compatible with MPEG-1 Audio.
- An MPEG-2 Audio decoder can process any MPEG-1 Audio bit stream; an MPEG-1 Audio decoder can also read and process the stereo information of an MPEG-2 Audio bit stream.
- For more information regarding MPEG, search engines such as Altavista, Lycos and Yahoo! can be checked.
78 Audio Coding (9)
- Codebook Excited Linear Predictive Coding (CELP)
- The main problem with vocoders is the simplistic model of the excitation used. One method of circumventing this problem is Codebook Excited Linear Prediction (CELP).
- In the CELP coder, the speech is passed through the cascade of the vocal tract predictor and the pitch predictor. The output of this predictor is a good approximation to Gaussian noise. This noise sequence has to be quantized and transmitted to the receiver.
- Multi-pulse coders quantize it using a series of weighted impulses.
- CELP coders use vector quantization: the index of the codeword that produces the best-quality speech is transmitted, along with a gain term for it.
- The codebook search is carried out using an analysis-by-synthesis technique: the speech is synthesized for every entry in the codebook, and the codeword that produces the lowest error is chosen as the excitation (see the toy search below).
- The error measure used is perceptually weighted so that the chosen codeword produces the speech that sounds the best.
79 Audio Coding (10)
80 Audio Coding (11)
Summary of Audio and Video Input and Output
Audio and video are loss-tolerant, so they can use cleverer compression that discards some information; compression of 400 times is possible on video. There are a lot of standards for this now, including schemes based on PCM, such as ADPCM, and on models, such as LPC and MPEG Audio. Note that lossy compression of audio and video is not acceptable to some classes of user (e.g. radiologists or air traffic controllers). It is sometimes said that "the eye integrates while the ear differentiates". What is meant by this is that the eye responds to stronger signals or higher frequencies with a cumulative reaction, while the ear responds less and less (i.e. to double the pitch, you have to double the frequency, so we hear a logarithmic scale as linear; and to double the loudness, you have to increase the power exponentially too).
81 References
- F. Kuo et al., Multimedia Communications: Protocols and Applications, Prentice Hall, 1998
- M. Riley and I. Richardson, Digital Video Communications, Artech House, 1997
- http://dnausers.d-n-a.net/dnetzNRo/mp3info.htm
- http://www.cas.mcmaster.ca/malcolm/cs4cb3/node28.html
- http://www.cs.ucl.ac.uk/staff/jon/mmbook/book/book.html
- ITU-T, Video Codec for Audiovisual Services at p x 64 kbps, Recommendation H.261, 1993
- ITU-T, Generic Coding of Moving Pictures and Associated Audio, Recommendation H.262, 1994
- ITU-T, Video Coding for Low Bit Rate Communication, Recommendation H.263, 1995