Title: Overview of Digital Video

1 Overview of Digital Video (1)
- Digital video refers to video information that is stored and transmitted in digital form.
- Analog video uses a bandwidth of only a few megahertz, but the bit rate of the digital signal for transmitting the same video content is typically over 100 Mbps, which is too high to be feasible for most of today's networks.
- Video compression is the solution for both stored video and video transmission over networks. Video compression techniques have been evolving for the past two decades.
- Due to advances in processor technology and the development of international video compression standards, a wide range of video communication applications has been developed in recent years.
2 Overview of Digital Video (2)
- Representation of Video Information
- A color image is represented in terms of component signals.
- A color can be synthesized by combining three color primaries: red, green and blue (RGB).
- Each of the three primaries contains information about luminance (brightness) and chrominance (color), which can be represented separately.
- A luminance signal (Y) can be produced from a weighted sum of the R, G, and B components.
- The chromaticity of the color can be represented by the color difference signals (a small numeric sketch follows below):
  - Cr = wr (R - Y)
  - Cb = wb (B - Y)
  - Cg = wg (G - Y)
- where wr, wb, and wg are weighting factors.
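A minimal numeric sketch of these relations in Python, assuming the BT.601 luma weights (0.299, 0.587, 0.114) and the common choice of wr and wb that scales Cr and Cb into [-0.5, 0.5]; the function name is illustrative, not from any standard library:

    # Luminance as a weighted sum of R, G, B (BT.601 weights), and the
    # Cb/Cr color differences scaled into [-0.5, 0.5].
    def rgb_to_ycbcr(r, g, b):
        """r, g, b normalized to [0, 1]."""
        y = 0.299 * r + 0.587 * g + 0.114 * b
        cb = 0.564 * (b - y)   # wb = 0.5 / (1 - 0.114)
        cr = 0.713 * (r - y)   # wr = 0.5 / (1 - 0.299)
        return y, cb, cr

    print(rgb_to_ycbcr(1.0, 0.0, 0.0))  # pure red: Y = 0.299, Cr at its maximum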
3 Overview of Digital Video (3)
- Standards for Analog Color TV
- NTSC (National Television System Committee) is used in the USA and Japan. NTSC uses 525 lines per frame, and its field rate is 60 Hz (i.e. 30 frames per second).
- The PAL (Phase Alternation Line) system is used in most of Western Europe.
- SECAM (Sequentiel Couleur avec Memoire) is used in France and in parts of Eastern Europe.
- In both PAL and SECAM systems, each frame consists of 625 lines, and the field rate is 50 Hz (i.e. 25 frames per second).
- All the above systems use three components: luminance Y, blue color difference U (equivalent to Cb), and red color difference V (equivalent to Cr).
4 Overview of Digital Video (4)
- ITU-R (International Telecommunication Union - Radiocommunication) Standard
- The ITU-R (formerly CCIR) provides a standard method of encoding television information in digital form.
- The luminance and color difference components are sampled with a precision of 8 bits.
- The luminance component of an NTSC frame is sampled to produce an image of 525 lines, each containing 858 samples. The active area of the digitized frame is 720 x 486 picture elements (pixels).
- The luminance component of a PAL or SECAM frame is sampled with 625 lines and 864 samples, and the active area is 720 x 576 pixels.
- The color difference signals are sampled with the same vertical resolution (486 or 576 active lines) but with the horizontal resolution halved.
- Only the odd-numbered luminance pixels in each line have associated color difference pixels.
5 Overview of Digital Video (5)
Bit rate of the CCIR 601 digital TV signal

NTSC original bit rate = 30 x 8 x ((858 x 525) + (429 x 525 x 2)) = 216.216 Mbps
- 30 frames per second
- 858 x 525 luminance samples
- 429 x 525 x 2 chrominance samples
- 8 bits per sample

PAL/SECAM original bit rate = 25 x 8 x ((864 x 625) + (432 x 625 x 2)) = 216.0 Mbps
- 25 frames per second
- 864 x 625 luminance samples
- 432 x 625 x 2 chrominance samples
- 8 bits per sample
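The arithmetic above can be checked directly; a small Python sketch (the function name and parameter names are ours):

    def ccir601_bit_rate(fps, luma_w, lines, chroma_w):
        luma = luma_w * lines              # luminance samples per frame
        chroma = chroma_w * lines * 2      # two color-difference components
        return fps * 8 * (luma + chroma)   # 8 bits per sample

    print(ccir601_bit_rate(30, 858, 525, 429) / 1e6)  # 216.216 (NTSC, Mbps)
    print(ccir601_bit_rate(25, 864, 625, 432) / 1e6)  # 216.0 (PAL/SECAM, Mbps)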
6 Why Video Compression?
A single digital TV signal in CCIR 601 format needs a 216 Mbps bit rate, which is unacceptable for most practical network transmission. On the other hand, a traditional analog TV signal of similar quality only requires 6 to 7 MHz of bandwidth. Obviously, network transmission of the original digital TV signal is too expensive to be practical. Before digital TV or video can be transmitted through the network, the data rate needs to be reduced. This means compressing (encoding) the digital video information prior to network transmission and decompressing (decoding) the received video information before displaying it. Digital video information contains a considerable amount of redundancy: video data is usually highly correlated both spatially and temporally. This redundancy can be removed by coding the data in a more efficient way.
7 Overview of Compression Methods (1)
- Types of redundancy
- Spatial redundancy - the values of neighboring pixels are strongly correlated in almost all natural images.
- Redundancy in scale - important image features such as straight edges and constant regions are invariant under rescaling.
- Redundancy in frequency - in images composed of more than one spectral band, the spectral values for the same pixel location are often correlated, and an audio signal can completely mask a sufficiently weak signal in its frequency vicinity.
- Temporal redundancy - adjacent frames in a video sequence often show very little change, and a strong audio signal in a given time block can mask a suitably weaker distortion in a previous or future block.
- Stereo redundancy - audio coding methods can take advantage of the correlations between stereo channels.
8 Overview of Compression Methods (2)
Characteristics of Compression Methods
9 Overview of Compression Methods (3)
[Figure: relation between perceptible quality and required bandwidth - lossless compression preserves quality but still needs high bandwidth, while lossy compression trades quality for lower bandwidth.]
10 Overview of Compression Methods (4)
Coding Techniques for Multimedia Systems
11 Overview of Compression Methods (5)
- Entropy coding (lossless)
- Entropy is defined as the average information content of given data. It defines the minimum number of bits needed to represent the information content without information loss.
- Entropy coding is a lossless technique that tries to achieve this theoretical lower limit (see the sketch at the end of this slide).
- Source coding (lossy)
- It distinguishes relevant and irrelevant data.
- It takes the semantics of the data into account and removes the irrelevant data, so that the original data stream can be compressed.
- Hybrid coding (lossy)
- This is a combination of entropy coding and source coding.

[Figure: uncompressed data passes through a preparation stage and source coding steps, then entropy coding steps, to produce compressed data.]
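As a concrete illustration of the entropy bound mentioned above, a short Python sketch computing Shannon entropy for a toy symbol distribution (the probabilities are the ones reused in the Huffman example on the next slide):

    import math

    def entropy(probs):
        """Average information content in bits per symbol."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # The lossless lower bound for the source used in the Huffman example:
    print(entropy([3/4, 1/8, 1/16, 1/16]))  # ~1.186 bits/symbol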
12 Basic Coding Methods (1)
- Entropy coding does not use the semantics of the data; only the bit stream is considered.
- Run-Length Coding
  - Uncompressed data: ABCCCCCCCDEFGH
  - Run-length coded: AB!7CDEFGH
- Huffman Coding
  - p(A) = 3/4, p(B) = 1/8, p(C) = p(D) = 1/16
  - w(A) = 1, w(B) = 01, w(C) = 001, w(D) = 000

[Figure: Huffman code tree whose branches are labeled 0 and 1; reading the labels from root to leaf gives each codeword. Both techniques are sketched in code below.]
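Both techniques can be sketched in a few lines of Python; the "!" run marker and the exact 0/1 labeling of the Huffman tree are illustrative conventions, so the generated codewords may differ from the slide's by bit relabeling while the code lengths match:

    import heapq
    from itertools import groupby

    # Run-length coding: replace runs of >= 4 identical characters by
    # "!<count><char>"; the '!' marker convention is just an illustration.
    def rle_encode(data):
        out = []
        for ch, group in groupby(data):
            n = len(list(group))
            out.append(f"!{n}{ch}" if n >= 4 else ch * n)
        return "".join(out)

    print(rle_encode("ABCCCCCCCDEFGH"))  # -> "AB!7CDEFGH", as on the slide

    # Huffman coding for the probabilities on this slide.
    def huffman(probs):
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c0.items()}
            merged.update({s: "1" + w for s, w in c1.items()})
            heapq.heappush(heap, (p0 + p1, count, merged))
            count += 1
        return heap[0][2]

    print(huffman({"A": 3/4, "B": 1/8, "C": 1/16, "D": 1/16}))
    # e.g. {'A': '1', 'B': '00', 'C': '010', 'D': '011'};
    # the code lengths 1, 2, 3, 3 match the slide's codewords.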
13 Basic Coding Methods (2)
- Arithmetic Coding
- This coding method often generates slightly better results in audio and video coding because it works with floating-point intervals instead of the per-character codes used in Huffman coding.
- The floating-point arithmetic is computationally more expensive.
- It has been shown that the average compression achieved by arithmetic and Huffman coding is very similar.
- Algorithms for arithmetic coding are covered by patents held by IBM, AT&T and Mitsubishi.
14 Basic Coding Methods (3)
- Discrete Cosine Transform Coding
- Pixels are grouped into blocks, which are transformed into another domain to form a set of coefficients; these coefficients are coded and transmitted.
- Compression is achieved by quantizing the coefficients so that the useful coefficients are transmitted and the remaining coefficients are discarded.
- The most effective compaction is achieved using the Karhunen-Loeve Transform (KLT), but it is very computationally intensive; the discrete cosine transform (DCT) is a widely used alternative to the KLT.
- A DCT-based image coding system usually consists of the following steps:
  - Separate the image into blocks
  - Discrete Cosine Transform
  - Quantization
  - Encoding
15 Basic Coding Methods (4)
Discrete Cosine Transform
The DCT converts a block of pixels into a block of transform coefficients of the same dimensions. These coefficients represent the spatial frequency components that make up the original pixel block. Each coefficient can be thought of as a weight that is applied to an appropriate basis function.

[Figure: the 64 DCT basis functions; the top-left pattern is the DC basis function, horizontal frequency increases to the right, and vertical frequency increases downward.]
16 Basic Coding Methods (5)
- A gray-scale 8 x 8 pixel block can be fully represented by a weighted sum of these 64 basis functions.
- The appropriate weights that are required to produce a particular block are the DCT coefficients for that block.
- The two-dimensional DCT of an N x N block of pixel values is

  F(u,v) = (2/N) C(u) C(v) sum_{i=0..N-1} sum_{j=0..N-1} f(i,j) cos[(2i+1)u*pi/2N] cos[(2j+1)v*pi/2N]

  and the inverse DCT is

  f(i,j) = (2/N) sum_{u=0..N-1} sum_{v=0..N-1} C(u) C(v) F(u,v) cos[(2i+1)u*pi/2N] cos[(2j+1)v*pi/2N]

  where F(u,v) is the transform coefficient, f(i,j) is the pixel value, and C(k) = 1/sqrt(2) for k = 0, C(k) = 1 otherwise.
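A direct transcription of the forward transform into Python (a naive O(N^4) sketch purely to mirror the formula; real codecs use fast algorithms):

    import math

    def dct2(block):
        """Direct 2-D DCT of an N x N block, straight from the formula above."""
        n = len(block)
        c = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
        out = [[0.0] * n for _ in range(n)]
        for u in range(n):
            for v in range(n):
                s = sum(block[i][j]
                        * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * j + 1) * v * math.pi / (2 * n))
                        for i in range(n) for j in range(n))
                out[u][v] = (2 / n) * c(u) * c(v) * s
        return out

    # A flat block has all its energy in the DC coefficient F(0,0).
    flat = [[100] * 8 for _ in range(8)]
    print(round(dct2(flat)[0][0], 3))  # 800.0; every other coefficient is ~0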
17 Basic Coding Methods (6)
- Each coefficient in the forward transform is calculated by summing 64 separate terms; this results in a total of 64 x 64 = 4,096 calculations for transforming an 8 x 8 block of pixels.
- The above calculation process can be replaced by a one-dimensional transform along all the rows and then all the columns of the block.
- Since each coefficient in the one-dimensional transform requires 8 calculations, either a row or a column needs 8 x 8 = 64 calculations; this results in 64 x 8 (for 8 rows) + 64 x 8 (for 8 columns) = 1,024 calculations.
- The computational complexity can be further reduced by replacing the cosine form of the transform with an algorithm that only performs a series of multiplication and addition operations.
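The row-column decomposition can be checked against the direct formula with a short sketch (assuming the same normalization as on the previous slide):

    import math

    def dct1(vec):
        """1-D DCT of a length-N vector (normalization matching the 2-D formula)."""
        n = len(vec)
        c = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
        return [math.sqrt(2 / n) * c(u)
                * sum(x * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                      for i, x in enumerate(vec))
                for u in range(n)]

    def dct2_separable(block):
        """Row-column decomposition: 1-D DCT on every row, then every column."""
        rows = [dct1(row) for row in block]          # 8 transforms of 64 ops each
        cols = [dct1(list(col)) for col in zip(*rows)]
        return [list(r) for r in zip(*cols)]

    flat = [[100] * 8 for _ in range(8)]
    print(round(dct2_separable(flat)[0][0], 3))      # 800.0, same as the direct 2-D DCT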
18 Basic Coding Methods (7)
[Figure: how block features map to the frequency distribution of DCT coefficients - the DC coefficient sits at the top-left of the block; vertical, diagonal and horizontal block features concentrate energy in different regions, with frequency increasing from low through medium to high away from the DC corner.]
19 Basic Coding Methods (8)
Zigzag Sequence: the AC coefficients are scanned in order of increasing frequency (a sketch that generates this order follows below).
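A sketch that generates this scan order for an 8 x 8 block; coefficients on the same anti-diagonal share the "frequency" u + v, and the scan direction alternates between diagonals:

    def zigzag_order(n=8):
        order = []
        for d in range(2 * n - 1):                   # anti-diagonal index u + v
            cells = [(u, d - u) for u in range(n) if 0 <= d - u < n]
            order.extend(cells if d % 2 else cells[::-1])
        return order

    print(zigzag_order()[:10])
    # [(0,0), (0,1), (1,0), (2,0), (1,1), (0,2), (0,3), (1,2), (2,1), (3,0)]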
20 Basic Coding Methods (9)
- Differential Pulse Code Modulation (DPCM)
- The image is scanned in a raster fashion.
- Each pixel is represented as a number with a limited precision.
- A pixel value predicted from the previously transmitted pixels is used in place of the actual value.
- The prediction error between the predicted pixel value and the actual value is quantized and transmitted.
- Encoding the quantized error using variable-length codes achieves further compression.
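A minimal DPCM sketch, assuming a previous-pixel predictor, a uniform quantizer with step size 4, and an 8-bit mid-gray start value (all illustrative choices):

    def dpcm_encode(pixels, step=4):
        pred, errors = 128, []               # assumed mid-gray start value
        for x in pixels:
            q = round((x - pred) / step)     # quantized prediction error
            errors.append(q)
            pred = pred + q * step           # track the decoder's reconstruction
        return errors

    def dpcm_decode(errors, step=4):
        pred, out = 128, []
        for q in errors:
            pred = pred + q * step
            out.append(pred)
        return out

    line = [100, 102, 104, 109, 120, 140, 150, 150]
    print(dpcm_decode(dpcm_encode(line)))    # tracks the input to within step/2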
21 Basic Coding Methods (10)
- Motion-Compensated Prediction
- Temporal redundancies between two frames in a video sequence can be exploited.
- The idea is to look for a certain area in a previous or subsequent frame that very closely matches an area of the same size in the current frame.
- If successful, a best-matching block can be found, and the difference signal between the intensity values of the block in the current frame and the block in the reference frame is calculated.
- The motion vector, which represents the translation of corresponding blocks in both the x- and y-direction, is determined.
- The difference signal and the motion vector represent the deviation between the reference block and the predicted block; together they are called the prediction error. A block-matching sketch follows below.
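A toy full-search block-matching sketch in Python/NumPy; the block size, search range and sum-of-absolute-differences criterion are common choices but are not mandated by any standard:

    import numpy as np

    def best_match(prev, cur, top, left, bs=8, search=4):
        """Return the motion vector and difference signal for one block."""
        target = cur[top:top + bs, left:left + bs].astype(int)
        best = (None, np.inf)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if 0 <= y <= prev.shape[0] - bs and 0 <= x <= prev.shape[1] - bs:
                    cand = prev[y:y + bs, x:x + bs].astype(int)
                    sad = np.abs(target - cand).sum()   # sum of absolute differences
                    if sad < best[1]:
                        best = ((dy, dx), sad)
        (dy, dx), _ = best
        ref = prev[top + dy:top + dy + bs, left + dx:left + dx + bs].astype(int)
        return (dy, dx), target - ref

    rng = np.random.default_rng(0)
    prev = rng.integers(0, 256, (32, 32), dtype=np.uint8)
    cur = np.roll(prev, (2, -1), axis=(0, 1))      # frame content shifted by (2, -1)
    mv, err = best_match(prev, cur, 8, 8)
    print(mv, int(np.abs(err).sum()))              # (-2, 1) 0: exact match found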
22 Basic Coding Methods (11)
- There are three types of motion-compensated prediction:
  - Unidirectional motion-compensated prediction
  - Bidirectional motion-compensated prediction
  - Interpolative motion-compensated prediction

[Figure: forward motion-compensated prediction.]
23 Still Image Coding (1)
Image Coding

[Figure: generic image coding chain - the encoder applies an encoder model followed by an entropy encoder to the input image, producing compressed data; the decoder applies an entropy decoder followed by a decoder model to produce the image output.]
- Image coding techniques
- Predictive coding
- Discrete Cosine Transform Coding
- Subband coding
- Fractal coding
24 Still Image Coding (2)
- JPEG (Joint Photographic Experts Group)
- The JPEG standard defines 4 coding modes:
- Sequential DCT-based encoding - each component is encoded in a single scan, based on the DCT.
- Progressive DCT-based encoding - it uses multiple scans, each of which contains a partially encoded version of the image. The scans can be decoded sequentially, so that a rough image is quickly decoded and is then built up using further scans.
- Hierarchical encoding - each component is encoded at multiple resolutions, differing by a factor of two in each dimension.
- Lossless encoding - it is based on a DPCM system. This mode provides compression without any loss of quality, using more time-consuming algorithms.
25 Still Image Coding (3)
[Figure: the four coding modes of JPEG - sequential DCT, progressive DCT, sequential lossless, and hierarchical.]
26 Still Image Coding (4)
27 Still Image Coding (5)
- Image preparation
- A source image must have a rectangular format and consist of 1 to 255 planes or components, such as RGB or YUV. After the separation of components, each component is divided into data units of 8 x 8 pixel blocks.
- Picture processing
- The baseline mode compresses the data by applying a two-dimensional DCT, then quantizing and entropy coding the resulting DCT coefficients. A 64-element quantization table is associated with the 64 DCT coefficients (a quantization sketch follows below).
- Entropy encoding
- JPEG specifies both Huffman and arithmetic encoding for entropy coding.
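A sketch of the quantization step, using the example luminance table given in the informative annex of the JPEG specification (the table is not mandatory; codecs may scale it or supply their own):

    import numpy as np

    # Example luminance quantization table from the JPEG spec's informative
    # annex; step sizes grow toward high frequencies, which the eye resolves
    # poorly, so high-frequency coefficients tend to quantize to zero.
    Q = np.array([
        [16, 11, 10, 16, 24, 40, 51, 61],
        [12, 12, 14, 19, 26, 58, 60, 55],
        [14, 13, 16, 24, 40, 57, 69, 56],
        [14, 17, 22, 29, 51, 87, 80, 62],
        [18, 22, 37, 56, 68, 109, 103, 77],
        [24, 35, 55, 64, 81, 104, 113, 92],
        [49, 64, 78, 87, 103, 121, 120, 101],
        [72, 92, 95, 98, 112, 100, 103, 99]])

    def quantize(F):
        return np.rint(F / Q).astype(int)   # divide by the table and round

    def dequantize(Fq):
        return Fq * Q                       # decoder-side reconstruction

    F = np.zeros((8, 8)); F[0, 0], F[0, 1], F[7, 7] = 800.0, -44.0, 30.0
    print(quantize(F))   # 800/16 = 50 and -44/11 = -4 survive; 30/99 rounds to 0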
28 Coding of Moving Images
A video CODEC can be anything from the simplest A/D device through to something that does picture pre-processing and even has network adapters built into it. A CODEC usually does most of its work in hardware, but there is no reason not to implement everything in software on a reasonably fast processor. The most expensive and complex component of a CODEC is the compression/decompression part. There are a number of international standards and many proprietary compression techniques for video.
29 Moving Image Coding (1)
- H.261
- H.261 is the most widely used international video compression standard for video conferencing.
- This ITU (formerly CCITT) standard describes the video coding and decoding methods for the moving picture component of an audiovisual service at rates of p x 64 kbps, where p is in the range of 1 to 30.
- The standard targets, and is really only suitable for, applications using circuit-switched networks as their transmission channels, since ISDN with both basic and primary rate access was the communication channel considered within the framework of the standard.
- H.261 is usually used in conjunction with other control and framing standards such as H.221, H.230, H.242 and H.320, of which more later.
30 Moving Image Coding (2)
[Figure: processing steps of the H.261 video coder - the video signal enters the source coder, whose output passes through the video multiplex coder and the transmission buffer to the transmission coder, which emits the coded bit stream; a coding control block, driven by external control, steers the source coder and the buffer.]
31 Moving Image Coding (3)
- H.261 Image preparation
- The source coder operates only on non-interlaced pictures. Pictures are coded as a luminance and two color difference components (Y, Cb, Cr). The Cb and Cr matrices are half the size of the Y matrix.
- H.261 supports two image resolutions, CIF (Common Intermediate Format) and QCIF:

  Component   CIF         QCIF
  Y           352 x 288   176 x 144
  Cb          176 x 144   88 x 72
  Cr          176 x 144   88 x 72

- CIF and QCIF frames are divided into a hierarchical block structure consisting of picture, group of blocks (GOB), macro blocks, and blocks.
32 Moving Image Coding (4)
- Hierarchical block structure of H.261

  Structure Element   Description
  Picture (frame)     1 video picture
  Group of blocks     33 macro blocks
  Macro block         16 x 16 Y, 8 x 8 Cb, Cr
  Block               8 x 8 pixels (coding unit for DCT)

- H.261 uses two types of coding for macro blocks, intraframe and interframe. Intraframe coding takes no advantage of the redundancy between frames; beyond this, H.261 tries to make use of temporal redundancies by means of motion-compensated prediction.
33 Moving Image Coding (5)
- The first frame to be transmitted is always an intraframe-coded frame (i.e. all macro blocks are intraframe coded).
- The entire picture is divided into nonoverlapping 8 x 8 pixel blocks on which the forward DCT is applied.
- The resulting 64 DCT coefficients are quantized and zigzag-reordered.
- For interframe coding, the recently coded frame is decoded again within the encoder, using inverse quantization and inverse DCT.
- For the next frame to be encoded, the last previously coded and stored frame is used for deciding whether to intraframe- or interframe-code each macro block.
- The algorithm performs a unidirectional motion-compensated prediction, which uses the four luminance blocks of each macro block to find a close match in the previous frame for the macro block currently being encoded.
- If it cannot find a close match, it employs the same coding for the macro block as in intraframe coding.
34 Moving Image Coding (6)
Block Transformation
H.261 supports motion compensation in the encoder as an option. In motion compensation, a search area is constructed in the previous (recovered) frame to determine the best reference macroblock. Both the prediction error and the motion vectors specifying the value and direction of displacement between the encoded macroblock and the chosen reference are sent. The search area, as well as how to compute the motion vectors, is not subject to standardization. Both horizontal and vertical components of the vectors must have integer values in the range -15 to +15, though.
In block transformation, INTRA-coded frames as well as prediction errors are composed into 8 x 8 blocks. Each block is processed by a two-dimensional FDCT function. If this sounds expensive, there are fast table-driven algorithms, and it can be done quite easily in software, as well as very easily in hardware.
35 Moving Image Coding (7)
- The motion estimation process results in three possible decisions for coding a macro block:
  - Intracoding, where blocks of 8 x 8 pixels each are coded only with reference to themselves and are sent directly to the block transformation process.
  - Intercoding without motion compensation (the motion vector has zero value).
  - Intercoding with motion compensation.
- There is an optional filter between the DCT and the entropy coding process, which can be used to improve the image quality by removing high-frequency noise as needed.
- Quantization in H.261 is a linear function; the quantization step size depends on the amount of data in the transmission buffer, thereby generating a constant data rate at the output of the coder.
36 Moving Image Coding (8)
- A prediction error is calculated between a 16 x 16 pixel region (macroblock) and the corresponding (recovered) macroblock in the previous frame.
- Prediction errors of transmitted blocks (the transmission criterion is not standardized) are then sent to the block transformation process.
- Blocks are inter- or intra-coded:
  - Intra-coded blocks stand alone.
  - Inter-coded blocks are based on the prediction error between the previous frame and the current one.
- Intra-coded frames must be sent with a minimum frequency to avoid loss of synchronization between sender and receiver.
37 Moving Image Coding (9)
Quantization and Entropy Coding
The purpose of this step is to achieve further compression by representing the DCT coefficients with no greater precision than is necessary to achieve the required quality. The number of quantizers is 1 for the INTRA DC coefficients and 31 for all others. Entropy coding achieves extra (lossless) compression by assigning shorter code-words to frequent events and longer code-words to less frequent events. Huffman coding is usually used to implement this step.
In other words, for a given quality we can lose coefficients of the transform by using fewer bits than would be needed for all the values; this leads to a "coarser" picture. We can then entropy code the final set of values by using shorter words for the most common values and longer ones for rarer ones (like using 8 bits for three-letter words in English).
38 Moving Image Coding (11)
H.263
H.263 is a newer addition to the ITU H series and is aimed at extending the repertoire to video coding for low bit rate communication. This makes it suitable for a wide variety of Internet access line speeds, and therefore also probably reasonably friendly to many Internet Service Providers' backbone speeds.
Existing A/V standards, the basic technology of the CCD camera, and that of television and the general CRT dictate frame grabbing at some particular resolution and rate. The choice of resolution is complex: one could have a fixed number of pixels and a fixed aspect ratio, or allow a range of choices of line rate and sample rates. H.261 and MPEG chose the latter.
39 Moving Image Coding (12)
The line rate (a.k.a. Picture Clock Frequency, PCF) is 30,000/1001, or about 29.97 Hz, but one can also use multiples of this. The chosen resolution for H.263 is dx x dy for luminance, and the chrominance is just one half of this in both dimensions. H.263 then allows for sub-QCIF, which is 128 x 96 pixels, QCIF (176 x 144 pixels), CIF (352 x 288 pixels), 4CIF (SCIF in the INRIA IVS tool, 704 x 576 pixels) and 16CIF (1408 x 1152 pixels). The designer can also choose a pixel aspect ratio; the default is (288/3)/(352/4), which is 12:11 (as per H.261). The picture area covered by the standard formats has an aspect ratio of 4:3. Luminance and chrominance sample positions are as per H.261, discussed earlier in this chapter. The structure of the coder is just the same too, although there are now two additional modes, called the "slice" and "picture block" modes.
40 Moving Image Coding (13)
A block is 16 x 16 Y and 8 x 8 each of Cb and Cr. The Group of Blocks, or GOB, refers to k x 16 lines, where k is 1 when the picture has fewer than 400 lines. GOBs are numbered using a vertical scan starting from 0, the count depending on the number of lines in the picture. The number of GOBs per picture is then 6 for sub-QCIF, 9 for QCIF, and 18 for CIF (and for 4CIF and 16CIF, because of special rules). Prediction works on Intra, Inter, B, PB, EI or EP pictures (where the reference picture is smaller). The macroblock is 16 lines of Y and the corresponding 8 each of Cb and Cr, and we can receive one motion vector per macroblock.
H.263 extends H.261 to lower bit rates (not just the p x 64 kbps design goal) and adds more features for better quality and services, but the basic ideas are the same: intra- and interframe compression, and DCT block transform plus quantization.
41Moving Image Coding (14)
There are then a number of basic enhancements in
H.263 including 1. Continuous Presence
Multi-point and Video Multiplex mode - basically
4 in 1 sub-bit-stream transmission. This may
be useful for conferences, tele-presence,
surveillance and so on 2. Motion Vectors can
point outside picture 3. Arithmetic as well as
variable length coding (VLC) 4. Advanced
Prediction Mode which is also known as
Overlapped Block Motion Compensation'
uses 4 88 blocks instead of 1 1616, This gives
better detail. 5. PB Frames known as
combined Predictive and Bi-Directional frames
(like MPEG II). 6. FEC to help with
transmission loss Advanced Intra coding to help
with interpolation Deblocking Filter mode,
to remove blocking artifacts
42Moving Image Coding (15)
7. Slice Structured Mode (re-order blocks so
Slice layer instead of GOB layer is more
delay and loss tolerant for packet transport 8.
Supplemental Enhancement Information,
Freeze/Freeze Release and Enhancement and
Chroma Key (use external picture as
merge/background etc...for mixing). 9.
Improved PB mode, including 2 way motion vectors
in PB mode 10. Reference Picture Selection 11.
Temporal, SNR and Spatial Scalability mode this
allows receivers to drop B frames for
example - gives potential heterogeneity amongst
receivers of multicast. 12. Reduced
Resolution Update Independent Segment decoding
Alternate INTER VLC mode 13. Modified
Quantization mode (can adjust up or down the
amount of quantization to give fine
quality/bit-rate control.
43 Moving Image Coding (16)
Chroma keying is a commonly used technology in TV, e.g. for picture-in-picture and superimposing, for weather presenters and so on. The idea is to define some pixels in an image as "transparent" or "semi-transparent" and, instead of showing these, use a reference background image (c.f. transparent GIFs on the WWW). We need an octet per pixel to define the keying color for each of Y, Cb and Cr. The actual choice when there isn't an exact match is implementor-defined.
44 Moving Image Coding (17)
- MPEG
- The aim of the MPEG-2 video compression standard is to cater for the growing need for generic coding methods for moving images in various applications, such as digital storage and communication. So unlike the H.261 standard, which was specifically designed for the compression of moving images for video conferencing systems at p x 64 kbps, MPEG considers a wider scope of applications.
- Aimed at storage as well as transmission
- Higher cost and quality than H.261
- Higher minimum bandwidth
- Decoder is just about implementable in software
- Target 2 Mbps to 8 Mbps really
- The "CD" of video?
45Moving Image Coding (18)
MPEG Source Images format The source pictures
consist of three rectangular matrices of
integers a luminance matrix (Y) and two
chrominance matrices (Cb and Cr). The MPEG
supports three formats 420 format - In this
format the Cb and Cr matrices shall be one half
the size of the Y matrix in both horizontal and
vertical dimensions. 422 format - In this
format the Cb and Cr matrices shall be one half
the size of the Y matrix in horizontal
dimension and the same size in the vertical
dimension. 444 format - In this format the Cb
and Cr matrices will be of the same size as the
Y matrix in both vertical and horizontal
dimensions. It may be hard to convert to this,
but then this is targeted at digital video tape
and video on demand really.
46Moving Image Coding (19)
MPEG frames The output of the decoding process,
for interlaced sequences, consists of a series
of fields that are separated in time by a field
period. The two fields of a frame may be coded
independently (field-pictures) or can be coded
together as a frame (frame pictures). An MPEG
source encoder will consist of the following
elements - Prediction (3 frame times)
- Block Transformation - Quantization and
Variable Length Encoding The diagram in the
following shows the intra, predictive and
bi-directional frames that MPEG supports
47 Moving Image Coding (20)
[Figure: MPEG GOP structure over frames 1 to 9 - forward prediction runs from I and P frames to later P frames, while bidirectional prediction supplies the B frames in between.]
48 Moving Image Coding (21)
Structure of MPEG bitstream
Sequence layer
GOP layer
Picture layer
Slice layer
Macroblock layer
Block layer
49 Moving Image Coding (22)
- MPEG Prediction
- MPEG defines three types of pictures:
- Intrapictures (I-pictures)
  - These pictures are encoded only with respect to themselves. Each picture is decomposed into blocks of 8 x 8 pixels that are encoded only with respect to themselves and are sent directly to the block transformation process.
- Predictive pictures (P-pictures)
  - These are pictures encoded using motion-compensated prediction from a past I-picture or P-picture. A prediction error is calculated between a 16 x 16 pixel region (macroblock) in the current picture and the past reference I- or P-picture.
50Moving Image Coding (23)
A motion vector is also calculated to determine
the value and direction of the prediction. For
progressive sequences and interlaced sequences
with frame-coding only one motion vector will be
calculated for the P-pictures. For interlace
sequences with field-coding two motion vectors
will be calculated. The prediction error is then
composed to 8x8 pixels blocks and sent to the
block transformation Bi-directional pictures
(B-pictures) These are pictures encoded using
motion compensates predictions from a past
and/or future I-picture or P-picture. A
prediction error is calculated between a 16x16
pixels region in the current picture and the past
as well as future reference I-picture or
P-picture. Two motion vectors are calculated.
One to determine the value and direction of the
forward prediction the other to determine the
value and direction of the backward prediction.
51Moving Image Coding (24)
For field-coding pictures in interlaced sequences
four motion vectors will thus be calculated.
It should be noted that a B-picture can never
be used as a prediction picture. The method of
calculating the motion vectors as well as the
search area for the best predictor is left to be
determined by the encoder. MPEG Block
Transformation In block transformation, INTRA
coded blocks as well as prediction errors are
processed by a two-dimensional DCT function.
Quantization The purpose of this step is to
achieve further compression by representing the
DCT coefficients with no greater precision than
is necessary to achieve the required quality.
52 Moving Image Coding (25)
Variable Length Encoding
Here extra (lossless) compression is achieved by assigning shorter code-words to frequent events and longer code-words to less frequent events. Huffman coding is usually used to implement this step.
MPEG Picture Order
It must be noted that in MPEG the order of the pictures in the coded stream is the order in which the decoder processes them. The reconstructed frames are not necessarily in display order.
53 Moving Image Coding (26)
Multiplexing and Synchronizing
In networked multimedia standards, the multiplexing function defines the way that multiple streams of different (or the same) media are carried from source to sink over a channel. There are at least three completely different points in this path where we can perform this function: we can design a multimedia codec which mixes together the digitally coded (and possibly compressed) streams as it generates them, possibly interleaving media at a bit-by-bit level of granularity; we can design a multiplexing layer that mixes together the different media as it packetizes them, possibly interleaving samples of different media in the same packets; or we can let the network do the multiplexing, packetizing different media streams completely separately.
54Moving Image Coding (27)
MPEG-1 Video MPEG-1 consists of several parts
System, Video and Audio etc. Beyond simple
playback, the MPEG-1 system is responsible for
multiplexing and synchronization. MPEG-1 video
distinguishes between four different coding types
for Images I frames, P frames, B frames and D
frames. I frames (intracoded frames) are coded
without reference to other images. MPEG makes use
of JPEG for I frames. The compression rate for I
frames is the lowest for all defined coding
types. P frames (predictively coded frames) need
information from the previous I and/or P frame
for encoding and decoding. The achievable
compression is higher than that for I frames.
55Moving Image Coding (28)
B frames (bidirectionally predictively coded
frames) require information from the previous and
following I and/or P frames for encoding and
decoding. The highest compression ration can be
achieved, also a bidirectional
motion-compensation prediction can be used. D
frames (DC coded frame) are encoded intraframe,
whereby the AC coefficient are neglected. D
frames can never be used with the other picture
types. Reference frames must be transmitted
first. The transmission order and the display
order may differ. At the beginning, there is
always an I frame. The first I frame and the
first P frame is also the reference for the
first two B frames. The first I frame is also the
reference of the P frame. Thus the I frame must
be transmitted first, followed next by the P
frame and then the B frame.
56 Moving Image Coding (29)
Display and Transmission Order in MPEG-1 Video

  Display order:      I B B P B B I
  Transmission order: I P B B I B B

The second I frame must be transmitted early since it serves as the reference for the second pair of B frames.
57 Moving Image Coding (30)
[Figure: the MPEG data hierarchy - a video sequence is divided into groups of pictures, each picture into slices, each slice into macro blocks, and each macro block into 8 x 8 pixel blocks.]
58 Moving Image Coding (31)
MPEG-1 Constrained Parameter Set

  Parameter               Restriction
  Horizontal resolution   <= 768 pixels
  Vertical resolution     <= 576 lines
  Macro blocks/s          <= 396 x 25 macro blocks/s
  Frames/s                <= 30 Hz
  Motion vector range     (-64 ... +63.5) pixels
  Input buffer size       <= 327,680 bits
  Bit rate                <= 1.856 Mbps

MPEG-1 video uses the same image format as H.261 but allows a greater choice of image sizes.
59 Moving Image Coding (32)
- MPEG-2 Video
- MPEG-2 video conforms with MPEG-1. It allows data rates up to 100 Mbps, and it supports interlaced video formats as well as HDTV.
- MPEG-2 can be used for the digital transmission of video over satellite, cable, and other broadcast channels.
- MPEG-2 builds upon the completed MPEG-1 standard and was cooperatively developed by ISO/IEC and the ITU (as H.262).
- MPEG-2 video was defined in terms of extensible profiles, each of which supports the features needed by an important class of applications.
- Initially, MPEG-3 was intended to support HDTV applications. During development, MPEG-2 video proved adequate when scaled up to meet HDTV requirements. As a result, MPEG-3 was dropped.
60 Moving Image Coding (33)
MPEG-2 Video Profiles and Levels
61Moving Image Coding (34)
MPEG-4 Video MPEG-4 Video supports low-bit-rate
applications. ISO expert group Developing the
MPEG 4 standard has decided to stop the
development of A new video coding method for low
bit rates. Instead, they focus on Providing
enhanced functionality based on existing
compression methods. For example, the coding of
audio-visual objects. It encodes objects with Any
shape in a video scene. Instead of each image in
the video clip coded As a whole, the stationary
background and the tennis player in the
Foreground can be coded independently with
different methods or Parameter sets. On the
audio side, audio objects are identified and
coded Depending on their contents. One of the
existing video coding methods under study for
MPEG-4 is H.263.
62 Moving Image Coding (35)
- H.263
- The ITU-T Recommendation H.263 defines a codec for the compression of the moving picture component of audio-visual services at low bit rates.
- A typical application is the transmission of video over a V.34 modem connection using 20 kbps for video and 6.5 kbps for audio.
- H.263 is based on H.261, but it supports 5 image formats, it has refined motion compensation, and the standard supports B frames.
- B frames in H.263 have only P frames as a reference.
- Up-to-date video compression methods studied in connection with H.263 are:
  - Wavelet Image Compression
  - Fractal Image Compression
63Moving Image Coding (36)
The approaches have different performance
benefits and costs, and all three approaches are
in use for Internet Multimedia. Some of the costs
are what engineers call non-functional'' ones,
which derive from business cases of the
organizations defining the schemes. There are a
lot of players (stakeholders'') in the
multimedia market place. Many of them have
devised their own system architectures - none the
least of these are the ITU, ISO, DAVIC and the
IETF. The ITU has largely been concerned with
video telephony, whilst DAVIC has concerned
itself with Digital Broadcast technology, and the
IETF has slowly added multimedia (store and
forward and real time) to its repertoire.
64Moving Image Coding (37)
Each group has its own mechanism or family of
mechanisms for identifying media in a stream or
on a store, and for multiplexing over a stream.
The design criteria were different in each case,
as were the target networks and underlying
infrastructure. This has led to some confusion
which will probably persist for a few years now.
Here we look at the 4 major players and their
three major architectures for a multimedia
stream. Two earlier attempts to make sense out of
this jungle were brave goals of Applet and
Microsoft, and we briefly discuss their earlier
attempts to unravel this puzzle - Microsoft have
made recent changes to their architecture at many
levels and this is discussed in their product
specifications and we will not cover it here.
65Moving Image Coding (38)
To cut to the chase, the ITU defines a bit level
interleave or multiplex appropriate to low cost,
low latency terminals and a bit pipe model of the
network, while ISO MPEG group defines a CODEC
level interleave appropriate to digital
multimedia devices with high quality, but
possibly higher cost terminals (it is hard to
leave out a function) finally, the DAVIC and
Internet communities define the multiplexer to be
the network, although DAVIC assume an ATM network
whereas the Internet community obviously assume
an IP network as the fundamental layer.
66Moving Image Coding (39)
The Internet community try to make use of
anything that its possible to use, so that if an
ITU or DAVIC or ISO CODEC is available on an
Internet capable host, someone, somewhere will
sometime devise a way to packetize its output
into IP datagrams. The problem with this is that
it means that for the non-purist approaches of
separate media in separate packets, there are
potentially then several layers of the technical
multiplexing. In a classic paper, David
Tennenhouse describes reasons why this is a very
bad architecture for communicating software
systems. Note that this is not a critique of the
ISO MPEG, DAVIC or ITU H.320 architectures they
are beautiful pieces of design fit for a
particular purpose it is merely an observation
that it is better to unpick their multiplex in
an Internet based system. It certainly leads to
more choice for where to carry out other
functions (e.g. mixing, re- synchronization,
trans-coding, etc etc).
67 Digital Signal Processing (1)
Analog to Digital Conversion - Sampling
An input signal is converted from some continuously varying physical value into a continuously varying electrical signal. This electrical signal can then be converted to a sequence of digital values, called samples, by some analog-to-digital conversion circuit.
68Digital Signal Processing (2)
There are two factors which determine the
accuracy with which the digital sequence of
values captures the original continuous signal
the maximum rate at which we sample, and the
number of bits used in each sample. This latter
value is known as the quantization level.
69Digital Signal Processing (3)
- The raw (uncompressed) digital data rate
associated with a signal then is - simply the sample rate times the number of bits
per sample. -
- To capture all possible frequencies in the
original signal, Nyquist's - theorem shows that the digital rate must be
twice the highest frequency - component in the continuous signal.
- It is often not necessary to capture all
frequencies in the original signal - for example, voice is comprehensible with a
much smaller range of - frequencies than we can actually hear.
- When the sample rate is much lower than the
highest frequency in the - continuous signal, a band-pass filter which
only allows frequencies in - the range actually needed, is usually put
before the sampling circuit. - This avoids possible ambiguous samples
(aliases'').
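A small NumPy sketch of why the filter is needed: a 7 kHz tone sampled at 8 kHz (well below the 14 kHz that the Nyquist criterion would require) produces samples identical to those of a 1 kHz tone:

    import numpy as np

    fs, f = 8000, 7000
    t = np.arange(64) / fs                       # 64 sample instants
    undersampled = np.cos(2 * np.pi * f * t)     # 7 kHz tone, sampled too slowly
    alias = np.cos(2 * np.pi * (fs - f) * t)     # the 1 kHz alias
    print(np.allclose(undersampled, alias))      # True: the two are indistinguishable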
70Audio Coding (1)
Audio Input and Output Audio signals vary
depending on the application. Human speech has a
well understood spectrum, and set of
characteristics, whereas musical input is much
more varied, and the human ear and perception and
cognition systems behave rather differently in
each case. For example, when a speech signal
degrades badly, humans make use of comprehension
to interpolate. Basically, for speech, the
analog signal from a microphone is passed through
several stages. Firstly a band pass filter is
applied eliminating frequencies in the signal
that we are not interested in (e.g. for telephone
quality speech, above 3.6Khz). Then the signal
is sampled, converting the analog signal into a
sequence of values, each of which represents the
amplitude of the analogue signal over a small
discrete time interval. This is then quantized,
or mapped into one of a set of fixed values
These values are then coded for transmission.
The process at the receiver is simply the reverse.
71 Audio Coding (2)
- Audio compression methods differ in the trade-offs between:
  - Encoder and decoder complexity,
  - Quality of the compressed audio, and
  - Amount of data.
- A basic audio compression technique employed in digital telephony is based on a logarithmic transformation (sketched below):
  - The A-law transformation maps from 13-bit linearly quantized PCM values to 8 bits; it is commonly used in Europe.
  - The mu-law transformation maps from 14 bits to 8 bits; it is used in North America and Japan.
- Both specifications are covered in ITU recommendation G.711.
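A sketch of the continuous mu-law companding curve with mu = 255; G.711 itself uses a piecewise-linear (segmented) approximation of this curve, so this is an idealized illustration rather than a bit-exact codec:

    import math

    MU = 255

    def mulaw_compress(x):
        """x in [-1, 1] -> y in [-1, 1]; quiet signals get expanded."""
        return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

    def mulaw_expand(y):
        """Inverse mapping used at the receiver."""
        return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

    # Quiet signals keep much finer effective resolution than loud ones:
    for x in (0.01, 0.1, 1.0):
        print(x, round(mulaw_compress(x), 3))   # 0.01->0.228, 0.1->0.591, 1.0->1.0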
72 Audio Coding (3)
- Adaptive Differential Pulse Code Modulation (ADPCM)
- ADPCM overcomes the disadvantages of DPCM.
- It is a lossy method that codes the differences between PCM-coded audio signals using only a small number of bits.
- It can change the step size of the quantizer and the predictor, adapting to the characteristics of the signal.
- It is able to code either the high- or the low-frequency portion of a signal exactly, and always operates in one of these two modes.
- It reduces the data rate of high-quality audio from 1.4 Mbps to 32 kbps.
- The ADPCM standard is covered by ITU G.721.
73 Audio Coding (4)
- MPEG-1 Audio
- MPEG-1 Audio compression is lossy, but it can achieve transparent, perceptually lossless compression.
- The algorithm exploits the perceptual limitations of the human hearing threshold and auditory masking to determine which parts of an audio signal are acoustically irrelevant and can be removed in the compression.

[Figure: the hearing threshold for the human ear - amplitude in dB (0 to 80) against frequency in kHz (0.02 to 20); components below the threshold curve are inaudible.]
74 Audio Coding (5)
Auditory Masking
Auditory masking is a perceptual weakness of the ear that occurs whenever the presence of a strong audio signal makes a spectral neighborhood of weaker audio signals imperceptible. The threshold for noise masking at any given frequency depends solely on the signal activity within a critical band of that frequency.

[Figure: a strong tonal signal raises the masking threshold in its spectral neighborhood, so weaker signals at nearby frequencies are masked.]
75 Audio Coding (6)
- MPEG-1 Audio defines three layers.
- The compression techniques for each layer are similar, but coder complexity increases with each layer.
- Each layer uses a separate but related way of compressing audio.
- Each layer's decoder must decode audio from any layer below it. For example, a Layer III decoder must also decode Layer II and Layer I audio, while a Layer II decoder must decode Layer I audio, but not Layer III.
- The input audio stream passes simultaneously through a filter bank and through a psychoacoustic model:
  - The filter bank divides the input into multiple subbands.
  - The psychoacoustic model determines the signal-to-mask ratio of each subband, which drives the bit allocation (a toy allocation sketch follows below).
76 Audio Coding (7)
[Figure: MPEG-1 Audio encoder and decoder. Encoder: the PCM audio input feeds a time-to-frequency mapping filter bank and, in parallel, a psychoacoustic model; the model drives the bit/noise allocation, quantizer and coding stage, and bit-stream formatting emits the encoded bit stream. Decoder: the encoded bit stream is unpacked, the frequency samples are reconstructed, and a frequency-to-time mapping produces the decoded PCM audio.]
77 Audio Coding (8)
- The MPEG-2 Audio standard extends the functionality of MPEG-1 with multichannel coding of up to five channels (left, right, center, and two surround channels), plus an additional low-frequency enhancement channel, and/or up to seven commentary/multilingual channels.
- It extends the stereo and mono coding of the MPEG-1 Audio standard with further sampling rates.
- MPEG-2 Audio is backward compatible with MPEG-1 Audio.
- An MPEG-2 Audio decoder can process any MPEG-1 Audio bit stream; an MPEG-1 Audio decoder can also read and process the stereo information of an MPEG-2 Audio bit stream.
- For more information regarding MPEG, search engines such as Altavista, Lycos and Yahoo! can be checked.
78 Audio Coding (9)
- Codebook Excited Linear Predictive Coding (CELP)
- The main problem with vocoders is the simplistic model of the excitation used. One method of circumventing this problem is Codebook Excited Linear Prediction (CELP).
- In the CELP coder, the speech is passed through the cascade of the vocal tract predictor and the pitch predictor. The output of this predictor is a good approximation to Gaussian noise. This noise sequence has to be quantized and transmitted to the receiver.
- Multi-pulse coders quantize it using a series of weighted impulses.
- CELP coders use vector quantization: the index of the codeword that produces the best-quality speech is transmitted, along with a gain term for it.
- The codebook search is carried out using an analysis-by-synthesis technique: the speech is synthesized for every entry in the codebook, and the codeword that produces the lowest error is chosen as the excitation (see the toy search below).
- The error measure used is perceptually weighted so that the chosen codeword produces the speech that sounds the best.
79 Audio Coding (10)
80 Audio Coding (11)
Summary of Audio and Video Input and Output
Audio and video are loss-tolerant, so they can use cleverer compression that discards some information; compression of 400 times is possible on video. There are a lot of standards for this now, including schemes based on PCM, such as ADPCM, and on models, such as LPC and MPEG Audio. Note that lossy compression of audio and video is not acceptable to some classes of user (e.g. radiologists or air traffic controllers). It is sometimes said that "the eye integrates while the ear differentiates". What is meant by this is that the eye responds to stronger signals or higher frequencies with a cumulative reaction, while the ear responds less and less (i.e. to double the pitch, you have to double the frequency, so we hear a logarithmic scale as linear; and to double the loudness, you have to increase the power exponentially too).
81 References
- F. Kuo et al., Multimedia Communications: Protocols and Applications, Prentice Hall, 1998
- M. Riley and I. Richardson, Digital Video Communications, Artech House, 1997
- http://dnausers.d-n-a.net/dnetzNRo/mp3info.htm
- http://www.cas.mcmaster.ca/malcolm/cs4cb3/node28.html
- http://www.cs.ucl.ac.uk/staff/jon/mmbook/book/book.html
- ITU-T, Video Codec for Audiovisual Services at p x 64 kbps, Recommendation H.261, 1993
- ITU-T, Generic Coding of Moving Pictures and Associated Audio, Recommendation H.262, 1994
- ITU-T, Video Coding for Low Bit Rate Communication, Recommendation H.263, 1995