Title: Course Progress
1. Course Progress
- Image Compression
- Video Compression
- MPEG
- H.261
2. Video Compression
- Development of video compression standards by the International
  Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG)
- By ITU alone
  - H.261 → H.263 → H.263+, H.263++
- By ITU and MPEG jointly
  - H.262/MPEG-2 → H.264
- By MPEG alone
  - MPEG-1 → MPEG-4
3. MPEG
- MPEG-1
  - The first video compression standard from MPEG, in 1993
  - Compresses video into a sequence of image frames
- MPEG-2
  - Enhanced video compression standard, 1994
- MPEG-4
  - Object-based video compression standard, 1999
  - Compresses video into its composing objects
4. MPEG
- MPEG-7
  - An Extensible Markup Language (XML) based multimedia content
    description standard
- MPEG-21
  - An open multimedia framework standard
5. MPEG
- Two Source Intermediate Formats (SIF)
  - 352×240 pixels/frame, 4:2:0 subsampling, 30 frames/second,
    progressive scan, ~30 Mbps
  - 352×288 pixels/frame, 4:2:0 subsampling, 25 frames/second,
    progressive scan, ~30 Mbps
- MPEG-1 compresses SIF video from a raw data rate of about 30 Mbps
  to about 1.1 Mbps, at VHS VCR quality
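As a quick arithmetic check on the quoted rates, the raw bit rate of the NTSC-derived SIF format can be computed from the slide's parameters. This is a sketch assuming 8 bits per sample; the 1.5 factor comes from 4:2:0 subsampling (one luma sample per pixel plus quarter-resolution Cb and Cr):

```python
# Raw bit rate of 352x240, 30 fps SIF with 4:2:0 subsampling.
# Assumes 8 bits/sample; 4:2:0 carries 1.5 samples per pixel on average.
width, height, fps = 352, 240, 30
samples_per_pixel = 1.5
raw_bps = width * height * samples_per_pixel * 8 * fps
print(raw_bps / 1e6)        # about 30.4 Mbps, matching the slide's ~30 Mbps
print(raw_bps / 1.1e6)      # MPEG-1 compression ratio, roughly 28:1
```

The same arithmetic with 352×288 at 25 fps gives the same ~30 Mbps for the PAL-derived SIF.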
6. MPEG Compression
- Digital storage media and channels
- Asymmetric applications
  - Movies, video-on-demand, education-on-demand, point of sale
  - Compression is done once, at production
  - Frequent use of the decompression process
- Symmetric applications
  - Video mail, video conferencing
  - Similar compression and decompression frequency
7. MPEG-2 Compression
- Asymmetric compression
- Balance between intra-frame and inter-frame coding
- Two basic techniques
  - Inter-frame: block-based motion compensation to reduce temporal
    redundancy
  - Intra-frame: DCT-based compression to reduce spatial redundancy
8. MPEG Structure
- An MPEG stream consists of many Groups of Pictures (GOPs)
- Each GOP consists of 3 types of pictures: I-frames, P-frames, and
  B-frames
- I-frame: intra-coded pictures
  - Starting points for random access
  - Moderately compressed using JPEG-style coding
- P-frame: predicted pictures
  - Coded with reference to a past picture
  - Used as a reference for future predicted pictures
- B-frame: bi-directionally predicted pictures
  - Require past and future references for prediction
  - Highest compression
9. Inter-frame Structure
- The display sequence of frames:
  - P-frames are displayed between I-frames
  - B-frames are inserted between I/P-frames
- Dependency among frames:
  - P-frames are predicted from the previous I/P frame
  - B-frames are interpolated from the previous and future I/P frames
- (Figure: frames 1-10 in display order over time, I B B P B B P B B I,
  with prediction arrows from the reference frames)
10. Sequence of Frames
- Due to the dependency among frames, the order of frames is changed
  for transmission and storage
- Transmission order: each I/P reference frame is sent before the
  B-frames that depend on it
- Example
  - Display sequence: (IBBPBBPBB)(IBBPBBPBB)(IBB…
  - Transmit order: (IPBBPBB)(IBBPBBPBB)(IBB…
  - I:P:B ratio = 1:2:6
- At 25 frames/second, random access is provided within about
  9/25 second, i.e. about 360 msec
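The display-to-transmit reordering above can be sketched in a few lines. This is a simplified model assuming each B-frame needs only the next I/P reference; the function name is illustrative:

```python
def transmit_order(display):
    """Reorder frames from display order to transmit order: each I/P
    reference is sent first, followed by the B-frames that were
    displayed before it but depend on it for prediction."""
    out, pending_b = [], []
    for frame in display:
        if frame == "B":
            pending_b.append(frame)   # must wait for its future reference
        else:                         # I or P: a reference frame
            out.append(frame)
            out.extend(pending_b)
            pending_b.clear()
    return out + pending_b            # leftover B's (stream truncated)

# The slide's example: display (IBBPBBPBB)(I... becomes (IPBBPBB)(I...
print("".join(transmit_order("IBBPBBPBBI")))   # → IPBBPBBIBB
```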
11. I-Frame Encoder
- (Block diagram) I-frame → color space converter → FDCT → quantizer →
  entropy encoder → compressed I-frame
- Similar to JPEG image compression
12. P/B-Frame Encoder
- (Block diagram) P/B-frame → color space converter (RGB→YUV) →
  motion estimator, using the reference frames → error terms →
  FDCT → entropy encoder → compressed frame
13. Motion Estimation
- Motion compensation compensates for inter-frame differences due to
  motion
- Block matching technique: find the best-matching block in the
  previous frame
- Forward prediction
- (Figure: block A in the current frame and its best match in the
  previous frame)
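The block-matching search can be sketched as an exhaustive sum-of-absolute-differences (SAD) search. This is a pure-Python illustration; the function names, block size, and search range are illustrative choices, not values from any standard:

```python
def sad(a, b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def best_match(prev, block, top, left, search=4):
    """Exhaustive block matching: slide `block` (located at (top, left)
    in the current frame) over a +/-search window in the previous frame
    and return the (dy, dx, cost) with the smallest SAD."""
    n, h, w = len(block), len(prev), len(prev[0])
    best = (0, 0, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= h - n and 0 <= x <= w - n:
                candidate = [row[x:x + n] for row in prev[y:y + n]]
                cost = sad(block, candidate)
                if cost < best[2]:
                    best = (dy, dx, cost)
    return best

# A block that moved down 2 and right 1 between frames is found exactly:
prev = [[r * 7 + c * 3 for c in range(16)] for r in range(16)]
block = [row[5:13] for row in prev[6:14]]      # content now at (4, 4)
print(best_match(prev, block, 4, 4))           # → (2, 1, 0)
```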
14. Motion Compensated Interpolation
- Interpolate the blocks from the previous and future frames
- Bi-directional prediction
- (Figure: a block in the current frame is interpolated from its best
  matches in the previous and future frames)
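Bi-directional prediction can be sketched as a per-pel average of the two motion-compensated matches. This is a simplification: MPEG also allows pure forward or backward prediction per macroblock, and the helper name is illustrative:

```python
def interpolate(past_block, future_block):
    """Average the motion-compensated blocks from the past and future
    reference frames to predict the B-frame block (rounded average)."""
    return [[(p + f + 1) // 2 for p, f in zip(pr, fr)]
            for pr, fr in zip(past_block, future_block)]

prediction = interpolate([[10, 20], [30, 40]], [[20, 20], [10, 0]])
print(prediction)        # → [[15, 20], [20, 20]]
# The encoder then DCT-codes only the error terms: current - prediction.
```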
15. MPEG Compression Bit Rates
- (Table: typical MPEG-1 and MPEG-2 compression bit rates)
16. MPEG Compression
- Each MPEG video is divided into GOPs of I-, P-, and B-frames
- I-frames are independently compressed like JPEG images; they are
  starting points for random access
- P-frames are compressed using motion estimation with reference to
  previous frames
- B-frames are compressed using interpolation between previous and
  future I/P-frames
- The frames of the same GOP are stored or transmitted in IPB order
17. MPEG-4 System
- System requirements
  - Coded representation of interactive audiovisual scenes
  - Identification of elementary streams
  - Object content information
  - User interaction
- Source: Olivier Avaro, Carsten Herpel, Julien Signès, "MPEG-4 Tutorial"
18. Decoding
- (Figure: MPEG-4 decoding architecture, from the network, e.g. an
  MPEG-2 transport stream, to display and local user interaction)
19. Return Channel
- (Figure: the return channel passes through the DMIF Application
  Interface (DAI) and the TransMux layer back to the network)
20. Representation of Multimedia Content
- Identification of elementary streams: Object Descriptor
- Scene description: BIFS
- Animation of the scene: animation streams
- Object content information: OCI
- Identification of intellectual property
21. Object Descriptor
- Identification of elementary streams
  - Stream types and decoder configuration
  - Stream identification and location
  - Object content description
- Association between streams
  - Coding dependency between streams
  - Clock dependencies between streams
  - Association between scene streams and media streams
22. Binary Format for Scenes (BIFS)
- Composes MPEG-4 media types together
  - 2D/3D, natural/synthetic, audio/video, stored/streamed
  - in the same environment
- BIFS is based on VRML (Virtual Reality Modeling Language)
  - a set of nodes representing the primitive scene objects to be
    composed, the scene graph constructs, and the behavior and
    interactivity
23. BIFS
- In addition to VRML, BIFS defines
  - 2D capabilities,
  - integration of 2D and 3D,
  - advanced audio features,
  - a timing model,
  - a BIFS-Update protocol to modify the scene over time,
  - a BIFS-Animation protocol to animate the scene over time,
  - a binary encoding for the scene.
24. Summary of MPEG-4
- A coded representation of interactive audiovisual scenes based on
  composition and rendering
- Identification of elementary streams
- Animation of object scenes using object content information
- User interaction via a return channel
25. MPEG-7
- Multimedia objects may be described by descriptions called metadata
- Examples: <duration> of a video, <time point> within an audio clip,
  <author> of an image, <keyword> of an object, <TextAnnotation> of an
  object, etc.
- Different content-creation software/hardware may describe the same
  object type using different names and formats
  - → incompatibility among different software
  - → difficulty in searching/indexing/comparing objects from
    different databases
26. MPEG-7
- Provides core technologies for describing the content of
  audiovisual data
- May include still pictures, graphics, 3D models, audio, speech,
  video, and composition information
- Offers different granularities in its descriptions to allow
  different levels of discrimination
27. MPEG-7
- The main elements of the MPEG-7 standard are
  - Description tools: descriptors that define the syntax and
    semantics of each metadata element
  - DDL: defines the syntax of the description tools and allows the
    creation of new description schemes
  - Classification schemes: define lists of typical terms used in
    many applications, together with their meanings
  - Extensibility: supported through the MPEG-7 schema extension
    mechanism
  - System tools: support a binary coded representation for efficient
    storage and transmission
28. MPEG-7
- (Figure: the Data Definition Language (DDL) defines Description
  Schemes, e.g. VisualDSType, SegmentDS, MediaInfoDS, and Descriptors,
  e.g. VisualDType, MediaProfileD, MediaQualityD; these are extended
  via Classification Schemes and instantiated in an MPEG-7 document
  <Mpeg7>…</Mpeg7> for encoding and delivery)
29. MPEG-7
- (Figure: type derivation hierarchy of MPEG-7 descriptions: a complete
  description comprises content description and content management;
  content description covers content entities (video, audio, image)
  and content abstractions (semantic and summary descriptions), along
  with user and creation descriptions of multimedia content)
30. MPEG-7 Example
- A list of integer values can be defined as follows:

  <simpleType name="integerVector">
    <list itemType="integer"/>
  </simpleType>
31. MPEG-7 Example
- All MPEG-7 documents should have the following header:

  <?xml version="1.0" encoding="iso-8859-1"?>
  <Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:mpeg7="urn:mpeg:mpeg7:schema:2001"
         xmlns:xml="http://www.w3.org/XML/1998/namespace"
         xsi:schemaLocation="urn:mpeg:mpeg7:schema:2001 Mpeg7-2001.xsd">
    <!-- Your MPEG-7 content -->
  </Mpeg7>
32. MPEG-21
- Many elements exist for building an infrastructure for the delivery
  and consumption of multimedia content
- There is no big picture describing how these elements relate to each
  other
33. MPEG-21
- MPEG-21 defines an open multimedia framework
  - Enables transparent and augmented use of multimedia resources
    across a wide range of networks and services
  - Covers the entire content delivery chain
    - content creation, production, delivery, personalization,
      consumption, presentation, and trade
  - Describes the framework as a generic architecture
34. MPEG-21
- MPEG-21 defines
  - Digital item: the fundamental unit of distribution and transaction
  - User model: the concept of users interacting with digital items (DIs)
35. MPEG-21
- Digital Item (DI)
  - a hierarchical container of resources, metadata, and other digital
    items
- User
  - any entity that interacts in the MPEG-21 environment or makes use
    of a DI
  - e.g. individuals, corporations, and consumers can be users
36. MPEG-21
- Relationship among users and digital items
- A user may use content in many ways (publish, deliver, consume, and
  so on)
- (Figure: users interact with each other and use digital items)
37. MPEG-21
- Seven key architectural elements
  - Digital item declaration: gives a precise description of what
    constitutes a DI
  - Digital item identification: identifies and describes a DI
  - Content management and usage: provides interfaces and protocols
    that enable creation, manipulation, search, access, storage, and
    (re)use of content across its delivery and consumption chain
  - Intellectual property management and protection: enables DIs and
    their rights to be persistently and reliably managed and protected
38. MPEG-21
- Seven key architectural elements (cont.)
  - Terminals and networks: provide tools that enable interoperable
    and transparent access to content across networks and terminals
  - Content representation: defines how the media resources are
    represented
  - Event reporting: supplies the metrics and interfaces that enable
    users to understand precisely the performance of all reportable
    events
39. MPEG-21 Example
- The following XML declares the root element DIDL; several photo
  albums are grouped together in a DI:

  <DIDL>
    <Container>
      <Descriptor>
        <Statement mimeType="text/plain">My Photo Albums in Items</Statement>
      </Descriptor>
      <Item id="Album1"> … </Item>
      <Item id="Album2"> … </Item>
    </Container>
  </DIDL>
40. ITU
- ITU: International Telecommunication Union
- 2 video formats
- Common Intermediate Format (CIF)
  - half the resolution of the BT.601 4:2:0 format in both the
    horizontal and vertical dimensions
  - 352×288 pixels/frame, 4:2:0 subsampling, 30 frames/second,
    progressive scan, ~37 Mbps
41. ITU
- 2 video formats (cont.)
- QCIF
  - half the resolution of CIF in both the horizontal and vertical
    directions
  - 176×144 pixels/frame, 4:2:0 subsampling, 30 frames/second,
    progressive scan, ~9 Mbps
- Both video formats are non-interlaced and were developed for video
  conferencing applications
42. H.261
- ISDN (Integrated Services Digital Network) lines only allow
  transmission rates in multiples of 64 kbps
- H.261 compresses CIF and QCIF to p × 64 kbps, where p = 1, 2, …, 30
- It compresses a CIF signal with a raw data rate of 37 Mbps to
  128-384 kbps with reasonable quality
- It compresses a QCIF signal with a raw data rate of 9 Mbps to
  64-128 kbps
43. H.261 Encoder
- Features of an H.261 encoder
  - H.261 subdivides the image into macroblocks of 16×16 pels
  - A macroblock consists of 4 luminance and 2 chrominance blocks
    (1 for Cb and 1 for Cr)
  - Each macroblock is transformed using an 8×8 DCT on each block to
    reduce spatial redundancy
  - Uses Differential Pulse Code Modulation (DPCM) to exploit temporal
    redundancy
  - Uses unidirectional integer-pel forward motion compensation
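The macroblock layout above can be sketched as follows. This is an illustrative helper assuming row-major lists of samples; with 4:2:0 subsampling each chroma plane is already a single 8×8 block per 16×16 luma macroblock:

```python
def split_macroblock(y16, cb8, cr8):
    """Split a 16x16 luminance macroblock into the four 8x8 blocks that
    are DCT-coded, then append the 8x8 Cb and Cr blocks (in 4:2:0, one
    chroma sample covers a 2x2 luma area, so each chroma component of a
    macroblock is a single 8x8 block)."""
    luma = [[row[c:c + 8] for row in y16[r:r + 8]]
            for r in (0, 8) for c in (0, 8)]
    return luma + [cb8, cr8]          # 4 luminance + 2 chrominance blocks

blocks = split_macroblock([[0] * 16 for _ in range(16)],
                          [[0] * 8 for _ in range(8)],
                          [[0] * 8 for _ in range(8)])
print(len(blocks))                    # → 6
```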
44H.261 Encoder
p
Mux
Coding Control
t
qz
q
Q
T
Video in
Q-1
Channel
T-1
d
P
F
f
45H.261 Encoder
- T Transform
- Q Quantizer
- P Picture memory with motion estimator and
motion compensation unit - F Loop filter
- p flag for Intra/Inter
- t flag for transmitted or not
- qz quantizer indication
- q quantizing index for transform coefficients
- d motion vector
- f switching on/off of the loop filter
46. H.261 Encoder
- Each macroblock is transformed using an 8×8 DCT (block T)
- The DCT coefficients are
  - scanned using a zigzag scan
  - quantized
  - converted into pairs of symbols
  - encoded using variable-length codewords
- (Figure: zigzag scan of the 8×8 coefficient block, from DC through
  AC01 … AC77)
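The zigzag scan order can be generated programmatically. This is a sketch: the traversal visits anti-diagonals of constant row+col, alternating direction so the path snakes from DC toward the high frequencies:

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of the zigzag scan over an
    n x n coefficient block, from the DC term toward high frequencies.
    Cells are grouped by anti-diagonal (constant row+col); odd
    diagonals are walked downward, even ones upward."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

scan = zigzag_order()
print(scan[:6])   # → [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```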
47. H.261 Encoder
- Differential Pulse Code Modulation (DPCM)
  - encodes the difference from the previous DC coefficient
- Two quantizers for the DCT coefficients (block Q)
  - a uniform quantizer with a stepsize of 8 is used in intra mode for
    the DC coefficients
  - a nearly uniform midtread quantizer with stepsizes of 2 to 62 for
    the AC coefficients
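The DC DPCM can be sketched as follows (an illustrative helper, assuming a zero predictor for the first block):

```python
def dpcm_encode(dc_values):
    """Encode each DC coefficient as the difference from the previous
    block's DC; the first block is differenced against 0."""
    diffs, prev = [], 0
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def dpcm_decode(diffs):
    """Invert the DPCM by accumulating the transmitted differences."""
    out, prev = [], 0
    for d in diffs:
        prev += d
        out.append(prev)
    return out

print(dpcm_encode([100, 104, 98]))   # → [100, 4, -6]
```

Neighboring blocks tend to have similar DC values, so the differences are small and cheap to entropy-code.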
48. H.261 Encoder
- A nearly uniform midtread quantizer with input amplitude x and output
  amplitude Q(x)
- The dead zone, from -T to T, is quantized to zero; elsewhere the
  stepsize is uniform
- The dead zone avoids coding many small coefficients that would mainly
  contribute to coding noise
- (Figure: quantizer characteristic with a dead zone between -T and T)
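A dead-zone midtread quantizer can be sketched as follows. The level and threshold convention here is illustrative; H.261 itself defines the exact decision levels in its tables:

```python
def deadzone_quantize(x, step, dead_zone):
    """Midtread quantizer with a dead zone: inputs in (-dead_zone,
    +dead_zone) map to level 0; beyond the dead zone the stepsize is
    uniform. Returns the integer quantization level."""
    if abs(x) < dead_zone:
        return 0
    sign = 1 if x > 0 else -1
    return sign * (int((abs(x) - dead_zone) // step) + 1)

# Illustrative parameters: stepsize 8, dead zone T = 4.
levels = [deadzone_quantize(x, step=8, dead_zone=4) for x in (3, 5, 20, -12)]
print(levels)   # → [0, 1, 3, -2]
```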
49. H.261 Encoder
- Each symbol includes two fields:
  - the number of preceding coefficients quantized to zero (the run), and
  - the amplitude of the current nonzero coefficient
- The symbols are encoded using variable-length codewords (VLC), like
  Huffman coding
- The encoder sends an End of Block (EOB) symbol after the last nonzero
  coefficient
- Example: the quantized DCT coefficients
  5 0 0 2 3 0 0 4 0 0 0 0 0 0 1 0 0 0 0 0 0
  are converted into (0, 5), (2, 2), (0, 3), (2, 4), (6, 1), EOB
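The symbol conversion on this slide can be reproduced directly (a sketch; the VLC tables that encode these symbols are defined by the standard):

```python
def run_level_symbols(coeffs):
    """Convert quantized DCT coefficients (already in zigzag order)
    into (zero-run, amplitude) symbols, terminated by an EOB marker.
    Trailing zeros after the last nonzero coefficient are implied by
    the EOB and never coded explicitly."""
    symbols, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            symbols.append((run, c))
            run = 0
    return symbols + ["EOB"]

coeffs = [5, 0, 0, 2, 3, 0, 0, 4, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(run_level_symbols(coeffs))
# → [(0, 5), (2, 2), (0, 3), (2, 4), (6, 1), 'EOB']
```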
50. H.261 Encoder
- Uses unidirectional integer-pel forward motion compensation (block P)
- Uses a 2D loop filter to low-pass the motion-compensated prediction
  signal (block F)
  - decreases the prediction error and the blockiness of the prediction
    image
51. H.261 Encoder
- The encoder transmits mainly 2 classes of information for each
  macroblock:
  - DCT coefficients from the transform of the prediction error signal
  - motion vectors estimated by the motion estimator
- Motion vectors are limited to ±16 pels
52. H.261 Encoder
- In intra mode, the bit stream contains transform coefficients for
  each block
- In inter mode, the encoder can choose to send
  - a differentially coded motion vector (MVD), with or without the
    loop filter on
  - a coded block pattern (CBP) specifying the blocks for which
    transform coefficients are transmitted
53. Summary of Video Compression
- MPEG-2 compresses video into a collection of I/P/B frames
  - I-frames use the DCT
  - P-frames use motion estimation
  - B-frames use bi-directional interpolation
- MPEG-4 animates video as interactive scenes of elementary streams
- MPEG-7 describes multimedia objects with metadata
- MPEG-21 builds a framework for users to manipulate digital items,
  which are resources and metadata of multimedia objects
54. Summary of Video Compression
- H.261
  - divides images into 16×16-pel macroblocks
  - transforms each macroblock with 8×8 DCTs
  - encodes DCT coefficients using Differential Pulse Code Modulation
    and variable-length codewords
  - uses unidirectional integer-pel forward motion compensation