Title: Video Compression
1. Video Compression
2. Video Compression Standards
- JPEG (ISO and ITU-T)
  - for compression of still images
  - Moving JPEG (MJPEG)
- H.261 (ITU-T SG XV)
  - for audiovisual services at p x 64 kbps
- MPEG-1, -2, -4, -7 (ISO/IEC JTC1/SC29/WG11)
  - for compression of combined video and audio
- H.263 (ITU-T SG XV)
  - for videophone at bit rates below 64 kbps
- JBIG (ISO)
  - for compression of bi-level images
- Non-standardized techniques
  - DVI: de facto standard from Intel for storage compression and real-time decompression
  - QuickTime: Macintosh
3. Frame/Picture Types
- I frame: intra-coded frame
  - provides points for random access
  - used as a reference for coding other frames
  - coded as in JPEG, except that the quantization threshold values are the same for all DCT components (see the sketch after this list)
- P frame: predictively coded frame
  - based on the reference frame (the previous I or P frame)
- B frame: bidirectionally predictively coded frame
  - based on the previous and following I and/or P frames
- D frame: DC-coded frame
  - intra-coded frame, neglecting the AC coefficients
  - used for fast-forward and rewind modes
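The I-frame bullets above note that, unlike JPEG, one quantization threshold applies to every DCT component. A minimal Python sketch of that idea follows; the 8x8 coefficient values and the step size of 16 are made up for illustration, and a real coder treats the DC term and rounding rules more carefully.

```python
import numpy as np

def quantize_uniform(dct_block, step):
    """Quantize every DCT coefficient with the same step size
    (unlike JPEG, which uses a per-coefficient quantization table)."""
    return np.round(dct_block / step).astype(int)

def dequantize_uniform(q_block, step):
    """Inverse operation used by the decoder."""
    return q_block * step

# Toy 8x8 block of DCT coefficients (illustrative values only)
block = np.zeros((8, 8), dtype=int)
block[:4, :4] = [[520, 35, -21, 10],
                 [ 30, -8,   6,  0],
                 [-12,  4,   0,  0],
                 [  5,  0,   0,  0]]

q = quantize_uniform(block, step=16)       # one threshold for all components
print(q)
print(dequantize_uniform(q, step=16))      # reconstruction error bounded by step/2
```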
4. Group of Pictures (GOP) Structure
5. Display and Transmission Order
- Transmission order and display order may differ
- Reference frames must be transmitted first
Forward and bidirectional prediction in a GOP (display vs. transmission order):
Display order:       1  2  3  4  5  6  7  8  9
                     I  B  B  B  P  B  B  B  I
Transmission order:  1  5  2  3  4  9  6  7  8
                     I  P  B  B  B  I  B  B  B
(P frames are forward-predicted from the preceding I/P frame; B frames are bidirectionally predicted from the surrounding I/P frames.)
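Because reference frames must be sent before the B frames that depend on them, the transmission order above can be derived mechanically from the display order. A small Python sketch of that reordering (the GOP pattern is the one from this slide; the function name is my own):

```python
def transmission_order(display_frames):
    """Reorder a display-order GOP so that every reference frame (I or P)
    is transmitted before the B frames that depend on it.
    `display_frames` is a list of frame types in display order."""
    out, pending_b = [], []
    for idx, ftype in enumerate(display_frames, start=1):
        if ftype in ('I', 'P'):
            out.append((idx, ftype))   # send the reference first ...
            out.extend(pending_b)      # ... then the B frames it anchors
            pending_b = []
        else:
            pending_b.append((idx, ftype))   # B frame: wait for its future reference
    return out + pending_b                   # trailing B frames, if any

display = ['I', 'B', 'B', 'B', 'P', 'B', 'B', 'B', 'I']
print(transmission_order(display))
# [(1, 'I'), (5, 'P'), (2, 'B'), (3, 'B'), (4, 'B'), (9, 'I'), (6, 'B'), (7, 'B'), (8, 'B')]
```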
6. Motion Estimation and Compensation
- Macroblock: the motion compensation unit
- Motion estimation
  - extracts the motion information from a video sequence (see the sketch after this list)
- Motion information
  - one motion vector for a forward-predicted macroblock
  - two motion vectors for a bidirectionally predicted macroblock
- Motion compensation
  - reconstructs an image using blocks from the previous image along with the motion information, i.e., the motion vectors
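As a rough illustration of the two operations listed above, the following Python sketch performs exhaustive block matching over a small search window and then rebuilds a prediction from the resulting vectors. The 16x16 block size, the +/-7-pixel search range, the SAD criterion, and the function names are illustrative assumptions, not taken from any particular standard.

```python
import numpy as np

def motion_estimate(ref, cur, block=16, search=7):
    """For each block of `cur`, find the displacement (dy, dx) within
    +/- `search` pixels that minimises the sum of absolute differences
    against the reference frame `ref` (full-search block matching)."""
    h, w = cur.shape
    vectors = {}
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            target = cur[y:y + block, x:x + block].astype(int)
            best_sad, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ry, rx = y + dy, x + dx
                    if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                        continue          # candidate block must lie inside the frame
                    cand = ref[ry:ry + block, rx:rx + block].astype(int)
                    sad = np.abs(target - cand).sum()
                    if best_sad is None or sad < best_sad:
                        best_sad, best_mv = sad, (dy, dx)
            vectors[(y, x)] = best_mv
    return vectors

def motion_compensate(ref, vectors, shape, block=16):
    """Decoder side: rebuild a prediction of the current frame from the
    reference frame plus the motion vectors."""
    pred = np.zeros(shape, dtype=ref.dtype)
    for (y, x), (dy, dx) in vectors.items():
        pred[y:y + block, x:x + block] = ref[y + dy:y + dy + block, x + dx:x + dx + block]
    return pred

# Sanity check: if the current frame equals the reference, every vector is
# (0, 0) and the motion-compensated prediction reproduces the frame exactly.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (32, 32), dtype=np.uint8)
mvs = motion_estimate(ref, ref.copy())
assert all(mv == (0, 0) for mv in mvs.values())
assert np.array_equal(motion_compensate(ref, mvs, ref.shape), ref)
```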
7. Implementation Issues
- For P-frames, the encoding of each macroblock depends on the output of the motion estimation unit (a decision sketch follows this list)
  - if the two contents are the same, only the address of the MB in the reference frame is encoded
  - if they are very close, both the motion vector and the difference matrices are encoded
  - if no close match is found, the MB is encoded in the same way as in an I-frame
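A compact Python sketch of the three-way decision above. The SAD measure and the two threshold values are placeholders chosen for illustration; real encoders apply their own criteria and rate-distortion trade-offs.

```python
import numpy as np

def choose_mb_mode(target_mb, ref_mb, mv, skip_thresh=0, inter_thresh=2000):
    """Decide how to code one P-frame macroblock (illustrative thresholds):
       identical content -> 'skip'  (only the MB address is coded)
       close match       -> 'inter' (motion vector + difference matrix)
       no close match    -> 'intra' (coded as in an I-frame)"""
    diff = target_mb.astype(int) - ref_mb.astype(int)
    sad = np.abs(diff).sum()
    if sad <= skip_thresh:
        return ('skip', None, None)
    if sad <= inter_thresh:
        return ('inter', mv, diff)       # motion vector + residual
    return ('intra', None, target_mb)    # fall back to intra coding

mb = np.full((16, 16), 100, dtype=np.uint8)
print(choose_mb_mode(mb, mb, mv=(0, 0))[0])        # 'skip'
print(choose_mb_mode(mb, mb + 1, mv=(1, -2))[0])   # 'inter'
print(choose_mb_mode(mb, 255 - mb, mv=(0, 0))[0])  # 'intra'
```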
8. Implementation Schematics
Bitstream format
9. Performance
- I-frame
  - similar to JPEG
  - typical compression ratio about 10:1 to 20:1
- P-frame
  - about 20:1
- B-frame
  - about 30:1 to 50:1
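To get a feel for what these ratios mean, a back-of-the-envelope Python calculation follows; the 8-frame GOP pattern, the CIF frame size, and the 15 fps rate are assumptions chosen only for illustration.

```python
# Raw CIF frame: 352 x 288 luminance plus two quarter-size chrominance planes, 8 bits each
raw_bits = 352 * 288 * 8 * 1.5              # = 1,216,512 bits per frame

ratios = {'I': 10, 'P': 20, 'B': 50}        # approximate ratios from this slide
gop = ['I', 'B', 'B', 'B', 'P', 'B', 'B', 'B']   # hypothetical GOP pattern

compressed_bits = sum(raw_bits / ratios[f] for f in gop)
avg_per_frame = compressed_bits / len(gop)
print(f"average {avg_per_frame / 1000:.0f} kbit/frame, "
      f"{avg_per_frame * 15 / 1e6:.2f} Mbit/s at 15 fps")   # ~41 kbit/frame, ~0.62 Mbit/s
```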
10. Video Compression: H.261
11. H.261 Overview
- ITU-T standard for the compression/decompression of digital video (1990)
- facilitates video conferencing and video telephony over ISDN at the rate of p x 64 kbps, p = 1, 2, ..., 30
- real-time encoding and decoding (delay ≤ 150 ms)
- low-cost VLSI implementation
12. Picture Preparation
- An image = 3 rectangular matrices (components)
  - luminance: Y
  - chrominance: Cb (blue), Cr (red)
  - 4:1:1 format
- Image formats
  - CIF (Common Intermediate Format): 352 x 288; used for video conferencing; 30 fps, progressive scanning
  - QCIF (Quarter CIF): 176 x 144; used for video telephony; 15 / 7.5 fps, progressive scanning
  - QCIF is mandatory; CIF is optional
- Bandwidth requirement of CIF at 15 fps (worked out in the sketch after this list)
  - Y: 352 x 288 x 8 bits/pixel x 15 frames/sec
  - Cb, Cr: 2 x ¼ x Y
  - ≈ 18.3 Mbps → needs more than 50:1 compression for transmission at 384 kbps (p = 6)
- I- and P-frames are used in H.261
  - 3 P-frames between each pair of I-frames
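The bandwidth figure above can be reproduced with a few lines of arithmetic; the sketch below simply multiplies out the numbers from this slide (exact rounding comes out slightly under the 18.3 Mbps quoted).

```python
# Uncompressed bandwidth of CIF at 15 fps, following the slide's breakdown
y_bps = 352 * 288 * 8 * 15        # luminance plane
c_bps = 2 * y_bps / 4             # Cb and Cr, each 1/4 the size of Y
total = y_bps + c_bps

print(f"{total / 1e6:.1f} Mbit/s uncompressed")                    # ~18.2 Mbit/s (slide rounds to 18.3)
print(f"{total / 384e3:.0f}:1 compression needed for 384 kbps")    # ~48:1, close to the slide's 'more than 50:1'
```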
13. H.261 Encoding Format
Frame format
GOB (group of blocks) structure
Macroblock format
14. H.261 Video Encoder
15. Entropy Encoding
- Run-length encoding
  - (run, amplitude) pairs (see the sketch after this list)
- Huffman encoding
  - the Huffman tables are predefined by the H.261 standard
  - a table for motion vectors
  - a table for quantized DCT coefficients
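A minimal Python sketch of the (run, amplitude) idea, applied to a zig-zag-ordered list of quantized DCT coefficients. It only shows how runs of zeros collapse into pairs; the actual H.261 code tables and symbol formats are not reproduced here.

```python
def run_amplitude_encode(coeffs):
    """Turn a zig-zag-ordered coefficient list into (run, amplitude) pairs,
    where `run` is the number of zeros preceding each non-zero value."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append('EOB')          # end-of-block marker for the trailing zeros
    return pairs

print(run_amplitude_encode([12, 0, 0, -3, 5, 0, 0, 0, 0, 2] + [0] * 54))
# [(0, 12), (2, -3), (0, 5), (4, 2), 'EOB']
```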
16. Video Compression: H.263
17. H.263
- Low bit-rate standard for teleconferencing applications
- Optimizes H.261 to operate below 64 kbps or over a V.34 modem
  - about 2.5 times more compression than H.261
- An extension of H.261
  - 2 image formats → 5 image formats
  - motion-compensated prediction has been refined
  - supports B frames (which use only P frames as references)
- Used in the IETF RTSP (Real Time Streaming Protocol)
- Used in RealPlayer G2
18. Picture Preparation
- Digitization formats
  - QCIF (Quarter CIF): 176 x 144; used for video telephony; 15 / 7.5 fps, progressive scanning
  - Sub-QCIF (S-QCIF): 128 x 96; 15 / 7.5 fps, progressive scanning
- Frame types
  - I, P, B frames
19. Picture Processing
- Unrestricted motion vectors (a clamping sketch follows this list)
  - for those pixels of a potential close-match MB that fall outside the frame boundary, the edge pixels themselves are used instead
  - if the resulting MB produces a close match, the motion vector is then allowed, where necessary, to point outside the frame area
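A small Python sketch of the edge-pixel substitution described above: coordinates that fall outside the frame are clamped to the nearest edge pixel when the predicted block is fetched. The function names and the simple min/max clamping are my own simplification, not the standard's exact procedure.

```python
import numpy as np

def sample_with_edge_extension(frame, y, x):
    """Read one pixel; out-of-frame coordinates are clamped so that the
    nearest edge pixel is used instead."""
    h, w = frame.shape
    return frame[min(max(y, 0), h - 1), min(max(x, 0), w - 1)]

def predict_block(ref, top, left, mv, block=2):
    """Build a predicted block even when the motion vector points partly
    outside the reference frame."""
    dy, dx = mv
    return np.array([[sample_with_edge_extension(ref, top + dy + i, left + dx + j)
                      for j in range(block)] for i in range(block)])

frame = np.arange(16).reshape(4, 4)                 # tiny 4x4 "frame"
print(sample_with_edge_extension(frame, -2, 1))     # 1: clamped to row 0
print(predict_block(frame, 0, 0, mv=(-1, -1)))      # corner pixel is replicated
```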
20. Error Resilience
- Target networks for H.263 are wireless networks or the PSTN → relatively high error rates
- Error propagation
  - due to the resulting errors in the motion estimation vectors and motion compensation information, errors within a GOB may propagate to other regions of the frame
- To minimize error propagation:
  - error tracking
  - independent segment decoding
  - reference picture selection
21. Error Tracking
- Error detection methods
  - out-of-range motion vectors
  - invalid variable-length codewords
  - out-of-range DCT coefficients
  - an excessive number of coefficients within an MB
22. Independent Segment Decoding
Effect of a GOB being corrupted
- Each GOB is treated as a separate sub-video that is independent of the other GOBs in the frame
- Motion estimation and compensation are limited to the boundary pixels of a GOB rather than of the whole frame
- Used with error tracking
23. Reference Picture Selection
NAK mode
ACK mode
24. MPEG Video Compression
25. MPEG
- MPEG (Moving Picture Experts Group)
  - ISO/IEC JTC1/SC29/WG11
  - standard for synchronized video and audio
  - consists of System, Video, and Audio parts; System handles multiplexing and synchronization
- MPEG-1
  - ISO Recommendation 11172
  - intended for the storage of VHS-quality audio-visual information on CD-ROM at bit rates up to 1.5 Mbps
  - video resolution: SIF (up to 352 x 288 pixels)
  - compressed bandwidth ≤ 1.5 Mbps: about 1.1 Mbps for video, 128 kbps for audio, the remainder for system data
  - allows random access, fast forward, and rewind
- MPEG-2
  - intended for the recording and transmission of studio-quality audio and video
- MPEG-4
  - initially concerned with a range of applications similar to H.263, at very low bit rates (4.8 to 64 kbps)
  - later: interactive multimedia applications over the Internet and various types of entertainment networks
- MPEG-7
  - describes the structure and features of the content of (compressed) multimedia information
26. MPEG-1
27. MPEG-1 Frames
- Spatial resolution: 352 x 288 pixels (SIF)
- Progressive scanning with a refresh rate of 30 Hz (for NTSC) or 25 Hz (for PAL)
- The standard allows the use of
  - I-frames only
  - I- and P-frames only
  - I-, P-, and B-frames
  - no D-frames are supported
- The I-frame is used for random-access functions
- Example sequences
  - IBBPBBPBBI for PAL
  - IBBPBBPBBPBBI for NTSC
28. Use of B Frames
29. Overview
- The compression algorithm is based on H.261
- MB
  - Y plane: 16 x 16; Cb, Cr planes: 8 x 8
- Differences from H.261
  - time-stamps (temporal references) enable the decoder to resynchronize more quickly after one or more corrupted or missing MBs
  - introduction of B-frames
  - the search window in the reference frame is increased
  - a finer resolution is used to improve the accuracy of the motion vectors
- Typical compression ratios
  - I-frame: 10:1
  - P-frame: 20:1
  - B-frame: 50:1
31. MPEG System
- The MPEG standard consists of
  - video coding
  - audio coding
  - system coding
- Timing and synchronization
  - Presentation Time Stamps (PTS)
  - Decoding Time Stamps (DTS)
  - System Clock Reference (SCR)
32. MPEG-1 Video Bitstream Structure
Composition
Format
- GOP layer: video coding unit
  - the first picture must be an I-frame, for editing
- Picture layer: primary coding unit
- Slice layer: resynchronization unit
- Macroblock layer: motion compensation unit
- Block layer: DCT unit
33. MPEG Frame Structure
MPEG-1
MPEG-2
34. Constrained Parameter Set
- horizontal size ≤ 720 pels
- vertical size ≤ 576 pels
- total number of macroblocks/picture ≤ 396
- total number of macroblocks/second ≤ 9900 (= 396 x 25 = 330 x 30)
- picture rate ≤ 30 fps
- bit rate ≤ 1.86 Mbps
- decoder buffer ≤ 376,832 bits
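For concreteness, a small Python check of a candidate stream against the limits listed above. The function and its argument names are my own; the two example calls correspond to an MPEG-1 SIF stream and a full TV-resolution stream.

```python
def is_constrained(width, height, fps, bit_rate_mbps, vbv_bits):
    """Return True if the stream parameters stay within the constrained
    parameter set listed on this slide."""
    mbs = (width // 16) * (height // 16)      # macroblocks per picture
    return (width <= 720 and height <= 576
            and mbs <= 396
            and mbs * fps <= 9900             # 396 x 25 = 330 x 30
            and fps <= 30
            and bit_rate_mbps <= 1.86
            and vbv_bits <= 376_832)

print(is_constrained(352, 288, 25, 1.5, 327_680))   # True: SIF video on CD-ROM
print(is_constrained(720, 576, 25, 4.0, 327_680))   # False: too many macroblocks, bit rate too high
```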
35. MPEG Encoding Scheme
36. MPEG Decoding Scheme
37. MPEG-2
38. MPEG-2 Video
- jointly developed by ISO/IEC (IS 13818-2) and ITU-T (H.262)
- permits data rates up to 100 Mbps
- supports interlaced video formats
- supports HDTV
- can be used for video over satellite, cable, and other broadband channels
- backward compatible with MPEG-1 and H.261
39. MPEG-1 and MPEG-2

Parameter                  | MPEG-1                          | MPEG-2
Standardized               | 1992                            | 1994
Main application           | Digital video on CD-ROM         | Digital TV (and HDTV)
Spatial resolution         | SIF format (1/4 TV), 360 x 288  | TV (4x that of MPEG-1), 720 x 576 (1440 x 1152)
Temporal resolution        | 25/30 frames/s                  | 50/60 fields/s (100/120 fields/s)
Bit rate                   | 1.5 Mbps                        | 4 Mbps (20 Mbps)
Quality                    | VHS                             | NTSC/PAL TV
Compression ratio over PCM | 20-30                           | 30-40
40. MPEG-2 Profiles and Levels
41. Main Profile at Main Level (MP@ML)
- Target application: digital TV broadcasting
- Interlaced scanning: 2 fields
  - field mode: suitable for live sports
  - frame mode: suitable for studio-based programs
42. HDTV
- 3 standards
  - ATV (Advanced Television) in North America
  - DVB (Digital Video Broadcast) in Europe
  - MUSE (Multiple Sub-Nyquist Sampling Encoding) in Japan and the rest of Asia
- ITU-R HDTV specification
  - 16/9 aspect ratio
  - 1920 samples/line, 1152 (1080 visible) lines/frame
  - interlaced scanning with 4:2:0 format
- ATV standard: Grand Alliance standard
  - ITU-R spec: 1280 x 720, 16/9 aspect ratio
  - video compression: MP@HL
  - audio compression: Dolby AC-3
- DVB standard
  - 4/3 aspect ratio, 1440 x 1152 (1080 visible)
  - video compression: SSP@H-1440 (spatially scalable profile)
- MUSE standard
  - 16/9 aspect ratio, 1920 x 1034
  - video compression: similar to MP@HL
43. MPEG-4
44. Goals of MPEG-4 (1)
- The initial goal was to refine H.261 with a compression ratio 10 times better, but this failed
- Consequently, the focus shifted to developing a standard for
  - flexible bitstreams that are scalable for receivers with different capabilities, such as resolution
  - extendable configurations, so transmitters can download new applications and algorithms into receivers
  - content-based interactivity for multimedia data access, manipulation and bitstream editing, and hybrid natural and synthetic data
  - network independence, so that it can be used with any communication network to provide universal accessibility
45. Goals of MPEG-4 (2)
- MPEG-4 standardizes
  - multimedia content generation
  - a network interface for multimedia transport
  - interactivity for users
- Content-based interactivity
  - defined by the SNHC (Synthetic and Natural Hybrid Coding) group
  - coding of a synthetic human face and body
  - animation of the face and body
  - media integration of text and graphics
  - texture coding for view-dependent applications
  - static and dynamic mesh coding with texture mapping
  - an interface for text-to-speech synthesis and synthetic audio
46. AVO: Audio/Visual Object
- Primitive AVOs
  - a 2D fixed background
  - the picture of a walking and talking lady without the background
  - the voice associated with that person
- Compound AVO
  - e.g., an AVO that contains both the audio and visual components of a talking and walking person
- MPEG-4 treats audiovisual activities and the associated operations, including compression, decompression, multiplexing and synchronization, as objects, much as in OOP
  - viewed as the configuration, communication, and instantiation of classes of objects
- VOP (Video Object Plane)
  - a video object at a given point in time
  - the video encoder encodes each VOP separately
47. Content-based Video Coding
48. User Interaction
- The user interacts with the decoded scene following the design of the scene's author
  - changing the viewing/listening point of the scene by navigating through it
  - dragging objects to different positions
  - triggering a sequence of events by clicking on a specific object, including starting and stopping a video stream
  - selecting the desired language when multiple language tracks are available
49. Scalability and Accessibility
- MPEG-4 video object coding supports spatial and temporal scalability
  - this allows the receiver to decode only part of a bitstream and reconstruct images or image sequences
  - good for video delivery over multimedia networks with limited bandwidth
  - good for displays of limited resolution, owing to the receiver's capability
- Universal accessibility to support various communication media
  - MPEG-4 provides error robustness and resilience for noisy environments such as mobile networks
  - supports audio and video compression algorithms in error-prone environments at low bit rates (< 64 kbps)
50. Audio Compression
- Audio is compressed using one of several algorithms, depending on the available bit rate of the transmission channel and the sound quality required, e.g.
  - G.723.1 (CELP) for interactive multimedia applications over the Internet
  - Dolby AC-3 or MPEG Layer 2 for interactive TV applications over entertainment networks
51. MPEG-4 Encoder/Decoder
VOP encoder
MPEG-4 decoder
52. Error Resilience Techniques
- Use of fixed-length video packets (VP, 188 bytes) instead of GOBs
- A new variable-length coding (VLC) scheme based on reversible VLCs
Conventional GOB approach
Using fixed-length VPs
53. Applications of MPEG-4
- Real-time communication systems
- Mobile computing
- Content-based storage and retrieval
- Streaming video on the Internet
- Collaborative scene visualization
- High-quality broadcasting
- Studio and TV post-production
- Interactive movies, travel guides, computer-based teaching, karaoke
54. MPEG-7: Multimedia Content Description Interface
55. Overview
- Description, identification and access of AV information
- Used to perform searches for AV information
  - e.g., searching for pictures by characteristics such as the color, texture or shape of objects
- An MPEG-7 description can be attached to any kind of multimedia material, independent of the format of the representation
- Visual descriptions based on
  - color, texture, sketch, 2D and 3D shape, still images, 3D visual data, spatial composition relations, temporal composition information
- Audio descriptions based on
  - frequency contour, frequency profile, prototypical sound, source of sound, stereo, 5.1-channel or binaural sounds
56. MPEG-7 Applications
- Medical diagnosis
- Home shopping
- Searching video and audio databases
- Architecture, interior design
- Multimedia directory services
57. MHEG
58. Overview
- Standardized by ISO/IEC JTC1/SC29 WG12
- Describes how video is displayed, how audio is replayed, and the means by which a user can interact with the ongoing presentation
- Also addresses multi-platform issues
- Uses ASN.1 to represent data structures
- More functionality than HTML
  - multimedia handling capabilities such as stream synchronization, replay speed control, and user interactivity with stream events
- Uses 3 spatial coordinates and time to synchronize the presentation
59. MHEG Applications
- Video on demand
- Interactive multimedia service
- Interactive TV