Title: Introduction to MPEG-4
1Introduction to MPEG-4
2Outline
- Multimedia
- MPEG-4 Profiles
- Key Features of MPEG-4 Systems
- MPEG-4
- Systems
- DMIF
- Audiovisual Objects and Scene Graph
- Editing, Composition and Rendering
- Coding Basics
- Coding Techniques
3Multimedia
- What is multimedia?
- Combination of audio, video, image, graphics, and text
- Coverage of all human I/Os
- Why does multimedia need to be coded?
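A quick calculation shows why raw multimedia must be coded. The sketch below assumes an example format (720 × 480 pixels, 30 frames/s, 24 bits per pixel, no chroma subsampling) that is not taken from the slides. It compares the result with the 64 kbps upper end of the mobile range quoted under Compression.

```python
def raw_video_bitrate(width: int, height: int, fps: float, bits_per_pixel: int) -> float:
    """Bits per second of uncompressed video."""
    return width * height * fps * bits_per_pixel


# Example format chosen for illustration (not taken from the slides).
raw = raw_video_bitrate(width=720, height=480, fps=30, bits_per_pixel=24)
mobile_rate = 64_000  # upper end of the 5-64 kbps mobile range, in bit/s

print(f"raw video: {raw / 1e6:.1f} Mbit/s")              # raw video: 248.8 Mbit/s
print(f"needed compression: {raw / mobile_rate:.0f}:1")  # needed compression: 3888:1
```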
5Multimedia Coding for Different Applications
- Mobile devices
- Low data rate, error resilience, scalability
- Streaming service
- Scalability, low to medium data rate, interactivity
- On-disk distribution (DVD)
- Interactivity
- Broadcast
- On-demand services
6Profiles in MPEG-4
- Visual Profiles
- Audio Profiles
- Graphics Profiles
- Scene Graph Profiles
- MPEG-J Profiles
- Object Descriptor Profile
7NewPred
8H.263 Baseline
9Key Features of MPEG-4 Systems
- Provides a consistent and complete architecture for the coded representation of the desired combination of streamed elementary audio-visual information.
- Covers a broad range of applications, functionalities and bit rates.
- Through profile and level definitions, establishes a framework that allows consistent progression from simple applications (e.g., an audio broadcast application with graphics) to more complex ones (e.g., a virtual reality home theater).
10Key Features of MPEG-4 Systems (2)
- A set of tools for the representation of multimedia content:
- a framework for object description (the OD framework),
- BIFS (Binary Format for Scenes), a binary language for the representation (format) of interactive 2D and 3D multimedia scene descriptions,
- SDM (Systems Decoder Model) and SyncLayer, a framework for monitoring and synchronizing elementary data streams, and
- MPEG-J, programmable extensions to access and monitor MPEG-4 content.
11Key Features of MPEG-4 Systems (3)
- MPEG-4 Systems defines an efficient mapping of MPEG-4 content onto existing delivery infrastructures:
- FlexMux, an efficient and simple multiplexing tool to optimize the carriage of MPEG-4 data (into different QoS channels),
- extensions allowing the carriage of MPEG-4 content on MPEG-2 and IP systems, and
- a flexible file format for authoring, streaming and exchanging MPEG-4 data.
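To make the multiplexing idea concrete, here is a minimal sketch that interleaves packets from several elementary streams into one byte stream and splits them back out. The packet layout is an assumption loosely modeled on FlexMux simple mode: a 1-byte channel index and a 1-byte length before each payload. `mux` and `demux` are hypothetical names. Real FlexMux also carries Sync Layer packets and has a MuxCode mode, neither of which is shown.

```python
from collections import defaultdict

# Hypothetical packet layout, loosely modeled on FlexMux "simple mode":
# [1-byte channel index][1-byte payload length][payload bytes]


def mux(packets: list[tuple[int, bytes]]) -> bytes:
    """Interleave (channel, payload) packets into one byte stream."""
    out = bytearray()
    for channel, payload in packets:
        if len(payload) > 255:
            raise ValueError("payload too long for a 1-byte length field")
        out += bytes([channel, len(payload)])  # bytes() rejects channel > 255
        out += payload
    return bytes(out)


def demux(stream: bytes) -> dict[int, list[bytes]]:
    """Split the stream back into per-channel packet lists."""
    channels: dict[int, list[bytes]] = defaultdict(list)
    pos = 0
    while pos < len(stream):
        channel, length = stream[pos], stream[pos + 1]
        channels[channel].append(stream[pos + 2 : pos + 2 + length])
        pos += 2 + length
    return dict(channels)


stream = mux([(0, b"audio-1"), (3, b"video-1"), (0, b"audio-2")])
print(demux(stream))  # {0: [b'audio-1', b'audio-2'], 3: [b'video-1']}
```

Each channel could then be mapped to a transport channel with its own QoS, which is the point of grouping streams this way.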
12MPEG-4 (ISO/IEC 14496) Terminal Architecture
13Systems
- Timing Model
- Buffer Model
- Multiplexing of Streams
- Synchronization of Streams
- The Compression Layer
- Object Description Framework
- Scene Description Streams
- Audio-visual Streams
- Upchannel Streams
14Systems Decoder Model
16ISO/IEC 14496 Terminal Architecture
17Network-based Multimedia System
18The Objectives of DMIF
Delivery Multimedia Integration Framework
- to hide the delivery technology details from the DMIF User
- to manage real-time, QoS-sensitive channels
- to allow service providers to log resources per session for usage accounting
- to ensure interoperability between end systems
20DMIF Communication Architecture
signaling
21High View of a Service Activation
22Audiovisual Objects
- An audiovisual scene is composed of objects
- Different objects are mixed on the screen
- Visual
- Video
- Animated face and body
- 2D and 3D animated meshes
- Text and Graphics
- Audio
- General audio: mono, stereo, and multichannel
- Speech
- Synthetic sounds (Structured audio)
- Environmental spatialization
23Example of MPEG-4 Video Objects
From Olivier Avaro
25The Scene Graph
26
- Composition
- Description
- Synchronization
- Delivery of streaming data
- Interaction with media objects
- Management and identification of intellectual
property
27Major Components
28Composition Rendering
Media Objects
29Adding or Removing Objects (1)
30Adding or Removing Objects (2)
From Igor S. Pandžic
31Adding or Removing Objects (3)
- Applications
- Video conferencing
- Real-time, automatic
- Separate foreground (communication partner) from background
- Object tracking in video
- May allow off-line and semi-automatic processing
- Separate moving objects from others
32MPEG-4 Coding Basics
33Toolbox Approach
(Figure: tools for synthetic scenes and tools for natural scenes; layers labeled TOOLS, ALGORITHMS, PROFILES)
34Coding Techniques
- Video objects
- Shape
- Motion vectors
- Texture
- Audio objects
- MPEG
- AAC (Advanced Audio Coding)
- TTS (Text-To-Speech)
- Face and Body
- Animation parameters
- 2D Mesh
- Triangular patches
- Motion vector
35Content-based Audio-Visual Representation
- Audio-Visual Object (AVO)
- Video object component (video object plane, VOP)
- natural or synthetic
- 2D or 3D
- Audio object component
- mono, stereo or multi-channel
36Video Object Planes (VOP)
- Characteristics of VOP
- may have different spatial and temporal resolutions
- may be associated with different degrees of accessibility (sub-VOPs)
- may be separated or overlapping
- VOP type
- Traditional I, P and B types
- S-VOP (sprite) for background
37Video Object Plane Type
(Figure: I-, P-, B- and S-VOPs along the time axis)
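Because a B-VOP is predicted from the anchor VOPs on both sides of it, VOPs are not transmitted in display order. A minimal sketch of the reordering follows. It assumes, for illustration, that every non-B VOP (including S-VOPs) acts as an anchor. `coding_order` is a hypothetical helper.

```python
def coding_order(display: list[str]) -> list[str]:
    """Reorder VOPs from display order to coding/transmission order.

    A B-VOP is predicted from the anchor VOPs on both sides of it, so the
    next anchor must be sent before the B-VOPs that precede it in time.
    This sketch treats every non-B VOP (I, P, S) as an anchor.
    """
    out: list[str] = []
    waiting_b: list[str] = []
    for vop in display:
        if vop.startswith("B"):
            waiting_b.append(vop)      # hold until its future anchor is sent
        else:
            out.append(vop)            # send the anchor first ...
            out.extend(waiting_b)      # ... then the B-VOPs that needed it
            waiting_b.clear()
    out.extend(waiting_b)  # trailing B-VOPs without a later anchor (a real encoder avoids this)
    return out


print(coding_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```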
38Content-based Object Manipulation
- Object manipulation
- change of the spatial position of a VOP
- application of a spatial scaling factor to a VOP
- change of the speed with which a VOP moves
- insertion of new VOPs
- deletion of an object in the scene
- change of the scene area
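Each VOP carries its own shape (alpha) along with its texture. Most of the manipulations listed above therefore reduce to redrawing objects at new positions. The sketch below assumes a binary alpha mask and integer offsets. `composite` is a hypothetical helper. Scaling or speed changes would change how the texture or the per-frame `dx`, `dy` are chosen.

```python
Image = list[list[int]]


def composite(background: Image, texture: Image, alpha: Image, dx: int, dy: int) -> Image:
    """Draw one VOP (texture + binary alpha mask) onto a copy of the background.

    (dx, dy) is the VOP's new position; pixels outside the frame are clipped.
    """
    out = [row[:] for row in background]
    for y, (tex_row, alpha_row) in enumerate(zip(texture, alpha)):
        for x, (value, inside) in enumerate(zip(tex_row, alpha_row)):
            ty, tx = y + dy, x + dx
            if inside and 0 <= ty < len(out) and 0 <= tx < len(out[0]):
                out[ty][tx] = value  # only pixels inside the object's shape are drawn
    return out


background = [[0] * 4 for _ in range(3)]
vop_texture = [[9, 9], [9, 9]]
vop_alpha = [[1, 0], [1, 1]]
for row in composite(background, vop_texture, vop_alpha, dx=1, dy=1):
    print(row)
# [0, 0, 0, 0]
# [0, 9, 0, 0]
# [0, 9, 9, 0]
```

Drawing with an all-zero mask leaves the background unchanged, which is the simplest view of deleting an object from the scene.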
39Segmentation Process
- Depending on the application, segmentation can be performed
- Online (real-time) or offline (non-real-time)
- Automatic or semi-automatic
- Examples
- Video conferencing
- real-time, automatic
- separate foreground (communication partner) from background
- Object tracking in video
- may allow off-line and semi-automatic processing
- separate moving objects from others
40Compression
- Improved coding efficiency
- 5-64 kbps for mobile applications
- up to 20 Mbps for TV/film applications
- subjectively better quality compared to existing standards
- Coding of multiple concurrent data streams
- can code multiple views of a scene efficiently, e.g., stereo video
41Coding VO in MPEG-4
- Reduce temporal redundancy
- Motion estimation for arbitrary shaped VOPs
- padding and modified block (polygon) matching
motion estimation
(Figure: I-VOP, P-VOP and B-VOP along the time axis)
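The core of modified block (polygon) matching is that the matching error is accumulated only over pixels inside the object's shape. The sketch below assumes a binary alpha mask, a tiny block size and search range, and full search. It does not pad the reference VOP. Instead it clamps reference coordinates at the frame edge, which also mimics the unrestricted motion vectors listed under New Coding Features. All function names are hypothetical.

```python
Image = list[list[int]]


def masked_sad(cur: Image, ref: Image, alpha: Image,
               bx: int, by: int, mv: tuple[int, int], size: int) -> int:
    """Sum of absolute differences over the opaque pixels of one block only."""
    mvx, mvy = mv
    h, w = len(ref), len(ref[0])
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            if not alpha[y][x]:
                continue  # transparent pixels do not count toward the match
            # Clamp so a vector may point past the reference edge
            ry = min(max(y + mvy, 0), h - 1)
            rx = min(max(x + mvx, 0), w - 1)
            total += abs(cur[y][x] - ref[ry][rx])
    return total


def estimate_motion(cur: Image, ref: Image, alpha: Image,
                    bx: int, by: int, size: int = 2, radius: int = 2) -> tuple[int, int]:
    """Full search over [-radius, radius]^2; returns the vector with the lowest masked SAD."""
    candidates = [(mvx, mvy) for mvy in range(-radius, radius + 1)
                  for mvx in range(-radius, radius + 1)]
    return min(candidates, key=lambda mv: masked_sad(cur, ref, alpha, bx, by, mv, size))


ref = [[0, 0, 0, 0], [0, 10, 20, 0], [0, 30, 40, 0], [0, 0, 0, 0]]
cur = [[0, 0, 0, 0], [0, 0, 10, 20], [0, 0, 30, 40], [0, 0, 0, 0]]  # object moved 1 pixel right
alpha = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
print(estimate_motion(cur, ref, alpha, bx=2, by=1))  # (-1, 0)
```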
42Encoding of Visual Objects
- Binary alpha block
- Motion vector
- Context-based arithmetic encoding
- Texture
- Motion vector
- DCT
43New Coding Features
- For each macroblock, the motion vectors can be computed on a 16 × 16 or 8 × 8 block basis
- Unrestricted motion estimation: prediction can extend over the image boundary
- Overlapped block motion compensation
- Each component of texture can range from 1 to 12 bits
- More robust coding
44Robust Video Coding
- Resynchronization
- Allows insertion of resync markers within each VOP
- Video packet header includes macroblock number, quantizer value and timing information
- Data partitioning
- Allows shape, motion and texture data to be separated within a packet
- Reversible VLC
- Offers partial recovery from errors
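Reversible VLCs can be decoded from both ends of a packet. After an error, the decoder can therefore work forward from the start and backward from the end (for example, from the next resync marker), losing only the damaged middle. The four-symbol codebook below is a hypothetical example chosen to be both prefix-free and suffix-free. It is not the MPEG-4 RVLC table. Erased bits are marked with `?` for simplicity.

```python
# Hypothetical reversible VLC table (not the MPEG-4 RVLC table): each codeword is
# a palindrome and the set is prefix-free and suffix-free, so a bitstream can be
# decoded from either end.
CODEBOOK = {"a": "0", "b": "11", "c": "101", "d": "1001"}


def encode(symbols: str) -> str:
    return "".join(CODEBOOK[s] for s in symbols)


def _decode(bits: str, table: dict[str, str]) -> list[str]:
    """Greedy decoding that stops at the first bit pattern no codeword can start with."""
    out: list[str] = []
    buf = ""
    for bit in bits:
        buf += bit
        if buf in table:
            out.append(table[buf])
            buf = ""
        elif not any(code.startswith(buf) for code in table):
            break  # error detected: corrupted or missing bits
    return out


def recover(bits: str) -> tuple[list[str], list[str]]:
    """Decode forward from the packet start and backward from its end."""
    forward = {code: sym for sym, code in CODEBOOK.items()}
    backward = {code[::-1]: sym for sym, code in CODEBOOK.items()}
    head = _decode(bits, forward)
    tail = _decode(bits[::-1], backward)[::-1]  # restore original symbol order
    return head, tail


bits = encode("abcdabcd")                # 20 bits
damaged = bits[:8] + "????" + bits[12:]  # bits 8-11 lost ("?" marks unusable bits)
print(recover(damaged))  # (['a', 'b', 'c'], ['c', 'd'])
```

Forward decoding recovers `a`, `b`, `c` before the erased bits, and backward decoding recovers the final `c`, `d`. Only the symbols overlapping the damage are lost.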
45Sprite VOP
- Represents the background image
- Can be used for very efficient coding of scenes involving camera pan and zoom
- Much larger than a single frame and thus requires more memory
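Once the sprite has been transmitted, each background frame can be generated from it instead of being coded again. The sketch below assumes a translation-plus-uniform-scale camera model with nearest-neighbour sampling. MPEG-4 sprites support more general warping. `view_from_sprite` is a hypothetical helper.

```python
Image = list[list[int]]


def view_from_sprite(sprite: Image, x0: float, y0: float,
                     width: int, height: int, scale: float = 1.0) -> Image:
    """Sample a frame-sized view of a large sprite.

    (x0, y0) is the view's top-left corner in sprite coordinates (camera pan);
    scale > 1 covers a larger sprite area (zoom out), scale < 1 zooms in.
    Nearest-neighbour sampling; the caller keeps the view inside the sprite.
    """
    return [[sprite[int(y0 + y * scale)][int(x0 + x * scale)] for x in range(width)]
            for y in range(height)]


sprite = [[10 * y + x for x in range(6)] for y in range(3)]  # 6x3 "background panorama"
print(view_from_sprite(sprite, 3, 1, width=3, height=2))             # [[13, 14, 15], [23, 24, 25]]
print(view_from_sprite(sprite, 0, 0, width=3, height=2, scale=2.0))  # [[0, 2, 4], [20, 22, 24]]
```

The sprite here is twice as wide as the view, which reflects the memory cost noted above: the decoder must keep the whole panorama, not just one frame.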
46Example of Sprite VOP
47Object Mesh
- Useful for animation, content manipulation, content overlay, merging natural and synthetic video, and others
- Tessellate with triangular patches
- Define a motion vector for each node
- 2D motion of video objects is represented by the motion vectors of the node points
- Motion compensation is achieved by warping the texture map of each patch with an affine transform
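The affine warping of a patch follows directly from its three node motion vectors: a point keeps its barycentric coordinates inside the triangle while the nodes move. The sketch below applies that mapping to a single point. A real decoder would map every texture pixel of the patch, usually backward from target to reference. The function names are hypothetical.

```python
Point = tuple[float, float]


def barycentric(p: Point, a: Point, b: Point, c: Point) -> tuple[float, float, float]:
    """Barycentric coordinates of p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    l1 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    l2 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return l1, l2, 1.0 - l1 - l2


def warp_point(p: Point, triangle: list[Point], node_mvs: list[Point]) -> Point:
    """Where p moves when the triangle's three nodes move by node_mvs.

    Keeping p's barycentric coordinates fixed is exactly the affine map
    defined by the three node correspondences.
    """
    moved = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(triangle, node_mvs)]
    weights = barycentric(p, *triangle)
    return (sum(w * q[0] for w, q in zip(weights, moved)),
            sum(w * q[1] for w, q in zip(weights, moved)))


patch = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
print(warp_point((1.0, 1.0), patch, [(1.0, 2.0)] * 3))                      # (2.0, 3.0)
print(warp_point((1.0, 1.0), patch, [(0.0, 0.0), (4.0, 0.0), (0.0, 0.0)]))  # (2.0, 1.0)
```

Equal node vectors give a pure translation. Moving only one node stretches the patch, which a single block motion vector cannot express.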
48Example of Object Mesh
49Face Animation
- Face model
- Default face model
- Download from the encoder
- Low-level facial animation
- A set of 66 facial animation parameters
- High-level facial animation
- A set of primary facial expressions, like joy, sadness, surprise and disgust
- Speech animation
- 14 visemes for mouth shape
- Text-to-speech synthesizer
50Facial Animation
From Eine Übersicht
51Still Texture Coding
- Discrete Wavelet Transform (DWT)
- Spatial and quality scalability
- Uses a 2D Daubechies (9, 3)-tap biorthogonal filter
- Lowest band is losslessly coded by arithmetic coding
- Higher bands are coded by multilevel quantization, zero-tree scanning and arithmetic coding
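The sketch below shows what a one-level 2D wavelet decomposition produces. It uses the Haar filter as a simple stand-in for the Daubechies (9, 3) biorthogonal filter named above. It illustrates the band structure (one low band plus three detail bands), not the MPEG-4 filter itself. The band naming convention and helper names are assumptions.

```python
Image = list[list[float]]


def _split(seq: list[float]) -> tuple[list[float], list[float]]:
    """One Haar step on an even-length sequence: pairwise averages and half-differences."""
    low = [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    high = [(seq[i] - seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    return low, high


def _transpose(m: Image) -> Image:
    return [list(col) for col in zip(*m)]


def haar_2d(img: Image) -> dict[str, Image]:
    """One decomposition level into four bands (first letter: horizontal filter,
    second letter: vertical filter). Image dimensions must be even."""
    rows = [_split(row) for row in img]  # filter along each row
    bands: dict[str, Image] = {}
    for h_name, half in (("L", [r[0] for r in rows]), ("H", [r[1] for r in rows])):
        cols = [_split(col) for col in _transpose(half)]  # filter along each column
        bands[h_name + "L"] = _transpose([c[0] for c in cols])
        bands[h_name + "H"] = _transpose([c[1] for c in cols])
    return bands


bands = haar_2d([[4, 2], [2, 0]])
print(bands)  # {'LL': [[2.0]], 'LH': [[1.0]], 'HL': [[1.0]], 'HH': [[0.0]]}
```

For smooth content, the energy concentrates in the LL band and the detail bands are near zero. This is what the later quantization and zero-tree scanning exploit.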
52Audio Coding
- Different bit rates, different types of source material and different algorithms
- Combination of parameter-based coding, LPC-based coding, and time/frequency-based coding
- High-quality speech at 2 kbps with Harmonic Vector eXcitation Coding (HVXC)
- Text-to-Speech (TTS)
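LPC-based coders model each speech frame with a linear predictor. The sketch below is a textbook example that computes predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion. It is generic LPC analysis, not the specific MPEG-4 CELP or HVXC procedure. The function names are hypothetical.

```python
def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    """Predictor a[0..order-1] such that x[n] ~ sum_k a[k-1] * x[n-k], plus residual energy."""
    a: list[float] = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err                       # reflection coefficient
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k                  # prediction error shrinks at each order
    return a, err


coeffs, residual_energy = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, residual_energy)  # [0.5, 0.0] 0.75
```

Only the few predictor coefficients and a compact description of the residual need to be sent, which is why LPC suits low-bit-rate speech.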
53Natural Audio Coder
Telephone
From Olivier Dechazal
54Multiview Video
55Stereo Sequence Coding
- Multiview profile of MPEG-2
- The left-view sequence Sl is coded first. Each frame of the right-view sequence is then predicted from the corresponding frame in Sl using an estimated disparity field, and the disparity field and the prediction error image are coded.
(Figure: right view coded as P B B B; left view coded as I B B P)
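Disparity-compensated prediction works like motion compensation but searches horizontally between views. The one-row sketch below assumes rectified views, in which content in the right view appears shifted by a non-negative amount `d` in the left view. `disparity_predict` is a hypothetical helper, and a real coder would use 2D blocks.

```python
def disparity_predict(left: list[int], right: list[int], x: int, width: int,
                      max_d: int) -> tuple[int, list[int]]:
    """Predict right[x:x+width] from the left view with a horizontal shift.

    Returns the disparity d with the lowest SAD (searched over 0..max_d, window
    kept inside the row) and the prediction error that would then be coded.
    """
    block = right[x:x + width]

    def sad(d: int) -> int:
        return sum(abs(r - l) for r, l in zip(block, left[x + d:x + d + width]))

    candidates = [d for d in range(max_d + 1) if x + d + width <= len(left)]
    d = min(candidates, key=sad)
    residual = [r - l for r, l in zip(block, left[x + d:x + d + width])]
    return d, residual


left_row = [0, 0, 5, 9, 7, 0, 0, 0]
right_row = [5, 9, 7, 0, 0, 0, 0, 0]  # same content, shifted 2 pixels
print(disparity_predict(left_row, right_row, x=0, width=3, max_d=3))  # (2, [0, 0, 0])
```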
56Intermediate View Synthesis
57Original left
Original right
Regular mesh on the left image
Corresponding mesh on the right image
Predicted right image by mesh (27.48 dB)
Predicted right image by block matching, BMA (32.03 dB)
The mesh-based scheme yields a visually more accurate prediction, even though its dB figure is lower
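The dB figures above are presumably PSNR values; the slide does not say so explicitly. The sketch below computes PSNR for 8-bit images. PSNR depends only on mean squared error, which is why a scheme with a lower PSNR can still look more accurate, as the slide observes. `psnr` is a hypothetical helper name.

```python
import math


def psnr(original: list[list[int]], predicted: list[list[int]], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    errors = [(o - p) ** 2
              for row_o, row_p in zip(original, predicted)
              for o, p in zip(row_o, row_p)]
    mse = sum(errors) / len(errors)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


print(round(psnr([[0, 0], [0, 0]], [[10, 10], [10, 10]]), 2))  # 28.13
```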
58MPEG-4 Coding Techniques
- Shape Coding
- Shape-adaptive DCT
- Object-based Inter-frame Coding
- Overlapped Motion Estimation
- Bit-plane Coding and FGS
59Object-Based Coding
60Shape Coding
- Bitmap Coding
- Context-Based Arithmetic Encoding (CAE)
- Contour Coding
- Chain Coding
- Baseline Shape Coding
- Polygon Approximation
- Skeleton-Based Shape Coding
- Quadtree Coding
61Context-Based Arithmetic Encoding
(Figure: a 16 × 16 block grid over the shape's bounding box, with transparent, opaque and boundary blocks; conditional entropy coding)
62Context-Based Arithmetic Encoding
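The sketch below is a simplified version of the block classification pictured above. The shape's bounding box is divided into blocks, and only blocks that contain both transparent and opaque pixels need context-based arithmetic coding. The function name and the treatment of pixels beyond the mask edge are assumptions. MPEG-4 uses 16 × 16 blocks; the example uses 2 × 2 blocks only to stay readable.

```python
Mask = list[list[int]]


def classify_blocks(alpha: Mask, block: int = 16) -> dict[tuple[int, int], str]:
    """Label each block of the shape's bounding box as transparent, opaque or boundary.

    Keys are the (x, y) top-left corners of the blocks; pixels beyond the mask
    edge are treated as transparent.
    """
    rows = [y for y, row in enumerate(alpha) if any(row)]
    cols = [x for x in range(len(alpha[0])) if any(row[x] for row in alpha)]
    if not rows:
        return {}  # empty shape: nothing to code
    x0, x1, y0, y1 = min(cols), max(cols) + 1, min(rows), max(rows) + 1

    def pixel(x: int, y: int) -> int:
        return alpha[y][x] if y < len(alpha) and x < len(alpha[0]) else 0

    labels = {}
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            values = [pixel(x, y) for y in range(by, by + block) for x in range(bx, bx + block)]
            if not any(values):
                labels[(bx, by)] = "transparent"
            elif all(values):
                labels[(bx, by)] = "opaque"
            else:
                labels[(bx, by)] = "boundary"  # only these blocks need CAE
    return labels


shape = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
labels = classify_blocks(shape, block=2)
print(labels[(0, 0)], labels[(2, 0)], labels[(4, 2)])  # opaque transparent boundary
```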
63Chain Coding
(Figure: 4-connected and 8-connected direction codes; a contour traced from its starting point with the chain code 0 0 3 0 0 3 3 3 2 3 3 2 2 2 1 2 2 1 1 1 0 0 1 1)
64Chain Coding
(Figure: starting point; 4-connected and 8-connected directions)
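The sketch below shows 4-connected chain coding. It assumes the direction numbering 0 = east, 1 = north, 2 = west, 3 = south, because the slide's own numbering cannot be recovered from the text. Each link then costs 2 bits (3 bits with 8-connected codes). The example uses the 24 direction codes that appear in the slide's figure, read as a 4-connected code; with this numbering they trace a closed contour. Function names are hypothetical.

```python
# Assumed direction numbering for 4-connected codes (y grows downward):
# 0 = east, 1 = north, 2 = west, 3 = south.
STEPS = {0: (1, 0), 1: (0, -1), 2: (-1, 0), 3: (0, 1)}


def chain_decode(start: tuple[int, int], codes: list[int]) -> list[tuple[int, int]]:
    """Rebuild the boundary points from a starting point and its chain code."""
    points = [start]
    for code in codes:
        dx, dy = STEPS[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points


def chain_encode(points: list[tuple[int, int]]) -> list[int]:
    """Chain code of a path whose consecutive points are 4-neighbours."""
    code_of = {step: code for code, step in STEPS.items()}
    return [code_of[(x1 - x0, y1 - y0)] for (x0, y0), (x1, y1) in zip(points, points[1:])]


slide_codes = [0, 0, 3, 0, 0, 3, 3, 3, 2, 3, 3, 2, 2, 2, 1, 2, 2, 1, 1, 1, 0, 0, 1, 1]
boundary = chain_decode((0, 0), slide_codes)
print(boundary[-1] == boundary[0])            # True: the contour closes
print(chain_encode(boundary) == slide_codes)  # True
```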
65Differential Chain Code
- DCC records the move (forward, leftward or rightward) between two consecutive directional links.
(Figure: example DCC moves along a contour: F F F R L F R L)
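DCC can be derived directly from an ordinary chain code by looking at the change in direction between consecutive links. The sketch below handles 4-connected codes. It assumes counter-clockwise direction numbering (0 = east, 1 = north, 2 = west, 3 = south) and adds a reversal symbol that the slide does not list. `differential_chain_code` is a hypothetical helper.

```python
# Turn between consecutive 4-connected links, assuming directions are numbered
# counter-clockwise (0 = east, 1 = north, 2 = west, 3 = south).
# "B" (a reversal) is an extra symbol assumed here; the slide lists only F, L and R.
TURNS = {0: "F", 1: "L", 3: "R", 2: "B"}


def differential_chain_code(codes: list[int]) -> list[str]:
    """Map each pair of consecutive chain-code links to a forward/left/right move."""
    return [TURNS[(cur - prev) % 4] for prev, cur in zip(codes, codes[1:])]


print(differential_chain_code([0, 0, 3, 0, 0]))  # ['F', 'R', 'L', 'F']
```

On smooth contours, forward moves usually dominate, so the DCC symbols tend to have skewed statistics that an entropy coder can exploit.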
66Baseline Shape Coding
67Polygon Approximation
(Figure labels: d1, d2, d3)
- Select vertices that are optimal in the rate-distortion sense.
- Splines are adopted to approximate the contour.
68Skeleton-Based Shape Coding
69Quadtree Coding
70Shape-adaptive DCT
71Inter-frame Coding Reconstruction of Object
Shape
MVDS = MVS - MVPS
- MVS: MV for shape
- MVPS: MV prediction for shape
- MVDS: MV difference for shape (BAC)
72The context for Inter-frame Coding
73Overlapped Motion Estimation
74Weighting Coefficients in Overlapped Motion
Estimation
75Fine Granularity Scalability
76FGS Video Encoder Structure
77Bit-plane Coding
quantized residual
5 7 8 7 6 2 0 4 3 8 1 2 3 0 3 5
binary transfer (bit planes, MSB to LSB)
MSB 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
    1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 1
    0 1 0 1 1 1 0 0 1 0 0 1 1 0 1 0
LSB 1 1 0 1 0 0 0 0 1 0 1 0 1 0 1 1
reordering
0010000001000000110110010000000101011100100110101101000010101011
run-length coding → Enhancement layer bitstream
truncated bit planes
MSB 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
    1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 1
    0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0
LSB 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
reconstructed residual
4 6 8 6 6 2 0 4 2 8 0 2 0 0 0 4
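The bit-plane steps on this slide can be reproduced directly. The sketch splits the 16 residuals into four bit planes, concatenates them MSB plane first (the "reordering" step), and forms run-length symbols for one plane. The (RUN, EOP) symbol form follows the general style of FGS bit-plane coding, but the helper names are hypothetical and the actual variable-length code tables are omitted. Truncating the stream after 44 bits reproduces the second residual row on the slide.

```python
def bit_planes(values: list[int], n_planes: int) -> list[list[int]]:
    """Split non-negative integers into bit planes, most significant plane first."""
    return [[(v >> b) & 1 for v in values] for b in range(n_planes - 1, -1, -1)]


def plane_runs(plane: list[int]) -> list[tuple[int, bool]]:
    """(RUN, EOP) pairs: zeros before each 1, and whether that 1 is the plane's last."""
    ones = [i for i, bit in enumerate(plane) if bit]
    pairs, prev = [], -1
    for n, i in enumerate(ones):
        pairs.append((i - prev - 1, n == len(ones) - 1))
        prev = i
    return pairs


def reconstruct(bits: str, length: int, n_planes: int) -> list[int]:
    """Rebuild the values from a possibly truncated MSB-first bit-plane stream."""
    values = [0] * length
    for pos, bit in enumerate(bits):
        plane, idx = divmod(pos, length)
        values[idx] |= int(bit) << (n_planes - 1 - plane)
    return values


residual = [5, 7, 8, 7, 6, 2, 0, 4, 3, 8, 1, 2, 3, 0, 3, 5]
planes = bit_planes(residual, n_planes=4)
stream = "".join(str(bit) for plane in planes for bit in plane)  # the "reordering" step
print(plane_runs(planes[0]))            # [(2, False), (6, True)]
print(reconstruct(stream[:44], 16, 4))  # first 44 bits only
# [4, 6, 8, 6, 6, 2, 0, 4, 2, 8, 0, 2, 0, 0, 0, 4]
```

Because every prefix of the stream decodes to a coarser version of the residual, the enhancement layer can be cut at any bit. This is the "fine granularity" in FGS.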
78FGS Video Decoder Structure
79Binary Shape Encoder
80Padding