Introduction to MPEG-4

Author: JS Wang (last modified by Jia-Shung Wang)

Transcript and Presenter's Notes

1
Introduction to MPEG-4
  • MC2008

2
Outline
  • Multimedia
  • MPEG-4 Profiles
  • Key Features of MPEG-4 Systems
  • MPEG-4
  • Systems
  • DMIF
  • Audiovisual Objects and Scene Graph
  • Editing, Composition and Rendering
  • Coding Basics
  • Coding Techniques

3
Multimedia
  • What is multimedia?
  • Combination of audio, video, images, graphics, and
    text.
  • Covers all human input/output channels.
  • Why does multimedia need to be coded?

5
Multimedia Coding for Different Applications
  • Mobile devices
  • Low data-rate, error resilience, scalability
  • Streaming service
  • Scalability, low to medium data rate,
    interactivity
  • On-disk distribution (DVD)
  • Interactivity
  • Broadcast
  • On-demand services

6
Profiles in MPEG-4
  • Visual Profiles
  • Audio Profiles
  • Graphics Profiles
  • Scene Graph Profiles
  • MPEG-J Profiles
  • Object Descriptor Profile

7
NewPred
8
H.263 Baseline
9
Key Features of MPEG-4 Systems
  • Provides a consistent and complete architecture
    for the coded representation of the desired
    combination of streamed elementary audio-visual
    information.
  • Covers a broad range of applications,
    functionality and bit rates.
  • Through profile and level definitions, it
    establishes a framework that allows consistent
    progression from simple applications (e.g., an
    audio broadcast application with graphics) to
    more complex ones (e.g., a virtual reality home
    theater).

10
Key Features of MPEG-4 Systems (2)
  • A set of tools for the representation of the
    multimedia content
  • A framework for object description (the OD
    framework),
  • BIFS: a binary language for the representation
    (format) of interactive 2D and 3D multimedia
    scene descriptions,
  • SDM and SyncLayer: a framework for monitoring and
    synchronizing elementary data streams, and
  • MPEG-J: programmable extensions to access and
    monitor MPEG-4 content.

11
Key Features of MPEG-4 Systems (3)
  • MPEG-4 Systems defines an efficient mapping of the
    MPEG-4 content onto existing delivery
    infrastructures.
  • FlexMux: an efficient and simple multiplexing
    tool to optimize the carriage of MPEG-4 data
    (into different QoS channels),
  • Extensions allowing the carriage of MPEG-4
    content on MPEG-2 and IP systems, and a flexible
    file format for authoring, streaming and
    exchanging MPEG-4 data.

12
MPEG-4 (ISO/IEC 14496) Terminal Architecture
13
Systems
  • Timing Model
  • Buffer Model
  • Multiplexing of Streams
  • Synchronization of Streams
  • The Compression Layer
  • Object Description Framework
  • Scene Description Streams
  • Audio-visual Streams
  • Upchannel Streams

14
Systems Decoder Model
16
ISO/IEC 14496 Terminal Architecture
17
Network-based Multimedia System
18
The Objectives of DMIF (Delivery Multimedia Integration Framework)
  • to hide the delivery technology details from the
    DMIF User
  • to manage real time, QoS sensitive channels
  • to allow service providers to log resources per
    session for usage accounting
  • to ensure interoperability between end-systems

20
DMIF Communication Architecture
signaling
21
High View of a Service Activation
22
Audiovisual Objects
  • An audiovisual scene is composed of objects
  • Different objects are mixed on the screen
  • Visual
  • Video
  • Animated face and body
  • 2D and 3D animated meshes
  • Text and graphics
  • Audio
  • General audio: mono, stereo, and multichannel
  • Speech
  • Synthetic sounds (Structured audio)
  • Environmental spatialization

23
Example of MPEG-4 Video Objects
From Olivier Avaro
25
The Scene Graph
26
  1. Composition
  2. Description and synchronization
  3. Delivery of streaming data
  4. Interaction with media objects
  5. Management and identification of intellectual
    property

27
Major Components
28
Composition and Rendering
Media Objects
29
Adding or Removing Objects (1)



30
Adding or Removing Objects (2)
From Igor S. Pandžic
31
Adding or Removing Objects (3)
  • Applications
  • Video conferencing
  • Real-time, automatic
  • Separate foreground (communication partner) from
    background
  • Object tracking in video
  • May allow off-line and semi-automatic
  • Separate moving object from others

32
MPEG-4 Coding Basics
33
Toolbox Approach
tools for synthetic scenes
tools for natural scenes
TOOLS
ALGORITHMS
PROFILES
34
Coding Techniques
  • Video objects
  • Shape
  • Motion vectors
  • Texture
  • Audio objects
  • MPEG
  • AAC (Advanced Audio Coding)
  • TTS (Text-To-Speech)
  • Face and Body
  • Animation parameters
  • 2D Mesh
  • Triangular patches
  • Motion vector

35
Content-based Audio-Visual Representation
  • Audio-Visual Object (AVO)
  • Video object component (video object plane, VOP)
  • natural or synthetic
  • 2D or 3D
  • Audio object component
  • mono, stereo or multi-channel

36
Video Object Planes (VOP)
  • Characteristics of VOP
  • may have different spatial and temporal resolutions
  • may be associated with different degrees of
    accessibility → sub-VOPs
  • may be separated or overlapping
  • VOP type
  • Traditional I, P, B type
  • S-VOP (Sprite) for background

37
Video Object Plane Type
[Figure: S-VOP, I-VOP, P-VOP, and B-VOP types shown along a time axis]
38
Content-based Object Manipulation
  • Object manipulation
  • change of the spatial position of a VOP
  • application of a spatial scaling factor to a VOP
  • change of the speed with which a VOP moves
  • insertion of new VOPs
  • deletion of an object in the scene
  • change of the scene area
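
The manipulations listed above can be pictured as edits to a list of placed objects. The following is a minimal sketch under that assumption, not MPEG-4 scene description (BIFS) syntax; the `PlacedVOP` class, its fields, and the helper functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PlacedVOP:
    """Hypothetical record for one object placed in a scene."""
    name: str
    x: int              # horizontal position of the top-left corner, in pixels
    y: int              # vertical position of the top-left corner, in pixels
    scale: float = 1.0  # spatial scaling factor applied at composition time


def move(vop, dx, dy):
    """Change the spatial position of a VOP."""
    return replace(vop, x=vop.x + dx, y=vop.y + dy)


def rescale(vop, factor):
    """Apply a spatial scaling factor to a VOP."""
    return replace(vop, scale=vop.scale * factor)


def insert_vop(scene, vop):
    """Insert a new VOP on top of the existing ones."""
    return scene + [vop]


def delete_vop(scene, name):
    """Delete the object with the given name from the scene."""
    return [v for v in scene if v.name != name]


scene = [PlacedVOP("background", 0, 0), PlacedVOP("speaker", 40, 20)]
scene = insert_vop(scene, PlacedVOP("logo", 300, 10))
scene = [move(v, 16, 0) if v.name == "speaker" else v for v in scene]
scene = [rescale(v, 0.5) if v.name == "logo" else v for v in scene]
scene = delete_vop(scene, "background")
print(scene)
```

Because each object is a separate record, every operation touches only the object concerned, which is the point of content-based manipulation.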

39
Segmentation Process
  • Depending on the application, segmentation can be
    performed
  • Online (real-time) or offline (non-real-time)
  • Automatic or semi-automatic
  • Examples
  • Video conferencing
  • real-time, automatic
  • separate foreground (communication partner) from
    background
  • Object Tracking in Video
  • May allow off-line and semi-automatic
  • separate moving object from others

40
Compression
  • Improved coding efficiency
  • 5-64 kbps for mobile applications
  • up to 20 Mbps for TV/film applications
  • subjectively better quality compared to existing
    standards
  • Coding of multiple concurrent data streams
  • can code multiple views of a scene
    efficiently, e.g., stereo video

41
Coding VO in MPEG-4
  • Reduce temporal redundancy
  • Motion estimation for arbitrarily shaped VOPs
  • padding and modified block (polygon) matching
    motion estimation

[Figure: I-VOP, B-VOP, and P-VOP along a time axis]
42
Encoding of Visual Objects
  • Binary alpha block
  • Motion vector
  • Context-based arithmetic encoding
  • Texture
  • Motion vector
  • DCT

43
New Coding Features
  • For each macroblock, the motion vectors can be
    computed on a 16 × 16 or 8 × 8 block basis
  • Unrestricted motion estimation: prediction can
    extend beyond the image boundary
  • Overlapped block motion compensation
  • Each component of texture can range from 1 to 12
    bits
  • More robust coding
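
The first two features above can be sketched with a plain full-search block matcher. This is an illustrative sketch, not the normative MPEG-4 procedure: it assumes a sum-of-absolute-differences (SAD) cost, integer-pixel vectors, and edge replication for references that fall outside the picture. The function names and the test pattern are hypothetical.

```python
def pixel(frame, x, y):
    """Read a pixel, clamping coordinates so a reference may cross the border."""
    h, w = len(frame), len(frame[0])
    return frame[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences for one block and one candidate vector."""
    return sum(abs(cur[y][x] - pixel(ref, x + dx, y + dy))
               for y in range(by, by + size)
               for x in range(bx, bx + size))


def best_vector(cur, ref, bx, by, size, search):
    """Full search over a (2*search+1)^2 window; returns the (dx, dy) with minimum SAD."""
    candidates = [(dx, dy)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    return min(candidates, key=lambda v: sad(cur, ref, bx, by, v[0], v[1], size))


# Synthetic 16 x 16 reference frame and a current frame displaced by (2, -1).
ref = [[(7 * x + 13 * y) % 256 for x in range(16)] for y in range(16)]
cur = [[pixel(ref, x + 2, y - 1) for x in range(16)] for y in range(16)]

# One vector for the whole 16 x 16 macroblock ...
mv16 = best_vector(cur, ref, 0, 0, 16, 3)
# ... or four vectors, one per 8 x 8 block.
mv8 = [best_vector(cur, ref, bx, by, 8, 3) for by in (0, 8) for bx in (0, 8)]
print(mv16, mv8)
```

Blocks in the top row need reference pixels above the picture, which is exactly the case that the clamping in `pixel` handles.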

44
Robust Video Coding
  • Resynchronization
  • Allows insertion of resync markers within each VOP
  • Video packet header includes macroblock number,
    quantizer value and timing information
  • Data partitioning
  • Allows shape, motion and texture data to be
    separated within a packet
  • Reversible VLC
  • Offers partial recovery from errors.

45
Sprite VOP
  • Represents the background image
  • Can be used for very efficient coding of scenes
    involving camera pan and zoom
  • Much larger than the displayed image and thus
    requires more memory

46
Example of Sprite VOP
47
Object Mesh
  • Useful for animation, content manipulation,
    content overlay, merging natural and synthetic
    video and others
  • Tessellate with triangular patches
  • Define a motion vector for each node
  • 2D motion of video objects is represented by the
    motion vectors of the node points
  • Motion compensation is achieved by warping the
    texture map of each patch with an affine
    transform
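
Three node correspondences determine an affine transform, so the motion vectors at the corners of one triangular patch are enough to warp every point inside it. The sketch below solves for the six affine parameters directly; the function names and the sample triangle are assumptions for illustration, and resampling of the texture itself is left out.

```python
def affine_from_triangle(src, dst):
    """Return ((a, b, c), (d, e, f)) with x' = a*x + b*y + c and y' = d*x + e*y + f."""
    (x0, y0), (x1, y1), (x2, y2) = src
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate triangle")

    def solve(v0, v1, v2):
        # Solve the 2 x 2 system for one output coordinate by Cramer's rule.
        a = ((v1 - v0) * (y2 - y0) - (v2 - v0) * (y1 - y0)) / det
        b = ((x1 - x0) * (v2 - v0) - (x2 - x0) * (v1 - v0)) / det
        c = v0 - a * x0 - b * y0
        return a, b, c

    return solve(*(p[0] for p in dst)), solve(*(p[1] for p in dst))


def warp(point, transform):
    """Apply the affine transform to one point."""
    (a, b, c), (d, e, f) = transform
    x, y = point
    return a * x + b * y + c, d * x + e * y + f


# Node points of one patch and their (hypothetical) motion vectors.
nodes = [(0, 0), (8, 0), (0, 8)]
motion = [(1, 0), (1, 2), (-1, 0)]
moved = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(nodes, motion)]

t = affine_from_triangle(nodes, moved)
print(warp((4, 4), t))
```

Each patch gets its own transform, and neighbouring patches agree along shared edges because they share the node motion vectors.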

48
Example of Object Mesh
49
Face Animation
  • Face model
  • Default face model
  • Downloaded from the encoder
  • Low-level facial animation
  • A set of 66 facial animation parameters
  • High-level facial animation
  • A set of primary facial expressions such as joy,
    sadness, surprise and disgust
  • Speech animation
  • 14 visemes for mouth shape
  • Text-to-speech synthesizer

50
Facial Animation
From Eine Übersicht
51
Still Texture Coding
  • Discrete Wavelet Transform (DWT)
  • Spatial and quality scalability
  • Uses a 2D Daubechies (9, 3)-tap biorthogonal filter
  • Lowest band is losslessly coded by arithmetic
    coding
  • Higher bands are coded by multilevel
    quantization, zero-tree scanning and arithmetic
    coding

52
Audio Coding
  • Different bit-rates, different types of source
    material and different algorithms
  • Combination of parameter based coding, LPC-based
    coding, time/frequency based coding
  • High quality speech with 2 kbps Harmonic Vector
    eXcitation Coding (HVXC)
  • Text-to-Speech (TTS)

53
Natural Audio Coder
Telephone
From Olivier Dechazal
54
Multiview Video
55
Stereo Sequence Coding
  • Multiview profile of MPEG-2
  • The left view sequence Sl is coded first. For the
    right view sequence, each frame is predicted from
    the corresponding frame in Sl based on an
    estimated disparity field, and the prediction
    error image is coded.

[Figure: frame types P B B B (right view) and I B B P (left view)]
56
Intermediate View Synthesis
57
Original left
Original right
Regular mesh on the left image
Corresponding mesh on the right image
Predicted right image by mesh (27.48 dB)
Predicted right image by BMA (32.03 dB)
The mesh-based scheme yields a visually more
accurate prediction
58
MPEG-4 Coding Techniques
  • Shape Coding
  • Shape-adaptive DCT
  • Object-based Inter-frame Coding
  • Overlapped Motion Estimation
  • Bit-plane Coding and FGS
59
Object-Based Coding
60
Shape Coding
  • Bitmap Coding
  • Context-Based Arithmetic Encoding (CAE)
  • Contour Coding
  • Chain Coding
  • Baseline Shape Coding
  • Polygon Approximation
  • Skeleton-Based Shape Coding
  • Quadtree Coding

61
Context-Based Arithmetic Encoding
[Figure: bounding box divided into 16 × 16 blocks labeled transparent block, boundary blocks, and opaque block; conditional entropy coding]
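
Before any arithmetic coding takes place, the bounding box of the binary shape mask is split into 16 × 16 blocks, and each block is labeled transparent, opaque, or boundary. The sketch below shows only that classification step, on the assumption that only boundary blocks need pixel-level shape coding; the function name and the sample mask are hypothetical.

```python
def classify_blocks(mask, size=16):
    """Label each size x size block of a binary alpha mask."""
    h, w = len(mask), len(mask[0])
    labels = []
    for by in range(0, h, size):
        row = []
        for bx in range(0, w, size):
            pixels = [mask[y][x]
                      for y in range(by, min(by + size, h))
                      for x in range(bx, min(bx + size, w))]
            if not any(pixels):
                row.append("transparent")   # no object pixel in the block
            elif all(pixels):
                row.append("opaque")        # block lies entirely inside the object
            else:
                row.append("boundary")      # block crosses the object contour
        labels.append(row)
    return labels


# 32 x 64 bounding box containing an object that spans columns 8..39.
mask = [[1 if 8 <= x < 40 else 0 for x in range(64)] for y in range(32)]
labels = classify_blocks(mask)
for row in labels:
    print(row)
```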
63
Chain Coding
[Figure: contour traced from starting points with 4-connected and 8-connected chain codes; links labeled with direction codes 0 to 3]
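
A 4-connected chain code stores a starting point plus one direction code per contour link. The sketch below assumes the convention 0 = right, 1 = up, 2 = left, 3 = down, with y growing downward as in image coordinates; the slide's figure may number the directions differently, and the function names are hypothetical.

```python
# Assumed direction convention: code -> (dx, dy), with y growing downward.
STEPS = {0: (1, 0), 1: (0, -1), 2: (-1, 0), 3: (0, 1)}
CODES = {step: code for code, step in STEPS.items()}


def chain_encode(points):
    """Turn a list of 4-connected contour points into direction codes."""
    return [CODES[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


def chain_decode(start, codes):
    """Rebuild the contour points from the starting point and the codes."""
    points = [start]
    x, y = start
    for code in codes:
        dx, dy = STEPS[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


# A 2 x 2 square traced from its top-left corner.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2),
          (1, 2), (0, 2), (0, 1), (0, 0)]
codes = chain_encode(square)
print(codes)
```

An 8-connected code works the same way with eight directions, so diagonal links need one symbol instead of two.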
65
Differential Chain Code
  • DCC records the move (forward, leftward or
    rightward) between two consecutive directional
    links.

[Figure: contour links labeled F, R, and L]
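
A differential chain code can be derived from an ordinary 4-connected chain code by comparing each link with the previous one. The sketch below assumes direction codes 0 = right, 1 = up, 2 = left, 3 = down, so that a difference of 1 modulo 4 is a left turn and a difference of 3 is a right turn; the function name is hypothetical.

```python
# Difference between consecutive direction codes (mod 4) -> move symbol.
TURNS = {0: "F", 1: "L", 3: "R"}


def differential_chain_code(codes):
    """Map a 4-connected chain code to forward/left/right moves."""
    moves = []
    for prev, cur in zip(codes, codes[1:]):
        diff = (cur - prev) % 4
        if diff == 2:
            raise ValueError("a reversal is not a forward, left or right move")
        moves.append(TURNS[diff])
    return moves


# Chain code of a 2 x 2 square traced from its top-left corner.
moves = differential_chain_code([0, 0, 3, 3, 2, 2, 1, 1])
print("".join(moves))
```

Only three symbols are needed instead of four, and long straight runs turn into runs of F, which suit entropy coding.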
66
Baseline Shape Coding
67
Polygon Approximation
[Figure: polygon approximation of a contour, with labels d1, d2, d3]
  • Select vertices that are optimal in the
    rate-distortion sense.
  • Splines are adopted to approximate the contour.
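
As a rough illustration of vertex selection, the sketch below uses the Ramer-Douglas-Peucker split rule with a maximum-distance tolerance. This is a simpler, swapped-in technique: it is not the rate-distortion-optimal selection named above, and it fits straight segments rather than splines. The tolerance `d_max` and the function names are assumptions.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def approximate(points, d_max):
    """Keep only the vertices needed to stay within d_max of the contour."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    worst = max(range(len(dists)), key=dists.__getitem__)
    if dists[worst] <= d_max:
        return [points[0], points[-1]]
    split = worst + 1                      # index of the farthest contour point
    left = approximate(points[:split + 1], d_max)
    right = approximate(points[split:], d_max)
    return left[:-1] + right               # avoid repeating the split vertex


contour = [(0, 0), (1, 0.1), (2, 0), (3, 0.1), (4, 0), (4, 1), (4, 2), (4, 3)]
polygon = approximate(contour, d_max=0.5)
print(polygon)
```

A larger `d_max` gives fewer vertices and therefore fewer bits, at the cost of a coarser shape, which is the trade-off a rate-distortion criterion formalizes.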

68
Skeleton-Based Shape Coding
69
Quadtree Coding
70
Shape-adaptive DCT
71
Inter-frame Coding: Reconstruction of Object Shape
MVS = MVPS + MVDS
MVS: MV for shape; MVPS: prediction; MVDS: difference (BAC)
72
The context for Inter-frame Coding
73
Overlapped Motion Estimation
74
Weighting Coefficients in Overlapped Motion
Estimation
75
Fine Granularity Scalability (FGS)
76
FGS Video Encoder Structure
77
Bit-plane Coding
quantized residual
5 7 8 7 6 2 0 4 3 8 1 2 3 0 3 5   (original)
4 6 8 6 6 2 0 4 2 8 0 2 0 0 0 4   (rebuilt from the truncated bit-planes)
binary transfer (bit-planes of the original)
MSB  0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
     1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 1
     0 1 0 1 1 1 0 0 1 0 0 1 1 0 1 0
LSB  1 1 0 1 0 0 0 0 1 0 1 0 1 0 1 1
bit-planes after truncation
MSB  0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
     1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 1
     0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0
LSB  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
reordering
0010000001000000110110010000000101011100100110101101000010101011
run-length coding
Enhancement layer bitstream
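
The numbers on this slide can be reproduced with a few lines of code. The sketch below splits the quantized residual into bit-planes, concatenates them from MSB to LSB, keeps only a prefix of the resulting bits, and rebuilds the residual. The cut after 44 bits is inferred from the slide's values, not stated on it; run-length coding and sign handling are omitted, and the function names are hypothetical.

```python
def to_bit_planes(values, n_bits):
    """Return the bit-planes from MSB to LSB, each as a list of 0/1."""
    return [[(v >> b) & 1 for v in values]
            for b in range(n_bits - 1, -1, -1)]


def from_bit_planes(planes):
    """Rebuild the values from bit-planes ordered MSB first."""
    n_bits = len(planes)
    values = [0] * len(planes[0])
    for i, plane in enumerate(planes):
        weight = 1 << (n_bits - 1 - i)
        for j, bit in enumerate(plane):
            values[j] += bit * weight
    return values


def truncate(planes, n_kept):
    """Keep the first n_kept bits of the MSB-first stream, zero the rest."""
    flat = [bit for plane in planes for bit in plane]
    flat = flat[:n_kept] + [0] * (len(flat) - n_kept)
    width = len(planes[0])
    return [flat[i:i + width] for i in range(0, len(flat), width)]


residual = [5, 7, 8, 7, 6, 2, 0, 4, 3, 8, 1, 2, 3, 0, 3, 5]
planes = to_bit_planes(residual, 4)
stream = "".join(str(bit) for plane in planes for bit in plane)
received = from_bit_planes(truncate(planes, 44))
print(stream)
print(received)
```

Cutting the stream earlier or later only changes how many low-order bits are lost, which is what makes the enhancement layer finely scalable.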
78
FGS Video Decoder Structure
79
Binary Shape Encoder
80
Padding