Title: Concepts of Multimedia Processing and Transmission
1Concepts of Multimedia Processing and Transmission
- IT 481, Lecture 1
- Dennis McCaughey, Ph.D.
- 28 August, 2006
2Outline
- Course Description
- Instructor
- Student Survey
- Exams, Homework and Project
- Grading
- General Policies
- Lecture Schedule
3Course Description
- Topics
- The fundamentals of signal and image processing,
including algorithms for signal processing that
have applications to multimedia
- Techniques for voice coding and recognition, CD
and DVD technology, streaming video, WANs and
LANs, and videoconferencing technology
- Text: Multimedia Communication Systems:
Techniques, Standards, and Networks, K. R. Rao,
Zoran S. Bojkovic, Dragorad A. Milovanovic,
Prentice Hall PTR, 1st edition (April 26, 2002),
ISBN 013031398X.
4Instructor
- Dennis McCaughey
- Contact Information
- 703-263-7425 (Office)
- 703-624-6830 (Cell)
- dgm_at_rincon.com (e-mail)
- Office Hours: one hour before class
- Background
- PhD in EE, University of Southern California, 1977
- Thesis: Degrees of Freedom for Projection Imaging
5Student Survey
- Name
- Contact Information
- Last Degree along with current Degree Objective,
i.e., Undergrad seeking Bachelors, Grad seeking
MS/PhD, Other
- Mathematical Background
- Calculus?
- Differential Equations?
- Linear Algebra?
- Probability, Statistics, Random Processes?
6Student Survey Contd
- Systems Background
- Linear Systems?
- Signal Processing?
- Image Processing?
- Programming Languages
- C or C++?
- MATLAB?
7Exams, Homework and Project
- Mid-Term: 1 Hour, Closed Book
- Covers the key topics covered in class and
homework
- Final: Format To Be Determined
- Homework: 1) Reading assignments, 2) Written
answers to selected questions based on reading
assignments, 3) Some limited math problems
- Project Format (Preliminary): MATLAB
implementation of a multimedia processing
application.
8More on the Project
- A course project will be required exploring
aspects of multimedia signal processing, which may
be computer based using MATLAB.
- Project topics will be of the student's choice,
subject to review by the instructor.
- Each student will also be required to present a
short briefing on the results.
- Projects will be evaluated on the content of the
presentation and not on the briefing itself.
- Details regarding topics, content, and format
will be provided during the course.
9Grading
- The final grade will be determined by a weighted
average of the homework assignments, a mid-term
exam, a final exam and a project
Homework 10%
Mid-Term 20%
Project 30%
Final 40%
10General Policies
- Collaboration
- Students are permitted and encouraged to
collaborate on homework assignments.
- All graded work, however, must be the original
effort of the student submitting the paper.
- Homework
- Homework will be collected at the beginning of
each class period. Note: Late homework will be
accepted provided the reason for the delay is
coordinated with the instructor within 2 days of
its assignment. Homework solutions will be
discussed in class.
- Make-up Exams
- Make-up exams will not be given unless detailed
written clarification accompanied by
documentation for the absence is provided. If
this information is not provided an F grade will
be given for the exam. The location and time for
a make-up exam will be decided by the instructor.
Also, students are expected to be in class and
on-time for every class.
11Lecture Schedule (Preliminary)
Week Date Chapter Topic Reading Homework
1 8/28 1, 2 Lecture 1 Introduction to Multimedia Communications 4
2 9/11 4 Lecture 2 Networks and Multimedia Applications 3
3 9/18 3 Lecture 3 Signal Processing Fundamentals 3
4 9/25 3 Lecture 4 Audio Coding MATLAB Tutorial 3
5 10/2 3 Lecture 5 Video Coding 1 3
6 10/9 3 Lecture 6 Video Coding 2 Review
7 10/17 1-4 Mid-Term Exam Project Review
8 10/30 5 Lecture 7 MPEG-1 5
9 11/6 5 Lecture 8 MPEG-2 5
10 11/13 5 Lecture 9 MPEG-4 5
11 11/20 Lecture 10 MPEG-4, MPEG-7, MPEG-21
12 11/27 6 Lecture 11 Audio and video streaming 6
13 12/4 Lecture 12
14 12/11 Final Exam Review 6
15 12/18 Final Exam 5-6
12Multimedia Communications
13What is Multimedia?
- Multimedia is a combination of text, art, sound,
animation, and video.
Slide Courtesy, Hung Nguyen
14Multimedia Components Simplified
- Multimedia can be viewed as the combination of
audio, video, data and how they interact with the
user (more than the sum of the individual
components)
15Background
- Fast-paced emergence of applications in medicine,
education, travel, etc.
- Characterized by large documents that must be
communicated with short delays
- Glamorous applications such as distance learning
and video teleconferencing
- Applications that are enhanced by video are often
seen as drivers for the development of multimedia
networks
16Forces Driving Communications That Facilitate
Multimedia Communications
- Evolution of communications and data networks
- Increasing availability of almost unlimited
bandwidth on demand
- Availability of ubiquitous access to the network
- Ever increasing amount of memory and
computational power
- Sophisticated terminals
- Digitization of virtually everything
17New Information System Paradigm
Slide Courtesy, Hung Nguyen
18Elements of Multimedia Systems
- Two key communication modes
- Person-to-person
- Person-to-machine
Slide Courtesy, Hung Nguyen
19Multimedia Networks
- The world has been wrapped in copper and glass
fiber and can be viewed as a hair ball with
physical, wireless and satellite entry/exit
points.
- Physical: LAN-WAN connections
- Wireless: cellular telephony, wireless PC
connectivity
- Satellite: INMARSAT, Thuraya, ACeS, etc.
20Multimedia Communication Model
- Partitioning of information objects into distinct
types, e.g., text, audio, video
- Standardization of service components per
information type
- Creation of platforms at two levels: network
service and multimedia communication
- Define general applications for multiple use in
various multimedia environments
- Define specific applications, e.g. e-commerce,
tele-training, using building blocks from
platform and general applications
21Requirements
- User Requirements
- Fast preparation and presentation
- Dynamic control of multimedia applications
- Intelligent support to users
- Standardization
- Network Requirements
- High speed and variable bit rates
- Multiple virtual connections using the same
access
- Synchronization of different information types
- Suitable standardized services along with support
22Network Requirements
- ATM-BISDN and SS7 have enabled the switching-based
communications capabilities over the PSTN
that support the necessary services
- ATM-BISDN-SS7 will evolve to all-optical
switchless networks based on packet transfer
23Packet Transfer Concept
- Allows voice, video and data to be dealt with in
a common format
- More flexible than circuit switching, which it can
emulate while allowing the multiplexing of varied
bit rate data streams
- Dynamic allocation of bandwidth
- Handles Variable Bit Rate (VBR) directly
24Considerations
- Buffering required for constant bit rate data
such as audio
- Re-sequencing and recovery capabilities must be
provided over networks where packets may be
dropped or received in an order different from
that in which they were transmitted
- In an ATM network some packets can be dropped
while others may not (e.g., voice vs. bank transfer
data packets)
- Optimum packet lengths for voice, video and data
differ in an ATM network
- IP packets over the internet may arrive in a
different order or be dropped.
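The re-sequencing and loss-detection requirement above can be sketched as a small receive-side helper. This is only an illustration under stated assumptions: the function name `resequence`, the `(sequence number, payload)` packet format, and the policy of reporting a gap as `None` are hypothetical and do not come from the lecture or from any particular protocol.

```python
def resequence(packets, first_seq=0):
    """Reorder (seq, payload) packets and mark missing ones as None.

    Hypothetical helper: packets carry an integer sequence number,
    duplicates are ignored, and a gap is reported as a lost packet.
    """
    received = {}
    for seq, payload in packets:
        # Keep the first copy of each sequence number, drop duplicates.
        received.setdefault(seq, payload)
    if not received:
        return []
    last_seq = max(received)
    # Walk the expected sequence range; gaps become None (lost packet).
    return [received.get(seq) for seq in range(first_seq, last_seq + 1)]


arrived = [(2, "C"), (0, "A"), (4, "E"), (1, "B")]  # packet 3 was dropped
print(resequence(arrived))  # ['A', 'B', 'C', None, 'E']
```

A real receiver would also bound how long it waits for a missing packet; for audio, a late packet is often treated the same as a lost one.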
25Digital Video Signal Transport
[Block diagram: Video -> Encoder -> Application ->
Network -> error/loss handling -> Decoder -> Users]
- Encoder: Transformation, Quantization, Entropy
Coding, Bit-Rate Control
- Application: Data Structuring
- Network: Multiplexing/Routing
- Error detection, Loss detection, Error
correction, Erasure correction
- Decoder: Entropy decode, De-quantization, Inverse
transform, Loss concealment, Post processing
- Users
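The transformation and quantization steps of the encoder, and the matching de-quantization and inverse transform in the decoder, can be illustrated with a toy one-dimensional example. The slide does not name a transform; the discrete cosine transform (DCT) is used here as an assumption because it is common in video coding. The function names, the uniform quantizer, and the `step` value are hypothetical, and entropy coding and bit-rate control are left out.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    samples = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(s)
    return samples


def encode(x, step):
    """Transform, then quantize each coefficient to an integer level."""
    return [round(c / step) for c in dct(x)]


def decode(levels, step):
    """De-quantize the levels, then apply the inverse transform."""
    return idct([q * step for q in levels])


row = [52, 55, 61, 66, 70, 61, 64, 73]  # one row of 8 pixel values (made up)
levels = encode(row, step=10)
approx = decode(levels, step=10)
print(levels)
print([round(v, 1) for v in approx])
```

A larger `step` produces smaller integer levels, which are cheaper to entropy code, at the price of a larger reconstruction error; that trade-off is what bit-rate control adjusts.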
26Quality of Service (QoS)
- The set of parameters that defines the properties
of media streams
- Can define four QoS layers
- User QoS: Perception of the multimedia data at
the user interface (qualitative)
- Application QoS: Parameters such as end-to-end
delay (quantitative)
- System QoS: Requirements on the communications
services derived from the application QoS
- Network QoS: Parameters such as network load and
performance
27Audio-Visual Integration
28Importance of Interaction
- Multimedia is more than the combination of text,
audio, video and data
- Interaction among media is important
- Consider a poorly dubbed movie
- Audio not synchronized with video
- Lip movements inconsistent with language
- Audio dynamic range inconsistent with the scene
Slide Courtesy, Hung Nguyen
29Media Interaction
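[Diagram: Multimedia at the center, linking Audio,
Text, and Image/Video]
- Audio: Compression, Synthesis, 3D Sound
- Audio and Image/Video: Lip synch, Face Animation,
Joint A/V Coding
- Audio and Text: Speech Recognition, Text-to-Speech
- Image/Video and Text: Sign language, Lip reading
- Image/Video: Compression, Graphics, Database
indexing/retrieval
- Text: Translation, Natural language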
Slide Courtesy, Hung Nguyen
30Bimodality of Human Speech
- Human speech is produced by vibration of the
vocal cords and configuration of the vocal tract,
using muscles that also generate facial expressions
Audio + Visual -> Perceived
ba + ga -> da
pa + ga -> ta
ma + ga -> na
Slide Courtesy, Hung Nguyen
31Basic Definitions
- The basic unit of acoustic speech is called a
phoneme
- In the visual domain, the basic unit of mouth
movement is called a viseme
- A viseme is the smallest visibly distinguishable
unit of speech
- A viseme can cover several phonemes, which then
form one viseme group
- A many-to-one mapping between phonemes and visemes
Slide Courtesy, Hung Nguyen
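The many-to-one mapping from phonemes to visemes can be pictured as a lookup table. The sketch below is illustrative only: the three groups and their names are assumptions chosen because the listed sounds share a place of articulation, not a standard viseme table from the lecture.

```python
# Illustrative viseme groups (assumed for this sketch, not a standard table).
VISEME_GROUPS = {
    "bilabial": ["p", "b", "m"],
    "labiodental": ["f", "v"],
    "alveolar": ["t", "d", "n"],
}

# Invert the groups into a many-to-one phoneme -> viseme lookup table.
PHONEME_TO_VISEME = {
    phoneme: viseme
    for viseme, phonemes in VISEME_GROUPS.items()
    for phoneme in phonemes
}


def visemes_for(phonemes):
    """Map a phoneme sequence to its viseme sequence."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]


print(visemes_for(["b", "p", "m"]))  # ['bilabial', 'bilabial', 'bilabial']
```

Because the mapping is many-to-one, it cannot be inverted: a lip reader who sees a "bilabial" viseme needs context to decide among p, b, and m.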
32Lip Reading System
- Application to support hearing-impaired persons
- People learn to understand spoken language by
combining visual content with lexical, syntactic,
semantic and pragmatic information
- Automated lip reading systems
- Speech recognition possible using only visual
information
- Integrated with speech recognition systems to
improve accuracy
Slide Courtesy, Hung Nguyen
33Lip Synchronization
- Applications
- In VTC (video teleconferencing), where video frames
are dropped (low bandwidth requirement) but audio
must still be continuous
- In non-real-time use, such as studio dubbing,
where the recorded voice is full of background noise
- Time-warping is commonly used in both audio and
video modes
- Time-frequency analysis
- Video time-warping could be used for VTC
- Audio time-warping could be used for dubbing
Slide Courtesy, Hung Nguyen
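One common way to realize time-warping is dynamic time warping (DTW), which aligns two sequences that carry the same content at different speeds. The slide does not say which time-warping method is meant, so DTW is an assumption here, as are the function name and the use of one scalar feature per frame.

```python
def dtw_distance(a, b):
    """Cost of the best monotonic alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: repeat b[j-1], repeat a[i-1], advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


reference = [1, 2, 3]
stretched = [1, 2, 2, 3]   # same shape, one frame held longer
print(dtw_distance(reference, stretched))  # 0.0
print(dtw_distance([0, 0], [1, 1]))        # 2.0
```

A zero distance for the stretched sequence shows that the alignment absorbs the timing difference, which is the property lip synchronization relies on.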
34Lip Tracking
- To prevent too much jerkiness in the motion
rendering and too much loss in lip
synchronization
- Involves real-time analysis of the 3-dimensional
video signal plus one temporal dimension
- Produce meaningful parameters
- Classification of mouth images into visemes
- Measures of dimension, e.g. mouth widths and
heights
- Analysis tools: Fourier Transform,
Karhunen-Loeve Transform (KLT), Probability
Density Function (pdf) Estimation
Slide Courtesy, Hung Nguyen
35Audio-to-Visual Mapping for Lip Tracking
- Conversion of acoustic speech to mouth shape
parameters
- A mapping of phonemes to visemes
- Could be most precisely implemented with a
complete speech recognizer followed by a look-up
table
- High computational overhead plus table look-up
complexity
- Do not need to recognize the spoken word to achieve
audio-to-visual mapping
- Physical relationships exist between vocal tract
shape and sound produced, so functional
relationships exist between speech and visual
parameters
Slide Courtesy, Hung Nguyen
36Classification-Based Conversion Approaches for
Lip Tracking
- Two-step process
- Classification of acoustic signal using VQ
(vector quantization), HMM (hidden Markov model)
and NN (neural network)
- Mapping of the acoustic classes into
corresponding visual outputs, then averaged to
get centroid
- Shortcomings
- Error resulting from averaging visual vectors to
get the visual centroid
- Not a continuous mapping: finite output levels
Slide Courtesy, Hung Nguyen
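The two-step process can be sketched with the simplest of the listed classifiers, vector quantization. Everything concrete below is an assumption made for illustration: the function names, the two-entry codebook, the one-dimensional acoustic feature, and the (width, height) mouth parameters are made-up values, not data from the lecture.

```python
def nearest(vector, codebook):
    """Return the index of the codeword closest to vector (squared error)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vector, c))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def train_centroids(pairs, codebook):
    """Average the visual vectors of all training pairs in each acoustic class."""
    sums = {}
    counts = {}
    for audio, visual in pairs:
        k = nearest(audio, codebook)           # step 1: classify the audio
        acc = sums.setdefault(k, [0.0] * len(visual))
        for d, v in enumerate(visual):
            acc[d] += v
        counts[k] = counts.get(k, 0) + 1
    return {k: [s / counts[k] for s in acc] for k, acc in sums.items()}


def audio_to_visual(audio, codebook, centroids):
    """Step 2: output the visual centroid of the audio vector's class."""
    return centroids[nearest(audio, codebook)]


# Hypothetical data: 1-D acoustic features, (width, height) mouth parameters.
codebook = [[0.0], [1.0]]
training = [([0.1], [4.0, 1.0]), ([0.2], [6.0, 3.0]), ([0.9], [2.0, 5.0])]
centroids = train_centroids(training, codebook)
print(audio_to_visual([0.15], codebook, centroids))  # [5.0, 2.0]
```

Both shortcomings are visible here: the output [5.0, 2.0] matches neither training mouth shape in its class (averaging error), and only as many distinct outputs exist as there are codewords (finite output levels).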
37Classification-Based Conversion
Slide Courtesy, Hung Nguyen
38Audio and Visual Integration for Lip Reading
Applications
- Three major steps
- Audio-visual pre-processing: Principal Component
Analysis (PCA) has been used for feature
extraction
- Pattern recognition strategy (HMM, NN,
time-warping)
- Integration strategy (decision making)
- Heuristic rules to incorporate knowledge of
phonemes about the two modalities
- Combination of independent evaluation scores for
each modality
Slide Courtesy, Hung Nguyen
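The last integration strategy, combining independent evaluation scores from each modality, can be sketched as a weighted sum followed by picking the best class. The weighting rule, the function name, the `audio_weight` parameter, and the score values are assumptions for illustration; the lecture does not specify how the scores are combined.

```python
def fuse_scores(audio_scores, visual_scores, audio_weight=0.7):
    """Combine per-class scores from two independent recognizers.

    audio_weight is a hypothetical tuning parameter in [0, 1]; it might
    be lowered when the audio channel is noisy.
    """
    fused = {}
    for label in audio_scores:
        fused[label] = (audio_weight * audio_scores[label]
                        + (1.0 - audio_weight) * visual_scores[label])
    # The decision is the class with the highest combined score.
    return max(fused, key=fused.get)


audio = {"ba": 0.5, "da": 0.4, "ga": 0.1}    # made-up recognizer scores
visual = {"ba": 0.1, "da": 0.3, "ga": 0.6}
print(fuse_scores(audio, visual, audio_weight=0.7))  # ba
print(fuse_scores(audio, visual, audio_weight=0.2))  # ga
```

The same inputs yield different decisions as the weight shifts, which is why the choice of integration strategy matters as much as the individual recognizers.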
39Application in Biometrics: Bimodal Person
Verification
- Existing methods for person verification are
mainly based on a single modality, which
limits security and robustness
- Audio-visual integration using a camera and
microphone makes person verification more
reliable
Slide Courtesy, Hung Nguyen
40Joint Audio-Video Coding
- Correlation between audio and video can be used
to achieve more efficient coding
- Predictive coding of audio and video information
used to construct estimate of current frame
(cross-modal redundancy)
- Difference between original and estimated signal
can be transmitted as parameters
- Decision on what and how to send is based on
Rate-Distortion (R-D) criteria
- Reconstruction done at receiver according to
agreed-upon decoding rules
Slide Courtesy, Hung Nguyen
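The rate-distortion decision can be illustrated for a single visual parameter, such as mouth height. This is a sketch under stated assumptions: the Lagrangian cost J = D + lam * R is one standard way to express an R-D criterion but is not given in the lecture, and the function name, quantizer step, bit count, and `lam` value are hypothetical.

```python
def choose_action(actual, predicted, step=0.5, bits_per_residual=8, lam=0.01):
    """Pick the cheaper of two options under the cost J = D + lam * R.

    actual: visual parameter measured from the video frame.
    predicted: the same parameter estimated from the audio.
    step, bits_per_residual, lam: hypothetical values for this sketch.
    """
    # Option 1: send nothing; the receiver uses the audio-based estimate.
    cost_nothing = (actual - predicted) ** 2

    # Option 2: send the quantized prediction residual.
    level = round((actual - predicted) / step)
    reconstructed = predicted + level * step
    cost_residual = (actual - reconstructed) ** 2 + lam * bits_per_residual

    if cost_nothing <= cost_residual:
        return "nothing", predicted
    return "residual", reconstructed


print(choose_action(1.0, 0.9))  # ('nothing', 0.9)
print(choose_action(3.0, 1.0))  # ('residual', 3.0)
```

When the audio-based prediction is already close, spending bits on a correction costs more than it saves, so nothing is sent; when the prediction is far off, the residual is worth its rate.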
41Cross-Modal Predictive Coding
[Block diagram: Visual Analysis produces Parameter X;
A-to-V Mapping produces an estimate from the audio;
the Decision Module (R-D) sends either Nothing or
Parameter X]
Slide Courtesy, Hung Nguyen
42Applications of Multimedia
- Business - Business applications for multimedia
include presentations, training, marketing,
advertising, product demos, databases,
catalogues, instant messaging, and networked
communication.
- Schools - Educational software can be developed
to enrich the learning process.
Slide Courtesy, Hung Nguyen
43Applications of Multimedia
- Home - Most multimedia projects reach the homes
via television sets or monitors with built-in
user inputs.
- Public places - Multimedia will become available
at stand-alone terminals or kiosks to provide
information and help.
Slide Courtesy, Hung Nguyen
44Compact Disc Read-Only Memory (CD-ROM)
- CD-ROM is the most cost-effective distribution
medium for multimedia projects.
- It can contain up to 80 minutes of full-screen
video or sound.
- CD burners are used for reading discs and for
writing discs in audio, video, and data
formats.
Slide Courtesy, Hung Nguyen
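For sound, the 80-minute figure can be related to a byte count with a short calculation. The parameters below are the standard CD audio values (44.1 kHz sampling, 16-bit samples, two channels); they are not stated on the slide and are assumed here. The video case is not covered, since it depends on the compression used.

```python
SAMPLE_RATE_HZ = 44_100   # CD audio sampling rate (assumed standard value)
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo
MINUTES = 80

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
total_bytes = bytes_per_second * MINUTES * 60

print(bytes_per_second)          # 176400
print(total_bytes)               # 846720000
print(total_bytes / 1_000_000)   # 846.72
```

So 80 minutes of uncompressed CD audio is roughly 847 million bytes of sample data, before any error-correction overhead on the disc.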
45Digital Versatile Disc (DVD)
- Multilayered DVD technology increases the
capacity of current optical technology to 18 GB.
- DVD authoring and integration software is used to
create interactive front-end menus for films and
games.
- DVD burners are used for reading discs and for
writing discs in audio, video, and data
formats.
Slide Courtesy, Hung Nguyen
46Multimedia Communications
- Multimedia communications is the delivery of
multimedia to the user by electronic or digitally
manipulated means.
Slide Courtesy, Hung Nguyen