Title: Content-based Multimedia Signal Processing
1Content-based Multimedia Signal Processing
Feng-Chia University, August 11, 2004
- Yu Hen Hu
- University of Wisconsin Madison
- Dept. Electrical and Computer Engineering
- Madison, WI 53706
- hu_at_engr.wisc.edu
2Outline
- Content-based Multimedia Signal Processing
- Content
- Multimedia signal processing
- Potential Applications of CBMSP
- MM representation
- MPEG-7
- Object based organization
- Syntactic structure
- Semantic structure
- MM database
- Query
- User profile
- Relevance feedback
- Similarity measure
- Personalized MM services
- Filtering
- Authoring/composing
- Intelligent surveillance
- Object recognition
- Action recognition
- Event recognition
- Content-assisted MSP
- Post-processing
- coding
3What is Content?
- (Digital) content: the syntactic and semantic
information inherent in a digital material.
- Example: a text document, say an email
- Syntactic content: headers, fields, protocols
- Semantic content: key words, subject, types of
email (information/expression, etc.)
- Example: multimedia documents, movie clips
- Syntactic content: scene cuts, shots
- Semantic content: motion, summary, index,
caption, etc.
4Content-based Multimedia Information Processing
(CBMIP)
- Why do we need to know content?
- Information processing, in terms of creation,
archiving, indexing, delivering, accessing, and
other processing, requires in-depth knowledge of
content to optimize performance.
- What is CBMIP?
- Use content information to enable personalized,
intelligent processing of multimedia information
5Current CBMIP Research
- Structure Analysis
- Shots, scene cut detection, etc.
- Breaking media into atomic units
- Reverse-engineer the media capture and editing
process
- Object processing
- Breaking individual frames into objects
- Object based content description
- Foreground/background separation
- Media skim generation
- Enable browsing with light-weight devices
- Summary, fast forward, preview generation
- Semantic content exploited to match user needs
- Semantic annotation
- Recognize generic semantic content
- Story telling by computer
- Appearance, situation, scene understanding
- Detection, classification, and tracking of
objects and events
Chang, S. F., IEEE Multimedia Magazine, 2002
6CBMIP Research That Makes Sense
- Producing meta-data that is not readily
available
- If editing status is already recorded, there is
no need for reverse-engineering
- Producing meta-data that a human operator has
difficulty generating
- E.g. low-level features, texture, etc.
- Quantified measurements
- Annotating content of large volume and low
individual value
- News archives, etc.: things that accumulate and
whose usefulness is not known in advance
Chang, S. F., IEEE Multimedia Magazine, 2002
7CB Multimedia Information Retrieval
- Multimedia catalog indexing and shopping
- Color and texture matching
- Dining sets, wallpaper, carpet, drapery, cloth,
etc.
- Paintings, stock photos
- 3D shape matching
- Tools, nails, bolts
- Vase, decoration, furniture
- Melody
- Karaoke
- Imitation detection for IP protection
- Logos, trademarks
- Jewelry, paintings, art work
- Songs, poems, articles
8CB Multimedia Surveillance
- Purposes
- Detect intrusions and illegal actions so they can
be stopped on the spot
- Preserve relevant information for future
reference
- Requirements
- Understanding specific content of video/audio
- Semantically meaningful recording and compression
of data
- CB indexing and summary for easy retrieval of
archived information
9Surveillance Applications
- Traffic
- Accidents, congestion, dangerous behavior
- Security
- Airport, train station, public buildings
- Shopping mall, dept store
- Defense
- Border patrol
- Health care
- Home care
- Life sign monitoring
- Smart living space
- Monitoring human activities and taking appropriate
actions
- Emotion, gesture, voice, speech, body movement
recognition
- Suitable for monitoring tasks that have
- Large areas
- Many objects present
- Prolonged durations
10CB Multimedia Authoring
- Given home videos, raw video clips,
- Perform laborious tasks for the human author
- Segment video into shots and scenes
- Annotate individual shots with semantic
descriptions
- Automatically generate a draft of the closed
caption
- Index individual shots with meta data
- Assist user in composing
- Story board
- Associate shots with script
- Formatting individual shots
- Length stretching, shortening
- Transition
- Content manipulation
- E.g. remove that unwanted person in the
background.
11MPEG-7 Overview
- Objective
- Provide interoperability among systems and
applications used in the generation, management,
distribution, and consumption of audio-visual
content descriptions.
- Help users identify, retrieve, or filter
audio-visual information.
- Requirement of Content Descriptors
- Object oriented multilevel abstraction
- Generic applications
- Effective, comprehensive, flexible, extensible,
scalable, and simple - Use XML (extensible markup language)
12Potential Application of MPEG-7
- Summary
- Generation of multimedia program guides or content
summaries
- Generation of content descriptions of A/V archives
to allow seamless exchange among content creators,
aggregators, and consumers.
- Filtering
- Filter and transform multimedia streams in
resource limited environment by matching user
preference, available resource and content
description.
- Retrieval
- Recall music using samples of tunes
- Recall pictures using sketches of shape, color,
movement, or descriptions of a scenario
- Recommendation
- Recommend program materials by matching user
preferences (profile) to program content
- Indexing
- Create family photo or video library index
13Content descriptions
- Descriptors
- MPEG-7 contains standardized descriptors for
audio, visual, and generic content.
- Standardizes how these content features are
characterized, but not how to extract them.
- Different levels of syntactic and semantic
description are available
- Description Scheme (DS)
- Specifies the structure and relations among
different A/V descriptors
- Description Definition Language (DDL)
- Standardized language based on XML (eXtensible
Markup Language) for defining new Ds and DSs, and
extending or modifying existing Ds and DSs.
15Visual Color Descriptors
- Color space: HSV (hue-saturation-value)
- Scalable color descriptor (SCD): color histogram
(uniformly quantized, 256 bins) of an image in HSV,
encoded by a Haar transform.
- Color layout descriptor
- Spatial distribution of color in an arbitrarily
shaped region.
- Dominant color descriptor (DCD)
- Colors are clustered first.
- Color structure descriptor (CSD)
- Scan an 8x8 block in a sliding window, and count
particular colors in the window.
- Group of Frames/Group of Pictures color descriptor
16Visual Texture Descriptor
- Texture Browsing D.
- Regularity
- 0 = irregular ... 3 = periodic
- Directionality
- Up to 2 directions
- 1-6, in 30° increments
- Coarseness
- 0 = fine ... 3 = coarse
- Edge histogram D.
- 16 sub-images
- 5 (edge direction) bins/sub-image
- Homogeneous Texture D. (HTD)
- Divide frequency space into 30 bins (5 radial, 6
angular) - 2D Gabor filter bank applied to each bin
- Energy and energy deviation in each bin computed
to form descriptor.
17Visual Shape Descriptor
- 3D shape D.: shape spectrum
- Histogram (100 bins, 12 bits/bin) of a shape
index, computed over the 3D surface.
- Each shape index measures local convexity.
- Region-based D.: ART
- Angular radial transform
- Shape analysis based on moments
- ART basis:
- Vnm(ρ, θ) = exp(jmθ) Rn(ρ)
- Rn(ρ) = 2 cos(nπρ), n ≠ 0
- Rn(ρ) = 1, n = 0
- Contour-based shape descriptor
- Curvature scale space (CSS)
- N points/curve, successively smoothed by the
kernel [0.25, 0.5, 0.25] until the curve becomes
convex.
- The curvature at each point forms the curvature
at that scale.
- The peaks at each scale are used as features
- 2D/3D descriptors
- Use multiple 2D descriptors to describe 3D shape
18Visual Motion Descriptor
- Motion activity D.
- Intensity
- Direction of activity
- Spatial distribution of activity
- Temporal distribution of activity
- Camera motion
- Panning
- Booming (lift up)
- Tracking
- Tilting
- Zooming
- Rolling (around image center)
- Dollying (backward)
[Diagram: descriptors attached to a video segment,
moving region, and mosaic — camera motion, motion
activity, parametric motion, warping parameters,
and motion trajectory.]
- Warping (w.r.t. mosaic)
- Motion trajectory
19MPEG-7 Audio Content Descriptors
- Spoken content Ds
- Speaker type
- Link type
- Extraction info type
- Confusion info type
- Timbre Ds
- Instrument
- Harmonic instrument
- Percussive instrument
- Melody contour Ds
- Contour
- Meter
- beat
- 4 classes of audio signals
- Pure music
- Pure speech
- Pure sound effect
- Arbitrary sound track
- Audio descriptors
- Silence D: SilenceType
- Sound effect Ds
- Audio Spectrum
- Sound effect features
20Spoken content description
- Goal: to support potentially erroneous decodings
extracted using an automatic speech recognition
(ASR) system, for robust retrieval.
- Spoken content header
- Word lexicon (vocabulary)
- Phone lexicon
- IPA (International Phonetic Alphabet)
- SAMPA (Speech Assessment Methods Phonetic
Alphabet)
- Phone confusion statistics
- Speaker
- Spoken content lattice (word or phone)
- Lattice nodes
- Word and phone links
[Diagram: speech waveform → audio processing → ASR
→ MPEG-7 encoder, producing a header plus a lattice
of alternative word hypotheses, e.g. IS (P=0.7),
HIS (P=0.3), BORE (P=0.6)]
21Multimedia Content Analysis (1)
- What to do with the content features?
- Structure analysis
- Parsing a multimedia object into pre-defined
structures
- Automated organization of information
- Sometimes a reverse-engineering task
- Example
- Parse email into parts
- Parse video into shots, scenes, etc.
- Parse songs into paragraphs, sentences
- Method
- Detecting syntactic, semantic discontinuity
- Fitting into pre-defined meta structure
- Summary and Skimming
- Examples
- Thumbnails of an image
- Preview of a movie
- Abstract of an article
- Challenge
- Extract semantically important information
22Multimedia Content Analysis (2)
- Information filtering
- Examples
- Junk email blocking
- Personalized TV programming
- Challenges
- Matching users' preferences to content
- Information retrieval
- Example
- Find similar pictures, songs
- Challenges
- Similarity measures
- Object and event recognition
- Examples
- Face recognition
- Event detection
- Object tracking
- Challenges
- Temporal and spatial features
- Multi-modality features
23SBD Problem
- Shots
- In professional video, a shot is often taken
while the camera is stationary or in a regular
movement
- The on and off of the camera defines a shot.
Hence, shot boundaries may be recorded during
video acquisition.
- A shot is also the basic unit of video editing.
Hence shot boundaries may be recorded during the
editing process
- Why SBD then?
- Process existing archives of news and TV programs
where shot information is not available
- Process raw video clips where the recorded shots
may not be the best unit for manipulating video
24Shot Boundary Detection
- Shot
- A sequence of frames captured by one camera in a
single continuous action in time and space
- Shot boundary detection
- Temporal segmentation of video
- Types
- Cut: abrupt change
- Gradual: fade-in, fade-out, dissolve
- Methods
- Statistical change detection in a
multi-dimensional time series
- Content features
- Color histogram
- Edge pixels
- Dominant motion vectors and residue errors
- Feature points extraction
- Distance measure
- Feature dependent
- May not be in a norm vector space
25SBD Methods
- Features
- Intensity: absolute difference of intensity
- Robust: # of pixels whose intensity differs by
more than a threshold
- Motion: amount of motion between corresponding
blocks
- Color histogram
- Edges: numbers and positions
- Gradual shot transitions
- Dissolve, fade, wipe, etc.
- Extend over multiple frames
- The discontinuity between two shots is spread
farther apart.
- One way is to model each transition as a separate
shot
- Need to model how the features characterizing
different shots vary for different types of
transition effects.
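As a concrete illustration of the color-histogram feature above, here is a minimal cut-detection sketch; the grayscale histogram, bin count, and threshold are illustrative assumptions, not the settings of the cited systems.

```python
import numpy as np

def color_histogram(frame, bins=16):
    """Normalized intensity histogram of one frame (grayscale for simplicity)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def detect_cuts(frames, threshold=0.5):
    """Flag a cut between frames i-1 and i when the L1 histogram
    distance exceeds a (data-dependent) threshold."""
    cuts = []
    prev = color_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = color_histogram(frames[i])
        if np.abs(cur - prev).sum() > threshold:
            cuts.append(i)
        prev = cur
    return cuts

# Toy sequence: 5 dark frames followed by 5 bright frames -> one cut at index 5.
dark = [np.full((8, 8), 10) for _ in range(5)]
bright = [np.full((8, 8), 240) for _ in range(5)]
print(detect_cuts(dark + bright))  # [5]
```

Real detectors must handle gradual transitions and flashes, as noted above; a single global threshold is the simplest possible choice.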
26Shot Boundary Detection Example
- Dissolve detection
- L1 norm of the 1st and 2nd derivatives of the
image
- Flash
- Sudden intensity increase over 1-2 frames
- Cut detection
- Motion-compensated image difference
http://www-nlpir.nist.gov/projects/tvpubs/papers/clipsimag.paper.pdf
27 2003 TREC-VID Results
- TREC-VID
- Text REtrieval Conference
- Video retrieval evaluation
- Funded by ARDA/NIST
- 2003 test
- 133 hr of MPEG-1 video
- 1998 ABC/CNN news, C-SPAN
- 13 videos, 4.9 GB
- 24 groups participated
- 4 tasks
- Shot boundary detection
- High-level feature extraction (17 features)
- Story segmentation and classification
- Search (manual and interactive)
http://www-nlpir.nist.gov/projects/trecvid/
28TRECVID 2003 Shot Boundary Detection
- Tasks
- Identify video shot transitions
- Classify each identified transition into
- Cut
- Dissolve
- Fadeout/in
- Others
- 596,604 frames were used for the SBD task in 2003,
containing 3734 shot transitions
- Ground truth
- Manually created
- 3734 transitions
- 70% cuts
- 20% dissolves
- 4% fades
- 6% others
- Performance criteria
- Precision
- # of correct transitions / # of transitions
reported
- Recall
- # of correct transitions / # of transitions in
the ground truth
29TRECVID 03 Results for Cuts (Zoomed)
http://www-nlpir.nist.gov/projects/tvpubs/papers/tv3.overview.slides.pdf
30TRECVID 03 Results for Gradual Transitions
http://www-nlpir.nist.gov/projects/tvpubs/papers/tv3.overview.slides.pdf
31Observations
- Most techniques are based on frame-to-frame
comparisons, some with sliding windows
- Comparisons are based mostly on color and on
luminance
- Some use adaptive thresholding, some don't
- Most operate on the decoded video stream
- Some have special treatment of motion during
gradual transitions, of flashes, of camera wipes
- Performance is getting better
http://www-nlpir.nist.gov/projects/tvpubs/papers/tv3.overview.slides.pdf
32Key frame Selection
- Key-frame
- A typical frame of a shot,
- Representing a salient content feature
- May have more than one key-frame per shot
- Method
- Fixed: e.g. the first or middle frame of each shot
- Frames closest to the cluster centers of a shot
sequence
- A new key frame is generated when the video
complexity of the current frame deviates from the
key frame by more than a threshold
- Minima of motion
- Future directions
- Exploit semantic information: recognize faces,
activities
- Higher-level syntactic structures: video editing
effects (switching, split frame, etc.)
33Key Frame Selection Methods
- Approach 1
- Assume video complexity is a function of time, to
be approximated by a piecewise-linear
approximation.
- Key frames are the knot points
- Approach 2
- Cluster video frames
- Close in time indices
- Similar in content
- Question: how many key frames are needed?
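A minimal sketch of Approach 2 (key frames as the frames closest to cluster centers), using a tiny k-means written with NumPy; the toy feature vectors, cluster count, and random initialization are illustrative assumptions.

```python
import numpy as np

def key_frames(features, k=2, iters=20, seed=0):
    """Pick one key frame per cluster: cluster frame feature vectors with
    a small k-means, then return the index of the frame nearest each center."""
    X = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    keys = []
    for j in range(k):
        idx = np.where(labels == j)[0]
        if idx.size:
            d = ((X[idx] - centers[j]) ** 2).sum(-1)
            keys.append(int(idx[d.argmin()]))
    return sorted(keys)

# Toy features: two well-separated groups of frames -> one key frame per group.
feats = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.1], [5.1, 5.0]]
print(key_frames(feats, k=2))
```

In practice the features would be the shot-level content features discussed earlier (color histograms, motion), and the open question on the slide, how many key frames are needed, corresponds to choosing k.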
34Key Frame Selection Example
- Perceived Motion Energy (PME)
- Product of the fraction of dominant motion
vectors and the magnitude of the motion vectors
- Relates to the amount of motion perceived by the
viewer
- Often shows multiple triangles w.r.t. time for a
video
- The peak of each triangle is a key frame
Liu, et al., IEEE Trans. CSVT, Oct 2003
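A small sketch of the PME computation as described above; the direction binning and the plain mean magnitude are illustrative assumptions, not necessarily the exact formulation of Liu et al.

```python
import numpy as np

def perceived_motion_energy(motion_vectors, n_dir_bins=8):
    """PME per the slide: product of the fraction of motion vectors falling
    in the dominant direction and the average motion-vector magnitude.
    (Binning and averaging details here are illustrative assumptions.)"""
    mv = np.asarray(motion_vectors, dtype=float)
    mags = np.hypot(mv[:, 0], mv[:, 1])
    angles = np.arctan2(mv[:, 1], mv[:, 0])
    hist, _ = np.histogram(angles, bins=n_dir_bins, range=(-np.pi, np.pi))
    dominant_fraction = hist.max() / len(mv)
    return dominant_fraction * mags.mean()

# Coherent motion (all vectors the same way): fraction 1.0, PME = mean magnitude.
coherent = [(2.0, 0.0)] * 4
scattered = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
print(perceived_motion_energy(coherent))   # 2.0
print(perceived_motion_energy(scattered) < perceived_motion_energy(coherent))  # True
```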
35Content-based Retrieval
[Diagram: multimedia data enters the input module;
feature extraction populates the feature database,
alongside the image database that stores the media
itself.]
36Multimedia CBR System Design Issues
- Requirement analysis
- How the multimedia materials are to be used
- Determines what set of features are needed.
- Archiving
- How should individual objects be stored?
Granularity?
- Indexing (query) and retrieving
- With multi-dimensional indices, what is an
effective and efficient retrieval method?
- What is a suitable perceptually-consistent
similarity measure?
- User interface
- Modality? Text, spoken language, or others?
- Interactive or batch? Will dialogue be available?
37Indexing and Retrieving
- Index
- A very high dimensional binary vector
- Encoding of content features
- Text-based content can be represented with term
vectors - A/V content features can be either Boolean
vectors or term vectors
- Retrieval
- Retrieval is a pattern classification problem
- Use the index vector as the feature vector
- Classify each object as relevant or irrelevant
to a query vector (template)
- A perceptually consistent similarity measure is
essential
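A minimal sketch of retrieval as ranking over index vectors, using cosine similarity as a (perceptually naive) similarity measure; the document names and vectors are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two index/feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query, index, top=2):
    """Rank stored objects by similarity to the query vector."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top]]

index = {
    "doc_a": [1, 1, 0, 0],
    "doc_b": [0, 1, 1, 0],
    "doc_c": [0, 0, 1, 1],
}
print(retrieve([1, 1, 0, 0], index))  # ['doc_a', 'doc_b']
```

Whether cosine similarity is perceptually consistent depends entirely on the features; that is exactly the design issue the slide raises.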
38Term Vector Query
- Each document is represented by a specific term
vector
- A term is a key word or a phrase
- A term vector is a vector of terms; each
dimension of the vector corresponds to a term.
- Dimension of a term vector = total number of
distinct terms.
- Example
- Set of terms: tree, cake, happy, cry, mother,
father, big, small
- Documents: "Father gives me a big cake. I am so
happy." "Mother planted a small tree."
- Term vectors: (0, 1, 1, 0, 0, 1, 1, 0) and
(1, 0, 0, 0, 1, 0, 0, 1)
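The term-vector construction above can be sketched directly; the trivial tokenizer (split on whitespace, strip punctuation) is an illustrative assumption.

```python
def term_vector(terms, document):
    """Binary term vector: 1 if the term occurs in the document, else 0."""
    words = {w.strip(".,").lower() for w in document.split()}
    return [1 if t in words else 0 for t in terms]

terms = ["tree", "cake", "happy", "cry", "mother", "father", "big", "small"]
s1 = "Father gives me a big cake. I am so happy"
s2 = "mother planted a small tree"
print(term_vector(terms, s1))  # [0, 1, 1, 0, 0, 1, 1, 0]
print(term_vector(terms, s2))  # [1, 0, 0, 0, 1, 0, 0, 1]
```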
39Inverse Term Frequency Vector
- A probabilistic term vector representation.
- Relative term frequency (within a document)
- tf(t,d) = count of term t / # of terms in
document d
- Inverse document frequency
- df(t) = total # of documents / # of documents
containing t
- Weighted term frequency
- d_t = tf(t,d) · log df(t)
- Inverse-document-frequency term vector D = (d_1,
d_2, ...)
40ITF Vector Example
- Document 1 The weather is great these days.
- Document 2 These are great ideas
- Document 3 You look great
- Eliminate stop words: "the", "is", "these",
"are", "you"
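A sketch of the weighted term frequency d_t = tf(t,d) · log df(t) defined on the previous slide, applied to the three example documents; the stop words are removed by hand as on the slide, and the tokenizer is an illustrative assumption.

```python
import math

def tfidf(docs):
    """Weighted term frequency d_t = tf(t,d) * log(N / n_t), following the
    slide's definitions (N = total documents, n_t = documents containing t)."""
    tokenized = [[w.strip(".,").lower() for w in d.split()] for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    n_t = {t: sum(t in toks for toks in tokenized) for t in vocab}
    vectors = []
    for toks in tokenized:
        vec = {t: (toks.count(t) / len(toks)) * math.log(n / n_t[t])
               for t in vocab}
        vectors.append(vec)
    return vectors

# The slide's three documents, stop words already removed.
docs = ["weather great days", "great ideas", "look great"]
vecs = tfidf(docs)
print(round(vecs[0]["great"], 3))    # 0.0  (appears in every document)
print(round(vecs[0]["weather"], 3))  # log(3)/3 ~ 0.366
```

Note how "great", which occurs in all three documents, receives zero weight: it carries no discriminating information for retrieval.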
41Content-based Querying
- Keywords
- Most natural for user
- Semantic abstraction of content
- Meta data
- Ontology needed
- Same thing, same name
- Examples
- Convenient for properties that are difficult to
put into words
- Texture, shape, color
- Requires at least one good example
- Need to specify the features to be matched.
- Icon
- Good for expressing spatial relations
- Limited vocabulary; the need to search for an
appropriate icon may become a retrieval problem
itself
- Example: a composite picture of a suspect
- Sketch
- Needs skill
- Non-standard; the sketch must be recognized first
- Suitable for shape features
42User Models
- User Profiles
- Categorize users using features relevant to tasks
- Static features: age, sex, etc.
- Dynamic features: activity logs, etc.
- Derived features: skill levels, preferences, etc.
- Use of Profiles for HCI
- Adaptation: customize HCI for different
categories of users
- Better understanding of users' needs
43Content-Based Visual Query (1)
- Advantage
- Ease of creating, capturing, and collecting
digital imagery
- Approaches
- Extract significant features (color, texture,
shape, structure)
- Organize feature vectors
- Compute the closeness of the feature vectors
- Retrieve matched or most similar images
44Content-Based Visual Query (2): Improving Efficiency
- Keyword-based search
- Match images with particular subjects and narrow
down the search scope
- Clustering
- Classify images into various categories based on
their contents
- Indexing
- Applied to the image feature vectors to support
efficient access to the database
45Conceptual structure of the meta-search database.
46Object and Event Recognition
- High-level understanding of the context of the
multimedia object.
- Most useful, yet most challenging!
- Relates to vision research, object recognition,
speech recognition, emotion computing, etc.
- Face recognition
- Event detection
- Story segmentation
47Face Recognition
- Challenging tasks
- Lighting and shade variations, pose variations,
time lapse, distance/resolution, disguise,
occlusion
- Pre-processing
- Face detection
- 3D model
- Features
- Eigenfaces
- Elastic bunch graph
- 3D features
- Invariant features
- Classifiers
- Template matching, ML, Bayes, SVM, neural net,
fuzzy logic
- Applications to CBMIP
- Detecting all human faces, or a particular human
face, in images or video sequences
- Identifying all faces belonging to the same person
- Difficulties
- Controlled
- Semi-controlled
- Uncontrolled.
48Multi-AV-Sensor Surveillance Event Mining
http://www.ee.columbia.edu/dvmm/
49CB Video Surveillance Architecture
Foresti, et al, IEEE Trans. MM, Dec. 2002
50Abandoned Object Detection
Foresti, et al, IEEE Trans. MM, Dec. 2002
51Event Detection
Foresti, et al, IEEE Trans. MM, Dec. 2002
52Object Tracking
Foresti, et al, IEEE Trans. MM, Dec. 2002
53Abandoned Object Detection
Foresti, et al, IEEE Trans. MM, Dec. 2002
54Video Recurrent Event Modeling
HHMM: Hierarchical Hidden Markov Model
http://www.ee.columbia.edu/dvmm/
55Content-adaptive Video Streaming
Chang, S. F., IEEE Multimedia Magazine, 2002
56Content-based Music Selection
- Facts
- A personal entertainment system can store
thousands of songs
- Songs are published on-line, giving more
selections
- Users can't remember the title, melody, or lyrics
of every song they like
- Problems
- User: which song to listen to?
- Musician: how to sell a song to a buyer who would
like it?
- Three goals of a personalized music DJ
- Repetition
- Surprise
- Exploitation of catalog
- Combinatorial optimization problem
- Given a set of criteria,
- Select a finite number of songs out of a large
catalog
- To satisfy given constraints
Pachet, et al., IEEE Multimedia Magazine, Jan 2000
57Personalized Music Disk Jockey
- Listener's profile
- Group profile
- Age, etc.
- Specified profile
- Favorite and hated songs
- Content features
- Genre, tempo, musician, instruments, publication
dates, length, etc.
- Motive, chord, melody
- Lyric, title (songs)
- Popularity, prices
- Similarity measures
- Surprise: dissimilar songs
- Constraints
- Based on user profile and query
- Individual objects
- sequences
- Select music objects that meet the constraints
- Generate the sequence from the selected objects
while meeting sequence constraints
- Stochastic optimization methods may be needed.
58Semantic Analysis of Film
Edge detection on pace flow, and corresponding
story and event sections (Titanic)
- Estimate camera pan/tilt parameters from the
video frames
- Estimate pace from camera motion
- Exploit relations between pace boundaries and
story-line boundaries
Adams, et al, IEEE Trans. Multimedia, Dec. 2002
59Detection Results
60ClassView
- Hierarchical video shot classification,
indexing, and access
- Addresses the semantic gap between low-level
video features and high-level semantic concepts.
- Goal
- Classify unlabeled video into semantic labels
so that it can be accessed in a semantically
meaningful way.
- The concepts (labels) are organized in a
tree structure generated by human experts
with the help of WordNet (wordnet.princeton.edu)
- Pre-labeled training samples provide examples
that relate low-level content features to
semantic labels at various levels of the
classification tree.
Fan, et al, IEEE Trans. Multimedia, Feb.2004
61Hierarchical Video Database Model
Fan, et al, IEEE Trans. Multimedia, Feb.2004
62Video News Classification Tree
63Video Cut Detection Results
- Detected scene cuts
- Color histogram difference at different thresholds
64Bottom Up Procedure
65Multimedia summary and filtering
- Summary
- Text: email reading
- Image: caption generation
- Video: highlights, story board
- Issues
- Segmentation
- Clustering of segments
- Labeling clusters
- Associate with syntactic and semantic labels
- Filtering
- Same as retrieval: filter out irrelevant objects
based on a given criterion (query)
- Often needs to be performed based on content
features
- E.g. filtering traffic accidents or law
violations from traffic monitoring videos
66Content based Coding and Post-processing
- Different coding decisions based on low-level
content features
- Coding mode (inter/intra selection)
- Motion estimation
- Object-based coding
- Encoding different regions (VOPs) separately
- Using different coders for different types of
regions
- Multiple abstraction layer coding
- An analysis/synthesis approach
- Synthesize low-level content from a higher-level
abstraction
- E.g. texture synthesis
- Content-based post-processing
- Identify content types and then synthesize the
low-level content
67Conclusion
- Issues related to content-based multimedia
information processing are surveyed.
- The current focus is on low-level content
analysis based on statistical approaches.
- Statistical analysis methods, especially fusion,
are reviewed.
- High-level knowledge-based understanding should
be incorporated into CBMIP algorithms to further
advance the state of the art.
68Statistical Tools for CBMIP
- Hypothesis testing
- Detection
- Pattern classification
- Pattern classifiers
- MAP (Bayes) classifier
- ML classifier based on mixtures of Gaussians
- Rule-based, fuzzy logic
- Decision tree
- Clustering-based: LVQ
- Linear classifier
- With kernels: SVM
- Nearest neighbor
- Multi-layer perceptron
- Information fusion
- Basically a pattern classification task
- Decision fusion vs value fusion
- The key
- Feature selection!
69MAP Maximum A Posteriori Classifier
- The MAP classifier stipulates that the classifier
that minimizes the probability of
mis-classification should choose
- g(x) = c(i) if
- P(c(i)|x) > P(c(i')|x), ∀ i' ≠ i.
- This is an optimal decision rule.
- Unfortunately, in real-world applications, it is
often difficult to estimate P(c(i)|x).
- Fortunately, to derive the optimal MAP decision
rule, one can instead estimate a discriminant
function Gi(x) such that for any x ∈ X, i' ≠ i,
- Gi(x) > Gi'(x) iff
- P(c(i)|x) > P(c(i')|x)
- Gi(x) can be an approximation of P(c(i)|x) or any
function satisfying the above relationship.
70Maximum Likelihood Classifier
- Using Bayes rule,
- p(c(i)|x) = p(x|c(i)) p(c(i)) / p(x).
- Hence the MAP decision rule can be expressed as
- g(x) = c(i) if
- p(c(i)) p(x|c(i)) > p(c(i')) p(x|c(i')), ∀ i' ≠ i.
- Under the assumption that the a priori probability
is unknown, we may assume p(c(i)) = 1/M. As such,
maximizing p(c(i)) p(x|c(i)) is equivalent to
maximizing the likelihood p(x|c(i)).
- The likelihood function p(x|c(i)) may assume a
uni-variate Gaussian model. That is,
- p(x|c(i)) ~ N(μi, σi²)
- μi, σi² can be estimated using the samples
{x : t(x) = c(i)}.
- The a priori probability p(c(i)) can be estimated
as the fraction of training samples labeled c(i).
71Nearest-Neighbor Classifier
- Let y(1), ..., y(n) ∈ X be n samples which
have already been classified. Given a new
sample x, the NN decision rule chooses g(x) =
c(i) if the nearest sample,
y* = arg min_j ||x − y(j)||,
is labeled with c(i).
- As n → ∞, the probability of mis-classification
using the NN classifier is at most twice the
probability of mis-classification of the optimal
(MAP) classifier.
- k-Nearest Neighbor classifier: examine the k
nearest classified samples, and classify x into
the majority class among them.
- Implementation problem: requires large storage
to store ALL the training samples.
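A minimal k-NN sketch for 1-D features, covering both the NN rule (k = 1) and the majority vote over k neighbors; note the full sample set is kept in memory, which is exactly the storage problem mentioned above. The toy samples are made up.

```python
from collections import Counter

def knn_classify(x, samples, k=3):
    """k-NN rule: find the k nearest labeled samples and take a
    majority vote among their class labels."""
    nearest = sorted(samples, key=lambda s: abs(s[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Labeled 1-D samples as (feature, class) pairs.
samples = [(0.0, "a"), (0.2, "a"), (0.3, "a"),
           (5.0, "b"), (5.2, "b"), (5.3, "b")]
print(knn_classify(0.1, samples))        # a
print(knn_classify(5.1, samples))        # b
print(knn_classify(0.1, samples, k=1))   # a  (plain nearest neighbor)
```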
72MLP Classifier
- Each output of an MLP is used to approximate
the a posteriori probability P(c(i)|x) directly.
- The classification decision then amounts to
assigning the feature to the class whose
corresponding MLP output is maximum.
- During training, the classification labels
(1-of-N encoding) are presented as target values
(rather than the true, but unknown, a posteriori
probability).
- Denote yi(x,W) to be the ith output of the MLP,
and ti(x) to be the corresponding target value
(0 or 1) during training.
- Then yi(x,W) will approximate E(ti(x)|x) =
P(c(i)|x).
73Optimal Hyper-plane Linearly Separable Case
- The optimal hyper-plane should be in the center
of the gap.
- Support vectors: the samples on the boundaries.
The support vectors alone can determine the
optimal hyper-plane.
- Question: how to find the optimal hyper-plane?
- For di = +1, g(xi) = wTxi + b ≥ 1, i.e.
woTxi + bo ≥ +1
- For di = −1, g(xi) = wTxi + b ≤ −1, i.e.
woTxi + bo ≤ −1
74Quadratic Optimization Problem Formulation
- Given {(xi, di), i = 1 to N}, find w and b such
that
- Φ(w) = wTw/2
- is minimized subject to the N constraints
- di (wTxi + b) − 1 ≥ 0, 1 ≤ i ≤ N.
- Method of Lagrange multipliers:
- J(w, b, α) = wTw/2 − Σi αi [di (wTxi + b) − 1]
75Formulating the Dual Problem
- At the saddle point, we have ∂J/∂w = 0 ⇒
w = Σi αi di xi, and ∂J/∂b = 0 ⇒ Σi αi di = 0.
Substituting these relations into J, we have the
- Dual problem: maximize
Q(α) = Σi αi − (1/2) Σi Σj αi αj di dj xiTxj
subject to Σi αi di = 0 and αi ≥ 0 for i = 1,
2, ..., N.
- Note: the data appear in the dual only through
the inner products xiTxj.
76Inner Product Kernels
In general, the input is first transformed via
a set of nonlinear functions φj(x) and then
subjected to the hyperplane classifier
g(x) = Σj wj φj(x) + b.
Define the inner-product kernel as
K(x, xi) = φ(x)Tφ(xi);
one then obtains a dual optimization problem of
the same form, with xiTxj replaced by K(xi, xj):
maximize Q(α) = Σi αi − (1/2) Σi Σj αi αj di dj K(xi, xj).
Often, dim of φ (= p+1) > dim of x!
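A sketch of the kernel decision function implied by the dual formulation above; the polynomial kernel and the hand-picked (not optimized) multipliers are illustrative assumptions, used only to show the shape of g(x) = Σi αi di K(xi, x) + b.

```python
import numpy as np

def poly_kernel(x, y, p=2):
    """Inner-product kernel K(x, y) = (x.y + 1)^p, a standard choice
    that implicitly defines the nonlinear mapping phi."""
    return (np.dot(x, y) + 1.0) ** p

def decision(x, support, alphas, labels, b=0.0):
    """SVM decision g(x) = sum_i alpha_i d_i K(x_i, x) + b; alphas and b
    are assumed to come from solving the dual problem."""
    return sum(a * d * poly_kernel(s, x)
               for a, d, s in zip(alphas, labels, support)) + b

# Toy 1-D example with hand-picked multipliers, just to show the form.
support = [np.array([1.0]), np.array([-1.0])]
labels = [1, -1]
alphas = [0.5, 0.5]
print(decision(np.array([2.0]), support, alphas, labels))  # 4.0 -> class +1
```

The point of the kernel trick is visible here: only K(xi, x) is ever evaluated, never the (possibly much higher-dimensional) mapping φ itself.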
77Information Fusion
- What is fusion?
- Decision-making (hypothesis testing) and value
computation (estimation) based on two or more
information sources.
- E.g. speech recognition based on audio,
lip-reading, and gesture
- Why fusion?
- It is easier to process individual information
sources separately
- Transmitting raw data to the fusion center is
too costly
- Types of fusion
- Decision fusion vs. value fusion
- Stacked generalization vs. mixture of experts
78Decision Fusion
[Diagram: each member classifier 1..K observes the
input x over a high-data-rate channel and emits a
local decision dk over a low-data-rate channel;
the fusion center combines the decision vector
d = (d1, d2, ..., dK)T into a fused decision.]
79Decision Fusion Methods
- Weighted linear combination methods
- Assume the dk are independent
- Majority voting
- Weighted voting
- Follow the leader
- Stacked generalization
- Fusion as pattern classification
- Optimal decision fusion (ODF)
- Behavioral knowledge space [Huang95]
- Table look-up method
- CODF
- Complements ODF with a weighted linear
combination method
- Handles the case of few training samples better
80Optimal Decision Fusion
- If dk takes N labels and there are K member
classifiers, there are at most N^K different
decision vectors d.
- A table assigning each of the N^K possible d to a
class label may yield the optimal decision fusion
that minimizes the probability of
mis-classification.
- When the table is constructed using training data
samples, this method is called the BKS method by
Huang and Suen [Huang95].
- Practical difficulties
- The table becomes too large when N^K is large
- Some entries have too few training samples
- CODF
- Complements ODF with a weighted linear
combination method.
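A minimal sketch of the BKS-style table look-up, with a fallback rule for unseen or sparse entries (the CODF idea of complementing the table with a simpler combination method); the toy decision vectors are made up.

```python
from collections import Counter, defaultdict

def build_bks_table(decision_vectors, true_labels):
    """BKS-style look-up: map each observed decision vector d = (d1..dK)
    to the majority true label among training samples producing it."""
    cells = defaultdict(Counter)
    for d, label in zip(decision_vectors, true_labels):
        cells[tuple(d)][label] += 1
    return {d: counts.most_common(1)[0][0] for d, counts in cells.items()}

def fuse(table, d, fallback):
    """Table look-up; fall back (e.g. to majority voting) for decision
    vectors never seen in training."""
    return table.get(tuple(d), fallback(d))

train_d = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1)]
train_y = [0, 0, 0, 1, 1]
table = build_bks_table(train_d, train_y)
majority = lambda d: Counter(d).most_common(1)[0][0]
print(fuse(table, (0, 1), majority))  # 0  (learned from the table)
print(fuse(table, (1, 0), majority))  # unseen vector -> fallback vote
```

With K classifiers and N labels the table has up to N^K cells, which makes the sparse-cell problem on the slide concrete: most cells receive no training sample at all.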
81Weighted linear combination methods
- General solution
- Perceptron learning problem
- If not linearly separable, it will not converge
- Relax the constraint
- Replace the nonlinear step function by a
sigmoidal function or slope function
- Support vector machine (SVM)
- Back-propagation learning
- Least-squares estimation
- Majority voting
- wk = 1, θ = K/2
- Threshold voting
- wk = 1, θ ≠ K/2
- Following the leader
- wk* = 1; wk = 0 for k ≠ k*; θ = 0
- where classifier k* has the best performance
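The voting rules above can be sketched as one weighted-linear-combination function; binary local decisions dk ∈ {0, 1} and the strict-threshold convention are illustrative assumptions.

```python
def weighted_vote(decisions, weights=None, theta=None):
    """Weighted linear combination of binary local decisions d_k in {0, 1}:
    declare 1 when sum_k w_k * d_k > theta.  With w_k = 1 and theta = K/2
    this is majority voting, as on the slide."""
    k = len(decisions)
    weights = weights or [1.0] * k
    theta = k / 2 if theta is None else theta
    return 1 if sum(w * d for w, d in zip(weights, decisions)) > theta else 0

print(weighted_vote([1, 1, 0]))                     # 1  (majority says yes)
print(weighted_vote([1, 0, 0]))                     # 0
print(weighted_vote([1, 0, 0], weights=[3, 1, 1]))  # 1  ("follow the leader")
```

The last call shows "following the leader" as a special case: giving one classifier a dominant weight makes its decision carry the vote.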
82Mixture of Expert
- z(x) = Σi gi(x) di(x), where Σi gi(x) = 1 and
0 ≤ gi(x) ≤ 1
[Diagram: member classifiers 1..K map the input x
to expert outputs d1, ..., dK; a gating network
maps x to weights g1, ..., gK that scale the expert
outputs before they are summed into z(x).]
83Mixture of Expert
- Gating network
- Located at the fusion center,
- based on raw data ⇒ requires communication
resources
- For each x, minimize ||T(x) − Σk dk(x) gk(x)||
subject to Σk gk(x) = 1 and 0 ≤ gk(x) ≤ 1;
ideally gk(x) = 1 if dk(x) = T(x).
- Given gk(x) for all x in the training set, dk(x)
can be determined to train the member classifiers.
- Iterative training using the EM algorithm
- Fix gk, train dk
- Fix dk, train gk
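A minimal sketch of the gated combination z(x) = Σi gi(x) di(x); the softmax gate with fixed, hand-chosen weights is an illustrative stand-in for a trained gating network.

```python
import numpy as np

def softmax(z):
    """Softmax enforces the slide's constraints: outputs sum to 1
    and each lies in [0, 1]."""
    e = np.exp(z - z.max())
    return e / e.sum()

def mixture_output(x, experts, gate_weights):
    """z(x) = sum_i g_i(x) * d_i(x); the gate is a linear map followed
    by a softmax (an illustrative stand-in for a trained network)."""
    g = softmax(gate_weights @ x)
    d = np.array([expert(x) for expert in experts])
    return float(g @ d)

# Two toy experts: one suited to small inputs, one to large inputs.
experts = [lambda x: 0.0, lambda x: 1.0]
gate_weights = np.array([[-4.0], [4.0]])  # gate favors expert 2 as x grows
print(round(mixture_output(np.array([2.0]), experts, gate_weights), 4))
```

The EM-style alternation on the slide corresponds to fitting `gate_weights` with the experts fixed, then refitting the experts with the gate fixed.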