Title: Content-based Multimedia Signal Processing
1Content-based Multimedia Signal Processing
Feng-Chia University, August 11, 2004
- Yu Hen Hu
- University of Wisconsin Madison
- Dept. Electrical and Computer Engineering
- Madison, WI 53706
- hu_at_engr.wisc.edu
2Outline
- Content-based Multimedia Signal Processing
- Content
- Multimedia signal processing
- Potential Applications of CBMSP
- MM representation
- MPEG-7
- Object based organization
- Syntactic structure
- Semantic structure
- MM database
- Query
- User profile
- Relevance feedback
- Similarity measure
- Personalized MM services
- Filtering
- Authoring/composing
- Intelligent surveillance
- Object recognition
- Action recognition
- Event recognition
- Content-assisted MSP
- Post-processing
- coding
3What is Content?
- (Digital) content: the syntactic and semantic
information inherent in a digital material.
- Example: a text document, say an email
- Syntactic content: headers, fields, protocols
- Semantic content: key words, subject, types of
email (information/expression, etc.)
- Example: multimedia documents, movie clips
- Syntactic content: scene cuts, shots
- Semantic content: motion, summary, index,
caption, etc.
4Content-based Multimedia Information Processing
(CBMIP)
- Why do we need to know content?
- Information processing, in terms of creation,
archiving, indexing, delivering, accessing, and
other processing, requires in-depth knowledge of
content to optimize performance.
- What is CBMIP?
- Use content information to enable personalized,
intelligent processing of multimedia information
5Current CBMIP Research
- Structure Analysis
- Shots, scene cut detection, etc.
- Breaking media into atomic units
- Reverse-engineer the media capture and editing
process
- Object processing
- Breaking individual frames into objects
- Object based content description
- Foreground/background separation
- Media skim generation
- Enable browsing with light-weight devices
- Summary, fast forward, preview generation
- Semantic content exploited to match user needs
- Semantic annotation
- Recognize generic semantic content
- Story telling by computer
- Appearance, situation, scene understanding
- Detection, classification, and tracking of
objects and events
Chang, S. F., IEEE Multimedia Magazine, 2002
6CBMIP Research That Makes Sense
- Producing meta-data that is not readily
available
- If editing status is already recorded, there is
no need for reverse-engineering
- Producing meta-data that a human operator has
difficulty generating
- E.g. low-level features, texture, etc.
- Quantified measurements
- Annotating content of large volume and low
individual value
- News archives, etc.: things that accumulate and
whose usefulness is not known in advance
Chang, S. F., IEEE Multimedia Magazine, 2002
7CB Multimedia Information Retrieval
- Multimedia catalog indexing and shopping
- Color and texture matching
- Dining sets, wallpaper, carpet, drapery, cloth,
etc.
- Paintings, stock photos
- 3D shape matching
- Tools, nails, bolts
- Vase, decoration, furniture
- Melody
- Karaoke
- Imitation detection for IP protection
- Logos, trademarks
- Jewelry, paintings, art work
- Songs, poems, articles
8CB Multimedia Surveillance
- Purposes
- Detect intrusions and illegal actions so they can
be stopped on the spot
- Preserve relevant information for future
reference
- Requirements
- Understanding specific content of video/audio
- Semantically meaningful recording and compression
of data
- CB indexing and summary for easy retrieval of
archived information
9Surveillance Applications
- Traffic
- Accidents, congestion, dangerous behavior
- Security
- Airport, train station, public buildings
- Shopping mall, dept store
- Defense
- Border patrol
- Health care
- Home care
- Life sign monitoring
- Smart living space
- Monitoring human activities and taking appropriate
actions
- Emotion, gesture, voice, speech, body movement
recognition
- Suitable for monitoring tasks that have
- Large areas
- Many objects present
- Prolonged durations
10CB Multimedia Authoring
- Given home videos, raw video clips,
- Perform laborious tasks for the human author
- Segment video into shots and scenes
- Annotate individual shots with semantic
descriptions
- Automatically generate a draft of the closed
caption
- Index individual shots with meta data
- Assist user in composing
- Story board
- Associate shots with script
- Formatting individual shots
- Length stretching, shortening
- Transition
- Content manipulation
- E.g. remove that unwanted person in the
background.
11MPEG-7 Overview
- Objective
- Provide interoperability among systems and
applications used in the generation, management,
distribution, and consumption of audio-visual
content descriptions.
- Help users identify, retrieve, or filter
audio-visual information.
- Requirement of Content Descriptors
- Object oriented multilevel abstraction
- Generic applications
- Effective, comprehensive, flexible, extensible,
scalable, and simple - Use XML (extensible markup language)
12Potential Application of MPEG-7
- Summary
- Generation of multimedia program guides or content
summaries
- Generation of content descriptions of A/V archives
to allow seamless exchange among content creators,
aggregators, and consumers.
- Filtering
- Filter and transform multimedia streams in
resource limited environment by matching user
preference, available resource and content
description.
- Retrieval
- Recall music using samples of tunes
- Recall pictures using sketches of shape, color,
movement, or descriptions of a scenario
- Recommendation
- Recommend program materials by matching user
preferences (profile) to program content
- Indexing
- Create family photo or video library index
13Content descriptions
- Descriptors
- MPEG-7 contains standardized descriptors for
audio, visual, and generic content.
- Standardizes how these content features are
characterized, but not how to extract them.
- Different levels of syntactic and semantic
description are available
- Description Scheme (DS)
- Specifies the structure and relations among
different A/V descriptors
- Description Definition Language (DDL)
- Standardized language based on XML (eXtensible
Markup Language) for defining new Ds and DSs, and
extending or modifying existing Ds and DSs.
15Visual Color Descriptors
- Color space: HSV (hue-saturation-value)
- Scalable color descriptor (SCD): color histogram
(uniformly quantized, 256 bins) of an image in HSV,
encoded by a Haar transform.
- Color layout descriptor
- Spatial distribution of color in an arbitrarily
shaped region.
- Dominant color descriptor (DCD)
- Colors are clustered first.
- Color structure descriptor (CSD)
- Scan an 8x8 block in a sliding window, and count
particular colors in the window.
- Group of Frames/Group of Pictures color descriptor
16Visual Texture Descriptor
- Texture Browsing D.
- Regularity
- 0 = irregular ... 3 = periodic
- Directionality
- Up to 2 directions
- 1-6, in 30° increments
- Coarseness
- 0 = fine ... 3 = coarse
- Edge histogram D.
- 16 sub-images
- 5 (edge direction) bins/sub-image
- Homogeneous Texture D. (HTD)
- Divide frequency space into 30 bins (5 radial, 6
angular) - 2D Gabor filter bank applied to each bin
- Energy and energy deviation in each bin computed
to form descriptor.
17Visual Shape Descriptor
- 3D shape D.: shape spectrum
- Histogram (100 bins, 12 bits/bin) of a shape
index, computed over the 3D surface.
- Each shape index measures local convexity.
- Region-based D.: ART
- Angular radial transform
- Shape analysis based on moments
- ART basis:
- Vnm(ρ, θ) = exp(jmθ) Rn(ρ)
- Rn(ρ) = 2 cos(nπρ), n ≠ 0
- Rn(ρ) = 1, n = 0
- Contour-based shape descriptor
- Curvature scale space (CSS)
- N points/curve, successively smoothed by the
kernel [0.25, 0.5, 0.25] until the curve becomes
convex.
- The curvature at each point forms the curvature
at that scale.
- The peaks at each scale are used as features
- 2D/3D descriptors
- Use multiple 2D descriptors to describe 3D shape
18Visual Motion Descriptor
- Motion activity D.
- Intensity
- Direction of activity
- Spatial distribution of activity
- Temporal distribution of activity
- Camera motion
- Panning
- Booming (lift up)
- Tracking
- Tilting
- Zooming
- Rolling (around image center)
- Dollying (backward)
[Diagram: descriptors attached to a video segment,
moving region, and mosaic — camera motion, motion
activity, parametric motion, warping parameters,
and motion trajectory.]
- Warping (w.r.t. mosaic)
- Motion trajectory
19MPEG-7 Audio Content Descriptors
- Spoken content Ds
- Speaker type
- Link type
- Extraction info type
- Confusion info type
- Timbre Ds
- Instrument
- Harmonic instrument
- Percussive instrument
- Melody contour Ds
- Contour
- Meter
- beat
- 4 classes of audio signals
- Pure music
- Pure speech
- Pure sound effect
- Arbitrary sound track
- Audio descriptors
- Silence D: SilenceType
- Sound effect Ds
- Audio Spectrum
- Sound effect features
20Spoken content description
- Goal: to support potentially erroneous decodings
extracted using an automatic speech recognition
(ASR) system, for robust retrieval.
- Spoken content header
- Word lexicon (vocabulary)
- Phone lexicon
- IPA (International Phonetic Alphabet)
- SAMPA (Speech Assessment Methods Phonetic
Alphabet)
- Phone confusion statistics
- Speaker
- Spoken content lattice (word or phone)
- Lattice nodes
- Word and phone links
[Diagram: speech waveform → audio processing → ASR
→ MPEG-7 encoder, producing a header plus a lattice
of alternative word hypotheses, e.g. IS (P=0.7),
HIS (P=0.3), BORE (P=0.6)]
21Multimedia Content Analysis (1)
- What to do with the content features?
- Structure analysis
- Parsing a multimedia object into pre-defined
structures
- Automated organization of information
- Sometimes a reverse-engineering task
- Example
- Parse email into parts
- Parse video into shots, scenes, etc.
- Parse songs into paragraphs, sentences
- Method
- Detecting syntactic, semantic discontinuity
- Fitting into pre-defined meta structure
- Summary and Skimming
- Examples
- Thumbnails of an image
- Preview of a movie
- Abstract of an article
- Challenge
- Extract semantically important information
22Multimedia Content Analysis (2)
- Information filtering
- Examples
- Junk email blocking
- Personalized TV programming
- Challenges
- Matching users' preferences to content
- Information retrieval
- Example
- Find similar pictures, songs
- Challenges
- Similarity measures
- Object and event recognition
- Examples
- Face recognition
- Event detection
- Object tracking
- Challenges
- Temporal and spatial features
- Multi-modality features
23SBD Problem
- Shots
- In professional video, a shot is often taken
while the camera is stationary or in a regular
movement
- The on and off of the camera defines a shot.
Hence, shot boundaries may be recorded during
video acquisition.
- A shot is also the basic unit of video editing.
Hence shot boundaries may be recorded during the
editing process
- Why SBD then?
- Process existing archives of news and TV programs
where shot information is not available
- Process raw video clips where the recorded shots
may not be the best unit for manipulating video
24Shot Boundary Detection
- Shot
- A sequence of frames captured by one camera in a
single continuous action in time and space
- Shot boundary detection
- Temporal segmentation of video
- Types
- Cut: abrupt change
- Gradual: fade-in, fade-out, dissolve
- Methods
- Statistical change detection in a
multi-dimensional time series
- Content features
- Color histogram
- Edge pixels
- Dominant motion vectors and residue errors
- Feature points extraction
- Distance measure
- Feature dependent
- May not be in a norm vector space
25SBD Methods
- Features
- Intensity: absolute difference of intensity
- Robust: # of pixels whose intensity differs by
more than a threshold
- Motion: amount of motion between corresponding
blocks
- Color histogram
- Edges: numbers and positions
- Gradual shot transitions
- Dissolve, fade, wipe, etc.
- Extend over multiple frames
- The discontinuity between two shots is spread
farther apart.
- One way is to model each transition as a separate
shot
- Need to model how the features characterizing
different shots vary for different types of
transition effects.
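As a concrete illustration of the color-histogram feature above, here is a minimal cut-detection sketch; the grayscale histogram, bin count, and threshold are illustrative assumptions, not the settings of the cited systems.

```python
import numpy as np

def color_histogram(frame, bins=16):
    """Normalized intensity histogram of one frame (grayscale for simplicity)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def detect_cuts(frames, threshold=0.5):
    """Flag a cut between frames i-1 and i when the L1 histogram
    distance exceeds a (data-dependent) threshold."""
    cuts = []
    prev = color_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = color_histogram(frames[i])
        if np.abs(cur - prev).sum() > threshold:
            cuts.append(i)
        prev = cur
    return cuts

# Toy sequence: 5 dark frames followed by 5 bright frames -> one cut at index 5.
dark = [np.full((8, 8), 10) for _ in range(5)]
bright = [np.full((8, 8), 240) for _ in range(5)]
print(detect_cuts(dark + bright))  # [5]
```

Real detectors must handle gradual transitions and flashes, as noted above; a single global threshold is the simplest possible choice.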
26Shot Boundary Detection Example
- Dissolve detection
- L1 norm of the 1st and 2nd derivatives of the
image
- Flash
- Sudden intensity increase over 1-2 frames
- Cut detection
- Motion-compensated image difference
http://www-nlpir.nist.gov/projects/tvpubs/papers/clipsimag.paper.pdf
27 2003 TREC-VID Results
- TREC-VID
- Text REtrieval Conference
- Video retrieval evaluation
- Funded by ARDA/NIST
- 2003 test
- 133 hr of MPEG-1 video
- 1998 ABC/CNN news, C-SPAN
- 13 videos, 4.9 GB
- 24 groups participated
- 4 tasks
- Shot boundary detection
- High-level feature extraction (17 features)
- Story segmentation and classification
- Search (manual and interactive)
http://www-nlpir.nist.gov/projects/trecvid/
28TRECVID 2003 Shot Boundary Detection
- Tasks
- Identify video shot transitions
- Classify each identified transition into
- Cut
- Dissolve
- Fadeout/in
- Others
- 596,604 frames were used for the SBD task in 2003,
containing 3734 shot transitions
- Ground truth
- Manually created
- 3734 transitions
- 70% cuts
- 20% dissolves
- 4% fades
- 6% others
- Performance criteria
- Precision
- # of correct transitions / # of transitions
reported
- Recall
- # of correct transitions / # of transitions in
the ground truth
29TRECVID 03 Results for Cuts (Zoomed)
http://www-nlpir.nist.gov/projects/tvpubs/papers/tv3.overview.slides.pdf
30TRECVID 03 Results for Gradual Transitions
http://www-nlpir.nist.gov/projects/tvpubs/papers/tv3.overview.slides.pdf
31Observations
- Most techniques are based on frame-to-frame
comparisons, some with sliding windows
- Comparisons are based mostly on color and on
luminance
- Some use adaptive thresholding, some don't
- Most operate on the decoded video stream
- Some have special treatment of motion during
gradual transitions, of flashes, of camera wipes
- Performance is getting better
http://www-nlpir.nist.gov/projects/tvpubs/papers/tv3.overview.slides.pdf
32Key frame Selection
- Key-frame
- A typical frame of a shot,
- Representing a salient content feature
- May have more than one key-frame per shot
- Method
- Fixed: e.g. the first or middle frame of each shot
- Frames closest to the cluster centers of a shot
sequence
- A new key frame is generated when the video
complexity of the current frame deviates from the
key frame by more than a threshold
- Minima of motion
- Future directions
- Exploit semantic information: recognize faces,
activities
- Higher-level syntactic structures: video editing
effects (switching, split frame, etc.)
33Key Frame Selection Methods
- Approach 1
- Assume video complexity is a function of time, to
be approximated by a piecewise-linear
approximation.
- Key frames are the knot points
- Approach 2
- Cluster video frames
- Close in time indices
- Similar in content
- Question: how many key frames are needed?
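A minimal sketch of Approach 2 (key frames as the frames closest to cluster centers), using a tiny k-means written with NumPy; the toy feature vectors, cluster count, and random initialization are illustrative assumptions.

```python
import numpy as np

def key_frames(features, k=2, iters=20, seed=0):
    """Pick one key frame per cluster: cluster frame feature vectors with
    a small k-means, then return the index of the frame nearest each center."""
    X = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    keys = []
    for j in range(k):
        idx = np.where(labels == j)[0]
        if idx.size:
            d = ((X[idx] - centers[j]) ** 2).sum(-1)
            keys.append(int(idx[d.argmin()]))
    return sorted(keys)

# Toy features: two well-separated groups of frames -> one key frame per group.
feats = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.1], [5.1, 5.0]]
print(key_frames(feats, k=2))
```

In practice the features would be the shot-level content features discussed earlier (color histograms, motion), and the open question on the slide, how many key frames are needed, corresponds to choosing k.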
34Key Frame Selection Example
- Perceived Motion Energy (PME)
- Product of the fraction of dominant motion
vectors and the magnitude of the motion vectors
- Relates to the amount of motion perceived by the
viewer
- Often shows multiple triangles w.r.t. time for a
video
- The peak of each triangle is a key frame
Liu, et al., IEEE Trans. CSVT, Oct 2003
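A small sketch of the PME computation as described above; the direction binning and the plain mean magnitude are illustrative assumptions, not necessarily the exact formulation of Liu et al.

```python
import numpy as np

def perceived_motion_energy(motion_vectors, n_dir_bins=8):
    """PME per the slide: product of the fraction of motion vectors falling
    in the dominant direction and the average motion-vector magnitude.
    (Binning and averaging details here are illustrative assumptions.)"""
    mv = np.asarray(motion_vectors, dtype=float)
    mags = np.hypot(mv[:, 0], mv[:, 1])
    angles = np.arctan2(mv[:, 1], mv[:, 0])
    hist, _ = np.histogram(angles, bins=n_dir_bins, range=(-np.pi, np.pi))
    dominant_fraction = hist.max() / len(mv)
    return dominant_fraction * mags.mean()

# Coherent motion (all vectors the same way): fraction 1.0, PME = mean magnitude.
coherent = [(2.0, 0.0)] * 4
scattered = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
print(perceived_motion_energy(coherent))   # 2.0
print(perceived_motion_energy(scattered) < perceived_motion_energy(coherent))  # True
```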
35Content-based Retrieval
[Diagram: multimedia data enters the input module;
feature extraction populates the feature database,
alongside the image database that stores the media
itself.]
36Multimedia CBR System Design Issues
- Requirement analysis
- How the multimedia materials are to be used
- Determines what set of features are needed.
- Archiving
- How should individual objects be stored?
Granularity?
- Indexing (query) and retrieving
- With multi-dimensional indices, what is an
effective and efficient retrieval method?
- What is a suitable perceptually-consistent
similarity measure?
- User interface
- Modality? Text, spoken language, or others?
- Interactive or batch? Will dialogue be available?
37Indexing and Retrieving
- Index
- A very high dimensional binary vector
- Encoding of content features
- Text-based content can be represented with term
vectors - A/V content features can be either Boolean
vectors or term vectors
- Retrieval
- Retrieval is a pattern classification problem
- Use the index vector as the feature vector
- Classify each object as relevant or irrelevant
to a query vector (template)
- A perceptually consistent similarity measure is
essential
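A minimal sketch of retrieval as ranking over index vectors, using cosine similarity as a (perceptually naive) similarity measure; the document names and vectors are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two index/feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query, index, top=2):
    """Rank stored objects by similarity to the query vector."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top]]

index = {
    "doc_a": [1, 1, 0, 0],
    "doc_b": [0, 1, 1, 0],
    "doc_c": [0, 0, 1, 1],
}
print(retrieve([1, 1, 0, 0], index))  # ['doc_a', 'doc_b']
```

Whether cosine similarity is perceptually consistent depends entirely on the features; that is exactly the design issue the slide raises.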
38Term Vector Query
- Each document is represented by a specific term
vector
- A term is a key word or a phrase
- A term vector is a vector of terms; each
dimension of the vector corresponds to a term.
- Dimension of a term vector = total number of
distinct terms.
- Example
- Set of terms: tree, cake, happy, cry, mother,
father, big, small
- Documents: "Father gives me a big cake. I am so
happy." "Mother planted a small tree."
- Term vectors: (0, 1, 1, 0, 0, 1, 1, 0) and
(1, 0, 0, 0, 1, 0, 0, 1)
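The term-vector construction above can be sketched directly; the trivial tokenizer (split on whitespace, strip punctuation) is an illustrative assumption.

```python
def term_vector(terms, document):
    """Binary term vector: 1 if the term occurs in the document, else 0."""
    words = {w.strip(".,").lower() for w in document.split()}
    return [1 if t in words else 0 for t in terms]

terms = ["tree", "cake", "happy", "cry", "mother", "father", "big", "small"]
s1 = "Father gives me a big cake. I am so happy"
s2 = "mother planted a small tree"
print(term_vector(terms, s1))  # [0, 1, 1, 0, 0, 1, 1, 0]
print(term_vector(terms, s2))  # [1, 0, 0, 0, 1, 0, 0, 1]
```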
39Inverse Term Frequency Vector
- A probabilistic term vector representation.
- Relative term frequency (within a document)
- tf(t,d) = count of term t / # of terms in
document d
- Inverse document frequency
- df(t) = total # of documents / # of documents
containing t
- Weighted term frequency
- d_t = tf(t,d) · log df(t)
- Inverse-document-frequency term vector D = (d_1,
d_2, ...)
40ITF Vector Example
- Document 1 The weather is great these days.
- Document 2 These are great ideas
- Document 3 You look great
- Eliminate stop words: "the", "is", "these",
"are", "you"
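A sketch of the weighted term frequency d_t = tf(t,d) · log df(t) defined on the previous slide, applied to the three example documents; the stop words are removed by hand as on the slide, and the tokenizer is an illustrative assumption.

```python
import math

def tfidf(docs):
    """Weighted term frequency d_t = tf(t,d) * log(N / n_t), following the
    slide's definitions (N = total documents, n_t = documents containing t)."""
    tokenized = [[w.strip(".,").lower() for w in d.split()] for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    n_t = {t: sum(t in toks for toks in tokenized) for t in vocab}
    vectors = []
    for toks in tokenized:
        vec = {t: (toks.count(t) / len(toks)) * math.log(n / n_t[t])
               for t in vocab}
        vectors.append(vec)
    return vectors

# The slide's three documents, stop words already removed.
docs = ["weather great days", "great ideas", "look great"]
vecs = tfidf(docs)
print(round(vecs[0]["great"], 3))    # 0.0  (appears in every document)
print(round(vecs[0]["weather"], 3))  # log(3)/3 ~ 0.366
```

Note how "great", which occurs in all three documents, receives zero weight: it carries no discriminating information for retrieval.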
41Content-based Querying
- Keywords
- Most natural for user
- Semantic abstraction of content
- Meta data
- Ontology needed
- Same thing, same name
- Examples
- Convenient for properties that are difficult to
put into words
- Texture, shape, color
- Requires at least one good example
- Need to specify the features to be matched.
- Icon
- Good for expressing spatial relations
- Limited vocabulary; the need to search for an
appropriate icon may become a retrieval problem
itself
- Example: a composite picture of a suspect
- Sketch
- Needs skill
- Non-standard; the sketch must be recognized first
- Suitable for shape features
42User Models
- User Profiles
- Categorize users using features relevant to tasks
- Static features: age, sex, etc.
- Dynamic features: activity logs, etc.
- Derived features: skill levels, preferences, etc.
- Use of Profiles for HCI
- Adaptation: customize HCI for different
categories of users
- Better understanding of users' needs
43Content-Based Visual Query (1)
- Advantage
- Ease of creating, capturing, and collecting
digital imagery
- Approaches
- Extract significant features (color, texture,
shape, structure)
- Organize feature vectors
- Compute the closeness of the feature vectors
- Retrieve matched or most similar images
44Content-Based Visual Query (2): Improving Efficiency
- Keyword-based search
- Match images with particular subjects and narrow
down the search scope
- Clustering
- Classify images into various categories based on
their contents
- Indexing
- Applied to the image feature vectors to support
efficient access to the database
45Conceptual structure of the meta-search database.
46Object and Event Recognition
- High-level understanding of the context of the
multimedia object.
- Most useful, yet most challenging!
- Relates to vision research, object recognition,
speech recognition, emotion computing, etc.
- Face recognition
- Event detection
- Story segmentation
47Face Recognition
- Challenging tasks
- Lighting and shade variations, pose variations,
time lapse, distance/resolution, disguise,
occlusion
- Pre-processing
- Face detection
- 3D model
- Features
- Eigenfaces
- Elastic bunch graph
- 3D features
- Invariant features
- Classifiers
- Template matching, ML, Bayes, SVM, neural net,
fuzzy logic
- Applications to CBMIP
- Detecting all human faces, or a particular human
face, in images or video sequences
- Identifying all faces belonging to the same person
- Difficulties
- Controlled
- Semi-controlled
- Uncontrolled.
48Multi-AV-Sensor Surveillance Event Mining
http://www.ee.columbia.edu/dvmm/
49CB Video Surveillance Architecture
Foresti, et al, IEEE Trans. MM, Dec. 2002
50Abandoned Object Detection
Foresti, et al, IEEE Trans. MM, Dec. 2002
51Event Detection
Foresti, et al, IEEE Trans. MM, Dec. 2002
52Object Tracking
Foresti, et al, IEEE Trans. MM, Dec. 2002
53Abandoned Object Detection
Foresti, et al, IEEE Trans. MM, Dec. 2002
54Video Recurrent Event Modeling
HHMM: Hierarchical Hidden Markov Model
http://www.ee.columbia.edu/dvmm/
55Content-adaptive Video Streaming
Chang, S. F., IEEE Multimedia Magazine, 2002
56Content-based Music Selection
- Facts
- A personal entertainment system can store
thousands of songs
- Songs are published on-line, giving more
selections
- Users can't remember the title, melody, or lyrics
of every song they like
- Problems
- User: which song to listen to?
- Musician: how to sell a song to a buyer who would
like it?
- Three goals of a personalized music DJ
- Repetition
- Surprise
- Exploitation of catalog
- Combinatorial optimization problem
- Given a set of criteria,
- Select a finite number of songs out of a large
catalog
- To satisfy given constraints
Pachet, et al., IEEE Multimedia Magazine, Jan 2000
57Personalized Music Disk Jockey
- Listener's profile
- Group profile
- Age, etc.
- Specified profile
- Favorite and hated songs
- Content features
- Genre, tempo, musician, instruments, publication
dates, length, etc.
- Motive, chord, melody
- Lyric, title (songs)
- Popularity, prices
- Similarity measures
- Surprise: dissimilar songs
- Constraints
- Based on user profile and query
- Individual objects
- sequences
- Select music objects that meet the constraints
- Generate the sequence from the selected objects
while meeting sequence constraints
- Stochastic optimization methods may be needed.
58Semantic Analysis of Film
Edge detection on pace flow, and corresponding
story and event sections (Titanic)
- Estimate camera pan/tilt parameters from the
video frames
- Estimate pace from camera motion
- Exploit relations between pace boundaries and
story-line boundaries
Adams, et al, IEEE Trans. Multimedia, Dec. 2002
59Detection Results
60ClassView
- Hierarchical video shot classification,
indexing, and access
- Addresses the semantic gap between low-level
video features and high-level semantic concepts.
- Goal
- Classify unlabeled video into semantic labels
so that it can be accessed in a semantically
meaningful way.
- The concepts (labels) are organized in a
tree structure generated by human experts
with the help of WordNet (wordnet.princeton.edu)
- Pre-labeled training samples provide examples
that relate low-level content features to
semantic labels at various levels of the
classification tree.
Fan, et al, IEEE Trans. Multimedia, Feb.2004
61Hierarchical Video Database Model
Fan, et al, IEEE Trans. Multimedia, Feb.2004
62Video News Classification Tree
63Video Cut Detection Results
- Detected scene cuts
- Color histogram difference at different thresholds
64Bottom Up Procedure
65Multimedia summary and filtering
- Summary
- Text: email reading
- Image: caption generation
- Video: highlights, story board
- Issues
- Segmentation
- Clustering of segments
- Labeling clusters
- Associate with syntactic and semantic labels
- Filtering
- Same as retrieval: filter out irrelevant objects
based on a given criterion (query)
- Often needs to be performed based on content
features
- E.g. filtering traffic accidents or law
violations from traffic monitoring videos
66Content based Coding and Post-processing
- Different coding decisions based on low-level
content features
- Coding mode (inter/intra selection)
- Motion estimation
- Object-based coding
- Encoding different regions (VOPs) separately
- Using different coders for different types of
regions
- Multiple abstraction layer coding
- An analysis/synthesis approach
- Synthesize low-level content from a higher-level
abstraction
- E.g. texture synthesis
- Content-based post-processing
- Identify content types and then synthesize the
low-level content
67Conclusion
- Issues related to content-based multimedia
information processing are surveyed.
- The current focus is on low-level content
analysis based on statistical approaches.
- Statistical analysis methods, especially fusion,
are reviewed.
- High-level knowledge-based understanding should
be incorporated into CBMIP algorithms to further
advance the state of the art.
68Statistical Tools for CBMIP
- Hypothesis testing
- Detection
- Pattern classification
- Pattern classifiers
- MAP (Bayes) classifier
- ML classifier based on mixtures of Gaussians
- Rule-based, fuzzy logic
- Decision tree
- Clustering-based: LVQ
- Linear classifier
- With kernels: SVM
- Nearest neighbor
- Multi-layer perceptron
- Information fusion
- Basically a pattern classification task
- Decision fusion vs value fusion
- The key
- Feature selection!
69MAP Maximum A Posteriori Classifier
- The MAP classifier stipulates that the classifier
that minimizes the probability of
mis-classification should choose
- g(x) = c(i) if
- P(c(i)|x) > P(c(i')|x), ∀ i' ≠ i.
- This is an optimal decision rule.
- Unfortunately, in real-world applications, it is
often difficult to estimate P(c(i)|x).
- Fortunately, to derive the optimal MAP decision
rule, one can instead estimate a discriminant
function Gi(x) such that for any x ∈ X, i' ≠ i,
- Gi(x) > Gi'(x) iff
- P(c(i)|x) > P(c(i')|x)
- Gi(x) can be an approximation of P(c(i)|x) or any
function satisfying the above relationship.
70Maximum Likelihood Classifier
- Using Bayes rule,
- p(c(i)|x) = p(x|c(i)) p(c(i)) / p(x).
- Hence the MAP decision rule can be expressed as
- g(x) = c(i) if
- p(c(i)) p(x|c(i)) > p(c(i')) p(x|c(i')), ∀ i' ≠ i.
- Under the assumption that the a priori probability
is unknown, we may assume p(c(i)) = 1/M. As such,
maximizing p(c(i)) p(x|c(i)) is equivalent to
maximizing the likelihood p(x|c(i)).
- The likelihood function p(x|c(i)) may assume a
uni-variate Gaussian model. That is,
- p(x|c(i)) ~ N(μi, σi²)
- μi, σi² can be estimated using the samples
{x : t(x) = c(i)}.
- The a priori probability p(c(i)) can be estimated
as the fraction of training samples labeled c(i).
71Nearest-Neighbor Classifier
- Let y(1), ..., y(n) ∈ X be n samples which
have already been classified. Given a new
sample x, the NN decision rule chooses g(x) =
c(i) if the nearest sample,
y* = arg min_j ||x − y(j)||,
is labeled with c(i).
- As n → ∞, the probability of mis-classification
using the NN classifier is at most twice the
probability of mis-classification of the optimal
(MAP) classifier.
- k-Nearest Neighbor classifier: examine the k
nearest classified samples, and classify x into
the majority class among them.
- Implementation problem: requires large storage
to store ALL the training samples.
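A minimal k-NN sketch for 1-D features, covering both the NN rule (k = 1) and the majority vote over k neighbors; note the full sample set is kept in memory, which is exactly the storage problem mentioned above. The toy samples are made up.

```python
from collections import Counter

def knn_classify(x, samples, k=3):
    """k-NN rule: find the k nearest labeled samples and take a
    majority vote among their class labels."""
    nearest = sorted(samples, key=lambda s: abs(s[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Labeled 1-D samples as (feature, class) pairs.
samples = [(0.0, "a"), (0.2, "a"), (0.3, "a"),
           (5.0, "b"), (5.2, "b"), (5.3, "b")]
print(knn_classify(0.1, samples))        # a
print(knn_classify(5.1, samples))        # b
print(knn_classify(0.1, samples, k=1))   # a  (plain nearest neighbor)
```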
72MLP Classifier
- Each output of an MLP is used to approximate
the a posteriori probability P(c(i)|x) directly.
- The classification decision then amounts to
assigning the feature to the class whose
corresponding MLP output is maximum.
- During training, the classification labels
(1-of-N encoding) are presented as target values
(rather than the true, but unknown, a posteriori
probability).
- Denote yi(x,W) to be the ith output of the MLP,
and ti(x) to be the corresponding target value
(0 or 1) during training.
- Then yi(x,W) will approximate E(ti(x)|x) =
P(c(i)|x).
73Optimal Hyper-plane Linearly Separable Case
- The optimal hyper-plane should be in the center
of the gap.
- Support vectors: the samples on the boundaries.
The support vectors alone can determine the
optimal hyper-plane.
- Question: how to find the optimal hyper-plane?
- For di = +1, g(xi) = wTxi + b ≥ 1, i.e.
woTxi + bo ≥ +1
- For di = −1, g(xi) = wTxi + b ≤ −1, i.e.
woTxi + bo ≤ −1
74Quadratic Optimization Problem Formulation
- Given {(xi, di), i = 1 to N}, find w and b such
that
- Φ(w) = wTw/2
- is minimized subject to the N constraints
- di (wTxi + b) − 1 ≥ 0, 1 ≤ i ≤ N.
- Method of Lagrange multipliers:
- J(w, b, α) = wTw/2 − Σi αi [di (wTxi + b) − 1]
75Formulating the Dual Problem
- At the saddle point, we have ∂J/∂w = 0 ⇒
w = Σi αi di xi, and ∂J/∂b = 0 ⇒ Σi αi di = 0.
Substituting these relations into J, we have the
- Dual problem: maximize
Q(α) = Σi αi − (1/2) Σi Σj αi αj di dj xiTxj
subject to Σi αi di = 0 and αi ≥ 0 for i = 1,
2, ..., N.
- Note: the data appear in the dual only through
the inner products xiTxj.
76Inner Product Kernels
In general, the input is first transformed via
a set of nonlinear functions φj(x) and then
subjected to the hyperplane classifier
g(x) = Σj wj φj(x) + b.
Define the inner-product kernel as
K(x, xi) = φ(x)Tφ(xi);
one then obtains a dual optimization problem of
the same form, with xiTxj replaced by K(xi, xj):
maximize Q(α) = Σi αi − (1/2) Σi Σj αi αj di dj K(xi, xj).
Often, dim of φ (= p+1) > dim of x!
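A sketch of the kernel decision function implied by the dual formulation above; the polynomial kernel and the hand-picked (not optimized) multipliers are illustrative assumptions, used only to show the shape of g(x) = Σi αi di K(xi, x) + b.

```python
import numpy as np

def poly_kernel(x, y, p=2):
    """Inner-product kernel K(x, y) = (x.y + 1)^p, a standard choice
    that implicitly defines the nonlinear mapping phi."""
    return (np.dot(x, y) + 1.0) ** p

def decision(x, support, alphas, labels, b=0.0):
    """SVM decision g(x) = sum_i alpha_i d_i K(x_i, x) + b; alphas and b
    are assumed to come from solving the dual problem."""
    return sum(a * d * poly_kernel(s, x)
               for a, d, s in zip(alphas, labels, support)) + b

# Toy 1-D example with hand-picked multipliers, just to show the form.
support = [np.array([1.0]), np.array([-1.0])]
labels = [1, -1]
alphas = [0.5, 0.5]
print(decision(np.array([2.0]), support, alphas, labels))  # 4.0 -> class +1
```

The point of the kernel trick is visible here: only K(xi, x) is ever evaluated, never the (possibly much higher-dimensional) mapping φ itself.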
77Information Fusion
- What is fusion?
- Decision-making (hypothesis testing) and value
computation (estimation) based on two or more
information sources.
- E.g. speech recognition based on audio,
lip-reading, and gesture
- Why fusion?
- It is easier to process individual information
sources separately
- Transmitting raw data to the fusion center is
too costly
- Types of fusion
- Decision fusion vs. value fusion
- Stacked generalization vs. mixture of experts
78Decision Fusion
[Diagram: each member classifier 1..K observes the
input x over a high-data-rate channel and emits a
local decision dk over a low-data-rate channel;
the fusion center combines the decision vector
d = (d1, d2, ..., dK)T into a fused decision.]
79Decision Fusion Methods
- Weighted linear combination methods
- Assume the dk are independent
- Majority voting
- Weighted voting
- Follow the leader
- Stacked generalization
- Fusion as pattern classification
- Optimal decision fusion (ODF)
- Behavioral knowledge space [Huang95]
- Table look-up method
- CODF
- Complements ODF with a weighted linear
combination method
- Handles the case of few training samples better
80Optimal Decision Fusion
- If dk takes N labels and there are K member
classifiers, there are at most N^K different
decision vectors d.
- A table assigning each of the N^K possible d to a
class label may yield the optimal decision fusion
that minimizes the probability of
mis-classification.
- When the table is constructed using training data
samples, this method is called the BKS method by
Huang and Suen [Huang95].
- Practical difficulties
- The table becomes too large when N^K is large
- Some entries have too few training samples
- CODF
- Complements ODF with a weighted linear
combination method.
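A minimal sketch of the BKS-style table look-up, with a fallback rule for unseen or sparse entries (the CODF idea of complementing the table with a simpler combination method); the toy decision vectors are made up.

```python
from collections import Counter, defaultdict

def build_bks_table(decision_vectors, true_labels):
    """BKS-style look-up: map each observed decision vector d = (d1..dK)
    to the majority true label among training samples producing it."""
    cells = defaultdict(Counter)
    for d, label in zip(decision_vectors, true_labels):
        cells[tuple(d)][label] += 1
    return {d: counts.most_common(1)[0][0] for d, counts in cells.items()}

def fuse(table, d, fallback):
    """Table look-up; fall back (e.g. to majority voting) for decision
    vectors never seen in training."""
    return table.get(tuple(d), fallback(d))

train_d = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1)]
train_y = [0, 0, 0, 1, 1]
table = build_bks_table(train_d, train_y)
majority = lambda d: Counter(d).most_common(1)[0][0]
print(fuse(table, (0, 1), majority))  # 0  (learned from the table)
print(fuse(table, (1, 0), majority))  # unseen vector -> fallback vote
```

With K classifiers and N labels the table has up to N^K cells, which makes the sparse-cell problem on the slide concrete: most cells receive no training sample at all.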
81Weighted linear combination methods
- General solution
- Perceptron learning problem
- If not linearly separable, it will not converge
- Relax the constraint
- Replace the nonlinear step function by a
sigmoidal function or slope function
- Support vector machine (SVM)
- Back-propagation learning
- Least-squares estimation
- Majority voting
- wk = 1, θ = K/2
- Threshold voting
- wk = 1, θ ≠ K/2
- Following the leader
- wk* = 1; wk = 0 for k ≠ k*; θ = 0
- where classifier k* has the best performance
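The voting rules above can be sketched as one weighted-linear-combination function; binary local decisions dk ∈ {0, 1} and the strict-threshold convention are illustrative assumptions.

```python
def weighted_vote(decisions, weights=None, theta=None):
    """Weighted linear combination of binary local decisions d_k in {0, 1}:
    declare 1 when sum_k w_k * d_k > theta.  With w_k = 1 and theta = K/2
    this is majority voting, as on the slide."""
    k = len(decisions)
    weights = weights or [1.0] * k
    theta = k / 2 if theta is None else theta
    return 1 if sum(w * d for w, d in zip(weights, decisions)) > theta else 0

print(weighted_vote([1, 1, 0]))                     # 1  (majority says yes)
print(weighted_vote([1, 0, 0]))                     # 0
print(weighted_vote([1, 0, 0], weights=[3, 1, 1]))  # 1  ("follow the leader")
```

The last call shows "following the leader" as a special case: giving one classifier a dominant weight makes its decision carry the vote.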
82Mixture of Expert
- z(x) = Σi gi(x) di(x), where Σi gi(x) = 1 and
0 ≤ gi(x) ≤ 1
[Diagram: member classifiers 1..K map the input x
to expert outputs d1, ..., dK; a gating network
maps x to weights g1, ..., gK that scale the expert
outputs before they are summed into z(x).]
83Mixture of Expert
- Gating network
- Located at the fusion center,
- based on raw data ⇒ requires communication
resources
- For each x, minimize ||T(x) − Σk dk(x) gk(x)||
subject to Σk gk(x) = 1 and 0 ≤ gk(x) ≤ 1;
ideally gk(x) = 1 if dk(x) = T(x).
- Given gk(x) for all x in the training set, dk(x)
can be determined to train the member classifiers.
- Iterative training using the EM algorithm
- Fix gk, train dk
- Fix dk, train gk
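A minimal sketch of the gated combination z(x) = Σi gi(x) di(x); the softmax gate with fixed, hand-chosen weights is an illustrative stand-in for a trained gating network.

```python
import numpy as np

def softmax(z):
    """Softmax enforces the slide's constraints: outputs sum to 1
    and each lies in [0, 1]."""
    e = np.exp(z - z.max())
    return e / e.sum()

def mixture_output(x, experts, gate_weights):
    """z(x) = sum_i g_i(x) * d_i(x); the gate is a linear map followed
    by a softmax (an illustrative stand-in for a trained network)."""
    g = softmax(gate_weights @ x)
    d = np.array([expert(x) for expert in experts])
    return float(g @ d)

# Two toy experts: one suited to small inputs, one to large inputs.
experts = [lambda x: 0.0, lambda x: 1.0]
gate_weights = np.array([[-4.0], [4.0]])  # gate favors expert 2 as x grows
print(round(mixture_output(np.array([2.0]), experts, gate_weights), 4))
```

The EM-style alternation on the slide corresponds to fitting `gate_weights` with the experts fixed, then refitting the experts with the gate fixed.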