Title: Statistical Learning Theory and Applications
1. Statistical Learning Theory and Applications
9.520
Sayan Mukherjee, Ryan Rifkin, Tomaso Poggio, Alex Rakhlin
2. Learning: Brains and Machines
Learning is the gateway to understanding the brain and to making intelligent machines.
The problem of learning is a focus for:
- modern mathematics
- computer algorithms
- neuroscience
3. Multidisciplinary Approach to Learning
Learning theory + algorithms
ENGINEERING APPLICATIONS
- Information extraction (text, Web)
- Computer vision and graphics
- Man-machine interfaces
- Bioinformatics (DNA arrays)
- Artificial markets (society of learning agents)
COMPUTATIONAL NEUROSCIENCE: models + experiments
- Learning to recognize objects in visual cortex
4. Class
Rules of the game: problem sets (2 + 1 optional), final project, grading, participation!
Web site: www.ai.mit.edu/projects/cbcl/courses/course9.520/index.html
5. Overview of overview
- Supervised learning: the problem and how to frame it within classical math
- Examples of in-house applications
- Learning and the brain
6. Learning from Examples
[Figure: INPUT → f → OUTPUT]
7. Learning from Examples: formal setting
Given a set of l examples (past data) (x_1, y_1), ..., (x_l, y_l).
Question: find a function f such that f(x) is a good predictor of y for a future input x.
8. Classical equivalent view: supervised learning as a problem of multivariate function approximation
[Figure: data points sampled from f, the function f, and an approximation of f, plotted as y vs. x]
Generalization: estimating the value of the function where there are no data.
Regression: the function is real-valued.
Classification: the function is binary.
The problem is ill-posed!
9. Well-posed problems
A problem is well-posed if its solution
- exists
- is unique
- is stable, i.e. depends continuously on the data
Inverse problems (tomography, radar, scattering) are typically ill-posed.
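As a small numerical illustration (my own sketch, not from the slides): fitting a high-degree polynomial to a few points by plain least squares is badly ill-conditioned, so a tiny perturbation of the data moves the solution wildly, while an L2-regularized (Tikhonov-stabilized) fit barely moves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Few data points, full-degree polynomial basis -> ill-conditioned normal equations.
x = np.linspace(0, 1, 8)
y = np.sin(2 * np.pi * x)
X = np.vander(x, N=8, increasing=True)   # degree-7 monomial basis

def fit(X, y, lam):
    """Least squares, with optional L2 (Tikhonov) regularization lam."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

y_pert = y + 1e-3 * rng.standard_normal(y.shape)  # tiny perturbation of the data

changes = {}
for lam, name in [(0.0, "unregularized"), (1e-3, "regularized")]:
    w1 = fit(X, y, lam)
    w2 = fit(X, y_pert, lam)
    changes[name] = np.linalg.norm(w1 - w2)       # how far the solution moved
    print(f"{name}: coefficient change = {changes[name]:.4f}")
```

The unregularized coefficients change by orders of magnitude more than the regularized ones for the same 0.1% data perturbation, which is exactly the instability (lack of continuous dependence on the data) that makes the problem ill-posed.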
10. Two key requirements to solve the problem of learning from examples: well-posedness and consistency.
A standard way to learn from examples is empirical risk minimization (ERM). The problem is in general ill-posed and does not have a predictive (or consistent) solution. By choosing an appropriate hypothesis space H it can be made well-posed. Independently, it is also known that an appropriate hypothesis space can guarantee consistency. We have recently proved a surprising theorem: consistency and well-posedness are equivalent, i.e. one implies the other.
Thus a stable solution is also predictive, and vice versa.
Mukherjee, Niyogi, Poggio, 2002
11. A simple, "magic" algorithm ensures consistency and well-posedness
The equation includes Regularization Networks (examples are Radial Basis Functions and Support Vector Machines).
For a review, see Poggio and Smale, Notices of the AMS, 2003.
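The equation on this slide (shown as an image in the original deck, so not transcribed) is Tikhonov regularization in a reproducing kernel Hilbert space; the square-loss form below follows Poggio and Smale (2003), with the notation λ, ℓ as in that review and V generalizing the square loss:

```latex
f_S \;=\; \arg\min_{f \in \mathcal{H}} \;\frac{1}{\ell}\sum_{i=1}^{\ell} V\bigl(f(x_i), y_i\bigr) \;+\; \lambda\, \lVert f \rVert_K^2,
\qquad
f_S(x) \;=\; \sum_{i=1}^{\ell} c_i\, K(x, x_i),
```

where the second expression is the form of the minimizer guaranteed by the representer theorem: a kernel expansion on the training examples.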
12. Classical framework, but with a more general loss function
The algorithm uses a quite general space of functions, or hypotheses: RKHSs. A generalization of the classical framework can provide a better measure of loss (for instance, for classification).
Girosi, Caprile, Poggio, 1990
13. Formally
14. Equivalence to networks
Many different V lead to the same form of solution, which can be written as the same type of network: kernel units K on the training examples feeding a single output f.
[Figure: network diagram with kernel units K and output f]
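A minimal sketch of this "network" form (my own illustration, not the course code): with the square loss the regularized solution is kernel regularized least squares, f(x) = Σ_i c_i K(x, x_i) with c = (K + λℓI)⁻¹ y, here with a Gaussian kernel (the σ, λ values are arbitrary choices for the toy problem).

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.7):
    """Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def train_rls(X, y, lam=1e-4, sigma=0.7):
    """Solve (K + lam * l * I) c = y; f(x) = sum_i c_i K(x, x_i) is the 'network'."""
    l = len(X)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * l * np.eye(l), y)

def predict(X_train, c, X_new, sigma=0.7):
    """Evaluate the kernel network at new inputs."""
    return gaussian_kernel(X_new, X_train, sigma) @ c

# Toy regression: learn sin on [0, 2*pi] from 30 noisy examples.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(30, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(30)

c = train_rls(X, y)
X_test = np.linspace(0.5, 5.5, 20)[:, None]
err = np.max(np.abs(predict(X, c, X_test) - np.sin(X_test[:, 0])))
print(f"max test error: {err:.3f}")
```

Each training example contributes one "unit" K(x, x_i) with weight c_i, which is why Radial Basis Function networks and SVMs end up with the same architecture.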
15. Unified framework: RN, SVMR and SVMC
The equation includes Regularization Networks (e.g. Radial Basis Functions), Support Vector Machines (classification and regression), and some multilayer perceptrons.
Review by Evgeniou, Pontil and Poggio, Advances in Computational Mathematics, 2000
16. The theoretical foundations of statistical learning are becoming part of mainstream mathematics
17. Theory summary
In the course we will introduce:
- Stability (well-posedness)
- Consistency (predictivity)
- RKHSs as hypothesis spaces
- Regularization techniques leading to RNs and SVMs
- Generalization bounds based on stability
- Alternative bounds (VC and Vγ dimensions)
- Related topics, extensions beyond SVMs
- Applications
- A new key result
18. Overview of overview
- Supervised learning: real math
- Examples of recent and ongoing in-house research on applications
- Learning and the brain
19. Learning from Examples: engineering applications
Bioinformatics, artificial markets, object categorization, object identification, image analysis, graphics, text classification, ...
20. Bioinformatics application: predicting type of cancer from DNA chip signals
The learning-from-examples paradigm
21. Bioinformatics application: predicting type of cancer from DNA chips
New feature-selection SVM. Only 38 training examples, 7,100 features; AML vs. ALL. With 40 genes: 34/34 correct, 0 rejects. With 5 genes: 31/31 correct, 3 rejects, of which 1 is an error.
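The slide does not specify the feature-selection algorithm, so the sketch below is only illustrative of the setting: the same few-samples/many-genes shape (38 examples, 7,100 features), with genes ranked by a simple signal-to-noise score (in the spirit of Golub et al.'s AML/ALL work, not necessarily the method used here) on synthetic data where only 20 genes are informative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "DNA chip" data mimicking the slide's setting:
# 38 training examples, 7100 genes, only a handful informative.
n, d, informative = 38, 7100, 20
labels = np.array([0] * 19 + [1] * 19)       # two cancer types, balanced
X = rng.standard_normal((n, d))
X[labels == 1, :informative] += 2.0          # informative genes shift for class 1

def snr_scores(X, y):
    """Per-gene signal-to-noise ratio: |mu0 - mu1| / (s0 + s1)."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    s0, s1 = X[y == 0].std(0), X[y == 1].std(0)
    return np.abs(m0 - m1) / (s0 + s1 + 1e-12)

top = np.argsort(snr_scores(X, labels))[::-1][:40]   # keep 40 genes, as on the slide
found = np.intersect1d(top, np.arange(informative)).size
print(f"informative genes among top 40: {found}/{informative}")
```

Even with 38 examples against 7,100 features, a well-separated signal lets a simple ranking recover essentially all the informative genes; a classifier (e.g. a linear SVM) would then be trained only on the selected genes.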
22. Learning from Examples: engineering applications
Bioinformatics, artificial markets, object categorization, object identification, image analysis, graphics, text classification, ...
23. Face identification: example
An old view-based system (15 views). Performance: 98% on a 68-person database. Beymer, 1995
24. Face identification
[Pipeline: new face image → feature extraction → SVM classifier → identification result]
Bernd Heisele, Jennifer Huang, 2002
25. Real-time detection and identification
- Ported to Oxygen's H21 (handheld device)
- Identification of rotated faces up to about 45°
- Robust against changes in illumination and background
- Frame rate of 15 Hz
Ho, Heisele, Poggio, 2000
Weinstein, Ho, Heisele, Poggio, Steele, Agarwal, 2002
26. Learning from Examples: engineering applications
Bioinformatics, artificial markets, object categorization, object identification, image analysis, graphics, text classification, ...
27. Learning Object Detection: Finding Frontal Faces
Training database:
- 1,000 real and 3,000 virtual face patterns
- 50,000 non-face patterns
Sung, Poggio, 1995
28. Recent work on face detection
- Detection of faces in images
- Robustness against slight rotations in depth and in the image plane
- Full-face vs. component-based classifier
Heisele, Pontil, Poggio, 2000
29. The best existing system for face detection?
Heisele, Serre, Poggio et al., 2000
30. Trainable System for Object Detection
Pedestrian detection: results
Papageorgiou and Poggio, 1998
31. Trainable System for Object Detection
Pedestrian detection: training
Papageorgiou and Poggio, 1998
32. The system was tested in a test car (Mercedes)
34. System installed in an experimental Mercedes
A fast version, integrated with a real-time obstacle detection system [MPEG video]
Constantine Papageorgiou
36. People classification/detection: training the system
Training patterns: 1,848 pedestrian and 7,189 non-pedestrian examples.
Representation: overcomplete dictionary of Haar wavelets; high-dimensional feature space (>1,300 features).
Core learning algorithm: Support Vector Machine classifier → pedestrian detection system.
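To make the Haar-wavelet representation concrete, here is a hedged sketch (my own illustration, not the authors' implementation) of how a single two-rectangle Haar feature can be evaluated in O(1) per location using an integral image, a standard trick for this kind of dictionary:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[r, c] = sum of img[:r, :c]."""
    return np.pad(img.cumsum(0).cumsum(1), ((1, 0), (1, 0)))

def rect_sum(ii, r, c, h, w):
    """Sum of img[r:r+h, c:c+w] from four table lookups."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def haar_vertical(ii, r, c, h, w):
    """Two-rectangle Haar feature: left half minus right half (vertical edge)."""
    return rect_sum(ii, r, c, h, w // 2) - rect_sum(ii, r, c + w // 2, h, w // 2)

# Toy image: dark left half, bright right half -> strong vertical-edge response.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
ii = integral_image(img)
resp = haar_vertical(ii, 0, 0, 8, 8)
print(resp)  # left sum (0) minus right sum (32) = -32.0
```

Evaluating such features at many positions and scales yields the overcomplete >1,300-dimensional vector that the SVM classifier is trained on.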
37. An improved approach: combining component detectors
Mohan, Papageorgiou and Poggio, 1999
38. Results
The system is capable of detecting partially occluded people.
39. System Performance
- Combination systems (ACC) perform best.
- All component-based systems perform better than full-body person detectors.
A. Mohan, C. Papageorgiou, T. Poggio
40. Learning from Examples: Applications
Object identification, object categorization, image analysis, graphics, finance, bioinformatics
41. Image Analysis
IMAGE ANALYSIS: OBJECT RECOGNITION AND POSE ESTIMATION
⇒ Bear (0° view)
⇒ Bear (45° view)
42. Computer vision: analysis of facial expressions
The main goal is to estimate basic facial
parameters, e.g. degree of mouth openness,
through learning. One of the main applications is
video-speech fusion to improve speech recognition
systems.
Kumar, Poggio, 2001
43. Combining Top-Down Constraints and Bottom-Up Data
Morphable Model
Kumar, Poggio, 2001
44. The Three Stages
Face detection → localization of facial features → analysis of facial parts
For more details, see Appendix 2.
45. Learning from Examples: engineering applications
Bioinformatics, artificial markets, object categorization, object identification, image analysis, graphics, text classification, ...
46. Image Synthesis
θ = 0° view ⇒
θ = 45° view ⇒
47. Supermodels
[MPEG video] (Steve Lines)
48. A trainable system for TTVS
- Input: text
- Output: photorealistic talking face uttering the text
Tony Ezzat
49. TTVS video
Tony Ezzat, T. Poggio
50. From 2D to a better 2D, and from 2D to 3D
Two extensions of our text-to-visual-speech (TTVS) system:
- morphing of 3D models of faces to output a 3D model of a speaking face (Blanz)
- learning facial appearance, dynamics and coarticulation (Ezzat)
51. Towards 3D
3D face scans were collected. The neutral face shape was reconstructed from a single image, and the animation transferred to a new face.
[Click for animation]
Blanz, Ezzat, Poggio
52. Using the same basic learning techniques: Trainable Videorealistic Face Animation (voice is real, video is synthetic)
Ezzat, Geiger, Poggio, SIGGRAPH 2002
53. Trainable Videorealistic Face Animation
1. Learning: the system learns from 4 minutes of video the face appearance (Morphable Model) and the speech dynamics of the person.
2. Run time: for any speech input, the system provides as output a synthetic video stream.
[Pipeline: phone stream (/B/ /AE/ /AE/ /JH/ /SIL/) → phonetic models → trajectory synthesis → MMM → image prototypes]
Tony Ezzat, Geiger, Poggio, SIGGRAPH 2002
54. Novel! Trainable Videorealistic Face Animation
Let us look at the video!
Ezzat, Poggio, 2002
55. Reconstructed 3D Face Models from One Image
Blanz and Vetter, MPI; SIGGRAPH 99
57. Learning from Examples: Applications
Object identification, object categorization, image analysis, graphics, finance, bioinformatics
58. Artificial agents: learning algorithms buy and sell stocks
Example of a subproject: the Electronic Market Maker
Nicholas Chang
59. Learning from Examples: engineering applications
Bioinformatics, artificial markets, object categorization, object identification, image analysis, graphics, text classification, ...
60. Overview of overview
- Supervised learning: the problem and how to frame it within classical math
- Examples of in-house applications
- Learning and the brain
61. The Ventral Visual Stream: From V1 to IT
Modified from Ungerleider and Haxby, 1994
62. Summary of basic facts
Accumulated evidence points to three (mostly accepted) properties of the ventral visual stream architecture:
- Hierarchical build-up of invariances (first to translation and scale, then to viewpoint, etc.), of receptive field size, and of the complexity of preferred stimuli
- Basically feed-forward processing of information (for immediate recognition tasks)
- Learning of an individual object generalizes to scale and position
63. The standard model, following Hubel and Wiesel
[Figure: hierarchical model; lower stages shaped by unsupervised learning, the top stage by supervised learning]
Riesenhuber & Poggio, Nature Neuroscience, 2000
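A minimal 1-D sketch (my own toy construction, not the actual HMAX code) of the two operations this model alternates: a "simple"-cell stage with Gaussian tuning to stored templates, and a "complex"-cell stage that MAX-pools over positions to buy shift invariance:

```python
import numpy as np

def s_layer(x, templates, sigma=1.0):
    """'Simple'-cell stage: Gaussian tuning of each input patch to each template."""
    k = templates.shape[1]
    patches = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
    d2 = ((patches[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))          # shape: (positions, templates)

def c_layer(s, pool=4):
    """'Complex'-cell stage: MAX over a pooling range -> local shift invariance."""
    return np.stack([s[i:i + pool].max(0) for i in range(0, len(s) - pool + 1, pool)])

templates = np.array([[0.0, 1.0, 0.0]])  # one hypothetical 3-sample template
x1 = np.zeros(16); x1[6] = 1.0           # a stimulus
x2 = np.zeros(16); x2[7] = 1.0           # the same stimulus, shifted by one position

s1, s2 = s_layer(x1, templates), s_layer(x2, templates)
c1, c2 = c_layer(s1), c_layer(s2)
print("S responses identical:", np.allclose(s1, s2))  # False: the shift changes them
print("C responses identical:", np.allclose(c1, c2))  # True: MAX pooling absorbs the shift
```

Stacking such tuning/pooling pairs, with receptive fields and template complexity growing at each level, gives the hierarchical build-up of invariance described on the previous slide.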
64. The standard model
Interprets or predicts many existing data in microcircuits and system physiology, and also in cognitive science:
- What some complex cells and V4 cells do, and how (MAX operation)
- View-tuning of IT cells (Logothetis)
- Response to pseudo-mirror views
- Effect of scrambling
- Multiple objects
- Robustness to clutter
- Consistency with K. Tanaka's simplification procedure
- Categorization tasks (cats vs. dogs)
- Invariance to translation, scale, etc.
- Gender classification
- Face inversion effect (experience, viewpoint, other-race, configural vs. featural representation)
- Transfer of generalization
- No binding problem, no need for oscillations
65. The model's early prediction: neurons become view-tuned during recognition training
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls, and Poggio, 1995; Logothetis, Pauls, 1995
66. Recording Sites in Anterior IT
Neurons tuned to faces are close by.
Logothetis, Pauls, and Poggio, 1995; Logothetis, Pauls, 1995
67. In cortex: neurons tuned to object views, as predicted by the model
Logothetis, Pauls, Poggio, 1995
68. View-tuned cells: scale invariance (from one training view only!)
Logothetis, Pauls, and Poggio, 1995; Logothetis, Pauls, 1995
69. Predictions of the standard model for view-tuned IT cells vs. Logothetis's data (using the same stimuli)
Riesenhuber & Poggio, Nature Neuroscience, 1999
70. MAX: the key operation in the view-based module
Pooling of simple-cell inputs by some of the complex cells via a max-like operation is key.
Riesenhuber & Poggio, Nature Neuroscience, 1999
71. Testing the MAX Hypothesis
The task: measure the neuron's response R_A to stimulus A alone, R_B to stimulus B alone, and R_AB to A and B presented together.
Is R_AB = MAX(R_A, R_B), or is R_AB = R_A + R_B?
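The logic of this test can be sketched numerically (my own toy illustration, not the experimental analysis): simulate neurons whose pair response follows each hypothesis, then check which model fits the measured R_AB better.

```python
import numpy as np

rng = np.random.default_rng(0)
r_a = rng.uniform(0, 1, 100)   # responses to stimulus A alone
r_b = rng.uniform(0, 1, 100)   # responses to stimulus B alone

def classify(r_ab, r_a, r_b):
    """Which hypothesis fits the measured pair responses better, MAX or SUM?"""
    err_max = np.abs(r_ab - np.maximum(r_a, r_b)).mean()
    err_sum = np.abs(r_ab - (r_a + r_b)).mean()
    return "MAX" if err_max < err_sum else "SUM"

noise = 0.05 * rng.standard_normal(100)

# A simulated MAX neuron and a simulated SUM neuron, each with measurement noise:
print(classify(np.maximum(r_a, r_b) + noise, r_a, r_b))  # MAX
print(classify(r_a + r_b + noise, r_a, r_b))             # SUM
```

The two hypotheses are easy to separate because under MAX the pair response never exceeds the stronger single-stimulus response, while under SUM it does whenever both stimuli drive the cell.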
72. Just in: some V4 neurons may do a MAX
T. Gawne, J.M. Martin, 2002
73. Some primate V4 neurons show a max operation
Gawne, Martin, 2002
74. The Model Ties Together Research on Different Levels
- Invariance
- Neuronal tuning
- Categorization and identification
75. Joint Project (Max Riesenhuber) with Earl Miller and David Freedman (MIT): Neural Correlate of Categorization (NCC)
Define categories in morph space.
[Figure: morph space between cat and dog prototypes; stimuli at 100%, 80% and 60% cat or dog, with the category boundary between them]
76. Categorization task
Train a monkey on the categorization task. After training, record from neurons in IT and PFC.
77. Single-cell example: a PFC neuron that responds more strongly to DOGS than to CATS
D. Freedman, E. Miller, M. Riesenhuber, T. Poggio (Science, 2001)
78. The model suggests the same computations for different recognition tasks (and objects!)
[Figure: HMAX hierarchy feeding a general representation (IT), which in turn feeds task-specific units (e.g., PFC)]
Is this what's going on in cortex?