Title: Visual Classification and Regression using Scale Space Theory
A New Approach for Classification: A Visual Simulation Viewpoint
Zongben Xu and Deyu Meng
Xi'an Jiaotong University
Outline
- Introduction
- The existing approaches
- Visual sensation principle
- Visual classification approach
- Visual learning theory
- Concluding remarks
1. Introduction
- Data Mining (DM), the main procedure of KDD, aims at the discovery of useful knowledge from large collections of data. The knowledge mainly refers to:
- Clustering
- Classification
- Regression
Clustering
- Partitioning a given dataset with known or unknown distribution into homogeneous subgroups.
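A minimal sketch of the clustering task, assuming scikit-learn and a synthetic two-blob dataset (k-means is used only as a familiar baseline, not the approach advocated in this talk):

```python
# Toy clustering example: partition a synthetic 2-D dataset into
# homogeneous subgroups with k-means (scikit-learn assumed).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.5, (100, 2)),    # blob 1
                  rng.normal(3.0, 0.5, (100, 2))])   # blob 2

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
print(np.bincount(labels))   # sizes of the two recovered subgroups
```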
Clustering
- Object categorization/classification from remotely sensed images (example)
Classification
- Finding a discriminant rule (a function f(x)) from experiential data with k labels, generated from an unknown but fixed distribution (normally, k = 2 is the case of interest).
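A minimal sketch of the classification task (k = 2), assuming scikit-learn; the nearest-neighbour rule here is only a familiar baseline, not the visual approach developed later:

```python
# Toy classification example: learn a discriminant rule f(x) from
# labelled samples of two classes (scikit-learn assumed).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-1.0, 0.6, (100, 2)),   # class 0
               rng.normal(+1.0, 0.6, (100, 2))])  # class 1
y = np.repeat([0, 1], 100)

f = KNeighborsClassifier(n_neighbors=5).fit(x, y)
print(f.predict([[0.9, 1.1], [-0.9, -1.1]]))      # expected: [1 0]
```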
Classification
- Fingerprint recognition example
Regression
- Finding a relationship (a function f(x)) between the input and output training data, generated by an unknown but fixed distribution.
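A minimal sketch of the regression task on synthetic data, assuming scikit-learn; support vector regression is used as a familiar baseline (the Mong Kok data themselves are not included here):

```python
# Toy regression example: recover a relationship y = f(x) from noisy
# input/output pairs with support vector regression (scikit-learn assumed).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
x = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.1, 200)    # noisy samples of sin

f = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(x, y)
print(f.predict([[np.pi / 2]]))                    # should be close to 1
```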
Regression
- Air quality prediction example (data obtained at the Mong Kok monitoring station of Hong Kong, based on hourly continuous measurements over the whole year of 2000).
Existing Approaches for Clustering
- Hierarchical clustering
- nested hierarchical clustering
- nonnested hierarchical clustering
- SLINK
- COMLINK
- MSTCLUS
- Partitional clustering
- K-means clustering
- Neural networks
- Kernel methods
- Fuzzy methods
Existing Approaches for Classification
- Statistical approach
- Parametric methods
- Bayesian method
- Nonparametric methods
- Density estimation method
- Nearest-neighbor method
- Discriminant function approach
- Linear discriminant method
- Generalized linear discriminant method
- Fisher discriminant method
- Nonmetric approach
- Decision trees method
- Rule-based method
- Computational intelligence approach
- Fuzzy methods
- Neural Networks
- Kernel methods: Support Vector Machine
Existing Approaches for Regression
- Interpolation methods
- Statistical methods
- Parameter regression
- Non-parameter regression
- Computational intelligent methods
- Fuzzy regression methods
- ε-insensitive fuzzy c-regression model
- Neural Networks
- Kernel methods: Support Vector Regression
Main problems encountered
- Validity problem (clustering): is there real clustering? How many clusters?
- Efficiency/scalability problem: in most cases, efficient only for small or middle-sized data sets.
- Robustness problem: most results are sensitive to model parameters and to the neatness of the samples.
- Model selection problem: no general rule to specify the model type and parameters.
Research agenda
- The essence of DM is modeling from data. It
depends not only on how the data are generated,
but also on how we sense or perceive the data.
The existing DM methods are developed based on
the former principle, but less on the latter one.
- Our idea is to develop DM methods based on the principles of human visual sensation and perception (in particular, to treat a data set as an image, and to mine knowledge from the data in accordance with the way we observe and perceive that image).
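As a toy illustration of "treating a data set as an image", the sketch below rasterizes a 2-D point set onto a pixel grid; the grid size, extent, and function name are illustrative assumptions, not part of the proposed method:

```python
# Toy sketch: turn a 2-D data set into a "data image" by counting the
# points that fall into each pixel (grid and extent are arbitrary choices).
import numpy as np

def data_to_image(points, grid=64, extent=(-3.0, 3.0)):
    """Rasterize an (n, 2) point set onto a grid x grid intensity image."""
    img, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                               bins=grid, range=[extent, extent])
    return img

rng = np.random.default_rng(3)
pts = rng.normal(0.0, 1.0, (500, 2))
print(data_to_image(pts).shape)   # (64, 64) data image
```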
Research agenda (Cont.)
- We have successfully developed such an approach for clustering, and in particular solved the clustering validity problem (a toy sketch of the idea follows this list). See:
- Clustering by Scale Space Filtering,
- IEEE Transactions on PAMI, 22(12) (2000), 1396-1410
- This report aims at initiating the approach for classification, with an emphasis on solving the efficiency/scalability problem and the robustness problem.
- The model selection problem is under our current research.
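A toy sketch of the idea behind the scale-space clustering paper cited above: blur a data image with Gaussians of increasing width and count the surviving blobs (local maxima). This only conveys the flavour of the approach and is not the published algorithm:

```python
# Toy sketch: the number of blobs (local maxima) in a blurred data image
# typically decreases as the scale grows, which is the intuition behind
# clustering by scale-space filtering. Grid, scales, and thresholds are
# arbitrary illustrative choices.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def count_blobs(img, sigma):
    smoothed = gaussian_filter(img, sigma)
    is_peak = smoothed == maximum_filter(smoothed, size=5)   # local maxima
    return int((is_peak & (smoothed > 0.1 * smoothed.max())).sum())

rng = np.random.default_rng(4)
pts = np.vstack([rng.normal(-2, 0.4, (200, 2)),
                 rng.normal(+2, 0.4, (200, 2))])
img, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=64,
                           range=[(-4, 4), (-4, 4)])

for sigma in (1.0, 2.0, 4.0, 8.0):
    print(sigma, count_blobs(img, sigma))
```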
2.1. Visual sensation principle
- The structure of the human eye
- Accommodation (focusing) of an image is achieved by changing the shape of the crystalline lens of the eye (or, equivalently, by changing the distance between the image and the eye when the shape of the lens is fixed).
- How does the image on the retina vary with the distance between the object and the eye (or, equivalently, with the shape of the crystalline lens)? Scale space theory provides an explanation, and the theory is directly supported by neurophysiological findings in animals and by psychophysics in man.
2.2. Scale Space Theory
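The scale-space figures themselves are not reproduced here; the sketch below shows the underlying construction, a Gaussian (linear) scale space obtained by convolving one image with Gaussians of increasing width (the widths and the random test image are arbitrary choices):

```python
# Toy sketch of a Gaussian scale space: the same image viewed at coarser
# and coarser scales, i.e. convolved with wider and wider Gaussians.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(5)
image = rng.random((128, 128))               # stand-in for a retinal image

scale_space = {sigma: gaussian_filter(image, sigma)
               for sigma in (0.0, 1.0, 2.0, 4.0, 8.0)}

for sigma, blurred in scale_space.items():
    # Fine detail is suppressed as the scale grows, so the variance drops.
    print(f"sigma = {sigma:>4}:  variance = {blurred.var():.5f}")
```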
2.3. Cell responses in the retina
Only changes of light can be perceived, and only three types of cell responses exist in the retina:
- 'ON' response: the response to the arrival of a light stimulus (the blue region)
- 'OFF' response: the response to the removal of a light stimulus (the red region)
- 'ON-OFF' response: the response to a hybrid of on and off, because presentation and removal of the stimulus may exist simultaneously (the yellow region)
Between the on and off regions, roughly at the boundary, is a narrow region where on-off responses occur. Every cell has its own response strength; roughly, the strength is Gaussian-like.
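A heavily simplified toy model of the three response types, assuming the change of light is smoothed by a Gaussian-like response profile and then split by sign; the stimulus layout and thresholds are illustrative assumptions only:

```python
# Toy sketch of ON / OFF / ON-OFF regions: smooth the change of light with
# a Gaussian-like response profile and classify pixels by the sign of the
# response (all numbers below are arbitrary illustrative choices).
import numpy as np
from scipy.ndimage import gaussian_filter

change = np.zeros((64, 64))
change[16:32, 16:48] = +1.0     # light switched on here
change[34:50, 16:48] = -1.0     # light switched off here

response = gaussian_filter(change, sigma=3.0)
stimulated = gaussian_filter(np.abs(change), sigma=3.0) > 0.1

on = stimulated & (response > 0.1)       # clear positive response
off = stimulated & (response < -0.1)     # clear negative response
on_off = stimulated & ~on & ~off         # weak/hybrid response at the boundary
print(on.sum(), off.sum(), on_off.sum())
```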
3. Visual Classification Approach: Our philosophy
3. VCA: Our philosophy (Cont.)
3. VCA: A method to choose scale
An observation
3. VCA: Procedure
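The procedure itself is given in the slides' figures; the sketch below is only a guess at its flavour, based on the description in the concluding remarks: draw each class as a data image, blur it at a chosen scale (the retinal blurring effect), and assign a query point to the class with the stronger blurred response. The grid, the scale sigma, and the helper names are illustrative assumptions, not the authors' actual procedure or scale-selection rule:

```python
# Hedged sketch of a visual-simulation classifier: blur per-class data
# images with a Gaussian at scale sigma and classify by the stronger
# response. Labels are assumed to be 0, 1, ..., k-1.
import numpy as np
from scipy.ndimage import gaussian_filter

def fit_vca_like(points, labels, sigma=2.0, grid=64, extent=(-4.0, 4.0)):
    edges = np.linspace(extent[0], extent[1], grid + 1)
    blurred = []
    for c in np.unique(labels):
        img, _, _ = np.histogram2d(points[labels == c, 0],
                                   points[labels == c, 1],
                                   bins=[edges, edges])
        blurred.append(gaussian_filter(img, sigma))   # blurred class image
    return np.stack(blurred), edges

def predict(blurred, edges, query):
    i = np.clip(np.searchsorted(edges, query[:, 0]) - 1, 0, len(edges) - 2)
    j = np.clip(np.searchsorted(edges, query[:, 1]) - 1, 0, len(edges) - 2)
    return np.argmax(blurred[:, i, j], axis=0)        # strongest response wins

rng = np.random.default_rng(6)
x = np.vstack([rng.normal(-1.5, 0.5, (150, 2)),       # class 0
               rng.normal(+1.5, 0.5, (150, 2))])      # class 1
y = np.repeat([0, 1], 150)
blurred, edges = fit_vca_like(x, y)
print(predict(blurred, edges, np.array([[-1.4, -1.6], [1.6, 1.4]])))  # [0 1]
```

In such a sketch the scale plays the role of a kernel width, which is why a principled way to choose it (the slides' "method to choose scale") matters.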
3. VCA: Demonstrations
- Linearly separable data without noise
- Linearly separable data with 5% noise
- Circularly separable data without noise
- Circularly separable data with 5% noise
- Spirally separable data without noise
- Spirally separable data with 5% noise
3. VCA: Efficiency test
- 11 groups of benchmark datasets from UCI, DELVE and STATLOG
- Performance comparison between VCA and SVM
3. VCA: Scalability test
The time complexity of VCA is quadratic in the size of the training data (a) and linear in the dimension of the data (b).
(a) Data sets of fixed dimension (10-D) but varying size are used.
(b) Data sets of fixed size (5000) but varying dimension are used.
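A hedged harness for the kind of empirical complexity check behind these plots: time training runs at several data sizes and estimate the growth exponent from a log-log fit. scikit-learn's SVC is only a stand-in for the model being timed, since the VCA implementation is not included here:

```python
# Toy empirical-complexity check: fit a model at increasing sample sizes,
# time each fit, and read off the growth exponent from a log-log fit.
import time
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(7)
sizes, times = [500, 1000, 2000, 4000], []
for n in sizes:
    x = rng.normal(size=(n, 10))                 # fixed 10-D data
    y = (x[:, 0] + x[:, 1] > 0).astype(int)      # simple separable labels
    t0 = time.perf_counter()
    SVC(kernel="rbf").fit(x, y)
    times.append(time.perf_counter() - t0)

slope = np.polyfit(np.log(sizes), np.log(times), 1)[0]
print(f"empirical growth exponent ~ {slope:.2f}")   # ~2 suggests quadratic
```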
3. VCA: Conclusions
1. Without an increase in the misclassification rate (that is, without loss of generalization capability), much less computational effort is required than for SVM (approximately 0.7% of the computation required by SVM, i.e., roughly a 142-fold gain in efficiency). That is, VCA has very high computational efficiency.
2. VCA's training time increases linearly with the dimension and quadratically with the size of the training data. This shows that VCA has very good scalability.
4. Theory: Visual classification machine
- Formalization (learning theory)
- Let Z = X × Y be the sample space (X the pattern space and Y the label space), and assume that there exists a fixed but unknown relationship F between X and Y (or, equivalently, a fixed but unknown distribution ρ on Z).
- Given a family of functions H,
- and a finite number of samples z_1, ..., z_n, with z_i = (x_i, y_i),
- drawn independently and identically according to ρ.
4. Theory: Visual classification machine
We are asked to find a function in H which approximates F on Z; that is, for a certain measure Q of the discrepancy between the machine's output f(x) and the actual output y, to find a function f_H in H that minimizes the risk R(f), the expected value of Q(f(x), y) under ρ (the risk is also called the generalization error). This is the learning problem.
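The displayed formulas of this slide are not reproduced in the source; the following is a standard statement consistent with the wording above, using the notation ρ, H, Q, f_H filled in earlier:

```latex
% Risk and learning problem (standard form consistent with the slide text)
R(f) \;=\; \int_{Z} Q\bigl(f(x),\,y\bigr)\,d\rho(x,y)
\qquad\text{(risk, or generalization error)},
\\[6pt]
\text{find } f_{H}\in H \ \text{ such that } \
R(f_{H}) \;=\; \min_{f\in H} R(f)
\qquad\text{(learning problem)}.
```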
4. Theory: Visual classification machine
- Learning algorithm (convergence)
- A learning algorithm L is a mapping from the set of samples Z to H with the following property:
- For any ε > 0 and δ ∈ (0, 1), there is an integer N(ε, δ) such that, whenever n ≥ N(ε, δ), the risk of L(Z) exceeds the optimal risk R(f_H) by at most ε with probability at least 1 − δ.
- In this case, we say that L(Z) is an (ε, δ)-solution of the learning problem. Given an implementation scheme of a learning problem, we say it is convergent if it is a learning algorithm.
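Likewise, the convergence property can be written in the standard (ε, δ) form consistent with the slide wording (Z = {z_1, ..., z_n} denotes the i.i.d. sample); this is a reconstruction, not a verbatim reproduction of the slide:

```latex
% A learning algorithm L: for every accuracy/confidence pair there is a
% sample size beyond which L(Z) is an (epsilon, delta)-solution.
\forall\,\varepsilon>0,\ \delta\in(0,1)\ \ \exists\,N(\varepsilon,\delta)
\ \text{ such that } \ n\ge N(\varepsilon,\delta)\ \Longrightarrow\
\Pr\Bigl\{\,R\bigl(L(Z)\bigr)-R(f_{H})\le\varepsilon\,\Bigr\}\ \ge\ 1-\delta .
```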
4. Theory: Visual classification machine
- Visual classification machine (VCM)
- The function set
- The generalization error
- The learning implementation scheme (the procedure of finding the classifier): is it a learning algorithm?
4. Theory: Visual classification machine
- Learning theory of VCM
- How can the generalization performance of VCM be controlled (what is the learning principle)?
- Is it convergent (is it a learning algorithm)?
The key is to develop a rigorous upper bound on the gap between the actual risk and the optimal risk.
4. Theory: Visual classification machine
- This theorem shows that maximizing the generalization capability of the machine is equivalent to minimizing this upper bound.
4. Theory: Visual classification machine
- VCA is designed precisely to minimize this upper bound. This reveals the learning principle behind VCA and explains why VCA has strong generalization capability.
4. Theory: Visual classification machine
- This theorem shows that VCA is a learning algorithm. Consequently, a learning theory of VCM is established.
5. Concluding remarks
- The existing approaches to classification have mainly been aimed at exploring the intrinsic structure of the dataset, with little or no emphasis on simulating human sensation and perception. We have initiated an approach to classification based on the principles of human visual sensation and perception (the core idea is to model the blurring effect of lateral retinal interconnections based on scale space theory). The preliminary simulations have demonstrated that the new approach is encouraging and potentially very useful.
- The main advantages of the new approach are its very high efficiency and excellent scalability. It very often brings a significant reduction in computational effort without loss of prediction capability, especially compared with the prevalently adopted SVM approach.
- The theoretical foundations of VCA, a visual learning theory, have been developed. They reveal that (1) VCA attains its high generalization performance by minimizing the upper bound on the error between the actual and optimal risks (the learning principle), and (2) VCA is a learning algorithm.
5. Concluding remarks
- Many problems deserve further research:
- To apply nonlinear scale space theory for further efficiency speed-up
- To apply VCA to practical engineering problems (e.g., DNA sequence analysis)
- To develop a visual learning theory for the regression problem, etc.
Thanks!