Title: A New Approach To The Multiclass Classification Problem
1. A New Approach To The Multiclass Classification Problem
2. Agenda
- Problem
- Motivation
- Discussion
- Preliminary Results
3. Classification Problem (Problem)
- Multi-class classification through binary classification
  - One-vs-All
  - One-vs-One
- Multi-class classification can often be constructed as a generalization of binary classification
- In practice, multi-class classification is done by combining binary classifiers
4. Multiclass Applications: Large Category Spaces (Problem)
- Object recognition
- Automated protein classification
- Digit recognition (http://www.glue.umd.edu/zhelin/recog.html)
- Phoneme recognition (Waibel, Hanzawa, Hinton, Shikano, Lang, 1989)
- Large category spaces: on the order of 300-600 categories
- Multi-class algorithms are computationally expensive at this scale
5. Other Multiclass Applications (Problem)
- Handwriting recognition (e.g., USPS)
- Text classification
- Face detection
- Facial expression recognition
6. Classification Setup (Problem)
- Training and test data are drawn i.i.d. from a fixed but unknown probability distribution D
- Labeled training set (x_1, y_1), ..., (x_n, y_n)
- Question: design a classification rule y = f(x) such that, given a new x, f predicts y with minimal probability of error
7. Support Vector Machines (SVMs) (Problem)
- Training examples are mapped to a (usually high-dimensional) feature space by a feature map F(x) = (F_1(x), ..., F_d(x))
- Learn a linear decision boundary in that space
- Trade-off between maximizing the geometric margin of the training data and minimizing margin violations (written out below)
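Written out in the standard soft-margin form (a textbook statement, not copied from the slides):

    \min_{w,\,b,\,\xi}\;\; \tfrac{1}{2}\lVert w\rVert^2 + C \sum_{i=1}^{n} \xi_i
    \quad \text{s.t.} \quad y_i\bigl(\langle w, F(x_i)\rangle + b\bigr) \ge 1 - \xi_i,\;\; \xi_i \ge 0,

where the first term corresponds to maximizing the geometric margin 2/||w|| and the slack variables xi_i measure margin violations, traded off by C.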
8. Definition of SVM Classifiers (Problem)
- Linear classifier defined in feature space by f(x) = <w, F(x)> + b
- The SVM solution gives the weight vector w = sum_i alpha_i y_i F(x_i) and the bias b, i.e., w is a linear combination of support vectors, a subset of the training vectors (an evaluation sketch follows)
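As an illustration (my sketch; the kernel choice and parameter names are made up, not from the deck), the decision function can be evaluated directly from the support-vector expansion:

    import numpy as np

    def rbf_kernel(x1, x2, gamma=0.5):
        # Gaussian RBF kernel K(x1, x2) = exp(-gamma * ||x1 - x2||^2).
        return np.exp(-gamma * np.sum((x1 - x2) ** 2))

    def svm_decision(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
        # f(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors;
        # sign(f(x)) is the predicted binary label.
        return sum(a * y * kernel(sv, x)
                   for a, y, sv in zip(alphas, labels, support_vectors)) + b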
9. Definition of a Margin (Problem)
- History (Vapnik, 1965): if the data are linearly separable
- Place the hyperplane far from the data: a large margin
10. Maximize the Margin (Problem)
- History (Vapnik, 1965): if the data are linearly separable
- Place the hyperplane far from the data: a large margin
- A large margin classifier
  - leads to good generalization (performance on test sets)
11. Combining Binary Classifiers (Problem)
- One-vs-All (OVA)
  - For each class, build a classifier for that class vs. the rest
  - Constructs k SVM models
  - Often very imbalanced classifiers
  - Asymmetry in the amount of training data
  - Earliest implementation for SVM multiclass
- One-vs-One (OVO)
  - Constructs k(k-1)/2 classifiers
  - Rooted binary DAG of SVMs with k leaves; traverse it to reach a leaf node
(Both reductions are sketched in code below.)
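Both reductions are available off the shelf; a minimal scikit-learn sketch (my example, not from the deck):

    from sklearn.datasets import load_digits
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)   # 10-class digit recognition

    # One-vs-All: k binary SVMs, each separating one class from the rest.
    ova = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)

    # One-vs-One: k(k-1)/2 binary SVMs, one per pair of classes.
    ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)

    print(ova.predict(X[:5]), ovo.predict(X[:5]))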
12. Example 1 (Motivation)
- Race categories: White, Black, Asian
- Task: map the image training set to the race labels
  - Training (learning)
  - Test (generalization)
- Scenario: an ambiguous test image is presented
  - A mixed-race person
  - A person drawn from a race that is not represented by the system (e.g., Hispanics, Native Americans, etc.)
- No way of assigning a mixed label
  - The system cannot represent the mixed-race person using a combination of categories
  - No way of representing an unknown race
- Possible solution
  - Indicate that the incoming image is outside the margin of each learned category
13. Example 2 (Motivation)
- Musical samples generated by a single instrument
  - Electric guitar: a set of note categories C, C#, D, D#, etc.
- Task: map the training set musical notes to the labels
  - Reasonable learning and generalization properties
- Scenario: given musical sequences
  - Intervals (two notes simultaneously struck, such as C and F#)
  - Chords (containing three or more notes)
- Ambiguity at the training set level
  - Forced to assign new labels to intervals and chords even though they contain the same features (single notes) as the note categories
- Music sequence case: suppose we learned a conditional probability distribution p(L | x)
  - x is a music sequence and L in {C, C#, D, ..., B} is a set of note labels
  - When x is an interval, say a tritone, there is no way of assigning high probability to the tritone
- Possible solution
  - Accommodate the tritone by assigning it a new label
  - Large label space: truncate because of exponential size considerations
14. Problems with Combining Binary Classifiers (Motivation)
- Categories are conceived as nominal labels
  - No underlying geometry for the categories
- Inability of the conditional distribution to give us a measure (value) for interpolated categories
  - Non-represented interpolated categories are left out
- Not easy to distinguish basic categories from compound categories
15. Category Vector Spaces: Solution (Motivation)
- Invoke the notion of a category vector space
  - Categories are defined with a geometric structure
  - Assume that the set of categories (labels) forms a vector space
  - A music sequence would correspond to a label in a twelve-dimensional vector space {C, C#, D, D#, E, F, F#, G, G#, A, A#, B}
  - Each basic note (C, C#, D, etc.) would have its own coordinate axis
- Learning problem
  - Map the training set music sequences to vectors in the 12-dimensional space such that the training and test set errors are small
  - Map the training musical sequences to the 12-dimensional vector space and then (if a support vector machine approach is used) maximize the margin of the mapped vectors in the category space
- The race classification example is analogous
  - Depends on how many races we wish to explicitly represent
  - Map the training set to the race category vector space and maximize the margin
- Generalization problem
  - Map a test set musical sequence or image into the category space, then ask whether it lies within the margin of a note (or chord) or race category

Note: extensions to other multi-category learning applications are straightforward, assuming we can map category labels to coordinate axes. (A small sketch of the label-to-vector mapping follows.)
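To make the label-to-vector idea concrete, here is a small sketch (mine, with ad hoc names) that places basic notes on one-hot axes of a 12-dimensional space and builds compound categories as superpositions:

    import numpy as np

    NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

    def note_vector(note):
        # Each basic note gets its own coordinate axis in the category space.
        v = np.zeros(len(NOTES))
        v[NOTES.index(note)] = 1.0
        return v

    def compound_vector(notes):
        # An interval or chord is a superposition of basic note vectors.
        return sum(note_vector(n) for n in notes)

    tritone = compound_vector(["C", "F#"])       # interval: two notes
    c_major = compound_vector(["C", "E", "G"])   # chord: three notes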
16. Multiclass Fisher: Related Idea (Discussion)
Given the feature vectors, D categories, and a projected set of features y = W^T x, the multiclass Fisher linear discriminant (MC-FLD) maximizes

    J(W) = tr( (W^T S_W W)^{-1} (W^T S_B W) ),

where S_W and S_B are the within-class and between-class scatter matrices.

Solution: the columns of W are the top D eigenvectors (corresponding to the largest eigenvalues) of S_W^{-1} S_B.
- Eigenvectors are orthonormal
- Columns of W constitute a category vector space
- Interpret y = W^T x as a category space projection
- Optimal solution is a set of orthogonal weight vectors
(A numpy sketch follows.)
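A generic numpy sketch of this construction (textbook MC-FLD with the usual scatter-matrix definitions; all names below are mine):

    import numpy as np
    from scipy.linalg import eigh

    def mc_fld(X, y, D):
        # X: (n, M) features; y: (n,) integer labels; D: projected dimensions.
        mu = X.mean(axis=0)
        M = X.shape[1]
        Sw = np.zeros((M, M))   # within-class scatter
        Sb = np.zeros((M, M))   # between-class scatter
        for c in np.unique(y):
            Xc = X[y == c]
            mc = Xc.mean(axis=0)
            Sw += (Xc - mc).T @ (Xc - mc)
            Sb += len(Xc) * np.outer(mc - mu, mc - mu)
        Sw += 1e-8 * np.eye(M)  # small ridge so Sw is invertible
        # Generalized eigenproblem Sb v = lambda Sw v, i.e. eigenvectors of
        # Sw^{-1} Sb; keep the eigenvectors of the top-D eigenvalues.
        vals, vecs = eigh(Sb, Sw)
        return vecs[:, np.argsort(vals)[::-1][:D]]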
17. Disadvantage of Multiclass Fisher (Discussion)
- We avoided this approach since margins are not maximized in category space
- We have not seen a classifier take a three-class problem with labels {0, 1, 2}, map the input features into a vector space with basis vectors (1,0,0)^T, (0,1,0)^T, and (0,0,1)^T, and attempt to maximize the margin in the category vector space
- Nor have we seen previous work where a pattern from a compound category (say, a combination of labels 1 and 2) is also used in training, with a conversion of the compound category to a vector
18. Description of Category Vector Spaces (Discussion)
- Input feature vectors are mapped to the category vector space using a kernel-based approach
- In the category vector space, maximizing the margin is equivalent to forming hypercones (a toy membership test follows)
  - Mapped feature vectors that lie inside a hypercone have a distinct class label
  - Mapped vectors that lie in between hypercones are ambiguous
  - Hypercones are not allowed to intersect
[Figure: hypercones depicting the basic categories]
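One way to read the hypercone picture in code (a toy sketch; the half-angle parameter is my stand-in for the margin width):

    import numpy as np

    def in_hypercone(y_mapped, category_vec, half_angle_deg=20.0):
        # Membership test: the mapped vector belongs to a category if its
        # angle to that category vector is within the hypercone half-angle.
        cos_sim = (y_mapped @ category_vec) / (
            np.linalg.norm(y_mapped) * np.linalg.norm(category_vec))
        angle = np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0)))
        return angle <= half_angle_deg

A vector falling inside exactly one hypercone gets that class label; one lying between hypercones is reported as ambiguous.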
19. Advantages of Category Vector Spaces (Discussion)
- Each pattern now exists as a linear superposition of category vectors in the category space
  - Ensures ambiguity is handled at a fundamental level
- Compound categories can be directly represented in the category space
  - We can maximize the compound category margin as well as the margins for the basic categories
20. Technical Challenges (Discussion)
- Regression
  - Each input training set feature vector x_n in R^M must be mapped to a corresponding point y_n in R^D, where M is the number of feature dimensions and D the cardinality of the basic categories
- Classification
  - Each mapped feature vector must maximize its margin relative to its own category vector against the other category vectors; here y_n is known and corresponds to a category vector
21. Regression in Category Space (Discussion)
- epsilon controls the width of the interval for which there is no penalty
- Slack variable vectors are non-negative component-wise
- The weight vector and bias help map the feature vector to its category space counterpart
- The choice of kernel K (GRBF or otherwise) is hidden in the operator that implements inner products by projecting vectors in R^M into a suitable space
- The regularization parameter lambda weighs the norm of W against the data fitting error; the larger the value of lambda, the greater the emphasis on the data fitting error
(A stand-in sketch follows.)
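The deck's own regression objective appeared as a slide image and is not reproduced here; as a rough stand-in with the same ingredients (an epsilon-insensitive tube, an RBF kernel, and a fit/regularization trade-off), one can fit one support vector regressor per category dimension:

    import numpy as np
    from sklearn.multioutput import MultiOutputRegressor
    from sklearn.svm import SVR

    # Toy data: X holds (n, M) input features, Y holds (n, D) category-space
    # targets, here one-hot rows for three basic categories.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 30))
    Y = np.eye(3)[rng.integers(0, 3, 100)]

    # epsilon: width of the no-penalty tube; C: weight on the data fitting error.
    reg = MultiOutputRegressor(SVR(kernel="rbf", epsilon=0.1, C=10.0)).fit(X, Y)
    Y_mapped = reg.predict(X)   # points in the category vector space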
22. Classification in Category Space (Discussion)
- Associate each mapped vector with a category vector
- Category vectors
  - can be basis vectors (axes corresponding to basic categories) in the category space
  - or ordinary vectors (corresponding to compound categories)
- In this definition of membership, no distinction is made between basic and compound categories (a toy scoring sketch follows)
- We seek to maximize the margin in the category space
  - Minimizing the norm of the mapped vectors is equivalent to maximizing the margin, provided the margin inequalities can be satisfied
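Under this definition, membership scoring treats basic and compound category vectors uniformly; a toy sketch (names and numbers are mine):

    import numpy as np

    def classify(y_mapped, category_vectors):
        # category_vectors: dict name -> vector; basic categories are axes,
        # compound categories are ordinary (superposed) vectors.
        scores = {name: y_mapped @ (v / np.linalg.norm(v))
                  for name, v in category_vectors.items()}
        return max(scores, key=scores.get)

    cats = {"White": np.array([1.0, 0.0, 0.0]),
            "Black": np.array([0.0, 1.0, 0.0]),
            "Asian": np.array([0.0, 0.0, 1.0])}
    print(classify(np.array([0.9, 0.2, 0.1]), cats))   # -> White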
23. Integrated Classification and Regression Objective Function (Discussion)
- The objective function is designed so that we obtain an integrated dual classification and regression objective
24. Multi-Category GRBF (Preliminary Results)
- Gaussian radial basis function (GRBF) classifier with multiple outputs
  - One for each basic category
- Given a training set of registered and cropped face images
  - Labels are White, Black, Asian
- The GRBF classifier maps the input feature vectors into the category space
- Since we know the label of each training set pattern, we approximate the mapped category space (a sketch of one such solution follows)
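One common closed-form fit with these ingredients (my reconstruction via regularized kernel least squares; the deck's exact solution may differ):

    import numpy as np

    def grbf_fit(X, Y, gamma=0.5, lam=1e-3):
        # Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
        d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        K = np.exp(-gamma * d2)
        # Regularized least squares to the one-hot label matrix Y:
        # A = (K + lam I)^{-1} Y gives the GRBF output weights.
        return np.linalg.solve(K + lam * np.eye(len(X)), Y)

    def grbf_map(X_train, A, X_new, gamma=0.5):
        # Map new inputs into the category space spanned by the labels.
        d2 = np.sum((X_new[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
        return np.exp(-gamma * d2) @ A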
25. Experimental Setup (Preliminary Results)
- 45 training images from the Labeled Faces in the Wild image database
  - The database contains over 13,000 images that were captured using the Viola-Jones face detector
  - Each face has been labeled with the name of the person
  - Of the 5,749 people featured in the database, 1,680 individuals have multiple images, with each image being unique
- In the 45 training images, 15 were from each of the three races considered
- The 45 images were registered to one standard image (after first converting them to grayscale) using a landmark-based thin-plate spline (TPS)
- The landmarks used were
  - three (3) for each eye
  - two (2) for the nose
  - two (2) for the two ears (very approximate, since the ears are often not visible)
- After registration, the images were cropped and resized to 130x90, with the intensity scale adjusted to [0, 1]
- Free parameter values were chosen carefully but qualitatively to get a good training set separation in category space
- Category basis: White y1 = (1,0,0)^T; Black y2 = (0,1,0)^T; Asian y3 = (0,0,1)^T
26. Race Classification: Training Images (Preliminary Results)
[Figure: training set images. Top row: Asian; middle row: Black; bottom row: White]
27. Category Space for Training Images (Preliminary Results)
[Figure: training set images mapped into the category vector space]
28. Race Classification: Testing Images (Preliminary Results)
[Figure: test set images. Top row: Asian; middle row: Black; bottom row: White]
- 51 test set images (17 Asian, 16 Black, 18 White)
- Used the weights discovered by the GRBF classifier to map the input test set images into the category space
29. Category Space for Testing Images (Preliminary Results)
- The graph shows the separation of the test set images in the category space
30. Pairwise Projections of Category Space Testing Images (Preliminary Results)
- The pairwise separations in the category space give an improved visualization (sketched below)
- One could in fact draw separating boundaries in the three pairwise comparisons and obtain an overall decision boundary in 3D
- Pairwise classifications
  - Roughly separate each pair by drawing lines through the origin
  - Remove the orthogonal subspace that is not being compared
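The pairwise views are just coordinate projections of the 3-D category space onto pairs of category axes; a self-contained matplotlib sketch with toy data (the real mapped vectors are not reproduced here):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    labels = rng.integers(0, 3, 60)                      # toy class labels
    Y_mapped = np.eye(3)[labels] + 0.15 * rng.standard_normal((60, 3))

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for ax, (i, j) in zip(axes, [(0, 1), (0, 2), (1, 2)]):
        # Drop the orthogonal axis that is not being compared.
        ax.scatter(Y_mapped[:, i], Y_mapped[:, j], c=labels)
        ax.axline((0.0, 0.0), slope=1.0, ls="--")        # line through origin
        ax.set_xlabel(f"category axis {i}")
        ax.set_ylabel(f"category axis {j}")
    plt.show()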
31. Ambiguity Testing (Preliminary Results)
- Nine ambiguous (from our perspective) faces
- Wanted to exhibit the tolerance of ambiguity that is a hallmark of category spaces
- The conclusion drawn from the result is a subjective one

[Figure: ambiguous faces mapped into the category space; note how they cluster together]
32. Experiment with MPEG-7 Database (Preliminary Results)
[Figure: MPEG-7 shape classes: Butterfly, Bat, Bird]
33. Experiment with MPEG-7 Database (Preliminary Results)
[Figure: MPEG-7 shape classes: Fly, Chicken, Batbird]
34. 3-Class Training (Preliminary Results)
35. 3-Class Testing (Preliminary Results)
36. 4-Class Training (Preliminary Results)
37. 4-Class Testing (Preliminary Results)
38. Summary
- The fundamental contribution is the learning of category spaces from patterns
  - Ensures ambiguity is handled at a fundamental level
  - Compound categories can be directly represented in the category space
- The specific approach integrates classification and regression (iCAR)
  - Combines a regression objective function (map the patterns)
  - with a maximum-margin objective function (perform multicategory classification in category space)
39. Questions & Discussion
Thank You
40. References
[1] H. Guo. Diffeomorphic point matching with applications in medical image analysis. PhD thesis, University of Florida, Gainesville, FL, 2005.
[2] J. Zhang. New information theoretic distance measures and algorithms for multimodality image registration. PhD thesis, University of Florida, Gainesville, FL, 2005.
[3] A. A. Kumthekar. Affine image registration using minimum spanning tree entropies. Master's thesis, University of Florida, Gainesville, FL, 2004.
[4] A. Rajwade, A. Banerjee, and A. Rangarajan. A new method of probability density estimation with application to mutual information-based image registration. In IEEE Computer Vision and Pattern Recognition (CVPR), volume 2, pages 1769-1776, 2006.
[5] A. Peter and A. Rangarajan. A new closed form information metric for shape analysis. In Medical Image Computing and Computer Assisted Intervention (MICCAI part 1), Springer LNCS 4190, pages 249-256, 2006.
[6] A. S. Roy, A. Gopinath, and A. Rangarajan. Deformable density matching for 3D non-rigid registration of shapes. In Medical Image Computing and Computer Assisted Intervention (MICCAI part 1), Springer LNCS 4791, pages 942-949, 2007.
[7] F. Wang, B. Vemuri, and A. Rangarajan. Groupwise point pattern registration using a novel CDF-based Jensen-Shannon divergence. In IEEE Computer Vision and Pattern Recognition (CVPR), volume 1, pages 1283-1288, 2006.
[8] L. Garcin, A. Rangarajan, and L. Younes. Non-rigid registration of shapes via diffeomorphic point matching and clustering. In IEEE Conf. on Image Processing, volume 5, pages 3299-3302, 2004.
[9] F. Wang, B.C. Vemuri, A. Rangarajan, I.M. Schmalfuss, and S.J. Eisenschenk. Simultaneous nonrigid registration of multiple point sets and atlas construction. In European Conference on Computer Vision (ECCV), pages 551-563, 2006.
[10] H. Guo, A. Rangarajan, and S. Joshi. 3D diffeomorphic shape registration on hippocampal datasets. In James S. Duncan and Guido Gerig, editors, Medical Image Computing and Computer Assisted Intervention (MICCAI), pages 984-991, 2005.
41. References
[11] A. Rangarajan, J. Coughlan, and A. L. Yuille. A Bayesian network framework for relational shape matching. In IEEE Intl. Conf. Computer Vision (ICCV), volume 1, pages 671-678, 2003.
[12] J. Zhang and A. Rangarajan. Multimodality image registration using an extensible information metric and high dimensional histogramming. In Information Processing in Medical Imaging, pages 725-737, 2005.
[13] J. Zhang and A. Rangarajan. Affine image registration using a new information metric. In IEEE Computer Vision and Pattern Recognition (CVPR), volume 1, pages 848-855, 2004.
[14] J. Zhang and A. Rangarajan. A unified feature based registration method for multimodality images. In IEEE International Symposium on Biomedical Imaging (ISBI), pages 724-727, 2004.
[15] A. Peter and A. Rangarajan. Shape matching using the Fisher-Rao Riemannian metric: Unifying shape representation and deformation. In IEEE International Symposium on Biomedical Imaging (ISBI), pages 1164-1167, 2006.
[16] A. Rajwade, A. Banerjee, and A. Rangarajan. Continuous image representations avoid the histogram binning problem in mutual information-based registration. In IEEE International Symposium on Biomedical Imaging (ISBI), pages 840-844, 2006.
[17] H. Guo, A. Rangarajan, S. Joshi, and L. Younes. A new joint clustering and diffeomorphism estimation algorithm for non-rigid shape matching. In Chandra Khambametteu, editor, IEEE CVPR Workshop on Articulated and Non-rigid Motion (ANM), pages 16-22, 2004.
[18] H. Guo, A. Rangarajan, S. Joshi, and L. Younes. Non-rigid registration of shapes via diffeomorphic point matching. In IEEE Intl. Symposium on Biomedical Imaging (ISBI), volume 1, pages 924-927, 2004.
[19] H. Guo, A. Rangarajan, and S. Joshi. Diffeomorphic point matching. In N. Paragios, Y. Chen, and O. Faugeras, editors, The Handbook of Mathematical Models in Computer Vision, pages 205-220, 2005.
[20] A. Peter and A. Rangarajan. Maximum likelihood wavelet density estimation with applications to image and shape matching. IEEE Trans. Image Processing, 2007. (accepted subject to minor revision).
42. References
[21] F. Wang, B.C. Vemuri, A. Rangarajan, and S.J. Eisenschenk. Simultaneous nonrigid registration of multiple point sets and atlas construction. IEEE Trans. Pattern Analysis and Machine Intelligence, 2007. (in press).
[22] A. Peter and A. Rangarajan. Information geometry for landmark shape analysis: Unifying shape representation and deformation. IEEE Trans. Pattern Analysis and Machine Intelligence, 2007. (revised and resubmitted).
[23] A. Rajwade, A. Banerjee, and A. Rangarajan. Probability density estimation using isocontours and isosurfaces: Applications to information theoretic image registration. IEEE Trans. Pattern Analysis and Machine Intelligence, 2007. (under revision).
[24] A. Peter and A. Rangarajan. Shape L'Ane Rouge: Sliding wavelets for indexing and retrieval. In IEEE Computer Vision and Pattern Recognition (CVPR), 2008. (submitted).
[25] A. Rajwade, A. Banerjee, and A. Rangarajan. New image-based density estimators for 3D intermodality image registration. In IEEE Computer Vision and Pattern Recognition (CVPR), 2008. (submitted).
[26] A. Rangarajan and H. Chui. Applications of optimizing neural networks in medical image registration. In Artificial Neural Networks in Medicine and Biology (ANNIMAB), Perspectives in Neural Computing, pages 99-104. Springer, 2000.
[27] A. Rangarajan and H. Chui. A mixed variable optimization approach to non-rigid image registration. In Discrete Mathematical Problems with Medical Applications, volume 55 of DIMACS Series in Discrete Mathematics and Computer Science, pages 105-123. American Mathematical Society, 2000.
[28] H. Chui and A. Rangarajan. A new algorithm for non-rigid point matching. In Proceedings of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2000), volume 2, pages 44-51. IEEE Press, 2000.
[29] H. Chui and A. Rangarajan. A feature registration framework using mixture models. In IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA), pages 190-197. IEEE Press, 2000.
[30] H. Chui, L. Win, J. Duncan, R. Schultz, and A. Rangarajan. A unified feature registration method for brain mapping. In Information Processing in Medical Imaging (IPMI), pages 300-314. Springer, 2001.
43. References
[31] A. Rangarajan. Learning matrix space image representations. In Energy Minimization Methods for Computer Vision and Pattern Recognition (EMMCVPR), LNCS 2134, pages 153-168. Springer, New York, 2001.
[32] A. Rangarajan, H. Chui, and E. Mjolsness. A relationship between spline-based deformable models and weighted graphs in non-rigid matching. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages I:897-904. IEEE Press, 2001.
[33] H. Chui and A. Rangarajan. Learning an atlas from unlabeled point-sets. In IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA), pages 58-65. IEEE Press, 2001.
[34] H. Chui and A. Rangarajan. A new joint point clustering and matching algorithm for estimating nonrigid deformations. In Intl. Conf. on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS), pages I:309-315. CSREA Press, 2002.
[35] A. Rangarajan and A. L. Yuille. MIME: Mutual information minimization and entropy maximization for Bayesian belief propagation. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 873-880, Cambridge, MA, 2002. MIT Press.
[36] A. L. Yuille and A. Rangarajan. The Concave-Convex procedure (CCCP). In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 1033-1040, Cambridge, MA, 2002. MIT Press.
[37] H. Chui, L. Win, J. Duncan, R. Schultz, and A. Rangarajan. A unified non-rigid feature registration method for brain mapping. Medical Image Analysis, 7(2):113-130, 2003.
[38] H. Chui and A. Rangarajan. A new point matching algorithm for non-rigid registration. Computer Vision and Image Understanding, 89(2-3):114-141, 2003.
[39] A. L. Yuille and A. Rangarajan. The Concave-Convex procedure (CCCP). Neural Computation, 15:915-936, 2003.
[40] H. Chui, A. Rangarajan, J. Zhang, and C.M. Leonard. Unsupervised learning of an atlas from unlabeled point-sets. IEEE Trans. Pattern Analysis and Machine Intelligence, 26(2):160-172, 2004.
[41] P. Gardenfors. Conceptual Spaces: The Geometry of Thought. MIT Press, 2000.
[42] J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems (NIPS), volume 12, pages 547-553. MIT Press, 2000.
44. References
[43] Y. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines, theory, and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 99:67-81, 2004.
[44] C.-W. Hsu and C.-J. Lin. A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Networks, 13(2):415-425, 2002.
[45] T. Kolb. Music Theory for Guitarists: Everything You Ever Wanted to Know But Were Afraid to Ask. Hal Leonard, 2005.
[46] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, second edition, 1990.
[47] S. Mika, G. Ratsch, and K.-R. Muller. A mathematical programming approach to the kernel Fisher algorithm. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 591-597. MIT Press, 2001.
[48] D. Widdows. Geometry and Meaning. Center for the Study of Language and Information, 2004.
[49] T. Jebara. Machine Learning: Discriminative and Generative. Kluwer Academic Publishers, 2003.
[50] V. Vapnik. Statistical Learning Theory. Wiley Interscience, 1998.
[51] B. Scholkopf, A. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12(5):1207-1245, 2000.
[52] M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211-244, 2001.
[53] U. Kressel. Pairwise classification and support vector machines. In Advances in Kernel Methods - Support Vector Learning, pages 255-268. MIT Press, 1999.
[54] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[55] J. Weston and C. Watkins. Multi-class support vector machines. Technical Report CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London, 1998.
[56] E. L. Allwein, R. E. Schapire, and Y. Singer. Reducing multiclass to binary: a unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113-141, 2001.
[57] J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods - Support Vector Learning, pages 185-208. MIT Press, 1999.
45. References
[58] L. Kaufman. Solving the quadratic programming problem arising in support vector classification. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 147-168. MIT Press, 1999.
[59] O. L. Mangasarian and D. R. Musicant. Lagrangian support vector machines. Journal of Machine Learning Research, 1(3):161-177, 2001.
[60] G. M. Fung and O. L. Mangasarian. A feature selection Newton method for support vector machine classification. Computational Optimization and Applications, 28:185-202, 2004.
[61] T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 169-184. MIT Press, 1999.
[62] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2(2):265-292, 2002.
[63] J. A. K. Suykens and J. Vandewalle. Multiclass least squares support vector machines. In International Joint Conference on Neural Networks, volume 2, pages 900-903, 1999.
[64] T. Joachims. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, volume 12, pages 217-226, 2006.
[65] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007. Available at http://vis-www.cs.umass.edu/lfw.
[66] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In IEEE Computer Vision and Pattern Recognition (CVPR), volume 1, pages 511-518, 2001.
[67] G. Wahba. Spline Models for Observational Data. SIAM, Philadelphia, PA, 1990.
[68] F. L. Bookstein. Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Trans. Patt. Anal. Mach. Intell., 11(6):567-585, June 1989.
[69] S. Ramaswamy, P. Tamayo, R. Rifkin, S. Mukherjee, C.-H. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J. P. Mesirov, T. Poggio, W. Gerald, M. Loda, E. S. Lander, and T. R. Golub. Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences (PNAS), 98(26):15149-15154, 2001.
46. References
[70] D. Lowe. Object recognition from local scale-invariant features. In IEEE International Conference on Computer Vision (ICCV), volume 2, pages 1150-1157, 1999.
[71] M. E. Tipping and C. M. Bishop. Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2):443-482, 1999.
[72] M. A. O. Vasilescu and D. Terzopoulos. Multilinear image analysis for facial recognition. In ICPR (2), pages 511-514, 2002.
[73] X. He, D. Cai, H. Liu, and J. Han. Image clustering with tensor representation. In H. Zhang, T. Chua, R. Steinmetz, M. S. Kankanhalli, and L. Wilcox, editors, ACM Multimedia, pages 132-140. ACM, 2005.
[74] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281-297. University of California Press, 1967.
[75] D. Titterington, A. Smith, and U. Makov. Statistical Analysis of Finite Mixture Distributions. John Wiley & Sons, 1985.
[76] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, September 1988.
[77] X. He, D. Cai, and P. Niyogi. Tensor subspace analysis. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 499-506. MIT Press, Cambridge, MA, 2006.
[78] R. J. Hathaway. Another interpretation of the EM algorithm for mixture distributions. Statistics and Probability Letters, 4:53-56, 1986.
[79] R. M. Neal and G. E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan, editor, Learning in Graphical Models, pages 355-370. Kluwer, 1998.
[80] A. L. Yuille and J. J. Kosowsky. Statistical physics algorithms that converge. Neural Computation, 6(3):341-356, May 1994.
[81] A. L. Yuille, P. Stolorz, and J. Utans. Statistical physics, mixtures of distributions, and the EM algorithm. Neural Computation, 6(2):334-340, March 1994.
47. References
[82] B. Leibe and B. Schiele. Analyzing appearance and contour based methods for object categorization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 409-415, Madison, WI, June 2003.
[83] G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. Technical Report CNS-TR-2007-001, California Institute of Technology, 2007.