Title: A New Approach To The Multiclass Classification Problem
1. A New Approach To The Multiclass Classification Problem
2. Agenda
- Problem
- Motivation
- Discussion
- Preliminary Results
3. Classification Problem (Problem)
- Multi-class classification through binary classification
  - One-vs-All
  - One-vs-One
- Multi-class classification can often be constructed as a generalization of binary classification
- In practice, multi-class classification is done by combining binary classifiers
4. Multiclass Applications: Large Category Spaces (Problem)
- Object recognition
- Automated protein classification
- Digit recognition (http://www.glue.umd.edu/zhelin/recog.html)
- Phoneme recognition (Waibel, Hanzawa, Hinton, Shikano, Lang, 1989)
- Large category spaces: on the order of 300-600 categories
- Multi-class algorithms are computationally expensive at this scale
5. Other Multiclass Applications (Problem)
- Handwriting recognition (e.g., USPS)
- Text classification
- Face detection
- Facial expression recognition
6. Classification Setup (Problem)
- Training and test data are drawn i.i.d. from a fixed but unknown probability distribution D
- Labeled training set (x_1, y_1), ..., (x_n, y_n)
- Question: design a classification rule y = f(x) such that, given a new x, f predicts y with minimal probability of error
7. Support Vector Machines (SVMs) (Problem)
- Training examples are mapped to a (usually high-dimensional) feature space by a feature map F(x) = (F_1(x), ..., F_d(x))
- Learn a linear decision boundary in that space
- Trade-off between maximizing the geometric margin of the training data and minimizing margin violations (written out below)
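Written out in the standard soft-margin form (a textbook statement, not copied from the slides):

    \min_{w,\,b,\,\xi}\;\; \tfrac{1}{2}\lVert w\rVert^2 + C \sum_{i=1}^{n} \xi_i
    \quad \text{s.t.} \quad y_i\bigl(\langle w, F(x_i)\rangle + b\bigr) \ge 1 - \xi_i,\;\; \xi_i \ge 0,

where the first term corresponds to maximizing the geometric margin 2/||w|| and the slack variables xi_i measure margin violations, traded off by C.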
8. Definition of SVM Classifiers (Problem)
- Linear classifier defined in feature space by f(x) = <w, F(x)> + b
- The SVM solution gives the weight vector w = sum_i alpha_i y_i F(x_i) and the bias b, i.e., w is a linear combination of support vectors, a subset of the training vectors (an evaluation sketch follows)
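As an illustration (my sketch; the kernel choice and parameter names are made up, not from the deck), the decision function can be evaluated directly from the support-vector expansion:

    import numpy as np

    def rbf_kernel(x1, x2, gamma=0.5):
        # Gaussian RBF kernel K(x1, x2) = exp(-gamma * ||x1 - x2||^2).
        return np.exp(-gamma * np.sum((x1 - x2) ** 2))

    def svm_decision(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
        # f(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors;
        # sign(f(x)) is the predicted binary label.
        return sum(a * y * kernel(sv, x)
                   for a, y, sv in zip(alphas, labels, support_vectors)) + b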
9. Definition of a Margin (Problem)
- History (Vapnik, 1965): if the data are linearly separable
- Place the hyperplane far from the data: a large margin
10. Maximize the Margin (Problem)
- History (Vapnik, 1965): if the data are linearly separable
- Place the hyperplane far from the data: a large margin
- A large margin classifier
  - leads to good generalization (performance on test sets)
11. Combining Binary Classifiers (Problem)
- One-vs-All (OVA)
  - For each class, build a classifier for that class vs. the rest
  - Constructs k SVM models
  - Often very imbalanced classifiers
  - Asymmetry in the amount of training data
  - Earliest implementation for SVM multiclass
- One-vs-One (OVO)
  - Constructs k(k-1)/2 classifiers
  - Rooted binary DAG of SVMs with k leaves; traverse it to reach a leaf node
(Both reductions are sketched in code below.)
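Both reductions are available off the shelf; a minimal scikit-learn sketch (my example, not from the deck):

    from sklearn.datasets import load_digits
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)   # 10-class digit recognition

    # One-vs-All: k binary SVMs, each separating one class from the rest.
    ova = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)

    # One-vs-One: k(k-1)/2 binary SVMs, one per pair of classes.
    ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)

    print(ova.predict(X[:5]), ovo.predict(X[:5]))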
12. Example 1 (Motivation)
- Race categories: White, Black, Asian
- Task: map the image training set to the race labels
  - Training (learning)
  - Test (generalization)
- Scenario: an ambiguous test image is presented
  - A mixed-race person
  - A person drawn from a race that is not represented by the system (e.g., Hispanics, Native Americans, etc.)
- No way of assigning a mixed label
  - The system cannot represent the mixed-race person using a combination of categories
  - No way of representing an unknown race
- Possible solution
  - Indicate that the incoming image is outside the margin of each learned category
13. Example 2 (Motivation)
- Musical samples generated by a single instrument
  - Electric guitar: a set of note categories C, C#, D, D#, etc.
- Task: map the training set musical notes to the labels
  - Reasonable learning and generalization properties
- Scenario: given musical sequences
  - Intervals (two notes simultaneously struck, such as C and F#)
  - Chords (containing three or more notes)
- Ambiguity at the training set level
  - Forced to assign new labels to intervals and chords even though they contain the same features (single notes) as the note categories
- Music sequence case: suppose we learned a conditional probability distribution p(L | x)
  - x is a music sequence and L in {C, C#, D, ..., B} is a set of note labels
  - When x is an interval, say a tritone, there is no way of assigning high probability to the tritone
- Possible solution
  - Accommodate the tritone by assigning it a new label
  - Large label space: truncate because of exponential size considerations
14. Problems with Combining Binary Classifiers (Motivation)
- Categories are conceived as nominal labels
  - No underlying geometry for the categories
- Inability of the conditional distribution to give us a measure (value) for interpolated categories
  - Non-represented interpolated categories are left out
- Not easy to distinguish basic categories from compound categories
15. Category Vector Spaces: Solution (Motivation)
- Invoke the notion of a category vector space
  - Categories are defined with a geometric structure
  - Assume that the set of categories (labels) forms a vector space
  - A music sequence would correspond to a label in a twelve-dimensional vector space {C, C#, D, D#, E, F, F#, G, G#, A, A#, B}
  - Each basic note (C, C#, D, etc.) would have its own coordinate axis
- Learning problem
  - Map the training set music sequences to vectors in the 12-dimensional space such that the training and test set errors are small
  - Map the training musical sequences to the 12-dimensional vector space and then (if a support vector machine approach is used) maximize the margin of the mapped vectors in the category space
- The race classification example is analogous
  - Depends on how many races we wish to explicitly represent
  - Map the training set to the race category vector space and maximize the margin
- Generalization problem
  - Map a test set musical sequence or image into the category space, then ask whether it lies within the margin of a note (or chord) or race category

Note: extensions to other multi-category learning applications are straightforward, assuming we can map category labels to coordinate axes. (A small sketch of the label-to-vector mapping follows.)
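To make the label-to-vector idea concrete, here is a small sketch (mine, with ad hoc names) that places basic notes on one-hot axes of a 12-dimensional space and builds compound categories as superpositions:

    import numpy as np

    NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

    def note_vector(note):
        # Each basic note gets its own coordinate axis in the category space.
        v = np.zeros(len(NOTES))
        v[NOTES.index(note)] = 1.0
        return v

    def compound_vector(notes):
        # An interval or chord is a superposition of basic note vectors.
        return sum(note_vector(n) for n in notes)

    tritone = compound_vector(["C", "F#"])       # interval: two notes
    c_major = compound_vector(["C", "E", "G"])   # chord: three notes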
16. Multiclass Fisher: Related Idea (Discussion)
Given the feature vectors, D categories, and a projected set of features y = W^T x, the multiclass Fisher linear discriminant (MC-FLD) maximizes

    J(W) = tr( (W^T S_W W)^{-1} (W^T S_B W) ),

where S_W and S_B are the within-class and between-class scatter matrices.

Solution: the columns of W are the top D eigenvectors (corresponding to the largest eigenvalues) of S_W^{-1} S_B.
- Eigenvectors are orthonormal
- Columns of W constitute a category vector space
- Interpret y = W^T x as a category space projection
- Optimal solution is a set of orthogonal weight vectors
(A numpy sketch follows.)
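A generic numpy sketch of this construction (textbook MC-FLD with the usual scatter-matrix definitions; all names below are mine):

    import numpy as np
    from scipy.linalg import eigh

    def mc_fld(X, y, D):
        # X: (n, M) features; y: (n,) integer labels; D: projected dimensions.
        mu = X.mean(axis=0)
        M = X.shape[1]
        Sw = np.zeros((M, M))   # within-class scatter
        Sb = np.zeros((M, M))   # between-class scatter
        for c in np.unique(y):
            Xc = X[y == c]
            mc = Xc.mean(axis=0)
            Sw += (Xc - mc).T @ (Xc - mc)
            Sb += len(Xc) * np.outer(mc - mu, mc - mu)
        Sw += 1e-8 * np.eye(M)  # small ridge so Sw is invertible
        # Generalized eigenproblem Sb v = lambda Sw v, i.e. eigenvectors of
        # Sw^{-1} Sb; keep the eigenvectors of the top-D eigenvalues.
        vals, vecs = eigh(Sb, Sw)
        return vecs[:, np.argsort(vals)[::-1][:D]]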
17. Disadvantage of Multiclass Fisher (Discussion)
- We avoided this approach since margins are not maximized in category space
- We have not seen a classifier take a three-class problem with labels {0, 1, 2}, map the input features into a vector space with basis vectors (1,0,0)^T, (0,1,0)^T, and (0,0,1)^T, and attempt to maximize the margin in the category vector space
- Nor have we seen previous work where a pattern from a compound category (say, a combination of labels 1 and 2) is also used in training, with a conversion of the compound category to a vector
18. Description of Category Vector Spaces (Discussion)
- Input feature vectors are mapped to the category vector space using a kernel-based approach
- In the category vector space, maximizing the margin is equivalent to forming hypercones (a toy membership test follows)
  - Mapped feature vectors that lie inside a hypercone have a distinct class label
  - Mapped vectors that lie in between hypercones are ambiguous
  - Hypercones are not allowed to intersect
[Figure: hypercones depicting the basic categories]
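One way to read the hypercone picture in code (a toy sketch; the half-angle parameter is my stand-in for the margin width):

    import numpy as np

    def in_hypercone(y_mapped, category_vec, half_angle_deg=20.0):
        # Membership test: the mapped vector belongs to a category if its
        # angle to that category vector is within the hypercone half-angle.
        cos_sim = (y_mapped @ category_vec) / (
            np.linalg.norm(y_mapped) * np.linalg.norm(category_vec))
        angle = np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0)))
        return angle <= half_angle_deg

A vector falling inside exactly one hypercone gets that class label; one lying between hypercones is reported as ambiguous.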
19. Advantages of Category Vector Spaces (Discussion)
- Each pattern now exists as a linear superposition of category vectors in the category space
  - Ensures ambiguity is handled at a fundamental level
- Compound categories can be directly represented in the category space
  - We can maximize the compound category margin as well as the margins for the basic categories
20. Technical Challenges (Discussion)
- Regression
  - Each input training set feature vector x_n in R^M must be mapped to a corresponding point y_n in R^D, where M is the number of feature dimensions and D the cardinality of the basic categories
- Classification
  - Each mapped feature vector must maximize its margin relative to its own category vector against the other category vectors; here y_n is known and corresponds to a category vector
21. Regression in Category Space (Discussion)
- epsilon controls the width of the interval for which there is no penalty
- Slack variable vectors are non-negative component-wise
- The weight vector and bias help map the feature vector to its category space counterpart
- The choice of kernel K (GRBF or otherwise) is hidden in the operator that implements inner products by projecting vectors in R^M into a suitable space
- The regularization parameter lambda weighs the norm of W against the data fitting error; the larger the value of lambda, the greater the emphasis on the data fitting error
(A stand-in sketch follows.)
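The deck's own regression objective appeared as a slide image and is not reproduced here; as a rough stand-in with the same ingredients (an epsilon-insensitive tube, an RBF kernel, and a fit/regularization trade-off), one can fit one support vector regressor per category dimension:

    import numpy as np
    from sklearn.multioutput import MultiOutputRegressor
    from sklearn.svm import SVR

    # Toy data: X holds (n, M) input features, Y holds (n, D) category-space
    # targets, here one-hot rows for three basic categories.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 30))
    Y = np.eye(3)[rng.integers(0, 3, 100)]

    # epsilon: width of the no-penalty tube; C: weight on the data fitting error.
    reg = MultiOutputRegressor(SVR(kernel="rbf", epsilon=0.1, C=10.0)).fit(X, Y)
    Y_mapped = reg.predict(X)   # points in the category vector space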
22. Classification in Category Space (Discussion)
- Associate each mapped vector with a category vector
- Category vectors
  - can be basis vectors (axes corresponding to basic categories) in the category space
  - or ordinary vectors (corresponding to compound categories)
- In this definition of membership, no distinction is made between basic and compound categories (a toy scoring sketch follows)
- We seek to maximize the margin in the category space
  - Minimizing the norm of the mapped vectors is equivalent to maximizing the margin, provided the margin inequalities can be satisfied
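Under this definition, membership scoring treats basic and compound category vectors uniformly; a toy sketch (names and numbers are mine):

    import numpy as np

    def classify(y_mapped, category_vectors):
        # category_vectors: dict name -> vector; basic categories are axes,
        # compound categories are ordinary (superposed) vectors.
        scores = {name: y_mapped @ (v / np.linalg.norm(v))
                  for name, v in category_vectors.items()}
        return max(scores, key=scores.get)

    cats = {"White": np.array([1.0, 0.0, 0.0]),
            "Black": np.array([0.0, 1.0, 0.0]),
            "Asian": np.array([0.0, 0.0, 1.0])}
    print(classify(np.array([0.9, 0.2, 0.1]), cats))   # -> White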
23. Integrated Classification and Regression Objective Function (Discussion)
- The objective function is designed so that we obtain an integrated dual classification and regression objective
24. Multi-Category GRBF (Preliminary Results)
- Gaussian radial basis function (GRBF) classifier with multiple outputs
  - One for each basic category
- Given a training set of registered and cropped face images
  - Labels are White, Black, Asian
- The GRBF classifier maps the input feature vectors into the category space
- Since we know the label of each training set pattern, we approximate the mapped category space (a sketch of one such solution follows)
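One common closed-form fit with these ingredients (my reconstruction via regularized kernel least squares; the deck's exact solution may differ):

    import numpy as np

    def grbf_fit(X, Y, gamma=0.5, lam=1e-3):
        # Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
        d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        K = np.exp(-gamma * d2)
        # Regularized least squares to the one-hot label matrix Y:
        # A = (K + lam I)^{-1} Y gives the GRBF output weights.
        return np.linalg.solve(K + lam * np.eye(len(X)), Y)

    def grbf_map(X_train, A, X_new, gamma=0.5):
        # Map new inputs into the category space spanned by the labels.
        d2 = np.sum((X_new[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
        return np.exp(-gamma * d2) @ A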
25. Experimental Setup (Preliminary Results)
- 45 training images from the Labeled Faces in the Wild image database
  - The database contains over 13,000 images that were captured using the Viola-Jones face detector
  - Each face has been labeled with the name of the person
  - Of the 5,749 people featured in the database, 1,680 individuals have multiple images, with each image being unique
- In the 45 training images, 15 were from each of the three races considered
- The 45 images were registered to one standard image (after first converting them to grayscale) using a landmark-based thin-plate spline (TPS)
- The landmarks used were
  - three (3) for each eye
  - two (2) for the nose
  - two (2) for the two ears (very approximate, since the ears are often not visible)
- After registration, the images were cropped and resized to 130x90, with the intensity scale adjusted to [0, 1]
- Free parameter values were chosen carefully but qualitatively to get a good training set separation in category space
- Category basis: White y1 = (1,0,0)^T; Black y2 = (0,1,0)^T; Asian y3 = (0,0,1)^T
26. Race Classification: Training Images (Preliminary Results)
[Figure: training set images. Top row: Asian; middle row: Black; bottom row: White]
27. Category Space for Training Images (Preliminary Results)
[Figure: training set images mapped into the category vector space]
28. Race Classification: Testing Images (Preliminary Results)
[Figure: test set images. Top row: Asian; middle row: Black; bottom row: White]
- 51 test set images (17 Asian, 16 Black, 18 White)
- Used the weights discovered by the GRBF classifier to map the input test set images into the category space
29. Category Space for Testing Images (Preliminary Results)
- The graph shows the separation of the test set images in the category space
30. Pairwise Projections of Category Space Testing Images (Preliminary Results)
- The pairwise separations in the category space give an improved visualization (sketched below)
- One could in fact draw separating boundaries in the three pairwise comparisons and obtain an overall decision boundary in 3D
- Pairwise classifications
  - Roughly separate each pair by drawing lines through the origin
  - Remove the orthogonal subspace that is not being compared
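The pairwise views are just coordinate projections of the 3-D category space onto pairs of category axes; a self-contained matplotlib sketch with toy data (the real mapped vectors are not reproduced here):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    labels = rng.integers(0, 3, 60)                      # toy class labels
    Y_mapped = np.eye(3)[labels] + 0.15 * rng.standard_normal((60, 3))

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for ax, (i, j) in zip(axes, [(0, 1), (0, 2), (1, 2)]):
        # Drop the orthogonal axis that is not being compared.
        ax.scatter(Y_mapped[:, i], Y_mapped[:, j], c=labels)
        ax.axline((0.0, 0.0), slope=1.0, ls="--")        # line through origin
        ax.set_xlabel(f"category axis {i}")
        ax.set_ylabel(f"category axis {j}")
    plt.show()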
31. Ambiguity Testing (Preliminary Results)
- Nine ambiguous (from our perspective) faces
- Wanted to exhibit the tolerance of ambiguity that is a hallmark of category spaces
- The conclusion drawn from the result is a subjective one

[Figure: ambiguous faces mapped into the category space; note how they cluster together]
32. Experiment with MPEG-7 Database (Preliminary Results)
[Figure: MPEG-7 shape classes: Butterfly, Bat, Bird]
33. Experiment with MPEG-7 Database (Preliminary Results)
[Figure: MPEG-7 shape classes: Fly, Chicken, Batbird]
34. 3-Class Training (Preliminary Results)
35. 3-Class Testing (Preliminary Results)
36. 4-Class Training (Preliminary Results)
37. 4-Class Testing (Preliminary Results)
38. Summary
- The fundamental contribution is the learning of category spaces from patterns
  - Ensures ambiguity is handled at a fundamental level
  - Compound categories can be directly represented in the category space
- The specific approach integrates classification and regression (iCAR)
  - Combines a regression objective function (map the patterns)
  - with a maximum-margin objective function (perform multicategory classification in category space)
39. Questions & Discussion
Thank You
40. References
[1] H. Guo. Diffeomorphic point matching with applications in medical image analysis. PhD thesis, University of Florida, Gainesville, FL, 2005.
[2] J. Zhang. New information theoretic distance measures and algorithms for multimodality image registration. PhD thesis, University of Florida, Gainesville, FL, 2005.
[3] A. A. Kumthekar. Affine image registration using minimum spanning tree entropies. Master's thesis, University of Florida, Gainesville, FL, 2004.
[4] A. Rajwade, A. Banerjee, and A. Rangarajan. A new method of probability density estimation with application to mutual information-based image registration. In IEEE Computer Vision and Pattern Recognition (CVPR), volume 2, pages 1769-1776, 2006.
[5] A. Peter and A. Rangarajan. A new closed form information metric for shape analysis. In Medical Image Computing and Computer Assisted Intervention (MICCAI part 1), Springer LNCS 4190, pages 249-256, 2006.
[6] A. S. Roy, A. Gopinath, and A. Rangarajan. Deformable density matching for 3D non-rigid registration of shapes. In Medical Image Computing and Computer Assisted Intervention (MICCAI part 1), Springer LNCS 4791, pages 942-949, 2007.
[7] F. Wang, B. Vemuri, and A. Rangarajan. Groupwise point pattern registration using a novel CDF-based Jensen-Shannon divergence. In IEEE Computer Vision and Pattern Recognition (CVPR), volume 1, pages 1283-1288, 2006.
[8] L. Garcin, A. Rangarajan, and L. Younes. Non-rigid registration of shapes via diffeomorphic point matching and clustering. In IEEE Conf. on Image Processing, volume 5, pages 3299-3302, 2004.
[9] F. Wang, B.C. Vemuri, A. Rangarajan, I.M. Schmalfuss, and S.J. Eisenschenk. Simultaneous nonrigid registration of multiple point sets and atlas construction. In European Conference on Computer Vision (ECCV), pages 551-563, 2006.
[10] H. Guo, A. Rangarajan, and S. Joshi. 3D diffeomorphic shape registration on hippocampal datasets. In James S. Duncan and Guido Gerig, editors, Medical Image Computing and Computer Assisted Intervention (MICCAI), pages 984-991, 2005.
41. References
[11] A. Rangarajan, J. Coughlan, and A. L. Yuille. A Bayesian network framework for relational shape matching. In IEEE Intl. Conf. Computer Vision (ICCV), volume 1, pages 671-678, 2003.
[12] J. Zhang and A. Rangarajan. Multimodality image registration using an extensible information metric and high dimensional histogramming. In Information Processing in Medical Imaging, pages 725-737, 2005.
[13] J. Zhang and A. Rangarajan. Affine image registration using a new information metric. In IEEE Computer Vision and Pattern Recognition (CVPR), volume 1, pages 848-855, 2004.
[14] J. Zhang and A. Rangarajan. A unified feature based registration method for multimodality images. In IEEE International Symposium on Biomedical Imaging (ISBI), pages 724-727, 2004.
[15] A. Peter and A. Rangarajan. Shape matching using the Fisher-Rao Riemannian metric: Unifying shape representation and deformation. In IEEE International Symposium on Biomedical Imaging (ISBI), pages 1164-1167, 2006.
[16] A. Rajwade, A. Banerjee, and A. Rangarajan. Continuous image representations avoid the histogram binning problem in mutual information-based registration. In IEEE International Symposium on Biomedical Imaging (ISBI), pages 840-844, 2006.
[17] H. Guo, A. Rangarajan, S. Joshi, and L. Younes. A new joint clustering and diffeomorphism estimation algorithm for non-rigid shape matching. In Chandra Khambametteu, editor, IEEE CVPR Workshop on Articulated and Non-rigid Motion (ANM), pages 16-22, 2004.
[18] H. Guo, A. Rangarajan, S. Joshi, and L. Younes. Non-rigid registration of shapes via diffeomorphic point matching. In IEEE Intl. Symposium on Biomedical Imaging (ISBI), volume 1, pages 924-927, 2004.
[19] H. Guo, A. Rangarajan, and S. Joshi. Diffeomorphic point matching. In N. Paragios, Y. Chen, and O. Faugeras, editors, The Handbook of Mathematical Models in Computer Vision, pages 205-220, 2005.
[20] A. Peter and A. Rangarajan. Maximum likelihood wavelet density estimation with applications to image and shape matching. IEEE Trans. Image Processing, 2007. (accepted subject to minor revision).
42. References
[21] F. Wang, B.C. Vemuri, A. Rangarajan, and S.J. Eisenschenk. Simultaneous nonrigid registration of multiple point sets and atlas construction. IEEE Trans. Pattern Analysis and Machine Intelligence, 2007. (in press).
[22] A. Peter and A. Rangarajan. Information geometry for landmark shape analysis: Unifying shape representation and deformation. IEEE Trans. Pattern Analysis and Machine Intelligence, 2007. (revised and resubmitted).
[23] A. Rajwade, A. Banerjee, and A. Rangarajan. Probability density estimation using isocontours and isosurfaces: Applications to information theoretic image registration. IEEE Trans. Pattern Analysis and Machine Intelligence, 2007. (under revision).
[24] A. Peter and A. Rangarajan. Shape L'Ane Rouge: Sliding wavelets for indexing and retrieval. In IEEE Computer Vision and Pattern Recognition (CVPR), 2008. (submitted).
[25] A. Rajwade, A. Banerjee, and A. Rangarajan. New image-based density estimators for 3D intermodality image registration. In IEEE Computer Vision and Pattern Recognition (CVPR), 2008. (submitted).
[26] A. Rangarajan and H. Chui. Applications of optimizing neural networks in medical image registration. In Artificial Neural Networks in Medicine and Biology (ANNIMAB), Perspectives in Neural Computing, pages 99-104. Springer, 2000.
[27] A. Rangarajan and H. Chui. A mixed variable optimization approach to non-rigid image registration. In Discrete Mathematical Problems with Medical Applications, volume 55 of DIMACS Series in Discrete Mathematics and Computer Science, pages 105-123. American Mathematical Society, 2000.
[28] H. Chui and A. Rangarajan. A new algorithm for non-rigid point matching. In Proceedings of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2000), volume 2, pages 44-51. IEEE Press, 2000.
[29] H. Chui and A. Rangarajan. A feature registration framework using mixture models. In IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA), pages 190-197. IEEE Press, 2000.
[30] H. Chui, L. Win, J. Duncan, R. Schultz, and A. Rangarajan. A unified feature registration method for brain mapping. In Information Processing in Medical Imaging (IPMI), pages 300-314. Springer, 2001.
43. References
[31] A. Rangarajan. Learning matrix space image representations. In Energy Minimization Methods for Computer Vision and Pattern Recognition (EMMCVPR), LNCS 2134, pages 153-168. Springer, New York, 2001.
[32] A. Rangarajan, H. Chui, and E. Mjolsness. A relationship between spline-based deformable models and weighted graphs in non-rigid matching. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages I:897-904. IEEE Press, 2001.
[33] H. Chui and A. Rangarajan. Learning an atlas from unlabeled point-sets. In IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA), pages 58-65. IEEE Press, 2001.
[34] H. Chui and A. Rangarajan. A new joint point clustering and matching algorithm for estimating nonrigid deformations. In Intl. Conf. on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS), pages I:309-315. CSREA Press, 2002.
[35] A. Rangarajan and A. L. Yuille. MIME: Mutual information minimization and entropy maximization for Bayesian belief propagation. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 873-880, Cambridge, MA, 2002. MIT Press.
[36] A. L. Yuille and A. Rangarajan. The Concave-Convex procedure (CCCP). In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 1033-1040, Cambridge, MA, 2002. MIT Press.
[37] H. Chui, L. Win, J. Duncan, R. Schultz, and A. Rangarajan. A unified non-rigid feature registration method for brain mapping. Medical Image Analysis, 7(2):113-130, 2003.
[38] H. Chui and A. Rangarajan. A new point matching algorithm for non-rigid registration. Computer Vision and Image Understanding, 89(2-3):114-141, 2003.
[39] A. L. Yuille and A. Rangarajan. The Concave-Convex procedure (CCCP). Neural Computation, 15:915-936, 2003.
[40] H. Chui, A. Rangarajan, J. Zhang, and C.M. Leonard. Unsupervised learning of an atlas from unlabeled point-sets. IEEE Trans. Pattern Analysis and Machine Intelligence, 26(2):160-172, 2004.
[41] P. Gardenfors. Conceptual Spaces: The Geometry of Thought. MIT Press, 2000.
[42] J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems (NIPS), volume 12, pages 547-553. MIT Press, 2000.
44. References
[43] Y. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines, theory, and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 99:67-81, 2004.
[44] C.-W. Hsu and C.-J. Lin. A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Networks, 13(2):415-425, 2002.
[45] T. Kolb. Music Theory for Guitarists: Everything You Ever Wanted to Know But Were Afraid to Ask. Hal Leonard, 2005.
[46] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, second edition, 1990.
[47] S. Mika, G. Ratsch, and K.-R. Muller. A mathematical programming approach to the kernel Fisher algorithm. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 591-597. MIT Press, 2001.
[48] D. Widdows. Geometry and Meaning. Center for the Study of Language and Information, 2004.
[49] T. Jebara. Machine Learning: Discriminative and Generative. Kluwer Academic Publishers, 2003.
[50] V. Vapnik. Statistical Learning Theory. Wiley Interscience, 1998.
[51] B. Scholkopf, A. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12(5):1207-1245, 2000.
[52] M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211-244, 2001.
[53] U. Kressel. Pairwise classification and support vector machines. In Advances in Kernel Methods - Support Vector Learning, pages 255-268. MIT Press, 1999.
[54] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[55] J. Weston and C. Watkins. Multi-class support vector machines. Technical Report CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London, 1998.
[56] E. L. Allwein, R. E. Schapire, and Y. Singer. Reducing multiclass to binary: a unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113-141, 2001.
[57] J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods - Support Vector Learning, pages 185-208. MIT Press, 1999.
45. References
[58] L. Kaufman. Solving the quadratic programming problem arising in support vector classification. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 147-168. MIT Press, 1999.
[59] O. L. Mangasarian and D. R. Musicant. Lagrangian support vector machines. Journal of Machine Learning Research, 1(3):161-177, 2001.
[60] G. M. Fung and O. L. Mangasarian. A feature selection Newton method for support vector machine classification. Computational Optimization and Applications, 28:185-202, 2004.
[61] T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 169-184. MIT Press, 1999.
[62] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2(2):265-292, 2002.
[63] J. A. K. Suykens and J. Vandewalle. Multiclass least squares support vector machines. In International Joint Conference on Neural Networks, volume 2, pages 900-903, 1999.
[64] T. Joachims. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, volume 12, pages 217-226, 2006.
[65] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007. Available at http://vis-www.cs.umass.edu/lfw.
[66] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In IEEE Computer Vision and Pattern Recognition (CVPR), volume 1, pages 511-518, 2001.
[67] G. Wahba. Spline Models for Observational Data. SIAM, Philadelphia, PA, 1990.
[68] F. L. Bookstein. Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Trans. Patt. Anal. Mach. Intell., 11(6):567-585, June 1989.
[69] S. Ramaswamy, P. Tamayo, R. Rifkin, S. Mukherjee, C.-H. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J. P. Mesirov, T. Poggio, W. Gerald, M. Loda, E. S. Lander, and T. R. Golub. Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences (PNAS), 98(26):15149-15154, 2001.
46. References
[70] D. Lowe. Object recognition from local scale-invariant features. In IEEE International Conference on Computer Vision (ICCV), volume 2, pages 1150-1157, 1999.
[71] M. E. Tipping and C. M. Bishop. Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2):443-482, 1999.
[72] M. A. O. Vasilescu and D. Terzopoulos. Multilinear image analysis for facial recognition. In ICPR (2), pages 511-514, 2002.
[73] X. He, D. Cai, H. Liu, and J. Han. Image clustering with tensor representation. In H. Zhang, T. Chua, R. Steinmetz, M. S. Kankanhalli, and L. Wilcox, editors, ACM Multimedia, pages 132-140. ACM, 2005.
[74] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281-297. University of California Press, 1967.
[75] D. Titterington, A. Smith, and U. Makov. Statistical Analysis of Finite Mixture Distributions. John Wiley & Sons, 1985.
[76] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, September 1988.
[77] X. He, D. Cai, and P. Niyogi. Tensor subspace analysis. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 499-506. MIT Press, Cambridge, MA, 2006.
[78] R. J. Hathaway. Another interpretation of the EM algorithm for mixture distributions. Statistics and Probability Letters, 4:53-56, 1986.
[79] R. M. Neal and G. E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan, editor, Learning in Graphical Models, pages 355-370. Kluwer, 1998.
[80] A. L. Yuille and J. J. Kosowsky. Statistical physics algorithms that converge. Neural Computation, 6(3):341-356, May 1994.
[81] A. L. Yuille, P. Stolorz, and J. Utans. Statistical physics, mixtures of distributions, and the EM algorithm. Neural Computation, 6(2):334-340, March 1994.
47. References
[82] B. Leibe and B. Schiele. Analyzing appearance and contour based methods for object categorization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 409-415, Madison, WI, June 2003.
[83] G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. Technical Report CNS-TR-2007-001, California Institute of Technology, 2007.