Title: IMM Publikationsdatabase
1. Archetypal Analysis for Machine Learning
Morten Mørup, DTU Informatics, Cognitive Systems Group, Technical University of Denmark
Joint work with Lars Kai Hansen, DTU Informatics, Cognitive Systems Group, Technical University of Denmark
3. Archetypal Analysis (AA)
$X \approx XCS$
AA is formed by two simplex constraints: the columns of C and the columns of S are non-negative and sum to one.
Archetype $Xc_k$: formed by a convex combination of the data points.
Projection $s_n$: gives the convex combination of archetypes forming each data point.
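The model X ≈ XCS with its two simplex constraints can be sketched in NumPy as follows. This is only an illustration under assumptions that are not on the slide: X is stored as features x observations, the data are random, the sizes and the helper `random_simplex_columns` are hypothetical, and the squared Frobenius error is assumed as the fit measure.

```python
import numpy as np

def random_simplex_columns(rows, cols, rng):
    """Return a (rows x cols) matrix whose columns are non-negative and sum to 1."""
    M = rng.random((rows, cols))
    return M / M.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
n_features, N, K = 5, 40, 3            # hypothetical sizes
X = rng.normal(size=(n_features, N))   # toy data, one observation per column

C = random_simplex_columns(N, K, rng)  # column c_k: convex weights over the data points
S = random_simplex_columns(K, N, rng)  # column s_n: convex weights over the archetypes

archetypes = X @ C                     # columns are the archetypes X c_k
X_hat = archetypes @ S                 # reconstruction X C S
loss = np.linalg.norm(X - X_hat, "fro") ** 2  # assumed fit measure (squared error)
```

Because every archetype is a convex combination of observations, it cannot leave the range of the data in any feature; fitting AA means searching for the C and S that make `loss` small.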
4. The original paper of Cutler and Breiman (1994) considered 3 applications
- Swiss army head shape
- Los Angeles Basin air pollution 1976
- Tokamak fusion data
Other applications:
- Flame dynamics (Stone & Cutler, 1996)
- End-member extraction of galaxy spectra (Chan et al., 2003)
- Data-driven benchmarking (Porzio et al., 2008)
5. Archetypal analysis extracts the principal convex hull (PCH) of the data cloud
Convex hull: blue lines and light shaded region (dots indicate points in the convex set).
Dominant convex hull: green lines and gray shaded region (dots indicate archetypes).
(Dwyer, 1988)
While the convex set can be identified in linear time O(N) (McCallum & Avis, 1979), finding C and S is a non-convex (NP-hard) problem.
NB: One might think that AA is highly driven by outliers; however, outliers are only relevant if they reflect representative dynamics in the data!
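To make "points in the convex set" concrete, here is a small pure-Python sketch that finds the hull vertices of a 2-D point cloud. It uses Andrew's monotone chain, which runs in O(N log N) on general point sets; it is a stand-in chosen for brevity, not the linear-time polygon algorithm of McCallum & Avis cited above. The toy points are made up.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                      # build the lower chain, left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build the upper chain, right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]     # drop the duplicated chain endpoints

# Toy cloud: four rectangle corners plus interior and edge points.
points = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (3, 2), (2, 0)]
hull = convex_hull(points)
```

Only the four corners survive as hull vertices; AA then looks for a small number of archetypes that span the dominant part of such a hull rather than listing every vertex.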
6. Our (new) mathematical results
(1) The AA/PCH model is in general unique! See Theorem 1.
(2) The AA/PCH model can be efficiently initialized by the proposed FurthestSum algorithm. The proposed FurthestSum algorithm guarantees extraction of points in the convex set; see Theorem 2.
(3) The AA/PCH model parameters can be efficiently optimized by normalization-invariant projected gradient. For details on the derivation of the updates and their computational complexity, see Section 2.3.
Large-scale applications
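A FurthestSum-style initialization might look roughly like the sketch below. The slide only names the algorithm, so the details here are assumptions: candidates are picked greedily by the largest summed Euclidean distance to the points already chosen, and the arbitrary seed is discarded and replaced by one more greedy pick. The function name, its arguments, and the toy data are hypothetical, and the full distance matrix is only practical for small N.

```python
import numpy as np

def furthest_sum(X, n_components, seed_index=0):
    """Greedy FurthestSum-style selection of column indices of X (assumed details)."""
    diff = X[:, :, None] - X[:, None, :]
    D = np.sqrt((diff ** 2).sum(axis=0))     # D[i, j]: distance between columns i and j

    def pick(dist_sum, excluded):
        masked = dist_sum.copy()
        masked[np.asarray(excluded, dtype=int)] = -np.inf  # never pick a point twice
        return int(np.argmax(masked))

    selected = [seed_index]
    dist_sum = D[seed_index].copy()
    for _ in range(n_components - 1):
        nxt = pick(dist_sum, selected)
        selected.append(nxt)
        dist_sum += D[nxt]

    # Drop the arbitrary seed and replace it by one more greedy pick.
    dist_sum -= D[seed_index]
    selected = selected[1:]
    selected.append(pick(dist_sum, selected))
    return selected

# Toy data: four rectangle corners (columns 0-3) and three interior points.
X = np.array([[0.0, 4.0, 4.0, 0.0, 2.0, 2.5, 1.5],
              [0.0, 0.0, 3.0, 3.0, 1.5, 1.0, 1.5]])
idx = furthest_sum(X, n_components=4, seed_index=5)  # start from an interior point
```

Even though the seed is an interior point, the selected indices are the four corners, which is the behaviour the slide attributes to FurthestSum. The selection can seed C by placing a single 1 in column k at row `idx[k]`.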
7. Our Machine Learning Applications
- Computer vision
- NeuroImaging
- Text mining
- Collaborative filtering
8. Computer Vision: CBCL face database
Face database: K = 361 pixels, N = 2429 => all images belong with probability 1 to the convex set.
$X \approx XCS$
- SVD/PCA: low -> high frequency dynamics
- NMF: part-based representation
- AA: archetypes/freaks
- K-means: centroids/prototypes
9. Archetypal Analysis naturally bridges clustering methods with low rank representations
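One way to read this bridge: a hard clustering already has the form XCS with both simplex constraints satisfied. In the sketch below (toy data and labels are made up), S has one-hot columns and each column of C averages the members of one cluster, so XC holds the centroids. AA relaxes both matrices to general points on the simplex, which gives a low-rank factorization with soft assignments.

```python
import numpy as np

X = np.array([[0.0, 1.0, 10.0, 11.0, 12.0],
              [0.0, 1.0,  5.0,  5.0,  5.0]])   # 2 features x 5 observations
labels = np.array([0, 0, 1, 1, 1])              # hypothetical hard cluster assignment
N, K = X.shape[1], 2

S = np.zeros((K, N))
S[labels, np.arange(N)] = 1.0                   # one-hot columns: vertices of the simplex

C = S.T / S.sum(axis=1)                         # column k: uniform weights over cluster k

centroids = X @ C                               # cluster means, written as X C
X_hat = centroids @ S                           # each point replaced by its centroid
```

Here `X_hat` has rank at most K, like any XCS reconstruction, while the columns of `centroids` are the usual k-means prototypes.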
10. NeuroImaging: Positron Emission Tomography
$X \approx XCS$
Altanserin tracer injected; the recorded signal is in theory a mixture of 3 underlying binding profiles (archetypes): low-binding regions, high-binding regions, and arteries/veins. Each voxel is a given concentration fraction of these tissue types.
[Figure panels: XC and S for Low Binding, High Binding, Artery/Veins]
11. Text Mining: NIPS term-document (bag of words)
$X \approx XCS$
[Figure: XC; panel labels: Distinct Aspects, Prototypical Aspects]
12. Collaborative filtering: MovieLens
Medium-size and large-size MovieLens data (www.grouplens.org):
- Medium size: 1,000,209 ratings of 3,952 movies by 6,040 users
- Large size: 10,000,054 ratings of 10,677 movies given by 71,567 users
AA extracts features representing distinct user types; each user is represented as a given concentration fraction of the user types. AA appears to have less tendency to overfit.
13. Conclusion
- Archetypal Analysis is unique in general (Theorem 1).
- Archetypal Analysis can be efficiently initialized by the proposed FurthestSum algorithm (Theorem 2) and optimized through normalization-invariant projected gradient.
- Archetypal Analysis naturally bridges clustering with low rank approximations.
- Archetypal Analysis results in easily interpretable features that are closely related to the actual data.
- Archetypal Analysis is useful for a large variety of machine learning problem domains within unsupervised learning (Computer Vision, NeuroImaging, Text Mining, Collaborative Filtering).
- Archetypal Analysis can be extended to kernel representations, finding the principal convex hull in a (potentially infinite) Hilbert space (see Section 2.4 of the paper).
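The projected-gradient optimization and the kernel extension can both be illustrated with one sketch of an update for S. It rests on the identity ||X - XCS||_F^2 = tr(G) - 2 tr(GCS) + tr(S^T C^T G C S) with G = X^T X, so only inner products between observations are needed and G could be swapped for a kernel matrix. The normalization-invariant step below (chain rule through s_n = s~_n / sum(s~_n), clipping at zero, renormalizing, halving the step until the loss does not increase) is an assumed simplification; the paper's actual updates and step-size rule may differ, and the function names are hypothetical.

```python
import numpy as np

def aa_loss(G, C, S):
    """||X - XCS||_F^2 written in terms of the Gram matrix G = X^T X only."""
    GC = G @ C                                    # N x K
    CtGC = C.T @ GC                               # K x K
    return np.trace(G) - 2.0 * np.sum(GC * S.T) + np.sum((CtGC @ S) * S)

def update_S(G, C, S, step=1.0, max_halvings=30):
    """One projected-gradient step on S with column renormalization (a sketch)."""
    GC = G @ C
    grad = 2.0 * (C.T @ GC @ S - GC.T)            # d loss / d S, shape K x N
    # Chain rule through s_n = s~_n / sum(s~_n), evaluated where sum(s~_n) = 1.
    grad = grad - np.sum(grad * S, axis=0, keepdims=True)
    old = aa_loss(G, C, S)
    for _ in range(max_halvings):
        S_new = np.maximum(S - step * grad, 0.0)  # keep entries non-negative
        S_new /= S_new.sum(axis=0, keepdims=True) # back onto the simplex
        if aa_loss(G, C, S_new) <= old:
            return S_new
        step *= 0.5                               # shrink the step and retry
    return S                                      # no improving step found

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 30))                      # toy data, one observation per column
G = X.T @ X                                       # could be replaced by a kernel matrix
C = rng.random((30, 3)); C /= C.sum(axis=0)
S = rng.random((3, 30)); S /= S.sum(axis=0)
S_next = update_S(G, C, S)
```

An analogous step for C, alternated with this one, would give a complete (if basic) fitting loop under the same assumptions.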
14. Open problems and current research directions
- What is the optimal number of components?
  - Cross-validation based on missing value prediction (see also the collaborative filtering example in the paper)
  - Bayesian generative models for AA/PCH that automatically penalize model complexity
- What if pure archetypes cannot be well represented by the data available?
15. Selected References from the paper
[1] Adele Cutler and Leo Breiman, Archetypal analysis, Technometrics, vol. 36, no. 4, pp. 338-347, Nov 1994.
[2] D. S. Hochbaum and D. B. Shmoys, A best possible heuristic for the k-center problem, Mathematics of Operations Research, vol. 10, no. 2, pp. 180-184, 1985.
[7] Emily Stone and Adele Cutler, Introduction to archetypal analysis of spatio-temporal dynamics, Phys. D, vol. 96, no. 1-4, pp. 110-131, 1996.
[8] Giovanni C. Porzio, Giancarlo Ragozini, and Domenico Vistocco, On the use of archetypes as benchmarks, Appl. Stoch. Model. Bus. Ind., vol. 24, no. 5, pp. 419-437, 2008.
[9] B. H. P. Chan, D. A. Mitchell, and L. E. Cram, Archetypal analysis of galaxy spectra, Mon. Not. R. Astron. Soc., vol. 338, pp. 790, 2003.
[11] D. McCallum and D. Avis, A linear algorithm for finding the convex hull of a simple polygon, Information Processing Letters, vol. 9, pp. 201-206, 1979.
[12] Rex A. Dwyer, On the convex hull of random points in a polytope, Journal of Applied Probability, vol. 25, no. 4, pp. 688-699, 1988.