Title: Efficient Bayesian Algorithms for Clustering
1. Efficient Bayesian Algorithms for Clustering
- Katherine Heller
- Gatsby Computational Neuroscience Unit
- Women In Machine Learning Workshop
- San Diego, CA
2. What Is Clustering?
- Imagine we have data in some feature space
- One of the most important goals of unsupervised learning is to discover meaningful clusters in data.
- There are many clustering methods: spectral, hierarchical, k-means, mixture modeling, etc.
- We take a model-based Bayesian approach to defining a cluster and evaluate cluster membership in this paradigm.
3. Marginal Likelihoods
- We use marginal likelihoods to evaluate cluster membership
- The marginal likelihood is defined as
  \[ p(\mathcal{D}) = \int p(\mathcal{D} \mid \theta)\, p(\theta)\, d\theta \]
  and can be interpreted as the probability that all data points in \(\mathcal{D}\) were generated from the same model with unknown parameters \(\theta\)
- Used to compare cluster models
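As a concrete illustration (not from the slides): for a Bernoulli model with a conjugate Beta prior, the integral above has a closed form, and a higher value indicates the data look like one coherent cluster. A minimal sketch; the function name and prior settings are illustrative choices:

```python
# For n binary observations containing k ones, the marginal likelihood under
# a Bernoulli model with Beta(alpha, beta) prior is
#   p(D) = B(alpha + k, beta + n - k) / B(alpha, beta).
import numpy as np
from scipy.special import betaln

def log_marginal_likelihood(x, alpha=1.0, beta=1.0):
    x = np.asarray(x)
    n, k = x.size, x.sum()
    return betaln(alpha + k, beta + n - k) - betaln(alpha, beta)

# A coherent cluster scores higher than a mixed one:
print(log_marginal_likelihood([1, 1, 1, 1, 0]))   # log(1/30)
print(log_marginal_likelihood([1, 0, 1, 0, 1]))   # log(1/60)
```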
4. Outline
- Introduction
- Clustering
- Marginal Likelihoods
- Bayesian Hierarchical Clustering
- Clustering on Demand
- Bayesian Sets
- Content Based Image Retrieval
- Conclusions
5. Traditional Hierarchical Clustering
- As in Duda and Hart (1973)
- Many distance metrics are possible
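For concreteness, a standard agglomerative run of this kind can be reproduced with SciPy; the linkage method, metric, and cluster count below are arbitrary choices, which is precisely the difficulty the next slide raises:

```python
# Agglomerative clustering as in Duda & Hart, sketched on synthetic data.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)),      # one blob near the origin
               rng.normal(5, 1, (10, 2))])     # another blob near (5, 5)

Z = linkage(X, method="average", metric="euclidean")  # one of many possible metrics
labels = fcluster(Z, t=2, criterion="maxclust")       # k must still be picked by hand
print(labels)
```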
6. Limitations of Traditional Hierarchical Clustering Algorithms
- How many clusters should there be?
- It is hard to choose a distance metric
- They do not define a probabilistic model of the data, so they cannot:
  - Predict the probability or cluster assignment of new data points
  - Be compared to or combined with other probabilistic models
- Our goal: to overcome these limitations by defining a novel statistical approach to hierarchical clustering
7. Bayesian Hierarchical Clustering: Building the Tree
- The algorithm is virtually identical to traditional hierarchical clustering except that instead of distance it uses marginal likelihood to decide on merges (see the sketch after this slide).
- For each potential merge of subtrees \(T_i\) and \(T_j\) into \(T_k\), it compares two hypotheses:
  - \(\mathcal{H}_1\): all data in \(\mathcal{D}_k\) came from one cluster,
    \( p(\mathcal{D}_k \mid \mathcal{H}_1) = \int p(\mathcal{D}_k \mid \theta)\, p(\theta)\, d\theta \)
  - \(\mathcal{H}_2\): data in \(\mathcal{D}_k\) came from some other clustering consistent with the subtrees,
    \( p(\mathcal{D}_k \mid \mathcal{H}_2) = p(\mathcal{D}_i \mid T_i)\, p(\mathcal{D}_j \mid T_j) \)
- Prior: \(\pi_k = p(\mathcal{H}_1)\)
- Posterior probability of merged hypothesis:
  \( r_k = \dfrac{\pi_k\, p(\mathcal{D}_k \mid \mathcal{H}_1)}{p(\mathcal{D}_k \mid T_k)} \)
- Probability of data given the tree:
  \( p(\mathcal{D}_k \mid T_k) = \pi_k\, p(\mathcal{D}_k \mid \mathcal{H}_1) + (1 - \pi_k)\, p(\mathcal{D}_i \mid T_i)\, p(\mathcal{D}_j \mid T_j) \)
Heller and Ghahramani ICML 2005
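A condensed, illustrative sketch of the merge loop, assuming binary data and the Beta-Bernoulli marginal likelihood from the earlier slide; the names (Node, bhc, log_ml) are mine, and this naive version re-scores every candidate pair at each step rather than caching as the paper's implementation does:

```python
import numpy as np
from scipy.special import betaln, gammaln

def log_ml(X, a=1.0, b=1.0):
    """Beta-Bernoulli log marginal likelihood, independent across columns."""
    n, k = X.shape[0], X.sum(axis=0)
    return float(np.sum(betaln(a + k, b + n - k) - betaln(a, b)))

class Node:
    def __init__(self, X, alpha=1.0, left=None, right=None):
        self.X, n = X, X.shape[0]
        if left is None:                                  # leaf: p(D|T) = p(D|H1)
            self.log_d, self.log_pt = np.log(alpha), log_ml(X)
        else:                                             # internal node
            log_an = np.log(alpha) + gammaln(n)           # alpha * Gamma(n_k)
            self.log_d = np.logaddexp(log_an, left.log_d + right.log_d)
            log_pi = log_an - self.log_d                  # prior pi_k on merging
            log_h1 = log_pi + log_ml(X)                   # merged hypothesis H1
            log_h2 = np.log1p(-np.exp(log_pi)) + left.log_pt + right.log_pt
            self.log_pt = np.logaddexp(log_h1, log_h2)    # p(D_k | T_k)
            self.log_rk = log_h1 - self.log_pt            # posterior r_k

def bhc(X, alpha=1.0):
    """Greedily merge the pair with the highest posterior merge probability r_k."""
    nodes = [Node(X[i:i + 1], alpha) for i in range(X.shape[0])]
    while len(nodes) > 1:
        i, j, merged = max(
            ((i, j, Node(np.vstack([nodes[i].X, nodes[j].X]), alpha,
                         nodes[i], nodes[j]))
             for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
            key=lambda t: t[2].log_rk)
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]  # root; its log_pt approximates the DPM marginal likelihood
```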
8. Comparison
[Figure: side-by-side dendrograms, Bayesian Hierarchical Clustering vs. Traditional Hierarchical Clustering]
9. Summary of BHC
- We developed a Bayesian Hierarchical Clustering algorithm which:
  - Is simple, deterministic and fast (no MCMC, one-pass, etc.)
  - Can take as input any simple probabilistic model \(p(\mathbf{x} \mid \theta)\) and gives as output a mixture of these models
  - Suggests where to cut the tree and how many clusters there are in the data
  - Gives more reasonable results than traditional hierarchical clustering algorithms
- This algorithm also:
  - Recursively computes an approximation to the marginal likelihood of a Dirichlet Process Mixture, which can be easily turned into a new lower bound (stated compactly below)
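For reference, the lower-bound relationship can be stated compactly; this is a paraphrase of the result in the ICML 2005 paper, with \(\mathcal{V}(T)\) denoting the set of tree-consistent partitions (my notation):

```latex
% p(D|T) sums over only those partitions of the data consistent with tree T,
% a subset of all partitions V summed over by the DPM marginal likelihood,
% so the tree quantity lower-bounds the DPM quantity.
\[
p(\mathcal{D} \mid T) \;=\; \sum_{v \in \mathcal{V}(T)} p(v)\, p(\mathcal{D} \mid v)
\;\le\; \sum_{v \in \mathcal{V}} p(v)\, p(\mathcal{D} \mid v)
\;=\; p(\mathcal{D} \mid \mathrm{DPM})
\]
```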
10. Results: A Toy Example
11. Results: A Toy Example
12. Predicting New Data Points
13. 4 Newsgroups Results
800 examples, 50 attributes: rec.sport.baseball, rec.sport.hockey, rec.autos, sci.space
14. Newsgroups: Average Linkage HC
15. Newsgroups: Bayesian HC
16. Clustering On Demand
Assume a universe of objects \(\mathcal{D}\).
17. Clustering On Demand
18. Bayesian Sets Approach
- Rank each object \(\mathbf{x} \in \mathcal{D}\) by how well it would fit into a set which includes the query \(\mathcal{D}_c\) (i.e. how relevant it is to the query)
- Use a Bayesian (model-based probabilistic) relevance criterion
- Limit output to the top few items
Ghahramani and Heller NIPS 2005
19. Bayesian Sets Criterion
We can write this score as
\[ \mathrm{score}(\mathbf{x}) = \frac{p(\mathbf{x} \mid \mathcal{D}_c)}{p(\mathbf{x})} = \frac{p(\mathbf{x}, \mathcal{D}_c)}{p(\mathbf{x})\, p(\mathcal{D}_c)} \]
20. Bayesian Sets Criterion
This has a nice intuitive interpretation: the numerator is the probability that \(\mathbf{x}\) and \(\mathcal{D}_c\) were generated by the same model, and the denominator is the probability that they were generated independently.
21. Bayesian Sets Criterion
22. Sparse Binary Data
E.g. if we use a multivariate Bernoulli model
\[ p(\mathbf{x} \mid \boldsymbol{\theta}) = \prod_j \theta_j^{x_j} (1 - \theta_j)^{1 - x_j} \]
with conjugate Beta prior
\[ p(\boldsymbol{\theta} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) = \prod_j \mathrm{Beta}(\theta_j; \alpha_j, \beta_j), \]
the log score is linear in \(\mathbf{x}\):
\[ \log \mathrm{score}(\mathbf{x}) = c + \sum_j q_j x_j, \]
where
\[ q_j = \log \tilde{\alpha}_j - \log \alpha_j - \log \tilde{\beta}_j + \log \beta_j, \qquad
   c = \sum_j \left[ \log(\alpha_j + \beta_j) - \log(\alpha_j + \beta_j + N) + \log \tilde{\beta}_j - \log \beta_j \right], \]
and \(\tilde{\alpha}_j = \alpha_j + \sum_{i \in \mathcal{D}_c} x_{ij}\), \(\tilde{\beta}_j = \beta_j + N - \sum_{i \in \mathcal{D}_c} x_{ij}\) for a query set \(\mathcal{D}_c\) of size \(N\).
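Because the log score is linear in \(\mathbf{x}\), scoring every item reduces to a single sparse matrix-vector product, which is what makes the method fast. A sketch under the equations above; the function name is mine, and the default prior (alpha, beta proportional to feature means) is an assumed empirical choice in the spirit of the NIPS paper:

```python
# Linear-time Bayesian Sets scoring for sparse binary data; variable names
# mirror the equations above (alpha_t = alpha-tilde, beta_t = beta-tilde).
import numpy as np
from scipy.sparse import csr_matrix

def bayesian_sets_scores(X, query_rows, alpha=None, beta=None):
    """Return log score(x) = c + q . x for every row of the binary matrix X."""
    X = csr_matrix(X)
    if alpha is None:          # assumed empirical prior: scaled feature means
        m = np.asarray(X.mean(axis=0)).ravel()
        alpha, beta = 2.0 * m + 1e-6, 2.0 * (1.0 - m) + 1e-6
    Q = X[query_rows]                                   # the query set D_c
    N = Q.shape[0]
    s = np.asarray(Q.sum(axis=0)).ravel()               # per-feature counts
    alpha_t, beta_t = alpha + s, beta + N - s           # posterior Beta counts
    c = np.sum(np.log(alpha + beta) - np.log(alpha + beta + N)
               + np.log(beta_t) - np.log(beta))
    q = np.log(alpha_t) - np.log(alpha) - np.log(beta_t) + np.log(beta)
    return c + X @ q                                    # one sparse mat-vec

# Rank four items by relevance to a query set containing items 0 and 2:
X = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 1]])
print(np.argsort(-bayesian_sets_scores(X, [0, 2])))
```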
23. Results: EachMovie
1813 people by 1532 movies
24. A Bayesian Content-Based Image Retrieval System
- We can use the Bayesian Sets method as the basis of a content-based image retrieval system
25. The Image Retrieval Prototype System
- The Algorithm (a code sketch follows this slide):
  - Input: query word w, e.g. "penguins"
  - Find all training images with label w
  - Take the binary feature vectors for these training images as the query set and run the Bayesian Sets algorithm
  - For each image x in the unlabelled test set, compute score(x), which measures the probability that x belongs in the set of images with the label w
  - Return the images with the highest score
- The algorithm is very fast: about 0.2 sec on this laptop to query 22,000 test images
Heller and Ghahramani CVPR 2006
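A plain-code sketch of those steps, assuming the bayesian_sets_scores function from the sparse-binary-data sketch is in scope; all names here (retrieve, train_features, train_labels, ...) are illustrative, not the prototype's actual code:

```python
import numpy as np

def retrieve(word, train_features, train_labels, test_features, top_k=9):
    # steps 1-2: find all training images carrying the query label w
    query_rows = [i for i, labels in enumerate(train_labels) if word in labels]
    # steps 3-4: score every unlabelled test image against that query set
    X = np.vstack([train_features, test_features])
    scores = bayesian_sets_scores(X, query_rows)[len(train_features):]
    # step 5: return indices of the highest-scoring test images
    return np.argsort(-scores)[:top_k]
```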
26. Example Queries
Query: Desert
Query: Pet
Query: Sign
Query: Building
Query: Penguins
Query: Eiffel
27. Example Training Images for Desert
28Results for Image Retrieval
NNall - nearest neighbors to any member of the
query set Nnmean - nearest neighbors to the mean
of the query set BO - Behold Search online,
www.beholdsearch.com A Yavlinsky, E
Schofield and S Rüger (CIVR, 2005)
29. Future Work
- Exploring bounds on the marginal likelihood of a DPM
  - How tight is the bound?
  - Improved structures for combinatorial approximation
- Since the score is probabilistic, it should be possible to find a principled threshold for the number of items in the returned set
- Automated Analogical Reasoning with Relational Data
- Image Annotation
- Incorporate relevance feedback
30. Conclusions
- Presented work on Bayesian hierarchical clustering, information retrieval from sets of items, and image retrieval, all based on computing marginal likelihoods.
- There are many interesting directions in which to take this work.
31. Acknowledgements
- Collaborators
- Zoubin Ghahramani
- Ricardo Silva
- Venkat Ramesh
- Thanks to
- David MacKay, Avrim Blum, Sam Roweis, Alexei Yavlinsky, Simon Tong