Title: Efficient Bayesian Algorithms for Clustering
1. Efficient Bayesian Algorithms for Clustering
- Katherine Heller
- Gatsby Computational Neuroscience Unit
- Women In Machine Learning Workshop
- San Diego, CA
2. What Is Clustering?
- Imagine we have data in some feature space
- One of the most important goals of unsupervised learning is to discover meaningful clusters in data.
- There are many clustering methods: spectral, hierarchical, k-means, mixture modeling, etc.
- We take a model-based Bayesian approach to defining a cluster and evaluate cluster membership in this paradigm.
3. Marginal Likelihoods
- We use marginal likelihoods to evaluate cluster membership
- The marginal likelihood is defined as
  \[ p(\mathcal{D}) = \int p(\mathcal{D} \mid \theta)\, p(\theta)\, d\theta \]
  and can be interpreted as the probability that all data points in \(\mathcal{D}\) were generated from the same model with unknown parameters \(\theta\)
- Used to compare cluster models
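As a concrete illustration (not from the slides): for a Bernoulli model with a conjugate Beta prior, the integral above has a closed form, and a higher value indicates the data look like one coherent cluster. A minimal sketch; the function name and prior settings are illustrative choices:

```python
# For n binary observations containing k ones, the marginal likelihood under
# a Bernoulli model with Beta(alpha, beta) prior is
#   p(D) = B(alpha + k, beta + n - k) / B(alpha, beta).
import numpy as np
from scipy.special import betaln

def log_marginal_likelihood(x, alpha=1.0, beta=1.0):
    x = np.asarray(x)
    n, k = x.size, x.sum()
    return betaln(alpha + k, beta + n - k) - betaln(alpha, beta)

# A coherent cluster scores higher than a mixed one:
print(log_marginal_likelihood([1, 1, 1, 1, 0]))   # log(1/30)
print(log_marginal_likelihood([1, 0, 1, 0, 1]))   # log(1/60)
```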
4. Outline
- Introduction
- Clustering
- Marginal Likelihoods
- Bayesian Hierarchical Clustering
- Clustering on Demand
- Bayesian Sets
- Content Based Image Retrieval
- Conclusions
5. Traditional Hierarchical Clustering
- As in Duda and Hart (1973)
- Many distance metrics are possible
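For concreteness, a standard agglomerative run of this kind can be reproduced with SciPy; the linkage method, metric, and cluster count below are arbitrary choices, which is precisely the difficulty the next slide raises:

```python
# Agglomerative clustering as in Duda & Hart, sketched on synthetic data.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)),      # one blob near the origin
               rng.normal(5, 1, (10, 2))])     # another blob near (5, 5)

Z = linkage(X, method="average", metric="euclidean")  # one of many possible metrics
labels = fcluster(Z, t=2, criterion="maxclust")       # k must still be picked by hand
print(labels)
```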
6. Limitations of Traditional Hierarchical Clustering Algorithms
- How many clusters should there be?
- It is hard to choose a distance metric
- They do not define a probabilistic model of the data, so they cannot:
  - Predict the probability or cluster assignment of new data points
  - Be compared to or combined with other probabilistic models
- Our goal: to overcome these limitations by defining a novel statistical approach to hierarchical clustering
7. Bayesian Hierarchical Clustering: Building the Tree
- The algorithm is virtually identical to traditional hierarchical clustering except that instead of distance it uses marginal likelihood to decide on merges (see the sketch after this slide).
- For each potential merge of subtrees \(T_i\) and \(T_j\) into \(T_k\), it compares two hypotheses:
  - \(\mathcal{H}_1\): all data in \(\mathcal{D}_k\) came from one cluster,
    \( p(\mathcal{D}_k \mid \mathcal{H}_1) = \int p(\mathcal{D}_k \mid \theta)\, p(\theta)\, d\theta \)
  - \(\mathcal{H}_2\): data in \(\mathcal{D}_k\) came from some other clustering consistent with the subtrees,
    \( p(\mathcal{D}_k \mid \mathcal{H}_2) = p(\mathcal{D}_i \mid T_i)\, p(\mathcal{D}_j \mid T_j) \)
- Prior: \(\pi_k = p(\mathcal{H}_1)\)
- Posterior probability of merged hypothesis:
  \( r_k = \dfrac{\pi_k\, p(\mathcal{D}_k \mid \mathcal{H}_1)}{p(\mathcal{D}_k \mid T_k)} \)
- Probability of data given the tree:
  \( p(\mathcal{D}_k \mid T_k) = \pi_k\, p(\mathcal{D}_k \mid \mathcal{H}_1) + (1 - \pi_k)\, p(\mathcal{D}_i \mid T_i)\, p(\mathcal{D}_j \mid T_j) \)
Heller and Ghahramani ICML 2005
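A condensed, illustrative sketch of the merge loop, assuming binary data and the Beta-Bernoulli marginal likelihood from the earlier slide; the names (Node, bhc, log_ml) are mine, and this naive version re-scores every candidate pair at each step rather than caching as the paper's implementation does:

```python
import numpy as np
from scipy.special import betaln, gammaln

def log_ml(X, a=1.0, b=1.0):
    """Beta-Bernoulli log marginal likelihood, independent across columns."""
    n, k = X.shape[0], X.sum(axis=0)
    return float(np.sum(betaln(a + k, b + n - k) - betaln(a, b)))

class Node:
    def __init__(self, X, alpha=1.0, left=None, right=None):
        self.X, n = X, X.shape[0]
        if left is None:                                  # leaf: p(D|T) = p(D|H1)
            self.log_d, self.log_pt = np.log(alpha), log_ml(X)
        else:                                             # internal node
            log_an = np.log(alpha) + gammaln(n)           # alpha * Gamma(n_k)
            self.log_d = np.logaddexp(log_an, left.log_d + right.log_d)
            log_pi = log_an - self.log_d                  # prior pi_k on merging
            log_h1 = log_pi + log_ml(X)                   # merged hypothesis H1
            log_h2 = np.log1p(-np.exp(log_pi)) + left.log_pt + right.log_pt
            self.log_pt = np.logaddexp(log_h1, log_h2)    # p(D_k | T_k)
            self.log_rk = log_h1 - self.log_pt            # posterior r_k

def bhc(X, alpha=1.0):
    """Greedily merge the pair with the highest posterior merge probability r_k."""
    nodes = [Node(X[i:i + 1], alpha) for i in range(X.shape[0])]
    while len(nodes) > 1:
        i, j, merged = max(
            ((i, j, Node(np.vstack([nodes[i].X, nodes[j].X]), alpha,
                         nodes[i], nodes[j]))
             for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
            key=lambda t: t[2].log_rk)
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]  # root; its log_pt approximates the DPM marginal likelihood
```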
8. Comparison
[Figure: side-by-side dendrograms, Bayesian Hierarchical Clustering vs. Traditional Hierarchical Clustering]
9. Summary of BHC
- We developed a Bayesian Hierarchical Clustering algorithm which:
  - Is simple, deterministic and fast (no MCMC, one-pass, etc.)
  - Can take as input any simple probabilistic model \(p(\mathbf{x} \mid \theta)\) and gives as output a mixture of these models
  - Suggests where to cut the tree and how many clusters there are in the data
  - Gives more reasonable results than traditional hierarchical clustering algorithms
- This algorithm also:
  - Recursively computes an approximation to the marginal likelihood of a Dirichlet Process Mixture, which can be easily turned into a new lower bound (stated compactly below)
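For reference, the lower-bound relationship can be stated compactly; this is a paraphrase of the result in the ICML 2005 paper, with \(\mathcal{V}(T)\) denoting the set of tree-consistent partitions (my notation):

```latex
% p(D|T) sums over only those partitions of the data consistent with tree T,
% a subset of all partitions V summed over by the DPM marginal likelihood,
% so the tree quantity lower-bounds the DPM quantity.
\[
p(\mathcal{D} \mid T) \;=\; \sum_{v \in \mathcal{V}(T)} p(v)\, p(\mathcal{D} \mid v)
\;\le\; \sum_{v \in \mathcal{V}} p(v)\, p(\mathcal{D} \mid v)
\;=\; p(\mathcal{D} \mid \mathrm{DPM})
\]
```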
10. Results: A Toy Example
11. Results: A Toy Example
12. Predicting New Data Points
13. 4 Newsgroups Results
800 examples, 50 attributes: rec.sport.baseball, rec.sport.hockey, rec.autos, sci.space
14. Newsgroups: Average Linkage HC
15. Newsgroups: Bayesian HC
16. Clustering On Demand
Assume a universe of objects \(\mathcal{D}\).
17. Clustering On Demand
18. Bayesian Sets Approach
- Rank each object \(\mathbf{x} \in \mathcal{D}\) by how well it would fit into a set which includes the query \(\mathcal{D}_c\) (i.e. how relevant it is to the query)
- Use a Bayesian (model-based probabilistic) relevance criterion
- Limit output to the top few items
Ghahramani and Heller NIPS 2005
19. Bayesian Sets Criterion
We can write this score as
\[ \mathrm{score}(\mathbf{x}) = \frac{p(\mathbf{x} \mid \mathcal{D}_c)}{p(\mathbf{x})} = \frac{p(\mathbf{x}, \mathcal{D}_c)}{p(\mathbf{x})\, p(\mathcal{D}_c)} \]
20. Bayesian Sets Criterion
This has a nice intuitive interpretation: the numerator is the probability that \(\mathbf{x}\) and \(\mathcal{D}_c\) were generated by the same model, and the denominator is the probability that they were generated independently.
21. Bayesian Sets Criterion
22. Sparse Binary Data
E.g. if we use a multivariate Bernoulli model
\[ p(\mathbf{x} \mid \boldsymbol{\theta}) = \prod_j \theta_j^{x_j} (1 - \theta_j)^{1 - x_j} \]
with conjugate Beta prior
\[ p(\boldsymbol{\theta} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) = \prod_j \mathrm{Beta}(\theta_j; \alpha_j, \beta_j), \]
the log score is linear in \(\mathbf{x}\):
\[ \log \mathrm{score}(\mathbf{x}) = c + \sum_j q_j x_j, \]
where
\[ q_j = \log \tilde{\alpha}_j - \log \alpha_j - \log \tilde{\beta}_j + \log \beta_j, \qquad
   c = \sum_j \left[ \log(\alpha_j + \beta_j) - \log(\alpha_j + \beta_j + N) + \log \tilde{\beta}_j - \log \beta_j \right], \]
and \(\tilde{\alpha}_j = \alpha_j + \sum_{i \in \mathcal{D}_c} x_{ij}\), \(\tilde{\beta}_j = \beta_j + N - \sum_{i \in \mathcal{D}_c} x_{ij}\) for a query set \(\mathcal{D}_c\) of size \(N\).
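Because the log score is linear in \(\mathbf{x}\), scoring every item reduces to a single sparse matrix-vector product, which is what makes the method fast. A sketch under the equations above; the function name is mine, and the default prior (alpha, beta proportional to feature means) is an assumed empirical choice in the spirit of the NIPS paper:

```python
# Linear-time Bayesian Sets scoring for sparse binary data; variable names
# mirror the equations above (alpha_t = alpha-tilde, beta_t = beta-tilde).
import numpy as np
from scipy.sparse import csr_matrix

def bayesian_sets_scores(X, query_rows, alpha=None, beta=None):
    """Return log score(x) = c + q . x for every row of the binary matrix X."""
    X = csr_matrix(X)
    if alpha is None:          # assumed empirical prior: scaled feature means
        m = np.asarray(X.mean(axis=0)).ravel()
        alpha, beta = 2.0 * m + 1e-6, 2.0 * (1.0 - m) + 1e-6
    Q = X[query_rows]                                   # the query set D_c
    N = Q.shape[0]
    s = np.asarray(Q.sum(axis=0)).ravel()               # per-feature counts
    alpha_t, beta_t = alpha + s, beta + N - s           # posterior Beta counts
    c = np.sum(np.log(alpha + beta) - np.log(alpha + beta + N)
               + np.log(beta_t) - np.log(beta))
    q = np.log(alpha_t) - np.log(alpha) - np.log(beta_t) + np.log(beta)
    return c + X @ q                                    # one sparse mat-vec

# Rank four items by relevance to a query set containing items 0 and 2:
X = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 1]])
print(np.argsort(-bayesian_sets_scores(X, [0, 2])))
```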
23. Results: EachMovie
1813 people by 1532 movies
24. A Bayesian Content-Based Image Retrieval System
- We can use the Bayesian Sets method as the basis of a content-based image retrieval system
25. The Image Retrieval Prototype System
- The Algorithm (a code sketch follows this slide):
  - Input: query word w, e.g. "penguins"
  - Find all training images with label w
  - Take the binary feature vectors for these training images as the query set and run the Bayesian Sets algorithm
  - For each image x in the unlabelled test set, compute score(x), which measures the probability that x belongs in the set of images with the label w
  - Return the images with the highest score
- The algorithm is very fast: about 0.2 sec on this laptop to query 22,000 test images
Heller and Ghahramani CVPR 2006
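A plain-code sketch of those steps, assuming the bayesian_sets_scores function from the sparse-binary-data sketch is in scope; all names here (retrieve, train_features, train_labels, ...) are illustrative, not the prototype's actual code:

```python
import numpy as np

def retrieve(word, train_features, train_labels, test_features, top_k=9):
    # steps 1-2: find all training images carrying the query label w
    query_rows = [i for i, labels in enumerate(train_labels) if word in labels]
    # steps 3-4: score every unlabelled test image against that query set
    X = np.vstack([train_features, test_features])
    scores = bayesian_sets_scores(X, query_rows)[len(train_features):]
    # step 5: return indices of the highest-scoring test images
    return np.argsort(-scores)[:top_k]
```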
26. Example Queries
Query: Desert
Query: Pet
Query: Sign
Query: Building
Query: Penguins
Query: Eiffel
27. Example Training Images for Desert
28Results for Image Retrieval
NNall - nearest neighbors to any member of the
query set Nnmean - nearest neighbors to the mean
of the query set BO - Behold Search online,
www.beholdsearch.com A Yavlinsky, E
Schofield and S Rüger (CIVR, 2005)
29. Future Work
- Exploring bounds on the marginal likelihood of a DPM
  - How tight is the bound?
  - Improved structures for combinatorial approximation
- Since the score is probabilistic, it should be possible to find a principled threshold for the number of items in the returned set
- Automated Analogical Reasoning with Relational Data
- Image Annotation
- Incorporate relevance feedback
30. Conclusions
- Presented work on Bayesian hierarchical clustering, information retrieval from sets of items, and image retrieval, all based on computing marginal likelihoods.
- There are many interesting directions in which to take this work.
31. Acknowledgements
- Collaborators
- Zoubin Ghahramani
- Ricardo Silva
- Venkat Ramesh
- Thanks to
- David MacKay, Avrim Blum, Sam Roweis, Alexei Yavlinsky, Simon Tong