Bayesian Hierarchical Clustering - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Bayesian Hierarchical Clustering


1
Bayesian Hierarchical Clustering
  • Paper by K. Heller and Z. Ghahramani
  • ICML 2005
  • Presented by David Williams
  • Paper Discussion Group (10.07.05)

2
Outline
  • Traditional Hierarchical Clustering
  • Bayesian Hierarchical Clustering
  • Algorithm
  • Results
  • Potential Application

3
Hierarchical Clustering
  • Given a set of data points, the output is a tree
  • Leaves are the data points
  • Internal nodes are nested clusters
  • Examples
  • Evolutionary tree of living organisms
  • Internet newsgroups
  • Newswire documents

4
Traditional Hierarchical Clustering
  • Bottom-up agglomerative algorithm
  • Begin with each data point in its own cluster
  • Iteratively merge the two closest clusters
  • Stop when a single cluster remains
  • Closeness is based on a given distance measure
    (e.g., Euclidean distance between cluster means);
    a sketch follows at the end of this slide
  • Limitations
  • No guide for choosing the correct number of
    clusters, or where to prune the tree
  • The choice of distance metric is arbitrary
    (especially for data such as images or sequences)
  • No way to evaluate how good the result is, to
    compare it to other models, or to make predictions
    and cluster new data with an existing hierarchy
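  • A minimal sketch of this traditional procedure in
    Python with SciPy (the data X and the cut level
    are illustrative assumptions, not from the paper):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))   # illustrative data

    # Merge the two closest clusters at each step; "centroid"
    # linkage uses Euclidean distance between cluster means.
    Z = linkage(X, method="centroid", metric="euclidean")

    # The limitation in practice: the number of clusters (here 3)
    # must be chosen by hand; the tree itself gives no guidance.
    labels = fcluster(Z, t=3, criterion="maxclust")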

5
Bayesian Hierarchical Clustering (BHC)
  • Basic idea
  • Use marginal likelihoods to decide which clusters
    to merge
  • Asks what the probability is that all the data in
    a potential merge were generated from the same
    mixture component, compared to the exponentially
    many alternative clusterings at lower levels of
    the tree
  • The generative model used is a Dirichlet Process
    Mixture model (DPM)

6
BHC Algorithm Overview
  • One-pass, bottom-up method
  • Initializes each data point in its own cluster,
    then iteratively merges pairs of clusters
  • Uses a statistical hypothesis test to choose
    which clusters to merge
  • At each stage, the algorithm considers merging
    all pairs of existing trees

7
BHC Algorithm Merging
  • Two hypotheses are compared
  • 1. All the data in the pair of trees to be merged
    were generated i.i.d. from the same probabilistic
    model with unknown parameters (e.g., a Gaussian)
  • 2. The data contain two or more clusters,
    partitioned in some way consistent with the two
    sub-trees

8
Hypothesis H1
  • Probability of the data under H1: integrate the
    likelihood against a prior over the parameters
    (reconstruction below)
  • Dk is the data in the two trees to be merged
  • The integral is tractable when a conjugate prior
    is employed
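  • In the paper's notation (reconstructed here; β
    denotes the hyperparameters of the prior on θ):

    p(D_k | H_1) = \int p(D_k | \theta) p(\theta | \beta) d\theta
                 = \int \Big[ \prod_{x_i \in D_k} p(x_i | \theta) \Big] p(\theta | \beta) d\theta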

9
Hypothesis H2
  • Probability of the data under H2 is a product
    over the two sub-trees
  • Combining this with pk, the prior that all points
    belong to one cluster, gives the probability of
    the data in tree Tk (reconstruction below)
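  • In the paper's notation (reconstructed; Ti and Tj
    are the two sub-trees being merged, and pk is the
    paper's \pi_k):

    p(D_k | H_2) = p(D_i | T_i) p(D_j | T_j)
    p(D_k | T_k) = \pi_k p(D_k | H_1) + (1 - \pi_k) p(D_i | T_i) p(D_j | T_j)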

10
Merging Clusters
  • From Bayes' rule, the posterior probability rk of
    the merged hypothesis (reconstruction below)
  • The pair of trees with the highest rk is merged
  • The natural place to cut the final tree is where
    rk drops below 0.5
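  • Reconstructed from the paper, the posterior merge
    probability is

    r_k = \frac{\pi_k \, p(D_k | H_1)}{p(D_k | T_k)}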

11
Dirichlet Process Mixture Models (DPMs)
  • Probability of a new data point belonging to an
    existing cluster is proportional to the number of
    points already in that cluster
  • The concentration parameter α controls the
    probability of the new point creating a new
    cluster
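  • A standard Chinese-restaurant-process statement of
    this prior (a reconstruction, not copied from the
    slide), for the n-th point when an existing
    cluster c already holds n_c points:

    p(join cluster c)      = n_c / (n - 1 + \alpha)
    p(start a new cluster) = \alpha / (n - 1 + \alpha)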

12
Merged Hypothesis Prior
  • The DPM with concentration parameter α defines a
    prior on all partitions of the nk data points in
    Dk
  • The prior on the merged hypothesis, pk, is the
    relative mass of the partition placing all nk
    points in one cluster, versus all other
    partitions of those nk points consistent with the
    tree structure
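  • The paper computes pk (its \pi_k) bottom-up with
    the following recursion, initializing d_i = \alpha
    and \pi_i = 1 at each leaf:

    d_k = \alpha \, \Gamma(n_k) + d_i d_j
    \pi_k = \alpha \, \Gamma(n_k) / d_k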

13
DPM
  • The other quantities needed for the posterior
    merged-hypothesis probabilities can likewise be
    written down and computed with the DPM (see the
    math and proofs in the paper); a consolidated
    sketch follows
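  • Putting these quantities together, a minimal
    Python sketch of the greedy merge loop
    (illustrative, not the authors' code; log_ml is an
    assumed user-supplied function returning
    log p(D | H1), e.g., from a conjugate prior):

    import itertools
    import numpy as np
    from scipy.special import gammaln

    def bhc(X, log_ml, alpha=1.0):
        """Greedy one-pass BHC. X: (n, d) array; log_ml(D) = log p(D | H1)."""
        # Each tree is (indices, log_d, log_p_tree), with d_k from the
        # recursion above and p_tree = p(D_k | T_k).
        # Leaves: d_i = alpha and p(D_i | T_i) = p(D_i | H1).
        trees = [([i], np.log(alpha), log_ml(X[[i]])) for i in range(len(X))]
        while len(trees) > 1:
            best = None
            for a, b in itertools.combinations(range(len(trees)), 2):
                ia, log_da, log_pa = trees[a]
                ib, log_db, log_pb = trees[b]
                idx = ia + ib
                # pi_k = alpha*Gamma(n_k)/d_k, d_k = alpha*Gamma(n_k) + d_i*d_j
                log_ag = np.log(alpha) + gammaln(len(idx))
                log_dk = np.logaddexp(log_ag, log_da + log_db)
                log_pi = log_ag - log_dk
                log_h1 = log_ml(X[idx])
                # p(D_k|T_k) = pi_k p(D_k|H1) + (1-pi_k) p(D_i|T_i) p(D_j|T_j)
                log_pt = np.logaddexp(
                    log_pi + log_h1,
                    np.log1p(-np.exp(log_pi)) + log_pa + log_pb)
                log_rk = log_pi + log_h1 - log_pt  # log r_k (merge posterior)
                if best is None or log_rk > best[0]:
                    best = (log_rk, a, b, (idx, log_dk, log_pt))
            _, a, b, merged = best
            # Greedily merge the pair with the highest r_k; no sampling needed.
            trees = [t for j, t in enumerate(trees) if j not in (a, b)]
            trees.append(merged)
        return trees[0]

  • A fuller implementation would also record the tree
    structure and each merge's r_k, so the final tree
    can be cut where r_k drops below 0.5 (slide 10)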

14
Results
  • Some sample results (shown on the following
    figure-only slides)

15
(Figure-only slide: sample clustering results)
16
(Figure-only slide: sample clustering results)
17
Unique Aspects of Algorithm
  • A hierarchical way of organizing nested clusters,
    not a hierarchical generative model
  • Derived from DPMs
  • The hypothesis test at each stage is not one
    cluster vs. two, but one cluster vs. many
    alternative clusterings
  • One-pass: not iterative, and does not require
    sampling

18
Summary
  • Defines a probabilistic model of the data, with
    which one can compute the probability of a new
    data point belonging to any cluster in the tree.
  • Gives a model-based criterion for deciding on
    merging clusters.
  • Bayesian hypothesis testing is used to decide
    which merges are advantageous, and to decide the
    appropriate depth of the tree.
  • The algorithm can be interpreted as an
    approximate inference method for a DPM; it yields
    a new lower bound on the marginal likelihood by
    summing over exponentially many clusterings of
    the data.

19
Why This Paper?
  • Mixed-type data problems have both continuous and
    discrete features
  • How to perform density estimation?
  • One way: partition the continuous data into
    groups determined by the values of the discrete
    features.
  • Problem: the number of groups grows quickly
    (e.g., 5 features, each of which can take 4
    values, gives 4^5 = 1024 groups)
  • How to determine which groups should be combined
    to reduce the total number of groups?
  • Possible solution: the idea in this paper, except
    that rather than the leaves being individual data
    points, they would be groups of data points as
    determined by the discrete feature-values