Title: 4. Ad-hoc I: Hierarchical clustering
- Hierarchical versus Flat
  - Flat methods generate a single partition into k clusters. The number k of clusters has to be determined by the user ahead of time.
  - Hierarchical methods generate a hierarchy of partitions, i.e.
    - a partition P1 into 1 cluster (the entire collection)
    - a partition P2 into 2 clusters
    - ...
    - a partition Pn into n clusters (each object forms its own cluster)
  - It is then up to the user to decide which of the partitions reflects actual sub-populations in the data.
- Note: A sequence of partitions is called "hierarchical" if each cluster in a given partition is the union of clusters in the next larger partition.
- [Figure: top, a hierarchical sequence of partitions; bottom, a non-hierarchical sequence.]
- Hierarchical methods again come in two varieties, agglomerative and divisive.
- Agglomerative methods (see the sketch after this list)
  - Start with partition Pn, where each object forms its own cluster.
  - Merge the two closest clusters, obtaining Pn-1.
  - Repeat the merge until only one cluster is left.
- Divisive methods
  - Start with P1.
  - Split the collection into two clusters that are as homogeneous (and as different from each other) as possible.
  - Apply the splitting procedure recursively to the clusters.
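A minimal sketch of the agglomerative loop, assuming Euclidean distances and single linkage as the merge rule; the function name agglomerate and the list-based bookkeeping are choices made for this illustration only.

```python
import numpy as np

def agglomerate(X):
    """Naive agglomerative clustering with single linkage.
    Returns the sequence of partitions P_n, P_{n-1}, ..., P_1,
    each partition given as a list of index lists."""
    n = len(X)
    clusters = [[i] for i in range(n)]           # P_n: every object is its own cluster
    partitions = [[c[:] for c in clusters]]
    while len(clusters) > 1:
        # find the pair of clusters with the smallest single-linkage distance
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the two closest clusters
        del clusters[b]
        partitions.append([c[:] for c in clusters])
    return partitions

# example: 10 random 2-D points give 10 partitions, from 10 clusters down to 1
print(len(agglomerate(np.random.rand(10, 2))))
```

A real implementation would cache inter-cluster distances instead of recomputing all pairwise minima at every merge.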
- Note: Agglomerative methods require a rule to decide which clusters to merge. Typically one defines a distance between clusters and then merges the two clusters that are closest. Divisive methods require a rule for splitting a cluster.
4.1 Hierarchical agglomerative clustering

Need to define a distance d(P,Q) between groups, given a distance measure d(x,y) between observations. Commonly used distance measures:

1. d1(P,Q) = min d(x,y), for x in P, y in Q  (single linkage)
2. d2(P,Q) = ave d(x,y), for x in P, y in Q  (average linkage)
3. d3(P,Q) = max d(x,y), for x in P, y in Q  (complete linkage)
4. d4(P,Q) = d(mean(P), mean(Q)), the distance between the cluster means  (centroid method)
5. d5(P,Q) = (|P| |Q| / (|P| + |Q|)) * || mean(P) - mean(Q) ||^2  (Ward's method)

d5 is called Ward's distance.
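For reference, these five group distances correspond to the method argument of scipy.cluster.hierarchy.linkage; the snippet below is a sketch on made-up data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))          # made-up 2-D observations

# each method implements one of the group distances d1..d5 above
for method in ["single", "average", "complete", "centroid", "ward"]:
    Z = linkage(X, method=method)     # Z records the n-1 merges and their heights
    print(method, round(float(Z[-1, 2]), 3))   # height of the final merge
```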
- Motivation for Ward's distance
  - Let P^k = {P1, ..., Pk} be a partition of the observations into k groups.
  - Measure the goodness of a partition by the sum of squared distances of observations from their cluster means:
    RSS(P^k) = sum_{i=1..k} sum_{x in Pi} || x - mean(Pi) ||^2
  - Consider all possible (k-1)-partitions obtainable from P^k by a merge.
  - Merging the two clusters with smallest Ward's distance optimizes the goodness of the new partition.
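A small numerical check of this motivation, using made-up data: Ward's distance between two clusters equals the increase in RSS incurred by merging them, so the merge with smallest Ward's distance degrades the goodness measure least.

```python
import numpy as np

def rss(clusters):
    """Sum of squared distances of observations from their cluster means."""
    return sum(((c - c.mean(axis=0)) ** 2).sum() for c in clusters)

def ward_distance(P, Q):
    """Ward's distance d5 between two clusters of observations."""
    nP, nQ = len(P), len(Q)
    diff = P.mean(axis=0) - Q.mean(axis=0)
    return nP * nQ / (nP + nQ) * (diff @ diff)

rng = np.random.default_rng(1)
P, Q = rng.normal(size=(5, 2)), rng.normal(size=(8, 2))

# increase in RSS caused by merging P and Q equals Ward's distance d5(P, Q)
increase = rss([np.vstack([P, Q])]) - rss([P, Q])
print(np.isclose(increase, ward_distance(P, Q)))   # True
```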
- 4.2 Hierarchical divisive clustering
- There are divisive versions of single linkage, average linkage, and Ward's method.
- Divisive version of single linkage (see the sketch after this list)
  - Compute the minimal spanning tree (the graph connecting all the objects with smallest total edge length).
  - Break the longest edge to obtain 2 subtrees, and a corresponding partition of the objects.
  - Apply the process recursively to the subtrees.
- Agglomerative and divisive versions of single linkage give identical results (more later).
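A sketch of one split of the divisive single-linkage procedure, assuming Euclidean distances and using scipy's spanning-tree and connected-components routines; the function name mst_split is invented for this illustration.

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def mst_split(X):
    """One divisive single-linkage step: build the minimal spanning tree,
    delete its longest edge, and return labels for the two resulting groups."""
    D = distance_matrix(X, X)
    mst = minimum_spanning_tree(D).tocoo()      # n-1 edges with smallest total length
    longest = np.argmax(mst.data)               # index of the longest MST edge
    keep = np.ones(len(mst.data), dtype=bool)
    keep[longest] = False
    pruned = np.zeros_like(D)
    pruned[mst.row[keep], mst.col[keep]] = mst.data[keep]
    # the two connected components of the pruned tree define the 2-partition
    _, labels = connected_components(pruned, directed=False)
    return labels

rng = np.random.default_rng(2)
X = rng.normal(size=(12, 2))
print(mst_split(X))   # array of 0/1 cluster labels
```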
Divisive version of Ward's method: given a cluster R, we need to find the split of R into 2 groups P, Q that minimizes

  RSS(P, Q) = sum_{x in P} || x - mean(P) ||^2 + sum_{x in Q} || x - mean(Q) ||^2

or, equivalently, maximizes Ward's distance between P and Q.

Note: There is no computationally feasible method for finding the optimal P, Q when R is large; an approximation has to be used.
- Iterative algorithm to search for the optimal Ward split (a sketch follows this list)
  - Project the observations in R on the largest principal component.
  - Split at the median to obtain initial clusters P, Q.
  - Repeat
    - Assign each observation to the cluster with the closest mean.
    - Re-compute the cluster means.
  - Until convergence
- Note
  - Each step reduces RSS(P, Q).
  - There is no guarantee of finding the optimal partition.
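A minimal sketch of this search, assuming Euclidean data; the function name ward_split and the use of an SVD to obtain the largest principal component are choices made here for illustration.

```python
import numpy as np

def ward_split(R, max_iter=100):
    """Approximate the optimal Ward split of cluster R into two groups."""
    Xc = R - R.mean(axis=0)
    pc = np.linalg.svd(Xc, full_matrices=False)[2][0]    # largest principal component
    scores = Xc @ pc
    labels = (scores > np.median(scores)).astype(int)    # split at the median
    for _ in range(max_iter):
        means = np.array([R[labels == g].mean(axis=0) for g in (0, 1)])
        # assign each observation to the cluster with the closest mean
        new = np.argmin(((R[:, None, :] - means[None]) ** 2).sum(-1), axis=1)
        if np.array_equal(new, labels) or len(set(new)) < 2:
            break
        labels = new
    return labels

rng = np.random.default_rng(3)
R = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])
print(ward_split(R))   # 0/1 labels for the two groups
```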
Divisive version of average linkage: Algorithm DIANA (Struyf, Hubert, and Rousseeuw, p. 22).
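The slide only names the algorithm; the following is a rough sketch of DIANA's splitting step as described by Struyf, Hubert, and Rousseeuw: the object with the largest average dissimilarity to the others seeds a "splinter group", and objects keep moving to it as long as they are, on average, closer to the splinter group than to the remaining objects. The function name diana_split is invented here.

```python
import numpy as np
from scipy.spatial import distance_matrix

def diana_split(X):
    """One DIANA-style split of a cluster into a splinter group and the rest."""
    D = distance_matrix(X, X)
    n = len(X)
    rest = list(range(n))
    # seed the splinter group with the object farthest (on average) from the others
    seed = int(np.argmax(D.sum(axis=1) / (n - 1)))
    splinter = [seed]
    rest.remove(seed)
    while len(rest) > 1:
        # gain = avg. distance to the remaining objects minus avg. distance to the splinter group
        gains = [D[i, [j for j in rest if j != i]].mean() - D[i, splinter].mean()
                 for i in rest]
        best = int(np.argmax(gains))
        if gains[best] <= 0:           # nobody is closer to the splinter group; stop
            break
        splinter.append(rest.pop(best))
    return splinter, rest

print(diana_split(np.random.rand(10, 2)))   # indices of the two groups
```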
- 4.3 Dendrograms
- The result of hierarchical clustering can be represented as a binary tree:
  - The root of the tree represents the entire collection.
  - Terminal nodes represent observations.
  - Each interior node represents a cluster.
  - Each subtree represents a partition.
- Note: The tree defines many more partitions than the n-2 nontrivial ones constructed during the merge (or split) process.
- Note: For HAC methods, the merge order defines a sequence of n subtrees of the full tree. For HDC methods, a sequence of subtrees can be defined if there is a figure of merit for each split.
If the distance between daughter clusters is monotonically increasing as we move up the tree, we can draw a dendrogram: the y-coordinate of a vertex is the distance between its daughter clusters.

[Figure: a point set and the corresponding single linkage dendrogram.]
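A picture of this kind can be reproduced with scipy's dendrogram routine; the sketch below uses made-up two-cluster data and single linkage.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.3, (15, 2)), rng.normal(3, 0.3, (15, 2))])

Z = linkage(X, method="single")   # single linkage merges and their heights
dendrogram(Z)                     # y-coordinate of each vertex = merge distance
plt.ylabel("single linkage distance")
plt.show()
```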
- Standard method to extract clusters from a dendrogram (see the sketch below)
  - Pick the number of clusters k.
  - Cut the dendrogram at a level that results in k subtrees.
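In scipy this cut corresponds to fcluster with criterion="maxclust" (the analogue of R's cutree, which the next slide mentions); a short sketch on made-up data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 2))
Z = linkage(X, method="average")

k = 3
labels = fcluster(Z, t=k, criterion="maxclust")   # cut the tree into (at most) k clusters
print(labels)
```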
- 4.4 Experiment
- Try hierarchical methods on unimodal 2D datasets.
- Experiments suggest:
  - Except in completely clear-cut situations, tree cutting (cutree) is useless for extracting clusters from a dendrogram.
  - Complete linkage fails completely for elongated clusters.
- Needed
  - Diagnostics to decide whether the daughters of a dendrogram node really correspond to spatially separated clusters.
  - Automatic and manual methods for dendrogram pruning.
  - Methods for assigning observations in pruned subtrees to clusters.