Title: Clustering Algorithms
1 Introduction to Hierarchical Clustering Analysis
Dinh Dong Luong
2 Introduction
- Data clustering concerns how to group a set of
objects based on the similarity of their attributes
and/or their proximity in the vector space.
- Main methods
  - Partitioning: K-Means
  - Hierarchical: BIRCH, ROCK, ...
  - Density-based: DBSCAN, ...
- A good clustering method will produce high-quality
clusters with
  - high intra-class similarity: cohesive within clusters
  - low inter-class similarity: distinctive between clusters
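The quality criterion above can be checked numerically. The following is a minimal sketch, assuming Euclidean distance as the dissimilarity measure and two hypothetical, already-labelled 2-D clusters (neither comes from the slides): a good clustering should show a small average distance inside clusters and a large average distance between them.

```python
from itertools import combinations, product
from math import dist

# Hypothetical 2-D points, already assigned to two clusters (illustrative only).
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]

def mean_intra_distance(cluster):
    """Average distance between all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(c1, c2):
    """Average distance between points taken from two different clusters."""
    pairs = list(product(c1, c2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

intra = (mean_intra_distance(cluster_a) + mean_intra_distance(cluster_b)) / 2
inter = mean_inter_distance(cluster_a, cluster_b)
print(f"mean intra-cluster distance: {intra:.3f}")  # small = cohesive
print(f"mean inter-cluster distance: {inter:.3f}")  # large = distinctive
```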
3 Stages in clustering
4 Clustering Algorithms
- A. Distance and Similarity Measures
- B. Hierarchical Clustering
  - Agglomerative
    - Single linkage, complete linkage, group average
    linkage, median linkage, centroid linkage,
    balanced iterative reducing and clustering using
    hierarchies (BIRCH), clustering using
    representatives (CURE), robust clustering using
    links (ROCK)
  - Divisive
    - Divisive analysis (DIANA), monothetic analysis
    (MONA)
5 Distance and Similarity Measures
6 Similarity Measurements
For any two profiles (vectors):
-1 ≤ Pearson Correlation ≤ 1
7 Similarity Measurements
- Pearson Correlation: Trend Similarity
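To illustrate why Pearson correlation measures trend similarity, here is a minimal sketch using the standard definition (covariance divided by the product of the standard deviations). The function name `pearson` and the sample profiles are assumptions made for illustration. Two profiles with the same shape but very different means still correlate perfectly.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles; lies in [-1, 1]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical profiles (illustrative only).
profile = [1.0, 2.0, 3.0, 4.0]
shifted = [11.0, 12.0, 13.0, 14.0]      # same trend, much larger mean
reversed_trend = [4.0, 3.0, 2.0, 1.0]   # opposite trend

print(pearson(profile, shifted))         # close to +1: identical trend
print(pearson(profile, reversed_trend))  # close to -1: opposite trend
```

Note that the measure is undefined for a constant profile, since its variance is zero; this sketch does not guard against that case.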
8 Similarity Measurements
9 Similarity Measurements
- Euclidean Distance: Absolute difference
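As a small sketch of the point that Euclidean distance reflects absolute difference, the hypothetical profiles below have exactly the same shape, yet they are far apart because their values differ by a constant offset. The function name `euclidean` and the data are illustrative assumptions.

```python
from math import sqrt

def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical profiles (illustrative only).
profile = [1.0, 2.0, 3.0, 4.0]
shifted = [11.0, 12.0, 13.0, 14.0]   # same trend, offset by 10 everywhere
nearby = [1.0, 2.5, 2.5, 4.0]        # different trend, similar magnitudes

print(euclidean(profile, shifted))   # large, despite the identical trend
print(euclidean(profile, nearby))    # small, despite the different trend
```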
10 Similarity Measurements
-1 ≤ Cosine Correlation ≤ 1
11 Similarity Measurements
- Cosine Correlation: Trend + Mean Distance
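A minimal sketch of cosine correlation follows, assuming the usual definition (dot product divided by the product of the vector norms). Unlike Pearson correlation, the profiles are not mean-centered first, so a shift in the mean lowers the score even when the trend is unchanged. The function name `cosine` and the sample profiles are illustrative assumptions.

```python
from math import sqrt

def cosine(x, y):
    """Cosine of the angle between two equal-length profiles; lies in [-1, 1]."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical profiles (illustrative only).
profile = [1.0, 2.0, 3.0, 4.0]
scaled = [2.0, 4.0, 6.0, 8.0]        # same direction, twice the magnitude
shifted = [11.0, 12.0, 13.0, 14.0]   # same trend, but the mean has moved

print(cosine(profile, scaled))    # close to 1: scaling does not change the angle
print(cosine(profile, shifted))   # below 1: the mean shift changes the angle
```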
12 Similarity Measurements
13 Similarity Measurements
Similar?
14 Taxonomy of Clustering Approaches
15 Hierarchical Clustering
- Agglomerative clustering treats each data point
as a singleton cluster, and then successively
merges clusters until all points have been merged
into a single remaining cluster. Divisive
clustering works the other way around.
16 Hierarchical Clustering
- Keys
  - Similarity
  - Clustering
- Calculate the similarity between all possible
combinations of two profiles.
- Group the two most similar clusters together to
form a new cluster.
- Calculate the similarity between the new cluster
and all remaining clusters, then repeat the grouping step.
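The loop described on this slide can be sketched as follows. This is a deliberately naive illustration, not an efficient implementation: it assumes Euclidean distance between points and single linkage between clusters, and it recomputes all cluster dissimilarities at every step. The names `single_link`, `agglomerate`, and the sample points are hypothetical.

```python
from math import dist

def single_link(c1, c2):
    """Assumed cluster dissimilarity: smallest distance between members."""
    return min(dist(p, q) for p in c1 for q in c2)

def agglomerate(points, linkage=single_link):
    """Merge the two closest clusters repeatedly; return the merge history."""
    clusters = [[p] for p in points]          # every point starts as a singleton
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest dissimilarity.
        index_pairs = [(a, b) for a in range(len(clusters))
                       for b in range(a + 1, len(clusters))]
        i, j = min(index_pairs,
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        d = linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], d))
        # Replace the two clusters with their union.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical 2-D points (illustrative only).
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.5), (10.0, 0.0)]
merges = agglomerate(points)
for left, right, d in merges:
    print(f"merge {left} + {right} at dissimilarity {d:.2f}")
```

Passing a different `linkage` function changes which pair is considered closest, and therefore the shape of the resulting hierarchy.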
17 General agglomerative clustering
18 Clustering
Merge which pair of clusters: C1, C2, or C3?
19 Clustering
Single Linkage
Dissimilarity between two clusters C1 and C2 = minimum
dissimilarity between the members of the two clusters
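A minimal sketch of single linkage, assuming Euclidean distance between members and two small hypothetical clusters (the coordinates are illustrative only):

```python
from math import dist

# Hypothetical clusters (illustrative only).
c1 = [(0.0, 0.0), (0.0, 2.0)]
c2 = [(3.0, 0.0), (3.0, 2.0)]

def single_linkage(a, b):
    """Smallest distance over all cross-cluster pairs of members."""
    return min(dist(p, q) for p in a for q in b)

print(single_linkage(c1, c2))
```

Because only the closest pair matters, single linkage tends to chain elongated clusters together.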
20 Clustering
Complete Linkage
Dissimilarity between two clusters C1 and C2 = maximum
dissimilarity between the members of the two clusters
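A minimal sketch of complete linkage, again assuming Euclidean distance and two hypothetical clusters (illustrative coordinates only):

```python
from math import dist

# Hypothetical clusters (illustrative only).
c1 = [(0.0, 0.0), (0.0, 2.0)]
c2 = [(3.0, 0.0), (3.0, 2.0)]

def complete_linkage(a, b):
    """Largest distance over all cross-cluster pairs of members."""
    return max(dist(p, q) for p in a for q in b)

print(complete_linkage(c1, c2))
```

Here the farthest pair lies on a diagonal, so the result is larger than the gap between the two clusters.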
21 Clustering
Average Linkage
Dissimilarity between two clusters C1 and C2 = average
distance over all pairs of objects (one from each
cluster).
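A minimal sketch of average linkage, assuming Euclidean distance and two hypothetical clusters (illustrative coordinates only). Every cross-cluster pair contributes equally to the result:

```python
from math import dist

# Hypothetical clusters (illustrative only).
c1 = [(0.0, 0.0), (0.0, 2.0)]
c2 = [(3.0, 0.0), (3.0, 2.0)]

def average_linkage(a, b):
    """Mean distance over all pairs with one member from each cluster."""
    total = sum(dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))

print(average_linkage(c1, c2))
```

The value always falls between the minimum and maximum cross-cluster distances.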
22 Clustering
Average Group Linkage
Dissimilarity between two clusters C1 and C2 = distance
between the two cluster means.
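A minimal sketch of this measure, assuming Euclidean distance and two hypothetical clusters (illustrative coordinates only). The distance between cluster means is also commonly referred to as centroid linkage; the helper names below are assumptions.

```python
from math import dist

# Hypothetical clusters (illustrative only).
c1 = [(0.0, 0.0), (0.0, 2.0)]
c2 = [(3.0, 0.0), (3.0, 2.0)]

def cluster_mean(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def mean_linkage(a, b):
    """Distance between the means of the two clusters."""
    return dist(cluster_mean(a), cluster_mean(b))

print(cluster_mean(c1), cluster_mean(c2))
print(mean_linkage(c1, c2))
```

Unlike average linkage, this needs only one distance computation per cluster pair once the means are known.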
23 My Idea Presentation
24 Future Work
- Step 1: Use a simple hierarchical algorithm with
moment features to run and evaluate clustering
results.
- Step 2: Find good features for clustering on
our dataset by trying some feature variants
(Haar-like, shape quantization, ...).
- Step 3: Choose an optimal hierarchical clustering
algorithm.