Clustering and Fuzzy Clustering - PowerPoint PPT Presentation

Title: Clustering and Fuzzy Clustering


1
Clustering and Fuzzy Clustering
  • CIS 6660.1 seminar
  • Xin Wang
  • Sep 29, 2005

2
Paper
  • A chapter of the book by Witold Pedrycz,
    "Knowledge-Based Clustering", Wiley, 2005.

3
Outline
  • Overview of clustering
  • Hierarchical Clustering
  • Objective function-based Clustering
  • Hard Clustering
  • Fuzzy clustering
  • Extensions of fuzzy clustering
  • Summary

4
What is clustering
  • Data (patterns) -> Clusters (classes)
  • Desired properties
  • homogeneity within clusters
  • heterogeneity between clusters

5
Why clustering
  • improves data understanding
  • reveals internal structure of data
  • Useful for data analysis and interpretation

6
How to cluster
  • similarity measure classes
  • examples
  • distance, connectivity, intensity

7
Categories of Clustering
  • Hierarchical clustering
  • Objective function-based Clustering

8
Hierarchical clustering
  • Two modes
  • Bottom-up
  • Top-down

9
Hierarchical clustering (cont.)
  • Three types of distance

[Figure: three panels illustrating the distance between clusters A and B]
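The slide does not name the three distances. A common choice, assumed here, is single linkage (closest pair), complete linkage (farthest pair), and average linkage (mean over all pairs). The sketch below computes all three for two small clusters A and B whose points are made up for illustration.

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair (a, b) with a in A and b in B."""
    return [math.dist(a, b) for a, b in product(cluster_a, cluster_b)]


def single_link(cluster_a, cluster_b):
    """Distance between the two closest members of A and B."""
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_link(cluster_a, cluster_b):
    """Distance between the two farthest members of A and B."""
    return max(pairwise_distances(cluster_a, cluster_b))


def average_link(cluster_a, cluster_b):
    """Mean distance over all pairs of members of A and B."""
    d = pairwise_distances(cluster_a, cluster_b)
    return sum(d) / len(d)


# Hypothetical clusters (not from the slides).
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (5.0, 0.0)]

print(single_link(A, B), complete_link(A, B), average_link(A, B))
```

In bottom-up clustering the two clusters with the smallest such distance are merged at each step, so the choice of distance changes which merge happens first.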
10
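Objective function-based Clustering (hard clustering)
  • N patterns in $\mathbb{R}^n$, c clusters, a set of prototypes
    $v_1, v_2, \dots, v_c$
  • Objective function

$$Q = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik} \, \|x_k - v_i\|^2$$

$\|x_k - v_i\|^2$: a certain distance between $x_k$ and $v_i$
$U = [u_{ik}]$: partition matrix
Minimize Q with respect to $v_1, v_2, \dots, v_c$ and $U \in \mathcal{U}$ (the family of partition matrices)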
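As a rough illustration, the objective can be evaluated directly. The sketch assumes the squared Euclidean distance and a c x N matrix U with U[i][k] = 1 when pattern k is assigned to cluster i. The function name and the data are made up.

```python
def hard_objective(patterns, prototypes, U):
    """Q = sum over clusters i and patterns k of u_ik * ||x_k - v_i||^2."""
    q = 0.0
    for i, v in enumerate(prototypes):
        for k, x in enumerate(patterns):
            squared_distance = sum((xj - vj) ** 2 for xj, vj in zip(x, v))
            q += U[i][k] * squared_distance
    return q


# Hypothetical data: two tight groups on a line.
patterns = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
prototypes = [(1.0, 0.0), (11.0, 0.0)]

U_good = [[1, 1, 0, 0],   # cluster 1 takes the first two patterns
          [0, 0, 1, 1]]   # cluster 2 takes the last two
U_bad = [[0, 0, 1, 1],    # same prototypes, assignments swapped
         [1, 1, 0, 0]]

print(hard_objective(patterns, prototypes, U_good))
print(hard_objective(patterns, prototypes, U_bad))
```

The swapped assignment yields a much larger Q, which is why minimization pushes each pattern toward its nearest prototype.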
11
The definition of U
  • If pattern k belongs to cluster i, then
  • $u_{ik} = 1$
  • else
  • $u_{ik} = 0$
  • The entries of U are binary

12
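Constraints of U
  • Each cluster is nontrivial: $0 < \sum_{k=1}^{N} u_{ik} < N$ for i = 1, ..., c
  • Each pattern belongs to a single cluster: $\sum_{i=1}^{c} u_{ik} = 1$ for k = 1, ..., N
  • Example: N = 8, c = 3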
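The two constraints, together with the binary entries, can be checked mechanically. The sketch below is one possible check, with U stored as c rows (clusters) by N columns (patterns). The N = 8, c = 3 matrix is a made-up instance, not the one from the slide.

```python
def is_hard_partition(U):
    """Return True if U is a valid hard partition matrix (c rows, N columns)."""
    c, n = len(U), len(U[0])
    # Entries are binary.
    binary = all(u in (0, 1) for row in U for u in row)
    # Each cluster is nontrivial: neither empty nor containing every pattern.
    nontrivial = all(0 < sum(row) < n for row in U)
    # Each pattern belongs to exactly one cluster: every column sums to 1.
    single = all(sum(U[i][k] for i in range(c)) == 1 for k in range(n))
    return binary and nontrivial and single


# Hypothetical example with N = 8 patterns and c = 3 clusters.
U_ok = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
]
# Invalid: the third cluster is empty.
U_empty = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
# Invalid: the first pattern sits in two clusters at once.
U_double = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
]

print(is_hard_partition(U_ok), is_hard_partition(U_empty), is_hard_partition(U_double))
```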

13
Example: Classify cracked tiles (hard C-Means)

475 Hz   557 Hz   Ok?
-------  -------  ---
0.958    0.003    Yes
1.043    0.001    Yes
1.907    0.003    Yes
0.780    0.002    Yes
0.579    0.001    Yes
0.003    0.105    No
0.001    1.748    No
0.014    1.839    No
0.007    1.021    No
0.004    0.214    No

Table 1: Frequency intensities for ten tiles.

Tiles are made from clay moulded into the right
shape, brushed, glazed, and baked. Unfortunately,
the baking may produce invisible cracks.
Operators can detect the cracks by hitting the
tiles with a hammer, and in an automated system
the response is recorded with a microphone,
filtered, Fourier transformed, and normalised. A
small set of data is given in TABLE 1 (adapted
from MIT, 1997).
14
  • Place two cluster centres (x) at random.
  • Assign each data point (* and o) to the nearest
    cluster centre (x)

15
  • Compute the new centre of each class
  • Move the crosses (x)

16
Iteration 2
17
Iteration 3
18
Iteration 4 (then stop, because no visible change)
  • Each data point belongs to the cluster defined by the nearest centre
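The assign-then-move loop described on these slides can be sketched in a few lines. The data are the ten tiles from Table 1. Two things are assumptions rather than part of the slides: the raw intensities are clustered as they stand (the original plots may have used transformed axes), and the starting centres are the first and last tiles instead of random positions, so that the run is repeatable.

```python
import math

# Table 1: (475 Hz intensity, 557 Hz intensity) for the ten tiles.
TILES = [
    (0.958, 0.003), (1.043, 0.001), (1.907, 0.003), (0.780, 0.002), (0.579, 0.001),
    (0.003, 0.105), (0.001, 1.748), (0.014, 1.839), (0.007, 1.021), (0.004, 0.214),
]


def nearest(point, centres):
    """Index of the centre closest to the point (Euclidean distance)."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def hard_c_means(data, centres, max_iter=100):
    """Alternate assignment and centre update until the labels stop changing."""
    centres = list(centres)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(x, centres) for x in data]
        if new_labels == labels:
            break  # no point changed cluster, so the centres would not move
        labels = new_labels
        for i in range(len(centres)):
            members = [x for x, lab in zip(data, labels) if lab == i]
            if members:  # keep the old centre if a cluster happens to be empty
                centres[i] = mean(members)
    return labels, centres


# Assumed deterministic start: first and last tile as initial centres.
labels, centres = hard_c_means(TILES, [TILES[0], TILES[-1]])
print(labels)
print(centres)
```

With this start the five intact tiles end up in one cluster and the five cracked tiles in the other. A different start can change which cluster gets index 0, and in general may lead to a different local minimum.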
19
U =
0.0000  1.0000
0.0000  1.0000
0.0000  1.0000
0.0000  1.0000
0.0000  1.0000
1.0000  0.0000
1.0000  0.0000
1.0000  0.0000
1.0000  0.0000
1.0000  0.0000
  • The membership matrix U
  • The last five data points (rows) belong to the
    first cluster (column)
  • The first five data points (rows) belong to the
    second cluster (column)

20
Challenge to hard clustering
21
Why fuzzy clustering?
  • In real applications there is very often no
    sharp boundary between clusters.
  • Clusters may not be well separated, because of noise or a
    lack of discriminatory power in the feature space in
    which the patterns are represented.
  • Fuzzy clustering can deal with unsharp or
    overlapping cluster boundaries.

22
Hard and fuzzy clustering
  • Hard Clustering
  • crisp clusters
  • Each data point belongs to exactly one cluster.
  • Fuzzy Clustering
  • Each data point may belong to more than one cluster.
  • membership grade, partial membership

23
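Objective function-based Clustering (Fuzzy Clustering)
  • N patterns in $\mathbb{R}^n$, c clusters, a set of prototypes
    $v_1, v_2, \dots, v_c$
  • Objective function

$$Q = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m} \, \|x_k - v_i\|^2$$

$\|x_k - v_i\|^2$: a certain distance between $x_k$ and $v_i$
$U = [u_{ik}]$: fuzzy partition matrix
$m > 1$: fuzzification factor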
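To see what the fuzzification factor does, the sketch below evaluates the fuzzy objective for one ambiguous point at two values of m. It assumes the squared Euclidean distance and memberships raised to the power m. The function name and the tiny data set are made up.

```python
def fuzzy_objective(patterns, prototypes, U, m=2.0):
    """Q = sum over clusters i and patterns k of (u_ik ** m) * ||x_k - v_i||^2."""
    q = 0.0
    for i, v in enumerate(prototypes):
        for k, x in enumerate(patterns):
            squared_distance = sum((xj - vj) ** 2 for xj, vj in zip(x, v))
            q += (U[i][k] ** m) * squared_distance
    return q


# Hypothetical 1-D data: the middle pattern lies halfway between the prototypes.
patterns = [(0.0,), (1.0,), (2.0,)]
prototypes = [(0.0,), (2.0,)]
U = [[1.0, 0.5, 0.0],   # memberships in cluster 1
     [0.0, 0.5, 1.0]]   # memberships in cluster 2

q_m2 = fuzzy_objective(patterns, prototypes, U, m=2.0)
q_m3 = fuzzy_objective(patterns, prototypes, U, m=3.0)
print(q_m2, q_m3)
```

Only the halfway point contributes here, and its contribution shrinks as m grows, so larger m makes partial memberships cheaper and the resulting partition fuzzier.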
24
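Definition and Constraints of U
  • U: a matrix with entries confined to the unit
    interval, $u_{ik} \in [0, 1]$
  • Constraints
  • The clusters are nontrivial: $0 < \sum_{k=1}^{N} u_{ik} < N$ for each cluster i
  • The membership grades of each pattern sum to 1: $\sum_{i=1}^{c} u_{ik} = 1$ for each pattern k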

25
Fuzzy C-Means (FCM) algorithm description
  • introduced by Dunn in 1974, improved by
    Bezdek in 1981
  • Step 1 (Initialization phase)
  • (a) Select values of c (the number of clusters),
    m (the fuzzification factor), and ε (the
    termination criterion).
  • (b) Choose the distance function.
  • (c) Initialize (randomly) the partition matrix.

26
Fuzzy C-Means (FCM) algorithm description (cont.)
  • Step 2 (main iteration loop)
  • compute prototypes of clusters
  • compute the partition matrix
  • Step 3
  • If the stopping criterion is met, i.e. $\|U(\text{iter}+1) - U(\text{iter})\| \le \varepsilon$,
  • then stop.
  • Else, go to step 2.
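The three steps can be sketched as follows. This is a minimal illustration, not the book's code. It assumes the Euclidean distance, the usual FCM update formulas, and a stopping test on the largest change of any membership. U is stored as c rows (clusters) by N columns (patterns), and the data and the starting partition matrix are made up and fixed so that the run is repeatable.

```python
import math


def fcm(data, U, m=2.0, eps=1e-5, max_iter=100):
    """Fuzzy C-Means sketch: returns (U, centres) after convergence."""
    c, n, dim = len(U), len(data), len(data[0])
    centres = []
    for _ in range(max_iter):
        # Step 2a: prototypes as means weighted by u_ik ** m.
        centres = []
        for i in range(c):
            w = [u ** m for u in U[i]]
            total = sum(w)
            centres.append(tuple(
                sum(wk * x[j] for wk, x in zip(w, data)) / total for j in range(dim)
            ))
        # Step 2b: partition matrix from relative distances to the prototypes.
        new_U = [[0.0] * n for _ in range(c)]
        for k, x in enumerate(data):
            d = [math.dist(x, v) for v in centres]
            zeros = [i for i in range(c) if d[i] == 0.0]
            if zeros:
                # Pattern coincides with a prototype: full membership there.
                for i in zeros:
                    new_U[i][k] = 1.0 / len(zeros)
            else:
                for i in range(c):
                    new_U[i][k] = 1.0 / sum(
                        (d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c)
                    )
        # Step 3: stop when no membership changed by more than eps.
        change = max(abs(new_U[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = new_U
        if change <= eps:
            break
    return U, centres


# Hypothetical 1-D data with two obvious groups, and a fixed starting partition.
data = [(0.0,), (1.0,), (9.0,), (10.0,)]
U0 = [[0.6, 0.6, 0.4, 0.4],
      [0.4, 0.4, 0.6, 0.6]]

U, centres = fcm(data, U0)
print(centres)
for row in U:
    print([round(u, 4) for u in row])
```

Starting from a partition matrix rather than from centres follows the order of the steps on the slides. Many implementations start from random centres instead, which is an equally common variant.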

27
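Computing the partition matrix and prototypes
  • For each pattern t = 1, 2, ..., N, form the augmented function
    $V = \sum_{i=1}^{c} u_{it}^{m} \|x_t - v_i\|^2 + \lambda \left( \sum_{i=1}^{c} u_{it} - 1 \right)$
  • $\lambda$ denoting a Lagrange multiplier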
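A sketch of the standard derivation follows, assuming the Euclidean distance and writing $d_{it} = \|x_t - v_i\|$. Setting the derivative of the augmented function to zero gives the membership update. Setting the gradient of Q with respect to a prototype to zero gives the prototype update.

```latex
% Augmented function for pattern t (the constraint is that its memberships sum to 1)
V = \sum_{i=1}^{c} u_{it}^{m} d_{it}^{2} + \lambda \left( \sum_{i=1}^{c} u_{it} - 1 \right)

% Stationarity with respect to u_{st}
\frac{\partial V}{\partial u_{st}} = m \, u_{st}^{m-1} d_{st}^{2} + \lambda = 0
\quad\Longrightarrow\quad
u_{st} = \left( -\frac{\lambda}{m} \right)^{\frac{1}{m-1}} d_{st}^{-\frac{2}{m-1}}

% Use the constraint \sum_{j} u_{jt} = 1 to eliminate \lambda
\left( -\frac{\lambda}{m} \right)^{\frac{1}{m-1}}
= \frac{1}{\sum_{j=1}^{c} d_{jt}^{-\frac{2}{m-1}}}
\quad\Longrightarrow\quad
u_{st} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{d_{st}}{d_{jt}} \right)^{\frac{2}{m-1}}}

% Prototypes: set the gradient of Q with respect to v_s to zero
\nabla_{v_s} Q = -2 \sum_{k=1}^{N} u_{sk}^{m} (x_k - v_s) = 0
\quad\Longrightarrow\quad
v_s = \frac{\sum_{k=1}^{N} u_{sk}^{m} x_k}{\sum_{k=1}^{N} u_{sk}^{m}}
```

The membership formula requires $d_{jt} > 0$ for every j. A pattern that coincides with a prototype is usually given full membership in that cluster.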

28
Example for FCM
Each data point belongs to two clusters to
different degrees
29
  • Place two cluster centres
  • Assign a fuzzy membership to each data point
    depending on distance

30
  • Compute the new centre of each class
  • Move the crosses (x)

31
Iteration 2
32
Iteration 5
33
Iteration 10
34
Iteration 13 (then stop, because no visible change)
  • Each data point belongs to the two clusters to a degree
35
U =
0.0025  0.9975
0.0091  0.9909
0.0129  0.9871
0.0001  0.9999
0.0107  0.9893
0.9393  0.0607
0.9638  0.0362
0.9574  0.0426
0.9906  0.0094
0.9807  0.0193
  • The membership matrix U
  • The last five data points (rows) belong mostly to
    the first cluster (column)
  • The first five data points (rows) belong mostly
    to the second cluster (column)
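A fuzzy result is often turned into crisp labels by taking the largest membership in each row. The sketch below does this for the membership values listed on this slide (one row per tile, one column per cluster) and also reports the smallest winning membership. The function name is made up.

```python
# Membership matrix from the slide: one row per tile, columns = (cluster 1, cluster 2).
U = [
    (0.0025, 0.9975),
    (0.0091, 0.9909),
    (0.0129, 0.9871),
    (0.0001, 0.9999),
    (0.0107, 0.9893),
    (0.9393, 0.0607),
    (0.9638, 0.0362),
    (0.9574, 0.0426),
    (0.9906, 0.0094),
    (0.9807, 0.0193),
]


def harden(memberships):
    """Index of the cluster with the largest membership, for each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


labels = harden(U)                   # 0 = first cluster (column), 1 = second
weakest = min(max(row) for row in U)  # least confident winning membership

print(labels)
print(weakest)
```

The hardened labels match the hard C-Means result, while the membership values additionally show how confident each assignment is.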

36
Cluster Validity
  • What is the optimal number of clusters?
  • Partition Index
  • values in [1/c, 1]
  • Partition Entropy
  • values in [0, ln(c)]
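The slide gives only the ranges of the two indices. The sketch below assumes the usual definitions: the partition index as the mean of the squared memberships, and the partition entropy as the mean of $-u_{ik} \ln u_{ik}$, with $0 \ln 0$ taken as 0. U is stored as c rows (clusters) by N columns (patterns), and the example matrices are made up to hit the ends of each range.

```python
import math


def partition_index(U):
    """(1/N) * sum of u_ik ** 2; equals 1 for a crisp U and 1/c for a uniform U."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n


def partition_entropy(U):
    """-(1/N) * sum of u_ik * ln(u_ik); equals 0 for a crisp U and ln(c) for a uniform U."""
    n = len(U[0])
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / n


# Hypothetical extremes for c = 2 clusters and N = 3 patterns.
U_crisp = [[1.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
U_uniform = [[0.5, 0.5, 0.5],
             [0.5, 0.5, 0.5]]

print(partition_index(U_crisp), partition_entropy(U_crisp))
print(partition_index(U_uniform), partition_entropy(U_uniform))
```

One common way to pick the number of clusters is to run the clustering for several values of c and compare these indices, preferring a high partition index and a low partition entropy. Both indices tend to change monotonically with c, so the comparison is a heuristic.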

37
Extensions of fuzzy clustering
  • Fuzzy C-Varieties (Bezdek et al. 1981)
  • points -> r-dimensional variety
  • Possibilistic Clustering (Krishnapuram 1993,
    Keller 1996): drop the unity constraint
  • Noise Clustering (Ohashi 1984, Dave 1991)
  • localize the noise and place it in a single
    auxiliary cluster (end up with c + 1 clusters)

38
Summary
  • Hierarchical and objective function-based
    clustering

39
  • The end!
  • Thank you!