1
About Pairwise Data Clustering
Hans-Joachim Mucha
2
Plan
  • Cluster analysis based on proximity matrices
  • A generalization of the sum-of-squares criterion
  • Special weighted distances and weights of
    observations
  • Pairwise data clustering of contingency tables

3
Introduction - About pairwise data clustering
  • Finding clusters arises as a data analysis problem in various research areas such as ecology, psychology, sociology, and linguistics.
  • In these application fields, proximity matrices and contingency tables, rather than data matrices, often have to be analyzed by clustering techniques in order to detect subsets (groups, clusters). Proximity matrices contain pairwise proximities (distances, similarities); that is, a proximity value is given for each pair of data points.
  • There are advantages and disadvantages of partitioning and hierarchical methods of pairwise data clustering. Two well-known clustering methods for analyzing symmetric proximity matrices (or data matrices), K-means and Ward, will be considered here in more detail.
  • Some generalizations of simple model-based Gaussian clustering (K-means and Ward) are based on weights of observations and weighted (Euclidean) distances. This leads to methods that are of special interest, e.g., for clustering the rows and columns of contingency tables.

4
Cluster analysis based on proximity matrices
Note: often, only a distance matrix D is needed for (model-based) hierarchical and partitioning methods!
  • From a data matrix X to a distance matrix D, and then to hierarchies and partitions.

5
Cluster analysis based on proximity matrices
Now suppose for a while (in order to derive pairwise data clustering from the sum-of-squares criterion alone) that a sample of I independent observations (objects) is given in R^J, and denote by X = (x_ij) the corresponding data matrix consisting of I rows and J columns (variables). Further, let C = {x_1, ..., x_i, ..., x_I} denote the finite set of these I entities (observations). Alternatively, let us write shortly C = {1, ..., i, ..., I}. The main task of cluster analysis is finding a partition of the set of I objects (observations) into K non-empty clusters (subsets, groups) C_k, k = 1, 2, ..., K. Hereby, the intersection of each pair of clusters C_k and C_l is empty, and the union of all clusters gives the whole set C.
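For reference, a hedged LaTeX restatement of these partition constraints (notation as above, not copied from the slide):

    C_k \neq \emptyset \ (k = 1, \dots, K), \qquad C_k \cap C_l = \emptyset \ (k \neq l), \qquad \bigcup_{k=1}^{K} C_k = C .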
6
Cluster analysis based on proximity matrices
The sum-of-squares criterion has to be minimized in the Gaussian case with a uniform diagonal covariance matrix. It involves the sample cross-product matrix for the kth cluster C_k and the usual maximum likelihood estimate of the expectation values in cluster C_k. The sum-of-squares criterion can also be written in an equivalent form without an explicit specification of cluster centers (see the sketches below).
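A hedged LaTeX sketch of the criterion in its centered form, assuming the standard Gaussian sum-of-squares setting; the symbols W_k, \bar{x}_k and n_k are introduced here for illustration and are not copied from the slide:

    V_K = \sum_{k=1}^{K} \operatorname{tr}(W_k) = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \bar{x}_k \rVert^2, \qquad \bar{x}_k = \frac{1}{n_k} \sum_{i \in C_k} x_i .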
7
Cluster analysis based on proximity matrices
Herein, d_il is the squared Euclidean distance between two observations i and l, and n_k is the cardinality of cluster C_k (see the sketch below). There are two well-known clustering techniques for minimizing the sum-of-squares criterion based on pairwise distances: the partitioning K-means method (Späth (1985), Cluster Dissection...) minimizes this criterion for a single partition into K clusters by exchanging observations between clusters, and the hierarchical Ward method (Ward (1963)) minimizes it in a stepwise manner by agglomeration of subsets, starting with clusters that each contain only a single observation.
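A hedged LaTeX sketch of the pairwise-distance form alluded to above, using the standard identity for squared Euclidean distances (not copied verbatim from the slide):

    V_K = \sum_{k=1}^{K} \frac{1}{n_k} \sum_{\substack{i < l \\ i, l \in C_k}} d_{il}, \qquad d_{il} = \lVert x_i - x_l \rVert^2 .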
8
Example: Hierarchical clustering
  • Visualization of Ward's clustering results into 15 and 3 clusters.
  • As the three-cluster solution shows, the Ward method never creates hyperplanes as borderlines between clusters.
  • Proximity: Euclidean distances of 4000 randomly generated points in R^2 coming from one standard normally distributed population.

9
Cluster analysis based on proximity matrices
Partitioning methods
  • The partitioning K-means method based on a data matrix X goes back to Steinhaus (1956) and MacQueen (1967). Here we deal with a K-means technique based on pairwise (squared Euclidean) distances.
  • This, however, presents practical problems of storage and computation time for a growing number of objects, because both grow quadratically, as Späth already pointed out in 1985. Meanwhile, new generations of computers can easily handle both problems of pairwise data clustering for about 10000 objects, so today everyone can do this on their personal computer.
  • In order to cluster a practically unlimited number of observations, we generalize the criterion later on by introducing positive weights of observations. Instead of dealing with millions of objects directly, their appropriately weighted representatives are clustered subsequent to a preprocessing step of data aggregation.
  • Note: pairwise data clustering is the first choice in case of I << J! (A minimal code sketch of K-means on pairwise distances follows below.)
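For illustration, a minimal, hedged Python sketch of K-means driven only by a matrix of pairwise squared Euclidean distances (a plain Lloyd-style assignment loop, not the exchange algorithm described above; the function name pairwise_kmeans and all other names are illustrative):

    import numpy as np

    def pairwise_kmeans(D, K, n_iter=100, seed=0):
        # Illustrative sketch: K-means working only on a matrix D of pairwise
        # SQUARED Euclidean distances.  It relies on the identity
        #   ||x_i - c_k||^2 = mean_l d_{il} - 0.5 * mean_{l,m} d_{lm}
        # (l, m running over the members of cluster k), so no coordinates and
        # no explicit centroids are needed.
        rng = np.random.default_rng(seed)
        I = D.shape[0]
        labels = rng.integers(K, size=I)              # random initial partition
        for _ in range(n_iter):
            dist_to_centroid = np.full((I, K), np.inf)
            for k in range(K):
                members = np.flatnonzero(labels == k)
                if members.size == 0:
                    continue                          # empty cluster stays at inf
                within = D[np.ix_(members, members)].mean() / 2.0
                dist_to_centroid[:, k] = D[:, members].mean(axis=1) - within
            new_labels = dist_to_centroid.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break                                 # assignments are stable
            labels = new_labels
        return labels

    # Usage on toy data like in the examples (points in R^2):
    # X = np.random.default_rng(1).standard_normal((4000, 2))
    # D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # squared distances
    # labels = pairwise_kmeans(D, K=3)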

10
Example: K-means partitioning method based on a proximity matrix
  • Both Ward and K-means minimize the same criterion, but the results are usually different.
  • The K-means method leads to the well-known Voronoi tessellation, where the objects have minimum distance to their centroid and thus the borderlines between clusters are hyperplanes.
  • Data: 4000 points in R^2 (same data as before).

11
Example: pairwise clustering
  • Bivariate density surface of two-dimensional three-class data. Here both Ward and K-means clustering should be successful in dividing (decomposing) the data into smaller subsets.
  • Proximity: Euclidean distances of 4000 randomly generated points in R^2 coming from three different normally distributed populations.

12
Example: pairwise clustering
  • A comparison of the performance of Ward and K-means clustering on the three-class data shows that Ward performs slightly better than K-means.
  • The three sub-populations have the following different mean values: (-3, 3), (0, 0), (3, 3), and standard deviations: (1, 1), (0.7, 0.7), (1.2, 1.2).

13
Cluster analysis based on proximity matrices: some applications
  • Archaeology
  • Data: 613 Roman bricks and tiles from the Rhine area that are described by nine oxides and ten chemical trace elements.
  • Aim of clustering: finding brickyards that are not yet identified and confirming supposed sites of brickyards.
  • Are there clusters?
  • Fingerprint of the proximity matrix (Euclidean distances).

14
Cluster analysis based on proximity matrices: some applications
  • Fingerprint of the same proximity matrix as before, but rearranged with respect to the result of pairwise data clustering. Obviously, there are clusters with low pairwise distances inside and high distances between.
  • (Mucha et al. (2002), Proc. of the 24th Annual Conference of the GfKl, Springer, Berlin.)

15
Cluster analysis based on proximity matrices: some applications
  • Linguistics: pairwise clustering of 217 regions (Mucha & Haimerl (2005), Proc. of the 28th Annual Conference of the GfKl, Springer, Berlin).

Data: similarity matrix of dialect regions (coming from 3386 phonetic maps at 217 locations in Northern Italy). Task: segmentation of the set of locations.
16
A generalization of the sum-of-squares criterion
  • Now let the diagonal covariance matrix vary between groups. Then the logarithmic sum-of-squares criterion has to be minimized. An equivalent formulation based on pairwise distances can be derived; see the sketch below.
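As a hedged reconstruction, a common form of this criterion in LaTeX, assuming the covariance model \Sigma_k = \sigma_k^2 I_J and omitting terms that do not depend on the partition (the notation follows the earlier sketches and is not copied from the slide):

    V_K^{\log} = \sum_{k=1}^{K} n_k \log\!\left( \frac{1}{n_k} \sum_{i \in C_k} \lVert x_i - \bar{x}_k \rVert^2 \right) = \sum_{k=1}^{K} n_k \log\!\left( \frac{1}{n_k^{2}} \sum_{\substack{i<l \\ i,l \in C_k}} d_{il} \right).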

17
Special weighted distances and weights of
observations
  • The sum-of-squares criterion can be generalized by using positive weights of observations, where M_k and m_i denote the weight (mass) of cluster C_k and the mass of observation i, respectively.
  • Concerning the K-means algorithm, the observations have to be exchanged between clusters in order to minimize the criterion above. Here, a condition for the exchange of an observation i leaving cluster C_k and shifting into cluster C_g has to be fulfilled; see the sketch below.
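A hedged LaTeX sketch of the weighted criterion and of the exchange condition, reconstructed from the definitions above in the standard mass-weighted way (the exact slide formulas may differ in detail):

    V_K^{w} = \sum_{k=1}^{K} \frac{1}{M_k} \sum_{\substack{i<l \\ i,l \in C_k}} m_i m_l \, d_{il}, \qquad M_k = \sum_{i \in C_k} m_i .

Moving observation i from its cluster C_k into another cluster C_g pays off if

    \frac{M_g \, m_i}{M_g + m_i} \, \lVert x_i - \bar{x}_g \rVert^2 \;<\; \frac{M_k \, m_i}{M_k - m_i} \, \lVert x_i - \bar{x}_k \rVert^2 ,

where \bar{x}_k and \bar{x}_g are the mass-weighted cluster centroids; both squared distances can themselves be computed from the pairwise distances d_{il} alone.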

18
Special weighted distances and weights of
observations
  • Similarly, a generalized logarithmic sum-of-squares criterion can be derived, where u(C_k) denotes the within-cluster weighted logarithmic sum of squares for the cluster C_k.
  • Taking into account weights of the variables, a squared weighted Euclidean distance is used, where Q is restricted to be diagonal. For example, choosing Q as the inverse of the diagonal of S means weighting the variables by their inverse variances; here S denotes the usual estimate of the covariance matrix. See the sketch below.
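A hedged LaTeX sketch of the weighted distance described above (the concrete choice of Q for inverse-variance weighting is reconstructed from the text, not copied from the slide):

    d_Q(x_i, x_l) = (x_i - x_l)^{\top} Q \, (x_i - x_l), \qquad Q \ \text{diagonal}, \qquad Q = (\operatorname{diag} S)^{-1} \ \text{for inverse-variance weighting}.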

19
Pairwise data clustering of contingency tables
  • Now let us consider a contingency table H = (h_ij) consisting of I rows and J columns. Then the corresponding row profiles and the weights of rows are derived from H; see the sketch below.
  • Without loss of generality, let us consider the cluster analysis of rows.
  • Usually, the total inertia of H has to be decomposed by cluster analysis in such a way that the sum of within-cluster inertias becomes a minimum, where the column marginals define the weights of the variables (columns).
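A hedged LaTeX sketch of these quantities, using the standard correspondence-analysis definitions (the symbols r_{ij}, m_i, c_j, q_j are introduced here for illustration):

    \text{With } n = \sum_{i,j} h_{ij}, \ h_{i\cdot} = \sum_j h_{ij}, \ h_{\cdot j} = \sum_i h_{ij}:

    r_{ij} = \frac{h_{ij}}{h_{i\cdot}} \ \text{(row profiles)}, \qquad m_i = \frac{h_{i\cdot}}{n} \ \text{(row weights)}, \qquad q_j = \frac{1}{c_j}, \ c_j = \frac{h_{\cdot j}}{n} \ \text{(column weights)}.

    \text{Total inertia: } \sum_{i=1}^{I} m_i \sum_{j=1}^{J} \frac{(r_{ij} - c_j)^2}{c_j}; \qquad \text{within-cluster inertia: } \sum_{k=1}^{K} \sum_{i \in C_k} m_i \sum_{j=1}^{J} \frac{(r_{ij} - \bar{r}_{kj})^2}{c_j}, \quad \bar{r}_{kj} = \frac{1}{M_k} \sum_{i \in C_k} m_i r_{ij}.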

20
Pairwise data clustering of contingency tables
  • Example 1
  • Data: world's largest merchant fleets by country of owner.
  • Self-Propelled Oceangoing Vessels 1,000 Gross Tons and Greater (as of July 1, 2003). "Other" comprises roll-on/roll-off, passenger, breakbulk ships, partial containerships, refrigerated cargo, barge carriers, and specialized cargo ships.
  • Source: CIA World Factbook

21
Pairwise data clustering of contingency tables
  • World's largest merchant fleets: correspondence analysis plot.

22
Pairwise data clustering of contingency tables
  • Hierarchical cluster analysis of the world's largest merchant fleets.
  • From contingency tables to chi-square distances, and further to hierarchies and partitions; see the sketch below.
  • The column points are clustered in the same way as the row points.
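A hedged LaTeX sketch of the chi-square distance between two rows i and l of the contingency table, using the notation introduced above (a standard definition, not copied from the slide):

    d_{\chi^2}(i, l) = \sum_{j=1}^{J} \frac{1}{c_j} \left( r_{ij} - r_{lj} \right)^2 ,

i.e. the squared weighted Euclidean distance between the row profiles with column weights q_j = 1/c_j.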

23
Pairwise data clustering of contingency tables
  • Example 2
  • Reference: Greenacre, M. J. (1988): Clustering the Rows and Columns of a Contingency Table. Journal of Classification 5, 39-51.
  • Data: Guttman (1971), 1554 Israeli adults and their principal worries.

Ward clustering (Greenacre).
K-means leads to a better result in four clusters of the categories Oth, POL, ECO, MIL, ENR, SAB, MTO, and PER.
24
Other pairwise data clustering methods
  • Formulation of the optimization problem of a pairwise clustering cost function in the maximum entropy framework, using a variational principle to derive corresponding data partitionings in a d-dimensional Euclidean space. See Buhmann & Hofmann (1994): A Maximum Entropy Approach to Pairwise Data Clustering. In: Proceedings of the International Conference on Pattern Recognition, Hebrew University, Jerusalem, Vol. II, IEEE Computer Society Press, pp. 207-212.
  • PAM (partitioning around medoids), e.g., for clustering gene-expression data. See Kaufman & Rousseeuw (1990): Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley. A minimal sketch of the PAM idea follows below.
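For illustration, a minimal, hedged Python sketch of the PAM idea on an arbitrary symmetric dissimilarity matrix (a plain greedy swap search, not the exact BUILD/SWAP algorithm of Kaufman & Rousseeuw; all names are illustrative):

    import numpy as np

    def pam_sketch(D, K, seed=0):
        # Greedily swap a medoid with a non-medoid as long as the total
        # dissimilarity of every object to its nearest medoid decreases.
        rng = np.random.default_rng(seed)
        I = D.shape[0]
        medoids = list(rng.choice(I, size=K, replace=False))

        def total_cost(meds):
            # Sum over all objects of the distance to the closest medoid.
            return D[:, meds].min(axis=1).sum()

        improved = True
        while improved:
            improved = False
            for k in range(K):
                for candidate in range(I):
                    if candidate in medoids:
                        continue
                    trial = medoids.copy()
                    trial[k] = candidate
                    if total_cost(trial) < total_cost(medoids):
                        medoids, improved = trial, True
        labels = D[:, medoids].argmin(axis=1)   # assign each object to its medoid
        return medoids, labels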

25
Conclusions - About pairwise data clustering
  • The well-known sum-of-squares criterion of model-based cluster analysis can be formulated on the basis of pairwise distances.
  • It can be generalized in three ways: first by allowing different volumes of clusters, second by using weighted observations, and third by weighting the variables.
  • As a special case of weighting the variables and the observations, the decomposition of the inertia of contingency tables into clusters is possible by the hierarchical Ward method or the partitioning K-means method.
  • Thank you for your attention!