Two-mode cluster analysis: Monte Carlo tests of the accuracy of methods

Provided by: frpsych

1
Two-mode cluster analysis: Monte Carlo tests of
the accuracy of methods
  • Sabine Krolak-Schwerdt
  • Saarland University

2
Overview
  • Indication for two-mode clustering
  • Classification of two-mode clustering methods
  • Monte Carlo study
  • Data model used to construct the data sets
  • Experiment 1: Non-overlapping clusters
  • Experiment 2: Overlapping clusters
  • Conclusions for selection of methods



3
Classification of two-mode clustering methods
  • Generalizations of the ADCLUS model
    • GENNCLUS (DeSarbo, 1982)
    • PENCLUS (Both & Gaul, 1987)
    • Baier et al. (1996)
  • Representations by ultrametric tree structures
    • Missing value method (Espejo & Gaul, 1986)
    • Centroid effect method (Eckes & Orlik, 1993)
    • ESOCLUS (Schwaiger, 1997)
    • ...
  • Reordering methods
    • Bond energy algorithm (McCormick, Schweitzer &
      White, 1972)
    • Modal block method (Hartigan, 1976)
    • Two-way joining (Hartigan, 1975)
    • GRIDPAT (Krolak-Schwerdt, Orlik & Ganter, 1994)
    • ...

4
Generalizations of the ADCLUS model
  • Data
    • a nonsymmetric (similarity) matrix
      of order n x m
  • Model
    • the data matrix is approximated by a product of
      three matrices, where the terms are
    • a binary n x k matrix designating membership
      of the n objects in k clusters
    • a k x k matrix of weights
    • a binary m x k matrix designating membership
      of the m attributes in k clusters
  • (cf. DeSarbo, 1982)
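
The three matrices listed on this slide multiply to a matrix of the same order as the data. A minimal sketch of that product follows. The symbol names X, P, W and Q are assumptions chosen for illustration, and any additive constant or error term of the original formulation is left out.

```latex
\[
  \underset{n \times m}{X}
  \;\approx\;
  \underset{n \times k}{P}\;
  \underset{k \times k}{W}\;
  \underset{k \times m}{Q^{\top}},
  \qquad
  x_{ij} \approx \sum_{r=1}^{k} \sum_{s=1}^{k} p_{ir}\, w_{rs}\, q_{js}
\]
```

Here p_{ir} = 1 if object i belongs to cluster r, q_{js} = 1 if attribute j belongs to cluster s, and w_{rs} is the weight linking object cluster r to attribute cluster s.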

5
Representations by ultrametric trees

6
Representations by ultrametric trees: Grand matrix
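
As a rough illustration of the grand matrix idea, the sketch below embeds an n x m two-mode matrix in a symmetric (n + m) x (n + m) matrix whose within-mode blocks are left missing. This layout is an assumption based on the usual description of tree-based two-mode methods. The function name `grand_matrix` and the use of `None` for missing cells are hypothetical choices, not taken from the slides.

```python
def grand_matrix(x):
    """Embed an n x m two-mode matrix in an (n + m) x (n + m) symmetric matrix.

    Assumption: object-object and attribute-attribute blocks are unknown,
    so they are filled with None; only the two off-diagonal blocks hold data.
    """
    n, m = len(x), len(x[0])
    size = n + m
    g = [[None] * size for _ in range(size)]
    for i in range(n):
        for j in range(m):
            g[i][n + j] = x[i][j]      # object i versus attribute j
            g[n + j][i] = x[i][j]      # mirrored entry keeps the matrix symmetric
    return g


# Hypothetical 2 x 3 example
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
G = grand_matrix(X)
```

A method such as the missing value method would then have to deal with the `None` blocks before fitting a tree; how each method does so is not shown here.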

7
Reordering approaches: Two-way joining (Hartigan, 1975)

8
Monte Carlo study: Selected methods
  • Generalizations of the ADCLUS model
    • GENNCLUS (DeSarbo, 1982)
    • PENCLUS (Both & Gaul, 1987)
    • Baier et al. (1996)
  • Representations by ultrametric tree structures
    • Missing value method (Espejo & Gaul, 1986)
    • Centroid effect method (Eckes & Orlik, 1993)
    • ESOCLUS (Schwaiger, 1997)
    • ...
  • Reordering methods
    • Bond energy algorithm (McCormick, Schweitzer &
      White, 1972)
    • Modal block method (Hartigan, 1976)
    • Two-way joining (Hartigan, 1975)
    • GRIDPAT (Krolak-Schwerdt, Orlik & Ganter, 1994)
    • ...

10
Data model

11
Data model
  • nonsymmetric matrix of order n x m
    (n objects, m attributes)
  • binary n x K matrix designating membership of
    the n objects in K clusters
  • K x K matrix of weights
  • binary m x K matrix designating membership of
    the m attributes in K clusters
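
The components listed on this slide can be combined into a data set as sketched below: the two membership matrices and the weight matrix are multiplied, and normally distributed error is added. The function name `build_data`, the parameter `sigma`, and the example values are hypothetical. The exact construction used in the study may differ.

```python
import random


def build_data(p, w, q, sigma=0.0, seed=0):
    """Return the n x m matrix P W Q^T plus normal error with std dev sigma.

    p: binary n x K object memberships, q: binary m x K attribute memberships,
    w: K x K weights. All names are illustrative assumptions.
    """
    rng = random.Random(seed)
    k = len(w)
    data = []
    for p_row in p:
        row = []
        for q_row in q:
            value = sum(p_row[r] * w[r][s] * q_row[s]
                        for r in range(k) for s in range(k))
            noise = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
            row.append(value + noise)
        data.append(row)
    return data


# Hypothetical example: 4 objects, 3 attributes, K = 2 non-overlapping clusters
P = [[1, 0], [1, 0], [0, 1], [0, 1]]
Q = [[1, 0], [0, 1], [0, 1]]
W = [[5.0, 1.0], [1.0, 3.0]]
X = build_data(P, W, Q)               # noise-free
X_noisy = build_data(P, W, Q, 0.5)    # with error, sigma = 0.5
```

With `sigma=0` every cell simply equals the weight that links the object's cluster to the attribute's cluster, which makes the block structure easy to inspect.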

12
Data model: Model of non-overlapping clusters

13
Non-overlapping clusters

14
Data model: M overlapping clusters
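
One way to read the overlapping case, assuming overlap is encoded by membership rows that contain more than one 1, is that a cell then accumulates the weights of every cluster pair it falls into. The sketch below illustrates this. The function name `model_value` and the weights are hypothetical.

```python
def model_value(p_row, w, q_row):
    """Sum w[r][s] over every cluster pair (r, s) such that the object
    belongs to cluster r and the attribute belongs to cluster s."""
    k = len(w)
    return sum(p_row[r] * w[r][s] * q_row[s]
               for r in range(k) for s in range(k))


W = [[4.0, 0.0],
     [0.0, 2.0]]

# Object and attribute both in cluster 1 only
single = model_value([1, 0], W, [1, 0])

# Object and attribute both in clusters 1 and 2 (overlap)
double = model_value([1, 1], W, [1, 1])
```

In this toy case the overlapping cell receives 4.0 + 2.0, so overlap shows up in the data as cells whose values exceed any single block weight.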

15
Data model: Weight matrix of Experiment 2 (Toeplitz matrix)
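
A Toeplitz matrix has constant values along each diagonal. The sketch below builds a K x K weight matrix of that kind from a single parameter. The specific form `rho ** |r - s|` and the name `toeplitz_weights` are assumptions for illustration, since the slides only state that one Toeplitz parameter was varied between large and small.

```python
def toeplitz_weights(k, rho):
    """K x K symmetric Toeplitz matrix with entries rho ** |r - s|.

    The exponential-decay form is an assumed example, not the study's matrix.
    """
    return [[rho ** abs(r - s) for s in range(k)] for r in range(k)]


W = toeplitz_weights(3, 0.5)
for row in W:
    print(row)
```

With this form, a larger parameter gives off-diagonal weights closer to the diagonal ones, which would make the clusters harder to tell apart.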

16
Non-overlapping clusters

17
Model parameters of Experiment 1
(3 clusters, non-overlapping)

18
Model parameters of Experiment 2
Factors of the experimental design
  • Overlap: large vs. small
  • Cluster number: 3, 5 or 8
  • Parameter of Toeplitz matrix: large vs. small
  • Size of variance: large vs. small
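
The four factors above can be crossed into a full factorial design. The sketch below enumerates the resulting conditions. The dictionary keys are hypothetical labels, and the number of data sets generated per condition is not stated on the slide.

```python
from itertools import product

# Factor levels as listed on the slide; the key names are illustrative
factors = {
    "overlap": ["large", "small"],
    "n_clusters": [3, 5, 8],
    "toeplitz_parameter": ["large", "small"],
    "variance": ["large", "small"],
}

# One dict per combination of factor levels
conditions = [dict(zip(factors, levels))
              for levels in product(*factors.values())]

print(len(conditions))   # 2 * 3 * 2 * 2 = 24
print(conditions[0])
```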

19
Experiment 1: ANOVA of Adjusted Rand indices

20
Experiment 1: ANOVA of Adjusted Rand indices
Number of clusters: F(2,180) = 112.21, p < 0.001
  Number of clusters     3      5      8
  Adjusted Rand index    0.81   0.69   0.56
Structure of clusters: F(3,180) = 19.86, p < 0.001
  Structure of clusters  1      2      3      4
  Adjusted Rand index    0.75   0.71   0.68   0.61
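
For reference, the adjusted Rand index compares two partitions of the same elements and corrects the agreement for chance. The sketch below is a plain implementation for two label vectors. How the study mapped two-mode clusters onto such vectors is not stated in the slides, so the input format here is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_found):
    """Chance-corrected agreement between two partitions (1.0 = identical)."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_found))   # contingency table
    rows = Counter(labels_true)
    cols = Counter(labels_found)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:        # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])      # relabelled copy
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])   # no agreement
```

The index does not depend on how clusters are numbered, so a recovered solution only needs to group the same elements together as the true one.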

21
Experiment 1: ANOVA of Adjusted Rand indices
F(8,180) = 68.44, p < 0.001
                            Number of clusters
  Method                    3      5      8
  ESOCLUS                   1.00   0.90   0.52
  Centroid effect method    0.96   0.78   0.57
  Baier et al.              0.54   0.53   0.50
  GRIDPAT                   0.90   0.76   0.30
  Two-way joining           0.63   0.48   0.94

22
Experiment 1: ANOVA of Adjusted Rand indices
F(12,180) = 3.12, p < 0.001
                            Structure of clusters
  Method                    1      2      3      4
  ESOCLUS                   0.88   0.84   0.80   0.71
  Centroid effect method    0.87   0.84   0.78   0.59
  Baier et al.              0.61   0.50   0.50   0.47
  GRIDPAT                   0.67   0.69   0.60   0.65
  Two-way joining           0.70   0.70   0.71   0.63

23
Experiment 2: ANOVA of Omega indices

24
Experiment 2: ANOVA of Omega indices
Cluster overlap: F(1,240) = 29.72, p < 0.001
  Large   0.84
  Small   0.87
σ of the normal distribution: F(1,240) = 28.66, p < 0.001
  Large   0.84
  Small   0.87
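
The Omega index extends the adjusted Rand index to overlapping clusterings: two solutions agree on a pair of elements when they place that pair together in the same number of clusters. The sketch below follows that definition for binary membership matrices. The function names and the example memberships are hypothetical, and the elements compared in the study are not specified in the slides.

```python
from collections import Counter
from itertools import combinations


def shared_counts(membership):
    """For every pair of rows, count the clusters that contain both elements."""
    return [sum(a * b for a, b in zip(membership[i], membership[j]))
            for i, j in combinations(range(len(membership)), 2)]


def omega_index(membership_a, membership_b):
    """Chance-corrected agreement between two (possibly overlapping) solutions."""
    ta = shared_counts(membership_a)
    tb = shared_counts(membership_b)
    n_pairs = len(ta)

    observed = sum(1 for x, y in zip(ta, tb) if x == y) / n_pairs
    ca, cb = Counter(ta), Counter(tb)
    expected = sum(ca[j] * cb[j] for j in ca) / n_pairs ** 2
    if expected == 1.0:            # degenerate case: agreement is certain
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical memberships of 4 elements in 2 clusters
A = [[1, 0], [1, 1], [0, 1], [0, 1]]   # second element overlaps
B = [[1, 0], [1, 0], [0, 1], [0, 1]]   # no overlap
omega_same = omega_index(A, A)
omega_diff = omega_index(A, B)
```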

25
Experiment 2: ANOVA of Omega indices
F(8,240) = 6.61, p < 0.001
Large overlap
                            Number of clusters
  Method                    3      5      8
  ESOCLUS                   0.80   0.81   0.81
  Centroid effect method    0.81   0.81   0.82
  Baier et al.              0.92   0.85   0.78
  GRIDPAT                   0.92   0.87   0.82
  Two-way joining           0.80   0.89   0.76
Small overlap
                            Number of clusters
  Method                    3      5      8
  ESOCLUS                   0.88   0.90   0.91
  Centroid effect method    0.87   0.91   0.92
  Baier et al.              0.85   0.84   0.86
  GRIDPAT                   0.82   0.87   0.87
  Two-way joining           0.68   0.78   0.79
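
To read the two tables side by side, the sketch below stores the Omega values from this slide and looks up the best-scoring method for each number of clusters under each overlap condition. Only the values come from the slide. The data layout and the function name `best_methods` are illustrative.

```python
CLUSTER_NUMBERS = [3, 5, 8]

# Omega indices from the slide, ordered by number of clusters (3, 5, 8)
omega = {
    "large": {
        "ESOCLUS": [0.80, 0.81, 0.81],
        "Centroid effect method": [0.81, 0.81, 0.82],
        "Baier et al.": [0.92, 0.85, 0.78],
        "GRIDPAT": [0.92, 0.87, 0.82],
        "Two-way joining": [0.80, 0.89, 0.76],
    },
    "small": {
        "ESOCLUS": [0.88, 0.90, 0.91],
        "Centroid effect method": [0.87, 0.91, 0.92],
        "Baier et al.": [0.85, 0.84, 0.86],
        "GRIDPAT": [0.82, 0.87, 0.87],
        "Two-way joining": [0.68, 0.78, 0.79],
    },
}


def best_methods(table, column):
    """Names of all methods that reach the highest value in one column."""
    top = max(values[column] for values in table.values())
    return sorted(name for name, values in table.items()
                  if values[column] == top)


for overlap, table in omega.items():
    for column, k in enumerate(CLUSTER_NUMBERS):
        print(overlap, k, best_methods(table, column))
```

Ties are reported as lists, so a condition in which two methods share the top value lists both.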
26
Conclusions
  • Recovery performance of two-mode clustering
    methods depends on the type and complexity of the
    data structure.
  • Methods performed best if the input data
    corresponded to the data structure presumed by the
    method:
    • Non-overlapping clusters: ESOCLUS, Centroid
      effect method
    • Overlapping clusters: Baier et al.,
      Two-way joining, GRIDPAT
  • Some a priori knowledge or hypothesis on the type
    and structure of data is necessary for the
    selection of an optimal method.