Classification - PowerPoint PPT Presentation

About This Presentation

Title:

Classification

Description:

Typology. Define conceptual attributes. Select appropriate attributes. Create typology matrix (substruction). Insert empirical entities in matrix ...

Slides: 61

Provided by: fda017

Transcript and Presenter's Notes

1
Classification & Clustering

Pieter Spronck
http://www.cs.unimaas.nl/p.spronck

2
Binary Division of Marbles
3
Big vs. Small
4
Transparent vs. Opaque
5
Marble Attributes

Size (big vs. small)
Transparency (transparent vs. opaque)
Shininess (shiny vs. dull)
Colouring (monochrome vs. polychrome)
Colour (blue, green, yellow, ...)

6
Grouping of Marbles
7
Marbles
8
Honouring All Distinctions
9
Colour Coding
10
Natural Grouping
11
Types of Clusters

Uniquely classifying clusters
Overlapping clusters
Probabilistic clusters
Dendrograms

12
Uniquely Classifying Clusters
13
Overlapping Clusters
14
Probabilistic Clustering
15
Dendrogram
transparent
opaque
not clear
clear
16
Classification

Ordering of entities into groups based on their
similarity
Minimisation of within-group variance
Maximisation of between-group variance
Exhaustive and exclusive
Principal technique: clustering
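
The two variance criteria can be made concrete with a small sketch. It assumes each marble is described by one numeric attribute (a diameter in millimetres, which is an invented measurement and not part of the slides) and compares a sensible grouping with an arbitrary one. The function name `within_between` is hypothetical.

```python
from statistics import mean

def within_between(groups):
    """Return (within-group, between-group) sums of squares for 1-D data."""
    grand_mean = mean(x for g in groups for x in g)
    # Spread of each value around the mean of its own group.
    within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Spread of the group means around the overall mean, weighted by group size.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return within, between

# Hypothetical marble diameters in millimetres (not from the slides).
by_size = [[10, 11, 12], [24, 25, 26]]   # small marbles vs. big marbles
mixed = [[10, 25, 12], [24, 11, 26]]     # same marbles, arbitrary split

print(within_between(by_size))
print(within_between(mixed))
```

Both groupings share the same total sum of squares, so lowering the within-group part necessarily raises the between-group part. A good classification is the split that pushes as much of the total as possible into the between-group term.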

17
Reasons for Classification

Descriptive power
Parsimony
Maintainability
Versatility
Identification of distinctive attributes

18
Typology vs. Taxonomy

Typology: conceptual
Taxonomy: empirical

19
Typology

Define conceptual attributes
Select appropriate attributes
Create typology matrix (substruction)
Insert empirical entities in matrix
Extend matrix if necessary
Reduce matrix if necessary
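
The typology steps can be sketched in a few lines. This is a minimal illustration, assuming two of the marble attributes from the earlier slides. The marble names, the helper names (`build_matrix`, `insert_entity`, `drop_empty_cells`) and the rule "drop empty cells" as one simple form of reduction are all assumptions, since the slides do not spell them out.

```python
from itertools import product

# Conceptual attributes and their domains (taken from the marble slides).
attributes = {
    "size": ["big", "small"],
    "transparency": ["transparent", "opaque"],
}

def build_matrix(attributes):
    """Substruction: one empty cell per combination of attribute values."""
    return {combo: [] for combo in product(*attributes.values())}

def insert_entity(matrix, attributes, name, entity):
    """Place an entity in its cell; False means the matrix needs extending."""
    key = tuple(entity[attr] for attr in attributes)
    if key not in matrix:
        return False
    matrix[key].append(name)
    return True

def drop_empty_cells(matrix):
    """One simple reduction: keep only the types that have members."""
    return {cell: members for cell, members in matrix.items() if members}

matrix = build_matrix(attributes)
# Hypothetical marbles, invented for this example.
ok_1 = insert_entity(matrix, attributes, "m1", {"size": "big", "transparency": "opaque"})
ok_2 = insert_entity(matrix, attributes, "m2", {"size": "small", "transparency": "transparent"})
ok_3 = insert_entity(matrix, attributes, "m3", {"size": "small", "transparency": "translucent"})
reduced = drop_empty_cells(matrix)
```

The third marble does not fit any cell, which is the signal that the transparency domain was not exhaustive and the matrix has to be extended.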

20
Defining Conceptual Attributes

Meaningful
Focus on ideal types
Order of importance
Exhaustive domains

21
Conceptual Marble Attributes
22
Typology Matrix
23
Matrix Extension
24
Reduction

Functional reduction
Pragmatic reduction
Numerical reduction
Reduction by using criterion types

25
Functional Reduction
26
Functionally Reduced Matrix
27
Pragmatic Reduction
28
Pragmatically Reduced Matrix
29
Criticising Typological Classification

Reification
Resilience
Problematic attribute selection
Unmanageability

30
Taxonomy

Define empirical attributes
Select appropriate attributes
Create entity matrix
Apply clustering technique
Analyse clusters

31
Empirical Attributes
32
Selecting Attributes

Size (big/small)
Colour (yellow, green, blue, red, white)
Colouring (monochrome/polychrome)
Shininess (shiny/dull)
Transparency (transparent/opaque)
Glass colour (clear, green, ...)

33
Entity Matrix
34
Automatic Clustering Parameters

Agglomerative vs. divisive
Monothetic vs. polythetic
Outliers permitted
Limits to number of clusters
Form of linkage (single, complete, average)
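
To show how the linkage parameter changes what "distance between clusters" means, here is a minimal agglomerative sketch. It assumes marbles are encoded as 0/1 tuples in the order (big, monochrome, shiny, transparent) and compared by counting differing attributes (Hamming distance). The encoding, the distance measure and the function names are illustrative choices, not taken from the slides.

```python
from itertools import combinations

def hamming(a, b):
    """Number of attributes on which two marbles differ."""
    return sum(x != y for x, y in zip(a, b))

# How pairwise distances between two clusters are combined.
LINKAGE = {
    "single": min,                            # closest pair
    "complete": max,                          # farthest pair
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
}

def cluster_distance(c1, c2, linkage):
    return LINKAGE[linkage]([hamming(a, b) for a in c1 for b in c2])

def agglomerate(points, n_clusters, linkage="single"):
    """Start with one cluster per point and merge the closest pair repeatedly."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters

# Hypothetical marbles: (big, monochrome, shiny, transparent), 1 = yes.
A, B, C, D = (0, 1, 1, 1), (0, 1, 1, 0), (0, 0, 0, 0), (1, 0, 0, 0)
result = agglomerate([A, B, C, D], n_clusters=2, linkage="single")
```

Stopping at `n_clusters` plays the role of the limit on the number of clusters. Recording every merge instead of stopping would give the data for a dendrogram.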

35
Automatic Clustering
36
Polythetic to Monothetic
NNN: polychrome, dull, opaque
NYYN: small, monochrome, shiny, opaque
NYYY: small, monochrome, shiny, transparent
NYY: polychrome, shiny, transparent
37
Analysing Clusters
small, monochrome, shiny, transparent
small, monochrome, shiny, opaque
polychrome, dull, opaque
Stone
polychrome, shiny, transparent
Vanilla
Classic
Tiger
38
Criticising Taxonomical Classification

Dependent on specimens
Difficult to generalise
Difficult to label
Biased towards academic discipline
Not the last word

39
Typology vs. Taxonomy
40
Operational Classification

Typology
(conceptual)
Taxonomy
(empirical)

Operational typology
(conceptual + empirical)

41
Automated Clustering Methods

Iterative distance-based clustering: the k-means method
Incremental clustering: the Cobweb method
Probability-based clustering: the EM algorithm

42
k-Means Method

Iterative distance-based clustering
Divisive
Polythetic
Predefined number of clusters (k)
Outliers permitted

43
k-Means (pass 1)
k = 2; attributes: size (big/small), colouring
(monochrome/polychrome), shininess (shiny/dull),
transparency (transparent/opaque)
?
?
44
k-Means (pass 2)
k = 2; attributes: size (big/small), colouring
(monochrome/polychrome), shininess (shiny/dull),
transparency (transparent/opaque)
Cluster average: small, polychrome, dull, opaque
Cluster average: small, monochrome, shiny, transparent
45
k-Means (pass 3)
k = 2; attributes: size (big/small), colouring
(monochrome/polychrome), shininess (shiny/dull),
transparency (transparent/opaque)
Cluster average: big, polychrome, dull, opaque
?
Cluster average: small, monochrome, shiny, transparent
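
The passes above can be reproduced with a short sketch. It assumes marbles are encoded as 0/1 tuples in the order (big, monochrome, shiny, transparent), uses squared Euclidean distance, and takes fixed starting centres so the run is repeatable. The five marbles and the starting centres are invented for illustration and are not the marbles on the slides.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between a point and a centre."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centres, max_passes=100):
    """Assign each point to its nearest centre, move each centre to the
    mean of its points, and stop when the assignment no longer changes."""
    centres = list(centres)
    assignment = None
    for _ in range(max_passes):
        new_assignment = [
            min(range(len(centres)), key=lambda c: sq_dist(p, centres[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(len(centres)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centre
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centres

# Hypothetical marbles: (big, monochrome, shiny, transparent), 1 = yes.
marbles = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 1, 1), (0, 1, 1, 0), (0, 1, 1, 1)]
assignment, centres = kmeans(marbles, centres=[marbles[0], marbles[1]])
```

A centre value such as 0.5 for "big" is the numeric counterpart of a cluster average: half of the marbles in that cluster are big. With this starting point the small dull marble changes cluster between the first and second pass, which shows how strongly the outcome depends on the initial centres.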
46
Cobweb Algorithm

Incremental clustering
Agglomerative
Polythetic
Dynamic number of clusters
Outliers permitted

47
Cobweb Procedure

Builds a tree by adding instances to it
Uses a Category Utility function to determine the
quality of the clustering
Changes the tree structure if this positively
influences the Category Utility (by merging nodes
or splitting nodes)
Cutoff value may be used to group sufficiently
similar instances together

48
Category Utility

Measure for quality of clustering
The better the predictive value of the average
attribute values of the instances in the clusters
for the individual attribute values, the higher
the CU will be
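
For nominal attributes, category utility is usually written as follows. The slides do not show this general form, so treating it as the definition used here is an assumption; the symbols $k$ (number of clusters), $C_l$ (cluster $l$), $a_i$ (attribute $i$) and $v_{ij}$ (value $j$ of attribute $i$) are introduced only for this formula.

```latex
CU(C_1, \ldots, C_k) =
  \frac{1}{k} \sum_{l=1}^{k} \Pr[C_l]
  \sum_{i} \sum_{j}
  \left( \Pr[a_i = v_{ij} \mid C_l]^2 - \Pr[a_i = v_{ij}]^2 \right)
```

Each term compares how well a value can be predicted when the cluster is known with how well it can be predicted without that knowledge. If the clusters carry no information, the two probabilities are equal and the sum is zero.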

49
Category Utility for Size (1)
C1
C2
$CU = \bigl(d\,((a^2 - c^2) + (e^2 - g^2)) + h\,((b^2 - c^2) + (f^2 - g^2))\bigr)/2 = 0$
50
Category Utility for Size (2)
C1
C2
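$CU = \bigl(d\,((a^2 - c^2) + (e^2 - g^2)) + h\,((b^2 - c^2) + (f^2 - g^2))\bigr)/2$
$= \bigl(\tfrac{1}{2}(\tfrac{1}{3} - \tfrac{1}{3}) + \tfrac{1}{2}(-\tfrac{1}{9} + \tfrac{5}{9})\bigr)/2 = \tfrac{1}{9}$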
51
Category Utility for Size (3)
C1
C2
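a) $\Pr[\text{size} = \text{big} \mid C_1] = 1$
b) $\Pr[\text{size} = \text{big} \mid C_2] = 0$
c) $\Pr[\text{size} = \text{big}] = 1/3$
d) $\Pr[C_1] = 1/3$
e) $\Pr[\text{size} = \text{small} \mid C_1] = 0$
f) $\Pr[\text{size} = \text{small} \mid C_2] = 1$
g) $\Pr[\text{size} = \text{small}] = 2/3$
h) $\Pr[C_2] = 2/3$
$CU = \bigl(d\,((a^2 - c^2) + (e^2 - g^2)) + h\,((b^2 - c^2) + (f^2 - g^2))\bigr)/2$
$= \bigl(\tfrac{1}{3}(\tfrac{8}{9} - \tfrac{4}{9}) + \tfrac{2}{3}(-\tfrac{1}{9} + \tfrac{5}{9})\bigr)/2 = \tfrac{2}{9}$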
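
The three size examples can be checked with exact fractions. The sketch below computes category utility for a single nominal attribute. The marble sets are reconstructions chosen to match the probabilities on the slides (six marbles, two of them big); the actual marbles pictured are not recoverable from the transcript, so these lists are assumptions.

```python
from fractions import Fraction as F

def category_utility(clusters, values):
    """Category utility for one nominal attribute.

    clusters: list of clusters, each a list of attribute values.
    values:   the possible values of the attribute.
    """
    everything = [v for c in clusters for v in c]
    n = len(everything)
    total = F(0)
    for c in clusters:
        pr_cluster = F(len(c), n)
        for v in values:
            pr_v_given_c = F(c.count(v), len(c))
            pr_v = F(everything.count(v), n)
            total += pr_cluster * (pr_v_given_c ** 2 - pr_v ** 2)
    return total / len(clusters)

SIZES = ["big", "small"]

# (1) Both clusters have the same mix of sizes: size predicts nothing.
cu_1 = category_utility([["big", "small", "small"], ["big", "small", "small"]], SIZES)
# (2) All big marbles sit in C1, but C1 also holds one small marble.
cu_2 = category_utility([["big", "big", "small"], ["small", "small", "small"]], SIZES)
# (3) C1 holds exactly the big marbles, C2 exactly the small ones.
cu_3 = category_utility([["big", "big"], ["small"] * 4], SIZES)

print(cu_1, cu_2, cu_3)
```

The value rises from 0 to 1/9 to 2/9 as the clusters line up more closely with the size attribute, which is the behaviour the Category Utility slide describes.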
52
Cobweb Example
Attributes: size (big/small), colouring
(monochrome/polychrome), shininess (shiny/dull),
transparency (transparent/opaque)
53
Cobweb Result Example
Attributes: size (big/small), colouring
(monochrome/polychrome), shininess (shiny/dull),
transparency (transparent/opaque)
54
Cobweb Numerical

Probability of values of attributes of instances
in a cluster is based on the standard deviation
from the estimate for the mean value
Acuity is presumed variance in attribute values
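
For numeric attributes, the usual textbook adaptation replaces the sum of squared value probabilities with the integral of a squared normal density, which equals $1/(2\sqrt{\pi}\,\sigma)$. The slides do not give the formula, so the form below is an assumption about what is meant; $\sigma_{il}$ is the standard deviation of attribute $i$ within cluster $C_l$ and $\sigma_i$ its standard deviation over all instances.

```latex
CU(C_1, \ldots, C_k) =
  \frac{1}{k} \sum_{l=1}^{k} \Pr[C_l] \,
  \frac{1}{2\sqrt{\pi}}
  \sum_{i} \left( \frac{1}{\sigma_{il}} - \frac{1}{\sigma_i} \right)
```

A cluster with a single instance has $\sigma_{il} = 0$, which would make the term infinite. Acuity avoids this by imposing a minimum on the variance, reflecting the assumed measurement error in the attribute values.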

55
Disadvantages of Previous Methods

Fast and hard to judge
Dependent on initial setup
Ad-hoc limitations
Hard to escape from local minima

56
Probability-based Clustering

Finite mixture models
Each cluster is defined by a vector of
probabilities for instances to have certain
values for their attributes, and a probability
for instances to reside in the cluster.
Clustering equals searching for optimal sets of
probabilities for a sample set

57
Expectation-Maximisation (EM)

Probability-based clustering
Divisive
Polythetic
Predefined number of clusters (k)
Outliers permitted

58
EM Procedure

Select k cluster vectors randomly
Calculate cluster probabilities for each instance
(under the assumption that the instance
attributes are independent)
Use calculations to re-estimate values
Repeat until increase in quality becomes
negligible
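
The procedure can be sketched for binary marble attributes as a mixture of independent yes/no probabilities. This is a minimal illustration, not the implementation behind the slides: the function name `em_bernoulli`, the clamping of probabilities away from 0 and 1, the log-likelihood as the quality measure and the fixed starting values (used instead of random ones so the run is repeatable) are all assumptions.

```python
import math

def em_bernoulli(data, weights, probs, max_iter=200, tol=1e-6):
    """EM for a mixture of independent binary attributes.

    data:    list of 0/1 tuples, one per instance.
    weights: starting cluster probabilities p(C_l).
    probs:   starting p(attribute = 1 | C_l), one list per cluster.
    """
    weights, probs = list(weights), [list(p) for p in probs]
    k, n, m = len(weights), len(data), len(data[0])
    previous = -math.inf
    resp = []
    for _ in range(max_iter):
        # E-step: cluster probabilities for each instance.
        resp, loglik = [], 0.0
        for x in data:
            joint = [
                weights[l] * math.prod(
                    probs[l][i] if x[i] else 1 - probs[l][i] for i in range(m))
                for l in range(k)
            ]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([j / total for j in joint])
        # M-step: re-estimate the parameters from the weighted instances.
        for l in range(k):
            mass = sum(r[l] for r in resp)
            weights[l] = mass / n
            for i in range(m):
                raw = sum(r[l] * x[i] for r, x in zip(resp, data)) / mass
                probs[l][i] = min(max(raw, 1e-3), 1 - 1e-3)  # avoid exact 0 or 1
        if loglik - previous < tol:  # increase in quality has become negligible
            break
        previous = loglik
    return weights, probs, resp

# Hypothetical marbles: (big, monochrome, shiny, transparent), 1 = yes.
data = [(0, 1, 1, 1)] * 4 + [(1, 0, 0, 0)] * 4
weights, probs, resp = em_bernoulli(
    data,
    weights=[0.5, 0.5],
    probs=[[0.4, 0.6, 0.6, 0.6], [0.6, 0.4, 0.4, 0.4]],
)
```

Unlike k-means, every instance keeps a probability for every cluster in `resp`, so borderline marbles are not forced into a single group.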

59
EM Result Example
p(C1) = 0.2, p(big) = 0.6, p(monochrome) = 0.3, p(shiny) = 0.4, p(transparent) = 0.4
p(C2) = 0.8, p(big) = 0.2, p(monochrome) = 0.8, p(shiny) = 0.9, p(transparent) = 0.5
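
Given such a result, the probability that a marble belongs to each cluster follows from Bayes' rule. The sketch below uses the numbers from this slide and assumes that each attribute probability is conditional on its cluster and that attributes are independent within a cluster. The two example marbles and the function name `membership` are invented for illustration.

```python
# Numbers from the EM result slide; "prior" is the cluster probability.
clusters = {
    "C1": {"prior": 0.2, "big": 0.6, "monochrome": 0.3, "shiny": 0.4, "transparent": 0.4},
    "C2": {"prior": 0.8, "big": 0.2, "monochrome": 0.8, "shiny": 0.9, "transparent": 0.5},
}

def membership(marble):
    """Return the probability of each cluster for one marble.

    marble maps each attribute name to True (has the property) or False.
    """
    joint = {}
    for name, c in clusters.items():
        p = c["prior"]
        for attr, present in marble.items():
            p *= c[attr] if present else 1 - c[attr]
        joint[name] = p
    total = sum(joint.values())
    return {name: p / total for name, p in joint.items()}

# Hypothetical marbles, not taken from the slides.
big_dull = membership(
    {"big": True, "monochrome": False, "shiny": False, "transparent": False})
small_shiny = membership(
    {"big": False, "monochrome": True, "shiny": True, "transparent": True})
```

Even though C2 is four times as likely a priori, the big, polychrome, dull, opaque marble ends up almost entirely in C1, because its attribute values are far more probable there.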
60
The Essence of Classification