1
Classification and Clustering
  • Pieter Spronck
  • http://www.cs.unimaas.nl/p.spronck

2
Binary Division of Marbles
3
Big vs. Small
4
Transparent vs. Opaque
5
Marble Attributes
  • Size (big vs. small)
  • Transparency (transparent vs. opaque)
  • Shininess (shiny vs. dull)
  • Colouring (monochrome vs. polychrome)
  • Colour (blue, green, yellow, …)

6
Grouping of Marbles
7
Marbles
8
Honouring All Distinctions
9
Colour Coding
10
Natural Grouping
11
Types of Clusters
  • Uniquely classifying clusters
  • Overlapping clusters
  • Probabilistic clusters
  • Dendrograms

12
Uniquely Classifying Clusters
13
Overlapping Clusters
14
Probabilistic Clustering
15
Dendrogram
(Figure: dendrogram over the marbles, splitting on transparent vs. opaque and on clear vs. not clear.)
16
Classification
  • Ordering of entities into groups based on their
    similarity
  • Minimisation of within-group variance
  • Maximisation of between-group variance (both illustrated in the sketch below)
  • Exhaustive and exclusive
  • Principal technique: clustering
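The two variance criteria can be made concrete with a small numeric sketch. The single size attribute, the measurements and the grouping below are all invented for illustration; a good classification keeps the first number small and the second large.

```python
# Sketch: within-group vs. between-group variance for one numeric attribute
# (say, marble diameter in cm). The data and the grouping are invented.
groups = {
    "small": [1.0, 1.1, 0.9, 1.2],
    "big":   [2.4, 2.6, 2.5, 2.3],
}

values = [v for vs in groups.values() for v in vs]
grand_mean = sum(values) / len(values)

# Within-group variance: spread of members around their own group mean (minimise).
within = sum(
    (v - sum(vs) / len(vs)) ** 2
    for vs in groups.values() for v in vs
) / len(values)

# Between-group variance: spread of the group means around the grand mean,
# weighted by group size (maximise).
between = sum(
    len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
    for vs in groups.values()
) / len(values)

print(f"within-group variance:  {within:.3f}")    # small for a good grouping
print(f"between-group variance: {between:.3f}")   # large for a good grouping
```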

17
Reasons for Classification
  • Descriptive power
  • Parsimony
  • Maintainability
  • Versatility
  • Identification of distinctive attributes

18
Typology vs. Taxonomy
  • Typology: conceptual
  • Taxonomy: empirical

19
Typology
  • Define conceptual attributes
  • Select appropriate attributes
  • Create typology matrix (substruction; sketched below)
  • Insert empirical entities in matrix
  • Extend matrix if necessary
  • Reduce matrix if necessary
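The first four steps can be sketched in a few lines: substruction builds the matrix as the cross-product of the conceptual attribute domains, and empirical entities are then placed in its cells. The attribute domains follow the marble example; the marbles themselves are invented for illustration.

```python
from itertools import product

# Conceptual attributes and their (exhaustive) domains, following the marble example.
attributes = {
    "size":         ["big", "small"],
    "transparency": ["transparent", "opaque"],
    "shininess":    ["shiny", "dull"],
}

# Substruction: every combination of attribute values is one cell of the typology matrix.
typology_matrix = {cell: [] for cell in product(*attributes.values())}

# Inserting empirical entities: each (invented) marble goes into the matching cell.
marbles = [
    ("big", "transparent", "shiny"),
    ("small", "opaque", "dull"),
    ("small", "opaque", "dull"),
]
for marble in marbles:
    typology_matrix[marble].append(marble)

for cell, members in typology_matrix.items():
    print(cell, len(members))    # many cells stay empty, which motivates reduction
```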

20
Defining Conceptual Attributes
  • Meaningful
  • Focus on ideal types
  • Order of importance
  • Exhaustive domains

21
Conceptual Marble Attributes
22
Typology Matrix
23
Matrix Extension
24
Reduction
  • Functional reduction (sketched below)
  • Pragmatic reduction
  • Numerical reduction
  • Reduction by using criterion types
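Of these four, only functional reduction is easy to show mechanically; the sketch below reads it as dropping cells of the typology matrix that no empirical entity occupies. The cells and entities are invented, and the other three forms of reduction are judgement calls that the sketch does not attempt.

```python
# Functional reduction, read here as removing typology cells that stay empty
# once the empirical entities have been inserted. The data is invented.
typology_matrix = {
    ("big", "transparent"):   ["marble_1"],
    ("big", "opaque"):        [],
    ("small", "transparent"): ["marble_2", "marble_3"],
    ("small", "opaque"):      [],
}

functionally_reduced = {cell: members
                        for cell, members in typology_matrix.items() if members}
print(sorted(functionally_reduced))   # only the occupied cells remain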

25
Functional Reduction
26
Functionally Reduced Matrix
27
Pragmatic Reduction
28
Pragmatically Reduced Matrix
29
Criticising Typological Classification
  • Reification
  • Resilience
  • Problematic attribute selection
  • Unmanageability

30
Taxonomy
  • Define empirical attributes
  • Select appropriate attributes
  • Create entity matrix (sketched below)
  • Apply clustering technique
  • Analyse clusters
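An entity matrix for the marble example can be sketched as one row per marble and one 0/1 column per empirical attribute. The marbles and the particular 0/1 coding below are invented for illustration; the resulting rows are what a clustering technique would then be applied to.

```python
# Sketch: entity matrix with one row per marble and one 0/1 column per attribute.
marbles = {
    "m1": {"size": "big",   "colouring": "polychrome", "shininess": "dull",  "transparency": "opaque"},
    "m2": {"size": "small", "colouring": "monochrome", "shininess": "shiny", "transparency": "transparent"},
    "m3": {"size": "small", "colouring": "monochrome", "shininess": "shiny", "transparency": "opaque"},
}

# Code each attribute as 1 when the marble has the first-listed value, 0 otherwise.
coding = {"size": "big", "colouring": "monochrome",
          "shininess": "shiny", "transparency": "transparent"}

entity_matrix = {
    name: [1 if attrs[a] == v else 0 for a, v in coding.items()]
    for name, attrs in marbles.items()
}
for name, row in entity_matrix.items():
    print(name, row)    # e.g. m1 -> [1, 0, 0, 0]
```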

31
Empirical Attributes
32
Selecting Attributes
  • Size (big/small)
  • Colour (yellow, green, blue, red, white)
  • Colouring (monochrome/polychrome)
  • Shininess (shiny/dull)
  • Transparency (transparent/opaque)
  • Glass colour (clear, green, …)

33
Entity Matrix
34
Automatic Clustering Parameters
  • Agglomerative vs. divisive
  • Monothetic vs. polythetic
  • Outliers permitted
  • Limits to number of clusters
  • Form of linkage (single, complete, average; see the sketch below)
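The agglomerative, polythetic case with the three linkage forms can be tried directly on a 0/1 entity matrix. The sketch below uses SciPy's hierarchical clustering as a stand-in for whichever tool the slides had in mind; the marble rows, the Hamming metric and the cut at two clusters are illustrative choices, not anything the slides prescribe.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Marbles coded as 0/1 rows (big, monochrome, shiny, transparent); data is invented.
X = np.array([
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
])

# Agglomerative clustering; the "form of linkage" is the method argument.
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method, metric="hamming")
    labels = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram at 2 clusters
    print(method, labels)
```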

35
Automatic Clustering
36
Polythetic to Monothetic
NNN: polychrome, dull, opaque
NYYN: small, monochrome, shiny, opaque
NYYY: small, monochrome, shiny, transparent
NYY: polychrome, shiny, transparent
37
Analysing Clusters
(Figure: the four clusters, described as small/monochrome/shiny/transparent, small/monochrome/shiny/opaque, polychrome/dull/opaque and polychrome/shiny/transparent, labelled with the marble types Stone, Vanilla, Classic and Tiger.)
38
Criticising Taxonomical Classification
  • Dependent on specimens
  • Difficult to generalise
  • Difficult to label
  • Biased towards academic discipline
  • Not the last word

39
Typology vs. Taxonomy
40
Operational Classification
  • Typology (conceptual)
  • Taxonomy (empirical)
  • Operational typology (conceptual and empirical)

41
Automated Clustering Methods
  • Iterative distance-based clustering: the k-means method
  • Incremental clustering: the Cobweb method
  • Probability-based clustering: the EM algorithm

42
k-Means Method
  • Iterative distance-based clustering
  • Divisive
  • Polythetic
  • Predefined number of clusters (k)
  • Outliers permitted

43
k-Means (pass 1)
k = 2; attributes: size (big/small), colouring (monochrome/polychrome), shininess (shiny/dull), transparency (transparent/opaque)
44
k-Means (pass 2)
k = 2; attributes: size (big/small), colouring (monochrome/polychrome), shininess (shiny/dull), transparency (transparent/opaque)
Cluster average: small, polychrome, dull, opaque
Cluster average: small, monochrome, shiny, transparent
45
k-Means (pass 3)
k = 2; attributes: size (big/small), colouring (monochrome/polychrome), shininess (shiny/dull), transparency (transparent/opaque)
Cluster average: big, polychrome, dull, opaque
Cluster average: small, monochrome, shiny, transparent
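The three passes can be mirrored with a small k-means-style loop on 0/1-coded marbles. The marbles, the seed points and the coding below are invented for illustration; averages over the 0/1 columns play the role of the cluster averages shown in passes 2 and 3.

```python
# Sketch of the k-means passes on marbles coded as (big, monochrome, shiny, transparent).
marbles = [
    (1, 0, 0, 0), (1, 0, 0, 0), (0, 0, 0, 0),
    (0, 1, 1, 1), (0, 1, 1, 1), (0, 1, 1, 0),
]

def distance(x, centre):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, centre))

def average(cluster):
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

k = 2
centres = [marbles[0], marbles[3]]            # pass 1: two arbitrary seed marbles

while True:                                   # passes 2, 3, ...
    clusters = [[] for _ in range(k)]
    for m in marbles:                         # assign each marble to the nearest centre
        clusters[min(range(k), key=lambda i: distance(m, centres[i]))].append(m)
    new_centres = [average(c) if c else centres[i] for i, c in enumerate(clusters)]
    if new_centres == centres:                # stop once the cluster averages stop moving
        break
    centres = new_centres

print(centres)    # the two final "cluster averages"
```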
46
Cobweb Algorithm
  • Incremental clustering
  • Agglomerative
  • Polythetic
  • Dynamic number of clusters
  • Outliers permitted

47
Cobweb Procedure
  • Builds a tree by adding instances to it (a simplified, flat variant is sketched below)
  • Uses a Category Utility function to determine the
    quality of the clustering
  • Changes the tree structure if this positively
    influences the Category Utility (by merging nodes
    or splitting nodes)
  • Cutoff value may be used to group sufficiently
    similar instances together
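Cobweb itself maintains a tree and considers merging and splitting nodes; a faithful implementation is too long for a slide, so the sketch below is a deliberately simplified, flat variant: each new instance joins the existing cluster, or opens a new one, whichever gives the highest Category Utility. The tree structure, merging, splitting and cutoff are omitted, and the marble coding and data are invented.

```python
def category_utility(clusters):
    """Category utility of a partition of equal-length 0/1 attribute vectors."""
    instances = [x for c in clusters for x in c]
    n, n_attr, k = len(instances), len(instances[0]), len(clusters)

    def p_one(group):
        # P(attribute i = 1) inside the group, for every attribute i
        return [sum(x[i] for x in group) / len(group) for i in range(n_attr)]

    base = p_one(instances)
    cu = 0.0
    for c in clusters:
        within = p_one(c)
        # gain in predictability of each attribute value (1 and 0) from knowing the cluster
        gain = sum((p ** 2 + (1 - p) ** 2) - (b ** 2 + (1 - b) ** 2)
                   for p, b in zip(within, base))
        cu += (len(c) / n) * gain
    return cu / k

def incremental_cluster(instances):
    """Greedy, flat sketch: each new instance joins the existing cluster
    (or opens a new one) that gives the best category utility."""
    clusters = [[instances[0]]]
    for x in instances[1:]:
        options = [[c + [x] if j == i else list(c) for j, c in enumerate(clusters)]
                   for i in range(len(clusters))]
        options.append(clusters + [[x]])
        clusters = max(options, key=category_utility)
    return clusters

# Marbles coded as (big, monochrome, shiny, transparent); the data is invented.
marbles = [(1, 0, 0, 0), (0, 1, 1, 1), (1, 0, 0, 0), (0, 1, 1, 0), (0, 1, 1, 1)]
for cluster in incremental_cluster(marbles):
    print(cluster)
```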

48
Category Utility
  • Measure for quality of clustering
  • The better the average attribute values within the clusters predict the
    individual attribute values of the instances, the higher the CU will be

49
Category Utility for Size (1)
(Figure: the marbles divided into clusters C1 and C2.)
CU = (d((a² − c²) + (e² − g²)) + h((b² − c²) + (f² − g²))) / 2 = 0
(a through h are the probabilities defined on slide 51.)
50
Category Utility for Size (2)
(Figure: a different division of the marbles into clusters C1 and C2.)
CU = (d((a² − c²) + (e² − g²)) + h((b² − c²) + (f² − g²))) / 2
   = ((1/2)((1/3) − (1/3)) + (1/2)(−(1/9) + (5/9))) / 2 = 1/9
51
Category Utility for Size (3)
(Figure: the marbles divided into clusters C1 and C2.)
a) Pr[size = big | C1] = 1
b) Pr[size = big | C2] = 0
c) Pr[size = big] = 1/3
d) Pr[C1] = 1/3
e) Pr[size = small | C1] = 0
f) Pr[size = small | C2] = 1
g) Pr[size = small] = 2/3
h) Pr[C2] = 2/3
CU = (d((a² − c²) + (e² − g²)) + h((b² − c²) + (f² − g²))) / 2
   = ((1/3)((8/9) − (4/9)) + (2/3)(−(1/9) + (5/9))) / 2 = 2/9
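A short check of the arithmetic on this slide, using the probabilities listed above (with Pr[C2] = 2/3, so that the two cluster probabilities sum to one):

```python
# Category Utility for the size attribute, two clusters (K = 2).
a, b = 1.0, 0.0        # Pr[size=big | C1], Pr[size=big | C2]
c = 1 / 3              # Pr[size=big]
e, f = 0.0, 1.0        # Pr[size=small | C1], Pr[size=small | C2]
g = 2 / 3              # Pr[size=small]
d, h = 1 / 3, 2 / 3    # Pr[C1], Pr[C2]

cu = (d * ((a**2 - c**2) + (e**2 - g**2)) +
      h * ((b**2 - c**2) + (f**2 - g**2))) / 2
print(cu)              # 0.222..., i.e. 2/9 as on the slide
```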
52
Cobweb Example
attributes: size (big/small), colouring
(monochrome/polychrome), shininess (shiny/dull),
transparency (transparent/opaque)
53
Cobweb Result Example
attributes: size (big/small), colouring
(monochrome/polychrome), shininess (shiny/dull),
transparency (transparent/opaque)
54
Cobweb Numerical
  • For numeric attributes, the probability of an attribute value within a
    cluster is based on the estimated mean and the standard deviation around it
  • Acuity: the presumed minimum variance in attribute values

55
Disadvantages of Previous Methods
  • Fast and hard to judge
  • Dependent on initial setup
  • Ad-hoc limitations
  • Hard to escape from local minima

56
Probability-based Clustering
  • Finite mixture models
  • Each cluster is defined by a vector of
    probabilities for instances to have certain
    values for their attributes, and a probability
    for instances to reside in the cluster.
  • Clustering equals searching for optimal sets of
    probabilities for a sample set

57
Expectation-Maximisation (EM)
  • Probability-based clustering
  • Divisive
  • Polythetic
  • Predefined number of clusters (k)
  • Outliers permitted

58
EM Procedure
  • Select k cluster vectors randomly
  • Calculate cluster probabilities for each instance
    (under the assumption that the instance
    attributes are independent)
  • Use calculations to re-estimate values
  • Repeat until the increase in quality becomes negligible (see the sketch
    below)
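These steps can be sketched for the binary marble attributes as a two-component finite mixture in which each attribute is an independent per-cluster probability. Everything below (the data, the random start, the fixed number of passes) is invented for illustration; a real run would stop when the improvement becomes negligible, as the procedure above says.

```python
import random

random.seed(1)

# Marbles coded as (big, monochrome, shiny, transparent); the data is invented.
data = [(1, 0, 0, 0), (1, 0, 0, 1), (0, 1, 1, 1),
        (0, 1, 1, 1), (0, 1, 0, 1), (1, 0, 0, 0)]
n_attr, k = len(data[0]), 2

# Each cluster vector: a weight p(Cj) plus one probability per attribute.
weights = [1 / k] * k
probs = [[random.uniform(0.3, 0.7) for _ in range(n_attr)] for _ in range(k)]

def likelihood(x, j):
    """p(x | Cj), assuming the attributes are independent within a cluster."""
    l = 1.0
    for v, p in zip(x, probs[j]):
        l *= p if v == 1 else (1 - p)
    return l

for _ in range(50):                 # a fixed number of passes keeps the sketch short
    # E step: cluster probabilities for each instance.
    resp = []
    for x in data:
        joint = [weights[j] * likelihood(x, j) for j in range(k)]
        total = sum(joint)
        resp.append([v / total for v in joint])
    # M step: re-estimate the cluster weights and attribute probabilities.
    for j in range(k):
        rj = sum(r[j] for r in resp)
        weights[j] = rj / len(data)
        probs[j] = [sum(r[j] * x[i] for r, x in zip(resp, data)) / rj
                    for i in range(n_attr)]

for j in range(k):
    print(f"p(C{j + 1}) = {weights[j]:.2f}",
          [f"{p:.2f}" for p in probs[j]])   # cf. the result example on the next slide
```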

59
EM Result Example
p(C1) = 0.2: p(big) = 0.6, p(monochrome) = 0.3, p(shiny) = 0.4, p(transparent) = 0.4
p(C2) = 0.8: p(big) = 0.2, p(monochrome) = 0.8, p(shiny) = 0.9, p(transparent) = 0.5
60
The Essence of Classification
  • A successful classification defines fundamental
    characteristics
  • A classification can never be better than the
    attributes it is based upon
  • There is no magic formula