Statistical Pattern Recognition Part I: Clustering - PowerPoint PPT Presentation

Slides: 20
Provided by: claudecch

Transcript and Presenter's Notes



1
Image Processing, Computer Vision, and Pattern Recognition
Fac. of Comp., Eng. & Tech., Staffordshire University
Statistical Pattern Recognition Part I: Clustering
Dr. Claude C. Chibelushi
2
Outline
  • Introduction
  • Features
  • Clustering
  • k-means clustering algorithm
  • Pattern Similarity
  • Summary

3
Introduction
  • Examples of pattern recognition tasks
    • Image scene: scene object (car, road, ...)
    • Handwriting: alphabet letter (a, b, c, ...); writer's name
    • Speech: word (one, two, three, ...); speaker's name
4
Introduction
  • Pattern recognition
    • categorisation of patterns into a finite number of classes
  • Common approaches
    • statistical pattern recognition
    • connectionist pattern recognition (sometimes considered a subset of statistical pattern recognition)
    • syntactic pattern recognition

5
Introduction
  • Pattern recognition
    • Statistical: classification based on the statistical distribution of patterns
    • Syntactic: classification based on structural relationships between the elements of a pattern
    • Connectionist: classification using artificial neural networks (statistical basis?)

6
Introduction
  • Terminology
    • Feature(s): representation or description of salient pattern attribute(s)
    • Class: category, group, type
      • patterns with common characteristics/properties
    • Classification: assignment of a class label to an observed pattern

7
Features
  • Patterns are often considered as points in a space
    • selected pattern attributes or properties (features) are represented by
      • numerical measurements
      • linguistic labels
    • the space is often multidimensional, hence
      • the set of numerical property values is called a feature vector

8
Features
  • Example attributes for
    • pixel-based image segmentation
      • pixel colour, edge strength, ...
    • object recognition
      • object shape, colour, size, ...

9
Features
  • Example 2D feature plot for two image types (5
    samples of each)

10
Clustering
  • Clustering: unsupervised learning that identifies groups of similar data samples
    • partial supervision (labelling of some samples) is possible
  • Cluster: group of similar data samples, which are dissimilar from samples in other groups
    • for pattern recognition, "sample" is synonymous with "pattern"

11
Clustering
  • Example applications
    • Image processing: image segmentation into homogeneous regions
      • data samples: image pixels
      • sample details: pixel colour, edge strength, edge direction, ...
    • Marketing: market segmentation into groups of customers with similar buying behaviour
      • data samples: customers
      • sample details: database of purchases, demographics, ...
    • Insurance: profile identification of low- or high-risk clients
      • data samples: clients
      • sample details: database of client claim history, demographics, ...

12
Clustering
  • k-means clustering algorithm
    • iterative algorithm for finding k clusters
      • number of clusters assumed known
    • each cluster represented by a prototype
      • cluster centroid (mean)

13
Clustering
  • k-means clustering algorithm pseudo code

    kMeansClustering(dataSamples, k)
        initialise cluster centroids
        do
            // assign each sample to a cluster
            for each data sample
                find nearest cluster centroid
            // update each cluster mean
            for each cluster
                update centroid
        until (centroids change is insignificant)
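The pseudocode above can be turned into a short runnable sketch. This is one possible Python rendering, not the author's implementation; the choices below are assumptions that the slide leaves open: centroids are initialised as k distinct samples picked at random (hypothetical `seed` parameter), nearness is Euclidean distance, "insignificant change" means no centroid moves more than a hypothetical tolerance `tol`, and an empty cluster simply keeps its previous centroid.

```python
import math
import random


def k_means(samples, k, tol=1e-6, max_iter=100, seed=0):
    """Group `samples` (equal-length numeric tuples) into k clusters.

    Returns (centroids, labels); labels[i] is the cluster index of samples[i].
    """
    rng = random.Random(seed)
    # Initialise cluster centroids: k distinct samples chosen at random (assumption).
    centroids = [tuple(s) for s in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Assign each sample to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(s, centroids[j]))
                  for s in samples]
        # Update each centroid to the mean of the samples assigned to it.
        new_centroids = []
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members)
                                           for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        # Stop when no centroid moved by more than `tol` (change is insignificant).
        shift = max(math.dist(c, n) for c, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels
```

For example, with the hypothetical samples `[(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]` and `k=2`, the first three samples end up sharing one label and the last three another.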

14
Clustering
  • k-means clustering algorithm
    • Limitations
      • requires prior knowledge of the number of clusters
      • may fail to find the optimal grouping of samples
      • sensitive to choice of
        • initial cluster prototypes
          • e.g. what if a prototype is far from any data sample?
        • distance measure (similarity measure)
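To make the question about a distant prototype concrete, the sketch below runs only the assignment step of k-means, with one prototype placed far from every sample. The function name `assign` and all sample and prototype values are hypothetical, chosen for illustration.

```python
import math


def assign(samples, prototypes):
    """Return, for each sample, the index of its nearest prototype (Euclidean)."""
    return [min(range(len(prototypes)), key=lambda j: math.dist(s, prototypes[j]))
            for s in samples]


samples = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (8.5, 9.0)]
prototypes = [(1.0, 1.0), (100.0, 100.0)]  # second prototype is far from all data
labels = assign(samples, prototypes)
print(labels)  # [0, 0, 0, 0]
```

Because prototype 1 attracts no samples, its mean cannot be updated from any members, and the run effectively finds a single cluster even though the data contain two obvious groups.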

15
Clustering
  • k-means clustering algorithm
    • As a result of these limitations, experimentation is often required
      • multiple trials with different
        • k
        • initial prototype values
        • distance measures
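One way to organise trials with different initial prototype values is sketched below. It is an assumption-laden illustration, not the slides' procedure: each seed gives a different random set of initial prototypes, a compact k-means (`lloyd`, hypothetical name) runs a fixed number of iterations, and the run with the lowest within-cluster sum of squared distances (`sse`, a criterion the slides do not specify) is kept.

```python
import math
import random


def lloyd(samples, k, seed, iters=50):
    """Compact k-means run: random initial prototypes, then alternate assign/update."""
    centroids = random.Random(seed).sample(samples, k)  # initial prototypes
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(s, centroids[j]))
                  for s in samples]
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:  # an empty cluster keeps its previous prototype
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids


def sse(samples, centroids):
    """Sum of squared distances from each sample to its nearest centroid."""
    return sum(min(math.dist(s, c) ** 2 for c in centroids) for s in samples)


def best_of_trials(samples, k, seeds):
    """Run one trial per seed (one set of initial prototypes); keep the lowest SSE."""
    return min((lloyd(samples, k, seed) for seed in seeds),
               key=lambda cents: sse(samples, cents))
```

A call such as `best_of_trials(data, 3, range(10))` compares ten initialisations for a fixed k. Comparing SSE across different values of k is less useful, since SSE generally falls as k grows and would favour ever more clusters.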

16
Pattern similarity
  • Similarity is often tied to the geometrical distance between the points that represent samples
  • Many similarity measures are available
    • the choice is application dependent

17
Pattern similarity
  • Minkowski metric
    • popular similarity measure
    • family of distance measurements between two points
      • v: point (represents a feature vector)
      • v_k: k-th coordinate of the point (i.e. a feature)
      • m: dimensionality of the data space (i.e. number of features)
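The formula that these symbols refer to does not appear in this transcript. For reference, the standard form of the Minkowski metric of order p between two points is given below; naming the second point u (with coordinates u_k) is an assumption, since the slide only defines v.

$$
d_p(\mathbf{u}, \mathbf{v}) = \left( \sum_{k=1}^{m} \lvert u_k - v_k \rvert^{p} \right)^{1/p}, \qquad p \ge 1
$$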

18
Pattern similarity
  • Minkowski metric (ctd.)
    • special cases
      • p = 1 gives city-block distance (a.k.a. Manhattan distance or taxi-cab distance)
        • equivalent to Hamming distance if observation vectors are binary
      • p = 2 gives Euclidean distance
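A quick worked example of the two special cases, where d_p denotes the Minkowski distance of order p and u = (0, 0), v = (3, 4) are hypothetical points:

$$
d_1 = \lvert 0 - 3 \rvert + \lvert 0 - 4 \rvert = 7, \qquad
d_2 = \sqrt{(0 - 3)^2 + (0 - 4)^2} = 5
$$

For binary vectors such as u = (1, 0, 1, 1) and v = (0, 0, 1, 0), each term |u_k - v_k| is 1 exactly where the bits differ and 0 elsewhere, so d_1 = 1 + 0 + 0 + 1 = 2, which is the Hamming distance (the number of differing positions).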

19
Pattern similarity
  • Minkowski metric pseudo code

    sum = 0
    for feature from 1 to m
        diff = featVector1[feature] - featVector2[feature]
        pDiff = power( abs(diff), p )
        sum = sum + pDiff
    sum = power( sum, 1 / p )
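The pseudocode maps directly onto a small Python function. This is a sketch: the snake_case names mirror `featVector1`/`featVector2`, and the length check is an added assumption (the pseudocode takes m as given).

```python
def minkowski(feat_vector1, feat_vector2, p):
    """Minkowski distance of order p (p >= 1) between two equal-length feature vectors."""
    if len(feat_vector1) != len(feat_vector2):
        raise ValueError("feature vectors must have the same length")
    total = 0.0
    for a, b in zip(feat_vector1, feat_vector2):
        diff = a - b
        total += abs(diff) ** p  # pDiff accumulated into the running sum
    return total ** (1.0 / p)


print(minkowski((0, 0), (3, 4), 1))  # 7.0 (city-block)
print(minkowski((0, 0), (3, 4), 2))  # 5.0 (Euclidean)
```

Passing p as a parameter keeps one function for the whole family; p = 1 on binary vectors gives the Hamming distance.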

20
Pattern similarity
  • Minkowski metric (ctd.)
    • limitations
      • sensitive to non-uniform scale across features
        • large-scale features may dominate the distance calculation
        • e.g. when features have different measurement units
      • sensitive to correlation between features
        • see limitations of the minimum-distance classifier (in Part II)
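The effect of measurement units can be checked numerically. In the sketch below, the two "people" and their values are hypothetical, and `share_of_first_feature` is an illustrative helper: the same height and weight differences are expressed with weight in grams and then in kilograms, and the share of the squared Euclidean distance contributed by height is printed.

```python
def share_of_first_feature(u, v):
    """Fraction of the squared Euclidean distance contributed by feature 0."""
    sq = [(a - b) ** 2 for a, b in zip(u, v)]
    return sq[0] / sum(sq)


# Hypothetical samples: (height in metres, weight)
a_grams, b_grams = (1.60, 60000.0), (1.90, 60500.0)  # weight in grams
a_kg, b_kg = (1.60, 60.0), (1.90, 60.5)              # weight in kilograms

print(share_of_first_feature(a_grams, b_grams))  # roughly 3.6e-07
print(share_of_first_feature(a_kg, b_kg))        # roughly 0.26
```

With weight in grams, the 30 cm height difference is effectively invisible to the distance; in kilograms it accounts for roughly a quarter of it.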

21
Summary
  • Clustering: unsupervised identification of groups of similar data samples
  • Features: salient pattern attributes
  • k-means clustering algorithm
    • iterative algorithm for finding k clusters
      • each represented by a prototype
    • experimentation is often required as a result of algorithm limitations
  • Geometrical distance often used as similarity measure
    • e.g. Minkowski metric