Statistical Pattern Recognition Part I: Clustering


Provided by: claudecch


1
Image Processing, Computer Vision, and Pattern Recognition
Fac. of Comp., Eng. Tech., Staffordshire University
Statistical Pattern Recognition Part I: Clustering
Dr. Claude C. Chibelushi
2
Outline

Introduction
Features
Clustering
k-means clustering algorithm
Pattern Similarity
Summary

3
Introduction

Examples of pattern recognition tasks (input: recognised output)
  Image scene: scene object (car, road, ...)
  Handwriting: alphabet letter (a, b, c, ...) or writer's name
  Speech: word (one, two, three, ...) or speaker's name
4
Introduction

Pattern recognition: categorisation of patterns into a finite number of classes
Common approaches
  statistical pattern recognition
    connectionist pattern recognition is sometimes considered a subset of statistical pattern recognition
  syntactic pattern recognition

5
Introduction

Pattern recognition
  Statistical: classification based on the statistical distribution of patterns
  Syntactic: classification based on the structural relationship between elements of a pattern
  Connectionist: classification using artificial neural networks (statistical basis?)

6
Introduction

Terminology
  Feature(s): representation or description of salient pattern attribute(s)
  Class: category, group, or type of patterns with common characteristics/properties
  Classification: assignment of a class label to an observed pattern

7
Features

Patterns are often considered as points in a space
  selected pattern attributes or properties (features) are represented by
    numerical measurements
    linguistic labels
  the space is often multidimensional, hence
    the set of numerical property values is called a feature vector
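As a rough illustration of the point above, a pattern can be stored as a fixed set of numerical measurements, which makes it a point in a multidimensional feature space. The feature names (`mean_red`, `edge_strength`, `area`) and the values are hypothetical, chosen only for this sketch.

```python
from typing import List, NamedTuple


class Pattern(NamedTuple):
    """Hypothetical three-feature description of an image object."""
    mean_red: float       # average red intensity, scaled to 0..1
    edge_strength: float  # average edge magnitude, scaled to 0..1
    area: float           # object size in pixels


def to_feature_vector(p: Pattern) -> List[float]:
    """Return the numerical measurements as a point in 3-D feature space."""
    return [p.mean_red, p.edge_strength, p.area]


sample = Pattern(mean_red=0.8, edge_strength=0.3, area=120.0)
vector = to_feature_vector(sample)
dimension = len(vector)  # number of features = dimensionality of the space
```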

8
Features

Example attributes for
  pixel-based image segmentation: pixel colour, edge strength, ...
  object recognition: object shape, colour, size, ...

9
Features

[Figure: example 2D feature plot for two image types (5 samples of each)]

10
Clustering

Clustering: unsupervised learning that identifies groups of similar data samples
  partial supervision (labelling of some samples) is possible
Cluster: group of similar data samples, which are dissimilar from samples in other groups
  for pattern recognition, "sample" is synonymous with "pattern"

11
Clustering

Example applications
  Image processing: image segmentation into homogeneous regions
    data samples: image pixels
    sample details: pixel colour, edge strength, edge direction, ...
  Marketing: market segmentation into groups of customers with similar buying behaviour
    data samples: customers
    sample details: database of purchases, demographics, ...
  Insurance: profile identification of low- or high-risk clients
    data samples: clients
    sample details: database of client claim history, demographics, ...
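For the image-processing case, one possible way to turn pixels into data samples is to flatten the image into a list of colour vectors. This is only a sketch: the 2x2 image and the helper name `pixels_to_samples` are made up for illustration, and colour is the only sample detail used.

```python
# A tiny 2x2 RGB "image"; the pixel values are invented for illustration.
image = [
    [(255, 0, 0), (250, 10, 5)],
    [(0, 0, 255), (5, 10, 250)],
]


def pixels_to_samples(img):
    """Flatten an image (rows of RGB tuples) into a list of colour feature vectors."""
    return [list(pixel) for row in img for pixel in row]


samples = pixels_to_samples(image)  # one data sample per pixel, in row order
```

Other sample details named above, such as edge strength, would simply add further components to each vector.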

12
Clustering

k-means clustering algorithm
  Iterative algorithm for finding k clusters
    the number of clusters is assumed known
  Each cluster is represented by a prototype
    the cluster centroid (mean)
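The cluster prototype can be sketched as the component-wise mean of the feature vectors in the cluster. The function name `centroid` and the sample values are assumptions for this example.

```python
def centroid(samples):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(samples)
    return [sum(values) / n for values in zip(*samples)]


cluster = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
prototype = centroid(cluster)  # mean of each coordinate across the cluster
```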

13
Clustering

k-means clustering algorithm: pseudocode
kMeansClustering(dataSamples, k)
  initialise cluster centroids
  do
    // assign each sample to a cluster
    for each data sample
      assign sample to nearest cluster centroid
    // update each cluster mean
    for each cluster
      update centroid to mean of assigned samples
  until (change in centroids is insignificant)
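The pseudocode can be turned into a short runnable sketch. Several details are assumptions not fixed by the slides: Euclidean distance, initial centroids drawn from the data samples when `init` is not given, a tolerance `tol` for "insignificant" change, an iteration cap `max_iter`, and keeping the old centroid when a cluster receives no samples.

```python
import math
import random


def k_means(samples, k, init=None, tol=1e-6, max_iter=100, seed=0):
    """Cluster equal-length numeric vectors into k groups; returns (centroids, labels)."""
    if init is None:
        # assumption: start from k distinct data samples chosen at random
        init = random.Random(seed).sample(samples, k)
    centroids = [list(c) for c in init]
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # assign each sample to its nearest centroid (Euclidean distance)
        labels = [
            min(range(k), key=lambda j: math.dist(s, centroids[j]))
            for s in samples
        ]
        # update each centroid to the mean of the samples assigned to it
        new_centroids = []
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                new_centroids.append([sum(v) / len(members) for v in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        # stop when no centroid moved by more than tol
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels


data = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]]
# explicit starting centroids make this particular run deterministic
centroids, labels = k_means(data, k=2, init=[data[0], data[3]])
```

With these starting centroids the first three samples end up in one cluster and the last three in the other; a random start may behave differently.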

14
Clustering

k-means clustering algorithm
Limitations
  requires prior knowledge of the number of clusters
  may fail to find the optimal grouping of samples
  sensitive to the choice of
    initial cluster prototypes (e.g. what if a prototype is far from any data sample?)
    distance measure (similarity measure)
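The question about a prototype that is far from any data sample can be explored with a small sketch of the assignment step alone. The data, the prototype positions, and the helper name `assign` are hypothetical; Euclidean distance is assumed.

```python
import math


def assign(samples, prototypes):
    """Index of the nearest prototype (Euclidean distance) for each sample."""
    return [
        min(range(len(prototypes)), key=lambda j: math.dist(s, prototypes[j]))
        for s in samples
    ]


data = [[1.0, 1.0], [1.5, 0.5], [9.0, 9.0], [8.5, 9.5]]
prototypes = [[1.0, 1.0], [9.0, 9.0], [100.0, 100.0]]  # third prototype is far away

labels = assign(data, prototypes)
# number of samples won by each prototype
cluster_sizes = [labels.count(j) for j in range(len(prototypes))]
```

Here the distant prototype attracts no samples, so its cluster is empty and its mean is undefined; an implementation has to decide what to do in that case (for example, keep the old prototype or re-initialise it).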

15
Clustering

k-means clustering algorithm
As a result of these limitations, experimentation is often required
  multiple trials with different
    values of k
    initial prototype values
    distance measures
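One way to organise such experimentation is sketched below: for each candidate k, run k-means several times from different random starts and keep the run with the lowest sum of squared errors (SSE). The use of SSE as the comparison score, the number of trials, and all function names are assumptions of this example, not something the slides prescribe.

```python
import math
import random


def k_means(samples, k, rng, max_iter=100):
    """Basic k-means with random-sample initialisation; returns (centroids, labels)."""
    centroids = [list(s) for s in rng.sample(samples, k)]
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(s, centroids[j])) for s in samples]
        updated = []
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            # an empty cluster keeps its previous centroid
            updated.append([sum(v) / len(members) for v in zip(*members)] if members else centroids[j])
        if updated == centroids:  # no centroid changed: converged
            break
        centroids = updated
    return centroids, labels


def sse(samples, centroids, labels):
    """Sum of squared distances from each sample to its assigned centroid."""
    return sum(math.dist(s, centroids[lab]) ** 2 for s, lab in zip(samples, labels))


def best_of_trials(samples, k, trials=10, seed=0):
    """Run k-means `trials` times and return (sse, centroids, labels) of the best run."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        centroids, labels = k_means(samples, k, rng)
        results.append((sse(samples, centroids, labels), centroids, labels))
    return min(results, key=lambda r: r[0])


data = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]]
errors = {k: best_of_trials(data, k)[0] for k in (1, 2, 3)}
```

SSE generally shrinks as k grows, so it is better suited to comparing trials with the same k than to choosing k outright; a sharp drop followed by a plateau is one informal cue.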

16
Pattern similarity

Similarity is often tied to the geometrical distance between the points that represent samples
Many similarity measures are available
  the choice is application dependent
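As one example of a family of geometrical distances, the Minkowski distance of order p covers the city-block (p = 1) and Euclidean (p = 2) distances. The sketch below is illustrative only; the function name and the test vectors are assumptions.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p >= 1 between equal-length vectors a and b."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


u, v = [0.0, 0.0], [3.0, 4.0]
city_block = minkowski_distance(u, v, p=1)  # |3| + |4|
euclidean = minkowski_distance(u, v, p=2)   # sqrt(3^2 + 4^2)
```

A smaller distance is read as greater similarity; which order p is appropriate depends on the application.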

17
Pattern similarity