Title: Cluster Analysis
1 Cluster Analysis
- Objectives
- Address heterogeneity
- Combine observations into groups (clusters) such that the observations within a group are homogeneous (similar) and the groups are heterogeneous (different) from one another on some variables (?).
- When we don't have such variables, we can still form groups using Multidimensional Scaling (MDS) techniques.
- MDS: continuous space
- Cluster analysis: discrete groups
- Main application in marketing: market segmentation
- Data requirement: generally interval or ratio (ordinal and nominal ??)
- Steps
- Decide on a measure of distance (similarity or dissimilarity)
- Hierarchical clustering: decide on how to combine observations
- Non-hierarchical clustering (K-means or quick cluster)
- Interpretation of clusters
- How many clusters?
- Cluster validation
2 Cluster Analysis: Measures of Distance (Similarity or Dissimilarity)
- Two types of measures of distance (or proximity, similarity)
- Direct: we shall use these in MDS
- Indirect: derived from the original variables or factor scores
- Indirect measures of distance
- Non-metric data: we shall use these in MDS
- Metric data
- Euclidean distance
- Minkowski distance
- Mahalanobis distance
- Distance between BMW and Ford
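- Notation: i = BMW, j = Ford, k = number of variables
[Formulas: Euclidean, Minkowski, and Mahalanobis distance]
[Figure: Euclidean distance (ED) between two points plotted on axes v1 and v2]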
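The three metric distances named on this slide can be sketched in a few lines. This is a minimal illustration that uses the standard textbook definitions, not necessarily the exact notation of the original slide; the scores for `bmw` and `ford` on v1 and v2 and the covariance matrix `cov` are made-up values, and `mahalanobis_2d` is a hypothetical helper restricted to two variables.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance: (sum over k of |x_k - y_k|^p)^(1/p)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def euclidean(x, y):
    """Euclidean distance is the Minkowski distance with p = 2."""
    return minkowski(x, y, 2)

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance for two variables: sqrt(d' S^-1 d),
    where d = x - y and S is the 2x2 covariance matrix of the variables."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    d1, d2 = x[0] - y[0], x[1] - y[1]
    # Quadratic form d' S^-1 d, using the closed-form inverse of a 2x2 matrix.
    q = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return math.sqrt(q)

# Hypothetical scores on v1 and v2 (assumed for illustration only).
bmw = (5.0, 1.0)
ford = (2.0, 5.0)
cov = ((4.0, 0.0), (0.0, 1.0))  # assumed covariance matrix of v1 and v2

d_euclid = euclidean(bmw, ford)
d_city = minkowski(bmw, ford, 1)
d_mahal = mahalanobis_2d(bmw, ford, cov)
print(d_euclid, d_city, d_mahal)
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance; a larger variance on v1 shrinks the contribution of differences on v1.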
3 Cluster Analysis: Hierarchical Clustering
- Methods to combine observations
- Centroid
- Nearest Neighbor or single linkage
- Farthest-neighbor or complete linkage
- Average linkage
- Ward's method
- Centroid method
[Figure: dendrogram of observations s1 to s6, with distance on the axis]
- Should the data be scaled?
- Nearest neighbor
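To make the linkage rules concrete, here is a small sketch of agglomerative clustering with nearest-neighbor (single) and farthest-neighbor (complete) linkage. The six points standing in for s1 to s6 are invented, and `agglomerate` is a hypothetical helper written for this illustration; the sequence of merge distances it returns is the information a dendrogram displays.

```python
import math

def agglomerate(points, linkage="single"):
    """Merge clusters bottom-up; return merges as (members_a, members_b, distance)."""
    # Single linkage uses the closest pair of members, complete linkage the farthest.
    pick = min if linkage == "single" else max
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = pick(math.dist(points[i], points[j])
                         for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical observations s1..s6 (indices 0..5), assumed for illustration.
points = [(1.0, 1.0), (2.0, 1.0), (6.0, 1.0), (7.5, 1.0), (15.0, 1.0), (17.0, 1.0)]

single_heights = [d for _, _, d in agglomerate(points, "single")]
complete_heights = [d for _, _, d in agglomerate(points, "complete")]
print(single_heights)
print(complete_heights)
```

Both rules agree on the first three merges here and differ only in the height at which the larger clusters join. Because the distances are computed on raw values, a variable measured on a larger scale dominates the result, which is why scaling the data is worth considering.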
4 Cluster Analysis: Non-Hierarchical Clustering
- K-means cluster / quick cluster
- The data are divided into k groups, each group representing a cluster
- Steps
- Select k initial cluster centroids, where k is the number of clusters desired
- Assign each observation to the cluster to which it is closest
- Reassign or relocate each observation to one of the k clusters until a predetermined stopping rule is met
Say we want 3 clusters and the first 3 observations are the initial centroids.
Change criterion: continue if the change is > 2.
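The steps above can be sketched as follows. This is a minimal K-means under stated assumptions: the first k observations serve as the initial centroids (as in the slide's example), the stopping rule is a threshold `tol` on the largest centroid movement (a stand-in for the slide's change criterion), and the six data points are invented.

```python
import math

def k_means(points, k, tol=1e-6, max_iter=100):
    """K-means with the first k observations as the initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: each observation joins its closest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        # Stopping rule: continue only while some centroid moves more than tol.
        change = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if change <= tol:
            break
    return labels, centroids

# Hypothetical data; the first three observations act as the starting centroids.
data = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0), (1.2, 0.8), (5.2, 5.1), (8.8, 1.1)]
labels, centroids = k_means(data, k=3)
print(labels)
print(centroids)
```

Unlike hierarchical clustering, an observation can change cluster between iterations, but the result depends on the choice of k and on the starting centroids.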
Which clustering method is best?
1. Hierarchical: which one to use? Advantage: no prior knowledge of the number of clusters is needed. Disadvantage: once an observation is assigned, it is never reassigned.
2. K-means / quick cluster: requires prior knowledge of how many clusters there are.
Complementary: run hierarchical clustering, decide on the number of clusters, then run K-means.
5 Interpretation of Clusters
6 Cluster Analysis: Validation
Cross-validation
S1: assignment based on the cluster solution for cases 1-14. S2: assignment based on a separate cluster solution.
Example from text
Hit rate: 112/151 = 74%
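A hit rate of this kind can be computed by comparing two cluster assignments of the same cases. The sketch below is one possible way to do it, not necessarily the procedure used in the text: because cluster labels are arbitrary, it tries every relabeling of the second solution and keeps the best agreement. The label lists and the `hit_rate` helper are invented for illustration.

```python
from itertools import permutations

def hit_rate(labels_a, labels_b):
    """Share of cases on which two cluster assignments agree, after the
    labels of the second solution are matched to those of the first."""
    names = sorted(set(labels_a) | set(labels_b))
    best = 0
    # Cluster labels are arbitrary, so try every relabeling of solution B.
    for perm in permutations(names):
        mapping = dict(zip(names, perm))
        hits = sum(a == mapping[b] for a, b in zip(labels_a, labels_b))
        best = max(best, hits)
    return best / len(labels_a)

# Hypothetical assignments of eight cases from two separate cluster runs.
s1 = [1, 1, 1, 2, 2, 2, 3, 3]
s2 = [2, 2, 2, 1, 1, 3, 3, 3]
rate = hit_rate(s1, s2)
print(rate)

# The figure reported on the slide: 112 matching cases out of 151.
slide_rate = 112 / 151
print(round(100 * slide_rate))
```

Trying all permutations is only practical for a small number of clusters, which is the usual situation in segmentation studies.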
7 Latent Segments Model to Incorporate Heterogeneity
8 Introduction
- Customer segmentation: partition consumers into homogeneous groups that differ in purchasing behavior
- It provides information about consumer preferences and market structure at the segment level
- Consumers with similar socio-demographics have different purchasing behavior
- Brand choice probabilities can be used to define both market segments and market structure
- Theoretical model: multinomial logit
- Conceptual appeal: grounded in economic theory
- Analytical tractability and ease of econometric estimation
- Excellent empirical performance
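For reference, the multinomial logit turns brand utilities into choice probabilities through P(j) = exp(u_j) / sum_k exp(u_k). The sketch below illustrates that formula only; the utility specification (an intercept plus a price coefficient times price), the brand intercepts, the prices, and the coefficient value are all assumptions made for the example.

```python
import math

def logit_probabilities(utilities):
    """Multinomial logit: P(j) = exp(u_j) / sum over k of exp(u_k)."""
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-brand market: utility = intercept + price_coef * price.
intercepts = [1.0, 0.5, 0.0]
prices = [3.0, 2.5, 2.0]
price_coef = -0.8  # assumed price sensitivity

utilities = [a + price_coef * p for a, p in zip(intercepts, prices)]
probs = logit_probabilities(utilities)
print(probs)
```

Only utility differences matter: adding the same constant to every brand's utility leaves the probabilities unchanged.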
9
- Kamakura and Russell (1989) propose and test latent segmentation.
- Number of applications and numerous citations (200).
- Discrete interpretation of a continuous distribution.
- Number of useful applications in marketing and other areas.
- In our own work, used to determine the size of the price-sensitive segment (25 to 35).
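The idea of latent segments can be illustrated with a generic finite mixture of logit models: each segment has its own utilities, and the market-level choice probability is the size-weighted average of the segment-level probabilities. This is a sketch in the spirit of latent segmentation, not Kamakura and Russell's exact specification; the segment sizes, utilities, and the `aggregate_shares` helper are hypothetical.

```python
import math

def logit_probabilities(utilities):
    """Multinomial logit probabilities for one segment."""
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def aggregate_shares(segment_sizes, segment_utilities):
    """Market-level choice probabilities: sum over segments of
    (segment size) x (segment-level logit probability)."""
    n_brands = len(segment_utilities[0])
    shares = [0.0] * n_brands
    for size, utilities in zip(segment_sizes, segment_utilities):
        for j, p in enumerate(logit_probabilities(utilities)):
            shares[j] += size * p
    return shares

# Hypothetical two-brand market with two latent segments.
sizes = [0.3, 0.7]              # assumed segment sizes, summing to 1
utilities = [
    [0.0, 0.0],                 # segment 1: indifferent between the brands
    [math.log(3.0), 0.0],       # segment 2: prefers brand 1 (odds 3 to 1)
]
shares = aggregate_shares(sizes, utilities)
print(shares)
```

In an estimated model the segment sizes and utilities are parameters recovered from purchase data, and membership is latent: each consumer belongs to a segment only with some probability.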