Title: Segmentation and Profiling using SPSS for Windows
1 Segmentation and Profiling using SPSS for Windows
2 Why Segmentation?
- Used by, e.g., retail and consumer product companies
- Trying to learn about and describe their customers' buying habits, gender, age, income level, etc.
- These companies tailor their marketing and product development strategies to each consumer group to increase sales and build brand loyalty.
- A valuable approach in market research, and SPSS offers some useful tools to facilitate this commercial process
3 Segmentation in SPSS
- Most of the techniques for segmentation and profiling are exploratory
- There is no right or wrong answer, and the results are open to interpretation
- Trying to make sense of the data or find patterns
- Iterative techniques
- If it does not make business sense then it is not a good model!
4 Segmentation in SPSS
- Techniques include
- Factor Analysis / Principal Components Analysis
- Hierarchical Clustering
- K-Means Cluster
- Non-Linear Principal Components Analysis (PRINCALS/CATPCA)
- The new Two-Step Cluster
5 Which Technique to Use?
Cluster Analysis
Categories
Factor Analysis
Exploratory
Confirmatory
Discriminant Analysis
AnswerTree
6 Which Test to use?
- Factor Analysis - to find patterns within variables
- Categories - use if the data doesn't fit the assumptions for Factor Analysis
- Cluster Analysis - to find patterns between individuals
- Two-Step Cluster - to use with both categorical and continuous variables
- Discriminant Analysis - to look for differences between groups and try to predict a target variable
- AnswerTree - combinations of data, to predict a target
7 Multivariate Analysis
- These techniques are inter-related, but you don't have to use all of them
- Can use a combination of these techniques to segment the data
8 Main Considerations
- Looking for patterns or trying to make predictions?
- Levels of measurement of the data (categorical or continuous)
- Sample size
- Missing values
- Does the data fulfil the assumptions for the test?
9 Before you start... Check your data!
10 Handling Missing Data
- Check before analysis for any patterns within missing data
- Check before analysis that missing values are defined as missing, otherwise they may compromise the model
- Be aware that most segmentation techniques ignore any cases with missing values, so you may have less usable data than you think!
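
The effect of dropping cases with missing values can be checked before any modelling. The following is a minimal sketch in Python (not SPSS syntax), assuming missing values are stored as `None`; the variable names `q1`, `q2`, `age` and all values are made up for illustration.

```python
# Hypothetical survey rows; None marks a value defined as missing.
rows = [
    {"q1": 4, "q2": 5, "age": 34},
    {"q1": None, "q2": 3, "age": 51},
    {"q1": 2, "q2": None, "age": None},
    {"q1": 5, "q2": 4, "age": 27},
]

def missing_counts(rows):
    """Count missing values per variable, to look for patterns."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (value is None)
    return counts

def complete_cases(rows):
    """Keep only rows with no missing value (listwise deletion)."""
    return [row for row in rows if all(v is not None for v in row.values())]

print(missing_counts(rows))
print(len(complete_cases(rows)), "of", len(rows), "cases usable")
```

Here only two of the four hypothetical cases would survive into a technique that ignores incomplete cases.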
11 Variable and Value Labels
- It is worth checking the labels on your file
- SPSS may truncate long variable and value labels in the output, making it difficult to interpret the output
- Make sure all the useful information is at the beginning of the variable and value labels, so even if they are truncated, the output is still easy to read
12 Data Coding
- Check the direction of the coding scheme, and consider re-coding the data if the codes are counter-intuitive
- e.g. if you have a rating scale that ranges from high to low, rather than low to high
- ... it can be difficult to interpret output, factor scores etc. once the data has been through several transformations
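
Reversing the direction of a rating scale is a one-line transformation: the new code is the lowest code plus the highest code minus the old code. A small Python sketch, assuming a 1-5 scale by default (the function name `reverse_code` and the sample ratings are illustrative only):

```python
def reverse_code(value, low=1, high=5):
    """Flip a rating so that the scale runs in the opposite direction."""
    if value is None:                 # leave missing values untouched
        return None
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the {low}-{high} scale")
    return low + high - value

ratings = [1, 2, 5, None, 3]          # made-up ratings
print([reverse_code(r) for r in ratings])  # [5, 4, 1, None, 3]
```

Doing this once, before any factor or cluster analysis, keeps "high" meaning the same thing in every later table.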
13 Sample Data
- Data: usage of underarm deodorants for men
- Three brands tested
- Rambo - the current market leader
- Brad - second most popular
- Clint - recently launched product
14 Profiling the Customers
- Clint isn't selling as well as was hoped, so the research aims to find out:
- Who is buying Clint?
- What sort of characteristics do they share?
- Who is buying the other deodorants tested?
- How might the marketing campaign be changed to ensure that the correct market is targeted?
15 Data Collected
- Ratings of a range of lifestyle attribute questions, e.g. "I tend to own the most up-to-date products", "My family is the most important thing in my life", "I prefer to dress and entertain casually" etc. (34 of these)
- Demographics: age, type of work, exercise etc.
- Brand of deodorant (D/O) usually used
- How respondents see themselves in relation to others, e.g. "What makes you distinctive from your friends?"
16 Segmentation: the steps
- Run Principal Components Analysis on the attribute rating questions, to see if there are any underlying dimensions in the variables
- Check using Discriminant Analysis to see if these dimensions help predict brand used
- Run Cluster Analysis to see if similarities can be found between cases
- Decide if other variables need to be included, e.g. categorical demographics
- Run Two-Step Cluster using all variables
17 Factor Analysis
18 Factor Analysis: what is it?
- Looks for relationships between continuous variables (based on correlations), in this case attribute rating questions
- Derives underlying constructs or dimensions in the data
- Tries to reduce a large number of variables to a small number of factors which explain most of the variance in the data
- If you can't interpret the resulting solution then it is no good!
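
To make "based on correlations" and "explain most of the variance" concrete: the principal components of standardized variables are the eigenvectors of their correlation matrix, and each eigenvalue divided by the number of variables is the share of variance that component explains. The sketch below is plain Python, not what SPSS runs internally; it uses power iteration as a simple stand-in for a full eigen-decomposition, and the three rating variables `q1`, `q2`, `q3` are made-up toy data.

```python
import math

def correlation_matrix(columns):
    """Pearson correlations between equal-length lists of numbers."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centred = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centred]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centred, norms)]
            for ci, ni in zip(centred, norms)]

def first_component(matrix, iterations=200):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    p = len(matrix)
    vector = [1.0 / math.sqrt(p)] * p
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(p))
                   for i in range(p)]
        eigenvalue = math.sqrt(sum(x * x for x in product))
        vector = [x / eigenvalue for x in product]
    return eigenvalue, vector

# Toy ratings: q1 and q2 move together, q3 is nearly unrelated to both.
q1 = [1, 2, 3, 4, 5, 6]
q2 = [2, 1, 4, 3, 6, 5]
q3 = [4, 3, 1, 6, 5, 2]

corr = correlation_matrix([q1, q2, q3])
eigenvalue, vector = first_component(corr)
share = eigenvalue / len(corr)        # proportion of variance explained
print(round(share, 2))                # 0.61
```

In this toy case the first component loads on `q1` and `q2` and almost not at all on `q3`, which is the kind of pattern that would then need a sensible interpretation.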
19 Run Principal Components Analysis on 34 rated attributes
20 Factor Analysis Results
- The best solution produced 9 factors, interpreted below
- F1: High computer use
- F2: Rules, need to conform
- F3: Party animal
- F4: Family man
- F5: Likes new products, experiments
- F6: Likes pampering, pays more for trusted brands
- F7: Cautious, follower rather than leader for new products
- F8: Relaxed, casual
- F9: Home loving
21 Do these factors help?
- Run Discriminant Analysis to see if the factors can predict D/O used
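
One common way to judge whether the factors predict the brand is to compare actual and predicted group membership and look at the hit rate for each brand separately. The Python sketch below only evaluates such a comparison; it does not perform the discriminant analysis itself, and the labels are made up for illustration, not taken from the study.

```python
from collections import defaultdict

def hit_rates(actual, predicted):
    """Share of cases classified correctly, per actual group."""
    totals, hits = defaultdict(int), defaultdict(int)
    for a, p in zip(actual, predicted):
        totals[a] += 1
        hits[a] += (a == p)
    return {group: hits[group] / totals[group] for group in totals}

# Made-up labels purely for illustration.
actual    = ["Rambo", "Rambo", "Rambo", "Rambo", "Brad", "Brad", "Clint", "Clint"]
predicted = ["Rambo", "Rambo", "Rambo", "Brad",  "Brad", "Clint", "Brad", "Clint"]

print(hit_rates(actual, predicted))
```

A per-brand view matters because an acceptable overall hit rate can hide groups that the model confuses with one another.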
22 Factor Analysis Results
- The factors are good at predicting Rambo usage, but not at differentiating between Brad and Clint
- So try instead investigating relationships between cases using Cluster Analysis
- Options for clustering are
- Hierarchical Cluster
- K-Means Cluster
- Two-Step Cluster
23 Hierarchical Cluster
- This is often thought of as the "proper" cluster method
- Looking for natural groupings within the data
- Bases groupings upon the similarity or dissimilarity between cases, rather than variables
- Very iterative technique: time consuming!
24 Clustering Data - Diagram
[Diagram: each data point represents one case]
25 Decisions before Cluster
- Which variables to use?
- Which distance measures between cases to use?
- Which criteria for creating clusters to choose?
- NB:
- The quality of the analysis will always depend upon the variables used
- Cluster Analysis will always find a solution!
- It is not possible to assess in the analysis itself how appropriate a variable is
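
The two decisions about distance measure and clustering criterion can be seen in a bare-bones agglomerative procedure. The sketch below is plain Python under stated assumptions: Euclidean distance, single linkage (distance between the closest members of two clusters), and a handful of made-up points. SPSS offers other measures and criteria, and nothing here is its implementation.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two cases."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points, n_clusters, distance=euclidean):
    """Merge the two closest clusters (single linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]   # start: one case per cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Made-up cases measured on two variables.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
print(agglomerate(points, 3))  # [[0, 1], [2, 3], [4]]
print(agglomerate(points, 2))  # [[0, 1], [2, 3, 4]]
```

Note that the function returns exactly the number of clusters requested whatever the data look like, which is the sense in which cluster analysis always finds a solution.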
26 Stages of Hierarchical Cluster
- Select variables for analysis (carefully!)
- Build and assess model
- Save cluster membership
- If required, create cluster matrix for K-Means
- NB:
- Because clustering is based on distances between cases, make sure the data is measured on the same scale; if not, the data should be standardized
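
Why scale matters for case-based distances can be shown with two variables on very different scales. A short Python sketch, assuming z-score standardization with the sample standard deviation; the `income` and `rating` values are hypothetical.

```python
import math
import statistics

def standardize(values):
    """Convert values to z-scores (mean 0, sample standard deviation 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def distance(a, b):
    """Euclidean distance between two cases."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

income = [18000, 25000, 32000, 60000]   # hypothetical, in currency units
rating = [1, 3, 4, 5]                   # hypothetical 1-5 scale

raw = list(zip(income, rating))
scaled = list(zip(standardize(income), standardize(rating)))

# Raw distance is driven almost entirely by income.
print(distance(raw[0], raw[1]))
# After standardizing, both variables contribute on a comparable footing.
print(distance(scaled[0], scaled[1]))
```

Saved factor scores are already on a common scale, but raw demographics such as income generally are not.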
27 Run Hierarchical Cluster Analysis on Saved Factor Variables
28 Decision with D/O Data
- I can't get a very good (i.e. useful to the business) model from Hierarchical Cluster analysis
- Also, I want to be able to include both categorical and continuous variables in the same model
- So I decide to use Two-Step Cluster instead
29 Two-Step Cluster
30 Two-Step Cluster
- The TwoStep Cluster Analysis procedure is an exploratory tool designed to reveal natural groupings (or clusters) within a data set that would otherwise not be apparent.
- The algorithm employed by this procedure has several features that differentiate it from traditional clustering techniques:
- The ability to create clusters based on both categorical and continuous variables.
- Automatic selection of the number of clusters.
- The ability to analyze large data files efficiently.
31 TwoStep Cluster
- Uses a scalable cluster analysis algorithm
- This algorithm can handle both continuous and categorical variables or attributes and requires only one data pass in the procedure
- The first step of the procedure pre-clusters the records into many small sub-clusters
- Then it clusters the sub-clusters created in the pre-cluster step into the desired number of clusters
- If the desired number of clusters is unknown, TwoStep Cluster analysis automatically finds the proper number of clusters
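
The two-stage idea (one pass to form many small sub-clusters, then clustering of those sub-clusters) can be illustrated with a deliberately simplified Python sketch. This is not the SPSS TwoStep algorithm: it assumes continuous variables only, plain Euclidean distance, a hypothetical `threshold` parameter for the pre-cluster step, and a number of clusters supplied by the user rather than chosen automatically. The records are made up.

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def merge(c1, c2):
    """Combine two sub-clusters, each stored as (count, centroid)."""
    n1, m1 = c1
    n2, m2 = c2
    n = n1 + n2
    return n, tuple((n1 * x + n2 * y) / n for x, y in zip(m1, m2))

def pre_cluster(records, threshold):
    """Step 1: a single pass; a record joins the nearest sub-cluster if close enough."""
    subs = []
    for rec in records:
        if subs:
            k = min(range(len(subs)), key=lambda i: distance(subs[i][1], rec))
            if distance(subs[k][1], rec) <= threshold:
                subs[k] = merge(subs[k], (1, tuple(rec)))
                continue
        subs.append((1, tuple(rec)))   # otherwise start a new sub-cluster
    return subs

def cluster_subs(subs, n_clusters):
    """Step 2: repeatedly merge the two sub-clusters with the closest centroids."""
    subs = list(subs)
    while len(subs) > n_clusters:
        pairs = ((i, j) for i in range(len(subs)) for j in range(i + 1, len(subs)))
        i, j = min(pairs, key=lambda ij: distance(subs[ij[0]][1], subs[ij[1]][1]))
        merged = merge(subs[i], subs[j])
        subs = [s for k, s in enumerate(subs) if k not in (i, j)] + [merged]
    return subs

records = [(0, 0), (0.5, 0), (1, 0), (10, 10), (10.5, 10), (20, 0)]  # made up
subs = pre_cluster(records, threshold=0.6)
final = cluster_subs(subs, n_clusters=3)
print(len(subs), "sub-clusters ->", len(final), "clusters")
print(sorted(count for count, _ in final))  # [1, 2, 3]
```

Because step 2 works on sub-cluster summaries (a count and a centroid) rather than on every record, the expensive pairwise comparisons are made on far fewer objects, which is the intuition behind handling large files efficiently.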
32 Two-Step Cluster
- This is unlike other clustering methods in SPSS: if the desired number of clusters is unknown, TwoStep Cluster analysis automatically finds the proper number of clusters
- Or you can pre-specify the number of clusters required, which gives flexibility
33 Run Two-Step Cluster Analysis on Saved Factor Variables and Categorical Variables
36 Link to more information
- More useful information about Two-Step Cluster can be found at the following websites
- http://www.rrz.uni-hamburg.de/RRZ/Software/SPSS/Algorith.120/twostep_cluster.pdf
- NB: This was the handout for the talk, with algorithm etc.
- Also useful
- http://www.spss.com/pdfs/S115AD8-1202A.pdf
- http://www.norusis.com/pdf/SPC_v13.pdf
37 Some of the output produced by the Two-Step Cluster Analysis is reproduced in the next few slides
38 Brand usually used by Cluster
- Clint spray seems to be associated with Cluster 6, with the roll-on version being associated with Clusters 4 and 2
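
Statements of this kind come from cross-tabulating cluster membership against a categorical variable and reading the percentages within each cluster. A minimal Python sketch of that calculation follows; the (cluster, brand) pairs are made up for illustration and are not the study's data.

```python
from collections import Counter

def row_percentages(pairs):
    """Percentage of each cluster's cases that fall in each category."""
    cell = Counter(pairs)
    total = Counter(cluster for cluster, _ in pairs)
    return {(c, b): 100 * n / total[c] for (c, b), n in cell.items()}

# Made-up (cluster, brand) pairs for illustration only.
pairs = [(6, "Clint spray"), (6, "Clint spray"), (6, "Rambo"), (6, "Brad"),
         (2, "Clint roll-on"), (2, "Rambo")]

table = row_percentages(pairs)
print(table[(6, "Clint spray")])  # 50.0
```

Percentaging within clusters, rather than comparing raw counts, keeps large and small clusters comparable.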
39 Employment Status by Cluster
- Cluster 2 (Clint roll-on) is largely made up of part-time, retired and not working respondents. Cluster 4 also has a high number of retired respondents, while Cluster 6 (Clint spray) also has a high percentage of part-time and unemployed respondents.
40 Age Group by Cluster
- Cluster 2 (Clint roll-on) is largely made up of the younger and older age groups. Cluster 4 also has a high percentage of older respondents. Cluster 6 is drawn more from those aged 25 years upwards.
41 Cluster 4 (Clint roll-on) has below average computer use and need to conform, and is above average on Home Loving and Family Man
42 Cluster 6 (Clint spray) has above average scores on Relaxed, Casual but not much else: this is Mr Laid Back!
43 Summary of Findings
- Profiling of this data suggests that Clint is not targeting the expected market
- Clint is often not seen as sufficiently different from Brad; it has no perceived USP
- Clint is being used by a high percentage of older, retired, and part-time or not employed consumers, which may be a result of the aggressive product launch campaign with free samples, discounted prices etc.
- Clint marketing needs some more work!
44 Summary of Segmenting and Profiling this data using SPSS
- Principal Components Analysis helped investigate relationships between the rated attribute variables
- Hierarchical Cluster was used to try to find similarities between cases, using the factors derived from PCA
- Two-Step Cluster was then used to enable clustering of both continuous and categorical variables in the same model
- Useful conclusions were drawn about the market positioning of Clint deodorant