Title: Marketing Research
Slide 1: Marketing 756, Lectures 22 and 24
The Use of Factor Analysis in Marketing Research
Slide 2: Reminders
- HW 5 due today
- HW 6 handed out next Thursday
- Final exam schedule
- Take-home (honor code)
- April 21st 12:00pm to May 2nd 12:00pm
- 3 hours in a contiguous period
- Open book, open notes
- Individual-level
- Computer output and tables contained within the exam
Slide 3: Class Outline
- Uses of Factor Analysis
- A motivating example
- Factor analysis
- Basic idea
- Interpreting output
- Example: Choice of Business School
- Next Lecture
- A worked-out S-Plus example from start to finish
Slide 4: General Idea of Factor Analysis
- You have a set of measured variables; call them X1,...,XJ.
- You believe that these measured variables X1,...,XJ may actually be indicators of higher-level beliefs.
- You want to figure out what these higher-level beliefs are, how many of them there are in your J items, and how each of these J items maps onto these higher-level beliefs.
- However, when you construct these higher-level beliefs, you want to retain as much information as possible from the original J items.
J items ----> K ≤ J higher-level factors
Slide 5: Uses of Factor Analysis
- (1) Develop super-variables, general constructs, or higher-level factors.
Imagine saying to your manager, "Our customers say we are doing well on Q4, poorly on Q7, and about average on Q9."
What exactly are you supposed to do with this information?
Contrast this with saying to your manager, "We are doing well on perceived product quality, poorly on delivery, and about average on price/value."
These are constructs!
Slide 6: What is a construct?
[Diagram: Constructs 1, 2, ..., K, each linked to a subset of Items 1, 2, ..., J]
A construct comprises a subset of items that measure the same higher-order value/belief.
What we will talk about is how many constructs (K) there are, what they are, and which items belong to which constructs.
Slide 7: Uses of Factor Analysis
- (2) Reduce redundancies in survey items
A. Imagine you have multiple survey items whose responses are so highly correlated that they are essentially redundant.
B. Imagine that you have survey items that do not load highly on any of the identified constructs.
Target these items for removal, as they provide little additional information and little managerial impact; removing them can lead to a shorter survey or one with better items.
However, Factor Analysis does not have a Y, therefore ...
Slide 8: Uses of Factor Analysis
- (3) Create uncorrelated measures useful for clustering, regression, etc.
The resulting constructs that come out of factor analysis are uncorrelated => cor(Construct_i, Construct_j) = 0 for i ≠ j.
Multicollinearity is no longer a problem.
Get clean estimates of the effect of each construct on the dependent measure of interest.
Slide 9: So, why is Factor Analysis so common?
- Provides managers with higher-level factors that are more managerially actionable and describable.
- Reduces the number of items from J items to K ≤ J factors.
- Factors are uncorrelated, which provides clean interpretation when using the results in future analyses.
- Provides managerial understanding as to the key issues.
Slide 10: Motivating Example 1
- A group of 12 students rated, on a -5 to 5 scale, the importance of good faculty and program reputation in their choice of B-school.
- The figure below provides the scatter plot.
[Figure: scatter plot of the two ratings, titled "1 or 2 variables?", r = 0.911]
- Questions that factor analysis answers:
- (1) Are there really two dimensions operating, or are both variables really measuring the same thing? Heuristically: yes, they measure the same thing.
- (2) Can the number of variables be reduced from 2 to 1 without sacrificing information? Heuristically: yes.
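The "heuristically yes" can be made concrete with a little arithmetic. The sketch below assumes both ratings are standardized, so that the analysis works on the 2x2 correlation matrix; the function name is illustrative and not part of the lecture.

```python
def first_factor_share(r):
    """Share of total variance carried by the first principal factor
    of two standardized variables with correlation r."""
    # The correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 - |r|.
    largest_eigenvalue = 1.0 + abs(r)
    total_variance = 2.0  # two standardized variables, each with variance 1
    return largest_eigenvalue / total_variance

share = first_factor_share(0.911)  # r taken from the scatter plot on the slide
print(f"{share:.2%}")
```

Under that assumption a single factor keeps about 95.5% of the variance in the two ratings, which is why dropping from two variables to one loses very little.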
Slide 11: Motivating Example 2
A study was conducted by a bank to determine whether or not special marketing programs should be developed for several key segments. One of the study's research questions concerned attitudes toward banking. The respondents were asked to express their opinions on a 0-9, agree-disagree scale on the following five items:
1. Small banks charge less than large banks (X1)
2. Large banks are more likely to make mistakes than small banks (X2)
3. Tellers do not need to be extremely courteous and friendly; it's enough for them simply to be civil (X3)
4. I want to be known personally at my bank and be treated with special courtesy (X4)
5. If a financial institution treated me in an impersonal or uncaring way, I would never patronize that organization again (X5)
Slide 12: Data from 15 respondents
On course website as factor analysis data
Slide 13: If you were really good at staring at correlation matrices
- Click on Statistics -> Data Summaries -> Correlation and highlight V1-V5
            V1         V2         V3          V4         V5
V1  1.00000000  0.6098012  0.4686953 -0.01794587 -0.0964166
V2  0.60980123  1.0000000  0.2304821  0.18968584  0.3186308
V3  0.46869526  0.2304821  1.0000000 -0.83182655 -0.7739358
V4 -0.01794587  0.1896858 -0.8318265  1.00000000  0.9273184
V5 -0.09641660  0.3186308 -0.7739358  0.92731841  1.0000000
Factor 1 is made up of V3, V4, and V5.
Factor 2 is made up of V1 and V2.
That is, FA is looking for blocks of items where there is high correlation within blocks and low correlation across blocks. If there are lots of items, and/or the items that are correlated are not contiguous, it is IMPOSSIBLE to do this by eye.
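The "blocks of items" idea can be illustrated by mechanically linking any two items whose absolute correlation clears a cutoff. This is only a sketch of the eyeballing heuristic, not factor analysis itself; the 0.6 cutoff and the function name are assumptions chosen for illustration, and the matrix is the one above rounded to four decimals.

```python
# Correlation matrix for V1..V5, rounded from the slide.
R = [
    [1.0000, 0.6098, 0.4687, -0.0179, -0.0964],
    [0.6098, 1.0000, 0.2305, 0.1897, 0.3186],
    [0.4687, 0.2305, 1.0000, -0.8318, -0.7739],
    [-0.0179, 0.1897, -0.8318, 1.0000, 0.9273],
    [-0.0964, 0.3186, -0.7739, 0.9273, 1.0000],
]
names = ["V1", "V2", "V3", "V4", "V5"]

def correlation_blocks(R, names, cutoff=0.6):
    """Group variables that are linked by |r| >= cutoff (connected components)."""
    n = len(names)
    block_of = list(range(n))  # every variable starts in its own block
    for i in range(n):
        for j in range(i + 1, n):
            if abs(R[i][j]) >= cutoff:
                old, new = block_of[j], block_of[i]
                # Merge the two blocks by relabeling every member of the old one.
                block_of = [new if b == old else b for b in block_of]
    blocks = {}
    for idx, b in enumerate(block_of):
        blocks.setdefault(b, []).append(names[idx])
    return sorted(blocks.values())

blocks = correlation_blocks(R, names)
print(blocks)
```

With five items this reproduces the two groups read off the matrix by eye. The grouping is sensitive to the cutoff (V1 and V2 clear 0.6 only narrowly), which is one reason to let factor analysis do the work.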
Slide 14: Marketing Research Questions
- Is it possible to represent each of these variables as a linear combination of a smaller set of factors? => factor analysis
- In reducing the number of variables, factor analysis attempts to retain as much information as possible and to make the new variables (i.e., derived factors) meaningful and easy to work with.
- Is it then possible to (a) group the individuals based on similarity of responses to questions X1 through X5 (=> cluster analysis), or (b) use the resulting factor scores to predict bank volumes (regression on factor scores)?
- Next lecture we pull it all together.
Slide 15: Factor Analysis: Basic Idea
- Factor analysis provides an approach to the above questions.
- It will generate a set of new dimensions, as opposed to X1,...,X5, the old a priori dimensions. The first of these is called F1 (the first principal factor), and it retains as much as possible of the inter-point distance information, or variance, that was contained in the original dimensions.
- Find F1 that maximizes var(F11,...,FI1) = var(F1)
- An important statistic is the percentage of original variance that is included in the factor, i.e. var(F1)/var(X1,...,X5)
Slide 16: Specifically, how does Factor Analysis work?
- The objective of the factor analysis is to represent each of the original variables (X) as a linear combination of a smaller set of factors (F)
- That is, in the b-school example
- In the banking example, we have (in the case of two factors)
- The l_nn are called the factor loadings
Some people have called FA "regression where you don't know the Xs."
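A sketch of the model behind these bullets, assuming the standard common-factor form with one factor for the two b-school ratings and two factors for the five bank items. The error terms e_j and the subscript convention (item first, factor second) are assumptions, chosen to match the "X = L F + error" form used later for the uniqueness scores.

```latex
% b-school example: two measured variables, one factor
X_1 = \ell_{11} F_1 + e_1, \qquad X_2 = \ell_{21} F_1 + e_2

% banking example: five measured variables, two factors
X_j = \ell_{j1} F_1 + \ell_{j2} F_2 + e_j, \qquad j = 1, \dots, 5
```

Each loading \ell_{jk} plays the role of a regression coefficient of item j on factor k, which is the sense in which the factors are the unknown "Xs" of the regression.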
Slide 17: Bank Example revisited
[Diagram: Construct 1 and Construct 2 linked by arrows to Items 1-5]
There are J = 5 original items.
We are going to try to represent them by K = 2 general factors.
The arrows represent the factor loadings.
Factor analysis will tell us which items are linked to which constructs and how many constructs we need, and it gives us the ability to name these constructs.
Slide 18: Applying factor analysis to the bank example
- Goal: To reduce the number of variables by eliminating redundancy so that the remaining variables (i.e., derived factors) can be managed and worked on easily.
- Input: A set of five variable values for each individual in the sample.
- Output: (1) The factor loadings and variance-explained percentages, (2) a set of independent factors, and (3) a set of factor scores: the new Xs.
Slide 19: What are the steps in Interpreting Output?
- STEP 1: Determine the number of factors K
- Criterion 1: Accept a factor analysis solution where variance explained > Z (e.g. Z = 50% or above).
- Criterion 2: Keep all factors whose variance explained > 1/J.
- Such a factor is doing more than average lifting.
- Keep all factors with eigenvalues > 1.
- Criterion 3: A scree plot.
- Plot on the y-axis the variance explained or the eigenvalues, and on the x-axis the number of factors. Look for a kink in the plot.
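Criteria 1 and 2 are simple enough to sketch in a few lines. The eigenvalues below are hypothetical (they are not the bank data), and the function name is illustrative. For standardized items the eigenvalues sum to J, so "share > 1/J" and "eigenvalue > 1" pick the same factors.

```python
def factors_to_keep(eigenvalues, z=0.5):
    """Return (K by Criterion 1, K by Criterion 2) for a list of eigenvalues
    sorted from largest to smallest."""
    j = len(eigenvalues)       # number of original items
    total = sum(eigenvalues)   # equals J when the items are standardized
    shares = [ev / total for ev in eigenvalues]

    # Criterion 1: smallest K whose cumulative variance share exceeds z.
    cumulative, k_criterion1 = 0.0, None
    for k, share in enumerate(shares, start=1):
        cumulative += share
        if cumulative > z:
            k_criterion1 = k
            break

    # Criterion 2: count the factors that explain more than an average item (1/J).
    k_criterion2 = sum(1 for share in shares if share > 1.0 / j)
    return k_criterion1, k_criterion2

# Hypothetical eigenvalues for five standardized items (illustrative only).
eigs = [2.6, 1.6, 0.5, 0.2, 0.1]
print(factors_to_keep(eigs, z=0.8))
```

The two criteria need not agree: with the default Z of 50%, Criterion 1 would stop at one factor for these numbers while Criterion 2 keeps two, which is why the scree plot is a useful third check.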
Slide 20: Assigning Xs to factors
- STEP 2: Assign original attributes (X) to factors (F)
(1) Look at the factor loading matrix L.
(2) Determine a critical value L* (e.g. 0.4).
(3) Say all Xs with |L| ≥ L* load on factor F.
- (4) 0/1 code the factor loadings:
- |L| ≥ L* -> L = 1 if positive, L = -1 if negative
- |L| < L* -> L = 0
- Done for managerial interpretation, as it leads to factors being SUMS of the original Xs.
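The coding rule in (4) and the resulting summed scores can be sketched as follows. The loadings and the respondent's answers are hypothetical, made up to resemble a five-item, two-factor solution; the function names are illustrative.

```python
def code_loadings(loadings, cutoff=0.4):
    """Replace each loading by +1, -1, or 0 according to the cutoff rule."""
    coded = []
    for row in loadings:
        coded.append([(1 if value > 0 else -1) if abs(value) >= cutoff else 0
                      for value in row])
    return coded

def summed_scores(answers, coded):
    """Score one respondent: each factor is a signed sum of the items assigned to it."""
    n_factors = len(coded[0])
    return [sum(coded[j][k] * answers[j] for j in range(len(answers)))
            for k in range(n_factors)]

# Hypothetical loadings: rows are items X1..X5, columns are factors F1 and F2.
loadings = [
    [0.05, 0.85],
    [0.20, 0.70],
    [-0.90, 0.30],
    [0.95, 0.10],
    [0.92, 0.15],
]
coded = code_loadings(loadings)
respondent = [7, 6, 2, 8, 9]  # hypothetical answers to X1..X5
scores = summed_scores(respondent, coded)
print(coded, scores)
```

Here F1 becomes X4 + X5 - X3 and F2 becomes X1 + X2, which is easier to explain to a manager than a weighted combination with five decimal-place loadings.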
Slide 21: STEP 3: Name the factors
See which questions load on which factors based on Step 2. Read the questions. Name the factors.
1. Small banks charge less than large banks (X1)
2. Large banks are more likely to make mistakes than small banks (X2)
3. Tellers do not need to be extremely courteous and friendly; it's enough for them simply to be civil (X3)
4. I want to be known personally at my bank and be treated with special courtesy (X4)
5. If a financial institution treated me in an impersonal or uncaring way, I would never patronize that organization again (X5)
X1, X2: Price/value
X3, X4, X5: Service
Slide 22: STEP 4: Determine the quality of the fit, overall and for each X
The % of variance explained by your K-factor solution gives you an indication of the quality of the overall fit. It is akin to an R² measure from regression.
For each variable X, you get a reported uniqueness score. That is, X = L F + error. The error captures how unique each X is, i.e. the part not explainable by the general factor solution. A large uniqueness (say > 0.4) indicates a poor fit.
Slide 23: STEP 5: Recoding the data
- Compute factor scores for each individual on each factor.
Once you run a factor analysis, if it adequately fits, you no longer use the original Xs (Xi1,...,Xi5) in subsequent analyses; instead you use Fi1,...,FiK.
These are uncorrelated, measure general constructs, and have managerial action and interpretation after naming.
Slide 24: Factor Analysis Output (Statistics -> Multivariate -> Factor Analysis)
       Factor1    Factor2
 1  -0.8581021  1.1573563
 2   0.9912370 -0.3467979
 3  -1.0464547 -2.1894628
 4   1.7296603 -0.3569697
 5  -0.4764332  1.1771556
 6   0.2586998  0.4177715
 7  -0.1647823  0.3144445
 8  -0.7309114  0.6687987
 9   1.6097433 -0.5965211
10   0.5464722  0.2499990
11  -1.1198209 -1.5933829
12  -0.2183903  0.9499942
13   1.4236142 -0.2257015
14  -1.0583693 -0.7279851
15  -0.8861627  1.1013012
Step 1: Number of factors: K = 2 satisfies the criterion.
Step 2: Assigning Xs to factors: X3, X4, X5 -> F1; X1, X2 -> F2
Step 3: Naming the factors: F1 is Service, F2 is Price/value.
Step 4: Overall fit is 0.833, and variables X1 and X2 have some unexplainable parts.
Step 5: The factor scores above are our Xs from this point forward.
Slide 25: Although uncorrelated, there is a pattern!
Respondents with high and with low service quality scores have lower price/value scores.
Slide 26: Managerial Inferences
- There are two general factors.
- The first factor is service and is mainly comprised of Q3, Q4, Q5.
- The second factor is price/value and is made up of Q1, Q2.
- People with very high and very low service perceptions have lower price/value perceptions.
- Note that the factor scores could now be used in future analyses.
Slide 27: An example to whet your appetite for next class: Regression of Bank Volume on Factor Scores
Residual Standard Error = 57.0984, Multiple R-Square = 0.7115
N = 15, F-statistic = 14.7939 on 2 and 12 df, p-value = 0.0006
              coef  std.err   t.stat  p.value
Intercept 196.4864  14.7427  13.3277   0.0000
Service    62.7182  14.8563   4.2217   0.0012
P/V        53.9897  15.1908   3.5541   0.0040
Now there is no multicollinearity, the R² contributions of the factors add, and the interpretation is clean.
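To show how the fitted equation is read, here is a minimal sketch that plugs factor scores into the reported coefficients. It assumes the Service regressor is the Factor1 score and P/V is the Factor2 score from the factor-score table; the function name is illustrative.

```python
# Coefficients copied from the regression output above.
INTERCEPT = 196.4864
B_SERVICE = 62.7182
B_PRICE_VALUE = 53.9897

def predicted_volume(service_score, price_value_score):
    """Fitted bank volume for a respondent with the given factor scores."""
    return INTERCEPT + B_SERVICE * service_score + B_PRICE_VALUE * price_value_score

# Respondent 4 from the factor-score table: Factor1 = 1.7296603, Factor2 = -0.3569697.
# Assumption: Factor1 is Service and Factor2 is Price/value.
volume_r4 = predicted_volume(1.7296603, -0.3569697)

# A respondent with both scores at 0 (the average) is predicted at the intercept.
volume_avg = predicted_volume(0.0, 0.0)
print(round(volume_r4, 2), volume_avg)
```

Because the factor scores are standardized and uncorrelated, each coefficient can be read on its own: a one-unit higher service score goes with roughly 63 more units of volume, whatever the price/value score is.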
Slide 28: Punch Line
- Factor analysis is a widely used technique to uncover higher-order structure.
- It leads to managerially interesting stories and actions.
- There is an entire branch of measurement, SEM (Structural Equation Modeling), that deals more formally with this type of analysis.
- It is also widely used to CONFIRM an underlying structure, i.e. it is useful for checking questionnaire design. This is called Confirmatory Factor Analysis.
Slide 29: Marketing 756 Marketing Research
A Complete Analysis From Start to Finish
Slide 30: Reminders
- HW 6 due on April 19th, returned by April 26th into your mail folders. Answer key distributed April 21st so that you can study for the final.
- Upcoming Lecture Topics
- Multidimensional Scaling
- Discriminant Analysis
- Ocean Spray Cranberries
- Course Overview
Slide 31: Reminder from previous lecture
- Factor analysis is a technique that allows for
- Computation of uncorrelated effects
- Reduction from J variables to K super-variables
- Eliminating questions that don't load on any variables of interest
- 0/1 variable coding -> super-variables that are summed scores of the original Xs
Once done, these are your new variables.
In this lecture, we will do an analysis from start to finish, acting somewhat as a review but also showing you how to use these techniques in combination.
Slide 32: What is a construct?
[Diagram: Constructs 1, 2, ..., K, each linked to a subset of Items 1, 2, ..., J]
A construct comprises a subset of items that measure the same higher-order value/belief.
Slide 33: Data Set for the complete analysis (Full Example Data on the course website)
- DuPont mail survey data for 58 respondents.
- One set of questions involved satisfaction with DuPont in 6 primary areas.
- An additional set of questions was about the interests of the company, its size, number of employees, etc.
Slide 34: Complete Analysis from start to finish (Full Example Data)
Demographic/Background questions
- Exp1: interest in exporting (1 = L, 2 = M, 3 = H)
- Size: number of employees in thousands
- Revenue: amount sold to that company by DuPont in MM
- Years: number of years as a DuPont customer
- Numprod: number of products that they buy from DuPont
Survey Questions
- Q1-Q4: questions about quality
- TS1-TS3: questions about tech support
- SM1-SM2: questions about sales and marketing support
- SD1-SD7: questions about supply and delivery
- INN1-INN3: questions about innovation
- Overall: overall satisfaction
Slide 35: As an example: "Please rate DuPont on a 1-10 scale on each of the following questions"
Likert-scale questions, treated as continuous, grouped into a priori blocks, with randomization across blocks and within blocks.
Slide 36: Managerial Goals of the survey
- Assess satisfaction in each of a number of key areas.
- Can be done via means and cross-tabs
- Confirm the question structure as given by the six question areas.
- Can be done by a confirmatory factor analysis
- Understand heterogeneity across market segments.
- Can be done via clustering
- Understand the relationship between satisfaction along dimensions and sales.
- Can be done via regression
Slide 37: Step 1: Exploratory analyses via Histograms and 5-number summaries
Wide variation in size
Most companies have 7 years or less
Less than 4 products
              Exp1      Size    Revenue      Years    Numprod
Min       1.0000000  27.00000  0.3000000  4.5000000   2.000000
Mean      2.0000000  42.51724  1.5465517  6.2810345   4.827586
Max       3.0000000  69.00000  4.0000000  9.5000000  11.000000
Std Dev.  0.8583951  10.70209  0.8698085  0.9291492   2.609953
Slide 38: Scatterplot Matrix, Demographic variables
Slide 39: Correlation Matrix of Demographics
               Exp1        Size      Revenue       Years      Numprod
Exp1     1.00000000   0.2234370   0.23967042   0.3343459   0.09396939
Size     0.22343700   1.0000000  -0.11910395  -0.2105350   0.56916018
Revenue  0.23967042  -0.1191040   1.00000000   0.2375097  -0.07059154
Years    0.33434595  -0.2105350   0.23750966   1.0000000  -0.16559460
Numprod  0.09396939   0.5691602  -0.07059154  -0.1655946   1.00000000
Hypothesis test for correlation:
Export interest and Years significantly correlated with revenue
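The slide does not say which test was used. A common choice, assumed here, is the t test for a zero correlation with n - 2 degrees of freedom; the sketch below computes that statistic for the two correlations with Revenue.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for H0: true correlation = 0, on n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

n = 58  # respondents in the DuPont survey
t_exp1 = correlation_t_stat(0.23967042, n)   # Exp1 vs Revenue
t_years = correlation_t_stat(0.23750966, n)  # Years vs Revenue
print(round(t_exp1, 2), round(t_years, 2))
```

With 56 degrees of freedom the two-sided critical values are roughly 1.67 at the 10% level and 2.00 at the 5% level, so it is worth checking which significance level the output on the slide refers to.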
Slide 40: Analysis of Means of Survey Items
                Q1        Q2        Q3        Q4       TS1       TS2
Mean      8.448276  8.551724  8.137931  8.637931  7.862069  8.189655
Std Dev.  1.230781  1.512201  1.549740  1.150107  1.810770  1.648571
               TS3       SM1       SM2       SD1       SD2       SD3
Mean      8.224138  8.293103  7.948276  8.672414  8.775862  8.189655
Std Dev.  1.533750  1.991285  2.258880  1.160580  1.060339  1.721453
               SD4       SD5       SD6       SD7      INN1      INN2
Mean      8.189655  8.275862  9.000000  9.017241  7.224138  7.431034
Std Dev.  1.711231  1.598169  1.008734  1.177142  2.449798  2.256469
              INN3   OVERALL
Mean      7.931034  8.327586
Std Dev.  1.936375  1.647837
DuPont rates relatively poorly in terms of Innovativeness.
Focus groups after the fact revealed that R&D promises not kept were the cause.
Slide 41: Step 2: Run a Factor Analysis on the 19 Items to create the super-variables
- Statistics -> Multivariate -> Factor Analysis (highlight the 19 items)
- Type in the number of factors (six in this case); in general, you can run an increasing number of factors.
- Under the Predictions tab, type in where you want the output saved.
- Under the Results tab, click on everything and type in 0.4 (some people use higher, some lower) for the cutoff loading value.
- Under the Plots tab, click on all the buttons.
- Hit OK to run.
Slide 42: How Many Factors?
Importance of factors:
                  Factor1    Factor2    Factor3    Factor4     Factor5    Factor6
SS loadings     2.8352286  2.5379725  2.4980395  2.3006915  1.64812986  1.3664325
Proportion Var  0.1492226  0.1335775  0.1314758  0.1210890  0.08674368  0.0719175
Cumulative Var  0.1492226  0.2828001  0.4142758  0.5353649  0.62210853  0.6940260
(1) Each of the six factors has greater than 1/19 = 0.053 proportion variance explained.
(2) With seven factors:
Sums of squares of loadings:
 Factor1   Factor2  Factor3   Factor4   Factor5  Factor6    Factor7
2.815683  2.577082  2.54258  2.089417  1.741779  1.26776  0.7709414
The number of variables is 19 and the number of observations is 58.
Test of the hypothesis that 7 factors are sufficient versus the alternative that more are required:
The chi square statistic is 54.31 on 59 degrees of freedom. The p-value is 0.649.
It appears that a 7th factor is not needed.
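The proportions in the output can be reproduced from the SS loadings. The sketch below assumes the 19 items are standardized, so that the total variance is 19; the variable and function names are illustrative.

```python
J = 19  # number of survey items

# SS loadings copied from the six-factor and seven-factor outputs above.
ss_six = [2.8352286, 2.5379725, 2.4980395, 2.3006915, 1.64812986, 1.3664325]
ss_seven = [2.815683, 2.577082, 2.54258, 2.089417, 1.741779, 1.26776, 0.7709414]

def proportions(ss_loadings, n_items):
    """Proportion of total variance per factor: SS loading / number of items."""
    return [ss / n_items for ss in ss_loadings]

threshold = 1.0 / J  # a factor should explain more than one average item
props_six = proportions(ss_six, J)
props_seven = proportions(ss_seven, J)

all_six_pass = all(p > threshold for p in props_six)
seventh_passes = props_seven[-1] > threshold
cumulative_six = sum(props_six)
print(all_six_pass, seventh_passes, round(cumulative_six, 4))
```

All six factors clear the 1/19 bar, while the seventh factor explains about 4% of the variance and does not, which agrees with the chi-square test's verdict that it is not needed.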
Slide 43: How well does the model fit the variables?
- 69.4% of the variance explained. Not bad: 6 factors in place of 19 variables capture almost 70% of the variance.
II. Uniquenesses:
        Q1         Q2         Q3         Q4        TS1        TS2        TS3
 0.5484324  0.3415941  0.3516339  0.4188738   0.238474  0.2168948  0.1659384
       SM1        SM2        SD1        SD2        SD3        SD4
0.07444144 0.03017874  0.4193007  0.2150707  0.2188743 0.07769325
       SD5        SD6        SD7       INN1       INN2       INN3
 0.7253591  0.7646984  0.1970875  0.4627636  0.3462046     4e-010
The factor analysis solution does not fit some Supply and Delivery items well (SD5 and SD6 have uniquenesses above 0.7).
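The uniqueness screen can be applied mechanically with the 0.4 rule of thumb from the fit step. As a cross-check, the sketch also recovers the overall variance explained as one minus the average uniqueness, which assumes standardized items. The function name is illustrative.

```python
# Uniquenesses copied from the output above.
uniqueness = {
    "Q1": 0.5484324, "Q2": 0.3415941, "Q3": 0.3516339, "Q4": 0.4188738,
    "TS1": 0.238474, "TS2": 0.2168948, "TS3": 0.1659384,
    "SM1": 0.07444144, "SM2": 0.03017874,
    "SD1": 0.4193007, "SD2": 0.2150707, "SD3": 0.2188743, "SD4": 0.07769325,
    "SD5": 0.7253591, "SD6": 0.7646984, "SD7": 0.1970875,
    "INN1": 0.4627636, "INN2": 0.3462046, "INN3": 4e-10,
}

def poorly_fit(uniq, cutoff=0.4):
    """Items whose uniqueness exceeds the cutoff, in their original order."""
    return [name for name, u in uniq.items() if u > cutoff]

flagged = poorly_fit(uniqueness)
worst = poorly_fit(uniqueness, cutoff=0.7)

# Overall share of variance explained = 1 - average uniqueness (standardized items).
explained = 1.0 - sum(uniqueness.values()) / len(uniqueness)
print(flagged, worst, round(explained, 3))
```

The strict 0.4 cutoff also flags Q1, Q4, SD1 and INN1, though only marginally for Q4 and SD1; SD5 and SD6 are the clear outliers. The recovered share of about 0.694 matches the cumulative variance reported for the six-factor solution.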
Slide 44: Does the Factor Analysis confirm the Questionnaire Design?
Loadings on Factor1 to Factor6 (only values at or above the 0.4 cutoff are printed):
Q1    0.504
Q2    0.722
Q3    0.731
Q4    0.753
TS1   0.424  0.418
TS2   0.409  0.655
TS3   0.814
SM1   0.898
SM2   0.425  0.791
SD1   0.710
SD2   0.855
SD3   0.709
SD4   0.863
SD5   (none)
SD6   (none)
SD7   0.863
INN1  0.556
INN2  0.546
INN3  0.891
Quality Factor somewhat confirmed
Technical Support not entirely distinct from
Supply and Delivery Issues
Sales and Marketing somewhat supported
Sales and Marketing fractionated into Complaint
Handling
Technical Support
Innovativeness
Slide 45: Interpret and name the factors (go back to the questions and see what they really are)
- Factor 1: Quality
- Factor 2: Meeting Order Needs
- Factor 3: Sales and Marketing Support
- Factor 4: Complaint Handling
- Factor 5: Technical Support
- Factor 6: Innovativeness
- Comment: This confirms the underlying survey design, which is one use of factor analysis. Did things cluster the way the study was designed?
- Called Confirmatory Factor Analysis
Slide 46: Which variables are targets for removal?
[Same loadings table as on the previous slide]
SD5 and SD6 have no loading at or above the 0.4 cutoff: these two questions don't load highly on any factor.
Slide 47: Step 3: Utilize factor scores to form groups
- Cluster analyze the factor scores
- Statistics -> Cluster Analysis -> Agglomerative Hierarchical (run a hierarchical clustering routine); under the plot option, select the clustering tree.
Two clean clusters; the remaining points don't join that closely.
Slide 48: Step 4: Run k-means clustering on factor scores
Statistics -> Cluster Analysis -> K-means. Type in 3 clusters, and under Results, save cluster membership.
K-Means Clustering
Centers:
          Factor1     Factor2     Factor3     Factor4     Factor5      Factor6
[1,]  -0.19500655  -0.4244374  -0.4746552  -1.6262781  -0.8575811   0.11797696
[2,]   0.02633563   0.1068013   0.2572054   0.3013399   0.1181485   0.02920225
[3,]   0.25864208  -0.5998617  -3.9083794   0.2367636   1.0826261  -1.21714908
Clustering vector:
 [1] 1 2 2 1 2 2 2 2 1 2 2 1 2 1 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2
[35] 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 2 3 2 3
Within cluster sum of squares:
[1] 82.280812 141.551988 9.547397
Cluster sizes:
[1] 9 47 2
One large cluster and two minor ones.
Cluster one is dissatisfied with all factors
Cluster two is mostly satisfied with all
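Once the centers are saved, a respondent is assigned to the cluster whose center is closest to their six factor scores. The sketch below assumes plain Euclidean distance; the two respondents are hypothetical, and the function name is illustrative.

```python
# Cluster centers copied from the k-means output above (Factor1..Factor6).
centers = {
    1: [-0.19500655, -0.4244374, -0.4746552, -1.6262781, -0.8575811, 0.11797696],
    2: [0.02633563, 0.1068013, 0.2572054, 0.3013399, 0.1181485, 0.02920225],
    3: [0.25864208, -0.5998617, -3.9083794, 0.2367636, 1.0826261, -1.21714908],
}

def nearest_cluster(scores, centers):
    """Label of the center with the smallest squared Euclidean distance."""
    def squared_distance(center):
        return sum((s - c) ** 2 for s, c in zip(scores, center))
    return min(centers, key=lambda label: squared_distance(centers[label]))

# Hypothetical respondents (illustrative factor scores, not from the data set).
near_average = [0.1, 0.0, 0.3, 0.2, 0.0, 0.1]
unhappy_with_complaints = [0.0, -0.5, -0.5, -1.5, -1.0, 0.0]

print(nearest_cluster(near_average, centers),
      nearest_cluster(unhappy_with_complaints, centers))
```

The centers also make the profile of the smallest cluster visible: its Factor3 center of about -3.9 is far below the other two.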
Slide 49: Step 5: Profile the clusters based on the demographics
- Now we profile the groups based on Exp1, Size, Revenue, Years, Numprod
- Statistics -> Data Summaries -> Summary Statistics
- Highlight the five variables above and use cluster-id as the grouping variable
Compare to overall norms to assess the cluster.
------------------------------------------------------
cluster.id = 2
                Exp1       Size     Revenue       Years     Numprod
Min        1.0000000  27.000000   0.3000000   4.5000000   2.0000000
1st Qu.    1.0000000  33.500000   0.9000000   5.5000000   3.0000000
Mean       1.9574468  41.191489   1.4702128   6.2617021   4.6382979
Median     2.0000000  39.000000   1.0000000   6.0000000   4.0000000
3rd Qu.    3.0000000  48.500000   2.0000000   7.0000000   6.0000000
Max        3.0000000  69.000000   3.6000000   9.5000000  11.0000000
Total N   47.0000000  47.000000  47.0000000  47.0000000  47.0000000
NA's       0.0000000   0.000000   0.0000000   0.0000000   0.0000000
Std Dev.   0.8586503  10.237693   0.7857190   0.9449070   2.6078938
SE Mean    0.1252470   1.493321   0.1146089   0.1378288   0.3804004
LCL Mean   1.7053376  38.185590   1.2395170   5.9842670   3.8725916
UCL Mean   2.2095560  44.197389   1.7009085   6.5391372   5.4040042
------------------------------------------------------
Slide 50: Understand the drivers of revenue in each cluster
Ran a regression with DV = Revenue and the demographics and factor scores as independent variables for persons in Cluster 2.
              Value  Std. Error  t value  Pr(>|t|)
(Intercept)  1.2465      0.2728   4.5687    0.0000
Exp1         0.1928      0.1252   1.5405    0.1308
Factor1      0.1810      0.1194   1.5162    0.1368
Factor3      0.6163      0.2283   2.6992    0.0099
Residual standard error: 0.7243 on 43 degrees of freedom
Multiple R-Squared: 0.2056
F-statistic: 3.709 on 3 and 43 degrees of freedom, the p-value is 0.0185
Ran a stepwise regression with all covariates and six factor scores.
Export interest, Quality, and Sales and Marketing Support are key drivers of revenue.
Normally, do this cluster by cluster.
Slide 51: Summary
- Do exploratory analyses on the data (face validity, low-hanging fruit)
- Create super-variables/constructs via factor analysis
- Name factors to make them meaningful and managerially actionable
- Form market segments via cluster analysis of factor scores
- Profile the clusters based on demographics/behaviors using simple cross-tabs
- Run regressions cluster-by-cluster to see significant drivers.
Slide 52: Punch Line
- Factor Analysis can be used in Marketing Research to
- Help focus decision making on a few key constructs
- Create independent constructs
- Understand the relationships between variables
- Act as variables in future studies
Slide 53: Let's see what we know about Marketing Research
- http://www.gactr.uga.edu/is/mr/quiz/index.phtml