Marketing Research - PowerPoint PPT Presentation

Provided by: tec72

Transcript and Presenter's Notes

Title: Marketing Research


1
Marketing 756 Lectures 22 and 24
The Use of Factor Analysis in Marketing Research
2
Reminders
  • HW 5 Due Today
  • HW 6 Handed out Next Thursday
  • Final Exam schedule
  • Take Home (honor code)
  • April 21st, 12:00 pm to May 2nd, 12:00 pm
  • 3 hours in a contiguous period
  • Open book, open notes
  • Individual-level
  • Computer output and tables contained within the
    exam.

3
Class Outline
  • Uses of Factor Analysis
  • A motivating example
  • Factor analysis
  • Basic idea
  • Interpreting output
  • Example: Choice of Business School
  • Next Lecture
  • A worked-out S-Plus example from start to finish

4
General Idea of Factor Analysis
  • You have a set of measured variables; call them
    X1, ..., XJ.
  • You believe that these measured variables X1, ..., XJ
    may actually be indicators of higher-level
    beliefs.
  • You want to figure out what these higher-level
    beliefs are, how many of them there are in your J
    items, and how each of these J items maps onto
    these higher-level beliefs.
  • However, when you construct these higher-level
    beliefs, you want to retain as much of the information
    in the original J items as possible.

J items → K ≤ J higher-level factors
5
Uses of Factor Analysis
  • (1) Develop super-variables, i.e., general constructs
    or higher-level factors.
  • Imagine saying to your manager, "Our customers
    say we are doing well on Q4, poorly on Q7, and
    about average on Q9."

What exactly are you supposed to do with this
information?
Contrast this with saying to your manager, "We
are doing well on perceived product quality,
poorly on delivery, and about average on
price/value."
These are constructs!
6
What is a construct?
[Diagram: Construct 1, Construct 2, ..., Construct K, each linked to a subset of Item 1, Item 2, ..., Item J]
A construct is composed of a subset of items
that measure the same higher-order
value or belief.
We will discuss how many constructs (K) there
are, what they are, and which items belong to
which construct.
7
Uses of Factor Analysis
  • (2) Reduce redundancies in survey items

(A) Imagine you have multiple survey items whose
responses are so highly correlated that they are
essentially redundant.
(B) Imagine that you have survey items that do
not load highly on any of the identified
constructs.
Target these items for removal: they provide
little additional information and have little managerial
impact, and removing them can lead to a shorter survey or one
with better items.
However, factor analysis does not have a Y (a dependent
variable), therefore…
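Idea (A) can be screened mechanically before any factors are extracted. The sketch below is a minimal illustration, not the course's S-Plus procedure: it lists item pairs whose absolute correlation meets a cutoff, using the bank-survey correlations reported for V1-V5 in this lecture. The function name `redundant_pairs` and the cutoffs 0.9 and 0.8 are choices made for this example only.

```python
# Bank-survey correlations among V1..V5 (from the correlation output in this lecture).
ITEMS = ["V1", "V2", "V3", "V4", "V5"]
R = [
    [1.00000000, 0.60980123, 0.46869526, -0.01794587, -0.09641660],
    [0.60980123, 1.00000000, 0.23048210, 0.18968584, 0.31863080],
    [0.46869526, 0.23048210, 1.00000000, -0.83182655, -0.77393580],
    [-0.01794587, 0.18968584, -0.83182655, 1.00000000, 0.92731841],
    [-0.09641660, 0.31863080, -0.77393580, 0.92731841, 1.00000000],
]


def redundant_pairs(r, names, cutoff):
    """Return item pairs whose |correlation| is at or above the cutoff.

    The cutoff is a judgment call (an assumption here), not a standard value.
    """
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i][j]) >= cutoff:
                pairs.append((names[i], names[j]))
    return pairs


print(redundant_pairs(R, ITEMS, 0.9))
print(redundant_pairs(R, ITEMS, 0.8))
```

At 0.9 only V4 and V5 are flagged; lowering the cutoff to 0.8 adds the V3-V4 pair, whose correlation is negative because V3 is worded in the opposite direction. Idea (B) needs the factor loadings rather than the correlations.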
8
Uses of Factor Analysis
  • (3) Create uncorrelated measures useful for
    clustering, regression, etc.

The constructs that come out of factor
analysis are uncorrelated: $\mathrm{cor}(\mathrm{Construct}_{ij}, \mathrm{Construct}_{ij'}) = 0$
for $j \neq j'$, computed across individuals $i$.
Multicollinearity is no longer a problem.
You get clean estimates of the effect of each
construct on the dependent measure of interest.
9
So, why is Factor Analysis so common?
  • Provides managers with higher-level factors that
    are more managerially actionable and describable.
  • Reduces the number of items from J items to K ≤ J
    factors.
  • Factors are uncorrelated, which provides clean
    interpretation when the results are used in future
    analyses.
  • Provides managerial understanding as to the key
    issues.

10
Motivating Example 1
  • A group of 12 students rated, on a -5 to 5 scale,
    the importance of good faculty and of program
    reputation in their choice of B-school.
  • A scatter plot of the two ratings accompanied this slide.

[Figure: "1 or 2 variables?"; r = 0.911]
  • Questions that factor analysis answers:
  • (1) Are there really two dimensions operating, or
    are both variables really measuring the same thing?
    Heuristically, they measure the same thing.
  • (2) Can the number of variables be reduced
    from 2 to 1 without sacrificing information?
    Heuristically, yes.
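For two standardized items the question has a closed-form answer. The correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 - |r|, which sum to 2, the total variance. A single factor built from both items (their sum, when r > 0) therefore retains (1 + |r|)/2 of the variance; with r = 0.911 that is (1 + 0.911)/2 = 0.9555. The sketch assumes standardized ratings and takes the principal-component view of "one factor"; the helper names are illustrative.

```python
def two_item_eigenvalues(r):
    """Eigenvalues of the 2x2 correlation matrix [[1, r], [r, 1]], largest first.

    The eigenvectors are (1, 1)/sqrt(2) and (1, -1)/sqrt(2), so the first
    principal factor is proportional to the sum (or, if r < 0, the difference)
    of the two items.
    """
    return 1.0 + abs(r), 1.0 - abs(r)


def first_factor_share(r):
    """Share of total standardized variance captured by one factor."""
    big, small = two_item_eigenvalues(r)
    return big / (big + small)  # equals (1 + |r|) / 2


r = 0.911  # faculty vs. reputation correlation reported on the slide
print(two_item_eigenvalues(r))
print(round(first_factor_share(r), 4))
```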
11
Motivating Example 2
A study was conducted by a bank to determine
whether special marketing programs should
be developed for several key segments. One of the
study's research questions concerned attitudes
toward banking. The respondents were asked to
express their opinions on a 0-9 agree-disagree
scale on the following five items:
1. Small banks charge less than large banks. (X1)
2. Large banks are more likely to make mistakes than small banks. (X2)
3. Tellers do not need to be extremely courteous and friendly; it's enough for them simply to be civil. (X3)
4. I want to be known personally at my bank and be treated with special courtesy. (X4)
5. If a financial institution treated me in an impersonal or uncaring way, I would never patronize that organization again. (X5)
12
Data from 15 respondents
(on the course website as "factor analysis data")
13
If you were really good at staring at
correlation matrices
  • Click on Statistics-Data Summaries-Correlation
    and highlight V1-V5

      V1           V2          V3          V4           V5
V1    1.00000000   0.6098012   0.4686953  -0.01794587  -0.0964166
V2    0.60980123   1.0000000   0.2304821   0.18968584   0.3186308
V3    0.46869526   0.2304821   1.0000000  -0.83182655  -0.7739358
V4   -0.01794587   0.1896858  -0.8318265   1.00000000   0.9273184
V5   -0.09641660   0.3186308  -0.7739358   0.92731841   1.0000000

Factor 1 is made up of V3, V4, and V5.
Factor 2 is made up of V1 and V2.
That is, FA looks for blocks of items with high
(absolute) correlation within blocks and low
correlation across blocks. If there are many
items, and/or the correlated items are not
contiguous, it is IMPOSSIBLE to do this
by eye.
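To see what "blocks" means concretely, the sketch below groups items that are linked by chains of strong correlations (|r| at or above a cutoff), using the bank correlations above. This is single-linkage grouping on the correlation matrix: a crude stand-in for the eyeballing described on this slide, not factor analysis. The 0.5 cutoff and the function name are assumptions for illustration.

```python
ITEMS = ["V1", "V2", "V3", "V4", "V5"]
R = [
    [1.00000000, 0.60980123, 0.46869526, -0.01794587, -0.09641660],
    [0.60980123, 1.00000000, 0.23048210, 0.18968584, 0.31863080],
    [0.46869526, 0.23048210, 1.00000000, -0.83182655, -0.77393580],
    [-0.01794587, 0.18968584, -0.83182655, 1.00000000, 0.92731841],
    [-0.09641660, 0.31863080, -0.77393580, 0.92731841, 1.00000000],
]


def correlation_blocks(r, names, cutoff):
    """Group items connected by chains of |r| >= cutoff (a heuristic, not FA)."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent pointers to the representative of i's group.
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i][j]) >= cutoff:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    # Largest blocks first; items keep their original order within a block.
    return sorted(groups.values(), key=len, reverse=True)


print(correlation_blocks(R, ITEMS, 0.5))
```

With only five items this reproduces the two blocks read off by eye; with dozens of items and overlapping correlations, a single cutoff quickly stops giving clean answers, which is where factor analysis earns its keep.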
14
Marketing Research Questions
  • Is it possible to represent each of these
    variables as a linear combination of a smaller
    set of factors? → factor analysis
  • In reducing the number of variables, factor
    analysis attempts to retain as much information
    as possible and to make the new variables (i.e.,
    derived factors) meaningful and easy to work
    with.
  • Is it then possible to (a) group the individuals
    based on similarity of responses to questions X1
    through X5 (→ cluster analysis), or (b) use the
    resulting factor scores to predict bank volumes
    (→ regression on factor scores)?
  • Next lecture, we pull it all together.

15
Factor Analysis Basic Idea
  • Factor analysis provides an approach to the above
    questions.
  • It generates a set of new dimensions, as
    opposed to the old a priori dimensions X1, ..., X5.
    The first, called F1 (the first principal factor),
    retains as much as possible of the inter-point
    distance information, or variance, that was
    contained in the original dimensions.
  • Find F1 that maximizes $\mathrm{var}(F_{11}, \ldots, F_{I1}) = \mathrm{var}(F_1)$,
    where $F_{i1}$ is individual $i$'s score on F1.
  • An important statistic is the percentage of the
    original variance that is captured by the factor,
    i.e., $\mathrm{var}(F_1) / \mathrm{var}(X_1, \ldots, X_5)$.
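For standardized items, the variance-maximizing F1 described above is the first principal component: its weights are the leading eigenvector of the correlation matrix, var(F1) is the largest eigenvalue, and the total variance is J. The sketch finds that eigenpair by power iteration for the bank correlations shown earlier. It is a principal-components illustration under that assumption; the S-Plus factor analysis routine estimates loadings differently, so its numbers need not match.

```python
import math

# Bank-survey correlation matrix for V1..V5.
R = [
    [1.00000000, 0.60980123, 0.46869526, -0.01794587, -0.09641660],
    [0.60980123, 1.00000000, 0.23048210, 0.18968584, 0.31863080],
    [0.46869526, 0.23048210, 1.00000000, -0.83182655, -0.77393580],
    [-0.01794587, 0.18968584, -0.83182655, 1.00000000, 0.92731841],
    [-0.09641660, 0.31863080, -0.77393580, 0.92731841, 1.00000000],
]


def leading_eigenpair(a, iters=1000):
    """Power iteration for the largest eigenvalue/eigenvector of a symmetric PSD matrix."""
    n = len(a)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # keep the vector at unit length
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * av[i] for i in range(n))  # Rayleigh quotient
    return lam, v


lam1, weights = leading_eigenpair(R)
total = sum(R[i][i] for i in range(len(R)))  # total variance of standardized items = J = 5
print(f"largest eigenvalue: {lam1:.4f}")
print(f"share of variance retained by F1: {lam1 / total:.1%}")
print("item weights:", [round(w, 3) for w in weights])
```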

16
Specifically, how does Factor Analysis work?
  • The objective of factor analysis is to
    represent each of the original variables (X) as a
    linear combination of a smaller set of factors
    (F).
  • That is, in the B-school example (one factor):
    $X_1 = l_{11} F_1 + e_1, \quad X_2 = l_{21} F_1 + e_2$
  • In the banking example, we have (in the case of
    two factors):
    $X_j = l_{j1} F_1 + l_{j2} F_2 + e_j, \quad j = 1, \ldots, 5$
  • The $l_{jk}$ are called the factor loadings.

Some people have called FA "regression where you
don't know the Xs."
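One way to see what the loadings do: in a two-factor model of the form X_j = l_j1 F1 + l_j2 F2 + e_j, assuming standardized items, uncorrelated standardized factors, and errors uncorrelated with everything else, the correlation between two items is entirely explained by their shared loadings, corr(X_a, X_b) = l_a1 l_b1 + l_a2 l_b2. The sketch computes that model-implied correlation. The loadings are hypothetical numbers chosen for illustration, not estimates from the course data.

```python
# Hypothetical loadings (l_j1, l_j2) for the five bank items.
LOADINGS = {
    "X1": (0.05, 0.80),
    "X2": (0.15, 0.70),
    "X3": (-0.85, 0.35),
    "X4": (0.95, 0.10),
    "X5": (0.90, 0.20),
}


def implied_correlation(loadings, a, b):
    """Model-implied corr(X_a, X_b) for a != b: sum over factors of l_ak * l_bk."""
    return sum(la * lb for la, lb in zip(loadings[a], loadings[b]))


print(round(implied_correlation(LOADINGS, "X4", "X5"), 4))  # 0.95*0.90 + 0.10*0.20
print(round(implied_correlation(LOADINGS, "X1", "X4"), 4))  # items on different factors
```

Items that load on the same factor get a high implied correlation and items on different factors a low one, which is exactly the block pattern factor analysis searches for.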
17
Bank Example revisited
[Diagram: Construct 1 and Construct 2 linked by arrows to Items 1 through 5]
There are J = 5 original items.
We are going to try to represent them by K = 2
general factors.
The arrows represent the factor loadings.
Factor analysis will tell us which items are
linked to which constructs and how many constructs
we need, and it gives us the ability to name these constructs.
18
Applying factor analysis to the bank example
  • Goal: To reduce the number of variables by
    eliminating redundancy so that the remaining
    variables (i.e., derived factors) can be managed
    and worked with easily.
  • Input: A set of five variable values for each
    individual in the sample.
  • Output: (1) the factor loadings and
    variance-explained percentages, (2) a set of
    independent factors, and (3) a set of factor
    scores (the new Xs).

19
What are the steps in Interpreting Output?
  • STEP 1: Determine the number of factors K

Criterion 1: Accept a factor analysis solution
where the variance explained > Z (e.g., Z = 50% or
above).
  • Criterion 2: Keep all factors whose variance
    explained > 1/J.
  • That is, the factor is doing more than average lifting.
  • Equivalently (for standardized items), keep all factors
    with eigenvalues > 1.
  • Criterion 3: A scree plot.
  • Plot the % variance explained or the eigenvalues on the
    y-axis and the number of factors on the x-axis. Look for a
    kink in the plot.
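The three criteria can be written as small functions. The sketch below is illustrative: it assumes the values are sorted in decreasing order, takes Z = 50%, and feeds in the seven sums of squared loadings reported for the DuPont example later in this deck (J = 19), using them in place of eigenvalues. Criterion 3 is reduced to a crude "largest drop" rule, which is only a stand-in for reading the plot by eye. Note that the criteria need not agree: here criterion 1 gives K = 4 while criteria 2 and 3 give K = 6.

```python
# Seven-factor DuPont sums of squared loadings (19 items).
SS = [2.815683, 2.577082, 2.54258, 2.089417, 1.741779, 1.26776, 0.7709414]
J = 19


def k_cumulative(ss, j, z):
    """Criterion 1: smallest K whose cumulative variance explained reaches z."""
    cumulative = 0.0
    for k, value in enumerate(ss, start=1):
        cumulative += value / j
        if cumulative >= z:
            return k
    return len(ss)  # z never reached with the factors available


def k_above_average(ss, j):
    """Criterion 2: count factors explaining more than 1/J of the variance
    (for standardized items, the same as eigenvalue > 1)."""
    return sum(1 for value in ss if value / j > 1.0 / j)


def k_scree(ss):
    """Criterion 3, crudely automated: keep the factors before the largest
    drop between consecutive values."""
    drops = [ss[i] - ss[i + 1] for i in range(len(ss) - 1)]
    return drops.index(max(drops)) + 1


print(k_cumulative(SS, J, 0.5), k_above_average(SS, J), k_scree(SS))
```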

20
Assigning Xs to factors
  • STEP 2: Assign original attributes (X) to factors
    (F)

(1) Look at the factor loading matrix L.
(2) Determine a critical value L* (e.g., 0.4).
(3) Say all Xs with |L| ≥ L* load on factor F.
  • (4) Code the factor loadings as 0/±1:
  • |L| ≥ L* → L = 1 if positive, L = -1 if negative
  • |L| < L* → L = 0
  • This is done for managerial interpretation, as it
    leads to factors being SUMS of the original Xs.
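Step 2 is mechanical once a cutoff is chosen. The sketch codes a loading matrix as +1/-1/0 and lists which items go on which factor and which items load on none (the removal candidates from idea (B) earlier). The loadings, and the extra item "X6" that loads on nothing, are hypothetical numbers for illustration; the cutoff 0.4 follows the example on this slide.

```python
# Hypothetical loadings for the bank items plus a made-up item "X6".
ITEMS = ["X1", "X2", "X3", "X4", "X5", "X6"]
LOADINGS = [
    [0.05, 0.80],
    [0.15, 0.70],
    [-0.85, 0.35],
    [0.95, 0.10],
    [0.90, 0.20],
    [0.20, 0.30],
]


def code_loadings(loadings, cutoff=0.4):
    """|L| >= cutoff -> +1 or -1 by sign; otherwise 0."""
    return [
        [(1 if l > 0 else -1) if abs(l) >= cutoff else 0 for l in row]
        for row in loadings
    ]


def assign_items(items, coded):
    """List the items on each factor and the items that load on none."""
    n_factors = len(coded[0])
    by_factor = {k + 1: [] for k in range(n_factors)}
    unassigned = []
    for item, row in zip(items, coded):
        hits = [k + 1 for k, c in enumerate(row) if c != 0]
        for k in hits:
            by_factor[k].append(item)
        if not hits:
            unassigned.append(item)
    return by_factor, unassigned


coded = code_loadings(LOADINGS)
print(coded)
print(assign_items(ITEMS, coded))
```

X3 is coded -1 because its loading is negative (it is worded in the opposite direction), and X6 would be a candidate for removal.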

21
STEP 3: Name the factors. See which questions load
on which factors based on Step 2. Read the
questions. Name the factors.
1. Small banks charge less than large banks. (X1)
2. Large banks are more likely to make mistakes than small banks. (X2)
3. Tellers do not need to be extremely courteous and friendly; it's enough for them simply to be civil. (X3)
4. I want to be known personally at my bank and be treated with special courtesy. (X4)
5. If a financial institution treated me in an impersonal or uncaring way, I would never patronize that organization again. (X5)
X1, X2: Price/value
X3, X4, X5: Service
22
STEP 4: Determine the quality of the fit, overall
and for each X
The % of variance explained by your K-factor
solution gives you an indication of the quality
of the overall fit. It is akin to an R² measure
from regression.
For each variable X, you get a reported
uniqueness score. That is, $X = LF + \text{error}$.
The error measures how unique each X is, i.e., the part
not explainable by the general factor
solution. A large uniqueness (say > 0.4)
indicates a poor fit.
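For standardized items and uncorrelated factors, an item's uniqueness is 1 minus its communality, where the communality is the sum of its squared loadings. The sketch computes this and applies the slide's rule of thumb (uniqueness > 0.4). The loadings are hypothetical, chosen only to illustrate the arithmetic.

```python
# Hypothetical two-factor loadings for the bank items (illustration only).
LOADINGS = {
    "X1": (0.05, 0.80),
    "X2": (0.15, 0.70),
    "X3": (-0.85, 0.35),
    "X4": (0.95, 0.10),
    "X5": (0.90, 0.20),
}


def uniquenesses(loadings):
    """Uniqueness = 1 - sum of squared loadings (standardized X, orthogonal factors)."""
    return {item: 1.0 - sum(l * l for l in row) for item, row in loadings.items()}


u = uniquenesses(LOADINGS)
poor_fit = [item for item, value in u.items() if value > 0.4]  # slide's rule of thumb
print({k: round(v, 4) for k, v in u.items()})
print(poor_fit)
```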
23
STEP 5: Recoding the data
  • Compute factor scores for each individual on each
    factor

Once you run a factor analysis, if it fits
adequately, you no longer use the original Xs
($X_{i1}, \ldots, X_{i5}$) in subsequent analyses; instead you
use $F_{i1}, \ldots, F_{iK}$.
These are uncorrelated, measure general constructs, and
have managerial action and interpretation after
naming.
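The factor scores printed by S-Plus are model-based estimates, not simple sums. The 0/±1 coding from Step 2 gives a simpler alternative: each score is the signed sum of the items assigned to that factor. The sketch computes such summed scores; the coding matches the bank example's assignment (X3, X4, X5 on F1 with X3 reversed; X1, X2 on F2), and the three respondents' answers are made up.

```python
# +1/-1/0 coding, one row per item, one column per factor.
CODING = [
    [0, 1],   # X1
    [0, 1],   # X2
    [-1, 0],  # X3 (negative loading, so it enters with a minus sign)
    [1, 0],   # X4
    [1, 0],   # X5
]


def summed_scores(responses, coding):
    """Score_ik = sum over items j of coding[j][k] * X_ij, for each respondent i."""
    n_factors = len(coding[0])
    return [
        [sum(x * coding[j][k] for j, x in enumerate(row)) for k in range(n_factors)]
        for row in responses
    ]


# Three made-up respondents answering X1..X5 on the 0-9 scale.
RESPONSES = [
    [7, 6, 2, 8, 9],
    [3, 2, 7, 3, 2],
    [5, 5, 5, 5, 5],
]
print(summed_scores(RESPONSES, CODING))
```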
24
Factor Analysis Output: Statistics-Multivariate-Factor Analysis
      Factor1     Factor2
 1   -0.8581021   1.1573563
 2    0.9912370  -0.3467979
 3   -1.0464547  -2.1894628
 4    1.7296603  -0.3569697
 5   -0.4764332   1.1771556
 6    0.2586998   0.4177715
 7   -0.1647823   0.3144445
 8   -0.7309114   0.6687987
 9    1.6097433  -0.5965211
10    0.5464722   0.2499990
11   -1.1198209  -1.5933829
12   -0.2183903   0.9499942
13    1.4236142  -0.2257015
14   -1.0583693  -0.7279851
15   -0.8861627   1.1013012
Step 1: Number of factors: K = 2 satisfies the criterion.
Step 2: Assigning Xs to factors: X3, X4, X5 → F1; X1, X2 → F2.
Step 3: F1 = Service; F2 = Price/value.
Step 4: Overall fit: 0.833 (83.3% of variance explained); variables X1 and
X2 have some unexplainable parts.
Step 5: The factor scores, our Xs from this
point forward.
25
Although the factors are uncorrelated, there is a pattern!
Respondents with high or low service quality scores have lower
price/value scores.
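Zero correlation only rules out a linear relationship. The sketch checks both ideas on the 15 factor scores printed in the output above: the ordinary correlation between the service and price/value scores, and the correlation between price/value and the squared service score, a simple way (chosen here as an assumption) to pick up a U-shaped pattern.

```python
import math

# Factor scores for the 15 bank respondents (F1 = Service, F2 = Price/value).
SERVICE = [-0.8581021, 0.9912370, -1.0464547, 1.7296603, -0.4764332,
           0.2586998, -0.1647823, -0.7309114, 1.6097433, 0.5464722,
           -1.1198209, -0.2183903, 1.4236142, -1.0583693, -0.8861627]
PRICE_VALUE = [1.1573563, -0.3467979, -2.1894628, -0.3569697, 1.1771556,
               0.4177715, 0.3144445, 0.6687987, -0.5965211, 0.2499990,
               -1.5933829, 0.9499942, -0.2257015, -0.7279851, 1.1013012]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Linear association: close to zero.
print(round(pearson(SERVICE, PRICE_VALUE), 3))
# Association with the SQUARED service score: clearly negative, i.e. respondents
# far from average service (high or low) tend to score lower on price/value.
print(round(pearson([s * s for s in SERVICE], PRICE_VALUE), 3))
```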
26
Managerial Inferences
  • There are two general factors.
  • The first factor is service and is mainly composed
    of Q3, Q4, Q5.
  • The second factor is price/value and is made up of
    Q1, Q2.
  • People with very high and very low service
    perceptions have lower price/value perceptions.
  • Note that the factor scores could now be used in
    future analyses.

27
An example to whet your appetite for next
class: Regression of Bank Volume on Factor Scores
Residual Standard Error = 57.0984, Multiple R-Square = 0.7115
N = 15, F-statistic = 14.7939 on 2 and 12 df, p-value = 0.0006
             coef       std.err   t.stat    p.value
Intercept    196.4864   14.7427   13.3277   0.0000
Service       62.7182   14.8563    4.2217   0.0012
P/V           53.9897   15.1908    3.5541   0.0040
Now there is no multicollinearity, the R² contributions of the
factors add up, and interpretation is clean.
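"R² adds" means that when the predictors are uncorrelated, the multiple R² equals the sum of the R² values from regressing Y on each predictor separately. The bank-volume data are not reproduced in these slides, so the sketch checks the identity on a tiny made-up data set, fitting ordinary least squares with an intercept through the centered normal equations.

```python
def r2_two_predictors(x1, x2, y):
    """R^2 of an OLS fit of y on x1 and x2 (with intercept), via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    syy = sum(a * a for a in cy)
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1 * s1y + b2 * s2y) / syy  # explained SS / total SS


def r2_single(x, y):
    """Squared simple correlation = R^2 of y regressed on x alone."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Made-up data: two uncorrelated "factor scores" and an outcome.
F1 = [1, -1, 1, -1]
F2 = [1, 1, -1, -1]
Y = [6, 1, 4, 1]
print(r2_two_predictors(F1, F2, Y), r2_single(F1, Y) + r2_single(F2, Y))
```

With correlated predictors (s12 not zero) the two printed numbers would generally differ, which is why the contributions of the original, overlapping survey items cannot simply be added.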
28
Punch Line
  • Factor analysis is a widely used technique to
    uncover higher order structure.
  • It leads to managerially interesting stories and
    actions.
  • There is an entire branch of measurement, SEM
    (Structural Equation Modeling), that deals more
    formally with this type of analysis.
  • Factor analysis is also widely used to CONFIRM an
    underlying structure, i.e., it is useful for checking
    questionnaire design. This is called Confirmatory
    Factor Analysis.

29
Marketing 756 Marketing Research
A Complete Analysis From Start to Finish
30
Reminders
  • HW 6 due on April 19th, returned by April 26th
    into your mail folders. Answer key distributed
    April 21st so that you can study for the final.
  • Upcoming Lecture Topics
  • Multidimensional Scaling
  • Discriminant Analysis
  • Ocean Spray Cranberries
  • Course Overview

31
Reminder from previous lecture
  • Factor analysis is a technique that allows for:
  • Computation of uncorrelated effects.
  • Reduction from J variables to K super-variables.
  • Elimination of questions that don't load on any
    variables of interest.
  • 0/1 variable coding → super-variables that are
    summed scores of the original Xs.

Once done, these are your new variables.
In this lecture, we will do an analysis from
start to finish, acting somewhat as a review but
also showing you how to use these techniques in
combination.
32
What is a construct?
[Diagram: Construct 1, Construct 2, ..., Construct K, each linked to a subset of Item 1, Item 2, ..., Item J]
A construct is composed of a subset of items
that measure the same higher-order
value or belief.
33
Data Set for the Complete Analysis: Full Example
Data on the course website
  • DuPont mail survey data for 58 respondents.
  • One set of questions involved satisfaction with
    DuPont in 6 primary areas.
  • An additional set of questions asked about the
    interests of the company, its size, number of
    employees, etc.

34
Complete Analysis from Start to Finish: Full
Example Data
Demographic/Background questions
Exp1: interest in exporting (1 = L, 2 = M, 3 = H)
Size: number of employees, in thousands
Revenue: amount sold to that company by DuPont, in $MM
Years: number of years as a DuPont customer
Numprod: number of products that they buy from DuPont
Survey Questions
  • Q1-Q4: questions about quality
  • TS1-TS3: questions about tech support
  • SM1-SM2: questions about sales and marketing
    support
  • SD1-SD7: questions about supply and delivery
  • INN1-INN3: questions about innovation
  • Overall: overall satisfaction

35
As an example: "Please rate DuPont on a 1-10
scale on each of the following questions."
Likert-scale questions, treated as continuous,
grouped into a priori blocks, with randomization
across blocks and within blocks.
36
Managerial Goals of the survey
  • Assess satisfaction in each of a number of key
    areas.
  • Can be done via means and cross-tabs
  • Confirm the question structure as given by the
    six question areas.
  • Can be done by a confirmatory factor analysis
  • Understand heterogeneity across market segments.
  • Can be done via clustering
  • Understand the relationship between satisfaction
    along dimensions and sales.
  • Can be done via regression

37
Step 1: Exploratory analyses via histograms and 5-number summaries
Wide variation in size
Most companies have 7 years or less
Fewer than 4 products
           Exp1       Size      Revenue    Years      Numprod
Min        1.0000000  27.00000  0.3000000  4.5000000   2.000000
Mean       2.0000000  42.51724  1.5465517  6.2810345   4.827586
Max        3.0000000  69.00000  4.0000000  9.5000000  11.000000
Std Dev.   0.8583951  10.70209  0.8698085  0.9291492   2.609953
38
Scatterplot Matrix: Demographic Variables
[Figure: scatterplot matrix of the demographic variables]
39
Correlation Matrix of Demographics
           Exp1         Size         Revenue      Years        Numprod
Exp1       1.00000000   0.2234370    0.23967042   0.3343459    0.09396939
Size       0.22343700   1.0000000   -0.11910395  -0.2105350    0.56916018
Revenue    0.23967042  -0.1191040    1.00000000   0.2375097   -0.07059154
Years      0.33434595  -0.2105350    0.23750966   1.0000000   -0.16559460
Numprod    0.09396939   0.5691602   -0.07059154  -0.1655946    1.00000000
Hypothesis test for correlation:
Export interest and Years are significantly
correlated with revenue.
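The usual test of H0: rho = 0 uses t = r * sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. The sketch computes it for a few of the correlations above with n = 58; it is a standard textbook formula, not necessarily the exact procedure S-Plus ran for this slide.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


N = 58  # DuPont respondents
for label, r in [("Exp1-Revenue", 0.23967042), ("Years-Revenue", 0.23750966),
                 ("Size-Numprod", 0.56916018)]:
    print(label, round(correlation_t(r, N), 3))
```

Compare each t with the critical value for the test you intend: with 56 degrees of freedom it is about 2.00 for a two-sided test at the 5% level and about 1.67 for a one-sided test.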
40
Analysis of Means of Survey Items
          Q1        Q2        Q3        Q4        TS1       TS2
Mean      8.448276  8.551724  8.137931  8.637931  7.862069  8.189655
Std Dev.  1.230781  1.512201  1.549740  1.150107  1.810770  1.648571
          TS3       SM1       SM2       SD1       SD2       SD3
Mean      8.224138  8.293103  7.948276  8.672414  8.775862  8.189655
Std Dev.  1.533750  1.991285  2.258880  1.160580  1.060339  1.721453
          SD4       SD5       SD6       SD7       INN1      INN2
Mean      8.189655  8.275862  9.000000  9.017241  7.224138  7.431034
Std Dev.  1.711231  1.598169  1.008734  1.177142  2.449798  2.256469
          INN3      OVERALL
Mean      7.931034  8.327586
Std Dev.  1.936375  1.647837
DuPont rates relatively poorly in terms of
innovativeness.
Focus groups after the fact revealed that R&D
promises not kept were the cause.
41
Step 2: Run a Factor Analysis on the 19 Items to
Create the Super-Variables
Statistics-Multivariate-Factor Analysis
(highlight the 19 items). Type in the number of
factors (six in this case); in general, you can
run an increasing number of factors. Under the
Predictions tab, type in where you want the output
saved. Under the Results tab, click on everything
and type in 0.4 for the cutoff loading value (some
people use higher, some lower). Under Plots, click
on all the buttons. Hit OK to run.
42
How Many Factors?
Importance of factors:
                Factor1    Factor2    Factor3    Factor4    Factor5     Factor6
SS loadings     2.8352286  2.5379725  2.4980395  2.3006915  1.64812986  1.3664325
Proportion Var  0.1492226  0.1335775  0.1314758  0.1210890  0.08674368  0.0719175
Cumulative Var  0.1492226  0.2828001  0.4142758  0.5353649  0.62210853  0.6940260
(1) Each of the six factors explains a proportion of variance greater than
1/19 = 0.053.
(2) Sums of squares of loadings (seven-factor run):
Factor1   Factor2   Factor3  Factor4   Factor5   Factor6  Factor7
2.815683  2.577082  2.54258  2.089417  1.741779  1.26776  0.7709414
The number of variables is 19 and the number of observations is 58.
Test of the hypothesis that 7 factors are sufficient versus the alternative
that more are required: the chi-square statistic is 54.31 on 59 degrees of
freedom. The p-value is 0.649.
It appears that 7 is not needed.
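With standardized items the total variance is J = 19, so each factor's "Proportion Var" is its sum of squared loadings divided by 19, and "Cumulative Var" is the running total. The sketch recomputes both rows from the six SS loadings above as a consistency check and applies criterion (1).

```python
SS_LOADINGS = [2.8352286, 2.5379725, 2.4980395, 2.3006915, 1.64812986, 1.3664325]
J = 19


def variance_table(ss, j):
    """Proportion and cumulative proportion of variance for each factor."""
    proportion = [value / j for value in ss]
    cumulative = []
    running = 0.0
    for p in proportion:
        running += p
        cumulative.append(running)
    return proportion, cumulative


prop, cum = variance_table(SS_LOADINGS, J)
print([round(p, 7) for p in prop])
print([round(c, 7) for c in cum])
print(all(p > 1 / J for p in prop))  # every factor does more than average lifting
```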
43
How well does the model fit the variables?
  • 69.4% of the variance is explained. Not bad: 6 factors
    built from 19 variables capture almost
    70% of the variance.

II. Uniquenesses
Q1         Q2          Q3         Q4         TS1        TS2        TS3
0.5484324  0.3415941   0.3516339  0.4188738  0.238474   0.2168948  0.1659384
SM1         SM2         SD1        SD2        SD3        SD4
0.07444144  0.03017874  0.4193007  0.2150707  0.2188743  0.07769325
SD5        SD6         SD7        INN1       INN2       INN3
0.7253591  0.7646984   0.1970875  0.4627636  0.3462046  4e-010
The factor analysis solution does not fit the Supply and
Delivery items well.
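For standardized items, each item's communality is 1 minus its uniqueness, and the average communality is the share of variance explained, so the uniquenesses above should reproduce the 69.4% figure. The sketch checks that and applies the rule of thumb from Step 4 (uniqueness > 0.4 indicates a poor fit).

```python
UNIQUENESS = {
    "Q1": 0.5484324, "Q2": 0.3415941, "Q3": 0.3516339, "Q4": 0.4188738,
    "TS1": 0.238474, "TS2": 0.2168948, "TS3": 0.1659384,
    "SM1": 0.07444144, "SM2": 0.03017874,
    "SD1": 0.4193007, "SD2": 0.2150707, "SD3": 0.2188743, "SD4": 0.07769325,
    "SD5": 0.7253591, "SD6": 0.7646984, "SD7": 0.1970875,
    "INN1": 0.4627636, "INN2": 0.3462046, "INN3": 4e-10,
}

# Items the six-factor solution explains poorly.
poor_fit = [item for item, u in UNIQUENESS.items() if u > 0.4]
# 1 - mean uniqueness = mean communality = share of variance explained.
explained = 1 - sum(UNIQUENESS.values()) / len(UNIQUENESS)
print(poor_fit)
print(round(explained, 3))
```

The flagged list includes SD1, SD5, and SD6, in line with the slide's conclusion about the supply and delivery items, but also Q1, Q4, and INN1.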
44
Does the Factor Analysis Confirm the
Questionnaire Design?
Loadings (six factors; only values ≥ 0.4 are printed)
Q1     0.504
Q2     0.722
Q3     0.731
Q4     0.753
TS1    0.424  0.418
TS2    0.409  0.655
TS3    0.814
SM1    0.898
SM2    0.425  0.791
SD1    0.710
SD2    0.855
SD3    0.709
SD4    0.863
SD5
SD6
SD7    0.863
INN1   0.556
INN2   0.546
INN3   0.891
Quality factor somewhat confirmed
Technical Support not entirely distinct from
Supply and Delivery issues
Sales and Marketing somewhat supported
Sales and Marketing fractionated into Complaint
Handling
Technical Support
Innovativeness
45
Interpret and name the factors (go back to the
questions and see what they really are)
  • Factor 1: Quality
  • Factor 2: Meeting Order Needs
  • Factor 3: Sales and Marketing Support
  • Factor 4: Complaint Handling
  • Factor 5: Technical Support
  • Factor 6: Innovativeness
  • Comment: This confirms the underlying survey
    design, which is one use of factor analysis: did
    things cluster the way the study was designed?
  • This is called Confirmatory Factor Analysis.

46
Which variables are targets for removal?
The loadings are the same as on slide 44: SD5 and SD6
show no loading of 0.4 or more.
These two questions don't load highly on any
factor.
47
Step 3: Use factor scores to form groups
  • Cluster-analyze the factor scores.
  • Statistics-Cluster Analysis-Agglomerative
    Hierarchical (runs a hierarchical clustering
    routine); in the plot options, request the
    clustering tree.

Two clean clusters; the remaining points don't join
that closely.
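The idea behind the hierarchical routine fits in a few lines: start with every respondent as its own cluster and repeatedly merge the two closest clusters, recording the distance at each merge (the heights a clustering tree plots). The sketch uses average linkage and Euclidean distance; both are assumptions, and the S-Plus menu may use different defaults. The two-dimensional "factor scores" are made up.

```python
import math


def average_linkage(points, k):
    """Agglomerative clustering with average linkage, merging until k clusters remain.

    Returns the clusters (as sorted index lists) and the merge heights.
    """
    clusters = [[i] for i in range(len(points))]
    heights = []

    def avg_dist(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        heights.append(d)
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(sorted(c) for c in clusters), heights


# Made-up two-dimensional factor scores for six respondents.
POINTS = [(0.1, 0.2), (0.0, -0.1), (0.2, 0.0), (3.0, 3.1), (3.2, 2.9), (-3.0, 2.0)]
groups, heights = average_linkage(POINTS, 3)
print(groups)
print([round(h, 3) for h in heights])
```

A point that only joins at a large height, like respondent 5 here, is the kind of point described on the slide as not joining that closely.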
48
Step 4: Run k-means clustering on the factor scores
Statistics-Cluster Analysis-K-means: type in 3
clusters, and under Results, save the cluster
membership.
K-Means Clustering
Centers:
   Factor1      Factor2     Factor3     Factor4     Factor5     Factor6
1  -0.19500655  -0.4244374  -0.4746552  -1.6262781  -0.8575811   0.11797696
2   0.02633563   0.1068013   0.2572054   0.3013399   0.1181485   0.02920225
3   0.25864208  -0.5998617  -3.9083794   0.2367636   1.0826261  -1.21714908
Clustering vector:
 [1] 1 2 2 1 2 2 2 2 1 2 2 1 2 1 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2
[35] 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 2 3 2 3
Within cluster sum of squares:
[1]  82.280812 141.551988   9.547397
Cluster sizes:
[1]  9 47  2
One large cluster and two minor ones.
Cluster 1 is dissatisfied on nearly all factors (Factors 1 to 5).
Cluster 2 is mostly satisfied on all factors.
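The k-means routine alternates two steps: assign each respondent to the nearest center, then move each center to the mean of its members, until assignments stop changing. The sketch is a plain Lloyd's-algorithm version returning the same kinds of output as above (membership, centers, sizes, within-cluster sums of squares). The data and the hand-picked starting centers are assumptions; S-Plus chooses its own starting values, and results can depend on them.

```python
import math


def kmeans(points, init_centers, max_iter=100):
    """Lloyd's k-means: alternate nearest-center assignment and center updates."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    sizes = [labels.count(c) for c in range(len(centers))]
    within_ss = [
        sum(math.dist(p, centers[c]) ** 2 for p, lab in zip(points, labels) if lab == c)
        for c in range(len(centers))
    ]
    return labels, centers, sizes, within_ss


# Made-up two-dimensional factor scores; starting centers picked by hand.
POINTS = [(0.1, 0.2), (0.0, -0.1), (0.2, 0.0), (3.0, 3.1), (3.2, 2.9), (-3.0, 2.0)]
labels, centers, sizes, within_ss = kmeans(POINTS, [POINTS[0], POINTS[3], POINTS[5]])
print(labels, sizes)
print([[round(v, 3) for v in c] for c in centers])
print([round(w, 3) for w in within_ss])
```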
49
Step 5: Profile the clusters based on the
demographics
  • Now we profile the groups based on Exp1, Size,
    Revenue, Years, Numprod.
  • Statistics-Data Summaries-Summary Statistics
  • Highlight the five variables above and use
    cluster.id as the grouping variable.

Compare to overall norms to assess the cluster.
cluster.id: 2
           Exp1         Size        Revenue      Years        Numprod
Min         1.0000000   27.000000   0.3000000    4.5000000    2.0000000
1st Qu.     1.0000000   33.500000   0.9000000    5.5000000    3.0000000
Mean        1.9574468   41.191489   1.4702128    6.2617021    4.6382979
Median      2.0000000   39.000000   1.0000000    6.0000000    4.0000000
3rd Qu.     3.0000000   48.500000   2.0000000    7.0000000    6.0000000
Max         3.0000000   69.000000   3.6000000    9.5000000   11.0000000
Total N    47.0000000   47.000000  47.0000000   47.0000000   47.0000000
NA's        0.0000000    0.000000   0.0000000    0.0000000    0.0000000
Std Dev.    0.8586503   10.237693   0.7857190    0.9449070    2.6078938
SE Mean     0.1252470    1.493321   0.1146089    0.1378288    0.3804004
LCL Mean    1.7053376   38.185590   1.2395170    5.9842670    3.8725916
UCL Mean    2.2095560   44.197389   1.7009085    6.5391372    5.4040042
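Profiling is a grouped summary. The sketch computes, per cluster, the count, mean, standard deviation, and standard error of the mean (SE Mean = Std Dev. / sqrt(n)); for example, Exp1 in cluster 2 gives 0.8586503 / sqrt(47), about 0.12525, matching the output above. The Size values and cluster labels are made up, and the function name is illustrative.

```python
import math
import statistics


def profile(values, cluster_ids):
    """Per-cluster n, mean, standard deviation, and standard error of the mean."""
    groups = {}
    for value, cid in zip(values, cluster_ids):
        groups.setdefault(cid, []).append(value)
    summary = {}
    for cid in sorted(groups):
        vals = groups[cid]
        n = len(vals)
        sd = statistics.stdev(vals) if n > 1 else float("nan")  # needs 2+ values
        summary[cid] = {
            "n": n,
            "mean": statistics.mean(vals),
            "sd": sd,
            "se": sd / math.sqrt(n),  # SE Mean = Std Dev. / sqrt(n)
        }
    return summary


# Made-up Size values (thousands of employees) with cluster labels.
SIZE = [27, 40, 52, 33, 45, 69]
CLUSTER = [1, 2, 2, 1, 2, 3]
for cid, stats in profile(SIZE, CLUSTER).items():
    print(cid, stats)
```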
50
Understand the drivers of revenue in each cluster
Ran a regression with Revenue as the DV and the
demographics and factor scores as independent
variables, for respondents in Cluster 2.
              Value    Std. Error   t value   Pr(>|t|)
(Intercept)   1.2465   0.2728       4.5687    0.0000
Exp1          0.1928   0.1252       1.5405    0.1308
Factor1       0.1810   0.1194       1.5162    0.1368
Factor3       0.6163   0.2283       2.6992    0.0099
Residual standard error: 0.7243 on 43 degrees of freedom
Multiple R-Squared: 0.2056
F-statistic: 3.709 on 3 and 43 degrees of freedom, the p-value is 0.0185
Ran a stepwise regression with all covariates and
the six factor scores.
Export interest, Quality, and Sales and
Marketing Support are key drivers of revenue.
Normally, do this cluster by cluster.
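The summary numbers in a regression printout are tied together by simple formulas, which makes a handy consistency check. Assuming ordinary least squares with an intercept, F = (R²/p) / ((1 - R²)/(n - p - 1)) and each t value is the coefficient divided by its standard error. The sketch applies these to the Cluster 2 output above (p = 3 predictors, n = 47 respondents, so 43 residual degrees of freedom) and to the earlier bank-volume regression.

```python
def f_from_r2(r2, p, n):
    """Overall regression F statistic from R^2, p predictors, and n observations."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


def t_value(coef, std_err):
    """t statistic for H0: coefficient = 0."""
    return coef / std_err


print(round(f_from_r2(0.2056, 3, 47), 3))   # Cluster 2 revenue model
print(round(f_from_r2(0.7115, 2, 15), 3))   # bank volume on two factor scores
print(round(t_value(0.6163, 0.2283), 3))    # Factor3 row
```

Small differences from the printed values come from rounding in the displayed R² and coefficients.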
51
Summary
  • Do exploratory analyses on the data (face
    validity, low-hanging fruit).
  • Create super-variables/constructs via factor
    analysis.
  • Name the factors to make them meaningful and
    managerially actionable.
  • Form market segments via cluster analysis of
    factor scores.
  • Profile the clusters based on demographics/behaviors
    using simple cross-tabs.
  • Run regressions cluster by cluster to see
    significant drivers.

52
Punch Line
  • Factor analysis can be used in marketing research to:
  • Help focus decision making on a few key
    constructs
  • Create independent constructs
  • Understand the relationships between variables
  • Act as variables in future studies

53
Let's see what we know about Marketing Research
  • http://www.gactr.uga.edu/is/mr/quiz/index.phtml