Title: Regression Analysis in Marketing Research
1Regression Analysis in Marketing Research
- Dr. John T. Drea
- Professor of Marketing
- Western Illinois University
2Prediction
- Two approaches to prediction
- Extrapolation: using past events to predict
future events (like steering a canoe by looking
behind you)
- Predictive modeling: using one or more other
variables to predict another variable
- Ex: Could you predict success in a course if you
knew the hours spent studying, number of other
semester hours taken, and hours spent working?
- The difference between a predicted value and an
actual value is called a residual.
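To make the idea of a residual concrete, here is a minimal sketch. The grades are hypothetical numbers, and the sign convention (actual minus predicted) is an assumption, since the slide only says "difference".

```python
# Hypothetical course grades: what students actually earned and
# what some predictive model said they would earn.
actual = [82.0, 75.0, 91.0]
predicted = [80.0, 78.0, 90.0]

# Residual = actual value - predicted value, one per observation
# (assumed sign convention).
residuals = [a - p for a, p in zip(actual, predicted)]

print(residuals)  # [2.0, -3.0, 1.0]
```

A positive residual means the model predicted too low; a negative residual means it predicted too high.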
3Variables
- The dependent variable is what we are trying to
predict; it is typically represented by Y.
- The independent variable is a variable used to
predict the dependent variable; it is typically
represented by x.
- Note that the independent variable predicts the
dependent variable; it cannot be stated that the
independent variable (x) causes changes in the
dependent variable (Y).
- Regression typically uses interval- or
ratio-scaled variables as the independent and
dependent variables. You can also use dummy coding
(1, 0) for nominally scaled measures (a 1 if a
characteristic is present, a 0 if that
characteristic is absent).
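Dummy coding can be sketched in a few lines. The survey question and answers below are hypothetical, used only to show the 1/0 rule.

```python
# Hypothetical nominal measure: has the respondent ridden a train before?
responses = ["yes", "no", "no", "yes", "yes"]

# Dummy code: 1 if the characteristic is present, 0 if it is absent.
has_ridden = [1 if r == "yes" else 0 for r in responses]

print(has_ridden)  # [1, 0, 0, 1, 1]
```

The resulting 1/0 column can then be entered into a regression like any other independent variable.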
4Bivariate Regression
- Bivariate linear regression (simple regression)
investigates a straight-line relationship of the
type
$Y = a + bx + e$
where Y is the dependent variable, x is the
independent variable, a and b are two constants
to be estimated, and e is the error term.
- Regression basically fits the data to a straight
line, where a is the intercept and b is the
slope of the line.
SPSS fits the line to minimize the sum of the
squared vertical distances between the points and
the regression line. This is called the least
squares criterion.
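The least squares criterion has a closed-form solution for a single predictor. The sketch below is one way to compute it by hand, not SPSS's own routine. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(x, y)
print(a, b)
```

Because the hypothetical points fall exactly on a line, the fit recovers an intercept of 1 and a slope of 2, and every residual is zero.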
5Bivariate Regression in SPSS
[Screenshot: Step 1 marks the dependent variable (Y); Step 2 marks the independent variable (x).]
6Bivariate Regression in SPSS Results
19.0% of the variation in BIAmtrak can be
accounted for by AAmtrak, meaning 81% of the
variation is unaccounted for.
The equation is significantly better than chance,
as evidenced by the F-value.
The significant t-value suggests that Aamtrak
belongs in the equation. The significant
constant indicates there is considerable
variation unexplained.
The unstandardized equation would be
$Y = 3.132 + .507(\text{Aamtrak})$. Thus, if a subject
had an Aamtrak score of 2, the equation would
predict $Y = 3.132 + .507(2) = 4.146$.
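The prediction step can be written as a one-line function. This sketch reads the slide's equation as Y = 3.132 + .507(Aamtrak), which is consistent with the stated result of 4.146. The function name `predict_y` is hypothetical.

```python
def predict_y(a_amtrak):
    """Predicted Y from the unstandardized equation Y = 3.132 + .507(Aamtrak)."""
    return 3.132 + 0.507 * a_amtrak

# A subject with an Aamtrak score of 2.
print(round(predict_y(2), 3))  # 4.146
```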
7Multiple Regression
- Multiple regression allows for the simultaneous
investigation of two or more independent
variables and a single dependent variable.
- Multiple regression is quite useful; it is
likely that several variables are related to a
dependent variable.
- Regression is useful when we want to explain,
predict, or control a dependent variable.
- The use of the unstandardized coefficients allows
you to use the equation in a very practical way.
The form for an unstandardized equation is
$Y = a + b_1x_1 + b_2x_2 + \dots + b_ix_i$
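To show where the unstandardized coefficients a, b1, b2, ... come from, the sketch below solves the least squares normal equations in plain Python. It is an illustration under stated assumptions, not SPSS's algorithm. The helper names `solve` and `fit_multiple` and the data are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_multiple(X, y):
    """Return [a, b1, ..., bk] for the least squares fit of y = a + b1*x1 + ... + bk*xk."""
    rows = [[1.0] + list(r) for r in X]  # the leading 1.0 carries the intercept a
    k = len(rows[0])
    # Normal equations: (X'X) coef = X'y
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coef = fit_multiple(X, y)
print([round(c, 6) for c in coef])
```

With this noise-free data the fit recovers coefficients very close to 1, 2, and 3. Real survey data would leave residuals, and a statistics package would also report standard errors and significance tests.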
8Multiple Regression
- Each coefficient in multiple regression is also
known as a coefficient of partial regression; it
assesses the relationship between its variable
($X_i$) and the dependent variable (Y) that is not
accounted for by the other variables in the model.
- Each variable introduced into the equation needs
to account for variation in Y that has not been
accounted for by any of the X variables already
entered.
- We typically assume that the X variables are
uncorrelated with one another. If they are
correlated with one another, we have a problem of
multicollinearity.
9Multiple regression
- Multicollinearity is a problem in regression; it
occurs when the independent variables are highly
correlated with one another.
- Multicollinearity does not affect the model's
overall ability to predict, but it can impact the
interpretation of individual coefficients.
- Multicollinearity can be assessed through the use
of a statistic, the variance inflation factor
(VIF).
- If VIF < 10, multicollinearity is not a problem.
- If VIF > 10, remove the variable from the
independent variables and run the analysis again.
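The VIF for a predictor is 1 / (1 - R2), where R2 comes from regressing that predictor on the other predictors. The sketch below covers only the simplified case of exactly two predictors, where that R2 equals their squared correlation. With more predictors, each one would be regressed on all the others. The function names and data are hypothetical.

```python
def correlation(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with exactly two predictors, R^2 = r(x1, x2)^2."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
nearly_same = [1.1, 1.9, 3.2, 3.9, 5.0]   # hypothetical: tracks x1 closely
unrelated = [2.0, -1.0, -2.0, -1.0, 2.0]  # hypothetical: uncorrelated with x1

print(vif_two_predictors(x1, nearly_same) > 10)  # True: above the cutoff
print(vif_two_predictors(x1, unrelated))         # 1.0: no inflation
```

A VIF of 1 means the predictor shares no variation with the other one; the closer the two predictors track each other, the larger the VIF grows.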
10Interpreting Regression Results
- $R^2$
- It is the coefficient of determination; it
indicates the proportion of variation in Y
explained by the variation in the independent
variables ($X_i$). It measures the goodness of
fit for your model (regression equation). It
ranges from 0 to 1.0.
- Std. error of the estimate
- It measures the accuracy of predictions using the
regression equation.
- The smaller the std. error of the estimate, the
smaller the confidence interval (the more precise
the prediction).
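Both statistics can be computed from the actual and predicted values. The sketch below uses hypothetical data. The predictions are assumed to come from the least squares line Y = 0.5 + 1.6x fitted to x = 1, 2, 3, 4, and the standard error uses the usual n - k - 1 degrees of freedom for k predictors.

```python
def r_squared(actual, predicted):
    """Share of variation in Y explained: 1 - SSE / SST."""
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    sst = sum((a - mean_y) ** 2 for a in actual)                # total
    return 1.0 - sse / sst

def std_error_of_estimate(actual, predicted, k):
    """sqrt(SSE / (n - k - 1)) for a model with k predictors and an intercept."""
    n = len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return (sse / (n - k - 1)) ** 0.5

# Hypothetical data: actual Y and least squares predictions from Y = 0.5 + 1.6x.
actual = [2.0, 4.0, 5.0, 7.0]
predicted = [2.1, 3.7, 5.3, 6.9]

r2 = r_squared(actual, predicted)
se = std_error_of_estimate(actual, predicted, k=1)
print(round(r2, 3), round(se, 3))
```

Here roughly 98% of the variation in Y is explained, and a typical prediction misses by about 0.32 units of Y.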
11Interpreting Regression Results
- F-values
- The F-value determines whether the equation is
better than chance. A p-value of .05 or lower
indicates we would reject the null hypothesis
that the independent variables are not related to
the dependent variable.
- The F-value does not measure whether your model
does a good job of predicting, only that it is
better than chance.
- T-tests
- Examine the t-values to determine whether to
include additional variables in the model. A
variable's t-value should be statistically
significant for the variable to be included in
your analysis.
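The overall F-value can be computed from R2, the number of observations n, and the number of predictors k. The sketch below shows only that arithmetic. The inputs are hypothetical, and turning F into a p-value still requires an F table or statistical software.

```python
def f_statistic(r2, n, k):
    """Overall F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    explained_per_predictor = r2 / k
    unexplained_per_df = (1.0 - r2) / (n - k - 1)
    return explained_per_predictor / unexplained_per_df

# Hypothetical model: R^2 = 0.5 with 1 predictor and 12 observations.
f_value = f_statistic(0.5, n=12, k=1)
print(round(f_value, 2))
```

Note that with a large sample a model with a modest R2 can still produce a large F, which is why a significant F only shows the equation is better than chance.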
12Interpreting Regression Results
- Unstandardized coefficients (abbreviated as B)
- These are written in the metric of the measure,
which makes them useful for prediction.
- Standardized coefficients (beta)
- These are written in a standardized form,
typically ranging from -1 to 1.
- The higher the absolute value of the standardized
coefficient, the more important the predictor is
to the model (i.e., the more unique variation in
Y that can be accounted for by that variable).
- Introducing more variables into an equation
typically explains more variation (increases $R^2$),
but each variable must be a significant
contributor of otherwise unexplained variation to
be included in the model (see the t-test results to
determine this).
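An unstandardized coefficient can be converted to a standardized one by rescaling it with the standard deviations of x and Y. The sketch below shows this for hypothetical data. The slope of 1.6 is assumed to be the least squares slope for this same data, and the function name is hypothetical.

```python
import statistics

def standardized_beta(b, x, y):
    """beta = b * (std. dev. of x) / (std. dev. of y)."""
    return b * statistics.stdev(x) / statistics.stdev(y)

# Hypothetical data; 1.6 is the least squares slope of y on x for these values.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 7.0]
b = 1.6

beta = standardized_beta(b, x, y)
print(round(beta, 3))
```

Because beta is free of the original units, betas for different predictors in the same model can be compared with one another, which B values measured in different metrics cannot.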
13Multiple Regression in SPSS
[Screenshots: Step 1 and Step 2.]
14Multiple Regression in SPSS Results
- Note that the circled t-values for two of the
variables are not significant; these do not
supply any unique variation to the prediction of
the dependent variable, so they should be removed
from the analysis.
- Note the standardized coefficients (beta): the
greater the beta, the more important a variable
is to the prediction of the dependent variable.
- Finally, note the size of the t-value for the
constant; this suggests the model still has
considerable unexplained variation.
$Y = 3.219 + .235(\text{Aamtrak, good/bad})
+ .245(\text{Aamtrak, like/dislike})
- .0638(\text{Aauto, good/bad})$
15Multiple Regression in SPSS Results
The model indicates that the five predictors
account for 21.5% of the variation in Aamtrak.
The F-value suggests that the equation is
significantly better than chance.
16Multiple Regression
- Example: Toy Manufacturer Sales
- Hypothesis: How are weekly toy sales affected by
- changes in levels of advertising,
- the use of sales reps vs. agents for calling on
retailers, and
- local school enrollments?
Toy Sales = Advertising ($X_1$) + sales rep/agent ($X_2$)
+ school enrollment ($X_3$) + e
To do this, we need to dummy code sales rep = 1 or
agent = 0. This produces the following equation:
$Y = 102.18 + 3.87X_1 + 115.2X_2 + 6.73X_3$, with $R^2 = 0.845$
So what does this mean?
17Multiple Regression
- So what do those coefficients mean?
$Y = 102.18 + 3.87X_1 + 115.2X_2 + 6.73X_3 + e$
If the other variables are held constant, you
could state:
- $X_1$: Each $1 spent in advertising yields $3.87
in sales.
- $X_2$: The use of a salesperson instead of an
agent contributes $115.20 in additional sales.
- $X_3$: Each additional school enrollment yields
$6.73 in toy sales.
These three variables explain 84.5% of the
variation in toy sales.
- If we spent $1000 in advertising, used a sales
rep, and there are 500 children in the local
schools, what would sales be?
$Y = 102.18 + 3.87(1000) + 115.2(1) + 6.73(500) = 7452.38$
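The toy sales prediction can be checked with a short function. This sketch assumes every term in the equation is added (the operators are missing from the slide) and that the constant 102.18 is part of the sum. The function and parameter names are hypothetical.

```python
def predict_toy_sales(advertising, uses_sales_rep, enrollment):
    """Y = 102.18 + 3.87*X1 + 115.2*X2 + 6.73*X3 (plus signs assumed)."""
    return (102.18
            + 3.87 * advertising       # X1: dollars spent on advertising
            + 115.2 * uses_sales_rep   # X2: dummy, 1 = sales rep, 0 = agent
            + 6.73 * enrollment)       # X3: local school enrollment

sales = predict_toy_sales(advertising=1000, uses_sales_rep=1, enrollment=500)
print(round(sales, 2))  # 7452.38
```

Switching `uses_sales_rep` from 1 to 0 lowers the prediction by exactly 115.2, which is what the dummy-coded coefficient means when the other variables are held constant.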