1
Chapter 11: Multiple Regression
2
Regression analysis: inferences from simple regression extend to multiple regression.
3
Introduction
Multiple regression enables us to determine the
simultaneous effect of several independent
variables on a dependent variable using the least
squares principle.
4
Example
  • The director of planning for a large retailer was
    dissatisfied with the company's new-store development
    experience. In the past four years, 25% of new stores
    failed to reach their projected sales within the
    two-year trial period and were closed with substantial
    economic losses. The director wanted to develop better
    criteria for choosing store locations.
  • If you were hired as a consultant and studied the
    historical experience of successful and unsuccessful
    stores, what kinds of variables would you include in
    your model?

5
  • Size of a store
  • Traffic volume
  • Competing stores
  • Per capita income
  • Number of residents

6
Regression Objectives
  • Multiple regression provides two important results:
  • A linear equation that predicts the dependent
    variable, Y, as a function of K independent
    variables, xj, j = 1, . . ., K.
  • The marginal change in the dependent variable, Y,
    that is related to a change in the independent
    variables, measured by the partial coefficients,
    the bj's.
  • The coefficient bj indicates the change in Y
    given a unit change in xj while controlling for
    the simultaneous effect of the other independent
    variables. (In some problems both results are
    equally important; usually, however, one will
    predominate.) A minimal fitting sketch follows
    this list.
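The sketch below fits such an equation on synthetic data with NumPy's least-squares routine; all variable names and values are illustrative, not the textbook's data.

```python
import numpy as np

# Fit y = b0 + b1*x1 + b2*x2 by least squares on synthetic data.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)           # e.g. store size
x2 = rng.uniform(0, 5, n)            # e.g. traffic volume
y = 2.0 + 0.5 * x1 + 1.2 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # b = [b0, b1, b2]

# b[1] estimates the change in y per unit change in x1,
# holding x2 fixed (the partial effect).
print(b)
```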

7
The Population Multiple Regression Model
The population multiple regression model defines
the relationship between a dependent or endogenous
variable, Y, and a set of independent or exogenous
variables, xj, j = 1, . . ., K. The xji's are
assumed to be fixed numbers and Y is a random
variable, defined for each observation i, where
i = 1, . . ., n and n is the number of
observations. The model is defined as

    yi = β0 + β1x1i + β2x2i + . . . + βKxKi + εi

where the βj's are constant coefficients and the
εi's are random variables with mean 0 and
variance σ².
8
Assumptions
  • The population multiple regression model is

        yi = β0 + β1x1i + β2x2i + . . . + βKxKi + εi

    and we assume that n sets of observations are
    available. The following standard assumptions
    are made for the model.
  • The x's are fixed numbers, or they are
    realizations of random variables, Xji, that are
    independent of the error terms, the εi's. In the
    latter case, inference is carried out
    conditionally on the observed values of the
    xji's.
  • The error terms are random variables with mean 0
    and the same variance, σ². The latter property is
    called homoscedasticity, or uniform variance.

9
Assumptions (cont.)
  • The random error terms, εi, are not correlated
    with one another, so that

        E[εi εj] = 0   for all i ≠ j

  • It is not possible to find a set of numbers, c0,
    c1, . . ., cK, not all zero, such that

        c0 + c1x1i + c2x2i + . . . + cKxKi = 0

  • This is the property of no linear relation among
    the Xj's.

10
Least Square Method and Sample Multiple Regression
We begin with a sample of n observations denoted
(x1i, x2i, . . ., xKi, yi), i = 1, . . ., n,
measured for a process whose population multiple
regression model is

    yi = β0 + β1x1i + β2x2i + . . . + βKxKi + εi

The least-squares procedure obtains estimates of
the coefficients β0, β1, . . ., βK as the values
b0, b1, . . ., bK for which the sum of squared
deviations

    SSE = Σ (yi − b0 − b1x1i − . . . − bKxKi)²

is a minimum. The resulting equation

    ŷi = b0 + b1x1i + . . . + bKxKi

is the sample multiple regression of Y on X1,
X2, . . ., XK; a sketch using the normal equations
follows.
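A minimal sketch of the computation, solving the normal equations b = (X'X)⁻¹X'y on made-up numbers (in practice a library least-squares routine is numerically safer):

```python
import numpy as np

# Least squares via the normal equations: b = (X'X)^{-1} X'y.
X = np.array([[1, 2.0, 3.0],
              [1, 4.0, 1.0],
              [1, 5.0, 6.0],
              [1, 7.0, 2.0],
              [1, 8.0, 9.0]])              # ones column + two regressors
y = np.array([5.0, 7.0, 12.0, 11.0, 18.0])

b = np.linalg.solve(X.T @ X, X.T @ y)      # [b0, b1, b2]
sse = np.sum((y - X @ b) ** 2)             # the minimized criterion
print(b, sse)
```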
11
Coefficient Estimators for K = 2
For a model with two independent variables, the
least-squares estimates can be written in terms of
sample correlations and standard deviations:

    b1 = (sy / s1)(ry1 − r12 ry2) / (1 − r12²)
    b2 = (sy / s2)(ry2 − r12 ry1) / (1 − r12²)
    b0 = ȳ − b1 x̄1 − b2 x̄2

where ry1 and ry2 are the sample correlations of Y
with x1 and x2, r12 is the correlation between x1
and x2, and sy, s1, s2 are the sample standard
deviations.
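These formulas can be checked numerically against a direct least-squares fit; the sketch below uses illustrative data:

```python
import numpy as np

# Verify the K = 2 correlation-based formulas on synthetic data.
rng = np.random.default_rng(1)
x1 = rng.normal(size=40)
x2 = rng.normal(size=40)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=40)

r12 = np.corrcoef(x1, x2)[0, 1]
ry1 = np.corrcoef(y, x1)[0, 1]
ry2 = np.corrcoef(y, x2)[0, 1]
sy, s1, s2 = y.std(ddof=1), x1.std(ddof=1), x2.std(ddof=1)

b1 = (sy / s1) * (ry1 - r12 * ry2) / (1 - r12 ** 2)
b2 = (sy / s2) * (ry2 - r12 * ry1) / (1 - r12 ** 2)
b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()
print(b0, b1, b2)   # should match a direct least-squares fit
```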
12
Example
The regression equation is

    Y (Profit Margin) = 1.56 + 0.237 X1 (Revenue)
                             + 0.249 X2 (Office Space)

so b0 = 1.56, b1 = 0.237, and b2 = 0.249.
13
Explanatory Power of a Multiple Regression
Given the multiple regression model fitted by
least squares,

    yi = b0 + b1x1i + b2x2i + . . . + bKxKi + ei

where the bj's are the least-squares estimates of
the coefficients of the population regression
model and the ei's are the residuals from the
estimated regression model, the model variability
can be partitioned into the components

    SST = SSR + SSE

where the Total Sum of Squares is

    SST = Σ (yi − ȳ)²
14
the Error Sum of Squares is

    SSE = Σ ei²

and the Regression Sum of Squares is

    SSR = Σ (ŷi − ȳ)²

This decomposition can be interpreted as: total
sample variability equals explained variability
plus unexplained variability.
15
The coefficient of determination, R², of the
fitted regression is defined as the proportion of
the total sample variability explained by the
regression,

    R² = SSR / SST = 1 − SSE / SST

and it follows that 0 ≤ R² ≤ 1. The decomposition
can be computed directly, as sketched below.
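A short sketch of the sum-of-squares decomposition and R² on synthetic data (names and values illustrative):

```python
import numpy as np

# Fit a model, then decompose the variability of y.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, (30, 2))
y = 1.0 + x @ np.array([0.5, 1.5]) + rng.normal(0, 1, 30)

X = np.column_stack([np.ones(len(y)), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
sse = np.sum((y - y_hat) ** 2)          # error sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
r2 = ssr / sst                          # equals 1 - sse/sst
print(sst, ssr + sse, r2)               # sst == ssr + sse
```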
16
Estimation of Error
Given the population regression model

    yi = β0 + β1x1i + . . . + βKxKi + εi

and the standard regression assumptions, let σ²
denote the common variance of the error terms εi.
Then an unbiased estimate of that variance is

    se² = SSE / (n − K − 1) = Σ ei² / (n − K − 1)

The square root of the variance, se, is also
called the standard error of the estimate.
17
Adjusted R Square
The adjusted coefficient of determination, R̄²,
is defined as

    R̄² = 1 − [SSE / (n − K − 1)] / [SST / (n − 1)]

We use this measure to correct for the fact that
non-relevant independent variables still produce
some small reduction in the error sum of squares.
The adjusted R̄² therefore provides a better
comparison between multiple regression models
with different numbers of independent variables;
a sketch follows.
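A self-contained sketch of se and adjusted R², with illustrative sums of squares standing in for real regression output:

```python
import math

n, K = 30, 2            # observations, independent variables
sse, sst = 25.0, 400.0  # illustrative sums of squares

s_e = math.sqrt(sse / (n - K - 1))   # standard error of the estimate
r2 = 1 - sse / sst
r2_adj = 1 - (sse / (n - K - 1)) / (sst / (n - 1))  # adjusted R^2
print(s_e, r2, r2_adj)
```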
18
The regression equation is

    Y (Profit Margin) = 1.56 + 0.237 X1 (Revenue)
                             + 0.249 X2 (Office Space)

The regression output also reports R², the
standard error of the estimate se, SSR, and SSE.
19
Coefficient of Multiple Correlation
The coefficient of multiple correlation, R, is the
correlation between the predicted value and the
observed value of the dependent variable and is
equal to the square root of the multiple
coefficient of determination:

    R = r(ŷ, y) = √R²

We use R as another measure of the strength of the
linear relationship between the dependent variable
and the independent variables. It is therefore
comparable to the correlation between Y and X in
simple regression.
20
Basis for Inference
Let the population regression model be

    yi = β0 + β1x1i + . . . + βKxKi + εi

Let b0, b1, . . ., bK be the least-squares
estimates of the population parameters and
s_b0, s_b1, . . ., s_bK be the estimated standard
deviations of the least-squares estimators. Then,
if the standard regression assumptions hold and
the error terms εi are normally distributed, the
random variables

    t = (bj − βj) / s_bj

are distributed as Student's t with (n − K − 1)
degrees of freedom.
21
Confidence Interval for Partial Regression
Coefficients
  • If the regression errors εi are normally
    distributed and the standard regression
    assumptions hold, the 100(1 − α)% confidence
    intervals for the partial regression
    coefficients βj are given by

        bj − t(n−K−1, α/2) s_bj < βj < bj + t(n−K−1, α/2) s_bj

  • where t(n−K−1, α/2) is the number for which

        P( t(n−K−1) > t(n−K−1, α/2) ) = α/2

  • and the random variable t(n−K−1) follows a
    Student's t distribution with (n − K − 1) degrees
    of freedom. A computational sketch follows this
    list.
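A minimal sketch of such an interval with SciPy. Here b1 is the slide's 0.237; the standard error s_b1 is an assumed value chosen to reproduce the interval quoted on the following slides:

```python
from scipy import stats

n, K, alpha = 25, 2, 0.01
b1 = 0.237       # from the fitted regression
s_b1 = 0.0555    # assumed standard error (illustrative)

t_crit = stats.t.ppf(1 - alpha / 2, n - K - 1)   # ~2.819 for df = 22
lo, hi = b1 - t_crit * s_b1, b1 + t_crit * s_b1
print(round(lo, 3), round(hi, 3))                # ~ (0.081, 0.394)
```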

22
The regression equation is

    Y (Profit Margin) = 1.56 + 0.237 X1 (Revenue)
                             + 0.249 X2 (Office Space)

The output also reports the coefficients b1 and b2
together with their t statistics, t_b1 and t_b2.
23
Example
  • Use the results from the savings and loan
    regression model to obtain 99% confidence
    intervals for β1 and β2. (Here we work through
    β1.)
  • Step 1: write down all the relevant information:
    b1 = 0.237, its standard error s_b1, and
    t(n−K−1, α/2) = t(22, 0.005) = 2.819.
  • Step 2: substitute the information into the
    formula b1 ± t(22, 0.005) s_b1.
  • Step 3: simplify the upper and lower limits:
  • 0.081 < β1 < 0.394

24
Tests of Hypotheses
  • If the regression errors εi are normally
    distributed and the standard least-squares
    assumptions hold, the following tests have
    significance level α.
  • 1. To test either null hypothesis

        H0: βj = 0   or   H0: βj ≤ 0

  • against the alternative

        H1: βj > 0

  • the decision rule is: reject H0 if

        t = bj / s_bj > t(n−K−1, α)

25
  • 2. To test either null hypothesis

        H0: βj = 0   or   H0: βj ≥ 0

  • against the alternative

        H1: βj < 0

  • the decision rule is: reject H0 if

        t = bj / s_bj < −t(n−K−1, α)

26
  • 3. To test the null hypothesis

        H0: βj = 0

  • against the two-sided alternative

        H1: βj ≠ 0

  • the decision rule is: reject H0 if

        |t| = |bj / s_bj| > t(n−K−1, α/2)

    A computational sketch follows this list.
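The decision rules above, sketched with SciPy on illustrative numbers (b_j and s_bj would come from regression output):

```python
from scipy import stats

n, K, alpha = 25, 2, 0.05
b_j, s_bj = 0.237, 0.0555    # illustrative coefficient and std. error
df = n - K - 1

t_stat = b_j / s_bj
# one-sided test against H1: beta_j > 0
reject_upper = t_stat > stats.t.ppf(1 - alpha, df)
# two-sided test against H1: beta_j != 0
reject_two_sided = abs(t_stat) > stats.t.ppf(1 - alpha / 2, df)
print(t_stat, reject_upper, reject_two_sided)
```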

27
Example
  • Test whether total revenue has a significant
    positive effect on profit margin while
    controlling for the effect of the number of
    offices.
  • The computed t statistic for b1 exceeds the
    critical value, so we reject the null
    hypothesis: the data support a positive effect
    of revenue.

28
Test on All the Parameters of a Regression Model
  • Consider the multiple regression model

        yi = β0 + β1x1i + . . . + βKxKi + εi

  • To test the null hypothesis

        H0: β1 = β2 = . . . = βK = 0

  • against the alternative hypothesis

        H1: at least one βj ≠ 0

  • at a significance level α, we can use the
    decision rule: reject H0 if

        F = (SSR / K) / (SSE / (n − K − 1)) > F(K, n−K−1, α)

  • where F(K, n−K−1, α) is the critical value of F
    from Table 7 in the appendix, for which

        P( F(K, n−K−1) > F(K, n−K−1, α) ) = α

  • The computed F(K, n−K−1) statistic follows an F
    distribution with numerator degrees of freedom K
    and denominator degrees of freedom (n − K − 1).
    A sketch follows this list.
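A minimal sketch of the overall F test with SciPy (sums of squares are illustrative):

```python
from scipy import stats

n, K, alpha = 25, 2, 0.05
ssr, sse = 300.0, 100.0   # illustrative sums of squares

f_stat = (ssr / K) / (sse / (n - K - 1))     # MSR / MSE
f_crit = stats.f.ppf(1 - alpha, K, n - K - 1)
print(f_stat, f_crit, f_stat > f_crit)       # True => reject H0
```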

29
Test on a Subset of the Parameters
We compare the error sum of squares for the
complete model with the error sum of squares for
the restricted model.
1. Run a regression for the complete model, which
includes all the independent variables, and obtain
SSE.
2. Run a restricted regression that excludes the Z
variables, whose coefficients are the α's; the
number of variables excluded is r. From this
regression obtain the restricted error sum of
squares, SSE(r).
3. Then compute the F statistic

    F = [ (SSE(r) − SSE) / r ] / [ SSE / (n − K − 1) ]

where K is the total number of independent
variables in the complete model, and apply the
decision rule for a significance level α: reject
the null hypothesis that the excluded variables
have zero coefficients if F > F(r, n−K−1, α).
A sketch follows.
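A sketch of the comparison, assuming the two error sums of squares have already been obtained (all numbers illustrative):

```python
from scipy import stats

n, K, r, alpha = 25, 4, 2, 0.05   # K = vars in full model, r = excluded
sse_full, sse_restricted = 90.0, 130.0

f_stat = ((sse_restricted - sse_full) / r) / (sse_full / (n - K - 1))
f_crit = stats.f.ppf(1 - alpha, r, n - K - 1)
print(f_stat, f_crit, f_stat > f_crit)   # True => the excluded Z's matter
```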
30
Predictions from the Multiple Regression Models
Given that the population regression model

    yi = β0 + β1x1i + . . . + βKxKi + εi

holds and that the standard regression assumptions
are valid, let b0, b1, . . ., bK be the
least-squares estimates of the model coefficients
βj, j = 1, 2, . . ., K, based on the data points
(x1i, x2i, . . ., xKi, yi), i = 1, 2, . . ., n.
Then, given a new observation of a data point,
x1,n+1, x2,n+1, . . ., xK,n+1, the best linear
unbiased forecast of Yn+1 is

    ŷn+1 = b0 + b1x1,n+1 + b2x2,n+1 + . . . + bKxK,n+1

It is very risky to obtain forecasts based on X
values outside the range of the data used to
estimate the model coefficients, because we have
no data evidence to support the linear model at
those points. A sketch follows.
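A minimal forecasting sketch using the profit-margin coefficients quoted earlier; the new observation is illustrative:

```python
import numpy as np

b = np.array([1.56, 0.237, 0.249])   # b0, b1, b2 from the fitted model
x_new = np.array([1.0, 10.0, 5.0])   # [1, x1_new, x2_new], illustrative

y_hat = b @ x_new                    # point forecast for Y_{n+1}
print(y_hat)
# Caution: x_new should lie within the range of the estimation data;
# extrapolating beyond it is not supported by the fitted model.
```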
31
Quadratic Model
The quadratic function

    Y = β0 + β1X + β2X² + ε

can be transformed into a linear multiple
regression model by defining new variables

    z1 = X,   z2 = X²

and then specifying the model as

    yi = β0 + β1z1i + β2z2i + εi

which is linear in the transformed variables.
Transformed quadratic variables can be combined
with other variables in a multiple regression
model; thus we could fit a multiple quadratic
regression using transformed variables, as
sketched below.
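A sketch of the transformation on synthetic data:

```python
import numpy as np

# Fit a quadratic model as a linear regression in z1 = x, z2 = x^2.
rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, 40)
y = 2.0 + 1.0 * x + 0.5 * x**2 + rng.normal(0, 0.5, 40)

Z = np.column_stack([np.ones_like(x), x, x**2])  # transformed variables
b, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(b)   # estimates of beta0, beta1, beta2
```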
32
Exponential Model
Coefficients for exponential models of the form

    Y = β0 X1^β1 X2^β2 ε

can be estimated by first taking the logarithm of
both sides to obtain an equation that is linear in
the logarithms of the variables:

    log Y = log β0 + β1 log X1 + β2 log X2 + log ε

Using this form we can regress the logarithm of Y
on the logarithms of the two X variables and
obtain estimates of the coefficients β1 and β2
directly from the regression analysis. Note that
this estimation procedure requires that the random
errors be multiplicative in the original
exponential model. The error term, ε, is therefore
expressed as a percentage increase or decrease
instead of the addition or subtraction of a random
error, as we have seen for linear regression
models. A sketch follows.
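A log-log estimation sketch on synthetic data with a multiplicative error:

```python
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.uniform(1, 10, 60)
x2 = rng.uniform(1, 10, 60)
eps = np.exp(rng.normal(0, 0.1, 60))      # multiplicative error
y = 3.0 * x1**0.7 * x2**0.2 * eps

X = np.column_stack([np.ones(60), np.log(x1), np.log(x2)])
b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(np.exp(b[0]), b[1], b[2])           # beta0 via exp(intercept)
```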
33
Dummy Variables
The relationship between Y and X1 can shift in
response to a changed condition. The shift effect
can be estimated by using a dummy variable, which
takes the value 0 (condition not present) or
1 (condition present). All of the observations
from one set of data have dummy variable x2 = 1,
and the observations from the other set have
x2 = 0. In these cases the relationship between Y
and X1 is specified by the regression model

    yi = β0 + β1x1i + β2x2i + εi
34
The functions for the two sets of points are

    ŷ = b0 + b1x1             (when x2 = 0)
    ŷ = (b0 + b2) + b1x1      (when x2 = 1)

In the first function the constant is b0, while in
the second the constant is b0 + b2. Dummy
variables are also called indicator variables.
A sketch follows.
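A sketch of an intercept shift estimated with a dummy variable (synthetic data):

```python
import numpy as np

rng = np.random.default_rng(5)
x1 = rng.uniform(0, 10, 50)
x2 = (rng.uniform(size=50) > 0.5).astype(float)  # 0/1 condition indicator
y = 1.0 + 0.8 * x1 + 2.0 * x2 + rng.normal(0, 0.5, 50)

X = np.column_stack([np.ones(50), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
# intercept is b0 when x2 = 0 and (b0 + b2) when x2 = 1
print(b0, b1, b2)
```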
35
Differences in slope between the two subgroups can
be modeled by adding an interaction variable,
x1x2, giving the model

    yi = β0 + β1x1i + β2x2i + β3(x1i x2i) + εi
36
The resulting regression model is now linear with
three variables. The new variable x1x2 is often
called an interaction variable. Note that when the
dummy variable x2 = 0 this variable has the value
0, but when x2 = 1 it has the value x1. The
coefficient b3 is an estimate of the difference in
the coefficient of x1 when x2 = 1 compared to when
x2 = 0. Thus the t statistic for b3 can be used to
test the hypothesis

    H0: β3 = 0   versus   H1: β3 ≠ 0

If we reject the null hypothesis, we conclude that
there is a difference in the slope coefficient
between the two subgroups. In many cases we will
be interested in both the difference in the
constant and the difference in the slope, and will
test both of the hypotheses presented in this
section. A sketch follows.
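A sketch fitting the interaction model and testing H0: β3 = 0 on synthetic data; standard errors are computed from s²(X'X)⁻¹ under the standard assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 60
x1 = rng.uniform(0, 10, n)
x2 = (rng.uniform(size=n) > 0.5).astype(float)
y = 1.0 + 0.5 * x1 + 1.0 * x2 + 0.4 * x1 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ b
s2 = resid @ resid / (n - X.shape[1])            # error-variance estimate
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

t3 = b[3] / se[3]                                # t statistic for b3
p = 2 * stats.t.sf(abs(t3), n - X.shape[1])      # two-sided p-value
print(b[3], t3, p)
```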