1
Multiple Regression
2
In the previous section, we examined simple
regression, which has just one independent
variable on the right side of the equation.
In this section, we consider multiple regression,
in which there are two or more independent
variables on the right side of the equation.
3
Simple Regression / Multiple Regression
True Relation:
Yi = α + βXi + εi
Yi = α + β1X1i + β2X2i + ... + βkXki + εi
Estimated Relation:
Yi = a + bXi + ei
Yi = a + b1X1i + b2X2i + ... + bkXki + ei
The number of X's (independent variables) will
be denoted as k. We are estimating k + 1
parameters: the k β's and the constant α.
4
We have similar assumptions to the ones we used
in simple regression. The assumptions are:
  • The Y values are independent of each other.
  • The conditional distributions of Y given the Xs
    are normal.
  • The conditional standard deviations of Y given
    the Xs are equal for all values of the Xs.

5
We continue to use OLS (ordinary least squares).
It is much more difficult to do multiple
regression with a hand calculator than simple
regression is, but computer programs perform it
very easily and quickly.
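As a sketch of what those programs do, here is a minimal multiple-regression fit using numpy's least-squares routine. The data are hypothetical and chosen so the true relation is recovered exactly:

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 independent variables.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = 1.0 + 2.0 * X1 + 3.0 * X2          # exact relation, so residuals are ~0

# Design matrix: a leading column of ones estimates the constant a.
A = np.column_stack([np.ones_like(X1), X1, X2])

# OLS estimates (a, b1, b2) minimize the sum of squared residuals.
a, b1, b2 = np.linalg.lstsq(A, Y, rcond=None)[0]
```

With real data the residuals would not vanish, and the same design-matrix approach extends to any number k of X's.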
6
As in simple regression, we have
7
The standard error of the regression (also called
the standard error of the estimate) is
se = sqrt( SSE / (n - k - 1) )
In simple regression there was only one X, so k
was 1 and our denominator was (n - 2). Here the
denominator generalizes to (n - k - 1).
8
The Regression ANOVA Table is now

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square
Regression            SSR              k                    MSR = SSR/k
Error                 SSE              n - k - 1            MSE = SSE/(n - k - 1)
Total                 SST              n - 1                MST = SST/(n - 1)
9
The hypotheses for testing the overall
significance of the regression are
H0: β1 = β2 = ... = βk = 0 (all the slope coefficients are zero)
H1: at least one of the β's is not zero.
The statistic for the test is F = MSR/MSE, with k
and (n - k - 1) degrees of freedom.
10
We can also test whether a particular coefficient
βj is zero (or any other specified value), using
the t-statistic
t = (bj - βj0) / sbj, with (n - k - 1) degrees of freedom,
where βj0 is the value specified under H0.
The calculation of sbj is very messy, but sbj is
always given on computer output. We can do
one-tailed and two-tailed tests.
11
Coefficient of determination: R² = SSR/SST
R² adjusted (corrected) for degrees of freedom:
adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) = 1 - MSE/MST
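Both quantities follow directly from the ANOVA sums of squares; a minimal pure-Python sketch with hypothetical numbers:

```python
# Hypothetical regression with n = 30 observations and k = 2 X's.
SSR, SSE = 75.0, 25.0
SST = SSR + SSE
n, k = 30, 2

r_squared = SSR / SST                                      # fraction of variation explained
adj_r_squared = 1 - (SSE / (n - k - 1)) / (SST / (n - 1))  # penalizes extra X's
```

The adjusted version is always at most R² and can even be negative for a poor fit with many X's.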
12
Dummy Variables
Dummy variables enable us to explore the effects
of qualitative rather than quantitative factors.
Side note: Cross-sectional data provide
information on a number of households,
individuals, firms, etc. at a particular point in
time. Time-series data provide information on a
particular household, firm, etc. at various
points in time. Suppose, for example, we have
cross-sectional data on income. Dummy variables
can give us an understanding of how race, gender,
or residence in an urban area can affect income.
If we have time-series data on expenditures,
dummy variables can tell us about seasonal
effects.
13
To capture the effects of a factor that has m
categories, you need m - 1 dummy variables.
Here are some examples.
Gender: You are examining SAT scores. Since
there are 2 gender categories, you need 1 gender
variable to capture the effect of gender. If you
include a variable that is 1 for male
observations and 0 for females, the coefficient
on that variable tells how male scores compare to
female scores. In this case, female is the
reference category.
Race: You are examining salaries and you have
data for 4 races: white, black, Asian, and Native
American. You only need 3 dummy variables. You
might define a variable that is 1 for blacks and
0 for non-blacks, a 2nd variable that is 1 for
Asians and 0 for non-Asians, and a 3rd variable
that is 1 for Native Americans and 0 for
non-Native Americans. Then white would be the
reference category and the coefficients of the 3
race variables would tell how salaries for those
groups compare to salaries for whites.
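The m - 1 coding in the race example can be sketched as follows (the helper name and category labels are just for illustration):

```python
# The m - 1 rule from the race example: 4 categories need 3 dummies,
# with "white" as the reference category (all three dummies zero).
def race_dummies(race):
    categories = ["black", "asian", "native_american"]  # "white" omitted: reference
    return {c: 1 if race == c else 0 for c in categories}
```

A "white" observation gets all zeros, so its salary is captured entirely by the constant and the other regressors.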
14
Coefficient interpretation example: You have
estimated the regression
SALARY = 10 + 1.0 EDUC + 2.0 EXP - 5.0 FEMALE
where SALARY is measured in thousands of dollars,
EDUC and EXP are education and experience, each
measured in years, and FEMALE is a dummy variable
equal to 1 for females and 0 for males. The
coefficients of the variables would be
interpreted as follows.
If there are two people with the same experience
and gender, and one has 1 more unit of education
(in this case, a year), that person would be
expected to have a salary that is 1.0 units
higher (in this case, 1.0 thousand dollars
higher). If there are two people with the same
education and gender, and one has 1 more year of
experience, that person would be expected to have
a salary that is 2.0 thousand dollars higher. If
there are two people with the same education and
experience, and one is male and one is female,
the female is expected to have a salary that is
5.0 thousand dollars less.
15
Consider 4 people with the following
characteristics.

education   experience   female   salary
10          5            0
11          5            0
11          6            0
11          6            1
16
Consider 4 people with the following
characteristics.

education   experience   female   constant   1.0xEDUC   2.0xEXP   -5.0xFEMALE   salary
10          5            0        10         10         10        0             30
11          5            0
11          6            0
11          6            1
17
Consider 4 people with the following
characteristics.

education   experience   female   constant   1.0xEDUC   2.0xEXP   -5.0xFEMALE   salary
10          5            0        10         10         10        0             30
11          5            0        10         11         10        0             31
11          6            0
11          6            1

If two people have the same experience and
gender, the one with one more year of education
would be expected to earn 1.0 thousand dollars
more.
18
Consider 4 people with the following
characteristics.

education   experience   female   constant   1.0xEDUC   2.0xEXP   -5.0xFEMALE   salary
10          5            0        10         10         10        0             30
11          5            0        10         11         10        0             31
11          6            0        10         11         12        0             33
11          6            1

If two people have the same education and gender,
the one with one more year of experience would be
expected to earn 2.0 thousand dollars more.
19
Consider 4 people with the following
characteristics.

education   experience   female   constant   1.0xEDUC   2.0xEXP   -5.0xFEMALE   salary
10          5            0        10         10         10        0             30
11          5            0        10         11         10        0             31
11          6            0        10         11         12        0             33
11          6            1        10         11         12        -5            28

If two people have the same education and
experience, the female would be expected to earn
5.0 thousand dollars less than the male.
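The salary column above can be reproduced by coding the estimated relation directly; the constant 10 is implied by the worked rows, and the helper name is just for illustration:

```python
# Estimated relation from the example (salary in thousands of dollars):
# SALARY = 10 + 1.0*EDUC + 2.0*EXP - 5.0*FEMALE
def predicted_salary(educ, exp, female):
    return 10 + 1.0 * educ + 2.0 * exp - 5.0 * female

# The four people from the table: (education, experience, female).
people = [(10, 5, 0), (11, 5, 0), (11, 6, 0), (11, 6, 1)]
salaries = [predicted_salary(*p) for p in people]
```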
20
Suppose you have regression results based on
quarterly data for a particular household.
SPENDING and INCOME are in thousands of dollars.
WINTER equals 1 if the quarter is winter and 0 if
it is fall, spring, or summer. SPRING is 1 if the
quarter is spring and 0 otherwise. SUMMER is 1 if
the quarter is summer and 0 otherwise. Fall is
the reference category.
Suppose household income is 10 thousand dollars
for all 4 quarters of a particular year. In the
fall, spending would be expected to be 17
thousand dollars. In the spring, spending would
be expected to be 2.0 thousand dollars higher
than in the fall, or 19 thousand dollars. In the
winter, spending would be expected to be 3.0
thousand dollars higher than in the fall, or 20
thousand dollars. In the summer, spending would
be expected to be 1.0 thousand dollars less than
in the fall, or 16 thousand dollars.
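The seasonal reasoning above amounts to adding each quarter's dummy coefficient to the fall baseline; a minimal sketch with the numbers from the example:

```python
# Fall is the reference quarter; expected fall spending is 17 (thousand
# dollars) at the given income. Seasonal dummy coefficients from the example:
fall_spending = 17.0
seasonal_effect = {"fall": 0.0, "winter": 3.0, "spring": 2.0, "summer": -1.0}

expected_spending = {q: fall_spending + c for q, c in seasonal_effect.items()}
```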
21
Example You have run a regression with 30
observations. The dependent variable, WGT, is
weight measured in pounds. The independent
variables are HGT, height measured in inches and
a dummy variable, MALE, which is 1 if the person
is male and 0 if the person is female. The
results are as shown below. Answer the questions
that follow.
source of variation   sum of squares   degrees of freedom   mean square
regression            25,414.01        2                    12,707.01
error                 8,573.80         27                   317.55
total                 33,987.81        29                   1,171.99

variable   estimated coefficient   estimated std. error
CONSTANT   -160.129                50.285
HGT        4.378                   1.103
MALE       27.478                  9.520
22
1. Interpret the HGT coefficient.
If there are 2 people of the same gender and one
is an inch taller than the other, the taller one
is expected to weigh 4.378 pounds more.
23
2. Interpret the MALE coefficient.
If there are 2 people of the same height, and one
is male and one is female, the male is expected
to weigh 27.478 pounds more.
24
3. Calculate and interpret the coefficient of
determination R². Also calculate the adjusted R².
R² = SSR/SST = 25,414.01/33,987.81 ≈ 0.748, so
about 75% of the variation in weight is explained
by the regression on height and gender.
Adjusted R² = 1 - (8,573.80/27)/(33,987.81/29) ≈ 0.729.
25
4. Test at the 5% level whether the HGT
coefficient is greater than zero. (Note that this
is the alternative hypothesis.)
The test statistic is t = 4.378/1.103 ≈ 3.97.
From our t table, we see that for 27 degrees of
freedom and a one-tailed 5% critical region, our
critical value is 1.703. Since 3.97 exceeds
1.703, we reject H0 and accept H1: the HGT
coefficient is greater than zero.
26
5. Test at the 1% level whether the MALE
coefficient is different from zero. (Note that
this is the alternative hypothesis.)
The test statistic is t = 27.478/9.520 ≈ 2.89.
From our t table, we see that for 27 degrees of
freedom and a two-tailed 1% critical region, our
critical values are 2.771 and -2.771. Since 2.89
exceeds 2.771, we reject H0 and accept H1: the
MALE coefficient is different from zero.
27
6. Test the overall significance of the
regression at the 1% level.
The test statistic is F = MSR/MSE =
12,707.01/317.55 ≈ 40.02. From our F table, we
see that for 2 and 27 degrees of freedom and a 1%
critical region, our critical value is 5.49.
Since 40.02 exceeds 5.49, we reject H0 and accept
H1: at least one of the slope coefficients is not
zero.
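The statistics used in questions 3 through 6 can all be reproduced from the ANOVA and coefficient tables (n = 30, k = 2):

```python
# Sums of squares and coefficient estimates from the tables above.
SSR, SSE, SST = 25414.01, 8573.80, 33987.81
n, k = 30, 2
MSR, MSE = SSR / k, SSE / (n - k - 1)

r2 = SSR / SST                       # question 3: ~0.748
adj_r2 = 1 - MSE / (SST / (n - 1))   # question 3: ~0.729
t_hgt = 4.378 / 1.103                # question 4: ~3.97, vs critical value 1.703
t_male = 27.478 / 9.520              # question 5: ~2.89, vs critical value 2.771
F = MSR / MSE                        # question 6: ~40.02, vs critical value 5.49
```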
28
Multicollinearity Problem
Multicollinearity arises when independent
variables (X's) are highly correlated. It is then
not possible to separate the effects of these
variables on the dependent variable Y. The slope
coefficient estimates will tend to be unreliable,
and often are not significantly different from
zero. The simplest solution is to delete one of
the correlated variables.
29
Example: You are exploring the factors
influencing the number of children that a couple
has.
You have included as X's the mother's education
and the father's education. You find that neither
appears to be statistically significantly
different from zero. This may occur because the
two education variables are highly correlated.
One option is to include the education of only
one parent. Alternatively, you could replace the
two education variables with a single variable,
such as the average or total education of the
parents.
30
Problem of Autocorrelation or Serial Correlation
This is a problem that may arise in time-series
data, but generally not in cross-sectional data.
It occurs when successive observations of the
dependent variable Y are not independent of each
other. For example, if you are examining the
weight of a particular person over time and that
weight is particularly high in one period, it is
likely to be high in the next period as well.
As a result, the residuals tend to be correlated
among themselves (autocorrelated) rather than
independent.
31
You can test for autocorrelation using the
Durbin-Watson statistic
d = Σ (e_t - e_{t-1})² / Σ e_t²
where the e_t are the regression residuals, the
numerator sum runs from t = 2 to n, and the
denominator sum runs from t = 1 to n.
The Durbin-Watson statistic d is always between 0
and 4. When there is extreme negative
autocorrelation, d will be near 4. When there is
extreme positive autocorrelation, d will be near
0. When there is no problem of autocorrelation, d
will be near 2. In many computer statistical
packages you can request that the Durbin-Watson
statistic be provided as output. You can look up
critical values in a table that then allows you
to determine if you have an autocorrelation
problem.
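The statistic itself is easy to compute from a list of residuals; a minimal pure-Python sketch (the function name is illustrative):

```python
# Durbin-Watson statistic:
# d = sum_{t=2..n}(e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2
def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den
```

Constant residuals (extreme positive autocorrelation) give d = 0, while strictly alternating residuals (extreme negative autocorrelation) push d toward 4.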
32
The Durbin-Watson table provides two numbers, dL
and dU, corresponding to the number of
observations n and the number of explanatory
variables k (the X's).
Your textbook provides one-tailed values, so you
can test for positive autocorrelation or for
negative autocorrelation, but not both at the
same time. The null hypothesis is that there is
no autocorrelation. The values dL and dU divide
the range of d into regions indicative of
positive autocorrelation, negative
autocorrelation, no autocorrelation, or an
inconclusive result, as the example below shows.
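The one-tailed decision rule for positive autocorrelation described above can be sketched as follows (the function name and return strings are illustrative):

```python
# One-tailed Durbin-Watson test for positive autocorrelation:
# d < dL  -> reject H0 (evidence of positive autocorrelation)
# d > dU  -> do not reject H0
# dL <= d <= dU -> inconclusive
def positive_autocorrelation_test(d, dL, dU):
    if d < dL:
        return "reject H0: positive autocorrelation"
    if d > dU:
        return "do not reject H0"
    return "inconclusive"
```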
33
Example: You have run a time-series regression
with 25 observations and 4 independent variables.
Your Durbin-Watson statistic is d = 0.70. Test at
the 1% level whether you have a positive
autocorrelation problem.
The Durbin-Watson table indicates that for 25
observations and 4 independent variables,
dL = 0.83 and dU = 1.52. This implies the
following regions: positive autocorrelation for
d < 0.83, inconclusive for 0.83 to 1.52, no
autocorrelation for 1.52 to 2.48, inconclusive
for 2.48 to 3.17, and negative autocorrelation
for d > 3.17.
Since d = 0.70 < dL = 0.83, you reject H0 (no
autocorrelation) and accept H1: there is a
positive autocorrelation problem. There are
techniques for handling autocorrelation problems,
but they are beyond the scope of this course.