Title: Multiple Regression: Fitting Models for Multiple Independent Variables
If you wanted to predict someone's height based on their weight, you would collect data by recording heights and weights and fit a model. Let's say our population is males ages 16-25, and this is a table of collected data...
Next, we graph the data, and because the data looks linear, we fit a least-squares regression (LSR) line.
But weight isn't the only factor that has an impact on someone's height. The height of someone's parents may be another predictor. With multiple regression you may have more than one independent variable, so you could use someone's weight and his parents' height to predict his own height.
Our new table, with the original data plus the average height of each subject's parents, looks like this...
This data can't be graphed like simple linear regression, because there are two independent variables. There is software, however, such as Minitab, that can analyze data with multiple independent variables. Let's take a look at a Minitab output for our data.
Predictor     Coef      Stdev     t-ratio     p
Constant      25.028    4.326     5.79        0.000
weight        0.24020   0.03140   7.65        0.000
parenth       0.11493   0.09035   1.27        0.227

s = 1.165    R-sq = 92.6%    R-sq(adj) = 91.4%

Analysis of Variance

SOURCE        DF    SS        MS        F        p
Regression     2    205.31    102.65    75.62    0.000
Error         12     16.29      1.36
Total         14    221.60

What does all this mean?
First, let's look at the multiple regression model. The general model for multiple regression is similar to the model for simple linear regression.

Simple linear regression model: y = β0 + β1x + ε
Multiple regression model: y = β0 + β1x1 + β2x2 + ... + βkxk + ε
Just like linear regression, when you fit a multiple regression to data, the terms in the model equation are statistics, not parameters. A multiple regression model using statistical notation looks like

ŷ = b0 + b1x1 + b2x2 + ... + bkxk

where k is the number of independent variables.
The multiple regression model for our data is

predicted height = 25.028 + 0.24020(weight) + 0.11493(parenth)

We get the coefficient values from the Minitab output:

Predictor     Coef      Stdev     t-ratio     p
Constant      25.028    4.326     5.79        0.000
weight        0.24020   0.03140   7.65        0.000
parenth       0.11493   0.09035   1.27        0.227
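As a quick check of how the fitted equation is used, here is a minimal Python sketch. The coefficients are taken from the Minitab output above; the example subject's weight and parents' average height are hypothetical, and the units (pounds and inches) are assumed.

# Fitted model from the Minitab output:
# predicted height = 25.028 + 0.24020*(weight) + 0.11493*(parenth)

def predict_height(weight, parenth):
    """Predict a subject's height from his weight and the average
    height of his parents (parenth), using the fitted coefficients."""
    return 25.028 + 0.24020 * weight + 0.11493 * parenth

# Hypothetical subject: 150 lb, parents' average height 68 in.
print(predict_height(150, 68))  # about 68.9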
Once the regression is fitted, we need to know how well the model fits the data.
- First, we check to see if there is a good overall fit.
- Then, we test the significance of each independent variable. You will notice that this is the same way we test for significance in simple linear regression.
The Overall Test

Hypotheses:
H0: β1 = β2 = ... = βk = 0 (all independent variables are unimportant for predicting y)
HA: at least one βj ≠ 0 (at least one independent variable is useful for predicting y)
What type of test should be used? The distribution used is the F distribution, named for R. A. Fisher. The F-statistic is used with this distribution.
How do you calculate the F-statistic? It can easily be found in the Minitab output, along with the p-value:

SOURCE        DF    SS        MS        F        p
Regression     2    205.31    102.65    75.62    0.000
Error         12     16.29      1.36
Total         14    221.60

Or you can calculate it by hand.
But before you can calculate the F-statistic, you need to be introduced to some other terms.
- Regression sum of squares (regression SS): the variation in Y accounted for by the regression model with respect to the mean model.
- Error sum of squares (error SS): the variation in Y not accounted for by the regression model.
- Total sum of squares (total SS): the total variation in Y.
Now that we understand these terms, we need to know how to calculate them (sums taken over all n observations):

Regression SS = Σ(ŷi - ȳ)²
Error SS = Σ(yi - ŷi)²
Total SS = Σ(yi - ȳ)²

and Total SS = Regression SS + Error SS.
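To make the calculation concrete, here is a minimal Python sketch of the three sums of squares. The data arrays are hypothetical stand-ins, since the slides' original table is not reproduced here.

import numpy as np

# Hypothetical stand-in data (the slides' original table is not reproduced)
weight  = np.array([130.0, 145.0, 160.0, 150.0, 170.0, 140.0])
parenth = np.array([ 66.0,  67.5,  70.0,  68.0,  71.0,  66.5])
height  = np.array([ 64.0,  66.5,  70.0,  67.0,  71.5,  65.0])

# Least-squares fit: height = b0 + b1*weight + b2*parenth
X = np.column_stack([np.ones_like(weight), weight, parenth])
b, *_ = np.linalg.lstsq(X, height, rcond=None)
y_hat = X @ b
y_bar = height.mean()

regression_ss = np.sum((y_hat - y_bar) ** 2)   # variation explained by the model
error_ss      = np.sum((height - y_hat) ** 2)  # variation left unexplained
total_ss      = np.sum((height - y_bar) ** 2)  # total variation in Y

# For a least-squares fit with an intercept, Total SS = Regression SS + Error SS
print(regression_ss + error_ss, total_ss)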
There are also regression mean of squares, error mean of squares, and total mean of squares (abbreviated MS). To calculate these, you divide each sum of squares by its respective degrees of freedom:

Regression d.f. = k
Error d.f. = n - k - 1
Total d.f. = n - 1

where k is the number of independent variables and n is the total number of observations used to calculate the regression. So:

Regression MS = Regression SS / k
Error MS = Error SS / (n - k - 1)
Total MS = Total SS / (n - 1)
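For our data, k = 2 independent variables and n = 15 observations, so Regression MS = 205.31 / 2 ≈ 102.65 and Error MS = 16.29 / 12 ≈ 1.36, matching the Minitab output.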
Both sum of squares and mean square values can be found in the Minitab output:

SOURCE        DF    SS        MS        F        p
Regression     2    205.31    102.65    75.62    0.000
Error         12     16.29      1.36
Total         14    221.60

Now we can calculate the F-statistic.
Test Statistic and Distribution

F = model mean square / error mean square = 102.65 / 1.36 ≈ 75.48

which is very close to the F-statistic from Minitab (75.62); the small difference comes from rounding the mean squares.
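The upper-tail area of the F distribution with (k, n - k - 1) = (2, 12) degrees of freedom gives the p-value. A minimal Python sketch, assuming scipy is available:

from scipy import stats

f_stat = 102.65 / 1.36                       # Regression MS / Error MS
p_value = stats.f.sf(f_stat, dfn=2, dfd=12)  # upper-tail area of the F distribution
print(f_stat, p_value)                       # about 75.48, p effectively 0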
The p-value for the F-statistic is then found in an F-distribution table. As you saw before, it can also be easily calculated by software. A small p-value rejects the null hypothesis that none of the independent variables are significant; that is to say, at least one of the independent variables is significant.
The conclusion in the context of our data is: we have strong evidence (p ≈ 0) to reject the null hypothesis. That is to say, either someone's weight or his parents' average height is significant in predicting his height. Once you know that at least one independent variable is significant, you can go on to test each independent variable separately.
Testing Individual Terms

If an independent variable does not contribute significantly to predicting the value of Y, the coefficient of that variable will be 0. The test of these hypotheses determines whether the estimated coefficient is significantly different from 0. From this, we can tell whether an independent variable is important for predicting the dependent variable.
Test for Individual Terms

H0: βj = 0 (the independent variable xj is not important for predicting y)
HA: βj ≠ 0 (the independent variable xj is important for predicting y)

where j represents a specified independent variable.
Test Statistic: t = bj / SE(bj), with d.f. = n - k - 1

Remember, this test is only to be performed if the overall test of the model is significant.
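For example, for the weight term, t = 0.24020 / 0.03140 ≈ 7.65 with 15 - 2 - 1 = 12 degrees of freedom, which matches the Minitab t-ratio. A minimal Python sketch of the two-sided p-value, assuming scipy is available:

from scipy import stats

t_stat = 0.24020 / 0.03140                 # coefficient divided by its standard error
df = 15 - 2 - 1                            # n - k - 1
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value
print(t_stat, p_value)                     # about 7.65, p effectively 0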
The t-distribution is used. Tests of individual terms for significance are the same as tests of significance in simple linear regression.
A small p-value means that the independent variable is significant.

Predictor     Coef      Stdev     t-ratio     p
Constant      25.028    4.326     5.79        0.000
weight        0.24020   0.03140   7.65        0.000
parenth       0.11493   0.09035   1.27        0.227

This test of significance shows that weight is a significant independent variable for predicting height, but average parent height is not.
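If you want to reproduce the whole Minitab analysis in other software, here is a minimal sketch using Python's statsmodels. The data arrays are hypothetical stand-ins, since the slides' original 15-row table is not reproduced here.

import numpy as np
import statsmodels.api as sm

# Hypothetical stand-in data (15 observations, like the original table)
weight  = np.array([125, 135, 150, 160, 145, 170, 155, 140, 165, 130,
                    175, 148, 158, 142, 168], dtype=float)
parenth = np.array([ 65,  66,  68,  70,  67,  71,  69,  66,  70,  65,
                     72,  68,  69,  67,  71], dtype=float)
height  = np.array([ 63,  65,  68,  70,  66,  72,  69,  65,  70,  64,
                     73,  67,  69,  66,  71], dtype=float)

# Ordinary least squares: height ~ weight + parenth (with an intercept)
X = sm.add_constant(np.column_stack([weight, parenth]))
model = sm.OLS(height, X).fit()

# The summary reports the same quantities as the Minitab output:
# coefficients, standard errors, t-ratios with p-values, R-sq,
# and the overall F-statistic with its p-value.
print(model.summary())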
Now that you know how to do tests of significance for multiple regression, there are many other things that you can learn, such as:
- How to create confidence intervals
- How to use categorical variables in multiple regression
- How to test for significance in groups of independent variables