Multiple Regression: Fitting Models for Multiple Independent Variables

Transcript and Presenter's Notes
1
Multiple Regression: Fitting Models for Multiple Independent Variables
  • By Ellen Ludlow

2
If you wanted to predict someone's weight based on their height, you would collect data by recording heights and weights and fit a model. Let's say our population is males ages 16-25, and this is a table of the collected data...
3
Next, we graph the data... and because the data looks linear, we fit an LSR (least-squares regression) line.
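Since the transcript only preserves the deck's text at this step, here is a minimal sketch of fitting an LSR line in Python; the height and weight arrays are hypothetical placeholders, not the deck's actual data table.

```python
import numpy as np

# Hypothetical height (inches) and weight (pounds) observations;
# the deck's real data table is not reproduced in this transcript.
height = np.array([66, 68, 69, 70, 71, 72, 73, 74])
weight = np.array([150, 155, 163, 168, 171, 180, 186, 190])

# np.polyfit with degree 1 returns the least-squares slope and intercept.
slope, intercept = np.polyfit(height, weight, 1)
print(f"LSR line: weight = {intercept:.2f} + {slope:.2f} * height")
```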
4
But weight isn't the only factor that has an impact on someone's height. The height of someone's parents may be another predictor. With multiple regression you may have more than one independent variable, so you could use someone's weight and his parents' height to predict his own height. (Notice that the model has flipped: height is now the response variable.)
5
Our new table, which adds the average height of each subject's parents, looks like this
6
This data can't be graphed like simple linear regression, because there are two independent variables. There is software, however, such as Minitab, that can analyze data with multiple independent variables. Let's take a look at a Minitab output for our data
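Minitab is only one option. As an illustrative sketch, the same kind of model can be fit in Python with statsmodels; the arrays below are hypothetical stand-ins for the deck's data table, not its actual values.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical stand-ins for the deck's data table:
# weight (lb), average parent height (in), and subject height (in).
weight  = np.array([150, 155, 163, 168, 171, 180, 186, 190, 158, 175])
parenth = np.array([ 67,  68,  66,  70,  69,  71,  72,  70,  68,  69])
height  = np.array([ 66,  68,  67,  70,  70,  72,  73,  72,  68,  70])

# Stack the two independent variables and add an intercept column.
X = sm.add_constant(np.column_stack([weight, parenth]))
model = sm.OLS(height, X).fit()

# The summary reports coefficients, standard errors, t-ratios, R-squared,
# and the overall F-statistic, much like the Minitab output that follows.
print(model.summary())
```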
7
Predictor     Coef       Stdev      t-ratio    p
Constant      25.028     4.326      5.79       0.000
weight        0.24020    0.03140    7.65       0.000
parenth       0.11493    0.09035    1.27       0.227

s = 1.165    R-sq = 92.6%    R-sq(adj) = 91.4%

Analysis of Variance

SOURCE        DF    SS        MS        F        p
Regression     2    205.31    102.65    75.62    0.000
Error         12     16.29      1.36
Total         14    221.60
What does all this mean?
8
First, let's look at the multiple regression model. The general model for multiple regression is similar to the model for simple linear regression.

Simple linear regression model: y = β₀ + β₁x + ε

Multiple regression model: y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + ε
9
Just like linear regression, when you fit a multiple regression to data, the terms in the model equation are statistics, not parameters. A multiple regression model using statistical notation looks like

ŷ = b₀ + b₁x₁ + b₂x₂ + … + bₖxₖ

where k is the number of independent variables.
10
The multiple regression model for our data is

predicted height = 25.028 + 0.24020(weight) + 0.11493(parenth)

We get the coefficient values from the Minitab output:

Predictor     Coef       Stdev      t-ratio    p
Constant      25.028     4.326      5.79       0.000
weight        0.24020    0.03140    7.65       0.000
parenth       0.11493    0.09035    1.27       0.227
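As a quick illustration, plugging values into the fitted equation (coefficients from the Minitab output above; the example inputs are made up):

```python
def predict_height(weight: float, parenth: float) -> float:
    """Fitted model from the Minitab output:
    height = 25.028 + 0.24020*weight + 0.11493*parenth."""
    return 25.028 + 0.24020 * weight + 0.11493 * parenth

# Hypothetical subject: 170 lb, parents' average height 69 in.
print(predict_height(170, 69))  # about 73.79 inches
```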
11
Once the regression is fitted, we need to know how well the model fits the data.
  • First, we check whether there is a good overall fit.
  • Then, we test the significance of each independent variable. You will notice that this is the same way we test for significance in simple linear regression.

12
The Overall Test

Hypotheses:

H₀: β₁ = β₂ = … = βₖ = 0 (all independent variables are unimportant for predicting y)

Hₐ: at least one βⱼ ≠ 0 (at least one independent variable is useful for predicting y)
13
What type of test should be used? The distribution used is called the Fisher (F) distribution, and the F-statistic is used with this distribution.

[Figure: the F-distribution]
14
How do you calculate the F-statistic? It can easily be found in the Minitab output, along with the p-value:

SOURCE        DF    SS        MS        F        p
Regression     2    205.31    102.65    75.62    0.000
Error         12     16.29      1.36
Total         14    221.60

Or you can calculate it by hand.
15
But before you can calculate the F-statistic, you need to be introduced to some other terms.

Regression sum of squares (regression SS) - the variation in Y accounted for by the regression model with respect to the mean model.

Error sum of squares (error SS) - the variation in Y not accounted for by the regression model.

Total sum of squares (total SS) - the total variation in Y.
16
Now that we understand these terms, we need to know how to calculate them:

Regression SS = Σ(ŷᵢ − ȳ)²
Error SS = Σ(yᵢ − ŷᵢ)²
Total SS = Σ(yᵢ − ȳ)²

and the three are related by Total SS = Regression SS + Error SS.
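A small numerical sketch of these quantities (hypothetical data, fit with least squares so the Total SS = Regression SS + Error SS identity holds exactly):

```python
import numpy as np

# Hypothetical data; fit a least-squares line to get fitted values.
x = np.array([150.0, 155.0, 163.0, 168.0, 171.0])
y = np.array([66.0, 67.0, 68.0, 70.0, 71.0])
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x
y_bar = y.mean()

regression_ss = np.sum((y_hat - y_bar) ** 2)  # variation explained by the model
error_ss      = np.sum((y - y_hat) ** 2)      # variation left unexplained
total_ss      = np.sum((y - y_bar) ** 2)      # total variation in Y

# For a least-squares fit, Total SS = Regression SS + Error SS.
print(regression_ss + error_ss, total_ss)
```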
17
There are also regression mean squares, error mean squares, and total mean squares (abbreviated MS). To calculate these terms, you divide each sum of squares by its respective degrees of freedom:

Regression d.f. = k
Error d.f. = n − k − 1
Total d.f. = n − 1

where k is the number of independent variables and n is the total number of observations used to calculate the regression.
35
So Regression MS Error MS Total MS
and Regression MS Error MS Total MS
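The same arithmetic in code, using the SS values from the ANOVA table (n = 15 follows from Total d.f. = 14):

```python
# Sum-of-squares values from the Minitab ANOVA table.
regression_ss, error_ss, total_ss = 205.31, 16.29, 221.60
k, n = 2, 15  # two independent variables, fifteen observations

regression_ms = regression_ss / k   # 102.655
error_ms = error_ss / (n - k - 1)   # 1.3575
total_ms = total_ss / (n - 1)       # 15.829

print(regression_ms, error_ms, total_ms)
```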
19
Both sum of squares and mean squares values can be found in Minitab:

SOURCE        DF    SS        MS        F        p
Regression     2    205.31    102.65    75.62    0.000
Error         12     16.29      1.36
Total         14    221.60

Now we can calculate the F-statistic.
20
Test Statistic and Distribution
  • Test statistic:

F = regression (model) mean square / error mean square
F = 102.65 / 1.36
F ≈ 75.48

which is very close to the F-statistic from Minitab (75.62); the small difference comes from rounding the mean squares.
21
The p-value for the F-statistic is then found in an F-distribution table. As you saw before, it can also be easily calculated by software. A small p-value rejects the null hypothesis that none of the independent variables is significant; that is to say, at least one of the independent variables is significant.
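As a sketch of the software route, SciPy's F-distribution gives the p-value directly (degrees of freedom 2 and 12, from the ANOVA table):

```python
from scipy import stats

f_stat = 102.65 / 1.36           # hand-calculated F from the previous slide
df_regression, df_error = 2, 12  # from the ANOVA table

# Survival function = P(F > f_stat), the p-value for the overall test.
p_value = stats.f.sf(f_stat, df_regression, df_error)
print(p_value)  # about 1.6e-7 -- effectively 0, matching Minitab's 0.000
```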
22
The conclusion in the context of our data is: we have strong evidence (p ≈ 0) to reject the null hypothesis. That is to say, either someone's weight or his parents' average height (or both) is significant in predicting his height.
Once you know that at least one independent variable is significant, you can go on to test each independent variable separately.
23
Testing Individual Terms
If an independent variable does not contribute significantly to predicting the value of Y, the true coefficient of that variable will be 0. The test of these hypotheses determines whether the estimated coefficient is significantly different from 0. From this, we can tell whether an independent variable is important for predicting the dependent variable.
24
Test for Individual Terms

H₀: βⱼ = 0 (the independent variable xⱼ is not important for predicting y)

Hₐ: βⱼ ≠ 0 (the independent variable xⱼ is important for predicting y)

where j identifies the specified independent variable.
25
Test Statistic

t = bⱼ / SE(bⱼ)    (the estimated coefficient divided by its standard error)

d.f. = n − k − 1

Remember, this test is only to be performed if the overall test of the model is significant.
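Computing the t-ratios and p-values from the coefficient table (Coef and Stdev values from the Minitab output; SciPy's t-distribution stands in for a t-table):

```python
from scipy import stats

df = 12  # n - k - 1 = 15 - 2 - 1

# Coef and Stdev values from the Minitab output.
for name, coef, stdev in [("weight", 0.24020, 0.03140),
                          ("parenth", 0.11493, 0.09035)]:
    t_ratio = coef / stdev
    # Two-sided p-value: P(|T| > |t_ratio|) with df degrees of freedom.
    p_value = 2 * stats.t.sf(abs(t_ratio), df)
    print(f"{name}: t = {t_ratio:.2f}, p = {p_value:.3f}")
```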
26
The t-distribution is used. Tests of individual terms for significance are the same as a test of significance in simple linear regression.
27
A small p-value means that the independent variable is significant.

Predictor     Coef       Stdev      t-ratio    p
Constant      25.028     4.326      5.79       0.000
weight        0.24020    0.03140    7.65       0.000
parenth       0.11493    0.09035    1.27       0.227

This test of significance shows that weight is a significant independent variable for predicting height, but average parent height is not.
28
  • Now that you know how to do tests of significance for multiple regression, there are many other things that you can learn, such as:
  • How to create confidence intervals
  • How to use categorical variables in multiple regression
  • How to test for significance in groups of independent variables