Title: Inference in Regression
- In regression there are a number of questions of interest:
  - Is the intercept zero?
    - Could it be some other value of scientific interest?
  - Is the slope zero?
    - Could it be some other value of scientific interest?
  - How accurate are predictions?
- The Minitab output can be used to answer these questions.
Regression Analysis: R. Wing Lth versus L. Wing Lth

The regression equation is
R. Wing Lth = 16.5 + 0.957 L. Wing Lth

689 cases used, 33 cases contain missing values

Predictor        Coef   SE Coef      T      P
Constant       16.482     4.736   3.48  0.001
L. Wing Lth   0.95741   0.01213  78.90  0.000

S = 3.16301   R-Sq = 90.1%   R-Sq(adj) = 90.0%

Analysis of Variance

Source           DF     SS     MS        F      P
Regression        1  62285  62285  6225.61  0.000
Residual Error  687   6873     10
Total           688  69158
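The T column is simply Coef divided by SE Coef, which tests whether that coefficient is zero. The same recipe answers "could it be some other value of scientific interest?": subtract the hypothesised value before dividing. A minimal sketch using the numbers printed above. The alternative slope of 1 (right wing growing one-for-one with left wing) is an illustrative assumption. The p-values and interval use a normal approximation to the t distribution on 687 DF, so they should be close to, but not identical with, Minitab's exact t-based values.

```python
from statistics import NormalDist

def coef_test(coef, se, hypothesised=0.0):
    """t statistic and two-sided p-value for H0: coefficient == hypothesised.
    The p-value uses a normal approximation, reasonable for large residual DF."""
    t = (coef - hypothesised) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

def coef_interval(coef, se, level=0.95):
    """Approximate confidence interval coef +/- z * SE (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return coef - z * se, coef + z * se

# Values copied from the Minitab output for R. Wing Lth versus L. Wing Lth
t_const, p_const = coef_test(16.482, 4.736)             # H0: intercept = 0
t_slope0, p_slope0 = coef_test(0.95741, 0.01213)        # H0: slope = 0
t_slope1, p_slope1 = coef_test(0.95741, 0.01213, 1.0)   # H0: slope = 1 (assumed value of interest)
lo, hi = coef_interval(0.95741, 0.01213)

print(f"intercept = 0: t = {t_const:.2f}, p = {p_const:.4f}")
print(f"slope = 0:     t = {t_slope0:.2f}")
print(f"slope = 1:     t = {t_slope1:.2f}, p = {p_slope1:.4f}")
print(f"approx. 95% CI for slope: ({lo:.4f}, {hi:.4f})")
```

The interval and the test of slope = 1 tell the same story: a value is rejected at the 5% level exactly when it lies outside the 95% interval.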
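The remaining summary numbers for the wing-length regression (F, S, R-Sq and R-Sq(adj)) can all be rebuilt from the SS and DF columns of its ANOVA table. A small sketch; Minitab works from unrounded sums of squares, so the last digit may differ slightly.

```python
import math

def regression_summary(ss_reg, df_reg, ss_res, df_res):
    """Rebuild F, S, R-Sq and R-Sq(adj) from a regression ANOVA table."""
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res              # residual mean square, the estimate of sigma^2
    ss_tot = ss_reg + ss_res
    df_tot = df_reg + df_res
    return {
        "F": ms_reg / ms_res,
        "S": math.sqrt(ms_res),           # Minitab's S
        "R-Sq": 100 * ss_reg / ss_tot,
        "R-Sq(adj)": 100 * (1 - ms_res / (ss_tot / df_tot)),
    }

summary = regression_summary(62285, 1, 6873, 687)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```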
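Unusual Observations
- Minitab reports two kinds of unusual observations:
  - those with a large standardized residual (marked R)
  - those whose X value gives them large leverage (marked X)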
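Both kinds of unusual observation can be computed directly. In simple regression the leverage of point i is h_i = 1/n + (x_i - x_bar)^2 / Sxx, and the standardized residual is e_i / (S * sqrt(1 - h_i)). A sketch on hypothetical data; the cut-offs |standardized residual| > 2 and leverage > 3p/n are assumed rules of thumb, not necessarily Minitab's exact rules.

```python
import math

def unusual_observations(x, y, resid_cut=2.0):
    """Flag large standardized residuals (R) and high-leverage points (X)
    in a simple linear regression of y on x (cut-offs are assumptions)."""
    n, p = len(x), 2                      # p = number of coefficients (intercept + slope)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - p))   # plays the role of Minitab's S
    # Leverage in simple regression: h_i = 1/n + (x_i - x_bar)^2 / Sxx
    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    std_resid = [e / (s * math.sqrt(1 - hi)) for e, hi in zip(resid, h)]
    flags_r = [i for i, r in enumerate(std_resid) if abs(r) > resid_cut]
    flags_x = [i for i, hi in enumerate(h) if hi > 3 * p / n]
    return h, std_resid, flags_r, flags_x

# Hypothetical data: the last point has an x value far from the rest
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, 0.2, -0.15, 0.1, -0.05, 0.0]
y = [1 + 2 * xi + e for xi, e in zip(x, noise)]
h, std_resid, flags_r, flags_x = unusual_observations(x, y)
print("large standardized residual (R):", flags_r)
print("large leverage (X):", flags_x)
```

Note that the far-out point can have a small residual and still be flagged: leverage depends only on the x values.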
Multiple Regression
- Often more than one variable may influence the response.
  - E.g. the blood pressure of a child may be related to the blood pressure of both parents.
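Written out, a regression with two explanatory variables simply adds one term per variable. For the blood-pressure example it might read as follows; the variable names are illustrative, not taken from a particular data set.

```latex
\text{BP}_{\text{child}} = b_0 + b_1\,\text{BP}_{\text{mother}} + b_2\,\text{BP}_{\text{father}} + \varepsilon
```

Each coefficient is interpreted with the other variable held fixed, and each gets its own Coef, SE Coef, T and P line in the Minitab output.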
Categorical Variables
- Initially we will look at a simple binary variable:
  - Sex of student
Minitab Output

Regression Analysis: Result versus Study Time

The regression equation is
Result = 32.1 + 1.83 Study Time

Predictor      Coef  SE Coef     T      P
Constant     32.107    5.191  6.19  0.000
Study Time   1.8281   0.4334  4.22  0.001

S = 10.0276   R-Sq = 49.7%   R-Sq(adj) = 46.9%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  1788.6  1788.6  17.79  0.001
Residual Error  18  1809.9   100.6
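With a single predictor, the F test in the ANOVA table and the t test for the slope are the same test: F equals T squared. A quick check using the printed values (small differences come from rounding in the output):

```python
ss_reg, df_reg = 1788.6, 1
ss_res, df_res = 1809.9, 18

t_slope = 1.8281 / 0.4334                        # T for Study Time
f_stat = (ss_reg / df_reg) / (ss_res / df_res)   # F from the ANOVA table
r_sq = ss_reg / (ss_reg + ss_res)                # proportion of variation explained

print(f"T^2 = {t_slope ** 2:.2f}, F = {f_stat:.2f}, R-Sq = {100 * r_sq:.1f}%")
```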
We Might Propose Two Lines
- We need to add a 0/1 variable for sex.
- The model is now
  $\text{Result} = b_0 + b_1\,\text{Study Time} + b_2\,\text{Sex} + \varepsilon$
- The output is now:
Regression Analysis: Result versus Study Time, Sex

The regression equation is
Result = 19.6 + 2.45 Study Time + 11.5 Sex

Predictor      Coef  SE Coef     T      P
Constant     19.600    6.980  2.81  0.012
Study Time   2.4526   0.4659  5.26  0.000
Sex          11.525    4.821  2.39  0.029

S = 8.92625   R-Sq = 62.4%   R-Sq(adj) = 57.9%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  2244.0  1122.0  14.08  0.000
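Because Sex is coded 0/1, the fitted equation splits into two parallel lines with a common slope, and the Sex coefficient shifts the intercept. A sketch using the coefficients printed above; the helper name `predicted_result` is hypothetical.

```python
B0, B_TIME, B_SEX = 19.600, 2.4526, 11.525   # from the Minitab output above

def predicted_result(study_time, sex):
    """Fitted Result from the additive model; sex is the 0/1 indicator."""
    return B0 + B_TIME * study_time + B_SEX * sex

for sex in (0, 1):
    intercept = predicted_result(0, sex)
    print(f"Sex = {sex}: Result = {intercept:.3f} + {B_TIME} Study Time")

# The gap between the two lines is the same at every study time
print(predicted_result(10, 1) - predicted_result(10, 0))
```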
Different Slopes
- We can now add an INTERACTION.
- This is constructed by multiplying the study time and sex variables together.
- The model is now
  $\text{Result} = b_0 + b_1\,\text{Study Time} + b_2\,\text{Sex} + b_3\,(\text{Study Time} \times \text{Sex}) + \varepsilon$
- b2 is a change in the intercept.
- b3 is a change in the slope.
- I.e. the intercept for sex 0 is b0 and the intercept for sex 1 is b0 + b2.
- The slope for sex 0 is b1 and the slope for sex 1 is b1 + b3.
Result  Study Time  Sex  interact
    64          12    0         0
    43          14    0         0
    37           4    1         4
    50           5    1         5
    52           6    1         6
    23           6    0         0
    60          19    0         0
    61          13    0         0
    48           4    1         4
    72          17    0         0
    61          12    1        12
    36           7    1         7
    53          11    0         0
    44          10    0         0
    67          12    1        12
    51          18    0         0
    61          17    0         0
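The interact column is just the elementwise product of Study Time and Sex, so it equals Study Time for the Sex = 1 students and 0 otherwise. A sketch reproducing it for the rows listed above:

```python
# (Result, Study Time, Sex) for the rows listed above
rows = [
    (64, 12, 0), (43, 14, 0), (37, 4, 1), (50, 5, 1), (52, 6, 1),
    (23, 6, 0), (60, 19, 0), (61, 13, 0), (48, 4, 1), (72, 17, 0),
    (61, 12, 1), (36, 7, 1), (53, 11, 0), (44, 10, 0), (67, 12, 1),
    (51, 18, 0), (61, 17, 0),
]

# Interaction term: Study Time multiplied by the 0/1 Sex indicator
interact = [time * sex for _, time, sex in rows]
print(interact)
```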
Regression Analysis: Result versus Study Time, Sex, interact

The regression equation is
Result = 21.3 + 2.33 Study Time + 9.2 Sex + 0.207 interact

Predictor      Coef  SE Coef     T      P
Constant      21.26    10.61  2.00  0.062
Study Time   2.3314   0.7450  3.13  0.006
Sex            9.19    12.07  0.76  0.458
interact     0.2070   0.9736  0.21  0.834

S = 9.18799   R-Sq = 62.5%   R-Sq(adj) = 55.4%

Analysis of Variance

Source          DF       SS      MS     F      P
Regression       3  2247.84  749.28  8.88  0.001
Residual Error  16  1350.71   84.42
Total           19  3598.55

Source      DF   Seq SS
Study Time   1  1788.61
Sex          1   455.41
interact     1     3.82
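Applying the Different Slopes rules (intercept b0 for sex 0 and b0 + b2 for sex 1; slope b1 for sex 0 and b1 + b3 for sex 1) to the printed coefficients gives each sex its own line. A sketch:

```python
coef = {"Constant": 21.26, "Study Time": 2.3314, "Sex": 9.19, "interact": 0.2070}

lines = {
    0: (coef["Constant"], coef["Study Time"]),                     # b0, b1
    1: (coef["Constant"] + coef["Sex"],                            # b0 + b2
        coef["Study Time"] + coef["interact"]),                    # b1 + b3
}
for sex, (intercept, slope) in lines.items():
    print(f"Sex = {sex}: Result = {intercept:.2f} + {slope:.4f} Study Time")

# The interact T of 0.21 (P = 0.834) gives no evidence that the slopes differ
```

The small sequential SS for interact (3.82) says the same thing: allowing different slopes adds almost nothing to the fit.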
Analysis of Variance (ANOVA)
- For more than two groups, multiple t tests to find differences in means are not appropriate. At the 5% level, one test in twenty will show a significant difference by chance alone, on average, even when all the means are equal.
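To see why this matters, suppose every null hypothesis is true and the tests are independent (an idealisation, since pairwise comparisons that share a group are not independent). The chance of at least one false significant result among k tests at level alpha is then 1 - (1 - alpha)^k, which grows quickly with the number of groups:

```python
from math import comb

def familywise_error(n_groups, alpha=0.05):
    """P(at least one false positive) over all pairwise t tests,
    assuming independent tests and all group means equal."""
    k = comb(n_groups, 2)                 # number of pairwise comparisons
    return k, 1 - (1 - alpha) ** k

for g in (3, 5, 10):
    k, fwe = familywise_error(g)
    print(f"{g} groups: {k} pairwise tests, P(at least one false positive) = {fwe:.2f}")
```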
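ANOVA
- It can be shown that
  Within-group variability + Between-group variability = Total variability
  $\sum_{i}\sum_{j}(y_{ij}-\bar{y}_i)^2 + \sum_{i} n_i(\bar{y}_i-\bar{y})^2 = \sum_{i}\sum_{j}(y_{ij}-\bar{y})^2$
  where $y_{ij}$ is observation j in group i, $\bar{y}_i$ is the mean of group i (of size $n_i$) and $\bar{y}$ is the overall mean.
- This is the essence of ANOVA.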
Minitab Output

One-way ANOVA: Yield 1 versus Group

Source  DF      SS      MS       F      P
Group    2  40.033  20.017  188.24  0.000
Error   12   1.276   0.106
Total   14  41.309

S = 0.3261   R-Sq = 96.91%   R-Sq(adj) = 96.40%

Level  N    Mean   StDev
1      5  4.0600  0.3362
2      5  5.9600  0.3050
3      5  8.0600  0.3362

[Character plot: individual 95% CIs for each mean, based on the pooled StDev]
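The ANOVA table above can be rebuilt from the group summaries alone: the between-group SS uses the group means, and the within-group SS pools the group variances. A sketch using the N, Mean and StDev columns; because the printed means and StDevs are rounded, F comes out near, not exactly at, 188.24.

```python
def one_way_from_summary(ns, means, sds):
    """One-way ANOVA sums of squares and F from group sizes, means and SDs."""
    n_total, k = sum(ns), len(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat

ss_b, ss_w, f_stat = one_way_from_summary(
    ns=[5, 5, 5], means=[4.06, 5.96, 8.06], sds=[0.3362, 0.3050, 0.3362]
)
print(f"Group SS = {ss_b:.3f}, Error SS = {ss_w:.3f}, total = {ss_b + ss_w:.3f}, F = {f_stat:.1f}")
```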
The Model
- The model can be written as
  $y = b_0 + b_1 z_1 + b_2 z_2 + \varepsilon$
- where
  - z1 is 1 for group 2 and 0 elsewhere
  - z2 is 1 for group 3 and 0 elsewhere
  - b0 is then the mean of group 1, and b1 and b2 are the differences of groups 2 and 3 from group 1
- Note that we could do this as a regression with two variables, z1 and z2.
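With this 0/1 coding, the least-squares fit reproduces the group means exactly, so the coefficients can be read off them: b0 is the group 1 mean, and b1 and b2 are the differences of groups 2 and 3 from it. The sketch below does not fit the regression itself; it uses that known result with the means from the one-way output and checks that the model's fitted value for each group equals that group's mean.

```python
group_means = {1: 4.06, 2: 5.96, 3: 8.06}   # from the one-way ANOVA output

b0 = group_means[1]
b1 = group_means[2] - group_means[1]        # coefficient of z1 (group 2 indicator)
b2 = group_means[3] - group_means[1]        # coefficient of z2 (group 3 indicator)

def fitted(group):
    """Fitted value b0 + b1*z1 + b2*z2 for a given group label."""
    z1 = 1 if group == 2 else 0
    z2 = 1 if group == 3 else 0
    return b0 + b1 * z1 + b2 * z2

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
print({g: round(fitted(g), 2) for g in group_means})
```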
So What??
- We have carried out a test of significance: F = 188.24, p = 0.000 (i.e. p < 0.0005).
- But what was the null hypothesis?
  - $H_0: \mu_{\text{overall}} = \mu_1 = \mu_2 = \mu_3$
  - $H_A$: one or more of the pairs of means are not equal.
- But this is not very useful on its own.
- We are now back to the original problem of lots of pairwise tests, but at least we know, at the 5% significance level, that there is a real effect to look for.
Post Hoc Multiple Comparisons
- Dear Chris,
  This is a deep philosophical debatable issue about what are appropriate inference procedures if you really do want to test for differences between all possible pairs simultaneously (as you say, 120,000 contrasts at the same time). The 'bad apples' (or the excellent juicy ones) may of course be more simply done by evaluating whether residual estimates are compatible with zero random effect (i.e. that schools do not differ from the average), in which case there are only 500 possible comparisons. If indeed you do want an appropriate multiple testing procedure (and practically, in my view, there would be some question concerning what this would really tell you and whether it is at all necessary beyond standard caterpillar diagrams), then indeed some sort of Bonferroni idea would be appropriate to account for the multiple testing procedure. You rightly point out the difficulty concerned with Bonferroni when there are a multiplicity of tests. It can set the right Type 1 error level, but its power is really extremely weak.
- An article which develops more powerful procedures based on concepts which are called familywise error rates and/or false discovery proportions will shortly be published in JRSS Series A, of which I am joint editor.
- Antony Fielding, Professor of Social and Educational Statistics, Department of Economics, University of Birmingham, United Kingdom
- A number of possibilities have been suggested:
  - Bonferroni
  - Tukey
  - Duncan
  - etc.
- They all work by adjusting, in some way, the significance level (or critical value) used for each individual comparison.
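The simplest of these adjustments, Bonferroni, compares each of m p-values with alpha/m. The email above also mentions false discovery ideas; the best-known procedure of that kind is the Benjamini-Hochberg step-up rule, sketched here alongside Bonferroni for comparison. This is an illustration with made-up p-values. It does not implement Tukey's or Duncan's methods, which need the studentized range distribution.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the familywise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up rule (controls the false discovery rate):
    find the largest rank k with p_(k) <= k*q/m and reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject

p = [0.01, 0.04, 0.03, 0.005]   # hypothetical p-values from four comparisons
print("Bonferroni:        ", bonferroni(p))
print("Benjamini-Hochberg:", benjamini_hochberg(p))
```

On these p-values Bonferroni rejects only the two smallest, while Benjamini-Hochberg rejects all four, illustrating the power problem raised in the email.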
Assumptions in ANOVA
- The samples are
  - random
  - independent
  - from normally distributed populations
- The variance of the populations is constant (the same in every group).
More than One Factor
- Two factors: this commonly occurs in blocked experiments
  - Treatment
  - Block
- Remember back to the experimental design lectures: I described an experiment with 5 paddocks and 3 management regimes.
The data looked like:
[Figure: the data]
The One-Way ANOVA

One-way ANOVA: Yield versus TrtNo

Source  DF      SS      MS     F      P
TrtNo    2  308846  154423  8.44  0.005
Error   12  219531   18294
Total   14  528377

S = 135.3   R-Sq = 58.45%   R-Sq(adj) = 51.53%

Level  N    Mean  StDev
1      5  1408.7  151.0
2      5  1531.8  132.0
3      5  1755.4  121.1

Pooled StDev = 135.3
[Character plot: individual 95% CIs for each mean, based on the pooled StDev]
But Including the Block Information

Two-way ANOVA: Yield versus TrtNo, BlkNo

Source  DF      SS      MS       F      P
TrtNo    2  308846  154423  149.82  0.000
BlkNo    4  211285   52821   51.25  0.000
Error    8    8246    1031
Total   14  528377

S = 32.10   R-Sq = 98.44%   R-Sq(adj) = 97.27%
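Comparing the one-way and two-way yield tables shows what blocking does. The treatment SS (308846) is unchanged, but the block SS is taken out of the one-way error SS, so the error mean square, and with it the F denominator, shrinks dramatically. A check with the printed values:

```python
ss_trt, df_trt = 308846, 2
ss_blk, df_blk = 211285, 4
ss_err_two_way, df_err_two_way = 8246, 8

# One-way analysis: block variation is left in the error term
ss_err_one_way = ss_blk + ss_err_two_way
df_err_one_way = df_blk + df_err_two_way

f_one_way = (ss_trt / df_trt) / (ss_err_one_way / df_err_one_way)
f_two_way = (ss_trt / df_trt) / (ss_err_two_way / df_err_two_way)
print(ss_err_one_way, df_err_one_way)
print(f"F for treatments: one-way {f_one_way:.2f}, two-way {f_two_way:.2f}")
```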
TrtNo     Mean
1      1408.75
2      1531.76
3      1755.39

[Character plot: individual 95% CIs for each TrtNo mean, based on the pooled StDev]

BlkNo     Mean
6      1741.14
9      1397.45
11     1559.46
16     1639.38
17     1489.07

[Character plot: individual 95% CIs for each BlkNo mean, based on the pooled StDev]
A More Complicated Model
- Data were collected on the length of time patients were in hospital, which one of three medicines they were treated with, and which one of three vitamin supplements they were given.
- 36 patients were used, and they were randomly assigned to medicines and vitamins in 9 groups of 4.
[Figure: a graph of the data]
The Simple ANOVA

One-way ANOVA: Days versus Medicine

Source    DF      SS     MS     F      P
Medicine   2   72.00  36.00  3.69  0.036
Error     33  322.00   9.76
Total     35  394.00

S = 3.124   R-Sq = 18.27%   R-Sq(adj) = 13.32%

Level   N    Mean  StDev
1      12   7.000  2.594
2      12  10.000  4.243
3      12  10.000  2.132

Pooled StDev = 3.124
[Character plot: individual 95% CIs for each mean, based on the pooled StDev]
One-way ANOVA: Days versus Vitamin

Source   DF     SS    MS     F      P
Vitamin   2   24.0  12.0  1.07  0.355
Error    33  370.0  11.2
Total    35  394.0

S = 3.348   R-Sq = 6.09%   R-Sq(adj) = 0.40%

Level   N    Mean  StDev
1      12   8.000  3.015
2      12   9.000  2.594
3      12  10.000  4.221

Pooled StDev = 3.348
[Character plot: individual 95% CIs for each mean, based on the pooled StDev]
2-Way ANOVA

Two-way ANOVA: Days versus Medicine, Vitamin

Source    DF   SS       MS     F      P
Medicine   2   72  36.0000  3.74  0.035
Vitamin    2   24  12.0000  1.25  0.301
Error     31  298   9.6129
Total     35  394

S = 3.100   R-Sq = 24.37%   R-Sq(adj) = 14.61%
With the Interaction

Two-way ANOVA: Days versus Medicine, Vitamin

Source       DF   SS       MS     F      P
Medicine      2   72  36.0000  7.04  0.003
Vitamin       2   24  12.0000  2.35  0.115
Interaction   4  160  40.0000  7.83  0.000
Error        27  138   5.1111
Total        35  394

S = 2.261   R-Sq = 64.97%   R-Sq(adj) = 54.60%
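The same bookkeeping explains the three Days tables: the sums of squares add up to the total (394), and whatever is not in the model sits in the error term. Without the interaction, its 160 (on 4 DF) is part of the 298 error SS; in the one-way table for Medicine the Vitamin SS joins them too (24 + 160 + 138 = 322). Once the interaction is fitted, the error mean square falls from 9.6129 to 5.1111 and every F grows. A check with the printed values:

```python
ss = {"Medicine": 72, "Vitamin": 24, "Interaction": 160}
df = {"Medicine": 2, "Vitamin": 2, "Interaction": 4}
ss_error, df_error = 138, 27          # error with the interaction fitted
ss_total = 394

assert sum(ss.values()) + ss_error == ss_total   # the SS partition the total

# Main-effects-only model: interaction SS and DF fall back into the error term
ms_error_additive = (ss["Interaction"] + ss_error) / (df["Interaction"] + df_error)
ms_error_full = ss_error / df_error

for term in ss:
    f_full = (ss[term] / df[term]) / ms_error_full
    print(f"{term}: F = {f_full:.2f}")
print(f"error MS without interaction = {ms_error_additive:.4f}")
```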
Medicine  Mean
1            7
2           10
3           10

[Character plot: individual 95% CIs for each Medicine mean, based on the pooled StDev]

Vitamin  Mean
1           8
2           9
3          10

[Character plot: individual 95% CIs for each Vitamin mean, based on the pooled StDev]
[Figure: interaction plot]