Inference in Regression - PowerPoint PPT Presentation

Slides: 38
Provided by: alasdai1

Transcript and Presenter's Notes



1
Inference in Regression
  • In Regression there are a number of questions of
    interest
  • Is the Intercept zero?
  • Could it be some other value of scientific
    interest?
  • Is the slope zero?
  • Could it be some other value of scientific
    interest?
  • How accurate are predictions?
  • The Minitab output can be used to answer these
    questions

2
Regression Analysis: R. Wing Lth versus L. Wing Lth

The regression equation is
R. Wing Lth = 16.5 + 0.957 L. Wing Lth

689 cases used, 33 cases contain missing values

Predictor        Coef   SE Coef      T      P
Constant       16.482     4.736   3.48  0.001
L. Wing Lth   0.95741   0.01213  78.90  0.000

S = 3.16301   R-Sq = 90.1%   R-Sq(adj) = 90.0%

Analysis of Variance

Source           DF     SS     MS        F      P
Regression        1  62285  62285  6225.61  0.000
Residual Error  687   6873     10
Total           688  69158
3
S = 3.16301   R-Sq = 90.1%   R-Sq(adj) = 90.0%

Analysis of Variance

Source           DF     SS     MS        F      P
Regression        1  62285  62285  6225.61  0.000
Residual Error  687   6873     10
Total           688  69158
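Each question on slide 1 is a t test: t = (estimate - hypothesised value) / SE, on the residual degrees of freedom (687 here). The sketch below reuses the coefficients and standard errors from the output above. The value slope = 1 (right and left wings growing in step) is a hypothetical value of interest chosen for illustration, not one taken from the slides. Because Python's standard library has no t distribution, the p-value and interval use a normal approximation, an assumption that is only reasonable because the degrees of freedom are so large.

```python
import math

def t_statistic(coef, se, hypothesised=0.0):
    """t statistic for H0: parameter = hypothesised value."""
    return (coef - hypothesised) / se

def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal approximation.

    Reasonable here only because the residual df (687) is large; with few
    residual df, use the t distribution instead.
    """
    # P(|Z| > |t|) = erfc(|t| / sqrt(2))
    return math.erfc(abs(t) / math.sqrt(2))

# Estimates and standard errors copied from the Minitab output.
intercept, se_intercept = 16.482, 4.736
slope, se_slope = 0.95741, 0.01213

t_int = t_statistic(intercept, se_intercept)                # H0: intercept = 0
t_slope0 = t_statistic(slope, se_slope)                     # H0: slope = 0
t_slope1 = t_statistic(slope, se_slope, hypothesised=1.0)   # H0: slope = 1 (hypothetical)

# Approximate 95% CI for the slope, using 1.96 in place of the t(687) quantile.
ci = (slope - 1.96 * se_slope, slope + 1.96 * se_slope)

print(f"intercept = 0: t = {t_int:.2f}")
print(f"slope = 0:     t = {t_slope0:.2f}")
print(f"slope = 1:     t = {t_slope1:.2f}, approx. p = {two_sided_p_normal(t_slope1):.4f}")
print(f"approx. 95% CI for slope: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The slope = 0 statistic comes out near 78.93 rather than the printed 78.90 because the displayed SE is rounded. The slope = 1 test gives t of about -3.51 with p well below 0.001, and the interval (roughly 0.934 to 0.981) excludes 1.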

4
Unusual Observations
  • Minitab reports two kinds of Unusual
    Observations
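To my knowledge, the two kinds are observations flagged R (a large standardized residual) and X (an x value that gives the point large leverage). The sketch below computes both for a simple linear regression. The cut-offs, |standardized residual| > 2 and leverage > 3p/n, are common rules of thumb that I assume are close to Minitab's defaults; treat them as assumptions. The data are hypothetical.

```python
import math

def unusual_observations(x, y, resid_cut=2.0, lev_mult=3.0):
    """Flag possible unusual observations in a simple linear regression.

    'R': |standardized residual| > resid_cut.
    'X': leverage h > lev_mult * p / n, with p = 2 parameters.
    Thresholds are assumed rules of thumb, not confirmed Minitab settings.
    """
    n = len(x)
    p = 2  # intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - p))  # residual SD (Minitab's S)

    flags = []
    for i, (xi, e) in enumerate(zip(x, resid)):
        h = 1 / n + (xi - xbar) ** 2 / sxx        # leverage of observation i
        std_resid = e / (s * math.sqrt(1 - h))    # standardized residual
        kind = ""
        if abs(std_resid) > resid_cut:
            kind += "R"
        if h > lev_mult * p / n:
            kind += "X"
        if kind:
            flags.append((i, round(h, 3), round(std_resid, 2), kind))
    return flags

# Hypothetical data: y = 2x except one shifted point (index 4), plus one
# point far out in x (index 11) that lies exactly on the line.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 30]
y = [2 * xi for xi in x]
y[4] += 10

for i, h, r, kind in unusual_observations(x, y):
    print(f"obs {i}: x = {x[i]}, leverage = {h}, std resid = {r}, flag {kind}")
```

The far-out point is flagged X even though it sits on the line: leverage depends only on x, and it is the residual that says whether such a point actually disagrees with the fit.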

5
Multiple Regression
  • Often more than one variable may have an
    influence on the response
  • E.g. the blood pressure of a child may be related
    to that of both of their parents

6
Categorical Variables
  • Initially we will look at a simple binary
    variable
  • Sex of Student

7
Minitab Output
Regression Analysis: Result versus Study Time

The regression equation is
Result = 32.1 + 1.83 Study Time

Predictor      Coef  SE Coef     T      P
Constant     32.107    5.191  6.19  0.000
Study Time   1.8281   0.4334  4.22  0.001

S = 10.0276   R-Sq = 49.7%   R-Sq(adj) = 46.9%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  1788.6  1788.6  17.79  0.001
Residual Error  18  1809.9   100.6
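The numbers in this output are tied together by a few identities: SS Total = SS Regression + SS Residual, R-Sq = SS Regression / SS Total, R-Sq(adj) = 1 - MS Residual / (SS Total / (n - 1)), S = sqrt(MS Residual) and F = MS Regression / MS Residual. With a single predictor, F is also the square of the slope's t. The sketch below recomputes them from the slide; the Total line is derived, since the slide does not show it.

```python
import math

# Values copied from the Minitab output for Result versus Study Time.
ss_reg, df_reg = 1788.6, 1
ss_err, df_err = 1809.9, 18
t_slope = 4.22

ss_total = ss_reg + ss_err        # Total SS (not shown on the slide)
df_total = df_reg + df_err        # n - 1, so n = 20
ms_reg = ss_reg / df_reg
ms_err = ss_err / df_err

f_stat = ms_reg / ms_err
r_sq = ss_reg / ss_total
r_sq_adj = 1 - ms_err / (ss_total / df_total)
s = math.sqrt(ms_err)

print(f"F = {f_stat:.2f}, t^2 = {t_slope ** 2:.2f}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%, S = {s:.4f}")
```

F and t squared differ slightly only because t is printed to two decimals.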

8
We Might Propose two Lines
  • We need to add a 0/1 (indicator) variable for Sex
  • The model is now
    $\text{Result} = b_0 + b_1\,\text{Study Time} + b_2\,\text{Sex} + \varepsilon$
  • The output is now

9
Regression Analysis: Result versus Study Time, Sex

The regression equation is
Result = 19.6 + 2.45 Study Time + 11.5 Sex

Predictor      Coef  SE Coef     T      P
Constant     19.600    6.980  2.81  0.012
Study Time   2.4526   0.4659  5.26  0.000
Sex          11.525    4.821  2.39  0.029

S = 8.92625   R-Sq = 62.4%   R-Sq(adj) = 57.9%

Analysis of Variance

Source       DF      SS      MS      F      P
Regression    2  2244.0  1122.0  14.08  0.000

10
Different Slopes
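  • We can now add an INTERACTION
  • This is constructed by multiplying the study
    time and sex variables together
  • The model is now
    $\text{Result} = b_0 + b_1\,\text{Study Time} + b_2\,\text{Sex} + b_3\,(\text{Study Time} \times \text{Sex}) + \varepsilon$
  • $b_2$ is a change in the intercept
  • $b_3$ is a change in the slope
  • I.e. the intercept for sex 0 is $b_0$; the intercept
    for sex 1 is $b_0 + b_2$
  • The slope for sex 0 is $b_1$; the slope for sex 1 is
    $b_1 + b_3$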

11
Result  Study Time  Sex  interact
    64          12    0         0
    43          14    0         0
    37           4    1         4
    50           5    1         5
    52           6    1         6
    23           6    0         0
    60          19    0         0
    61          13    0         0
    48           4    1         4
    72          17    0         0
    61          12    1        12
    36           7    1         7
    53          11    0         0
    44          10    0         0
    67          12    1        12
    51          18    0         0
    61          17    0         0

12
Regression Analysis: Result versus Study Time, Sex, interact

The regression equation is
Result = 21.3 + 2.33 Study Time + 9.2 Sex + 0.207 interact

Predictor      Coef  SE Coef     T      P
Constant      21.26    10.61  2.00  0.062
Study Time   2.3314   0.7450  3.13  0.006
Sex           9.19    12.07   0.76  0.458
interact     0.2070   0.9736  0.21  0.834

S = 9.18799   R-Sq = 62.5%   R-Sq(adj) = 55.4%

Analysis of Variance

Source          DF       SS      MS     F      P
Regression       3  2247.84  749.28  8.88  0.001
Residual Error  16  1350.71   84.42
Total           19  3598.55

Source      DF   Seq SS
Study Time   1  1788.61
Sex          1   455.41
interact     1     3.82
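The interaction model can be fitted to the data listed on slide 11 by least squares. The sketch below solves the normal equations with a small hand-written Gaussian elimination, my own choice to keep it dependency-free. The table lists only 17 of the 20 cases used in the output, so the Sex and interact coefficients will not match the output above. However, b0 and b1 (the sex 0 line) depend only on the sex 0 cases, and working it through gives 21.26 and 2.3314, matching the output, which suggests the three unlisted cases are all sex 1.

```python
def ols(columns, y):
    """Least-squares coefficients for y on the given columns, by solving the
    normal equations X'X b = X'y with Gaussian elimination (partial pivoting)."""
    p = len(columns)
    # Augmented matrix [X'X | X'y]
    m = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(p)]
         + [sum(u * v for u, v in zip(columns[i], y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        coef[r] = (m[r][p] - sum(m[r][c] * coef[c] for c in range(r + 1, p))) / m[r][r]
    return coef

# Data listed on slide 11 (17 of the 20 cases used in the Minitab output).
result = [64, 43, 37, 50, 52, 23, 60, 61, 48, 72, 61, 36, 53, 44, 67, 51, 61]
study_time = [12, 14, 4, 5, 6, 6, 19, 13, 4, 17, 12, 7, 11, 10, 12, 18, 17]
sex = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0]

interact = [t * s for t, s in zip(study_time, sex)]  # Study Time x Sex
ones = [1] * len(result)
b0, b1, b2, b3 = ols([ones, study_time, sex, interact], result)

# Slide 10: intercepts b0 and b0 + b2, slopes b1 and b1 + b3.
print(f"sex 0: Result = {b0:.2f} + {b1:.4f} Study Time")
print(f"sex 1: Result = {b0 + b2:.2f} + {b1 + b3:.4f} Study Time")
```

With all 20 cases, the same mapping applied to the output above gives 21.26 + 2.3314 Study Time for sex 0 and 30.45 + 2.5384 Study Time for sex 1. The interact p-value of 0.834 gives no evidence that the slopes differ.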
13
(No Transcript)
14
Analysis of Variance (ANOVA)
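  • For more than two groups, running multiple t tests
    to find differences in means is not appropriate. At
    the 5% level, on average one test in twenty will
    declare a difference that is not really there, so
    the more tests we run, the more likely at least one
    false positive becomes.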
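A quick way to see how fast the problem grows: with k groups there are k(k - 1)/2 pairwise t tests, and if each is run at level alpha the chance of at least one false positive is 1 - (1 - alpha)^m. The sketch treats the tests as independent, which pairwise tests sharing groups are not, so the figures are only a rough guide.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive among n_tests independent tests,
    each at significance level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests

def n_pairs(k):
    """Number of pairwise comparisons among k group means."""
    return k * (k - 1) // 2

for k in (3, 5, 10):
    m = n_pairs(k)
    print(f"{k} groups: {m} pairwise t tests, "
          f"P(at least one false positive) ~ {familywise_error(0.05, m):.2f}")
```

Even three groups push the familywise error to about 0.14; with ten groups (45 tests) it is about 0.90.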

15
ANOVA
  • Consider three groups

16
ANOVA
  • It can be shown that
    Within Group Variability + Between Group Variability
    = Total Variability
  • This is the essence of ANOVA
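Written out for k groups, with $n_i$ observations $y_{ij}$ in group $i$, group means $\bar{y}_i$ and grand mean $\bar{y}$ (notation introduced here), the identity is:

$$
\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y})^2
= \sum_{i=1}^{k} n_i(\bar{y}_i-\bar{y})^2
+ \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2
$$

It follows from writing $y_{ij}-\bar{y} = (\bar{y}_i-\bar{y}) + (y_{ij}-\bar{y}_i)$ and squaring: the cross term vanishes because $\sum_{j}(y_{ij}-\bar{y}_i) = 0$ in every group. With $N = \sum_i n_i$ observations in total, the test statistic is

$$
F = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)}.
$$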

17
Minitab Output
One-way ANOVA: Yield 1 versus Group

Source  DF      SS      MS       F      P
Group    2  40.033  20.017  188.24  0.000
Error   12   1.276   0.106
Total   14  41.309

S = 0.3261   R-Sq = 96.91%   R-Sq(adj) = 96.40%

Level  N    Mean   StDev
1      5  4.0600  0.3362
2      5  5.9600  0.3050
3      5  8.0600  0.3362

(Character plot: Individual 95% CIs For Mean Based on Pooled StDev)
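The whole table can be rebuilt from the group summaries alone: SS between = sum of n_i (mean_i - grand mean)^2 and SS within = sum of (n_i - 1) s_i^2. The sketch below applies this to the Level/N/Mean/StDev lines above. It reproduces the table up to rounding of the printed standard deviations, so F comes out near 188.2 rather than exactly 188.24.

```python
import math

def one_way_anova_from_summaries(groups):
    """One-way ANOVA quantities from (n, mean, sd) for each group."""
    n_total = sum(n for n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "f": ms_between / ms_within,
        "pooled_sd": math.sqrt(ms_within),   # Minitab's S
        "r_sq": ss_between / (ss_between + ss_within),
    }

# (N, Mean, StDev) for each level, copied from the output above.
yield_groups = [(5, 4.06, 0.3362), (5, 5.96, 0.3050), (5, 8.06, 0.3362)]
res = one_way_anova_from_summaries(yield_groups)
for name, value in res.items():
    print(f"{name}: {value:.4f}")
```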

18
The Model
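  • The model can be written as
    $y = b_0 + b_1 z_1 + b_2 z_2 + \varepsilon$
  • where
  • $z_1$ is 1 for group 2 and 0 elsewhere
  • $z_2$ is 1 for group 3 and 0 elsewhere
  • $b_0$ is then the group 1 mean, and $b_1$ and $b_2$
    are the differences of the group 2 and group 3
    means from group 1
  • Note we could do this as a regression with two
    variables $z_1$ and $z_2$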
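The dummy coding can be checked numerically. The sketch below uses hypothetical yields, since the slides do not list the raw Yield 1 data. With $z_1$ and $z_2$ coded as above, the least-squares coefficients are the group 1 mean and the differences of the other two means from it. The code confirms this by checking the normal equations: the residuals must be orthogonal to the intercept column and to each dummy column.

```python
def dummy_code(labels, reference):
    """0/1 indicator column for every level except the reference level."""
    levels = sorted(set(labels) - {reference})
    return {lvl: [1 if g == lvl else 0 for g in labels] for lvl in levels}

def group_means(labels, y):
    """Mean response for each group label."""
    sums, counts = {}, {}
    for g, v in zip(labels, y):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

# Hypothetical yields, three per group (not the data behind slide 17).
groups = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [4.0, 4.2, 3.9, 6.1, 5.8, 6.0, 8.1, 7.9, 8.2]

z = dummy_code(groups, reference=1)   # z[2] plays the role of z1, z[3] of z2
means = group_means(groups, y)
b0 = means[1]                 # group 1 mean
b1 = means[2] - means[1]      # group 2 minus group 1
b2 = means[3] - means[1]      # group 3 minus group 1

fitted = [b0 + b1 * z1 + b2 * z2 for z1, z2 in zip(z[2], z[3])]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Normal equations: for the least-squares fit, residuals are orthogonal
# to the intercept column and to each dummy column.
for column in ([1] * len(y), z[2], z[3]):
    assert abs(sum(c * e for c, e in zip(column, resid))) < 1e-9

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

Applied to the slide 17 means, the same coding would give b0 = 4.06, b1 = 1.90 and b2 = 4.00.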

19
So What??
  • We have carried out a test of significance: F =
    188.24, p = 0.000
  • But what was the null hypothesis?
  • $H_0: \mu_{\text{overall}} = \mu_1 = \mu_2 = \mu_3$
  • $H_A$: one or more of the pairs of means are not
    equal
  • But this is not very useful
  • We are now back to the original problem of lots
    of pairwise tests, but at least the F test tells
    us, at the 5% significance level, that there is a
    real effect that we are looking for.

20
Post Hoc Multiple Comparisons
  • Dear Chris, this is a deep, philosophical and
    debatable issue about what are appropriate
    inference procedures if you really do want to
    test for differences between all possible pairs
    simultaneously (as you say, 120,000 contrasts
    at the same time). The 'bad apples' (or the
    excellent juicy ones) may of course be found
    more simply by evaluating whether residual
    estimates are compatible with zero random effect
    (i.e. that schools do not differ from the
    average), in which case there are only 500
    possible comparisons. If indeed you do want an
    appropriate multiple testing procedure (and
    practically, in my view, there would be some
    question concerning what this would really tell
    you and whether it is at all necessary beyond
    standard caterpillar diagrams), then indeed
    some sort of Bonferroni idea would be appropriate
    to account for the multiple testing.
    You rightly point out the difficulty with
    Bonferroni when there is a multiplicity of
    tests. It can set the right Type 1 error level,
    but its power is really extremely weak.
  • An article which develops more powerful
    procedures, based on concepts called familywise
    error rates and/or false discovery proportions,
    will shortly be published in JRSS Series A, of
    which I am joint editor.
  • Antony Fielding, Professor of Social and
    Educational Statistics, Department of
    Economics, University of Birmingham, United
    Kingdom

21
  • A number of possibilities have been suggested
  • Bonferroni
  • Tukey
  • Duncan
  • Etc.
  • They all work by adjusting, in some way, the
    significance level (or critical value) used for
    each individual comparison.
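Bonferroni is the simplest of these to write down. With m comparisons, each p-value is compared with alpha/m, or equivalently the adjusted p-value min(1, m * p) is compared with alpha. The sketch below uses hypothetical p-values for the three pairwise comparisons among three groups. Tukey's and Duncan's procedures instead use critical values based on the studentized range and are not shown.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: adjusted p = min(1, m * p), rejected if <= alpha."""
    m = len(p_values)
    adjusted = [min(1.0, m * p) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject

# Hypothetical raw p-values for the three pairwise comparisons.
pairs = ["1 vs 2", "1 vs 3", "2 vs 3"]
p_raw = [0.030, 0.001, 0.200]
adjusted, reject = bonferroni(p_raw)

for pair, p, p_adj, rej in zip(pairs, p_raw, adjusted, reject):
    print(f"{pair}: raw p = {p:.3f}, adjusted p = {p_adj:.3f}, reject H0: {rej}")
```

The 1 vs 2 comparison would pass an unadjusted 5% test but not the Bonferroni one, which illustrates both the protection and the loss of power described in the quoted email.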

22
Assumptions in ANOVA
  • The samples are
  • Random
  • Independent
  • From Normally distributed populations
  • From populations with a common (constant) variance
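For the constant-variance assumption, one informal check found in many introductory texts is to compare the largest and smallest group standard deviations. A ratio below about 2 is usually taken as acceptable. That cut-off is a rule of thumb, not a formal test, and is assumed here. The sketch applies it to the standard deviations printed in two of the one-way ANOVA outputs in this lecture.

```python
def sd_ratio(sds):
    """Ratio of the largest to the smallest group standard deviation."""
    return max(sds) / min(sds)

yield_sds = [0.3362, 0.3050, 0.3362]     # Yield 1 versus Group
medicine_sds = [2.594, 4.243, 2.132]     # Days versus Medicine

for name, sds in [("Yield 1", yield_sds), ("Days by Medicine", medicine_sds)]:
    r = sd_ratio(sds)
    verdict = "acceptable" if r < 2 else "worth checking"
    print(f"{name}: largest/smallest SD = {r:.2f} ({verdict})")
```

The medicine data only just pass (a ratio of about 1.99), so a residual plot would be a sensible follow-up there.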

23
More than one Factor
  • Two Factors - This commonly occurs in blocked
    experiments
  • Treatment
  • Block
  • Remember back to the experimental design
    lectures. I described an experiment with 5
    paddocks and 3 management regimes.

24
The data looked like
25
The one way ANOVA
One-way ANOVA: Yield versus TrtNo

Source  DF      SS      MS     F      P
TrtNo    2  308846  154423  8.44  0.005
Error   12  219531   18294
Total   14  528377

S = 135.3   R-Sq = 58.45%   R-Sq(adj) = 51.53%

Level  N    Mean  StDev
1      5  1408.7  151.0
2      5  1531.8  132.0
3      5  1755.4  121.1

(Character plot: Individual 95% CIs For Mean Based on Pooled StDev)

Pooled StDev = 135.3
26
BUT Including the Block Information
Two-way ANOVA: Yield versus TrtNo, BlkNo

Source  DF      SS      MS       F      P
TrtNo    2  308846  154423  149.82  0.000
BlkNo    4  211285   52821   51.25  0.000
Error    8    8246    1031
Total   14  528377

S = 32.10   R-Sq = 98.44%   R-Sq(adj) = 97.27%
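The gain from blocking is pure bookkeeping. The treatment SS is identical in the one-way and two-way tables, but the block SS and its degrees of freedom are taken out of the one-way error term, so the error mean square shrinks from about 18294 to about 1031 and F for treatments rises accordingly. The sketch below recomputes this from the two tables.

```python
# Numbers copied from the one-way and two-way (blocked) ANOVA tables.
ss_trt, df_trt = 308846, 2
ss_err_oneway, df_err_oneway = 219531, 12
ss_blk, df_blk = 211285, 4

# Blocking removes the block variation (and its df) from the error term.
ss_err_twoway = ss_err_oneway - ss_blk
df_err_twoway = df_err_oneway - df_blk

ms_trt = ss_trt / df_trt
f_oneway = ms_trt / (ss_err_oneway / df_err_oneway)
f_twoway = ms_trt / (ss_err_twoway / df_err_twoway)
f_blk = (ss_blk / df_blk) / (ss_err_twoway / df_err_twoway)

print(f"error SS {ss_err_oneway} -> {ss_err_twoway} on {df_err_twoway} df")
print(f"F for TrtNo: {f_oneway:.2f} (one-way) vs {f_twoway:.2f} (blocked)")
print(f"F for BlkNo: {f_blk:.2f}")
```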
27
Individual 95% CIs For Mean Based on Pooled StDev

TrtNo     Mean
1      1408.75
2      1531.76
3      1755.39

BlkNo     Mean
6      1741.14
9      1397.45
11     1559.46
16     1639.38
17     1489.07

(Character plots of the individual 95% CIs for the TrtNo and BlkNo means)
28
A More Complicated Model
  • Data were collected on the length of time
    patients were in hospital, which one of three
    medicines they were treated with, and which one
    of three vitamin supplements they were given.
  • 36 patients were used and they were randomly
    assigned to medicines and vitamins in 9 groups of
    4 (one group for each medicine and vitamin
    combination)

29
A graph of the data
30
The Simple ANOVA
One-way ANOVA: Days versus Medicine

Source    DF      SS     MS     F      P
Medicine   2   72.00  36.00  3.69  0.036
Error     33  322.00   9.76
Total     35  394.00

S = 3.124   R-Sq = 18.27%   R-Sq(adj) = 13.32%

Level   N    Mean  StDev
1      12   7.000  2.594
2      12  10.000  4.243
3      12  10.000  2.132

(Character plot: Individual 95% CIs For Mean Based on Pooled StDev)

Pooled StDev = 3.124
31
One-way ANOVA: Days versus Vitamin

Source   DF     SS    MS     F      P
Vitamin   2   24.0  12.0  1.07  0.355
Error    33  370.0  11.2
Total    35  394.0

S = 3.348   R-Sq = 6.09%   R-Sq(adj) = 0.40%

Level   N    Mean  StDev
1      12   8.000  3.015
2      12   9.000  2.594
3      12  10.000  4.221

(Character plot: Individual 95% CIs For Mean Based on Pooled StDev)

Pooled StDev = 3.348
32
2 way ANOVA
Two-way ANOVA: Days versus Medicine, Vitamin

Source    DF   SS       MS     F      P
Medicine   2   72  36.0000  3.74  0.035
Vitamin    2   24  12.0000  1.25  0.301
Error     31  298   9.6129
Total     35  394

S = 3.100   R-Sq = 24.37%   R-Sq(adj) = 14.61%
33
With the Interaction
Two-way ANOVA: Days versus Medicine, Vitamin

Source       DF   SS       MS     F      P
Medicine      2   72  36.0000  7.04  0.003
Vitamin       2   24  12.0000  2.35  0.115
Interaction   4  160  40.0000  7.83  0.000
Error        27  138   5.1111
Total        35  394

S = 2.261   R-Sq = 64.97%   R-Sq(adj) = 54.60%
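For a balanced design like this one (4 patients in each of the 9 cells), the sums of squares can be computed from cell, row, column and grand means. The sketch below uses a small hypothetical 2 x 2 data set with 3 replicates per cell, not the hospital data, which the slides show only as a graph. It is chosen so that factor A has no main effect at all while the interaction is large, which is the pattern that makes one-way analyses misleading.

```python
def two_way_anova(cells):
    """Sums of squares for a balanced two-factor design with replication.

    cells maps (level_a, level_b) -> list of observations; every
    combination must be present with the same number of replicates.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    na, nb = len(a_levels), len(b_levels)
    r = len(next(iter(cells.values())))          # replicates per cell
    all_obs = [v for obs in cells.values() for v in obs]
    grand = sum(all_obs) / len(all_obs)
    cell_mean = {key: sum(obs) / r for key, obs in cells.items()}
    a_mean = {a: sum(cell_mean[(a, b)] for b in b_levels) / nb for a in a_levels}
    b_mean = {b: sum(cell_mean[(a, b)] for a in a_levels) / na for b in b_levels}

    ss_a = r * nb * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = r * na * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = r * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_error = sum((v - cell_mean[key]) ** 2 for key, obs in cells.items() for v in obs)
    ss_total = sum((v - grand) ** 2 for v in all_obs)
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "Error": ss_error, "Total": ss_total}

# Hypothetical balanced data: 2 levels of A, 2 levels of B, 3 replicates.
cells = {
    ("A1", "B1"): [5, 6, 7],
    ("A1", "B2"): [9, 10, 11],
    ("A2", "B1"): [8, 9, 10],
    ("A2", "B2"): [6, 7, 8],
}
ss = two_way_anova(cells)
for source in ("A", "B", "AB", "Error", "Total"):
    print(f"{source:>5}: {ss[source]:.2f}")
```

The same partition is visible in the output above: the additive model's error SS of 298 on 31 df splits into 160 for the interaction (4 df) and 138 for error (27 df).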
34
Individual 95% CIs For Mean Based on Pooled StDev

Medicine  Mean
1            7
2           10
3           10

(Character plot of the individual 95% CIs for the Medicine means)

Vitamin  Mean
1           8
2           9
3          10

(Character plot of the individual 95% CIs for the Vitamin means)
35
Individual 95% CIs For Mean Based on Pooled StDev

Medicine  Mean
1            7
2           10
3           10

(Character plot of the individual 95% CIs for the Medicine means)

Vitamin  Mean
1           8
2           9
3          10

(Character plot of the individual 95% CIs for the Vitamin means)
36
(No Transcript)
37
Interaction Plot