Title: Revealing the riddle of REML
1Revealing the riddle of REML
- Mick ONeill
- Faculty of Agriculture, Food Natural Resources,
University of Sydney
2Background
- Biometry 1 and 2 are core units with an applied
stats focus. Many students have only Maths in
Society on entry to the Faculty - Biometry 3 is a Third Year elective
- Biometry 4 is (still) a possible major
- All students are now expected to design and
analyse their fourth year experiments with little
or no help from the Biometry Unit
3Third year Biometry students can
- Design and analyse multi-strata factorial
experiments (split-plots, strip-plots) - Perform binomial ordinal logistic regression,
Poisson regression, - Analyse repeated measures data using REML
4Step 1. What is Maximum Likelihood?
- The likelihood is the prior probability of
obtaining the actual data in your sample - This requires you to assume that the data follow
some distribution, typically - Binomial or Poisson for count data
- Normal or LogNormal for continuous data
5Step 1. What is Maximum Likelihood?
The likelihood is the prior probability of
obtaining the actual data in your sample Each of
these distributions involves at least one unknown
parameter (probability, mean, standard deviation,
) which must be estimated from the data.
6Step 1. What is Maximum Likelihood?
The likelihood is the prior probability of
obtaining the actual data in your
sample Estimation is achieved by finding that
parameter value which maximises the likelihood
(or equivalently the log-likelihood)
7Example 1. Binomial data
- Guess p P(seed germinates)
- Evaluate LogL
- Maximise LogL by varying p
8Example 2. Normal data
- Guess m and s
- Evaluate LogL
- Maximise LogL by varying m and s
9Step 2. What is REML?
It is possible to partition the likelihood into
two terms
- a likelihood that involves m (as well as s2)
- and
- a residual likelihood that involves only s2
10Step 2. What is REML?
It is possible to partition the likelihood into
two terms, in such a way that
- the first likelihood can be maximised to estimate
m (and its solution does not depend on the value
of s2) - the residual likelihood can be maximised to
estimate s2 ? REML estimate
11How?
12Solutions
- ML estimate of variance is
- REML estimate is
- In each case the estimate of m is the sample mean
13sn-1
sn
ML estimate
REML estimate
14Extensions
15(No Transcript)
16Example 3. One-way (no blocking)Fixed effects
17ANOVA vs REML
- ANOVA
- Source of variation d.f. s.s.
m.s. v.r. F pr. - Chick stratum
- Diet 3 0.01847
0.00616 0.10 0.958 - Residual 12 0.73230
0.06103 - Total 15 0.75078
- REML Variance Components Analysis
- Fixed model ConstantDiet
- Random model Chick
- Chick used as residual term
- Residual variance model
- Term Factor Model(order) Parameter
Estimate S.e. - Chick Identity Sigma2
0.0610 0.02491 - Wald tests for fixed effects
18- ANOVA
- Tables of means
- Grand mean 3.129
-
- Diet Diet_1 Diet_2 Diet_3 Diet_4
- 3.107 3.188 3.112 3.107
- Standard errors of differences of means
- Table Diet
- rep. 4
- d.f. 12
- s.e.d. 0.1747
- REML Variance Components Analysis
19Example 4a. One-way (in randomised blocks)
fixed treatments
- ANOVA
- Source of variation d.f. s.s.
m.s. v.r. F pr. -
- Block stratum 5 5.410
1.082 0.29 -
- Block.Units stratum
- Spacing 4 125.661
31.415 8.50 lt.001 - Residual 20 73.919
3.696 -
- REML Variance Components Analysis
- (a) With Block Spacing both fixed effects
- Term Factor Model(order) Parameter
Estimate S.e. - Residual Identity Sigma2
3.696 1.169 - Fixed term Wald statistic d.f.
Wald/d.f. Chi-sq prob - Block 1.46 5
0.29 0.917 - Spacing 34.00 4
8.50 lt0.001
20Random blocks, fixed treatments
- ANOVA
- Source of variation d.f. s.s.
m.s. v.r. F pr. - Block stratum 5 5.410
1.082 0.29 -
- Block.Units stratum
- Spacing 4 125.661
31.415 8.50 lt.001 - Residual 20 73.919
3.696 -
- REML Variance Components Analysis
- (b) With Spacing fixed and Block random
- Estimated Variance Components
- Random term Component S.e.
- Block 0.000 BOUND
-
- Residual variance model
- Term Factor Model(order) Parameter
Estimate S.e. - Residual Identity Sigma2
3.173 0.897 -
Estimated Source
Value Block -0.5228 Error
3.6959
21Model for RCBD
- Yield of soybean Overall mean Block effect
Spacing effect -
Error - Overall mean and Spacing effects are fixed
effects - Block effect is a random term
- Error is a random term
22General Linear Mixed Model
- Yield of soybean Overall mean Block effect
Spacing effect -
Error - Y Fixed effects Random effects Error term
- Y Xt Zu e
- The random effects can be correlated
- The error term can be correlated
- The random effects are uncorrelated with the
error term
23General Linear Mixed Model
- Y Fixed effects Random effects Error term
- Y Xt Zu e
is a scaling factor, often set to 1
24- REML is used as the default to estimate to
variance and covariance parameters of the model - The algorithm does not depend on the data being
balanced
25Furthermore, different choices for the variance
matrices allow for
- an appropriate repeated measures analysis for
normal data - an appropriate spatial analysis for field trials
Nested models can be compared using the change in
deviance which is approximately c2 with df
change in number of parameters
26Example 5. Adjusting thesis marks for random
markers
27For students
For markers
28Example 6. Use of devianceWidths (in mm) of the
dorsal shield of larvae of ticks found on 4
rabbits
29- Minitabs analysis
- Source DF Adj MS F P
- Rabbit 3 602.6 5.26 0.004
- Error 33 114.5
- Estimated
- Source Term Source Value
- 1 Rabbit (2) 9.0090 (1) Rabbit 54.18
- 2 Error (2) Error 114.48
- Rabbit Mean
- 1 372.3
- 2 354.4
- 3 355.3
- 4 361.3 these are sample means
30- GenStats Linear Mixed Models analysis
- Random term Component S.e.
- Rabbit 55.0 55.8
-
- Residual variance model
- Term Model(order) Parameter Estimate
S.e. - Residual Identity Sigma2 114.4
28.2 -
- Table of predicted means for Rabbit (these are
BLUPs) - Rabbit 1 2 3 4
- 369.9 355.5 356.0 361.2
-
- Standard error of differences Average
4.613 - Maximum
5.055 - Minimum
4.133 - Average variance of differences
21.38 -
31- Test H0
- Method drop Rabbit as a random term
- Deviance d.f.
- 221.21 35 for reduced model
- 215.22 34
- Change in deviance 6.0 with 1 df
- P-value 0.014
- The variation in the widths of the dorsal shield
of larvae of ticks found among rabbits differs
significantly across rabbits (P 0.014) - The variance among rabbits is estimated to be
55.0 (? 55.7) compared to the variance within
rabbits, namely 114.4 (? 28.2)
32Example 7 - Repeated Measures
Growth of a fungus (in cm) over time
33Growth of fungus
34- Split plot without Greenhouse-Geisser adjustmment
- (assumes equi-correlation structure among times)
-
- Source of variation d.f. m.s. v.r.
F pr. - Rep.Fungus stratum
- Fungus 2 8.104 97.30
lt.001 - Residual 9 0.083 3.37
-
- Rep.Fungus.Time stratum
- Time 5 55.231 2233.21
lt.001 - Fungus.Time 10 0.933 37.71
lt.001 - Residual 45 0.025
- Estimated stratum variances
- Stratum variance d.f. variance
component - Rep.Fungus 0.0833 9
0.0098 - Rep.Fungus.Time 0.0247 45
0.0247
35- Split plot with Greenhouse-Geisser adjustmment
- (tests equi-correlation structure among times)
-
- Source of variation d.f. m.s. v.r.
F pr. - Rep.Fungus stratum
- Fungus 2 8.104 97.30
lt.001 - Residual 9 0.083 3.37
-
- Rep.Fungus.Time stratum
- Time 5 55.231 2233.21
lt.001 - Fungus.Time 10 0.933 37.71
lt.001 - Residual 45 0.025
- (d.f. are multiplied by the correction factors
before calculating F probabilities) - Box's tests for symmetry of the covariance
matrix - Chi-square 57.47 on 19 df probability
0.000 - F-test 2.93 on 19, 859 df probability
0.000
36 Split plot via REML ignoring changing
variances
- Fixed model ConstantFungusTimeFungus.Tim
e - Random model Rep.FungusRep.Fungus.Time
- Estimated Variance Components
- Random term Component S.e.
- Rep.Fungus 0.00976 0.00660
- Residual variance model
- Term Model(order) Parameter
Estimate S.e. - Rep.Fungus.Time Identity Sigma2
0.0247 0.0052 -
- Deviance d.f.
- -109.90 52
- Fixed term Wald statistic d.f. Wald/d.f.
Chi-sq prob - Fungus 194.60 2 97.30
lt0.001 - Time 11166.05 5 2233.21
lt0.001 - Fungus.Time 377.08 10 37.71
lt0.001
37 Split plot via REML accounting for changing
variances (a)
- Fixed model ConstantFungusTimeFungus.Tim
e - Random model Rep.FungusRep.Fungus.Time
- Estimated Variance Components
- Random term Component S.e.
- Rep.Fungus 0.01053 0.00539
- Residual variance model
- Term Model(order) Parameter
Estimate S.e. - Rep.Fungus.Time Identity Sigma2
0.0082 .00428 - Rep Identity - -
- - Fungus Identity - -
- - Time Diagonal d_1
1.000 FIXED - d_2
1.102 0.815 - d_3
0.227 0.215 - d_4
0.262 0.253 - d_5
1.965 1.443 - d_6
13.550 9.580
38 Split plot via REML accounting for changing
variances (b)
- Fixed model ConstantFungusTimeFungus.Tim
e - Random model Rep.FungusRep.Fungus.Time
- Estimated Variance Components
- Random term Component S.e.
- Rep.Fungus 0.01053 0.00539
- Residual variance model
- Term Model(order) Parameter
Estimate S.e. - Rep.Fungus.Time Identity Sigma2
1.000 FIXED - Rep Identity - -
- - Fungus Identity - -
- - Time Diagonal d_1
0.0082 0.0043 - d_2
0.0091 0.0047 - d_3
0.0019 0.0015 - d_4
0.0022 0.0017 - d_5
0.0162 0.0081 - d_6
0.1116 0.0530
39 Split plot via REML accounting for changing
variances
and an AR(1) correlation structure
- Fixed model ConstantFungusTimeFungus.Tim
e - Random model Rep.FungusRep.Fungus.Time
- Estimated Variance Components
- Random term Component S.e.
- Rep.Fungus 0.010616 0.005572
- Residual variance model
- Term Model(order) Parameter
Estimate S.e. - Rep.Fungus.Time Identity Sigma2
0.0085 0.0045 - Rep Identity - -
- - Fungus Identity - -
- - Time AR(1) het phi_1
0.148 0.209 - d_1
1.000 FIXED - d_2
1.202 0.895 - d_3
0.260 0.249 - d_4
0.264 0.266 - d_5
1.829 1.347 - d_6
13.560 9.620
40 Split plot via REML accounting for changing
variances
and an AR(1) correlation structure
- Deviance d.f.
Change d.f. - Same variance, uncorrelated -109.90 52
- Different variances over time -151.03 47
41.13 5 - AR(1) correlation structure -151.59 46
0.56 1
41Example 8 Spatial analysis
RCBD (fixed) fertilisers Potato
yields (t/ha)
42- Source of variation d.f. m.s.
v.r. F pr. -
- Block stratum 3 2.929
1.43 -
- Block.Treatment stratum
- Treatment 5 92.359
45.07 lt.001 - Residual 15 2.049
- Treatment A B C D
E F - 17.55 25.60 27.67 25.94
30.51 30.63 -
- Standard errors of differences of means
-
- Table Treatment
- rep. 4
- d.f. 15
- s.e.d. 1.012
43Contour plot of residuals
44(No Transcript)
45- REML
-
- Random term Component S.e.
- Block 0.395 0.500
-
- Residual variance model
- Term Factor Model Parameter Estimate
S.e. - Y.X Sigma2 2.849
1.739 - Y AR(1) phi_1
0.7054 0.2078 - X AR(1) phi_1
-0.2508 0.3397 - Deviance d.f.
- 36.54 14
- Treatment A B C D
E F - 17.74 26.29 26.79 26.34
30.41 29.37 -
- Standard error of differences Average
0.7749 - Maximum
0.8942
46(No Transcript)