Title: L12.1
1. Lecture 12: Generalized Linear Models (GLM)
- What are they?
- When do we use them?
- The full model
- The ANCOVA model
- The common regression model
- The extra sum of squares principle
- Assumptions
2. What are General(ized) Linear Models?
Multivariate models
- GLMs are models of the form $Y = Xb + e$,
- with Y, a vector of dependent variables; b, a
vector of estimated coefficients; X, a vector of
independent variables; and e, a vector of error
terms.
Simple linear regression
Multiple regression
Analysis of variance (ANOVA)
Analysis of covariance (ANCOVA)
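A minimal sketch of why these procedures share one form: simple regression and single-factor ANOVA differ only in how the columns of X are coded. The data, the coefficient values, and the helper name `predict` are made up for illustration.

```python
def predict(X, b):
    """Return the fitted values Xb for a design matrix X (list of rows)."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, b)) for row in X]

# Simple regression: columns are [1, x]; b = [intercept, slope].
x = [1.0, 2.0, 3.0]
X_regression = [[1.0, xi] for xi in x]
fitted_regression = predict(X_regression, [0.5, 2.0])

# Single-factor ANOVA with two groups: columns are [1, is_group_2];
# b = [mean of group 1, difference between the two group means].
groups = ["g1", "g1", "g2"]
X_anova = [[1.0, 1.0 if g == "g2" else 0.0] for g in groups]
fitted_anova = predict(X_anova, [10.0, 3.0])

print(fitted_regression)  # [2.5, 4.5, 6.5]
print(fitted_anova)       # [10.0, 10.0, 13.0]
```

ANCOVA simply combines both kinds of column (a continuous covariate and indicator columns for the categorical variable) in the same X.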
3. Some GLM procedures
[Table not transcribed; surviving footnote: either categorical or treated as a categorical
variable.]
4. When do we use ANCOVA?
- to compare the relationship between a dependent
(Y) and independent (X1) variable for different
levels of one or more categorical variables (X2)
- e.g., the relationship between body mass (Y) and body
size (X1) for different taxonomic groups (birds,
mammals; X2)
5. When do we use ANCOVA?
- In doing comparisons, we assume that the
qualitative form of the model is the same for all
levels of the categorical variables...
- otherwise, one is comparing apples and oranges!
6. When do we use ANCOVA?
- ANCOVA is used to compare linear models
- although ANCOVA-like extensions have been
developed for nonlinear models.
7. The simple regression model
- The regression model is $Y = a + bX + e$.
- So, all simple regression models are described by
2 parameters, the intercept (a) and slope (b).
a (intercept)
b = ΔY/ΔX (slope)
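As an illustration of the two parameters, here is a minimal least-squares sketch in plain Python. The function name `fit_line` and the data are hypothetical, chosen so that the answer is easy to check by hand.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope: change in Y per unit change in X
    a = y_bar - b * x_bar    # intercept: fitted Y at X = 0
    return a, b

# Made-up points lying exactly on y = 1 + 2x.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)  # 1.0 2.0
```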
8. Simple GLMs
- Two linear models may differ as follows:
- differences in both intercepts (a) and slopes (b)
- different intercepts but the same slopes (ANCOVA
model)
9. Simple GLMs
- Two linear models may also differ as follows:
- different slopes (b) but the same intercepts (a)
- same slopes and intercepts (common regression
model)
10. Fitting GLMs
- Proceeds in hierarchical fashion, fitting the most
complex model first.
- Evaluate the significance of a term by fitting two
models: one with the term in, the other with it
removed.
- Test for the change in model fit (Δ MF) associated
with removal of the term in question.
[Diagram labels: Model A (term in); Δ MF; Model B (term out); retain term (Δ large); delete term (Δ small).]
11. Model fitting: evaluating the significance of
model terms
- Fit the higher order model (hom) including all
possible terms; retain SSresidual and MSresidual.
- Fit the reduced model (rm); retain SSresidual.
- Test for significance of the removed term by
computing
$F = \frac{(SS_{residual,rm} - SS_{residual,hom}) / (df_{residual,rm} - df_{residual,hom})}{MS_{residual,hom}}$
[Diagram labels: higher order model; F; reduced model; retain term (p < .05); delete term (p > .05).]
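The comparison of a higher order and a reduced model can be sketched as follows, assuming the standard extra-sum-of-squares form of the test: the increase in residual SS on removing the term, divided by the number of parameters removed, over the residual mean square of the higher order model. The function name and the numbers are hypothetical.

```python
def extra_ss_f(ss_res_reduced, df_res_reduced, ss_res_full, df_res_full):
    """F statistic for the term(s) dropped from the higher order model.

    Returns (F, numerator df, denominator df).
    """
    df_term = df_res_reduced - df_res_full              # parameters removed
    ms_term = (ss_res_reduced - ss_res_full) / df_term  # extra SS per df
    ms_res_full = ss_res_full / df_res_full             # MSresidual of hom
    return ms_term / ms_res_full, df_term, df_res_full

# Made-up values: removing a 2-parameter term raises SSresidual from 100 to 120.
f_stat, df1, df2 = extra_ss_f(ss_res_reduced=120.0, df_res_reduced=27,
                              ss_res_full=100.0, df_res_full=25)
print(f_stat, df1, df2)  # 2.5 2 25
```

The resulting F is then referred to the F distribution with (df1, df2) degrees of freedom to obtain p.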
12. The full model with 2 independent variables
- The full model is
- bi is the slope of the regression of Y on X1 (the
covariate) estimated for level i of the
categorical variable X2.
- ai is the difference between the mean of each
level i of the categorical variable X2 and the
overall mean.
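A minimal sketch of what fitting the full model involves, assuming it is equivalent to a separate least-squares line for each level of X2 with the residual SS pooled over levels. The helper names and the data are hypothetical, and the intercepts returned here are ordinary per-level intercepts at X1 = 0 rather than the ai deviations described above.

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual SS)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, ss_res

def fit_full_model(x, y, level):
    """Fit one line per level; return the fits, pooled SSresidual and its df."""
    fits = {}
    for lv in sorted(set(level)):
        xs = [xi for xi, g in zip(x, level) if g == lv]
        ys = [yi for yi, g in zip(y, level) if g == lv]
        fits[lv] = fit_line(xs, ys)
    ss_res = sum(fit[2] for fit in fits.values())
    df_res = len(y) - 2 * len(fits)  # one intercept and one slope per level
    return fits, ss_res, df_res

# Made-up data: slope 1 for level "m", slope 2 for level "f".
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
level = ["m", "m", "m", "f", "f", "f"]
fits, ss_res, df_res = fit_full_model(x, y, level)
for lv, (a, b, _) in fits.items():
    print(lv, round(a, 3), round(b, 3))
print(round(ss_res, 6), df_res)
```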
13. The full model: null hypotheses
- For the full model with 2 independent variables,
there are 3 null hypotheses
15. Assumptions for full model hypothesis testing
- Residuals are independent and normally
distributed.
- Residual variance is equal for all values of X
and independent of the value of the categorical
variable (homoscedasticity).
- No error in independent variables.
- Relationship between Y and covariate is linear.
16. Procedure
- Fit full model, test for differences among
slopes.
- If H02 rejected, run separate regressions for
each level of categorical variable(s).
- If H02 accepted, proceed to fit ANCOVA model.
[Figure labels: H02 accepted; H02 rejected; level 1 of variable X2; level 2 of variable X2.]
17. The ANCOVA model with 2 independent variables
- The ANCOVA model is
- b is the slope of the regression of Y on X1 (the
covariate) pooled over levels of the categorical
variable X2.
- ai is the difference between the mean of each
level i of the categorical variable X2 and the
overall mean.
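A sketch of the pooled slope, assuming the usual least-squares estimate for a parallel-lines model: within-level cross-products and sums of squares are summed over levels before dividing. The function name and the data are hypothetical, and intercepts are reported per level at X1 = 0.

```python
def fit_ancova(x, y, level):
    """Common slope b, one intercept per level, SSresidual and its df."""
    levels = sorted(set(level))
    means = {}
    for lv in levels:
        xs = [xi for xi, g in zip(x, level) if g == lv]
        ys = [yi for yi, g in zip(y, level) if g == lv]
        means[lv] = (sum(xs) / len(xs), sum(ys) / len(ys))
    # Pool within-level deviations so that level means do not affect the slope.
    sxy = sum((xi - means[g][0]) * (yi - means[g][1])
              for xi, yi, g in zip(x, y, level))
    sxx = sum((xi - means[g][0]) ** 2 for xi, g in zip(x, level))
    b = sxy / sxx
    intercepts = {lv: means[lv][1] - b * means[lv][0] for lv in levels}
    ss_res = sum((yi - intercepts[g] - b * xi) ** 2
                 for xi, yi, g in zip(x, y, level))
    df_res = len(y) - len(levels) - 1  # one intercept per level, one slope
    return b, intercepts, ss_res, df_res

# Made-up data: two parallel lines with slope 2 and different intercepts.
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 4.0, 6.0, 8.0]
level = ["m", "m", "m", "f", "f", "f"]
b, intercepts, ss_res, df_res = fit_ancova(x, y, level)
print(round(b, 3), {k: round(v, 3) for k, v in intercepts.items()}, df_res)
```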
18. The ANCOVA model: null hypotheses
- For the ANCOVA model with 2 independent
variables, there are 2 null hypotheses
20. Assumptions for hypothesis testing in ANCOVA model
- Residuals are independent and normally
distributed.
- Residual variance is equal for all values of X
and independent of the value of the categorical
variable (homoscedasticity).
- No error in independent variables.
- Relationship between Y and covariate is linear.
- The slope of the regression of Y on X1 (the
covariate) is the same for all levels of the
categorical variable X2 (not an assumption for
full model!).
21. Procedure
- Fit ANCOVA model, test for differences among
intercepts.
- If H01 rejected, do multiple comparisons to see
which intercepts differ (if there are more than 2
levels for X2).
- If H01 accepted, proceed to fit common regression
model.
[Figure labels: H01 accepted; H01 rejected; level 1 of variable X2; level 2 of variable X2.]
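The two-step procedure (slopes first, then intercepts) can be summarized as a small decision function. This is only a sketch: the function name, the returned labels, and the example p-values are hypothetical, and the p-values would come from the F-tests on the full and ANCOVA models respectively.

```python
def choose_model(p_slopes, p_intercepts, alpha=0.05):
    """Pick a model from the hierarchical tests.

    p_slopes: p-value for equality of slopes (tested in the full model).
    p_intercepts: p-value for equality of intercepts (tested in the ANCOVA
    model; only consulted if the slopes are not found to differ).
    """
    if p_slopes < alpha:
        return "separate regressions"   # slopes differ among levels
    if p_intercepts < alpha:
        return "ANCOVA model"           # common slope, different intercepts
    return "common regression"          # neither term is retained

print(choose_model(p_slopes=0.01, p_intercepts=0.50))  # separate regressions
print(choose_model(p_slopes=0.40, p_intercepts=0.02))  # ANCOVA model
print(choose_model(p_slopes=0.40, p_intercepts=0.30))  # common regression
```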
22. The common regression model with 2 independent
variables
- The model is
- b is the slope of the regression of Y on X1
pooled over levels of the categorical variable X2.
- a is the pooled intercept.
- $\bar{X}_1$ is the pooled average of X1.
23. The common regression model: null hypotheses
- For the common regression model, there are 2 null
hypotheses
24. Assumptions for hypothesis testing in common
regression model
- Residuals are independent and normally
distributed.
- Residual variance is equal for all values of X.
- No error in independent variable.
- Relationship between Y and X is linear.
25. Example 1: effects of sex and age on sturgeon
size at The Pas
[Figure labels: Females; Males.]
26. Analysis
- Log(forklength) (LFKL) is the dependent variable,
log(age) (LAGE) is the covariate, and sex (SEX)
is the categorical variable (2 levels).
- Q1: is the slope of the regression of LFKL on LAGE the
same for both sexes?
[Figure labels: Males; Females.]
27. Effects of sex and age on size of sturgeon at The
Pas
28. Analysis
- Conclusion 1: the slope of the regression of LFKL on LAGE
is the same for both sexes (accept H03), since
p(SEX × LAGE) > .05.
- Q2: is the intercept the same for both males and
females?
[Figure labels: Males; Females.]
29. Effects of sex and age on size of sturgeon at The
Pas (ANCOVA model)
30. Analysis
- Conclusion 2: the intercept is the same for both
males and females. H02 is accepted since p(SEX) >
0.05, implying that the best model is the common
regression model.
- Note that the reduction in fit (R2) from the full model
to the ANCOVA model is negligible (.697 to .696),
indicating that deleting a model term has a
negligible impact on model fit.
[Figure labels: Males; Females.]
31. Effects of sex and age on size of sturgeon at The
Pas (common regression)
32. Example 2: effect of location and age on sturgeon
size
[Figure: two panels, each labelled LFKL.]
33. Analysis
- Log(forklength) (LFKL) is the dependent variable,
log(age) (LAGE) is the covariate, and location
(LOCATION) is the categorical variable (2 levels).
- Q: is the slope of the regression of LFKL on LAGE the
same at both locations?
[Figure labels: Lake of the Woods; Nelson River; LFKL.]
34. Effect of location and age on sturgeon size
35. Analysis
- Conclusion: the slope of the regression of LFKL on LAGE
is different at the two locations (reject H03),
since p(LOCATION × LAGE) < .05.
- So, one should fit individual regressions for each
location.
[Figure labels: Lake of the Woods; Nelson River; LFKL.]
36. What do you do if...
- More than 2 levels of categorical variable?
- Follow the above procedure, but if H03 (same slope) is
rejected, do pairwise contrasts of individual
slopes.
- If H03 is accepted but H02 (same intercepts) is
rejected, do pairwise comparisons of intercepts.
- Always control for experiment-wise Type I error
rate.
37. What do you do if...
- Biological hypothesis implies one-tailed null(s)?
- Follow the above procedure, but if H03 (same slope) is
rejected, do one-tailed pairwise contrasts of
individual slopes.
- If H03 is accepted but H02 (same intercepts) is
rejected, do one-tailed pairwise comparisons of
intercepts.
38. Power analysis in GLM
- In any GLM, hypotheses are tested by means of an
F-test.
- Remember that the appropriate SSerror and dferror
depend on the type of analysis and the
hypothesis under investigation.
- Knowing F, we can compute R2, the proportion of
the total variance in Y explained by the factor
(source) under consideration.
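A minimal sketch of recovering R2 from F, assuming the simplest situation in which the factor is the only source in the model, so that the total SS is the factor SS plus the error SS (simple regression or single-factor ANOVA). With other sources in the model the same expression gives SSfactor / (SSfactor + SSerror) rather than a share of the total variance. The function name and the numbers are hypothetical.

```python
def r2_from_f(f_stat, df_factor, df_error):
    """R2 implied by an F statistic when SStotal = SSfactor + SSerror.

    Follows from F = (R2 / df_factor) / ((1 - R2) / df_error).
    """
    return f_stat * df_factor / (f_stat * df_factor + df_error)

# Made-up check: R2 = 0.5 with 1 and 10 df corresponds to F = 10.
print(r2_from_f(f_stat=10.0, df_factor=1, df_error=10))  # 0.5
```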
39. Partial and total R2
- The total R2 ($R^2_{Y \cdot B}$) is the proportion of
variance in Y accounted for (explained by) a set
of independent variables B.
- The partial R2 ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$) is the
proportion of variance in Y accounted for by B
when the variance accounted for by another set A
is removed.
[Diagram labels: proportion of variance accounted for by both A and B ($R^2_{Y \cdot A,B}$); proportion of variance accounted for by A only ($R^2_{Y \cdot A}$, total R2); proportion of variance accounted for by B independent of A ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$, partial R2).]
40. Partial and total R2
- The total R2 ($R^2_{Y \cdot B}$) for set B equals the partial
R2 ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$) for set B if either (1) the
total R2 for A ($R^2_{Y \cdot A}$) is zero or (2) A and B
are independent (in which case $R^2_{Y \cdot A,B} = R^2_{Y \cdot A} + R^2_{Y \cdot B}$).
[Diagram labels: proportion of variance accounted for by B ($R^2_{Y \cdot B}$, total R2); proportion of variance independent of A ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$, partial R2); equal iff one of the two conditions holds.]
41. Partial and total R2
- In simple linear regression and single-factor
ANOVA, there is only one independent variable X
(either continuous or categorical).
- In these cases, set B includes only one variable
X, so total R2 ($R^2_{Y \cdot B}$) = total R2 ($R^2_{Y \cdot X}$), and
the partial and total R2 are the same.
42. Partial and total R2
- In ANCOVA and multiple-factor ANOVA, there are
several independent variables X1, X2, ... (either
continuous or categorical), so set B includes
several variables.
- In this case, the total and partial R2 may be
very different.
43. Example: partial and total R2 in ANCOVA
- Two independent variables: X1 (continuous) and X2
(categorical)
44. Defining effect size in GLM
- The effect size, denoted f2, is given by the
ratio of the factor (source) R2factor to 1 minus
the appropriate error R2error.
- Note that both R2factor and R2error depend on the
null hypothesis under investigation.
45. Effects of sex and age on size of sturgeon at The
Pas (common regression)
46. Defining effect size in GLM: case 1
- Case 1: a set B is related to Y, and the total R2
($R^2_{Y \cdot B}$) is determined.
- The error variance proportion is then
$1 - R^2_{Y \cdot B}$.
- H0: $R^2_{Y \cdot B} = 0$
- Example: effect of age on sturgeon size at The
Pas
- B = LAGE
47. Effects of sex and age on size of sturgeon at The
Pas
48. Effects of sex and age on size of sturgeon at The
Pas (ANCOVA model)
49. Defining effect size in GLM: case 2
- Case 2: the proportion of variance of Y due to B
over and above that due to A is determined
($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$).
- The error variance proportion is then $1 - R^2_{Y \cdot A,B}$.
- H0: $R^2_{Y \cdot A,B} - R^2_{Y \cdot A} = 0$
- Example: effect of SEX × LAGE on sturgeon size at
The Pas
- B = SEX × LAGE; A,B = SEX, LAGE, SEX × LAGE
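The two cases translate directly into effect sizes, since f2 is the factor R2 divided by the error variance proportion. The sketch below uses hypothetical function names. The case 2 call uses the R2 values quoted in the sturgeon example (.697 for the full model, .696 with the interaction removed), while the case 1 value of 0.5 is made up.

```python
def f2_case1(r2_b):
    """Case 1: f2 = R2(Y.B) / (1 - R2(Y.B))."""
    return r2_b / (1.0 - r2_b)

def f2_case2(r2_ab, r2_a):
    """Case 2: f2 = (R2(Y.A,B) - R2(Y.A)) / (1 - R2(Y.A,B))."""
    return (r2_ab - r2_a) / (1.0 - r2_ab)

# Case 1 with a made-up total R2 of 0.5.
print(f2_case1(0.5))                        # 1.0

# Case 2 for the interaction term: R2 rises only from .696 to .697.
print(round(f2_case2(0.697, 0.696), 4))     # 0.0033
```

The very small case 2 value matches the conclusion that dropping the interaction has a negligible impact on model fit.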
50. Determining power
- Once f2 has been determined, either a priori (as
an alternate hypothesis) or a posteriori (the
observed effect size), calculate the non-central F
parameter φ.
- Knowing φ and the factor (source) (ν1) and error (ν2)
degrees of freedom, we can determine power from
appropriate tables for a given α.
51. Example: effect of pH and nutrient levels on
growth rate of bass
- Sample of 35 lakes
- 3 pH levels: acid, circumneutral, basic
- For each lake, an estimate of growth rate is
obtained (e.g. from size-age regression).
- What is the probability of detecting a true effect
size as large as the sample effect size for pH × N
once effects of N and pH have been controlled
for, given α = .05?
52. Example: effect of pH and nutrient levels on
growth rate of bass
- Sample effect size f2 for pH once effects of N
and pH × N have been controlled for: 0.14
- Source (pH) df: ν1 = 2; error df: ν2 = 35 - 2 -
2 - 1 - 1 = 29
- Use tables of φ based on R2 to get power (NOT
the same tables as for ANOVA).
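The degrees-of-freedom arithmetic can be checked with a few lines of Python. How the subtracted terms map onto model terms is an assumption made here for illustration (2 df for pH, 2 for pH × N, 1 for N, 1 for the intercept), as are the variable names. The implied F uses the identity F = f2 × (error df) / (source df), which follows from the case 2 definition of f2.

```python
n_lakes = 35
df_source = 2            # 3 pH levels - 1
df_interaction = 2       # assumed: pH x N interaction
df_covariate = 1         # assumed: nutrient level N
df_intercept = 1

df_error = n_lakes - df_source - df_interaction - df_covariate - df_intercept
print(df_error)          # 29

f2 = 0.14                # sample effect size quoted in the example
implied_f = f2 * df_error / df_source
print(round(implied_f, 2))  # 2.03
```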