L12.1 - PowerPoint PPT Presentation

Slides: 53
Provided by: C974
1
Lecture 12 Generalized Linear Models (GLM)
  • What are they?
  • When do we use it?
  • The full model
  • The ANCOVA model
  • The common regression model
  • The extra sum of squares principle
  • Assumptions

2
What are General(ized) Linear Models?
Multivariate models
  • GLMs are models of the form $Y = Xb + e$,
    with Y, a vector of dependent variables, b, a
    vector of estimated coefficients, X, a vector of
    independent variables and e, a vector of error
    terms.

Simple linear regression
Multiple regression
Analysis of variance (ANOVA)
Analysis of covariance (ANCOVA)
3
Some GLM procedures
either categorical or treated as a categorical
variable
4
When do we use ANCOVA?
  • to compare the relationship between a dependent
    (Y) and independent (X1) variable for different
    levels of one or more categorical variables (X2)
  • e.g. relationship between body mass (Y) and body
    size (X1) for different taxonomic groups (birds
    mammals, X2)

5
When do we use ANCOVA?
  • In doing comparisons, we assume that the
    qualitative form of the model is the same for all
    levels of the categorical variables...
  • otherwise, one is comparing apples and oranges!

6
When do we use ANCOVA?
  • ANCOVA is used to compare linear models
  • although ANCOVA-like extensions have been
    developed for nonlinear models.

7
The simple regression model
  • The regression model is $Y = a + bX + e$.
  • So, all simple regression models are described by
    2 parameters, the intercept (a) and slope (b).

a (intercept)
b = ΔY/ΔX (slope)
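
As a minimal sketch of how the two parameters are estimated, the following fits the intercept (a) and slope (b) by ordinary least squares. The function name `fit_line` and the data are made up for illustration and are not from the lecture.

```python
def fit_line(x, y):
    """Ordinary least squares for Y = a + b*X; returns (a, b, ss_residual)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of X and cross-products of X and Y about their means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope: change in Y per unit change in X
    a = mean_y - b * mean_x       # intercept: predicted Y at X = 0
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, ss_residual


# Made-up data lying exactly on the line Y = 1 + 2X
a, b, ss_residual = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b, ss_residual)
```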
8
Simple GLMs
  • Two linear models may differ as follows:
  • differences in both intercepts (a) and slopes (b)
  • different intercepts but the same slopes (ANCOVA
    model)

9
Simple GLMs
  • Two linear models may also differ as follows:
  • different slopes (b) but the same intercepts (a)
  • same slopes and intercepts (common regression
    model)

10
Fitting GLMs
  • Proceeds in hierarchical fashion, fitting the most
    complex model first.
  • Evaluate the significance of a term by fitting two
    models: one with the term in, the other with it
    removed.
  • Test for the change in model fit (Δ MF) associated
    with removal of the term in question.

[Diagram: Model A (term in) → Δ MF → Model B (term out); retain term if Δ is large, delete term if Δ is small]
11
Model fitting: evaluating the significance of
model terms
  • Fit higher order model (hom) including all
    possible terms; retain SSresidual and MSresidual.
  • Fit reduced model (rm); retain SSresidual.
  • Test for significance of removed term by
    computing
    $F = \dfrac{(SS_{residual,rm} - SS_{residual,hom}) / (df_{residual,rm} - df_{residual,hom})}{MS_{residual,hom}}$

[Diagram: higher order model → F → reduced model; retain term if p < .05, delete term if p > .05]
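
The comparison of a higher order model with a reduced model can be sketched as a small function. This assumes the usual extra sum of squares form, in which the increase in residual SS is divided by the difference in residual degrees of freedom. The function name and the numbers in the usage line are hypothetical.

```python
def extra_ss_f(ss_res_rm, df_res_rm, ss_res_hom, df_res_hom):
    """F statistic for the term(s) removed when going from the higher
    order model (hom) to the reduced model (rm)."""
    df_term = df_res_rm - df_res_hom                 # df of the removed term(s)
    ms_term = (ss_res_rm - ss_res_hom) / df_term     # extra SS per df
    ms_res_hom = ss_res_hom / df_res_hom             # MSresidual of the hom
    return ms_term / ms_res_hom, df_term, df_res_hom


# Hypothetical residual sums of squares and degrees of freedom
f_stat, df1, df2 = extra_ss_f(ss_res_rm=12.0, df_res_rm=17,
                              ss_res_hom=8.0, df_res_hom=16)
print(f_stat, df1, df2)
```

The resulting F is compared with the F distribution on (df1, df2) degrees of freedom to obtain p.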
12
The full model with 2 independent variables
  • The full model is
  • $b_i$ is the slope of the regression of Y on X1 (the
    covariate) estimated for level i of the
    categorical variable X2.
  • $a_i$ is the difference between the mean of each
    level i of the categorical variable X2 and the
    overall mean.
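
The equation itself did not survive in the transcript. One form consistent with the definitions of $a_i$ and $b_i$ given on this slide is sketched below. The overall mean $\mu$, the within-level covariate mean $\bar{X}_i$, and the observation index j are assumptions introduced here, not symbols taken from the slide.

```latex
Y_{ij} = \mu + a_i + b_i \left( X_{ij} - \bar{X}_i \right) + e_{ij}
```

With the covariate centred on its level mean, $\mu + a_i$ is the mean of Y at level i, which matches the description of $a_i$ as a deviation from the overall mean.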

13
The full model null hypotheses
  • For the full model with 2 independent variables,
    there are 3 null hypotheses

15
Assumptions for full model hypothesis testing
  • Residuals are independent and normally
    distributed.
  • Residual variance is equal for all values of X
    and independent of the value of the categorical
    variable (homoscedasticity).
  • No error in independent variables
  • Relationship between Y and covariate is linear.

16
Procedure
  • Fit full model, test for differences among
    slopes.
  • If H02 rejected, run separate regressions for
    each level of categorical variable(s).
  • If H02 accepted, proceed to fit ANCOVA model.

[Diagram: H02 accepted vs. H02 rejected; regression lines for level 1 and level 2 of variable X2]
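
The first step of the procedure, testing for differences among slopes, can be sketched by comparing the residual SS of the full model (separate slopes) with that of the ANCOVA model (one pooled slope). This is a minimal pure-Python sketch; the helper names and the two small data sets are made up for illustration.

```python
def centered_sums(x, y):
    """Return (Sxx, Sxy, Syy): sums of squares and cross-products about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return sxx, sxy, syy


def slope_homogeneity_f(groups):
    """groups: list of (x, y) pairs, one per level of the categorical variable."""
    n_total = sum(len(x) for x, _ in groups)
    k = len(groups)
    parts = [centered_sums(x, y) for x, y in groups]

    # Full model: each level has its own intercept and its own slope.
    ss_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in parts)
    df_full = n_total - 2 * k

    # ANCOVA model: separate intercepts, one slope pooled within levels.
    sxx_w = sum(p[0] for p in parts)
    sxy_w = sum(p[1] for p in parts)
    syy_w = sum(p[2] for p in parts)
    ss_ancova = syy_w - sxy_w ** 2 / sxx_w
    df_ancova = n_total - k - 1

    df_term = df_ancova - df_full   # k - 1 df for differences among slopes
    f_stat = ((ss_ancova - ss_full) / df_term) / (ss_full / df_full)
    return f_stat, df_term, df_full


# Made-up data for two levels of X2 with visibly different slopes
level_1 = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
level_2 = ([1.0, 2.0, 3.0, 4.0], [1.0, 2.1, 2.9, 4.2])
f_stat, df1, df2 = slope_homogeneity_f([level_1, level_2])
print(f_stat, df1, df2)
```

A large F relative to the F distribution on (df1, df2) degrees of freedom points to separate regressions for each level; a small F points to the ANCOVA model.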
17
The ANCOVA model with 2 independent variables
  • The ANCOVA model is
  • b is the slope of the regression of Y on X1 (the
    covariate) pooled over levels of the categorical
    variable X2.
  • $a_i$ is the difference between the mean of each
    level i of the categorical variable X2 and the
    overall mean.
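
The equation is missing from the transcript. A form consistent with the definitions above, with a single pooled slope b in place of level-specific slopes, might be written as follows. The symbols $\mu$, $\bar{X}_i$, and the index j are assumptions introduced for this sketch.

```latex
Y_{ij} = \mu + a_i + b \left( X_{ij} - \bar{X}_i \right) + e_{ij}
```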

18
The ANCOVA model null hypotheses
  • For the ANCOVA model with 2 independent
    variables, there are 2 null hypotheses

20
Assumptions for hypothesis testing in ANCOVA model
  • Residuals are independent and normally
    distributed.
  • Residual variance is equal for all values of X
    and independent of the value of the categorical
    variable (homoscedasticity).
  • No error in independent variables
  • Relationship between Y and covariate is linear.
  • The slope of the regression of Y on X1 (the
    covariate) is the same for all levels of the
    categorical variable X2 (not an assumption for
    full model!).

21
Procedure
  • Fit ANCOVA model, test for differences among
    intercepts.
  • If H01 rejected, do multiple comparisons to see
    which intercepts differ (if there are more than 2
    levels for X2).
  • If H01 accepted, proceed to fit common regression
    model.

[Diagram: H01 accepted vs. H01 rejected; regression lines for level 1 and level 2 of variable X2]
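
The test for differences among intercepts can be sketched in the same way, this time comparing the ANCOVA model (separate intercepts, pooled slope) with the common regression model (one intercept, one slope). The helper names and data are hypothetical.

```python
def centered_sums(x, y):
    """Return (Sxx, Sxy, Syy): sums of squares and cross-products about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return sxx, sxy, syy


def intercept_f(groups):
    """groups: list of (x, y) pairs, one per level of the categorical variable."""
    n_total = sum(len(x) for x, _ in groups)
    k = len(groups)

    # ANCOVA model: separate intercepts, one slope pooled within levels.
    parts = [centered_sums(x, y) for x, y in groups]
    sxx_w = sum(p[0] for p in parts)
    sxy_w = sum(p[1] for p in parts)
    syy_w = sum(p[2] for p in parts)
    ss_ancova = syy_w - sxy_w ** 2 / sxx_w
    df_ancova = n_total - k - 1

    # Common regression model: ignore the levels and fit one line to all data.
    all_x = [xi for x, _ in groups for xi in x]
    all_y = [yi for _, y in groups for yi in y]
    sxx_t, sxy_t, syy_t = centered_sums(all_x, all_y)
    ss_common = syy_t - sxy_t ** 2 / sxx_t
    df_common = n_total - 2

    df_term = df_common - df_ancova   # k - 1 df for differences among intercepts
    f_stat = ((ss_common - ss_ancova) / df_term) / (ss_ancova / df_ancova)
    return f_stat, df_term, df_ancova


# Made-up data for two levels of X2
level_1 = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
level_2 = ([1.0, 2.0, 3.0, 4.0], [1.0, 2.1, 2.9, 4.2])
f_stat, df1, df2 = intercept_f([level_1, level_2])
print(f_stat, df1, df2)
```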
22
The common regression model with 2 independent
variables
  • The model is
  • b is the slope of the regression of Y on X1
    pooled over levels of the categorical variable X2.
  • a is the pooled intercept.
  • $\bar{X}$ is the pooled average of X1.
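
The equation is missing from the transcript. Given that the slide defines a pooled intercept, a pooled slope, and a pooled covariate mean, one plausible form is the following. The centring on $\bar{X}$ and the indices i and j are assumptions for this sketch.

```latex
Y_{ij} = a + b \left( X_{ij} - \bar{X} \right) + e_{ij}
```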

23
The common regression model null hypotheses
  • For the common regression model, there are 2 null
    hypotheses

24
Assumptions for hypothesis testing in common
regression model
  • Residuals are independent and normally
    distributed.
  • Residual variance is equal for all values of X.
  • No error in independent variable
  • Relationship between Y and X is linear.

25
Example 1: effects of sex and age on sturgeon
size at The Pas
[Figure panels: Females, Males]
26
Analysis
  • Log(forklength) (LFKL) is the dependent variable,
    log(age) (LAGE) is the covariate, and sex (SEX)
    is the categorical variable (2 levels).
  • Q1: is the slope of the regression of LFKL on LAGE the
    same for both sexes?

[Figure panels: Males, Females]
27
Effects of sex and age on size of sturgeon at The
Pas
28
Analysis
  • Conclusion 1: the slope of the regression of LFKL on LAGE
    is the same for both sexes (accept H03) since
    p(SEX*LAGE) > .05.
  • Q2: is the intercept the same for both males and
    females?

[Figure panels: Males, Females]
29
Effects of sex and age on size of sturgeon at The
Pas (ANCOVA model)
30
Analysis
  • Conclusion 2: the intercept is the same for both
    males and females. H02 is accepted since p(SEX) >
    0.05, implying that the best model is the common
    regression model.
  • Note that the reduction in fit (R2) from the full model
    to the ANCOVA model is negligible (.697 to .696),
    indicating that deleting a model term has a
    negligible impact on model fit.

[Figure panels: Males, Females]
31
Effects of sex and age on size of sturgeon at The
Pas (common regression)
32
Example 2: effect of location and age on sturgeon
size
[Figure: two panels, y-axis LFKL]
33
Analysis
  • Log(forklength) (LFKL) is the dependent variable,
    log(age) (LAGE) is the covariate, and location
    (LOCATION) is the categorical variable (2 levels).
  • Q: is the slope of the regression of LFKL on LAGE the
    same at both locations?

[Figure panels: Lake of the Woods, Nelson River; y-axis LFKL]
34
Effect of location and age on sturgeon size
35
Analysis
  • Conclusion: the slope of the regression of LFKL on LAGE
    is different at the two locations (reject H03)
    since p(LOCATION*LAGE) < .05.
  • So, we should fit individual regressions for each
    location.

[Figure panels: Lake of the Woods, Nelson River; y-axis LFKL]
36
What do you do if?
  • More than 2 levels of categorical variable?
  • Follow above procedure but if H03 (same slope)
    rejected, do pairwise contrasts of individual
    slopes.
  • If H03 accepted but H02 (same intercepts)
    rejected, do pairwise comparisons of intercepts.
  • Always control for experiment-wise Type I error
    rate.

37
What do you do if?
  • Biological hypothesis implies one-tailed null(s)?
  • Follow above procedure but if H03 (same slope)
    rejected, do one-tailed pairwise contrasts of
    individual slopes.
  • If H03 accepted but H02 (same intercepts)
    rejected, do one-tailed pairwise comparisons of
    intercepts.

38
Power analysis in GLM
  • In any GLM, hypotheses are tested by means of an
    F-test.
  • Remember: the appropriate SSerror and dferror
    depend on the type of analysis and the
    hypothesis under investigation.
  • Knowing F, we can compute R2, the proportion of
    the total variance in Y explained by the factor
    (source) under consideration.

39
Partial and total R2
Proportion of variance accounted for by both
A and B ($R^2_{Y \cdot A,B}$)
  • The total R2 ($R^2_{Y \cdot B}$) is the proportion of
    variance in Y accounted for (explained by) a set
    of independent variables B.
  • The partial R2 ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$) is the
    proportion of variance in Y accounted for by B
    when the variance accounted for by another set A
    is removed.

Proportion of variance accounted for by A
only ($R^2_{Y \cdot A}$) (total R2)
Proportion of variance accounted for by
B independent of A ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$) (partial R2)
40
Partial and total R2
Proportion of variance accounted for by
B ($R^2_{Y \cdot B}$) (total R2)
Proportion of variance independent of
A ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$) (partial R2)
  • The total R2 ($R^2_{Y \cdot B}$) for set B equals the partial
    R2 ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$) for set B if either (1) the
    total R2 for A ($R^2_{Y \cdot A}$) is zero or (2) A and B
    are independent (in which case
    $R^2_{Y \cdot A,B} = R^2_{Y \cdot A} + R^2_{Y \cdot B}$).

Equal iff
41
Partial and total R2
  • In simple linear regression and single-factor
    ANOVA, there is only one independent variable X
    (either continuous or categorical).
  • In these cases, set B includes only one variable
    X and total R2 ($R^2_{Y \cdot B}$) = total R2 ($R^2_{Y \cdot X}$), and
    the partial and total R2 are the same.

42
Partial and total R2
  • In ANCOVA and multiple-factor ANOVA, there are
    several independent variables X1, X2, ... (either
    continuous or categorical), so set B includes
    several variables.
  • In this case, the total and partial R2 may be
    very different.

43
Example: partial and total R2 in ANCOVA
  • Two independent variables: X1 (continuous) and X2
    (categorical)

44
Defining effect size in GLM
  • The effect size, denoted $f^2$, is given by the
    ratio of the factor (source) $R^2_{factor}$ to 1 minus
    the appropriate error $R^2_{error}$:
    $f^2 = R^2_{factor} / (1 - R^2_{error})$.
  • Note: both $R^2_{factor}$ and $R^2_{error}$ depend on the
    null hypothesis under investigation.

45
Effects of sex and age on size of sturgeon at The
Pas (common regression)
46
Defining effect size in GLM: case 1
  • Case 1: a set B is related to Y, and the total R2
    ($R^2_{Y \cdot B}$) is determined.
  • The error variance proportion is then
    $1 - R^2_{Y \cdot B}$.
  • H0: $R^2_{Y \cdot B} = 0$
  • Example: effect of age on sturgeon size at The
    Pas
  • B = {LAGE}

47
Effects of sex and age on size of sturgeon at The
Pas
48
Effects of sex and age on size of sturgeon at The
Pas (ANCOVA model)
49
Defining effect size in GLM: case 2
  • Case 2: the proportion of variance of Y due to B
    over and above that due to A is determined
    ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$).
  • The error variance proportion is then
    $1 - R^2_{Y \cdot A,B}$.
  • H0: $R^2_{Y \cdot A,B} - R^2_{Y \cdot A} = 0$
  • Example: effect of SEX*LAGE on sturgeon size at
    The Pas
  • B = {SEX*LAGE}; A,B = {SEX, LAGE, SEX*LAGE}
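
The two cases can be sketched as small functions. The function names are hypothetical. The R2 values in the usage line are the ones quoted earlier for the sturgeon example (.697 for the full model, .696 for the ANCOVA model); treating them as the R2 with and without the SEX*LAGE term is an assumption.

```python
def f2_case_1(r2_yb):
    """Case 1: effect size for a set B taken on its own."""
    return r2_yb / (1.0 - r2_yb)


def f2_case_2(r2_yab, r2_ya):
    """Case 2: effect size for B over and above A."""
    return (r2_yab - r2_ya) / (1.0 - r2_yab)


# R2 with the interaction term (A,B) and without it (A)
f2_interaction = f2_case_2(r2_yab=0.697, r2_ya=0.696)
print(round(f2_interaction, 4))
```

With these inputs the effect size of the interaction is very small, in line with the earlier remark that dropping the term has a negligible impact on fit.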

50
Determining power
  • Once $f^2$ has been determined, either a priori (as
    an alternate hypothesis) or a posteriori (the
    observed effect size), calculate the non-central F
    parameter $\phi$.
  • Knowing $\phi$ and the factor (source) ($\nu_1$) and error ($\nu_2$)
    degrees of freedom, we can determine power from
    appropriate tables for a given $\alpha$.

51
Example: effect of pH and nutrient levels on
growth rate of bass
  • Sample of 35 lakes
  • 3 pH levels: acid, circumneutral, basic
  • For each lake, an estimate of growth rate is
    obtained (e.g. from size-age regression).
  • What is the probability of detecting a true effect
    size as large as the sample effect size for pH*N
    once effects of N and pH have been controlled
    for, given $\alpha = .05$?

52
Example: effect of pH and nutrient levels on
growth rate of bass
  • Sample effect size $f^2$ for pH once effects of N
    and pH*N have been controlled for = 0.14
  • Source (pH) df $\nu_1 = 2$; error df $\nu_2 = 35 - 2 - 2 - 1 - 1 = 29$
  • Use tables of $\phi$ based on R2 to get power (NOT
    the same tables as for ANOVA).
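
The degrees of freedom in this example can be reproduced with a few lines of arithmetic. The slides do not show how the noncentrality parameter is computed. The sketch below uses one widely used convention (Cohen's), lambda = f2 * (nu1 + nu2 + 1), which is an assumption here and may not be the quantity the lecture's tables are indexed by. The labels attached to each subtracted df are also an interpretation of the slide's arithmetic.

```python
n_lakes = 35

# Interpretation (assumed) of the slide's "35 - 2 - 2 - 1 - 1"
df_ph = 2          # 3 pH levels
df_ph_by_n = 2     # pH*N interaction
df_n = 1           # nutrient level N
df_intercept = 1

nu1 = df_ph
nu2 = n_lakes - df_ph - df_ph_by_n - df_n - df_intercept

f2 = 0.14          # sample effect size quoted on the slide

# Assumed convention (Cohen): noncentrality lambda = f2 * (nu1 + nu2 + 1)
noncentrality = f2 * (nu1 + nu2 + 1)
print(nu1, nu2, noncentrality)
```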