Title: Regression with Non-Normal Data: an Introduction
1. Regression with Non-Normal Data: an Introduction
- Outline
- A. Review GLMs
- B. Generalized Linear Models (GLiMs): General Formulation
- C. GLiM example: Logistic Regression
- D. GLiM example: Log-Linear (Poisson) Regression
- This material is supported by PB section 7.2
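2. A. Review GLMs (General Linear Models)
- Assumptions: data $(Y, X_1, X_2, \ldots, X_{p-1})$
- 1. $Y$ has a Normal distribution with mean $E[Y]$ and variance $\sigma^2$
- 2. $E[Y] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$
- for some unknown $\beta_0, \beta_1, \ldots, \beta_{p-1}$
- 3. The $X$s are measured with negligible error
- 4. Experimental runs $(Y_i, X_{i1}, X_{i2}, \ldots, X_{i,p-1})$ give independent $Y_i$s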
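As a minimal sketch of these assumptions, the R code below fits a general linear model to simulated data. The data, the coefficient values, and the variable names (x1, x2, y) are hypothetical and chosen only for illustration. It also shows that lm() and glm() with a Gaussian family and identity link return the same estimates.

```r
# Hypothetical simulated data: Normal Y, mean linear in x1 and x2
set.seed(42)
n  <- 100
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
y  <- 2 + 0.5 * x1 - 1.2 * x2 + rnorm(n, sd = 1)

# Ordinary least squares fit of the general linear model
fit_lm <- lm(y ~ x1 + x2)

# The same model written with glm(): Normal distribution, identity link
fit_glm <- glm(y ~ x1 + x2, family = gaussian(link = "identity"))

print(coef(fit_lm))
print(coef(fit_glm))
```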
3. Comments
- This GLM includes all classical regression, multiple regression, and fixed-effects ANOVA and ANCOVA models as special cases.
- General Linear Mixed Model (GLMM): The expression for $E[Y]$ in the GLM is conditional on the obtained values of any random effects. Then, add a line specifying the distribution of these random effects. Also, GLMMs allow the $Y$s to have unequal variances, and to be correlated.
4. B. GLiMs: General formulation
- Assumptions: data $(Y, X_1, X_2, \ldots, X_{p-1})$
- 1. $Y$ has distribution __________ (user specifies) with mean $E[Y] = \mu$
- 2. $g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$
- for some unknown $\beta_0, \beta_1, \ldots, \beta_{p-1}$ and some specified monotonic function $g(\cdot)$.
- 3. The $X$s are measured with negligible error
- 4. There may be other parameters for $Y$ as well, e.g. scale or shape parameters.
- 5. Experimental runs $(Y_i, X_{i1}, X_{i2}, \ldots, X_{i,p-1})$ give independent $Y_i$s
5. Comments
- $g(\cdot)$ is called the link function. Note that assumption (2) is equivalent to
- $\mu = g^{-1}(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1})$
- The specified distribution must be a member of the exponential family
- The assumptions imply a likelihood function for the unknowns $\beta_0, \beta_1, \ldots, \beta_{p-1}$ (plus scale, shape, etc. parameters)
- Obtain (hopefully) MLEs of the unknown parameters by iterative maximization algorithms (no starting values needed)
- All inference is approximate, using large-sample MLE theory
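In R, the user-specified distribution and link are bundled together in a family object. The sketch below uses only base R and checks numerically that applying a link and then its inverse recovers the mean, which is the equivalence noted for assumption (2). The grid of means mu is an arbitrary illustrative choice.

```r
# Family objects pair a distribution with a link function g()
fam_logit <- binomial(link = "logit")   # g(mu) = log(mu / (1 - mu))
fam_log   <- poisson(link = "log")      # g(mu) = log(mu)

mu <- c(0.1, 0.25, 0.5, 0.9)            # arbitrary means in (0, 1)

eta_logit <- fam_logit$linkfun(mu)      # values on the linear-predictor scale
eta_log   <- fam_log$linkfun(mu)

# g^{-1}(g(mu)) should return mu for any valid link
back_logit <- fam_logit$linkinv(eta_logit)
back_log   <- fam_log$linkinv(eta_log)

print(cbind(mu, eta_logit, back_logit, eta_log, back_log))
```

glm() accepts such a family object through its family argument and maximizes the likelihood by iteratively reweighted least squares.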
6. C. GLiM example: Logistic Regression
- Assumptions: data $(Y, X_1, X_2, \ldots, X_{p-1})$
- 1. $Y$ has a Bernoulli distribution (values 0 and 1) with success probability $\pi = E[Y]$
- 2. $\log[\pi/(1-\pi)] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$
- for some unknown $\beta_0, \beta_1, \ldots, \beta_{p-1}$
- 3. The $X$s are measured with negligible error
- 4. There may be a scale parameter for $Y$
- 5. Experimental runs $(Y_i, X_{i1}, X_{i2}, \ldots, X_{i,p-1})$ give independent $Y_i$s
7. Comments
- Here, we see the most popular link function for the case of Bernoulli $Y$: the logit function
- This implies that $E[Y]$ is a logistic function of the $X$s
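To see why the logit link makes $E[Y]$ a logistic function, let $\pi$ denote the success probability $E[Y]$ and write $\eta$ for the linear combination of the $X$s (the symbol $\eta$ is introduced here for convenience). Solving the link equation for $\pi$ gives:

```latex
\begin{align*}
\log\frac{\pi}{1-\pi} = \eta
  &\iff \frac{\pi}{1-\pi} = e^{\eta} \\
  &\iff \pi = \frac{e^{\eta}}{1+e^{\eta}} = \frac{1}{1+e^{-\eta}}
\end{align*}
```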
9. Comments
- The modeling of dichotomous $Y$ using the logit link function has come to be called logistic regression
- Includes ANOVA and ANCOVA models for the log-odds, $\log[\pi/(1-\pi)]$, where $\pi$ is the success probability
- Parameter meaning:
- $\beta_j$ = change in the log-odds of success, $\log_e[\pi/(1-\pi)]$, per unit increase in $X_j$, $j = 1, 2, \ldots, p-1$ (all other $X$s fixed).
- OR...in other words...
- $(e^{\beta_j} - 1) \times 100$ = percent increase in odds of success per unit increase in $X_j$
- $e^{\beta_0}$ = odds of success when all $X_j$s $= 0$ (if this is not an extrapolation).
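The percent-change interpretation follows from comparing the odds at $X_j = x + 1$ and at $X_j = x$ with the other predictors held fixed. Here $\mathrm{odds}(x)$ is shorthand, introduced only for this derivation, for the odds of success when $X_j = x$:

```latex
\begin{align*}
\frac{\mathrm{odds}(x+1)}{\mathrm{odds}(x)}
  &= \frac{\exp\{\beta_0 + \cdots + \beta_j (x+1) + \cdots\}}
          {\exp\{\beta_0 + \cdots + \beta_j x + \cdots\}}
   = e^{\beta_j}, \\
\frac{\mathrm{odds}(x+1) - \mathrm{odds}(x)}{\mathrm{odds}(x)} \times 100
  &= (e^{\beta_j} - 1) \times 100 .
\end{align*}
```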
10. Logistic Regression Example 7.4
- $X$ = dose of tBHQ (6 choices)
- "success" if micronuclei formed
- $Y$ = number of successes in 2000 independent (?) cells
- The text (Figs. 7.11-12) shows a SAS program and output for a simple logistic analysis using PROC LOGISTIC
- $b_0 = -5.8067$ (standard error 0.2345)
- $b_1 = 0.00283$ (standard error 0.000412)
11. Comments
- Note the syntax for fitting this model when data are summarized as total counts over $n$ independent trials at each $X$
- Note the PROC GENMOD syntax
- The LR/deviance test for $H_0: \beta_1 = 0$ is in the row labeled "-2 log L": $G^2_{calc} = 56.584$ on 1 df, $P < 0.0001$
- The score statistic for $H_0: \beta_1 = 0$ is in the row labeled "Score": $T^2_{calc} = 53.427$ on 1 df, $P < 0.0001$
- The Wald test for $H_0: \beta_1 = 0$ is in the MLE table: $W^2_{calc} = 47.2266$ on 1 df, $P < 0.0001$; however, the Wald test is not recommended here (it suffers from certain instabilities)
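For readers working in R rather than SAS, a rough analogue of the summarized-counts syntax and of the LR and Wald tests can be sketched as below. The original data are not reproduced in these notes, so the dose levels and counts here are hypothetical: they are simulated from the coefficients reported for Example 7.4, and the output will not match the SAS results.

```r
# Hypothetical dose levels and simulated counts (NOT the Example 7.4 data)
set.seed(7)
dose   <- c(0, 100, 200, 400, 600, 800)   # assumed values for the 6 doses
trials <- rep(2000, length(dose))         # 2000 cells at each dose
p_true <- plogis(-5.8067 + 0.00283 * dose)
events <- rbinom(length(dose), size = trials, prob = p_true)

# Summarized data: the response is cbind(successes, failures)
fit <- glm(cbind(events, trials - events) ~ dose,
           family = binomial(link = "logit"))

# LR (deviance) test of H0: beta1 = 0, on 1 df
G2   <- fit$null.deviance - fit$deviance
p_LR <- pchisq(G2, df = 1, lower.tail = FALSE)

# Wald test of H0: beta1 = 0, from the coefficient table
est <- coef(summary(fit))["dose", ]
W2  <- unname((est["Estimate"] / est["Std. Error"])^2)
p_W <- pchisq(W2, df = 1, lower.tail = FALSE)

print(c(G2 = G2, p_LR = p_LR))
print(c(W2 = W2, p_Wald = p_W))
```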
12. Estimated Parameter meaning...
- $b_1 = 0.00283$ = estimated change in the log-odds of success (micronuclei formation) per unit increase in dose $X_1$
- In other words...
- $(e^{b_1} - 1) \times 100 = (1.0028 - 1) \times 100 = 0.28$ = estimated percent change in odds of micronuclei formation per unit increase in dose $X_1$
- OR, for a 100-unit increase in $X$ (more typical),
- $(e^{100 b_1} - 1) \times 100 = (1.327 - 1) \times 100 \approx 33$
- = estimated percent change in odds of micronuclei formation per 100-unit increase in dose $X_1$
- Can back-transform a C.I. for $\beta_1$, also.
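The arithmetic on this slide, and the back-transformed confidence interval it mentions, can be sketched in R from the reported slope estimate and its standard error. The 95% level and the Wald-type interval (estimate plus or minus about 1.96 standard errors) are assumptions made here for illustration; the notes do not say which interval the text uses.

```r
b1    <- 0.00283     # reported slope estimate
se_b1 <- 0.000412    # reported standard error

# Percent change in odds per 1-unit and per 100-unit increase in dose
pct_1   <- (exp(b1) - 1) * 100
pct_100 <- (exp(100 * b1) - 1) * 100

# Assumed 95% Wald interval for beta1, then back-transformed to percent change
ci_beta    <- b1 + c(-1, 1) * qnorm(0.975) * se_b1
ci_pct_100 <- (exp(100 * ci_beta) - 1) * 100

print(round(c(pct_1 = pct_1, pct_100 = pct_100), 2))
print(round(ci_pct_100, 1))
```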
13. Estimated Parameter meaning...
- $e^{b_0} = 0.0030$ = estimated odds of micronuclei formation when dose $X = 0$
- When all else fails, make a picture, here in R:
> dosegrid <- seq(0, 800, 10)
> logoddshat <- (-5.8067) + (0.00283 * dosegrid)
> pihatgrid <- 1 / (1 + exp(-logoddshat))
> plot(dosegrid, pihatgrid, type = "l", lwd = 3, xlab = "tBHQ dose", ylab = "")
> title("Estimated Probability of Micronuclei Formation")
14. Dichotomous Response Epilogue...
- Goodness of Fit? Take 775 or CYFNS
- Model Comparison? Take 775 or CYFNS
- Other inferences (CIs, PIs, etc.)? Take 775 or...
- Other link functions are possible. For example, if the standard Normal quantile (probit) function is used, we are doing Probit Regression.
- Caveat: All proportions (successes/trials) are not necessarily binomial...
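As a small illustration of an alternative link, the sketch below compares the logit and probit links in base R on an arbitrary grid of probabilities. In glm(), the probit link is requested with binomial(link = "probit").

```r
p <- c(0.05, 0.25, 0.5, 0.75, 0.95)   # arbitrary probabilities

logit_p  <- qlogis(p)   # log(p / (1 - p))
probit_p <- qnorm(p)    # standard Normal quantile function

# The family object exposes the same probit link that glm() would use
fam_probit <- binomial(link = "probit")

print(cbind(p, logit_p, probit_p, glm_probit = fam_probit$linkfun(p)))
```

Both links map (0, 1) onto the whole real line and are zero at p = 0.5; they differ mainly in how quickly they grow in the tails.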
15. D. GLiM example: Log-Linear (Poisson) Regression
- What is a Poisson($\mu$) R.V.?
- This kind of random variable is often used to model the number of times an event occurs in a fixed amount of time (or space), e.g.
- number of gamma ray emissions in a millisecond
- number of sightings of a rare species in a fixed region and timeframe
- number of fish of a given species caught in a single trawl
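For reference, a Poisson random variable $Y$ with mean $\mu > 0$ has the standard probability mass function below, and its variance equals its mean:

```latex
\[
P(Y = y) = \frac{e^{-\mu}\,\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots,
\qquad E[Y] = \operatorname{Var}(Y) = \mu .
\]
```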
17. Log-Linear (Poisson) Regression
- Assumptions: data $(Y, X_1, X_2, \ldots, X_{p-1})$
- 1. $Y$ has a Poisson distribution (counts of some event in a fixed time / space) with mean $\mu$
- 2. $\log \mu = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$
- for some unknown $\beta_0, \beta_1, \ldots, \beta_{p-1}$
- 3. The $X$s are measured with negligible error
- 4. There may also be a scale parameter for $Y$
- 5. Experimental runs $(Y_i, X_{i1}, X_{i2}, \ldots, X_{i,p-1})$ give independent $Y_i$s
18. Comments
- Here, we see the most popular link function for the case of Poisson $Y$: the log link function
- This implies that $E[Y]$ is an exponential function of the $X$s
19. Comments
- Includes ANOVA and ANCOVA models for $\log \mu$ (where $\mu = E[Y]$)
- Parameter meaning:
- $\beta_j$ = change in the logarithm of $E[Y]$ per unit increase in $X_j$, $j = 1, 2, \ldots, p-1$.
- OR...in other words...
- $(e^{\beta_j} - 1) \times 100$ = percent increase in $E[Y]$ per unit increase in $X_j$ (all other $X$s fixed)
- $e^{\beta_0}$ = $E[Y]$ when all $X_j$s $= 0$ (if this is not an extrapolation).
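The percent-change statement can be checked directly from the log link. Writing $\mu(x)$ for $E[Y]$ when $X_j = x$ and the other predictors are held fixed (shorthand introduced only for this derivation):

```latex
\begin{align*}
\log \mu(x+1) - \log \mu(x) = \beta_j
  &\iff \frac{\mu(x+1)}{\mu(x)} = e^{\beta_j} \\
  &\iff \frac{\mu(x+1) - \mu(x)}{\mu(x)} \times 100 = (e^{\beta_j} - 1) \times 100 .
\end{align*}
```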
20. Log-Linear Regression Example 7.5
- $X$ = dose of nitrofen (toxic)
- $Y$ = number of C. dubia offspring
- The text (Figs. 7.13-15) shows a SAS program and output for log-linear and log-quadratic models using PROC GENMOD. These are enhanced by a handout.
- Discussion on the board...
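An R analogue of the log-linear and log-quadratic fits can be sketched as follows. The nitrofen data are not included in these notes, so the doses, sample size, and coefficients below are hypothetical and the counts are simulated; only the model syntax is the point.

```r
# Hypothetical design and simulated counts (NOT the Example 7.5 data)
set.seed(11)
dose <- rep(c(0, 1, 2, 3, 4), each = 10)       # assumed dose levels
mu   <- exp(3.2 + 0.3 * dose - 0.2 * dose^2)   # assumed true mean curve
y    <- rpois(length(dose), lambda = mu)

# Log-linear model: log(mu) = b0 + b1 * dose
fit_lin <- glm(y ~ dose, family = poisson(link = "log"))

# Log-quadratic model: log(mu) = b0 + b1 * dose + b2 * dose^2
fit_quad <- glm(y ~ dose + I(dose^2), family = poisson(link = "log"))

# LR (deviance) comparison of the two nested models, on 1 df
print(anova(fit_lin, fit_quad, test = "Chisq"))

# Estimated percent change in E[Y] per unit dose under the log-linear fit
print((exp(coef(fit_lin)["dose"]) - 1) * 100)
```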
21. Log-linear Regression Epilogue...
- Goodness of Fit? Take 775 or CYFNS
- Model Comparison? Take 775 or CYFNS
- Other inferences (CIs, PIs, etc.)? Take 775 or...
- Other link functions are possible
- Caveat: all counts are not necessarily Poisson...
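One common way to probe this caveat is to compare the Pearson chi-square statistic with its residual degrees of freedom; a ratio well above 1 suggests more variability than the Poisson model allows. The sketch below uses simulated negative binomial counts, a hypothetical choice made only to produce overdispersed data.

```r
# Simulated overdispersed counts (hypothetical; negative binomial, not Poisson)
set.seed(3)
n <- 200
x <- runif(n, 0, 2)
y <- rnbinom(n, size = 1, mu = exp(1.5 + 0.5 * x))

# Fit the Poisson log-linear model anyway
fit <- glm(y ~ x, family = poisson(link = "log"))

# Pearson dispersion estimate: close to 1 if the Poisson variance holds
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)
```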