Regression with Non-Normal Data: An Introduction

Slides: 22

Provided by: DEdw7


Outline
A. Review: GLMs
B. Generalized Linear Models: General Formulation
C. GLiM example: Logistic Regression
D. GLiM example: Log-Linear (Poisson) Regression
This material is supported by PB section 7.2.

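A. Review: GLMs
Assumptions: data $(Y, X_1, X_2, \ldots, X_{p-1})$
1. Y has a Normal distribution with mean $E[Y]$ and variance $\sigma^2$
2. $E[Y] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$ for some unknown $\beta_0, \beta_1, \ldots, \beta_{p-1}$
3. Xs measured with negligible error
4. Experimental runs $(Y_i, X_{i1}, X_{i2}, \ldots, X_{i,p-1})$ give independent $Y_i$'s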

Comments
The GLM includes all classical regression, multiple regression, and fixed effects ANOVA and ANCOVA models as special cases.
General Linear Mixed Model (GLMM): the expression for $E[Y]$ in the GLM is taken to be conditional on the obtained values of any random effects, and a further line specifies the distribution of these random effects. GLMMs also allow the Y's to have unequal variances and to be correlated.

B. GLiMs: General formulation
Assumptions: data $(Y, X_1, X_2, \ldots, X_{p-1})$
1. Y has distribution __________ (user specifies) with mean $E[Y] = \mu$
2. $g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$ for some unknown $\beta_0, \beta_1, \ldots, \beta_{p-1}$ and some specified monotonic function $g()$
3. Xs measured with negligible error
4. There may be other parameters for Y, also, e.g. scale, shape parameters
5. Experimental runs $(Y_i, X_{i1}, X_{i2}, \ldots, X_{i,p-1})$ give independent $Y_i$'s
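
Assumption 2 can be illustrated with a short Python sketch. It is not part of the slides: the names `LINKS`, `linear_predictor`, and `mean_from_predictor`, and the coefficient values, are assumptions made for illustration. Each link is paired with its inverse, and the mean is recovered by applying the inverse link to the linear predictor.

```python
import math

# Each entry pairs a link g with its inverse g^{-1}.
# The dictionary name and keys are illustrative, not from the slides.
LINKS = {
    "identity": (lambda m: m, lambda eta: eta),
    "logit": (lambda m: math.log(m / (1.0 - m)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "log": (math.log, math.exp),
}

def linear_predictor(betas, xs):
    """Return b0 + b1*x1 + ... + b_{p-1}*x_{p-1}."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))

def mean_from_predictor(link_name, betas, xs):
    """Apply the inverse link to the linear predictor to get the mean."""
    _, inverse = LINKS[link_name]
    return inverse(linear_predictor(betas, xs))

# Hypothetical coefficients and covariate values, for illustration only.
betas = [0.5, -1.0, 0.25]
xs = [1.0, 2.0]
for name in LINKS:
    print(name, mean_from_predictor(name, betas, xs))
```

The same linear predictor gives a different mean under each link, which is why the link has to be specified along with the distribution.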

Comments
$g()$ is called the link function. Note that assumption (2) is equivalent to $\mu = g^{-1}(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1})$.
The specified distribution must be a member of the exponential family.
The assumptions imply a likelihood function for the unknowns $\beta_0, \beta_1, \ldots, \beta_{p-1}$ (plus scale, shape, etc. parameters).
Obtain (hopefully) MLEs of the unknown parameters by iterative maximization algorithms (no starting values needed).
All inference is approximate, using large-sample MLE theory.
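
To make "iterative maximization" concrete, here is a minimal Python sketch of Newton-Raphson for a one-predictor model with a logit link and binomial counts. The slides do not say which algorithm the software uses, so this is only one common choice. The function name `fit_logistic` and the data are assumptions: the counts are artificial, noise-free expected counts generated from known coefficients (-0.5, 0.8), so the maximizer should reproduce those coefficients. The iteration starts from zero, so the user supplies no starting values.

```python
import math

def fit_logistic(x, y, n, tol=1e-10, max_iter=50):
    """Newton-Raphson MLE for logit(pi_i) = b0 + b1*x_i, y_i successes in n_i trials."""
    def loglik(b0, b1):
        # Binomial log-likelihood up to an additive constant.
        total = 0.0
        for xi, yi, ni in zip(x, y, n):
            eta = b0 + b1 * xi
            total += yi * eta - ni * math.log1p(math.exp(eta))
        return total

    b0, b1 = 0.0, 0.0  # default start: all coefficients zero
    for _ in range(max_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0  # score vector and information matrix
        for xi, yi, ni in zip(x, y, n):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = yi - ni * p            # observed minus fitted count
            w = ni * p * (1.0 - p)     # binomial variance
            u0 += r
            u1 += r * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det   # Newton direction: information^{-1} * score
        d1 = (-i01 * u0 + i00 * u1) / det
        step, old = 1.0, loglik(b0, b1)
        # Step halving: shrink the step if it would lower the log-likelihood.
        while loglik(b0 + step * d0, b1 + step * d1) < old and step > 1e-8:
            step /= 2.0
        b0 += step * d0
        b1 += step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1

# Artificial, noise-free data (not from the slides): expected counts under (-0.5, 0.8).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
n = [100.0] * 5
y = [ni / (1.0 + math.exp(-(-0.5 + 0.8 * xi))) for xi, ni in zip(x, n)]
est = fit_logistic(x, y, n)
print(est)
```

With real data the fitted values would not match the counts exactly, and convergence is not guaranteed, which is the point of the "(hopefully)" above.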

C. GLiM example: Logistic Regression
Assumptions: data $(Y, X_1, X_2, \ldots, X_{p-1})$
1. Y has a Bernoulli distribution (values 0 and 1) with success probability $\pi = E[Y]$
2. $\log[\pi/(1-\pi)] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$ for some unknown $\beta_0, \beta_1, \ldots, \beta_{p-1}$
3. Xs measured with negligible error
4. There may be a scale parameter for Y
5. Experimental runs $(Y_i, X_{i1}, X_{i2}, \ldots, X_{i,p-1})$ give independent $Y_i$'s

Comments
Here, we see the most popular link function for the case of Bernoulli Y, the logit function.
This implies that $E[Y]$ is a logistic function of the Xs.
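
The step from the logit link to a logistic mean function can be written out as follows. The symbol $\eta$ for the linear predictor is notation introduced here, not taken from the slides.

```latex
\[
\log\frac{\pi}{1-\pi} = \eta
\;\Longleftrightarrow\;
\frac{\pi}{1-\pi} = e^{\eta}
\;\Longleftrightarrow\;
\pi = \frac{e^{\eta}}{1+e^{\eta}} = \frac{1}{1+e^{-\eta}},
\qquad
\eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1}.
\]
```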


Comments
The modeling of dichotomous Y using the logit link function has come to be called logistic regression.
Includes ANOVA and ANCOVA models for the log-odds, $\log[\pi/(1-\pi)]$.
Parameter meaning:
$\beta_j$ = change in the log-odds of success, $\log_e[\pi/(1-\pi)]$, per unit increase in $X_j$, $j = 1, 2, \ldots, p-1$.
OR...in other words...
$(e^{\beta_j} - 1) \times 100$ = percent increase in the odds of success per unit increase in $X_j$.
$e^{\beta_0}$ = odds of success when all $X_j$'s $= 0$ (if this is not an extrapolation).
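
The percent-change reading follows from comparing the odds at $X_j + 1$ with the odds at $X_j$, holding the other Xs fixed. This is a short derivation added for clarity; the notation $\mathrm{odds}(\cdot)$ is introduced here.

```latex
\[
\frac{\mathrm{odds}(X_j + 1)}{\mathrm{odds}(X_j)}
= \frac{\exp\{\beta_0 + \cdots + \beta_j (X_j + 1) + \cdots + \beta_{p-1} X_{p-1}\}}
       {\exp\{\beta_0 + \cdots + \beta_j X_j + \cdots + \beta_{p-1} X_{p-1}\}}
= e^{\beta_j},
\qquad\text{so}\qquad
\frac{\mathrm{odds}(X_j + 1) - \mathrm{odds}(X_j)}{\mathrm{odds}(X_j)} \times 100
= (e^{\beta_j} - 1) \times 100 .
\]
```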

Logistic Regression Example 7.4

X = dose of tBHQ (6 choices)
"success" if micronuclei formed
Y = number of successes in 2000 independent (?) cells
The text (Figs. 7.11-12) shows a SAS program and output for a simple logistic analysis using PROC LOGISTIC:
$b_0 = -5.8067$ (0.2345)
$b_1 = 0.00283$ (0.000412)

Comments
Note the syntax for fitting this model when the data are summarized as total counts over n independent trials at each X.
Note the PROC GENMOD syntax.
The LR/deviance test for $H_0: \beta_1 = 0$ is in the row labeled "-2 log L": $G^2_{calc} = 56.584$ on 1 df, $P < 0.0001$.
The score statistic for $H_0: \beta_1 = 0$ is in the row labeled "Score": $T^2_{calc} = 53.427$ on 1 df, $P < 0.0001$.
The Wald test for $H_0: \beta_1 = 0$ is in the MLE table: $W^2_{calc} = 47.2266$ on 1 df, $P < 0.0001$; however, the Wald test is not recommended here (it suffers from certain instabilities).
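
As a rough check on these three statistics, the Python sketch below computes chi-square(1) upper-tail probabilities and rebuilds the Wald statistic from the reported slope. It assumes the values in parentheses after the estimates are standard errors; the function name `chi2_1df_pvalue` is illustrative.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square(1) variable: P(Z^2 > stat)."""
    return math.erfc(math.sqrt(stat / 2.0))

b1, se_b1 = 0.00283, 0.000412   # slope estimate and (assumed) standard error
wald = (b1 / se_b1) ** 2         # close to the reported 47.2266; inputs are rounded

for label, stat in [("LR", 56.584), ("Score", 53.427), ("Wald", wald)]:
    print(label, round(stat, 3), chi2_1df_pvalue(stat) < 0.0001)
```

All three statistics lead to the same conclusion here, even though their values differ.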

Estimated Parameter meaning...
$b_1 = 0.00283$ = estimated change in the log-odds of success (micronuclei formation) per unit increase in dose $X_1$.
In other words...
$(e^{b_1} - 1) \times 100 = (1.0028 - 1) \times 100 = 0.28$ = estimated percent change in the odds of micronuclei formation per unit increase in dose $X_1$.
OR, for a 100-unit increase in X (more typical),
$(e^{100 b_1} - 1) \times 100 = (1.327 - 1) \times 100 \approx 33$ = estimated percent change in the odds of micronuclei formation per 100-unit increase in dose $X_1$.
Can back-transform a C.I. for $\beta_1$, also.
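
The arithmetic above, and one way to back-transform an interval, can be sketched in Python. This is an illustration under stated assumptions: the value in parentheses after the slope is taken to be its standard error, and a Wald-type 95% interval (estimate ± 1.96 standard errors) is used only for simplicity. Since the slides caution against Wald procedures for these data, a likelihood-based interval may be preferable in practice. The function name `pct_change_in_odds` is illustrative.

```python
import math

b1, se_b1 = 0.00283, 0.000412  # slope estimate and (assumed) standard error

def pct_change_in_odds(beta, units=1.0):
    """Percent change in the odds for an increase of `units` in the predictor."""
    return (math.exp(units * beta) - 1.0) * 100.0

print(round(pct_change_in_odds(b1), 2))        # per unit of dose
print(round(pct_change_in_odds(b1, 100), 1))   # per 100 units of dose

# Wald-type interval on the log-odds scale, then back-transformed endpoint by endpoint.
z = 1.96
lower, upper = b1 - z * se_b1, b1 + z * se_b1
ci_pct = (pct_change_in_odds(lower, 100), pct_change_in_odds(upper, 100))
print(tuple(round(v, 1) for v in ci_pct))
```

Because the exponential is monotone, transforming the endpoints of an interval for the slope gives an interval for the percent change in odds.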

Estimated Parameter meaning...
$e^{b_0} = 0.0030$ = estimated odds of micronuclei formation when dose $X = 0$.
When all else fails, make a picture; here in R:

> dosegrid <- seq(0, 800, 10)
> logoddshat <- (-5.8067) + (0.00283 * dosegrid)
> pihatgrid <- 1 / (1 + exp(-logoddshat))
> plot(dosegrid, pihatgrid, type = "l", lwd = 3, xlab = "tBHQ dose", ylab = "")
> title("Estimated Probability of Micronuclei Formation")

Dichotomous Response Epilogue...
Goodness of Fit? Take 775 or CYFNS
Model Comparison? Take 775 or CYFNS
Other inferences (CIs, PIs, etc.)? Take 775 or...
Other link functions are possible. For example, if the standard Normal quantile ("probit") function is used, we are doing Probit Regression.
Caveat: not all proportions (successes/trials) are binomial...
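
For comparison, the probit and logit links can be evaluated side by side. The sketch below uses Python's standard-library `statistics.NormalDist` for the standard Normal quantile and distribution functions; the helper names are illustrative and not from the slides.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def logit(p):
    return math.log(p / (1.0 - p))

def probit(p):
    """Standard Normal quantile function."""
    return std_normal.inv_cdf(p)

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Standard Normal cumulative distribution function."""
    return std_normal.cdf(eta)

for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 3), round(probit(p), 3))
```

Both links map probabilities in (0, 1) onto the whole real line and are zero at p = 0.5; they differ in scale and in tail behavior.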

D. GLiM example: Log-Linear (Poisson) Regression
What is a Poisson($\mu$) R.V.?

This kind of random variable is often used to model the number of times an event occurs in a fixed amount of time (or space), e.g.
number of gamma ray emissions in a millisecond
number of sightings of a rare species in a fixed region and timeframe
number of fish of a given species caught in a single trawl
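
A Poisson random variable with mean $\mu$ takes the values 0, 1, 2, ... with probability $e^{-\mu}\mu^k/k!$ at $k$. The Python sketch below evaluates this probability function and checks numerically that the mean and variance both equal $\mu$. The function name `poisson_pmf` and the value `mu = 3.0` are illustrative assumptions.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu), k = 0, 1, 2, ..."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

mu = 3.0  # hypothetical mean count
probs = [poisson_pmf(k, mu) for k in range(60)]  # tail beyond 59 is negligible here
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print(round(sum(probs), 6), round(mean, 6), round(var, 6))
```

The equality of mean and variance is a strong restriction, which is one reason counts in practice are not always well described by a Poisson model.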


Log-Linear (Poisson) Regression
Assumptions: data $(Y, X_1, X_2, \ldots, X_{p-1})$
1. Y has a Poisson distribution (counts of some event in a fixed time / space) with mean $\mu$
2. $\log \mu = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$ for some unknown $\beta_0, \beta_1, \ldots, \beta_{p-1}$
3. Xs measured with negligible error
4. There may also be a scale parameter for Y
5. Experimental runs $(Y_i, X_{i1}, X_{i2}, \ldots, X_{i,p-1})$ give independent $Y_i$'s

Comments
Here, we see the most popular link function for the case of Poisson Y, the log-link function.
This implies that $E[Y]$ is an exponential function of the Xs.

Comments
Includes ANOVA and ANCOVA models for $\log \mu$.
Parameter meaning:
$\beta_j$ = change in the logarithm of $E[Y]$ per unit increase in $X_j$, $j = 1, 2, \ldots, p-1$.
OR...in other words...
$(e^{\beta_j} - 1) \times 100$ = percent increase in $E[Y]$ per unit increase in $X_j$ (all other Xs fixed).
$e^{\beta_0}$ = $E[Y]$ when all $X_j$'s $= 0$ (if this is not an extrapolation).
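
The percent-change interpretation under the log link can be checked numerically. In the Python sketch below the coefficients and covariate values are hypothetical, chosen only for illustration, and `poisson_mean` is an illustrative name. It raises the first predictor by one unit with the other held fixed and compares the resulting percent change in the mean with the formula.

```python
import math

def poisson_mean(betas, xs):
    """E[Y] under the log link: exp(b0 + b1*x1 + ... + b_{p-1}*x_{p-1})."""
    return math.exp(betas[0] + sum(b * x for b, x in zip(betas[1:], xs)))

betas = [1.2, -0.4, 0.1]                   # hypothetical coefficients
base = poisson_mean(betas, [2.0, 5.0])
bumped = poisson_mean(betas, [3.0, 5.0])   # X1 raised by one unit, X2 held fixed
pct = (bumped / base - 1.0) * 100.0

# The two printed numbers should agree: direct ratio vs. the formula.
print(round(pct, 2), round((math.exp(betas[1]) - 1.0) * 100.0, 2))
```

A negative coefficient gives a negative "percent increase", that is, a percent decrease in the mean count.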

Log-Linear Regression Example 7.5

X = dose of nitrofen (toxic)
Y = number of C. dubia offspring
The text (Figs. 7.13-15) shows a SAS program and output for log-linear and log-quadratic models using PROC GENMOD. These are enhanced by a handout.
Discussion on the board...

Log-linear Regression Epilogue...
Goodness of Fit? Take 775 or CYFNS
Model Comparison? Take 775 or CYFNS
Other inferences (CIs, PIs, etc.)? Take 775 or...
Other link functions are possible.
Caveat: not all counts are Poisson...
