Regression with Non-Normal Data: an Introduction

Provided by: DEdw7

1
Regression with Non-Normal Data: an Introduction
  • Outline
  • A. Review: GLMs
  • B. Generalized Linear Models: General Formulation
  • C. GLiM example: Logistic Regression
  • D. GLiM example: Log-Linear (Poisson) Regression
  • This material is supported by PB section 7.2

2
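  • A. Review: GLMs
  • Assumptions: data $(Y, X_1, X_2, \ldots, X_{p-1})$
  • 1. $Y$ has a Normal distribution with mean $E[Y]$ and
    variance $\sigma^2$
  • 2. $E[Y] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$
  • for some unknown $\beta_0, \beta_1, \ldots, \beta_{p-1}$
  • 3. $X$s measured with negligible error
  • 4. Experimental runs $(Y_i, X_{i1}, X_{i2}, \ldots, X_{i,p-1})$ give
    independent $Y_i$s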
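
As a minimal sketch of these four assumptions in R: the data below are simulated, and the sample size, coefficients, and names (`x1`, `x2`, `fit_lm`) are hypothetical choices made for illustration, not values from the presentation.

```r
# Simulated data satisfying the GLM assumptions (all numbers hypothetical).
set.seed(1)
n  <- 200
x1 <- runif(n, 0, 10)                  # predictors, treated as error-free
x2 <- runif(n, 0, 5)
mean_y <- 2 + 0.5 * x1 - 1.5 * x2      # assumption 2: E[Y] is linear in the betas
y  <- rnorm(n, mean = mean_y, sd = 1)  # assumptions 1 and 4: independent Normal Ys

# Least-squares fit of the general linear model.
fit_lm    <- lm(y ~ x1 + x2)
beta_hat  <- coef(fit_lm)              # estimates of beta0, beta1, beta2
sigma_hat <- summary(fit_lm)$sigma     # estimate of sigma

print(round(beta_hat, 3))
print(round(sigma_hat, 3))
```

With data generated this way, the estimates should land near the values used in the simulation (2, 0.5, -1.5 and 1).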

3
  • Comments
  • This GLM model includes all classical regression,
    multiple regression, and fixed effects ANOVA and
    ANCOVA models as special cases.
  • General Linear Mixed Model (GLMM): the expression
    for $E[Y]$ in the GLM is conditional on the
    realized values of any random effects. A further
    line is then added specifying the distribution of
    these random effects. GLMMs also allow the $Y$s to
    have unequal variances and to be correlated.

4
  • B. GLiMs: General formulation
  • Assumptions: data $(Y, X_1, X_2, \ldots, X_{p-1})$
  • 1. $Y$ has distribution __________ (user specifies)
    with mean $E[Y] = \mu$
  • 2. $g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$
  • for some unknown $\beta_0, \beta_1, \ldots, \beta_{p-1}$ and some
    specified monotonic function $g(\cdot)$.
  • 3. $X$s measured with negligible error
  • 4. There may also be other parameters for $Y$,
    e.g. scale and shape parameters.
  • 5. Experimental runs $(Y_i, X_{i1}, X_{i2}, \ldots, X_{i,p-1})$ give
    independent $Y_i$s

5
  • Comments
  • $g(\cdot)$ is called the link function. Note that
    assumption (2) is equivalent to
  • $\mu = g^{-1}(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1})$
  • The specified distribution must be a member of
    the exponential family
  • These assumptions imply a likelihood function for the
    unknowns $\beta_0, \beta_1, \ldots, \beta_{p-1}$ (plus scale, shape, etc.
    parameters)
  • Obtain (hopefully) MLEs of the unknown parameters by
    iterative maximization algorithms (no starting
    values needed)
  • All inference is approximate, using large-sample
    MLE theory
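
The recipe above (pick a distribution, pick a link, maximize the likelihood iteratively) can be sketched with R's `glm()`. The data are simulated and every number and variable name here is a hypothetical choice for illustration; the binomial/logit pairing is just one possible instance.

```r
# Simulated Bernoulli data (hypothetical values) to illustrate the GLiM recipe.
set.seed(2)
n <- 300
x <- runif(n, -2, 2)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

# Choose the distribution (binomial family) and the link g() (logit).
fit <- glm(y ~ x, family = binomial(link = "logit"))

# glm() maximizes the likelihood by iteratively reweighted least squares
# and picks its own starting values.
print(fit$converged)   # TRUE if the iterations converged
print(fit$iter)        # number of iterations used

# Assumption 2 rewritten as mu = g^{-1}(linear predictor).
eta    <- coef(fit)[["(Intercept)"]] + coef(fit)[["x"]] * x
mu_hat <- fit$family$linkinv(eta)
print(all.equal(mu_hat, unname(fitted(fit))))
```

Checking `fit$converged` is worthwhile in practice, since the "hopefully" in the bullet above reflects that the iterations are not guaranteed to converge.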

6
  • C. GLiM example: Logistic Regression
  • Assumptions: data $(Y, X_1, X_2, \ldots, X_{p-1})$
  • 1. $Y$ has a Bernoulli distribution (values 0 and
    1) with success probability $\pi = E[Y]$
  • 2. $\log[\pi/(1-\pi)] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$
  • for some unknown $\beta_0, \beta_1, \ldots, \beta_{p-1}$
  • 3. $X$s measured with negligible error
  • 4. There may be a scale parameter for $Y$
  • 5. Experimental runs $(Y_i, X_{i1}, X_{i2}, \ldots, X_{i,p-1})$ give
    independent $Y_i$s

7
  • Comments
  • Here, we see the most popular link function for
    the case of Bernoulli Y, the logit function
  • This implies that $E[Y]$ is a logistic function of
    the $X$s
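
The step from the logit link to a logistic mean function can be written out explicitly. In this sketch, $\eta$ is shorthand introduced here (it does not appear on the slides) for the linear predictor.

```latex
\begin{align*}
\log\frac{\pi}{1-\pi} &= \eta,
  \qquad \eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1} \\
\frac{\pi}{1-\pi} &= e^{\eta} \\
\pi = E[Y] &= \frac{e^{\eta}}{1+e^{\eta}} = \frac{1}{1+e^{-\eta}}
\end{align*}
```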

9
  • Comments
  • The modeling of dichotomous $Y$ using the logit
    link function has come to be called logistic
    regression
  • Includes ANOVA and ANCOVA models for the
    log-odds, $\log[\pi/(1-\pi)]$
  • Parameter meaning:
  • $\beta_j$ = change in the log-odds of success,
    $\log_e[\pi/(1-\pi)]$, per unit increase in $X_j$,
    $j = 1, 2, \ldots, p-1$ (all other $X$s fixed).
  • OR...in other words...
  • $(e^{\beta_j} - 1) \times 100$ = percent increase in odds of
    success per unit increase in $X_j$
  • $e^{\beta_0}$ = odds of success when all $X_j = 0$ (if this is
    not an extrapolation).
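
The percent-change reading follows from comparing the odds at $X_j + 1$ with the odds at $X_j$, all other $X$s held fixed. Here $\eta$ is shorthand introduced for the linear predictor at the starting value of $X_j$; it is not notation from the slides.

```latex
\begin{align*}
\text{odds}(X_j) &= \frac{\pi}{1-\pi} = e^{\eta},
  \qquad \text{odds}(X_j + 1) = e^{\eta + \beta_j} \\
\frac{\text{odds}(X_j + 1)}{\text{odds}(X_j)} &= e^{\beta_j} \\
100 \times \frac{\text{odds}(X_j + 1) - \text{odds}(X_j)}{\text{odds}(X_j)}
  &= (e^{\beta_j} - 1) \times 100
\end{align*}
```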

10
Logistic Regression Example 7.4
  • $X$ = dose of tBHQ (6 choices)
  • success if micronuclei formed
  • $Y$ = number of successes in 2000 independent (?)
    cells
  • The text's Figs. 7.11-12 show a SAS program and
    output for a simple logistic analysis using PROC
    LOGISTIC
  • $b_0 = -5.8067$ (0.2345)
  • $b_1 = 0.00283$ (0.000412)

11
  • Comments
  • Note the syntax for fitting this model when data are
    summarized as total counts over n independent
    trials at each $X$
  • Note the PROC GENMOD syntax
  • The LR/deviance test for $H_0\colon \beta_1 = 0$ is in the row
    labeled -2 log L: $G^2_{calc} = 56.584$ on 1 df,
    $P < 0.0001$
  • The score statistic for $H_0\colon \beta_1 = 0$ is in the row labeled
    Score: $T^2_{calc} = 53.427$ on 1 df, $P < 0.0001$
  • The Wald test for $H_0\colon \beta_1 = 0$ is in the MLE table: $W^2_{calc} =
    47.2266$ on 1 df, $P < 0.0001$; however, the Wald
    test is not recommended here (it suffers from
    certain instabilities)
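
The three reported statistics can be checked against the chi-square distribution on 1 df. This is a sketch that assumes the parenthesized values next to the estimates are standard errors; that reading is not stated on the slides, though it is consistent with the reported Wald statistic.

```r
# Values quoted on the slides for Example 7.4.
b1    <- 0.00283     # estimated slope
se_b1 <- 0.000412    # parenthesized value, read here as a standard error (assumption)

# Wald chi-square: squared ratio of estimate to standard error.
# Close to the reported 47.2266; the inputs above are rounded.
wald <- (b1 / se_b1)^2

# Reported LR/deviance and score statistics.
g2    <- 56.584
score <- 53.427

# Upper-tail chi-square p-values on 1 df.
p_values <- pchisq(c(LR = g2, Score = score, Wald = wald),
                   df = 1, lower.tail = FALSE)
print(wald)
print(p_values)
```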

12
  • Estimated parameter meaning...
  • $b_1 = 0.00283$ = estimated change in the log-odds
    of success (micronuclei formation) per unit
    increase in dose $X_1$
  • In other words...
  • $(e^{b_1} - 1) \times 100 = (1.0028 - 1) \times 100 = 0.28$ = estimated
    percent change in odds of micronuclei formation
    per unit increase in dose $X_1$
  • OR, for a 100-unit increase in $X_1$ (more typical),
  • $(e^{100 b_1} - 1) \times 100 = (1.327 - 1) \times 100 \approx 33$ =
    estimated percent change in odds of micronuclei
    formation per 100-unit increase in dose $X_1$
  • Can back-transform a C.I. for $\beta_1$, also.
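
The arithmetic above, plus one way to back-transform an interval, can be sketched in R. The Wald-type 95% interval used here is an assumption made for illustration (the slides do not say which interval is intended), as is reading 0.000412 as the standard error of the slope.

```r
b1    <- 0.00283     # estimated slope from the slides
se_b1 <- 0.000412    # read here as its standard error (assumption)

# Percent change in odds per 1-unit and per 100-unit increase in dose.
pct_1   <- (exp(b1) - 1) * 100
pct_100 <- (exp(100 * b1) - 1) * 100

# Wald-type 95% interval for beta1 (one possible choice), then
# back-transformed to the percent-change scale for a 100-unit increase.
z          <- qnorm(0.975)
ci_b1      <- b1 + c(-1, 1) * z * se_b1
ci_pct_100 <- (exp(100 * ci_b1) - 1) * 100

print(round(c(pct_1 = pct_1, pct_100 = pct_100), 2))
print(round(ci_pct_100, 1))
```

Because the exponential is monotone, transforming the interval endpoints gives a valid interval on the percent-change scale.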

13
  • Estimated Parameter meaning...
  • $e^{b_0} = 0.0030$ = estimated odds of micronuclei
    formation when dose $X = 0$
  • When all else fails, make a picture; here in R:

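    # Grid of tBHQ doses at which to evaluate the fitted model
    dosegrid <- seq(0, 800, 10)
    # Estimated log-odds from the fitted coefficients b0 and b1
    logoddshat <- (-5.8067) + (0.00283 * dosegrid)
    # Inverse logit: convert log-odds to estimated probabilities
    pihatgrid <- 1 / (1 + exp(-logoddshat))
    plot(dosegrid, pihatgrid, type = "l", lwd = 3, xlab = "tBHQ dose", ylab = "")
    title("Estimated Probability of Micronuclei Formation")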
14
  • Dichotomous Response Epilogue...
  • Goodness of Fit? Take 775 or CYFNS
  • Model Comparison? Take 775 or CYFNS
  • Other inferences (CIs, PIs, etc.)? Take 775 or...
  • Other link functions are possible. For example,
    if the standard Normal quantile (probit)
    function is used, we are doing probit
    regression.
  • Caveat: not all proportions (successes/trials)
    are binomial...
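
To make the logit/probit contrast concrete, the sketch below evaluates both link functions on a few hypothetical probabilities and confirms they match the links stored in R's `binomial()` family objects.

```r
p <- c(0.10, 0.25, 0.50, 0.75, 0.90)   # hypothetical success probabilities

logit_p  <- qlogis(p)   # logit link: log(p / (1 - p))
probit_p <- qnorm(p)    # probit link: standard Normal quantile

# The binomial family stores the same functions as its link.
logit_link  <- binomial(link = "logit")$linkfun
probit_link <- binomial(link = "probit")$linkfun

print(cbind(p, logit = logit_p, probit = probit_p))
print(all.equal(logit_link(p), logit_p))
print(all.equal(probit_link(p), probit_p))
```

Switching a fit from logistic to probit regression would then amount to passing `family = binomial(link = "probit")` to `glm()`.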

15
D. GLiM example: Log-Linear (Poisson) Regression
What is a Poisson($\mu$) R.V.?
  • This kind of random variable is often used to
    model the number of times an event occurs in a
    fixed amount of time (or space), e.g.
  • number of gamma ray emissions in a millisecond
  • number of sightings of a rare species in a fixed
    region and timeframe
  • number of fish of a given species caught in a
    single trawl
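
For reference, the standard definition of a Poisson random variable with mean $\mu$ is given below; note that the mean and the variance coincide.

```latex
\begin{align*}
P(Y = y) &= \frac{e^{-\mu}\,\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots \\
E[Y] &= \mu, \qquad \operatorname{Var}(Y) = \mu
\end{align*}
```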

17
  • Log-Linear (Poisson) Regression
  • Assumptions: data $(Y, X_1, X_2, \ldots, X_{p-1})$
  • 1. $Y$ has a Poisson distribution (counts of some
    event in a fixed time / space) with mean $\mu$
  • 2. $\log \mu = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$
  • for some unknown $\beta_0, \beta_1, \ldots, \beta_{p-1}$
  • 3. $X$s measured with negligible error
  • 4. There may also be a scale parameter for $Y$
  • 5. Experimental runs $(Y_i, X_{i1}, X_{i2}, \ldots, X_{i,p-1})$ give
    independent $Y_i$s

18
  • Comments
  • Here, we see the most popular link function for
    the case of Poisson Y, the log-link function
  • This implies that $E[Y]$ is an exponential function
    of the $X$s

19
  • Comments
  • Includes ANOVA and ANCOVA models for $\log \mu$
  • Parameter meaning:
  • $\beta_j$ = change in the logarithm of $E[Y]$ per unit
    increase in $X_j$, $j = 1, 2, \ldots, p-1$.
  • OR...in other words...
  • $(e^{\beta_j} - 1) \times 100$ = percent increase in $E[Y]$ per unit
    increase in $X_j$ (all other $X$s fixed)
  • $e^{\beta_0}$ = $E[Y]$ when all $X_j = 0$ (if this is not an
    extrapolation).
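
These interpretations can be illustrated with a small log-linear fit in R. The data are simulated, and the sample size, true coefficients (0.5 and 0.8), and variable names are hypothetical choices for illustration only.

```r
# Simulated Poisson counts with a log-linear mean (all values hypothetical).
set.seed(3)
n  <- 500
x  <- runif(n, 0, 2)
mu <- exp(0.5 + 0.8 * x)          # log(mu) = beta0 + beta1 * x
y  <- rpois(n, lambda = mu)

fit_pois <- glm(y ~ x, family = poisson(link = "log"))
b <- coef(fit_pois)

# Percent increase in E[Y] per unit increase in x.
pct_change <- (exp(b[["x"]]) - 1) * 100
# Estimated E[Y] when x = 0.
mean_at_zero <- exp(b[["(Intercept)"]])

print(round(b, 3))
print(round(c(pct_change = pct_change, mean_at_zero = mean_at_zero), 2))
```

The estimates should fall near the simulated truth, where each unit increase in `x` multiplies the mean by $e^{0.8}$.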

20
Log-Linear Regression Example 7.5
  • $X$ = dose of nitrofen (toxic)
  • $Y$ = number of C. dubia offspring
  • The text's Figs. 7.13-15 show a SAS program and
    output for log-linear and log-quadratic models
    using PROC GENMOD. These are enhanced by a
    handout.
  • Discussion on the board...
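
The example's data are not reproduced in this transcript, so the following is only a sketch of how log-linear and log-quadratic Poisson models might be fitted and compared in R rather than SAS. The dose levels, replication, and coefficients are hypothetical, chosen so that the simulated mean count declines with dose.

```r
# Hypothetical dose-response counts; NOT the Example 7.5 data.
set.seed(4)
dose <- rep(0:4, each = 10)                    # hypothetical dose levels
mu   <- exp(3.3 + 0.3 * dose - 0.25 * dose^2)  # hypothetical log-quadratic mean
y    <- rpois(length(dose), lambda = mu)

# Log-linear and log-quadratic Poisson regressions.
fit_lin  <- glm(y ~ dose,             family = poisson(link = "log"))
fit_quad <- glm(y ~ dose + I(dose^2), family = poisson(link = "log"))

# Likelihood-ratio (deviance) comparison of the nested models on 1 df.
g2    <- deviance(fit_lin) - deviance(fit_quad)
p_val <- pchisq(g2, df = 1, lower.tail = FALSE)
print(c(G2 = g2, p = p_val))
```

A small p-value here would favor keeping the quadratic term, which is expected because the simulated mean was generated with one.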

21
  • Log-linear Regression Epilogue...
  • Goodness of Fit? Take 775 or CYFNS
  • Model Comparison? Take 775 or CYFNS
  • Other inferences (CIs, PIs, etc.)? Take 775 or...
  • Other link functions are possible
  • Caveat: not all counts are Poisson...