Poisson Regression - PowerPoint PPT Presentation

Provided by: www2U


1
Poisson Regression
  • Lecture 8

2
Poisson Distribution
  • Count data are often modeled using the Poisson
    distribution.
  • The mean and variance of the count y both
    equal $\lambda$.
  • $\lambda$ is the rate parameter: the number of events
    one would expect to see in a particular unit of
    time or space.
  • Often used as an approximation to the binomial
    distribution for rare events.
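A minimal Python sketch of both facts, using hypothetical values (λ = 3, and a rare event with n = 1000 trials and p = 0.003). The helper names `poisson_pmf`, `binomial_pmf` and `poisson_mean_var` are illustrative and do not come from the lecture. The sketch sums the Poisson probabilities to recover the mean and variance, then compares Binomial(n, p) with Poisson(np).

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for Y ~ Poisson(lam), lam > 0, computed on the log scale."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def binomial_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


def poisson_mean_var(lam: float, y_max: int = 200) -> tuple:
    """Mean and variance computed from the pmf, truncating the sum at y_max."""
    probs = [poisson_pmf(y, lam) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


mean, var = poisson_mean_var(3.0)
print(f"Poisson(3): mean = {mean:.6f}, variance = {var:.6f}")

# Hypothetical rare event: n = 1000 trials, p = 0.003, so lambda = n * p = 3.
for y in range(6):
    print(y, round(binomial_pmf(y, 1000, 0.003), 5), round(poisson_pmf(y, 3.0), 5))
```

The two pmf columns agree to about three decimal places. That agreement is what makes the Poisson a practical stand-in for the binomial when events are rare.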

3
Poisson Regression
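  • Poisson regression models expected counts as
    follows:
  • $\log(\mu) = \beta_0 + \beta_1 x$
  • Taking the exponential of both sides,
    $\mu = e^{\beta_0 + \beta_1 x} = e^{\beta_0}\,(e^{\beta_1})^{x}$
  • What is the interpretation of $\beta_1$? Increasing x
    by one unit multiplies $\mu$ by $e^{\beta_1}$.
  • If $\beta_1 = 0$, then $e^{\beta_1} = 1$ and $\mu$ does not
    change with x.
  • If $\beta_1 > 0$, then $e^{\beta_1} > 1$ and $\mu$ increases
    as x increases.
  • If $\beta_1 < 0$, then $e^{\beta_1} < 1$ and $\mu$ decreases
    as x increases.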
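The multiplicative interpretation can be checked numerically with a small sketch. The coefficient values here (β0 = 0.5, and β1 of -0.3, 0 or 0.3) are hypothetical. For every starting x, the ratio μ(x + 1)/μ(x) equals $e^{\beta_1}$.

```python
import math


def expected_count(x: float, beta0: float, beta1: float) -> float:
    """Poisson regression mean under the log link: mu = exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)


def rate_ratio(x: float, beta0: float, beta1: float) -> float:
    """mu(x + 1) / mu(x): the multiplicative change from a one-unit increase in x."""
    return expected_count(x + 1, beta0, beta1) / expected_count(x, beta0, beta1)


# Hypothetical coefficients, chosen only for illustration.
for beta1 in (-0.3, 0.0, 0.3):
    ratios = [rate_ratio(x, 0.5, beta1) for x in (0.0, 1.0, 5.0)]
    print(f"beta1 = {beta1:+.1f}: exp(beta1) = {math.exp(beta1):.4f}, "
          f"ratios = {[round(r, 4) for r in ratios]}")
```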

4
Ideal Poisson Example
  • Data were simulated from a Poisson regression
    model.
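The lecture does not say how its data were simulated or which true coefficients it used. The sketch below is therefore one way to do it, under stated assumptions: x ~ Uniform(0, 2), hypothetical true values β0 = β1 = 1, n = 100, and Poisson draws from Knuth's multiplication method. The function names are hypothetical.

```python
import math
import random


def rpoisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value by Knuth's multiplication method.

    Multiplies uniforms until the product drops to exp(-lam) or below; the number
    of extra uniforms needed is the draw. Adequate for small-to-moderate lam.
    """
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_poisson_regression(n: int, beta0: float, beta1: float, seed: int = 1):
    """Simulate (x, y) pairs with y ~ Poisson(exp(beta0 + beta1 * x)), x ~ Uniform(0, 2)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
    ys = [rpoisson(math.exp(beta0 + beta1 * x), rng) for x in xs]
    return xs, ys


# Hypothetical true values; the lecture does not state the ones it used.
xs, ys = simulate_poisson_regression(n=100, beta0=1.0, beta1=1.0)
print(list(zip([round(x, 2) for x in xs[:5]], ys[:5])))
```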

5
Seizure Example
  • A study examined the effectiveness of progabide
    on the number of seizures experienced by
    epileptics. Upon beginning either placebo or
    progabide treatment, patients made four biweekly
    visits to the doctor. The data give
    the number of seizures reported in the two weeks
    preceding the fourth visit. Other variables
    measured include the age of the patient as well
    as how many seizures the patient experienced in
    the 8 weeks prior to entering the study.

6
Results Seizure Example
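  • The prediction equation based on the Poisson
    regression model is
    $\log(\hat\mu) = 0.5050 + 0.2693\,\text{trt0} + 0.0221\,\text{base} + 0.0140\,\text{age}$,
    where trt0 = 1 for the placebo group (trt = 0) and
    0 for the progabide group.
  • Interpreting the parameters
  • age, 0.0140: $e^{0.0140} \approx 1.014$, so each additional
    year of age multiplies the expected seizure count
    by about 1.014 (a 1.4% increase), holding
    treatment and baseline count fixed.
  • trt 0, 0.2693: $e^{0.2693} \approx 1.309$, so placebo
    patients are expected to have about 31% more
    seizures than progabide patients with the same
    baseline count and age.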
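The interpretations follow from exponentiating the estimates in the seizure example's GENMOD output. A short sketch does this and adds a fitted count for a hypothetical patient (baseline count 30, age 30; not a subject in the data). Coding trt0 as 1 for placebo is an assumption based on the group means reported for this study.

```python
import math

# Estimates from the seizure example's GENMOD output (Poisson model, scale held at 1).
ESTIMATES = {"intercept": 0.5050, "trt0": 0.2693, "base": 0.0221, "age": 0.0140}


def predicted_seizures(placebo: bool, base: float, age: float) -> float:
    """Fitted mean seizure count in the two weeks before visit 4.

    trt0 is 1 for placebo (trt = 0) and 0 for progabide (the reference level).
    """
    eta = (ESTIMATES["intercept"] + ESTIMATES["trt0"] * int(placebo)
           + ESTIMATES["base"] * base + ESTIMATES["age"] * age)
    return math.exp(eta)


# Multiplicative effect on the expected count of a one-unit change in each term.
rate_ratios = {k: math.exp(b) for k, b in ESTIMATES.items() if k != "intercept"}
print({k: round(v, 4) for k, v in rate_ratios.items()})

# Hypothetical patient (not from the data): baseline count 30, age 30.
print(round(predicted_seizures(True, 30, 30), 2), round(predicted_seizures(False, 30, 30), 2))
```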

7
Seizure E.G. SAS program
  • DATA prog;
  • INPUT subject seizs visit trt base age;
  • IF visit NE 4 THEN DELETE;   /* keep only the fourth visit */
  • IF subject = 207 THEN DELETE;
  • CARDS;
  • 104 5 1 0 11 31
  • 104 3 2 0 11 31
  • 104 3 3 0 11 31
  • 104 3 4 0 11 31
  • 106 3 1 0 11 30
  • Etc.
  • 236 2 4 1 12 37
  • RUN;
  • PROC GENMOD DATA=prog;
  • CLASS trt;   /* trt is categorical; the last level (1) is the reference */
  • MODEL seizs = trt base age / DIST=POISSON
    LINK=LOG;
  • RUN;

8
Seizure E.G. SAS output
  • The GENMOD Procedure
  • Model Information
    Data Set: WORK.PROG
    Distribution: Poisson
    Link Function: Log
    Dependent Variable: seizs
    Observations Used: 58
  • Criteria For Assessing Goodness Of Fit
    (Criterion: DF, Value, Value/DF)
    Deviance: 54, 147.0210, 2.7226
    Scaled Deviance: 54, 147.0210, 2.7226
    Pearson Chi-Square: 54, 136.6080, 2.5298
    Scaled Pearson X2: 54, 136.6080, 2.5298
    Log Likelihood: 392.6703

9
Seizure E.G. SAS output, cont.
  • Analysis Of Parameter Estimates
    (Parameter: DF, Estimate, Standard Error,
    Wald 95% Confidence Limits, Wald Chi-Square, Pr > ChiSq)
    Intercept: 1, 0.5050, 0.2638, (-0.0119, 1.0220), 3.67, 0.0555
    trt 0: 1, 0.2693, 0.1134, (0.0470, 0.4916), 5.64, 0.0176
    trt 1: 0, 0.0000, 0.0000, (0.0000, 0.0000), ., .
    base: 1, 0.0221, 0.0017, (0.0187, 0.0255), 161.15, <.0001
    age: 1, 0.0140, 0.0086, (-0.0028, 0.0309), 2.66, 0.1029
    Scale: 0, 1.0000, 0.0000, (1.0000, 1.0000)
  • NOTE: The scale parameter was held fixed.
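The Wald columns of the table are simple functions of the estimate and its standard error. A sketch that recomputes them from the rounded values in the table follows (the helper name `wald_summary` is hypothetical). The chi-square statistic is $(\hat\beta/\text{SE})^2$ on 1 df, and the 95% limits are $\hat\beta \pm 1.96\,\text{SE}$.

```python
import math

Z_975 = 1.959964  # standard normal 97.5th percentile


def wald_summary(estimate: float, se: float):
    """Wald chi-square (1 df), its p-value, and a 95% Wald confidence interval."""
    chisq = (estimate / se) ** 2
    # P(chi-square_1 > chisq) = P(|Z| > sqrt(chisq)) = erfc(sqrt(chisq / 2))
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    ci = (estimate - Z_975 * se, estimate + Z_975 * se)
    return chisq, p_value, ci


# Rows taken from the Analysis Of Parameter Estimates table (rounded to 4 decimals).
for name, est, se in [("Intercept", 0.5050, 0.2638), ("trt 0", 0.2693, 0.1134), ("age", 0.0140, 0.0086)]:
    chisq, p, (lo, hi) = wald_summary(est, se)
    print(f"{name}: chi-square = {chisq:.2f}, p = {p:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

The table's estimates and standard errors are rounded, so recomputed values can differ in the last digit. For base (0.0221/0.0017), the rounding is severe enough that the recomputed chi-square (about 169) is noticeably off the printed 161.15.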

10
Are these results valid?
  • Assumptions for the Poisson regression model
  • Random, independent sample
  • Mean is equal to variance
  • Standard deviation = square root of mean
  • The log of the mean response is a linear
    function of the explanatory variables
  • If a variable is continuous, then increasing it
    by 1 unit has a multiplicative effect on the
    expected response

11
Linearity with respect to log(µ)
  • Only continuous predictors need to be checked
  • Here, baseline counts and age
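The slide's plots are not reproduced. One common way to check linearity on the log scale is to bin a continuous predictor, compute the mean count in each bin, and see whether log(mean count) is roughly linear in the bin's mean x. This is an assumption about the method and not necessarily what the original plots showed. The sketch below uses noise-free hypothetical data in which log μ is exactly linear in x, so the slopes between consecutive binned points are all 0.02.

```python
import math


def binned_log_means(x, y, n_bins=5):
    """Sort by x, cut into at most n_bins consecutive bins of (nearly) equal size,
    and return (mean of x, log of mean of y) per bin. Each bin needs a positive mean count."""
    pairs = sorted(zip(x, y))
    size = math.ceil(len(pairs) / n_bins)
    points = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        mean_x = sum(px for px, _ in chunk) / len(chunk)
        mean_y = sum(py for _, py in chunk) / len(chunk)
        points.append((mean_x, math.log(mean_y)))
    return points


# Noise-free illustration: if log(mu) is linear in x, the binned points fall on a line.
xs = list(range(100))
ys = [math.exp(0.5 + 0.02 * xi) for xi in xs]
pts = binned_log_means(xs, ys)
slopes = [(b[1] - a[1]) / (b[0] - a[0]) for a, b in zip(pts, pts[1:])]
print([round(s, 4) for s in slopes])
```

With real counts such as seizs against base or age, a clear bend in these points would suggest adding a transformed or quadratic term.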

12
Checking Standard Deviations
  • Easiest to do with categorical
    variables.
  • Group observations by category.
  • Are the means approximately equal to the
    variances within each group?
  • From PROC UNIVARIATE:
  • For the progabide group, mean = 4.83333, std.
    dev. = 4.2838
  • For the placebo group, mean = 7.9643, std.
    dev. = 7.6278
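Squaring the reported standard deviations puts the group variances on the same scale as the means. A sketch (the helper name `dispersion_ratio` is hypothetical):

```python
def dispersion_ratio(mean: float, sd: float) -> float:
    """Variance-to-mean ratio; about 1 if the Poisson assumption holds."""
    return sd ** 2 / mean


# Group summaries reported by PROC UNIVARIATE for the seizure data.
groups = {"progabide": (4.83333, 4.2838), "placebo": (7.9643, 7.6278)}
for name, (mean, sd) in groups.items():
    print(f"{name}: mean = {mean:.2f}, variance = {sd ** 2:.2f}, "
          f"variance/mean = {dispersion_ratio(mean, sd):.2f}")
```

Both ratios are well above 1, at about 3.8 and 7.3. These are raw group summaries that ignore base and age, so covariates may explain part of the excess variance. The deviance and Pearson checks that follow account for the covariates.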

13
Std devs, more
  • Schematic plots (figure not reproduced)


14
Checking model with Deviances
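  • Deviance is defined as the difference between the
    -2 log-likelihood of the specified model and that of
    a model that fits the data perfectly
  • Specified model:
    $-2\log L(\hat\mu; y) = -2\sum_i [y_i\log\hat\mu_i - \hat\mu_i - \log(y_i!)]$
  • Perfect fit:
    $-2\log L(y; y) = -2\sum_i [y_i\log y_i - y_i - \log(y_i!)]$
  • Deviance:
    $D = 2\sum_i [y_i\log(y_i/\hat\mu_i) - (y_i - \hat\mu_i)]$
  • The deviance for the model is 147.0210 with 54
    degrees of freedom.
  • Measure of fit is deviance/d.f. If it is $\gg 1$, there
    is overdispersion. (Here 2.7226.)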
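A small sketch computes the deviance directly from the formula and confirms it equals twice the gap between the saturated and specified log-likelihoods. It uses the convention $y\log(y/\hat\mu) = 0$ when y = 0. The toy counts and fitted means are hypothetical and are not the seizure data.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood, with the convention 0 * log(0) = 0."""
    return sum((yi * math.log(mi) if yi > 0 else 0.0) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))


def poisson_deviance(y, mu):
    """D = 2 * sum[y * log(y / mu) - (y - mu)], taking y * log(y / mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        total += (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
    return 2.0 * total


# Toy observed counts and fitted means (hypothetical, not the seizure data).
y = [0, 2, 4]
mu = [1.0, 2.0, 3.0]
d = poisson_deviance(y, mu)
# Same quantity as -2 log L(model) minus -2 log L(saturated model, mu = y).
d_from_loglik = 2.0 * (poisson_loglik(y, y) - poisson_loglik(y, mu))
print(round(d, 6), round(d_from_loglik, 6))
print("seizure deviance/df:", round(147.0210 / 54, 4))
```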

15
Checking model with Pearson Chi-square
  • The Pearson goodness-of-fit statistic is
    $X^2 = \sum_i (y_i - \hat\mu_i)^2 / \hat\mu_i$
  • Measure of fit is Pearson chi-square/d.f. If it
    is $\gg 1$, there is overdispersion. (Here 136.6080/54 = 2.5298.)
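The Pearson statistic is a one-line sum. This sketch uses the same hypothetical toy values as a deviance check would, plus the seizure model's reported X² and degrees of freedom.

```python
def pearson_chi2(y, mu):
    """Pearson goodness-of-fit statistic: sum of (y - mu)^2 / mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


# Toy counts and fitted means (hypothetical, not the seizure data).
print(round(pearson_chi2([0, 2, 4], [1.0, 2.0, 3.0]), 4))

# Seizure model: Pearson X2 = 136.6080 on 54 df.
print("seizure Pearson X2/df:", round(136.6080 / 54, 4))
```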

16
Model Fit Statistics - Implications
  • Deviance and Pearson chi-square can be large when
  • Not all appropriate covariates are included in the
    model, or they are not included correctly
  • The distributional assumptions are not correct
    (e.g. the assumption variance = mean)

17
Overdispersion
  • Overdispersion occurs when the responses have
    greater variability than expected under the
    Poisson distribution.
  • In this case, the standard errors for the regression
    parameters will be too small.
  • We can adjust the standard errors to account for
    this extra variability in the response.
  • In particular, estimate the overdispersion
    parameter as X²/df.
  • Multiply the standard errors by the square root
    of this.
  • In the seizure example, $\sqrt{X^2/df} = \sqrt{2.5298} \approx 1.5905$.
  • This means we increase the standard
    errors by about 59%.
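A sketch of the adjustment, using the Pearson X² and standard errors from the seizure output. Multiplying each unscaled standard error by $\sqrt{X^2/df}$ reproduces the standard errors of the SCALE=PEARSON fit. Dividing the deviance by $X^2/df$ gives its "Scaled Deviance".

```python
import math

pearson_x2, df = 136.6080, 54
deviance = 147.0210
phi = pearson_x2 / df          # estimated overdispersion parameter
scale = math.sqrt(phi)         # factor applied to every standard error

unscaled_se = {"Intercept": 0.2638, "trt 0": 0.1134, "base": 0.0017, "age": 0.0086}
scaled_se = {name: se * scale for name, se in unscaled_se.items()}

print(f"phi = {phi:.4f}, scale = {scale:.4f}")
print({name: round(se, 4) for name, se in scaled_se.items()})
print("scaled deviance:", round(deviance / phi, 4))
```

The scaled base standard error comes out as 0.0027 rather than the printed 0.0028 because the input 0.0017 is itself rounded.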

18
SAS program with overdispersion
  • PROC GENMOD DATA=prog;
  • CLASS trt;
  • MODEL seizs = trt base age / DIST=POISSON
    LINK=LOG SCALE=PEARSON;   /* scale SEs by sqrt(Pearson X2/df) */
  • RUN;

19
SAS output with overdispersion
  • The GENMOD Procedure
  • Criteria For Assessing Goodness Of Fit
    (Criterion: DF, Value, Value/DF)
    Deviance: 54, 147.0210, 2.7226
    Scaled Deviance: 54, 58.1162, 1.0762
    Pearson Chi-Square: 54, 136.6080, 2.5298
    Scaled Pearson X2: 54, 54.0000, 1.0000
    Log Likelihood: 155.2193
  • Analysis Of Parameter Estimates
    (Parameter: DF, Estimate, Standard Error,
    Wald 95% Confidence Limits, Wald Chi-Square, Pr > ChiSq)
    Intercept: 1, 0.5050, 0.4195, (-0.3172, 1.3273), 1.45, 0.2287
    trt 0: 1, 0.2693, 0.1804, (-0.0842, 0.6228), 2.23, 0.1355
    trt 1: 0, 0.0000, 0.0000, (0.0000, 0.0000), ., .
    base: 1, 0.0221, 0.0028, (0.0167, 0.0275), 63.70, <.0001
    age: 1, 0.0140, 0.0137, (-0.0128, 0.0408), 1.05, 0.3051

20
Caution about overdispersion
  • We can make the deviance statistics look good
    just by allowing for an overdispersion parameter.
  • We should still examine linearity.
  • Linearity can sometimes be examined via the
    Pearson residuals:
  • (observed - expected)/std. dev.
  • PROC SORT DATA=prog;
  • BY base;
  • RUN;
  • PROC GENMOD DATA=prog;
  • CLASS trt;
  • MODEL seizs = trt base age / DIST=POISSON
    LINK=LOG SCALE=PEARSON RESIDUALS;
  • RUN;

21
Interpreting
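  • Pearson and deviance residuals are given by
    reschi and resdev:
  • $r_{P,i} = (y_i - \hat\mu_i)/\sqrt{\hat\mu_i}$ and
    $r_{D,i} = \operatorname{sign}(y_i - \hat\mu_i)\sqrt{2[y_i\log(y_i/\hat\mu_i) - (y_i - \hat\mu_i)]}$
  • Plots of residuals vs. continuous predictors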
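Both residuals can be computed directly from the formulas. A sketch on hypothetical toy values (not the seizure data) follows. Summing the squared residuals recovers the Pearson X² and the deviance, which ties the residual plots back to the goodness-of-fit statistics. The Python function names are illustrative and are not the SAS reschi/resdev keywords themselves.

```python
import math


def pearson_residual(y: float, mu: float) -> float:
    """(observed - expected) / sqrt(expected); the Poisson variance equals the mean."""
    return (y - mu) / math.sqrt(mu)


def deviance_residual(y: float, mu: float) -> float:
    """sign(y - mu) * square root of the observation's contribution to the deviance."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    contribution = 2.0 * (term - (y - mu))
    return math.copysign(math.sqrt(max(contribution, 0.0)), y - mu)


# Toy counts and fitted means (hypothetical, not the seizure data).
y = [0, 2, 4]
mu = [1.0, 2.0, 3.0]
r_p = [pearson_residual(a, b) for a, b in zip(y, mu)]
r_d = [deviance_residual(a, b) for a, b in zip(y, mu)]
print([round(r, 4) for r in r_p], [round(r, 4) for r in r_d])
# Squared residuals add up to the Pearson X2 and the deviance, respectively.
print(round(sum(r * r for r in r_p), 4), round(sum(r * r for r in r_d), 4))
```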

22
Adding quadratic terms for continuous predictors
  • Analysis Of Parameter Estimates
    (Parameter: DF, Estimate, Standard Error,
    Wald 95% Confidence Limits, Wald Chi-Square, Pr > ChiSq)
    Intercept: 1, -3.8873, 1.6566, (-7.1341, -0.6404), 5.51, 0.0189
    trt 0: 1, 0.2964, 0.1537, (-0.0049, 0.5978), 3.72, 0.0538
    trt 1: 0, 0.0000, 0.0000, (0.0000, 0.0000), ., .
    base: 1, 0.0623, 0.0096, (0.0435, 0.0812), 42.05, <.0001
    age: 1, 0.2623, 0.1104, (0.0459, 0.4787), 5.64, 0.0175
    basesq: 1, -0.0004, 0.0001, (-0.0005, -0.0002), 18.57, <.0001
    agesq: 1, -0.0041, 0.0019, (-0.0077, -0.0005), 4.90, 0.0268
    Scale: 0, 1.3423, 0.0000, (1.3423, 1.3423)
  • NOTE: The scale parameter was estimated by the
    square root of Pearson's Chi-Square/DOF.

23
Residuals for new model
  • The quadratic terms are statistically
    significant.
  • With the addition of these terms, the p-value for
    the progabide treatment effect has dropped from
    0.1355 to 0.0538.

24
SAS Results for Simulated Data
  • Criteria For Assessing Goodness Of Fit
    (Criterion: DF, Value, Value/DF)
    Deviance: 98, 110.1826, 1.1243
    Scaled Deviance: 98, 110.1826, 1.1243
    Pearson Chi-Square: 98, 107.7462, 1.0995
    Scaled Pearson X2: 98, 107.7462, 1.0995
    Log Likelihood: 1044.2169
  • Algorithm converged.
  • Analysis Of Parameter Estimates
    (Parameter: DF, Estimate, Standard Error,
    Wald 95% Confidence Limits, Wald Chi-Square, Pr > ChiSq)
    Intercept: 1, 0.9300, 0.0973, (0.7392, 1.1207), 91.28, <.0001
    x: 1, 1.0496, 0.0719, (0.9088, 1.1905), 213.30, <.0001
    Scale: 0, 1.0000, 0.0000, (1.0000, 1.0000)

25
Residual Plot for Simulated Data
  • (figure not reproduced)

26
Poisson Regression for Rates
  • We are typically interested in the number of
    events that happen within a particular unit of
    time.
  • Suppose that we counted the number of times that
    a graduate student eats pizza during a week.
    However, some students were observed for 2 weeks
    and others for 3 weeks. How do we model the
    data?
  • Suppose that $\lambda$ represents the mean number of
    times that a student eats pizza during a one-week
    period.
  • Suppose that a count over one week follows a
    Poisson distribution with mean $\lambda$.
  • A count over two weeks would then also follow a
    Poisson distribution, with mean $2\lambda$.
  • A count over three weeks: Poisson($3\lambda$).

27
Poisson for Rates, cont.
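  • Therefore, for a student observed for $t_i$ weeks,
    we model $Y_i \sim \text{Poisson}(t_i\lambda_i)$, so
    $\mu_i = t_i\lambda_i$ with $\log(\lambda_i) = \beta_0 + \beta_1 x_i$
  • Or $\log(\mu_i / t_i) = \beta_0 + \beta_1 x_i$
  • Or $\log(\mu_i) = \log(t_i) + \beta_0 + \beta_1 x_i$
  • $\log(t_i)$ is called the offset term.
  • The offset term is added to a Poisson regression
    in SAS with the MODEL statement option
    OFFSET=variable-name, where the variable holds $\log(t_i)$.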
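A sketch of how the offset enters the model. log(t) is added to the linear predictor with its coefficient fixed at 1, so the expected count scales in proportion to exposure time. The pizza numbers (1.5 per week, no covariate effect) are hypothetical.

```python
import math


def expected_count_with_offset(t: float, x: float, beta0: float, beta1: float) -> float:
    """mu = t * lambda with log(lambda) = beta0 + beta1 * x.

    Equivalently log(mu) = log(t) + beta0 + beta1 * x, where log(t) is the offset:
    it enters the linear predictor with its coefficient fixed at 1.
    """
    return math.exp(math.log(t) + beta0 + beta1 * x)


# Hypothetical pizza example: 1.5 pizzas per week on average, no covariate effect.
beta0, beta1 = math.log(1.5), 0.0
for weeks in (1, 2, 3):
    print(weeks, "week(s):", round(expected_count_with_offset(weeks, 0.0, beta0, beta1), 4))
```

This prints means of 1.5, 3.0 and 4.5 for one, two and three weeks. These are the $\lambda$, $2\lambda$ and $3\lambda$ of the previous slide.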

28
Poisson for Rates, cont.
  • Suppose that you are modeling the number of
    people with asthma in a city. It is not fair to
    compare Los Angeles with Portland, Maine, unless
    you first somehow adjust for the sizes of the
    cities.
  • This could be done per square mile of city;
    however, we could also count the number of cases
    per ten thousand people.

29
Melanoma
  • See page 354 in Stokes et al.
  • Gail (1978) and Koch, Imrey et al. (1985) reported
    the number of new melanoma cases in 1969-1971
    among white males in two regions. Researchers
    were interested in whether the rates varied
    across age groups or regions (North/South).
  • The observed rates for each age group and region
  • We also know the total number of people in each
    age group/region.
  • We would like to model the rates (count/total)
    by modeling the counts directly.

30
Melanoma, cont.
  • The sample sizes used to calculate the rates in
    each category
  • We cannot just run linear regression on the rates,
    because their variances are heterogeneous:
  • Counts follow a Poisson distribution, where the
    variance is related to the mean
  • Sample sizes vary between cells (one can think of
    modeling cases per 1000 people, but the number of
    thousands differs between cells)
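The heterogeneity can be made concrete with a short calculation. If a cell of size n has count Y ~ Poisson(nλ), then Var(Y/n) = nλ/n² = λ/n. Two cells with the same underlying rate therefore have rate variances that differ by their size ratio. The sketch below uses a hypothetical rate of 0.002 cases per person and cell sizes of 5000 and 10000, and checks the formula by Monte Carlo with Knuth's Poisson sampler.

```python
import math
import random


def rate_variance(lam: float, n: float) -> float:
    """Var(Y / n) when Y ~ Poisson(n * lam): (n * lam) / n**2 = lam / n."""
    return lam / n


def simulated_rate_variance(lam: float, n: int, reps: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: draw Y ~ Poisson(n * lam) (Knuth's method), return the sample variance of Y / n."""
    rng = random.Random(seed)
    limit = math.exp(-n * lam)
    rates = []
    for _ in range(reps):
        k, prod = 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        rates.append(k / n)
    mean = sum(rates) / reps
    return sum((r - mean) ** 2 for r in rates) / (reps - 1)


# Hypothetical cells with the same underlying rate (0.002 cases per person)
# but different population sizes.
for n in (5000, 10000):
    print(n, rate_variance(0.002, n), simulated_rate_variance(0.002, n))
```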

31
Melanoma SAS saturated model
  • First, we run the saturated model with age (as a
    nominal predictor) and region main effects AND
    the interaction.
  • Model Information
    Data Set: WORK.MEL
    Distribution: Poisson
    Link Function: Log
    Dependent Variable: cases
    Offset Variable: ltotal
    Observations Used: 12
  • Criteria For Assessing Goodness Of Fit
    (Criterion: DF, Value, Value/DF)
    Deviance: 0, 0.0000, .
    Scaled Deviance: 0, 0.0000, .
    Pearson Chi-Square: 0, 0.0000, .
    Scaled Pearson X2: 0, 0.0000, .
    Log Likelihood: 2698.0337

32
Melanoma Saturated parameters
  • The GENMOD Procedure
  • LR Statistics For Type 3 Analysis
    (Source: DF, Chi-Square, Pr > ChiSq)
    age: 5, 715.99, <.0001
    region: 1, 108.19, <.0001
    age*region: 5, 6.21, 0.2859
  • The interaction is not significant. Therefore,
    we consider a simpler model without it.
  • Type 3 LR statistics are obtained by adding the
    option TYPE3 to the MODEL statement.
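The age*region test can be reproduced from the two models' log likelihoods. The saturated model is 2698.0337 and the model without the interaction (next slide) is 2694.9262. The LR statistic is twice their difference, referred to a chi-square with 5 df. The sketch uses the closed-form tail probability for odd degrees of freedom (the helper name `chi2_sf_df5` is hypothetical).

```python
import math


def chi2_sf_df5(x: float) -> float:
    """Upper-tail probability P(chi-square with 5 df > x), via the closed form for odd df."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * (1.0 + x / 3.0))


# Log likelihoods reported by GENMOD for the two melanoma models.
ll_saturated = 2698.0337       # age, region, and age*region
ll_no_interaction = 2694.9262  # age and region only
lr = 2.0 * (ll_saturated - ll_no_interaction)   # 5 df: the age*region terms dropped
print(f"LR = {lr:.4f}, p = {chi2_sf_df5(lr):.4f}")
```

The saturated model has deviance 0, so this LR statistic also matches, up to rounding, the deviance of the model without the interaction (6.2149 on 5 df).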

33
Melanoma without interaction
  • Criteria For Assessing Goodness Of Fit
    (Criterion: DF, Value, Value/DF)
    Deviance: 5, 6.2149, 1.2430
    Scaled Deviance: 5, 6.2149, 1.2430
    Pearson Chi-Square: 5, 6.1151, 1.2230
    Scaled Pearson X2: 5, 6.1151, 1.2230
    Log Likelihood: 2694.9262
  • LR Statistics For Type 3 Analysis
    (Source: DF, Chi-Square, Pr > ChiSq)
    age: 5, 796.74, <.0001
    region: 1, 124.22, <.0001

34
Final Model Melanoma
  • Age and region as predictors.
  • The offset term was the log of the population
    size, measured in units of 10 thousand people.
  • What does the value 2.3162 represent?
  • What does the value 1.0316 represent?
  • What does the value 0.8195 represent?
  • We can also predict the number of cases per 10
    thousand people for each age group and region
    combination.