Lecture 15: Logistic Regression: Inference and link functions

Provided by: peopleMu1
Learn more at: http://people.musc.edu

1
Lecture 15: Logistic Regression: Inference and link functions
  • BMTRY 701: Biostatistical Methods II

2
More on our example
> pros5.reg <- glm(cap.inv ~ log(psa) + gleason, family=binomial)
> summary(pros5.reg)

Call:
glm(formula = cap.inv ~ log(psa) + gleason, family = binomial)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -8.1061     0.9916  -8.174 2.97e-16 ***
log(psa)      0.4812     0.1448   3.323 0.000892 ***
gleason       1.0229     0.1595   6.412 1.43e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 512.29  on 379  degrees of freedom
Residual deviance: 403.90  on 377  degrees of freedom
AIC: 409.9
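The coefficients in this summary are on the log-odds scale. As a rough sketch, using only the estimates and standard errors printed above (the object names `est`, `se`, and `or_tab` are hypothetical), exponentiating gives odds ratios, and estimate ± z × SE gives Wald 95% limits:

```r
# Estimates and standard errors copied from the summary output above
est <- c(log.psa = 0.4812, gleason = 1.0229)
se  <- c(log.psa = 0.1448, gleason = 0.1595)

# Wald z statistics (estimate / SE) and two-sided p-values
z <- est / se
p <- 2 * pnorm(-abs(z))

# Odds ratios with Wald 95% confidence limits
zcrit  <- qnorm(0.975)
or_tab <- cbind(OR    = exp(est),
                lower = exp(est - zcrit * se),
                upper = exp(est + zcrit * se))
print(round(or_tab, 2))
```

With the fitted object available, `exp(coef(pros5.reg))` and `exp(confint.default(pros5.reg))` should give the same Wald-based quantities.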
3
Other covariates Simple logistic models
Covariate                     Beta   exp(Beta)       Z
Age                        -0.0082        0.99   -0.51
Race                       -0.054         0.95   -0.15
Vol                        -0.014         0.99   -2.26
Dig Exam (vs. no nodule)
  Unilobar left             0.88          2.41    2.81
  Unilobar right            1.56          4.76    4.78
  Bilobar                   2.10          8.17    5.44
Detection in RE             1.71          5.53    4.48
LogPSA                      0.87          2.39    6.62
Gleason                     1.24          3.46    8.12
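The table reports Z statistics but no p-values. A minimal sketch of the conversion, assuming the usual standard normal reference distribution for a Wald Z (the vector names are hypothetical labels for the table rows):

```r
# Z statistics copied from the table of simple logistic models above
z <- c(Age = -0.51, Race = -0.15, Vol = -2.26,
       UnilobarLeft = 2.81, UnilobarRight = 4.78, Bilobar = 5.44,
       DetectionRE = 4.48, LogPSA = 6.62, Gleason = 8.12)

# Two-sided p-value under the standard normal reference distribution
p <- 2 * pnorm(-abs(z))
print(signif(p, 2))

# Covariates that pass a 0.05 screen in the simple models
print(names(p)[p < 0.05])
```

By this screen, Age and Race drop out, which is consistent with the covariates proposed for the multiple regression model.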
4
What is a good multiple regression model?
  • Principles of model building are analogous to
    linear regression
  • We use the same approach:
  • Look for significant covariates in simple models
  • Consider multicollinearity
  • Look for confounding (i.e., a change in the betas when a
    covariate is removed)

5
Multiple regression model proposal
  • Gleason, logPSA, Volume, Digital Exam result,
    detection in RE
  • But what about collinearity? 5 choose 2 = 10 pairs to check.

         gleason log.psa.   vol
gleason     1.00     0.46 -0.06
log.psa.    0.46     1.00  0.05
vol        -0.06     0.05  1.00
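A correlation matrix like the one above can be produced with `cor()`. The sketch below uses simulated stand-in data, not the course data set; the variable names simply mirror the slide, and `use = "complete.obs"` is an assumption about how the one missing volume was handled.

```r
set.seed(701)  # arbitrary seed, for reproducibility only

# Simulated stand-ins for the continuous covariates (NOT the course data)
n <- 380
gleason <- sample(4:9, n, replace = TRUE)
log.psa <- 0.3 * gleason + rnorm(n)
vol     <- rexp(n, rate = 1 / 16)
vol[1]  <- NA                      # mimic the one missing volume

# Number of covariate pairs among five candidate predictors
choose(5, 2)

# Pairwise correlations, dropping rows with any missing value
r <- cor(cbind(gleason, log.psa, vol), use = "complete.obs")
print(round(r, 2))
```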

6
Categorical pairs
> dpros.dcaps <- epitab(dpros, dcaps)
> dpros.dcaps$tab
         Outcome
Predictor   1        p0  2         p1 oddsratio     lower     upper
        1  95 0.2802360  4 0.09756098  1.000000        NA        NA
        2 123 0.3628319  9 0.21951220  1.737805 0.5193327  5.815089
        3  84 0.2477876 12 0.29268293  3.392857 1.0540422 10.921270
        4  37 0.1091445 16 0.39024390 10.270270 3.2208157 32.748987
         Outcome
Predictor      p.value
        1           NA
        2 4.050642e-01
        3 3.777900e-02
        4 1.271225e-05
> fisher.test(table(dpros, dcaps))

        Fisher's Exact Test for Count Data

data:  table(dpros, dcaps)
p-value = 2.520e-05
alternative hypothesis: two.sided
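To see where the odds ratio and its interval come from, the row for dpros = 2 can be reproduced by hand from the counts in the table. This sketch assumes a Wald interval on the log odds ratio scale (which matches the printed limits); `epitab` appears to come from the epitools package, but nothing below depends on it.

```r
# Counts from the epitab output above: rows are dpros levels 1 and 2,
# columns are dcaps levels 1 and 2
n11 <- 95;  n12 <- 4    # dpros = 1 (reference)
n21 <- 123; n22 <- 9    # dpros = 2

# Odds ratio for dcaps = 2, comparing dpros = 2 with dpros = 1
or <- (n22 / n21) / (n12 / n11)

# Wald 95% interval on the log odds ratio scale
se_log_or <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)

print(round(c(oddsratio = or, lower = ci[1], upper = ci[2]), 4))
```

The wide interval reflects the small cell count (4) in the reference row.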
7
Categorical vs. continuous
  • t-tests and ANOVA: compare means by category

> summary(lm(log(psa) ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2506     0.1877   6.662 9.55e-11 ***
dcaps         0.8647     0.1632   5.300 1.97e-07 ***
---
Residual standard error: 0.9868 on 378 degrees of freedom
Multiple R-squared: 0.06917, Adjusted R-squared: 0.06671
F-statistic: 28.09 on 1 and 378 DF,  p-value: 1.974e-07

> summary(lm(log(psa) ~ factor(dpros)))
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     2.1418087  0.0992064  21.589  < 2e-16 ***
factor(dpros)2 -0.1060634  0.1312377  -0.808    0.419
factor(dpros)3  0.0001465  0.1413909   0.001    0.999
factor(dpros)4  0.7431101  0.1680055   4.423 1.28e-05 ***
---
Residual standard error: 0.9871 on 376 degrees of freedom
Multiple R-squared: 0.07348, Adjusted R-squared: 0.06609
F-statistic: 9.94 on 3 and 376 DF,  p-value: 2.547e-06
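Regressing a continuous variable on a two-level covariate, as in the first model above, is the same test as a pooled-variance two-sample t-test. A small sketch on simulated stand-in data (not the course data; `g` and `y` are hypothetical names) illustrates the equivalence:

```r
set.seed(15)  # arbitrary seed

# Simulated stand-ins (NOT the course data): a binary group and a response
g <- rbinom(200, size = 1, prob = 0.3)
y <- 2 + 0.8 * g + rnorm(200)

# Slope t statistic from the simple linear regression
t_lm <- summary(lm(y ~ g))$coefficients["g", "t value"]

# Pooled-variance two-sample t-test on the same data
t_tt <- unname(t.test(y ~ g, var.equal = TRUE)$statistic)

# Same magnitude; the sign differs only in which group is subtracted
c(lm = t_lm, t.test = t_tt)
```

With more than two categories, the overall F-statistic from `lm()` plays the role of the one-way ANOVA test.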
8
Categorical vs. continuous
> summary(lm(vol ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   22.905      3.477   6.587 1.51e-10 ***
dcaps         -6.362      3.022  -2.106   0.0359 *
---
Residual standard error: 18.27 on 377 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.01162, Adjusted R-squared: 0.009003
F-statistic: 4.434 on 1 and 377 DF,  p-value: 0.03589

> summary(lm(vol ~ factor(dpros)))
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      17.417      1.858   9.374   <2e-16 ***
factor(dpros)2   -1.638      2.453  -0.668    0.505
factor(dpros)3   -1.976      2.641  -0.748    0.455
factor(dpros)4   -3.513      3.136  -1.120    0.263
---
Residual standard error: 18.39 on 375 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.003598, Adjusted R-squared: -0.004373
F-statistic: 0.4514 on 3 and 375 DF,  p-value: 0.7164
9
Categorical vs. continuous
> summary(lm(gleason ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.2560     0.1991  26.401  < 2e-16 ***
dcaps         1.0183     0.1730   5.885 8.78e-09 ***
---
Residual standard error: 1.047 on 378 degrees of freedom
Multiple R-squared: 0.08394, Adjusted R-squared: 0.08151
F-statistic: 34.63 on 1 and 378 DF,  p-value: 8.776e-09

> summary(lm(gleason ~ factor(dpros)))
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.9798     0.1060  56.402  < 2e-16 ***
factor(dpros)2   0.4217     0.1403   3.007  0.00282 **
factor(dpros)3   0.4890     0.1511   3.236  0.00132 **
factor(dpros)4   0.9636     0.1795   5.367 1.40e-07 ***
---
Residual standard error: 1.055 on 376 degrees of freedom
Multiple R-squared: 0.07411, Adjusted R-squared: 0.06672
F-statistic: 10.03 on 3 and 376 DF,  p-value: 2.251e-06
10
Lots of correlation between covariates
  • We should expect some covariates to lose
    significance, and some confounding, in a joint model.
  • Still, try the full model and see what happens

11
Full model results
> mreg <- glm(cap.inv ~ gleason + log(psa) + vol + dcaps + factor(dpros), family=binomial)
> summary(mreg)
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)    -8.617036   1.102909  -7.813 5.58e-15 ***
gleason         0.908424   0.166317   5.462 4.71e-08 ***
log(psa)        0.514200   0.156739   3.281  0.00104 **
vol            -0.014171   0.007712  -1.838  0.06612 .
dcaps           0.464952   0.456868   1.018  0.30882
factor(dpros)2  0.753759   0.355762   2.119  0.03411 *
factor(dpros)3  1.517838   0.372366   4.076 4.58e-05 ***
factor(dpros)4  1.384887   0.453127   3.056  0.00224 **
---
    Null deviance: 511.26  on 378  degrees of freedom
Residual deviance: 376.00  on 371  degrees of freedom
  (1 observation deleted due to missingness)
AIC: 392
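One way to look for the confounding anticipated earlier is to compare each slope in the full model with its slope in the simple model. The sketch below only does arithmetic on numbers already shown in this lecture; it assumes that "Detection in RE" in the table of simple models is the `dcaps` term, and the vector names are hypothetical.

```r
# Slopes from the simple (one-covariate) models in the earlier table
simple <- c(gleason = 1.24, log.psa = 0.87, vol = -0.014, dcaps = 1.71)

# Slopes for the same terms in the full model above
full <- c(gleason = 0.908424, log.psa = 0.514200,
          vol = -0.014171, dcaps = 0.464952)

# Percent change in each slope after adjusting for the other covariates
pct_change <- 100 * (full - simple) / simple
print(round(pct_change, 1))
```

The `dcaps` slope shrinks by roughly 70% and is no longer significant, while the `vol` slope barely moves, in line with the covariate correlations seen above.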
12
What next?
  • Drop or retain?
  • How to interpret?

13
Likelihood Ratio Test
  • Recall testing multiple coefficients in linear
    regression
  • Approach: ANOVA
  • We don't have ANOVA for logistic regression
  • More general approach: Likelihood Ratio Test
  • Based on the likelihood (or log-likelihood) for
    competing nested models

14
Likelihood Ratio Test
  • H0: small model
  • H1: large model
  • Example

15
Recall the likelihood function
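The formula on this slide did not survive in the transcript. For reference, the standard likelihood for logistic regression with binary outcomes is sketched below; the notation ($y_i$ for the 0/1 outcome, $x_i$ for the covariate vector, $p_i$ for the fitted probability) is assumed here and may differ from the slide's.

```latex
L(\beta) = \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{1 - y_i},
\qquad
p_i = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)},
\qquad
\log L(\beta) = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right].
```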
16
Estimating the log-likelihood
  • Recall that we use the log-likelihood because it
    is simpler to work with (as in linear regression)
  • MLEs:
  • Betas are selected to maximize the likelihood
  • The same betas also maximize the log-likelihood
  • If we plug in the estimated betas, we get the
    maximized log-likelihood for that model
  • We compare the maximized log-likelihoods from competing
    (nested) models
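The "plug in the estimated betas" step can be checked numerically. The sketch below uses simulated stand-in data (not the course data) and a hypothetical helper `loglik()` that evaluates the binomial log-likelihood at any coefficient vector:

```r
set.seed(16)  # arbitrary seed

# Simulated stand-in data (NOT the course data)
x <- rnorm(300)
y <- rbinom(300, size = 1, prob = plogis(-0.5 + 1.2 * x))

fit <- glm(y ~ x, family = binomial)

# Binomial log-likelihood for binary y at a given coefficient vector
loglik <- function(beta) {
  p <- plogis(beta[1] + beta[2] * x)
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# Plugging in the MLEs reproduces the maximized log-likelihood from logLik()
ll_hat <- loglik(coef(fit))
c(manual = ll_hat, logLik = as.numeric(logLik(fit)))

# Moving away from the MLE can only lower the log-likelihood
loglik(coef(fit) + c(0.1, -0.1)) < ll_hat
```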

17
Likelihood Ratio Test
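  • LR statistic: $G^2 = -2\,[\log L(H_0) - \log L(H_1)]$
  • Under the null: $G^2 \sim \chi^2_{p-q}$, where p and q are the
    numbers of coefficients in the large and small models
  • If $G^2 < \chi^2_{p-q,\,1-\alpha}$, conclude H0
  • If $G^2 > \chi^2_{p-q,\,1-\alpha}$, conclude H1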

18
LRT in R
  • $-2 \log L$ = Residual Deviance (for binary 0/1 outcomes)
  • So, $G^2 = \text{Dev}(0) - \text{Dev}(1)$
  • Fit two models

19
> mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros), family=binomial)
> mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family=binomial)
> mreg1
Coefficients:
   (Intercept)         gleason        log(psa)             vol
      -8.31383         0.93147         0.53422        -0.01507
factor(dpros)2  factor(dpros)3  factor(dpros)4
       0.76840         1.55109         1.44743

Degrees of Freedom: 378 Total (i.e. Null);  372 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 377.1    AIC: 391.1
> mreg0
Coefficients:
(Intercept)      gleason     log(psa)          vol
   -7.76759      0.99931      0.50406     -0.01583

Degrees of Freedom: 378 Total (i.e. Null);  375 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 399      AIC: 407
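The link between the printed residual deviance and the log-likelihood can be verified directly. This is a sketch on simulated stand-in data (not the course data); `small` and `large` are hypothetical names for a nested pair of models.

```r
set.seed(18)  # arbitrary seed

# Simulated stand-in data (NOT the course data)
x1 <- rnorm(250)
x2 <- rnorm(250)
y  <- rbinom(250, size = 1, prob = plogis(0.3 + 0.9 * x1))

small <- glm(y ~ x1,      family = binomial)  # H0 model
large <- glm(y ~ x1 + x2, family = binomial)  # H1 model

# For 0/1 outcomes the residual deviance equals -2 * log-likelihood
c(deviance = deviance(small), minus2logL = -2 * as.numeric(logLik(small)))

# So the LR statistic is just the drop in residual deviance
G2_dev <- deviance(small) - deviance(large)
G2_ll  <- -2 * (as.numeric(logLik(small)) - as.numeric(logLik(large)))
c(G2_dev = G2_dev, G2_ll = G2_ll)
```

The identity relies on the outcome being ungrouped 0/1 data; with grouped binomial counts the deviance and -2 logL differ by a constant, but that constant cancels in the difference between nested models.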
20
Testing DPROS
  • Dev(0) = ?  Dev(1) = ?
  • p = ?  q = ?
  • $\chi^2_{p-q,\,1-\alpha}$ = ?
  • Conclusion?
  • p-value?
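One way to work through these questions is with the residual deviances reported for mreg0 and mreg1 in this lecture's R output (399.02 and 377.06 to two decimals). The sketch below counts the intercept in both p and q, which is an assumption about the slide's convention; only the difference p - q matters for the test.

```r
# Residual deviances reported for the two models in this lecture
dev0 <- 399.02   # small model: gleason + log(psa) + vol
dev1 <- 377.06   # large model: adds factor(dpros)

# Number of coefficients in each model, counting the intercept
q <- 4
p <- 7

G2   <- dev0 - dev1                 # likelihood ratio statistic
crit <- qchisq(0.95, df = p - q)    # critical value at alpha = 0.05
pval <- pchisq(G2, df = p - q, lower.tail = FALSE)

print(c(G2 = G2, critical = crit, p.value = pval))
```

Since the statistic is well above the critical value, the test favors keeping the digital exam result in the model.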

21
More in R
> qchisq(0.95, 3)
> -2*(logLik(mreg0) - logLik(mreg1))
> 1 - pchisq(21.96, 3)
> anova(mreg0, mreg1)
Analysis of Deviance Table

Model 1: cap.inv ~ gleason + log(psa) + vol
Model 2: cap.inv ~ gleason + log(psa) + vol + factor(dpros)
  Resid. Df Resid. Dev Df Deviance
1       375     399.02
2       372     377.06  3    21.96
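The analysis of deviance table above stops at the deviance difference. As a sketch on simulated stand-in data (not the course data; `m0`, `m1`, and `grp` are hypothetical names), passing `test = "Chisq"` to `anova()` adds the p-value, which matches the hand calculation:

```r
set.seed(21)  # arbitrary seed

# Simulated stand-in data (NOT the course data): a 4-level factor and a covariate
grp <- factor(sample(1:4, 400, replace = TRUE))
x   <- rnorm(400)
y   <- rbinom(400, size = 1, prob = plogis(-1 + 0.8 * x + 0.4 * as.integer(grp)))

m0 <- glm(y ~ x,       family = binomial)  # small model
m1 <- glm(y ~ x + grp, family = binomial)  # large model: 3 extra coefficients

# anova() can report the p-value directly when a test is requested
tab <- anova(m0, m1, test = "Chisq")
print(tab)

# The same p-value by hand from the deviance difference
G2   <- deviance(m0) - deviance(m1)
pval <- pchisq(G2, df = df.residual(m0) - df.residual(m1), lower.tail = FALSE)
```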
22
Notes on LRT
  • Again, models have to be NESTED
  • For comparing models that are not nested, you
    need to use other approaches
  • Examples:
  • AIC
  • BIC
  • DIC
  • Next time.
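As a preview, the AIC values printed earlier in this lecture can be recovered from the residual deviances. This sketch assumes binary outcomes (so that -2 logL equals the residual deviance) and counts the intercept among the k coefficients; the BIC line additionally assumes n = 379 complete cases for the models that include vol.

```r
# Residual deviances and coefficient counts (including the intercept)
# for the four logistic models fitted in this lecture
dev <- c(pros5.reg = 403.90, mreg = 376.00, mreg1 = 377.06, mreg0 = 399.02)
k   <- c(pros5.reg = 3,      mreg = 8,      mreg1 = 7,      mreg0 = 4)

# For binary outcomes, AIC = -2 logL + 2k = residual deviance + 2k
aic <- dev + 2 * k
print(aic)

# BIC replaces the penalty 2k with k * log(n)
# n = 379 complete cases (assumed for the three models that use vol)
bic <- dev[-1] + k[-1] * log(379)
print(round(bic, 1))
```

Such criteria are only comparable across models fitted to the same observations; pros5.reg used all 380 cases, so it is left out of the BIC comparison here.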

23
For next time, read the following article
Low Diagnostic Yield of Elective Coronary
Angiography. Patel, Peterson, Dai et al. NEJM,
362(10), pp. 886-95, March 11, 2010
http://content.nejm.org/cgi/content/short/362/10/886?ssource=mfv