Title: Logistic Regression
1Logistic Regression
- In logistic regression the outcome variable is
binary, and the purpose of the analysis is to
assess the effects of multiple explanatory
variables, which can be numeric and/or
categorical, on the outcome variable.
2Requirements for Logistic Regression
- The following need to be specified:
- An outcome variable with two possible categorical
outcomes (1 = success, 0 = failure).
- A way to estimate the probability P of the
outcome variable.
- A way of linking the outcome variable to the
explanatory variables.
- A way of estimating the coefficients of the
regression equation, as well as their confidence
intervals.
- A way to test the goodness of fit of the
regression model.
3Measuring the Probability of Outcome
- The probability of the outcome is measured by the
odds of occurrence of an event.
- If P is the probability of an event, then (1 - P)
is the probability of it not occurring.
- Odds of success = P / (1 - P)
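The conversion between probability and odds can be sketched in a few lines of Python. The function names here are illustrative only and do not come from the slides.

```python
def odds_from_probability(p: float) -> float:
    """Return the odds P / (1 - P) for a probability 0 <= P < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def probability_from_odds(odds: float) -> float:
    """Invert the relationship: P = odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# A probability of 0.75 corresponds to odds of 3 to 1.
print(odds_from_probability(0.75))
print(probability_from_odds(3.0))
```

Note that odds can take any non-negative value, whereas a probability is confined to the range 0 to 1.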
4The Logistic Regression
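- The joint effect of all explanatory variables
put together on the odds is
- Odds $= \frac{P}{1-P} = e^{a + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}$
- Taking the logarithms of both sides
- $\log \frac{P}{1-P} = a + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
- Logit $P = a + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$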
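- The coefficients $\beta_1, \beta_2, \dots, \beta_p$ are estimated by
maximum likelihood: they are the values under which the
observed outcomes are most probable. (Minimising the sum of
squared distances between the observed and predicted values
is the criterion for linear regression, not for logistic
regression.)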
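The logit link and its inverse can be sketched as follows. The function names and the example coefficients are hypothetical and serve only as illustration. The last function computes the log-likelihood of a set of coefficients, which is the quantity statistical packages generally maximise when fitting a logistic regression.

```python
import math


def logit(p):
    """Log odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_probability(a, betas, xs):
    """P for one subject, given intercept a, coefficients and X values."""
    linear_predictor = a + sum(b * x for b, x in zip(betas, xs))
    return inv_logit(linear_predictor)


def log_likelihood(a, betas, rows, outcomes):
    """Sum of log P over subjects with outcome 1 and log(1 - P) over outcome 0."""
    total = 0.0
    for xs, y in zip(rows, outcomes):
        p = predicted_probability(a, betas, xs)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical coefficients and data, for illustration only.
rows = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
outcomes = [0, 1, 1]
print(predicted_probability(-1.0, [0.5, 0.8], rows[0]))
print(log_likelihood(-1.0, [0.5, 0.8], rows, outcomes))
```

Coefficients that give a log-likelihood closer to zero fit the observed outcomes better.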
5The Logistic Regression
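- Logit $P = a + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
- $a$ represents the overall (baseline) disease risk: the
log odds when all explanatory variables are zero.
- $\beta_1$ represents the amount by which the disease
risk, on the log-odds scale, is altered by a unit change in $X_1$.
- $\beta_2$ is the amount by which the disease risk, on the
log-odds scale, is altered by a unit change in $X_2$.
- ... and so on.
- What changes is the log odds. The odds themselves
are multiplied by $e^{\beta}$.
- If $\beta = 1.6$, the odds are multiplied by $e^{1.6} \approx 4.95$.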
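The step from a coefficient to its effect on the odds is a single exponentiation. A minimal sketch, with an illustrative function name:

```python
import math


def odds_multiplier(beta, delta_x=1.0):
    """Factor by which the odds change when X increases by delta_x."""
    return math.exp(beta * delta_x)


# The slide's example: a coefficient of 1.6 multiplies the odds by about 4.95.
print(round(odds_multiplier(1.6), 2))  # 4.95

# A negative coefficient gives a multiplier below 1, i.e. reduced odds.
print(odds_multiplier(-1.6) < 1.0)  # True
```

Because the effect is multiplicative, a two-unit change in X multiplies the odds by the square of the one-unit multiplier.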
6Analysis in Logistic Regression - 1
- The study to be analysed is about the use of the
radioisotope thallium while the subject is made
to exercise. 100 subjects underwent both thallium
exercise testing and cardiac catheterisation. Some were
on propranolol. Whether the heart rate rose to more than
85% of maximum, the E.C.G., and the occurrence of pain
during exercise were recorded.
7Interpreting the Computer Printout
Logistic Regression Table

                                                 Odds      95% CI
Predictor      Coef   SE Coef      Z      P     Ratio   Lower   Upper
Constant    -0.9349    0.5165  -1.81  0.070
Stn         0.03080   0.01482   2.08  0.038      1.03    1.00    1.06
Propran      0.6000    0.4844   1.24  0.215      1.82    0.71    4.71
HrtRate     -0.4234    0.4735  -0.89  0.371      0.65    0.26    1.66
IscExr      -0.6322    0.6601  -0.96  0.338      0.53    0.15    1.94
Sex         -0.2996    0.4780  -0.63  0.531      0.74    0.29    1.89
PnExr        0.6953    0.4009   1.73  0.083      2.00    0.91    4.40

Log-Likelihood = -57.650
Test that all slopes are zero: G = 12.907, DF = 6, P-Value = 0.045

Goodness-of-Fit Tests

Method            Chi-Square   DF      P
Pearson               60.350   38  0.012
Deviance              66.811   38  0.003
Hosmer-Lemeshow       14.243    6  0.027
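The columns of the printout are related to each other in a simple way, and the relationships can be checked by hand. The sketch below assumes the usual Wald (normal-approximation) formulas: Z = Coef / SE Coef, odds ratio = e^Coef, and a 95% interval of e^(Coef ± 1.96 × SE Coef). The helper names are illustrative, and the chi-square tail function is a closed form that holds only for even degrees of freedom.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Z, two-sided P, odds ratio and 95% CI for one coefficient."""
    z = coef / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return {
        "z": z,
        "p": p_value,
        "odds_ratio": math.exp(coef),
        "ci_lower": math.exp(coef - z_crit * se),
        "ci_upper": math.exp(coef + z_crit * se),
    }


def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# PnExr row of the printout: Coef = 0.6953, SE Coef = 0.4009.
pn_exr = wald_summary(0.6953, 0.4009)
print({name: round(value, 3) for name, value in pn_exr.items()})

# Test that all slopes are zero: G = 12.907 on 6 degrees of freedom.
print(round(chi2_upper_tail_even_df(12.907, 6), 3))
```

The results agree with the printed row for PnExr and with the printed P-value for the G statistic to the number of decimals shown. The confidence interval for PnExr includes 1, which is consistent with its P-value being above 0.05.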
8Interpreting the Computer Printout - 2
Table of Observed and Expected Frequencies (See
Hosmer-Lemeshow Test for the Pearson Chi-Square
Statistic)

                              Group
Value      1     2    3    4    5    6    7    8  Total
1   Obs    2     1    2    5    5    6    9    4     34
    Exp  2.4   4.1  2.9  3.1  3.7  5.0  5.7  7.0
0   Obs   15    18    9    5    5    6    2    6     66
    Exp 14.6  14.9  8.1  6.9  6.3  7.0  5.3  3.0

Pairs        Number  Percent   Summary Measures
Concordant     1664     74.2   Somers' D              0.51
Discordant      524     23.4   Goodman-Kruskal Gamma  0.52
Ties             56      2.5   Kendall's Tau-a        0.23
Total          2244    100.0
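Both parts of this printout can be recomputed from the numbers shown. The sketch below sums (Obs - Exp)^2 / Exp over the sixteen cells of the frequency table, and derives the three summary measures from the pair counts using their usual definitions. Function and variable names are illustrative. Because the expected frequencies are printed to one decimal place only, the recomputed chi-square statistic is close to, but not exactly, the 14.243 reported for the Hosmer-Lemeshow test.

```python
# Observed and expected frequencies copied from the printout.
obs_1 = [2, 1, 2, 5, 5, 6, 9, 4]
exp_1 = [2.4, 4.1, 2.9, 3.1, 3.7, 5.0, 5.7, 7.0]
obs_0 = [15, 18, 9, 5, 5, 6, 2, 6]
exp_0 = [14.6, 14.9, 8.1, 6.9, 6.3, 7.0, 5.3, 3.0]


def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over matching cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def association_measures(concordant, discordant, ties, n_subjects):
    """Somers' D, Goodman-Kruskal Gamma and Kendall's Tau-a from pair counts."""
    difference = concordant - discordant
    all_pairs = concordant + discordant + ties
    return {
        "somers_d": difference / all_pairs,
        "gamma": difference / (concordant + discordant),
        "tau_a": difference / (n_subjects * (n_subjects - 1) / 2),
    }


hl_statistic = pearson_chi_square(obs_1 + obs_0, exp_1 + exp_0)
print(round(hl_statistic, 2))

measures = association_measures(1664, 524, 56, n_subjects=100)
print({name: round(value, 2) for name, value in measures.items()})
```

The total of 2244 pairs is the number of ways of pairing one of the 34 subjects with outcome 1 with one of the 66 subjects with outcome 0.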
9Regression Diagnostics
- In logistic regression, Residual = Observed outcome - Estimated
probability (that is, 1 - Estimated probability for a subject
in whom the event occurred). Residuals for each subject are
calculated, standardised and plotted against the estimated
probability. Eight diagnostic plots are
available, four dealing with residuals and four
with leverage.
- These plots are demonstrated in the slides that
follow.
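As a rough illustration of the residual calculation, the sketch below computes the raw residual and the Pearson residual, which is one common way of standardising. The slides do not say which standardisation the software uses, so the choice of the Pearson form here is an assumption. The outcomes and estimated probabilities are made up.

```python
import math


def raw_residual(y, p_hat):
    """Observed outcome (0 or 1) minus the estimated probability."""
    return y - p_hat


def pearson_residual(y, p_hat):
    """Raw residual divided by the binomial standard deviation sqrt(p(1 - p))."""
    return (y - p_hat) / math.sqrt(p_hat * (1.0 - p_hat))


# Hypothetical outcomes and estimated probabilities for four subjects.
outcomes = [1, 0, 1, 0]
estimated = [0.8, 0.3, 0.2, 0.6]

for y, p_hat in zip(outcomes, estimated):
    print(y, p_hat, round(raw_residual(y, p_hat), 3), round(pearson_residual(y, p_hat), 3))
```

A subject whose event occurred despite a low estimated probability, such as the third subject here, gets a large positive residual and would stand out on a plot of residuals against estimated probability.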
10Diagnostic plots for residuals
11Diagnostic plots for leverage