Title: Logistic Regression III: Advanced topics
1. Logistic Regression III: Advanced topics
2. Conditional Logistic Regression for Matched Data
3. Recall: Matching
- Matching can control for extraneous sources of variability and increase the power of a statistical test.
- Match M controls to each case based on potential confounders, such as age and gender.
- If the data are matched, you must account for the matching in the statistical analysis!
4. Recall: Agresti example, diabetes and MI
- Match each MI case to a control without MI, based on age and gender.
- Ask about history of diabetes to find out whether diabetes increases the risk of MI.
5. Odds (favors case / discordant pair)
6. Conditional Logistic Regression
7. The Conditional Likelihood: each discordant stratum (rather than each individual) gets 1 term in the likelihood
For each stratum, we add to the likelihood the CONDITIONAL probability that the case got the disease and the control did not, given that we have a case-control pair.
Note: the marginal probability of disease may differ in each age-gender stratum, but we assume that the (multiplicative) increase in disease risk due to exposure is constant across strata.
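As a minimal sketch of the conditional likelihood described above, the Python below writes out one likelihood term per matched pair for a single binary exposure. The pair counts are hypothetical (they are not the diabetes/MI data), and the function names are illustrative. It assumes 1:1 matching, in which case each pair contributes exp(beta * x_case) / (exp(beta * x_case) + exp(beta * x_control)).

```python
import math

def pair_term(beta, x_case, x_control):
    """Conditional probability that the observed case is the diseased
    member of the pair, given that the pair holds one case and one control."""
    numerator = math.exp(beta * x_case)
    return numerator / (numerator + math.exp(beta * x_control))

def conditional_log_likelihood(beta, pairs):
    """Sum of log pair terms; each pair is (x_case, x_control)."""
    return sum(math.log(pair_term(beta, x_case, x_control))
               for x_case, x_control in pairs)

# Hypothetical pair counts, for illustration only.
N_CASE_ONLY = 30     # discordant: case exposed, control unexposed
N_CONTROL_ONLY = 20  # discordant: control exposed, case unexposed
N_BOTH = 10          # concordant: both exposed
N_NEITHER = 40       # concordant: neither exposed

pairs = ([(1, 0)] * N_CASE_ONLY + [(0, 1)] * N_CONTROL_ONLY
         + [(1, 1)] * N_BOTH + [(0, 0)] * N_NEITHER)

# Concordant pairs contribute 0.5 whatever beta is, so only the
# discordant pairs carry information about the exposure effect.
# For one binary exposure the maximizer has a closed form:
beta_hat = math.log(N_CASE_ONLY / N_CONTROL_ONLY)

print(f"conditional odds ratio estimate: {math.exp(beta_hat):.2f}")
print(f"log-likelihood at beta_hat: {conditional_log_likelihood(beta_hat, pairs):.3f}")
```

With these made-up counts the estimate is simply the ratio of the two discordant counts, which is why the concordant pairs can be ignored in the simple matched-pairs odds ratio.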
8. Recall: probability terms
10. The conditional likelihood
11. Conditional Logistic Regression
12. Example: MI and diabetes
13. Conditional Logistic Regression
14. In SAS
    proc logistic data=YourData;
      model MI(event="Yes") = diabetes;
      strata PairID;
    run;
15. Example: Prenatal ultrasound examinations and risk of childhood leukemia: case-control study. BMJ 2000;320:282-283
- Could there be an association between exposure to ultrasound in utero and an increased risk of childhood malignancies?
- Previous studies have found no association, but they have had poor statistical power to detect an association.
- Swedish researchers performed a nationwide population-based case-control study using prospectively assembled data on prenatal exposure to ultrasound.
16. Example: Prenatal ultrasound examinations and risk of childhood leukemia: case-control study. BMJ 2000;320:282-283
- 535 cases: all children born and diagnosed as having myeloid leukemia between 1973 and 1989, identified in Swedish registers of birth, cancer, and causes of death.
- 535 matched controls: 1 control was randomly selected for each case from the Swedish Birth Registry, matched by sex and year and month of birth.
17. Pair counts from the 2x2 table (row and column labels not recoverable): 115, 85, 235, 100
But this type of analysis is limited to a single dichotomous exposure.
18. Dose-response analysis
- Used conditional logistic regression to look at dose-response with number of ultrasounds.
- Results:
  - Reference (OR = 1.0): no ultrasounds
  - OR = 0.91 for 1-2 ultrasounds
  - OR = 0.64 for >3 ultrasounds
- Conclusion: no evidence of a positive association between prenatal ultrasound and childhood leukemia; there is even evidence of an inverse association (which could be explained by the reasons for frequent ultrasound).
19. Extension: 1:M matching
- Each term in the likelihood represents a stratum of 1+M individuals.
- More complicated likelihood expression!
- Just as easy to implement in SAS, as we'll see Wednesday.
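For reference, here is a sketch of what one stratum's term looks like under 1:M matching, assuming a single exposure variable x with coefficient β and exactly one case per stratum (the index i runs over the case and its M controls; this notation is introduced here for illustration):

```latex
L_{\text{stratum}}(\beta)
  = \frac{\exp(\beta\, x_{\text{case}})}
         {\sum_{i=1}^{1+M} \exp(\beta\, x_i)}
```

With M = 1 the denominator has two terms and the expression reduces to the matched-pair term.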
20. Ordinal Logistic Regression
21. Ordinal Logistic Regression
- What if your outcome variable has more than two levels?
- For ordinal outcomes, use ordinal logistic regression.
- Relies on the cumulative logit.
- Models the predicted probability of multiple outcomes.
- Also known as the proportional odds model.
22. Ordinal Variable Example: Likert Scale
- 1 = strongly disagree
- 2 = disagree
- 3 = neutral
- 4 = agree
- 5 = strongly agree
Cumulative outcomes:
- strongly agree vs. the rest
- agree or strongly agree vs. neutral or negative
- agree, strongly agree, or neutral vs. negative
- the rest vs. strongly disagree
Ordinal logistic regression gives you a way to model these cumulative outcomes all at once!
23. Ordinal Variable Example: Continuous variable measured crudely
- 1 = breastfed >6 months
- 2 = breastfed 4-5 months
- 3 = breastfed 2-3 months
- 4 = breastfed <2 months
The outcome variable, breastfeeding, was only measured at limited time points, so it may not be best modeled as a continuous variable in linear regression. Use ordinal logistic!
24. Another example, 3 levels
From my data on runners:
- 1 = eumenorrhea (normal menses) (66.6%)
- 2 = oligomenorrhea (mild irregularity) (24.6%)
- 3 = amenorrhea (severe irregularity) (8.6%)
25. Cumulative logit, 3 groups (2 potential positive outcomes)
In words: the log odds of having amenorrhea (versus everything else), and the log odds of having any irregularity (versus normal).
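The two cumulative logits stated in words above can be written out as follows. This is a sketch using the outcome coding 1 = eumenorrhea, 2 = oligomenorrhea, 3 = amenorrhea; the symbol Y for the outcome is introduced here for illustration.

```latex
\operatorname{logit} P(Y \ge 3)
  = \ln \frac{P(\text{amen})}{P(\text{oligo}) + P(\text{eu})}
\qquad
\operatorname{logit} P(Y \ge 2)
  = \ln \frac{P(\text{amen}) + P(\text{oligo})}{P(\text{eu})}
```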
26. Corresponding logistic model (no predictors)
- The intercept-only model, no predictors (two intercepts!):
- Log odds (amenorrhea) = α_amen
- Log odds (any irregularity) = α_amen or oligo
27. Fitted model
- Logit of amenorrhea:
  - 8.6% of my sample has amenorrhea
  - Odds = 8.6/91.4 = 0.094
  - ln(0.094) = -2.364 ≈ -2.36
- Logit of any irregularity:
  - About 33.3% have any irregularity (24.6% + 8.6%, up to rounding)
  - Odds = (1/3)/(2/3) = 1/2
  - ln(1/2) = -0.693 ≈ -0.70
- Fitted models are:
  - Log odds (amenorrhea) = -2.36
  - Log odds (any irregularity) = -0.70
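The intercept-only arithmetic above can be reproduced in a few lines of Python. This is only a sketch that uses the rounded sample percentages reported for the runners (66.6%, 24.6%, 8.6%); the variable names are illustrative.

```python
import math

# Rounded sample proportions reported for the runners data.
proportions = {
    "eumenorrhea": 0.666,
    "oligomenorrhea": 0.246,
    "amenorrhea": 0.086,
}

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1 - p))

# Cumulative outcome 1: amenorrhea versus everything else.
p_amen = proportions["amenorrhea"]
# Cumulative outcome 2: any irregularity versus normal.
p_any = proportions["amenorrhea"] + proportions["oligomenorrhea"]

intercept_amen = logit(p_amen)
intercept_any = logit(p_any)

print(f"log odds (amenorrhea)       = {intercept_amen:.2f}")  # -2.36
print(f"log odds (any irregularity) = {intercept_any:.2f}")   # -0.70
```

In an intercept-only model the two fitted intercepts are exactly these sample logits, which is why no fitting routine is needed here.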
28. Logistic model with predictors
- Log odds (amenorrhea) = α_amen + β1X1 + β2X2
- Log odds (any irregularity) = α_amen or oligo + β1X1 + β2X2
- Note: different intercepts but shared betas (shared slopes)!
29. Odds ratio interpretation (a)
30. Odds ratio interpretation (b)
31. Odds ratio interpretation
- Interpretation of the betas:
- e^β = adjusted odds ratio
- For every 1-unit increase in X, e^β is the multiplicative increase in the odds of any menstrual irregularity compared with none, and it is also the multiplicative increase in the odds of amenorrhea compared with the other two categories (adjusted for any other predictors in the model).
- Note the proportional odds assumption! The odds ratios are the same across different levels of the outcome.
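To see why a single β gives the same odds ratio at every cut point, here is a short derivation sketch for one predictor X. The cut-point index j and the notation α_j for the corresponding intercept are introduced here for illustration.

```latex
\operatorname{logit} P(Y \ge j \mid X = x) = \alpha_j + \beta x
\quad\Longrightarrow\quad
\operatorname{logit} P(Y \ge j \mid X = x + 1)
  - \operatorname{logit} P(Y \ge j \mid X = x) = \beta
\quad\Longrightarrow\quad
\text{OR}_j = e^{\beta} \ \text{for every } j
```

The intercept α_j cancels in the difference, so the odds ratio does not depend on which cumulative outcome is used. That is the proportional odds assumption.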
32. Example predictor: EDI-A
- Score on the anorexia subscale of the eating disorder inventory (EDI-A)
33. Cumulative logit plot (4 bins)
34. Fitted model with EDI-A
Analysis of Maximum Likelihood Estimates

Parameter     DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept 1   1    -3.2630    0.3823           72.8648           <.0001
Intercept 2   1    -1.3888    0.2478           31.4220           <.0001
EDIA          1     0.1211    0.0265           20.9065           <.0001

Log odds (amen) = -3.2630 + 0.1211 × EDI-A
Log odds (any irregularity) = -1.3888 + 0.1211 × EDI-A
35. Fitted Model: Predicted logit at every level of EDI-A
36. Compare actual data and fitted model
37. Fitted model with EDI-A
Odds Ratio Estimates

Effect   Point Estimate   95% Wald Confidence Limits
EDIA     1.129            1.072   1.189

For every 1-unit increase in EDI-A score, there's a 13% increase in the odds of being amenorrheic versus the other two categories and a 13% increase in the odds of being amenorrheic or oligomenorrheic versus normal.
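As a quick check, the odds ratio and its Wald limits can be recovered from the EDIA estimate (0.1211) and standard error (0.0265) in the parameter table. The sketch below assumes the usual 95% Wald interval, exp(estimate ± 1.96 × SE); the variable names are illustrative.

```python
import math

# Values taken from the EDIA row of the maximum likelihood table.
estimate = 0.1211
std_error = 0.0265
Z_95 = 1.96  # normal quantile assumed for a 95% Wald interval

odds_ratio = math.exp(estimate)
lower = math.exp(estimate - Z_95 * std_error)
upper = math.exp(estimate + Z_95 * std_error)
percent_increase = (odds_ratio - 1) * 100

print(f"OR = {odds_ratio:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
print(f"increase in odds per 1-unit EDI-A: {percent_increase:.0f}%")
```

The result agrees with the reported 1.129 (1.072, 1.189) and with the 13% increase in odds quoted in the interpretation.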
38. Predictions
Log odds (outcome) = (-3.2630 or -1.3888) + 0.1211 × EDI-A
The model predicts that a woman with an EDI-A score of 15 would have:
39. Predictions
- Any irregularity: predicted logit = 0.4281; predicted probability = 60.5%
- Amenorrhea: predicted logit = -1.446; predicted probability = 19%
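The predictions for an EDI-A score of 15 can be reproduced from the fitted intercepts and slope. The sketch below uses the rounded coefficients from the parameter table, so the logits may differ from the reported values in the last digit. The step that differences the cumulative probabilities to get per-category probabilities is an addition for illustration, and the function names are hypothetical.

```python
import math

# Rounded coefficients from the fitted model with EDI-A.
INTERCEPT_AMEN = -3.2630  # amenorrhea vs. the other two categories
INTERCEPT_ANY = -1.3888   # any irregularity vs. normal
SLOPE_EDIA = 0.1211       # shared slope (proportional odds)

def inv_logit(x):
    """Convert a logit (log odds) to a probability."""
    return 1 / (1 + math.exp(-x))

def predict(edia):
    """Cumulative and per-category probabilities for an EDI-A score."""
    p_amen = inv_logit(INTERCEPT_AMEN + SLOPE_EDIA * edia)
    p_any = inv_logit(INTERCEPT_ANY + SLOPE_EDIA * edia)
    return {
        "any_irregularity": p_any,
        "amenorrhea": p_amen,
        # Category probabilities follow by differencing the cumulative ones.
        "oligomenorrhea": p_any - p_amen,
        "eumenorrhea": 1 - p_any,
    }

result = predict(15)
for outcome, probability in result.items():
    print(f"{outcome}: {probability:.1%}")
```

Because the two logits share a slope and the amenorrhea intercept is the lower one, the cumulative probabilities stay ordered, so the differenced category probabilities are never negative.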
40. Advantages and disadvantages
- Ordinal logistic is better than running separate logistic models for different outcomes (e.g., one model for amenorrhea, one model for any irregularity) because of the improvement in statistical power!
- Ordinal logistic prevents you from having to arbitrarily turn an ordinal variable into a binary variable!
- But it does require that you meet the proportional odds assumption.