Title: Labs 6
1 Labs 6 & 7. Case-Control Analysis: Logistic Regression
Henian Chen, M.D., Ph.D.
2 Data Files
Today we will use the case-control study data on esophageal cancer. If you use an infile statement to read the case-control978.dat file, please make sure that you have corrected the miscoded values and the two abnormally high values for alcohol. I have corrected case-control978.dbf, case-control978.wk3, and case-control978.txt. You are welcome to use any one of them.

    proc import datafile='a:\case-control978.txt' out=case_control978 dbms=tab replace;
      getnames=yes;
    run;

    proc import datafile='a:\case-control978.wk3' out=case_control978 dbms=wk3 replace;
      getnames=yes;
    run;

    proc import datafile='a:\case-control978.dbf' out=case_control978 dbms=dbf replace;
    run;
3 Logistic Regression Model
A regression model in which the dependent variable is binary (yes, no).
A form of the generalized linear model in which the link function is the logit, and the regression parameters are expressed as changes in the log odds associated with a unit increase in the predictors.
For ordinal response outcomes (no pain, slight pain, substantial pain), we can model the cumulative logits by performing ordered logistic regression using the proportional odds model.
For nominal outcomes (Democrats, Republicans, Independents), we can model the generalized logits by performing logistic analysis using the log-linear model.
4 Logistic Regression for Intercept Only
SAS Program

    proc logistic data=case_control978 descending;
      model status=;
    run;

The descending option is used to get the probability and OR for dependent variable = 1.

SAS Output

    The LOGISTIC Procedure
    Model Information
    Data Set                     WORK.CASE_CONTROL978
    Response Variable            status
    Number of Response Levels    2
    Number of Observations       978
    Model                        binary logit
    Optimization Technique       Fisher's scoring
5 Logistic Regression for Intercept Only
SAS Output

    Response Profile
    Ordered Value    status    Total Frequency
    1                1         200
    2                0         778

    Probability modeled is status=1.

    Model Convergence Status
    Convergence criterion (GCONV=1E-8) satisfied.

    -2 Log L = 990.8635

    Analysis of Maximum Likelihood Estimates
    Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
    Intercept    1     -1.3584     0.0793            293.5837           <.0001
6 Logistic Regression for Intercept Only
1. Calculate the log odds. In our model, the intercept (a) = -1.3584; -1.3584 is the log odds of cancer for the total sample.
2. Take the antilog to get the odds: Odds = exp(-1.3584) = 0.2571
3. Divide the odds by (1 + odds) to get P (P means probability in a cohort or population; in a case-control study P means proportion): P = 0.2571/(1 + 0.2571) = 0.2045 = 200/(200 + 778)
P depends on the intercept a in the logistic model.
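The three steps above can be checked with a few lines of Python. This is only a sketch that re-does the slide's arithmetic from the reported intercept (-1.3584) and the case and control counts (200 and 778); the variable names are illustrative and not part of the lab.

```python
import math

# Intercept of the intercept-only model (the log odds of status = 1).
intercept = -1.3584

log_odds = intercept
odds = math.exp(log_odds)      # step 2: antilog of the log odds
p = odds / (1 + odds)          # step 3: odds / (1 + odds)

# The same proportion taken directly from the response profile.
p_direct = 200 / (200 + 778)

print(f"odds = {odds:.4f}, P = {p:.4f}, direct proportion = {p_direct:.4f}")
```

Both routes give the same proportion of cases (to four decimals), which is why P is tied to the intercept.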
7 Logistic Regression for Dichotomous Predictor
Alcohol Consumption (alcgrp): 0 = 0-39 gm/day, 1 = 40+ gm/day

SAS Program

    proc logistic data=case_control978 descending;
      model status=alcgrp;
    run;

SAS Output

    Model Fit Statistics
    Criterion    Intercept Only    Intercept and Covariates
    -2 Log L     990.863           901.036

Likelihood Ratio Test: G = 990.863 - 901.036 = 89.827, df = 1
The model with the variable alcgrp fits significantly better than the intercept-only model.
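As a rough check on the likelihood ratio test, the sketch below recomputes G from the two -2 Log L values and converts it to a p-value. It relies on the fact that a chi-square variable with 1 df is a squared standard normal, so the upper tail probability is erfc(sqrt(G/2)); the function name is made up for this illustration, and the shortcut only applies when df = 1.

```python
import math

def likelihood_ratio_test_df1(neg2logl_null, neg2logl_model):
    """Return (G, p-value) for a likelihood ratio test with 1 df."""
    g = neg2logl_null - neg2logl_model
    # For 1 df, P(chi-square > g) = P(|Z| > sqrt(g)) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(g / 2))
    return g, p_value

g, p_value = likelihood_ratio_test_df1(990.863, 901.036)
print(f"G = {g:.3f}, df = 1, p = {p_value:.3g}")
```

The p-value is far below 0.0001, in line with the conclusion that adding alcgrp improves the fit.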
8 Logistic Regression for Dichotomous Predictor
SAS Output

    Analysis of Maximum Likelihood Estimates
    Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
    Intercept    1     -2.5911     0.1925            181.1314           <.0001
    alcgrp       1     1.7641      0.2132            68.4372            <.0001

    Odds Ratio Estimates
    Effect    Point Estimate    95% Wald Confidence Limits
    alcgrp    5.836             3.843    8.864

OR = exp(β) = exp(1.7641) = 5.836
The odds of cancer for heavy drinkers (alcgrp=1) are about 6 times the odds for light drinkers (alcgrp=0).
The OR does not depend on the intercept a in the logistic model.
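The odds ratio and its Wald limits can be reproduced from the estimate and standard error alone, as exp(estimate ± z × SE). The sketch below assumes the usual 95% normal quantile z ≈ 1.96; the function name is illustrative, not something SAS provides.

```python
import math

def odds_ratio_with_wald_ci(estimate, std_error, z=1.959964):
    """Return (OR, lower limit, upper limit) from a logit coefficient."""
    point = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return point, lower, upper

# Estimate and standard error for alcgrp from the output above.
or_alcgrp, lower, upper = odds_ratio_with_wald_ci(1.7641, 0.2132)
print(f"OR = {or_alcgrp:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Note that the intercept never enters the calculation, which is the sense in which the OR is unrelated to a.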
9 Logistic Regression for Dichotomous Predictor
1. Calculate the log odds
   Light drinkers (alcgrp=0): log odds = -2.5911
   Heavy drinkers (alcgrp=1): log odds = -2.5911 + 1.7641 = -0.827
2. Take the antilog to get the odds
   Light drinkers: Odds = exp(-2.5911) = 0.0749
   Heavy drinkers: Odds = exp(-0.827) = 0.4374
3. Divide the odds by (1 + odds) to get P(x)
   Light drinkers: P(x) = 0.0749/(1 + 0.0749) = 0.0697
   Heavy drinkers: P(x) = 0.4374/(1 + 0.4374) = 0.3043
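The same three steps can be wrapped in a small helper that works for any value of the predictor. This is a sketch using the intercept and alcgrp coefficient reported on slide 8; the function name predicted_p is an assumption made for this example.

```python
import math

INTERCEPT = -2.5911   # intercept from the alcgrp model
B_ALCGRP = 1.7641     # coefficient for alcgrp

def predicted_p(x, intercept=INTERCEPT, slope=B_ALCGRP):
    """Log odds -> odds -> P(x) for a single-predictor logistic model."""
    log_odds = intercept + slope * x
    odds = math.exp(log_odds)
    return odds / (1 + odds)

for alcgrp, label in [(0, "Light drinkers"), (1, "Heavy drinkers")]:
    print(f"{label} (alcgrp={alcgrp}): P(x) = {predicted_p(alcgrp):.4f}")
```

Because this is a case-control sample, these values are proportions of cases within each drinking group of the sample, not population risks.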
10 Logistic Regression for Ordinal Predictor
Alcohol Consumption (alcgrp4):
   0 = 0-39 gm/day
   1 = 40-79 gm/day
   2 = 80-119 gm/day
   3 = 120+ gm/day

SAS Program

    proc logistic data=case_control978 descending;
      model status=alcgrp4;
    run;

SAS Output

    Model Fit Statistics
    Criterion    Intercept Only    Intercept and Covariates
    -2 Log L     990.863           846.467

Likelihood Ratio Test: G = 990.863 - 846.467 = 144.396, df = 1
The model with the variable alcgrp4 fits significantly better than the intercept-only model.
11 Logistic Regression for Ordinal Predictor
SAS Output

    Analysis of Maximum Likelihood Estimates
    Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
    Intercept    1     -2.4866     0.1459            290.4172           <.0001
    alcgrp4      1     1.0453      0.0934            125.2007           <.0001

    Odds Ratio Estimates
    Effect     Point Estimate    95% Wald Confidence Limits
    alcgrp4    2.844             2.368    3.416

OR = exp(1.0453) = 2.844. The odds of cancer for men with alcgrp4=1 are about 3 times the odds for men with alcgrp4=0. The same OR applies to alcgrp4=2 vs. alcgrp4=1 and to alcgrp4=3 vs. alcgrp4=2.
OR = exp[(3-1) x 1.0453] = exp(2.0906) = 8.090 for alcgrp4=3 vs. alcgrp4=1
OR = exp[(3-0) x 1.0453] = exp(3.1359) = 23.009 for alcgrp4=3 vs. alcgrp4=0
12 OR = exp(βx) is a special case when
1. X is a binary variable
2. There are no interactions between X and other variables
If X is not a binary variable: OR = exp[βx(X* - X)], where X* and X are the two values of X being compared.
If X is not a binary variable, and there is an interaction between X and W: OR = exp[(X* - X)(βx + βxw W)]
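The general formula can be written as one small function, with the binary, no-interaction case falling out when the two X values are 1 and 0 and the interaction coefficient is zero. This is a sketch: the argument names x_new and x_ref (the two values of X being compared) are labels chosen for this example, and the interaction inputs in the last call are hypothetical numbers, not estimates from the lab data.

```python
import math

def odds_ratio(beta_x, x_new, x_ref, beta_xw=0.0, w=0.0):
    """OR comparing X = x_new with X = x_ref, at a fixed value w of W."""
    return math.exp((x_new - x_ref) * (beta_x + beta_xw * w))

# Binary predictor, no interaction (alcgrp, slide 8): reduces to exp(beta).
binary = odds_ratio(1.7641, 1, 0)

# Ordinal predictor (alcgrp4, slide 11): category 3 vs. category 1.
ordinal_3_vs_1 = odds_ratio(1.0453, 3, 1)

# Hypothetical interaction: the OR for X now depends on the value of W.
with_interaction = odds_ratio(0.5, 1, 0, beta_xw=0.2, w=2)

print(f"{binary:.3f} {ordinal_3_vs_1:.3f} {with_interaction:.3f}")
```

With an interaction present there is no single OR for X; it has to be reported at a stated value of W.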
13 Logistic Regression for Continuous Predictor
Alcohol Consumption (alcohol): daily consumption in grams

SAS Program

    proc logistic data=case_control978 descending;
      model status=alcohol;
    run;

SAS Output

    Analysis of Maximum Likelihood Estimates
    Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
    Intercept    1     -2.9741     0.1807            270.9266           <.0001
    alcohol      1     0.0261      0.00232           126.4179           <.0001

    Odds Ratio Estimates
    Effect     Point Estimate    95% Wald Confidence Limits
    alcohol    1.026             1.022    1.031
14 Logistic Regression for Continuous Predictor
OR = exp(0.0261) = 1.026. The odds of cancer increase by a factor of 1.026 for each one-gram increase in daily alcohol consumption.
OR = exp[40(0.0261)] = exp(1.044) = 2.8406 for a 40-gram increase in alcohol consumption per day.
OR = exp[120(0.0261)] = 22.825 for a man who drinks 160 grams per day compared with a man who is similar in other respects but drinks 40 grams per day. (With the rounded coefficient 0.0261, exp(3.132) evaluates to about 22.92.)
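A short sketch of why the coefficient is multiplied by the size of the increase before exponentiating: odds ratios compound multiplicatively, so the OR for a c-gram increase is the per-gram OR raised to the power c, not c times the per-gram OR. The code uses the rounded slope 0.0261 from the output; the function name is illustrative.

```python
import math

BETA_ALCOHOL = 0.0261   # rounded slope for alcohol (per gram per day)

def odds_ratio_for_increase(beta, grams):
    """Odds ratio for an increase of `grams` g/day, given a per-gram slope."""
    return math.exp(grams * beta)

per_gram = odds_ratio_for_increase(BETA_ALCOHOL, 1)
per_40 = odds_ratio_for_increase(BETA_ALCOHOL, 40)
per_120 = odds_ratio_for_increase(BETA_ALCOHOL, 160 - 40)

# exp(40 * beta) is the same quantity as exp(beta) ** 40.
compounded_40 = per_gram ** 40

print(f"1 g: {per_gram:.3f}  40 g: {per_40:.4f}  120 g: {per_120:.2f}")
```

With the rounded slope the 120-gram OR comes out near 22.9; the slightly smaller 22.825 on the slide may reflect the unrounded estimate, though that is an inference rather than something the output shows.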
15 Interaction in Logistic Regression
model: status = a + β1 alcgrp + β2 tobgrp
   β1: the effect of alcohol on cancer, controlling for tobacco (i.e., the same OR across levels of tobacco)
   β2: the effect of tobacco on cancer, controlling for alcohol (i.e., the same OR across levels of alcohol)
model: status = a + β1 alcgrp + β2 tobgrp + β3 alcgrp*tobgrp
   β1: the effect of alcohol on cancer among non-smokers (tobgrp=0)
   β2: the effect of tobacco on cancer among non-drinkers (alcgrp=0)
   β3: the interaction between smoking and drinking
16 Interaction in Logistic Regression
model: status = -3.33 + 2.28(alcgrp) + 1.38(tobgrp) - 0.98(alcgrp*tobgrp)

Log odds and odds below are relative to group A; the intercept (-3.33) is left out because it cancels in every odds ratio.

    Group    alcgrp    tobgrp    Calculation                         Log odds    Odds
    A        0         0         2.28(0) + 1.38(0) - 0.98(0)(0)      0.00        1.00
    B        1         0         2.28(1) + 1.38(0) - 0.98(1)(0)      2.28        9.78
    C        0         1         2.28(0) + 1.38(1) - 0.98(0)(1)      1.38        3.97
    D        1         1         2.28(1) + 1.38(1) - 0.98(1)(1)      2.68        14.59

    Odds Ratio
    A vs. B    9.78     = 9.78/1.00
    A vs. C    3.97     = 3.97/1.00
    A vs. D    14.59    = 14.59/1.00
    B vs. D    1.49     = 14.59/9.78
    C vs. D    3.68     = 14.59/3.97
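The table can be rebuilt programmatically, which makes it easier to see where each odds ratio comes from. This sketch uses the coefficients from the slide, taking the interaction term as -0.98 because that is the sign that reproduces the tabulated log odds of 2.68 for group D (2.28 + 1.38 - 0.98). The dictionary and function names are illustrative.

```python
import math

# Coefficients from the slide; the intercept is omitted because it cancels
# in every odds ratio. The interaction sign is inferred from the table.
B_ALC, B_TOB, B_INT = 2.28, 1.38, -0.98

def relative_log_odds(alcgrp, tobgrp):
    """Log odds relative to group A (alcgrp = 0, tobgrp = 0)."""
    return B_ALC * alcgrp + B_TOB * tobgrp + B_INT * alcgrp * tobgrp

groups = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1)}
odds = {name: math.exp(relative_log_odds(*x)) for name, x in groups.items()}

def odds_ratio(reference, comparison):
    """Odds of the comparison group divided by odds of the reference group."""
    return odds[comparison] / odds[reference]

for ref, comp in [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")]:
    print(f"{ref} vs. {comp}: OR = {odds_ratio(ref, comp):.2f}")
```

Computed without intermediate rounding, C vs. D comes out as exp(1.30), about 3.67; the slide's 3.68 is the ratio of the already rounded odds 14.59/3.97. B vs. D is the tobacco effect among drinkers and C vs. D is the alcohol effect among smokers, both smaller than the corresponding effects in group A's stratum because the interaction coefficient is negative.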