Title: Binary Logistic Regression
1Binary Logistic Regression
"To be or not to be, that is the question." (William Shakespeare, Hamlet)
2Binary Logistic Regression
- Also known as logistic or sometimes logit regression
- Foundation from which more complex models derived
- e.g., multinomial regression and ordinal logistic regression
3Dichotomous Variables
- Two categories indicating whether an event has occurred or some characteristic is present
- Sometimes called binary or binomial variables
4Dichotomous DVs
- Placed in foster care or not
- Diagnosed with a disease or not
- Abused or not
- Pregnant or not
- Service provided or not
5Single (Dichotomous) IV Example
- DV = continue fostering, 0 = no, 1 = yes
- Customary to code category of interest 1 and the other category 0
- IV = married, 0 = not married, 1 = married
- N = 131 foster families
- Are two-parent families more likely to continue fostering than one-parent families?
6Crosstabulation
- Table 2.1
- Relationship between marital status and continuation is statistically significant, χ²(1, N = 131) = 5.65, p = .017
- A higher percentage of two-parent families (62.20%) than single-parent families (40.82%) planned to continue fostering
7Strength & Direction of Relationships
- Different ways to quantify the relationship between IV(s) and DV
- Probabilities
- Odds
- Odds Ratio (OR)
- Also abbreviated as $e^{B}$, Exp(B) (on SPSS output), or exp(B)
- % change
8Roadmap to Computations
9Probabilities
- Percentages in Table 2.1 as probabilities (e.g., 62.20% as .6220)
- p
- Probability that event will occur (continue)
- e.g., probability that one-parent families plan to continue is .4082
- 1 - p
- Probability that event will not occur (not continue)
- e.g., probability that one-parent families do not plan to continue is .5918 (1 - .4082)
10Odds
- Ratio of probability that event will occur to probability that it will not
- e.g., odds of continuation for one-parent families are .69 (.4082 / .5918)
- Can range from 0 to positive infinity
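The probability-to-odds conversion can be sketched in a few lines of Python. The function name `odds_from_probability` is an illustrative choice, not something from the lecture; the input value is the one-parent probability quoted from Table 2.1.

```python
def odds_from_probability(p: float) -> float:
    """Return the odds p / (1 - p) for a probability in [0, 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


# Probability that one-parent families plan to continue (Table 2.1).
p_one_parent = 0.4082
odds_one_parent = odds_from_probability(p_one_parent)
print(f"{odds_one_parent:.2f}")  # 0.69
```

A probability of .50 gives odds of exactly 1, and the odds grow without bound as p approaches 1, which is why p = 1 is rejected here.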
11Probabilities and Odds
- Table 2.2
- Odds = 1
- Both outcomes equally likely
- Odds > 1
- Probability that event will occur greater than probability that it will not
- Odds < 1
- Probability that event will occur less than probability that it will not
12Odds Ratio (OR)
- Odds of the event for one value of the IV (two-parent families) divided by the odds for a different value of the IV, usually a value one unit lower (one-parent families)
- e.g., odds of continuing for two-parent families more than double the odds for one-parent families
- OR = 1.6455 / .6898 = 2.39
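As a rough check on the arithmetic, the odds ratio can be rebuilt from the two probabilities in Table 2.1. This is only a sketch: the helper `odds` and the variable names are illustrative.

```python
def odds(p: float) -> float:
    """Odds of an event with probability p (0 <= p < 1)."""
    return p / (1.0 - p)


p_two_parent = 0.6220  # Table 2.1
p_one_parent = 0.4082  # Table 2.1

odds_two_parent = odds(p_two_parent)
odds_one_parent = odds(p_one_parent)
odds_ratio = odds_two_parent / odds_one_parent

print(f"{odds_two_parent:.4f} / {odds_one_parent:.4f} = {odds_ratio:.2f}")
# 1.6455 / 0.6898 = 2.39
```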
13OR (cont'd)
- Plays a central role in quantifying the strength and direction of relationships between IVs and DVs in binary, multinomial, and ordinal logistic regression
- OR < 1 indicates a negative relationship
- OR > 1 indicates a positive relationship
- OR = 1 indicates no linear relationship
14ORs > 1
- e.g., OR of 2.39
- A one-unit increase in the independent variable increases the odds of continuing by a factor of 2.39
- The odds of continuing are 2.39 times higher for two-parent compared to one-parent families
15ORs < 1
- e.g., OR = .50
- A one-unit increase in the independent variable decreases the odds of continuing by a factor of .50
- The odds that two-parent families will continue are .50 (or one-half) of the odds that one-parent families will continue
16ORs < 1 (cont'd)
- Compute reciprocal (i.e., 1 / .50 = 2.00)
- Express relationship as opposite event of interest (e.g., discontinuing)
- A one-unit increase in the independent variable increases the odds of discontinuing by a factor of 2.00
- The odds that two-parent families will discontinue are 2.00 times (or twice) the odds of one-parent families
17OR to Percentage Change
- % change = 100(OR - 1)
- Alternative way to express OR
- e.g., A one-unit increase in the independent variable increases the odds of continuing by 139.00%
- 100(2.39 - 1) = 139.00
- e.g., A one-unit increase in the independent variable decreases the odds of continuing by 50.00%
- 100(.50 - 1) = -50.00
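The percentage-change formula is simple enough to sketch directly. The function name `percent_change_in_odds` is hypothetical; the two ORs are the ones used in the examples above.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in the odds for a one-unit increase in the IV."""
    return 100.0 * (odds_ratio - 1.0)


for odds_ratio in (2.39, 0.50):
    change = percent_change_in_odds(odds_ratio)
    print(f"OR = {odds_ratio:.2f}: {change:+.2f}%")
# OR = 2.39: +139.00%
# OR = 0.50: -50.00%
```

A positive result corresponds to an OR above 1 and a negative result to an OR below 1.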
18Comparing OR > 1 and OR < 1
- Compute reciprocal of one of the ORs
- e.g., OR of 2.00 and an OR of .50
- Reciprocal of .50 is 2.00 (1 / .50 = 2.00)
- ORs are equal in size (but not in direction of the relationship)
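One way to put ORs on either side of 1 on a common footing is to take the reciprocal of any OR below 1 before comparing. The sketch below assumes that convention; `comparable_size` is an illustrative name.

```python
def comparable_size(odds_ratio: float) -> float:
    """Express an OR as a value >= 1 by taking the reciprocal when it is below 1."""
    if odds_ratio <= 0.0:
        raise ValueError("an odds ratio must be positive")
    return odds_ratio if odds_ratio >= 1.0 else 1.0 / odds_ratio


print(comparable_size(2.00), comparable_size(0.50))  # 2.0 2.0
```

The direction of the relationship is lost in this comparison, so it still has to be reported separately.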
19Qualitative Descriptors for OR
- Table 2.3
- Use cautiously with IVs that aren't dichotomous
20Question & Answer
- Are two-parent families more likely to continue fostering than one-parent families?
- Yes. The odds of continuing are 2.39 times (139%) higher for two-parent compared to one-parent families. The probability of continuing is .41 for one-parent families and .62 for two-parent families.
21Binary Logistic Regression Example
- DV = continue fostering, 0 = no, 1 = yes
- Customary to code category of interest 1 and the other category 0
- IV = married, 0 = not married, 1 = married
- N = 131 foster families
- Are two-parent families more likely to continue fostering than one-parent families?
22Statistical Significance
- Table 2.4
- Relationship between marital status and continuation is statistically significant (Wald χ² = 5.544, p = .019)
23Direction of Relationship
- B = slope
- Positive slope, positive relationship
- OR > 1
- Negative slope, negative relationship
- OR < 1
- 0 slope, no linear relationship
- OR = 1
24Direction/Strength of Relationship
- Positive relationship between marital status and continuation
- Two-parent families more likely to continue
- B = .869
- Exp(B) = OR = 2.385
- % change = 100(2.385 - 1) = 139
- The odds of continuing are 2.39 times (139%) higher for two-parent compared to one-parent families
25Roadmap to Computations
26Binary Logistic Regression Model
- $\ln\left(p / (1 - p)\right) = a + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$, or
- $\ln\left(p / (1 - p)\right) = \eta$
- p is the probability of the event
- η (eta) is the abbreviation for the linear predictor (right hand side of this equation)
- k = number of independent variables
27Logit Link
- ln(p / (1 - p))
- Log of the odds that the DV equals 1 (event occurs)
- Connects (i.e., links) DV to linear combination of IVs
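The logit link and its inverse can be sketched with the standard library alone. The names `logit` and `inverse_logit` are illustrative; the inverse, p = 1 / (1 + e^(-η)), is the standard algebraic rearrangement of ln(p / (1 - p)) = η.

```python
import math


def logit(p: float) -> float:
    """Log of the odds, ln(p / (1 - p)), for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must satisfy 0 < p < 1")
    return math.log(p / (1.0 - p))


def inverse_logit(eta: float) -> float:
    """Map a linear predictor value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


print(logit(0.5))          # 0.0: even odds correspond to a logit of zero
print(inverse_logit(0.0))  # 0.5
```

Because the two functions are inverses, applying one after the other returns the starting value (up to floating-point error).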
28Estimated Logits (L)
- $\ln\left(p / (1 - p)\right) = a + B_1 X_1 + B_2 X_2 + \cdots + B_k X_k$
- ln(p / (1 - p))
- Log of the odds that the DV equals 1 (event occurs)
- Estimated logit, L
- Does not have intuitive or substantive meaning
- Useful for examining curvilinear relationships and interaction effects
- Primarily useful for estimating probabilities, odds, and ORs
29Estimated Logits (L)
- $L(\text{Continue}) = a + B_{\text{Married}} X_{\text{Married}}$
- $L(\text{Continue}) = -.372 + (.869)(X_{\text{Married}})$
- a = intercept
- B = slope
30Logit to Odds
- If L = 0
- Odds = $e^{L} = e^{0} = 1.00$
- If L = .50
- Odds = $e^{L} = e^{.50} = 1.65$
- If L = 1.00
- Odds = $e^{L} = e^{1.00} = 2.72$
31Logits to Odds (cont'd)
- Table 2.4
- One-parent families
- L(Continue) = -.372 = -.372 + (.869)(0)
- Odds of continuing = $e^{-.372}$ = .69
- Two-parent families
- L(Continue) = .497 = -.372 + (.869)(1)
- Odds of continuing = $e^{.497}$ = 1.65
32Odds to OR
- OR = 1.65 / .69 = 2.39, or
- $e^{.869} = 2.39$, labeled Exp(B)
- Table 2.4
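The two routes to the OR (a ratio of odds, or the exponentiated slope) can be compared in a short sketch. It uses the rounded coefficients from Table 2.4, so results differ slightly in the last digit from values computed from unrounded estimates. The function and variable names are illustrative.

```python
import math

INTERCEPT = -0.372  # a, Table 2.4
SLOPE = 0.869       # B for married, Table 2.4


def estimated_logit(married: int) -> float:
    """Estimated logit of continuing for married = 0 or 1."""
    return INTERCEPT + SLOPE * married


odds_one_parent = math.exp(estimated_logit(0))  # about .69
odds_two_parent = math.exp(estimated_logit(1))  # about 1.64 from rounded inputs

or_from_odds = odds_two_parent / odds_one_parent
or_from_slope = math.exp(SLOPE)

print(f"{or_from_odds:.3f} {or_from_slope:.3f}")
```

Both routes give the same number because e^(a + B) / e^a = e^B, which is why Exp(B) can be read directly as the OR.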
33OR to Percentage Change
- % change = 100(OR - 1)
- e.g., A one-unit increase in the independent variable increases the odds of continuing by 139.00%
- 100(2.39 - 1) = 139.00
- e.g., A one-unit increase in the independent variable decreases the odds of continuing by 50.00%
- 100(.50 - 1) = -50.00
34Logits to Probabilities
- One-parent families, L(Continue) = -.372
- Two-parent families, L(Continue) = .497
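The conversion from these logits to probabilities can be sketched as follows, assuming the standard relationship p = odds / (1 + odds) with odds = e^L. The function name is illustrative.

```python
import math


def probability_from_logit(logit: float) -> float:
    """Convert an estimated logit to a probability via the odds."""
    odds = math.exp(logit)
    return odds / (1.0 + odds)


p_one_parent = probability_from_logit(-0.372)
p_two_parent = probability_from_logit(0.497)

print(f"{p_one_parent:.2f} {p_two_parent:.2f}")  # 0.41 0.62
```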
35Question & Answer
- Are two-parent families more likely to continue fostering than one-parent families?
- Yes. The odds of continuing are 2.39 times (139%) higher for two-parent compared to one-parent families. The probability of continuing is .41 for one-parent families and .62 for two-parent families.
36Single (Quantitative) IV Example
- DV = continue fostering, 0 = no, 1 = yes
- Customary to code category of interest 1 and other category 0
- IV = number of resources
- N = 131 foster families
- Are foster families with more resources more likely to continue fostering?
37Statistical Significance
- Table 2.5
- Relationship between resources and continuation is statistically significant (Wald χ² = 4.924, p = .026)
- H0: β = 0, β ≤ 0, β ≥ 0, same as
- H0: OR = 1, OR ≤ 1, OR ≥ 1
- Likelihood ratio χ² better than Wald
38Direction/Strength of Relationship
- Positive relationship between resources and continuation
- Families with more resources are more likely to continue
- B = .212
- Exp(B) = OR = 1.237
- % change = 100(1.237 - 1) = 24
- The odds of continuing are 1.24 times (24%) higher for each additional resource
39Estimated Logits
- $L(\text{Continue}) = -1.227 + (.212)(X)$
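To see what this equation implies at particular values of X, the estimated logit can be converted to a probability. The sketch below evaluates it at 2, 6, and 10 resources; these values and the function names are illustrative choices, and the coefficients are the rounded ones shown above.

```python
import math

INTERCEPT = -1.227  # a
SLOPE = 0.212       # B for number of resources


def estimated_logit(resources: float) -> float:
    return INTERCEPT + SLOPE * resources


def estimated_probability(resources: float) -> float:
    """Probability of continuing implied by the estimated logit."""
    return 1.0 / (1.0 + math.exp(-estimated_logit(resources)))


for resources in (2, 6, 10):
    print(f"{resources:2d} resources: p = {estimated_probability(resources):.2f}")
#  2 resources: p = 0.31
#  6 resources: p = 0.51
# 10 resources: p = 0.71
```

Equal steps in X produce equal steps in the logit, but not in the probability.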
40Figures
41Effect of Resources on Continuation (Logits)
42Effect of Resources on Continuation (Odds)
43Effect of Resources on Continuation (Probabilities)
44Question & Answer
- Are foster families with more resources more likely to continue fostering?
- Yes. The odds of continuing are 1.24 times (24%) higher for each additional resource. The probability of continuing is .31 for families with 2 resources, .51 for families with 6 resources, and .71 for families with 10 resources.
45Relationship of Linear Predictor to Logits, Odds & p
- Relationship between linear predictor and logits is linear
- Relationship between linear predictor and odds is non-linear
- Relationship between linear predictor and p is non-linear
- Challenge is to summarize changes in odds and probabilities associated with changes in IVs in the most meaningful and parsimonious way
46Logit as Function of Linear Predictor
47Odds as Function of Linear Predictor
48Probabilities as Function of Linear Predictor
49IVs to z-scores
- z-scores (standard scores)
- Only the IV (not DV)--semi-standardized slopes
- One-unit increase in the IV refers to a one-standard-deviation increase
- OR interpreted as expected change in the odds associated with a one standard deviation increase in the IV
- Conversion to z-scores changes intercept, slope, and OR, but not associated test statistics
- Table 2.6 (compare to Table 2.5)
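The effect of z-scoring on the slope and OR can be approximated from the unstandardized results: multiplying the raw slope by the IV's standard deviation gives the slope per standard deviation. The sketch below assumes the raw slope of .212 and the standard deviation of 1.93 reported elsewhere in this lecture; because both are rounded, the results may differ slightly from Table 2.6.

```python
import math

SLOPE_RAW = 0.212    # B per additional resource
SD_RESOURCES = 1.93  # standard deviation of resources

# A one-unit change in z equals a change of one SD in raw resources.
slope_z = SLOPE_RAW * SD_RESOURCES
or_raw = math.exp(SLOPE_RAW)
or_z = math.exp(slope_z)

print(f"slope per SD = {slope_z:.3f}, OR per SD = {or_z:.2f}")
# slope per SD = 0.409, OR per SD = 1.51
```

The OR per standard deviation equals the raw OR raised to the power of the standard deviation, so the two describe the same fitted relationship on different scales.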
50Figures
51Effect of zResources on Continuation (Probabilities)
52Question & Answer
- Are foster families with more resources more likely to continue fostering?
- Yes. The odds of continuing are 1.51 times (51%) higher for each one standard deviation (1.93) increase in resources. The probability of continuing is .34 for families with resources two standard deviations below the mean, .54 for families with the mean number of resources (6.60), and .73 for families with resources two standard deviations above the mean.
53IVs Centered
- Centering
- Typically center on mean
- Useful when testing interactions, curvilinear relationships, or when no meaningful 0 point (e.g., no family with 0 resources)
- Centering doesn't change slope, OR, or associated test statistics, but does change the intercept
- Table 2.7 (compare to Table 2.5)
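Why centering moves only the intercept can be shown with a little algebra: a + B·X = (a + B·mean) + B·(X - mean). The sketch below applies this to the rounded estimates a = -1.227 and B = .212 with a mean of 6.60 resources, all taken from elsewhere in this lecture; the derived intercept may differ slightly from Table 2.7 because of rounding.

```python
import math

INTERCEPT_RAW = -1.227
SLOPE = 0.212
MEAN_RESOURCES = 6.60

# The centered intercept is the estimated logit at the mean of the IV.
intercept_centered = INTERCEPT_RAW + SLOPE * MEAN_RESOURCES


def logit_raw(resources: float) -> float:
    return INTERCEPT_RAW + SLOPE * resources


def logit_centered(c_resources: float) -> float:
    return intercept_centered + SLOPE * c_resources


p_at_mean = 1.0 / (1.0 + math.exp(-intercept_centered))
print(f"centered intercept = {intercept_centered:.3f}, p at mean = {p_at_mean:.2f}")
# centered intercept = 0.172, p at mean = 0.54
```

With the centered IV, the intercept has a direct interpretation: it is the logit for a family with the mean number of resources.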
54Figures
55Effect of cResources on Continuation (Probabilities)
56Question & Answer
- Are foster families with more resources more likely to continue fostering?
- Yes. The odds of continuing are 1.24 times (24%) higher for each additional resource. The probability of continuing is .34 for families with 4 resources below the mean, .54 for families with the mean number of resources (6.60), and .74 for families with 4 resources above the mean.
57Multiple IV Example
- DV = continue fostering, 0 = no, 1 = yes
- Customary to code the category of interest as 1 and the other category as 0
- IV = married, 0 = not married, 1 = married
- IV = number of resources (z-scores)
- N = 131 foster families
- Are foster families with more resources more likely to continue fostering, controlling for marital status?
58Statistical Significance
- Table 2.12
- Relationship between set of IVs and continuation is statistically significant (χ² = 6.58, p = .037)
- H0: β1 = β2 = … = βk = 0, same as
- H0: ψ1 = ψ2 = … = ψk = 1
- ψ (psi) is symbol for population value of OR
59Statistical Significance (cont'd)
- Table 2.13
- Relationship between resources and continuation is not statistically significant, controlling for marital status (χ² = .92, p = .338)
- Relationship between marital status and continuation is not statistically significant, controlling for resources (χ² = 1.42, p = .234)
- H0: β = 0, β ≤ 0, β ≥ 0, same as
- H0: ψ = 1, ψ ≤ 1, ψ ≥ 1
- ψ (psi) is symbol for population value of OR
- Likelihood ratio χ² better than Wald
60Statistical Significance (cont'd)
- Table 2.9
- Relationship between resources and continuation is not statistically significant, controlling for marital status (χ² = .91, p = .340)
- Relationship between marital status and continuation is not statistically significant, controlling for resources (χ² = 1.41, p = .235)
- H0: β = 0, β ≤ 0, β ≥ 0, same as
- H0: ψ = 1, ψ ≤ 1, ψ ≥ 1
- ψ (psi) is symbol for population value of OR
- Wald χ², but likelihood ratio χ² better
61Estimated Logits
- $L(\text{Continue}) = -.183 + (.228)(X_{\text{zResources}}) + (.570)(X_{\text{Married}})$
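With two IVs, the same logit-to-probability conversion applies once values are chosen for both predictors. The sketch below evaluates the equation for one-parent and two-parent families at resources two standard deviations below the mean, at the mean, and two standard deviations above it; the function names are illustrative and the coefficients are the rounded ones shown above.

```python
import math

INTERCEPT = -0.183
SLOPE_Z_RESOURCES = 0.228
SLOPE_MARRIED = 0.570


def estimated_logit(z_resources: float, married: int) -> float:
    return INTERCEPT + SLOPE_Z_RESOURCES * z_resources + SLOPE_MARRIED * married


def estimated_probability(z_resources: float, married: int) -> float:
    return 1.0 / (1.0 + math.exp(-estimated_logit(z_resources, married)))


for married, label in ((0, "one-parent"), (1, "two-parent")):
    probs = [estimated_probability(z, married) for z in (-2, 0, 2)]
    print(label, " ".join(f"{p:.2f}" for p in probs))
# one-parent 0.35 0.45 0.57
# two-parent 0.48 0.60 0.70
```

Holding one IV fixed while varying the other is what "controlling for" means when the results are presented as probabilities.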
62ORs & Percentage Change
- $OR_{\text{zResources}}$ = 1.256 (ns)
- The odds of continuing are 1.26 times (26%) higher for each one standard deviation (1.93) increase in resources, controlling for marital status
- $OR_{\text{Married}}$ = 1.769 (ns)
- The odds of continuing are 1.77 times (77%) higher for two-parent compared to one-parent families, controlling for resources
63Figures
64Effect of Resources and Marital Status on Plans to Continue Fostering (Odds)
65Effect of Resources and Marital Status on Plans to Continue Fostering (Probabilities)
66Presenting Odds and Probabilities in Tables
67Question & Answer
- Are foster families with more resources more likely to continue fostering, controlling for marital status?
- No (ns). The odds of continuing are 1.26 times (26%) higher for each one standard deviation (1.93) increase in resources, controlling for marital status.
- (cont'd)
68Question & Answer (cont'd)
- For one-parent families the probability of
continuing is .35 for families with resources two
standard deviations below the mean, .45 for
families with the mean number of resources, and
.57 for families with resources two standard
deviations above the mean. For two-parent
families the probability of continuing is .48 for
families with resources two standard deviations
below the mean, .60 for families with the mean
number of resources, and .70 for families with
resources two standard deviations above the mean.
69Comparing the Relative Strength of IVs
- Size of slope and OR depend on how the IV is measured
- When IVs measured the same way (e.g., two dichotomous IVs or two continuous IVs transformed to z-scores) relative strength can be compared
- Nothing comparable to standardized slope (Beta)
70Nested Models
71Nested Models (cont'd)
- One regression model is nested within another if it contains a subset of variables included in the model within which it's nested, and same cases are analyzed in both models
- The more complex model called the full model
- The nested model called the reduced model
- Comparison of full and reduced models allows you to examine whether one or more variable(s) in the full model contribute to explanation of the DV
72Sequential Entry of IVs
- Used to compare full and reduced models
- e.g., family resources entered first, and then marital status
- F change used in linear regression
73Sequential Entry of IVs (cont'd)
- SPSS GZLM doesn't allow sequential entry of IVs
- Estimate models separately and compare omnibus likelihood ratio χ² values
- Reduced model χ²(1) = 5.168
- Full model χ²(2) = 6.585
- χ² difference = 6.585 - 5.168 = 1.417
- df difference = 2 - 1 = 1
- p = .234
- Chi-square Difference.xls
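The chi-square difference test can also be reproduced outside a spreadsheet. The sketch below is not the contents of the .xls file; it is a minimal standard-library version that handles only a difference of 1 degree of freedom, where the upper-tail chi-square probability has the closed form erfc(sqrt(x / 2)). The function name is illustrative.

```python
import math


def chi2_upper_tail_df1(x: float) -> float:
    """P(chi-square with 1 df >= x); valid for df = 1 only."""
    if x < 0.0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(x / 2.0))


chi2_reduced, df_reduced = 5.168, 1  # resources only
chi2_full, df_full = 6.585, 2        # resources + marital status

chi2_difference = chi2_full - chi2_reduced
df_difference = df_full - df_reduced
assert df_difference == 1  # the closed form above does not apply otherwise

p_value = chi2_upper_tail_df1(chi2_difference)
print(f"chi-square difference = {chi2_difference:.3f}, p = {p_value:.3f}")
# chi-square difference = 1.417, p = 0.234
```

For differences of more than 1 degree of freedom, a general chi-square distribution function from a statistics library would be needed instead.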
74Assumptions Necessary for Testing Hypotheses
- No assumptions unique to binary logistic
regression other than ones discussed in GZLM
lecture
75Model Evaluation
- Evaluate your model before you test hypotheses or interpret substantive results
- Outliers
- Analogs of R²
76Outliers
- Atypical cases
- Can lead to flawed conclusions
- Can provide theoretical insights
- Common causes
- Data entry errors
- Model misspecification
- Rare events
77Outliers (cont'd)
- Leverage
- Residuals
- Standardized or unstandardized deviance residuals
- Influence
- Cook's D
78Leverage
- Think of a seesaw
- Leverage value for each case
- Cases with greater leverage can exert a disproportionately large influence
- No clear benchmarks
- Identify cases with substantially different leverage values than those of other cases
79Residuals
- Difference between actual and estimated values of the DV for a case
- Residual for each case
- Large residual indicates a case for which model fits poorly
80Residuals (cont'd)
- Standardized or unstandardized deviance residuals
- Not normally distributed
- Values less than -2 or greater than 2 warrant some concern
- Values less than -3 or greater than 3 merit close inspection
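A sketch of how these cutoffs might be applied follows. It computes the unstandardized deviance residual for a 0/1 outcome using the usual definition, sign(y - p) · sqrt(-2[y ln p + (1 - y) ln(1 - p)]); the standardized version additionally divides by sqrt(1 - leverage), which is not done here. The three cases are made-up values for illustration, not cases from the foster family data.

```python
import math


def deviance_residual(y: int, p_hat: float) -> float:
    """Unstandardized deviance residual for a 0/1 outcome and fitted probability."""
    if y not in (0, 1) or not 0.0 < p_hat < 1.0:
        raise ValueError("y must be 0 or 1 and p_hat strictly between 0 and 1")
    contribution = -2.0 * (y * math.log(p_hat) + (1 - y) * math.log(1.0 - p_hat))
    sign = 1.0 if y == 1 else -1.0
    return sign * math.sqrt(contribution)


def flag(residual: float) -> str:
    """Apply the +/-2 and +/-3 rules of thumb."""
    size = abs(residual)
    if size > 3.0:
        return "close inspection"
    if size > 2.0:
        return "some concern"
    return "ok"


# Hypothetical (outcome, fitted probability) pairs.
cases = [(0, 0.41), (1, 0.05), (1, 0.005)]
for y, p_hat in cases:
    d = deviance_residual(y, p_hat)
    print(f"y = {y}, p_hat = {p_hat}: residual = {d:+.2f} ({flag(d)})")
```

Large residuals arise when the event occurred despite a very low fitted probability, or failed to occur despite a very high one.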
81Influence
- Cases whose deletion results in substantial changes to regression coefficients
- Cook's D for each case
- Approximate aggregate change in regression parameters resulting from deletion of a case
- Values of 1.0 or more indicate a problematic degree of influence for an individual case
82Index Plot
- Scatterplot
- Horizontal axis (X)
- Case id
- Vertical axis (Y)
- Leverage values, or
- Residuals, or
- Cook's D
83Index Plot Leverage Values
84Index Plot Standardized Deviance Residuals
85Index Plot Cook's D
86Analogs of R²
- None in standard use and each may give different results
- Typically much smaller than R² values in linear regression
- Difficult to interpret
87Multicollinearity
- SPSS GZLM doesn't compute multicollinearity statistics
- Use SPSS linear regression
- Problematic levels
- Tolerance < .10 or
- VIF > 10
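The two benchmarks are two views of the same quantity. Tolerance for an IV is 1 - R², where R² comes from regressing that IV on the other IVs, and VIF is the reciprocal of tolerance, so tolerance below .10 and VIF above 10 flag the same cases. The sketch below illustrates this with made-up R² values; the function names are illustrative.

```python
def tolerance(r_squared: float) -> float:
    """Tolerance = 1 - R^2 from regressing one IV on the remaining IVs."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must satisfy 0 <= R^2 < 1")
    return 1.0 - r_squared


def vif(r_squared: float) -> float:
    """Variance inflation factor, the reciprocal of tolerance."""
    return 1.0 / tolerance(r_squared)


def is_problematic(r_squared: float) -> bool:
    return tolerance(r_squared) < 0.10 or vif(r_squared) > 10.0


# Hypothetical R^2 values for an IV regressed on the other IVs.
for r_squared in (0.25, 0.95):
    print(f"R^2 = {r_squared}: tolerance = {tolerance(r_squared):.2f}, "
          f"VIF = {vif(r_squared):.2f}, problematic = {is_problematic(r_squared)}")
```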
88Additional Topics
- Polychotomous IVs
- Curvilinear relationships
- Interactions
89Overview of the Process
- Select IVs and decide whether to test curvilinear relationships or interactions
- Carefully screen and clean data
- Transform and code variables as needed
- Estimate regression model
- Examine assumptions necessary to estimate binary regression model, examine model fit, and revise model as needed
90Overview of the Process (cont'd)
- Test hypotheses about the overall model and specific model parameters, such as ORs
- Create tables and graphs to present results in the most meaningful and parsimonious way
- Interpret results of the estimated model in terms of logits, probabilities, odds, and odds ratios, as appropriate
91Additional Regression Models for Dichotomous DVs
- Binary probit regression
- Substantive results essentially indistinguishable from binary logistic regression
- Choice between this and binary logistic regression largely one of convenience and discipline-specific convention
- Many researchers prefer binary logistic regression because it provides odds ratios whereas probit regression does not, and binary logistic regression comes with a wider variety of fit statistics
92Additional Regression Models for Dichotomous DVs (cont'd)
- Complementary log-log (clog-log) and log-log models
- Probability of the event is very small or large
- Loglinear regression
- Limited to categorical IVs
- Discriminant analysis
- Limited to continuous IVs