Logistic regression - PowerPoint PPT Presentation

Provided by: johnrott

Transcript and Presenter's Notes

1
Logistic regression
  • V506 Class 14
  • December 3, 2009

2
Overview
  • Categorical, binary dependent variable
  • Logistic regression model
  • Simple logistic regression
  • Multiple logistic regression
  • SPSS logistic regression

3
Regression with a categorical, binary dependent
variable
  • Dependent variable that takes on value of 0 or 1,
    e.g., failure or success, no or yes
  • Independent variables interval scale variables
    (and possibly dummy variables)

4
Regression model to predict probability of success
  • While the dependent variable is 0 or 1
  • Regression model would predict probability of
    success p with values between 0 and 1

5
Linear probability model
  • Obvious possibility is to use traditional linear
    regression model
  • But this has problems
  • Distribution of dependent variable hardly normal
  • Predicted values can be less than 0 or
    greater than 1, which probabilities cannot be
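
The out-of-range problem can be seen in a minimal sketch. The data below are hypothetical (not the shuttle data), and NumPy's least-squares line fit stands in for the linear regression procedure:

```python
import numpy as np

# Hypothetical data: a binary outcome that becomes more likely as x grows.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)

# Ordinary least-squares line y = a + b*x (the linear probability model).
b, a = np.polyfit(x, y, 1)

# Predict over a range somewhat wider than the observed x values.
x_new = np.array([-2.0, 0.0, 5.0, 10.0, 14.0])
p_hat = a + b * x_new

for xi, pi in zip(x_new, p_hat):
    print(f"x = {xi:5.1f}  predicted 'probability' = {pi:6.3f}")
```

With these made-up values, the fitted line gives predictions below 0 at the low end and above 1 at the high end, which cannot be interpreted as probabilities.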

6
Linear probability model predictions
7
Logistic regression model
  • Instead, use logistic transformation (logit) of
    probability, log of the odds
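
The equation on this slide did not survive in the transcript. The standard form of the logit model with one independent variable is given below as a sketch; the symbols B_0 (constant), B_1 (coefficient), and X (independent variable) are assumed notation, chosen to match the B used for coefficients later on:

```latex
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right) = B_0 + B_1 X
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + e^{-(B_0 + B_1 X)}}
```

The right-hand form shows why predictions stay between 0 and 1 for any value of X.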

8
Logistic regression model predictions
9
Estimation of logistic regression model
  • Least-squares no longer best way of estimating
    parameters of logistic regression model
  • Instead use maximum likelihood estimation
  • Finds values of parameters that give the observed
    data the greatest probability (likelihood)
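
A minimal sketch of maximum likelihood estimation follows. It uses Newton-Raphson steps, one common approach; the slides do not say which algorithm SPSS uses. The data and the function names (`log_likelihood`, `fit_logistic`) are hypothetical:

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Log likelihood of the 0/1 outcomes y for coefficients beta."""
    z = X @ beta
    # log(p) = z - log(1 + e^z) and log(1 - p) = -log(1 + e^z)
    return float(np.sum(y * z - np.logaddexp(0.0, z)))

def fit_logistic(X, y, n_iter=25):
    """Maximize the log likelihood with Newton-Raphson steps."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        gradient = X.T @ (y - p)
        # Information matrix (negative Hessian of the log likelihood).
        information = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(information, gradient)
    return beta

# Hypothetical data (not the shuttle data): constant column plus one predictor.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])

beta_hat = fit_logistic(X, y)

# Constant-only model: the constant is the logit of the overall success rate.
beta_null = np.array([np.log(y.mean() / (1.0 - y.mean())), 0.0])
ll_null = log_likelihood(beta_null, X, y)
ll_model = log_likelihood(beta_hat, X, y)

print("coefficients:", beta_hat)
print("log likelihood, constant only:", ll_null)
print("log likelihood, fitted model: ", ll_model)
```

The fitted coefficients give the data a higher log likelihood than the constant-only model, which is the sense in which they are the "best" values.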

10
Logistic regression example
  • Analysis of space shuttle data to see if that
    might have provided evidence of risk in launching
    Challenger on its last flight in 1986
  • Based on
  • Tufte, Visual Explanations, 1997
  • Hamilton, Statistics with Stata, 2006

11
Space shuttle data
  • Data on 24 space shuttle launches prior to
    Challenger
  • Dependent variable, whether shuttle flight
    experienced thermal distress incident
  • Independent variables
  • Date: whether shuttle changes or age has effect
  • Temperature: whether joint temperature on
    booster has effect

12
First model: date as single independent variable
  • Dependent variable
  • Any thermal distress on launch
  • Independent variable
  • Date (days since 1/1/60)
  • SPSS procedure
  • Regression, Binary logistic

13
Predicted probability of thermal distress using
date
14
Statistical significance of model
  • Chi-squared test calculated using -2 times log
    likelihood
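
As a sketch of how this test works: the chi-squared statistic is the drop in -2 log likelihood from the model without the independent variable to the model with it. The -2 log likelihood values below are hypothetical, not the shuttle results, and the function handles only the case of one added variable (1 degree of freedom):

```python
import math

def likelihood_ratio_test_1df(neg2ll_null, neg2ll_model):
    """Chi-squared statistic and p-value for adding ONE predictor (1 df)."""
    chi_sq = neg2ll_null - neg2ll_model
    # For 1 degree of freedom, P(chi-squared > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value

# Hypothetical -2 log likelihood values, not taken from the SPSS output.
chi_sq, p_value = likelihood_ratio_test_1df(neg2ll_null=30.0, neg2ll_model=26.16)
print(f"chi-squared = {chi_sq:.2f}, p = {p_value:.3f}")
```

A larger drop in -2 log likelihood gives a larger chi-squared value and a smaller p-value for the model as a whole.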

15
Predictive power (fit) of model: pseudo R2s
  • No exact equivalent of R2 for linear regression
    model
  • Different estimates of pseudo R2 range from 0
    (no fit) to 1 (perfect prediction)
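
A minimal sketch of three widely used pseudo R2 measures, computed from the log likelihoods of the constant-only and fitted models. The slides do not name the measures; Cox & Snell and Nagelkerke are the two SPSS usually prints, and McFadden is included for comparison. The input values are hypothetical:

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    """Three common pseudo R-squared measures from log likelihoods."""
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox & Snell so that its maximum is 1.
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return {"McFadden": mcfadden, "Cox & Snell": cox_snell, "Nagelkerke": nagelkerke}

# Hypothetical log likelihoods for n = 23 cases.
results = pseudo_r2(ll_null=-15.0, ll_model=-12.0, n=23)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Note that Cox & Snell cannot actually reach 1, which is the reason for Nagelkerke's rescaling.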

16
Predictive power (fit) of model: classification
improvement
  • Classification tables: without using date, and
    using model with date
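
The comparison behind a classification table can be sketched as follows. The outcomes and predicted probabilities are hypothetical, and the 0.5 cutoff is an assumption (it is the usual default):

```python
def accuracy(actual, predicted):
    """Share of cases whose predicted class matches the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical outcomes and model-predicted probabilities.
actual = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
probabilities = [0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 0.3, 0.7, 0.8, 0.9]

# Without the model: predict the more common outcome for every case.
majority = 1 if sum(actual) * 2 > len(actual) else 0
baseline = accuracy(actual, [majority] * len(actual))

# With the model: predict 1 when the probability is at least 0.5.
model = accuracy(actual, [1 if p >= 0.5 else 0 for p in probabilities])

print(f"accuracy without model: {baseline:.1%}")
print(f"accuracy with model:    {model:.1%}")
```

The improvement of the second figure over the first is the classification improvement attributed to the model.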

17
Logistic regression coefficients and tests of
significance
  • Probability positively related to date
  • Logistic regression coefficient for date has
    p-value of 0.051
  • Wald test not as powerful as chi-squared test
    based upon -2 Log Likelihood

18
Interpreting logistic regression coefficient
  • Regression coefficient for date, B = 0.002, is
    the change in the logit of the probability
    associated with a change of one day
  • Unfortunately, this does not have a very
    intuitive meaning
  • Instead, look at exponential of B, exp(B), which
    is the factor by which the odds are multiplied
    when the independent variable increases by 1

19
Exponential of B as change in odds
20
What does odds mean?
  • Odds is the ratio of probability of success to
    probability of failure
  • Like odds on horse races
  • Even odds, odds = 1, implies probability equals
    0.5
  • Odds = 2 means 2 to 1 in favor of success,
    implies probability of 0.667
  • Odds = 0.5 means 1 to 2 in favor (or 2 to 1
    against) success, implies probability of 0.333
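
The conversions between odds and probability can be sketched in a few lines. The function names are illustrative only:

```python
def odds_from_probability(p):
    """Odds = probability of success / probability of failure."""
    return p / (1.0 - p)

def probability_from_odds(odds):
    """Inverse conversion: probability = odds / (1 + odds)."""
    return odds / (1.0 + odds)

for odds in (1.0, 2.0, 0.5):
    print(f"odds {odds} -> probability {probability_from_odds(odds):.3f}")
```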

21
Interpreting exponential of B as change in odds
  • SPSS reports Exp(B) = 1.002 for date
  • This means that the odds of a thermal incident
    are 1.002 times as great for each day later in the
    program that the shuttle is launched
  • Doesn't sound like much, but 1.002^365 = 2.074, so
    odds of thermal incident are twice as great for
    each year later that the shuttle is launched
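
The per-year figure comes from compounding the daily factor, since each additional day multiplies the odds by Exp(B) again. A minimal sketch, with an illustrative function name:

```python
def odds_multiplier(exp_b, change):
    """Factor by which the odds change when the variable changes by `change` units."""
    return exp_b ** change

per_day = odds_multiplier(1.002, 1)
per_year = odds_multiplier(1.002, 365)
print(f"per day: {per_day:.3f}, per year: {per_year:.3f}")
```

Because Exp(B) is reported rounded to three decimals, the yearly factor is only approximate; small changes in the unrounded coefficient would shift it noticeably.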

22
Multiple logistic regression
  • Logistic regression can be extended to use
    multiple independent variables exactly like
    linear regression

23
Adding joint temperature to the logistic
regression model
  • Dependent variable
  • Any thermal distress on launch
  • Independent variables
  • Date (days since 1/1/60)
  • Joint temperature, degrees F

24
Overall model results
25
Classification improvement
  • Classification accuracy increased from 73.9
    percent to 78.3 percent over previous model

26
Logistic regression coefficients
27
Interpreting logistic regression coefficients
  • date has p-value of 0.030 and similar Exp(B)
  • Logistic regression coefficient for temp is
    negative
  • As temperature decreases, probability of thermal
    distress increases
  • p-value for temperature is 0.140
  • Exp(B) for temperature is 0.841
  • 20 degree decrease in temperature therefore
    implies 0.841^-20 = 31.92-fold increase in odds of
    thermal distress
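
To see what a change in odds of this size can mean for a probability, the sketch below applies the 20-degree multiplier to a baseline probability. The baseline of 0.10 is purely hypothetical and is not taken from the shuttle model:

```python
exp_b_temp = 0.841            # Exp(B) for temperature, from the slide
multiplier = exp_b_temp ** -20  # effect of a 20 degree DEcrease on the odds

baseline_probability = 0.10   # hypothetical starting probability
baseline_odds = baseline_probability / (1.0 - baseline_probability)

new_odds = baseline_odds * multiplier
new_probability = new_odds / (1.0 + new_odds)

print(f"odds multiplier: {multiplier:.2f}")
print(f"probability: {baseline_probability:.2f} -> {new_probability:.2f}")
```

The multiplier applies to the odds, not the probability, so the resulting probability always stays below 1 however large the multiplier is.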

28
Issue of significance, error
  • p-value of 0.140 for temperature would not
    normally result in rejection of null hypothesis
    of no relationship of temperature to probability
    of failure, null hypothesis that B = 0 for joint
    temperature
  • But p-value addresses only Type I error, the error
    that arises when rejecting the null hypothesis
    when the null hypothesis is true

29
Type II error
  • Type II error is the counterpart of Type I error
  • Type II error is the error that arises in not
    rejecting the null hypothesis when it is false
  • In this case, Type II error is the error in
    failing to conclude that there is a relationship
    between joint temperature and thermal failures
    when there is a relationship

30
Type II error and Challenger
  • In this context, one is more concerned with Type
    II error, failing to conclude that there is a
    relationship of joint failure to temperature when
    that is true
  • Because failure to conclude this led to
    recommendation to launch Challenger under low
    temperature conditions

31
Logistic regression in SPSS
  • Data preparation
  • Dependent variable should have value of 1 or 0
  • SPSS will recode categorical variable with 2
    values, but better to do it yourself
  • Independent variables are interval, scale
    variables or dummy variables, as in linear
    regression
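
The recoding step is done in SPSS in the course, but the idea can be sketched in a few lines of Python. The records, field names, and category labels below are hypothetical:

```python
# Hypothetical raw records; the field names and categories are illustrative only.
records = [
    {"distress": "yes", "booster": "old"},
    {"distress": "no", "booster": "new"},
    {"distress": "no", "booster": "old"},
]

def recode_binary(value, success_label):
    """Return 1 for the category treated as success, 0 otherwise."""
    return 1 if value == success_label else 0

prepared = [
    {
        "distress": recode_binary(r["distress"], "yes"),     # dependent variable
        "booster_old": recode_binary(r["booster"], "old"),   # dummy variable
    }
    for r in records
]
print(prepared)
```

Doing the recode explicitly makes it clear which category is coded 1, and therefore which outcome the model's probabilities refer to.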

32
Logistic regression in SPSS
  • Analyze, Regression, Binary Logistic
  • Dependent, response variable entered into
    Dependent box
  • Independent variables (interval, scale variables
    and dummy variables) entered into Covariates box
  • Categorical button provides other options for
    handling categorical variables, but we won't deal
    with them here

33
SPSS logistic regression output
34
SPSS logistic regression output
  • Block 0: Beginning Block
  • Classification accuracy without independent
    variables or model
35
SPSS logistic regression output
  • Block 1: Method = Enter
  • p-value for significance of entire model
  • Pseudo-R2 values giving idea of fit of the model
36
SPSS logistic regression output
  • Classification accuracy for model
  • Can be compared with classification accuracy
    without model or for other models
37
SPSS logistic regression output
  • p-values for hypothesis tests of whether
    regression coefficients are not equal to 0
  • Logistic regression coefficients: positive,
    probability increases as variable increases;
    negative, probability decreases as variable
    increases
  • Exponential of regression coefficients, effects
    on odds: greater than 1 means odds increase as
    variable increases; less than 1 means odds
    decrease as variable increases
38
Predicting urban trail use in Indianapolis
  • John R. Ottensmann and Greg Lindsey. A use-based
    measure of accessibility to linear features to
    predict urban trail use. Journal of Transport and
    Land Use 1, 1 (2008) 41-63.
  • Survey of residents of Marion County
  • Questions on whether they used any of the
    greenway trails in previous month
  • Logistic regression to predict use, using
  • Individual characteristics from survey
  • Distance to nearest trail and more complex
    measures of trail accessibility

39
Logistic regressions for use of trails (odds
ratios; robust standard errors in parentheses)
40
Measure of use-based accessibility to linear
features
  • Sorry, I couldn't help myself!