Logistic Regression - PowerPoint PPT Presentation

Provided by: AJL2

1
Logistic Regression

Often, the spatial phenomenon under investigation
can only be described by a categorical variable.
Wild fires typically depicted with polygons
showing burned vs. not burned
Or, bird distribution indicating presence or
absence of birds
The previous regression technique is not suitable
because the dependent variable is neither
interval nor ratio
Logistic regression treats the distribution in a
probabilistic manner, that is, the occurrence of
the study phenomenon is evaluated in terms of
probability

2
Logistic Regression

If the probability of presence of a phenomenon is
Pa, then Pb represents the probability of absence
of the phenomenon and
Pa + Pb = 1
Ua = b0 + b1X1 + b2X2 + ... + bnXn + e
Ua is the utility function of event a expressed
as a linear combination of a number of
explanatory variables X1, X2, ..., Xn, bn is the
estimated parameter of variable Xn, and e is the
error term

3
Logistic Regression

A greater value of Ua implies a greater
probability for the event to take place. When Ua
approaches infinity, Pa approaches 1, indicating
a high likelihood for the event to occur. When
Ua approaches negative infinity, Pa approaches 0.
When Ua equals zero, the probability is .50,
implying a 50/50 chance for the event to occur.
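
The behaviour described on this slide matches the standard logistic function, Pa = 1 / (1 + exp(-Ua)). The slide does not write the formula out, so the sketch below assumes that form; the function name `probability` is illustrative only.

```python
import math

def probability(u_a: float) -> float:
    """Map a utility value Ua to a probability Pa (assumed logistic form)."""
    # Two branches keep exp() from overflowing for large |Ua|.
    if u_a >= 0:
        return 1.0 / (1.0 + math.exp(-u_a))
    z = math.exp(u_a)
    return z / (1.0 + z)

# Pa rises towards 1 as Ua grows, falls towards 0 as Ua shrinks,
# and equals 0.5 when Ua is zero.
for u in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"Ua = {u:6.1f}  ->  Pa = {probability(u):.4f}")
```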

4
Logistic Regression Example

Example from Chou
Fires in San Jacinto Ranger District of the San
Bernardino National Forest were examined to map
the distribution of fire occurrence probability.
The basic model consisted of eight independent
variables
Area, perimeter, vegetation, proximity to
buildings, proximity to campgrounds, proximity to
roads, maximum temperature in July, and annual
precipitation

5
Variables in Fire Distribution Study

X1 Area: area of geographic unit
X2 Perimeter: perimeter of geographic unit
X3 Vegetation: vegetation computed by rotation period
X4 Building: proximity to structures
X5 Campground: proximity to campgrounds
X6 Road: proximity to roads
X7 Temperature: maximum temperature in July
X8 Precipitation: annual precipitation
The dependent variable is a code indicating whether
or not a geographic unit is burned. Area
and perimeter provide general geometric
characteristics. Vegetation, precipitation, and
temperature represent environmental factors,
while building, campground, and road represent
human-related factors

6
Results of Logistic Regression

The model indicates that perimeter, vegetation,
campground, road, and temperature are variables
to be included in the model. Other variables are
not included because their coefficients are not
statistically different from 0

7
Results of Logistic Regression

Percentage-correctly-estimated (PCE) index shows
the maximum level of estimation accuracy of a
model.
In this example, PCE is 60%, not much better than
a random 50/50 chance.
Therefore, another parameter was evaluated
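
A minimal sketch of how a percentage-correctly-estimated figure could be computed. The slide does not define the index in detail, so this assumes a unit is predicted as burned when its estimated probability is at least 0.5; the function name `pce` and the ten observations are made up for illustration and are not data from the study.

```python
def pce(observed, predicted_prob, cutoff=0.5):
    """Percentage of units whose predicted class matches the observed class."""
    # A unit counts as a hit when "probability >= cutoff" agrees with
    # the observed burned (1) / not burned (0) code.
    hits = sum(
        1 for y, p in zip(observed, predicted_prob)
        if (p >= cutoff) == bool(y)
    )
    return 100.0 * hits / len(observed)

# Hypothetical observations and model probabilities for ten units.
observed = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
predicted = [0.8, 0.3, 0.4, 0.7, 0.6, 0.2, 0.9, 0.55, 0.1, 0.35]

print(pce(observed, predicted))
```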

8
Alternative Model

Included an additional variable to determine
whether it makes any significant difference in
model performance
New variable represents neighborhood effects, or
conditions of the surrounding geographic units
Assumes that fire occurrence probability is not
only affected by the environmental and
human-related variables listed in the basic
model, but by the distribution of fire occurrence
probability of adjacent units
The new spatial term X9 is defined by the
percentage of neighboring units that were burned
during the study period
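
The definition of X9 can be sketched as follows, assuming each geographic unit has a burned/not-burned code and a list of adjacent units. The unit names, the adjacency table, and the function name `neighborhood_effect` are hypothetical.

```python
def neighborhood_effect(burned, neighbors):
    """Percentage of each unit's neighbors that were burned (X9)."""
    x9 = {}
    for unit, adjacent in neighbors.items():
        if not adjacent:
            # A unit with no neighbors gets 0 here; this is an assumption.
            x9[unit] = 0.0
            continue
        n_burned = sum(burned[n] for n in adjacent)
        x9[unit] = 100.0 * n_burned / len(adjacent)
    return x9

# Hypothetical units: 1 = burned during the study period, 0 = not burned.
burned = {"A": 1, "B": 0, "C": 1, "D": 0}
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}

x9 = neighborhood_effect(burned, neighbors)
for unit in sorted(x9):
    print(unit, round(x9[unit], 1))
```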

9
New Results

Results from the new study are quite different
Only two variables are statistically significant:
vegetation and neighborhood effects
Vegetation appears to be the determining
environmental variable in the distribution of
wildfires in the study area
Finally, wildfires are influenced by neighborhood
conditions

10
Testing Statistical Significance

Did the neighborhood effects significantly change
the model? This requires a chi-square test of the
likelihood ratio of the two models, where L0
denotes the likelihood of the basic model and L1
denotes the likelihood of the study model
Statistical testing suggests that the
neighborhood variable significantly improved the
performance of the model
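
The test statistic itself is not reproduced in this transcript. A common form of the likelihood ratio test is -2 ln(L0 / L1), compared against a chi-square distribution with as many degrees of freedom as there are added variables. The sketch below assumes that form and one added variable (the neighborhood term); the log-likelihood values are hypothetical, not results from the study.

```python
import math

def likelihood_ratio_test(loglik_basic, loglik_study):
    """Return (statistic, p-value) for one added variable (1 degree of freedom)."""
    # -2 ln(L0 / L1) written in terms of log-likelihoods.
    stat = -2.0 * (loglik_basic - loglik_study)
    # Upper tail of a chi-square distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods for the basic model (L0) and study model (L1).
stat, p_value = likelihood_ratio_test(-120.0, -110.0)
print(f"statistic = {stat:.2f}, p-value = {p_value:.6f}")
```

A small p-value would indicate that the added variable improves the model by more than chance alone would explain.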

11
Procedure for Regression Analysis (Barber, p. 448)

Specify the variables in the model and the exact
form of the relationship between them
Collect data
Estimate the parameters of the model
Statistically test the utility of the developed
model, and check whether the assumptions of the
simple linear regression model are satisfied
Use the model for prediction

12
Example of Data Manipulation and Programming in
ArcView

Manipulating Yield Data with DataManipulation.ave

13
Spatial Prediction of Landslide Hazard Using
Logistic Regression and GIS

Art Lembo
620 Presentation
Based on paper by Gorsevski, Gessler, and Folz

14
Introduction

Landslides are natural geologic processes that
cause billions of dollars in damage and thousands
of deaths each year
95% of landslides occur in developing countries

15
Causes of Landslides

Human activities, such as deforestation and urban
expansion, accelerate the process of landslides
Roads and harvest activities in timberlands
increase the occurrence of landslides
In undisturbed forest, soil erosion is generally
negligible

16
Clearwater National Forest

1995-1996
Major landslides occurred during the winter
following heavy rains, snowmelt, and high river
flow
Over 900 landslides were recorded on the unstable
slopes of the forest
Landslide occurrence was widely distributed and
included artificial slopes such as road cuts and
fills, or natural slopes in clearcut areas

18
Landslide Data

Within the large remote area, a DEM was used to
generate quantitative topographic attributes
Slope, elevation, aspect, profile, curvature,
tangent curvature, plan curvature, flow path, and
contributing area
Photo interpretation and field inventory
identified landslide areas

20
Considerations in Creating Hazard Models

Datasets combined and stored in a GIS database
Hazard Model assumptions
Strength of a model depends on the quality of the
data collected
Data driven models are not appropriate to
extrapolate to neighboring areas
Climatic conditions may change so that the past
is not an indicator of the future
Uncertainty exists when a hazard map is derived
from a statistically based model

21
Models Used in Study

Logistic regression was used, which correlated
the environmental attributes and landslide
distribution
Because of the existence of uncertainty, a
Receiver Operating Characteristic (ROC) curve was
also used. It plots the proportion of true
positives against the proportion of false
positives at each level of the criterion
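
A minimal sketch of how the points of such a curve can be computed: for each candidate cut-off, count the true positives and false positives among cells scored at or above it. The labels, scores, and the function name `roc_points` are made up for illustration.

```python
def roc_points(labels, scores):
    """Return (cutoff, false positive rate, true positive rate) per cut-off."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    # Each distinct score is tried as a cut-off, from strictest to loosest.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 0)
        points.append((cutoff, fp / negatives, tp / positives))
    return points

# Hypothetical cells: 1 = landslide present, 0 = absent, with model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]

points = roc_points(labels, scores)
for cutoff, fpr, tpr in points:
    print(f"cutoff {cutoff:.1f}: FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```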

22
Assessing Landslide Hazard

Field inspection using a check list to identify
sites susceptible to landsliding
Projection of future patterns of instability from
analysis of landslide inventories
Multivariate analysis of factors characterizing
observed sites of slope instability
Stability ranking based on criteria such as
slope, land forms, or geologic structure
Failure probability analysis based on slope
stability models with stochastic hydrologic
simulation

23
Preparing the Data

Primary and secondary attributes are derived from
a DEM (30 m), reducing the high cost of collecting
the data
Landslides assessed through aerial reconnaissance
Landslide hazard areas are then identified based
on spatial correlation between the attributes
derived from the DEM
ROC curves used for decision making

24
Data Sampling

15% of non-landslide cells were randomly sampled
to represent the absence of landslides
Multivariate subset was derived from the
coverages where landslides were absent
The landslide coverage was a point data set used
to sample grid cells where landslides were present
Both samples were joined together where the
dependent variable had a binary response (present
or absent)
Final output stored in ASCII and used in SAS
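
The sampling steps above can be sketched as follows, assuming the "15" on the slide means 15 percent of the non-landslide cells. The cell records, the field name `landslide`, and the function name `build_sample` are hypothetical; the study itself stored the output in ASCII for use in SAS.

```python
import random

def build_sample(cells, fraction=0.15, seed=0):
    """Join all landslide cells with a random fraction of non-landslide cells."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    present = [c for c in cells if c["landslide"] == 1]
    absent = [c for c in cells if c["landslide"] == 0]
    n_absent = round(fraction * len(absent))
    sampled_absent = rng.sample(absent, n_absent)
    # The joined sample keeps the binary response (1 = present, 0 = absent).
    return present + sampled_absent

# Hypothetical grid: 200 cells, of which the first 20 contain a landslide.
cells = [{"id": i, "landslide": 1 if i < 20 else 0} for i in range(200)]

sample = build_sample(cells)
print(len(sample), sum(c["landslide"] for c in sample))
```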

25
Statistical Analysis

Normal plot of data to determine if the data
followed a normal distribution
Plot showed that data points do not fall along a
straight line. The data is not multivariate
normal
Logistic regression is used when the predictor
variables are not normally distributed, and
some predictor variables are categorical
Factor analysis was applied to determine the
number of underlying variables
Only significantly loaded variables were
considered

26
Statistical Analysis

The form of the logistic regression model is
defined as
P(y = 1 | x) = exp(B'x) / (1 + exp(B'x))
Where x is the data vector for a randomly
selected experimental unit and y is the value of
the binary outcome variable. Maximum likelihood
was used to estimate B for the predictive
equation
Variables not significant at the .1 level were
eliminated
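
A minimal sketch of maximum likelihood estimation for a logistic model, using plain gradient ascent on the log-likelihood. This is an illustration only, not the procedure run in SAS for the study; the data, the step size, the iteration count, and the names `fit_logistic` and `predict` are assumptions.

```python
import math

def predict(beta, x):
    """Probability that y = 1 for feature list x; beta[0] is the intercept."""
    u = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-u))

def fit_logistic(rows, labels, steps=5000, rate=0.1):
    """Estimate coefficients by gradient ascent on the log-likelihood."""
    beta = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, y in zip(rows, labels):
            err = y - predict(beta, x)  # gradient contribution of one unit
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        # Move each coefficient a small step along the mean gradient.
        beta = [b + rate * g / len(rows) for b, g in zip(beta, grad)]
    return beta

# Hypothetical one-variable data set with overlapping classes.
rows = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
labels = [0, 0, 1, 0, 1, 1]

beta = fit_logistic(rows, labels)
print([round(b, 3) for b in beta])
```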

27
Logit Results

Logit showed that the most important variables
contributing to the slope instability were Flow
Path and mean slope of upland area
log(p / (1 - p)) = -2.2642 + FACTOR8 * 0.4969 + FLPATH * 0.6039
or
p = exp(-2.2642 + FACTOR8 * 0.4969 + FLPATH * 0.6039) / (1 + exp(-2.2642 + FACTOR8 * 0.4969 + FLPATH * 0.6039))
p = probability of landslide hazard
FACTOR8 = factor with underlying characteristics of aspect
FLPATH = maximum distance of water to the point in the catchment
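
The fitted equation can be evaluated directly. The sketch below assumes the operators lost from the slide text were "=", "+" and "*", which agrees with the later statement that the coefficients are positive. The input values are hypothetical, since the slides do not give the units or scaling of FACTOR8 and FLPATH.

```python
import math

# Coefficients as printed on the slide.
INTERCEPT = -2.2642
B_FACTOR8 = 0.4969
B_FLPATH = 0.6039

def landslide_probability(factor8, flpath):
    """Probability of landslide hazard from the fitted logit equation."""
    logit = INTERCEPT + B_FACTOR8 * factor8 + B_FLPATH * flpath
    return math.exp(logit) / (1.0 + math.exp(logit))

# Hypothetical inputs: probability rises as either variable increases.
for factor8, flpath in [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)]:
    p = landslide_probability(factor8, flpath)
    print(f"FACTOR8 = {factor8}, FLPATH = {flpath}: p = {p:.3f}")
```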

29
Logit Results

The coefficients of the explanatory variables in
the logit model are positive. Therefore, higher
scores increase the probability of landslide hazard.
Logit model assumes a nonlinear relationship
between the probability and the explanatory
variables
Hazard map based on ROC curve technique groups
the hazard into two classes, Low Hazard and High
Hazard, alongside a map showing five classes of
probabilities of landslide hazard

30
Final Results

59.1% of the landslides and 69.8% of the
non-landslides were correctly determined
Model can be applied to large geographic areas
ROC curves are incorporated as a sophisticated
tool for decision makers for the spatial
prediction of landslide hazard

31
a) Cut-off based on ROC curve technique b)
Probability of landslide hazard
