Maximum Entropy modelling of species geographic distributions - PowerPoint PPT Presentation

Provided by: AT148

1
Maximum Entropy modelling of species geographic distributions
  • Steven Phillips
  • with Miro Dudík and Rob Schapire
  • October 28, 2006

2
Modeling species distributions
[Figure: Yellow-throated Vireo occurrence localities; environmental variables]
3
Estimating a probability distribution
  • Given:
    • a map divided into n cells
    • m localities that are samples from an unknown
      distribution
  • Our task is to estimate the unknown
    distribution
  • Note:
    • the distribution sums to 1 over the whole map
    • most probability values will be very small
    • this is different from estimating the probability of presence
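A minimal Python sketch of this setting. It assumes the map is stored as a flat list of n cells and each locality is given as a cell index; both representations are chosen here for illustration. The naive empirical estimate puts all of its mass on the few sampled cells, which is why a smoother estimate is needed. Spreading the mass uniformly gives each cell only 1/n.

```python
from collections import Counter


def empirical_distribution(localities, n_cells):
    """Fraction of the m sample localities falling in each cell of the map."""
    counts = Counter(localities)
    m = len(localities)
    return [counts[cell] / m for cell in range(n_cells)]


n_cells = 10_000
# Hypothetical cell indices of m = 5 occurrence records.
localities = [12, 12, 40, 977, 5031]

p_hat = empirical_distribution(localities, n_cells)
print(round(sum(p_hat), 10))   # 1.0: a distribution over the whole map
print(p_hat[12], p_hat[0])     # 0.4 0.0: all mass sits on sampled cells
print(1 / n_cells)             # 0.0001: each cell's share if spread uniformly
```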

4
The Maximum Entropy Method
  • Origins: Jaynes (1957), statistical mechanics
  • Recent use: machine learning, e.g. automatic
    language translation
  • To estimate an unknown distribution:
    • determine what you know (constraints)
    • among the distributions satisfying the constraints,
      output the one with maximum entropy

5
Entropy
Entropy: the amount of choice involved in the selection of an event
  • Higher entropy: less constrained
  • Maximum entropy: most spread out, closest to uniform
  • 2nd law of thermodynamics: without external influences, a system
    will move towards maximum entropy
  • Other uses of entropy in ecology: Aoki (1989); Schneider and Kay (1994)
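The entropy used here is Shannon entropy, $H(p) = -\sum_x p(x) \ln p(x)$. The short Python sketch below uses toy distributions over 4 cells, chosen for illustration. It shows that the uniform distribution has the largest entropy, ln 4, and that a distribution concentrated on a single cell has entropy 0.

```python
import math


def entropy(p):
    """Shannon entropy in nats; terms with p(x) = 0 contribute 0 (0 ln 0 = 0)."""
    return sum(-px * math.log(px) for px in p if px > 0)


uniform = [0.25, 0.25, 0.25, 0.25]   # most spread out
skewed = [0.7, 0.1, 0.1, 0.1]        # more constrained
point = [1.0, 0.0, 0.0, 0.0]         # no choice at all

for name, p in [("uniform", uniform), ("skewed", skewed), ("point", point)]:
    print(name, round(entropy(p), 4))
```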
6
Using Maxent for Species Distributions
  • Features
  • Constraints
  • Regularization

7
Features impose constraints
Feature: an environmental variable, or a function
thereof.
Find the distribution $p$ of maximum entropy such
that, for all features $f$, $E_p[f]$ = sample average
of $f$.
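Each constraint compares two numbers for a feature: its expectation under the candidate distribution and its average over the sample localities. The minimal Python sketch below assumes a hypothetical 5-cell map with temperature as the only feature. The uniform distribution fails the constraint because the samples sit in warmer-than-average cells.

```python
def expectation(p, f_values):
    """E_p[f]: sum over cells x of p(x) * f(x)."""
    return sum(px * fx for px, fx in zip(p, f_values))


def sample_average(f_values, localities):
    """Average of f over the m sample localities (given as cell indices)."""
    return sum(f_values[i] for i in localities) / len(localities)


# Hypothetical map: temperature (the feature f) in each of 5 cells.
temperature = [10.0, 12.0, 15.0, 20.0, 25.0]
localities = [2, 3, 3]          # hypothetical sampled cells
uniform = [1 / 5] * 5

e_uniform = expectation(uniform, temperature)       # about 16.4
target = sample_average(temperature, localities)    # about 18.33
print(e_uniform, target)
# Maxent only accepts distributions p whose expectation equals target.
```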
8
Features
  • Environmental variables or functions thereof
  • Maxent has these classes of features (others are
    possible):
    • Linear: the variable itself
    • Quadratic: square of the variable
    • Product: product of two variables
    • Binary (categorical): membership in a category
    • Threshold: 1 above a threshold value, 0 below it
    • Hinge: 0 below a threshold (knot), increasing linearly above it

[Figure: two plots of feature value (0 to 1) against an environmental variable]
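The feature classes above can be written as plain functions of the environmental variables. This Python sketch is illustrative only. The threshold t and knot k are hypothetical parameters, and the hinge is left unscaled here (an assumption; any rescaling done by the Maxent software is not shown).

```python
def linear(x):
    return x                                   # the variable itself


def quadratic(x):
    return x * x                               # square of the variable


def product(x, y):
    return x * y                               # product of two variables


def binary(category, target):
    return 1.0 if category == target else 0.0  # membership in one category


def threshold(x, t):
    return 1.0 if x > t else 0.0               # step from 0 to 1 at t


def hinge(x, k):
    return max(0.0, x - k)                     # 0 below knot k, linear above


# A hypothetical cell: temperature 18, precipitation 3, soil class "clay".
temp, precip, soil = 18.0, 3.0, "clay"
features = [
    linear(temp), quadratic(temp), product(temp, precip),
    binary(soil, "clay"), threshold(temp, 15.0), hinge(temp, 15.0),
]
print(features)  # [18.0, 324.0, 54.0, 1.0, 1.0, 3.0]
```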
9
Constraints
Each feature type imposes a constraint on the output distribution:
  • Linear features: mean
  • Quadratic features: variance
  • Product features: covariance
  • Threshold features: proportion above threshold
  • Hinge features: mean above threshold
  • Binary (categorical) features: proportion in each category
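For a single variable, the quantities that each feature type pins down can be computed directly from that variable's values at the sample localities. The minimal Python sketch below uses hypothetical sample values and a hypothetical threshold t = 17, and omits product and binary features for brevity. The variance is determined once the linear feature fixes the mean and the quadratic feature fixes the mean of the square.

```python
def constraint_targets(values, t):
    """Sample quantities matched by linear, quadratic, threshold and hinge features."""
    m = len(values)
    mean = sum(values) / m                          # linear feature
    mean_sq = sum(v * v for v in values) / m        # quadratic feature
    return {
        "mean": mean,
        "variance": mean_sq - mean ** 2,            # implied by mean and mean_sq
        "proportion_above_t": sum(v > t for v in values) / m,            # threshold
        "mean_excess_above_t": sum(max(0.0, v - t) for v in values) / m,  # hinge
    }


# Hypothetical temperatures at m = 4 sample localities.
print(constraint_targets([14.0, 16.0, 18.0, 22.0], t=17.0))
# {'mean': 17.5, 'variance': 8.75, 'proportion_above_t': 0.5, 'mean_excess_above_t': 1.5}
```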
10
Regularization
[Figure: precipitation vs. temperature, showing the sample average and the true mean]
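The sample average only approximates the true mean. Relaxed (regularized) maxent therefore asks only that each feature's expectation lie within a box of half-width $\beta_j$ around the sample average: $|E_p[f_j] - \text{sample average of } f_j| \le \beta_j$. The Python sketch below sets $\beta$ to one standard error of the sample average. That is a hypothetical choice made for illustration, not necessarily the rule Maxent uses.

```python
import math


def standard_error(values):
    """Standard deviation (dividing by m) of the values, divided by sqrt(m)."""
    m = len(values)
    mean = sum(values) / m
    variance = sum((v - mean) ** 2 for v in values) / m
    return math.sqrt(variance / m)


def satisfies_relaxed(expectations, sample_averages, betas):
    """True if |E_p[f_j] - sample average of f_j| <= beta_j for every feature j."""
    return all(abs(e - a) <= b
               for e, a, b in zip(expectations, sample_averages, betas))


# Hypothetical temperatures at m = 4 sample localities.
temps = [14.0, 16.0, 18.0, 22.0]
avg = sum(temps) / len(temps)            # 17.5
beta = standard_error(temps)             # hypothetical choice of box half-width

print(satisfies_relaxed([16.5], [avg], [beta]))  # True: off by 1.0
print(satisfies_relaxed([15.0], [avg], [beta]))  # False: off by 2.5
```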
11
The Maxent distribution
Solution is always a Gibbs distribution:
$q_\lambda(x) = \exp\left(\sum_j \lambda_j f_j(x)\right) / Z$
$Z$ is a scaling factor so that the distribution sums to
1. Maxent is a maximum likelihood method: the entropy
is maximized by the $q_\lambda$ that maximizes the
probability of the samples.
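A minimal Python sketch of the Gibbs form over a finite map. It assumes features[x] holds the feature vector f(x) of cell x, a hypothetical layout. Subtracting the largest exponent before exponentiating is a standard numerical safeguard: it rescales Z but leaves the normalized q unchanged.

```python
import math


def gibbs(lambdas, features):
    """q_lambda(x) = exp(sum_j lambda_j f_j(x)) / Z over all cells x."""
    scores = [sum(l * f for l, f in zip(lambdas, fx)) for fx in features]
    top = max(scores)                          # shift for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)                           # scaling factor: q sums to 1
    return [w / z for w in weights]


# Hypothetical map of 3 cells with one feature taking values 0, 1, 2.
features = [[0.0], [1.0], [2.0]]
print(gibbs([0.0], features))              # lambda = 0: uniform, 1/3 each
print(gibbs([math.log(2)], features))      # weights 1:2:4, i.e. 1/7, 2/7, 4/7
```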
12
Maximizing gain
Unregularized gain (assume m samples, c cells):
$\mathrm{UGain}(q_\lambda) = \frac{1}{m} \sum_i \ln q_\lambda(x_i) - \ln(1/c)$
E.g. if UGain = 1.5, then the average training
sample is exp(1.5) (about 4.5) times more likely
than a random background pixel.
Maxent maximizes the regularized gain:
$\mathrm{Gain}(q_\lambda) = \mathrm{UGain}(q_\lambda) - \sum_j \beta_j |\lambda_j|$
Similar to the Akaike Information Criterion (AIC).
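A minimal Python sketch of the two gain formulas. The distribution q is given as a list over c cells and localities as cell indices, both hypothetical representations. The uniform distribution has gain 0. Computing exp(1.5) ≈ 4.48 confirms the "about 4.5" reading of UGain = 1.5: exp(UGain) is the geometric mean, over the training samples, of q(x_i) divided by the uniform value 1/c.

```python
import math


def unregularized_gain(q, localities):
    """UGain = (1/m) sum_i ln q(x_i) - ln(1/c), with c = number of cells."""
    c, m = len(q), len(localities)
    return sum(math.log(q[i]) for i in localities) / m - math.log(1 / c)


def regularized_gain(q, localities, lambdas, betas):
    """Gain = UGain - sum_j beta_j * |lambda_j|."""
    penalty = sum(b * abs(l) for l, b in zip(lambdas, betas))
    return unregularized_gain(q, localities) - penalty


localities = [0, 1]                     # hypothetical sampled cells
uniform = [0.25] * 4
fitted = [0.4, 0.4, 0.1, 0.1]           # hypothetical model favouring cells 0 and 1

print(unregularized_gain(uniform, localities))   # 0 for the uniform distribution
print(unregularized_gain(fitted, localities))    # ln(1.6), about 0.47
# Hypothetical lambda and beta: only the penalty term matters here.
print(regularized_gain(fitted, localities, lambdas=[1.2], betas=[0.1]))
print(math.exp(1.5))                              # about 4.48
```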
13
Maxent algorithms
Goal: maximize the regularized gain.
Algorithm: start with the uniform distribution
(gain = 0), then iteratively update $\lambda$ to increase
the gain.
  • The gain is concave, so maximizing it is a convex
    optimization problem
  • Variety of algorithms: gradient descent,
    conjugate gradient, Newton, iterative scaling
  • Our algorithm: coordinate descent
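The Python sketch below illustrates coordinate-wise optimization of the regularized gain. This is coordinate ascent on the gain, which is the same as coordinate descent on its negative. It starts from λ = 0 (the uniform distribution, gain 0), updates one λ_j at a time, and keeps a step only if the gain goes up. Each one-dimensional step uses a ternary search, which works because the gain is concave along each coordinate. This line search is a stand-in chosen for simplicity, not the update rule of the authors' algorithm. The feature values, samples and β values are hypothetical.

```python
import math


def gain(lambdas, features, localities, betas):
    """Regularized gain of q_lambda; cell x has feature vector features[x]."""
    scores = [sum(l * f for l, f in zip(lambdas, fx)) for fx in features]
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    mean_log_q = sum(scores[i] - log_z for i in localities) / len(localities)
    penalty = sum(b * abs(l) for l, b in zip(lambdas, betas))
    return mean_log_q + math.log(len(features)) - penalty   # -ln(1/c) = +ln c


def ternary_search(g, lo, hi, iters=60):
    """Approximate maximizer of a concave function g on [lo, hi]."""
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(a) < g(b):
            lo = a          # a maximizer lies in [a, hi]
        else:
            hi = b          # a maximizer lies in [lo, b]
    return (lo + hi) / 2


def coordinate_ascent(features, localities, betas, sweeps=50, radius=5.0):
    lambdas = [0.0] * len(features[0])     # lambda = 0: uniform distribution, gain 0
    for _ in range(sweeps):
        for j in range(len(lambdas)):
            def g(delta):
                trial = list(lambdas)
                trial[j] += delta
                return gain(trial, features, localities, betas)
            delta = ternary_search(g, -radius, radius)
            if g(delta) > g(0.0):          # accept only steps that raise the gain
                lambdas[j] += delta
    return lambdas


# Hypothetical map of 5 cells; features are a scaled variable x and x squared.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
features = [[x, x * x] for x in xs]
localities = [3, 4, 4]                     # samples in the two highest cells
betas = [0.05, 0.05]

lambdas = coordinate_ascent(features, localities, betas)
print(lambdas, gain(lambdas, features, localities, betas))
```

Because every accepted step raises the gain and the search starts at gain 0, the final gain is positive whenever any step was accepted.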

14
Response curves
  • Show how the log of the probability depends on each
    variable
  • Simple features → simpler model
    • easier interpretation
  • Complex features → complex model
    • better fit to data
  • Figure panels: linear + quadratic features (top),
    threshold features (middle), all feature types (bottom)
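Assume a model with only linear and quadratic features of a variable x and no product features, so that each variable contributes a separate additive term. Under that assumption, the log of the probability as a function of x is $\lambda_1 x + \lambda_2 x^2$ minus a constant ($\ln Z$ plus the contributions of the other variables). The Python sketch below evaluates that curve for hypothetical weights. With $\lambda_2 < 0$ the curve has a single peak at $x = -\lambda_1 / (2\lambda_2)$.

```python
def response_curve(xs, lam_linear, lam_quadratic):
    """Contribution of one variable to log q: lambda_1 * x + lambda_2 * x**2."""
    return [lam_linear * x + lam_quadratic * x * x for x in xs]


# Hypothetical fitted weights for one variable's linear and quadratic features.
lam1, lam2 = 2.0, -1.0
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
print(response_curve(xs, lam1, lam2))   # [0.0, 0.75, 1.0, 0.75, 0.0]
print(-lam1 / (2 * lam2))               # 1.0: the peak of the curve
```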

15
Interpretation of regularization
16
Performance guarantees
The solution SOL returned by relaxed maxent is almost
as good as the best $q_\lambda$.
  • Guarantees should depend on:
    • number of samples m
    • number of features n (or complexity of
      features)
    • complexity of the best $q_\lambda$

17
Performance guarantees
(Dudík, Phillips and Schapire, 2004)
If the true mean lies in the confidence region, then for
all q
18
The NCEAS comparison
  • Elith et al. (2006), "Novel methods improve
    prediction of species distributions from
    occurrence data", Ecography
  • 16 methods for predicting species distributions
    from occurrence data
  • 226 diverse species from 6 regions
    • birds, mammals, reptiles, frogs, plants
  • 2 to 5822 occurrence records per species
  • 11 to 13 predictor variables
  • Independent presence/absence test data

19
Average performance on 226 species