Maximum Entropy modelling of species geographic distributions

1
Maximum Entropy modelling of species geographic
distributions

Steven Phillips
with Miro Dudík and Rob Schapire
October 28, 2006

2
Modeling species distributions
Yellow-throated Vireo
occurrence localities

environmental variables
3
Estimating a probability distribution

Given:
Map divided into n cells
m localities are samples from an unknown distribution
Our task is to estimate the unknown distribution
Note:
The distribution sums to 1 over the whole map
Most probability values will be very small
Different from estimating probability of presence

4
The Maximum Entropy Method

Origins: Jaynes 1957, statistical mechanics
Recent use: machine learning, e.g. automatic language translation
To estimate an unknown distribution:
Determine what you know (constraints)
Among distributions satisfying the constraints, output the one with maximum entropy

5
Entropy
Entropy: amount of choice involved in the selection of an event
Higher entropy: less constrained
Maximum entropy: most spread out, closest to uniform
2nd law of thermodynamics: without external influences, a system will move towards maximum entropy
Aoki, 1989; Schneider and Kay, 1994: other uses of entropy in ecology
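The "most spread out, closest to uniform" point can be checked numerically. The sketch below is illustrative only: it assumes the usual Shannon entropy with natural logarithms, and the two four-cell distributions are made-up values, not data from the talk.

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution given as a list."""
    # Cells with zero probability contribute nothing to the sum.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical distributions over a map of 4 cells.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.85, 0.05, 0.05, 0.05]

h_uniform = entropy(uniform)  # equals ln(4), the maximum for 4 cells
h_peaked = entropy(peaked)    # strictly smaller: the distribution is more constrained
```

On n cells the uniform distribution attains the largest possible value, ln(n); any constraint that pulls probability towards particular cells lowers the entropy.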
6
Using Maxent for Species Distributions

Features
Constraints
Regularization

7
Features impose constraints
Feature: environmental variable, or function thereof
Find the distribution p of maximum entropy such that, for all features f:
$E_p[f] = \text{sample average of } f$
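To make the constraint concrete, here is a minimal sketch that compares the expectation of one feature under a candidate distribution with its average over the sample localities. The temperature values, the cell indices and the helper names are hypothetical, chosen only for illustration.

```python
def expectation(p, f):
    """Expected value of feature f under distribution p (both indexed by cell)."""
    return sum(pi * fi for pi, fi in zip(p, f))

def sample_average(f, localities):
    """Average of feature f over the cells where the species was recorded."""
    return sum(f[i] for i in localities) / len(localities)

# Hypothetical map with 4 cells and one feature (temperature per cell).
temperature = [10.0, 14.0, 18.0, 22.0]
localities = [2, 3, 3]  # cell indices of the occurrence records

target = sample_average(temperature, localities)   # (18 + 22 + 22) / 3
uniform = [0.25, 0.25, 0.25, 0.25]
uniform_mean = expectation(uniform, temperature)   # 16.0

# The uniform distribution violates the constraint, so the maxent solution
# has to shift probability towards warmer cells until the two values agree.
constraint_gap = target - uniform_mean
```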
8
Features

Environmental variables or functions thereof.
Maxent has these classes of features (others are
possible)
Linear: variable itself
Quadratic: square of variable
Product: product of two variables
Binary (categorical): membership in a category
Threshold
Hinge

[Figure: threshold and hinge features, each plotted from 0 to 1 against an environmental variable]
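The feature classes listed on this slide can be sketched as plain functions of the environmental variables. The function names and the exact hinge form (zero up to the knot, then rising linearly to 1 at the variable's maximum) are assumptions made for illustration; the slide itself only shows the shapes.

```python
def linear(v):
    """Linear feature: the variable itself."""
    return v

def quadratic(v):
    """Quadratic feature: square of the variable."""
    return v * v

def product(v, w):
    """Product feature: product of two variables."""
    return v * w

def binary(category, k):
    """Binary feature: 1 if the cell belongs to category k, else 0."""
    return 1.0 if category == k else 0.0

def threshold(v, t):
    """Threshold feature: step from 0 to 1 at the threshold t."""
    return 1.0 if v > t else 0.0

def hinge(v, t, v_max):
    """Hinge feature (assumed form): 0 up to t, then linear up to 1 at v_max."""
    return 0.0 if v <= t else (v - t) / (v_max - t)

# Example values for a variable scaled to the range 0..1.
step_value = threshold(0.7, 0.5)
ramp_value = hinge(0.75, 0.5, 1.0)
```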
9
Constraints
Each feature type imposes constraints on output distribution
Linear features: mean
Quadratic features: variance
Product features: covariance
Threshold features: proportion above threshold
Hinge features: mean above threshold
Binary features (categorical): proportion in each category
10
Regularization
[Figure: sample average and true mean; axes: temperature, precipitation]
11
The Maxent distribution
Solution is always a Gibbs distribution:
$q_\lambda(x) = \exp\left(\sum_j \lambda_j f_j(x)\right) / Z$
Z is a scaling factor so the distribution sums to 1
Maxent is a maximum likelihood method
Entropy is maximized by the $q_\lambda$ that maximizes the probability of the samples
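A Gibbs distribution of this form is straightforward to evaluate on a small map. The sketch below assumes each cell is described by a list of feature values; the three cells and the weights are invented for illustration.

```python
import math

def gibbs(features, lambdas):
    """Return q_lambda over cells: exp(sum_j lambda_j * f_j(x)) / Z."""
    raw = [math.exp(sum(l * f for l, f in zip(lambdas, fx))) for fx in features]
    z = sum(raw)  # scaling factor Z so that the distribution sums to 1
    return [r / z for r in raw]

# Hypothetical map: 3 cells, 2 features per cell (a variable and its square).
features = [[0.0, 0.0], [0.5, 0.25], [1.0, 1.0]]

q_zero = gibbs(features, [0.0, 0.0])  # all weights zero gives the uniform distribution
q = gibbs(features, [2.0, -1.0])      # non-zero weights favour some cells over others
```

With all weights at zero every cell has the same exponent, which is why the fitting procedure can start from the uniform distribution.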
12
Maximizing gain
Unregularized gain: assume m samples, c cells
$\mathrm{UGain}(q_\lambda) = \frac{1}{m} \sum_i \ln(q_\lambda(x_i)) - \ln(1/c)$
E.g. if UGain = 1.5, then the average training sample is exp(1.5) (about 4.5) times more likely than a random background pixel
Maxent maximizes regularized gain:
$\mathrm{Gain}(q_\lambda) = \mathrm{UGain}(q_\lambda) - \sum_j \beta_j |\lambda_j|$
Similar to Akaike Information Criterion (AIC)
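The gain can be computed directly from a distribution over cells and the indices of the sample cells. This is a minimal sketch: the function names are hypothetical, and the penalty is assumed to be the sum of each regularization weight times the absolute value of the corresponding feature weight.

```python
import math

def unregularized_gain(q, sample_cells):
    """Average log probability of the samples minus that of the uniform distribution."""
    m, c = len(sample_cells), len(q)
    return sum(math.log(q[i]) for i in sample_cells) / m - math.log(1.0 / c)

def regularized_gain(q, sample_cells, lambdas, betas):
    """Unregularized gain minus the assumed penalty sum_j beta_j * |lambda_j|."""
    penalty = sum(b * abs(l) for b, l in zip(betas, lambdas))
    return unregularized_gain(q, sample_cells) - penalty

# The uniform distribution has gain 0 by construction.
uniform = [0.25, 0.25, 0.25, 0.25]
gain_uniform = unregularized_gain(uniform, [0, 2, 3])

# A gain of 1.5 means the average sample is exp(1.5) times more likely
# than a random background cell, which is roughly 4.5.
ratio = math.exp(1.5)
```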
13
Maxent algorithms
Goal: maximize the regularized gain
Algorithm:
Start with uniform distribution (gain = 0)
Iteratively update $\lambda$ to increase the gain

The gain is concave (equivalently, the regularized log loss is convex)
Variety of algorithms: gradient descent, conjugate gradient, Newton, iterative scaling
Our algorithm: coordinate descent
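The "start uniform, then update one weight at a time" idea can be sketched as follows. This is a deliberately naive coordinate-wise search with a fixed step, not the update rule used by the authors' software; the data, step size and function names are all hypothetical.

```python
import math

def gibbs(features, lambdas):
    """Gibbs distribution over cells for the given feature weights."""
    raw = [math.exp(sum(l * f for l, f in zip(lambdas, fx))) for fx in features]
    z = sum(raw)
    return [r / z for r in raw]

def gain(features, lambdas, sample_cells, betas):
    """Regularized gain, assuming an absolute-value penalty on the weights."""
    q = gibbs(features, lambdas)
    m, c = len(sample_cells), len(q)
    ugain = sum(math.log(q[i]) for i in sample_cells) / m - math.log(1.0 / c)
    return ugain - sum(b * abs(l) for b, l in zip(betas, lambdas))

def fit(features, sample_cells, betas, step=0.1, sweeps=200):
    """Naive coordinate search: change one weight at a time if the gain improves."""
    lambdas = [0.0] * len(betas)  # all-zero weights give the uniform distribution
    best = gain(features, lambdas, sample_cells, betas)
    for _ in range(sweeps):
        improved = False
        for j in range(len(lambdas)):
            for delta in (step, -step):
                trial = list(lambdas)
                trial[j] += delta
                g = gain(features, trial, sample_cells, betas)
                if g > best + 1e-12:
                    lambdas, best, improved = trial, g, True
                    break
        if not improved:
            break  # no single-coordinate step of this size helps any more
    return lambdas, best

# Hypothetical map: 5 cells with one feature; samples sit in high-valued cells.
features = [[0.0], [0.25], [0.5], [0.75], [1.0]]
sample_cells = [3, 4, 4]
lambdas, final_gain = fit(features, sample_cells, betas=[0.05])
```

Because every accepted step strictly increases the gain, the search cannot cycle; a practical implementation would choose the step size analytically rather than using a fixed value.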

14
Response curves

Show how log of probability depends on each
variable
Simple features → simpler model
Easier interpretation
Complex features → complex model
Better fit to data
Linear + quadratic (top)
Threshold features (middle)
All feature types (bottom)
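For a model with only linear and quadratic features of one variable, the response curve is a parabola in the log of the probability (up to the constant ln Z). The sketch below tabulates such a curve; the weights 4.0 and -4.0 are invented for illustration.

```python
def log_response(v, lam_linear, lam_quadratic):
    """Exponent of the Gibbs distribution as a function of one variable."""
    return lam_linear * v + lam_quadratic * v * v

# Tabulate the curve on a variable scaled to 0..1 with hypothetical weights.
curve = [(v / 10, log_response(v / 10, 4.0, -4.0)) for v in range(11)]

# With a negative quadratic weight the curve has a single interior peak.
peak = max(curve, key=lambda point: point[1])
```

Threshold and hinge features would replace this smooth parabola with step-shaped or piecewise-linear curves, which fit the data more closely but are harder to interpret.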

15
Interpretation of regularization
16
Performance guarantees
Solution SOL returned by relaxed maxent is almost as good as the best $q_\lambda$

Guarantees should depend on:
number of samples m
number of features n (or complexity of features)
complexity of the best $q_\lambda$

17
Performance guarantees
(Dudík-Phillips-Schapire 04)
If true mean lies in confidence region then for
all q
18
The NCEAS comparison

Elith et al. 2006, Novel methods improve
prediction of species distributions from
occurrence data, Ecography
16 methods for predicting species distributions
from occurrence data
226 diverse species from 6 regions
Birds, mammals, reptiles, frogs, plants
2 to 5822 occurrence records per species
11 to 13 predictor variables
Independent presence/absence test data

19
Average performance on 226 species
