Bayesian Statistics: A Biologist's Interpretation

About This Presentation

Description:

Title: PowerPoint Presentation Author: PC Manager Last modified by: PC Manager Created Date: 4/21/2003 2:01:13 AM Document presentation format: On-screen Show – PowerPoint PPT presentation

Slides: 20

Provided by: PCM90

Learn more at: https://homepage.cs.uri.edu

Transcript and Presenter's Notes

1
Bayesian Statistics: A Biologist's Interpretation
Marguerite Pelletier, URI Natural Resources
Science / U.S. EPA
2
How have Bayesian Methods been used?

Federal allocation of money: Bayesian analysis
of population characteristics such as poverty
in small geographic areas
Microsoft Windows Office Assistant: Bayesian
artificial intelligence algorithm
It has been suggested that Bayesian statistics
be used in environmental science because it
addresses questions about the probability of
events occurring, which allows better
decision-making

3
Bayesian Statistics vs. Frequentist Statistics

Frequentist (Traditional) Statistics
Assumes a fixed, true value for the parameter of
interest (e.g., mean, std dev)
Expected value = average value obtained by
random sampling repeated ad infinitum
Can only reject the null hypothesis (H0), not
support the alternative hypothesis (Ha)
p-values indicate statistical rareness
Large sample sizes make rejection of H0 more
likely
Confidence intervals show confidence about the
value of the parameter, not how likely that
parameter value is in real life

4
Bayesian Statistics vs. Frequentist Statistics,
cont.

Bayesian Statistics
Assumes the parameter of interest (e.g., mean, std
dev) is variable and based on the data
Can test the probability of the alternative
hypothesis (Ha) or hypotheses given the data
(which is what most scientists really care
about)
Generates a probability for any hypothesis being
true
Sample sizes are taken into account: a large sample
size alone won't cause acceptance of the
hypothesis
Creates credible intervals rather than
confidence intervals: tells how likely the
answer is in the real world

5
How do Bayesian Statistics Work?
$\text{Posterior probability} = \frac{\text{Fisher's likelihood function} \times \text{Prior probability}}{\text{Expected likelihood function}}$
Likelihood function: Given data with a known (or
predicted) distribution (e.g., Normal, Poisson),
a likelihood function (probability distribution)
can be calculated
Prior probability: based on existing data or a
subjective indication of what the investigator
believes to be true
Expected likelihood function: marginal
distribution of the data given the hyperparameter;
takes sample size into account
Bayes' Rule: $\text{Posterior} \propto \text{Likelihood} \times \text{Priors}$
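
Bayes' rule as stated on this slide can be sketched numerically. The block below is a minimal illustration, assuming a binomial likelihood, a flat prior, and made-up survey counts (7 "successes" in 20 trials); none of these come from the presentation. It multiplies likelihood by prior on a grid of candidate proportions, normalises by the sum (the role played by the expected likelihood), and reads off an equal-tailed 95% credible interval.

```python
# Illustrative only: the counts and the flat prior are assumptions,
# not values taken from the slides.

def grid_posterior(successes, trials, prior, grid):
    """Posterior over a grid of candidate proportions via Bayes' rule."""
    # Binomial likelihood up to a constant (the constant cancels on normalising).
    likelihood = [p ** successes * (1 - p) ** (trials - successes) for p in grid]
    unnormalised = [lik * pr for lik, pr in zip(likelihood, prior)]
    evidence = sum(unnormalised)  # plays the role of the "expected likelihood"
    return [u / evidence for u in unnormalised]

def credible_interval(grid, posterior, mass=0.95):
    """Equal-tailed credible interval read off the cumulative posterior."""
    tail = (1 - mass) / 2
    cumulative, lower, upper = 0.0, grid[0], grid[-1]
    found_lower = False
    for p, weight in zip(grid, posterior):
        cumulative += weight
        if not found_lower and cumulative >= tail:
            lower, found_lower = p, True
        if cumulative >= 1 - tail:
            upper = p
            break
    return lower, upper

grid = [i / 200 for i in range(1, 200)]       # candidate proportions 0.005..0.995
flat_prior = [1 / len(grid)] * len(grid)      # assumption: no prior preference
posterior = grid_posterior(successes=7, trials=20, prior=flat_prior, grid=grid)
posterior_mean = sum(p * w for p, w in zip(grid, posterior))
low, high = credible_interval(grid, posterior)
print(f"posterior mean = {posterior_mean:.3f}, "
      f"95% credible interval = ({low:.3f}, {high:.3f})")
```

Swapping `flat_prior` for an informative prior (for example, weights concentrated near a value suggested by earlier studies) shows how prior belief pulls the posterior when data are scarce.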
6
Problems with Bayesian Statistics

Computationally intense (integration of complex
functions). However, better computers and the
development of Markov Chain Monte Carlo methods
have made the techniques more accessible
Not directly applicable to many complex
statistical analyses. Can be used for certain
regression techniques and to generate a posterior
distribution given a prior. Attempts to use it in
clustering have been unsuccessful
Not readily available in most common
statistical software (SPSS, SAS)
Not applicable to very rare events: priors
dominate the function so the posterior
doesn't change, which implies that further study is
not needed/useful
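
The rare-event point can be illustrated with a conjugate Beta-binomial update. This is a sketch under stated assumptions: the Beta(1, 99) prior (mean 1%) and the study sizes are hypothetical, chosen only to show how little a small study moves the posterior when the event is rare.

```python
def beta_update(a, b, events, trials):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return a + events, b + trials - events

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical prior belief: the event happens about 1% of the time.
prior = (1.0, 99.0)
# Hypothetical small study: 20 observations, no events seen.
post_small = beta_update(*prior, events=0, trials=20)
# Hypothetical much larger study, still with no events seen.
post_large = beta_update(*prior, events=0, trials=2000)

print("prior mean:          ", beta_mean(*prior))
print("after 20 trials:     ", beta_mean(*post_small))
print("after 2000 trials:   ", beta_mean(*post_large))
```

With only 20 observations the posterior mean stays close to the prior mean; it takes a far larger sample before the data outweigh the prior.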

7
So When are Bayesian Statistics Useful?

When limited data are available: formalizes the
use of Best Professional Judgment (Case
Study 1)
When Bayesian algorithms have been developed
for a statistic, e.g., regression (Case
Study 2)
After using more traditional statistical
methods: develop a probability
distribution (Case Study 3)
When the answer is a single number rather than
a complex function (e.g., simple calculation,
not complex multivariate analysis)

8
Case Study 1: Development of a Bayesian
Probability Network in the Neuse River Estuary,
N.C.
(Borsuk ME, Stow CA, Reckhow KH 2003. An
integrated approach to TMDL development for the
Neuse River estuary using a Bayesian probability
network. Journal of Water Resources Planning and
Management, accepted)
9
Summary of Project

Neuse River estuary impaired due to nitrogen
(eutrophication problems), requiring a Total
Maximum Daily Load (TMDL) to be developed
For development of a TMDL, links must be
developed between pollutant load (N) and
water quality impairment
Because of the range of endpoints and the need
to determine probability of impact, a
Bayesian Network was developed
Data for the model came from routine water
quality monitoring and from elicited judgment
of scientific experts

10
[Figure: Bayesian Network for the Neuse River estuary.
Nodes: River N, River Flow, Algal Density, Pfiesteria
abundance, Carbon Production, Water Temperature,
Sediment Oxygen Demand, Oxygen Concentration, Duration
of Stratification, Shellfish Abundance, Days of Hypoxia,
Frequency of Cross-Channel Winds, Frequency of Fish
Kills, Fish Population Health.
Legend: System variable; Node or Submodel; Association.]
11
Use of Bayesian Network (focus on Fish Kills)

Fish kills = low bottom D.O. + cross-channel
winds (force bottom water and fish to shores)
+ fish health (influences susceptibility)
Two expert fisheries biologists were asked about the
likelihood of a fish kill given certain
conditions (various wind/hypoxia/fish health
scenarios)
All probabilistic relationships (including fish
kill info) were incorporated into the Bayesian
network.
Nitrogen reduction scenarios assessed: 0 (no
reduction), 15, 30, 45 and 60% relative to the
1991-1995 baseline, using Latin Hypercube sampling
As N inputs decreased, mean chlorophyll (chl) and
exceedance frequency also decreased.
Fish kills don't change substantially with N
reduction: fish kills are relatively rare, and the
effect of reduced C production is damped out
further along the causal chain
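
How a small network turns conditional probabilities into a fish-kill probability can be sketched as follows. Every number here is a hypothetical placeholder, not a value elicited in the Neuse River study, and the three-parent structure is a simplification of the network on the previous slide. The sketch also uses exact enumeration over discrete states rather than the Latin Hypercube sampling used in the study.

```python
from itertools import product

# All probabilities below are hypothetical placeholders for illustration;
# they are NOT the values elicited in the Neuse River study.
P_WIND = 0.3                                  # P(cross-channel winds)
P_POOR_HEALTH = 0.2                           # P(fish population in poor health)
P_HYPOXIA = {0: 0.50, 30: 0.42, 60: 0.35}     # P(hypoxia | % N reduction)

def p_kill_given(hypoxia, wind, poor_health):
    """Hypothetical conditional probability table for a fish kill."""
    if not (hypoxia and wind):
        return 0.01                           # kills are rare without both triggers
    return 0.30 if poor_health else 0.15

def p_fish_kill(n_reduction):
    """Marginal P(fish kill): sum over all combinations of parent states."""
    p_hyp = P_HYPOXIA[n_reduction]
    total = 0.0
    for hypoxia, wind, poor in product([True, False], repeat=3):
        weight = (
            (p_hyp if hypoxia else 1 - p_hyp)
            * (P_WIND if wind else 1 - P_WIND)
            * (P_POOR_HEALTH if poor else 1 - P_POOR_HEALTH)
        )
        total += weight * p_kill_given(hypoxia, wind, poor)
    return total

for reduction in sorted(P_HYPOXIA):
    print(f"{reduction}% N reduction: P(fish kill) = {p_fish_kill(reduction):.4f}")
```

Because the fish-kill node sits at the end of the chain and its conditional probabilities are small, even a sizeable change in the hypoxia probability produces only a modest change in the marginal fish-kill probability in this toy setup.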

12
Case Study 2: Modeling Butterfly Species
Richness using Bayesian Poisson Regression
(Mac Nally R, Fleishman E, Fay JP, Murphy DD
2003. Modeling butterfly species richness using
mesoscale environmental variables: model
construction and validation for the mountain
ranges in the Great Basin of western North
America. Biological Conservation 110:21-31.)
13
Summary of Project

Species richness depends on local environmental
variables
Over large scales these variables are hard to
collect
In this study, 14 environmental variables from
GIS and remote sensing were used to predict
butterfly species richness
Poisson regression was used to develop appropriate
models from the 28 variables (each independent
variable, IV, plus its square, IV^2)
Schwarz Information Criterion used for selection
Appropriate variables then used in a Bayesian
Poisson model
Model output validated against additional field
data

14
Bayesian Poisson Regression
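$\log \lambda_i = \alpha + \sum_k \beta_k X_{ik} + \varepsilon$, $Y_i \sim \text{Poisson}(\lambda_i)$
where
$\lambda_i$ = mean (unobservable, true) spp richness at site i
$\alpha$, $\beta_k$ = regression coefficients (non-informative priors)
$\varepsilon$ = model error
$Y_i$ = observed spp richness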

Markov Chain Monte Carlo algorithm: 1000-iteration
burn-in, then 3000 iterations to
generate parameter estimates and mean spp
richness estimates
New model run using validation data and the
regression-coefficient distribution from the 1st
model
Model worked well for the same mountain range, but
not for a new range
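
The burn-in-then-sample procedure can be sketched with a simple random-walk Metropolis sampler for a one-predictor Poisson regression. This is not the authors' code or software: the data are synthetic, the intercept and slope names (`alpha`, `beta`), the step size, and the flat priors are assumptions, and the extra model-error term is omitted for brevity. Only the 1000/3000 iteration counts follow the slide.

```python
import math
import random

random.seed(1)

def poisson_draw(lam):
    """Knuth's method for one Poisson draw (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k

# Synthetic data standing in for site-level richness; true values are made up.
TRUE_ALPHA, TRUE_BETA = 1.0, 0.5
x = [random.uniform(-1, 1) for _ in range(60)]
y = [poisson_draw(math.exp(TRUE_ALPHA + TRUE_BETA * xi)) for xi in x]

def log_posterior(alpha, beta):
    """Poisson log-likelihood; flat priors contribute only a constant."""
    total = 0.0
    for xi, yi in zip(x, y):
        log_lam = alpha + beta * xi
        total += yi * log_lam - math.exp(log_lam)   # log(yi!) omitted: constant
    return total

def metropolis(n_burn=1000, n_keep=3000, step=0.1):
    """Random-walk Metropolis: discard n_burn draws, keep the next n_keep."""
    alpha, beta = 0.0, 0.0
    current = log_posterior(alpha, beta)
    samples = []
    for i in range(n_burn + n_keep):
        cand_a = alpha + random.gauss(0, step)
        cand_b = beta + random.gauss(0, step)
        cand = log_posterior(cand_a, cand_b)
        # Accept with probability min(1, posterior ratio).
        if random.random() < math.exp(min(0.0, cand - current)):
            alpha, beta, current = cand_a, cand_b, cand
        if i >= n_burn:
            samples.append((alpha, beta))
    return samples

samples = metropolis()
alpha_hat = sum(a for a, _ in samples) / len(samples)
beta_hat = sum(b for _, b in samples) / len(samples)
print(f"posterior means: alpha = {alpha_hat:.2f}, beta = {beta_hat:.2f}")
```

The retained draws approximate the joint posterior of the coefficients, so they can be reused as the coefficient distribution when predicting richness at new sites, in the spirit of the validation step described above.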

15
Case Study 3: Assessing Spatial Population
Viability Models using Bayesian Statistics
(McCarthy MA, Lindenmayer DB, Possingham HP 2001.
Assessing spatial PVA models of arboreal
marsupials using significance tests and Bayesian
statistics. Biological Conservation 98:191-200.)
16
Summary of Project

Population Viability Analysis is used in
Conservation Biology to assess the potential for
species extinction
Many models are based on limited data; they are
assessed via significance tests or Bayesian methods
Metapopulation models (for 4 arboreal
marsupials) were developed
2 competing null models were also developed:
No effect of fragmentation
No dispersal between patches
Models were compared using likelihood and
Bayesian methods

17
Model Comparison

Predicted presence in patches was compared to
observed presence using logistic regression:
$\ln\left(\frac{o}{1 - o}\right) = \alpha + \beta \ln\left(\frac{p}{1 - p}\right)$
where o = observed presence, p = predicted
presence, and $\alpha$, $\beta$ = regression coefficients
Significant differences between predicted and
observed if $\alpha$ is significantly different from 0
or $\beta$ is significantly different from 1
Models compared using log-likelihood: models
with higher log-likelihood values (closer to
0) more closely match the data
Bayesian posterior probabilities used to
compare models: higher probabilities more
closely match the data
Prior: all 3 models equally plausible
Probability of model = likelihood of
model / sum of all likelihoods
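
With equal priors, the posterior probability of each model is its likelihood divided by the sum of all likelihoods. A minimal sketch of that calculation follows; the three log-likelihood values are hypothetical, not results from the paper, and the function name is illustrative.

```python
import math

def posterior_model_probs(log_likelihoods, priors=None):
    """Posterior probability of each model from its log-likelihood."""
    names = list(log_likelihoods)
    if priors is None:
        priors = {name: 1 / len(names) for name in names}   # equal priors
    # Subtract the largest log-likelihood before exponentiating to avoid underflow.
    top = max(log_likelihoods.values())
    weights = {n: priors[n] * math.exp(log_likelihoods[n] - top) for n in names}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

# Hypothetical log-likelihoods for the three competing models.
example = {"full": -40.0, "no fragmentation": -42.0, "no dispersal": -45.0}
probs = posterior_model_probs(example)
for name, prob in probs.items():
    print(f"{name}: {prob:.3f}")
```

Working from log-likelihoods and rescaling by the maximum gives the same ratios as dividing raw likelihoods, but stays numerically stable when the likelihoods themselves are extremely small.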

18
Conclusions

Comparison with actual data:
Full model best for greater glider and
yellow-bellied glider
No-fragmentation model best for mountain
brushtail possum and ringtail possum (but
predicted values about ½ observed values)
Log-likelihood values:
Confirmed no-fragmentation model best for the 2
possum spp
Confirmed full model best for the greater
glider
Yellow-bellied glider equally well represented by
full model and no-dispersal model
Bayesian statistics confirmed log-likelihood
results
Authors indicated that significance tests are
useful to assess model accuracy; Bayesian
methods are useful for comparing models but
computationally intense
