Title: Statistical Inference in Wildlife Science
1. Statistical Inference in Wildlife Science
- Goals
- Concerns with Nulls
- Better Approaches?
- Information-theoretic
- Metareplication
- Data dredging
- Important References
2. Goal of Wildlife Research
- Gain reliable knowledge (Romesburg 1981)
- Hypothetico-deductive approach is preferred
[Flow diagram: retroduction leads to a research hypothesis, which yields predictions; an experiment supplies observed facts, which are compared with the predictions in a test of the statistical (Ho) hypothesis. Reject: modify the research hypothesis. Fail to reject: the corroborated hypothesis builds toward reliable knowledge rather than dogmatic laws. Induction links the observed facts to the predictions.]
3. Research vs. Statistical Hypotheses
- The H-D method includes both research and statistical hypotheses
- Research hypothesis: a conjecture about a process (how nature works), arrived at from theory by retroduction
- Statistical hypothesis: a conjecture about a class of facts associated with that process (induction); it asks local questions about a single population or system
4. Statisticians Have Long Debated Hypothesis Testing
- The relative use of research vs. statistical hypotheses brought a long-standing debate in the world of statistics to wildlife science
- We are overly concerned with testing statistical hypotheses and not concerned enough with rigorous development of, and sorting among, research hypotheses (Anderson et al. 2000)
5. Why Are Statisticians Concerned with Our Over-reliance on Statistical Hypothesis Testing?
- Null hypotheses are viewed incorrectly
- It is trivial to say there is no difference
- The focus is on rejecting Ho rather than on investigating the size and precision of a treatment effect
- Alpha is arbitrary
- Often only the P-value is reported (a naked P-value)
- The P-value is based not only on the data collected but also on data not collected (it is the probability of an observation at least as extreme as the one observed, given Ho)
- The P-value depends on N, hence rejection is certain given enough data (a simulation sketch follows this slide)
- The P-value does NOT indicate the strength of Ha, but rather the degree of consistency (or inconsistency) with Ho
(Cherry 1998; Johnson 1999, 2002; Anderson et al. 2000; Guthery et al. 2001)
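A minimal simulation sketch of the N-dependence point, using made-up numbers and assuming NumPy and SciPy are available: a biologically trivial difference yields an arbitrarily small P-value once the sample is large enough.

```python
# Minimal simulation (hypothetical numbers): with a fixed, biologically trivial
# effect, the P-value against Ho shrinks as N grows, so rejection becomes
# certain given enough data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
tiny_effect = 0.05          # trivially small true difference between groups
for n in (20, 200, 2000, 20000):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(tiny_effect, 1.0, n)
    t, p = stats.ttest_ind(a, b)
    print(f"n = {n:6d}  P = {p:.4f}")
```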
6. A Better Approach?
- Focus on estimating effect size and providing a measure of its precision; confidence intervals do this (a sketch follows this list)
- Rely on the SE (which measures the precision of the estimate), not the SD (which measures the variation observed in the sample)
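A minimal sketch of this style of reporting, with hypothetical data and assuming NumPy is available: estimate the effect size, its SE, and an approximate 95% confidence interval rather than issuing a bare reject / fail-to-reject verdict.

```python
# Minimal sketch (hypothetical data): report an effect size with its precision
# instead of a bare reject / fail-to-reject decision.
import numpy as np

rng = np.random.default_rng(1)
control = rng.normal(10.0, 2.0, size=30)   # e.g., untreated plots (invented)
treated = rng.normal(11.2, 2.0, size=30)   # treated plots (invented)

effect = treated.mean() - control.mean()                  # effect size: difference in means
se = np.sqrt(treated.var(ddof=1) / treated.size +         # SE of the difference:
             control.var(ddof=1) / control.size)          # the precision of the estimate
ci_low, ci_high = effect - 1.96 * se, effect + 1.96 * se  # approximate 95% CI

print(f"effect = {effect:.2f}, SE = {se:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```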
7. Focus on Getting a Good Set of Biologically Reasonable Hypotheses
- Embrace the concept of multiple working research hypotheses (Chamberlin 1890) rather than a single Ha vs. a single Ho
- This can protect research from personal bias, because researchers no longer have a single favorite hypothesis they work to confirm (Guthery et al. 2001)
- Formulate each hypothesis as a mathematical model (a sketch follows this list)
- This requires close collaboration with a statistician to make sure the full complexity of the biological hypotheses can be represented (non-linearity, etc.)
- Sort among multiple hypotheses using the information-theoretic approach (Akaike 1973, 1974; Anderson et al. 2000; Burnham and Anderson 1998; Anderson and Burnham 2002; Anderson et al. 2001)
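As a hedged illustration of turning working hypotheses into models (the variable names, functional forms, and data here are invented, and SciPy is assumed available), two competing biological ideas might be written as a linear and a saturating response and fit to the same data:

```python
# Sketch (hypothetical example): two working biological hypotheses written as
# candidate models -- a linear response vs. a saturating (non-linear) response
# of, say, chick survival to insect abundance.
import numpy as np
from scipy.optimize import curve_fit

def linear(x, a, b):
    return a + b * x                 # H1: survival rises steadily with food

def saturating(x, a, b):
    return a * x / (b + x)           # H2: survival levels off once food is ample

rng = np.random.default_rng(3)
food = np.linspace(1, 20, 30)
survival = 0.9 * food / (4 + food) + rng.normal(0, 0.05, food.size)  # invented data

for name, f in [("linear", linear), ("saturating", saturating)]:
    params, _ = curve_fit(f, food, survival, p0=[0.5, 0.5])
    rss = np.sum((survival - f(food, *params)) ** 2)
    print(f"{name:10s} RSS = {rss:.3f}")
```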
8. Sorting Among Research Hypotheses
- Akaike's Information Criterion (AIC) weighs goodness of fit against the number of parameters:
  AIC = -2 ln(L) + 2k
  where ln(L) is the maximized log-likelihood (goodness of fit) and k is the number of estimated parameters
- [Diagram: bias and unexplained variance plotted against the number of parameters (k); the best model minimizes AIC by trading the two off]
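A small sketch of computing AIC for least-squares models, using the standard least-squares form AIC = n ln(RSS/n) + 2k (hypothetical data; NumPy assumed). This is one common formulation, not necessarily the exact one used in the cited papers.

```python
# Sketch: AIC for least-squares fits, AIC = n * ln(RSS / n) + 2k.
# k counts all estimated parameters, including the residual variance,
# as recommended by Burnham and Anderson. Data are invented.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 40)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, x.size)   # hypothetical response

def aic_least_squares(y, yhat, k):
    rss = np.sum((y - yhat) ** 2)
    n = y.size
    return n * np.log(rss / n) + 2 * k

# Two candidate models: intercept-only vs. linear trend.
yhat0 = np.full_like(y, y.mean())
beta = np.polyfit(x, y, 1)
yhat1 = np.polyval(beta, x)

print("intercept-only:", round(aic_least_squares(y, yhat0, k=2), 2))
print("linear trend:  ", round(aic_least_squares(y, yhat1, k=3), 2))
```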
9. Rank Models
- Calculate AIC and rank the hypotheses (models) from best (minimum AIC) to worst
- A single AIC value is not useful on its own; it is the relative value that matters
- Akaike weights (wi) quantify the weight of evidence in favor of a model (evidence that the model is the best in the defined set; the wi sum to 1)
- Rules of thumb:
- wi > 0.9 indicates a single, superior model
- The relative importance of a model can be indicated by its change in AIC (ΔAICi = AICi - AICmin); if ΔAIC for a model is < 10, it should be considered supported by the data
- Model averaging is a powerful way to estimate parameters and their precision (a sketch follows this list)
- The averaged parameter value is the weighted average (using wi) of the per-model estimates: θ̂avg = Σ wi θ̂i
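A short sketch of the ranking, weighting, and averaging steps, with invented AIC values and parameter estimates (NumPy assumed):

```python
# Sketch (hypothetical AIC values): rank models by delta-AIC, compute Akaike
# weights, and form a weighted-average parameter estimate across models.
import numpy as np

aic = np.array([102.4, 103.1, 110.8])      # AIC for each candidate model (invented)
theta = np.array([0.42, 0.55, 0.10])       # the same parameter estimated under each model

delta = aic - aic.min()                    # delta_i = AIC_i - AIC_min
w = np.exp(-0.5 * delta)
w /= w.sum()                               # Akaike weights; they sum to 1

theta_avg = np.sum(w * theta)              # model-averaged estimate
for i, (d, wi) in enumerate(zip(delta, w)):
    print(f"model {i}: dAIC = {d:5.2f}, weight = {wi:.3f}")
print("model-averaged parameter:", round(theta_avg, 3))
```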
10. Some Issues with the I-T Approach (Guthery et al. 2001; Robinson and Wainer 2002)
- The method is parametric: assumptions about distributions must be met
- The definition of the research hypotheses defines the conclusions
- One of the models in the set of alternatives will always have the minimum AIC, so make sure no trivial hypotheses are in the set
- Hypotheses should reflect plausible, but different, ways that nature works (i.e., be true research hypotheses, not statistical hypotheses)
- Null effects are not necessarily trivial and must be modeled if there is good reason
- Frequentist statistics are appropriate for the analysis of well-designed experiments
11. Better Use of P-values (Fisher 1925; Robinson and Wainer 2002)
- IF you use a frequentist approach, then:
- Follow Fisher's lead and use P-values to screen for potentially real or useful associations that merit future investigation, rather than using them as end points of an investigation (significant findings to draw conclusions from)
12. Better Use of P-values (Fisher 1925; Robinson and Wainer 2002)
- IF you use a frequentist approach, then:
- Report the actual P-value and the effect size plus a measure of its precision
- Do not make reject / fail-to-reject decisions; use the rules of thumb below instead (a short sketch follows this list)
- If P < 0.05, report evidence of an effect and look to confirm it with other studies
- If 0.05 < P < 0.2, report that evidence exists for further testing of the hypothesis with an improved design (replication); state that the result leans in a certain direction
- If P > 0.2, report that if there is an effect, it is too small to detect with the current experimental design
- If you are doing a one-time experiment, then alpha should be reduced well below 0.05
- Do not interpret P as the probability of Ho given the data; it is the probability of the data given a true Ho
- If you want to discuss the likelihood of a hypothesis, I-T or Bayesian approaches are more appropriate
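A tiny sketch that encodes the reporting rules of thumb above as text output; the cutoffs and wording follow the slide's guidance, and the function name and example numbers are purely illustrative.

```python
# Sketch of the reporting rules of thumb (cutoffs from the slide; these are
# guidance for wording a result, not hard decision rules).
def report(p: float, effect: float, se: float) -> str:
    base = f"effect = {effect:.2f} (SE {se:.2f}), P = {p:.3f}: "
    if p < 0.05:
        return base + "evidence of an effect; seek confirmation in other studies."
    if p < 0.2:
        return base + "suggestive; retest with an improved, replicated design."
    return base + "any effect is too small to detect with this design."

print(report(0.03, 1.4, 0.6))
print(report(0.12, 0.8, 0.5))
print(report(0.45, 0.2, 0.5))
```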
13. Metareplication (Johnson 2002)
- This approach gets away from individual P-values by making inference in the context of prior related findings
- A Bayesian approach following Fisher's lead
- Look for multiple studies that point in a common direction rather than a single definitive study with a low P-value
- Replication of studies (metareplication) is the key
- Exploit the value of small studies, each of which may not support a definitive conclusion on its own
- "Truth lies at the intersection of independent lies" (Levins 1966)
- Although independent studies may each suffer from various shortcomings (small n, etc.), if they paint substantially similar pictures, we can have confidence in what we see
14. Making Management Recommendations
- Place less emphasis on the significant finding of an individual study
- Use estimates of effect size and precision from individual studies in a meta-analysis to determine whether effects are consistent before making management recommendations (a sketch follows this list)
- Look for truly replicated studies with consistent findings: different methods, different locations, different observers
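One hedged way to combine per-study estimates is a fixed-effect, inverse-variance weighted average (hypothetical numbers; NumPy assumed). This is a sketch of the general idea, not necessarily the specific meta-analytic method the cited authors recommend.

```python
# Sketch: combine effect sizes from independent studies by inverse-variance
# (fixed-effect) weighting -- one common way to look for a consistent effect
# across replicated studies. Numbers are invented.
import numpy as np

effects = np.array([0.8, 1.1, 0.6, 0.9])    # per-study effect estimates
ses = np.array([0.5, 0.6, 0.4, 0.7])        # per-study standard errors

w = 1.0 / ses**2                            # weight = 1 / variance
pooled = np.sum(w * effects) / np.sum(w)    # pooled effect estimate
pooled_se = np.sqrt(1.0 / np.sum(w))        # SE of the pooled effect

print(f"pooled effect = {pooled:.2f}, SE = {pooled_se:.2f}, "
      f"95% CI = ({pooled - 1.96*pooled_se:.2f}, {pooled + 1.96*pooled_se:.2f})")
```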
15. Dredging Data Along the Way
- Dredging data is not bad; it is part of the creative process. Analyzing dredged data with traditional statistical methods, however, violates their assumptions
- Surprising findings should be heralded and used to stimulate new hypotheses and experiments
- Put dredged findings in the Discussion, not the Results
- Admit it when you dredge
- Use dredging to screen for possible effects to be considered in future studies
16.
- Any single study can yield a P-value, but only consistency among replicated studies will advance our science (Johnson 2002)
17. If YOU Are Doing Research, YOU MUST Read
- Anderson, D. R., W. A. Link, D. H. Johnson, and K. P. Burnham. 2001. Suggestions for presenting the results of data analysis. Journal of Wildlife Management 65:373-378.
- Anderson, D. R., K. P. Burnham, and W. L. Thompson. 2000. Null hypothesis testing: problems, prevalence, and an alternative. Journal of Wildlife Management 64:912-923.
- Burnham, K. P., and D. R. Anderson. 1998. Model selection and inference: a practical information-theoretic approach. Springer-Verlag, New York.
- Chamberlin, T. C. 1890. The method of multiple working hypotheses. Science 148:754-759 (1965 reprint).
- Cherry, S. 1998. Statistical tests in publications of The Wildlife Society. Wildlife Society Bulletin 26:947-953.
- Johnson, D. H. 1999. The insignificance of statistical significance testing. Journal of Wildlife Management 63:763-772.
- Robinson, D. H., and H. Wainer. 2002. On the past and future of null hypothesis significance testing. Journal of Wildlife Management 66:263-271.
- Fisher, R. A. 1925. Theory of statistical estimation. Proceedings of the Cambridge Philosophical Society 22:700-725.
- Fisher, R. A. 1928. Statistical methods for research workers. 2nd edition. Oliver and Boyd, London.
- Anderson, D. R., and K. P. Burnham. 2002. Avoiding pitfalls when using information-theoretic methods. Journal of Wildlife Management 66:912-918.
- Akaike, H. 1973. Information theory as an extension of the maximum likelihood principle. Pp. 267-281 in B. N. Petrov and F. Csaki, eds. Second international symposium on information theory. Akademiai Kiado, Budapest.
- Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19:716-723.
- Johnson, D. H. 2002. The importance of replication in wildlife research. Journal of Wildlife Management 66:919-932.
- Johnson, D. H. 2002. The role of hypothesis testing in wildlife science. Journal of Wildlife Management 66:272-276.
- Guthery, F. S., J. J. Lusk, and M. J. Peterson. 2001. The fall of the null hypothesis: liabilities and opportunities. Journal of Wildlife Management 65:379-384.
- Hurlbert, S. H. 1984. Pseudoreplication and the design of ecological field experiments. Ecological Monographs 54:187-211.
- Romesburg, H. C. 1981. Wildlife science: gaining reliable knowledge. Journal of Wildlife Management 45:293-313.