Title: Using Statistical Methods for Environmental Science and Management
1 Using Statistical Methods for Environmental Science and Management
- Graham McBride, NIWA, Hamilton
- g.mcbride@niwa.co.nz
- Statistics Teachers' Day, 25 November 2008
- What do statisticians really do?
2 THE ROLE OF STATISTICAL METHODS: MY VIEW
- Separate randomness from pattern
- Make inferences about the world, based on data from samples
- Help to design sampling programmes (use resources efficiently)
- Help to establish cause and effect
- Can't prove anything with statistics
3 Three kinds of lies: insult, or compliment?
- "There are three kinds of lies: lies, damned lies, and statistics."
- Who said that?
- Mark Twain (1835-1910)
- "Figures often beguile me, particularly when I have the arranging of them myself"
- Benjamin Disraeli (1804-1881)
- Sought to discredit true British soldier casualty figures in the Crimean War (1853-1856)
- Who came first? (Twain cites Disraeli!)
4 What you should do
- Establish the context of your work (what do people want to know, and why do they want to know that?)
- Consult with others, e.g., to discuss whether a proposed sampling programme can actually be done
- Discuss the appropriate burden of proof (e.g., drinking-water standards minimise the consumer's risk, not the producer's risk)
5 What you should not do
- Confuse association and causation (pp. 267-8 of Barton, Sigma Mathematics)
- Ignore other lines of evidence (Bradford-Hill criteria), such as:
- Can the cause reach the location of the effect?
- Is the finding plausible?
- Can you explain inconsistencies with other evidence?
- Be ignorant of how statistical procedures work
- "The computer said so"
6 What you should not do
- Believe that there is only one statistically correct way of analysing data
- There are lots of good ways, but many more bad and wrong ways too
- Not consider bias and imprecision in your data
7 Bias and Imprecision
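The bias/imprecision distinction can be made concrete with a small simulation. This is a minimal sketch, not the author's method: bias is taken as the mean measurement minus a known true value (systematic error), and imprecision as the sample standard deviation of repeated measurements (random error). The true value, offsets and spreads are hypothetical.

```python
import random
import statistics


def bias_and_imprecision(measurements, true_value):
    """Return (bias, imprecision) of repeated measurements of a known quantity.

    bias        = mean(measurements) - true_value   (systematic error)
    imprecision = sample standard deviation         (random error)
    """
    bias = statistics.mean(measurements) - true_value
    imprecision = statistics.stdev(measurements)
    return bias, imprecision


rng = random.Random(1)
TRUE_VALUE = 500.0  # hypothetical true concentration (mg/m3)

# Hypothetical measurement methods: (systematic offset, random spread)
methods = {
    "unbiased, precise": (0.0, 5.0),
    "biased, precise": (50.0, 5.0),
    "unbiased, imprecise": (0.0, 60.0),
}
for name, (offset, spread) in methods.items():
    sample = [TRUE_VALUE + offset + rng.gauss(0.0, spread) for _ in range(200)]
    b, s = bias_and_imprecision(sample, TRUE_VALUE)
    print(f"{name:20s} bias = {b:6.1f}   imprecision (SD) = {s:5.1f}")
```

A biased but precise method gives tightly clustered values around the wrong number; an unbiased but imprecise one is right on average but unreliable for any single measurement.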
8 What you might have to do
- Use non-standard methods, e.g.,
- non-parametric (rank) methods for highly skewed data (very common in aquatic studies)
- e.g., linear trend or monotonic trend?
- Read rather widely
- Statistics is not a cut-and-dried subject: there are still some fundamental debates about statistical inference, especially the Bayesians versus the frequentists; both approaches have their place
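To illustrate the linear-versus-monotonic question, the sketch below compares Pearson's correlation (which measures linear association) with Spearman's rank correlation (Pearson's correlation computed on the ranks, with tied values given their average rank) on a hypothetical skewed series that rises monotonically but not linearly. The series and the helper names are illustrative assumptions.

```python
import math


def ranks(values):
    """Ranks 1..n, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


years = list(range(1989, 2006))
# Hypothetical skewed series that rises every year, but exponentially
conc = [10 * 1.3 ** (t - 1989) for t in years]
print(round(pearson(years, conc), 3))   # below 1: a straight line fits poorly
print(round(spearman(years, conc), 3))  # 1.0: the trend is perfectly monotonic
```

Because Spearman's coefficient depends only on the ordering of the values, it is unaffected by skewness and reaches 1 for any strictly increasing series.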
9 What you also might have to do
- Answer this question: What is P?
- Result of a hypothesis test
- Used (over-used!) routinely, so you'll need to know
- P = Prob(data at least as extreme as those observed, if the tested hypothesis is true)
- Not the probability of the truth of the hypothesis
- Relate results to confidence intervals
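As a worked illustration of that definition, the sketch below computes an exact one-sided binomial (sign-test) P value: the probability, if the tested hypothesis is true, of data at least as extreme as those observed. The 9-out-of-10 scenario is hypothetical.

```python
from math import comb


def binomial_upper_tail_p(k, n, p=0.5):
    """One-sided P = Prob(X >= k) for X ~ Binomial(n, p).

    This is the probability of data at least as extreme as those observed
    (k or more 'successes') IF the tested hypothesis (success probability p)
    is true. It is NOT the probability that the hypothesis is true.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 9 of 10 sites show an increase. If increases and
# decreases were equally likely (p = 0.5), how surprising is that?
print(binomial_upper_tail_p(9, 10))  # 11/1024, about 0.0107
```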
10 EXAMPLE: Increasing pressure on freshwaters
Is there evidence of associated deterioration (or improvements) in rivers?
11 [Figure: fertilizer consumption (tonnes) of total nitrogen and total phosphorus, and cow numbers (millions), 1988-2003]
Data sources: (1) Fertilizer consumption: UN Food and Agriculture Organisation; (2) Cows: Livestock Improvement, NZ Dairy Statistics
12 A National River Water Quality Network for New Zealand (1989)
- GOAL
- To provide scientifically defensible information on the important physical, chemical, and biological characteristics of a selection of the nation's rivers as a basis for advising the Minister of Science and other Ministers of the Crown of the trends and status of these waters
- OBJECTIVES
- Detect significant trends in water quality
- Develop better understanding of water resources, and hence better assist their management
13 NRWQN structure
- 77 sites on 35 rivers
- All sites have reliable flow data
- Sites are sampled by regional Field Teams
- 14 WQ parameters (monthly)
- Data available (search for WQIS at www.niwa.co.nz)
15 WQ state and land use
Correlations with pasture:
Parameter           Correlation
Temperature          0.50
Conductivity         0.55
pH                  -0.19
Dissolved oxygen    -0.17
Visual clarity      -0.60
NOx-N                0.71
NH4-N                0.77
Total nitrogen       0.84
DRP                  0.67
Total phosphorus     0.74
E. coli              0.79
P < 0.001; Spearman rank correlation
16 WQ Trends 1989-2005
- Calculated annual medians from monthly data at each site for each parameter
- Took the 77 data points for each year and calculated the 5th, 50th, and 95th percentile values
- The 50th percentile gives us a picture of what is happening in a "national average" river in terms of annual median water quality data
- The 5th and 95th percentiles tell us about changes over time in our best and worst rivers
- Trends in these values were assessed using the Spearman rank correlation coefficient (rS).
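The first two steps above can be sketched as follows, assuming monthly records arrive as hypothetical (site, year, value) tuples. Percentiles have several competing definitions; this sketch uses Python's `statistics.quantiles` with `method="inclusive"`, which may not match the definition used in the original analysis.

```python
import statistics
from collections import defaultdict


def annual_medians(monthly):
    """monthly: iterable of (site, year, value). Returns {(site, year): median}."""
    groups = defaultdict(list)
    for site, year, value in monthly:
        groups[(site, year)].append(value)
    return {key: statistics.median(vals) for key, vals in groups.items()}


def national_percentiles(medians):
    """For each year, the 5th, 50th and 95th percentiles of the site medians."""
    by_year = defaultdict(list)
    for (site, year), med in medians.items():
        by_year[year].append(med)
    out = {}
    for year, vals in sorted(by_year.items()):
        # n=20 gives 19 cut points: index 0 = 5th, 9 = 50th, 18 = 95th percentile
        cuts = statistics.quantiles(vals, n=20, method="inclusive")
        out[year] = (cuts[0], cuts[9], cuts[18])
    return out


# Hypothetical usage: 3 sites x 2 years x 12 monthly samples
monthly = [(s, y, 100 * (s + 1) + m + 10 * (y - 2000))
           for s in range(3) for y in (2000, 2001) for m in range(12)]
print(national_percentiles(annual_medians(monthly)))
```

Each resulting percentile series (5th, 50th, 95th) could then be correlated with year using Spearman's rS to assess trend.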
17 NOx-N Trends 1989-2005
[Figure: NOx-N (mg/m3) versus year, 1988-2005]
Concentrations of NOx-N increased dramatically between 1989 and 2005 in our most enriched rivers.
18 Trends 1989-2005
- Results indicative of:
- Warming in our coolest rivers
- Drops in pH
- Increasing nitrogen enrichment
- Decreases in BOD5 in most rivers
19 Trends 1989-2003
- More formal analysis of trends carried out on monthly data (1989-2003) at all 77 sites
- Seasonal Kendall test
- Data were flow-adjusted using LOWESS (many WQ parameters can be strongly influenced by discharge)
- Used a binomial test to indicate a national trend
- Discriminate between significant (i.e., P < 0.05) and meaningful trends (i.e., P < 0.05 and slope > 1% of median value per annum).
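A minimal sketch of the Seasonal Kendall test and the significant/meaningful classification described above. Assumptions: one value per season per year with no gaps, no ties, no correction for serial correlation, and no LOWESS flow adjustment (which would be applied to the data beforehand); the slope is the median of all within-season pairwise slopes (the seasonal Kendall slope estimator), and the 1% criterion is applied to the slope's magnitude. The data are hypothetical.

```python
import math
import statistics


def seasonal_kendall(series_by_season):
    """Seasonal Kendall test (no ties, no serial-correlation correction).

    series_by_season: list of lists; each inner list holds one season's values
    (e.g. every January) in year order, one value per year, no gaps.
    Returns (S, Z, two-sided P, seasonal Kendall slope per year).
    """
    s_total = 0
    var_s = 0.0
    slopes = []
    for x in series_by_season:
        n = len(x)
        for i in range(n):
            for j in range(i + 1, n):
                diff = x[j] - x[i]
                s_total += (diff > 0) - (diff < 0)  # sign of later-minus-earlier difference
                slopes.append(diff / (j - i))       # pairwise slope, units per year
        var_s += n * (n - 1) * (2 * n + 5) / 18     # Mann-Kendall variance for this season
    if s_total > 0:
        z = (s_total - 1) / math.sqrt(var_s)       # continuity correction
    elif s_total < 0:
        z = (s_total + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))           # two-sided P = 2 * (1 - Phi(|z|))
    return s_total, z, p, statistics.median(slopes)


def classify(p, slope, median_value, alpha=0.05, min_rel_slope=0.01):
    """'meaningful' = P < alpha and |slope| > 1% of the median per annum."""
    if p >= alpha:
        return "not significant"
    if abs(slope) > min_rel_slope * median_value:
        return "meaningful"
    return "significant only"


# Hypothetical data: 12 monthly "seasons" x 15 years, rising 2 units per year
data = [[100 + 5 * month + 2 * year for year in range(15)] for month in range(12)]
S, z, p, slope = seasonal_kendall(data)
all_values = [v for season in data for v in season]
print(S, slope, classify(p, slope, statistics.median(all_values)))  # 1260 2.0 meaningful
```

Counting how many sites show increasing versus decreasing trends then gives the input for a binomial test of a national trend.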
20 Trends in TN
Total nitrogen exhibited a strong increasing trend at the national scale during 1989-2003 (P < 0.001). Increasing trends in TN were particularly evident in the South Island, where 25 of 33 sites showed meaningful increases.
21 Trends in DRP
There was a strong national trend of increasing DRP concentrations during 1989-2003 (P < 0.001). This result contrasts with the relatively weak trends observed for 1989-2005.
22 Summary of trends 1989-2003
[Figure legend: no significant trend; significant improving trend; significant deteriorating trend]
23 Links between land use and trends
The magnitude of trends in DRP increases with pastoral land use.
24 Land use and trends
Parameter                       RSKSE    SKSE
Temperature                      0.20    0.19
Conductivity                     0.40    0.47
pH                              -0.28   -0.28
Dissolved oxygen                -0.27   -0.27
Visual clarity                  -0.11   -0.26
Oxidised nitrogen                0.23    0.30
Ammoniacal nitrogen              0.68    0.29
Total nitrogen                  -0.01    0.35
Dissolved reactive phosphorus    0.48    0.59
Total phosphorus                 0.18    0.31
Spearman rank correlation coefficients (bold: P < 0.01). SKSE = seasonal Kendall slope estimator; RSKSE = relative SKSE.
25 Conclusions
- Strong associations between nutrient concentrations and pastoral land cover at the national scale (State)
- Rivers draining large areas of pastoral land have deteriorated significantly over the last 17 years with respect to nitrogen concentrations (Trends)
- The magnitude of trends in some parameters is associated with the extent of pastoral land use
- Decreasing trends in NH4-N and BOD5 indicative of improvements in point-source management
- Increasing trends in nutrients indicative of increasing pressure from agriculture
26 EXAMPLE: Water quality-human health risk assessment (quantitative approach), Christchurch City Wastewater Outfall
28 Quantitative Microbial Health Risk Assessment (QMHRA)
- Identify hazards (pathogens)
- Quantify exposure (swimming, shellfish consumption)
- Assess dose-response
- Characterise risk
29 Hazard vs. Risk
- Hazards can cause harm, but only after exposure
- Risk cannot occur without exposure
- Can have hazard without risk
- But not vice versa!
30 Christchurch hazards: viruses only
- From an extensive list (next slide)
- Swimming
- adenovirus (respiratory)
- rotavirus
- enterovirus (Echovirus 12)
- Shellfish consumption (raw)
- enteroviruses
- rotavirus
- hepatitis A
32 Dose-response curves
33 Accounting for variability and uncertainty
- Exposure is variable
- e.g., individuals' swim duration
- Dose-response is uncertain
- only some pathogen strains in clinical trials
- trials limited to healthy adults
- Describe using statistical distributions in a
Monte Carlo analysis
34 Scenariosis!
- 1,000 people × 1,000 occasions
- 8 beaches
- 2 influent virus conditions (normal/outbreak)
- 2 seasons (summer/winter)
- 3 viruses for 2 activities
- 2 outfall lengths
- 2 virus inactivation regimes
- 2 UV options (with/without)
- Total: 1536 × 10^6 calculations
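The scenario count is simply the product of the listed factors, with "3 viruses for 2 activities" counting as 3 × 2 = 6 virus-activity combinations. A quick arithmetic check:

```python
from math import prod

# Factors as listed on the slide
factors = {
    "people": 1_000,
    "occasions": 1_000,
    "beaches": 8,
    "influent virus conditions": 2,
    "seasons": 2,
    "virus-activity combinations (3 viruses x 2 activities)": 3 * 2,
    "outfall lengths": 2,
    "virus inactivation regimes": 2,
    "UV options": 2,
}
total = prod(factors.values())
print(f"{total:,} calculations = {total // 10**6} x 10^6")
# 1,536,000,000 calculations = 1536 x 10^6
```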
35 Calculation sequence
36 Dose-response models
- Constant susceptibility: simple exponential (d = average dose, Prinf = infection probability)
- Variable susceptibility: beta-Poisson
- Calculations performed using @RISK (an Excel plug-in)
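The two dose-response models can be written as short functions. The exponential form follows from assuming the number of organisms ingested is Poisson with mean d and that each organism independently causes infection with probability r, so Prinf = 1 - exp(-r d). For the beta-Poisson model this sketch uses the widely used approximate form Prinf = 1 - (1 + d/β)^(-α), which is reasonable when β is large and α is much smaller than β. Parameter values are hypothetical, not fitted values for any of the viruses above.

```python
import math


def p_inf_exponential(d, r):
    """Exponential model (constant susceptibility): Prinf = 1 - exp(-r * d)."""
    return 1.0 - math.exp(-r * d)


def p_inf_beta_poisson(d, alpha, beta):
    """Approximate beta-Poisson model (susceptibility varies between hosts):
    Prinf = 1 - (1 + d / beta) ** (-alpha)."""
    return 1.0 - (1.0 + d / beta) ** (-alpha)


# Hypothetical parameter values, for illustration only
for dose in (0.1, 1, 10, 100):
    print(dose,
          round(p_inf_exponential(dose, r=0.4), 3),
          round(p_inf_beta_poisson(dose, alpha=0.3, beta=40.0), 3))
```

The beta-Poisson curve is flatter than the exponential: allowing susceptibility to vary means some people escape infection even at high doses.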
37-43 Occasion 1, Individuals 1, 2, 3, ..., 1000; Occasion 2, Individuals 1, 2, 3, ...
[Figure sequence: for each occasion and individual, Volume ingested → Dose → Probability of infection → Binomial distribution → Infected?]
44 Characterising the results
- Risk percentiles: percent of time the risk is below a stated value
- IIR: Individual Infection Risk (total number of calculated infections divided by total number of exposures)
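The occasion-by-individual calculation sequence and the two summary measures can be sketched as a simple Monte Carlo loop. Everything quantitative here is a hypothetical placeholder: log-normal virus concentrations varying between occasions, log-normal ingested volumes varying between individuals, and the exponential dose-response model with an assumed r. It is not a reconstruction of the @RISK model used in the study.

```python
import math
import random
import statistics


def simulate(n_occasions, n_people, r, seed=0):
    """Monte Carlo sketch: returns (IIR, list of per-occasion risks).

    All distributions and parameters are hypothetical placeholders.
    """
    rng = random.Random(seed)
    occasion_risks = []
    infections = 0
    for _ in range(n_occasions):
        conc = rng.lognormvariate(math.log(0.05), 1.0)  # viruses per mL (hypothetical)
        infected_now = 0
        for _ in range(n_people):
            volume = rng.lognormvariate(math.log(20.0), 0.8)  # mL ingested (hypothetical)
            dose = conc * volume
            p_inf = 1.0 - math.exp(-r * dose)  # exponential dose-response
            if rng.random() < p_inf:           # one Bernoulli (binomial, n = 1) trial
                infected_now += 1
        infections += infected_now
        occasion_risks.append(infected_now / n_people)
    iir = infections / (n_occasions * n_people)  # infections / exposures
    return iir, occasion_risks


iir, risks = simulate(n_occasions=200, n_people=200, r=0.01)
cuts = statistics.quantiles(risks, n=20, method="inclusive")
print(f"IIR = {iir:.4f}")
print(f"95% of occasions have risk below {cuts[18]:.4f}")
```

Here the risk percentile is read from the distribution of per-occasion risks: the 95th percentile is the risk value not exceeded on 95% of occasions.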
45 Results
South New Brighton
Integers are cases per 1000 exposures
46 IIR, normal influent, South Brighton: adenovirus, swim
Numbers are percentages. MfE/MoH (2003) guidelines: < 0.3, Very good.
47 IIR, normal influent, South Brighton: rotavirus, shellfish
Numbers are percentages.
48 IIR, outbreak influent, South Brighton: adenovirus, swim
Numbers are percentages. MfE/MoH (2003) guidelines: 1.9-3.9, Fair-Poor.
49 IIR, outbreak influent, South Brighton: rotavirus, shellfish
Numbers are percentages.
50 IIR, outbreak influent, South Brighton: hepatitis A, shellfish
Numbers are percentages.
51 Statistical modelling can reveal important information gaps
- Bioaccumulation factors for NZ shellfish
- Dose-response for norovirus (new study published)
- Detailed exposure data (ingestion rates etc.)
- Constancy of virulence?
- Campylobacter in shellfish?
- Better methods for uncertainty analysis
- Better models for illness, cf. infection
52 Conclusions
- Longer outfall without UV still has higher risk than shorter outfall with UV
- But risks are low
- What if UV doesn't work 24/7 (technology breakdown, power outage, etc.)?
- Decision: longer outfall, no UV
53 Semi-quantitative approach
- Use when hazards and exposures are less well-defined and more widespread
- Paradigm is:
- Risk score = Likelihood × Consequences
- Use scores as a relative measure of risk
- Use a panel of experts; may solicit a list of hazards from the affected community
56 Hazards
- Pathogens (from humans and animals)
- Chemicals
- Algal toxins
- Physical objects
57 End-points (exposures)
- Recreational contact
- Drinking water consumption
- Consumption of aquatic organisms
- Food? (more difficult)
58 The delivery chain
- Can be called a "hazardous event"
- How does the hazard get from its origin to the
point of exposure?
59 Likelihood
- Probability of an exposure event (for at least one person) in a year (cf. any year) to a sufficient degree to cause harm. Scores:
Score  Description         Annual probability (%)
0      Impossible          0
1      Extremely unlikely  <1
2      Very unlikely       1-5
4      Unlikely            6-40
6      Even                41-60
8      Likely              61-95
10     Very likely         >95
60 Consequences
Score  Scale (% of total community)  Severity         Duration
1      <1                            Asymptomatic     Day
2      1-5                           Discomfort       Week
3      5-10                          Visit doctor     Month
4      10-20                         Hospitalisation  Year
5      >20                           Death            Permanent
Refers to health effect.
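The likelihood bands and the "Risk score = Likelihood × Consequences" paradigm can be sketched as below. The slides do not say how scale, severity and duration combine into a single consequence score, so this sketch takes the consequence score as a given input. The hazards, probabilities and consequence scores are hypothetical, and the treatment of probabilities falling between the listed bands (e.g. 5.5%) is an assumption.

```python
def likelihood_score(prob_percent):
    """Likelihood score from the annual probability (%) of a harmful exposure event.

    Bands follow the likelihood table; values between bands are assigned
    to the lower band here (an assumption).
    """
    if prob_percent <= 0:
        return 0   # impossible
    if prob_percent < 1:
        return 1   # extremely unlikely
    if prob_percent <= 5:
        return 2   # very unlikely
    if prob_percent <= 40:
        return 4   # unlikely
    if prob_percent <= 60:
        return 6   # even
    if prob_percent <= 95:
        return 8   # likely
    return 10      # very likely


def rank_hazards(hazards):
    """hazards: {name: (annual probability %, consequence score)}.

    Risk score = likelihood x consequences; returns (name, score), highest first.
    """
    scores = {name: likelihood_score(p) * c for name, (p, c) in hazards.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical hazard/exposure combinations, for illustration only
example = {
    "pathogens, recreational contact": (70, 3),
    "algal toxins, drinking water": (10, 4),
    "chemicals, shellfish consumption": (3, 2),
}
print(rank_hazards(example))
# [('pathogens, recreational contact', 24), ('algal toxins, drinking water', 16),
#  ('chemicals, shellfish consumption', 4)]
```

Because the scores are only a relative measure, the ranking matters more than the numbers themselves.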
61 Typical results
62 Conclusions
- Use QRA for well-defined local problems
- Use semi-quantitative methods for broader-scale problems
- Risk assessment identifies many knowledge gaps, some needing urgent attention
- The most difficult gap is often the delivery chain
- Can update assessments with new data
- Especially useful in ranking risks
63 EXAMPLE: Compliance with Drinking Water Standards
How to assess compliance with microbial limits?
- Can't sample everything
- Need high assurance that the supply isn't contaminated in some assessment period; it can't be fully assured
- MoH then said: "We want to be 95% confident that the water is uncontaminated for 95% of the time."
What should the compliance rule be?
64 What kind of a question is this?
- Bayesian
- It asks about the probability of a hypothesis, given data that we will collect
- Frequentist (classical) methods ask about the probability of data assuming a hypothesis to be true
- Precautionary (not permissive)
- Benefit of doubt goes to the consumer, not to the supplier
- One-sided
- Hypothesis to be tested is breach, not compliance
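One way to turn "95% confident that the water is uncontaminated for 95% of the time" into a compliance rule is sketched below. It assumes samples are independent random draws over time and a uniform Beta(1, 1) prior on the fraction of time the supply is contaminated, so k positives in n samples give a Beta(k + 1, n - k + 1) posterior. It then finds the smallest n for which the posterior probability that the contaminated fraction is below 5% reaches 0.95. This is an illustrative sketch, not necessarily the rule adopted in the Standards.

```python
from math import comb


def prob_compliant(n, k, max_fraction=0.05):
    """Posterior Prob(contaminated fraction < max_fraction | k positives in n).

    Assumes a uniform Beta(1, 1) prior, so the posterior is Beta(k+1, n-k+1).
    Uses the identity I_x(a, b) = Prob(Binomial(a + b - 1, x) >= a) for
    integer a, b, so no special functions are needed.
    """
    m = n + 1
    return sum(comb(m, i) * max_fraction**i * (1 - max_fraction) ** (m - i)
               for i in range(k + 1, m + 1))


def min_samples(k, confidence=0.95, max_fraction=0.05):
    """Smallest n for which k positives still give the required confidence."""
    n = k + 1
    while prob_compliant(n, k, max_fraction) < confidence:
        n += 1
    return n


for k in range(3):
    print(k, min_samples(k))
```

With no positives allowed this gives n = 58: 1 - 0.95^59 ≈ 0.952 just exceeds 0.95, whereas 1 - 0.95^58 ≈ 0.949 does not. Each additional allowed positive requires more samples.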
65 Results
66 Results
67 Policy Implications
- Results in Table 8.2 now incorporated into 2005 Drinking-water Standards for New Zealand
- http://www.moh.govt.nz/moh.nsf/0/12F2D7FFADC900A4CC256FAF0007E8A0/File/drinkingwaterstandardsnz-2005.pdf
68 EXAMPLE: Effect of microbial contamination on swimmers' health
Epidemiological study at 7 NZ beaches
70 Main Findings
- Using generalized regression models
- Evidence of respiratory illness effects related to microbial contamination
- Human- and animal-waste impacted beaches not separable in terms of health effects
- Both were separable from control beaches
71 Policy implications
- Human and animal wastes no longer distinguished in terms of health risks
- Result incorporated into new guidelines
- http://www.mfe.govt.nz/publications/water/microbiological-quality-jun03/