Title: IV1
1. WMO course: Statistics and Climatology, Lecture IV
Dr. Bertrand Timbal (Acknowledgement: Dr. Neville Nicholls)
Regional Meteorological Training Centre, Tehran, Iran, December 2003
2. Statistics of the Climate system: Significance testing and climate research
Statistics and Climatology, Lecture IV
- Overview
- Significance testing: the null hypothesis
- Example in climate research
- Criticism of null hypothesis significance testing
- Alternatives to null hypothesis significance testing
3. Motivation
- Scientific research and data exploration are about testing hypotheses.
- The confidence to place in the answer obtained is as important as the answer itself.
- Significance testing is a procedure for making rational decisions about the reality of effects.
4. Historical example: The Trial of the Pyx
Figure: The Pyx Jury counting coins by hand
- Ancient ceremony (since at least 1248) of the Royal Mint
- One coin was taken out of every day's production
- Stored in a box called the Pyx (originally in Westminster Abbey)
- Every 3-4 years the contents of the Pyx were counted and assayed
- The mean weight of the coins had to be within a certain range
- Else, the Master "will be at the prince's mercy or will in life and members"
- In 1124 all mint masters had their right hands cut off
5. Historical example: Laplace and atmospheric tides
- Calculated the mean change in barometric pressure, 9am to 3pm, 1816-1826, Paris
- Compared seasonal differences of this change with the annual mean change
- Found that the Feb-Apr and Nov-Jan mean changes differed significantly from the overall mean
- But differences of other quarters from the overall mean could without improbability be attributed solely to the irregularities of chance
6. Historic: Fisher's Test of Significance
R. A. Fisher, 1890-1962
Steps in Fisher hypothesis testing:
- Identify the null hypothesis H0
- Determine the appropriate test statistic and its distribution under the assumption that H0 is true
- Calculate the test statistic from the data
- Determine the achieved significance level that corresponds to the test statistic under the assumption that H0 is true
"The value for Q is therefore significant on the higher standard (1 per cent) and that for N2 at the lower standard (5 per cent)" - R. A. Fisher
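The four Fisher steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test statistic is a z statistic with a known standard deviation `sigma`, and the data values and the function name `achieved_significance` are hypothetical, not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist, mean


def achieved_significance(sample, mu0, sigma):
    """Fisher-style test of H0: population mean == mu0 (sigma assumed known).

    Returns the z statistic and the two-sided achieved significance level.
    """
    n = len(sample)
    # Steps 2-3: under H0 the statistic z is standard normal; compute it.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Step 4: probability, under H0, of a value at least as extreme as z.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical anomalies (illustrative values only).
sample = [0.8, 1.1, -0.2, 0.9, 1.4, 0.3, 0.7, 1.0, 0.5]
z, p = achieved_significance(sample, mu0=0.0, sigma=1.0)
print(f"z = {z:.3f}, achieved significance level = {p:.4f}")
```

For these made-up values the achieved level lies between 1 per cent and 5 per cent, so in Fisher's wording the result would be significant on the lower standard but not on the higher one.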
7. Historic: Neyman and Pearson Hypothesis Testing
Jerzy Neyman, 1894-1981
Egon Pearson, 1895-1980
Steps in the Neyman-Pearson test:
- Identify a hypothesis of interest HB and a complementary hypothesis HA
- Determine the appropriate test statistic and its distribution under the assumption that HA is true
- Specify a significance level (α) and determine the corresponding critical value of the test statistic under the assumption that HA is true
- Calculate the test statistic from the data
- Reject HA and accept HB if the test statistic is further than the critical value from the expected value
- Otherwise accept HA
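The contrast with Fisher's approach is that the significance level is fixed in advance and the outcome is a decision rather than a p-value. A minimal sketch, again assuming a z statistic with known `sigma`; the data and the function name `neyman_pearson_test` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def neyman_pearson_test(sample, mu_a, sigma, alpha=0.05):
    """Fixed-level test of HA: population mean == mu_a (sigma assumed known)."""
    n = len(sample)
    # Test statistic: standard normal if HA is true.
    z = (mean(sample) - mu_a) / (sigma / sqrt(n))
    # Critical value is set by alpha before the data are examined.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    decision = "reject HA, accept HB" if abs(z) > z_crit else "accept HA"
    return decision, z, z_crit


# Hypothetical anomalies (illustrative values only).
sample = [0.8, 1.1, -0.2, 0.9, 1.4, 0.3, 0.7, 1.0, 0.5]
decision_05, z, crit_05 = neyman_pearson_test(sample, mu_a=0.0, sigma=1.0, alpha=0.05)
decision_01, _, crit_01 = neyman_pearson_test(sample, mu_a=0.0, sigma=1.0, alpha=0.01)
print("alpha = 0.05:", decision_05)
print("alpha = 0.01:", decision_01)
```

The same data lead to opposite decisions at the two levels, which is why the level has to be chosen before the test statistic is calculated.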
8. Null Hypothesis Significance Testing: The modern hybrid
Neither Fisher nor Neyman and Pearson would have been satisfied with this hybrid
- Two hypotheses: a null hypothesis (H0) and an alternative hypothesis (H1)
- A test statistic (R) is calculated from the data and compared with its known distribution under the assumption that H0 is true
- H0 is rejected for p-values less than a specified α (e.g., 0.05)
9. Test statistic: Student's t test
- T measures the distance between the sample mean and the hypothesised mean
- H0: µX = µ0
- The mean of the series is X̄; the standard deviation is S
- For large values of T, we reject H0 in favour of Ha: µX not equal to µ0
- When H0 is true, T has a Student's t distribution with n-1 df
- Both one-sided and two-sided tests are possible; the one-sided test has more power
- A large sample size (n) gives more power
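As a worked illustration, the usual one-sample statistic is T = (X̄ - µ0) / (S / √n), compared with a tabulated Student's t value for n-1 df. The sketch below assumes hypothetical data of size 10 and uses the standard tabulated two-sided 5% critical value for 9 df (2.262); the function name `t_statistic` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """One-sample T = (mean - mu0) / (S / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))  # stdev uses n - 1
    return t, n - 1


# Tabulated two-sided 5% critical value of Student's t for 9 df.
T_CRIT_9DF_TWO_SIDED_05 = 2.262

# Hypothetical series of 10 values (illustrative only).
sample = [0.5, 1.2, -0.3, 0.8, 1.5, 0.1, 0.9, -0.4, 1.1, 0.6]
t, df = t_statistic(sample, mu0=0.0)
reject_h0 = abs(t) > T_CRIT_9DF_TWO_SIDED_05
print(f"T = {t:.3f} with {df} df; reject H0: {reject_h0}")
```

For a different sample size the critical value has to be looked up again for the matching degrees of freedom.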
11. Test statistic: Fisher's F test
- F measures the distance between the sample variance and the hypothesised variance
- H0: σ²X = σ²Y
- The standard deviation of each series is S (SX and SY)
- For values of F far from 1, we reject H0 in favour of Ha: σ²X not equal to σ²Y
- When H0 is true, F has a Fisher's F distribution with nX-1 and nY-1 df
- The F test is conducted at the 10% significance level to compensate for low power
- A large sample size (n) gives more power
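For reference, the variance-ratio statistic described on this slide is usually written as follows. The symbols n_X, n_Y, X_i and Y_j are notation introduced here for illustration, and the distributional statement assumes two independent, normally distributed samples.

```latex
S_X^2 = \frac{1}{n_X - 1} \sum_{i=1}^{n_X} (X_i - \bar{X})^2, \qquad
S_Y^2 = \frac{1}{n_Y - 1} \sum_{j=1}^{n_Y} (Y_j - \bar{Y})^2

F = \frac{S_X^2}{S_Y^2}, \qquad
F \sim F(n_X - 1,\; n_Y - 1) \quad \text{when } H_0 : \sigma_X^2 = \sigma_Y^2 \text{ is true}
```

Values of F well above or well below 1 are evidence against equal variances.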
16. Other test statistics
- Test on the difference of means, with or without equal variance (T test statistic)
- Paired difference test, e.g. to single out a year within two samples
Summary
- Tests on the mean (one-sided, two-sided, two means, equal variance or not) are based on the T statistic
- Tests on the variance (equality of variance) are based on the F statistic
- Each test relies on a hidden assumption: the model (R)
- An exploratory test will single out that something is unusual
- A confirmatory test can operate at a fixed significance level
- Sample size and the number of sides will impact on the power of the test
- These are parametric tests; non-parametric tests exist
- Serial correlation is a problem with all tests: it reduces the effective sample size
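One common way to quantify the effect of serial correlation is the approximation n_eff = n (1 - r1) / (1 + r1), where r1 is the lag-1 autocorrelation. The sketch below assumes that approximation (appropriate for tests on the mean of an AR(1)-like series); it is not stated in the lecture, and the function names are hypothetical.

```python
from statistics import mean


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation estimate of a series."""
    m = mean(series)
    dev = [x - m for x in series]
    den = sum(d * d for d in dev)
    if den == 0:
        return 0.0  # constant series: no variability to correlate
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    return num / den


def effective_sample_size(series):
    """Approximate effective sample size, n * (1 - r1) / (1 + r1).

    Assumption: AR(1)-type persistence. Negative r1 is clipped to zero so
    the effective size never exceeds the actual sample size.
    """
    n = len(series)
    r1 = max(lag1_autocorrelation(series), 0.0)
    return n * (1.0 - r1) / (1.0 + r1)


# A smooth, strongly persistent series (illustrative values only).
smooth = [1, 2, 3, 4, 5, 6, 7, 8]
print(lag1_autocorrelation(smooth), effective_sample_size(smooth))
```

Replacing n by the effective size in the degrees of freedom makes the T and F tests more conservative when the data are persistent.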
18. Significance testing: Climate examples
Correlations between Darwin MSLP in June-August and Australian region tropical cyclone numbers. Data from 1950-74.
Nicholls, N. A possible method for predicting seasonal tropical cyclone activity in the Australian region. Mon. Weath. Rev., 107, 1221-1224.
19. Significance testing: Climate examples
Frederiksen et al., 2001. Dynamical seasonal forecasts during the 1997/98 El Niño using persisted SST anomalies. Journal of Climate.
Skill of 3-month precipitation forecasts with the BMRC climate model forced with persisted SSTs. Period: April-June 1997 through April-June 1998. Green indicates no significant skill.
20. Significance testing: Climate examples
The sign of the linear trend (1961-1998) is indicated by +/- symbols at each station; bold indicates significant trends (0.05 level).
Manton, M. J. and 26 others, 2001. Trends in extreme daily rainfall and temperature in Southeast Asia and the South Pacific 1961-1998. International Journal of Climatology.
22. Criticisms of NHST in other scientific fields
- "The significance test as currently used is a disaster"
Hunter, J. E., 1997. Needed: A ban on the significance test. Psychological Science, 8, 3-7.
- "Statistical significance testing has involved more fantasy than fact"
Carver, R. P., 1978. The case against statistical significance testing. Harvard Educational Review, 48, 378-398.
- "The Earth is round (p < .05)"
Cohen, J., 1994. American Psychologist, 49, 997-1003.
23. Some criticisms of NHST
- Arbitrary: "surely God loves the .06 nearly as much as the .05" (Rosnow and Rosenthal, 1989)
- A nil H0 (e.g., correlation is 0.0) is usually silly or trivial
- Incorrectly concluding that H0 is true if it is not rejected
- Publication bias
- Power depends on the size of the sample: the probability of (correctly) rejecting H0, with typically-sized climate data sets, is low
- Confusion of the inverse: significance tests do not tell us what we really want to know, i.e., "Given these data, what is the probability that H0 is true?"
- Trends: samples or populations?
24. Criticism: Publication bias
In a field dominated by significance testing:
- Research which yields non-significant results may not be published
- Such unpublished research remains unknown to other researchers
- Other researchers repeat the research independently
- Eventually a significant result occurs, by chance, and is published
- So the published literature consists of false conclusions resulting from errors of the first kind in statistical tests of significance
Sterling, T. D., 1959. Publication decisions and their possible effects on inferences drawn from tests of significance - or vice versa. J. Amer. Statistical Assoc., 54, 30-34.
25. Criticism: sample size dependence
- A correlation of 0.60 is not significant (p < 0.05) if n = 10, but
- A correlation of 0.10 would be significant in a sample of 400
Two-tailed significance levels of the correlation coefficient, for different sample sizes
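The two cases on this slide can be checked with a short calculation. The sketch below assumes the Fisher z-transform approximation (atanh(r) times √(n-3) is roughly standard normal when the population correlation is zero), which is a swapped-in approximation rather than the exact t-based table behind the slide; the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_p_value(r, n):
    """Approximate two-tailed p-value for H0: population correlation == 0.

    Uses the Fisher z-transform: atanh(r) * sqrt(n - 3) ~ N(0, 1) under H0.
    """
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


p_small_sample = correlation_p_value(0.60, 10)
p_large_sample = correlation_p_value(0.10, 400)
print(f"r = 0.60, n = 10:  p = {p_small_sample:.3f}")
print(f"r = 0.10, n = 400: p = {p_large_sample:.3f}")
```

Under this approximation the strong correlation from 10 points stays above 0.05 while the weak correlation from 400 points falls just below it, matching the slide.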
26. Criticism: low probability of rejecting H0
- Assume the real (population) correlation is 0.30
- Samples of size 50 (typical for climate problems)
- Two-sided significance test with α = 0.05
- In about 43% of samples, this would lead to the conclusion "not significant" (the power of the test is only about 0.57)
- That is, the chance of (wrongly) not rejecting the null hypothesis is close to a coin flip.
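The power behind this example can be approximated directly. The sketch below again assumes the Fisher z-transform approximation, so the result is approximate rather than an exact tabulated value; the function name `correlation_test_power` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_test_power(rho, n, alpha=0.05):
    """Approximate probability of rejecting H0: correlation == 0 (two-sided)
    when the population correlation is rho, via the Fisher z-transform."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    # Mean of the test statistic when the true correlation is rho.
    shift = atanh(rho) * sqrt(n - 3)
    # Probability that the statistic falls in either rejection tail.
    return (1.0 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)


power = correlation_test_power(rho=0.30, n=50, alpha=0.05)
print(f"power = {power:.2f}, chance of 'not significant' = {1.0 - power:.2f}")
```

Both the power and its complement come out between 0.4 and 0.6, so whether a real correlation of 0.30 is declared significant in a sample of 50 is close to a coin flip.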
27. Criticism: confusion of the inverse
Reasoning behind significance testing:
- If H0 is true, then this result (statistical significance) would probably not occur
- This result (statistical significance) has occurred
- Therefore H0 is probably not true
This is equivalent to:
- If a person is an Australian, then she is probably not a member of parliament
- This person is a member of parliament
- Therefore she is probably not an Australian.
28. Criticism: confusion of the inverse
- Significance tests do not tell us what we really want to know, i.e., "Given these data, what is the probability that H0 is true?"
- General belief that the smaller the p-value, the greater the probability that H0 is false
- That is, belief that NHST tells us P(H0|R), i.e., the probability of H0 being true given the observed data R
- In fact, NHST tells us the probability of observing R or more extreme data, assuming that H0 is true, i.e., P(R|H0)
- But P(H0|R) = P(R|H0) P(H0) / P(R)
- So P(H0|R) (what we want to know) = P(R|H0) only if P(H0) = P(R)
- No theoretical justification for such an assumption
29. Criticism: confusion of the inverse
Figure: boxes of type A and type B
- Randomly choose one box and blindly draw two beads from the box
- H0: The box is type A; H1: The box is type B
- Decision rule: If the two beads are black (R), reject H0; otherwise H0 cannot be rejected
30. Criticism: confusion of the inverse
- P(R|H0) ≈ 0.048 (i.e., approximately 5%): the percentage of false rejections when the box really is of type A
- Suppose we drew two black beads, i.e., we reject H0. What is the probability that we have made an error? This is P(H0|R). By Bayes' theorem this is 0.43.
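The Bayes step can be sketched as below. The transcript does not give the contents of the boxes, so the probability of two black beads from a type B box is not known here. The value 0.064 is a hypothetical figure, back-solved so that equal prior probabilities for the two boxes (suggested by "randomly choose one box") reproduce the slide's 0.43. The function name `posterior_h0` is also illustrative.

```python
def posterior_h0(p_r_given_h0, p_r_given_h1, prior_h0=0.5):
    """P(H0 given R) by Bayes' theorem for two exclusive hypotheses H0 and H1."""
    joint_h0 = p_r_given_h0 * prior_h0
    joint_h1 = p_r_given_h1 * (1.0 - prior_h0)
    # P(R) is the sum over both hypotheses.
    return joint_h0 / (joint_h0 + joint_h1)


P_R_GIVEN_H0 = 0.048   # from the slide
P_R_GIVEN_H1 = 0.064   # hypothetical: not in the transcript, chosen to match 0.43

posterior = posterior_h0(P_R_GIVEN_H0, P_R_GIVEN_H1, prior_h0=0.5)
print(f"P(R given H0) = {P_R_GIVEN_H0}, P(H0 given R) = {posterior:.2f}")
```

The point of the example survives any particular choice of numbers: a rejection rule with a false-rejection rate near 5% can still leave a large probability that H0 is true once it has been rejected.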
31. Criticism: trend, sample or population?
33. What are the alternatives?
- Confidence intervals
- Permutation tests
- Cross-validation
- A posteriori testing
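As an illustration of one of these alternatives, a permutation test for a difference in means can be sketched as follows. The data, the choice of statistic and the function name `permutation_test` are assumptions made for the example; the lecture does not specify an implementation.

```python
import random


def permutation_test(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in means of x and y."""
    rng = random.Random(seed)
    nx = len(x)
    observed = abs(sum(x) / nx - sum(y) / len(y))
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        # Reassign the group labels at random.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:nx]) / nx - sum(pooled[nx:]) / len(y))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            count += 1
    # Count the observed arrangement itself so p is never exactly zero.
    return (count + 1) / (n_perm + 1)


# Hypothetical samples (illustrative values only).
group_a = [5.1, 5.3, 4.9, 5.6, 5.2, 5.4]
group_b = [4.1, 4.0, 4.4, 3.9, 4.2, 4.3]
p_separated = permutation_test(group_a, group_b)
p_identical = permutation_test([1, 2, 3, 4], [1, 2, 3, 4])
print(p_separated, p_identical)
```

The reference distribution is built from the data themselves, so no parametric form (normal, t or F) has to be assumed for the test statistic.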
34. Cross-validation: leave-one-out approach
CROSS-VALIDATION ALGORITHM FOR A SINGLE PRESCRIBED MODEL
- Given a data set of N cases, (xi, yi), i = 1, ..., N
- Set aside one case (xk, yk)
- Estimate model parameters from the remaining (N-1) data, e.g. y = ak x + bk
- Hindcast the missing case yk
- Continue loop over all N cases
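The algorithm above can be sketched in Python for the linear model y = a x + b. The least-squares fit, the RMSE skill measure and the function names are illustrative choices, not taken from the lecture, and the data are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares estimates (a, b) for y = a * x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def leave_one_out_hindcasts(xs, ys):
    """Hindcast each y_k from a model fitted without case k."""
    hindcasts = []
    for k in range(len(xs)):
        # Set aside case k and fit on the remaining N - 1 cases.
        a_k, b_k = fit_line(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:])
        hindcasts.append(a_k * xs[k] + b_k)
    return hindcasts


def rmse(predicted, observed):
    """Root-mean-square error of the hindcasts."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


# Hypothetical predictor and predictand (illustrative values only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 2.9, 5.1, 7.2, 8.8, 11.1]
hindcasts = leave_one_out_hindcasts(xs, ys)
print("cross-validated RMSE:", rmse(hindcasts, ys))
```

Because each hindcast comes from a model that never saw the case being predicted, the cross-validated error is an estimate of skill on independent data rather than of goodness of fit.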
35. Conclusions
- NHST tells us little of what we need to know
- NHST is inherently misleading
- There are alternatives available to NHST
- We should be less enthusiastic about insisting on
its use