1
WMO Course - Statistics and Climatology - Lecture IV
Dr. Bertrand Timbal
Acknowledgement: Dr. Neville Nicholls
Regional Meteorological Training Centre, Tehran, Iran, December 2003
2
Statistics of the Climate System: Significance testing and climate research
Statistics and Climatology - Lecture IV
  • Overview
  • Significance testing the null hypothesis
  • Example in climate research
  • Criticism of null hypothesis significance testing
  • Alternatives to null hypothesis significance
    testing

3
Motivation
Scientific research and data exploration is about testing hypotheses. The confidence we can place in the answer obtained is as important as the answer itself. Significance testing provides a procedure for making rational decisions about the reality of effects.
4
Historical example: The Trial of the Pyx
The Pyx Jury counting coins by hand
  • An ancient ceremony of the Royal Mint (held since at least 1248)
  • One coin was taken out of every day's production
  • The coins were stored in a box called the Pyx (originally kept in Westminster Abbey)
  • Every 3-4 years the contents of the Pyx were counted and assayed
  • The mean weight of the coins had to be within a certain range
  • Otherwise, the Master of the Mint would be at the prince's mercy "in life and members"
  • In 1124 all mint masters had their right hands cut off

5
Historical example: Laplace and atmospheric tides
  • Laplace calculated the mean change in barometric pressure from 9 am to 3 pm in Paris, 1816-1826
  • He compared the seasonal differences of this change with the annual mean change
  • He found that the Feb-Apr and Nov-Jan mean changes differed significantly from the overall mean
  • But the differences of the other quarters from the overall mean could "without improbability be attributed solely to the irregularities of chance"

6
Historic: Fisher's Test of Significance
  • Steps in Fisher's significance testing (a minimal sketch in code follows below):
  • Identify the null hypothesis H0
  • Determine the appropriate test statistic and its distribution under the assumption that H0 is true
  • Calculate the test statistic from the data
  • Determine the achieved significance level that corresponds to the test statistic, under the assumption that H0 is true

R. A. Fisher, 1890-1962
"The value for Q is therefore significant on the higher standard (1 per cent) and that for N2 at the lower standard (5 per cent)" - R. A. Fisher
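A minimal Python sketch of these four steps, using a one-sample t statistic as the test statistic (the data values and the hypothesised mean are illustrative assumptions, not taken from the lecture):

    import numpy as np
    from scipy import stats

    # Illustrative data: 12 anomalies (made-up values for this sketch)
    x = np.array([0.3, -0.1, 0.4, 0.2, 0.5, -0.2, 0.1, 0.6, 0.0, 0.3, 0.4, 0.2])
    mu0 = 0.0                      # Step 1: null hypothesis H0: the true mean is 0

    # Steps 2-3: choose the test statistic (one-sample t) and calculate it from the data
    t_stat, p_value = stats.ttest_1samp(x, popmean=mu0)

    # Step 4: achieved significance level under the assumption that H0 is true
    print(f"t = {t_stat:.2f}, achieved significance level p = {p_value:.3f}")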
7
Historic: Neyman and Pearson Hypothesis Testing
Jerzy Neyman, 1894-1981
Egon Pearson, 1895-1980
  • Steps in a Neyman-Pearson test (a sketch in code follows below):
  • Identify a hypothesis of interest HB and a complementary hypothesis HA
  • Determine the appropriate test statistic and its distribution under the assumption that HA is true
  • Specify a significance level (α) and determine the corresponding critical value of the test statistic under the assumption that HA is true
  • Calculate the test statistic from the data
  • Reject HA and accept HB if the test statistic is further than the critical value from the expected value
  • Otherwise accept HA
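A minimal Python sketch of the Neyman-Pearson recipe with a pre-specified significance level and critical value (the sample and the hypothesised mean are illustrative assumptions; HA plays the role of the hypothesis assumed true for the reference distribution, as on the slide):

    import numpy as np
    from scipy import stats

    x = np.array([1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 0.7, 1.3])   # illustrative sample
    mu_A = 1.0                    # expected mean if HA is true
    alpha = 0.05                  # significance level specified in advance

    n = len(x)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)             # two-sided critical value
    t_stat = (x.mean() - mu_A) / (x.std(ddof=1) / np.sqrt(n))

    # Decision rule fixed before looking at the data
    if abs(t_stat) > t_crit:
        print(f"|t| = {abs(t_stat):.2f} > {t_crit:.2f}: reject HA, accept HB")
    else:
        print(f"|t| = {abs(t_stat):.2f} <= {t_crit:.2f}: accept HA")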

8
Null Hypothesis Significance Testing: the modern hybrid
Neither Fisher nor Neyman and Pearson would have been satisfied with this hybrid
  • Two hypotheses: a null hypothesis (H0) and an alternative hypothesis (H1)
  • A test statistic (R) is calculated from the data and compared with its known distribution under the assumption that H0 is true
  • H0 is rejected for p-values less than a specified α (e.g., 0.05)

9
Test statistic: Student's t test
T measures the distance between the sample mean and the hypothesised mean. H0: µX = µ0; Ha: µX ≠ µ0.
With sample mean X̄ and sample standard deviation S, the test statistic is T = (X̄ − µ0) / (S / √n).
For large values of |T| we reject H0 in favour of Ha.
When H0 is true, T follows Student's t distribution with n−1 degrees of freedom.
The test can be one-sided or two-sided; a one-sided test has more power.
A large sample size (n) gives more power.
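A minimal sketch of this test in Python with scipy (the series and µ0 are illustrative assumptions):

    import numpy as np
    from scipy import stats

    # Illustrative series of 20 values drawn with a non-zero mean
    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.4, scale=1.0, size=20)
    mu0 = 0.0

    # Two-sided one-sample t test of H0: mean equals mu0
    t_stat, p_two_sided = stats.ttest_1samp(x, popmean=mu0)

    # One-sided p-value (Ha: mean > mu0) has more power in that direction
    p_one_sided = stats.t.sf(t_stat, df=len(x) - 1)

    print(f"T = {t_stat:.2f}, two-sided p = {p_two_sided:.3f}, one-sided p = {p_one_sided:.3f}")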
10
(No Transcript)
11
Test statistic: Fisher's F test
F measures the distance between two sample variances. H0: σ²X = σ²Y; Ha: σ²X ≠ σ²Y.
With sample standard deviations SX and SY, the test statistic is F = S²X / S²Y.
For values of F far from 1, we reject H0 in favour of Ha.
When H0 is true, F follows Fisher's F distribution with (nX−1, nY−1) degrees of freedom.
The F test is often conducted at the 10% significance level to compensate for its low power.
A large sample size (n) gives more power.
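A minimal Python sketch of this variance-ratio test (the two series are illustrative assumptions; the p-value is computed directly from the F distribution):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = rng.normal(scale=1.0, size=30)    # illustrative series X
    y = rng.normal(scale=1.5, size=25)    # illustrative series Y

    f_stat = x.var(ddof=1) / y.var(ddof=1)
    df_x, df_y = len(x) - 1, len(y) - 1

    # Two-sided p-value: probability of a variance ratio at least this far from 1
    p_two_sided = 2 * min(stats.f.cdf(f_stat, df_x, df_y),
                          stats.f.sf(f_stat, df_x, df_y))

    print(f"F = {f_stat:.2f}, two-sided p = {p_two_sided:.3f}")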
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
Other test statistics
Tests on the difference of two means, with or without equal variances (t statistic), and paired-difference tests, e.g. to single out a year within two samples (a paired-test sketch follows below).
  • Summary
  • Tests on means (one-sided, two-sided, two means, with equal variances or not) are based on the t statistic
  • Tests on variances (equality of variances) are based on the F statistic
  • Every test relies on a hidden assumption: the model behind the test statistic (R)
  • An exploratory test will single out that something is unusual
  • A confirmatory test can operate at a fixed significance level
  • Sample size and the number of sides (one- or two-sided) affect the power of the test
  • These are parametric tests; non-parametric tests also exist
  • Serial correlation is a problem with all tests (it reduces the effective sample size)
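A minimal Python sketch of a paired-difference test, with an unpaired Welch test for comparison (the two series are illustrative assumptions):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    years = 15
    x = rng.normal(loc=20.0, scale=2.0, size=years)       # e.g. one station, 15 years
    y = x + rng.normal(loc=0.5, scale=1.0, size=years)    # e.g. a second station, same years

    # Paired test: a t test on the year-by-year differences
    t_paired, p_paired = stats.ttest_rel(x, y)
    print(f"paired t = {t_paired:.2f}, p = {p_paired:.3f}")

    # Unpaired alternative that does not assume equal variances (Welch's t test)
    t_welch, p_welch = stats.ttest_ind(x, y, equal_var=False)
    print(f"Welch t = {t_welch:.2f}, p = {p_welch:.3f}")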

17
Statistics of the Climate System: Significance testing and climate research
  • Overview
  • Significance testing the null hypothesis
  • Example in climate research
  • Criticism of null hypothesis significance testing
  • Alternatives to null hypothesis significance
    testing

18
Significance testing: Climate examples
Correlations between Darwin MSLP in June-August and Australian-region tropical cyclone numbers. Data from 1950-74.
Nicholls, N., 1979: A possible method for predicting seasonal tropical cyclone activity in the Australian region. Mon. Weath. Rev., 107, 1221-1224.
19
Significance testing: Climate examples
Frederiksen et al., 2001: Dynamical seasonal forecasts during the 1997/98 El Niño using persisted SST anomalies. Journal of Climate.
Skill of 3-month precipitation forecasts with the BMRC climate model forced with persisted SSTs, for the period April-June 1997 through April-June 1998. Green indicates no significant skill.
20
Significance testing: Climate examples
The sign of the linear trend (1961-1998) is indicated by +/- symbols at each station; bold indicates significant trends (0.05 level).
Manton, M. J., and 26 others, 2001: Trends in extreme daily rainfall and temperature in Southeast Asia and the South Pacific: 1961-1998. International Journal of Climatology.
21
Statistics of the Climate System: Significance testing and climate research
  • Overview
  • Significance testing the null hypothesis
  • Example in climate research
  • Criticism of null hypothesis significance testing
  • Alternatives to null hypothesis significance
    testing

22
Criticisms of NHST in other scientific fields
  • "The significance test as currently used is a disaster"

Hunter, J. E., 1997: Needed: a ban on the significance test. Psychological Science, 8, 3-7.
"Statistical significance testing has involved more fantasy than fact"
Carver, R. P., 1978: The case against statistical significance testing. Harvard Educational Review, 48, 378-398.
"The Earth is round (p < .05)"
Cohen, J., 1994: American Psychologist, 49, 997-1003.
23
Some criticisms of NHST
  • Arbitrary: "surely, God loves the .06 nearly as much as the .05" (Rosnow and Rosenthal, 1989)
  • A nil H0 (e.g., the correlation is 0.0) is usually silly or trivial
  • Incorrectly concluding that H0 is true if it is not rejected
  • Publication bias
  • Power depends on sample size: the probability of (correctly) rejecting H0 with typically-sized climate data sets is low
  • Confusion of the inverse: significance tests do not tell us what we really want to know, i.e., "Given these data, what is the probability that H0 is true?"
  • Trends: samples or populations?

24
Criticism: Publication Bias
  • In a field dominated by significance testing:
  • Research which yields non-significant results may not be published
  • Such unpublished research remains unknown to other researchers
  • Other researchers repeat the research independently
  • Eventually a significant result occurs by chance, and is published
  • So the published literature consists of false conclusions resulting from errors of the first kind in statistical tests of significance

Sterling, T. D., 1959. Publication decisions and
their possible effects on inferences drawn from
tests of significance - or vice versa. J. Amer.
Statistical Assoc., 54, 30-34.
25
Criticism: sample size dependence
  • A correlation of 0.60 is not significant (p < 0.05) if n = 10, but
  • A correlation of 0.10 would be significant in a sample of 400

[Table: two-tailed significance levels of the correlation coefficient, for different sample sizes]
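A minimal Python check of these two cases, using the t transform of the sample correlation coefficient:

    import numpy as np
    from scipy import stats

    def corr_p_two_tailed(r, n):
        """Two-tailed p-value of a sample correlation r based on n pairs."""
        t = r * np.sqrt((n - 2) / (1 - r**2))
        return 2 * stats.t.sf(abs(t), df=n - 2)

    print(f"r = 0.60, n = 10:  p = {corr_p_two_tailed(0.60, 10):.3f}")   # just above 0.05
    print(f"r = 0.10, n = 400: p = {corr_p_two_tailed(0.10, 400):.3f}")  # just below 0.05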
26
Criticism: low probability of rejecting H0
  • Assume the real (population) correlation is 0.30
  • Samples are of size 50 (typical for climate problems)
  • Two-sided significance test with α = 0.05
  • In 57% of samples, this would lead to the conclusion "not significant"
  • That is, the chance of (wrongly) not rejecting the null hypothesis is roughly that of a coin flip.
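A Monte Carlo sketch of this setup, assuming independent bivariate-normal samples (serial correlation, noted earlier, would reduce the effective sample size and raise the fraction of non-significant results):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    rho, n, alpha, trials = 0.30, 50, 0.05, 20_000

    not_significant = 0
    for _ in range(trials):
        # Draw a sample of size n with population correlation rho
        x = rng.standard_normal(n)
        y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        _, p = stats.pearsonr(x, y)
        not_significant += p >= alpha

    print(f"fraction of samples declared 'not significant': {not_significant / trials:.2f}")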

27
Criticism: confusion of the inverse
  • The reasoning behind significance testing:
  • If H0 is true, then this result (statistical significance) would probably not occur
  • This result (statistical significance) has occurred
  • Therefore H0 is probably not true
  • This is equivalent to:
  • If a person is an Australian, then she is probably not a member of parliament
  • This person is a member of parliament
  • Therefore she is probably not an Australian.

28
Criticism: confusion of the inverse
  • Significance tests do not tell us what we really want to know, i.e., "Given these data, what is the probability that H0 is true?"
  • There is a general belief that the smaller the p-value, the greater the probability that H0 is false.
  • That is, a belief that NHST tells us P(H0|R), i.e., the probability of H0 being true given the observed data R.
  • In fact, NHST tells us the probability of observing R or more extreme data, assuming that H0 is true, i.e., P(R|H0).
  • But, by Bayes' theorem, P(H0|R) = P(R|H0) P(H0) / P(R)
  • So P(H0|R), what we want to know, equals P(R|H0) only if P(H0) = P(R)
  • There is no theoretical justification for such an assumption. (A numerical sketch follows below.)
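A minimal numerical sketch of Bayes' theorem with illustrative values (the prior and the two likelihoods below are assumptions chosen only to show that P(H0|R) can differ greatly from P(R|H0)):

    # Illustrative (assumed) numbers, not from the lecture
    p_R_given_H0 = 0.05   # probability of data this extreme if H0 is true
    p_R_given_H1 = 0.20   # probability of such data if the alternative is true
    p_H0 = 0.5            # prior probability that H0 is true

    # Bayes' theorem: P(H0 | R) = P(R | H0) P(H0) / P(R)
    p_R = p_R_given_H0 * p_H0 + p_R_given_H1 * (1 - p_H0)
    p_H0_given_R = p_R_given_H0 * p_H0 / p_R

    print(f"P(R | H0) = {p_R_given_H0:.2f}, but P(H0 | R) = {p_H0_given_R:.2f}")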

29
Criticism: confusion of the inverse
[Figure: two boxes of beads, of types A and B]
  • Randomly choose one box and blindly draw two beads from it
  • H0: the box is of type A; H1: the box is of type B
  • Decision rule: if the two beads are black (R), reject H0; otherwise H0 cannot be rejected

30
Criticism: confusion of the inverse
  • P(R|H0) ≈ 0.048 (i.e., approximately 5%): the percentage of false rejections when the box really is of type A
  • Suppose we drew two black beads, i.e., we reject H0. What is the probability that we have made an error? This is P(H0|R). By Bayes' theorem this is 0.43.

31
Criticism: trend, sample or population?
32
Statistics of the Climate System: Significance testing and climate research
  • Overview
  • Significance testing the null hypothesis
  • Example in climate research
  • Criticism of null hypothesis significance testing
  • Alternatives to null hypothesis significance
    testing

33
What are the alternatives?
  • Confidence intervals
  • Permutation tests (a sketch follows below)
  • Cross-validation
  • A posteriori testing
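A minimal Python sketch of a permutation test for the difference of two sample means (the data are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(loc=0.5, scale=1.0, size=25)   # illustrative sample 1
    y = rng.normal(loc=0.0, scale=1.0, size=25)   # illustrative sample 2

    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])

    # Re-randomise the group labels many times and recompute the mean difference
    n_perm = 10_000
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[:len(x)].mean() - pooled[len(x):].mean()
        count += abs(diff) >= abs(observed)

    print(f"permutation p-value: {count / n_perm:.3f}")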

34
Cross-validation: the leave-one-out approach
Cross-validation algorithm for a single prescribed model:
Given a data set of N cases (xi, yi), i = 1, ..., N:
  Set aside one case (xk, yk)
  Estimate the model parameters from the remaining (N−1) data, e.g. y = ak x + bk
  Hindcast the missing case yk
  Continue the loop over all N cases
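A minimal Python sketch of this leave-one-out loop for a simple linear model (the data set is an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(5)
    N = 30
    x = rng.normal(size=N)                        # illustrative predictor
    y = 2.0 * x + rng.normal(scale=0.5, size=N)   # illustrative predictand

    hindcasts = np.empty(N)
    for k in range(N):
        keep = np.arange(N) != k                  # set aside case k
        a_k, b_k = np.polyfit(x[keep], y[keep], deg=1)   # fit y = a_k * x + b_k on the rest
        hindcasts[k] = a_k * x[k] + b_k           # hindcast the missing case

    # Cross-validated skill: correlation between hindcasts and observations
    print(f"cross-validated correlation: {np.corrcoef(hindcasts, y)[0, 1]:.2f}")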
35
Conclusions
  • NHST tells us little of what we need to know
  • NHST is inherently misleading
  • There are alternatives available to NHST
  • We should be less enthusiastic about insisting on
    its use