1
WMO Course - Statistics and Climatology - Lecture IV
Dr. Bertrand Timbal
Acknowledgement: Dr. Neville Nicholls
Regional Meteorological Training Centre, Tehran, Iran, December 2003
2
Statistics of the Climate System: Significance testing and climate research
Statistics and Climatology - Lecture IV
  • Overview
  • Significance testing the null hypothesis
  • Example in climate research
  • Criticism of null hypothesis significance testing
  • Alternatives to null hypothesis significance
    testing

3
Motivation
Scientific research and data exploration is about testing hypotheses. The confidence we can place in the answer obtained is as important as the answer itself. Significance testing provides a procedure for making rational decisions about the reality of effects.
4
Historical example: The Trial of the Pyx
The Pyx Jury counting coins by hand
  • An ancient ceremony of the Royal Mint (held since at least 1248)
  • One coin was taken out of every day's production
  • The coins were stored in a box called the Pyx (originally kept in Westminster Abbey)
  • Every 3-4 years the contents of the Pyx were counted and assayed
  • The mean weight of the coins had to be within a certain range
  • Otherwise, the Master of the Mint would be at the prince's mercy "in life and members"
  • In 1124 all mint masters had their right hands cut off

5
Historical example: Laplace and atmospheric tides
  • Laplace calculated the mean change in barometric pressure from 9 am to 3 pm in Paris, 1816-1826
  • He compared the seasonal differences of this change with the annual mean change
  • He found that the Feb-Apr and Nov-Jan mean changes differed significantly from the overall mean
  • But the differences of the other quarters from the overall mean could "without improbability be attributed solely to the irregularities of chance"

6
Historic: Fisher's Test of Significance
  • Steps in Fisher's significance testing (a minimal sketch in code follows below):
  • Identify the null hypothesis H0
  • Determine the appropriate test statistic and its distribution under the assumption that H0 is true
  • Calculate the test statistic from the data
  • Determine the achieved significance level that corresponds to the test statistic, under the assumption that H0 is true

R. A. Fisher, 1890-1962
"The value for Q is therefore significant on the higher standard (1 per cent) and that for N2 at the lower standard (5 per cent)" - R. A. Fisher
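A minimal Python sketch of these four steps, using a one-sample t statistic as the test statistic (the data values and the hypothesised mean are illustrative assumptions, not taken from the lecture):

    import numpy as np
    from scipy import stats

    # Illustrative data: 12 anomalies (made-up values for this sketch)
    x = np.array([0.3, -0.1, 0.4, 0.2, 0.5, -0.2, 0.1, 0.6, 0.0, 0.3, 0.4, 0.2])
    mu0 = 0.0                      # Step 1: null hypothesis H0: the true mean is 0

    # Steps 2-3: choose the test statistic (one-sample t) and calculate it from the data
    t_stat, p_value = stats.ttest_1samp(x, popmean=mu0)

    # Step 4: achieved significance level under the assumption that H0 is true
    print(f"t = {t_stat:.2f}, achieved significance level p = {p_value:.3f}")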
7
Historic: Neyman and Pearson Hypothesis Testing
Jerzy Neyman, 1894-1981
Egon Pearson, 1895-1980
  • Steps in a Neyman-Pearson test (a sketch in code follows below):
  • Identify a hypothesis of interest HB and a complementary hypothesis HA
  • Determine the appropriate test statistic and its distribution under the assumption that HA is true
  • Specify a significance level (α) and determine the corresponding critical value of the test statistic under the assumption that HA is true
  • Calculate the test statistic from the data
  • Reject HA and accept HB if the test statistic is further than the critical value from the expected value
  • Otherwise accept HA
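A minimal Python sketch of the Neyman-Pearson recipe with a pre-specified significance level and critical value (the sample and the hypothesised mean are illustrative assumptions; HA plays the role of the hypothesis assumed true for the reference distribution, as on the slide):

    import numpy as np
    from scipy import stats

    x = np.array([1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 0.7, 1.3])   # illustrative sample
    mu_A = 1.0                    # expected mean if HA is true
    alpha = 0.05                  # significance level specified in advance

    n = len(x)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)             # two-sided critical value
    t_stat = (x.mean() - mu_A) / (x.std(ddof=1) / np.sqrt(n))

    # Decision rule fixed before looking at the data
    if abs(t_stat) > t_crit:
        print(f"|t| = {abs(t_stat):.2f} > {t_crit:.2f}: reject HA, accept HB")
    else:
        print(f"|t| = {abs(t_stat):.2f} <= {t_crit:.2f}: accept HA")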

8
Null Hypothesis Significance Testing: the modern hybrid
Neither Fisher nor Neyman and Pearson would have been satisfied with this hybrid
  • Two hypotheses: a null hypothesis (H0) and an alternative hypothesis (H1)
  • A test statistic (R) is calculated from the data and compared with its known distribution under the assumption that H0 is true
  • H0 is rejected for p-values less than a specified α (e.g., 0.05)

9
Test statistic: Student's t test
T measures the distance between the sample mean and the hypothesised mean. H0: µX = µ0; Ha: µX ≠ µ0.
With sample mean X̄ and sample standard deviation S, the test statistic is T = (X̄ − µ0) / (S / √n).
For large values of |T| we reject H0 in favour of Ha.
When H0 is true, T follows Student's t distribution with n−1 degrees of freedom.
The test can be one-sided or two-sided; a one-sided test has more power.
A large sample size (n) gives more power.
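A minimal sketch of this test in Python with scipy (the series and µ0 are illustrative assumptions):

    import numpy as np
    from scipy import stats

    # Illustrative series of 20 values drawn with a non-zero mean
    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.4, scale=1.0, size=20)
    mu0 = 0.0

    # Two-sided one-sample t test of H0: mean equals mu0
    t_stat, p_two_sided = stats.ttest_1samp(x, popmean=mu0)

    # One-sided p-value (Ha: mean > mu0) has more power in that direction
    p_one_sided = stats.t.sf(t_stat, df=len(x) - 1)

    print(f"T = {t_stat:.2f}, two-sided p = {p_two_sided:.3f}, one-sided p = {p_one_sided:.3f}")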
10
(No Transcript)
11
Test statistic: Fisher's F test
F measures the distance between two sample variances. H0: σ²X = σ²Y; Ha: σ²X ≠ σ²Y.
With sample standard deviations SX and SY, the test statistic is F = S²X / S²Y.
For values of F far from 1, we reject H0 in favour of Ha.
When H0 is true, F follows Fisher's F distribution with (nX−1, nY−1) degrees of freedom.
The F test is often conducted at the 10% significance level to compensate for its low power.
A large sample size (n) gives more power.
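A minimal Python sketch of this variance-ratio test (the two series are illustrative assumptions; the p-value is computed directly from the F distribution):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = rng.normal(scale=1.0, size=30)    # illustrative series X
    y = rng.normal(scale=1.5, size=25)    # illustrative series Y

    f_stat = x.var(ddof=1) / y.var(ddof=1)
    df_x, df_y = len(x) - 1, len(y) - 1

    # Two-sided p-value: probability of a variance ratio at least this far from 1
    p_two_sided = 2 * min(stats.f.cdf(f_stat, df_x, df_y),
                          stats.f.sf(f_stat, df_x, df_y))

    print(f"F = {f_stat:.2f}, two-sided p = {p_two_sided:.3f}")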
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
Other test statistics
Tests on the difference of two means, with or without equal variances (t statistic), and paired-difference tests, e.g. to single out a year within two samples (a paired-test sketch follows below).
  • Summary
  • Tests on means (one-sided, two-sided, two means, with equal variances or not) are based on the t statistic
  • Tests on variances (equality of variances) are based on the F statistic
  • Every test relies on a hidden assumption: the model behind the test statistic (R)
  • An exploratory test will single out that something is unusual
  • A confirmatory test can operate at a fixed significance level
  • Sample size and the number of sides (one- or two-sided) affect the power of the test
  • These are parametric tests; non-parametric tests also exist
  • Serial correlation is a problem with all tests (it reduces the effective sample size)
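A minimal Python sketch of a paired-difference test, with an unpaired Welch test for comparison (the two series are illustrative assumptions):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    years = 15
    x = rng.normal(loc=20.0, scale=2.0, size=years)       # e.g. one station, 15 years
    y = x + rng.normal(loc=0.5, scale=1.0, size=years)    # e.g. a second station, same years

    # Paired test: a t test on the year-by-year differences
    t_paired, p_paired = stats.ttest_rel(x, y)
    print(f"paired t = {t_paired:.2f}, p = {p_paired:.3f}")

    # Unpaired alternative that does not assume equal variances (Welch's t test)
    t_welch, p_welch = stats.ttest_ind(x, y, equal_var=False)
    print(f"Welch t = {t_welch:.2f}, p = {p_welch:.3f}")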

17
Statistics of the Climate System: Significance testing and climate research
  • Overview
  • Significance testing the null hypothesis
  • Example in climate research
  • Criticism of null hypothesis significance testing
  • Alternatives to null hypothesis significance
    testing

18
Significance testing: Climate examples
Correlations between Darwin MSLP in June-August and Australian-region tropical cyclone numbers. Data from 1950-74.
Nicholls, N., 1979: A possible method for predicting seasonal tropical cyclone activity in the Australian region. Mon. Weath. Rev., 107, 1221-1224.
19
Significance testing: Climate examples
Frederiksen et al., 2001: Dynamical seasonal forecasts during the 1997/98 El Niño using persisted SST anomalies. Journal of Climate.
Skill of 3-month precipitation forecasts with the BMRC climate model forced with persisted SSTs, for the period April-June 1997 through April-June 1998. Green indicates no significant skill.
20
Significance testing: Climate examples
The sign of the linear trend (1961-1998) is indicated by +/- symbols at each station; bold indicates significant trends (0.05 level).
Manton, M. J., and 26 others, 2001: Trends in extreme daily rainfall and temperature in Southeast Asia and the South Pacific: 1961-1998. International Journal of Climatology.
21
Statistics of the Climate System: Significance testing and climate research
  • Overview
  • Significance testing the null hypothesis
  • Example in climate research
  • Criticism of null hypothesis significance testing
  • Alternatives to null hypothesis significance
    testing

22
Criticisms of NHST in other scientific fields
  • "The significance test as currently used is a disaster"

Hunter, J. E., 1997: Needed: a ban on the significance test. Psychological Science, 8, 3-7.
"Statistical significance testing has involved more fantasy than fact"
Carver, R. P., 1978: The case against statistical significance testing. Harvard Educational Review, 48, 378-398.
"The Earth is round (p < .05)"
Cohen, J., 1994: American Psychologist, 49, 997-1003.
23
Some criticisms of NHST
  • Arbitrary: "surely, God loves the .06 nearly as much as the .05" (Rosnow and Rosenthal, 1989)
  • A nil H0 (e.g., the correlation is 0.0) is usually silly or trivial
  • Incorrectly concluding that H0 is true if it is not rejected
  • Publication bias
  • Power depends on sample size: the probability of (correctly) rejecting H0 with typically-sized climate data sets is low
  • Confusion of the inverse: significance tests do not tell us what we really want to know, i.e., "Given these data, what is the probability that H0 is true?"
  • Trends: samples or populations?

24
Criticism: Publication Bias
  • In a field dominated by significance testing:
  • Research which yields non-significant results may not be published
  • Such unpublished research remains unknown to other researchers
  • Other researchers repeat the research independently
  • Eventually a significant result occurs by chance, and is published
  • So the published literature consists of false conclusions resulting from errors of the first kind in statistical tests of significance

Sterling, T. D., 1959. Publication decisions and
their possible effects on inferences drawn from
tests of significance - or vice versa. J. Amer.
Statistical Assoc., 54, 30-34.
25
Criticism: sample size dependence
  • A correlation of 0.60 is not significant (p < 0.05) if n = 10, but
  • A correlation of 0.10 would be significant in a sample of 400

[Table: two-tailed significance levels of the correlation coefficient, for different sample sizes]
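A minimal Python check of these two cases, using the t transform of the sample correlation coefficient:

    import numpy as np
    from scipy import stats

    def corr_p_two_tailed(r, n):
        """Two-tailed p-value of a sample correlation r based on n pairs."""
        t = r * np.sqrt((n - 2) / (1 - r**2))
        return 2 * stats.t.sf(abs(t), df=n - 2)

    print(f"r = 0.60, n = 10:  p = {corr_p_two_tailed(0.60, 10):.3f}")   # just above 0.05
    print(f"r = 0.10, n = 400: p = {corr_p_two_tailed(0.10, 400):.3f}")  # just below 0.05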
26
Criticism: low probability of rejecting H0
  • Assume the real (population) correlation is 0.30
  • Samples are of size 50 (typical for climate problems)
  • Two-sided significance test with α = 0.05
  • In 57% of samples, this would lead to the conclusion "not significant"
  • That is, the chance of (wrongly) not rejecting the null hypothesis is roughly that of a coin flip.
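A Monte Carlo sketch of this setup, assuming independent bivariate-normal samples (serial correlation, noted earlier, would reduce the effective sample size and raise the fraction of non-significant results):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    rho, n, alpha, trials = 0.30, 50, 0.05, 20_000

    not_significant = 0
    for _ in range(trials):
        # Draw a sample of size n with population correlation rho
        x = rng.standard_normal(n)
        y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        _, p = stats.pearsonr(x, y)
        not_significant += p >= alpha

    print(f"fraction of samples declared 'not significant': {not_significant / trials:.2f}")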

27
Criticism: confusion of the inverse
  • The reasoning behind significance testing:
  • If H0 is true, then this result (statistical significance) would probably not occur
  • This result (statistical significance) has occurred
  • Therefore H0 is probably not true
  • This is equivalent to:
  • If a person is an Australian, then she is probably not a member of parliament
  • This person is a member of parliament
  • Therefore she is probably not an Australian.

28
Criticism: confusion of the inverse
  • Significance tests do not tell us what we really want to know, i.e., "Given these data, what is the probability that H0 is true?"
  • There is a general belief that the smaller the p-value, the greater the probability that H0 is false.
  • That is, a belief that NHST tells us P(H0|R), i.e., the probability of H0 being true given the observed data R.
  • In fact, NHST tells us the probability of observing R or more extreme data, assuming that H0 is true, i.e., P(R|H0).
  • But, by Bayes' theorem, P(H0|R) = P(R|H0) P(H0) / P(R)
  • So P(H0|R), what we want to know, equals P(R|H0) only if P(H0) = P(R)
  • There is no theoretical justification for such an assumption. (A numerical sketch follows below.)
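A minimal numerical sketch of Bayes' theorem with illustrative values (the prior and the two likelihoods below are assumptions chosen only to show that P(H0|R) can differ greatly from P(R|H0)):

    # Illustrative (assumed) numbers, not from the lecture
    p_R_given_H0 = 0.05   # probability of data this extreme if H0 is true
    p_R_given_H1 = 0.20   # probability of such data if the alternative is true
    p_H0 = 0.5            # prior probability that H0 is true

    # Bayes' theorem: P(H0 | R) = P(R | H0) P(H0) / P(R)
    p_R = p_R_given_H0 * p_H0 + p_R_given_H1 * (1 - p_H0)
    p_H0_given_R = p_R_given_H0 * p_H0 / p_R

    print(f"P(R | H0) = {p_R_given_H0:.2f}, but P(H0 | R) = {p_H0_given_R:.2f}")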

29
Criticism: confusion of the inverse
[Figure: two boxes of beads, of types A and B]
  • Randomly choose one box and blindly draw two beads from it
  • H0: the box is of type A; H1: the box is of type B
  • Decision rule: if the two beads are black (R), reject H0; otherwise H0 cannot be rejected

30
Criticism: confusion of the inverse
  • P(R|H0) ≈ 0.048 (i.e., approximately 5%): the percentage of false rejections when the box really is of type A
  • Suppose we drew two black beads, i.e., we reject H0. What is the probability that we have made an error? This is P(H0|R). By Bayes' theorem this is 0.43.

31
Criticism: trend, sample or population?
32
Statistics of the Climate System: Significance testing and climate research
  • Overview
  • Significance testing the null hypothesis
  • Example in climate research
  • Criticism of null hypothesis significance testing
  • Alternatives to null hypothesis significance
    testing

33
What are the alternatives?
  • Confidence intervals
  • Permutation tests (a sketch follows below)
  • Cross-validation
  • A posteriori testing
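A minimal Python sketch of a permutation test for the difference of two sample means (the data are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(loc=0.5, scale=1.0, size=25)   # illustrative sample 1
    y = rng.normal(loc=0.0, scale=1.0, size=25)   # illustrative sample 2

    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])

    # Re-randomise the group labels many times and recompute the mean difference
    n_perm = 10_000
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[:len(x)].mean() - pooled[len(x):].mean()
        count += abs(diff) >= abs(observed)

    print(f"permutation p-value: {count / n_perm:.3f}")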

34
Cross-validation: the leave-one-out approach
Cross-validation algorithm for a single prescribed model:
Given a data set of N cases (xi, yi), i = 1, ..., N:
  Set aside one case (xk, yk)
  Estimate the model parameters from the remaining (N−1) data, e.g. y = ak x + bk
  Hindcast the missing case yk
  Continue the loop over all N cases
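A minimal Python sketch of this leave-one-out loop for a simple linear model (the data set is an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(5)
    N = 30
    x = rng.normal(size=N)                        # illustrative predictor
    y = 2.0 * x + rng.normal(scale=0.5, size=N)   # illustrative predictand

    hindcasts = np.empty(N)
    for k in range(N):
        keep = np.arange(N) != k                  # set aside case k
        a_k, b_k = np.polyfit(x[keep], y[keep], deg=1)   # fit y = a_k * x + b_k on the rest
        hindcasts[k] = a_k * x[k] + b_k           # hindcast the missing case

    # Cross-validated skill: correlation between hindcasts and observations
    print(f"cross-validated correlation: {np.corrcoef(hindcasts, y)[0, 1]:.2f}")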
35
Conclusions
  • NHST tells us little of what we need to know
  • NHST is inherently misleading
  • There are alternatives available to NHST
  • We should be less enthusiastic about insisting on
    its use