FPP 26-27 - PowerPoint PPT Presentation

Provided by: Garrit2
Transcript and Presenter's Notes

Title: FPP 26-27


1
Significance Tests
  • FPP 26-27

2
Significance tests
  • Question
  • Given the collected data, is there evidence
    against a specified hypothesis about the
    corresponding parameter?
  • In other words, are the data consistent or not
    with a specified hypothesis?

3
Logic of significance tests
  • Proof by contradiction
  • 1. assume some hypothesis is true
  • 2. find a statistic (a quantity that depends on
    data) that takes on extreme values when assumed
    hypothesis is false
  • 3. Calculate the value of this statistic in the
    collected data
  • 4. Calculate the probability of observing a value
    of the statistic as or more extreme than the
    observed value, under the assumed hypothesis
  • 5. when this probability is small, one of two
    things happened
  • A. the assumed hypothesis is correct and a rare
    event occurred, or
  • B. the assumed hypothesis is incorrect.
  • Since rare events are by definition rare, we
    interpret small probabilities as evidence that
    the assumed hypothesis is false.
  • When the probability is not small, the data
    provide insufficient evidence to claim that the
    assumed hypothesis is false.
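
The five steps above can be sketched with a small simulation. This is only an illustration under made-up assumptions: the coin, the 62 heads in 100 flips, and the names `N_FLIPS`, `OBSERVED_HEADS`, and `simulate_heads` are hypothetical and do not come from the slides.

```python
import random

# Hypothetical data: 62 heads in 100 flips.
# Step 1: assume the hypothesis "the coin is fair" is true.
N_FLIPS = 100
OBSERVED_HEADS = 62          # Step 3: the statistic computed from the (made-up) data
N_SIMULATIONS = 20_000

rng = random.Random(0)       # fixed seed so the sketch is repeatable


def simulate_heads(n_flips):
    """Step 2: the statistic is the number of heads in n_flips fair tosses."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))


# Step 4: estimate the chance, under the assumed hypothesis, of a value
# as or more extreme (here: as many or more heads) than the observed one.
extreme = sum(
    simulate_heads(N_FLIPS) >= OBSERVED_HEADS for _ in range(N_SIMULATIONS)
)
p_value = extreme / N_SIMULATIONS

# Step 5: a small probability counts as evidence against the assumed hypothesis.
print(f"estimated p-value: {p_value:.4f}")
```

Simulation is used here only because it mirrors the logic step by step; the later slides get the same kind of probability from a normal curve instead.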

4
Significance test for a population percentage
  • Civil rights and the 1960s
  • In the court case Swain vs. Alabama (1965), the
    defendant alleged there was discrimination
    against black people in grand jury selection. 
    Census data from the time indicate that 25% of
    the people eligible for grand jury service were
    black.  A random sample of 1050 people called to
    appear for possible jury duty contained 177 black
    people.  Is there evidence of discrimination?
  • Reference: Devore, J.  Probability and
    Statistics for Engineering and the Sciences. 
    Pacific Grove, CA: Duxbury, 2000, p. 339.

5
Step 1: Formulate hypotheses
  • Claim: There is discrimination.
  • The opposite of this claim is called the null
    hypothesis. It usually can be translated as
    "there is nothing unusual going on."
  • The claim is called the alternative hypothesis.
    It usually can be translated as "there is some
    unusual pattern in the data."
  • H0: p = 0.25 vs. Ha: p < 0.25

6
Step 2: Find a relevant statistic
  • Values of the sample proportion of black jurors
    much smaller than 0.25 suggest the null
    hypothesis is not true
  • Sample proportion: 177/1050 = 0.1686.
  • Is this much smaller than 0.25?
  • A good way to determine this is by converting the
    difference between 0.1686 and 0.25 to standard
    units

7
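Step 3: Calculate z from the data
  • We get z = (sample proportion - 0.25) / SE, where
    SE = sqrt(0.25 × 0.75 / 1050); this is about -6
  • The sample percentage of black jurors is about six
    SEs below the hypothesized value of 0.25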

8
Step 4: Calculate the p-value
  • When n (the sample size) is large enough, we can
    use a standard normal curve to calculate the
    probability of seeing a value of z less than or
    equal to (i.e., as or more extreme than) the
    observed value of -6.09
  • To find the probability we need the distribution
    of z. Do we know it?
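
Steps 2 to 4 of the Swain example can be checked numerically. The sketch below assumes the large-sample normal approximation described above; the helper name `normal_cdf` is ours, not the slides'. Working from the raw counts rather than rounded intermediate values gives a z of roughly -6.1.

```python
from math import erfc, sqrt


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * erfc(-z / sqrt(2))


n = 1050          # people called for possible jury duty
observed = 177    # black people in the sample
p0 = 0.25         # proportion specified by the null hypothesis

p_hat = observed / n
se = sqrt(p0 * (1 - p0) / n)   # SE of the sample proportion if H0 is true
z = (p_hat - p0) / se          # difference converted to standard units
p_value = normal_cdf(z)        # lower tail, because Ha is p < 0.25

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, p-value = {p_value:.1e}")
```

The lower tail is used because the alternative hypothesis points to values smaller than 0.25.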

9
Conclusion in Swain case
  • Because the p-value is approximately 0, we reject
    the null hypothesis. It is very unlikely that we
    would observe a sample percentage of 16.86% or
    smaller if the true percentage were 25%. The
    data suggest that black jurors were indeed
    selected less frequently than would have been
    expected. The data provide some evidence of
    discrimination.

10
Stating hypothesis
  • Null Hypothesis (H0)
  • The statement being tested in a test of
    significance is called the null hypothesis
  • Usually the null hypothesis
  • is a statement of no effect or no difference,
  • is a statement about a population,
  • is expressed in terms of a (some) parameter(s).
  • Example: H0: µ = 0

11
Stating hypothesis
  • Alternative Hypothesis (Ha)
  • name given to the statement we hope or suspect to
    be true instead of H0
  • Example: Ha: µ ≠ 0
  • Hypotheses always refer to some population or
    model, not a particular outcome
  • We must decide whether the alternative hypothesis
    (Ha) should be one-sided or two-sided

12
Stating hypothesis
  • One-sided alternative hypotheses
  • Examples: Ha: µ < 0; Ha: µ > 0
  • Two-sided alternative hypothesis
  • Example: Ha: µ ≠ 0

13
Stating hypothesis
  • Choosing one-sided or two-sided Hypothesis
  • The alternative hypothesis should express the
    hopes or suspicions we had in mind when we
    decided to collect the data
  • It is cheating to first look at the data and then
    frame Ha to fit what the data show
  • If you do not have a specific direction in
    advance, use a two-sided alternative

14
Stating hypothesis
  • Example: Your company hopes to reduce the mean
    time (µ) required to process customer orders. At
    present, this mean is 3.8 days. You study the
    process and eliminate some unnecessary steps.
  • Q: Did you succeed in decreasing the average
    process time?
  • Target: to show that the mean is now less than
    3.8 days.
  • So the alternative hypothesis is one-sided
  • The null hypothesis is the "no change" value
  • H0: µ = 3.8 vs. Ha: µ < 3.8

15
Stating hypothesis
  • The mean area of several thousand apartments in a
    new development is advertised to be 1250 sqft. A
    tenant group thinks that the apartments are
    smaller than advertised. They hire an engineer
    to measure a sample of apartments to test their
    suspicion.
  • H0: µ = 1250 vs. Ha: µ < 1250

16
Stating hypothesis
  • Experimenters on learning in animals sometimes
    measure how long it takes a mouse to find its way
    through a maze. The mean time is 18 seconds for
    one particular maze. A researcher thinks that a
    loud noise will cause the mice to complete the
    maze more slowly. She measures how long each of
    10 mice takes with a noise as stimulus.
  • H0: µ = 18 vs. Ha: µ > 18

17
Stating hypothesis
  • Last year, your company's service technicians
    took an average of 2.6 hours to respond to
    trouble calls from business customers who
    purchased service contracts. Do this year's data
    show a different average response time?
  • H0: µ = 2.6 vs. Ha: µ ≠ 2.6

18
Test Statistic
  • After correctly formulating the null and
    alternative hypothesis we make a comparison
    between the hypothesized value and the data by
    using a test statistic.
  • Many test statistics can be thought of as a
    standardized distance between a sample estimate
    of a parameter and the value of the parameter
    specified by the null hypothesis
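  • Most test statistics have the generic form
    (estimate - hypothesized value) / (SE of estimate)
  • Test statistic for a proportion:
    (sample proportion - p0) / sqrt(p0 (1 - p0) / n)
  • Test statistic for a mean:
    (sample mean - µ0) / (SD / sqrt(n))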
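
As a rough sketch of the "standardized distance" idea, the functions below compute a test statistic for a proportion and for a mean using the standard large-sample forms. The function names and the input numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt


def standardized_distance(estimate, null_value, se):
    """Generic form: (estimate - hypothesized value) / SE of the estimate."""
    return (estimate - null_value) / se


def proportion_statistic(successes, n, p0):
    """Test statistic for a proportion; the SE is computed assuming H0 (p = p0)."""
    se = sqrt(p0 * (1 - p0) / n)
    return standardized_distance(successes / n, p0, se)


def mean_statistic(sample_mean, sample_sd, n, mu0):
    """Test statistic for a mean; the SE is estimated as SD / sqrt(n)."""
    se = sample_sd / sqrt(n)
    return standardized_distance(sample_mean, mu0, se)


# Hypothetical inputs, for illustration only
z_prop = proportion_statistic(successes=45, n=100, p0=0.5)
z_mean = mean_statistic(sample_mean=10.5, sample_sd=2.0, n=64, mu0=10.0)
print(z_prop, z_mean)
```

Both statistics answer the same question: how many standard errors separate the sample estimate from the value specified by the null hypothesis.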

19
P-values
  • A test of significance assesses the evidence
    against the null hypothesis and provides a
    numerical summary of this evidence in terms of a
    probability
  • The idea is that surprising outcomes are
    evidence against Ho
  • A surprising outcome is one that is far from what
    we would expect if Ho were true

20
P-values
  • A test of significance finds the probability of
    getting an outcome as extreme or more extreme
    than the actually observed outcome
  • The direction or directions that count as far
    from what we would expect are determined by the
    alternative hypothesis
  • Definition The probability, assuming that H0 is
    true, that the test statistic would take a value
    as extreme or more extreme than that actually
    observed is called the P-value of the test
  • the smaller the P-value, the stronger the
    evidence against H0 provided by the data

21
P-values
  • What does "as or more extreme" really mean?
  • When the alternative has a > sign, "as or more
    extreme" means use the area to the right of the
    test statistic in the p-value calculation
  • When the alternative has a < sign, "as or more
    extreme" means use the area to the left of the
    test statistic in the p-value calculation
  • When the alternative uses a ≠ sign, "as or more
    extreme" means values of the test statistic far
    from zero in both the positive and negative
    directions.
  • For this type of alternative hypothesis, add the
    areas to the left of -|test statistic| and to
    the right of +|test statistic|
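
The three cases can be written as one small function. This is a minimal sketch that assumes the test statistic follows a standard normal curve under the null hypothesis; the names `p_value` and `normal_cdf` and the labels "less", "greater", and "two-sided" are our own choices.

```python
from math import erfc, sqrt


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * erfc(-z / sqrt(2))


def p_value(z, alternative):
    """P-value for a z statistic, given the direction of the alternative."""
    if alternative == "less":        # Ha uses <: area to the left of z
        return normal_cdf(z)
    if alternative == "greater":     # Ha uses >: area to the right of z
        return normal_cdf(-z)
    if alternative == "two-sided":   # Ha uses ≠: both tails beyond |z|
        return 2 * normal_cdf(-abs(z))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# The same statistic gives different p-values under different alternatives.
for alt in ("less", "greater", "two-sided"):
    print(alt, round(p_value(1.96, alt), 4))
```

For a positive statistic, the two-sided p-value is twice the right-tail area, which is one reason the direction of the alternative has to be fixed before looking at the data.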

22
P-values
23
Interpretation of a p-value
  • Common misinterpretations of p-values
  • The p-value is not the probability that the null
    hypothesis is true. (the null is either true or
    not)
  • Also, (1-p-value) is not the probability that the
    alternative hypothesis is true. (the alternative
    is either true or not true)
  • Correct interpretation
  • The p-value is the probability of getting a value
    of a test statistic as or more extreme than the
    value of the statistic computed from the
    collected data, under the assumption that the
    null hypothesis is true

24
Enough evidence?
  • Below are some guidelines for judging p-values.
    (Don't treat these as golden standards.)
  • p-value                    Evidence against H0
  • < 0.01-ish                 very strong
  • 0.01-ish to 0.05-ish       moderate
  • 0.05-ish to 0.10-ish       weak
  • > 0.10-ish                 practically none

25
Etruscan example
  • In the eighth century B.C., the Etruscan
    civilization was the most advanced in all of
    Italy. Its art forms and political innovations
    were destined to leave indelible marks on the
    entire Western world. Originally located in the
    region now known as Tuscany, it spread rapidly
    across the Apennines and eventually overran much
    of Italy. But as quickly as it came, it faded.
    Militarily it was no match for the burgeoning
    Roman legions, and by the dawn of Christianity it
    was all but gone.
  • No chronicles of the Etruscan empire have ever
    been found, and to this day its origin remains
    shrouded in mystery. Were the Etruscans native
    Italians or were they immigrants? And if they
    were immigrants, where did they come from? Much
    of our knowledge of the Etruscans derives from
    archaeological investigations and anthropometric
    studies (for example, using body measurements to
    determine origins). (Source: Larsen and Marx,
    Statistics, 2001, p. 513.)
  • A team of archaeologists collected 84 skulls of
    Etruscan men and measured their head breadth (in
    mm). Let's assume that these 84 men are a random
    sample of Etruscan men. If the Etruscan men were
    native, it makes sense to think that the
    population average head breadth of Etruscans is
    comparable to the head breadth of modern
    Italians, 132.44 mm. This assumes evolution has
    not shifted average head size substantially over
    the last 2800 years, an assumption that is
    reasonably close to true.

26
Exploratory data analysis for Etruscans
27
Significance test
  • Step 1: Specify the null and alternative
    hypotheses
  • Claim: the true average breadth of Etruscan heads
    differs from 132.44
  • H0: µ = 132.44 vs. Ha: µ ≠ 132.44
  • Step 2: Compute a test statistic
  • The sample average is over 17 SEs away from the
    hypothesized average of 132.44
  • Step 3: Calculate the p-value
  • For all intents and purposes this p-value is
    zero. Why?
  • Step 4: Make a conclusion
  • There is enough evidence in the data to conclude
    that modern Italians and the Etruscans have
    different average head sizes.
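
To see why a p-value this extreme is treated as zero, here is a small sketch. The transcript does not give the exact test statistic, only that it is "over 17 SEs", so `z = 17.0` below is an assumed stand-in, and the normal curve is used as an approximation.

```python
from math import erfc, sqrt

# Assumed value: the slides say only that the sample average is
# "over 17 SEs" from 132.44, so 17 understates the real distance.
z = 17.0

# Two-sided p-value under a standard normal curve:
# 2 * (area beyond |z|), which equals erfc(|z| / sqrt(2)).
p_value = erfc(abs(z) / sqrt(2))

print(f"two-sided p-value for |z| = {z:g}: {p_value:.1e}")
```

Because the true statistic is farther out than 17, the actual p-value would be smaller still.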

28
A more wordy conclusion
  • It's practically impossible to observe a
    difference of 17 SEs by chance alone. Our
    initial assumption in the null hypothesis is very
    unlikely to be true. The data overwhelmingly
    suggest that modern Italians and the Etruscans
    have different average head sizes, indicating
    that Etruscans were not native to Italy.
  • For those interested, current theory is that
    Etruscans came from Asia. But, it remains a
    mystery how they got to Italy

29
Significance test using JMP
30
Example 1
  • A sample of 40 recovering alcoholics was given
    the State-Trait Inventory Test. The mean score of
    the 40 recovering alcoholics was 38 with a sample
    SD of 7. A psychologist suspected that
    recovering alcoholics in general had a higher
    mean score than the norm of 35. Do the sample
    data justify the suspicion?
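
One possible way to work Example 1 is sketched below. It assumes H0: µ = 35 against the one-sided alternative Ha: µ > 35 and uses a normal curve for the p-value; with n = 40, a t curve would give a slightly larger p-value. The variable names are ours.

```python
from math import erfc, sqrt


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * erfc(-z / sqrt(2))


n = 40              # sample size
sample_mean = 38.0  # mean test score in the sample
sample_sd = 7.0     # sample SD
mu0 = 35.0          # the norm, used as the null value

se = sample_sd / sqrt(n)
z = (sample_mean - mu0) / se
p_value = 1 - normal_cdf(z)   # right tail, because Ha is mu > 35

print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

Under these assumptions z is about 2.71 and the p-value is roughly 0.003, which the guidelines earlier in the deck would call very strong evidence against H0.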

31
Example 2
  • There was concern among health officials in a
    community that an unusually large percentage of
    babies with abnormally low birth weight were
    being born. Abnormally low birth weight here is
    defined as less than 88 ounces. A sample of 180
    births showed 14 babies with abnormally low birth
    weight. The proportion of births that the
    officials expect to be abnormally low is 5%. Do
    the data support the health officials' concern?
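
Example 2 can be sketched the same way. The code assumes H0: p = 0.05 against the one-sided alternative Ha: p > 0.05 and relies on the large-sample normal approximation; the variable names are ours.

```python
from math import erfc, sqrt


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * erfc(-z / sqrt(2))


n = 180        # births in the sample
low = 14       # babies with abnormally low birth weight
p0 = 0.05      # proportion the officials expect

p_hat = low / n
se = sqrt(p0 * (1 - p0) / n)   # SE of the sample proportion if H0 is true
z = (p_hat - p0) / se
p_value = 1 - normal_cdf(z)    # right tail, because Ha is p > 0.05

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

Under these assumptions z is about 1.71 and the p-value is roughly 0.044, moderate evidence by the deck's guidelines, and close enough to 0.05 that the later advice about fixed cut-offs applies.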

32
Statistical significance
  • To formalize testing further, some researchers
    advocate strict p-value cutoffs when deciding
    whether or not to reject null hypotheses.
  • Example: reject the null hypothesis when the
    p-value is less than 0.05. Otherwise, do not
    reject it.

33
Statistical significance
  • These cut-offs are called significance levels.
  • They are typically labeled with the Greek letter
    α (alpha).
  • Example: for a statistical significance level of
    0.05, we write
  • α = 0.05
  • When the null hypothesis is rejected, the term
    used to describe the outcome of the test is
    "statistically significant."
  • Made-up example with typical language:
  • We got a p-value of 0.036 and used α = 0.05. The
    results are statistically significant at the 0.05
    level.

34
My opinion about statistical significance
  • DO NOT RELY BLINDLY ON A FIXED CUT-OFF
  • Consider two p-values: 0.050001 and 0.049999.
  • These two p-values provide essentially the same
    amount of evidence against the null hypothesis.
  • But if we judge strictly by the 0.05 cut-off, we
    don't reject the null for 0.050001 and we do for
    0.049999.
  • Ridiculous, no? Consider p-values on their own
    merits.

35
Type I and Type II errors
  • Possible errors from decision to reject or not to
    reject the null hypothesis
  • Type I error: reject H0 when H0 is true
  • Type II error: fail to reject H0 when Ha is true
  • Hypothesis testing is not perfect. You never
    know if you are making one of these errors!
  • It is important to replicate studies whenever
    possible to guard against these errors

36
The role of sample size
  • The chance of making a Type I error does not
    depend on sample size; it is set by the
    significance level. (Sample size is already
    incorporated into the test statistic.)
  • The chance of making a Type II error decreases as
    sample size increases. (Be wary when using tests
    based on small sample sizes.)
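
A small simulation can illustrate both statements. Everything in it is an assumption made for the sketch: normally distributed data with known SD 1, a one-sided z-test at the 0.05 level, and a true mean of 0.3 when the null is false.

```python
import random
from math import sqrt

rng = random.Random(1)     # fixed seed so the sketch is repeatable
Z_CUTOFF = 1.645           # upper-tail cut-off for a 0.05 significance level
MU0, SIGMA = 0.0, 1.0      # hypothetical null mean and known SD
REPS = 4000                # simulated studies per setting


def rejection_rate(n, true_mean):
    """Fraction of simulated studies of size n that reject H0: mu = MU0."""
    rejections = 0
    for _ in range(REPS):
        sample_mean = sum(rng.gauss(true_mean, SIGMA) for _ in range(n)) / n
        z = (sample_mean - MU0) / (SIGMA / sqrt(n))
        rejections += z > Z_CUTOFF
    return rejections / REPS


# H0 true: every rejection is a Type I error.
type1 = {n: rejection_rate(n, MU0) for n in (10, 100)}
# H0 false (true mean 0.3): every failure to reject is a Type II error.
type2 = {n: 1 - rejection_rate(n, 0.3) for n in (10, 100)}

print("Type I error rate by n: ", type1)
print("Type II error rate by n:", type2)
```

In runs like this the Type I rate stays near 0.05 for both sample sizes, while the Type II rate drops sharply as n grows from 10 to 100.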

37
The role of sample size
  • When the hypothesized value is NOT very different
    from the actual value of the parameter, you need
    a large sample size to reduce the chance of a
    Type II error.
  • In many grant proposals, you have to justify the
    study size by methods that attempt to minimize
    the chance of Type II errors.
  • These methods are called power analyses.
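
A very simplified power analysis might look like the sketch below. It assumes a one-sided z-test with a known SD, and the specific numbers (null mean 35, true mean 36, SD 7, 80% power) are hypothetical; real power analyses depend on the study design.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()   # standard normal curve


def type2_error(n, mu0, mu_true, sd, alpha=0.05):
    """Chance of failing to reject H0: mu = mu0 when the mean is mu_true.

    Assumes Ha: mu > mu0, a z-test, and a known SD.
    """
    cutoff = std_normal.inv_cdf(1 - alpha)     # rejection cut-off for z
    shift = (mu_true - mu0) / (sd / sqrt(n))   # true difference in SE units
    return std_normal.cdf(cutoff - shift)


def sample_size_for_power(mu0, mu_true, sd, power=0.80, alpha=0.05):
    """Smallest n whose power reaches the target (requires mu_true > mu0)."""
    n = 2
    while 1 - type2_error(n, mu0, mu_true, sd, alpha) < power:
        n += 1
    return n


# Hypothetical planning values
n_needed = sample_size_for_power(mu0=35, mu_true=36, sd=7)
print("sample size needed:", n_needed)
for n in (40, 100, 400):
    print(n, round(type2_error(n, 35, 36, 7), 3))
```

The closer the true value is to the hypothesized one, the smaller `shift` becomes for a given n, so a larger sample is needed to reach the same power.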

38
The role of sample size
  • Inferences are always improved by obtaining as
    much (accurate and relevant) data as possible.
  • With large enough sample size, you can reject any
    false null hypothesis
  • However,

39
Practical vs. statistical significance
  • When you get a statistically significant result,
    consider whether it is practically significant.
  • If your sample size is large enough, you'll be
    able to detect a difference between the
    hypothesized value of a parameter and its true
    value if H0 is wrong.
  • But is this difference of practical significance?
  • Example of weight lifting study
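
The gap between statistical and practical significance can be illustrated with made-up numbers. The sketch assumes a true difference of 0.2 points on a scale whose SD is 7, and, for simplicity, that the observed difference equals the true one; the function name `z_for` is ours.

```python
from math import sqrt


def z_for(n, difference=0.2, sd=7.0):
    """z statistic when the observed difference equals the (tiny) true one."""
    return difference / (sd / sqrt(n))


z_small = z_for(40)        # modest sample
z_large = z_for(40_000)    # very large sample

print(f"n = 40:     z = {z_small:.2f}")
print(f"n = 40000:  z = {z_large:.2f}")
```

The difference is the same 0.2 points in both cases; only the sample size changes. Whether 0.2 points matters in practice is a subject-matter question that the test cannot answer.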

40
Dangers of excessive fishing
  • With enough hypothesis tests, you'll find
    something statistically significant.
  • Some of these statistically significant results
    may really be Type I errors.
  • Try to avoid excessive fishing for statistical
    significance. If you perform many tests, be sure
    to report how many you do. And, see if results
    are replicated in separate studies
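
A back-of-the-envelope calculation shows how quickly fishing produces false positives. It assumes the tests are independent and that every null hypothesis is true, which is a simplification; the function name is ours.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n_tests independent tests rejects a true H0."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20, 100):
    print(n_tests, round(chance_of_false_positive(n_tests), 3))
```

With 20 such tests at the 0.05 level, the chance of at least one "significant" result is about 0.64 even though nothing is going on, which is why the number of tests performed should be reported.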

41
Non-significant results
  • Failing to reject a null hypothesis is not a
    failed study
  • It is just as important to learn that a null
    hypothesis explains data well as it is to learn
    that it does not

42
Relationship between CI and hypothesis tests
  • You can use CIs like a hypothesis test
  • Example: Say your null hypothesis is H0: p =
    0.5.
  • If the 95% CI does not contain the null
    hypothesis value, e.g. (0.64, 0.70), then the
    two-sided test has p-value < 0.05
  • If the 95% CI contains the null hypothesis value,
    e.g. (0.47, 0.87), then the two-sided test has
    p-value > 0.05
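
The link between the interval and the test can be checked on made-up data. The counts below (670 successes in 1000 trials) are hypothetical, picked so that the interval comes out near (0.64, 0.70). Note that for proportions the CI uses an SE based on the sample proportion while the test uses an SE based on the null value, so the correspondence is approximate rather than exact.

```python
from math import erfc, sqrt


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * erfc(-z / sqrt(2))


successes, n = 670, 1000   # hypothetical data
p0 = 0.5                   # null hypothesis value

# 95% confidence interval, SE based on the sample proportion
p_hat = successes / n
se_ci = sqrt(p_hat * (1 - p_hat) / n)
ci_low, ci_high = p_hat - 1.96 * se_ci, p_hat + 1.96 * se_ci

# Two-sided test of H0: p = 0.5, SE based on the null value
se_null = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_null
p_value = 2 * normal_cdf(-abs(z))

ci_excludes_null = not (ci_low <= p0 <= ci_high)
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f}), excludes 0.5: {ci_excludes_null}")
print(f"two-sided p-value below 0.05: {p_value < 0.05}")
```

Here the interval excludes 0.5 and the two-sided p-value is below 0.05, so both approaches point the same way.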

43
CIs vs Hypothesis tests
  • Hypothesis tests can identify parameter values
    that are inconsistent with the data.
  • They do not specify parameter values that
    plausibly could have produced the data.
  • Confidence intervals do this. Hence, when given
    a choice use CIs over hypothesis tests.

44
Important caveat
  • A hypothesis test will not remedy a poorly
    designed study
  • Bad data yield unreliable p-values