Psychology 203 - Transcript and Presenter's Notes
1
Psychology 203
  • Semester 1, 2007
  • Week 6
  • Lecture 11

2
Effect Size & Statistical Power
  • They matter, trust me

Gravetter & Wallnau, Chapter 8
3
First, a few words on hypothesis testing
  • Psychological science progresses by taking
    calculated risks
  • Accept H1 and reject H0
  • Reject H1 and accept H0
  • But failing to reject H0 does not mean that H0 is
    true!
  • Absence of evidence is not evidence of absence
  • Concept of type 1 and type 2 errors

4
The Key Elements of Hypothesis Testing
  • Hypothesised population parameter: the null
    hypothesis provides a specific value for the
    unknown population parameter, e.g. a value for
    the population mean.
  • Sample statistic: a statistic that corresponds
    with the hypothesised parameter but is based on
    the sample data, e.g. the sample mean.
  • Estimate of error: an estimate of the difference
    we can reasonably expect between any sample
    statistic (e.g. sample mean) and the population
    parameter (e.g. population mean), e.g. the
    Standard Error of the Mean (SEM).
  • Test statistic, e.g. a z-score test.
  • Alpha level: a criterion for interpreting the
    test statistic, aka the level of significance,
    e.g. α = 0.05.

5
Criticisms of Hypothesis Testing
  • An all-or-none decision
  • e.g. z = 1.90 is not significant, whereas z = 2.0
    is!
  • C'mon! 1.90 is sooooooo close!
  • OK, but we often have to draw a line in the sand
    somewhere when making decisions

6
More criticisms of Hypothesis Testing
  • We can't answer the question we want to
  • We can't say whether the null hypothesis is true
    or false
  • E.g. assume Treatment X really always has an
    effect, however small
  • If so, then H0 is always false
  • But we are unable to show that H0 is false (or to
    show that it is true)
  • We can only show that H0 is very unlikely
  • The question we can answer is about our data, not
    H0, i.e. how likely are the data we got, if H0 is
    true

7
Yet more criticisms of Hypothesis Testing
  • "Significant" does not necessarily mean
    significant!
  • The manufacturers of Herbalift, a herbal
    supplement, insist that University tests show
    that their product will give you "significantly
    more energy in your daily life"
  • Doesn't this guarantee you'll notice an increase
    in your energy levels?
  • Not necessarily! The effect can be tiny and still
    be significant!
  • Statistical significance does not mean an effect
    is big or important.
  • Why is this so?

8
Testing Herbalift
  • The average daily energy level of a UWA uni
    student aged between 18 and 25 is rated as 60,
    with an sd of 10
  • A sample of 500 students try Herbalift for a
    month, and their average daily energy level is
    62
  • Is this 2-point improvement significant?

9
z-test of Herbalift
(H1: Herbalift works!)
  • μ = 60, σ = 10, M = 62, n = 500
  • Calculate the standard error: SEM = σ/√n =
    10/√500 ≈ 0.45
  • Calculate z: z = (M − μ)/SEM = (62 − 60)/0.45 ≈
    4.44
  • For α = .05, critical z = 1.96
  • 4.44 > 1.96, therefore reject H0
  • Conclude our sample has significantly more
    energy than the untreated population

10
z-test of Herbalift, smaller sample
(H1: Herbalift works!)
  • μ = 60, σ = 10, M = 62, n = 20
  • Calculate the standard error: SEM = σ/√n =
    10/√20 ≈ 2.24
  • Calculate z: z = (M − μ)/SEM = (62 − 60)/2.24 ≈
    0.89
  • For α = .05, critical z = 1.96
  • 0.89 < 1.96, therefore fail to reject H0
  • We have no evidence that the sample who took
    Herbalift differ in energy levels from the
    untreated population

No supporting evidence
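The two z-tests above differ only in n. A short Python sketch of the
same arithmetic (the helper name z_test is mine; values are from the
slides):

    from math import sqrt

    def z_test(mu, sigma, m, n, z_crit=1.96):
        """One-sample z-test of whether sample mean m differs from mu."""
        sem = sigma / sqrt(n)       # standard error of the mean
        z = (m - mu) / sem          # test statistic
        return z, abs(z) > z_crit   # significant at alpha = .05, two-tailed?

    for n in (500, 20):
        z, sig = z_test(mu=60, sigma=10, m=62, n=n)
        print(f"n = {n}: z = {z:.2f}, reject H0: {sig}")
    # n = 500: z = 4.47, reject H0: True
    #   (the slide rounds SEM to 0.45 first, giving z = 4.44)
    # n = 20:  z = 0.89, reject H0: False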
11
Significance & Sample Size
  • The exact same sized effect (e.g. 2 points) can
    be significant or not, depending on the size of
    the sample used to test the hypothesis
  • Any effect, no matter how small and trivial, can
    be significant if you have a big enough sample
  • Statistical significance is not a reliable guide
    to the real size of an effect
  • It's now recommended (APA) that researchers
    report a measure of effect size along with any
    test of significance

12
Measuring Effect Size
  • Think back to Herbalift. We found a 2
    improvement (2 difference between population
    sample mean).
  • Isnt this the size of the effect?
  • Yeah, but remember its hard to interpret mean
    differences without taking into account the
    variation (sd) as well.
  • Jacob Cohen (1988) recommended standardizing
    effect size in much the same way z-scores
    standardize raw scores

13
Simplest Measure of Effect Size

Cohen's d = (difference between means) / (standard deviation)

The bigger the difference, the bigger the effect
size; the more variation (bigger sd), the smaller
the effect size.
14
Cohen's d is not affected by sample size
  • Calculate the effect size for Herbalift
  • Cohen's d = (62 − 60) / 10 = 0.2
  • Since both the n = 500 and the n = 20 samples had
    the same mean & sd, they also have the same
    effect size

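The same calculation in Python (the function name cohens_d is mine):

    def cohens_d(mean_difference, sd):
        """Cohen's d: mean difference standardized by the sd."""
        return mean_difference / sd

    # Identical for the n = 500 and n = 20 Herbalift samples:
    print(cohens_d(62 - 60, 10))  # 0.2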
15
Interpreting Effect Size
  • OK, but what is considered a big effect?
  • Cohen's (1988) rough benchmarks: d ≈ 0.2 is
    small, d ≈ 0.5 is medium, d ≈ 0.8 is large

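As a sketch, those conventions as a small lookup (the cutoffs are
Cohen's 1988 benchmarks; the function itself is illustrative):

    def label_effect(d):
        """Rough verbal label for a Cohen's d value (Cohen, 1988)."""
        d = abs(d)
        if d < 0.2:
            return "negligible"
        elif d < 0.5:
            return "small"
        elif d < 0.8:
            return "medium"
        return "large"

    print(label_effect(0.2))  # "small": Herbalift sits at the
                              # bottom of Cohen's scale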
16
Cohen's d & Distribution Overlap
[Figure: pairs of overlapping normal distributions,
e.g. μ1 = 100 and μ2 = 115, illustrating more and
less overlap]
d measures the separation between two
distributions
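For two normal distributions with a common sd, the proportion of
overlap falls as d grows. This is not from the slides, but the standard
overlapping-coefficient formula OVL = 2·Φ(−d/2) makes the idea
concrete:

    from scipy.stats import norm

    def overlap(d):
        """Overlap proportion of two equal-sd normals separated by d sds."""
        return 2 * norm.cdf(-abs(d) / 2)

    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: {overlap(d):.0%} overlap")
    # d = 0.2: 92% overlap; d = 0.5: 80%; d = 0.8: 69%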
17
Statistical Power
  • Another, less obvious, way to determine the
    strength of an effect is to measure the power of
    the statistical test
  • Power is the probability that the test will
    correctly reject a false null hypothesis
  • In other words, the probability that the test
    will tell you there's an effect when there really
    is an effect
  • We usually calculate power before conducting a
    study, to make sure we won't be wasting time &
    money, i.e. that we have a good chance of
    detecting an effect if it exists

18
Type 1 & Type 2 Errors: The probability of being
wrong
  • Type 1 error (α): rejecting H0 when it is true
  • The criterion, e.g. p < .05, sets the Type 1
    error rate, e.g. 5%
  • So the probability of making the right decision
    is 1 − α, e.g. 95%
  • Type 2 error (β): failure to reject H0 when it
    is false
  • Determination of the Type 2 error rate is less
    straightforward
  • It depends on the number of subjects, the effect
    size and the α level
  • The probability of making the right decision
    (correctly rejecting H0) is 1 − β
  • i.e. 1 − β = the power of the test

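Both error rates can be checked by simulation. A sketch (mine, not from
the lecture) using the Herbalift numbers, with a true 2-point effect
and n = 20:

    import numpy as np

    rng = np.random.default_rng(0)
    mu0, sigma, n, z_crit, trials = 60, 10, 20, 1.96, 100_000

    def rejection_rate(true_mean):
        """Proportion of simulated samples where |z| exceeds z_crit."""
        samples = rng.normal(true_mean, sigma, size=(trials, n))
        z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
        return np.mean(np.abs(z) > z_crit)

    alpha = rejection_rate(60)   # H0 true: Type 1 rate, ~0.05
    power = rejection_rate(62)   # H0 false: 1 - beta, only ~0.14 here
    print(f"alpha ≈ {alpha:.3f}, power ≈ {power:.3f}, beta ≈ {1 - power:.3f}")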
19
Why power matters
  • Trialling a new cancer drug
  • The researchers were blind to the benefits of the
    new drug because their statistical test was not
    sufficiently sensitive to the effect (however
    small) the drug was having.
  • Power analysis helps you avoid this unfortunate
    scenario
  • In other words, if H0 is false we want to be able
    to conduct an experiment that has a good chance
    of leading us to this conclusion
  • We can quantify this chance

20
An underpowered experiment
  • The previous example was of an underpowered
    experiment
  • We may miss an experimental effect (commit a Type
    2 error), as more sample means would be expected
    to fall below the critical value
  • Underpowered experiments usually involve a small
    effect for the n
  • i.e. the expected effect size (mean difference or
    extent of covariation) was small
  • relative to the number of participants in the
    study

21
An overpowered experiment
  • If an experiment is overpowered, we run the risk
    of making the opposite mistake
  • i.e. deciding that there is an effect and H0
    should be rejected (Type 1 error) when H0 is
    actually true
  • By overpowered we usually mean that the sample
    size was so large that trivial differences were
    treated as being significant
  • Triviality is determined by the research context
    e.g. clinically significant effects

22
Statistical Power
Original pop'n: μ = 80, σ = 10
With 8-point effect: μ = 88, σ = 10
If H0 is true (no effect): μ = 80, σ = 10
n = 25
  • Probability of rejecting H0 is very high, almost
    100%
  • So the test has excellent power
  • And we will almost certainly find an effect if it
    is there!
[Figure: the two sampling distributions for n = 25,
with the reject-H0 regions beyond z = ±1.96; nearly
all of the treated distribution falls in the
rejection region]
23
Factors Affecting Statistical Power - Effect Size
  • Large effects are easy to detect, since the
    sampling distributions of the means will not
    overlap by much
  • If the effect size is small, then the
    distributions will overlap to a much greater
    degree
  • And the more the distributions of the sample
    means overlap, the greater the probability that
    the mean for the treatment group will fall below
    the critical value (criterion), i.e. we will fail
    to find the effect
  • The bigger the effect size, the greater the power

24
Factors Affecting Statistical Power - Sample Size
  • Probability of rejecting H0 is much less, around
    50%
  • So the test has much less power
  • And we might fail to find an effect that is
    really there
  • So the bigger the sample size, the greater the
    power

Original pop'n: μ = 80, σ = 10
With 8-point effect: μ = 88, σ = 10
If H0 is true (no effect): μ = 80, σ = 10
n = 4
Smaller n means greater SEM
[Figure: the same sampling distributions for n = 4;
the larger SEM makes them overlap far more]
See Gravetter text (7th ed., pp. 260-261) for how
to calculate exact power.
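The exact power calculation the text points to can be sketched with
normal-theory arithmetic (my formula choice; the slide's "around 50%"
for n = 4 suggests a one-tailed criterion, so both are shown):

    from math import sqrt
    from scipy.stats import norm

    def power(effect, sigma, n, z_crit):
        """P(sample mean lands in the upper rejection region), effect real."""
        shift = effect / (sigma / sqrt(n))   # effect size in SEM units
        return 1 - norm.cdf(z_crit - shift)

    for n in (25, 4):
        two = power(8, 10, n, z_crit=1.96)    # two-tailed, alpha = .05
        one = power(8, 10, n, z_crit=1.645)   # one-tailed, alpha = .05
        print(f"n = {n}: power ≈ {two:.2f} (two-tailed), {one:.2f} (one-tailed)")
    # n = 25: ~0.98 / ~0.99 - "almost 100%"
    # n = 4:  ~0.36 / ~0.48 - roughly the "around 50%" on the slide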
25
Other factors affecting Power
  • Effect size
  • Sample size
  • Alpha level
  • Reducing alpha (e.g., from p < .05 to p < .01)
    reduces the power
  • The more extreme criterion moves the critical
    value further into the sampling distribution of
    the treatment means
  • So more overlap in the sampling distributions
  • Number of tails
  • A one-tailed rather than two-tailed test will
    increase power
  • The critical value moves away from the sampling
    distribution of the treatment mean (less overlap)
  • See text and
    http://www.socialresearchmethods.net/kb/power.htm
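The same normal-theory sketch, parameterized by alpha and tails, shows
both effects numerically (illustrative; the slides' 8-point effect with
σ = 10 and n = 4):

    from math import sqrt
    from scipy.stats import norm

    def power(effect=8, sigma=10, n=4, alpha=0.05, tails=2):
        """Normal-theory power for an upper-tail effect."""
        z_crit = norm.ppf(1 - alpha / tails)   # criterion for significance
        shift = effect / (sigma / sqrt(n))
        return 1 - norm.cdf(z_crit - shift)

    print(f"{power():.2f}")             # alpha=.05, two-tailed: ~0.36
    print(f"{power(alpha=0.01):.2f}")   # alpha=.01: ~0.16 - reduced power
    print(f"{power(tails=1):.2f}")      # one-tailed: ~0.48 - increased power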

26
Using Power to make sure your experiment isn't a
waste of time
  • It's a good idea not to embark upon an
    underpowered or overpowered experiment
  • How can you do this?
  • Usually by working out the sample size you need
    to ensure your experiment is sufficiently powered

27
Finding out the sample size you need:
1. The hard way
  • Decide how crucial it is that you find an effect
    if it's there
  • i.e. how much power you need, e.g. 80%
  • Decide your alpha level, e.g. .05
  • Estimate your effect size, e.g. d = 0.5
  • Look up a delta (δ) table
  • What the bleep is a delta table?

28
(No transcript - this slide showed a delta table)
29
Use this formula to find out the sample size you
need
The formula changes depending on the statistic you
intend to use. This is the formula if your
statistic is a one-sample t-test:
1. n = (δ/d)²
2. For 80% power at α = .05 (two-tailed), the delta
   table gives δ = 2.80; with d = 0.5,
   n = (2.80/0.50)² ≈ 31.4
3. Rounding up: we need 32 participants
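The same arithmetic in Python (a sketch; delta is built from standard
normal quantiles, the usual approximation behind the delta table):

    from math import ceil
    from scipy.stats import norm

    def sample_size(d, power=0.80, alpha=0.05):
        """Approximate n for a one-sample test via n = (delta/d)^2."""
        delta = norm.ppf(1 - alpha / 2) + norm.ppf(power)  # ~2.80 here
        return ceil((delta / d) ** 2)

    print(sample_size(d=0.5))  # 32 participants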
30
Finding out the sample size you need:
2. The easy way
  • Try an online power calculator
  • Check out the various calculators listed at
    http://statpages.org/Power
  • Note also that it's not always easy to estimate
    the effect size before you do an experiment