Title: Psychology 203
1. Psychology 203
- Semester 1, 2007
- Week 6
- Lecture 11
2. Effect Size & Statistical Power
- Gravetter & Wallnau, Chapter 8
3. First, a few words on hypothesis testing
- Psychological science progresses by taking calculated risks
- Accept H1 and reject H0
- Reject H1 and accept H0
- But failing to reject H0 does not mean that H0 is true!
- Absence of evidence is not evidence of absence
- The concept of Type 1 and Type 2 errors
4. The Key Elements of Hypothesis Testing
- Hypothesised population parameter: the null hypothesis provides a specific value for the unknown population parameter, e.g. a value for the population mean.
- Sample statistic: a statistic that corresponds with the hypothesised parameter but is based on the sample data, e.g. the sample mean.
- Estimate of error: an estimate of the difference we can reasonably expect between any sample statistic (e.g. the sample mean) and the population parameter (e.g. the population mean), e.g. the Standard Error of the Mean (SEM).
- Test statistic: e.g. a z-score test.
- Alpha level: a criterion for interpreting the test statistic, aka the level of significance, e.g. α = .05. (These elements are tied together in the sketch below.)
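A minimal sketch of how these five elements fit together in a z-score test; the numbers are invented placeholders, not from the lecture:

```python
import math

mu_hypothesised = 100          # 1. hypothesised population parameter (from H0)
sample_mean = 104              # 2. sample statistic
sigma, n = 15, 36
sem = sigma / math.sqrt(n)     # 3. estimate of error (SEM) = 2.5
z = (sample_mean - mu_hypothesised) / sem   # 4. test statistic = 1.6
alpha, z_crit = 0.05, 1.96     # 5. alpha level and its critical value

print(z, abs(z) > z_crit)      # 1.6, False: fail to reject H0
```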
5. Criticisms of Hypothesis Testing
- An all-or-none decision
- e.g. z = 1.90 is not significant, whereas z = 2.0 is!
- C'mon! 1.90 is sooooooo close!
- OK, but we often have to draw a line in the sand somewhere when making decisions
6. More criticisms of Hypothesis Testing
- We can't answer the question we want to
- We can't say whether the null hypothesis is true or false
- E.g. assume Treatment X really always has an effect, however small.
- If so, then H0 is always false
- But we are unable to show that H0 is false (or to show that it is true)
- We can only show that H0 is very unlikely
- The question we can answer is about our data, not H0: i.e. how likely is the data we got, if H0 is true?
7. Yet more criticisms of Hypothesis Testing
- Significant does not necessarily mean significant!
- The manufacturers of Herbalift, a herbal supplement, insist that "University Tests" show that their product will give you significantly more energy in your daily life.
- Doesn't this guarantee you'll notice an increase in your energy levels?
- Not necessarily! The effect can be tiny and still be significant!
- Statistical significance does not mean an effect is big or important.
- Why is this so?
8. Testing Herbalift
- The average daily energy level of a UWA student aged between 18 and 25 is rated as 60, with an sd of 10
- A sample of 500 students try Herbalift for a month and their average daily energy level is 62
- Is this 2-point improvement significant?
9. z-test of Herbalift: "herbalift works!"
- μ = 60, σ = 10, M = 62, n = 500
- Calculate the standard error: σM = σ / √n = 10 / √500 ≈ 0.45
- Calculate z: z = (M − μ) / σM = 2 / 0.45 ≈ 4.44
- For alpha = .05, critical z = 1.96
- 4.44 > 1.96, therefore reject H0
- Conclude that our sample has significantly more energy than the untreated population (see the sketch below)
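A quick sketch of this z-test in Python, assuming the slide's values (μ = 60, σ = 10, M = 62, n = 500); note the slide's 4.44 reflects rounding the SEM to 0.45 before dividing:

```python
import math

mu, sigma, M, n = 60, 10, 62, 500

sem = sigma / math.sqrt(n)   # standard error of the mean, ~0.447
z = (M - mu) / sem           # ~4.47 (4.44 if SEM is first rounded to 0.45)
z_crit = 1.96                # two-tailed critical value at alpha = .05

print(f"SEM = {sem:.2f}, z = {z:.2f}")
print("reject H0" if abs(z) > z_crit else "fail to reject H0")  # reject H0
```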
10. z-test of Herbalift, smaller sample: "herbalift works!"
- μ = 60, σ = 10, M = 62, n = 20
- Calculate the standard error: σM = σ / √n = 10 / √20 ≈ 2.24
- Calculate z: z = (M − μ) / σM = 2 / 2.24 ≈ 0.89
- For alpha = .05, critical z = 1.96
- 0.89 < 1.96, therefore fail to reject H0
- We have no evidence that the sample who took Herbalift differ in energy levels from the untreated population
- No supporting evidence (rerun in the sketch below)
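The same test rerun with the smaller sample, under the same assumptions:

```python
import math

mu, sigma, M, n = 60, 10, 62, 20

sem = sigma / math.sqrt(n)   # 10 / sqrt(20) ~= 2.24
z = (M - mu) / sem           # 2 / 2.24 ~= 0.89

print("reject H0" if abs(z) > 1.96 else "fail to reject H0")  # fail to reject H0
```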
11. Significance & Sample Size
- The exact same sized effect (e.g. 2 points) can be significant or not, depending on the size of the sample used to test the hypothesis
- Any effect, no matter how small and trivial, can be significant if you have a big enough sample (see the sketch below)
- Statistical significance is not a reliable guide to the real size of an effect.
- It's now recommended (APA) that researchers report a measure of effect size along with any test of significance
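One way to see this concretely: hold the 2-point Herbalift effect fixed and grow the sample until the z-test flips to "significant". A minimal sketch; the crossover point follows from the slides' numbers, not from the lecture itself:

```python
import math

mu, sigma, M = 60, 10, 62   # the fixed 2-point effect

n = 1
while abs(M - mu) / (sigma / math.sqrt(n)) <= 1.96:
    n += 1

print(n)   # 97: from n = 97 onwards the same tiny effect is "significant"
```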
12. Measuring Effect Size
- Think back to Herbalift. We found a 2-point improvement (a 2-point difference between the population and sample means).
- Isn't this the size of the effect?
- Yeah, but remember it's hard to interpret mean differences without taking into account the variation (sd) as well.
- Jacob Cohen (1988) recommended standardizing effect size in much the same way z-scores standardize raw scores
13. Simplest Measure of Effect Size
- Cohen's d = (difference between means) / (standard deviation)
- The bigger the difference, the bigger the effect size
- The more variation (bigger sd), the smaller the effect size
14. Cohen's d is not affected by sample size
- Calculate the effect size for Herbalift
- Cohen's d = (62 − 60) / 10 = 0.2
- Since both the n = 500 and the n = 20 samples had the same mean & sd, they also have the same effect size (sketched below)
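In code, the sample size never enters the formula, which is the whole point. A minimal sketch using the slide's values:

```python
mu, sigma, M = 60, 10, 62

d = (M - mu) / sigma   # Cohen's d = 2 / 10 = 0.2
print(d)               # 0.2 for the n = 500 sample and the n = 20 sample alike
```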
15. Interpreting Effect Size
- OK, but what is considered a big effect?
16. Cohen's d & Distribution Overlap
- a) [Figure: pairs of overlapping normal distributions, e.g. σ = 10, μ1 = 100, μ2 = 115]
- d measures the separation between two distributions (benchmarks for interpreting it are sketched below)
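For interpretation, Cohen (1988) also offered conventional benchmarks: roughly 0.2 = small, 0.5 = medium, 0.8 = large. A sketch applying them; the cut-offs are Cohen's rules of thumb, not values from these slides:

```python
def label_d(d):
    """Classify an effect size by Cohen's (1988) conventional benchmarks."""
    d = abs(d)
    if d < 0.2:
        return "trivial"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"

print(label_d(0.2))   # "small": the Herbalift effect sits right at the small boundary
```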
17. Statistical Power
- Another, less obvious, way to determine the strength of an effect is to measure the power of the statistical test
- Power is the probability that the test will correctly reject a false null hypothesis
- In other words, the probability that the test will tell you there's an effect when there really is an effect
- We usually calculate power before conducting a study, to make sure we won't be wasting time & money, i.e. that we have a good chance of detecting an effect if it exists (sketched below)
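Power can be computed directly for this kind of z-test. A hedged sketch, assuming scipy is available and treating the observed 2-point Herbalift difference as if it were the true effect:

```python
import math
from scipy.stats import norm

def power_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-tailed one-sample z-test when the true mean is mu1."""
    sem = sigma / math.sqrt(n)
    z_crit = norm.ppf(1 - alpha / 2)    # 1.96 for alpha = .05
    delta = (mu1 - mu0) / sem           # true shift, in SEM units
    # probability the sample mean lands in either rejection region
    return norm.cdf(delta - z_crit) + norm.cdf(-delta - z_crit)

print(power_z(60, 62, 10, 500))   # ~0.99: almost certain to detect the effect
print(power_z(60, 62, 10, 20))    # ~0.15: very likely to miss it
```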
18. Type 1 & Type 2 Errors: the probability of being wrong
- Type 1 error (α): rejecting H0 when it is true
- The criterion, e.g. p < .05, sets the Type 1 error rate, e.g. 5%
- So the probability of making the right decision is 1 − α, e.g. 95%
- Type 2 error (β): failing to reject H0 when it is false
- Determining the Type 2 error rate is less straightforward
- It depends on the number of subjects, the effect size and the α level
- The probability of making the right decision (correctly rejecting H0) is 1 − β
- i.e. 1 − β = the power of the test (both rates are simulated below)
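Both error rates can also be checked by simulation. A minimal Monte Carlo sketch, assuming numpy and reusing the Herbalift numbers as the "true" state of the world:

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, sigma, n, z_crit, trials = 60, 10, 500, 1.96, 100_000
sem = sigma / np.sqrt(n)

def rejection_rate(true_mu):
    """Fraction of simulated experiments in which H0 (mu = mu0) is rejected."""
    means = rng.normal(true_mu, sem, trials)
    return (np.abs((means - mu0) / sem) > z_crit).mean()

print(rejection_rate(60))       # ~0.05 = alpha, the Type 1 error rate
print(1 - rejection_rate(62))   # ~0.006 = beta; power = 1 - beta ~ 0.99
```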
19. Why power matters
- Trialling a new cancer drug
- The researchers were blind to the benefits of the new drug because their statistical test was not sufficiently sensitive to the effect (however small) the drug was having.
- Power analysis helps you avoid this unfortunate scenario
- In other words, if H0 is false we want to be able to conduct an experiment that has a good chance of leading us to this conclusion
- We can quantify this chance
20. An underpowered experiment
- The previous example was of an underpowered experiment
- We may miss an experimental effect (commit a Type 2 error), as more sample means would be expected to fall short of the critical value set by α
- Underpowered experiments usually involve a small effect for the n
- i.e. the expected effect size (mean difference or extent of covariation) was small relative to the number of participants in the study
21. An overpowered experiment
- If an experiment is overpowered we run the risk of making the opposite mistake
- i.e. deciding that there is an effect and H0 should be rejected (a Type 1 error) when H0 is actually true
- By overpowered we usually mean that the sample size was so large that trivial differences were treated as being significant
- Triviality is determined by the research context, e.g. by what counts as a clinically significant effect
22. Statistical Power
- Original population: μ = 80, σ = 10; with an 8-point effect: μ = 88, σ = 10; if H0 is true (no effect): μ = 80, σ = 10; sample size n = 25
- [Figure: the two sampling distributions of the mean (centred on 80 and 88, x-axis ticks at 76, 80, 84, 88) with rejection regions beyond z = ±1.96; the treated distribution lies almost entirely in the upper rejection region]
- Probability of rejecting H0 is very high, almost 100%
- So the test has excellent power
- And we will almost certainly find an effect if it is there! (verified in the sketch below)
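The slide's "almost 100%" can be reproduced analytically. A sketch assuming scipy, using the slide's parameters:

```python
import math
from scipy.stats import norm

mu0, mu1, sigma, n = 80, 88, 10, 25

sem = sigma / math.sqrt(n)    # = 2
delta = (mu1 - mu0) / sem     # the 8-point effect is 4 SEMs wide

power = norm.cdf(delta - 1.96) + norm.cdf(-delta - 1.96)
print(round(power, 3))        # ~0.979: "almost 100%"
```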
23. Factors Affecting Statistical Power: Effect Size
- Large effects are easy to detect, since the sampling distributions of the means will not overlap by so much
- If the effect size is small then the distributions will overlap to a much greater degree
- And the more the distributions of the sample means overlap, the greater the probability that the mean for the treatment group will fall short of our critical value (criterion), i.e. we will fail to find the effect
- The bigger the effect size, the greater the power (illustrated below)
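Varying only the effect size in the same calculation makes the relationship explicit. A sketch assuming scipy; the 2- and 4-point effects are added for contrast, only the 8-point one is from the slides:

```python
import math
from scipy.stats import norm

sigma, n = 10, 25

for effect in (2, 4, 8):
    delta = effect / (sigma / math.sqrt(n))
    power = norm.cdf(delta - 1.96) + norm.cdf(-delta - 1.96)
    print(effect, round(power, 2))   # 2 -> 0.17, 4 -> 0.52, 8 -> 0.98
```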
24. Factors Affecting Statistical Power: Sample Size
- Original population: μ = 80, σ = 10; with an 8-point effect: μ = 88, σ = 10; if H0 is true (no effect): μ = 80, σ = 10; sample size n = 4
- [Figure: with n = 4 the two sampling distributions (centred on 80 and 88, x-axis ticks at 76, 80, 88) overlap heavily; rejection regions beyond z = ±1.96]
- Probability of rejecting H0 is much less, around 50%
- So the test has much less power
- And we might fail to find an effect that is really there
- So the bigger the sample size, the greater the power
- Smaller n means a greater SEM
- See the Gravetter text (7th ed., pp. 260-261) for how to calculate exact power (a sketch follows below)
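And varying only n, in a sketch under the same assumptions (the exact figures depend on tail choice; these are for a two-tailed z-test):

```python
import math
from scipy.stats import norm

mu0, mu1, sigma = 80, 88, 10

for n in (4, 10, 25, 100):
    delta = (mu1 - mu0) / (sigma / math.sqrt(n))
    power = norm.cdf(delta - 1.96) + norm.cdf(-delta - 1.96)
    print(n, round(power, 2))   # 4 -> 0.36, 10 -> 0.72, 25 -> 0.98, 100 -> 1.0
```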
25. Other factors affecting Power
- Effect size
- Sample size
- Alpha level
- Reducing alpha (e.g., from p < .05 to p < .01) reduces the power
- The more extreme criterion moves the critical value further into the sampling distribution of the treatment means
- So more overlap in the sampling distributions
- Number of tails
- A one- rather than two-tailed test will increase power
- The critical value will move away from the sampling distribution of the treatment mean (less overlap)
- See the text and http://www.socialresearchmethods.net/kb/power.htm (both levers are sketched below)
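Both remaining levers can be folded into one function. A hedged sketch assuming scipy, using the n = 4, 8-point case from the previous slide:

```python
import math
from scipy.stats import norm

def power_z(effect, sigma, n, alpha=0.05, tails=2):
    """Power of a one-sample z-test; tails is 1 or 2."""
    z_crit = norm.ppf(1 - alpha / tails)
    delta = effect / (sigma / math.sqrt(n))
    power = norm.cdf(delta - z_crit)
    if tails == 2:
        power += norm.cdf(-delta - z_crit)   # the (tiny) lower rejection region
    return power

print(power_z(8, 10, 4, alpha=0.05, tails=2))   # ~0.36, the baseline
print(power_z(8, 10, 4, alpha=0.01, tails=2))   # ~0.16: stricter alpha costs power
print(power_z(8, 10, 4, alpha=0.05, tails=1))   # ~0.48: one tail buys power back
```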
26. Using Power to make sure your experiment isn't a waste of time
- It's a good idea not to embark upon an underpowered or overpowered experiment
- How can you do this?
- Usually by working out the sample size you need to ensure your experiment is sufficiently powered
27. Finding out the sample size you need: 1. The hard way
- Decide how crucial it is that you find an effect if it's there
- i.e. how much power you need, e.g. 80%
- Decide your alpha level, e.g. .05
- Estimate your effect size, e.g. d = 0.5
- Look up a delta (δ) table
- What the bleep is a delta table?
28. [The slide shows a delta (δ) table; no text transcript available]
29. Use this formula to find out the sample size you need
- The formula changes depending on the statistic you intend to use.
- This is the formula if your statistic is a one-sample t-test:
- 1. n = (δ / d)²
- 2. For 80% power at a two-tailed α of .05, the δ table gives δ = 2.80, so n = (2.80 / 0.5)² = 31.36
- 3. Rounding up: we need 32 participants (worked in code below)
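The same arithmetic as a sketch; δ = 2.80 is the standard table value for 80% power at a two-tailed α of .05:

```python
import math

delta = 2.80   # table value: power = .80, two-tailed alpha = .05
d = 0.5        # the estimated effect size

n = math.ceil((delta / d) ** 2)   # (2.80 / 0.5)^2 = 31.36, rounded up
print(n)                          # 32 participants
```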
30. Finding out the sample size you need: 2. The easy way
- Try an online power calculator
- Check out the various calculators listed at http://statpages.org/Power
- Note also that it's not always easy to estimate the effect size before you do an experiment (see the sketch below)
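In the same spirit as the online calculators, a library can do the lookup for you. A sketch assuming statsmodels is installed; note the exact t-test asks for slightly more participants than the z-based table shortcut:

```python
from statsmodels.stats.power import TTestPower

# One-sample t-test: d = 0.5, alpha = .05 two-tailed, power = .80
n = TTestPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                             alternative='two-sided')
print(n)   # ~33.4 -> 34 participants (vs 32 from the delta-table method)
```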