Sample Sizes for IE - PowerPoint PPT Presentation

About This Presentation

Title:

Sample Sizes for IE

Slides: 17

Provided by: Lori2180

Learn more at: http://cega.berkeley.edu

Transcript and Presenter's Notes

Title: Sample Sizes for IE

1
Sample Sizes for IE

Power Calculations

2
Overview

General question: How large does the sample need
to be to credibly detect a given effect size?
What does "credibly" mean here?
We can be reasonably sure that the difference
between the treatment group and the comparison
group is due to the program
Randomization removes bias, but it does not
remove noise. To reduce noise, we need a large
sample size. But how large is large?

3
Measuring Impact

At the end of an experiment, we will compare the
outcome of interest in the treatment and the
comparison groups.
We are interested in the difference:
Effect size = Mean in treatment - Mean in control
For example: mean malaria prevalence in
villages with insecticide-treated net (ITN)
distribution vs. mean malaria prevalence in
villages with no ITNs
To draw conclusions from that effect size, we
need it to be estimated precisely, since
there is always variability in the data
If many other unobserved factors affect
outcomes, it is harder to say whether
the treatment had an effect

4
Precise outcomes
5
Some noise
6
Very noisy
7
Confidence Intervals

We only work with data that is a sample of the
population. To assess whether an estimate is
valid for the entire population, we need a
measure of reliability
A 95% confidence interval for an effect size
is built so that, for 95% of the samples that we
could have drawn from the same population, the
interval would contain the true effect.
The standard error (SE) of the estimate in the
sample captures both the size of the sample and
the variability of the outcome:
it is larger with a small sample and with a
variable outcome
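
As a rough illustration of the two ideas above, the sketch below computes a difference in means, its standard error, and a 95% confidence interval. The village prevalence figures are made up, the function name is hypothetical, and the interval uses a normal approximation (with this few villages a t distribution would be more appropriate).

```python
import math
from statistics import NormalDist, mean, variance


def diff_in_means_ci(treatment, control, confidence=0.95):
    """Return (effect, standard_error, (low, high)) for treatment minus control."""
    effect = mean(treatment) - mean(control)
    # The SE shrinks with larger samples and grows with more variable outcomes.
    se = math.sqrt(variance(treatment) / len(treatment)
                   + variance(control) / len(control))
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return effect, se, (effect - z * se, effect + z * se)


# Hypothetical malaria prevalence (share of residents infected) per village.
treated_villages = [0.10, 0.12, 0.08, 0.11, 0.09, 0.13]
control_villages = [0.18, 0.15, 0.20, 0.16, 0.19, 0.17]

effect, se, (low, high) = diff_in_means_ci(treated_villages, control_villages)
print(f"effect={effect:.3f}, se={se:.4f}, 95% CI=({low:.3f}, {high:.3f})")
```

In this made-up example the whole interval lies below zero, which is what "detecting" a reduction in prevalence would look like.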

8
Two Types of Errors

First type of error: conclude that there is an
effect when in fact there is no effect.
The level of your test is the probability that
you will falsely conclude that the program has
an effect, when in fact it does not.
So with a level of 5%, you can be 95% confident
in the validity of your conclusion that the
program had an effect.
To be confident, set the level at 5%, 10%, or 1%
A rule of thumb is that if the effect size is more
than twice the standard error, you can conclude
with more than 95% certainty that the program had
an effect
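
The "twice the standard error" rule of thumb can be checked with a short sketch. It assumes the estimate is approximately normally distributed; the function name is hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(effect, standard_error):
    """Two-sided p-value for the null of no effect, normal approximation."""
    z = abs(effect) / standard_error
    return 2 * (1 - NormalDist().cdf(z))


# An effect exactly twice its standard error falls just under the 5% level.
p_at_two_se = two_sided_p_value(effect=2.0, standard_error=1.0)
# An effect equal to its standard error is far from significant at 5%.
p_at_one_se = two_sided_p_value(effect=1.0, standard_error=1.0)
print(p_at_two_se < 0.05, p_at_one_se < 0.05)
```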

9
Two Types of Errors

Second type of error: you fail to reject that the
program had no effect, when in fact it does have
an effect.
The power of a test is the probability of finding
a significant effect in the RCT when the program
truly has an effect
Only with a significant effect can you cleanly
influence policy
Power Calculations are a tool to see how likely
we are to find a significant effect for a given
sample size
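
A minimal sketch of such a calculation, assuming a simple two-arm design with equal group sizes, individual-level randomization, a known outcome standard deviation, and a normal approximation. The function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def power_two_arm(effect, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided test comparing two group means."""
    # Standard error of the difference in means with n_per_arm in each group.
    se = sd * math.sqrt(2 / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of a significant result in the wrong direction.
    return NormalDist().cdf(abs(effect) / se - z_alpha)


# Power to detect an effect of 0.2 standard deviations at several sample sizes.
for n in (100, 200, 400, 800):
    print(n, round(power_two_arm(effect=0.2, sd=1.0, n_per_arm=n), 2))
```

Running the loop shows power rising with the sample size for a fixed effect and level.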

10
What you Need for a Power Calculation
Significance level
- This is often conventionally set at 5%.
- At lower levels (a false positive is less likely), we need a larger sample to detect the effect
Power level
- A power level of 80% says that 80% of the time, if there is a true effect, you will be able to detect it in a given sample
- Larger sample: more power
The mean and the variability of the outcome in the comparison group
- From previous surveys conducted in similar settings
- The larger the variability, the larger the sample needed for a given power
The effect size that we want to detect
- What is the smallest effect that should prompt a policy response?
- The smaller the expected effect size, the larger the sample size needed
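
The four ingredients above combine in a standard textbook approximation for a two-arm comparison of means: n per arm = 2 * (z_level + z_power)^2 * sd^2 / effect^2. The sketch below is one way to code it, assuming equal group sizes, individual-level randomization, and a normal approximation. The function name and example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group needed to detect `effect`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_power = NormalDist().inv_cdf(power)          # power level
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)


# Illustrative values: outcome SD of 1.0, target effects of 0.2 and 0.5.
n_small_effect = sample_size_per_arm(effect=0.2, sd=1.0)
n_large_effect = sample_size_per_arm(effect=0.5, sd=1.0)
print(n_small_effect, n_large_effect)
```

Lowering the level, raising the power, increasing the variability, or shrinking the target effect each pushes the required sample up.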
11
How to Determine Effect Size

What is the smallest effect that would justify
adopting the program (in terms of
cost-benefit)?
This sets the minimum effect size we would want
to be able to test for
Common danger: using an effect size that is too
optimistic leads to too small a sample size
How large an effect you can detect with a given
sample depends on how variable the outcome is.
Example: If all children have very similar
diarrhea prevalence without a program, a very
small impact will be easy to detect
The standardized effect size is the effect size
divided by the standard deviation of the outcome
Common standardized effect sizes are 0.20 (small),
0.40 (medium), 0.50 (large)
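
To make the link between variability and detectable effects concrete, here is a minimal sketch of the smallest effect detectable with a given sample. It assumes two equal-sized arms, individual-level randomization, and a normal approximation. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def minimum_detectable_effect(n_per_arm, sd, alpha=0.05, power=0.80):
    """Smallest true effect detectable with the given power, in outcome units."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sd * math.sqrt(2 / n_per_arm)


# Same sample, two hypothetical outcomes: one noisy, one with half the spread.
mde_noisy = minimum_detectable_effect(n_per_arm=393, sd=1.0)
mde_stable = minimum_detectable_effect(n_per_arm=393, sd=0.5)
print(round(mde_noisy, 3), round(mde_stable, 3))
```

Dividing the result by the standard deviation gives the standardized effect size, which here is the same for both outcomes because only the sample size, level, and power drive it.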

12
Design Factors to Take into Account

Availability of a Baseline
A baseline can help reduce the needed sample size
since it:
Removes some variability in the data, increasing
precision
Can be used to stratify and create subgroups
The level of randomization
Whenever treatment occurs at a group level, this
reduces power relative to randomization at
the individual level

13
Cluster (Group) Randomization
Project: level of randomization
Rural Water Project, Water Guard: Individual
Rural Water Project, Spring Improvement: Village
Community-based Monitoring in Uganda: Village
HIV/AIDS Education: School
14
Implications from Group Design

The outcomes for all the individuals within a
unit may be correlated
All villagers affected by spring improvements at
same time
All students at school with trained teachers may
have benefited from information
The sample size needs to be adjusted for this
correlation
The more correlation within the group, the more
we need to adjust the standard errors
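
One common way to make this adjustment is the design effect, 1 + (m - 1) * rho, where m is the number of individuals per group and rho is the intra-cluster correlation. The sketch below assumes equal group sizes. The function names, the starting figure of 393 per arm, and the correlation of 0.05 are hypothetical.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from randomizing groups instead of individuals."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(n_individual_design, cluster_size, icc):
    """Inflate a sample size computed for individual-level randomization."""
    return math.ceil(n_individual_design * design_effect(cluster_size, icc))


# Hypothetical: 393 individuals per arm would suffice without clustering.
n_clustered = clustered_sample_size(393, cluster_size=20, icc=0.05)
print(design_effect(20, 0.05), n_clustered)
```

With no correlation within groups the design effect is 1 and no adjustment is needed.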

15
Implications

It is extremely important to randomize an
adequate number of groups.
Typically the number of individuals within groups
matters less than the number of groups
Big increases in power usually happen only when
the number of randomized groups increases
If you randomize at the level of the district,
with one treated district and one control
district, you have 2 observations!
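
A small sketch of why groups matter more than individuals, using the effective sample size: total individuals divided by the design effect 1 + (m - 1) * rho. It assumes equal group sizes. The function name, group counts, and intra-cluster correlation of 0.1 are hypothetical.

```python
def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    total = n_clusters * cluster_size
    return total / (1 + (cluster_size - 1) * icc)


# Starting point: 20 groups of 50 individuals each.
baseline = effective_sample_size(n_clusters=20, cluster_size=50, icc=0.1)
# Double the individuals per group versus double the number of groups.
more_individuals = effective_sample_size(n_clusters=20, cluster_size=100, icc=0.1)
more_groups = effective_sample_size(n_clusters=40, cluster_size=50, icc=0.1)
print(round(baseline), round(more_individuals), round(more_groups))
```

Both options survey 2,000 people, but under these assumptions only adding groups roughly doubles the effective sample.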

16
Conclusions

Power calculations involve some guesswork
Sometimes we do not have the right information to
conduct them properly
However, it is important to do them in order to:
Avoid launching studies that will have no power
at all (a waste of time and money)
Determine the appropriate resources for the
studies that you decide to conduct (and not too
much)
Determine, if you have a fixed budget, whether
the project is feasible at all
Software: http://sitemaker.umich.edu/group-based/optimal_design_software