Sample size calculations for cross-classified models

1
Sample size calculations for cross-classified
models
  • William Browne, Mousa Golalizadeh and Richard
    Parker
  • University of Bristol

2
Contents
  • Sample size background
  • Brief description of MLPowSim
  • Fife dataset and model
  • Balanced data
  • Potential ways to factor unbalanced data into
    sample size calculations
  • Simple design effect formula for cross-classified
    model.

3
Background
  • Many quantitative social science research
    questions take the form of a hypothesis: "A has
    a significant effect on B".
  • To answer such a question, data are collected that
    allow the researcher to (hopefully) test whether
    A has a statistically significant effect on B.
    (In fact we aim to reject the hypothesis that A
    does not significantly affect B.)
  • A test is performed, and either the researcher is
    happy because A does indeed have a significant
    effect on B, or they are left wondering why the
    data collected do not back up their hypothesis.
    Is the hypothesis false, or were the data
    insufficient?
  • The sufficiency of the data is the motivation for
    sample size calculations.

4
Example
  • Suppose I have the research question: "Are
    Welshmen on average taller than 175cm?"
  • I now need to get hold of a random sample of n
    Welshmen and measure each of their heights.
  • I make some statistical assumption about the
    distribution of the heights of Welshmen e.g. that
    they come from a Normal distribution.
  • I might like to check this assumption by plotting
    a histogram of the data.
  • I can then form a statistical hypothesis test and
    test whether indeed Welshmen are taller than
    175cms.
  • I need to decide how big to make n, my sample of
    Welshmen.

5
Hypothesis Testing
  • Let us assume our null hypothesis is that the
    average height of Welshmen (µ) is 175cm.
  • So we test H0: µ = 175 vs HA: µ > 175 (or
    alternatively H0: γ = 0 vs HA: γ > 0, where
    γ = µ - 175).
  • In practice we calculate from our sample its mean
    (x̄) and standard deviation (s) and use these
    along with n to form a test statistic, which we
    can compare with the distribution assumed under
    H0.

6
Type I and Type II errors
  • No hypothesis test is perfect and there is always
    the possibility of errors
  • P(Type I error) = α, the significance level or
    size.
  • P(Type II error) = β; 1-β is the power of the
    test.
  • In general we fix α to some value, e.g. 0.05 or
    0.01; then 1-β depends on our sample size.

                         Truth
                         H0 true         H0 false
Decision   Reject H0     Type I error    Correct
Decision   Accept H0     Correct         Type II error

7
Example hypothesis test
  • Let us assume that in reality our sample mean is
    180cms and the population standard deviation (sd)
    is 5cms (known).
  • We can then form a test statistic as follows:
    Z = (x̄ - 175) / (σ/√n) = (180 - 175) / (5/√n) = √n
  • Note here that for small n and unknown sd we
    should use a Student's t distribution rather than
    Normal.
  • For a 1-sided Z test we need Z > 1.645, and so
    we need our sample to be of size 3 to reject H0;
    using a Student's t distribution increases this
    to 5. (Here α = 0.05.)
  • However, if the sample mean had been only 176cm
    then we would need n > (1.645 × 5)² ≈ 67.7, i.e.
    68 Welshmen, to reject H0.
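
The arithmetic on this slide can be checked with a minimal sketch. It assumes the known-sd, 1-sided Z test described above; the function name `min_n_to_reject` is purely illustrative and not part of any package mentioned in the slides.

```python
import math
from statistics import NormalDist


def min_n_to_reject(observed_diff, sigma, alpha=0.05):
    """Smallest n for which Z = observed_diff / (sigma / sqrt(n))
    strictly exceeds the 1-sided critical value z_{1-alpha}."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    bound = (z_crit * sigma / observed_diff) ** 2  # n must be strictly above this
    return math.floor(bound) + 1


# Sample mean 180cm vs the null value 175cm, sd 5cm
print(min_n_to_reject(5, 5))
# Sample mean 176cm vs the null value 175cm, sd 5cm
print(min_n_to_reject(1, 5))
```

The Student's t version is not shown because it needs a t quantile function, which is not in the Python standard library.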

8
Power calculations
  • Our last slide in some sense is backwards as we
    cannot get from a given sample mean to choosing a
    sample size!
  • What we do instead is use different terminology
    and play God!
  • We will choose an effect size, γ, which will
    represent a guess at the increase in the mean
    height of Welshmen (above 175cm).
  • There then exists an (approximate) formula that
    links four quantities: size (α), power (1-β),
    effect size (γ) and sample size (n):
    γ / SE(γ) ≈ z_{1-α} + z_{1-β}
  • Note that the standard error (SE) of γ is a
    function of n and σ, the population sd, which is
    assumed known.
  • We can now evaluate one of these quantities
    conditional on the others, e.g. what sample size
    is required given α, 1-β and γ?

Here the RHS is the sum of a term for the case H0
true (z_{1-α}) and a term for the case H0 false
(z_{1-β}).
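
As a rough illustration of evaluating one quantity given the others, the sketch below solves the approximate formula for n. It assumes the one-sample case where SE(γ) = σ/√n; the function name `required_n` and the default target power of 0.8 are illustrative choices, not something the formula itself fixes.

```python
import math
from statistics import NormalDist


def required_n(gamma, sigma, alpha=0.05, power=0.8):
    """Solve gamma / (sigma / sqrt(n)) = z_{1-alpha} + z_{1-beta} for n
    and round up to the next whole observation."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)
    return math.ceil((z_sum * sigma / gamma) ** 2)


# Effect sizes of 1cm and 5cm with a population sd of 5cm
print(required_n(1, 5))
print(required_n(5, 5))
```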
9
Welsh height example
  • Here we have looked at two examples with effect
    sizes 5 and 1 respectively. Assume σ takes the
    value 5, and let us suppose we take a sample of
    25 Welshmen.
  • Then
  • Case 1: 5/(5/√25) = 1.645 + z_{1-β}, so
    z_{1-β} = 3.355
  • and the power is 1-β = 0.9996.
  • Case 2: 1/(5/√25) = 1.645 + z_{1-β}, so
    z_{1-β} = -0.645
  • and the power is 1-β = 0.25946.
  • So here a sample of 25 Welshmen from a population
    with mean 180cm would almost always result in
    rejecting H0,
  • but if the population mean is 176cm then H0 would
    be rejected in only about 26% of such samples.
  • We can plot curves of how power increases with
    sample size as shown in the next slide.
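
The two cases, and the power curve mentioned in the last bullet, can be reproduced with a short sketch. It assumes SE(γ) = σ/√n and a 1-sided test at α = 0.05; the function name `z_test_power` is illustrative.

```python
import math
from statistics import NormalDist


def z_test_power(gamma, sigma, n, alpha=0.05):
    """Approximate power of a 1-sided Z test:
    z_{1-beta} = gamma / (sigma / sqrt(n)) - z_{1-alpha}, power = Phi(z_{1-beta})."""
    nd = NormalDist()
    z_power = gamma / (sigma / math.sqrt(n)) - nd.inv_cdf(1 - alpha)
    return nd.cdf(z_power)


# Case 1 (effect size 5) and case 2 (effect size 1), sd 5, n = 25
print(round(z_test_power(5, 5, 25), 4))
print(round(z_test_power(1, 5, 25), 4))

# Points on a power curve for case 2 as the sample size grows
for n in (25, 50, 100, 150, 200):
    print(n, round(z_test_power(1, 5, n), 3))
```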

10
Power curve for Welshmen example
  • Here we see the two power curves for the two
    scenarios

11
Extending the idea
  • The simple formula γ / SE(γ) ≈ z_{1-α} + z_{1-β}
    can be used in many situations and hypothesis
    tests.
  • To generalise the idea we assume that γ is an
    effect size associated with a statistic that we
    wish to compare with a (null) hypothesized value
    of 0.
  • The complication occurs in finding a formula for
    the standard error for the statistic and relating
    this formula to the sample size, n.
  • We will next consider an alternative approach
    before returning to look at how both approaches
    can be extended to cross-classified models.

12
The use of simulation
  • In reality our (hoped for) research path will be
    as follows
  • Construct research question -> Form null
    hypothesis that we believe false -> Collect
    appropriate data -> Reject hypothesis, therefore
    proving our research question.
  • Assuming our research question is correct, and
    hence the null hypothesis is false, we can still
    be let down by not collecting enough data.
  • The idea behind using simulation is to simulate
    the data gathering process (assuming we know the
    right answer) many times and see how often we can
    reject the null hypothesis. The percentage of
    rejected null hypotheses (via simulation) will
    then estimate power.

13
Simulation in our example
  • Consider our Welsh height example case 2, where we
    believe Welshmen have a mean height of 176cm
    (and sd 5cm), we are testing the hypothesis
    H0: µ = 175cm, and we consider a sample size of
    25.
  • Then we generate N samples (e.g. 5000) of size
    25 from a Normal distribution with mean 176 and
    sd 5,
  • and for each sample form a lower bound for the
    (1-sided) confidence interval of the form
    x̄ - 1.645 σ/√n.
  • This we compare with the value 175, and the
    proportion of lower bounds greater than 175 is an
    estimate of the power of the test.
  • We can repeat this exercise for different sample
    sizes and form a power curve.
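
The simulation recipe can be sketched as follows. This is a minimal illustration, not the code MLPowSim generates: it assumes Normal heights, a known sd, and a 1-sided lower bound of the form x̄ - z_{1-α} σ/√n; the function name `simulated_power` and the seed are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean


def simulated_power(true_mean, sigma, n, null_mean=175.0,
                    alpha=0.05, n_sims=5000, seed=1):
    """Estimate power as the proportion of simulated samples whose
    1-sided lower confidence bound lies above the null value."""
    rng = random.Random(seed)
    half_width = NormalDist().inv_cdf(1 - alpha) * sigma / math.sqrt(n)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if fmean(sample) - half_width > null_mean:
            rejections += 1
    return rejections / n_sims


# Case 2: true mean 176cm, sd 5cm, sample size 25
print(simulated_power(176, 5, 25))

# A crude simulated power curve over several sample sizes
for n in (25, 50, 100, 150):
    print(n, simulated_power(176, 5, n))
```

With 5000 simulations the Monte Carlo standard error near a power of 0.26 is roughly 0.006, so the estimate should sit close to the theoretical value but will not match it exactly.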

14
Power curve comparison
Note: the simulation curve is a good approximation
of the theoretical curve, although there are some
minor (Monte Carlo) errors even with 5000
simulations per sample size.
15
Advantages/Disadvantages
  • Theoretical approach is quick when the formula
    can be derived.
  • Approximations for more complex situations exist
    which are equally quick.
  • Simulation approach generalizes to more
    situations but is much slower and we may need
    large numbers of simulations per scenario to get
    accurate power estimates.
  • Note that an alternative, standard-error-based
    method typically needs fewer simulations per
    scenario for the same accuracy and works for
    Normal responses.

16
MLPowSim software package
  • Software package recently completed.
  • A rather old-fashioned text-based interface
    allows the user to specify sample size scenarios.
  • Software then generates either MLwiN macro code
    or an R command file to run the simulations to
    calculate power for scenarios.
  • Normal, Binomial and Poisson response offered.
  • Software will cope with 1-level models, 2-level
    models (balanced and unbalanced), 3-level nested
    models (balanced and unbalanced) and
    cross-classified models with 2 higher
    classifications (balanced and unbalanced).
  • Many options for unbalanced designs.
  • Extensive user manual (150 pages with lots of
    examples)
  • See http://seis.bris.ac.uk/~frwjb/esrc.html for
    details

17
Cross-classified example: Fife dataset
  • Dataset taken from the MLwiN user's guide.
  • Basic structure is 3,435 pupils from 19 secondary
    schools who also have primary school (of which
    there are 148) recorded.
  • We will use this as the basis for sample size
    calculations and use a simple variance components
    model.
  • Our response, exam attainment at 16, is then
    modelled simply as a constant plus a secondary
    school effect plus a primary school effect plus a
    residual, i.e.
    y_i = β0 + u_SS(i) + u_PS(i) + e_i.
  • Our problem is how would one perform a power
    calculation for this or a similar scenario?

18
Fife data: balanced design?
  • Estimates from data
  • We will begin by trying a balanced design where
    we have p pupils in each combination of secondary
    school (SS) and primary school (PS) with ns
    secondary and np primary schools.
  • Clearly a balanced design is unrealistic, as we
    will not in reality get balanced data.
  • Here we try 3 pupils in each combination of SS
    and PS, with ns = 10, 20, 30 and
    np = 20, 40, 60, 80, 100.
  • Note: with 30 SS each PS must have at least
    3 × 30 = 90 pupils, which is not really feasible!
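
To make the balanced design concrete, the sketch below generates one balanced cross-classified dataset from a variance components model (constant plus SS effect plus PS effect plus residual). Only the data-generation step is shown; fitting the model, which the slides do in MLwiN or with lmer in R, is not. The variance values are placeholders chosen for illustration, NOT the Fife estimates, and the function name `simulate_balanced_xc` is hypothetical.

```python
import random


def simulate_balanced_xc(n_ss, n_ps, pupils_per_pair, beta0=0.0,
                         var_ss=0.1, var_ps=0.2, var_e=1.0, seed=1):
    """Return (ss_id, ps_id, response) rows with the same number of
    pupils in every SS x PS pairing. Variances are placeholder values."""
    rng = random.Random(seed)
    u_ss = [rng.gauss(0.0, var_ss ** 0.5) for _ in range(n_ss)]
    u_ps = [rng.gauss(0.0, var_ps ** 0.5) for _ in range(n_ps)]
    rows = []
    for s in range(n_ss):
        for p in range(n_ps):
            for _ in range(pupils_per_pair):
                y = beta0 + u_ss[s] + u_ps[p] + rng.gauss(0.0, var_e ** 0.5)
                rows.append((s, p, y))
    return rows


# Total pupils for each scenario with 3 pupils per pairing
for n_ss in (10, 20, 30):
    for n_ps in (20, 40, 60, 80, 100):
        print(n_ss, n_ps, len(simulate_balanced_xc(n_ss, n_ps, 3)))
```

The total sample size is pupils per pairing × ns × np, and each PS contributes pupils per pairing × ns pupils, which is where the figure of 90 pupils per PS for 30 SS comes from.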

19
Balanced design results
Results using MLPowSim and lmer in R. Note a
power of > 0.8 is reached with 20 SS and 100 PS
or 30 SS and 80 PS (3 pupils per pairing), i.e.
6,000 and 7,200 pupils respectively. Note reducing
to 1 pupil per pairing has little impact on power.
20
Methods to include imbalance in power
calculations
  • MLPowSim offers several options:
    1. Non-response of single observations.
    2. Dropout of whole groups.
    3. Sampling from a secondary school/primary school
       look-up table.
    4. Sampling from a pupil look-up table.

21
Methods 1 & 2
  • Non-response of individuals (with fixed
    probability) and dropout of some pairings of SS
    and PS are useful in other situations but not so
    much here.
  • Using these options in MLPowSim shows:
  • 50% dropout of individuals reduces power, but not
    greatly.
  • 50% dropout of pairings similarly reduces power,
    but not greatly.
  • This is in line with the observation that
    reducing the number of pupils per pairing, as
    opposed to the number of SS or PS, has only a
    small impact on power.
  • Basically, neither method 1 nor method 2 removes
    whole SS or PS from the data, which would have a
    far greater impact on power.

22
Method 3: fixed sample from secondary (or
primary) schools
  • Here the idea is to imagine a design where we
    have balance across SS i.e. our sampling strategy
    is to sample n pupils from each SS.
  • Then the PS identifier for each pupil is
    discovered at a later date and is not part of the
    sampling scheme; in MLPowSim it is in effect
    sampled from the distribution within that SS.
  • To run this method MLPowSim requires a file
    giving relative numbers of pupils for each PS/SS
    combination.
  • For our example we will use the actual numbers
    from the real data.
  • Essentially we mimic the scenario of balance
    within SS which is a plausible sampling scheme.
  • Note: we can also do the alternative of balance
    across PS.
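
The sampling idea behind method 3 can be sketched as below. This is an illustration of the scheme as described, not MLPowSim's own code or file format: the look-up table is a small made-up example rather than the Fife counts, and the function name `sample_method3` is hypothetical.

```python
import random


def sample_method3(counts, pupils_per_ss, seed=1):
    """counts maps each SS to {PS: relative number of pupils}.
    Draw a fixed number of pupils per SS; each pupil's PS is sampled
    from the distribution of PS within that SS."""
    rng = random.Random(seed)
    design = []
    for ss, ps_counts in counts.items():
        ps_ids = list(ps_counts)
        weights = list(ps_counts.values())
        drawn = rng.choices(ps_ids, weights=weights, k=pupils_per_ss)
        design.extend((ss, ps) for ps in drawn)
    return design


# Made-up look-up table: two secondary schools, three primary schools
toy_counts = {
    "SS1": {"PS1": 30, "PS2": 10},
    "SS2": {"PS2": 5, "PS3": 45},
}
design = sample_method3(toy_counts, pupils_per_ss=20)
print(len(design), "pupils;", len({ps for _, ps in design}), "distinct PS")
```

The design stays balanced across SS by construction, while the number of distinct PS that appear depends on how many pupils are drawn per SS.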

23
Method 3 results - SS
Here we see a gradual rise in power as we
increase the number of pupils per SS, as this in
turn increases the number of PS. It however takes
a rather large number of pupils per SS to ensure
all PS are in the simulation, and hence a large
number is required to reach a power of 0.8.
24
Method 3 results - PS
Here we see a steep rise in power for small
samples in each primary school, followed by a
fairly flat curve, as adding more pupils does not
increase the number of SS: all are captured with
only a small number of pupils per PS.
25
Method 4: fixed sample from whole population
  • Here we take method 3 one step further and assume
    we take a random sample of pupils from our
    overall sampling frame without stratifying by
    either SS or PS.
  • Here after each pupil is selected its SS and PS
    are then recorded. In our example we use the
    actual data as a sampling frame and so the
    probability of a pupil coming from each pairing
    is proportional to the number from that pairing
    in the dataset.
  • This should result in simulated datasets that are
    similar in form to the true dataset.
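
A minimal sketch of the method 4 sampling step follows. It assumes a table of pupil counts per SS/PS pairing; the table here is made up rather than taken from the Fife data, and the function name `sample_method4` is hypothetical.

```python
import random


def sample_method4(pair_counts, n_pupils, seed=1):
    """pair_counts maps (SS, PS) pairings to pupil counts in the frame.
    Each sampled pupil falls in a pairing with probability proportional
    to that pairing's count; no stratification by SS or PS."""
    rng = random.Random(seed)
    pairs = list(pair_counts)
    weights = [pair_counts[pair] for pair in pairs]
    return rng.choices(pairs, weights=weights, k=n_pupils)


# Made-up sampling frame
toy_frame = {
    ("SS1", "PS1"): 40, ("SS1", "PS2"): 10,
    ("SS2", "PS2"): 5, ("SS2", "PS3"): 45,
}
for n in (5, 20, 100):
    sample = sample_method4(toy_frame, n)
    n_ss = len({ss for ss, _ in sample})
    n_ps = len({ps for _, ps in sample})
    print(n, "pupils:", n_ss, "SS and", n_ps, "PS represented")
```

Tracking how many distinct SS and PS appear as the sample grows is one way to see when a simulated dataset starts to contain most of the schools.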

26
Method 4 results
Here we see, as with method 3, that power initially
increases at a fast rate, but after a while each
dataset will contain most, if not all, of the SS
and PS; the rate then slows and it takes a
large number of pupils for the power to reach 0.8.
27
Design effect formula (2 level model)
  • If we assume balance then with n pupils in each
    of N schools for a simple VC model (and only this
    simple model) the following formula holds
  • Design effect = 1 + (n - 1)ρ, where ρ is the
    intra-class correlation.
  • So if we know the simple random sampling (SRS)
    sample size required for a given power we need to
    multiply this by the design effect.
  • For example, if ρ = 0.1 then for schools of size
    10 pupils we would need 1 + 9 × 0.1 = 1.9 times
    as many students (in total) to get the same power.
  • So if, for example, we found that SRS requires 300
    pupils, then for schools of size 10 we require
    1.9 × 300 = 570 pupils, or 57 schools.
  • This can be shown to fit the simulated results.
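
The 2-level design effect arithmetic can be written as a short sketch, assuming a balanced design and the simple variance components model. The function names are illustrative; the rounding step only guards against floating-point noise before rounding up to whole schools.

```python
import math


def design_effect(cluster_size, icc):
    """Design effect 1 + (n - 1) * rho for n pupils per school."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(srs_n, cluster_size, icc):
    """Inflate an SRS sample size by the design effect and convert it
    to a whole number of equally sized schools."""
    total = srs_n * design_effect(cluster_size, icc)
    n_schools = math.ceil(round(total / cluster_size, 9))
    return n_schools, n_schools * cluster_size


# rho = 0.1, schools of 10 pupils, SRS requirement of 300 pupils
print(design_effect(10, 0.1))
print(clustered_sample_size(300, 10, 0.1))
```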

28
Proposed formula for Cross classified models
  • We here propose an extension for cross-classified
    models (VC only).
  • Design effect
  • Here we are assuming balance and all terms need
    defining

29
DE formulae for XC models
  • The formula appears to mimic behaviour noted in
    the simulation methods; in particular, in our
    examples, the number of pupils per school pairing
    has little impact on power.
  • As the two n terms involve numbers of clusters,
    increasing the number of SS or PS will also
    increase the DE, and so solving is more difficult
    than in the hierarchical case!
  • Of course there are multiple combinations of SS
    and PS that solve the problem!

30
Summary
  • We have discussed sample size calculations in
    general and shown results specific to
    cross-classified models
  • We welcome feedback from users of MLPowSim.
  • We offer methods (via simulation) for dealing
    with non-balanced data which may be more of an
    issue in cross-classified models.
  • We have tentatively proposed a simple formula so
    that some of the heavy computations for the
    simulation method can be removed in simple cases.