Sampling and power analysis in the High Resolution studies
Pamela Minicozzi
High Resolution studies
High Resolution studies collected detailed data from patients' clinical records, so that the influence on survival of factors that are not routinely collected (tumour molecular characteristics, diagnostic investigations, treatment, relapse), as well as differences in standard care, could be analysed
Problem
- In each country, the population of incident cases for a particular cancer consists of N subjects
- N is large (so rare cancers are not considered here)
- Since N is large, not all cases can be investigated
Solution
- Use a representative sample to derive valid conclusions that are applicable to the entire original population
Two questions
- What kind of probability sampling should we use?
- What sample size should we use?
Sampling
Previous High Resolution studies
- Samples were representative of:
  - 1-year incidence
  - a time interval (e.g. 6 months) within the study period, provided that incidence was complete
  - an administratively defined area covered by cancer registration
Present High Resolution studies
We want to eliminate variation in the type of sampling used between countries and within a single country
This requires more sophisticated sampling
Main types of probability sampling
Simple random sampling
- assign a unique number to each element of the study population
- determine the sample size
- randomly select the population elements using:
  - a table of random numbers
  - a list of numbers generated randomly by a computer
Advantage - auxiliary information on subjects is not required
Disadvantage - if subgroups of the population are of particular interest, they may not be included in sufficient numbers in the sample
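A minimal sketch of the procedure above in Python, assuming the incident cases are already numbered 1..N in a list. The function name `simple_random_sample` and the example population of 2000 cases are hypothetical. The computer's pseudo-random generator plays the role of the table of random numbers, and fixing the seed makes the draw reproducible.

```python
import random


def simple_random_sample(population_ids, n, seed=None):
    """Return n distinct subjects drawn without replacement; every subset of
    size n has the same probability of being selected."""
    if not 0 < n <= len(population_ids):
        raise ValueError("sample size must be between 1 and N")
    rng = random.Random(seed)  # seeded generator: the draw can be reproduced
    return sorted(rng.sample(population_ids, n))


# Hypothetical registry: N = 2000 incident cases, each with a unique number 1..N
cases = list(range(1, 2001))
sample = simple_random_sample(cases, 300, seed=42)
print(len(sample), sample[:5])
```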
Stratified sampling
- identify the stratification variable(s) and determine the number of strata to be used (e.g. day and month of birth, year of diagnosis, cancer registry, etc.)
- divide the population into strata and determine the sample size of each stratum
- randomly select the population elements in each stratum
Advantage - a more representative sample is obtained
Disadvantage - requires information on the proportion of the total population belonging to each stratum
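A minimal sketch of stratified sampling with proportional allocation: each stratum contributes to the sample in proportion to its share of the population, which is why the stratum proportions must be known. The function, the largest-remainder rounding rule and the three-registry example population are assumptions for illustration, not necessarily the allocation scheme used in the studies.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, stratum_of, n, seed=None):
    """Proportional stratified sampling: stratum h receives n * N_h / N units
    (largest-remainder rounding so the total is exactly n), then a simple
    random sample is drawn independently within each stratum."""
    N = len(records)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and N")
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    exact = {h: n * len(members) / N for h, members in strata.items()}
    alloc = {h: int(x) for h, x in exact.items()}  # round down first
    shortfall = n - sum(alloc.values())
    # hand the remaining units to the strata with the largest fractional parts
    for h in sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)[:shortfall]:
        alloc[h] += 1

    rng = random.Random(seed)
    sample = []
    for h, members in strata.items():
        sample.extend(rng.sample(members, alloc[h]))
    return sample


# Hypothetical population: (case id, cancer registry) for three registries
records = ([(i, "A") for i in range(1000)]
           + [(i, "B") for i in range(1000, 1600)]
           + [(i, "C") for i in range(1600, 2000)])
sample = stratified_sample(records, lambda rec: rec[1], 300, seed=1)
print(Counter(reg for _, reg in sample))
```

Proportional allocation reproduces the population's registry mix exactly (here 50%, 30% and 20%), which simple random sampling only achieves on average.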
Systematic sampling
- determine the sample size (n); the sampling interval is then i = N/n
- randomly select a number r from 1 to i
- select the subjects in positions r, r + i, r + 2i, etc., until n subjects have been selected
Advantage - eliminates the possibility of autocorrelation
Disadvantage - only the first element is selected on a probability basis → pseudo-random sampling
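A minimal sketch of the systematic procedure, assuming the cases sit in a list in some fixed order and N is a multiple of n (otherwise the interval is rounded down and the last few cases can never be chosen). Only the start r is random; every later position follows from it. The function name and example data are hypothetical.

```python
import random


def systematic_sample(population, n, seed=None):
    """Systematic sample of size n from an ordered list of N subjects."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and N")
    i = N // n                 # sampling interval i = N/n (rounded down)
    rng = random.Random(seed)
    r = rng.randint(1, i)      # the only random step: start r in 1..i
    # 1-based positions r, r+i, r+2i, ... converted to 0-based list indices
    return [population[r - 1 + k * i] for k in range(n)]


cases = list(range(1, 2001))   # hypothetical: N = 2000 cases in list order
sample = systematic_sample(cases, 200, seed=7)
print(sample[:4])              # start r, then r+10, r+20, r+30
```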
How many subjects do we need?
The main elements
- Power: the probability that the difference will be detected (e.g. 80%, 90%)
- Significance level: the probability that a positive finding is due to chance alone (e.g. 1%, 5%)
- they explored whether some variables could be measured with sufficient precision (or were available at all) and checked the feasibility of the study
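As a sketch of how the significance level and power combine for a survival comparison, Schoenfeld's approximation gives the total number of deaths needed by a two-sided log-rank test comparing two equally sized groups: d = 4 (z_{1-α/2} + z_{1-β})² / (log HR)². The Python below assumes 1:1 allocation and proportional hazards. It is an illustration, not necessarily the calculation used in the High Resolution studies.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Total deaths needed for a two-sided log-rank test, 1:1 allocation
    (Schoenfeld approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 5%
    z_beta = z.inv_cdf(power)           # 0.84 for 80% power
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


for hr in (1.2, 1.4, 1.6):
    print(hr, required_events(hr), required_events(hr, alpha=0.01, power=0.90))
```

Note that the requirement is expressed in deaths, not patients: how many patients are needed depends on how many of them die during follow-up.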
Previous High Resolution studies
- The number of patients was defined based on:
  - observed differences in survival and risk of death
  - incidence of the cancer under study
  - difficulties in collecting clinical information
  - available economic resources
Despite these constraints:
- we were able to identify statistically significant relative excess risks of death
  - up to 1.60 among European countries
  - up to 1.40 among Italian areas
- for breast cancer, for which differences in survival are small
- → applicable to other cancers, for which survival differences are larger
Example for breast cancer (diagnosed 1995-1999)
Plot of power as a function of the hazard ratio for a 5% two-sided log-rank test with 80% power, over sample sizes ranging from 100 to 1000
Assume 75% survival as reference (the overall survival in Europe; range 65-90%)
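The curves described above can be approximated with the sketch below, under stated assumptions: two equally sized groups, proportional hazards (so survival in the comparison group is S₀^HR), complete follow-up to the time point at which the reference survival S₀ is measured, and Schoenfeld's approximation for log-rank power. The function name and the tabulated hazard ratios are illustrative choices.

```python
from math import log, sqrt
from statistics import NormalDist


def logrank_power(n_total, hazard_ratio, ref_survival, alpha=0.05):
    """Approximate power of a two-sided log-rank test (Schoenfeld)."""
    z = NormalDist()
    # proportional hazards: survival in the comparison group is S0 ** HR
    comp_survival = ref_survival ** hazard_ratio
    # expected deaths, with n_total / 2 patients in each group
    events = n_total / 2 * ((1 - ref_survival) + (1 - comp_survival))
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # the tiny probability of rejecting in the wrong direction is ignored
    return z.cdf(abs(log(hazard_ratio)) * sqrt(events / 4) - z_alpha)


hazard_ratios = (1.2, 1.4, 1.6, 1.8)
print("n    " + "  ".join(f"HR={hr}" for hr in hazard_ratios))
for n in range(100, 1001, 100):
    powers = [logrank_power(n, hr, ref_survival=0.75) for hr in hazard_ratios]
    print(f"{n:<5}" + "  ".join(f"{p:6.2f}" for p in powers))
```

With 75% reference survival most patients are still alive at the end of follow-up, so the number of deaths, which drives log-rank power, is only a fraction of n.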
Example for colorectal cancer (diagnosed 1995-1999)
Plot of power as a function of the hazard ratio for a 5% two-sided log-rank test with 80% power, over sample sizes ranging from 100 to 1000
Assume 50% survival as reference (the overall survival in Europe; range 30-70%)
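For planning it can be more useful to invert the power curve: for each sample size, find the smallest hazard ratio that reaches 80% power. A minimal sketch using bisection, under the same kind of assumptions (two equal groups, proportional hazards, complete follow-up, Schoenfeld approximation) with 50% reference survival. The function names and the search bracket are hypothetical choices.

```python
from math import log, sqrt
from statistics import NormalDist


def logrank_power(n_total, hazard_ratio, ref_survival, alpha=0.05):
    """Approximate two-sided log-rank power (Schoenfeld), two equal groups."""
    z = NormalDist()
    comp_survival = ref_survival ** hazard_ratio               # proportional hazards
    events = n_total / 2 * (2 - ref_survival - comp_survival)  # expected deaths
    return z.cdf(abs(log(hazard_ratio)) * sqrt(events / 4) - z.inv_cdf(1 - alpha / 2))


def min_detectable_hr(n_total, ref_survival, power=0.80, alpha=0.05, tol=1e-4):
    """Smallest hazard ratio above 1 that reaches the target power (bisection).
    Power rises monotonically with HR > 1, so bisection is valid."""
    lo, hi = 1.0, 10.0  # assumed bracket: HR = 10 is detectable for any n used here
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if logrank_power(n_total, mid, ref_survival, alpha) >= power:
            hi = mid
        else:
            lo = mid
    return hi


for n in range(100, 1001, 100):
    print(n, round(min_detectable_hr(n, ref_survival=0.50), 2))
```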
Example for lung cancer (diagnosed 1995-1999)
Plot of power as a function of the hazard ratio for a 5% two-sided log-rank test with 80% power, over sample sizes ranging from 100 to 1000
Assume 10% survival as reference (the overall survival in Europe; range 5-20%)
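With 10% reference survival nearly every patient dies during follow-up, so almost every case contributes an event. The sketch below assumes two equal groups, proportional hazards and complete follow-up; it takes the number of deaths from Schoenfeld's approximation and converts it into a total sample size by dividing by the expected proportion of patients who die. The hazard ratio of 1.4 and the function name are arbitrary illustrations.

```python
from math import ceil, log
from statistics import NormalDist


def required_sample_size(hazard_ratio, ref_survival, alpha=0.05, power=0.80):
    """Total patients (two equal groups) for a two-sided log-rank test."""
    z = NormalDist()
    # Schoenfeld: deaths needed
    events = 4 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / log(hazard_ratio) ** 2
    # probability of dying during follow-up, averaged over the two groups
    p_death = 1 - (ref_survival + ref_survival ** hazard_ratio) / 2
    return ceil(events / p_death)


for s0 in (0.75, 0.50, 0.10):  # breast, colorectal and lung reference survival
    print(s0, required_sample_size(1.4, s0))
```

The same hazard ratio therefore needs far fewer lung cancer patients than breast cancer patients, because the deaths, not the patients, carry the information.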
Present High Resolution studies
We want to analyse both differences in survival and adherence to standard care
Power analysis is therefore needed for both:
- logistic regression analysis, to analyse the odds of receiving one type of care (typically standard care)
- relative survival analysis, to analyse differences in relative survival and relative excess risks of death
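For the care-adherence aim, one simple sketch is to treat the logistic model as a comparison of two groups (e.g. two areas) on a binary outcome, receiving standard care or not, and use the usual normal-approximation sample size for two proportions. This is a simplification assumed here for illustration: a logistic regression with several covariates generally needs more cases. The proportions 70% and 60% and the function name are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Patients per group to detect p1 vs p2 with a two-sided test
    (normal approximation, no continuity correction)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2  # pooled proportion under the null hypothesis
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical: 70% vs 60% of patients receive standard care (odds ratio about 1.56)
print(n_per_group(0.70, 0.60))
```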
Conclusions
- Taking into account:
  - existing sampling and power-analysis methodology
  - experience from previous studies
  - the different coverage of Cancer Registries
  - available economic resources
- We want to:
  - standardize the selection of data
  - include a minimum number of cases that satisfies the statistical considerations related to all the aims of our studies
Prof. J. S. Long¹ (Regression Models for Categorical and Limited Dependent Variables, 1997) suggests that samples of fewer than 100 cases should be avoided and that 500 observations should be adequate for almost any situation.
¹ Professor of Sociology and Statistics at Indiana University
Thank you for your attention
And what about your experience?