Study designs: Cross-sectional studies, ecologic studies (and confidence intervals) - PowerPoint PPT Presentation

Study designs: Cross-sectional studies, ecologic studies (and confidence intervals) Victor J. Schoenbach, PhD home page Department of Epidemiology – PowerPoint PPT presentation

Transcript and Presenter's Notes

1
Study designs: Cross-sectional studies, ecologic
studies (and confidence intervals)
Principles of Epidemiology for Public Health
(EPID600)
Victor J. Schoenbach, PhD
Department of Epidemiology
Gillings School of Global Public Health
University of North Carolina at Chapel Hill
www.unc.edu/epid600/
2
Signs from around the world
  • In a Copenhagen airline ticket office
  • We take your bags and send them in all
    directions.

3
Signs from around the world
  • In a Norwegian cocktail lounge
  • Ladies are requested not to have children in the
    bar.

4
Signs from around the world
  • Rome laundry: Ladies, leave your clothes here
    and spend the afternoon having a good time.

5
Faster keyboarding - 1
  • I cdnuolt blveiee taht I cluod aulaclty
    uesdnatnrd waht I was rdanieg. The phaonmneal
    pweor of the hmuan mnid, aoccdrnig to a
    rscheearch at Cmabrigde Uinervtisy. It dn'seot
    mttaer in waht oredr the ltteers in a wrod are,
    the olny iprmoatnt tihng is taht the frist and
    lsat ltteer be in the rghit pclae. The rset can
    be a taotl mses and you can sitll raed it wouthit
    a porbelm.
  • Gary C. Ramseyer's First Internet Gallery of
    Statistics Jokes
    http://davidmlane.com/hyperstat/humorf.html (162)

6
Faster keyboarding - 2
  • Most of my friends could read this with
    understanding and rather quickly I might add.
    Then I had them read a statistical bit of
    literature
  • Miittluvraae asilyans sattes an idtenossiy
    ctuoonr epilsle is the itternoiecsno of a panle
    pleralal to the xl-yapne and the sruacfe of a
    btiiarave nmarol dbttiisruein.
  • Gary C. Ramseyer's First Internet Gallery of
    Statistics Jokes
    http://davidmlane.com/hyperstat/humorf.html (162)

8
Today: outline
  • Cross-sectional studies (and sampling)
  • Ecologic studies
  • Confidence intervals

9
Cross-sectional studies
  • Cross-sectional studies include surveys
  • People are studied at a point in time, without
    follow-up.
  • Can combine a cross-sectional study with
    follow-up to create a cohort study.
  • Can conduct repeated cross-sectional studies to
    measure change in a population.

10
Cross-sectional studies
  • Number of uninsured Americans rises to 50.7
    million. (USA Today, 9/17/2010; data from Census
    Bureau)
  • In 2007-2008, almost one in five children older
    than 5 years was obese. (Health, United States,
    2010; data from the National Health and Nutrition
    Examination Survey)
  • 35% (7.4 million) of births to U.S. women during
    the preceding 5 years were mistimed or unwanted.
    (2002 National Survey of Family Growth, Series
    23, No. 25, Table 21)
  • Source: www.cdc.gov/nchs/

11
Cross-sectional studies
  • Incidence information is not available from a
    typical cross-sectional study
  • Sometimes can reconstruct incidence from
    historical information
  • Example: the incidence proportion of quitting
    smoking, called the "quit ratio" (ex-smokers /
    ever-smokers), is calculated from survey data.
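
As a rough illustration of the quit ratio, the sketch below computes it from survey counts. The counts and the function name `quit_ratio` are hypothetical, and ever-smokers are assumed to be ex-smokers plus current smokers.

```python
def quit_ratio(ex_smokers: int, current_smokers: int) -> float:
    """Quit ratio = ex-smokers / ever-smokers (assumed: ever = ex + current)."""
    ever_smokers = ex_smokers + current_smokers
    if ever_smokers == 0:
        raise ValueError("no ever-smokers in the sample")
    return ex_smokers / ever_smokers


# Hypothetical survey counts, for illustration only
ratio = quit_ratio(ex_smokers=300, current_smokers=200)
print(f"Quit ratio: {ratio:.2f}")
```

The ratio describes quitting that has accumulated among survivors up to the survey date, so it is reconstructed from history rather than observed through follow-up.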

12
Measure prevalence at point in time
  • Snapshot of a population, a still life
  • Can measure attitudes, beliefs, behaviors,
    personal or family history, genetic factors,
    existing or past health conditions, or anything
    else that does not require follow-up to assess.
  • The source of most of what we know about the
    population

13
Population census
  • A cross-sectional study of an entire population
  • Provides the denominator data for many purposes
    (e.g., estimation of rates, assessing
    generalizability, projecting from smaller
    studies)
  • A huge effort: people can be difficult to find
    and to count, and may not want to provide data
  • Some countries maintain accurate and current
    registries of the entire country

14
National surveys conducted by NCHS
  • National Health Interview Survey (NHIS):
    household interviews
  • National Health and Nutrition Examination Survey
    (NHANES): interviews and physical examinations
  • National Survey of Family Growth (NSFG):
    household interviews
  • National Health Care Survey (NHCS): medical
    records

15
National surveys
  • Designed to be representative of the entire
    country
  • Modes: household interview, telephone, mail
  • Employ complex sampling designs to optimize
    efficiency (tradeoff between information and
    cost)
  • Logistically challenging (answering machines,
    cellphones, . . .)
  • See presentation by Dr. Anjani Chandra at
    www.minority.unc.edu/institute/2003/materials/slides/Chandra-20030522.ppt

16
Example: National Health Interview Survey
  • Conducted every year in U.S. by National Center
    for Health Statistics (CDC)
  • Stratified, multistaged, household survey that
    covers the civilian noninstitutionalized
    population of the United States
  • Redesigned every decade to use new census

17
multistaged
  • Improves logistical feasibility and reduces costs
    (though reduces precision)
  • 1. Divide population into primary sampling units
    (PSUs). PSU = primary sampling unit =
    metropolitan statistical area, county, or group
    of adjacent counties

18
multistaged
  • 2. Select sample of census block groups (SSUs)
    within each selected PSU
  • 3. Map each selected census block group or
    examine building permits
  • 4. Select one cluster of 4-8 housing units
    dispersed evenly throughout the block
  • NCHS draws a new representative sample for each
    week's interviews
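
The multistage steps above can be sketched roughly as follows. This is a toy illustration, not the NHIS procedure: the sampling frame, the function name `multistage_sample`, and the use of simple random selection at each stage are all assumptions.

```python
import random


def multistage_sample(psus, n_psus, n_blocks, cluster_size, seed=0):
    """Pick PSUs, then block groups within each PSU, then one cluster of
    housing units spread evenly through each selected block group."""
    rng = random.Random(seed)
    selected = []
    for psu in rng.sample(sorted(psus), n_psus):            # stage 1: PSUs
        blocks = psus[psu]
        for block in rng.sample(sorted(blocks), n_blocks):  # stage 2: block groups
            units = blocks[block]
            step = len(units) // cluster_size               # spacing between chosen units
            start = rng.randrange(step)                     # random starting unit
            cluster = units[start::step][:cluster_size]     # stage 3: evenly spaced cluster
            selected.extend((psu, block, unit) for unit in cluster)
    return selected


# Hypothetical frame: 6 PSUs x 5 block groups x 20 housing units
frame = {
    f"PSU{p}": {f"BG{b}": [f"HU{p}-{b}-{h}" for h in range(20)] for b in range(5)}
    for p in range(6)
}
sample = multistage_sample(frame, n_psus=2, n_blocks=3, cluster_size=4)
print(len(sample))  # 2 PSUs x 3 block groups x 4 units = 24
```

Interviewers only need to visit the selected PSUs and block groups, which is where the cost saving comes from; the price is that units within a cluster tend to resemble each other, which reduces precision.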

19
stratified
  • US divided into 1,900 PSUs
  • Largest 52 PSUs are self-representing
  • Rest of PSUs divided into 73 categories
    (strata), based on socioeconomic and
    demographic variables
  • Sampling takes place separately within each
    category (stratum)

20
Sample size and Precision
21
Weighted sampling
22
stratified
  • Also place census blocks into categories and
    sample within each
  • Oversample some strata
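
When some strata are oversampled, each stratum's result has to be weighted back to its share of the population before estimates are combined. A minimal sketch, assuming simple random sampling within strata and an entirely made-up population (the names `stratified_estimate`, `strata`, and `n_per_stratum` are illustrative):

```python
import random


def stratified_estimate(strata, n_per_stratum, seed=0):
    """Estimate a population proportion from a stratified sample,
    weighting each stratum by its share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    estimate = 0.0
    for name, members in strata.items():
        sample = rng.sample(members, n_per_stratum[name])
        stratum_prevalence = sum(sample) / len(sample)
        estimate += (len(members) / total) * stratum_prevalence
    return estimate


# Hypothetical population: 1 = has the condition, 0 = does not
strata = {
    "large": [1] * 900 + [0] * 8100,  # 9,000 people, 10% prevalence
    "small": [1] * 400 + [0] * 600,   # 1,000 people, 40% prevalence
}
# Oversample the small stratum: equal sample sizes despite unequal stratum sizes
n_per_stratum = {"large": 500, "small": 500}
print(round(stratified_estimate(strata, n_per_stratum), 3))
```

The true prevalence in this toy population is 13%. Pooling the two samples without weights would give roughly 25%, because the small high-prevalence stratum makes up half of the sample but only a tenth of the population.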

23
Defined population
  • Studies, especially cross-sectional studies, are
    easiest to interpret when they are based in a
    population that has some existence apart from the
    study itself (defined population)
  • 1. Political subdivision (city, county, state)
  • 2. Institutional (HMO, employer, profession)
  • Probability sampling enables statistical
    generalizability to the defined population

24
Surveys of sentinel populations
  • HIV seroprevalence survey in three county STD
    clinics in central NC in 1988
  • 3,000 anonymous, unlinked, leftover sera
  • Anonymous questionnaire for demographics and risk
    factors
  • Schoenbach VJ, Landis SE, Weber DJ, Mittal
    M, Koch GG, Levine PH. HIV seroprevalence in
    sexually transmitted disease clients in a
    low-prevalence southern state. Ann Epidemiol
    1993;3:281-288

25
HIV seroprevalence
  • Schoenbach VJ, Landis SE, Weber DJ, Mittal
    M, Koch GG, Levine PH. HIV seroprevalence in
    sexually transmitted disease clients in a
    low-prevalence southern state. Ann Epidemiol
    1993;3:281-288

26
Seroprevalence ( HIV) by risk factors
  • Schoenbach VJ, Landis SE, Weber DJ, Mittal M,
    Koch GG, Levine PH. HIV seroprevalence in
    sexually transmitted disease clients in a
    low-prevalence southern state. Ann Epidemiol
    1993;3:281-288

27
Interpretation
  • Measures prevalence: if incidence is our real
    interest, prevalence is often not a good
    surrogate measure
  • Studies only survivors and stayers
  • May be difficult to determine whether a cause
    came before an effect (exception: genetic
    factors)

28
Other points
  • Can choose participants by exposure or overall
  • Can choose by disease, but the study then may
    not be distinguishable from a case-control study
    with prevalent cases

29
Outline
  • Cross-sectional studies (and sampling)
  • Ecologic studies
  • Confidence intervals

30
Ecologic studies
  • Most study designs (cross-sectional,
    case-control, cohort, intervention trials) can
    be carried out with individuals or with groups
  • Group-level studies which use routinely collected
    data are easier and less costly
  • Group-level studies that involve interventions
    may not be easier or less costly

31
Types of group-level variables
  • Summary of individual-level variable (e.g.,
    median household income, % with high school
    diploma)
  • Property of the aggregate (e.g., neighborhood
    grocery stores, seat belt legislation, community
    competence)

32
Interpretation
  • Link between summary exposure variable and
    individual-level outcome must be inferred
  • Inference from group to individual is not always
    sound
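
A small constructed example shows why inference from group to individual can fail. The data below are invented so that the group means rise together while, inside every group, the outcome falls as exposure rises; `pearson_r` is a helper written for this sketch.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical (exposure, outcome) pairs for individuals in three groups
groups = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Group-level (ecologic) analysis: correlate the group means
mean_x = [sum(x for x, _ in g) / len(g) for g in groups.values()]
mean_y = [sum(y for _, y in g) / len(g) for g in groups.values()]
r_group = pearson_r(mean_x, mean_y)

# Individual-level analysis within each group
r_within = {
    name: pearson_r([x for x, _ in g], [y for _, y in g])
    for name, g in groups.items()
}

print("group-level r:", r_group)
print("within-group r:", r_within)
```

An analyst who saw only the three group means would find a perfect positive relationship, even though the relationship among individuals within each group runs the other way.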

33
Example: Male Circumcision and HIV
(Slope indicates strength of relationship; r
indicates linearity)
  • Source: Bongaarts J, et al. The relationship
    between male circumcision and HIV infection in
    African populations. AIDS 1989; 3(6): 373-7.

34
Outline
  • Cross-sectional studies (and sampling)
  • Ecologic studies
  • Confidence intervals

35
Confidence intervals
  • Provide a plausible range for the quantity being
    estimated
  • Width indicates the precision of an estimate for
    a given level of confidence
  • Confidence intervals quantify only random error
    from sampling variation, not systematic error
    from nonresponse, study design, etc.

36
Confidence level vs. precision
  • The more vague my estimate, the more confident I
    can be that it includes the population parameter:
    "I am 100% confident that the prevalence of HIV
    is between 0% and 100%."
  • The more specific my estimate, the lower my
    confidence: "I am 0% confident that the
    prevalence of HIV is 5.23%."
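
The trade-off can be made concrete with a minimal sketch, assuming a normal-approximation (Wald) interval for a proportion and a made-up survey result (5% prevalence in a sample of 1,000). The function name `wald_interval` and the numbers are illustrative, not taken from the slides.

```python
from statistics import NormalDist


def wald_interval(p_hat, n, level):
    """Normal-approximation interval for a proportion at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for level = 0.95
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - z * se, p_hat + z * se


for level in (0.50, 0.90, 0.95, 0.99):
    lo, hi = wald_interval(p_hat=0.05, n=1000, level=level)
    print(f"{level:.0%} interval: {lo:.3f} to {hi:.3f} (width {hi - lo:.3f})")
```

For the same data, asking for more confidence always produces a wider, less specific interval.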

37
Confidence intervals: interpretation
  • Simple interpretations are typically not precise
  • Precise interpretations are typically not simple

38
Simple but imprecise
  • "There is 95% confidence that the interval
    contains the true value." True, but begs the
    question of how to define confidence.

39
Simple but imprecise
  • "There is a 95% probability that the interval
    contains the true value." Not quite correct:
    probability (as conventionally defined) applies
    to a process, not to a single instance.

40
Probability applies to a process: example
  • A 95% confidence interval can be viewed as a
    measurement or estimation process that will be
    correct (the interval includes the true value of
    the parameter) 95% of the time and incorrect 5%
    of the time.
  • Let us make up another estimation process that
    will be correct (about) 95% of the time.

41
Why probability applies to a process
  • Estimate your gender by flipping a coin 5 times:
    if the result is 5 heads, estimate
    your gender to be its opposite; otherwise
    estimate your gender to be what you think it is
    now.
  • Probability that estimate will be correct is
    (1 - Probability of 5 heads) = 0.97 = 97%
  • Probability that estimate will be incorrect is 3%
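
The arithmetic behind the coin-flip process can be checked directly, and the long-run behaviour simulated. This is only a sketch; the seed and the number of simulated trials are arbitrary choices.

```python
import random

p_five_heads = 0.5 ** 5        # 1/32
p_correct = 1 - p_five_heads   # 31/32
print(f"P(correct) = {p_correct:.5f}")  # 0.96875, rounded to 97% above

# Run the "estimation process" many times and count how often it goes wrong
rng = random.Random(0)
trials = 100_000
wrong = sum(all(rng.random() < 0.5 for _ in range(5)) for _ in range(trials))
print(f"Simulated share correct: {1 - wrong / trials:.3f}")
```

The 97% describes how often the process is right across many repetitions. It says nothing about any single run in which 5 heads actually came up.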

42
Why probability applies to a process
  • So we now have a measurement process that will be
    correct 97% of the time. We will use it to
    measure your gender.
  • Flip the coin 5 times, and suppose you get 5
    heads.
  • Is there a 97% probability that you are of the
    opposite sex?

43
Precise but not simple
  • A 95% confidence interval is
  • 1. obtained by using a procedure that will
    include the population parameter being estimated
    95% of the time
  • 2. the set of all population values which are
    likely to yield a sample like the one we
    obtained

44
Suppose that this line represents the value of
the parameter we are trying to estimate
[Figure label: "True value"]
45
Possible estimates of that parameter in N
identical studies (shows sampling variation)
[Figure labels: "Study estimates", "True value"]
46
One possible true value and how it would
manifest, on average, in N identical studies
[Figure labels: "True value", "95% of the distribution"]
47
Estimate from one study of a given size
[Figure labels: "?", "Estimate"]
48
A possible true value with < 2.5% chance of
being observed at or beyond the estimate
[Figure labels: "?", "Estimate", "95% of the distribution"]
49
A possible true value with > 2.5% probability of
being observed at or beyond the estimate
[Figure labels: "?", "Estimate", "95% of the distribution"]
50
A possible true value with > 2.5% probability of
being observed at or beyond the estimate
[Figure labels: "?", "Estimate", "95% of the distribution"]
51
A possible true value with < 2.5% probability of
being observed at or beyond the estimate
[Figure labels: "?", "Estimate", "95% of the distribution"]
52
What the confidence interval represents
[Figure labels: "?", "95% confidence interval"]
53
What the confidence interval represents
[Figure label: "95% confidence interval"]
54
One possible true value and how it would
manifest, on average, in N identical studies
[Figure labels: "True value", "1.96 x s.e." on each side]
55
Confidence intervals: another take
[Figure: grid of 200 symbols, one per member of a population]
56
One possible population
[Figure: population grid with cases marked]
57
Another possible population
[Figure: population grid with cases marked]
58
A 3rd possible population
[Figure: population grid with cases marked]
59
A 4th possible population
[Figure: population grid with cases marked]
60
A 5th possible population
[Figure: population grid with cases marked]
61
A 6th possible population
[Figure: population grid with cases marked]
62
etc.
[Figure: population grid with cases marked]
63
There are 1.6 x 10^60 possible populations (from no
cases to all cases)
[Figure: population grid with cases marked]
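
The count of possible populations can be reproduced under one assumption: the population has 200 members, each of whom either is or is not a case. The size of 200 is inferred (it matches 5 cases = 2.5% on the interval-estimate slide) rather than stated here.

```python
n_people = 200                  # assumed population size
n_populations = 2 ** n_people   # each person is either a case or a non-case
print(f"{n_populations:.2e}")
```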
64
Suppose this is the population
(prevalence 15%)
[Figure: population grid with cases marked]
65
Take a sample (n = 10)
[Figure: population grid with cases marked]
66
The sample
[Figure: the 10 sampled individuals, 2 of them marked as cases]
67
Make point estimate of prevalence
[Figure: the 10 sampled individuals, 2 of them marked as cases]
68
Interval estimate
  • What are all the possible populations that would
    be expected to yield this prevalence in a sample
    of size 10?

69
This one is not possible
[Figure: population grid with 1 case marked]
70
Possible, but VERY UNLIKELY
[Figure: population grid with 2 cases marked]
71
Not quite 2.5% probability (2.1%, in fact)
[Figure: population grid with 5 cases marked]
72
Yields just about 2.5% (3%, actually) probability
of selecting 2 (or more) cases in 10
[Figure: population grid with 6 cases marked]
73
One possible true value and how it would
manifest, on average, in N identical studies
[Figure labels: "True value", "95% of the distribution"]
74
Just above 2.5% (actually 2.6%) probability of
selecting 2 (or fewer) cases in 10
[Figure: population grid with many cases marked]
75
Just below 2.5% (actually 2.4%) probability of
selecting 2 (or fewer) cases in 10
[Figure: population grid with many cases marked]
76
Interval estimate for 2/10
  • Lower bound: 2.5% (5 cases)
  • Upper bound: 55% (110 cases)
  • Meaning: Our sample of 10 with 2 cases provides
    evidence to exclude, at conventional error
    tolerance, populations with fewer than 5 cases or
    more than 110 cases. Populations with 5-110
    cases cannot be excluded as likely sources for
    this sample.
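
The tail probabilities behind these bounds can be sketched with the hypergeometric distribution. Two things here are assumptions rather than statements from the slides: that the population has 200 members (inferred from 5 cases = 2.5%) and that the sample of 10 is drawn without replacement. The function names are illustrative.

```python
from math import comb

N, n = 200, 10  # assumed population size, and the sample size


def prob_cases(k, K):
    """P(exactly k cases in the sample) when the population holds K cases."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def prob_at_least(k, K):
    """P(k or more cases in the sample)."""
    return sum(prob_cases(j, K) for j in range(k, n + 1))


def prob_at_most(k, K):
    """P(k or fewer cases in the sample)."""
    return sum(prob_cases(j, K) for j in range(0, k + 1))


for K in (5, 6):
    print(f"{K} cases: P(2 or more in sample) = {prob_at_least(2, K):.3f}")
for K in (109, 110):
    print(f"{K} cases: P(2 or fewer in sample) = {prob_at_most(2, K):.3f}")
```

Populations whose tail probability falls well below 2.5% on either side are the ones the sample gives grounds to exclude; the interval is what remains.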

77
Interval estimate for 2/10
  • Actual population prevalence was 15%, which in
    fact is between 2.5% and 55%.
  • 2.5% to 55% is a very wide interval, i.e., a very
    imprecise estimate
  • To make it more precise, we need a larger sample

78
Signs from around the world: Germany
  • A sign posted in Germany's Black Forest: It is
    strictly forbidden on our black forest camping
    site that people of different sex, for instance,
    men and women, live together in one tent unless
    they are married with each other for that
    purpose.

79
Signs from around the world: Finland
  • On the faucet in a Finnish washroom
  • To stop the drip, turn cock to right.