Title: Case-Control Studies
- Pradeep Deshmukh
- Professor
- Dr Sushila Nayar School of Public Health
- MGIMS, Sewagram
Definition
- The case-control study is an analytic epidemiologic research design in which the study population consists of groups who either have a particular health problem or outcome (cases) or do not have it (controls).
- The investigator looks back in time to measure the exposure of the study subjects. The exposure is then compared between cases and controls to determine whether it could account for the health condition of the cases.
SYNONYMS
- Case-Referent
- Case-Compeer
- Retrospective
Characteristics
- Observational / Non-experimental
- Occasionally Exploratory
- Explanatory (Analytical)
- Retrospective
- Effect to Cause
- Both exposure and disease have already occurred
- Uses a comparison group
Why the case-control design for the study of rare diseases?
- Consider a rare disease, say a cancer such as leukemia
- Crude annual incidence: 3.4/100,000 (children < 15 years)
- Cohort study: a year of observation on a million children would identify only 34 cases
- These 34 cases must then be subdivided into 2 or more exposure categories
- What about conducting a case-control design instead?
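The cohort arithmetic above can be checked in a few lines. This is a minimal Python sketch that assumes a constant incidence rate and complete follow-up; the function name `expected_cases` is illustrative, not from the source.

```python
def expected_cases(incidence_per_100k: float, population: int, years: float = 1.0) -> float:
    """Expected new cases in a closed cohort, assuming a constant incidence rate."""
    return incidence_per_100k / 100_000 * population * years


# Leukemia example: 3.4 per 100,000 children (< 15 years) per year
cohort_cases = expected_cases(3.4, 1_000_000)
print(f"Expected cases after one year of follow-up: {cohort_cases:.0f}")

# If split evenly between two exposure categories, each gets about half
print(f"Per exposure category (2 categories): about {cohort_cases / 2:.0f}")
```

A case-control study instead starts from cases that have already occurred, so every available case can be enrolled without waiting a year (or longer) for them to accrue.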
Case-Control Studies for Diseases Having a Long Induction Period
- Advantageous when there is a long induction period between exposure and clinical onset of disease
- Cohort study: years of waiting for cases to accrue
- Case-control study: compresses time
- Hence case-control studies suit chronic diseases (cancer / cardiovascular diseases)
Case-control studies in the hierarchy of designs
- RCT: methodological standard of excellence
- However,
- Ca-Co (case-control) studies are not only SIMPLE to perform but sometimes the ONLY approach to solving a problem
- Philosophically, no design is the gold standard
- Understand the strengths and weaknesses of each design
- Select the appropriate study design to address your research question (RQ)
Progression of study design: Clinical research
- Isolated Case Reports
- Case Series
- Cross-Sectional study
- Case-Control Study
- Cohort Study
- Randomized Clinical Trial
- Meta-Analysis
PROGRESSION OF STUDY DESIGN: COMMUNITY RESEARCH
- Ecological Study
- Cross-Sectional Study
- Case-Control Study
- Cohort Study
- Randomized Community Trial
- Meta-Analysis
- EXAMPLE: Lipid-Atherosclerosis Association
Lipid-Atherosclerosis Association
- Analysis of death rates from CAD according to per capita fat consumption in 20 countries → hypothesis of L-A association
- Cross-sectional (CS) studies: Framingham and Evans County Heart Studies (Dawber et al 1971, Cassel 1971)
- Case-control studies confirmed the association
- Cohort studies (Truett et al 1967, Tyroler et al 1971)
- Community-based controlled trials of lipid reduction (Lipid Research Clinics Program)
Causation/causal association
- Criteria to be fulfilled
- Temporal association
- Strength of association; effect of cessation
- Specificity of association
- Consistency of association
- Biological plausibility
- Coherence of association
The setting of case-control research
- Clinical: mechanisms of disease causation
- Community: population health impact of exposure
Decision to conduct case-control research
- The characteristics of the exposure and disease
- The current state of knowledge about the exposure-disease relationship
- The immediate goals of the study
- The research setting
- The resources available
Research questions
- Is oral contraceptive (OC) use associated with myocardial infarction (MI) in women?
- Is current intrauterine device (IUD) use associated with pelvic inflammatory disease (PID)?
- Is OC use associated with the risk of breast cancer?
- Is age at first coitus associated with cervical cancer?
- Is legal abortion associated with placenta previa in a later pregnancy?
WHO ARE CASES?
- With a Specific Outcome
- Presence of Disease / Syndrome
- Complications / Progression of Disease (Severe dehydration crisis)
- Death (Neonatal mortality)
- Serum cholesterol / Birth weight
- Delayed Immunization
- Early Initiation of Cigarette Smoking
- Adverse Reactions of Drugs / Vaccines (SIDS)
- Behavior (Juvenile Delinquency)
- Drug Resistance (MDR-TB)
- Couple as a case (Infertility)
Selection of cases (Definition)
- Diagnostic Criteria
- Risk of Disease Misclassification
- Continuous / Discrete Outcome Variable
- Relatively simple and straightforward: children with cleft palate (physical examination)
- Sometimes difficult: hypertension
- Diagnosis: combination of methods
- Rationale / Logical
- Criteria: Specific
- Operational versus Rigid
- Standard Definition (WHO, CDC, etc)
- Reference (growth references: NCHS, CDC, new WHO)
Selection of cases (definition)
- Eligibility criteria
- Inclusion/exclusion criteria
- Ca-Co studies should be limited to incident cases (Sackett 1979)
- Exposures are presumably more recent and therefore more reliably recalled
- Relatively homogeneous group
- Excluding prevalent cases minimizes selection bias (Neyman fallacy)
- Example: PID and IUD use
- Women who are not sexually active or who have had a tubal ligation are unlikely to have recently used any contraceptive method, including IUDs, so they are not eligible for the exposure
Case definition
- Conceptual definition
- Obesity defined as body fat percentage > 33%
- Operational definition
- Body Mass Index (BMI) > 30 kg/m²
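As a sketch, the operational definition can be applied mechanically once weight and height are measured. The helper names and the two subjects are hypothetical. The strict `>` follows the slide; WHO classifies obesity as BMI ≥ 30 kg/m², so the exact comparison is worth fixing explicitly in a study protocol.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def is_obese(weight_kg: float, height_m: float, cutoff: float = 30.0) -> bool:
    """Operational case definition from the slide: BMI > cutoff (kg/m^2)."""
    return bmi(weight_kg, height_m) > cutoff


# Two hypothetical subjects
for weight, height in [(95, 1.70), (70, 1.75)]:
    print(f"{weight} kg, {height} m: BMI = {bmi(weight, height):.1f}, case = {is_obese(weight, height)}")
```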
Case definition: Issues
- The case definition should avoid misclassification
- For example, Sinha et al (2008)
- Anemia was defined as hemoglobin < 110 g/L, as measured by the WHO Colour Scale
- The WHO Colour Scale over-estimates hemoglobin
- So cases with mild anemia were misclassified as non-cases
- Also, including mild forms of disease gives a larger case group but risks misclassifying cases as non-cases, or non-cases as cases, because early diagnosis is generally imprecise
Case definition: Issues
- A severe case definition may exclude people who were cured, or who died of the disease, before the condition was severe enough to be labelled a case
- Standard/consensus definitions, if available, must be used. For example:
- Rheumatoid arthritis: Rome criteria, New York criteria, 1987 ACR criteria
- Metabolic syndrome: ATP III, IDF, and so on
- Lack of agreement over the definition may introduce variability in estimates of effect
Case definition: Issues
- Severity, diagnostic criteria and subjectivity of criteria can all lead to misclassification of cases
- The researcher can choose between more restrictive and more inclusive definitions
- Think in terms of the sensitivity and specificity of the definition and their effect on validity, sample size, precision and power
- Brenner and Savitz (1990) reported that:
- A restrictive definition (less sensitive) leads to loss of precision and power by reducing sample size
- Broad criteria (less specific) produce misclassification, leading to a biased measure of effect
- So weigh validity first: favour specificity over sensitivity (a restrictive definition over an inclusive one)
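The Brenner and Savitz trade-off can be illustrated with expected counts rather than a simulation. In this sketch (all numbers hypothetical, function name illustrative) the sensitivity and specificity of the case definition decide who is labelled a case. It assumes disease misclassification is independent of exposure and that controls mirror the exposure prevalence of healthy people, which is reasonable for a rare disease.

```python
def labelled_case_group(n_diseased, n_healthy, p_exp_diseased, p_exp_healthy, se, sp):
    """Expected OR and case-group size when cases are identified with a
    case definition of sensitivity `se` and specificity `sp`.

    Assumptions (illustrative only): misclassification of disease does not
    depend on exposure, and controls reflect the exposure prevalence of
    healthy people.
    """
    true_pos = se * n_diseased            # real cases captured by the definition
    false_pos = (1 - sp) * n_healthy      # healthy people wrongly labelled as cases
    exposed = true_pos * p_exp_diseased + false_pos * p_exp_healthy
    unexposed = true_pos * (1 - p_exp_diseased) + false_pos * (1 - p_exp_healthy)
    control_odds = p_exp_healthy / (1 - p_exp_healthy)
    odds_ratio = (exposed / unexposed) / control_odds
    return odds_ratio, true_pos + false_pos


# Hypothetical source population: 200 diseased, 100,000 healthy;
# 40% of diseased and 20% of healthy people are exposed (true OR = 2.67).
restrictive = labelled_case_group(200, 100_000, 0.4, 0.2, se=0.5, sp=1.0)
broad = labelled_case_group(200, 100_000, 0.4, 0.2, se=1.0, sp=0.999)

for name, (or_, n_cases) in [("restrictive", restrictive), ("broad", broad)]:
    print(f"{name:>11}: {n_cases:.0f} cases, expected OR = {or_:.2f}")
```

The restrictive definition captures only half of the true cases (100) but leaves the expected OR at its true value of 2.67; the broad definition triples the case group (300) while diluting the OR toward 1 (2.00).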
Sources of cases (Research Setting)
- Hospitals (Multi-Centric Studies)
- Community
- Industrial Population
Identification of cases: Issues
- The goal is to ensure that all true cases have an equal probability of entering the study and that no false cases enter
- Example: conceptual definition of HIV
- Factors affecting the decision to test (or access to the test), together with the sensitivity (Sn) and specificity (Sp) of the test, decide who eventually becomes a case under the operational definition
- Selection bias??
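How many false cases a test's Sn and Sp let in depends heavily on prevalence. A minimal sketch using the positive predictive value (PPV) of the test, with hypothetical values (prevalence 1%, Sn = Sp = 99%); the decision to get tested, which also shapes the case group, is not modelled here.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Proportion of test-positive people who truly have the condition."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.99, specificity=0.99)
print(f"PPV = {ppv:.2f}; share of false cases among test-positives = {1 - ppv:.2f}")
```

Even with 99% sensitivity and specificity, about half of the test-positive "cases" would be false cases at 1% prevalence.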
Biases
- Selection bias
- Unequal chance of getting into the study
- Berkson's bias
- Variable rates of hospitalization affecting case selection
- Neyman fallacy
- Incident cases vs prevalent cases
- Detection bias
- Due to closer medical attention, detection of endometrial cancer was higher among women using estrogen
Selection of Controls
- The controls should come from the population at risk of the disease
- Men cannot be controls for a gynecological condition
- The controls should be eligible for the exposure
- The controls should have the same exposure rate as the population from which the cases are drawn
Wacholder's four principles for selection of controls
- The study base
- Source of cases and controls should be the same
- Deconfounding
- Comparable accuracy
- Similar misclassification errors in cases and controls
- Same potential for recall bias in cases and controls
- Efficiency
Types of Controls
- Hospital or clinic control
- Dead control
- Controls with similar diseases
- Peer or case-nominated (friend/neighbor) control
- Population controls
Hospital controls
- Readily available, hence commonly used
- Main reasons to use hospital controls are:
- To select controls whose referral pattern is similar to that of cases
- To obtain a similar quality of examination
- For convenience
- May not be representative of the population
Dead controls
- Dead controls might be used for dead cases
- In some situations this means information must come from surrogate informants
- The problem is that dead controls are not representative of the living population
- McLaughlin compared dead controls with living controls and noticed that the dead controls smoked more cigarettes and consumed more alcohol than living controls
- Appropriateness depends on the exposure being studied
Controls with similar diseases
- Reasons:
- To minimize recall bias
- To minimize interviewer bias
- To examine the specificity of an exposure for a particular type of cancer
- For practical but unspecified reasons
- Problem??
Peer or case-nominated (friend/neighbor) controls
- The term "neighborhood controls" is used in two ways:
- To refer to community or population controls
- To refer to controls selected from a finite number of close neighbors
- The search starts from the house of the case, and a door-to-door search for eligible controls is conducted in a standardized pattern
- A friend or neighbor control is a surrogate for matching on age, SES, education, etc
- A quick way to find controls
- Bias is introduced if the determinants of friendship are associated with disease or exposure
- Friends share many risk behaviors
Population controls
- Randomly drawn from population
- Truly representative of population
- Ideal way of selecting controls
- Practically, very difficult to carry out
- Study base ???
Where to select controls from?
- Weigh the pros and cons
- Analyze the situation for bias being introduced
- If possible:
- Select different sources of controls and compare them with each other
- Compare the inferences drawn
Ratio of controls to cases
- Statistical consideration
- When the number of subjects available in one group (cases) is limited, increasing the other group increases the study power
- Power gains continue up to a ratio of 4:1
- Thereafter, the gain is not substantial but cost increases
- When the power with equal allocation (1:1) is already as high as 0.9 or as low as 0.1, additional controls fail to increase power appreciably
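The diminishing return can be seen from a commonly quoted approximation: for a fixed number of cases, using c controls per case gives an efficiency of about c / (c + 1) relative to an unlimited supply of controls. This is a sketch of that approximation only, not a full power calculation; the function name is illustrative.

```python
def relative_efficiency(controls_per_case: int) -> float:
    """Approximate efficiency of c controls per case, relative to an unlimited
    number of controls (common approximation: c / (c + 1))."""
    c = controls_per_case
    return c / (c + 1)


for c in (1, 2, 3, 4, 5, 6, 10):
    print(f"{c:>2}:1  efficiency = {relative_efficiency(c):.3f}")
```

Going from 1:1 to 4:1 raises the efficiency from 0.50 to 0.80, while going from 4:1 to 10:1 adds only about 0.11, which matches the point that gains level off beyond about four controls per case.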
Ratio of controls to cases
- Validity of inferences
- Even when there is no statistical need, more than one control may be recruited per case
- Enrolling two or more types of controls is a way of checking for biases introduced by the choice of control group
- If the measure of effect is similar when comparing cases with each control group, there is probably no bias (though this is not certain)
- If the measures of effect differ, bias is present in at least one comparison, and the researcher can investigate its source
MATCHING
- Purpose: to adjust for the effects of relevant confounders
- Matching is done in the design and must be accounted for in the analysis
- Misconception: that the goal is to make the case and control groups similar in all respects except disease status
- An optimal matching scheme involves only those variables that improve statistical efficiency or eliminate bias from the effect of interest
MATCHING
- Which variables are appropriate for matching?
- Risk factors identified in prior work may be used for matching
- Matching by interviewer or hospital may be used to balance out the effects of interviewer and observer errors
- It is best to limit matching to basic descriptors (age, race, sex, etc)
- Non-modifiable risk factors
- Use few matching factors
MATCHING
- Overzealous matching may have adverse effects
- Matching on a strong correlate of the exposure that is not an independent risk factor for the outcome (overmatching) may lead to an underestimate of the OR
- Matching may lead to a false sense of security that a particular variable is adequately controlled
Sample size
Measurement of Exposure
- Questionnaires
- Records
- Conversion tables/algorithms
Measurement of exposure
- Questionnaire
- Question comprehension
- Information retrieval
- Response formulation and recording
- Quality of exposure reports may be influenced by
- Type of respondent
- Administration of questionnaire
- Salience of exposure
- Way in which information is retrieved
- Ways in which responses are formulated and
recorded
Measurement of exposure
- Records
- Abstraction of data from records
- Quality control measures are important
- Careful design and testing of the abstraction form
- Training and supervision of abstractors
- A priori definition of terms
- Specification of rules for handling conflicting or missing data
Measurement of exposure
- Conversion tables/algorithms
- To obtain a more specific exposure measure from a questionnaire or record
- Increasingly used nowadays for dietary and occupational variables
Group work
- Three groups
- Design a case-control study
Analysis
Associations
- Use of tests of significance
- Estimation of the odds ratio and its confidence interval
- Attributable risk estimation
Tests of significance
- Unmatched study
- Matched study
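Binary exposure without covariates
Exposure to fumes | Headache present | Headache absent | Total
Factor present | a = 10 | b = 90 | a + b = 100
Factor absent | c = 50 | d = 850 | c + d = 900
Total | a + c = 60 | b + d = 940 | n = 1000
- $OR = \frac{ad}{bc}$
- $SE(\ln OR) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$
- $CI = OR \times \exp\left(\pm z_{1-\alpha/2} \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right)$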
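The OR and its confidence interval for the fumes table can be computed as follows. This is a sketch of the Woolf (log-based) interval using only the Python standard library; it assumes no zero cells, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """OR and Woolf (log-based) confidence interval for a 2x2 table.

    Layout as in the fumes table: a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases. Assumes no zero cells.
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


or_, lower, upper = odds_ratio_ci(10, 90, 50, 850)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

For the fumes table this gives OR ≈ 1.89 with a 95% CI of roughly 0.93 to 3.85; because the interval includes 1, the association is not statistically significant at the 5% level.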
Binary exposure and categorical covariate
- Stratified analysis
- Calculate the OR for each stratum
- Mantel-Haenszel summary odds ratio: $OR_{MH} = \frac{\sum_i a_i d_i / n_i}{\sum_i b_i c_i / n_i}$
- Logistic regression
Binary exposure with continuous covariate
- Consider height as a covariate in a study where the exposure is diabetes and the outcome is MI
- Use of stratified analysis:
- Huge number of strata
- Many of these strata will have zero frequency
- Solutions:
- Form a limited number of categories: some information is lost
- Logistic regression: risk of misspecifying the functional form of the relationship between covariate and outcome
Continuous exposure without covariate
- Divide the exposure variable into a small number of categories
- For example, quintiles
- Logistic regression
- Good for assessing dose-response
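One way to categorize a continuous exposure into quintiles and look for a dose-response trend is sketched below. Two assumptions are labelled here: cut-points are taken from the exposure distribution among controls (a common convention), and every category contains at least one case and one control. The data and the function name are hypothetical, constructed so the trend is obvious.

```python
from bisect import bisect_right
from statistics import quantiles


def quintile_odds_ratios(case_values, control_values):
    """OR for each exposure quintile versus the lowest quintile.

    Cut-points come from the distribution among controls (assumed convention).
    Assumes every category has at least one case and one control.
    """
    cuts = quantiles(control_values, n=5)          # four cut-points

    def category(x):
        return bisect_right(cuts, x)                # 0 (lowest) .. 4 (highest)

    case_counts = [0] * 5
    control_counts = [0] * 5
    for x in case_values:
        case_counts[category(x)] += 1
    for x in control_values:
        control_counts[category(x)] += 1
    ref_odds = case_counts[0] / control_counts[0]  # lowest quintile is the reference
    return [(case_counts[k] / control_counts[k]) / ref_odds for k in range(5)]


# Hypothetical data: 100 controls spread evenly, cases concentrated at high exposure
controls = list(range(1, 101))
cases = [10] * 5 + [30] * 10 + [50] * 15 + [70] * 20 + [90] * 25
ors = quintile_odds_ratios(cases, controls)
print([round(x, 2) for x in ors])
```

The ORs rise steadily across quintiles, the kind of pattern that suggests a dose-response relationship; logistic regression with the exposure entered as an ordinal or continuous term would be the model-based counterpart.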
Continuous exposure with covariates
- Categorize the exposure
- MH estimator
- Regression technique
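Matched data
Pairs | Control (headache absent): fumes present | Control: fumes absent | Total
Case (headache present): fumes present | A | B | A + B
Case: fumes absent | C | D | C + D
Total | A + C | B + D | A + B + C + D
- A, B, C, D are numbers of case-control pairs
- $OR = B/C$
- $SE(\ln OR) = \sqrt{\frac{1}{B} + \frac{1}{C}}$
- $CI = OR \times \exp\left(\pm z_{1-\alpha/2} \sqrt{\frac{1}{B} + \frac{1}{C}}\right)$
- Association by McNemar's test: $\chi^2 = \frac{(B - C)^2}{B + C}$
- Regression: conditional logistic regression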
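A minimal sketch of the matched-pair calculations, using hypothetical discordant-pair counts (B = 30, C = 10). Concordant pairs (A and D) do not enter the OR or the McNemar statistic, and the McNemar statistic here omits the continuity correction. The function name is illustrative.

```python
import math
from statistics import NormalDist


def matched_pair_analysis(b, c, level=0.95):
    """OR, CI and McNemar chi-square from discordant pairs.

    b = pairs with case exposed and control unexposed;
    c = pairs with case unexposed and control exposed.
    """
    or_ = b / c
    se_log_or = math.sqrt(1 / b + 1 / c)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    mcnemar_chi2 = (b - c) ** 2 / (b + c)  # 1 df, no continuity correction
    return or_, (lower, upper), mcnemar_chi2


or_, (ci_low, ci_high), chi2 = matched_pair_analysis(b=30, c=10)
print(f"OR = {or_:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}, McNemar chi2 = {chi2:.1f}")
```

With these counts the OR is 3.0, the 95% CI is roughly 1.47 to 6.14, and McNemar's χ² is 10.0 on 1 degree of freedom.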
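INTERPRETATION
- OR = 1: no association; OR < 1: exposure associated with lower odds of disease; OR > 1: exposure associated with higher odds of disease
OR range | Interpretation
0.0 - 0.3 | Strong benefit
0.4 - 0.5 | Moderate benefit
0.6 - 0.8 | Weak benefit
0.9 - 1.1 | No effect
1.2 - 1.6 | Weak hazard
1.7 - 2.5 | Moderate hazard
> 2.6 | Strong hazard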
Identification of confounding variables
- Statistically test the association of the potential confounder with disease and with exposure
- If the crude OR differs from the adjusted OR by more than a prespecified percentage (usually set at 15% or less), the variable is regarded as a confounder
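The change-in-estimate rule can be written as a small helper. This sketch measures the change relative to the adjusted OR with a default 15% threshold; both choices are assumptions (some analysts use the crude OR as the denominator or compare on the log scale), and the OR values are hypothetical.

```python
def change_in_estimate(crude_or: float, adjusted_or: float) -> float:
    """Relative difference between crude and adjusted OR (relative to the adjusted OR)."""
    return abs(crude_or - adjusted_or) / adjusted_or


def is_confounder(crude_or: float, adjusted_or: float, threshold: float = 0.15) -> bool:
    """Flag a variable as a confounder if the OR changes by more than `threshold`."""
    return change_in_estimate(crude_or, adjusted_or) > threshold


# Hypothetical ORs before and after adjusting for a candidate variable
print(is_confounder(crude_or=2.0, adjusted_or=1.5))  # about 33% change -> True
print(is_confounder(crude_or=1.6, adjusted_or=1.5))  # about 7% change -> False
```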
Attributable risk estimation
- Also known as:
- Etiologic fraction
- Excess fraction
- Population attributable risk percent
- Provides an estimate of the proportion of cases that are related to a given exposure
- It is the fraction of disease in the population that might be avoided by eliminating exposure to an etiologic agent
- It takes the number of exposed individuals in the population into account
- Provides important information for public health action
ARP
- $ARP = \frac{OR - 1}{OR}$
- The proportion of total disease risk in exposed persons that may be attributed to their exposure
- For OR > 1, ARP ranges from 0 to 1
- For OR = 1, ARP = 0
- If the OR is very large, much of the total disease risk in exposed persons may result from that exposure
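PARP
- Cole and MacMahon (1971): $PARP = \frac{p_0(OR - 1)}{1 + p_0(OR - 1)}$, where $p_0$ is the proportion of controls who are exposed
- Taylor (1977): $PARP = 1 - \frac{b(c + d)}{d(a + b)}$, where a and b are the exposed and unexposed cases, and c and d are the exposed and unexposed controls
- Corresponds to the proportion of disease risk in all persons that may be attributed to the exposure under investigation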
EXAMPLE
- Intrauterine irradiation and childhood leukemia
- OR = 1.48
- $ARP = \frac{1.48 - 1}{1.48} = 0.32$
- One third of leukemia in the irradiated children may be attributed, in part, to prenatal irradiation
- $PARP = 1 - \frac{70(45 + 155)}{155(30 + 70)} = 1 - 0.90 = 0.10$
- 10% of all childhood leukemia may be attributed, in part, to intrauterine irradiation
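Both attributable fractions can be sketched in a few lines. The cell counts a = 30, b = 70, c = 45, d = 155 are read from the PARP arithmetic in the irradiation example, with letters assigned as in the Taylor formula (a, b = exposed and unexposed cases; c, d = exposed and unexposed controls); they reproduce OR ≈ 1.48. The function name is illustrative.

```python
def attributable_fractions(a, b, c, d):
    """OR, ARP and PARP from case-control counts.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    or_ = (a * d) / (b * c)
    arp = (or_ - 1) / or_
    # Taylor (1977)
    parp_taylor = 1 - (b * (c + d)) / (d * (a + b))
    # Cole and MacMahon (1971), with p0 = proportion of controls exposed
    p0 = c / (c + d)
    parp_cole = p0 * (or_ - 1) / (1 + p0 * (or_ - 1))
    return or_, arp, parp_taylor, parp_cole


or_, arp, parp_taylor, parp_cole = attributable_fractions(30, 70, 45, 155)
print(f"OR = {or_:.2f}, ARP = {arp:.2f}, PARP (Taylor) = {parp_taylor:.2f}, PARP (Cole-MacMahon) = {parp_cole:.2f}")
```

The two PARP formulas agree exactly here, since with $p_0 = c/(c + d)$ both reduce to $(ad - bc)/[d(a + b)]$.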
Example
Diet | HT | NHT | Total
HF diet | 190 | 110 | 300
LF diet | 90 | 220 | 310
Total | 280 | 330 | 610
- Crude OR = 4.22 (95% CI 3.01 to 5.93)
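Stratified analysis
Males (n = 410) | HT | NHT
HF diet | 130 | 90
LF diet | 70 | 120
Females (n = 200) | HT | NHT
HF diet | 60 | 20
LF diet | 20 | 100
- Males: OR = 2.48 (95% CI 1.66 to 3.69)
- Females: OR = 15.00 (95% CI 7.47 to 30.13)
- Mantel-Haenszel summary OR = 3.92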
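The stratified example can be reproduced with a short sketch of the Mantel-Haenszel estimator. It assumes HT is the outcome (cases) and HF diet the exposure, so each stratum is entered as (a, b, c, d) = (HF-HT, HF-NHT, LF-HT, LF-NHT). The function names are illustrative.

```python
def stratum_or(a, b, c, d):
    """OR for one 2x2 table: a, b = HF diet (HT, NHT); c, d = LF diet (HT, NHT)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary OR: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


male = (130, 90, 70, 120)
female = (60, 20, 20, 100)
crude = tuple(m + f for m, f in zip(male, female))  # pooled table (190, 110, 90, 220)

print(f"Crude OR:  {stratum_or(*crude):.2f}")
print(f"Male OR:   {stratum_or(*male):.2f}")
print(f"Female OR: {stratum_or(*female):.2f}")
print(f"MH OR:     {mantel_haenszel_or([male, female]):.2f}")
```

This reproduces the crude OR of 4.22, the stratum-specific ORs of 2.48 and 15.00, and the Mantel-Haenszel summary OR of 3.92.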
Exercise on matched data
- Exercise using Epi Info
Bias
- Selection
- Misclassification
- Differential
- Non-differential
- Confounding
SELECTION BIAS
- Berkson's Paradox
- Neyman Fallacy
- Selective Referral
- Detection Bias
- Non-Response
- Length of Hospital Stay Bias
- Survival Bias
BERKSON'S PARADOX
Berkson (1946): Hospital samples may systematically differ from general populations because of factors that influence the likelihood of hospitalization. Hospital samples may exhibit spurious associations between two variables, even though these variables are independently distributed in the general population.
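A minimal expected-count sketch of the mechanism, assuming each condition independently prompts hospital admission. All prevalences and admission probabilities are hypothetical, and the direction of the spurious association depends on them (here it comes out negative).

```python
def odds_ratio(both, a_only, b_only, neither):
    """OR for the association between disease A and disease B."""
    return (both * neither) / (a_only * b_only)


def admission_probability(*probs):
    """P(admitted) when each listed factor independently leads to admission."""
    p_not_admitted = 1.0
    for p in probs:
        p_not_admitted *= 1 - p
    return 1 - p_not_admitted


# Hypothetical population: diseases A and B are independent, 10% prevalence each
N = 100_000
population = {"both": 0.01 * N, "a_only": 0.09 * N, "b_only": 0.09 * N, "neither": 0.81 * N}

# Hypothetical admission probabilities: baseline, due to A, due to B
h0, h_a, h_b = 0.05, 0.5, 0.2
hospital = {
    "both": population["both"] * admission_probability(h0, h_a, h_b),
    "a_only": population["a_only"] * admission_probability(h0, h_a),
    "b_only": population["b_only"] * admission_probability(h0, h_b),
    "neither": population["neither"] * admission_probability(h0),
}

print(f"OR in general population: {odds_ratio(**population):.2f}")
print(f"OR in hospital sample:    {odds_ratio(**hospital):.2f}")
```

The two diseases are unrelated in the population (OR = 1.00), yet the hospital sample shows an OR of about 0.25 purely because of differential admission.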
BERKSON'S PARADOX
- Roberts and co-workers (1978)
- Respiratory and bone diseases
- OR (general population) = 1.06
- OR (hospital sample) = 4.06
- Distorted medication-disease associations
- Laxative use and arthritic disease
- OR (general population) = 1.48
- OR (hospital sample) = 5.00
BERKSON'S PARADOX
- Walter (1980): the bias is minimized when
- The exposure is not a direct cause of hospitalization
- The case and control populations are mutually exclusive
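NEYMAN FALLACY
- Neyman (1955): using prevalent cases distorts exposure-disease (E-D) associations if the exposure is related to disease prognosis
- Example: sex and colorectal cancer
- Incidence rate in males (IRM) > incidence rate in females (IRF) (Devesa and Silverman 1978)
- Survival of females (SRF) is longer (Koch et al 1982)
- So a sample of prevalent cases over-represents females
- Minimization: use incident cases (Schlesselman 1982)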
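Why prevalent cases over-represent females can be shown with the steady-state relation: prevalent cases ≈ incidence × mean duration of disease. All numbers below are hypothetical; only their direction (higher male incidence, longer female survival) follows the colorectal cancer example.

```python
# Hypothetical steady-state example (prevalent cases ~ incidence x mean duration)
incident_per_year = {"male": 60, "female": 40}
mean_duration_years = {"male": 2, "female": 4}   # females survive longer
prevalent = {sex: incident_per_year[sex] * mean_duration_years[sex]
             for sex in incident_per_year}       # {"male": 120, "female": 160}


def female_share(counts):
    """Proportion of females in a group of cases."""
    return counts["female"] / sum(counts.values())


print(f"Female share, incident cases:  {female_share(incident_per_year):.2f}")
print(f"Female share, prevalent cases: {female_share(prevalent):.2f}")
```

Females make up 40% of incident cases but about 57% of prevalent cases, so a study using prevalent cases would see a distorted sex-disease association.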
SELECTIVE REFERRAL
- If cases within the population are differentially referred to the study hospital
- Tertiary care hospital: complicated/severe disease, which may differ etiologically from other cases
NON-RESPONSE
- If enrolled subjects systematically differ from non-participants
- Evaluate the comparability of participants and non-participants
DETECTION BIAS
- If exposure influences the likelihood of clinical recognition of the disease
- Example: exogenous oestrogen and endometrial cancer
- The E-D association may be partially attributed to preferential detection of D in exposed women
- E → dysfunctional bleeding → intra-abdominal diagnostic examination → diagnosis of an asymptomatic D
- Ca-Co studies should attempt to evaluate the extent to which exposure brings otherwise asymptomatic cases to clinical detection
LENGTH OF HOSPITAL STAY BIAS
- If cases are selected from a registry of current hospital patients, cases who have been hospitalized longest have a higher probability of being selected than cases admitted for minor conditions or cases who died
- These cases may have other diseases and conditions that may be related to the disease or exposure under study
SURVIVAL BIAS
- If only the survivors of the outcome are selected
as cases and if survival is related to the
exposure of interest
MISCLASSIFICATION BIAS
- Non-differential misclassification (NDM): the errors in classification of one variable (E) do not depend on the level of the other variable (D). NDM errors → OR biased toward 1
- Differential misclassification (DM): the errors in classification of one variable (E) depend on the level of the other variable (D). DM errors → OR biased in either direction
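A minimal expected-count sketch of non-differential misclassification of a binary exposure: the same sensitivity (0.8) and specificity (0.9) of exposure assessment are applied to cases and controls. All counts and function names are hypothetical.

```python
def misclassified_counts(exposed, unexposed, se, sp):
    """Expected observed (exposed, unexposed) counts after exposure misclassification."""
    obs_exposed = se * exposed + (1 - sp) * unexposed
    obs_unexposed = (1 - se) * exposed + sp * unexposed
    return obs_exposed, obs_unexposed


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Hypothetical true counts: cases 100 exposed / 100 unexposed; controls 50 / 150
true_or = odds_ratio(100, 100, 50, 150)

# Same error rates in cases and controls (non-differential)
cases = misclassified_counts(100, 100, se=0.8, sp=0.9)
controls = misclassified_counts(50, 150, se=0.8, sp=0.9)
observed_or = odds_ratio(*cases, *controls)

print(f"True OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

The true OR of 3.0 is pulled toward 1 (about 2.16). Making sensitivity or specificity differ between cases and controls would turn this into differential misclassification, whose bias can go in either direction.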
NDM ERRORS
- Exposure Specification
- If E is not accurately assessed.
- Unacceptability Bias
- If E is a behaviour or characteristic which
subjects are inclined to under-report.
DM ERRORS
- Recall (Anamnestic Bias)
- Protopathic Bias
- Interview Bias
CONFOUNDING BIAS
- If the effect under study is mixed with the effect of another factor
- Confounder:
- Extraneous to the E-D association
- Predictive of D
- Unequally distributed between E groups
- Example: alcohol and oesophageal cancer
- Confounder: smoking
- PI - CL - MG Example
- Control: in design / in analysis
ADVANTAGES
- Easy to carry out
- Rapid (less time consuming)
- Less expensive
- Useful for rare diseases
- Useful for diseases with a long latent interval
- No risk to subjects
- Multiple exposures can be studied
- No attrition problem
DISADVANTAGES
- Susceptible to biases
- Selection of controls is difficult
- Incidence (and hence RR) cannot be calculated
- If the disease is relatively common (> 5 to 10%), the OR may not be a reliable estimate of the RR
- Other possible effects of the exposure cannot be studied
- Cause vs association
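How far the OR drifts from the RR as the disease becomes common can be sketched by fixing a hypothetical RR of 2 and varying the risk in the unexposed. The function names are illustrative.

```python
def odds(p: float) -> float:
    """Odds corresponding to a risk (probability) p."""
    return p / (1 - p)


def odds_ratio_from_risks(risk_exposed: float, risk_unexposed: float) -> float:
    """OR implied by the risks in exposed and unexposed groups."""
    return odds(risk_exposed) / odds(risk_unexposed)


TRUE_RR = 2.0  # hypothetical relative risk, held fixed
for risk_unexposed in (0.01, 0.05, 0.10, 0.30):
    risk_exposed = TRUE_RR * risk_unexposed
    or_ = odds_ratio_from_risks(risk_exposed, risk_unexposed)
    print(f"risk in unexposed = {risk_unexposed:.2f}: RR = {TRUE_RR:.2f}, OR = {or_:.2f}")
```

With a 1% baseline risk the OR (about 2.02) is close to the RR; at 10% it is already 2.25, and at 30% it overstates the RR substantially (3.50).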
APPLICATIONS
- Evaluating Vaccine Effectiveness
- Evaluations of Treatment Program Efficacy
- Evaluation of Screening
- Outbreak Investigations
- Indirect Estimation in Demography
- Genetic Epidemiology
- Occupational Health Research
- Predictive Modeling