Analysis and presentation of Case-control study data

1
Analysis and presentation of Case-control study
data
  • Chihaya Koriyama
  • February 14 (Lecture 1)

2
Study design in epidemiology
3
Why case-control study?
  • In a cohort study, you need a large number of
    subjects to obtain a sufficient number of cases,
    especially if you are interested in a rare
    disease.
  • Gastric cancer incidence in Japanese males:
    128.5 per 100,000 person-years
  • A case-control study is more efficient in terms
    of study operation, time, and cost.
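
To make the efficiency argument concrete, the sketch below estimates how much follow-up a cohort would need before a given number of cases appears. The incidence rate is the one quoted above; the target of 200 cases and the function name are hypothetical, chosen only for illustration.

```python
def person_years_needed(target_cases: float, rate_per_100k: float) -> float:
    """Expected person-years of follow-up needed to observe `target_cases` events."""
    return target_cases / (rate_per_100k / 100_000)

rate = 128.5        # gastric cancer incidence per 100,000 person-years (from the slide)
target = 200        # hypothetical number of cases wanted for analysis
needed = person_years_needed(target, rate)
print(f"{needed:,.0f} person-years of follow-up")
```

A case-control study avoids this follow-up by starting from cases that have already occurred.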

4
Comparison of the study design
                      Case-control        Cohort
Rare diseases         suitable            not suitable
Number of diseases    1                   > 1
Sample size           relatively small    needs to be large
Control selection     difficult           easier
Study period          relatively short    long
Recall bias           yes                 no
Risk difference       not available       available
5
Case-control study - Sequence of determining
exposure and outcome status
  • Step 1: Determine and select cases of your
    research interest
  • Step 2: Select appropriate controls
  • Step 3: Determine exposure status in both cases
    and controls

6
Case ascertainment
  • What is the definition of the case?
  • Cancer (clinically? pathologically?)
  • Virus carriers (asymptomatic patients)
  • → You need to screen for the antibody
  • Including deceased cases?
  • You have to describe the following points:
  • the definition
  • when, where, and how cases were selected

7
Who will be controls?
  • Control ≠ non-case
  • Controls are also at risk of developing the
    disease in the future.
  • Controls are expected to be a representative
    sample of the catchment population from which the
    cases arise.
  • In a case-control study of gastric cancer, a
    person who has undergone gastrectomy cannot be
    a control, since he or she can never develop
    gastric cancer.

8
Various types of case-control studies
1) A population-based case-control study
   Both cases and controls are recruited from the
   population.
2) A case-control study nested in a cohort
   Both cases and controls are members of the
   cohort.
3) A hospital-based case-control study
   Both cases and controls are patients who
   are hospitalized or outpatients. Controls with
   diseases associated with the exposure of interest
   should be avoided.
9
The following points should be recorded
(described in your paper)
  • The list (number) of eligible cases whose medical
    records were unavailable
  • The list (number) of subjects who refused, if
    possible with descriptions of the reasons for
    refusal
  • The length of the interview
  • The list (number) of subjects lacking
    measurement data, with descriptions of the reasons

10
Exploratory or Analytic
  • Exploratory case-control studies
  • There is no specific a priori hypothesis about
    the relationship between exposure and outcome.
  • Analytic case-control studies
  • Analytic studies are designed to test specific a
    priori hypotheses about exposure and outcome.

11
Case-control study - information
  • Sources of the information of exposure and
    potential confounding factors
  • Existing records
  • Questionnaires
  • Face-to-face / telephone interviews
  • Biological specimens
  • Tissue banks
  • Databases on biochemical and environmental
    measurements

12
Temporality is essential in Hill's criteria
The study exposure is unlikely to be altered at
this stage because of the disease.
The study exposure is more likely to be altered
at this stage because of the symptoms.
Essential Epidemiology (WA Oleckno)
13
Bias should be minimized
  • Bias vs. confounding
  • Selection bias
  • Detection bias
  • Information bias (recall bias)
  • Confounding

Confounding can be controlled by statistical
analyses but we can do nothing about bias after
data collection.
14
Case-control studies
  • are potential sources of many biases
  • should be carefully designed, analyzed, and
    interpreted.

15
How can we solve the problem of confounding in a
case-control study?
  • Prevention at study design
  • Limitation
  • Matching in a cohort study, but not in a
    case-control study

16
Matching in a case-control study
  • Matched by confounding factor(s) to increase the
    efficiency of statistical analysis
  • Cannot control confounding
  • A conditional logistic analysis is required.

17
Over matching
  • Matched by factor(s) strongly related to the
    exposure which is your main interest
  • CANNOT see the difference in the exposure status
    between cases and controls

18
How can we solve the problem of confounding?
  • Treatment at statistical analysis
  • Stratification by a confounder
  • Multivariate analysis

19
What you should describe in the materials and
methods,
  • Study design
  • Definition of eligible cases and controls
  • Inclusion / exclusion criteria of cases and
    controls
  • Number of the respondents and response rate
  • Main exposure and other factors including
    potential confounding factors

20
What you should describe in the materials and
methods,
  1. Sources of the information of exposure and other
    factors
  2. Matched factors, if any
  3. The number of subjects used in statistical
    analyses
  4. Statistical test(s) and model(s)
  5. Name and version of the statistical software

21
Assuring adequate study power
  • The following information is necessary:
  • The confidence level desired (usually 95%,
    corresponding to a p-value of 0.05)
  • The level of power desired (80-95%)
  • The ratio of controls to cases
  • The expected frequency of the exposure in the
    control group
  • The smallest odds ratio one would like to be able
    to detect (based on practical significance)
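
The inputs listed above can be combined in a sample-size calculation. The sketch below uses one common normal-approximation formula for an unmatched study, without continuity correction. The slides do not say which formula the lecturer used, so the function name, the formula choice, and the example values are all assumptions.

```python
import math
from statistics import NormalDist

def cases_needed(p0: float, odds_ratio: float, controls_per_case: float = 1.0,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of cases for an unmatched case-control study.

    p0 is the expected exposure frequency among controls; the formula is the
    usual two-proportion normal approximation with unequal group sizes.
    """
    r = controls_per_case
    # Exposure frequency among cases implied by p0 and the target odds ratio
    p1 = odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))
    # Weighted average exposure frequency over cases and controls
    p_bar = (p1 + r * p0) / (1.0 + r)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    n = ((r + 1.0) / r) * p_bar * (1.0 - p_bar) * (z_alpha + z_beta) ** 2 / (p1 - p0) ** 2
    return math.ceil(n)

# Hypothetical planning values: 40% of controls exposed, smallest OR of interest 2.0
n_one_to_one = cases_needed(p0=0.40, odds_ratio=2.0, controls_per_case=1.0)
n_one_to_two = cases_needed(p0=0.40, odds_ratio=2.0, controls_per_case=2.0)
print(n_one_to_one, n_one_to_two)
```

Raising the control-to-case ratio lowers the number of cases required, which is why the ratio appears in the list.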

22
Statistical analysis: Matched vs. Unmatched
studies
  • The procedures for analyzing the results of
    case-control studies differ depending on whether
    the cases and controls are matched or unmatched.

Matched                       Unmatched
McNemar's test                Chi-square test
Conditional logistic          Unconditional logistic
regression analysis           regression analysis
23
Advantages of pair matching in case-control
studies
  • Assures comparability between cases and controls
    on the selected variables
  • May simplify the selection of controls by
    eliminating the need to identify a random sample
  • Useful in small studies where obtaining cases and
    controls that are similar on potentially
    confounding factors may otherwise be difficult
  • Can assure adequate numbers of subjects with
    specified characteristics so as to permit
    statistical comparisons

Essential Epidemiology (WA Oleckno)
24
Disadvantages of pair matching in case-control
studies
  • May be difficult or costly to find a sufficient
    number of controls
  • Eliminates the possibility of examining the
    effects of the matched variables on the outcome
  • Can increase the difficulty or complexity of
    controlling for confounding by the remaining
    unmatched variables
  • Overmatching
  • Can result in a greater loss of data, since a pair
    of subjects has to be eliminated even if only one
    subject is not responsive

Essential Epidemiology (WA Oleckno)
25
An example of an unmatched case-control study
Lung cancer cases (N = 100) and controls (N = 100)
Smokers (NOT recently started): 70 cases, 40 controls

               Cases    Controls
Smoker          70        40
Non-smoker      30        60

Odds ratio = (70 × 60) / (30 × 40) = 3.5
26
Risk measure in a case-control study
  • Odds = prevalence / (1 - prevalence)
  • Odds ratio = odds in cases / odds in controls

                    Disease
                 + (case)   - (control)
  Exposure +        a           c
  Exposure -        b           d

  • Exposure odds in cases = a / b
  • Exposure odds in controls = c / d
  • Odds ratio = (a / b) / (c / d) = a d / b c
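
As a minimal sketch, the odds ratio formula can be written as a small function and applied to the counts from the unmatched smoking example (70 and 30 among cases, 40 and 60 among controls). The function and variable names are illustrative only.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio for a 2x2 table laid out as on the slide:
    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return (a / b) / (c / d)

# Counts from the unmatched lung cancer / smoking example
or_smoking = odds_ratio(a=70, b=30, c=40, d=60)
print(round(or_smoking, 2))
```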

27
An example of a matched case-control study
Lung cancer cases (N = 100) and controls matched
by sex and age (N = 100)
Smokers (NOT recently started): 70 cases, 40 controls

                       Case          Case
                       Smoker        Non-smoker
Control Smoker           30            10
Control Non-smoker       40            20

Notice that this is the distribution of 100
matched pairs.
28
McNemar's test

                       Case          Case
                       Smoker        Non-smoker
Control Smoker           30            10
Control Non-smoker       40            20

Chi-square (test) statistic = (40 - 10)² /
(40 + 10) = 18, where the degree of freedom is 1.
Odds ratio = 40 / 10 = 4
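
The calculation above uses only the two discordant cells (pairs where exactly one member smokes). A minimal sketch, assuming no continuity correction as on the slide; the p-value step is an addition not shown in the slides, and the function name is hypothetical.

```python
import math

def mcnemar(case_only_exposed: int, control_only_exposed: int):
    """McNemar's chi-square, its p-value (1 df), and the matched-pair odds ratio."""
    b, c = case_only_exposed, control_only_exposed
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    matched_or = b / c
    return chi2, p_value, matched_or

# 40 pairs: case smokes, control does not; 10 pairs: control smokes, case does not
chi2, p_value, matched_or = mcnemar(40, 10)
print(chi2, matched_or, f"{p_value:.2e}")
```

The 30 + 20 concordant pairs do not enter the statistic at all.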
29
Logistic regression analysis
  • Logistic regression is used to model the
    probability of a binary response as a function of
    a set of variables thought to possibly affect the
    response (called covariates).
  • Y = 1: case (with the disease)
  • Y = 0: control (no disease)

30
  • One could imagine trying to fit a linear model
    (since this is the simplest model!) for the
    probabilities, but this often leads to problems.
  • In a linear model, fitted probabilities can fall
    outside of 0 to 1. Because of this, linear models
    are seldom used to fit probabilities.

31
  • In a logistic regression analysis, the logit of
    the probability is modelled, rather than the
    probability itself.
  • p = probability of getting the disease
  • logit(p) = log(p / (1 - p))
  • As always, we use the natural log. The logit is
    therefore the log odds,
  • since odds = p / (1 - p)
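
A short sketch of the logit and its inverse, showing that the transform maps probabilities in (0, 1) onto the whole real line and back. The function names are illustrative.

```python
import math

def logit(p: float) -> float:
    """Log odds of a probability p, with 0 < p < 1 (natural log)."""
    return math.log(p / (1.0 - p))

def inv_logit(z: float) -> float:
    """Map a log odds value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(p, round(z, 3), round(inv_logit(z), 3))
```

Because the inverse always returns a value between 0 and 1, fitted probabilities cannot fall outside that range.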

32
Simple logistic regression (with a continuous
covariate)
  • Suppose we give each of several beetles some dose
    of a potentially toxic agent (x = dose), and we
    observe whether the beetle dies (Y = 1) or lives
    (Y = 0). One of the simplest models we can consider
    is to assume that the relationship of the logit
    of the probability of death and the dose is
    linear, i.e.,
  • logit(px) = log(px / (1 - px)) = a + b x
  • where px = probability of death for a given dose
    x, and a and b are unknown parameters to be
    estimated from the data.

33
  • The values of a and b will determine whether or
    not and how steeply the dose-response curve rises
    (or falls) and where it is centered.
  • If b = 0, px is constant over x
  • If b > 0, px increases with x
  • If b < 0, px decreases with x
  • H0: b = 0 is the null hypothesis in a test of
    trend when x is a continuous variable. Knowledge
    of b would give us insight into the direction and
    degree of association between outcome and exposure.

px = e^(a + bx) / (1 + e^(a + bx))
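
The effect of the sign of b can be checked numerically. The sketch below evaluates the formula for px over a few doses; the intercept, slopes, and dose grid are hypothetical values chosen only to show the three cases.

```python
import math

def p_death(x: float, a: float, b: float) -> float:
    """px = e^(a + b x) / (1 + e^(a + b x))."""
    t = math.exp(a + b * x)
    return t / (1.0 + t)

doses = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical dose grid
a = -2.0                            # hypothetical intercept
for b in (0.0, 1.0, -1.0):          # hypothetical slopes
    curve = [round(p_death(x, a, b), 3) for x in doses]
    print(f"b = {b:+.1f}: {curve}")

# For b != 0 the curve is centred at x = -a / b, where px = 0.5
```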
34
Simple logistic regression (with a dichotomous
covariate)
  • Suppose we are considering a case-control study
    where the response variable is disease (case) /
    non-disease (control) and the predictor variable
    is exposed / non-exposed, which we code as an
    indicator variable, or dummy variable.
  • Y = 1 if D = 1 (disease), Y = 0 if D = 0 (no disease)
  • x = 1 if E = 1 (exposed), x = 0 if E = 0 (non-exposed)
  • And px = Prob(disease given exposure x)
          = P(Y = 1 | x), x = 0, 1
  • Thus, p1 = probability of disease among exposed
  • p0 = probability of disease among non-exposed

35
  • In case of exposure (X = 1): logit(P_E1) = intercept + b
  • In case of non-exposure (X = 0): logit(P_E0) = intercept
  • Here P_E1 and P_E0 are the probabilities of disease
    given E = 1 and E = 0.
  • If you want to obtain the odds ratio of the exposure
    group, start from the definition of the odds ratio:
  • OR = (P_E1 / (1 - P_E1)) / (P_E0 / (1 - P_E0))
  • log(OR) = log[(P_E1 / (1 - P_E1)) / (P_E0 / (1 - P_E0))]
            = log(P_E1 / (1 - P_E1)) - log(P_E0 / (1 - P_E0))
            = logit(P for exposure) - logit(P for non-exposure)
            = (intercept + b) - intercept
            = b

OR = e^b
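
The identity OR = e^b can be checked by fitting the model numerically. The sketch below runs a hand-written Newton-Raphson fit on grouped data and uses the counts from the unmatched smoking example. This is an illustration only, not the routine a statistical package would use, and all names are hypothetical.

```python
import math

def fit_logistic_2x2(exp_cases, exp_controls, unexp_cases, unexp_controls, iters=25):
    """Newton-Raphson fit of logit(p) = a + b*x on grouped data with x in {0, 1}."""
    groups = [  # (x, number of cases, group size)
        (1, exp_cases, exp_cases + exp_controls),
        (0, unexp_cases, unexp_cases + unexp_controls),
    ]
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0  # score vector and information matrix
        for x, y, n in groups:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = n * p * (1.0 - p)
            ga += y - n * p
            gb += x * (y - n * p)
            haa += w
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        # Newton step: solve (information matrix) * delta = score
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

intercept, slope = fit_logistic_2x2(70, 40, 30, 60)
print(round(math.exp(slope), 4), round((70 * 60) / (30 * 40), 4))
```

The exponentiated slope agrees with the cross-product odds ratio computed directly from the table.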
36
Simple logistic regression (with a covariate
having more than two categories)
  • Suppose we are considering a case-control study
    where the predictor variable is current smoker /
    ex-smoker / non-smoker, which we code as
    dummy variables.

Original data                Dummy variables
Case   Smoking status        SMK1 (X1)   SMK2 (X2)
1      Current               1           0
0      Ex-smoker             0           1
1      Non-smoker            0           0
1      Ex-smoker             0           1
0      Non-smoker            0           0
0      Non-smoker            0           0
37
  • Logistic regression model of the previous example
  • logit(P) = a + b1(X1) + b2(X2)
  • In case of current smoker (X1 = 1, X2 = 0)
  • logit(Pcurrent) = a + b1
  • In case of ex-smoker (X1 = 0, X2 = 1)
  • logit(Pex) = a + b2
  • In case of non-smoker (X1 = 0, X2 = 0)
  • logit(Pnon) = a

ORcurrent = e^b1
ORex = e^b2
ORnon = 1 (referent)
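
The dummy coding and the resulting odds ratios can be sketched as follows. The coding follows the slide, with non-smokers as the reference category; the coefficient values are hypothetical and serve only to illustrate the algebra.

```python
import math

# (SMK1, SMK2) for each smoking status, non-smokers as the reference category
DUMMIES = {"Current": (1, 0), "Ex-smoker": (0, 1), "Non-smoker": (0, 0)}

def logit_p(status: str, a: float, b1: float, b2: float) -> float:
    """logit(P) = a + b1*X1 + b2*X2 for a given smoking status."""
    x1, x2 = DUMMIES[status]
    return a + b1 * x1 + b2 * x2

# Hypothetical coefficients, not estimated from any data
a, b1, b2 = -1.0, 0.7, 0.3

for status in DUMMIES:
    # Odds ratio relative to non-smokers: exp of the difference in logits
    or_vs_non = math.exp(logit_p(status, a, b1, b2) - logit_p("Non-smoker", a, b1, b2))
    print(status, DUMMIES[status], round(or_vs_non, 3))
```

The intercept a cancels in every comparison, so each odds ratio depends only on the coefficient of its own dummy variable.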
38
Wald's test for no association
  • The null hypothesis of no association between
    outcome and exposure corresponds to
  • H0: OR = 1 or H0: b = log OR = 0
  • Using logistic regression results, we can test
    this hypothesis using standardized coefficients,
    i.e., Wald's test.
  • Note: Stata and SAS present two-sided Wald's test
    p-values.
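
A minimal sketch of the Wald calculation: the statistic is the coefficient divided by its standard error, compared with a standard normal distribution. The example uses the counts from the unmatched smoking example and the usual large-sample standard error of a log odds ratio from a 2x2 table; these choices and the names are assumptions made for illustration.

```python
import math

def wald_test(b: float, se: float):
    """Wald z statistic and two-sided p-value for H0: b = 0."""
    z = b / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Exposed/unexposed cases (70, 30) and exposed/unexposed controls (40, 60)
n_ec, n_uc, n_ek, n_uk = 70, 30, 40, 60
log_or = math.log((n_ec * n_uk) / (n_uc * n_ek))
se_log_or = math.sqrt(1 / n_ec + 1 / n_uc + 1 / n_ek + 1 / n_uk)

z, p = wald_test(log_or, se_log_or)
print(round(z, 2), f"{p:.1e}")
```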

39
Likelihood Ratio Test (LRT)
  • An alternative way of testing hypotheses in a
    logistic regression model is with the use of a
    likelihood ratio test. The likelihood ratio test
    is specifically designed to test between nested
    hypotheses.
  • H0: log(Px / (1 - Px)) = a
  • HA: log(Px / (1 - Px)) = a + bx
  • and we say that H0 is nested in HA.

40
Likelihood Ratio Test (LRT)
  • In order to test H0 vs. HA, we compute the
    likelihood ratio test statistic
  • G = -2 log(L_H0 / L_HA) = 2 (log L_HA - log L_H0)
      = (-2 log L_H0) - (-2 log L_HA)
  • Where
  • L_HA is the maximized likelihood under the
    alternative hypothesis HA and
  • L_H0 is the maximized likelihood under the
    null hypothesis H0.
  • If the null hypothesis H0 were true, we would
    expect the likelihood ratio test statistic to be
    close to zero.
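
For a single dichotomous exposure, both maximized likelihoods have closed forms, so G can be computed directly from a 2x2 table. The sketch below uses the counts from the unmatched smoking example; the helper name is hypothetical, and the p-value uses the chi-square distribution with 1 degree of freedom.

```python
import math

def binom_loglik(cases: int, n: int) -> float:
    """Maximized log-likelihood for a group with `cases` out of `n` (0 < cases < n)."""
    p = cases / n
    return cases * math.log(p) + (n - cases) * math.log(1.0 - p)

# (cases, group size) by exposure: 70 of 110 smokers, 30 of 90 non-smokers
exposed, unexposed = (70, 110), (30, 90)

# H0: one common probability for everyone (intercept only)
ll_h0 = binom_loglik(exposed[0] + unexposed[0], exposed[1] + unexposed[1])
# HA: a separate probability for each exposure group (intercept + b x)
ll_ha = binom_loglik(*exposed) + binom_loglik(*unexposed)

g = 2.0 * (ll_ha - ll_h0)
p_value = math.erfc(math.sqrt(g / 2.0))  # chi-square upper tail, 1 df
print(round(g, 2), f"{p_value:.1e}")
```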

41
Wald's test vs. LRT
  • In general, the LRT often works a little better
    than the Wald test, in that the test statistic
    more closely follows a χ² distribution under H0.
    But the Wald test often works very well and
    usually gives similar results.
  • More importantly, the LRT can more easily be
    extended to multivariate hypothesis tests, e.g.,
  • H0: b1 = b2 = 0 vs. HA: b1 ≠ 0 or b2 ≠ 0

42
World J. Gastroenterology 2006
43
Recruitment of cases
81 cases were excluded
173 formalin-fixed paraffin-embedded blocks
216 CASES
We could not obtain the information on tumor
location for 23 cases, and those cases were
excluded from the tumor location specific
analysis.
44
Recruitment of controls
Matched by sex, age (5-year), hospital, and date of
admission; case : control = 1 : 2
POTENTIAL CONTROLS 528
431 CONTROLS
  • Major diseases of controls
  • cardiovascular diseases (208)
  • trauma (117)
  • infectious diseases (38)
  • urological disorders (21)

45
                     gastric cancer
Smoking              0        1      Total
-------------------------------------------
Never      0       188       78        266
Ex-        1       145       89        234
Current    2        98       49        147
-------------------------------------------
Total              431      216        647

. xi: logistic casocon i.fumar
i.fumar           _Ifumar_0-2         (naturally coded; _Ifumar_0 omitted)

Logistic regression                               Number of obs   =        647
                                                  LR chi2(2)      =       4.24
                                                  Prob > chi2     =     0.1198
Log likelihood = -409.93333                       Pseudo R2       =     0.0051

------------------------------------------------------------------------------
     casocon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Ifumar_1 |   1.479399   .2817549     2.06   0.040     1.018526    2.148813
   _Ifumar_2 |   1.205128   .2660901     0.85   0.398     .7817889    1.857706
------------------------------------------------------------------------------

P>|z| column: Wald's test p values
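
Because this unconditional model has one categorical covariate, its output can be checked by hand from the cross-tabulation. The sketch below recomputes the odds ratios, Wald z values, 95% confidence intervals, log likelihood, and LR chi2(2) in plain Python. It is an independent check written for illustration, not the Stata procedure itself, and all names are hypothetical.

```python
import math

# (controls, cases) by smoking status, taken from the cross-tabulation
counts = {"never": (188, 78), "ex": (145, 89), "current": (98, 49)}
ref_controls, ref_cases = counts["never"]
Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def wald_summary(controls: int, cases: int):
    """Odds ratio vs. never smokers, Wald z, and 95% confidence limits."""
    log_or = math.log((cases * ref_controls) / (controls * ref_cases))
    se = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    return (math.exp(log_or), log_or / se,
            math.exp(log_or - Z_975 * se), math.exp(log_or + Z_975 * se))

def group_loglik(controls: int, cases: int) -> float:
    """Maximized binomial log-likelihood for one group."""
    n = controls + cases
    return cases * math.log(cases / n) + controls * math.log(controls / n)

for level in ("ex", "current"):
    or_, z, lower, upper = wald_summary(*counts[level])
    print(f"{level:8s} OR={or_:.6f} z={z:.2f} 95% CI=({lower:.6f}, {upper:.6f})")

# Likelihood ratio test of H0: b1 = b2 = 0 (2 degrees of freedom)
ll_full = sum(group_loglik(*c) for c in counts.values())
ll_null = group_loglik(sum(c[0] for c in counts.values()),
                       sum(c[1] for c in counts.values()))
lr_chi2 = 2.0 * (ll_full - ll_null)
p_lr = math.exp(-lr_chi2 / 2.0)  # chi-square upper tail for 2 df
print(f"log likelihood={ll_full:.5f} LR chi2(2)={lr_chi2:.2f} p={p_lr:.4f}")
```

The conditional (matched) analysis cannot be reproduced this way, since it needs the individual matched sets rather than the pooled table.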
46
Results of conditional logistic regression
analysis using the same data
(Table columns: Case, Control, OR (95% CI); rows: Fumar0, Fumar1, Fumar2)

Stata command:
. xi: clogit casocon i.fumar, group(identi) or

Conditional (fixed-effects) logistic regression   Number of obs   =        647
                                                  LR chi2(2)      =       4.64
                                                  Prob > chi2     =     0.0982
Log likelihood = -234.5745                        Pseudo R2       =     0.0098

------------------------------------------------------------------------------
     casocon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Ifumar_1 |   1.535023   .3061998     2.15   0.032     1.038295    2.269389
   _Ifumar_2 |   1.219851   .2784042     0.87   0.384        .7799    1.907985
------------------------------------------------------------------------------

P>|z| column: Wald's test p values
47
GC risk by smoking in Cali, Colombia: results of
tumor-location specific analysis
P value by LRT
This test examines the difference in the
magnitude of the association between smoking and
GC risk among 3 tumor sites.