Biostatistical Basics for Genetic Epidemiology - PowerPoint PPT Presentation

Provided by: NHRI1

1
Biostatistical Basics for Genetic Epidemiology
  • Interdisciplinary Genetic Research Course
  • Shanghai
  • October 6, 2008
  • Kung-Yee Liang
  • Department of Biostatistics
  • Johns Hopkins School of Public Health

2
A Brief Outline
  • Some basic concepts in epidemiology
  • Designs
  • Confounding and effect modification
  • Some statistical tools
  • Mantel-Haenszel method
  • Logistic regression
  • Interpretations
  • Cautions on conventional inferences
  • Matched analysis
  • Polytomous regression
  • Use in genetic epidemiology

3
Epidemiology
  • A discipline that studies the distribution of diseases (disorders) to provide the basis for developing and evaluating preventive procedures and public health practices
  • Disease control and prevention through health education and intervention
  • Health policy/expenditure implementations
  • Clinical implications: prognosis, treatment strategy

4
Issues in Identifying Risk Factors
  • Designs
  • Measures of association
  • Confounding
  • Effect modification
  • Data analysis

5
Designs
  • Prospective (cohort): exposed and unexposed subjects are followed up prospectively and events of interest are observed over time
  • Can study multiple endpoints
  • Can establish temporal and causal relationships
  • Usually large-scale, time-consuming and costly
  • Loss to follow-up (survivorship bias)
  • Breslow and Day (1987). The Design and Analysis of Cohort Studies, IARC

6
Designs (cont)
  • Retrospective (case-control): affected (cases) and unaffected (controls) subjects are ascertained and exposure information is collected retrospectively
  • More efficient time-wise and budget-wise
  • Subject to biases: recall, detection, etc.
  • More difficult to establish temporal and causal relationships
  • Association
  • Breslow and Day (1981). The Analysis of Case-Control Studies, IARC

7
Measures of Association


Odds Ratio (OR) for disease
  • RR = 1 iff OR = 1: no association
  • RR > 1 iff OR > 1: positive association
  • The larger the RR (OR), the greater the association
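A minimal sketch, not taken from the slides, of how RR and OR are computed from a cohort-style 2 × 2 table. The cell names a, b, c, d and the counts are hypothetical; the assumed layout is stated in the docstring.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = Pr(D | exposed) / Pr(D | unexposed).

    Hypothetical layout: a = exposed with disease, b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = odds of disease if exposed / odds of disease if unexposed = ad / bc."""
    return (a * d) / (b * c)


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed develop disease.
a, b, c, d = 30, 70, 10, 90
print("RR:", relative_risk(a, b, c, d))
print("OR:", odds_ratio(a, b, c, d))
```

With these counts RR = 3.0 while OR is about 3.86: both exceed 1, but the OR approximates the RR well only when the disease is rare.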

8
Confounding
  • The distortion of the true association between the disease and the risk factor due to the association of other factors with both the disease and the exposure, where the association of those factors with the disease is causal.

9
Confounding (cont)
    350   200  |   550
    150   300  |   450
    -------------------
    500   500  |  1000
10
Confounding (cont)
C → E, C → D (C is a confounder)
E → C → D (C is an intermediate variable)
  • For the latter, the intermediate variable is not a confounder but a mediating variable
  • Example: smoking → chronic cough → lung cancer
  • How to avoid this confusion? Substantive knowledge

11
Table 3.2 The relationships between outpatient expenditure and BMI categories in Taiwan (Tobit censored model)
Significance levels: p < 0.05, p < 0.01, p < 0.001
12
Table 4.2 The relationships between outpatient expenditure and BMI categories using Tobit censored model (control for chronic disease)
Significance levels: p < 0.01, p < 0.001
13
Confounding (cont)
  • It appears that, once adjusting for chronic illnesses such as diabetes mellitus, hypertension and CVD, the effect of BMI on outpatient expenditures (OE) and physician visits (PV) disappears
  • It is more likely that the effect of BMI on OE and PV is real but is mediated through those illnesses, so adjusting for them removes it
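To illustrate the mediation argument, here is a small simulation sketch. It swaps in ordinary least squares for the Tobit model of Tables 3.2 and 4.2, and every variable and effect size is hypothetical: BMI affects expenditure only through chronic illness, so adjusting for illness removes the BMI coefficient even though the BMI effect is real.

```python
import numpy as np


def ols_coefficients(y: np.ndarray, *covariates: np.ndarray) -> np.ndarray:
    """Least-squares coefficients (intercept first) of y on the given covariates."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 20_000
bmi = rng.normal(size=n)                          # standardized BMI (hypothetical)
illness = 0.8 * bmi + rng.normal(size=n)          # BMI raises chronic-illness burden
expenditure = 2.0 * illness + rng.normal(size=n)  # illness, not BMI directly, drives cost

crude = ols_coefficients(expenditure, bmi)[1]              # total effect, near 0.8 * 2.0 = 1.6
adjusted = ols_coefficients(expenditure, bmi, illness)[1]  # direct effect, near 0
print(f"crude BMI coefficient:    {crude:.2f}")
print(f"adjusted BMI coefficient: {adjusted:.2f}")
```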

14
How to Deal with Confounding?
  • Stratification: sub-divide the population into groups that are homogeneous with regard to the confounding variables
  • Post-stratification
  • Frequency matching
  • Individual matching

15
Stratification (cont)
  • Post-stratification

Data are imbalanced
16
Stratification (cont)
  • Frequency matching
        50   100
        30    60
        20    40
  • Individual (one-to-one) matching: 63 matched pairs
                              Control
                         Exposed   Unexposed
      Case   Exposed        27        29
             Unexposed       3         4
17
Effect Modification
  • The effect of a risk factor on the disease (the association between risk factor and disease) depends on (is modified by) the level of the confounding variable

18
Effect Modification (cont)
  • Also known as interaction between the risk factor and the confounder
  • The confounder is then called an effect modifier
  • Useful for identifying high-risk groups
  • It is model dependent
  • Quantitative interaction: same direction (2 versus 5)
  • Qualitative interaction: different directions (0.5 versus 5)
  • More problematic
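A toy sketch of the quantitative/qualitative distinction, applied to stratum-specific odds ratios. The function name and its decision rule are hypothetical; with estimated ORs one would first test homogeneity formally (for example with the Breslow-Day test) rather than compare point estimates.

```python
from typing import Sequence


def interaction_type(stratum_ors: Sequence[float]) -> str:
    """Classify effect modification from stratum-specific odds ratios.

    Hypothetical rule for illustration: 'qualitative' if some strata show a
    positive (OR > 1) and others a negative (OR < 1) association, 'none' if
    all ORs are identical, otherwise 'quantitative'.
    """
    if any(o > 1 for o in stratum_ors) and any(o < 1 for o in stratum_ors):
        return "qualitative"
    if len(set(stratum_ors)) == 1:
        return "none"
    return "quantitative"


print(interaction_type([2.0, 5.0]))  # quantitative
print(interaction_type([0.5, 5.0]))  # qualitative
```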

19
Statistical Analysis
  • Post-stratification / frequency matching
  • Mantel-Haenszel (M-H) estimator
  • Mantel-Haenszel test statistic

2 × 2 table for stratum i (i = 1, …, K):
    ai    bi
    ci    di
with margins ni, mi, ti and Ni
Mantel and Haenszel (1959), JNCI; Robins, Breslow and Greenland (1986), Biometrics
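The Mantel-Haenszel summary OR and test statistic can be sketched as follows. The cell layout (a, b, c, d, with N the stratum total) is an assumption and may not match the notation of the original slide; the test statistic is written without a continuity correction, and the strata in the usage example are made up.

```python
from typing import Iterable, Tuple

# One 2x2 table per stratum, (a, b, c, d), with the assumed layout
#   a = exposed cases,   b = exposed controls,
#   c = unexposed cases, d = unexposed controls.
Table = Tuple[int, int, int, int]


def mantel_haenszel_or(tables: Iterable[Table]) -> float:
    """Mantel-Haenszel summary odds ratio: sum(a*d/N) / sum(b*c/N)."""
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


def mantel_haenszel_chi2(tables: Iterable[Table]) -> float:
    """Mantel-Haenszel chi-square (1 df, no continuity correction) for OR = 1."""
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        observed += a
        expected += (a + b) * (a + c) / n  # E(a) given the margins
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    return (observed - expected) ** 2 / variance


strata = [(10, 20, 5, 40), (30, 15, 20, 25)]  # hypothetical strata
print("M-H OR:", mantel_haenszel_or(strata))
print("M-H chi-square:", mantel_haenszel_chi2(strata))
```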
20
One-to-One Matching
                        Control
                   Exposed   Unexposed
Case   Exposed        a          b
       Unexposed      c          d
  • a, d: concordant pairs
  • b, c: discordant pairs
  • Mantel-Haenszel estimator: b/c
  • McNemar test statistic
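For 1:1 matched pairs only the discordant pairs carry information. A minimal sketch of the b/c estimator and the McNemar statistic (shown without continuity correction), using b = 29 and c = 3 from the estrogen example's OR of 29/3; the function names are hypothetical.

```python
def matched_pair_or(b: int, c: int) -> float:
    """Mantel-Haenszel odds ratio for 1:1 matched pairs: b / c."""
    return b / c


def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar statistic (b - c)^2 / (b + c); concordant pairs do not enter."""
    return (b - c) ** 2 / (b + c)


print(round(matched_pair_or(29, 3), 2))  # 9.67
print(round(mcnemar_chi2(29, 3), 3))     # 21.125
```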

21
Example Revisited

D: oesophageal cancer; E: alcohol consumption ≥ 80 g/day; OR = 5.2. The odds (risk) of oesophageal cancer for those who drink 80 g/day or more are about five times as high as for those who drink less.
22
Example Revisited (cont)
D: endometrial cancer; E: estrogen use; OR = 29/3 = 9.67. The odds (risk) of endometrial cancer are elevated roughly tenfold among estrogen users.
23
Limitations of the Stratification / M-H Approach
  • All the variables (confounder, risk factor) are required to be discrete
  • One risk factor at a time
  • Should NOT be used with qualitative interaction
  • Implications
  • Can't establish a dose-response relationship (the larger the exposure, the higher the risk)
  • Can't examine the joint effects of several risk factors simultaneously

24
An Alternative
  • Logistic regression model
  • $Y = 1$ (0) if affected (unaffected)
  • $Z_1, \dots, Z_q$: confounding variables
  • $X_1, \dots, X_p$: risk factors of interest
  • $\log[\Pr(Y=1)/\Pr(Y=0)]$: log odds
  • All of the regression coefficients have log odds ratio interpretations
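A self-contained sketch of fitting a logistic regression by maximum likelihood (Newton-Raphson) on simulated data with one binary confounder $Z_1$ and one binary risk factor $X_1$. All data and coefficients are hypothetical; in practice a statistical package (such as the SAS PROC LOGISTIC mentioned later) would be used. Exponentiating the $X_1$ coefficient recovers the confounder-adjusted odds ratio.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Returns coefficients with the intercept first, then the columns of X in order.
    """
    design = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))                 # fitted Pr(Y = 1)
        gradient = design.T @ (y - p)                             # score vector
        hessian = design.T @ (design * (p * (1 - p))[:, None])    # information matrix
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


rng = np.random.default_rng(1)
n = 50_000
z = rng.binomial(1, 0.5, size=n)              # hypothetical binary confounder Z1
x = rng.binomial(1, 0.3 + 0.3 * z)            # exposure X1, more common when Z1 = 1
log_odds = -1.0 + 0.7 * z + np.log(3.0) * x   # true adjusted OR for X1 is 3
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))

coef = fit_logistic(np.column_stack([z, x]), y)
print("adjusted OR for X1:", np.exp(coef[2]))
```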

25
How to Interpret Logistic Regression Coefficients?
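  • Model: $\log\frac{\Pr(Y=1)}{\Pr(Y=0)} = \alpha + \beta_1 X_1$
  • $X_1 = 1$: log odds $= \alpha + \beta_1$
  • $X_1 = 0$: log odds $= \alpha$
  • $\beta_1$: log odds ratio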
26
How to Interpret Logistic Regression
Coefficients? (cont)
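  • Model: $\log\frac{\Pr(Y=1)}{\Pr(Y=0)} = \alpha + \beta_1 X_1 + \beta_2 X_2$
  • $(X_1, X_2) = (1, 0)$: log odds $= \alpha + \beta_1$
  • $(X_1, X_2) = (0, 1)$: log odds $= \alpha + \beta_2$
  • $(X_1, X_2) = (0, 0)$: log odds $= \alpha$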
27
How to Interpret Logistic Regression
Coefficients? (cont)
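  • Model: $\log\frac{\Pr(Y=1)}{\Pr(Y=0)} = \alpha + \gamma_1 Z_1 + \beta_1 X_1$
  • $(Z_1, X_1) = (1, 1)$: log odds $= \alpha + \gamma_1 + \beta_1$
  • $(Z_1, X_1) = (1, 0)$: log odds $= \alpha + \gamma_1$
  • $(Z_1, X_1) = (0, 1)$: log odds $= \alpha + \beta_1$
  • $(Z_1, X_1) = (0, 0)$: log odds $= \alpha$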
28
How to Interpret Logistic Regression
Coefficients? (cont)
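  • Model: $\log\frac{\Pr(Y=1)}{\Pr(Y=0)} = \alpha + \gamma_1 Z_1 + \beta_1 X_1 + \delta Z_1 X_1$
  • $(Z_1, X_1) = (1, 1)$: log odds $= \alpha + \gamma_1 + \beta_1 + \delta$
  • $(Z_1, X_1) = (1, 0)$: log odds $= \alpha + \gamma_1$
  • $(Z_1, X_1) = (0, 1)$: log odds $= \alpha + \beta_1$
  • $(Z_1, X_1) = (0, 0)$: log odds $= \alpha$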
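A numeric sketch of the interaction model $\alpha + \gamma_1 Z_1 + \beta_1 X_1 + \delta Z_1 X_1$ with hypothetical coefficient values (GAMMA1 stands for the coefficient of $Z_1$): the OR for $X_1$ is $e^{\beta_1}$ when $Z_1 = 0$ and $e^{\beta_1 + \delta}$ when $Z_1 = 1$, so $\delta$ is the log of the ratio of the two stratum-specific odds ratios.

```python
import math

# Hypothetical coefficients for alpha + gamma1*Z1 + beta1*X1 + delta*Z1*X1.
ALPHA, GAMMA1, BETA1, DELTA = -2.0, 0.5, 0.7, 0.9


def log_odds(z1: int, x1: int) -> float:
    """Log odds of disease for binary Z1 and X1 under the interaction model."""
    return ALPHA + GAMMA1 * z1 + BETA1 * x1 + DELTA * z1 * x1


def odds_ratio_x1(z1: int) -> float:
    """OR for X1 = 1 versus X1 = 0 within the stratum Z1 = z1."""
    return math.exp(log_odds(z1, 1) - log_odds(z1, 0))


for z1 in (0, 1):
    print(f"Z1 = {z1}: OR for X1 = {odds_ratio_x1(z1):.3f}")

# The ratio of the two stratum-specific ORs recovers exp(delta).
print(odds_ratio_x1(1) / odds_ratio_x1(0), math.exp(DELTA))
```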
29
How to Interpret Logistic Regression
Coefficients? (cont)
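  • Model: $\log\frac{\Pr(Y=1)}{\Pr(Y=0)} = \alpha + \beta_1 X_1$, $X_1$ continuous
  • $X_1 = 5$: log odds $= \alpha + 5\beta_1$
  • $X_1 = 4$: log odds $= \alpha + 4\beta_1$
  • $X_1 = 31$: log odds $= \alpha + 31\beta_1$
  • $X_1 = 30$: log odds $= \alpha + 30\beta_1$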
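With $X_1$ continuous, the odds ratio for a one-unit increase is $e^{\beta_1}$ whatever the starting value (5 versus 4, or 31 versus 30), and a c-unit increase gives $e^{c\beta_1}$. A sketch with hypothetical values of $\alpha$ and $\beta_1$:

```python
import math

BETA1 = 0.05  # hypothetical log OR per unit increase in continuous X1


def log_odds(x1: float, alpha: float = -3.0) -> float:
    """Log odds alpha + beta1 * X1 (alpha is also hypothetical)."""
    return alpha + BETA1 * x1


def odds_ratio(x_high: float, x_low: float) -> float:
    """OR comparing X1 = x_high with X1 = x_low: exp(beta1 * (x_high - x_low))."""
    return math.exp(log_odds(x_high) - log_odds(x_low))


print(odds_ratio(5, 4), odds_ratio(31, 30))  # both exp(beta1), up to rounding
print(odds_ratio(10, 0))                      # exp(10 * beta1)
```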
30
How to Interpret Logistic Regression
Coefficients? (cont)
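  • $\log\frac{\Pr(Y=1)}{\Pr(Y=0)} = \alpha + \gamma_1 Z_1 + \beta_1 X_1$, $X_1$ continuous, $Z_1 = 1$ (0)
  • $\log\frac{\Pr(Y=1)}{\Pr(Y=0)} = \alpha + \gamma_1 Z_1 + \beta_1 X_1 + \delta Z_1 X_1$, $X_1$ continuous, $Z_1 = 1$ (0)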
31
In General
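  • $X = (X_1, \dots, X_p)$, $\mathbf{0} = (0, \dots, 0)$
  • $OR(X : \mathbf{0}) = \frac{\Pr(Y=1 \mid X, Z) / \Pr(Y=0 \mid X, Z)}{\Pr(Y=1 \mid \mathbf{0}, Z) / \Pr(Y=0 \mid \mathbf{0}, Z)} = e^{\beta_1 X_1} \times e^{\beta_2 X_2} \times \cdots \times e^{\beta_p X_p}$
  • Multiplicative (log-linear) models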
32
In General (cont)
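  • $X = (x_1 + 1, x_2, \dots, x_p)$, $X^* = (x_1, x_2, \dots, x_p)$
  • $OR(X : X^*) = \frac{OR(X : \mathbf{0})}{OR(X^* : \mathbf{0})} = \frac{e^{\beta_1 (x_1 + 1) + \beta_2 x_2 + \cdots + \beta_p x_p}}{e^{\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p}} = e^{\beta_1}$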
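The multiplicative structure of OR(X : X*) can be checked numerically. This sketch uses hypothetical coefficients and covariate values: the OR comparing X with X* equals $e^{\sum_j \beta_j (x_j - x_j^*)}$, which reduces to $e^{\beta_1}$ when only $x_1$ differs, by one unit.

```python
import math
from typing import Sequence


def odds_ratio(beta: Sequence[float], x: Sequence[float], x_ref: Sequence[float]) -> float:
    """OR(X : X*) = OR(X : 0) / OR(X* : 0) under a log-linear (logistic) model."""
    or_x = math.exp(sum(b * xi for b, xi in zip(beta, x)))        # OR(X : 0)
    or_ref = math.exp(sum(b * xi for b, xi in zip(beta, x_ref)))  # OR(X* : 0)
    return or_x / or_ref


beta = [0.4, -0.2, 1.1]  # hypothetical beta_1, beta_2, beta_3
x_ref = [2.0, 1.0, 0.0]  # X*  = (x1, x2, x3)
x = [3.0, 1.0, 0.0]      # X   = (x1 + 1, x2, x3)
print(odds_ratio(beta, x, x_ref), math.exp(beta[0]))  # equal up to rounding
```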
33
For Matched Case-Control Studies
  • The conventional (unconditional) logistic regression method (e.g. SAS PROC LOGISTIC) is NOT adequate
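Matched sets are usually analysed by conditional logistic regression instead. For 1:1 matching, the conditional likelihood of each pair reduces to a logistic model on within-pair covariate differences with no intercept, so the matching-variable effects cancel. A minimal sketch (the function name is hypothetical), assuming the estrogen matched-pairs counts a = 27, b = 29, c = 3, d = 4; only the discordant counts b and c affect the estimate, which matches the b/c estimator.

```python
import numpy as np


def fit_conditional_logistic_1to1(x_case, x_control, n_iter: int = 50) -> np.ndarray:
    """Conditional ML for 1:1 matched pairs via Newton-Raphson.

    Each pair contributes exp(d'beta) / (1 + exp(d'beta)), where
    d = x_case - x_control; pairs with d = 0 (concordant) drop out.
    """
    x_case = np.asarray(x_case, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    diff = (x_case - x_control).reshape(len(x_case), -1)  # one row per pair
    beta = np.zeros(diff.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-diff @ beta))
        gradient = diff.T @ (1.0 - p)  # the case is always the "event" in its pair
        hessian = diff.T @ (diff * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


# 63 pairs: 27 both exposed, 29 case-only exposed, 3 control-only exposed, 4 neither.
x_case = np.array([1] * 27 + [1] * 29 + [0] * 3 + [0] * 4)
x_control = np.array([1] * 27 + [0] * 29 + [1] * 3 + [0] * 4)
beta_hat = fit_conditional_logistic_1to1(x_case, x_control)
print("conditional OR:", np.exp(beta_hat[0]))
```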

34
For Matched Case-Control Studies
  • Matching variables can be used to examine their
    interactions with risk factors (effect
    modification)

  Variable              Model 1          Model 2          Model 3
  Estrogen use (EST)    2.074 (0.421)    1.431 (0.826)    2.074 (0.421)
  EST × AGE1                             0.847 (1.034)
  EST × AGE2                             0.780 (1.154)
  EST × AGE†                                              0.385 (0.616)
  † AGE = 0, 1 or 2
35
For Matched Case-Control Studies
  • 3. How many matching variables to consider?
  • Many confounding variables
  • May not find matched controls if all confounders
    are considered for matching
  • May run into an over-matching problem
  • Recommendations
  • No more than two or three strong confounders
  • The rest are adjusted through regression

36
For Matched Case-Control Studies
  • 4. How many controls to match per case?
  • As a rule of thumb, 4 matched controls per case
  • Efficiency of R matched controls per case, relative to an unlimited number of controls: R/(R + 1)
  • More controls may be needed if
  • The risk factor considered is rare
  • The underlying degree of association is high
  • Breslow et al. (1983) JASA
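Reading R/(R + 1) as the asymptotic efficiency of R matched controls per case relative to an unlimited number of controls (the usual interpretation, assumed here), a short sketch shows the diminishing returns behind the rule of thumb of about four controls:

```python
from fractions import Fraction


def relative_efficiency(r: int) -> Fraction:
    """Asymptotic efficiency of R matched controls per case: R / (R + 1)."""
    return Fraction(r, r + 1)


for r in range(1, 7):
    eff = relative_efficiency(r)
    gain = eff - relative_efficiency(r - 1)  # added efficiency from the R-th control
    print(f"R = {r}: efficiency {float(eff):.3f}, gain {float(gain):.3f}")
```

Moving from 4 to 5 controls raises the efficiency from 4/5 to 5/6, a gain of only 1/30.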

37
Polytomous Logistic Regression
  • It is common that the response variable has three
    or more categories
  • Cell types in lung cancer
  • Severity of injury
  • Subtypes in oral cleft
  • Cleft lip with or without palate (CLP)
  • Cleft palate only (CP)

38
Polytomous Logistic Regression (cont)
                 Oral cleft
  C2          CP     CLP    Control
  Present     27      32       24
  Absent      97     177      142
  OR        1.65    1.07      1.0
  • C2: target allele for the candidate gene transforming growth factor alpha (TGFA)
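The ORs in the table compare each cleft subtype with the common control group. A sketch reproducing them from the counts (the function name is hypothetical):

```python
def odds_ratio_vs_control(present: int, absent: int,
                          control_present: int, control_absent: int) -> float:
    """Odds of carrying C2 in a case subtype divided by the odds in controls."""
    return (present / absent) / (control_present / control_absent)


controls = (24, 142)  # C2 present / absent among controls
print(round(odds_ratio_vs_control(27, 97, *controls), 2))   # CP:  1.65
print(round(odds_ratio_vs_control(32, 177, *controls), 2))  # CLP: 1.07
```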

39
Polytomous Logistic Regression (cont)
  • Polytomous logistic regression model
  • $Y = 0, 1, 2, \dots, C$
  • $\log\frac{\Pr(Y=j)}{\Pr(Y=0)} = \alpha_j + \beta_j x$, $j = 1, 2, \dots, C$
  • $\beta_j$: change in log odds ($Y = j$ versus $Y = 0$) per unit change in $x$
  • $\beta_j - \beta_k$: change in log odds ($Y = j$ versus $Y = k$) per unit change in $x$
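A sketch of the polytomous (baseline-category) logistic model with hypothetical coefficients for C = 2, checking numerically that $\beta_j$ is the per-unit change in the log odds of $Y = j$ versus $Y = 0$ and that $\beta_j - \beta_k$ is the change for $Y = j$ versus $Y = k$:

```python
import math
from typing import List, Sequence


def category_probabilities(alphas: Sequence[float], betas: Sequence[float],
                           x: float) -> List[float]:
    """Pr(Y = j | x), j = 0..C, when log[Pr(Y=j)/Pr(Y=0)] = alpha_j + beta_j * x.

    alphas[j - 1] and betas[j - 1] hold alpha_j and beta_j for j = 1..C;
    category 0 (e.g. controls) is the reference with linear predictor 0.
    """
    scores = [0.0] + [a + b * x for a, b in zip(alphas, betas)]
    total = sum(math.exp(s) for s in scores)
    return [math.exp(s) / total for s in scores]


def log_odds_change(alphas, betas, j: int, k: int) -> float:
    """Change in log[Pr(Y=j)/Pr(Y=k)] when x increases from 0 to 1."""
    p0 = category_probabilities(alphas, betas, 0.0)
    p1 = category_probabilities(alphas, betas, 1.0)
    return math.log(p1[j] / p1[k]) - math.log(p0[j] / p0[k])


# Hypothetical coefficients for C = 2 (e.g. 1 = CP, 2 = CLP, 0 = control).
alphas, betas = [-1.0, -0.5], [0.8, 0.3]
print(log_odds_change(alphas, betas, 1, 0))  # beta_1
print(log_odds_change(alphas, betas, 1, 2))  # beta_1 - beta_2
```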
40
Polytomous Logistic Regression (cont)
Variable          CP/control        CLP/control
Intercept (α)     2.756 (0.753)     3.388 (0.679)
TGFA              0.045 (0.406)    -0.025 (0.580)
MS                0.821 (0.370)     1.071 (0.329)
TGFA × MS         0.580 (0.746)    -0.279 (0.714)
MA                0.108 (0.024)    -0.112 (0.022)
MS: maternal smoking; MA: maternal age
41
Summary
  • We have discussed
  • Some basic concepts in epidemiology
  • Designs
  • Cohort vs case-control
  • Confounding and effect modification
  • Some statistical tools
  • Mantel-Haenszel procedure
  • Logistic regression
  • Matching vs not
  • Dichotomous vs polytomous

42
Summary (cont)
  • These designs and methods are useful for genetic
    epidemiological research
  • Detection of familial aggregation
  • Identification of genetic subtypes
  • Test for genetic association
  • Examination of gene-environment interaction
  • Liang and Beaty (2000), Statistical Methods in Medical Research