Causal Inference in Genetics - PowerPoint PPT Presentation

1
Causal Inference in Genetics
Wentian Li, Ph.D., Robert S. Boas Center for
Genomics and Human Genetics, Feinstein Institute
for Medical Research, North Shore LIJ Health
System, NY, USA
2
Statistical Correlations
  • Rain/sunshine and wet/dry ground
  • Wrinkled/smooth face and chance of having cancer
  • Genetic marker and disease status (genetic
    association and genetic linkage)
  • One genetic marker and another genetic marker
  • Two biomarkers (indicators of a disease) of the
    same disease

3
Causal correlations
  • Rain causes the wet ground
  • Wrinkles do not cause cancer
  • A genetic marker may or may not cause the disease
    (when it does, it is called functional)
  • Two markers being close to each other on a
    chromosome: linkage disequilibrium. (The cause/effect
    framework does not apply? Either one is a cause
    of the other)
  • If two biomarkers are involved in a disease, the
    one upstream in a pathway is cause, and
    downstream is effect

4
Causal inference in Genetics, Epidemiology,
Genomics: Is It Necessary?
  • Risk factors of a disease have to be evaluated.
    Concept of confounding factor (a variable that
    is related to one or more variables in the study)
  • No need for causal inference in
    genotype-phenotype correlation (only if Lamarck
    were right about inheritance of acquired traits
    would there be a reversal)
  • I'm addressing the last example: upstream/downstream
    position in a biochemical pathway of two biomarkers

5
It may not be necessary IF
  • Both biomarkers are followed in time
  • If one biomarker becomes positive first, it is
    the more upstream one
  • Unfortunately, temporal information is usually
    unavailable, and biomarkers are measured after
    the onset of the disease

6
Causal inference for observed data is an active
topic outside genetics
  • Machine Learning (e.g. causal relationship
    between words in a corpus)
  • Economics and Social Science
  • Biology (Sewall Wright's path analysis)
  • Computer science (Bayesian network)

7
(No Transcript)
8
A typical graphical model
9
More complicated
10
Even more complicated
11
Outline of the talk
  • Introduction to the local causality discovery
  • Introduction to the dataset
  • Results

12
  • Introduction to the local causality discovery
  • Introduction to the dataset
  • Results

13
(No Transcript)
14
Data Mining and Knowledge Discovery (2000) v4,
pp.163-192
15
Problems with Bayesian network modeling
  • All relevant variables have to be included
  • The emphasis is not on the structure of the network
    (sometimes it is even given), but on quantitative
    modeling of the transition probabilities

16
In reality
  • We may not know which variables are relevant
    (part of the inference is to find out)
  • Not all variables are measured
  • We may just be interested in who is a cause, who
    is an effect, and not interested in quantitative
    conclusions

17
Local causality discovery
  • Focus on three variables only
  • It is OK that other relevant variables are not
    included
  • The principle of inference is the exclusion of
    causal models that are inconsistent with the data
    (which may not always be successful)

18
Local causality discovery (LCD) (cont.)
  • Six assumptions: 1. database completeness; 2.
    discrete variables; 3. Bayesian network model
    (directed acyclic graph, no loops); 4. (not transcribed); 5.
    no selection bias; 6. valid statistical testing
  • Three variables: x, y, z
  • A hidden variable is allowed to exist but is not
    measured
  • Determine six correlations: unconditional C(x,y),
    C(y,z), C(x,z), and conditional C(x,z|y),
    C(y,z|x), C(x,y|z)
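
As a rough illustration of how the six yes/no correlations could be obtained for binary variables, here is a minimal Python sketch. It is not the authors' code: the Pearson chi-square test, the 0.05 significance level, all function names, and the toy counts are assumptions made for this example only.

```python
from collections import Counter
from itertools import product

# Chi-square critical values at alpha = 0.05, keyed by degrees of freedom
# (assumed significance level, not taken from the talk).
CRITICAL_05 = {1: 3.841, 2: 5.991}

def chi2_2x2(pairs):
    """Pearson chi-square statistic for a list of (a, b) pairs of 0/1 values."""
    n = len(pairs)
    counts = Counter(pairs)
    stat = 0.0
    for a, b in product((0, 1), repeat=2):
        row = counts[(a, 0)] + counts[(a, 1)]
        col = counts[(0, b)] + counts[(1, b)]
        expected = row * col / n if n else 0.0
        if expected > 0:
            stat += (counts[(a, b)] - expected) ** 2 / expected
    return stat

def correlated(data, i, j, given=None):
    """True if columns i and j are associated, optionally within strata of `given`."""
    if given is None:
        return chi2_2x2([(r[i], r[j]) for r in data]) > CRITICAL_05[1]
    # Conditional test: sum the per-stratum statistics (one degree of freedom each).
    stat = sum(chi2_2x2([(r[i], r[j]) for r in data if r[given] == level])
               for level in (0, 1))
    return stat > CRITICAL_05[2]

def six_correlations(data):
    """Yes/no answers for the three unconditional and three conditional correlations."""
    x, y, z = 0, 1, 2
    return {
        "C(x,y)": correlated(data, x, y),
        "C(y,z)": correlated(data, y, z),
        "C(x,z)": correlated(data, x, z),
        "C(x,z|y)": correlated(data, x, z, given=y),
        "C(y,z|x)": correlated(data, y, z, given=x),
        "C(x,y|z)": correlated(data, x, y, given=z),
    }

# Hypothetical counts of (x, y, z) triples, built so that x and z are
# independent within each level of y (as in a chain x -> y -> z).
toy_counts = {(0, 0, 0): 64, (0, 0, 1): 16, (0, 1, 0): 4, (0, 1, 1): 16,
              (1, 0, 0): 16, (1, 0, 1): 4, (1, 1, 0): 16, (1, 1, 1): 64}
toy_data = [triple for triple, k in toy_counts.items() for _ in range(k)]
result = six_correlations(toy_data)
```

With these toy counts, every correlation comes out as "yes" except C(x,z|y), which is the pattern the later slides use to draw a causal conclusion.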

19
An Example
20
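Between two variables, there are only 6 causal
relationships (allowing a confounding variable), 4
if x can only be a cause
[Diagram of the six relationships: no relationship;
causing; reverse causing; confounding; confounding +
causing; confounding plus reverse causing. The two
marked NO when x can only be a cause: reverse causing
and confounding plus reverse causing]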
21
Number of causal relationships among three
variables
  • 6 x 6 x 6 = 216 possibilities
  • 4 x 4 x 6 = 96 if x is not caused by either y or z (but
    can receive an arrow from a hidden variable)
    (Cooper 1997 paper)
  • 2 x 2 x 6 = 24 if x is not caused by y or z, and
    doesn't receive an arrow from hidden confounding
    variables (Li and Wang, unpublished)
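
The three counts can be checked by brute-force enumeration. The sketch below assumes each pair of variables takes one of the six pairwise relationships from the previous slide; the relationship labels and the function names are illustrative choices, not part of the original talk.

```python
from itertools import product

# The six pairwise relationships between a first and a second variable.
PAIR_RELATIONS = [
    "no relationship",
    "causing",
    "reverse causing",
    "confounding",
    "confounding + causing",
    "confounding + reverse causing",
]

def allowed(relation, first_only_cause, first_no_hidden_arrow):
    """Filter relationships for a pair whose first variable is x."""
    if first_only_cause and "reverse causing" in relation:
        return False  # x would be an effect of the other variable
    if first_no_hidden_arrow and "confounding" in relation:
        return False  # x would receive an arrow from a hidden variable
    return True

def count_models(x_only_cause=False, x_no_hidden_arrow=False):
    """Count combinations of relationships for the pairs (x,y), (x,z), (y,z)."""
    with_x = [r for r in PAIR_RELATIONS
              if allowed(r, x_only_cause, x_no_hidden_arrow)]
    # The (y,z) pair is never restricted by assumptions about x.
    return sum(1 for _ in product(with_x, with_x, PAIR_RELATIONS))

print(count_models())                                            # unrestricted
print(count_models(x_only_cause=True))                           # x is not an effect
print(count_models(x_only_cause=True, x_no_hidden_arrow=True))   # and no hidden arrow into x
```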

22
Given a causal model
  • Unconditional association between any two
    variables can be determined by whether they are
    connected by a path
  • Conditional association can be determined by
    the so-called d-separation rule
  • Applying all possible causal models to the six
    correlations (yes or no) and excluding inconsistent
    models leads to

23
CCC causal inference rule
  • (Cooper version) if C(x,y)+, C(y,z)+, but
    C(x,z|y)-,
  • then there are only three possible causal
    models: x -> y -> z
  • x <- h -> y -> z
  • h -> x -> y -> z
  • (Silverstein et al. version) if C(x,y)+, C(y,z)+,
    C(x,z)+, but C(x,z|y)-, C(x,y|z)+, C(y,z|x)+,
    then...
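
To make the exclusion step concrete, the following Python sketch checks candidate models against the Cooper-version pattern using d-separation. It is an illustration under stated assumptions, not the method used in the talk: graphs are written as parent-to-children dictionaries, "h" stands for a hidden variable, and the path-enumeration test is only practical for very small graphs.

```python
def descendants(dag, node):
    """All nodes reachable from `node` by following directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def d_connected(dag, a, b, given=frozenset()):
    """True if at least one path between a and b is active given `given`."""
    nodes = set(dag) | {c for children in dag.values() for c in children}
    parents = {n: {p for p in dag if n in dag[p]} for n in nodes}

    def neighbors(n):
        return set(dag.get(n, ())) | parents[n]

    def active(path):
        for prev, mid, nxt in zip(path, path[1:], path[2:]):
            collider = prev in parents[mid] and nxt in parents[mid]
            if collider:
                # A collider passes association only if it, or one of its
                # descendants, is conditioned on.
                if not (({mid} | descendants(dag, mid)) & given):
                    return False
            elif mid in given:
                # Conditioning on a chain or fork node blocks the path.
                return False
        return True

    def walk(path):
        if path[-1] == b:
            return active(path)
        return any(walk(path + [n]) for n in neighbors(path[-1]) if n not in path)

    return walk([a])

def consistent(dag):
    """Pattern checked: x-y associated, y-z associated, x-z not associated given y."""
    return (d_connected(dag, "x", "y")
            and d_connected(dag, "y", "z")
            and not d_connected(dag, "x", "z", {"y"}))

# The three models listed on the slide.
chain = {"x": {"y"}, "y": {"z"}}
hidden_fork = {"h": {"x", "y"}, "y": {"z"}}
hidden_upstream = {"h": {"x"}, "x": {"y"}, "y": {"z"}}

# Two models that the pattern excludes.
collider = {"x": {"y"}, "z": {"y"}}
triangle = {"x": {"y", "z"}, "y": {"z"}}
```

Here `consistent` returns True for the three listed models and False for the collider and triangle models, because in both of the latter x and z stay associated after conditioning on y.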

24
In words, for a three-way correlated set:
  • If one of the variables (x) is not an effect (only
    a cause)
  • AND
  • If the correlation between x and z is lost
    conditional on y,
  • THEN
  • y causes z

x: gene; y, z: two intermediate phenotypes
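
The rule stated in words above can be written as a small decision function. This is a sketch only: the function name, the dictionary keys, and the boolean encoding of the correlations are assumptions made for illustration.

```python
def y_causes_z(c, x_is_not_effect):
    """Apply the rule: x is only a cause, all three pairs are correlated,
    and the x-z correlation disappears once y is conditioned on."""
    three_way = c["C(x,y)"] and c["C(y,z)"] and c["C(x,z)"]
    lost_given_y = not c["C(x,z|y)"]
    return bool(x_is_not_effect and three_way and lost_given_y)

# Hypothetical yes/no correlation pattern for a gene x and two phenotypes y, z.
pattern = {"C(x,y)": True, "C(y,z)": True, "C(x,z)": True, "C(x,z|y)": False}
print(y_causes_z(pattern, x_is_not_effect=True))
```

To test the opposite direction (z causes y), the same function can be called with the roles of y and z swapped in the dictionary, so that the conditional entry holds the x-y correlation given z.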
25
[Diagram: x = gene (x is not an effect); y =
biomarker 1; z = biomarker 2; the causal direction
between y and z is marked "?"; label: causal
inference is certain]
26
  • Introduction to the local causality discovery
  • Introduction to the dataset
  • Results

27
Rheumatoid Arthritis (RA)
  • An autoimmune disease
  • Chronic inflammation of joints
  • Three times more likely to occur in women than
    in men
  • Age of onset: 40-60
  • Twin concordance rates: 12-15% for
    MZ (monozygotic), 5% for DZ (dizygotic)
  • Genetic and environmental (e.g. smoking) risk
    factors

28
MHC/HLA: the main genetic contribution to RA
  • MHC (Major Histocompatibility Complex)
    or HLA (Human leukocyte antigens):
    HLA-DRB1 gene on chromosome 6 (6p21.3)
  • The RA-associated alleles are HLA-DRB1*0401,
    *0404, *0408 (Caucasian), not *0402, *0403, *0407
  • In Asian populations, different DRB1 alleles are
    associated with RA (e.g. *0405, *0901)
  • A group of DRB1 risk alleles is called the shared
    epitope (SE), or rheumatoid epitope; they code for
    amino acids at positions 70-74 in the third
    hypervariable region

29
An update on recent whole-genome association studies of RA
  • PTPN22 (chr1) (Begovich et al., AJHG 2004) has
    been consistently replicated in Caucasian samples
  • STAT4 (chr2) (Remmers et al., NEJM, to appear)
  • (Plenge et al., submitted)
  • Wellcome Trust (Nature, June 7, 2007)

30
Two autoantibodies are strongly associated with
RA: RF and anti-CCP
  • RF (rheumatoid factor): 80% of RA patients
    are RF positive
  • anti-CCP (anti-cyclic citrullinated peptide
    antibody): an even better predictor
    of RA at an early stage
  • HLA-DRB1, RF, anti-CCP are all associated with
    the RA disease, and they are associated with each
    other. The CCC rule can be applied!

31
Q: Between RF and anti-CCP, which one is the
cause and which is the effect?
32
  • Introduction to the local causality discovery
  • Introduction to the dataset
  • Results

33
(No Transcript)
34
1723 Caucasian RA patients
[Table panels: anti-CCP positive; anti-CCP negative]
35
Association between RF and DRB1 genotype is lost
conditional on anti-CCP
36
[Diagram: x = SE (gene); y = anti-CCP (biomarker 1);
z = RF (biomarker 2)]
37
Discussions/Issues
  • There is evidence that RA patients become
    anti-CCP positive before becoming RF positive
  • The three-way correlation might be lost in normal
    controls (here the data is case-only)
  • Between anti-CCP and RF, other intermediate factors
    are possible (so the cause-effect relationship may
    not be direct)
  • It is not clear where the smoking risk factor
    comes in

38
Co-Authors
  • Mingyi WANG (Zhejiang Univ, Computer Science
    Department, causal inference)
  • Patricia Irigoyen, Peter Gregersen (North Shore
    LIJ, RA data)

39
Advertisements
  • Bibliography on microarray data analysis
  • www.nslij-genetics/microarray/ and
    www.cbi.pku.edu.cn/mirror/microarray/microarray.html
  • Bibliography on linkage disequilibrium mapping
    www.nslij-genetics.org/ld/
  • Bibliography on computational gene recognition
    www.nslij-genetics.org/gene/
  • Bibliography on features, patterns, correlations
    in DNA sequences www.nslij-genetics.org/dnacorr/
  • Comprehensive list of genetic analysis programs
    www.nslij-genetics.org/soft/ and
    linkage.rockefeller.edu/soft/