Causal Inference - PowerPoint PPT Presentation

Wentian Li, North Shore LIJ Health System. 6/21/09

Transcript and Presenter's Notes

1
Causal Inference
  • Of Intermediate Phenotypes and Biomarkers in
    Rheumatoid Arthritis
  • An Application of Machine Learning Techniques to
    Genetic Epidemiology

Wentian Li, Ph.D., Feinstein Institute for
Medical Research
2
Genetic Association
  • Association is not equivalent to a causal
    relationship
  • A wrinkle-cancer risk association does not mean
    that one causes the other
  • Age is a confounding factor

3
When do we need to know cause and effect?
  • Rarely discussed in genetic analysis because
    genotype is always the cause, and phenotype is
    always the effect
  • In epidemiology, a factor-disease association
    can belong to three situations: (1) the factor
    is a cause; (2) reverse causality; (3) a third,
    confounding factor
  • For two intermediate phenotypes (biomarkers), the
    causal arrow can point either way

4
Causal Inference in Machine Learning
  • Large text databases (e.g. Google)
  • Observational data (no controlled experiment, and
    no other approaches to determine causality)
  • A two-point association indeed cannot be used to
    claim causality
  • The key is a third variable, as well as the
    conditional association given the third variable

5
(No Transcript)
6
(No Transcript)
7
Data Mining and Knowledge Discovery (2000) v4,
pp.163-192
8
An Example
9
Cooper's Local Causality Discovery (LCD) Rule
  • Six assumptions: 1. database completeness. 2.
    discrete variables. 3. Bayesian network model
    (directed acyclic graph, no loops). 4. 5.
    no selection bias. 6. valid statistical testing.
  • Three variables: x, y, z
  • A hidden variable is allowed (but it is not in
    the dataset)
  • Determine three correlations: unconditional
    C(x,y), C(y,z), and conditional C(x,z|y)

10
Between two variables, there are only 6 (4) causal
relationships (allowing a confounding variable)
[Figure: diagrams of the six relationships: no
relationship, confounding, causing, confounding plus
causing, reverse causing, and confounding plus reverse
causing; two diagrams are marked "NO"]
11
Number of causal relationships among three
variables
  • 6 x 6 x 6 = 216 possibilities
  • 4 x 4 x 6 = 96 if x is not caused by either y or
    z (but can receive an arrow from a hidden
    variable) (Cooper 1997 paper)
  • 2 x 2 x 6 = 24 if x doesn't even receive an arrow
    from hidden confounding variables (Li and Wang,
    unpublished)
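
The three counts above can be checked by brute-force enumeration. This is only a sketch: it assumes the six pairwise relationships listed on the previous slide, and the label strings and the helper `count_models` are made up for illustration rather than taken from the talk.

```python
from itertools import product

# Labels for the six pairwise relationships (names are illustrative).
RELATIONS = [
    "none", "confounding", "causing", "confounding+causing",
    "reverse", "confounding+reverse",
]

def count_models(x_pair_options):
    """Count models over the pairs (x,y), (x,z), (y,z).

    x_pair_options lists the relationships allowed for the two pairs
    that involve x; the (y,z) pair always keeps all six.
    """
    return sum(1 for _ in product(x_pair_options, x_pair_options, RELATIONS))

all_six = RELATIONS
# x is not caused by y or z: drop the two "reverse" relationships.
x_not_effect = [r for r in RELATIONS if "reverse" not in r]
# x also receives no arrow from a hidden confounder.
x_no_hidden = [r for r in x_not_effect if "confounding" not in r]

print(count_models(all_six), count_models(x_not_effect), count_models(x_no_hidden))
# prints: 216 96 24
```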

12
Given a causal model
  • Unconditional association between any two
    variables can be determined by whether they are
    connected by a path that is not blocked by a
    collider
  • Conditional association can be determined by
    the so-called d-separation rule
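
As a rough illustration of the d-separation rule, the sketch below checks it with the moralized ancestral graph criterion, which is an equivalent formulation. The graph encoding (a dict from each node to its set of children) and the function names are assumptions made for this example, not part of the talk.

```python
from itertools import combinations

def ancestral_set(dag, nodes):
    """Return (nodes plus all their ancestors, parent map) for a DAG.

    dag maps every node to the set of its children.
    """
    parents = {n: set() for n in dag}
    for parent, children in dag.items():
        for child in children:
            parents[child].add(parent)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen, parents

def d_separated(dag, a, b, given=()):
    """True if a and b are d-separated by the set `given`."""
    keep, parents = ancestral_set(dag, {a, b, *given})
    # Moralize: link each child to its parents and "marry" co-parents.
    adj = {n: set() for n in keep}
    for child in keep:
        ps = parents[child]
        for p in ps:
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Remove the conditioning set, then test whether b is reachable from a.
    blocked = set(given)
    seen, stack = {a}, [a]
    while stack:
        for n in adj[stack.pop()] - blocked:
            if n == b:
                return False
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return True

chain = {"x": {"y"}, "y": {"z"}, "z": set()}      # x -> y -> z
collider = {"x": {"z"}, "y": {"z"}, "z": set()}   # x -> z <- y

print(d_separated(chain, "x", "z"))            # False: associated
print(d_separated(chain, "x", "z", ["y"]))     # True: association lost given y
print(d_separated(collider, "x", "y"))         # True: collider blocks the path
print(d_separated(collider, "x", "y", ["z"]))  # False: conditioning opens it
```

The chain case is the pattern that matters for the rest of the talk: x and z are associated, and the association disappears once y is conditioned on.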

13
CCC causal inference rule
  • (Cooper version) if C(x,y)+, C(y,z)+, but
    C(x,z|y)-,
  • then there are only three possible causal
    models, each with y → z:
  • x → y → z
  • x ← h → y → z (h hidden)
  • x → y → z together with x ← h → y
  • (Silverstein et al. version) if C(x,y)+, C(y,z)+,
    C(x,z)+, but C(x,z|y)-, C(x,y|z)+, C(y,z|x)+,
    then...

14
In a three-way correlated set
  • If one of the variables (x) is not an effect
    (only a cause)
  • AND
  • If the correlation between x and z is lost
    conditional on y,
  • THEN
  • y causes z

x: gene; y, z: two intermediate phenotypes
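
The rule can be illustrated at the population level with a small sketch. It assumes a hypothetical chain x -> y -> z over binary variables; all probabilities and function names are made up for illustration. Exact covariances are computed from the joint distribution, so no statistical test is involved. For binary variables, zero covariance is equivalent to independence.

```python
from itertools import product

# Hypothetical chain x -> y -> z over binary variables (all numbers are made up).
P_X1 = 0.3                        # P(x=1)
P_Y1_GIVEN_X = {0: 0.2, 1: 0.7}   # P(y=1 | x)
P_Z1_GIVEN_Y = {0: 0.1, 1: 0.6}   # P(z=1 | y)

def bern(p1, value):
    """Probability of `value` for a binary variable with P(1) = p1."""
    return p1 if value == 1 else 1.0 - p1

# Joint distribution P(x, y, z), keyed by the tuple (x, y, z).
joint = {
    (x, y, z): bern(P_X1, x) * bern(P_Y1_GIVEN_X[x], y) * bern(P_Z1_GIVEN_Y[y], z)
    for x, y, z in product((0, 1), repeat=3)
}

def covariance(joint, i, j, cond=None):
    """Covariance of coordinates i and j, optionally within the stratum
    cond = (coordinate index, value)."""
    cells = {k: p for k, p in joint.items() if cond is None or k[cond[0]] == cond[1]}
    total = sum(cells.values())
    e_i = sum(k[i] * p for k, p in cells.items()) / total
    e_j = sum(k[j] * p for k, p in cells.items()) / total
    e_ij = sum(k[i] * k[j] * p for k, p in cells.items()) / total
    return e_ij - e_i * e_j

def associated(joint, i, j, given=None, tol=1e-12):
    """True if coordinates i and j are associated (in any stratum of `given`)."""
    if given is None:
        return abs(covariance(joint, i, j)) > tol
    return any(abs(covariance(joint, i, j, (given, v))) > tol for v in (0, 1))

def ccc_pattern(joint, x=0, y=1, z=2):
    """True when C(x,y)+, C(y,z)+ and C(x,z|y)- all hold."""
    return (associated(joint, x, y) and associated(joint, y, z)
            and not associated(joint, x, z, given=y))

print(ccc_pattern(joint))  # True
```

When x is known not to be an effect, a `True` result corresponds to the slide's conclusion that y causes z. With real data the exact covariances would be replaced by statistical tests of association.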
15
The use of a not-an-effect variable has an amazing
parallel in epidemiology
  • Called an instrumental variable
  • Martijn Katan's idea on the cholesterol-cancer
    association: he proposed to use a genotype
    (apolipoprotein E) as the third variable (Lancet
    1986, i:507-508)
  • Katan did not use conditional correlation
  • This idea is now called Mendelian randomization

16
(No Transcript)
17
Rheumatoid Arthritis (RA)
  • An autoimmune disease
  • Chronic inflammation of joints
  • Three times more likely to occur in women than
    men
  • Age of onset: 40-60
  • Twin concordance rates: 12-15% for MZ, 5% for DZ
  • Genetic and environmental (e.g. smoking) risk
    factors

18
MHC/HLA: the main genetic contribution to RA
  • MHC (Major Histocompatibility Complex) or HLA
    (Human Leukocyte Antigens): HLA-DRB1 gene on
    chromosome 6 (6p21.3)
  • The RA-associated alleles are HLA-DRB1*0401,
    *0404, *0408 (Caucasian), not *0402, *0403, *0407
  • In Asian populations, different DRB1 alleles are
    associated with RA (e.g. *0405, *0901)
  • A group of DRB1 risk alleles is called the shared
    epitope (SE), or rheumatoid epitope; it encodes
    amino acid positions 70-74 in the third
    hypervariable region

19
Two autoantibodies are strongly associated with
RA: RF and anti-CCP
  • RF (rheumatoid factor): 80% of RA patients are
    RF positive
  • anti-CCP (anti-cyclic citrullinated peptide
    antibody): an even better predictor of RA at an
    early stage
  • HLA-DRB1, RF, and anti-CCP are all associated with
    RA, and they are associated with each other. The
    CCC rule can be applied!
  • (Reference: 2004, pp. 2052-57)

20
Q: Between RF and anti-CCP, which one is the
cause and which is the effect?
21
1723 Caucasian RA patients
[Figure: panels for anti-CCP positive and anti-CCP
negative patients]
22
Association between RF and DRB1 genotype is lost
conditional on anti-CCP
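
A minimal sketch of what "lost conditional on anti-CCP" means for 2x2 tables. The counts below are made up for illustration and are NOT the study's data. The row coding (DRB1 risk-allele carrier vs. non-carrier) and the helper `chi_square_2x2` are also assumptions of this example.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (1 df, no continuity correction)
    for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom

# Made-up counts, NOT the study's data.
# Rows: DRB1 risk-allele carrier / non-carrier; columns: RF positive / RF negative.
strata = {
    "anti-CCP positive": [[80, 20], [40, 10]],
    "anti-CCP negative": [[6, 24], [12, 48]],
}

# Pool the strata by adding the tables cell by cell.
pooled = [[sum(t[i][j] for t in strata.values()) for j in range(2)] for i in range(2)]

CRITICAL_1DF = 3.841  # 5% critical value of chi-square with 1 df

print("pooled:", round(chi_square_2x2(pooled), 2),
      "significant" if chi_square_2x2(pooled) > CRITICAL_1DF else "not significant")
for name, table in strata.items():
    print(name + ":", round(chi_square_2x2(table), 2))
```

In this toy example the pooled table shows an association between genotype and RF, while within each anti-CCP stratum the statistic is zero.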
23
By the CCC rule, anti-CCP is the cause, RF is the
effect
  • Or, anti-CCP is upstream and RF is downstream in
    a pathway

24
Discussions/Issues
  • There is evidence that RA patients become
    anti-CCP positive before becoming RF positive
  • The three-way correlation might be lost in normal
    controls (here we have a case-only analysis)
  • Between anti-CCP and RF, other factors are
    possible (so the cause-effect relationship may
    not be direct)
  • It is not clear where the smoking factor comes in
    (could be an intriguing analysis with smoking
    data!)

25
Revisit Katan's Mendelian Randomization (MR) by
LCD (Wang, Li, unpublished)
  • MR needs a not-an-effect variable (gene)
  • Conditional association is not used
  • Only need a counter example (e.g. Apo E2 samples
    have low cholesterol, but NOT high cancer risk)
  • LCD needs a variable that is not an effect
  • Conditional association is used
  • Complete information of (G, IP, D) trio for all
    samples (e.g. Apo genotype, cholesterol level,
    cancer status)

26
Co-Authors
  • Mingyi WANG (Zhejiang Univ, Computer Science
    Department, causal inference)
  • Patricia Irigoyen, Peter Gregersen (North Shore
    LIJ, RA data)

27
(No Transcript)