Title: Causal Inference
1. Causal Inference
- Of Intermediate Phenotypes and Biomarkers in Rheumatoid Arthritis
- An Application of Machine Learning Techniques to Genetic Epidemiology
Wentian Li, Ph.D., Feinstein Institute for Medical Research
2. Genetic Association
- Association is not equivalent to a causal relationship
- The association between wrinkles and cancer risk does not mean that one causes the other
- Age is a confounding factor
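The age example can be made concrete with a small simulation. This is a sketch under made-up effect sizes, not data from this talk. In the simulation, age raises both a wrinkle score and a cancer-risk score, and there is no direct link between the two. The overall correlation is clearly positive. Within narrow age bands, which is one simple way of conditioning on the confounder, the correlation essentially disappears. The model, the band width and all function names are illustrative assumptions.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def simulate(n=5000, seed=1):
    """Hypothetical model: age -> wrinkles and age -> cancer risk, no direct link."""
    rng = random.Random(seed)
    age = [rng.uniform(20, 80) for _ in range(n)]
    wrinkles = [0.05 * a + rng.gauss(0, 1) for a in age]
    cancer_risk = [0.04 * a + rng.gauss(0, 1) for a in age]
    return age, wrinkles, cancer_risk

def within_age_band_corr(age, a, b, width=5):
    """Average correlation of a and b inside age bands, weighted by band size."""
    bands = {}
    for ag, ai, bi in zip(age, a, b):
        bands.setdefault(int(ag // width), []).append((ai, bi))
    weighted, total = 0.0, 0
    for members in bands.values():
        if len(members) < 10:  # skip (near-)empty bands at the edges
            continue
        xs, ys = zip(*members)
        weighted += len(members) * pearson(xs, ys)
        total += len(members)
    return weighted / total

age, wrinkles, cancer_risk = simulate()
marginal = pearson(wrinkles, cancer_risk)
stratified = within_age_band_corr(age, wrinkles, cancer_risk)
print(f"overall correlation:     {marginal:.2f}")
print(f"within age bands (mean): {stratified:.2f}")
```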
3. When do we need to know cause and effect?
- Rarely discussed in genetic analysis, because the genotype is always the cause and the phenotype is always the effect
- In epidemiology, a factor-disease association can arise in three situations: (1) the factor is a cause, (2) reverse causality, (3) a third, confounding factor
- For two intermediate phenotypes (biomarkers), the causal arrow can point either way
4. Causal Inference in Machine Learning
- Large text databases (e.g., Google)
- Observational data (no controlled experiment, and no other approach to determine causality)
- A two-point (pairwise) association indeed cannot be used to claim causality
- The key is a third variable, together with the association between the other two conditional on that third variable
7. Data Mining and Knowledge Discovery (2000), vol. 4, pp. 163-192
8. An Example
9. Cooper's Local Causal Discovery (LCD) Rule
- Six assumptions:
  1. Database completeness
  2. Discrete variables
  3. Bayesian network model (directed acyclic graph: no loops)
  4. …
  5. No selection bias
  6. Valid statistical testing
- Three variables x, y, z
- A hidden variable is allowed (one that is not in the dataset)
- Determine three correlations: the unconditional C(x,y) and C(y,z), and the conditional C(x,z|y)
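For discrete variables (assumption 2), one common way to estimate these three quantities is with Pearson chi-square tests. The conditional C(x,z|y) is tested by summing the chi-square statistics computed within each stratum of y. The sketch below uses only the standard library and is not taken from Cooper's paper. The significance level, the simulated binary chain x → y → z and all function names are hypothetical. The p-value uses the closed-form chi-square tail for integer degrees of freedom.

```python
import math
import random
from collections import Counter

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square(df) variable, integer df, closed form."""
    if df <= 0:
        return 1.0
    t = stat / 2.0
    if df % 2 == 0:
        return math.exp(-t) * sum(t ** i / math.factorial(i) for i in range(df // 2))
    total = math.erfc(math.sqrt(t))  # df = 1 term
    for j in range((df - 1) // 2):   # extra terms for df = 3, 5, ...
        total += math.exp(-t) * t ** (j + 0.5) / math.gamma(j + 1.5)
    return total

def chi2_stat(pairs):
    """Pearson chi-square statistic and degrees of freedom for (a, b) observations."""
    n = len(pairs)
    if n == 0:
        return 0.0, 0
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    stat = 0.0
    for a in count_a:
        for b in count_b:
            expected = count_a[a] * count_b[b] / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat, (len(count_a) - 1) * (len(count_b) - 1)

def independence_p(a, b, given=None):
    """p-value for 'a independent of b', or for 'a independent of b given `given`'
    (chi-square statistics and df summed over the strata of `given`)."""
    if given is None:
        return chi2_sf(*chi2_stat(list(zip(a, b))))
    stat, df = 0.0, 0
    for level in set(given):
        s, d = chi2_stat([(ai, bi) for ai, bi, g in zip(a, b, given) if g == level])
        stat, df = stat + s, df + d
    return chi2_sf(stat, df)

def lcd_correlations(x, y, z, alpha=0.001):
    """True means 'association detected' at level alpha."""
    return {
        "C(x,y)": independence_p(x, y) < alpha,
        "C(y,z)": independence_p(y, z) < alpha,
        "C(x,z|y)": independence_p(x, z, given=y) < alpha,
    }

def simulate_chain(n=3000, seed=7):
    """Hypothetical binary chain x -> y -> z; the copying probabilities are made up."""
    rng = random.Random(seed)
    x = [rng.random() < 0.3 for _ in range(n)]
    y = [xi if rng.random() < 0.8 else not xi for xi in x]
    z = [yi if rng.random() < 0.8 else not yi for yi in y]
    return x, y, z

x, y, z = simulate_chain()
print(lcd_correlations(x, y, z))
```

For data generated by such a chain, the two unconditional associations should be detected and the conditional one should not. The last entry is a random test outcome, so a different seed can occasionally flip it.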
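10. Between two variables, there are only 6 causal relationships (4 if reverse causing is excluded), allowing for a confounding variable
- No relationship
- Confounding
- Causing
- Confounding plus causing
- Reverse causing
- Confounding plus reverse causing
- Mutual causing, with or without confounding: NO (loops are not allowed)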
11. Number of causal relationships among three variables
- 6 × 6 × 6 = 216 possibilities
- 4 × 4 × 6 = 96 if x is not caused by either y or z (but can receive an arrow from a hidden variable) (Cooper 1997 paper)
- 2 × 2 × 6 = 24 if x does not even receive an arrow from hidden confounding variables (Li and Wang, unpublished)
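These counts follow from treating each of the three pairs independently. In the sketch below, each pair's relationship is coded as an arrow state (none, forward or reverse, with mutual causing excluded) combined with the presence or absence of a hidden confounder. That gives the 6 two-variable relationships of slide 10. The encoding and the function names are illustrative assumptions. Like the slide, the sketch counts pairwise combinations and does not remove combinations that would form a directed cycle among x, y and z.

```python
from itertools import product

ARROWS = ("none", "forward", "reverse")  # forward = first variable causes second
CONFOUNDED = (False, True)               # hidden common cause present?
PAIR_STATES = list(product(ARROWS, CONFOUNDED))  # 3 x 2 = 6 states per pair

def any_state(state):
    """No restriction on the pair."""
    return True

def x_not_caused(state):
    """x (first in the pair) receives no arrow from the other observed variable."""
    arrow, _ = state
    return arrow != "reverse"

def x_fully_exogenous(state):
    """x receives no arrow from the other variable nor from a hidden confounder."""
    arrow, confounded = state
    return arrow != "reverse" and not confounded

def count(constraint_on_x_pairs):
    """Count combinations over the pairs (x,y), (x,z), (y,z); the (y,z) pair is unrestricted."""
    total = 0
    for xy, xz, yz in product(PAIR_STATES, repeat=3):
        if constraint_on_x_pairs(xy) and constraint_on_x_pairs(xz):
            total += 1
    return total

print(count(any_state))          # 216
print(count(x_not_caused))       # 96
print(count(x_fully_exogenous))  # 24
```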
12. Given a causal model
- The unconditional association between any two variables can be determined by whether they are connected by a path that does not pass through a collider (a node where two arrows meet head to head)
- The conditional association can be determined by the so-called d-separation rule
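One standard way to check d-separation mechanically is the moralization criterion. First keep only the two variables, the conditioning set, and all their ancestors. Then connect ("marry") every pair of parents that share a child and drop the edge directions. Finally delete the conditioning set: the two variables are d-separated exactly when they are no longer connected. This criterion is a well-known equivalent of the path-blocking definition, not necessarily the formulation used in this talk. Encoding a DAG as a dict mapping each node to its set of parents is an assumption of the sketch.

```python
from collections import deque
from itertools import combinations

def ancestors(parents, nodes):
    """The given nodes plus all their ancestors; parents maps node -> set of parents."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), set()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, a, b, given=()):
    """True if a and b are d-separated by the set `given` in the DAG `parents`."""
    given = set(given)
    keep = ancestors(parents, {a, b} | given)
    # Moral graph of the ancestral set, stored as undirected adjacency sets.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = [p for p in parents.get(child, set()) if p in keep]
        for p in ps:
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(ps, 2):  # marry parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    # Breadth-first search from a that never enters the conditioning set.
    queue, seen = deque([a]), {a}
    while queue:
        v = queue.popleft()
        if v == b:
            return False
        for w in adj[v]:
            if w not in seen and w not in given:
                seen.add(w)
                queue.append(w)
    return True

chain = {"y": {"x"}, "z": {"y"}}        # x -> y -> z
collider = {"y": {"x", "z"}}            # x -> y <- z
confounded = {"x": {"h"}, "z": {"h"}}   # x <- h -> z (hidden third factor)
print(d_separated(chain, "x", "z"))            # False: associated
print(d_separated(chain, "x", "z", {"y"}))     # True: association lost given y
print(d_separated(collider, "x", "z"))         # True
print(d_separated(collider, "x", "z", {"y"}))  # False
```

The chain and the confounded model both lose the x-z association once the middle variable is conditioned on. The collider behaves the opposite way. This is why a single conditional test cannot, on its own, pick out one causal structure.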
13. CCC causal inference rule (+ = association present, - = association absent)
- (Cooper version) If C(x,y)+ and C(y,z)+, but C(x,z|y)-, then there are only three possible causal models: two over x, y, z, and one that also involves a hidden variable h
- (Silverstein et al. version) If C(x,y)+, C(y,z)+, C(x,z)+, but C(x,z|y)-, C(x,y|z)+, C(y,z|x)+, then...
14. In a three-way correlated set
- IF one of the variables (x) is not an effect (only a cause)
- AND the correlation between x and z is lost conditionally (on y)
- THEN y causes z
- Here x is a gene, and y and z are two intermediate phenotypes
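The rule on this slide can be sketched as a decision function. In the sketch, x is a simulated genotype (0, 1 or 2 copies of an allele) and y, z are continuous intermediate phenotypes. Associations are tested with a Fisher z-test on the (partial) correlation, which assumes roughly linear, Gaussian relationships. The effect sizes, the significance level and the name `ccc_rule` are hypothetical. Because x is assumed not to be an effect, the same function can be called with the two phenotypes in either order.

```python
import math
import random
from statistics import NormalDist

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

def partial_corr(a, b, c):
    """First-order partial correlation of a and b given c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

def fisher_p(r, n, k):
    """Two-sided Fisher z-test p-value for a correlation r, n samples, k conditioning variables."""
    stat = math.atanh(r) * math.sqrt(n - k - 3)
    return 2 * (1 - NormalDist().cdf(abs(stat)))

def ccc_rule(x, y, z, alpha=0.001):
    """x is assumed not to be an effect. Returns 'y causes z' if x, y, z are pairwise
    associated and the x-z association is lost given y; otherwise None."""
    n = len(x)
    if any(fisher_p(r, n, 0) >= alpha for r in (pearson(x, y), pearson(y, z), pearson(x, z))):
        return None  # not a three-way correlated set
    if fisher_p(partial_corr(x, z, y), n, 1) < alpha:
        return None  # x-z association survives conditioning on y
    return "y causes z"

def simulate(n=4000, seed=3):
    """Hypothetical model: genotype x -> phenotype y -> phenotype z."""
    rng = random.Random(seed)
    x = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
    y = [0.8 * g + rng.gauss(0, 1) for g in x]
    z = [0.7 * v + rng.gauss(0, 1) for v in y]
    return x, y, z

x, y, z = simulate()
print(ccc_rule(x, y, z))  # phenotypes in order (y, z)
print(ccc_rule(x, z, y))  # phenotypes swapped
```

In this simulated chain, the first call is expected to conclude that y causes z. The swapped call should stay silent, because conditioning on z does not remove the genotype's association with y.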
15. The use of a not-an-effect variable has an amazing parallel in epidemiology
- It is called an instrumental variable
- Martijn Katan's idea on the cholesterol-cancer association: he proposed using a genotype (apolipoprotein E) as the third variable (Lancet 1986, i:507-508)
- Katan did not use conditional correlation
- This idea is now called Mendelian randomization
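The instrumental-variable idea can be illustrated with the Wald ratio, the simplest estimator used in Mendelian randomization today. It divides the genotype's effect on the outcome by the genotype's effect on the exposure. This is a standard textbook estimator, not the counterexample argument attributed to Katan here. The simulated model and its coefficients are made up: a genotype G, an exposure X that shares a hidden confounder U with the outcome Y, and a true causal effect of 2.

```python
import random

def slope(a, b):
    """Least-squares slope of b regressed on a."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var = sum((p - ma) ** 2 for p in a)
    return cov / var

def simulate(n=20000, seed=11, true_effect=2.0):
    """Hypothetical model: G -> X -> Y, plus a hidden U -> X and U -> Y."""
    rng = random.Random(seed)
    g = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.5 * gi + ui + rng.gauss(0, 1) for gi, ui in zip(g, u)]
    y = [true_effect * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return g, x, y

g, x, y = simulate()
naive = slope(x, y)               # ordinary regression, biased upward by U
wald = slope(g, y) / slope(g, x)  # instrumental-variable (Wald ratio) estimate
print(f"naive regression: {naive:.2f}, Wald ratio: {wald:.2f}")
```

The estimate is valid only if the genotype affects the outcome solely through the exposure and is not itself an effect of anything in the model. This is the same "not-an-effect" requirement described above.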
17. Rheumatoid Arthritis (RA)
- An autoimmune disease
- Chronic inflammation of the joints
- Three times more likely to occur in women than in men
- Age of onset: 40-60
- Twin concordance rates: 12-15% for MZ (identical) twins, 5% for DZ (fraternal) twins
- Genetic and environmental (e.g., smoking) risk factors
18. MHC/HLA: the main genetic contribution to RA
- MHC (Major Histocompatibility Complex) or HLA (Human Leukocyte Antigens): the HLA-DRB1 gene on chromosome 6 (6p21.3)
- The RA-associated alleles are HLA-DRB1*0401, *0404, *0408 (Caucasian), but not *0402, *0403, *0407
- In Asian populations, different DRB1 alleles are associated with RA (e.g., *0405, *0901)
- A group of DRB1 risk alleles is called the shared epitope (SE), or rheumatoid epitope; it encodes amino acid positions 70-74 in the third hypervariable region
19. Two auto-antibodies are strongly associated with RA: RF and anti-CCP
- RF (rheumatoid factor): 80% of RA patients are RF positive
- Anti-CCP (anti-cyclic citrullinated peptide antibody): an even better predictor of RA at an early stage
- HLA-DRB1, RF, and anti-CCP are all associated with RA, and they are associated with each other. The CCC rule can be applied!
- Reference: …, 2004, 2052-57
20. Q: Between RF and anti-CCP, which one is the cause and which is the effect?
21. 1723 Caucasian RA patients
(Stratified by anti-CCP status: anti-CCP positive vs. anti-CCP negative)
22. The association between RF and DRB1 genotype is lost conditional on anti-CCP status
23. By the CCC rule, anti-CCP is the cause and RF is the effect
- Or: anti-CCP is upstream and RF is downstream in a pathway
24. Discussions/Issues
- There is evidence that RA patients become anti-CCP positive before becoming RF positive
- The three-way correlation might be lost in normal controls (here we have a case-only analysis)
- Other factors may lie between anti-CCP and RF (so the cause-effect relationship may not be direct)
- It is not clear where the smoking factor comes in (this could be an intriguing analysis with smoking data!)
25. Revisiting Katan's Mendelian Randomization (MR) by LCD (Wang, Li, unpublished)
- MR needs a not-an-effect variable (a gene)
- Conditional association is not used
- Only a counterexample is needed (e.g., Apo E2 carriers have low cholesterol but NOT high cancer risk)
- LCD needs a variable that is not a cause
- Conditional association is used
- Complete information on the (G, IP, D) trio (genotype, intermediate phenotype, disease) is needed for all samples (e.g., ApoE genotype, cholesterol level, cancer status)
26. Co-Authors
- Mingyi WANG (Zhejiang University, Computer Science Department: causal inference)
- Patricia Irigoyen, Peter Gregersen (North Shore-LIJ: RA data)