Title: Genetic Diagnosis: are we there yet?
1. Genetic Diagnosis: are we there yet?
- Wentian Li, Ph.D.
- Robert S. Boas Center for Genomics and Human Genetics
2. Personal genomics (SNP level)
4. George Church's Personal Genome Project
http://www.personalgenomics.org/
5. David Duncan's Experimental Man Project
- human guinea pig
- molecular autobiography
- http://experimentalman.com/
6. Genetic factors
Environmental factors
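7. Confusion matrix
                          actually affected    actually healthy
  Classifier says DIS     a                    b (type-I error)
  Classifier says HEA     c (type-II error)    d
- n = a + b + c + d
- classification rate = (a + d)/n
- classification error = (b + c)/n
- sensitivity = a/(a + c) (power; proportion of cases explained)
- specificity = d/(b + d)
- classification rate (v2) = (sensitivity + specificity)/2
- odds ratio (OR) = ad/(bc)
- Case-control design with a 1:1 ratio: we can't calculate risk, attributable risk, etc. (the fraction of cases is fixed by the design, not by disease prevalence)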
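A minimal sketch collecting this slide's definitions in one place. The cell labels a, b, c, d follow this slide's layout; the class name and the counts in the usage line are hypothetical, chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """2x2 table: rows are the classifier call, columns the true status."""
    a: int  # actually affected, classifier says DIS
    b: int  # actually healthy, classifier says DIS (type-I error)
    c: int  # actually affected, classifier says HEA (type-II error)
    d: int  # actually healthy, classifier says HEA

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def classification_rate(self) -> float:
        return (self.a + self.d) / self.n

    def classification_error(self) -> float:
        return (self.b + self.c) / self.n

    def sensitivity(self) -> float:
        """Power; proportion of cases explained."""
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def classification_rate_v2(self) -> float:
        """Average of sensitivity and specificity."""
        return (self.sensitivity() + self.specificity()) / 2

    def odds_ratio(self) -> float:
        return (self.a * self.d) / (self.b * self.c)


cm = ConfusionMatrix(a=60, b=30, c=40, d=70)  # hypothetical counts
print(cm.classification_rate(), cm.sensitivity(), cm.specificity(), cm.odds_ratio())
# 0.65 0.6 0.7 3.5
```

With 100 cases and 100 controls, as in a 1:1 design, the two classification-rate definitions coincide (both 0.65 here). With unbalanced groups they diverge.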
8. Relationship between classification rate and odds ratio
9. Classification rate by an association signal
- The p-value of the association signal tells us almost nothing about the classification rate.
- The classification rate doesn't necessarily increase with OR (the relationship is not one-to-one).
- OR = 2 → max(classification rate) = 0.75; OR = 1.5 → max(classification rate) = 0.66.
- Two complementary OR = 2 signals (a very rare condition) → max(classification rate) = 0.75 + 0.25 × 0.75 = 0.9375.
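A small numeric sketch of two points on this slide, using the (a, b, c, d) layout from the confusion-matrix slide. The two tables are hypothetical (100 cases and 100 controls each): a rare risk genotype with a large OR can classify worse than a common one with a modest OR. The last lines reproduce the slide's arithmetic for two complementary signals, assuming the second signal correctly classifies 75% of the samples the first one gets wrong.

```python
def odds_ratio(a, b, c, d):
    """OR = ad/(bc)."""
    return (a * d) / (b * c)


def classification_rate(a, b, c, d):
    """Proportion of correctly classified samples, (a + d)/n."""
    return (a + d) / (a + b + c + d)


# Hypothetical tables (a, b, c, d), 100 cases and 100 controls each
common_modest = (60, 40, 40, 60)  # common risk genotype
rare_strong = (10, 2, 90, 98)     # rare risk genotype

for name, table in [("common_modest", common_modest), ("rare_strong", rare_strong)]:
    print(name, round(odds_ratio(*table), 2), classification_rate(*table))
# common_modest 2.25 0.6
# rare_strong 5.44 0.54


def combined_rate(rate1, rate2):
    """Signal 2 correctly classifies a fraction rate2 of the samples signal 1 misclassifies."""
    return rate1 + (1 - rate1) * rate2


print(combined_rate(0.75, 0.75))  # 0.9375
```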
10. Rheumatoid arthritis data
- HLA-DRB1 (ch6) (Stastny, NEJM, 1978)
- PTPN22 (ch1) (Begovich et al., AJHG, 2004)
- STAT4 (ch2) (Remmers et al., NEJM, 2007)
- BLK (ch8) (SLE: Hom et al., NEJM, 2008)
- TRAF1 (ch9) (Plenge et al., NEJM, 2007)
11. Gene annotation
- HLA-DRB1: the DRA-DRB heterodimer is anchored in the membrane for peptide presentation
- PTPN22: a protein tyrosine phosphatase mainly expressed in lymphoid tissue
- STAT4: a member of the STAT family of transcription factors, mediating responses to IL-12 in lymphocytes
- BLK: B lymphoid tyrosine kinase
- TRAF1: a TNF-receptor-associated factor
12. Univariate performance
13. HLA-DRB1
The dominant model has higher power; the recessive model has a lower false-positive rate.
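Why this trade-off always holds: every sample called DIS by the recessive rule (2 risk alleles) is also called DIS by the dominant rule (at least 1 risk allele). So the dominant rule can only gain sensitivity, and the recessive rule can only reduce false positives. A sketch with hypothetical genotype counts at a single locus (not the HLA-DRB1 data):

```python
# Hypothetical counts, indexed by number of risk alleles (0, 1, 2)
cases = [30, 45, 25]    # 100 cases
controls = [55, 38, 7]  # 100 controls


def evaluate(case_counts, control_counts, min_copies):
    """Call DIS when a sample carries at least `min_copies` risk alleles.

    Returns (sensitivity, false_positive_rate)."""
    tp = sum(case_counts[min_copies:])
    fp = sum(control_counts[min_copies:])
    return tp / sum(case_counts), fp / sum(control_counts)


dominant = evaluate(cases, controls, min_copies=1)   # >= 1 risk allele
recessive = evaluate(cases, controls, min_copies=2)  # 2 risk alleles
print("dominant  sens=%.2f fpr=%.2f" % dominant)   # dominant  sens=0.70 fpr=0.45
print("recessive sens=%.2f fpr=%.2f" % recessive)  # recessive sens=0.25 fpr=0.07
```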
14. ROC (receiver operating characteristic) curve for the HLA-DRB1-based classifier
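An ROC curve traces sensitivity (TPR) against the false-positive rate (FPR) as the cut-off on a risk score moves. A minimal sketch, assuming the score is the number of risk alleles a sample carries; the scores are hypothetical, not the slide's data. With this score, the dominant (score ≥ 1) and recessive (score ≥ 2) rules are two of the points on the curve.

```python
def roc_points(case_scores, control_scores):
    """(FPR, TPR) pairs for the rule 'call DIS if score >= t', over all cut-offs t."""
    points = [(0.0, 0.0)]
    for t in sorted(set(case_scores) | set(control_scores), reverse=True):
        tpr = sum(s >= t for s in case_scores) / len(case_scores)
        fpr = sum(s >= t for s in control_scores) / len(control_scores)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores: number of risk alleles carried by each sample
case_scores = [0] * 30 + [1] * 45 + [2] * 25
control_scores = [0] * 55 + [1] * 38 + [2] * 7

pts = roc_points(case_scores, control_scores)
print(pts)                 # [(0.0, 0.0), (0.07, 0.25), (0.45, 0.7), (1.0, 1.0)]
print(round(auc(pts), 5))  # 0.65675
```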
15. Multivariate analyses
- Multi-dimensional scaling
- Two-gene and multi-gene logistic regression
- Subset analysis
- AND/OR classifiers
- Recursive partitioning (decision tree) and
ensemble of trees (random forest)
16. 1) Multidimensional scaling
17. 2) Two-gene logistic regression
Each gene is represented by two dummy variables.
The baseline classification rate (assigning every sample to the larger group) is 59.8%, though (sens + sp)/2 = 50%.
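A sketch of both ideas on this slide. The dummy coding assumes genotypes are recorded as risk-allele counts with 0 copies as the reference level (the slide does not show the exact coding). The baseline assigns every sample to the larger group: its classification rate equals that group's share, while (sens + sp)/2 stays at 50%. The 598/402 split is hypothetical, chosen only to mirror the 59.8% figure.

```python
def dummy_code(copies):
    """Two 0/1 dummy variables for one gene; 0 risk alleles is the reference level."""
    return (int(copies == 1), int(copies == 2))


def majority_baseline(n_cases, n_controls):
    """Assign every sample to the larger group.

    Returns (classification rate, (sensitivity + specificity)/2)."""
    n = n_cases + n_controls
    sens, spec = (1.0, 0.0) if n_cases >= n_controls else (0.0, 1.0)
    return max(n_cases, n_controls) / n, (sens + spec) / 2


print([dummy_code(g) for g in (0, 1, 2)])  # [(0, 0), (1, 0), (0, 1)]
print(majority_baseline(598, 402))         # (0.598, 0.5)
```

Two dummies per gene let the model fit separate effects for one and two risk alleles, instead of forcing an additive (per-allele) effect.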
18. Logistic regression with more than 2 genes
19. 3) HLA-DRB1, SE = 1 samples only
27 clusters, as if there were only 3 genes (3^3 = 27 genotype combinations; STAT4 doesn't contribute). Points overlap.
20. Single-gene classifier for SE = 1 samples only
21. 4) Additive classifier (counting the total number of risk alleles in 5/4/3 genes)
A threshold of three copies of risk alleles seems to be optimal.
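A minimal sketch of the additive classifier: sum the risk alleles over the loci, call DIS when the total reaches a threshold t, and scan every t for the best (sens + spec)/2. The genotypes are hypothetical (3 loci, 4 cases, 4 controls); on this toy data the best threshold is 2. It is not meant to reproduce the slide's finding of 3 on the RA data.

```python
def total_risk_alleles(genotypes):
    """Sum of risk-allele copies (0, 1 or 2) over the loci in the model."""
    return sum(genotypes)


def threshold_sweep(cases, controls):
    """(sens + spec)/2 for the rule 'call DIS if total >= t', for every possible t."""
    max_total = 2 * len(cases[0])
    result = {}
    for t in range(max_total + 1):
        sens = sum(total_risk_alleles(g) >= t for g in cases) / len(cases)
        spec = sum(total_risk_alleles(g) < t for g in controls) / len(controls)
        result[t] = (sens + spec) / 2
    return result


# Hypothetical genotypes at 3 loci
cases = [(1, 1, 1), (2, 1, 0), (1, 0, 1), (2, 2, 1)]
controls = [(0, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 1)]

rates = threshold_sweep(cases, controls)
best = max(rates, key=rates.get)
print(best, rates[best])  # 2 0.875
```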
22. Counting the number of loci that contain risk alleles
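Counting loci rather than alleles ignores whether a locus carries one or two risk alleles. With a rule "call DIS if at least k loci carry a risk allele", k = 1 behaves as an OR classifier and k = number of loci as an AND classifier. Reading this as the AND/OR classifiers listed among the multivariate analyses is an interpretation, not stated on the slides. The genotypes below are hypothetical.

```python
def count_risk_loci(genotypes):
    """Number of loci carrying at least one risk allele."""
    return sum(1 for g in genotypes if g > 0)


def classify_by_loci(genotypes, k):
    """Call DIS if at least k loci carry a risk allele."""
    return "DIS" if count_risk_loci(genotypes) >= k else "HEA"


g1 = (2, 0, 0)  # 2 risk alleles at 1 locus
g2 = (1, 1, 0)  # 2 risk alleles spread over 2 loci
print(sum(g1), count_risk_loci(g1))  # 2 1
print(sum(g2), count_risk_loci(g2))  # 2 2
print(classify_by_loci(g1, 1), classify_by_loci(g1, 3))  # DIS HEA
```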
23. 5) Recursive partitioning (decision tree)
(Figure: decision tree with splits on DRB1, PTPN22, TRAF1 and BLK.)
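A minimal CART-style sketch of recursive partitioning: at each node, pick the binary split "risk-allele copies ≥ v" that most lowers the Gini impurity, then recurse until nodes are pure or a depth limit is hit. The gene names are borrowed from the slide, but the genotypes and labels are hypothetical, and the splitting criterion and software used in the talk are not stated.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def best_split(rows, labels, features):
    """Return (weighted_gini, feature, threshold) of the best split, or None."""
    best = None
    for f in features:
        for v in (1, 2):  # ">= 1 copy" (dominant-like) or "2 copies" (recessive-like)
            left = [y for x, y in zip(rows, labels) if x[f] < v]
            right = [y for x, y in zip(rows, labels) if x[f] >= v]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, v)
    return best


def build_tree(rows, labels, features, depth=0, max_depth=3, min_size=2):
    """Recursively partition the samples; leaves hold the majority class label."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(labels) < min_size or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, features)
    if split is None or split[0] >= gini(labels):
        return majority  # no split reduces impurity
    _, f, v = split

    def grow(pairs):
        xs, ys = zip(*pairs)
        return build_tree(list(xs), list(ys), features, depth + 1, max_depth, min_size)

    pairs = list(zip(rows, labels))
    return {"feature": f, "threshold": v,
            "left": grow([p for p in pairs if p[0][f] < v]),
            "right": grow([p for p in pairs if p[0][f] >= v])}


def predict(tree, x):
    """Follow the splits down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["right"] if x[tree["feature"]] >= tree["threshold"] else tree["left"]
    return tree


# Hypothetical genotypes (risk-allele copies) and disease status
rows = [{"DRB1": 2, "PTPN22": 0}, {"DRB1": 2, "PTPN22": 1}, {"DRB1": 1, "PTPN22": 1},
        {"DRB1": 1, "PTPN22": 0}, {"DRB1": 0, "PTPN22": 1}, {"DRB1": 0, "PTPN22": 0},
        {"DRB1": 0, "PTPN22": 0}, {"DRB1": 1, "PTPN22": 1}]
labels = ["DIS", "DIS", "DIS", "HEA", "HEA", "HEA", "HEA", "DIS"]

tree = build_tree(rows, labels, ["DRB1", "PTPN22"])
print(tree["feature"], tree["threshold"])  # DRB1 1
```

A random forest grows many such trees, each on a bootstrap sample and with a random subset of genes considered at each split, and classifies by majority vote.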
24. Random forest
25. Conclusions
- The relationship between OR and classification rate is not one-to-one.
- One should not report a sensitivity value (proportion of cases explained) without also reporting the specificity value (1 - false-positive rate).
- For RA, a classification rate above 70% can be reached, mainly due to DRB1.
- DRB1 SE = 1 is the difficult group to classify.
- For complex diseases in general, with OR = 1.1 to 1.5 signals, the classification rate (genetic diagnosis) will not be great.
- When the true causal factors are not included in the data (e.g. environmental factors, CNVs), we should not expect a 100% classification rate.
26. Thanks
- Peter Gregersen
- Jan Freudenberg
- Jinfeng Xu (National Univ. Singapore)
27. Future work
- Classification rate within stratified groups
- Relationship between heritability (defined for quantitative traits) and classification rate
- How many genes/loci are needed to explain all of the genetic contribution to the disease? The end of genetic studies? Rather, the beginning of gene-environment studies!