Bioinformatics Tools for Biomarkers Discovery - PowerPoint PPT Presentation

About This Presentation

Title:

Bioinformatics Tools for Biomarkers Discovery

Description:

Bioinformatics Tools for Biomarkers Discovery. Stephen GRANITE & Aik Choon TAN; Prof. Raimond L. Winslow rwinslow_at_jhu.edu, Director, CCBM,

Slides: 26

Provided by: actan

Transcript and Presenter's Notes

1
Bioinformatics Tools for Biomarkers Discovery

Stephen GRANITE, Director, Software/Database Development, sgranite_at_jhu.edu
Aik Choon TAN, IBM/CCBM Post-Doc Research Fellow, actan_at_jhu.edu
Prof. Raimond L. Winslow, Director, CCBM, rwinslow_at_jhu.edu
Prof. Donald Geman, geman_at_jhu.edu
Prof. Daniel Naiman, daniel.naiman_at_jhu.edu
Lei Xu, leixu_at_jhu.edu
Troy Anderson, troy_anderson_at_jhu.edu
The Institute for Computational Medicine and Center for Cardiovascular Bioinformatics and Modeling (CCBM), Johns Hopkins University
2
Biomarkers Discovery Workflow
[Workflow diagram; labels recovered from the figure]
Patients, Sample Collection, Experiments
Transcriptomics Pipeline: Gene Expression Profiling, with Store/Query against MAGE-DB2
Proteomics Pipeline: Mass Spectrometry and Difference Gel Electrophoresis, with Store/Query against PROTEIN-DB2
Machine Learning: Relative Expression Reversal Classifiers
Decision Rules, Candidate Biomarkers, Follow-up Study, Clinical Applications
Legend: Available at CCBM
3
Outline

Multi-scale biomedical data repositories
System Architecture
Relative Expression Reversal Classifiers
TSP and k-TSP classifiers
Microarray gene expression data
Results on binary and multi-class disease classification problems
Data Integration and Cross-platform analysis
Difference Gel Electrophoresis (DIGE) Proteomics data
Results on disease classification
Conclusions

4
Multi-scale Biomedical Repositories

The MAGE-DB2 Project is developing a full relational mapping of the MicroArray Gene Expression (MAGE) object model (OM), optimized to run on IBM's scalable, parallel database DB2.
The PROTEIN-DB2 Project is developing an open-source relational implementation of the Proteomics Standards Initiative (PSI) object model for storing complete descriptions of a range of proteomic experimental data and analyses.

(Granite et al.)
5
PROTEIN-DB2 Primary Data / Analysis Storage

Two-dimensional Gel Electrophoresis Images/Analyses
  2D-PAGE / Nonlinear Dynamics Progenesis Analysis
  DIGE / GE Amersham DeCyder Analysis
Two-dimensional Liquid Chromatography
  Beckman-Coulter PF2D primary data
Protein Array
  Beckman-Coulter A2 primary data
Mass Spectrometry (MS) primary data / mzXML translation
  Applied Biosystems Voyager
  ABI/SCIEX QStar
  Shimadzu Axima
  ThermoFinnigan LCQ and LTQ
MS Search Results
  Matrix Science Mascot HTML and XML output

(Granite et al.)
6
MAGE-DB2/PROTEIN-DB2 Architecture
(Granite et al)
7
MAGE-DB2/PROTEIN-DB2 Webpages
http://proteomics.jhu.edu/dl/pathidb.php
http://lpar4.wbmei.jhu.edu/wps/portal
(Granite et al.)
8
Relative Expression Reversal Classifiers

Pairwise rank-based comparisons (relative expression values within each array)
Generates accurate and simple decision rules
TSP classifier: Top Scoring Pair
k-TSP classifier: k disjoint Top Scoring Pairs
Data-driven, parameter-free learning algorithm
Performance is comparable to or exceeds that of other machine learning methods
Easy to interpret, facilitating follow-up study (small number of genes)

(Tan et al., 2005, Bioinformatics, 21:3896-3904)
9
Rank-based Classification

Novelty: Replace the measured expression values by their ranks within profiles, hence obtaining invariance to normalization.
Example: Differentiate between classes by finding pairs of genes whose ordering typically changes from Normal to Disease.
Simple interpretation: Inversion of mRNA abundance.

(From D. Geman)
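The invariance claim can be checked with a small sketch. It assumes NumPy; the helper name `within_profile_ranks` and the intensity values are hypothetical and only illustrate that any strictly increasing transformation of a profile leaves the within-profile ranks unchanged.

```python
import numpy as np

def within_profile_ranks(profile):
    """Rank each gene within one array (1 = lowest expression)."""
    order = np.argsort(profile, kind="stable")
    ranks = np.empty(len(profile), dtype=int)
    ranks[order] = np.arange(1, len(profile) + 1)
    return ranks

# Hypothetical raw intensities for four genes on one array.
raw = np.array([120.0, 35.0, 980.0, 410.0])

# Two strictly increasing transformations, standing in for normalization steps.
log_scaled = np.log2(raw)
rescaled = 2.5 * raw + 10.0

ranks_raw = within_profile_ranks(raw)
print(ranks_raw)
print(np.array_equal(within_profile_ranks(log_scaled), ranks_raw))
print(np.array_equal(within_profile_ranks(rescaled), ranks_raw))
```

Because only the ordering of genes inside a profile is used, the comparison "gene i above gene j" gives the same answer before and after such a transformation.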
10
TSP Classifier

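For each pair of genes $(i, j)$, $i \neq j$, $1 \le i, j \le G$, compute
$P_{ij}(\mathrm{Normal}) = \mathrm{Prob}(R_i > R_j \mid \mathrm{Normal})$
$P_{ij}(\mathrm{Disease}) = \mathrm{Prob}(R_i > R_j \mid \mathrm{Disease})$
$\Delta_{ij} = |P_{ij}(\mathrm{Normal}) - P_{ij}(\mathrm{Disease})|$
where $R_i$ is the rank of gene $i$ within a profile.
Select only the top scoring pairs: $\{(i, j) : \Delta_{ij} = \Delta_{\max}\}$
The TSP classifier ($h_{TSP}$) is based on these pairs
Example: Let all the top scoring pairs vote (Geman et al., 2004)
Example: Select one unique top scoring pair $(i, j)$, based on maximizing the difference in ranks (Tan et al., 2005)
Prediction: Suppose $P_{ij}(\mathrm{Normal}) > P_{ij}(\mathrm{Disease})$ and let $x_{new}$ be a new profile with ranks $R_{i,new}$ and $R_{j,new}$. Then
$h_{TSP}(x_{new}) = \mathrm{Normal}$ if $R_{i,new} > R_{j,new}$, and $\mathrm{Disease}$ otherwise. (1)
If, on the other hand, $P_{ij}(\mathrm{Disease}) > P_{ij}(\mathrm{Normal})$, then the decision rule is reversed.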

(Tan et al., 2005, Bioinformatics, 21:3896-3904)
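The scoring and prediction steps on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 0/1 class coding (0 = Normal, 1 = Disease), and the toy matrix are assumptions. Raw values are compared directly, which is equivalent to comparing within-profile ranks when there are no ties, and ties between equally scored pairs are resolved by taking the first one rather than by the rank-difference criterion mentioned above.

```python
import numpy as np

def tsp_scores(X, y):
    """X: samples x genes matrix; y: 0 = Normal, 1 = Disease.
    Returns genes x genes matrices (p_normal, p_disease, delta), where
    p_c[i, j] estimates Prob(R_i > R_j | class c)."""
    # greater[s, i, j] is True when gene i is above gene j in sample s.
    greater = X[:, :, None] > X[:, None, :]
    p_normal = greater[y == 0].mean(axis=0)
    p_disease = greater[y == 1].mean(axis=0)
    return p_normal, p_disease, np.abs(p_normal - p_disease)

def fit_tsp(X, y):
    """Return one top scoring pair (i, j), oriented so that
    gene i > gene j is the ordering seen more often in Normal."""
    p_normal, p_disease, delta = tsp_scores(X, y)
    i, j = np.unravel_index(np.argmax(delta), delta.shape)
    if p_normal[i, j] < p_disease[i, j]:
        i, j = j, i  # reverse the rule
    return int(i), int(j)

def predict_tsp(pair, x_new):
    """Predict 0 (Normal) if gene i is above gene j in x_new, else 1 (Disease)."""
    i, j = pair
    return 0 if x_new[i] > x_new[j] else 1

# Hypothetical toy data: 3 Normal and 3 Disease profiles over 3 genes.
X = np.array([
    [5.0, 1.0, 3.0],
    [6.0, 2.0, 7.0],
    [4.0, 0.5, 4.5],
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 1.0],
    [0.5, 4.0, 4.5],
])
y = np.array([0, 0, 0, 1, 1, 1])

pair = fit_tsp(X, y)
print(pair)
print(predict_tsp(pair, np.array([3.0, 1.0, 2.0])))
print(predict_tsp(pair, np.array([1.0, 2.0, 0.0])))
```

In the toy data, gene 0 is above gene 1 in every Normal profile and below it in every Disease profile, so that pair reaches the maximum score of 1.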
11
k-TSP Classifier

Uses exactly k top disjoint pairs in prediction.
k is determined by internal cross-validation.
Ensemble learning: combine the discriminating power of many weaker rules to make more reliable predictions.
Prediction:
Suppose $x_{new}$ is a new profile; each gene pair $(i_u, j_u)$, $u = 1, \ldots, k$, votes according to the TSP decision rule (1).
The k-TSP classifier $h_{k\text{-}TSP}$ employs an unweighted majority voting procedure to obtain the final prediction $y_{new}$.

(Tan et al., 2005, Bioinformatics, 21:3896-3904)
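A rough sketch of the disjoint-pair selection and majority vote is given below. It is an illustration under stated assumptions rather than the published algorithm: the function names, the 0/1 class coding, and the toy data are hypothetical, k is passed in directly instead of being chosen by internal cross-validation, and pairs are ranked by the score alone.

```python
import numpy as np

def fit_ktsp(X, y, k):
    """Greedily select up to k disjoint gene pairs with the highest scores.
    X: samples x genes; y: 0 = Normal, 1 = Disease."""
    greater = X[:, :, None] > X[:, None, :]
    p_normal = greater[y == 0].mean(axis=0)
    p_disease = greater[y == 1].mean(axis=0)
    # Keep the upper triangle so each unordered pair is considered once.
    delta = np.triu(np.abs(p_normal - p_disease), k=1)
    pairs, used = [], set()
    for flat in np.argsort(delta, axis=None)[::-1]:  # best score first
        i, j = (int(v) for v in np.unravel_index(flat, delta.shape))
        if delta[i, j] == 0 or len(pairs) == k:
            break
        if i in used or j in used:
            continue  # a gene may appear in at most one pair
        used.update((i, j))
        # Orient so that gene i > gene j is more frequent in Normal.
        pairs.append((i, j) if p_normal[i, j] >= p_disease[i, j] else (j, i))
    return pairs

def predict_ktsp(pairs, x_new):
    """Unweighted majority vote over the selected pairs (use an odd k)."""
    disease_votes = sum(1 for i, j in pairs if not (x_new[i] > x_new[j]))
    return 1 if disease_votes > len(pairs) / 2 else 0

# Hypothetical toy data: 4 Normal then 4 Disease profiles over 6 genes.
X = np.array([
    [150, 110, 15, 11, 1.5, 1.1],
    [160, 120, 16, 12, 1.6, 1.2],
    [170, 130, 17, 13, 1.7, 1.3],
    [180, 140, 18, 14, 1.4, 1.8],
    [110, 150, 11, 15, 1.1, 1.5],
    [120, 160, 12, 16, 1.2, 1.6],
    [130, 170, 13, 17, 1.3, 1.7],
    [140, 180, 18, 14, 1.8, 1.4],
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

pairs = fit_ktsp(X, y, k=3)
print(pairs)
print(predict_ktsp(pairs, np.array([100, 120, 12, 16, 1.9, 1.2])))
print(predict_ktsp(pairs, np.array([150, 100, 13, 11, 1.0, 1.5])))
```

The three gene blocks in the toy data sit on different scales, so only the within-block pairs change order between classes, which makes the selected pairs easy to verify by hand.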
12
Microarray Data Sets
(Binary class Problems)
(Multi-class Problems)
(Tan et al., 2005, Bioinformatics, 21:3896-3904)
13
Results (LOOCV, Binary Class Problems)
Number of Informative Genes
(Tan et al., 2005, Bioinformatics, 21:3896-3904)
14
Results (Test Accuracy for Multi-Class Problems)
Number of Informative Genes
(Tan et al., 2005, Bioinformatics, 21:3896-3904)
15
(a) TSP
ALL
AML
IF SPTAN1 ≥ CD33 THEN ALL ELSE AML (score 0.9787)
(b) k-TSP
IF SPTAN1 ≥ CD33 THEN ALL ELSE AML (score 0.9787)
IF HA-1 ≥ ZYX THEN ALL ELSE AML (score 0.9787)
IF TCF3 > APLP2 THEN ALL ELSE AML (score 0.9574)
IF ATP2A3 ≥ CST3 THEN ALL ELSE AML (score 0.9387)
IF DGKD > MGST1 THEN ALL ELSE AML (score 0.9387)
IF CCND3 ≥ NPC2 THEN ALL ELSE AML (score 0.9387)
IF TOP2B > PLCB2 THEN ALL ELSE AML (score 0.9387)
IF Macmarcks ≥ CTSD THEN ALL ELSE AML (score 0.9362)
IF PSMB8 ≥ DF THEN ALL ELSE AML (score 0.9200)
Genes previously identified by Golub et al. (1999)
(Tan et al., 2005, Bioinformatics, 21:3896-3904)
16
Direct Data Integration
Lab A
Lab X
Lab B
Lab Y
Lab C
(Lei Xu et al., 2005, Bioinformatics, 21:3905-3911)
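One way to picture direct integration is to pool samples from several labs over their shared genes and compare genes only within each profile, with no cross-study normalization. The sketch below is an assumption-laden illustration, not the pipeline of Xu et al.: the `integrate` helper, the third gene name `GENE_A`, and all expression values are hypothetical; only the gene names HPN and STAT6 come from the slides.

```python
import numpy as np

def integrate(studies):
    """Pool studies on their common genes without any normalization.
    studies: list of (gene_names, X, y) with X as samples x genes.
    Returns (common_genes, pooled_X, pooled_y)."""
    common = sorted(set.intersection(*(set(genes) for genes, _, _ in studies)))
    blocks, labels = [], []
    for genes, X, y in studies:
        cols = [list(genes).index(g) for g in common]  # align columns by gene name
        blocks.append(np.asarray(X, dtype=float)[:, cols])
        labels.append(np.asarray(y))
    return common, np.vstack(blocks), np.concatenate(labels)

# Hypothetical Lab A: raw intensities, three genes. Labels: 0 = Normal, 1 = Cancer.
lab_a = (["HPN", "STAT6", "GENE_A"],
         [[200.0, 900.0, 50.0], [1500.0, 400.0, 60.0]],
         [0, 1])
# Hypothetical Lab B: log-scale values, different gene order and platform.
lab_b = (["STAT6", "HPN"],
         [[9.1, 7.2], [8.0, 10.5]],
         [0, 1])

common, pooled_X, pooled_y = integrate([lab_a, lab_b])
hpn, stat6 = common.index("HPN"), common.index("STAT6")

# The within-profile comparison is meaningful even though the scales differ.
votes = (pooled_X[:, hpn] > pooled_X[:, stat6]).astype(int)
print(common)
print(votes.tolist(), pooled_y.tolist())
```

The pooled matrix mixes raw and log-scale values, which would be meaningless for methods that compare values across samples, but the ordering of two genes inside one profile is unaffected.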
17
Data Sets
(Lei Xu et al., 2005, Bioinformatics, 21:3905-3911)
18
TSPs from Data Integration
(Lei Xu et al., 2005, Bioinformatics, 21:3905-3911)
19
Results on Test Set
Comparisons of Marker TSP with Individual TSPs
(Lei Xu et al., 2005, Bioinformatics, 21:3905-3911)
20
Marker TSP for Prostate Cancer

HPN (Hepsin): biomarker candidate for prostate cancer
STAT6 (Signal transducer and activator of transcription 6)

IF HPN > STAT6 THEN Prostate Cancer ELSE Normal
PSA (Prostate Specific Antigen): Sn 67.5-80%, Sp 60-70%
TSP (HPN, STAT6): Sn 91.7%, Sp 97.7% (from this study)
(Lei Xu et al., 2005, Bioinformatics, 21:3905-3911)
21
DIGE Technology
(From http://www5.amershambiosciences.com)
Proteomics Data
Experimental Settings
Gels
18 experiments: Cy2 Internal Standards (18), Cy3 Cancer gels (18), Cy5 Normal gels (18)
1098 protein spots (BVA ratios from DeCyder software)
(Troy Anderson et al.)
22
Decision Rule
Decision Rule: IF Ratio530 ? Ratio786 THEN Cancer, ELSE Normal.
LOOCV Results:
Accuracy 97.2% (35/36)
Sensitivity 100% (18/18)
Specificity 94.4% (17/18)
(Troy Anderson et al.)
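The reported percentages follow directly from the counts on this slide. The short sketch below reproduces the arithmetic; the function name `loocv_summary` is hypothetical, and the split into true and false positives and negatives is inferred from the fractions 18/18 and 17/18 with Cancer treated as the positive class.

```python
def loocv_summary(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from leave-one-out prediction counts."""
    sensitivity = tp / (tp + fn)              # cancer gels called Cancer
    specificity = tn / (tn + fp)              # normal gels called Normal
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

# 18 cancer gels all correct; 17 of 18 normal gels correct.
acc, sn, sp = loocv_summary(tp=18, fn=0, tn=17, fp=1)
print(f"Accuracy {acc:.1%}, Sensitivity {sn:.1%}, Specificity {sp:.1%}")
```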
23
Protein Marker Spots
(Troy Anderson et al)
24
http://www.ccbm.jhu.edu
25
Conclusions

Bioinformatics tools to facilitate biomarkers discovery
k-TSP is comparable with state-of-the-art classifiers (PAM, SVM) in classifying gene expression profiles
k-TSP generates simple and accurate decision rules
Biological significance
Easy to interpret
Potential clinical applications
Allows direct data integration without performing normalization
Allows cross-platform analysis
Applicable to a wide range of high-throughput data