Title: All Systems Go proteomics, microarrays and biomarkers
1All Systems Go! proteomics, microarrays and
biomarkers
- Howard Edenberg, Samiran Ghosh, Dan Li, Xiaoman Li, Lang Li,
Yunlong Liu, Malika Mahoui, Jeanette McClintick, Predrag Radivojac,
Pedro Romero, Changyu Shen, Haixu Tang
- 2007 Indiana University Computational Biology and
Bioinformatics Retreat
2Our Group
Howard Edenberg: Chancellor Prof., Biochemistry, Ctr for Med Genomics, IUSM
Xiaoman Li: Assistant Prof., Biostatistics/CCBB, IUSM
Predrag Radivojac: Assistant Prof., School of Informatics, IUB
Samiran Ghosh: Assistant Prof., Math. Dept., IUPUI
Yunlong Liu: Assistant Prof., Biostatistics/CCBB, Ctr for Med Genomics, IUSM
Pedro Romero: Assistant Prof., School of Informatics, IUPUI
Shuyu Dan Li: Sr. Scientist, Informatics, Eli Lilly and Company
Malika Mahoui: Assistant Prof., School of Informatics, IUPUI
Changyu Shen: Assistant Prof., Biostatistics, IUSM
Lang Li: Associate Prof., Biostatistics, IUSM
Jeanette McClintick: Assistant Prof., Biochemistry, Ctr for Med Genomics, IUSM
Haixu Tang: Assistant Prof., School of Informatics, IUB
3Our group
Institution
Title
4Outline
- Overview of the field
- Important topics
- Highlights of the research contributions from our
group
- Potential collaboration opportunities
- Discussion (Haixu Tang)
5Fishing expeditions vs. hypothesis-driven
It (the human genome project) was no more than a
big fishing expedition, a mindless factory
project that no scientists in their right minds
would join.
Data- and technology-driven studies are not
alternatives to hypothesis-driven studies, but
are complementary and iterative partners with
them.
6Hypothesis/data-driven research
Kitano, Science 2002, Vol. 295, no. 5560, pp. 1662-1664
7Key steps of high-throughput analysis
Biological question
Experimental process
- Experimental design
- Biological techniques
- Sample preparation
- High-throughput platform selection
Data extraction
- Quantification and data transformation
- Normalization
- Data cleaning
- Statistical analysis
- Multiple hypothesis testing
Data interpretation and modeling
- Regulatory networks
- Protein-protein/DNA interaction
- Networks/pathway analysis
- Sequence analysis
- Functional analysis
Biological application
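The multiple hypothesis testing step can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up procedure, one common way to control the false discovery rate when thousands of genes or peptides are tested at once. The slide does not name a specific procedure, so the choice of Benjamini-Hochberg, the function name, and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Illustrative sketch: controls the false discovery rate at level alpha
    using the Benjamini-Hochberg step-up rule.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis whose rank is at most max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for ten probe sets (made up for this example).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_calls = benjamini_hochberg(example_p, alpha=0.05)
```

Because the rule is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, as long as a larger p-value further down the ordering meets its threshold.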
8Important topics in the field
- Platform selection, evaluation, and analysis (McClintick, Li)
- High dimensional data vs. underpowered experiment (Ghosh, McClintick)
- Integration of experimental data and biological knowledge to improve detection accuracy (Shen, Li)
- Integration of data from different technologies (Tang)
- Blind vs. targeted biomarker discovery (Tang)
- Huge amount of data vs. limited knowledge (McClintick)
- Software (McClintick)
- Data sharing (MIAME standard and GEO database) (McClintick)
9Highlights of the research contributions from our group
10SAGE vs. microarray
- Results
- Significant discrepancies between the two platforms: only 30-40% of genes exhibited positive correlations
- The discrepancies are not caused by heterogeneity of tissue sources, microarray probe designs, mRNA abundance, or gene function
- Reasons
- Errors in SAGE tag annotation
- Splice variants
- SAGE tags and array probesets represent different regions of the same genes
Li S., Li Y. H., Wei T., Su E. W., Duffin K., and
Liao B. (2006). Biology Direct 1, 33
Shuyu Dan Li Eli Lilly
11Removing junk from valuables
Affymetrix platform: detection calls for each
probe set (Present, Marginal, and Absent)
Pre-filtering of microarray data to improve false
discovery rate.
Use of a threshold fraction of Present detection
calls (derived by MAS5) provided a simple method
that effectively eliminated from analysis probe
sets that are unlikely to be reliable while
preserving the most significant probe sets and
those turned on or off; it thereby increased the
ratio of true positives to false positives.
Howard Edenberg
McClintick JN, Edenberg HJ. BMC Bioinformatics
2006, 7:49.
Jeanette McClintick
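The Present-call pre-filter described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the fraction of Present calls is computed within each experimental group and that a probe set is kept when any one group reaches the threshold, which is one way to retain probe sets that are turned on or off between groups. The 0.5 threshold, the function names, and the probe set IDs are hypothetical.

```python
def present_fraction(calls):
    """Fraction of arrays on which a probe set was called Present ('P')."""
    if not calls:
        return 0.0
    return sum(1 for call in calls if call == "P") / len(calls)


def filter_probe_sets(calls_by_probe, min_fraction=0.5):
    """Keep probe sets whose Present fraction reaches min_fraction in any group.

    calls_by_probe maps a probe set ID to a dict of group name -> list of
    detection calls ('P' = Present, 'M' = Marginal, 'A' = Absent).
    The per-group rule and the default threshold are assumptions.
    """
    kept = []
    for probe_id, groups in calls_by_probe.items():
        if any(present_fraction(calls) >= min_fraction for calls in groups.values()):
            kept.append(probe_id)
    return kept


# Hypothetical detection calls for three probe sets across two groups.
example_calls = {
    "probe_reliable": {"control": ["P", "P", "P", "M"], "treated": ["P", "P", "A", "P"]},
    "probe_turned_on": {"control": ["A", "A", "A", "A"], "treated": ["P", "P", "P", "A"]},
    "probe_noise": {"control": ["A", "A", "M", "A"], "treated": ["A", "P", "A", "A"]},
}
kept_probes = filter_probe_sets(example_calls)
```

In this toy input the probe set that is Absent in controls but Present in treated samples survives the filter, while the one that is mostly Absent everywhere is dropped before any statistical testing, which reduces the number of hypotheses tested.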
12Measuring undetectables?
Go fishing!
Conclusion: Lake Monroe has twice as many yellow fish
as blue fish.
- Peptide detectability
- Probability of observing a peptide in a standard sample
- An intrinsic property of the peptide sequences.
Predrag Radivojac
Protein abundance Protein detectability
Protein measurement
Tang et al. Bioinformatics. 2006 Jul
15;22(14):e481-8.
Haixu Tang
13Finding partners
- Using an empirical Bayes model to analyze yeast two-hybrid data.
- Around 1% of the protein pairs are interacting partners. The multi-protein pull-down experiment has high specificity but mediocre sensitivity (50-70%).
- There should be an average of about 20 true associations per MPC (multiprotein complex), almost 10 times as high as was previously estimated.
Changyu Shen
Lang Li
Shen et al., Proteins: Structure, Function, and
Bioinformatics, 2006, 64, 436-443
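To see why specificity matters so much when only about 1% of protein pairs truly interact, the figures on this slide can be plugged into plain Bayes' rule. This is a worked arithmetic sketch, not the empirical Bayes model from the cited paper. The prior (0.01) and sensitivity (0.6, inside the 50-70% range) come from the slide; the slide only says specificity is "high", so the two specificity values below are assumptions.

```python
def posterior_true_interaction(prior, sensitivity, specificity):
    """Probability that an observed association is a true interaction.

    Plain Bayes' rule: P(true | observed) =
        sens * prior / (sens * prior + (1 - spec) * (1 - prior)).
    Assumes the denominator is non-zero.
    """
    true_positive_mass = sensitivity * prior
    false_positive_mass = (1.0 - specificity) * (1.0 - prior)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


# Prior and sensitivity taken from the slide; specificities are assumed values.
posterior_at_99 = posterior_true_interaction(prior=0.01, sensitivity=0.6, specificity=0.99)
posterior_at_999 = posterior_true_interaction(prior=0.01, sensitivity=0.6, specificity=0.999)
```

Under these assumed numbers, raising specificity from 0.99 to 0.999 moves the posterior from well below one half to well above it, because false positives from the 99% of non-interacting pairs otherwise outnumber the true positives.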
14Sampling motifs from my root
- Methods: finding motifs by using (1) overrepresentation and (2) evolutionary conservation properties of motifs
- Contribution
- Applicable to divergent species where alignment is unreliable
- Greatly improved prediction accuracy.
Xiaoman Li Biostat, IUSM
Li et al. (2005) PNAS 102:9481-6. Li et al.
(2005) PNAS 102:16945-50.
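The first criterion named above, overrepresentation, can be illustrated with a simple k-mer enrichment score. This sketch is not the sampling method of the cited papers and does not model evolutionary conservation; it only ranks k-mers by how much more frequent they are in a foreground set of sequences than in a background set. The function names, the pseudocount, and the toy sequences are assumptions.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every overlapping k-mer across a list of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def overrepresented_kmers(foreground, background, k, pseudocount=1.0):
    """Rank foreground k-mers by foreground/background frequency ratio.

    A pseudocount is added to every possible k-mer (4**k of them) so that
    k-mers absent from the background do not cause division by zero.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values()) + pseudocount * 4 ** k
    bg_total = sum(bg.values()) + pseudocount * 4 ** k
    scores = {}
    for kmer, count in fg.items():
        fg_freq = (count + pseudocount) / fg_total
        bg_freq = (bg[kmer] + pseudocount) / bg_total
        scores[kmer] = fg_freq / bg_freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy sequences made up for this example; each foreground one contains TGACTCA.
foreground_seqs = ["AATGACTCATT", "CCTGACTCAGG", "GTTGACTCAAC"]
background_seqs = ["ACGTACGTACG", "TTGGCCAATTG", "GCATGCATGCA"]
ranked = overrepresented_kmers(foreground_seqs, background_seqs, k=7)
top_kmer = ranked[0][0]
```

A real analysis would combine a score like this with conservation evidence across species, which is the second criterion listed on the slide.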
15Finding controllers
Understand how transcription factors work
cooperatively to give rise to global gene expression
patterns.
Quantitative relationship
Howard Edenberg
Liu et al. Genomics. 2006 Oct;88(4):452-61.
Yunlong Liu
16Protein-DNA binding pattern matters
The first set of transcription factor binding
patterns that distinguish the ERα up/down-regulated
targets in breast cancer cell lines.
(Li et al., Bioinformatics, 2006, 22: 2210-2216)
Lang Li Biostat, IUSM
17Collaboration with other areas
- Databases and Datamining
- Networks and Pathways
- Proteomics, Microarrays, Biomarkers
- Structure and Function
- Machine Learning and Prediction
- Mutations and Disease
- Protein Ligand Interactions
- Cheminformatics and Cyberinfrastructure
- Academic Matters
18Collaborations with other areas
Biological question
Experimental process
Data extraction
Machine Learning and Prediction Datamining
Networks and Pathways Structure and Function
Data interpretation and modeling
Mutations and Disease
Protein Ligand Interactions
Biological application
Databases