Title: Introduction to Bioinformatics/Practical Applications
1Introduction to Bioinformatics/Practical
Applications
- Uma Chandran, PhD, MSIS
- Department of Biomedical Informatics
- University of Pittsburgh
- chandranur_at_msx.upmc.edu
- 412-623-7841
- 10/20/08
2My Background
- Bioinformatics Service
- UPCI
- Department of Biomedical Informatics
- Clinical Genomics Facility
- Runs expression, SNP and microRNA microarrays
- Bioinformatics tightly integrated with data
analysis
- Expression, SNP, proteomic, integration of
proteomic and genomic data
3Outline of course
- What is Bioinformatics?
- Bioinformatics Concepts and Challenges
- Role in Pathology: Cancer Biomarker Discovery
- Role in Translational research
4What is Bioinformatics?
- http://en.wikipedia.org/wiki/Bioinformatics
- Application of information technology to
molecular biology
- Databases
- Algorithms
- Statistical techniques
5Bioinformatics in Pathology
- Cancer Biomarker discovery
- Prognostic markers
- Good v bad outcome
- Predictive markers
- Will patient benefit from a particular drug
treatment?
- Pharmacogenomic markers
- Effect of drug on tumor
- Genomic and proteomic technologies use
bioinformatics approaches
- Potential to impact diagnostics
Nature 452:548
6Why should pathologists develop bioinformatics
skills?
- Pathology will see a gradual shift from cell
morphology to molecular structure and function of
cells as an adjunct to the diagnostic process.
In the future, pathologists must be able to
render a tissue diagnosis - most particularly for
but not necessarily confined to tumors - that takes
advantage of not only morphologic observations
but also related to genomic and proteomic testing
Bruce Friedman, U Mich
7What do these studies have in common?
- Use high throughput platforms to characterize
- DNA level: single nucleotide polymorphisms
(SNP), CN and LOH: SNP chips
- Changes in mRNA profile: chips
- Changes in protein levels: mass spec
- Tissue microarrays: staining many samples
simultaneously
- microRNA: microRNA chips
8 High throughput technology
- Ability to interrogate 1000s of genes/proteins
- Genomics
- DNA
- SNPs: CN and LOH with disease
- RNA
- RNA profile with disease and normal
- Proteomics
- Mass Spec
- Protein Arrays
- Citations
- 2000: 400 articles; 2008: 20000 articles!!!
9Expression chips
- Chip
- Probes for genes
- Labeled samples
- Hybridized to chips
- Detection
- Quantitation
- Comparison
10Bioinformatics (Biology + Computation)
- Analysis
- Algorithms, statistics
- Advanced Analysis and visualization
- Annotation, pathways
- Databases to store and share, annotate
information
Lincoln Stein: just another tool, like the
microscope, to study biology
11History of Bioinformatics
- Computational Biology
- Margaret Dayhoff was a pioneer
- 1965 - Created searchable protein databases
- 1982 - DNA database GenBank
- Bioinformatics apps
- algorithms for assessing similarity
- What changes in proteins are tolerated?
- Search protein or DNA sequence for similarity
12Bioinformatics at your desktop
- Studies conducted by clinicians/scientists
- Translational: benchside to bedside
- Need interdisciplinary team
- Clinicians, bioinformaticians, statisticians
- Researchers need a basic understanding of the
methods
13Biomarker studies
- Prostate Cancer
- Gene expression profiles in tumors, adjacent
normals, donor normals, metastatic samples
- microRNA regulation
- Tissue microarray study to look at fatty acid
synthase in prostate cancer
- Endometrial Cancer
- Expression profile of Early Stage Cancer versus
Serous tumors
- Proteomic profiles
- Renal Cell Carcinoma
- Copy Number and LOH
- Glioblastoma
- CN and LOH studies
- Thyroid Cancer
- CN and LOH studies
- Lung Cancer
- Serum and tissue proteomic profiles
- Ovarian cancer
- Biomarker profile
Acharya et al JAMA (2008) 1574
14Bioinformatics Concepts 1
15 Experimental Design
- Experimental Design is Critical!!!
- Chips are expensive
- What is the question?
- Are there enough samples?
- Grade/Stage, Mets, Normal
- Row (genes), Columns (Samples)
- Consult with a statistician beforehand
16University of Pittsburgh Molecular
Reclassification of Prostate Cancer Study
- 60 tumors, AN, Donor, 25 Mets
- Objective: Identify expression profiles for tumor,
normal, grade, stage, outcome, mets
- Results
- Tumors different from donor normals
- Field effect
- Mets also have a different profile: specific pathways
- Unequal distribution of grade, stage
- Unequal distribution by outcome
- 4 Met patients but multiple samples
- Are Met patients different from tumor?
- 4 v 60
- Are metastatic samples different from tumor
- 25 v 60
- Profile for organ of metastasis?
- Not enough samples
- Comparison to other prostate studies
- Different normals used, small studies
- Different distribution of stages/grades
17Bioinformatics Concept 2
18Need for computational methods
- Data Management
- Each file for a chip experiment is large
- 100 MB x 10 = 1 GB
- Generates Gigabytes of data
- Data analysis
- 1000s of genes (or SNPs) and few samples
- How to find differences between samples
- What statistical methods to use?
- Like finding needle in a haystack
19Analysis
- High dimensional data
- Data in proprietary formats
- How to open files
- How to analyze
- Is there commercial or academic software?
- Time-intensive, need dedicated time
- Statistical methods
- T tests, non parametric, others
- How to interpret results
- Fold change, p value, gene lists
- Pathways
- Resources
- Consult biostatisticians
- Bioinformaticians: Core Service at UPCI
- HSLS offers software licenses
- Courses
- Microarray
- SNPs
- Statistics
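The fold change and t test mentioned on this slide can be sketched for a single gene as follows. This is a minimal illustration, not the method of any particular package: the function names and the intensity values are hypothetical, and Welch's unequal-variance t statistic is just one of several reasonable choices.

```python
import math
from statistics import mean, variance


def log2_fold_change(tumor, normal):
    """log2 of the ratio of mean tumor to mean normal expression."""
    return math.log2(mean(tumor) / mean(normal))


def welch_t(tumor, normal):
    """Welch's t statistic, which does not assume equal variances."""
    standard_error = math.sqrt(
        variance(tumor) / len(tumor) + variance(normal) / len(normal)
    )
    return (mean(tumor) - mean(normal)) / standard_error


# Hypothetical intensities for one gene (illustrative values, not real data)
tumor = [8.0, 10.0, 12.0, 10.0]
normal = [4.0, 5.0, 6.0, 5.0]

print("log2 fold change:", log2_fold_change(tumor, normal))
print("Welch t statistic:", round(welch_t(tumor, normal), 3))
```

On a real chip this calculation is repeated for every gene, which is why the choice of cutoff and the number of samples per group matter so much.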
20Data analysis
- Class discovery
- Are there novel subclasses within data?
- Class comparison
- How are tumor and normal different in expression?
- Which SNPs are different?
- Class prediction
- Predict class of new sample
- Advanced pathway Analysis
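Class prediction can be illustrated with a nearest-centroid rule: average the profiles of each known class, then assign a new sample to the class whose average is closest. This is only one simple classifier among many, and the labels and two-gene profiles below are hypothetical.

```python
import math


def centroid(profiles):
    """Mean expression per gene across a list of sample profiles."""
    n_samples = len(profiles)
    n_genes = len(profiles[0])
    return [sum(p[g] for p in profiles) / n_samples for g in range(n_genes)]


def predict_class(training, new_sample):
    """Return the label whose centroid is nearest (Euclidean) to new_sample.

    training maps a class label to a list of expression profiles.
    """
    centroids = {label: centroid(profiles) for label, profiles in training.items()}
    return min(centroids, key=lambda label: math.dist(centroids[label], new_sample))


# Hypothetical two-gene profiles (illustrative values, not real data)
training = {
    "tumor": [[9.0, 2.0], [11.0, 2.0]],
    "normal": [[3.0, 8.0], [5.0, 8.0]],
}
print(predict_class(training, [10.0, 3.0]))
print(predict_class(training, [4.0, 7.0]))
```

In practice a predictor like this would need to be validated on samples that were not used to build the centroids.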
21Unsupervised analysis: Class Discovery
- Are there novel subgroups that can be discovered
based on expression profiles?
- Need both analysis and visualization tools
- Hierarchical clustering, SOM, k-means
- Principal component analysis
- Challenge
- Discovery methods are borrowed from other domains
- Do not necessarily represent biological data
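Hierarchical clustering can be sketched in a few lines: start with every sample in its own cluster and repeatedly merge the two closest clusters. The version below assumes average linkage and Euclidean distance, which are common defaults but not the only options, and the sample profiles are hypothetical.

```python
import math
from itertools import combinations


def average_linkage(a, b, profiles):
    """Mean pairwise Euclidean distance between two clusters of sample indices."""
    total = sum(math.dist(profiles[i], profiles[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def hierarchical_clusters(profiles, k):
    """Agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], profiles),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


# Hypothetical two-gene profiles for five samples (not real data)
profiles = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9], [0.9, 1.1]]
print(hierarchical_clusters(profiles, 2))
```

The number of clusters k is chosen by the analyst here; the algorithm will always return k groups whether or not they are biologically meaningful, which is the challenge this slide points out.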
22Data analysis: Class comparison, Expression (mRNA)
- Expression study
- Tumor v Normal; Mets v organ confined
- Genes that are differentially regulated
- Statistics
- What are the underlying assumptions?
- Is the data normally distributed?
- How to set cutoff?
- Multiple testing correction
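When thousands of genes are tested at once, some small p values appear by chance alone, so the raw p values need adjusting. The slide does not name a method; the sketch below assumes the Benjamini-Hochberg false discovery rate procedure, which is one widely used choice. The p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (not real data)
raw = [0.01, 0.04, 0.03, 0.5]
print([round(p, 4) for p in benjamini_hochberg(raw)])
```

A gene list is then formed by keeping genes whose adjusted value falls below the chosen cutoff.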
23Single nucleotide polymorphisms (SNPs)
- Millions of variants
- Variants may be associated with disease
- SNP study
- Characterize in cancer
- Amplification, deletion (LOH), copy neutral
deletion, amplification and simultaneous
deletion
- Higher resolution than CGH, and can detect LOH and
amplification on same data set
- Consult bioinformatics group
24SNP chips to detect LOH
- Calls are
- AB (heterozygous)
- BB/AA (homozygous)
- Compare tumor and normal samples
- Normal AB
- Tumor BB (or AA)
- This is LOH
- Is it in a known gene?
- Function of genes
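The paired comparison above (normal AB, tumor AA or BB) can be written as a simple lookup. This is a minimal sketch: the SNP identifiers and genotype calls are hypothetical, and real data would also need handling for no-calls and for normal tissue contaminating the tumor sample.

```python
def find_loh(normal_calls, tumor_calls):
    """Return SNP ids that are heterozygous in normal but homozygous in tumor."""
    loh_snps = []
    for snp_id, normal_genotype in normal_calls.items():
        tumor_genotype = tumor_calls.get(snp_id)
        if normal_genotype == "AB" and tumor_genotype in ("AA", "BB"):
            loh_snps.append(snp_id)
    return loh_snps


# Hypothetical genotype calls for one patient (not real data)
normal_calls = {"snp1": "AB", "snp2": "AA", "snp3": "AB", "snp4": "AB"}
tumor_calls = {"snp1": "BB", "snp2": "AA", "snp3": "AB", "snp4": "AA"}
print(find_loh(normal_calls, tumor_calls))
```

SNPs that are already homozygous in the normal sample (snp2 here) are uninformative for LOH, which is why a paired normal is so useful.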
25SNPs to detect Copy Number changes
(Figure: copy number plot with regions labeled amplification, diploid and deletion)
26Challenges in SNP analysis
- Not many available tools
- SNP 6.0 measures about a million SNPs
- How to separate noise from real data
- Normal contamination
- How to confidently detect a region of copy number
changes
- Statistical methods: HMM, CBS
- How to link outputs to genomic information
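To make the idea of calling copy number regions concrete, here is a deliberately naive sketch that thresholds the log2 ratio of tumor to normal signal at each SNP and then groups consecutive SNPs with the same call. It is not HMM or CBS, which model noise far more carefully; the thresholds of 0.3 and -0.3 and the signal values are arbitrary assumptions for illustration.

```python
import math
from itertools import groupby


def copy_number_state(tumor_signal, normal_signal, gain=0.3, loss=-0.3):
    """Classify one SNP from the log2 ratio of tumor to normal signal."""
    log_ratio = math.log2(tumor_signal / normal_signal)
    if log_ratio >= gain:
        return "amplification"
    if log_ratio <= loss:
        return "deletion"
    return "diploid"


def segments(tumor, normal):
    """Collapse runs of consecutive SNPs with the same state into (state, count)."""
    states = [copy_number_state(t, n) for t, n in zip(tumor, normal)]
    return [(state, len(list(run))) for state, run in groupby(states)]


# Hypothetical signals for six SNPs in genomic order (not real data)
tumor = [100, 210, 205, 100, 48, 52]
normal = [100, 100, 100, 100, 100, 100]
print(segments(tumor, normal))
```

With a million noisy SNPs, a single-SNP threshold like this produces many spurious calls, which is the motivation for the segmentation methods named on the slide.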
27SNP study
- Whole genome SNP arrays as a potential diagnostic
tool for the detection of characteristic
chromosomal aberrations in renal epithelial
tumors (Hagenkord et al)
- Methods very different from expression arrays
- Again, experimental design is key!!!!
- Which normal, paired or unpaired
- Sample size: enough cases to answer question
28(No Transcript)
29(No Transcript)
30Pathway Analysis
31Bioinformatics Concepts 3
- Data Quality (Variation on Experimental design)
32Molecular Markers - Expression
- Prognostic - breast cancer gene expression
signatures
- Multigene recurrence expression signature
- ER+ with luminal A and luminal B subtypes
- ER- with Her2 and basal subtypes
- Potentially predictive signature of genes
- Oncotype DX (Genomic Health), MammaPrint
(Agendia) and the H/I test (AviaraDx)
- ER+, lymph node negative patients
- Calculates a recurrence score
- 4000
- Decide who should receive systemic therapy to
eliminate any remaining tumour cells (that is,
adjuvant therapy) after surgery, to reduce the
risk of relapse.
- AMACR in prostate cancer
- SNP associated with gastric cancer: PSCA gene
33Why are more biomarkers not in clinical use?
- QC
- Tissue
- Experimental
- Clinical
- Small Study sizes
- Experimental design
- Analytical Methods
34Variability in Tissues
- Sample QC
- Tissue acquisition details
- Warm ischemia time
- Storage medium
- Bulk, microdissection
- Sampling details, how far was tumor from normal
- What percentage of the sample was epithelial or
stromal
- Experimental details such as quality of RNA,
methods
- SNP studies
- FFPE v frozen: difficult to compare
- Samples from different labs are difficult to
compare
- How to generate normals? Paired normals, lab
normals, HapMap normals
35Experimental variability
- Expression
- Two rounds of amplification v single round
- Time for amplification
- Kits used
- Platform specific difference
- SNP
- Processing differences for frozen and FFPE
- fragmentation
36Clinical information
- Correlation between profiling subgroups and
outcome? Biomarker discovery
- Patient annotation
- Clinical variables often not available in
publications
- Even if author is contacted, may not be possible to
obtain
- Outcome: cancer registries
- Case information: EMR
37Analytic methods: many studies, many methods
Dupuy and Simon, JNCI 2007
38Bioinformatics Concepts 4
- Infrastructure for translational research
39Databases - Rich, highly annotated data sets
- Translational research: benchside to bedside
- deidentified clinical annotation, outcomes
- Tissue inventory and annotation
- Experimental data management
- Analysis tools
40Challenges
- Institutional differences in tissue banking and
access to deidentified clinical info, outcome from
registry
- Honest Broker, IRB
- Tissue annotation
- Different requirement for data elements
- Annotation of archival versus fresh
- Annotation for clinical trial
- Customizable
- LIMS for experimental
- Unlike traditional LIMS, dynamic, ever-changing
landscape
- Large data sizes, management challenges
- How to provide data to researchers, archival
- Need infrastructure and
- Analysis environment
- Integrate clinical and research data in a
dynamic, customizable analysis interface
41Translational research workflow
42INTEGRATION VISION
Pathology Department
Cancer Registry
Tissue Bank
Clinical Gene Expression Labs
Proteomics Labs
CoPath LIS
IMPATH
caTissue
De-identification
Organ Specific Databases (of clinical and
outcomes data), Oracle 9i
Repository: annotated, deidentified clinical,
tissue, experimental
Analysis Interface
43Biorepository- use this towards the end
- Pathology also must understand its stewardship of
a resource that is likely to become more critical
to bioinformatics research: the tissue or
biorepository.
- Says Dr. Becich: "Every tissue from surgical
benches and every blood sample that comes into our
clinical pathology laboratories has to be highly
managed and refined to allow controlled genomic and
therapeutic investigation"
- Clinical annotation, processing annotations are
critical
44Clinical Annotation
- Deidentified integrated view of information from
cancer registry and patient/tissue annotation
- Challenge
- Develop organ specific databases
- Multiple data sources
- What do researchers and clinicians need?
- Is data available?
- Develop standards
45Data Sharing
- Need to get data to other investigators
- Consortia
- PI group
- Public
- Data are very large so need a way to exchange
experimental data
- Annotate using standard vocabulary
- Clinical, tissue, experimental
- Provide a minimum set of annotations so that the
data is useful and can be analyzed
- Interface for mining, analysis
46National efforts to build a translational
research environment
- Need for the informatics infrastructure to
facilitate translational research
- Data management, analysis, data annotation,
integration, exchange
- caBIG: cancer Biomedical Informatics Grid
- Enable translational research, biomarker
discovery
- Grid computing, interoperable tools
- Many pieces have to come together in a workflow
- caTISSUE, caTIES, caArray: standards based
- Clinical and Translational Science Award (CTSA) to
Pitt
- Pilot grants for translational studies where a
researcher has to work with a clinician
- Example of studies: expression and/or SNP with
many diseases
47Challenges
- Cancer Informatics Services
- Tissue Banking Tools
- Registry, Honest Broker
- Clinical Trials Management
- IT support
- Bioinformatics Service
- Clinical Genomics
- Clinical Proteomics
48Conclusion
- Translational research and Bioinformatics are
evolving rapidly
- Stay tuned!!