Title: Introduction to Bioinformatics: Practical Applications
1 Introduction to Bioinformatics/Practical
- Uma Chandran, PhD, MSIS
- Department of Biomedical Informatics
- University of Pittsburgh
- chandranur_at_msx.upmc.edu
- 412-623-7841
- 10/20/08
2 My Background
- Bioinformatics Service
- Department of Biomedical Informatics
- Clinical Genomics Facility
- Runs expression, SNP and microRNA microarrays
- Bioinformatics tightly integrated with data analysis
- Expression, SNP, proteomic, integration of proteomic and genomic data
3 Outline of course
- What is Bioinformatics?
- Bioinformatics Concepts and Challenges
- Role in Pathology: Cancer Biomarker Discovery
- Role in Translational research
4 What is Bioinformatics?
- http://en.wikipedia.org/wiki/Bioinformatics
- Application of information technology to molecular biology
- Databases
- Algorithms
- Statistical techniques
5 Bioinformatics in Pathology
- Cancer Biomarker discovery
- Prognostic markers
- Good v bad outcome
- Predictive markers
- Will patient benefit from a particular drug treatment
- Pharmacogenomic markers
- Effect of drug on tumor
- Genomic and proteomic technologies use bioinformatics approaches
- Potential to impact diagnostics
Nature 452:548
6 Why should pathologists develop bioinformatics
- Pathology will see a gradual shift from cell morphology to molecular structure and function of cells as an adjunct to the diagnostic process. In the future, pathologists must be able to render a tissue diagnosis - most particularly for but not necessarily confined to tumors - that takes advantage of not only morphologic observations but also related genomic and proteomic testing
Bruce Friedman, U Mich
7 What do these studies have in common?
- Use high throughput platforms to characterize
- DNA level: single nucleotide polymorphisms (SNP), CN and LOH - SNP chips
- Changes in mRNA profile - chips
- Changes in protein levels - mass spec
- Tissue microarrays - staining many samples simultaneously
- microRNA - microRNA chips
8 High throughput technology
- Ability to interrogate 1000s of genes/proteins
- Genomics
- SNPs: CN and LOH with disease
- RNA profile with disease and normal
- Proteomics
- Mass Spec
- Protein Arrays
- Citations
- 2000: 400 articles; 2008: 20,000 articles!!!
9 Expression chips
- Chip
- Probes for genes
- Labeled samples
- Hybridized to chips
- Detection
- Quantitation
- Comparison
10 Bioinformatics (Biology + Computation)
- Analysis
- Algorithms, statistics
- Advanced Analysis and visualization
- Annotation, pathways
- Databases to store and share, annotate
Lincoln Stein: just another tool, like the microscope, to study biology
11 History of Bioinformatics
- Computational Biology
- Margaret Dayhoff was a pioneer
- 1965 - Created searchable protein databases
- 1982 - DNA database GenBank
- Bioinformatics apps
- algorithms for assessing similarity
- What changes in proteins are tolerated?
- Search protein or DNA sequence for similarity
12 Bioinformatics at your desktop
- Studies conducted by clinicians/scientists
- Translational: benchside to bedside
- Need interdisciplinary team
- Clinicians, bioinformaticians, statisticians
- Researchers need a basic understanding of the
13 Biomarker studies
- Prostate Cancer
- Gene expression profiles in tumors, adjacent normals, donor normals, metastatic samples
- microRNA regulation
- Tissue microarray study to look at fatty acid synthase in prostate cancer
- Endometrial Cancer
- Expression profile of Early Stage Cancer versus Serous tumors
- Proteomic profiles
- Renal Cell Carcinoma
- Copy Number and LOH
- Glioblastoma
- CN and LOH studies
- Thyroid Cancer
- CN and LOH studies
- Lung Cancer
- Serum and tissue proteomic profiles
- Ovarian cancer
- Biomarker profile
Acharya et al JAMA (2008) 1574
14 Bioinformatics Concepts 1
15 Experimental Design
- Experimental Design is Critical!!!
- Chips are expensive
- What is the question
- Are there enough samples?
- Grade/Stage, Mets, Normal
- Row (genes), Columns (Samples)
- Consult with statistician before
16 University of Pittsburgh Molecular Reclassification of Prostate Cancer Study
- 60 tumors, AN, Donor, 25 Mets
- Objective: Identify expression profiles for tumor, normal, grade, stage, outcome, mets
- Results
- Tumors different from donor normals
- Field effect
- Mets also have a different profile: specific pathways
- Unequal distribution of grade, stage
- Unequal distribution by outcome
- 4 Met patients but multiple samples
- Are Met patients different from tumor?
- 4 v 60
- Are metastatic samples different from tumor
- 25 v 60
- Profile for organ of metastasis?
- Not enough samples
- Comparison to other prostate studies
- Different normals used; small studies
- Different distribution of stages/grades
17 Bioinformatics Concept 2
18 Need for computational methods
- Data Management
- Each file for a chip experiment is large
- 100 MB x 10 = 1 GB
- Generates gigabytes of data
- Data analysis
- 1000s of genes (or SNPs) and few samples
- How to find differences between samples
- What statistical methods to use?
- Like finding needle in a haystack
- High dimensional data
- Data in proprietary formats
- How to open files
- How to analyze
- Is there commercial or academic software?
- Time-intensive, need dedicated time
- Statistical methods
- T tests, non parametric, others
- How to interpret results
- Fold change, p value, gene lists
- Pathways
- Resources
- Consult biostatisticians
- Bioinformaticians: Core Service at UPCI
- HSLS offers software licenses
- Courses
- Microarray
- SNPs
- Statistics
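The "fold change" and "T tests" items above can be illustrated with a minimal sketch for a single gene. The intensity values are hypothetical, and Welch's t statistic is used here only as one reasonable choice; the slides do not say which test variant was applied.

```python
import math
from statistics import mean, variance


def log2_fold_change(tumor, normal):
    """log2 of the ratio of mean tumor intensity to mean normal intensity."""
    return math.log2(mean(tumor) / mean(normal))


def welch_t(a, b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical intensities for one gene (illustrative values only).
tumor = [200.0, 220.0, 180.0, 200.0]
normal = [100.0, 110.0, 90.0, 100.0]

print("log2 fold change:", log2_fold_change(tumor, normal))
print("Welch t statistic:", round(welch_t(tumor, normal), 3))
```

In a real chip experiment this calculation is repeated for thousands of genes, which is why the choice of test and the handling of multiple comparisons should be discussed with a statistician.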
20 Data analysis
- Class discovery
- Are there novel subclasses within data?
- Class comparison
- How are tumor and normal different in expression?
- Which SNPs are different?
- Class prediction
- Predict class of new sample
- Advanced pathway Analysis
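As a rough illustration of class prediction, the sketch below assigns a new sample to the class whose mean expression profile (centroid) is closest. The nearest-centroid rule, the three-gene profiles, and the labels are all assumptions chosen for brevity; they are not the method used in any study named in these slides.

```python
from statistics import mean


def centroid(samples):
    """Per-gene mean across the samples of one class."""
    return [mean(values) for values in zip(*samples)]


def sq_distance(x, y):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def predict(sample, centroids):
    """Return the label of the class centroid closest to the sample."""
    return min(centroids, key=lambda label: sq_distance(sample, centroids[label]))


# Hypothetical training profiles: each inner list is one sample, three genes.
training = {
    "tumor": [[8.0, 2.0, 5.0], [9.0, 1.0, 6.0]],
    "normal": [[3.0, 7.0, 5.0], [2.0, 8.0, 4.0]],
}
centroids = {label: centroid(samples) for label, samples in training.items()}

new_sample = [7.5, 2.5, 5.5]
predicted = predict(new_sample, centroids)
print(predicted)
```

A predictor like this should be evaluated on samples that were not used to build the centroids; otherwise its accuracy is overstated.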
21 Unsupervised analysis: Class Discovery
- Are there novel subgroups that can be discovered based on expression profiles
- Need both analysis and visualization tools
- Hierarchical clustering, SOM, K means
- Principal component
- Challenge
- Discovery methods are borrowed from other domains
- Do not necessarily represent biological data
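To make the K means item concrete, here is a minimal sketch of the algorithm on a handful of made-up expression profiles. Seeding the centroids with the first k samples is an assumption made for reproducibility; practical analyses typically use random restarts and established libraries.

```python
def sq_distance(x, y):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def k_means(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each sample joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical profiles: five samples, three genes each.
profiles = [
    [1.0, 1.2, 0.9],
    [5.0, 5.1, 4.8],
    [1.1, 0.9, 1.0],
    [4.9, 5.2, 5.0],
    [0.9, 1.0, 1.1],
]
labels, centroids = k_means(profiles, k=2)
print(labels)
```

The algorithm always returns k groups, whether or not the data contain real subclasses, which is one reason the challenge noted above matters.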
22 Data analysis: Class comparison, Expression (mRNA)
- Expression study
- Tumor v Normal; Mets v organ confined
- Genes that are differentially regulated
- Statistics
- What are the underlying assumptions?
- Is it normally distributed?
- How to set cutoff?
- Multiple testing correction
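One common form of multiple testing correction is the Benjamini-Hochberg false discovery rate procedure, sketched below. The slides do not name a specific correction, so this choice, the p-values, and the 0.05 cutoff are assumptions for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in adjusted]

for p, q, keep in zip(p_values, adjusted, significant):
    print(f"p={p:.2f}  adjusted={q:.4f}  significant={keep}")
```

Three of the four raw p-values fall below 0.05, but only one survives the correction, which shows why the cutoff cannot be set on raw p-values alone when thousands of genes are tested.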
23 Single nucleotide polymorphisms (SNPs)
- Millions of variants
- Variants may be associated with disease
- SNP study
- Characterize in cancer
- Amplification, deletion (LOH), copy neutral deletion, amplification and simultaneous deletion
- Higher resolution than CGH and detect LOH and amplification on same data set
- Consult bioinformatics group
24 SNP chips to detect LOH
- Calls are
- AB (heterozygous)
- BB/AA (homozygous)
- Compare tumor and normal samples
- Normal AB
- Tumor BB (or AA)
- This is LOH
- Is it in a known gene?
- Function of genes
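The LOH rule on this slide (normal AB, tumor AA or BB) can be written as a short sketch. The SNP identifiers and genotype calls are hypothetical, and real pipelines also have to deal with no-calls, genotyping error, and normal contamination.

```python
def find_loh(normal_calls, tumor_calls):
    """Return SNP ids that are heterozygous in normal and homozygous in tumor."""
    loh = []
    for snp, normal in normal_calls.items():
        tumor = tumor_calls.get(snp)
        # LOH: the paired normal is AB, but the tumor has lost one allele.
        if normal == "AB" and tumor in ("AA", "BB"):
            loh.append(snp)
    return loh


# Hypothetical genotype calls for a paired normal and tumor sample.
normal_calls = {"snp1": "AB", "snp2": "AA", "snp3": "AB", "snp4": "AB"}
tumor_calls = {"snp1": "BB", "snp2": "AA", "snp3": "AB", "snp4": "AA"}

loh_snps = find_loh(normal_calls, tumor_calls)
print(loh_snps)
```

Only SNPs that are heterozygous in the normal sample are informative: snp2 is AA in both samples, so it says nothing about LOH either way.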
25 SNPs to detect Copy Number changes
26 Challenges in SNP analysis
- Not many available tools
- SNP 6.0 measures about a million SNPs
- How to separate noise from real data
- Normal contamination
- How to confidently detect a region of copy number changes
- Statistical methods: HMM, CBS
- How to link outputs to genomic information
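To show the kind of signal that methods such as HMM and CBS operate on, the sketch below computes log2 tumor/normal intensity ratios along consecutive SNPs, smooths them with a moving average, and applies a fixed threshold. This is a deliberately naive stand-in, not an HMM or CBS implementation; the intensities, window size, and threshold are assumptions.

```python
import math


def log2_ratios(tumor, normal):
    """Per-SNP log2 ratio of tumor to normal signal intensity."""
    return [math.log2(t / n) for t, n in zip(tumor, normal)]


def moving_average(values, window=3):
    """Centered moving average; edges use whatever neighbours exist."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def call_states(smoothed, threshold=0.5):
    """Label each SNP position as gain, loss, or neutral by a fixed cutoff."""
    return [
        "gain" if v > threshold else "loss" if v < -threshold else "neutral"
        for v in smoothed
    ]


# Hypothetical intensities for eight consecutive SNPs on one chromosome.
tumor = [100.0, 100.0, 100.0, 200.0, 200.0, 200.0, 100.0, 100.0]
normal = [100.0] * 8

states = call_states(moving_average(log2_ratios(tumor, normal)))
print(states)
```

With noisy data or normal tissue mixed into the tumor, ratios shrink toward zero and a fixed threshold starts to miss real changes, which is the motivation for the statistical segmentation methods listed above.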
27 SNP study
- Whole genome SNP arrays as a potential diagnostic tool for the detection of characteristic chromosomal aberrations in renal epithelial tumors, Hagenkord et al
- Methods very different from expression arrays
- Again, experimental design is key!!!!
- Which normal, paired or unpaired
- Sample size: enough cases to answer question
30 Pathway Analysis
31 Bioinformatics Concepts 3
- Data Quality (Variation on Experimental design)
32 Molecular Markers - Expression
- Prognostic - breast cancer gene expression signatures
- Multigene recurrence expression signature
- ER+ with luminal A and luminal B subtypes
- ER- with Her2 and basal subtypes
- Potentially predictive signature of genes
- Oncotype DX (Genomic Health), MammaPrint (Agendia) and the H/I test (AviaraDx)
- ER+, lymph node negative patients
- Calculates a recurrence score
- 4000
- Decide who should receive systemic therapy to eliminate any remaining tumour cells (that is, adjuvant therapy) after surgery, to reduce the risk of relapse
- AMACR in prostate cancer
- SNP associated with gastric cancer: PSCA gene
33 Why more biomarkers are not in clinical use
- QC
- Tissue
- Experimental
- Clinical
- Small Study sizes
- Experimental design
- Analytical Methods
34 Variability in Tissues
- Sample QC
- Tissue acquisition details
- Warm ischemia time
- Storage medium
- Bulk, microdissection
- Sampling details, how far was tumor from normal
- What percentage of the sample was epithelial or stromal
- Experimental details such as quality of RNA, methods
- SNP studies
- FFPE v Frozen: difficult to compare
- Samples from different labs are difficult to compare
- How to generate normals? Paired normals, lab normals, HapMap normals
35 Experimental variability
- Expression
- Two rounds of amplification v single round
- Time for amplification
- Kits used
- Platform specific difference
- Processing differences for frozen and FFPE
- fragmentation
36 Clinical information
- Correlation between profiling subgroups and outcome? Biomarker discovery
- Patient annotation
- Clinical variables often not available in publications
- Even if author contacted, may not be possible to obtain
- Outcome - cancer registries
- Case information - EMR
37 Analytic methods: many studies, many methods
Dupuy and Simon, JNCI 2007
38 Bioinformatics Concepts 4
- Infrastructure for translational research
39 Databases - Rich, highly annotated data sets
- Translational research: benchside to bedside
- Deidentified clinical annotation, outcomes
- Tissue inventory and annotation
- Experimental data management
- Analysis tools
- Institutional differences in tissue banking and access to deidentified clinical info, outcome from registry
- Honest Broker, IRB
- Tissue annotation
- Different requirement for data elements
- Annotation of archival versus fresh
- Annotation for clinical trial
- Customizable
- LIMS for experimental
- Unlike traditional LIMS, dynamic, ever-changing landscape
- Large data sizes, management challenges
- How to provide data to researchers, archival
- Need infrastructure and
- Analysis environment
- Integrate clinical and research data in a
dynamic, customizable analysis interface
41 Translational research workflow
Pathology Department
Cancer Registry
Tissue Bank
Clinical Gene Expression Labs
Proteomics Labs
CoPath LIS
Organ Specific Databases (of clinical and outcomes data)
Oracle 9i
Repository: Annotated, deidentified clinical, tissue, experimental
Analysis Interface
43 Biorepository - use this towards the end
- Pathology also must understand its stewardship of a resource that is likely to become more critical to bioinformatics research: the tissue or biorepository.
- Says Dr. Becich: Every tissue from surgical benches and every blood sample that comes into our clinical pathology laboratories has to be highly managed and refined to allow controlled genomic and therapeutic investigation
- Clinical annotation, processing annotations are
44 Clinical Annotation
- Deidentified integrated view of information from cancer registry and patient/tissue annotation
- Challenge
- Develop organ specific databases
- Multiple data sources
- What do researchers and clinicians need?
- Is data available?
- Develop standards
45 Data Sharing
- Need to get data to other investigators
- Consortia
- PI group
- Public
- Data are very large so need a way to exchange experimental data
- Annotate using standard vocabulary
- Clinical, tissue, experimental
- Provide a minimum set of annotations so that the data is useful and can be analyzed
- Interface for mining, analysis
46 National efforts to build a translational research environment
- Need for the informatics infrastructure to facilitate translational research
- Data management, analysis, data annotation, integration, exchange
- caBIG: cancer Biomedical Informatics Grid
- Enable translational research, biomarker discovery
- Grid computing, interoperable tools
- Many pieces have to come together in a workflow
- caTISSUE, caTIES, caArray: standards based
- Clinical and Translational Science Award (CTSA) to Pitt
- Pilot grants for translational studies where a researcher has to work with a clinician
- Example of studies: expression and/or SNP with many diseases
- Cancer Informatics Services
- Tissue Banking Tools
- Registry, Honest Broker
- Clinical Trials Management
- IT support
- Bioinformatics Service
- Clinical Genomics
- Clinical Proteomics
- Translational research and Bioinformatics are evolving rapidly
- Stay tuned!!