Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

1
Analysis and Integration of Large-scale Molecular
and Clinical Data in Cancers
  • Sampsa Hautaniemi, DTech
  • Systems Biology Laboratory
  • Institute of Biomedicine
  • Genome-Scale Biology Research Program
  • Centre of Excellence in Cancer Genetics
  • Faculty of Medicine
  • University of Helsinki

2
Table of Contents
  • The essence of systems biology: Iteration and
    collaboration.
  • Iteration in ovarian cancer.
  • The essence of systems biology II: Multi-level
    data.
  • Multi-level data in breast cancer.
  • The essence of systems biology III: Computation.
  • Anduril computational framework: glioblastoma
    multiforme.

3
Systems Biology: Iteration
Adapted from a slide by Peter Sorger
4
Ovarian Cancer
  • Epithelial ovarian cancer is the fifth most
    frequent cause of female cancer deaths, with an
    overall 5-year survival rate below 50%.
  • The standard chemotherapy for high-grade serous
    ovarian cancer (HGS-OvCa) is a platinum-taxane
    combination.
  • The majority of patients relapse in <18 months.
  • No clinically applicable methods to predict the
    prognostic outcome or even to identify the
    patients unresponsive to current therapies.

5
Aims of the HGS-OvCa Study
  • To identify poor response and good response
    subtypes of HGS-OvCa.
  • To report biomarkers that identify whether an
    HGS-OvCa patient responds to platinum
    treatment.
  • We developed a computational method that
    integrates transcriptomics and clinical data in
    the subtype-finding step.
  • We used transcriptomics and clinical data from
    184 HGS-OvCa patients treated with platinum and
    taxane from the TCGA repository.

6
Three Subtypes of HGS-OvCa
Chen et al. In preparation.
7
Validation, validation, validation
  • We also used an independent prospective HGS-OvCa
    cohort of 29 patients.
  • Data measured with qRT-PCR.

Chen et al. In preparation.
8
Pathway Analysis
  • Our pathway analysis (too) identified TR3 as a
    potential driver for platinum resistance.

9
TR3 Inhibition with Two Drugs
  • We identified two signaling pathway regulators
    for TR3 and associated inhibitors.
  • The use of the two inhibitors should render the
    HGS-OvCa cells sensitive to platinum.

[Figure panels: AKT inhibitor; AKT inhibitor and ERK5 inhibitor]
Chen et al. In preparation.
10
Systems Biology II: Multi-level Data
  • While cancer cells are clearly visible, the exact
    molecular causes of cancer are still unknown.
  • Need to study cancer samples at multiple levels.

11
Multiple Levels of Data
100 samples lead to 200 million data points.
12
Multi-level Data: Estrogen Receptor
13
Why Is This Important?
  • Estrogen receptor is the most important clinical
    variable in determining how to treat a breast
    cancer patient.
  • There are several anti-cancer drugs targeting
    estrogen receptor pathway.
  • It is currently unknown which tumors do not
    respond to therapy.
  • Finding genes that respond to estrogen receptor
    stimulus may give clues as to which genes are
    important in resistance to ER inhibition.

Hugo Simberg: Garden of Death
14
Data
  • We used chromatin immunoprecipitation combined
    with massively parallel sequencing (ChIP-seq) to
    determine genome-wide occupancy (eight time
    points) after estradiol stimulation in the MCF-7
    breast cancer cell line for:
  • Estrogen receptor α
  • RNA polymerase II
  • Histone marks (H3K4me3, H2A.Z)
  • These experiments resulted in >2.0 billion data
    points for the initial analysis.

15
SYNERGY database
  • SYNERGY database is available and fully
    operational.
  • http://csblsynergy.fimm.fi/

16
Finding ER Responsive Genes
17
Results
  • We identified 777 early estrogen
    receptor-responsive genes.
  • Interestingly, the major estrogen receptor
    related changes in cells were due to non-genomic
    action.

18
Results
  • Next, we searched for genes associated with
    survival in a breast cancer cohort of 150
    ER/HER2-/postmenopausal patients from The Cancer
    Genome Atlas (TCGA).
  • Based on Kaplan-Meier analysis, we identified 23
    genes with survival p < 0.05.
  • The gene most strongly associated with survival
    was ATAD3B.
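
The Kaplan-Meier step above can be illustrated with a minimal sketch. It assumes patients are split into two groups (for example high and low expression of one gene) and compared with a log-rank test. The function names and the toy follow-up data are hypothetical and are not taken from the study; the actual analysis may have differed.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the patient died at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p) with 1 degree of freedom."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Toy follow-up times in months (hypothetical, for illustration only).
high_t, high_e = [5, 8, 12, 14, 20], [1, 1, 1, 1, 0]
low_t, low_e = [15, 22, 30, 34, 40], [1, 0, 1, 0, 0]
print(kaplan_meier(high_t, high_e))
print(logrank_test(high_t, high_e, low_t, low_e))
```

Repeating such a test for each candidate gene and keeping those with p < 0.05 mirrors the filtering described on the slide. With many genes tested, a multiple-testing correction would normally be considered as well.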

19
Kaplan-Meier for ATAD3B
20
Intermission
  • Pol2 activity is a much better way than mRNA of
    searching for genes responsive to a cue.
  • In deep sequencing, the sequencing depth is
    important (with our 200 million short-read Pol2
    data, we found many ER-responsive genes not found
    in 20 million short-read GRO-seq data).
  • How to systematically analyze multi-level data?

21
Multi-level Cancer Research Requires
Computational Methods
  • Storing the data and computing power are the
    first (but relatively small) hurdles.
  • Analysis of large-scale, heterogeneous data is
    much more challenging than single genomics or
    proteomics data analysis.
  • There is a need for computational infrastructure.
  • Writing an analysis program fast without proper
    infrastructure will lead to delays and errors in
    larger projects.

22
Infrastructure: Anduril
  • Anduril is a computational framework to integrate
    large-scale and heterogeneous data, knowledge in
    bio-databases and analysis tools.
  • The main design principles are:
  • Modular pipeline analysis approach
  • Scalable
  • Open source, thorough documentation
  • http://www.anduril.org/
  • A method written in any programming language
    that is executable from the command prompt can
    be included.
  • Automatically produces a result PDF and a website
    containing the results.
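
The modular idea above, where each step is a command-line executable chained into a pipeline, can be sketched as follows. This is not Anduril's actual API. The Component class, run_pipeline function and the two toy steps are hypothetical names chosen for illustration, and the steps call the Python interpreter only so that the example is self-contained.

```python
import subprocess
import sys

class Component:
    """One pipeline step wrapping any command-line executable (hypothetical design)."""

    def __init__(self, name, command):
        self.name = name
        self.command = command  # argument list, e.g. ["sort", "-u"]

    def run(self, text):
        # check=True makes a failing step stop the pipeline immediately
        # instead of silently passing bad data downstream.
        result = subprocess.run(
            self.command, input=text, capture_output=True, text=True, check=True
        )
        return result.stdout

def run_pipeline(components, text):
    """Feed the output of each component into the next; return output and a log."""
    log = []
    for component in components:
        text = component.run(text)
        log.append((component.name, len(text.splitlines())))
    return text, log

# Two toy steps run through the Python interpreter; any executable would do.
uppercase = Component(
    "uppercase",
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"],
)
unique_sorted = Component(
    "unique_sorted",
    [sys.executable, "-c",
     "import sys; sys.stdout.write(''.join(sorted(set(sys.stdin.readlines()))))"],
)

output, log = run_pipeline([uppercase, unique_sorted], "tp53\negfr\ntp53\n")
print(output)
print(log)
```

Keeping each step behind the same small interface lets individual methods be swapped or rerun without touching the rest of the pipeline, and the log gives a starting point for an automatically generated report.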

23
Complex Pipelines Are Fragile
24
Glioblastoma Multiforme (GBM)
  • Glioblastoma multiforme (GBM) is one of the
    deadliest cancers.
  • The Cancer Genome Atlas (TCGA) has published data
    from >500 GBM patients:
  • comparative genomic hybridization arrays
  • single nucleotide polymorphism arrays
  • exon and gene expression arrays
  • microRNA arrays
  • methylation arrays
  • clinical data
  • Which genes or genetic regions have an effect on
    survival?
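
Before asking which genes affect survival, the separate data types have to be brought together per patient. The sketch below shows one simple way to do that: keep only patients that appear in every data level and in the clinical table. The function name, patient identifiers and values are hypothetical, and real TCGA data would need far more preprocessing.

```python
def integrate_levels(levels, clinical):
    """Merge per-patient records from several data levels.

    levels:   {level name: {patient id: measurement}}
    clinical: {patient id: clinical record}
    Only patients present in every level and in the clinical table are kept.
    """
    shared = set(clinical)
    for table in levels.values():
        shared &= set(table)
    merged = {}
    for patient in sorted(shared):
        record = {"clinical": clinical[patient]}
        for level_name, table in levels.items():
            record[level_name] = table[patient]
        merged[patient] = record
    return merged

# Toy data (hypothetical values for a single gene in three patients).
levels = {
    "expression": {"P1": 2.4, "P2": 0.7, "P3": 1.1},
    "copy_number": {"P1": 3, "P2": 2},
    "methylation": {"P1": 0.15, "P2": 0.80, "P3": 0.40},
}
clinical = {
    "P1": {"months": 9, "dead": 1},
    "P2": {"months": 26, "dead": 0},
    "P3": {"months": 14, "dead": 1},
}

merged = integrate_levels(levels, clinical)
print(sorted(merged))
print(merged["P1"])
```

Restricting to shared patients is the strictest choice. An alternative is to keep all patients and mark missing levels, which retains more samples at the cost of handling gaps in every later analysis step.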

25
GBM Results in Anduril Website
26
Latest on moesin in GBM
27
(Sequence) Component Libraries
  • Over 400 Anduril components already available.
  • Pipelines:
  • ChIP-seq (EMBO J 2011, Cancer Res 2012, ...)
  • RNA-seq (not published)
  • miRNA-seq (not published)
  • DNA methylation-seq (not published)
  • Whole-genome and exome sequencing (not
    published)
  • Image analysis (manuscript)

28
Summary
  • Characterization of a complex disease first
    requires identifying the key variables.
  • This requires integration of data from multiple
    levels, an iterative mode of research, and
    collaboration.
  • Multi-level data integration requires
    computational infrastructure and data-intensive
    computing.
  • We have developed Anduril to organize large-scale
    data analysis projects (imaging, deep sequencing,
    database usage, conversions, etc.)
  • The need for computational infrastructure is
    evident in particular when analyzing deep
    sequencing data.
  • All our methods are (will be) freely available.

http://research.med.helsinki.fi/gsb/hautaniemi/software.html
29
Acknowledgements
Systems Biology Lab
Funding: Academy of Finland, Finnish Cancer
Organizations, Sigrid Jusélius Foundation, EU FP7,
ERA-NET SysBio, Biocenter Finland, Biocentrum
Helsinki
Collaborators: Olli Carpén, Henk Stunnenberg,
George Reid, Jukka Westermarck