
1
Introduction to Gene expression profiling
Christine Desmedt, PhD
Translational Research Unit, Université Libre de Bruxelles, Institut Jules Bordet
Brussels, Belgium
2
  • Introduction
  • The technique
  • Standardization
  • Bio-informatics
  • Use of Gene expression profiling in breast cancer

3
Introduction
4
From genes to proteins
DNA (replication) → transcription → RNA (gene expression) → translation → protein (protein expression)
DNA-level alterations: SNPs, mutations, amplifications, deletions, gene fusions, chromosomal aberrations
5
Sequence of the GENOME
Progress in technology and Bio-informatics
Era of molecular medicine
Gene expression profiles
6
Microarray experiments
  • Advantages
      • Measure the expression of several thousand genes
        simultaneously
      • Possibility to discover important pathways and
        genes relevant to the clinical problem
      • Examine a snapshot of gene expression of the tumor
        and its environment, rather than genes individually
  • Disadvantages
      • Huge volume of data produced: risk of false
        discoveries and "fishing expeditions"
      • How to interpret the data and generate useful
        biological information remains a significant challenge
      • Clinical validation and utility of findings still
        remain unclear

7
The technique
8
Microarray platforms
  • Probe implementation
      • Full-length cDNA
      • Oligonucleotides (presynthesized, in situ
        synthesized)
  • Target-labeling strategies
      • Single-color detection
      • Dual-color detection

9
Probe implementation
cDNA
Oligonucleotide: short (Affymetrix) or long (Agilent)
10
Full-length cDNA printing (in house)
384-well plate → slide
Platter of 100 slides
11
Printing head and pin
12
Short oligonucleotide array Affymetrix
13
Short oligonucleotide array Affymetrix
Fully integrated instrument that maximizes data
reproducibility and laboratory productivity by
minimizing user intervention
14
Signal detection: dual-color detection
15
Normal: wavelength 635 nm
Tumor: wavelength 532 nm
Composite image
16
Normal tissue: wavelength 635 nm (spots 1, 2, 3)
Tumor: wavelength 532 nm (spots 1, 2, 3)
Ratio (635 nm/532 nm) for each spot
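The per-spot ratio is usually analysed on a log2 scale, so that a 2-fold increase and a 2-fold decrease sit symmetrically around zero. A minimal Python sketch, assuming a single background value per channel and an intensity floor (both simplifications for illustration; the function name is hypothetical):

```python
import math


def log2_ratios(intensity_635, intensity_532, bg_635=0.0, bg_532=0.0, floor=1.0):
    """Per-spot log2(635 nm / 532 nm) ratios for a dual-color array.

    intensity_635: foreground intensities in the 635 nm channel (normal tissue here)
    intensity_532: foreground intensities in the 532 nm channel (tumor here)
    bg_635, bg_532: one background value per channel (an assumption; image-analysis
        software normally estimates a local background around each spot)
    floor: lower bound after subtraction, so the logarithm is always defined
    """
    ratios = []
    for fg_635, fg_532 in zip(intensity_635, intensity_532):
        corrected_635 = max(fg_635 - bg_635, floor)
        corrected_532 = max(fg_532 - bg_532, floor)
        ratios.append(math.log2(corrected_635 / corrected_532))
    return ratios


# Three hypothetical spots: higher in normal, equal, higher in tumor.
print(log2_ratios([2000, 1000, 500], [500, 1000, 2000]))  # [2.0, 0.0, -2.0]
```

With the channel assignment on this slide, a positive value means higher signal in normal tissue and a negative value higher signal in the tumor.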
17
Image Analysis
18
Results
19
Signal detection: single-color detection
(Affymetrix)
Streptavidin-phycoerythrin (SAPE), counterstained
with biotinylated anti-streptavidin
20
Signal detection: single-color detection
(Affymetrix)
47,000 transcripts!
21
Signal detection: single-color detection
(Affymetrix)
22
Signal detection: single-color detection
(Affymetrix)
23
Standardization
24
Standardization (1)
  • MIAME guidelines
  • The Minimum Information About a Microarray Experiment
    (MIAME) checklist helps define the level of detail that
    should exist, and is being adopted by many
    journals as a requirement for the submission of
    papers incorporating microarray results.

25
Standardization (2)
MIAME (Minimum Information About a Microarray
Experiment) http://www.mged.org/Workgroups/MIAME/miame.html
26
Standardization (2) public repositories
  • Gene Expression Omnibus (GEO)
    (http://www.ncbi.nlm.nih.gov/geo/)
      • A gene expression/molecular abundance repository
        supporting MIAME-compliant data submissions, and
        a curated, online resource for gene expression
        data browsing, query and retrieval.
  • ArrayExpress
    (http://www.ebi.ac.uk/arrayexpress/)
      • ArrayExpress is a public repository for
        microarray data, which is aimed at storing
        MIAME-compliant data in accordance with MGED
        recommendations. The ArrayExpress Data Warehouse
        stores gene-indexed expression profiles from a
        curated subset of experiments in the repository.

27
Standardization (2) public repositories
28
Standardization (2) public repositories
29
Standardization MIAME
  • The raw data for each hybridisation (e.g., CEL or
    GPR files)
  • The final processed (normalised) data for the set
    of hybridisations in the experiment (study)
    (e.g., the gene expression data matrix used to
    draw the conclusions from the study)
  • The essential sample annotation including
    experimental factors and their values (e.g.,
    compound and dose in a dose response experiment)
  • The experimental design including sample data
    relationships (e.g., which raw data file relates
    to which sample, which hybridisations are
    technical, which are biological replicates)
  • Sufficient annotation of the array (e.g., gene
    identifiers, genomic coordinates, probe
    oligonucleotide sequences or a reference to a
    commercial array catalog number)
  • The essential laboratory and data processing
    protocols (e.g., what normalisation method has
    been used to obtain the final processed data)
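Because MIAME is a checklist, a submission pipeline might represent it as a small record and report which elements are still missing before deposition. This is a minimal sketch under stated assumptions: the class and field names are hypothetical and do not correspond to any repository's submission format.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class MiameRecord:
    """Hypothetical container for the six MIAME elements listed above."""
    raw_data_files: List[str] = field(default_factory=list)  # e.g. CEL or GPR files
    processed_data: Optional[str] = None       # normalised matrix used to draw conclusions
    sample_annotation: Optional[str] = None    # experimental factors and their values
    experimental_design: Optional[str] = None  # file-to-sample links, replicate types
    array_annotation: Optional[str] = None     # gene IDs, probe sequences or catalog number
    protocols: Optional[str] = None            # laboratory and data-processing protocols

    def missing_elements(self) -> List[str]:
        """Names of elements that are still empty, in checklist order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


record = MiameRecord(raw_data_files=["sample1.CEL"], protocols="normalisation described")
print(record.missing_elements())
# ['processed_data', 'sample_annotation', 'experimental_design', 'array_annotation']
```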

30
Standardization (3) MAQC
  • MAQC initiative
  • The MicroArray Quality Control (MAQC) Project is
    being conducted by the FDA to develop standards
    and quality control metrics that will eventually
    allow the use of microarray data in drug
    discovery, clinical practice and regulatory
    decision-making.

31
  • 4 mRNA samples
  • 5 replicates
  • 6 microarray platforms
  • 3 laboratories

Nature Biotechnology, 24, 9, 1151-1161, 2006
32
Coefficient of variation
  • 4 mRNA samples
  • 5 replicates
  • 6 microarray platforms
  • 3 laboratories

CV: 5-15% within laboratories, 10-20% between
laboratories
Nature Biotechnology, 24, 9, 1151-1161, 2006
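The coefficient of variation (CV) is the standard deviation of replicate measurements divided by their mean, expressed in percent. A minimal Python sketch, assuming non-log signals and a simplified split into within-laboratory CV (average of per-lab CVs) and between-laboratory CV (CV of the lab means); the function names and this averaging scheme are illustrative, not the exact MAQC computation.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent: sample SD / mean x 100."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def within_and_between_lab_cv(signals_by_lab):
    """signals_by_lab maps a lab name to replicate signals for one sample and one gene.

    Within-lab CV: average of the per-lab CVs.
    Between-lab CV: CV of the per-lab means.
    """
    within = statistics.mean(cv_percent(v) for v in signals_by_lab.values())
    between = cv_percent([statistics.mean(v) for v in signals_by_lab.values()])
    return within, between


# Hypothetical signals for one gene: 3 replicates in each of 2 labs.
print(within_and_between_lab_cv({"lab_A": [90, 100, 110], "lab_B": [99, 110, 121]}))
```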
33
MAQC Findings
Microarray data are
  • Repeatable within a laboratory
  • Reproducible across laboratories
  • Concordant across platforms
  • Comparable with quantitative technologies, e.g.,
    QPCR
  • Reflective of biology regardless of the
    differences in technology.

These findings hold when differential gene expression is
assessed by fold-change (FC) ranking.
34
Two Phases of the MAQC Project, Addressing Two
Types of Microarray Applications
I. Class comparison: What makes the two
populations different?
Differentially expressed genes (DEGs), MAQC-I
Better understanding of the biological mechanisms
II. Class prediction: Can the outcome of new
individuals be predicted?
Predictive models (classifiers), MAQC-II
Diagnosis, treatment outcome, prognosis,
personalized medicine
35
Bio-informatics and examples
36
Collection, transformation and representation of
the data
Raw data (single- or dual-color)
Background correction, data transformation, normalization
(to correct for differences in labeling, hybridization
and detection methods)
Filtering (elimination of genes with minimal
variance)
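The preprocessing chain above can be sketched in Python on a genes x samples matrix, which yields the expression matrix used by the analyses that follow. This is a sketch under assumptions: quantile normalization is used only as one common choice (the slide does not name a method), and the per-sample background constant, the variance cut-off and all function names are illustrative.

```python
import math
import statistics


def background_correct(matrix, background, floor=1.0):
    """Subtract one background value per sample (column); keep values >= floor."""
    return [[max(x - background[j], floor) for j, x in enumerate(row)] for row in matrix]


def log2_transform(matrix):
    return [[math.log2(x) for x in row] for row in matrix]


def quantile_normalize(matrix):
    """Give every sample (column) the same distribution: each value is replaced by
    the mean, across samples, of the values with the same rank. Ties are broken by
    gene order rather than averaged (a simplification)."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    orders = [sorted(range(n_genes), key=lambda i: col[i]) for col in columns]
    rank_means = [statistics.mean(columns[j][orders[j][r]] for j in range(n_samples))
                  for r in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j in range(n_samples):
        for r, i in enumerate(orders[j]):
            normalized[i][j] = rank_means[r]
    return normalized


def variance_filter(gene_ids, matrix, min_variance):
    """Drop genes whose expression varies little across samples."""
    kept = [(g, row) for g, row in zip(gene_ids, matrix)
            if statistics.variance(row) >= min_variance]
    return [g for g, _ in kept], [row for _, row in kept]


def preprocess(gene_ids, raw, background, min_variance):
    """Raw intensities -> filtered log2 expression matrix (genes x samples)."""
    matrix = quantile_normalize(log2_transform(background_correct(raw, background)))
    return variance_filter(gene_ids, matrix, min_variance)


ids, expr = preprocess(["g1", "g2"], [[200, 400], [300, 150]], [100, 100], 0.0)
print(ids, expr)
```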
37
Development of an expression matrix
38
Unsupervised analyses: discover classes of tumors/specimens or genes
Supervised analyses: discover genes associated with a phenotype and
build a prediction model
39
Unsupervised analyses: discover classes of tumors/specimens or genes
  • Cluster analysis algorithms
      • Hierarchical
      • K-means
      • Self-Organizing Maps
      • Multitude of others

40
Unsupervised analysis: clustering
i.e., we measure the expression of 3 genes for a
set of patients; the samples are displayed in this
gene space
(each axis: expression level of one gene)
41
Unsupervised analysis: clustering
The patients can be grouped into three different
clusters
(each axis: expression level of one gene)
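The three-cluster picture can be reproduced with k-means, one of the algorithms listed earlier. A minimal Python sketch on hypothetical expression values for 3 genes in 9 patients; the data, the function names and the deterministic farthest-first seeding (used instead of random starts so the result is reproducible) are assumptions for illustration.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start from the first sample, then repeatedly add the
    sample that lies farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: each point is a sample described by its gene-expression vector."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member samples.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


# Hypothetical expression of 3 genes in 9 patients.
patients = [
    (1.0, 1.0, 1.0), (1.2, 0.9, 1.1), (0.8, 1.1, 0.9),
    (5.0, 5.0, 5.0), (5.1, 4.9, 5.2), (4.9, 5.2, 4.8),
    (9.0, 1.0, 5.0), (9.2, 1.1, 4.9), (8.8, 0.9, 5.1),
]
labels, _ = kmeans(patients, k=3)
print(labels)
```

k-means needs the number of clusters in advance; hierarchical clustering avoids that choice but leaves the cut height to the analyst.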
42
Unsupervised analysis: clustering
  • Widely used (Hartigan, 1975; Eisen et al., 1998)
  • Organizes objects in a hierarchical tree
    (dendrogram) based on their degree of
    dissimilarity
  • Linkage: distance between two clusters of
    objects
  • Assess quality, stability and robustness
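Agglomerative hierarchical clustering starts with every profile as its own cluster and repeatedly merges the two closest clusters, recording the merge height that a dendrogram displays. A minimal Python sketch, assuming average linkage and 1 - Pearson correlation as the dissimilarity (a common choice for expression data, not necessarily the one used in the cited studies); the function names and toy profiles are hypothetical.

```python
import math
import statistics


def pearson_distance(x, y):
    """1 - Pearson correlation (profiles must not be constant)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, distance=pearson_distance):
    """Return the merge history as (cluster_a, cluster_b, height) tuples; each cluster
    is a tuple of profile indices, so the list read in order is the dendrogram."""
    n = len(profiles)
    d = {(i, j): distance(profiles[i], profiles[j]) for i in range(n) for j in range(i + 1, n)}

    def linkage(a, b):
        # Average of all pairwise distances between members of the two clusters.
        return statistics.mean(d[min(i, j), max(i, j)] for i in a for j in b)

    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        pairs = [(linkage(a, b), a, b) for ia, a in enumerate(clusters) for b in clusters[ia + 1:]]
        height, a, b = min(pairs)
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
        merges.append((a, b, height))
    return merges


# Four hypothetical profiles: 0 and 1 rise together, 2 and 3 fall together.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4, 2.5]]
for a, b, height in average_linkage(profiles):
    print(a, "+", b, "at height", round(height, 3))
```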

43
Unsupervised analysis: EXAMPLE
  • 65 human breast cancer (BC) samples from 42 individuals
  • 20 patients had profiles before and after chemotherapy (CT)
  • An unsupervised hierarchical clustering method was
    used to group genes on the basis of the similarity of
    their expression patterns
  • Genes are arranged vertically and samples
    horizontally
  • ER status was found to be a major discriminator of
    subtypes
  • Breast tumors show great variation in gene
    expression
  • Gene expression is multidimensional, i.e., many
    different gene sets are differentially expressed
  • Tumor samples from the same patient clustered
    together

Perou et al. 2000
44
Supervised analysis
  • Class comparison
      • Compare the gene expression profiles of 2 or
        more groups of patients
  • Statistical test
      • Two classes: t-test, Wilcoxon rank-sum test
      • More than 2 classes: ANOVA, Kruskal-Wallis test
  • Significance: p-value
  • Multiple testing
      • Many hypotheses are tested simultaneously
      • Example: 10,000 genes tested at p < 0.05 give about
        500 false positives (10,000 × 0.05), even if no gene
        is truly differentially expressed
  • Correction needed
      • Problems with traditional methods (e.g.,
        Bonferroni): most assume independence between
        variables, and many are considered too stringent
      • Trade-off between biological information and
        false positives
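The false-positive arithmetic above, and two ways of correcting for it, can be written out directly. A minimal Python sketch with illustrative function names; Benjamini-Hochberg (false discovery rate control) is added here as one widely used, less stringent alternative to Bonferroni and is not named on the original slide.

```python
def expected_false_positives(n_tests, alpha):
    """If no gene is truly differentially expressed, each test is a false positive
    with probability alpha, so about n_tests * alpha genes pass by chance."""
    return n_tests * alpha


def bonferroni(p_values):
    """Family-wise error control: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Step-up adjusted p-values that control the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(expected_false_positives(10_000, 0.05))  # the example above
p = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p))
print(benjamini_hochberg(p))
```

On these four p-values Bonferroni keeps only the smallest one below 0.05, while Benjamini-Hochberg keeps all four, which illustrates the stringency trade-off.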

45
Supervised analysis
The apparent lack of reproducibility in
identifying differentially expressed genes across
different platforms and sites → a consequence of
ranking by P-value only!
FC ranking should be used in combination with a
nonstringent P threshold to select a DEG list
that is reproducible, specific and sensitive;
such a joint rule is recommended as a baseline
practice.
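The joint rule can be sketched as a two-step filter: keep genes that pass a non-stringent P cutoff, then rank the survivors by absolute fold change. This sketch assumes group means are already on the log2 scale (so the fold change is a difference of means) and that p-values come from a separate test; the 0.05 default and the function name are illustrative, not MAQC's exact settings.

```python
def joint_rule_deg_list(genes, mean_a, mean_b, p_values, p_threshold=0.05, top_n=None):
    """Rank genes passing p < p_threshold by |log2 fold change|, largest first."""
    candidates = [(abs(a - b), g)
                  for g, a, b, p in zip(genes, mean_a, mean_b, p_values)
                  if p < p_threshold]
    # Largest fold change first; ties broken alphabetically for reproducibility.
    candidates.sort(key=lambda t: (-t[0], t[1]))
    ranked = [g for _, g in candidates]
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical log2 group means and p-values for four genes.
print(joint_rule_deg_list(["g1", "g2", "g3", "g4"],
                          [5.0, 8.0, 3.0, 6.0], [4.0, 5.0, 3.2, 2.0],
                          [0.01, 0.2, 0.001, 0.04]))  # ['g4', 'g1', 'g3']
```

Note that g3 has the smallest p-value but the smallest fold change, so a P-only ranking would put it first.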
46
Supervised analysis
Class prediction: creating a multi-gene predictor
Split the data set randomly into a training set and
a test set
Training set
1. Identify genes discriminating between the two
groups of interest
2. Construct a classifier by combining these genes with
predictive machine learning algorithms
3. Estimate the classification error rate by
leave-one-out cross-validation, repeating steps 1-2
4. Select the best classifier
5. Test its significance in a permutation test
Test set
6. Validate predictor accuracy on independent data
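Steps 1, 2, 3 and 5 can be sketched in Python (step 4 would repeat this for several classifiers; step 6 needs the held-out test set). The key point of step 3 is that gene selection is redone inside every leave-one-out fold: selecting genes once on all samples and cross-validating only the classifier gives an optimistically biased error rate. The t-like ranking score, the nearest-centroid classifier, the permutation count and all names are assumptions chosen for brevity.

```python
import math
import random
import statistics


def select_genes(X, y, n_genes):
    """Step 1: rank genes by a Welch-type t statistic between classes 0 and 1."""
    scores = []
    for g in range(len(X[0])):
        a = [x[g] for x, lab in zip(X, y) if lab == 0]
        b = [x[g] for x, lab in zip(X, y) if lab == 1]
        se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
        t = abs(statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0
        scores.append((t, g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:n_genes]]


def fit_centroids(X, y, genes):
    """Step 2: a nearest-centroid classifier stores the mean profile of each class."""
    return {c: [statistics.mean(x[g] for x, lab in zip(X, y) if lab == c) for g in genes]
            for c in sorted(set(y))}


def predict(centroids, x, genes):
    profile = [x[g] for g in genes]
    return min(centroids, key=lambda c: math.dist(profile, centroids[c]))


def loocv_error(X, y, n_genes):
    """Step 3: leave-one-out error, redoing gene selection inside every fold."""
    errors = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_train, y_train, n_genes)
        centroids = fit_centroids(X_train, y_train, genes)
        errors += int(predict(centroids, X[i], genes) != y[i])
    return errors / len(X)


def permutation_p_value(X, y, n_genes, n_perm=100, seed=0):
    """Step 5: fraction of label shufflings whose error is at least as low."""
    observed = loocv_error(X, y, n_genes)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        hits += int(loocv_error(X, y_perm, n_genes) <= observed)
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 8 samples x 3 genes; only gene 0 separates the classes.
X = [[1.0, 3.0, 2.0], [1.2, 2.5, 2.2], [0.8, 3.1, 1.9], [1.1, 2.8, 2.1],
     [5.0, 2.9, 2.0], [5.2, 3.0, 2.3], [4.8, 2.7, 1.8], [5.1, 3.2, 2.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(loocv_error(X, y, n_genes=1), permutation_p_value(X, y, n_genes=1))
```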
47
Supervised analysis: EXAMPLE
70-gene signature
  • Found 231 genes correlated with distant metastasis (DM)
  • Ranked them in order of significance (p-value)
  • 70 genes were chosen
  • Validation in 19 patients: 17/19 correctly
    predicted
  • Established clinical utility: could outperform the St
    Gallen and NCI criteria in predicting who needed
    CT (i.e., who relapsed) and who did not

Figure: 78 tumors, good vs. poor signature
van 't Veer et al., Nature, 2002
Validation series: n = 151 node-negative patients (+ 144
node-positive patients), van de Vijver et al., NEJM, 2002
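The selection steps above can be paired with a correlation-to-template rule to classify a new tumor, in the spirit of the 70-gene study. Ranking by the absolute correlation with outcome gives the same order as ranking by the p-value of that correlation for a fixed number of samples. This is a simplified sketch on toy data, not the published MammaPrint algorithm; the function names, the use of Pearson correlation across the selected genes and the threshold value are assumptions.

```python
import math
import statistics


def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def top_genes_by_outcome(X, outcome, n_genes):
    """Rank genes by |correlation| between expression and a 0/1 outcome
    (1 = distant metastasis) and keep the n_genes strongest."""
    scores = [(abs(pearson([x[g] for x in X], outcome)), g) for g in range(len(X[0]))]
    scores.sort(reverse=True)
    return [g for _, g in scores[:n_genes]]


def good_prognosis_template(X, outcome, genes):
    """Mean profile of the selected genes in patients without metastasis."""
    good = [x for x, o in zip(X, outcome) if o == 0]
    return [statistics.mean(x[g] for x in good) for g in genes]


def classify(x, genes, template, threshold):
    """'good' if the tumor correlates with the good-prognosis template above threshold."""
    return "good" if pearson([x[g] for g in genes], template) > threshold else "poor"


# Hypothetical data: 6 tumors x 4 genes; genes 0-2 are informative, gene 3 is not.
X = [[1.0, 5.0, 4.0, 2.0], [1.2, 4.8, 4.2, 2.1], [0.9, 5.1, 3.9, 1.9],
     [5.0, 1.0, 2.0, 2.0], [4.8, 1.2, 2.2, 2.1], [5.1, 0.9, 1.8, 1.9]]
outcome = [0, 0, 0, 1, 1, 1]
genes = top_genes_by_outcome(X, outcome, 3)
template = good_prognosis_template(X, outcome, genes)
print([classify(x, genes, template, threshold=0.0) for x in X])
```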
48
Supervised analysis: EXAMPLES
META-ANALYSIS of PUBLICLY AVAILABLE DATA
DIFFERENT PROGNOSTIC GENE SIGNATURES
  • Proliferation is the common denominator of the
    different signatures
  • These signatures perform well mainly in ER+
    patients
  • Immune response and tumor invasion may
    differentiate tumors with better and worse
    prognosis in ER- and HER2+ patients

Signatures compared: MammaPrint, Genomic Grade,
76-gene signature, Oncotype, wound signature

49
Independent validation studies
  • Their role is to confirm the results of a previous
    study, in order to reduce the play of chance and the
    potential for bias
  • Ransohoff, Nat Rev Cancer 2004, 2005
  • Common mistakes
      • Including part of the initial sample of patients
      • Including other types of patients
      • Using another measurement technique (e.g., RT-PCR)
      • Changing the prediction rule to adapt it to the
        new set ("playing with the data": different
        algorithm, different cut-off, changing genes, etc.)

50
JNCI 2007
Critical review of 90 outcome-related
statistical analyses of microarray studies
published between 2000 and 2004
Development of a checklist
1. Clear objectives are needed, and the study
objectives should influence patient selection!
2. Class-discovery methods are not really suited to
outcome-related analyses; they are better suited,
e.g., to elucidating pathways.
3. Class-comparison analyses are appropriate when
outcomes are discrete. If the outcome is survival,
we lose information by making discrete groups.
4. Data used for developing a predictor should be
distinct from data used to validate it.
51
Some easy tools
BRB-ArrayTools, developed by Richard Simon and the
BRB-ArrayTools Development Team
52
Molecular classification
Tumor microenvironment
CTCs
Prediction of treatment efficacy
Understanding the biology
53
Questions are welcome!
54
Backup
55
Better understanding of Tumor Biology
56
Understanding biological mechanisms
  • Extracting biological insight from microarray
    data remains a major challenge
  • Long gene lists are often produced, and these
    genes change with different datasets
  • After correcting for multiple testing, no
    significant genes may be found
  • Single-gene analysis may miss important pathway
    effects
  • Interpretation often depends heavily on the
    laboratory's or supervisor's area of expertise

57
Gene set enrichment analysis
  • Gene sets are used rather than single genes
  • A modest but coordinated fold change across all
    genes in a set can be significant, even without a
    dramatic single-gene fold change
  • Cellular processes often affect sets of genes

Genes are ranked by their correlation with a
phenotype; an Enrichment Score is calculated that
reflects the degree to which the gene set is
overrepresented at the extremes of the ranked list
Mootha V et al., Nat Genet 2003; Subramanian A et
al., PNAS 2005; Segal et al., Nat Genet 2004
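The Enrichment Score can be sketched as a running sum that walks down the ranked gene list, stepping up at genes in the set (weighted by their correlation) and down at genes outside it; the score is the largest deviation from zero. This Python sketch follows the weighted running-sum idea of Subramanian et al. (2005) with weight exponent p; permutation-based significance and normalization across gene sets are omitted, and the function name and toy data are hypothetical.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Weighted running-sum enrichment score.

    ranked_genes: gene ids sorted by correlation with the phenotype, highest first
    correlations: the matching correlation values
    gene_set: the genes of one set (e.g. a pathway)
    p: weight exponent (p = 0 gives an unweighted Kolmogorov-Smirnov-like walk)
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("the gene set must contain some, but not all, ranked genes")
    hit_total = sum(abs(r) ** p for r, hit in zip(correlations, in_set) if hit)
    running, score = 0.0, 0.0
    for r, hit in zip(correlations, in_set):
        # Step up by the gene's share of the set's total weight, or down by 1/n_misses.
        running += abs(r) ** p / hit_total if hit else -1.0 / n_misses
        if abs(running) > abs(score):
            score = running  # keep the signed maximum deviation from zero
    return score


ranked = ["a", "b", "c", "d", "e"]
corr = [0.9, 0.8, 0.1, -0.5, -0.7]
print(enrichment_score(ranked, corr, {"a", "b"}))  # set at the top: positive score
print(enrichment_score(ranked, corr, {"d", "e"}))  # set at the bottom: negative score
```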
58
Ingenuity Pathways Analysis
  • View gene lists within framework of functional
    networks
  • Protein-protein interactions curated from the
    literature
  • Generate hypotheses for experimental validation

www.ingenuity.com
Top pathway: insulin receptor signaling
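A common generic way to ask which pathway a gene list "hits" is an over-representation test: how surprising is the overlap between the list and a pathway, given the total number of genes measured? The Python below is a sketch of a one-sided hypergeometric test (equivalent to a right-tailed Fisher's exact test); it is not a reimplementation of Ingenuity's software, and the pathway names and gene sets are hypothetical.

```python
from math import comb


def overrepresentation_p(n_universe, n_pathway, n_list, n_overlap):
    """P(X >= n_overlap), where X is the number of pathway genes among n_list
    genes drawn at random from n_universe genes (hypergeometric)."""
    upper = min(n_pathway, n_list)
    hits = sum(comb(n_pathway, k) * comb(n_universe - n_pathway, n_list - k)
               for k in range(n_overlap, upper + 1))
    return hits / comb(n_universe, n_list)


def rank_pathways(gene_list, pathways, universe):
    """(p-value, pathway name, overlap) for every pathway hit by the list, best first."""
    genes = set(gene_list) & universe
    results = []
    for name, members in pathways.items():
        members = members & universe
        overlap = len(genes & members)
        if overlap:
            p = overrepresentation_p(len(universe), len(members), len(genes), overlap)
            results.append((p, name, overlap))
    return sorted(results)


universe = {f"g{i}" for i in range(1, 21)}  # 20 measured genes (hypothetical)
pathways = {"pathway_A": {"g1", "g2", "g3", "g4"},
            "pathway_B": {"g5", "g6", "g7", "g8", "g9", "g10"}}
print(rank_pathways(["g1", "g2", "g3", "g11"], pathways, universe))
```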
59
Connectivity Map
  • Database of gene expression profiles of common
    cell lines treated with drugs.
  • Multiple batches, different doses
  • Connect gene signatures with signatures of drug
    response

http://www.broad.mit.edu/cmap/ Lamb J et al.,
Science 2006
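The connection between a query signature and a drug-response profile can be sketched with a score modeled on the one described by Lamb et al. (2006): the up- and down-regulated genes of the signature are each scored against a ranked drug-treated profile with a Kolmogorov-Smirnov-type statistic, and the two scores are combined. The Python below computes only a raw score for one profile; the normalization across all profiles, the function names and the toy ranks are assumptions beyond what the slide states.

```python
def ks_score(tag_ranks, n_genes):
    """KS-type score of one tag list against one ranked profile.

    tag_ranks: 1-based positions of the signature genes in the profile's list of
    n_genes genes, ordered from most up-regulated to most down-regulated."""
    v = sorted(tag_ranks)
    t = len(v)
    a = max(j / t - v[j - 1] / n_genes for j in range(1, t + 1))
    b = max(v[j - 1] / n_genes - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b  # positive: tags near the top; negative: near the bottom


def connectivity_score(up_ranks, down_ranks, n_genes):
    """Raw connectivity: positive if the profile mimics the signature, negative if it
    reverses it, zero if the up and down tags move the same way."""
    ks_up = ks_score(up_ranks, n_genes)
    ks_down = ks_score(down_ranks, n_genes)
    if (ks_up > 0) == (ks_down > 0):
        return 0.0
    return ks_up - ks_down


# Hypothetical 100-gene profiles and a signature of 3 up and 3 down genes.
print(connectivity_score([1, 2, 3], [98, 99, 100], 100))  # mimics the signature
print(connectivity_score([98, 99, 100], [1, 2, 3], 100))  # reverses it
print(connectivity_score([1, 2, 3], [4, 5, 6], 100))      # no consistent connection
```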
60
A publicly available dataset of MCF7 cells treated
with estradiol was profiled. These profiles were highly
correlated with those of MCF7 cells treated with
estradiol in the Connectivity Map, and negatively
correlated with those of cells treated with
anti-estrogens
61
Clusters, if they exist, are consistent
  • Different methods look at different structures in
    data. However, if the separation is clear, the
    resulting clusters should be similar.

62
Illumina
Low sample input requirements: just 50-100 ng of
total RNA required
Low per-sample cost: less than half the price of
other commercial arrays
63
(No Transcript)