Introduction to

1
Introduction to Gene expression profiling
Christine Desmedt, PhD
Translational Research Unit
Université Libre de Bruxelles, Institut Jules Bordet
Brussels, Belgium
2

Introduction
The technique
Standardization
Bio-informatics
Use of Gene expression profiling in breast cancer

3
Introduction
4
From genes to proteins
DNA (replication) -> transcription -> RNA (gene expression) -> translation -> Protein (protein expression)
DNA-level alterations: SNPs, mutations, amplifications,
deletions, gene fusions, chromosomal aberrations
5
Sequence of the GENOME
Progress in technology and Bio-informatics
Era of molecular medicine
Gene expression profiles
6
Microarray experiments

Advantages
Measure the expression of several thousand genes
simultaneously
Possibility to discover important pathways and
genes relevant to the clinical problem
Examine a snapshot of gene expression of the tumor
and its environment, rather than individual genes
Disadvantages
Huge volume of data produced: risk of false
discoveries ("fishing expeditions")
How to interpret the data and generate useful
biological information remains a significant challenge
Clinical validation and utility of findings
remain unclear

7
The technique
8
Microarray platforms

Probe implementation
Full-length cDNA
Oligonucleotides (presynthesized, in situ
synthesized)
Target-labeling strategies
Single-color detection
Dual-color detection

9
Probe implementation
cDNA
Oligonucleotide
Short (Affymetrix)
Long (Agilent)
10
Full-length cDNA: in-house printing
384 well Plate
slide
Platter 100 slides
11
Printing head and pin
12
Short oligonucleotide array Affymetrix
13
Short oligonucleotide array Affymetrix
Fully integrated instrument that maximizes data
reproducibility and laboratory productivity by
minimizing user intervention
14
Signal detection: DUAL color detection
15
Normal: wavelength 635 nm
Tumor: wavelength 532 nm
Composite image
16
Normal tissue, wavelength 635 nm: spots 1, 2, 3
Tumor, wavelength 532 nm: spots 1, 2, 3
Ratio (635 nm/532 nm): spots 1, 2, 3
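
A minimal sketch of how the per-spot ratio between the two channels could be computed. The intensity values and the function name `log2_ratios` are hypothetical, and the log2 transform is a common convention rather than something stated on the slide.

```python
import math

def log2_ratios(normal_635, tumor_532):
    """Return log2(635 nm / 532 nm) for each spot."""
    if len(normal_635) != len(tumor_532):
        raise ValueError("both channels must cover the same spots")
    return [math.log2(n / t) for n, t in zip(normal_635, tumor_532)]

# Hypothetical background-corrected intensities for spots 1, 2, 3.
normal = [800.0, 400.0, 100.0]
tumor = [200.0, 400.0, 400.0]
ratios = log2_ratios(normal, tumor)
print(ratios)
```

With this convention, a positive value means a stronger signal in the 635 nm (normal) channel, a negative value a stronger signal in the 532 nm (tumor) channel, and zero means equal signal.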
17
Image Analysis
18
Results
19
Signal detection: SINGLE color detection
(Affymetrix)
Streptavidin-phycoerythrin (SAPE), counterstained
with biotinylated anti-streptavidin
20
Signal detection: SINGLE color detection
(Affymetrix)
47,000 transcripts!
21
Signal detection: SINGLE color detection
(Affymetrix)
22
Signal detection: SINGLE color detection
(Affymetrix)
23
Standardization
24
Standardization (1)

MIAME guidelines
The Minimum Information About a Microarray Experiment
checklist helps define the level of detail that
should be reported and is being adopted by many
journals as a requirement for the submission of
papers incorporating microarray results.

25
Standardization (2)
MIAME (Minimum Information About a Microarray Experiment)
http://www.mged.org/Workgroups/MIAME/miame.html
26
Standardization (2): public repositories

Gene Expression Omnibus (GEO)
(http://www.ncbi.nlm.nih.gov/geo/)
A gene expression/molecular abundance repository
supporting MIAME-compliant data submissions, and
a curated online resource for gene expression
data browsing, query and retrieval.
ArrayExpress
(http://www.ebi.ac.uk/arrayexpress/)
A public repository for microarray data, aimed at
storing MIAME-compliant data in accordance with
MGED recommendations. The ArrayExpress Data Warehouse
stores gene-indexed expression profiles from a
curated subset of experiments in the repository.

27
Standardization (2): public repositories
28
Standardization (2): public repositories
29
Standardization: MIAME

The raw data for each hybridisation (e.g., CEL or
GPR files)
The final processed (normalised) data for the set
of hybridisations in the experiment (study)
(e.g., the gene expression data matrix used to
draw the conclusions from the study)
The essential sample annotation including
experimental factors and their values (e.g.,
compound and dose in a dose response experiment)
The experimental design including sample data
relationships (e.g., which raw data file relates
to which sample, which hybridisations are
technical, which are biological replicates)
Sufficient annotation of the array (e.g., gene
identifiers, genomic coordinates, probe
oligonucleotide sequences or reference commercial
array catalog number)
The essential laboratory and data processing
protocols (e.g., what normalisation method has
been used to obtain the final processed data)

30
Standardization (3): MAQC

MAQC initiative
The MicroArray Quality Control (MAQC) Project is
being conducted by the FDA to develop standards
and quality control metrics that will eventually
allow the use of microarray data in drug
discovery, clinical practice and regulatory
decision-making.

4 mRNA samples
5 replicates
6 microarray platforms
3 laboratories

Nature Biotechnology, 24, 9, 1151-1161, 2006
32
Coefficient of variation

4 mRNA samples
5 replicates
6 microarray platforms
3 laboratories

CV: 5-15% within laboratories, 10-20% between
laboratories
Nature Biotechnology, 24, 9, 1151-1161, 2006
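
As an illustration of the metric, the coefficient of variation (CV) is the standard deviation of replicate measurements divided by their mean. The sketch below uses hypothetical replicate values and a hypothetical function name; it is not the MAQC analysis pipeline.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / mean

# Hypothetical signal of one probe across 5 replicate hybridizations.
replicates = [90.0, 95.0, 100.0, 105.0, 110.0]
cv = coefficient_of_variation(replicates)
print(f"CV = {cv:.1f}%")
```

A lower CV means the replicate measurements agree more closely.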
33
MAQC Findings
Microarray data are:

Repeatable within a laboratory
Reproducible across laboratories
Concordant across platforms
Comparable with quantitative technologies, e.g.,
QPCR
Reflective of biology regardless of the
differences in technology

These findings hold when differential gene expression
is assessed in terms of fold-change (FC) ranking.
34
Two Phases of the MAQC Project, Addressing Two
Types of Microarray Applications
I. Class Comparison (MAQC-I): what makes the two
populations different?
Differentially Expressed Genes (DEGs)
Better understanding of the biological mechanisms
II. Class Prediction (MAQC-II): can the outcome of new
individuals be predicted?
Predictive Models (Classifiers)
Diagnosis, treatment outcome, prognosis,
personalized medicine
35
Bio-informatics and examples
36
Collection, transformation and representation of
the data
Raw data (single- or dual-color)
Background correction, data transformation,
normalization (to correct for differences in labeling,
hybridization and detection methods)
Filtering (elimination of genes with minimal
variance)
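
The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: a constant background, a log2 transformation, per-sample median centering as the normalization, and a variance cutoff for filtering. The function name, gene names and values are hypothetical, and real platforms use more elaborate methods.

```python
import math
import statistics

def preprocess(raw, background, min_variance):
    """raw: {gene: [signal per sample]}; returns filtered, normalized log2 data."""
    genes = list(raw)
    n_samples = len(next(iter(raw.values())))
    # 1. Background correction (floored at 1 so the log is defined),
    # 2. followed by a log2 transformation.
    logged = {
        g: [math.log2(max(v - background, 1.0)) for v in raw[g]] for g in genes
    }
    # 3. Normalization: subtract each sample's median so arrays are comparable.
    medians = [
        statistics.median(logged[g][s] for g in genes) for s in range(n_samples)
    ]
    normalized = {
        g: [logged[g][s] - medians[s] for s in range(n_samples)] for g in genes
    }
    # 4. Filtering: drop genes whose variance across samples is minimal.
    return {
        g: vals for g, vals in normalized.items()
        if statistics.variance(vals) >= min_variance
    }

# Hypothetical raw intensities for 3 genes in 3 samples.
raw = {
    "GENE_A": [132.0, 1028.0, 132.0],
    "GENE_B": [260.0, 516.0, 260.0],
    "GENE_C": [516.0, 2052.0, 516.0],
}
filtered = preprocess(raw, background=4.0, min_variance=0.1)
print(filtered)
```

In this toy data set GENE_C is constant after normalization, so the variance filter removes it.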
37
Development of an expression matrix
38
Unsupervised analyses: discover classes of
tumors/specimens or genes
Supervised analyses: discover genes associated with
phenotype and build a prediction model
39
Unsupervised analyses
Discover classes of tumors/specimens or genes

Cluster analysis algorithms
Hierarchical
K-means
Self-Organizing Maps
Multitude of others

40
Unsupervised analysis: clustering
Example: we measure the expression of 3 genes for a
set of patients; samples are displayed in this
gene space
Axes: gene expression level
41
Unsupervised analysis: clustering
Patients can be grouped into three different
clusters
Axes: gene expression level
42
Unsupervised analysis: clustering

Widely used (Hartigan, 1975; Eisen et al., 1998)
Organizing objects in a hierarchical tree
(dendrogram) based on their degree of
dissimilarity
Linkage: distance between two clusters of
objects
Assess quality: stability and robustness
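
To make the idea of linkage concrete, here is a minimal sketch of agglomerative hierarchical clustering. It assumes Euclidean distance and average linkage, which are choices made for this illustration and are not stated on the slide. The sample values are hypothetical (3 genes, 6 samples).

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, points):
    """Linkage: mean pairwise distance between members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def hierarchical_clustering(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the clusters (lists of sample indices) and the merge history,
    which is the information a dendrogram displays.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        dist = average_linkage(clusters[a], clusters[b], points)
        merges.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical expression of 3 genes in 6 samples.
samples = [
    (1.0, 1.0, 1.0), (1.2, 0.9, 1.1),
    (5.0, 5.0, 5.0), (5.1, 4.8, 5.2),
    (9.0, 1.0, 9.0), (9.2, 1.1, 8.9),
]
clusters, merges = hierarchical_clustering(samples, n_clusters=3)
print(sorted(sorted(c) for c in clusters))
```

Stability could then be probed by re-running the clustering on perturbed or resampled data and checking whether the same groups reappear.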

43
Unsupervised analysis: EXAMPLE

65 human breast cancer (BC) samples from 42 individuals
20 patients had profiles before and after chemotherapy (CT)
An unsupervised hierarchical clustering method was
used to group genes on the basis of similarity of
their patterns of gene expression
Genes are ranked vertically and samples
horizontally
ER status was found to be a major discriminator of
subtypes
Breast tumors show great variation in gene
expression
Gene expression is multi-dimensional, i.e. many
different gene sets are differentially expressed
Tumor samples from the same patient clustered
together

Perou et al., 2000
44
Supervised analysis

Class Comparison
To compare the gene expression profiles of 2 or
more groups of patients
Statistical test
- two classes: t-test, Wilcoxon rank sum test
- more than 2 classes: ANOVA, Kruskal-Wallis test
Significance: p-value
Multiple testing
- many hypotheses are tested simultaneously
- Example: 10,000 genes tested at p-value < 0.05 give
about 500 false positives by chance alone
Correction needed
Problems with traditional methods (e.g.,
Bonferroni)
Most assume variable independence
Many are considered too stringent
Trade-off between biological information and
false positives
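
The arithmetic behind the multiple-testing example can be sketched as follows. The Bonferroni threshold comes from the slide; the Benjamini-Hochberg procedure is not named in the presentation and is included here only as one commonly used, less stringent alternative. All p-values are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives if no gene is truly different."""
    return n_tests * alpha

def bonferroni_threshold(n_tests, alpha):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def benjamini_hochberg(p_values, fdr):
    """Indices of tests declared significant at the given false discovery rate."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    n = len(p_values)
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its step-up threshold.
        if p_values[idx] <= fdr * rank / n:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

n_false = expected_false_positives(10_000, 0.05)
threshold = bonferroni_threshold(10_000, 0.05)
print(n_false, threshold)

# Hypothetical p-values for 5 genes.
p_vals = [0.001, 0.012, 0.025, 0.041, 0.60]
bh_hits = benjamini_hochberg(p_vals, fdr=0.05)
bonf_hits = [i for i, p in enumerate(p_vals) if p < bonferroni_threshold(5, 0.05)]
print(bh_hits, bonf_hits)
```

In this toy example Bonferroni retains a single gene while the false-discovery-rate approach retains three, which illustrates the trade-off between stringency and retained biological information.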

45
Supervised analysis
The apparent lack of reproducibility in
identifying differentially expressed genes across
different platforms and sites results from ranking
genes by P-value only!
FC-ranking should be used in combination with a
nonstringent P threshold to select a DEG list
that is reproducible, specific, and sensitive;
such a joint rule is recommended as a baseline
practice.
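
A minimal sketch of such a joint rule, assuming each gene already has a log2 fold change and a p-value. The function name, the threshold and the list length are hypothetical choices for illustration.

```python
def select_degs(genes, p_threshold, n_top):
    """genes: list of (name, log2_fold_change, p_value) tuples.

    Keep genes passing a nonstringent p-value threshold, then rank the
    survivors by absolute fold change and return the top n_top names.
    """
    passing = [g for g in genes if g[2] < p_threshold]
    ranked = sorted(passing, key=lambda g: abs(g[1]), reverse=True)
    return [name for name, _, _ in ranked[:n_top]]

# Hypothetical results of a class comparison.
results = [
    ("GENE_A", 0.3, 0.0001),  # tiny change, very small p-value
    ("GENE_B", -2.5, 0.02),
    ("GENE_C", 3.1, 0.04),
    ("GENE_D", 4.0, 0.30),    # large change, fails the p-value filter
    ("GENE_E", 1.2, 0.01),
]
degs = select_degs(results, p_threshold=0.05, n_top=3)
print(degs)
```

Ranking on p-value alone would put GENE_A first despite its negligible fold change; the joint rule places it last among the genes that pass the filter.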
46
Supervised analysis
Class prediction: to create a multi-gene predictor
Split the data set randomly into a training set and
a test set
Training set
1. Identify discriminating genes between two
groups of interest
2. Construct classifier by combining genes with
predictive machine learning algorithms
3. Estimate classification error rate in
leave-one-out cross validation by repeating steps 1-2
4. Select best classifier
5. Test significance in permutation test
Test set
6. Validate predictor accuracy on independent data
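
Steps 1 to 3 and step 6 can be sketched as below. The nearest-centroid classifier and the difference-in-means gene score are assumptions made for this illustration, since the slide does not name a specific algorithm. The data are hypothetical. Gene selection is repeated inside every cross-validation fold, so the left-out sample never influences the genes chosen.

```python
import statistics

def select_genes(X, y, n_genes):
    """Step 1: rank genes by absolute difference in class means."""
    def score(j):
        m0 = statistics.mean(x[j] for x, c in zip(X, y) if c == 0)
        m1 = statistics.mean(x[j] for x, c in zip(X, y) if c == 1)
        return abs(m1 - m0)
    return sorted(range(len(X[0])), key=score, reverse=True)[:n_genes]

def train(X, y, genes):
    """Step 2: nearest-centroid classifier on the selected genes."""
    return {
        c: [statistics.mean(x[j] for x, lab in zip(X, y) if lab == c)
            for j in genes]
        for c in (0, 1)
    }

def predict(centroids, genes, x):
    def dist(c):
        return sum((x[j] - m) ** 2 for j, m in zip(genes, centroids[c]))
    return min((0, 1), key=dist)

def loocv_error(X, y, n_genes):
    """Step 3: repeat steps 1-2 with each sample left out in turn."""
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_tr, y_tr, n_genes)
        centroids = train(X_tr, y_tr, genes)
        errors += predict(centroids, genes, X[i]) != y[i]
    return errors / len(X)

# Hypothetical training set: 6 samples, 3 genes, two classes.
X_train = [[1.0, 5.0, 3.0], [1.2, 4.8, 3.1], [0.9, 5.1, 2.9],
           [4.0, 5.0, 3.0], [4.2, 4.9, 3.2], [3.9, 5.2, 2.8]]
y_train = [0, 0, 0, 1, 1, 1]
cv_error = loocv_error(X_train, y_train, n_genes=1)

# Step 6: validate on independent (here hypothetical) test data.
X_test, y_test = [[1.1, 5.0, 3.0], [4.1, 5.0, 3.0]], [0, 1]
genes = select_genes(X_train, y_train, 1)
centroids = train(X_train, y_train, genes)
test_accuracy = sum(
    predict(centroids, genes, x) == c for x, c in zip(X_test, y_test)
) / len(y_test)
print(cv_error, test_accuracy)
```

Steps 4 and 5 (comparing several classifiers and a permutation test on shuffled class labels) are omitted to keep the sketch short.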
47
Supervised analysis: EXAMPLE
70-gene signature

Found 231 genes correlated with distant metastasis (DM)
Ranked in order of significance (p-value)
70 genes were chosen
Validation in 19 patients: 17/19 correctly
predicted
Established clinical utility: could outperform the St
Gallen and NCI criteria in predicting who needed
CT (i.e. who relapsed) and who did not

Figure: 78 tumors classified as good signature or poor signature
van 't Veer et al., Nature, 2002
Validation series: n = 151 node-negative patients (+ 144
node-positive patients), van de Vijver et al., NEJM, 2002
48
Supervised analysis: EXAMPLES
META-ANALYSIS of PUBLICLY AVAILABLE DATA
DIFFERENT PROGNOSTIC GENE SIGNATURES

Proliferation is the common denominator of the
different signatures
These signatures perform well mainly in ER+
patients
Immune response and tumor invasion may
differentiate tumors with better and worse
prognosis in ER- and HER2+ patients

Mammaprint
Genomic Grade
76-gene signature
Oncotype
Wound signature

49
Independent validation studies

Role: to confirm the results of a previous study,
in order to reduce the play of chance and the
potential for bias
Ransohoff, Nat Rev Cancer 2004, 2005
Common mistakes
Including part of the initial sample of patients
Including other types of patients
Using another measurement technique (e.g., RT-PCR)
Changing the prediction rule to adapt it to the
new set, i.e. playing with the data: different algorithm,
different cut-off, changing genes, etc.

50
JNCI 2007
Critical review of 90 outcome-related
statistical analyses of microarray studies
published between 2000 and 2004
Development of a checklist
1. Need for clear objectives; study objectives
should influence patient selection!
2. Class-discovery methods are not really suited for
outcome-related analyses; they are more useful, for
example, to elucidate pathways.
3. Class comparison analyses are appropriate when
outcomes are discrete. If the outcome is survival, we
lose information by making discrete groups.
4. Data used for developing a predictor should be
distinct from data used to validate it.
51
Some easy tools
BRB-ArrayTools: developed by Richard Simon and the
BRB-ArrayTools Development Team
52
Molecular classification
Tumor
microenvironment
CTCs
Prediction of treatment efficacy
Understanding the biology
53
Questions are welcome!
54
Back up
55
Better understanding of Tumor Biology
56
Understanding biological mechanisms

Extracting biological insight from microarray
data remains a major challenge
Long gene lists are often produced, and these genes
change with different datasets
After correcting for multiple testing, no
significant genes may be found
Single-gene analysis may miss important pathway
effects
Interpretation often depends heavily on the
laboratory's or supervisor's area of expertise

57
Gene set enrichment analysis

Gene sets are used rather than single genes
What matters is a significant fold change across all
genes in a gene set, rather than a dramatic fold
change in a single gene
Cellular processes often affect sets of genes

Genes are ranked based on their correlation with a
phenotype. An Enrichment Score is then calculated, which
reflects the degree to which the gene set is
overrepresented at the extremes of the ranked list.
Mootha V et al., Nat Genet 2003; Subramanian A et
al., PNAS 2005; Segal et al., Nat Genet 2004
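
The idea of an enrichment score can be illustrated with a simplified, unweighted running sum (a Kolmogorov-Smirnov-like statistic). This is a sketch under that simplifying assumption and omits the correlation weighting and permutation-based significance used in published implementations. Gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Maximum deviation from zero of a running sum over the ranked list.

    The sum goes up at each gene in the set and down at each gene outside it.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some but not all ranked genes")
    running, best = 0.0, 0.0
    for hit in hits:
        running += 1.0 / n_hits if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical list of 10 genes ranked by correlation with the phenotype.
ranked = [f"G{i}" for i in range(1, 11)]
es_top = enrichment_score(ranked, {"G1", "G2", "G4"})
es_bottom = enrichment_score(ranked, {"G8", "G9", "G10"})
print(es_top, es_bottom)
```

A gene set concentrated at the top of the ranking gives a strongly positive score, and one concentrated at the bottom gives a strongly negative score.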
58
Ingenuity Pathways Analysis

View gene lists within framework of functional
networks
Protein-protein interactions curated from the
literature
Generate hypotheses for experimental validation

www.ingenuity.com
Top pathway: insulin receptor signaling
59
Connectivity Map

Database of gene expression profiles of common
cell lines treated with drugs.
Multiple batches, different doses
Connect gene signatures with signatures of drug
response

http://www.broad.mit.edu/cmap/
Lamb J et al., Science 2006
60
A publicly available dataset of MCF7 cells
treated with estradiol and profiled was used. Its
profiles were highly correlated with those in the
Connectivity Map of MCF7 cells treated with
estradiol, and negatively correlated with those of
cells treated with anti-estrogens
61
Clusters, if they exist, are consistent

Different methods look at different structures in
data. However, if the separation is clear, the
resulting clusters should be similar.

62
Illumina
Low sample input requirements: just 50-100 ng of
total RNA required
Low per-sample cost: less than half the price of
other commercial arrays