Title: Gene expression studies of cancer: gene transcription signatures
1. Gene expression studies of cancer: gene transcription signatures
- Chad Creighton
- February 2009
2. Oncogenic signaling pathways in cancer
Mutation/deregulation of a handful of genes can make cells into cancer cells.
Hanahan and Weinberg. Cell. 2000;100:57-70.
4. Widespread deregulation of gene expression in cancer
- Gene expression profiling distinguishes prostate cancer from normal prostate and from BPH.
Dhanasekaran et al. Nature. 2001 Aug 23;412(6849):822-6.
5. Widespread deregulation of gene expression in cancer
- Gene expression profiling identifies different subtypes of breast cancer.
Sorlie et al. PNAS. 2003;100(14):8418-23.
6. A gene-expression signature as a predictor of survival in breast cancer
www.agendia.com
Van de Vijver et al. NEJM. 2002;347(25):1999-2009.
7. A 21-gene assay to predict recurrence of breast cancer
Paik et al. NEJM. 2004;351(27):2817-26.
8. Oncogenic pathway signatures in human cancers as a guide to targeted therapies
- Use oncogenic signatures to predict response of cell lines to targeted therapy.
Bild et al. Nature. 2006;439(7074):353-7.
9. Oncogenic signatures of ERBB2, EGFR, MEK, RAF, and MAPK in breast cancer cells
Creighton et al. Cancer Res. 2006;66(7):3903-11.
10. Preliminary gene expression profiling studies of cancer
- Hundreds of genes are deregulated in cancer.
- Different subtypes of cancer are defined by gene expression profiling.
- Gene expression signatures may predict cancer patient survival.
- Gene expression signatures of oncogenic signaling pathways can be defined using experimental models (cell lines, mice).
11. Potential uses for gene expression profiling of cancer
- Define and understand the molecular pathways that underlie cancer.
- Define subgroups of patients for the purposes of optimizing treatment.
- Determine whether or not a patient would benefit from a given therapy (e.g. chemotherapy).
- Determine which specific pathways are deregulated in the tumor and treat the tumor with therapies that target those pathways (e.g. hormone therapy for ER+ breast cancer).
12. General concepts of gene expression analysis
- Low level analysis
  - Processing image files
  - Normalization
  - Quality Control (QC)
- High level analysis
  - Clustering
  - Selecting differentially expressed genes
  - Enrichment analysis or Meta-analysis
13. Publicly available gene expression profile data represent a rich resource
- When publishing studies using gene expression profile data, authors are encouraged to make the data available to everyone.
- Subsequent studies can re-analyze the data with different questions in mind from those of the original authors.
14.
- The GEO database (http://www.ncbi.nlm.nih.gov/geo/) makes thousands of expression profile datasets publicly available.
- Many top journals require microarray studies to make data public on GEO.
15. Pathway-related gene sets: Gene Ontology (GO) terms
- The Gene Ontology project provides a controlled vocabulary to describe gene attributes.
- Three major categories:
  - Cellular component
  - Biological process
  - Molecular function
- The controlled vocabularies are structured so that they can be queried at different levels.
  - For example, use GO to find all gene products involved in signal transduction, or zoom in on all receptor tyrosine kinases.
www.geneontology.org
16. Pathway-related gene sets: Molecular Signatures Database (MSigDB)
- From the Broad Institute
- Collection of gene sets curated from the literature (including gene expression profiling studies).
- Current version represents over 1800 pathway-associated gene sets.
http://www.broad.mit.edu/gsea/msigdb/index.jsp
17. Gene signatures
- Will be loosely defined here to mean a set of genes that are functionally associated with each other in some way.
- Ways to define gene signatures:
  - Gene annotation (e.g. Gene Ontology terms)
  - Curated pathway-associated gene sets
  - Literature review articles
  - Gene expression signature: a gene signature defined using expression profiling data (e.g. what genes go up or down in response to treatment in an experimental model)
18. Gene expression signatures
- When using expression profiling to define genes, a gene expression signature consists of two things:
  - A set of genes going up (relative to something).
  - A set of genes going down (relative to something).
- Relative direction of the genes (up-regulated vs down-regulated, or over-expressed vs under-expressed) is important.
  - Keep the up genes separated from the down genes.
19. How do we relate gene expression profile results from different datasets to each other?
20. Methods for determining enrichment of gene signatures within the overall patterns of another expression profile dataset
OR: How do we relate gene expression profile results from different datasets to each other?
21. The enrichment problem
- A: Given a gene set or sets of interest.
  - i.e. a gene signature
- B: Given an independent expression dataset with the profiled genes being ranked by a specified metric.
  - e.g. cancer vs. normal or correlation with MYC.
- Are the genes in (A) enriched within (B)?
  - i.e. do the results of (A) and (B) overlap significantly?
22. Methods for determining enrichment
- Venn diagram, or marble jar approach
  - Take the top set of genes from the expression dataset (dataset B) and tabulate the amount of overlap with the independent gene set of interest (dataset A).
- Rank-based approach
  - Use the entire dataset, including genes of borderline significance or showing a weak trend towards significance.
- Correlation approach
  - For a set of genes, compute the correlation between two sets of weighting factors (based on different profiling datasets).
23. Venn diagram enrichment analysis
- Requires us to make a cut to define what the top genes are.
- Significance of overlap may be determined by chi-square or one-sided Fisher's exact tests.
24. Venn diagram enrichment analysis
Define gene set of interest
25. Venn diagram enrichment analysis
Define differentially expressed genes
26. Venn diagram enrichment analysis
Determine overlap between the two gene sets
27. Hypergeometric formula (one-sided Fisher's exact test)
- Number of genes in total population: G
- Genes in G falling under pre-defined class: A
- Number of genes selected: k
- Number of selected genes k in class A: n
- The number of genes expected to overlap by chance: (k × A)/G
- One-sided Fisher's exact test determines whether n is significantly greater than (k × A)/G.
28. Hypergeometric formula (one-sided Fisher's exact test)
- Number of genes in total population: G
- Genes in G falling under pre-defined class: A
- Number of genes selected: k
- Number of selected genes k in class A: n
- The probability P for the term occurring n or more times within a set of k genes randomly selected from the population:
$P = \sum_{i=n}^{\min(k,A)} \binom{A}{i}\binom{G-A}{k-i} \Big/ \binom{G}{k}$
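The calculation on this slide can be sketched in R. The helper name `overlap_test` and the example numbers are assumptions made for illustration; `phyper` is the cumulative hypergeometric function in base R, and the upper tail is taken from n - 1 so that the result is the probability of n or more.

```r
# Hypothetical helper: one-sided hypergeometric (Fisher's exact) overlap test.
# G = genes in total population, A = genes in the pre-defined class,
# k = genes selected, n = selected genes that fall in the class.
overlap_test <- function(G, A, k, n) {
  expected <- k * A / G
  # P(X >= n): the upper tail starts just above n - 1
  p <- phyper(n - 1, A, G - A, k, lower.tail = FALSE)
  list(expected = expected, p.value = p)
}

# Illustrative numbers only: population of 100, class of 10, 20 selected, 5 in class
res <- overlap_test(G = 100, A = 10, k = 20, n = 5)
print(res$expected)
print(res$p.value)
```

A small p-value here only says that n exceeds the chance expectation (k × A)/G by more than random selection would typically produce.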
29. What is the total gene population (G)?
- Can represent the number of genes profiled on the array chip.
- What if two different array platforms were used (a different set of genes is typically represented on each)?
  - Use the common set of genes represented on both array chips as the total population (do not consider genes not represented on both arrays)
  - Use ONE of the two array platforms to define the gene population (do not consider genes on the other array platform that are not represented on the first platform)
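As a minimal sketch of the first option (the common set of genes), the R lines below restrict two signatures to the genes shared by both platforms before counting. All gene lists and variable names are hypothetical.

```r
# Hypothetical gene identifiers on two array platforms
platform1 <- c("EGFR", "ERBB2", "MYC", "AKT1", "KRAS", "TP53")
platform2 <- c("EGFR", "MYC", "AKT1", "CCND1", "TP53", "ESR1")

# Hypothetical signatures, each defined on its own platform
sig1 <- c("EGFR", "ERBB2", "MYC")
sig2 <- c("EGFR", "MYC", "CCND1")

# Total population = genes represented on both platforms
population <- intersect(platform1, platform2)

# Drop signature genes that are not in the shared population
sig1_common <- intersect(sig1, population)
sig2_common <- intersect(sig2, population)

G <- length(population)                           # total population
A <- length(sig1_common)                          # genes in the class
k <- length(sig2_common)                          # genes selected
n <- length(intersect(sig1_common, sig2_common))  # overlap
c(G = G, A = A, k = k, n = n)
```

Filtering both signatures against the same population keeps G, A, k and n consistent with each other, which the hypergeometric test assumes.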
30. A gene signature of mutation of EGFR in NSCLC cell lines
- Compared lung cancer cell lines with or without an activating mutation in EGFR.
- Wanted to compare this gene signature with another gene signature of EGFR.
Lung cancer cell lines
Choi, Creighton, et al., PLoS ONE 2(11):e1226.
31. Oncogenic signatures of ERBB2, EGFR, MEK, RAF, and MAPK in breast cancer cells
- Does the published MCF-7 EGFR signature overlap with the NSCLC EGFR signature?
Creighton et al. Cancer Res. 2006;66(7):3903-11.
32. Compare NSCLC EGFR mutant signature with a signature of EGFR-transfected MCF-7 cells
- EGFR wt NSCLC genes: 119
- MCF7 EGFR genes: 1152
- Genes shared between MCF7/NSCLC array platforms: 11079
- Genes shared between MCF7/NSCLC gene signatures: 44
Significance of overlap: p < 1E-10
One-sided Fisher's exact test
Choi, Creighton, et al., PLoS ONE 2(11):e1226.
33. A gene signature of mutation of EGFR in NSCLC cell lines is enriched with EGFR-dependent genes.
Choi, Creighton, et al., PLoS ONE 2(11):e1226.
34. Experimental models versus clinical tumors
- Molecular data from experimental models represent dynamic information, but clinical relevance is not always clear (e.g. could represent experimental artifacts).
- Data from clinical tumor specimens represent more static information, where the associations observed may be pathologically relevant.
35. Experimental models versus clinical tumors
- From clinical data alone, cannot distinguish cause-and-effect associations from correlation.
- In cancer studies, it is important to combine the experimental with the clinical.
- Some researchers may doubt the validity of experimental results unless they can be shown to apply to human tissues.
36. Rank-based enrichment analysis
Locations of genes from set B
Rank ordered genes from dataset A
- Rank-based approaches use all of the genes from one of the datasets to determine enrichment (they do not make a cut).
37. GSEA (rank-based) enrichment analysis
38. GSEA (rank-based) enrichment analysis
All the genes in the dataset are used here
Subramanian, Aravind et al. (2005) Proc. Natl. Acad. Sci. USA 102, 15545-15550
- Start from the top of the ranked list.
- Add points to the random walk for each gene you find in S.
- Remove points from the random walk for each gene not in S.
39. GSEA: Kolmogorov-Smirnov statistic
Consider the genes $R_1, \ldots, R_N$ that are ordered on the basis of the difference metric between the two classes and a gene set $S$ containing $G$ members. We define
$X_i = -\sqrt{G/(N-G)}$ if $R_i$ is not a member of $S$, or
$X_i = \sqrt{(N-G)/G}$ if $R_i$ is a member of $S$.
We then compute a running sum across all $N$ genes. The ES is defined as
$\mathrm{ES} = \max_{1 \le j \le N} \sum_{i=1}^{j} X_i$,
or the maximum observed positive deviation of the running sum.
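The running sum described on this slide can be sketched in a few lines of R. This follows only the definition given here (a fixed step up for members of S, a fixed step down otherwise); the function name `enrichment_score` and the toy gene names are assumptions, and this is not the GSEA software itself.

```r
# Enrichment score as the maximum positive deviation of the running sum.
# ranked_genes: gene IDs ordered by the difference metric (top of list first)
# gene_set:     the gene set S
enrichment_score <- function(ranked_genes, gene_set) {
  N <- length(ranked_genes)
  in_set <- ranked_genes %in% gene_set
  G <- sum(in_set)                     # members of S present in the ranked list
  stopifnot(G > 0, G < N)
  step <- ifelse(in_set, sqrt((N - G) / G), -sqrt(G / (N - G)))
  running <- cumsum(step)
  list(ES = max(running), running = running)
}

# Toy example (hypothetical gene names): S sits at the very top of the list
ranked <- paste0("g", 1:10)
res <- enrichment_score(ranked, c("g1", "g2", "g3"))
res$ES
res$running
```

With these step sizes the running sum returns to zero at the end of the list, so the peak height reflects how strongly the members of S are concentrated toward the top.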
40. GSEA: Kolmogorov-Smirnov statistic
- The ES score (the peak of the random walk) is just a number.
- Need to evaluate the significance of the number by some type of permutation testing:
  - Permute the sample labels many times, OR
  - Permute the gene sets (i.e. randomly generate gene sets).
- In either case, compare the distribution of scores from random tests with the actual score.
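A minimal sketch of the second option (randomly generated gene sets) is shown below, assuming an unweighted running-sum score; the ranked list, the gene set and the number of permutations are all hypothetical. Permuting sample labels would instead require the underlying expression matrix and is not shown.

```r
set.seed(1)  # reproducible random gene sets

# Compact ES helper (maximum positive deviation of the running sum)
es <- function(ranked_genes, gene_set) {
  in_set <- ranked_genes %in% gene_set
  N <- length(ranked_genes)
  G <- sum(in_set)
  max(cumsum(ifelse(in_set, sqrt((N - G) / G), -sqrt(G / (N - G)))))
}

ranked   <- paste0("g", 1:200)               # hypothetical ranked gene list
gene_set <- paste0("g", c(1:8, 50, 120))     # hypothetical set, mostly near the top

observed <- es(ranked, gene_set)

# Null distribution: ES for random gene sets of the same size
n_perm  <- 1000
null_es <- replicate(n_perm, es(ranked, sample(ranked, length(gene_set))))

# Empirical p-value: fraction of random scores at least as large as observed
# (adding 1 to numerator and denominator avoids reporting p = 0)
p_value <- (sum(null_es >= observed) + 1) / (n_perm + 1)
p_value
```

The smallest p-value this can report is 1/(n_perm + 1), so the number of permutations sets the resolution of the test.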
41. GSEA (rank-based) enrichment analysis
Subramanian, Aravind et al. (2005) Proc. Natl. Acad. Sci. USA 102, 15545-15550
Examples of GSEA running enrichment scores
42. GSEA (rank-based) enrichment analysis
Subramanian, Aravind et al. (2005) Proc. Natl. Acad. Sci. USA 102, 15545-15550
Sets with genes not located at the top of the ranked gene population may still yield significant enrichment scores.
43. A mechanism of cyclin D1 action encoded in the patterns of gene expression in human cancer
Lamb, et al. Cell 114:323-34, 2003
44. The Connectivity Map of gene signatures induced by 164 different small molecule inhibitors
Lamb et al., Science. 2006;313(5795):1929-35
45. The Connectivity Map
(Scoring derived from GSEA statistic)
46. Q1-Q2 analysis (another rank-based approach)
- Q1: Compare enrichment pattern to that for randomly selected gene sets
- Q2: Compare enrichment pattern to that for randomly permuted labels in the reference profile dataset
Tian, et al. PNAS 102:13544-13549, 2005
47. A gene expression signature of Akt overexpression from a transgenic mouse model
Majumder et al. Nat Med 10:594-601, 2004
48. Creighton CJ, Oncogene. 2007;26:4648-55
49. Venn diagram vs rank-based methods
- Venn diagram results are more easily interpretable.
- For rank-based methods, genes that are not at all significant individually may contribute to enrichment.
  - What gene do you go after for validation?
- With the Venn diagram, you have to make a cut.
  - May not include enough genes in the test.
50. Venn diagram vs rank-based methods
51. Venn diagram vs rank-based methods: what is a significant p-value?
- If using the Venn diagram method in expression studies, the p-value should be very low if working with sizable gene sets (e.g. < 1E-6).
- If using a rank-based method, can consider a nominally significant p-value (e.g. p < 0.05) to be good if permuting the sample labels is involved.
- Can always try both ways in order to be certain of an enrichment association.
52. Rank-based Q1-Q2 versus GSEA
- Q1-Q2 enrichment score is much simpler.
  - Take the sum of the t-statistic values for each gene in the set.
- GSEA scoring is more complicated.
- GSEA has user-friendly public software (http://www.broad.mit.edu/gsea/).
- No software yet for Q1-Q2; you have to write your own.
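Since the Q1-Q2 score is just a sum of t-statistics, a Q1-style test is short to write. The R sketch below is an assumption-laden illustration, not the published implementation: the t-statistics are simulated, the signature is hypothetical, and only the Q1 comparison (random gene sets) is shown; Q2 would permute sample labels in the reference dataset and recompute the t-statistics each time.

```r
set.seed(2)

# Hypothetical per-gene t-statistics from a reference dataset
t_stats <- rnorm(500)
names(t_stats) <- paste0("g", 1:500)

# Hypothetical signature; shift its genes upward so there is something to find
signature <- paste0("g", 1:25)
t_stats[signature] <- t_stats[signature] + 1.5

# Enrichment score: sum of the t-statistics for the genes in the set
score <- sum(t_stats[signature])

# Q1-style null: the same score for randomly selected gene sets of equal size
null_scores <- replicate(2000, sum(sample(t_stats, length(signature))))
p_q1 <- (sum(null_scores >= score) + 1) / (length(null_scores) + 1)
c(score = score, p = p_q1)
```

For a signature with separate up and down gene lists, the same calculation would be run on each list, looking at the upper tail for the up genes and the lower tail for the down genes.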
53. Correlation-based approach
- Take the correlation between two sets of profiling results from different datasets.
- May use all of the genes profiled or a specified subset (e.g. genes in a gene signature).
- The correlation metric may be any one of a number of valid metrics (e.g. Pearson's or Spearman's rank).
54. Correlation-based approach
- Each gene used in the correlation may be weighted in a number of ways:
  - t-statistic, comparing two groups
  - Mean-centered expression values
  - 1 or -1 for up or down, respectively
- Again, direction of the genes is important.
- Positive correlation indicates similar overall patterns between the two datasets.
- Example: IGF activation score from Creighton et al., JCO 2008.
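One way to combine the weightings listed above is to correlate a +1/-1 signature vector with each tumor's mean-centered expression values. The R sketch below uses simulated data and hypothetical names throughout; the slide does not spell out how the cited IGF activation score was computed, so this should be read as an illustration of the general idea only.

```r
set.seed(3)

# Hypothetical signature: +1 for genes up in the experimental model, -1 for down
sig_weights <- c(rep(1, 20), rep(-1, 20))
names(sig_weights) <- paste0("g", 1:40)

# Hypothetical tumor expression matrix (genes x samples), log scale
expr <- matrix(rnorm(40 * 6), nrow = 40,
               dimnames = list(names(sig_weights), paste0("tumor", 1:6)))
# Make tumor1 resemble the signature pattern
expr[, "tumor1"] <- expr[, "tumor1"] + 2 * sig_weights

# Mean-center each gene across samples
centered <- expr - rowMeans(expr)

# Score per tumor: Pearson correlation between the signature weights
# and that tumor's mean-centered expression values
scores <- apply(centered, 2, function(x) cor(sig_weights, x, method = "pearson"))
round(scores, 2)
```

Because the weights carry the sign of each gene, a tumor in which the up genes are high and the down genes are low scores positive, while the reverse pattern scores negative.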
55. Example analyses comparing gene transcription signatures from different studies
56. Gene expression signatures of oncogenic pathways from published studies
- Includes:
  - MYC
  - c-Src
  - beta-catenin
  - Cell cycle
  - cyclin D1
  - E2F3
  - androgen
  - estrogen
  - Ras
  - Akt
57. Multiple public gene expression profile datasets of prostate tumors
58. Experimentally-derived oncogenic signatures in human prostate cancer
- Are there patterns of interest shared between the oncogenic signatures and the prostate tumors?
- Examine genes that are high/low with oncogene expression in the human tumors.
- Is the corresponding oncogenic signature enriched in those tumors?
- Use both Q1-Q2 (rank-based) and one-sided Fisher's exact (Venn diagram) methods.
59. A mechanism of cyclin D1 action encoded in the patterns of gene expression in human cancer
Lamb, et al. Cell 114:323-34, 2003
Use the same idea from Lamb et al., only look at multiple signatures in multiple prostate tumor datasets
60. Genes up-regulated by a specific oncogene in experimental models are co-expressed as a group with the oncogene in clinical prostate tumors
Enrichment results using Q1-Q2 rank-based method
61. Number of genes overlapping between oncogenic signatures and clinical tumor correlates.
Oncogenic signature    Signature genes   Prostate tumor genes (a)   Expected overlap   Actual overlap   P-value (b)
androgen_up_Chen       559               176                        7                  29               1.2E-11
Myc_up_Bild            993               150                        10                 33               5.5E-10
Src_up_Bild            1566              118                        12                 23               0.002
erbB-2_up_Creighton    1315              307                        27                 91               1.1E-26
EGFR_up_Creighton      734               28                         1                  1                0.75
cyclin_D1_up_Lamb      206               139                        2                  8                0.0006
Akt_up_Majumder        770               280                        14                 47               3.8E-13
(a) Number of genes positively correlated with corresponding oncogene/biomarker in human prostate tumors (criterion: p < 0.05 in at least three out of four profile datasets).
(b) By one-sided Fisher's exact test.
62. A gene signature of insulin-like growth factor I (IGF-I)
- Substantial evidence implicates insulin-like growth factor I (IGF-I) signaling in the development and progression of breast cancer.
- Gene expression profiling of IGF-I-stimulated MCF-7 cells was performed.
- An IGF-I gene signature was examined in human breast tumors, as well as in experimental models for specific oncogenic signaling pathways.
Creighton CJ, et al., Lee AV. JCO. 26:4078-85.
63. Genes altered by IGF-I at 3hr or 24hr or both
64. A gene signature of insulin-like growth factor I (IGF-I)
66. Oncogenic pathway signatures in human cancers as a guide to targeted therapies
- Examine previously published dataset for oncogenic signatures overlapping with IGF signature
Bild et al. Nature. 2006;439(7074):353-7.
67. The IGF signature is enriched for transcriptional targets of the Ras pathway
68. The Connectivity Map of gene signatures induced by 164 different small molecule inhibitors
Lamb et al., Science. 2006;313(5795):1929-35
69. The IGF signature is enriched for transcriptional targets of the PI3K/Akt/mTOR pathway
70. IGF signature is present in human breast cancers
71. Widespread deregulation of gene expression in cancer
- Gene expression profiling identifies different subtypes of breast cancer.
Sorlie et al. PNAS. 2003;100(14):8418-23
72. IGF signature is present in luminal B and basal breast tumors
Data from Sorlie et al. PNAS. 2003;100(14):8418-23
73. IGF signature is associated with poor prognosis in ER breast tumors
74. Relating gene expression profile results from different datasets to each other by unsupervised clustering methods: USUALLY NOT A GOOD IDEA
- Unsupervised clustering is a technique for data analysis that partitions a data set into subsets whose elements share common traits.
- Many groups will try to relate a gene signature to another dataset by clustering the samples in the dataset using the genes in the signature.
- The main problem with this: unsupervised clustering does not take the direction of the genes in the signature into account.
75.
- Identification of a Common Serum Response (CSR) gene signature in fibroblasts
- Starve fibroblasts, then give them serum and see what genes are up-regulated or down-regulated.
Chang et al., PLoS Biol. 2004 Feb;2(2):E7
76. Survey of fibroblast CSR gene expression in human cancers
- Using the genes in the CSR signature, cluster human tumors.
- Tumors form two major groups.
Chang et al., PLoS Biol. 2004 Feb;2(2):E7
77. Prognostic value of fibroblast CSR in epithelial tumors
- Tumors in the activated group had worse outcome.
Chang et al., PLoS Biol. 2004
78. What issues are there with this type of analysis approach?
- The clustering method does not tell us which direction the CSR genes are moving.
- Are genes up in the CSR signature also up in the Activated tumor set?
79. What issues are there with this type of analysis approach?
- These bars indicate the direction of the CSR genes in these clusters (red = up).
- CSR pattern does appear here to be manifested in half the tumors.
80. Excel functions/features you will need for the computational exercise
81. TTEST Worksheet function
TTEST(array1,array2,tails,type)
- Array1 is the first data set.
- Array2 is the second data set.
- Tails specifies the number of distribution tails (use 2 for the computational exercise).
- Type is the kind of t-Test to perform (use 2).
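For readers working in R rather than Excel, the settings above (two tails, type 2, i.e. a two-sample test assuming equal variances) correspond to `t.test` with `var.equal = TRUE`. The expression values below are made up for illustration.

```r
# Hypothetical log-transformed expression values for one gene in two groups
group1 <- c(7.1, 6.8, 7.4, 7.0, 7.3)
group2 <- c(8.2, 8.0, 7.9, 8.5, 8.1)

# Two-tailed, two-sample t-test assuming equal variances,
# the R counterpart of TTEST(array1, array2, 2, 2)
res <- t.test(group1, group2, alternative = "two.sided", var.equal = TRUE)
res$p.value
res$statistic
```

Note that R's default is `var.equal = FALSE` (the Welch test), which would correspond to Excel's type 3 instead.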
82. AVERAGE Worksheet function
AVERAGE(number1, number2, ...)
- Number1, number2, ... are 1 to 30 numeric arguments for which you want the average.
- The arguments must either be numbers or be names, arrays, or references that contain numbers.
83. Data -> Filter -> AutoFilter
- When you use the AutoFilter command, AutoFilter arrows appear to the right of the column labels in the filtered range.
- Microsoft Excel indicates the filtered items with blue.
- You use custom AutoFilter to display rows that meet complex criteria; for example, you might display rows that contain values within a specific range (e.g. p < 0.01).
- Unfiltered range
- Filtered range
84. MATCH Worksheet function
MATCH(lookup_value,lookup_array,match_type)
- Lookup_value is the value you use to find the value you want in a table.
- Lookup_value is the value you want to match in lookup_array. For example, when you look up someone's number in a telephone book, you are using the person's name as the lookup value, but the telephone number is the value you want.
- Lookup_value can be a value (number, text, or logical value) or a cell reference to a number, text, or logical value.
- Lookup_array is a contiguous range of cells containing possible lookup values. Lookup_array must be an array or an array reference.
- Match_type should be set to 0 for our purposes.
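The same exact-match lookup exists in R as `match`, which can serve as a cross-check on a spreadsheet result. The gene lists below are hypothetical.

```r
# Hypothetical gene lists
array_genes <- c("EGFR", "ERBB2", "MYC", "AKT1", "KRAS")
signature   <- c("MYC", "TP53", "EGFR")

# Position of each signature gene in array_genes; NA where there is no exact
# match (Excel's MATCH with match_type 0 returns #N/A in that case)
pos <- match(signature, array_genes)
pos

# Counting the non-missing positions gives the size of the overlap
sum(!is.na(pos))
```

Here `TP53` is not on the hypothetical array, so it contributes an `NA` and is left out of the overlap count.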
85. COUNT Worksheet function
- If an argument is an array or reference, only numbers in that array or reference are counted. Empty cells, logical values, text, or error values in the array or reference are ignored.
94. (Don't forget the )
96. R functions you will need for the computational exercise
97. dhyper function in R
- Example
  - 100 balls
  - 10 of the balls are red
  - I grab 20 balls
  - Five of my 20 balls are red
  - Was the number of red balls I selected a significant number?
> m <- 10   # number of red balls
> n <- 90   # number of other balls (total pop - m)
> k <- 20   # number of balls selected
> x <- 0:k  # vector of successes
> 1 - sum(dhyper(x, m, n, k)[1:5])
[1] 0.02546455
98. Compare NSCLC EGFR mutant signature with a signature of EGFR-transfected MCF-7 cells
- EGFR wt NSCLC genes: 119
- MCF7 EGFR genes: 1152
- Genes shared between MCF7/NSCLC array platforms: 11079
- Genes shared between MCF7/NSCLC gene signatures: 44
Significance of overlap: p < 1E-10
One-sided Fisher's exact test
Choi, Creighton, et al., PLoS ONE 2(11):e1226.
99. dhyper function in R
- EGFR mutant signature example
  - 11079 genes shared between MCF7/NSCLC array platforms
  - 119 EGFR wt NSCLC genes
  - 1162 MCF7 EGFR genes
  - 44 genes shared between MCF7/NSCLC gene signatures
> m <- 119          # number of EGFR wt NSCLC genes
> n <- 11079 - 119  # number of other genes
> k <- 1162         # number of MCF7 EGFR genes
> x <- 0:k          # vector of successes
> 1 - sum(dhyper(x, m, n, k)[1:44])
[1] 1.265654e-14
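As a cross-check on the `dhyper` calculation, the same one-sided test can be run with `phyper` and with `fisher.test` on the 2 x 2 table. This sketch reuses the counts from this slide; the variable names `G`, `A`, `k`, `n` follow the hypergeometric slide earlier in the deck, and the table layout is an assumption about how one would arrange the counts.

```r
G <- 11079  # genes shared between the two array platforms
A <- 119    # EGFR wt NSCLC signature genes
k <- 1162   # MCF7 EGFR signature genes
n <- 44     # genes shared between the two signatures

# Upper tail directly: P(X >= n)
p_phyper <- phyper(n - 1, A, G - A, k, lower.tail = FALSE)

# Same test from the 2 x 2 contingency table (filled column by column)
tab <- matrix(c(n,     A - n,
                k - n, G - A - k + n),
              nrow = 2,
              dimnames = list(MCF7 = c("in", "out"), NSCLC = c("in", "out")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

c(phyper = p_phyper, fisher = p_fisher)
```

Asking for the upper tail with `lower.tail = FALSE` avoids subtracting a number very close to 1 from 1, which loses precision when the p-value is this small.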
100. General concepts of gene expression analysis
101. General concepts of gene expression analysis
- Low level analysis
  - Processing image files
  - Normalization
  - QC
- High level analysis
  - Clustering
  - Selecting differentially expressed genes
  - Enrichment analysis
102. Processing image files
- From CEL, GPR, or TXT files with image information, want to generate gene expression values.
- For two-color arrays (e.g. Stanford cDNA arrays), can use Bioconductor.
- For one-channel arrays (e.g. Affymetrix), can use dChip or Bioconductor.
103. Normalization
- Purpose: To adjust the overall chip brightness of the arrays to a similar level
- Methods
  - Two-channel arrays
    - Loess normalization is good
  - One-channel arrays
    - Total intensity normalization
    - Quantile normalization
    - Invariant set normalization
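To make one of the listed methods concrete, here is a minimal base-R sketch of quantile normalization, which forces every array to share the same distribution of values. The function name `quantile_normalize` and the toy intensities are assumptions; in practice a maintained Bioconductor or dChip routine would normally be used instead of hand-written code.

```r
# Quantile normalization: every array (column) is given the same distribution,
# namely the mean of the sorted columns.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(x, 2, sort)                         # sort each array
  target <- rowMeans(sorted)                          # reference distribution
  out <- apply(ranks, 2, function(r) target[r])       # map ranks to reference
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical intensities: 5 genes x 3 arrays, array a3 is much "brighter"
x <- cbind(a1 = c(5, 2, 3, 4, 1),
           a2 = c(4, 1, 4.5, 2, 3),
           a3 = c(30, 10, 20, 50, 40))
qn <- quantile_normalize(x)
qn
apply(qn, 2, sort)   # every column now has identical sorted values
```

The ordering of genes within each array is preserved; only the values attached to each rank change.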
104. Before Normalization
After Normalization
www.dchip.org
105. High level analysis
- Selecting differentially expressed genes
  - Account for multiple testing
- Clustering
  - Hierarchical clustering
  - Principal Components analysis
  - K-means clustering
- Enrichment analysis or Meta-analysis
106. Selecting differentially expressed genes
- Student's t-test or ANOVA typically used
  - Works best on log-transformed data
- Other criteria
  - Fold change
  - Higher average signal intensity might indicate greater abundance
- What p-value cutoff do you choose?
  - No right answer
  - Need to balance between false positives and false negatives
  - More stringent p-value: fewer false positives, more false negatives
  - Less stringent p-value: fewer false negatives, more false positives
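The selection steps above can be sketched in R on simulated data: log-transform, run a t-test per gene, and combine a p-value cutoff with a fold-change cutoff. The matrix, the group labels and both cutoffs are hypothetical choices made for illustration.

```r
set.seed(4)

# Hypothetical raw intensities: 100 genes x 8 samples (4 normal, 4 tumor)
raw <- matrix(2^rnorm(100 * 8, mean = 8, sd = 0.5), nrow = 100,
              dimnames = list(paste0("g", 1:100),
                              c(paste0("N", 1:4), paste0("T", 1:4))))
raw[1:5, 5:8] <- raw[1:5, 5:8] * 8   # first five genes are up in tumors

logx <- log2(raw)                    # work on log-transformed data
is_tumor <- rep(c(FALSE, TRUE), each = 4)

# Per-gene two-sample t-test (equal variances) and log2 fold change
p_values <- apply(logx, 1, function(g)
  t.test(g[is_tumor], g[!is_tumor], var.equal = TRUE)$p.value)
log2_fc <- rowMeans(logx[, is_tumor]) - rowMeans(logx[, !is_tumor])

# Combine a p-value cutoff with a fold-change cutoff (both are judgment calls)
selected <- names(p_values)[p_values < 0.01 & abs(log2_fc) > 1]
selected
```

Tightening either cutoff shortens the list at the cost of more false negatives, which is the trade-off described in the last bullets.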
107. Multiple testing
- When evaluating thousands of genes, some will show a nominally significant P-value by chance alone.
  - Somewhat like buying lots and lots of lottery tickets: your chances of winning greatly improve.
- Want to estimate the false discovery rate (FDR).
108. Multiple testing
- Estimate FDR by the method from Storey et al. (PNAS 2003;100:9440-5).
- Use permutation testing (e.g. SAM analysis, Tusher et al., PNAS 2001;98:5116-21).
  - Randomly assign sample labels and do the test.
  - Do it many times to get a distribution of false positives.
$\mathrm{FDR} = \dfrac{(\text{number of genes on the array}) \times (\text{nominal P-value})}{\text{number of genes significant with that P-value}}$
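The ratio on this slide is easy to compute once per-gene p-values are available. The R sketch below uses simulated p-values (an assumption) and, for comparison only, base R's Benjamini-Hochberg adjustment, which is a related FDR procedure and not the Storey or SAM method cited above.

```r
set.seed(5)

# Hypothetical p-values: 9500 null genes (uniform) plus 500 truly changed genes
p_values <- c(runif(9500), rbeta(500, 0.1, 10))

cutoff <- 0.01
n_genes <- length(p_values)
n_significant <- sum(p_values < cutoff)

# Expected false positives by chance, divided by the number called significant
fdr_estimate <- (n_genes * cutoff) / n_significant
fdr_estimate

# For comparison: Benjamini-Hochberg adjusted p-values from base R
bh <- p.adjust(p_values, method = "BH")
sum(bh < 0.05)
```

The simple ratio treats every gene on the array as a potential false positive, so it tends to be a conservative estimate when many genes are truly changed.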
109. Cluster analysis
- Cluster analysis relates to grouping or segmenting a collection of objects (e.g. genes or samples) into subsets or "clusters", such that those within each cluster are more closely related to one another than objects assigned to different clusters.
- Central to cluster analysis is the notion of degree of similarity (or dissimilarity) between the individual objects being clustered.
110. Cluster analysis
- Major methods of clustering include hierarchical clustering, k-means clustering, and principal components analysis (PCA).
- Hierarchical clustering is most common for expression profile data analysis.
- Cluster and JavaTreeview public software programs from Eisen et al. (http://rana.lbl.gov/) are handy for cluster analysis and/or generating heat maps.
111. Hierarchical clustering: 3 methods for measuring distance between clusters
- Single linkage, using the members of each cluster that are closest to each other
http://www.resample.com/xlminer/help/HClst/HClst_intro.htm
112. Hierarchical clustering: 3 methods for measuring distance between clusters
- Complete linkage, using the members of each cluster that are furthest from each other
http://www.resample.com/xlminer/help/HClst/HClst_intro.htm
113. Hierarchical clustering: 3 methods for measuring distance between clusters
- Average linkage, using the average of each cluster, most commonly used.
http://www.resample.com/xlminer/help/HClst/HClst_intro.htm
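The three linkage rules can be tried side by side with base R's `hclust`. The expression matrix below is simulated and the sample names are hypothetical; note that `hclust`'s "average" method averages the pairwise distances between members of two clusters, which may differ in detail from the averaging used by other clustering programs.

```r
set.seed(6)

# Hypothetical log expression matrix: 6 samples x 20 genes, two groups of samples
expr <- rbind(matrix(rnorm(3 * 20, mean = 0), nrow = 3),
              matrix(rnorm(3 * 20, mean = 3), nrow = 3))
rownames(expr) <- c(paste0("A", 1:3), paste0("B", 1:3))

d <- dist(expr)   # Euclidean distance between samples (rows)

# The three linkage rules
hc_single   <- hclust(d, method = "single")
hc_complete <- hclust(d, method = "complete")
hc_average  <- hclust(d, method = "average")

# Cut each tree into two clusters and compare the assignments
sapply(list(single = hc_single, complete = hc_complete, average = hc_average),
       cutree, k = 2)
```

On well-separated groups like these the three rules agree; differences tend to show up when clusters are elongated or overlap.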
114. Widespread deregulation of gene expression in cancer
- Gene expression profiling identifies different subtypes of breast cancer.
Sorlie et al. PNAS. 2003;100(14):8418-23
115. Final words on gene expression profile analysis
- All good roads lead to Rome.
  - i.e., there are many ways to go about exploratory analysis, which can lead to the same overall conclusions
- What's important
  - Be clear and concise about what you did (so others can understand it and repeat it)
  - Don't try to fool anybody (including yourself)