Title: Techniques for Analysing Microarrays
1 Techniques for Analysing Microarrays
- Which genes are involved in ovarian and prostate cancer?
2 Common Questions
- (1) Which genes are up or down in different conditions?
- - Cancer patient versus normal
- - Non-invasive cancer versus invasive cancer
- (2) Which genes can differentiate between cancer sub-types?
- (3) Which genes relate to the survival of the patient?
- (4) Which genes may be in the same pathway as a gene of interest?
3 EOS chips
- Use Affymetrix GeneChip technology
- 25mers
- 8 probes in a probe set
- 59,000 probe sets, 46,000 gene clusters (all human expressed sequences known at the time)
- Normalised distributions of all chips to each other (gamma distribution)
- Single measure of intensity for each probe set (Tukey's trimean)
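As a rough illustration of the last bullet, Tukey's trimean combines the quartiles and the median as (Q1 + 2 x median + Q3) / 4, which makes it resistant to a single outlying probe. The sketch below is not the EOS pipeline: the function name `trimean`, the example intensities and the quartile rule (Python's "inclusive" method) are assumptions made for illustration.

```python
from statistics import quantiles

def trimean(values):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    # The quartile rule is an assumption; the slides do not say which one was used.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4

# Hypothetical intensities for the 8 probes of one probe set, with one outlier.
probe_set = [10, 12, 11, 13, 12, 11, 500, 12]
summary = trimean(probe_set)  # stays close to the bulk of the probes
```

The arithmetic mean of the same eight values would be pulled far upwards by the outlying probe, which is the usual motivation for a robust summary.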
4 Variance increases with mean
[Figure, two panels. First panel: data after normalisation, variance (log scale) against mean. Second panel: after the fix (add a constant and take log2), variance (linear scale) against mean.]
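The fix named on this slide (add a constant, then take log2) can be sketched as follows. The replicate values and the offset of 1 are hypothetical; the slide does not give the constant that was actually used.

```python
import math
from statistics import variance

def started_log2(values, offset):
    """Add a constant to each intensity, then take log base 2."""
    return [math.log2(v + offset) for v in values]

# Hypothetical replicate intensities whose spread grows with the mean.
low = [90, 100, 110]
high = [900, 1000, 1100]
offset = 1  # assumed value; not stated on the slide

# Ratio of variances between the high and low intensity groups.
raw_ratio = variance(high) / variance(low)
log_ratio = variance(started_log2(high, offset)) / variance(started_log2(low, offset))
```

On the raw scale the high-intensity group is far more variable than the low-intensity group; after the transform the two variances are of similar size, which is what the t-statistics on later slides rely on.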
5 Which genes are differentially expressed between ovarian cancer and normal ovaries?
- 6 normal ovaries
- 38 ovarian cancers
- - 3 mucinous
- - 5 endometrioid
- - 30 serous
6 Statistical techniques
- ranked t-statistics (unequal variance)
- quantile-quantile plots against normal distribution
- Westfall and Young permutation test
- - http://stat-www.berkeley.edu/users/terry/zarray/Html/
- - S. Dudoit, Y.H. Yang, M.J. Callow and T.P. Speed. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. August 2000
- Ratios of Cancer/Normal
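7 t statistic
- The t-statistic gets more extreme as
- - the difference in means increases
- - the standard deviation of each of the two samples decreases
- - the size of the samples increases
[Figure: ranked t-statistics, on an axis running from negative (-ve) through 0 to positive (+ve).]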
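A minimal sketch of the unequal-variance (Welch) t-statistic and of ranking genes by it. The gene names and log2 intensities are made up for illustration, and `welch_t` is a hypothetical helper, not a function from the software used in the talk.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """t-statistic that does not assume equal variances in the two groups."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

# Hypothetical log2 intensities: gene -> (cancer values, normal values).
genes = {
    "gene_up": ([8.1, 8.4, 7.9, 8.6], [6.0, 6.3, 5.8]),
    "gene_flat": ([7.0, 7.4, 6.8, 7.1], [7.2, 6.9, 7.3]),
    "gene_down": ([4.1, 4.5, 3.9, 4.2], [6.1, 6.4, 5.9]),
}

t_stats = {name: welch_t(cancer, normal) for name, (cancer, normal) in genes.items()}
ranked = sorted(t_stats, key=t_stats.get)  # most negative first, most positive last
```

Genes at either end of the ranking are the candidates for being low or high in cancer.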
8 Quantile-Quantile Plot
R: library(sma) or library(base)
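The talk used R for these plots. As a language-neutral sketch of what a normal quantile-quantile plot computes, the points can be built by pairing the sorted t-statistics with standard normal quantiles. The plotting positions (i - 0.5) / n are one common convention and are an assumption here, as are the example values.

```python
from statistics import NormalDist

def normal_qq_points(values):
    """Return (theoretical quantile, observed value) pairs for a normal QQ plot."""
    n = len(values)
    std_normal = NormalDist()
    ordered = sorted(values)
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical t-statistics for seven genes.
t_stats = [0.5, -0.3, 6.3, 0.1, -4.2, 0.9, -0.8]
points = normal_qq_points(t_stats)
```

Genes whose points fall well away from the straight line formed by the bulk of the points are the ones that look differentially expressed.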
9 Westfall and Young Permutation
tpWY program: http://www.cbil.upenn.edu/tpWY/
- 6 normal ovaries, 38 ovarian cancers
- Randomise labels (OvCa, N)
- Compute t-statistics
- 100,000 iterations
- Unadjusted p value = proportion of iterations where [condition not recovered]
- p value adjusted for multiple testing
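The steps above can be sketched as a small label-permutation loop. This is not the tpWY program: it assumes the single-step maxT form of the Westfall and Young adjustment, and it assumes the unadjusted p value counts the iterations in which a gene's permuted |t| is at least as large as its observed |t|. All function names and data are hypothetical.

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def permutation_p_values(genes, labels, n_iter=1000, seed=0):
    """genes: one list of expression values per gene (one value per sample).
    labels: 0/1 group label per sample. Returns (unadjusted, adjusted) p values.
    Assumes each group has at least 2 samples and no gene has tied values,
    so every permuted group has a non-zero variance."""
    rng = random.Random(seed)

    def abs_t_stats(lab):
        stats = []
        for row in genes:
            group1 = [x for x, g in zip(row, lab) if g == 1]
            group0 = [x for x, g in zip(row, lab) if g == 0]
            stats.append(abs(welch_t(group1, group0)))
        return stats

    observed = abs_t_stats(labels)
    raw_hits = [0] * len(genes)
    adj_hits = [0] * len(genes)
    shuffled = list(labels)
    for _ in range(n_iter):
        rng.shuffle(shuffled)          # randomise the labels
        permuted = abs_t_stats(shuffled)
        max_t = max(permuted)          # largest |t| over all genes
        for g, t_obs in enumerate(observed):
            raw_hits[g] += permuted[g] >= t_obs
            adj_hits[g] += max_t >= t_obs
    return ([h / n_iter for h in raw_hits], [h / n_iter for h in adj_hits])

# Hypothetical data: 3 normal (0) and 5 cancer (1) samples, two genes.
labels = [0, 0, 0, 1, 1, 1, 1, 1]
genes = [
    [1.0, 1.2, 0.9, 5.1, 5.3, 4.8, 5.0, 5.2],  # clearly higher in cancer
    [2.0, 2.5, 1.8, 2.2, 1.9, 2.4, 2.1, 2.3],  # no obvious difference
]
raw_p, adj_p = permutation_p_values(genes, labels, n_iter=200, seed=1)
```

Because the adjusted p value compares each gene against the most extreme statistic across all genes in every iteration, it can never be smaller than the unadjusted one.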
10 How many genes were statistically significant?
- Ovarian Cancer > Normal (Candidates for antibody therapy?)
- - 110 candidates (adjusted p < 0.01)
- - 181 candidates (adjusted p < 0.05)
- Ovarian Cancer < Normal (Candidates for tumor suppressor genes?)
- - 7 candidates (adjusted p < 0.01)
- - 15 candidates (adjusted p < 0.05)
11 High in cancer
[Figure: embedded Excel graphic]
12 Low in cancer
How can we deal with (a) biological variation? (b) more than one cause for cancer?
[Figure: embedded Excel graphic]
13 Which genes are differentially expressed between non-invasive and invasive ovarian cancer?
No. samples
              Non-invasive  Invasive
Mucinous      5             4
Endometrioid  1             7
Serous        2             33
Future: model all variables together
Now: ranked t-stats, qq plots
14 Assume equal variance for t-stats?
e.g. mucinous cancer
[Figure: quantile-quantile plot. Ratio of variances, S² invasive (n = 4) to S² non-invasive (n = 5), against theoretical quantiles (F distribution).]
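A sketch of the quantity behind this plot: for each gene, the ratio of the two sample variances. Under normality and equal variances each ratio follows an F distribution with (4 - 1, 5 - 1) degrees of freedom, so the sorted ratios can be compared against F quantiles. The data and the helper name are hypothetical, and the F quantiles themselves are not computed here (a function such as `scipy.stats.f.ppf` could supply them).

```python
from statistics import variance

def variance_ratio(invasive, non_invasive):
    """Ratio of sample variances, invasive over non-invasive."""
    return variance(invasive) / variance(non_invasive)

# Hypothetical log2 intensities per gene: (invasive, n = 4; non-invasive, n = 5).
genes = {
    "gene_1": ([7.1, 7.9, 6.8, 7.4], [7.0, 7.2, 6.9, 7.3, 7.1]),
    "gene_2": ([5.0, 5.2, 5.1, 4.9], [5.6, 4.4, 5.9, 4.8, 5.2]),
    "gene_3": ([9.3, 9.0, 9.6, 9.1], [9.2, 9.5, 8.9, 9.4, 9.0]),
}

ratios = sorted(variance_ratio(inv, non) for inv, non in genes.values())
degrees_of_freedom = (4 - 1, 5 - 1)  # numerator, denominator
```

If the sorted ratios track the F quantiles closely, assuming equal variance for the t-statistics is defensible; systematic departures argue for the unequal-variance form.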
15 What to do when n = 2?
Assume equal variance? Error model?
16 Limitations of Westfall and Young permutation method
No. samples and no. permutations
              Non-invasive  Invasive  No. permut.
Mucinous      5             4         126
Endometrioid  1             7         ---
Serous        2             33        595
Not enough power when small sample sizes?
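The permutation counts in the table are the number of distinct ways to split the samples into two groups of the given sizes, which is a binomial coefficient. A short check, with a hypothetical helper name:

```python
from math import comb

def n_relabelings(n_non_invasive, n_invasive):
    """Number of distinct ways to assign the group labels to the samples."""
    return comb(n_non_invasive + n_invasive, n_invasive)

mucinous = n_relabelings(5, 4)   # 9 samples, choose 4
serous = n_relabelings(2, 33)    # 35 samples, choose 33

# With so few distinct relabelings, a permutation p value cannot go below 1 / count.
smallest_p_mucinous = 1 / mucinous
smallest_p_serous = 1 / serous
```

This reproduces 126 and 595. The floor of 1/126 (about 0.008) for mucinous samples leaves little room once p values are adjusted for multiple testing, which is one way to read the power concern raised on the slide.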
17 Mucinous non-invasive versus invasive
R: library(base)
18 Which genes relate to prognosis of patients with prostate cancer?
- 72 patients with prostate cancer
- Treatment: radical prostatectomy
- 17 relapsed (PSA rise > 0.4 ng/ml)
Methods: R survival package, SAS
19 Cox Proportional Hazards Model
Hazard = baseline hazard × exponential term
- Exponential term: involves gene and PSA; independent of time
- Baseline hazard: independent of gene expression or PSA
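The two labelled factors match the standard form of the Cox model, which can be written as below. The coefficient symbols β₁ and β₂ and the covariate names are notation chosen here, not taken from the slide.

```latex
h(t \mid \text{gene}, \text{PSA}) \;=\; h_0(t)\,\exp\!\left(\beta_{1}\,\text{gene} + \beta_{2}\,\text{PSA}\right)
```

Here h_0(t) is the baseline hazard, which depends on time only, and the exponential factor depends on the covariates only, so the ratio of hazards between two patients is constant over time.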
20
[Figure: panels A and B; label: relapsed.]
21 Survival Curves: Gene + PSA model
[Figure, two panels, one labelled B: survival S(t) against time (disease-free months), with curves for High (> 25th percentile) and Low (< 25th percentile).]
22 Hazard Ratio: 75th/25th percentile
Probe set  Hazard ratio (95% CI)  Unadjusted p value
A          0.26 (0.12 to 0.54)    0.000351
B          0.32 (0.16 to 0.67)    0.002151
False discovery rate for top 50 candidates is 20% (SAM)
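A hazard ratio between the 75th and 25th percentiles of expression follows from a fitted Cox coefficient as exp(beta × (x75 − x25)), because the baseline hazard cancels. The sketch below uses made-up values for the coefficient, its standard error and the two percentiles; none of them come from the talk, and the helper names are hypothetical.

```python
import math

def hazard_ratio(beta, x_high, x_low):
    """Hazard at x_high relative to x_low for a Cox coefficient beta."""
    return math.exp(beta * (x_high - x_low))

def hazard_ratio_ci(beta, se, x_high, x_low, z=1.96):
    """Approximate 95% interval from the coefficient's standard error."""
    delta = x_high - x_low
    ends = (math.exp((beta - z * se) * delta), math.exp((beta + z * se) * delta))
    return min(ends), max(ends)

# Hypothetical fit: coefficient and standard error per unit of log2 expression.
beta, se = -0.9, 0.25
x25, x75 = 6.0, 7.5  # assumed 25th and 75th percentiles of expression

hr = hazard_ratio(beta, x75, x25)
ci_low, ci_high = hazard_ratio_ci(beta, se, x75, x25)
```

A ratio below 1, as for probe sets A and B in the table, means that patients with higher expression have a lower hazard of relapse.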
23 Summary
- (1) Which genes are up or down in different conditions?
- - ranked t-statistics
- - qq plots (normal distribution)
- - Westfall and Young permutations (multiple testing)
- (2) Which genes relate to the survival of the patient?
- - Cox proportional hazards
- - SAM multiple testing
24 Acknowledgements
- Garvan
- - Sue Henshall, Rob Sutherland, Patricia Vanden Bergh
- EOS
- - Jordan Hiller, Daniel Afar, Kurt Gish, David Mack
- Royal Hospital for Women
- - Nigel Hacker
- ANU/John Curtin
- - John Maindonald
- - Yvonne Pittelkow
- Walter and Eliza Hall Institute
- - Terry Speed, Natalie Thorne
- University of Queensland
- - Jessica Marr