Title: Statistical Issues in Gene Expression
1 Statistical Issues in Gene Expression: Probabilistic Clustering and Classification of Profiles in Microarray Experiments
Nick Heard, David Stephens
Department of Mathematics, Imperial College London
Joint work with Chris Holmes, David Hand (Mathematics), George Dimopoulos (Molecular Microbiology, IC Centre for Structural Biology)
3 E.g. Eisen Data (time course)
4 E.g. Golub Data (independent samples)
5 E.g. Our Data (time course)
10 However: differential expression is NOT restricted to mean shifts
13 Different Approaches to Two-Sample Testing
- T-test (common variance): T1
- T-test (different variances): T2
- Mann-Whitney-Wilcoxon: MWW
- Kolmogorov-Smirnov: KS
- Bayesian parametric/non-parametric
- Do the tests concur?
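As a minimal sketch of why the tests need not concur, the statistics T1, T2, MWW and KS can be computed by hand for a single gene. The function names and the two expression vectors below are hypothetical, chosen only so that the two groups share a mean but differ in spread; they are not taken from the Golub or mosquito data.

```python
import math
from statistics import mean, variance


def t_pooled(x, y):
    """T1: two-sample t statistic assuming a common variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def t_welch(x, y):
    """T2: two-sample t statistic allowing different variances."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def mww_u(x, y):
    """MWW: Mann-Whitney U for x (count of pairs with x > y, ties count 1/2)."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def ks_stat(x, y):
    """KS: largest absolute gap between the two empirical CDFs."""
    def ecdf(sample, v):
        return sum(s <= v for s in sample) / len(sample)
    return max(abs(ecdf(x, v) - ecdf(y, v)) for v in list(x) + list(y))


# Hypothetical gene: identical means, very different spread.
group_a = [-0.1, 0.0, 0.1, -0.05, 0.05, 0.0]
group_b = [-2.0, 2.0, -1.5, 1.5, -1.0, 1.0]

print("T1 :", t_pooled(group_a, group_b))
print("T2 :", t_welch(group_a, group_b))
print("MWW:", mww_u(group_a, group_b), "of", len(group_a) * len(group_b), "pairs")
print("KS :", ks_stat(group_a, group_b))
```

With these illustrative values both t statistics are zero and U sits at half the number of pairs, so the location-based tests see nothing, while the KS statistic picks up the difference in spread. Ranking genes by T and by KS can therefore give different lists.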
14 Golub Data
Most differentially expressed genes using the T statistic
15 Golub Data
Most differentially expressed genes using the KS statistic
18 Multiple Testing
- Key issue: typically carrying out thousands of hypothesis tests.
- Can choose to
  - Correct (Bonferroni step-down)
  - or
  - Calibrate (sampling under H0 to get the true null distribution: bootstrap/permutation tests)
- Note: the ranking of genes is unaltered by such methods.
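The two options above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the talk: "correct" is shown as the plain Bonferroni adjustment and the Holm step-down variant, and "calibrate" as a permutation test on the absolute difference in means. All function names and the example numbers are hypothetical.

```python
import random
from statistics import mean


def bonferroni(pvals):
    """Single-step correction: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Step-down (Holm) adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Calibrate |mean(x) - mean(y)| by shuffling group labels under H0."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


raw = [0.001, 0.04, 0.03, 0.2]            # hypothetical per-gene p-values
print(bonferroni(raw))
print(holm(raw))
print(permutation_pvalue([5.1, 4.9, 5.3, 5.0, 5.2], [1.0, 1.2, 0.9, 1.1, 0.8]))
```

Both adjustments are monotone in the raw p-value, which is why the ordering of genes within a given test statistic does not change; only the threshold for calling a gene significant moves.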
19 Cluster Analysis
- Objective: to find genes that have similar relative expressions within the test sample. Can be applied to
  - replicate experiments
  - expression profiles
Typical approach: hierarchical clustering. Choice of metric (usually Euclidean distance) and method (minimum/average/maximum distance).
20 Typical approach: hierarchical clustering (e.g. in SPLUS/R)
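For readers without SPLUS/R to hand, the typical approach can be sketched in a few lines of plain Python. This is a naive agglomerative procedure with Euclidean distance and a choice of minimum, average or maximum linkage; the function names and the four toy profiles are hypothetical, and a real analysis would use a library routine instead.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def linkage_distance(c1, c2, profiles, method):
    """Distance between two clusters (lists of row indices)."""
    d = [euclidean(profiles[i], profiles[j]) for i in c1 for j in c2]
    if method == "minimum":
        return min(d)
    if method == "maximum":
        return max(d)
    if method == "average":
        return sum(d) / len(d)
    raise ValueError("method must be 'minimum', 'average' or 'maximum'")


def agglomerate(profiles, n_clusters, method="average"):
    """Merge the closest pair of clusters until n_clusters (>= 1) remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage_distance(clusters[a], clusters[b], profiles, method)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical profiles: genes 0 and 1 rise over time, genes 2 and 3 fall.
profiles = [
    [0.0, 1.0, 2.0],
    [0.1, 1.1, 2.1],
    [2.0, 1.0, 0.0],
    [2.1, 0.9, 0.1],
]
print(agglomerate(profiles, n_clusters=2, method="average"))
```

On these toy profiles the rising genes and the falling genes end up in separate clusters for all three linkage methods; on real data the choice of metric and method can change the tree substantially.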
24 Anopheles Mosquito
Mode of parasitic infection
25 Anopheles Mosquito/Plasmodium Data
Malaria vector mosquito; Plasmodium parasite
- Expression levels for selected genes tracked over 7 time points after blood meal (6, 20, 40, 96, 192, 336, 480 hours)
- Test samples/cDNA library contain 1400 mosquito genes, 1400 parasite genes, 1400 genes of unknown origin
- Replicate experiments (3-7 repeat arrays)
- Further experiments: variation of experimental conditions; time course expression profile for rat infection
26 Mosquito Data
- Mosquito gene expression only
  - growth data (embryo (day 1) to adult (day 14))
  - E. coli / other infection
  - stress
- Expression levels for selected genes tracked over 6-8 time points
- Replication
- Experiment has 2800 genes
- Ref: Dimopoulos et al. (2000), PNAS
35
- Can use the marginal likelihood as the basis for a probabilistically based hierarchical clustering process
- Metric based on distance in marginal likelihood terms
- Compatible with more sophisticated clustering approaches
- Many off-line calculations speed up the overall clustering process
- Model-based clustering via design matrix X
- Utilize spline/wavelet basis formulations
- Prior modelling readily available (and advantageous?)
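One way to make the marginal-likelihood idea concrete is the standard conjugate linear model below. The transcript does not preserve the slides that state the model, so everything here is an assumption for illustration: a zero-mean normal prior on the regression coefficients, an inverse-gamma prior on the variance, and the symbols $V$, $a$, $b$, $n_c$ and $\Delta$. For a cluster $c$, $y_c$ stacks the profiles of its genes and $X_c$ stacks the corresponding copies of the design matrix X (for example a spline or wavelet basis evaluated at the time points).

```latex
\begin{align*}
y_c \mid \beta, \sigma^2 &\sim N\!\left(X_c \beta,\ \sigma^2 I_{n_c}\right), \qquad
\beta \mid \sigma^2 \sim N\!\left(0,\ \sigma^2 V\right), \qquad
\sigma^2 \sim \mathrm{IG}(a, b), \\[4pt]
V_c^{*} &= \left(V^{-1} + X_c^{\top} X_c\right)^{-1}, \qquad
m_c^{*} = V_c^{*} X_c^{\top} y_c, \\[4pt]
a_c^{*} &= a + \tfrac{n_c}{2}, \qquad
b_c^{*} = b + \tfrac{1}{2}\left(y_c^{\top} y_c - m_c^{*\top} \left(V_c^{*}\right)^{-1} m_c^{*}\right), \\[4pt]
p(y_c) &= (2\pi)^{-n_c/2}\,
\frac{\lvert V_c^{*} \rvert^{1/2}}{\lvert V \rvert^{1/2}}\,
\frac{b^{a}}{\left(b_c^{*}\right)^{a_c^{*}}}\,
\frac{\Gamma\!\left(a_c^{*}\right)}{\Gamma(a)}, \\[4pt]
\Delta(c_1, c_2) &= \log p\!\left(y_{c_1 \cup c_2}\right) - \log p\!\left(y_{c_1}\right) - \log p\!\left(y_{c_2}\right).
\end{align*}
```

Under these assumptions an agglomerative scheme would merge, at each step, the pair of clusters with the largest $\Delta(c_1, c_2)$, so the "distance" is expressed in marginal likelihood terms. When every gene shares the same design matrix X, $X_c^{\top} X_c$ is just the number of genes in the cluster times $X^{\top} X$, which is one example of a quantity that can be computed off-line.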
43 Anopheles
44 Plasmodium
45 Differential Expression
46 Model choice for larvae data
55 References
- Innate immune defense against malaria infection in the mosquito, Dimopoulos et al., Current Opinion in Immunology, 13:79-88, 2001.
- Genome expression analysis of Anopheles gambiae: Responses to injury, bacterial challenge and malaria infection, Dimopoulos et al., PNAS (2002), to appear.
Acknowledgements: Wellcome Trust, IC Centre for Structural Biology (BBSRC)
Contact: n.heard_at_ic.ac.uk, d.stephens_at_ic.ac.uk, c.holmes_at_ic.ac.uk
Software: http://stats.ma.ic.ac.uk/naheard
Technical reports, code, WWW: stats.ma.ic.ac.uk/smgb/