Title: Analysis of time-course gene expression data
1Analysis of time-course gene expression data
- Shyamal D. Peddada
- Biostatistics Branch
- National Institute of Environmental Health Sciences (NIH)
- Research Triangle Park, NC
2Outline of the talk
- Some objectives for performing long series time-course experiments
- Single cell-cycle experiment
- A nonlinear regression model
- Phase angle of a cell cycle gene
- Inference
- Open research problems
- Multiple cell-cycle experiments
- Coherence between multiple cell-cycle experiments
- Illustration
- Open research problems
3Objectives
- Some genes play an important role during the cell division cycle process. They are known as cell-cycle genes.
- Objectives: Investigate various characteristics of cell-cycle and/or circadian genes, such as
- Amplitude of initial expression
- Period
- Phase angle of expression (angle of maximum expression for a cell cycle gene)
4Phases in cell division cycle
5A brief description
- G1 phase
- "GAP 1". For many cells, this phase is the major period of cell growth during its lifespan.
- S ("Synthesis") phase
- DNA replication occurs.
6A brief description
- G2 phase
- "GAP 2". Cells prepare for M phase. The G2 checkpoint prevents cells from entering mitosis when DNA has been damaged since the last division, providing an opportunity for DNA repair and stopping the proliferation of damaged cells.
- M (Mitosis) phase
- Nuclear (chromosomes separate) and cytoplasmic (cytokinesis) division occur. Mitosis is further divided into 4 phases.
7Single, long series experiment
8Whitfield et al. (Molecular Biology of the Cell,
2002)
- Basic design is as follows:
- Experimental units: Human cancer cells (HeLa)
- Microarray platform: cDNA chips with approx. 43000 probes (i.e. roughly 29000 genes)
- 3 different patterns of time points (i.e. 3 different experiments)
- One of the goals of these experiments was to identify periodically expressed genes.
9Whitfield et al. (Molecular Biology of the Cell,
2002)
- Experiment 1 (26 time points)
- HeLa cancer cells arrested in the S-phase using double thymidine block.
- Sampling times after arrest (hrs):
- 0 1 2 3 4 5 6 7 8 9 10 11 12 14 15 16 18 20 22 24 26 28 32 36 40 44.
10Whitfield et al. (2002)
- Experiment 2 (47 time points)
- HeLa cancer cells arrested in the S-phase using double thymidine block.
- Sampling times after arrest (hrs):
- every hour between 0 and 46.
11Whitfield et al. (2002)
- Experiment 3 (19 time points)
- HeLa cancer cells arrested in the M-phase using thymidine and then nocodazole.
- Sampling times after arrest (hrs):
- 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36.
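The three sampling designs can be written down directly, which makes it easy to confirm the stated number of time points and to see how unevenly Experiment 1 is spaced. This is only a small sketch; the variable names are illustrative and not from the original study.

```python
import numpy as np

# Sampling times (hours after arrest) as listed for the three experiments.
exp1_hours = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
                       14, 15, 16, 18, 20, 22, 24, 26, 28, 32, 36, 40, 44])
exp2_hours = np.arange(0, 47)      # every hour from 0 to 46
exp3_hours = np.arange(0, 37, 2)   # every 2 hours from 0 to 36

for name, hours in [("Experiment 1", exp1_hours),
                    ("Experiment 2", exp2_hours),
                    ("Experiment 3", exp3_hours)]:
    gaps = np.diff(hours)
    print(name, "has", len(hours), "time points; gaps range from",
          gaps.min(), "to", gaps.max(), "hours")
```

Only Experiment 1 has unequal spacing, which matters later when the same model is fitted to all three experiments.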
12Whitfield et al. (2002): Phase marker genes
- Cell Cycle Phase   Genes
- ----------------   -----
- G1/S               CCNE1, CDC6, PCNA, E2F1
- S                  RFC4, RRM2
- G2                 CDC2, TOP2A, CCNA2, CCNF
- G2/M               STK15, CCNB1, PLK, BUB1
- M/G1               VEGFC, PTTG1, CDKN3, RAD21
13Questions
- Can we describe the gene expression of a cell-cycle gene as a function of time?
- Can we determine the phase angle for a given cell-cycle gene? i.e. can we quantify the previous table in terms of angles on a circle?
- What is the period of expression for a given gene?
- Can we test the hypothesis that all cell-cycle genes share the same time period?
- Etc.
14Profile of PCNA based on experiment 2 data
15Some important observations
- Gene expression has a sinusoidal shape
- Gene expression for a given gene is an average value of mRNA levels across a large number of cells
- Duration of cell cycle varies stochastically across cells
- Initially cells are synchronized but over time they fall out of synchrony
- Gene expression of a cell-cycle gene is expected to decrease/decay over time. This is because of items 2 and 4 listed above!
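The decay described in the last observation can be illustrated with a small simulation. This is a sketch under assumed values, not an analysis of the Whitfield et al. data: each simulated cell oscillates as a cosine with its own period, all cells start in phase, and the measured signal is the average over cells. The mean period of 15 hours, the spread of 0.1 and the lognormal form of the periods are assumptions made for illustration.

```python
import numpy as np

def population_average(t, n_cells=5000, mean_period=15.0, spread=0.1, seed=0):
    """Average expression over cells whose periods vary from cell to cell."""
    rng = np.random.default_rng(seed)
    # Assumed lognormal variation of the cell-cycle duration across cells.
    periods = mean_period * np.exp(spread * rng.standard_normal(n_cells))
    # One row per cell: every cell starts at its peak (synchronized at t = 0).
    signals = np.cos(2 * np.pi * t[None, :] / periods[:, None])
    return signals.mean(axis=0)

t = np.arange(0, 47, dtype=float)          # hourly sampling, 0 to 46
avg = population_average(t)
early_peak = np.abs(avg[:16]).max()        # largest swing in hours 0 to 15
late_peak = np.abs(avg[30:]).max()         # largest swing in hours 30 to 46
print("early amplitude:", round(early_peak, 3), "late amplitude:", round(late_peak, 3))
```

Although every individual cell keeps oscillating with full amplitude, the averaged curve shrinks over time because the cells drift out of synchrony.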
16Random Periods Model (PNAS, 2004)
- a and b: background drift parameters
- K: the initial amplitude
- T: the average period
- the attenuation parameter
- the phase angle
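A minimal numerical sketch of a curve with these six parameters follows. It assumes the form usually given for the random periods model, namely a linear drift plus K times the expected value of cos(2*pi*t / (T*exp(sigma*Z)) + phi) with Z standard normal. The symbols `sigma` (attenuation) and `phi` (phase angle), the function name `rpm_curve` and the use of Gauss-Hermite quadrature are assumptions for illustration, not taken from the slide.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def rpm_curve(t, a, b, K, T, sigma, phi, n_nodes=80):
    """Drift a + b*t plus K * E[cos(2*pi*t / (T*exp(sigma*Z)) + phi)], Z ~ N(0, 1)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # Nodes and weights for integrals against exp(-z**2 / 2).
    z, w = hermegauss(n_nodes)
    periods = T * np.exp(sigma * z)                  # one period per quadrature node
    cosines = np.cos(2 * np.pi * t[:, None] / periods[None, :] + phi)
    expected_cos = cosines @ w / np.sqrt(2 * np.pi)  # normalize weights to a probability
    return a + b * t + K * expected_cos

t = np.arange(0, 47, dtype=float)
no_attenuation = rpm_curve(t, a=0.0, b=0.0, K=1.0, T=15.0, sigma=0.0, phi=0.5)
with_attenuation = rpm_curve(t, a=0.0, b=0.0, K=1.0, T=15.0, sigma=0.1, phi=0.5)
```

With `sigma = 0` the curve reduces to an ordinary cosine with period T; a positive `sigma` damps the oscillation as t grows. A fixed quadrature rule becomes less accurate for very large t, so this sketch is meant for time ranges like those in the experiments.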
17Fitted curves for some phase marker genes
21Whitfield et al. (2002): Phase marker genes
- Phase   Genes                        Phase angles (radians)
- -----   -----                        ----------------------
- G1/S    CCNE1, CDC6, PCNA, E2F1      0.56, 5.96, 5.87, 5.83
- S       RFC4, RRM2                   5.47, 5.36
- G2      CDC2, TOP2A, CCNA2, CCNF     4.24, 3.74, 3.55, 3.25
- G2/M    STK15, CCNB1, PLK, BUB1      3.06, 2.67, 2.61, 2.51
- M/G1    VEGFC, PTTG1, CDKN3, RAD21   2.66, 2.40, 2.25, 1.81
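Because phase angles live on a circle, the G1/S group (0.56 together with values near 5.9) straddles the reference line, and an ordinary average would be misleading. The sketch below summarizes each phase by its circular mean instead. The dictionary layout and the helper `circular_mean` are illustrative choices; the numbers are those in the table.

```python
import numpy as np

phase_angles = {
    "G1/S": {"CCNE1": 0.56, "CDC6": 5.96, "PCNA": 5.87, "E2F1": 5.83},
    "S":    {"RFC4": 5.47, "RRM2": 5.36},
    "G2":   {"CDC2": 4.24, "TOP2A": 3.74, "CCNA2": 3.55, "CCNF": 3.25},
    "G2/M": {"STK15": 3.06, "CCNB1": 2.67, "PLK": 2.61, "BUB1": 2.51},
    "M/G1": {"VEGFC": 2.66, "PTTG1": 2.40, "CDKN3": 2.25, "RAD21": 1.81},
}

def circular_mean(angles):
    """Direction of the average unit vector, reported in [0, 2*pi)."""
    angles = np.asarray(angles, dtype=float)
    mean_dir = np.arctan2(np.sin(angles).mean(), np.cos(angles).mean())
    return float(np.mod(mean_dir, 2 * np.pi))

phase_means = {phase: circular_mean(list(genes.values()))
               for phase, genes in phase_angles.items()}
for phase, mean_angle in phase_means.items():
    print(phase, round(mean_angle, 2))
```

For G1/S the arithmetic mean of the four angles lands in the middle of the circle's other side, whereas the circular mean stays close to the cluster near the reference line.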
22A hypothesis of biological interest
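- Do all cell cycle genes have the same period T and the same attenuation parameter, while the other 4 parameters are gene specific?
- i.e. the null hypothesis is that T and the attenuation parameter are common to all genes.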
23An Important Feature
- Correlated data
- Temporal correlation within gene
- Gene-to-gene correlations
24Test Statistic
- Wald statistic for heteroscedastic linear and non-linear models
- Zhang, Peddada and Rogol (2000)
- Shao (1992)
- Wu (1986)
25The Null Distribution
- Due to the underlying correlation structure, the asymptotic approximation is not appropriate.
- Use the moving-blocks bootstrap technique on the residuals of the nonlinear model.
- Kunsch (1989)
26Moving-blocks Bootstrap
- Step 1: Fit the null model to the data and compute the residuals.
- Step 2: Draw a simple random sample (with replacement) from all possible blocks of consecutive residuals of a specified size.
27Moving-blocks Bootstrap
- Step 3: Add these residuals to the fitted curve under the null hypothesis to obtain the bootstrap data set.
- Step 4: Using the bootstrap data, fit the model under the alternative hypothesis and compute the Wald statistic.
28Moving-blocks Bootstrap
- Step 5: Repeat the above steps a large number of times.
- Step 6: The bootstrap p-value is the proportion of the above Wald statistics that exceed the Wald statistic determined from the actual data.
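The six steps can be sketched generically as below. To keep the example short and self-contained, the null model is a constant mean and the statistic is a Wald-type statistic for a linear slope; both are toy stand-ins and are NOT the random periods model or the heteroscedastic Wald statistic used in the talk. The function names, the block length of 4 and the 500 replicates are assumptions.

```python
import numpy as np

def resample_blocks(resid, block_len, rng):
    """Step 2: draw overlapping blocks of consecutive residuals with replacement."""
    n = len(resid)
    n_possible = n - block_len + 1             # number of distinct blocks
    n_needed = int(np.ceil(n / block_len))     # blocks required to cover n points
    starts = rng.integers(0, n_possible, size=n_needed)
    blocks = [resid[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]          # trim to the original length

def bootstrap_pvalue(y, fit_null, statistic, block_len=4, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    fitted_null = fit_null(y)                  # Step 1: fit the null model
    resid = y - fitted_null                    #         and compute residuals
    observed = statistic(y)
    boot_stats = np.empty(n_boot)
    for b in range(n_boot):                    # Step 5: repeat many times
        y_star = fitted_null + resample_blocks(resid, block_len, rng)  # Step 3
        boot_stats[b] = statistic(y_star)      # Step 4: statistic on bootstrap data
    return float(np.mean(boot_stats > observed))   # Step 6: proportion exceeding

# Toy stand-ins for illustration only.
def fit_constant(y):
    return np.full(len(y), np.mean(y))

def slope_wald(y):
    t = np.arange(len(y), dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return float(beta[1] ** 2 / cov[1, 1])

rng = np.random.default_rng(1)
y_trend = 0.5 * np.arange(47) + rng.standard_normal(47)
p_trend = bootstrap_pvalue(y_trend, fit_constant, slope_wald)
```

Resampling whole blocks rather than single residuals keeps the short-range temporal correlation within each block, which is the reason for using this scheme on time-course data.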
29Analysis of experiment 2
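- The bootstrap p-value for testing the hypothesis that all cell-cycle genes share the same period and the same attenuation parameter, using Experiment 2 data of Whitfield et al. (2002), is 0.12.
- Since the hypothesis is not rejected, our model is biologically plausible.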
30Statistical inferences on the phase angle: multiple experiments
31Some questions of interest
- How to evaluate or combine results from multiple cell division cycle experiments?
- Are the results consistent across experiments?
- How to evaluate this?
- What could be a possible criterion?
32Data
- The data are the RPM (random periods model) estimates of the phase angle of each cell-cycle gene g from each experiment.
33Representation using a circle
- Consider 4 cell cycle genes A, B, C, D. The vertical line in the circle denotes the reference line. The angles are measured in a counter-clockwise direction.
- Thus the sequential order of expression in this example is A, B, D, C.
(Figure: circle showing the positions of genes A, B, C and D)
34Coherence in multiple cell-cycle experiments
- A group of cell cycle genes is said to be coherent across experiments if the sequential order of their phase angles is preserved across experiments.
(Figure: three circles labeled Exp 1, Exp 2 and Exp 3, each showing genes A, B, C and D)
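For error-free phase angles, the definition of coherence can be checked directly: sort the genes by angle in each experiment and compare the resulting orders, treating orders that differ only by where the circle is "cut" as identical. The sketch below does this; the function names and the example angles are hypothetical, and it does not address estimation error.

```python
import numpy as np

def cyclic_order(angles, genes):
    """Genes sorted counter-clockwise, rotated so a fixed gene comes first."""
    ranked = [genes[i] for i in np.argsort(np.mod(angles, 2 * np.pi))]
    start = ranked.index(min(genes))       # anchor on the alphabetically first gene
    return tuple(ranked[start:] + ranked[:start])

def is_coherent(experiments, genes):
    """True if every experiment gives the same cyclic order of the genes."""
    orders = {cyclic_order(angles, genes) for angles in experiments}
    return len(orders) == 1

genes = ["A", "B", "C", "D"]
exp1 = np.array([0.2, 1.0, 4.0, 2.5])      # hypothetical angles, order A, B, D, C
exp2 = exp1 + 3.0                          # same circle rotated by 3 radians
exp3 = np.array([0.2, 2.5, 4.0, 1.0])      # B and D swapped, order A, D, B, C

print(is_coherent([exp1, exp2], genes))
print(is_coherent([exp1, exp3], genes))
```

Rotating all angles in an experiment by the same amount leaves the cyclic order unchanged, so `exp1` and `exp2` count as coherent even though every individual angle differs.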
35Geometric Representation
- We shall represent phase angles from multiple cell cycle experiments using concentric circles.
- Each circle represents an experiment.
- The same gene from a pair of experiments is connected by a line segment.
- A figure with non-intersecting lines indicates perfect coherence.
- If there is no coherence at all then there will be many intersecting lines.
36Example: Perfectly coherent
37Example: Perfectly coherent
38Example: No coherence
39Estimated Phase Angles
- Due to statistical errors in estimation, the
estimated phase angles from multiple cell cycle
experiments need not preserve the sequential
order even though the true phase angles are in a
sequential order.
40How to evaluate coherence?
41Some background on regression for circular data
42Experiment A and Experiment B
- Question: Can we determine a rotation matrix A such that rotating the circle representing Experiment A yields the circle representing Experiment B?
43Angle of rotation for a rigid body
- Yes! By solving the following minimization problem:
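One common way to pose such a minimization, assumed here for illustration, is least squares on unit vectors: choose the angle theta that minimizes the sum over genes of the squared distance between the rotated Experiment A point and the Experiment B point. That sum equals the sum of 2 - 2*cos(beta_i - alpha_i - theta), so the minimizer has a closed form. The names `estimate_rotation`, `alpha` and `beta` are hypothetical.

```python
import numpy as np

def rotation_matrix(theta):
    """2x2 matrix that rotates a plane vector counter-clockwise by theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def estimate_rotation(alpha, beta):
    """Least-squares rotation angle taking angles alpha onto angles beta."""
    diff = np.asarray(beta, dtype=float) - np.asarray(alpha, dtype=float)
    # Maximizing sum(cos(diff - theta)) gives the direction of the mean difference.
    theta = np.arctan2(np.sin(diff).sum(), np.cos(diff).sum())
    return float(np.mod(theta, 2 * np.pi))

alpha = np.array([0.3, 1.1, 2.8, 4.6])          # hypothetical angles, Experiment A
beta = np.mod(alpha + 0.7, 2 * np.pi)           # Experiment B: A rotated by 0.7
theta_hat = estimate_rotation(alpha, beta)

points_a = np.vstack([np.cos(alpha), np.sin(alpha)])   # unit vectors, one per column
points_b = np.vstack([np.cos(beta), np.sin(beta)])
rotated_a = rotation_matrix(theta_hat) @ points_a
```

When Experiment B really is a rigid rotation of Experiment A, the estimated angle recovers the rotation exactly and the rotated points coincide with those of Experiment B.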
44Determination of Coherence Across k Experiments
45The Basic Idea
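- Consider a rigid body rotating in a plane. Suppose the body is perfectly rigid with no deformations.
- Consider the 2x2 rotation matrices from experiment i to experiment i+1 (with k+1 taken to be 1). Then applying all k rotations in turn brings the body back to its starting position.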
- Alternatively
46The Basic Idea
- Equivalently, if
- Then under perfect rigid body motion we should
have
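The rigid-body idea can be checked numerically: if each experiment is just a rotated copy of the same circle, the rotations from experiment 1 to 2, 2 to 3, and so on back to 1 compose to the identity, and the rotation angles add up to a multiple of 2*pi. The sketch below uses three hypothetical orientations; the names `offsets`, `steps` and `cycle_defect` are illustrative, and this is not necessarily the exact quantity defined on the slides.

```python
import numpy as np

def rotation_matrix(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def cycle_defect(step_angles):
    """Distance of the summed rotation angles from the nearest multiple of 2*pi."""
    total = np.mod(np.sum(step_angles), 2 * np.pi)
    return float(min(total, 2 * np.pi - total))

offsets = np.array([0.0, 0.9, 2.4])        # hypothetical orientation of each experiment
steps = np.roll(offsets, -1) - offsets     # rotation i -> i+1, with the last wrapping to 1

product = np.eye(2)
for angle in steps:                        # compose the rotations around the cycle
    product = rotation_matrix(angle) @ product

rigid_defect = cycle_defect(steps)
perturbed_defect = cycle_defect(steps + np.array([0.3, 0.0, 0.0]))
```

Under perfect rigid motion the defect is zero; any departure, such as the 0.3 radian perturbation above, shows up as a positive value.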
47Problem!
- In the present context we do NOT necessarily have a rigid body!
- Not all experiments are performed with the same precision.
- The time axis may not be constant across experiments.
- The number of time points may not be the same across experiments.
- Etc.
48Example: Not a rigid motion but perfectly coherent
49Consequence
- Rotation matrix A alone may not be enough to bring two circles to congruence!
- An additional association/scaling parameter may be needed, as seen in the previous figure!
50Circular-Circular regression model for a pair of
experiments (Downs and Mardia, 2002)
- For each observation, consider a pair of angular variables.
- Suppose the response angle is von Mises distributed with a mean direction and a concentration parameter.
51Circular-Circular Regression Model (Downs and
Mardia, 2002)
The regression model is given by the link function
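A sketch of the link function follows, assuming the form usually attributed to Downs and Mardia: tan((mu - beta)/2) = omega * tan((x - alpha)/2), where x is the explanatory angle, mu is the mean direction of the response, alpha and beta are angular location parameters and omega is a slope-like parameter between -1 and 1. The symbol names and the function `dm_mean_direction` are assumptions for illustration.

```python
import numpy as np

def dm_mean_direction(x, alpha, beta, omega):
    """Solve tan((mu - beta)/2) = omega * tan((x - alpha)/2) for mu in [0, 2*pi)."""
    half = (np.asarray(x, dtype=float) - alpha) / 2.0
    # arctan2 avoids the blow-up of tan() when (x - alpha)/2 is near pi/2.
    mu = beta + 2.0 * np.arctan2(omega * np.sin(half), np.cos(half))
    return np.mod(mu, 2 * np.pi)

x = np.linspace(0.0, 2 * np.pi, 9)[:-1]     # eight equally spaced angles
pure_rotation = dm_mean_direction(x, alpha=0.4, beta=1.1, omega=1.0)
compressed = dm_mean_direction(x, alpha=0.4, beta=1.1, omega=0.5)
```

With `omega = 1` the link reduces to a rigid rotation by `beta - alpha`. With `omega` between 0 and 1 the angles are stretched in some parts of the circle and compressed in others while their circular order is kept, which is the kind of non-rigid but coherent behavior described earlier.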
52Back to the toy examples
55Determination Of Coherence
- Suppose we have K experiments, labeled 1, 2, 3, ..., K.
- Consider the angle of rotation for the regression of experiment i on experiment j for a group of g genes.
- Compute
- Note .
56Determination Of Coherence
- Under no coherence, we expect the statistic to be stochastically larger than under coherence.
57Comparison of Cumulative Distribution Functions
Blue line: Coherence. Pink line: No coherence.
58Determination Of Coherence
- For a given data set, compute the statistic.
- Generate the bootstrap distribution of the statistic under the null hypothesis of no coherence.
59Bootstrap P-value For Coherence
- Compute the angle of rotation using the bootstrap sample. Then the P-value is
60Illustration Whitfield et al. data
- There are 3 experiments. The phase angle of each gene was estimated using the Liu et al. (2004) model.
- A total of 47 common cell-cycling genes were selected from the three experiments.
61Estimates
- The estimated values of interest are
- Note that
62Conclusion
- Since the bootstrap P-value < 0.05, we conclude that the three experiments are coherent.
64Statistical inferences on the phase angle: some open problems
65Estimation subject to inequality constraints
- It is reasonable to hypothesize that for a normal cell division cycle, the p phase marker genes must express in an order around the unit circle.
- Thus they must satisfy
66Open problems- data from single experiment
- How to estimate the phase angles subject to the simple order restriction?
- More generally, how to estimate the phase angles subject to an isotropic simple order restriction?
- How to test the above hypothesis? What are the null and alternative hypotheses?
67Open problems- data from multiple experiments
- How do we estimate the phase angles from multiple experiments under the order restriction on the phase angles of cell cycle genes?
- What are the statistical errors associated with such an estimator?
- How to construct confidence intervals and test hypotheses?
68Acknowledgments
- Delong Liu (former Post-doc at NIEHS)
- David Umbach (NIEHS)
- Leping Li (NIEHS)
- Clare Weinberg (NIEHS)
- Pat Crocket (Constella Group)
- Cristina Rueda (Univ. of Valladolid, Spain)
- Miguel Fernandez (Univ. of Valladolid, Spain)