Title: Analysis of the yeast transcriptional regulatory network
1. Analysis of the yeast transcriptional regulatory network
2. Transcription Factor (TF)
- A TF is a protein that binds to specific DNA sequences and regulates the transcription of the corresponding genes.
- The binding site of a TF is usually a short segment of a specific promoter sequence.
- The activity of a TF is regulated according to the cell's needs, largely through signal transduction. It may not be directly observable, but it is reflected in the expression of the genes it regulates.
3. Expression regulatory network
- Identifying the expression regulatory network is a crucial step towards understanding the cellular regulation system.
- Inferring the network from microarray data alone
- Inferring the network from microarray data and TF-TG (Target Gene) information
4. Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N. Revealing modular organization in the yeast transcriptional network. Nat Genet. 2002 Aug;31(4):370-7.
5. Segal E et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet. 2003 Jun;34(2):166-76.
6. TF Activity
- Using the TF-TG relation benefits the identification of the regulatory network.
- The expression level of a TF is not a good measure of its activity. The level of activated TF protein, rather than the TF's expression level, is what controls gene expression.
- The activity of a transcription factor is regulated according to the cell's needs, largely through signal transduction. It may not be directly observable, but it is reflected in the expression of the genes it regulates.
7. Identify TF Activity by NCA
- Network Component Analysis
- Liao JC et al. Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15522-7.
8. NCA compared with PCA, ICA
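9. NCA Model
E = AP, where E is the gene expression data, A is the connectivity matrix and P holds the regulatory signals.
Without further constraints, E cannot be uniquely decomposed into A and P.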
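The non-uniqueness follows because any invertible matrix X gives a second factorization, E = AP = (AX)(X^-1 P). A minimal numerical sketch, assuming NumPy; the matrix sizes and values below are made up for illustration and are not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): 6 genes, 2 regulators, 5 conditions.
A = rng.normal(size=(6, 2))   # connectivity matrix
P = rng.normal(size=(2, 5))   # regulatory signals
E = A @ P                     # noise-free expression matrix

# Any invertible X yields another, equally valid pair of factors.
X = np.array([[2.0, 1.0],
              [0.5, 3.0]])    # determinant 5.5, so X is invertible
A_alt = A @ X
P_alt = np.linalg.inv(X) @ P

same_product = bool(np.allclose(E, A_alt @ P_alt))
different_factors = not np.allclose(A, A_alt)
print(same_product, different_factors)
```

Both factor pairs reproduce E equally well, which is why additional structural constraints on A are needed.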
10. Criteria for Unique NCA (E = AP)
- The connectivity matrix A must have full column rank.
- When a node in the regulatory layer is removed along with all of the output nodes connected to it, the resulting network must be characterized by a connectivity matrix that still has full column rank. This condition implies that each column of A must have at least L-1 zeros, where L is the number of regulatory nodes.
- P must have full row rank. In other words, no regulatory signal can be expressed as a linear combination of the other regulatory signals.
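The three criteria can be checked numerically for a given pair of matrices. The following is a rough sketch, assuming NumPy; the function name `nca_identifiable` and the toy matrices are hypothetical, and the ranks are evaluated on one numeric A rather than on the zero pattern in general:

```python
import numpy as np

def nca_identifiable(A, P):
    """Check the three criteria for A (genes x regulators) and P (regulators x conditions)."""
    A = np.asarray(A, dtype=float)
    P = np.asarray(P, dtype=float)
    n_regulators = A.shape[1]

    # Criterion 1: A has full column rank.
    if np.linalg.matrix_rank(A) < n_regulators:
        return False

    # Criterion 2: remove regulator j and every gene connected to it;
    # the reduced matrix must still have full column rank (L - 1).
    for j in range(n_regulators):
        unconnected = A[:, j] == 0
        reduced = np.delete(A[unconnected], j, axis=1)
        rank = np.linalg.matrix_rank(reduced) if reduced.size else 0
        if rank < n_regulators - 1:
            return False

    # Criterion 3: P has full row rank.
    return bool(np.linalg.matrix_rank(P) == n_regulators)

P_demo = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, 1.0]])
# Each regulator has genes that it does not touch.
A_good = np.array([[1, 0], [2, 0], [0, 3], [0, 1], [1, 1]])
# Every gene is connected to both regulators, so criterion 2 fails.
A_bad = np.array([[1, 1], [2, 1], [1, 3]])

print(nca_identifiable(A_good, P_demo), nca_identifiable(A_bad, P_demo))
```

In `A_bad` no column contains a zero, so removing either regulator removes every gene, which is the situation the "at least L-1 zeros" condition rules out.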
11. Criteria 2
12. Estimation of E = AP
Iteratively estimate A and P: A0 → P1 → A1 → P2 → ... until convergence.
Convergence criterion: decrease of the least-squares error < cutoff.
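The alternating scheme can be sketched as two least-squares updates per iteration. This is an illustrative sketch assuming NumPy, not the published implementation: the function name `nca_als`, the random starting point A0, the cutoff value and the toy data are all assumptions, and any normalization step is omitted.

```python
import numpy as np

def nca_als(E, mask, n_iter=200, cutoff=1e-10, seed=0):
    """Alternate least-squares updates of P and A; A stays zero where mask is False."""
    rng = np.random.default_rng(seed)
    mask = np.asarray(mask, dtype=bool)
    A = np.where(mask, rng.normal(size=mask.shape), 0.0)  # A0 respects the zero pattern
    prev_sse = np.inf
    for _ in range(n_iter):
        # Update P with A fixed: least-squares solution of E ~ A P.
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # Update A with P fixed, one gene (row) at a time, on allowed entries only.
        for i in range(E.shape[0]):
            cols = mask[i]
            if cols.any():
                A[i, cols] = np.linalg.lstsq(P[cols].T, E[i], rcond=None)[0]
        sse = float(np.sum((E - A @ P) ** 2))
        # Stop once the decrease of the least-squares error falls below the cutoff.
        if prev_sse - sse < cutoff:
            break
        prev_sse = sse
    return A, P, sse

# Toy data (made up): 6 genes, 2 regulators, 8 conditions.
rng = np.random.default_rng(1)
mask = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]], dtype=bool)
A_true = np.where(mask, rng.normal(size=mask.shape), 0.0)
P_true = rng.normal(size=(2, 8))
E = A_true @ P_true

A_hat, P_hat, sse = nca_als(E, mask)
print(f"residual sum of squares: {sse:.3g}")
```

Each update can only lower the squared error, so the sequence of errors is non-increasing. A and P are recovered at best up to a scaling of each regulator, since rescaling a column of A and the matching row of P leaves AP unchanged.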
13. NCA: inferring TF activity in yeast
E = AP
How to define the restrictions on CS, i.e. which CS_ij = 0?
14. Identify the TF-TG relation by ChIP-chip experiment
15. Yeast cell cycle regulation
441 genes vs. 33 transcription factors
16. Inference of regulatory network by Two-stage constrained factor analysis
Yu T, Li KC. Inference of transcriptional regulatory network by two-stage constrained space factor analysis. Bioinformatics. 2005 Nov 1;21(21):4033-8.
17. Inference of regulatory network by Two-stage constrained factor analysis
Shortcoming of Liao et al.'s approach:
E = AP. Let C_ij be the indicator of whether entry (i, j) of the loading matrix A can be non-zero. This constraint C comes from a very noisy source.
Estimate C, A and P simultaneously.
18. Model setting
Up to here, it is the NCA model by Liao et al.
19. Model Fitting
20. Model Fitting
Difficulties:
- Simultaneous estimation of both the structure and the coefficients amounts to finding the optimum of a very complex function.
- The number of parameters to be estimated is overwhelming.
Solution: find a reasonable local optimum.
- Use the high-confidence set to find a starting point as close to the global optimum as possible.
Implementation: stepwise model fitting.
- Start with a network backbone containing only the high-confidence set, and grow the network gradually, drawing new connections from the low-confidence set.
21. Set C = CMIN and estimate each activity profile tk by the consensus of the expression of the genes it regulates.
Fix the estimate of T, then regress each gene expression profile on the activity profiles of the TFs that are associated with it in CMAX. Use BIC and p-values to select TFs.
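The two stages can be sketched as follows. This is only an illustration under stated assumptions, using NumPy: the "consensus" is taken here to be the mean of z-scored target profiles, selection is a greedy forward search on BIC alone (the p-value filter is left out), and the function names and toy data are hypothetical.

```python
import numpy as np

def consensus_activity(E, c_min):
    """Stage 1: activity profile of each TF = mean z-scored profile of its CMIN targets."""
    Z = (E - E.mean(axis=1, keepdims=True)) / E.std(axis=1, keepdims=True)
    return np.vstack([Z[c_min[:, k]].mean(axis=0) for k in range(c_min.shape[1])])

def bic(y, X):
    """BIC of the least-squares fit y ~ X (Gaussian errors, up to a constant)."""
    n = len(y)
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    rss = max(float(resid @ resid), 1e-12)
    return n * np.log(rss / n) + X.shape[1] * np.log(n)

def select_tfs(y, T, candidates):
    """Stage 2 for one gene: add candidate TFs (from CMAX) while BIC keeps improving."""
    X = np.ones((len(y), 1))          # intercept-only model to start
    best = bic(y, X)
    chosen = []
    improved = True
    while improved:
        improved = False
        for k in candidates:
            if k in chosen:
                continue
            score = bic(y, np.column_stack([X, T[k]]))
            if score < best:
                best, best_k, improved = score, k, True
        if improved:
            chosen.append(best_k)
            X = np.column_stack([X, T[best_k]])
    return chosen

# Toy data (made up): 5 genes, 2 TFs, 20 conditions.
rng = np.random.default_rng(2)
T_true = rng.normal(size=(2, 20))
A_true = np.array([[2.0, 0.0], [1.5, 0.0], [0.0, 1.0], [0.0, 2.0], [1.0, 1.0]])
E = A_true @ T_true + 0.1 * rng.normal(size=(5, 20))
# High-confidence targets only; the last gene has no high-confidence TF.
c_min = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 0]], dtype=bool)

T_hat = consensus_activity(E, c_min)
chosen = select_tfs(E[4], T_hat, candidates=[0, 1])
print(chosen)
```

The sketch assumes every TF has at least one high-confidence target and that no gene has a constant expression profile; both cases would need explicit handling in real data.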
22. Result
Data: regular-growth ChIP data; cell-cycle microarray data.
99 TFs enter our study. Start with 891 evidenced relationships and 29154 lower-confidence relationships.
The final network has 3846 TF-gene connections.
23. TFs that exhibit correlated expression and activity
24. Time-shifting between a TF's activity profile and its expression profile
- Fit the activity profile with a cubic spline.
- Interpolate the spline to obtain the shifted profile.
- Compute the correlation between the expression profile and the shifted activity profile.
- Maximize the absolute correlation with respect to the shift (in minutes).
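The four steps above can be sketched with SciPy's `CubicSpline`. The function name `best_shift`, the search range of shifts, the rule of comparing only time points that stay inside the sampled range, and the synthetic profiles are assumptions made for this illustration:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def best_shift(times, activity, expression, max_shift=20, step=1):
    """Return (shift, r) maximizing |corr(expression(t), activity(t + shift))|."""
    spline = CubicSpline(times, activity)        # fit the activity profile
    best = (0.0, 0.0)
    for shift in np.arange(-max_shift, max_shift + step, step):
        shifted_times = times + shift
        # Compare only where the shifted profile needs no extrapolation.
        inside = (shifted_times >= times[0]) & (shifted_times <= times[-1])
        if inside.sum() < 4:
            continue
        shifted_activity = spline(shifted_times[inside])
        r = np.corrcoef(expression[inside], shifted_activity)[0, 1]
        if abs(r) > abs(best[1]):
            best = (float(shift), float(r))
    return best

# Synthetic profiles (made up): activity lags expression by 10 minutes.
times = np.arange(0.0, 121.0, 5.0)               # minutes
expression = np.sin(2 * np.pi * times / 80)
activity = np.sin(2 * np.pi * (times - 10) / 80)

shift, r = best_shift(times, activity, expression)
print(shift, round(r, 3))
```

With this sign convention a positive shift means the activity profile lags behind the expression profile. Because the absolute correlation is maximized, strongly periodic profiles can match at more than one shift, so the search range should stay well below the period.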
25. TFs that have activity lagging behind expression
26. TFs that have activity lagging behind expression
SWI4
27. Between-TF regulations