Michael A. Beer and Saeed Tavazoie

1

Michael A. Beer and Saeed Tavazoie
Cell 117, 185-198 (16 April 2004)

2
The Authors
Saeed Tavazoie (middle): Professor, Dept. of
Molecular Biology
Mike Beer: Postdoctoral Researcher; Ph.D., Princeton
(1995)
3
The Question

Transcription factor (TF) binding sites are
relatively well characterized in Saccharomyces
cerevisiae.
But the presence of a TF binding site alone
is not sufficient to predict expression of a gene.
Multiple regulatory factors are often involved.
How do you identify the elaborate rules for gene
regulation?

4
Simple regulatory structures
Each possible combination of TFs must be tested
in the lab. This is a hugely time-consuming task.
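The combinatorial burden can be made concrete: with n candidate TFs there are C(n, k) combinations of size k and 2^n - 1 non-empty combinations overall. A minimal Python sketch; the TF counts are hypothetical, and it ignores binding-site position, order and copy number, each of which would multiply the number of cases further.

```python
from math import comb

def combinations_to_test(n_tfs: int, max_size: int) -> int:
    """Number of distinct TF subsets with 1..max_size members out of n_tfs."""
    return sum(comb(n_tfs, k) for k in range(1, max_size + 1))

# Hypothetical TF counts, chosen only to show how fast the numbers grow.
for n in (5, 10, 20):
    singles_and_pairs = combinations_to_test(n, 2)
    all_subsets = combinations_to_test(n, n)  # equals 2**n - 1
    print(f"{n} TFs: {singles_and_pairs} singles+pairs, {all_subsets} combinations in total")
```

Even restricted to single TFs and pairs the count grows quadratically; allowing arbitrary combinations makes it exponential.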
5
Problems with predicting gene regulation
Regulatory motif sequences have low consensus,
e.g., the well-known TATA box has the consensus
TATA(A/T)A(A/T)(A/G).
Numerous transcription factors can bind to any
one motif.
Many genes have multiple known motifs upstream of
the ATG start codon.
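The degeneracy of such a consensus is easy to quantify: each of the three two-way positions doubles the number of admissible sequences, so TATA(A/T)A(A/T)(A/G) admits 2 × 2 × 2 = 8 distinct octamers. A minimal Python sketch that scans a sequence for the consensus with a regular expression; the example sequence is hypothetical, and only the given strand is scanned.

```python
import re
from itertools import product

# TATA(A/T)A(A/T)(A/G) as a regex; the lookahead lets overlapping sites be reported.
TATA_CONSENSUS = re.compile(r"(?=(TATA[AT]A[AT][AG]))")

def find_tata_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched site) for every consensus match on this strand."""
    return [(m.start(), m.group(1)) for m in TATA_CONSENSUS.finditer(seq.upper())]

# The consensus admits 2 * 2 * 2 = 8 distinct octamers.
admissible = {f"TATA{a}A{b}{c}" for a, b, c in product("AT", "AT", "AG")}
print(len(admissible))  # 8

# Hypothetical upstream fragment, not a real promoter.
print(find_tata_sites("ggcTATAAAAGcgcTATATATAgg"))
# [(3, 'TATAAAAG'), (14, 'TATATATA')]
```

A short, degenerate pattern like this will match by chance many times across a genome, which is one reason motif presence alone is a weak predictor.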
6
Example of cis-regulatory logic
From Yuh et al. (1998), Science 279, 1896-1902
7
The Approach
1. Using microarray expression data, the authors
built clusters of genes with similar expression
patterns.
From central nervous system expression data in
Wen et al. (1998), PNAS 95, 334-339
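Step 1 can be sketched with a generic clustering routine. The slide does not say which algorithm was used; this minimal Python/NumPy sketch assumes k-means on per-gene z-scored profiles (so genes are grouped by the shape of their response rather than its magnitude), with a deterministic farthest-point initialisation. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def zscore_rows(x) -> np.ndarray:
    """Centre and scale each gene's profile so genes are compared by shape, not level."""
    x = np.asarray(x, dtype=float)
    sd = x.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # flat profiles would otherwise divide by zero
    return (x - x.mean(axis=1, keepdims=True)) / sd

def kmeans(x: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Lloyd's k-means with deterministic farthest-point initialisation; returns labels."""
    x = np.asarray(x, dtype=float)
    centers = [x[0]]
    for _ in range(1, k):
        # Next centre: the gene farthest from all centres chosen so far.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

# Hypothetical toy matrix: 6 genes x 4 conditions (three rising, three falling).
expr = np.array([[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 5, 6],
                 [4, 3, 2, 1], [8, 6, 4, 2], [5, 4, 2, 1]])
labels = kmeans(zscore_rows(expr), k=2)
print(labels)  # [0 0 0 1 1 1]
```

Z-scoring makes genes 0 and 1 identical even though gene 1 is expressed twice as strongly, which is usually what "similar expression pattern" is meant to capture.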
8
The Approach, cont.
2. Within each cluster of co-expressed genes, the
authors searched for consensus sequence motifs in
the 800 bp upstream of the ATG.
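Step 2 can be illustrated with a deliberately simple stand-in for motif discovery: take the 800 bp immediately upstream of each ATG, then score each k-mer by how much more often it occurs in a cluster's upstream regions than in background regions. This is not the authors' motif-finding procedure. The helper names, the choice of k, the Laplace pseudocounts and the toy sequences are assumptions, and only plus-strand genes are handled (a minus-strand gene would need its upstream region reverse-complemented).

```python
from collections import Counter

def upstream_region(chrom: str, atg_index: int, length: int = 800) -> str:
    """Up to `length` bases immediately 5' of an ATG at `atg_index` (plus strand only)."""
    return chrom[max(0, atg_index - length):atg_index]

def kmer_presence(seqs: list[str], k: int) -> Counter:
    """Number of sequences containing each k-mer (each sequence counted once per k-mer)."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts

def kmer_enrichment(cluster: list[str], background: list[str], k: int = 6) -> dict[str, float]:
    """Ratio of Laplace-smoothed presence frequencies, cluster vs background."""
    fg, bg = kmer_presence(cluster, k), kmer_presence(background, k)
    return {
        kmer: ((fg[kmer] + 1) / (len(cluster) + 2)) / ((bg[kmer] + 1) / (len(background) + 2))
        for kmer in fg
    }

chrom = "CCCCGGGG" + "ATG" + "AAA"  # hypothetical; ATG starts at index 8
print(upstream_region(chrom, chrom.find("ATG"), length=4))  # GGGG

# Hypothetical toy data: both cluster members share CACGTG; the background does not.
cluster = ["GGCACGTGGG", "TTCACGTGTT"]
background = ["AAAAAAAAAA", "TTTTTTTTTT", "GGGGCCCCAA"]
scores = kmer_enrichment(cluster, background)
print(max(scores, key=scores.get))  # CACGTG
```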
9
The Approach, cont.

3. The authors built a Bayesian network using the
TF sequence motifs as parent nodes and the
expression pattern as the child node.
This can be applied to a gene of interest by
identifying the upstream TF motifs for that gene
and finding the model(s) that best fit those
upstream motifs.
If the gene's expression data fall within the
parameters predicted by the model, there is a
decent chance that its associated gene regulatory
structure can be verified experimentally.
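The slide describes motifs as parent nodes feeding an expression node, which is the structure of a Bayesian network. A minimal Python sketch under simplifying assumptions: each expression cluster gets its own tiny network whose parents are binary motif-presence nodes and whose single child is "gene belongs to this cluster"; the conditional probability table (CPT) is estimated from counts with Laplace smoothing, and a new gene is assigned to the cluster whose model gives it the highest membership probability. Motif names, cluster names and training genes are hypothetical, and structure learning (choosing which motifs become parents) is omitted.

```python
from collections import defaultdict

def configuration(gene_motifs: set, parents: tuple) -> tuple:
    """0/1 presence of each parent motif in a gene's upstream region."""
    return tuple(int(m in gene_motifs) for m in parents)

def fit_model(parents: tuple, genes: list, in_cluster: list, alpha: float = 1.0):
    """Laplace-smoothed CPT for P(gene in cluster | parent configuration)."""
    counts = defaultdict(lambda: [0, 0])  # configuration -> [members, total]
    for motifs, y in zip(genes, in_cluster):
        c = counts[configuration(motifs, parents)]
        c[0] += y
        c[1] += 1
    cpt = {cfg: (m + alpha) / (n + 2 * alpha) for cfg, (m, n) in counts.items()}
    return parents, cpt

def predict(models: dict, gene_motifs: set, unseen: float = 0.5) -> str:
    """Assign a gene to the cluster whose model gives the highest membership probability."""
    def prob(name):
        parents, cpt = models[name]
        # A configuration never seen in training falls back to the smoothed prior of 0.5.
        return cpt.get(configuration(gene_motifs, parents), unseen)
    return max(models, key=prob)

# Hypothetical training genes (sets of motifs found upstream) and their clusters.
genes = [{"M1", "M2"}, {"M1", "M2"}, {"M1"}, {"M3"}, {"M1", "M3"}, {"M2"}]
labels = ["A", "A", "other", "B", "B", "other"]
models = {
    "A": fit_model(("M1", "M2"), genes, [int(l == "A") for l in labels]),
    "B": fit_model(("M3",), genes, [int(l == "B") for l in labels]),
}
print(predict(models, {"M1", "M2"}), predict(models, {"M3"}))  # A B
```

Model A learns that M1 and M2 together (P = 0.75), but not M1 alone (P = 0.25), predict membership: a simple AND-like rule expressed as a probability table.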

10
Two examples from yeast
Each cluster contains at least 10 genes, which
gives some confidence that genes with the same
upstream TF motifs will exhibit the same expression
pattern as these clusters.
11
Constructing the models
Using expression data from 30 microarrays, the
authors identified 5547 genes with significant
expression levels in yeast, and this data was
used to construct 49 models of expression
patterns.
12
Predictive accuracy
These 49 models were applied to five test sets of
expression data, using only the upstream 800 bp
region as input. The expression pattern was
correctly predicted for 1898 of the 2587 genes
in the test sets. This amounts to 73% accuracy
(random assignment would give 1/49, or about
2%).
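The accuracy figures follow directly from the counts on the slide; a short Python check (the function name is arbitrary):

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of test genes whose expression pattern was predicted correctly."""
    return correct / total

acc = accuracy(1898, 2587)
baseline = 1 / 49  # picking one of the 49 expression-pattern models at random
print(f"accuracy {acc:.1%}, random baseline {baseline:.1%}, about {acc / baseline:.0f}x better")
# accuracy 73.4%, random baseline 2.0%, about 36x better
```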
13
Application to C. elegans
Given the larger amount of regulatory sequence
in higher organisms, and the potential for
more complex regulation, the authors had low
expectations for applying this model to
C. elegans. Using 2000 bp of upstream sequence
and microarray expression data including data from
Hill et al. (2000), the authors were surprised to
find that they could predict expression patterns
for roughly half of the genes in the C. elegans
dataset.
14
An example from C. elegans
15
Is it really so simple?
Gene regulation involves a complex combinatorial
dance of numerous factors beyond the presence
or absence of TF binding sites. The authors
deliberately limited their scope to cis-acting
upstream elements, ignoring regulatory elements
in introns or downstream regions, as well as the
effects of operons, alternative splicing, histone
modifications, methylation, and so on.
16
Model constraints

Several features were found to significantly
improve the predictive accuracy of the models:
Motif orientation (<--- or --->)
Distance from the start codon
The particular order of various TFs
The presence of multiple copies of the same TF
binding site
All of these factors were included in the model
as priors.
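The features listed above can be computed from an upstream sequence with a few string operations. A minimal Python sketch, assuming the upstream region ends immediately 5' of the ATG, that distance is measured from the 3' end of a hit to the ATG, and that motifs are exact strings (real motifs are degenerate). The motif sequences and the toy region are hypothetical, and the order feature only compares the hits nearest the ATG.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def motif_hits(upstream: str, motif: str) -> list[tuple[int, str]]:
    """(distance to ATG, orientation) for every hit on either strand, nearest first."""
    strands = [("+", motif)]  # '+' is ---> relative to the gene
    if revcomp(motif) != motif:  # a palindrome would otherwise be counted twice
        strands.append(("-", revcomp(motif)))  # '-' is <---
    hits = []
    for orientation, pattern in strands:
        for m in re.finditer(f"(?={re.escape(pattern)})", upstream):
            end = m.start() + len(pattern)
            hits.append((len(upstream) - end, orientation))
    return sorted(hits)

def motif_features(upstream: str, motif_a: str, motif_b: str) -> dict:
    """Copy number, distance, orientation and order features for two motifs."""
    a, b = motif_hits(upstream, motif_a), motif_hits(upstream, motif_b)
    return {
        "copies_a": len(a),
        "distance_a": a[0][0] if a else None,
        "orientation_a": a[0][1] if a else None,
        # True if A's nearest hit lies 5' of (farther from the ATG than) B's nearest hit.
        "a_before_b": bool(a and b) and a[0][0] > b[0][0],
    }

# Hypothetical 30 bp upstream region (its last base abuts the ATG) and motifs.
upstream = "AAAGGG" + "TTTT" + "CCCTTT" + "TTTT" + "CCTTAA" + "TTTT"
print(motif_features(upstream, "AAAGGG", "CCTTAA"))
# {'copies_a': 2, 'distance_a': 14, 'orientation_a': '-', 'a_before_b': True}
```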

17
Why is distance from the start codon significant?
From Harbison et al. (2004), Nature 431, 99-104
18
The number of copies of a TF binding site is
relevant.
From Molecular Biology of the Cell, 4th edition
19
Motif combinatorics and predictive accuracy
Combinatorial models are more accurate than
single-TF models (unless a gene is under the
control of only one TF).
The order of various TFs is significant.
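Why a combinatorial model can beat single-TF models is easy to see on toy data. In this hypothetical Python example, a gene is expressed only when motifs A and B are both present; either single-motif rule misclassifies the genes carrying just one motif, while the AND rule fits all of them. The genes and motifs are invented for illustration, and the example says nothing about order effects, which would need positional information.

```python
# Hypothetical genes: (motif A present, motif B present, expressed in the cluster?)
genes = [
    (1, 1, 1), (1, 1, 1), (1, 1, 1),
    (1, 0, 0), (1, 0, 0),
    (0, 1, 0), (0, 1, 0),
    (0, 0, 0),
]

def rule_accuracy(rule) -> float:
    """Fraction of genes whose expression the rule predicts correctly."""
    return sum(int(rule(a, b)) == y for a, b, y in genes) / len(genes)

single_a = rule_accuracy(lambda a, b: a)        # "motif A alone drives expression"
single_b = rule_accuracy(lambda a, b: b)        # "motif B alone drives expression"
combined = rule_accuracy(lambda a, b: a and b)  # "A AND B are both required"
print(single_a, single_b, combined)  # 0.75 0.75 1.0
```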
20
Future directions
Because of the sensitivity of the model(s), even
a very small amount of ambiguity can yield junk
results. For this reason, SAGE data is not
particularly suitable: only unique SAGE tags
can be said to be unambiguous, which in turn
excludes a great deal of potentially useful
data. However, the microarray-based predictions
could be used to pick gene regulatory structures
to investigate.
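The cost of discarding ambiguous tags can be sketched in a few lines of Python. Assuming a precomputed mapping from each SAGE tag to the genes it could derive from (the tags and gene names here are hypothetical), keeping only unique tags drops every gene that is reachable only through shared tags.

```python
def split_sage_tags(tag_to_genes: dict[str, list[str]]):
    """Split tags into unambiguous (one gene) and ambiguous (several genes)."""
    unique, ambiguous = {}, {}
    for tag, genes in tag_to_genes.items():
        targets = sorted(set(genes))
        if len(targets) == 1:
            unique[tag] = targets[0]
        else:
            ambiguous[tag] = targets
    return unique, ambiguous

tags = {
    "tag1": ["geneA"],
    "tag2": ["geneB", "geneC"],
    "tag3": ["geneD"],
    "tag4": ["geneE", "geneF", "geneG"],
}
unique, ambiguous = split_sage_tags(tags)
# Genes that no unique tag covers are lost if ambiguous tags are discarded.
lost = sorted({g for gs in ambiguous.values() for g in gs} - set(unique.values()))
print(len(unique), len(ambiguous), lost)
# 2 2 ['geneB', 'geneC', 'geneE', 'geneF', 'geneG']
```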
