Michael A. Beer and Saeed Tavazoie - PowerPoint PPT Presentation

Slides: 21
Provided by: donmo3
Transcript and Presenter's Notes


1
  • Michael A. Beer and Saeed Tavazoie
  • Cell 117, 185-198 (16 April 2004)

2
The Authors
Saeed Tavazoie (middle): Professor, Dept. of Molecular Biology
Mike Beer: Postdoctoral Researcher; Ph.D., Princeton (1995)
3
The Question
  • Transcription factor binding sites are
    relatively well-characterized in Saccharomyces
    cerevisiae
  • However, the presence of a TF binding site alone
    is not sufficient to predict expression of a gene
  • Multiple regulatory factors are often involved
  • How do you identify the elaborate rules for gene
    regulation?

4
Simple regulatory structures
Each possible combination of TFs must be tested
in the lab. This is a hugely time-consuming task.
5
Problems with predicting gene regulation
Regulatory motif sequences have low consensus,
e.g. the well-known TATA box has a consensus
of TATA(A/T)A(A/T)(A/G).
Numerous transcription factors can bind to any
one motif.
Many genes have multiple known motifs upstream of
the ATG.
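To make the low-consensus point concrete, here is a minimal sketch that turns the TATA consensus quoted above into a regular expression and scans a sequence for it. The helper names `consensus_to_regex` and `find_sites` and the example sequence are illustrative assumptions, not anything from the paper.

```python
import re

def consensus_to_regex(consensus: str) -> str:
    """Turn a consensus such as TATA(A/T)A(A/T)(A/G) into a regex string."""
    # Each "(X/Y)" group becomes the character class "[XY]".
    return re.sub(
        r"\(([ACGT](?:/[ACGT])*)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )

def find_sites(consensus: str, sequence: str) -> list[int]:
    """Return 0-based start positions of all matches, overlapping ones included."""
    # The lookahead lets the scan report overlapping sites.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]

TATA = "TATA(A/T)A(A/T)(A/G)"
example = "ggcTATAAAAGgcTATATATAcc"  # made-up sequence for illustration

print(consensus_to_regex(TATA))   # TATA[AT]A[AT][AG]
print(find_sites(TATA, example))  # [3, 13]
```

Three two-way positions mean this one consensus already covers 2 x 2 x 2 = 8 distinct sequences, which is part of why a match alone says little about regulation.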
6
Example of cis-regulatory logic
From Yuh et al (1998), Science 279, 1896-1902
7
The Approach
1. Using microarray expression data, the authors
built clusters of genes with similar expression
patterns.
From brain expression data in Wen et al (1998),
PNAS 95, 334-339
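As a rough illustration of grouping genes by similar expression, the sketch below assigns each gene to the first cluster whose seed profile it correlates with strongly. This greedy scheme, the 0.9 threshold, and the gene names and values are assumptions chosen for brevity; the slide does not say which clustering method the authors used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile has no defined correlation
    return cov / (sx * sy)

def cluster_by_correlation(profiles, threshold=0.9):
    """Greedy clustering: join the first cluster whose seed correlates >= threshold."""
    clusters = []  # list of (seed_profile, [gene names])
    for gene, profile in profiles.items():
        for seed, members in clusters:
            if pearson(seed, profile) >= threshold:
                members.append(gene)
                break
        else:
            clusters.append((profile, [gene]))  # start a new cluster
    return [members for _, members in clusters]

# Hypothetical expression values across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.2, 1.8, 1.1],
}
print(cluster_by_correlation(profiles))
```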
8
The Approach, cont.
2. From groups of genes with similar expression
patterns, a search is undertaken for consensus
sequence motifs within 800bp upstream of ATG in
each cluster.
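The upstream window can be sketched as a simple slice, followed by a count of how many genes in a cluster carry a candidate motif. This assumes forward-strand genes and an exact-match motif; `upstream_region`, `fraction_with_motif`, and all sequences are hypothetical, and real motif finders are considerably more involved.

```python
def upstream_region(chromosome: str, atg_start: int, length: int = 800) -> str:
    """Return up to `length` bases immediately 5' of a forward-strand ATG."""
    return chromosome[max(0, atg_start - length):atg_start]

def fraction_with_motif(upstreams, motif: str) -> float:
    """Fraction of upstream sequences containing `motif` at least once."""
    hits = sum(1 for seq in upstreams if motif in seq)
    return hits / len(upstreams)

# Toy chromosome; the ATG starts at index 11.
chromosome = "CCGATAAGGTTATGAAA"
atg = chromosome.index("ATG")
print(upstream_region(chromosome, atg, length=5))  # AGGTT
print(upstream_region(chromosome, atg))            # whole prefix, shorter than 800 bp

# Comparing a cluster against background hints at whether a motif is enriched.
cluster = ["AAGATAAG", "GATAAGCC", "TTGATAAG", "CCCCCCCC"]
background = ["AAGATAAG", "CCCCCCCC", "TTTTTTTT", "GGGGGGGG"]
print(fraction_with_motif(cluster, "GATAAG"), fraction_with_motif(background, "GATAAG"))
```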
9
The Approach, cont.
  • 3. The authors built a Bayesian network using the
    TF sequence motifs as parent nodes, and the
    expression data as data values.
  • This can be applied to a gene of interest by
    identifying the upstream TF motifs for that gene,
    and finding the model(s) that best fits the known
    upstream TF motifs.
  • If the expression data is within the parameters
    predicted by the model, then there is a decent
    chance that its associated gene regulatory
    structure can be verified experimentally.
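The idea of motifs as parent nodes and expression as the predicted value can be illustrated with a tiny conditional probability table: for each observed combination of motifs, estimate the probability that a gene belongs to a given expression cluster. This is a deliberately simplified sketch with made-up motif names (`M1`, `M2`) and counts; it is not the authors' learning procedure.

```python
from collections import Counter, defaultdict

def fit_cpt(training):
    """training: iterable of (motifs, in_cluster) pairs.

    Returns {frozenset(motifs): P(in_cluster | motifs)} from raw counts.
    """
    counts = defaultdict(Counter)
    for motifs, in_cluster in training:
        counts[frozenset(motifs)][bool(in_cluster)] += 1
    return {m: c[True] / (c[True] + c[False]) for m, c in counts.items()}

def predict(cpt, motifs, default=0.0):
    """Look up the estimated probability for a gene's upstream motif set."""
    return cpt.get(frozenset(motifs), default)

# Hypothetical training genes: (upstream motifs, member of the cluster?)
training = (
    [({"M1", "M2"}, True)] * 3 + [({"M1", "M2"}, False)]
    + [({"M1"}, True)] + [({"M1"}, False)] * 3
    + [(set(), False)] * 2
)
cpt = fit_cpt(training)
print(predict(cpt, {"M1", "M2"}))  # 0.75
print(predict(cpt, {"M1"}))        # 0.25
```

In this toy data the combination of both motifs is far more predictive than either motif set alone, which is the kind of combinatorial rule the approach is meant to surface.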

10
Two examples from yeast
Both clusters have at least 10 genes each, and
there is some confidence that genes with the same
upstream TFs will exhibit the same expression
pattern as these clusters.
11
Constructing the models
Using expression data from 30 microarrays, the
authors identified 5547 genes with significant
expression levels in yeast, and this data was
used to construct 49 models of expression
patterns.
12
Predictive accuracy
These 49 models were applied to five test sets of
expression data, using only the upstream 800 bp
region as input. They found that the expression
pattern was correctly predicted for 1898 genes
out of the test set(s) of 2587 genes. This
amounts to 73% accuracy (random would be 1/49, or
about 2%).
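The arithmetic behind those figures can be checked directly from the numbers on the slide:

```python
correct, total, n_patterns = 1898, 2587, 49

accuracy = correct / total   # fraction of test genes predicted correctly
chance = 1 / n_patterns      # uniform random guess among 49 patterns

print(f"accuracy: {accuracy:.1%}")  # accuracy: 73.4%
print(f"chance:   {chance:.1%}")    # chance:   2.0%
```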
13
Application to C. elegans
Given the larger amount of regulatory sequence
in higher-order organisms, and the potential for
more complex regulation, the authors had low
expectations for applying this model to C.
elegans. Using 2000 bp of upstream sequence,
and microarray expression data including Hill
(2000), the authors were surprised to learn that
they could predict expression patterns for
roughly half of the genes in the C. elegans
dataset.
14
An example from C. elegans
15
Is it really so simple?
Gene regulation involves a complex combinatorial
dance of numerous factors aside from the presence
or absence of TF binding sites. The authors have
deliberately limited their scope to cis-acting
upstream factors, ignoring regulatory elements
in introns or downstream regions, as well as the
effects of operons, alternative splicing, histone
modifications, methylation, et cetera.
16
Model constraints
  • Several bits of information were found to be
    significant factors in improving the predictive
    accuracy of the models:
  • Motif orientation ( <--- or ---> )
  • Distance from the start codon
  • The particular order of various TFs
  • The presence of multiple copies of the same TF
  • All of those factors were included in the model
    as priors.
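The four constraints listed above can be read as features computed from a gene's motif hits. The sketch below shows one possible encoding; the `Hit` record, the `describe` helper, the motif names, and the positions are all hypothetical.

```python
from collections import Counter
from typing import NamedTuple

class Hit(NamedTuple):
    motif: str     # hypothetical motif name
    distance: int  # bp upstream of the start codon
    strand: str    # "+" or "-" orientation

def describe(hits):
    """Summarise orientation, distance, order, and copy number of motif hits."""
    # Largest distance first, i.e. reading 5' to 3' towards the ATG.
    ordered = sorted(hits, key=lambda h: h.distance, reverse=True)
    return {
        "orientations": {m: sorted({h.strand for h in hits if h.motif == m})
                         for m in {h.motif for h in hits}},
        "nearest_distance": {m: min(h.distance for h in hits if h.motif == m)
                             for m in {h.motif for h in hits}},
        "order": [h.motif for h in ordered],
        "copies": dict(Counter(h.motif for h in hits)),
    }

hits = [Hit("M1", 120, "+"), Hit("M2", 60, "-"), Hit("M1", 300, "+")]
features = describe(hits)
print(features["order"])   # ['M1', 'M1', 'M2']
print(features["copies"])
```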

17
Why is distance from the start codon significant?
From Harbison et al (2004), Nature 431, 99-104
18
The number of copies of a TF binding site is
relevant.
From Molecular Biology of the Cell, 4th edition
19
Motif combinatorics and predictive accuracy
Combinatoric models are more accurate than
single-TF models (unless a gene is under the
control of only one TF).
The order of various TFs is significant.
20
Future directions
Because of the sensitivity of the model(s), even
a very small amount of ambiguity can yield junk
results. For this reason, SAGE data is not
particularly suitable, as only unique SAGE tags
can be said to be unambiguous; this in turn
excludes all sorts of potentially useful
data. However, we could use the microarray-based
predictions to pick gene regulatory structures to
investigate.