Identification of Transcription Factor Binding Sites

Provided by: ofer5

1
Identification of Transcription Factor Binding
Sites
  • Presenting
  • Mira Tali

2
Goal
Regulatory regions
Motif Binding site???
AGCCA
3
Why Bother?
UNDERSTAND
Gene expression regulation
Co-regulation
4
Difficulties
  • Multiple factors for a single gene
  • Variability in binding sites
  • The nature of variability is NOT well understood
  • Usually Transitions
  • Insertions and deletions are uncommon
  • Location, location, location

5
Experimental methods
  • EMSA: electrophoretic mobility shift assay
  • Nuclease protection assay

NOT ENOUGH!!!!!
6
So, what can we do?
  • Find conserved sequences in regulatory regions
  • 1. Define what you want to find
  • 2. Define what is a good result
  • 3. Decide how to find it

7
Principal Methods
  • Global optimum
  • Enumerative methods
  • Going over ALL possibilities
  • Taking the best one

Disadvantage: limited to small search spaces
Advantage: certainty
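
A minimal sketch of the enumerative idea on this slide. The slide does not name a scoring function here, so the score used below (the number of sequences containing a word) is an assumption, and the function name and toy promoters are made up for illustration.

```python
from itertools import product

def best_word(sequences, length):
    """Score every possible DNA word of the given length and return the best one."""
    best, best_score = None, -1
    # Go over ALL 4**length words: this is why the search space must stay small.
    for letters in product("ACGT", repeat=length):
        word = "".join(letters)
        # Assumed score: number of sequences containing the word at least once.
        score = sum(word in seq for seq in sequences)
        if score > best_score:
            best, best_score = word, score
    return best, best_score

# Hypothetical toy promoters, each carrying the AGCCA site from slide 2.
promoters = ["TTAGCCAGG", "CAGCCATTT", "GGGAGCCAC"]
print(best_word(promoters, 5))  # ('AGCCA', 3)
```

Because every word is scored, the answer is guaranteed to be the global optimum for the chosen score, at a cost that grows as 4 to the power of the word length.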
8
Principal Methods
  • Local optimum
  • Gibbs sampling, AlignACE
  • Start somewhere (arbitrary)
  • Each next step is chosen with probability
    proportional to what we gain from it
  • We can get anywhere with some probability

Disadvantage: you can never know whether the result is the global optimum
Advantage: basically good results, faster
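
The local-optimum strategy can be sketched as a toy Gibbs site sampler. This is an illustration under simplifying assumptions (exactly one site per sequence, no background model, a pseudocount of 1), not the AlignACE implementation. The function name and parameters are hypothetical.

```python
import random

def gibbs_motif_search(seqs, width, iterations=200, seed=0):
    """Toy Gibbs site sampler: one motif occurrence per sequence."""
    rng = random.Random(seed)
    # Start somewhere (arbitrary): a random position in each sequence.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Column counts built from the sites in all OTHER sequences.
            counts = [{b: 1.0 for b in "ACGT"} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(width):
                        counts[k][other[pos[j] + k]] += 1
            # Weight each candidate start by how well it fits that profile.
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for k in range(width):
                    col = counts[k]
                    w *= col[seq[start + k]] / sum(col.values())
                weights.append(w)
            # Sample the new start in proportion to its weight, so every
            # position remains reachable with some probability.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos

# Hypothetical toy sequences.
seqs = ["TTAGCCAGGT", "CAGCCATTTG", "GGGAGCCACT", "ACTAGCCATG"]
starts = gibbs_motif_search(seqs, width=5)
print([s[p:p + 5] for s, p in zip(seqs, starts)])
```

The result depends on the random seed and the number of iterations, which is the "you can never know" caveat in practice: rerunning from several starting points is the usual safeguard.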
9
Articles Overview
  • Identifying motifs
  • Expression patterns
  • Phylogenetic footprinting
  • Identifying networks
  • Common motifs in expression clusters
  • Combinatorial analysis

10
Discovery of novel transcription factor binding
sites by statistical overrepresentation
S. Sinha, M. Tompa
Enumeration: YMF algorithm
11
What constitutes a motif? (tailored for
S. cerevisiae)
  • In S. cerevisiae the motif is typically 6-10
    conserved bases
  • Spacers varying in length (1-11 bp)
  • Spacers are usually located in the middle

ACCNNNNNNGTT
Taken from SCPD, the S. cerevisiae promoter database
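
A small sketch of how a spaced motif such as ACCNNNNNNGTT can be matched against a sequence, assuming N stands for any of the four bases. The helper name and the test sequence are hypothetical.

```python
import re

def find_sites(motif, sequence):
    """Return (start, site) for every occurrence of motif; N matches any base."""
    body = "".join("[ACGT]" if c == "N" else c for c in motif)
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + body + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]

# Hypothetical sequence: ACC, six spacer bases, then GTT.
print(find_sites("ACCNNNNNNGTT", "GGACCTTGACAGTTAA"))  # [(2, 'ACCTTGACAGTT')]
```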
12
How do we measure motifs?
  • Z-score: motif over-representation
  • Pmax(X): probability of Z-score > X
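
As a rough sketch, a z-score compares the observed count of a motif with its expected count, in units of standard deviation. The code below estimates the expectation and deviation from counts in a hypothetical set of background sequences. That estimate is an assumption made for brevity and is not how YMF derives them.

```python
from statistics import mean, pstdev

def count_occurrences(word, seq):
    """Count occurrences of word in seq, overlapping ones included."""
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))

def z_score(observed, background_counts):
    """(observed - expected) / standard deviation, both taken from background counts."""
    mu = mean(background_counts)
    sigma = pstdev(background_counts)  # must be non-zero for the ratio to be defined
    return (observed - mu) / sigma

# Hypothetical background counts for one word, and one observed count.
print(z_score(10, [2, 4, 4, 4, 5, 5, 7, 9]))  # 2.5
```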

13
YMF algorithm: Yeast Motif Finder
Transition Matrix
A set of promoter regions
  • Motif length - l
  • modest values

Maximum number of spacers allowed - w
14
YMF algorithm
FindExplanators: artificial overrepresentation
Co-expression score
W-score
TCACGCT (motif)
CACGCTA (artifact)
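
The motif/artifact pair above differs only by a one-base shift, which is why the artifact also looks over-represented. A minimal check for that relationship, assuming equal-length motifs, might look like this. It is not the FindExplanators procedure itself, only an illustration of the overlap it has to account for.

```python
def is_shifted_copy(motif, other, max_shift=2):
    """True if `other` is `motif` slid by 1..max_shift bases in either direction."""
    for s in range(1, max_shift + 1):
        slid_right = motif[s:] == other[:len(other) - s]
        slid_left = other[s:] == motif[:len(motif) - s]
        if slid_right or slid_left:
            return True
    return False

print(is_shifted_copy("TCACGCT", "CACGCTA"))  # True
```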
15
Experiments
  • Validate YMF results
  • Run YMF on regulons with known binding sites
    (SCPD)
  • Run YMF on MIPS catalogs
  • (MIPS: Munich Information Center for Protein
    Sequences)
  • Functional
  • Mutant phenotype

16
Validation
17
New binding sites or false positives?
18
A novel site candidate
19
Further research
  • Validation of novel binding sites and
    transcription factors
  • Modification of the algorithm to be applicable
    for other organisms

20
Systematic determination of genetic network
architecture
Saeed Tavazoie, Jason D. Hughes, Michael J. Campbell,
Raymond J. Cho, George M. Church
AlignACE: Aligns Nucleic Acid Conserved Elements
21
Clusters
  • Cluster a group of genes with a similar
    expression pattern
  • Cluster members
  • Tend to participate in common processes
  • Tend to be co-regulated

22
Clusters
23
Identifying motifs
  • Using AlignACE, 18 motifs were found in 12
    clusters.
  • 7 of the motifs found had been identified
    experimentally

And what about the others????
24
Scanning for more binding sites
  • Once a significant motif was found, the whole
    genome was scanned for it
  • Most motifs were cluster specific

25
Why so few motifs?
  • Too stringent rules for defining a significant
    motif
  • Post-transcriptional regulation (mRNA stability)
  • Some clusters represent noise

26
Tightness
  • Tightness of a cluster:
  • how close the members of a particular cluster
    are to its mean
  • There is a strong correlation between the
    presence of significant motifs and the
    tightness of a cluster
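
One way to put a number on tightness is the mean Euclidean distance of the members' expression profiles from the cluster mean. The slide does not give the exact formula, so this definition is an assumption. Smaller values mean a tighter cluster.

```python
from math import sqrt

def tightness(profiles):
    """Mean Euclidean distance of cluster members from the cluster mean profile."""
    n = len(profiles)
    # Mean profile: average over members at each time point or condition.
    mean_profile = [sum(col) / n for col in zip(*profiles)]
    distances = [
        sqrt(sum((x - m) ** 2 for x, m in zip(profile, mean_profile)))
        for profile in profiles
    ]
    return sum(distances) / n

# Hypothetical two-gene cluster measured at two time points.
print(tightness([[0.0, 0.0], [2.0, 0.0]]))  # 1.0
```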

27
Things to remember
  • Discovering regulons and motifs using
    expression-based clustering
  • Minimal biases
  • Validation as a methodology for new organisms
  • Identifying expected cis-regulatory motif EACH
    TIME!!

28
Identifying regulatory networks by combinatorial
analysis of promoter elements
Yitzhak Pilpel, Priya Sudarsanam, George M. Church
Goals
Identify motif combinations affecting expression
patterns in yeast
Understand the transcriptional network
29
Basic definitions
  • Expression coherence (EC) score
  • Synergistic motifs:
  • EC(a∩b) > EC(a\b), EC(b\a)
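
The slide leaves the EC definition out. The sketch below assumes EC is the fraction of gene pairs whose expression profiles lie within a distance threshold, and it treats a motif pair as synergistic when the genes carrying both motifs are more coherent than the genes carrying only one. The threshold and the profiles are hypothetical.

```python
from itertools import combinations
from math import dist

def expression_coherence(profiles, threshold):
    """Fraction of gene pairs whose expression profiles are within `threshold`."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    close = sum(dist(p, q) <= threshold for p, q in pairs)
    return close / len(pairs)

def is_synergistic(ec_both, ec_a_only, ec_b_only):
    """Both motifs together must beat each motif in the absence of the other."""
    return ec_both > max(ec_a_only, ec_b_only)

# Hypothetical profiles: only the first two genes are close to each other.
ec = expression_coherence([(0, 0), (0, 1), (5, 5)], threshold=1.5)
print(ec)
print(is_synergistic(0.4, 0.1, 0.2))  # True
```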

30
Methods
A database of motifs
Gene sets
Calculating EC score
Significant synergistic combinations
Visualizing the transcriptional network
Understanding the effect of individual and
combination of motifs
31
GMC
  • GMC: Gene Motif Combination.
  • Motif numbers:
  • (m1, m2, m3, m4, m5) = (1,0,1,1,0)
  • Synergistic motif combination:
  • EC(n motifs) > max(EC(n-1 motifs))
  • GMC: what is it good for?
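
A GMC can be represented as a 0/1 tuple over a fixed motif order, and genes sharing the same tuple can then be grouped together. The gene and motif names below are hypothetical.

```python
from collections import defaultdict

def group_by_gmc(gene_motifs, motif_order):
    """Map each presence/absence tuple (GMC) to the genes that carry it."""
    groups = defaultdict(list)
    for gene, motifs in gene_motifs.items():
        # 1 if the gene's promoter contains the motif, 0 otherwise.
        gmc = tuple(int(m in motifs) for m in motif_order)
        groups[gmc].append(gene)
    return dict(groups)

motif_order = ["m1", "m2", "m3", "m4", "m5"]
gene_motifs = {
    "geneA": {"m1", "m3", "m4"},
    "geneB": {"m1", "m3", "m4"},
    "geneC": {"m2"},
}
print(group_by_gmc(gene_motifs, motif_order))
```

Here geneA and geneB fall under the GMC (1, 0, 1, 1, 0) shown on the slide, so their expression profiles would be compared as one group.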

32
Combinograms
  • Clustering
  • GMCs

33
Combinograms: what are they good for?
  • They help visualize the connection between a
    single motif and a specific expression pattern
  • They also show which motif is more critical in
    determining the expression pattern.

34
Motif synergy map: visualizing transcription
networks
35
Conclusion
  • The importance of the combinogram
  • The importance of the motif synergy map

36
Phylogenetic footprinting of transcription
factor binding sites in proteobacterial genomes
Lee Ann McCue, William Thompson, C. Steven Carmack,
Michael P. Ryan, Jun S. Liu, Victoria Derbyshire
and Charles E. Lawrence
Goals
Identifying novel TF binding sites in E. coli
Describing the transcription regulatory network
Local optimum: Gibbs sampling algorithm
37
Methods
One E.coli gene and orthologs
38
Applying the method on a small scale: validation
  • Choosing 190 E.coli genes.
  • Creating 184 data sets.
  • Running Gibbs sampling algorithm.
  • More than 67% success in predicting the most
    probable motif.

39
Motif Model
40
Identification of the YijC binding sites
  • A strongly predicted site was upstream of the
    fabA, fabB and yqfA genes.
  • Chromatography: identifying the factor.

41
Identifying the YijC binding sites and predicting
gene function
  • Mass spectrometry identification: YijC
  • Predicting a function for yqfA.

weight
fabA
fabB
yqfA
fadB
42
Applying the method genome wide
  • Choosing 2113 E.coli ORFs.
  • For 2097 a TF-binding site was predicted.

43
MAP scores: ortholog distribution
Study set
Full set
44
Adding binding sites for known TFs
  • Building a TF binding site model for known TFs.
  • Scanning E.coli upstream regions.
  • 187 new probable sites.
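
A sketch of the model-then-scan step, assuming the binding site model is a log-odds position weight matrix built from known sites with a pseudocount and a uniform background. The slide does not specify the model, and the sites, sequence and threshold below are hypothetical.

```python
from math import log2

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length known sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for k in range(len(sites[0])):
        column = [site[k] for site in sites]
        pwm.append({
            base: log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm

def scan(pwm, sequence, min_score):
    """Return (start, site, score) for every window scoring at least min_score."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[k][base] for k, base in enumerate(window))
        if score >= min_score:
            hits.append((start, window, score))
    return hits

known_sites = ["TGTGA", "TGTGA", "TGTGA", "TGAGA"]  # hypothetical known sites
upstream = "CCTGTGACC"                              # hypothetical upstream region
for start, site, score in scan(build_pwm(known_sites), upstream, min_score=4.0):
    print(start, site, round(score, 2))
```

The score threshold controls the trade-off between missed sites and false positives, which is the specificity problem raised on the next slide.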

45
Building a regulatory network
  • Required steps
  • Identifying motif models
  • Clustering the models
  • Problem
  • Specificity

46
Conclusion
  • What have we gained so far?
  • A better prediction of gene function.
  • New possibilities for identifying TF binding
    sites and the TFs that bind them!!!