Ab initio motif finding - PowerPoint PPT Presentation


Slides: 43
Provided by: aiSta

Transcript and Presenter's Notes

Title: Ab initio motif finding


2
Ab initio motif finding
  • Ryo Shimizu

3
Agenda
  • Background / motivation
  • Paper 1
  • Paper 2
  • Conclusion

4
Central Dogma
DNA (A, C, G, T) → Transcription → mRNA (A, C, G, U) → Translation → Amino acids → Folding → Protein
5
Impacts of gene regulation
  • Functioning of an organism
  • Development of an organism
  • Evolution of organisms

6
Transcription
  • Process in which mRNA is made using DNA as a
    template
  • Only genes are transcribed
  • Regulated by transcription factors

7
Transcription movie
8
Binding Site
  • Region on a protein, DNA, or RNA to which ligands
    attach

9
Motif
  • Common sequence pattern in the binding sites of
    a transcription factor
  • A succinct way of capturing variability among the
    binding sites

10
Motif representation
  • Consensus Sequence

XTCATCAX
  • Position Specific Scoring Matrix
  • A graph

PSSM graph
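The two representations on this slide can be derived from a set of aligned binding sites. The sketch below is a minimal illustration, not taken from the presentation: the four example sites, the pseudocount, the uniform background, and the 0.5 consensus threshold are all assumptions chosen so that the consensus comes out as the slide's XTCATCAX.

```python
from collections import Counter
import math

ALPHABET = "ACGT"

def position_frequency_matrix(sites, pseudocount=1.0):
    """Return one {base: probability} dict per motif column."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    matrix = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        matrix.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return matrix

def log_odds_matrix(freqs, background=0.25):
    """PSSM entries: log2 of column probability over a uniform background."""
    return [{b: math.log2(col[b] / background) for b in ALPHABET} for col in freqs]

def consensus(freqs, threshold=0.5):
    """Most frequent base per column, or 'X' when no base reaches threshold."""
    out = []
    for col in freqs:
        base = max(ALPHABET, key=lambda b: col[b])
        out.append(base if col[base] >= threshold else "X")
    return "".join(out)

# Hypothetical aligned binding sites (illustration only).
sites = ["ATCATCAG", "CTCATCAT", "GTCATCAA", "TTCATCAC"]
freqs = position_frequency_matrix(sites)
pssm = log_odds_matrix(freqs)
print(consensus(freqs))  # XTCATCAX
```

The consensus string discards how strongly each column is conserved; the PSSM keeps that information as a per-position score.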
11
Ab initio Motif finding
  • Say a transcription factor (TF) controls five
    different genes
  • Each of the five genes will have binding sites
    for the TF in their promoter region

12
Ab initio Motif finding
  • GIVEN: promoter regions of the five genes G1, ..., G5
  • FIND: binding sites of the TF, without prior
    knowledge

13
Agenda
  • Background / motivation
  • Paper 1
  • Paper 2
  • Conclusion

14
Paper 1
  • Ab Initio Prediction of Transcription Factor
    Targets Using Structural Knowledge
  • Tommy Kaplan, Nir Friedman, Hanah Margalit

15
Overview
Known binding sites of other proteins in the same protein family
Assumption: same family implies the same binding specificity of residues
Prediction: identify binding sites of new proteins (of that family)
16
Application: Cys2His2 Zinc Finger
Largest known DNA-binding family in multicellular
organisms
Extensively studied
Has a stringent binding model
17
Curation of Zinc Finger sequences and their
binding sites
Train: 31 experimentally determined canonical domains → Profile HMM
Input: Cys2His2 proteins in the TRANSFAC database
Output: fingers classified as Canonical / Non-canonical
Curated set: 61 canonical fingers, 455 protein-binding site pairs
18
Identification of DNA-binding residues
Canonical binding model from the solved protein-DNA
complex of Egr-1
PROSITE motif pattern: CX(2-4)CX(11-13)HX(3-5)H
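A pattern of this shape translates directly into a regular expression, which is one simple way to locate candidate fingers in a protein sequence. The sketch below assumes the pattern reads C-x(2,4)-C-x(11,13)-H-x(3,5)-H; the function name `find_zinc_fingers` and the toy sequence are hypothetical and not from the presentation.

```python
import re

# Assumed reading of the slide's pattern: C-x(2,4)-C-x(11,13)-H-x(3,5)-H,
# where x stands for any amino acid.
ZF_PATTERN = re.compile(r"C.{2,4}C.{11,13}H.{3,5}H")

def find_zinc_fingers(protein):
    """Return (start, end, matched_text) for each non-overlapping match."""
    return [(m.start(), m.end(), m.group()) for m in ZF_PATTERN.finditer(protein)]

# Synthetic toy sequence built to contain exactly one match.
toy = "MK" + "C" + "PE" + "C" + "GKSFSRSDELTR" + "H" + "IRI" + "H" + "TG"
hits = find_zinc_fingers(toy)
print(hits)  # [(2, 23, 'CPECGKSFSRSDELTRHIRIH')]
```

A regular expression only checks the spacing of the cysteines and histidines. Deciding whether a matched finger is canonical needs a richer model, such as the profile HMM mentioned on the previous slide.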
19
Estimating DNA Recognition Preferences
  • INPUT: a set of pairs of transcription factors and
    their target DNA sequences.

TF
Target DNA sequence
20
Probabilistic model of binding preferences
Set of interacting residues in the 4 positions, p,
of the k fingers
E.g. A_{1,2} = interacting residue of finger 1 at
position 2
Conditional probability of interaction with a DNA
subsequence starting from the jth position in the DNA
P_p(N | A) = conditional probability of nucleotide N
given amino acid A at position p
N_1, ..., N_L = target DNA sequence
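To make the notation concrete, the sketch below scores one DNA window as a product of P_p(N | A) terms, computed in log space. It is an illustration under stated assumptions rather than the paper's exact formula: the function name `window_log_likelihood`, the caller-supplied `contacts` list (which residue position reads which nucleotide offset), and all probability values are hypothetical.

```python
import math

def window_log_likelihood(dna, start, residues, prefs, contacts):
    """Log-probability of the DNA window that begins at `start`.

    residues[(finger, position)] -> amino acid (one-letter code)
    prefs[position][amino_acid][nucleotide] -> P_p(N | A)
    contacts -> list of (finger, position, offset), where offset is the
                nucleotide index relative to `start` (an assumed mapping)
    """
    total = 0.0
    for finger, position, offset in contacts:
        amino_acid = residues[(finger, position)]
        nucleotide = dna[start + offset]
        total += math.log(prefs[position][amino_acid][nucleotide])
    return total

def favour(base):
    """Toy preference table: 0.7 for one base, 0.1 for each of the others."""
    return {b: (0.7 if b == base else 0.1) for b in "ACGT"}

# Toy single-finger example; every value here is made up for illustration.
residues = {(1, 1): "R", (1, 3): "E", (1, 4): "R"}
prefs = {1: {"R": favour("G")}, 3: {"E": favour("C")}, 4: {"R": favour("G")}}
contacts = [(1, 4, 0), (1, 3, 1), (1, 1, 2)]

dna = "TTGCGAA"
scores = [window_log_likelihood(dna, j, residues, prefs, contacts)
          for j in range(len(dna) - 2)]
best_start = max(range(len(scores)), key=scores.__getitem__)
print(best_start)  # 2
```

Passing the contact map in as data keeps the sketch neutral about which residue positions are included in the product.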
21
Where did the P2 term go?
22
Estimating DNA Recognition Preferences
  • Apply Expectation Maximization

Identify binding locations
Optimize recognition preferences
23
Expectation Maximization algorithm
Initial guess of DNA recognition preferences
E step: compute the expected posterior probability of binding
locations for all protein-DNA pairs.
M step: maximize the likelihood of the current binding
positions for all protein-DNA pairs, based on the
distribution of possible binding locations.
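The E/M alternation can be sketched with a simplified stand-in model. The code below is not the paper's residue-conditioned model: it uses a plain position weight matrix with exactly one binding site per sequence, a uniform background, and a uniform prior over locations, all of which are assumptions made to keep the example short. The function names and toy sequences are hypothetical.

```python
ALPHABET = "ACGT"

def e_step(seqs, pwm, width):
    """Posterior over binding start positions in each sequence."""
    posteriors = []
    for seq in seqs:
        scores = []
        for j in range(len(seq) - width + 1):
            ratio = 1.0
            for i in range(width):
                ratio *= pwm[i][seq[j + i]] / 0.25  # vs. uniform background
            scores.append(ratio)
        z = sum(scores)
        posteriors.append([s / z for s in scores])
    return posteriors

def m_step(seqs, posteriors, width, pseudocount=0.1):
    """Re-estimate PWM columns from posterior-weighted nucleotide counts."""
    counts = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
    for seq, post in zip(seqs, posteriors):
        for j, weight in enumerate(post):
            for i in range(width):
                counts[i][seq[j + i]] += weight
    return [{b: c[b] / sum(c.values()) for b in ALPHABET} for c in counts]

def run_em(seqs, pwm, width, iterations=20):
    """Alternate E and M steps; return the final PWM and posteriors."""
    for _ in range(iterations):
        posteriors = e_step(seqs, pwm, width)
        pwm = m_step(seqs, posteriors, width)
    return pwm, e_step(seqs, pwm, width)

# Toy data: "TGCA" planted at offsets 2, 3 and 0 (illustration only).
seqs = ["AATGCAAA", "CCCTGCAC", "TGCAGGGG"]
initial = [{b: (0.7 if b == c else 0.1) for b in ALPHABET} for c in "TGCA"]
pwm, posteriors = run_em(seqs, initial, width=4)
best = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
print(best)  # [2, 3, 0]
```

EM only finds a local optimum, so the result depends on the initial guess. Here the starting matrix already favours the planted motif, which is why the toy run recovers it.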
24
Probabilistic model output
Total height = information content; relative height =
probability; color intensity = confidence
(Phenylalanine, Cytosine): prevalent in position 2
(Lysine, Guanine): irrespective of position
25
Ab initio genome wide prediction
16,201 putative gene products
29 canonical fingers
DNA recognition preference
Binding site model
Scan the promoter regions
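The last step of this pipeline, scanning promoter regions with a binding site model, can be sketched as a sliding-window log-odds score. This is a minimal illustration that assumes a uniform background, a single strand, and a hand-made three-column model; the function name `scan`, the threshold, and the toy promoter are hypothetical.

```python
import math

def scan(sequence, pwm, background=0.25, threshold=0.0):
    """Return (position, log2-odds score) for windows at or above threshold."""
    width = len(pwm)
    hits = []
    for j in range(len(sequence) - width + 1):
        score = sum(math.log2(pwm[i][sequence[j + i]] / background)
                    for i in range(width))
        if score >= threshold:
            hits.append((j, score))
    return hits

# Toy binding site model favouring "GCG" (values are made up).
pwm = [{b: (0.7 if b == c else 0.1) for b in "ACGT"} for c in "GCG"]

promoter = "TTGCGAA"
hits = scan(promoter, pwm, threshold=3.0)
print([position for position, _ in hits])  # [2]
```

A genome-wide scan would repeat this per promoter and per transcription factor, and would normally also score the reverse complement strand, which this sketch omits.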
26
Results: D. melanogaster
Transcription Factors
Match prior biological knowledge!
GO Terms
Blue cells: significant enrichment of GO terms
27
Results: D. melanogaster
Transcription Factors
Embryogenesis phase
At least one significant embryogenesis experiment
28
Agenda
  • Background / motivation
  • Paper 1
  • Paper 2
  • Conclusion

29
Paper 2
  • MotifCut: regulatory motif finding with maximum
    density subgraphs
  • Eugene Fratkin, Brian Naughton, Douglas Brutlag,
    Serafim Batzoglou

30
Drawbacks of existing methods
  • As an optimization problem
  • Intractable
  • Relies on EM or local heuristic search

31
Drawbacks of existing methods
WRONG!
Independence assumption is biologically unrealistic
32
Overview
  • Nodes: k-mers of the input sequence
  • Edges: pairwise k-mer similarity
  • Motif search → maximum density subgraph

33
MotifCut Algorithm
  • Convert sequence into a collection of k-mers
  • Each overlap/duplicate considered distinct

34
MotifCut Algorithm
  • For every pair of vertices (v_i, v_j) create an
    edge with weight w_ij
  • w_ij = f(number of mismatches between the k-mers in v_i, v_j)

M = k-mers of the binding site; B = background k-mers
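The graph construction on the last two slides can be sketched as follows. Each k-mer occurrence becomes its own vertex and every pair of vertices gets a weighted edge. The weight function used here, one minus the fraction of mismatching positions, is only a placeholder assumption for f; the slides do not specify it. All function names are hypothetical.

```python
from itertools import combinations

def kmers(sequences, k):
    """Each k-mer occurrence is a distinct vertex: (sequence index, offset, text)."""
    return [(s, i, seq[i:i + k])
            for s, seq in enumerate(sequences)
            for i in range(len(seq) - k + 1)]

def hamming(a, b):
    """Number of mismatching positions between two equal-length k-mers."""
    return sum(x != y for x, y in zip(a, b))

def similarity(mismatches, k):
    """Placeholder for f: 1.0 for identical k-mers, 0.0 for no shared position."""
    return 1.0 - mismatches / k

def build_graph(sequences, k, weight=similarity):
    """Return the vertex list and a {(u, v): weight} dict over all vertex pairs."""
    vertices = kmers(sequences, k)
    edges = {}
    for u, v in combinations(range(len(vertices)), 2):
        edges[(u, v)] = weight(hamming(vertices[u][2], vertices[v][2]), k)
    return vertices, edges

vertices, edges = build_graph(["ACGT", "ACGA"], k=3)
print(len(vertices), len(edges))  # 4 6
```

Because every pair is connected, the number of edges grows quadratically with the number of k-mers, so edge pruning becomes necessary on realistic inputs.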
35
Resulting graph
Note: should be maximally connected!
36
MotifCut Algorithm
  • Find the maximum density subgraph
  • Parametric flow algorithm (Gallo et al, 1989)
  • A type of fractional programming
  • Iteratively apply push/relabel to find max-flow
    and min-cut
  • O(VE log(V^2/E)): too slow!
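For intuition about what a maximum density subgraph is, the sketch below uses greedy peeling, a well-known approximation that repeatedly removes the vertex of lowest weighted degree. This is a swapped-in technique and not the exact parametric flow algorithm named on the slide. Density is taken as total edge weight divided by vertex count, and the function name and toy graph are hypothetical.

```python
def greedy_densest_subgraph(num_vertices, edges):
    """Peel the lowest weighted-degree vertex each round; return the vertex
    set with the best (total edge weight / vertex count) seen, and that value."""
    alive = set(range(num_vertices))
    degree = [0.0] * num_vertices
    adjacency = [dict() for _ in range(num_vertices)]
    total = 0.0
    for (u, v), w in edges.items():
        adjacency[u][v] = w
        adjacency[v][u] = w
        degree[u] += w
        degree[v] += w
        total += w
    best_density, best_set = total / num_vertices, set(alive)
    while len(alive) > 1:
        u = min(alive, key=lambda x: degree[x])
        alive.remove(u)
        for v, w in adjacency[u].items():
            if v in alive:
                degree[v] -= w
                total -= w
        density = total / len(alive)
        if density > best_density:
            best_density, best_set = density, set(alive)
    return best_set, best_density

# Toy graph: a 4-clique on vertices 0..3 plus a tail 3-4-5, all weights 1.
toy_edges = {(0, 1): 1.0, (0, 2): 1.0, (0, 3): 1.0, (1, 2): 1.0,
             (1, 3): 1.0, (2, 3): 1.0, (3, 4): 1.0, (4, 5): 1.0}
dense_set, dense_value = greedy_densest_subgraph(6, toy_edges)
print(sorted(dense_set), dense_value)  # [0, 1, 2, 3] 1.5
```

Peeling is fast but only approximate in general; an exact answer requires a flow-based method like the one the slide cites.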

37
MDS optimization
Pick a center of neighborhood
Discard edges with weight < w
Re-introduce all edges in neighborhood
Run MDS in neighborhood
Repeat for every vertex
38
Results
  • Synthetic Data
  • vs MEME (Bailey et al, 1995)
  • vs AlignAce (Hughes et al, 2000)
  • vs BioProspector (Liu et al, 2001)
  • Yeast Data

39
Synthetic benchmark results
40
Results: running time and yeast data
41
Agenda
  • Background / motivation
  • Paper 1
  • Paper 2
  • Conclusion

42
Conclusion
  • Ab initio motif finding
  • Use of structural knowledge
  • Graph representation of motifs