Gibbs Sampling in Motif Finding Slides by Serafim Batzoglou

1
Gibbs Sampling in Motif Finding
Slides by Serafim Batzoglou
2
Gibbs Sampling
  • Given:
    • sequences x1, …, xN,
    • motif length K,
    • background B
  • Find:
    • model M
    • locations a1, …, aN in x1, …, xN
  • Maximizing the log-odds likelihood ratio
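
The slide's formula for the objective did not survive in this transcript. Assuming the usual form, the sum over sequences i and motif positions k of log M(k, x^i_{a_i+k-1}) / B(x^i_{a_i+k-1}), the score could be computed roughly as follows. The function name, the dictionary layout of M and B, and the toy data are illustrative assumptions, and positions are 0-based in the code.

```python
import math

def log_odds_score(seqs, locs, M, B):
    """Sum of log(M[k][c] / B[c]) over all sequences and motif positions.

    M is a list of K dicts (letter -> probability), B a dict of background
    probabilities; both layouts are assumptions made for this sketch.
    """
    K = len(M)
    total = 0.0
    for x, a in zip(seqs, locs):
        for k in range(K):
            c = x[a + k]  # letter at motif position k (0-based)
            total += math.log(M[k][c] / B[c])
    return total

# Toy inputs, made up for illustration only.
B = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
M = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
     {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1}]
seqs = ["GGACT", "ACGGG"]
score = log_odds_score(seqs, [2, 0], M, B)  # both locations cover "AC"
```

Locations that cover words resembling the model score higher than locations that look like background, which is what the search tries to exploit.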

3
Gibbs Sampling
  • AlignACE: first statistical motif finder
  • BioProspector: improved version of AlignACE
  • Algorithm (sketch):
    • Initialization:
      • Select random locations in sequences x1, …, xN
      • Compute an initial model M from these locations
    • Sampling iterations:
      • Remove one sequence xi
      • Recalculate the model
      • Pick a new location of the motif in xi according to the
        probability that the location is a motif occurrence

4
Gibbs Sampling
  • Initialization:
    • Select random locations a1, …, aN in x1, …, xN
    • For these locations, compute M:
      $M_{kj} = \frac{1}{N} \sum_{i=1}^{N} [x^i_{a_i+k-1} = j]$
    • That is, $M_{kj}$ is the number of occurrences of
      letter j in motif position k, over the total N
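
A minimal sketch of this initialization, assuming a DNA alphabet and a list-of-dicts layout for M (both are choices made here, not taken from the slides). The helper names are hypothetical.

```python
import random

ALPHABET = "ACGT"  # assumed DNA alphabet

def random_locations(seqs, K, rng):
    """Pick one random start position per sequence (0-based)."""
    return [rng.randrange(len(x) - K + 1) for x in seqs]

def count_model(seqs, locs, K):
    """M[k][j] = fraction of the N chosen words with letter j at position k."""
    N = len(seqs)
    M = [{c: 0.0 for c in ALPHABET} for _ in range(K)]
    for x, a in zip(seqs, locs):
        for k in range(K):
            M[k][x[a + k]] += 1.0 / N
    return M

rng = random.Random(0)
seqs = ["ACGTACGT", "TTACGTTT", "GGGACGTA"]  # toy data
locs = random_locations(seqs, 4, rng)
M = count_model(seqs, locs, 4)
```

Each row of M sums to 1 because every sequence contributes exactly one letter per motif position.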

5
Gibbs Sampling
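  • Predictive update:
    • Select a sequence x = xi
    • Remove xi, recompute the model:

      $M_{kj} = \frac{1}{(N-1) + B} \left( \beta_j + \sum_{s \neq i} [x^s_{a_s+k-1} = j] \right)$

    • where $\beta_j$ are pseudocounts to avoid 0s,
    • and $B = \sum_j \beta_j$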
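
The leave-one-out update with pseudocounts might look like the sketch below. The pseudocount value 0.5, the function name, and the toy sequences are assumptions for illustration. The sum of pseudocounts is called `B_total` in the code to keep it apart from the background model.

```python
ALPHABET = "ACGT"  # assumed DNA alphabet

def leave_one_out_model(seqs, locs, K, i, beta):
    """Recompute M from all sequences except seqs[i], adding pseudocounts."""
    N = len(seqs)
    B_total = sum(beta.values())  # sum of pseudocounts
    M = []
    for k in range(K):
        counts = dict(beta)  # start every letter at its pseudocount
        for s, (x, a) in enumerate(zip(seqs, locs)):
            if s != i:
                counts[x[a + k]] += 1
        M.append({c: counts[c] / ((N - 1) + B_total) for c in ALPHABET})
    return M

beta = {c: 0.5 for c in ALPHABET}  # hypothetical pseudocounts
seqs = ["ACGTACGT", "TTACGTTT", "GGGACGTA"]
M = leave_one_out_model(seqs, [0, 2, 3], 4, 0, beta)
```

With the pseudocounts in place, no entry of M is zero, so a single unseen letter cannot force a candidate word's probability to 0.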

6
Gibbs Sampling
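  • Sampling:
    • For every K-long word $x_j, \ldots, x_{j+K-1}$ in x:
      • $Q_j = \Pr[\text{word} \mid \text{motif}] = M(1, x_j) \times \cdots \times M(K, x_{j+K-1})$
      • $P_j = \Pr[\text{word} \mid \text{background}] = B(x_j) \times \cdots \times B(x_{j+K-1})$
    • Let $A_j = \dfrac{Q_j / P_j}{\sum_{j'=1}^{|x|-K+1} Q_{j'} / P_{j'}}$
    • Sample a random new position ai according to the
      probabilities $A_1, \ldots, A_{|x|-K+1}$.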

[Figure: sampling probability (Prob) plotted against position in x, from 0 to |x|]
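
The sampling step could be sketched as follows: compute the ratio Q_j / P_j for every start position, normalize, and draw one position. The function names and toy model are illustrative assumptions, and indices are 0-based.

```python
import random

def position_weights(x, M, B):
    """Return A_j for every start position j of a K-long word in x."""
    K = len(M)
    ratios = []
    for j in range(len(x) - K + 1):
        q = p = 1.0
        for k in range(K):
            c = x[j + k]
            q *= M[k][c]  # probability of the word under the motif model
            p *= B[c]     # probability of the word under the background
        ratios.append(q / p)
    total = sum(ratios)
    return [r / total for r in ratios]

def sample_position(x, M, B, rng):
    """Draw a new start position with probability proportional to Q_j / P_j."""
    A = position_weights(x, M, B)
    return rng.choices(range(len(A)), weights=A, k=1)[0]

# Toy model, made up for illustration only.
B = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
M = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
     {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1}]
A = position_weights("GGACT", M, B)
new_pos = sample_position("GGACT", M, B, random.Random(0))
```

Sampling, rather than always taking the best-scoring position, is what lets the procedure move away from a poor starting alignment.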
7
Gibbs Sampling
  • Running Gibbs sampling:
    1. Initialize
    2. Run until convergence
    3. Repeat 1, 2 several times, report common motifs
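
Putting the three steps together, one possible driver is sketched below. Two simplifications are assumptions of this sketch rather than anything stated in the slides: a fixed number of sweeps stands in for a convergence test, and runs are ranked by log-odds score instead of being compared for shared motifs. All names and the toy data are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"  # assumed DNA alphabet

def build_model(seqs, locs, K, skip, beta):
    """Frequencies with pseudocounts, leaving out sequence `skip` (None = keep all)."""
    used = [(x, a) for s, (x, a) in enumerate(zip(seqs, locs)) if s != skip]
    denom = len(used) + sum(beta.values())
    M = []
    for k in range(K):
        counts = dict(beta)
        for x, a in used:
            counts[x[a + k]] += 1
        M.append({c: counts[c] / denom for c in ALPHABET})
    return M

def sample_location(x, M, B, rng):
    """Draw a start position with probability proportional to Q_j / P_j."""
    K = len(M)
    weights = [math.prod(M[k][x[j + k]] / B[x[j + k]] for k in range(K))
               for j in range(len(x) - K + 1)]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def score(seqs, locs, M, B):
    """Log-odds score of the current locations."""
    return sum(math.log(M[k][x[a + k]] / B[x[a + k]])
               for x, a in zip(seqs, locs) for k in range(len(M)))

def gibbs_run(seqs, K, B, beta, sweeps, rng):
    locs = [rng.randrange(len(x) - K + 1) for x in seqs]  # step 1: initialize
    for _ in range(sweeps):  # step 2: fixed budget as a stand-in for convergence
        for i, x in enumerate(seqs):
            M = build_model(seqs, locs, K, i, beta)
            locs[i] = sample_location(x, M, B, rng)
    M = build_model(seqs, locs, K, None, beta)
    return score(seqs, locs, M, B), locs

def gibbs_restarts(seqs, K, B, beta, sweeps=50, restarts=10, seed=0):
    rng = random.Random(seed)
    runs = [gibbs_run(seqs, K, B, beta, sweeps, rng) for _ in range(restarts)]  # step 3
    return sorted(runs, reverse=True)  # best-scoring run first

B = {c: 0.25 for c in ALPHABET}
beta = {c: 0.5 for c in ALPHABET}
seqs = ["TTTTACGTACTTTT", "GGACGTACGGGGGG", "CCCCCCACGTACCC", "ATATACGTACATAT"]
runs = gibbs_restarts(seqs, K=6, B=B, beta=beta, sweeps=20, restarts=5, seed=1)
best_score, best_locs = runs[0]
```

Inspecting which locations recur across the entries of `runs` is one simple way to approximate "report common motifs".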

8
Advantages / Disadvantages
  • Very similar to EM
  • Advantages:
    • Easier to implement
    • Less dependent on initial parameters
    • More versatile, easier to enhance with heuristics
  • Disadvantages:
    • More dependent on all sequences exhibiting the motif
    • Less systematic search of the initial parameter space

9
Repeats, and a Better Background Model
  • Repeat DNA can be confused with a motif
    • Especially low-complexity sequences: CACACA, AAAAA, etc.
  • Solution:
    • A more elaborate background model:
      • 0th order: B = { pA, pC, pG, pT }
      • 1st order: B = { P(A|A), P(A|C), …, P(T|T) }
      • Kth order: B = { P(X | b1…bK); X, bi ∈ {A, C, G, T} }
    • Has been applied to EM and Gibbs (up to 3rd order)
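
A higher-order background can be estimated by counting how often each letter follows each context of the given length. The sketch below assumes add-one smoothing and a uniform fallback for unseen contexts; both are choices made here for illustration, as are the function names.

```python
from collections import defaultdict

ALPHABET = "ACGT"  # assumed DNA alphabet

def train_background(seqs, order, pseudo=1.0):
    """Estimate P(X | previous `order` letters) with add-`pseudo` smoothing."""
    counts = defaultdict(lambda: {c: pseudo for c in ALPHABET})
    for x in seqs:
        # The first `order` letters of each sequence have no full context and are skipped.
        for t in range(order, len(x)):
            counts[x[t - order:t]][x[t]] += 1
    model = {}
    for ctx, row in counts.items():
        total = sum(row.values())
        model[ctx] = {c: row[c] / total for c in ALPHABET}
    return model

def background_prob(model, context, letter):
    """Look up P(letter | context); unseen contexts fall back to uniform."""
    row = model.get(context)
    return row[letter] if row else 1.0 / len(ALPHABET)

bg = train_background(["CACACACACACA"], order=1)
p_a_after_c = background_prob(bg, "C", "A")
```

On a CA repeat, a 1st-order model assigns a high probability to A after C, so a CACACA word is no longer surprising relative to the background and is less likely to be reported as a motif.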

10
Example Application: Motifs in Yeast
  • Group:
    • Tavazoie et al. 1999, G. Church's lab, Harvard
  • Data:
    • Microarrays on 6,220 mRNAs from yeast, Affymetrix chips (Cho et al.)
    • 15 time points across two cell cycles

11
Processing of Data
  • Selection of 3,000 genes:
    • Genes with the most variable expression were selected
  • Clustering according to common expression:
    • K-means clustering
    • 30 clusters, 50-190 genes/cluster
    • Clusters correlate well with known function
  • AlignACE motif finding:
    • 600-bp upstream regions
    • 50 regions/trial
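
Two of the preprocessing steps above lend themselves to a small sketch: ranking genes by expression variability and cutting out a fixed-length upstream region. Plain variance is used here as a stand-in for "most variable" (the slides do not say which measure was used), genes are assumed to lie on the forward strand, and all names and data are made up.

```python
import statistics

def most_variable(expr, n):
    """Return the n gene names with the highest variance across time points."""
    ranked = sorted(expr, key=lambda g: statistics.pvariance(expr[g]), reverse=True)
    return ranked[:n]

def upstream_region(chrom, gene_start, length=600):
    """Return up to `length` bases immediately upstream of a forward-strand gene."""
    return chrom[max(0, gene_start - length):gene_start]

# Toy expression profiles (gene -> values at successive time points).
expr = {"g1": [1, 1, 1], "g2": [0, 5, 10], "g3": [2, 3, 2]}
selected = most_variable(expr, 2)
region = upstream_region("ACGTACGTAC", 6, length=4)
```

The upstream regions of the genes in one cluster would then form the input sequences x1, …, xN for the motif finder.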

12
Motifs in Periodic Clusters
13
Motifs in Non-periodic Clusters