Title: High-resolution computational models of genome binding events
1. High-resolution computational models of genome binding events
- Yuan (Alan) Qi
- Joint work with Gifford and Young labs
Dana-Farber Cancer Institute Jan 2007
2. ChIP-chip Experiments
- ChIP-chip data
  - Encode valuable information about protein-DNA binding events.
- Goal
  - Decode accurate binding information from the noisy data.
- Challenges
  - Noise
  - Joint influence of multiple binding events
3. Joint Binding Deconvolution
Figure: JBD generative probabilistic graphical model, with layers for the data likelihood, prior distributions, and hyper-prior distributions.
4. Shear Distribution
(a) The distribution of DNA fragment sizes produced in the ChIP protocol was experimentally measured and statistically modeled.
(b) An influence function is derived from the measured fragment size distribution.
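One way to picture how an influence function could follow from a fragment size distribution is sketched below. It is an illustration under stated assumptions, not the derivation used in the talk: fragments are assumed to be placed uniformly at random, so a fragment of length l covers the binding site in l placements and covers both the site and a probe d bases away in max(l - |d|, 0) of them. The Gamma-shaped size distribution and its parameters (`shape`, `scale`) are hypothetical placeholders for the measured distribution.

```python
import math

def influence_function(lengths, weights, distances):
    """Fraction of site-covering fragments that also cover a probe at each distance.

    lengths   -- fragment lengths in bases
    weights   -- relative frequency of each length (need not be normalized)
    distances -- signed probe-to-site distances in bases
    """
    # Placements that cover the binding site, summed over fragment lengths.
    total = sum(w * l for l, w in zip(lengths, weights))
    result = []
    for d in distances:
        # Placements that cover both the site and the probe.
        both = sum(w * max(l - abs(d), 0) for l, w in zip(lengths, weights))
        result.append(both / total)
    return result

# Hypothetical fragment size distribution (Gamma-shaped, illustrative values only).
lengths = list(range(50, 2001, 10))
shape, scale = 4.0, 125.0
weights = [l ** (shape - 1) * math.exp(-l / scale) for l in lengths]

distances = list(range(-2000, 2001, 100))
influence = influence_function(lengths, weights, distances)
```

Under these assumptions the influence peaks at the binding site, is symmetric in distance, and falls to zero once the probe is farther away than the longest fragment.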
5. Approximate Bayesian Inference
Exact Bayesian posterior of binding events
Non-conjugate models, thousands of variables -> intractable calculation of the exact posterior distribution!
Message passing algorithm (Expectation propagation)
EP iteratively refines the factor approximations (i.e., messages) to improve the posterior approximation.
6. EP in a Nutshell
- Approximate a probability distribution by simpler parametric terms.
- Each approximation term lives in an exponential family (e.g., Gaussian or Gamma distributions).
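A small sketch may help show why exponential-family terms are convenient: in natural parameters, multiplying two terms adds their parameters and dividing subtracts them. The example below uses unnormalized one-dimensional Gaussian terms; the class name `GaussianTerm` and its fields are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GaussianTerm:
    """Unnormalized 1-D Gaussian term stored in natural parameters."""
    precision: float  # 1 / variance
    shift: float      # mean / variance

    @classmethod
    def from_moments(cls, mean, variance):
        return cls(1.0 / variance, mean / variance)

    def __mul__(self, other):
        # Product of two Gaussian terms: natural parameters add.
        return GaussianTerm(self.precision + other.precision,
                            self.shift + other.shift)

    def __truediv__(self, other):
        # Ratio of two Gaussian terms: natural parameters subtract.
        return GaussianTerm(self.precision - other.precision,
                            self.shift - other.shift)

    @property
    def mean(self):
        return self.shift / self.precision

    @property
    def variance(self):
        return 1.0 / self.precision

a = GaussianTerm.from_moments(mean=0.0, variance=1.0)
b = GaussianTerm.from_moments(mean=2.0, variance=1.0)
product = a * b            # mean 1.0, variance 0.5
recovered = product / b    # removing b gives back a
```

Scale constants are ignored here, which is enough to show how terms can be combined and removed cheaply.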
7. EP in a Nutshell
- Three key steps
  - Deletion: Approximate the leave-one-out posterior distribution for the ith factor.
  - Minimization: Minimize the KL divergence by moment matching.
  - Inclusion
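The three steps can be written out in the standard textbook form of expectation propagation. The notation is an assumption introduced here, not taken from the slides: the posterior factorizes as $p(\theta \mid D) \propto \prod_i f_i(\theta)$, the approximation is $q(\theta) \propto \prod_i \tilde{f}_i(\theta)$, and $\mathcal{F}$ is the chosen exponential family.

```latex
\begin{align*}
\text{Deletion:} \quad
  & q^{\setminus i}(\theta) \propto \frac{q(\theta)}{\tilde{f}_i(\theta)} \\
\text{Minimization:} \quad
  & q^{\text{new}} = \arg\min_{q' \in \mathcal{F}}
    \mathrm{KL}\!\left( \frac{1}{Z_i}\, f_i(\theta)\, q^{\setminus i}(\theta)
    \,\Big\|\, q'(\theta) \right),
    \qquad Z_i = \int f_i(\theta)\, q^{\setminus i}(\theta)\, d\theta \\
\text{Inclusion:} \quad
  & \tilde{f}_i^{\text{new}}(\theta)
    = Z_i\, \frac{q^{\text{new}}(\theta)}{q^{\setminus i}(\theta)}
\end{align*}
```

Within an exponential family, the minimization step reduces to matching the expected sufficient statistics of $q'$ to those of $f_i\, q^{\setminus i} / Z_i$, which is what moment matching refers to.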
8. Results
11. Spatial resolution comparison between JBD and other methods
- The average distance from JBD's Gcn4 binding predictions to motif sites is smaller than for other methods, and JBD identifies more known Gcn4 targets.
12. JBD better resolves proximal binding events than do other methods. Shown here is the performance of the JBD, MPeak, and Ratio methods on 200 simulated DNA regions, each containing two binding events.
13. Using binding posterior to guide motif discovery
- Approach
  - Use binding posterior probabilities derived from the ChIP-chip data to weight sequence regions differently for motif discovery.
- Results
  - The Mig2 motif was found, while a standard motif discovery algorithm (e.g., MEME) failed.
  - Note that the correct motif for Mig2 was not recovered when using the Ratio method to analyze the ChIP-chip data.
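The idea of weighting sequence regions by their binding posterior can be illustrated with a toy example. This is a minimal sketch, not the motif discovery procedure used in the talk: a simple weighted k-mer count stands in for a real motif finder, and the sequences, posterior values, and the function name `weighted_kmer_scores` are all hypothetical.

```python
from collections import defaultdict

def weighted_kmer_scores(regions, k):
    """Score each k-mer by the weight-normalized sum over regions containing it.

    regions -- list of (sequence, weight) pairs; the weight plays the role
               of a binding posterior probability for that region
    """
    scores = defaultdict(float)
    total = 0.0
    for seq, weight in regions:
        # Count each k-mer at most once per region.
        present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in present:
            scores[kmer] += weight
        total += weight
    return {kmer: s / total for kmer, s in scores.items()}

# Hypothetical regions: two likely bound, two likely unbound.
regions = [
    ("ACGAGGGGTA", 0.95),
    ("TTGGGGCATC", 0.90),
    ("CATCATCATC", 0.05),
    ("GCATCATCGA", 0.10),
]

weighted = weighted_kmer_scores(regions, k=4)
unweighted = weighted_kmer_scores([(seq, 1.0) for seq, _ in regions], k=4)

best_weighted = max(weighted, key=weighted.get)
best_unweighted = max(unweighted, key=unweighted.get)
```

In this toy data the unweighted count favors a k-mer that is common in the low-posterior regions, while the posterior-weighted count favors the k-mer shared by the high-posterior regions.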
14. Positional priors for motif discovery improve robustness to false input DNA sequence regions.
15. Questions?