Motif Refinement using Hybrid Expectation Maximization Algorithm - PowerPoint PPT Presentation

About This Presentation
Title:

Motif Refinement using Hybrid Expectation Maximization Algorithm

Description:

Motifs are certain patterns in DNA and protein ... Finding these conserved patterns might be very useful for controlling the ... of Hessian matrix. ... – PowerPoint PPT presentation

Number of Views:79
Avg rating:3.0/5.0
Slides: 32
Provided by: dimacsR
Category:

less

Transcript and Presenter's Notes

Title: Motif Refinement using Hybrid Expectation Maximization Algorithm


1
Motif Refinement using Hybrid Expectation
Maximization Algorithm
Chandan ReddyYao-Chung WengHsiao-Dong
ChiangSchool of Electrical and Computer
Engr.Cornell University, Ithaca, NY - 14853.
2
Motif Finding Problem
  • Motifs are certain patterns in DNA and protein
    sequences that are strongly conserved i.e. they
    have important biological functions like gene
    regulation and gene interaction
  • Finding these conserved patterns might be very
    useful for controlling the expression of genes
  • Motif finding problem is to detect novel,
    over-represented unknown signals in a set of
    sequences (for eg. transcription factor binding
    sites in a genome).

3
Motif Finding Problem
Consensus Pattern - CCGATTACCGA
( l, d ) (11,2) consensus pattern
4
Problem Definition
  • Without any previous knowledge about the
    consensus pattern, discover all instances
    (alignment positions) of the motifs and then
    recover the final pattern to which all these
    instances are within a given number of mutations.

5
Complexity of the Problem
Let n is the length of the DNA sequence l is
the length of the motif t is the number of
sequences d is the number of mutations in a
motif The running time of a brute force
approach There are (n-l1) l-mers in each of t
sequences. Total combination is (n-l1)t l-mers
for t sequences. Typically, n is much larger
than l. ie. n 600, t 20.
6
Existing methodologies
  • Generative probabilistic representation -
    continuous
  • Gibbs Sampling
  • Expectation Maximization
  • Greedy CONSENSUS
  • HMM based
  • Mismatch representation Discrete Consensus
  • Projection Methods
  • Multiprofiler
  • Suffix Trees

7
Existing methodologies
  • Global Solvers
  • Advantage neighborhood of global optimal
    solutions.
  • Disadvantage misses out better solutions
    locally.
  • ie Random Projection, Pattern Branching, etc
  • Local Solvers
  • Advantage returns best solution in
    neighborhood.
  • Disadvantage relies heavily on initial
    conditions.
  • ie EM, Gibbs Sampling, Greedy CONSENSUS, etc

8
Our Approach
  • Performs global solver to estimate neighborhood
    of a promising solution. (Random Projection)
  • Using this neighborhood as initial guess, apply
    local solver to refine the solution to be the
    global optimal solution. (Expectation
    Maximization)
  • Performs efficient neighborhood search to jump
    out of convergence region to find another local
    solutions systematically.
  • A hybrid approach includes the advantages of
    both the global and local solvers.

9
Random Projection
  • Implements a hash function h(x) to map l-mer
    onto a k-dimensional space.
  • Hashes all possible l-mers in t sequences into
    4k buckets where each bucket corresponds an
    unique k-mer.
  • Imposing certain conditions and setting a
    reasonable bucket threshold S, the buckets that
    exceed S is returned as the solution.

10
Expectation Maximization
  • Expectation Maximization is a local optimal
    solver in which we refine the solution yielded by
    random projection methodology. The EM method
    iteratively updates the solution until it
    converges to a locally optimal one.
  • Follow these steps
  • Compute the scoring function
  • Iterate the Expectation step and the
    Maximization step

11
Profile Space
A profile is a matrix of probabilities, where
the rows represent possible bases, and the
columns represent consecutive sequence positions.
  • Applying the Profile Space into the coefficient
    formula constructs PSSM.

12
Scoring function- Maximum Likelihood
13
Expectation Step
  • The Expectation step returns the expected number
    of jth residue in each position of the motif
    instance and overall sequence. The algorithm is
    as follows
  • Obtains ?k,j from the previous M-step
    iteration.
  • Uses ?k,j to calculate the probability of all
    possible l-mers against the expected motif.
  • Given probability of each l-mer, calculates
    probability of the correct starting position for
    each l-mer using Bayes formula.
  • Multiplying weight to each position of each
    l-mer, calculate the expected number of j at
    position k.

14
Maximization Step
  • The Maximization Step receives the expected
    values passed on by E-Step to calculate the new
    probability ?k,j and ?0,j and return them for
    E-Step.
  • ?(q)k,j Ek,j / t , ?(q)0,j E0,j / (t
    n-l )
  • If ?(q) ?(q-1), then iteration ends. All the
    local optimal solution sites are returned with
    the consensus made up of jth residue with
    highest probability at kth position.
  • Else, ?(q)k,j and ?(q)0,j are used to the q1
    iteration of the E-Step.

15
(No Transcript)
16
Basic Idea
  • one-to-one correspondence of the critical points

Local Minimum
Stable Equilibrium Point
Saddle Point
Decomposition Point
Local Maximum
Source
17
Theoretical Background
Practical Stability Boundary
The problem of finding all the Tier-1 stable
equilibrium points of xs is the problem of
finding all the decomposition points on its
stability boundary
18
Theoretical background
Theorem (Unstable manifold of type-1 equilibrium
point) Let xs1 be a stable e.p. of the
gradient system (2) and xd be a type-1 e.p. on
the practical stability boundary ?Ap(xs). Assume
that there exist e and d such that ?f (x) gt e
unless x ? x ?f (x) 0. If every e.p. of (1)
is hyperbolic and its stable and unstable
manifolds satisfy the transversality condition,
then there exists another stable e.p. xs2 to
which the one dimensional unstable manifold of xd
converges.Our method finds the stability
boundary between the two local minima and traces
the stability boundary to find the saddle point.
We used a new trajectory adjustment procedure to
move along the practical stability boundary.
19
Definitions
Def 1 x is said to be a critical point of (1)
if it satisfies the condition ?f (x) 0 where f
(x) is the objective function assumed to be in
C2(?n, ?).The corresponding nonlinear dynamical
system is -------- Eq. (1) The solution
curve of Eq. (1) starting from x at time t 0 is
called a trajectory and it is denoted by F( x ,
.) ? ? ?n. A state vector x is called an
equilibrium point (e.p.) of Eq. (3) if f ( x )
0.
20
Definitions (contd.)
Def 2 An equilibrium point is said to be
hyperbolic if the Jacobian of f at point x has no
eigenvalues with zero real part. A hyperbolic
e.p. is a Stable e.p. - if all the eigenvalues
of its Jacobian have negative real part.
Unstable e.p. - if some eigenvalues have
positive real part. Type-k e.p. - if its
Jacobian has exact k eigenvalues with positive
real part.We propose to build a negative
gradient system associated with ( 1) as shown
below dx /dt - ?f (x) -------- Eq. (2)
21
Definitions (contd.)
A dynamical system is completely stable if every
trajectory of the system leads to one of its
stable equilibrium points. Def 3 The
stability region (or region of attraction) of a
stable equilibrium point xs of a nonlinear
dynamical system (1) is denoted by A(xs) and is
A(xs) x ? ?n limt?8 F( x , t) xs
The boundary of stability region is called the
stability boundary of xs and is represented as
?A(xs).
22
Definitions (contd.)
Def 4 The practical stability region of a
stable equilibrium point xs of a nonlinear
dynamical system (1), denoted by Ap(xs) and is
. The practical
stability boundary (? Ap(xs) ) is a subset of its
stability boundary. It eliminates the complex
portion of the stability boundary which has no
contact with the complement of the closure of
the stability region. Def 5 A decomposition
point is a type-1 equilibrium point xd on the
practical stability boundary of a stable
equilibrium point xs .
23
Theoretical background
Theorem 1 (Unstable manifold of type-1
equilibrium point) Let xs1 be a stable e.p. of
the gradient system (2) and xd be a type-1 e.p.
on the practical stability boundary ?Ap(xs).
Assume that there exist e and d such that ?f
(x) gt e unless x ? x ?f (x) 0. If every
e.p. of (1) is hyperbolic and its stable and
unstable manifolds satisfy the transversality
condition, then there exists another stable e.p.
xs2 to which the one dimensional unstable
manifold of xd converges.Our method finds the
stability boundary between the two local minima
and traces the stability boundary to find the
saddle point. We used a new trajectory adjustment
procedure to move along the practical stability
boundary.
24
Our Method
25
Search Directions
26
Search Directions
27
Our Method
  • The exit point method is implemented so that EM
    can move out of its convergence region to seek
    out other local optimal solutions.
  • Construct a PSSM from initial alignments.
  • Calculate eigenvectors of Hessian matrix.
  • Find exit points (or saddle points) along each
    eigenvector.
  • Apply EM from the new stability/convergence
    region.
  • Repeat first step.
  • Return max score A, a1i, a2j

28
Results
29
Improvements in the Alignment Scores
30
Improvements in the Alignment Scores
Random Projection method results
31
Performance Coefficient
K is the set of the residue positions of the
planted motif instances, and P is the
corresponding set of positions predicted
32
Results
Different Motifs and the average score using
random starts. The first tier and second tier
improvements on synthetic data.
33
Results
Different Motifs and the average score using
random projection. The first tier and second tier
improvements on synthetic data.
34
Results
Different Motifs and the average score using
random projections and the first tier and second
tier improvements on real human sequences.
35
Results on Real data
36
Concluding discussion
  • Using dynamical system approach, we have shown
    that the EM algorithm can be improved
    significantly.
  • In the context of motif finding, we see that
    there are many local optimal solutions and it is
    important to search the neighborhood space.
  • Try different global methods and other
    techniques like GibbsDNA

37
Questions and suggestions !!!!!
Write a Comment
User Comments (0)
About PowerShow.com