Statistical Alignment: Computational Properties, Homology Testing and Goodness-of-Fit - PowerPoint PPT Presentation

About This Presentation
Title:

Statistical Alignment: Computational Properties, Homology Testing and Goodness-of-Fit

Description:

each has a mortal link on the right. left end has an immortal link. For example: A G G ... mortal link can give birth to a new mortal link or die out ... – PowerPoint PPT presentation

Number of Views:19
Avg rating:3.0/5.0
Slides: 10
Provided by: wen21
Category:

less

Transcript and Presenter's Notes

Title: Statistical Alignment: Computational Properties, Homology Testing and Goodness-of-Fit


1
Statistical Alignment Computational Properties,
Homology Testing and Goodness-of-Fit
  • J. Hein, C. Wiuf, B. Knudsen, M.B. Moller and G.
    Wibling

2
Main Objective of the paper
  • To show how to accelerate the statistical
    alignment algorithms several orders of magnitude
    using the model of insertion and deletions by
    Thorne, Kishino, and Felsenstein in 1991 (TKF91
    model).
  • To propose a new homology test based on the
    model.
  • To describe a goodness-of-fit test that allows
    testing the proposed insertion-deletion process
    inherent to the model.

3
Why isnt statistical alignment popular?
  • Computationally VERY SLOW
  • Authors of the paper accelerated the statistical
    alignment algorithms several orders of magnitude
    compared with the TKF91 algorithm.
  • Lack of user-friendly software?
  • Usually written in Fortran or C, or the compiled
    program only works in UNIX environment, but most
    biologists dont know much about it.
  • Authors of the paper have provided a web
    interface to the program

4
parsimony and similarity alignments
  • Parsimony strategy minimizing the distance
  • For example
  • Similarity strategy maximizing the similarity
    score
  • For example BLAST

5
TKF91 model of substitutions
  • continuous time Markov model on the state space
    of nucleotides or amino acids
  • Rate matrix Q is specified
  • Describes the intensity of different substitution
    events over an infinitesimal time period.
  • Probability that i has changed to j after time t
    is
  • The process is assumed to be time reversible

6
TKF91 model of the indel process
  • Can be view as a Markov model with all sequences
    as possible states
  • indel part of the model
  • links connecting the letters of the sequences
  • each has a mortal link on the right
  • left end has an immortal link
  • For example ? A ? G ? G ?
  • If the type of the nucleotide is ignored, can be
    represented as ? ? ? ?

7
TKF91 model
  • mortal link can give birth to a new mortal link
    or die out
  • immortal link can also give birth but would not
    die
  • Therefore, the rates can be written as
  • ? A ? G ? G ?
  • I0 S1 I1/D1 S2 I2/D2 S3 I3/D3
  • where I is the birth rate
  • D is the death rate, DgtI
  • S is the substitution rate

8
TKF91 model
  • To calculate the probability of a particular
    alignment
  • s(1) ? A ? T ? -
  • s(2) ? C ? T ? G ?
  • P(s(1), s(2), alignment) (p1)(?A?P1 ?PAC)(?T
    ?P2 ? PTT ??G)

9
Calculating the probability of two sequences
  • Without conditioning on the alignment, it is
    necessary to sum over all alignments weighted
    with their probabilities according to the TKF91
    process.
  • Confine likelihood calculations to a band close
    to the similarity based alignment allows an
    efficient numerical optimization algorithm for
    finding the maximum likelihood estimate
  • The recursions originally presented by Thorne,
    Kishino and Felsenstein can be simplified.
Write a Comment
User Comments (0)
About PowerShow.com