Sequence Alignment - III - PowerPoint PPT Presentation

Provided by: publi5

1
Sequence Alignment - III
  • Chitta Baral

2
Scoring Model
  • When comparing sequences, we look for evidence that they have diverged
    from a common ancestor by a process of mutation and selection
  • Basic mutational processes:
  • Substitutions
  • Insertions and deletions (together referred to as gaps)
  • Total score:
  • Sum of a term for each aligned pair plus a term for each gap
  • Corresponds to the logarithm of the relative likelihood that the
    sequences are related, compared to being unrelated
  • Identities and conservative substitutions are expected to be more likely
    in real alignments than by chance, so they contribute positive score
    terms
  • Non-conservative changes are observed less frequently in real alignments
    than expected by chance, so they contribute negative score terms
  • Additive scoring scheme: based on the assumption that mutations at
    different sites in a sequence occurred independently
  • Reasonable for DNA and protein sequences
  • Inaccurate for structural RNAs
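
The additive scheme above can be sketched in a few lines of Python. This is only an illustration: the function names, the toy scores (+2 for an identity, -1 for a substitution), and the gap penalty of 3 per gapped column are assumptions, not values from the slides.

```python
def alignment_score(aligned_x, aligned_y, sub_score, gap_penalty):
    """Sum one term per column of two aligned strings ('-' marks a gap)."""
    if len(aligned_x) != len(aligned_y):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(aligned_x, aligned_y):
        if a == "-" or b == "-":
            total -= gap_penalty       # one gap term per gapped column
        else:
            total += sub_score(a, b)   # one substitution term per aligned pair
    return total


def toy_score(a, b):
    # Hypothetical scores: +2 for an identity, -1 for any substitution.
    return 2 if a == b else -1


print(alignment_score("HEAG-AW", "HE-GTAW", toy_score, gap_penalty=3))
```

Because every column contributes independently, the total is just a sum, which is what makes dynamic programming over alignments possible.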

3
Substitution Matrices
  • Notation: a pair of sequences $x_{1..n}$ and $y_{1..m}$
  • Let $x_i$ be the $i$th symbol in $x$
  • And $y_j$ be the $j$th symbol in $y$
  • Let $p_{x_i y_i}$ = probability that $x_i$ and $y_i$ are related
  • Let $q_{x_i}$ = probability that we have $x_i$ by chance
  • Frequency of occurrence of $x_i$
  • Score = $\log \frac{P(x \text{ and } y \mid \text{related})}{P(x \text{ and } y \mid \text{unrelated})}$
  • $P(x \text{ and } y \mid \text{related}) = p_{x_1 y_1} \, p_{x_2 y_2} \cdots$
  • $P(x \text{ and } y \mid \text{unrelated}) = q_{x_1} q_{x_2} \cdots \times q_{y_1} q_{y_2} \cdots$
  • Odds ratio = $\frac{p_{x_1 y_1}}{q_{x_1} q_{y_1}} \times \frac{p_{x_2 y_2}}{q_{x_2} q_{y_2}} \times \cdots$
  • Log-odds ratio = $s(x_1, y_1) + s(x_2, y_2) + \cdots$
  • where $s(a, b) = \log \frac{p_{ab}}{q_a q_b}$
  • The $s(a, b)$ table is known as the score matrix or substitution matrix
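
As a rough illustration of the log-odds construction, the sketch below builds s(a, b) for a hypothetical two-letter alphabet. The probabilities in `p` and `q` are made up for the example and do not come from any real substitution matrix.

```python
import math

# Hypothetical two-letter alphabet; these probabilities are illustrative only.
q = {"A": 0.5, "B": 0.5}                      # background frequencies q_a
p = {("A", "A"): 0.4, ("B", "B"): 0.4,        # joint probabilities p_ab
     ("A", "B"): 0.1, ("B", "A"): 0.1}


def s(a, b):
    """Log-odds score s(a, b) = log(p_ab / (q_a * q_b))."""
    return math.log(p[(a, b)] / (q[a] * q[b]))


def log_odds_score(x, y):
    """Sum of s(x_i, y_i) over an ungapped alignment of equal-length x and y."""
    return sum(s(a, b) for a, b in zip(x, y))


def odds_ratio(x, y):
    """Product of p/(q*q) terms, for comparison with the summed score."""
    ratio = 1.0
    for a, b in zip(x, y):
        ratio *= p[(a, b)] / (q[a] * q[b])
    return ratio


print(s("A", "A") > 0, s("A", "B") < 0)
```

With these numbers, identities are more likely than chance (0.4 versus 0.25) and score positive, while mismatches are less likely than chance (0.1 versus 0.25) and score negative. Summing the scores gives the logarithm of the odds ratio.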

4
Gap Penalties
  • Also based on a probabilistic model of alignment
  • Less widely recognized than the probabilistic basis of substitution
    matrices
  • Gap of length $g$ due to insertion of $a_1 \ldots a_g$
  • $P(\text{gap because of mutation}) = f(g) \, (q_{a_1} \cdots q_{a_g})$
  • $P(\text{having } a_1 \ldots a_g \text{ by chance}) = q_{a_1} \cdots q_{a_g}$
  • Ratio = $f(g)$
  • Log of ratio = $\log f(g)$
  • Geometric distribution: $f(g) = k \exp(-x g)$
  • Suppose $f(g) = \exp(-g d)$; then log of ratio = $-g d$:
    linear score
  • Suppose $f(g) = k \exp(-g e)$; then log of ratio = $-g e + \log k$
  • $= -g e + e + (\log k - e) = -(e - \log k) - (g - 1) e$
  • $= -d - (g - 1) e$, where $d = e - \log k$:
    affine score
5
Repeated matches
  • A big string $x_{1..n}$ and a smaller string $y_{1..m}$
  • Asymmetric: looking for multiple matches of $y$ in $x$
  • As we do the matching and fill the table, we need to decide when to
    stop going further in $y$ and start over from the beginning of $y$
  • $F(i, 0)$: assuming $x_i$ is in an unmatched region, the best total
    score so far
  • $F(i, j)$, $j \ge 1$: assuming $x_i$ is in a matched region and the
    last matching ends at $x_i$ and $y_j$, the best total score so far
  • $F(0, 0) = 0$
  • $F(i, j) = \max \{ F(i, 0),\; F(i-1, j-1) + s(x_i, y_j),\; F(i-1, j) - d,\; F(i, j-1) - d \}$
  • $F(i, 0)$ corresponds to the start-over option (but now we store the
    total score so far)
  • $F(i, 0)$ = maximum of
  • $F(i-1, 0)$
  • $F(i-1, j) - T$, $j = 1, \ldots, m$
  • $T$ is a threshold, and we are only interested in matches scoring
    higher than the threshold. (Important because there are always short
    local alignments with small positive scores even between entirely
    unrelated sequences.)
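
A minimal Python sketch of the table fill described above follows. Several things are assumptions made for this example: the function name, the toy scoring (+5 for a match, -4 for a mismatch), the values d = 8 and T = 6, the choice to initialise F(0, j) to 0, and the extra cell F(n+1, 0) used to hold the final total. Traceback is omitted.

```python
def repeated_match_table(x, y, score, d, T):
    """Fill F for repeated matches of y in x; F[i][j] with i over x, j over y."""
    n, m = len(x), len(y)
    # Rows 0..n+1 (row n+1 only uses column 0); columns 0..m.
    # Assumption: F(0, j) = 0 for all j.
    F = [[0] * (m + 1) for _ in range(n + 2)]
    for i in range(1, n + 2):
        # Start-over cell: stay unmatched, or end a match at x[i-1] and pay T.
        F[i][0] = max([F[i - 1][0]] + [F[i - 1][j] - T for j in range(1, m + 1)])
        if i == n + 1:
            break  # F[n+1][0] is the final total; no further cells to fill
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i][0],                                      # start a new match
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # extend with a pair
                F[i - 1][j] - d,                              # gap: x[i-1] unpaired
                F[i][j - 1] - d,                              # gap: y[j-1] unpaired
            )
    return F


def toy_score(a, b):
    # Hypothetical scores, not a real substitution matrix.
    return 5 if a == b else -4


F = repeated_match_table("ABXAB", "AB", toy_score, d=8, T=6)
print(F[-1][0])
```

In this toy run, "AB" matches twice in "ABXAB". Each match scores 10, and each contributes 10 - T = 4 to the running total once it is closed, so the final cell holds 8.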

6
Illustration of repeated matches
         H   E   A   G   A   W   G   H   E   E
     0   0   0   0   1   1   1   1   1   3   9   9
P    0   0   0   0   1   1   1   1   1   3   9
A    0   0   0   5   1   6   1   1   1   3   9
W    0   0   0   0   2   1  21  13   5   3   9
H    0  10   2   0   1   1  13  19  23  15   9
E    0   2  16   8   1   1   5  11  19  29  21
A    0   0   8  21  13   6   1   5  11  21  28
E    0   0   6  13  18  12   4   1   5  17  27

7
Next
  • Alignment with affine gap scores.
  • Heuristic-based approach.