Dynamic Programming Multiple Alignment

Provided by: randyz

1
Lecture 10
  • Dynamic Programming: Multiple Alignment and Gene
    Finding
  • Jones & Pevzner, Secs. 6.10-6.16

2
Multiple Alignment
  • We are often interested in simultaneously
    aligning a collection of sequences so as to best
    highlight the similarities within a family.
  • We can handle this with a dynamic programming
    approach.
  • Construct a graph analogous to that used for
    pair-wise alignment, but corresponding to a
    matrix of dimension k, with k the number of
    sequences.
  • Have strings v1, ..., vk, with lengths
    n1, ..., nk.
  • The length of the alignment is at least the
    maximum among the ni, and at most their sum.

3
Constructing the graph
  • Each cell now has 2^k - 1 predecessors (can we
    show that?)
  • We have characters drawn from an extended
    alphabet A, which includes a gap character.
  • If we follow the original DP approach, our
    scoring matrix also needs to be k dimensional.
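
One way to see the predecessor count: each predecessor is reached by stepping back by 0 or 1 along each of the k axes, excluding the step that moves nowhere. The sketch below enumerates these steps; the function name `predecessors` is a hypothetical helper, not something from the lecture.

```python
from itertools import product

def predecessors(cell):
    """Return all predecessor cells of `cell` in the k-dimensional alignment grid."""
    k = len(cell)
    preds = []
    # Each step is a 0/1 vector: 1 means "consume a character of that sequence".
    for step in product((0, 1), repeat=k):
        if not any(step):
            continue  # the all-zero step is the cell itself, not a predecessor
        prev = tuple(c - s for c, s in zip(cell, step))
        if min(prev) >= 0:  # cells on the boundary have fewer predecessors
            preds.append(prev)
    return preds

print(len(predecessors((3, 3, 3))))     # interior cell, k = 3: 2^3 - 1 = 7
print(len(predecessors((2, 2, 2, 2))))  # interior cell, k = 4: 2^4 - 1 = 15
```

There are 2^k binary step vectors, and removing the all-zero one leaves 2^k - 1.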

4
Recurrence relations for three sequences
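
The slide's own formula did not survive in this transcript. As a hedged reconstruction, the standard three-sequence recurrence can be written as below, assuming sequences v, w, u (names chosen here for illustration), a three-argument column score \delta, and "-" for the gap character. Each of the 7 terms corresponds to one predecessor cell.

```latex
s_{i,j,l} = \max
\begin{cases}
s_{i-1,j,l}     + \delta(v_i, -, -) \\
s_{i,j-1,l}     + \delta(-, w_j, -) \\
s_{i,j,l-1}     + \delta(-, -, u_l) \\
s_{i-1,j-1,l}   + \delta(v_i, w_j, -) \\
s_{i-1,j,l-1}   + \delta(v_i, -, u_l) \\
s_{i,j-1,l-1}   + \delta(-, w_j, u_l) \\
s_{i-1,j-1,l-1} + \delta(v_i, w_j, u_l)
\end{cases}
```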

5
Performance of rigorous multiple alignment
  • Very slow, runs in O((2n)^k) time (about n^k
    cells, each with 2^k - 1 predecessors), where n
    is the typical sequence length and k the number
    of sequences.
  • Not practical for anything other than a small
    family of sequences
  • An active area of research to find more efficient
    algorithms, often using heuristics.

6
Progressive alignment
  • One approach that doesn't work: making all
    possible pair-wise alignments and then assembling
    them at the end.
  • Typically the individual alignments will not be
    compatible with a multiple alignment
  • In progressive alignment (à la CLUSTAL), the
    strongest pair-wise alignment is used as a seed
    to build up a profile, which is progressively
    extended by adding more sequences.

7
The problem with progressive alignment
  • The strongest initial alignment may be a bad
    seed that spoils the multiple alignment.
  • This can be circumvented (somewhat) by
    considering all possible initial pairs as seeds,
    no matter their score.

8
Scoring multiple alignments
  • A k-dimensional scoring matrix is not practical!
  • One useful measure is the sum of the entropies
    computed for the columns
  • Another measure is to score using all unique
    pairs of comparisons within a column, and
    applying a conventional scoring matrix (e.g.
    PAM-250)
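
Both measures can be sketched in a few lines. This is a minimal illustration, assuming the alignment is given as equal-length strings with "-" as the gap character. The function `toy_score` is a hypothetical stand-in for a real substitution matrix such as PAM-250, which is not reproduced here.

```python
from collections import Counter
from itertools import combinations
from math import log2

def column_entropy(column):
    """Shannon entropy (in bits) of the characters in one alignment column."""
    total = len(column)
    return -sum((c / total) * log2(c / total) for c in Counter(column).values())

def entropy_score(alignment):
    """Sum of column entropies; lower means more conserved columns."""
    return sum(column_entropy(col) for col in zip(*alignment))

def sum_of_pairs(alignment, pair_score):
    """Score every unique pair of rows within each column and add them up."""
    return sum(pair_score(a, b)
               for col in zip(*alignment)
               for a, b in combinations(col, 2))

def toy_score(a, b):
    """Hypothetical pair score (NOT PAM-250): match +1, mismatch -1, gap -2."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1

alignment = ["ATG-C", "ATGAC", "A-GAC"]
print(round(entropy_score(alignment), 3))  # 1.837
print(sum_of_pairs(alignment, toy_score))  # 3
```

Note the opposite orientation of the two measures: a good alignment has a low entropy sum but a high sum-of-pairs score.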

9
Gene Prediction
  • Genomic sequences are translated into peptide
    sequences by the mechanism of codons. One codon
    is a triple of three consecutive nucleotides.
  • There are 4^3 = 64 codons. 61 of these map to one
    of the twenty amino acids, while 3 of them (TAA,
    TAG, TGA) are stop codons which terminate a
    protein coding sequence. They are like periods at
    the end of a sentence.
  • Note that U (uracil) substitutes for T (thymine)
    in the messenger RNA copies of genomic sequences.

10
Reading frames
Frame 1:  ATG CTT AGT CTG     Met Leu Ser Leu
Frame 2:  A TGC TTA GTC TG    Cys Leu Val
Frame 3:  AT GCT TAG TCT G    Ala STOP
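
The three forward reading frames can be translated with a short sketch like the following. It assumes the standard genetic code, built here from the conventional TCAG-ordered amino-acid string, and uses one-letter amino-acid codes with "*" for a stop codon. The names `translate` and `three_frames` are illustrative, not from the lecture.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna, frame=0):
    """Translate `dna` starting at offset `frame`, ignoring a trailing partial codon.
    Stop codons are marked with '*' rather than ending the translation."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

def three_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, f) for f in range(3)]

print(len(CODON_TABLE))                      # 64
print(sorted(c for c, aa in CODON_TABLE.items() if aa == "*"))
# ['TAA', 'TAG', 'TGA']
print(three_frames("ATGCTTAGTCTG"))          # ['MLSL', 'CLV', 'A*S']
```

In one-letter codes, M = Met, L = Leu, S = Ser, C = Cys, V = Val, A = Ala. A full six-frame search would also translate the reverse complement, which this sketch omits.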

11
Open Reading Frames (ORFs) are clues for finding
genes
  • Based on random chance alone, the average length
    of an ORF is about 21 codons.
  • Translated peptides are typically 300 amino acids
    in length
  • Long ORFs can be an indication of the presence of
    a coding region
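
The figure of about 21 codons follows from a simple model: if all 64 codons are equally likely, each codon is a stop with probability 3/64, so the expected number of codons read up to and including the first stop is 64/3, roughly 21.3. The sketch below checks this with a seeded simulation. The uniform, independent nucleotide model and the helper name `codons_until_stop` are assumptions made for illustration.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}

def codons_until_stop(seq, start=0):
    """Count codons read from `start` up to and including the first stop codon.
    Returns None if the sequence ends before a stop codon is found."""
    for count, i in enumerate(range(start, len(seq) - 2, 3), start=1):
        if seq[i:i + 3] in STOPS:
            return count
    return None

expected = 64 / 3  # mean of a geometric distribution with p = 3/64

# Simulate a random sequence (assumed uniform, independent nucleotides).
rng = random.Random(0)
seq = "".join(rng.choice("ACGT") for _ in range(300_000))

lengths = []
pos = 0
while (n := codons_until_stop(seq, pos)) is not None:
    lengths.append(n)
    pos += 3 * n  # resume reading in the same frame, just after the stop

simulated = sum(lengths) / len(lengths)
print(f"expected {expected:.1f}, simulated {simulated:.1f}")
```

Under this model an ORF of 300 codons is very unlikely to arise by chance, since the probability of reading 300 codons without a stop is (61/64)^300, which is why long ORFs are useful evidence of a coding region.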