Sequence alignment 1 - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Sequence alignment 1


1
Sequence alignment 1
  • Chitta Baral

2
Sequences and sequence alignment
  • Two main kinds of sequences
  • Sequence of base pairs in DNA molecules
  • (ATCG)
  • Sequence of amino acids in a protein molecule
  • (ACDEFGHIKLMNPQRSTVWXYZ)
  • Two main kinds of sequence alignment
  • Global alignment
  • LGPSSKQTGKGS-SRIWDN
  • LNI-TKSAGKGAIMRLGDA
  • Local alignment
  • ----------TGKG------------------
  • ----------AGKG------------------

3
Importance of sequence alignment
  • Useful for discovering functional, structural and
    evolutionary information.
  • Functional
  • DNA molecules that are very much alike ("similar"
    in sequence-analysis parlance) probably have the
    same regulatory role.
  • Protein molecules that are very much alike
    probably have the same biochemical function.
  • Structural
  • Protein molecules that are very much alike
    probably have the same 3-D structure
  • Evolutionary
  • If two sequences from different organisms are
    similar, then there may have been a common
    ancestor sequence; in that case the sequences
    are defined as being homologous.
  • The alignment indicates the changes that could
    have occurred between the two homologous
    sequences and a common ancestor sequence during
    evolution.

4
Some terminology
  • Homologous: genes that descended from a common
    ancestor are called homologs.
  • Sequence homology is different from sequence
    similarity.
  • The latter is a measure of the matching characters
    in an alignment.
  • Statements such as "the sequences show 50% homology"
    or "the sequences are highly homologous" are
    meaningless: two sequences either share a common
    ancestor or they do not.
  • Orthologous: homologs separated when a lineage
    splits into two species.
  • Paralogous: homologs created when a gene is
    duplicated in a genome.
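
Unlike homology, similarity can be computed directly from an alignment. The following is a minimal sketch, assuming similarity is taken to be the percentage of identical characters among gap-free columns (only one of several definitions in use). The function name `percent_identity` is hypothetical and not part of the original slides.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of gap-free alignment columns whose two characters match.

    Assumption: columns containing a gap in either row are ignored.
    Other definitions divide by the full alignment length instead.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only the columns where both rows carry a residue
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# The local alignment shown on slide 2: TGKG over AGKG
print(percent_identity("TGKG", "AGKG"))  # 75.0
```

A value such as 75% describes similarity only; whether the two sequences are homologous is a separate inference about their ancestry.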

5
Global alignment: Needleman-Wunsch algorithm
  • A dynamic programming algorithm
  • Input
  • Two strings x and y of lengths n and m,
    respectively.
  • A scoring table over the sequence alphabet and a
    gap penalty
  • Output: the alignment with the best score
  • Algorithm terminology
  • $F(i,j)$: the score of the best alignment between
    the initial segments $x_{1 \ldots i}$ and $y_{1 \ldots j}$
  • Boundary values: $F(0,0) = 0$, $F(i,0) = -id$,
    $F(0,j) = -jd$, where $d$ is the gap penalty.
  • $F(i,j)$ is the maximum of
  • $F(i-1, j-1) + s(x_i, y_j)$, where $s(x_i, y_j)$ is
    the matching score between $x_i$ and $y_j$
  • $F(i-1, j) - d$
  • $F(i, j-1) - d$
  • Algorithm steps
  • Fill the table following an appropriate order
  • While filling F(i,j) keep an arrow to the slot
    used in deriving F(i,j)
  • After F(n,m) is determined, trace back and
    construct the alignment.
  • Complexity of the algorithm: $O(nm)$. If $n = m$,
    then $O(n^2)$.
  • Note: with biological sequences and standard
    computers, $O(n^2)$ algorithms are feasible but a
    little slow, while $O(n^3)$ algorithms are only
    feasible for very short sequences.
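
The table fill and traceback described on this slide can be sketched in Python as follows. This is an illustrative sketch, not the presenter's code: the names `needleman_wunsch` and `match_mismatch` are hypothetical, and the +1/-1 scoring function is an assumption that stands in for a real scoring table such as BLOSUM50.

```python
from typing import Callable, Tuple


def match_mismatch(a: str, b: str) -> int:
    """Assumed toy scoring: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


def needleman_wunsch(x: str, y: str,
                     score: Callable[[str, str], int],
                     d: int) -> Tuple[int, str, str]:
    """Global alignment; returns (best score, aligned x, aligned y)."""
    n, m = len(x), len(y)
    # F[i][j]: best score for aligning x[:i] with y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # arrow[i][j]: the neighbouring slot that F[i][j] was derived from
    arrow = [[""] * (m + 1) for _ in range(n + 1)]

    # Boundary values: F(i,0) = -i*d and F(0,j) = -j*d
    for i in range(1, n + 1):
        F[i][0], arrow[i][0] = -i * d, "up"
    for j in range(1, m + 1):
        F[0][j], arrow[0][j] = -j * d, "left"

    # Fill row by row, so all three neighbours are ready when needed
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (F[i - 1][j - 1] + score(x[i - 1], y[j - 1]), "diag"),
                (F[i - 1][j] - d, "up"),
                (F[i][j - 1] - d, "left"),
            ]
            # On ties, max() keeps the earliest candidate in the list
            F[i][j], arrow[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from F(n,m) to F(0,0), building the alignment in reverse
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if arrow[i][j] == "diag":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif arrow[i][j] == "up":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("ACGT", "AGT", match_mismatch, d=2))
# (1, 'ACGT', 'A-GT')
```

The two nested loops visit each of the (n+1)(m+1) slots once, which is where the O(nm) cost comes from. When several alignments share the best score, this sketch returns only one of them.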

6
Part of BLOSUM50 scoring matrix

7
Illustration of Needleman-Wunsch

8
Local alignment: Smith-Waterman algorithm
  • Closely related to the global alignment
    algorithm (few differences).
  • Top row and left column are now filled with 0s.
  • $F(i,j)$ is the maximum of
  • $0$, which means starting a new alignment
  • $F(i-1, j-1) + s(x_i, y_j)$
  • $F(i-1, j) - d$
  • $F(i, j-1) - d$
  • Instead of taking the value in the bottom-right
    corner, $F(n,m)$, as the best score, we look for
    the highest value of $F(i,j)$ over the whole matrix
    and start the traceback from there.
  • Traceback ends when we meet a cell with value 0,
    which corresponds to the start of the alignment.

9
Illustration of Smith-Waterman