Sequence Alignment - PowerPoint PPT Presentation

Title:

Sequence Alignment

Provided by: ryang
Transcript and Presenter's Notes

1
Sequence Alignment
2
Sequence Alignment
  • A procedure for comparing two or more sequences by
    searching for a series of individual characters
    or character patterns that are in the same order
    in the sequences being compared.
  • Two sequences are aligned by writing them in two
    rows, one above the other.
  • Identical (or similar) characters in the same
    column are matches, and non-identical characters
    are mismatches.
  • Gaps can be introduced in either (or both)
    sequences to produce a better alignment.

3
Sequence alignment between two zinc finger
protein sequences
Colors
  • Red: Small (small, hydrophobic (incl. aromatic -Y)) - AVFPMILW
  • Blue: Acidic
  • Magenta: Basic - RHK
  • Green: Hydroxyl, Amine, Basic Q - STYHCNGQ
  • Gray: Others
Symbols
  • * Identical
  • : Conserved substitutions (same color group)
  • . Semi-conserved substitution
Example from Wikipedia, http://en.wikipedia.org/wiki/Sequence_alignment
4
Types of Sequence Alignments
  • Portion of sequences aligned
      • Global alignment - aligns sequences over their
        entire length
          • Dot matrix, Needleman-Wunsch, ClustalW
      • Local alignment - determines the best-scoring
        pair of subsequences, i.e. the pair that gives
        maximum similarity
          • Smith-Waterman, BLAST
  • Number of sequences
      • Pairwise alignment - only two sequences compared
          • Dot matrix, Needleman-Wunsch, Smith-Waterman,
            BLAST
      • Multiple alignment - multiple sequences compared
          • ClustalW, MEME

5
Dot Plot
  • Global, pairwise alignment method
  • Full visual comparison of two sequences
  • Gives the big picture: a visual depiction of the
    relationship between the sequences
  • Steps
      • Create a two-dimensional matrix with one sequence
        along each axis, placing the N-terminal end (in
        the case of proteins) in the top-left corner
      • For each cell, place a dot if the residue of its
        row matches the residue of its column
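
The two steps above can be sketched in Python. This is a minimal illustration, not the code behind the slides; the function names `dot_plot` and `render` are hypothetical, and placing sequence A on the columns and sequence B on the rows is an assumption.

```python
def dot_plot(seq_a, seq_b):
    """Build the dot matrix: cell (i, j) is True when seq_b[i] == seq_a[j].

    Assumption: seq_a runs along the columns and seq_b along the rows,
    with the first residue of each in the top-left corner.
    """
    return [[b == a for a in seq_a] for b in seq_b]


def render(matrix, seq_a, seq_b):
    """Draw the matrix as text, using '*' for a dot and '.' for an empty cell."""
    lines = ["  " + " ".join(seq_a)]
    for residue, row in zip(seq_b, matrix):
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(residue + " " + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    # Comparing a sequence with itself shows the identity diagonal.
    print(render(dot_plot("GATTACA", "GATTACA"), "GATTACA", "GATTACA"))
```

With window size 1 every single-residue identity gets a dot, so short or low-complexity sequences produce many off-diagonal dots as well.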

6
Anatomy of a Dot Plot
Matrix, M, is a two-dimensional grid.
Figure labels: Sequence A, Sequence B; i entries, j entries
We move through M in row-wise fashion: M(i,j)
A cell is the intersection of a row, i, and a column, j.
7
Anatomy of a Dot Plot
In this example, identities were found at M(1,1),
M(2,2), M(3,3).
Connecting the dots, we can see a diagonal, the
identity diagonal
Because the sequences are the same, this is an
intrasequence comparison.
8
Dot Plots
This is an intrasequence comparison (inversion in
sequence A)
Note inversion in this portion of Sequence B
9
Dot Plot Patterns
Figure labels: Main Diagonal; Displaced Main Diagonal;
Gaps (dissimilarity); Similar, but not identical
An indel (insertion/deletion) displaces the main
diagonal parallel to the sequence with the insertion.
10
Dot Plot Patterns
Repeated sequence
Non-self-dotplot (different sequences), tandem
duplication
Self-dotplot (same sequence), tandem duplication
ABCDEFGEFGHIJKLMNO
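
The self-dotplot of the example sequence above can be checked with a short Python sketch that lists the off-diagonal runs. The helper name `repeat_diagonals` and the `min_len` threshold are hypothetical, introduced only for illustration.

```python
def repeat_diagonals(seq, min_len=2):
    """Find off-diagonal runs in a self dot plot.

    Returns (start_i, start_j, length) triples with start_i < start_j,
    meaning seq[start_i:start_i + length] == seq[start_j:start_j + length].
    Only the upper triangle is scanned, because a self dot plot is symmetric.
    """
    n = len(seq)
    runs = []
    for i in range(n):
        for j in range(i + 1, n):
            if seq[i] != seq[j]:
                continue
            # Skip cells that continue a run started earlier on this diagonal.
            if i > 0 and seq[i - 1] == seq[j - 1]:
                continue
            length = 0
            while j + length < n and seq[i + length] == seq[j + length]:
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


if __name__ == "__main__":
    print(repeat_diagonals("ABCDEFGEFGHIJKLMNO"))
```

For this sequence the only run is EFG at positions 4 and 7 (0-based). The offset between the two copies equals the run length, which is the signature of a tandem duplication.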
11
Dot Plot Patterns
Number of diagonals; interval between diagonals
Complex sequence expansion
Inversion (Transposition)
12
Dot Plot Patterns
Palindrome (Intrastrand)
5' GGCGG 3'
Intrasequence comparison is the method of choice for
characterizing internal repeats.
Previous examples from http://bioinformatics.weizmann.ac.il/courses/BCG/lectures/02_pairwise/2.2methods/01dotplots.html
13
Dot Matrices
protein sequences
DNA sequences
14
Random Matches in Dot Matrix
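  • When comparing DNA sequences, two residues match by
    chance with probability 1/4 (assuming the four
    bases are equally frequent)
  • When comparing protein sequences, the probability
    is 1/20 (assuming the 20 amino acids are equally
    frequent)
  • Thus, for comparisons of protein-coding DNA
    sequences, we should translate them to amino
    acids first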
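
The 1/4 and 1/20 figures follow from assuming that all residues are equally frequent. A small Python sketch makes that assumption explicit; the function names are hypothetical, and real sequences have skewed compositions, so the true noise level will differ.

```python
def random_match_probability(frequencies):
    """Chance that two independently drawn residues are identical.

    `frequencies` lists the probability of each residue type; the result
    is the sum of the squared frequencies.
    """
    return sum(p * p for p in frequencies)


def expected_random_dots(len_a, len_b, match_probability):
    """Expected number of chance dots in a window-size-1 dot matrix."""
    return len_a * len_b * match_probability


if __name__ == "__main__":
    dna = random_match_probability([1 / 4] * 4)
    protein = random_match_probability([1 / 20] * 20)
    print(dna, protein)
    # Noise expected when comparing two unrelated sequences of length 300.
    print(expected_random_dots(300, 300, dna))
    print(expected_random_dots(300, 300, protein))
```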

15
To Reduce Random Noise in Dot Matrix
  • Specify a window size, w
      • Look at w consecutive residues from each of the
        two sequences
  • Specify a stringency
      • Among the w pairs of residues, count how many
        pairs match within the window, and place a dot
        only if the count reaches the stringency
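
The window and stringency filter can be sketched in Python as follows. The function name `windowed_dot_plot` is hypothetical, and recording each dot at the start position of its window is an assumption; some tools place the dot at the window centre instead.

```python
def windowed_dot_plot(seq_a, seq_b, window=3, stringency=2):
    """Return the set of (i, j) window start positions that earn a dot.

    A dot is placed when at least `stringency` of the `window` residue
    pairs seq_b[i + k] / seq_a[j + k] are identical.
    """
    dots = set()
    for i in range(len(seq_b) - window + 1):
        for j in range(len(seq_a) - window + 1):
            matches = sum(seq_b[i + k] == seq_a[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots


if __name__ == "__main__":
    # window=1, stringency=1 reproduces the simple dot matrix.
    print(sorted(windowed_dot_plot("ACGT", "ACGT", window=1, stringency=1)))
    print(sorted(windowed_dot_plot("ACGT", "ACGT", window=3, stringency=3)))
```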

16
Simple Dot Matrix, Window Size 1
17
Window Size is 3
18
Window Size is 3, Stringency is 2
19
DNA Sequences
single residue identity
16 out of 23 identical
20
Protein Sequences
single residue identity
6 out of 23 identical
21
Two examples of dotplots
  • http://emboss.umdnj.edu
      • Dottup
  • http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
      • Dotlet - Java-based, interactive
  • Rat cytochrome c
      • Protein NP_036971
      • retrieve the mRNA based on this protein
      • Genomic DNA NW_047691 2934400-2934900
  • Human zinc finger
      • S52507
      • S52508

22
Two examples of dotplots
  • http://emboss.umdnj.edu
      • Dottup
  • http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
      • Dotlet - Java-based, interactive
  • HOXB4
      • Chicken
      • NM_205294, NW_001471737 5341000 - 5359000
  • HOXD8
      • Chicken
      • NM_205354, NW_001471688 1907000 - 1910000

23
Dot Plot
  • Advantages
      • All possible matches of residues between two
        sequences are found
      • Good for finding direct and inverted repeats
      • Allows for fast visual inspection
  • Disadvantages
      • Random matches cause noise
      • A computer cannot detect diagonals visually
      • Diagonals can be missed by visual inspection
      • Impractical for a large number of comparisons
  • Conclusions
      • For DNA comparisons
          • Use long windows and high stringencies
      • For protein comparisons
          • Use short windows and low stringencies
          • For a short domain of partial similarity, use a
            longer window and a low stringency

24
Needleman-Wunsch
  • Global, pairwise alignment method
  • Uses a technique called dynamic programming
      • i.e., it is guaranteed to find the alignment
        giving the maximum score between two sequences
  • Works well for aligning sequences that are
    similar and of roughly equal length
  • Steps
      • Construct a matrix similar to a dot plot of the
        two sequences
      • Assign similarity scores to each cell in the
        matrix
      • Trace back through the scores in the matrix to
        find the optimal path
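
The three steps can be sketched in Python as follows. This is a minimal illustration with a linear gap score, not the implementation used for the slides; the function name and the default scores (match 1, mismatch -1, gap -1) are assumptions, and when several alignments tie only one of them is returned.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1

    # Step 1: build the matrix; the first row and column hold leading gaps.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Step 2: fill each cell from its diagonal, upper and left neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Step 3: trace back from the bottom-right corner to the top-left.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("GATTACA", "GCATGCG"))
```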

25
Similarity Scores
  • A method of assigning a score to the alignment of
    two amino acids or two DNA bases with each other
  • Represented in a matrix similar to this

       A    G    C    T
  A    1   -3   -3   -3
  G   -3    1   -3   -3
  C   -3   -3    1   -3
  T   -3   -3   -3    1
26
Needleman-Wunsch Example
27
Needleman-Wunsch Example
28
Needleman-Wunsch Example
29
Smith-Waterman
  • Based on Needleman-Wunsch
  • Instead of looking at each sequence in its
    entirety, compare segments of all possible
    lengths and choose whichever pair optimizes the
    similarity measure
  • Assign a negative score to a mismatch, and a
    negative score to each insertion/deletion that
    depends on its length
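
A minimal Python sketch of the local variant is shown below. Compared with a global alignment it differs in two ways: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The function name and the linear gap score are assumptions made for brevity.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment of a and b; returns (score, segment_a, segment_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart at any cell.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTTT", "GGGACGGGG"))
```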

30
Linear Scores
  • Match 2, Mismatch -1, Gap -2

G A A T T C C G T T A
G G A T _ C _ G _ _ A
  • Changing the size of the gaps doesn't affect the
    score

G A A T T C C G T T A
G G A T _ _ C G _ _ A
31
Affine Gap Penalties
  • Match 2, Mismatch -1, Gap Opening -2, Gap
    Extension -1

G A A T T C C G T T A
G G A T _ C _ G _ _ A
  • Changing the size of the gaps does affect the score

G A A T T C C G T T A
G G A T _ _ C G _ _ A
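
The effect of the two scoring schemes can be checked by scoring the alignments directly. The Python sketch below reads the two rows as GAATTCCGTTA aligned against GGAT_C_G__A and GGAT__CG__A. It assumes that a gap of length k costs opening + k × extension; the slides do not state whether the opening penalty already covers the first gap position, so the affine totals depend on that assumption. The function name is hypothetical.

```python
def score_alignment(top, bottom, match=2, mismatch=-1,
                    gap_open=0, gap_extend=-2, gap_char="_"):
    """Score an existing alignment column by column.

    Assumed convention: a run of k gap columns costs gap_open + k * gap_extend.
    With gap_open=0 this reduces to a linear gap score. Simplification: a gap
    in one row directly followed by a gap in the other counts as one run.
    """
    total = 0
    in_gap = False
    for x, y in zip(top, bottom):
        if x == gap_char or y == gap_char:
            if not in_gap:
                total += gap_open  # charged once per gap run
            total += gap_extend    # charged for every gap column
            in_gap = True
        else:
            total += match if x == y else mismatch
            in_gap = False
    return total


if __name__ == "__main__":
    top = "GAATTCCGTTA"
    for bottom in ("GGAT_C_G__A", "GGAT__CG__A"):
        linear = score_alignment(top, bottom)
        affine = score_alignment(top, bottom, gap_open=-2, gap_extend=-1)
        print(bottom, linear, affine)
```

Both alignments have six matches, one mismatch and four gap columns, so the linear score is the same for each. The second alignment opens two gaps instead of three, so it scores higher under the affine scheme.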
32
Affine Gap Penalties
  • Affine gap penalties provide an incentive for the
    alignment algorithm to keep sequence together
    where possible rather than inserting large
    numbers of small gaps
  • W_k = 1 + (1/3)k for a gap of length k
  • Opening a gap (k = 1) costs 1 + 1/3
  • Each additional gap position adds 1/3

33
Illustration of Dynamic Programming
34
Intuition of Dynamic Programming
If we already have the optimal solution to

    XY
    AB

then we know the next pair of characters will be one of

    XYZ      XY-      XYZ
    ABC  or  ABC  or  AB-

(where - indicates a gap). So we can extend the match by
determining which of these has the highest score.
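
Written as a recurrence, this intuition gives the usual cell update for a global alignment with a linear gap score. The symbols are notation assumed here, not taken from the slides: F(i,j) is the best score for aligning the first i characters of one sequence with the first j characters of the other, s is the similarity score of a pair of characters, and g is the (negative) score of a gap position.

```latex
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(a_i, b_j) & \text{extend with a pair of characters} \\
F(i-1,\,j) + g             & \text{extend with } a_i \text{ against a gap} \\
F(i,\,j-1) + g             & \text{extend with } b_j \text{ against a gap}
\end{cases}
\qquad
F(0,0) = 0,\quad F(i,0) = i\,g,\quad F(0,j) = j\,g
```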
35
Illustration of Gotoh's Algorithm
36
Gotoh
  • Local, pairwise alignment method
  • Uses a technique called dynamic programming
      • i.e., it is guaranteed to find the alignment
        giving the maximum score between two sequences
  • De facto standard for performing local alignments
  • Steps
      • Construct a matrix similar to a dot plot of the
        two sequences
      • Assign similarity scores to each cell in the
        matrix
      • Trace back through the scores in the matrix to
        find the optimal path
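
Gotoh's algorithm is best known for handling affine gap penalties in time proportional to the product of the sequence lengths. It does so by keeping two extra matrices for alignments that end in a gap. The Python sketch below applies that idea to local alignment and returns the score only. The slides do not give the recurrences, so the three-matrix formulation, the function name and the default scores are assumptions, as is the convention that a gap of length k costs gap_open + k * gap_extend.

```python
def gotoh_local_score(a, b, match=2, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Best local alignment score of a and b under affine gap penalties.

    H[i][j]: best score of an alignment ending at a[i-1], b[j-1] in any state
    E[i][j]: best score ending with a gap in a (b[j-1] against a gap)
    F[i][j]: best score ending with a gap in b (a[i-1] against a gap)
    """
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    E = [[neg_inf] * cols for _ in range(rows)]
    F = [[neg_inf] * cols for _ in range(rows)]
    best = 0

    for i in range(1, rows):
        for j in range(1, cols):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] + gap_extend,
                          H[i][j - 1] + gap_open + gap_extend)
            F[i][j] = max(F[i - 1][j] + gap_extend,
                          H[i - 1][j] + gap_open + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 makes the alignment local, as in Smith-Waterman.
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # Eight matches (16) minus one gap of length 2 (2 + 2) gives 12.
    print(gotoh_local_score("ACGTACGT", "ACGTTTACGT"))
```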

37
Example: match 1, mismatch -1, gap -1
38
Example: match 1, mismatch -1, gap -1
39
Example: match 1, mismatch -1, gap -1
40
Example: match 1, mismatch -1, gap -1
41
Example: match 1, mismatch -1, gap -1
42
Example: match 1, mismatch -1, gap -1
43
Example: match 1, mismatch -1, gap -1
44
Example: match 1, mismatch -1, gap -1