Sequence Alignments - PowerPoint PPT Presentation

Slides: 25
Provided by: cclo

1
Sequence Alignments
  • BIOL/CHEM 3100

2
Reading
  • Chapter 2 in your textbook

3
Sequence Alignments
  • Alignment between two or more nucleotide or amino
    acid sequences
  • Similarity between sequences
  • What can this tell you?
  • In this chapter
  • How do we align two or more sequences?
  • How do we evaluate these alignments?
  • What conclusions can we make based on these
    alignments?

4
Dot Plots
  • Used to visualize regions of similarity
  • One sequence placed on the x-axis, the other on
    the y-axis
  • Dots are placed in the plot where the two
    sequences are identical
  • Diagonal lines in plot indicate regions of
    similarity
  • Example: compare ATCG to GATC
  • Advantages: easy, quick
  • Disadvantages: only gives regions of similarity,
    not the actual alignment
  • What would plot look like with longer sequences?
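The dot plot described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's DotPlot program; the function names `dot_plot` and `render` are made up for this example.

```python
def dot_plot(seq_x, seq_y):
    """Return a grid (list of rows) that is True where residues are identical.

    seq_x runs along the x-axis (columns), seq_y along the y-axis (rows).
    """
    return [[x == y for x in seq_x] for y in seq_y]


def render(grid, seq_x, seq_y):
    """Format the grid as text, using '*' for a dot and '.' for no dot."""
    lines = ["  " + " ".join(seq_x)]
    for residue, row in zip(seq_y, grid):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


grid = dot_plot("ATCG", "GATC")
print(render(grid, "ATCG", "GATC"))
```

The shared ATC shows up as a diagonal run of three dots, one row below the top-left corner because GATC has an extra leading G.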

5
Noise in Dot Plots
  • Control by adjusting the following
  • Window size
  • Similarity cutoff
  • Removing too much noise might conceal small
    region of similarity
  • Example: GCTAGTCAGA and GATGGTCACA

Complete this plot!
  • Window of 1, similarity cutoff of 1
  • Window of 4, similarity cutoff of 3
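One way to see how window size and similarity cutoff filter noise is the sketch below. It assumes a dot is recorded at the start position of each window in which at least `cutoff` of the `window` residue pairs are identical; other programs may center the dot or count differently. The function name `windowed_dot_plot` is hypothetical.

```python
def windowed_dot_plot(seq_x, seq_y, window, cutoff):
    """Return the set of (i, j) window start positions that earn a dot.

    A dot is placed when at least `cutoff` of the `window` residue pairs
    seq_x[i + k], seq_y[j + k] (k = 0 .. window - 1) are identical.
    """
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            if matches >= cutoff:
                dots.add((i, j))
    return dots


seq_x, seq_y = "GCTAGTCAGA", "GATGGTCACA"
noisy = windowed_dot_plot(seq_x, seq_y, window=1, cutoff=1)
filtered = windowed_dot_plot(seq_x, seq_y, window=4, cutoff=3)
print(len(noisy), "dots with window 1, cutoff 1")
print(len(filtered), "dots with window 4, cutoff 3")
```

With a window of 4 and a cutoff of 3, the isolated match at the very first position no longer earns a dot, while the well-conserved stretch in the middle of the main diagonal still does.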
6
Dot Plots in Excel
7
Try the DotPlot Program
  • Download the program from this link
  • It will automatically save the program and
    several files to your desktop
  • Open DotPlot application
  • Load sequences as FASTA text files
  • File, Open Horizontal, Browse
  • File, Open Vertical, Browse
  • Parameters menu changes length and cutoff
  • Draw, Identities shows plot
  • Clear the screen after changing parameters to
    visualize the new plot
  • Example: Bos taurus and porcine myoglobin mRNA
    sequences (sequences on course website)

8
Simple Alignments
  • Molecular changes occur when organisms evolve
  • Mutation
  • Most common
  • Insertion
  • Deletion
  • Gaps in alignments
  • Added to account for insertions/deletions
  • Goal: to obtain the optimal alignment
  • Most likely to represent the true relationship
    between homologous sequences
  • Consider the following sequences: AATCTATA and
    AAGATA
  • Either 2 insertions in first sequence or 2
    deletions in second sequence
  • What is the optimal alignment?

9
  • If no gaps allowed, there are three ways the
    sequences can be aligned
  • AATCTATA    AATCTATA    AATCTATA
  • AAGATA       AAGATA       AAGATA
  • Which alignment is optimal?
  • Scoring alignments
  • Match score: credit for an identical aligned pair
  • Mismatch score: penalty for nonidentical
    residues
  • Total score: sum of match and mismatch scores
  • Higher score = better alignment
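The ungapped scoring above can be sketched as follows. The values match = 1 and mismatch = 0 are assumptions borrowed from the later worked example, and `score_ungapped` is a hypothetical helper name.

```python
def score_ungapped(long_seq, short_seq, offset, match=1, mismatch=0):
    """Score short_seq placed `offset` positions into long_seq, with no gaps."""
    overlap = long_seq[offset:offset + len(short_seq)]
    return sum(match if a == b else mismatch for a, b in zip(overlap, short_seq))


long_seq, short_seq = "AATCTATA", "AAGATA"
scores = {
    offset: score_ungapped(long_seq, short_seq, offset)
    for offset in range(len(long_seq) - len(short_seq) + 1)
}
print(scores)  # {0: 4, 1: 1, 2: 3}
```

Under these assumed scores, the placement with no offset scores highest of the three ungapped alignments.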

10
  • If gaps are allowed, there are many more ways the
    sequences can be aligned
  • Three examples
  • AATCTATA AATCTATA AATCTATA
  • AAG-AT-A AA-G-ATA AA--GATA
  • Scoring must now account for gaps
  • Gap penalty: penalty for each residue aligned
    with a gap (-)
  • Total score = match + mismatch + gap penalty

11
  • If match = 1, mismatch = 0, and gap penalty = -1,
    what are the scores for these three alignments?
  • AATCTATA AATCTATA AATCTATA
  • AAG-AT-A AA-G-ATA AA--GATA
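A short script makes it easy to check your answers to the question above. It is a minimal sketch using the stated values (match = 1, mismatch = 0, gap penalty = -1); the function name `score_alignment` is made up for this example.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Sum match, mismatch, and per-residue gap scores over aligned columns."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap          # residue aligned with a gap
        elif a == b:
            total += match        # identical aligned pair
        else:
            total += mismatch     # nonidentical aligned pair
    return total


top = "AATCTATA"
results = {b: score_alignment(top, b) for b in ("AAG-AT-A", "AA-G-ATA", "AA--GATA")}
print(results)  # {'AAG-AT-A': 1, 'AA-G-ATA': 3, 'AA--GATA': 3}
```

The second and third alignments tie, which is the motivation for distinguishing one long gap from several short ones.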

12
Gap Penalties
  • Is it more likely to have one longer
    insertion/deletion, or multiple smaller ones?
  • Two types of gap penalties
  • Length penalty
  • Penalty for each residue aligned with -
  • Origination penalty
  • Penalty for presence of a gap
  • Allows differentiation between alignments with
    many short gaps and those with fewer, longer gaps
  • Further penalizes for rare insertion/deletion
    (indel) events

13
  • If match = 1, mismatch = 0, length penalty = -1,
    and origination penalty = -2, what are the scores
    for these three alignments?
  • AATCTATA AATCTATA AATCTATA
  • AAG-AT-A AA-G-ATA AA--GATA
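The two-part gap penalty can be sketched as below, using the stated values (match = 1, mismatch = 0, length penalty = -1, origination penalty = -2). The function name is hypothetical, and the sketch assumes a new gap starts whenever the gap switches from one sequence to the other.

```python
def score_with_origination(top, bottom, match=1, mismatch=0,
                           length_penalty=-1, origination_penalty=-2):
    """Score an alignment with separate gap origination and length penalties."""
    total = 0
    open_gap = None  # which sequence currently holds an open gap, if any
    for a, b in zip(top, bottom):
        side = "top" if a == "-" else "bottom" if b == "-" else None
        if side is None:
            total += match if a == b else mismatch
        else:
            if side != open_gap:
                total += origination_penalty  # charged once per gap
            total += length_penalty           # charged for every gap position
        open_gap = side
    return total


top = "AATCTATA"
results = {b: score_with_origination(top, b)
           for b in ("AAG-AT-A", "AA-G-ATA", "AA--GATA")}
print(results)  # {'AAG-AT-A': -3, 'AA-G-ATA': -1, 'AA--GATA': 1}
```

The alignment with a single two-residue gap now scores highest, because it pays the origination penalty only once.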

14
Terminal Gaps
  • Might not actually be indels
  • Data could be incomplete
  • Sometimes ignored in scoring
  • AATCTATAGC
  • AAG--ATA--
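To see the effect of ignoring terminal gaps, the sketch below scores the example above both ways. The scores (match = 1, mismatch = 0, gap = -1) are assumed from the earlier worked example, and both helper names are hypothetical.

```python
def trim_terminal_gaps(top, bottom):
    """Drop leading and trailing columns in which either sequence has a gap."""
    def is_gap_column(i):
        return top[i] == "-" or bottom[i] == "-"

    start, end = 0, len(top)
    while start < end and is_gap_column(start):
        start += 1
    while end > start and is_gap_column(end - 1):
        end -= 1
    return top[start:end], bottom[start:end]


def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Sum match, mismatch, and per-residue gap scores over aligned columns."""
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap
        else:
            total += match if a == b else mismatch
    return total


top, bottom = "AATCTATAGC", "AAG--ATA--"
with_terminal = score_alignment(top, bottom)
without_terminal = score_alignment(*trim_terminal_gaps(top, bottom))
print(with_terminal, without_terminal)  # 1 3
```

Only the two trailing gap columns are dropped; the internal gap is still penalized.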

15
Mismatch Penalties
  • Different mismatch scores depending on particular
    nucleotide or amino acid that is mismatched
  • Reward mismatches that are more likely to occur
    (common substitutions)
  • Nucleotides
  • Purine vs. pyrimidine
  • Transitions vs. transversions

16
Scoring Matrices
  • Show scores for all non-gap positions in
    alignment
  • For nucleotide sequences
  • Identity (sparse)
  • BLAST
  • Transition/transversion
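A transition/transversion matrix can be built from the purine/pyrimidine classes. The sketch below uses illustrative values (match = 1, transition = -1, transversion = -5) that are assumptions, not values taken from these slides; `substitution_score` is a hypothetical helper name.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(a, b, match=1, transition=-1, transversion=-5):
    """Score a nucleotide pair; the default values are illustrative only."""
    if a == b:
        return match
    # A transition stays within one chemical class (purine or pyrimidine).
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


bases = "ACGT"
matrix = {(a, b): substitution_score(a, b) for a in bases for b in bases}
for a in bases:
    print(a, [matrix[(a, b)] for b in bases])
```

Setting `transition` and `transversion` to the same value gives a matrix that treats all mismatches alike, as the identity matrix does.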
17
Matrices for Proteins
  • Amino acids
  • 1. Structure and properties
  • Substitution of similar AAs
  • more likely to retain protein function
    (conservative substitution)
  • 2. Genetic code
  • Minimum number of nucleotide substitutions needed
    to convert a codon

18
Matrices for Proteins
  • 3. Actual observed substitution rates
  • Point accepted mutation (PAM)
  • Alignment constructed with high similarity (>85%)
  • Calculate relative mutability (m_j)
  • Number of times one amino acid (j) is substituted
    by any other
  • Calculate specific substitution (A_ij)
  • Number of times j is substituted by a specific
    amino acid i
  • See Box 2.1 (page 40)

19
PAM Example
  • Ambiguities
  • X: ambiguous amino acid
  • B: Asn or Asp
  • Z: Gln or Glu
  • Algorithms differ in how they score ambiguities:
    some count them as identical, others ignore them
  • If the sequence has many ambiguities, scores
    may not be reliable with certain types of software
  • Identical amino acids: highest score
  • Conservative substitution: next highest score
  • Non-conservative substitution: lowest score

20
PAM Matrices
  • PAM matrix is normalized to represent
    substitution over a fixed period of evolutionary
    change
  • PAM-1
  • 1 substitution per 100 residues
  • Matrix represents probability of AA substitution
    in the time it takes for 1% of all residues to be
    substituted
  • Used to compare sequences that are closely
    related
  • PAM-1000
  • Used for sequences with distant relationships
  • PAM-250
  • Commonly used middle ground

21
BLOSUM Matrix
  • Also derived from observing substitution rates in
    proteins
  • Looks at clusters of amino acid sequences
  • Lower numbered matrices used for more distantly
    related sequences
  • BLOSUM-45 vs. BLOSUM-80
  • BLOSUM-62 is default

22
PAM and BLOSUM
  • Ordered from less divergent to more divergent
  • BLOSUM 80, BLOSUM 62, BLOSUM 45
  • PAM 1, PAM 250, PAM 1000
23
Types of Scores
  • Raw Score
  • Protein and nucleotide alignments
  • Sum the scores for matches, mismatches, and gaps
  • Percent identities
  • Protein and nucleotide alignments
  • Ratio of residues that match up in both sequences
    to total number of residues compared
  • Percent positives
  • Protein alignments only
  • Matrix values ≥ 1 are called positives
  • Ratio of positive values to total number of
    residues compared

24
An Example
  • Alignment of mouse and crayfish trypsin
  • Raw score
  • Identities
  • Positives

Mouse     I  V  G  G  Y  N  C  E  E  N  S  V  P  Y  Q
Score     5  4  5  5 -3  2 -2  2  3  0  0 -1  6 10  4
Crayfish  I  V  G  G  T  D  A  V  L  G  E  F  P  Y  Q
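The three score types can be computed directly from the alignment above. This is a minimal sketch: the per-position scores are copied from the slide, and a "positive" is assumed to mean a matrix value of at least 1.

```python
mouse = "IVGGYNCEENSVPYQ"
crayfish = "IVGGTDAVLGEFPYQ"
pair_scores = [5, 4, 5, 5, -3, 2, -2, 2, 3, 0, 0, -1, 6, 10, 4]

compared = len(pair_scores)
raw_score = sum(pair_scores)                                # sum of all scores
identities = sum(a == b for a, b in zip(mouse, crayfish))   # identical pairs
positives = sum(s >= 1 for s in pair_scores)                # values of 1 or more

print("Raw score:", raw_score)                     # Raw score: 40
print(f"Identities: {identities}/{compared}")      # Identities: 7/15
print(f"Positives: {positives}/{compared}")        # Positives: 10/15
```

Positives exceed identities because conservative substitutions with a score of at least 1 are counted alongside the exact matches.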