Basic terms: - PowerPoint PPT Presentation

Slides: 39
Provided by: CarlJ154
Learn more at: https://udel.edu
Tags: alignment | basic | gene | terms

Transcript and Presenter's Notes

Title: Basic terms:


1
Basic terms
  • Similarity - a measurable quantity.
  • Similarity - applied to proteins using the concept of
    conservative substitutions.
  • Identity - expressed as a percentage.
  • Homology - a specific term indicating relationship by
    evolution.
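As a rough illustration of identity as a percentage, the sketch below counts identical columns in two already-aligned sequences. The function name `percent_identity`, the use of `-` as the gap character, and the choice to ignore gapped columns in the denominator are assumptions made for this example; other conventions for the denominator exist.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of ungapped alignment columns that hold identical residues.

    Assumes both inputs are already aligned (same length, '-' marks a gap).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    print(percent_identity("GATC", "GTTC"))
```

Identity is a count, so it can be quoted as a number; homology is a yes-or-no statement about ancestry and cannot.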

3
Basic terms
  • Orthologs: homologous sequences found in two or
    more species that typically have the same function (e.g.
    alpha-hemoglobin).
  • Paralogs: homologous sequences found in the same
    species that arose by gene duplication (e.g. alpha
    and beta hemoglobin).

7
Pairwise comparison
  • Dotplot
  • All against all comparison.
  • Every position is compared with every other
    position.
  • Nucleic acids and proteins have polarity.
  • Typically only one direction makes biological
    sense.
  • 5' to 3', or amino terminus to carboxyl terminus.

8
Simple plot
  • Window: size of the sequence block used for
    comparison. In the previous example,
    window = 1.
  • Stringency: number of matches required to score
    positive. In the previous example,
    stringency = 1 (an exact match was required).

9
DotPlot
WINDOW = 4, STRINGENCY = 2
Sequence: GATCGTACCATGGAATCGTCCAGATCA
The window GATC is compared at different registers (matches/window):
  GATC  + (4/4)
  GATC  - (0/4)
  GATC  - (0/4)
  GATC  + (2/4)

10
Dot Plot
  • Compare two sequences in every register.
  • Vary the size of the window and the stringency depending upon
    the sequences being compared.
  • For nucleotide sequences, typically start with
    window = 21, stringency = 14.
  • For proteins, start with a smaller window: window = 3,
    stringency = 1 or 2.
  • It is important to test different stringencies.
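The window and stringency parameters can be sketched in a few lines of Python. This is a minimal illustration, not the program used for the slides: the function names `dotplot` and `render` are made up for the example, and the plot is drawn as plain text with `*` for a positive score.

```python
def dotplot(seq_a, seq_b, window=1, stringency=1):
    """Return the set of (i, j) where the windows starting at seq_a[i] and
    seq_b[j] agree in at least `stringency` of their `window` positions."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the plot as text: seq_a across the top, seq_b down the side."""
    lines = ["  " + seq_a]
    for j, residue in enumerate(seq_b):
        row = "".join(
            "*" if (i, j) in dots else "." for i in range(len(seq_a))
        )
        lines.append(residue + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    seq = "GATCGTACCATGGAATCGTCCAGATCA"
    # Self-comparison with a small window, for a quick look at the diagonal.
    print(render(seq, seq, dotplot(seq, seq, window=4, stringency=3)))
```

Calling `dotplot(seq, "GATC", window=4, stringency=2)` on the sequence from the DotPlot slide scores offsets 0 (4/4) and 4 (2/4) as positive and offsets 1 and 2 (0/4) as negative. The nested loops make the cost proportional to the product of the two sequence lengths times the window size, which is fine for a sketch but slow for long sequences.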

11
Intergenic comparison
  • Nucleotide sequence contains three domains.
  • 50-350: strong conservation
  • An indel places the comparison out of register
  • 450-1300: slightly weaker conservation
  • 1300-2400: strong conservation

21
Scoring Alignments
  • Quality Score
  • $$\text{Quality} = 10 \times (\text{matches}) - 1 \times (\text{mismatches}) - (\text{Gap Creation Penalty}) \times (\text{number of gaps}) - (\text{Gap Extension Penalty}) \times (\text{total length of gaps})$$
  • The scoring scheme incorporates an evolutionary
    model:
  • Matches are conserved.
  • Mismatches are divergences.
  • Gaps are more likely to disrupt function, hence
    a greater penalty than for a mismatch.
  • Introduction of a gap (indel) is penalized more than
    extension of a gap.
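The Quality formula can be evaluated directly on a finished alignment. The sketch below assumes the alignment is given as two equal-length strings with `-` for gap positions; the function name `quality_score` and the default penalties (gap creation 5, gap extension 1) are illustrative assumptions, chosen only so that a gap costs more than a mismatch and opening a gap costs more than extending it.

```python
def quality_score(aligned_a, aligned_b, match=10, mismatch=1,
                  gap_creation=5, gap_extension=1):
    """Quality = match*matches - mismatch*mismatches
                 - gap_creation*(number of gaps)
                 - gap_extension*(total length of gaps)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gap_count = gap_length = 0
    gapped_side = None  # which sequence is currently inside a gap, if any
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if side != gapped_side:
                gap_count += 1      # a new gap is being opened
            gap_length += 1         # every gap column adds to total length
            gapped_side = side
        else:
            gapped_side = None
            if x == y:
                matches += 1
            else:
                mismatches += 1
    return (match * matches - mismatch * mismatches
            - gap_creation * gap_count - gap_extension * gap_length)


if __name__ == "__main__":
    print(quality_score("GATTACA", "GA-TACA"))
```

With these defaults, one gap of length 2 costs 5 + 2 = 7, while two separate gaps of length 1 cost 2 × (5 + 1) = 12, which reflects the model that a single indel event is more plausible than several independent ones.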

22
Z Score (standardized score)
  • $$Z = \frac{\text{Score}_{\text{alignment}} - \text{Average Score}_{\text{random}}}{\text{Standard Deviation}_{\text{random}}}$$

23
Quality Score Randomization
  • The program takes a sequence and randomizes it X times
    (user-selected).
  • It determines the average quality score and standard
    deviation obtained with the randomized sequences.
  • Compare the randomized scores with the Quality score to
    help determine whether the alignment is potentially
    significant.
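The randomization procedure can be sketched as follows. To keep the example self-contained, `simple_score` is an ungapped position-by-position score standing in for a real alignment program; it, the function name `z_score`, and the `n_shuffles` and `seed` parameters are assumptions for illustration.

```python
import random
import statistics


def simple_score(a, b, match=10, mismatch=1):
    """Ungapped stand-in for an alignment score: compare position by position."""
    return sum(match if x == y else -mismatch for x, y in zip(a, b))


def z_score(seq_a, seq_b, score_fn=simple_score, n_shuffles=100, seed=0):
    """Z = (observed score - mean of shuffled scores) / st. dev. of shuffled scores."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = score_fn(seq_a, seq_b)
    letters = list(seq_b)
    random_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)   # randomize order, keep composition
        random_scores.append(score_fn(seq_a, "".join(letters)))
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("randomized scores have zero spread; Z is undefined")
    return (observed - mean) / sd


if __name__ == "__main__":
    seq = "GATCGTACCATGGAATCGTCCAGATCA"
    print(round(z_score(seq, seq), 1))
```

Shuffling keeps the composition of the sequence and destroys only its order, so the randomized scores estimate what an alignment of unrelated sequences with the same composition would score.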

24
Randomization
  • It has become clear that:
  • Sequences appear to evolve in a word-like
    fashion.
  • The 26 letters of the alphabet are combined to make
    words.
  • Words actually communicate information.
  • Randomization should therefore occur at the level
    of strings of nucleotides (2-4).
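One simple reading of word-level randomization is to cut the sequence into consecutive words and shuffle the words rather than the individual letters. The sketch below takes that reading; the function name `shuffle_words` and its parameters are hypothetical, and real programs may define words differently (for example, by preserving overlapping k-mer frequencies).

```python
import random


def shuffle_words(seq, word_size=3, seed=None):
    """Shuffle a sequence in blocks of `word_size` letters.

    Letters inside each block keep their order; only the order of the
    blocks changes. If len(seq) is not a multiple of word_size, the final
    short block is shuffled along with the others.
    """
    rng = random.Random(seed)
    words = [seq[i:i + word_size] for i in range(0, len(seq), word_size)]
    rng.shuffle(words)
    return "".join(words)


if __name__ == "__main__":
    print(shuffle_words("GATCGTACCATG", word_size=3, seed=1))
```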

28
Global Alignment
  • Global: compares all possible alignments of two
    sequences and presents the one with the greatest
    number of matches and the fewest gaps.
  • The alignment runs from one end of the longer
    sequence to the other end.
  • Best for closely related sequences.
  • Can miss short regions of strongly conserved
    sequence.
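A common way to find the best end-to-end alignment without enumerating every possibility is Needleman-Wunsch dynamic programming. The slides do not name the algorithm, so treat the sketch below as one standard approach rather than the method behind the slides. For brevity it uses a single linear gap penalty instead of separate creation and extension penalties; the function name `global_align` and the default scores are assumptions.

```python
def global_align(a, b, match=10, mismatch=-1, gap=-5):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1], b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to the top-left corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(global_align("GATTACA", "GATACA"))
```

Because the traceback always starts in the last cell and ends in the first, every residue of both sequences appears in the output, which is why a short conserved region can be swamped by poorly matching flanks.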

31
Local Alignment
  • Identifies segments of alignment with the highest
    possible score.
  • Aligns the sequences, then extends aligned regions in both
    directions until the score falls to zero.
  • Best for comparing sequences whose relationship
    is unknown.
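The idea of stopping where the score falls to zero is captured by the Smith-Waterman algorithm, sketched below as one standard realization of local alignment; the slides do not say which program or algorithm was used. The function name `local_align`, the linear gap penalty, and the default scores are assumptions chosen so that mismatches and gaps are costly enough to end an alignment.

```python
def local_align(a, b, match=10, mismatch=-9, gap=-15):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) for the best-scoring segment pair.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Scores are floored at zero, so a bad region resets the alignment.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the highest-scoring cell until the score reaches zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(local_align("CCCCGATTACACCCC", "TTTGATTACATTT"))
```

In the usage example the two sequences share only the segment GATTACA, and the unrelated flanks are left out of the reported alignment.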

32
[Figure: Global Alignment compared with Local Alignment]

33
Blast 2
  • Basic Local Alignment Search Tool
  • E (expect) value: number of hits expected by random chance
    in a database of the same size.
  • A larger numerical value means lower significance.
  • Example: HIV sequence

38
  • Both global and local alignment programs will
    (almost) always give a match.
  • It is important to determine whether the match is
    biologically relevant.
  • Not necessarily relevant: low-complexity regions.
  • Sequence repeats (glutamine runs)
  • Transmembrane regions (high in hydrophobic residues)
  • If working with coding regions, you are typically
    better off comparing protein sequences, which have greater
    information content.
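One rough way to flag low-complexity stretches such as glutamine runs before trusting a match is to measure the Shannon entropy of the residue composition in a sliding window. This is an illustrative technique, not one named in the slides; the function names, the window size, and the 1.0-bit threshold are all assumptions.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Entropy (in bits) of the residue composition of a non-empty segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def low_complexity_windows(seq, window=12, threshold=1.0):
    """Start positions of windows whose composition entropy is below threshold."""
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) < threshold]


if __name__ == "__main__":
    protein = "ACDEFGHIKL" + "Q" * 12   # made-up sequence with a glutamine run
    print(low_complexity_windows(protein))
```

A window made of a single repeated residue has entropy 0, while a window of 12 different residues has entropy log2(12), about 3.6 bits, so a match that falls mostly in low-entropy windows deserves extra scrutiny.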