Alignment methods - PowerPoint PPT Presentation

Provided by: jmom

1
Alignment methods
  • April 17, 2007
  • Quiz 1: Question on databases
  • Learning objectives: Understand the difference
    between identity, similarity and homology.
    Understand how PAM scoring matrices are
    constructed. Understand the difference between
    global alignment and local alignment. Knowledge
    of the Dotter software program.
  • Workshop: Import sequences of interest from
    GenBank, place them in FASTA format, and align
    the sequences using the Dotter program.
  • Homework 4 due on Tues, April 24 at the
    beginning of class.

2
Purpose of finding differences and similarities
of amino acids in two proteins.
  • Infer structural information
  • Infer functional information
  • Infer evolutionary relationships

3
Evolutionary Basis of Sequence Alignment
  • 1. Similarity: a quantity that relates how much
    two amino acid sequences are alike.
  • 2. Identity: a quantity that describes how much
    two sequences are alike in the strictest terms.
  • 3. Homology: a conclusion drawn from data
    suggesting that two genes share a common
    evolutionary history.

4
Evolutionary Basis of Sequence Alignment (Cont. 1)
Why are there regions of identity?
  • 1. Conserved function: residues participate in
    the reaction.
  • 2. Structural: for example, conserved cysteine
    residues that form a disulfide linkage.
  • 3. Historical: residues that are conserved solely
    due to a common ancestor gene.
5
Identity Matrix
      L  I  C  A
A              1
C           1  0
I        1  0  0
L     1  0  0  0
Simplest type of scoring matrix
6
Similarity
It is easy to score if an amino acid is identical
to another (the score is 1 if identical and 0 if
not). However, it is not easy to give a score
for amino acids that are somewhat similar.
[Figure: chemical structures of isoleucine and leucine]
Should they get a 0 (non-identical), a 1
(identical), or something in between?
7
One sequence is mouse trypsin and the other is
crayfish trypsin. They are homologous proteins.
The sequences share 41% identity.
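
A percent identity figure such as the one above can be computed from a pairwise alignment. The following is a minimal sketch, assuming the two sequences are already aligned to equal length with "-" marking gaps, and assuming that gap columns are excluded from the denominator (other conventions exist). The function name and the toy sequences are hypothetical and are not the trypsin sequences from the slide.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of non-gap alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment: 4 identical residues in 5 compared columns.
toy_identity = percent_identity("ACIL-A", "ACLLCA")
```

Because the denominator is a convention, reported identity values are only comparable when the same convention is used.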
8
(No Transcript)
9
Evolutionary Basis of Sequence Alignment (Cont. 2)
Note: it is possible for two proteins to share a
high degree of similarity but have different
functions. For example, human gamma-crystallin
is a lens protein that has no known enzymatic
activity. It shares a high percentage of
identity with E. coli quinone oxidoreductase.
These proteins likely had a common ancestor, but
their functions diverged.
This is analogous to a railroad car and a diner:
a similar structure serving a different function.
10
Orthologs vs Paralogs
  • Two proteins that have a common ancestor that
    exist in different species are said to be
    orthologs.
  • Two proteins with a common ancestor that exist in
    the same species are said to be paralogs.

11
Modular nature of proteins
  • The previous alignment was global. However, many
    proteins do not display global patterns of
    similarity. Instead, they possess local regions
    of similarity.
  • Proteins can be thought of as assemblies of
    modular domains. It is thought that this may, in
    some cases, be due to an evolutionary process
    known as exon shuffling.

12
Modular nature of proteins (cont. 1)
[Figure: exon shuffling.
Start: Gene A contains Exon 1a and Exon 2a.
Duplication of Exon 2a: Gene A contains Exon 1a, Exon 2a, Exon 2a.
Gene B contains Exon 1b, Exon 2b, Exon 2b.
Exchange with Gene B:
Gene A contains Exon 1a, Exon 2a, Exon 3 (Exon 2b from Gene B).
Gene B contains Exon 1b, Exon 2b, Exon 3 (Exon 2a from Gene A).]
13
Scoring Matrices
  • Importance of scoring matrices
  • Scoring matrices appear in all analyses involving
    sequence comparisons.
  • The choice of matrix can strongly influence the
    outcome of the analysis.
  • Scoring matrices implicitly represent a
    particular theory of relationships.
  • Understanding theories underlying a given scoring
    matrix can aid in making proper choice of which
    matrix to use.

14
Scoring Matrices
  • When we consider scoring matrices, we encounter
    the convention that matrices have numeric indices
    corresponding to the rows and columns of the
    matrix.
  • For example, M11 refers to the entry at the
    first row and the first column. In general, Mij
    refers to the entry at the ith row and the jth
    column. To use this for sequence alignment, we
    simply associate a numeric value to each letter
    in the alphabet of the sequence.
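
The indexing convention can be sketched in code. This is a minimal illustration, assuming the four-residue alphabet (A, C, I, L) and the identity scores from the identity matrix slide; the names ALPHABET, INDEX, score and alignment_score are hypothetical. Note that Python indices start at 0, whereas the slide's M11 is 1-based.

```python
ALPHABET = "ACIL"  # assumption: the four residues shown on the identity matrix slide

# Associate a numeric index with each letter of the alphabet.
INDEX = {residue: i for i, residue in enumerate(ALPHABET)}

# Identity matrix: M[i][j] is 1 on the diagonal and 0 elsewhere.
M = [[1 if i == j else 0 for j in range(len(ALPHABET))]
     for i in range(len(ALPHABET))]


def score(a: str, b: str) -> int:
    """Look up the matrix entry for residues a and b."""
    return M[INDEX[a]][INDEX[b]]


def alignment_score(seq_a: str, seq_b: str) -> int:
    """Sum the per-column scores of two ungapped sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(score(a, b) for a, b in zip(seq_a, seq_b))
```

Swapping in a different scoring matrix only changes the contents of M; the lookup through INDEX stays the same.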

15
Two major scoring matrices for amino acid
sequence comparisons
  • PAM: derived from sequences known to be closely
    related (e.g., proteins from chimpanzees and
    humans). PAM1 was created from empirical data and
    the other PAMs were mathematically derived.
  • BLOSUM: derived from sequences that are not
    closely related (e.g., E. coli and human), using
    data stored in the BLOCKS database.

16
The Point-Accepted-Mutation (PAM) model of
evolution and the PAM scoring matrix
  • Started by Margaret Dayhoff, 1978
  • A series of matrices describing the extent to
    which two amino acids have changed during
    evolution.
  • Proteins were aligned by eye and then the number
    of times an amino acid was substituted in
    different species was counted.

17
Protein families used to construct Dayhoff's
scoring matrix
  Protein               PAMs per 100 million years
  IgG kappa C region    37
  Kappa casein          33
  Serum albumin         26
  Cytochrome C          0.9
  Histone H3            0.14
  Histone H4            0.10

18
Numbers of accepted point mutations, multiplied
by 10
    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
A
R  30
N 109  17
D 154   0 532
C  33  10   0   0
Q  93 120  50  76   0
E 266   0  94 831   0 422
G 579  10 156 162  10  30 112
H  21 103 226  43  10 243  23  10
I  66  30  36  13  17   8  35   0   3
L  95  17  37   0   0  75  15  17  40 253
K  57 477 322  85   0 147 104  60  23  43  39
M  29  17   0   0   0  20   7   7   0  57 207  90
F  20   7   7   0   0   0   0  17  20  90 167   0  17
P 345  67  27  10  10  93  40  49  50   7  43  43   4   7
S 772 137 432  98 117  47  86 450  26  20  32 168  20  40 269
T 590  20 169  57  10  37  31  50  14 129  52 200  28  10  73 696
W   0  27   3   0   0   0   0   0   3   0  13   0   0  10   0  17   0
Y  20   3  36   0  30   0  10   0  40  13  23  10   0 260   0  22  23   6
V 365  20  13  17  33  27  37  97  30 661 303  17  77  10  50  43 186   0  17
Axis labels: original amino acid, replacement amino acid
19
Calculation of relative mutability of amino acids
  • Find the frequency with which an amino acid
    changes to another amino acid at a certain
    position in a protein.
  • Divide the frequency of amino acid change by the
    frequency with which the original amino acid (j)
    occurs in all proteins studied. This is called
    the mutability.
  • Determine the constant by which the alanine
    mutability must be multiplied to give 100.
  • Multiply the mutabilities of the other 19 amino
    acids by the same constant. This is called the
    relative mutability.
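
The scaling steps above can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the change counts and occurrence counts are made-up numbers, not Dayhoff's data.

```python
def relative_mutabilities(changes, occurrences, reference="Ala"):
    """Scale raw mutabilities so that the reference amino acid equals 100.

    changes[aa]     : how often amino acid aa was observed to change
    occurrences[aa] : how often amino acid aa occurs in the proteins studied
    """
    # Mutability: frequency of change divided by frequency of occurrence.
    raw = {aa: changes[aa] / occurrences[aa] for aa in changes}
    # Constant that brings the reference (alanine) mutability to 100.
    constant = 100.0 / raw[reference]
    # Apply the same constant to every amino acid.
    return {aa: raw[aa] * constant for aa in raw}


# Hypothetical counts for illustration only.
toy_changes = {"Ala": 50, "Ser": 60, "Trp": 9}
toy_occurrences = {"Ala": 1000, "Ser": 1000, "Trp": 1000}
toy_relative = relative_mutabilities(toy_changes, toy_occurrences)
```

With these toy numbers, an amino acid that changes more often than alanine ends up above 100 and one that changes less often ends up below 100.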

20
Relative mutabilities of amino acids
  • Asn 134
  • Ser 120
  • Asp 106
  • Glu 102
  • Ala 100
  • Thr 97
  • Ile 96
  • Met 94
  • Gln 93
  • Val 74
  • His 66
  • Arg 65
  • Lys 56
  • Pro 56
  • Gly 49
  • Tyr 41
  • Phe 41
  • Leu 40
  • Cys 20
  • Trp 18

21
Why are the mutabilities different?
  • Amino acids have high mutabilities when a
    similar amino acid can replace them (e.g., Asp
    for Glu).
  • Conversely, amino acids with low mutabilities
    are unique and cannot be replaced.

22
Creation of a mutation probability matrix
  • The accepted mutation data from the earlier
    slide and the mutability of each amino acid in
    nature were used to create a mutation
    probability matrix.
  • Mij shows the probability that an original amino
    acid j (in columns) will be replaced by amino
    acid i (in rows) over a defined evolutionary
    interval. For PAM1, 1% of amino acids have been
    changed.

23
PAM1 mutational probability matrix
. . .
Values of each column will sum to 10,000
24
The Point-Accepted-Mutation (PAM) model of
evolution and the PAM scoring matrix
A 1-PAM unit is equivalent to 1 mutation found in
a stretch of 2 aligned sequences, each containing
100 amino acids.
Example 1:
..CNGTTDQVDKIVKILNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQV..
..CNGTTDQVDKIVKIRNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQV..
Length 100, 1 mismatch, PAM distance 1.
A k-PAM unit is equivalent to k 1-PAM units (or
M^k, the 1-PAM matrix M raised to the power k).
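
Raising the 1-PAM mutation probability matrix to the power k can be sketched as repeated matrix multiplication. The example below is a toy under stated assumptions: it uses a hypothetical 3-letter alphabet with made-up probabilities (each column sums to 1), not the real 20 x 20 PAM1 matrix, and the helper names are illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, k):
    """Return m multiplied by itself k times (k = 0 gives the identity)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, m)
    return result


# Hypothetical 1-PAM-like matrix: 99% of residues unchanged per step.
M1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]
M250 = mat_pow(M1, 250)  # analogue of deriving PAM250 from PAM1
```

Each column of the result still sums to 1, while the diagonal entries (probability of no net change) shrink as k grows.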
25
The Point-Accepted-Mutation (PAM) model of
evolution and the PAM scoring matrix
  Observed difference (%)   Evolutionary distance in PAMs
   1                          1
   5                          5
  10                         11
  20                         23
  40                         56
  50                         80
  60                        112
  70                        159
  80                        246
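
The correspondence above can be used as a lookup table. The sketch below reads the observed differences as percentages and estimates intermediate values by linear interpolation. The interpolation is an assumption made for illustration, since the slide only tabulates the nine points, and the function name is hypothetical.

```python
# Values taken from the slide: observed difference and PAM distance.
OBSERVED = [1, 5, 10, 20, 40, 50, 60, 70, 80]
PAMS = [1, 5, 11, 23, 56, 80, 112, 159, 246]


def pam_distance(observed_percent: float) -> float:
    """Estimate PAM distance from observed percent difference.

    Assumption: linear interpolation between neighbouring table entries.
    """
    if not OBSERVED[0] <= observed_percent <= OBSERVED[-1]:
        raise ValueError("value lies outside the tabulated range")
    for i in range(len(OBSERVED) - 1):
        x0, x1 = OBSERVED[i], OBSERVED[i + 1]
        y0, y1 = PAMS[i], PAMS[i + 1]
        if x0 <= observed_percent <= x1:
            return y0 + (y1 - y0) * (observed_percent - x0) / (x1 - x0)
    raise AssertionError("unreachable for values inside the table range")
```

The table shows why the correction matters: at large distances many substitutions hit the same site more than once, so the PAM distance grows much faster than the observed difference.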
26
Final Scoring Matrix is the Log-Odds Scoring
Matrix
  • S(a,b) = 10 × log10(Mab / Pb)

Labels on the slide: replacement amino acid,
original amino acid, frequency of amino acid b
(Pb), mutational probability matrix entry (Mab).
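
The log-odds conversion can be sketched for a single matrix entry. The function name is hypothetical, the input values in the examples are made up, and rounding to the nearest integer is an assumption about how published scoring matrices are usually tabulated.

```python
import math


def log_odds_score(m_ab: float, p_b: float) -> int:
    """Compute 10 * log10(Mab / Pb), rounded to the nearest integer.

    m_ab : entry of the mutational probability matrix
    p_b  : background frequency of amino acid b
    """
    return round(10 * math.log10(m_ab / p_b))


# Hypothetical values: a pairing seen half as often as expected by chance.
toy_negative = log_odds_score(0.02, 0.04)
# Hypothetical values: a pairing seen ten times more often than expected.
toy_positive = log_odds_score(0.40, 0.04)
```

A positive score means the pairing occurs more often than expected by chance, a negative score means it occurs less often, and a score near zero means it occurs about as often as chance predicts.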
27
(No Transcript)
28
Summary of PAM Scoring Matrix
  • PAM: a unit of evolution (1 PAM = 1 point
    mutation per 100 amino acids).
  • "Accepted mutation" means a fixed point mutation.
  • Comparison of 71 groups of closely related
    proteins (>85% identity) yielded 1,572 changes.
  • Different PAM matrices are derived from the PAM1
    matrix by matrix multiplication.
  • The matrices are converted to log-odds scoring
    matrices (frequency of change divided by
    probability of chance alignment, converted to
    log base 10).
  • A PAM250 matrix is roughly equivalent to 20%
    identity between two sequences.

29
The Dotter Program
  • Program consists of three components
  • Sliding window
  • A table that gives a score for each amino acid
    match
  • A graph that converts the score to a dot of
    certain density.
  • The higher the density the higher the score.
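
The three components can be illustrated with a small dot-plot sketch. This is not Dotter's actual implementation: the window length, the identity scoring function, and the characters used for dot density are all assumptions chosen for illustration, and the function names are hypothetical.

```python
def window_scores(seq_a, seq_b, window=3, score=lambda x, y: 1 if x == y else 0):
    """grid[i][j] = summed score of the windows starting at seq_a[i] and seq_b[j]."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [[sum(score(seq_a[i + k], seq_b[j + k]) for k in range(window))
             for j in range(cols)]
            for i in range(rows)]


def render(grid, shades=" .:#"):
    """Convert each score to a character; denser characters mean higher scores."""
    top = max((max(row) for row in grid if row), default=0) or 1
    last = len(shades) - 1
    return "\n".join(
        "".join(shades[max(v, 0) * last // top] for v in row) for row in grid
    )


# Hypothetical toy sequences: a sequence compared with itself gives a diagonal.
toy_grid = window_scores("ACILA", "ACILA", window=3)
toy_plot = render(toy_grid)
```

Replacing the default score function with a lookup into a PAM-style table would turn the identity plot into a similarity plot without changing the sliding-window logic.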

30
Two proteins that are similar in certain regions
Tissue plasminogen activator (PLAT) and coagulation
factor 12 (F12).
31
Region of similarity
32
(No Transcript)
33
FASTA format
>gi|1244762|gb|AAA98563.1| p53 tumor suppressor homolog
MSQGTSPNSQETFNLLWDSLEQVTANEYTQIHERGVGYEYHEAEPDQTSLEISAYRIAQPDPYGRSESYD
LLNPIINQIPAPMPIADTQNNPLVNHCPYEDMPVSSTPYSPHDHVQSPQPSVPSNIKYPGEYVFEMSFAQ
PSKETKSTTWTYSEKLDKLYVRMATTCPVRFKTARPPPSGCQIRAMPIYMKPEHVQEVVKRCPNHATAKE
HNEKHPAPLHIVRCEHKLAKYHEDKYSGRQSVLIPHEMPQAGSEWVVNLYQFMCLGSCVGGPNRRPIQLV
FTLEKDNQVLGRRAVEVRICACPGRDRKADEKASLVSKPPSPKKNGFPQRSLVLTNDITKITPKKRKIDD
ECFTLKVRGRENYEILCKLRDIMELAARIPEAERLLYKQERQAPIGRLTSLPSSSSNGSQDGSRSSTAFS
TSDSSQVNSSQNNTQMVNGQVPHEEETPVTKCEPTENTIAQWLTKLGLQAYIDNFQQKGLHNMFQLDEFT
LEDLQSMRIGTGHRNKIWKSLLDYRRLLSSGTESQALQHAASNASTLSVGSQNSYCPGFYEVTRYTYKHT
ISYL
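
A FASTA record consists of a header line that starts with ">" followed by one or more lines of sequence. The following is a minimal parsing sketch using only the Python standard library; the function name and the two toy records are hypothetical, and the parser does no validation of residue letters.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record input for illustration.
toy_fasta = """>seq1 toy record
ACIL
LICA
>seq2 another toy record
ACLL
"""
toy_records = parse_fasta(toy_fasta)
```

Joining the sequence lines matters because FASTA files wrap long sequences at a fixed width, as in the record above.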
34
Workshop 3