Dayhoff - PowerPoint PPT Presentation

About This Presentation
Title:

Dayhoff

Slides: 22
Provided by: stevensc

Transcript and Presenter's Notes



1
Dayhoff's Markov Model of Evolution
2
Brands of Soup Revisited
Brand A and Brand B, with $P(B \mid A) = 2/7$ and $P(A \mid B) = 2/7$
3
Brands of Soup Revisited
Transition Diagram
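Brand A and Brand B, with $P(B \mid A) = p = 2/7$ and $P(A \mid B) = p = 2/7$
Conditional Probability Formulas
$P(A_k) = P(A_{k-1})(1-p) + P(B_{k-1})\,p = \tfrac{5}{7}P(A_{k-1}) + \tfrac{2}{7}P(B_{k-1})$
$P(B_k) = P(A_{k-1})\,p + P(B_{k-1})(1-p) = \tfrac{2}{7}P(A_{k-1}) + \tfrac{5}{7}P(B_{k-1})$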
4
Brands of Soup Revisited
Matrix Representation
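The two formulas above fit into the 2x2 matrix with rows (5/7, 2/7) and (2/7, 5/7). A minimal Python sketch, assuming NumPy and the convention used later in these slides (row = brand at step k-1, column = brand at step k, so each row sums to 1); the starting distribution is a made-up example:

```python
import numpy as np

p = 2 / 7  # P(B|A) = P(A|B): probability of switching brands

# Row = brand bought at step k-1, column = brand bought at step k
# (order A, B), so each row sums to 1.
M = np.array([[1 - p, p],
              [p, 1 - p]])

x0 = np.array([1.0, 0.0])  # hypothetical start: everyone buys Brand A
x1 = x0 @ M                # one step: row vector times matrix

# The matrix step reproduces both conditional probability formulas.
assert np.isclose(x1[0], x0[0] * (1 - p) + x0[1] * p)  # P(A_k)
assert np.isclose(x1[1], x0[0] * p + x0[1] * (1 - p))  # P(B_k)
print(x1)  # 5/7 and 2/7 as floats
```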
8
Markov Processes Can Be Represented by Matrices
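e.g., a 3-state process
[Figure: 3-state transition diagram; surviving edge labels include 1/2, 1/3, 1/4]
can be represented with this matrix
[Matrix figure not recoverable]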
9
Each Step Involves an Inner Product
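Each entry of the next probability vector is the inner product of the current vector with one column of the transition matrix. A sketch in Python, assuming NumPy; the 3-state matrix and the current vector are hypothetical (the matrix from the previous slide did not survive):

```python
import numpy as np

# Hypothetical 3-state transition matrix; each row sums to 1.
M = np.array([[0.5, 0.25, 0.25],
              [1 / 3, 1 / 3, 1 / 3],
              [0.0, 0.5, 0.5]])
x = np.array([0.2, 0.3, 0.5])  # hypothetical current state probabilities

# Element j of the next vector = inner product of x with column j of M.
next_x = np.array([np.dot(x, M[:, j]) for j in range(M.shape[1])])

assert np.allclose(next_x, x @ M)  # identical to the full matrix product
print(next_x)
```

Writing the step as a loop of inner products only makes the arithmetic visible; in practice `x @ M` does the same thing in one call.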
11
Markov Matrix Properties
  • Sum of probabilities in a row must be 1
  • No change corresponds to a diagonal matrix (since
    each row sums to 1, this is the identity matrix)
  • If well-behaved, multiplying the matrix by
    itself many times converges to a limit
  • Each column of this limit matrix contains
    identical elements, i.e., all of its rows are equal
  • The rows of the limit matrix are the equilibrium
    probabilities for the process

Well-behaved means: (1) every state can transition to
every other state at least indirectly, and (2) the
greatest common divisor of the cycle lengths in the
transition diagram is 1
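The limit can be approximated by raising the matrix to a large power. A Python sketch, assuming NumPy (`numpy.linalg.matrix_power`), using the soup matrix from earlier and, for contrast, a hypothetical two-state chain that fails condition (2):

```python
import numpy as np

def limit_matrix(M, steps=200):
    """Approximate lim M^n by raising M to a large power."""
    return np.linalg.matrix_power(M, steps)

p = 2 / 7
soup = np.array([[1 - p, p],
                 [p, 1 - p]])
L = limit_matrix(soup)
print(L)  # every row is about [0.5, 0.5]: the equilibrium probabilities

# A periodic chain (its only cycle has length 2) violates condition (2):
# its powers alternate between two matrices and never settle down.
flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
print(np.linalg.matrix_power(flip, 200))  # the identity
print(np.linalg.matrix_power(flip, 201))  # the swap matrix again
```

For the soup example both brands end up with probability 1/2 because the switching probabilities are symmetric.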
12
Ask Mathematica!
Recall m
13
Margaret Dayhoff
  • Had a large (for 1978) database of related
    proteins
  • Asked: what is the probability that two aligned
    sequences are related by evolution?

DAYHOFF, M. O., R. M. SCHWARTZ, and B. C. ORCUTT.
1978. A model of evolutionary change in
proteins. Pp. 345-352 in M. O. DAYHOFF, ed. Atlas
of Protein Sequence and Structure, Vol. 5, Suppl.
3. National Biomedical Research Foundation,
Washington, D.C.
14
Dayhoff Model
  • Amino acids change over time independently of
    their position in a protein (a simplifying
    assumption).
  • The probability of a substitution depends only on
    the amino acids involved and not on the prior
    history (Markov model).

15
A Sequence Alignment
(Example alignment from a BLAST search)
>gi|1173266|sp|P44374|RS5_HAEIN 30S ribosomal protein S5
Length = 166

Score = 263 bits (672), Expect = 1e-70
Identities = 154/166 (92%), Positives = 159/166 (95%)

Query  1    MAHIEKQAGELQEKLIAVNRVSKTVKGGRIFSFTALTVVGDGNGRVGFGYGKAREVPAAI  60
            M++IEKQ GELQEKLIAVNRVSKTVKGGRI SFTALTVVGDGNGRVGFGYGKAREVPAAI
Sbjct  1    MSNIEKQVGELQEKLIAVNRVSKTVKGGRIMSFTALTVVGDGNGRVGFGYGKAREVPAAI  60

Query  61   QKAMEKARRNMINVALNNGTLQHPVKGVHTGSRVFMQPASEGTGIIAGGAMRAVLEVAGV  120
            QKAMEKARRNMINVALN GTLQHPVKGVHTGSRVFMQPASEGTGIIAGGAMRAVLEVAGV
Sbjct  61   QKAMEKARRNMINVALNEGTLQHPVKGVHTGSRVFMQPASEGTGIIAGGAMRAVLEVAGV  120

Query  121  HNVLAKAYGSTNPINVVRATIDGLENMNSPEMVAAKRGKSVEEILG  166
             NVL+KAYGSTNPINVVRATID L NM SPEMVAAKRGK+V+EILG
Sbjct  121  RNVLSKAYGSTNPINVVRATIDALANMKSPEMVAAKRGKTVDEILG  166
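The "Identities" figure in the alignment header can be checked directly from the two sequences. A small Python sketch; the function name `identities` is hypothetical and it assumes an ungapped alignment, as in this example:

```python
# Query and subject sequences from the BLAST alignment above (no gaps).
QUERY = ("MAHIEKQAGELQEKLIAVNRVSKTVKGGRIFSFTALTVVGDGNGRVGFGYGKAREVPAAI"
         "QKAMEKARRNMINVALNNGTLQHPVKGVHTGSRVFMQPASEGTGIIAGGAMRAVLEVAGV"
         "HNVLAKAYGSTNPINVVRATIDGLENMNSPEMVAAKRGKSVEEILG")
SBJCT = ("MSNIEKQVGELQEKLIAVNRVSKTVKGGRIMSFTALTVVGDGNGRVGFGYGKAREVPAAI"
         "QKAMEKARRNMINVALNEGTLQHPVKGVHTGSRVFMQPASEGTGIIAGGAMRAVLEVAGV"
         "RNVLSKAYGSTNPINVVRATIDALANMKSPEMVAAKRGKTVDEILG")

def identities(x: str, y: str) -> int:
    """Count aligned positions where the two residues are identical."""
    if len(x) != len(y):
        raise ValueError("ungapped alignment expected: lengths must match")
    return sum(a == b for a, b in zip(x, y))

n_id = identities(QUERY, SBJCT)
print(f"Identities = {n_id}/{len(QUERY)} ({100 * n_id // len(QUERY)}%)")
# prints: Identities = 154/166 (92%)
```

The "Positives" count additionally needs a scoring matrix (pairs with a positive score count as "+"), so it is not reproduced here.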
16
Observed Substitution Frequencies
A
R  30
N 109  17
D 154   0 532
C  33  10   0   0
Q  93 120  50  76   0
E 266   0  94 831   0 422
G 579  10 156 162  10  30 112
H  21 103 226  43  10 243  23  10
I  66  30  36  13  17   8  35   0   3
L  95  17  37   0   0  75  15  17  40 253
K  57 477 322  85   0 147 104  60  23  43  39
M  29  17   0   0   0  20   7   7   0  57 207  90
F  20   7   7   0   0   0   0  17  20  90 167   0  17
P 345  67  27  10  10  93  40  49  50   7  43  43   4   7
S 772 137 432  98 117  47  86 450  26  20  32 168  20  40 269
T 590  20 169  57  10  37  31  50  14 129  52 200  28  10  73 696
W   0  27   3   0   0   0   0   0   3   0  13   0   0  10   0  17   0
Y  20   3  36   0  30   0  10   0  40  13  23  10   0 260   0  22  23   6
V 365  20  13  17  33  27  37  97  30 661 303  17  77  10  50  43 186   0  17
    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y
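The table lists each pair of amino acids only once (lower triangle). A short Python sketch, assuming NumPy, of expanding rows like these into the full symmetric matrix used on the next slide; only the first four rows are copied here, and the helper `symmetric_from_lower` is hypothetical:

```python
import numpy as np

# First four rows of the lower-triangular table above (A, R, N, D).
# The table omits the diagonal, so diagonal entries stay 0 here.
LETTERS = "ARND"
LOWER = [[],             # A
         [30],           # R: R-A
         [109, 17],      # N: N-A, N-R
         [154, 0, 532]]  # D: D-A, D-R, D-N

def symmetric_from_lower(rows):
    """Fill a full symmetric matrix from lower-triangle rows (row i has i entries)."""
    n = len(rows)
    m = np.zeros((n, n))
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            m[i, j] = m[j, i] = value  # each unordered pair is listed once
    return m

counts = symmetric_from_lower(LOWER)
print(counts[LETTERS.index("N"), LETTERS.index("D")])  # 532.0
```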
17
Building a Markov Model
  • From the observed substitution data, Dayhoff et
    al. were able to estimate the joint probabilities
    of two amino acids substituting for each other.
    This yields a big, symmetric matrix M of
    probabilities, in which the diagonal elements
    $M_{aa}$ are much larger than the off-diagonal ones.
  • But the matrix of joint probabilities, $P(a \cap b)$,
    does not represent a Markov process. Recall that the
    elements of a Markov process matrix are
    conditional probabilities, $P(b \mid a) = P(a \cap b) / P(a)$.
    $P(a)$ is just the probability (frequency) of
    amino acid a, so each row a of M is divided
    by the frequency of that amino acid.
    The diagonal elements $P(a \mid a)$ are now all close to 1.
  • Dayhoff then scales the small off-diagonal
    elements by a common factor so that the
    expected number of amino acid substitutions is
    1 per 100 residues. The diagonal elements are then
    adjusted to make each row add up to 1, as required
    by the law of total probability.
  • This is the PAM1 Markov matrix (PAM = Point
    Accepted Mutation; PAM1 = 1% substitution
    frequency).
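The three steps above (condition on P(a), rescale the off-diagonal part to 1 substitution per 100 sites, refill the diagonal) can be sketched in Python with NumPy. This is a simplified reading of the slide, not Dayhoff's exact procedure; the function name `pam1_from_counts` and the 3-letter counts are hypothetical:

```python
import numpy as np

def pam1_from_counts(counts, target=0.01):
    """Sketch of a PAM1-style Markov matrix from symmetric pair counts.

    counts[a, b]: number of aligned (a, b) pairs, counted in both orders.
    target: expected substitutions per site (0.01 = 1 per 100 residues).
    Returns the row-stochastic matrix and the amino acid frequencies P(a).
    """
    counts = np.asarray(counts, dtype=float)
    joint = counts / counts.sum()                 # P(a and b); symmetric
    freq = joint.sum(axis=1)                      # P(a): marginal frequency of a
    cond = joint / freq[:, None]                  # divide row a by P(a): P(b | a)
    off = cond - np.diag(np.diag(cond))           # off-diagonal part = substitutions
    expected = freq @ off.sum(axis=1)             # expected substitutions per site
    off *= target / expected                      # one common scale factor
    pam1 = off + np.diag(1.0 - off.sum(axis=1))   # diagonal makes rows sum to 1
    return pam1, freq

# Hypothetical counts for a toy 3-letter alphabet (not Dayhoff's data).
counts = [[900, 30, 20],
          [30, 800, 10],
          [20, 10, 700]]
pam1, freq = pam1_from_counts(counts)
print(pam1.round(4))
```

Because the joint matrix is symmetric and all off-diagonal terms are scaled by the same factor, `freq` remains an equilibrium distribution of `pam1`.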

18
Using the PAM Model
  • The PAM1 Markov matrix can be multiplied by
    itself to yield the PAM2 Markov matrix, and again
    to yield the PAM3 matrix, etc. PAM1 is a unit of
    evolutionary distance.
  • PAM250 is commonly used. Note that PAM250 does not
    mean 250% of the amino acids have been substituted;
    it's more like 80%, because the same site can
    change more than once.
  • The PAM Markov Matrices arrived at by matrix
    multiplication need to be converted into the
    scoring matrices that one would use for BLAST or
    CLUSTALW.
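Why PAM250 is far below "250% changed" can be seen by raising a PAM1-style matrix to the 250th power and computing the expected fraction of sites that differ, $\sum_a P(a)\,(1 - M_{aa})$. A Python sketch, assuming NumPy; the uniform 20-letter model is a hypothetical simplification, not Dayhoff's matrix:

```python
import numpy as np

def pam_n(pam1, n):
    """PAM-n Markov matrix: PAM1 multiplied by itself n times."""
    return np.linalg.matrix_power(pam1, n)

def fraction_changed(pam, freq):
    """Expected fraction of sites whose amino acid differs: sum_a P(a) (1 - M_aa)."""
    return float(freq @ (1.0 - np.diag(pam)))

# Hypothetical toy model: 20 equally frequent amino acids, every substitution
# equally likely, scaled so that 1% of sites change per PAM.
k = 20
freq = np.full(k, 1.0 / k)
pam1 = np.full((k, k), 0.01 / (k - 1))
np.fill_diagonal(pam1, 0.99)

changes = {n: fraction_changed(pam_n(pam1, n), freq) for n in (1, 100, 250)}
for n, f in changes.items():
    print(n, round(f, 3))
```

In this uniform toy model about 0.88 of the sites differ at n = 250. The slide's figure of roughly 80% refers to the real PAM model, whose amino acids are neither equally frequent nor equally mutable.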

19
Probability of an Alignment
In a random model (R), the probability of the
independent alignment of two proteins x and y is
the product of the background probabilities $q_a$ of all
the amino acids:
$P(x, y \mid R) = \prod_i q_{x_i} \prod_j q_{y_j}$
(Note that the $q_a$ are not all the same value
of 1/20.)
In a match model (M), the proteins have descended
from a common ancestor protein and the amino acid
sequences are no longer independent. In this
model, the probability of an ungapped alignment can be
expressed with a matrix of joint probabilities $p_{ab}$:
$P(x, y \mid M) = \prod_i p_{x_i y_i}$
(Note that $p_{ab} = p_{ba}$ because neither protein
is first.)
Dayhoff and coworkers could estimate these
probabilities from the frequencies of amino acid
substitutions they observed in their database of
evolutionarily related proteins.
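Both models can be evaluated for a short ungapped alignment. A Python sketch with a hypothetical 3-letter alphabet; the values of $q_a$ and $p_{ab}$ are made up, chosen so that $p_{ab} = p_{ba}$ and the marginals of $p$ equal $q$:

```python
import math

# Hypothetical background frequencies q_a for a toy 3-letter alphabet.
q = {"L": 0.5, "I": 0.3, "V": 0.2}
# Hypothetical joint probabilities p_ab; filled in symmetrically below.
p = {("L", "L"): 0.40, ("I", "I"): 0.20, ("V", "V"): 0.12,
     ("L", "I"): 0.06, ("L", "V"): 0.04, ("I", "V"): 0.04}
p.update({(b, a): v for (a, b), v in list(p.items())})  # p_ab = p_ba

def prob_random(x, y):
    """Random model R: every residue of x and y drawn independently from q."""
    return math.prod(q[a] for a in x) * math.prod(q[b] for b in y)

def prob_match(x, y):
    """Match model M: each aligned pair drawn from p_ab (ungapped alignment)."""
    return math.prod(p[(a, b)] for a, b in zip(x, y))

x, y = "LLIV", "LIIV"
ratio = prob_match(x, y) / prob_random(x, y)
print(ratio)  # about 4.27: the match model explains this alignment better
```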
20
A Log-Odds Score
We are interested in the ratio of the match
model (M) probability of the alignment to the random
model (R) probability:
$\frac{P(x, y \mid M)}{P(x, y \mid R)} = \prod_i \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}}$
In practice, we usually take the log of these
quantities for a substitution scoring matrix:
$S(a, b) = \log \frac{p_{ab}}{q_a q_b}$
This changes the multiplications into additions
and avoids numerical underflow and round-off error.
S(a,b) defines the number you usually see in a
substitution matrix. These numbers are usually
rounded to integers to ease computation.
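A Python sketch of turning $p_{ab}$ and $q_a$ into an integer scoring matrix and scoring an alignment by addition. The toy probabilities are hypothetical, and the choice of log base 2 with a scale factor of 2 (half-bit units) is an illustrative assumption, not the scaling of any particular published matrix:

```python
import math
from itertools import product

# Hypothetical toy probabilities (p_ab symmetric, marginals equal q).
q = {"L": 0.5, "I": 0.3, "V": 0.2}
p = {("L", "L"): 0.40, ("I", "I"): 0.20, ("V", "V"): 0.12,
     ("L", "I"): 0.06, ("L", "V"): 0.04, ("I", "V"): 0.04}
p.update({(b, a): v for (a, b), v in list(p.items())})

SCALE = 2  # hypothetical: scores in half-bit units

def score_matrix(p, q, scale=SCALE):
    """S(a, b) = log2(p_ab / (q_a q_b)), scaled and rounded to an integer."""
    return {(a, b): round(scale * math.log2(p[(a, b)] / (q[a] * q[b])))
            for a, b in product(q, repeat=2)}

S = score_matrix(p, q)
x, y = "LLIV", "LIIV"
score = sum(S[(a, b)] for a, b in zip(x, y))  # additions replace multiplications
print(S[("L", "L")], S[("L", "I")], score)
```

Rounding loses some precision: the summed integer score differs slightly from the exact scaled log-odds of the whole alignment.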
21
Questions?
  • I will post a Mathematica notebook.