Title: Scoring Matrices
1Scoring Matrices
2Different Scoring Rules Lead to Different Alignments
- Example Score
- 5 x (# matches) + (-4) x (# mismatches)
+ (-7) x (total length of all gaps)
- Example Score
- 5 x (# matches) + (-4) x (# mismatches)
+ (-5) x (# gap openings) + (-2) x (total
length of all gaps)
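The two example rules can be compared on a single gapped alignment. The sketch below is a minimal illustration, assuming an alignment is given as two equal-length strings with `-` marking a gap; the function names and the sample alignment are hypothetical and not part of the slides.

```python
def alignment_counts(a, b):
    """Return (matches, mismatches, gap openings, total gap length)
    for two aligned strings of equal length, where '-' marks a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gap_open = gap_len = 0
    prev_gap = None  # which sequence carried a gap in the previous column
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            cur = "a" if x == "-" else "b"
            gap_len += 1
            if cur != prev_gap:  # a new gap starts in this column
                gap_open += 1
            prev_gap = cur
        else:
            prev_gap = None
            if x == y:
                matches += 1
            else:
                mismatches += 1
    return matches, mismatches, gap_open, gap_len


def linear_gap_score(a, b):
    # First example rule: every gap position costs 7.
    m, mm, _, gap_len = alignment_counts(a, b)
    return 5 * m - 4 * mm - 7 * gap_len


def affine_gap_score(a, b):
    # Second example rule: 5 per gap opening plus 2 per gap position.
    m, mm, gap_open, gap_len = alignment_counts(a, b)
    return 5 * m - 4 * mm - 5 * gap_open - 2 * gap_len


top = "ACGTTA--GC"      # hypothetical aligned sequences
bottom = "AC--TCGGGC"
print(linear_gap_score(top, bottom), affine_gap_score(top, bottom))
```

The same alignment scores lower under the first rule than under the second, because two long gaps are penalized per position rather than mostly per opening. An aligner that maximizes the score can therefore prefer different alignments under the two rules.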
5Scoring Rules/Matrices
- Why are they important?
- The choice of a scoring rule can strongly
influence the outcome of sequence analysis
- What do they mean?
- Scoring matrices implicitly represent a
particular theory of evolution
- Elements of the matrices specify the similarity
of one residue to another
6The Sij in a Scoring Matrix (as log likelihood
ratio)
7- The alignment score of aligning two sequences is
the log likelihood ratio of the alignment under
two models
- Common ancestry
- By chance
8Likelihood Ratio for Aligning a Single Pair of
Residues
- Numerator (above the line): the probability that
two residues are aligned by evolutionary descent
- Denominator (below the line): the probability
that they are aligned by chance
- Pi, Pj are the frequencies of residues i and j in
all sequences (abundance)
9Likelihood Ratio of Aligning Two Sequences
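The formulas on these slides were not transcribed. A reconstruction consistent with the surrounding bullets is sketched below, assuming an ungapped alignment of sequences x and y of length n whose columns are treated as independent. The symbol q for the probability that two residues are aligned by descent is an assumed notation; Pi, Pj and Sij follow the slides.

```latex
\[
\frac{P(x, y \mid \text{common ancestry})}{P(x, y \mid \text{chance})}
  = \prod_{k=1}^{n} \frac{q_{x_k y_k}}{P_{x_k} P_{y_k}}
\]
\[
\log \frac{P(x, y \mid \text{common ancestry})}{P(x, y \mid \text{chance})}
  = \sum_{k=1}^{n} \log \frac{q_{x_k y_k}}{P_{x_k} P_{y_k}}
  = \sum_{k=1}^{n} S_{x_k y_k}
\]
```

Taking the logarithm turns the product of per-residue ratios into a sum, which is why an alignment score can be computed by adding matrix entries.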
10Two classes of widely used protein scoring
matrices
- PAM (Accepted Mutations): 1500 changes in 71
groups with > 85% similarity
- BLOSUM (Blocks Substitution Matrix): 2000 blocks
from 500 families
11- PAM and BLOSUM matrices are all log likelihood
matrices
- More specifically
- An alignment that scores 6 means that the
alignment by common ancestry is 2^(6/2) = 8 times
as likely as expected by chance.
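The worked number on this slide implies that scores are expressed in half-bit units (two score units per factor of 2). A minimal sketch of the conversion in both directions, assuming that scaling; the function names and the `units_per_bit` parameter are hypothetical:

```python
from math import log2


def odds_from_score(score, units_per_bit=2):
    """Likelihood ratio (common ancestry vs. chance) implied by a score."""
    return 2 ** (score / units_per_bit)


def score_from_odds(odds, units_per_bit=2):
    """Score, rounded to an integer, for a given likelihood ratio."""
    return round(units_per_bit * log2(odds))


print(odds_from_score(6))    # ratio for the slide's example score of 6
print(score_from_odds(8.0))  # back from the ratio to the score
```

Other scalings exist (for example, tenths of a base-10 logarithm), so the unit should be checked for whichever matrix is actually in use.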
12Constructing BLOSUM Matrices
- Blocks Substitution Matrices
13BLOSUM Matrices of Specific Similarities
- Sequences with similarity above a threshold are
clustered.
- If the clustering threshold is 62%, the final
matrix is BLOSUM62
14- A toy example of constructing a BLOSUM matrix
from 4 training sequences
15 Constructing a BLOSUM matrix: 1. Counting mutations
16 2. Tallying mutation frequencies
17 3. Matrix of mutation probabilities
18 4. Calculate abundance of each residue (marginal
probability)
19 5. Obtaining a BLOSUM matrix
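The five steps can be sketched in a few lines. This is a toy illustration only: the four training sequences are hypothetical (the slide's own example was not transcribed), no clustering is applied, and the half-bit scaling `round(2 * log2(...))` is an assumption consistent with the scoring example earlier in the deck.

```python
from collections import Counter
from itertools import combinations
from math import log2


def toy_blosum(block):
    """Build log-odds scores from an ungapped block of aligned sequences."""
    # Step 1: count residue pairs in every column (unordered pairs).
    pair_counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    # Steps 2-3: turn counts into observed pair probabilities q.
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Step 4: abundance (marginal probability) of each residue.
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    # Step 5: log-odds of observed vs. expected-by-chance pair frequency.
    scores = {}
    for (a, b), f in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * log2(f / expected))
    return pair_counts, p, scores


block = ["ACA", "ACC", "AAC", "ACC"]  # hypothetical training block
pair_counts, p, scores = toy_blosum(block)
print(dict(pair_counts))
print(dict(p))
print(scores)
```

Pairs that never occur in the block get no entry here; a real matrix needs enough training data that every pair is observed.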
20- Constructing the real BLOSUM62 Matrix
21 1.-3. Mutation Frequency Table
22 4. Calculate Amino Acid Abundance
23 5. Obtaining BLOSUM62 Matrix
25BLOSUM matrices reference
- S. Henikoff and J. Henikoff (1992). Amino acid
substitution matrices from protein blocks. PNAS
89: 10915-10919
- Training data: 2000 conserved blocks from the
BLOCKS database. Blocks are ungapped, aligned
protein segments. Each block represents a
conserved region of a protein family
26Break
27PAM Matrices (Point Accepted Mutations)
- Mutations accepted by natural selection
28Constructing PAM Matrix Training Data
29PAM Phylogenetic Tree
30PAM Accepted Point Mutation
31Mutability of Residue j
32Total Mutation Rate
The sum of the mutabilities of all residues, each weighted by its
abundance, is the total mutation rate of all amino acids
33Normalize Total Mutation Rate to 1%
This defines an evolutionary period: the period
during which 1% of all residues in the sequences
are mutated (accepted mutations only, of course)
34Mutation Probability Matrix Normalized Such that
the Total Mutation Rate is 1%
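A simplified sketch of this normalization, in the spirit of Dayhoff's construction, follows. It assumes symmetric counts of accepted point mutations between residue pairs and known residue abundances, and it takes mutability as proportional to (changes involving a residue) / (abundance of that residue). The three-letter alphabet, the counts, and all names are hypothetical.

```python
def pam1_matrix(counts, freqs, rate=0.01):
    """Return M where M[j][i] is the probability that residue j is
    replaced by residue i in one PAM period.

    counts: {frozenset({i, j}): accepted point mutations between i and j}
    freqs:  {residue: abundance}, summing to 1
    rate:   target total mutation rate (1% for PAM1)
    """
    residues = sorted(freqs)
    # Number of observed changes involving each residue.
    changes = {
        j: sum(counts.get(frozenset((i, j)), 0) for i in residues if i != j)
        for j in residues
    }
    # Relative mutability of residue j (up to a constant factor).
    mutability = {j: changes[j] / freqs[j] for j in residues}
    # Scale factor so that the abundance-weighted mutation rate equals `rate`.
    total_rate = sum(freqs[j] * mutability[j] for j in residues)
    lam = rate / total_rate

    M = {}
    for j in residues:
        row = {}
        for i in residues:
            if i != j:
                share = counts.get(frozenset((i, j)), 0) / changes[j] if changes[j] else 0.0
                row[i] = lam * mutability[j] * share
        row[j] = 1.0 - lam * mutability[j]  # probability that j is unchanged
        M[j] = row
    return M


# Hypothetical toy data for a three-letter alphabet.
freqs = {"A": 0.5, "R": 0.3, "N": 0.2}
counts = {frozenset("AR"): 30, frozenset("AN"): 50, frozenset("RN"): 20}
M = pam1_matrix(counts, freqs)
print(M["A"])
```

After scaling, the abundance-weighted probability of change, summed over residues, equals the chosen rate, which is what defines one PAM period.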
35Mutation Probability Matrix (transposed), M x 10000
36-- PAM1 mutation prob. matr. --
PAM2 Mutation Probability Matrix? -- Mutations
that happen in twice the evolutionary period of
a PAM1
37PAM Matrix Assumptions
38In two PAM1 periods
- A→R = (A→A and A→R) or
- (A→N and N→R) or
- (A→D and D→R) or
- ... or
- (A→V and V→R)
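Read as probabilities, the alternatives listed above add up to a sum over the intermediate residue. A sketch of that relation, assuming the two PAM1 periods are independent and using M with arrow subscripts as an assumed notation for the PAM1 mutation probabilities:

```latex
\[
M^{(2)}_{A \to R}
  = \sum_{x \in \{\text{20 amino acids}\}} M_{A \to x}\, M_{x \to R},
\qquad \text{i.e.} \qquad
M^{(2)} = M \cdot M
\]
```

The sum runs over every possible intermediate x, including x = A (no change in the first period) and x = R (no change in the second).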
39Entries in a PAM-2 Mut. Prob. Matr.
40PAM-k Mutation Prob. Matrix
41PAM-k log-likelihood matrix
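A minimal sketch of both steps follows: raising a PAM1 matrix to the k-th power, then converting to log-odds scores. The 10 x log10 scaling is an assumption here, as are the two-letter toy matrix, the abundances, and the function names. Rows are indexed by the original residue and columns by the replacement.

```python
from math import log10


def mat_mul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][x] * b[x][j] for x in range(n)) for j in range(n)]
            for i in range(n)]


def pam_k(m1, k):
    """Mutation probability matrix after k PAM1 periods (m1 to the k-th power)."""
    n = len(m1)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, m1)
    return result


def log_odds(mk, freqs, scale=10):
    """Score for i -> j: scaled log of P(i -> j in k periods) / abundance of j."""
    n = len(freqs)
    return [[round(scale * log10(mk[i][j] / freqs[j])) for j in range(n)]
            for i in range(n)]


# Hypothetical two-letter toy model; freqs is its stationary distribution.
m1 = [[0.99, 0.01],
      [0.02, 0.98]]
freqs = [2 / 3, 1 / 3]

pam2 = pam_k(m1, 2)
print(pam2)
print(log_odds(pam2, freqs))
```

Dividing by the abundance of the replacement residue compares the evolutionary model with chance, so the resulting scores are symmetric whenever the mutation model and abundances are mutually consistent, as they are in this toy case.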
42PAM-250
43- PAM60: 60%, PAM80: 50%,
- PAM120: 40%
- The PAM-250 matrix provides a better-scoring
alignment than lower-numbered PAM matrices for
proteins of 14-27% similarity
44PAM Matrices Reference
- M.O. Dayhoff, ed. (1978). Atlas of Protein
Sequence and Structure, Suppl. 3.
National Biomedical Research Foundation,
1
45Choice of Scoring Matrix
46Comparing Scoring Matrix
- PAM
- Based on extrapolation from a small evolutionary period
- Track evolutionary origins
- Homologous sequences during evolution
- BLOSUM
- Based on a range of evolutionary periods
- Conserved blocks
- Find conserved domains
47Sources of Error in PAM