Title: Multivariate Analysis of Protein Polymorphism (MAPP)
1 Multivariate Analysis of Protein Polymorphism (MAPP)
- Purpose and Basics
- Algorithm Outline and Performance
- How to view MAPP results in ProPhylER
© All Content Arend Sidow / ProPhylER 2008
2 Multivariate Analysis of Protein Polymorphism (MAPP)
3 The Impact of Amino Acid Variants
[Diagram labels: cSNP; Amino Acid Change; Impaired Function of Variant; Phenotype; Deterministic; Protein Structure and Function; When and Where the protein acts, Dosage, etc.]
MAPP's prediction target: How strongly does a
variant amino acid affect the protein's structure
or function?
4 MAPP Concept
- MAPP addresses specific variants in single
positions of the protein sequence
- More specifically, it uses the evolutionary
variation in single columns of the alignment for
predictions of the impact of all possible
variants on structure and function of the protein
- In contrast to ESF, which considers averages of
neighboring sites, MAPP focuses on single sites
and single variants
- Consider the variation in the two framed columns
on the left
- Red-framed column has a lot of variation that
does not appear to be constrained in obvious ways
- Blue-framed column has very little variation, and
that variation preserves a certain characteristic
(small size of side chain)
- MAPP quantifies the intuition that there are
significant differences in constraint acting upon
the red and blue columns, and generates
predictions of functional impact of variants
5 MAPP in ProPhylER
For each amino acid in each protein ...
For each possible variant ...
... calculate an impact score from the observed
evolutionary variation. The impact score is
converted to a P-value that describes the
confidence that the variant is consistent with
structure or function of the protein. Low
P-values predict highly deleterious substitutions.
6 MAPP Methodology: Rationale
- The observed variation is a sample that reflects
specific structural or functional constraints on
that position
- MAPP quantifies these constraints by converting
the letter information in each column into
their corresponding physicochemical values
- Key concept
- The conversion allows calculating the mean and
the variance in each column for each
physicochemical property
- The variance is a statistical reflection of the
tolerated variation
- The further a potential variant (polymorphism) is
outside of the variance of the observed data, the
more likely it is to be deleterious
Example column residues: I I I I I V V V V A A A A A A
Example property scale (these values match the Kyte-Doolittle hydropathy index):
A 1.8, C 2.5, D -3.5, E -3.5, F 2.8, G -0.4, H -3.2,
I 4.5, K -3.9, L 3.8, M 1.9, N -3.5, P -1.6, Q -3.5,
R -4.5, S -0.8, T -0.7, V 4.2, W -0.9, Y -1.3
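The rationale above can be illustrated with a minimal sketch for a single property. It converts the example column to the hydropathy values listed on the slide, computes the column mean and standard deviation, and expresses how far a candidate variant lies outside the observed variation. The function names (`column_stats`, `deviation`) are hypothetical, the sequences are unweighted, and only one property is used, so this is a simplification of the idea and not MAPP's actual scoring.

```python
from statistics import mean, stdev

# Hydropathy values as listed on the slide (one value per amino acid).
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4, "H": -3.2,
    "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5,
    "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}

def column_stats(column):
    """Return (mean, sample standard deviation) of hydropathy in a column."""
    values = [HYDROPATHY[aa] for aa in column]
    return mean(values), stdev(values)

def deviation(variant, column):
    """Distance of a variant from the column mean, in standard deviations.

    Assumption: an unweighted z-score-like measure for one property only.
    """
    m, sd = column_stats(column)
    diff = abs(HYDROPATHY[variant] - m)
    if sd == 0:
        # Invariant column: any change in the property is maximally unusual.
        return 0.0 if diff == 0 else float("inf")
    return diff / sd

# Example column from the slide: 5 x I, 4 x V, 6 x A.
example_column = "IIIIIVVVVAAAAAA"
col_mean, col_sd = column_stats(example_column)
```

In this sketch, a hydrophobic variant such as L lies close to the column mean, while a charged variant such as D lies several standard deviations away and would be flagged as more likely deleterious.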
7 Multivariate Analysis of Protein Polymorphism (MAPP)
- Algorithm Outline and Performance
Stone EA, Sidow A. Physicochemical constraint
violation by molecular missense substitutions
mediates impairment of protein function and
disease severity. Genome Res. 2005 Jul;15(7):978-986
8 MAPP Methodology: General Outline
- MAPP uses scales of six important physicochemical
properties
  - Hydropathy
  - Polarity
  - Charge
  - Volume of side chain
  - Free energy in alpha helical conformation
  - Free energy in beta sheet conformation
- The property scales are standardized so the
values from different scales are comparable to
one another
- MAPP also decorrelates the scales, which is
necessary because certain scales (such as
hydropathy and polarity) are correlated
- MAPP generates impact scores for all possible
variants from the observed evolutionary variation
- MAPP impact scores are converted to P-values,
which are displayed on the ProPhylER interface
- The lower the P-value, the higher the chance that
the substitution will be deleterious for
structure or function of the protein
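The standardization and decorrelation steps can be sketched for two scales as follows. The slides do not say how MAPP decorrelates its scales, so this example swaps in one simple technique: each scale is converted to z-scores, and the second scale is replaced by its residual after removing the part explained by the first. The hydropathy values are those listed on the slide; `toy_scale` is a made-up second scale for illustration only and is not a real polarity scale. All function names are hypothetical.

```python
from math import sqrt

def standardize(values):
    """Convert a scale to z-scores (mean 0, population variance 1)."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]

def correlation(z1, z2):
    """Pearson correlation of two already standardized scales."""
    return sum(a * b for a, b in zip(z1, z2)) / len(z1)

def decorrelate(z1, z2):
    """Remove from z2 the component explained by z1, then rescale.

    Assumption: a two-variable residualization, chosen for illustration;
    it is not claimed to be the method MAPP uses.
    """
    r = correlation(z1, z2)
    scale = sqrt(1 - r * r)
    return [(b - r * a) / scale for a, b in zip(z1, z2)]

# Hydropathy from the slide, in the order A C D E F G H I K L M N P Q R S T V W Y.
hydropathy = [1.8, 2.5, -3.5, -3.5, 2.8, -0.4, -3.2, 4.5, -3.9, 3.8,
              1.9, -3.5, -1.6, -3.5, -4.5, -0.8, -0.7, 4.2, -0.9, -1.3]

# HYPOTHETICAL second scale, built to correlate with hydropathy.
toy_scale = [-0.5 * h + (i % 3) for i, h in enumerate(hydropathy)]

z_hydro = standardize(hydropathy)
z_toy = standardize(toy_scale)
z_toy_decorrelated = decorrelate(z_hydro, z_toy)
```

After this step the two columns are on a common scale and uncorrelated, so a deviation in one property is not counted a second time through the other.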
9 MAPP Methodology: Algorithm
(1) and (3) are ProPhylER's tree and alignment.
Weights (2, from Branch-Manager) are used to
calculate a column-specific summary of
physicochemical properties (4). (5) The mean
describes the average property; the variance
describes the degree of constraint for each
property. For each possible substitution in the
column, and for each physicochemical property,
MAPP generates a score. These scores are
combined in a way that decorrelates the
physicochemical properties (7). The scores are
then converted to P-values (not shown).
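The weighted column summary can be sketched as a weighted mean and variance. The weights below are hypothetical stand-ins for the sequence weights mentioned on the slide, and the function name `weighted_mean_var` is an assumption. The variance here is the simple normalized weighted variance; the estimator MAPP actually uses may differ.

```python
def weighted_mean_var(values, weights):
    """Weighted mean and (normalized) weighted variance of one property.

    Assumption: weights are non-negative and are rescaled to sum to 1.
    """
    total = sum(weights)
    w = [x / total for x in weights]
    m = sum(wi * v for wi, v in zip(w, values))
    var = sum(wi * (v - m) ** 2 for wi, v in zip(w, values))
    return m, var

# Hydropathy of a toy column I, I, V, A (values from the slide's scale).
column_values = [4.5, 4.5, 4.2, 1.8]

# HYPOTHETICAL weights: the two identical I sequences are down-weighted,
# as they might be if they were close relatives in the tree.
sequence_weights = [0.1, 0.1, 0.3, 0.5]

w_mean, w_var = weighted_mean_var(column_values, sequence_weights)
```

With equal weights the function reduces to the ordinary mean and population variance, which makes the effect of the weighting easy to compare.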
10 Test: Binary Predictions on Mutation Impact Data
11 Prediction Accuracy for HIV Protease
[Chart: HIV-1 protease, sequence positions 1 to 99]
The amino acid in HIV-1 protease is in blue; red
or green boxes show the experimentally tested
sequence variants. Correct MAPP predictions
(variant was reduced in activity, and was
correctly predicted to be so; or variant
was fully functional, and was correctly
predicted to be so) are in green. Incorrect MAPP
predictions are in red. This chart is for the
reduced-activity accuracy described below, for
which the P-value cutoff was 0.01. For the
magnitude of the decrease in activity, the
P-value cutoff was 0.001.
Reduced activity (functional vs. reduced or
dead): 80.4% prediction accuracy
Magnitude of decrease (reduced versus dead): 76.3%
prediction accuracy
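The binary test described above can be sketched as follows: a variant is predicted to be impaired when its P-value falls below the cutoff, and accuracy is the fraction of variants whose prediction matches the experimental result. The records below are invented for illustration and are not the HIV protease data; the function names and the use of a strict `<` comparison are assumptions.

```python
def predict_impaired(p_value, cutoff=0.01):
    """Binary call: True if the variant is predicted to impair function."""
    return p_value < cutoff

def accuracy(records, cutoff=0.01):
    """Fraction of (p_value, observed_impaired) records predicted correctly."""
    correct = sum(
        predict_impaired(p, cutoff) == observed for p, observed in records
    )
    return correct / len(records)

# HYPOTHETICAL records: (MAPP P-value, experimentally impaired?)
toy_records = [
    (0.0005, True),   # predicted impaired, observed impaired: correct
    (0.2, False),     # predicted functional, observed functional: correct
    (0.03, True),     # predicted functional, observed impaired: incorrect
    (0.004, False),   # predicted impaired, observed functional: incorrect
    (0.6, False),     # predicted functional, observed functional: correct
]

toy_accuracy = accuracy(toy_records, cutoff=0.01)
```

Changing `cutoff` to 0.001 gives the stricter threshold that the slide reports for the magnitude-of-decrease comparison.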
12 Multivariate Analysis of Protein Polymorphism (MAPP)
- How to view MAPP results in ProPhylER
13 MAPP Track
For each position in the protein, for each
possible amino acid variant, the MAPP display
shows a color for the predicted deleteriousness.
Red is predicted to be deleterious with high
confidence, blue is unlikely to be deleterious,
and intermediate colors cover the range in between.
(Mousing over the fields will show the P-values.
Low P-values are strong predictions for
deleteriousness; high P-values mean the variant
is unlikely to be deleterious.)
14 Physicochemical Property Importance
MAPP analyses also allow inference as to whether
a particular property is important in the given
alignment position.
For evolutionarily constrained positions,
certain properties stand out as important,
but in regions of much evolutionary variation,
no property is important.
(Shading is proportional to likely importance.
Mousing over the fields will show the P-values.
Low P-values are strong predictions for the
importance of the property; high P-values mean
the property is unlikely to be important.)