Title: Bioinformatics 2 lecture 4
Slide 1: Bioinformatics 2 -- lecture 4
Rotation and superposition; structure-based alignment
Slide 2: In-class exercise: superimpose 2 molecules by hand
- Open two separate copies of RasMol.
- Open the PDB files 3sdh.pdb and 1jzm.pdb (one in each window).
- Display the first chain ("A", usually) using restrict A, then Display -> Cartoon.
- Rotate to align the molecules to the same frame of reference.
Do these pairs:
- 3sdh.pdb 1jzm.pdb (hard)
- 3sdh.pdb 1h97.pdb (harder)
- 3sdh.pdb 1phn.pdb (hardest)
Slide 3: What happens when you move the mouse to rotate a molecule?
1. The mouse sends the mouse coordinates ($\Delta x$, $\Delta y$) to the running program.
2. Rotation angles are calculated: $\Delta x \cdot \mathit{xscale}$ and $\Delta y \cdot \mathit{yscale}$.
3. Rotation matrices are calculated.
Slide 4: What happens when you move the mouse (cont'd)
4. New atom coordinates are calculated.
5. The scene is rendered using the new coordinates.
All of this happens in a fraction of a second.
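The five steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not RasMol's actual code: the handler name `on_mouse_drag`, the default scale factors, and the choice that horizontal motion turns the molecule around the y-axis and vertical motion around the x-axis are all hypothetical.

```python
import math

def rot_x(a):
    """3x3 matrix for a rotation of a radians around the x-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a):
    """3x3 matrix for a rotation of a radians around the y-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]

def on_mouse_drag(atoms, dx, dy, xscale=0.01, yscale=0.01):
    """Hypothetical handler: mouse displacement (dx, dy) -> new atom coordinates."""
    # Step 2: rotation angles (radians) proportional to the mouse displacement.
    angle_y = dx * xscale   # assumed: horizontal drag turns the molecule around y
    angle_x = dy * yscale   # assumed: vertical drag turns the molecule around x
    # Step 3: rotation matrices.
    ry, rx = rot_y(angle_y), rot_x(angle_x)
    # Step 4: new coordinates for every atom (y-rotation first, then x).
    # Step 5, rendering, would redraw the scene from the returned list.
    return [mat_vec(rx, mat_vec(ry, atom)) for atom in atoms]
```

A real viewer would call such a handler on every mouse-move event, which is why the whole pipeline has to finish in a fraction of a second.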
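Slide 5: Rotation is angular addition
An atom starts at $(x, y) = (r\cos\alpha, r\sin\alpha)$, where r is its distance from the axis of rotation (at the Cartesian origin) and $\alpha$ is its angle from the x-axis.
...rotates by $\theta$ to... $(x', y') = (r\cos(\alpha+\theta), r\sin(\alpha+\theta))$.
Convention: angles are measured counter-clockwise.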
Slide 6: Sum of angles formula
$\cos(\alpha+\theta) = \cos\alpha\cos\theta - \sin\alpha\sin\theta$
$\sin(\alpha+\theta) = \sin\alpha\cos\theta + \sin\theta\cos\alpha$
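Slide 7: A rotation matrix
[Figure: the point (x, y) at radius r and angle $\alpha$ rotates by $\theta$ to (x', y').]
$x = r\cos\alpha$, $y = r\sin\alpha$
$x' = r\cos(\alpha+\theta) = r(\cos\alpha\cos\theta - \sin\alpha\sin\theta) = (r\cos\alpha)\cos\theta - (r\sin\alpha)\sin\theta = x\cos\theta - y\sin\theta$
$y' = r\sin(\alpha+\theta) = r(\sin\alpha\cos\theta + \sin\theta\cos\alpha) = (r\sin\alpha)\cos\theta + (r\cos\alpha)\sin\theta = y\cos\theta + x\sin\theta$
In matrix form: $\begin{pmatrix}x'\\ y'\end{pmatrix} = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}$
The rotation matrix is the same for any r and any $\alpha$.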
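A quick numerical check of the derivation above, as a minimal Python sketch (function names are illustrative): rotating with the matrix form gives the same point as converting to polar form and adding $\theta$ to the angle, whatever r and $\alpha$ are.

```python
import math

def rotate_2d(x, y, theta):
    """Counter-clockwise rotation of (x, y) by theta radians about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    # x' = x cos(theta) - y sin(theta);  y' = x sin(theta) + y cos(theta)
    return x * c - y * s, x * s + y * c

def rotate_by_angle_addition(x, y, theta):
    """The same rotation done as angular addition: (r, alpha) -> (r, alpha + theta)."""
    r = math.hypot(x, y)          # distance from the axis of rotation
    alpha = math.atan2(y, x)      # starting angle, counter-clockwise from +x
    return r * math.cos(alpha + theta), r * math.sin(alpha + theta)

# The matrix form and the angle-addition form agree for any point and any angle.
for (x, y) in [(1.0, 0.0), (2.0, 3.0), (-1.5, 0.5)]:
    for theta in (0.3, math.pi / 2, 2.0):
        a = rotate_2d(x, y, theta)
        b = rotate_by_angle_addition(x, y, theta)
        assert all(abs(p - q) < 1e-9 for p, q in zip(a, b))
```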
Slide 8: A rotation around a principal axis
Rz: the Z coordinate stays the same. X and Y change.
Ry: the Y coordinate stays the same. X and Z change.
Rx: the X coordinate stays the same. Y and Z change.
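The three principal-axis matrices can be written out as follows. This sketch assumes the usual right-handed convention (counter-clockwise when looking down the axis toward the origin); each function leaves one coordinate unchanged, exactly as listed above.

```python
import math

def rz(a):
    """Rotation by a radians around z: z is unchanged, x and y change."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def ry(a):
    """Rotation by a radians around y: y is unchanged, x and z change."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def rx(a):
    """Rotation by a radians around x: x is unchanged, y and z change."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]

def apply(m, v):
    """Matrix times column vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]
```

With this convention a 90° turn sends x to y around z, y to z around x, and z to x around y.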
Slide 9: A 3D rotation matrix
is the product of 2D rotation matrices (rotations around the principal axes).
[Figure: a rotation around the z-axis and a rotation around the y-axis combine into a 3D rotation.]
Slide 10: When multiplying matrices, the order matters.
This is the matrix if the X-rotation is first, then the Y-rotation.
Slide 11: Rotating in the opposite order gives a different matrix
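To see that the order matters, this Python sketch multiplies the same two 90° rotations in both orders. For column vectors the rotation applied first sits on the right, so "X first, then Y" is $R_y R_x$.

```python
import math

def rx(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def ry(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    """Product of two 3x3 matrices, a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

angle = math.radians(90)
# X-rotation first, then Y: the combined matrix is Ry @ Rx.
x_then_y = matmul(ry(angle), rx(angle))
# The opposite order, Y first, then X, is Rx @ Ry.
y_then_x = matmul(rx(angle), ry(angle))

# A large entry-wise difference shows the two matrices are not the same.
largest_difference = max(abs(x_then_y[i][j] - y_then_x[i][j])
                         for i in range(3) for j in range(3))
```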
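Slide 12: Reversing the rotation
For the opposite rotation, flip the matrix about its main diagonal. This is the transpose.
The inverse matrix = the transposed matrix.
Note: $\cos\beta\cos\beta + \sin\beta\sin\beta = 1$, which is why a rotation matrix multiplied by its transpose gives the identity matrix.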
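A small Python check of this slide, using a z-rotation as the example: the transpose undoes the rotation (the product is the identity), and transposing is the same as rotating by the negative angle.

```python
import math

def rz(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(m):
    """Flip a matrix about its main diagonal."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

r = rz(math.radians(30))
undo = transpose(r)          # the reverse rotation
product = matmul(undo, r)    # diagonal entries are cos^2 + sin^2 = 1, the rest cancel to 0
```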
Slide 13: Example: rotation in 2 steps
Rotate the vector v = (1., 2., 3.) around Z by 60°, then around Y by -60°.
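The two-step example worked numerically in Python, assuming the right-handed principal-axis matrices (angles given in degrees here for readability).

```python
import math

def rz(deg):
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def ry(deg):
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def apply(m, v):
    """Matrix times column vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

v = [1.0, 2.0, 3.0]
step1 = apply(rz(60), v)        # approximately [-1.232, 1.866, 3.0]
step2 = apply(ry(-60), step1)   # approximately [-3.214, 1.866, 0.433]
```

Both steps keep the length of the vector: the squared length stays 1 + 4 + 9 = 14.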
Slide 14: Right-handed 90° rotations
[Figure: right-handed 90° rotations around the X-, Y- and Z-axes.]
Helpful hint: for a right-handed rotation, the -sine is up and to the right of the sine.
Slide 15: In-class exercise: rotate a point
(x, y, z) = (1., 4., 7.). Rotate this point by 90° around the Z-axis. Then rotate the new point by 90° around the Y-axis. What are the new coordinates?
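Slide 16: 3D rotation conventions
Euler angles: three rotations, each around a principal axis. Order of rotations: (1) around z, (2) around x, (3) around z.
Polar angles: a net rotation by one angle around an axis whose direction is defined by the other two angles.
[Figure: the polar-angle rotation carried out as five numbered steps involving the z- and y-axes.]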
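Slide 17: Polar angles
[Figure: a globe with the z-axis at the north pole and the x-axis on the prime meridian at the equator.]
A polar-angle rotation is a rotation of some number of degrees around an axis located at a given longitude and latitude (also in degrees).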
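One standard way to turn a polar-angle description into a rotation matrix is Rodrigues' rotation formula, which is not shown on the slide and is swapped in here. This Python sketch assumes the globe convention of the figure (z at the north pole, x on the prime meridian at the equator) and measures latitude from the equator; some programs instead measure the second angle from the pole, so check the convention before reusing it.

```python
import math

def axis_from_polar(longitude_deg, latitude_deg):
    """Unit vector for an axis at the given longitude/latitude (z = north pole, x at longitude 0)."""
    lon, lat = math.radians(longitude_deg), math.radians(latitude_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def axis_angle_matrix(axis, angle_deg):
    """Right-handed rotation of angle_deg around a unit axis (Rodrigues' formula:
    R = cos(a) I + sin(a) [k]x + (1 - cos(a)) k k^T)."""
    kx, ky, kz = axis
    a = math.radians(angle_deg)
    c, s, t = math.cos(a), math.sin(a), 1.0 - math.cos(a)
    return [[t * kx * kx + c, t * kx * ky - s * kz, t * kx * kz + s * ky],
            [t * kx * ky + s * kz, t * ky * ky + c, t * ky * kz - s * kx],
            [t * kx * kz - s * ky, t * ky * kz + s * kx, t * kz * kz + c]]
```

An axis at latitude 90° is the z-axis, so a 90° rotation about it reproduces the principal-axis matrix Rz(90°).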
Slide 18: Special properties of rotation matrices
- They are square: 2x2 or 3x3 (higher dimensions in principle).
- The product of any two rotation matrices is a rotation matrix.
- The inverse equals the transpose: $R^{-1} = R^{T}$.
- Every row/column is a unit vector.
- Any two rows/columns are orthogonal vectors.
- The cross product of two rows, taken in cyclic order, equals the third (row 1 × row 2 = row 3, row 2 × row 3 = row 1, row 3 × row 1 = row 2).
- $|\mathbf{x}| = |R\mathbf{x}|$, where R is a rotation matrix: rotation does not change the length of a vector.
Read more about rotation matrices at http://mathworld.wolfram.com/RotationMatrix.html
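The listed properties can be checked numerically. In this sketch `looks_like_rotation` is a hypothetical helper name; it tests unit rows/columns, orthogonality and the cyclic cross-product rule, and the test also confirms that a product of two rotations keeps vector lengths. A mirror reflection passes the first two checks but fails the cross-product rule.

```python
import math

def rz(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def ry(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def looks_like_rotation(r, tol=1e-9):
    """Check the slide's properties numerically for a 3x3 matrix r."""
    rows = [list(row) for row in r]
    cols = [list(col) for col in zip(*r)]
    for vecs in (rows, cols):
        for i in range(3):
            if abs(dot(vecs[i], vecs[i]) - 1.0) > tol:   # unit vectors
                return False
            for j in range(i + 1, 3):
                if abs(dot(vecs[i], vecs[j])) > tol:     # orthogonal pairs
                    return False
    for i in range(3):                                     # row i x row i+1 = row i+2
        c = cross(rows[i], rows[(i + 1) % 3])
        if any(abs(p - q) > tol for p, q in zip(c, rows[(i + 2) % 3])):
            return False
    return True

r = matmul(rz(0.4), ry(-1.1))       # product of two rotations
v = [1.0, -2.0, 0.5]
rv = [dot(row, v) for row in r]     # R v
```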
Slide 19: RMSD
Root Mean Square Deviation in superimposed coordinates is the standard measure of structural difference. It is similar to the standard deviation, which is the square root of the variance (Orengo p. 88):
$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\lVert x_i - y_i\rVert^{2}}$
where $x_i$ are the coordinates from molecule 1, $y_i$ are the equivalent coordinates from molecule 2, and N is the number of equivalent atoms.
Which atoms are equivalent is based on an alignment.
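The RMSD formula translates directly into Python. This minimal sketch assumes the two coordinate lists are already superimposed and already paired up by the alignment (entry i of one list is equivalent to entry i of the other).

```python
import math

def rmsd(xs, ys):
    """RMSD between two equally long lists of equivalent (x, y, z) coordinates."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between equivalent atoms.
    total = sum(sum((a - b) ** 2 for a, b in zip(x, y)) for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))
```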
Slide 20: Least squares superposition
Problem: find the rotation matrix, M, and a vector, v, that minimize the following quantity:
$\sum_{i}\lVert M x_i + v - y_i\rVert^{2}$
where $x_i$ are the coordinates from one molecule and $y_i$ are the equivalent coordinates from another molecule (equivalent based on the alignment).
Slide 21: Mapping structural equivalence: aligning the sequence
Any position that is aligned is included in the sum of squares.

4DFRA ISLIAALAVDRVIGMENAMPWNLPADLAWFKRNTLDKPVIMGRHTWESIG-RPLPGRKNI
1DFR_ TAFLWAQNRNGLIGKDGHLPWHLPDDLHYFRAQTVGKIMVVGRRTYESFPKRPLPERTNV

4DFRA ILSSQ-PGTDDRVTWVKSVDEAIAAC--GDVPEIMVIGGGRVYEQFLPKAQKLYLTHIDA
1DFR_ VLTHQEDYQAQGAVVVHDVAAVFAYAKQHLDQELVIAGGAQIFTAFKDDVDTLLVTRLAG

4DFRA EVEGDTHFPDYEPDDWESVFSEFHDADAQNS--HSYCFKILERR
1DFR_ SFEGDTKMIPLNWDDFTKVSSRTVEDT---NPALTHTYEVWQKK

Unaligned positions are not.
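Turning a gapped alignment into a list of equivalent residues can be sketched like this (the function name is illustrative). Only columns where both sequences have a residue produce a pair; residues facing a gap are counted but not paired, so they never enter the sum of squares.

```python
def aligned_pairs(seq_a, seq_b, gap="-"):
    """(i, j) residue indices (0-based, ungapped numbering) for every aligned column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = []
    i = j = 0
    for a, b in zip(seq_a, seq_b):
        if a != gap and b != gap:
            pairs.append((i, j))   # both present: these residues are equivalent
        if a != gap:
            i += 1                 # advance in sequence A
        if b != gap:
            j += 1                 # advance in sequence B
    return pairs
```

For example, the short stretch "WESIG-RP" / "YESFPKRP" of the alignment above gives pairs for every column except the one with the gap, and the numbering of the second sequence runs one ahead after that column.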
Slide 22: Least squares
At the position of best superposition, we have an approximate equality: $M x_i + v \approx y_i$.
First we eliminate v by translating the center of mass of both molecules to the origin. Now we have $M x_i \approx y_i$.
We have one equation (i) for each atom (a 3-component vector equation), and M has 9 unknowns.
If there are more equations than unknowns, there is a unique least-squares solution. What is it? See the next two slides.
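Eliminating v amounts to centering both coordinate sets. This sketch uses the unweighted centroid of the aligned atoms, i.e. it treats every atom as having the same mass (an assumption; the slide says "center of mass"). After centering both sets, the translation drops out of $M x_i + v \approx y_i$.

```python
def centroid(coords):
    """Unweighted mean position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return [sum(c[k] for c in coords) / n for k in range(3)]

def center(coords):
    """Translate coordinates so that their centroid sits at the origin."""
    g = centroid(coords)
    return [[c[k] - g[k] for k in range(3)] for c in coords]
```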
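Slide 23: Least squares
Least squares solves a set of linear equations in the form
$x_{11}a_1 + x_{12}a_2 + \dots + x_{1N}a_N = y_1$
$x_{21}a_1 + x_{22}a_2 + \dots + x_{2N}a_N = y_2$
$\vdots$
$x_{M1}a_1 + x_{M2}a_2 + \dots + x_{MN}a_N = y_M$
where x holds the known coefficients (an M × N matrix), y holds the known values (M of them, with M > N), and a holds the N unknowns.
The 'shorthand' notation for these equations is $x\,a = y$ (matrix × vector = vector).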
Slide 24: Least squares, continued
[Figure: green elements are known, orange are unknown. Fat rectangles are matrices; thin rectangles are vectors.]
Start from $x\,a = y$.
Multiply both sides by the transpose of x ("squaring"): $x^{T}x\,a = x^{T}y$.
The "squared" matrix $x^{T}x$ is N × N and can be inverted as long as the columns of x are linearly independent. (We can use the "LU decomposition".)
Multiplying both sides by the inverse of the "squared" matrix solves for a.
Summary: $a = (x^{T}x)^{-1}x^{T}y$
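A pure-Python sketch of the normal equations, applied to the superposition problem from slide 22: each row of M is an instance of $x\,a = y$, with the centered atoms $x_i$ as the coefficient rows and one coordinate of the $y_i$ as the known values. Instead of forming the inverse explicitly, it solves $x^{T}x\,a = x^{T}y$ by Gaussian elimination, the same idea as an LU decomposition. Caveat: this unconstrained fit does not force M to be a rotation; with noisy data it can include scaling or shear, which is why constrained methods such as the Kabsch algorithm are commonly used in practice.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares(x, y):
    """a = (x^T x)^-1 x^T y for an M x N coefficient matrix x and M known values y."""
    n = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(n)] for i in range(n)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(n)]
    return solve(xtx, xty)

def fit_matrix(xs, ys):
    """Fit a 3x3 M with M x_i ~ y_i (coordinates already centered), one row at a time."""
    return [least_squares(xs, [y[k] for y in ys]) for k in range(3)]
```

For a straight-line fit, least_squares([[1, 0], [1, 1], [1, 2]], [1, 3, 5]) returns intercept 1 and slope 2.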
Slide 25: Least-squares superimposed molecules
Slide 26: Structural alignment algorithms
Alignment algorithms create a one-to-one mapping of subset(s) of one sequence to subset(s) of another sequence. Structure-based alignment types:
- Geometric--intermolecular: algorithms may do this by minimizing the intermolecular distances, or the root-mean-square deviation (RMSD), between superimposed alpha-carbon positions.
- Geometric--intramolecular: algorithms minimize the difference between aligned contact maps or distance matrices. Intramolecular distances are used.
- Non-geometric: algorithms align structural properties, such as whether a residue is buried, or its secondary structure type, usually using dynamic programming (DP).
Slide 27: Structure-based alignment tools
Programs: DALI, VAST, CE, KENOBI, MAMMOTH, PRiSM, SCALI, SARF
Databases: FSSP, HOMSTRAD, COMPASS, PALI
SARF, SCALI and KENOBI do non-sequential alignment.
Slide 28: Aligning 2 structures
Structure superposition programs have to do two things: (1) align the sequence and (2) minimize the RMSD. You can't do the first without the second, and you can't do the second without the first.
Often it is impossible to get a good sequence alignment, even though there is structural homology.
Slide 29: What a structure-based alignment should mean
Aligned residues are equivalent substructures (i.e. same secondary structure). Pairs of aligned residues have the same contact property (either in contact or not in contact).
What it usually means
Aligned residues superimpose in space. Alignments are sequential.
Most algorithms use these criteria.
Slide 30: Remote homologs are more likely than close homologs
The existence of large numbers of remote homologs shows us that true structural similarity is hard to see in the amino acid sequence. Structural conservation is stronger than sequence conservation.
[Figure: likelihood vs. percent identity for structural homologs, with the "twilight zone" marked.]
Slide 31: Example of structural homologs (analogs)
4DFR: Dihydrofolate reductase. 1YAC: Octameric Hydrolase Of Unknown Specificity.
5.9% sequence identity (best alignment). The 1YAC structure was solved without knowing its function. Alignment to 4DFR and others implies that it is a hydrolase of some sort, probably using NAD cofactors.
Slide 32: Viewing structural homologs (analogs)
DHFR in yellow and orange; 1YAC in green and purple.
[Panels: sheets only; helices only]
Slide 33: DALI: an intramolecular geometric structural alignment algorithm
DALI (Distance matrix-based ALIgnment), Liisa Holm & Chris Sander
(1) Generate a distance matrix for each protein: $D_{ij}$ = distance between alpha carbon i and alpha carbon j.
The distance matrix (rows i, columns j) contains all pairwise distances and is symmetrical.
Geometric--intramolecular
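Step (1) of DALI as a small Python function, assuming the alpha-carbon coordinates are already extracted as a list of (x, y, z) triples (reading them from a PDB file is not shown). Only the upper triangle is computed because the matrix is symmetrical.

```python
import math

def distance_matrix(ca_coords):
    """D[i][j] = distance between alpha carbon i and alpha carbon j (symmetric, zero diagonal)."""
    n = len(ca_coords)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(ca_coords[i], ca_coords[j])
    return d
```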
Slide 34: Shapes in distance matrices
In a contact map, 1 means close in space, typically $D_{ij} < 8$ Å.
[Figure: contact-map patterns of 1s for a hairpin, a helix, parallel strands and anti-parallel strands.]
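A contact map is a thresholded distance matrix. This sketch uses the 8 Å cutoff quoted above; the `contacts` helper and its `min_separation` parameter (to skip trivially close sequence neighbours) are illustrative additions.

```python
def contact_map(d, cutoff=8.0):
    """1 where D[i][j] is below the cutoff (in Angstrom), else 0."""
    return [[1 if dij < cutoff else 0 for dij in row] for row in d]

def contacts(cmap, min_separation=3):
    """List the (i, j) contacts with i < j that are at least min_separation apart in sequence."""
    n = len(cmap)
    return [(i, j) for i in range(n) for j in range(i + min_separation, n) if cmap[i][j]]
```

Listing contacts this way makes the patterns in the figure visible as index pairs, e.g. many (i, i+3) and (i, i+4) contacts along a helix.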
Slide 35: Aligning two distance matrices
[Figure: cut-and-paste alignment of distance matrices, and the resulting sequence alignment.]
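Slide 36: DALI algorithm
S is a score that is a maximum when the alignment is optimal:
$S = \sum_{i}\sum_{j}\phi(i,j)$, where $\phi(i,j) = \theta - \lvert D^{A}_{ij} - D^{B}_{ij}\rvert$
Here $\phi$ is a constant ($\theta$) minus the absolute difference of the two distance matrices, after they are aligned to each other, so i and j run over the aligned positions.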
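The score above, for a given alignment, can be computed as in this Python sketch. The alignment is represented as a hypothetical list of (i, k) pairs meaning residue i of A is aligned to residue k of B, and $\theta$ = 1.5 Å is an assumed default for the constant. Element pairs that differ by more than $\theta$ lower the score; well-matched ones raise it.

```python
def dali_rigid_score(da, db, pairs, theta=1.5):
    """S = sum over aligned pairs (i, k), (j, l) of theta - |DA[i][j] - DB[k][l]|.
    da, db: distance matrices of A and B; pairs: list of (i, k) alignment pairs."""
    s = 0.0
    for i, k in pairs:
        for j, l in pairs:
            s += theta - abs(da[i][j] - db[k][l])
    return s
```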
Slide 37: Making the DALI pairs list
Squares are 6x6. Structure A vs. structure B: a score S is computed for each pair of 6x6s (one from each structure), e.g. 45.3, 13.2, 56.2, 33.3, 20.2, 12.5, ...
High scores go to the pairs list.
Each pair of 6x6s corresponds to a gapped alignment.
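A brute-force sketch of building a pairs list: every 6x6 block of A's distance matrix is scored against every 6x6 block of B's using the same $\theta - |\Delta D|$ term as slide 36, and only the highest scores are kept. The names, the `keep` parameter and $\theta$ = 1.5 Å are assumptions, and the exhaustive loop is far slower than what a production program would do; it only illustrates the idea.

```python
import heapq

def block_score(da, db, i, j, k, l, size=6, theta=1.5):
    """Compare the size x size block of A starting at (i, j) with the block of B at (k, l)."""
    return sum(theta - abs(da[i + p][j + q] - db[k + p][l + q])
               for p in range(size) for q in range(size))

def pairs_list(da, db, keep=10, size=6, theta=1.5):
    """Score every pair of blocks (one from A, one from B) and keep the best ones."""
    na, nb = len(da), len(db)
    scored = []
    for i in range(na - size + 1):
        for j in range(na - size + 1):
            for k in range(nb - size + 1):
                for l in range(nb - size + 1):
                    score = block_score(da, db, i, j, k, l, size, theta)
                    scored.append((score, (i, j, k, l)))
    return heapq.nlargest(keep, scored)   # highest-scoring block pairs first
```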
Slide 38: DALI alignment: start with 6x6 pairs from the pairs list
[Figure: one 6x6 pair marked in the distance matrices of 4dfr and 1yac. Axes: sequence position. Shade: distance; shorter distances are a lighter shade.]
Slide 39: DALI: extend using another pair, consider new blocks
[Figure: a 2nd pair added to the 4dfr and 1yac distance matrices, and the new pair-to-pair distances.]
Slide 40: DALI alignment
Slide 41: SSAP alignment
A view is the set of all vectors from one residue. Each residue has its own "view".
Geometric--intramolecular
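A sketch of computing a view in Python. To compare views from two different structures, the vectors have to be expressed in a frame attached to the residue itself. SSAP defines its own backbone-based frame; the frame built here from CA(i-1), CA(i) and CA(i+1) is a hypothetical stand-in, chosen only so that the view does not change when the whole molecule is rotated or translated.

```python
import math

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def local_frame(ca, i):
    """Hypothetical orthonormal frame at residue i from CA(i-1), CA(i), CA(i+1).
    Needs 0 < i < len(ca) - 1 and the three atoms not collinear."""
    e1 = unit(sub(ca[i + 1], ca[i - 1]))                         # along the chain
    bis = [p + q for p, q in zip(sub(ca[i - 1], ca[i]), sub(ca[i + 1], ca[i]))]
    along = dot(bis, e1)
    e2 = unit([b - along * e for b, e in zip(bis, e1)])          # bisector made orthogonal to e1
    e3 = cross(e1, e2)                                           # completes a right-handed frame
    return e1, e2, e3

def view(ca, i):
    """View from residue i: vectors CA(i) -> CA(j), j != i, in residue i's local frame."""
    frame = local_frame(ca, i)
    return {j: [dot(sub(ca[j], ca[i]), e) for e in frame]
            for j in range(len(ca)) if j != i}
```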
Slide 42: SSAP alignment views
i and j must have similar backbone angles, otherwise the score is zero.
[Figure: the view for template residue i and the view for target residue j are compared to fill a residue-level score matrix.]
The difference between the two views is a measure of how similar the structures are, when viewed from i and j.
Slide 43: SSAP alignment: double dynamic programming using views
[Figure: residue-level score matrix.]
For each ij pair, we find the best DP alignment that includes it. How? First, get the global DP starting at (0,0) and ending at (i,j). Then get the global DP starting at (i,j) and ending at the lower right-hand corner. Keep the total DP score at (i,j).
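The forward-plus-backward trick described above, as a Python sketch. It uses a single residue-level score matrix and a linear gap penalty (both simplifying assumptions; SSAP itself builds and scores its matrices differently). The backward pass is just the forward pass run on the matrix reversed in both directions, and the total at (i, j) is the best prefix score, plus the match score at (i, j), plus the best suffix score.

```python
def nw_forward(s, gap):
    """F[i][j]: best global DP score aligning the first i rows with the first j columns."""
    n, m = len(s), len(s[0])
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = -gap * i
    for j in range(1, m + 1):
        f[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + s[i - 1][j - 1],   # match row i-1 with column j-1
                          f[i - 1][j] - gap,                   # gap
                          f[i][j - 1] - gap)                   # gap
    return f

def best_through_each_cell(s, gap=1.0):
    """Score of the best global alignment forced to match row i with column j."""
    n, m = len(s), len(s[0])
    fwd = nw_forward(s, gap)                                # (0,0) -> (i,j)
    bwd = nw_forward([row[::-1] for row in s[::-1]], gap)   # (i,j) -> lower right, reversed
    return [[fwd[i][j] + s[i][j] + bwd[n - 1 - i][m - 1 - j] for j in range(m)]
            for i in range(n)]
```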
Slide 44: SSAP alignment: summary matrix
[Figure: the DP score for each ij goes into the summary matrix; DP through the summary matrix gives 1 single alignment.]
The summary matrix is made up of DP scores for each ij position. Then a second round of DP is run through the summary matrix to give the optimal alignment.
The final SSAP score is compared to an identity match and scaled to the range 0-100, where 100 is an identical match.
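The second round can be sketched as an ordinary global DP with traceback over the summary matrix. The 0-100 scaling below, a plain percentage of the identity-match score clipped to the range, is a hypothetical simplification; SSAP's actual normalization is not reproduced here. The identity score could be obtained by aligning a structure with itself.

```python
def align_summary(summary, gap=1.0):
    """Global DP (linear gap penalty) through the summary matrix.
    Returns the optimal score and the aligned (i, j) pairs, 0-based."""
    n, m = len(summary), len(summary[0])
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = -gap * i
    for j in range(1, m + 1):
        f[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + summary[i - 1][j - 1],   # align i with j
                          f[i - 1][j] - gap,                         # gap in sequence B
                          f[i][j - 1] - gap)                         # gap in sequence A
    # Trace back from the lower right-hand corner to recover one optimal path.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if f[i][j] == f[i - 1][j - 1] + summary[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif f[i][j] == f[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return f[n][m], pairs[::-1]

def scaled_score(score, identity_score):
    """Hypothetical 0-100 scaling: score as a percentage of the identity-match score."""
    return max(0.0, min(100.0, 100.0 * score / identity_score))
```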
Slide 45: Two other servers for structure-based alignment
CE (Combinatorial Extension): http://cl.sdsc.edu/ce.html
VAST: http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml
Geometric--intermolecular
Slide 46: CE interface
Slide 47: CE alignment
Slide 48: CE alignment of 15 analogs
[Figure labels: helix, sheet]
Slide 49: How to use CE to find and align structural homologs
- Set your browser to http://cl.sdsc.edu/ce.html
- Find structural alignments by selecting from ALL or REPRESENTATIVES from the PDB. Submit your protein and chain (or use 4dfrA).
- Select 2 structures (or use 4DFR and 1YAC), then hit Get alignment.
- Download as PDB file. Save it.
- For use in MOE, divide the file into 2, one for each protein.
Slide 50: Non-sequential similarities exist!
[Figure: SCALI non-sequential alignment of 1alk and 1vpt, with the structural segments of each protein numbered.]
Yuan & Bystroff, 2005
Slide 51: Uses of structural alignment in modeling
- A structure-based alignment is the gold standard for a sequence alignment.
- Aligned structures tell you where insertions/deletions are allowed or not allowed.
- Structural analogs provide plausible loop structures.
- Multiple aligned structures show the extent of structural variability, given a fold.