Title: Exploring 3D Molecular Structures Using NCBI Tools
Lecture 2: Alignments in Cn3D
June 10, 2008
The Cn3D Alignment Model
- Each sequence is aligned pairwise to the master PDB sequence
- Aligned blocks represent secondary structure elements
- Aligned blocks have no internal gaps
- Aligned sequences have a residue at each column in the block
- Residues in the same column occupy the same position in space
[Figure: alignment with three aligned blocks: Block 1, Block 2, Block 3]
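A minimal Python sketch of the block model described above. The `Block` class and the two functions are hypothetical names for illustration, not Cn3D's internal data structures; the sketch only encodes the slide's rules that blocks are ungapped, stay in sequence order, and pair one residue from each sequence in every column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One ungapped aligned block (hypothetical representation).

    `length` consecutive master residues starting at `master_start` are
    paired with `length` consecutive residues of the aligned sequence
    starting at `seq_start`. Indices are 0-based.
    """
    master_start: int
    seq_start: int
    length: int


def check_blocks(blocks, master_len, seq_len):
    """Raise ValueError unless the blocks are in N-to-C order on both
    sequences, do not overlap, and lie inside both sequences."""
    master_end = seq_end = 0  # first free index after the previous block
    for b in blocks:
        if b.length <= 0:
            raise ValueError("block length must be positive")
        if b.master_start < master_end or b.seq_start < seq_end:
            raise ValueError("blocks overlap or are out of sequence order")
        master_end = b.master_start + b.length
        seq_end = b.seq_start + b.length
        if master_end > master_len or seq_end > seq_len:
            raise ValueError("block extends past the end of a sequence")


def aligned_pairs(blocks):
    """Yield (master_index, seq_index) for every column of every block.
    Because blocks are ungapped, each column pairs exactly one residue
    from each sequence."""
    for b in blocks:
        for k in range(b.length):
            yield b.master_start + k, b.seq_start + k
```

For example, `[Block(2, 0, 4), Block(10, 6, 3)]` passes `check_blocks(blocks, 20, 12)`, and `aligned_pairs` then pairs master residues 2 to 5 and 10 to 12 with residues 0 to 3 and 6 to 8 of the aligned sequence.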
What Cn3D Can Do
- Render, rotate, and annotate multiple structure models
- Create and edit multiple sequence alignments
- Import and align sequences and structures based on an existing alignment
What Cn3D Can't Do
- Alter the structural coordinate data
- Create a theoretical structure or run molecular dynamics (MD) simulations
- Read or write PDB files directly
Finding Structures by Homology
- Use simple sequence homology (BLASTp)
  - Finds a pairwise alignment based on sequence similarity
- Use sequence and functional homology (CD-Search)
  - Finds a multiple sequence alignment based on sequence similarity
- Use structural homology (VAST)
  - Finds a multiple sequence alignment based on structural similarity
- Use sequence, structure, and function (Curated CDs)
  - Finds a multiple sequence alignment based on sequence and structural similarity
VAST: Creating Structural Alignments
- Why search for similar structures?
  - To superimpose structures
  - To find homologs that sequence searches cannot: distant protein homologs often conserve structure more strongly than sequence
  - To explore protein evolution: similar protein folds can be used to support different functions
  - To identify conserved core elements of a protein fold that can be used to model related proteins of unknown structure
VAST Structural Neighbors
Vector Alignment Search Tool
For each 3D domain, locate SSEs (secondary structure elements) and represent them as individual vectors.
[Figure: human IL-4 with SSEs 1 to 6 drawn as vectors]
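One way to turn an SSE into a vector, sketched in Python under a stated assumption: each vector runs from the centroid of the first few Cα atoms to the centroid of the last few. This is a common approximation of a helix axis, not necessarily how VAST fits its vectors; `sse_vector` and its `window` parameter are hypothetical.

```python
import math


def _mean(points):
    """Centroid of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def sse_vector(ca_coords, window=4):
    """Approximate one SSE as a vector from its C-alpha coordinates.

    Assumption (not VAST's published procedure): the start and end points
    are the centroids of the first and last `window` C-alpha atoms, which
    averages out most of the helical twist. The window is shrunk for short
    SSEs so the two ends never use the same atoms.
    Returns (start, end, midpoint, unit_direction).
    """
    if len(ca_coords) < 2:
        raise ValueError("an SSE needs at least two C-alpha atoms")
    w = max(1, min(window, len(ca_coords) // 2))
    start = _mean(ca_coords[:w])
    end = _mean(ca_coords[-w:])
    mid = tuple((s + e) / 2 for s, e in zip(start, end))
    length = math.dist(start, end)
    if length == 0:
        raise ValueError("degenerate SSE: start and end coincide")
    direction = tuple((e - s) / length for s, e in zip(start, end))
    return start, end, mid, direction
```

For an idealized helix running along z, the returned direction is close to (0, 0, 1), and the midpoint is the point used in the next steps.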
VAST: Calculate f_ij
Vector position about the z axis
For both the query and target structures, calculate the midpoint of each SSE. For each SSE k, align k along z and project the midpoints onto the xy plane. Then calculate f_ij^k for i ≠ k, j ≠ k.
[Figure: IL-4 SSEs 1 to 6 viewed down the z axis after projection onto the xy plane, with f_14 marked]
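A sketch of the f_ij^k calculation, assuming f_ij^k is the signed angle between the projected midpoints of SSEs i and j, measured about the axis of SSE k, with that axis passing through k's midpoint. Rather than rotating k onto z explicitly, the code projects onto the plane perpendicular to k, which gives the same angle. `f_angle` is a hypothetical name, and VAST's exact conventions (sign, origin) may differ.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def f_angle(mid_i, mid_j, mid_k, dir_k):
    """Signed angle (degrees) between the midpoints of SSEs i and j,
    measured about the axis of SSE k.

    Projecting onto the plane perpendicular to dir_k is equivalent to
    rotating k onto the z axis and projecting onto the xy plane.
    Assumptions: dir_k is a unit vector and the axis passes through mid_k.
    """
    def perp(m):
        # Remove the component along k's axis, leaving the "xy" part.
        d = _sub(m, mid_k)
        t = _dot(d, dir_k)
        return tuple(x - t * u for x, u in zip(d, dir_k))

    a, b = perp(mid_i), perp(mid_j)
    # The component of a x b along the axis is the sine term and a . b the
    # cosine term, so atan2 returns the signed angle from a to b.
    return math.degrees(math.atan2(_dot(dir_k, _cross(a, b)), _dot(a, b)))
```

With k along z through the origin, midpoints (1, 0, 5) and (0, 1, -3) project to (1, 0) and (0, 1), so `f_angle` returns 90 degrees, or -90 with i and j swapped.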
VAST: Calculate (r_ik, z_ik)
Vector position relative to the xy plane
For both the query and target structures, for each SSE k, set the origin at the midpoint of k. Then calculate r_ik and z_ik for the endpoints of SSEs i ≠ k.
[Figure: SSEs 1 and 3 with r_13 and z_13 measured relative to the xy plane]
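Continuing in the same frame, with k along z and the origin at k's midpoint, (r_ik, z_ik) can be read as cylindrical coordinates of an endpoint of SSE i. The sketch below assumes r is the distance from k's axis and z the signed height along it; `r_z` is a hypothetical name.

```python
import math


def r_z(endpoint_i, mid_k, dir_k):
    """Cylindrical coordinates (r, z) of an endpoint of SSE i in the frame
    of SSE k: origin at k's midpoint, z along k's axis.

    Assumption: dir_k is a unit vector; r is the distance from k's axis and
    z is the signed height along it.
    """
    d = tuple(e - m for e, m in zip(endpoint_i, mid_k))
    z = sum(x * u for x, u in zip(d, dir_k))            # height along k
    radial = tuple(x - z * u for x, u in zip(d, dir_k))  # part in the xy plane
    r = math.hypot(*radial)
    return r, z
```

For an endpoint at (3, 4, 2) and k along z through the origin, `r_z` returns (5.0, 2.0). Calling it on both endpoints of every SSE i ≠ k gives the per-k descriptors that are compared between query and target.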
VAST: Create Comparison Graph
Nodes: compare r_13 ↔ r_12 and z_13 ↔ z_12
Arcs: compare f_16 ↔ f_15; arcs must follow sequence order
Select the path with the highest weights
[Figure: comparison graph between IL-4 (SSEs 1 to 6) and IL-6 (SSEs 1 to 5), each drawn from N to C]
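The "select the path with the highest weights" step can be illustrated with a small dynamic program over SSE-pair nodes. This is a swapped-in technique for illustration and does not reproduce VAST's actual graph search; the node and arc weights are hypothetical numbers standing in for scores derived from the (r, z) and f comparisons, and `best_path` is a hypothetical name.

```python
def best_path(node_weights, arc_weight):
    """Highest-weight chain of SSE-pair nodes that respects sequence order.

    node_weights: dict mapping (i, j) -> weight, where i indexes an SSE of
        the query and j an SSE of the target (hypothetical scores).
    arc_weight: function (a, b) -> weight of the arc from node a to node b.
    Node b may follow node a only if a[0] < b[0] and a[1] < b[1].
    Returns (score, path).
    """
    nodes = sorted(node_weights)  # lexicographic (i, j) order
    if not nodes:
        return 0.0, []
    best, prev = {}, {}
    for b in nodes:
        best[b], prev[b] = node_weights[b], None  # a path may start at b
        for a in nodes:
            # Any a with a[0] < b[0] precedes b in sorted order, so best[a]
            # is already final when it is used here.
            if a[0] < b[0] and a[1] < b[1]:
                cand = best[a] + arc_weight(a, b) + node_weights[b]
                if cand > best[b]:
                    best[b], prev[b] = cand, a
    end = max(nodes, key=lambda n: best[n])
    score, path = best[end], []
    node = end
    while node is not None:
        path.append(node)
        node = prev[node]
    return score, path[::-1]
```

With nodes {(1, 1): 2.0, (2, 1): 3.0, (2, 2): 1.5, (3, 3): 2.0} and every arc weighted 0.5, the best path is (1, 1) → (2, 2) → (3, 3) with weight 6.5; the heavier node (2, 1) is skipped because it cannot be followed in sequence order by as much.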
VAST Refinement
- Cα atoms are added to the aligned SSEs
- Alignments are allowed to extend beyond SSE boundaries
- All atoms are added to the models, and the detailed backbone and sidechain positions are refined
[Figure: aligned residues shown in red; the alignment is extended to the end of a strand]
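The extension step can be pictured with a simple greedy rule, sketched here as an assumption rather than VAST's actual refinement: after the structures are superimposed, keep adding the next residue pair past the end of a block while their Cα atoms lie within a distance cutoff. `extend_block` and the 3.0 Å default cutoff are hypothetical.

```python
import math


def extend_block(query_ca, target_ca, q_end, t_end, cutoff=3.0):
    """Greedily extend an aligned block past its C-terminal end.

    query_ca, target_ca: lists of (x, y, z) C-alpha coordinates, assumed to
        be already superimposed in a common frame.
    q_end, t_end: 0-based indices of the last aligned residue pair.
    cutoff: hypothetical distance threshold in angstroms.
    Returns the new (q_end, t_end) after extension.
    """
    while (q_end + 1 < len(query_ca) and t_end + 1 < len(target_ca)
           and math.dist(query_ca[q_end + 1], target_ca[t_end + 1]) <= cutoff):
        q_end += 1
        t_end += 1
    return q_end, t_end
```

The same rule run toward the N-terminus would extend the block's start; either way, the extension can carry a block past the boundary of the original SSE.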
VAST Alignment of Sequence
- Aligned blocks represent structural core elements
- Aligned blocks have no internal gaps
- Aligned residues occupy the same position in space
- Aligned residues are shown in CAPITAL letters
[Figure: VAST sequence alignment with blocks for Helix 1, Helix 2, Helix 3, and Helix 4]
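The capital-letter convention is easy to reproduce. This sketch assumes each aligned block on a sequence is given as a (start, length) pair with 0-based indices; `show_alignment_case` is a hypothetical helper.

```python
def show_alignment_case(seq, blocks):
    """Return seq with aligned residues in upper case and all others in
    lower case.

    blocks: list of (start, length) pairs, 0-based, describing the ungapped
    aligned blocks on this sequence (hypothetical representation).
    """
    aligned = set()
    for start, length in blocks:
        aligned.update(range(start, start + length))
    return "".join(c.upper() if i in aligned else c.lower()
                   for i, c in enumerate(seq))
```

For example, `show_alignment_case("MKTAYIAKQR", [(1, 3), (6, 2)])` returns `"mKTAyiAKqr"`.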
VAST Summary
- Secondary structure elements are represented as vectors and aligned based on their relative orientations
  - VAST ignores loops and tolerates variation in SSE length
  - The initial alignment uses only the SSE vectors, not individual atomic coordinates
- Pathways through aligned SSEs respect sequence order
  - VAST is sensitive to topology
- Alignments are extended and optimized using all-atom models
  - Aligned blocks may extend across or into loops or other SSEs
[Figure: SSE topology diagrams with N and C termini marked]
VAST Scoring
$p = d \cdot P(s > s_0, n) \cdot c(n, P_1, P_2)$
- $p$: the probability that the VAST alignment occurred by chance
- $d$: number of structures searched (set to 500)
- $P(s > s_0, n)$: probability of observing an alignment of $n$ SSEs with a score greater than $s_0$ by chance
- $c(n, P_1, P_2)$: search space, the number of possible alignments of $n$ SSEs between vector sets $P_1$ and $P_2$
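A worked version of the scoring formula in Python. The slide does not say how c(n, P_1, P_2) is counted; the sketch assumes one simple count for order-preserving alignments: choose n SSEs from each vector set and pair them in N-to-C order. `search_space`, `vast_p`, and the example values are hypothetical.

```python
from math import comb


def search_space(n, m1, m2):
    """Hypothetical count of order-preserving alignments of n SSEs between
    vector sets P1 (m1 SSEs) and P2 (m2 SSEs): choose n SSEs from each set,
    then pair them in N-to-C order. VAST's actual c(n, P1, P2) may differ."""
    return comb(m1, n) * comb(m2, n)


def vast_p(p_tail, n, m1, m2, d=500):
    """p = d * P(s > s0, n) * c(n, P1, P2), where p_tail is P(s > s0, n)
    and d is the number of structures searched (500 on the slide)."""
    return d * p_tail * search_space(n, m1, m2)
```

With a hypothetical P(s > s_0, n) = 1e-9 and n = 5 aligned SSEs against a 6-SSE target, a 20-SSE chain query gives c = 15504 × 6 = 93024 and p ≈ 0.047, while a 6-SSE domain query gives c = 6 × 6 = 36 and p ≈ 1.8e-5. A smaller query therefore makes the same alignment look more significant, in line with the chain vs. 3D domain slide.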
Accessing VAST Neighbors
[Figure: links to VAST neighbors]
VAST Neighbor View
[Figure: links to structure-based sequence alignments]
Query by Chain vs. 3D Domain
c(n, P1, P2) is smaller for a 3D domain!
[Figure: VAST results for a query by whole chain and a query by domain 5; a neighbor found by the domain query is not found using the whole-chain query]
VAST Multiple Alignments
[Figure: VAST multiple alignment displayed in Cn3D]
Entrez Links to VAST Neighbors
Limiting VAST results by an Entrez query in 3D Domains:
3 AND human[orgn] AND 4[helixcount] AND 0[domainno]
173 VAST neighbors
Submitting a PDB File to VAST
- Redesigned interface!
- This is the best way to convert PDB into MMDB format!
Structure and Function
- VAST finds proteins that have similar 3D folds
- CD-Search finds proteins that have similar sequences and similar functions
- Curated CDs = VAST + CD-Search
  - Proteins that have similar 3D folds, similar sequences, and similar functions
Curating CDs with VAST
[Figure: workflow linking smart00235 and cd00203 through VAST, shown in Cn3D]
CD Curation: Effect on Model Alignment Accuracy
A. Marchler-Bauer
cd00659: A Curated CD
[Figure: CD hierarchy with functional features annotated, showing a parent CD and the child CD cd00659]
CD Family Values
Residues aligned in the parent must be aligned in the child
- Parent: cd00397, C-terminal catalytic domain of DNA breaking-rejoining enzymes (164 columns)
- Child: cd00659, C-terminal catalytic domain of DNA topoisomerase IB (218 columns)
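The family rule, residues aligned in the parent must be aligned in the child, amounts to a subset check per sequence. This sketch assumes each alignment is summarized as a mapping from sequence identifier to the set of residue indices that fall in aligned columns; `missing_in_child` and the identifiers in the example are hypothetical.

```python
def missing_in_child(parent_aligned, child_aligned):
    """Return residues aligned in the parent CD but not in the child CD.

    Each argument maps a sequence identifier to the set of 0-based residue
    indices of that sequence that fall in aligned columns (hypothetical
    representation). An empty result means the child respects the rule.
    """
    missing = {}
    for seq_id, residues in parent_aligned.items():
        lost = residues - child_aligned.get(seq_id, set())
        if lost:
            missing[seq_id] = sorted(lost)
    return missing
```

A child can add columns (here, 218 versus the parent's 164) but should never drop a residue that the parent aligns.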
Curated CD Summary
[Figure: Cn3D view of a curated CD with an aligned query; catalytic residues highlighted]
A Path to a Structural Template
- Look for a curated CD
  - CD-Search: you're done if you find one; otherwise, continue.
- Construct a structural alignment
  - BLASTp (Related Structures): find the most sequence-similar structure to the query
  - VAST: find the structural neighbors of the most sequence-similar structure
  - Cn3D: import the query sequence and align it to the VAST alignment using the algorithms in Cn3D
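The decision path above, written as a short Python sketch. The tool steps are passed in as callables because the real tools are web services; their names and return types here are hypothetical placeholders, not NCBI APIs.

```python
def find_template(query, cd_search, blastp_pdb, vast_neighbors):
    """Decision flow for choosing a structural template.

    The three callables are hypothetical stand-ins for the web tools:
      cd_search(query)    -> a curated CD accession, or None
      blastp_pdb(query)   -> the most sequence-similar PDB structure
      vast_neighbors(pdb) -> list of VAST structural neighbors
    Returns a (kind, result) tuple.
    """
    cd = cd_search(query)
    if cd is not None:
        return "curated_cd", cd  # done: use the curated CD alignment
    best = blastp_pdb(query)
    # The final step (importing the query into Cn3D and aligning it to the
    # VAST alignment) is interactive and is not modeled here.
    return "vast_alignment", (best, vast_neighbors(best))
```

With a `cd_search` stand-in that returns None, the function falls through to the structural-alignment branch and returns the BLASTp hit together with its VAST neighbors.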
Importing into Cn3D
Sequences are initially unaligned, with red regions indicating blocks
[Figure: Cn3D import window with the master sequence and imported sequences]
Cn3D Alignment Algorithms
[Figure: Cn3D alignment algorithm options, numbered 1 to 3]
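Cn3D's alignment algorithms are not described on this slide. As an illustration only, the sketch below places fixed-length, ungapped blocks on a query by dynamic programming, keeping blocks in order and enforcing a minimum loop length between them. Scoring by identical residues is a stand-in (an assumption) for whatever scoring Cn3D actually uses; `place_blocks` and `min_loops` are hypothetical.

```python
def place_blocks(query, blocks, min_loops):
    """Place ungapped template blocks on a query sequence.

    query: query sequence string.
    blocks: master residues of each block, in N-to-C order.
    min_loops: min_loops[b] is the minimum number of query residues
        required between block b and block b + 1 (assumed given).
    Scores a placement by the number of identical residues (a stand-in for a
    real substitution matrix or PSSM). Returns (score, starts) or None if
    the blocks cannot fit.
    """
    if not blocks:
        return 0, []
    if not query:
        return None
    n = len(query)
    neg = float("-inf")

    def score(block, p):
        return sum(a == b for a, b in zip(block, query[p:p + len(block)]))

    best = None  # best[p]: best total with the current block starting at p
    back = []    # back[b][p]: start of block b - 1 on that best placement
    for bi, block in enumerate(blocks):
        cur = [neg] * n
        ptr = [None] * n
        for p in range(n - len(block) + 1):
            if bi == 0:
                cur[p] = score(block, p)
                continue
            # Latest start of the previous block that still leaves the loop.
            last = p - min_loops[bi - 1] - len(blocks[bi - 1])
            q_best, v_best = None, neg
            for q in range(last + 1):
                if best[q] > v_best:
                    q_best, v_best = q, best[q]
            if q_best is not None:
                cur[p] = v_best + score(block, p)
                ptr[p] = q_best
        best = cur
        back.append(ptr)
    end = max(range(n), key=lambda p: best[p])
    if best[end] == neg:
        return None
    starts = [end]
    for bi in range(len(blocks) - 1, 0, -1):
        starts.append(back[bi][starts[-1]])
    return best[end], starts[::-1]
```

For the query "AAHELIXAAAASTRANDAA", blocks ["HELIX", "STRAND"], and a minimum loop of 2, the sketch returns (11, [2, 11]); with a minimum loop of 20 the blocks cannot fit and it returns None.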
Identifying Alignment Problems
- Block errors: indicated by red shading. These result when the extent of an aligned block in the import window differs from that in the template.
- Geometry violations: indicated by green shading. These result when a loop between aligned blocks in the import window is too short to span the distance between the block ends, based on the master structure in the template.
Fix these problems by adjusting the block lengths!
[Figure: import window with a block error and a geometry violation marked]
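Both problem types can be checked numerically. The sketch assumes a block error means a block's length differs from the template, and treats a geometry violation as a loop too short to bridge the master structure's Cα–Cα distance between block ends, using about 3.8 Å per residue step for a fully extended chain. Cn3D's actual thresholds are not given here; the function names are hypothetical.

```python
import math

# Approximate distance between consecutive C-alpha atoms in a trans
# peptide chain, in angstroms.
CA_CA = 3.8


def geometry_violation(end_ca, start_ca, loop_len):
    """Return True if loop_len residues cannot bridge two block ends.

    end_ca: C-alpha (x, y, z) of the last residue of one block, taken from
        the master structure.
    start_ca: C-alpha of the first residue of the next block.
    Assumption: a fully extended loop of loop_len residues spans at most
    (loop_len + 1) * CA_CA angstroms.
    """
    return math.dist(end_ca, start_ca) > (loop_len + 1) * CA_CA


def block_errors(template_lengths, import_lengths):
    """Return indices of blocks whose length in the import window differs
    from the template (both lists assumed to describe the same blocks)."""
    return [i for i, (t, m) in enumerate(zip(template_lengths, import_lengths))
            if t != m]
```

Block ends 20 Å apart cannot be bridged by a 3-residue loop (at most 4 × 3.8 = 15.2 Å) but can be by a 5-residue loop (up to 22.8 Å).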
Trial with NP_001058
Human topoisomerase IIα
[Figure: CD-Search results for NP_001058, showing pfam02518 and curated CDs]
Step 1: Related Structures
[Figure: related-structure results with pfam02518 marked]
1ZXM: the most sequence-similar structure
Step 2: VAST Neighbors of 1ZXM
[Figure: Cn3D view of the VAST neighbors of 1ZXM, with pfam02518 marked and three regions labeled "?"]
For More Information
- Course web pages
- info@ncbi.nlm.nih.gov
- NCBI Handbook, Ch. 3
- NCBI Bookshelf: Bioinformatics in Tropical Disease Research