Title: TEXTAL™: Artificial Intelligence Techniques for Protein Structure Determination
Kreshna Gopal, Reetal Pai, Thomas R. Ioerger,
Tod D. Romo, James C. Sacchettini
Texas A&M University
IAAI 2003, Acapulco, Mexico
Overview
- CAPRA
- LOOKUP
- Post-processing Routines
- Results
- Discussion
- TEXTAL Availability
- Acknowledgements
CAPRA: Cα Pattern Recognition Algorithm
- Scale the input map to enable comparison of density patterns between different maps
- Trace the map to obtain a connected skeleton of pseudo-atoms through the backbone and the side chains
- Calculate feature vectors at each of the tracer's pseudo-atoms
[Figure: Map → Scale → Tracer → Calculate features]
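The slide does not say how the map is scaled. A minimal sketch, assuming a simple normalization to zero mean and unit standard deviation (the "sigma units" convention common in crystallography), with the 3D map flattened to a list of grid values; the function name `scale_map` is hypothetical:

```python
from statistics import fmean, pstdev

def scale_map(density):
    """Rescale a flat list of density values to mean 0 and standard deviation 1.

    Assumption: z-score scaling stands in for whatever scaling TEXTAL applies.
    """
    mu = fmean(density)
    sigma = pstdev(density, mu)
    if sigma == 0:
        raise ValueError("map has constant density; cannot scale")
    return [(rho - mu) / sigma for rho in density]
```

After scaling, a value of 1.0 means "one standard deviation above the map's mean", so density patterns from maps with different absolute scales can be compared directly.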
CAPRA: Cα Pattern Recognition Algorithm
- The neural network has
  - one input layer of 38 nodes
  - one hidden layer of 20 nodes
  - one output node
- The neural network associates characteristics of the local density with an estimate of proximity to a true Cα
- The neural network predicts the distance from each pseudo-atom to the nearest true Cα
[Figure: Trained network parameters → Neural Network]
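A minimal sketch of a forward pass through a network with the layer sizes on this slide (38 inputs, 20 hidden nodes, 1 output). The sigmoid hidden activation, the linear output node, and the randomly initialized parameters are assumptions for illustration; they are not CAPRA's trained network, and the names `init_params` and `predict_distance` are hypothetical:

```python
import math
import random

N_INPUT, N_HIDDEN = 38, 20  # layer sizes from the slide; there is one output node

def sigmoid(x):
    # Logistic function written via tanh, which avoids overflow for large |x|.
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def init_params(seed=0):
    """Small random parameters. Hypothetical: a real network's are learned in training."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUT)] for _ in range(N_HIDDEN)]
    b1 = [0.0] * N_HIDDEN
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
    b2 = 0.0
    return w1, b1, w2, b2

def predict_distance(features, params):
    """Map one 38-value feature vector to a predicted distance to the nearest true Cα."""
    if len(features) != N_INPUT:
        raise ValueError(f"expected {N_INPUT} features, got {len(features)}")
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2  # linear output node
```

Usage: `predict_distance(feature_vector, init_params())` returns one number per pseudo-atom; with trained parameters, small values would mark pseudo-atoms likely to sit near a true Cα.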
CAPRA: Cα Pattern Recognition Algorithm
- Selection of way-points
  - Select candidate Cαs from all the pseudo-atoms in the trace
  - Use the distance predictions from the neural network
- Build chains
  - Link the candidate Cαs into linear chains
  - Integrate intuitive criteria
  - Identify the linearized substructure of the tracer graph
[Figure: Selection of way-points → Build chains → Cα chains]
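The two steps above can be sketched as follows. This is a hypothetical scheme, not necessarily CAPRA's: way-points are chosen greedily in order of increasing predicted distance with a minimum separation (3.0 Å here, an assumed value), and chains are built by repeatedly stepping to the unused candidate whose distance is closest to 3.8 Å, the typical spacing of consecutive Cα atoms in a polypeptide. The tolerance `tol` and the function names are illustrative assumptions.

```python
import math

CA_CA = 3.8  # typical distance (Å) between consecutive Cα atoms

def select_waypoints(atoms, predicted, min_sep=3.0):
    """Greedy selection (hypothetical): visit pseudo-atoms from smallest to largest
    predicted distance to a true Cα, keeping each one that is at least min_sep Å
    from every candidate kept so far. Returns indices into `atoms`."""
    order = sorted(range(len(atoms)), key=lambda i: predicted[i])
    chosen = []
    for i in order:
        if all(math.dist(atoms[i], atoms[j]) >= min_sep for j in chosen):
            chosen.append(i)
    return chosen

def link_chains(points, tol=0.5):
    """Link candidate Cαs into unbranched chains (hypothetical): grow each chain from
    its tail, then from its head, by adding the unused point whose distance from the
    chain end is within tol of 3.8 Å and closest to it."""
    unused = set(range(len(points)))
    chains = []
    while unused:
        start = min(unused)
        unused.remove(start)
        chain = [start]
        for at_tail in (True, False):
            while True:
                tip = chain[-1] if at_tail else chain[0]
                gap = {j: abs(math.dist(points[tip], points[j]) - CA_CA) for j in unused}
                near = [j for j, g in gap.items() if g <= tol]
                if not near:
                    break
                j = min(near, key=gap.get)
                unused.remove(j)
                if at_tail:
                    chain.append(j)
                else:
                    chain.insert(0, j)
        chains.append(chain)
    return chains
```

Because each point is used at most once, every chain is a simple path; an isolated point ends up as a chain of length one.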
LOOKUP: The Core Pattern-Matching Routine
- Predicts the coordinates of side-chain atoms given the location of a Cα atom
- Uses features to identify regions with similar patterns of density
  - Features are rotation-invariant
  - Feature values are calculated at different radii
- The database consists of feature vectors from regions within previously solved maps
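As an illustration of rotation-invariant features computed at several radii, the sketch below uses three simple ones: the mean and standard deviation of density inside a sphere around a point, and the distance from that point to the center of mass of the (positive) density in the sphere. None of these changes when the density is rotated about the sphere's center, up to grid sampling. The feature choice, the radii of 3 to 6 Å, and the name `sphere_features` are assumptions; they stand in for TEXTAL's own features rather than reproduce them.

```python
import math
from statistics import fmean, pstdev

def sphere_features(grid, center, radii=(3.0, 4.0, 5.0, 6.0)):
    """grid: list of ((x, y, z), density) samples; center: (x, y, z).

    For each radius, appends [mean density, std. dev. of density,
    distance from center to the density center of mass] (hypothetical features).
    """
    features = []
    for r in radii:
        inside = [(p, rho) for p, rho in grid if math.dist(p, center) <= r]
        if not inside:
            raise ValueError(f"no grid points within {r} Å of the center")
        values = [rho for _, rho in inside]
        # Center of mass of positive density only (negative values clipped to 0).
        mass = [max(rho, 0.0) for rho in values]
        total = sum(mass)
        if total > 0:
            com = [sum(p[k] * m for (p, _), m in zip(inside, mass)) / total
                   for k in range(3)]
            com_dist = math.dist(com, center)
        else:
            com_dist = 0.0
        features.extend([fmean(values), pstdev(values), com_dist])
    return features
```

With four radii and three features per radius, this sketch yields a 12-value vector per point; computing the same features at several radii captures both the immediate and the wider neighborhood of the point.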
LOOKUP: The Core Pattern-Matching Routine
- TEXTAL uses a weighted Euclidean distance between feature vectors as its measure of similarity
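One common form of weighted Euclidean distance between feature vectors F and G with non-negative weights w_i is d(F, G) = sqrt(Σ_i w_i (F_i - G_i)²). The exact formula TEXTAL uses is not shown here, so treat this form as an assumption. The sketch pairs it with a nearest-neighbor search over a database of (feature vector, payload) entries; `weighted_distance`, `lookup`, and the payload labels are hypothetical.

```python
import math

def weighted_distance(f1, f2, weights):
    """sqrt(sum_i w_i * (f1_i - f2_i)**2), assuming non-negative weights."""
    if not len(f1) == len(f2) == len(weights):
        raise ValueError("feature vectors and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, f1, f2)))

def lookup(query, database, weights):
    """Return the (feature_vector, payload) entry closest to `query`.

    The payload is hypothetical, e.g. the atoms of the solved region it came from.
    """
    return min(database, key=lambda entry: weighted_distance(query, entry[0], weights))

# Example: the weights decide which stored region counts as most similar.
db = [([0.0, 5.0], "region A"), ([3.0, 0.0], "region B")]
print(lookup([0.0, 0.0], db, [1.0, 1.0])[1])  # both features count
print(lookup([0.0, 0.0], db, [1.0, 0.0])[1])  # second feature ignored
```

With equal weights region B is closer (3 vs 5); with the second feature ignored, region A matches exactly. This sensitivity is why tuning the weights matters.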
LOOKUP: SLIDER
- SLIDER incrementally adjusts feature weights to increase matches and decrease mismatches
- For each region, a match and a mismatch are found, and these (region, match, mismatch) 3-tuples are used to tune the weights
- Each weight is optimized by finding the value between 0 and 1 that produces the most positive crossovers
- SLIDER is not guaranteed to find the globally optimal weight vector, only a local optimum
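A minimal sketch of sliding a single weight, assuming the squared weighted distance d² = Σ_i w_i (x_i - y_i)². With all other weights fixed, the squared distances from a region to its match and to its mismatch are both linear in w_f, so each triple has at most one crossover, where the two distances are equal. Between crossovers no triple changes order, so it suffices to test one value per interval. The sketch scores each candidate by the number of triples in which the match is closer, which is the same as maximizing the net number of positive crossovers passed on the way from w_f = 0; ties go to the value nearest the current weight. The function name and tie-breaking rule are hypothetical.

```python
def slide_weight(triples, weights, f):
    """Return a value in [0, 1] for weights[f] that maximizes the number of
    (region, match, mismatch) triples whose match is closer than the mismatch."""

    def split(x, y):
        # Squared weighted distance = a + weights[f] * b, where a collects every
        # feature except f and b is the squared difference in feature f.
        a = sum(w * (xi - yi) ** 2
                for i, (w, xi, yi) in enumerate(zip(weights, x, y)) if i != f)
        return a, (x[f] - y[f]) ** 2

    lines = []
    points = {0.0, 1.0}
    for region, match, mismatch in triples:
        am, bm = split(region, match)
        an, bn = split(region, mismatch)
        lines.append((am, bm, an, bn))
        if bm != bn:
            w = (an - am) / (bm - bn)  # crossover: both distances equal here
            if 0.0 < w < 1.0:
                points.add(w)

    ordered = sorted(points)
    # Interval endpoints plus one interior point (the midpoint) per interval.
    candidates = ordered + [(p + q) / 2 for p, q in zip(ordered, ordered[1:])]

    def score(w):
        return sum(am + w * bm < an + w * bn for am, bm, an, bn in lines)

    current = weights[f]
    return max(candidates, key=lambda w: (score(w), -abs(w - current)))
```

Applying `slide_weight` to each feature in turn, and repeating, is a coordinate-wise search, which is consistent with SLIDER reaching only a local optimum rather than a global one.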
Post-processing Routines
- Correcting errors in stereochemistry
  - Identify residues whose backbone atoms point in the wrong direction
  - Identify the correct backbone direction using a voting procedure
  - Re-invoke LOOKUP
- Real-space refinement
  - Move atoms slightly to optimize their fit to the density
  - Preserve geometric constraints such as bond distances and angles
- Sequence alignment
  - Corrects the identities of mislabeled amino acids
  - Such errors arise in LOOKUP output when noise perturbs the local density
  - Errors are corrected if the predicted fragment can be fit into the actual sequence
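The sequence-alignment step can be illustrated with a deliberately simplified, ungapped alignment: slide the predicted fragment along the true sequence, count identical residues at each offset, and relabel the fragment from the best window if its identity clears a threshold. Scoring by identity alone and the 0.5 threshold are assumptions; a real implementation would more likely use a substitution matrix. `fit_fragment` is a hypothetical name.

```python
def fit_fragment(fragment, sequence, min_identity=0.5):
    """Ungapped fit of a predicted fragment (one-letter amino-acid codes) to the
    true sequence. Returns (offset, corrected_fragment) for the best placement,
    or None if no placement reaches min_identity (hypothetical threshold)."""
    n = len(fragment)
    if n == 0 or n > len(sequence):
        return None
    best_offset, best_matches = None, -1
    for offset in range(len(sequence) - n + 1):
        window = sequence[offset:offset + n]
        matches = sum(a == b for a, b in zip(fragment, window))
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    if best_matches / n < min_identity:
        return None  # fragment cannot be fit; leave its labels unchanged
    return best_offset, sequence[best_offset:best_offset + n]
```

For example, a predicted fragment "AGLV" fits the sequence "MKTAGIVQ" best at offset 3 with three of four identities, so the mislabeled L would be corrected to I.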
TEXTAL™ Input
08/01/03
Texas A&M University
TEXTAL™ Output
Columns: (1) No. of chains output; (2) Length of longest chain; (3) Mean length of output chains; (4) % of structure built; (5) Cα rms error (Å); (6) Mean residue density corr.; (7) All-atom rms error (Å); (8) Side chain structural similarity

Protein          (1)  (2)   (3)  (4)   (5)   (6)   (7)   (8)
A2u-globulin       2   88  68.5   85  0.85  0.84  0.99  48.9
Armadillo          9  217  46.7   89  0.98  0.82  N.A.  43.7
Cyanase            6   94  32.0   94  1.1   0.79  1.03  42.7
Gere               2   44  30.5   90  0.85  0.83  1.00  30.0
GM-CSF             4   46  25.0   82  0.91  0.84  0.94  28.9
Nsf-d2             6   79  39.5   92  0.96  0.83  1.13  33.5
Penicillopepsin   13   58  25.0   91  1.13  0.78  1.09  41.9
Psd-95             8   58  31.8   94  1.00  0.82  1.04  34.7
Rab3a              8   30  20.5   90  0.90  0.82  1.06  30.5
Rh-dehalogenase    8   66  36.5   97  0.92  0.83  0.99  54.6
CzrA               3   57  33.3   94  1.05  0.82  1.15  39.1
MVK               10   58  28.7   88  0.83  0.82  1.00  44.5
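The rms-error columns summarize how far built atoms lie from the corresponding atoms of the reference structure. A sketch of that computation, assuming each model atom has already been paired with a reference atom (the pairing procedure is not described here); `rms_error` is a hypothetical name:

```python
import math

def rms_error(model, reference):
    """Root-mean-square distance (Å) between paired atoms:
    sqrt(mean over pairs of |model_i - reference_i|**2)."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty lists of paired coordinates")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(total / len(model))
```

Restricting the pairs to Cα atoms gives a Cα rms error; including all paired atoms gives an all-atom rms error.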
CAPRA Output (CzrA)
LOOKUP Output (CzrA)
Discussion
- TEXTAL™ has the potential to relieve one of the major bottlenecks in high-throughput structural genomics
- Future work
  - Other feature-comparison measures
  - Clustering of the database
  - Reducing redundancy in the database
  - Stitching CAPRA chains together
TEXTAL™ Development
- Access to TEXTAL™ through http://textal.tamu.edu:12321
- PHENIX: Python-based Hierarchical ENvironment for Integrated Xtallography
- PHENIX members
  - Berkeley Lab
  - University of Cambridge
  - Los Alamos National Laboratory
  - Texas A&M University
- The alpha release of PHENIX is now available
Acknowledgements
- National Institutes of Health
- TEXTAL™ PIs
  - Dr. Thomas R. Ioerger
  - Dr. James C. Sacchettini
- TEXTAL™ Development Team
  - Kevin Childs
  - Kreshna Gopal
  - Reetal Pai
  - Tod D. Romo
  - Vinod Reddy Melapudi