TEXTAL: Applications of Pattern Recognition to Macromolecular Crystallography - PowerPoint PPT Presentation

About This Presentation

Slides: 24

Provided by: thomasr1

Learn more at: https://people.engr.tamu.edu

Transcript and Presenter's Notes

1
TEXTAL: Applications of Pattern Recognition to
Macromolecular Crystallography

Dr. Thomas R. Ioerger
Department of Computer Science
Texas A&M University

Collaboration with Dr. James C. Sacchettini,
Center for Structural Biology, Texas A&M
2
Automating Structure Determination

Typical Steps:
obtain crystals
collect data (e.g. MAD, at synchrotron)
determine initial set of phases
generate electron density map
density modification/phase refinement
construct model (atomic coordinates)

3
Automating Structure Determination

Existing computational routines:
heavy atom search, Patterson correlation, solvent flattening, maximum likelihood phase combination
Few methods to interpret electron density maps:
requires humans, a potential bottleneck
difficulties: low resolution, phase errors, weak density
Must automate for structural genomics and rational drug design

4
Overview of TEXTAL

Apply pattern recognition techniques
Exploit database of previously-solved maps
Model molecular structures in local regions (e.g. spheres of 5 Angstrom radius)
Intuitive principles:
1) Have I ever seen a region with a pattern of density like this before?
2) If so, what were the local atomic coordinates in that previous case?

5
Overview (cont'd)

Divide-and-Conquer:
1) identify alpha-carbon positions (chain-tracing)
2) model regions around alpha-carbons (CAs), including backbone and side-chain atoms
3) concatenate local models back together, resolve any conflicts
Database contains many regions centered on CAs from previous maps
A 5 Å radius is about the right size to capture structural repetition

6
Overview (cont'd)

Database: 10^5 regions from 100 maps
How to identify the closest match efficiently?
Calculate numerical features that represent the pattern in each region
Features must be rotation-invariant
Search can be very fast: just compare features

7
Overview (cont'd)
8
Database Construction

Ideally would use solved MAD/MIR maps
Using back-transformed maps works well:
PDB → structure factors (include B-factors)
keep reflections down to 2.8 Å
Fourier transform → electron density map
50 proteins from PDBSelect (non-homologous)
about 50,000 regions
Feature extraction done offline

9
Rotation-Invariant Features

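Average density: $\mu = \frac{1}{n}\sum_i \rho_i$, where $\rho_i$ is the density at each lattice point in the region
Other statistical features: standard deviation, kurtosis
Distance to center of mass:
$\langle x_c, y_c, z_c \rangle = \frac{1}{n}\left\langle \sum_i x_i\rho_i/\mu,\ \sum_i y_i\rho_i/\mu,\ \sum_i z_i\rho_i/\mu \right\rangle$
$d_{cen} = \sqrt{x_c^2 + y_c^2 + z_c^2}$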
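
The statistical features on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the TEXTAL implementation: the function name `region_features`, the `(x, y, z, rho)` sample layout, and the choice of kurtosis as the fourth standardized moment are all assumptions made for the example.

```python
import math

def region_features(points, center, radius):
    """Rotation-invariant statistics for one spherical region.

    points: iterable of (x, y, z, rho) lattice samples (hypothetical layout).
    center: (x, y, z) of the region center, e.g. a CA position.
    radius: sphere radius, in the same units as the coordinates.
    """
    cx, cy, cz = center
    # Keep lattice points inside the sphere, shifted so the center is the origin.
    samples = [
        (x - cx, y - cy, z - cz, rho)
        for x, y, z, rho in points
        if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2
    ]
    n = len(samples)
    if n == 0:
        raise ValueError("no lattice points inside the region")
    total = sum(s[3] for s in samples)
    if total == 0:
        raise ValueError("total density is zero; center of mass is undefined")

    mean = total / n
    var = sum((s[3] - mean) ** 2 for s in samples) / n
    std = math.sqrt(var)
    # Assumption: kurtosis as the fourth standardized moment (3.0 for a Gaussian).
    fourth = sum((s[3] - mean) ** 4 for s in samples) / n
    kurtosis = fourth / var ** 2 if var > 0 else 0.0

    # Center of mass: (1/n) * sum(x_i * rho_i) / mean == sum(x_i * rho_i) / sum(rho_i).
    xc = sum(s[0] * s[3] for s in samples) / total
    yc = sum(s[1] * s[3] for s in samples) / total
    zc = sum(s[2] * s[3] for s in samples) / total
    d_cen = math.sqrt(xc ** 2 + yc ** 2 + zc ** 2)

    return {"mean": mean, "std": std, "kurtosis": kurtosis, "d_cen": d_cen}
```

Because every quantity depends only on the density values and on distances from the region center, rotating the sampled points about the center leaves the result unchanged.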

10
More Features

Moments of inertia:
measure dispersion around axes of symmetry in a density distribution
calculate 3x3 inertia matrix
diagonalize to get eigenvalues
sort from largest to smallest
take magnitudes and ratios of moments
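
The moment-of-inertia steps above can be sketched as follows. This is an illustrative example under stated assumptions: the inertia matrix is taken about the region center with density used as the weight, the eigenvalues come from the standard closed-form trigonometric solution for a symmetric 3x3 matrix, and the names `inertia_moments` and `moment_features` as well as the particular ratios returned are hypothetical choices, since the slide does not say which ratios are used.

```python
import math

def inertia_moments(samples):
    """Principal moments of a density-weighted point set, largest first.

    samples: iterable of (x, y, z, rho), coordinates relative to the region center.
    """
    ixx = iyy = izz = ixy = ixz = iyz = 0.0
    for x, y, z, rho in samples:
        ixx += rho * (y * y + z * z)
        iyy += rho * (x * x + z * z)
        izz += rho * (x * x + y * y)
        ixy -= rho * x * y
        ixz -= rho * x * z
        iyz -= rho * y * z

    # Closed-form eigenvalues of the symmetric 3x3 inertia matrix.
    q = (ixx + iyy + izz) / 3.0
    p1 = ixy ** 2 + ixz ** 2 + iyz ** 2
    p2 = (ixx - q) ** 2 + (iyy - q) ** 2 + (izz - q) ** 2 + 2.0 * p1
    if p2 == 0.0:
        return [q, q, q]  # matrix is a multiple of the identity
    p = math.sqrt(p2 / 6.0)
    b11, b22, b33 = (ixx - q) / p, (iyy - q) / p, (izz - q) / p
    b12, b13, b23 = ixy / p, ixz / p, iyz / p
    det_b = (b11 * (b22 * b33 - b23 * b23)
             - b12 * (b12 * b33 - b23 * b13)
             + b13 * (b12 * b23 - b22 * b13))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against round-off
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e2 = 3.0 * q - e1 - e3
    return [e1, e2, e3]  # already ordered e1 >= e2 >= e3

def _ratio(a, b):
    # Guard against a zero moment (e.g. all density on one line).
    return a / b if b > 0 else float("inf")

def moment_features(samples):
    """Magnitudes plus two ratios of the sorted moments (ratio choice is an assumption)."""
    e1, e2, e3 = inertia_moments(samples)
    return {"moments": (e1, e2, e3), "ratio_12": _ratio(e1, e2), "ratio_23": _ratio(e2, e3)}
```

The eigenvalues, unlike the raw matrix entries, do not change when the region is rotated, which is what makes them usable as rotation-invariant features.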

11
More Features

Spoke angles:
if the region is centered on a CA, there should be 3 spokes of density emanating from the center
find best-fit vectors, calculate angles among them
Surface area of contours
Connectivity of density/bones in region
Other geometrical features...

12
Details of Matching Process

Feature-based matching
Euclidean distance metric between feature vectors:
$\mathrm{dist}(R_1,R_2) = \sqrt{\sum_i w_i \left(F_i(R_1) - F_i(R_2)\right)^2}$
Must weight features by relevance
less-relevant features add noise
Slider algorithm: optimize weights by comparing features in matching regions versus mismatches
Verify selections by density correlation
requires search for optimal rotation
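
The feature-based matching step can be illustrated with a short sketch. It assumes the metric is the square root of the weighted sum of squared feature differences and that the database is a plain list of `(region_id, feature_vector)` pairs; the names `weighted_distance` and `top_k_matches` are hypothetical, and the weights here are supplied by hand rather than optimized by the Slider algorithm.

```python
import heapq
import math

def weighted_distance(f1, f2, weights):
    """Weighted Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, f1, f2)))

def top_k_matches(query, database, weights, k):
    """Return the k database entries whose features are closest to the query.

    database: list of (region_id, feature_vector) pairs (hypothetical layout).
    """
    return heapq.nsmallest(
        k, database, key=lambda entry: weighted_distance(query, entry[1], weights)
    )
```

For example, with a weight of zero on the second feature, differences in that feature no longer affect the ranking:

```python
db = [("a", [0.0, 0.0]), ("b", [1.0, 0.0]), ("c", [0.0, 5.0])]
print([rid for rid, _ in top_k_matches([0.0, 0.0], db, [1.0, 0.0], 2)])
```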

13
Experiments

Goal: evaluate potential of pattern-matching
Assumption: CA positions known
Procedure:
1. extract features for each region
2. collect top K=400 feature-based matches in DB
3. calculate density correlation, take best match
4. rotate backbone+sidechain atoms into position
30 sec/residue on SGI Origin 2000
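
Step 3 of the procedure can be sketched as below. This is a simplified illustration: it assumes density correlation means the Pearson correlation coefficient between two regions sampled at the same grid points, and it assumes each candidate has already been rotated onto the query's grid, so the search for the optimal rotation is omitted entirely. The names `density_correlation` and `best_by_correlation` are hypothetical.

```python
import math

def density_correlation(a, b):
    """Pearson correlation between two equal-length lists of density samples."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("density samples must be non-empty and of equal length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat density carries no shape information
    return cov / math.sqrt(var_a * var_b)

def best_by_correlation(query_density, candidates):
    """Pick the candidate with the highest density correlation to the query.

    candidates: list of (region_id, density_samples), assumed pre-aligned
    to the query's sampling grid (rotation search not shown).
    """
    return max(candidates, key=lambda c: density_correlation(query_density, c[1]))
```

Running the cheap feature comparison first and reserving the correlation for a short list of candidates is what keeps the expensive step affordable.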

14
Feature Weights
15
Results
Legend: 1gcn = glucagon; 1fnb = ferredoxin reductase; 1tup = p53 tumor suppressor; IFABP = intestinal fatty acid binding protein; BT = back-transformed
16
Results
Structural similarity groups:
Ala
Asp, Asn, Leu
Gly
Glu, Gln
Pro
Arg, Lys, Met
Cys, Ser
Phe, Trp, Tyr, His
Ile, Val, Thr
17
Results
18
Example: Portion of 1tup
19
Example: Glucagon
20
Post-Processing Routines