TEXTAL: Applications of Pattern Recognition to Macromolecular Crystallography


1
TEXTAL: Applications of Pattern Recognition to
Macromolecular Crystallography
  • Dr. Thomas R. Ioerger
  • Department of Computer Science
  • Texas A&M University

Collaboration with Dr. James C. Sacchettini,
Center for Structural Biology, Texas A&M
2
Automating Structure Determination
  • Typical Steps
  • obtain crystals
  • collect data (e.g. MAD, at synchrotron)
  • determine initial set of phases
  • generate electron density map
  • density modification/phase refinement
  • construct model (atomic coordinates)

3
Automating Structure Determination
  • Existing computational routines
  • heavy atom search, Patterson correlation, solvent
    flattening, maximum likelihood phase combination
  • few methods to interpret electron density maps
  • requires humans: potential bottleneck
  • difficulty: low resolution, phase errors, weak density
  • must automate for structural genomics and
    rational drug design

4
Overview of TEXTAL
  • Apply pattern recognition techniques
  • Exploit database of previously-solved maps
  • Model molecular structures in local regions (e.g.
    spheres of 5 Angstrom radius)
  • Intuitive principles
  • 1) Have I ever seen a region with a pattern of
    density like this before?
  • 2) If so, what were the previous local atomic
    coordinates?

5
Overview (cont'd)
  • Divide-and-Conquer
  • 1) identify alpha-carbon positions
    (chain-tracing)
  • 2) model regions around alpha-carbons (CAs),
    including backbone and side-chain atoms
  • 3) concatenate local models back together,
    resolve any conflicts
  • Database contains many regions centered on CAs
    from previous maps
  • 5 Å radius: the right size for structural repetition

6
Overview (cont'd)
  • Database: 10^5 regions from 100 maps
  • How to identify the closest match efficiently?
  • Calculate numerical features that represent the
    pattern in each region
  • Must be rotation-invariant
  • Search can be very fast: just compare features

7
Overview (cont'd)
8
Database Construction
  • Ideally would use solved MAD/MIR maps
  • Using back-transformed maps works well
  • PDB → structure factors (include B-factors)
  • keep reflections down to 2.8 Å
  • Fourier transform → electron density map
  • 50 proteins from PDBSelect (non-homologous)
  • about 50,000 regions
  • Feature extraction done offline

9
Rotation-Invariant Features
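  • Average density: $\mu = (1/n)\sum_i \rho_i$, where $\rho_i$ is the
    density at each lattice point in the region
  • Other statistical features: standard deviation,
    kurtosis
  • Distance to center of mass
  • $\langle x_c, y_c, z_c \rangle = (1/n)\langle \sum_i x_i\rho_i/\mu,\ \sum_i y_i\rho_i/\mu,\ \sum_i z_i\rho_i/\mu \rangle$
  • $d_{cen} = \sqrt{x_c^2 + y_c^2 + z_c^2}$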
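
The statistical features on this slide can be sketched in a few lines of Python. This is only an illustration, not the TEXTAL code: the function name `region_features` and the input layout (a list of `(x, y, z, rho)` tuples with coordinates relative to the region center) are assumptions, and kurtosis is taken here to be the fourth standardized moment, which the slide does not specify.

```python
import math

def region_features(points):
    """Compute simple rotation-invariant features for one region.

    points: list of (x, y, z, rho) tuples; coordinates are assumed to be
    relative to the region center and the mean density is assumed nonzero.
    """
    n = len(points)
    rho = [p[3] for p in points]
    mean = sum(rho) / n
    var = sum((r - mean) ** 2 for r in rho) / n
    std = math.sqrt(var)
    # Fourth standardized moment (an assumed definition of kurtosis).
    kurt = (sum((r - mean) ** 4 for r in rho) / n) / var ** 2 if var > 0 else float("nan")
    # Density-weighted center of mass, following the slide's formula.
    xc = sum(x * r for x, _, _, r in points) / (n * mean)
    yc = sum(y * r for _, y, _, r in points) / (n * mean)
    zc = sum(z * r for _, _, z, r in points) / (n * mean)
    d_cen = math.sqrt(xc ** 2 + yc ** 2 + zc ** 2)
    return {"mean": mean, "std": std, "kurtosis": kurt, "d_cen": d_cen}

# Toy region of four lattice points (made-up values).
toy = [(1, 0, 0, 2.0), (0, 1, 0, 1.0), (0, 0, 1, 1.0), (-1, 0, 0, 0.0)]
# The same region rotated 90 degrees about the z axis: (x, y, z) -> (-y, x, z).
toy_rotated = [(-y, x, z, r) for x, y, z, r in toy]

features = region_features(toy)
features_rotated = region_features(toy_rotated)
```

Because every quantity depends only on the density values and on distances from the center, `features` and `features_rotated` agree, which is the rotation-invariance property the slides require.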

10
More Features
  • Moments of inertia
  • measures dispersion around axes of symmetry in a
    density distribution
  • calculate 3x3 inertia matrix
  • diagonalize to get eigenvalues
  • sort from largest to smallest
  • take magnitudes and ratios of moments
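
A minimal sketch of the moment-of-inertia steps listed above, assuming NumPy is available. The function name `inertia_features`, the choice to take moments about the density-weighted center of mass, and the particular ratios returned are illustrative assumptions; the slides do not say which reference point or which ratios TEXTAL uses.

```python
import numpy as np

def inertia_features(coords, rho):
    """Eigenvalues of the density-weighted inertia tensor, largest first.

    coords: (n, 3) array-like of lattice-point positions
    rho:    (n,) array-like of density values (total assumed nonzero)
    """
    coords = np.asarray(coords, dtype=float)
    rho = np.asarray(rho, dtype=float)
    # Assumption: moments are taken about the density-weighted center of mass.
    com = (rho[:, None] * coords).sum(axis=0) / rho.sum()
    r = coords - com
    # 3x3 inertia matrix: I = sum_i rho_i * (|r_i|^2 * Identity - r_i r_i^T)
    sq = (r ** 2).sum(axis=1)
    outer = np.einsum("i,ij,ik->jk", rho, r, r)
    inertia = (rho * sq).sum() * np.eye(3) - outer
    # eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse them.
    evals = np.linalg.eigvalsh(inertia)[::-1]
    # Ratios are undefined for degenerate regions with a zero eigenvalue.
    ratios = (evals[0] / evals[1], evals[0] / evals[2], evals[1] / evals[2])
    return evals, ratios

# Toy region: unit densities at +/-1 on x, +/-2 on y, +/-3 on z (made-up values).
toy_coords = [(1, 0, 0), (-1, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 3), (0, 0, -3)]
toy_rho = [1.0] * 6
evals, ratios = inertia_features(toy_coords, toy_rho)
```

The eigenvalues are unchanged when the region is rotated, so their magnitudes and ratios can serve as rotation-invariant descriptors of how the density is spread out.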

11
More Features
  • Spoke angles
  • if region centered on CA, should have 3 spokes
    of density emanating from center
  • find best-fit vectors; calculate angles among them
  • surface area of contours
  • connectivity of density/bones in region
  • other geometrical features...

12
Details of Matching Process
  • Feature-based matching
  • Euclidean distance metric between feature
    vectors.
  • $\mathrm{dist}(R_1, R_2) = \sqrt{\sum_i w_i \,(F_i(R_1) - F_i(R_2))^2}$
  • Must weight features by relevance
  • less-relevant features add noise
  • Slider algorithm: optimizes weights by comparing
    features in matching regions versus mismatches
  • Verify selections by density correlation
  • requires search for optimal rotation
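
The feature-based matching step can be sketched as a weighted Euclidean distance plus a top-K lookup. The names `weighted_distance` and `top_k_matches`, the database layout (a list of `(region_id, feature_vector)` pairs), and the linear scan are assumptions made for illustration; the weights would come from a procedure such as Slider, which is not reproduced here.

```python
import heapq
import math

def weighted_distance(f1, f2, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, f1, f2)))

def top_k_matches(query, database, weights, k):
    """Return the k database entries closest to the query, nearest first.

    database: list of (region_id, feature_vector) pairs (hypothetical layout).
    """
    return heapq.nsmallest(
        k, database, key=lambda entry: weighted_distance(query, entry[1], weights)
    )

# Toy database with two features per region (made-up values).
toy_db = [("a", [0.0, 0.0]), ("b", [1.0, 0.0]), ("c", [0.0, 5.0])]
query = [0.0, 0.0]

equal_weights = top_k_matches(query, toy_db, [1.0, 1.0], k=2)
downweighted = top_k_matches(query, toy_db, [1.0, 0.01], k=2)
```

With equal weights the second-best match is region "b"; once the second feature is down-weighted as less relevant, region "c" overtakes it. This is why the choice of weights matters for the quality of the matches.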

13
Experiments
  • Goal: evaluate potential of pattern-matching
  • Assumption: CA positions known
  • Procedure
  • 1. extract features for each region
  • 2. collect top K = 400 feature-based matches in DB
  • 3. calculate density correlation, take best match
  • 4. rotate backbone + side-chain atoms into position
  • 30 sec/residue on SGI Origin 2000
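
Step 3 of the procedure can be sketched as re-ranking the candidates by a correlation coefficient between density samples. This sketch assumes each candidate has already been rotated into the query's frame and resampled on the same lattice points; the search for the optimal rotation mentioned earlier is omitted. The names `density_correlation` and `best_by_correlation`, and the use of the Pearson coefficient, are illustrative assumptions.

```python
import math

def density_correlation(a, b):
    """Pearson correlation between two equal-length lists of density samples."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def best_by_correlation(query_density, candidates):
    """Pick the candidate whose density correlates best with the query.

    candidates: list of (region_id, density_samples) pairs, assumed to be
    pre-aligned with the query (hypothetical layout).
    """
    return max(candidates, key=lambda c: density_correlation(query_density, c[1]))

# Toy data (made-up values): "x" is a scaled copy of the query density.
query_density = [1.0, 2.0, 3.0, 4.0]
toy_candidates = [
    ("x", [2.0, 4.0, 6.0, 8.0]),
    ("y", [4.0, 3.0, 2.0, 1.0]),
    ("z", [1.0, 3.0, 2.0, 4.0]),
]
best = best_by_correlation(query_density, toy_candidates)
```

Running the inexpensive feature comparison first and the costlier correlation only on the top K candidates is what keeps the per-residue cost manageable.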

14
Feature Weights
15
Results
  • 1gcn: glucagon
  • 1fnb: ferredoxin reductase
  • 1tup: p53 tumor suppressor
  • IFABP: intestinal fatty acid binding protein
  • BT: back-transformed
16
Results
  • Structural similarity groups
  • Ala
  • Gly
  • Pro
  • Asp, Asn, Leu
  • Glu, Gln
  • Arg, Lys, Met
  • Cys, Ser
  • Phe, Trp, Tyr, His
  • Ile, Val, Thr
17
Results
18
Example Portion of 1tup
19
Example Glucagon
20
Post-Processing Routines
  • Concatenate local models (one per amino acid) into a PDB file
  • Detect and repair flips by majority chain
    direction
  • Utilize amino acid sequence information
  • map chains into known sequence (alignment)
  • re-lookup residues based on identity
  • Real-space refinement

21
CAPRA
  • Need to find CAs automatically and accurately
  • Bones doesn't identify CAs (except branches)
  • Use pattern recognition again
  • Extract features for all lattice points inside the 1σ
    contour, or along the trace
  • Use neural net to predict distance to true CA
  • Training set: examples of $\langle F_1, F_2 \rangle, D_i$
  • Status: currently 1 Å rms, need to get to 0.5-0.8 Å

22
Example
23
Acknowledgements
  • Dr. James C. Sacchettini
  • Center for Structural Biology, Texas A&M
  • Graduate students/post-docs
  • Dr. Jon Christopher, Tom Holton, Lydia Tapia
  • Funding provided by NIH (GM-59398)
  • See our forthcoming paper in Acta Cryst. D