Ioerger Lab - PowerPoint PPT Presentation

Slides: 24
Provided by: thomasr1

Transcript and Presenter's Notes

Title: Ioerger Lab


1
Ioerger Lab: Bioinformatics Research
  • Pattern recognition/machine learning
    • issues of representation
    • effect of feature extraction, weighting, and
      interaction on performance of induction algorithm
  • Applications in Structural Biology
    • molecular basis of biology: protein structures
    • predicting structures
    • tools for solving structures (X-ray
      crystallography, NMR)
    • stability, folding, packing, motions
    • drug design (small-molecule inhibitors)
    • large datasets exist: exploit them, find the
      patterns

2
TEXTAL - Automated Crystallographic Protein
Structure Determination Using Pattern Recognition
  • Principal Investigators: Thomas Ioerger (Dept.
    Computer Science),
    James Sacchettini (Dept. Biochem/Biophys)
  • Other contributors: Tod D. Romo, Kreshna Gopal,
    Erik McKee, Lalji Kanbi, Reetal Pai, Jacob Smith
  • Funding: National Institutes of Health
  • Texas A&M University

3
X-ray crystallography
  • Most widely used method for protein modeling
  • Steps:
    • Grow crystal
    • Collect diffraction data
    • Generate electron density map (Fourier transform)
    • Interpret map, i.e. infer atomic coordinates
    • Refine structure
  • Model-building
    • Currently: crystallographers
    • Challenges: noise, resolution
    • Goal: automation
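
The "electron density map (Fourier transform)" step can be illustrated with a small NumPy sketch. It is only an illustration under simplifying assumptions: the grid is random stand-in data, the structure factors are taken as fully known complex values (a real experiment measures amplitudes and must recover phases separately), and `truncate_resolution` is a hypothetical helper that mimics lower resolution by discarding high-index reflections.

```python
import numpy as np

rng = np.random.default_rng(0)
density = rng.random((16, 16, 16))   # stand-in for a real electron density grid

# Forward transform: density -> complex structure factors F(h, k, l).
structure_factors = np.fft.fftn(density)

# Inverse transform: structure factors -> electron density map.
recovered = np.fft.ifftn(structure_factors).real

def truncate_resolution(factors, max_index):
    """Zero every reflection whose |h|, |k| or |l| exceeds max_index."""
    freqs = [np.abs(np.fft.fftfreq(n, d=1.0 / n)) for n in factors.shape]
    h, k, l = np.meshgrid(*freqs, indexing="ij")
    keep = (h <= max_index) & (k <= max_index) & (l <= max_index)
    return np.where(keep, factors, 0.0)

# Fewer reflections give a smoother, lower-resolution map.
low_res_map = np.fft.ifftn(truncate_resolution(structure_factors, 3)).real
```

Dropping the high-index terms blurs fine detail, which is one reason interpretation gets harder as resolution worsens.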

5
Overview of TEXTAL
  • Automated model-building program
  • Can we automate the kind of visual processing of
    patterns that crystallographers use?
  • Intelligent methods to interpret density, despite
    noise
  • Exploit knowledge about typical protein structure
  • Focus on medium-resolution maps
    • optimized for 2.8 Å (actually, 2.6-3.2 Å is fine)
    • typical for MAD data (useful for high-throughput)
    • other programs exist for higher-res data
      (ARP/wARP)

[Diagram: electron density map (or structure factors) → TEXTAL → protein model (may need refinement)]
6
[Flowchart]
  • Crystal → collect data → diffraction data →
    electron density map
  • CAPRA: models backbone
    • scale map, trace map, calculate features, predict
      Cα's, build chains, patch and stitch chains,
      refine chains
    • output: model of backbone
  • LOOKUP: models side chains
    • output: model of backbone + side chains
  • POST-PROCESSING
    • sequence alignment, real space refinement
    • output: corrected and refined model
7
F = <1.72, -0.39, 1.04, 1.55, ...>
F = <1.58, 0.18, 1.09, -0.25, ...>
F = <0.90, 0.65, -1.40, 0.87, ...>
F = <1.79, -0.43, 0.88, 1.52, ...>
8
Examples of Numeric Density Features
  • Distance from center-of-sphere to center-of-mass
  • Moments of inertia - relative dispersion along
    orthogonal axes
  • Geometric features like Spoke angles
  • Local variance and other statistics

Features are designed to be rotation-invariant,
i.e. same values for region in any
orientation/frame-of-reference. TEXTAL uses 19
distinct numeric features to represent the
pattern of density in a region, each calculated
over 4 different radii, for a total of 76
features.
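
A rough sketch of how rotation-invariant features of this kind could be computed is given here. It is not TEXTAL's feature set: it computes only 5 illustrative features per radius (center-of-mass distance, three principal moments of inertia, variance), and the function name `density_features`, the cubic grid, and the radii in grid units are assumptions made for the example.

```python
import numpy as np

def density_features(density, radii):
    """Rotation-invariant features of a cubic density grid, per radius."""
    n = density.shape[0]
    c = (n - 1) / 2.0                      # sphere centre = grid centre
    axes = [np.arange(n) - c] * 3
    coords = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    dist = np.linalg.norm(coords, axis=-1)
    features = []
    for radius in radii:
        inside = dist <= radius
        rho = np.where(inside, density, 0.0)
        mass = rho.sum()
        # Distance from center-of-sphere to center-of-mass.
        com = (coords * rho[..., None]).sum(axis=(0, 1, 2)) / mass
        rel = coords - com
        # Inertia tensor about the center of mass; its eigenvalues
        # (principal moments) do not depend on orientation.
        r2 = (rel ** 2).sum(axis=-1)
        inertia = (rho * r2).sum() * np.eye(3) - np.einsum(
            "ijk,ijka,ijkb->ab", rho, rel, rel)
        moments = np.linalg.eigvalsh(inertia)
        features += [np.linalg.norm(com), *moments, density[inside].var()]
    return np.array(features)

rng = np.random.default_rng(1)
region = rng.random((9, 9, 9))
rotated = np.rot90(region, k=1, axes=(0, 1))   # same region, new orientation
f_original = density_features(region, radii=[2, 3, 4])
f_rotated = density_features(rotated, radii=[2, 3, 4])
```

A 90-degree grid rotation is used because it maps the grid onto itself exactly, so the two feature vectors agree up to floating-point rounding.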
9
The LOOKUP Process
  • Region in map to be interpreted
  • Database of known maps
  • Two-step filter: 1) by features, 2) by
    density correlation
  • 2-norm (weighted Euclidean distance) metric for
    retrieving matches
  • Find optimal rotation
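
The two-step filter can be sketched as follows, assuming the database is held as NumPy arrays. The function `lookup` and its parameters are hypothetical names for this example, and the rotation search is omitted: regions are assumed to be already aligned before the density correlation is computed.

```python
import numpy as np

def lookup(query_feat, query_density, db_feats, db_densities, weights, k=3):
    """Return the index of the best database match for a query region."""
    # Step 1: weighted Euclidean distance on feature vectors, keep top k.
    dist = np.sqrt((weights * (db_feats - query_feat) ** 2).sum(axis=1))
    top_k = np.argsort(dist)[:k]

    # Step 2: re-rank the k candidates by density correlation.
    def correlation(a, b):
        return np.corrcoef(a.ravel(), b.ravel())[0, 1]

    scores = [correlation(query_density, db_densities[i]) for i in top_k]
    return int(top_k[int(np.argmax(scores))])

rng = np.random.default_rng(2)
db_feats = rng.normal(size=(5, 4))           # 5 regions, 4 features each
db_densities = rng.random((5, 3, 3, 3))      # matching density grids
query_feat = db_feats[2] + 0.01              # query resembles entry 2
query_density = 2.0 * db_densities[2] + 1.0  # same shape, different scale
best = lookup(query_feat, query_density, db_feats, db_densities,
              weights=np.ones(4))
```

The cheap feature comparison prunes the database so that the more expensive density correlation runs on only a few candidates.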
10
SLIDER Feature-weighting algorithm
  • Euclidean distance metric used for retrieval
    • relevant features: good; irrelevant features:
      bad
  • Goal: find optimal weight vector w that generates
    highest probability of hits (matches) in top K
    candidates from database
  • Concept of Slider
    • adjust features so that the most matches are
      ranked higher than mismatches

Slider Algorithm(w, F, Ri, matches, mismatches)
  choose feature f ∈ F at random
  for each <Ri, Rj, Rk>, Rj ∈ matches(Ri), Rk ∈ mismatches(Ri):
    compute cross-over point λi where dist(Ri,Rj) = dist(Ri,Rk),
    with dist(X,Y) = λ(Xf − Yf)² + (1 − λ) dist\f(X,Y)
  pick λ that is best compromise among the λi:
    ranks most matches above mismatches
  update weight vector: w ← update(w, f, λ), wf = λ
  repeat until convergence
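
One way to read the cross-over step is sketched below for a single feature f. This is an interpretation, not the authors' implementation: each triple is reduced to four numbers (the squared difference on f and the distance over all other features, for the match and for the mismatch), and the helper names `crossover` and `best_lambda` are hypothetical.

```python
def crossover(a_match, b_match, a_mis, b_mis):
    """Solve lam*a_match + (1-lam)*b_match == lam*a_mis + (1-lam)*b_mis.

    a_* is the squared difference on feature f; b_* is the distance over
    the remaining features. Returns None when the two lines never cross.
    """
    denom = (a_match - a_mis) + (b_mis - b_match)
    if abs(denom) < 1e-12:
        return None
    return (b_mis - b_match) / denom

def best_lambda(triples):
    """Pick the lambda in [0, 1] that ranks most matches above mismatches."""
    points = sorted(
        lam for t in triples
        if (lam := crossover(*t)) is not None and 0.0 < lam < 1.0
    )
    # The ranking only changes at cross-over points, so it is enough to
    # test one candidate inside each interval between them.
    edges = [0.0] + points + [1.0]
    candidates = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]

    def score(lam):
        return sum(
            lam * am + (1 - lam) * bm < lam * ak + (1 - lam) * bk
            for am, bm, ak, bk in triples
        )

    return max(candidates, key=score)

# Triple 1: feature f separates match from mismatch, other features do not.
# Triple 2: the reverse. A compromise lambda satisfies both.
triples = [(0.1, 1.0, 1.0, 0.5), (0.2, 0.3, 0.1, 1.0)]
lam = best_lambda(triples)
```

In the full algorithm this value would become the weight of f, and the procedure would repeat with another randomly chosen feature until the weights stop changing.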
11
Quality of TEXTAL models
  • Typically builds >80% of the protein atoms
  • Accuracy of coordinates: 1 Å error (RMSD)
  • Depends on resolution and quality of map
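
For reference, the RMSD figure quoted above is the root-mean-square distance between corresponding atoms of a built model and a reference structure. A minimal sketch, assuming both coordinate sets are already superimposed and listed in the same atom order (the function name `rmsd` is chosen for this example):

```python
import numpy as np

def rmsd(model_xyz, reference_xyz):
    """Root-mean-square deviation between paired atomic coordinates (in Å)."""
    model = np.asarray(model_xyz, dtype=float)
    reference = np.asarray(reference_xyz, dtype=float)
    if model.shape != reference.shape:
        raise ValueError("coordinate arrays must have the same shape")
    squared = ((model - reference) ** 2).sum(axis=1)   # per-atom squared distance
    return float(np.sqrt(squared.mean()))

reference = np.zeros((3, 3))
model = reference + np.array([1.0, 0.0, 0.0])   # every atom shifted 1 Å along x
error = rmsd(model, reference)
```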

12
Closeup of β-strand (TEXTAL model in green)
13
Deployment
  • September 2004: Linux and OSX distributions
    • Can be downloaded from http://textal.tamu.edu
    • 40 trial licenses granted so far
  • June 2002: WebTex (http://textal.tamu.edu)
    • Till May 2005: TB Structural Genomics Consortium
      members only
    • Recently open to the public
    • users upload data; processed on server; can
      download results
    • 120 users from 70 institutions in 20 countries
  • July 2003: Model building component of PHENIX
    • Python-based Hierarchical ENvironment for
      Integrated Xtallography
    • Consortium members:
      • Lawrence Berkeley National Lab
      • University of Cambridge
      • Los Alamos National Lab
      • Texas A&M University

14
Intelligent Methods for Drug Design
  • structure-based
    • given protein structure, predict ligands that
      might bind active site
  • other methods
    • QSAR, high-throughput/combi-chem, manual design
      using 3D
  • Virtual Screening
    • docking algorithm + large library of chemical
      structures
    • sort compounds by interaction energy
    • purchase top-ranked hits and assay in lab
    • looking for µM inhibitors (leads that can be
      refined)
    • goal: enrichment to 5% hit rate
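
The ranking and hit-rate steps can be sketched in a few lines. The compound IDs, energies, and the set of assay-confirmed actives below are made-up illustrative values, and `top_hits` and `hit_rate` are hypothetical helper names; more negative energy is assumed to mean a better predicted binder.

```python
def top_hits(energies, n):
    """Return the n compound IDs with the lowest (best) interaction energy."""
    return sorted(energies, key=energies.get)[:n]

def hit_rate(selected, actives):
    """Fraction of the selected compounds that are confirmed in the assay."""
    return sum(compound in actives for compound in selected) / len(selected)

# Illustrative docking energies (arbitrary units), not real results.
energies = {"cmpd_a": -9.1, "cmpd_b": -4.2, "cmpd_c": -8.7,
            "cmpd_d": -6.0, "cmpd_e": -7.5}
confirmed_actives = {"cmpd_a", "cmpd_e"}

purchased = top_hits(energies, n=3)
rate = hit_rate(purchased, confirmed_actives)
```

Comparing this rate with the hit rate of randomly chosen compounds gives the enrichment that the screen achieved.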

15
Virtual Screening
  • diversity
    • ZINC database: 2.6 million compounds
    • purchasable; satisfy Lipinski's rules
  • docking algorithms
    • FlexX, DOCK, GOLD, AutoDock, ICM...
    • search for position and conformation of ligand
  • scoring function
    • electrostatic + steric + desolvation
    • entropy effects?
  • major open issues
    • active site flexibility, charge state, waters,
      co-factors
    • works best with co-crystal structures (already
      bound)
16
Grid at Texas A&M
  • gridmaster.tamu.edu
  • sent out: DOCK binaries, receptor files, 20 ligands
    at a time
  • 1600 computers in student labs on TAMU campus
    (Open-Access Labs)
  • locations labeled in the diagram: West Campus
    Library, Blocker, Zachary
  • typical configuration: 2.8 GHz dual-core Pentium
    CPUs running Windows XP
  • GridMP software by United Devices (Austin, TX)
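
Splitting a ligand library into groups of 20 can be sketched with a simple generator. This shows only the batching idea; it does not use or imitate the GridMP API, and the function name `batches` and the ligand IDs are made up for the example.

```python
from itertools import islice

def batches(ligands, size=20):
    """Yield successive lists of at most `size` ligands."""
    iterator = iter(ligands)
    while chunk := list(islice(iterator, size)):
        yield chunk

ligand_ids = [f"ligand_{i:04d}" for i in range(45)]   # illustrative IDs
work_units = list(batches(ligand_ids))
unit_sizes = [len(unit) for unit in work_units]
```

Each list would be bundled with the DOCK binaries and receptor files and sent to one machine on the grid.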
17
Data Mining of Results
  • promiscuous binders
  • clusters of related compounds
  • patterns of contacts within active site
  • hydrogen-bonding interactions
  • adjust weights of scoring function for unique
    properties of each site
  • open/closed, hydrophobic/charged...
  • ideas for active site variations
  • development of pharmacophore search patterns

18
Current Screens in Sacchettini Lab
  • proteins related to tuberculosis (Mycobacterium)
  • focus on unique pathways involved in
    dormancy/starvation
  • glyoxylate shunt: slow-growth metabolic pathway
  • cell-wall biosynthesis (unique mycolic acid layer
    in tb.)
  • biosynthesis of amino acids/co-factors that
    humans get from diet
  • isocitrate lyase
  • malate synthase
  • PcaA: mycolic acid cyclopropane synthase
  • ACPS: acyl-carrier protein synthase
  • InhA: enoyl-acyl reductase (target of isoniazid)
  • KasB: fatty-acid synthase
  • BioA: biotin (co-factor) synthase
  • PGDH: phosphoglycerate dehydrogenase (serine
    biosynthesis)
  • Related proteins in malaria, SARS, shigella

23
Conclusions
  • Many opportunities for research in Structural
    Bioinformatics
    • large datasets
    • significant problems
  • Provides challenges for machine learning
    • drives development of novel methods, especially
      for dealing with noise, sampling biases,
      extraction of features...
  • Requires inherently interdisciplinary approach
    • training in biochemistry; knowledge of molecular
      interactions
    • understanding chemical intuition; use of
      visualization tools
    • insights about strengths and limitations of
      existing methods
  • Requires collaboration to construct appropriate
    representations to enable learning algorithms to
    find patterns
    • translate expectations about what is relevant,
      dependencies, smoothing, sources of noise...