CyberBridges Protein Pattern Discovery

Slides: 15
Provided by: IBMU467
Transcript and Presenter's Notes


1
CyberBridges Protein Pattern Discovery
  • Tom Milledge
  • Giri Narasimhan
  • Bioinformatics Research Group (BioRG)
  • School of Computing and Information Sciences, FIU

2
Protein Pattern Discovery: Introduction
  • Goals:
  • Implement unsupervised pattern discovery tools
    for protein structure data using the geometric
    hashing technique
  • Create a database of protein structure patterns
  • Create multiple 3-D structural alignments
  • Identify functional regions in proteins

3
Molecular Biology Primer
Proteins: Hemoglobin, Immunoglobulin, Keratin,
Melanin, Insulin, etc.
[Figure labels: RNA, Protein]
4
Where does protein structure information come from?
PDB (Protein Data Bank): a repository of 3-D
protein structures
5
Representing substructures as triangles
Largest common substructure (many linked
triangles) in query and target proteins
One triangle (3 atoms):
  Length1  Length2  Length3  ID1  ID2  ID3
  9.5      7.05     7.01     217  231  238
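The triangle record above (three side lengths plus three atom IDs) can be sketched in a few lines of Python. This is only an illustration: the pairing of each length with a particular pair of atom IDs, the `max_side` cutoff, and the toy coordinates are assumptions, not details given on the slide.

```python
import math
from itertools import combinations

def side_lengths(p1, p2, p3):
    """Return the three pairwise distances between three 3-D points."""
    return (math.dist(p1, p2), math.dist(p2, p3), math.dist(p1, p3))

def triangle_records(atoms, max_side=10.0):
    """Yield (lengths, atom_ids) for every atom triple with all sides <= max_side.

    atoms    -- dict mapping atom ID -> (x, y, z) in Angstroms
    max_side -- hypothetical cutoff (not from the slides) that keeps
                only compact triangles
    """
    for ids in combinations(sorted(atoms), 3):
        lengths = side_lengths(*(atoms[i] for i in ids))
        if max(lengths) <= max_side:
            yield lengths, ids

# Toy coordinates (made up for illustration); atom 300 is too far away
# to form a triangle with the others.
toy_atoms = {
    217: (0.0, 0.0, 0.0),
    231: (3.0, 0.0, 0.0),
    238: (0.0, 4.0, 0.0),
    300: (100.0, 0.0, 0.0),
}
records = list(triangle_records(toy_atoms))
```

Enumerating all triples grows cubically with the number of atoms, so a distance cutoff of some kind is a natural way to keep the number of records manageable.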

6
Basic steps for triangle-based geometric hashing
  • Preprocessing phase: Extract triangle information
    from target (model) proteins and store it in a
    hash table
  • Searching phase: For any given query protein, find
    the matching triangles in the hash table
  • Extension phase: Find the largest matching
    substructures

7
Preprocessing phase: Create the hash table
  • Read PDB data
  • Extract triangles (example side lengths: 7.06,
    9.49, 7.01)
  • Generate a hash key (based on the three lengths
    and bin-size parameters) and enter the record into
    a hash table (example hash key: 035035047)
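One way the hash key could be derived from the three lengths is sketched below. The bin size of 0.2 Angstroms, the sorting of the bins, and the three-digit zero padding are assumptions chosen because they reproduce the example key 035035047 from the side lengths 7.06, 9.49, 7.01; the slides do not state the actual parameters. The protein label `target_A` is hypothetical.

```python
from collections import defaultdict

BIN_SIZE = 0.2  # Angstroms; assumed value, not given in the slides

def hash_key(lengths, bin_size=BIN_SIZE):
    """Quantize each side length into a bin and join the sorted bins.

    Sorting makes the key independent of the order in which the
    three sides are listed.
    """
    bins = sorted(int(length / bin_size) for length in lengths)
    return "".join(f"{b:03d}" for b in bins)

def build_table(records):
    """Build a hash table: key -> list of (protein_id, atom_ids).

    records -- iterable of (protein_id, lengths, atom_ids)
    """
    table = defaultdict(list)
    for protein_id, lengths, atom_ids in records:
        table[hash_key(lengths)].append((protein_id, atom_ids))
    return table

example_key = hash_key((7.06, 9.49, 7.01))
table = build_table([("target_A", (7.06, 9.49, 7.01), (217, 231, 238))])
```

A larger bin size tolerates more geometric variation but produces more false matches per key; a smaller one does the opposite.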
8
Search phase: finding the matches
1. Decompose Query Protein

At the beginning of the search, the query protein is
decomposed into triangles, with the attribute
information stored in a separate table. The
query protein data is then copied to all nodes.
The hash table is split across cluster nodes by
protein, with protein attribute information
stored in a separate table. This data is
accessed via the atom ID foreign keys stored in
the hash table record.
2. Initial Search

The initial search entails matching the query
triangles with the database of (target)
triangles. The results are added to a new hash
table containing all the target matches. The
results table includes the query atom IDs for the
substructure building phase.
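The initial search could look roughly like the sketch below. It assumes the same hypothetical key scheme as before (0.2 Angstrom bins, sorted, zero padded) and models the cluster nodes as a plain list of per-node tables searched in a sequential loop; a real deployment would run each node's lookup in parallel. All names are illustrative.

```python
from collections import defaultdict

def hash_key(lengths, bin_size=0.2):
    """Assumed key scheme: sorted, zero-padded length bins."""
    bins = sorted(int(length / bin_size) for length in lengths)
    return "".join(f"{b:03d}" for b in bins)

def search_node(query_triangles, node_table):
    """Match query triangles against one node's share of the hash table.

    Returns a results table: protein_id -> list of
    (query_atom_ids, target_atom_ids), keeping the query atom IDs
    for the later substructure-building phase.
    """
    hits = defaultdict(list)
    for lengths, query_ids in query_triangles:
        for protein_id, target_ids in node_table.get(hash_key(lengths), []):
            hits[protein_id].append((query_ids, target_ids))
    return dict(hits)

def search_all_nodes(query_triangles, node_tables):
    """Copy the query to every node (here: a loop) and merge the results."""
    merged = {}
    for node_table in node_tables:
        merged.update(search_node(query_triangles, node_table))
    return merged

# Two hypothetical nodes, each holding the triangles of one protein.
node_tables = [
    {"035035047": [("target_A", (217, 231, 238))]},
    {"015020025": [("target_B", (4, 8, 15))]},
]
query = [((7.02, 7.08, 9.51), (5, 9, 12)), ((1.0, 1.0, 1.0), (1, 2, 3))]
results = search_all_nodes(query, node_tables)
```

Because the table is partitioned by protein, each protein's hits come from exactly one node, so merging the per-node results is a simple dictionary update.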
9
Extension phase: building the substructures
  • Start from a list of triangle hits
  • Build an adjacency structure (every vertex of the
    tree is a triangle)
  • Use a graph-searching algorithm to find larger
    substructures
  • Measure structural similarity (RMSD) between
    every substructure in the query protein and every
    substructure in the model protein
  • Output common substructure pairs
RMSD: root mean square distance
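The adjacency structure and graph search described above might be sketched as follows. The adjacency rule used here (two hits are neighbours when their triangles share an edge, meaning at least two atoms, in both the query and the target) and the use of breadth-first search are assumptions; the slides do not specify either.

```python
from collections import deque

def adjacent(hit_a, hit_b):
    """Assumed rule: hits are adjacent if they share an edge in both proteins."""
    (query_a, target_a), (query_b, target_b) = hit_a, hit_b
    return (len(set(query_a) & set(query_b)) >= 2
            and len(set(target_a) & set(target_b)) >= 2)

def build_adjacency(hits):
    """Return {hit index: [indices of adjacent hits]}."""
    adj = {i: [] for i in range(len(hits))}
    for i in range(len(hits)):
        for j in range(i + 1, len(hits)):
            if adjacent(hits[i], hits[j]):
                adj[i].append(j)
                adj[j].append(i)
    return adj

def connected_groups(adj):
    """Breadth-first search: group hits into connected candidate substructures."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups

# Each hit is (query_atom_ids, target_atom_ids); IDs are made up.
hits = [((1, 2, 3), (11, 12, 13)),
        ((2, 3, 4), (12, 13, 14)),
        ((7, 8, 9), (17, 18, 19))]
adjacency = build_adjacency(hits)
groups = connected_groups(adjacency)
```

Each group is only a candidate: the RMSD measurement in the next step decides whether the linked triangles really superimpose as one substructure.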
10
Case study: Dehydrogenase superfamily
1B3R Hydrolase (Rat)
1CJC Reductase (Cow)
1CF2 Dehydrogenase (Bacteria)
11
Dehydrogenases: Shared structural element
Recurring substructure
12
Dehydrogenases: building the common substructure
Triangle from query protein (green) matches
triangle from target protein (pink)
Other overlapping triangle matches are extended
from the initial triangle to find the largest common
substructure
An RMSD (root mean square distance) of less than 1.0
Angstrom indicates a good match
RMSD is measured at each extension step to ensure
the validity of the larger match
RMSD: 0.32 Angstroms
RMSD: 0.66 Angstroms
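The RMSD check at each extension step can be illustrated with a short sketch. For N matched atom pairs with coordinates a_i and b_i, RMSD = sqrt((1/N) * sum |a_i - b_i|^2). The code below assumes the two coordinate sets have already been superimposed; a fuller implementation would recompute the optimal superposition at every step. The greedy acceptance loop and the toy coordinates are assumptions; only the 1.0 Angstrom cutoff comes from the slide.

```python
import math

RMSD_CUTOFF = 1.0  # Angstroms; "good match" threshold stated on the slide

def rmsd(coords_a, coords_b):
    """Root mean square distance between two equal-length coordinate lists.

    Assumes the structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def extend_match(seed_pairs, candidate_pairs, cutoff=RMSD_CUTOFF):
    """Greedily add candidate atom pairs while the RMSD stays below cutoff.

    Each pair is (query_xyz, target_xyz).
    """
    matched = list(seed_pairs)
    for pair in candidate_pairs:
        trial = matched + [pair]
        query_xyz, target_xyz = zip(*trial)
        if rmsd(query_xyz, target_xyz) < cutoff:
            matched = trial  # extension is still a valid match
    return matched

# Toy data: the seed triangle is offset by 0.3 Angstroms along z.
seed = [((0, 0, 0), (0, 0, 0.3)),
        ((1, 0, 0), (1, 0, 0.3)),
        ((0, 1, 0), (0, 1, 0.3))]
candidates = [((1, 1, 0), (1, 1, 0.3)),   # consistent with the seed
              ((2, 2, 0), (2, 2, 5.0))]   # far off, should be rejected
extended = extend_match(seed, candidates)
```

Measuring RMSD after every addition, rather than once at the end, stops a single badly placed atom from being hidden by a large number of well-matched ones.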
13
Results: Zinc finger protein family
DNA-binding substructure
Zinc-binding substructure
10 positions, RMSD: 0.46 Angstroms
4 positions, RMSD: 0.35 Angstroms
14
Conclusions and Future Work
  • Geometric hashing of proteins shows promise as an
    important technique with a very good fit to many
    parallel architectures. Areas of future work
    include:
  • Molecular docking: Identify potential drugs that
    are least likely to cause side effects.
  • Function prediction: Create a database of
    conserved substructures that indicate a specific
    protein function.
  • Structure prediction: Use sequence patterns with
    structural templates to predict the structure of
    new sequences.