Title: Small Molecule-Protein Interactions
1 Small Molecule-Protein Interactions
- Howard Feldman
- The Blueprint Initiative
- Toronto, Ontario
- hfeldman_at_blueprint.org
2 Drug Discovery Pipeline
- Most drugs are small molecules, and the interactions they make with proteins determine their effects on, and toxicity to, the human body
- Clinical trials are the most expensive part of the pipeline; if failure can be predicted before this point, it saves time and money
3 Drug Discovery Pipeline
- It is of utmost importance to identify, in the early stages of drug discovery, the lead compounds most likely to succeed
- A recent study by the Tufts Center for the Study of Drug Development showed that bringing one drug to market costs an average of $800M!
- Only 5 in 5,000 potential new drugs tested on animals reach clinical trials, and only one of those ultimately wins FDA approval
4 How small is a small molecule?
- A small molecule is generally considered to be anything that may interact with a protein or DNA
- Must be biologically relevant
- Examples include ions, polysaccharides, peptides, drugs
5 Small Molecules
- No absolute maximum size, though drug-like molecules often have a molecular weight of 500 Da or less
- However, larger molecules such as complex branched polysaccharides and cyclic antibiotics also occur
- Normally not interested in detergents, buffers, solvents, denaturants, or non-biological ions
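The two criteria on this slide (a soft size guideline and an exclusion list for non-biological additives) can be kept as separate checks. The sketch below is a hypothetical illustration: the atomic-mass table covers only a few elements, and `EXCLUDED_HET_CODES` is a small, incomplete sample of PDB het codes, not a curated list.

```python
# Hypothetical sketch: two independent checks on a ligand record.
# ATOMIC_MASS covers only a few elements; EXCLUDED_HET_CODES is an
# illustrative, incomplete set of PDB het codes, not a curated list.

ATOMIC_MASS = {  # average atomic masses (Da)
    "H": 1.008, "C": 12.011, "N": 14.007,
    "O": 15.999, "P": 30.974, "S": 32.06,
}

# Water, glycerol, sulfate, ethylene glycol, a PEG fragment, Tris buffer
EXCLUDED_HET_CODES = {"HOH", "GOL", "SO4", "EDO", "PEG", "TRS"}


def molecular_weight(formula: dict) -> float:
    """Sum of average atomic masses for an elemental formula."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def is_biologically_relevant(het_code: str) -> bool:
    """Drop common crystallization additives (buffers, solvents, etc.)."""
    return het_code not in EXCLUDED_HET_CODES


def is_drug_like_size(formula: dict, max_mw: float = 500.0) -> bool:
    """Soft size guideline only: there is no absolute maximum size."""
    return molecular_weight(formula) <= max_mw


ATP = {"C": 10, "H": 16, "N": 5, "O": 13, "P": 3}
print(round(molecular_weight(ATP), 2))  # 507.18
print(is_biologically_relevant("ATP"), is_drug_like_size(ATP))  # True False
```

Keeping the checks separate matters: ATP is clearly biologically relevant yet sits just above the 500 Da drug-like guideline.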
6 How Many?
- Recently a number of public small molecule databases have become available
  - CAS Registry: 26,000,000 substances
  - Cambridge Structural Database: 300,000 3D structures
  - PDBSum: 6,700 3D ligands from PDB
  - NCI database: 250,000 molecules
  - NCBI PubChem: 700,000 compounds
  - ChemBank: 1,100,000 molecules
- Problem: data can be very messy and sparse
7 Popular small molecules and domains
- Not surprisingly, divalent cations and ATP are the most common small molecules found interacting with proteins
- AAA is an ATPase domain; the next three are all helicases, which also bind various nucleotides
8 Toxicity
- Caused when a drug interferes with biological pathway(s) in the host
- The fewer side effects, the better
- Must be determined in the early stages of discovery, or it becomes very costly
- Hence predicting toxicity is very important and desirable
- Boils down to predicting interactions, or rather, non-interactions
9 Predicting Toxicity
- Inverse docking: Chen and Zhi developed a database of cavities in PDB structures
- INVDOCK searches these cavities for potential interactions with the ligand of interest, using a scoring function
- The docked energy is compared to an absolute threshold, as well as to the energy of the observed PDB ligand(s) at that site
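The two-part energy test can be sketched as a filter over docking results. This is a hypothetical illustration, not INVDOCK's code: `CavityHit`, `putative_targets`, the threshold of -7.0 and the `margin` parameter are all assumptions, and energies are in whatever units the scoring function uses (lower is better).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CavityHit:
    protein: str                    # protein whose cavity was docked against
    ligand_energy: float            # docked energy of the query ligand (lower = better)
    native_energy: Optional[float]  # energy of the observed PDB ligand at that site, if any


def putative_targets(hits: List[CavityHit],
                     abs_threshold: float = -7.0,
                     margin: float = 0.0) -> List[str]:
    """Keep cavities where the query ligand passes both energy tests."""
    targets = []
    for h in hits:
        # Test 1: absolute energy threshold (hypothetical value).
        if h.ligand_energy > abs_threshold:
            continue
        # Test 2: must not be worse than the site's native ligand by more than `margin`.
        if h.native_energy is not None and h.ligand_energy > h.native_energy + margin:
            continue
        targets.append(h.protein)
    return targets


hits = [CavityHit("A", -9.0, -8.0), CavityHit("B", -6.0, None),
        CavityHit("C", -8.0, -12.0), CavityHit("D", -7.5, None)]
print(putative_targets(hits))  # ['A', 'D']
```

Protein C passes the absolute threshold but is rejected because its native ligand binds far better; raising `margin` would relax that second test.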
10 Example: 4H-Tamoxifen
- Used to treat breast cancer
- INVDOCK finds 22 putative protein targets, at least 10 of which have some experimental backing (including those listed below)
  - Estrogen receptor (the drug target)
  - Alcohol dehydrogenase (enhances sedative effect of alcohol)
  - IgG light chain (modulates immune response)
  - 17β-hydroxysteroid dehydrogenase (tumor regression)
  - GST (suppressed activity, genotoxicity, carcinogenicity)
11 Drug Docking
12 Drug Docking
- Has much in common with structure prediction
- Two components
- Exploration of conformational space
- Scoring function
- Plus one additional component
- Locating the binding site
13 Drug Docking: Level of Detail
- Rigid-body docking: the protein remains fixed; the small molecule has 6 degrees of freedom (DOF), 3 translational and 3 rotational
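A rigid-body pose is just those 6 numbers. The sketch below, under the assumption that orientation is parameterized by three Euler-style angles (x, then y, then z; quaternions are a common alternative), places a fixed ligand by rotating it about its centroid and then translating it. `apply_pose` and `rotation_matrix` are hypothetical names.

```python
import math


def _matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(rx, ry, rz):
    """Rotate about x, then y, then z (angles in radians): R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    mx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    my = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    mz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    return _matmul(mz, _matmul(my, mx))


def apply_pose(coords, pose):
    """Place a rigid ligand: rotate about its centroid, then translate.

    pose = (tx, ty, tz, rx, ry, rz): the 6 rigid-body DOF.
    """
    tx, ty, tz, rx, ry, rz = pose
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    r = rotation_matrix(rx, ry, rz)
    shift = (tx, ty, tz)
    placed = []
    for p in coords:
        d = [p[i] - centroid[i] for i in range(3)]
        rd = [sum(r[i][k] * d[k] for k in range(3)) for i in range(3)]
        placed.append(tuple(rd[i] + centroid[i] + shift[i] for i in range(3)))
    return placed


ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
print(apply_pose(ligand, (1.0, 2.0, 3.0, 0.0, 0.0, math.pi / 2)))
# approximately [(1.0, 3.0, 3.0), (1.0, 1.0, 3.0)]
```

Because the protein is fixed and the ligand is rigid, every candidate the search visits is a point in this 6-dimensional space; internal distances never change.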
14 Drug Docking: Level of Detail
- Flexible-ligand docking: the protein remains fixed; the small molecule has the standard 6 DOF plus internal DOF (it can rotate about bonds)
  - More time-consuming, but necessary for complex ligands if the binding conformation is unknown
- Flexible docking: as above, and in addition protein atoms in the neighbourhood of the binding site can move
  - Largest conformational space to search
  - Often done by using multiple static protein conformers and treating each by flexible-ligand docking
  - Often important when docking to an apo-protein (e.g. allosteric effects)
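Each internal DOF is a torsion: the atoms on one side of a rotatable bond spin about the bond axis. A minimal sketch using Rodrigues' rotation formula follows; `rotate_torsion` is a hypothetical helper, and the caller is assumed to supply the correct list of atoms on the moving side of the bond.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_torsion(coords, i, j, moving, angle):
    """Rotate the atoms listed in `moving` about the bond i->j by `angle` radians.

    Uses Rodrigues' rotation formula. Atoms i and j stay fixed, and the
    distances from moving atoms to the bond axis (and among the moving
    atoms) are preserved.
    """
    axis = [coords[j][m] - coords[i][m] for m in range(3)]
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]  # unit vector along the bond
    c, s = math.cos(angle), math.sin(angle)
    new = list(coords)
    for idx in moving:
        v = [coords[idx][m] - coords[j][m] for m in range(3)]  # relative to pivot j
        kxv = _cross(k, v)
        kdv = sum(k[m] * v[m] for m in range(3))
        rot = [v[m] * c + kxv[m] * s + k[m] * kdv * (1 - c) for m in range(3)]
        new[idx] = tuple(rot[m] + coords[j][m] for m in range(3))
    return new


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(rotate_torsion(coords, 0, 1, [2], math.pi / 2)[2])  # approximately (1.0, 0.0, 1.0)
```

A flexible-ligand pose is then the 6 rigid-body values plus one angle per rotatable bond, which is why the search space grows so quickly with ligand complexity.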
15 Drug Docking: Level of Detail
- Some methods, such as FlexX, perform incremental construction within the binding pocket rather than docking per se
16 Drug Docking Techniques
- Drug docking algorithms share much with protein structure prediction, and include:
  - Monte Carlo search
  - Molecular Dynamics
  - Genetic Algorithms
  - Fragment Assembly
  - Tabu Search
  - Many more
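To make the first technique in this list concrete, here is a minimal Metropolis Monte Carlo sketch. It assumes a toy scoring function (squared distance of the ligand position from a made-up pocket centre) and samples translation only; a real docking run would also perturb rotations and torsions and use a physics- or knowledge-based score. All names and parameter values are hypothetical.

```python
import math
import random


def toy_score(pos, pocket=(2.0, -1.0, 0.5)):
    """Hypothetical score: squared distance from an assumed pocket centre."""
    return sum((p - q) ** 2 for p, q in zip(pos, pocket))


def monte_carlo_dock(score, start, steps=5000, step_size=0.3,
                     temperature=0.5, seed=0):
    """Metropolis search; returns the best position seen and its score."""
    rng = random.Random(seed)
    current = list(start)
    current_e = score(current)
    best, best_e = list(current), current_e
    for _ in range(steps):
        # Random perturbation of every coordinate
        trial = [x + rng.uniform(-step_size, step_size) for x in current]
        trial_e = score(trial)
        delta = trial_e - current_e
        # Metropolis criterion: always accept downhill, sometimes uphill
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


best, best_e = monte_carlo_dock(toy_score, (10.0, 10.0, 10.0))
print([round(x, 1) for x in best], round(best_e, 3))
```

Occasionally accepting uphill moves is what lets Monte Carlo escape local minima on a rugged scoring surface; the temperature controls how often that happens.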
17 Drug Docking
- When a specific ligand and target are known, complete flexible docking can be afforded
- For high-throughput screening (HTS), usually only rigid-body docking can be afforded for the initial pass
- The location to dock to on the protein target may be known ahead of time, or may be computed through binding pocket detection
  - If a 3D structure is available, the binding site can often be predicted using cavity-detection algorithms
- Search must be efficient, as with protein folding, since exhaustive search is not possible
- Scoring function must be selective and efficient
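A back-of-the-envelope count shows why exhaustive search is ruled out. The sketch assumes illustrative values (a 10 Å box sampled every 0.5 Å, every angle sampled in 10° steps, 5 rotatable bonds, 10^6 pose evaluations per second); the orientation count is crude because a naive three-angle grid over-samples rotations.

```python
def exhaustive_pose_count(box_edge=10.0, spacing=0.5, angle_step_deg=10, n_torsions=5):
    """Crude count of poses on a regular grid (illustrative values only)."""
    n_positions = round(box_edge / spacing) ** 3   # translational grid
    n_angles = round(360 / angle_step_deg)         # samples per angle
    n_orientations = n_angles ** 3                 # 3 rotational DOF
    n_conformers = n_angles ** n_torsions          # one angle per rotatable bond
    return n_positions * n_orientations * n_conformers


poses = exhaustive_pose_count()
seconds = poses / 1e6                              # at 10^6 poses per second
years = seconds / (365.25 * 24 * 3600)
print(f"{poses:.2e} poses, ~{years:.0f} years")    # prints: 2.26e+16 poses, ~715 years
```

Each extra rotatable bond multiplies the count by 36 at this resolution, so the problem gets worse quickly for larger ligands.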
18 Drug Docking Example
- Study by Thornton's group (Nature Biotech. 22(8) (2004) pp. 1039-1045)
- Took 120 enzymes and 125 metabolites from EcoCyc; a subset of 29 complexes have crystal structures
- Docked all-vs-all with AUTODOCK
19
- Energy plots for docking (a) and reverse docking (b) for the subset of 29 with crystal structures; triangles represent the crystal complexes
- Note from (a) that enzymes are not very selective about their substrates, nor, from (b), are substrates very specific for their enzymes
20 Drug Docking Example
- Computed a P value: the ability of a substrate or enzyme to recognize its partner, based on the energy distribution
- Now, with only 4 exceptions, the docked pairs show that the enzyme, the substrate, or both are specific
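One way to turn an all-vs-all energy matrix into such P values is a rank-based empirical estimate, sketched below. This is an assumption for illustration; the study's exact statistic may differ. Along the enzyme's row it asks how the true substrate ranks among all metabolites; down the metabolite's column it asks how the true enzyme ranks among all enzymes. `pair_specificity` and the toy energies are hypothetical.

```python
def empirical_p(values, observed):
    """Fraction of energies at least as favourable (<=) as the observed one."""
    values = list(values)
    return sum(v <= observed for v in values) / len(values)


def pair_specificity(energies, enzyme, metabolite):
    """energies[enzyme][metabolite] = docked energy (lower = better).

    Returns (p_enzyme, p_substrate):
      p_enzyme    - rank of the true substrate among all metabolites for this enzyme
      p_substrate - rank of the true enzyme among all enzymes for this metabolite
    Small values mean the partner stands out from the background.
    """
    observed = energies[enzyme][metabolite]
    p_enzyme = empirical_p(energies[enzyme].values(), observed)
    p_substrate = empirical_p((row[metabolite] for row in energies.values()), observed)
    return p_enzyme, p_substrate


toy = {  # hypothetical docked energies for 3 enzymes x 3 metabolites
    "E1": {"m1": -9.0, "m2": -6.0, "m3": -5.5},
    "E2": {"m1": -8.5, "m2": -7.0, "m3": -6.5},
    "E3": {"m1": -7.0, "m2": -9.5, "m3": -4.0},
}
print(pair_specificity(toy, "E1", "m1"))
```

With only three candidates the smallest attainable value is 1/3; with 120 enzymes and 125 metabolites the resolution is much finer.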
21 Transition state
22 Transition state
- The most potent inhibitors are not substrate analogues but rather transition-state analogues
- Important to remember when screening compounds
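A standard transition-state-theory argument (not from the slides; usually credited to Wolfenden) makes this quantitative. If K_S is the dissociation constant of the enzyme-substrate complex and K_TS that of the enzyme-transition-state complex, a thermodynamic cycle linking the catalysed and uncatalysed reactions gives:

```latex
\frac{K_S}{K_{TS}} = \frac{k_{\text{cat}}}{k_{\text{uncat}}}
\quad\Longrightarrow\quad
K_{TS} = K_S \, \frac{k_{\text{uncat}}}{k_{\text{cat}}}
```

For example, an enzyme with a rate enhancement k_cat/k_uncat = 10^10 and K_S = 10^-4 M would bind its transition state with K_TS of about 10^-14 M. A stable analogue captures only part of that affinity, but even a fraction of it can far exceed what a substrate analogue achieves.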
23 Interaction Databases
- BIND (protein-ligand interactions from PDB and literature, SLRI)
- Het-PDB Navi (protein-ligand interactions from PDB, Nagahama Inst. Bio-Science)
- EcoCyc (metabolic pathways, SRI)
- KEGG (pathway database, Kyoto)
24 Blueprint's Small Molecule Resources
- BIND-3DSM Division
  - 23,584 filtered small molecule-biopolymer interactions, automatically derived from crystal structures
  - Biologically insignificant records removed (e.g. crystal packing, non-biological ions)
  - Published: Biopolymers 2001-2002; 61(2):111-20
- SMID
  - 48,886 records matching 4,283 small molecules (from PDB structures) to 2,807 protein families (CDD, SMART, PFAM)
- SMID-BLAST
  - BLAST-calibre tool to attach residue-level small molecule binding annotation to genomic sequence
- SMID-Genomes
  - SMID-BLAST run against all completely sequenced genomes
  - 9.6 million high-quality small molecule interaction annotations mapped to sequences
  - Database interface to browse, compare and investigate small molecule specificity across organisms
25 A 3DSM Record
www.bind.ca
26 BIND record: binding site
27 Interaction Example
- Taxol is derived from natural products, and was discovered to be effective against certain types of cancer
- Interacts with tubulin and stabilizes the microtubules forming the cell cytoskeleton, preventing mitosis and leading to cell death
28 Visualizing Binding Sites
29 SMID
- http://smid.blueprint.org/
- Small Molecule Interaction Database
- Matches small molecule binding sites in structures to protein domains in NCBI's Conserved Domain Database
- 4,283 small molecules from PDB
30 Creating SMID Records
Start with an MMDB record (PDB record) containing more than one molecule: Small Molecule A (smA), Protein A (ProA) and Small Molecule B (smB).
Find atoms from one molecule in proximity (0.5 Å) of atoms from another molecule.
[Figure: ProA with smA and smB bound, contact residues labelled; flow to RPS-BLAST and BIND Records]
- Interactions Found
  - Residues 62, 74 and 83 interacting with smA.
  - Residues 321, 336, 345, 357, 371 and 401 interacting with smB.
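The contact step can be sketched as a brute-force distance search. The slide gives 0.5 Å but not what it is measured against; a raw 0.5 Å centre-to-centre distance would mean overlapping atoms, so this sketch assumes it is a tolerance beyond van der Waals contact. That rule, the function name, the radii and the residue numbers in the example are all hypothetical.

```python
import math


def contact_residues(protein_atoms, ligand_atoms, tolerance=0.5):
    """Residues with any atom in contact with any ligand atom.

    protein_atoms: list of (residue_number, (x, y, z), vdw_radius)
    ligand_atoms:  list of ((x, y, z), vdw_radius)
    Contact rule (an assumption): distance <= r_protein + r_ligand + tolerance.
    Brute force O(N*M); a spatial grid or k-d tree would scale better.
    """
    hits = set()
    for resnum, p_xyz, p_r in protein_atoms:
        for l_xyz, l_r in ligand_atoms:
            if math.dist(p_xyz, l_xyz) <= p_r + l_r + tolerance:
                hits.add(resnum)
                break  # one contact is enough for this atom
    return sorted(hits)


protein = [(10, (0.0, 0.0, 0.0), 1.7),   # hypothetical atoms and radii
           (20, (0.0, 4.0, 0.0), 1.7),
           (30, (3.5, 3.0, 0.0), 1.55)]
ligand = [((3.5, 0.0, 0.0), 1.5)]
print(contact_residues(protein, ligand))  # [10, 30]
```

Running the same search for every pair of molecules in the record yields residue lists like the ones on this slide.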
31 Creating SMID Records
RPS-BLAST all sequences found to interact with a small molecule in order to obtain alignments with conserved domains (DomA and DomB).
Overlay the small molecule-protein interactions on the aligned conserved domains.
[Figure: ProA residues aligned by RPS-BLAST to DomA and DomB, with the smA and smB contacts overlaid]
- SMID records made
- Interactions Found
  - DomA (residues 98, 105 and 123) interacting with smA.
  - DomB (residues 31, 44, 62, 73 and 86) interacting with smB.
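The overlay step amounts to walking the pairwise alignment and converting protein residue numbers into domain positions. The sketch assumes the alignment is available as two gapped strings with known start positions; `map_through_alignment` and the toy sequences are hypothetical, and RPS-BLAST output would first have to be parsed into this form.

```python
def map_through_alignment(query_aln, domain_aln, query_start, domain_start, residues):
    """Map protein residue numbers onto domain positions via a gapped alignment.

    query_aln / domain_aln: aligned strings of equal length, '-' for gaps.
    Returns {residue: domain_position or None if it falls in an insertion}.
    """
    mapping = {}
    q, d = query_start, domain_start
    for qa, da in zip(query_aln, domain_aln):
        if qa != "-" and da != "-":
            mapping[q] = d  # aligned column: record the correspondence
        if qa != "-":
            q += 1
        if da != "-":
            d += 1
    return {r: mapping.get(r) for r in residues}


print(map_through_alignment("AC-DEFG", "ACHD-FG", 10, 1, [10, 12, 13, 15]))
# {10: 1, 12: 4, 13: None, 15: 6}
```

A binding residue that maps to None lies in an insertion relative to the domain, so its annotation cannot be transferred to the family.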
32 Use Cases for SMID
- Domain Studies
  - Binding site analysis
  - Domain family binding site conservation
  - Mapping a small molecule to the domain families that bind it
- Structural Genomics
  - Domain/ligand/binding site identification
  - Some ligands span domain boundaries
  - Easier pattern recognition for interactions
  - Quickly identify candidate co-crystallization ligands
33 Taxol ligand conservation in Tubulin/FtsZ domain family
34 SMID-BLAST
- Uses RPS-BLAST (unmodified) with a new scoring scheme that improves domain family hits using specific ligand conservation information
- Validation: 1,652 new unique interactions deposited into PDB
  - 1,027 (62%) of these interactions are predicted within our selected ligand score cutoff
  - Of these, 262 (25%) were top predictions
  - This is very good, as the test set is not comprehensive:
    - we do not have a set of all possible ligands for each protein crystal structure
    - we can only use exact small molecule matches (not similar molecules, e.g. ATP vs ATP-gamma-S)
- Specificity: able to distinguish closely related Trp- and Tyr-aminoacyl-tRNA synthetases that hit the same protein domain families
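The slide does not spell out the SMID-BLAST scoring scheme, so the following is only a hypothetical illustration of the general idea of adding ligand conservation on top of a domain hit. A query is predicted to bind a family's ligand only if the RPS-BLAST hit is significant and enough ligand-contact positions keep the template's residues. This is not the SMID-BLAST score; the function names, the E-value cutoff and the 0.8 conservation cutoff are assumptions.

```python
def site_conservation(site_residues, aligned_query):
    """Fraction of ligand-contact positions where the query keeps the template residue.

    site_residues: {domain_position: residue seen contacting the ligand}
    aligned_query: {domain_position: query residue aligned there ('-' for a gap)}
    """
    matches = sum(aligned_query.get(pos) == aa for pos, aa in site_residues.items())
    return matches / len(site_residues)


def predict_ligand(evalue, site_residues, aligned_query,
                   max_evalue=1e-5, min_conservation=0.8):
    """Hypothetical rule: significant domain hit AND conserved binding site."""
    return (evalue <= max_evalue
            and site_conservation(site_residues, aligned_query) >= min_conservation)


site = {5: "K", 9: "D", 12: "G", 20: "S", 31: "N"}
query_a = {5: "K", 9: "D", 12: "G", 20: "T", 31: "N"}   # 4 of 5 conserved
query_b = {5: "K", 9: "E", 12: "G", 20: "T", 31: "-"}   # 2 of 5 conserved
print(predict_ligand(1e-20, site, query_a), predict_ligand(1e-20, site, query_b))  # True False
```

Both queries hit the same domain family with the same E-value, yet only one keeps the binding site; a check of this kind is how closely related proteins, such as the Trp- and Tyr-tRNA synthetases above, could be told apart.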
35 Use Cases for SMID-BLAST
- Annotation of Newly Sequenced Genomes
- New enzyme discovery
  - Rhodococcus genome
  - William Mohn (UBC)
  - Metabolic diversity
  - PCB degradation
- Drug Docking
  - Can help prioritize experiments
- Homology Modelling
  - May help in template selection phase
36 SMID-BLAST Results Summary
37 Summary
- Understanding and cataloguing biopolymer-small molecule interactions is critical to the drug discovery process
- Drug docking can help explain toxicity and side effects, and can be useful in understanding the forces behind interactions
- Transition-state analogues make the best inhibitors
- Tools like SMID-BLAST provide a simple, powerful way to predict which ligands may interact with a protein, and vice versa