Title: Adventures in Computational Enzymology
1Adventures in Computational Enzymology
- John Mitchell
- University of St Andrews
2The MACiE Database
Mechanism, Annotation and Classification in
Enzymes. http://www.ebi.ac.uk/thornton-srv/databases/MACiE/
Gemma Holliday, Daniel Almonacid, Noel O'Boyle,
Janet Thornton, Peter Murray-Rust, Gail
Bartlett, James Torrance, John Mitchell
G.L. Holliday et al., Nucl. Acids Res., 35,
D515-D520 (2007)
3Enzyme Nomenclature and Classification
4The EC Classification
- Deals with overall reaction, not mechanism
- Reaction direction arbitrary
- Cofactors and active site residues ignored
- Doesn't deal with structural and sequence information
- However, it was never intended to do so
5A New Representation of Enzyme Reactions?
- Should be complementary to, but distinct from, the EC system
- Should take into account
- Reaction Mechanism
- Structure
- Sequence
- Active Site residues
- Cofactors
- Need a database of enzyme mechanisms
6MACiE Database
Mechanism, Annotation and Classification in
Enzymes. http://www.ebi.ac.uk/thornton-srv/databases/MACiE/
9Global Usage of MACiE
10MACiE Entries
11MACiE Mechanisms are Sourced from the Literature
12Coverage of MACiE
Representative based on a non-homologous
dataset, and chosen to represent each available
EC sub-subclass.
13EC is not Everything
- Different mechanisms can occur with exactly the same EC number.
- MACiE has six beta-lactamases, all with different mechanisms but the same overall reaction.
15EC Coverage of MACiE
EC level                    Structures exist   MACiE covers
EC 1.-.-.- (class)          6                  6
EC 1.2.-.- (subclass)       61                 57
EC 1.2.3.- (sub-subclass)   204                183
EC 1.2.3.4 (serial number)  1776               321
Representative based on a non-homologous
dataset, and chosen to represent each available
EC sub-subclass.
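The coverage counts on this slide can be turned into fractions with a few lines of Python. This is only a minimal sketch: the dictionary and function names are illustrative, the level labels (class, subclass, sub-subclass, serial number) are the standard names for the four EC digits, and the counts are those quoted above.

```python
# Counts quoted on the slide, keyed by EC hierarchy level.
# EC 1.-.-.- = class, EC 1.2.-.- = subclass,
# EC 1.2.3.- = sub-subclass, EC 1.2.3.4 = serial number.
structures_exist = {"class": 6, "subclass": 61,
                    "sub-subclass": 204, "serial number": 1776}
macie_covers = {"class": 6, "subclass": 57,
                "sub-subclass": 183, "serial number": 321}


def coverage(covered, available):
    """Return the fraction of available EC nodes covered at each level."""
    return {level: covered[level] / available[level] for level in available}


for level, fraction in coverage(macie_covers, structures_exist).items():
    print(f"{level}: {fraction:.1%}")
```

Coverage is near-complete down to the sub-subclass level and drops sharply at the serial-number level, which is consistent with the stated aim of representing each available EC sub-subclass rather than every individual enzyme.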
16EC Coverage of MACiE
17Repertoire of Enzyme Catalysis
G.L. Holliday et al., J. Molec. Biol., 372, 1261-1277 (2007)
G.L. Holliday et al., J. Molec. Biol., 390, 560-577 (2009)
20Repertoire of Enzyme Catalysis
Enzyme chemistry is largely nucleophilic
21Repertoire of Enzyme Catalysis
Enzyme chemistry is largely nucleophilic
22Repertoire of Enzyme Catalysis
23Repertoire of Enzyme Catalysis
24Repertoire of Enzyme Catalysis
25Repertoire of Enzyme Catalysis
26Repertoire of Enzyme Catalysis
27Repertoire of Enzyme Catalysis
We do see a few steps corresponding to well-known
organic reactions but these are the exception.
28Residue Catalytic Propensities
29Residue Catalytic Functions
30Phospholipidosis
Lowe et al., Molec. Pharmaceutics, 7, 1708 (2010)
- An adverse effect caused by drugs
- Excess accumulation of phospholipids
- Often by cationic amphiphilic drugs
- Affects many cell types
- Causes delay in the drug development process
31Phospholipidosis
Lowe et al., Molec. Pharmaceutics, 7, 1708 (2010)
- Causes delay in the drug development process
- May or may not be related to human pathologies
such as Niemann-Pick disease
32Electron micrographs of alveolar macrophages (A
and B) and peritoneal macrophages (C and D)
obtained from 3-month-old Lpla2+/+ and Lpla2-/-
mice
Hiraoka, M. et al. 2006. Mol. Cell. Biol.
26(16):6139-6148
33Tomizawa et al.,
34Literature Mined Dataset
- Produced our own dataset of 185 compounds (from literature survey)
- 102 PPL+ and 83 PPL-
- Each compound is an experimentally confirmed positive or negative
R. Lowe, R.C. Glen, J.B.O. Mitchell, Mol. Pharm., 7(5), 1708-1714 (2010)
35Some PPL+ molecules, from Reasor et al., Exp Biol
Med, 226, 825 (2001)
36 10001101010011001101
10110101000011101101
10111101010001001100
10000001110011100111
10100101011101001110
10011111110001001010
Represent molecules using descriptors (we used
E-Dragon and circular fingerprints)
37Experimental Design
Split data into N folds, then train on (N-2) of
them, keeping one for parameter optimisation and
one for unseen testing. Average results over all
runs (each molecule is predicted once per N-fold
validation). We also repeat the whole process
several times with randomly different assignments
of which molecules are in which folds.
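The fold scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the choice of the "next" fold as the parameter-optimisation fold, and the example value N = 5 are all assumptions.

```python
import random


def nested_fold_runs(n_items, n_folds, seed=0):
    """Yield (train, validation, test) index lists, one run per fold.

    In run k, fold k is the unseen test fold, the next fold (cyclically)
    is kept for parameter optimisation, and the remaining n_folds - 2
    folds are used for training. Requires n_folds >= 3.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # random fold assignment
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        v = (k + 1) % n_folds
        train = [i for j, fold in enumerate(folds)
                 if j not in (k, v) for i in fold]
        yield train, folds[v], folds[k]


# Example: 185 compounds and an assumed N = 5.
runs = list(nested_fold_runs(185, 5, seed=1))
```

Each molecule lands in the test fold of exactly one run, so it is predicted once per N-fold validation. Repeating the whole process with a different `seed` gives the randomly different fold assignments mentioned above.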
38Models are built using machine learning
techniques such as Random Forest
39 or Support Vector Machine
41Results
Average MCC values: RF 0.619, SVM 0.650
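For reference, the Matthews correlation coefficient (MCC) reported here is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula; the function name and the convention of returning 0 when the denominator vanishes are assumptions, and no confusion counts from the study are implied.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn: correctly predicted positives/negatives
    fp/fn: incorrectly predicted positives/negatives
    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it a reasonable single-number summary when the two classes are of unequal size.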
42So we have built a good predictive model that can
learn the features that predispose a molecule to
being PPL+, and can make predictions from
chemical structure. This is useful: one could
add it to a virtual screening protocol. But can
we understand anything new about how
phospholipidosis occurs?
43Read up on gene expression studies related to
phospholipidosis
44Sawada et al. listed genes which they found to be
up- or down-regulated in phospholipidosis
45As with all gene expression experiments, some of
these will be highly relevant, others will be
noise. Can we help interpret these data?
46Mechanism?
H. Sawada, K. Takami, S. Asahi, Toxicological
Sciences, 2005, 282-292
47- What expertise do we have available amongst our team, colleagues and collaborators?
- Multiple target prediction
- Maths
- Programming
Florian Nigsch
Hamse Mussa
Rob Lowe
48- Multiple target prediction
- Predicting off-target interactions of drugs. Not
with the primary pharmaceutical target, but with
other targets relevant to side effects.
49ChEMBL
Data mining and filtering
Filtered ChEMBL: 241145 compounds, 1923 targets
Random 99:1 split of the whole dataset, 10 repeats
10 models
Phospholipidosis dataset: 100 PPL+, 82 PPL-
compounds
Predicted target associations
Target PS? scores
50ChEMBL Mining
- Mined the ChEMBL (03) database for compounds and the targets they interact with
- Target description included the word "enzyme", "cytosolic", "receptor", "agonist" or "ion channel"
- A high cut-off (weak binding) was used on Ki/Kd/IC50 values (< 500 µM) to define activity
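The filtering rules above can be sketched as a simple predicate over activity records. This is an illustration only: the record layout and the field names `target_description`, `activity_type` and `value_uM` are hypothetical and do not correspond to the actual ChEMBL schema.

```python
# Keywords and cut-off taken from the slide; everything else is illustrative.
KEYWORDS = ("enzyme", "cytosolic", "receptor", "agonist", "ion channel")
ACTIVITY_TYPES = {"Ki", "Kd", "IC50"}
CUTOFF_UM = 500.0  # micromolar; deliberately permissive (weak binding)


def is_active_pair(record):
    """Return True if a compound-target record passes the filters.

    `record` is a dict with hypothetical keys:
      target_description (str), activity_type (str), value_uM (float).
    """
    description = record["target_description"].lower()
    return (
        any(keyword in description for keyword in KEYWORDS)
        and record["activity_type"] in ACTIVITY_TYPES
        and record["value_uM"] < CUTOFF_UM
    )
```

Note that a plain substring match on "agonist" also matches "antagonist"; whether the original filtering behaved this way is not stated on the slide.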
51Method
- Number of Compounds: 241145
- Number of Targets: 1923
- Split the data into 10 different partitions of training and validation
- Used circular fingerprints with SYBYL atom types to define similarities between molecules
52Multi-class Classification
- Algorithms
- Parzen-Rosenblatt window
- Naive Bayes
53Parzen-Rosenblatt window
- Rank likely targets using estimates of class-conditional probabilities, using a Gaussian kernel K(x_i, x_j)
- (x_i - x_j)^T (x_i - x_j) corresponds to the number of features in which x_i and x_j disagree
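A minimal sketch of this ranking scheme is given below. It assumes binary fingerprints stored as sets of on-bit indices, so that the squared distance is the number of disagreeing bits, and estimates the class-conditional probability of a query for each target as the mean kernel value over that target's training compounds. The function names, the bandwidth `h`, and the omission of class priors and the constant normalisation factor are assumptions, not details taken from the slides.

```python
import math
from collections import defaultdict


def gaussian_kernel(fp_a, fp_b, h):
    """Gaussian kernel on binary fingerprints (sets of on-bit indices).

    For binary vectors, (xi - xj)^T (xi - xj) is the number of bits in
    which the two fingerprints disagree, i.e. the size of the symmetric
    difference of the two sets.
    """
    n_disagree = len(fp_a ^ fp_b)
    return math.exp(-n_disagree / (2.0 * h * h))


def rank_targets(query_fp, training_set, h=1.0):
    """Rank targets by a Parzen-Rosenblatt estimate of p(query | target).

    training_set: iterable of (fingerprint, target) pairs.
    Returns target names, most likely first.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for fp, target in training_set:
        sums[target] += gaussian_kernel(query_fp, fp, h)
        counts[target] += 1
    scores = {t: sums[t] / counts[t] for t in sums}
    return sorted(scores, key=scores.get, reverse=True)


# Toy example with made-up fingerprints and target labels.
training = [({1, 2, 3}, "A"), ({1, 2, 4}, "A"), ({7, 8, 9}, "B")]
print(rank_targets({1, 2, 3}, training))
```

The bandwidth `h` controls how quickly a training compound's influence decays with the number of disagreeing features; in practice it would be tuned on the validation partition.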
54
Partition No.   PRW Rank   NB Rank
1               17.049     74.104
2               16.343     76.251
3               18.424     79.078
4               16.212     73.539
5               17.339     73.535
6               18.630     77.244
7               20.694     78.560
8               18.870     74.464
9               16.584     76.235
10              18.200     78.077
Average         17.835     76.109
When we test the two methods, PRW ranks known
targets better than Naïve Bayes does (average rank
17.8 versus 76.1, where a lower rank is better).
Hence we use PRW for our study.
55Assemble List of Targets Relevant to Sawada's
Suggested Mechanisms
Mechanisms
1. Inhibition of lysosomal phospholipase activity
2. Inhibition of lysosomal enzyme transport
3. Enhanced phospholipid biosynthesis
4. Enhanced cholesterol biosynthesis
56Assemble List of Targets Relevant to Sawada's
Suggested Mechanisms
Inhibition of lysosomal phospholipase activity
Enhanced phospholipid biosynthesis
Enhanced cholesterol biosynthesis
57Assigning Scores to Targets
- Use these 10 models of target interactions
- Predict targets for phospholipidosis dataset
- Score targets according to the likelihood of involvement in phospholipidosis
- Use the top 100 predicted targets per compound as we seek off-target interactions
58- Score measures tendency of target to interact
with PPL+ rather than PPL- compounds.
60M1 and M5 are involved in phospholipase C
regulation; they may be relevant but are not in
Sawada's list.
62
We consider a PS? score significant if the target
is predicted to interact with at least 50 more
PPL+ compounds than PPL- compounds.
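One way to read the scoring rule is as a difference of counts: for each target, the number of PPL+ compounds that have it among their top-ranked predictions minus the number of PPL- compounds that do. The sketch below implements that reading for a single model. It is an assumption rather than the authors' definition; in particular, how scores are combined across the 10 models is not stated here, and all function and variable names are hypothetical.

```python
from collections import Counter


def target_scores(predicted, labels, top_n=100):
    """Score each target as (# PPL+ hits) - (# PPL- hits).

    predicted: dict compound -> list of targets, best-ranked first
    labels:    dict compound -> True for PPL+, False for PPL-
    A "hit" means the target is in the compound's top_n predictions.
    """
    positives, negatives = Counter(), Counter()
    for compound, ranked in predicted.items():
        bucket = positives if labels[compound] else negatives
        bucket.update(set(ranked[:top_n]))
    return {t: positives[t] - negatives[t]
            for t in set(positives) | set(negatives)}


def significant(scores, threshold=50):
    """Targets whose score meets the significance threshold."""
    return {t for t, s in scores.items() if s >= threshold}


# Toy example with made-up compounds and targets (top_n reduced to 2).
predicted = {"c1": ["T1", "T2", "T3"],
             "c2": ["T1", "T3", "T2"],
             "c3": ["T2", "T1", "T3"]}
labels = {"c1": True, "c2": True, "c3": False}
print(target_scores(predicted, labels, top_n=2))
```

A positive score indicates a target predicted more often for PPL+ than for PPL- compounds; a score near zero, or negative, gives no evidence of involvement under this reading.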
63Our Scores for 8 of Sawada's PPL-Relevant Targets
Mechanism  Target                                                             Rank   PS?
1          Sphingomyelin phosphodiesterase (SMPD) (h)                         225    55
1          Lysosomal Phospholipase A1 (LYPLA1) (r)                            163    90
1          Phospholipase A2 (PLA2) (h)                                        152    97
3          Elongation of very long chain fatty acids protein 6 (ELOVL6) (h)   1203   -10
3          Acyl-CoA desaturase (SCD) (m)                                      610    0
4          3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGCR) (h)        456    10
4          Squalene monooxygenase (SQLE) (h)                                  437    14
4          Lanosterol synthase (LSS) (h)                                      114    134
Mechanisms: 1 = Inhibition of lysosomal phospholipase activity; 3 = Enhanced phospholipid biosynthesis; 4 = Enhanced cholesterol biosynthesis
64Our Scores for Sawada's PPL-Relevant Targets
65Other Mechanisms
- The mechanisms and targets suggested here are insufficient to explain all the PPL+ compounds in our data set.
- We expect that other targets and possibly mechanisms are important.
- Our method can't test direct compound-phospholipid binding.
68ACKNOWLEDGEMENTS
Dr Gemma Holliday
Dr Rob Lowe
Dr Daniel Almonacid
Prof. Janet Thornton
Dr Florian Nigsch
Dr Hamse Mussa
Prof. Bobby Glen
Dr Andreas Bender
Alexios Koutsoukas