Title: Use of Chemical Information in Organic Synthesis
1Use of Chemical Information in Organic Synthesis
Reaction Information for the Practicing Synthetic Chemist: The Search for Relevant Answers
Guenter Grethe, May 2006
2Use of Chemical Information in Organic Synthesis
Information Needs of Synthetic Organic Chemists in Basic Research and Development
- new preparation of intermediates and starting materials
- well established, high yield preparations (experimental procedures)
- new synthetic methodologies (new reagents, catalysts etc.)
- information on starting materials (availability, price, physical data etc.)
- physical properties of reagents, solvents and catalysts
- access to the primary, secondary, and tertiary literature
- spectral information of related compounds
General searching for information on molecules precedes retrieval of synthetic methodology data
3Use of Chemical Information in Organic Synthesis
Differences in Molecule vs. Reaction Searching
- Query: Is this particular molecule or similar ones known? Specific data?
- Answer: Yes or No from existing databases, including patents
- Query: How to selectively reduce the nitrile group (transformation)?
- Answer: Pointers to relevant examples in the literature
- Criteria
  - Efficient transformation
  - Functional group compatibility
  - Reaction conditions
Molecules
Reaction Conditions?
Reactions
4Use of Chemical Information in Organic Synthesis
Available Reaction Databases
- online
  - CASREACT (CAS) (ca. 10.5 million, including Spresi database, 1985 - present)
  - Spresi (InfoChem) (ca. 4.5 million, 1974 - 2004)
  - CrossFireplusReactions (Elsevier MDL, STN) (ca. 10 million, 1779 - present)
  - ChemInform RX on STN (FIZ Chemie) (ca. 0.8 million)
  - CCR (Thomson Scientific) (ca. 0.6 million)
- inhouse
  - ChemInform Reaction Library (Elsevier MDL)
  - Spresi (InfoChem)
  - CrossFire Beilstein (Elsevier MDL)
  - Specialty Databases (several vendors)
  - Proprietary Databases
For a good review see Zass, E. "Reaction Databases", in Encyclopedia of Computational Chemistry, Schleyer, P. von R.; Allinger, N.L.; Clark, T.; Gasteiger, J.; Kollman, P.A.; Schaefer, H.F.; Schreiner, P.R. (Eds.). Wiley, Chichester, 4, 2402-2420. QD39.3.E46 E53 1998
5Use of Chemical Information in Organic Synthesis
Use of Available Information in Synthesis
- Preparation of a distinct compound requires
  - access to information about new synthetic methodologies in journals and databases
  - experimental details for the preparation of known intermediates and starting materials from databases, journals and other sources
  - tools to plan syntheses and select optimal reaction conditions
- Preparation of a library of diverse compounds requires
  - all of the above
  - knowledge about the characteristics of functional groups
  - information about available building blocks
- Process development requirements are defined by
  - access to information about various reaction conditions of a reaction
  - knowledge about the characteristics of molecules or their fragments under the required reaction conditions
  - tools to calculate the behavior of reagents, solvents, and catalysts
6Use of Chemical Information in Organic Synthesis
Barriers Impeding the Use of Available
Information by Endusers
- multiple access systems
- different user interfaces
- different modi operandi
- difficult query formulation
- substructure concept
- keyword inconsistencies
- limited post-search management of large hitlists
- some integrated access to other information sources
Most importantly: failure of available systems to recognize and to facilitate the integration of the vast knowledge of synthetic chemists
7Use of Chemical Information in Organic Synthesis
Search Modes
- Structure-Based Searches
  - Full structure
    - Only for reactions with known molecules (not very useful)
  - Reaction substructure (RSS)
    - Most frequently used mode (difficult for end-users to formulate effective query)
  - Reaction similarity
    - Various methodologies using different parameters (results often vary greatly, good for browsing and idea generation)
  - Reaction classification
    - Several methodologies, mostly based on structural information about reaction centers and immediate environment (good indexing tool, improvement over reaction similarity)
  - Reagents, Solvents
    - Full structure and substructure searches for molecules (not available in all databases, used mostly in conjunction with other structural searches)
- Data-Based Searches
  - Keywords
    - intellectually derived terms for name reactions, reaction types etc. (incomplete, not very useful)
  - Journal, author, title, yields, etc.
    - Text or numeric data searches (mostly used in conjunction with structural searches)
8Use of Chemical Information in Organic Synthesis
Problems with Reaction Searching
Synthetic Problem
- Full Structure Search: no hits
- Reaction Substructure Search (colored fragment): 119 hits
- Class Code Search: 672 hits (broad, reaction center only)
- Keyword Search "Michael Addition": 2972 hits
Results were obtained from Elsevier MDL's combined reaction databases (ca. 1 million reactions), 2006
9Use of Chemical Information in Organic Synthesis
Problems with Substructure Searching
DATABASE SIZE ca. 1 million reactions
Narrowly Defined Query
0 Hits
- Problems
  - how to avoid excessively large hitlists
  - how to formulate reasonable search queries
- Solutions
  - combination of several queries (expert approach)
  - indexing of reactions (focusing on relevant reactions)
  - facilitating query building (non-expert approach, intuitive)
10Use of Chemical Information in Organic Synthesis
Goal for an Efficient Reaction Data Management
System
Create an environment that allows for combining
the intelligence and creativity of synthetic
chemists with the processing and simulating power
of computers and the wealth of information in
databases to meet the challenges in the
laboratory for developing efficient syntheses.
11Use of Chemical Information in Organic Synthesis
Requirements to Facilitate Enduser Searching
- User interfaces based on users' tasks and capabilities
  - (e.g. CrossFire Web, DiscoveryGate, Reaction Browser, SciFinder)
  - (see "A Framework for the Evaluation of Chemical Structure Databases", Cooke, F.; Schofield, H. J. Chem. Inf. Comput. Sci. 2001, 41, 1131-1140)
- Hierarchical thesauri for keywords and reaction types
- Effective indexing of databases (e.g. classification)
- Simplification of the querying process (natural, not rule dependent)
- Efficient post-search management tools (e.g. clustering)
- Seamless integration of various information sources
  - (web environment, point-and-click)
- Most importantly: available tools must simulate the chemist's problem solving process
12Use of Chemical Information in Organic Synthesis
Databases in DiscoveryGate
13Use of Chemical Information in Organic Synthesis
Reaction Classification as Indexing Tool
Do We Still Need a Classification of Organic
Reactions?
- Reasons
  - alternate method for indexing databases
  - complement to structure-based retrieval systems
  - access to generic types of information in retrieval systems
  - post-search management of large hitlists
  - simplification of query generation
  - linking of reaction information from different sources
  - source for deriving knowledge bases for reaction prediction and synthesis design
  - automatic procedures for analyses and correlations, e.g. quality control and overlap studies
14Use of Chemical Information in Organic Synthesis
Reaction Classification as Indexing Tool
- Examples of some recent work
  - Horace: An Automatic System for the Hierarchical Classification of Chemical Reactions. Rose, J.R.; Gasteiger, J. J. Chem. Inf. Comput. Sci. 1994, 34, 74
  - COGNOS: A Beilstein-Type System for Organizing Organic Reactions. Hendrickson, J.B.; Sander, T. J. Chem. Inf. Comput. Sci. 1995, 35, 251
  - Knowledge Discovery in Reaction Databases: Landscaping Organic Reactions by a Self-Organizing Neural Network. Chen, L.; Gasteiger, J. J. Am. Chem. Soc. 1997, 119, 4033
  - Classification of Organic Reactions: Similarity of Reactions Based on Changes in the Electronic Features of Oxygen Atoms at the Reaction Sites. Satoh, H.; Sacher, O.; Nakata, T.; Chen, L.; Gasteiger, J.; Funatsu, K. J. Chem. Inf. Comput. Sci. 1998, 38, 210
  - Topology-Based Reaction Classification: An Important Tool for the Efficient Management of Reaction Information. Kraut, H.; Löw, P.; Matuszczyk, H.; Saller, H.; Grethe, G. Proceed. 5th Internat. Conf. Chem. Struct., Noordwijkerhout, The Netherlands 1999, 26
  - Analysis of Reaction Information. Grethe, G. In Handbook of Chemoinformatics, Gasteiger, J. (Ed.), Wiley-VCH, Weinheim, 2003, Volume 4, 1407-1427
15Use of Chemical Information in Organic Synthesis
Reaction Indexing through Classification
Based on
- Keywords: Michael addition, Michael reaction, ring closure
- Molecule Type: N-heterocycle, isoquinoline, quinolizidine
- Reaction Type: reaction centers
16Use of Chemical Information in Organic Synthesis
Reaction Classification - Background
Rules and Definitions
- Classify v. 2.5, developed by InfoChem, Munich
- Based on InfoChem's reaction center perception algorithm
  - A bond is defined as a reaction center if it is made or broken
  - An atom is defined as a reaction center if it changes its
    - number of implicit hydrogens
    - number of valencies
    - number of π-electrons
    - atomic charge
  - An atom is also a reaction center if the connecting bond is a reaction center
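The reaction center rules on this slide can be illustrated with a minimal Python sketch. It is not InfoChem's algorithm: it assumes the reaction is already atom-mapped, represents each bond as an unordered pair of mapped atom numbers, and describes each atom by a hypothetical tuple (implicit hydrogens, valencies, π-electrons, charge). The function names and the example data are illustrative assumptions.

```python
from typing import Dict, FrozenSet, Set, Tuple

Bond = FrozenSet[int]                   # unordered pair of mapped atom numbers
AtomProps = Tuple[int, int, int, int]   # (implicit H, valencies, pi-electrons, charge)


def bond_reaction_centers(reactant_bonds: Set[Bond], product_bonds: Set[Bond]) -> Set[Bond]:
    """A bond is a reaction center if it is made or broken,
    i.e. it is present on exactly one side of the reaction."""
    return reactant_bonds ^ product_bonds


def atom_reaction_centers(
    reactant_atoms: Dict[int, AtomProps],
    product_atoms: Dict[int, AtomProps],
    changed_bonds: Set[Bond],
) -> Set[int]:
    """An atom is a reaction center if one of its properties changes
    or if it sits on a bond that is itself a reaction center."""
    centers: Set[int] = set()
    for atom, before in reactant_atoms.items():
        after = product_atoms.get(atom)
        if after is not None and after != before:
            centers.add(atom)
    for bond in changed_bonds:
        centers.update(bond)
    return centers


# Hypothetical example: CH3-Br + OH(-) -> CH3-OH + Br(-), atoms mapped as 1=C, 2=Br, 3=O
reactant_bonds = {frozenset({1, 2})}
product_bonds = {frozenset({1, 3})}
reactant_atoms = {1: (3, 4, 0, 0), 2: (0, 1, 0, 0), 3: (1, 1, 0, -1)}
product_atoms = {1: (3, 4, 0, 0), 2: (0, 0, 0, -1), 3: (1, 2, 0, 0)}

changed = bond_reaction_centers(reactant_bonds, product_bonds)
print(sorted(atom_reaction_centers(reactant_atoms, product_atoms, changed)))
```

In this example the carbon atom keeps all four properties and is picked up only because both changed bonds end on it, which is the case the last rule on the slide covers.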
17Use of Chemical Information in Organic Synthesis
Reaction Classification - Background
Rules and Definitions
- Hashcodes are calculated for all reaction centers taking into account atom properties
  - atom type
  - valence state
  - total number of bonded hydrogens (implicit plus explicitly drawn)
  - number of π-electrons
  - aromaticity
  - formal charges
  - reaction center information
- The sum of all reaction center hashcodes of all reactants and one product of a reaction provides the unique reaction classification code (ClassCode)
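The idea of summing per-center hashcodes into one ClassCode can be sketched as follows. This is an illustration under stated assumptions, not the Classify implementation: the `CenterAtom` record, the use of SHA-256 as the hash function, and the 64-bit wrap-around are all hypothetical choices made for this sketch; the slide does not specify how the hashcodes are computed.

```python
import hashlib
from typing import Iterable, NamedTuple


class CenterAtom(NamedTuple):
    """Hypothetical record of the atom properties listed on the slide."""
    atom_type: str
    valence_state: int
    total_hydrogens: int
    pi_electrons: int
    aromatic: bool
    formal_charge: int
    center_info: str      # placeholder for "reaction center information"


def hashcode(atom: CenterAtom) -> int:
    """Stable 64-bit hash of one reaction center (SHA-256 is an assumption)."""
    digest = hashlib.sha256(repr(tuple(atom)).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def class_code(centers: Iterable[CenterAtom]) -> int:
    """Sum of all reaction center hashcodes, wrapped to 64 bits.
    A sum does not depend on the order in which centers are listed."""
    return sum(hashcode(a) for a in centers) % 2**64


# Hypothetical reaction centers of a substitution at carbon
carbon = CenterAtom("C", 4, 3, 0, False, 0, "bond made and broken")
bromine = CenterAtom("Br", 1, 0, 0, False, 0, "bond broken")
oxygen = CenterAtom("O", 2, 1, 0, False, 0, "bond made")

print(class_code([carbon, bromine, oxygen]) == class_code([oxygen, carbon, bromine]))
```

Using a sum makes the code independent of atom ordering, so two database entries of the same transformation receive the same code however they were drawn. A plain sum can collide in principle, so uniqueness in this sketch rests on the quality of the underlying hash.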
18Use of Chemical Information in Organic Synthesis
Reaction Classification - Background
Rules and Definitions
- Inclusion of atoms in the immediate environment (spheres)
  - reaction centers only (0-sphere, BROAD)
  - reaction centers plus α-atoms (1-sphere, MEDIUM)
  - reaction centers plus α- and β-atoms (2-sphere, NARROW)
  - inclusion of one sp3-atoms during sphere expansion
- Atom equivalency
  - atoms in the same group of the periodic table, with the exception of row-2 elements, are considered equivalent
- Multiple occurrences of identical transformations are handled as one
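The sphere concept (0, 1, or 2 spheres around the reaction centers) amounts to collecting all atoms within a given number of bonds. The following is a minimal sketch using breadth-first search on a molecular graph; the adjacency-dictionary representation and the function name are assumptions for illustration, and the special handling of sp3 atoms and atom equivalency mentioned on the slide is not modeled.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def expand_spheres(adjacency: Dict[int, List[int]], centers: Iterable[int], spheres: int) -> Set[int]:
    """Return the reaction centers plus every atom at most `spheres` bonds away.
    spheres=0 corresponds to BROAD, 1 to MEDIUM, 2 to NARROW."""
    seen: Set[int] = set(centers)
    frontier = deque((atom, 0) for atom in seen)
    while frontier:
        atom, depth = frontier.popleft()
        if depth == spheres:
            continue  # do not walk beyond the requested sphere
        for neighbor in adjacency.get(atom, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


# Hypothetical linear chain of five atoms, 1-2-3-4-5, with atom 3 as the reaction center
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
for n in (0, 1, 2):
    print(n, sorted(expand_spheres(chain, [3], n)))
```

A wider sphere describes the environment of the transformation in more detail, so the resulting classification is narrower and each class contains fewer reactions.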
19Use of Chemical Information in Organic Synthesis
Reaction Classification - Background
Rules and Definitions
20Use of Chemical Information in Organic Synthesis
Reaction Classification: Clustering of Search Results
- Classification codes are data
  - stored in the database
  - usable for sorting (clustering)
RSS-Search Query (in red)
Result: 156 hits
Clustered by Classification Code (MEDIUM): 72 clusters
1. Cluster (20 rxns)
2. Cluster (15 rxns)
3. Cluster (13 rxns)
4. Cluster (8 rxns)
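Because classification codes are stored as ordinary data, clustering a hitlist reduces to grouping hits by code and ordering the groups by size. A minimal sketch, assuming each hit is a hypothetical (reaction id, class code) pair; the identifiers and codes below are made up for illustration.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple


def cluster_by_class_code(hits: Iterable[Tuple[str, Hashable]]) -> List[List[str]]:
    """Group reaction ids by their class code, largest cluster first."""
    clusters = defaultdict(list)
    for reaction_id, code in hits:
        clusters[code].append(reaction_id)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical hitlist: six reactions carrying three different class codes
hits = [("r1", 11), ("r2", 22), ("r3", 11), ("r4", 33), ("r5", 11), ("r6", 22)]
for number, cluster in enumerate(cluster_by_class_code(hits), start=1):
    print(f"{number}. Cluster ({len(cluster)} rxns): {cluster}")
```

Sorting by cluster size puts the most common transformation types at the top, which matches the way the clusters are listed on this slide.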
21Use of Chemical Information in Organic Synthesis
Classification by Reaction Names
- Chemists are familiar with Name Reactions (Diels-Alder, Michael etc.)
  - Papers in one issue of JOC (22, 2004) mentioned 20 name reactions, known and lesser known, some multiple times
    - e.g., Mitsunobu reaction, Nazarov reaction, Wolff rearrangement etc.
  - Several books dealing exclusively with Name Reactions (ca. 700 reactions)
- Use of Name Reactions facilitates reaction retrieval
  - Complementary to other searches
    - Used in combination with other data
    - Easier alternative to formulating complex RSS queries
  - Excellent browsing tool
    - Overview of scope and limitations of a given reaction, e.g. Aldol reaction
  - Combining different reaction types leading to same compound class
    - Hantzsch pyridine synthesis from dihydropyridines or β-keto esters
    - Fischer indole synthesis from hydrazines or hydrazones
    - Darzens reaction of epoxides from esters, amides, sulfones, or nitriles
References
- Named Organic Reactions, Laue, T. and Plagens, A., Eds., John Wiley & Sons, 1st Edition 1999, 2nd Edition 2005
- Organic Syntheses Based on Name Reactions, Hassner, A. and Stumer, C., Eds., Elsevier Science, 1st Edition 1994, 2nd Edition 2002
- Name Reactions, Li, J. J., Ed., Springer, 2002
- Strategic Applications of Named Reactions, Kürti, L. and Czakó, B., Eds., Elsevier, 2005
- Name Reactions and Reagents in Organic Synthesis, Mundy, B.P.; Ellerd, M.G. and Favaloro, F.G., Jr., Wiley Interscience, 2005
Note: The work on classification by reaction names is being developed at InfoChem (Munich) in consultation with G. Grethe
22Use of Chemical Information in Organic Synthesis
Classification by Reaction Names - Requirements
- Established electronically not intellectually
  - NOW: Intellectually derived
    - Inclusion of intellectually derived keywords varies greatly from database to database and depends on abstractors; keywords are either too inclusive or not comprehensive
    - Example: Michael addition, 184 hits (keywords) vs. 89 hits (RSS search), 52 hits (reaction name keywords)
  - FUTURE: Electronically derived
    - Assignments based on single or multiple RSS searches
    - Boolean logic is applied to combine and/or subtract search results (queries)
    - Assignments are pre-processed and added as data to database(s)
- Name reactions are aligned in hierarchical order
  - Based on main reaction categories (addition, substitution, rearrangements, eliminations, oxidations, reductions)
  - Reactions can be listed in multiple categories, e.g.
    - Baeyer-Villiger oxidation in Oxidation and Rearrangement
  - Hierarchy must be able to accommodate non-name reactions (future project)
  - Reactions containing n reactions (e.g., tandem reactions) are listed in n categories
    - Individual name reactions have to be recognizable
    - Otherwise, stored under Miscellaneous
- Queries and corresponding names are stored in spreadsheet
23Use of Chemical Information in Organic Synthesis
Classification by Reaction Names - Hierarchy
Main categories
First Level
Second Level
Third Level
1,2-Addition
Darzens condensation
Sulfones
Addition
1,4-Addition
Michael reaction
Intermolecular
Cycloaddition
Diels-Alder reaction
[4+2] Cycloadditions
Aromatic electrophilic
Friedel-Crafts acylation
Intramolecular
Substitution
Aliphatic Nucleophilic
Schotten-Baumann reaction
Free radical
Gomberg-Bachmann reaction
Intermolecular
Nucleophilic
Hofmann rearrangement
Alkyl
Rearrangements
Sigmatropic
[3,3] Sigmatropic rearrangement
Claisen rearrangement
Radical
Cope reaction
Elimination
Chugaev reaction
Reductions
Cannizzaro reaction
Intermolecular
Oxidations
Baeyer-Villiger oxidation
Lactones
Heterocyclic Synthesis
Hantzsch pyridine synthesis
Modified
Miscellaneous
Alper reaction
Cyclocarbonylation
24Use of Chemical Information in Organic Synthesis
Classification by Reaction Names: Keyword Generation
Example: Intermolecular Mannich reaction with CH-acidic compounds
Procedure
- generate query for general search
- check hitlist for non-relevant hits
- formulate queries to eliminate negatives
- combine queries using Boolean operators
Mannich reaction
Query Q1
Elimination of negative hits
Biginelli reaction
Query Q2
Aza Diels-Alder reaction
Query Q3
Query set for intermolecular Mannich reaction with CH-acidic compounds: Q1 NOT (Q2 OR Q3)
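The procedure on this slide can be read as set algebra on hitlists: start from the hits of the general query and subtract the hits of every query that captures unwanted reactions. The sketch below assumes that reading, namely that the query set means "hits of Q1 that are in neither Q2 nor Q3"; the function name and the reaction ids are hypothetical.

```python
from typing import Iterable, Set


def name_reaction_hits(general: Iterable[int], *negatives: Iterable[int]) -> Set[int]:
    """Hits of the general query minus the union of all negative queries."""
    excluded: Set[int] = set().union(*negatives)
    return set(general) - excluded


# Hypothetical hitlists of reaction ids
q1 = {1, 2, 3, 4, 5}   # general Mannich-type query
q2 = {2}               # hits that are really Biginelli reactions
q3 = {4, 9}            # hits that are really aza Diels-Alder reactions

print(sorted(name_reaction_hits(q1, q2, q3)))
```

Once such a combination is validated against the hitlist, its result can be stored with the reactions as a name-reaction keyword, so end users never have to formulate the underlying queries themselves.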
25Use of Chemical Information in Organic Synthesis
Classification by Reaction Names
Example of query menu (partial view) from InfoChem's SpresiWeb
26Use of Chemical Information in Organic Synthesis
"The design of organic syntheses by chemists without the help of computers proceeds in anything but a systematic stepwise manner from the target molecule to available starting materials. A systematic stepwise approach is more the exception than the rule. The human mind solves problems by lateral thinking, jumping from one idea to the next, from one question to a different one, from retrosynthetic thinking to considering the course and outcome of a reaction, etc."
Gasteiger, J.; Ihlenfeldt, W.D.; Roese, P. Recl. Trav. Chim. Pays-Bas 1992, 111, 270.
The paradigm in an ideal electronic world
Journals
Major Reference Works
Books
Databases
E-Labjournal
Knowledge, Intuition, and Experience of
Synthetic Chemist
27Use of Chemical Information in Organic Synthesis
Integrated Major Reference Works (iMRW)
(Reaction Databases, DiscoveryGate )
(Elsevier MDL, Third Party, Proprietary
etc.)
present status
ClassCodes
LinkFinderPlus (citations)
LinkFinderPlus (citations)
Tertiary Sources
Primary Journals
Major Reference Works (MRWs)
iMRW links
Future links
28Use of Chemical Information in Organic Synthesis
Integrated Major Reference Works - Concept
- Simulating chemists' approach of gathering information from various sources (lateral approach) for solving synthetic problems through a simple point-and-click mechanism
- Assisting chemists with the synthesis of new compounds by providing complementary information
  - With examples for synthetic methodologies from reaction databases
  - From summaries, critically evaluated by experts, describing
    - reaction mechanisms
    - principles of stereo-controlled reactions
    - applications, preparations, and properties of reagents
    - and other information generally not found in reaction databases
  - Through one-click linking to the primary literature when combined with LinkFinderPlus
29Use of Chemical Information in Organic Synthesis
Integrated Major Reference Works - Summary
iMRW
- is a unique collaboration between Elsevier MDL, InfoChem and leading scientific publishers (Elsevier Science, Georg Thieme Verlag, and Springer-Verlag)
- provides one-click, bi-directional linking based on reaction type between synthetic methodology databases and electronic versions of major reference works (MRWs) or between individual MRWs, i.e. a true integration of information
- allows text and (sub)structure searching over multiple major reference works from a single user interface
30 Use of Chemical Information in Organic Synthesis
Major Reference Works in iMRW
- Detailed information about methodologies based on reaction type
- Information about scope and limitations of reactions
- Evaluated experimental procedures
- Information about reaction mechanism, stereo-control, effect of substituents and ligands, and other factors influencing a reaction
- Information about reagents and catalysts, their preparation and properties
- Updates for each of them are planned or under consideration by the publishers and will be added when available
31Use of Chemical Information in Organic Synthesis
Comprehensive Asymmetric Catalysis (CAC) - Summary
Editors: Eric N. Jacobsen, Andreas Pfaltz, Hisashi Yamamoto (1999)
CAC is an innovative reference work that reviews
in three volumes catalytic methods for asymmetric
organic synthesis, a major challenge in synthetic
chemistry today. Illustrated by over 6,000
reactions critically evaluated by 60 leading
experts in the field, the basic principles,
mechanisms, basis for stereoinduction, and scope
and limitations of asymmetric reactions are
covered in-depth.
32 Use of Chemical Information in Organic Synthesis
Comprehensive Organic Functional Group Transformations (COFGT) - Summary
Editors-in-Chief: Alan R. Katritzky, Otto Meth-Cohn, Charles W. Rees (1995)
COFGT covers in 40,000 reactions and seven
volumes the vast subject of organic synthesis in
terms of the introduction and interconversion of
functional groups. The editors have adopted a
rather rigorous, logical and formal treatment on
the basis of structure, which enables a detailed
analysis of all known, and indeed of some as yet
unknown, functional groups. Therefore, the
treatise deals rationally and comprehensively
with the method of their construction.
33Use of Chemical Information in Organic Synthesis
Science of Synthesis - Summary
Houben-Weyl Methods of Molecular Transformations
Editorial Board: D. Bellus, S. V. Ley, R. Noyori, M. Regitz, P. J. Reider, E. Schaumann, I. Shinkai, E. J. Thomas, B. M. Trost (2001)
- Science of Synthesis is the authoritative and comprehensive reference work for the entire field of organic and organometallic synthesis. The series of 48 volumes will be published over a period of 8 years. It will present 15,000 selected synthetic methods for all classes of compounds illustrated by 150,000 reactions, and it includes
  - Methods critically evaluated by leading scientists
  - Background information and detailed experimental procedures
  - Schemes and tables which illustrate the reaction scope
34Use of Chemical Information in Organic Synthesis
Collecting Information for the Synthesis of a new
Compound
Target molecule
Muray, E.; Rifé, J.; Branchadell, V.; Ortuño, R.M. J. Org. Chem. 2002, 67, 4520-4525 (The paper describes the syntheses of cyclopropyl nucleosides as potential antiviral and antitumor agents)
35Use of Chemical Information in Organic Synthesis
Synthesis Plan
Retrosynthetic Analysis: N1-alkylation of adenine
1. Step: general information about the alkylation reaction
2. Step: information about the preparation of A, including stereochemistry
3. Step: information about scope and limitations, effect of substituents, applicable reagents etc.
36Use of Chemical Information in Organic Synthesis
Reaction Substructure Data Search in
DiscoveryGate
37Use of Chemical Information in Organic Synthesis
38Use of Chemical Information in Organic Synthesis
39Use of Chemical Information in Organic Synthesis
Search for Similar Reactions in iMRW
40Use of Chemical Information in Organic Synthesis
Literature Linking
COFGT chapter
41Use of Chemical Information in Organic Synthesis
Text Search in iMRW
42Use of Chemical Information in Organic Synthesis
Information about Enantioselective
Cyclopropanation from CAC
43Use of Chemical Information in Organic Synthesis
Text Search Results from COFGT and Linking to
Literature
44Use of Chemical Information in Organic Synthesis
Integration of iMRW with Reaction Database
45Use of Chemical Information in Organic Synthesis
Conclusion
- DiscoveryGate provides chemists, in a single system, with the relevant information from different sources required for solving synthetic problems, and lets the user work with it interactively
- Access is provided from an intuitive user interface by a simple point-and-click mechanism.
- The system very closely simulates the lateral information gathering process of synthetic chemists