Title: Data Mining in Chemistry
1 Data Mining in Chemistry
Markus C. Hemmer
Computer-Chemie-Centrum, Universität Erlangen-Nürnberg
D-91054 Erlangen, Germany
2 What is Data Mining?
- Data Mining is
- an analytical process designed to explore large amounts of data in search of consistent patterns and systematic relationships.
- ...a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Srikant, Agrawal, 1996).
3 Amount of Information in Chemistry
[Figure: two charts. Left: yearly number of documents in Chemical Abstracts, in thousands (axis 100 to 800), for 1920 to 2000. Right: number of registered substances, in millions (axis 4 to 24), for 1970 to 2000.]
4 The Chemical Language
Common name: Dichlophenthion
Systematic name: Phosphorothioic acid O-2,4-dichlorophenyl O,O-diethyl ester
Molecular formula: C10H13Cl2O3PS
SMILES: Clc(c(c1)OP(=S)(OCC)OCC)cc(c1)Cl
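The molecular formula on this slide is the simplest of the notations to process by machine. As a minimal sketch (not taken from the talk), the Python snippet below parses the formula into element counts and sums a molecular weight. The function names and the small table of rounded atomic weights are assumptions made for illustration, and the parser only handles plain formulas without brackets or charges.

```python
import re

# Rounded standard atomic weights; only the elements needed for this example.
ATOMIC_WEIGHTS = {
    "C": 12.011, "H": 1.008, "Cl": 35.45,
    "O": 15.999, "P": 30.974, "S": 32.06,
}

def parse_formula(formula):
    """Return {element: count} for a plain formula such as 'C10H13Cl2O3PS'."""
    counts = {}
    # An element symbol is one capital letter plus an optional lowercase letter,
    # followed by an optional count (a missing count means 1).
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def molecular_weight(counts):
    """Sum the atomic weights of all atoms in the parsed formula."""
    return sum(ATOMIC_WEIGHTS[element] * n for element, n in counts.items())

counts = parse_formula("C10H13Cl2O3PS")
weight = molecular_weight(counts)
print(counts)
print(f"{weight:.1f} g/mol")
```

A line notation such as SMILES carries connectivity as well as composition, so it needs a real parser; a cheminformatics toolkit would normally be used for that instead of a regular expression.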
5 Search for Cancerostatic Drugs
similar substrates
protein/substrate complex
6 Representation of Properties
biological activity
chemical reactivity
7 Non-linear Projection onto a Torus
8 Comparison of Steroid Surfaces
3,20-Allopregnanedione
3,20-Pregnanedione
9 Descriptor of a Polycyclic System
10 Visualization of Multidimensional Data
11 Research and Projects at the CCC
Evaluation of Reactions
Biochemical Pathways
TeleSpec
Synthesis Design
SOL
Drug Design
VS-C
QSAR/QSPR
ChemVis
Structure/Spectrum Correlation
Dissertation online
12 Software Development at the CCC
CORINA: 3D structure generator
PETRA: atomic property calculator
ARC: descriptor generator
KMAP: Kohonen network generator
CACTVS: chemical information system
EROS: reaction prediction expert system
CORA: reaction classification system
WODCA: synthesis design expert system
13 Data Mining Dienst Chemie (Data Mining Service Chemistry)
Property Search
Substructure Search
Diversity Search
Similarity Search
Pattern Recognition
Pattern Analysis
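Of the search types listed above, similarity search is commonly implemented by comparing structural fingerprints with the Tanimoto coefficient. The sketch below illustrates that general idea only; the slides do not say which similarity measure the service uses. The compound names and feature labels are hypothetical, and fingerprints are modelled as plain Python sets.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two feature sets: |A and B| / |A or B|."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)

# Hypothetical library; each compound is described by a set of feature labels.
library = {
    "compound_A": {"aryl_Cl", "P=S", "O-ethyl", "phenyl"},
    "compound_B": {"aryl_Cl", "phenyl", "OH"},
    "compound_C": {"C=O", "steroid_core"},
}
query = {"aryl_Cl", "P=S", "O-ethyl", "phenyl"}

# Rank library compounds from most to least similar to the query.
ranking = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
for name in ranking:
    print(name, round(tanimoto(query, library[name]), 2))
```

A diversity search can reuse the same coefficient by selecting compounds whose similarity to everything already chosen stays below a threshold.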
14 Information Sources
15 The Concept of Data Mining Service - Chemistry
16 Descriptor Software
17 Searching a Substructure
substructure search
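A substructure search asks whether a query fragment occurs as a subgraph of a target molecule. The following is a minimal backtracking sketch of that idea, not the algorithm used by the service shown on the slide. It assumes molecules are given as a list of element symbols plus a list of bonded index pairs, and it ignores bond orders, hydrogens, and aromaticity; the function name and the example fragment are illustrative.

```python
def has_substructure(target_atoms, target_bonds, query_atoms, query_bonds):
    """Return True if the query graph can be embedded in the target graph.

    Atoms are lists of element symbols; bonds are (i, j) index pairs.
    """
    def adjacency(n_atoms, bonds):
        adj = [set() for _ in range(n_atoms)]
        for i, j in bonds:
            adj[i].add(j)
            adj[j].add(i)
        return adj

    t_adj = adjacency(len(target_atoms), target_bonds)
    q_adj = adjacency(len(query_atoms), query_bonds)

    def extend(mapping):
        # mapping[p] is the target atom assigned to query atom p
        q = len(mapping)  # index of the next query atom to place
        if q == len(query_atoms):
            return True
        for t in range(len(target_atoms)):
            if t in mapping or target_atoms[t] != query_atoms[q]:
                continue
            # Each bond from q to an already placed query atom must exist in the target.
            if all(mapping[p] in t_adj[t] for p in q_adj[q] if p < q):
                if extend(mapping + [t]):
                    return True
        return False

    return extend([])

# Heavy atoms of a P(S)(O)(O)O core carrying one ethyl group (illustrative target).
atoms = ["P", "S", "O", "O", "O", "C", "C"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (2, 5), (5, 6)]

found_spo = has_substructure(atoms, bonds, ["S", "P", "O"], [(0, 1), (1, 2)])
found_pn = has_substructure(atoms, bonds, ["P", "N"], [(0, 1)])
print(found_spo, found_pn)
```

Exhaustive backtracking grows quickly with molecule size, so database systems typically screen candidates with precomputed fragment keys before running an atom-by-atom match.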
18 Acknowledgements
Team Coordination: Prof. Dr. Johann Gasteiger
Chemical Information: Dr. Thomas Engel
Databases, Visualization: Dr. Wolf-Dietrich Ihlenfeldt, Frank Oellien
Expert Systems: Achim Herwig
Genetic Algorithms: Dr. Sandra Handschuh
Neural Networks: Dr. Andreas Teckentrup, Dr. Lothar Terfloth
Spectroscopy: Dr. Paul Selzer, Thomas Kostka
Structures, Properties: Thomas Kleinöder, Christof Schwab
Structure Coding: Dr. Joao Aires de Sousa, Dr. Valentin Steinhauer
Synthesis Planning: Dr. Matthias Pförtner, Markus Sitzmann
19 Contact Information
Email: Johann.Gasteiger_at_ccc.chemie.uni-erlangen.de, Markus.Hemmer_at_chemie.uni-erlangen.de
WWW: http://www2.chemie.uni-erlangen.de