Data Mining Applied to Chemistry and Chemical Engineering

Provided by: LuWen

1
Data Mining Applied to Chemistry and Chemical
Engineering
  • Department of Chemistry, College of Sciences,
    Shanghai University, P. R. China

2
1 Introduction
1.1 Concept
Data Mining is an analytic process designed to
explore data in search of consistent patterns
and/or systematic relationships
3
between variables, and then to validate the
findings by applying the detected patterns to new
subsets of data.
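The two steps in this definition (detect a pattern, then validate it on new data) can be sketched as follows. This is only a minimal illustration: the data values, the straight-line model, and the function names are assumptions, not part of the presentation.

```python
# Illustrative data only: y is roughly 2*x + 1 with small deviations.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 9.0),
        (5.0, 10.8), (6.0, 13.1), (7.0, 15.0), (8.0, 17.1)]

def fit_line(points):
    """Ordinary least squares for y = a*x + b on (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx

def rmse(points, a, b):
    """Root-mean-square error of the line y = a*x + b on the given points."""
    return (sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)) ** 0.5

train = data[::2]    # subset used to detect the pattern
test = data[1::2]    # "new" subset used to validate it
a, b = fit_line(train)
train_rmse = rmse(train, a, b)
test_rmse = rmse(test, a, b)
print(f"slope={a:.3f} intercept={b:.3f}")
print(f"train RMSE={train_rmse:.3f}  test RMSE={test_rmse:.3f}")
```

A pattern is considered validated when the error on the held-out subset stays close to the error on the subset used for fitting.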
4
1.2 Main Focuses
(1) Materials design
Finding the best preparation conditions or the
structure-property relationships of materials, in
order to design experiments for preparing new
materials or to predict the physico-chemical
properties of unknown materials systems.
5
(2) Molecular design
Finding the structure-activity relationships of
molecules, in order to design new compounds with
expected biological activities or to predict the
physico-chemical properties of unknown molecules.
6
(3) Industrial optimization
Finding the optimal processing conditions, in
order to achieve good results in industrial
production.
7
2 Methods in MASTER
(1) Optimal map recognition (OMR)
The projection map with the best separability is
selected according to the rate of correct
classification.
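The selection idea can be sketched as follows. The presentation does not say how MASTER builds its candidate maps or which classifier scores them, so this sketch makes two assumptions: each candidate map is simply a pair of original features, and the rate of correct classification is measured by leave-one-out testing with a nearest-centroid rule. The sample values are illustrative.

```python
from itertools import combinations

# Hypothetical samples: (feature vector, class label).
samples = [
    ((1.0, 5.0, 0.2), 1), ((1.2, 1.0, 0.3), 1),
    ((0.9, 3.0, 0.1), 1), ((1.1, 4.0, 0.4), 1),
    ((2.0, 4.5, 1.1), 2), ((2.2, 1.5, 1.3), 2),
    ((1.9, 3.5, 0.9), 2), ((2.1, 2.0, 1.2), 2),
]

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def loo_correct_rate(projected):
    """Leave-one-out rate of correct classification, nearest-centroid rule."""
    hits = 0
    for i, (x, label) in enumerate(projected):
        rest = projected[:i] + projected[i + 1:]
        by_class = {}
        for point, cls in rest:
            by_class.setdefault(cls, []).append(point)
        cents = {cls: centroid(pts) for cls, pts in by_class.items()}
        pred = min(cents, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(x, cents[c])))
        hits += pred == label
    return hits / len(projected)

def best_map(samples):
    """Return (rate, (i, j)) for the feature pair with the highest rate."""
    n_features = len(samples[0][0])
    scored = []
    for i, j in combinations(range(n_features), 2):
        projected = [((x[i], x[j]), label) for x, label in samples]
        scored.append((loo_correct_rate(projected), (i, j)))
    return max(scored)

rate, pair = best_map(samples)
print(f"best map uses features {pair}, correct rate {rate:.2f}")
```

With this data, the second feature is mostly noise, so the map built from the first and third features separates the two classes best.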
8
Fig.1 Comparison of OMR with PCA
(a) Classification diagram by using Optimal Map
Recognition (OMR)
(b) Classification diagram by using Principal
Component Analysis (PCA)
9
(2) Hyper-polyhedron (HP)
An HP model expresses the optimal zone as a
series of inequalities that describe the
boundaries between two types of samples.
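A minimal sketch of a zone described by inequalities is given below. It uses the simplest possible case, an axis-aligned box (one lower and one upper bound per feature). This is an assumption made for illustration; the HP model in MASTER may use more general boundary planes. The data and function names are hypothetical.

```python
# Hypothetical training data: feature vectors of samples inside the
# optimal zone ("good") and outside it ("bad").
good = [(1.0, 10.0), (1.4, 12.0), (1.2, 11.0), (1.1, 12.5)]
bad = [(0.5, 11.0), (1.2, 15.0), (2.0, 9.0)]

def fit_box(points):
    """Return per-feature (low, high) bounds enclosing all points."""
    dims = range(len(points[0]))
    return [(min(p[k] for p in points), max(p[k] for p in points))
            for k in dims]

def inside(bounds, x):
    """True when x satisfies every inequality low_k <= x_k <= high_k."""
    return all(low <= v <= high for (low, high), v in zip(bounds, x))

bounds = fit_box(good)
for k, (low, high) in enumerate(bounds):
    print(f"{low} <= x{k + 1} <= {high}")

print("bad samples falling inside the zone:",
      sum(inside(bounds, b) for b in bad))
```

A new sample is predicted to belong to the optimal zone only if it satisfies all inequalities at once.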
10
Fig.2 Conceptual HP model
11
(3) Optimal projection regression (OPR)
  • The OPR method is a quantitative model that
    combines regression with the Optimal Map
    Recognition (OMR) method. It uses the
    classification information of the data set to
    select the most appropriate features for
    regression.
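The presentation does not give the OPR algorithm itself. The sketch below is only a loose illustration of the stated idea, class information guiding the choice of features for regression. It ranks features by a Fisher-type separability score and fits a one-variable least-squares line on the top-ranked feature. The scoring rule, the data, and all names are assumptions.

```python
# Hypothetical data: rows of (features, class label, target value).
rows = [
    ((0.9, 7.0), 1, 10.2), ((1.1, 2.0), 1, 11.9), ((1.3, 5.0), 1, 14.1),
    ((2.0, 6.0), 2, 21.0), ((2.2, 3.0), 2, 22.8), ((2.4, 4.0), 2, 25.1),
]

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def fisher_score(rows, k):
    """Squared difference of class means over summed class variances."""
    a = [x[k] for x, c, _ in rows if c == 1]
    b = [x[k] for x, c, _ in rows if c == 2]
    return (mean(a) - mean(b)) ** 2 / (variance(a) + variance(b))

def fit_line(xs, ys):
    """Ordinary least squares for y = slope*x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

n_features = len(rows[0][0])
best = max(range(n_features), key=lambda k: fisher_score(rows, k))
slope, intercept = fit_line([x[best] for x, _, _ in rows],
                            [y for _, _, y in rows])
print(f"selected feature index: {best}")
print(f"target = {slope:.2f} * x{best + 1} + {intercept:.2f}")
```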

12
Fig.3 Conceptual OPR model: projection from
hyperspace to 2-dimensional space (axes X1, X2)
13
(4) Inverse projection
Fig.4 Projection from 2-dimensional space to
hyperspace
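A minimal sketch of mapping a point chosen on the 2-dimensional map back into the original feature space is given below. It assumes a linear projection onto two orthonormal directions u and v around the data mean. That assumption, the vectors, and the target point are illustrative only. Because many points in hyperspace share the same 2-dimensional image, this sketch returns just one of them, the point lying in the plane spanned by u and v.

```python
# Assumed projection: center of the data and two orthonormal directions.
mean = (10.0, 20.0, 30.0)
u = (1.0, 0.0, 0.0)
v = (0.0, 0.6, 0.8)

def project(x):
    """Forward projection: coordinates of x on the (u, v) map."""
    d = [a - m for a, m in zip(x, mean)]
    return (sum(a * b for a, b in zip(d, u)),
            sum(a * b for a, b in zip(d, v)))

def inverse_project(t):
    """Inverse projection: mean + t1*u + t2*v."""
    return tuple(m + t[0] * a + t[1] * b for m, a, b in zip(mean, u, v))

target = (2.0, 5.0)               # point picked on the 2-D map
x_new = inverse_project(target)   # candidate point in feature space
print("feature-space point:", x_new)
print("projects back to:", project(x_new))
```

Projecting the result forward again recovers the chosen map coordinates, which is a quick consistency check for the two functions.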
14
(5) Hierachical projection model
Fig.5 Conceptual hierachical projection
15
(6) Support Vector Machine
Support Vector Classification
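The slide gives only the heading. For reference, the standard textbook soft-margin formulation of support vector classification is shown below. It may differ in detail from the variant implemented in MASTER. The symbols (w, b, slack variables, penalty C, feature map, kernel K) follow common usage and are not taken from the presentation.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad
\xi_i \ge 0,\qquad i = 1,\dots,n

f(x) = \operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_i\, y_i\, K(x_i, x) + b\Bigr),
\qquad K(x_i, x) = \phi(x_i)^{\top}\phi(x)
```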
16
Support Vector Regression
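For reference, the standard textbook formulation of support vector regression with the epsilon-insensitive loss is shown below. It is not taken from the presentation and may differ in detail from the variant implemented in MASTER.

```latex
\min_{w,\,b,\,\xi,\,\xi^{*}}\ \frac{1}{2}\lVert w\rVert^{2}
  + C\sum_{i=1}^{n}\bigl(\xi_i + \xi_i^{*}\bigr)

\text{subject to}\quad
y_i - w^{\top}\phi(x_i) - b \le \varepsilon + \xi_i,\qquad
w^{\top}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*},\qquad
\xi_i,\ \xi_i^{*} \ge 0

f(x) = \sum_{i=1}^{n}\bigl(\alpha_i - \alpha_i^{*}\bigr)\, K(x_i, x) + b
```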
17
3 Examples of Application
3.1 Applications in Materials Design
(1) Optimization of a high-temperature
superconductor
A nonlinear function based on 5 terms, with a
PRESS value of 0.128, was obtained. By using
inverse projection and the OPR method, the
critical temperature was raised from 116 K to
121 K.
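PRESS (prediction error sum of squares) is the sum of squared leave-one-out prediction errors: each sample is predicted by a model fitted without it. The sketch below computes it for a one-variable straight line. This is a simplification for illustration, not the 5-term nonlinear function from the slide, and the data are hypothetical.

```python
def fit_line(points):
    """Ordinary least squares for y = a*x + b on (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return a, my - a * mx

def press(points):
    """Sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i, (x, y) in enumerate(points):
        a, b = fit_line(points[:i] + points[i + 1:])  # refit without sample i
        total += (y - (a * x + b)) ** 2
    return total

def rss(points):
    """Residual sum of squares of the fit on all samples, for comparison."""
    a, b = fit_line(points)
    return sum((y - (a * x + b)) ** 2 for x, y in points)

points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
print(f"RSS   = {rss(points):.4f}")
print(f"PRESS = {press(points):.4f}")
```

For least-squares models, PRESS is never smaller than the ordinary residual sum of squares, so a low PRESS is a stricter sign of predictive ability than a good fit alone.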
18
Inverse projection result of high temperature
superconductor
19
(2) Composition design of rare-earth-containing
phosphors
By extrapolation, we obtained a series of new
compositions outside the scope of the German
patents. Our experiments confirmed that the
brightness of these newly designed phosphors was
higher than that declared in the German patents.
20
Importance of features
21
Classification diagram using Fisher method
22
(3) Optimization of VPTC ceramic semiconductors
With MASTER, new compositions and technological
conditions were proposed for VPTC materials that
gave much better results: the ratio of the
electrical resistance at 273 K to the minimum
resistance was raised from 20 to 27.3.
23
Partial Least Squares (PLS) result of VPTC ceramic
semiconductors
24
(4) Composition design of cathode materials of
Ni/H battery
By using Support Vector Machine (SVM),
mathematical models with strong predictive
ability were built, and new formulations were
predicted and confirmed by experiments.
25
Calculated vs. experimental values of C400/C0
26
(5) Formation conditions for the amorphous phase
of ternary fluorides
The inequalities obtained with the OMR method
were used to predict whether a new ternary
fluoride could form an amorphous phase. The
predicted results agreed with the experimental
ones.
27
OMR result of formation condition for amorphous
phase of ternary fluorides
28
(6) Formation condition of ternary intermetallic
compounds
Using 2400 known phase diagrams as the training
set, regularities in the formation conditions of
ternary intermetallic compounds were found. A
series of newly discovered ternary intermetallic
compounds was predicted in this way with good
results.
29
OMR result of formation condition of ternary
intermetallic compounds
30
3.2 Applications in Molecular Design
(1) Molecular screening of guanidine compounds
The Hyper-polyhedron (HP) and Support Vector
Classification (SVC) methods were used for the
computer-aided molecular screening of guanidine
compounds. The predictions of HP and SVC were
better than those of the PCA, KNN, FDV and other
methods.
31
(2) Structure-activity relationship of
antagonists
SVC was used to investigate the structure-activity
relationship (SAR) of 26 antagonist compounds.
Leave-one-out cross-validation showed that the
predictive ability of the SVC method was better
than that of the PCA, KNN, FDV and other methods.
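Leave-one-out cross-validation can be sketched as follows. Each compound is left out in turn, the classifier is built on the remaining ones, and the left-out compound is predicted. To stay self-contained, the sketch uses a simple k-nearest-neighbour (KNN) classifier, one of the comparison methods named on the slide, rather than SVC. The descriptor values and labels are hypothetical.

```python
from collections import Counter

# Hypothetical compounds: (descriptor vector, activity class).
samples = [
    ((1.0, 1.0), "active"), ((1.2, 0.9), "active"),
    ((0.8, 1.1), "active"), ((1.1, 1.2), "active"),
    ((3.0, 3.0), "inactive"), ((3.2, 2.9), "inactive"),
    ((2.8, 3.1), "inactive"), ((3.1, 3.2), "inactive"),
]

def knn_predict(train, x, k=3):
    """Majority label among the k training samples closest to x."""
    nearest = sorted(
        train,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(samples, k=3):
    """Fraction of samples predicted correctly when each is left out once."""
    hits = 0
    for i, (x, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += knn_predict(rest, x, k) == label
    return hits / len(samples)

print(f"leave-one-out accuracy: {loo_accuracy(samples):.2f}")
```

Running the same loop with different classifiers on the same compounds gives directly comparable prediction rates, which is the kind of comparison reported on this slide.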
32
(3) Molecular screening of triazole compounds
(1) An OMR model was used for the molecular
screening of new triazole compounds with
probably higher antifungal activities. (2)
The predictions of SVC were better than those of
the PCA, KNN, FDV and other methods.
33
(4) Structure-property relationship of azo
dyestuff
The Support Vector Regression (SVR) method was
employed to predict the maximum absorption
wavelength of 37 azo dyestuff molecules. The mean
relative error was 4.22% for the training set and
4.52% for the prediction set.
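The slide does not define the mean relative error. The sketch below assumes the common definition, the average of |predicted - measured| / |measured| expressed in percent. The wavelengths are hypothetical values in nm.

```python
def mean_relative_error(measured, predicted):
    """Mean of |predicted - measured| / |measured|, in percent."""
    ratios = [abs(p - m) / abs(m) for m, p in zip(measured, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical absorption maxima in nm.
measured = [400.0, 500.0]
predicted = [420.0, 490.0]
mre = mean_relative_error(measured, predicted)
print(f"mean relative error: {mre:.2f} %")
```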
34
3.3 Applications in Industrial Optimization
(1) Optimization of the nitriding technique for
crankshaft production
The problem was that the surface hardness of
crankshaft products at the Wuxi Diesel Engine
Factory was too low. An optimal zone was found to
exist in the multidimensional feature space.
After optimization, the rate of rejection
decreased from 1.7 to 0.3.
35
(2) Springback prediction in sheet metal forming
MASTER, combined with FEA software (ANSYS/LS-DYNA
5.71), was used to predict the springback in
V-type sheet steel forming. The relative error of
the predicted springback could be kept within
10% of the experimental values.
36
4 Conclusion
  • (1) The MASTER software package is a
    comprehensive system comprising orthogonal
    design, statistical analysis, data
    visualization, pattern recognition, regression
    analysis, artificial neural networks (ANN) and
    support vector machines (SVM), among others.

37
4 Conclusion
  • (2) MASTER can be used to
  • optimize formulas and technological conditions
  • predict biological activities and
    physico-chemical properties
  • improve product quality and analyze faults in
    production processes.

38
Thank you