Title: Optimization and datamining for catalysts library design
2 Contents
- Introduction
- Catalysts
- Combinatorial optimization
- Concept of meta modeling
- Results
- Concept validation
- Tuning of algorithm
- Case studies
- Conclusion and future work
3 Catalysts
4 Catalysts
- Powder deposit
- Favor and select chemical reactions
- Search for efficient compounds
- Computational tool: combinatorial optimization
5 Combinatorial optimization
- Populations evolve through an iterative process
- Successive generations converge towards the optimum
- Individual = virtual catalyst
[Figure: response Y as a function of variables X1 and X2]
6 Generic optimization process
Flowchart: Random initialization → Population → Operators → Evaluation → Stop criterion met?
- False: loop back to the population
- True: best individual
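The generic loop can be sketched in a few lines of Python. This is a minimal sketch, not OptiCat's implementation: individuals are assumed to be vectors in [0, 1], the only operator is an assumed Gaussian mutation with greedy parent/child replacement, and the stop criterion is assumed to be a fixed generation budget. The function name `generic_optimization` and its parameters are hypothetical.

```python
import random


def generic_optimization(fitness, n_vars, pop_size=20, max_generations=30,
                         mutation_rate=0.1, seed=0):
    """Hypothetical sketch: random initialization, operators, evaluation,
    and a stop criterion (here, an assumed fixed number of generations)."""
    rng = random.Random(seed)
    # Random initialization: each individual is a vector of values in [0, 1].
    population = [[rng.random() for _ in range(n_vars)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_generations):  # stop criterion not met ("False"): keep looping
        # Operators (assumed): Gaussian mutation of some variables, clipped to [0, 1].
        offspring = [
            [min(1.0, max(0.0, x + rng.gauss(0.0, 0.1))) if rng.random() < mutation_rate else x
             for x in individual]
            for individual in population
        ]
        # Evaluation: keep the better of each parent/child pair.
        population = [max(p, c, key=fitness) for p, c in zip(population, offspring)]
        best = max(population + [best], key=fitness)
    return best  # stop criterion met ("True"): return the best individual
```

Because each parent is only replaced by a better child, the best fitness never decreases from one generation to the next.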
7 Algorithms
- Tabu search
- Simulated annealing
- Genetic algorithms
- Evolution strategies
- OptiCat
- Diversity of data treatments
8 Issues in catalysis
- Evaluation requires manual expertise
  - Synthesis
  - Test
- Cost of a catalyst
  - Time
  - Money
- Skepticism towards black-box optimization
9 Solution: meta modeling
- Optimization and datamining
  - Genetic operators
  - Datamining operator
    - Data storage
    - Supervised learning
- Cost reduction
- Opening the black box
10 Meta modeling
Flowchart: Random initialization → Population → Operators → Estimation → Catalysts choice (evolution control) → Evaluation → Stop criterion met?
- Estimation: use the statistical model to predict the efficiency of virtual catalysts
- Catalysts choice (evolution control): choose promising catalysts among the estimated ones
- Evaluation: evaluate the chosen catalysts and update the statistical model
- False: loop back to the population
- True: best individual
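The meta-modeling loop can be sketched as follows, under stated assumptions. Only the structure (estimate many virtual catalysts, choose the promising ones, evaluate those, update the model) comes from the slide. The statistical model here is a stand-in k-nearest-neighbour average, chosen only to keep the sketch short, and the mutation operator, archive, and all names (`knn_estimate`, `meta_modeling`, `n_virtual`) are hypothetical.

```python
import random


def knn_estimate(archive, x, k=3):
    """Stand-in statistical model (assumption): mean efficiency of the
    k nearest catalysts that were actually evaluated."""
    nearest = sorted(archive,
                     key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def meta_modeling(evaluate, n_vars, pop_size=10, n_virtual=100, n_generations=5, seed=0):
    rng = random.Random(seed)
    # Random initialization, followed by a first costly evaluation to seed the model.
    population = [[rng.random() for _ in range(n_vars)] for _ in range(pop_size)]
    archive = [(x, evaluate(x)) for x in population]  # data storage
    for _ in range(n_generations):
        # Operators: generate many cheap virtual catalysts around the best evaluated ones.
        parents = sorted(archive, key=lambda item: item[1], reverse=True)[:pop_size]
        virtual = []
        for _ in range(n_virtual):
            parent_x, _parent_y = rng.choice(parents)
            virtual.append([min(1.0, max(0.0, v + rng.gauss(0.0, 0.1))) for v in parent_x])
        # Estimation + catalysts choice: keep the virtual catalysts with the best estimates.
        chosen = sorted(virtual, key=lambda x: knn_estimate(archive, x), reverse=True)[:pop_size]
        # Evaluation: only the chosen catalysts are really evaluated; adding them
        # to the archive is what updates the model.
        archive.extend((x, evaluate(x)) for x in chosen)
    return max(archive, key=lambda item: item[1])
```

With the defaults, 100 virtual catalysts are estimated per generation, but only `pop_size * (1 + n_generations)` = 60 real evaluations are spent in total, which is the cost reduction the approach aims for.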
11 Objectives in catalysis
- Reduced number of evaluations
  - Population size ≤ 40
  - Number of generations ≤ 10
- Catalyst complexity > 10^20
- Other means of catalytic evaluation
- Prove the efficiency of the combinatorial approach
12 Meta modeling validation
- Use a virtual response surface
  - Validate the approach
- Use purpose-designed data
  - Tune algorithmic parameters
- Use real data
  - Appropriate datamining algorithm for the context
  - Obtain results
13 Artificial evaluation
14 Response surface and dimensions
- Three optima
  - PtPd on Al2O3 support
  - Cu on CeO2 support
  - Au on TiO2 support
15 Learning algorithm
- Piecewise continuous space
- Single performance response
- Simple and efficient
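A multilinear regression of a single performance response is simple enough to fit with the normal equations. The sketch below assumes an ordinary least-squares fit of y ≈ b0 + b1·x1 + … + bp·xp with an intercept; the slides do not say how OptiCat fits its model, and `fit_multilinear` and `predict` are hypothetical names.

```python
def fit_multilinear(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... + bp*xp via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    # Build A = R^T R and c = R^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Solve A b = c by Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= factor * A[col][k]
            c[r] -= factor * c[col]
    # Back-substitution on the upper-triangular system.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (c[i] - sum(A[i][k] * b[k] for k in range(i + 1, p))) / A[i][i]
    return b


def predict(b, x):
    """Estimated performance of a (virtual) catalyst described by x."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
```

On noise-free data generated from y = 1 + 2·x1 − 3·x2, the fit recovers the coefficients [1, 2, −3] up to rounding.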
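16 Algorithm
Flowchart components: random initialization, population, linear regression, elitism, evaluation; the loop repeats while the stop criterion is false and returns the best individual once it is true.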
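The elitism step can be illustrated with a small helper. This is a sketch under an assumed convention: the `n_elite` best parents are copied unchanged into the next generation and replace the worst offspring, so the population size stays constant. `apply_elitism` is a hypothetical name, not an OptiCat function.

```python
def apply_elitism(parents, offspring, fitness, n_elite=1):
    """Copy the n_elite best parents into the next generation,
    replacing the n_elite worst offspring (assumed convention)."""
    elite = sorted(parents, key=fitness, reverse=True)[:n_elite]
    survivors = sorted(offspring, key=fitness, reverse=True)[:len(offspring) - n_elite]
    return elite + survivors
```

With parents [5, 1, 3], offspring [2, 0, 4] and the identity as fitness, the next generation is [5, 4, 2]: the best parent survives even though no child reached it.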
17 Algorithm efficiency measure
- Stochastic algorithm: multiple runs are needed for statistical relevance
- Need for an efficiency measure
  - Performance
  - Reliability
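One way to turn repeated runs into the two criteria is sketched below. The slides only name performance and reliability; the definitions used here are assumptions: performance is the mean best value across runs relative to the known global optimum of the artificial surface, and reliability is the share of runs that come within a relative `tolerance` of that optimum. The function name and the 5% default are hypothetical.

```python
from statistics import mean


def efficiency(best_values, global_optimum, tolerance=0.05):
    """Summarize repeated runs of a stochastic optimizer (assumed definitions).

    performance: mean best value as a fraction of the known global optimum
    reliability: share of runs whose best value is within `tolerance`
                 (relative) of the global optimum
    """
    performance = mean(best_values) / global_optimum
    hits = sum(1 for v in best_values if v >= (1 - tolerance) * global_optimum)
    reliability = hits / len(best_values)
    return performance, reliability
```

For best values [1.0, 0.96, 0.8, 0.5] on a surface whose optimum is 1.0, this gives a performance of 0.815 and a reliability of 0.5. The definitions assume maximization with a positive optimum.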
18 Results
- Compared with classic methods
- Good compromise between reliability and performance
19 Summary
- Artificial response surface
- Multilinear regression learning
- Compared with classic algorithms
- Other surfaces, experimental noise
- Meta modeling is reliable and performs well
- Still valid with purpose-designed data?
- How to tune parameters?
20 Meta modeling validation
- Use a virtual response surface
  - Validate the approach
- Use purpose-designed data
  - Tune algorithmic parameters
- Use real data
  - Appropriate datamining for the context
  - Obtain results
21 Purpose-designed data
- 168 catalysts prepared for CO oxidation
- Four variables
  - Noble metal: Au, Cu, Pt
  - Transition metal: Mo, Nb, V
  - Support: CeO2, TiO2, ZrO2
  - Reaction temperature: 200, 250, 300 °C
- Performance function
  - CO conversion
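22 Algorithm tuning
Flowchart components: random initialization, population, genetic operators (selection, crossover, mutation), elitism, linear regression, evaluation; the loop repeats while the stop criterion is false and returns the best catalyst once it is true.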
23 Reducing the number of algorithms
- 4 population sizes × 4 selection types × 4 crossover types × 2 elitism modalities = 128 algorithms
- Use Design of Experiments
  - Reduce to 16 algorithms
  - Quantify the impact of each parameter
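A 16-run design for three 4-level factors and one 2-level factor can be built from the standard L16(4^5) orthogonal array over GF(4), with one 4-level column collapsed to two levels. The slides do not say which design was used, so this is one standard option, not the authors' design. The factor levels are taken from the results slide; the function names are hypothetical, and reading "uniform 20/50" as crossover variants is only a labeling choice.

```python
# Factor levels as listed on the results slide.
POP_SIZES = [8, 16, 24, 48]
SELECTIONS = ["wheel", "threshold", "tournament", "rank"]
CROSSOVERS = ["one point", "three points", "uniform 20", "uniform 50"]
ELITISM = [True, False]

# Multiplication by the generator alpha in GF(4), elements coded 0..3 (alpha^2 = alpha + 1).
MUL_ALPHA = [0, 2, 3, 1]


def orthogonal_16():
    """16 runs: columns a, b, a+b, a+alpha*b of L16(4^5); the last is collapsed to 2 levels."""
    runs = []
    for a in range(4):
        for b in range(4):
            c = a ^ b              # a + b in GF(4) (addition is XOR)
            d = a ^ MUL_ALPHA[b]   # a + alpha*b in GF(4)
            runs.append((a, b, c, d >> 1))
    return runs


def design():
    """The 16 algorithm configurations to run instead of all 128."""
    return [(POP_SIZES[a], SELECTIONS[b], CROSSOVERS[c], ELITISM[e])
            for a, b, c, e in orthogonal_16()]


def main_effects(runs, responses):
    """Mean response per level of each factor: a simple measure of parameter impact."""
    effects = []
    for f in range(len(runs[0])):
        by_level = {}
        for run, y in zip(runs, responses):
            by_level.setdefault(run[f], []).append(y)
        effects.append({lvl: sum(v) / len(v) for lvl, v in by_level.items()})
    return effects
```

Every pair of columns is balanced (each pair of 4-level settings appears once, each 4-level/2-level pair twice), so after running the 16 configurations, `main_effects` can attribute differences in efficiency to each parameter separately.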
24 Results
Factors and modalities:
- Elitism: yes, no
- Population size: 8, 16, 24, 48
- Crossover type: one point, three points, uniform 20, uniform 50
- Selection type: wheel, threshold, tournament, rank
Factor importance (bar length gives the modality weight; a longer bar means a more important modality):
- Elitism: low importance; elitism is not important
- Crossover type: average importance; points-based crossover is preferred
- Selection type: important; tournament selection is efficient
- Population size: very important; a high population size gives high efficiency
25 Summary
- Use a data-based surface: CO oxidation
- Use DoE for efficient parameter tuning
- Meta modeling tuning achieved
  - High population size
  - Tournament selection
- Which learning algorithm?
26 Meta modeling validation
- Use a virtual response surface
  - Validate the approach
- Use purpose-designed data
  - Tune algorithmic parameters
- Use real data
  - Appropriate learning algorithm with regard to the context
  - Obtain results
27 Search spaces
[Diagram elements: elemental composition, QSAR descriptor calculation, response space, literature data, optimization search space, datamining search space]
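To make the idea of descriptor calculation concrete, the sketch below maps an elemental composition to a small descriptor vector. This is an illustrative reading of the diagram, not the QSAR descriptors used in the case studies. The atomic numbers are real values; the choice of descriptors (composition-weighted mean atomic number and atomic-number spread) and the names `ATOMIC_NUMBER` and `composition_descriptors` are hypothetical.

```python
# Real atomic numbers for elements that appear in the slides;
# the descriptors computed from them are illustrative only.
ATOMIC_NUMBER = {"V": 23, "Cu": 29, "Ga": 31, "Nb": 41, "Mo": 42,
                 "Pd": 46, "Pt": 78, "Au": 79}


def composition_descriptors(composition):
    """Map {element: amount} to hypothetical descriptors:
    composition-weighted mean atomic number and atomic-number spread."""
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must contain a positive amount")
    weights = {el: amount / total for el, amount in composition.items()}
    mean_z = sum(w * ATOMIC_NUMBER[el] for el, w in weights.items())
    zs = [ATOMIC_NUMBER[el] for el, w in weights.items() if w > 0]
    return {"mean_Z": mean_z, "spread_Z": max(zs) - min(zs)}
```

An equimolar Pt/Pd composition gives a mean atomic number of 62.0 and a spread of 32. A learning algorithm would then work on such descriptor vectors rather than on raw compositions.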
28 Case studies in catalysis
- Heck reaction
  - 222 catalysts
  - 30 continuous descriptors
  - 3 performance values
  - Partial least squares (PLS) regression
- Propene oxidation
  - 467 catalysts
  - 72 descriptors, discrete and continuous
  - 1 desirability value
  - Artificial neural network (ANN)
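29 Algorithm
Flowchart components: random initialization, population, genetic operators (selection, crossover, mutation), QSAR descriptor calculation, estimation, elitist choice; the loop repeats while the stop criterion is false and returns the best catalyst once it is true.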
30 Results
- Guidelines: promising catalyst families
- Propene oxidation
  - 14 Gallium
  - 16 Niobium
- Support: oxide
- Solvent: alcohol
- Meta modeling is appropriate for different contexts
31 Summary
- Two case studies in catalysis
  - Propene oxidation: ANN
  - Heck reaction: PLS
- Drawing guidelines for catalyst design
- The choice of learning algorithm depends on the context
  - Type of variables
  - Quantity of observations
  - Classic datamining requirements
32 Conclusions
- Optimization and datamining for catalyst library design
- Meta modeling is reliable and performs well
  - Tuning of general parameters
- Two case studies
  - Fast catalyst optimization (400)
  - Promising guidelines found
- OptiCat as a multi-purpose software tool
33 OptiCat distribution
- Free (CeCILL license)
- http://eric.univ-lyon2.fr/fclerc
- Online model builder
- Web service (WSDL)
34 Future work
- Other learning algorithms
  - Association rules
- Diversity control
  - Multiple optima
- Real experimentation
  - University of Amsterdam
  - IRC (TOPCOMBI program)
  - Max Planck Institute
35 Acknowledgements
- David Farrusseng
- Ricco Rakotomalala
- Ferdi Schuth
- Gadi Rothenberg
- Claude Mirodatos
- Nicolas Nicoloyannis
- Gilles Venturini
- Djamel Zediar
- Silvia Pereira
- Enrico Burello
- Jos Hageman
- Ignacio Lopez Martin
- Laurent Baumes
- Mourad Lengliz
- Joanna Procelewska
- Javier Llamas Galilea
- Juriaan Beckers
- Jan Blank
36 Creating a population: genetic operators
- Crossover mixes the characteristics of individuals
- Mutation introduces new information
- Selection retains the fittest individuals
- Ensure controlled diversity in the populations
37 Selection
- Two or more individuals enter a tournament
- Their performances are compared
- Only the best one is admitted to the next step
- There are as many consecutive tournaments as there are individuals in the population
- Some individuals are eliminated while others are repeated
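Tournament selection as described can be sketched directly: one tournament per individual, contestants drawn without replacement, and the winner admitted to the next step. This is a minimal sketch; the default tournament size of 2 and the function name are assumptions.

```python
import random


def tournament_selection(population, fitness, tournament_size=2, rng=random):
    """Run as many tournaments as there are individuals; each winner is
    admitted to the next step, so some individuals repeat and others vanish."""
    selected = []
    for _ in range(len(population)):
        contestants = rng.sample(population, tournament_size)  # without replacement
        selected.append(max(contestants, key=fitness))
    return selected
```

With tournaments of two distinct individuals, the worst individual can never be selected, while strong individuals tend to be selected several times; larger tournaments increase this selection pressure.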
38 Crossover
- A split point is chosen in the individuals
- The portions are exchanged
- Multi-point crossover
- Uniform crossover
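The crossover variants can be sketched as two functions. With one split point, the multi-point version reduces to one-point crossover. For uniform crossover, each variable is exchanged independently with probability `swap_prob`; treating the "uniform 20" and "uniform 50" modalities of the tuning study as swap probabilities of 0.2 and 0.5 is an assumption. Both function names are hypothetical.

```python
import random


def multi_point_crossover(parent_a, parent_b, n_points=1, rng=random):
    """Choose n_points distinct split points and exchange every other portion.
    n_points=1 gives one-point crossover."""
    cuts = sorted(rng.sample(range(1, len(parent_a)), n_points))
    child_a, child_b = [], []
    swap, start = False, 0
    for cut in cuts + [len(parent_a)]:
        seg_a, seg_b = parent_a[start:cut], parent_b[start:cut]
        child_a += seg_b if swap else seg_a
        child_b += seg_a if swap else seg_b
        swap, start = not swap, cut
    return child_a, child_b


def uniform_crossover(parent_a, parent_b, swap_prob=0.5, rng=random):
    """Exchange each variable independently with probability swap_prob."""
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(len(parent_a)):
        if rng.random() < swap_prob:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b
```

Crossing [0, 0, 0, 0, 0, 0] with [1, 1, 1, 1, 1, 1] at three points yields children that switch parent exactly three times, and the two children are always complementary.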
39 Mutation
- Introduces random changes in the values
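For catalysts described by discrete variables, such as the noble metal, transition metal, support and temperature of the CO oxidation data, mutation can be sketched as replacing each variable, with a small probability, by a random allowed value. The per-variable `rate` and the function name are assumptions.

```python
import random


def mutate(individual, levels, rate=0.1, rng=random):
    """With probability `rate`, replace each variable by a random allowed value
    (the new value may coincide with the old one)."""
    return [rng.choice(levels[i]) if rng.random() < rate else value
            for i, value in enumerate(individual)]
```

For example, with levels [["Au", "Cu", "Pt"], ["Mo", "Nb", "V"], ["CeO2", "TiO2", "ZrO2"], [200, 250, 300]], a mutated catalyst always stays within the allowed values, which keeps it synthesizable.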
40 OptiCat
41 Meta modeling
42 Heck reaction
43 Meta modeling