Title: Efficient Computer Experiment-Based Optimization through Variable Selection
1Efficient Computer Experiment-Based Optimization
through Variable Selection
2- Design and Analysis of Computer Experiments
- Methods: CART, CART/FDR, Inverse FDR
- Applications
- Stochastic dynamic programming (SDP)
- Air quality problem (Yang 2004)
- Stochastic programming fleet assignment model (SP-FAM) (Pilla 2006)
- Future work
3- Design and Analysis of Computer Experiments (DACE) can be used to reduce the computation.
- DACE steps
- An optimization model is formulated as the computer experiment.
- Design of Experiments (DoE) is used to select sample points as input to the optimization model.
- A Multivariate Adaptive Regression Splines (MARS) model is fit to these data.
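The three DACE steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `toy_experiment` is a hypothetical stand-in for the expensive optimization model, the design is a simple Latin hypercube, and the surrogate is an ordinary least-squares fit on MARS-style hinge functions with a fixed knot at 0.5 (real MARS chooses knots and terms adaptively).

```python
import numpy as np

def latin_hypercube(n, p, rng):
    """n points in [0, 1)^p with exactly one point per stratum in each dimension."""
    # Row i starts in stratum i of every column; columns are then shuffled independently.
    u = (np.arange(n)[:, None] + rng.random((n, p))) / n
    for j in range(p):
        u[:, j] = rng.permutation(u[:, j])
    return u

def toy_experiment(x):
    """Hypothetical stand-in for the expensive optimization model (an assumption)."""
    return 3.0 * np.maximum(0.0, x[:, 0] - 0.5) + 2.0 * x[:, 1]

def hinge_basis(x, knot=0.5):
    """Intercept plus the hinge pair max(0, x - t), max(0, t - x) for each variable."""
    cols = [np.ones(len(x))]
    for j in range(x.shape[1]):
        cols.append(np.maximum(0.0, x[:, j] - knot))
        cols.append(np.maximum(0.0, knot - x[:, j]))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = latin_hypercube(n=50, p=3, rng=rng)        # DoE: select the sample points
y = toy_experiment(X)                          # run the computer experiment at each point
B = hinge_basis(X)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)   # fit the hinge-function surrogate
max_abs_error = float(np.max(np.abs(B @ coef - y)))
```

Because the toy response is itself piecewise linear with a kink at 0.5, the fixed-knot surrogate reproduces it almost exactly. With a real optimization model the fit would only be approximate, and the knots would have to be chosen from the data.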
4- Large-scale optimization problems
- Can include thousands of variables.
- Can be computationally expensive.
- The number of variables can exceed the number of runs, i.e., p > n.
- Variable selection methods
- Regression trees (CART)
- Multiple testing procedure based on the false discovery rate (FDR)
- CART/FDR and two-sample/FDR
- Inverse FDR (InvFDR)
5- Data mining (Berry and Linoff 2000)
- A process of exploratory data analysis.
- To discover meaningful patterns and rules.
- Variable selection problem
- Model a response variable of interest.
- Select important explanatory variables.
7(Possible outcomes from the multiple hypothesis tests of size m)
False Discovery Rate (FDR; Benjamini and Hochberg, 1995): the expected proportion of false positives among the rejected hypotheses.
V = number of false rejections; R = number of rejections.
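Written out with the slide's symbols V and R, the definition reads as follows. The symbol Q and the convention that the proportion is zero when nothing is rejected follow the usual Benjamini and Hochberg formulation and are not shown on the slide itself.

```latex
\mathrm{FDR} = \mathbb{E}[Q], \qquad
Q =
\begin{cases}
  V / R, & R > 0, \\
  0,     & R = 0,
\end{cases}
```

Here V counts the true null hypotheses that were rejected and R counts all rejected hypotheses among the m tests.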
8H1 H2 H3 H4 H5 H6 ... H996 H997 H998 H999 H1000
P1 P2 P3 P4 P5 P6 ... P996 P997 P998 P999 P1000
Procedure controlling FDR:
Among the significant hypotheses, the expected proportion of falsely rejected hypotheses is at most α.
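One procedure of this kind is the Benjamini-Hochberg step-up rule. The slide does not spell out which procedure was used, so treating it as Benjamini-Hochberg is an assumption based on the citation. The sketch sorts the p-values, finds the largest rank k with p(k) <= k·α/m, and rejects the k hypotheses with the smallest p-values. The p-values in the example are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    k = 0
    for rank, i in enumerate(order, start=1):
        # Step-up: keep the LARGEST rank whose p-value is under its threshold.
        if p_values[i] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])

# Illustrative p-values (not taken from the presentation).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, alpha=0.05)
```

Note that a hypothesis can be rejected even if its own p-value misses its threshold, as long as a larger p-value further down the sorted list meets its threshold.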
9FDR-based Variable Selection with Grouping
- Assume the response surface is monotonic.
- Divide the data into C = 2 groups using the median (or mean) of y, with the following rule:
- If y ≥ median (or mean), then group 1.
- If y < median (or mean), then group 2.
- Construct a two-sample t statistic for each variable.
- Generate a p-value from each statistic.
- Conduct an FDR procedure to find the significant variables.
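The grouping procedure above can be sketched as follows. Several choices here are assumptions rather than details from the slides: the split uses the median, the t-test is the Welch (unequal-variance) version, the FDR procedure is Benjamini-Hochberg, and the data are synthetic, with only variables 0 and 4 driving the response. The function name `two_sample_fdr_select` is hypothetical.

```python
import numpy as np
from scipy import stats

def two_sample_fdr_select(X, y, alpha=0.05):
    """Indices of the columns of X flagged by per-variable t-tests plus BH."""
    high = y >= np.median(y)                 # group 1; the remaining rows form group 2
    # One two-sample t-test per column (axis=0) comparing the two groups.
    _, pvals = stats.ttest_ind(X[high], X[~high], axis=0, equal_var=False)
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    # BH step-up: reject everything up to the largest rank under its threshold.
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    return np.sort(order[:k])

# Synthetic data: 200 runs, 30 candidate variables, monotone response in x0 and x4.
rng = np.random.default_rng(1)
X = rng.random((200, 30))
y = 5.0 * X[:, 0] + 3.0 * X[:, 4] + rng.normal(scale=0.1, size=200)
selected = two_sample_fdr_select(X, y)
```

The monotonicity assumption matters: a variable whose effect on y is symmetric around its midpoint would have similar means in both groups and could go undetected by a test on means.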
10Illustration
Conduct an FDR procedure with the FDR level α.
11FDR-based Variable Selection from Regression
Trees
12Illustration
Conduct an FDR procedure with the FDR level α.
13Inverse FDR
14Illustration
16- Two-stage SP problem
- First-stage expected profit objective function approximation
- Crew compatible allocation (CCA)
- Decision variables (CCA): p = 1,264.
- Data set of n = 141 points.
- Large p and small n
18- New variable selection methods based on principal component analysis (PCA).
- Information gain (IG) could be an option to perform variable selection.
- Theoretical investigation of Inverse FDR.
19- Thank you for your attention!