Title: Model Selection via Bilevel Optimization
1. Model Selection via Bilevel Optimization
- Kristin P. Bennett, Jing Hu, Xiaoyun Ji,
- Gautam Kunapuli and Jong-Shi Pang
- Department of Mathematical Sciences
- Rensselaer Polytechnic Institute
- Troy, NY
2. Convex Machine Learning
- Convex optimization approaches to machine learning have been a major obsession of machine learning for the last ten years.
- But are the problems really convex?
3. Outline
- The myth of convex machine learning
- Bilevel Programming Model Selection
- Regression
- Classification
- Extensions to other machine learning tasks
- Discussion
4. Modeler's Choices
- Data
- Function
- Loss/Regularization
These choices define a CONVEX problem: an optimization algorithm takes them and returns the model weights w.
5. Many Hidden Choices
- Data
- Variable Selection
- Scaling
- Feature Construction
- Missing Data
- Outlier removal
- Function Family
- linear, kernel (introduces kernel parameters)
- Optimization model
- loss function
- regularization
- Parameters/Constraints
6. Modeler's Choices (continued)
- Data
- Function
- Loss/Regularization
- Cross-Validation Strategy
- Generalization Error
Once the cross-validation strategy and the generalization error are included, the overall problem is NONCONVEX.
7. How does the modeler make choices?
- Best training set error
- Experience/policy
- Estimate of generalization error
- Cross-validation
- Bounds
- Optimize generalization error estimate
- Fiddle around.
- Grid Search
- Gradient methods
- Bilevel Programming
8. Splitting Data for T-fold CV
9. CV via Grid Search
- For every (C, ε):
- For every validation set, solve the model on the corresponding training set and use that solution to estimate the loss on the validation set.
- Estimate the generalization error for (C, ε).
- Return the best values of (C, ε).
- Make the final model using the best (C, ε).
A sketch of this procedure follows below.
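This is a minimal sketch of the grid-search procedure above, assuming scikit-learn's SVR as the inner training solver and mean absolute deviation as the validation loss; the grid values, fold count, and the helper name grid_search_cv are illustrative, not from the slides.

    # Illustrative grid-search cross-validation over (C, epsilon) for SVR.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import KFold

    def grid_search_cv(X, y, C_grid, eps_grid, T=3):
        folds = list(KFold(n_splits=T, shuffle=True, random_state=0).split(X))
        best_C, best_eps, best_err = None, None, np.inf
        for C in C_grid:
            for eps in eps_grid:
                # Estimate the generalization error for this (C, eps) by T-fold CV.
                errs = []
                for train_idx, val_idx in folds:
                    model = SVR(kernel="linear", C=C, epsilon=eps)
                    model.fit(X[train_idx], y[train_idx])
                    # Mean absolute deviation on the held-out fold.
                    errs.append(np.mean(np.abs(model.predict(X[val_idx]) - y[val_idx])))
                if np.mean(errs) < best_err:
                    best_C, best_eps, best_err = C, eps, np.mean(errs)
        # Final model on all training data with the selected parameters.
        return SVR(kernel="linear", C=best_C, epsilon=best_eps).fit(X, y)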
10. CV as a Continuous Optimization Problem
- Bilevel program for T folds (a sketch is given below):
- an outer-level validation problem
- T inner-level training problems
- Prior approach (Golub et al., 1979): Generalized Cross-Validation for one parameter in Ridge Regression.
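The bilevel program itself was lost from this slide; a hedged reconstruction for ε-insensitive regression, with assumed notation (Ω^t and Ω̄^t are the validation and training index sets of fold t), is

    \min_{C,\varepsilon,\{w^t,b^t\}} \; \frac{1}{T}\sum_{t=1}^{T}\frac{1}{|\Omega^t|} \sum_{i\in\Omega^t} \bigl| x_i^\top w^t + b^t - y_i \bigr|
    \text{s.t.}\quad (w^t,b^t) \in \arg\min_{w,b} \; \tfrac{1}{2}\|w\|_2^2 + C \sum_{j\in\bar\Omega^t} \max\bigl(|x_j^\top w + b - y_j| - \varepsilon,\, 0\bigr), \qquad t = 1,\dots,T.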
11. Benefit: More Design Variables
Add a feature box constraint in the inner-level problems.
12. ε-insensitive Loss Function
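The loss formula on this slide did not survive extraction; the standard ε-insensitive loss it refers to is

    L_\varepsilon\bigl(y, f(x)\bigr) = \max\bigl(|f(x) - y| - \varepsilon,\; 0\bigr),

i.e., errors smaller than ε are ignored and larger errors are penalized linearly.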
13. Inner-level Problem for the t-th Fold
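The formulation here was also lost; a hedged reconstruction of the t-th inner-level training problem, combining the ε-insensitive loss with the feature box constraint of slide 11 (notation assumed: Ω̄^t is the training index set of fold t and w̄ is the box vector), is

    \min_{w^t, b^t} \; \tfrac{1}{2}\|w^t\|_2^2 + C \sum_{j\in\bar\Omega^t} \max\bigl(|x_j^\top w^t + b^t - y_j| - \varepsilon,\, 0\bigr) \quad \text{s.t.} \quad -\bar w \le w^t \le \bar w.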
14. Optimality (KKT) Conditions for Fixed Hyperparameters
15. Key Transformation
- The KKT conditions for the inner-level training problems are necessary and sufficient.
- Replace the lower-level problems by their KKT conditions.
- The problem becomes a Mathematical Program with Equilibrium Constraints (MPEC).
16. Bilevel Problem as MPEC
Replace the T inner-level problems with their corresponding optimality conditions.
17. MPEC to NLP via Inexact Cross-Validation
- Relax the hard equilibrium (complementarity) constraints to soft, inexact constraints, as sketched below.
- tol is a user-defined tolerance.
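The constraints themselves were lost from the slide; the standard relaxation this describes, written for a generic complementarity pair (a, b), is

    0 \le a \;\perp\; b \ge 0 \qquad\Longrightarrow\qquad a \ge 0, \quad b \ge 0, \quad a^\top b \le \mathrm{tol},

so that complementarity need only hold up to the tolerance tol rather than exactly.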
18. Solvers
- Strategy: proof of concept using general-purpose nonlinear solvers from NEOS on the NLP
- FILTER and SNOPT (Sequential Quadratic Programming methods)
- FILTER results are almost always better.
- Many possible alternatives
- Integer Programming
- Branch and Bound
- Lagrangian Relaxations
19. Computational Experiments: Data
- Synthetic
- (5, 10, 15)-dimensional data with Gaussian and Laplacian noise and (3, 7, 10) relevant features.
- NLP with 3-fold CV
- Results: 30 to 90 training points, 1000 test points, 10 trials
- QSAR/Drug Design
- 4 datasets, 600 dimensions reduced to the top 25 principal components. NLP with 5-fold CV
- Results: 40 to 100 training points, the rest used for testing, 20 trials
20. Cross-Validation Methods Compared
- Unconstrained Grid
- Try 3 values each for C and ε
- Constrained Grid
- Try 3 values each for C and ε, and 0 or 1 for each component of the feature box
- Bilevel/FILTER: nonlinear program solved using an off-the-shelf SQP algorithm via NEOS
21. 15-D Data: Objective Value
22. 15-D Data: Computational Time
23. 15-D Data: Test MAD
24. QSAR Data: Objective Value
25. QSAR Data: Computation Time
26. QSAR Data: Test MAD
27. Classification Cross-Validation
- Given sample data from two classes (labeled +1 and -1).
- Find a classification function that minimizes an out-of-sample estimate of the classification error.
28. Lower Level: SVM
- Define parallel planes
- Minimize points on wrong side
- Maximize margin of separation
29. Lower-Level Loss Function: Hinge Loss
Measures the distance by which points violate the appropriate hyperplane constraints.
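The formula on this slide was lost; the standard hinge loss it refers to, for a point (x, y) with y ∈ {+1, -1} and a linear classifier f(x) = x^⊤w + b (sign convention assumed), is

    \ell_{\mathrm{hinge}}\bigl(y, f(x)\bigr) = \max\bigl(1 - y\,f(x),\; 0\bigr),

which is zero exactly when the point lies on the correct side of its supporting hyperplane.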
30. Lower-Level Problem: SVC with Box Constraint
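As in the regression case, the formulation itself is missing; a hedged reconstruction of the box-constrained support vector classification problem for fold t (notation assumed as before) is

    \min_{w^t, b^t} \; \tfrac{1}{2}\|w^t\|_2^2 + C \sum_{j\in\bar\Omega^t} \max\bigl(1 - y_j (x_j^\top w^t + b^t),\; 0\bigr) \quad \text{s.t.} \quad -\bar w \le w^t \le \bar w.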
31. Inner-Level KKT Conditions
32. Outer-Level Loss Functions
- Misclassification Minimization Loss (MM)
- The loss function used in classical CV
- Loss is 1 if the validation point is misclassified and 0 otherwise (computed using the step function)
- Hinge Loss (HL)
- Both the inner and outer levels use the same loss function
- Loss is the distance by which the validation point violates its hyperplane constraint (computed using the max function)
33. Hinge Loss is a Convex Approximation of the Misclassification Minimization Loss
34. Hinge Loss Bilevel Program (BilevelHL)
- Replace the max in the outer-level objective with convex constraints (see the sketch below)
- Replace the inner-level problems with their KKT conditions
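The convex constraints referred to here are presumably the standard epigraph reformulation of the max; a hedged sketch, with assumed slack variables ζ_i for the outer-level validation points, is

    \zeta_i \ge 1 - y_i\bigl(x_i^\top w^t + b^t\bigr), \qquad \zeta_i \ge 0,

with \sum_i \zeta_i minimized in the outer objective, so each ζ_i equals the hinge loss at optimality.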
35. Hinge Loss MPEC
36. Misclassification Minimization Bilevel Program (BilevelMM)
Misclassifications are counted using the step function, defined componentwise for an n-vector as follows.
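The definition itself was lost; the componentwise step function assumed here (the convention used in Mangasarian's work) is

    (r_*)_i = \begin{cases} 1, & r_i > 0, \\ 0, & r_i \le 0, \end{cases} \qquad i = 1, \dots, n.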
37. The Step Function
- Mangasarian (1994) showed that the step function can be characterized as the solution of a linear program,
- and that any solution to that LP recovers the componentwise step function defined above.
38. Misclassifications in the Validation Set
- A validation point is misclassified when the sign of the label times the decision-function value is negative.
- This can be recast for all validation points (within the t-th fold) using the step function.
39. Misclassification Minimization Bilevel Program (revisited)
Outer level: average misclassification minimization
Inner-level problems to determine the misclassified validation points
Inner-level training problems
40. Misclassification Minimization MPEC
41. Inexact Cross-Validation NLP
- Both the BilevelHL and BilevelMM MPECs are transformed into NLPs by relaxing the equilibrium constraints (inexact CV)
- Solved using FILTER on NEOS
- These are compared with classical cross-validation: unconstrained and constrained grid search.
42. Experiments: Data Sets
- 3-fold cross-validation for model selection
- Results averaged over 20 train/test splits
43. Computational Time
44. Training CV Error
45. Testing Error
46. Number of Variables
47. Progress
- Cross-validation is a bilevel problem solvable by continuous optimization methods
- The off-the-shelf NLP algorithm FILTER solved both the classification and regression problems
- Bilevel optimization is extendable to many machine learning problems
48. Extending the Bilevel Approach to Other Machine Learning Problems
- Kernel Classification/Regression
- Variable Selection/Scaling
- Multi-task Learning
- Semi-supervised Learning
- Generative methods
49. Semi-supervised Learning
- Have labeled data and unlabeled data
- Treat the missing labels as design variables in the outer level
- The lower-level problems are still convex
50. Semi-supervised Regression
The outer level minimizes the error on the labeled data to find the optimal parameters and labels.
ε-insensitive loss on the labeled data in the inner level
ε-insensitive loss on the unlabeled data in the inner level
Inner-level regularization
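A hedged sketch of the structure these annotations describe, with assumed notation (L and U index the labeled and unlabeled points, and the missing labels ŷ_u are outer-level design variables), is

    \min_{C,\varepsilon,\{\hat y_u\},w,b} \; \sum_{i\in\mathcal{L}} \bigl| x_i^\top w + b - y_i \bigr|
    \text{s.t.}\quad (w,b) \in \arg\min_{w,b} \; \tfrac{1}{2}\|w\|_2^2 + C \sum_{i\in\mathcal{L}} \max\bigl(|x_i^\top w + b - y_i| - \varepsilon,\, 0\bigr) + C \sum_{u\in\mathcal{U}} \max\bigl(|x_u^\top w + b - \hat y_u| - \varepsilon,\, 0\bigr).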
51. Discussion
- New capacity offers new possibilities
- Outer-level objectives?
- Inner-level problems?
- classification, ranking, semi-supervised learning,
- missing values, kernel selection, variable selection
- Need special-purpose algorithms for greater efficiency, scalability, and robustness
This work was supported by Office of Naval Research Grant N00014-06-1-0014.
52. Experiments: Bilevel CV Procedure
- Run BilevelMM/BilevelHL to compute the optimal parameters
- Drop descriptors whose feature-box components are small
- Create a model on all training data using the optimal parameters
- Compute the test error on the hold-out set
53. Experiments: Grid Search CV Procedure
- Unconstrained Grid
- Try 6 values for the regularization parameter on a log10 scale
- Constrained Grid
- Try 6 values for the regularization parameter, and 0 or 1 for each component of the feature box (perform RFE if necessary)
- Create a model on all training data using the optimal grid point
- Compute the test error on the hold-out set
54. Extending the Bilevel Approach to Other Machine Learning Problems
- Kernel Classification/Regression
- Different Regularizations (L1, elastic nets)
- Enhanced Feature Selection
- Multi-task Learning
- Semi-supervised Learning
- Generative methods
55. Enhanced Feature Selection
- Assume that at most a given number of descriptors is allowed
- Introduce an outer-level constraint that counts the non-zero elements of the feature vector
- Rewrite the constraint and obtain additional conditions in the MPEC
56. Kernel Bilevel Discussion
- Pros
- Performs model selection in feature space
- Performs feature selection in input space
- Cons
- Highly nonlinear model
- Difficult to solve
57. Kernel Classification (MPEC form)
58. Is it okay to do 3 folds?
59. Applying the Kernel Trick
- Drop the box constraint
- Eliminate the primal weights from the optimality conditions
- Replace inner products with an appropriate kernel (see the sketch below)
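A hedged sketch of the substitution this describes, using standard SVM dual notation (the multipliers α_j are an assumption, not taken from the slide):

    w = \sum_j \alpha_j y_j x_j \quad\Longrightarrow\quad x_i^\top w = \sum_j \alpha_j y_j\, x_i^\top x_j \;\longrightarrow\; \sum_j \alpha_j y_j\, k(x_i, x_j).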
60. Feature Selection with Kernels
- Parameterize the kernel with a scaling vector such that when a component is zero, the corresponding descriptor vanishes from the kernel (sketches below)
- Linear kernel
- Polynomial kernel
- Gaussian kernel
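The parameterized kernels themselves were lost; hedged reconstructions with an assumed nonnegative scaling vector p (p_n = 0 removes descriptor n; the offset c and degree d are also assumptions) are

    k_{\mathrm{lin}}(x, z) = x^\top \mathrm{diag}(p)\, z, \qquad
    k_{\mathrm{poly}}(x, z) = \bigl(x^\top \mathrm{diag}(p)\, z + c\bigr)^d, \qquad
    k_{\mathrm{gauss}}(x, z) = \exp\Bigl(-\textstyle\sum_n p_n (x_n - z_n)^2\Bigr).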
61. Kernel Regression (Bilevel form)