Title: HANSO and HIFOO: Two New MATLAB Codes
1. HANSO and HIFOO: Two New MATLAB Codes
- Michael L. Overton
- Courant Institute of Mathematical Sciences
- New York University
- Singapore, Jan 2006
2. Two New Codes
- HANSO: Hybrid Algorithm for Nonsmooth Optimization
  - Aimed at finding local minimizers of general nonsmooth, nonconvex optimization problems
  - Joint work with Jim Burke and Adrian Lewis
- HIFOO: H-Infinity Fixed-Order Optimization
  - Aimed at solving specific nonsmooth, nonconvex optimization problems arising in control
  - Built on HANSO
  - Joint work with Didier Henrion
3. Many optimization objectives are
- Nonconvex
- Nonsmooth
- Continuous
- Differentiable almost everywhere
- With gradients often available at little additional cost beyond that of computing the function value
- Subdifferentially regular
- Sometimes, non-Lipschitz
4. Numerical Optimization of Nonsmooth, Nonconvex Functions
- Steepest descent jams
- Bundle methods do better, but they are mainly intended for nonsmooth convex functions
- We developed a simple method for nonsmooth, nonconvex minimization based on gradient sampling
- It is intended for continuous functions that are differentiable almost everywhere, and for which the gradient can be easily computed when it is defined
- The user need only write a routine returning the function value and gradient, and need not worry about nondifferentiable cases (e.g., ties for a max) when coding the gradient
- Very recently, we found that BFGS is often very effective when implemented correctly, and much less expensive
5. BFGS
- The standard quasi-Newton method for optimizing smooth functions, using gradient differences to update an inverse Hessian approximation H that defines a local quadratic model of the objective function (update sketched below)
- Conventional wisdom: it jams on nonsmooth functions
- Amazingly, it works very well as long as it is implemented right (a weak Wolfe line search is essential)
- It builds an excellent, extremely ill-conditioned quadratic model
- Often it runs until cond(H) is about 10^16 before breaking down, taking step lengths of around 1
- Often it converges linearly (not superlinearly)
- By contrast, steepest descent and Newton's method usually jam
- Not very reliable, and no convergence theory
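To make the update of H concrete, here is a minimal MATLAB sketch of the standard BFGS inverse-Hessian update; the name bfgs_update and the variables H, s, y are generic illustrations, not HANSO identifiers.

```matlab
% One BFGS inverse-Hessian update (generic sketch, not HANSO code).
% H : current inverse Hessian approximation (n x n, symmetric positive definite)
% s : step taken,          s = x_new - x_old
% y : gradient difference, y = g_new - g_old
function H = bfgs_update(H, s, y)
    rho = 1 / (y' * s);        % y'*s > 0 is guaranteed by a weak Wolfe line search
    I = eye(length(s));
    H = (I - rho * (s * y')) * H * (I - rho * (y * s')) + rho * (s * s');
end
```

The weak Wolfe condition (slide 19) guarantees y'*s > 0, which is exactly what keeps H positive definite.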
6-8. (No Transcript)
9. Minimizing a Product of Eigenvalues
- An application due to Anstreicher and Lee
- Minimize the log of the product of the K largest eigenvalues of A ∘ X (the Hadamard, i.e., componentwise, product), subject to X positive semidefinite and diag(X) = 1 (A, X symmetric; we usually set K = N/2)
- Would be equivalent to an SDP if the product were replaced by a sum
- The number of variables is n = N(N-1)/2, where N is the dimension of X
- Following Burer, substitute YY' for X, where Y is N by r, so n is reduced to rN, and normalize Y so its rows have norm 1; thus both constraints are eliminated (objective sketched below)
- Typically the objective is not smooth at the minimizer, because the K-th largest eigenvalue is multiple
- Results for N = 10 and rank r = 2 through 10
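A minimal MATLAB sketch of this objective in the Burer-style variables; the function name eigprod_objective and its signature are illustrative only, not the code used for the reported experiments, and A ∘ X is taken to be the componentwise product A .* X.

```matlab
% Log of the product of the K largest eigenvalues of A .* X, with X = Y*Y'
% parameterized by Y (N x r) whose rows are normalized, so diag(X) = 1.
function f = eigprod_objective(y, A, N, r, K)
    Y = reshape(y, N, r);
    Y = diag(1 ./ sqrt(sum(Y.^2, 2))) * Y;   % normalize the rows of Y
    X = Y * Y';                              % positive semidefinite by construction
    lam = sort(eig(A .* X), 'descend');      % eigenvalues of the Hadamard product
    f = sum(log(lam(1:K)));                  % log of product = sum of logs
end
```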
10. (No Transcript)
11. N = 10, r = 6: eigenvalues of the optimal A ∘ X
- 3.163533558166873e-001
- 5.406646569314501e-001
- 5.406646569314647e-001
- 5.406646569314678e-001
- 5.406646569314684e-001
- 5.406646569314689e-001
- 5.406646569314706e-001
- 7.149382062194134e-001
- 7.746621702661097e-001
- 1.350303124150612e+000
12. The U and V Spaces
- The condition number of H, the BFGS approximation to the inverse Hessian, typically reaches 10^16 !!
- Let's look at the eigenvalues of H
- H = QDQ' (' denotes transpose)
- The search direction is d = -Hg = -QDQ'g (where g is the gradient)
- The next plot shows (computation sketched below):
  - diag(D), sorted into ascending order
  - the components of Q'g, using the same ordering
  - the components of DQ'g, using the same ordering (the search direction expanded in the eigenvector basis)
- Tiny eigenvalues of H correspond to the nonsmooth V-space
- The other eigenvalues of H correspond to the smooth U-space
- From matrix theory, dim(V-space) = m(m+1)/2 - 1, where m is the multiplicity of the K-th eigenvalue of A ∘ X
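A minimal sketch, assuming H and the gradient g at the final iterate are available as MATLAB variables, of how the plotted quantities can be computed:

```matlab
% Quantities shown in the plot: eigenvalues of H and the gradient / search
% direction expressed in the eigenvector basis of H.
[Q, D] = eig((H + H') / 2);          % symmetrize to guard against round-off
[d_eig, idx] = sort(diag(D));        % eigenvalues of H, ascending
Q = Q(:, idx);
Qg  = Q' * g;                        % components of Q'g in the same ordering
DQg = d_eig .* Qg;                   % components of DQ'g (search direction, up to sign)
semilogy(abs([d_eig, Qg, DQg]), 'o-');
legend('eig(H)', '|Q''g|', '|DQ''g|');
```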
13-14. (No Transcript)
15. More on the U and V Spaces
- H seems to be an excellent approximation to the limiting inverse Hessian and evidently gives us bases for the optimal U and V spaces
- Let's look at plots of f along random directions in the U and V spaces, as defined by the eigenvectors of H, passing through the minimizer (for the N = 10, r = 6 example)
16-18. (No Transcript)
19. Line Search for BFGS
- Sufficient decrease: for 0 < c1 < 1,
  f(x + t d) ≤ f(x) + c1 t (grad f)(x)'d
- Weak Wolfe condition on the derivative: for c1 < c2 < 1,
  (grad f)(x + t d)'d ≥ c2 (grad f)(x)'d
- Not the strong Wolfe condition on the derivative:
  |(grad f)(x + t d)'d| ≤ -c2 (grad f)(x)'d
- Essential when f is nonsmooth
- No reason to use strong Wolfe even if f is smooth
- Much simpler to implement (sketch below)
- Why ever use strong Wolfe?
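A minimal MATLAB sketch of a bracketing/bisection line search enforcing the two conditions above; the handle fg and the expansion/bisection strategy are assumptions for illustration, not the line search shipped with HANSO.

```matlab
% Weak Wolfe line search by bracketing and bisection (generic sketch).
% fg : handle returning [f, g] at a point;  x, d : current point and direction.
function t = weak_wolfe(fg, x, d, c1, c2)
    [f0, g0] = fg(x);
    slope0 = g0' * d;                      % must be negative (descent direction)
    a = 0;  b = inf;  t = 1;
    while true
        [ft, gt] = fg(x + t * d);
        if ft > f0 + c1 * t * slope0       % sufficient decrease fails: t too large
            b = t;
        elseif gt' * d < c2 * slope0       % weak Wolfe fails: t too small
            a = t;
        else
            return;                        % both conditions hold
        end
        if isinf(b)
            t = 2 * a;                     % expand the bracket
        else
            t = (a + b) / 2;               % bisect the bracket
        end
    end
end
```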
20. But how do we check whether the final x is locally optimal?
- Extreme ill-conditioning of H is a strong clue, but it is expensive to compute and proves nothing
- If x is locally optimal, running a local bundle method from x establishes nonsmooth stationarity; this involves repeated null-step line searches and building a bundle of gradients obtained via the line searches
- When the line search returns a null step, it must also return a gradient at a point lying across the discontinuity
- Then the new d = arg min { ||d|| : d ∈ conv G }, where G is the set of gradients in the bundle (sketch below)
- Terminate if ||d|| is small
- If ||d|| is 0 and the line search steps were 0, x is Clarke stationary; more realistically, it is close to Clarke stationary
- I'll call this "Clarke Quay Stationary", since it is a Quay Idea
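A minimal MATLAB sketch of the convex-hull subproblem above, i.e., finding the smallest vector in conv G; the use of quadprog (Optimization Toolbox) is just one convenient way to solve this small QP, not necessarily what HANSO does internally.

```matlab
% Smallest vector in the convex hull of the gradients stored as columns of G:
% solve  min_lambda 0.5*||G*lambda||^2  s.t.  sum(lambda) = 1, lambda >= 0,
% and return d = G*lambda.
function d = smallest_in_convhull(G)
    m = size(G, 2);
    H = G' * G;                                 % m x m quadratic term
    lambda = quadprog(H, zeros(m, 1), [], [], ...
                      ones(1, m), 1, zeros(m, 1), []);
    d = G * lambda;                             % small norm(d) => near-stationarity
end
```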
21. First-order local optimality
- Is this implied by Clarke stationarity of f at x?
- No: for example, f(x) = -|x| is Clarke stationary at 0 (its Clarke subdifferential there is [-1, 1], which contains 0), even though 0 is a maximizer
- Yes, in the sense that the directional derivative f'(x; d) ≥ 0 for all directions d, when f is regular at x (subdifferentially regular, Clarke regular)
- Most of the functions that we have studied are regular everywhere, although this is sometimes hard to prove
- Regularity generalizes smoothness and convexity
22. Harder Problems
- The eigenvalue product problems are interesting, but the Lipschitz constant L for f is not large
- Harder problems:
  - Chebyshev exponential approximation
  - optimization of the distance to instability
  - pseudospectral abscissa optimization (arbitrarily large L)
  - spectral abscissa optimization (not even Lipschitz)
- In all these cases, BFGS turns out to be substantially less reliable than gradient sampling
23. Gradient Sampling Algorithm
Initialize ε and x. Repeat:
- Get G, a set of gradients of the function f evaluated at x and at points near x (sampling controlled by ε)
- Let d = arg min { ||d|| : d ∈ conv G }
- Line search: replace x by x + t d, with f(x + t d) < f(x) (if d is not 0)
until d = 0 (or ||d|| ≤ tol). Then reduce ε and repeat. (A sketch of one pass appears below.)
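A minimal MATLAB sketch of one pass of this loop for a fixed sampling radius; the handle fg, the simple backtracking line search, and the use of quadprog for the convex-hull subproblem are illustrative assumptions rather than the published implementation.

```matlab
% One pass of gradient sampling at fixed sampling radius samprad (generic sketch).
% fg : handle returning [f, g];  x : current point;  nsamp : number of sampled points.
function x = gradient_sampling_pass(fg, x, samprad, nsamp, tol)
    n = length(x);
    while true
        G = zeros(n, nsamp + 1);
        [fx, G(:, 1)] = fg(x);                         % gradient at x itself
        for j = 2:nsamp + 1
            u = randn(n, 1);  u = u / norm(u);         % random direction
            [~, G(:, j)] = fg(x + samprad * rand * u); % gradient at a nearby point
        end
        m = nsamp + 1;                                 % smallest vector in conv(G):
        lambda = quadprog(G' * G, zeros(m, 1), [], [], ...
                          ones(1, m), 1, zeros(m, 1), []);
        d = -G * lambda;
        if norm(d) <= tol, return; end                 % approximately stationary for this radius
        t = 1;                                         % simple backtracking line search
        while fg(x + t * d) >= fx
            t = t / 2;
            if t < 1e-12, return; end
        end
        x = x + t * d;
    end
end
```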
24. Convergence Theory for the Gradient Sampling Method (SIOPT, 2005)
- Suppose:
  - f is locally Lipschitz and coercive
  - f is continuously differentiable on an open dense subset of its domain
  - the number of gradients sampled near each iterate is greater than the problem dimension
  - the line search uses an appropriate sufficient decrease condition
- Then, with probability one and for fixed sampling parameter ε, the algorithm generates a sequence of points with a cluster point x that is ε-Clarke stationary
- If f has a unique Clarke stationary point x, then the set of all cluster points generated by the algorithm converges to x as ε is reduced to zero
- Kiwiel has already improved on this! (For a slightly different, as yet untested, version of the algorithm.)
25-27. (No Transcript)
28. HANSO
- The user provides a routine that computes f and its gradient at any given x (and need not worry about nonsmooth cases)
- BFGS is then run from many randomly generated or user-provided starting points; quit if the gradient at the best point found is small
- Otherwise, a local bundle method is run from the best point found, attempting to verify local stationarity
- If this is not successful, gradient sampling is initiated
- Regardless, return a final x along with a set of nearby points, the corresponding gradients, and the vector d that is the smallest vector in the convex hull of these gradients
- If the nearby points are near enough and ||d|| is small enough, f is Clarke Quay Stationary at x
- This is an approximate first-order local optimality condition if f is regular at x
29. Optimization in Control
- Often there are not many variables, as in low-order controller design
- Reasons:
  - engineers like simple controllers
  - nonsmooth, nonconvex optimization problems in even a few variables can be very difficult
- Sometimes more variables are added to the problem in an attempt to make it more tractable
- Example: adding a Lyapunov matrix variable P and imposing stability on A via A'P + PA < 0 with P > 0 (a stability check along these lines is sketched below)
- When A depends linearly on the original parameters, this results in a bilinear matrix inequality (BMI), which is typically difficult to solve
- Our approach: tackle the original problem
- Looking for locally optimal solutions
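As a small aside, here is a minimal MATLAB sketch of the Lyapunov stability test mentioned above for a fixed matrix A, using lyap from the Control System Toolbox; when A depends on the design parameters, P becomes an extra matrix variable and the condition turns into the BMI referred to on this slide.

```matlab
% Lyapunov stability check for a fixed A (illustration only).
n = size(A, 1);
P = lyap(A', eye(n));                   % solves A'*P + P*A = -I
is_stable = all(eig((P + P') / 2) > 0); % A is stable iff this P is positive definite
```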
30. H∞ Norm of a Transfer Function
- Transfer function G(s) = C(sI - A)^(-1)B + D (state-space representation)
- The H∞ norm is ∞ if A is not stable (stable means all eigenvalues have negative real part); otherwise it is sup ||G(s)||_2 over all imaginary s
- The standard way to compute it is norm(SS, inf) in the MATLAB Control Toolbox (example below)
- LINORM from SLICOT is much faster
31-33. H∞ Fixed-Order Controller Design
- Choose matrices a, b, c, d to minimize the H∞ norm of the transfer function
- The dimension of a is k by k (the order, between 0 and n = dim(A))
- The dimension of b is k by p (number of system inputs)
- The dimension of c is m (number of system outputs) by k
- The dimension of d is m by p
- The total number of variables is (k+m)(k+p)
- The case k = 0 is static output feedback
- When B1, C1 are empty, all I/O channels are in the performance measure
34. HIFOO: H∞ Fixed-Order Optimization
- Aims to find a, b, c, d for which the H∞ norm is locally optimal
- Begins by minimizing the spectral abscissa, max(real(eig(A-block))), until it finds a, b, c, d for which the A-block is stable (and therefore the H∞ norm is finite)
- Then locally minimizes the H∞ norm
- Calls HANSO to carry out both optimizations
- Alternative objectives:
  - optimize the spectral abscissa instead of quitting when stable
  - optimize the pseudospectral abscissa
  - optimize the distance to instability (complex stability radius)
- The output a, b, c, d can then be input to optimize for a larger order k, which cannot give a worse result
- Accepts various input formats
35. HIFOO Provides the Gradients
- For the H∞ norm, it combines:
  - left and right singular vector information at the point on the imaginary axis where the sup is achieved
  - the chain rule
- For the spectral abscissa, max(real(eig)), it combines:
  - left and right eigenvector information for the eigenvalue achieving the max real part (sketch below)
  - the chain rule
- There is no need to worry about ties
- The function is continuous, but the gradient is not
- However, the function is differentiable almost everywhere (and virtually everywhere that it is evaluated)
- Typically, it is not differentiable at an exact minimizer
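A minimal MATLAB sketch of the eigenvector-based derivative of the spectral abscissa with respect to a single parameter, assuming the eigenvalue achieving the max real part is simple and that dA, the derivative of A with respect to that parameter, has been formed; the formula is standard eigenvalue perturbation theory, and the variable names are illustrative rather than HIFOO's.

```matlab
% Derivative of the spectral abscissa alpha = max(real(eig(A))) with respect to
% one parameter, given dA = dA/dparameter (sketch; assumes a simple active eigenvalue).
[V, Lam, W] = eig(A);                      % W holds left eigenvectors: W'*A = Lam*W'
[~, i] = max(real(diag(Lam)));             % eigenvalue achieving the max real part
v = V(:, i);  w = W(:, i);
dalpha = real((w' * dA * v) / (w' * v));   % the chain rule then assembles the full gradient
```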
36. Benchmark Examples: the AC Suite
- We made extensive runs on the AC suite of 18 aircraft control problems in Leibfritz's COMPleib
- In many cases we found low-order controllers with a smaller H∞ norm than was supposedly possible according to the standard full-order controller MATLAB routine HINFSYN!
- Sometimes this was even the case when the order was set to 0 (static output feedback)
- After I announced this at the SIAM Control meeting in July, HINFSYN was extensively debugged and a new version has been released
37. Benchmark Examples: Mass-Spring
- A well known simple example with n = 4
- Optimizing the spectral abscissa:
  - order 1: we obtain 0, known to be the optimal value
  - order 2: we obtain about -0.73; the optimal value was previously conjectured to be about -0.5
38. Benchmarks: the Belgian Chocolate Challenge
- A simply stated stabilization problem due to Blondel
- From 1994 to 2000, it was not known whether it was solvable by a controller of any order
- In 2000, an order 11 controller was discovered
- We found an order 3 controller and, using our analytical results, proved that it is locally optimal
39. Availability of HANSO and HIFOO
- Version 0.9, documented in a paper submitted to ROCOND 2006, is freely available at http://www.cs.nyu.edu/overton/software/hifoo/
- Version 0.91 is available online, but it is not fully tested, so there is no public link yet
- Version 0.92 will have further enhancements, and we hope to announce it widely in a month or two
40. Some Relevant Publications
- "A Robust Gradient Sampling Algorithm for Nonsmooth, Nonconvex Optimization", SIOPT, 2005 (with Burke, Lewis)
- "Approximating Subdifferentials by Random Sampling of Gradients", MOR, 2002 (with Burke, Lewis)
- "Stabilization via Nonsmooth, Nonconvex Optimization", submitted to IEEE Trans-AC (with Burke, Henrion, Lewis)
- "HANSO: A Hybrid Algorithm for Non-Smooth Optimization Based on BFGS, Bundle and Gradient Sampling", in the planning stage
- http://www.cs.nyu.edu/faculty/overton/
41.
- 恭喜发财 (Gong xi fa cai!)
- 恭喜發財 (Gung hei fat choy!)
- (Mandarin and Cantonese for "Wishing you prosperity", a Chinese New Year greeting)