1
HANSO and HIFOO: Two New MATLAB Codes
  • Michael L. Overton
  • Courant Institute of Mathematical Sciences
  • New York University
  • Singapore, Jan 2006

2
Two New Codes
  • HANSO
    Hybrid Algorithm for Nonsmooth
    Optimization
  • Aimed at finding local minimizers of general
    nonsmooth, nonconvex optimization problems
  • Joint with Jim Burke and Adrian Lewis
  • HIFOO
    H-Infinity Fixed-Order
    Optimization
  • Aimed at solving specific nonsmooth, nonconvex
    optimization problems arising in control
  • Built on HANSO
  • Joint with Didier Henrion

3
Many optimization objectives are
  • Nonconvex
  • Nonsmooth
  • Continuous
  • Differentiable almost everywhere
  • With gradients often available at little
    additional cost beyond that of computing the
    function value
  • Subdifferentially regular
  • Sometimes, non-Lipschitz

4
Numerical Optimization of Nonsmooth, Nonconvex
Functions
  • Steepest descent jams
  • Bundle methods better, but these are mainly
    intended for nonsmooth convex functions
  • We developed a simple method for nonsmooth,
    nonconvex minimization based on Gradient Sampling
  • Intended for continuous functions that are
    differentiable almost everywhere, and for which
    the gradient can be easily computed when it is
    defined
  • User need only write a routine to return the
    function value and gradient, and need not worry
    about nondifferentiable cases, e.g., ties for a
    max, when coding the gradient (see the sketch
    below)
  • Very recently, we found that BFGS is often very
    effective when implemented correctly, and much
    less expensive
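
A minimal sketch of the kind of routine the user supplies. The function, its name, and the data A, b are hypothetical (a piecewise affine max); the gradient code simply ignores ties, as discussed above.

    function [f, g] = maxaffine(x, A, b)
      % f(x) = max over i of A(i,:)*x + b(i): nonsmooth where the max is tied
      vals = A*x + b;        % values of the affine pieces
      [f, i] = max(vals);    % f is the max; i is one maximizing index (ties ignored)
      g = A(i,:)';           % gradient of the active piece
    end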

5
BFGS
  • Standard quasi-Newton method for optimizing
    smooth functions, using gradient differences to
    update an inverse Hessian matrix H, defining a
    local quadratic model of the objective function
  • Conventional wisdom: it jams on nonsmooth
    functions
  • Amazingly, it works very well as long as it is
    implemented right (weak Wolfe line search is
    essential; a sketch follows this slide)
  • It builds an excellent, extremely ill-conditioned
    quadratic model
  • Often, runs until cond(H) is 10^16 before
    breaking down, taking steplengths of around 1
  • Often converges linearly (not superlinearly)
  • By contrast, steepest descent and Newton's method
    usually jam
  • Not very reliable, and no convergence theory
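
A minimal MATLAB sketch of BFGS in this spirit, not the HANSO implementation: the inverse Hessian approximation H is updated from gradient differences, and the step comes from a weak Wolfe line search (linesearch_ww is a hypothetical helper, sketched after the line search slide later in the deck).

    function [x, f, g] = bfgs_sketch(fgname, x, maxit)
      % BFGS with inverse Hessian approximation H and a weak Wolfe line search
      [f, g] = feval(fgname, x);
      n = length(x);
      H = eye(n);                              % initial inverse Hessian approximation
      for k = 1:maxit
        d = -H*g;                              % quasi-Newton search direction
        [t, fnew, gnew] = linesearch_ww(fgname, x, f, g, d);  % weak Wolfe line search
        s = t*d;  y = gnew - g;                % step and gradient difference
        rho = 1/(s'*y);                        % positive when the weak Wolfe condition holds
        V = eye(n) - rho*(s*y');               % standard BFGS update of the inverse Hessian
        H = V*H*V' + rho*(s*s');
        x = x + s;  f = fnew;  g = gnew;
        if norm(g) < 1e-6, break, end          % smooth stopping test only
      end
    end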

6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
Minimizing a Product of Eigenvalues
  • An application due to Anstreicher and Lee
  • Min log of product of K largest eigenvalues of
    A ∘ X
  • subject to X positive semidefinite and diag(X) = 1
  • (A, X symmetric) (we usually set K = N/2)
  • Would be equivalent to an SDP if the product were
    replaced by a sum
  • Number of variables is n = N(N-1)/2, where N is
    the dimension of X
  • Following Burer, substitute YY' for X, where Y is
    N by r, so n is reduced to rN, and normalize Y so
    its rows have norm 1; thus both constraints are
    eliminated
  • Typically the objective is not smooth at the
    minimizer, because the K-th largest eigenvalue is
    multiple
  • Results for N = 10 and rank r = 2 through 10 (a
    sketch of the objective and gradient follows)
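
A sketch of the objective and its gradient in the factorized form X = YY'. The row normalization of Y is omitted for brevity, the K largest eigenvalues are assumed simple (ties ignored, as discussed earlier), and the routine name eigprod is illustrative.

    function [f, g] = eigprod(Y, A, K)
      M = A .* (Y*Y');                        % Hadamard product A o X, with X = YY'
      [V, D] = eig((M + M')/2);               % symmetrize for safety
      [lam, idx] = sort(diag(D), 'descend');  % eigenvalues, largest first
      V = V(:, idx);
      f = sum(log(lam(1:K)));                 % log of product of K largest eigenvalues
      GM = V(:,1:K) * diag(1./lam(1:K)) * V(:,1:K)';  % gradient with respect to M
      g = 2 * (A .* GM) * Y;                  % chain rule through A o (YY')
    end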

10
(No Transcript)
11
N = 10, r = 6: eigenvalues of optimal A ∘ X
  • 3.163533558166873e-001
  • 5.406646569314501e-001
  • 5.406646569314647e-001
  • 5.406646569314678e-001
  • 5.406646569314684e-001
  • 5.406646569314689e-001
  • 5.406646569314706e-001
  • 7.149382062194134e-001
  • 7.746621702661097e-001
  • 1.350303124150612e+000

12
The U and V Spaces
  • The condition number of H, the BFGS approximation
    to the inverse Hessian, typically reaches 10^16 !!
  • Let's look at the eigenvalues of H
  • H = QDQ' (' denotes transpose)
  • Search direction is d = -Hg = -QDQ'g (where g =
    gradient)
  • Next plot shows
  • diag(D), sorted into ascending order
  • Components of Q'g, using same ordering
  • Components of DQ'g, using same ordering
    (search direction expanded in eigenvector basis;
    a plotting sketch follows this slide)
  • Tiny eigenvalues of H correspond to nonsmooth
    V-space
  • Other eigenvalues of H correspond to smooth
    U-space
  • From matrix theory, dim(V-space) = m(m+1)/2 - 1,
    where m is the multiplicity of the Kth eigenvalue
    of A ∘ X
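
A sketch of how such diagnostic plots can be produced, assuming the final inverse Hessian approximation H and gradient g are available in the workspace.

    [Q, D] = eig((H + H')/2);                 % H = QDQ'
    [d, idx] = sort(diag(D), 'ascend');       % eigenvalues of H, ascending
    Q = Q(:, idx);
    semilogy(d, '.-'); hold on                % diag(D), sorted
    semilogy(abs(Q'*g), 'o');                 % components of Q'g, same ordering
    semilogy(abs(d .* (Q'*g)), 'x'); hold off % components of DQ'g (search direction)
    legend('eig(H)', '|Q''g|', '|DQ''g|')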

13
(No Transcript)
14
(No Transcript)
15
More on the U and V Spaces
  • H seems to be an excellent approximation to the
    limiting inverse Hessian and evidently gives us
    bases for the optimal U and V spaces
  • Let's look at plots of f along random directions
    in the U and V spaces, as defined by eigenvectors
    of H, passing through the minimizer (for the
    N = 10, r = 6 example)

16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
Line Search for BFGS
  • Sufficient decrease
  • for 0 < c1 < 1,
  • f(x + t d) ≤ f(x) + c1 t (grad f)(x)'d
  • Weak Wolfe condition on derivative
  • for c1 < c2 < 1,
  • (grad f)(x + t d)'d ≥ c2 (grad f)(x)'d
    (a line search sketch follows this slide)
  • Not the strong Wolfe condition on derivative
  • |(grad f)(x + t d)'d| ≤ -c2 (grad f)(x)'d
  • Essential when f is nonsmooth
  • No reason to use strong Wolfe even if f is smooth
  • Much simpler to implement
  • Why ever use strong Wolfe?
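
A minimal bracketing/bisection sketch enforcing exactly the two conditions above, with default c1 and c2 chosen here for illustration; this is a sketch, not the actual HANSO line search.

    function [t, f, g] = linesearch_ww(fgname, x, f0, g0, d, c1, c2)
      % Weak Wolfe line search: sufficient decrease (c1) plus the weak Wolfe
      % condition on the directional derivative (c2), via bracketing/bisection.
      if nargin < 6, c1 = 1e-4; c2 = 0.9; end
      alpha = 0; beta = inf; t = 1;            % bracket [alpha, beta] for the step
      dd0 = g0'*d;                             % directional derivative at t = 0 (negative)
      for it = 1:50
        [f, g] = feval(fgname, x + t*d);
        if f > f0 + c1*t*dd0                   % sufficient decrease fails: shrink
          beta = t;
        elseif g'*d < c2*dd0                   % weak Wolfe condition fails: expand
          alpha = t;
        else
          return                               % both conditions hold
        end
        if isinf(beta), t = 2*alpha; else t = (alpha + beta)/2; end
      end
    end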

20
But how to check if final x is locally optimal?
  • Extreme ill-conditioning of H is a strong clue,
    but is expensive to compute and proves nothing
  • If x is locally optimal, running a local bundle
    method from x establishes nonsmooth stationarity;
    this involves repeated null-step line searches
    and building a bundle of gradients obtained via
    the line searches
  • When line search returns null step, it must also
    return gradient at a point lying across the
    discontinuity
  • Then new d = arg min { ||d|| : d ∈ conv G },
    where G is the set of gradients in the bundle
    (see the sketch after this slide)
  • Terminate if ||d|| is small
  • If ||d|| is 0 and the line search steps were 0, x
    is Clarke stationary; more realistically, it is
    close to Clarke stationary
  • I'll call this Clarke Quay Stationary since it is
    a Quay Idea
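
A sketch of the minimum-norm element computation: the columns of G are the bundled gradients, and the smallest vector in their convex hull solves a small quadratic program. Here it is posed via quadprog from the Optimization Toolbox; the actual codes may solve this subproblem differently.

    function d = smallest_in_hull(G)
      % Find d = G*lambda of minimum norm with lambda in the unit simplex,
      % i.e. minimize 0.5*||G*lambda||^2 subject to sum(lambda) = 1, lambda >= 0.
      m = size(G, 2);
      lambda = quadprog(G'*G, zeros(m,1), [], [], ones(1,m), 1, zeros(m,1), []);
      d = G*lambda;                           % terminate the outer method if norm(d) is small
    end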

21
First-order local optimality
  • Is this implied by Clarke stationarity of f at x?
  • No: for example, f(x) = -|x| is Clarke stationary
    at 0
  • Yes, in the sense that f'(x; d) ≥ 0 for all
    directions d, when f is regular at x
    (subdifferentially regular, Clarke regular)
  • Most of the functions that we have studied are
    regular everywhere, although this is sometimes
    hard to prove
  • Regularity generalizes smoothness and convexity

22
Harder Problems
  • Eigenvalue product problems are interesting, but
    the Lipschitz constant L for f is not large
  • Harder problems
  • Chebyshev Exponential Approximation
  • Optimization of Distance to Instability
  • Pseudospectral Abscissa Optimization (arbitrarily
    large L)
  • Spectral Abscissa Optimization (not even
    Lipschitz)
  • In all cases, BFGS turns out to be substantially
    less reliable than gradient sampling

23
Gradient Sampling Algorithm: Initialize ε and x.
Repeat
  • Get G, a set of gradients of function f evaluated
    at x and at points near x (sampling controlled by
    ε)
  • Let d = arg min { ||d|| : d ∈ conv G }
  • Line search: replace x by x + t d, with
    f(x + t d) < f(x)
    (if d is not 0)

until d = 0 (or ||d|| ≤ tol). Then reduce ε
and repeat. (A sketch of one sampling phase follows.)
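
A sketch of one sampling phase for fixed sampling radius, using the hypothetical smallest_in_hull helper sketched earlier and a crude backtracking step; the published algorithm samples uniformly from the ε-ball and uses a sufficient decrease test.

    function x = gradsamp_phase(fgname, x, samprad, nsamp, tol, maxit)
      n = length(x);
      for k = 1:maxit
        G = zeros(n, nsamp + 1);
        [f, G(:,1)] = feval(fgname, x);                 % gradient at x itself
        for j = 1:nsamp                                 % gradients at random points near x
          u = randn(n,1);
          [fj, G(:,j+1)] = feval(fgname, x + samprad*rand*(u/norm(u)));
        end
        d = -smallest_in_hull(G);                       % negated minimum-norm element
        if norm(d) <= tol, return, end                  % approximately eps-Clarke stationary
        t = 1;                                          % crude backtracking for descent
        while feval(fgname, x + t*d) >= f && t > 1e-16
          t = t/2;
        end
        x = x + t*d;
      end
    end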
24
Convergence Theory for Gradient Sampling Method
(SIOPT, 2005)
  • Suppose
  • f is locally Lipschitz and coercive
  • f is continuously differentiable on an open dense
    subset of its domain
  • number of gradients sampled near each iterate is
    greater than problem dimension
  • line search uses an appropriate sufficient
    decrease condition
  • Then, with probability one and for fixed sampling
    parameter ε, the algorithm generates a sequence of
    points with a cluster point x that is ε-Clarke
    stationary
  • If f has a unique Clarke stationary point x, then
    the set of all cluster points generated by the
    algorithm converges to x as ε is reduced to zero
  • Kiwiel has already improved on this! (For a
    slightly different, as yet untested, version of
    the algorithm.)

25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
HANSO
  • User provides routine to compute f and its
    gradient at any given x (do not worry about
    nonsmooth cases)
  • BFGS is then run from many randomly generated or
    user-provided starting points. Quit if gradient
    at best point found is small.
  • Otherwise, a local bundle method is run from best
    point found, attempting to verify local
    stationarity.
  • If this is not successful, gradient sampling is
    initiated.
  • Regardless, return a final x along with a set of
    nearby points, the corresponding gradients, and
    the vector d that is the smallest vector in the
    convex hull of these gradients
  • If the nearby points are near enough and ||d|| is
    small enough, f is Clarke Quay Stationary at x
    (a workflow sketch follows this slide)
  • This is an approximate first-order local
    optimality condition if f is regular at x
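
A workflow sketch of this hybrid strategy, built from the hypothetical helpers sketched earlier (bfgs_sketch, gradsamp_phase) plus a hypothetical local_bundle routine; the actual HANSO interface and internals differ.

    function [xbest, fbest] = hanso_sketch(fgname, n, nstarts)
      fbest = inf;
      for s = 1:nstarts                        % phase 1: BFGS from many starting points
        [x, f, g] = bfgs_sketch(fgname, randn(n,1), 1000);
        if f < fbest, xbest = x; fbest = f; gbest = g; end
      end
      if norm(gbest) < 1e-6, return, end       % small gradient at best point: quit
      ok = local_bundle(fgname, xbest);        % phase 2: try to verify local stationarity
      if ~ok                                   % phase 3: fall back on gradient sampling
        xbest = gradsamp_phase(fgname, xbest, 1e-4, 2*n, 1e-6, 100);
        fbest = feval(fgname, xbest);
      end
    end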

29
Optimization in Control
  • Often, not many variables, as in low-order
    controller design
  • Reasons
  • engineers like simple controllers
  • nonsmooth, nonconvex optimization problems in
    even a few variables can be very difficult
  • Sometimes, more variables are added to the
    problem in an attempt to make it more tractable
  • Example: adding a Lyapunov matrix variable P and
    imposing stability on A by A'P + PA < 0, P > 0
  • When A depends linearly on the original
    parameters, this results in a bilinear matrix
    inequality (BMI), which is typically difficult to
    solve
  • Our approach: tackle the original problem
  • Looking for locally optimal solutions

30
H∞ Norm of a Transfer Function
  • Transfer function G(s) = C(sI - A)^(-1)B + D
  • SS representation
  • H∞ norm is ∞ if A is not stable (stable means all
    eigenvalues have negative real part) and
    otherwise, sup ||G(s)||_2 over all imaginary s
  • Standard way to compute it is norm(SS,inf) in the
    MATLAB Control System Toolbox (example below)
  • LINORM from SLICOT is much faster
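
For example, assuming state-space data A, B, C, D are in the workspace:

    sys  = ss(A, B, C, D);      % state-space model (Control System Toolbox)
    hinf = norm(sys, inf);      % H-infinity norm; returns Inf if the system is unstable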

31
H∞ Fixed-Order Controller Design
  • Choose matrices a, b, c, d to minimize the H∞
    norm of the transfer function
  • The dimension of a is k by k (order, between 0
    and n = dim(A))
  • The dimension of b is k by p (number of system
    inputs)
  • The dimension of c is m (number of system
    outputs) by k
  • The dimension of d is m by p
  • Total number of variables is (k+m)(k+p)

32
H∞ Fixed-Order Controller Design
  • Choose matrices a, b, c, d to minimize the H∞
    norm of the transfer function
  • The dimension of a is k by k (order, between 0
    and n = dim(A))
  • The dimension of b is k by p (number of system
    inputs)
  • The dimension of c is m (number of system
    outputs) by k
  • The dimension of d is m by p
  • Total number of variables is (k+m)(k+p)
  • The case k = 0 is static output feedback

33
H∞ Fixed-Order Controller Design
  • Choose matrices a, b, c, d to minimize the H∞
    norm of the transfer function
  • The dimension of a is k by k (order, between 0
    and n = dim(A))
  • The dimension of b is k by p (number of system
    inputs)
  • The dimension of c is m (number of system
    outputs) by k
  • The dimension of d is m by p
  • Total number of variables is (k+m)(k+p) (see the
    example below)
  • The case k = 0 is static output feedback
  • When B1, C1 are empty, all I/O channels are in the
    performance measure
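
An illustrative count of the controller variables and a random order-k controller; the values of k, p, m here are hypothetical.

    k = 2; p = 3; m = 2;                      % controller order, # system inputs, # system outputs
    nvar = (k + m)*(k + p);                   % total number of controller variables (20 here)
    K = ss(randn(k,k), randn(k,p), randn(m,k), randn(m,p));  % a random order-k controller
    % its closed-loop H-infinity norm can then be evaluated as on the previous slide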

34
HIFOO: H∞ Fixed-Order Optimization
  • Aims to find a, b, c, d for which the H∞ norm is
    locally optimal
  • Begins by minimizing the spectral abscissa
    max(real(eig(A-block))) until it finds a, b, c, d
    for which the A-block is stable (and therefore the
    H∞ norm is finite)
  • Then locally minimizes the H∞ norm
  • Calls HANSO to carry out both optimizations
  • Alternative objectives
  • Optimize the spectral abscissa instead of
    quitting when stable
  • Optimize the pseudospectral abscissa
  • Optimize the distance to instability (complex
    stability radius)
  • The output a, b, c, d can then be used as input
    when optimizing for a larger order k, which cannot
    give a worse result
  • Accepts various input formats

35
HIFOO Provides the Gradients
  • For the H∞ norm, it combines
  • left and right singular vector info at point on
    imaginary axis where sup achieved
  • chain rule
  • For spectral abscissa max(real(eig)), it combines
  • left and right eigenvector info for eigenvalue
    achieving max real part
  • chain rule
  • Do not have to worry about ties
  • Function is continuous, but gradient is not
  • However, the function is differentiable almost
    everywhere (and virtually everywhere that it is
    evaluated)
  • Typically, it is not differentiable at an exact
    minimizer (a sketch of the spectral abscissa
    gradient follows)
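
A sketch of the spectral abscissa and its gradient with respect to the entries of A, built from left and right eigenvector information as described above; ties for the max real part are ignored, and the name specabsc is illustrative.

    function [alpha, G] = specabsc(A)
      % Spectral abscissa alpha = max(real(eig(A))) and its gradient w.r.t. A,
      % assuming the maximizing eigenvalue is simple.
      [V, D, W] = eig(A);                      % right (V) and left (W) eigenvectors
      [alpha, i] = max(real(diag(D)));         % spectral abscissa
      v = V(:, i);  w = W(:, i);
      G = real((conj(w) * v.') / (w' * v));    % entry (j,k) is Re(conj(w_j)*v_k / (w'*v))
    end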

36
Benchmark Examples AC Suite
  • Made extensive runs for the AC suite of 18
    aircraft control problems in Leibfritz's COMPleib
  • In many cases we found low-order controllers that
    had a smaller H∞ norm than was supposedly possible
    according to the standard full-order controller
    MATLAB routine HINFSYN!
  • Sometimes this was even the case when the order
    was set to 0 (static output feedback)
  • After I announced this at the SIAM Control
    meeting in July, HINFSYN was extensively debugged
    and a new version has been released

37
Benchmark Examples Mass-Spring
  • A well-known simple example with n = 4
  • Optimizing the spectral abscissa
  • Order 1: we obtain 0, known to be the optimal
    value
  • Order 2: we obtain about -0.73; the optimal value
    was previously conjectured to be about -0.5

38
Benchmarks Belgian Chocolate Challenge
  • A simply stated stabilization problem due to
    Blondel
  • From 1994 to 2000, it was not known whether it was
    solvable by a controller of any order
  • In 2000, an order 11 controller was discovered
  • We found an order 3 controller, and using our
    analytical results, proved that it is locally
    optimal

39
Availability of HANSO and HIFOO
  • Version 0.9, documented in a paper submitted to
    ROCOND 2006, freely available at
  • http://www.cs.nyu.edu/overton/software/hifoo/
  • Version 0.91, available online but not fully
    tested so no public link yet
  • Version 0.92 will have further enhancements and
    we hope to announce it widely in a month or two

40
Some Relevant Publications
  • A Robust Gradient Sampling Algorithm for
    Nonsmooth, Nonconvex Optimization
  • SIOPT, 2005 (with Burke, Lewis)
  • Approximating Subdifferentials by Random Sampling
    of Gradients
  • MOR, 2002 (with Burke, Lewis)
  • Stabilization via Nonsmooth, Nonconvex
    Optimization
  • submitted to IEEE Trans-AC (with Burke, Henrion,
    Lewis)
  • HANSO: A Hybrid Algorithm for Non-Smooth
    Optimization Based on BFGS, Bundle and Gradient
    Sampling
  • in planning stage
  • http://www.cs.nyu.edu/faculty/overton/

41
  • 恭喜发财
  • Gong xi fa cai!
  • Gung hei fat choy!
  • 恭喜發財
  • (Wishing you a prosperous New Year!)