Title: Review Announcements
1. Review Announcements
2. Presentation schedule
- Friday 4/25 (5 max):
  1. Miguel Jaller, 8:03
  2. Adrienne Peltz, 8:20
  3. Olga Grisin, 8:37
  4. Dan Erceg, 8:54
  5. Nick Suhr, 9:11
- Tuesday 4/29 (5 max):
  1. Jayanth, 8:03
  2. Raghav, 9:20
  3. Rhyss, 8:37
  4. Tim, 8:54
  5-6. Lindsey Garret and Mark Yuhas, 9:11
  6. Christos Boutsidis, 9:28
- Monday 4/28, 4:00-7:00, pizza included:
  - Lisa Pak
  - Christos Boutsidis
  - David Doria
  - Zhi Zeng
  - Carlos
  - Varun
  - Samrat
  - Matt
Be on time. Plan your presentation for 15 minutes. Strict schedule. Suggest putting your presentation in your public_html directory on RCS so you can click and go. The Monday night class is in Amos Eaton 214, 4 to 7.
3. Other Dates
- Project papers due Friday (or in class Monday if you have a Friday presentation)
- Final: Tuesday 5/6, 3 p.m., Eaton 214
  - Open book/notes (no computers)
  - Comprehensive; labs are fair game too
- Office hours: Monday 5/5, 10 to 12 (or email)
4. What did we learn?
- Theme 1
- "There is nothing more practical than a good theory" - Kurt Lewin
- Algorithms arise out of the optimality conditions.
5. What did we learn?
- Theme 2
- To solve a harder problem, reduce it to an easier problem that you already know how to solve.
6. Fundamental Theoretical Ideas
- Convex functions and sets
- Convex programs
- Differentiability
- Taylor Series Approximations
- Descent Directions
- Combining these with the ideas of feasible directions provides the basis for optimality conditions.
7. Convex Functions
- A function f is (strictly) convex on a convex set S if and only if, for any x, y ∈ S,
  f(λx + (1-λ)y) ≤ λ f(x) + (1-λ) f(y)  for all 0 ≤ λ ≤ 1
  (with strict inequality < for strict convexity).
- [Figure: graph of a convex f, with the chord value λ f(x) + (1-λ) f(y) lying above the function value f(λx + (1-λ)y); a small numerical check follows below.]
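One quick way to internalize the definition is to test the inequality numerically. The sketch below is my own illustration (the choice f(x) = x² and the random sampling are not from the slides); it checks the convexity inequality at random points x, y and values of λ.

```python
import numpy as np

# Numerical sanity check of the convexity inequality for f(x) = x**2:
# for random x, y and lam in [0, 1],
#   f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y).
f = lambda x: x**2

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.uniform(-10, 10, size=2)
    lam = rng.uniform(0, 1)
    lhs = f(lam * x + (1 - lam) * y)          # function value on the segment
    rhs = lam * f(x) + (1 - lam) * f(y)       # value of the chord
    assert lhs <= rhs + 1e-12                 # convexity inequality holds
print("convexity inequality verified on random samples")
```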
8. Convex Sets
- A set S is convex if the line segment joining any two points in the set is also in the set, i.e., for any x, y ∈ S,
  λx + (1-λ)y ∈ S  for all 0 ≤ λ ≤ 1.
- [Figure: examples of convex sets and sets that are not convex.]
9. Convex Program
- min f(x) subject to x ∈ S, where f and S are convex
- Makes optimization nice
- Many practical problems are convex problems
- Use convex programs as subproblems for nonconvex programs
10. Theorem: Global Solution of a Convex Program
- If x* is a local minimizer of a convex programming problem, x* is also a global minimizer. Furthermore, if the objective is strictly convex, then x* is the unique global minimizer.
- Proof: by contradiction. If some feasible y had f(y) < f(x*), then convexity gives f(λx* + (1-λ)y) ≤ λ f(x*) + (1-λ) f(y) < f(x*) for every 0 < λ < 1, so every neighborhood of x* contains feasible points with lower objective value, contradicting the local optimality of x*.
11. First Order Taylor Series Approximation
- Let x = xk + p
- f(xk + p) ≈ f(xk) + ∇f(xk)ᵀp
- Says that a linear approximation of a function works well locally
- [Figure: f(x) and its tangent-line (first order) approximation.]
12. Second Order Taylor Series Approximation
- Let x = xk + p
- f(xk + p) ≈ f(xk) + ∇f(xk)ᵀp + ½ pᵀ∇²f(xk)p
- Says that a quadratic approximation of a function works even better locally
- [Figure: f(x) and its quadratic (second order) approximation; a numerical comparison follows below.]
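To see "works even better locally" concretely, the sketch below (my own example, not from the slides) compares the first and second order Taylor approximations of f(x) = eˣ around the point 0 for shrinking steps p.

```python
import numpy as np

# Compare first- and second-order Taylor approximations of f(x) = exp(x)
# around xbar = 0 (a hypothetical illustration, not from the slides).
f, g, H = np.exp, np.exp, np.exp          # f, f', f'' all equal exp
xbar = 0.0

for p in [0.5, 0.1, 0.01]:
    first = f(xbar) + g(xbar) * p                         # linear model
    second = first + 0.5 * H(xbar) * p**2                 # quadratic model
    print(f"p={p:5}:  |f - first| = {abs(f(xbar+p)-first):.2e}, "
          f"|f - second| = {abs(f(xbar+p)-second):.2e}")
# The first-order error shrinks like O(p^2); the second-order error like O(p^3).
```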
13. Descent Directions
- If the directional derivative is negative, then a linesearch will lead to a decrease in the function.
- Example (from the slide's figure): with gradient ∇f(x) = (8, 2) and direction d = (0, -1), the directional derivative is ∇f(x)ᵀd = -2 < 0, so d is a descent direction.
14. First Order Necessary Conditions
- Theorem: Let f be continuously differentiable. If x* is a local minimizer of (1) (the unconstrained problem min f(x)), then ∇f(x*) = 0.
15. Second Order Sufficient Conditions
- Theorem: Let f be twice continuously differentiable. If ∇f(x*) = 0 and ∇²f(x*) is positive definite, then x* is a strict local minimizer of (1).
16. Second Order Necessary Conditions
- Theorem: Let f be twice continuously differentiable. If x* is a local minimizer of (1), then ∇f(x*) = 0 and ∇²f(x*) is positive semidefinite.
17. Optimality Conditions
- First order necessary
- Second order necessary
- Second order sufficient
- With convexity, the necessary conditions become sufficient.
18. Easiest Problem: Line Search (1-D Optimization)
- Optimality conditions based on first and second derivatives
- Golden section search (a sketch follows below)
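Below is a minimal golden section search sketch, assuming a unimodal function on [a, b]; the function name, tolerance, and test problem are my own choices, not from the course materials.

```python
import math

# Golden section search for a unimodal f on [a, b].
def golden_section(f, a, b, tol=1e-8):
    invphi = (math.sqrt(5) - 1) / 2          # 1/phi, about 0.618
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while b - a > tol:
        if f(c) < f(d):                      # minimizer lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else:                                # minimizer lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
    return 0.5 * (a + b)

# Example: minimize (x - 2)^2 on [0, 5]; the result should be close to 2.
print(golden_section(lambda x: (x - 2.0)**2, 0.0, 5.0))
```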
19. Sometimes the linesearch can be solved exactly
- The exact stepsize can be found; for example, for a quadratic f(x) = ½xᵀQx + cᵀx, the exact step from x along a direction p is α = -∇f(x)ᵀp / (pᵀQp).
20. General Optimization Algorithm
- Specify some initial guess x0
- For k = 0, 1, ...
  - If xk is optimal, then stop
  - Determine a descent direction pk
  - Determine an improved estimate of the solution: xk+1 = xk + αk pk
- The last step is a one-dimensional search problem called a line search (see the skeleton below).
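The framework on this slide maps directly onto a short driver loop. The sketch below is my own skeleton (the stopping test, the pluggable direction/linesearch arguments, and the steepest-descent example are illustrative assumptions, not the course's code).

```python
import numpy as np

# Generic descent framework: pick a direction, do a 1-D search, repeat.
def descent_method(f, grad, x0, direction, linesearch, tol=1e-6, max_iter=200):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:         # "if xk is optimal then stop"
            break
        p = direction(x, g)                  # descent direction p_k
        alpha = linesearch(f, x, p, g)       # 1-D search along p_k
        x = x + alpha * p                    # x_{k+1} = x_k + alpha_k p_k
    return x

# Example: steepest descent with a fixed step on f(x) = ||x||^2.
f = lambda x: float(x @ x)
grad = lambda x: 2 * x
x_star = descent_method(f, grad, [3.0, -4.0],
                        direction=lambda x, g: -g,
                        linesearch=lambda f, x, p, g: 0.25)
print(x_star)   # close to [0, 0]
```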
21. Newton's Method
- Minimizing a quadratic has a closed-form solution: for f(x) = ½xᵀQx + cᵀx with Q positive definite, setting ∇f(x) = Qx + c = 0 gives x* = -Q⁻¹c.
22. General Nonlinear Functions
- For non-quadratic f (twice continuously differentiable):
  - Approximate f by its 2nd order Taylor series approximation
  - Solve the FONC for the quadratic approximation, i.e., the Newton system ∇²f(xk) p = -∇f(xk)
23. Basic Newton's Algorithm
- Start with x0
- For k = 1, ..., K
  - If xk is optimal, then stop
  - Solve ∇²f(xk) p = -∇f(xk)
  - xk+1 = xk + p
- (A minimal implementation sketch follows below.)
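Here is a minimal sketch of that loop, using Rosenbrock's function as a stand-in test problem (the test problem, tolerance, and iteration cap are my own choices).

```python
import numpy as np

# Basic (unsafeguarded) Newton iteration: full steps, no linesearch.
def newton(grad, hess, x0, tol=1e-8, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            return x
        p = np.linalg.solve(hess(x), -g)     # solve  H(x) p = -grad(x)
        x = x + p                            # full Newton step
    return x

# Rosenbrock gradient and Hessian.
grad = lambda x: np.array([-400*x[0]*(x[1]-x[0]**2) - 2*(1-x[0]),
                           200*(x[1]-x[0]**2)])
hess = lambda x: np.array([[1200*x[0]**2 - 400*x[1] + 2, -400*x[0]],
                           [-400*x[0],                    200.0]])
print(newton(grad, hess, [1.2, 1.2]))        # converges to (1, 1)
```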
24. Final Newton's Algorithm
- Start with x0
- For k = 1, ..., K
  - If xk is optimal, then stop
  - Solve ∇²f(xk) p = -∇f(xk) using a modified Cholesky factorization (so that p is a descent direction even when the Hessian is not positive definite)
  - Perform a linesearch to determine αk
  - xk+1 = xk + αk p
- What are the pros and cons? (A sketch of one simple Hessian modification follows below.)
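As a stand-in for the modified Cholesky factorization named on the slide, here is one simple Hessian-modification strategy (add τI until an ordinary Cholesky factorization succeeds); the parameter choices and interface are my own, not the exact course algorithm.

```python
import numpy as np

# Compute a descent direction by modifying an indefinite Hessian:
# add tau*I until Cholesky succeeds, then solve (H + tau*I) p = -g.
def modified_newton_direction(H, g, beta=1e-3):
    tau = 0.0 if np.min(np.diag(H)) > 0 else beta - np.min(np.diag(H))
    I = np.eye(H.shape[0])
    while True:
        try:
            L = np.linalg.cholesky(H + tau * I)   # fails if not positive definite
            break
        except np.linalg.LinAlgError:
            tau = max(2 * tau, beta)
    y = np.linalg.solve(L, -g)               # forward solve with the factor
    return np.linalg.solve(L.T, y)           # back substitution

H = np.array([[1.0, 0.0], [0.0, -2.0]])      # indefinite Hessian
g = np.array([1.0, 1.0])
p = modified_newton_direction(H, g)
print(p, g @ p < 0)                          # p is a descent direction
```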
25. Steepest Descent Algorithm
- Start with x0
- For k = 1, ..., K
  - If xk is optimal, then stop
  - Set pk = -∇f(xk)
  - Perform an exact or backtracking linesearch to determine αk
  - xk+1 = xk + αk pk
- (A sketch with an exact linesearch on a quadratic follows below.)
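For a strictly convex quadratic the exact step has a closed form, so steepest descent is easy to write down. The sketch below is my own example (the matrix Q, vector c, and starting point are illustrative; the problem is deliberately ill-conditioned to echo slide 27).

```python
import numpy as np

# Steepest descent with an exact linesearch on f(x) = 0.5 x'Qx + c'x.
# For quadratics the exact step is alpha = g'g / (g'Qg) (cf. slide 19).
def steepest_descent_quadratic(Q, c, x0, tol=1e-10, max_iter=10000):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = Q @ x + c                         # gradient of the quadratic
        if np.linalg.norm(g) <= tol:
            break
        alpha = (g @ g) / (g @ (Q @ g))       # exact minimizer along -g
        x = x - alpha * g
    return x, k

Q = np.array([[50.0, 0.0], [0.0, 1.0]])       # ill-conditioned Hessian
c = np.array([-500.0, 0.0])                   # minimizer at x = (10, 0)
x, iters = steepest_descent_quadratic(Q, c, [0.0, 5.0])
print(x, iters)                               # slow, zigzagging convergence
```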
26. Inexact linesearch can work quite well too!
- For 0 < c1 < c2 < 1, accept a step α satisfying the Wolfe conditions:
  f(xk + α pk) ≤ f(xk) + c1 α ∇f(xk)ᵀpk   (sufficient decrease)
  ∇f(xk + α pk)ᵀpk ≥ c2 ∇f(xk)ᵀpk   (curvature)
- A solution exists for any descent direction if f is bounded below along the linesearch (Lemma 3.1).
- (A backtracking sketch follows below.)
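In practice a simple backtracking search that enforces only the sufficient-decrease condition is often enough; a full Wolfe linesearch would also check the curvature condition. The sketch below uses my own default parameters and test function.

```python
import numpy as np

# Backtracking linesearch enforcing the sufficient-decrease (Armijo) condition.
def backtracking(f, grad, x, p, c1=1e-4, rho=0.5, alpha=1.0):
    fx, slope = f(x), grad(x) @ p            # slope = directional derivative < 0
    while f(x + alpha * p) > fx + c1 * alpha * slope:
        alpha *= rho                         # shrink the step until it decreases enough
    return alpha

f = lambda x: float(x @ x)
grad = lambda x: 2 * x
x = np.array([2.0, -1.0])
p = -grad(x)                                 # steepest-descent direction
print(backtracking(f, grad, x, p))           # an acceptable step length
```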
27. Conditioning: Important for gradient methods!
- Example: 50(x-10)² + y², condition number 50 (coefficient ratio 50/1).
- Steepest descent ZIGZAGS!
- Know the pros and cons of each approach.
28. Conjugate Gradient (CG)
- Method for minimizing a quadratic function
- Low-storage method: CG only stores vector information
- CG has superlinear convergence for nice problems or when properly scaled
- Great for solving QP subproblems
- (A sketch of linear CG follows below.)
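Below is a sketch of the linear CG iteration for a quadratic ½xᵀQx - bᵀx (equivalently, for solving Qx = b with Q symmetric positive definite); the example data are my own.

```python
import numpy as np

# Linear conjugate gradient: only matrix-vector products and a few vectors.
def conjugate_gradient(Q, b, x0=None, tol=1e-10):
    x = np.zeros_like(b) if x0 is None else np.asarray(x0, dtype=float)
    r = b - Q @ x                            # residual = negative gradient
    p = r.copy()
    for _ in range(len(b)):                  # at most n steps in exact arithmetic
        if np.linalg.norm(r) <= tol:
            break
        Qp = Q @ p
        alpha = (r @ r) / (p @ Qp)           # exact step along p
        x = x + alpha * p
        r_new = r - alpha * Qp
        beta = (r_new @ r_new) / (r @ r)     # makes the next p Q-conjugate
        p, r = r_new + beta * p, r_new
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(Q, b))              # solves Qx = b in at most 2 steps
```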
29. Quasi-Newton Methods: Pros and Cons
- Globally converges to a local min (always finds a descent direction)
- Superlinear convergence
- Requires only first order information; approximates the Hessian
- More complicated than steepest descent
- Requires sophisticated linear algebra
- Have to watch out for numerical error
30. Quasi-Newton Methods: Pros and Cons
- Globally converges to a local min
- Superlinear convergence without computing the Hessian
- Works great in practice; widely used
- More complicated than steepest descent
- The best implementations require sophisticated linear algebra, a careful linesearch, and handling of the curvature conditions; have to watch out for numerical error.
- (A sketch of the BFGS update follows below.)
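BFGS is the standard quasi-Newton update behind these pros and cons. The sketch below shows one BFGS update of an inverse-Hessian approximation plus a tiny driver loop on a quadratic; the test data, exact linesearch, and safeguards are my own illustrative choices.

```python
import numpy as np

# One BFGS update of an inverse-Hessian approximation Hinv.
def bfgs_update(Hinv, s, y):
    # s = x_{k+1} - x_k,  y = grad_{k+1} - grad_k; requires s'y > 0.
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ Hinv @ V.T + rho * np.outer(s, s)

Q = np.array([[3.0, 1.0], [1.0, 2.0]])        # f(x) = 0.5 x'Qx, minimizer at 0
grad = lambda x: Q @ x
x, Hinv = np.array([1.0, 1.0]), np.eye(2)
for _ in range(20):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:
        break
    p = -Hinv @ g                             # quasi-Newton direction
    alpha = -(g @ p) / (p @ (Q @ p))          # exact linesearch on the quadratic
    s = alpha * p
    x_new = x + s
    y = grad(x_new) - g
    if s @ y > 1e-12:                         # keep Hinv positive definite
        Hinv = bfgs_update(Hinv, s, y)
    x = x_new
print(x)                                      # approaches the minimizer (0, 0)
```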
31. Trust Region Methods
- Alternative to line search methods
- Optimize a quadratic model of the objective within the trust region
32. Easiest Problem
- Linear equality constraints: min f(x) subject to Ax = b
33. Lemma 14.1: Necessary Conditions (Nash & Sofer)
- If x* is a local min of f over {x : Ax = b}, and Z is a null-space matrix for A, then
  Zᵀ∇f(x*) = 0  and  Zᵀ∇²f(x*)Z is positive semidefinite.
- Or equivalently, use the KKT conditions: ∇f(x*) = Aᵀλ for some multiplier vector λ.
- Other conditions generalize similarly.
34. Handy Ways to Compute the Null Space
- Variable reduction method
- Orthogonal projection matrix
- QR factorization (best numerically)
- Z = null(A) in MATLAB
- (A small NumPy equivalent follows below.)
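For reference, here is a small NumPy stand-in for MATLAB's null(A), built from the SVD (the tolerance and example matrix are my own choices).

```python
import numpy as np

# Null-space basis from the SVD: keep right singular vectors whose
# singular values are numerically zero.
def null_space(A, rtol=1e-12):
    U, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > rtol * s.max()))
    return Vt[rank:].T                        # columns form an orthonormal basis Z

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])               # rank 1, so the null space is 2-D
Z = null_space(A)
print(Z.shape)                                # (3, 2)
print(np.allclose(A @ Z, 0))                  # A Z = 0, as required
```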
35. Next Easiest Problem
- Linear inequality constraints
- The constraints form a polyhedron
36. Inequality Case
- Inequality problem: min f(x) subject to Ax ≥ b
- [Figure: a polyhedron {x : Ax ≥ b} bounded by constraints a_iᵀx = b_i, illustrating the inequality FONC.]
- Nonnegative multipliers imply the gradient points to the greater-than side of the constraint.
37. Second Order Sufficient Conditions for Linear Inequalities
38. Sufficient Conditions for Linear Inequalities
- The reduced Hessian Zᵀ∇²f(x*)Z is positive definite, where Z is a basis matrix for Null(A), with A restricted to the nondegenerate active constraints.
39. General Constraints
- Careful! The sufficient conditions are the same as before, but the necessary conditions have an extra constraint qualification to make sure the Lagrange multipliers exist.
40. Necessary Conditions: General
- If x* satisfies the LICQ and is a local min of f over {x : g(x) ≥ 0, h(x) = 0}, then there exist multipliers λ ≥ 0 and μ such that
  ∇f(x*) = Σ_i λ_i ∇g_i(x*) + Σ_j μ_j ∇h_j(x*)  and  λ_i g_i(x*) = 0 for all i.
41. Algorithms Build on Prior Approaches
- Linear equality constrained: convert to an unconstrained problem over the null space and solve.
- Different ways to represent the null space produce the algorithms used in practice.
42. Prior Approaches (cont.)
- Linear inequality constrained: identify the active constraints, then solve equality-constrained subproblems.
- Nonlinear inequality constrained: linearize the constraints and solve subproblems.
43. Active Set Methods (NW 16.5)
- Change one item of the working set at a time.
44. Interior Point Algorithms (NW 16.6)
- Traverse the interior of the set (a little more later).
45. Gradient Projection (NW 16.7)
- Change many elements of the working set at once.
46. Generic Inexact Penalty Problem
- From the constrained problem min f(x) s.t. h(x) = 0 to the unconstrained penalty problem min f(x) + (μ/2)‖h(x)‖².
- What are penalty problems and why do we use them? What is the difference between exact and inexact penalties?
47. Augmented Lagrangian
- Consider min f(x) s.t. h(x) = 0
- Start with L(x, λ) = f(x) - λᵀh(x)
- Add a penalty: L(x, λ, μ) = f(x) - λᵀh(x) + (μ/2)‖h(x)‖²
- The penalty helps ensure that the point is feasible.
- Why do we like these? How do they work in practice? (A sketch of the basic loop follows below.)
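Here is a sketch of the basic augmented Lagrangian loop: minimize L(·, λ, μ) over x, update the multiplier, and tighten the penalty. The example problem, update rules, and use of scipy.optimize.minimize for the inner solve are my own illustrative choices, not the course's implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Augmented Lagrangian loop for min f(x) s.t. h(x) = 0.
f = lambda x: x[0]**2 + 2 * x[1]**2           # objective
h = lambda x: x[0] + x[1] - 1.0               # single equality constraint

def augmented_lagrangian(x0, lam=0.0, mu=10.0, outer_iters=10):
    x = np.asarray(x0, dtype=float)
    for _ in range(outer_iters):
        L = lambda x: f(x) - lam * h(x) + 0.5 * mu * h(x)**2
        x = minimize(L, x).x                  # inner unconstrained minimization
        lam = lam - mu * h(x)                 # first-order multiplier update
        mu *= 2.0                             # tighten the penalty
    return x, lam

x, lam = augmented_lagrangian([0.0, 0.0])
print(x, lam)                                 # x -> (2/3, 1/3), lam -> 4/3
```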
48. Sequential Quadratic Programming (SQP)
- Basic idea:
  - QPs with constraints are easy: for any guess of the active constraints, you just have to solve a system of equations.
  - So why not solve the general problem as a series of constrained QPs?
  - Which QP should be used? (One equality-constrained QP step is sketched below.)
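The building block is an equality-constrained QP, which can be solved directly from its KKT system. The sketch below sets up and solves one such subproblem; the matrices B, g, A and vector h are made-up data, not from the course.

```python
import numpy as np

# One equality-constrained QP of the kind SQP solves at each iteration:
#   min_p  0.5 p'Bp + g'p   s.t.  A p = -h,
# solved directly from its KKT system.
def eq_qp_step(B, g, A, h):
    n, m = B.shape[0], A.shape[0]
    KKT = np.block([[B, A.T],
                    [A, np.zeros((m, m))]])
    rhs = np.concatenate([-g, -h])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:n], sol[n:]                  # step p and multipliers lam

B = np.array([[2.0, 0.0], [0.0, 2.0]])       # model Hessian
g = np.array([1.0, 1.0])                     # model gradient
A = np.array([[1.0, -1.0]])                  # linearized constraint Jacobian
h = np.array([0.5])                          # current constraint value h(x)
p, lam = eq_qp_step(B, g, A, h)
print(p, lam)
```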
49. Trust Region Works Great
- We only trust the approximation locally, so limit the step to this region by adding a trust-region constraint (e.g., ‖p‖ ≤ Δ) to the QP.
- No stepsize needed!
50. Advanced Topics
- Duality theory: can choose to solve the primal or the dual problem. The dual is always nice, but there may be a duality gap if the overall problem is not nice.
- Nonsmooth optimization: can do the whole thing again on the basis of subgradients instead of gradients.
51. Subgradient
- Generalization of the gradient
- Definition: g is a subgradient of f at x if f(y) ≥ f(x) + gᵀ(y - x) for all y.
- [Figure: the hinge loss max(0, 1 - z), which is nonsmooth at z = 1; a small sketch follows below.]
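To make the definition concrete, the sketch below evaluates a subgradient of the hinge loss pictured on the slide and checks the subgradient inequality at the kink; the function names and the choice of 0 as the returned subgradient at z = 1 are my own.

```python
import numpy as np

# Hinge loss max(0, 1 - z) and one valid subgradient; at the kink z = 1
# any value in [-1, 0] is a subgradient (0 is returned here).
def hinge(z):
    return max(0.0, 1.0 - z)

def hinge_subgradient(z):
    if z < 1.0:
        return -1.0                           # smooth region: ordinary derivative
    return 0.0                                # z >= 1: 0 is one valid subgradient

# Check the subgradient inequality f(y) >= f(z) + g*(y - z) at the kink z = 1.
z, g = 1.0, hinge_subgradient(1.0)
ys = np.linspace(-2, 3, 11)
print(all(hinge(y) >= hinge(z) + g * (y - z) - 1e-12 for y in ys))  # True
```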