Skip to content

Commit

Permalink
Lecture 7: Polishing touches
Browse files Browse the repository at this point in the history
  • Loading branch information
sadda committed Apr 5, 2021
1 parent c735948 commit a9bba21
Show file tree
Hide file tree
Showing 4 changed files with 235 additions and 127 deletions.
89 changes: 62 additions & 27 deletions docs/src/lecture_07/constrained.md
Original file line number Diff line number Diff line change
@@ -1,24 +1,42 @@
```@setup optim
using Plots
function create_anim(f, path, xlims, ylims; file_name = "", fps=15)
using Random
function create_anim(
f,
path,
xlims,
ylims,
file_name = joinpath(pwd(), randstring(12) * ".gif");
xbounds = xlims,
ybounds = ylims,
fps = 15,
)
xs = range(xlims...; length = 100)
ys = range(ylims...; length = 100)
plt = contourf(xs, ys, f, color = :jet, axis = false, ticks = false, cbar = false)
plt = contourf(xs, ys, f; color = :jet)
# add constraints if provided
if !(xbounds == xlims && ybounds == ylims)
x_rect = [xbounds[1]; xbounds[2]; xbounds[2]; xbounds[1]; xbounds[1]]
y_rect = [ybounds[1]; ybounds[1]; ybounds[2]; ybounds[2]; ybounds[1]]
plot!(x_rect, y_rect; line = (2, :dash, :red), label="")
end
# add an empty plot
plot!(Float64[], Float64[]; line = (4, :arrow, :black), label = "")
# adds an empty plot to plt
plot!(Float64[], Float64[]; line = (4, :black), label = "")
# extracts last plot series
# extract the last plot series
plt_path = plt.series_list[end]
# creates the animation
# create the animation and save it
anim = Animation()
for x in eachcol(path)
push!(plt_path, x[1], x[2]) # add new point to plt_grad
push!(plt_path, x[1], x[2]) # add a new point
frame(anim)
end
gif(anim, file_name, fps = fps)
gif(anim, file_name; fps = fps, show_msg = false)
return nothing
end
Expand All @@ -40,7 +58,7 @@ The usual formulation of constrained optimization is
&h_j(x) = 0,\ j=1,\dots,J.
\end{aligned}
```
This optimization problem is also called the primal formulation. It is closely connected with the Lagrangian
Functions ``g_i`` generate inequality constraints, while functions ``h_j`` generate equality constraints. Box constraints such as ``x\in[0,1]`` are the simplest case of the former. This optimization problem is also called the primal formulation. It is closely connected with the Lagrangian
```math
L(x;\lambda,\mu) = f(x) + \sum_{i=1}^I \lambda_i g_i(x) + \sum_{j=1}^J \mu_j h_j(x).
```
Expand All @@ -53,6 +71,8 @@ The dual problem then switches the minimization and maximization to arrive at
\tag{D} \operatorname*{maximize}_{\lambda\ge 0,\mu} \quad\operatorname*{minimize}_x\quad L(x;\lambda,\mu).
```

Even though the primal and dual formulations are not generally equivalent, they are often used interchangeably.

```@raw html
<div class = "info-body">
<header class = "info-header">Linear programming</header><p>
Expand All @@ -79,7 +99,7 @@ We can observe several things:
</p></div>
```

The optimality conditions for constrained optimization take a more complex form.
For the unconstrained optimization, we showed that each local minimum satisfies the optimality condition ``\nabla f(x)=0``. This condition does not have to hold for unconstrained optimization, where the optimality conditions are of a more complex form.

```@raw html
<div class = "theorem-body">
Expand All @@ -98,6 +118,8 @@ If $f$ and $g$ are convex and $h$ is linear, then every stationary point is a gl
</p></div>
```

When there are no constraints, the Lagrangian ``L`` reduces to the objective ``f``, and the optimality conditions are equivalent. Therefore, the optimality conditions for constrained optimization generalize those for unconstrained optimization.

## Numerical method

We present only the simplest method for constraint optimization. Projected gradients
Expand All @@ -107,7 +129,7 @@ y^{k+1} &= x^k - \alpha^k\nabla f(x^k), \\
x^{k+1} &= P_X(y^{k+1})
\end{aligned}
```
compute the gradient as for standard gradient descent, and then project the point onto the feasible set. Since the projection needs to be simple to compute, projected gradients are used for simple sets ``X`` such as boxes or balls.
compute the gradient as for standard gradient descent, and then project the point onto the feasible set. Since the projection needs to be simple to calculate, projected gradients are used for simple ``X`` such as boxes or balls.

We will use projected gradients to solve
```math
Expand All @@ -131,13 +153,17 @@ end
nothing # hide
```

The projection function ```P``` computes the projection on ```[x_min, x_max]```. Since it is a box, the projection is computed componentwise:

```@example optim
P(x, x_min, x_max) = min.(max.(x, x_min), x_max)
nothing # hide
```
Now we can call projected gradients from the same starting point as before

Now we can call projected gradients from the same starting point as before.

```@example optim
x_min = [-1; -1]
x_max = [0; 0]
Expand All @@ -146,34 +172,43 @@ xs, ys = optim(f, g, x -> P(x,x_min,x_max), [0;-1], 0.1)
nothing # hide
```
To plot the path, we need to merge them together when one point from ```xs``` is followed by a point from ```ys``` and so on. Since ```xs``` and ```ys``` have different number of entries, we can do it via
```@example optim
xys = hcat(reshape([xs[:,1:end-1]; ys][:], 2, :), xs[:,end])

nothing # hide
```
It is probably not the nicest thing to do, but it is Saturday evening, I am tired and it works. Sorry :) The animation can be created in the same way aa before.
We use the keyword arguments `xbounds` and `ybounds` to plot the feasible region in the animation. First, we plot only the iterations `xs`.

```@example optim
xlims = (-3, 1)
ylims = (-2, 1)
create_anim(f, xys, xlims, ylims; file_name = "anim6.gif")
create_anim(f, xs, xlims, ylims, "anim6.gif";
xbounds=(x_min[1], x_max[1]),
ybounds=(x_min[2], x_max[2]),
)
nothing # hide
```

![](anim6.gif)

There is one significant drawback to this animation: One cannot see the boundary. One possibility would be to reduce the plotting ranges ```xlims``` and ```ylims```. However, then one would not see the iterations outside of the box. Another possibility is to modify ```f``` into ```f_mod``` which has the same values inside the box and is constant outside of it. Because ```f``` is bounded below by ``-1``, we define ```f_mod``` by ``-2`` outside of the box.

To plot the path, we need to merge them by following one point from ```xs``` by a point from ```ys``` and so on. Since ```xs``` and ```ys``` have different number of entries, we can do it via

```@example optim
f_mod(x) = all(x .>= x_min) && all(x .<= x_max) ? f(x) : -2
f_mod(x1,x2) = f_mod([x1; x2])
xys = hcat(reshape([xs[:,1:end-1]; ys][:], 2, :), xs[:,end])
create_anim(f_mod, xys, xlims, ylims; file_name = "anim7.gif")
nothing # hide
```

It is probably not the nicest thing to do, but it is Saturday evening, I am tired, and it works. Sorry :) The animation can now be created in the same way as before.

```@example optim
create_anim(f, xys, xlims, ylims, "anim7.gif";
xbounds=(x_min[1], x_max[1]),
ybounds=(x_min[2], x_max[2]),
)
nothing # hide
```

![](anim7.gif)

The animation shows that projected gradients converge to the global minimum. Most of the iterations are outside of the feasible region but they are projected back to boundary.
The animation shows that projected gradients converge to the global minimum. Most of the iterations are outside of the feasible region, but they are projected back to the boundary. One can use the optimality conditions to verify that the gradient of the objective and the active constraint have the same direction.
89 changes: 76 additions & 13 deletions docs/src/lecture_07/exercises.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,12 +20,75 @@ with the starting point ``x^0=(0,0)``.



```@raw html
<div class = "exercise-body">
<header class = "exercise-header">Exercise 1: Solving a system of linear equations</header><p>
```

The update of Newton's method computes ``A^{-1}b``. The most intuitive way of writing this is to use `inv(A) * b`, which first computes the inverse of `A` and then multiplies it with a vector. However, this approach has several disadvantages:
- Specialized algorithms for solving the linear system ``Ax=b`` cannot be used.
- When `A` is sparse, this inverse is dense and additional memory is needed to store the dense matrix.
For these reasons, the linear system of equations is solved by `A \ b`, which calls specialized algorithms.

Use the package `BenchmarkTools` to benchmark both possibilities.
```@raw html
</p></div>
<details class = "solution-body">
<summary class = "solution-header">Solution:</summary><p>
```

We first create a random matrix `A` and a random vector `b`.

```julia
using BenchmarkTools

n = 1000
A = randn(n,n)
b = randn(n)
```

We first verify that both possibilities result in the same number.

```julia
using LinearAlgebra

norm(inv(A)*b - A \ b)
```
```julia
9.321906736594836e-12
```

We benchmark the first possibility.

```julia
@btime inv($A)*($b)
```
```julia
71.855 ms (6 allocations: 8.13 MiB)
```

We benchmark the second possibility.

```julia
@btime ($A) \ ($b)
```
```julia
31.126 ms (4 allocations: 7.64 MiB)
```

The second possibility is faster and has lower memory requirements.

```@raw html
</p></details>
```





```@raw html
<div class = "exercise-body">
<header class = "exercise-header">Exercise 1: Bisection method</header><p>
<header class = "exercise-header">Exercise 2: Bisection method</header><p>
```
Similarly to Newton's method, the bisection method is primarily designed to solve equations by finding their zero points. It is only able to solve equations ``f(x)=0`` where ``f:\mathbb{R}\to\mathbb{R}``. It starts with an interval ``[a,b]`` where ``f`` has opposite values ``f(a)f(b)<0``. Then it selects the middle point on ``[a,b]`` and halves the interval so that the new interval again satisfies the constraint on opposite signs ``f(a)f(b)<0``. This is repeated until the function value is small or until the interval has a small length.

Expand All @@ -35,7 +98,7 @@ Implement the bisection method and use it to minimize ``f(x) = x^2 - x`` on ``[-
<details class = "solution-body">
<summary class = "solution-header">Solution:</summary><p>
```
First, we write the bisection method. We initialize it with arguments ``f`` and the initial interval ``[a,b]``. We also specify the optional tolerance. First, we save the function value ```fa = f(a)``` to not need to recompute it every time. The syntax ```fa == 0 && return a``` is a bit complex. Since ```&&``` is the "and" operator, this first checks whether ```fa == 0``` is satisfied and if so, it evaluates the second part. However, the second part exits the function and returns ```a```. Since we need to have ``f(a)f(b)<0``, we check this condition, and if it is not satisfied, we return an error message. Finally, we run the while loop, where every iteration halves the interval. The condition on opposite signs is enforced in the if condition inside the loop.
First, we write the bisection method. We initialize it with arguments ``f`` and the initial interval ``[a,b]``. We also specify the optional tolerance. First, we save the function value ```fa = f(a)``` to not need to recompute it every time. The syntax ```fa == 0 && return a``` is a bit complex. Since ```&&``` is the "and" operator, this first checks whether ```fa == 0``` is satisfied, and if so, it evaluates the second part. However, the second part exits the function and returns ```a```. Since we need to have ``f(a)f(b)<0``, we check this condition, and if it is not satisfied, we return an error message. Finally, we run the while loop, where every iteration halves the interval. The condition on opposite signs is enforced in the if condition inside the loop.
```@example bisec
function bisection(f, a, b; tol=1e-6)
fa = f(a)
Expand All @@ -59,7 +122,7 @@ function bisection(f, a, b; tol=1e-6)
end
nothing # hide
```
This implementation is efficient in the way that only one function evaluation is neededper iteration. The price to pay are additional variables ```fa```, ```fb``` and ```fc```.
This implementation is efficient in the way that only one function evaluation is needed per iteration. The price to pay are additional variables ```fa```, ```fb``` and ```fc```.

To use the bisection method to minimize a function ``f(x)``, we use it find the solution of the optimality condition ``f'(x)=0``.
```@example bisec
Expand All @@ -83,7 +146,7 @@ println(round(x_opt, digits=4)) # hide

```@raw html
<div class = "exercise-body">
<header class = "exercise-header">Exercise 2: JuMP</header><p>
<header class = "exercise-header">Exercise 3: JuMP</header><p>
```
The library to perform optimization is called ```JuMP```. Install it, go briefly through its documentation, and use it to solve the linear optimization problem
```math
Expand Down Expand Up @@ -137,7 +200,7 @@ println(round.(x_val, digits=4)) # hide

```@raw html
<div class = "exercise-body">
<header class = "exercise-header">Exercise 3: SQP method</header><p>
<header class = "exercise-header">Exercise 4: SQP method</header><p>
```
Derive the SQP method for optimization problem with only equality constraints
```math
Expand All @@ -146,7 +209,7 @@ Derive the SQP method for optimization problem with only equality constraints
\text{subject to}\qquad &h_j(x) = 0, j=1,\dots,J.
\end{aligned}
```
SQP writes the optimality (KKT) conditions and then applies Newton's method to solve the resulting system of equations.
SQP writes the [Karush-Kuhn-Tucker](@ref lagrangian) optimality conditions and then applies Newton's method to solve the resulting system of equations.

Apply the obtained algorithm to
```math
Expand Down Expand Up @@ -177,14 +240,14 @@ The Newton method's at iteration ``k`` has some pair ``(x^k,\mu^k)`` and perform
\begin{pmatrix} x^{k+1} \\ \mu^{k+1} \end{pmatrix} = \begin{pmatrix} x^{k} \\ \mu^{k} \end{pmatrix} - \begin{pmatrix} \nabla^2 f(x^k) + \sum_{j=1}^J \mu_j^k \nabla^2 h_j(x^k) & \nabla h(x^k) \\ \nabla h(x^k)^\top & 0 \end{pmatrix}^{-1} \begin{pmatrix} \nabla f(x^k) + \sum_{j=1}^J\mu_j^k \nabla h_j(x^k) \\ h(x^k) \end{pmatrix}.
```

We define functions ``f`` and ``h`` and their derivates and Hessians for the numerical implementation. The simplest way to create a diagonal matrix is ```Diagonal``` from the ```LinearAlgebra``` package. It can be, of course, done manually as well.
We define functions ``f`` and ``h`` and their derivates and Hessians for the numerical implementation. The simplest way to create a diagonal matrix is `Diagonal` from the `LinearAlgebra` package. It can be, of course, done manually as well.
```@example sqp
using LinearAlgebra
n = 10
f(x) = sum((1:n) .* x.^4)
f_grad(x) = 4*(1:n)[:].*x.^3
f_hess(x) = 12*Diagonal((1:n)[:].*x.^2)
f_grad(x) = 4*(1:n).*x.^3
f_hess(x) = 12*Diagonal((1:n).*x.^2)
h(x) = sum(x) - 1
h_grad(x) = ones(n)
h_hess(x) = zeros(n,n)
Expand All @@ -203,9 +266,9 @@ for i in 1:100
μ -= step[n+1]
end
```
The need to differentiate global and local variables in scripts are one of the reasons why functions should be used as much as possible.
The need to differentiate global and local variables in scripts is one reason why functions should be used as much as possible.

To validate, we need to verify the optimality and the feasibility; both need to equal to zero. These are the same as the ```b``` variable. However, we cannot call ```b``` directly, as it is inside the for loop and therefore local only.
To validate, we need to verify the optimality and the feasibility; both need to equal zero. These are the same as the ```b``` variable. However, we cannot call ```b``` directly, as it is inside the for loop and therefore local only.
```@repl sqp
f_grad(x) + μ*h_grad(x)
h(x)
Expand All @@ -228,7 +291,7 @@ println(round.(x, digits=4)) # hide

```@raw html
<div class = "exercise-body">
<header class = "exercise-header">Exercise 4 (theory)</header><p>
<header class = "exercise-header">Exercise 5 (theory)</header><p>
```
Show that the primal formulation for a problem with no inequalities is equivalent to the min-max formulation.
```@raw html
Expand Down Expand Up @@ -264,7 +327,7 @@ If ``h_j(x)\neq 0``, then it is simple to choose ``\mu_j``so that the inner maxi

```@raw html
<div class = "exercise-body">
<header class = "exercise-header">Exercise 5 (theory)</header><p>
<header class = "exercise-header">Exercise 6 (theory)</header><p>
```
Derive the dual formulation for the linear programming.
```@raw html
Expand Down
Loading

0 comments on commit a9bba21

Please sign in to comment.