Lecture 7: Polishing touches

JuliaTeachingCTU · Apr 5, 2021 · a9bba21 · a9bba21
1 parent c735948
commit a9bba21
Show file tree

Hide file tree

Showing 4 changed files with 235 additions and 127 deletions.
diff --git a/docs/src/lecture_07/constrained.md b/docs/src/lecture_07/constrained.md
@@ -1,24 +1,42 @@
 ```@setup optim
 using Plots
-
-function create_anim(f, path, xlims, ylims; file_name = "", fps=15)
+using Random
+
+function create_anim(
+    f,
+    path,
+    xlims,
+    ylims,
+    file_name = joinpath(pwd(), randstring(12) * ".gif");
+    xbounds = xlims,
+    ybounds = ylims,
+    fps = 15,
+)
     xs = range(xlims...; length = 100)
     ys = range(ylims...; length = 100)
-    plt = contourf(xs, ys, f, color = :jet, axis = false, ticks = false, cbar = false)
+    plt = contourf(xs, ys, f; color = :jet)
+
+    # add constraints if provided
+    if !(xbounds == xlims && ybounds == ylims)
+        x_rect = [xbounds[1]; xbounds[2]; xbounds[2]; xbounds[1]; xbounds[1]]
+        y_rect = [ybounds[1]; ybounds[1]; ybounds[2]; ybounds[2]; ybounds[1]]
+        
+        plot!(x_rect, y_rect; line = (2, :dash, :red), label="")
+    end
+
+    # add an empty plot
+    plot!(Float64[], Float64[]; line = (4, :arrow, :black), label = "")
 
-    # adds an empty plot to plt
-    plot!(Float64[], Float64[]; line = (4, :black), label = "")
-    
-    # extracts last plot series
+    # extract the last plot series
     plt_path = plt.series_list[end]
-    
-    # creates the  animation
+
+    # create the animation and save it
     anim = Animation()
     for x in eachcol(path)
-        push!(plt_path, x[1], x[2]) # add new point to plt_grad
+        push!(plt_path, x[1], x[2]) # add a new point
         frame(anim)
     end
-    gif(anim, file_name, fps = fps)
+    gif(anim, file_name; fps = fps, show_msg = false)
     return nothing
 end
 
@@ -40,7 +58,7 @@ The usual formulation of constrained optimization is
 &h_j(x) = 0,\ j=1,\dots,J.
 \end{aligned}
 ```
-This optimization problem is also called the primal formulation. It is closely connected with the Lagrangian
+Functions ``g_i`` generate inequality constraints, while functions ``h_j`` generate equality constraints. Box constraints such as ``x\in[0,1]`` are the simplest case of the former. This optimization problem is also called the primal formulation. It is closely connected with the Lagrangian
 ```math
 L(x;\lambda,\mu) = f(x)  + \sum_{i=1}^I \lambda_i g_i(x) + \sum_{j=1}^J \mu_j h_j(x).
 ```
@@ -53,6 +71,8 @@ The dual problem then switches the minimization and maximization to arrive at
 \tag{D} \operatorname*{maximize}_{\lambda\ge 0,\mu} \quad\operatorname*{minimize}_x\quad L(x;\lambda,\mu).
 ```
 
+Even though the primal and dual formulations are not generally equivalent, they are often used interchangeably.
+
 ```@raw html
 <div class = "info-body">
 <header class = "info-header">Linear programming</header><p>
@@ -79,7 +99,7 @@ We can observe several things:
 </p></div>
 ```
 
-The optimality conditions for constrained optimization take a more complex form.
+For the unconstrained optimization, we showed that each local minimum satisfies the optimality condition ``\nabla f(x)=0``. This condition does not have to hold for unconstrained optimization, where the optimality conditions are of a more complex form.
 
 ```@raw html
 <div class = "theorem-body">
@@ -98,6 +118,8 @@ If $f$ and $g$ are convex and $h$ is linear, then every stationary point is a gl
 </p></div>
 ```
 
+When there are no constraints, the Lagrangian ``L`` reduces to the objective ``f``, and the optimality conditions are equivalent. Therefore, the optimality conditions for constrained optimization generalize those for unconstrained optimization.
+
 ## Numerical method
 
 We present only the simplest method for constraint optimization. Projected gradients 
@@ -107,7 +129,7 @@ y^{k+1} &= x^k - \alpha^k\nabla f(x^k), \\
 x^{k+1} &= P_X(y^{k+1})
 \end{aligned}
 ```
-compute the gradient as for standard gradient descent, and then project the point onto the feasible set. Since the projection needs to be simple to compute, projected gradients are used for simple sets ``X`` such as boxes or balls. 
+compute the gradient as for standard gradient descent, and then project the point onto the feasible set. Since the projection needs to be simple to calculate, projected gradients are used for simple ``X`` such as boxes or balls. 
 
 We will use projected gradients to solve
 ```math
@@ -131,13 +153,17 @@ end
 
 nothing # hide
 ```
+
 The projection function ```P``` computes the projection on ```[x_min, x_max]```. Since it is a box, the projection is computed componentwise:
+
 ```@example optim
 P(x, x_min, x_max) = min.(max.(x, x_min), x_max)
 
 nothing # hide
 ```
-Now we can call projected gradients from the same starting point as before
+
+Now we can call projected gradients from the same starting point as before.
+
 ```@example optim
 x_min = [-1; -1]
 x_max = [0; 0]
@@ -146,34 +172,43 @@ xs, ys = optim(f, g, x -> P(x,x_min,x_max), [0;-1], 0.1)
 
 nothing # hide
 ```
-To plot the path, we need to merge them together when one point from ```xs``` is followed by a point from ```ys``` and so on. Since ```xs``` and ```ys``` have different number of entries, we can do it via
-```@example optim
-xys = hcat(reshape([xs[:,1:end-1]; ys][:], 2, :), xs[:,end])
 
-nothing # hide
-```
-It is probably not the nicest thing to do, but it is Saturday evening, I am tired and it works. Sorry :) The animation can be created in the same way aa before. 
+We use the keyword arguments `xbounds` and `ybounds` to plot the feasible region in the animation. First, we plot only the iterations `xs`.
+
 ```@example optim
 xlims = (-3, 1)
 ylims = (-2, 1)
 
-create_anim(f, xys, xlims, ylims; file_name = "anim6.gif")
+create_anim(f, xs, xlims, ylims, "anim6.gif";
+    xbounds=(x_min[1], x_max[1]),
+    ybounds=(x_min[2], x_max[2]),
+)
 
 nothing # hide
 ```
 
 ![](anim6.gif)
 
-There is one significant drawback to this animation: One cannot see the boundary. One possibility would be to reduce the plotting ranges ```xlims``` and ```ylims```. However, then one would not see the iterations outside of the box. Another possibility is to modify ```f``` into ```f_mod``` which has the same values inside the box and is constant outside of it. Because ```f``` is bounded below by ``-1``, we define ```f_mod``` by ``-2`` outside of the box. 
+
+To plot the path, we need to merge them by following one point from ```xs``` by a point from ```ys``` and so on. Since ```xs``` and ```ys``` have different number of entries, we can do it via
+
 ```@example optim
-f_mod(x) = all(x .>= x_min) && all(x .<= x_max) ? f(x) : -2
-f_mod(x1,x2) = f_mod([x1; x2])
+xys = hcat(reshape([xs[:,1:end-1]; ys][:], 2, :), xs[:,end])
 
-create_anim(f_mod, xys, xlims, ylims; file_name = "anim7.gif")
+nothing # hide
+```
+
+It is probably not the nicest thing to do, but it is Saturday evening, I am tired, and it works. Sorry :) The animation can now be created in the same way as before. 
+
+```@example optim
+create_anim(f, xys, xlims, ylims, "anim7.gif";
+    xbounds=(x_min[1], x_max[1]),
+    ybounds=(x_min[2], x_max[2]),
+)
 
 nothing # hide
 ```
 
 ![](anim7.gif)
 
-The animation shows that projected gradients converge to the global minimum. Most of the iterations are outside of the feasible region but they are projected back to boundary. 
+The animation shows that projected gradients converge to the global minimum. Most of the iterations are outside of the feasible region, but they are projected back to the boundary. One can use the optimality conditions to verify that the gradient of the objective and the active constraint have the same direction.
diff --git a/docs/src/lecture_07/exercises.md b/docs/src/lecture_07/exercises.md
@@ -20,12 +20,75 @@ with the starting point ``x^0=(0,0)``.
 
 
 
+```@raw html
+<div class = "exercise-body">
+<header class = "exercise-header">Exercise 1: Solving a system of linear equations</header><p>
+```
+
+The update of Newton's method computes ``A^{-1}b``. The most intuitive way of writing this is to use `inv(A) * b`, which first computes the inverse of `A` and then multiplies it with a vector. However, this approach has several disadvantages:
+- Specialized algorithms for solving the linear system ``Ax=b`` cannot be used.
+- When `A` is sparse, this inverse is dense and additional memory is needed to store the dense matrix.
+For these reasons, the linear system of equations is solved by `A \ b`, which calls specialized algorithms.
+
+Use the package `BenchmarkTools` to benchmark both possibilities.
+```@raw html
+</p></div>
+<details class = "solution-body">
+<summary class = "solution-header">Solution:</summary><p>
+```
+
+We first create a random matrix `A` and a random vector `b`.
+
+```julia
+using BenchmarkTools
+
+n = 1000
+A = randn(n,n)
+b = randn(n)
+```
+
+We first verify that both possibilities result in the same number.
+
+```julia
+using LinearAlgebra
+
+norm(inv(A)*b - A \ b)
+```
+```julia
+9.321906736594836e-12
+```
+
+We benchmark the first possibility.
+
+```julia
+@btime inv($A)*($b)
+```
+```julia
+  71.855 ms (6 allocations: 8.13 MiB)
+```
+
+We benchmark the second possibility.
+
+```julia
+@btime ($A) \ ($b)
+```
+```julia
+  31.126 ms (4 allocations: 7.64 MiB)
+```
+
+The second possibility is faster and has lower memory requirements.
+
+```@raw html
+</p></details>
+```
+
+
 
 
 
 ```@raw html
 <div class = "exercise-body">
-<header class = "exercise-header">Exercise 1: Bisection method</header><p>
+<header class = "exercise-header">Exercise 2: Bisection method</header><p>
 ```
 Similarly to Newton's method, the bisection method is primarily designed to solve equations by finding their zero points. It is only able to solve equations ``f(x)=0`` where ``f:\mathbb{R}\to\mathbb{R}``. It starts with an interval ``[a,b]`` where ``f`` has opposite values ``f(a)f(b)<0``. Then it selects the middle point on ``[a,b]`` and halves the interval so that the new interval again satisfies the constraint on opposite signs ``f(a)f(b)<0``. This is repeated until the function value is small or until the interval has a small length.
 
@@ -35,7 +98,7 @@ Implement the bisection method and use it to minimize ``f(x) = x^2 - x`` on ``[-
 <details class = "solution-body">
 <summary class = "solution-header">Solution:</summary><p>
 ```
-First, we write the bisection method. We initialize it with arguments ``f`` and the initial interval ``[a,b]``. We also specify the optional tolerance. First, we save the function value ```fa = f(a)``` to not need to recompute it every time. The syntax ```fa == 0 && return a``` is a bit complex. Since ```&&``` is the "and" operator, this first checks whether ```fa == 0``` is satisfied and if so, it evaluates the second part. However, the second part exits the function and returns ```a```. Since we need to have ``f(a)f(b)<0``, we check this condition, and if it is not satisfied, we return an error message. Finally, we run the while loop, where every iteration halves the interval. The condition on opposite signs is enforced in the if condition inside the loop.
+First, we write the bisection method. We initialize it with arguments ``f`` and the initial interval ``[a,b]``. We also specify the optional tolerance. First, we save the function value ```fa = f(a)``` to not need to recompute it every time. The syntax ```fa == 0 && return a``` is a bit complex. Since ```&&``` is the "and" operator, this first checks whether ```fa == 0``` is satisfied, and if so, it evaluates the second part. However, the second part exits the function and returns ```a```. Since we need to have ``f(a)f(b)<0``, we check this condition, and if it is not satisfied, we return an error message. Finally, we run the while loop, where every iteration halves the interval. The condition on opposite signs is enforced in the if condition inside the loop.
 ```@example bisec
 function bisection(f, a, b; tol=1e-6)
     fa = f(a)
@@ -59,7 +122,7 @@ function bisection(f, a, b; tol=1e-6)
 end
 nothing # hide
 ```
-This implementation is efficient in the way that only one function evaluation is neededper iteration. The price to pay are additional variables ```fa```, ```fb``` and ```fc```.
+This implementation is efficient in the way that only one function evaluation is needed per iteration. The price to pay are additional variables ```fa```, ```fb``` and ```fc```.
 
 To use the bisection method to minimize a function ``f(x)``, we use it find the solution of the optimality condition ``f'(x)=0``.
 ```@example bisec
@@ -83,7 +146,7 @@ println(round(x_opt, digits=4)) # hide
 
 ```@raw html
 <div class = "exercise-body">
-<header class = "exercise-header">Exercise 2: JuMP</header><p>
+<header class = "exercise-header">Exercise 3: JuMP</header><p>
 ```
 The library to perform optimization is called ```JuMP```. Install it, go briefly through its documentation, and use it to solve the linear optimization problem
 ```math
@@ -137,7 +200,7 @@ println(round.(x_val, digits=4)) # hide
 
 ```@raw html
 <div class = "exercise-body">
-<header class = "exercise-header">Exercise 3: SQP method</header><p>
+<header class = "exercise-header">Exercise 4: SQP method</header><p>
 ```
 Derive the SQP method for optimization problem with only equality constraints
 ```math
@@ -146,7 +209,7 @@ Derive the SQP method for optimization problem with only equality constraints
 \text{subject to}\qquad &h_j(x) = 0, j=1,\dots,J.
 \end{aligned}
 ```
-SQP writes the optimality (KKT) conditions and then applies Newton's method to solve the resulting system of equations. 
+SQP writes the [Karush-Kuhn-Tucker](@ref lagrangian) optimality conditions and then applies Newton's method to solve the resulting system of equations. 
 
 Apply the obtained algorithm to
 ```math
@@ -177,14 +240,14 @@ The Newton method's at iteration ``k`` has some pair ``(x^k,\mu^k)`` and perform
 \begin{pmatrix} x^{k+1} \\ \mu^{k+1} \end{pmatrix} = \begin{pmatrix} x^{k} \\ \mu^{k} \end{pmatrix} - \begin{pmatrix} \nabla^2 f(x^k) + \sum_{j=1}^J \mu_j^k \nabla^2 h_j(x^k) & \nabla h(x^k) \\ \nabla h(x^k)^\top & 0 \end{pmatrix}^{-1} \begin{pmatrix} \nabla f(x^k) + \sum_{j=1}^J\mu_j^k \nabla h_j(x^k) \\ h(x^k) \end{pmatrix}. 
 ```
 
-We define functions ``f`` and ``h`` and their derivates and Hessians for the numerical implementation. The simplest way to create a diagonal matrix is ```Diagonal``` from the ```LinearAlgebra``` package. It can be, of course, done manually as well. 
+We define functions ``f`` and ``h`` and their derivates and Hessians for the numerical implementation. The simplest way to create a diagonal matrix is `Diagonal` from the `LinearAlgebra` package. It can be, of course, done manually as well. 
 ```@example sqp 
 using LinearAlgebra
 
 n = 10
 f(x) = sum((1:n) .* x.^4)
-f_grad(x) = 4*(1:n)[:].*x.^3
-f_hess(x) = 12*Diagonal((1:n)[:].*x.^2)
+f_grad(x) = 4*(1:n).*x.^3
+f_hess(x) = 12*Diagonal((1:n).*x.^2)
 h(x) = sum(x) - 1
 h_grad(x) = ones(n)
 h_hess(x) = zeros(n,n)
@@ -203,9 +266,9 @@ for i in 1:100
     μ -= step[n+1] 
 end
 ```
-The need to differentiate global and local variables in scripts are one of the reasons why functions should be used as much as possible.
+The need to differentiate global and local variables in scripts is one reason why functions should be used as much as possible.
 
-To validate, we need to verify the optimality and the feasibility; both need to equal to zero. These are the same as the ```b``` variable. However, we cannot call ```b``` directly, as it is inside the for loop and therefore local only.
+To validate, we need to verify the optimality and the feasibility; both need to equal zero. These are the same as the ```b``` variable. However, we cannot call ```b``` directly, as it is inside the for loop and therefore local only.
 ```@repl sqp
 f_grad(x) + μ*h_grad(x)
 h(x)
@@ -228,7 +291,7 @@ println(round.(x, digits=4)) # hide
 
 ```@raw html
 <div class = "exercise-body">
-<header class = "exercise-header">Exercise 4 (theory)</header><p>
+<header class = "exercise-header">Exercise 5 (theory)</header><p>
 ```
 Show that the primal formulation for a problem with no inequalities is equivalent to the min-max formulation.
 ```@raw html
@@ -264,7 +327,7 @@ If ``h_j(x)\neq 0``, then it is simple to choose ``\mu_j``so that the inner maxi
 
 ```@raw html
 <div class = "exercise-body">
-<header class = "exercise-header">Exercise 5 (theory)</header><p>
+<header class = "exercise-header">Exercise 6 (theory)</header><p>
 ```
 Derive the dual formulation for the linear programming.
 ```@raw html