update readme

fabian-sp · Apr 29, 2024 · 79c9ef2 · 79c9ef2
1 parent b12920e
commit 79c9ef2
Show file tree

Hide file tree

Showing 5 changed files with 102 additions and 14 deletions.
diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
 # ncOPT
+This repository is for solving **constrained optimization problems** where objective and/or constraint functions are (arbitrary) **Pytorch modules**. It is mainly intended for optimization with pre-trained networks, but might be useful also in other contexts.
 
 ## Short Description
-This repository is for solving **constrained optimization problems** where objective and/or constraint functions are (arbitrary) **Pytorch modules**. It is mainly intended for optimization with pre-trained networks, but might be useful also in other contexts.
 
 The algorithms in this package can solve problems of the form
 
@@ -19,6 +19,14 @@ The package supports:
 * batched evaluation and Jacobian computation
 * ineqaulity and equality constraints, which can depend only on a subset of the optimization variable
 
+Quick links:
+
+1) How to specify functions?
+2) How to call the solver?
+3) Advances options and infos for the SQPGS solver
+
+
+
 **DISCLAIMER:** 
 
 1) We have not (yet) extensively tested the solver on large-scale problems.  
@@ -38,24 +46,28 @@ For an editable version of this package in your Python environment, run the comm
 ## Main Solver 
 
 The main solver implemented in this package is called SQP-GS, and has been developed by Curtis and Overton in [1]. 
-The SQP-GS algorithm can solve problems with nonconvex and nonsmooth objective and constraints. For details, we refer to [our documentation](src/ncopt/sqps/README.md) and the original paper [1].
+The SQP-GS algorithm can solve problems with nonconvex and nonsmooth objective and constraints. For details, we refer to [our documentation](src/ncopt/sqpgs/README.md) and the original paper [1].
 
 
 ## Getting started
 
 ### Solver interface
-The solver can be called via 
+The SQP-GS solver can be called via 
 
 ```python
     from ncopt.sqpgs import SQPGS
     problem = SQPGS(f, gI, gE)
     problem.solve()
 ```
-Here `f` is the objective function, and `gI` and `gE` are a list of inequality and equality constaints. 
+Here `f` is the objective function, and `gI` and `gE` are a list of inequality and equality constaints. An empty list can be passed if no (in)equality constraints are needed.
+
+### The `ObjectiveOrConstraint` class
+
 The objective `f` and each element of `gI` and `gE` should be passed as an instance of [`ncopt.functions.ObjectiveOrConstraint`](src/ncopt/functions/main.py) (a simple wrapper around a `torch.nn.Module`). 
 
 * Each constraint function is allowed to have multi-dimensional output (see example below).
-* An empty list can be passed if no (in)equality constraints are needed.
+* **IMPORTANT:** For the objective, we further need to specify the dimension of the optimization variable with the argument `dim`. For each constraint, we need to specify the output dimension with `dim_out`.
+
 
 For example, a linear constraint function `Ax - b <= 0` can be implemented as follows:
 
@@ -68,7 +80,19 @@ For example, a linear constraint function `Ax - b <= 0` can be implemented as fo
     g.model.bias.data = -b     # pass b
 ```
 
-Note the argument `dim_out`, which needs to be passed for all constraint functions: it tells the solver what the output dimension of this constraint is.
+### More functionalities
+
+Let's assume we have a `torch.nn.Module`, call it `model`, which we want to use as objective/constraint. For the solver, we can pass the function as  
+
+```
+f = ObjectiveOrConstraint(model)
+```
+
+* **Dimension handling:** Each function must be designed for batched evaluation (that is, the first dimension of the input is the batch size). Also, the output shape of `model` must be two-dimensional, that is `(batch_size, dim_out)`, where `dim_out` is the output dimension for a constraint, and `dim_out=1` for the objective.
+
+* **Device handling:** The forward pass, and Jacobian calculation is done on the device on which the parameters of your model. For example, you can use `model.to(device)` before creating `f`. See this [Colab example](https://colab.research.google.com/drive/1scsusR4Fggo-vT-IPYsoa3ccROmGQkZ8?usp=sharing) how to use a GPU.
+
+* **Input preparation**: Different constraints might only need a part of the optimization variable as input, or might require additional preparation such as reshaping from vector to image. (Note that the optimization variable is handled always as vector) For this, you can specify a callable `prepare_input` when initializing a `ObjectiveOrConstraint` object. Any reshaping or cropping etc. can be handled with this function. Please note that `prepare_input` should be compatible with batched forward passes.
 
 ### Example
 

diff --git a/data/img/rosenbrock.png b/data/img/rosenbrock.png
diff --git a/src/ncopt/sqpgs/README.md b/src/ncopt/sqpgs/README.md
@@ -1,6 +1,71 @@
+# The SQP-GS Algorithm
+
+SQP-GS is a method for solving nonsmooth, nonconvex constrained optimization problems. It has been proposed in [1]. 
+As the name suggests, it combines Sequential Quadratic Programming (SQP) with Gradient Sampling (GS) to handle nonconvexity as well as nonsmoothness. 
+
+An example for how to call the SQP-GS solver:
+
+```python
+from ncopt.sqpgs import SQPGS
+
+problem = SQPGS(f, gI, gE, x0=None, tol=1e-6, max_iter=100, verbose=True)
+x = problem.solve()
+```
+
+## How SQP-GS works in short
+
+Below we briefly describe the SQP-GS algorithm. **For more details on the algorithm, we refer to the paper [1].** The iteration cost of the algorithm splits mainly into two steps:
+
+1) Sample function value and gradient/Jacobian of each nonsmooth function at multiple points in a neighborhood of the current iterate.
+
+2) SQP approximates the original problem in each iteration by a quadratic program (QP). We need to solve this QP to compute the update direction. 
+
+The technique in 1) is called Gradient Sampling (GS) and is a widely used, robust technique for handling nonsmooth objective or constraint functions. 
+As all functions are Pytorch modules in this package, step 1) amounts to batch evaluation and Jacobian computation. This can be done efficiently using the `autograd` functionalities of Pytorch.
+
+For 2), we solve the QP with the package `osqp`. We also implement a general interface to `cvxpy`, which seems slightly slower due to overhead costs, but more flexible as the solver can be exchanged easily. 
+Further, the quadratic approximation of SQP naturally involves an approximation of the Hessian, which is done in L-BFGS style.
+
+
+
+### Options
+
+SQP-GS has many hyperparameters, for all of which we use the default values from the paper [1] and the Matlab code [2]. The values can be controlled via the argument `options`, which can be a dictionary with all values that need to be overwritten. See the default values [here](defaults.py).
+
+One important hyperparameter is the number of sample points, see the next section for details.
+
+The starting point can be specified with the argument `x0`, and we use the zero vector if not specified.
+
+### Gradient Sampling with `autograd`
+
+For the objective and each constraint function, in every iteration, a number of points is sampled in a neighborhood to the currrent iterate. The function is then evaluated at those points, plus at the iterate itself, and the Jacobian of the function is computed at all points.
+This is done with the Pytorch `autograd` functionalities `jacrev` and `vmap`. See the function [`compute_value_and_jacobian`](main.py#L435).
+
+* If a function is differentiable, setting `is_differentiable=True` when calling `ObjectiveOrConstraint` will have the effect that for this function only the current iterate itself is evaluated. 
+* The authors of [1] suggest to set the number of sample points to roughly 2 times the (effective!) input dimension of each function. See end of section 4 in [1]. As this is hard to implement by default, we recommend to set the number of points manually. This can be done by adjusting (before calling `problem.solve()`) the values `problem.p0` (number of points for objective), `problem.pI_` (number of points for each inequality constraint) and `problem.pE_` (number of points for each equality constraint).
+
+### Solving the QP
+
+Solving the QP will likely amount to most of the runtime per iteration. There are two options for solving the QP:
+
+1. (Default): Use the `osqp` solver [4]. This calls directly the `osqp` solver after constructing the subproblem. See the [implementation](osqp_subproblem.py).
+2. Use `cvxpy` [3]. This creates the subproblem with `cvxpy` and then solves the QP with the specified solver. This options has some overhead costs, but it allows to exchange the solver flexibly. See the [implementation](cvxpy_subproblem.py).  
+
+The authors of [1] report that MOSEK works well for their numerical examples, but here we use `osqp` by default.
+
+
+### Further notes
+
+
+* For numerical stability, we add a Tikhonov regularization to the approximate Hessian. It's magnitude can be controlled over the key `reg_H` in `options`.
 
 ## References
 [1] Frank E. Curtis and Michael L. Overton, A sequential quadratic programming algorithm for nonconvex, nonsmooth constrained optimization, 
 SIAM Journal on Optimization 2012 22:2, 474-500, https://doi.org/10.1137/090780201.
 
 [2] Frank E. Curtis and Michael L. Overton, MATLAB implementation of SLQP-GS, https://coral.ise.lehigh.edu/frankecurtis/software/.
+
+[3] Steven Diamond and Stephen Boyd, CVXPY: A Python-embedded modeling language for convex optimization, Journal of Machine Learning Research 2016, https://www.cvxpy.org/.
+
+[4] Bartolomeo Stellato and Goran Banjac and Paul Goulart and Alberto Bemporad and Stephen Boyd, OSQP: an operator splitting solver for quadratic programs, Mathematical Programming Computation 2020, https://osqp.org/docs/index.html.
+
diff --git a/src/ncopt/sqpgs/cvxpy_subproblem.py b/src/ncopt/sqpgs/cvxpy_subproblem.py
@@ -5,7 +5,13 @@
 
 from ncopt.sqpgs.defaults import DEFAULT_OPTION
 
-CVXPY_SOLVER_DICT = {"osqp-cvxpy": cp.OSQP, "clarabel": cp.CLARABEL}
+CVXPY_SOLVER_DICT = {
+    "osqp-cvxpy": cp.OSQP,
+    "clarabel": cp.CLARABEL,
+    "cvxopt": cp.CVXOPT,
+    "mosek": cp.MOSEK,
+    "gurobi": cp.GUROBI,
+}
 
 
 class CVXPYSubproblemSQPGS:

diff --git a/src/ncopt/sqpgs/osqp_subproblem.py b/src/ncopt/sqpgs/osqp_subproblem.py
@@ -249,13 +249,6 @@ def solve(
 
         problem = osqp.OSQP()
 
-        # print(self.P)
-        # print(self.q)
-        # print(A)
-        # print(u)
-        # print("###")
-        # print(f_k)
-        # print(gI_k)
         problem.setup(
             P=sparse.csc_matrix(self.P),
             q=self.q,