Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scaling of variables #4

Closed
emmt opened this issue Sep 27, 2023 · 5 comments
Closed

Scaling of variables #4

emmt opened this issue Sep 27, 2023 · 5 comments

Comments

@emmt
Copy link
Collaborator

emmt commented Sep 27, 2023

Context

All Powell's algorithms were orinally written to solve of problem of the form:

min f(x)    s.t.   x ∈ Ω ⊆ ℝⁿ

where f: Ω → ℝ is the function to minimize, Ω ⊆ ℝⁿ is the set of feasible variables, and n ≥ 1 is the number of variables. The most general feasible set is:

Ω = { x ∈ ℝⁿ | xl ≤ x ≤ xu, Aₑ⋅x = bₑ, Aᵢ⋅x ≤ bᵢ, and c(x) ≤ 0 }

where xl ∈ ℝⁿ and xu ∈ ℝⁿ are lower and upper bounds, Aₑ and bₑ implement linear equality constraints, Aᵢ and bᵢ implement linear inequality constraints, and c: ℝⁿ → ℝᵐ implements m non-linear constraints.

Proposition

We consider here means to scale the variables x of the problem so that (i) the numerical optimization algorithms are more robust in that respect and (ii) the radius of the trust region accounts for the scaling of the variables. Scaling of the variables must be as transparent as possible for the end-users (apart for specifying the scaling). Hence, we want that the end-users deal with the variables x, while the algorithms deal with scaled variables u such that x = S⋅u where S is a diagonal invertible scaling matrix. Typically, each diagonal entry of S is set with the magnitude of the corresponding variable x. For simplicity to implement bound constraints (see below), we assume that all diagonal entries of S are positive. Then:

  • The initial solution writes x0 = S⋅u0. Thus the algorithms are initialized with u0 = S\x0 (the elementwise division of the parameters by the scaling factors).

  • For the objective function and the non-linear constraints f(x) = f(S⋅u) and c(x) = c(S⋅u). Hence f and c must be respectively replaced by the compositions f∘s and c∘s where s(u) = S⋅u.

  • For the linear constraints Aₑ⋅x = bₑ and Aᵢ⋅x ≤ bᵢ which are directly handled by the algorithms, they must be replaced by Aₑ⋅S⋅u = bₑ and Aᵢ⋅S⋅u ≤ bᵢ. Hence ``AₑandAᵢ` set by the end-user must be replaced by `Aₑ⋅S` and `Aᵢ⋅S` in the algorithm.

  • Since S is diagonal with positive diagonal entries, the bound constraints xl ≤ x = S⋅u ≤ xu become S\xl ≤ u ≤ S\xu. Hence the lower and upper bounds must be replaced by S\xl and S\xu respectively.

The only remaining issue not handled by this proposition is the printing of the variables when iprint says to do that.

@jschueller
Copy link

or the scaling could be done in the C layer to benefit python also

@emmt
Copy link
Collaborator Author

emmt commented Sep 29, 2023

If done in the C layer, you'll have to wrap the original functions, the objective f(x) function and the constraints c(x), to transparently handle that. I've seen that there is a data argument that may be used for that. There will remain the iprint issue unless scaling is done in the FORTRAN code and thus benefit to all interfaces. In the mean time, the scaling of variables in any interface can serve as a proof of concept.

@zaikunzhang
Copy link
Member

zaikunzhang commented Oct 17, 2023

Hi @emmt ,

Thank you very much for proposing this. Apologies for my slow response. Works have been complicated on my side and the situation will continue for a while.

First of all, it is for sure that the Fortran code will never implement the scaling. This is due to the limitation of the language. It is doable but too complicated. Implementing it will complicate the code significantly, contradicting the idea of providing a reference implementation that is readable, understandable, and extendable.

Scaling is important, but it should be done in the high-level interfaces, taking advantage of the capability of the languages. The most important ability would be lambda function/closure/anonymous function; otherwise, the code would have to carry the scaling factors explicitly (e.g., using data in the c case as discussed above), which is a bit redundant.

Another important fact is that deciding good scaling is a nontrivial (but important) problem. The best scaling is only known to experienced users users that are familiar with the problem being solved. Ideally, the user should scale the problem before passing it to the solver, though I know very well that we are not living in an ideal world. If the user does not do so, we can only guess the scaling (see my comments at libprima/prima#95).

MATLAB guesses the scaling according to an optional input called TypicalX --- we may regard x0 as a typical x unless x0 is a very bad initial guess and quite far away from the solution. NLopt also has some scaling strategies, which are worth learning.

The problem of iprint is, unfortunately, unsolvable without significantly complicating the Fortran code.

BTW, the language is Fortran rather than FORTRAN since the 90s. PRIMA does not contain any FORTRAN code; in contrast, Powell's code is FORTRAN. Nobody should code FORTRAN in any new project.

Thanks.

@emmt
Copy link
Collaborator Author

emmt commented Oct 17, 2023

Hi @zaikunzhang,

BTW, the language is Fortran rather than FORTRAN since the 90s. PRIMA does not contain any FORTRAN code; in contrast, Powell's code is FORTRAN. Nobody should code FORTRAN in any new project.

Thanks for pointing this, I'll try to be careful about the spelling ;-)

Scaling is important, but it should be done in the high-level interfaces, taking advantage of the capability of the languages. The most important ability would be lambda function/closure/anonymous function; otherwise, the code would have to carry the scaling factors explicitly (e.g., using data in the c case as discussed above), which is a bit redundant.

This issue was to figure out all consequences of the scaling of the variables. In particular, scaling cannot be simply done by using closures (or equivalent) for the objective function and non-linear constraints, it must be handled for the linear constraints as well. I agree that all this management should be done by the high level interface not by the user who may, at most, specify the scaling factors (as we discussed elsewhere).

The problem of iprint is, unfortunately, unsolvable without significantly complicating the Fortran code.

I understand that, this is unfortunate but it is a minor issue. The higher level interface could always bypass that and, at least on entry and on return of the algorithm, prints the correct (unscaled) variables.

@emmt
Copy link
Collaborator Author

emmt commented Oct 23, 2023

Scaling of variables has been implemented by commit 07b17f6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants