Remarks on Derivative-Free Optimization #145
zaikunzhang
started this conversation in
General
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
PRIMA is a package designed for derivative-free optimization (DFO). Below are some elaborations on DFO.
What is DFO
DFO is a branch of optimization where problems are solved based on function values (i.e., zeroth-order information) without using classical or generalized derivatives (i.e., first- or higher-order information). It is also known as zeroth-order optimization or gradient-free optimization.
Why DFO
Traditional optimization algorithms often depend on first- or higher-order information of the problem. Yet, numerous applications lack access to such data. For instance, the objective function may not have an explicit formula but instead is determined by a complex black-box process involving sophisticated simulations, leading to the phrases "black-box optimization" and "simulation-based optimization". This situation is common in industry, engineering, and increasingly in machine learning and AI. These functions might stem from computations without closed-form formulations, such as solving partial differential equations (PDEs) or training neural networks.
When to use DFO algorithms
Employ DFO algorithms only if your problem includes at least one black-box component that cannot be modeled with an explicit formula.
Even if the explicit formulation of your problem is complex and approximate, it is typically more advantageous to exploit the formulation than treat it as a black box, which neglects all problem structures. It is crucial to investigate and leverage any existing information available.
DFO should not be mistaken for nonsmooth optimization. In DFO, the reason for not using the first-order information is not the nonsmoothness of the problem, but the unavailability of such information. If you have an explicitly formulated but nonsmooth problem, you should utilize available first-order information like subgradients instead of approaching it as a black box. Although DFO algorithms can handle nonsmooth problems, they will probably not perform as well as specialized nonsmooth optimization algorithms.
How to compare DFO solvers
In DFO, the cost is often dominated by function evaluations, which typically involve time-intensive and costly simulations. For example, a problem might require solving PDEs, taking seconds to minutes, or training neural networks, which could extend to hours or even months.
The main expense of using a DFO solver thus exists outside the solver itself. The computational effort within the solver is usually minor compared to the function evaluations.
Therefore, in developing a DFO package like PRIMA, the goal is to minimize function evaluations, even at the expense of increased internal computational effort. If a single function evaluation demands several minutes, optimizing the internal computation to save mere milliseconds per iteration is counterproductive.
Hence, benchmarking DFO solvers typically focuses on comparing the number of function evaluations and the resulting solution quality. If function evaluations vastly outweigh the computational cost of numerical operations within the solver, the number of function evaluations accurately reflects the problem-solving expense.
This rationale underpins our focus on function evaluation counts when assessing PRIMA against its Fortran 77 (F77) predecessors.
Indeed, PRIMA does consume more floating-point operations (flops) and memory inside the solvers. This is the price of modernization and modularization. The F77 implementation contains many tricks to save flops and memory, which can obscure code understanding and maintenance. Moreover, significant overhead can occur when a nontrivial F77 subroutine is partitioned into smaller subroutines and organized into modules, particularly if those subroutines are invoked multiple times by the solver.
My aim with PRIMA is to offer a modernized, modular, and reference implementation for Powell's DFO solvers. Code simplicity and clarity are prioritized in this context. When faced with a choice between coding efficiency and straightforwardness, the latter prevails, assuming the algorithm dictates a consistent number of function evaluations. Given the predominating expense of function evaluations, complicating code to save milliseconds is illogical.
Nevertheless, PRIMA's disregard for internal solver costs is not absolute. Prioritization is key:
Reducing function evaluations >> enhancing code simplicity & clarity >> economizing solver's internal costs.
We should focus on internal costs only after ensuring the first two priorities remain intact.
What if function evaluations are quick?
If function evaluations are indeed cheap in your problem, DFO solvers are still viable, but alternative methods like metaheuristics or population-based algorithms might be more suitable. If you decide to adopt Powell's solvers while concerned about the internal cost of the solvers, then there are two possibilities.
matprod
,inprod
, etc; you only need to consider the procedures in fortran/common/linalg.f90). However, I am not sure whether such a version will be much more efficient (in terms of internal cost of the solvers) than the current one.References
For a comprehensive overview of DFO, consider the introductory sections of relevant monographs, theses, and surveys.
[1] A. R. Conn, K. Scheinberg, and L. N. Vicente. Introduction to Derivative-Free Optimization, volume 8 of MOS-SIAM Ser. Optim. SIAM, Philadelphia, 2009
[2] Z. Zhang. Derivative-free Optimization Methods (in Chinese), PhD thesis, The Chinese Academy of Sciences, Beijing, 2012
[3] C. Audet and W. Hare. Derivative-Free and Blackbox Optimization. Springer, 2017
[4] J. Larson, M. Menickelly, and S. M. Wild. Derivative-free optimization methods. Acta Numer., 28:287–404, 2019
[5] T. M. Ragonneau. Model-Based Derivative-Free Optimization Methods and Software. PhD thesis, The Hong Kong Polytechnic University, Hong Kong, 2022
Beta Was this translation helpful? Give feedback.
All reactions